David Hoang
@davidhoang
Proof of Concept: Building and investing tools that revolutionize the internet.
David Hoang
@davidhoang
Proof of Concept: Building and investing tools that revolutionize the internet.
That feedback is then used to do additional training, fine-tuning the AI’s performance to fit the preferences of the human, providing additional learning that reinforces good answers and reduces bad answers, which is why the process is called Reinforcement Learning from Human Feedback (RLHF).
As Jony has said, “A big definition of who you are as a designer, it’s the way you look at the world. And I guess one of the curses of what you do, is you are constantly looking at something and thinking, ‘Why? Why is it like that? Why is it like that and not like this?’”27 Very likely, as a man who see himself “constantly