David Hoang
@davidhoang
Proof of Concept: Building and investing tools that revolutionize the internet.
David Hoang
@davidhoang
Proof of Concept: Building and investing tools that revolutionize the internet.
That feedback is then used to do additional training, fine-tuning the AI’s performance to fit the preferences of the human, providing additional learning that reinforces good answers and reduces bad answers, which is why the process is called Reinforcement Learning from Human Feedback (RLHF).

One of my favorite talks by Frans Johansson, the author of a great book, “The Medici Effect.”
A strategy enables you to act
“The more shots you take, the more likely you are to hit a bullseye—especially if you're learning as you go.”
It’s essential to make multiple attempts, iterate fast, and not get attached to a single idea.