David Hoang
@davidhoang
Proof of Concept: Building and investing tools that revolutionize the internet.
That feedback is then used for additional training, which fine-tunes the AI’s performance to fit human preferences. The extra training reinforces good answers and makes bad answers less likely, which is why the process is called Reinforcement Learning from Human Feedback (RLHF).
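To make the idea of reinforcing good answers concrete, here is a toy sketch in Python. It is an illustration under heavy assumptions, not how any production system is trained: the three candidate answers, the +1/-1 ratings, the learning rate, and the `feedback_update` helper are all hypothetical. Real RLHF pipelines typically train a separate reward model on human comparisons and then optimize a large language model against it. The sketch only shows the core loop of nudging a model toward answers that people rated well.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def feedback_update(logits, feedback, lr=0.5):
    """Take one gradient-ascent step on the expected human rating.

    feedback[i] is a hypothetical human rating for answer i
    (+1 for a good answer, -1 for a bad one).
    """
    probs = softmax(logits)
    # Average rating the model currently earns.
    baseline = sum(p * f for p, f in zip(probs, feedback))
    # Answers rated above the average get pushed up, the rest pushed down.
    return [
        score + lr * p * (f - baseline)
        for score, p, f in zip(logits, probs, feedback)
    ]


# Hypothetical candidate answers and the ratings a human gave them.
answers = ["helpful answer", "vague answer", "wrong answer"]
feedback = [1.0, -1.0, -1.0]

# The toy model starts with no preference between the answers.
logits = [0.0, 0.0, 0.0]
for _ in range(50):
    logits = feedback_update(logits, feedback)

probs = softmax(logits)
for answer, p in zip(answers, probs):
    print(f"{answer}: {p:.3f}")
```

After repeated rounds of feedback, the toy model assigns most of its probability to the answer the human rated well, and the two poorly rated answers become unlikely.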