“Imagine teaching a child to ride a bike. You could give them a detailed manual (Supervised Fine Tuning), but they'll likely learn better by trying it themselves (Reinforcement Learning), falling, getting up, & gradually improving.”
- @McDonaghMatthew
ELI5 on DeepSeek, link 👇
Agent Planning with World Knowledge Model
Introduces a parametric world knowledge model to facilitate agent planning.
The agent model can self-synthesize knowledge from expert and sampled trajectories. This is used to train the world knowledge model.
Prior task...
A computer trained with reinforcement learning needs only search and memory, not reasoning or any other cognitive mechanism, to form associations and maximize rewards.
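To make the "search and memory" claim concrete, here is a minimal sketch of an epsilon-greedy multi-armed bandit, one of the simplest reinforcement learning setups. It is an illustration chosen for this post, not something from the posts quoted above: the function name `run_bandit`, the reward probabilities, and the `epsilon` value are all assumptions. "Search" is the occasional random pull; "memory" is a table of running average rewards.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: random search plus a table of remembered averages.

    reward_probs is a hypothetical list giving each arm's chance of paying 1.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms    # memory: how often each arm was pulled
    values = [0.0] * n_arms  # memory: running average reward per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Search: try a random arm now and then.
            arm = rng.randrange(n_arms)
        else:
            # Exploit memory: pick the arm with the best remembered average.
            arm = max(range(n_arms), key=lambda a: values[a])

        reward = 1 if rng.random() < reward_probs[arm] else 0

        # Incremental mean update: the "association" between arm and reward.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward

    return values, counts, total_reward


values, counts, total_reward = run_bandit([0.2, 0.5, 0.8])
print("estimated values:", [round(v, 2) for v in values])
print("pull counts:", counts)
print("total reward:", total_reward)
```

Nothing here plans or reasons: the agent only tries arms and remembers averages, yet its pulls should drift toward the higher-paying arm over time. Whether that scales to harder tasks is exactly what the claim above is about.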