Emerging reasoning with reinforcement learning | Hacker News
Reinforcement Learning, Explained With a Minimum of Math and Jargon
Timothy B. Lee, understandingai.org
This report is long but very good.
“With R1, DeepSeek essentially cracked one of the holy grails of AI: getting models to reason step-by-step without relying on massive supervised datasets. Their DeepSeek-R1-Zero experiment showed something remarkable: using pure reinforcement learning with carefully crafted reward...
Chamath Palihapitiya, x.com
DeepSeek-R1-Zero (no supervised fine-tuning) showing human-like reasoning skills in natural language just by virtue of reinforcement learning (RL). https://t.co/P0wCC3wVtN