Emerging reasoning with reinforcement learning | Hacker News
This report is long but very good.
“With R1, DeepSeek essentially cracked one of the holy grails of AI: getting models to reason step-by-step without relying on massive supervised datasets. Their DeepSeek-R1-Zero experiment showed something remarkable: using pure reinforcement learning with carefully crafted reward...”
Chamath Palihapitiya • x.com
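
For anyone wondering what "carefully crafted reward" means in practice: the R1-Zero rewards are rule-based rather than learned, roughly an accuracy check on the final answer plus a format check that the reasoning is wrapped in think tags. A minimal sketch of that idea in Python (the tag template, scoring, and function names below are illustrative assumptions, not the paper's actual code):

    import re

    # Illustrative rule-based reward in the spirit of DeepSeek-R1-Zero:
    # an accuracy check on the final answer plus a format check on the
    # <think>/<answer> template. Names and scoring are assumptions.
    TEMPLATE = re.compile(r"<think>.*?</think>\s*<answer>(.*?)</answer>", re.DOTALL)

    def format_reward(completion: str) -> float:
        # Reward the model for following the reasoning-then-answer template.
        return 1.0 if TEMPLATE.search(completion) else 0.0

    def accuracy_reward(completion: str, reference: str) -> float:
        # Reward an exact match between the extracted answer and the reference.
        match = TEMPLATE.search(completion)
        if not match:
            return 0.0
        return 1.0 if match.group(1).strip() == reference.strip() else 0.0

    def total_reward(completion: str, reference: str) -> float:
        return accuracy_reward(completion, reference) + format_reward(completion)

    sample = "<think>2 + 2 = 4</think> <answer>4</answer>"
    print(total_reward(sample, "4"))  # 2.0

The point is that nothing here requires labeled reasoning traces; the model only gets a signal about whether it ended up at the right answer in the right format.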
This "Aha moment" in the DeepSeek-R1 paper is huge:
Pure reinforcement learning (RL) enables an LLM to automatically learn to think and reflect.
This challenges the prior belief that replicating OpenAI's o1 reasoning models requires extensive CoT data. It turns out you just need to give it...
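
The RL algorithm behind this is GRPO, which drops the learned value model and instead scores each sampled response against the other responses for the same prompt. A simplified sketch of that group-relative advantage (it omits the clipped policy objective and the KL penalty against a reference model):

    from statistics import mean, pstdev

    def group_relative_advantages(rewards):
        # GRPO-style normalization: each response's advantage is its reward
        # relative to the mean and standard deviation of its sampling group.
        mu = mean(rewards)
        sigma = pstdev(rewards)
        if sigma == 0.0:
            return [0.0 for _ in rewards]
        return [(r - mu) / sigma for r in rewards]

    # Rewards for four sampled answers to the same prompt:
    print(group_relative_advantages([2.0, 1.0, 0.0, 2.0]))

Responses that beat their own group get pushed up, the rest get pushed down, so the "think longer and reflect" behavior can emerge without any supervised CoT examples.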
If AI starts to generate intelligence by itself, there’s no guarantee that it will be human-like. Rather than humans teaching machines to think like humans, machines might teach humans new ways of thinking.
Will Douglas Heaven • AI is learning how to create itself

Great paper on why RL actually works for LLM reasoning.
Apparently, "aha moments" during training aren't random. They're markers of something deeper.
Researchers analyzed RL training dynamics across eight models, including Qwen, LLaMA, and vision-language models. The findings challenge how...