Emerging reasoning with reinforcement learning

news.ycombinator.com

This report is long but very good. “With R1, DeepSeek essentially cracked one of the holy grails of AI: getting models to reason step-by-step without relying on massive supervised datasets. Their DeepSeek-R1-Zero experiment showed something remarkable: using pure reinforcement learning with carefully crafted reward...

Chamath Palihapitiya x.com

This "Aha moment" in the DeepSeek-R1 paper is huge: Pure reinforcement learning (RL) enables an LLM to automatically learn to think and reflect. This challenges the prior belief that replicating OpenAI's o1 reasoning models requires extensive CoT data. It turns out you just need to give it...

Yuchen Jin

x.com

If AI starts to generate intelligence by itself, there’s no guarantee that it will be human-like. Rather than humans teaching machines to think like humans, machines might teach humans new ways of thinking.

Will Douglas Heaven • AI is learning how to create itself

Great paper on why RL actually works for LLM reasoning. Apparently, "aha moments" during training aren't random. They're markers of something deeper. Researchers analyzed RL training dynamics across eight models, including Qwen, LLaMA, and vision-language models. The findings challenge how...

elvis

x.com

Reinforcement Learning, Explained With a Minimum of Math and Jargon

Timothy B. Lee understandingai.org