HOW IS THIS ALPHA EVEN PUBLIC? 10x SEARCH DEPTH VIA GRPO
The intuition has always been that scaling agentic search is a compute problem. It’s not. It’s a "stability-of-objective" problem. Most 8B models suffer from "horizon collapse" - they are mathematically "anxious" to terminate the search loop because their training...
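The post is truncated, but the GRPO it refers to (Group Relative Policy Optimization) has a simple core: instead of a learned value baseline, each rollout's reward is normalized against the other rollouts sampled for the same prompt. A minimal sketch of that group-relative advantage step, with hypothetical reward values for illustration:

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages as used in GRPO: normalize each
    rollout's reward by the mean and std of its own sampling group,
    so no separate value network is needed."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Hypothetical group of 4 rollouts for one prompt; only the first
# one succeeded (reward 1), the rest failed (reward 0).
adv = grpo_advantages([1.0, 0.0, 0.0, 0.0])
```

The successful rollout gets a positive advantage and the failures get negative ones, and the group always sums to (approximately) zero, which is what keeps the update centered without a critic.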
Great paper on why RL actually works for LLM reasoning.
Apparently, "aha moments" during training aren't random. They're markers of something deeper.
Researchers analyzed RL training dynamics across eight models, including Qwen, LLaMA, and vision-language models. The findings challenge how...
What is Q*? My first approach (1/2)
Foreword: Q* has not yet been published or made publicly available; there are no papers on it, and OpenAI is holding back information about it (Sam Altman: "We are not ready to talk about that," https://t.co/ah00i87Kbu, at 2:45). Since the first hints, the community has been...