- What a modern LLM does during training is, essentially, very, very quickly skim the textbook, the words just flying by, not spending much brain power on it.
- Rather, when you or I read that math textbook, we read a couple pages slowly; then have an internal monologue about the material in our heads and talk about it with a few study-buddies; read an
SITUATIONAL AWARENESS - The Decade Ahead • I. From GPT-4 to AGI: Counting the OOMs
Perhaps R1’s biggest breakthrough is the confirmation that you no longer need enormous data centers or thousands of labelers to push the limits of LLMs. If you can define what “correctness” means in your domain, whether it’s coding, finance, medical diagnostics, or creative writing, you can apply reasoning-oriented RL to train or fine-tune your own ...
Evan Armstrong • What Actually Matters (And What Doesn’t) for DeepSeek
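As a rough illustration of what "defining correctness" could look like in practice, here is a minimal sketch of a rule-based reward function of the kind a reasoning-oriented RL loop might consume. Everything in it is an assumption made for illustration: the `<answer>...</answer>` output convention, the exact-match rule, and the helper names `extract_answer`, `correctness_reward`, and `score_samples` are hypothetical and are not taken from the quoted article or from any particular library.

```python
import re
from typing import List, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Return the text inside the last <answer>...</answer> tag, or None.

    The tag format is an assumed output convention, not a standard.
    """
    matches = re.findall(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    return matches[-1].strip() if matches else None


def correctness_reward(completion: str, reference: str) -> float:
    """Score 1.0 when the extracted answer equals the reference, else 0.0.

    Exact string match is the simplest possible rule; a real domain would
    swap in its own check (unit tests for code, a numeric tolerance, etc.).
    """
    answer = extract_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer == reference.strip() else 0.0


def score_samples(completions: List[str], reference: str) -> List[float]:
    """Score several sampled completions for the same prompt."""
    return [correctness_reward(c, reference) for c in completions]


if __name__ == "__main__":
    samples = [
        "6 * 7 is 42. <answer>42</answer>",
        "I think it is 41. <answer>41</answer>",
        "The answer is 42",  # no answer tag, so it scores 0.0
    ]
    print(score_samples(samples, "42"))
```

The point of the sketch is only that the reward comes from a programmatic check rather than from human labels; how those rewards then drive a policy update is outside its scope.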
core components of Deep RL that enabled success like AlphaGo: self-play and look-ahead planning.
Self-play is the idea that an agent can improve its gameplay by playing against slightly different versions of itself because it’ll progressively encounter more challenging situations. In the space of LLMs, it is almost certain that the largest portion o...
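Sector characteristics?
1) A large market (TAM) driven by demand for labor replacement and augmentation, with long-term consumer-facing (2C) scenarios forming a trillion-scale industry chain; 2) high barriers to entry driven by AI technology (especially for algorithm companies focused on embodied models); 3) economies of scale that lead to concentration among the leading players (analogous to the competitive landscape in new energy vehicles); 4) the development of intelligence is still at a relatively early stage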