Sublime

An inspiration engine for ideas


Welcome to the Era of Experience

The text discusses a new AI era focused on superhuman agents learning primarily from their autonomous, continuous interactions with the real world, surpassing prior human-data-based models by leveraging experiential learning and grounded rewards.

storage.googleapis.com

Rich Sutton just published his most important essay on AI since The Bitter Lesson: "Welcome to the Era of Experience". Sutton and his advisee Silver argue that the “era of human data,” dominated by supervised pre‑training and RL‑from‑human‑feedback, has hit diminishing returns; the future will belong to agents...

Deedy

x.com

fresh-eggs-flying-lessons

Tips and insights for brands and marketing: Avoid common mistakes; think differently; understand customer needs deeply; don't rely solely on data; strive for loyalty and customer satisfaction.

Link

How @karpathy learnt Reinforcement Learning https://t.co/EY5inLnv2l

x.com

the ‘multi-armed bandit’, and it's the textbook example of the Explore/Exploit trade-off. The optimal solution is to start out by exploring as much as possible, and gradually move towards exploiting as your time runs out.

Richard Meadows • Optionality: How to Survive and Thrive in a Volatile World
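
The explore-then-exploit schedule described in the highlight above can be sketched with an epsilon-greedy bandit whose exploration rate shrinks as time runs out. This is a minimal illustration, not the book's method and not a provably optimal policy: the function name `run_bandit`, the Bernoulli (win/lose) arms, and the linear decay of `epsilon` from 1 to 0 are all assumptions made for the example.

```python
import random


def run_bandit(true_means, steps, seed=0):
    """Epsilon-greedy on Bernoulli arms; epsilon decays linearly from 1 to 0.

    true_means: hypothetical win probability of each arm (unknown to the agent).
    Returns (pull counts per arm, estimated mean per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total_reward = 0

    for t in range(steps):
        # Assumed schedule: explore almost always early, almost never late.
        epsilon = 1.0 - t / steps
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            # exploit: pick the arm with the best estimate so far
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


if __name__ == "__main__":
    counts, estimates, total = run_bandit([0.2, 0.5, 0.8], steps=1000)
    print(counts, [round(e, 2) for e in estimates], total)
```

Early pulls are spread across all arms to learn their payoffs; later pulls concentrate on whichever arm looks best, which is the trade-off the highlight describes.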

The Bitter Lesson of AI Research. Richard Sutton is one of the fathers of Reinforcement Learning and received the Turing Award in 2025 for his contribution to the field. In 2019 he wrote a hugely important essay called The Bitter Lesson. This was before ChatGPT and before StableDiffusion. What is the bitter lesson? >>

Nir Ben-Zvi

x.com

Agent Design Pattern: Parallel Rollouts. Inspired by Tree-of-Thought [1] and @corbtt's Universal Reward Function [2], lately I've been using a best-of-n pattern (dubbed “parallel rollouts” internally) and seeing consistently strong results. When designing an agent, retry-on-failure is...

Jamie Voynow

x.com
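
The post above is truncated, so the author's actual implementation is not shown. As a rough sketch of what a best-of-n pattern generally looks like, the example below runs several independent attempts in parallel and keeps the one a scoring function rates highest. The names `best_of_n`, `rollout`, and `score` are hypothetical, and the toy rollout stands in for a real agent call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


def best_of_n(rollout: Callable[[int], T], score: Callable[[T], float], n: int) -> T:
    """Run n independent rollouts in parallel and return the highest-scoring one.

    rollout: hypothetical function that produces one candidate given an attempt index.
    score:   hypothetical reward function; higher is better.
    n:       number of parallel attempts (must be at least 1).
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(rollout, range(n)))
    return max(candidates, key=score)


if __name__ == "__main__":
    # Toy stand-ins: each "rollout" returns a string, and longer strings score higher.
    drafts = ["ok", "a longer answer", "mid answer"]
    best = best_of_n(lambda i: drafts[i], score=len, n=len(drafts))
    print(best)
```

Unlike retry-on-failure, every attempt runs up front, so the cost is n rollouts per task in exchange for choosing among several candidates rather than accepting the first one that passes.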

Okay okay, spent my weekend gooning around learning GRPO math. Here's some takes. Essentially, this is me yapping through a recap of smaller details on how GRPO is implemented, what Dr. GRPO changes, why, DAPO, connections to PPO, aggregating batches... Reading list below....

Nathan Lambert

x.com


This is fascinating: Rich Sutton on the "bitter lesson" of AI research: https://t.co/LW5TOGTIKw https://t.co/z4nsRbH01V

Michael Nielsen

x.com