
🚀 Ever wondered how to make RL work on impossibly hard tasks where pass@k = 0%? 🤔
In our new work, we share the RL Grokking Recipe: a training recipe that enables LLMs to solve previously unsolvable coding problems! I will be at #CoLM2025 next week so happy to chat about it!
We also dive...
Scaling up RL is all the rage right now; I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly increase (/decrease) the probability of every action I took for the...
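Andrej Karpathy, x.com
We can categorize the available RL algorithms by whether the agent knows a model of the environment. Knowing the model lets the agent know the state transition probability matrix and the future rewards in advance.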
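
To make the "agent knows the model" case concrete, here is a minimal sketch of planning with a known model, using value iteration (a standard dynamic-programming method, swapped in here purely for illustration). The two-state MDP, its transition probabilities, its rewards, the discount `GAMMA`, and the helper names `q_value` and `value_iteration` are all made-up assumptions, not something from the posts above.

```python
# Hypothetical toy MDP: every state, action, probability, and reward here is
# invented for illustration. Because the agent "knows the model", it can read
# P directly instead of sampling transitions from an environment.
# P[state][action] -> list of (probability, next_state, reward)
P = {
    0: {0: [(1.0, 0, 0.0)],
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)],
        1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_value(values, state, action):
    """Expected return of taking `action` in `state`, computed from the known model."""
    return sum(p * (r + GAMMA * values[s2]) for p, s2, r in P[state][action])


def value_iteration(tol=1e-8, max_iters=10_000):
    """Repeat Bellman optimality backups until the values stop changing."""
    values = {s: 0.0 for s in P}
    for _ in range(max_iters):
        new_values = {s: max(q_value(values, s, a) for a in P[s]) for s in P}
        delta = max(abs(new_values[s] - values[s]) for s in P)
        values = new_values
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(values, s, a)) for s in P}
    return values, policy


values, policy = value_iteration()
```

A model-free method would have to estimate the same quantities from sampled experience, because it cannot read `P` directly; that is the split the note above describes.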