
🚀 Ever wondered how to make RL work on impossibly hard tasks where pass@k = 0%? 🤔
In our new work, we share the RL Grokking Recipe: a training recipe that enables LLMs to solve previously unsolvable coding problems! I will be at #CoLM2025 next week so happy to chat about it!
We also dive...
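We can classify the available RL algorithms by whether the agent knows the environment model. Knowing the model means the agent knows the state-transition probability matrix and the future rewards in advance.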
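To make the distinction concrete, here is a minimal sketch of what an agent can do when it does know the model: plan with value iteration, without any interaction with the environment. The 2-state, 2-action MDP, the names `P`, `R`, `GAMMA`, and the helper functions are all hypothetical, invented purely for illustration. The transitions are deterministic (probability 1.0) to keep the arithmetic easy to follow.

```python
# Hypothetical 2-state, 2-action MDP, made up for illustration only.
# P[s][a] is a list of (next_state, probability) pairs: the known transition model.
# R[s][a] is the known immediate reward for taking action a in state s.
P = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 0.0, 1: 2.0},
}
GAMMA = 0.5  # discount factor (assumed value)


def q_value(V, s, a):
    """One-step lookahead using the known model: reward plus discounted expected value."""
    return R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])


def value_iteration(tol=1e-10, max_iters=10_000):
    """Repeat the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    for _ in range(max_iters):
        new_V = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            break
    return V


def greedy_policy(V):
    """Pick, in each state, the action with the highest one-step lookahead value."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}


V = value_iteration()
policy = greedy_policy(V)
print(V, policy)
```

A model-free agent would not have `P` and `R` available and would have to estimate values or a policy from sampled transitions instead.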
[Major Survey] Deep Reinforcement Learning for Robotic Manipulation - Zhihu
Sample efficiency reaches 5-6x that of DeepMind's earlier work. Real-robot bipedal learning is expected to be achievable within 10 hours.