Sim2real is indeed no longer a big problem, but zero-shot/few-shot sim2real will always be a big problem, because zero/few-shot learning is itself a big problem.
What are the current mainstream approaches to applying deep reinforcement learning in robotics? - Zhihu

Interesting to see ByteDance working on solving the 0-gradient problem. Their idea is to address it through an adaptive compute budget; we approach it from a reward perspective. GRPO training typically uses small, carefully curated datasets; the data needs to be really hard to provide rich learning signals and enable discovery. Training on easier...
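
For readers unfamiliar with the term, here is a minimal sketch of one common reading of the 0-gradient problem in GRPO: advantages are computed relative to the other samples in the same group, so a group whose rewards are all identical yields all-zero advantages and contributes no policy gradient. The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions made for illustration, not details taken from the post or from any particular implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group (hypothetical helper).

    Assumption: advantage_i = (r_i - mean(r)) / (std(r) + eps), with the
    population standard deviation. Real implementations may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group with mixed outcomes gives non-zero advantages (a learning signal).
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# Groups where every sample gets the same reward give all-zero advantages,
# whether every sample failed or every sample passed.
all_fail = group_relative_advantages([0.0, 0.0, 0.0, 0.0])
all_pass = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

Under these assumptions, only prompts where the sampled responses disagree in reward move the policy, which is why the difficulty of the data relative to the model matters.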