[Major Survey] Deep Reinforcement Learning for Robotic Manipulation - Zhihu
The real world has a large number of potential states; these states can connect to each other in complex or even uncertain ways; and they can do so through many possible actions. Much effort was put into finding the value function in these trickier situations.
Grace Lindsay • Models of the Mind
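To make the idea of "finding the value function" concrete when states connect through uncertain transitions, here is a minimal sketch of value iteration on a tiny, made-up problem. The three states, the action names, the probabilities, the rewards, and the discount factor are all illustrative assumptions, not anything taken from the book.

```python
# Hypothetical 3-state problem; every name and number here is an assumption.
# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    "start": {
        "safe": [(1.0, "middle", 0.0)],
        "risky": [(0.5, "goal", 1.0), (0.5, "start", 0.0)],
    },
    "middle": {
        "go": [(1.0, "goal", 1.0)],
    },
    "goal": {},  # terminal state: no actions available
}


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Repeatedly apply the Bellman optimality update until values settle."""
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            if not actions:
                continue  # terminal states keep a value of 0
            # Expected return of each action, then keep the best one.
            best = max(
                sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


values = value_iteration(transitions)
print(values)
```

Even in this toy case the uncertain "risky" action ends up slightly better than the "safe" one from the start state, which is only visible once the values have been propagated through the possible transitions.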
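As a non-specialist, I have gone through a lot of material and videos over the past few days and now have a rough understanding of the latest OpenAI-o1 model and of where artificial intelligence is heading. I am fairly confident this will be the most mainstream technology direction of the next five years, so here are a few video excerpts to share: https://t.co/leCBXt8Vbr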
You can do it by learning how much reward certain states or actions can bring (“value” learning), or by simply knowing which strategies tend on the whole to do better than which others (“policy” learning).
Brian Christian • The Alignment Problem
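The contrast between "value" learning and "policy" learning can be sketched on a two-armed bandit. This is a rough illustration under stated assumptions: the fixed payoffs in REWARDS, the learning rates, the epsilon-greedy exploration, and the REINFORCE-style update without a baseline are all choices made for the example, not methods described in the book.

```python
import math
import random

REWARDS = [0.2, 1.0]  # hypothetical fixed payoffs for arm 0 and arm 1


def value_learning(steps=2000, lr=0.1, epsilon=0.1, seed=0):
    """Learn how much reward each arm brings, then act greedily on that."""
    rng = random.Random(seed)
    q = [0.0, 0.0]  # estimated reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)  # occasional random exploration
        else:
            arm = max(range(2), key=lambda a: q[a])
        # Move the estimate for the pulled arm toward the observed reward.
        q[arm] += lr * (REWARDS[arm] - q[arm])
    return q


def softmax(prefs):
    """Turn raw preferences into a probability distribution."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def policy_learning(steps=2000, lr=0.1, seed=0):
    """Adjust action preferences directly, without estimating values."""
    rng = random.Random(seed)
    prefs = [0.0, 0.0]
    for _ in range(steps):
        probs = softmax(prefs)
        arm = 0 if rng.random() < probs[0] else 1
        reward = REWARDS[arm]
        # Gradient of log pi(arm) w.r.t. each preference, scaled by reward.
        for a in range(2):
            grad = (1.0 if a == arm else 0.0) - probs[a]
            prefs[a] += lr * reward * grad
    return softmax(prefs)


q_values = value_learning()
arm_probs = policy_learning()
print("estimated values:", q_values)
print("policy probabilities:", arm_probs)
```

The value learner ends up with an estimate for each arm and picks the larger one; the policy learner never stores such estimates and only shifts probability toward the arm that tends to pay better.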
core components of Deep RL that enabled success like AlphaGo: self-play and look-ahead planning.
Self-play is the idea that an agent can improve its gameplay by playing against slightly different versions of itself because it’ll progressively encounter more challenging situations. In the space of LLMs, it is almost certain that the largest portion o...