Sublime
An inspiration engine for ideas




#Share Four approaches to training reasoning models, as seen through DeepSeek R1
1. Inference-time scaling
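Spend more compute at inference time to improve output quality. A classic example is chain-of-thought (CoT) prompting: including a phrase such as `Think step by step` in the prompt makes the model emit more tokens, which increases the compute spent on each answer.
2. Pure RL
DeepSeek-R1-Zero was trained with reinforcement learning alone, with no initial SFT stage. It also did not use a reward model trained on human preferences. Instead it used an accuracy reward (the LeetCode compiler verifies code results, and a deterministic system verifies math answers) and a format reward (which forces the model's reasoning to be wrapped between `<think>` and `</think>` tags).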
3. SFT + RL
DeepSeek used DeepSeek-R1-Zer...
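
The rule-based rewards described under item 2 can be illustrated with a minimal sketch. This is not DeepSeek's actual implementation. The function names, the exact-string comparison for math answers, and the equal weighting of the two rewards are all assumptions made for illustration. The code-execution check (the LeetCode compiler) is left out.

```python
import re
from typing import Optional

# Assumed output layout: reasoning inside <think>...</think>, then a final answer.
THINK_PATTERN = re.compile(r"^\s*<think>(.+?)</think>\s*(\S.*)$", re.DOTALL)


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after the closing </think> tag, or None if the layout is wrong."""
    match = THINK_PATTERN.match(completion)
    return match.group(2).strip() if match else None


def format_reward(completion: str) -> float:
    """1.0 when the reasoning is wrapped in <think> tags and an answer follows, else 0.0."""
    return 1.0 if THINK_PATTERN.match(completion) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 when the extracted answer equals the reference string exactly (an assumption;
    a real checker would normalize mathematical expressions), else 0.0."""
    answer = extract_answer(completion)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


def total_reward(completion: str, reference: str) -> float:
    """Sum of the two rule-based rewards; equal weighting is an assumption."""
    return accuracy_reward(completion, reference) + format_reward(completion)
```

For example, `total_reward("<think>2 + 2 = 4</think> 4", "4")` scores both rewards, while a bare `"4"` scores neither, because in this sketch the answer is only extracted from a correctly formatted completion.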
become fascinated by the 'inner' experience of the body; the sensations, textures and temperatures that make up the 'inside' of the body - the part that can be felt but not seen
there's so much more happening in there than at first seems. the more curious and intrigued by it we become, the better
Genesis
genesis.xyz
To spot your trauma, answer 3 questions:
- What memories am I pushing away?
- What people, places, or objects am I avoiding and why?
- In which situations do I struggle to control my emotions?
Be as detailed as possible - those answers reveal the traumas controlling your life.