Sublime
An inspiration engine for ideas
an idea for a website
anideafora.website
Ryo Lu
ryo.lu



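#Share Four approaches to training reasoning models, as seen through DeepSeek R1
1. Inference-time scaling
Spend more compute during inference to improve output quality. A classic example is chain-of-thought (CoT) prompting: including a phrase such as `Think step by step` in the prompt makes the model output more tokens, which increases the compute spent at inference time.
2. Pure RL
DeepSeek-R1-Zero was trained with reinforcement learning alone, with no initial SFT stage. It also did not use a reward model trained on human preferences. Instead, it used an accuracy reward (the LeetCode compiler verifies code results, and a deterministic system verifies math answers) and a format reward (the model's thinking process must be wrapped between `<think>` and `</think>` tags).
3. SFT + RL
DeepSeek used DeepSeek-R1-Zer...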
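The accuracy and format rewards described under "Pure RL" can be pictured with a small sketch. This is an illustration only, not DeepSeek's implementation: the `<think>`/`</think>` tag layout, the exact-match answer check, the equal weighting of the two rewards, and the function names `format_reward`, `accuracy_reward`, and `total_reward` are all assumptions made for the example.

```python
import re

# Assumed layout: reasoning inside <think>...</think>, followed by the final answer.
THINK_PATTERN = re.compile(r"^<think>(.+?)</think>\s*(.+)$", re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed tag layout, else 0.0."""
    return 1.0 if THINK_PATTERN.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the text after </think> exactly matches the reference answer.

    Exact string match is a stand-in for a deterministic checker; a real system
    would run a compiler or test suite for code and a math verifier for math.
    """
    match = THINK_PATTERN.match(completion.strip())
    if match is None:
        return 0.0
    answer = match.group(2).strip()
    return 1.0 if answer == reference.strip() else 0.0


def total_reward(completion: str, reference: str) -> float:
    """Sum the two rewards (equal weighting is an assumption)."""
    return accuracy_reward(completion, reference) + format_reward(completion)


if __name__ == "__main__":
    sample = "<think>2 + 2 is 4</think> 4"
    print(total_reward(sample, "4"))
```

Neither reward needs a learned reward model: both are deterministic checks on the completion text, which is the property the post highlights.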
become fascinated by the 'inner' experience of the body; the sensations, textures and temperatures that make up the 'inside' of the body - the part that can be felt but not seen
there's so much more happening in there than at first seems. the more curious and intrigued by it we become, the better
i live my life this way. almost every day is full of multiple tiny interactions with no aim beyond a hello, a how are you, a smile and then i continue on my way
it makes me happier. introduces incredible adventures into my life, makes me feel like i’m surfing my way through my day and seems to make other people happy in turn
the best part is that...