What Actually Matters (And What Doesn’t) for DeepSeek
DeepSeek Coder comprises a series of code language models trained from scratch on 87% code and 13% natural language (in English and Chinese), with each model pre-trained on 2T tokens. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on a repo-level code corpus with a window size of 16K …
DeepSeek Coder
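
As a quick orientation, here is a minimal sketch of running one of these checkpoints through Hugging Face transformers; the checkpoint id, the trust_remote_code flag, and the generation settings are illustrative assumptions, not the project's official snippet.

```python
# Minimal sketch: code completion with a DeepSeek Coder base model.
# The checkpoint id and settings below are assumptions for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-coder-1.3b-base"  # assumed HF checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Repo-level pre-training with a 16K window means long, multi-file
# prompts are in-distribution for completion and infilling.
prompt = "# write a quicksort in python\ndef quicksort(arr):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```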

Over the past two and a half years we've seen the rise of LLMs, but Yann LeCun, one of the field's great contributors, believes that LLMs are actually old news, that we're now just making them marginally better, and he's much more focused on other things.
Firstly, he thinks these models need to understand the physical world. Right now, LLMs are …


Looking at the just-released o1, Jim Fan's predictions about Q* from last year turn out to have been basically accurate!
Jim drew an analogy between Q* and AlphaGo, guessing that Q* may work like AlphaGo, improving steadily through self-play against its own earlier versions, and that even the architecture may be similar.
AlphaGo's architecture has four core components (sketched in code after this list):
1. Policy neural network (Policy NN, the learning part): picks the next move most likely to win
2. Value neural network (Value NN, the learning part): evaluates the current board position
3. Monte Carlo tree search (MCTS, the search part): simulates many possible continuations from the current position, much like a human reading out a game (if I play at A, where might my opponent respond, and how would I answer each reply …)
4. Win/loss adjudication: determines who has won according to the rules of Go.
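
To make the interplay of these components concrete, here is a minimal Python sketch of policy/value-guided MCTS. The toy game, the network stubs, and the hyperparameters are assumptions for illustration, not AlphaGo's actual implementation (terminal-state handling and rollouts are omitted).

```python
# Minimal sketch of AlphaGo-style MCTS guided by policy/value networks.
# Everything here (game, network stubs, constants) is a toy assumption.
import math
import random

class Node:
    def __init__(self, prior):
        self.prior = prior        # P(s, a): prior from the policy NN
        self.visit_count = 0      # N(s, a)
        self.value_sum = 0.0      # W(s, a)
        self.children = {}        # action -> Node

    def q(self):
        # Mean action value Q(s, a) = W / N
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def policy_net(state):
    # Stub for the policy NN: uniform prior over 3 toy moves.
    return {a: 1.0 / 3 for a in range(3)}

def value_net(state):
    # Stub for the value NN: random evaluation in [-1, 1].
    return random.uniform(-1.0, 1.0)

def apply_move(state, action):
    # Toy game dynamics: just record the move sequence.
    return state + (action,)

def select_child(node, c_puct=1.0):
    # PUCT rule: trade off exploitation (Q) against exploration
    # (prior, scaled down as a child accumulates visits).
    total = math.sqrt(sum(c.visit_count for c in node.children.values()) + 1)
    def score(item):
        _, child = item
        return child.q() + c_puct * child.prior * total / (1 + child.visit_count)
    return max(node.children.items(), key=score)

def mcts(root_state, num_simulations=100):
    root = Node(prior=1.0)
    for _ in range(num_simulations):
        node, state, path = root, root_state, [root]
        # 1. Selection: walk down the tree with PUCT until a leaf.
        while node.children:
            action, node = select_child(node)
            state = apply_move(state, action)
            path.append(node)
        # 2. Expansion: the policy NN proposes priors for new children.
        for action, prior in policy_net(state).items():
            node.children[action] = Node(prior)
        # 3. Evaluation: the value NN scores the leaf, replacing a
        #    full rollout to the end of the game.
        value = value_net(state)
        # 4. Backup: propagate the evaluation up the visited path,
        #    flipping sign each ply to alternate player perspectives.
        for n in reversed(path):
            n.visit_count += 1
            n.value_sum += value
            value = -value
    # Play the most-visited root move, as AlphaGo does.
    return max(root.children.items(), key=lambda kv: kv[1].visit_count)[0]

print(mcts(root_state=()))  # picks one of the 3 toy moves
```

Self-play training would then pit this search-backed player against earlier copies of itself, using the game outcomes to improve the policy and value networks.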
What makes this architecture remarkable is that the entire training process …