Sublime
An inspiration engine for ideas
DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M).
For reference, this level of capability is supposed to require clusters closer to 16K GPUs; the ones being brought up today are more around 100K GPUs...
Andrej Karpathy · x.com
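As a back-of-envelope check on the figures quoted above (2048 GPUs, roughly two months, about $6M), the implied price works out to a few dollars per GPU-hour. The snippet below is just that arithmetic; the 60-day run length and the assumption of full utilization are mine, not figures from the post.

```python
# Back-of-envelope: implied cost per GPU-hour from the quoted budget.
# Assumptions (not from the post): a ~60-day run, all 2048 GPUs busy the whole time.
num_gpus = 2048
run_days = 60            # "2 months", taken as 60 days
budget_usd = 6_000_000   # "$6M"

gpu_hours = num_gpus * run_days * 24
usd_per_gpu_hour = budget_usd / gpu_hours

print(f"{gpu_hours / 1e6:.2f}M GPU-hours -> ~${usd_per_gpu_hour:.2f} per GPU-hour")
# ~2.95M GPU-hours -> ~$2 per GPU-hour
```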

I wrote a more efficient/robust OpenAI querying wrapper:
1. Parallel execution with adjustable rate limits
2. Automatic retries on failure
3. Interface to Huggingface/Cohere for comparison
This finished 33k completions in ≈1...
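The post is cut off before any implementation details, but features 1 and 2 (parallel execution under a rate limit, automatic retries) map onto a small async pattern. Below is a minimal sketch, not the author's wrapper: it assumes the official openai Python SDK (AsyncOpenAI), uses a semaphore as a crude concurrency cap rather than a true tokens-per-minute limiter, and hand-rolls exponential backoff; the Hugging Face/Cohere interface from point 3 is omitted, and the model name is just an example.

```python
import asyncio
from openai import AsyncOpenAI  # assumes the official openai Python SDK

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def complete(prompt: str, sem: asyncio.Semaphore,
                   model: str = "gpt-4o-mini", max_retries: int = 5) -> str:
    """One completion with bounded concurrency and exponential-backoff retries."""
    async with sem:  # caps in-flight requests (crude rate limiting)
        for attempt in range(max_retries):
            try:
                resp = await client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": prompt}],
                )
                return resp.choices[0].message.content
            except Exception:
                if attempt == max_retries - 1:
                    raise
                await asyncio.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...

async def complete_all(prompts: list[str], max_concurrency: int = 16) -> list[str]:
    """Run many prompts in parallel, preserving input order."""
    sem = asyncio.Semaphore(max_concurrency)
    return await asyncio.gather(*(complete(p, sem) for p in prompts))

# Example:
# results = asyncio.run(complete_all(["Say hi"] * 100, max_concurrency=32))
```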
Possibly the fastest new model to launch on OpenRouter - introducing GLM-4.5 from a new model lab, @Zai_org!
Family of powerful, balanced models punching very high for their weight.
Reasoning can be toggled on and off via API. See 👇 for more
OpenRouter · x.com
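The card above notes that GLM-4.5's reasoning can be toggled via the API. Here is a minimal sketch of what that could look like against OpenRouter's OpenAI-compatible chat completions endpoint; the model slug `z-ai/glm-4.5` and the shape of the `reasoning` parameter are assumptions based on OpenRouter's documented reasoning controls, so check the current docs before relying on them.

```python
import os
import requests  # plain HTTP; the endpoint is OpenAI-compatible

URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}

def ask(prompt: str, reasoning_on: bool) -> str:
    payload = {
        # Model slug and the "reasoning" field are assumptions; verify against
        # OpenRouter's current documentation.
        "model": "z-ai/glm-4.5",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning": {"enabled": reasoning_on},
    }
    resp = requests.post(URL, headers=HEADERS, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# print(ask("What is 17 * 24?", reasoning_on=True))   # slower, with thinking
# print(ask("What is 17 * 24?", reasoning_on=False))  # faster, direct answer
```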
As someone who has dabbled with this a lot, some lessons:
1. If you can fit your model and optimizer on 1 GPU, then use DDP. Use gradient accumulation to increase the effective batch size as needed.
2. If 1 doesn't work, then try using an 8-bit optimizer via bitsandbytes. (Praise be @Tim_Dettmers.) On 80GB GPUs, 7B params...
Raj Dabre · x.com
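Lessons 1 and 2 above are easy to mis-wire, so here is a minimal sketch of the combination they describe: one DDP process per GPU, gradient accumulation to stretch the effective batch size, and an optional swap to bitsandbytes' 8-bit AdamW. The model, batch, and hyperparameters are placeholders, not anything from the thread.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # One process per GPU, launched with: torchrun --nproc_per_node=8 train.py
    dist.init_process_group("nccl")
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(4096, 4096).cuda(rank)  # placeholder model
    model = DDP(model, device_ids=[rank])

    # Lesson 2: if optimizer state doesn't fit, swap in an 8-bit optimizer.
    try:
        import bitsandbytes as bnb
        opt = bnb.optim.AdamW8bit(model.parameters(), lr=1e-4)
    except ImportError:
        opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    accum_steps = 8  # lesson 1: grad accumulation raises the effective batch size
    for step in range(1000):
        x = torch.randn(16, 4096, device=rank)       # placeholder batch
        loss = model(x).pow(2).mean() / accum_steps  # scale so grads average correctly
        loss.backward()
        if (step + 1) % accum_steps == 0:
            opt.step()
            opt.zero_grad(set_to_none=True)

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

For efficiency you would normally skip the gradient all-reduce on non-boundary steps with `model.no_sync()`; it is omitted here to keep the sketch short.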
Early this year, we trained a 70B model optimized for reasoning and coding. This model roughly matches LLAMA 3 70B despite being trained on 7x less data.
Today, we’re releasing a toolkit to help others do the same, including:
• 11 sanitized and extended NLP reasoning benchmarks including...
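The card is truncated before the full toolkit list, and the toolkit's own API is not shown here, so the sketch below is only a generic pattern for the kind of evaluation such benchmarks imply: scoring multiple-choice reasoning questions with a causal LM by comparing option log-likelihoods, assuming Hugging Face transformers. It is not the released toolkit, and the model name is a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Generic multiple-choice scoring: pick the option the model finds most likely
# after the question. A common pattern for NLP reasoning benchmarks, not the
# API of the toolkit announced above.
MODEL = "gpt2"  # placeholder; substitute any causal LM
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

@torch.no_grad()
def option_loglik(question: str, option: str) -> float:
    """Sum of log-probs the model assigns to the option tokens after the question."""
    q_len = tok(question, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(question + " " + option, return_tensors="pt").input_ids
    logits = model(full_ids).logits[0, :-1]   # position t predicts token t+1
    targets = full_ids[0, 1:]
    logp = torch.log_softmax(logits, dim=-1)
    token_logp = logp[torch.arange(targets.numel()), targets]
    return token_logp[q_len - 1:].sum().item()  # approximate option-token boundary

def predict(question: str, options: list[str]) -> str:
    return max(options, key=lambda o: option_loglik(question, o))

# print(predict("Q: 2 + 2 = ?  A:", ["3", "4", "5"]))
```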