Sublime
An inspiration engine for ideas

NVIDIA presents Upcycling Large Language Models into Mixture of Experts
Finds that upcycling outperforms continued dense-model training, based on large-scale experiments using Nemotron-4 15B trained on 1T tokens.
https://t.co/lKEtbMeQX8 https://t.co/L4LiEKrWDm
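For intuition, here is a minimal sketch of the upcycling recipe the paper studies (a common formulation, not NVIDIA's actual code): each expert is initialized as a copy of the trained dense FFN, and a freshly initialized router learns to distribute tokens among the experts during continued training.

```python
# Minimal sketch of dense-to-MoE upcycling. Assumptions: every expert
# starts as a copy of the dense MLP weights, and a new top-k router is
# trained from scratch. Not NVIDIA's implementation.
import copy
import torch
import torch.nn as nn

class DenseFFN(nn.Module):
    def __init__(self, d_model=1024, d_ff=4096):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.down(torch.relu(self.up(x)))

class UpcycledMoE(nn.Module):
    def __init__(self, dense_ffn, num_experts=8, top_k=2):
        super().__init__()
        # Each expert is a deep copy of the trained dense FFN.
        self.experts = nn.ModuleList(
            copy.deepcopy(dense_ffn) for _ in range(num_experts)
        )
        self.router = nn.Linear(dense_ffn.up.in_features, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        probs = self.router(x).softmax(dim=-1)
        weights, idx = torch.topk(probs, self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

ffn = DenseFFN()                      # stands in for a trained dense layer
moe = UpcycledMoE(ffn)
print(moe(torch.randn(4, 1024)).shape)  # torch.Size([4, 1024])
```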

DeepSeek-v2-Coder is really impressive.
This blog did great work checking 180+ LLMs on code-writing quality.
Only 3 models (Anthropic Claude 3 Opus, DeepSeek-v2-Coder, GPT-4o) produced 100% compilable Java code, while no model had 100% for...
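To illustrate the kind of check the blog runs, a compilability harness might look like this (an illustrative sketch, not the blog's actual code; GENERATED_DIR and the javac invocation are assumptions):

```python
# Sketch of a Java compilability check: feed each generated .java file
# to javac and count how many compile cleanly. Paths are hypothetical.
import subprocess
from pathlib import Path

GENERATED_DIR = Path("generated_java")   # assumed output directory
OUT_DIR = Path("/tmp/javac_out")
OUT_DIR.mkdir(parents=True, exist_ok=True)

def compiles(java_file: Path) -> bool:
    """Return True if javac accepts the file without errors."""
    result = subprocess.run(
        ["javac", "-d", str(OUT_DIR), str(java_file)],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0

files = sorted(GENERATED_DIR.glob("*.java"))
if files:
    ok = sum(compiles(f) for f in files)
    print(f"{ok}/{len(files)} files compiled ({ok / len(files):.0%})")
```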
Talks from the Open-Source Generative AI workshop.
Ying Sheng - Bridging Human and LLM Systems @ying11231
Ying talks about SGLang (https://t.co/kDkZSjAnCZ), which is my personal favorite LLM frontend language.
https://t.co/H674XM2K6a
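For flavor, a minimal SGLang program looks roughly like this (based on SGLang's public Python frontend; the local endpoint URL is an assumption):

```python
# Minimal SGLang sketch: a structured multi-turn prompt written as a
# Python function. Assumes an sglang server is running locally
# (e.g. python -m sglang.launch_server ...); the URL is an assumption.
import sglang as sgl

@sgl.function
def qa(s, question):
    s += sgl.system("You are a concise assistant.")
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=128))

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = qa.run(question="What does SGLang stand for?")
print(state["answer"])
```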
Sasha Rush x.com
Grok 3 with reasoning is *really good*.
It just one-shotted a difficult code task that involved modifying GRPO rewards (highly unlikely @xai trained on this as it's very recent).
o1 pro / o3 mini got this wrong.
Really impressive to see Grok generalizing!
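For context on the task: GRPO (Group Relative Policy Optimization) scores each sampled completion against the other samples for the same prompt, so "modifying GRPO rewards" means changing what feeds this group-wise normalization. A simplified sketch of the group-relative advantage, with hypothetical reward values:

```python
# Group-relative advantages as used in GRPO: each completion's reward is
# normalized against the other samples drawn for the same prompt.
import numpy as np

def grpo_advantages(rewards, eps=1e-6):
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Hypothetical rewards for 4 completions of one prompt,
# e.g. a correctness score plus a format bonus.
print(grpo_advantages([1.0, 0.0, 0.5, 1.5]))
```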
Matt Shumer x.com
1/ Trying to signal-boost with a short 🧵
Amateur Go player Kellin Pelrine can consistently beat "KataGo", an AI system that was once classified as "strongly superhuman".
Strikingly, the strategies employed to beat the AI do not foil other amateur players.
https://t.co/2AgK3gKVYP...