Sublime

An inspiration engine for ideas

Introducing OpenBench 0.1: Open, Reproducible Evals 🧵 https://t.co/S5LlHEzDxv

Introducing LisanBench. LisanBench is a simple, scalable, and precise benchmark designed to evaluate large language models on knowledge, forward planning, constraint adherence, memory and attention, and long-context reasoning and "stamina". "I see possible futures, all at once. Our enemies...

Lisan al Gaib x.com

Here's my hyperdev-1 setup, an elite American software developer agent (proudly running in Texas!):
- rented an H100
- installed Claude Code
- used it to install Qwen3-Coder-480B
- and created an agent in Claude that uses this local 480B model, connected to GitHub, Vercel, few...

Varun x.com

LLMs are far worse at competitive programming than we thought. Every model scored 0% on Hard problems. LiveCodeBench-Pro is a new benchmark with 584 continuously updated problems from IOI, ICPC and Codeforces. What's most interesting is the categories they perform really poorly on:...

Deedy x.com

My current tech stack:
⚡ Next.js – frontend
🧠 Convex – backend & DB
🔐 Clerk – auth
💸 DodoPayments – payments
🤖 OpenRouter – AI API
📡 OpenPipe – AI logs
📏 Zod – schema validation
🎨 ShadCN – components
🚀 Vercel –...

Shak x.com

I'm going to spend a few hours signing up for a few "Heroku for LLMs" services and comparing their getting-started experiences. Who should I test?
- @OLLAMA + @flydotio
- @replicate
- @basetenco
- @modal_labs
- Vanilla @awscloud
- @awscloud Bedrock – not quite...

Max Schoening x.com

DeepSeek R1 671B running on 2 M2 Ultras faster than reading speed. Getting close to open-source O1, at home, on consumer hardware. With mlx.distributed and mlx-lm, 3-bit quantization (~4 bpw) https://t.co/RnkYxwZG3c

Awni Hannun x.com
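Why two machines are needed follows from rough weight-memory arithmetic: storage in bytes is roughly parameters × bits per weight / 8. The sketch below is only an estimate under stated assumptions that are not from the post: it counts weights only (no KV cache, activations, or runtime overhead), uses 1 GB = 1e9 bytes, and assumes each M2 Ultra is configured with 192 GB of unified memory.

```python
# Rough weight-memory estimate for a quantized model.
# Assumptions (not from the original post): weights only, ignoring
# KV cache, activations, and runtime overhead; 1 GB = 1e9 bytes.

def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

params = 671e9   # DeepSeek R1 parameter count, from the post
bpw = 4.0        # "~4 bpw" effective, from the post
needed = weight_memory_gb(params, bpw)

# Hypothetical configuration: 192 GB unified memory per M2 Ultra.
per_machine_gb = 192
machines = 2

print(f"weights: ~{needed:.1f} GB")
print(f"fits on one machine: {needed <= per_machine_gb}")
print(f"fits on {machines} machines: {needed <= per_machine_gb * machines}")
```

Under these assumptions the weights come to about 335.5 GB: more than one machine holds, but within the combined memory of two.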

A 3-person startup built a cloud infrastructure price comparison service: one that compares 200,000+ different server prices on AWS, GCP, Azure and Hetzner. It also benchmarks them. I got interested in how the team built this & the tech stack, and they shared everything: https://t.co/dbGIjJEEcJ

Gergely Orosz x.com

Essential tooling for a modern monorepo setup:
- Bun (dependency manager & runtime)
- Turborepo (task runner & caching)
- Biome (linter, formatter, type-checker)

Pontus Abrahamsson — oss/acc x.com