Sublime
An inspiration engine for ideas
Let's reverse engineer the phenomenal Tesla Optimus. No insider info, just my own analysis. Long read:
1. The smooth hand movements are almost certainly trained by imitation learning ("behavior cloning") from human operators. The alternative is reinforcement learning in simulation, but that typically leads to jittery motions …
Jim Fan • x.com
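The imitation-learning recipe in point 1 is, at its core, supervised regression on teleoperation logs. A minimal behavior-cloning sketch in PyTorch, assuming a hypothetical dataset of (observation, action) pairs recorded from human operators; the dimensions and network are placeholders, not anything specific to Optimus.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical teleoperation log: observation features (e.g. joint angles plus
# camera embeddings) paired with the human operator's commanded actions.
obs = torch.randn(10_000, 128)      # placeholder observation features
actions = torch.randn(10_000, 24)   # placeholder joint-space action targets
loader = DataLoader(TensorDataset(obs, actions), batch_size=256, shuffle=True)

# Behavior cloning = plain supervised regression from observation to action.
policy = nn.Sequential(
    nn.Linear(128, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 24),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(5):
    for o, a in loader:
        loss = nn.functional.mse_loss(policy(o), a)
        opt.zero_grad()
        loss.backward()
        opt.step()
```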
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Presents LONGNET, a Transformer variant that can scale sequence length to more than 1 billion tokens without sacrificing performance on shorter sequences.
abs: https://t.co/5rf4tcVDuk
repo: …
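The mechanism behind that scaling claim is dilated attention. A minimal single-branch, single-head sketch, assuming one (segment length, dilation) pair; LONGNET itself mixes several such branches with different settings and shards segments across devices.

```python
import torch

def dilated_attention(q, k, v, segment_len=2048, dilation=4):
    """One (segment_len, dilation) branch of LongNet-style dilated attention.

    Within each segment, only every `dilation`-th position takes part in
    attention, so the per-segment cost drops by roughly dilation^2 and the
    total cost grows linearly in sequence length.
    """
    B, N, D = q.shape
    assert N % segment_len == 0
    # Split the sequence into independent segments.
    q = q.view(B, N // segment_len, segment_len, D)
    k = k.view(B, N // segment_len, segment_len, D)
    v = v.view(B, N // segment_len, segment_len, D)
    # Sparsify: keep every `dilation`-th token inside each segment.
    qs, ks, vs = (t[:, :, ::dilation, :] for t in (q, k, v))
    # Dense attention on the sparsified tokens (causal masking omitted here).
    attn = torch.softmax(qs @ ks.transpose(-2, -1) / D ** 0.5, dim=-1)
    out_sparse = attn @ vs
    # Scatter results back to their positions; the untouched slots stay zero
    # and would be filled by other (segment_len, dilation) branches.
    out = torch.zeros_like(q)
    out[:, :, ::dilation, :] = out_sparse
    return out.view(B, N, D)

# Example: 16k tokens, 64-dim head.
q = k = v = torch.randn(1, 16_384, 64)
y = dilated_attention(q, k, v)
```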

Introducing 1.58bit DeepSeek-R1 GGUFs! 🐋
DeepSeek-R1 can now run in 1.58-bit while remaining fully functional. We shrank the 671B-parameter model from 720GB to just 131GB, an 80% size reduction.
Naively quantizing all layers breaks the model entirely, causing endless loops & gibberish outputs …
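For context, 1.58 bits per weight is log2(3), i.e. ternary weights in {-1, 0, +1}, and the fix hinted at above is selective rather than uniform quantization. A rough sketch of that idea; the layer-name patterns and bit widths below are illustrative assumptions, not the actual Unsloth recipe.

```python
import math

# 1.58 bits per weight comes from ternary values {-1, 0, +1}: log2(3) ≈ 1.58.
TERNARY_BITS = math.log2(3)

def bits_for_layer(name: str) -> float:
    """Hypothetical per-layer bit assignment for a selective quantization scheme.

    The point from the post: quantizing *everything* to ~1.58 bits breaks the
    model, so sensitive layers are kept at higher precision while the bulky
    MoE expert weights take the ternary hit. Patterns and widths here are
    illustrative assumptions only.
    """
    if "embed" in name or "lm_head" in name:
        return 8.0
    if "attn" in name or "shared_expert" in name:
        return 6.0
    return TERNARY_BITS  # routed MoE experts: the vast majority of parameters

# Back-of-envelope size check: if nearly all of the 671B parameters sit in the
# cheap bucket, the weights alone land near the reported file size.
params = 671e9
print(f"~{params * TERNARY_BITS / 8 / 1e9:.0f} GB at 1.58 bits")
# ≈ 133 GB, in the same ballpark as the reported 131GB.
```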

Subobject-level Image Tokenization
Transformer-based vision models typically tokenize images into fixed-size square patches as input units, which lacks adaptability to image content and overlooks the inherent pixel grouping structure. Inspired by the subword tokenization widely adopted in language models, we propose …
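For contrast with the subobject idea, this is the fixed-size square-patch tokenization the abstract is critiquing, in a minimal sketch with an illustrative 16-pixel patch size.

```python
import torch

def patchify(images: torch.Tensor, patch: int = 16) -> torch.Tensor:
    """Standard ViT-style tokenization: cut the image into fixed-size square
    patches and flatten each one into a token, regardless of image content."""
    B, C, H, W = images.shape
    assert H % patch == 0 and W % patch == 0
    x = images.unfold(2, patch, patch).unfold(3, patch, patch)   # B, C, H/p, W/p, p, p
    x = x.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch * patch)
    return x  # B, num_patches, patch_dim

tokens = patchify(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 196, 768])
```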

FlexiViT: One Model for All Patch Sizes
Shows that randomizing the patch size during training allows the model to perform well across a range of patch sizes, making it possible to tailor the model to different compute budgets at deployment.
https://t.co/FCsYIi3fgN …
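The part that makes one model serve many patch sizes is resizing the patch-embedding kernel to whatever patch size is sampled each training step. A rough sketch of that idea, using plain bilinear resizing where FlexiViT uses a pseudo-inverse (PI) resize and also adapts the position embeddings; the dimensions are illustrative.

```python
import random
import torch
import torch.nn.functional as F

base_weight = torch.randn(768, 3, 32, 32)   # patch-embedding kernel at base size 32
patch_sizes = [8, 16, 32, 48]

def embed_patches(images: torch.Tensor, patch: int) -> torch.Tensor:
    """Resize the patch-embedding kernel to the sampled patch size, then
    patchify by running it as a strided convolution. FlexiViT proper uses a
    pseudo-inverse (PI) resize; plain bilinear keeps this sketch short."""
    w = F.interpolate(base_weight, size=(patch, patch), mode="bilinear",
                      align_corners=False)
    tokens = F.conv2d(images, w, stride=patch)            # B, 768, H/p, W/p
    return tokens.flatten(2).transpose(1, 2)               # B, num_patches, 768

# Training-loop idea: sample a new patch size every step so one model can be
# run at different compute budgets at deployment time.
images = torch.randn(4, 3, 240, 240)
for step in range(3):
    p = random.choice(patch_sizes)
    tokens = embed_patches(images, p)
    print(p, tokens.shape)
```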