
INF-LLaVA
Dual-perspective Perception for High-Resolution Multimodal Large Language Model
With advancements in data availability and computing resources, Multimodal Large Language Models (MLLMs) have showcased capabilities across various fields. However, the quadratic complexity of the vision…
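As a back-of-the-envelope illustration of why quadratic complexity becomes the bottleneck at high resolution: the number of image tokens grows with the square of the image side, and self-attention cost grows with the square of the token count. The sketch below is not from the paper; the 14-pixel patch size and the resolutions are assumed values typical of ViT-style encoders.

```python
# Illustrative only: patch size and resolutions are assumptions, not INF-LLaVA's settings.
patch = 14                                    # assumed patch size in pixels
for side in (336, 672, 1344):                 # assumed input resolutions
    tokens = (side // patch) ** 2             # number of image patches (tokens)
    attn_pairs = tokens ** 2                  # pairwise interactions in self-attention
    print(f"{side}x{side}px -> {tokens:>5} tokens, {attn_pairs:>12,} attention pairs")
```

Doubling the image side quadruples the token count and multiplies the attention cost by sixteen, which is why naive high-resolution inputs quickly become infeasible.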

Many recent frontier LLMs like Grok-3 and DeepSeek-R1 use a Mixture-of-Experts (MoE) architecture. To understand how it works, let’s pretrain an MoE-based LLM from scratch in PyTorch…
nanoMoE is a simple (~500 lines of code) but functional implementation of a mid-sized MoE model that can be pretrained on commodity hardware…
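For orientation, here is a minimal sketch of the core idea behind an MoE layer: a router scores each token, the top-k experts process it, and their outputs are combined with the router weights. This is not nanoMoE's code; the dimensions, expert count, and top_k value are illustrative assumptions, and the per-expert loop stands in for the batched dispatch a real implementation would use.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy top-k routed mixture-of-experts feed-forward layer (illustrative sizes)."""
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: scores each token against each expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Experts: independent two-layer MLPs with the usual transformer FFN shape.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                        # x: (batch, seq, d_model)
        scores = self.router(x)                  # (batch, seq, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Reference loop: each token goes through its top-k experts, and the
        # expert outputs are mixed with the router weights.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)        # tokens routed to expert e at rank k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(2, 16, 512)                      # toy batch: 2 sequences of 16 tokens
print(MoELayer()(x).shape)                       # torch.Size([2, 16, 512])
```

Only top_k experts run per token, so parameter count grows with the number of experts while per-token compute stays roughly constant, which is the property that makes MoE pretraining attractive.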

Training Trajectories of Language Models Across Scales
https://t.co/eElbO6uLv6