Sublime
An inspiration engine for ideas
Excited to introduce R1-V!
We use RL with verifiable rewards to incentivize VLMs to learn general counting abilities.
2B model surpasses the 72B with only 100 training steps, costing less than $3.
The project will be fully open source.
Liang Chen · x.com
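A minimal sketch of what a verifiable counting reward could look like (the answer-extraction convention below is an assumption for illustration, not R1-V's actual reward code):

```python
import re

def counting_reward(model_output: str, ground_truth_count: int) -> float:
    """Verifiable reward: 1.0 if the model's final stated count matches the
    ground-truth count, else 0.0. Assumes the answer is the last integer in
    the output; a hypothetical convention, not R1-V's exact format."""
    numbers = re.findall(r"-?\d+", model_output)
    if not numbers:
        return 0.0
    return 1.0 if int(numbers[-1]) == ground_truth_count else 0.0

# The reward is checked directly against the label, so no learned reward model is needed.
print(counting_reward("I count 3 apples and 2 pears, so 5 objects. Answer: 5", 5))  # 1.0
```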
We've partnered with Modular to create Large Scale Inference (LSI), a new OpenAI-compatible inference service.
It's up to 85% cheaper than other offerings & can handle trillion-token scale.
We originally created it at the request of a major AI lab to do large-scale multimodal synthetic data…
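"OpenAI-compatible" usually means the stock OpenAI client works against the service's endpoint; a hedged sketch of that usage pattern (the base URL and model name are placeholders, not LSI's real values):

```python
from openai import OpenAI

# Placeholder endpoint and model name, shown only to illustrate the
# drop-in OpenAI-compatible usage pattern.
client = OpenAI(base_url="https://lsi.example.com/v1", api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="example-model",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```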

🚨 This one simple trick will level up your LLM🚀🚀
Wait...don't go. This isn't a blue check grifter tweet!
Instruction tuning with this easy trick will *actually* boost AlpacaEval scores, even for large (70B) and llama2-chat base models…by a lot 🧵 https://t.co/1OBMENFSxb
📜🚨📜🚨
NN loss landscapes are full of permutation symmetries, i.e., you can swap any 2 units in a hidden layer. What does this mean for SGD? Is this practically useful?
For the past 5 yrs these Qs have fascinated me. Today, I am ready to announce "Git Re-Basin"!
https://t.co/mRu5k3ONUm
Samuel "curry-howard fanboi" Ainsworth · x.com
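The symmetry itself is easy to check numerically: permuting the hidden units of a layer (rows of W1 and entries of b1) while permuting the next layer's input columns to match leaves the network's function unchanged. A small NumPy sketch of that invariance (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 2-layer MLP: x -> relu(W1 @ x + b1) -> W2 @ h + b2
W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)
W2, b2 = rng.normal(size=(4, 16)), rng.normal(size=4)

def mlp(x, W1, b1, W2, b2):
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2

# Swap hidden units: permute W1's rows / b1's entries, and W2's columns to match.
perm = rng.permutation(16)
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=8)
print(np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2)))  # True
```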

META JUST KILLED TOKENIZATION!!!
A few hours ago they released "Byte Latent Transformer", a tokenizer-free architecture that dynamically encodes Bytes into Patches and achieves better inference efficiency and robustness!
(I was just talking about how we need dynamic tokenization that is lea…
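A toy illustration of grouping raw bytes into variable-length patches; BLT learns its patch boundaries with a small byte-level entropy model, so the trivial whitespace/length rule below is only a stand-in for illustration:

```python
def bytes_to_patches(text: str, max_patch_len: int = 8) -> list[bytes]:
    """Group UTF-8 bytes into variable-length patches. BLT picks boundaries
    with a learned entropy model; this stand-in breaks at whitespace or when
    a patch reaches max_patch_len."""
    patches, current = [], bytearray()
    for b in text.encode("utf-8"):
        current.append(b)
        if b in b" \n" or len(current) >= max_patch_len:
            patches.append(bytes(current))
            current = bytearray()
    if current:
        patches.append(bytes(current))
    return patches

print(bytes_to_patches("patches instead of tokens"))
```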
Meet Jan-nano, a 4B model that outscores DeepSeek-v3-671B using MCP.
Built on Qwen3-4B with DAPO fine-tuning, it handles:
- real-time web search
- deep research
Model + GGUF: https://t.co/i8KSXcDhA9
To…
Menlo Research · x.com
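The released GGUF should run on any llama.cpp-compatible runtime; a rough sketch with llama-cpp-python (the file name is a placeholder, and the MCP/web-search tooling is not shown here):

```python
from llama_cpp import Llama

# Placeholder file name; download the actual GGUF from the link above.
llm = Llama(model_path="jan-nano-4b.Q4_K_M.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What can you do?"}]
)
print(out["choices"][0]["message"]["content"])
```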

just deployed my first model: a Japanese city name generator ⛩️
believe it or not this is just a 3-layer MLP trained on <2K real Japanese cities (from government data)
try it out! https://t.co/2zA5L6tvah
Freeman Jiang · x.com
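A character-level generator like this can indeed be tiny; a rough PyTorch sketch of a 3-layer MLP that predicts the next character from a short context (vocabulary size, context length, and layer widths are assumptions, not the author's actual configuration):

```python
import torch
import torch.nn as nn

# Assumed setup: predict the next character of a romanized city name from
# the previous block_size characters; all sizes below are illustrative.
vocab_size, block_size, emb_dim, hidden = 30, 4, 16, 64

model = nn.Sequential(
    nn.Embedding(vocab_size, emb_dim),        # (B, block_size, emb_dim)
    nn.Flatten(),                             # (B, block_size * emb_dim)
    nn.Linear(block_size * emb_dim, hidden), nn.ReLU(),
    nn.Linear(hidden, hidden), nn.ReLU(),
    nn.Linear(hidden, vocab_size),            # logits over the next character
)

x = torch.randint(0, vocab_size, (8, block_size))  # a batch of character contexts
print(model(x).shape)  # torch.Size([8, 30])
```

Trained with cross-entropy on the <2K names and sampled one character at a time, a model this small can already produce plausible city-like strings.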