GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks
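StreamingLLM's core trick is to keep a handful of initial tokens (the "attention sinks") plus a sliding window of recent tokens in the KV cache, evicting everything in between. A minimal sketch of that eviction policy, assuming a Hugging Face-style cache layout of [batch, heads, seq, head_dim]; the function name and defaults here are illustrative, not the repo's actual API:

```python
import torch

# Illustrative sketch of attention-sink KV-cache eviction; not the
# mit-han-lab/streaming-llm API. `evict_kv` and its defaults are hypothetical.
def evict_kv(past_key_values, start_size=4, recent_size=2000):
    """Keep the first `start_size` tokens (attention sinks) and the
    last `recent_size` tokens; drop everything in between."""
    trimmed = []
    for keys, values in past_key_values:  # one (K, V) pair per layer
        seq_len = keys.shape[2]           # [batch, heads, seq, head_dim]
        if seq_len <= start_size + recent_size:
            trimmed.append((keys, values))
            continue
        keys = torch.cat(
            [keys[:, :, :start_size], keys[:, :, -recent_size:]], dim=2)
        values = torch.cat(
            [values[:, :, :start_size], values[:, :, -recent_size:]], dim=2)
        trimmed.append((keys, values))
    return trimmed
```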

Things we learned about LLMs in 2024

Simon Willison · simonwillison.net

Scaling: The State of Play in AI

Ethan Mollick · oneusefulthing.org

What We Learned From a Year of Building With LLMs

Bryan Bischof · oreilly.com

All the Hard Stuff Nobody Talks About When Building Products With LLMs

honeycomb.io

GitHub - microsoft/LLMLingua: To speed up LLM inference and sharpen the model's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
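LLMLingua exposes a small Python API for this; a sketch patterned on its README usage, where the prompt strings and token budget below are placeholders:

```python
from llmlingua import PromptCompressor

# Loads a small language model used to score and prune prompt tokens.
compressor = PromptCompressor()

result = compressor.compress_prompt(
    "Long retrieved context goes here ...",  # placeholder prompt
    instruction="Answer the question from the context.",
    question="What does LLMLingua do?",
    target_token=200,  # illustrative compression budget
)
# Returns a dict with the compressed prompt plus token statistics,
# e.g. result["compressed_prompt"] and result["origin_tokens"].
print(result["compressed_prompt"])
```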