GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks

How to Fine-Tune LLMs in 2024 with Hugging Face

There is often a notable gap between state-of-the-art research and what practitioners can reasonably use. However, I'm glad to say that attention sinks can be added to any pretrained LLM with almost no additional effort.

I have released the attention_sinks Python module, which acts as a drop-in replacement for the transformers API. This Python module...
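The core idea behind attention sinks, as described in the StreamingLLM work, is a KV-cache eviction policy. The first few tokens act as "sinks" that absorb attention mass and are never evicted, while the rest of the cache is a rolling window of the most recent tokens. The sketch below illustrates that policy in plain Python. It is a toy, not the attention_sinks or transformers API: the class `SinkKVCache`, its parameters, and the default of 4 sink tokens are illustrative assumptions. It also stores token positions instead of real per-layer key/value tensors.

```python
from collections import deque
from typing import Deque, List


class SinkKVCache:
    """Toy sketch of an attention-sink KV cache (hypothetical class, not a library API).

    Keeps the first `sink_size` tokens forever (the "attention sinks") plus a
    rolling window of the `window_size` most recent tokens. Each entry is just
    the token's original position so the eviction policy is easy to inspect.
    """

    def __init__(self, sink_size: int = 4, window_size: int = 8) -> None:
        self.sink_size = sink_size
        self.sinks: List[int] = []
        # A deque with maxlen evicts its oldest entry automatically when full.
        self.window: Deque[int] = deque(maxlen=window_size)

    def append(self, token_pos: int) -> None:
        """Add one processed or generated token to the cache."""
        if len(self.sinks) < self.sink_size:
            self.sinks.append(token_pos)
        else:
            self.window.append(token_pos)

    def positions(self) -> List[int]:
        """Original text positions of every token still held in the cache."""
        return self.sinks + list(self.window)

    def cache_positions(self) -> List[int]:
        """Positions for positional encoding: indices within the cache rather
        than in the original text, so they never exceed the cache size."""
        return list(range(len(self.sinks) + len(self.window)))


if __name__ == "__main__":
    cache = SinkKVCache(sink_size=4, window_size=8)
    for pos in range(20):
        cache.append(pos)
    print(cache.positions())        # [0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19]
    print(cache.cache_positions())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Memory stays bounded at `sink_size + window_size` entries no matter how long the stream runs. Keeping only the rolling window and dropping the sink tokens gives plain window attention, which the StreamingLLM paper reports degrades once the initial tokens are evicted.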

Tom Aarsen • 🕳️ Attention Sinks in LLMs for endless fluency

Now that Google has released a text diffusion model, it's time to read this paper. https://t.co/Gn6C30Auy8

Pramod Goyal

x.com