GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks

added by Darren LI · updated 1y ago

StreamingLLM enables Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling on sequences of 4 million tokens or more.
from GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks by mit-han-lab
Darren LI added 1y ago
In streaming settings, StreamingLLM is up to 22.2x faster than the sliding window recomputation baseline.
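
The highlights above name attention sinks and a sliding window but do not show the mechanism. As a rough illustration only, the general idea can be sketched as a cache that permanently keeps the first few entries (the "sinks") and otherwise holds only a rolling window of the most recent ones, so its size stays bounded however long the stream runs. The class name ToySinkWindow, its parameters num_sinks and window, and their default values are hypothetical and chosen for this sketch; this is not the repository's API, and it tracks token positions rather than real key/value tensors.

```python
from collections import deque


class ToySinkWindow:
    """Hypothetical toy cache: fixed initial 'sink' entries plus a rolling recent window."""

    def __init__(self, num_sinks=4, window=8):
        # Both defaults are illustrative assumptions, not values taken from the source.
        self.num_sinks = num_sinks
        self.sinks = []                     # first entries, never evicted
        self.recent = deque(maxlen=window)  # oldest entry is dropped automatically

    def append(self, item):
        # Fill the sink slots first; afterwards everything goes into the rolling window.
        if len(self.sinks) < self.num_sinks:
            self.sinks.append(item)
        else:
            self.recent.append(item)

    def contents(self):
        # Entries a model would attend over at this step: sinks followed by recent ones.
        return self.sinks + list(self.recent)
```

Feeding a long stream of positions into such a cache leaves the earliest num_sinks entries in place while the middle of the stream is discarded, which is what keeps memory use constant in this sketch.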