updated 1y ago
🕳️ Attention Sinks in LLMs for endless fluency
GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks
by mit-han-lab
2 highlights
Darren LI and Nicolay Gerold added
- in the satirical "Pretraining on the Test Set Is All You Need" paper, the author trains a small 1M-parameter LLM that outperforms all other models, including the 1.3B phi-1.5 model. This is achieved by training the model on all downstream academic benchmarks.
from Ahead of AI #12: LLM Businesses and Busyness by Sebastian Raschka
Nicolay Gerold added
It is necessary to introduce a better benchmarking system with holdout datasets that no model can access and that are private by default (this would probably be a separate entity unaffiliated with the model developers).
- StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more.
from GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks by mit-han-lab
Darren LI added
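
The core idea behind the attention sinks in the StreamingLLM repo is a bounded KV cache: always keep the first few "sink" tokens (which absorb a disproportionate share of attention) plus a sliding window of the most recent tokens, and evict everything in between. Below is a minimal sketch of that eviction policy, assuming illustrative names (`evict_kv_cache`, `num_sink_tokens`, `window_size`) and a default 4-sink / 1020-token window; it is not the repo's actual API.

```python
# Minimal sketch of an attention-sink KV-cache eviction policy.
# Assumption: function and parameter names are illustrative, not mit-han-lab's API.
import torch

def evict_kv_cache(keys: torch.Tensor,
                   values: torch.Tensor,
                   num_sink_tokens: int = 4,
                   window_size: int = 1020):
    """keys/values have shape [batch, heads, seq_len, head_dim]."""
    seq_len = keys.size(2)
    if seq_len <= num_sink_tokens + window_size:
        # Cache still fits within the budget; nothing to evict.
        return keys, values

    # Keep the initial sink tokens plus the most recent window; drop the middle.
    kept_k = torch.cat([keys[:, :, :num_sink_tokens],
                        keys[:, :, -window_size:]], dim=2)
    kept_v = torch.cat([values[:, :, :num_sink_tokens],
                        values[:, :, -window_size:]], dim=2)
    return kept_k, kept_v

# Usage: after each decoding step, trim each layer's cache before the next step,
# e.g. k, v = evict_kv_cache(k, v)
```

This keeps memory and per-step compute constant regardless of how many tokens have been generated, which is what allows streaming generation over millions of tokens without the perplexity blow-up seen with plain sliding-window attention.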