
🕳️ Attention Sinks in LLMs for endless fluency

Generative AI with LangChain: Build large language model (LLM) apps with Python, ChatGPT, and other LLMs
amazon.com
In 2017, Google published “Attention Is All You Need,” one of the most important papers in the history of machine learning. Building on the work of Bahdanau and his colleagues, Google researchers dispensed with the RNN and its hidden states. Instead, Google’s model used an attention mechanism to scan previous words for relevant context.
Timothy B. Lee • Why large language models struggle with long contexts
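The attention mechanism Lee describes can be sketched in a few lines: each query "scans" all previous positions and mixes their values by relevance. A minimal scaled dot-product attention in NumPy, with toy dimensions chosen for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Each query compares itself to every key (previous word),
    # then takes a relevance-weighted average of the values.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)    # (n_queries, n_keys) similarity scores
    weights = softmax(scores, axis=-1) # each row sums to 1
    return weights @ V, weights

# Toy example: a 3-"word" context with 4-dimensional embeddings.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = attention(Q, K, V)
```

This is only the core computation; a real Transformer adds learned projections, multiple heads, and causal masking so a word can only attend to earlier positions.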
The fact that adding keywords like "Let's Think Step by Step", adding "Greg Rutkowski", prompt weights, and even negative prompting are still so enormously effective is a sign that we are nowhere close to perfecting the "language" part of "large language models".
Swyx • Why "Prompt Engineering" and "Generative AI" are overhyped
LLMs absorb superhuman quantities of information at training time.
Timothy B. Lee • Why large language models struggle with long contexts
Langfuse is an open source observability & analytics solution for LLM-based applications. It is mostly geared towards production usage but some users also use it for local development of their LLM applications.
Langfuse is focused on applications built on top of LLMs. Many new abstractions and common best practices evolved recently, e.g. agents,...
langfuse • GitHub - langfuse/langfuse: Open source observability and analytics for LLM applications
8. Provide refreshers on forgotten or infrequently used tools.