DeepSpeed-FastGen
GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks
Great read - "Understanding LLMs: A Comprehensive Overview from Training to Inference"
The journey from the self-attention mechanism to full-fledged LLMs.
This paper reviews the evolution of large language model training techniques and inference deployment technologies.