Related

- DeepSpeed-FastGen (microsoft, github.com)
- Introduction | LLM Inference in Production (bentoml.com)
- Defeating Nondeterminism in LLM Inference (Thinking Machines Lab, thinkingmachines.ai)
- GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks (mit-han-lab). StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more.