DeepSpeed-FastGen

Introduction | LLM Inference in Production

bentoml.com

Defeating Nondeterminism in LLM Inference

Thinking Machines Lab, thinkingmachines.ai

GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks