GitHub - turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs
GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks
mit-han-labgithub.comIntroduction | LLM Inference in Production
bentoml.com
Llama 2 - Resource Overview - Meta AI
ai.meta.com
