GitHub - turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs
A OnePlus phone with 24GB of RAM running Mixtral 8x7B at 11 tokens/second with PowerInfer-2 🤯
Much faster inference speed than llama.cpp and MLC-LLM.
Uses swap and caching to run the model even when it doesn't fit in the available RAM.
📌 Between Apple's LLM in a flash and...
Rohan Paul (x.com)
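The swap-and-caching point above is essentially demand paging: keep the weights on flash and let the OS page in only the slices a forward pass actually touches, so a model larger than RAM can still run. A minimal sketch of that general pattern (not PowerInfer-2's actual implementation; the file name, layout, and layer size are assumptions for illustration):

```python
import numpy as np

# Sketch of the "larger than RAM" trick, assuming a hypothetical weights.bin:
# one flat little-endian float16 array, 32 layers of 4096x4096 weights laid
# out back to back. np.memmap keeps the data on disk; the OS pages in only
# the bytes we actually touch and evicts them under memory pressure, so peak
# resident memory stays near one layer rather than the whole model.

WEIGHT_FILE = "weights.bin"   # hypothetical path, not a real PowerInfer-2 artifact
HIDDEN = 4096
N_LAYERS = 32

def load_layer(mm: np.memmap, layer_idx: int) -> np.ndarray:
    """Return one layer's weight matrix as a lazily-paged view into the file."""
    start = layer_idx * HIDDEN * HIDDEN
    return mm[start : start + HIDDEN * HIDDEN].reshape(HIDDEN, HIDDEN)

def forward(x: np.ndarray) -> np.ndarray:
    mm = np.memmap(WEIGHT_FILE, dtype=np.float16, mode="r")
    for i in range(N_LAYERS):
        w = load_layer(mm, i)
        # Materialise only the current layer in RAM for the matmul.
        x = np.tanh(x @ w.astype(np.float32))
        del w
    return x

if __name__ == "__main__":
    print(forward(np.ones((1, HIDDEN), dtype=np.float32)).shape)
```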
Llama 2 7B chat, running 100% private on Mac, powered by CoreML! ⚡️
We're optimising this setup to get much faster generation. 🔥 https://t.co/IchaNckIK2
Vaibhav (VB) Srivastav (x.com)

One of the best tutorial-style repos since @karpathy's minGPT! GPT-Fast: a minimalistic, PyTorch-only decoding implementation loaded with best practices: int8/int4 quantization, speculative decoding, tensor parallelism, etc. Boosts the "clock speed" of LLM OS by 10x with no model change!
We need more minGPTs and...
Jim Fan (x.com)
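Of the best-practice tricks GPT-Fast bundles, weight-only int8 quantization is the easiest to show in isolation: store each weight row as int8 plus a per-row scale, and dequantize on the fly inside the matmul. A rough PyTorch sketch of the idea (a generic illustration, not GPT-Fast's actual code; Int8Linear and quantize_linears are hypothetical names):

```python
import torch
import torch.nn.functional as F

class Int8Linear(torch.nn.Module):
    """Weight-only int8 linear layer: int8 weights plus per-output-channel scales."""

    def __init__(self, linear: torch.nn.Linear):
        super().__init__()
        w = linear.weight.data  # [out_features, in_features]
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        self.register_buffer("weight_int8", torch.round(w / scale).to(torch.int8))
        self.register_buffer("scale", scale)
        self.bias = linear.bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize just in time; the weights are stored and read as int8,
        # roughly halving weight traffic versus fp16 during decoding.
        w = self.weight_int8.to(x.dtype) * self.scale.to(x.dtype)
        return F.linear(x, w, self.bias)

def quantize_linears(model: torch.nn.Module) -> torch.nn.Module:
    """Recursively swap every nn.Linear in a model for the int8 version."""
    for name, child in model.named_children():
        if isinstance(child, torch.nn.Linear):
            setattr(model, name, Int8Linear(child))
        else:
            quantize_linears(child)
    return model
```

Since decoding is memory-bandwidth bound, reading half as many weight bytes per token is where most of the speedup from this trick comes from.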