GitHub - turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs

turboderp · github.com

Self-Host DeepSeek with Ollama and Open WebUI

Jeremy · noted.lol

GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks

mit-han-lab · github.com

Introduction | LLM Inference in Production

bentoml.com

Llama 2 - Resource Overview - Meta AI

ai.meta.com

What We Learned From a Year of Building With LLMs

Bryan Bischof · oreilly.com