GitHub - turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs

github.com

Llama 2 - Resource Overview - Meta AI

ai.meta.com

Discover, Download, and Run Local LLMs

lmstudio.ai

GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks

github.com

Transformer Explainer: LLM Transformer Model Visually Explained

poloclub.github.io

GitHub - okuvshynov/slowllama: Finetune llama2-70b and codellama on MacBook Air without quantization

github.com