GitHub - turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs

by turboderp

Updated 10 months ago