ExLlamaV2
ExLlamaV2 is an inference library for running local LLMs on modern consumer GPUs.
Overview of differences compared to V1
- Faster, better kernels
- Cleaner and more versatile codebase
- Support for a new quant format (see below)
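As a quick illustration of basic use, here is a minimal generation sketch using the library's Python API. It follows the classic ExLlamaV2BaseGenerator interface; exact class names and signatures may differ between releases, and the model directory is a placeholder.

```python
# Minimal sketch: load a quantized model and generate text.
# Class names follow the classic ExLlamaV2 Python API; details may vary by version.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/model"   # placeholder: directory containing the model files
config.prepare()

model = ExLlamaV2(config)
model.load()                          # load weights onto the GPU

tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)         # KV cache used during generation
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

output = generator.generate_simple("Once upon a time,", settings, 128)
print(output)
```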