GitHub - turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs

by turboderp

updated 10mo ago
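For context on what the linked library does, here is a minimal sketch of local inference with exllamav2's Python API, written in the style of the repository's example scripts; the model path, prompt, and sampler values are assumptions, not taken from the page above.

```python
# Minimal sketch of local inference with exllamav2 (assumptions: model path,
# prompt, and sampling values are placeholders for illustration).
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/Llama2-7B-exl2"  # hypothetical path to an EXL2-quantized model
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)     # lazy cache so layers are allocated during autosplit
model.load_autosplit(cache)                  # split weights across available consumer GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

# Generate up to 128 new tokens from an example prompt.
output = generator.generate_simple("Once upon a time,", settings, 128)
print(output)
```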

  • from GitHub - okuvshynov/slowllama: Finetune llama2-70b and codellama on MacBook Air without quantization by okuvshynov

    Nicolay Gerold added

  • from GitHub - unslothai/unsloth: 5X faster 50% less memory LLM finetuning by unslothai

    Nicolay Gerold added

  • from This AI newsletter is all you need #68

    Nicolay Gerold added