GitHub - okuvshynov/slowllama: Finetune llama2-70b and codellama on MacBook Air without quantization
Ollama
ollama.com
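As a quick illustration of what Ollama is for, a minimal sketch that queries a locally running Ollama server over its documented REST API. It assumes `ollama serve` is running on the default port and that a model such as `llama2` has already been pulled; the model name and prompt are placeholders:

```python
import requests

# Minimal sketch: request a completion from a local Ollama server.
# Assumes the default port (11434) and that `ollama pull llama2` was run.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```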
Llama 2 - Resource Overview - Meta AI
ai.meta.com

ExLlamaV2
ExLlamaV2 is an inference library for running local LLMs on modern consumer GPUs.
Overview of differences compared to V1 (a usage sketch appears below):
- Faster, better kernels
- Cleaner and more versatile codebase
- Support for a new quant format (see below)
turboderp • GitHub - turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs
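For context, a minimal generation sketch in the spirit of the repo's documented Python examples. The model directory is a placeholder, and the class and method names are taken from exllamav2's examples, so they may shift between versions:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/exl2-quantized-model"  # placeholder path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocate cache as layers load
model.load_autosplit(cache)               # split weights across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

# Generate 128 new tokens from a toy prompt.
print(generator.generate_simple("Once upon a time,", settings, 128))
```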
GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks
mit-han-lab • github.com
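As a rough sketch of the attention-sink idea behind streaming-llm: keep the first few "sink" tokens plus a sliding window of recent tokens in the KV cache, evicting everything in between. This is a toy eviction policy written for this note, not the repo's actual API:

```python
from collections import deque

class SinkKVCache:
    """Toy cache-eviction policy in the spirit of attention sinks:
    pin the first `n_sink` token positions, keep a sliding window of
    the most recent `window` positions, evict everything in between."""

    def __init__(self, n_sink: int = 4, window: int = 1024):
        self.n_sink = n_sink
        self.window = window
        self.sink: list[int] = []          # positions pinned forever
        self.recent: deque[int] = deque()  # sliding window of recent positions

    def append(self, pos: int) -> None:
        if len(self.sink) < self.n_sink:
            self.sink.append(pos)
            return
        self.recent.append(pos)
        if len(self.recent) > self.window:
            self.recent.popleft()  # evict the oldest non-sink position

    def visible(self) -> list[int]:
        # Positions the next token may attend to.
        return self.sink + list(self.recent)

cache = SinkKVCache(n_sink=4, window=8)
for pos in range(20):
    cache.append(pos)
print(cache.visible())  # [0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19]
```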
GitHub - transformerlab/transformerlab-app: Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.
github.com