GitHub - okuvshynov/slowllama: Finetune llama2-70b and codellama on MacBook Air without quantization
Ollama
ollama.com
Llama 2 - Resource Overview - Meta AI
ai.meta.com

ExLlamaV2
ExLlamaV2 is an inference library for running local LLMs on modern consumer GPUs.
Overview of differences compared to V1
- Faster, better kernels
- Cleaner and more versatile codebase
- Support for a new quant format (see the repo README for details)
turboderp • GitHub - turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs
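A quick way to try it, as a minimal sketch: the script name and flags below are taken from the exllamav2 README at the time and may have changed, and the model path is a placeholder for a local quantized model directory.

# install the library from PyPI and grab the repo for its example scripts
pip install exllamav2
git clone https://github.com/turboderp/exllamav2
# generate from a local quantized model (path is a placeholder; flags may differ)
python exllamav2/test_inference.py -m /path/to/exl2-model -p "Once upon a time,"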
GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks
mit-han-lab • github.com
Ollama
Get up and running with large language models locally.
macOS
Download
Windows
Coming soon!
Linux & WSL2
curl https://ollama.ai/install.sh | sh
Manual install instructions
Docker
The official Ollama Docker image ollama/ollama is available on Docker Hub.
Quickstart
To run and chat with Llama 2:
ollama run llama2
Model library
Ollama supports a list of models...
jmorganca • GitHub - jmorganca/ollama: Get up and running with Llama 2 and other large language models locally
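To make the Docker and quickstart notes above concrete, a short sketch; the commands follow the Ollama README and Docker Hub page and are worth re-checking against the current docs.

# start the official image; the named volume keeps downloaded models across restarts
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# pull and chat with Llama 2 inside the container
docker exec -it ollama ollama run llama2
# or call the local REST API once the server is listening on port 11434
curl http://localhost:11434/api/generate -d '{"model": "llama2", "prompt": "Why is the sky blue?"}'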
llamafile lets you distribute and run LLMs with a single file. (announcement blog post)
Our goal is to make open source large language models much more accessible to both developers and end users. We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable...
Mozilla-Ocho • GitHub - Mozilla-Ocho/llamafile: Distribute and run LLMs with a single file.
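In practice the single-file idea looks roughly like this, a sketch where the filename is a placeholder (real llamafiles bundling weights plus a runtime are linked from the repo README):

# mark the downloaded llamafile as executable and run it (macOS/Linux;
# on Windows the same file is renamed to end in .exe instead)
chmod +x mymodel.llamafile
./mymodel.llamafile
# running it starts a local llama.cpp-based server with a browser chat UI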
2-5x faster 50% less memory local LLM finetuning
- Manual autograd engine - hand-derived backprop steps.
- 2x to 5x faster than QLoRA. 50% less memory usage.
- All kernels written in OpenAI's Triton language.
- 0% loss in accuracy - no approximation methods - all exact.
- No change of hardware necessary. Supports NVIDIA GPUs from 2018 onward (a minimum CUDA Compute Capability applies).