GitHub - okuvshynov/slowllama: Finetune llama2-70b and codellama on MacBook Air without quantization
Ollama
ollama.com
Llama 2 - Resource Overview - Meta AI
ai.meta.com

ExLlamaV2
ExLlamaV2 is an inference library for running local LLMs on modern consumer GPUs.
Overview of differences compared to V1
- Faster, better kernels
- Cleaner and more versatile codebase
- Support for a new quant format (see the repo README for details)
turboderp • GitHub - turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs
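A quick way to try it, as a minimal sketch: the script name and flags below are taken from the exllamav2 README at the time and may have changed, and the model path is a placeholder for a local quantized model directory.

# install the library from PyPI and grab the repo for its example scripts
pip install exllamav2
git clone https://github.com/turboderp/exllamav2
# generate from a local quantized model (path is a placeholder; flags may differ)
python exllamav2/test_inference.py -m /path/to/exl2-model -p "Once upon a time,"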
GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks
mit-han-lab • github.com
Ollama
Get up and running with large language models locally.
macOS
Download
Windows
Coming soon!
Linux & WSL2
curl https://ollama.ai/install.sh | sh
Manual install instructions
Docker
The official Ollama Docker image ollama/ollama is available on Docker Hub.
Quickstart
To run and chat with Llama 2:
ollama run llama2
Model library
Ollama supports a list of models...
jmorganca • GitHub - jmorganca/ollama: Get up and running with Llama 2 and other large language models locally
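To make the Docker and quickstart notes above concrete, a short sketch; the commands follow the Ollama README and Docker Hub page and are worth re-checking against the current docs.

# start the official image; the named volume keeps downloaded models across restarts
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# pull and chat with Llama 2 inside the container
docker exec -it ollama ollama run llama2
# or call the local REST API once the server is listening on port 11434
curl http://localhost:11434/api/generate -d '{"model": "llama2", "prompt": "Why is the sky blue?"}'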
llamafile lets you distribute and run LLMs with a single file. (announcement blog post)
Our goal is to make open source large language models much more accessible to both developers and end users. We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable...
Mozilla-Ocho • GitHub - Mozilla-Ocho/llamafile: Distribute and run LLMs with a single file.
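In practice the single-file idea looks roughly like this, a sketch where the filename is a placeholder (real llamafiles bundling weights plus a runtime are linked from the repo README):

# mark the downloaded llamafile as executable and run it (macOS/Linux;
# on Windows the same file is renamed to end in .exe instead)
chmod +x mymodel.llamafile
./mymodel.llamafile
# running it starts a local llama.cpp-based server with a browser chat UI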
2-5x faster 50% less memory local LLM finetuning
- Manual autograd engine - hand-derived backprop steps.
- 2x to 5x faster than QLoRA. 50% less memory usage.
- All kernels written in OpenAI's Triton language.
- 0% loss in accuracy - no approximation methods - all exact.
- No change of hardware necessary. Supports NVIDIA GPUs from 2018 onward (a minimum CUDA Compute Capability applies).