AI on the Edge // Local First
slowllama
Fine-tune Llama 2 and CodeLlama models, including 70B/35B, on Apple M1/M2 devices (for example, a MacBook Air or Mac Mini) or consumer NVIDIA GPUs.
slowllama does not use any quantization. Instead, it offloads parts of the model to SSD or main memory on both the forward and backward passes. In contrast with training large models from scratch (unattainable…
okuvshynov • GitHub - okuvshynov/slowllama: Finetune llama2-70b and codellama on MacBook Air without quantization
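The offloading trick above can be sketched in a few lines of PyTorch: keep the transformer blocks in CPU RAM (or memory-mapped from SSD), move one block at a time onto the accelerator, run it, then evict it; the backward pass reloads blocks in reverse order the same way. This is a minimal sketch of the idea, not slowllama's actual code, and the device name and block structure are assumptions.

```python
import torch

# Minimal sketch of block-wise offloading (not slowllama's real implementation).
# Assumes `blocks` is a list of transformer layers kept on CPU (or memory-mapped
# from SSD) and `device` is the accelerator: "mps" on Apple Silicon, "cuda" on NVIDIA.

def offloaded_forward(blocks, x, device="mps"):
    x = x.to(device)
    for block in blocks:
        block.to(device)   # load one layer's weights onto the accelerator
        x = block(x)       # run only this layer
        block.to("cpu")    # evict it before touching the next layer
    return x
```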
🎤 audapolis
An editor for spoken-word media with transcription.
audapolis aims to make the workflow for spoken-word-heavy media editing easier, faster and more accessible.
- It gives you a word-processor-like experience for media editing.
- It can automatically transcribe your audio to text.
- It can be used for Video, Audio and mixed editing - Do radio show…
bugbakery • GitHub - bugbakery/audapolis: an editor for spoken-word audio with automatic transcription
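The automatic-transcription piece of a tool like this can be reproduced locally in a few lines. The sketch below uses the openai-whisper package as an assumed stand-in rather than audapolis's own engine, and the input file name is hypothetical; the segment timestamps are what a word-processor-style editor would pin to the media timeline.

```python
# Local transcription sketch using openai-whisper as a stand-in
# (audapolis bundles its own speech recognition; this only shows the idea).
# Assumes `pip install openai-whisper` and ffmpeg are available.
import whisper

model = whisper.load_model("base")            # small multilingual model
result = model.transcribe("interview.mp3")    # hypothetical input file
print(result["text"])                         # plain transcript

# Timed segments are what an editor would map back onto the audio.
for segment in result["segments"]:
    print(f"{segment['start']:.2f}-{segment['end']:.2f}: {segment['text']}")
```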
I am using my own hardware at home to infer, train, and fine-tune (or trying to; my training efforts have been pretty disastrous so far, but inference works very well).
My current uses of LLM inference are:
- Asking questions of a RAG system backed by a locally indexed Wikipedia dump, mainly with Marx-3B and PuddleJumper-13B-v2,
- Code co-pilot with Rift-C…
r/LocalLLaMA - Reddit
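For context on the first bullet, a local RAG loop boils down to: embed document chunks, retrieve the nearest ones for a question, and stuff them into the prompt of a local model. The sketch below is a generic illustration under assumed libraries (sentence-transformers for embeddings) and toy data, not the commenter's actual setup.

```python
# Minimal local-RAG sketch (illustrative; not the setup described in the comment).
# Assumes `pip install sentence-transformers numpy` and a local LLM endpoint
# (e.g. Ollama or llama.cpp) to send the final prompt to.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# In a real system these chunks come from a locally indexed Wikipedia dump.
chunks = [
    "Karl Marx was a philosopher and economist born in Trier in 1818.",
    "Wikipedia is a free online encyclopedia maintained by volunteers.",
]
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def retrieve(question, k=1):
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q                      # cosine similarity (vectors are normalized)
    return [chunks[i] for i in np.argsort(-scores)[:k]]

question = "Where was Karl Marx born?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
# `prompt` would then be sent to the local model (Marx-3B, PuddleJumper, etc.).
print(prompt)
```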
Ollama
Get up and running with large language models locally.
- macOS: Download
- Windows: Coming soon!
- Linux & WSL2: curl https://ollama.ai/install.sh | sh (manual install instructions)
- Docker: The official Ollama Docker image ollama/ollama is available on Docker Hub.
Quickstart
To run and chat with Llama 2:
ollama run llama2
Model library
Ollama supports a list…
jmorganca • GitHub - jmorganca/ollama: Get up and running with Llama 2 and other large language models locally
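Besides the CLI, a running Ollama instance exposes a local HTTP API (port 11434 by default), which is handy for scripting. The sketch below calls the generate endpoint from Python; the endpoint and payload follow Ollama's documented API, but verify the details against the repo.

```python
# Minimal sketch of talking to a locally running Ollama server from Python.
# Assumes `ollama serve` (or the desktop app) is running and `ollama pull llama2`
# has already downloaded the model.
import json
import urllib.request

payload = {
    "model": "llama2",
    "prompt": "Why is the sky blue?",
    "stream": False,  # ask for a single JSON response instead of a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```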
Jazz.Tools, PowerSync, Fireproof, Automerge, DXOS, ElectricSQL, and of course, Berlin's own: Yjs.
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
TorchMultimodal (Beta Release)
Introduction
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale. It provides:
- A repository of modular and composable building blocks (models, fusion layers, loss functions, datasets and utilities).
- A repository of examples that show how to combine these building blocks…
facebookresearch • GitHub - facebookresearch/multimodal at a33a8b888a542a4578b16972aecd072eff02c1a6
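To make "fusion layers" concrete, here is a generic late-fusion block in plain PyTorch. It is not TorchMultimodal's API, just an assumed illustration of the kind of composable building block the library packages: per-modality embeddings concatenated and projected into a joint space.

```python
# Generic late-fusion block in plain PyTorch -- an illustration of the concept,
# not TorchMultimodal's own API. Dimensions are arbitrary assumptions.
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    """Fuse per-modality embeddings by concatenation and projection."""
    def __init__(self, image_dim=512, text_dim=768, fused_dim=256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(image_dim + text_dim, fused_dim),
            nn.ReLU(),
        )

    def forward(self, image_emb, text_emb):
        return self.proj(torch.cat([image_emb, text_emb], dim=-1))

fusion = ConcatFusion()
fused = fusion(torch.randn(4, 512), torch.randn(4, 768))  # -> shape (4, 256)
```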
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
Table of Contents
- Introduction
- Key LLM Serving Techniques
- Dynamic SplitFuse: A Novel Prompt and Generation Composition Strategy
- Performance Evaluation
- DeepSpeed-FastGen: Implementation and Usage
- Try out DeepSpeed-FastGen
- Acknowledgements
1. Introduction
Large language models…
microsoft • DeepSpeed-FastGen
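For a sense of the "Try out DeepSpeed-FastGen" step: FastGen is surfaced through DeepSpeed-MII, and a non-persistent pipeline looks roughly like the sketch below. The model id and arguments are illustrative assumptions; check the current MII release for the exact interface.

```python
# Rough usage sketch of DeepSpeed-FastGen via the DeepSpeed-MII pipeline
# (interface per the FastGen announcement; verify against the current MII release).
# Assumes `pip install deepspeed-mii` and a CUDA GPU with enough memory for the model.
import mii

pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")  # model id is an example
responses = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=64)
print(responses)
```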
The M3 Max is objectively worse than the M2 for inference.
The M2 Ultra has a higher max RAM size of 192 GB.
The M1 Ultra has 128 GB max RAM.
When it comes to these RAM numbers, something like 2/3 of that is available for inference.
So I see no reason not to make a general recommendation for the M1 Ultra unless you have some reason you want to run q5_K_M 1…
r/LocalLLaMA - Reddit
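To sanity-check those numbers, a rough back-of-the-envelope calculation: with about 2/3 of unified memory usable, a 192 GB M2 Ultra leaves roughly 128 GB for weights plus KV cache, and q5_K_M costs about 5.5 bits per weight, so a 70B model needs on the order of 48 GB. The fractions and bit widths below are rough assumptions, not benchmarks.

```python
# Back-of-the-envelope memory check for running a quantized 70B model on unified memory.
# The 2/3 usable fraction and ~5.5 bits/weight for q5_K_M are rough assumptions.
def usable_memory_gb(total_gb, fraction=2 / 3):
    return total_gb * fraction

def model_size_gb(n_params_b, bits_per_weight=5.5):
    return n_params_b * 1e9 * bits_per_weight / 8 / 1e9  # gigabytes

for total in (192, 128, 64):  # M2 Ultra, M1 Ultra, smaller configs
    budget = usable_memory_gb(total)
    need = model_size_gb(70)  # 70B model at q5_K_M ~ 48 GB
    print(f"{total} GB machine: ~{budget:.0f} GB usable, 70B q5_K_M needs ~{need:.0f} GB "
          f"-> {'fits' if need < budget else 'does not fit'}")
```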
llamafile lets you distribute and run LLMs with a single file. (announcement blog post)
Our goal is to make open source large language models much more accessible to both developers and end users. We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable…