AI on the Edge // Local First
The Mac Studio is an absolute monster for inference, but there are a couple of caveats.
- It's slower, pound for pound, than a 4090 for models that fit in the 4090's VRAM. So a 13B model on the 4090 runs almost twice as fast as the same model on the M2.
- The M1 Ultra Mac Studio with 128 GB costs far less (around $3,700) and the inference speed is...
r/LocalLLaMA - Reddit
Mem0: The Memory Layer for Personalized AI
Mem0 provides a smart, self-improving memory layer for Large Language Models, enabling personalized AI experiences across applications.
Note: The Mem0 repository now also includes the Embedchain project. We continue to maintain and support Embedchain ❤️. You can find the Embedchain codebase in the embedchai...
GitHub - mem0ai/mem0: The memory layer for Personalized AI
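A minimal Python sketch of what Mem0's memory layer looks like in use, based on the add/search calls in its README (pip install mem0ai; it assumes an LLM/embedding backend such as an OpenAI key is configured, and the exact return shapes may vary by version):

# Minimal sketch of Mem0's memory layer, assuming a configured backend.
from mem0 import Memory

memory = Memory()

# Store a fact about a user; Mem0 extracts and indexes the salient memory.
memory.add("I prefer short answers and I'm allergic to peanuts.", user_id="alice")

# Later, retrieve memories relevant to a new query and inject them into your prompt.
print(memory.search("What should I avoid when suggesting recipes?", user_id="alice"))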
Tabby
Open-source, self-hosted AI coding assistant.
Home | Tabby
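Tabby runs as a local server (for example via its Docker image) and serves completions over HTTP to editor plugins; a rough Python sketch of querying it directly, where the port and the /v1/completions request shape are assumptions based on Tabby's published API and may differ between versions:

# Rough sketch: ask a locally hosted Tabby server for a code completion.
# Assumes Tabby is serving on http://localhost:8080 and exposes /v1/completions;
# the field names below are assumptions and may differ by version.
import json
import urllib.request

payload = json.dumps({
    "language": "python",
    "segments": {"prefix": "def fibonacci(n):\n    ", "suffix": ""},
}).encode()

req = urllib.request.Request(
    "http://localhost:8080/v1/completions",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))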
My recommendation for anyone looking to do ML work, especially training LLMs, is to use a cloud service like Lambda Labs. You'll spend less time training and you'll still be able to code while it's going on.
The 36GB RAM is dynamically shared between your system and your GPU. If you're planning to run containers and an IDE and a browser alongside...
r/MachineLearning - Reddit
🎤 audapolis
audapolis aims to make the workflow for spoken-word-heavy media editing easier, faster and more accessible.
An editor for spoken-word media with transcription.
- It gives you a wordprocessor-like experience for media editing.
- It can automatically transcribe your audio to text.
- It can be used for Video, Audio and mixed editing - Do radio...
bugbakery • GitHub - bugbakery/audapolis: an editor for spoken-word audio with automatic transcription
GPT4All: An ecosystem of open-source on-edge large language models.
Important
GPT4All v2.5.0 and newer only supports models in GGUF format (.gguf). Models used with a previous version of GPT4All (.bin extension) will no longer work.
GPT4All is an ecosystem to run powerful and customized large language models that work locally on consumer grade CPUs...
nomic-ai • GitHub - nomic-ai/gpt4all: gpt4all: open-source LLM chatbots that you can run anywhere
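Since v2.5.0 only GGUF models load, here is a minimal Python sketch using the gpt4all bindings (pip install gpt4all); the model filename below is an assumption, and any .gguf model from the GPT4All catalog should work (it downloads on first use):

# Minimal sketch: run a GGUF model on a consumer CPU with the gpt4all bindings.
# The model filename is an assumption; substitute any .gguf model you have.
from gpt4all import GPT4All

model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf")

with model.chat_session():
    print(model.generate("Explain GGUF in one sentence.", max_tokens=100))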
slowllama
Fine-tune Llama2 and CodeLlama models, including 70B/35B, on Apple M1/M2 devices (for example, MacBook Air or Mac mini) or consumer NVIDIA GPUs.
slowllama does not use any quantization. Instead, it offloads parts of the model to SSD or main memory on both the forward and backward passes. In contrast with training large models from scratch...
okuvshynov • GitHub - okuvshynov/slowllama: Finetune llama2-70b and codellama on MacBook Air without quantization
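The core idea (keep only one block of the model resident at a time and page the rest in from SSD or main memory) can be shown with a toy PyTorch loop; this is a conceptual sketch of block-wise offloading for the forward pass only, not slowllama's actual code:

# Conceptual sketch of block-wise offloading (not slowllama's implementation):
# each "transformer block" lives on disk and is loaded only while it is needed,
# so peak memory stays near the size of one block rather than the full model.
import os
import tempfile
import torch
import torch.nn as nn

workdir = tempfile.mkdtemp()
n_blocks, dim = 4, 256

# Save each block to disk up front, as if it were a shard of a large checkpoint.
for i in range(n_blocks):
    torch.save(nn.Linear(dim, dim).state_dict(), os.path.join(workdir, f"block_{i}.pt"))

def forward_offloaded(x: torch.Tensor) -> torch.Tensor:
    for i in range(n_blocks):
        block = nn.Linear(dim, dim)  # allocate space for just one block
        block.load_state_dict(torch.load(os.path.join(workdir, f"block_{i}.pt")))
        x = torch.relu(block(x))
        del block  # release it before the next block is paged in
    return x

print(forward_offloaded(torch.randn(1, dim)).shape)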
The M3 Max is objectively worse than the M2 for inference.
The M2 Ultra has a higher maximum RAM size of 192 GB.
The M1 Ultra maxes out at 128 GB.
Of those RAM figures, something like 2/3 is available for inference.
So I see no reason not to make a general recommendation for the M1 Ultra unless you have some reason you want to run q5_K_M...
r/LocalLLaMA - Reddit
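A quick back-of-the-envelope Python check of the claim above, using the roughly-two-thirds-of-unified-memory heuristic; the bytes-per-weight figures are rough approximations for common llama.cpp quantizations and ignore KV-cache overhead:

# Rough fit check: which quantized 70B variants fit in the fraction of unified
# memory that macOS makes available to the GPU (~2/3 of total, per the post above).
# Bytes-per-weight values are approximations for llama.cpp quantization levels.
BYTES_PER_WEIGHT = {"q4_K_M": 0.56, "q5_K_M": 0.69, "q8_0": 1.06}

def fits(params_billion: float, quant: str, total_ram_gb: int, usable: float = 2 / 3) -> bool:
    model_gb = params_billion * BYTES_PER_WEIGHT[quant]  # billions of weights x bytes/weight ~ GB
    return model_gb <= total_ram_gb * usable

for ram in (128, 192):  # M1 Ultra max vs M2 Ultra max
    for quant in BYTES_PER_WEIGHT:
        print(f"70B {quant} on {ram} GB unified memory:", "fits" if fits(70, quant, ram) else "does not fit")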
Ollama
Get up and running with large language models locally.
- macOS: Download
- Windows: Coming soon!
- Linux & WSL2: curl https://ollama.ai/install.sh | sh (manual install instructions available)
- Docker: the official Ollama Docker image ollama/ollama is available on Docker Hub.
Quickstart: to run and chat with Llama 2, use ollama run llama2
Model library: Ollama supports a list of...
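Beyond the CLI, Ollama also exposes a local REST API (default port 11434); a minimal Python sketch of calling it, assuming the server is running and llama2 has already been pulled:

# Minimal sketch: query a locally running Ollama server over its REST API.
# Assumes `ollama serve` is running on the default port 11434 and that
# `ollama pull llama2` (or `ollama run llama2`) has already fetched the model.
import json
import urllib.request

def generate(prompt: str, model: str = "llama2") -> str:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(generate("Why run LLMs locally? Answer in one sentence."))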