AI on the Edge // Local First
slowllama
Fine-tune Llama 2 and CodeLlama models, including 70B/35B, on Apple M1/M2 devices (for example, a MacBook Air or Mac Mini) or consumer NVIDIA GPUs.
slowllama does not use any quantization. Instead, it offloads parts of the model to SSD or main memory on both the forward and backward passes. In contrast with training large models from scratch (unattainable…
okuvshynov • GitHub - okuvshynov/slowllama: Finetune llama2-70b and codellama on MacBook Air without quantization
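The offloading trick above can be sketched in a few lines of PyTorch: keep the transformer blocks in CPU RAM (or memory-mapped from SSD), move one block at a time onto the accelerator, run it, then evict it; the backward pass reloads blocks in reverse order the same way. This is a minimal sketch of the idea, not slowllama's actual code, and the device name and block structure are assumptions.

```python
import torch

# Minimal sketch of block-wise offloading (not slowllama's real implementation).
# Assumes `blocks` is a list of transformer layers kept on CPU (or memory-mapped
# from SSD) and `device` is the accelerator: "mps" on Apple Silicon, "cuda" on NVIDIA.

def offloaded_forward(blocks, x, device="mps"):
    x = x.to(device)
    for block in blocks:
        block.to(device)   # load one layer's weights onto the accelerator
        x = block(x)       # run only this layer
        block.to("cpu")    # evict it before touching the next layer
    return x
```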
🎤 audapolis
An editor for spoken-word media with transcription.
audapolis aims to make the workflow for spoken-word-heavy media editing easier, faster and more accessible.
- It gives you a word-processor-like experience for media editing.
- It can automatically transcribe your audio to text.
- It can be used for Video, Audio and mixed editing - Do radio show…
bugbakery • GitHub - bugbakery/audapolis: an editor for spoken-word audio with automatic transcription
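The automatic-transcription piece of a tool like this can be reproduced locally in a few lines. The sketch below uses the openai-whisper package as an assumed stand-in rather than audapolis's own engine, and the input file name is hypothetical; the segment timestamps are what a word-processor-style editor would pin to the media timeline.

```python
# Local transcription sketch using openai-whisper as a stand-in
# (audapolis bundles its own speech recognition; this only shows the idea).
# Assumes `pip install openai-whisper` and ffmpeg are available.
import whisper

model = whisper.load_model("base")            # small multilingual model
result = model.transcribe("interview.mp3")    # hypothetical input file
print(result["text"])                         # plain transcript

# Timed segments are what an editor would map back onto the audio.
for segment in result["segments"]:
    print(f"{segment['start']:.2f}-{segment['end']:.2f}: {segment['text']}")
```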
I am using my own hardware at home to infer, train, and fine-tune (or trying to; my training efforts have been pretty disastrous so far, but inference works very well).
My current uses of LLM inference are:
- Asking questions of a RAG system backed by a locally indexed Wikipedia dump, mainly with Marx-3B and PuddleJumper-13B-v2,
- Code co-pilot with Rift-C…
r/LocalLLaMA - Reddit
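For context on the first bullet, a local RAG loop boils down to: embed document chunks, retrieve the nearest ones for a question, and stuff them into the prompt of a local model. The sketch below is a generic illustration under assumed libraries (sentence-transformers for embeddings) and toy data, not the commenter's actual setup.

```python
# Minimal local-RAG sketch (illustrative; not the setup described in the comment).
# Assumes `pip install sentence-transformers numpy` and a local LLM endpoint
# (e.g. Ollama or llama.cpp) to send the final prompt to.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# In a real system these chunks come from a locally indexed Wikipedia dump.
chunks = [
    "Karl Marx was a philosopher and economist born in Trier in 1818.",
    "Wikipedia is a free online encyclopedia maintained by volunteers.",
]
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def retrieve(question, k=1):
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q                      # cosine similarity (vectors are normalized)
    return [chunks[i] for i in np.argsort(-scores)[:k]]

question = "Where was Karl Marx born?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
# `prompt` would then be sent to the local model (Marx-3B, PuddleJumper, etc.).
print(prompt)
```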
Ollama
Get up and running with large language models locally.
- macOS: Download
- Windows: Coming soon!
- Linux & WSL2: curl https://ollama.ai/install.sh | sh (manual install instructions)
- Docker: The official Ollama Docker image ollama/ollama is available on Docker Hub.
Quickstart
To run and chat with Llama 2:
ollama run llama2
Model library
Ollama supports a list…
jmorganca • GitHub - jmorganca/ollama: Get up and running with Llama 2 and other large language models locally
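Besides the CLI, a running Ollama instance exposes a local HTTP API (port 11434 by default), which is handy for scripting. The sketch below calls the generate endpoint from Python; the endpoint and payload follow Ollama's documented API, but verify the details against the repo.

```python
# Minimal sketch of talking to a locally running Ollama server from Python.
# Assumes `ollama serve` (or the desktop app) is running and `ollama pull llama2`
# has already downloaded the model.
import json
import urllib.request

payload = {
    "model": "llama2",
    "prompt": "Why is the sky blue?",
    "stream": False,  # ask for a single JSON response instead of a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```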
Jazz.Tools, PowerSync, Fireproof, Automerge, DXOS, ElectricSQL, and of course, Berlin's own: Yjs.
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
TorchMultimodal (Beta Release)
Introduction
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale. It provides:
- A repository of modular and composable building blocks (models, fusion layers, loss functions, datasets and utilities).
- A repository of examples that show how to combine these building blocks…
facebookresearch • GitHub - facebookresearch/multimodal at a33a8b888a542a4578b16972aecd072eff02c1a6
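To make "fusion layers" concrete, here is a generic late-fusion block in plain PyTorch. It is not TorchMultimodal's API, just an assumed illustration of the kind of composable building block the library packages: per-modality embeddings concatenated and projected into a joint space.

```python
# Generic late-fusion block in plain PyTorch -- an illustration of the concept,
# not TorchMultimodal's own API. Dimensions are arbitrary assumptions.
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    """Fuse per-modality embeddings by concatenation and projection."""
    def __init__(self, image_dim=512, text_dim=768, fused_dim=256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(image_dim + text_dim, fused_dim),
            nn.ReLU(),
        )

    def forward(self, image_emb, text_emb):
        return self.proj(torch.cat([image_emb, text_emb], dim=-1))

fusion = ConcatFusion()
fused = fusion(torch.randn(4, 512), torch.randn(4, 768))  # -> shape (4, 256)
```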
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
Table of Contents
- Introduction
- Key LLM Serving Techniques
- Dynamic SplitFuse: A Novel Prompt and Generation Composition Strategy
- Performance Evaluation
- DeepSpeed-FastGen: Implementation and Usage
- Try out DeepSpeed-FastGen
- Acknowledgements
1. Introduction
Large language models…
microsoft • DeepSpeed-FastGen
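For a sense of the "Try out DeepSpeed-FastGen" step: FastGen is surfaced through DeepSpeed-MII, and a non-persistent pipeline looks roughly like the sketch below. The model id and arguments are illustrative assumptions; check the current MII release for the exact interface.

```python
# Rough usage sketch of DeepSpeed-FastGen via the DeepSpeed-MII pipeline
# (interface per the FastGen announcement; verify against the current MII release).
# Assumes `pip install deepspeed-mii` and a CUDA GPU with enough memory for the model.
import mii

pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")  # model id is an example
responses = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=64)
print(responses)
```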
The M3 Max is objectively worse than the M2 for inference.
The M2 Ultra has a higher max RAM size of 192 GB.
The M1 Ultra has 128 GB max RAM.
When it comes to these RAM numbers, something like 2/3 of that is available for inference.
So I see no reason not to make a general recommendation for the M1 Ultra unless you have some reason you want to run q5_K_M 1…
r/LocalLLaMA - Reddit
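To sanity-check those numbers, a rough back-of-the-envelope calculation: with about 2/3 of unified memory usable, a 192 GB M2 Ultra leaves roughly 128 GB for weights plus KV cache, and q5_K_M costs about 5.5 bits per weight, so a 70B model needs on the order of 48 GB. The fractions and bit widths below are rough assumptions, not benchmarks.

```python
# Back-of-the-envelope memory check for running a quantized 70B model on unified memory.
# The 2/3 usable fraction and ~5.5 bits/weight for q5_K_M are rough assumptions.
def usable_memory_gb(total_gb, fraction=2 / 3):
    return total_gb * fraction

def model_size_gb(n_params_b, bits_per_weight=5.5):
    return n_params_b * 1e9 * bits_per_weight / 8 / 1e9  # gigabytes

for total in (192, 128, 64):  # M2 Ultra, M1 Ultra, smaller configs
    budget = usable_memory_gb(total)
    need = model_size_gb(70)  # 70B model at q5_K_M ~ 48 GB
    print(f"{total} GB machine: ~{budget:.0f} GB usable, 70B q5_K_M needs ~{need:.0f} GB "
          f"-> {'fits' if need < budget else 'does not fit'}")
```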
llamafile lets you distribute and run LLMs with a single file. (announcement blog post)
Our goal is to make open source large language models much more accessible to both developers and end users. We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable…