AI on the Edge // Local First
Instant is a client-side database that makes it easy to build real-time and collaborative apps like Notion or Figma.
You write relational queries in the shape of the data you want and Instant handles all the data fetching, permission checking, and offline caching. When you change data,... See more
GitHub - instantdb/instant: The realtime client-side database
The Mac Studio is an absolute monster for inferencing, but there are a couple of caveats.
- It's slower, pound for pound, than a 4090 when dealing with models the 4090 can fit in its VRAM. So a 13B model on the 4090 is almost twice as fast as the same model running on the M2.
- The M1 Ultra Mac Studio with 128 GB costs far less ($3700 or so) and the inference speed is
r/LocalLLaMA - Reddit
Mem0: The Memory Layer for Personalized AI
Mem0 provides a smart, self-improving memory layer for Large Language Models, enabling personalized AI experiences across applications.
Note: The Mem0 repository now also includes the Embedchain project. We continue to maintain and support Embedchain ❤️. You can find the Embedchain codebase in the embedchai... See more
GitHub - mem0ai/mem0: The memory layer for Personalized AI
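For a sense of what "memory layer" means in practice, here is a minimal sketch of the mem0 Python API as described in its README; it assumes the `mem0ai` package is installed and an OpenAI API key is configured (the default backing LLM), and the stored facts are made up for illustration:

```python
# Hedged sketch of the mem0 API (pip install mem0ai); assumes
# OPENAI_API_KEY is set, since mem0 uses an LLM to distill memories.
from mem0 import Memory

m = Memory()

# Store a memory scoped to a user; mem0 extracts the salient facts.
m.add("I'm vegetarian and I'm allergic to nuts.", user_id="alice")

# Later, retrieve memories relevant to a new query.
related = m.search("What can Alice eat for dinner?", user_id="alice")
print(related)
```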
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
Table of Contents
- Introduction
- Key LLM Serving Techniques
- Dynamic SplitFuse: A Novel Prompt and Generation Composition Strategy
- Performance Evaluation
- DeepSpeed-FastGen: Implementation and Usage
- Try out DeepSpeed-FastGen
- Acknowledgements
1. Introduction
Large... See more
microsoft • DeepSpeed-FastGen
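The blog's quick-start path is the DeepSpeed-MII pipeline API; a minimal sketch, assuming `deepspeed-mii` is installed and a CUDA GPU is available (the model name is just an example):

```python
# Sketch of the non-persistent DeepSpeed-FastGen path via DeepSpeed-MII
# (pip install deepspeed-mii); requires a CUDA-capable GPU.
import mii

# Loads the model and applies FastGen's serving optimizations
# (e.g. Dynamic SplitFuse scheduling).
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")

# A batch of prompts; continuous batching schedules them together.
responses = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=64)
for r in responses:
    print(r)
```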
🎤 audapolis
An editor for spoken-word media with transcription.
audapolis aims to make the workflow for spoken-word-heavy media editing easier, faster and more accessible.
- It gives you a word-processor-like experience for media editing.
- It can automatically transcribe your audio to text.
- It can be used for Video, Audio and mixed editing - Do radio
bugbakery • GitHub - bugbakery/audapolis: an editor for spoken-word audio with automatic transcription
The M3 Max is objectively worse than the M2 for inference.
The M2 Ultra has a higher max RAM size of 192 GB; the M1 Ultra tops out at 128 GB.
Of those RAM figures, something like 2/3 is available for inference.
So I see no reason why not to make a general recommendation for the M1 ultra unless you have some reason you want to run q5_K_M... See more
r/LocalLLaMA - Reddit
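That 2/3 figure is easy to turn into a back-of-the-envelope fit check; a minimal sketch, where the helper and the model file sizes are illustrative assumptions, not measurements:

```python
# Rough check of the "about 2/3 of unified RAM is usable for inference"
# rule of thumb from the thread above. Model sizes are illustrative.
USABLE_FRACTION = 2 / 3

def fits(total_ram_gb: float, model_gb: float) -> bool:
    """Does a quantized model roughly fit in the usable slice of RAM?"""
    return model_gb < total_ram_gb * USABLE_FRACTION

for machine, ram in [("M1 Ultra", 128), ("M2 Ultra", 192)]:
    for model, size in [("70B q5_K_M (~49 GB)", 49), ("120B q5_K_M (~90 GB)", 90)]:
        print(f"{machine} ({ram} GB): {model} fits: {fits(ram, size)}")
```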
- If you are looking to develop an AI application, and you have a Mac or Linux machine, Ollama is great because it’s very easy to set up, easy to work with, and fast.
- If you are looking to chat locally with documents, GPT4All is the best out-of-the-box solution that is also easy to set up.
- If you are looking for advanced control and insight into neural
Moyi • 10 Ways To Run LLMs Locally And Which One Works Best For You
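Since the article's first recommendation is Ollama, here is a minimal sketch of calling its local HTTP API; it assumes Ollama is running on its default port and that a model (here `llama2`, as an example) has already been pulled:

```python
# Minimal sketch against Ollama's local REST API; assumes `ollama serve`
# is running and `ollama pull llama2` was done beforehand.
import json
import urllib.request

payload = json.dumps({
    "model": "llama2",   # example model; any pulled model works
    "prompt": "Why run LLMs locally?",
    "stream": False,     # return one JSON object instead of a stream
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```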
GPT4All: An ecosystem of open-source on-edge large language models.
Important
GPT4All v2.5.0 and newer only supports models in GGUF format (.gguf). Models used with a previous version of GPT4All (.bin extension) will no longer work.
GPT4All is an ecosystem to run powerful and customized large language models that work locally on consumer grade CPUs... See more
nomic-ai • GitHub - nomic-ai/gpt4all: gpt4all: open-source LLM chatbots that you can run anywhere
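The GGUF-only note above matters in practice because the Python bindings load model files directly; a minimal sketch, assuming the `gpt4all` package is installed (the model filename is an example, fetched on first use):

```python
# Sketch of the GPT4All Python bindings (pip install gpt4all).
# Model name is an example; GPT4All downloads the .gguf on first use.
from gpt4all import GPT4All

model = GPT4All("mistral-7b-openorca.Q4_0.gguf")  # GGUF-only since v2.5.0

with model.chat_session():
    reply = model.generate("Name three uses for a local LLM.", max_tokens=128)
    print(reply)
```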
General-purpose models
- 1.1B: TinyDolphin 2.8 1.1B. Takes ~700 MB of RAM; tested on my Pi 4 with 2 GB of RAM. Hallucinates a lot, but works for basic conversation.
- 2.7B: Dolphin 2.6 Phi-2. Takes just over 2 GB of RAM; tested on my 3 GB 32-bit phone via llama.cpp on Termux.
- 7B: Nous Hermes Mistral 7B DPO. Takes ~4-5 GB of RAM depending on
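Small GGUF builds like these are usually run through llama.cpp, as the post's Termux example suggests; a minimal sketch via the llama-cpp-python bindings, where the model path and prompt are placeholders:

```python
# Sketch of running a small GGUF model via llama-cpp-python
# (pip install llama-cpp-python); the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./tinydolphin-2.8-1.1b.Q4_K_M.gguf",  # placeholder file
    n_ctx=2048,  # context window; smaller values save RAM on a Pi or phone
)

out = llm("Q: What is a local LLM? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```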