r/LocalLLaMA - Reddit
The Mac Studio is an absolute monster for inference, but there are a couple of caveats.
- It's slower, pound for pound, than a 4090 when dealing with models the 4090 can fit in its VRAM; a 13B model runs almost twice as fast on the 4090 as on the M2 (a rough fit calculation is sketched below this list).
- The M1 Ultra Mac Studio with 128GB costs far less ($3700 or so) and the inference speed is
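For anyone wondering what "fits in its VRAM" works out to in practice, here is a minimal back-of-the-envelope sketch. The bytes-per-weight figures are approximations for common llama.cpp quants, and the fits_in_vram helper with its 1.2x overhead factor is my own assumption, not a measured number.

```python
# Rough sketch: does a quantized model fit in a GPU's VRAM?
# Bytes-per-weight values are approximations for common llama.cpp quants;
# the 1.2x overhead factor (KV cache, scratch buffers) is an assumption.
BYTES_PER_WEIGHT = {
    "f16": 2.00,
    "q8_0": 1.06,
    "q5_K_M": 0.69,
    "q4_K_M": 0.59,
}

def fits_in_vram(params_b: float, quant: str, vram_gb: float, overhead: float = 1.2) -> bool:
    """True if the quantized weights plus rough overhead fit in vram_gb."""
    weights_gb = params_b * BYTES_PER_WEIGHT[quant]  # billions of params * bytes per weight ~= GB
    return weights_gb * overhead <= vram_gb

print(fits_in_vram(13, "q4_K_M", 24))  # True  -> a 13B quant fits a 4090 and runs at full GPU speed
print(fits_in_vram(70, "q4_K_M", 24))  # False -> a 70B quant spills out of 24 GB
```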
r/LocalLLaMA - Reddit
Trying to get a better understanding of how prompts work in relation to fine-tunes, and to see whether any of them are actually reliable enough to be used in a "production"-type environment.
My end goals are basically
- A reliable AI assistant that I know is safe, secure and private. Any information about myself, my household or my proprietary ideas
r/LocalLLaMA - Reddit
I am using my own hardware at home to infer, train, and fine-tune (or trying to; my training efforts have been pretty disastrous so far, but inference works very well).
My current uses of LLM inference are:
- Asking questions of a RAG system backed by a locally indexed Wikipedia dump, mainly with Marx-3B and PuddleJumper-13B-v2 (the basic retrieve-then-prompt flow is sketched after this list),
- Code co-pilot with
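A minimal sketch of the retrieve-then-prompt flow behind that Wikipedia RAG setup. The toy bag-of-words similarity and the build_prompt helper are illustrative stand-ins: a real setup would use a proper embedding model and send the resulting prompt to the locally served model (e.g. Marx-3B via llama.cpp).

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy "embedding": word counts; a real setup would use a sentence-embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank indexed chunks by similarity to the question and keep the top k.
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(question: str, chunks: list[str]) -> str:
    # Stuff the retrieved chunks into the prompt that goes to the local model.
    context = "\n\n".join(retrieve(question, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"

# chunks would come from the locally indexed Wikipedia dump
chunks = ["Paris is the capital of France.", "The Nile flows through northeastern Africa."]
print(build_prompt("What is the capital of France?", chunks))
```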
r/LocalLLaMA - Reddit
The M3 Max is objectively worse than the M2 for inference.
The M2 Ultra has a higher max RAM size of 192 GB.
The M1 Ultra maxes out at 128 GB.
Of those RAM figures, something like 2/3 is available for inference (rough numbers sketched below).
So I see no reason not to make a general recommendation for the M1 Ultra unless you have some reason you want to run q5_K_M...
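Back-of-the-envelope numbers for that 2/3 rule of thumb. The fraction is the commonly cited default cap on how much unified memory the GPU can wire on macOS, and the ~48 GB figure for a 70B model at q5_K_M is approximate; both are assumptions here, not exact values.

```python
# Rough sketch of how much unified RAM is usable for inference under the ~2/3 rule of thumb.
def usable_for_inference(total_ram_gb: float, fraction: float = 2 / 3) -> float:
    """Unified memory the GPU can use by default (macOS caps wired GPU memory)."""
    return total_ram_gb * fraction

for machine, ram_gb in [("M1 Ultra", 128), ("M2 Ultra", 192)]:
    print(f"{machine}: ~{usable_for_inference(ram_gb):.0f} GB usable for inference")
# M1 Ultra: ~85 GB  -> enough for a 70B model at q5_K_M (~48 GB of weights) plus context
# M2 Ultra: ~128 GB -> headroom for larger quants or much longer context
```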