r/LocalLLaMA - Reddit
The Mac Studio is an absolute monster for inference, but there are a couple of caveats.
- It's slower, pound for pound, than a 4090 when dealing with models the 4090 can fit in its VRAM; a 13B model runs almost twice as fast on the 4090 as on the M2 (a rough fit calculation is sketched below this list).
- The M1 Ultra Mac Studio with 128GB costs far less ($3700 or so) and the inference speed is
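For anyone wondering what "fits in its VRAM" works out to in practice, here is a minimal back-of-the-envelope sketch. The bytes-per-weight figures are approximations for common llama.cpp quants, and the fits_in_vram helper with its 1.2x overhead factor is my own assumption, not a measured number.

```python
# Rough sketch: does a quantized model fit in a GPU's VRAM?
# Bytes-per-weight values are approximations for common llama.cpp quants;
# the 1.2x overhead factor (KV cache, scratch buffers) is an assumption.
BYTES_PER_WEIGHT = {
    "f16": 2.00,
    "q8_0": 1.06,
    "q5_K_M": 0.69,
    "q4_K_M": 0.59,
}

def fits_in_vram(params_b: float, quant: str, vram_gb: float, overhead: float = 1.2) -> bool:
    """True if the quantized weights plus rough overhead fit in vram_gb."""
    weights_gb = params_b * BYTES_PER_WEIGHT[quant]  # billions of params * bytes per weight ~= GB
    return weights_gb * overhead <= vram_gb

print(fits_in_vram(13, "q4_K_M", 24))  # True  -> a 13B quant fits a 4090 and runs at full GPU speed
print(fits_in_vram(70, "q4_K_M", 24))  # False -> a 70B quant spills out of 24 GB
```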
r/LocalLLaMA - Reddit
Trying to get a better understanding of how prompts work in relation to fine-tunes, and to see whether any of them are actually reliable enough to be used in a "production"-type environment.
My end goals are basically
- A reliable AI assistant that I know is safe, secure and private. Any information about myself, my household or my proprietary ideas
r/LocalLLaMA - Reddit
I am using my own hardware at home to infer, train, and fine-tune (or trying to; my training efforts have been pretty disastrous so far, but inference works very well).
My current uses of LLM inference are:
- Asking questions of a RAG system backed by a locally indexed Wikipedia dump, mainly with Marx-3B and PuddleJumper-13B-v2 (the basic retrieve-then-prompt flow is sketched after this list),
- Code co-pilot with
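A minimal sketch of the retrieve-then-prompt flow behind that Wikipedia RAG setup. The toy bag-of-words similarity and the build_prompt helper are illustrative stand-ins: a real setup would use a proper embedding model and send the resulting prompt to the locally served model (e.g. Marx-3B via llama.cpp).

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy "embedding": word counts; a real setup would use a sentence-embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank indexed chunks by similarity to the question and keep the top k.
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(question: str, chunks: list[str]) -> str:
    # Stuff the retrieved chunks into the prompt that goes to the local model.
    context = "\n\n".join(retrieve(question, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"

# chunks would come from the locally indexed Wikipedia dump
chunks = ["Paris is the capital of France.", "The Nile flows through northeastern Africa."]
print(build_prompt("What is the capital of France?", chunks))
```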
r/LocalLLaMA - Reddit
The M3 Max is objectively worse than the M2 for inference.
The M2 Ultra has a higher max RAM size of 192 GB.
The M1 Ultra maxes out at 128 GB.
Of those RAM figures, something like 2/3 is available for inference (rough numbers sketched below).
So I see no reason not to make a general recommendation for the M1 Ultra unless you have some reason you want to run q5_K_M...
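Back-of-the-envelope numbers for that 2/3 rule of thumb. The fraction is the commonly cited default cap on how much unified memory the GPU can wire on macOS, and the ~48 GB figure for a 70B model at q5_K_M is approximate; both are assumptions here, not exact values.

```python
# Rough sketch of how much unified RAM is usable for inference under the ~2/3 rule of thumb.
def usable_for_inference(total_ram_gb: float, fraction: float = 2 / 3) -> float:
    """Unified memory the GPU can use by default (macOS caps wired GPU memory)."""
    return total_ram_gb * fraction

for machine, ram_gb in [("M1 Ultra", 128), ("M2 Ultra", 192)]:
    print(f"{machine}: ~{usable_for_inference(ram_gb):.0f} GB usable for inference")
# M1 Ultra: ~85 GB  -> enough for a 70B model at q5_K_M (~48 GB of weights) plus context
# M2 Ultra: ~128 GB -> headroom for larger quants or much longer context
```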