r/MachineLearning - Reddit
- It's slower, pound for pound, than a 4090 when dealing with models the 4090 can fit in its VRAM. A 13B model on the 4090 runs almost twice as fast as on the M2.
- The M1 Ultra Mac Studio with 128 GB costs far less ($3700 or so) and the inference speed is…
r/LocalLLaMA - Reddit
Nicolay Gerold added
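The speed claim above can be sanity-checked with a common heuristic: single-stream decoding is usually memory-bandwidth bound, so tokens/sec is roughly memory bandwidth divided by the bytes read per token (about the size of the quantized weights). A minimal sketch under assumed ballpark specs (~1008 GB/s for the RTX 4090, ~800 GB/s for the M2 Ultra, ~4.5 bits/weight for a q4-style 13B quant); the quoted ~2x gap is larger than the raw bandwidth ratio because prompt processing is compute-bound and kernel maturity differs.

```python
# Hedged back-of-the-envelope estimate: decoding one token reads every
# weight once, so tokens/sec ~= memory bandwidth / model size in bytes.
def est_decode_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

model_size_gb = 13e9 * 4.5 / 8 / 1e9    # 13B params at ~4.5 bits/weight (assumed)
print(est_decode_tokens_per_sec(1008, model_size_gb))  # RTX 4090, ~1008 GB/s -> ~138 tok/s
print(est_decode_tokens_per_sec(800, model_size_gb))   # M2 Ultra, ~800 GB/s -> ~109 tok/s
```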
The M2 Ultra has a higher maximum RAM of 192 GB; the M1 Ultra maxes out at 128 GB. Of these RAM figures, roughly 2/3 is available for inference.
So I see no reason not to make a general recommendation for the M1 Ultra, unless you have some reason you want to run q5_K_M 1…
r/LocalLLaMA - Reddit
Nicolay Gerold added
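To make that 2/3 figure concrete, here is a minimal fit check. All numbers are estimates, not documented limits: q5_K_M is taken as roughly 5.5 effective bits per weight, with a few GB of overhead assumed for KV cache and runtime.

```python
# Hedged sketch: does a quantized model fit in the ~2/3 of unified
# memory that is usable for inference on these Macs?
def fits(params_b: float, bits_per_weight: float, total_ram_gb: float,
         usable_fraction: float = 2 / 3, overhead_gb: float = 6.0) -> bool:
    weights_gb = params_b * bits_per_weight / 8       # weight storage in GB
    return weights_gb + overhead_gb <= total_ram_gb * usable_fraction

# A 70B model at q5_K_M (~5.5 bits/weight effective, assumed):
print(fits(70, 5.5, 128))  # M1 Ultra -> True (~48 GB + overhead vs ~85 GB usable)
print(fits(70, 5.5, 192))  # M2 Ultra -> True, with far more headroom
```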
Understanding the Cost of Generative AI Models in Production
Nicolay Gerold added
r/MachineLearning - Reddit
Nicolay Gerold added
Shreya Shankar • "We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning.
Nicolay Gerold added
…learnings from one experiment into the next, like a guided search to find the best idea (Lg2, Sm4, Lg5). Lg5 described their ideological shift from random search to guided search:
Previously, I tried to do a lot of parallelization. If I focus on one idea, a week at a time, then it boosts my productivity a lot more.
By following a guided search, engineers are, essentially, significantly pruning a large subset of experiment ideas without executing them. While it may seem like there are unlimited computational resources, the search space is much larger, and developer time and energy are limited. At the end of the day, experiments are human-validated and deployed. Mature ML engineers know their personal tradeoff between parallelizing disjoint experiment ideas and pipelining ideas that build on top of each other, ultimately yielding successful deployments.
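The shift described above can be sketched as two search strategies over experiment configurations. A hypothetical illustration, not the paper's code: random search draws disjoint ideas independently (easy to parallelize), while guided search derives each new experiment from the best one so far.

```python
import random

def run_experiment(config: dict) -> float:
    # Placeholder: train and evaluate a model, return a validation score.
    return random.random()

def guided_search(config: dict, mutate, rounds: int = 8):
    """Sequential, hill-climbing-style search: each new experiment
    builds on the learnings (here, the config) of the best one so far."""
    best, best_score = config, run_experiment(config)
    for _ in range(rounds):
        candidate = mutate(best)           # derive the next idea from the best
        score = run_experiment(candidate)
        if score > best_score:             # keep only validated improvements
            best, best_score = candidate, score
    return best, best_score

def random_search(sample, rounds: int = 8):
    """Parallelizable baseline: every experiment idea is drawn independently."""
    return max((run_experiment(c), c) for c in (sample() for _ in range(rounds)))
```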
LinuxSpinach •
^ this. And especially classification as a task, because businesses don’t want to pay llm buck…
r/MachineLearning - Reddit
Nicolay Gerold added
We're doing NER on hundreds of millions of documents in a specialised niche. LLMs are terrible for this. Slow, expensive and horrifyingly inaccurate. Even with agents, pydantic parsing and the like. Supervised methods are the way to go. Hell, I'd take an old school rule based approach over LLMs for this.
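For contrast, the supervised route can be as small as a pretrained token-classification model. A minimal sketch using the Hugging Face transformers pipeline; dslim/bert-base-NER is an illustrative public checkpoint, and a specialized niche would call for a model fine-tuned on in-domain labels.

```python
from transformers import pipeline

# Hedged example: an off-the-shelf supervised NER model. For a
# specialized niche you would fine-tune on in-domain labeled data.
ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",        # illustrative public checkpoint
    aggregation_strategy="simple",      # merge subword pieces into entity spans
)

print(ner("Apple is opening a new office in Berlin."))
# -> [{'entity_group': 'ORG', 'word': 'Apple', ...},
#     {'entity_group': 'LOC', 'word': 'Berlin', ...}]
```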
- Traditional AI - The most secure, understandable, and performant. However, good implementations of traditional AI require that we define the rules behind the system, which makes it infeasible for many of the use cases where the other two techniques thrive.
- Supervised Machine Learning - Middle of the road between traditional AI and Deep Learning. Good when we have…
Devansh • How to Pick between Traditional AI, Supervised Machine Learning, and Deep Learning [Thoughts]
Nicolay Gerold added
Where would I add generative AI? Generative AI has the ease of accessibility of traditional AI, in that people think it is understandable, but it does not actually have that property. It also has the opaque and costly nature of deep learning. Many companies are currently rushing into building with generative AI without any prior foundation in AI and without processes set up to manage it: data ops, DevOps, …
Traditional AI forces you to think about how something works, understand the system, and then define the rules for it. ML lets you use features and feature importance to shortcut some of that understanding. Deep learning allows you to brute-force it. Generative AI allows you to brute-force it without any background in DL.
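A toy sketch of that spectrum (my own illustration with made-up examples, not from the article): traditional AI means writing the rule yourself; supervised ML induces the rule from labeled examples.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Traditional AI: the developer must understand and encode the rule.
def rule_based_sentiment(text: str) -> str:
    positive_words = ("great", "love", "excellent")
    return "positive" if any(w in text.lower() for w in positive_words) else "negative"

# Supervised ML: the rule is learned from labeled data instead.
texts = ["great product, love it", "terrible, broke in a day",
         "excellent battery life", "awful support, total waste"]
labels = ["positive", "negative", "positive", "negative"]
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(rule_based_sentiment("I love this"))   # rule we wrote by hand
print(model.predict(["love the battery"]))   # rule the model induced
```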
Developing Rapidly with Generative AI
Nicolay Gerold added