r/LocalLLaMA - Reddit
My recommendation for anyone looking to do ML work, especially training LLMs, is to use a cloud service like Lambda Labs. You'll spend less time training and you'll still be able to code while it's going on.
The 36GB RAM is dynamically shared between your system and your GPU. If you're planning to run containers and an IDE and a browser alongside yo…
r/MachineLearning - Reddit
Nicolay Gerold added
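The practical question on unified memory is headroom: model weights, KV cache, and everything else on the machine draw from the same pool. A rough budgeting sketch (the headroom and cache figures are assumptions, not from the thread):

```python
def model_ram_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Rough weight footprint of a quantized model, in GB."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

total_gb = 36
os_apps_gb = 12                      # assumed: OS, IDE, browser, containers
weights_gb = model_ram_gb(34, 4)     # e.g. a 34B model at 4-bit ~ 17GB
kv_cache_gb = 3                      # assumed; grows with context length

print(f"free ~ {total_gb - os_apps_gb - weights_gb - kv_cache_gb:.0f}GB")
```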
In 2019, OpenAI announced GPT-2 with this post:
https://t.co/jjP8IXmu8D
Today (~5 years later) you can train your own for ~$672, running on one 8XH100 GPU node for 24 hours. Our latest llm.c post gives the walkthrough in some detail:
https://t.co/XjLWE2P0Hp
Nathan Storey added
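The quoted price is easy to sanity-check: only the $672 and the 8xH100-for-24-hours figures come from the post; the per-GPU rate below is just the implied value:

```python
gpus, hours, cost_usd = 8, 24, 672
gpu_hours = gpus * hours              # 192 GPU-hours
print(cost_usd / gpu_hours)           # 3.5, i.e. ~$3.50 per H100-hour
```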
When we deliver a model, we make sure we don't exceed X seconds of latency in our API. Before even getting into the performance of LLMs for classification, I can tell you that with the currently available tech they are simply infeasible.
LinuxSpinach
^ this. And especially classification as a task, because businesses don't want to pay llm buck…
r/MachineLearning - Reddit
Nicolay Gerold added
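For contrast with LLM-based classification, here is a minimal sketch of the cheap supervised baseline the commenters are pointing at: TF-IDF plus logistic regression (the texts and labels are invented for illustration):

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["refund my order", "app crashes on login", "love the new update"]
labels = ["billing", "bug", "praise"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

# Millisecond-scale inference on CPU; no per-token API bill.
print(clf.predict(["charged twice this month"]))
```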
We're doing NER on hundreds of millions of documents in a specialised niche. LLMs are terrible for this. Slow, expensive and horrifyingly inaccurate. Even with agents, pydantic parsing and the like. Supervised methods are the way to go. Hell, I'd take an old school rule based approach over LLMs for this.
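A sketch of what the supervised route can look like; spaCy is one concrete choice (the commenter names no library, and the model and texts here are assumptions). `nlp.pipe` streams documents in batches, which is what makes hundreds of millions of them tractable:

```python
import spacy

nlp = spacy.load("en_core_web_sm")   # or a model fine-tuned on the niche domain

docs = ["Apple acquired the startup for $2 billion in March."]
for doc in nlp.pipe(docs, batch_size=256):
    print([(ent.text, ent.label_) for ent in doc.ents])
```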
We generally lean towards picking more advanced commercial LLMs to quickly validate our ideas and obtain early feedback from users. Although they may be expensive, the general idea is that if problems can't be adequately solved with state-of-the-art foundational models like GPT-4, then more often than not, those problems may not be addressable usin…
Developing Rapidly with Generative AI
Nicolay Gerold added
General-purpose models
- 1.1B: TinyDolphin 2.8 1.1B. Takes ~700MB RAM; tested on my Pi 4 with 2 gigs of RAM. Hallucinates a lot, but works for basic conversation.
- 2.7B: Dolphin 2.6 Phi-2. Takes a bit over 2GB RAM; tested on my 3GB 32-bit phone via llama.cpp on Termux.
- 7B: Nous Hermes Mistral 7B DPO. Takes ~4-5GB RAM depending on context.
r/LocalLLaMA - Reddit
Nicolay Gerold added
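To try one of the models listed above locally, here is a minimal llama-cpp-python sketch (the GGUF filename is a placeholder; pick a quantization that fits the RAM figures in the list):

```python
from llama_cpp import Llama

llm = Llama(model_path="nous-hermes-2-mistral-7b-dpo.Q4_K_M.gguf", n_ctx=2048)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```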
If you made a thousand versions of an LLM, each good at a different thing, and you had to load each of those into the GPUs and serve them, it would become very expensive. The big holy grail right now that everybody's looking for is: are there techniques where you can just make small modifications and get really good results? There…
Sarah Wang • What Builders Talk About When They Talk About AI | Andreessen Horowitz
Nicolay Gerold added
PEFT in a nutshell.
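The "small modifications" idea in practice: LoRA via Hugging Face's peft library trains and stores only low-rank adapter weights, so a thousand task variants can share one frozen base model. A minimal sketch (model name and target modules are assumptions for illustration):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()   # typically <1% of the base model's weights
```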