Understanding the Cost of Generative AI Models in Production
A solution is to self-host an open-source or custom fine-tuned LLM. Opting for a self-hosted model can reduce costs dramatically, but it comes with additional development time, maintenance overhead, and possible performance implications, so these trade-offs need to be weighed carefully.
Developing Rapidly with Generative AI
Nicolay Gerold added
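A rough way to see where that trade-off flips is a break-even calculation. Every price and throughput number in this sketch is an invented assumption, not a quote from any provider:

```python
# Back-of-the-envelope break-even for API vs. self-hosted inference.
# Every number below is an illustrative assumption, not a real price.

API_USD_PER_1K_TOKENS = 0.002      # assumed blended hosted-API price
GPU_USD_PER_HOUR = 2.0             # assumed rental price for one inference GPU
TOKENS_PER_GPU_HOUR = 2_000_000    # assumed sustained self-hosted throughput
OVERHEAD_USD_PER_MONTH = 4_000.0   # assumed extra dev/maintenance burden

def api_cost(tokens: float) -> float:
    return tokens / 1_000 * API_USD_PER_1K_TOKENS

def self_hosted_cost(tokens: float) -> float:
    return tokens / TOKENS_PER_GPU_HOUR * GPU_USD_PER_HOUR + OVERHEAD_USD_PER_MONTH

for tokens in (1e8, 1e9, 1e10):  # monthly token volume
    print(f"{tokens:>14,.0f} tokens/mo  API ${api_cost(tokens):>9,.0f}  "
          f"self-hosted ${self_hosted_cost(tokens):>9,.0f}")
```

With these made-up numbers, self-hosting only wins past a few billion tokens per month; below that, the fixed overhead dominates, which is exactly the trade-off the quote is pointing at.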
One major challenge is the inference time of the models. While models like ChatGPT have improved in speed, they still take time to process information. When dealing with a large number of agents, there can be significant latency in real-time interactions. Optimizations and fine-tuning will be necessary to make the models faster and more efficient. …
Sandhya Hegde • Autonomous AI agents could change the world, but what do they actually do well?
Darren LI added
Actually, this is disputed: the cost of API calls is crucial at this stage, since even a simple one-sentence action burns roughly 40–75k tokens.
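Taking that 40–75k-token figure at face value, the per-action cost is simple arithmetic. The per-token price below is an assumed placeholder, not any provider's actual rate:

```python
# Cost of one agent action at the quoted 40-75k tokens per action.
# The price is an assumed placeholder, not a real provider rate.
USD_PER_1K_TOKENS = 0.01

for tokens in (40_000, 75_000):
    per_action = tokens / 1_000 * USD_PER_1K_TOKENS
    print(f"{tokens:,} tokens -> ${per_action:.2f} per action, "
          f"${per_action * 10_000:,.0f} per 10k actions")
```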
Setting up the necessary machine learning infrastructure to run these big models is another challenge. We need a dedicated model server for running model inference (using frameworks like Triton or vLLM), powerful GPUs to run everything robustly, and configurability in our servers to make sure they're high throughput and low latency. Tuning the in…
Developing Rapidly with Generative AI
Nicolay Gerold added
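For concreteness, here is a minimal sketch using vLLM, one of the two frameworks named above; the model name and sampling settings are placeholders:

```python
# Minimal offline-batching sketch with vLLM, one of the model servers
# mentioned above. Model name and sampling settings are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # any HF-format model
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Summarize the trade-offs of self-hosting an LLM.",
    "List three causes of tail latency in model serving.",
]
# vLLM batches these requests internally (continuous batching),
# which is what buys high throughput on a single GPU.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```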
For example, they have lower gross margins. Anecdotally, we have seen gross margins of ML companies at around 50–60% vs. 60–80% for comparable SaaS businesses. This is due to two factors. Firstly, the hidden cost of the cloud infrastructure: training a single ML model costs thousands of dollars in computing resources. Secondly, many applications requi…
Matilde Giglio • What we look for in AI companies (and why they are different)
sari added
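The margin gap is easiest to see with toy numbers; the revenue and cost figures below are invented purely for illustration:

```python
# Illustrative gross-margin arithmetic behind the 50-60% vs 60-80% claim.
# All figures are invented for illustration.
revenue = 1_000_000.0
saas_cogs = 250_000.0             # hosting + support for a typical SaaS product
ml_cogs = saas_cogs + 200_000.0   # plus GPU compute, the "hidden cloud cost"

for name, cogs in (("SaaS", saas_cogs), ("ML company", ml_cogs)):
    print(f"{name}: gross margin {(revenue - cogs) / revenue:.0%}")
```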
Top considerations when choosing foundation models
Accuracy
Cost
Latency
Privacy
Top challenges when deploying production AI
Serving cost
Evaluation
Infra reliability
Model quality
Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.
Nicolay Gerold added
When we deliver a model, we make sure we don't exceed X seconds of latency in our API. Before even going into the performance of LLMs for classification, I can tell you that with the currently available tech they are just infeasible.
^ this. And especially classification as a task, because businesses don't want to pay llm buck…
LinuxSpinach • r/MachineLearning - Reddit
Nicolay Gerold added
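A latency gate like the one described takes only a few lines of harness code. The endpoint, payload, and the stand-in for "X seconds" below are all hypothetical:

```python
# Tiny latency gate of the kind described above: fail the check if the
# p95 of a batch of calls exceeds the budget. Endpoint and budget are
# hypothetical placeholders.
import statistics
import time
import requests

LATENCY_BUDGET_S = 1.0  # stand-in for the "X seconds" in the quote
ENDPOINT = "https://api.example.com/classify"  # hypothetical endpoint

samples = []
for _ in range(20):
    start = time.perf_counter()
    requests.post(ENDPOINT, json={"text": "probe"}, timeout=10)
    samples.append(time.perf_counter() - start)

p95 = statistics.quantiles(samples, n=20)[18]  # ~95th percentile
assert p95 <= LATENCY_BUDGET_S, f"p95 {p95:.2f}s exceeds {LATENCY_BUDGET_S}s budget"
```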
We're doing NER on hundreds of millions of documents in a specialised niche. LLMs are terrible for this: slow, expensive, and horrifyingly inaccurate, even with agents, pydantic parsing, and the like. Supervised methods are the way to go. Hell, I'd take an old-school rule-based approach over LLMs for this.
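One concrete supervised route for a job like that is a spaCy pipeline. A minimal sketch, assuming the small English model is installed (`python -m spacy download en_core_web_sm`):

```python
# Minimal supervised-NER sketch with spaCy, as one concrete alternative
# to LLM-based extraction. Assumes: pip install spacy &&
# python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

# nlp.pipe streams documents in batches, which is what makes this
# practical at hundreds-of-millions-of-documents scale.
docs = nlp.pipe(["Apple acquired Beats for $3 billion in 2014."], batch_size=256)
for doc in docs:
    for ent in doc.ents:
        print(ent.text, ent.label_)
```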
In 2019, OpenAI announced GPT-2 with this post:
https://t.co/jjP8IXmu8D
Today (~5 years later) you can train your own for ~$672, running on one 8XH100 GPU node for 24 hours. Our latest llm.c post gives the walkthrough in some detail:
https://t.co/XjLWE2P0Hp …
Nathan Storey added
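The quoted ~$672 decomposes cleanly into node-hours times an hourly rate; the per-GPU-hour rate below is inferred by dividing the quoted total, not taken from the post itself:

```python
# Sanity-checking the quoted ~$672: one 8xH100 node for 24 hours.
# The hourly rate is inferred from the quoted total (672 / (8 * 24)),
# not taken from the original post.
gpus = 8
hours = 24
usd_per_gpu_hour = 3.50

total = gpus * hours * usd_per_gpu_hour
print(f"{gpus} GPUs x {hours} h x ${usd_per_gpu_hour}/GPU-h = ${total:.0f}")  # -> $672
```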