Scaling AI Models Like You Mean It
Setting up the necessary machine learning infrastructure to run these big models is another challenge. We need a dedicated model server for running model inference (using frameworks like Triton or vLLM), powerful GPUs to run everything robustly, and configurable servers that deliver high throughput and low latency. Tuning the in…
Developing Rapidly with Generative AI
Nicolay Gerold added
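The throughput/latency tuning the excerpt alludes to largely comes down to batching: servers like Triton and vLLM group concurrent requests so the GPU runs one large batch instead of many small ones. As a toy sketch (not any server's actual API), a dynamic batcher flushes when a batch is full or when the oldest request has waited too long; `max_batch_size` and `max_wait_ms` are the illustrative knobs:

```python
def dynamic_batch(requests, max_batch_size, max_wait_ms):
    """Group (arrival_ms, request) pairs into batches.

    Flush the current batch when it is full, or when the next request
    arrives after the oldest queued request has waited max_wait_ms.
    Toy model: real servers (Triton, vLLM) do this continuously and
    asynchronously; names and knobs here are illustrative.
    """
    batches, current, oldest = [], [], None
    for arrival_ms, req in requests:
        if current and (len(current) >= max_batch_size
                        or arrival_ms - oldest >= max_wait_ms):
            batches.append(current)        # flush: full or waited too long
            current, oldest = [], None
        if oldest is None:
            oldest = arrival_ms            # start the wait clock
        current.append(req)
    if current:
        batches.append(current)            # flush the tail
    return batches

# Three requests arrive close together, one much later:
print(dynamic_batch([(0, "a"), (1, "b"), (2, "c"), (50, "d")],
                    max_batch_size=2, max_wait_ms=10))
# → [['a', 'b'], ['c'], ['d']]
```

A larger `max_wait_ms` raises throughput (bigger batches) at the cost of tail latency; that trade-off is the core of server tuning.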
Several engineers also maintained fallback models to revert to: either older or simpler versions (Lg2, Lg3, Md6, Lg5, Lg6). Lg5 mentioned that it was important to always keep some model up and running, even if they “switched to a less economic model and had to just cut the losses.” Similarly, when doing data science work, both Passi and Jackson…
Shreya Shankar • "We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning.
Nicolay Gerold added
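The fallback pattern the engineers describe can be sketched as a simple ordered cascade: try the preferred model, and on failure fall through to older or simpler ones so that *some* model always answers. This is a minimal illustration, not the paper's implementation; the model callables and names are assumptions:

```python
def infer_with_fallback(prompt, models):
    """Try (name, model_fn) pairs in order of preference.

    Falls back to the next (older/simpler/cheaper) model when one
    fails, mirroring the "always keep some model up" practice.
    Hypothetical interface: each model_fn is any callable that takes
    a prompt and returns a response or raises on failure.
    """
    last_err = None
    for name, model_fn in models:
        try:
            return name, model_fn(prompt)
        except Exception as err:
            last_err = err          # in production: log, alert, continue
    raise RuntimeError("all fallback models failed") from last_err

# Illustrative stand-ins: the large model times out, the small one answers.
def large_model(prompt):
    raise TimeoutError("GPU pool exhausted")

def small_model(prompt):
    return prompt.upper()

print(infer_with_fallback("hello", [("large", large_model),
                                    ("small", small_model)]))
# → ('small', 'HELLO')
```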
You’ve got a vector database that has all the right database fundamentals you require, has the right incremental indexing strategy for your use case, has a good story around your metadata filtering needs, and will keep its index up-to-date with latencies you can tolerate. Awesome.
Your ML team (or maybe OpenAI) comes out with a new version of their…
6 Hard Problems Scaling Vector Search
Nicolay Gerold added
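One of the hard problems the excerpt names is metadata filtering: combining vector similarity with structured predicates. A brute-force sketch shows the semantics (a real vector database does the same thing against an ANN index rather than a linear scan; the function and field names here are illustrative):

```python
def filtered_knn(query, vectors, metadata, predicate, k=3):
    """Nearest-neighbour search with metadata pre-filtering.

    Only vectors whose metadata passes the predicate are scored, then
    the k closest (by squared Euclidean distance) are returned as
    indices. Toy stand-in for a vector DB's filtered ANN query.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    candidates = [(sq_dist(query, v), i)
                  for i, v in enumerate(vectors)
                  if predicate(metadata[i])]
    return [i for _, i in sorted(candidates)[:k]]

vectors = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
metadata = [{"lang": "en"}, {"lang": "de"}, {"lang": "en"}]

# Only English documents are eligible; index 1 is filtered out.
print(filtered_knn((0.0, 0.0), vectors, metadata,
                   predicate=lambda m: m["lang"] == "en", k=2))
# → [0, 2]
```

Whether the engine filters before, during, or after the ANN traversal is exactly the "good story around your metadata filtering needs" question the excerpt raises.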
Top considerations when choosing foundation models
Accuracy
Cost
Latency
Privacy
Top challenges when deploying production AI
Serving cost
Evaluation
Infra reliability
Model quality
Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.
Nicolay Gerold added
The human-centric platform for production ML & AI
Access data easily, scale compute cost-efficiently, and ship to production confidently with fully managed infrastructure, running securely in your cloud.
Infrastructure for ML, AI, and Data Science | Outerbounds
Nicolay Gerold added
Deploying a Generative AI model requires more than a VM with a GPU. It normally includes:
- Container Service: Most often Kubernetes, to run LLM serving solutions like Hugging Face Text Generation Inference or vLLM.
- Compute Resources: GPUs for running the models, CPUs for management services.
- Networking and DNS: Routing traffic to the appropriate service.
Understanding the Cost of Generative AI Models in Production
Nicolay Gerold added
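Put together, the list above roughly maps to a Kubernetes Deployment plus a Service. The sketch below assumes the official `vllm/vllm-openai` container image and its default HTTP port (8000); the names, model, and resource numbers are illustrative, not a recommended production config:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-server                  # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels: {app: llm-server}
  template:
    metadata:
      labels: {app: llm-server}
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest   # assumption: official vLLM image
          args: ["--model", "meta-llama/Llama-3.1-8B-Instruct"]
          resources:
            limits:
              nvidia.com/gpu: 1            # GPU for the model itself
---
apiVersion: v1
kind: Service
metadata:
  name: llm-server
spec:
  selector: {app: llm-server}
  ports:
    - port: 80
      targetPort: 8000                     # vLLM's default serving port
```

The Service is what the "Networking and DNS" bullet points at: inside the cluster, clients reach the model at a stable DNS name while pods come and go.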
In 2020, Jared Kaplan and his collaborators at OpenAI suggested that there was a set of “scaling laws” for neural network models of language; they found that the more data they fed into their neural networks, the better those networks performed. The implication was that we could do better and better AI if we gather more data and apply deep learni…
Gary Marcus • Deep Learning Is Hitting a Wall
Prashanth Narayan added
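The scaling laws Kaplan et al. reported are power laws: loss falls smoothly with scale, but with sharply diminishing returns. A toy sketch of that functional form (the constants here are illustrative, not the paper's fitted values) makes the shape of the curve, and Marcus's "diminishing returns" worry, concrete:

```python
def power_law_loss(compute, c0=1.0, alpha=0.05):
    """Toy Kaplan-style scaling law: L(C) = c0 * C**(-alpha).

    Loss decreases monotonically with compute C, but each 10x of
    compute buys a smaller absolute improvement than the last.
    c0 and alpha are illustrative, not fitted constants.
    """
    return c0 * compute ** (-alpha)

# Each 1000x jump in compute shaves off less loss than the previous one.
for c in (1e3, 1e6, 1e9):
    print(f"compute={c:.0e}  loss={power_law_loss(c):.3f}")
```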