The Many Ways to Deploy a Model | Outerbounds
Top considerations when choosing foundation models
Accuracy
Cost
Latency
Privacy
Top challenges when deploying production AI
Serving cost
Evaluation
Infra reliability
Model quality
Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.
Nicolay Gerold added
Amershi et al. [3] state that software teams “flight” changes or updates to ML models, often by testing them on a few cases prior to live deployment. Our work provides further context into the evaluation and deployment process for production ML pipelines: we found that several organizations, particularly those with many customers, employed a multi... See more
Shreya Shankar • "We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning.
Nicolay Gerold added
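To make the “flighting” idea above concrete, here is a minimal Python sketch of routing a small slice of traffic to a candidate model while the current model keeps serving, and logging paired predictions for review before a full rollout. The model functions, the 5% flight ratio, and the logging format are all illustrative placeholders, not anything taken from the quoted study.

```python
import random

FLIGHT_RATIO = 0.05  # fraction of requests shadowed onto the candidate


def current_model(features: dict) -> float:
    return 0.42  # stand-in for the model already in production


def candidate_model(features: dict) -> float:
    return 0.40  # stand-in for the updated model being flighted


def predict(features: dict, log: list) -> float:
    """Serve the current model; shadow-log the candidate on a small flight."""
    score = current_model(features)
    if random.random() < FLIGHT_RATIO:
        log.append({"current": score, "candidate": candidate_model(features)})
    return score


if __name__ == "__main__":
    flight_log = []
    for _ in range(1_000):
        predict({"user_id": 1}, flight_log)
    print(f"collected {len(flight_log)} paired predictions for review")
```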
In general, I see LLMs being used in two broad categories: data processing, which is more of a worker use case, where latency isn't the biggest issue but rather quality, and user interactions, where latency is a big factor. I think for the latency-sensitive case a faster fallback is necessary. Or you escalate upwards: you first rely on a smaller, more ... See more
Discord - A New Way to Chat with Friends & Communities
Nicolay Gerold added
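A rough sketch of the two categories described above, assuming a small fast model for interactive requests and a larger, slower model for worker jobs, with escalation to the big model only when the small one isn't confident enough. The model functions, the latency budget, and the confidence floor are invented for illustration, not a specific product's API.

```python
import time

LATENCY_BUDGET_S = 0.5   # budget for an interactive request
CONFIDENCE_FLOOR = 0.7   # escalate below this confidence


def small_model(prompt: str) -> tuple[str, float]:
    return f"short answer to: {prompt}", 0.65   # (text, confidence)


def large_model(prompt: str) -> str:
    time.sleep(0.2)                              # stand-in for a slower call
    return f"detailed answer to: {prompt}"


def handle(prompt: str, interactive: bool) -> str:
    if not interactive:
        # Worker / data-processing path: quality over latency.
        return large_model(prompt)
    start = time.monotonic()
    text, confidence = small_model(prompt)
    # Escalate upwards only if unsure and there is still latency budget left.
    if confidence < CONFIDENCE_FLOOR and time.monotonic() - start < LATENCY_BUDGET_S:
        return large_model(prompt)
    return text


if __name__ == "__main__":
    print(handle("summarise this ticket", interactive=True))
    print(handle("re-tag the whole archive", interactive=False))
```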
Several engineers also maintained fallback models for reverting to: either older or simpler versions (Lg2, Lg3, Md6, Lg5, Lg6). Lg5 mentioned that it was important to always keep some model up and running, even if they “switched to a less economic model and had to just cut the losses.” Similarly, when doing data science work, both Passi and Jackson... See more
Shreya Shankar • "We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning.
Nicolay Gerold added
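The “always keep some model up and running” point above can be sketched as a simple fallback chain: try the newest model first and fall back through older or simpler versions on failure, down to a last-resort heuristic. The version callables and the chain itself are placeholders, not a real model-registry API.

```python
def model_v3(x):
    # Newest version; simulate it misbehaving in production.
    raise RuntimeError("v3 is misbehaving")


def model_v2(x):
    # Older version kept warm as a fallback.
    return 0.31


def heuristic(x):
    # Last-resort rule so the endpoint never goes dark.
    return 0.5


FALLBACK_CHAIN = [model_v3, model_v2, heuristic]


def predict(x):
    last_error = None
    for model in FALLBACK_CHAIN:
        try:
            return model(x)
        except Exception as err:     # keep serving, try the next fallback
            last_error = err
    raise RuntimeError("all fallbacks exhausted") from last_error


if __name__ == "__main__":
    print(predict({"feature": 1.0}))  # served by model_v2 after v3 fails
```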
Hi everyone! How do you guys go about choosing the granularity of your ML response? For instance, let's say you have been tasked with predicting the purchase probability for an item, and this is how your merch hierarchy looks:
1) department
2) category
3) sub category
4) item
The trade off here is between granularity and response sparsity ie if you ... See more
Discord - A New Way to Chat with Friends & Communities
Nicolay Gerold added
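A small sketch of the granularity vs. sparsity trade-off raised in the question above: the same purchase events counted at item level and at category level. The toy hierarchy and events are made up purely to show how positives thin out as you move down the hierarchy.

```python
from collections import Counter

# Hypothetical two-level slice of a merch hierarchy: item -> category.
ITEM_TO_CATEGORY = {
    "red_tshirt": "tops", "blue_tshirt": "tops", "hoodie": "tops",
    "jeans": "bottoms", "chinos": "bottoms",
}

purchases = ["red_tshirt", "hoodie", "jeans"]  # observed purchase events

item_counts = Counter(purchases)
category_counts = Counter(ITEM_TO_CATEGORY[item] for item in purchases)

print("item-level positives:    ", dict(item_counts))      # sparse labels
print("category-level positives:", dict(category_counts))  # denser labels
print("items with no signal:    ",
      [i for i in ITEM_TO_CATEGORY if i not in item_counts])
```

The finer the level you predict at, the more nodes end up with little or no signal, which is exactly the response-sparsity problem the question describes.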
However, development time and maintenance can offset these savings. Hiring skilled data scientists, machine learning engineers, and DevOps professionals is expensive and time-consuming. Spending available resources on “reimplementing” solutions hinders innovation and leads to a lack of focus, since you no longer work on improving your model or pro... See more
Understanding the Cost of Generative AI Models in Production
Nicolay Gerold added
Setting up the necessary machine learning infrastructure to run these big models is another challenge. We need a dedicated model server for running model inference (using frameworks like Triton or vLLM), powerful GPUs to run everything robustly, and configurability in our servers to make sure they're high throughput and low latency. Tuning the in... See more
Developing Rapidly with Generative AI
Nicolay Gerold added
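As a rough sketch of the serving setup described above, here is vLLM's offline Python API (the same engine powers its OpenAI-compatible server). The model checkpoint, GPU memory fraction, and prompts are assumptions, and this needs a CUDA GPU plus `pip install vllm` to actually run.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # assumed example checkpoint
    tensor_parallel_size=1,        # spread across more GPUs for bigger models
    gpu_memory_utilization=0.90,   # how much VRAM the engine may claim
)

params = SamplingParams(temperature=0.7, max_tokens=128)

# Batching many prompts in one call is where the throughput/latency tuning
# mentioned above happens: the engine schedules them with continuous batching.
outputs = llm.generate(
    [
        "Summarise the deployment options for LLMs.",
        "List three risks of self-hosting a 70B model.",
    ],
    params,
)
for out in outputs:
    print(out.outputs[0].text.strip())
```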
All of the effort spent deliberating on edge cases and long tails stems from the fact that many junior devs are not actually thinking hard enough about what the experiment should be, and what the metrics should look like.
The goal of building out these probabilistic software systems is not a milestone or a feature. Instead, what we're looking for a... See more
Jason Liu • Tips for probabilistic software - jxnl.co
Nicolay Gerold added