Developing Rapidly with Generative AI
The Importance of Monitoring and Assessing Generative AI Use Cases
Summary:
The Chief Data and Technology Officer of a large marketing services company emailed about the potential of generative AI within the organization.
They prefer a decentralized approach but want to monitor progress. Only 8% of organizations have such systems in production.
Source: HBR IdeaCast • Leading a Workforce Empowered by New AI Tools
Deploying a Generative AI model requires more than a VM with a GPU. It normally includes:
- Container Service: most often Kubernetes, to run LLM serving solutions like Hugging Face Text Generation Inference (TGI) or vLLM.
- Compute Resources: GPUs for running the models, CPUs for the management services.
- Networking and DNS: routing traffic to the appropriate model endpoints.
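As a minimal sketch of what sits at the client side of such a stack, the snippet below builds a request body for an OpenAI-compatible completions endpoint, which both TGI and vLLM can expose. The hostname, model name, and helper function are hypothetical placeholders, not part of any real deployment:

```python
import json

# Hypothetical cluster-internal DNS name for the LLM serving service;
# in a real deployment, the Networking/DNS layer routes this hostname
# to the vLLM or TGI pods running on the container service.
ENDPOINT = "http://llm.internal.example/v1/completions"

def build_request(prompt: str, model: str = "mistral-7b",
                  max_tokens: int = 128) -> dict:
    """Build the JSON body for an OpenAI-compatible completion call.
    The model name is only an illustrative placeholder."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}

# The body would be POSTed to ENDPOINT with Content-Type: application/json.
print(json.dumps(build_request("Summarize this campaign brief.")))
```

The point of the sketch is that everything below this call (scheduling onto GPUs, batching, DNS routing) is handled by the stack described above, not by the client.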
Understanding the Cost of Generative AI Models in Production
In general, I see LLMs used in two broad categories: data processing, a worker-style use case where quality matters more than latency, and user-facing interactions, where latency is a major factor. For the latency-sensitive case, a fast fallback is necessary. Alternatively, you escalate upwards: first rely on a smaller, faster model, and hand off to a larger one only when needed.
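The escalation pattern above can be sketched as follows. The model callables and the quality check are toy stand-ins, assumed here for illustration; in practice they would wrap real LLM endpoints and a real evaluation heuristic:

```python
from typing import Callable

def answer_with_escalation(
    prompt: str,
    fast_model: Callable[[str], str],
    strong_model: Callable[[str], str],
    is_good_enough: Callable[[str], bool],
) -> str:
    """Try the small/fast model first; escalate to the larger model
    only when the draft fails the quality check."""
    draft = fast_model(prompt)
    if is_good_enough(draft):
        return draft                 # low-latency path
    return strong_model(prompt)      # slower, higher-quality fallback

# Toy stand-ins to make the sketch runnable:
fast = lambda p: "short answer"
strong = lambda p: "detailed answer"
good = lambda text: len(text) > 10

print(answer_with_escalation("What changed in the campaign?", fast, strong, good))
```

This keeps the common path cheap and fast while reserving the expensive model for the minority of requests where the draft is not good enough.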