Announcing Together Inference Engine – the fastest inference available
Setting up the necessary machine learning infrastructure to run these big models is another challenge. We need a dedicated model server for running model inference (using frameworks like Triton or vLLM), powerful GPUs to run everything robustly, and configurable servers that deliver high throughput and low latency. Tuning the in...
Developing Rapidly with Generative AI
Nicolay Gerold added
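The clip above mentions tuning servers for high throughput and low latency. A toy sketch of that trade-off (not any vendor's implementation; all timing numbers are made-up assumptions): larger batches amortize fixed per-pass overhead, raising throughput, but every request then waits for the whole batch, raising latency.

```python
# Illustrative only: a hypothetical GPU that processes one batch per
# forward pass. Overhead and per-token costs are invented numbers.

def serve_metrics(batch_size: int,
                  per_batch_overhead_ms: float = 20.0,
                  per_token_ms: float = 0.5,
                  tokens_per_request: int = 128) -> tuple[float, float]:
    """Return (latency_ms_per_request, throughput_req_per_s)."""
    # One pass costs fixed overhead plus time proportional to the
    # total tokens across the batch.
    batch_time_ms = per_batch_overhead_ms + per_token_ms * tokens_per_request * batch_size
    latency_ms = batch_time_ms                      # each request waits for the full batch
    throughput = batch_size / (batch_time_ms / 1000.0)
    return latency_ms, throughput

for bs in (1, 8, 32):
    lat, thr = serve_metrics(bs)
    print(f"batch={bs:2d}  latency={lat:7.1f} ms  throughput={thr:6.1f} req/s")
```

Under these assumed numbers, throughput climbs with batch size while latency grows roughly linearly, which is exactly the knob server tuning has to balance.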
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
Table of Contents
- Introduction
- Key LLM Serving Techniques
- Dynamic SplitFuse: A Novel Prompt and Generation Composition Strategy
- Performance Evaluation
- DeepSpeed-FastGen: Implementation and Usage
- Try out DeepSpeed-FastGen
- Acknowledgements
1. Introduction
Large langu...
microsoft • DeepSpeed-FastGen
Nicolay Gerold added
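The DeepSpeed-FastGen table of contents above lists Dynamic SplitFuse, a strategy that gives each forward pass a fixed token budget, splitting long prompts across passes and fusing short prompts with ongoing generation tokens. A minimal pure-Python sketch of that scheduling idea (my illustration, not the DeepSpeed code; the budget and request sizes are assumptions):

```python
# Toy Dynamic SplitFuse-style scheduler: fill a fixed token budget per
# forward pass from decode tokens and (possibly split) prompt chunks.

def schedule_pass(pending_prompts: list[int], decoding: int, budget: int = 512):
    """pending_prompts: remaining prompt tokens per request.
    decoding: requests that each need one generation token this pass.
    Returns (prompt_chunks, decode_slots, leftover_prompts)."""
    # Decode tokens are one token each and keep generation flowing,
    # so admit them first.
    decode_slots = min(decoding, budget)
    budget -= decode_slots

    chunks, leftover = [], []
    for tokens in pending_prompts:
        take = min(tokens, budget)        # fuse what fits into this pass
        if take:
            chunks.append(take)
            budget -= take
        if tokens - take:
            leftover.append(tokens - take)  # split: finish in a later pass
    return chunks, decode_slots, leftover

chunks, decodes, leftover = schedule_pass([300, 400, 50], decoding=10)
print(chunks, decodes, leftover)
```

With a 512-token budget, the 400-token prompt is split (202 now, 198 later) and the 50-token prompt waits, so every pass runs at a predictable, uniform size, which is the property the technique exploits.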
TL;DR
We are thrilled to announce the release of the fastest voice LLM to date! Experience real-time speech streaming from text in 300ms or less. Dive in and test it using our Playground, the available SDKs, or these Replit demos for both Node.js and Python, plus a ChatGPT integration.
Introduction
At PlayHT, our vision revolves around redefining human int...
Introducing PlayHT 2.0 Turbo ⚡️ - The Fastest Generative AI Text-to-Speech API
Nicolay Gerold added
The Most Affordable Cloud for AI/ML Inference at Scale
Deploy AI/ML production models without headaches on the lowest priced GPUs (starting from $0.02/hr) in the market. Get 10X-100X more inferences per dollar compared to managed services and hyperscalers.
Salad - GPU Cloud | 10k+ GPUs for Generative AI
Nicolay Gerold added
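The Salad clip above claims 10X-100X more inferences per dollar than managed services. A back-of-envelope check of that arithmetic (the per-request rate and the comparison price are assumptions for illustration, not measured figures):

```python
# Cost arithmetic only: at equal requests/hour, inferences-per-dollar
# scales inversely with the hourly GPU price.

def inferences_per_dollar(gpu_cost_per_hr: float, inferences_per_hr: float) -> float:
    return inferences_per_hr / gpu_cost_per_hr

budget_gpu = inferences_per_dollar(0.02, 3600)   # $0.02/hr, assumed 1 req/s
managed    = inferences_per_dollar(2.00, 3600)   # hypothetical $2.00/hr managed GPU
print(budget_gpu / managed)
```

At an assumed $2.00/hr comparison price the ratio comes out to exactly 100x, so the quoted range maps to a 10x-100x price gap, assuming the cheaper GPUs sustain the same request rate.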
Workers AI? It’s another building block that we’re adding to our developer platform - one that helps developers run well-known AI models on serverless GPUs, all on Cloudflare’s trusted global network. As one of the latest additions to our developer platform, it works seamlessly with Workers + Pages, but to make it truly accessible, we’ve made it pl...
Phil Wittig • Workers AI: serverless GPU-powered inference on Cloudflare’s global network
Nicolay Gerold added
The human-centric platform for production ML & AI
Access data easily, scale compute cost-efficiently, and ship to production confidently with fully managed infrastructure, running securely in your cloud.
Infrastructure for ML, AI, and Data Science | Outerbounds
Nicolay Gerold added
4. Introducing Stable LM 3B: Bringing Sustainable, High-Performance Language Models to Smart Devices
Stability AI introduced Stable LM 3B, a high-performing language model designed for smart devices. With 3 billion parameters, it outperforms state-of-the-art 3B models and reduces operating costs and power consumption. The model enables a broader ran...
This AI newsletter is all you need #68
Nicolay Gerold added
We went to OpenAI's office in San Francisco yesterday to ask them all the questions we had on Quivr (YC W24), here is what we learned:
1. Their office is super nice & you can eat damn good croissant in SF!
2. We can expect GPT-3.5 & GPT-4 prices to keep going down
3. A lot of people are using the Assistants API to build their use cases
4. It costs ...