Fireworks Console
November 13, 2023・By Together
The Together Inference Engine is multiple times faster than any other inference service, delivering 117 tokens per second on Llama-2-70B-Chat and 171 tokens per second on Llama-2-13B-Chat.
Today we are announcing Together Inference Engine, the world’s fast…
Announcing Together Inference Engine – the fastest inference available
Nicolay Gerold added
endpoint from any ML model
All the infrastructure required to run AI models with a simple API call
curl -X POST 'https://www.mystic.ai/v3/runs' \
  -H 'Authorization: Bearer YOUR_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{"pipeline_id_or_pointer": "meta/llama2-70B-chat:latest", "input_data": [{"type": "string", "value": "…
Mystic.ai
Nicolay Gerold added
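The truncated curl call above can be sketched as a small Python script. This is a minimal sketch, not Mystic's official client: the prompt text and the exact shape of `input_data` beyond what the snippet shows are assumptions, and `YOUR_TOKEN` is a placeholder.

```python
import json
import urllib.request

# Hypothetical payload mirroring the curl example; the prompt value
# and any fields beyond pipeline_id_or_pointer/input_data are assumed.
payload = {
    "pipeline_id_or_pointer": "meta/llama2-70B-chat:latest",
    "input_data": [{"type": "string", "value": "Hello, how are you?"}],
}

req = urllib.request.Request(
    "https://www.mystic.ai/v3/runs",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer YOUR_TOKEN",  # replace with a real token
        "Content-Type": "application/json",
    },
    method="POST",
)

# To actually submit the run (requires a valid token):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```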
Deploy AI/ML production models without headaches on the lowest priced GPUs (starting from $0.02/hr) in the market. Get 10X-100X more inferences per dollar compared to managed services and hyperscalers.
Salad - GPU Cloud | 10k+ GPUs for Generative AI
Nicolay Gerold added
Blaze | The AI Tool for Teams of One
blaze.ai
✅ Blazing fast (9.9x faster) with a tiny footprint (~45kb installed)
✅ Load balance across multiple models, providers, and keys
✅ Fallbacks make sure your app …
Portkey-AI • GitHub - Portkey-AI/gateway: A Blazing Fast AI Gateway. Route to 100+ LLMs with 1 fast & friendly API.
Nicolay Gerold added
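The gateway features in the checklist above (load balancing and fallbacks) are driven by a routing config. The sketch below builds one in Python; the field names (`strategy`, `mode`, `targets`) follow Portkey's documented config shape as I recall it, but treat the details as illustrative assumptions rather than the definitive schema.

```python
import json

# Illustrative gateway routing config: try the primary provider first,
# fall back to a second one on failure. Provider names and key
# placeholders are assumptions for illustration.
config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {"provider": "openai", "api_key": "OPENAI_KEY"},
        {"provider": "anthropic", "api_key": "ANTHROPIC_KEY"},
    ],
}

# The gateway typically receives this serialized as JSON, e.g. in a
# request header or body alongside the actual LLM call.
serialized = json.dumps(config)
```

Swapping `"fallback"` for a load-balancing mode (with per-target weights) is how the same structure spreads traffic across multiple models, providers, and keys.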
How to Host Powerful AI Models in the Cloud using Groq Cloud & OpenWebUI
youtube.com
# Key Information Summary
## Query on Hosting Large Language Models
- Context: Many users are interested in hosting large language models locally without having access to a powerful GPU or with limited computing resources.
## Overview of Groq
- Groq: Groq is highlighted as an option for hosting large language models remotely.
- Functionality: Groq provides access to large models through an API, allowing users to leverage cloud resources.
- User Action: Users must sign up for GroqCloud, create an API key, and use it within applications like Open WebUI.
## Model Examples
- Llama 3 Model: The example given is Llama 3 with 70 billion parameters, which is too large to host personally (e.g., an Nvidia 4090 GPU can only host a Llama model of around 34 billion parameters).
- Performance:
- Inference time for the Llama 3 model is 888 milliseconds.
- Tokens processed per second: 311.
## Setup Instructions
1. Sign Up: Create an account on GroqCloud.
2. Create API Key: After account setup, generate an API key.
3. Integration:
- Paste the API key into the Open WebUI interface under the admin panel settings.
- Verify the connection to ensure successful integration.
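Once the key is set up, the steps above amount to calling Groq's OpenAI-compatible chat-completions endpoint, which is what Open WebUI does behind the scenes. A minimal sketch using only the standard library; the model identifier is an assumption based on the Llama 3 70B model mentioned in the video, and `YOUR_GROQ_API_KEY` is a placeholder for the key from step 2.

```python
import json
import urllib.request

API_KEY = "YOUR_GROQ_API_KEY"  # generated in step 2 above

# Groq exposes an OpenAI-compatible API; the model id below is assumed.
body = {
    "model": "llama3-70b-8192",
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = urllib.request.Request(
    "https://api.groq.com/openai/v1/chat/completions",
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# With a valid key, sending the request returns a JSON completion:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```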
## Cost and Accessibility
- Free vs Paid:
- A free version of Groq is available, but usage is limited.
- Additional usage requires a paid subscription, which is described as reasonably priced.
## Engagement Call
- User Interaction: Viewers are encouraged to leave comments or suggestions for future content.
## Additional Resources
- Information about pricing plans will be shared through links in the description.
- References to an Open WebUI playlist for further guidance are also provided.
Access data easily, scale compute cost-efficiently, and ship to production confidently with fully managed infrastructure, running securely in your cloud.
Infrastructure for ML, AI, and Data Science | Outerbounds
Nicolay Gerold added
Sonya Huang • Generative AI’s Act Two
Darren Li added