Announcing Together Inference Engine – the fastest inference available
November 13, 2023・By Together
The Together Inference Engine is multiple times faster than any other inference service, with 117 tokens per second on Llama-2-70B-Chat and 171 tokens per second on Llama-2-13B-Chat
Today we are announcing Together Inference Engine, the world's fastest inference service.
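To put the headline numbers in perspective: at 117 tokens per second, a 256-token completion streams in roughly 256 / 117 ≈ 2.2 seconds, and at 171 tokens per second in about 1.5 seconds (the 256-token completion length here is just an illustrative assumption).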
The explosion of generative AI made us pause and consider what is possible now that wasn't a year ago. We tried many ideas that didn't really click, eventually discovering the power of turning every feed and job posting into a springboard to:
Get information faster, e.g., takeaways from a post or the latest updates from a company.
Access data easily, scale compute cost-efficiently, and ship to production confidently with fully managed infrastructure, running securely in your cloud.
Setting up the machine learning infrastructure needed to run these big models is another challenge. We need a dedicated model server for running model inference (using frameworks like Triton or vLLM), powerful GPUs to run everything robustly, and enough configurability in our servers to achieve high throughput and low latency. Tuning the inference server for each model and workload adds further complexity.
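To make the model-server piece concrete, here is a minimal sketch of batched inference with vLLM. The model name, parallelism degree, and sampling settings are illustrative assumptions rather than a recommended configuration.

```python
# Minimal vLLM inference sketch (assumes vLLM is installed and a GPU is available).
# Model name, tensor_parallel_size, and sampling settings are illustrative.
from vllm import LLM, SamplingParams

# Load the model once; tensor_parallel_size shards it across multiple GPUs.
llm = LLM(model="meta-llama/Llama-2-13b-chat-hf", tensor_parallel_size=1)

# Submit prompts as a batch: batching requests is a major driver of throughput.
prompts = [
    "Summarize why low-latency inference matters for chat applications.",
    "List two ways to reduce GPU memory usage during inference.",
]
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)

# generate() returns one RequestOutput per prompt, each with completions.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```

In production this would sit behind an API server, and realized throughput and latency depend heavily on batching behavior and server configuration, which is exactly the tuning burden described above.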