GitHub - michaelfeil/infinity: Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of sentence-transformer models and frameworks.
Why Infinity:
Infinity provides the following features:
- Deploy virtually any SentenceTransformer: deploy the model you know from SentenceTransformers.
- Fast inference backends: The inference server is built on top of torch, fastembed (onnx-cpu) and CTranslate2, getting the most out of your CUDA or CPU hardware.
- Dynamic batching: New embedding requests