Nicolay Gerold
@nicolaygerold
Backend-Tools and Rust
Intuition: Prompt tokens add very little latency to completion calls, because they are processed in parallel during prefill. Completion tokens dominate latency, since they are generated sequentially, one at a time. Longer generation lengths therefore accumulate latency roughly linearly, one decode step per token.
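This intuition can be captured in a minimal back-of-the-envelope latency model. The per-token timings below are illustrative assumptions (not measurements from any particular model or provider): prefill is cheap per prompt token, while each completion token costs a full sequential decode step.

```python
def estimate_latency_ms(prompt_tokens: int, completion_tokens: int,
                        prefill_ms_per_token: float = 0.1,
                        decode_ms_per_token: float = 30.0) -> float:
    """Rough total latency for one completion call.

    Prompt tokens are processed in parallel during prefill (small cost
    per token); completion tokens are generated one at a time (large
    sequential cost per token). The default costs are assumptions.
    """
    prefill = prompt_tokens * prefill_ms_per_token
    decode = completion_tokens * decode_ms_per_token
    return prefill + decode


# Doubling the prompt barely moves total latency...
short_prompt = estimate_latency_ms(prompt_tokens=1000, completion_tokens=100)
long_prompt = estimate_latency_ms(prompt_tokens=2000, completion_tokens=100)

# ...while doubling the completion length roughly doubles it.
long_output = estimate_latency_ms(prompt_tokens=1000, completion_tokens=200)
```

Under these assumed numbers, an extra 1,000 prompt tokens adds about 100 ms, whereas an extra 100 completion tokens adds about 3,000 ms, which is why trimming output length is usually the more effective latency optimization.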
LLM applications backed by Indexify will never answer with outdated information.