Like all machine learning, LLMs turn a logic problem into a statistics problem: instead of people writing the pattern for each possible question by hand, which doesn’t scale, you give the machine a meaningful sample of all the text and data that there is and it works out the patterns for itself, and that does scale (or should do). You get the machi...
In general, I see LLMs being used in two broad categories: data processing, which is more of a worker use case, where quality matters more than latency, and user interactions, where latency is a big factor. For the latency-sensitive case, I think a faster fallback is necessary. Or you escalate upwards: you first rely on a smaller more ...
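To make the "escalate upwards" idea concrete, here is a minimal sketch of how I would picture it. Everything in it is an assumption for illustration: the `Answer` type, the self-reported `confidence` score, the `0.8` threshold, and the two stand-in models are hypothetical, not any real provider's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Answer:
    text: str
    confidence: float  # hypothetical quality score in [0, 1]


# A "model" here is just any callable that maps a prompt to an Answer.
Model = Callable[[str], Answer]


def escalate(prompt: str, tiers: Sequence[Model], threshold: float = 0.8) -> Answer:
    """Try models from fastest to most capable; stop at the first confident answer.

    If no tier reaches the threshold, the answer from the last tier is returned.
    """
    if not tiers:
        raise ValueError("need at least one model tier")
    answer: Optional[Answer] = None
    for model in tiers:
        answer = model(prompt)
        if answer.confidence >= threshold:
            break  # good enough, no need to pay for a slower model
    assert answer is not None
    return answer


# Stand-in models (hypothetical): the small one is only confident on short prompts.
def small_model(prompt: str) -> Answer:
    return Answer("short reply", 0.9 if len(prompt) < 20 else 0.4)


def large_model(prompt: str) -> Answer:
    return Answer("careful reply", 0.95)


if __name__ == "__main__":
    print(escalate("hi", [small_model, large_model]).text)
    print(escalate("a much longer and harder question", [small_model, large_model]).text)
```

In a user-facing setting the ordering gives you low latency on the easy requests, and only the hard ones pay for the bigger model. For a worker-style data processing job you could just pass the large model alone.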