Scalability is crucial - systems need to be designed with the assumption that query volume, document corpus size, indexing complexity etc. could increase by 10x. What works at one scale may completely break at a higher scale.
Sharding the index, either by document or by word, is important to distribute the indexing and querying load across machines.
Multiple indices. Splitting the document corpus up into multiple indices and then routing queries based on some criteria. This means that the search is over a much smaller set of documents rather than the entire dataset. Again, it is not always useful, but it can be helpful for certain datasets. The same approach works with the LLMs themselves.