Systematically Improving Your RAG - jxnl.co
Synthetic data can be used to establish baseline precision and recall metrics for your search system. The simplest kind of synthetic data is to take existing text chunks, generate synthetic questions, and verify that when we query with a synthetic question, the source text chunk is retrieved correctly.
Benefi...
Low-Hanging Fruit for RAG Search - jxnl.co
Nicolay Gerold added
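The synthetic-evaluation loop described in the excerpt above can be sketched roughly as follows. The retriever here is a toy keyword-overlap scorer standing in for a real search backend, and all names (`EvalCase`, `recall_at_k`) are illustrative, not from the source.

```python
# Sketch of the synthetic-eval loop: one synthetic question per chunk
# (written by hand here; in practice an LLM call), retrieve for each
# question, and check the source chunk comes back in the top-k.
from dataclasses import dataclass

@dataclass
class EvalCase:
    chunk_id: str   # the chunk the question was generated from
    question: str   # the synthetic question

chunks = {
    "c1": "refunds are processed within 5 days",
    "c2": "shipping is free over $50",
}

def retrieve(query):
    """Toy retriever: rank chunks by keyword overlap with the query."""
    words = set(query.lower().split())
    return sorted(chunks, key=lambda cid: -len(words & set(chunks[cid].split())))

def recall_at_k(cases, retrieve, k=5):
    """Fraction of synthetic questions whose source chunk is in the top-k."""
    hits = sum(1 for c in cases if c.chunk_id in retrieve(c.question)[:k])
    return hits / len(cases)

cases = [
    EvalCase("c1", "how long do refunds take to process"),
    EvalCase("c2", "is shipping free"),
]
print(recall_at_k(cases, retrieve, k=1))  # -> 1.0 on this toy corpus
```

Because the source chunk for each question is known, recall can be computed without any human labeling, which is what makes this the cheapest baseline to stand up.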
- Multiple indices. Splitting the document corpus up into multiple indices and then routing queries based on some criteria. This means that the search is over a much smaller set of documents rather than the entire dataset. Again, it is not always useful, but it can be helpful for certain datasets. The same approach works with the LLMs themselves.
- Cu
Matt Rickard • Improving RAG: Strategies
Nicolay Gerold added
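The multiple-indices idea above can be sketched as a small query router. The keyword-based `route()` heuristic, the index names, and the documents are all made up for illustration; a classifier or LLM call could replace the heuristic.

```python
# Sketch of routing queries to one of several smaller indices, so each
# search runs over a fraction of the corpus instead of all of it.
indices = {
    "legal": ["terms of service", "privacy policy"],
    "engineering": ["api rate limits", "webhook retries"],
}

# Hypothetical routing table: keyword -> index name.
ROUTES = {"policy": "legal", "terms": "legal",
          "api": "engineering", "webhook": "engineering"}

def route(query):
    for word in query.lower().split():
        if word in ROUTES:
            return ROUTES[word]
    return None  # no match: fall back to searching everything

def search(query):
    target = route(query)
    corpus = (indices[target] if target
              else [d for docs in indices.values() for d in docs])
    return [d for d in corpus if any(w in d for w in query.lower().split())]

print(search("webhook delivery failures"))  # searches only the engineering index
```

The payoff is that a routed query touches only one index; the cost is that a misrouted query can miss relevant documents entirely, which is why the excerpt hedges that this helps only for certain datasets.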
All information tools have to give users some wa...
thesephist.com • Navigate, don't search
- Scalability is crucial - systems need to be designed with the assumption that query volume, document corpus size, indexing complexity etc. could increase by 10x. What works at one scale may completely break at a higher scale.
- Sharding the index, either by document or by word, is important to distribute the indexing and querying load across machines.
Claude
Nicolay Gerold added
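Sharding by document, as described above, can be sketched as follows: each document is assigned to a shard by hashing its id, queries fan out to every shard, and the hits are merged. The shard count, scoring, and function names are illustrative assumptions.

```python
# Sketch of a document-sharded index with scatter-gather querying.
import hashlib

NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]  # shard -> {doc_id: text}

def shard_for(doc_id):
    # Stable hash so the same document always lands on the same shard.
    return int(hashlib.md5(doc_id.encode()).hexdigest(), 16) % NUM_SHARDS

def index(doc_id, text):
    shards[shard_for(doc_id)][doc_id] = text

def query(term):
    # Scatter the query to every shard, then gather and merge the hits.
    hits = []
    for shard in shards:
        hits.extend(doc_id for doc_id, text in shard.items() if term in text)
    return sorted(hits)

index("a", "distributed search systems")
index("b", "cat pictures")
index("c", "search relevance tuning")
print(query("search"))  # -> ['a', 'c']
```

Indexing load spreads across shards, but every query now costs one lookup per shard, which is the trade-off the excerpt's 10x-scale warning is about.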
The goal of building out these probabilistic software systems is not a milestone or a feature. Instead, what we're looking for a...
Jason Liu • Tips for probabilistic software - jxnl.co
Nicolay Gerold added
Shreya Shankar • "We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning.
Nicolay Gerold added
learnings from one experiment into the next, like a guided search to find the best idea (Lg2, Sm4, Lg5). Lg5 described their ideological shift from random search to guided search:

"Previously, I tried to do a lot of parallelization. If I focus on one idea, a week at a time, then it boosts my productivity a lot more."

By following a guided search, engineers are essentially pruning a large subset of experiment ideas without executing them. While it may seem like there are unlimited computational resources, the search space is much larger, and developer time and energy are limited. At the end of the day, experiments are human-validated and deployed. Mature ML engineers know their personal tradeoff between parallelizing disjoint experiment ideas and pipelining ideas that build on top of each other, ultimately yielding successful deployments.
sari added