Sublime
An inspiration engine for ideas
1. Synthetic Data for Baseline Metrics
Synthetic data can be used to establish baseline precision and recall metrics for your reverse search. The simplest kind of synthetic data is to take existing text chunks, generate synthetic questions, and verify that when we query our synthetic questions, the sourced text chunk is retrieved correctly.
Benefi...
Low-Hanging Fruit for RAG Search - jxnl.co
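The check described in this highlight (generate questions from existing chunks, then confirm each question retrieves its source chunk) can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the `search` function is a toy token-overlap retriever standing in for a real search system, and the question/chunk pairs are hand-written placeholders for what would normally be LLM-generated questions. All names here are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(query, chunks, k=3):
    """Toy retriever (an assumption): rank chunks by token overlap with the query."""
    query_counts = Counter(tokenize(query))
    scored = []
    for chunk_id, text in chunks.items():
        chunk_counts = Counter(tokenize(text))
        score = sum(min(query_counts[t], chunk_counts[t]) for t in query_counts)
        scored.append((score, chunk_id))
    # Highest score first; ties broken by chunk id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chunk_id for _, chunk_id in scored[:k]]


def recall_at_k(synthetic_questions, chunks, k=3):
    """Fraction of questions whose source chunk appears in the top-k results."""
    hits = sum(
        1
        for question, source_id in synthetic_questions
        if source_id in search(question, chunks, k)
    )
    return hits / len(synthetic_questions)


# Placeholder corpus; in practice these would be your real text chunks.
chunks = {
    "c1": "Sharding the index distributes querying load across machines.",
    "c2": "Synthetic questions establish baseline precision and recall.",
    "c3": "Customer adoption and retention are measured daily.",
}

# Placeholder (question, source chunk id) pairs; normally generated by a model.
synthetic_questions = [
    ("How is querying load distributed across machines?", "c1"),
    ("What establishes baseline precision and recall?", "c2"),
    ("How often is customer retention measured?", "c3"),
]

baseline = recall_at_k(synthetic_questions, chunks, k=1)
print(f"recall@1 = {baseline:.2f}")
```

Because each synthetic question has exactly one known source chunk, recall@k here is simply the hit rate; swapping in a real retriever only requires replacing `search`.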

- Scalability is crucial - systems need to be designed with the assumption that query volume, document corpus size, indexing complexity, etc. could each increase by 10x. What works at one scale may break completely at a higher scale.
- Sharding the index, either by document or by word, is important to distribute the indexing and querying load across machines.
Claude
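The two sharding strategies named in the note above (by document or by word) can be contrasted with a small in-memory sketch. Everything here is illustrative: the shard count, the `shard_of` routing function, and the whitespace tokenization are assumptions, and a real system would place each shard on a separate machine.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count


def shard_of(key, num_shards=NUM_SHARDS):
    """Route a key to a shard; crc32 is stable across processes, unlike hash() on str."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def build_document_sharded(docs, num_shards=NUM_SHARDS):
    """Each shard holds a complete inverted index for a subset of documents."""
    shards = [defaultdict(set) for _ in range(num_shards)]
    for doc_id, text in docs.items():
        index = shards[shard_of(doc_id, num_shards)]
        for term in text.lower().split():
            index[term].add(doc_id)
    return shards


def build_term_sharded(docs, num_shards=NUM_SHARDS):
    """Each shard holds the full posting list for a subset of terms."""
    shards = [defaultdict(set) for _ in range(num_shards)]
    for doc_id, text in docs.items():
        for term in text.lower().split():
            shards[shard_of(term, num_shards)][term].add(doc_id)
    return shards


def query_document_sharded(shards, term):
    """Document sharding: fan the query out to every shard and merge results."""
    result = set()
    for index in shards:
        result |= index.get(term, set())
    return result


def query_term_sharded(shards, term):
    """Term sharding: only the shard that owns the term is consulted."""
    owner = shards[shard_of(term, len(shards))]
    return set(owner.get(term, set()))


docs = {
    "d1": "index sharding basics",
    "d2": "query the index",
    "d3": "scale the corpus",
}

by_document = build_document_sharded(docs)
by_term = build_term_sharded(docs)

print(sorted(query_document_sharded(by_document, "index")))
print(sorted(query_term_sharded(by_term, "index")))
```

Both layouts return the same results; the trade-off is that document sharding touches every shard per query while spreading indexing work evenly, whereas term sharding touches one shard per term but concentrates load on shards that own frequent terms.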
a Paul Graham, http://paulgraham.com/ds.html
Joanne Molesky • Lean Enterprise: How High Performance Organizations Innovate at Scale
Intermittent resource availability.
Josh Kaufman • The First 20 Hours: How to Learn Anything . . . Fast!
We measure customer adoption, usage, & retention on a daily basis. We believe that an iterative model that evolves quickly based on feedback wins.