
When Guessing Isn’t Good Enough

I think the biggest mistake in improving the system is that most people spend too much time on synthesis without first understanding whether the data is being retrieved correctly. To avoid this:
- Create synthetic questions for each text chunk in your database
- Use these questions to test your retrieval system
- Calculate p
Systematically Improving Your RAG - jxnl.co
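The retrieval check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method from the jxnl.co post. Each synthetic question is paired with the chunk it came from, and we measure how often the retriever returns that chunk. The clipped excerpt cuts off before naming its metric ("Calculate p"), so recall@k and mean reciprocal rank are our assumed choices. `make_keyword_retriever` is a hypothetical stand-in for a real search backend, and the canned questions stand in for LLM-generated ones.

```python
from typing import Callable, Dict, List, Tuple

# A retriever takes (query, k) and returns up to k chunk ids, best match first.
Retriever = Callable[[str, int], List[str]]


def build_eval_set(
    chunks: Dict[str, str], generate_questions: Callable[[str], List[str]]
) -> List[Tuple[str, str]]:
    """Pair each synthetic question with the id of the chunk it was written from."""
    return [
        (question, chunk_id)
        for chunk_id, text in chunks.items()
        for question in generate_questions(text)
    ]


def recall_at_k(eval_set: List[Tuple[str, str]], retrieve: Retriever, k: int) -> float:
    """Fraction of questions whose source chunk shows up in the top-k results."""
    if not eval_set:
        return 0.0
    hits = sum(1 for question, chunk_id in eval_set if chunk_id in retrieve(question, k))
    return hits / len(eval_set)


def mean_reciprocal_rank(eval_set: List[Tuple[str, str]], retrieve: Retriever, k: int) -> float:
    """Average of 1/rank of the source chunk (counts 0 when it is not in the top k)."""
    if not eval_set:
        return 0.0
    total = 0.0
    for question, chunk_id in eval_set:
        results = retrieve(question, k)
        if chunk_id in results:
            total += 1.0 / (results.index(chunk_id) + 1)
    return total / len(eval_set)


def make_keyword_retriever(chunks: Dict[str, str]) -> Retriever:
    """Hypothetical stand-in for real search: rank chunks by shared lowercase words."""

    def tokens(s: str) -> set:
        for ch in "?.,":
            s = s.replace(ch, " ")
        return set(s.lower().split())

    index = {chunk_id: tokens(text) for chunk_id, text in chunks.items()}

    def retrieve(query: str, k: int) -> List[str]:
        q = tokens(query)
        # sorted() is stable, so ties keep the chunks' original order.
        ranked = sorted(index, key=lambda cid: len(q & index[cid]), reverse=True)
        return ranked[:k]

    return retrieve


# Toy corpus; the canned questions stand in for LLM-generated synthetic questions.
data = [
    ("c1", "Postgres supports JSONB columns for semi-structured data.", ["Which database supports JSONB columns?"]),
    ("c2", "Redis keeps its dataset in memory for low latency.", ["Why is Redis low latency?"]),
    ("c3", "Kafka stores events in partitioned, replicated logs.", ["How is event data made durable?"]),
]
chunks = {cid: text for cid, text, _ in data}
canned = {text: qs for _, text, qs in data}

eval_set = build_eval_set(chunks, lambda text: canned[text])
retrieve = make_keyword_retriever(chunks)
for k in (1, 3):
    print(f"recall@{k} = {recall_at_k(eval_set, retrieve, k):.2f}")
print(f"MRR@3 = {mean_reciprocal_rank(eval_set, retrieve, 3):.2f}")
```

This prints recall@1 = 0.67, recall@3 = 1.00 and MRR@3 = 0.78. The Kafka question paraphrases its chunk without sharing any words, so the keyword retriever ranks it third. This is the kind of retrieval failure the excerpt suggests catching before spending effort on synthesis. Swapping in a different retriever only requires another function with the same `(query, k) -> ids` shape.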
Search is not just getting the answer
Second, using an LLM to seek and find information severely limits your ability to learn. Chirag Shah and Emily M. Bender argue in their 2022 paper Situating Search that using language models for search is “flawed in technical and conceptual terms.”
such approaches miss the big picture of why people seek informati...
Not Another Chatbot! - Ben Tsai
In addition to insider bullishness, I think there’s a strong intuitive case for why it should be possible to find ways to train models with much better sample efficiency (algorithmic improvements that let them learn more from limited data). Consider how you or I would learn from a really dense math textbook:
- What a modern LLM does during training i