r/LocalLLaMA - Reddit

Query the RAG anyway and let the LLM itself chose whether to use the the RAG context or its built in knowledge

Query the RAG but only provide the result to the LLM if it meets some level of relevancy (ie embedding distance) to the question

Run the LLM both on it's own and with the RAG response, use a heuristic (or another LLM) to pick the best answer