arXiv:2405.02048v1 [cs.IR] 3 May 2024
ColBERT is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.
Figure 1: ColBERT's late interaction, efficiently scoring the fine-grained similarity between a query and a passage.
As Figure 1 illustrates, ColBERT relies on fine-grained contextual late interaction: it encod...
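The late-interaction scoring mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the MaxSim formulation commonly associated with ColBERT (for each query token, take the best-matching passage token, then sum); the names `maxsim_score` and `dot` and the toy 2-d vectors are hypothetical and not taken from the ColBERT codebase.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: List[Vector], passage_vecs: List[Vector]) -> float:
    """Sum, over query tokens, of the highest similarity to any passage token."""
    if not passage_vecs:
        return 0.0
    return sum(max(dot(q, d) for d in passage_vecs) for q in query_vecs)


# Toy token embeddings, made up for illustration (not produced by BERT).
query = [[1.0, 0.0], [0.0, 1.0]]
passage_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
passage_b = [[0.5, 0.5], [0.5, 0.5]]

score_a = maxsim_score(query, passage_a)  # each query token finds an exact match
score_b = maxsim_score(query, passage_b)  # each query token finds only a partial match
```

Because passage token vectors do not depend on the query, they can be computed ahead of time and stored, which is one plausible reading of why this style of scoring scales to large collections.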
stanford-futuredata • GitHub - stanford-futuredata/ColBERT: Stanford ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22)
Nicolay Gerold added
If all of these research ambitions were to come to fruition, the resulting system would be a very early version of the system that we envisioned in the introduction. That is, the resulting system would be able to provide domain expert answers to a wide range of information needs in a way that neither modern IR systems, question answering systems, o...
Donald Metzler • Rethinking Search: Making Domain Experts out of Dilettantes
Benjamin Searle added
Sounds fancy. Why do we care? GAR involves taking the source documents and having an LLM enrich them, prior to indexing. For example, the LLM might...
- Generate titles for documents that are missing them
- Standardize author names/formats
- Extract dates, URLs, citations and other elements that might be valuable to search as separate fields
- Create...
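The enrichment step described above can be sketched as a pre-indexing pass over each document. This is only an illustration: the excerpt has an LLM doing the work, whereas this sketch swaps in regular expressions and a first-line heuristic so that it runs without a model. The field names (`title`, `body`, `urls`, `dates`) and the function `enrich` are hypothetical, and only ISO-style dates are recognised.

```python
import re
from typing import Dict

# Simplified patterns, assumed for illustration only.
URL_RE = re.compile(r"https?://[^\s)>\]]+")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def enrich(doc: Dict[str, str]) -> Dict[str, object]:
    """Return a copy of doc with extra searchable fields added before indexing."""
    body = doc.get("body", "")
    enriched: Dict[str, object] = dict(doc)
    if not doc.get("title"):
        # Stand-in for an LLM-generated title: first non-empty line, truncated.
        first_line = next((ln.strip() for ln in body.splitlines() if ln.strip()), "")
        enriched["title"] = first_line[:60]
    # Extracted elements become separate fields the index can search on.
    enriched["urls"] = URL_RE.findall(body)
    enriched["dates"] = DATE_RE.findall(body)
    return enriched


doc = {
    "title": "",
    "body": "Late interaction notes\nPublished 2024-05-03, see https://example.com/colbert for details.",
}
record = enrich(doc)
```

In a real pipeline the heuristics would be replaced by model calls, but the shape stays the same: enrich first, then index the enriched record.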
Feed | LinkedIn
We envision using the same corpus model as a multi-task learner for multiple IR tasks. To this end, once a corpus model has been trained, it can of course be used for the most classical of all IR tasks – document retrieval. However, by leveraging recent advances in multi-task learning, such a model can very likely be applied to a diverse range of t...
Donald Metzler • Rethinking Search: Making Domain Experts out of Dilettantes
Benjamin Searle added
- Multiple indices. Splitting the document corpus up into multiple indices and then routing queries based on some criteria. This means that the search is over a much smaller set of documents rather than the entire dataset. Again, it is not always useful, but it can be helpful for certain datasets. The same approach works with the LLMs themselves.
- Cu
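The multiple-indices idea above can be sketched as a small router in front of several sub-indices. This is a toy illustration under stated assumptions: the index names, the keyword-based routing criterion, and the word-overlap matching are all hypothetical stand-ins for whatever criteria and retrieval a real system would use.

```python
import re
from typing import Dict, List

# Hypothetical sub-indices, each holding a small slice of the corpus.
INDICES: Dict[str, List[str]] = {
    "legal": ["Contract termination clauses", "Data privacy regulation summary"],
    "engineering": ["Index sharding guide", "Query latency tuning notes"],
}

# Hypothetical routing criterion: keywords that point at each sub-index.
ROUTING_KEYWORDS: Dict[str, List[str]] = {
    "legal": ["contract", "privacy", "regulation"],
    "engineering": ["index", "latency", "sharding"],
}


def words(text: str) -> set:
    """Lowercased word set, used for both routing and matching."""
    return set(re.findall(r"\w+", text.lower()))


def route(query: str) -> List[str]:
    """Pick the sub-indices whose keywords appear in the query; else search all."""
    q = words(query)
    matched = [name for name, kws in ROUTING_KEYWORDS.items() if q & set(kws)]
    return matched or list(INDICES)


def search(query: str) -> List[str]:
    """Search only the routed sub-indices using simple word overlap."""
    q = words(query)
    return [doc for name in route(query) for doc in INDICES[name] if q & words(doc)]


routed = route("How do I reduce query latency?")
results = search("How do I reduce query latency?")
```

The fallback to all indices reflects the excerpt's caveat that routing is not always useful: when the criterion gives no signal, the sketch simply searches everything.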
Matt Rickard • Improving RAG: Strategies
Nicolay Gerold added
Michael Iversen added
Cross-Encoder for Hallucination Detection
This model was trained using the SentenceTransformers Cross-Encoder class.
The model outputs a probability from 0 to 1, 0 being a hallucination and 1 being factually consistent.
The predictions can be thresholded at 0.5 to predict whether a document is consistent with its source.
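The thresholding rule above can be sketched as follows. The scores here are made-up placeholders rather than real model outputs, the helper name `is_consistent` is hypothetical, and treating a score of exactly 0.5 as consistent is an assumption, since the excerpt does not say which side the boundary falls on.

```python
from typing import List

THRESHOLD = 0.5  # cut-off stated in the model card excerpt


def is_consistent(score: float, threshold: float = THRESHOLD) -> bool:
    """True if the score is at or above the threshold (boundary handling is assumed)."""
    return score >= threshold


# Placeholder scores, standing in for the model's probabilities.
scores: List[float] = [0.91, 0.12, 0.50]
labels = [is_consistent(s) for s in scores]
```

Keeping the threshold as a parameter makes it easy to trade precision against recall if 0.5 turns out to be too strict or too lenient for a given dataset.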
Training Data
This model is base...
vectara/hallucination_evaluation_model · Hugging Face
Nicolay Gerold added