If all of these research ambitions were to come to fruition, the resulting system would be a very early version of the system that we envisioned in the introduction. That is, the resulting system would be able to provide domain expert answers to a wide range of information needs in a way that neither modern IR systems, question answering systems, o...
The very fact that ranking is a critical component of this paradigm is a symptom of the retrieval system providing users with a selection of potential answers, which induces a rather significant cognitive burden on the user. The desire to return answers instead of ranked lists of results was one of the motivating factors for developing question answerin...
We envision using the same corpus model as a multi-task learner for multiple IR tasks. To this end, once a corpus model has been trained, it can of course be used for the most classical of all IR tasks – document retrieval. However, by leveraging recent advances in multi-task learning, such a model can very likely be applied to a diverse range of t...
Building such domain experts would likely require developing an artificial general intelligence, which is beyond the scope of this paper. Instead, by “domain expert” we specifically mean that the system is capable of producing results (with or without actual “understanding”) that are of the same quality as a human expert in the given domain.
Pre-trained language models (LMs), by contrast, are capable of directly generating prose that may be responsive to an information need, but at present they are *dilettantes* rather than domain experts – they do not have a true understanding of the world, they are prone to hallucinating, and crucially they are incapable of justifying their utterances...
This represents a fundamentally different way of thinking about IR systems. Within the index-retrieve-then-rank paradigm, modeling work (e.g., query understanding, document understanding, retrieval, ranking, etc.) is done on top of the index itself. This results in modern IR systems being composed of a disparate mix of heterogeneous models (e.g., ...
[...] Today’s cutting-edge IR systems are not fundamentally different from classical IR systems developed many decades ago. Indeed, a majority of today’s systems boil down to: (a) building an efficient queryable index for each document in the corpus, (b) retrieving a set of candidates for a given query, and (c) computing a relevance score for each ...
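As a rough illustration of the three steps named above, (a) indexing, (b) candidate retrieval, and (c) relevance scoring, the following minimal Python sketch may help. It is not taken from the source: the toy corpus, the function names (`build_index`, `retrieve`, `score`, `search`), and the choice of a simple TF-IDF sum as the scoring function are all assumptions made for illustration, and a production system would use far more sophisticated components at each stage.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; document ids and texts are made up for illustration.
CORPUS = {
    "d1": "neural ranking models for document retrieval",
    "d2": "question answering systems return answers",
    "d3": "classical document retrieval with an inverted index",
}


def tokenize(text):
    """Lowercase whitespace tokenization (an assumption, kept deliberately simple)."""
    return text.lower().split()


def build_index(corpus):
    """(a) Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in corpus.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def retrieve(index, query):
    """(b) Candidate set: every document containing at least one query term."""
    candidates = set()
    for term in tokenize(query):
        candidates.update(index.get(term, {}))
    return candidates


def score(index, query, doc_id, num_docs):
    """(c) Relevance score: a TF-IDF sum, standing in for any ranking model."""
    total = 0.0
    for term in tokenize(query):
        postings = index.get(term, {})
        if doc_id in postings:
            idf = math.log(1 + num_docs / len(postings))
            total += postings[doc_id] * idf
    return total


def search(corpus, query, k=2):
    """Run the full pipeline and return the top-k document ids."""
    index = build_index(corpus)
    candidates = retrieve(index, query)
    # Sort by descending score; ties are broken by document id for determinism.
    ranked = sorted(
        candidates,
        key=lambda d: (-score(index, query, d, len(corpus)), d),
    )
    return ranked[:k]
```

Calling `search(CORPUS, "document retrieval")` returns a ranked list of document ids rather than an answer, which is exactly the property the excerpts above identify as the source of the user's cognitive burden.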