To replace indexes with a single, consolidated model, it must be possible for the model itself to have knowledge about the universe of document identifiers, in the same way that traditional indexes do. One way to accomplish this is to move away from traditional LMs and towards corpus models that jointly model term-term, term-document, and...
[...] Today’s cutting-edge IR systems are not fundamentally different from classical IR systems developed many decades ago. Indeed, a majority of today’s systems boil down to: (a) building an efficient queryable index for each document in the corpus, (b) retrieving a set of candidates for a given query, and (c) computing a relevance score for each...
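The three stages named in the excerpt above (index, retrieve, score) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `build_index`, `retrieve`, `score`, and `search` are hypothetical, the whitespace tokenizer is deliberately naive, and the TF-IDF-style weight is a stand-in for whatever relevance model a real system would use.

```python
import math
from collections import Counter, defaultdict


def build_index(corpus):
    """(a) Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in corpus.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def retrieve(index, query):
    """(b) Candidate set: every document containing at least one query term."""
    candidates = set()
    for term in query.lower().split():
        candidates.update(index.get(term, {}))
    return candidates


def score(index, n_docs, query, doc_id):
    """(c) Toy TF-IDF-style relevance score (an illustrative stand-in)."""
    total = 0.0
    for term in query.lower().split():
        postings = index.get(term, {})
        if doc_id in postings:
            total += postings[doc_id] * math.log(1 + n_docs / len(postings))
    return total


def search(corpus, query):
    """Run the three stages and return doc ids, best match first."""
    index = build_index(corpus)
    candidates = retrieve(index, query)
    return sorted(
        candidates,
        key=lambda doc_id: score(index, len(corpus), query, doc_id),
        reverse=True,
    )


# Hypothetical three-document corpus, used only to exercise the sketch.
corpus = {
    "d1": "neural ranking models",
    "d2": "classical inverted index retrieval",
    "d3": "index compression for retrieval retrieval",
}
ranked = search(corpus, "index retrieval")
```

In this toy corpus, `d1` shares no term with the query and so never enters the candidate set; the scoring stage only orders the documents that the retrieval stage surfaced.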
Pre-trained language models (LMs), by contrast, are capable of directly generating prose that may be responsive to an information need, but at present they are *dilettantes* rather than domain experts – they do not have a true understanding of the world, they are prone to hallucinating, and crucially they are incapable of justifying their utterances...
- The problem is one of content. The misconception is that without deep content, design is reduced to pure style, a bag of dubious tricks. In graphic-design circles, form-follows-function is reconfigured as form-follows-content. If content is the source of form, always preceding it and imbuing it with meaning, form without content (as if that were...
If all of these research ambitions were to come to fruition, the resulting system would be a very early version of the system that we envisioned in the introduction. That is, the resulting system would be able to provide domain expert answers to a wide range of information needs in a way that neither modern IR systems, question answering systems,...