GitHub - MaartenGr/KeyBERT: Minimal keyword extraction with BERT
- ColBERT is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.
Figure 1: ColBERT's late interaction, efficiently scoring the fine-grained similarity between a query and a passage.
As Figure 1 illustrates, ColBERT relies on fine-grained contextual late interaction: it encod...
from GitHub - stanford-futuredata/ColBERT: Stanford ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22) by stanford-futuredata
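The late-interaction step described above can be sketched in a few lines; this is a minimal NumPy illustration of MaxSim scoring, not ColBERT's actual code (function and argument names are made up here):

```python
import numpy as np

def late_interaction_score(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """Score one query/passage pair with ColBERT-style MaxSim.

    query_embs: (num_query_tokens, dim); doc_embs: (num_doc_tokens, dim).
    Rows are assumed L2-normalized, so dot products are cosine similarities.
    """
    sims = query_embs @ doc_embs.T        # (q_tokens, d_tokens) token-pair similarities
    # Each query token keeps only its best-matching passage token (MaxSim),
    # and the per-token maxima are summed into the relevance score.
    return float(sims.max(axis=1).sum())
```

Because passage token embeddings are query-independent, they can be precomputed and indexed offline, which is what makes this scoring cheap at search time.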
Nicolay Gerold added
- Text embeddings are a critical piece of many pipelines, from search to RAG to vector databases and more. Most embedding models are BERT/Transformer-based and typically have short context lengths (e.g., 512 tokens). That's only about two pages of text, but documents can be very long: books, legal cases, TV screenplays, code repositories, etc. can be tens...
from Long-Context Retrieval Models with Monarch Mixer
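A common workaround for the short context limit is to split long documents into overlapping windows before embedding; a rough sketch (the window and stride values are illustrative, not from the linked post):

```python
def chunk_tokens(tokens: list, max_len: int = 512, stride: int = 256) -> list:
    """Split a token sequence into overlapping windows of at most max_len.

    Consecutive windows overlap by `stride` tokens so that text near a
    chunk boundary still appears with some surrounding context.
    """
    chunks = []
    for start in range(0, len(tokens), max_len - stride):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):  # last window already covers the tail
            break
    return chunks
```

Each chunk is then embedded separately, trading one document vector for several, which is exactly the overhead long-context models like Monarch Mixer aim to avoid.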
- Token Embeddings: These are vector representations of tokens, which can be characters, subwords, or other text units. Token embeddings are particularly useful in languages with complex morphology or when handling out-of-vocabulary words. Models like BERT use token embeddings to represent subword units.
Word Embeddings: These are dense vector represe...
from Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]