jinaai/jina-embeddings-v2-base-en · Hugging Face
Jina-ColBERT-v2 is here. https://t.co/FvBbeXsftS Superior retrieval performance vs the original ColBERT-v2 from @stanfordnlp (+6.5%) & our previous jina-colbert-v1-en (+5.4%). Multilingual support for 89 languages and programming languages. User-controlled output embedding sizes (128/96/64-dim) through Matryoshka representation learning, and finally...
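Matryoshka representation learning trains a model so that a leading prefix of its output vector is itself a usable embedding, which is what makes a 128/96/64-dim choice possible without retraining. The following is a minimal sketch, assuming 128 is the full per-token size and that truncated vectors are re-normalized before cosine or dot-product scoring. `truncate_embedding` and `truncate_token_embeddings` are hypothetical helpers for illustration, not part of any Jina API.

```python
import math
from typing import List, Sequence

# Output sizes named in the announcement; 128 is ASSUMED to be the full size.
SUPPORTED_DIMS = (128, 96, 64)


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the leading `dim` components of a Matryoshka-style vector and
    rescale them to unit L2 norm (re-normalizing is an assumption here)."""
    if not 1 <= dim <= len(vec):
        raise ValueError(f"dim must be between 1 and {len(vec)}, got {dim}")
    head = [float(x) for x in vec[:dim]]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero prefix cannot be normalized
    return [x / norm for x in head]


def truncate_token_embeddings(
    token_vecs: Sequence[Sequence[float]], dim: int
) -> List[List[float]]:
    """ColBERT-style models emit one vector per token, so truncate each one."""
    return [truncate_embedding(v, dim) for v in token_vecs]


if __name__ == "__main__":
    # Toy 128-dim vectors for three tokens (made-up numbers, not model output).
    tokens = [[math.sin(t + i) for i in range(128)] for t in range(3)]
    for d in SUPPORTED_DIMS:
        small = truncate_token_embeddings(tokens, d)
        # Prints the size, the vector length, and its squared norm (about 1.0).
        print(d, len(small[0]), round(sum(x * x for x in small[0]), 6))
```

Smaller sizes cut index storage proportionally (64 dims is half of 128 per token vector), typically at some cost in retrieval quality.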
Jina AI • x.com
Large language models (LLMs) made easy, EasyLM is a one-stop solution for pre-training, finetuning, evaluating, and serving LLMs in JAX/Flax. EasyLM can scale up LLM training to hundreds of TPU/GPU accelerators by leveraging JAX's pjit functionality.
Building on top of Hugging Face's transformers and datasets, this repo provides an easy to use and easy...
young-geng • GitHub - young-geng/EasyLM: Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax.
Text embeddings are a critical piece of many pipelines, from search, to RAG, to vector databases and more. Most embedding models are BERT/Transformer-based and typically have short context lengths (e.g., 512 tokens). That's only about two pages of text, but documents can be very long: books, legal cases, TV screenplays, code repositories, etc., can be tens...
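One common workaround for a 512-token limit is to split a long document into overlapping windows and embed each window separately. The following is a minimal sketch, assuming the text is already tokenized into integer ids; `chunk_tokens` and its 512/64 defaults are illustrative choices, not taken from any particular library or model.

```python
from typing import List, Sequence


def chunk_tokens(
    tokens: Sequence[int], max_len: int = 512, overlap: int = 64
) -> List[List[int]]:
    """Split a token sequence into windows of at most `max_len` tokens,
    where consecutive windows share `overlap` tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be in [0, max_len)")
    if not tokens:
        return []
    step = max_len - overlap  # how far each new window advances
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the document
        start += step
    return chunks


if __name__ == "__main__":
    doc = list(range(1200))  # stand-in for 1,200 token ids
    windows = chunk_tokens(doc)
    print([(w[0], w[-1], len(w)) for w in windows])
    # → [(0, 511, 512), (448, 959, 512), (896, 1199, 304)]
```

The overlap lets text near a window boundary appear in two chunks, so it is less likely to be cut off from its context; the cost is extra chunks to embed and store.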