DeepSeek Coder
Text embeddings are a critical piece of many pipelines, from search to RAG to vector databases and more. Most embedding models are BERT/Transformer-based and typically have short context lengths (e.g., 512 tokens). That's only about two pages of text, but documents can be very long: books, legal cases, TV screenplays, code repositories, etc. can be tens…
Long-Context Retrieval Models with Monarch Mixer
Nicolay Gerold added
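The snippet above is cut off, but the constraint it describes is concrete: with a 512-token encoder, a long document has to be split into windows, each window embedded, and the vectors pooled. A minimal sketch of that workaround, assuming the transformers library; the MiniLM checkpoint is illustrative, not named in the post:

```python
# Hypothetical chunk-and-pool embedding for a short-context (512-token) model.
import torch
from transformers import AutoModel, AutoTokenizer

NAME = "sentence-transformers/all-MiniLM-L6-v2"  # illustrative 512-token model
tok = AutoTokenizer.from_pretrained(NAME)
model = AutoModel.from_pretrained(NAME)

def embed_long(text: str, window: int = 510) -> torch.Tensor:
    ids = tok(text, add_special_tokens=False)["input_ids"]
    vecs = []
    with torch.no_grad():
        for i in range(0, len(ids), window):
            chunk = torch.tensor([ids[i:i + window]])
            out = model(input_ids=chunk)
            vecs.append(out.last_hidden_state.mean(dim=1))  # mean-pool tokens
    return torch.cat(vecs).mean(dim=0)  # mean-pool the chunk vectors

doc_vector = embed_long("some very long document ... " * 500)
```

Long-context retrieval models like the Monarch Mixer work above aim to remove this chunking step entirely.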
Matei Zaharia, Omar Khattab, Lingjiao Chen, et al. • The Shift From Models to Compound AI Systems
Nicolay Gerold added
The text embedding set trained by Jina AI, Finetuner team.
Intended Usage & Model Info
jina-embeddings-v2-base-en is an English, monolingual embedding model supporting a sequence length of 8192 tokens.
It is based on a BERT architecture (JinaBert) that supports the symmetric bidirectional variant of ALiBi to allow longer sequence lengths.
The backbone jina-bert-v2-…
jinaai/jina-embeddings-v2-base-en · Hugging Face
Nicolay Gerold added
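For reference, the model card's usage pattern looks roughly like this; a light sketch, with the input text being illustrative (trust_remote_code=True is needed because the custom JinaBert code ships with the checkpoint):

```python
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v2-base-en", trust_remote_code=True
)
# encode() is a convenience method provided by the checkpoint's remote code;
# max_length can be raised up to the full 8192-token ALiBi window.
embeddings = model.encode(["A very long document ..."], max_length=8192)
```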
Deep-ML
deep-ml.com
Methods of fine-tuning an open-source LLM exist ↓
- Continued pre-training: utilize domain-specific data to apply the same pre-training process (next token prediction) on the pre-trained (base) model (see the sketch after this list)
- Instruction fine-tuning: the pre-trained (base) model is fine-tuned on …
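The two bullets compress nicely into code. A minimal sketch, not from the post, using Hugging Face transformers; gpt2 is a stand-in for whatever open-source base model is being tuned:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in base model
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Continued pre-training: the same next-token-prediction objective,
# applied to raw domain-specific text.
raw = tok("Domain-specific corpus text ...", return_tensors="pt")
cpt_loss = model(**raw, labels=raw["input_ids"]).loss

# Instruction fine-tuning: identical loss mechanics, but on prompt/response
# pairs; in practice prompt tokens are often masked out with label -100.
pair = tok("Instruction: summarize X.\nResponse: X is ...", return_tensors="pt")
ift_loss = model(**pair, labels=pair["input_ids"]).loss
# A real run would feed either loss through an optimizer loop or a Trainer.
```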
Shortwave • rajhesh.panchanadhan@gmail.com [Gmail alternative]
Nicolay Gerold added
Data-Juicer: A One-Stop Data Processing System for Large Language Models
Data-Juicer is a one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs. This project is being actively updated and maintained, and we will periodically enhance and add more features and data recipes. We welcome you to join us in pro…
alibaba • GitHub - alibaba/data-juicer: A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! (Chinese tagline: providing higher-quality, richer, more "digestible" data for large language models!)
Nicolay Gerold added
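Data-Juicer's pitch is a chain of configurable operators (filters, deduplicators, cleaners) run over a corpus. The sketch below re-creates that idea in plain Python to make it concrete; it is not Data-Juicer's actual API, whose operators are composed via YAML recipes in the repo:

```python
# Generic illustration of "data juicing": dedupe, then quality-filter.
import hashlib

def length_filter(sample: dict, min_words: int = 10) -> bool:
    """Keep samples with at least min_words words (a crude quality proxy)."""
    return len(sample["text"].split()) >= min_words

def exact_dedup(samples: list) -> list:
    """Drop byte-identical duplicates via content hashing."""
    seen, out = set(), []
    for s in samples:
        h = hashlib.sha256(s["text"].encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            out.append(s)
    return out

corpus = [{"text": "too short"}, {"text": "a longer usable training document " * 3}]
corpus = [s for s in exact_dedup(corpus) if length_filter(s)]
```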
Fine-Tuning for LLM Research by AI Hero
This repo contains the code that will be run inside the container. Alternatively, this code can also be run natively. The container is built and pushed to the repo using GitHub Actions (see below). You can launch the fine-tuning job using the examples in the https://github.com/ai-hero/llm-research-examples pr…
GitHub - ai-hero/llm-research-fine-tuning
Nicolay Gerold added