Models
pair-preference-model-LLaMA3-8B by RLHFlow: A really strong reward model, trained to take in two candidate responses at once; it is the top open reward model on RewardBench (beating one of Cohere's).
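A hedged sketch of how such a pairwise model can be queried: both responses go into one prompt, and the model's next-token logits for "A" vs. "B" decide the preference. The prompt template and preference tokens below are assumptions; check the model card for the exact format.

```python
# Minimal sketch, assuming the model emits a preference token ("A" or "B").
# The pairwise template below is an assumption; consult the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RLHFlow/pair-preference-model-LLaMA3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def prefer(prompt: str, response_a: str, response_b: str) -> str:
    # Hypothetical pairwise template: both candidates in one context window.
    text = f"[CONTEXT] {prompt} [RESPONSE A] {response_a} [RESPONSE B] {response_b}"
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    token_a = tokenizer.convert_tokens_to_ids("A")
    token_b = tokenizer.convert_tokens_to_ids("B")
    return "A" if logits[token_a] > logits[token_b] else "B"
```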
DeepSeek-V2 by deepseek-ai (21B active, 236B total param.): Another strong MoE base model from the DeepSeek team. Some people are questioning the very high MMLU sc…
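A minimal loading sketch with Hugging Face transformers. DeepSeek-V2 ships custom modeling code, hence trust_remote_code=True; and although only ~21B parameters are active per token, all 236B must fit in memory, so multi-GPU sharding via device_map="auto" is assumed here.

```python
# Minimal sketch of loading the MoE base model with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V2"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,   # custom DeepSeek modeling code
    torch_dtype=torch.bfloat16,
    device_map="auto",        # shard all experts across available GPUs
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```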
DiscoLM German 7B v1 - GGUF
Model creator: Disco Research
Original model: DiscoLM German 7B v1
Description
This repo contains GGUF format model files for Disco Research's DiscoLM German 7B v1.
These files were quantised using hardware kindly provided by Massed Compute.
About GGUF
GGUF is a new format introduced by the llama.cpp team on August 21st, 2023. I…
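A minimal sketch of running one of these GGUF files locally with llama-cpp-python (Python bindings for llama.cpp, the project that defines the GGUF format). The filename is an assumption; use whichever quantisation level you downloaded.

```python
# Minimal sketch: load a GGUF quantisation and run a completion.
from llama_cpp import Llama

llm = Llama(
    model_path="discolm_german_7b_v1.Q4_K_M.gguf",  # assumed local file
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers if built with GPU support
)

out = llm("Schreibe eine kurze Zusammenfassung über GGUF.", max_tokens=128)
print(out["choices"][0]["text"])
```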
TheBloke/DiscoLM_German_7b_v1-GGUF · Hugging Face
Nicolay Gerold added 9mo
Stable Beluga 2
Use Stable Chat (Research Preview) to test Stability AI's best language models for free
Model Description
Stable Beluga 2 is a Llama2 70B model fine-tuned on an Orca-style dataset.
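A minimal prompting sketch with transformers. The "### System / ### User / ### Assistant" layout below follows the Orca-style format described on the model card; treat it as an assumption and verify against the card.

```python
# Minimal sketch of prompting StableBeluga2, assuming the Orca-style format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/StableBeluga2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = (
    "### System:\nYou are a helpful assistant.\n\n"
    "### User:\nExplain what an Orca-style dataset is in one sentence.\n\n"
    "### Assistant:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```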
stabilityai/StableBeluga2 · Hugging Face
Nicolay Gerold added 10mo
Model description
Nous Hermes 2 Mixtral 8x7B DPO is the new flagship Nous Research model trained over the Mixtral 8x7B MoE LLM.
The model was trained on over 1,000,000 entries of primarily GPT-4-generated data, as well as other high-quality data from open datasets across the AI landscape, achieving state-of-the-art performance on a variety of tasks.
…
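A minimal sketch using the tokenizer's built-in chat template; Nous Hermes 2 models are trained on ChatML, and the Hugging Face tokenizer is assumed to ship a matching chat_template.

```python
# Minimal sketch: ChatML-style chat via the tokenizer's chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is DPO in one sentence?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(input_ids, max_new_tokens=100)
# Decode only the newly generated tokens.
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```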
NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO · Hugging Face
Nicolay Gerold added 10mo
Glaive-coder-7b
Glaive-coder-7b is a 7B-parameter code model trained on a dataset of ~140k programming-related problems and solutions generated from Glaive's synthetic data generation platform.
The model is fine-tuned on the CodeLlama-7b model.
Usage:
The model is trained to act as a code assistant, and can do both single instruction following and mult…
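A hedged single-instruction sketch: since the model is fine-tuned from CodeLlama-7b, a Llama-style [INST] wrapper is assumed below; check the model card for the exact chat format.

```python
# Minimal sketch, assuming a Llama-style instruction wrapper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "glaiveai/glaive-coder-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "[INST] Write a Python function that checks if a string is a palindrome. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```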
glaiveai/glaive-coder-7b · Hugging Face
Nicolay Gerold added 10mo
AI That Quacks: Introducing DuckDB-NSQL-7B, A LLM for DuckDB (2024/01/25, by Till Döhmen and Jordan Tigani). What does a database have to do with AI, anyway? After a truly new technology arrives, it makes the future a lot harder to predict. The one thing you can be sure of is…
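A hedged end-to-end sketch of the idea: generate DuckDB SQL from a natural-language question, then execute it locally with the duckdb package. The repo name and prompt layout (schema, then the question as a SQL comment) are assumptions; see the model card for the exact format.

```python
# Minimal sketch: text-to-SQL with DuckDB-NSQL, executed against DuckDB.
import duckdb
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "motherduckdb/DuckDB-NSQL-7B-v0.1"  # assumed repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

schema = "CREATE TABLE taxi (fare DOUBLE, tip DOUBLE, passengers INTEGER);"
question = "What is the average tip for rides with more than 2 passengers?"
prompt = f"{schema}\n\n-- {question}\nSELECT"  # assumed prompt layout

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
completion = tokenizer.decode(
    out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
sql = "SELECT" + completion

con = duckdb.connect()
con.sql(schema)      # create the (empty) demo table
print(con.sql(sql))  # run the generated query
```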
Till Döhmen • AI That Quacks: Introducing DuckDB-NSQL-7B, A LLM for DuckDB
Nicolay Gerold added 10mo
Text embeddings are a critical piece of many pipelines, from search, to RAG, to vector databases and more. Most embedding models are BERT/Transformer-based and typically have short context lengths (e.g., 512). That's only about two pages of text, but documents can be very long – books, legal cases, TV screenplays, code repositories, etc can be tens…
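A minimal sketch of the workaround that long-context embedding models aim to make unnecessary: chunking a long document into short windows, embedding each with a standard 512-token-class model, and mean-pooling into a single vector. The model choice is illustrative.

```python
# Minimal sketch: chunk-and-pool embedding of a long document.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def embed_long(text: str, chunk_words: int = 256) -> np.ndarray:
    words = text.split()
    # Crude word-based chunking as a stand-in for true token-based chunking.
    chunks = [
        " ".join(words[i:i + chunk_words])
        for i in range(0, len(words), chunk_words)
    ]
    vecs = model.encode(chunks, normalize_embeddings=True)
    doc_vec = vecs.mean(axis=0)            # pool chunk vectors
    return doc_vec / np.linalg.norm(doc_vec)
```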
Long-Context Retrieval Models with Monarch Mixer
Nicolay Gerold added 10mo
ColBERT is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.
Figure 1: ColBERT's late interaction, efficiently scoring the fine-grained similarity between a query and a passage.
As Figure 1 illustrates, ColBERT relies on fine-grained contextual late interaction: it encod…
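The core of late interaction is easy to state in code. A minimal sketch of MaxSim scoring, with the encoders elided: each query token embedding is matched against its best document token embedding, and the per-token maxima are summed.

```python
# Minimal sketch of ColBERT-style MaxSim scoring (encoders elided).
import torch
import torch.nn.functional as F

def maxsim_score(q_emb: torch.Tensor, d_emb: torch.Tensor) -> torch.Tensor:
    """q_emb: [query_tokens, dim], d_emb: [doc_tokens, dim],
    both L2-normalized per token."""
    sim = q_emb @ d_emb.T                 # [q_tokens, d_tokens] cosine sims
    return sim.max(dim=1).values.sum()    # best doc token per query token, summed

q = F.normalize(torch.randn(8, 128), dim=-1)    # stand-in query embeddings
d = F.normalize(torch.randn(300, 128), dim=-1)  # stand-in passage embeddings
print(maxsim_score(q, d))
```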
stanford-futuredata • GitHub - stanford-futuredata/ColBERT: Stanford ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22)
Nicolay Gerold added 10mo
One of the focus areas at Together Research is new architectures for long context, improved training, and inference performance over the Transformer architecture. Spinning out of a research program from our team and academic collaborators, with roots in signal processing-inspired sequence models, we are excited to introduce the StripedHyena models…
Paving the way to efficient architectures: StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers
Nicolay Gerold added 1y
multimodal-maestro
👋 hello
Multimodal-Maestro gives you more control over large multimodal models to get the outputs you want. With more effective prompting tactics, you can get multimodal models to do tasks you didn't know (or think!) were possible. Curious how it works? Try our HF space!
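A hedged sketch of the underlying idea (set-of-mark style prompting), not the library's actual API: overlay numbered marks on candidate regions so the multimodal model can refer to them by number. The region proposals and model call are hypothetical stand-ins (proposals would come from a segmenter such as SAM; the call from GPT-4 Vision or LLaVA).

```python
# Minimal sketch of set-of-mark prompting; `regions` and the model call
# are hypothetical stand-ins, not the maestro API.
from PIL import Image, ImageDraw

def mark_regions(image: Image.Image, regions: list[tuple[int, int, int, int]]) -> Image.Image:
    """Draw a numbered box over each (x0, y0, x1, y1) region."""
    marked = image.copy()
    draw = ImageDraw.Draw(marked)
    for i, (x0, y0, x1, y1) in enumerate(regions, start=1):
        draw.rectangle((x0, y0, x1, y1), outline="red", width=3)
        draw.text((x0 + 4, y0 + 4), str(i), fill="red")
    return marked

image = Image.open("scene.jpg")                      # assumed input image
regions = [(10, 10, 120, 120), (150, 40, 300, 220)]  # hypothetical proposals
marked = mark_regions(image, regions)
prompt = "Which numbered region contains a dog? Answer with the mark number."
# answer = ask_multimodal_model(marked, prompt)      # hypothetical model call
```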
roboflow • GitHub - roboflow/multimodal-maestro: Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
Nicolay Gerold added 1y