Models
pair-preference-model-LLaMA3-8B by RLHFlow: A really strong reward model, trained to take in two candidate responses at once; it is the top open reward model on RewardBench (beating one of Cohere's).
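A hedged sketch of how such a pairwise model can be queried: both responses go into one prompt, and the model's next-token logits for "A" vs. "B" decide the preference. The prompt template and preference tokens below are assumptions; check the model card for the exact format.

```python
# Minimal sketch, assuming the model emits a preference token ("A" or "B").
# The pairwise template below is an assumption; consult the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RLHFlow/pair-preference-model-LLaMA3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def prefer(prompt: str, response_a: str, response_b: str) -> str:
    # Hypothetical pairwise template: both candidates in one context window.
    text = f"[CONTEXT] {prompt} [RESPONSE A] {response_a} [RESPONSE B] {response_b}"
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    token_a = tokenizer.convert_tokens_to_ids("A")
    token_b = tokenizer.convert_tokens_to_ids("B")
    return "A" if logits[token_a] > logits[token_b] else "B"
```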
DeepSeek-V2 by deepseek-ai (21B active, 236B total param.): Another strong MoE base model from the DeepSeek team. Some people are questioning the very high MMLU sc…
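A minimal loading sketch with Hugging Face transformers. DeepSeek-V2 ships custom modeling code, hence trust_remote_code=True; and although only ~21B parameters are active per token, all 236B must fit in memory, so multi-GPU sharding via device_map="auto" is assumed here.

```python
# Minimal sketch of loading the MoE base model with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V2"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,   # custom DeepSeek modeling code
    torch_dtype=torch.bfloat16,
    device_map="auto",        # shard all experts across available GPUs
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```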
DiscoLM German 7B v1 - GGUF
Model creator: Disco Research
Original model: DiscoLM German 7B v1
Description
This repo contains GGUF format model files for Disco Research's DiscoLM German 7B v1.
These files were quantised using hardware kindly provided by Massed Compute.
About GGUF
GGUF is a new format introduced by the llama.cpp team on August 21st, 2023. I…
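A minimal sketch of running one of these GGUF files locally with llama-cpp-python (Python bindings for llama.cpp, the project that defines the GGUF format). The filename is an assumption; use whichever quantisation level you downloaded.

```python
# Minimal sketch: load a GGUF quantisation and run a completion.
from llama_cpp import Llama

llm = Llama(
    model_path="discolm_german_7b_v1.Q4_K_M.gguf",  # assumed local file
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers if built with GPU support
)

out = llm("Schreibe eine kurze Zusammenfassung über GGUF.", max_tokens=128)
print(out["choices"][0]["text"])
```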
TheBloke/DiscoLM_German_7b_v1-GGUF · Hugging Face
Nicolay Gerold added 9mo
Stable Beluga 2
Use Stable Chat (Research Preview) to test Stability AI's best language models for free
Model Description
Stable Beluga 2 is a Llama2 70B model fine-tuned on an Orca-style dataset.
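A minimal prompting sketch with transformers. The "### System / ### User / ### Assistant" layout below follows the Orca-style format described on the model card; treat it as an assumption and verify against the card.

```python
# Minimal sketch of prompting StableBeluga2, assuming the Orca-style format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/StableBeluga2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = (
    "### System:\nYou are a helpful assistant.\n\n"
    "### User:\nExplain what an Orca-style dataset is in one sentence.\n\n"
    "### Assistant:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```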
stabilityai/StableBeluga2 · Hugging Face
Nicolay Gerold added 10mo
Model description
Nous Hermes 2 Mixtral 8x7B DPO is the new flagship Nous Research model trained over the Mixtral 8x7B MoE LLM.
The model was trained on over 1,000,000 entries of primarily GPT-4-generated data, as well as other high-quality data from open datasets across the AI landscape, achieving state-of-the-art performance on a variety of tasks.
…
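A minimal sketch using the tokenizer's built-in chat template; Nous Hermes 2 models are trained on ChatML, and the Hugging Face tokenizer is assumed to ship a matching chat_template.

```python
# Minimal sketch: ChatML-style chat via the tokenizer's chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is DPO in one sentence?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(input_ids, max_new_tokens=100)
# Decode only the newly generated tokens.
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```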
NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO · Hugging Face
Nicolay Gerold added 10mo
Glaive-coder-7b
Glaive-coder-7b is a 7B-parameter code model trained on a dataset of ~140k programming-related problems and solutions generated from Glaive's synthetic data generation platform.
The model is fine-tuned on the CodeLlama-7b model.
Usage:
The model is trained to act as a code assistant, and can do both single instruction following and mult…
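A hedged single-instruction sketch: since the model is fine-tuned from CodeLlama-7b, a Llama-style [INST] wrapper is assumed below; check the model card for the exact chat format.

```python
# Minimal sketch, assuming a Llama-style instruction wrapper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "glaiveai/glaive-coder-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "[INST] Write a Python function that checks if a string is a palindrome. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```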
glaiveai/glaive-coder-7b · Hugging Face
Nicolay Gerold added 10mo
AI That Quacks: Introducing DuckDB-NSQL-7B, A LLM for DuckDB (2024/01/25, by Till Döhmen and Jordan Tigani). What does a database have to do with AI, anyway? After a truly new technology arrives, it makes the future a lot harder to predict. The one thing you can be sure of is…
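A hedged end-to-end sketch of the idea: generate DuckDB SQL from a natural-language question, then execute it locally with the duckdb package. The repo name and prompt layout (schema, then the question as a SQL comment) are assumptions; see the model card for the exact format.

```python
# Minimal sketch: text-to-SQL with DuckDB-NSQL, executed against DuckDB.
import duckdb
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "motherduckdb/DuckDB-NSQL-7B-v0.1"  # assumed repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

schema = "CREATE TABLE taxi (fare DOUBLE, tip DOUBLE, passengers INTEGER);"
question = "What is the average tip for rides with more than 2 passengers?"
prompt = f"{schema}\n\n-- {question}\nSELECT"  # assumed prompt layout

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
completion = tokenizer.decode(
    out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
sql = "SELECT" + completion

con = duckdb.connect()
con.sql(schema)      # create the (empty) demo table
print(con.sql(sql))  # run the generated query
```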
Till Döhmen • AI That Quacks: Introducing DuckDB-NSQL-7B, A LLM for DuckDB
Nicolay Gerold added 10mo
Text embeddings are a critical piece of many pipelines, from search, to RAG, to vector databases and more. Most embedding models are BERT/Transformer-based and typically have short context lengths (e.g., 512). That's only about two pages of text, but documents can be very long – books, legal cases, TV screenplays, code repositories, etc can be tens…
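A minimal sketch of the workaround that long-context embedding models aim to make unnecessary: chunking a long document into short windows, embedding each with a standard 512-token-class model, and mean-pooling into a single vector. The model choice is illustrative.

```python
# Minimal sketch: chunk-and-pool embedding of a long document.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def embed_long(text: str, chunk_words: int = 256) -> np.ndarray:
    words = text.split()
    # Crude word-based chunking as a stand-in for true token-based chunking.
    chunks = [
        " ".join(words[i:i + chunk_words])
        for i in range(0, len(words), chunk_words)
    ]
    vecs = model.encode(chunks, normalize_embeddings=True)
    doc_vec = vecs.mean(axis=0)            # pool chunk vectors
    return doc_vec / np.linalg.norm(doc_vec)
```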
Long-Context Retrieval Models with Monarch Mixer
Nicolay Gerold added 10mo
ColBERT is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.
Figure 1: ColBERT's late interaction, efficiently scoring the fine-grained similarity between a query and a passage.
As Figure 1 illustrates, ColBERT relies on fine-grained contextual late interaction: it encod…
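The core of late interaction is easy to state in code. A minimal sketch of MaxSim scoring, with the encoders elided: each query token embedding is matched against its best document token embedding, and the per-token maxima are summed.

```python
# Minimal sketch of ColBERT-style MaxSim scoring (encoders elided).
import torch
import torch.nn.functional as F

def maxsim_score(q_emb: torch.Tensor, d_emb: torch.Tensor) -> torch.Tensor:
    """q_emb: [query_tokens, dim], d_emb: [doc_tokens, dim],
    both L2-normalized per token."""
    sim = q_emb @ d_emb.T                 # [q_tokens, d_tokens] cosine sims
    return sim.max(dim=1).values.sum()    # best doc token per query token, summed

q = F.normalize(torch.randn(8, 128), dim=-1)    # stand-in query embeddings
d = F.normalize(torch.randn(300, 128), dim=-1)  # stand-in passage embeddings
print(maxsim_score(q, d))
```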
stanford-futuredata • GitHub - stanford-futuredata/ColBERT: Stanford ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22)
Nicolay Gerold added 10mo
One of the focus areas at Together Research is new architectures for long context, improved training, and inference performance over the Transformer architecture. Spinning out of a research program from our team and academic collaborators, with roots in signal processing-inspired sequence models, we are excited to introduce the StripedHyena models…
Paving the way to efficient architectures: StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers
Nicolay Gerold added 1y
multimodal-maestro
👋 hello
Multimodal-Maestro gives you more control over large multimodal models to get the outputs you want. With more effective prompting tactics, you can get multimodal models to do tasks you didn't know (or think!) were possible. Curious how it works? Try our HF space!
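A hedged sketch of the underlying idea (set-of-mark style prompting), not the library's actual API: overlay numbered marks on candidate regions so the multimodal model can refer to them by number. The region proposals and model call are hypothetical stand-ins (proposals would come from a segmenter such as SAM; the call from GPT-4 Vision or LLaVA).

```python
# Minimal sketch of set-of-mark prompting; `regions` and the model call
# are hypothetical stand-ins, not the maestro API.
from PIL import Image, ImageDraw

def mark_regions(image: Image.Image, regions: list[tuple[int, int, int, int]]) -> Image.Image:
    """Draw a numbered box over each (x0, y0, x1, y1) region."""
    marked = image.copy()
    draw = ImageDraw.Draw(marked)
    for i, (x0, y0, x1, y1) in enumerate(regions, start=1):
        draw.rectangle((x0, y0, x1, y1), outline="red", width=3)
        draw.text((x0 + 4, y0 + 4), str(i), fill="red")
    return marked

image = Image.open("scene.jpg")                      # assumed input image
regions = [(10, 10, 120, 120), (150, 40, 300, 220)]  # hypothetical proposals
marked = mark_regions(image, regions)
prompt = "Which numbered region contains a dog? Answer with the mark number."
# answer = ask_multimodal_model(marked, prompt)      # hypothetical model call
```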
roboflow • GitHub - roboflow/multimodal-maestro: Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
Nicolay Gerold added 1y