ehartford/dolphin · Datasets at Hugging Face

This dataset is an attempt to replicate the results of Microsoft's Orca.

Our dataset consists of:

~1 million FLANv2 examples augmented with GPT-4 completions (flan1m-alpaca-uncensored.jsonl)

~3.5 million FLANv2 examples augmented with GPT-3.5 completions (flan5m-alpaca-uncensored.jsonl)
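
Both files use the `.jsonl` extension, so they are presumably JSON Lines: one JSON object per line. The sketch below streams such a file record by record rather than loading millions of rows into memory at once. The helper name `iter_jsonl` is hypothetical and is not part of the dataset or of any library.

```python
import json
from typing import Iterator


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

A loop such as `for record in iter_jsonl("flan1m-alpaca-uncensored.jsonl"): ...` would then visit each example in turn, assuming the file has been downloaded locally. The "alpaca" in the file names suggests the usual `instruction`, `input`, and `output` keys, but that is an assumption; inspect the first record before relying on any field name.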

We followed the submix and system prompt distribution outlined in the Orca...