TabLib

pipizhao/Pandalyst-7B-V1.2 · Hugging Face

GitHub - huggingface/datatrove: Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

togethercomputer/RedPajama-Data-V2 · Datasets at Hugging Face

GitHub - NVIDIA/NeMo-Curator: Scalable toolkit for data curation

Long-Context Retrieval Models with Monarch Mixer

GitHub - kaistAI/CoT-Collection: [Under Review] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Eric Siegel Predictive Analytics