GitHub - databonsai/databonsai: clean & curate your data with LLMs.

GitHub - alibaba/data-juicer: A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Creative AI Lab

creative-ai.org

GitHub - FlowiseAI/Flowise: Drag & drop UI to build your customized LLM flow

github.com

GitHub - huggingface/datatrove: Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

GitHub - arthur-ai/bench: A tool for evaluating LLMs

GitHub - run-llama/llama-hub: A library of data loaders for LLMs made by the community -- to be used with LlamaIndex and/or LangChain

GitHub - NVIDIA/NeMo-Curator: Scalable toolkit for data curation