GitHub - databonsai/databonsai: clean & curate your data with LLMs.

GitHub - Shubhamsaboo/awesome-llm-apps: Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and opensource models.

alibaba • GitHub - alibaba/data-juicer: A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs!

GitHub - NVIDIA/NeMo-Curator: Scalable toolkit for data curation

CambioML • GitHub - CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering: LLM-based text extraction from unstructured data like PDFs, Words and HTMLs. Transform and cluster the text into your desired format. Less information loss, more interpretation, and faster...