GitHub - IBM/unitxt: 🦄 Unitxt: a python library for getting data fired up and set for training and evaluation

github.com

Gitingest

gitingest.com

GitHub - anthropics/courses: Anthropic's educational courses

Anthropic • github.com

Unstructured-IO • GitHub - Unstructured-IO/unstructured: Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

huggingface • GitHub - huggingface/datatrove: Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

GitHub - run-llama/llama-hub: A library of data loaders for LLMs made by the community -- to be used with LlamaIndex and/or LangChain