GitHub - huggingface/datatrove: Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Unitxt is a Python library for getting data fired up and set for utilization. In one line of code, it preps a dataset or a mixture of datasets into an input-output format for training and evaluation. We aspire to be simple, adaptable and transparent.
Unitxt builds on separation. Separation allows adding a dataset, without knowing anything about the...
IBM • GitHub - IBM/unitxt: 🦄 Unitxt: a python library for getting data fired up and set for training and evaluation
Pool
pooldata.io
