GitHub - nomic-ai/nomic: Interact, analyze and structure massive text, image, embedding, audio and video datasets

GitHub - NVIDIA/NeMo-Curator: Scalable toolkit for data curation

GitHub - Unstructured-IO/unstructured: Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.