Data Loading

Instill AI

Nicolay Gerold added 9mo

Unstructured-IO GitHub - Unstructured-IO/unstructured: Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Nicolay Gerold added 10mo

Carbon | Data Connectors for LLMs

Nicolay Gerold added 9mo

Bill Mill notes.billmill.org

Nicolay Gerold added 4mo

jina-ai GitHub - jina-ai/reader: Convert any URL to an LLM-friendly input ...

Nicolay Gerold added 7mo

samuelcolvin GitHub - samuelcolvin/watchfiles: Simple, modern and fast file watching and code reload in python.

Nicolay Gerold added 9mo

tensorlakeai GitHub - tensorlakeai/indexify: A scalable realtime and continuous indexing engine for Unstructured Data to build Generative AI Applications

Nicolay Gerold added 10mo

WebDataset

Nicolay Gerold added 10mo

Bap Our 5 favourite open-source customer data platforms

Nicolay Gerold added 7mo

Filimoa GitHub - Filimoa/open-parse: Improved file parsing for LLM’s

Nicolay Gerold added 7mo