Data Loading

Core Concepts | Airbyte Documentation

Data Engineering: The Open Data Stack Distilled into Four Core Tools

GitHub - huggingface/datatrove: Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

GitHub - Filimoa/open-parse: Improved file parsing for LLM’s

jina-ai/reader: Convert any URL to an LLM-friendly input ... - GitHub

Instill AI

GitHub - google/magika: Detect file content types with deep learning

GitHub - samuelcolvin/watchfiles: Simple, modern and fast file watching and code reload in python.

The Warehouse Native Customer Data Platform