Data Processing

Data Engineering Data Orchestration Trends: The Shift From Data Pipelines to Data Products

towhee-io GitHub - towhee-io/towhee: Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

spiceai GitHub - spiceai/spiceai: A unified SQL query interface and portable runtime to locally materialize, accelerate, and query data tables sourced from any database, data warehouse, or data lake.

Overview — Apache Arrow Ballista documentation

datafold GitHub - datafold/data-diff: Compare tables within or across databases

Data Engineering The Open Data Stack Distilled into Four Core Tools

huggingface GitHub - huggingface/datatrove: Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Jacopo Tagliabue Reproducible data science over data lakes: replayable data pipelines with Bauplan and Nessie.