Data Processing

GitHub - datafold/data-diff: Compare tables within or across databases

Instill AI

How Levels.fyi Built Scalable Search with PostgreSQL

GitHub - huggingface/datatrove: Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Why you should move your ETL stack to Modal

The programmable data streaming platform

GitHub - dlt-hub/dlt: data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

Overview — Apache Arrow Ballista documentation

GitHub - tobymao/sqlglot: Python SQL Parser and Transpiler