Data Processing

GitHub - huggingface/datatrove: Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Nicolay Gerold added 10mo

GitHub - datafold/data-diff: Compare tables within or across databases

Nicolay Gerold added 10mo

How Levels.fyi Built Scalable Search with PostgreSQL

Nicolay Gerold added 7mo

GitHub - databonsai/databonsai: clean & curate your data with LLMs.

Nicolay Gerold added 7mo

The programmable data streaming platform

Nicolay Gerold added 9mo

Why you should move your ETL stack to Modal

Nicolay Gerold added 7mo

Reproducible data science over data lakes: replayable data pipelines with Bauplan and Nessie (Jacopo Tagliabue)

Nicolay Gerold added 7mo

Overview — Apache Arrow Ballista documentation

Nicolay Gerold added 9mo

Our 5 favourite open-source customer data platforms (Bap)

Nicolay Gerold added 7mo

GitHub - dlt-hub/dlt: data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

Nicolay Gerold added 9mo