Data Processing

hatchet-dev/hatchet (GitHub): A distributed, fault-tolerant task queue

Data Orchestration Trends: The Shift From Data Pipelines to Data Products (Data Engineering)

Why you should move your ETL stack to Modal

huggingface/datatrove (GitHub): Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Our 5 favourite open-source customer data platforms (Bap)

The programmable data streaming platform

How Levels.fyi Built Scalable Search with PostgreSQL

databonsai/databonsai (GitHub): Clean & curate your data with LLMs.

spiceai/spiceai (GitHub): A unified SQL query interface and portable runtime to locally materialize, accelerate, and query data tables sourced from any database, data warehouse, or data lake.