Data Processing

Why you should move your ETL stack to Modal

GitHub - Nike-Inc/koheesio: Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.

GitHub - dgarnitz/vectorflow: VectorFlow is a high-volume vector embedding pipeline that ingests raw data, transforms it into vectors, and writes it to a vector DB of your choice.

SQLMesh

Data Orchestration Trends: The Shift From Data Pipelines to Data Products

Instill AI

Jacopo Tagliabue Reproducible data science over data lakes: replayable data pipelines with Bauplan and Nessie.

GitHub - hatchet-dev/hatchet: A distributed, fault-tolerant task queue