Reproducible data science over data lakes: replayable data pipelines with Bauplan and Nessie.

Bauplan and Nessie enable reproducible data science by providing time-travel and branching semantics on data lakes and by decoupling compute from data management.

arxiv.org

Data Engineering Data Orchestration Trends: The Shift From Data Pipelines to Data Products

Arend van Beelen jr. Post-Architecture: Premature Abstraction Is the Root of All Evil

Data Engineering The Open Data Stack Distilled into Four Core Tools

Shreya Shankar "We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning.

Real-time Machine Learning For Recommendations

https://github.com/paradedb/paradedb/tree/dev/pg_l...

DuckDB Doesn’t Need Data To Be a Database