Reproducible data science over data lakes: replayable data pipelines with Bauplan and Nessie.

Bauplan and Nessie enable reproducible data science by providing time-travel and branching semantics on data lakes while decoupling compute from data management.

arxiv.org

DuckDB Doesn’t Need Data To Be a Database

nikolasgoebel.com

Datasets as Imagination

Lila Shroff, joinreboot.org

Data Engineering Data Orchestration Trends: The Shift From Data Pipelines to Data Products

Arend van Beelen jr. Post-Architecture: Premature Abstraction Is the Root of All Evil

Data Engineering The Open Data Stack Distilled into Four Core Tools

Shreya Shankar "We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning.