The what and why of Dagster | Dagster Docs
The last core data stack tool is the orchestrator. It’s used as a data orchestrator to model dependencies between tasks in complex, heterogeneous cloud environments end-to-end, and it integrates with the above-mentioned open data stack tools. Orchestrators are especially effective if you have some glue code that needs to be run on a certain cadence, trigg... See more
Data Engineering • The Open Data Stack Distilled into Four Core Tools
Nicolay Gerold added
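The dependency-modeling idea at the heart of an orchestrator can be sketched with a toy topological scheduler. This is plain Python with invented task names, not Dagster's actual API:

```python
from graphlib import TopologicalSorter

# Toy DAG: each task maps to the set of upstream tasks it depends on
# (task names here are illustrative, not from any real pipeline).
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load", "transform"},
}

def run_order(dag):
    # An orchestrator would execute each task once its upstreams finish;
    # here we just compute a valid execution order.
    return list(TopologicalSorter(dag).static_order())

print(run_order(dag))
```

A real orchestrator layers scheduling (cadence, triggers), retries, and observability on top of exactly this kind of dependency graph.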
So what abstractions do we have as of today? For example, let’s take the resource abstraction (a resource in Dagster and Prefect, referred to as an operator in Airflow). You abstract complex environments and connections away with a simple construct like that. You get the immediate benefit of defining it once and using it in every task or pipeline with context... See more
Data Engineering • Data Orchestration Trends: The Shift From Data Pipelines to Data Products
Nicolay Gerold added
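The "define once, use everywhere" benefit of the resource abstraction can be sketched in plain Python. The class and function names below are invented for illustration; this is not the Dagster or Prefect API:

```python
# Toy resource: one shared connection-like object, injected into every
# task through a context dict rather than constructed inside each task.
class FakeWarehouse:
    """Stands in for a real database/warehouse client."""
    def __init__(self, dsn):
        self.dsn = dsn
        self.queries = []

    def execute(self, sql):
        self.queries.append(sql)
        return f"ok: {sql}"

def build_context(**resources):
    return {"resources": resources}

# Tasks only declare *what* resource they need, never how to build it.
def load_users(context):
    return context["resources"]["warehouse"].execute("SELECT * FROM users")

def load_orders(context):
    return context["resources"]["warehouse"].execute("SELECT * FROM orders")

ctx = build_context(warehouse=FakeWarehouse("postgres://example/analytics"))
results = [load_users(ctx), load_orders(ctx)]
```

Swapping the fake warehouse for a real client (or a test double) changes one line at context construction while every task stays untouched; that is the benefit the snippet describes.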
Overview¶
Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow.
Ballista has a scheduler and an executor process that are standard Rust executables and can be executed directly, but Dockerfiles are provided to build images for use in containerized environments, such as Docker, Docker Compose, and Kube... See more
Overview — Apache Arrow Ballista documentation
Nicolay Gerold added
Databases have gotten so good at this that the term is almost misleading now. “Base” suggests something rigid, without which the data would slip away. But the data is always there, just bits on a nameless hard disk. The structure and the accessibility that a modern database provides exist completely independently from that hard disk. That’s right... See more
DuckDB Doesn’t Need Data To Be a Database
Nicolay Gerold added
(1) The separation between storage and compute, as encouraged by data lake architectures (e.g. the implementation of P would look different in a traditional database like PostgreSQL, or a cloud warehouse like Snowflake). This architecture is the focus of the current system, and it is prevalent in most mid-to-large enterprises (its benefits that be... See more
Jacopo Tagliabue • Reproducible data science over data lakes: replayable data pipelines with Bauplan and Nessie.
Nicolay Gerold added
The toolkit available for DAOs is expanding rapidly and it can be overwhelming to determine where to start. As a first stop, we recommend defining the phases of your organization’s work. We had 4 initial phases of work when starting on CabinDAO:
Phase 1: Recruiting the community
Phase 2: Securing initial funds
Phase 3: Communicating with members
Phase ... See more
CabinDAO • How to DAO 101: Choosing a Tech Stack for CabinDAO
sari added
We found the ML engineering workflow to revolve around the following stages (Figure 1): (1) Data Preparation, which includes scheduled data acquisition, cleaning, labeling, and transformation; (2) Experimentation, which includes both data-driven and model-driven changes to increase overall ML performance, and is typically measured by metrics suc... See more
Shreya Shankar • "We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning.
Nicolay Gerold added