The what and why of Dagster | Dagster Docs
The last core data stack tool is the orchestrator. It’s used as a data orchestrator to model dependencies between tasks in complex, heterogeneous cloud environments end-to-end, and it integrates with the above-mentioned open data stack tools. Orchestrators are especially effective if you have some glue code that needs to be run on a certain cadence, trigg... See more
Data Engineering • The Open Data Stack Distilled into Four Core Tools
Nicolay Gerold added
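The dependency-modeling idea at the heart of an orchestrator can be sketched with a toy topological scheduler. This is plain Python with invented task names, not Dagster's actual API:

```python
from graphlib import TopologicalSorter

# Toy DAG: each task maps to the set of upstream tasks it depends on
# (task names here are illustrative, not from any real pipeline).
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load", "transform"},
}

def run_order(dag):
    # An orchestrator would execute each task once its upstreams finish;
    # here we just compute a valid execution order.
    return list(TopologicalSorter(dag).static_order())

print(run_order(dag))
```

A real orchestrator layers scheduling (cadence, triggers), retries, and observability on top of exactly this kind of dependency graph.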
So what abstractions do we have as of today? For example, let’s take the resource abstraction (a resource in Dagster and Prefect, referred to as an operator in Airflow). You abstract complex environments and connections away with a simple construct like that. You get the immediate benefit of defining it once and using it in every task or pipeline with context... See more
Data Engineering • Data Orchestration Trends: The Shift From Data Pipelines to Data Products
Nicolay Gerold added
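The "define once, use everywhere" benefit of the resource abstraction can be sketched in plain Python. The class and function names below are invented for illustration; this is not the Dagster or Prefect API:

```python
# Toy resource: one shared connection-like object, injected into every
# task through a context dict rather than constructed inside each task.
class FakeWarehouse:
    """Stands in for a real database/warehouse client."""
    def __init__(self, dsn):
        self.dsn = dsn
        self.queries = []

    def execute(self, sql):
        self.queries.append(sql)
        return f"ok: {sql}"

def build_context(**resources):
    return {"resources": resources}

# Tasks only declare *what* resource they need, never how to build it.
def load_users(context):
    return context["resources"]["warehouse"].execute("SELECT * FROM users")

def load_orders(context):
    return context["resources"]["warehouse"].execute("SELECT * FROM orders")

ctx = build_context(warehouse=FakeWarehouse("postgres://example/analytics"))
results = [load_users(ctx), load_orders(ctx)]
```

Swapping the fake warehouse for a real client (or a test double) changes one line at context construction while every task stays untouched; that is the benefit the snippet describes.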
Overview¶
Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow.
Ballista has a scheduler and an executor process that are standard Rust executables and can be executed directly, but Dockerfiles are provided to build images for use in containerized environments, such as Docker, Docker Compose, and Kube... See more
Overview — Apache Arrow Ballista documentation
Nicolay Gerold added
Databases have gotten so good at this that the term is almost misleading now. “Base” suggests something rigid, without which the data would slip away. But the data is always there, just bits on a nameless hard disk. The structure and the accessibility that a modern database provides exist completely independently from that hard disk. That’s right... See more
DuckDB Doesn’t Need Data To Be a Database
Nicolay Gerold added
(1) The separation between storage and compute, as encouraged by data lake architectures (e.g. the implementation of P would look different in a traditional database like PostgreSQL, or a cloud warehouse like Snowflake). This architecture is the focus of the current system, and it is prevalent in most mid-to-large enterprises (its benefits that be... See more
Jacopo Tagliabue • Reproducible data science over data lakes: replayable data pipelines with Bauplan and Nessie.
Nicolay Gerold added
The toolkit available for DAOs is expanding rapidly and it can be overwhelming to determine where to start. As a first stop, we recommend defining the phases of your organization’s work. We had 4 initial phases of work when starting on CabinDAO:
Phase 1: Recruiting the community
Phase 2: Securing initial funds
Phase 3: Communicating with members
Phase ... See more
CabinDAO • How to DAO 101: Choosing a Tech Stack for CabinDAO
sari added
We found the ML engineering workflow to revolve around the following stages (Figure 1): (1) Data Preparation, which includes scheduled data acquisition, cleaning, labeling, and transformation; (2) Experimentation, which includes both data-driven and model-driven changes to increase overall ML performance, and is typically measured by metrics suc... See more
Shreya Shankar • "We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning.
Nicolay Gerold added