Sublime
An inspiration engine for ideas
(1) The separation between storage and compute, as encouraged by data lake architectures (e.g., the implementation of P would look different in a traditional database like PostgreSQL or a cloud warehouse like Snowflake). This architecture is the focus of the current system, and it is prevalent in most mid-to-large enterprises (its benefits that be...
Jacopo Tagliabue • Reproducible data science over data lakes: replayable data pipelines with Bauplan and Nessie.
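The quote ties replayable pipelines to keeping storage apart from compute. A minimal sketch of that idea follows, assuming a toy in-memory catalog: the `SnapshotStore` class, its methods, and the `revenue_by_country` step are hypothetical and are not the Bauplan or Nessie API. Because the compute step reads an immutable snapshot by commit id, it can be rerun later and reproduce the original result even after the branch has moved on.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class SnapshotStore:
    """Hypothetical stand-in for a versioned lake catalog (not Nessie's API).

    Storage only holds immutable snapshots; compute reads them by commit id.
    """
    commits: dict = field(default_factory=dict)   # commit id -> {table: rows}
    branches: dict = field(default_factory=dict)  # branch name -> latest commit id

    def commit(self, branch: str, tables: dict) -> str:
        # Content-address the snapshot: identical data always gets the same id.
        payload = json.dumps(tables, sort_keys=True)
        commit_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
        self.commits[commit_id] = json.loads(payload)  # store a private copy
        self.branches[branch] = commit_id
        return commit_id

    def read(self, ref: str, table: str) -> list:
        # Accept either a branch name (moves over time) or a commit id (fixed).
        commit_id = self.branches.get(ref, ref)
        return self.commits[commit_id][table]


def revenue_by_country(orders: list) -> dict:
    """Pure compute step: its output depends only on the snapshot it reads."""
    totals: dict = {}
    for row in orders:
        totals[row["country"]] = totals.get(row["country"], 0) + row["amount"]
    return totals


store = SnapshotStore()
first = store.commit("main", {"orders": [{"country": "IT", "amount": 10}]})
store.commit("main", {"orders": [{"country": "IT", "amount": 10},
                                 {"country": "US", "amount": 7}]})
print(revenue_by_country(store.read("main", "orders")))  # {'IT': 10, 'US': 7}
print(revenue_by_country(store.read(first, "orders")))   # {'IT': 10}
```

Reading by branch gives the latest state; reading by commit id replays the first run exactly, which is the property that makes a pipeline reproducible.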
Data Transformation
Built For Growth
Don't hack custom scripts or use half-baked tools. SQLMesh ensures accurate and efficient data pipelines with the most complete DataOps solution for transformation, testing, and collaboration.
SQLMesh
Let us briefly touch on the concepts of data science and data engineering. If we go back to the DIKW triangle, we can say that data science focuses on extracting knowledge and wisdom from the information we have. Data scientists combine tools from mathematics and statistics to analyze information to arrive at insights. Exponentially increasing amou...
Murat Erder • Continuous Architecture in Practice: Software Architecture in the Age of Agility and DevOps (Addison-Wesley Signature Series (Vernon))
We found the ML engineering workflow to revolve around the following stages (Figure 1): (1) Data Preparation, which includes scheduled data acquisition, cleaning, labeling, and transformation, (2) Experimentation, which includes both data-driven and model-driven changes to increase overall ML performance, and is typically measured by metrics suc...
Shreya Shankar • "We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning.
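The Data Preparation stage named above bundles cleaning, labeling, and transformation. A minimal sketch of those as composable steps follows, assuming a single numeric feature `x`, a made-up threshold labeling rule, and min-max scaling as the transformation; all of these are hypothetical choices for illustration, not taken from the paper.

```python
def clean(rows: list) -> list:
    # Cleaning: drop records whose feature value is missing.
    return [r for r in rows if r.get("x") is not None]


def label(rows: list) -> list:
    # Labeling: a made-up threshold rule standing in for human or programmatic labels.
    return [{**r, "y": int(r["x"] > 0.5)} for r in rows]


def transform(rows: list) -> list:
    # Transformation: min-max scale the feature into [0, 1].
    if not rows:
        return rows
    xs = [r["x"] for r in rows]
    lo, hi = min(xs), max(xs)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    return [{**r, "x_scaled": (r["x"] - lo) / span} for r in rows]


def prepare(rows: list, steps=(clean, label, transform)) -> list:
    """Run the preparation steps in order; each takes and returns a list of dicts."""
    for step in steps:
        rows = step(rows)
    return rows


raw = [{"x": 0.2}, {"x": None}, {"x": 0.9}]
print(prepare(raw))
# [{'x': 0.2, 'y': 0, 'x_scaled': 0.0}, {'x': 0.9, 'y': 1, 'x_scaled': 1.0}]
```

Keeping each step a plain function over rows makes it easy to rerun preparation on a schedule and to swap individual steps during experimentation.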
ETL
The part of the system I'm most proud of, and on which I spent the most effort, is the ETL process.
We had a series of shell scripts for each data source we ingested (there were many), which would pull the data and put it in an S3 bucket.
Then, early in the morning, a cron job would spin up an EC2 instance, which would pull in the latest ETL code...
Bill Mill • notes.billmill.org
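The nightly flow described here (per-source fetch scripts landing raw data in S3, then a scheduled job running the ETL code) can be sketched in Python as below. This is a hypothetical stand-in, not the author's code: a local directory plays the S3 bucket, `sqlite3` plays the target database, the two sources are invented, and the cron/EC2 scheduling is left out.

```python
import csv
import sqlite3
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical sources; each fetcher returns raw CSV text, standing in for
# the per-source shell scripts in the original setup.
SOURCES = {
    "crm": lambda: "id,name\n1,Ada\n2,Grace\n",
    "billing": lambda: "id,amount\n1,30\n2,12\n",
}


def raw_key(landing: Path, source: str, run_date: date) -> Path:
    # Date-partitioned layout, similar to how raw files are often keyed in S3.
    return landing / "raw" / source / f"dt={run_date.isoformat()}" / "data.csv"


def extract(landing: Path, run_date: date) -> list:
    """Land each source's raw output unchanged under its partition."""
    written = []
    for source, fetch in SOURCES.items():
        key = raw_key(landing, source, run_date)
        key.parent.mkdir(parents=True, exist_ok=True)
        key.write_text(fetch())
        written.append(key)
    return written


def transform_and_load(landing: Path, run_date: date, db: sqlite3.Connection) -> None:
    """Scheduled step: rebuild one table per source from that day's raw files."""
    for source in SOURCES:  # table names come from our own constant, not user input
        with raw_key(landing, source, run_date).open(newline="") as f:
            rows = list(csv.DictReader(f))
        cols = list(rows[0])
        # Drop and recreate so a rerun for the same date gives the same result.
        db.execute(f"DROP TABLE IF EXISTS {source}")
        db.execute(f"CREATE TABLE {source} ({', '.join(cols)})")
        marks = ", ".join("?" for _ in cols)
        db.executemany(f"INSERT INTO {source} VALUES ({marks})",
                       [tuple(row[c] for c in cols) for row in rows])
    db.commit()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = sqlite3.connect(":memory:")
        run = date(2024, 1, 15)
        extract(Path(tmp), run)
        transform_and_load(Path(tmp), run, db)
        print(db.execute("SELECT name FROM crm ORDER BY id").fetchall())
        # [('Ada',), ('Grace',)]
```

Landing raw files untouched before transforming them means a failed or buggy transform can be rerun for any past date without refetching from the sources.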
The platform needs to facilitate integrating new data, ad hoc queries, and visualization to accelerate human understanding. As valuable insights emerge from this platform, they become the requirements for changes to production systems and processes.
Thomas H. Davenport • Big Data at Work: Dispelling the Myths, Uncovering the Opportunities
You need to be able to draw insight and to structure that information such that people can then act on it. What is doing that? Today, transformation tools like dbt are doing that, if you take the lens of the data team really owning everything end-to-end, but I think also applications that are able to plug into the data warehouse, consume this raw i...
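The transformation step mentioned here, turning raw warehouse data into structured tables people can act on, can be sketched with plain SQL. This is an illustration in the spirit of dbt models and tests, not dbt's actual API: the `raw_events` schema, the `user_summary` model, and the `build`/`check_key` helpers are hypothetical, and `sqlite3` stands in for the warehouse.

```python
import sqlite3

# Raw events as they might land in the warehouse (hypothetical schema and data).
RAW_EVENTS = [
    ("u1", "signup", "2024-01-01"),
    ("u1", "purchase", "2024-01-03"),
    ("u2", "signup", "2024-01-02"),
    ("u2", "purchase", "2024-01-05"),
    ("u2", "purchase", "2024-01-09"),
]

# A "model" in the dbt sense: a named SELECT over upstream tables.
MODELS = {
    "user_summary": """
        SELECT user_id,
               MIN(CASE WHEN event = 'signup' THEN ts END) AS signed_up_on,
               SUM(event = 'purchase') AS purchases
        FROM raw_events
        GROUP BY user_id
    """,
}


def build(db: sqlite3.Connection) -> None:
    """Load raw data, then rebuild every model as a table (reruns are idempotent)."""
    db.execute("CREATE TABLE IF NOT EXISTS raw_events (user_id TEXT, event TEXT, ts TEXT)")
    db.execute("DELETE FROM raw_events")
    db.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", RAW_EVENTS)
    for name, sql in MODELS.items():
        db.execute(f"DROP TABLE IF EXISTS {name}")
        db.execute(f"CREATE TABLE {name} AS {sql}")
    db.commit()


def check_key(db: sqlite3.Connection, table: str, column: str) -> list:
    """Return failures, in the spirit of dbt's `unique` and `not_null` tests."""
    failures = []
    nulls = db.execute(f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL").fetchone()[0]
    if nulls:
        failures.append(f"{table}.{column}: {nulls} null value(s)")
    dupes = db.execute(
        f"SELECT COUNT(*) FROM (SELECT {column} FROM {table} "
        f"GROUP BY {column} HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    if dupes:
        failures.append(f"{table}.{column}: {dupes} duplicated value(s)")
    return failures


db = sqlite3.connect(":memory:")
build(db)
print(db.execute("SELECT * FROM user_summary ORDER BY user_id").fetchall())
# [('u1', '2024-01-01', 1), ('u2', '2024-01-02', 2)]
print(check_key(db, "user_summary", "user_id"))  # []
```

Expressing the model as a SELECT and rebuilding it from raw data keeps the transformation declarative, and running checks after each build catches broken assumptions before anyone acts on the table.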