Sublime
An inspiration engine for ideas

So I wrote a 5400-word lecture note on the basics of data engineering for my students, covering:
* data formats (row- vs. column-based, text vs. binary)
* ETL
* batch processing vs. stream processing
* training datasets
WIP. Feedback much appreciated...
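To make the row- vs. column-based distinction from the list above concrete, here is a minimal sketch in plain Python. It is not taken from the lecture note: the records, the field names (`user`, `clicks`, `country`), and the helper `to_columns` are all hypothetical and chosen only for illustration.

```python
# Row-oriented layout: one complete record per entry, as in CSV or JSON Lines.
# The records and field names are hypothetical.
rows = [
    {"user": "a", "clicks": 3, "country": "VN"},
    {"user": "b", "clicks": 5, "country": "US"},
    {"user": "c", "clicks": 2, "country": "VN"},
]


def to_columns(records):
    """Pivot a list of records into a dict that maps each field to a list of values."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


# Column-oriented layout: all values of one field are stored together.
columns = to_columns(rows)

# Aggregating one field in the columnar layout reads a single list.
total_clicks_columnar = sum(columns["clicks"])

# The same aggregate in the row layout has to visit every full record.
total_clicks_rowwise = sum(record["clicks"] for record in rows)
```

Both totals are equal; the difference is how much data each layout has to touch to compute them, which is one reason analytical workloads tend to favor columnar formats while record-at-a-time writes favor row formats.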
ETL
The part of the system I'm most proud of, and on which I spent the most effort, is the ETL process.
We had a series of shell scripts for each data source we ingested (there were many), which would pull the data and put it in an s3 bucket.
Then, early in the morning, a cron job would spin up an EC2 instance, which would pull in the latest ETL code...
Bill Mill • notes.billmill.org
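The excerpt above is cut off before it describes the ETL code itself, so the following is only a generic sketch of what one batch extract-transform-load step can look like, not the author's implementation. Everything in it is an assumption: a local directory stands in for the s3 bucket, the input is assumed to be CSV, and the field names (`name`, `amount`), the cleaning rules, and the function names are hypothetical.

```python
import csv
import json
from pathlib import Path


def extract(raw_dir):
    """Yield one dict per row from every CSV file in raw_dir (stand-in for the bucket)."""
    for path in sorted(Path(raw_dir).glob("*.csv")):
        with path.open(newline="") as f:
            for row in csv.DictReader(f):
                yield row


def transform(rows):
    """Apply hypothetical cleaning rules: skip rows without an amount, normalize names."""
    for row in rows:
        if not row.get("amount"):
            continue
        yield {"name": row["name"].strip().lower(), "amount": float(row["amount"])}


def load(records, out_path):
    """Write records as JSON Lines and return how many were written."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with out_path.open("w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
            count += 1
    return count


def run_batch(raw_dir, out_path):
    """Run one batch: extract, then transform, then load."""
    return load(transform(extract(raw_dir)), out_path)
```

Because each stage is a generator, rows stream through the pipeline one at a time instead of being held in memory all at once. A scheduler such as cron would call `run_batch` once per run.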

🚨BREAKING: New Python library for agentic data processing and ETL with AI
Introducing DocETL.
Here's what you need to know: https://t.co/94glNVRQfX


The Quantitative Trading Pipeline:
1. Data Collection
2. Data Cleaning
3. Feature Engineering
4. Machine Learning Training
5. Portfolio & Trade Strategy
6. Execute Orders & Monitor
Want help making this for yoursel...
Hex - Do more with data, together.
hex.tech
Let us briefly touch on the concepts of data science and data engineering. If we go back to the DIKW triangle, we can say that data science focuses on extracting knowledge and wisdom from the information we have. Data scientists combine tools from mathematics and statistics to analyze information to arrive at insights. Exponentially increasing amou...