Sublime

An inspiration engine for ideas



So I wrote a 5400-word lecture note on the basics of data engineering for my students, covering:
* data formats (row- vs. column-based, text vs. binary)
* ETL
* batch processing vs. stream processing
* training datasets
WIP. Feedback much...

Chip Huyen

x.com

ETL

The part of the system I'm most proud of, and on which I spent the most effort, is the ETL process.

We had a series of shell scripts for each data source we ingested (there were many), which would pull the data and put it in an S3 bucket.

Then, early in the morning, a cron job would spin up an EC2 instance, which would pull in the latest ETL code...

Bill Mill • notes.billmill.org
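Bill Mill's note describes a two-stage flow: per-source scripts extract raw data into a staging bucket, and a scheduled job later transforms it and loads it. The sketch below shows that shape in miniature. It is an illustration under stated assumptions, not his code: the source functions, the `payments` table, and the CSV layout are all hypothetical. A plain dict stands in for the S3 staging bucket, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3


# Hypothetical stand-ins for the per-source shell scripts. Each returns raw CSV
# text; in the described system these would fetch from a source and write to S3.
def extract_source_a() -> str:
    return "id,name,amount\n1,alice,10.5\n2,bob,3\n"


def extract_source_b() -> str:
    return "id,name,amount\n3,carol,7.25\n"


def extract_all() -> dict[str, str]:
    # A plain dict plays the role of the staging bucket (object key -> raw body).
    return {"source_a.csv": extract_source_a(), "source_b.csv": extract_source_b()}


def transform(raw: str) -> list[tuple[int, str, float]]:
    # Parse the CSV, coerce column types, and normalize names to title case.
    rows = []
    for rec in csv.DictReader(io.StringIO(raw)):
        rows.append((int(rec["id"]), rec["name"].strip().title(), float(rec["amount"])))
    return rows


def load(conn: sqlite3.Connection, rows: list[tuple[int, str, float]]) -> None:
    # INSERT OR REPLACE keyed on the primary key makes re-running a load idempotent.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()


def run_nightly_job(conn: sqlite3.Connection) -> int:
    # What the cron-launched job would do: transform and load every staged object.
    staged = extract_all()
    total = 0
    for key in sorted(staged):
        rows = transform(staged[key])
        load(conn, rows)
        total += len(rows)
    return total
```

Keeping extraction separate from transformation means a broken transform can be fixed and re-run against the already-staged raw data without hitting the sources again. The `INSERT OR REPLACE` choice lets the nightly job be re-run safely after a failure.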


🚨BREAKING: New Python library for agentic data processing and ETL with AI Introducing DocETL. Here's what you need to know: https://t.co/94glNVRQfX

🔥 Matt Dancho (Business Science) 🔥

x.com

an end-to-end data stack in duckdb

ETL is dead, long live ETV: extract -> transform -> viz https://t.co/3dNqRyvapI

archie 🦋

x.com


Bruin is a data pipeline tool for ingesting, transforming, and quality-checking data using SQL, Python, and various platforms, with support for local machines, EC2 instances, and GitHub Actions https://t.co/rfsLShLbIA

Tom Dörr

x.com

"Ready-to-run cloud templates for RAG, AI pipelines, and enterprise search with live data" https://t.co/r9sai5bvQd

Tom Dörr

x.com

Hex - Do more with data, together.

hex.tech