Sublime
An inspiration engine for ideas
Often several copies and
David Uhlman • Hacking Healthcare: A Guide to Standards, Workflows, and Meaningful Use
My top issues with the foundations of the "modern data stack":
- Snowflake - you may only pay for what you use, but you pay through the nose for it. The cost is so high that in many cases it can exceed the cost of binders full of DBAs.
- Fivetran - convenient, but they tried to triple my licensing cost last year, they have almost constant 15 minute o
r/dataengineering - Reddit
Delete anomaly: When an attempt is made to modify (update, add, or delete from) instances, undesired side effects may follow.
John P Reilly • Implementing the TM Forum Information Framework (SID)
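A minimal sketch of the delete case described in the quote above, assuming a hypothetical course-enrollment table (the table, names, and helper functions are illustrative and do not come from the book). When one row stores two independent facts, deleting the row to remove one fact silently removes the other:

```python
# Hypothetical unnormalized table: each row records an enrollment AND
# the instructor of the course, two facts that are independent of each other.
enrollments = [
    {"student": "Ana", "course": "DB101", "instructor": "Dr. Lee"},
    {"student": "Ben", "course": "DB101", "instructor": "Dr. Lee"},
    {"student": "Cy", "course": "ML200", "instructor": "Dr. Rao"},
]


def instructor_of(rows, course):
    """Return the instructor recorded for a course, or None if no row mentions it."""
    for row in rows:
        if row["course"] == course:
            return row["instructor"]
    return None


def drop_student(rows, student):
    """Delete every enrollment row belonging to the given student."""
    return [row for row in rows if row["student"] != student]


# Cy was the only student in ML200, so removing Cy's enrollment also
# removes the only record of who teaches ML200: the undesired side effect.
after_delete = drop_student(enrollments, "Cy")
lost_instructor = instructor_of(after_delete, "ML200")

# Normalized alternative: course facts live in their own table,
# so deleting an enrollment leaves them untouched.
courses = {"DB101": "Dr. Lee", "ML200": "Dr. Rao"}
normalized_enrollments = [("Ana", "DB101"), ("Ben", "DB101"), ("Cy", "ML200")]
normalized_after = [(s, c) for (s, c) in normalized_enrollments if s != "Cy"]
kept_instructor = courses.get("ML200")
```

In the first layout the instructor lookup for ML200 comes back empty after the deletion, while the split layout still answers it. This is only one way to show the effect; the same side effect can appear on updates and inserts.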
Participants noted that the impact on models was hard to assess when the ground truth involved live data—for example, Sm2 felt strongly about the negative impact of feedback delays on their ML pipelines: I have no idea how well [models] actually perform on live data. Feedback is always delayed by at least 2 weeks. Sometimes we might not have feedba...
Shreya Shankar • "We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning.
[If] you work in manufacturing, there may well be data that is thrown off by manufacturing devices that you aren’t using to optimize your process.
Thomas H. Davenport • Big Data at Work: Dispelling the Myths, Uncovering the Opportunities
Bad data gives us false negatives (thinking the idea is dead when it’s not) and—more dangerously—false positives (convincing yourself you’re right when you’re not).
Rob Fitzpatrick • The Mom Test: How to talk to customers & learn if your business is a good idea when everyone is lying to you
Global Roadkill Data: a dataset on terrestrial vertebrate mortality caused by collision with vehicles
nature.com