Data
Who will this data model serve? These are the stakeholders and users of the data model.
Why does this data model need to be built? What is the purpose and objective of the data model?
What are the data model’s core entities, attributes, and relationships? This is where you learn to see the business through the lens of data.
When is the timeframe for t...
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
Local database for development
Each table in the database had an accompanying script that would generate a subset of the data for use in local development, since the final database was too large to run on a developer's machine.
This let each developer work with a live, local copy of the database and enabled efficient development of changes.
I highly...
Bill Mill • notes.billmill.org
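A minimal sketch of what one such per-table subset script might look like. This is not Bill Mill's actual script: it assumes SQLite for both the full and the local database, and the function name `copy_subset`, the `company` table, and the row limit are all hypothetical.

```python
import sqlite3


def copy_subset(source, target, table, limit):
    """Copy the first `limit` rows of `table` from `source` into `target`.

    Hypothetical helper: assumes both connections are SQLite and that
    `table` is a trusted name that does not yet exist in `target`.
    """
    # Reuse the CREATE TABLE statement that SQLite stored for the source table.
    ddl = source.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()[0]
    target.execute(ddl)

    # Take a deterministic slice so every developer gets the same local data.
    rows = source.execute(
        f'SELECT * FROM "{table}" ORDER BY rowid LIMIT ?', (limit,)
    ).fetchall()
    if rows:
        placeholders = ", ".join("?" for _ in rows[0])
        target.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})', rows
        )
    target.commit()
    return len(rows)


if __name__ == "__main__":
    # Stand-in for the full database (in memory here, for illustration only).
    full = sqlite3.connect(":memory:")
    full.execute("CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT)")
    full.executemany(
        "INSERT INTO company (name) VALUES (?)",
        [(f"company-{i}",) for i in range(1000)],
    )

    local = sqlite3.connect(":memory:")
    copied = copy_subset(full, local, "company", 50)
    print(f"copied {copied} rows into the local database")
```

Taking the first rows by `rowid` keeps the subset reproducible. A real script would likely also need to preserve foreign-key consistency across tables, which this sketch does not attempt.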
Datasette is a tool for exploring and publishing data. It helps people take data of any shape, analyze and explore it, and publish it as an interactive website and accompanying API.
Datasette is aimed at data journalists, museum curators, archivists, local governments, scientists, researchers and anyone else who has data that they wish to share with...
Datasette
Optimizing Further
Creating so many indices and aggregating so many tables is sub-optimal. To optimize this, we employ materialized views, which create a separate disk-based entity and hence support indexing. The only downside is that we have to keep it updated.
CREATE MATERIALIZED VIEW search_view AS
  SELECT c.name FROM company c UNION
  SELECT c.na...
How Levels.fyi Built Scalable Search with PostgreSQL
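The "keep it updated" trade-off can be illustrated with a small sketch. SQLite has no materialized views, so this swaps in a plain table that is rebuilt on demand. It is an emulation, not the article's PostgreSQL setup. The `city` table and the second `SELECT` are assumptions, since the original query is truncated, and `refresh_search_view` is a hypothetical helper name.

```python
import sqlite3

# Hypothetical query: only the first SELECT mirrors the excerpt above;
# the `city` table is an assumed stand-in for the truncated part.
SEARCH_QUERY = """
    SELECT name FROM company
    UNION
    SELECT name FROM city
"""


def refresh_search_view(conn):
    """Rebuild the `search_view` table from the base tables.

    Emulates a materialized view: the result is stored on disk as a
    regular table, so it can be indexed, but it goes stale until rebuilt.
    """
    conn.execute("DROP TABLE IF EXISTS search_view")
    conn.execute(f"CREATE TABLE search_view AS {SEARCH_QUERY}")
    # Dropping the table also dropped its old index, so recreate it.
    conn.execute("CREATE INDEX search_view_name_idx ON search_view (name)")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE company (name TEXT)")
    conn.execute("CREATE TABLE city (name TEXT)")
    conn.executemany("INSERT INTO company VALUES (?)", [("Acme",), ("Globex",)])
    conn.execute("INSERT INTO city VALUES (?)", ("Austin",))
    refresh_search_view(conn)

    # A new base row is invisible to the stored result until the next refresh.
    conn.execute("INSERT INTO company VALUES (?)", ("Initech",))
    stale = conn.execute("SELECT COUNT(*) FROM search_view").fetchone()[0]
    refresh_search_view(conn)
    fresh = conn.execute("SELECT COUNT(*) FROM search_view").fetchone()[0]
    print(f"rows before refresh: {stale}, after refresh: {fresh}")
```

In PostgreSQL itself the rebuild step corresponds to running `REFRESH MATERIALIZED VIEW search_view;`. How often to run it (on a schedule or after writes) is a design choice the excerpt does not specify.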
Snowflake is easy to use. As you mature and scale, Databricks becomes a strong competitor capable of more use cases. I consider both to have a reasonable moat against AI/LLMs, which is data governance with data lineage. I expect that moat to last at least a few years... even though most companies aren’t mature enough to have a strong data governanc...
r/dataengineering - Reddit
My top issues with the foundations of the "modern data stack":
- Snowflake - you may only pay for what you use, but you pay through the nose for it. The cost is so high that in many cases it can exceed the cost of binders full of DBAs.
- Fivetran - convenient, but they tried to triple my licensing cost last year, they have almost constant 15 minute o
r/dataengineering - Reddit
- Bob Muglia has the best definition. Others simply miss essential parts. The MDS isn’t just about open source or dbt; it is about SaaS, Cloud, Snowflake, and more. It is the wrapper around the progress in analytics over the last years.
- You should try to go for a 100% SaaS MDS. But try not to build up too many dependencies (yes, that’s possible; yo
Sven Balnojan • Breaking Down the Modern Data Stack: Practical Insights for Leveraging Analytics Progress