Data Storage
SQLite Studio
Single binary, single command SQLite database explorer.
sqlite-studio <sqlite_db>
Features
- Overview page with common metadata.
- Tables page with each table's metadata, including the disk size being used by each table.
- Infinite scroll rows view.
- A custom query page that gives you more access to your db.
More features available on the r…
GitHub - frectonz/sqlite-studio: SQLite database explorer
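To have something to point sqlite-studio at, a throwaway database can be generated with Python's built-in sqlite3 module; the file name and schema below are made up for illustration.

```python
import sqlite3

# Create a small throwaway database (demo.db is an arbitrary name)
# that can then be explored with: sqlite-studio demo.db
conn = sqlite3.connect("demo.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [("Ada", "ada@example.com"), ("Grace", "grace@example.com")],
)
conn.commit()
conn.close()
```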
We can't share the exact formula for our search ranking, but here are a few of the parameters we consider (a rough sketch of how they might combine follows below):
- Exact match (rank #1)
- Frequency of matching lexemes using ts_rank
- Similarity score using similarity
- Type of record
- Popularity of the search result
- Similarity between the result’s alias and query
- Inverse of the result’s string length
How Levels.fyi Built Scalable Search with PostgreSQL
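As an illustration only (not Levels.fyi's actual query), here is how several of these signals could be combined in one PostgreSQL ranking query. The profiles table, its columns, and the connection string are invented, and similarity() requires the pg_trgm extension.

```python
import psycopg2

# Hypothetical schema: a "profiles" table with a tsvector column "search_vector"
# and a text column "alias". similarity() and the % operator need: CREATE EXTENSION pg_trgm;
query = """
SELECT id, alias,
       (alias = %(q)s)::int                            AS exact_match,
       ts_rank(search_vector, plainto_tsquery(%(q)s))  AS lexeme_rank,
       similarity(alias, %(q)s)                        AS trigram_sim
FROM profiles
WHERE search_vector @@ plainto_tsquery(%(q)s)
   OR alias %% %(q)s
ORDER BY exact_match DESC, lexeme_rank DESC, trigram_sim DESC,
         length(alias) ASC
LIMIT 20;
"""

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    cur.execute(query, {"q": "software engineer"})
    results = cur.fetchall()
```

Exact matches sort first, then full-text rank, then trigram similarity, with shorter aliases breaking ties, mirroring the parameter list above.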
Local database for development
Each table in the database had an accompanying script that would generate a subset of the data for use in local development, since the final database was too large to run on a developer's machine.
This let each developer work with a live, local copy of the database and enabled efficient development of changes.
Bill Mill • notes.billmill.org
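A minimal sketch of the kind of per-table subset script described above, assuming a Postgres source and a local SQLite target; the orders table, its columns, the sample size, and the connection string are all placeholders.

```python
import sqlite3
import psycopg2

# Hypothetical per-table subset script: copy a recent sample of "orders"
# from the shared database into a local SQLite file for development.
SOURCE_DSN = "host=db.internal dbname=prod user=readonly"   # placeholder
SAMPLE_SQL = """
SELECT id, customer_id, total::float, created_at::text
FROM orders
ORDER BY created_at DESC
LIMIT 10000
"""

with psycopg2.connect(SOURCE_DSN) as src, src.cursor() as cur:
    cur.execute(SAMPLE_SQL)
    rows = cur.fetchall()

local = sqlite3.connect("local_dev.db")
local.execute(
    "CREATE TABLE IF NOT EXISTS orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL, created_at TEXT)"
)
local.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?, ?)", rows)
local.commit()
local.close()
```

The casts to float and text keep the sampled rows directly bindable by sqlite3; a real script would also carry over indexes and foreign keys as needed.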
It turns out there's a handy feature in PostgreSQL called row constructor comparisons that allows me to compare tuples of columns. That's exactly what we need. Instead of doing CreateAt > ?1 OR (CreateAt = ?1 AND Id > ?2), we can do (CreateAt, Id) > (?1, ?2). And the row constructor comparisons are lexicographical, meaning that the first element is compared first and later elements only break ties, which is exactly the original condition.
Making a Postgres query 1,000 times faster
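A short keyset-pagination sketch built on that row constructor comparison, using psycopg2; the posts table and its createat/id/message columns only echo the excerpt's naming and are not taken from the article's code.

```python
import psycopg2

PAGE_SQL = """
SELECT id, createat, message
FROM posts
WHERE (createat, id) > (%s, %s)   -- row constructor comparison
ORDER BY createat, id             -- must match the tuple order for the index to help
LIMIT %s;
"""

def fetch_page(conn, last_createat, last_id, page_size=100):
    # Lexicographic tuple comparison: rows with a later createat, or the same
    # createat and a larger id, i.e. CreateAt > ?1 OR (CreateAt = ?1 AND Id > ?2).
    with conn.cursor() as cur:
        cur.execute(PAGE_SQL, (last_createat, last_id, page_size))
        return cur.fetchall()
```

A composite index on (createat, id) lets Postgres satisfy this predicate with a range scan instead of re-filtering from the start each page.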
Who will this data model serve? These are the stakeholders and users of the data model.
Why does this data model need to be built? What is the purpose and objective of the data model?
What are the data model’s core entities, attributes, and relationships? This is where you learn to see the business through the lens of data.
When is the timeframe for…
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
For High Throughput data, Grab uses Apache Avro with a strategy called Merge on Read (MOR).
Here are the main operations with Merge on Read (a toy sketch follows below):
- Write Operations - When data is written, it's appended to the end of a log file. This is much more efficient than merging it into the current data and reduces the latency of writes.
- Read Operations - When you need to read the data, the base files and the appended log entries are merged at query time, so the merge cost is paid on reads instead of writes.
The Architecture of Grab's Data Lake
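The toy sketch below (pure in-memory Python, no Avro, and not Grab's implementation) only illustrates the Merge-on-Read trade-off: writes are cheap appends to a log, and reads pay for the merge.

```python
# Toy Merge-on-Read sketch: writes append to a log, reads merge base + log.
base = {1: {"id": 1, "status": "created"},
        2: {"id": 2, "status": "created"}}   # compacted "base" records, keyed by id
log = []                                     # append-only write log

def write(record):
    # Write path: O(1) append, no merging at write time.
    log.append(record)

def read_all():
    # Read path: merge the base snapshot with the log; later entries win.
    merged = dict(base)
    for record in log:
        merged[record["id"]] = record
    return list(merged.values())

write({"id": 2, "status": "shipped"})
write({"id": 3, "status": "created"})
print(read_all())
# -> three records, with id 2 reflecting the latest "shipped" write
```

Periodic compaction (folding the log back into the base files) keeps the read-time merge from growing unbounded.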
Spice.ai OSS
What is Spice?
Spice is a small, portable runtime that provides developers with a unified SQL query interface to locally materialize, accelerate, and query data tables sourced from any database, data warehouse, or data lake.
Spice makes it easy to build data-driven and data-intensive applications by streamlining the use of data and…
spiceai • GitHub - spiceai/spiceai: A unified SQL query interface and portable runtime to locally materialize, accelerate, and query data tables sourced from any database, data warehouse, or data lake.
Rottnest: Data Lake Indices
You don't need ElasticSearch or some vector database to do full text search or vector search. Parquet + Rottnest is all you need. Rottnest is like Postgres indices for Parquet. Read more on what it can do for e.g. logs here.
Installation
Local installation: pip install rottnest.
Rottnest supports many different index…
Ziheng Wang • GitHub - marsupialtail/rottnest: Data lake indices
WebDataset
WebDataset is a library for writing I/O pipelines for large datasets. Its sequential I/O and sharding features make it especially useful for streaming large-scale datasets to a DataLoader.
The WebDataset format
A WebDataset file is a TAR archive containing a series of data files. All successive data files with the same prefix are treated as parts of a single training sample.
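A minimal streaming sketch, assuming the webdataset and PyTorch packages and equally sized images; the shard URL pattern and the jpg/cls field names are placeholders.

```python
import webdataset as wds
from torch.utils.data import DataLoader

# Brace notation expands to 100 sequentially numbered TAR shards (placeholder URL).
urls = "https://example.com/shards/train-{000000..000099}.tar"

dataset = (
    wds.WebDataset(urls)        # sequential reads over the TAR shards
    .shuffle(1000)              # buffered shuffling while streaming
    .decode("torchrgb")         # decode images to CHW float tensors
    .to_tuple("jpg", "cls")     # files sharing a prefix become one (image, label) sample
)

# Batching happens inside the pipeline, so the DataLoader passes batches through as-is;
# increase num_workers for real runs.
loader = DataLoader(dataset.batched(16), batch_size=None, num_workers=0)

images, labels = next(iter(loader))
```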