Data Storage
Databases have gotten so good at this that the term is almost misleading now. “Base” suggests something rigid, without which the data would slip away. But the data is always there, just bits on a nameless hard disk. The structure and the accessibility that a modern database provides exist completely independently of that hard disk. That’s right...
DuckDB Doesn’t Need Data To Be a Database
For low-throughput data, Grab uses Parquet with Copy on Write (CoW).
Here are the main operations for Copy on Write:
- Write Operations - Whenever there's a write, you create a new version of the file that includes the latest change. You can also keep the previous version for consistency and rollback purposes. This helps prevent data corruption.
The Architecture of Grab's Data Lake
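
The write path described above can be sketched in a few lines. This is only a toy illustration, not Grab's implementation: JSON files stand in for Parquet so the example needs nothing beyond the standard library, and the `CowTable` class, the `v000001.json` naming scheme, and the `orders` table are all hypothetical.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


class CowTable:
    """Toy copy-on-write table: every write produces a new immutable version file."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _versions(self) -> list[Path]:
        # Zero-padded names sort lexicographically in version order.
        return sorted(self.root.glob("v*.json"))

    def read(self, version: int | None = None) -> dict:
        """Return the rows of one version (the latest by default)."""
        versions = self._versions()
        if not versions:
            return {}
        path = versions[-1] if version is None else self.root / f"v{version:06d}.json"
        return json.loads(path.read_text())

    def write(self, key: str, row: dict) -> int:
        """Copy the latest version, apply the change, and save it as a new file."""
        rows = self.read()
        rows[key] = row
        new_version = len(self._versions()) + 1
        (self.root / f"v{new_version:06d}.json").write_text(json.dumps(rows))
        return new_version


with tempfile.TemporaryDirectory() as tmp:
    table = CowTable(Path(tmp) / "orders")
    v1 = table.write("order-1", {"status": "created"})
    v2 = table.write("order-1", {"status": "paid"})
    latest = table.read()                # sees the newest change
    previous = table.read(version=v1)    # older version kept for rollback
```

Because old version files are never modified, a reader that opened an earlier version keeps a consistent view while a new one is being written. The cost is that each write rewrites the whole file, which is why the approach suits low-throughput data.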
Classwords are suffixes added to database column names to indicate the type of data they contain. This improves readability and makes it easier to understand the database schema. Base classwords include text, calendar, numeric and domain-specific types. It is best to avoid redundancy in column names, as this can lead to unnecessary verbosity. Using...
Gemini - chat to supercharge your ideas
Text Classwords
identifier (or id)
code[_<standard>]
name
description (or desc)
indicator (or ind)
number
text
Calendar Classwords
date
datetime[<timezone>] (or dt[<timezone>])
timestamp[<timezone>] (or ts[<timezone>])
Numeric Classwords
count
amount[_<currency>]
<quantity_property>[_<unit_of_measure>]
ratio
factor
percent (or pct)
Domain-Specific Classwords
uri
address
email
sku
json
geojson
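
One way to put the convention to work is a small check that a column name ends in a known classword. The sketch below is an assumption-heavy illustration: the `classword_of` helper is hypothetical, it treats a single trailing token after the classword as the optional qualifier (standard, currency, timezone), and it does not cover the open-ended `<quantity_property>[_<unit_of_measure>]` form.

```python
from __future__ import annotations

# Classwords transcribed from the lists above (full forms and abbreviations).
CLASSWORDS = {
    "identifier", "id", "code", "name", "description", "desc",
    "indicator", "ind", "number", "text",
    "date", "datetime", "dt", "timestamp", "ts",
    "count", "amount", "ratio", "factor", "percent", "pct",
    "uri", "address", "email", "sku", "json", "geojson",
}


def classword_of(column: str) -> str | None:
    """Return the classword a snake_case column name ends with, or None."""
    tokens = column.lower().split("_")
    # Plain suffix: order_count, customer_email
    if tokens[-1] in CLASSWORDS:
        return tokens[-1]
    # Classword followed by one qualifier: price_amount_usd, created_ts_utc
    if len(tokens) >= 2 and tokens[-2] in CLASSWORDS:
        return tokens[-2]
    return None


checked = {name: classword_of(name) for name in
           ["order_count", "price_amount_usd", "created_ts", "customer"]}
```

A column such as `customer` comes back as `None`, which flags it for renaming (for example to `customer_name` or `customer_id`, depending on what it holds).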
Denormalization
Another way Reddit minimizes joins is by using denormalization.
They took all the metadata fields required for displaying an image post and put them together into a single JSONB field. Instead of fetching different fields and combining them, they can just fetch that single JSONB field.
This made it much more efficient to fetch all the...
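
The single-field pattern can be sketched as follows. This is not Reddit's schema: the `post_media` table, the `metadata_json` column, and the metadata fields are hypothetical, and an SQLite TEXT column holding serialized JSON stands in for a PostgreSQL JSONB column so the example runs with the standard library alone.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post_media ("
    "post_id INTEGER PRIMARY KEY, "
    "metadata_json TEXT NOT NULL)"  # would be a JSONB column in PostgreSQL
)

# Everything needed to render the image post lives in one document.
metadata = {
    "url": "https://example.com/cat.png",
    "width": 1200,
    "height": 800,
    "mime_type": "image/png",
}
conn.execute(
    "INSERT INTO post_media (post_id, metadata_json) VALUES (?, ?)",
    (1, json.dumps(metadata)),
)

# One single-row lookup by primary key, no joins across metadata tables.
row = conn.execute(
    "SELECT metadata_json FROM post_media WHERE post_id = ?", (1,)
).fetchone()
fetched = json.loads(row[0])
```

The trade-off is the usual one for denormalization: reads become a single lookup, while any change to one metadata field means rewriting the whole document.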
Datasette is a tool for exploring and publishing data. It helps people take data of any shape, analyze and explore it, and publish it as an interactive website and accompanying API.
Datasette is aimed at data journalists, museum curators, archivists, local governments, scientists, researchers and anyone else who has data that they wish to share with...
Datasette
SQL Studio
Single binary, single command SQL database explorer. SQL Studio supports SQLite, libSQL, PostgreSQL, MySQL and DuckDB.
Local SQLite DB File
sql-studio sqlite [sqlite_db]
Remote libSQL Server
sql-studio libsql [url] [auth_token]
PostgreSQL Server
sql-studio postgres [url]
MySQL/MariaDB Server
sql-studio mysql [url]
Local DuckDB File
sq...
frectonz • GitHub - frectonz/sql-studio: SQL Database Explorer [SQLite, libSQL, PostgreSQL, MySQL/MariaDB, DuckDB, ClickHouse]
SQL has limitations as it is built on relational concepts and relies on binary joins.
The future of databases is shifting towards relational knowledge graphs, allowing the flexibility to work with various data structures beyond tables.
Businesses are moving towards explicitly modeling business semantics and logic, which are often stored in...
Nicolay Gerold • Tweet
We can't share the exact formula for our search ranking, but here are a few of the parameters we consider:
- Exact match (rank #1)
- Frequency of matching lexemes using ts_rank
- Similarity score using similarity
- Type of record
- Popularity of the search result
- Similarity between the result’s alias and query
- Inverse of the result’s string length
How Levels.fyi Built Scalable Search with PostgreSQL
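
Since the actual formula is not public, the following is only a guess at how such signals might be combined. Every weight, the `Candidate` fields, and the sample records are hypothetical; `ts_rank` and `similarity` are assumed to be scores already computed in PostgreSQL (the latter by the `pg_trgm` extension) and passed in as plain numbers.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    exact_match: bool
    ts_rank: float           # lexeme frequency score from ts_rank
    similarity: float        # trigram similarity in [0, 1]
    type_weight: float       # hypothetical weight for the record type
    popularity: float        # hypothetical normalized popularity in [0, 1]
    alias_similarity: float  # similarity between the alias and the query


def score(c: Candidate) -> float:
    """Combine the signals into one sortable number (weights are made up)."""
    if c.exact_match:
        return float("inf")  # exact matches always rank first
    return (
        2.0 * c.ts_rank
        + 1.5 * c.similarity
        + 1.0 * c.type_weight
        + 1.0 * c.popularity
        + 1.0 * c.alias_similarity
        + 1.0 / max(len(c.name), 1)  # inverse length: shorter names get a boost
    )


candidates = [
    Candidate("Acne Labs", False, 0.0, 0.3, 0.5, 0.1, 0.0),
    Candidate("Acme Cloud Platform", False, 0.3, 0.4, 0.5, 0.6, 0.2),
    Candidate("Acme", True, 0.1, 1.0, 0.5, 0.9, 0.0),
]
ranked = sorted(candidates, key=score, reverse=True)
```

In a real deployment the same expression would more likely live in the SQL `ORDER BY` clause, so that sorting happens inside the database rather than in application code.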
Rottnest: Data Lake Indices
You don't need Elasticsearch or some vector database to do full text search or vector search. Parquet + Rottnest is all you need. Rottnest is like Postgres indices for Parquet. Read more on what it can do, e.g. for logs, here.
Installation
Local installation: pip install rottnest
Rottnest supports many different index...