Data Storage
pg_vectorize: a VectorDB for Postgres
A Postgres extension that automates the transformation and orchestration of text to embeddings and provides hooks into the most popular LLMs. This allows you to do vector search and build LLM applications on existing data with as little as two function calls.
This project relies heavily on the work by pgvector...
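To make "text to embeddings" and "vector search" concrete, here is a minimal conceptual sketch in plain Python. It is not pg_vectorize's API: the `embed`, `cosine`, and `search` helpers and the sample rows are hypothetical, and the bag-of-words "embedding" is only a stand-in for a real embedding model.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (an assumption, standing in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, rows: dict[int, str], k: int = 2) -> list[int]:
    """Return the ids of the k rows whose text is most similar to the query."""
    q = embed(query)
    ranked = sorted(rows, key=lambda rid: cosine(q, embed(rows[rid])), reverse=True)
    return ranked[:k]


# Hypothetical table contents: row id -> text column.
ROWS = {
    1: "wireless noise cancelling headphones",
    2: "stainless steel water bottle",
    3: "bluetooth headphones with microphone",
}

if __name__ == "__main__":
    print(search("headphones with bluetooth", ROWS))
```

An extension like the one described would keep the embeddings next to the rows and refresh them as data changes; this sketch recomputes them on every query purely for brevity.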
GitHub - tembo-io/pg_vectorize: The simplest way to orchestrate vector search on Postgres
SQLite Studio
Single binary, single command SQLite database explorer.
sqlite-studio <sqlite_db>
Features
- Overview page with common metadata.
- Tables page with each table's metadata, including the disk size being used by each table.
- Infinite scroll rows view.
- A custom query page that gives you more access to your db.
More features available on the r...
GitHub - frectonz/sqlite-studio: SQLite database explorer
For high-throughput data, Grab uses Apache Avro with a strategy called Merge on Read (MOR).
Here are the main operations with Merge on Read:
- Write Operations - When data is written, it's appended to the end of a log file. This is much more efficient than merging it into the existing data, and it reduces write latency.
- Read Operations - When you need
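The write path described above can be sketched in a few lines of Python. The read step is cut off in the text, so the `read` method below follows the general Merge on Read pattern (overlay the log on a compacted base at read time) and is an assumption, not Grab's implementation. The `MergeOnReadTable` class, the JSON-lines log, and the sample ride records are all hypothetical; no Avro is involved.

```python
import json
import os
import tempfile


class MergeOnReadTable:
    """Toy table: a compacted base snapshot plus an append-only update log."""

    def __init__(self, base: dict, log_path: str):
        self.base = dict(base)      # key -> record, as of the last compaction
        self.log_path = log_path    # file that new writes are appended to

    def write(self, key: str, record: dict) -> None:
        # Writes only append to the log; the base data is never rewritten here.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "record": record}) + "\n")

    def read(self) -> dict:
        # Assumed read path: replay the log over the base, later entries win.
        merged = dict(self.base)
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    entry = json.loads(line)
                    merged[entry["key"]] = entry["record"]
        return merged


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        table = MergeOnReadTable(
            {"ride-1": {"status": "booked"}},
            os.path.join(tmp, "updates.log"),
        )
        table.write("ride-1", {"status": "completed"})
        table.write("ride-2", {"status": "booked"})
        return table.read()


if __name__ == "__main__":
    print(demo())
```

The trade-off is visible in the code: `write` does constant work per record, while `read` pays for the merge each time until the log is compacted into the base.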
The Architecture of Grab's Data Lake
Datasette is a tool for exploring and publishing data. It helps people take data of any shape, analyze and explore it, and publish it as an interactive website and accompanying API.
Datasette is aimed at data journalists, museum curators, archivists, local governments, scientists, researchers and anyone else who has data that they wish to share with...
Datasette
Who will this data model serve? These are the stakeholders and users of the data model.
Why does this data model need to be built? What is the purpose and objective of the data model?
What are the data model’s core entities, attributes, and relationships? This is where you learn to see the business through the lens of data.
When is the timeframe for...
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
Data bases have gotten so good at this, that the term is almost misleading now. “Base” suggests something rigid, without which the data would slip away. But the data is always there, just bits on a nameless hard disk. The structure and the accessibility that a modern database provides exist completely independently from that hard disk. That’s right...
DuckDB Doesn’t Need Data To Be a Database
7 must-know strategies to scale your database
Indexing:
Check the query patterns of your application and create the right indexes.
Materialized Views:
Pre-compute complex query results and store them for faster access.
Denormalization:
Reduce complex joins to improve query performance.
Vertical Scaling:
Boost your database server by adding more CPU, RAM, or...
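As a small illustration of the indexing strategy, the sketch below uses Python's built-in `sqlite3` module to compare the query plan before and after adding an index that matches the query pattern. SQLite is only a convenient stand-in for whatever database you run, and the `orders` table, its columns, and the index name are hypothetical.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


def demo() -> tuple[str, str]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(i % 100, i * 1.5) for i in range(1000)],
    )

    # The application's query pattern: look up orders by customer.
    sql = "SELECT id, total FROM orders WHERE customer_id = ?"

    before = query_plan(conn, sql, (42,))   # no suitable index: full table scan
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = query_plan(conn, sql, (42,))    # planner can now search via the index

    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before:", before)
    print("after: ", after)
```

The exact plan wording varies between SQLite versions, which is why the example prints the plans rather than asserting a specific string.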
At the current pace of media content creation, Reddit expects their media metadata to be roughly 50 terabytes. This means they need to implement sharding and partition their tables across multiple Postgres instances.
Reddit shards their tables based on post_id where they use range-based partitioning. All posts with a post_id in a certain range will...
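Range-based partitioning on an id can be sketched as a lookup against sorted range boundaries. The boundaries, shard names, and the `shard_for` helper below are hypothetical and are not Reddit's actual configuration; the sketch assumes each range maps to exactly one Postgres instance.

```python
from bisect import bisect_right

# Hypothetical layout: shard i holds post_ids from SHARD_LOWER_BOUNDS[i]
# up to (but not including) the next bound; the last shard is open-ended.
SHARD_LOWER_BOUNDS = [0, 1_000_000, 2_000_000]
SHARD_NAMES = ["pg-shard-0", "pg-shard-1", "pg-shard-2"]


def shard_for(post_id: int) -> str:
    """Route a post_id to the shard whose range contains it."""
    if post_id < SHARD_LOWER_BOUNDS[0]:
        raise ValueError(f"post_id {post_id} is below the first range")
    # bisect_right counts how many lower bounds are <= post_id.
    index = bisect_right(SHARD_LOWER_BOUNDS, post_id) - 1
    return SHARD_NAMES[index]


if __name__ == "__main__":
    for pid in (0, 999_999, 1_000_000, 5_000_000):
        print(pid, "->", shard_for(pid))
```

Because neighbouring ids land on the same shard, a query over a contiguous span of post_ids touches few instances; the cost is that the newest range tends to receive most of the writes.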
pgmock
pgmock is an in-memory PostgreSQL mock server for unit and E2E tests. It requires no external dependencies and runs entirely within WebAssembly on both Node.js and the browser.
Installation
npm install pgmock
If you'd like to run pgmock in a browser, see the Browser support section for detailed instructions.