Data Storage
Overview
pg_lakehouse is an extension that transforms Postgres into an analytical query engine over object stores like S3 and table formats like Delta Lake. Queries are pushed down to Apache DataFusion, which delivers excellent analytical performance. Combinations of the following object stores, table formats, and file formats are supported.
Object...
https://github.com/paradedb/paradedb/tree/dev/pg_l...
Spice.ai OSS
What is Spice?
Spice is a small, portable runtime that provides developers with a unified SQL query interface to locally materialize, accelerate, and query data tables sourced from any database, data warehouse, or data lake.
Spice makes it easy to build data-driven and data-intensive applications by streamlining the use of data and...
spiceai • GitHub - spiceai/spiceai: A unified SQL query interface and portable runtime to locally materialize, accelerate, and query data tables sourced from any database, data warehouse, or data lake.
For high-throughput data, Grab uses Apache Avro with a strategy called Merge on Read (MOR).
Here are the main operations with Merge on Read:
- Write Operations - When data is written, it's appended to the end of a log file. This is much more efficient than merging it into the current data, and it reduces write latency.
- Read Operations - When you need
The Architecture of Grab's Data Lake
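The Merge on Read idea can be sketched in a few lines of Python. This is a toy in-memory model, not Grab's implementation: the class name, the dictionary-based base, and the compaction step are all assumptions for illustration. The read-side description is truncated in the excerpt above, so the sketch assumes the usual MOR behaviour, where a read merges the base data with the pending log entries.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MergeOnReadTable:
    """Toy Merge on Read table: a compacted base plus an append-only log."""

    # Compacted records, keyed by record key (hypothetical layout).
    base: dict[str, dict] = field(default_factory=dict)
    # Pending writes as (key, record) pairs, in arrival order.
    log: list[tuple[str, dict]] = field(default_factory=list)

    def write(self, key: str, record: dict) -> None:
        # A write only appends to the log; the base is left untouched.
        self.log.append((key, record))

    def read(self) -> dict[str, dict]:
        # A read merges the log into a copy of the base; later entries win.
        merged = dict(self.base)
        for key, record in self.log:
            merged[key] = record
        return merged

    def compact(self) -> None:
        # Assumed background step: fold the log into the base, then clear it.
        self.base = self.read()
        self.log.clear()
```

The trade-off is visible in the method bodies: `write` does constant work, while `read` pays for the merge until `compact` runs.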
Data bases have gotten so good at this, that the term is almost misleading now. “Base” suggests something rigid, without which the data would slip away. But the data is always there, just bits on a nameless hard disk. The structure and the accessibility that a modern database provides exist completely independently from that hard disk. That’s right...
DuckDB Doesn’t Need Data To Be a Database
Denormalization
Another way Reddit minimizes joins is by using denormalization.
They took all the metadata fields required for displaying an image post and put them together into a single JSONB field. Instead of fetching different fields and combining them, they can just fetch that single JSONB field.
This made it much more efficient to fetch all the...
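A minimal sketch of the denormalization described above, using plain Python dictionaries in place of database rows. The field names, URLs, and the `denormalize` helper are hypothetical; the excerpt does not show Reddit's actual schema.

```python
import json

# Hypothetical normalized rows, as they might come from separate tables.
image = {"post_id": 42, "url": "https://example.com/cat.png", "width": 800, "height": 600}
thumbnail = {"post_id": 42, "url": "https://example.com/cat_thumb.png"}


def denormalize(image: dict, thumbnail: dict) -> str:
    """Pack the fields needed to render an image post into one JSON document."""
    metadata = {
        "image": {k: v for k, v in image.items() if k != "post_id"},
        "thumbnail": {k: v for k, v in thumbnail.items() if k != "post_id"},
    }
    return json.dumps(metadata, sort_keys=True)


# Write path: build the document once and store it with the post,
# for example in a single JSONB column.
stored = {"post_id": 42, "metadata": denormalize(image, thumbnail)}

# Read path: one lookup and one parse, with no joins.
rendered = json.loads(stored["metadata"])
```

The cost moves to the write path: whenever a source field changes, the combined document has to be rebuilt.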
pgmock
pgmock is an in-memory PostgreSQL mock server for unit and E2E tests. It requires no external dependencies and runs entirely within WebAssembly on both Node.js and the browser.
Installation
npm install pgmock
If you'd like to run pgmock in a browser, see the Browser support section for detailed instructions.
stackframe-projects • GitHub - stackframe-projects/pgmock: In-memory Postgres for unit/E2E tests
ReadySet is a transparent database cache for Postgres & MySQL that gives you the performance and scalability of an in-memory key-value store without requiring that you rewrite your app or manually handle cache invalidation. ReadySet sits between your application and database and turns even the most complex SQL reads into lightning-fast lookups....
readysettech • GitHub - readysettech/readyset: Readyset is a MySQL and Postgres wire-compatible caching layer that sits in front of existing databases to speed up queries and horizontally scale read throughput. Under the...
We can't share the exact formula for our search ranking, but here are the few parameters we consider:
- Exact match (rank #1)
- Frequency of matching lexemes using ts_rank
- Similarity score using similarity
- Type of record
- Popularity of the search result
- Similarity between the result’s alias and query
- Inverse of the result’s string length
How Levels.fyi Built Scalable Search with PostgreSQL
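One way to combine the listed parameters into a single ordering is a weighted sum, with exact matches forced to the top. The sketch below is illustrative only: the real formula is not public, so every weight, the `Candidate` fields, and the `score` and `rank` helpers are assumptions. The `ts_rank` and `similarity` values are taken as precomputed inputs, for example from the Postgres functions of the same names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    ts_rank: float           # lexeme frequency score, e.g. from Postgres ts_rank
    similarity: float        # similarity between name and query, 0..1
    alias_similarity: float  # similarity between the result's alias and query, 0..1
    type_boost: float        # hypothetical weight for the type of record
    popularity: float        # hypothetical popularity signal, normalized to 0..1


def score(c: Candidate, query: str) -> float:
    # Exact matches rank #1, so give them an unbeatable score.
    if c.name.lower() == query.lower():
        return float("inf")
    # Hypothetical weights; tune them against real search logs.
    return (
        1.0 * c.ts_rank
        + 1.0 * c.similarity
        + 0.5 * c.alias_similarity
        + 0.5 * c.type_boost
        + 0.5 * c.popularity
        + 1.0 / max(len(c.name), 1)  # inverse of the result's string length
    )


def rank(candidates: list[Candidate], query: str) -> list[Candidate]:
    # Highest score first.
    return sorted(candidates, key=lambda c: score(c, query), reverse=True)
```

The inverse-length term acts as a tie-breaker: when two results score the same on the other signals, the shorter name wins.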
memary: Open-Source Longterm Memory for Autonomous Agents
Why use memary?
Agents use LLMs that are currently constrained to finite context windows. memary overcomes this limitation by allowing your agents to store a large corpus of information in knowledge graphs, infer user knowledge through our memory modules, and only retrieve...
GitHub - kingjulio8238/memary: Longterm Memory for Autonomous Agents.
Data