Data Storage
Rottnest : Data Lake Indices
You don't need ElasticSearch or some vector database to do full text search or vector search. Parquet + Rottnest is all you need. Rottnest is like Postgres indices for Parquet. Read more on what it can do for e.g. logs here.
Installation
Local installation: pip install rottnest .
Rottnest supports many different index... See more
You don't need ElasticSearch or some vector database to do full text search or vector search. Parquet + Rottnest is all you need. Rottnest is like Postgres indices for Parquet. Read more on what it can do for e.g. logs here.
Installation
Local installation: pip install rottnest .
Rottnest supports many different index... See more
Ziheng Wang • GitHub - marsupialtail/rottnest: Data lake indices
PGlite - Postgres in WASM
PGlite is a WASM Postgres build packaged into a TypeScript client library that enables you to run Postgres in the browser, Node.js and Bun, with no need to install any other dependencies. It is only 3.7mb gzipped.
import { PGlite } from "@electric-sql/pglite"
const db = new PGlite()
await db.query("select 'Hello world' as... See more
PGlite is a WASM Postgres build packaged into a TypeScript client library that enables you to run Postgres in the browser, Node.js and Bun, with no need to install any other dependencies. It is only 3.7mb gzipped.
import { PGlite } from "@electric-sql/pglite"
const db = new PGlite()
await db.query("select 'Hello world' as... See more
electric-sql • GitHub - electric-sql/pglite: Lightweight Postgres packaged as WASM into a TypeScript library for the browser, Node.js, Bun and Deno
filesystem_spec
A specification for pythonic filesystems.
Install
pip install fsspec
would install the base fsspec. Various optionally supported features might require specification of custom extra require, e.g. pip install fsspec[ssh] will install dependencies for ssh backends support. Use pip install fsspec[full] for installation of all known... See more
A specification for pythonic filesystems.
Install
pip install fsspec
would install the base fsspec. Various optionally supported features might require specification of custom extra require, e.g. pip install fsspec[ssh] will install dependencies for ssh backends support. Use pip install fsspec[full] for installation of all known... See more
fsspec • GitHub - fsspec/filesystem_spec: A specification that python filesystems should adhere to.
For High Throughput data, Grab uses Apache Avro with a strategy called Merge on Read (MOR) .
Here's the main operations with Merge on Read:
Here's the main operations with Merge on Read:
- Write Operations - When data is written, it's appended to the end of a log file. This is much more efficient than merging it in the current data and reduces the latency of writes.
- Read Operations - When you need
The Architecture of Grab's Data Lake
Denormalization
Another way Reddit minimizes joins is by using denormalization.
They took all the metadata fields required for displaying an image post and put them together into a single JSONB field. Instead of fetching different fields and combining them, they can just fetch that single JSONB field.
This made it much more efficient to fetch all the... See more
Another way Reddit minimizes joins is by using denormalization.
They took all the metadata fields required for displaying an image post and put them together into a single JSONB field. Instead of fetching different fields and combining them, they can just fetch that single JSONB field.
This made it much more efficient to fetch all the... See more
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
Data bases have gotten so good at this, that the term is almost misleading now. “Base” suggests something rigid, without which the data would slip away. But the data is always there, just bits on a nameless hard disk. The structure and the accessibility that a modern database provides exist completely independently from that hard disk. That’s right... See more
DuckDB Doesn’t Need Data To Be a Database
We can't share the exact formula for our search ranking, but here are the few parameters we consider:
- Exact match (rank #1)
- Frequency of matching lexemes using ts_rank
- Similarity score using similarity
- Type of record
- Popularity of the search result
- Similarity between the result’s alias and query
- Inverse of the result’s string length
How Levels.fyi Built Scalable Search with PostgreSQL
memary: Open-Source Longterm Memory for Autonomous Agents
memary demo
Why use memary?
Agents use LLMs that are currently constrained to finite context windows. memary overcomes this limitation by allowing your agents to store a large corpus of information in knowledge graphs, infer user knowledge through our memory modules, and only retrieve... See more
memary demo
Why use memary?
Agents use LLMs that are currently constrained to finite context windows. memary overcomes this limitation by allowing your agents to store a large corpus of information in knowledge graphs, infer user knowledge through our memory modules, and only retrieve... See more
GitHub - kingjulio8238/memary: Longterm Memory for Autonomous Agents.
Data
pgmock
Demo — Discord
pgmock is an in-memory PostgreSQL mock server for unit and E2E tests. It requires no external dependencies and runs entirely within WebAssembly on both Node.js and the browser.
Installation
npm install pgmock
If you'd like to run pgmock in a browser, see the Browser support section for detailed instructions.
Demo — Discord
pgmock is an in-memory PostgreSQL mock server for unit and E2E tests. It requires no external dependencies and runs entirely within WebAssembly on both Node.js and the browser.
Installation
npm install pgmock
If you'd like to run pgmock in a browser, see the Browser support section for detailed instructions.