Data Storage
Rottnest : Data Lake Indices
You don't need ElasticSearch or some vector database to do full text search or vector search. Parquet + Rottnest is all you need. Rottnest is like Postgres indices for Parquet. Read more on what it can do for e.g. logs here.
Installation
Local installation: pip install rottnest .
Rottnest supports many different index... See more
You don't need ElasticSearch or some vector database to do full text search or vector search. Parquet + Rottnest is all you need. Rottnest is like Postgres indices for Parquet. Read more on what it can do for e.g. logs here.
Installation
Local installation: pip install rottnest .
Rottnest supports many different index... See more
Ziheng Wang • GitHub - marsupialtail/rottnest: Data lake indices
Unlike some other popular algorithms, DiskANN is designed to keep memory usage to a minimum. This makes it a great match for use cases where Turso already excels at.
#Multitenancy
Turso allows for an easy implementation of a database-per-tenant pattern, where databases can be cheaply created on-demand. Keeping memory consumption at bay is critical... See more
#Multitenancy
Turso allows for an easy implementation of a database-per-tenant pattern, where databases can be cheaply created on-demand. Keeping memory consumption at bay is critical... See more
Turso brings Native Vector Search to SQLite
PGlite - Postgres in WASM
PGlite is a WASM Postgres build packaged into a TypeScript client library that enables you to run Postgres in the browser, Node.js and Bun, with no need to install any other dependencies. It is only 3.7mb gzipped.
import { PGlite } from "@electric-sql/pglite"
const db = new PGlite()
await db.query("select 'Hello world' as... See more
PGlite is a WASM Postgres build packaged into a TypeScript client library that enables you to run Postgres in the browser, Node.js and Bun, with no need to install any other dependencies. It is only 3.7mb gzipped.
import { PGlite } from "@electric-sql/pglite"
const db = new PGlite()
await db.query("select 'Hello world' as... See more
electric-sql • GitHub - electric-sql/pglite: Lightweight Postgres packaged as WASM into a TypeScript library for the browser, Node.js, Bun and Deno
For High Throughput data, Grab uses Apache Avro with a strategy called Merge on Read (MOR) .
Here's the main operations with Merge on Read:
Here's the main operations with Merge on Read:
- Write Operations - When data is written, it's appended to the end of a log file. This is much more efficient than merging it in the current data and reduces the latency of writes.
- Read Operations - When you need
The Architecture of Grab's Data Lake
VectorDB-recipes
Dive into building GenAI applications! This repository contains examples, applications, starter code, & tutorials to help you kickstart your GenAI projects.
Dive into building GenAI applications! This repository contains examples, applications, starter code, & tutorials to help you kickstart your GenAI projects.
- These are built using LanceDB, a free, open-source, serverless vectorDB that requires no setup .
- It integrates into python data ecosystem so you can simply start using these in
lancedb • GitHub - lancedb/vectordb-recipes: High quality resources & applications for LLMs, multi-modal models and VectorDBs
It turns out there's a handy feature in PostgreSQL called row constructor comparisons that allows me to compare tuples of columns. That's exactly what we need. Instead of doing CreateAt > ?1 OR (CreateAt = ?1 AND Id > ?2) , we can do ( CreateAt, Id) > (?1, ?2) . And the row constructor comparisons are lexicographical, meaning that it's... See more
Making a Postgres query 1,000 times faster
ReadySet is a transparent database cache for Postgres & MySQL that gives you the performance and scalability of an in-memory key-value store without requiring that you rewrite your app or manually handle cache invalidation. ReadySet sits between your application and database and turns even the most complex SQL reads into lightning-fast lookups.... See more
readysettech • GitHub - readysettech/readyset: Readyset is a MySQL and Postgres wire-compatible caching layer that sits in front of existing databases to speed up queries and horizontally scale read throughput. Under the...
Getting Started
Keyv is a simple key-value storage system that supports multiple backends. It's designed to be a simple and consistent way to work with key-value stores.
To learn how to use Keyv, check out the keyv README. To learn how to use a specific storage adapter, check out the README for that adapter under Storage Adapters.
Keyv is a simple key-value storage system that supports multiple backends. It's designed to be a simple and consistent way to work with key-value stores.
To learn how to use Keyv, check out the keyv README. To learn how to use a specific storage adapter, check out the README for that adapter under Storage Adapters.
jaredwray • GitHub - jaredwray/keyv: Simple key-value storage with support for multiple backends
Expose Delta Tables via REST APIs
Git repo to test 3 architectures to expose delta tables via REST APIs. See also my blogpost here. Architectures can be described as follows:
Git repo to test 3 architectures to expose delta tables via REST APIs. See also my blogpost here. Architectures can be described as follows:
- Architecture A: Direct, Web App with DuckDB. In this architecture, APIs are directly connecting to the delta table and there is no layer in between. This implies that all data