Data Storage
This README is under construction as we work to build a new community driven high performance key-value store.
This project was forked from the open source Redis project right before the transition to their new source available licenses.
This README is just a fast quick start document. We are currently working on a more permanent documentation page.
W... See more
This project was forked from the open source Redis project right before the transition to their new source available licenses.
This README is just a fast quick start document. We are currently working on a more permanent documentation page.
W... See more
GitHub - valkey-io/valkey: A new project to resume development on the formerly open-source Redis project. We're calling it Valkey, like a Valkyrie.
WebDataset
WebDataset is a library for writing I/O pipelines for large datasets. Its sequential I/O and sharding features make it especially useful for streaming large-scale datasets to a DataLoader.
The WebDataset format
A WebDataset file is a TAR archive containing a series of data files. All successive data files with the same prefix are... See more
WebDataset is a library for writing I/O pipelines for large datasets. Its sequential I/O and sharding features make it especially useful for streaming large-scale datasets to a DataLoader.
The WebDataset format
A WebDataset file is a TAR archive containing a series of data files. All successive data files with the same prefix are... See more
WebDataset
Expose Delta Tables via REST APIs
Git repo to test 3 architectures to expose delta tables via REST APIs. See also my blogpost here. Architectures can be described as follows:
Git repo to test 3 architectures to expose delta tables via REST APIs. See also my blogpost here. Architectures can be described as follows:
- Architecture A: Direct, Web App with DuckDB. In this architecture, APIs are directly connecting to the delta table and there is no layer in between. This implies that all data
GitHub - rebremer/expose-deltatable-via-restapi
memary: Open-Source Longterm Memory for Autonomous Agents
memary demo
Why use memary?
Agents use LLMs that are currently constrained to finite context windows. memary overcomes this limitation by allowing your agents to store a large corpus of information in knowledge graphs, infer user knowledge through our memory modules, and only retrieve... See more
memary demo
Why use memary?
Agents use LLMs that are currently constrained to finite context windows. memary overcomes this limitation by allowing your agents to store a large corpus of information in knowledge graphs, infer user knowledge through our memory modules, and only retrieve... See more
GitHub - kingjulio8238/memary: Longterm Memory for Autonomous Agents.
Data
We can't share the exact formula for our search ranking, but here are the few parameters we consider:
- Exact match (rank #1)
- Frequency of matching lexemes using ts_rank
- Similarity score using similarity
- Type of record
- Popularity of the search result
- Similarity between the result’s alias and query
- Inverse of the result’s string length
How Levels.fyi Built Scalable Search with PostgreSQL
PGlite - Postgres in WASM
PGlite is a WASM Postgres build packaged into a TypeScript client library that enables you to run Postgres in the browser, Node.js and Bun, with no need to install any other dependencies. It is only 3.7mb gzipped.
import { PGlite } from "@electric-sql/pglite"
const db = new PGlite()
await db.query("select 'Hello world' as... See more
PGlite is a WASM Postgres build packaged into a TypeScript client library that enables you to run Postgres in the browser, Node.js and Bun, with no need to install any other dependencies. It is only 3.7mb gzipped.
import { PGlite } from "@electric-sql/pglite"
const db = new PGlite()
await db.query("select 'Hello world' as... See more
electric-sql • GitHub - electric-sql/pglite: Lightweight Postgres packaged as WASM into a TypeScript library for the browser, Node.js, Bun and Deno
Overview
pg_lakehouse is an extension that transforms Postgres into an analytical query engine over object stores like S3 and table formats like Delta Lake. Queries are pushed down to Apache DataFusion, which delivers excellent analytical performance. Combinations of the following object stores, table formats, and file formats are supported.
Object... See more
pg_lakehouse is an extension that transforms Postgres into an analytical query engine over object stores like S3 and table formats like Delta Lake. Queries are pushed down to Apache DataFusion, which delivers excellent analytical performance. Combinations of the following object stores, table formats, and file formats are supported.
Object... See more
https://github.com/paradedb/paradedb/tree/dev/pg_l...
- Scalability is crucial - systems need to be designed with the assumption that query volume, document corpus size, indexing complexity etc. could increase by 10x. What works at one scale may completely break at a higher scale.
- Sharding the index, either by document or by word, is important to distribute the indexing and querying load across machines.
Claude
Search
more
/
Sub-second search & analytics
engine on cloud storage
with
less
more
/
Sub-second search & analytics
engine on cloud storage
with
less