Data Storage
We can't share the exact formula for our search ranking, but here are the few parameters we consider:
- Exact match (rank #1)
- Frequency of matching lexemes using ts_rank
- Similarity score using similarity
- Type of record
- Popularity of the search result
- Similarity between the result’s alias and query
- Inverse of the result’s string length
How Levels.fyi Built Scalable Search with PostgreSQL
Denormalization
Another way Reddit minimizes joins is by using denormalization.
They took all the metadata fields required for displaying an image post and put them together into a single JSONB field. Instead of fetching different fields and combining them, they can just fetch that single JSONB field.
This made it much more efficient to fetch all the ... See more
Another way Reddit minimizes joins is by using denormalization.
They took all the metadata fields required for displaying an image post and put them together into a single JSONB field. Instead of fetching different fields and combining them, they can just fetch that single JSONB field.
This made it much more efficient to fetch all the ... See more
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
- Scalability is crucial - systems need to be designed with the assumption that query volume, document corpus size, indexing complexity etc. could increase by 10x. What works at one scale may completely break at a higher scale.
- Sharding the index, either by document or by word, is important to distribute the indexing and querying load across machines.
Claude
VectorDB-recipes
Dive into building GenAI applications! This repository contains examples, applications, starter code, & tutorials to help you kickstart your GenAI projects.
Dive into building GenAI applications! This repository contains examples, applications, starter code, & tutorials to help you kickstart your GenAI projects.
- These are built using LanceDB, a free, open-source, serverless vectorDB that requires no setup .
- It integrates into python data ecosystem so you can simply start using these
lancedb • GitHub - lancedb/vectordb-recipes: High quality resources & applications for LLMs, multi-modal models and VectorDBs
Datasette is a tool for exploring and publishing data. It helps people take data of any shape, analyze and explore it, and publish it as an interactive website and accompanying API.
Datasette is aimed at data journalists, museum curators, archivists, local governments, scientists, researchers and anyone else who has data that they wish to share with... See more
Datasette is aimed at data journalists, museum curators, archivists, local governments, scientists, researchers and anyone else who has data that they wish to share with... See more
Datasette
Overview
pg_lakehouse is an extension that transforms Postgres into an analytical query engine over object stores like S3 and table formats like Delta Lake. Queries are pushed down to Apache DataFusion, which delivers excellent analytical performance. Combinations of the following object stores, table formats, and file formats are supported.
Object ... See more
pg_lakehouse is an extension that transforms Postgres into an analytical query engine over object stores like S3 and table formats like Delta Lake. Queries are pushed down to Apache DataFusion, which delivers excellent analytical performance. Combinations of the following object stores, table formats, and file formats are supported.
Object ... See more
https://github.com/paradedb/paradedb/tree/dev/pg_l...
7 must-know strategies to scale your database
Indexing:
Check the query patterns of your application and create the right indexes.
Materialized Views:
Pre-compute complex query results and store them for faster access.
Denormalization:
Reduce complex joins to improve query performance.
Vertical Scaling
Boost your database server by adding more CPU, RAM, or... See more
Indexing:
Check the query patterns of your application and create the right indexes.
Materialized Views:
Pre-compute complex query results and store them for faster access.
Denormalization:
Reduce complex joins to improve query performance.
Vertical Scaling
Boost your database server by adding more CPU, RAM, or... See more
Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]
ReadySet is a transparent database cache for Postgres & MySQL that gives you the performance and scalability of an in-memory key-value store without requiring that you rewrite your app or manually handle cache invalidation. ReadySet sits between your application and database and turns even the most complex SQL reads into lightning-fast lookups.... See more
readysettech • GitHub - readysettech/readyset: Readyset is a MySQL and Postgres wire-compatible caching layer that sits in front of existing databases to speed up queries and horizontally scale read throughput. Under the...
Getting Started
Keyv is a simple key-value storage system that supports multiple backends. It's designed to be a simple and consistent way to work with key-value stores.
To learn how to use Keyv, check out the keyv README. To learn how to use a specific storage adapter, check out the README for that adapter under Storage Adapters.
Keyv is a simple key-value storage system that supports multiple backends. It's designed to be a simple and consistent way to work with key-value stores.
To learn how to use Keyv, check out the keyv README. To learn how to use a specific storage adapter, check out the README for that adapter under Storage Adapters.