Data Storage
A serverless vector database
built from first principles on object storage: 10-100x cheaper, usage-based pricing, massive scalability
turbopuffer
Databases have gotten so good at this that the term is almost misleading now. “Base” suggests something rigid, without which the data would slip away. But the data is always there, just bits on a nameless hard disk. The structure and the accessibility that a modern database provides exist completely independently from that hard disk. That’s right...
DuckDB Doesn’t Need Data To Be a Database
WebDataset
WebDataset is a library for writing I/O pipelines for large datasets. Its sequential I/O and sharding features make it especially useful for streaming large-scale datasets to a DataLoader.
The WebDataset format
A WebDataset file is a TAR archive containing a series of data files. All successive data files with the same prefix are...
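The layout described above can be sketched with nothing but Python's standard `tarfile` module. This is a minimal illustration, not the WebDataset library's own API: the helper names `write_shard` and `read_shard` are hypothetical, and the sketch assumes that successive files sharing a prefix (the part of the name before the first dot) belong to one sample, which is how the format is commonly described.

```python
import io
import tarfile
from itertools import groupby


def write_shard(samples):
    """Write samples to an in-memory TAR archive.

    Each sample is a dict with a "__key__" entry (the shared prefix)
    and one bytes payload per file extension.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for sample in samples:
            key = sample["__key__"]
            for ext, payload in sample.items():
                if ext == "__key__":
                    continue
                # Files of one sample are written back to back: key.ext
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_shard(data):
    """Read the archive sequentially and group consecutive files by prefix."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        members = [(m.name, tar.extractfile(m).read()) for m in tar if m.isfile()]

    def prefix(item):
        return item[0].split(".", 1)[0]

    samples = []
    for key, group in groupby(members, key=prefix):
        sample = {"__key__": key}
        for name, payload in group:
            sample[name.split(".", 1)[1]] = payload
        samples.append(sample)
    return samples
```

Because grouping only looks at neighbouring entries, the reader never needs random access into the archive, which is what makes the format suitable for sequential streaming.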
WebDataset
VectorDB-recipes
Dive into building GenAI applications! This repository contains examples, applications, starter code, & tutorials to help you kickstart your GenAI projects.
- These are built using LanceDB, a free, open-source, serverless vectorDB that requires no setup.
- It integrates into the Python data ecosystem so you can simply start using these in
lancedb • GitHub - lancedb/vectordb-recipes: High quality resources & applications for LLMs, multi-modal models and VectorDBs
Getting Started
Keyv is a simple key-value storage system that supports multiple backends. It's designed to be a simple and consistent way to work with key-value stores.
To learn how to use Keyv, check out the keyv README. To learn how to use a specific storage adapter, check out the README for that adapter under Storage Adapters.
jaredwray • GitHub - jaredwray/keyv: Simple key-value storage with support for multiple backends
Who will this data model serve? These are the stakeholders and users of the data model.
Why does this data model need to be built? What is the purpose and objective of the data model?
What are the data model’s core entities, attributes, and relationships? This is where you learn to see the business through the lens of data.
When is the timeframe for...
Local database for development
Each table in the database had an accompanying script that would generate a subset of the data for use in local development, since the final database was too large to run on a developer's machine.
This let each developer work with a live, local, copy of the database and enabled efficient development of changes.
I highly...
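A per-table subset script of the kind described above might look like the following. This is only a sketch under stated assumptions: it uses SQLite through Python's standard `sqlite3` module, the function name `make_subset` is hypothetical, and "the first N rows" stands in for whatever sampling rule the original scripts used, which the note does not say.

```python
import sqlite3


def make_subset(source, dest, table, limit):
    """Copy a table's schema and its first `limit` rows from source to dest.

    `source` and `dest` are open sqlite3 connections; `table` must be a
    trusted table name, since it is interpolated into the SQL text.
    Returns the number of rows copied.
    """
    # Reuse the CREATE TABLE statement that SQLite stores for the table.
    (ddl,) = source.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    dest.execute(ddl)

    # Take a small, deterministic slice of the data.
    rows = source.execute(
        f'SELECT * FROM "{table}" ORDER BY rowid LIMIT ?', (limit,)
    ).fetchall()
    if rows:
        placeholders = ", ".join("?" for _ in rows[0])
        dest.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    dest.commit()
    return len(rows)
```

Calling `make_subset(prod, dev, "users", 50)` once per table would give each developer a small local database with the real schema. Tables linked by foreign keys would need a sampling rule that keeps referenced rows together, which this sketch does not attempt.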
Bill Mill • notes.billmill.org
With Quary, engineers can:
- 🔌 Connect to their Database
- 📖 Write SQL queries to transform, organize, and document tables in a database
- 📊 Create charts, dashboards and reports (in development)
- 🧪 Test, collaborate & refactor iteratively through version control
- 🚀 Deploy the organised, documented model back up to the database
View the documentation.
GitHub - quarylabs/quary: Open-source BI for engineers
For high-throughput data, Grab uses Apache Avro with a strategy called Merge on Read (MOR).
Here are the main operations with Merge on Read:
- Write Operations - When data is written, it's appended to the end of a log file. This is much more efficient than merging it into the current data and reduces the latency of writes.
- Read Operations - When you need
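The Merge on Read idea can be sketched in a few lines of Python. This is a toy in-memory model, not Grab's implementation: the class and method names are hypothetical, and because the read-side description above is cut off, the sketch assumes the usual meaning of the term, namely that a read combines the compacted base data with the pending log entries, with later entries winning.

```python
class MergeOnReadTable:
    """Toy model of a Merge on Read table (all names are illustrative)."""

    def __init__(self):
        self.base = {}  # compacted records, keyed by record id
        self.log = []   # append-only list of (key, record) writes

    def write(self, key, record):
        # Writes only append to the log; existing data is never rewritten,
        # which is what keeps write latency low.
        self.log.append((key, record))

    def read(self):
        # Reads pay the merge cost: start from the base data and replay
        # the log in order so the most recent write for each key wins.
        merged = dict(self.base)
        for key, record in self.log:
            merged[key] = record
        return merged

    def compact(self):
        # Assumed background step: fold the log into the base data so
        # later reads have less to merge.
        self.base = self.read()
        self.log = []
```

The trade-off is visible in the structure: `write` is a constant-time append, while `read` grows more expensive as the log gets longer until a compaction resets it.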