GitHub - spiceai/spiceai: A unified SQL query interface and portable runtime to locally materialize, accelerate, and query data tables sourced from any database, data warehouse, or data lake.
An in-process SQL OLAP database management system
duckdb.org
Databases have gotten so good at this that the term is almost misleading now. “Base” suggests something rigid, without which the data would slip away. But the data is always there, just bits on a nameless hard disk. The structure and the accessibility that a modern database provides exist completely independently from that hard disk. That’s right...
DuckDB Doesn’t Need Data To Be a Database
Nicolay Gerold added
The language is always only as good as its community. Let’s look at some of the existing open-source tools and frameworks built in and around Rust:
- DataFusion based on Apache Arrow: Apache Arrow DataFusion SQL Query Engine similar to Spark
- Polars: It’s a faster Pandas. Probably going to compete with DuckDB (?)
- Delta Lake Rust: A native Rust library fo
Data Engineering • Rust for Data Engineering
Nicolay Gerold added
VectorDB-recipes
Dive into building GenAI applications! This repository contains examples, applications, starter code, & tutorials to help you kickstart your GenAI projects.
- These are built using LanceDB, a free, open-source, serverless vectorDB that requires no setup.
- It integrates into the Python data ecosystem so you can simply start using these
lancedb • GitHub - lancedb/vectordb-recipes: High quality resources & applications for LLMs, multi-modal models and VectorDBs
Nicolay Gerold added
pg_vectorize: a VectorDB for Postgres
A Postgres extension that automates the transformation and orchestration of text to embeddings and provides hooks into the most popular LLMs. This allows you to do vector search and build LLM applications on existing data with as little as two function calls.
This project relies heavily on the work by pgvector f...
GitHub - tembo-io/pg_vectorize: The simplest way to orchestrate vector search on Postgres
Nicolay Gerold added
LanceDB
LanceDB is an open-source vector database for AI that's designed to store, manage, query and retrieve embeddings on large-scale multi-modal data. The core of LanceDB is written in Rust 🦀 and is built on top of Lance, an open-source columnar data format designed for performant ML workloads and fast random access.
Both the database and the un...
LanceDB - LanceDB
Nicolay Gerold added
Overview
Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow.
Ballista has a scheduler and an executor process that are standard Rust executables and can be executed directly, but Dockerfiles are provided to build images for use in containerized environments, such as Docker, Docker Compose, and Kube...
Overview — Apache Arrow Ballista documentation
Nicolay Gerold added
Overview
pg_lakehouse is an extension that transforms Postgres into an analytical query engine over object stores like S3 and table formats like Delta Lake. Queries are pushed down to Apache DataFusion, which delivers excellent analytical performance. Combinations of the following object stores, table formats, and file formats are supported.
Object ...
https://github.com/paradedb/paradedb/tree/dev/pg_l...
Nicolay Gerold added