GitHub - marsupialtail/rottnest: Data lake indices
There are numerous integrations for vector storage. These include Alibaba Cloud OpenSearch, AnalyticDB for PostgreSQL, Meta AI’s Annoy library for Approximate Nearest Neighbor (ANN) search, Cassandra, Chroma, Elasticsearch, Facebook AI Similarity Search (Faiss), MongoDB Atlas Vector Search, PGVector as a vector similarity search for Postgres, Pinec...
Ben Auffarth • Generative AI with LangChain: Build large language model (LLM) apps with Python, ChatGPT, and other LLMs
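As a rough illustration of what these vector-store integrations do underneath, the sketch below runs an exact (brute-force) cosine-similarity search over a few in-memory vectors using only the Python standard library. The `cosine_similarity` and `top_k` helpers and the sample `store` are hypothetical and are not taken from LangChain or any of the listed products; ANN libraries such as Annoy or Faiss replace this full scan with approximate index structures.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, store, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy data: document id -> embedding vector.
store = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}

nearest = top_k([1.0, 0.0, 0.0], store, k=2)
```

The full scan costs one similarity computation per stored vector, which is why dedicated stores add indexes once collections grow large.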
Fast full-text search engine library written in Rust
If you are looking for an alternative to Elasticsearch or Apache Solr, check out Quickwit, our distributed search engine built on top of Tantivy.
Tantivy is closer to Apache Lucene than to Elasticsearch or Apache Solr in the sense it is not an off-the-shelf search engine server, but rather a crat...
GitHub - quickwit-oss/tantivy: Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
Nicolay Gerold added
Indexify - Extraction and Retrieval from Videos, PDF and Audio for Interactive AI Applications
Indexify is an open-source engine for building fast data pipelines for unstructured data (video, audio, images and documents) using reusable extractors for embedding, transformatio...
LLM applications backed by Indexify will never answer with outdated information.
tensorlakeai • GitHub - tensorlakeai/indexify: A scalable realtime and continuous indexing engine for Unstructured Data to build Generative AI Applications
Indexify is a reactive structured extraction engine for unstructured data.
Applications leveraging LLMs for autonomous planning or queries necessitate timely index updates aligned with data changes or new extraction methods. Indexify enables both, by applying feature extractors on data in real-time and updating one or many indexes.
Why use Indexify
tensorlakeai • GitHub - tensorlakeai/indexify: A scalable realtime and continuous indexing engine for Unstructured Data to build Generative AI Applications
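The idea of applying extractors as data arrives and keeping one or many indexes current can be sketched with a toy in-memory class. This is only an illustration of the reactive pattern described above: `ReactiveIndexer`, `add_extractor`, and `upsert` are hypothetical names and are not Indexify's API.

```python
from collections import defaultdict


class ReactiveIndexer:
    """Toy model: every content change re-runs all registered extractors."""

    def __init__(self):
        self.extractors = {}              # index name -> extractor function
        self.documents = {}               # document id -> raw text
        self.indexes = defaultdict(dict)  # index name -> {document id: feature}

    def add_extractor(self, name, fn):
        """Register a new extraction method and backfill existing documents."""
        self.extractors[name] = fn
        for doc_id, text in self.documents.items():
            self.indexes[name][doc_id] = fn(text)

    def upsert(self, doc_id, text):
        """Store new or changed content and refresh every index right away."""
        self.documents[doc_id] = text
        for name, fn in self.extractors.items():
            self.indexes[name][doc_id] = fn(text)


indexer = ReactiveIndexer()
indexer.add_extractor("word_count", lambda text: len(text.split()))
indexer.upsert("note-1", "vector databases store embeddings")

# A new extraction method is applied to data that is already stored.
indexer.add_extractor("keywords", lambda text: sorted(set(text.lower().split())))

# A data change updates both indexes in one step.
indexer.upsert("note-1", "vector databases store and search embeddings")
```

Both triggers named in the text are covered: a data change (`upsert`) and a new extraction method (`add_extractor`) each bring the affected indexes up to date without a separate batch job.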
Navigating the terrain of vector databases in 2023 reveals a diverse array of options, each catering to different needs. The comparison table paints a clear picture, but here's a succinct summary to aid your decision:
- Open-source and hosted cloud: If you lean towards open-source solutions, Weaviate, Milvus, and Chroma emerge as top contenders. Pinec
Picking a vector database: a comparison and guide for 2023
latest research on text embeddings!
TLDR: Vector databases are NOT safe. Text embeddings can be inverted. We can do this exactly for sentence-length inputs and get very close with paragraphs...
jack morris • Tweet
They also released a really clean Python library for doing embedding inversion (https://github.com/jxmorris12/vec2text/) and some models for inverting OpenAI ada 2 embeddings.
LanceDB
LanceDB is an open-source vector database for AI that's designed to store, manage, query and retrieve embeddings on large-scale multi-modal data. The core of LanceDB is written in Rust 🦀 and is built on top of Lance, an open-source columnar data format designed for performant ML workloads and fast random access.
Both the database and the un...
LanceDB - LanceDB