GitHub - quickwit-oss/tantivy: Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
Rottnest: Data Lake Indices
You don't need ElasticSearch or some vector database to do full text search or vector search. Parquet + Rottnest is all you need. Rottnest is like Postgres indices for Parquet. Read more on what it can do for e.g. logs here.
Installation
Local installation: pip install rottnest
Rottnest supports many different index typ...
Ziheng Wang • GitHub - marsupialtail/rottnest: Data lake indices
Nicolay Gerold added
The language is always only as good as its community. Let’s look at some of the existing open-source tools and frameworks built in and around Rust:
- DataFusion based on Apache Arrow: Apache Arrow DataFusion SQL Query Engine similar to Spark
- Polars: It’s a faster Pandas. Probably going to compete with DuckDB (?)
- Delta Lake Rust: A native Rust library fo
Data Engineering • Rust for Data Engineering
Nicolay Gerold added
Elasticsearch at Twitter
Elasticsearch is a search engine based on the Lucene library. It is a popular open source tool widely used in industry and is known for its distributed nature, speed, scalability, and simple REST APIs.
The Search Infrastructure team builds infrastructure to host search as a service. Since we are such a central infrastructur...
Stability and scalability for search
Nicolay Gerold added
orch
orch is a library for building language model powered applications and agents for the Rust programming language. It was primarily built for usage in magic-cli, but can be used in other contexts as well.
Note
If the project gains traction, this can be compiled as an addon to other languages such as Python or a standalone WebAssembly module.
Instal...
Guy Waldman • GitHub - guywaldman/orch: Rust framework for LLM orchestration
Nicolay Gerold added
Indexify - Extraction and Retrieval from Videos, PDF and Audio for Interactive AI Applications
Indexify is an open-source engine for building fast data pipelines for unstructured data (video, audio, images and documents) using re-usable extractors for embedding, transformatio...
LLM applications backed by Indexify will never answer with outdated information.
tensorlakeai • GitHub - tensorlakeai/indexify: A scalable realtime and continuous indexing engine for Unstructured Data to build Generative AI Applications
Nicolay Gerold added
pgvector v0.5.0: Faster semantic search with HNSW indexes
supabase.com
Andrés added
GitHub - sindresorhus/awesome: 😎 Awesome lists about all kinds of interesting topics
github.com
Jilber Najem and added
Indexify is a reactive structured extraction engine for unstructured data.
Applications leveraging LLMs for autonomous planning or queries necessitate timely index updates aligned with data changes or new extraction methods. Indexify enables both, by applying feature extractors on data in real-time and updating one or many indexes.
Why use Indexify
tensorlakeai • GitHub - tensorlakeai/indexify: A scalable realtime and continuous indexing engine for Unstructured Data to build Generative AI Applications
Nicolay Gerold added