GitHub - tensorlakeai/indexify: A scalable realtime and continuous indexing engine for Unstructured Data to build Generative AI Applications
uniflow provides a unified LLM interface to extract and transform raw documents.
- Document types: Uniflow enables data extraction from PDF, HTML and TXT files.
- LLM agnostic: Uniflow supports the most commonly used LLMs for text transformation, including
- OpenAI models (GPT3.5 and GPT4),
- Google Gemini models (Gemini 1.5, MultiModal),
- AWS BedRock models,
- Huggingf
CambioML • GitHub - CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering: LLM-based text extraction from unstructured data like PDFs, Words and HTMLs. Transform and cluster the text into your desired format. Less information loss, more interpretation, and faster...
Nicolay Gerold added
Rottnest : Data Lake Indices
You don't need ElasticSearch or some vector database to do full text search or vector search. Parquet + Rottnest is all you need. Rottnest is like Postgres indices for Parquet. Read more on what it can do for e.g. logs here.
Installation
Local installation: pip install rottnest
Rottnest supports many different index typ...
Ziheng Wang • GitHub - marsupialtail/rottnest: Data lake indices
Nicolay Gerold added
Due to the nature of most modern ANN vector search algorithms, incrementally updating a vector index is a massive challenge. This is a well-known “hard problem”. The issue here is that these indexes are carefully organized for fast lookups and any attempt to incrementally update them with new vectors will rapidly deteriorate the fast lookup pro...
6 Hard Problems Scaling Vector Search
Nicolay Gerold added
The ideal solution for an AI-native vector DB would be something that is easy to set up and integrates with existing APIs for rapid prototyping, but can scale without additional changes.
LanceDB is designed with this approach. Being serverless, it requires no setup: just import and start using. Persisted in HDD, allowing...
Ayush Chaurasia • LLMs, RAG, & the missing storage layer for AI
Nicolay Gerold added
You’ve got a vector database that has all the right database fundamentals you require, has the right incremental indexing strategy for your use case, has a good story around your metadata filtering needs, and will keep its index up-to-date with latencies you can tolerate. Awesome.
Your ML team (or maybe OpenAI) comes out with a new version of their...
6 Hard Problems Scaling Vector Search
Nicolay Gerold added
Index
index-space.org
Chris Neels added
Index provides space for the exchange of knowledge and tools. We nurture trust within the creative community through generosity and abundance of ideas and care.
Related to: Twenty Nine design and technology studio https://www.xxix.co
Evan Conrad • Where are all the crypto use cases? | Evan Conrad
Jason Badeaux added
Dense Discovery Index
index.densediscovery.com
Dave King added
Dense Discovery index: lists of previous items from newsletters.