
GitHub - marsupialtail/rottnest: Data lake indices

Towhee is a cutting-edge framework designed to streamline the processing of unstructured data through the use of Large Language Model (LLM) based pipeline orchestration. It is uniquely positioned to extract invaluable insights from diverse unstructured data types, including lengthy text, images, audio and video files. Leveraging the capabilities of...
towhee-io • GitHub - towhee-io/towhee: Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
latest research on text embeddings!
TLDR: Vector databases are NOT safe. Text embeddings can be inverted. We can do this exactly for sentence-length inputs and get very close with paragraphs...
jack morris • Tweet
GitHub - virattt/ai-hedge-fund: An AI Hedge Fund Team
github.com
There are numerous integrations for vector storage. These include Alibaba Cloud OpenSearch, AnalyticDB for PostgreSQL, Meta AI’s Annoy library for Approximate Nearest Neighbor (ANN) search, Cassandra, Chroma, Elasticsearch, Facebook AI Similarity Search (Faiss), MongoDB Atlas Vector Search, PGVector as a vector similarity search for Postgres, Pinec...
Ben Auffarth • Generative AI with LangChain: Build large language model (LLM) apps with Python, ChatGPT, and other LLMs
- Multiple indices. Splitting the document corpus up into multiple indices and then routing queries based on some criteria. This means that the search is over a much smaller set of documents rather than the entire dataset. Again, it is not always useful, but it can be helpful for certain datasets. The same approach works with the LLMs themselves.
- Cu
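
The "multiple indices" idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular library does it: `embed` is a toy bag-of-words stand-in for a real embedding model, and `RoutedIndex`, its keyword-overlap routing rule, and the example index names are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class RoutedIndex:
    """Hypothetical helper: several small indices plus a keyword router."""

    def __init__(self, keywords):
        # keywords maps an index name to the set of words that route to it
        self.keywords = keywords
        self.indices = {name: [] for name in keywords}

    def add(self, name, doc):
        # Store the document and its vector in the named sub-index only
        self.indices[name].append((doc, embed(doc)))

    def route(self, query):
        # Assumed criterion: pick the index sharing the most keywords with the query
        words = set(query.lower().split())
        return max(self.keywords, key=lambda n: len(words & self.keywords[n]))

    def search(self, query, k=1):
        # Rank only the documents of the routed index, not the whole corpus
        name = self.route(query)
        q = embed(query)
        ranked = sorted(self.indices[name], key=lambda item: cosine(q, item[1]), reverse=True)
        return name, [doc for doc, _ in ranked[:k]]


idx = RoutedIndex({"legal": {"contract", "clause"}, "medical": {"dose", "patient"}})
idx.add("legal", "the contract includes a termination clause")
idx.add("medical", "the patient received a double dose")
idx.add("medical", "symptoms resolved after a week")
print(idx.search("what dose did the patient get"))
```

The point of the design is that `search` only scores documents in the routed sub-index, so the candidate set is smaller than the full corpus. In practice the routing criterion could be metadata, a classifier, or an LLM call rather than keyword overlap.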