finally published our latest research on text embeddings!
TLDR: Vector databases are NOT safe. 😳 Text embeddings can be inverted. We can do this exactly for sentence-length inputs and get very close with paragraphs...

But regular end-users, even engineers, cannot assess safety based on model weights. And prompt injection remains a large attack vector.
Zach Tratar • Tweet
Open source, high-throughput, fault-tolerant vector embedding pipeline
Simple API endpoint that ingests large volumes of raw data, processes, and stores or returns the vectors quickly and reliably
dgarnitz • GitHub - dgarnitz/vectorflow: VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice.
Vector databases are widely used in NLP tasks such as sentiment analysis, text classification, and semantic search. By representing text as vector embeddings, it becomes easier to compare and analyze textual data.
Ben Auffarth • Generative AI with LangChain: Build large language model (LLM) apps with Python, ChatGPT, and other LLMs
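A minimal sketch of what "comparing text as vectors" can look like in practice, using cosine similarity. The three-dimensional vectors below are made-up toy values, not the output of any real embedding model, and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, embeddings):
    """Return (text, score) pairs sorted from most to least similar."""
    scored = [(text, cosine_similarity(query_vec, vec))
              for text, vec in embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy "embeddings" (assumed values for illustration only).
toy_embeddings = {
    "wonderful film": [0.8, 0.2, 0.1],
    "tax form": [0.0, 0.1, 0.9],
}
query = [0.9, 0.1, 0.0]  # stands in for the embedding of "great movie"

ranking = rank_by_similarity(query, toy_embeddings)
for text, score in ranking:
    print(f"{text}: {score:.3f}")
```

With these toy values, the text whose vector points in nearly the same direction as the query ranks first. A vector database applies the same idea at scale, typically with approximate nearest-neighbor indexes rather than a full scan.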
A serverless vector database
built from first principles on object storage: 10-100x cheaper, usage-based pricing, massive scalability
turbopuffer
- Cohere introduced Embed v3, an advanced model for generating document embeddings, boasting top performance on a few benchmarks. It excels in matching document topics to queries and content quality, improving search applications and retrieval-augmented generation (RAG) systems. The new version offers models with 1024 or 384 dimensions, supports o