Data Loading

Bill Mill notes.billmill.org

GitHub - VikParuchuri/surya: OCR, layout analysis, reading order, line detection in 90+ languages

Stability and scalability for search

tensorlakeai GitHub - tensorlakeai/indexify: A scalable realtime and continuous indexing engine for Unstructured Data to build Generative AI Applications

GitHub - Stirling-Tools/Stirling-PDF: #1 Locally hosted web application that allows you to perform various operations on PDF files

Bap Our 5 favourite open-source customer data platforms

jina-ai jina-ai/reader: Convert any URL to an LLM-friendly input ... - GitHub

Filimoa GitHub - Filimoa/open-parse: Improved file parsing for LLM’s

CambioML GitHub - CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering: LLM-based text extraction from unstructured data like PDFs, Words and HTMLs. Transform and cluster the text into your desired format. Less information loss, more interpretation, and faster...