GitHub - getomni-ai/zerox: Zero shot pdf OCR with gpt-4o-mini
LlamaOCR.com – Document to markdown
llamaocr.com
Daniel added
Receipt recognition
Surya
Surya is a document OCR toolkit that does:
- OCR in 90+ languages that benchmarks favorably vs cloud services
- Line-level text detection in any language
- Layout analysis (table, image, header, etc. detection)
- Reading order detection
GitHub - VikParuchuri/surya: OCR, layout analysis, reading order, line detection in 90+ languages
Nicolay Gerold added
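extractous: a high-performance tool for extracting unstructured data, 25x faster than unstructured-io. It supports Microsoft Office, PDF, web pages, images, e-books, email, and other formats. It extracts text content from documents, runs OCR on images and scanned documents, extracts metadata, and detects the document type automatically. It runs locally and supports batch processing. GitHub: https://t.co/glfTUbTXhA...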
Merchury Charon added
Indexify - Extraction and Retrieval from Videos, PDF and Audio for Interactive AI Applications
Indexify is an open-source engine for building fast data pipelines for unstructured data (video, audio, images and documents) using re-usable extractors for embedding, transformatio...
LLM applications backed by Indexify will never answer with outdated information.
tensorlakeai • GitHub - tensorlakeai/indexify: A scalable realtime and continuous indexing engine for Unstructured Data to build Generative AI Applications
Nicolay Gerold added
Merchury Charon added
PDF OCR - Recognize text - easily, online, free
tools.pdf24.org
Alex Dobrenko added
Merchury Charon added