OCR神器:Surya,支持90+种语言、布局分析、表格识别,性能媲美Google Cloud Vision、Tesseract,每页处理速度0.62秒
1、可以进行线条级别的文本检测
2、布局分析包括表格、图像、标题等
3、阅读顺序检测
4、表格识别,能够检测行和列
github:https://t.co/EDlOl6WZ1u... See more
There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things lik... See more
GitHub - transformerlab/transformerlab-app: Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.
A dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation after all. With weird layouts, tables, charts, etc. The vision models just make sense!
The general logic:
Pass in a PDF (URL or file buffer)
Turn the PDF into a series of images
Pass each image to GPT and ask nicely for Markdown