Sublime
An inspiration engine for ideas
NEW: SoTA OCR model - Apache 2.0 licensed based on Qwen 2.5 VL 🔥
https://t.co/YhDbwyHCvP
Vaibhav (VB) Srivastav
x.com
Beyond Text: The Rise of Vision-Driven Document Retrieval for RAG.
ColPali is probably one of the most significant innovations in complex document retrieval, so I did a deep dive into ColPali and the ViDoRe benchmark.
https://t.co/Ree8Y1HdS6
Jo Kristian Bergum
x.com
Computer vision
Imran Yussuff • 2 cards

InternVL is an open-source multimodal large language model integrating a Vision Transformer (ViT) with a large language model (LLM), supporting high-resolution image processing, multi-image and video data, and various multimodal tasks like OCR, document QA, and cross-modal dialogue https://t.co/LdMDCSoq5B



