Sublime

An inspiration engine for ideas


NEW: SoTA OCR model - Apache 2.0 licensed based on Qwen 2.5 VL 🔥 https://t.co/YhDbwyHCvP


@karpathy andrej, you need to check out media pipe (link in next post) https://t.co/qmKq5w45UD

Using @moondreamai + OpenCV Optical Flow I built promptable, realtime object tracking for robots. As simple as: python tracking_publisher.py --prompt "person" Source code: https://t.co/PoGWWdbKU6 https://t.co/KZzpYvUelh

Ben C x.com

Beyond Text: The Rise of Vision-Driven Document Retrieval for RAG. ColPali is probably one of the most significant innovations in complex document retrieval, so I did a deep dive into ColPali and the ViDoRe benchmark. https://t.co/Ree8Y1HdS6

Jo Kristian Bergum x.com

Computer vision

Imran Yussuff • 2 cards


InternVL is an open-source multimodal large language model integrating a Vision Transformer (ViT) with a large language model (LLM), supporting high-resolution image processing, multi-image and video data, and various multimodal tasks like OCR, document QA, and cross-modal dialogue https://t.co/LdMDCSoq5B

Tom Dörr x.com

Open-source OCR engine https://t.co/jaZ0NdfTEh

Tom Dörr x.com

Video and image annotation tool for computer vision projects https://t.co/Y3D8GpWFfb

Tom Dörr x.com

UC Berkeley's "Modern Computer Vision" Lecture Videos: https://t.co/6edGcb3bYf https://t.co/igCVUJGFh3

Math Cafe x.com