Sublime

An inspiration engine for ideas

Augen Pro

DeepSeek has just unveiled an OCR monster 🤯 DeepSeek-OCR is a 3B-parameter model that redefines document intelligence. It reaches 97% character-level accuracy with 10× input compression, preserving every detail. Most OCR systems require over 6,000 tokens per...

Charly Wargnier

x.com

oh my.. Nano Banana Pro is so crazy it can analyse any film scene and tell you how it was shot, what camera, lens & lighting were used.. and even tell you the exact settings. how's this even possible.. free to try on Higgsfield now. 10 crazy examples:...

el.cine

x.com

i mean seriously. the amount of work previously required to get this data from an image was bonkers. OCR calls, GPT to try to get the OCR data into something usable, huge computer vision models to identify the object... all replaced by a single call to an openai endpoint. https://t.co/bOQ8tPBp4y

Josh Pigford

x.com

Did you know that besides @Microsoft's OmniParser, @Apple just released weights for Ferret-UI? "A new MLLM tailored for enhanced understanding of mobile UI screens, equipped with referring, grounding and reasoning capabilities" Paper (with models, demo): https://t.co/mtNQtaVR4a...

Niels Rogge

x.com

Introducing Agentic Object Detection! Given a text prompt like “unripe strawberries” or “Kellogg’s branded cereal” and an image, we use an agentic workflow to reason at length and detect the specified objects. No need to label any training data. Watch the video for details.

Andrew Ng

x.com

Excited to share our DepthCrafter, a super consistent video depth model for long open-world videos! Project webpage: https://t.co/9SiMUv4hoW https://t.co/TZBWnjEGmk

HU Wenbo

x.com

Floorplans are all you need. Here's a video of our AI that reads floorplans. It reads room labels, finds dimensions, finds dimension lines, and locates doors and windows, all in under 2 minutes. There are companies with thousands of people whose entire job is to perform that same work....

Barrett Ames

x.com

Introducing Gaze-LLE, a new model for gaze target estimation built on top of a frozen visual foundation model! Gaze-LLE achieves SOTA results on multiple benchmarks while learning minimal parameters, and shows strong generalization. Paper: https://t.co/Is2NgrrurO https://t.co/eQS9hRPyuL

Fiona Ryan

x.com