Sublime

An inspiration engine for ideas

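Alibaba's newly released QWen2-VL-7B vision-language model is pretty strong ⚡️ Its OCR ability reaches SOTA among open-source models of its class! Handwriting recognition accuracy on a basic English test was "100%" 🤯🤯 And its Chinese support is decent too! 🔥 Try it online 👉 https://t.co/nyT8jX9Myw https://t.co/EDV8QpG7Kw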

Tom Huang

x.com

Introducing new models for research & development of health applications: MedGemma 27B Multimodal, for complex multimodal & longitudinal EHR interpretation, and MedSigLIP, a lightweight image & text encoder for classification, search, & related tasks. → https://t.co/I318jVmsYD https://t.co/LlpL269Poa

Google Research

x.com

The Chinese are striking again: a few days after Runway released Act-One, X-Portrait 2 has been released, a super expressive video-to-video model. Link in first comment https://t.co/mqUqd00i7R

Teodora P L x.com

IBM's Diversity in Faces

ModernMind Publications • Generative AI for Beginners Made Easy: Master Artificial Intelligence and Machine Learning Fundamentals, Learn Creative AI, and Enhance Your Skills With Interactive Real-World Exercises

We’re open-sourcing a multimodal model: Fuyu-8B! Building useful AI agents requires fast foundation models that can see the visual world. Fuyu-8B performs well at standard image understanding benchmarks, but it also can do a bunch of new stuff (below) https://t.co/7bTh6mDNEY

Adept x.com

It was as if he wanted to catalog the world—not by any formal means, and not even for any particular reason, but simply because he found joy in the process.

Fei-Fei Li • The Worlds I See: Curiosity, Exploration, and Discovery at the Dawn of AI

Cool to see a 500M param model I trained myself do better than Google cloud vision, Claude, and GPT-4V on this task. (look at the thread for the results) It's a relatively narrow one (OCR), but feels nice to see that small open source models still have a place.

Vik Paruchuri x.com

A fellow Twitter user just shared a magic trick: you can actually append camera lens parameters, and the depth-of-field effect is simply amazing!!! A Chinese 20-year-old Woman, looking like Audrey Hepburn, Black hair, standing on 2023 Tokyo street, hyper realistic portrait photography, pale skin, dress, wide shot, natural lighting, kodak portra 800, 105 mm f1.8, 32k --ar 16:9 --v 5 --s 750 --q 2

𝗖𝘆𝗱𝗶𝗮𝗿

x.com

OCR-free document understanding model https://t.co/uuhrSkfTTY

Tom Dörr

x.com