Sublime
An inspiration engine for ideas
Today, every Nomic-Embed-Text embedding becomes multimodal. Introducing Nomic-Embed-Vision:
- a high-quality, unified embedding space for image, text, and multimodal tasks
- outperforms both OpenAI CLIP and text-embedding-3-small
- open weights and code to enable indie hacking, research, ...
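The pitch is a single vector space shared by text and images. A minimal sketch of what that buys you, with hypothetical embed_text / embed_image helpers standing in for whatever Nomic client or endpoint you actually call:

```python
import numpy as np

# Hypothetical stand-ins for your embedding client calls to
# nomic-embed-text / nomic-embed-vision; the only requirement is that
# both return vectors living in the same unified embedding space.
def embed_text(text: str) -> np.ndarray:
    raise NotImplementedError("wire up your embedding client here")

def embed_image(path: str) -> np.ndarray:
    raise NotImplementedError("wire up your embedding client here")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Cross-modal retrieval then reduces to cosine similarity in the shared space:
# query = embed_text("a red bicycle leaning against a brick wall")
# scores = {p: cosine(query, embed_image(p)) for p in ["a.jpg", "b.jpg"]}
```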
EVA-CLIP: Improved Training Techniques for CLIP at Scale
Proposes EVA-CLIP, a series of models that significantly improve the efficiency and effectiveness of CLIP training.
proj: https://t.co/LNOE9rKSdq
abs: https://t.co/lgBvsgHAtC https://t.co/IrxwzNcTku
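For context on what such training recipes optimize: CLIP-style models are trained with a symmetric contrastive (InfoNCE) loss over paired image and text embeddings. A minimal PyTorch sketch of that baseline objective (EVA-CLIP's specific efficiency tricks are not shown here):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of matched image/text embeddings."""
    # L2-normalize so dot products are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix, scaled by temperature.
    logits = image_emb @ text_emb.t() / temperature

    # The matching caption for image i sits in column i.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average the image->text and text->image cross-entropy terms.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Random embeddings standing in for encoder outputs.
print(clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512)))
```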

OpenAI releases GPT-4V(ision) system card
paper: https://t.co/lWqSHhlCUP
GPT-4 with vision (GPT-4V) enables users to instruct GPT-4 to analyze image inputs provided by the user, and is the latest capability we are making broadly available. Incorporating additional modalities (such as image...
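With the current openai Python SDK, an image is passed alongside text as a content part of a chat message; the model name and image URL below are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder: any vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what is in this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder URL
        ],
    }],
)
print(response.choices[0].message.content)
```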

LLMs can now self-optimize.
A new method allows an AI to rewrite its own prompts to achieve up to 35x greater efficiency, outperforming both Reinforcement Learning and Fine-Tuning for complex reasoning.
UC Berkeley, Stanford, and Databricks introduce a new method called GEPA...
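GEPA's actual algorithm (reflective, Pareto-based prompt evolution) is more involved; the sketch below only illustrates the core loop of scoring a prompt, asking an LLM to rewrite it, and keeping the best variant. Both score_fn and rewrite_fn are hypothetical hooks you would supply.

```python
from typing import Callable

def optimize_prompt(seed_prompt: str,
                    score_fn: Callable[[str], float],        # hypothetical: run the task suite, return a score
                    rewrite_fn: Callable[[str, float], str],  # hypothetical: ask an LLM to propose an edit
                    rounds: int = 10) -> str:
    """Greedy self-optimization: keep whichever prompt variant scores best."""
    best_prompt, best_score = seed_prompt, score_fn(seed_prompt)
    for _ in range(rounds):
        candidate = rewrite_fn(best_prompt, best_score)
        candidate_score = score_fn(candidate)
        if candidate_score > best_score:
            best_prompt, best_score = candidate, candidate_score
    return best_prompt

# Toy usage: the "score" just rewards prompts near a 40-character budget.
print(optimize_prompt("Solve step by step.",
                      score_fn=lambda p: -abs(len(p) - 40),
                      rewrite_fn=lambda p, s: p + " Show your reasoning."))
```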

Improved baselines for vision-language pre-training
Finds that a simple CLIP baseline can be improved by up to 25% (relative) on downstream zero-shot tasks by using well-known training techniques that are popular in other subfields.
https://t.co/gfDb2AT2At https://t.co/idLYLH3iay
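Those zero-shot numbers come from classification-by-retrieval: encode the image and a set of text prompts, then pick the most similar caption. With Hugging Face transformers and the reference OpenAI CLIP checkpoint (the image path is a placeholder):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
image = Image.open("photo.jpg")  # placeholder path

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # (1, num_labels) similarity scores
probs = logits.softmax(dim=-1)[0]
print({label: round(p.item(), 3) for label, p in zip(labels, probs)})
```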

It's @openai o3-pro launch day!
our high-taste guest tester @benhylak has been previewing for the past week and found an interesting pattern: (link in reply)
o3-pro doesn't noticeably outperform in normal situations, but it's just really, really, REALLY good at consuming ALL your context...
3D-GPT: Procedural 3D Modeling with Large Language Models
paper page: https://t.co/4UPUNNB3UG
In the pursuit of efficient automated content creation, procedural generation, leveraging modifiable parameters and rule-based systems, emerges as a promising approach. Nonetheless, it could be a...
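The core idea, an LLM translating a free-form scene request into the modifiable parameters of a rule-based generator, can be sketched as follows; ask_llm, the parameter schema, and build_scene are all hypothetical placeholders, not 3D-GPT's actual agent framework.

```python
import json

def ask_llm(prompt: str) -> str:
    """Hypothetical wrapper around whatever chat model you have access to."""
    raise NotImplementedError("plug in your LLM client here")

SCHEMA_HINT = (
    "Reply with JSON only, using keys: terrain_roughness (0-1), "
    "tree_density (0-1), fog (0-1), season (spring|summer|autumn|winter)."
)

def scene_parameters(description: str) -> dict:
    """Map a natural-language scene description to procedural-generator parameters."""
    reply = ask_llm(f"{SCHEMA_HINT}\nScene: {description}")
    return json.loads(reply)

# A rule-based backend would then consume the parameters, e.g.:
# params = scene_parameters("a foggy autumn forest on rolling hills")
# build_scene(**params)  # hypothetical procedural-modeling entry point
```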
🚨 Long-CLIP: Unlocking the Long-Text Capability of CLIP
proj: https://t.co/5QF2Mo0Ow7
abs: https://t.co/YRH7CG0As0
A plug-and-play alternative to CLIP that supports long-text input, retains its zero-shot generalizability, and aligns the CLIP latent space https://t.co/E744EfGikn
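CLIP's text tower is trained with only 77 token positions, so any long-text variant has to grow that table somehow. The sketch below shows plain linear interpolation of the positional embeddings as an illustration only; Long-CLIP's published recipe (a knowledge-preserving stretch of the positional embeddings plus fine-tuning on long captions) is more careful than this.

```python
import torch
import torch.nn.functional as F

def stretch_positional_embeddings(pos_emb: torch.Tensor, new_len: int) -> torch.Tensor:
    """Resize a (seq_len, dim) positional-embedding table to a longer sequence
    length via 1D linear interpolation. Illustration only."""
    x = pos_emb.t().unsqueeze(0)                                   # (1, dim, seq_len)
    x = F.interpolate(x, size=new_len, mode="linear", align_corners=True)
    return x.squeeze(0).t()                                        # (new_len, dim)

# CLIP's text encoder ships with 77 positions; stretch to, say, 248.
old_table = torch.randn(77, 512)
print(stretch_positional_embeddings(old_table, 248).shape)  # torch.Size([248, 512])
```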