GitHub - roboflow/multimodal-maestro: Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
LLaVA v1.5 is a new open-source multimodal model stepping onto the scene as a contender against GPT-4's multimodal capabilities. It uses a simple projection matrix to connect the pre-trained CLIP ViT-L/14 vision encoder with the Vicuna LLM, resulting in a robust model that can handle both images and text. The model is trained in two stages: first, updated...
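The projection idea above can be sketched in a few lines. This is a minimal illustration, not LLaVA's implementation: the dimensions are assumed (CLIP ViT-L/14 patch embeddings are 1024-d; Vicuna-7B hidden states are 4096-d), and a random matrix stands in for the learned weights.

```python
import numpy as np

# Assumed dimensions for illustration: CLIP ViT-L/14 -> 1024-d patch
# embeddings; Vicuna-7B -> 4096-d token embeddings.
VISION_DIM, LLM_DIM = 1024, 4096

rng = np.random.default_rng(0)
# In LLaVA this projection matrix is learned during stage-one training;
# here it is random, standing in for the trained weights.
W = rng.standard_normal((VISION_DIM, LLM_DIM)) * 0.02

def project_visual_tokens(patch_embeddings: np.ndarray) -> np.ndarray:
    """Map (num_patches, VISION_DIM) vision features into the LLM's
    (num_patches, LLM_DIM) embedding space via one matrix multiply."""
    return patch_embeddings @ W

# A 336x336 image at patch size 14 yields 24 * 24 = 576 visual tokens.
patches = rng.standard_normal((576, VISION_DIM))
llm_tokens = project_visual_tokens(patches)
print(llm_tokens.shape)  # (576, 4096)
```

The projected visual tokens are then concatenated with the text token embeddings and fed to the LLM as one sequence, which is what lets a language model "see" the image.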
This AI newsletter is all you need #68
Key unlock: Multimodal models can reason about images, video, or even physical environments without significant tailoring.
Sarah Wang • The Next Token of Progress: 4 Unlocks on the Generative AI Horizon
