GitHub - facebookresearch/multimodal at a33a8b888a542a4578b16972aecd072eff02c1a6
voyage-multimodal-3: all-in-one embedding model for interleaved text, images, and screenshots
Voyage AI • blog.voyageai.com
kaiton added
LLaVA v1.5 is a new open-source multimodal model positioned as a contender against GPT-4's multimodal capabilities. It uses a simple projection matrix to connect the pre-trained CLIP ViT-L/14 vision encoder with the Vicuna LLM, resulting in a robust model that can handle both images and text. The model is trained in two stages: first, updated ...
This AI newsletter is all you need #68
Nicolay Gerold added
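The projection step described in the LLaVA card above can be sketched roughly as follows. This is a minimal illustration in plain Python, assuming a single linear layer (a weight matrix plus bias) and toy dimensions. The function name `project_patches`, the sizes, and the random stand-in features are all hypothetical; the card does not show the actual LLaVA code, and real connectors may differ.

```python
import random

def project_patches(patch_features, weight, bias):
    """Linearly map vision features (d_vision) into the LLM embedding space (d_llm).

    weight has d_llm rows of length d_vision; bias has length d_llm.
    """
    return [
        [sum(f * w for f, w in zip(feat, row)) + b for row, b in zip(weight, bias)]
        for feat in patch_features
    ]

random.seed(0)
D_VISION, D_LLM = 4, 6          # toy sizes, far smaller than real models
NUM_PATCHES, NUM_TEXT_TOKENS = 3, 2

# Stand-ins for vision-encoder outputs and LLM text-token embeddings.
patch_features = [[random.random() for _ in range(D_VISION)] for _ in range(NUM_PATCHES)]
text_embeddings = [[random.random() for _ in range(D_LLM)] for _ in range(NUM_TEXT_TOKENS)]

# Hypothetical projection parameters (these would be learned during training).
weight = [[random.gauss(0, 0.1) for _ in range(D_VISION)] for _ in range(D_LLM)]
bias = [0.0] * D_LLM

image_tokens = project_patches(patch_features, weight, bias)
# The projected image tokens share the LLM's embedding width, so they can sit
# in the same input sequence as the text embeddings.
llm_input = image_tokens + text_embeddings
print(len(llm_input), len(llm_input[0]))
```

The point of the sketch is only that, after projection, image features and text embeddings have the same width and can be concatenated into one sequence for the language model.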
multimodal-maestro
👋 hello
Multimodal-Maestro gives you more control over large multimodal models to get the outputs you want. With more effective prompting tactics, you can get multimodal models to do tasks you didn't know (or think!) were possible. Curious how it works? Try our HF space!
roboflow • GitHub - roboflow/multimodal-maestro: Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
Nicolay Gerold added
Key unlock: Multimodal models can reason about images, video, or even physical environments without significant tailoring.
Sarah Wang • The Next Token of Progress: 4 Unlocks on the Generative AI Horizon
Darren LI added
Fuyu-8B Model Card
Note: Running Fuyu requires https://github.com/huggingface/transformers/pull/26911, which may require running transformers on main!
Model
Fuyu-8B is a multi-modal text and image transformer trained by Adept AI.
Architecturally, Fuyu is a vanilla decoder-only transformer - there is no image encoder.
Image patches are instead linearly ...
adept/fuyu-8b · Hugging Face
Nicolay Gerold added
Towhee is a cutting-edge framework designed to streamline the processing of unstructured data through the use of Large Language Model (LLM) based pipeline orchestration. It is uniquely positioned to extract invaluable insights from diverse unstructured data types, including lengthy text, images, audio and video files. Leveraging the capabilities of...
towhee-io • GitHub - towhee-io/towhee: Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
Nicolay Gerold added
Welcome to RAGatouille
Easily use and train state-of-the-art retrieval methods in any RAG pipeline. Designed for modularity and ease of use, backed by research.
The main motivation of RAGatouille is simple: bridging the gap between state-of-the-art research and alchemical RAG pipeline practices. RAG is complex, and there are many moving parts. To g...
GitHub - bclavie/RAGatouille: Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.
Nicolay Gerold added
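The RAGatouille card above mentions late-interaction retrieval (ColBERT). As a rough sketch of that idea, and not RAGatouille's own API, the scoring rule can be written in plain Python: each query token embedding is matched to its most similar document token embedding, and those maxima are summed. The function names and the tiny hand-made vectors are hypothetical; real systems use learned, normalized embeddings.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: sum over query tokens of the best match in the document."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy token embeddings (one vector per token), made up for illustration.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]   # covers both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]   # covers only the first query token

scores = {"doc_a": maxsim_score(query, doc_a), "doc_b": maxsim_score(query, doc_b)}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

Because each query token is scored against every document token, documents keep one vector per token rather than a single pooled vector, which is what distinguishes this approach from single-vector dense retrieval.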
r/OpenAI - Reddit
reddit.com
Israel added
Multimodal is not ready to use #api