GitHub - facebookresearch/multimodal at a33a8b888a542a4578b16972aecd072eff02c1a6

Related Highlights

voyage-multimodal-3: all-in-one embedding model for interleaved text, images, and screenshots

LLaVA v1.5 is a new open-source multimodal model stepping onto the scene as a contender to GPT-4's multimodal capabilities. It uses a simple projection matrix to connect the pre-trained CLIP ViT-L/14 vision encoder with the Vicuna LLM, resulting in a robust model that can handle both images and text. The model is trained in two stages: first, updated ...

This AI newsletter is all you need #68

Nicolay Gerold added
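The LLaVA v1.5 highlight above describes a projection that connects CLIP ViT-L/14 vision features to the Vicuna LLM. Below is a minimal sketch of that idea in plain Python, assuming a single linear projection matrix as the highlight describes. The tiny dimensions, the weights, and the choice to place the projected visual tokens before the text tokens are illustrative assumptions, not LLaVA's actual configuration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def project(features: Vector, weight: Matrix) -> Vector:
    """Apply a linear map; weight has shape (llm_dim, vision_dim)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weight]

def build_llm_input(patch_features: List[Vector], weight: Matrix,
                    text_embeddings: List[Vector]) -> List[Vector]:
    """Project each vision patch feature into the LLM embedding space and
    place the resulting visual tokens before the text token embeddings
    (the ordering is an assumption made for this sketch)."""
    visual_tokens = [project(f, weight) for f in patch_features]
    return visual_tokens + text_embeddings

# Toy example: vision_dim = 3, llm_dim = 2 (real models are far larger).
W = [[1.0, 0.0, 1.0],
     [0.0, 2.0, 0.0]]
patches = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
text = [[9.0, 9.0]]
seq = build_llm_input(patches, W, text)
```

The point of the design is that the LLM itself is unchanged: visual information arrives as ordinary vectors of the same width as its token embeddings.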

multimodal-maestro

👋 hello

Multimodal-Maestro gives you more control over large multimodal models to get the outputs you want. With more effective prompting tactics, you can get multimodal models to do tasks you didn't know (or think!) were possible. Curious how it works? Try our HF space!

roboflow • GitHub - roboflow/multimodal-maestro: Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥

Nicolay Gerold added

Key unlock: Multimodal models can reason about images, video, or even physical environments without significant tailoring.

Sarah Wang • The Next Token of Progress: 4 Unlocks on the Generative AI Horizon

Darren LI added

Fuyu-8B Model Card

Note: Running Fuyu requires https://github.com/huggingface/transformers/pull/26911, which may require running transformers on main!

Model

Fuyu-8B is a multi-modal text and image transformer trained by Adept AI.

Architecturally, Fuyu is a vanilla decoder-only transformer; there is no image encoder.

Image patches are instead linearly ...

adept/fuyu-8b · Hugging Face

Nicolay Gerold added
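The Fuyu-8B card says there is no image encoder and that image patches are handled linearly (the excerpt is cut off at that point). As a rough sketch, assuming non-overlapping square patches and a plain linear map with bias, the code below splits an image into patches, flattens each one, and projects it to model width. The patch size, model width, and weights are hypothetical toy values, and how these vectors enter the decoder is not stated in the excerpt.

```python
from typing import List

Image = List[List[List[float]]]  # height x width x channels

def to_patches(image: Image, patch: int) -> List[List[float]]:
    """Split an image into non-overlapping patch x patch tiles (row-major
    order) and flatten each tile, channels innermost, into one vector."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "sketch assumes divisible sides"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            flat = [c for row in image[top:top + patch]
                    for pixel in row[left:left + patch]
                    for c in pixel]
            patches.append(flat)
    return patches

def linear_embed(patches: List[List[float]], weight: List[List[float]],
                 bias: List[float]) -> List[List[float]]:
    """Map each flattened patch directly to a model-width vector (no encoder)."""
    return [[sum(w * x for w, x in zip(row, p)) + b for row, b in zip(weight, bias)]
            for p in patches]

# Toy example: a 2 x 4 single-channel image cut into two 2 x 2 patches.
img = [[[1.0], [2.0], [3.0], [4.0]],
       [[5.0], [6.0], [7.0], [8.0]]]
patches = to_patches(img, 2)
weight = [[1.0, 1.0, 1.0, 1.0],  # hypothetical 2 x 4 projection (model width 2)
          [1.0, 0.0, 0.0, 0.0]]
bias = [0.0, 10.0]
tokens = linear_embed(patches, weight, bias)
```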

Towhee is a cutting-edge framework designed to streamline the processing of unstructured data through the use of Large Language Model (LLM) based pipeline orchestration. It is uniquely positioned to extract invaluable insights from diverse unstructured data types, including lengthy text, images, audio and video files. Leveraging the capabilities of...

towhee-io • GitHub - towhee-io/towhee: Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

Nicolay Gerold added
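The Towhee highlight describes chaining processing steps into a pipeline for unstructured data. The sketch below only illustrates that general pipeline pattern in plain Python; the Pipeline class and the chunk and word_counts stages are hypothetical stand-ins and are not Towhee's API (real stages would call decoders or embedding models).

```python
from typing import Any, Callable, List

class Pipeline:
    """Hypothetical minimal pipeline: stages are functions applied in order."""

    def __init__(self) -> None:
        self.stages: List[Callable[[Any], Any]] = []

    def map(self, fn: Callable[[Any], Any]) -> "Pipeline":
        self.stages.append(fn)
        return self  # return self so stages can be chained fluently

    def __call__(self, item: Any) -> Any:
        for stage in self.stages:
            item = stage(item)
        return item

def chunk(text: str, size: int = 5) -> List[str]:
    """Hypothetical stage: split long text into chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def word_counts(chunks: List[str]) -> List[int]:
    """Hypothetical stage: a toy per-chunk feature in place of a real model."""
    return [len(c.split()) for c in chunks]

p = Pipeline().map(chunk).map(word_counts)
result = p("one two three four five six seven")
```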

Welcome to RAGatouille

Easily use and train state-of-the-art retrieval methods in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.

The main motivation of RAGatouille is simple: bridging the gap between state-of-the-art research and alchemical RAG pipeline practices. RAG is complex, and there are many moving parts. To g...

GitHub - bclavie/RAGatouille: Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.

Nicolay Gerold added
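RAGatouille's description refers to late-interaction retrieval as in ColBERT, where query and document are each encoded as one embedding per token and compared at search time. A minimal sketch of ColBERT-style MaxSim scoring follows: each query token takes its highest dot product over the document's tokens, and those maxima are summed. The toy two-dimensional vectors stand in for encoder outputs; this is not RAGatouille's API.

```python
from typing import List

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late interaction: best-matching document token per query token, summed."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

def rank(query_tokens: List[Vector], docs: List[List[Vector]]) -> List[int]:
    """Return document indices sorted from highest to lowest MaxSim score."""
    scores = [maxsim_score(query_tokens, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

query = [[1.0, 0.0], [0.0, 1.0]]
docs = [
    [[1.0, 0.0], [0.25, 0.25]],  # strong match for the first query token
    [[0.5, 0.5]],                # moderate match for both query tokens
]
order = rank(query, docs)
```

Because document token embeddings can be computed ahead of time, only this cheap max-and-sum step runs per query.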

r/OpenAI - Reddit

reddit.com

Israel added

Multimodal is not ready to use #api