GitHub - lyuchenyang/Macaw-LLM: Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
🥤 Cola [NeurIPS 2023]
Large Language Models are Visual Reasoning Coordinators
Liangyu Chen*,†,♥ Bo Li*,♥ Sheng Shen♣ Jingkang Yang♥
Chunyuan Li♠ Kurt Keutzer♣ Trevor Darrell♣ Ziwei Liu✉,♥
♥S-Lab, Nanyang Technological University
♣University of California, Berkeley ♠Microsoft Research, Redmond
*Equal Contribution †Project Lead ✉Corresponding Author
cliangyu • GitHub - cliangyu/Cola: [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
Nicolay Gerold added
TorchMultimodal (Beta Release)
Introduction
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale. It provides:
- A repository of modular and composable building blocks (models, fusion layers, loss functions, datasets and utilities).
- A repository of examples that show how to combine these building blocks (a minimal composition sketch follows below).
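To make the "modular and composable" idea concrete, here is a minimal sketch of late-fusion composition in plain PyTorch. It does not use TorchMultimodal's actual API; `ConcatFusion`, `MultimodalClassifier`, and the toy encoders are illustrative stand-ins.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins, not TorchMultimodal's real classes: each modality
# gets its own encoder, and a swappable fusion layer combines the embeddings.
class ConcatFusion(nn.Module):
    """Fuse modality embeddings by concatenation followed by a projection."""
    def __init__(self, dims: list[int], out_dim: int):
        super().__init__()
        self.proj = nn.Linear(sum(dims), out_dim)

    def forward(self, embeddings: list[torch.Tensor]) -> torch.Tensor:
        return self.proj(torch.cat(embeddings, dim=-1))

class MultimodalClassifier(nn.Module):
    def __init__(self, image_encoder, text_encoder, fusion, num_classes: int):
        super().__init__()
        self.image_encoder = image_encoder
        self.text_encoder = text_encoder
        self.fusion = fusion
        self.head = nn.Linear(fusion.proj.out_features, num_classes)

    def forward(self, image, text):
        img_emb = self.image_encoder(image)
        txt_emb = self.text_encoder(text)
        return self.head(self.fusion([img_emb, txt_emb]))

# Toy encoders so the example runs end to end.
image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))
text_encoder = nn.Linear(128, 256)
model = MultimodalClassifier(image_encoder, text_encoder,
                             ConcatFusion([256, 256], 512), num_classes=10)

logits = model(torch.randn(4, 3, 32, 32), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 10])
```

Because the fusion layer is a plain module, swapping concatenation for gated or attention-based fusion only changes one constructor argument; that is the composability the library advertises.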
facebookresearch • GitHub - facebookresearch/multimodal at a33a8b888a542a4578b16972aecd072eff02c1a6
Nicolay Gerold added
LLaVA v1.5, a new open-source multimodal model stepping onto the scene as a contender against GPT-4 with multimodal capabilities. It uses a simple projection to connect the pre-trained CLIP ViT-L/14 vision encoder with the Vicuna LLM, resulting in a robust model that can handle images and text. The model is trained in two stages: first, only the projection is updated, aligning visual features with the language model's embedding space; then, the projection and the LLM are fine-tuned end to end on visual instruction data.
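The projection idea is simple enough to sketch: map the vision encoder's patch features into the LLM's token-embedding space and prepend them to the text embeddings. A minimal sketch, with the caveat that the module name and exact dimensions here are illustrative (LLaVA-1.5 uses a small two-layer MLP, where the original LLaVA used a single linear layer):

```python
import torch
import torch.nn as nn

# Assumed dimensions: CLIP ViT-L/14 features are 1024-d,
# Vicuna-7B's hidden size is 4096.
VISION_DIM, LLM_DIM = 1024, 4096

class VisionProjector(nn.Module):
    """Map CLIP patch features into the LLM embedding space
    (two-layer MLP, as in LLaVA-1.5)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(VISION_DIM, LLM_DIM),
            nn.GELU(),
            nn.Linear(LLM_DIM, LLM_DIM),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        return self.mlp(patch_features)

# Toy forward pass: 576 patch tokens (a 24x24 grid at 336px input)
# from one image, projected and prepended to 32 text-token embeddings.
projector = VisionProjector()
image_tokens = projector(torch.randn(1, 576, VISION_DIM))   # (1, 576, 4096)
text_tokens = torch.randn(1, 32, LLM_DIM)                   # from the LLM embedding layer
llm_inputs = torch.cat([image_tokens, text_tokens], dim=1)  # (1, 608, 4096)
print(llm_inputs.shape)
```

In stage one, only `projector` receives gradients; in stage two, the projector and the LLM are trained together while the vision encoder stays frozen.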
This AI newsletter is all you need #68
Nicolay Gerold added
Repository for the paper "The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning", including 1.84M CoT rationales extracted across 1,060 tasks.
Paper Link : https://arxiv.org/abs/2305.14045
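A minimal sketch of how a CoT fine-tuning instance might be serialized; the field names and prompt template are illustrative assumptions, not the CoT Collection's actual schema:

```python
import json

def to_training_example(task_instruction: str, question: str,
                        rationale: str, answer: str) -> dict:
    """Pack instruction + question as the prompt, and the chain-of-thought
    rationale followed by the final answer as the completion target."""
    prompt = (f"{task_instruction}\n\nQuestion: {question}\n"
              "Let's think step by step.")
    completion = f" {rationale} Therefore, the answer is {answer}."
    return {"prompt": prompt, "completion": completion}

example = to_training_example(
    task_instruction="Solve the arithmetic word problem.",
    question="Tom has 3 boxes with 4 apples each. How many apples in total?",
    rationale="Each box has 4 apples and there are 3 boxes, so 3 * 4 = 12.",
    answer="12",
)
print(json.dumps(example, indent=2))
```

Fine-tuning on (prompt, rationale + answer) pairs like this teaches the model to emit the reasoning before the answer, which is what the paper credits for the zero-shot and few-shot gains.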
kaistAI • GitHub - kaistAI/CoT-Collection: [Under Review] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
Nicolay Gerold added
StreamingLLM enables Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling over sequences of 4 million tokens and more.
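The core trick is a KV-cache eviction policy: always keep the first few "attention sink" tokens plus a sliding window of the most recent tokens. A toy sketch of that policy, assuming illustrative cache sizes (the real implementation also rewrites the positions assigned to cached tokens):

```python
from collections import deque

class SinkCache:
    """Toy KV-cache policy: the first `n_sink` tokens are kept forever as
    attention sinks; everything else lives in a fixed-size sliding window."""
    def __init__(self, n_sink: int = 4, window: int = 1020):
        self.n_sink = n_sink
        self.sinks: list[int] = []                       # kept permanently
        self.recent: deque[int] = deque(maxlen=window)   # oldest auto-evicted

    def append(self, token_id: int) -> None:
        if len(self.sinks) < self.n_sink:
            self.sinks.append(token_id)   # the first tokens become sinks
        else:
            self.recent.append(token_id)  # window tokens beyond maxlen drop off

    def visible(self) -> list[int]:
        """Tokens the next attention step can attend to."""
        return self.sinks + list(self.recent)

cache = SinkCache(n_sink=4, window=8)
for t in range(20):
    cache.append(t)
print(cache.visible())  # [0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19]
```

Keeping the sinks matters because softmax attention dumps surplus probability mass onto the earliest positions; evicting them is what destabilizes plain sliding-window decoding.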
mit-han-lab • GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks
Darren LI added
Key unlock: Multimodal models can reason about images, video, or even physical environments without significant tailoring.
Sarah Wang • The Next Token of Progress: 4 Unlocks on the Generative AI Horizon
Darren LI added
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
An analysis of GPT-4V, a large multimodal model with visual understanding, discussing its capabilities, input modes, working modes, prompting techniques, and potential applications in various domains.
browse.arxiv.org
Darren LI added