GitHub - lyuchenyang/Macaw-LLM: Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
🥤 Cola [NeurIPS 2023]
Large Language Models are Visual Reasoning Coordinators
Liangyu Chen*,†,♥ Bo Li*,♥ Sheng Shen♣ Jingkang Yang♥
Chunyuan Li♠ Kurt Keutzer♣ Trevor Darrell♣ Ziwei Liu✉,♥
♥S-Lab, Nanyang Technological University
♣University of California, Berkeley ♠Microsoft Research, Redmond
*Equal Contribution †Project Lead ✉Corresponding Author
cliangyu • GitHub - cliangyu/Cola: [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
Nicolay Gerold added
TorchMultimodal (Beta Release)
Introduction
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale. It provides:
- A repository of modular and composable building blocks (models, fusion layers, loss functions, datasets and utilities).
- A repository of examples that show how to combine these building blocks (a minimal composition sketch follows below).
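To make the "modular and composable" idea concrete, here is a minimal sketch of late-fusion composition in plain PyTorch. It does not use TorchMultimodal's actual API; `ConcatFusion`, `MultimodalClassifier`, and the toy encoders are illustrative stand-ins.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins, not TorchMultimodal's real classes: each modality
# gets its own encoder, and a swappable fusion layer combines the embeddings.
class ConcatFusion(nn.Module):
    """Fuse modality embeddings by concatenation followed by a projection."""
    def __init__(self, dims: list[int], out_dim: int):
        super().__init__()
        self.proj = nn.Linear(sum(dims), out_dim)

    def forward(self, embeddings: list[torch.Tensor]) -> torch.Tensor:
        return self.proj(torch.cat(embeddings, dim=-1))

class MultimodalClassifier(nn.Module):
    def __init__(self, image_encoder, text_encoder, fusion, num_classes: int):
        super().__init__()
        self.image_encoder = image_encoder
        self.text_encoder = text_encoder
        self.fusion = fusion
        self.head = nn.Linear(fusion.proj.out_features, num_classes)

    def forward(self, image, text):
        img_emb = self.image_encoder(image)
        txt_emb = self.text_encoder(text)
        return self.head(self.fusion([img_emb, txt_emb]))

# Toy encoders so the example runs end to end.
image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))
text_encoder = nn.Linear(128, 256)
model = MultimodalClassifier(image_encoder, text_encoder,
                             ConcatFusion([256, 256], 512), num_classes=10)

logits = model(torch.randn(4, 3, 32, 32), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 10])
```

Because the fusion layer is a plain module, swapping concatenation for gated or attention-based fusion only changes one constructor argument; that is the composability the library advertises.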
facebookresearch • GitHub - facebookresearch/multimodal at a33a8b888a542a4578b16972aecd072eff02c1a6
Nicolay Gerold added
LLaVA v1.5, a new open-source multimodal model stepping onto the scene as a contender against GPT-4 with multimodal capabilities. It uses a simple projection to connect the pre-trained CLIP ViT-L/14 vision encoder with the Vicuna LLM, resulting in a robust model that can handle images and text. The model is trained in two stages: first, only the projection is updated, aligning visual features with the language model's embedding space; then, the projection and the LLM are fine-tuned end to end on visual instruction data.
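The projection idea is simple enough to sketch: map the vision encoder's patch features into the LLM's token-embedding space and prepend them to the text embeddings. A minimal sketch, with the caveat that the module name and exact dimensions here are illustrative (LLaVA-1.5 uses a small two-layer MLP, where the original LLaVA used a single linear layer):

```python
import torch
import torch.nn as nn

# Assumed dimensions: CLIP ViT-L/14 features are 1024-d,
# Vicuna-7B's hidden size is 4096.
VISION_DIM, LLM_DIM = 1024, 4096

class VisionProjector(nn.Module):
    """Map CLIP patch features into the LLM embedding space
    (two-layer MLP, as in LLaVA-1.5)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(VISION_DIM, LLM_DIM),
            nn.GELU(),
            nn.Linear(LLM_DIM, LLM_DIM),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        return self.mlp(patch_features)

# Toy forward pass: 576 patch tokens (a 24x24 grid at 336px input)
# from one image, projected and prepended to 32 text-token embeddings.
projector = VisionProjector()
image_tokens = projector(torch.randn(1, 576, VISION_DIM))   # (1, 576, 4096)
text_tokens = torch.randn(1, 32, LLM_DIM)                   # from the LLM embedding layer
llm_inputs = torch.cat([image_tokens, text_tokens], dim=1)  # (1, 608, 4096)
print(llm_inputs.shape)
```

In stage one, only `projector` receives gradients; in stage two, the projector and the LLM are trained together while the vision encoder stays frozen.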
This AI newsletter is all you need #68
Nicolay Gerold added
Repository for the paper "The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning", including 1.84M CoT rationales extracted across 1,060 tasks.
Paper Link : https://arxiv.org/abs/2305.14045
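A minimal sketch of how a CoT fine-tuning instance might be serialized; the field names and prompt template are illustrative assumptions, not the CoT Collection's actual schema:

```python
import json

def to_training_example(task_instruction: str, question: str,
                        rationale: str, answer: str) -> dict:
    """Pack instruction + question as the prompt, and the chain-of-thought
    rationale followed by the final answer as the completion target."""
    prompt = (f"{task_instruction}\n\nQuestion: {question}\n"
              "Let's think step by step.")
    completion = f" {rationale} Therefore, the answer is {answer}."
    return {"prompt": prompt, "completion": completion}

example = to_training_example(
    task_instruction="Solve the arithmetic word problem.",
    question="Tom has 3 boxes with 4 apples each. How many apples in total?",
    rationale="Each box has 4 apples and there are 3 boxes, so 3 * 4 = 12.",
    answer="12",
)
print(json.dumps(example, indent=2))
```

Fine-tuning on (prompt, rationale + answer) pairs like this teaches the model to emit the reasoning before the answer, which is what the paper credits for the zero-shot and few-shot gains.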
kaistAI • GitHub - kaistAI/CoT-Collection: [Under Review] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
Nicolay Gerold added
StreamingLLM enables Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling over sequences of 4 million tokens and more.
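The core trick is a KV-cache eviction policy: always keep the first few "attention sink" tokens plus a sliding window of the most recent tokens. A toy sketch of that policy, assuming illustrative cache sizes (the real implementation also rewrites the positions assigned to cached tokens):

```python
from collections import deque

class SinkCache:
    """Toy KV-cache policy: the first `n_sink` tokens are kept forever as
    attention sinks; everything else lives in a fixed-size sliding window."""
    def __init__(self, n_sink: int = 4, window: int = 1020):
        self.n_sink = n_sink
        self.sinks: list[int] = []                       # kept permanently
        self.recent: deque[int] = deque(maxlen=window)   # oldest auto-evicted

    def append(self, token_id: int) -> None:
        if len(self.sinks) < self.n_sink:
            self.sinks.append(token_id)   # the first tokens become sinks
        else:
            self.recent.append(token_id)  # window tokens beyond maxlen drop off

    def visible(self) -> list[int]:
        """Tokens the next attention step can attend to."""
        return self.sinks + list(self.recent)

cache = SinkCache(n_sink=4, window=8)
for t in range(20):
    cache.append(t)
print(cache.visible())  # [0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19]
```

Keeping the sinks matters because softmax attention dumps surplus probability mass onto the earliest positions; evicting them is what destabilizes plain sliding-window decoding.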
mit-han-lab • GitHub - mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks
Darren LI added
Key unlock: Multimodal models can reason about images, video, or even physical environments without significant tailoring.
Sarah Wang • The Next Token of Progress: 4 Unlocks on the Generative AI Horizon
Darren LI added
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
An analysis of GPT-4V, a large multimodal model with visual understanding, discussing its capabilities, input modes, working modes, prompting techniques, and potential applications in various domains.
browse.arxiv.org
Darren LI added