Multimodal Foundation Models: From Specialists to General-Purpose Assistants

A survey of the taxonomy and evolution of multimodal foundation models, focusing on the transition from specialist models to general-purpose assistants in the computer vision and vision-language domains.

arxiv.org

Chunyuan Li