Multimodal Foundation Models: From Specialists to General-Purpose Assistants
A survey paper on the taxonomy and evolution of multimodal foundation models, focusing on the transition from specialist models to general-purpose assistants in the computer vision and vision-language domains.
arxiv.org