Multimodal Foundation Models: From Specialists to General-Purpose Assistants

A survey paper that investigates the taxonomy and evolution of multimodal foundation models, focusing on their transition from specialized models to general-purpose assistants in computer vision and vision-language domains.

arxiv.org

The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)

An analysis of GPT-4V, a large multimodal model with visual understanding, discussing its capabilities, input modes, working modes, prompting techniques, and potential applications in various domains.

browse.arxiv.org

Deep Dive into LLMs like ChatGPT

youtube.com

AI Engineering: Building Applications with Foundation Models

Chip Huyen

amazon.com

On the Biology of a Large Language Model

transformer-circuits.pub

A practical guide to building agents

Guide to building AI agents using large language models, covering agent definition, use case selection, design components, single/multi-agent orchestration, tool integration, instruction setup, safety guardrails, and deployment best practices.

cdn.openai.com

AI Revolution - Transformers and Large Language Models (LLMs)

Elad Gil

blog.eladgil.com