GitHub - facebookresearch/multimodal at a33a8b888a542a4578b16972aecd072eff02c1a6

HOLY SHITT, Microsoft dropped an open-source Multimodal (supports Audio, Vision and Text) Phi 4 - MIT licensed! 🔥
> Beats Gemini 2.0 Flash, GPT4o, Whisper, SeamlessM4T v2
> Models on Hugging Face Hub, integrated with Transformers!
Phi-4-Multimodal: ...
I happen to have gathered a lot of resources on multimodal learning for music over the last few years and I finally got around to putting them together into a repo for anyone else who might be interested: https://t.co/1P9h5ryJEd
It...
Ilaria Manco, x.com