adept/fuyu-8b · Hugging Face

GitHub 👨‍🔧: A Comprehensive Toolkit for High-Quality PDF Content Extraction
→ Integrates leading document parsing models for layout detection, formula detection, formula recognition, OCR, and table recognition.
→ Achieves high-quality parsing across diverse document types due to fine-tuning...

HOLY SHITT, Microsoft dropped an open-source Multimodal (supports Audio, Vision and Text) Phi 4 - MIT licensed! 🔥
> Beats Gemini 2.0 Flash, GPT-4o, Whisper, SeamlessM4T v2
> Models on the Hugging Face Hub, integrated with Transformers!
Phi-4-Multimodal:...