This AI newsletter is all you need #68
pair-preference-model-LLaMA3-8B by RLHFlow: A strong reward model trained to take two candidate responses in a single pass; it is currently the top open reward model on RewardBench (beating one of Cohere’s). (A usage sketch follows after this clip.)
DeepSeek-V2 by deepseek-ai (21B active, 236B total param.): Another strong MoE base model from the DeepSeek team. Some people are questioning the very high MMLU sc... See more
Nicolay Gerold added
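For the pairwise preference model above, here is a minimal sketch of how such a two-input reward model can be queried with transformers. The [CONTEXT]/[RESPONSE A]/[RESPONSE B] template and the "A"/"B" verdict tokens are illustrative assumptions, not confirmed by this clip; the exact prompt format is defined on the RLHFlow model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch only: the prompt template and verdict tokens below are assumptions;
# consult the RLHFlow model card for the exact format.
model_id = "RLHFlow/pair-preference-model-LLaMA3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "What is the capital of France?"
response_a = "The capital of France is Paris."
response_b = "France is a large country in Western Europe."

user_msg = (
    f"[CONTEXT] {prompt} [RESPONSE A] {response_a} [RESPONSE B] {response_b} "
    "Which response is better?"
)
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": user_msg}],
    add_generation_prompt=True,
    return_tensors="pt",
)

with torch.no_grad():
    # Both candidates are scored in a single forward pass; the preference
    # is read off the next-token logits for the verdict tokens.
    logits = model(input_ids=input_ids).logits[0, -1]

token_a = tokenizer.convert_tokens_to_ids("A")
token_b = tokenizer.convert_tokens_to_ids("B")
probs = torch.softmax(torch.stack([logits[token_a], logits[token_b]]), dim=0)
print(f"P(response A preferred) ~= {probs[0]:.3f}")
```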
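And for DeepSeek-V2, a hedged loading sketch with transformers. The repo ships custom modeling code, so trust_remote_code is assumed to be required; note that although only 21B parameters are active per token, all 236B must still be resident in (GPU or offloaded) memory.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal sketch, assuming the Hub repo's custom modeling code:
# MoE means cheap per-token compute (21B active), but the full 236B
# parameters still have to fit somewhere, hence device_map="auto".
model_id = "deepseek-ai/DeepSeek-V2"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spreads weights across GPUs / offloads to CPU as needed
)

inputs = tokenizer("The Mixture-of-Experts architecture", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```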
We are excited to release the first version of our multimodal assistant Yasa-1, a language assistant with visual and auditory sensors that can take actions via code execution.
We trained Yasa-1 from scratch, including pretraining base models from the ground up, aligning them, and heavily optimizing both our training and serving infrastructure.
... See more
Announcing our Multimodal AI Assistant - Reka AI
Nicolay Gerold added
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
An analysis of GPT-4V, a large multimodal model with visual understanding, discussing its capabilities, input modes, working modes, prompting techniques, and potential applications in various domains.
browse.arxiv.org
Darren LI added
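A small sketch of the kind of interleaved image-text prompting the paper studies, using the OpenAI Python SDK. The model name, file path, and prompt are placeholders; vision model names have changed over time, so check the current API docs.

```python
import base64
from openai import OpenAI

# Illustrative only: demonstrates one of the input modes the paper covers,
# interleaved text + image in a single user turn.
client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

with open("chart.png", "rb") as f:  # placeholder image
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # placeholder; may need updating
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the trend in this chart step by step."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)
```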
- Microsoft introduced Phi 1.5 – a compact AI model with multimodal capabilities, meaning it can process images as well as text. Despite being significantly smaller than OpenAI's GPT-4, with only 1.3 billion parameters, it demonstrates advanced features like those found in larger models. Phi 1.5 is open-source, emphasizing the trend towards efficient, smaller models.
FOD#27: "Now And Then"
Nicolay Gerold added
What a crazy day in AI 🤯
• Claude Dictation
• Synthflow Voice 2.0
• Claude Desktop app
• ElevenLabs X to Voice
• RedPanda Image Model
• OpenAI launches SearchGPT
• Google Learn About experiment
• Search Grounding on Google AI Studio
Here is everything you need to know:
multimodal-maestro
👋 hello
Multimodal-Maestro gives you more control over large multimodal models to get the outputs you want. With more effective prompting tactics, you can get multimodal models to do tasks you didn't know (or think!) were possible. Curious how it works? Try our HF space!
roboflow • GitHub - roboflow/multimodal-maestro: Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
Nicolay Gerold added
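The clip above doesn't show maestro's actual API, so rather than guess at it, here is a generic sketch of the set-of-mark prompting tactic such tools build on: draw numbered marks on image regions so a multimodal model can be asked about "mark 3" instead of fuzzy spatial descriptions. The function name and regions below are hypothetical; in practice the regions would come from a segmentation model such as SAM.

```python
from PIL import Image, ImageDraw

# Hypothetical helper illustrating set-of-mark prompting: overlay numbered
# boxes so a multimodal model (GPT-4V, LLaVA, CogVLM, ...) can reference
# regions by mark number.
def draw_marks(image: Image.Image, boxes: list[tuple[int, int, int, int]]) -> Image.Image:
    marked = image.copy()
    draw = ImageDraw.Draw(marked)
    for idx, (x1, y1, x2, y2) in enumerate(boxes, start=1):
        draw.rectangle((x1, y1, x2, y2), outline="red", width=3)
        draw.text((x1 + 4, y1 + 4), str(idx), fill="red")  # the mark itself
    return marked

image = Image.open("scene.jpg")                  # placeholder image
boxes = [(40, 60, 200, 220), (260, 100, 400, 300)]  # placeholder regions
draw_marks(image, boxes).save("scene_marked.jpg")
# scene_marked.jpg is then sent to the multimodal model with a prompt like:
# "Which mark contains the dog? Answer with the mark number only."
```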
Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration
Chenyang Lyu¹,², Minghao Wu³, Longyue Wang¹,*, Xinting Huang¹, Bingshuai Liu¹, Zefeng Du¹, Shuming Shi¹, Zhaopeng Tu¹
¹Tencent AI Lab, ²Dublin City University, ³Monash University
*Longyue Wang is the corresponding author: vinnlywang@tencent.com
Macaw... See more
lyuchenyang • GitHub - lyuchenyang/Macaw-LLM: Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Nicolay Gerold added
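A hedged sketch of the general recipe behind multimodal LLMs like Macaw-LLM: encode each modality with a pretrained encoder (e.g. CLIP for images/video frames, Whisper for audio), align the features to the LLM's token-embedding space, and feed them in alongside text embeddings. The module below is illustrative PyTorch, not the released implementation; dimensions and names are assumptions.

```python
import torch
import torch.nn as nn

class ModalityProjector(nn.Module):
    """Compress a variable-length encoder feature sequence into a fixed
    number of pseudo-tokens in the LLM's embedding space (illustrative)."""

    def __init__(self, feat_dim: int, llm_dim: int, num_tokens: int):
        super().__init__()
        self.proj = nn.Linear(feat_dim, llm_dim)
        # Learnable queries attend over the projected encoder features.
        self.queries = nn.Parameter(torch.randn(num_tokens, llm_dim))
        self.attn = nn.MultiheadAttention(llm_dim, num_heads=8, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, seq_len, feat_dim) from e.g. CLIP or Whisper
        kv = self.proj(feats)
        q = self.queries.unsqueeze(0).expand(feats.size(0), -1, -1)
        aligned, _ = self.attn(q, kv, kv)
        return aligned  # (batch, num_tokens, llm_dim): prepend to text embeddings

# Example: map 768-d CLIP patch features into a 4096-d LLaMA embedding space
projector = ModalityProjector(feat_dim=768, llm_dim=4096, num_tokens=32)
image_feats = torch.randn(2, 257, 768)  # dummy CLIP features, batch of 2
print(projector(image_feats).shape)     # torch.Size([2, 32, 4096])
```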