This AI newsletter is all you need #68
pair-preference-model-LLaMA3-8B by RLHFlow: A strong reward model trained to take two candidate responses in a single pass; it is currently the top open reward model on RewardBench (beating one of Cohere’s). (A usage sketch follows after this clip.)
DeepSeek-V2 by deepseek-ai (21B active, 236B total param.): Another strong MoE base model from the DeepSeek team. Some people are questioning the very high MMLU sc... See more
Nicolay Gerold added
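For the pairwise preference model above, here is a minimal sketch of how such a two-input reward model can be queried with transformers. The [CONTEXT]/[RESPONSE A]/[RESPONSE B] template and the "A"/"B" verdict tokens are illustrative assumptions, not confirmed by this clip; the exact prompt format is defined on the RLHFlow model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch only: the prompt template and verdict tokens below are assumptions;
# consult the RLHFlow model card for the exact format.
model_id = "RLHFlow/pair-preference-model-LLaMA3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "What is the capital of France?"
response_a = "The capital of France is Paris."
response_b = "France is a large country in Western Europe."

user_msg = (
    f"[CONTEXT] {prompt} [RESPONSE A] {response_a} [RESPONSE B] {response_b} "
    "Which response is better?"
)
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": user_msg}],
    add_generation_prompt=True,
    return_tensors="pt",
)

with torch.no_grad():
    # Both candidates are scored in a single forward pass; the preference
    # is read off the next-token logits for the verdict tokens.
    logits = model(input_ids=input_ids).logits[0, -1]

token_a = tokenizer.convert_tokens_to_ids("A")
token_b = tokenizer.convert_tokens_to_ids("B")
probs = torch.softmax(torch.stack([logits[token_a], logits[token_b]]), dim=0)
print(f"P(response A preferred) ~= {probs[0]:.3f}")
```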
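And for DeepSeek-V2, a hedged loading sketch with transformers. The repo ships custom modeling code, so trust_remote_code is assumed to be required; note that although only 21B parameters are active per token, all 236B must still be resident in (GPU or offloaded) memory.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal sketch, assuming the Hub repo's custom modeling code:
# MoE means cheap per-token compute (21B active), but the full 236B
# parameters still have to fit somewhere, hence device_map="auto".
model_id = "deepseek-ai/DeepSeek-V2"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spreads weights across GPUs / offloads to CPU as needed
)

inputs = tokenizer("The Mixture-of-Experts architecture", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```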
We are excited to release the first version of our multimodal assistant Yasa-1, a language assistant with visual and auditory sensors that can take actions via code execution.
We trained Yasa-1 from scratch, including pretraining base models from the ground up, aligning them, and heavily optimizing both our training and serving infrastructure.
... See more
Announcing our Multimodal AI Assistant - Reka AI
Nicolay Gerold added
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
An analysis of GPT-4V, a large multimodal model with visual understanding, discussing its capabilities, input modes, working modes, prompting techniques, and potential applications in various domains.
browse.arxiv.org
Darren LI added
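A small sketch of the kind of interleaved image-text prompting the paper studies, using the OpenAI Python SDK. The model name, file path, and prompt are placeholders; vision model names have changed over time, so check the current API docs.

```python
import base64
from openai import OpenAI

# Illustrative only: demonstrates one of the input modes the paper covers,
# interleaved text + image in a single user turn.
client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

with open("chart.png", "rb") as f:  # placeholder image
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # placeholder; may need updating
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the trend in this chart step by step."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)
```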
- Microsoft introduced Phi 1.5 – a compact AI model with multimodal capabilities, meaning it can process images as well as text. Despite being significantly smaller than OpenAI's GPT-4, with only 1.3 billion parameters, it demonstrates advanced features like those found in larger models. Phi 1.5 is open-source, emphasizing the trend towards efficient, smaller models.
FOD#27: "Now And Then"
Nicolay Gerold added
What a crazy day in AI 🤯
• Claude Dictation
• Synthflow Voice 2.0
• Claude Desktop app
• ElevenLabs X to Voice
• RedPanda Image Model
• OpenAI launches SearchGPT
• Google Learn About experiment
• Search Grounding on Google AI Studio
Here is everything you need to know:
multimodal-maestro
👋 hello
Multimodal-Maestro gives you more control over large multimodal models to get the outputs you want. With more effective prompting tactics, you can get multimodal models to do tasks you didn't know (or think!) were possible. Curious how it works? Try our HF space!
roboflow • GitHub - roboflow/multimodal-maestro: Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
Nicolay Gerold added
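The clip above doesn't show maestro's actual API, so rather than guess at it, here is a generic sketch of the set-of-mark prompting tactic such tools build on: draw numbered marks on image regions so a multimodal model can be asked about "mark 3" instead of fuzzy spatial descriptions. The function name and regions below are hypothetical; in practice the regions would come from a segmentation model such as SAM.

```python
from PIL import Image, ImageDraw

# Hypothetical helper illustrating set-of-mark prompting: overlay numbered
# boxes so a multimodal model (GPT-4V, LLaVA, CogVLM, ...) can reference
# regions by mark number.
def draw_marks(image: Image.Image, boxes: list[tuple[int, int, int, int]]) -> Image.Image:
    marked = image.copy()
    draw = ImageDraw.Draw(marked)
    for idx, (x1, y1, x2, y2) in enumerate(boxes, start=1):
        draw.rectangle((x1, y1, x2, y2), outline="red", width=3)
        draw.text((x1 + 4, y1 + 4), str(idx), fill="red")  # the mark itself
    return marked

image = Image.open("scene.jpg")                  # placeholder image
boxes = [(40, 60, 200, 220), (260, 100, 400, 300)]  # placeholder regions
draw_marks(image, boxes).save("scene_marked.jpg")
# scene_marked.jpg is then sent to the multimodal model with a prompt like:
# "Which mark contains the dog? Answer with the mark number only."
```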
Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration
Chenyang Lyu¹,², Minghao Wu³, Longyue Wang¹,*, Xinting Huang¹, Bingshuai Liu¹, Zefeng Du¹, Shuming Shi¹, Zhaopeng Tu¹
¹Tencent AI Lab, ²Dublin City University, ³Monash University
*Longyue Wang is the corresponding author: vinnlywang@tencent.com
Macaw... See more
lyuchenyang • GitHub - lyuchenyang/Macaw-LLM: Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Nicolay Gerold added
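A hedged sketch of the general recipe behind multimodal LLMs like Macaw-LLM: encode each modality with a pretrained encoder (e.g. CLIP for images/video frames, Whisper for audio), align the features to the LLM's token-embedding space, and feed them in alongside text embeddings. The module below is illustrative PyTorch, not the released implementation; dimensions and names are assumptions.

```python
import torch
import torch.nn as nn

class ModalityProjector(nn.Module):
    """Compress a variable-length encoder feature sequence into a fixed
    number of pseudo-tokens in the LLM's embedding space (illustrative)."""

    def __init__(self, feat_dim: int, llm_dim: int, num_tokens: int):
        super().__init__()
        self.proj = nn.Linear(feat_dim, llm_dim)
        # Learnable queries attend over the projected encoder features.
        self.queries = nn.Parameter(torch.randn(num_tokens, llm_dim))
        self.attn = nn.MultiheadAttention(llm_dim, num_heads=8, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, seq_len, feat_dim) from e.g. CLIP or Whisper
        kv = self.proj(feats)
        q = self.queries.unsqueeze(0).expand(feats.size(0), -1, -1)
        aligned, _ = self.attn(q, kv, kv)
        return aligned  # (batch, num_tokens, llm_dim): prepend to text embeddings

# Example: map 768-d CLIP patch features into a 4096-d LLaMA embedding space
projector = ModalityProjector(feat_dim=768, llm_dim=4096, num_tokens=32)
image_feats = torch.randn(2, 257, 768)  # dummy CLIP features, batch of 2
print(projector(image_feats).shape)     # torch.Size([2, 32, 4096])
```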