how to do this, easy mode:
1. Use Flux Kontext: upload your image and prompt the scene (https://t.co/as3Se7Zsgw)
2. Dump it into Veo 3 with prompts for the animation
3. Run ElevenLabs audio-to-audio with whoever's voice you need
Kontext will maintain consistency reasonably...
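The three steps form a simple chain: image in, edited still out, still animated into a clip, clip re-voiced. Here is a minimal sketch of that data flow. The `ShotPlan` fields, `run_pipeline`, and the stage signatures are hypothetical names for illustration only; they do not correspond to any real Flux, Veo, or ElevenLabs API. Each stage is passed in as a plain function, so you can wrap whatever client, web UI export, or manual step you actually use.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (not real library APIs):
# each takes an input file path plus a prompt/voice and returns an output file path.
EditImage = Callable[[str, str], str]     # (image, scene prompt) -> edited still
AnimateImage = Callable[[str, str], str]  # (still, motion prompt) -> video clip
ConvertVoice = Callable[[str, str], str]  # (clip, target voice) -> re-voiced clip


@dataclass
class ShotPlan:
    source_image: str   # your reference image
    scene_prompt: str   # step 1: describe the scene for the image editor
    motion_prompt: str  # step 2: describe the animation for the video model
    target_voice: str   # step 3: voice to convert the dialogue into


def run_pipeline(
    plan: ShotPlan,
    edit_image: EditImage,
    animate_image: AnimateImage,
    convert_voice: ConvertVoice,
) -> str:
    """Run the three stages in order, feeding each output into the next."""
    still = edit_image(plan.source_image, plan.scene_prompt)  # step 1 (Kontext)
    clip = animate_image(still, plan.motion_prompt)           # step 2 (Veo 3)
    return convert_voice(clip, plan.target_voice)             # step 3 (ElevenLabs)
```

Injecting the stages keeps the ordering logic separate from any particular service, so you can swap in fakes for a dry run or replace one tool without touching the rest.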
Google presents AudioPaLM: A Large Language Model That Can Speak and Listen
paper page: https://t.co/ZXFa4sRK03
We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and...
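One common way to "fuse" a text LM with a speech LM is a shared vocabulary. Discrete audio tokens are appended after the text vocabulary, and the text model's embedding table gains one new row per audio token, so a single decoder can read and emit both. The toy sketch below shows only that idea. The vocabulary sizes, the zero initialization of the new rows, and all function names are illustrative assumptions, not the paper's code.

```python
from typing import List

Matrix = List[List[float]]


def extend_embeddings(text_emb: Matrix, num_audio_tokens: int) -> Matrix:
    """Append one row per audio token; existing text rows are copied unchanged.

    Assumption for illustration: new audio rows start at zero.
    """
    dim = len(text_emb[0])
    text_rows = [row[:] for row in text_emb]
    audio_rows = [[0.0] * dim for _ in range(num_audio_tokens)]
    return text_rows + audio_rows


def audio_token_id(audio_code: int, text_vocab_size: int) -> int:
    """Shift an audio code past the text vocabulary so both share one id space."""
    return text_vocab_size + audio_code


def build_sequence(text_ids: List[int], audio_codes: List[int], text_vocab_size: int) -> List[int]:
    """Concatenate a text prompt with audio tokens in the shared id space."""
    return list(text_ids) + [audio_token_id(c, text_vocab_size) for c in audio_codes]


def embed(seq: List[int], emb: Matrix) -> Matrix:
    """Look up one embedding row per token id."""
    return [emb[i] for i in seq]
```

With a text vocabulary of 5 tokens and 4 audio codes, `build_sequence([1, 2], [0, 3], 5)` yields `[1, 2, 5, 8]`, and ids 5 to 8 index the newly appended rows.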