Getting emotional performances from AI voices

Inspiration

Adding dialogue to a game can be tough, especially if you're relying on voice actors who only have one shot to nail their performance. But what if you could generate new performances from existing dialogue without having to bring actors back to the studio?

What it does

We've built a system that uses AI to generate new dialogue from a small set of initial recordings, and it lets us choose how each line is performed. We've paired it with a procedural animation system that automatically assigns emotes based on AI-generated labels. The emotes we show here are basic and are meant only as a proof of concept.
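As a rough illustration of the label-to-emote step, here is a minimal Python sketch. It assumes each line arrives with one emotion label as a string. The emote names, the `DialogueCue` type, the `assign_emote` function, and the idle fallback are all hypothetical and are not taken from the project's animation system.

```python
from dataclasses import dataclass

# Hypothetical emote identifiers; the project's real animation names are not given.
EMOTE_FOR_LABEL = {
    "angry": "emote_angry",
    "sad": "emote_sad",
    "happy": "emote_happy",
    "astonished": "emote_astonished",
}

# Assumed fallback for labels the animation system has no emote for.
DEFAULT_EMOTE = "emote_idle"


@dataclass
class DialogueCue:
    """One line of dialogue paired with the emote to play alongside it."""
    text: str
    label: str
    emote: str


def assign_emote(text: str, label: str) -> DialogueCue:
    """Normalize the label and look up its emote, falling back to idle."""
    key = label.strip().lower()
    emote = EMOTE_FOR_LABEL.get(key, DEFAULT_EMOTE)
    return DialogueCue(text=text, label=key, emote=emote)
```

A fallback emote is one possible design choice here: it keeps the character animated even when a label falls outside the known set.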

How I built it

We trained separate ML voice synthesis models through the ElevenLabs API, using recordings of performances given with 4 emotions: Angry, Sad, Happy, and Astonished. Each model learned to produce new voice samples with the same delivery as its recordings. The "angry" model sounds angry while the "sad" model sounds sad, but because all of the training samples came from the same person's voice, the models are consistent in identity.
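Choosing a performance then comes down to picking which emotion-specific model synthesizes a line. The sketch below shows that routing in Python under stated assumptions: the voice IDs are placeholders, `build_synthesis_job` is a hypothetical helper, and the actual synthesis request to ElevenLabs is deliberately left out.

```python
# Placeholder IDs, one per emotion-specific voice model (not real values).
VOICE_FOR_EMOTION = {
    "angry": "voice-angry-placeholder",
    "sad": "voice-sad-placeholder",
    "happy": "voice-happy-placeholder",
    "astonished": "voice-astonished-placeholder",
}


def build_synthesis_job(text: str, emotion: str) -> dict:
    """Return a description of which voice model should speak this line.

    The job is a plain dict; sending it to a synthesis service is out of
    scope for this sketch.
    """
    key = emotion.strip().lower()
    if key not in VOICE_FOR_EMOTION:
        raise ValueError(f"No voice model trained for emotion: {emotion!r}")
    return {"voice_id": VOICE_FOR_EMOTION[key], "text": text}


# Example script: each line is tagged with the delivery we want.
script = [
    ("You broke it again?", "Angry"),
    ("I can't believe you came back!", "Astonished"),
]
jobs = [build_synthesis_job(text, emotion) for text, emotion in script]
```

Because every model comes from the same speaker, swapping the emotion tag changes the delivery of a line without changing who seems to be speaking.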

Challenges I ran into

Creating emotes in a short amount of time was challenging. Animating dialogue is a very time-consuming process, and we wish we could have made the character look more natural.

Accomplishments that I'm proud of

We were able to generate 250 new lines of dialogue from just 4 initial audio samples recorded with Gabriella from Womp.com, and integrating the system into our game took minimal effort.

What I learned

It's possible to get emotionally expressive performances from AI voice synthesis models. This is probably going to find a place in game dev workflows very soon.

Updates

John Dagdelen started this project — Mar 24, 2023 08:51 PM EDT

Leave feedback in the comments!
