Inspiration
"The cloud is just another person's computer", this phrase is even more true for AI cloud services. For commercial AI services not only you have to worry about their use (and misuse) of your private information, but also you have to worry about biases in the way AI models can be trained... One day you may find your favorite AI assistant has been renamed to ChatCCP.
What it does
A Python CLI application that performs an audio-only theater play with several AI voice actors about four people trapped in a lab with zombies. The user can interact with the voice actors via voice commands at appropriate points in the story. The crucial goal I set myself for this project was to rely only on open-source AI models that anyone can run locally on a PC with a relatively modern CUDA card. No external services needed! AI models involved:
- LLM to provide dynamic text lines for each actor and to react in unexpected ways to user input (Nemotron-mini, served through Ollama).
- Text-to-Speech (TTS) model to generate emotionally charged audio for the voice lines (F5-TTS).
- Automatic Speech Recognition (ASR) with Whisper tiny model.
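To make the flow between the three models concrete, here is a minimal sketch of one actor's turn. The model calls are stubbed out (the real versions would hit Ollama's HTTP API, F5-TTS, and Whisper); the function names, the sample lines, and the `actor_turn` structure are my own illustration, not the project's actual code.

```python
from typing import Optional

def generate_line(actor: str, scene_context: str) -> str:
    """Stub for the LLM call (e.g. Nemotron-mini via Ollama)."""
    return f"{actor}: We need to barricade the door before they get in!"

def speak(line: str) -> bytes:
    """Stub for F5-TTS synthesis; the real call would return audio samples."""
    return line.encode("utf-8")  # placeholder for waveform data

def transcribe(audio: bytes) -> str:
    """Stub for Whisper-tiny ASR on the user's recorded reply."""
    return "check the windows first"

def actor_turn(actor: str, context: str,
               user_audio: Optional[bytes] = None) -> dict:
    """One turn: transcribe any user input, generate a line, synthesize it."""
    user_text = transcribe(user_audio) if user_audio else None
    prompt = context if user_text is None else f"{context}\nUser said: {user_text}"
    line = generate_line(actor, prompt)
    return {"line": line, "audio": speak(line), "user_text": user_text}

turn = actor_turn("Dana", "Scene: the lab generator just died.", b"...")
```

The point of the shape is that user speech, once transcribed, is just more context for the LLM, which is what lets the actors "react in unexpected ways" to whatever the user says.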
Trade-offs made:
- The downside of this project is that it can only be run on a PC with a graphics card with at least 12 GB of VRAM.
- The upside is privacy, control, and knowledge of the internals of your AI systems.
How we built it
Python, Linux, pre-trained open-source HuggingFace models, and an Nvidia RTX 4070 Ti 16 GB GPU.
Challenges we ran into
- Audio recording with Python is messier than I expected.
- The emotional F5-TTS model is sometimes too emotional, and finding voice samples to clone was not trivial.
- Lack of time to properly automate the installation of all the requirements for each AI model. With more time I could have produced an Ansible playbook to automate deployment, or maybe created a Dockerfile.
- The code is a mess due to lack of time: no unit testing, no proper logging, no enforced code formatting, no setuptools installer (right now it's just a single script to run), no CLI arguments defined.
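On the "audio recording is messy" point: a lot of the mess is WAV bookkeeping rather than the capture itself. Here is a sketch of the container side, with the actual microphone capture stubbed by a generated sine tone (a real setup would use something like sounddevice or pyaudio for capture; those library choices and the constants below are assumptions, not the project's code). Sample rate, sample width, and channel count must all agree or playback is garbled.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16_000   # Whisper expects 16 kHz mono
SAMPLE_WIDTH = 2       # 16-bit signed PCM

def fake_capture(seconds: float) -> bytes:
    """Stand-in for a microphone recording: raw PCM of a quiet 440 Hz sine."""
    n = int(seconds * SAMPLE_RATE)
    samples = (int(32767 * 0.3 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE))
               for i in range(n))
    return b"".join(struct.pack("<h", s) for s in samples)

def to_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM in a WAV container that an ASR model can ingest."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(pcm)
    return buf.getvalue()

wav_bytes = to_wav(fake_capture(0.5))
```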
Accomplishments that we're proud of
The thing works! I had a blast trying to make this work!
What we learned
The limits and real capabilities of some AI models.
What's next for Zombie Radio AI
- Automate the deployment with Ansible and/or Docker.
- Split the deployment into two parts: a local audio-handling unit and a remote AI GPU computation unit. That way the GPU-intensive parts can be deployed on a cloud GPU.
- Sanitize and beautify the code a bit.
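For the local/remote split, one simple option is a small JSON envelope between the audio unit and the GPU unit. This is only a sketch of what that protocol might look like: the message shape, the request kinds, and the canned handler replies are all assumptions, with the real model calls stubbed out.

```python
import json

def make_request(kind: str, payload: dict) -> bytes:
    """Serialize a request ('generate', 'tts', or 'asr') for the GPU unit."""
    return json.dumps({"kind": kind, "payload": payload}).encode()

def handle_request(raw: bytes) -> bytes:
    """GPU-side dispatch; each handler would call the corresponding model."""
    msg = json.loads(raw)
    handlers = {
        "generate": lambda p: {"line": f"({p['actor']} improvises a reply)"},
        "tts": lambda p: {"audio_b64": ""},   # real version returns audio
        "asr": lambda p: {"text": ""},        # real version runs Whisper
    }
    return json.dumps(handlers[msg["kind"]](msg["payload"])).encode()

reply = json.loads(handle_request(make_request("generate", {"actor": "Dana"})))
```

Keeping audio capture and playback local while only model calls cross the wire means the latency-sensitive parts never depend on the network.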
Built With
- f5-tts
- ollama
- python
- whisper