Inspiration
"The cloud is just another person's computer", this phrase is even more true for AI cloud services. For commercial AI services not only you have to worry about their use (and misuse) of your private information, but also you have to worry about biases in the way AI models can be trained... One day you may find your favorite AI assistant has been renamed to ChatCCP.
What it does
A Python CLI application that performs an audio-only theater play with several AI voice actors about four people trapped in a lab with zombies. The user can interact with the voice actors via voice commands at appropriate points in the story. The crucial goal I set myself for this project was to rely only on open-source AI models that anyone can run locally on a PC with a relatively modern CUDA card. No external services needed! AI models involved:
- LLM to provide dynamic text lines for each actor and to react in unexpected ways to user input (Nemotron-mini, served through Ollama).
- Text-to-Speech (TTS) model to generate emotionally charged audio for the voice lines (F5-TTS).
- Automatic Speech Recognition (ASR) with Whisper tiny model.
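To make the flow between the three models concrete, here is a minimal sketch of one actor's turn. The model calls are stubbed out (the real versions would hit Ollama's HTTP API, F5-TTS, and Whisper); the function names, the sample lines, and the `actor_turn` structure are my own illustration, not the project's actual code.

```python
from typing import Optional

def generate_line(actor: str, scene_context: str) -> str:
    """Stub for the LLM call (e.g. Nemotron-mini via Ollama)."""
    return f"{actor}: We need to barricade the door before they get in!"

def speak(line: str) -> bytes:
    """Stub for F5-TTS synthesis; the real call would return audio samples."""
    return line.encode("utf-8")  # placeholder for waveform data

def transcribe(audio: bytes) -> str:
    """Stub for Whisper-tiny ASR on the user's recorded reply."""
    return "check the windows first"

def actor_turn(actor: str, context: str,
               user_audio: Optional[bytes] = None) -> dict:
    """One turn: transcribe any user input, generate a line, synthesize it."""
    user_text = transcribe(user_audio) if user_audio else None
    prompt = context if user_text is None else f"{context}\nUser said: {user_text}"
    line = generate_line(actor, prompt)
    return {"line": line, "audio": speak(line), "user_text": user_text}

turn = actor_turn("Dana", "Scene: the lab generator just died.", b"...")
```

The point of the shape is that user speech, once transcribed, is just more context for the LLM, which is what lets the actors "react in unexpected ways" to whatever the user says.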
Trade-offs made:
- The downside of this project is that it can only be run on a PC with a graphics card with at least 12 GB of VRAM.
- The upside is privacy, control, and knowledge of the internals of your AI systems.
How we built it
Python, Linux, pre-trained open-source HuggingFace models, and an Nvidia RTX 4070 Ti 16 GB GPU.
Challenges we ran into
- Audio recording with Python is messier than I expected.
- The emotional F5-TTS model is sometimes too emotional, and finding voice samples to clone was not trivial.
- Lack of time to properly automate the installation of all the requirements for each AI model. With more time I could have produced an Ansible playbook to automate deployment, or maybe created a Dockerfile.
- The code is a mess due to lack of time: no unit testing, no proper logging, no enforced code formatting, no setuptools installer (right now it's just a single script to run), no CLI arguments defined.
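On the "audio recording is messy" point: a lot of the mess is WAV bookkeeping rather than the capture itself. Here is a sketch of the container side, with the actual microphone capture stubbed by a generated sine tone (a real setup would use something like sounddevice or pyaudio for capture; those library choices and the constants below are assumptions, not the project's code). Sample rate, sample width, and channel count must all agree or playback is garbled.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16_000   # Whisper expects 16 kHz mono
SAMPLE_WIDTH = 2       # 16-bit signed PCM

def fake_capture(seconds: float) -> bytes:
    """Stand-in for a microphone recording: raw PCM of a quiet 440 Hz sine."""
    n = int(seconds * SAMPLE_RATE)
    samples = (int(32767 * 0.3 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE))
               for i in range(n))
    return b"".join(struct.pack("<h", s) for s in samples)

def to_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM in a WAV container that an ASR model can ingest."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(pcm)
    return buf.getvalue()

wav_bytes = to_wav(fake_capture(0.5))
```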
Accomplishments that we're proud of
The thing works! I had a blast trying to make this work!
What we learned
The limits and real capabilities of some AI models.
What's next for Zombie Radio AI
- Automate the deployment with Ansible and/or Docker.
- Split the deployment into two parts: a local audio-handling unit and a remote AI GPU computation unit. That way the GPU-intensive parts can be deployed on a cloud GPU.
- Sanitize and beautify the code a bit.
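For the local/remote split, one simple option is a small JSON envelope between the audio unit and the GPU unit. This is only a sketch of what that protocol might look like: the message shape, the request kinds, and the canned handler replies are all assumptions, with the real model calls stubbed out.

```python
import json

def make_request(kind: str, payload: dict) -> bytes:
    """Serialize a request ('generate', 'tts', or 'asr') for the GPU unit."""
    return json.dumps({"kind": kind, "payload": payload}).encode()

def handle_request(raw: bytes) -> bytes:
    """GPU-side dispatch; each handler would call the corresponding model."""
    msg = json.loads(raw)
    handlers = {
        "generate": lambda p: {"line": f"({p['actor']} improvises a reply)"},
        "tts": lambda p: {"audio_b64": ""},   # real version returns audio
        "asr": lambda p: {"text": ""},        # real version runs Whisper
    }
    return json.dumps(handlers[msg["kind"]](msg["payload"])).encode()

reply = json.loads(handle_request(make_request("generate", {"actor": "Dana"})))
```

Keeping audio capture and playback local while only model calls cross the wire means the latency-sensitive parts never depend on the network.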
Built With
- f5-tts
- ollama
- python
- whisper