TUNE IN HERE: https://www.twitch.tv/grokkdiofm
(video if YouTube takes us down: https://drive.google.com/file/d/1FhgmjCTfLCf_dHlBuT3SDocEtj5lMxn5/view?usp=sharing)

Why Pick Us?
- Does this product have a strong market demand?: Yes. It sits in a rapidly growing market: global live streaming exceeds 87 billion dollars, audio streaming exceeds 46 billion dollars, and GTA-style radio content remains culturally iconic and widely consumed.
- How polished is this product? Are users willing to pay $$ to use this product today?: We are live on Twitch right now, and with no marketing we have reached ~50 unique viewers. Built-in monetization paths such as donations, subscriptions, and Twitch ads already show that users are willing to pay for engaging live audio content.
- Using Grok API: yes!
- Naturalness of voice + Smartness of voice: Tune in and see! We employ several techniques to give life to the podcast speakers.
Inspiration
Remember GTA radio? Those insane talk show hosts you'd listen to while cruising around the city; chaotic, hilarious, and weirdly addictive. You'd keep driving just to hear what they'd say next.
Previous TTS sounded robotic and slow, not the kind of entertainment you'd choose to listen to. Cloning voices also required the subject to read a script, which is not viable for most use cases. Grok voice changed that. It's expressive, emotional, easy to clone, and fast enough for live interaction.
So we asked how to take full advantage of what is new in Grok Voice. The answer: live AI radio with personality. A 24/7 talk show with multiple distinct hosts, real-time trending topics polled from X, and listeners calling in to join, all powered by Grok voice AI that actually sounds worth listening to.
That's something people would tune into every day, whether on the commute home from a 36-hour shift or on a road trip to Tahoe.
Neither of us had any experience with voice engineering, FFmpeg, LiveKit, etc., so this has been a great learning experience, with room to improve!
What it does
Grokkdio FM is a 24/7 AI radio station: GTA radio brought to life, only possible because of Grok voice.
Novel Grok Voice Use Cases:
- Multi-personality live radio: multiple AI hosts with cloned voices having natural conversations, interrupting each other, building on points. Not turn-based chatbot dialogue. Actual radio banter.
- Live caller interaction: dial a real phone number, and Grok responds to you in real-time on air. Voice-to-voice, live, unscripted.
- Expressive entertainment: hosts yell, whisper, laugh, get heated. Grok voice delivers performance, not just speech.
- Voice cloning for character: each host has a recognizable voice you'd tune in to hear again
Why people would use this every day:
- Fresh content always: topics pulled from X trends, so it's different every time
- Entertaining, not robotic: you'd actually want this on in the background
- Interactive: call in and become part of the show
- Habitual format: like morning radio, something to tune into daily
How we built it
Every piece of Grokkdio FM is built to showcase what Grok voice can do and create something people use daily.
Pushing Grok Voice into New Territory:
- xAI Voice API with Voice Cloning: we clone a distinct voice for each host. This is what makes "AI radio personality" possible. Without Grok's expressiveness and cloning, every host would sound the same.
- Live voice-to-voice interaction: Twilio captures caller audio, we transcribe it, and Grok responds live.
- Multi-agent voice orchestration: multiple Grok voices running concurrently, naturally taking turns and interrupting.
- Emotional range: prompts designed to make Grok yell, whisper, laugh, argue. We're using the full expressiveness of the voice model.
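The turn-taking and interruption behavior described above can be pictured with a small scheduler. This is a minimal sketch under stated assumptions, not the code running on the stream: `TurnScheduler`, `Turn`, and `run_segment` are hypothetical names, `generate` stands in for whatever call produces a host's next line, and an interruption is modeled as simply cutting a line short (a live system would instead stop audio playback mid-utterance).

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Turn:
    host: str
    text: str
    interrupted: bool = False


@dataclass
class TurnScheduler:
    """Hypothetical scheduler: picks who speaks next and who gets cut off."""
    hosts: List[str]
    interrupt_chance: float = 0.3  # assumed value, tune by ear
    rng: random.Random = field(default_factory=random.Random)
    last_host: Optional[str] = None

    def next_host(self) -> str:
        # Never let the same host speak twice in a row.
        candidates = [h for h in self.hosts if h != self.last_host]
        host = self.rng.choice(candidates)
        self.last_host = host
        return host

    def maybe_interrupt(self, turn: Turn) -> Turn:
        # Cut a line short at a random word so the next host "barges in".
        words = turn.text.split()
        if len(words) > 3 and self.rng.random() < self.interrupt_chance:
            cut = self.rng.randint(2, len(words) - 1)
            return Turn(turn.host, " ".join(words[:cut]) + "...", interrupted=True)
        return turn


def run_segment(
    scheduler: TurnScheduler,
    generate: Callable[[str, List[Turn]], str],
    n_turns: int,
) -> List[Turn]:
    """Produce n_turns of banter; each host sees the transcript so far."""
    transcript: List[Turn] = []
    for _ in range(n_turns):
        host = scheduler.next_host()
        turn = Turn(host, generate(host, transcript))
        transcript.append(scheduler.maybe_interrupt(turn))
    return transcript
```

Passing the running transcript into `generate` is what lets a host build on (or argue with) the previous point instead of producing isolated monologues. The host names and the interruption probability are placeholders.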
Main tools:
- xAI Grok: the AI behind the hosts
- X: Trending tweets
- Twilio: anyone can call in from a regular phone
- FFmpeg + Twitch: 24/7 (ish) streaming infrastructure
- Puppeteer: displays tweets on stream for visual engagement
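For readers curious what the FFmpeg + Twitch piece can look like, here is a hedged sketch of building an FFmpeg command that reads raw PCM audio from stdin, pairs it with a looping still image, and pushes the result to Twitch over RTMP. The input format (24 kHz mono 16-bit PCM), the `frame.png` placeholder, the bitrates, and the helper name `build_ffmpeg_cmd` are all assumptions for illustration; our actual pipeline and settings may differ.

```python
from typing import List

# Assumed generic Twitch RTMP ingest endpoint; the stream key is appended.
TWITCH_INGEST = "rtmp://live.twitch.tv/app/"


def build_ffmpeg_cmd(
    stream_key: str,
    sample_rate: int = 24000,       # assumed sample rate of the voice audio
    image_path: str = "frame.png",  # hypothetical still frame for the video track
) -> List[str]:
    """Return an ffmpeg argv list: still image + PCM on stdin -> FLV over RTMP."""
    return [
        "ffmpeg",
        # Video input: loop one image forever, paced in real time.
        "-re", "-loop", "1", "-framerate", "30", "-i", image_path,
        # Audio input: raw signed 16-bit little-endian mono PCM from stdin.
        "-f", "s16le", "-ar", str(sample_rate), "-ac", "1", "-i", "pipe:0",
        # Video encoding: H.264 with a keyframe every 2 seconds at 30 fps.
        "-c:v", "libx264", "-preset", "veryfast", "-tune", "stillimage",
        "-pix_fmt", "yuv420p", "-g", "60",
        # Audio encoding: AAC, resampled to 44.1 kHz for the FLV container.
        "-c:a", "aac", "-b:a", "160k", "-ar", "44100",
        # RTMP expects an FLV-muxed stream.
        "-f", "flv", TWITCH_INGEST + stream_key,
    ]
```

The list can be handed to `subprocess.Popen(cmd, stdin=subprocess.PIPE)`, after which synthesized audio chunks are written to the process's stdin. Building the command as a list (rather than a shell string) keeps the stream key from being interpreted by a shell.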
Challenges we ran into
- ffmpeg: Neither of us had any experience with streaming/voice engineering. We had a few niche ffmpeg errors that we couldn't Cursor our way out of. Had to actually think.
- 24/7 reliability: daily-use products can't crash. We engineered for stability across hours of continuous multi-stream operation, but there are still multiple failure modes.
- Making multiple Grok voices work together: orchestrating turn-taking, interruptions, and natural flow between AI hosts. This multi-voice coordination was a novel technical challenge.
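One common way to soften the reliability problem is to run the streaming process under a supervisor that restarts it with exponential backoff. The sketch below is an illustration of that general technique, not a description of what we shipped: `supervise` and `backoff_delay` are hypothetical names, and the base delay, cap, and restart limit are assumed values.

```python
import subprocess
import time
from typing import Callable, List


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before restart number `attempt` (0-based): base * 2^attempt, capped."""
    return min(cap, base * (2 ** attempt))


def supervise(
    cmd: List[str],
    max_restarts: int = 5,
    run: Callable[[List[str]], int] = subprocess.call,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run cmd until it exits cleanly; restart on failure with backoff.

    Returns the number of restarts that were needed. Raises RuntimeError
    once the process has failed more than max_restarts times in a row.
    """
    restarts = 0
    while True:
        exit_code = run(cmd)
        if exit_code == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(f"{cmd[0]} kept failing after {restarts} restarts")
        sleep(backoff_delay(restarts))
        restarts += 1
```

`run` and `sleep` are injectable so the restart logic can be tested without launching a real process or waiting. A production version would likely also reset the restart counter after a long healthy run and log each exit code.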
Built With
- ffmpeg
- grok-voice
- livekit
- livekit-agents
- puppeteer
- twilio
- websockets