Grokkdio FM

TUNE IN HERE: https://www.twitch.tv/grokkdiofm

(Video, in case YouTube takes us down: https://drive.google.com/file/d/1FhgmjCTfLCf_dHlBuT3SDocEtj5lMxn5/view?usp=sharing)


Why Pick Us?

Does this product have a strong market demand?: The product fits a rapidly growing market: the global live streaming market exceeds $87 billion, audio streaming exceeds $46 billion, and GTA-style radio content remains culturally iconic and widely consumed.
How polished is this product? Are users willing to pay $$ to use this product today?: We are live on Twitch right now, and with no marketing we have reached ~50 unique viewers. Built-in monetization paths such as donations, subscriptions, and Twitch ads already show that users are willing to pay for engaging live audio content.
Using Grok API: yes!
Naturalness of voice + Smartness of voice: Tune in and see! We employ several techniques to give life to the podcast speakers.

Inspiration

Remember GTA radio? Those insane talk show hosts you'd listen to while cruising around the city: chaotic, hilarious, and weirdly addictive. You'd keep driving just to hear what they'd say next.

Previous TTS sounded robotic and slow, not like entertainment you'd choose to listen to. Also, cloning a voice required the subject to read from a script, which isn't viable for most use cases. Grok voice changed that. It's expressive, emotional, easy to clone, and fast enough for live interaction.

So we asked how to take the best advantage of Grok Voice's novel strengths. Our answer: live AI radio with personality. A 24/7 talk show with multiple distinct hosts, real-time trending topics polled from X, and listeners calling in to join, all powered by Grok voice AI that actually sounds worth listening to.

That's something people would tune into every day, whether during the commute home from a 36-hour shift or on a road trip to Tahoe.

Neither of us had any experience with voice engineering, FFmpeg, LiveKit, etc., so this has been a great learning experience with room to improve!

What it does

Grokkdio FM is a 24/7 AI radio station: GTA radio brought to life, only possible because of Grok voice.

Novel Grok Voice Use Cases:

Multi-personality live radio: multiple AI hosts with cloned voices having natural conversations, interrupting each other, building on points. Not turn-based chatbot dialogue. Actual radio banter.
Live caller interaction: dial a real phone number, and Grok responds to you in real-time on air. Voice-to-voice, live, unscripted.
Expressive entertainment: hosts yell, whisper, laugh, get heated. Grok voice delivers performance, not just speech.
Voice cloning for character: each host has a recognizable voice you'd tune in to hear again.

Why people would use this every day:

Fresh content always: topics pulled from X trends, so it's different every time
Entertaining, not robotic: you'd actually want this on in the background
Interactive: call in and become part of the show
Habitual format: like morning radio, something to tune into daily

How we built it

Every piece of Grokkdio FM is built to showcase what Grok voice can do and to create something people use daily.

Pushing Grok Voice into New Territory:

xAI Voice API with Voice Cloning: we clone a distinct voice for each host. This is what makes "AI radio personality" possible. Without Grok's expressiveness and cloning, every host would sound the same.
Live voice-to-voice interaction: Twilio captures caller audio, we transcribe it, and Grok responds live.
Multi-agent voice orchestration: multiple Grok voices running concurrently, naturally taking turns and interrupting.
Emotional range: prompts designed to make Grok yell, whisper, laugh, argue. We're using the full expressiveness of the voice model.
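
The turn-taking and interruption behavior described above can be pictured with a small planning sketch. This is not the actual Grokkdio FM code: the `Host`, `Turn`, and `plan_turns` names and the per-host `interrupt_chance` value are hypothetical, and a real system would coordinate live audio streams rather than a precomputed list. It only illustrates the idea that whoever interrupts takes the floor next, so the conversation is not a fixed round-robin.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    interrupt_chance: float  # hypothetical: probability of cutting in on another host


@dataclass
class Turn:
    speaker: str
    interrupted_by: Optional[str] = None


def plan_turns(hosts: List[Host], num_turns: int, seed: int = 0) -> List[Turn]:
    """Plan speaking turns where an interrupting host takes the floor next."""
    if len(hosts) < 2:
        raise ValueError("need at least two hosts for banter")
    rng = random.Random(seed)  # seeded so a plan can be reproduced
    turns: List[Turn] = []
    current = hosts[0]
    for _ in range(num_turns):
        others = [h for h in hosts if h is not current]
        # Each other host gets a chance to cut in; the first one to succeed wins.
        interrupter = next(
            (h for h in others if rng.random() < h.interrupt_chance), None
        )
        turns.append(Turn(current.name, interrupter.name if interrupter else None))
        # The interrupter speaks next; otherwise hand off to a random other host.
        current = interrupter or rng.choice(others)
    return turns
```

Each planned `Turn` could then drive one voice request, with `interrupted_by` signaling that the current host's audio should be cut short.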

Main tools:

xAI Grok: AI intelligence
X: Trending tweets
Twilio: anyone can call in from a regular phone
FFmpeg + Twitch: 24/7 (ish) streaming infrastructure
Puppeteer: displays tweets on stream for visual engagement
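
As a rough illustration of the FFmpeg + Twitch piece, the sketch below builds an FFmpeg command line that mixes a looped still image with raw PCM audio from stdin and pushes it to Twitch over RTMP. Everything specific here is an assumption rather than the project's real configuration: the `build_twitch_ffmpeg_cmd` helper, the `frame.png` placeholder image, the 24 kHz mono input format, and the bitrates are illustrative, and the stream key is passed in by the caller.

```python
from typing import List


def build_twitch_ffmpeg_cmd(
    stream_key: str,
    image_path: str = "frame.png",  # hypothetical placeholder frame
    sample_rate: int = 24000,       # assumed sample rate of the incoming voice audio
) -> List[str]:
    """Return an FFmpeg argument list that streams image + stdin audio to Twitch."""
    return [
        "ffmpeg",
        # Video input: loop one still image, read at real-time speed.
        "-re", "-loop", "1", "-framerate", "30", "-i", image_path,
        # Audio input: raw signed 16-bit little-endian mono PCM piped on stdin.
        "-f", "s16le", "-ar", str(sample_rate), "-ac", "1", "-i", "pipe:0",
        # H.264 video tuned for a mostly static picture, keyframe every 2 seconds.
        "-c:v", "libx264", "-preset", "veryfast", "-tune", "stillimage",
        "-pix_fmt", "yuv420p", "-g", "60",
        "-b:v", "2500k", "-maxrate", "2500k", "-bufsize", "5000k",
        # AAC audio resampled to 44.1 kHz.
        "-c:a", "aac", "-b:a", "160k", "-ar", "44100",
        # RTMP ingest expects an FLV container.
        "-f", "flv", f"rtmp://live.twitch.tv/app/{stream_key}",
    ]
```

The list could be handed to `subprocess.Popen(cmd, stdin=subprocess.PIPE)`, with synthesized voice audio written to the process's stdin as it arrives.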

Challenges we ran into

FFmpeg: Neither of us had any experience with streaming or voice engineering. We hit a few niche FFmpeg errors that we couldn't Cursor our way out of. Had to actually think.
24/7 reliability: daily-use products can't crash. We engineered for stability across hours of continuous multi-stream operation, but there are still multiple failure modes.
Making multiple Grok voices work together: orchestrating turn-taking, interruptions, and natural flow between AI hosts. This multi-voice coordination was a novel technical challenge.
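
For the 24/7 reliability challenge above, one common pattern is a supervisor loop that restarts a crashed stream with exponential backoff. The sketch below is a generic illustration under stated assumptions, not the approach Grokkdio FM actually uses: `supervise`, `backoff_delays`, and the `run_once` callable (standing in for "start the stream and block until it exits") are hypothetical names.

```python
import time
from typing import Callable, Iterator


def backoff_delays(base: float = 1.0, factor: float = 2.0, cap: float = 60.0) -> Iterator[float]:
    """Yield exponentially growing wait times, capped at `cap` seconds."""
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)


def supervise(
    run_once: Callable[[], None],
    max_restarts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `run_once` until it returns cleanly; return how many restarts were needed.

    Re-raises the last exception once `max_restarts` restarts have been used.
    """
    delays = backoff_delays()
    restarts = 0
    while True:
        try:
            run_once()
            return restarts
        except Exception as exc:
            if restarts >= max_restarts:
                raise
            wait = next(delays)
            restarts += 1
            print(f"stream crashed ({exc!r}); restart {restarts} in {wait:.0f}s")
            sleep(wait)
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; in production the default `time.sleep` applies.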