Radio AI: The Autonomous Voice Operator

Inspiration

Traditional radio is a one-way street. Even with AI-generated music, the "lean-back" experience remains passive. We were inspired to build a Radio Operator, not just a player, that bridges the gap between the listener’s voice and the music’s visual soul. We wanted to see whether sub-300ms voice intelligence could let a user "co-create" their broadcast in real time.

What it does

Radio AI is an end-to-end agentic platform. It features a Conversational DJ that uses Deepgram’s Nova-2 to transcribe natural-language requests while music is playing. The agent doesn't just search: it interprets "vibes," fetches live news and market updates, and drives a multi-stage "waterfall" search pipeline via Gemini 3 Pro. At the same time, it uses Perfect Corp’s generative AI to render unique AR visualizers in the user's room, giving the music a visible presence in physical space.

How we built it

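The Voice Agent: We replaced our legacy file-upload logic with a real-time Deepgram WebSocket pipeline. Streaming audio this way enables barge-in (the listener can interrupt while the DJ is speaking or music is playing) and fast endpointing (the end of an utterance is detected quickly).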
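As a rough illustration of the barge-in idea, here is a minimal sketch of a controller that reacts to streaming transcript events. The `TranscriptEvent` shape, the `BargeInController` class, and the action names are all hypothetical; the `isFinal` and `speechFinal` flags are loosely modelled on the flags a streaming speech-to-text service typically reports, and are not taken from our actual code.

```typescript
// Hypothetical, normalized shape of one streaming transcript message.
interface TranscriptEvent {
  transcript: string;
  isFinal: boolean;     // the text for this audio segment will not change
  speechFinal: boolean; // endpointing detected a pause: the user is done
}

type Action =
  | { kind: "duck_dj" }                       // barge-in: silence the DJ voice
  | { kind: "handle_request"; text: string }; // utterance is complete

class BargeInController {
  private djSpeaking = false;
  private parts: string[] = [];

  setDjSpeaking(speaking: boolean): void {
    this.djSpeaking = speaking;
  }

  onTranscript(ev: TranscriptEvent): Action[] {
    const actions: Action[] = [];
    const text = ev.transcript.trim();

    // Any recognized speech while the DJ talks counts as an interruption.
    if (text.length > 0 && this.djSpeaking) {
      this.djSpeaking = false;
      actions.push({ kind: "duck_dj" });
    }

    // Only finalized segments are accumulated; interim text may be revised.
    if (ev.isFinal && text.length > 0) {
      this.parts.push(text);
    }

    // Endpoint reached: emit the whole utterance and reset the buffer.
    if (ev.speechFinal && this.parts.length > 0) {
      actions.push({ kind: "handle_request", text: this.parts.join(" ") });
      this.parts = [];
    }
    return actions;
  }
}
```

The point of the design is that interim results are enough to trigger the interruption, while the request itself is only acted on once the utterance has been finalized.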

The Intelligence: We used Gemini to act as the "Brain," parsing transcripts into structured search criteria for our MongoDB music library.
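To make the "transcript to structured criteria" step more concrete, the sketch below validates a JSON string (as a model might return it) and turns it into a MongoDB query filter. The `SearchCriteria` fields (`genres`, `mood`, `minBpm`, `maxBpm`) and the document field names are assumptions chosen for illustration; the real schema may differ. Only standard MongoDB query operators (`$in`, `$gte`, `$lte`) are used.

```typescript
// Hypothetical criteria schema; the real field names may differ.
interface SearchCriteria {
  genres: string[];
  mood?: string;
  minBpm?: number;
  maxBpm?: number;
}

// Validate untrusted model output instead of trusting it blindly.
function parseCriteria(modelOutput: string): SearchCriteria | null {
  let raw: unknown;
  try {
    raw = JSON.parse(modelOutput);
  } catch {
    return null; // the model returned something that is not JSON
  }
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) return null;
  const obj = raw as Record<string, unknown>;

  // Keep only well-typed fields; unknown keys are silently dropped.
  const genres = Array.isArray(obj.genres)
    ? obj.genres.filter((g): g is string => typeof g === "string")
    : [];
  const criteria: SearchCriteria = { genres };
  if (typeof obj.mood === "string") criteria.mood = obj.mood;
  if (typeof obj.minBpm === "number") criteria.minBpm = obj.minBpm;
  if (typeof obj.maxBpm === "number") criteria.maxBpm = obj.maxBpm;
  return criteria;
}

// Build a plain MongoDB query filter from the validated criteria.
function toMongoFilter(c: SearchCriteria): Record<string, unknown> {
  const filter: Record<string, unknown> = {};
  if (c.genres.length > 0) filter.genre = { $in: c.genres };
  if (c.mood !== undefined) filter.mood = c.mood;
  const bpm: Record<string, number> = {};
  if (c.minBpm !== undefined) bpm.$gte = c.minBpm;
  if (c.maxBpm !== undefined) bpm.$lte = c.maxBpm;
  if (Object.keys(bpm).length > 0) filter.bpm = bpm;
  return filter;
}
```

The resulting object could be passed to a collection's `find` call. Validating first means a malformed or over-eager model response degrades to "no criteria" rather than an invalid query.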

The Visuals: We integrated Perfect Corp’s Text-to-Image API to generate textures based on track metadata. These are then projected into 3D space in our Flutter app via ARKit/ARCore.

The Backend: A robust Node.js/Express server coordinates the "Waterfall Search," ensuring that if a user asks for a "Russian folk-techno vibe," the system finds or generates the perfect match.
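One way to picture a waterfall search is as an ordered list of stages, each tried only if the previous one came up empty. The sketch below is an assumption about the general shape, not our exact implementation: the `Track` type, the `Stage` signature, and the three stand-in stages (`exactMatch`, `fuzzyMatch`, `generate`) are hypothetical.

```typescript
interface Track {
  id: string;
  title: string;
  source: string;
}

// A stage resolves to a track, or to null meaning "nothing found, keep going".
type Stage = (query: string) => Promise<Track | null>;

async function waterfallSearch(query: string, stages: Stage[]): Promise<Track | null> {
  for (const stage of stages) {
    try {
      const hit = await stage(query);
      if (hit !== null) return hit; // first hit wins; later stages never run
    } catch {
      // A failing stage should not abort the whole search; fall through.
    }
  }
  return null; // every stage was exhausted
}

// Hypothetical stand-ins for a strict library lookup, a looser search,
// and a generation step of last resort.
const exactMatch: Stage = async () => null;
const fuzzyMatch: Stage = async () => {
  throw new Error("index unavailable");
};
const generate: Stage = async (q) => ({ id: "gen-1", title: q, source: "generated" });
```

Calling `waterfallSearch("Russian folk-techno vibe", [exactMatch, fuzzyMatch, generate])` would skip the empty and failing stages and resolve to the generated track. Ordering stages from cheapest to most expensive keeps the common case fast.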

Challenges we ran into

Managing the latency between a live voice command and a generative AR texture was our biggest hurdle. We solved this by implementing an asynchronous pipeline where the audio stream starts immediately while the AR visualizer loads in the background.
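The idea can be sketched as follows: kick off the slow texture generation without awaiting it, and only wait for the audio to start. The `PlaybackHandlers` interface and `playTrack` function are hypothetical names introduced for this illustration, and the fallback visualizer is an assumption about how a failed generation might be handled.

```typescript
// Hypothetical hooks into the player and the AR layer.
interface PlaybackHandlers {
  startAudio: (trackId: string) => Promise<void>;
  generateTexture: (trackId: string) => Promise<string>; // resolves to a texture URL
  showVisualizer: (textureUrl: string) => void;
  showFallbackVisualizer: () => void;
}

async function playTrack(
  trackId: string,
  h: PlaybackHandlers
): Promise<{ visualReady: Promise<void> }> {
  // Start the slow generative call first, but do not await it here.
  const visualReady = h
    .generateTexture(trackId)
    .then((url) => h.showVisualizer(url))
    .catch(() => h.showFallbackVisualizer()); // visuals must never break audio

  // Audio is the only thing the listener has to wait for.
  await h.startAudio(trackId);

  // Callers may await visualReady if they care, or simply ignore it.
  return { visualReady };
}
```

Because the texture promise always settles through `catch`, a slow or failed image generation never delays or interrupts playback.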

Accomplishments that we're proud of

We successfully moved from a "Searcher" (file-based) workflow to an "Operator" (agent-based) one. Achieving sub-300ms transcription latency makes the interaction feel like a conversation with a real person.

What we learned

What's next for Radio AI: The Autonomous Voice Operator

Built With

deepgram-nova-2
express.js
flutter
google-gemini-3
mongodb
node.js
perfect-corp-ai/ar-api
websockets

Updates

Michael Rybachenko started this project — Feb 19, 2026 04:41 AM EST
