Inspiration
Storytelling is a powerful medium, but producing a high-quality radio drama or podcast traditionally requires a full team: scriptwriters, voice actors, sound engineers, and visual artists. I wanted to democratize this process. Gemini Radio Drama Studio was born from the idea that anyone with a creative concept should be able to bring their story to life. By leveraging the multimodal capabilities of Google Gemini, the studio automates the heavy lifting, from writing the script to performing the voices and even designing the cover art.
What it does
This application is a full-featured creation suite running entirely in the browser.
- Script Generation: Users provide a simple premise, and Gemini generates a structured script complete with characters, dialogue, and atmospheric descriptors.
- AI Voice Synthesis: It integrates Gemini and ElevenLabs to convert dialogue into emotive speech, assigning distinct voices to different characters.
- Visuals & Publishing: The app generates scene images for visual context and helps package the final output for platforms like YouTube or RSS feeds.
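A minimal sketch of how a structured script and per-character voice assignment might be modeled. The `Character`, `ScriptLine`, and `Script` shapes and the `assignVoices` helper are hypothetical, and the voice IDs are placeholders rather than real Gemini or ElevenLabs voice names.

```typescript
// Hypothetical data model: these field names are illustrative, not the app's real types.
interface Character {
  name: string;
  description: string;
}

interface ScriptLine {
  speaker: string; // expected to match a Character.name
  text: string;
  direction?: string; // atmospheric or delivery cue, e.g. "whispering"
}

interface Script {
  title: string;
  characters: Character[];
  lines: ScriptLine[];
}

// Give each character a voice from the pool, in the order characters are declared.
// If there are more characters than voices, assignment wraps around the pool.
function assignVoices(script: Script, voicePool: string[]): Map<string, string> {
  if (voicePool.length === 0) {
    throw new Error("voicePool must contain at least one voice ID");
  }
  const assignments = new Map<string, string>();
  for (const character of script.characters) {
    if (!assignments.has(character.name)) {
      assignments.set(character.name, voicePool[assignments.size % voicePool.length]);
    }
  }
  return assignments;
}

// Example with placeholder voice IDs.
const demo: Script = {
  title: "The Lighthouse",
  characters: [
    { name: "Keeper", description: "weary, gruff" },
    { name: "Stranger", description: "soft-spoken" },
    { name: "Narrator", description: "calm" },
  ],
  lines: [{ speaker: "Keeper", text: "Who's there?", direction: "shouting over wind" }],
};
console.log(assignVoices(demo, ["voice-a", "voice-b"]));
```

In this example Keeper gets `voice-a`, Stranger gets `voice-b`, and Narrator wraps around to `voice-a`, so the pool needs at least as many voices as the cast has characters if every character must sound distinct.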
How we built it
The project is built as a modern Single Page Application (SPA) using React 19 and Vite for a lightning-fast development experience.
- Frontend Core: We used TypeScript for type safety across our complex data structures (Scripts, Lines, Characters).
- AI Integration: The core logic relies on the @google/genai SDK to call Gemini models across multiple modalities (Text-to-Text, Text-to-Speech, and Image Generation). Evals and prompting strategies are kept in promptTemplates.ts so that outputs stay consistent and high-quality.
- Audio Processing: Handling raw audio buffers in the browser was a challenge. We used lamejs to encode the generated audio as MP3, and we generated dialogue in asynchronous batches so that scripts with many lines could be voiced efficiently.
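The batched generation described above can be sketched as a small worker pool that keeps a fixed number of TTS requests in flight at once. This is an assumed design, not necessarily the project's actual batching code: `mapWithConcurrency` is a hypothetical helper, and `task` stands in for whatever call synthesizes one line of dialogue.

```typescript
// Run `task` over every item with at most `limit` calls in flight at once.
// Results keep the input order, so line i's audio ends up at index i.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array<R>(items.length);
  let next = 0; // index of the next item to start; shared safely because JS runs one callback at a time

  // Each worker repeatedly claims the next unstarted item until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Bounding concurrency can help stay under API rate limits while still overlapping network latency, and writing results by index preserves script order even when later lines finish first. If one task rejects, the returned promise rejects immediately, but the other workers keep processing remaining items unless a cancellation check is added.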
Challenges we ran into
One of the biggest hurdles was state management for large media files. Keeping track of dozens of generated audio clips and images without bloating the browser's memory required careful handling and IndexedDB-style storage strategies.
Another challenge was strictly enforcing the "Director's output": ensuring the AI adheres to a specific JSON structure for the script so the application can parse and render it correctly.
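One way to enforce a fixed script structure is to validate the model's JSON before rendering it. The sketch below assumes a hypothetical script shape (`title`, `characters`, and `lines` with `speaker` and `text`) and a hypothetical `parseScript` helper. It checks field types and that every line's speaker is a declared character, returning an error message instead of throwing so a caller could re-prompt the model.

```typescript
// Hypothetical shape of the Director's output; field names are assumptions.
interface ParsedLine {
  speaker: string;
  text: string;
}

interface ParsedScript {
  title: string;
  characters: { name: string }[];
  lines: ParsedLine[];
}

type ParseResult = { ok: true; script: ParsedScript } | { ok: false; error: string };

const fail = (error: string): ParseResult => ({ ok: false, error });

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Parse raw model output and verify it matches the expected structure before rendering.
function parseScript(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return fail("response is not valid JSON");
  }
  if (!isRecord(data)) return fail("top level must be a JSON object");

  const { title, characters, lines } = data;
  if (typeof title !== "string") return fail("title must be a string");
  if (!Array.isArray(characters) || !Array.isArray(lines)) {
    return fail("characters and lines must be arrays");
  }

  // Every character must have a string name.
  const cast: { name: string }[] = [];
  for (const c of characters) {
    const name: unknown = isRecord(c) ? c.name : undefined;
    if (typeof name !== "string") return fail("every character needs a string name");
    cast.push({ name });
  }
  const known = new Set(cast.map((c) => c.name));

  // Every line needs string fields and a speaker that was declared in the cast.
  const parsedLines: ParsedLine[] = [];
  for (let i = 0; i < lines.length; i++) {
    const line: unknown = lines[i];
    const speaker: unknown = isRecord(line) ? line.speaker : undefined;
    const text: unknown = isRecord(line) ? line.text : undefined;
    if (typeof speaker !== "string" || typeof text !== "string") {
      return fail(`line ${i} needs a string speaker and text`);
    }
    if (!known.has(speaker)) return fail(`line ${i} has undeclared speaker "${speaker}"`);
    parsedLines.push({ speaker, text });
  }

  return { ok: true, script: { title, characters: cast, lines: parsedLines } };
}
```

Checking speakers against the declared cast catches a common failure where the model invents a character mid-script, which would otherwise leave a line with no assigned voice.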