Inspiration

Storytelling is a powerful medium, but producing a high-quality radio drama or podcast traditionally requires a full team: scriptwriters, voice actors, sound engineers, and visual artists. We wanted to democratize this process. Gemini Radio Drama Studio was born from the idea that anyone with a creative concept should be able to bring their story to life. By leveraging the multimodal capabilities of Google Gemini, we can now automate the heavy lifting, from writing the script to performing the voices and even designing the cover art.

What it does

This application is a full-featured creation suite running entirely in the browser.

  1. Script Generation: Users provide a simple premise, and Gemini generates a structured script complete with characters, dialogue, and atmospheric descriptors.
  2. AI Voice Synthesis: The app integrates Gemini and ElevenLabs to convert dialogue into emotive speech, assigning a distinct voice to each character.
  3. Visuals & Publishing: The app generates scene images for visual context and helps package the final output for platforms like YouTube or RSS feeds.

How we built it

The project is built as a modern Single Page Application (SPA) using React 19 and Vite for a lightning-fast development experience.

  • Frontend Core: We used TypeScript for type safety across our complex data structures (Scripts, Lines, Characters).
  • AI Integration: The core logic relies on the @google/genai SDK to interact with Gemini models across multiple modalities (Text-to-Text, Text-to-Speech, and Image Generation). Evals and prompting strategies are stored in promptTemplates.ts to ensure consistent, high-quality outputs.
  • Audio Processing: Handling raw audio buffers in the browser was a challenge. We used lamejs for MP3 encoding and generated audio in asynchronous batches so that scripts with many lines could be processed efficiently.
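
The batching idea in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the LineJob shape, the chunk and runInBatches helpers, and the fakeSynthesize stand-in are all hypothetical names introduced here, and a real worker would call a text-to-speech API instead.

```typescript
// Hypothetical shape of one line waiting for audio; field names are assumptions.
interface LineJob {
  id: string;
  character: string;
  text: string;
}

// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Process all items one batch at a time. Items inside a batch run
// concurrently; the next batch starts only after the current one finishes.
// Results are returned in the same order as the input.
async function runInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of chunk(items, batchSize)) {
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}

// Stand-in for a real speech request (assumption: the real call is async
// and returns some handle to the generated clip).
async function fakeSynthesize(job: LineJob): Promise<string> {
  return `${job.character}:${job.text.length}`;
}
```

Calling runInBatches(jobs, 4, fakeSynthesize) would keep at most four requests in flight at a time, which is one simple way to stay under rate limits and avoid holding every pending response in memory at once.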

Challenges we ran into

One of the biggest hurdles was state management for large media files. Keeping track of dozens of generated audio clips and images without bloating the browser's memory required careful handling and efficient storage strategies using IndexedDB-like patterns. Another challenge was strictly enforcing the "Director's output": ensuring the AI adheres to specific JSON structures for the script so the application can parse and render it correctly.
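
One way to guard against malformed model output is to validate the JSON before the rest of the app touches it. The sketch below is only an illustration under stated assumptions: the Script, Character, and ScriptLine field names (title, name, voice, character, text) are invented for this example and are not taken from the project's real schema.

```typescript
// Assumed script schema; the real project may use different fields.
interface Character {
  name: string;
  voice: string;
}
interface ScriptLine {
  character: string;
  text: string;
}
interface Script {
  title: string;
  characters: Character[];
  lines: ScriptLine[];
}

// True for plain (non-null, non-array) objects.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Parse raw model output and throw a descriptive error if it does not
// match the assumed schema, so the UI never renders a half-valid script.
function parseScript(raw: string): Script {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("Model output is not valid JSON");
  }
  if (!isRecord(data)) throw new Error("Script must be a JSON object");

  const { title, characters: rawCharacters, lines: rawLines } = data;
  if (typeof title !== "string") throw new Error("Missing string field: title");
  if (!Array.isArray(rawCharacters)) throw new Error("characters must be an array");
  if (!Array.isArray(rawLines)) throw new Error("lines must be an array");

  const characters: Character[] = rawCharacters.map((c: unknown, i: number) => {
    if (!isRecord(c)) throw new Error(`Invalid character at index ${i}`);
    const { name, voice } = c;
    if (typeof name !== "string" || typeof voice !== "string") {
      throw new Error(`Invalid character at index ${i}`);
    }
    return { name, voice };
  });

  // Every line must be spoken by a declared character.
  const known = new Set(characters.map((c) => c.name));
  const lines: ScriptLine[] = rawLines.map((l: unknown, i: number) => {
    if (!isRecord(l)) throw new Error(`Invalid line at index ${i}`);
    const { character, text } = l;
    if (typeof character !== "string" || typeof text !== "string") {
      throw new Error(`Invalid line at index ${i}`);
    }
    if (!known.has(character)) {
      throw new Error(`Line ${i} uses unknown character "${character}"`);
    }
    return { character, text };
  });

  return { title, characters, lines };
}
```

A failed validation can then trigger a retry of the generation request rather than surfacing a broken script to the user. A schema library could replace the hand-written checks; they are spelled out here only to keep the example dependency-free.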

Accomplishments that we're proud of

What we learned

What's next for Gemini Radio Drama Studio

Built With
