Inspiration

The magic of storytelling has always been universal, but what if we could make it truly personal and voice-driven?

The Personal Story Behind Wonder Tales

I used to tell my little son (he's six years old now) bedtime stories to help him fall asleep. The stories he loved most were the ones I made up on the spot—sometimes silly, sometimes adventurous, but always personal. I noticed that when a story felt "custom-made" for him, he listened more closely and enjoyed it much more. Often, all I needed to start was a name, an age, and a theme—and I would build a story around those simple inputs. That's how the idea for Wonder Tales was born.

My wife and I have always loved bedtime stories, and I especially grew up loving A Thousand and One Nights. At some point, I wondered: what if Wonder Tales could generate stories not only for kids, but also for older audiences? That question turned Wonder Tales into an all-ages storytelling app, one that creates age-appropriate stories for different life stages.

The ElevenLabs Challenge Vision

We were inspired by the ElevenLabs Challenge to push the boundaries of conversational AI beyond simple chatbots into something truly magical. Our vision: Create a platform where you simply talk about what you love, and a personalized story with matching visuals appears before your eyes. From a 5-year-old who loves butterflies to a 70-year-old seeking wisdom, everyone deserves stories that speak directly to them through natural conversation.

What it does

Wonder Tales is an all-ages AI storytelling platform that creates personalized stories through natural voice conversation:

🎙️ Voice Collection: ElevenLabs Conversational Agent collects your name, age, and favorite themes through natural conversation

📖 Story Generation: Gemini AI creates age-appropriate stories tailored to your preferences

🎨 Dynamic Visuals: Vertex AI generates background images that perfectly match your story content (not generic themes!)

🎵 Cinematic Experience: Age-based TTS narration with custom ElevenLabs voices and background music

Custom Voice Design: Created three specialized narrator voices:

  • Wonder-Tales-Child: Soft, gentle storyteller with higher pitch for ages 3-12
  • Wonder-Tales-Young: Warm, mystical narrator with timeless comfort for ages 13-59
  • Wonder-Tales-Elder: Wise grandparent voice with nostalgic warmth for ages 60+
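
The age-to-voice mapping above can be sketched as a small lookup. This is only an illustration: the function name `select_narrator_voice` is hypothetical, and the 3-99 bounds are taken from the supported age range stated in this write-up rather than from the actual backend code.

```python
def select_narrator_voice(age: int) -> str:
    """Return the narrator voice name for a listener's age.

    Hypothetical helper: the age brackets mirror the three voices
    described above, not the project's real implementation.
    """
    if not 3 <= age <= 99:
        raise ValueError("age must be between 3 and 99")
    if age <= 12:
        return "Wonder-Tales-Child"
    if age <= 59:
        return "Wonder-Tales-Young"
    return "Wonder-Tales-Elder"
```

Keeping the brackets in one function means the story prompt and the TTS call can both ask the same question and get the same answer.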

Age Ranges Supported: 3-99 years with content adaptation for Children, Teens, Adults, and Elders

How we built it

ElevenLabs Agent Mastery

We built a sophisticated two-node conversational agent:

  • Data Collection Node: Gently collects name, age, themes with graceful fallbacks
  • Webhook Integration: Sends data to Google Cloud Run backend
  • Client Tools: Seamless story ID transfer to frontend
  • Widget Customization: Production-ready error handling
  • Custom Voice Creation: Designed three age-appropriate narrator voices

Technical Implementation: With Gemini, Wonder Tales generates stories based on the user's name, age, and theme, then narrates them using voices designed specifically for different age groups. I had already done voice design work with ElevenLabs before, so it was a natural fit to bring that experience into Wonder Tales: I created multiple voices tailored to different ages and integrated them directly into the app.

The main challenge was collecting the name, age, and theme in a truly voice-driven way. I solved this by building an agent with ElevenLabs. On the frontend, the agent uses a widget to gather the user's inputs, then sends them via a webhook tool to Cloud Run. Cloud Run generates the story using prompts designed for different age groups. Once the story is ready, a client tool inside the agent notifies the frontend and returns a story ID, which takes the user to the story screen.
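
The webhook step described above might look roughly like the following on the backend. It is a minimal sketch using only the standard library: the payload field names, `StoryRequest`, `handle_story_webhook`, and the in-memory `STORIES` store are all assumptions made for illustration, and the real service runs on FastAPI and calls Gemini where the placeholder text is created.

```python
import uuid
from dataclasses import dataclass


@dataclass
class StoryRequest:
    name: str
    age: int
    theme: str


# In-memory stand-in for wherever the backend keeps generated stories.
STORIES: dict[str, dict] = {}


def parse_webhook_payload(payload: dict) -> StoryRequest:
    """Validate the fields the agent collected by voice."""
    name = str(payload.get("name", "")).strip()
    theme = str(payload.get("theme", "")).strip()
    try:
        age = int(payload.get("age"))
    except (TypeError, ValueError):
        raise ValueError("age must be a whole number") from None
    if not name or not theme:
        raise ValueError("name and theme are required")
    if not 3 <= age <= 99:
        raise ValueError("age must be between 3 and 99")
    return StoryRequest(name, age, theme)


def handle_story_webhook(payload: dict) -> dict:
    """Store a story and return the ID the client tool passes to the frontend."""
    request = parse_webhook_payload(payload)
    story_id = uuid.uuid4().hex
    # Placeholder text: a real backend would call the story model here.
    text = f"A {request.theme} story for {request.name}."
    STORIES[story_id] = {"request": request, "text": text}
    return {"story_id": story_id}
```

Returning only the story ID keeps the agent's response small; the frontend can then fetch the full story by ID when it opens the story screen.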

Multi-AI Pipeline

Voice Input → ElevenLabs Agent → Backend → Gemini Story → Vertex AI Image → TTS Narration

Key Innovation: Instead of generic theme-based images, we analyze the actual story content with Gemini to generate closely matching backgrounds with Vertex AI.

On the story screen, the frontend uses our age-tailored custom ElevenLabs voices to read the story aloud. At the end, the app reads a short moral statement. We also display a story-specific background image generated with Vertex AI. Gemini creates the image prompt using the user's age, theme, and the story content—so each story gets a unique visual that matches its characters, objects, atmosphere, tone, and theme.
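
One way to picture the image-prompt step is as a small function that assembles the instruction sent to the text model. The sketch below is an assumption about shape only: `build_image_prompt_instruction`, its wording, and the `max_chars` cut-off are hypothetical, and no model is actually called.

```python
def build_image_prompt_instruction(
    age: int, theme: str, story_text: str, max_chars: int = 2000
) -> str:
    """Build the text that asks a language model for an image prompt.

    Hypothetical helper: the wording is illustrative, not the
    project's real prompt.
    """
    # Trim long stories so the instruction stays a manageable size.
    excerpt = story_text.strip()[:max_chars]
    return (
        "Write one prompt for a background illustration.\n"
        f"Listener age: {age}\n"
        f"Theme: {theme}\n"
        "Base the scene on the characters, objects, atmosphere, and tone "
        "of this story:\n"
        f"{excerpt}"
    )
```

Because the story text itself is part of the instruction, two stories with the same theme still lead to different image prompts.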

The result is a unique experience where story, voice, and visuals come together in a cohesive, creative way.

Production Architecture

  • Frontend: Firebase Hosting with vanilla JavaScript
  • Backend: Google Cloud Run with FastAPI
  • AI Services: ElevenLabs + Gemini + Vertex AI
  • Storage: Google Cloud Storage for assets

Challenges we ran into

ElevenLabs Agent Complexity

Learning the ElevenLabs workflow system, webhook configuration, and client tools integration took considerable effort. We spent significant time tuning conversation flows and error handling.

Agent-1 Timeout Crisis & Breakthrough Solution

The Problem: Our initial synchronous architecture caused Agent-1 timeouts (16s limit) because story + image generation took 40-50 seconds.

The Innovation: We switched to an asynchronous architecture that returns the story first and generates the image in the background:

  • Fast Story Response: 10.9 seconds (within Agent-1 timeout limit)
  • Background Image Generation: 30 seconds during loading screen
  • Production Validation: Real stories like "Henry and the Whispering Willow's Gentle Secret" working perfectly

This async pattern keeps dynamic image generation while keeping the story response within the Agent-1 timeout.
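
The pattern can be sketched with `asyncio`: answer as soon as the story exists, and let the slower image job finish in the background. Everything here is illustrative. The function names, the `IMAGE_STATUS` dictionary, and the short sleeps that stand in for model calls are assumptions, not the deployed code.

```python
import asyncio

# Image status per story ID; the loading screen could poll this.
IMAGE_STATUS: dict[str, str] = {}
# Hold references so background tasks are not garbage-collected early.
BACKGROUND_TASKS: set[asyncio.Task] = set()


async def generate_story(name: str, theme: str) -> str:
    await asyncio.sleep(0.01)  # stands in for the story model call
    return f"{name} and the {theme} adventure"


async def generate_image(story_id: str, story: str) -> None:
    await asyncio.sleep(0.05)  # stands in for the slower image call
    IMAGE_STATUS[story_id] = "ready"


async def handle_request(story_id: str, name: str, theme: str) -> dict:
    """Return the story right away and start the image job in the background."""
    story = await generate_story(name, theme)
    IMAGE_STATUS[story_id] = "pending"
    task = asyncio.create_task(generate_image(story_id, story))
    BACKGROUND_TASKS.add(task)
    task.add_done_callback(BACKGROUND_TASKS.discard)
    return {"story_id": story_id, "story": story, "image": IMAGE_STATUS[story_id]}


async def demo() -> tuple[str, str]:
    response = await handle_request("story-1", "Henry", "willow")
    status_at_response = response["image"]
    await asyncio.gather(*BACKGROUND_TASKS)  # wait for the image job
    return status_at_response, IMAGE_STATUS["story-1"]
```

Calling `asyncio.run(demo())` shows the two phases: the response goes out while the image is still pending, and the status flips to ready once the background task completes.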

Perfect Story-Image Alignment

Generic theme-based images created poor matches. Our breakthrough: analyzing actual story content with Gemini before generating images with Vertex AI.

Before: Alice + "animals" theme → Generic garden image

After: "Alice and Sparkle-Wing Butterfly" story → Enchanted butterfly garden with sparkles

Vertex AI Safety Filter Innovation

Vertex AI safety filters were blocking images that contained human characters. We worked around this by adding content constraints to the Gemini prompts so that the image descriptions focus on environments and animal characters, while maintaining story quality.
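
A constraint of this kind could be attached to the prompt instruction with a helper like the one below. The rule text and the name `with_scene_only_rules` are hypothetical; the sketch only shows the idea of steering the image description toward environments and animals.

```python
# Hypothetical wording; the project's real constraint text is not shown here.
SCENE_ONLY_RULES = (
    "Describe only the environment and any animal characters. "
    "Do not describe people, faces, or human figures."
)


def with_scene_only_rules(instruction: str) -> str:
    """Append the scene-only rules unless they are already present."""
    if SCENE_ONLY_RULES in instruction:
        return instruction
    return f"{instruction.rstrip()}\n\n{SCENE_ONLY_RULES}"
```

Making the helper idempotent means it can be applied at several points in the pipeline without duplicating the rules.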

Multi-AI Orchestration

Coordinating ElevenLabs, Gemini, and Vertex AI required sophisticated error handling, fallback systems, and performance optimization.
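
A fallback around a single service call might be sketched as follows. This is a generic retry-then-fallback helper, not the project's actual error handling; `call_with_fallback` and its parameters are assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    retries: int = 2,
    delay_seconds: float = 0.0,
) -> T:
    """Try `primary` up to `retries` times, then use `fallback`."""
    for attempt in range(retries):
        try:
            return primary()
        except Exception:
            # Pause between attempts, but not after the final one.
            if attempt < retries - 1:
                time.sleep(delay_seconds)
    return fallback()
```

For example, an image step could pass the generation call as `primary` and a function that returns a default background as `fallback`, so the story screen still loads when image generation fails.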

Accomplishments that we're proud of

🏆 ElevenLabs Technical Mastery

  • Complete conversational agent with webhook and client tools
  • Production widget integration with custom event handling
  • Custom Voice Design: Created three specialized age-appropriate narrator voices
  • Advanced voice prompt engineering for optimal storytelling experience
  • Agent-1 Timeout Resolution: Breakthrough async architecture (10.9s response vs 40-50s original)

🤖 Revolutionary AI Innovation

  • Story-content-based image generation (95%+ alignment success)
  • Age-adaptive system (3-99 years with appropriate content)
  • Sequential AI pipeline for optimal quality
  • Vertex AI Safety Breakthrough: Solved human character filtering while maintaining story quality

🚀 Production Excellence

  • Production Deployment: Fully deployed system ready for demonstration
  • Complete cinematic experience with background music
  • Professional glass morphism UI design
  • Performance Validation: Real stories like "Henry and the Whispering Willow's Gentle Secret" with perfect image alignment

What we learned

ElevenLabs Expertise

  • Conversational AI workflow architecture and system prompt engineering
  • Custom Voice Creation: Designed age-appropriate narrator voices with detailed prompts
  • Webhook systems and client tools for seamless integration
  • Widget customization and production error handling
  • Voice prompt engineering for optimal storytelling experience
  • Testing tools for agent optimization

AI Orchestration

  • Multi-AI coordination requires careful sequencing and fallback strategies
  • Content-aware generation produces far better results than generic prompts
  • Age-adaptive systems need deep understanding of appropriate content

Production Deployment

  • Scalable AI applications require comprehensive monitoring
  • Security best practices for API key management
  • Performance optimization through asset preloading

What's next for Wonder Tales

Enhanced ElevenLabs Integration

  • Multi-language agents for global storytelling
  • Voice cloning for personalized narrator experiences
  • Interactive choice-driven narratives with voice commands

Advanced AI Features

  • Persistent characters across multiple stories
  • Educational curriculum integration with assessment
  • Collaborative family storytelling experiences

Platform Evolution

  • Personal story libraries with voice-activated selection
  • Community features with voice-based recommendations
  • Analytics dashboard for storytelling preferences

Technologies Used

Core AI Stack:

  • ElevenLabs Conversational AI: Agent development, webhook integration, client tools
  • ElevenLabs TTS API: Custom voice narration with age-appropriate selection
  • Gemini API: Story generation, content analysis for image prompts
  • Vertex AI Imagen API: Dynamic background generation based on story content

Infrastructure:

  • Google Cloud Run: Scalable backend deployment
  • Firebase Hosting: Global frontend delivery
  • Google Cloud Storage: Asset storage and CDN
  • FastAPI: High-performance API framework

Development:

  • Vanilla JavaScript: Frontend implementation
  • Python 3.11+: Backend development
  • Docker: Containerized deployment

Technical Documentation

For detailed technical implementation and architecture:


🚀 Production Demo: Available for judges only (API cost management)

🎙️ ElevenLabs Agent: Sophisticated conversational AI with webhook integration

📱 Complete System: Frontend + Backend + Multi-AI pipeline fully deployed

Wonder Tales represents the future of voice-first AI storytelling - where ElevenLabs Conversational AI meets Google Cloud AI to create truly magical, personalized experiences for all ages.
