Inspiration

Screen time is one of the biggest concerns for parents of young children. Most apps aimed at kids are passive — children just watch, tap, and swipe. We asked: what if screen time could be creative, educational, and deeply personal?

We wanted to build something where a child's imagination drives the experience — where their drawings and words become the raw material for a unique story that only they could create. That's how LiveTales was born.

What it does

LiveTales is an interactive AI storytelling app for children ages 3–13. Kids draw on a canvas and talk in real-time with Tali — an AI friend who sounds like an excited 7-year-old. Tali watches the child draw, reacts with genuine enthusiasm ("Ooh, is that a dragon? I love dragons!"), asks imaginative questions, and helps co-create a story.

When the child is ready, LiveTales transforms their drawings and conversation into a fully illustrated, narrated, animated 6-page storybook:

  • Draw on an interactive canvas with colors and brushes
  • Talk with Tali in real-time — she responds instantly with voice
  • Generate a 6-page story based on the drawing + conversation
  • See watercolor-style AI illustrations for each page
  • Listen to the full story narrated expressively
  • Watch gentle animated video clips of each illustration

Every story is unique because it comes from the child's own imagination.

How we built it

Tali is a Google ADK Agent. We used the Google Agent Development Kit (ADK) to define Tali as an Agent with a personality-driven system prompt and run her via Runner.run_live() with LiveRequestQueue for real-time bidirectional audio streaming. ADK handles session management, streaming lifecycle, voice configuration, turn-taking, and interruption handling — letting us focus on the experience rather than low-level API plumbing.
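For context, the wiring looks roughly like this. This is a hedged sketch, not our exact code: the model name, the `RunConfig` fields, and the `run_live()` signature should all be checked against the current ADK docs, and it needs Google Cloud credentials (plus a session created via the session service) to actually run:

```python
from google.adk.agents import Agent, LiveRequestQueue
from google.adk.agents.run_config import RunConfig
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService

# Tali's personality lives entirely in the system prompt.
tali = Agent(
    name="tali",
    model="gemini-2.0-flash-live-001",  # Live API model id is an assumption
    instruction=(
        "You are Tali, an excited 7-year-old friend. React to the child's "
        "drawings with enthusiasm, ask imaginative questions, and never be "
        "scary or negative."
    ),
)

runner = Runner(agent=tali, app_name="livetales",
                session_service=InMemorySessionService())

async def voice_session(user_id: str, session_id: str):
    queue = LiveRequestQueue()  # mic audio chunks get pushed into this
    config = RunConfig(response_modalities=["AUDIO"])
    # run_live() is an async generator of events (audio, transcriptions,
    # turn boundaries, interruptions); we forward audio back over the WebSocket.
    async for event in runner.run_live(
        user_id=user_id, session_id=session_id,
        live_request_queue=queue, run_config=config,
    ):
        ...  # stream event audio to the browser
```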

The architecture has three pipelines:

  1. Voice Pipeline (real-time): Browser microphone → WebSocket → ADK Runner + LiveRequestQueue → Gemini 2.0 Flash Live → audio response back to browser. Three concurrent async tasks in a TaskGroup keep this responsive at all times.

  2. Story Pipeline (async): Triggered on demand. The child's canvas snapshot (JPEG) + tagged conversation context (Tali: ... Child: ...) is sent to Gemini 2.5 Flash, which generates structured story JSON. Each page's illustration prompt is then sent to Imagen 3.0 to generate watercolor-style children's storybook art.
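The structured-output step boils down to validating that JSON into typed pages. A minimal sketch — the schema and field names here are our illustration of the shape, not the exact one we ship:

```python
import json
from dataclasses import dataclass

@dataclass
class StoryPage:
    text: str                 # narration text for this page
    illustration_prompt: str  # later sent to Imagen

def parse_story(raw: str, expected_pages: int = 6) -> list[StoryPage]:
    """Validate the model's JSON, assuming {"pages": [{"text": ..., ...}]}."""
    data = json.loads(raw)
    pages = [StoryPage(p["text"], p["illustration_prompt"]) for p in data["pages"]]
    if len(pages) != expected_pages:
        raise ValueError(f"expected {expected_pages} pages, got {len(pages)}")
    return pages

# Example of the shape we expect back from the model:
sample = json.dumps({"pages": [
    {"text": f"Page {i} of the dragon's adventure.",
     "illustration_prompt": "a friendly dragon, watercolor"}
    for i in range(1, 7)
]})
pages = parse_story(sample)
```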

  3. Video Pipeline (async, parallel): Triggered after the story is complete. Gemini 2.5 Flash TTS generates a single expressive narration for the full story. Simultaneously, Veo 2.0 animates each illustration into a 5-second video clip via Google Cloud Storage.
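The parallelism in this step is plain `asyncio.gather`: one narration coroutine plus one animation coroutine per page, all in flight at once. A stdlib-only sketch with stubbed model calls — `tts` and `animate` stand in for the real Gemini TTS and Veo requests:

```python
import asyncio

async def render_media(page_texts, tts, animate):
    # One narration for the whole story, plus one clip per page,
    # all running concurrently. `tts` / `animate` are injected stubs here.
    narration, clips = await asyncio.gather(
        tts(" ".join(page_texts)),
        asyncio.gather(*(animate(t) for t in page_texts)),
    )
    return narration, list(clips)

async def demo():
    async def tts(text):
        return f"narration({len(text.split())} words)"
    async def animate(text):
        return f"clip[{text}]"
    return await render_media(["p1", "p2", "p3"], tts, animate)

narration, clips = asyncio.run(demo())
```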

Frontend: React 18 + TypeScript with Tailwind CSS, shadcn/ui, Zustand for state management, and Framer Motion for animations. The useVoiceSession hook manages the entire WebSocket + Web Audio API lifecycle.

Backend: Python 3.12 + FastAPI, deployed on Google Cloud Run with Vertex AI.

Challenges we faced

  • Keeping Tali responsive during heavy generation. Story generation, illustration, and video are slow calls. We had to strictly decouple them from the live voice pipeline — ADK handles voice, a separate Vertex AI client handles everything else as async background tasks.

  • WebSocket lifecycle with long-running video generation. Veo can take 1–3 minutes per page. We added keepalive progress messages and task tracking to prevent WebSocket timeouts, and wait for pending video tasks before closing connections.
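The keepalive pattern is small enough to sketch (stdlib-only; `ws_send` stands in for the real WebSocket send, and the demo intervals are shrunk far below the real-world ones):

```python
import asyncio

async def with_keepalive(ws_send, job, interval=20.0):
    """Send progress pings while a slow job (e.g. a minutes-long Veo call) runs.

    `asyncio.shield` protects the job from the timeout cancellation, so
    each timeout just means "not done yet — send another ping".
    """
    task = asyncio.ensure_future(job)
    while not task.done():
        try:
            await asyncio.wait_for(asyncio.shield(task), timeout=interval)
        except asyncio.TimeoutError:
            await ws_send({"type": "progress", "status": "still working"})
    return task.result()

async def demo():
    sent = []
    async def send(msg):
        sent.append(msg)
    async def veo_job():
        await asyncio.sleep(0.05)  # pretend this is the 1-3 minute render
        return "video.mp4"
    result = await with_keepalive(send, veo_job(), interval=0.01)
    return result, sent

result, sent = asyncio.run(demo())
```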

  • Child-safe content at every layer. No single safety mechanism is enough. We combined Imagen's block_low_and_above safety filter, carefully scoped prompts ("watercolor style, warm colors, safe for children"), and Tali's personality rules ("never be scary or negative") for defense in depth.
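On the Imagen side, the safety layer is essentially one parameter plus prompt scoping. A sketch assuming the Vertex AI Python SDK's `ImageGenerationModel` (check current docs for exact parameter names; `prompt_text` is illustrative and this needs credentials to run):

```python
from vertexai.preview.vision_models import ImageGenerationModel

model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-001")

# The scoped style keywords ride along on every illustration prompt.
prompt_text = "a friendly dragon in a sunny meadow"
images = model.generate_images(
    prompt=prompt_text + ", watercolor style, warm colors, safe for children",
    number_of_images=1,
    safety_filter_level="block_low_and_above",  # strictest blocking tier
)
```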

  • Making conversation context useful for story generation. We tag transcriptions as Tali: and Child: in a rolling buffer, so when Gemini 2.5 Flash generates the story, it knows what the child actually said vs. what Tali said — making stories feel truly personal.
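The rolling buffer itself is a few lines of `collections.deque` — a sketch, with the buffer size a guess rather than our tuned value:

```python
from collections import deque

class ConversationBuffer:
    """Rolling, speaker-tagged transcript of the live session."""

    def __init__(self, max_turns: int = 40):
        self.turns = deque(maxlen=max_turns)  # oldest turns fall off automatically

    def add(self, speaker: str, text: str):
        self.turns.append(f"{speaker}: {text}")

    def as_context(self) -> str:
        # Flattened form that gets prepended to the story-generation prompt.
        return "\n".join(self.turns)

buf = ConversationBuffer(max_turns=2)
buf.add("Tali", "Ooh, what are you drawing?")
buf.add("Child", "A dragon!")
buf.add("Tali", "I love dragons! Does it breathe fire?")
context = buf.as_context()
```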

What we learned

  1. Google ADK makes real-time voice agents accessible. The Agent + Runner + LiveRequestQueue pattern gave us production-grade bidi-streaming with minimal boilerplate.

  2. Latency is everything when your user is 5 years old. Gemini 2.0 Flash Live's sub-second response time is what makes the experience feel magical. Progressive delivery (text first, then illustration) keeps children engaged while heavier models work in the background.

  3. Multimodal orchestration = decoupled pipelines. Live streaming and traditional API calls require fundamentally different patterns. Keeping them separate was the key architectural decision.

What's next for LiveTales

  • Multi-language support (Tali already speaks English, but kids everywhere deserve a creative AI friend)
  • Collaborative storytelling (multiple children drawing and talking together)
  • Story sharing and a parent dashboard
  • Expanding to more age groups with adaptive difficulty
