Inspiration

Every parent wants to tell their child a story where they are the hero. We were inspired by the idea of turning a simple photo of a child into a fully animated, narrated storybook — complete with unique art styles, companion characters, and a video they can watch again and again. We wanted to make storytelling magical, personal, and accessible to every family, regardless of language, with support for both English and Chinese.

What it does

MyStoryBook turns a child's photo into a personalized, AI-generated animated storybook video.

  1. Character Creation — Upload a photo and the AI generates the child as a character in 5 distinct art styles (Ghibli, Watercolor, 3D Cute, Claymation, Colored Pencil).
  2. Companion Selection — AI suggests companion characters (friends, animals, fantasy creatures) that fit the protagonist's personality.
  3. Story Generation — Choose a theme and pick from 3 AI-generated synopsis styles (Sensory Wonder, Heartfelt Bond, or Brave Adventure). The AI writes a full illustrated chapter with cover art and NPC portraits.
  4. Video Production — The story is automatically converted into a narrated video with per-character voice acting, scene illustrations, camera directions, and burned-in subtitles.
  5. Story Continuation — Each chapter ends with choices, letting kids pick what happens next for an ongoing, branching adventure.
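The branching step above hinges on one invariant: the choice a reader picks must be one the previous chapter actually offered. A minimal sketch of that check, with hypothetical type and function names (the real app's data model may differ):

```typescript
// Hypothetical shape for a chapter and its end-of-chapter choices.
interface Chapter {
  id: string;
  text: string;
  choices: string[]; // options offered at the end of the chapter
}

// Before generating the next chapter, confirm the picked choice was
// actually one of the previous chapter's ending options.
function isValidContinuation(previous: Chapter, pickedChoice: string): boolean {
  return previous.choices.includes(pickedChoice);
}

const chapter1: Chapter = {
  id: "ch1",
  text: "…the forest path splits in two.",
  choices: ["Follow the fireflies", "Climb the old oak"],
};

console.log(isValidContinuation(chapter1, "Follow the fireflies")); // true
console.log(isValidContinuation(chapter1, "Fly to the moon")); // false
```

Rejecting unknown choices up front keeps the story graph consistent even if a stale or tampered request comes in.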

How we built it

  • Next.js 15 (App Router) + React 19 + TypeScript for the full-stack framework
  • Google Gemini API as the core AI engine — using 4 different Gemini models for text generation, image generation, text-to-speech, and speech-to-text
  • Interleaved generation — a single Gemini call produces both story text and matching character portraits/scene illustrations together, ensuring visual-narrative coherence
  • FFmpeg for the video pipeline — composing scene clips from images + multi-line audio, concatenating them, and burning in subtitles (with CJK support)
  • Prisma + SQLite for data persistence, with optional Google Cloud Storage for production file storage
  • Tailwind CSS for the UI with a warm, child-friendly design system
  • Bilingual i18n support (English & Simplified Chinese) throughout prompts, UI, and voice selection
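The interleaved-generation idea above can be sketched as a pure pairing step. This assumes the Gemini response shape where `candidates[0].content.parts` interleaves `{ text }` and `{ inlineData }` entries; the `Scene` type and `pairScenes` helper are illustrative, not the project's actual code:

```typescript
// One entry from a mixed text+image Gemini response.
interface Part {
  text?: string;
  inlineData?: { mimeType: string; data: string }; // base64 image bytes
}

interface Scene {
  text: string;
  images: string[]; // base64 payloads of the images that followed the text
}

// Walk the parts in order: each text part opens a new scene, and each
// image part attaches to the most recent scene.
function pairScenes(parts: Part[]): Scene[] {
  const scenes: Scene[] = [];
  for (const part of parts) {
    if (part.text !== undefined) {
      scenes.push({ text: part.text, images: [] });
    } else if (part.inlineData && scenes.length > 0) {
      // Fallback handling: silently drop images arriving before any text.
      scenes[scenes.length - 1].images.push(part.inlineData.data);
    }
  }
  return scenes;
}

const parts: Part[] = [
  { text: "Scene 1: Mia meets the fox." },
  { inlineData: { mimeType: "image/png", data: "…base64…" } },
  { text: "Scene 2: They cross the river." },
  { inlineData: { mimeType: "image/png", data: "…base64…" } },
];
const scenes = pairScenes(parts); // two scenes, each with one matching image
```

Because text and images come back in one ordered stream, a sequential walk like this keeps each illustration bound to the narration it was generated alongside.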

Challenges we ran into

  • Interleaved text+image parsing — Gemini returns mixed text and image parts in a single response. Reliably mapping which image belongs to which scene or character required careful sequential parsing with fallback handling for partial failures.
  • FFmpeg resource tuning — Video encoding on cloud hardware with limited CPU/RAM required auto-detecting cores and tuning thread counts, presets, and quality settings to avoid OOM kills while maintaining reasonable output quality.
  • Story continuation coherence — Maintaining narrative consistency across branching chapters meant injecting prior story context and validating that user choices match the previous chapter's ending options.
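The FFmpeg tuning challenge above boils down to mapping detected hardware to encoder settings. A minimal sketch of such a heuristic — the thresholds and values here are illustrative assumptions, not the settings MyStoryBook actually ships:

```typescript
import * as os from "os";

// Pick an FFmpeg thread count, x264 preset, and CRF from the machine's
// core count and free memory. All cutoffs below are hypothetical.
interface FfmpegTuning {
  threads: number;
  preset: "ultrafast" | "veryfast" | "medium";
  crf: number; // higher CRF = smaller files, lower quality
}

function tuneFfmpeg(cores: number, freeMemGB: number): FfmpegTuning {
  // Leave one core for the web server; never go below 1.
  const threads = Math.max(1, cores - 1);
  if (freeMemGB < 1) {
    // Tight memory: fastest preset and lower quality to avoid OOM kills.
    return { threads, preset: "ultrafast", crf: 30 };
  }
  if (freeMemGB < 4) {
    return { threads, preset: "veryfast", crf: 26 };
  }
  return { threads, preset: "medium", crf: 23 };
}

// Detect the current machine and build the matching CLI flags.
const tuning = tuneFfmpeg(os.cpus().length, os.freemem() / 1024 ** 3);
const flags = [
  "-threads", String(tuning.threads),
  "-preset", tuning.preset,
  "-crf", String(tuning.crf),
];
```

Centralizing the heuristic in one pure function makes it easy to unit-test the tuning logic without actually running an encode.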

Accomplishments that we're proud of

  • End-to-end AI pipeline — From a single photo to a fully narrated, illustrated, subtitled video, all automated.
  • 5 art style variants in parallel — Characters are rendered in 5 distinct styles simultaneously using reference images, giving users real creative choice.
  • Interleaved generation — Getting text and matching images from a single API call significantly improved coherence and reduced latency compared to generating them separately.
  • Bilingual from day one — Full Chinese and English support across the entire pipeline: prompts, UI, voice selection, and subtitle rendering with CJK font support.
  • Choice-driven branching stories — Stories end with meaningful choices that drive the next chapter, creating a replayable experience.

What we learned

  • Gemini's multimodal capabilities are powerful but require careful prompt engineering and robust parsing — especially when mixing text and image generation in a single call.
  • Video processing (FFmpeg) in a serverless/cloud environment demands careful resource management; you can't just throw default settings at it.
  • Building for two languages simultaneously forces better abstraction and makes the codebase more maintainable overall.

What's next for MyStoryBook

  • More art styles and voice options — Expanding the creative palette with additional illustration styles and character voices.
  • Collaborative storytelling — Let multiple family members contribute characters and co-create stories together.
  • Longer-form narratives — Multi-chapter story arcs with persistent world-building and character development.
