Inspiration

As parents, we noticed something: our children loved stories, but they quickly lost interest in generic ones. They wanted to be the hero — not just hear about one. We tried personalized storybook services, but they were expensive, took weeks to deliver, and once printed, the story never changed.

When we saw Gemini's interleaved text + image output capability, it clicked — what if a child could upload their photo and instantly be inside a fully illustrated adventure where they choose what happens next? Not a template with their name swapped in, but a truly unique story with AI-generated illustrations featuring their likeness, narrated aloud so even pre-readers can enjoy it.

What it does

Nimora is an interactive AI storybook where every child is the hero. Here's the experience:

  1. Upload a photo: The child's photo is analyzed (never saved) to create a consistent character description
  2. Choose your adventure: Pick from 8 themes (Space Mission, Pirate Island, Magic Kingdom...) and 4 art styles (Watercolor, 3D, Claymation...)
  3. Read & listen: Each page features a personalized illustration with the child as the hero, plus AI voice narration
  4. Choose what happens next: After every scene, pick from 2-3 choices that shape the story in a different direction
  5. Save your story: Download the completed adventure as a beautifully formatted PDF storybook to keep or print

Every playthrough is unique. Same child, same theme — completely different story based on their choices.

How we built it

Core architecture — three Gemini models working together:

  • gemini-2.5-flash-image — The heart of Nimora. Uses interleaved TEXT + IMAGE response modalities to generate both story narration and illustration in a single API call. The child's photo and character description are passed as context so the model draws the child into every scene.
  • gemini-2.5-flash — Generates story choices, character descriptions, and creative story titles.
  • gemini-2.5-flash-preview-tts — Converts each story page into child-friendly audio narration in the background.
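To make the single-call idea concrete, here is a minimal sketch of how one interleaved response could be separated into narration text and an illustration. `Part` and `InlineData` are stand-in types written for this example, loosely modeled on a response made of text parts and inline image parts; they are not the SDK's own classes, and `split_page` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InlineData:
    """Stand-in for a binary payload attached to a response part."""
    mime_type: str
    data: bytes


@dataclass
class Part:
    """Stand-in for one response part: either text or inline image data."""
    text: Optional[str] = None
    inline_data: Optional[InlineData] = None


def split_page(parts: list[Part]) -> tuple[str, Optional[bytes]]:
    """Collect narration text and the first image from interleaved parts."""
    narration: list[str] = []
    image: Optional[bytes] = None
    for part in parts:
        if part.text:
            narration.append(part.text.strip())
        elif part.inline_data and part.inline_data.mime_type.startswith("image/"):
            if image is None:  # keep only the first illustration
                image = part.inline_data.data
    return " ".join(narration), image


# Example: text, image, text arriving in one response.
parts = [
    Part(text="Mia stepped onto the rocket. "),
    Part(inline_data=InlineData("image/png", b"\x89PNG...")),
    Part(text="The engines began to hum."),
]
text, image = split_page(parts)
```

Returning `None` for the image instead of raising keeps the "text but no image" case visible to the caller, which is the case the retry logic has to handle.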

Backend (Python/FastAPI):

  • Story Orchestrator manages the generation pipeline — character analysis → scene generation → choices → TTS, all coordinated per session
  • Built with the Google GenAI SDK and Agent Development Kit (ADK) for agent-style tool orchestration
  • Retry logic handles intermittent model failures gracefully
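The pipeline above (character analysis → scene generation → choices → TTS) can be sketched roughly as follows. This is an illustrative outline, not the project's actual code: the four injected callables (`describe`, `generate_scene`, `generate_choices`, `narrate`) and the `Session` and `Page` types are assumptions standing in for the real model calls, and the fake functions exist only so the sketch runs without network access.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Page:
    text: str
    image: bytes
    choices: list[str]
    audio: Optional[asyncio.Task] = None  # narration rendering in the background


@dataclass
class Session:
    character: str  # text description only; the photo itself is not kept
    theme: str
    pages: list[Page] = field(default_factory=list)


class StoryOrchestrator:
    """Coordinates one story session; the model calls are injected (hypothetical)."""

    def __init__(self, describe, generate_scene, generate_choices, narrate):
        self.describe = describe
        self.generate_scene = generate_scene
        self.generate_choices = generate_choices
        self.narrate = narrate

    async def start(self, photo: bytes, theme: str) -> Session:
        # Character analysis happens once; only the description is stored.
        return Session(character=await self.describe(photo), theme=theme)

    async def next_page(self, session: Session, choice: Optional[str] = None) -> Page:
        text, image = await self.generate_scene(session.character, session.theme, choice)
        choices = await self.generate_choices(text)
        page = Page(text, image, choices)
        # Start narration without blocking the page from being returned.
        page.audio = asyncio.create_task(self.narrate(text))
        session.pages.append(page)
        return page


# Fake model calls so the sketch is runnable offline.
async def fake_describe(photo): return "a girl with curly hair and a red scarf"
async def fake_scene(character, theme, choice): return f"{character} explores {theme}.", b"img"
async def fake_choices(text): return ["Open the hatch", "Follow the map"]
async def fake_narrate(text): return text.encode()


async def demo():
    orchestrator = StoryOrchestrator(fake_describe, fake_scene, fake_choices, fake_narrate)
    session = await orchestrator.start(b"photo-bytes", "Space Mission")
    page = await orchestrator.next_page(session)
    audio = await page.audio  # the client would fetch this once it is ready
    return session, page, audio


session, page, audio = asyncio.run(demo())
```

Injecting the model calls keeps the orchestration logic testable on its own, independent of which Gemini model sits behind each step.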

Frontend (Flutter Web):

  • Responsive design — book-style layout on desktop, stacked cards on mobile
  • Step-by-step onboarding with image carousels for theme and style selection
  • Real-time audio playback and PDF export with custom branded design

Deployment:

  • Dockerized backend and frontend on Google Cloud Run
  • Google Cloud Storage for serving generated illustrations and audio

Challenges we ran into

Interleaved output reliability: Gemini's interleaved text + image generation doesn't always return an image on the first attempt. We built a retry mechanism that detects a response with text but no image and automatically retries up to 3 times, so a single missing image does not leave a story page without its illustration.
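A minimal sketch of that retry check, assuming the generation call is wrapped in a function that returns the narration text plus the image bytes, or `None` when no image came back. The names `generate_with_image` and `MissingImageError` are hypothetical, and "retries up to 3 times" is read here as one initial attempt plus three retries.

```python
from typing import Callable, Optional


class MissingImageError(RuntimeError):
    """Raised when every attempt returned text without an illustration."""


def generate_with_image(
    generate: Callable[[], tuple[str, Optional[bytes]]],
    max_retries: int = 3,
) -> tuple[str, bytes]:
    """Call `generate` until it returns an image, up to 1 + max_retries times."""
    attempts = 1 + max_retries
    for _ in range(attempts):
        text, image = generate()
        if image is not None:
            return text, image
    raise MissingImageError(f"no image after {attempts} attempts")


# Example: a fake generator that omits the image on its first two calls.
calls = {"count": 0}

def flaky_generate() -> tuple[str, Optional[bytes]]:
    calls["count"] += 1
    image = b"img" if calls["count"] >= 3 else None
    return f"scene text (attempt {calls['count']})", image

text, image = generate_with_image(flaky_generate)
```

Raising after the final attempt, rather than returning a page without an image, leaves it to the caller to decide how to handle the failure.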

Character consistency across pages: Getting the model to draw the same child across 5-6 story pages was tricky. On the first page, we generate a detailed character description from the photo, then include it in every subsequent prompt so the model has a consistent reference.
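One way to picture this is a prompt builder that always repeats the same character description, whatever page is being generated. The function name, parameters, and prompt wording below are illustrative assumptions, not the prompts Nimora actually uses.

```python
from typing import Optional


def build_scene_prompt(
    character: str,
    art_style: str,
    theme: str,
    story_so_far: list[str],
    choice: Optional[str] = None,
) -> str:
    """Build the prompt for the next page; the character text never changes."""
    lines = [
        f"Illustrate and narrate the next scene of a {theme} story for a young child.",
        f"Art style: {art_style}.",
        f"Hero (draw identically on every page): {character}",
    ]
    if story_so_far:
        lines.append("Story so far: " + " ".join(story_so_far))
    if choice:
        lines.append(f"The child chose: {choice}")
    return "\n".join(lines)


# Example: the same description appears in the prompt for page 1 and page 2.
character = "a girl with curly brown hair, green glasses, and a red scarf"
page_1 = build_scene_prompt(character, "Watercolor", "Space Mission", [])
page_2 = build_scene_prompt(
    character, "Watercolor", "Space Mission",
    ["Mia boarded the rocket."], choice="Open the hatch",
)
```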

PDF size optimization: Generated illustrations are high-quality PNGs (~2.4 MB each), and a 6-page story PDF was hitting 16 MB. We added PNG-to-JPEG conversion at quality 85, which looks nearly identical and makes the files 5-10x smaller.
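The numbers above can be checked with a quick back-of-the-envelope calculation. It covers only the image payload, using the figures quoted in the text; fonts and PDF structure are not modeled, and `image_payload_mb` is a helper written for this example.

```python
PAGES = 6
PNG_MB = 2.4  # approximate size of one generated illustration


def image_payload_mb(pages: int, mb_per_image: float, shrink_factor: float = 1.0) -> float:
    """Total image size in MB after dividing each image by shrink_factor."""
    return round(pages * mb_per_image / shrink_factor, 2)


before = image_payload_mb(PAGES, PNG_MB)           # PNG images, uncompressed
after_worst = image_payload_mb(PAGES, PNG_MB, 5)   # 5x smaller as JPEG
after_best = image_payload_mb(PAGES, PNG_MB, 10)   # 10x smaller as JPEG

print(f"PNG payload:  {before} MB")
print(f"JPEG payload: {after_best} to {after_worst} MB")
```

The PNG images alone account for roughly 14.4 MB of the 16 MB reported, so the illustrations dominate the file size and are the right place to optimize.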

Flutter Web single-threaded limitation: PDF generation blocks the UI thread in Flutter Web. We solved this by splitting the work into phases (fetch images → build pages) and yielding a frame after each page so the loading spinner stays responsive.
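The frontend is written in Flutter, but the same cooperative-yield idea can be sketched in Python with `asyncio`, which is also single-threaded. This is an analogy under that assumption, not the Dart code: `build_pdf_pages`, `render_page`, and the `spinner` task are hypothetical names, and `await asyncio.sleep(0)` plays the role of the frame yield.

```python
import asyncio
from typing import Callable


async def build_pdf_pages(images: list[bytes], render_page: Callable[[bytes], bytes]) -> list[bytes]:
    """Render pages one at a time, yielding to the event loop after each."""
    pages = []
    for image in images:
        pages.append(render_page(image))  # blocking work for a single page
        await asyncio.sleep(0)            # let other tasks (the spinner) run
    return pages


async def spinner(ticks: list[str], stop: asyncio.Event) -> None:
    """Stand-in for UI animation: records a tick whenever it gets to run."""
    while not stop.is_set():
        ticks.append("tick")
        await asyncio.sleep(0)


async def demo():
    ticks: list[str] = []
    stop = asyncio.Event()
    spin = asyncio.create_task(spinner(ticks, stop))
    pages = await build_pdf_pages([b"a", b"b", b"c"], render_page=lambda img: img.upper())
    stop.set()
    await spin
    return pages, ticks


pages, ticks = asyncio.run(demo())
```

Without the yield inside the loop, the spinner task would not run until all pages were built, which is the frozen-UI behavior described above.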

Accomplishments that we're proud of

  • Single-call story + illustration — Most AI story apps make separate calls for text and images, then stitch them together. Nimora generates both in one Gemini call, making the text and illustration naturally coherent.
  • True interactivity — This isn't a "fill in the name" template. Every choice creates a genuinely different story branch, and the AI adapts illustrations to match.
  • The PDF storybook — Children can hold their adventure in their hands. The branded PDF with custom fonts, rounded image corners, and "The End" page feels like a real published book.
  • Privacy-first design — The child's photo is never saved to disk or cloud. It exists only in memory during the generation session and is discarded after.

What we learned

  • Interleaved generation is powerful but needs guardrails — The ability to get text + image in one call is a game-changer for coherent storytelling, but you need retry logic and careful prompt engineering to get reliable results.
  • Prompt design matters more than architecture — The quality jump from our first prompts to our final ones was dramatic. Including character descriptions, style references, and explicit constraints about child-appropriate content made all the difference.
  • Gemini TTS quality surprised us — We expected robotic narration but got expressive, child-friendly voices that genuinely enhance the experience.
  • Flutter Web has trade-offs — Cross-platform from a single codebase is great, but single-threaded execution means you have to be creative about keeping the UI responsive during heavy computation.

What's next for Nimora

  • Arabic language support — Full RTL story generation and narration in Arabic, bringing Nimora to the MENA region
  • Multi-child stories — Siblings and friends as co-heroes in the same adventure
  • Story library — Save past adventures to a personal bookshelf and re-read them anytime
  • More themes and styles — Seasonal adventures, educational themes, and new art styles based on community feedback
  • Collaborative storytelling — Real-time multiplayer where two children make choices together from different devices.
