Inspiration

Over 55 million people worldwide live with Alzheimer's disease or another form of dementia, and someone develops dementia every 3 seconds. We watched families struggle to preserve the stories that define their loved ones (childhood adventures, first loves, proudest moments) before they fade forever. Traditional memory books cost $500–$2,000 and take weeks to produce. We asked: what if AI could make this accessible to everyone, in minutes, for less than a dollar?

What it does

Memory Book transforms life stories and reference photos into beautiful, personalized illustrated books using AI. Users fill in memories across four life phases (childhood, teenage years, adult life, later years), upload 1–5 reference photos, and choose from four art styles: watercolor, cartoon, anime, or coloring book. A 12-agent Gemini pipeline then generates a complete illustrated book (a cover, 10 content pages, and a back cover) with consistent character representation across every page. The finished book can be viewed in an interactive page-flip viewer or downloaded as a professional PDF.

How we built it

We built a FastAPI backend orchestrating 12 specialized Gemini-powered agents in an async pipeline. The Visual Analyzer uses Gemini's multimodal vision to extract a "visual fingerprint" from reference photos: facial features, body characteristics, and style attributes. The Narrative Planner creates an editorial arc, while the Prompt Writer crafts detailed generation instructions that embed the fingerprint for consistency. Gemini 2.5 Flash Image generates all illustrations natively. A quality control loop with Illustrator Reviewer, Designer Reviewer, and Image Validator agents checks every image against our standards, and rejected outputs are fixed iteratively. The React + TypeScript frontend connects through Firebase (Auth, Firestore, Storage) with real-time progress tracking. Agents run in parallel where possible using asyncio.gather() to minimize generation time.
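As a rough illustration of the parallel generation and review loop described above, here is a minimal sketch. It is not the project's actual code: generate_image, review_image, illustrate_page, illustrate_book, PageResult, and MAX_FIX_ATTEMPTS are hypothetical names, and the two agent functions are stand-ins that fake a Gemini call so the example runs on its own.

```python
import asyncio
from dataclasses import dataclass

MAX_FIX_ATTEMPTS = 3  # assumed limit; the real pipeline's limit is not stated


@dataclass
class PageResult:
    page: int
    image: str
    attempts: int
    approved: bool


async def generate_image(page: int, attempt: int) -> str:
    """Hypothetical stand-in for the image generation agent."""
    await asyncio.sleep(0)  # placeholder for the network call
    return f"page-{page}-v{attempt}"


async def review_image(image: str) -> bool:
    """Hypothetical stand-in for the reviewer agents.

    It rejects every first draft so the fix loop is exercised.
    """
    await asyncio.sleep(0)
    return not image.endswith("-v1")


async def illustrate_page(page: int) -> PageResult:
    """Generate one page, regenerating until a reviewer approves it."""
    image = ""
    for attempt in range(1, MAX_FIX_ATTEMPTS + 1):
        image = await generate_image(page, attempt)
        if await review_image(image):
            return PageResult(page, image, attempt, approved=True)
    # Out of attempts: return the last draft, flagged as not approved.
    return PageResult(page, image, MAX_FIX_ATTEMPTS, approved=False)


async def illustrate_book(page_count: int) -> list[PageResult]:
    """Run all pages concurrently; gather returns results in input order."""
    tasks = (illustrate_page(p) for p in range(1, page_count + 1))
    return list(await asyncio.gather(*tasks))


if __name__ == "__main__":
    for result in asyncio.run(illustrate_book(12)):
        print(result)
```

Because asyncio.gather() preserves input order, the results can be assembled into the book without re-sorting, even though the pages finish at different times.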

Challenges we ran into

The biggest challenge was maintaining visual consistency across 12+ generated images. A character that looks like grandma on page 1 needs to look like the same grandma on page 10, just at a different age. We solved this by building a "visual fingerprint" system: Gemini's multimodal vision analyzes reference photos to extract detailed facial and body characteristics, which are then injected into every generation prompt alongside the original photos. We also created a Character Sheet Generator that produces a reference portrait used as a visual anchor for all subsequent pages. Another challenge was orchestrating 12 agents reliably. We solved this with strict Pydantic schemas for type-safe JSON communication between agents and comprehensive retry logic throughout the pipeline.
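The schema-plus-retry idea can be sketched as follows. To keep the example runnable with only the standard library, a dataclass with hand-written checks stands in for the Pydantic model the project actually uses; PagePlan, SchemaError, call_with_retry, and the field names are all hypothetical, chosen for illustration only.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import Awaitable, Callable


class SchemaError(ValueError):
    """Raised when an agent reply does not match the expected shape."""


@dataclass(frozen=True)
class PagePlan:
    # Hypothetical fields; a Pydantic model would declare these the same way.
    page: int
    caption: str
    image_prompt: str

    @classmethod
    def from_json(cls, raw: str) -> "PagePlan":
        data = json.loads(raw)  # raises JSONDecodeError (a ValueError) on bad JSON
        if not isinstance(data, dict):
            raise SchemaError("expected a JSON object")
        try:
            plan = cls(data["page"], data["caption"], data["image_prompt"])
        except KeyError as exc:
            raise SchemaError(f"missing field: {exc}") from exc
        page_ok = isinstance(plan.page, int) and not isinstance(plan.page, bool)
        text_ok = isinstance(plan.caption, str) and isinstance(plan.image_prompt, str)
        if not (page_ok and text_ok):
            raise SchemaError("wrong field type")
        return plan


async def call_with_retry(
    agent: Callable[[], Awaitable[str]],
    max_attempts: int = 3,
    base_delay: float = 0.0,
) -> PagePlan:
    """Call an agent until its reply validates, backing off between attempts."""
    last_error: Exception | None = None
    for attempt in range(max_attempts):
        raw = await agent()
        try:
            return PagePlan.from_json(raw)
        except ValueError as exc:  # covers both malformed JSON and SchemaError
            last_error = exc
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"agent failed after {max_attempts} attempts") from last_error
```

Validating at the boundary between agents means a malformed reply is retried where it occurs instead of surfacing as a confusing failure several agents later.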

Accomplishments that we're proud of

  • Built a fully functional production app in 7 weeks as a solo developer
  • Orchestrated 12 specialized Gemini agents into a cohesive pipeline using 3 different models (2.0-flash, 2.0-pro-exp, 2.5-flash-image)
  • Achieved consistent character likeness across all illustrations using our visual fingerprint + character sheet system
  • Made professional memory books accessible: under $1 in API costs vs $500–$2,000 for traditional services
  • Shipped with multi-language support (6 languages), interactive book viewer, PDF export, and real-time generation progress tracking
  • Gemini reviews its own work: reviewer agents catch quality issues and trigger automatic regeneration

What we learned

Gemini's capabilities are transformative when properly orchestrated. The combination of multimodal vision, structured JSON output, and native image generation enables complex multi-agent workflows that would previously have required stitching together multiple external services. We learned that prompt engineering for visual consistency is as much art as science: small changes in character descriptions dramatically affect the output. We also discovered that a self-reviewing AI pipeline, where agents validate each other's work, produces significantly better results than single-shot generation. Building reliable multi-agent systems requires careful error handling and retry logic, but the quality payoff is worth it.

What's next for Memory Book

  • Voice-powered memory capture: Let users narrate their memories by voice while Gemini transcribes and structures the stories automatically, making it easier for elderly users or family members to share details without typing
  • More art styles: 3D illustration, pencil sketch, realistic painting
  • Video memories: Integrate short video clips into the book narrative
  • Collaborative editing: Multiple family members contributing memories to the same book
  • Print-on-demand: Partner with printing services to deliver physical hardcover books
  • Mobile app: Native iOS/Android for easier photo uploads and on-the-go creation
  • AI narration: Generate audio narration for each page using Gemini's text-to-speech
  • Memory prompts: AI-guided questions to help users recall more detailed memories
