Memory Book

Inspiration

Over 55 million people worldwide live with Alzheimer's or another form of dementia, and someone develops dementia roughly every 3 seconds. We watched families struggle to preserve the stories that define their loved ones (childhood adventures, first loves, proudest moments) before they fade forever. Traditional memory books cost $500–$2,000 and take weeks. We asked: what if AI could make this accessible to everyone, in minutes, for less than a dollar?

What it does

Memory Book transforms life stories and reference photos into personalized illustrated books using AI. Users fill in memories across four life phases (childhood, teenage years, adult life, later years), upload 1–5 reference photos, and choose from four art styles: watercolor, cartoon, anime, or coloring book. A 12-agent Gemini pipeline then generates a complete illustrated book (cover, 10 content pages, and back cover) with consistent character representation across every page. The finished book can be viewed in an interactive page-flip viewer or downloaded as a PDF.
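To make the inputs concrete, here is a minimal sketch of how a book request could be modeled and validated. The names `ArtStyle`, `BookRequest`, and `page_plan` are hypothetical and are not taken from the project's code. The limits (four life phases, 1 to 5 photos, 10 content pages) come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class ArtStyle(Enum):
    WATERCOLOR = "watercolor"
    CARTOON = "cartoon"
    ANIME = "anime"
    COLORING_BOOK = "coloring book"


LIFE_PHASES = ("childhood", "teenage years", "adult life", "later years")


@dataclass
class BookRequest:
    """Hypothetical request model; the project's real schema is not shown."""
    memories: dict[str, str]  # life phase -> free-text memory
    photo_paths: list[str]
    style: ArtStyle

    def __post_init__(self) -> None:
        # Reject memories filed under a phase the wizard does not offer.
        unknown = set(self.memories) - set(LIFE_PHASES)
        if unknown:
            raise ValueError(f"unknown life phases: {sorted(unknown)}")
        # The description above asks for 1 to 5 reference photos.
        if not 1 <= len(self.photo_paths) <= 5:
            raise ValueError("expected 1 to 5 reference photos")


def page_plan(content_pages: int = 10) -> list[str]:
    """Cover, then the content pages, then the back cover."""
    pages = [f"page {i}" for i in range(1, content_pages + 1)]
    return ["cover", *pages, "back cover"]
```

With the default of 10 content pages, `page_plan()` yields 12 entries, which matches the number of images the pipeline has to keep visually consistent.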

How we built it

We built a FastAPI backend orchestrating 12 specialized Gemini-powered agents in an async pipeline. The Visual Analyzer uses Gemini's multimodal vision to extract a "visual fingerprint" from reference photos — facial features, body characteristics, and style attributes. The Narrative Planner creates an editorial arc, while the Prompt Writer crafts detailed generation instructions embedding the fingerprint for consistency. Gemini 2.5 Flash Image generates all illustrations natively. A quality control loop with Illustrator Reviewer, Designer Reviewer, and Image Validator agents ensures every image meets standards — with iterative fixing for rejected outputs. The React + TypeScript frontend connects through Firebase (Auth, Firestore, Storage) with real-time progress tracking. Agents run in parallel where possible using asyncio.gather() to minimize generation time.
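The parallel step can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the three agent functions are hypothetical stubs that return canned data where the real agents would call Gemini. The only real API shown is `asyncio.gather()`, which runs independent coroutines concurrently and returns their results in order.

```python
import asyncio


async def visual_analyzer(photos: list[str]) -> dict:
    """Hypothetical stub: a real agent would call Gemini's vision model here."""
    await asyncio.sleep(0)  # stands in for a network call
    return {"fingerprint": f"features from {len(photos)} photo(s)"}


async def narrative_planner(memories: dict[str, str]) -> dict:
    """Hypothetical stub: a real agent would plan the editorial arc."""
    await asyncio.sleep(0)
    return {"arc": list(memories)}


async def prompt_writer(fingerprint: str, phase: str) -> str:
    """Hypothetical stub: embeds the fingerprint in each page prompt."""
    await asyncio.sleep(0)
    return f"{phase}: illustrate a character with {fingerprint}"


async def run_pipeline(photos: list[str], memories: dict[str, str]) -> list[str]:
    # The two analysis steps do not depend on each other, so they run concurrently.
    visual, plan = await asyncio.gather(
        visual_analyzer(photos),
        narrative_planner(memories),
    )
    # Prompt writing needs both results, so it starts only after the gather.
    # Per-page prompts are independent of one another and are gathered too.
    return await asyncio.gather(
        *(prompt_writer(visual["fingerprint"], phase) for phase in plan["arc"])
    )
```

Steps that depend on earlier output, such as prompt writing, still have to wait for it. Only agents with no data dependency between them benefit from being gathered.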

Challenges we ran into

The biggest challenge was maintaining visual consistency across 12+ generated images. A character that looks like grandma on page 1 needs to look like the same grandma on page 10, just at different ages. We solved this by building a "visual fingerprint" system: Gemini's multimodal vision analyzes reference photos to extract detailed facial and body characteristics, which are then injected into every generation prompt alongside the original photos. We also created a Character Sheet Generator that produces a reference portrait used as a visual anchor for all subsequent pages. Another challenge was orchestrating 12 agents reliably. We solved this with strict Pydantic schemas for type-safe JSON communication between agents and comprehensive retry logic throughout the pipeline.
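The two ideas above, a fingerprint repeated in every prompt and schema-checked JSON with retries, can be sketched as follows. This is a simplified illustration, not the project's implementation. To stay dependency-free it uses a standard-library dataclass with hand-written checks as a stand-in for the Pydantic models mentioned above. The field names and the helpers `build_page_prompt` and `call_with_retry` are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualFingerprint:
    """Hypothetical fields; the project's real schema is not shown."""
    face: str
    body: str
    style_notes: str

    @classmethod
    def from_json(cls, raw: str) -> "VisualFingerprint":
        data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object")
        fields = ("face", "body", "style_notes")
        bad = [f for f in fields if not isinstance(data.get(f), str)]
        if bad:
            raise ValueError(f"missing or non-string fields: {bad}")
        return cls(**{f: data[f] for f in fields})


def build_page_prompt(scene: str, age: str, fp: VisualFingerprint) -> str:
    """Repeat the same fingerprint in every page prompt; only scene and age change."""
    return (
        f"Scene: {scene}. Character age: {age}. "
        f"Face: {fp.face}. Body: {fp.body}. Style: {fp.style_notes}."
    )


def call_with_retry(agent, max_attempts: int = 3) -> VisualFingerprint:
    """Call `agent` until its output parses and validates, or attempts run out."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return VisualFingerprint.from_json(agent())
        except ValueError as err:  # malformed JSON or schema mismatch
            last_error = err
    raise RuntimeError(f"agent failed after {max_attempts} attempts") from last_error
```

Validating at the boundary between agents means a malformed reply triggers a retry at the point of failure and does not surface later as a confusing error in a downstream agent.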

Accomplishments that we're proud of

Built a fully functional production app in 7 weeks as a solo developer
Orchestrated 12 specialized Gemini agents into a cohesive pipeline using 3 different models (2.0-flash, 2.0-pro-exp, 2.5-flash-image)
Achieved consistent character likeness across all illustrations using our visual fingerprint + character sheet system
Made professional memory books accessible: under $1 in API costs vs $500–$2,000 for traditional services
Shipped with multi-language support (6 languages), interactive book viewer, PDF export, and real-time generation progress tracking
Gemini reviews its own work: reviewer agents catch quality issues and trigger automatic regeneration

What we learned

Gemini's capabilities are transformative when properly orchestrated. The combination of multimodal vision, structured JSON output, and native image generation enables complex multi-agent workflows that would previously require stitching together multiple external services. We learned that prompt engineering for visual consistency is as much art as science: small changes in character descriptions dramatically affect output. We also discovered that a self-reviewing AI pipeline (where agents validate each other's work) produces significantly better results than single-shot generation. Building reliable multi-agent systems requires careful error handling and retry logic, but the quality payoff is worth it.

What's next for Memory Book

Voice-powered memory capture: Let users narrate their memories by voice while Gemini transcribes and structures the stories automatically, making it easier for elderly users or family members to share details without typing
More art styles: 3D illustration, pencil sketch, realistic painting
Video memories: Integrate short video clips into the book narrative
Collaborative editing: Multiple family members contributing memories to the same book
Print-on-demand: Partner with printing services to deliver physical hardcover books
Mobile app: Native iOS/Android for easier photo uploads and on-the-go creation
AI narration: Generate audio narration for each page using Gemini's text-to-speech
Memory prompts: AI-guided questions to help users recall more detailed memories

Built With

agent
css
cursor
fastapi
firebase
gemini
google
pillow
python
react
tailwind
typescript

Submitted to

Gemini 3 Hackathon

Created by

As the sole developer, I designed and built every aspect of Memory Book from scratch during this hackathon:

Frontend: Built a modern React 19 / TypeScript app with a 3-step book creation wizard, interactive page-flip book viewer, responsive landing page with animated sections, and a full dashboard. Implemented multi-language support (EN, PT, ES, FR, DE) and client-side PDF generation with custom fonts and layouts.
Backend & AI Pipeline: Architected a multi-agent AI pipeline with 11 specialized agents (normalizer, narrative planner, visual analyzer, prompt writer, prompt reviewer, image generator, quality validator, designer reviewer, iterative fix, cover creator, back cover creator). Each agent has a focused role, and they work in coordinated sequence with parallel execution where possible. The pipeline uses 3 different Gemini models (2.0 Flash for analysis, 2.0 Pro for creative writing, 2.5 Flash for native image generation) and includes a quality control loop with up to 3 retry attempts per image.

Infrastructure: Configured Firebase Authentication (Google OAuth + email/password), Firestore for data persistence, Firebase Storage for images, and Firebase Hosting for deployment. Implemented real-time progress tracking from backend to frontend.

Design & UX: Designed the entire UI/UX focused on accessibility — a guided flow that anyone can use regardless of technical skill, with visual character description tools (skin color, hair style, accessories) as an alternative to photo uploads.
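The quality control loop described under Backend & AI Pipeline can be sketched as follows. This is an illustration under assumptions: `generate`, `review`, and `revise` are hypothetical callables standing in for the image generator, reviewer, and iterative fix agents, and "up to 3 retry attempts" is read here as three tries in total.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # assumption: three tries in total per image


@dataclass
class Review:
    approved: bool
    feedback: str = ""


def generate_with_review(
    prompt: str,
    generate: Callable[[str], bytes],
    review: Callable[[bytes], Review],
    revise: Callable[[str, str], str],
) -> Optional[bytes]:
    """Generate an image, review it, and regenerate with feedback until approved."""
    for _ in range(MAX_ATTEMPTS):
        image = generate(prompt)
        verdict = review(image)
        if verdict.approved:
            return image
        # Fold the reviewer's feedback into the next prompt (the iterative fix step).
        prompt = revise(prompt, verdict.feedback)
    return None  # the caller decides what to do when every attempt is rejected
```

Returning `None` after the final rejection keeps the policy decision (fall back to the last image, skip the page, or fail the book) outside the loop.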

The project was completed in approximately 7 days.

Marivaldo Torres Junior

Updates

Marivaldo Torres Junior started this project — Feb 08, 2026 06:42 PM EST
