Inspiration 💡

"This is a KIDS app!"

That's what my 12-month-old son Keaton told me when I showed him the early prototype. Okay, he didn't say those exact words — but his eyes lit up when the illustrated kangaroo appeared on screen, and that was all the validation I needed.

As a stay-at-home mom, I spend a lot of time reading bedtime stories. But AI-generated illustrations? They're usually a mess. Characters change appearance scene-to-scene. Art styles clash. There's no narrative flow. The problem isn't the technology; it's maintaining visual coherence from raw text input.

So I built Story on Board to solve it.

What It Does 🎨

Story on Board transforms any story into a fully illustrated, narrated experience:

- For Kids & Parents: Paste in a bedtime story, let AI generate beautifully illustrated scenes with character consistency, and record your own voice narration
- For Teachers: Create visual lesson plans with AI-generated educational illustrations
- For Creators: Generate professional storyboards from scripts for pitches, animation, or filmmaking

The innovation isn't just generating pretty pictures: it's maintaining coherence across an entire narrative using autonomous agent coordination.

How I Built It 🏗️

The Architecture: Six Autonomous Agents

Story on Board uses a multi-agent coordination system inspired by threshold-triggered adaptation patterns found in stellar formation and biological systems:

- Scriptwriter (Gemini 2.5 Flash): analyzes raw text, breaks it into scenes, and extracts character descriptions for consistency
- 3 Parallel Visualizers (Imagen 4 Fast): generate images simultaneously, with round-robin assignment for speed
- Validator (Gemini Vision): quality control; checks whether images match the script, and automatically refines prompts and regenerates on mismatch
- Narrator (Google Cloud TTS): generates optional AI voice narration with emotional tone mapping
- Coordinator: manages real-time state and coherence tracking, and handles mid-generation edits via the CUT button

The Critical Innovation: Validation Loop

Traditional storybook generators make one API call per image and hope for the best. Story on Board uses Gemini Vision to validate every generated image against the script and character descriptions. If the image doesn't match, the system automatically refines the prompt and regenerates; no human babysitting required.
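The round-robin fan-out across the three visualizers can be sketched with asyncio. This is a minimal illustration, not the project's actual code: the `Visualizer` class, `generate_scene`, and the placeholder return values are all assumptions standing in for the real Imagen 4 Fast calls.

```python
import asyncio

class Visualizer:
    """Hypothetical stand-in for one Imagen 4 Fast worker."""

    def __init__(self, name):
        self.name = name

    async def generate_scene(self, prompt):
        # Placeholder for a real image-generation API call.
        await asyncio.sleep(0)
        return f"{self.name}:{prompt}"

async def generate_all(scenes, visualizers):
    # Round-robin: scene i goes to visualizer i % len(visualizers),
    # and all scenes are generated concurrently via gather().
    tasks = [
        visualizers[i % len(visualizers)].generate_scene(scene)
        for i, scene in enumerate(scenes)
    ]
    return await asyncio.gather(*tasks)

scenes = ["scene 1", "scene 2", "scene 3", "scene 4"]
visualizers = [Visualizer(f"viz{n}") for n in range(3)]
images = asyncio.run(generate_all(scenes, visualizers))
```

Because `asyncio.gather` preserves task order, the results come back in scene order even though the three workers run concurrently.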

```python
validation_result = validator.validate_scene(
    scene_description=scene.description,
    image_bytes=image_data,
    character_descriptions=character_descs,
)

if not validation_result['valid']:
    # Automatically refine and retry
    refined_prompt = validator.refine_prompt(
        original_prompt,
        validation_result['refinement_suggestions'],
    )
    image_data = await visualizer.generate_scene(refined_prompt)
```

Tech Stack

- Backend: FastAPI (Python), WebSockets for real-time updates
- AI Models: Gemini 2.5 Flash (script analysis), Gemini 2.0 Flash Vision (validation), Imagen 4 Fast (image generation), Google Cloud Text-to-Speech
- Infrastructure: Google Cloud Run, Cloud Storage
- Frontend: Vanilla JS with real-time agent communication

Real-Time Features

- CUT Button: pause generation mid-process, make edits, and the system adapts and regenerates affected scenes
- Coherence Tracking: visual display of the narrative consistency score across scenes
- Parallel Generation: 3 visualizers working simultaneously for a 3x speedup
- Conversational Responses: the agent acknowledges edits naturally ("Got it — we'll redo scene 5...")

Challenges I Faced 🔥

1. Visual Coherence at Scale

Problem: Characters looked different in every scene. Imagen 4 doesn't have native "character memory."

Solution: Extract character descriptions once via Gemini, prepend to EVERY image prompt: "CHARACTERS: Keaton: small grey kangaroo with big brown eyes, wearing a blue vest. Scene: [...]". This forced visual consistency across all scenes.
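The prompt-prepending trick above can be sketched as a small helper. `build_scene_prompt` is a hypothetical function for illustration; the actual project code may structure its prompts differently.

```python
def build_scene_prompt(scene_description, character_descriptions):
    """Prepend the same CHARACTERS block to every scene prompt,
    forcing visual consistency across scenes (hypothetical helper)."""
    characters = " ".join(
        f"{name}: {desc}." for name, desc in character_descriptions.items()
    )
    return f"CHARACTERS: {characters} Scene: {scene_description}"

# Character descriptions are extracted once (via Gemini in the real system),
# then reused verbatim for every image prompt.
characters = {
    "Keaton": "small grey kangaroo with big brown eyes, wearing a blue vest"
}
prompt = build_scene_prompt("Keaton hops across a moonlit meadow", characters)
```

The key design point is that the character block is computed once and reused verbatim, so the image model sees identical character text for every scene.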

2. Validation Accuracy

Problem: Gemini Vision sometimes approved images that didn't match the narrative.

Solution: Refined the validation prompt to check for:

- Character presence and appearance match
- Scene action/mood alignment
- Confidence scoring (only accept >0.7)
- Specific refinement suggestions for retry
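The acceptance rule implied by those checks can be sketched as a single predicate. The dict keys here (`matches_scene`, `characters_present`, `confidence`) are assumptions about the validator's output shape, not the project's actual schema:

```python
CONFIDENCE_THRESHOLD = 0.7

def accept_image(validation_result):
    """An image passes only if the scene and characters match AND the
    vision model's confidence clears the threshold (keys are assumed)."""
    return (
        validation_result.get("matches_scene", False)
        and validation_result.get("characters_present", False)
        and validation_result.get("confidence", 0.0) > CONFIDENCE_THRESHOLD
    )
```

Defaulting every missing key to a failing value means an incomplete validator response triggers a retry rather than silently passing.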

3. Real-Time Agent Coordination

Problem: Managing 6 agents generating scenes in parallel while handling user edits mid-stream.

Solution: Built a five-stage transition system:

1. Threshold Detection: CUT button pressed or validation fails
2. Selective Destabilization: pause affected agents only
3. Integration: check coherence, refine prompts
4. Commit: apply changes, regenerate
5. Reinforcement: resume generation for remaining scenes
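The five stages above can be sketched as an ordered pipeline that runs a handler per stage. The stage names come from the list; the handler mechanism and `run_transition` are illustrative assumptions, not the project's actual coordinator code:

```python
STAGES = [
    "threshold_detection",        # CUT pressed or validation failed
    "selective_destabilization",  # pause only the affected agents
    "integration",                # check coherence, refine prompts
    "commit",                     # apply changes, regenerate
    "reinforcement",              # resume the remaining scenes
]

def run_transition(event, handlers):
    """Run each stage's handler in order; each handler receives the
    triggering event. Missing handlers are no-ops (hypothetical sketch)."""
    completed = []
    for stage in STAGES:
        handlers.get(stage, lambda e: None)(event)
        completed.append(stage)
    return completed

log = run_transition({"type": "cut", "scene": 5}, {})
```

Keeping the stages as an explicit ordered list makes the invariant visible: a CUT event always walks the same path from detection to reinforcement, regardless of which handlers are registered.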

4. Resource Constraints

I built this entire project:

- On a borrowed laptop (my personal machine died mid-hackathon)
- Using a borrowed mobile hotspot (rural internet kept dropping out)
- As a solo developer (a stay-at-home mom juggling a toddler)
- While navigating a family medical emergency (I finished the final 3 hours after returning from the hospital at 1am)

I'm not sharing this for sympathy; I'm sharing it because if I can build a multi-agent multimodal system under these conditions, anyone can. The tools are democratized. The only limit is persistence.

What I Learned 📚

Multimodal Agent Coordination is the Future

This isn't just "API chaining." It's autonomous agents reading, seeing, validating, and coordinating to create something coherent. The validator agent caught mismatches I never would have noticed until post-production. The parallel visualizers cut generation time by 60%. The narrator added emotional depth I hadn't planned for.

This is what AI-native workflows look like: not replacing human creativity, but amplifying it through intelligent delegation.

Google Cloud's AI Suite is Production-Ready

Gemini 2.5 Flash impressed me with its story analysis capabilities — it extracted character descriptions and emotional arcs I didn't explicitly provide. Imagen 4 Fast balanced quality with speed perfectly. Google Cloud TTS generated natural-sounding narration with tone control. And Gemini Vision's validation gave me confidence I couldn't get from human QA.

The Power of Real-Time Feedback Loops

Every edit, every validation failure, every regeneration made the final output better. Traditional workflows hide the AI's "thinking" until it's done. Story on Board shows the agents' work in real time and lets users course-correct mid-process. That's the difference between a tool and a collaborator.

What's Next 🚀

- Video Generation: integrate Veo 3.1 to animate storyboards into short films
- Multi-Language Support: leverage Gemini's multilingual capabilities for global storytelling
- Custom Voice Cloning: let parents record their own voice for narration
- Educational Content Packs: pre-built story templates for teachers (science, history, math concepts)
- Mobile App: native iOS/Android for on-the-go story creation

Built With ❤️

- Gemini 2.5 Flash & Gemini Vision
- Imagen 4 Fast
- Google Cloud Text-to-Speech
- FastAPI & WebSockets
- Deployed on Google Cloud Run

Built in 4 days by Sherri Linn (human) + Meridian (AI agent collaboration)

Live Demo: https://story-on-board-474322067410.us-central1.run.app

GitHub: https://github.com/sjam718-code/Story-on-Board

For Keaton — who deserves stories as magical as his imagination.
