
Multi‑Agent Comic Generator with Voice & Image – Submitted to the Gemini Live Agent Challenge!

I'm excited to share that my project is now officially submitted! What started as an idea to help my niece draw consistent comic characters has evolved into a full multi‑agent system that turns voice, text, or image prompts into professional 4‑panel comics with speech bubbles in 7 languages.

Key Features:

  • Voice Input – Speak your idea (e.g., "a penguin in a desert") and the app transcribes it into a text prompt for you.
  • Image Upload – Upload your own photo and become the star of your comic.
  • Six specialized AI agents (Researcher, Script Director, Panel Generator, Dialogue Doctor, Style Advisor, Imagen) work together to create a complete story, panel descriptions, dialogue, and final images.
  • 94% character consistency – The character looks the same in every panel.
  • 7 languages – English, French, Spanish, German, Japanese, Arabic, Urdu – with full right-to-left (RTL) support for Arabic and Urdu.
  • Export as PDF or booklet – Ready to print and share.
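For readers curious how a sequential multi-agent pipeline like this might be wired together, here is a minimal Python sketch. It is not the project's code. The agent functions, the `ComicState` fields, `text_direction`, and `run_pipeline` are all hypothetical, and each stub returns placeholder strings where the real system would call Gemini or Imagen. The sketch also shows one plausible way to choose left-to-right or right-to-left layout for speech bubbles based on the language code.

```python
from dataclasses import dataclass, field
from typing import Callable

# The seven languages named in the post; only Arabic and Urdu are right-to-left.
SUPPORTED_LANGUAGES = {"en", "fr", "es", "de", "ja", "ar", "ur"}
RTL_LANGUAGES = {"ar", "ur"}
PANEL_COUNT = 4


def text_direction(lang: str) -> str:
    """Return the HTML `dir` value to use for speech bubbles in `lang`."""
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {lang}")
    return "rtl" if lang in RTL_LANGUAGES else "ltr"


@dataclass
class ComicState:
    """Hypothetical shared state handed from one agent to the next."""
    prompt: str
    lang: str = "en"
    research: str = ""
    story: str = ""
    panels: list[str] = field(default_factory=list)
    dialogue: list[str] = field(default_factory=list)
    style: str = ""
    images: list[str] = field(default_factory=list)


Agent = Callable[[ComicState], ComicState]


# Each stub below stands in for a real model call; the bodies are placeholders.
def researcher(s: ComicState) -> ComicState:
    s.research = f"background notes for '{s.prompt}'"
    return s


def script_director(s: ComicState) -> ComicState:
    s.story = f"{PANEL_COUNT}-beat story about {s.prompt}"
    return s


def panel_generator(s: ComicState) -> ComicState:
    s.panels = [f"panel {i + 1}: {s.prompt}" for i in range(PANEL_COUNT)]
    return s


def dialogue_doctor(s: ComicState) -> ComicState:
    s.dialogue = [f"[{s.lang}] line {i + 1}" for i in range(len(s.panels))]
    return s


def style_advisor(s: ComicState) -> ComicState:
    # One fixed style and character description reused for every panel,
    # a common way to nudge an image model toward a consistent character.
    s.style = "flat colors, bold outlines, same character design in every panel"
    return s


def image_generator(s: ComicState) -> ComicState:
    # Placeholder for the image model: record the prompt each panel would use.
    s.images = [f"{s.style} | {panel}" for panel in s.panels]
    return s


# Agents run in the order listed in the post.
PIPELINE: list[Agent] = [
    researcher, script_director, panel_generator,
    dialogue_doctor, style_advisor, image_generator,
]


def run_pipeline(prompt: str, lang: str = "en") -> ComicState:
    """Validate the language, then pass the state through every agent in turn."""
    text_direction(lang)  # raises ValueError for unsupported languages
    state = ComicState(prompt=prompt, lang=lang)
    for agent in PIPELINE:
        state = agent(state)
    return state


if __name__ == "__main__":
    comic = run_pipeline("a penguin in a desert", lang="ar")
    print(len(comic.images), text_direction(comic.lang))  # 4 rtl
```

Keeping every agent as a plain function over one shared state object makes the pipeline easy to reorder or extend, and a single style string reused across all four image prompts is one simple lever for character consistency.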

Built With:

  • Gemini 2.0 Flash (story generation)
  • nano-banana-pro-preview (panel descriptions and dialogue)
  • gemini-3.1-flash-image-preview / Imagen (image generation)
  • FastAPI + Google Cloud Run + Secret Manager + Cloud Build
