Inspiration
When I discussed the Gemini Live Agent Challenge with my kids, they gave me the idea to create something that would help them with their storytelling and creativity. Currently they doodle and create comic books on paper, and they wish they could do it digitally with AI.
What it does
StoryForge Pro is a Creative Storyteller AI agent that transforms any script into rich, multimodal visual content. It breaks the "text box" paradigm by producing interleaved text and imagery — comics, manga, storyboards, and full cinematic trailers — all from a single script input.
🎯 Key Features
4 Output Modes: Comic, Manga, Storyboard, Trailer
12+ Visual Styles: American, Seinen, Shonen, Retro, Indie, European, and more
5 Director's Eye Styles (Trailer mode): Nolan, Cameron, Ritchie, Mani Ratnam, Nelson
AI Script Analysis: Paste any script → Gemini parses it into 4-6 visual scenes with characters, locations, mood, and dialogue
Interleaved Generation: Gemini generates comic/manga pages with panels, speech bubbles, and text — all in a single creative pass
Video Trailers: Veo 3.1 generates scene clips → auto-assembled into a full trailer with ffmpeg
Real-time Progress: WebSocket-powered live generation progress with scene-by-scene updates
Story Chat: Conversational AI editor to develop and refine your story
Download & Export: Download trailers as .mp4, print comics/manga/storyboards as PDF
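As a rough illustration of the trailer assembly step, the sketch below builds the inputs for ffmpeg's concat demuxer. The function names and file names are hypothetical, and the write-up does not say which ffmpeg options StoryForge Pro actually uses; this assumes all scene clips share the same codec settings so they can be joined without re-encoding.

```python
def concat_list_text(clips: list[str]) -> str:
    """Build the text of an ffmpeg concat-demuxer list file, one clip per line."""
    lines = []
    for clip in clips:
        # Inside single quotes, the list format writes a literal quote as '\''
        escaped = clip.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: str, output: str) -> list[str]:
    """Return an ffmpeg argv that joins the listed clips without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-f", "concat",  # read the input as a concat list file
        "-safe", "0",    # accept absolute or otherwise "unsafe" paths
        "-i", list_file,
        "-c", "copy",    # stream copy; assumes matching codecs across clips
        output,
    ]
```

To run it, write `concat_list_text(...)` to a file and pass the result of `concat_command(...)` to `subprocess.run(..., check=True)`. If the clips differ in resolution or codec, stream copy would not be enough and a re-encode would be needed instead.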
How we built it
StoryForge Pro was built with a React/TypeScript frontend and FastAPI backend, deployed on Google Cloud Run with Firestore and Cloud Storage. It leverages Gemini 2.5 Pro for script analysis and scene breakdown, Gemini Flash for visual generation across comics, manga, and storyboards, and Veo 3.1 for cinematic trailer creation. Users paste a script, pick a style, and the AI handles the rest.
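The flow described above (analyze the script, then render each scene as a page or a clip) can be sketched roughly as follows. Everything here is a hypothetical stand-in: the `Scene` and `StoryResult` types, `run_pipeline`, and the injected callables are illustrative names, and the real model calls are replaced by plain functions so the sketch runs without any API access.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Scene:
    index: int
    description: str


@dataclass
class StoryResult:
    mode: str
    scenes: list[Scene]
    assets: list[str] = field(default_factory=list)


def run_pipeline(
    script: str,
    mode: str,
    analyze: Callable[[str], list[Scene]],  # stand-in for script analysis
    render_page: Callable[[Scene], str],    # stand-in for image generation
    render_clip: Callable[[Scene], str],    # stand-in for video generation
) -> StoryResult:
    """Analyze the script once, then render every scene with the mode's renderer."""
    scenes = analyze(script)
    renderer = render_clip if mode == "trailer" else render_page
    result = StoryResult(mode=mode, scenes=scenes)
    for scene in scenes:
        result.assets.append(renderer(scene))
    return result


def fake_analyze(script: str) -> list[Scene]:
    """Toy analyzer: treat each non-empty line as one scene."""
    lines = [line.strip() for line in script.splitlines() if line.strip()]
    return [Scene(i, text) for i, text in enumerate(lines)]
```

Passing the model calls in as parameters keeps the orchestration testable with fakes, which is one way to check scene ordering and mode selection before spending any generation credits.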
Challenges we ran into
The biggest challenge was orchestrating multiple generative models in a single pipeline: coordinating Gemini's script analysis with Flash's visual output while maintaining narrative coherence across panels. We also hit Gemini rate limits during setup and testing and had to retry requests multiple times, because the resources were always busy even after switching to the Global setting in GCP. Finally, balancing generation quality against latency required careful prompt engineering and async processing to keep the user experience snappy.
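One common way to handle the kind of rate limiting described above is to retry with exponential backoff and jitter. The sketch below is a generic pattern, not the project's actual retry logic: `RateLimited` is a hypothetical exception standing in for whatever error the SDK raises on a busy or throttled request, and the default delays are arbitrary.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical marker for a throttled or resource-exhausted response."""


async def with_backoff(
    call: Callable[[], Awaitable[T]],
    *,
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await call(), retrying on RateLimited with capped exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await call()
        except RateLimited:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time up to the current ceiling
            await sleep(random.uniform(0, ceiling))
    raise RuntimeError("unreachable")
```

The `sleep` parameter exists only so tests can swap in a no-op and run instantly. Capping the number of attempts also matters for cost, since an unbounded retry loop against a paid API can keep billing in the background.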
Accomplishments that we're proud of
We built a fully functional script-to-screen pipeline that transforms raw text into comics, manga, storyboards, and cinematic trailers, all from a single input. We're proud of the Director's Eye feature, which lets users generate trailers in the signature style of filmmakers like Nolan, Mani Ratnam, and Guy Ritchie. Most importantly, the entire app went from concept to live deployment in under two weeks, with 12+ visual styles and a polished page viewer experience.
What we learned
Chaining multiple Gemini models (Pro → Flash → Veo) requires careful state management: each model's output shapes the next, and small prompt drifts compound fast. Pivoting from Imagen to Gemini Flash for image generation taught us to stay flexible and to weigh the tradeoffs between specialized and unified model families. Prompt engineering proved to be a design discipline, not just a dev task. I also learned how to use different models for different needs while managing the budget. One mistake burned $230 in credits within 10 minutes before it was identified and fixed.
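One possible safeguard against a runaway bill like the one mentioned above is an in-process spending cap that is checked before each paid call. This is a minimal sketch under assumptions: `BudgetGuard` and `BudgetExceeded` are hypothetical names, and the per-call cost estimates would have to come from your own pricing table, since the write-up gives no per-request prices.

```python
class BudgetExceeded(Exception):
    """Raised when a call would push spending past the configured cap."""


class BudgetGuard:
    """Track estimated spend and refuse calls that would exceed a hard limit."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, estimated_cost_usd: float) -> None:
        """Record a cost estimate, or raise before the paid call is made."""
        if self.spent_usd + estimated_cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"spent {self.spent_usd:.2f} USD, next call would exceed "
                f"the {self.limit_usd:.2f} USD limit"
            )
        self.spent_usd += estimated_cost_usd
```

Calling `guard.charge(estimate)` immediately before each generation request stops a buggy loop once the cap is reached. It complements, rather than replaces, billing alerts configured on the cloud side.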
What's next for StoryForgePro
We plan to add real-time collaborative editing, letting multiple users co-create and iterate on stories together. Next up is fine-tuning the voice-driven script input using Gemini Live, so users can narrate their story and watch it come to life in real time. We also want to expand into animation exports, turning static panels into short animated sequences with sound design.
Built With
- cloudflare
- docker
- fastapi
- ffmpeg
- framer-motion
- gcp
- gemini-2.5-flash-(text)
- gemini-2.5-flash-image-(interleaved)
- google-genai-sdk
- mongodb
- python-3.12
- react
- typescript
- uvicorn
- veo
- vertex-ai
- vite