Inspiration
The spark came from watching the Avatar: The Last Airbender movie and remembering how some classic scenes from the animation had been changed. I kept wondering: since AI can generate realistic videos, why can't 2D cartoons be turned into live action while staying true to the script?
What it does
I am building a mini creative production studio that gives anyone the power to go from "what if..." to a complete short-film package (script, storyboard, voiceovers, edits) in minutes, not months. Gemini 3's release felt like the perfect moment: its massive context, multimodal reasoning, planning, and self-correction capabilities could finally make an end-to-end autonomous creative agent possible.
How we built it
I started with Google AI Studio for rapid prototyping, then moved to a full repo using Python/Go + the Gemini SDK, LangGraph for the agent loop, and Streamlit for a clean web UI, with free TTS/image APIs integrated for voiceovers and visuals and Veo for video generation. Core logic: a planner prompt → execution agents → evaluator → iterate or user feedback.
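The planner → execution → evaluator loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: `run_pipeline`, `Step`, and the three `stub_*` functions are hypothetical names, and the stubs stand in for real model calls so the control flow can be run without any API key.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One unit of work in the pipeline (hypothetical structure)."""
    name: str
    output: str = ""
    approved: bool = False
    rounds: int = 0


def run_pipeline(
    idea: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[str, str, str], str],
    evaluate: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> List[Step]:
    """Plan the steps, then execute and evaluate each one until it passes."""
    steps = [Step(name) for name in plan(idea)]
    for step in steps:
        feedback = ""
        for _ in range(max_rounds):
            step.rounds += 1
            step.output = execute(step.name, idea, feedback)
            ok, feedback = evaluate(step.name, step.output)
            if ok:
                step.approved = True
                break
        # A step that is still unapproved here would be handed to the user
        # for feedback instead of being retried forever.
    return steps


# Stubs standing in for model calls (assumptions, for illustration only).
def stub_plan(idea: str) -> List[str]:
    return ["script", "storyboard", "voiceover"]


def stub_execute(step: str, idea: str, feedback: str) -> str:
    draft = f"{step} for '{idea}'"
    if feedback:
        draft += " [revised]"
    return draft


def stub_evaluate(step: str, output: str) -> Tuple[bool, str]:
    # Reject the first storyboard draft to exercise the correction loop.
    if step == "storyboard" and "[revised]" not in output:
        return False, "add shot descriptions"
    return True, ""


if __name__ == "__main__":
    for s in run_pipeline("a 2D scene as live action", stub_plan, stub_execute, stub_evaluate):
        print(s.name, s.approved, s.rounds)
```

Capping the retries with `max_rounds` keeps a self-correction loop from spinning indefinitely; anything the evaluator never approves falls through to the user feedback path.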
Challenges we ran into
The biggest hurdles were prompt engineering for consistent creative output and managing API rate limits during long workflows. Debugging self-correction loops was tricky but taught me how powerful iterative reasoning can be.
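One common way to cope with rate limits in a long workflow is retrying with exponential backoff. The sketch below shows the general technique only and is not necessarily how this project handles it: `RateLimitError` is a hypothetical stand-in for whatever exception the API client raises on a rate-limit response, and `call_with_backoff` is an illustrative helper name.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the client's rate-limit exception."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponentially growing waits."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A little random jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("max_attempts must be at least 1")
```

Wrapping each model or media API call as `call_with_backoff(lambda: some_api_call(...))` keeps the retry policy in one place; `sleep` is injectable so the helper can be tested without actually waiting.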
Accomplishments that we're proud of
We have a functioning prototype that generates scripts, storyboards, and voiceovers, and executes steps with self-correction and user feedback loops.
What we learned
I learned Gemini 3 excels at complex, multi-step orchestration when given clear tools and evaluation criteria, unlocking workflows that feel almost human.
What's next for Enno Creative Orchestrator
Continuing development and testing to ship a production-ready MVP in the next couple of months.
Built With
- firebase
- gcloud
- gemini
- go
- nextjs
- python
- typescript