Inspiration
Short films have always required a full production crew — scriptwriters, storyboard artists, cinematographers, voice actors, composers, and editors. We wanted to know: what if a single person with just an idea could direct an entire film? The democratisation of generative AI made this feel possible for the first time. We were inspired by how Google's Veo 3.0, Imagen 4, and Gemini could each handle a distinct slice of the filmmaking craft, and challenged ourselves to wire them into one coherent, autonomous pipeline that mirrors how a real production actually works — pre-production, production, and post.
What it does
Film Master is a conversational AI film director. You describe your idea — a genre, a mood, a one-line premise — and the agent produces a complete short film: script, shot list, character portraits, location reference art, scene images, and Veo-generated video clips with embedded dialogue audio, all assembled into a final MP4. The pipeline has two human checkpoints. After the agent generates the film concept, you approve or redirect it. After character portraits and location images are ready, you review them and can request individual regenerations. From there, the full production runs autonomously, generating scene images, video clips, and the final assembly with no further input needed.
How we built it
We built Film Master on Google's Agent Development Kit (ADK), using a composite agent architecture:

- A single conversational LlmAgent (Gemini 2.5 Flash) acts as the root director, orchestrating the pipeline and maintaining the approval checkpoints
- A ParallelAgent runs character portrait generation (Imagen 4 Ultra) and location image generation (Gemini) simultaneously during pre-production
- A SequentialAgent chains the production stages — scene images → Veo 3.0 video clips → moviepy assembly — with each stage reading GCS URIs written by the previous one through shared ADK session state
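The layout above can be sketched with minimal stand-in classes. This is not the real ADK API (the actual build uses google.adk's LlmAgent, ParallelAgent, and SequentialAgent); it only illustrates how the pre-production and production stages nest:

```python
# Minimal stand-in for the composite-agent layout. Real agents call
# model / image / video APIs; here they just record execution order.
from concurrent.futures import ThreadPoolExecutor

class Agent:
    def __init__(self, name):
        self.name = name
    def run(self, log):
        log.append(self.name)

class ParallelAgent:
    def __init__(self, sub_agents):
        self.sub_agents = sub_agents
    def run(self, log):
        # pre-production sub-agents run concurrently
        with ThreadPoolExecutor() as pool:
            list(pool.map(lambda a: a.run(log), self.sub_agents))

class SequentialAgent:
    def __init__(self, sub_agents):
        self.sub_agents = sub_agents
    def run(self, log):
        # production stages run strictly in order
        for agent in self.sub_agents:
            agent.run(log)

pipeline = SequentialAgent([
    ParallelAgent([Agent("character_portraits"), Agent("location_images")]),
    Agent("scene_images"),
    Agent("video_clips"),
    Agent("assembly"),
])
log = []
pipeline.run(log)
```

The two portrait/location entries land first in either order (they ran in parallel), followed by the three production stages in sequence.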
Challenges we ran into
Visual consistency across generations. Characters would subtly shift appearance, and colour palettes would drift between shots. We addressed this at the shot-design stage by forcing the model to repeat each character's full canonical description verbatim in every shot, reuse identical location_description text for the same location, and apply a global visual style rule to the entire shot list before writing individual scenes.
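The fix amounts to making every shot prompt self-contained. A sketch of the idea, with illustrative character, location, and function names (not the actual pipeline code):

```python
# Shot-design-stage consistency enforcement: every shot prompt repeats the
# full canonical character and location descriptions verbatim, plus a global
# style rule, so the image model never has to infer continuity.
CHARACTERS = {
    "Mara": "Mara, a wiry woman in her 40s with cropped silver hair, "
            "a scarred left eyebrow, and a faded green flight jacket",
}
LOCATIONS = {
    "hangar": "an abandoned aircraft hangar, rusted ribs overhead, "
              "dawn light through broken skylights",
}
GLOBAL_STYLE = "35mm film look, muted teal-and-amber palette, shallow depth of field"

def build_shot_prompt(action, character_ids, location_id):
    # Canonical descriptions are inserted verbatim, never paraphrased per shot
    chars = "; ".join(CHARACTERS[c] for c in character_ids)
    return f"{GLOBAL_STYLE}. {LOCATIONS[location_id]}. {chars}. {action}"

prompt = build_shot_prompt("She kneels beside the wreckage", ["Mara"], "hangar")
```

Because the canonical strings are looked up from a single table rather than rewritten per shot, two shots in the same location are guaranteed to carry byte-identical description text.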
Accomplishments that we're proud of
- A complete end-to-end film — concept to assembled MP4
- The Gemini image evaluator acts as an automated quality gate, retrying generation until a threshold score is reached rather than accepting the first output
- Shot-list coherence enforced at the design stage means the image generator receives a self-consistent, cinematographer-grade brief for every scene rather than relying on the generator to infer continuity
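The evaluator quality gate reduces to a score-and-retry loop. A minimal sketch, where the generate and evaluate callables, the threshold, and the attempt cap are all illustrative assumptions rather than the project's actual values:

```python
# Automated quality gate: regenerate until the evaluator's score clears a
# threshold, instead of accepting the first output. Falls back to the
# best-scoring attempt if the cap is hit first.
def generate_with_quality_gate(generate, evaluate, threshold=0.8, max_attempts=3):
    best_image, best_score = None, -1.0
    for _ in range(max_attempts):
        image = generate()
        score = evaluate(image)          # e.g. a Gemini rubric score in [0, 1]
        if score > best_score:
            best_image, best_score = image, score
        if score >= threshold:
            break                        # passed the gate, stop retrying
    return best_image, best_score

# Toy stand-ins: each "generation" scores higher than the last
scores = iter([0.55, 0.7, 0.9])
image, score = generate_with_quality_gate(lambda: "img", lambda _: next(scores))
```

Keeping the best attempt, rather than the last one, means a capped-out run still returns the strongest image seen.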
What we learned
- Veo 3.0 generates surprisingly good dialogue audio natively from the prompt text, making a separate Lyria music + TTS pipeline unnecessary.
- ADK session state is a clean way to pass large assets (GCS URIs, artifact names) between pipeline stages without coupling agents to each other directly.
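The session-state handoff in the second point looks roughly like this. The dict here is a stand-in for ADK session state, and the key names and URIs are illustrative:

```python
# Dict-based stand-in for ADK session state: stages share data only through
# named keys, so the video stage never imports or calls the image stage.
session_state = {}

def scene_image_stage(state):
    # production code would call Imagen here; these URIs are illustrative
    state["scene_image_uris"] = ["gs://film-assets/scene_01.png",
                                 "gs://film-assets/scene_02.png"]

def video_clip_stage(state):
    # reads only the agreed-upon key, not the producing agent
    state["clip_uris"] = [uri.replace(".png", ".mp4")
                         for uri in state["scene_image_uris"]]

scene_image_stage(session_state)
video_clip_stage(session_state)
```

Swapping out the image generator then only requires preserving the `scene_image_uris` contract, not touching any downstream stage.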
What's next for Film Master
The next step is Granular Post-Production. We want to implement an "Editor's Suite" where users can highlight a specific 2-second window of the video and ask for a targeted change (like "make the explosion bigger" or "change the lighting to moonlight") without re-rendering the entire project. We also aim to integrate Lyria 3 for natively generated, emotionally-synced cinematic scores.
Built With
- adk
- gemini-2.5
- veo-3