Inspiration
The gap between a “cool idea” and a professional cinematic pitch is usually filled by weeks of manual labor: concept writing, storyboarding, character design, voice narration, animation, and editing.
I wanted to build an AI Creative Director that could bridge that gap instantly.
The inspiration came from the idea of interleaved multimodal output: an AI system that doesn’t just respond with text but constructs a full creative world in real time. Instead of describing a character, it generates the character. Instead of explaining a story, it presents a cinematic experience.
Aura Genesis explores what happens when multiple specialized AI models collaborate like a production team to transform a simple prompt into a complete cinematic pitch.
What it does
Aura Genesis is an AI-powered multimedia agent that transforms a single prompt into a cohesive cinematic presentation.
It orchestrates multiple models to simulate a mini film production pipeline.
The Brain
Generates a structured creative brief including character identity, tone, world setting, and narrative context.
The Vision
Creates a high-quality character portrait and a cinematic movie poster with embedded typography while maintaining visual consistency between assets.
The Motion
Animates the character portrait into a high-quality cinematic video clip.
The Voice
Narrates the story using the deep and authoritative Fenrir voice profile to create a dramatic storytelling experience.
The Simulation
Combines all generated assets into a full-screen interactive cinematic premiere where:
- Video plays
- Credits scroll dynamically
- Audio narration plays
- Visuals remain synchronized
The result feels less like an AI output and more like watching the opening sequence of a film pitch.
How we built it
Aura Genesis is powered by a Multimodal Model Symphony orchestrated through the Google GenAI SDK.
Frontend
- Next.js (App Router) for server/client orchestration
- Tailwind CSS for styling
- Framer Motion for cinematic animations and scrolling credit effects
AI Orchestration
Gemini 3.0 Flash Preview
- Acts as the Creative Director agent
- Converts a user prompt into a structured JSON creative brief
- Coordinates instructions for downstream models
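The creative brief is the contract between the Creative Director and every downstream model, so it helps to validate it before fanning it out. The following is a minimal TypeScript sketch of such a contract. The field names (`character`, `tone`, `world`, `narrative`, `posterTitle`) and the helper functions are hypothetical, chosen to mirror the character identity, tone, world setting, and narrative context described above, not the project's actual schema.

```typescript
// Hypothetical shape of the JSON creative brief; field names are illustrative.
interface CreativeBrief {
  character: { name: string; appearance: string }; // character identity
  tone: string; // overall mood, e.g. "noir"
  world: string; // world setting
  narrative: string; // narrative context used for the narration script
  posterTitle: string; // title rendered on the poster
}

// Returns obj[key] if it is a non-empty string, otherwise throws with the field path.
function requireString(obj: Record<string, unknown>, key: string, path = key): string {
  const value = obj[key];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Creative brief is missing "${path}"`);
  }
  return value;
}

// Narrows an unknown JSON value to a plain object (arrays and null rejected).
function asObject(value: unknown, what: string): Record<string, unknown> {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new Error(`${what} must be a JSON object`);
  }
  return value as Record<string, unknown>;
}

// Parses the model's raw text and rejects briefs with missing or empty fields,
// so a bad response fails here instead of inside a downstream image or video call.
function parseCreativeBrief(raw: string): CreativeBrief {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("Creative brief is not valid JSON");
  }
  const obj = asObject(data, "Creative brief");
  const character = asObject(obj.character, "Creative brief character");
  return {
    character: {
      name: requireString(character, "name", "character.name"),
      appearance: requireString(character, "appearance", "character.appearance"),
    },
    tone: requireString(obj, "tone"),
    world: requireString(obj, "world"),
    narrative: requireString(obj, "narrative"),
    posterTitle: requireString(obj, "posterTitle"),
  };
}
```

Asking Gemini for JSON output (for example with `responseMimeType: "application/json"` in the request config of the `@google/genai` SDK) makes well-formed JSON more likely, but a local check like this still catches responses that parse yet omit a field.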
Visual Generation
Imagen 3
- Generates the initial 3:4 character portrait
Imagen 3 Pro
- Generates a 16:9 cinematic poster
- Uses the portrait as a reference image
- Embeds stylized film typography while maintaining character identity
Cinematic Generation
Veo 3.1 Fast
- Converts the portrait prompt into a dynamic video clip
- Adds motion and cinematic framing
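Video is the slowest asset to produce, and in the Gemini API Veo jobs are returned as long-running operations that the caller polls until they finish. A generic polling helper along these lines keeps that wait bounded. `pollUntilDone` and its options are hypothetical names, and the `check` callback stands in for whatever SDK call fetches the current operation state.

```typescript
interface PollOptions {
  intervalMs: number; // delay between status checks
  timeoutMs: number; // give up after roughly this much total waiting
}

// Calls `check` repeatedly until it returns a defined result or the deadline passes.
// `check` returns undefined while the job is still running.
async function pollUntilDone<T>(
  check: () => Promise<T | undefined>,
  { intervalMs, timeoutMs }: PollOptions,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await check();
    if (result !== undefined) return result;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Job did not finish within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(() => resolve(), intervalMs));
  }
}
```

Returning `undefined` for "still running" keeps the helper independent of the SDK's operation type: the caller maps a finished operation to whatever result (for example, a video URI) the next step needs.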
Audio Narration
Lyria 3
- Produces cinematic narration
- Uses the Fenrir voice profile for dramatic tone
Infrastructure
- Google Cloud Run for scalable serverless deployment
- GitHub CI/CD for automated builds
- Google Cloud Secret Manager for secure API key storage
- Containerized deployment with Docker
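On Cloud Run, a secret stored in Secret Manager can be exposed to the container as an environment variable, and the platform tells the container which port to listen on through the `PORT` variable (8080 by default). A loader that fails fast at startup when the key is missing is a cheap safeguard. The variable name `GEMINI_API_KEY` and the `loadServerConfig` function are assumptions for illustration, not taken from the project.

```typescript
interface ServerConfig {
  apiKey: string;
  port: number;
}

// Reads configuration from an environment map (pass `process.env` in Node).
// GEMINI_API_KEY is a hypothetical name for the variable mapped from Secret Manager.
function loadServerConfig(env: Record<string, string | undefined>): ServerConfig {
  const apiKey = env.GEMINI_API_KEY;
  if (!apiKey) {
    throw new Error("GEMINI_API_KEY is not set; check the Cloud Run secret mapping");
  }
  // Cloud Run injects PORT; fall back to 8080 for local runs.
  const port = Number(env.PORT ?? "8080");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT value: ${env.PORT}`);
  }
  return { apiKey, port };
}
```

Calling `loadServerConfig(process.env)` once at startup means a missing secret shows up as a failed revision instead of as the first user's failed generation.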
Challenges we ran into
Maintaining Character Consistency Across Models
One of the biggest technical challenges was ensuring that the character generated in the portrait remained visually consistent in the poster and video assets.
Different models interpret prompts slightly differently, which can result in:
- facial structure drift
- costume changes
- inconsistent color palettes
To solve this, we implemented image-to-image referencing with Imagen 3 Pro, where the initial portrait becomes the canonical visual reference for the poster generation step.
This significantly improved visual continuity across assets.
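One way to make the portrait the canonical reference is to send its bytes alongside the poster prompt in the same request. The sketch below builds a request body in the `contents`/`parts` shape used by the Gemini API's `generateContent` call in the JavaScript SDK, where an image travels as an `inlineData` part (base64 data plus MIME type) next to a text part. The `buildPosterRequest` helper, the prompt wording, and the target endpoint are assumptions for illustration; this is not necessarily how the project's Imagen 3 Pro call is structured.

```typescript
interface InlineDataPart {
  inlineData: { mimeType: string; data: string }; // data is base64-encoded image bytes
}
interface TextPart {
  text: string;
}
type Part = InlineDataPart | TextPart;

interface PosterRequest {
  contents: { role: "user"; parts: Part[] }[];
}

// Sends the portrait as a reference image, followed by instructions that name
// the drift-prone attributes (face, costume, palette) the poster must preserve.
function buildPosterRequest(
  portraitBase64: string,
  title: string,
  sceneDescription: string,
  mimeType = "image/png",
): PosterRequest {
  const instructions = [
    `Create a 16:9 cinematic movie poster titled "${title}".`,
    sceneDescription,
    "Keep the character from the reference image identical: same face, costume, and color palette.",
  ].join("\n");
  return {
    contents: [
      {
        role: "user",
        parts: [{ inlineData: { mimeType, data: portraitBase64 } }, { text: instructions }],
      },
    ],
  };
}
```

Threading the same base64 portrait into every later step gives the pipeline a single source of truth for the character's look, instead of re-describing it in text for each model.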
Synchronizing Multimodal Outputs
Simulate Mode required precise synchronization between:
- dynamically generated narration audio
- scrolling credits animation
- video playback timing
Since Lyria generates audio of variable duration, we implemented custom React logic that:
- Detects the final audio duration
- Dynamically calculates scroll speed for the credit crawl
- Synchronizes animation timing to match the narration
This ensures the cinematic sequence behaves like a coordinated film intro rather than independent components playing separately.
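The timing calculation itself is small. In the browser, the narration's length is available from the audio element's `duration` property once its `loadedmetadata` event has fired (before that it is `NaN`, and for unbounded streams it is `Infinity`). The credits then need to travel their own height plus the viewport height in that time. The function below is a hypothetical sketch of that arithmetic (its name and the optional tail padding are assumptions), kept free of React so it can be unit-tested; a component could pass `durationSec` into a Framer Motion `transition`, whose `duration` is in seconds.

```typescript
interface CrawlTiming {
  distancePx: number; // total vertical travel of the credits block
  durationSec: number; // how long the crawl should take
  speedPxPerSec: number; // resulting scroll speed
}

// The credits start just below the viewport and end just above it, finishing
// when the narration ends (plus optional padding so the last card can linger).
function computeCrawlTiming(
  audioDurationSec: number,
  creditsHeightPx: number,
  viewportHeightPx: number,
  tailPaddingSec = 0, // hypothetical extra time after the last spoken word
): CrawlTiming {
  // Rejects NaN (metadata not loaded yet) and Infinity (streaming source).
  if (!Number.isFinite(audioDurationSec) || audioDurationSec <= 0) {
    throw new Error("Audio duration must be a positive, finite number of seconds");
  }
  const distancePx = creditsHeightPx + viewportHeightPx;
  const durationSec = audioDurationSec + tailPaddingSec;
  return { distancePx, durationSec, speedPxPerSec: distancePx / durationSec };
}
```

For example, 30 seconds of narration with 2,400 px of credits in a 600 px viewport gives 3,000 px of travel at 100 px/s.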
Managing Multi-Model Orchestration
Coordinating multiple generative models required careful orchestration:
- Gemini generates structured instructions
- Outputs are passed between models
- Results are stitched together into a final experience
Handling asynchronous generation while maintaining fast perceived responsiveness required thoughtful state management in the Next.js application.
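The dependency structure above can be made explicit in code. In this sketch the brief comes first, the portrait depends on the brief, and the poster and video both depend on the portrait; narration is assumed to need only the brief, so it runs alongside the visual chain. The stage functions are injected and entirely hypothetical (stand-ins for the real model calls), as is the `onProgress` callback, which is one way to feed UI state so each asset can appear as soon as it lands.

```typescript
type Stage = "brief" | "portrait" | "poster" | "video" | "narration";

// Hypothetical stage implementations; in the app these would wrap model calls.
interface Stages<Brief, Image, Video, Audio> {
  brief: (prompt: string) => Promise<Brief>;
  portrait: (brief: Brief) => Promise<Image>;
  poster: (brief: Brief, portrait: Image) => Promise<Image>;
  video: (brief: Brief, portrait: Image) => Promise<Video>;
  narration: (brief: Brief) => Promise<Audio>;
}

interface Pitch<Brief, Image, Video, Audio> {
  brief: Brief;
  portrait: Image;
  poster: Image;
  video: Video;
  narration: Audio;
}

async function runPipeline<Brief, Image, Video, Audio>(
  prompt: string,
  stages: Stages<Brief, Image, Video, Audio>,
  onProgress: (stage: Stage) => void = () => {},
): Promise<Pitch<Brief, Image, Video, Audio>> {
  // Reports each stage to the UI the moment it resolves.
  const track = <T>(stage: Stage, work: Promise<T>): Promise<T> =>
    work.then((value) => {
      onProgress(stage);
      return value;
    });

  const brief = await track("brief", stages.brief(prompt));

  // Narration starts immediately; the visual chain waits for the portrait,
  // then generates poster and video concurrently from it.
  const narrationP = track("narration", stages.narration(brief));
  const visualsP = track("portrait", stages.portrait(brief)).then((portrait) =>
    Promise.all([
      portrait,
      track("poster", stages.poster(brief, portrait)),
      track("video", stages.video(brief, portrait)),
    ]),
  );

  // Both branches are awaited together so a failure in either is handled once.
  const [[portrait, poster, video], narration] = await Promise.all([visualsP, narrationP]);
  return { brief, portrait, poster, video, narration };
}
```

Awaiting both branches in a single `Promise.all` matters: awaiting the portrait first and the narration later would leave a narration failure unobserved in the meantime.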
Accomplishments that we're proud of
The “Simulate” Cinematic Mode
The feature we are most proud of is Simulate Mode.
When activated, the application transitions from a traditional interface into a full-screen cinematic presentation where:
- the generated video plays
- narration begins
- movie credits scroll
- visuals and audio stay in sync
It transforms the AI output into a storytelling experience rather than a dataset of generated media.
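A common source of drift in a sequence like this is starting each element as soon as it individually loads. A sketch of the alternative: wait until every element reports ready, then start them all in the same tick. The `Cue` interface and `startTogether` function are hypothetical; in a browser, `ready` for the video and narration could resolve on the media element's `canplaythrough` event, and `start` could call its `play()` method, which returns a promise.

```typescript
// Anything Simulate Mode has to start in lockstep: video, narration, credit crawl.
interface Cue {
  name: string;
  ready: Promise<void>; // resolves when the element can start without stalling
  start: () => void | Promise<void>; // begins playback or animation
}

// Waits until every cue is ready (or the timeout expires), then calls every
// start() synchronously in one pass so no element gets a head start.
async function startTogether(cues: Cue[], timeoutMs: number): Promise<void> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Cues not ready within ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    await Promise.race([Promise.all(cues.map((c) => c.ready)), timeout]);
  } finally {
    clearTimeout(timer);
  }
  const started = cues.map((c) => c.start());
  await Promise.all(started);
}
```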
True Multimodal Orchestration
Aura Genesis successfully demonstrates how multiple specialized models can collaborate like a creative production team.
Instead of relying on a single model, the system assigns distinct roles:
- Creative Director (Gemini)
- Concept Artist (Imagen)
- Cinematographer (Veo)
- Narrator (Lyria)
In our experience, this architecture produces higher-fidelity creative output than any single model could on its own.
Production-Ready Deployment
The application is fully containerized and deployed using Google Cloud Run, allowing it to scale automatically for users without requiring dedicated servers.
This makes Aura Genesis not just a prototype but a deployable creative AI platform.
What we learned
Orchestration Beats Monolithic AI
One of the most important insights from this project is that the future of AI applications lies in orchestration rather than single-model dominance.
By assigning specific creative roles to specialized models, we can achieve:
- higher output quality
- better control over creative results
- modular architecture for improvements
Multimodal UX Matters
Most AI tools focus on text responses, but creative workflows benefit enormously from visual and experiential output.
Designing an experience where:
- visuals
- sound
- animation
- narrative
all work together dramatically increases the perceived intelligence and creativity of the system.
Serverless Infrastructure Simplifies AI Products
Using Google Cloud Run significantly simplified deployment.
It allowed us to:
- containerize the application
- scale automatically with usage
- integrate secure secrets via Secret Manager
- deploy quickly from GitHub
This infrastructure makes it feasible to run media-heavy AI pipelines without complex DevOps overhead.
What's next for Aura Genesis
The next phase of Aura Genesis will focus on interactive creative collaboration.
Real-Time Creative Direction
Integrate the Gemini Live API so creators can talk directly to the AI Creative Director to:
- adjust character appearance
- modify story tone
- regenerate scenes in real time
Persistent Cinematic Universes
Using Vertex AI and cloud storage, we plan to enable creators to:
- store generated characters
- expand stories into universes
- build interconnected cinematic worlds over time
Expanded Media Pipeline
Future iterations could include:
- AI-generated soundtrack scoring
- scene generation and storyboarding
- multi-character casting
- full trailer generation
The long-term vision is to evolve Aura Genesis into an AI-powered cinematic studio.
Built With
- css
- google-cloud
- google-genai
- nextjs
- react
- tailwind