Inspiration

The idea started when we wanted to create a video for our product, something good enough for ads and demos, and quickly realized how expensive and time-consuming the process was. Hiring editors and motion designers was far beyond our budget. Existing tools either required heavy manual work or produced results that didn’t feel intentional.

Instead of giving up on the idea, we asked a simple question:

What if creating a high-quality video could work the same way a good team works, by breaking ideas into parts, refining them, reviewing progress, and improving iteratively?

That question led us to build Agent Studio: a system where multiple AI agents collaborate to transform a raw idea into a structured, high-quality video, scene by scene, with visible progress and continuous feedback.

What it does

Agent Studio is a multi-agent video generation system that takes a user’s idea (for example, an ad concept or script outline) and turns it into a complete video.

It follows this process:

- Breaks a high-level idea or script into structured scenes.
- Assigns each scene to specialized agents that generate and refine visuals.
- Scores each scene's quality and intent alignment with a separate evaluation agent.
- Loops through refinement until scenes meet the expected visual and narrative quality.
- Lets users see progress in real time, give feedback, and approve changes.
- Exports the final video once all scenes are approved and ordered correctly.

How we built it

We designed Agent Studio as a multi-agent orchestration system, not a single monolithic model.

Core Architecture

Director Agent
- Acts as the orchestrator
- Receives the script + expectations
- Assigns tasks to scene agents
- Tracks progress and refinement status per scene

Scene Generation Agents
- Responsible for generating visuals for individual scenes
- Iterate on prompts, composition, and visual coherence
- Communicate results back to the Director

Evaluation Agent
- Reviews generated scenes against expectations
- Detects visual issues, inconsistencies, or missing elements
- Sends structured feedback for refinement
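The structured feedback can be as small as a score plus a list of issues. The toy evaluator below stands in for the model-backed one, using keyword matching where the real system uses a vision model:

```python
from dataclasses import dataclass, field


@dataclass
class SceneFeedback:
    """Structured review the evaluation agent sends back to the Director."""
    scene_id: int
    score: float                 # 0.0 to 1.0 alignment with the scene brief
    issues: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # A scene passes only with a high score and no open issues.
        return self.score >= 0.8 and not self.issues


def evaluate_scene(scene_id: int, brief: str, caption: str) -> SceneFeedback:
    """Toy evaluator: checks that every key element from the brief
    appears in the generated scene's caption."""
    missing = [w for w in brief.lower().split() if w not in caption.lower()]
    score = 1.0 - len(missing) / max(len(brief.split()), 1)
    issues = [f"missing element: {w}" for w in missing]
    return SceneFeedback(scene_id, score, issues)
```

Because the feedback is structured rather than free text, the Director can route each issue back to the responsible scene agent mechanically.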

Queue & Orchestration Layer
- Handles task scheduling, retries, and agent coordination
- Enables parallel scene processing and iterative feedback loops
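A rough sketch of the retry-and-parallelism idea, assuming a bounded retry budget per scene. The thread-pool approach and the `MAX_RETRIES` value are illustrative:

```python
import concurrent.futures

MAX_RETRIES = 3  # assumed retry budget per scene


def run_with_retries(task_fn, scene_id: int) -> str:
    """Retry a flaky generation call a bounded number of times."""
    last_error = None
    for _attempt in range(MAX_RETRIES):
        try:
            return task_fn(scene_id)
        except RuntimeError as err:
            last_error = err
    raise RuntimeError(f"scene {scene_id} failed after {MAX_RETRIES} attempts") from last_error


def process_scenes(task_fn, scene_ids: list[int]) -> dict[int, str]:
    """Generate all scenes in parallel; each worker gets its own retry loop."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = {pool.submit(run_with_retries, task_fn, sid): sid for sid in scene_ids}
        return {futures[f]: f.result() for f in concurrent.futures.as_completed(futures)}
```

Running scenes in parallel matters because each scene may take several refinement rounds; serializing them would multiply the total wait.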

Video Generation API
- Uses Nano Banana as the image/video generation backend
- Abstracted behind a clean API so models can be swapped later
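The abstraction can be expressed as a small interface. The wrapper below is stubbed, since the real Nano Banana call details are not shown here:

```python
from typing import Protocol


class GenerationBackend(Protocol):
    """Minimal interface every image/video backend must satisfy."""

    def generate(self, prompt: str) -> bytes: ...


class NanoBananaBackend:
    """Thin wrapper around the Nano Banana backend (API call stubbed)."""

    def generate(self, prompt: str) -> bytes:
        # In production this would hit the real endpoint; stubbed for illustration.
        return f"nano-banana-render:{prompt}".encode()


def render_scene(backend: GenerationBackend, prompt: str) -> bytes:
    # The pipeline depends only on the Protocol, so backends swap freely.
    return backend.generate(prompt)
```

Any future model only needs to implement `generate`, so upgrading the backend never touches the agents or the orchestration layer.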

Workflow Logic

The system runs in a loop:
1. The input script is parsed into scenes
2. Scenes are generated independently
3. The evaluation agent reviews outputs
4. Feedback is routed back to the responsible agent
5. Each scene is refined until it meets quality thresholds
6. Approved scenes are ordered and exported
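The per-scene part of that loop can be sketched as a single function, with `generate` and `evaluate` as hypothetical stand-ins for the real agents:

```python
def refine_until_approved(generate, evaluate, prompt: str, max_rounds: int = 5):
    """Generate → evaluate → refine loop for one scene.

    `generate` turns a prompt into a draft; `evaluate` returns
    (passed, feedback). Both are illustrative callables here.
    """
    for round_num in range(1, max_rounds + 1):
        draft = generate(prompt)
        passed, feedback = evaluate(draft)
        if passed:
            return draft, round_num
        # Fold the evaluator's notes back into the prompt for the next pass.
        prompt = f"{prompt}. Fix: {feedback}"
    raise RuntimeError("scene did not converge within the round budget")
```

Capping the rounds keeps a stubborn scene from stalling the whole pipeline; in that case the scene is surfaced to the user instead of retried forever.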

This loop continues until all scenes pass evaluation, ensuring consistent quality across the final video.

Challenges we ran into

State management across agents
Keeping track of scene versions, feedback cycles, and completion states required careful orchestration.

Avoiding brittle one-shot generation
Most video tools generate once and stop. Designing a reliable refinement loop took multiple iterations.

Accomplishments that we're proud of

- Built a working multi-agent video pipeline, not just a demo
- Implemented iterative refinement, not one-shot generation
- Designed a system where progress is visible and controllable
- Created a flexible architecture that can scale to longer or more complex videos
- Abstracted generation APIs to allow future model upgrades

What we learned

- Multi-agent systems work best when responsibilities are clearly separated
- Feedback loops are more valuable than raw generation power
- Orchestration and evaluation matter just as much as generation
- Creative tools feel more trustworthy when users can see and influence progress

What's next for Agent Studio

- Add audio and voice-over agents
- Support longer-form content (tutorials, explainers)
- Improve evaluation agents with stronger visual consistency checks
