Inspiration
FMV Studio was inspired by how fragmented the current AI video workflow still is. Even with powerful models, making a polished AI music video usually means jumping between separate tools for writing, storyboarding, image generation, video generation, music, and editing. That slows creators down and makes continuity, timing, and iteration much harder than they should be. I wanted to build a system that feels less like operating a pipeline of disconnected tools and more like directing one connected production environment.
What it does
FMV Studio is an agentic, stage-based music video studio. A project starts with a screenplay, soundtrack, lore, and uploaded assets, then moves through Planning, Storyboarding, Filming, and Production while preserving context across the workflow and ensuring character, asset, and scene consistency. It can generate shot plans, storyboard frames, video clips, and a final edited sequence. It also includes a realtime Live Director that lets the user guide the system through voice or text, plus an asset system that turns uploaded images, documents, audio, and video into meaningful project context rather than passive attachments.
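The stage progression above can be sketched in miniature. This is a toy illustration, not the actual implementation; all class and field names here are assumed, but it shows the core idea of each stage merging its outputs into a shared context that later stages can read:

```python
from enum import Enum

class Stage(Enum):
    PLANNING = 1
    STORYBOARDING = 2
    FILMING = 3
    PRODUCTION = 4

class Project:
    """Toy project that accumulates context as it advances through stages."""
    def __init__(self):
        self.stage = Stage.PLANNING
        self.context = {}  # shared context: characters, assets, scene notes

    def complete_stage(self, outputs: dict):
        # Merge this stage's outputs into the shared context so
        # downstream stages see everything produced so far.
        self.context.update(outputs)
        members = list(Stage)
        idx = members.index(self.stage)
        if idx + 1 < len(members):
            self.stage = members[idx + 1]

project = Project()
project.complete_stage({"shot_plan": ["wide shot", "close-up"]})
project.complete_stage({"storyboard_frames": 12})
print(project.stage)  # Stage.FILMING
```

The point of the shared context is continuity: a character defined during Planning is still visible to Filming without re-entry.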
How we built it
I used Google Antigravity and OpenAI Codex to assist with coding. The frontend is built with Next.js and React, and the backend is a FastAPI orchestration service. On the model side, Gemini 3.1 Pro acts as the main orchestrator for planning, asset routing, prompt rewriting, and Live Director intent handling. Gemini 3 Flash powers the evaluator layer for fast critique and review. Gemini Live API powers the realtime Live Director voice experience. Lyria 2 is used for music generation, NanoBanana 2 powers image generation, and Veo 3.1 on Vertex AI is used for video generation. On the cloud side, the system is deployed on Cloud Run, with long-running jobs handled through Cloud Tasks and project media stored in Google Cloud Storage. I also automated deployment with Terraform and deployment scripts so the project is reproducible and cloud-native.
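One way to picture the orchestration layer's division of labor is a routing table from task type to model. This is a hypothetical sketch (the routing keys and model identifiers here are illustrative, not the project's real configuration):

```python
# Hypothetical routing table: which model handles which kind of task.
MODEL_ROUTES = {
    "plan": "gemini-3.1-pro",      # orchestration, planning, prompt rewriting
    "critique": "gemini-3-flash",  # fast evaluator / review passes
    "image": "nanobanana-2",       # storyboard frame generation
    "video": "veo-3.1",            # clip generation on Vertex AI
    "music": "lyria-2",            # soundtrack generation
}

def route(task_type: str) -> str:
    """Return the model id for a task, falling back to the orchestrator."""
    return MODEL_ROUTES.get(task_type, MODEL_ROUTES["plan"])

print(route("video"))    # veo-3.1
print(route("unknown"))  # gemini-3.1-pro
```

Keeping a single routing point like this makes it easy to swap providers later, which matters for the provider-flexibility goals mentioned below.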
Challenges we ran into
The hardest problem was making the workflow feel coherent and dependable across many stages. Continuity needed to extend beyond characters to props, locations, documents, and other references. Critics needed to be reliable enough not to hallucinate issues that were not actually present. Stage rewind behavior also became surprisingly complex, because users need to move backward, revise work, and regenerate outputs without losing state or accidentally jumping forward again. Realtime voice control introduced its own challenges as well, since the Live Director needed to do more than talk back; it needed to make safe, meaningful project changes.
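The rewind behavior described above can be reduced to one invariant: rewinding keeps upstream work but invalidates everything downstream, so stale outputs cannot leak forward after a revision. A minimal sketch of that invariant (with assumed names, not the production code):

```python
STAGES = ["planning", "storyboarding", "filming", "production"]

class StageState:
    """Toy rewindable state: upstream outputs survive a rewind,
    downstream outputs are dropped and must be regenerated."""
    def __init__(self):
        self.outputs = {}        # stage name -> generated output
        self.current = "planning"

    def finish(self, stage: str, output):
        self.outputs[stage] = output
        nxt = STAGES.index(stage) + 1
        if nxt < len(STAGES):
            self.current = STAGES[nxt]

    def rewind_to(self, stage: str):
        # Drop the target stage's output and everything after it,
        # so the user cannot accidentally jump forward with stale work.
        for later in STAGES[STAGES.index(stage):]:
            self.outputs.pop(later, None)
        self.current = stage

s = StageState()
s.finish("planning", "shot plan v1")
s.finish("storyboarding", "frames v1")
s.rewind_to("storyboarding")
print(s.current)                      # storyboarding
print("planning" in s.outputs)       # True
print("storyboarding" in s.outputs)  # False
```

Most of the real complexity sits in what "output" means per stage (media files, prompts, critic verdicts), but the invariant itself stays this simple.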
Accomplishments that we're proud of
The thing I’m proudest of is that FMV Studio gives the user a way to direct the whole production and be as hands-on or hands-off as they choose. The Live Director, stage-based workflow, asset-aware context system, and production timeline all work together as one creative environment. I’m also proud that the app runs as a real cloud-deployed system on Google Cloud instead of just a local prototype, with automated deployment and a public repo that shows exactly how it was built.
What we learned
The biggest lesson was that strong multimodal products are not only about model quality. They are also about workflow architecture, state management, trust, and user control. A powerful model can generate a frame or clip, but the harder problem is making all of those capabilities work together in a way that feels coherent and dependable. I also learned that critics need grounding and consensus, and that durable async state becomes essential as soon as media generation moves from local experimentation to a real cloud deployment.
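The "grounding and consensus" idea for critics can be illustrated with a quorum filter: an issue only counts if multiple independent critic passes flag it, which suppresses one-off hallucinated findings. This is a simplified sketch under assumed names, not the actual evaluator:

```python
from collections import Counter

def consensus_issues(critic_reports: list[set[str]], quorum: int = 2) -> set[str]:
    """Keep only issues flagged by at least `quorum` independent critics."""
    counts = Counter(issue for report in critic_reports for issue in report)
    return {issue for issue, n in counts.items() if n >= quorum}

reports = [
    {"character outfit changed", "lighting mismatch"},
    {"character outfit changed"},
    {"prop missing"},
]
print(consensus_issues(reports))  # {'character outfit changed'}
```

A higher quorum trades recall for precision; the right setting depends on how costly a false regeneration is compared with a missed continuity error.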
What's next for FMV Studio
The next step is to keep improving reliability, creative control, and provider flexibility, including support for more models. That includes making the Live Director even more capable, expanding asset-aware continuity, improving production editing, and continuing to optimize the cloud workflow for speed and robustness. Longer term, the goal is for FMV Studio to become a general directing environment for AI-native video production, extending beyond music videos to general-purpose videos and film scenes. The pipeline itself is already robust and can be adapted to other use cases with little effort.
Built With
- antigravity
- cloud-run
- codex
- fastapi
- ffmpeg
- gcs
- genai-sdk
- google-cloud
- next.js
- node.js
- python
- react
- terraform
- vertexai