Inspiration

The idea didn't come in a boardroom; it hit us at 4 AM during a back-to-back hackathon grind.

We were exhausted. We had just spent 48 hours building a complex project, and then the dread set in: we still had to edit the submission video. It’s the "hackathon tax": that tedious, manual process of cutting clips, syncing audio, and hunting for assets when your brain is already fried.

We were coding in Cursor at the time, marveling at how it understood our code, predicted our intent, and fixed our mistakes. One of us blurted out, "Man, I wish there was a Cursor for video editing."

That was the spark. Why are video editors still dumb canvases? Why can't I just tell the timeline what I want? We realized we didn't need another tool with better shortcuts; we needed a Digital Director. We needed CutPilot.

What it does

CutPilot transforms video editing from a manual task into an agentic workflow. It’s an intelligent workspace where you act as the Director, and a Multi-Agent System acts as your film crew.

Instead of dragging sliders for hours, you type: "Make this intro more energetic and add a cinematic transition."

The system then spins up a 4-agent loop to execute your vision:

  1. Eyes (Analyst): Watches your video using Gemini 3 Pro, understanding pacing, lighting, and mood.

  2. Brain (Planner): Formulates a complex edit plan, deciding which tools to use and which AI models (Veo, Imagen) fit the request.

  3. Hands (Executor): Manipulates the timeline code, executes cuts, and moves tracks.

  4. Verifier (QA): The game-changer. It watches the result to ensure it matches your prompt. If the edit drifts, it auto-corrects.
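The four roles above compose into a retry loop: analyze, plan, execute, then verify, replanning on failure. A minimal TypeScript sketch of that loop (all names here are hypothetical stand-ins; in CutPilot each agent wraps a model call):

```typescript
// Hypothetical sketch of the four-agent edit loop. The agent functions
// are stubs here; in the real system each one calls a Gemini model.
type Timeline = { clips: string[] };
type Analysis = { summary: string };
type Plan = { steps: string[] };

interface Crew {
  analyze: (t: Timeline) => Analysis;               // Eyes
  plan: (a: Analysis, prompt: string) => Plan;      // Brain
  execute: (t: Timeline, p: Plan) => Timeline;      // Hands
  verify: (t: Timeline, prompt: string) => boolean; // Verifier
}

function runEditLoop(
  crew: Crew,
  timeline: Timeline,
  prompt: string,
  maxAttempts = 3,
): Timeline {
  let current = timeline;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const analysis = crew.analyze(current);
    const plan = crew.plan(analysis, prompt);
    current = crew.execute(current, plan);
    // The Verifier gates the result; a failed check triggers a replan.
    if (crew.verify(current, prompt)) return current;
  }
  throw new Error("Verifier rejected all attempts");
}
```

The key design point is that the Verifier sits inside the loop, so a bad edit is retried automatically instead of being shown to the user.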

How we built it

The Catalyst: Google AI Studio (70%)

As data scientists rather than software veterans, we leaned on AI Studio as our architectural co-pilot for designing complex systems. We used the Multimodal Playground to simulate our "Eyes" agent—dragging frame buffers into prompts to verify Gemini could "see" the canvas state accurately before writing a single line of code. We then iteratively refined our "Brain" agent's System Instructions to guarantee valid JSON outputs. Finally, the "Get Code" feature let us export this validated logic directly, turning weeks of trial and error into hours of implementation.
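One way to pin a planner to valid JSON is a response schema alongside the system instruction. The sketch below builds such a request config; the field names follow the Google GenAI SDK's `generateContent` config, but the schema itself is an illustrative guess at what an edit plan might look like, not CutPilot's actual schema:

```typescript
// Sketch of a request config that constrains the Brain agent to JSON.
// Field names mirror the Google GenAI SDK's generateContent config;
// the edit-plan schema is hypothetical.
function buildPlannerRequest(userPrompt: string) {
  return {
    model: "gemini-3-pro",
    contents: userPrompt,
    config: {
      systemInstruction:
        "You are a video-edit planner. Respond only with a JSON edit plan.",
      responseMimeType: "application/json",
      responseSchema: {
        type: "object",
        properties: {
          steps: {
            type: "array",
            items: {
              type: "object",
              properties: {
                tool: { type: "string" }, // e.g. "trim", "veo_generate"
                args: { type: "object" },
              },
              required: ["tool"],
            },
          },
        },
        required: ["steps"],
      },
    },
  };
}
```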

The Execution: High-Performance Architecture (30%)

We wrapped this logic in a React 19 shell using the Google GenAI SDK.

  • The Engine: A custom "Loop Runner" orchestrates the handoff between planning agents and the Tool Registry.
  • The Foundry: We integrated Veo 3.1 and Imagen for asset generation, using Gemini 3 Flash for real-time vision.
  • Performance: To handle 50+ AI edits per second, we used an Observable Store pattern that bypasses React's standard state-update cycle, keeping the timeline smooth in real time.
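The Observable Store idea can be sketched in a few lines: mutations notify subscribers directly instead of routing every change through component state. This is a minimal illustration, not CutPilot's actual store:

```typescript
// Minimal sketch of an observable store: subscribers are notified
// synchronously on each mutation, outside React's re-render cycle.
type Listener<S> = (state: S) => void;

class ObservableStore<S> {
  private listeners = new Set<Listener<S>>();
  constructor(private state: S) {}

  get(): S {
    return this.state;
  }

  // Apply a mutation and notify every subscriber with the new state.
  update(mutate: (s: S) => S): void {
    this.state = mutate(this.state);
    this.listeners.forEach((l) => l(this.state));
  }

  // Returns an unsubscribe function for cleanup.
  subscribe(l: Listener<S>): () => void {
    this.listeners.add(l);
    return () => this.listeners.delete(l);
  }
}
```

Components that need the state can subscribe to the store (for example via React's `useSyncExternalStore`), while high-frequency agent edits mutate it directly.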

Challenges we ran into

Building an AI that "sees" time was a nightmare.

  • The "Hallucinating Editor": Early versions of the Brain agent would try to trim clips that didn't exist or move audio to non-existent tracks. We had to build a strict Verifier agent to act as a "sanity check" layer, preventing the AI from breaking the timeline.

  • Context Window Fatigue: Video data is heavy. Feeding every frame to the LLM blew up our context window instantly. We had to engineer a sampling algorithm that allows the Eyes agent to "glance" at the video intelligently (survey mode) rather than watching every millisecond.

  • Async Chaos: Syncing the UI with asynchronous video generation (Veo) and synchronous timeline operations was causing race conditions. We had to implement a robust locking mechanism in our TimelineStore to ensure the "Hands" didn't move a clip while the "Foundry" was still generating it.

  • Export Bottlenecks: Exporting turned out to be one of the hardest constraints. Since CutPilot runs in the browser, we’re limited by browser memory caps and performance ceilings—especially when rendering large timelines. Many editors solve this by offloading export to cloud infrastructure or third-party rendering pipelines, but we had to engineer around these limitations while keeping the workflow fast and reliable.
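The "survey mode" sampling idea above can be sketched as a simple timestamp picker: instead of feeding every frame to the model, the Eyes agent glances at a small, evenly spaced set of moments. Names and details here are hypothetical:

```typescript
// Sketch of survey-mode sampling: pick maxFrames evenly spaced
// timestamps across the clip instead of sending every frame.
function surveyTimestamps(
  durationSec: number,
  maxFrames: number,
): number[] {
  if (durationSec <= 0 || maxFrames <= 0) return [];
  const step = durationSec / maxFrames;
  const stamps: number[] = [];
  // Sample the midpoint of each segment so samples spread across the
  // whole clip and never fall exactly on the final boundary.
  for (let i = 0; i < maxFrames; i++) {
    stamps.push((i + 0.5) * step);
  }
  return stamps;
}
```

A 10-second clip surveyed with 5 frames yields timestamps at 1, 3, 5, 7, and 9 seconds; the frames at those times are what actually enter the model's context window.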

Accomplishments that we're proud of

  • The Self-Healing Loop: Watching the Verifier agent catch a mistake (e.g., "The video is too short") and autonomously trigger a "Replan Request" to fix it without human intervention felt like magic.

  • Seamless Veo Integration: Successfully implementing Veo's "Morph Mode" directly into the timeline for A-to-B transitions.

  • Real-Time Latency: Optimizing the agent loop so it feels conversational, not like a batch process.

What we learned

  • Context is King: AI agents are only as good as the context you give them. Hard-coding the "Timeline Primitives" gave the model a vocabulary it could actually understand.

  • Verification is Mandatory: In agentic workflows, trust but verify. The Verifier agent turned out to be the most important member of the crew.

  • The Future is Agentic: We proved that LLMs aren't just for text generation; with the right architecture, they can reason through temporal and spatial problems like video editing.

What's next for CutPilot

  • Collaborative Editing: Multiplayer sessions where agents and humans work on the same timeline.

  • Local LLM Support: Running the "Eyes" agent on-device for privacy.

  • Advanced Color Grading: Using Gemini to apply LUTs based on mood descriptions.

CutPilot isn't just a tool; it's the end of the "hackathon tax."
