Inspiration
Ever dreamt of the perfect video edit, or come across YouTube transitions that made you wish you could edit like that? Dream no more. Enter BuzzCut: trained on thousands of videos, audio tracks, and transcripts, it can understand, analyze, and edit your video.
Video editors are complicated tools to learn, creating a high barrier to entry for technologically novice users. Their sheer complexity also makes editing a time-consuming ordeal even for experienced professionals. With BuzzCut, we can save both time and effort!
What it does
BuzzCut offers you absolute control of your video, whether you want to do things by hand or have our models take care of it. It analyzes and understands clips, audio, and color to recommend, and even apply, the best adjustments to properties such as saturation, color, and brightness.
With the ability to cut, trim, rearrange, and add transitions, you can ask for any logical operation, like "remove all occurrences of the guy with the red tie" or "rearrange so the New York skyline clips are followed by the Seattle ones, and include transition effects".
How we built it
We created basic editing functions such as cut, move, and trim that can be used by an AI agent. The user's video is split into image chunks for semantic understanding using OpenCV and OpenAI's GPT-4o, and the audio is transcribed with Whisper.
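As a rough sketch of the chunking step (the function name and the 2-second interval are our own illustration, not the actual BuzzCut code), the pipeline only needs evenly spaced timestamps at which OpenCV grabs frames to send to the vision model:

```python
def sample_timestamps(duration_s: float, interval_s: float = 2.0) -> list[float]:
    """Evenly spaced timestamps at which to grab frames for semantic analysis.

    In the real pipeline each timestamp would be handed to OpenCV
    (cv2.VideoCapture.set(cv2.CAP_PROP_POS_MSEC, t * 1000)) to extract a
    frame, which is then described by the vision model.
    """
    n = int(duration_s // interval_s) + 1
    return [round(i * interval_s, 3) for i in range(n)]
```

Grouping the frames around each timestamp into chunks keeps every vision-model call small enough to stay fast and cheap.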
Whatever request the chatbot receives, an agent workforce breaks the logical operation down into simple commands and processes the video. We achieve this by using Gemini's reasoning model to create a plan, function calling to run the commands, and FFmpeg triggered on the frontend to reduce latency.
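The plan-then-execute loop can be sketched like this (the `Clip` type, command names, and plan format are illustrative assumptions, not our exact schema): the planner emits a list of atomic steps, and the executor dispatches each one to an editing function.

```python
from dataclasses import dataclass

@dataclass
class Clip:
    name: str
    start: float  # seconds within the source video
    end: float

def cut(timeline: list[Clip], index: int) -> list[Clip]:
    """Remove the clip at `index` from the timeline."""
    return timeline[:index] + timeline[index + 1:]

def move(timeline: list[Clip], src: int, dst: int) -> list[Clip]:
    """Move the clip at position `src` so it lands at position `dst`."""
    clips = timeline.copy()
    clips.insert(dst, clips.pop(src))
    return clips

def trim(timeline: list[Clip], index: int, start: float, end: float) -> list[Clip]:
    """Tighten a clip's in/out points without going past its original bounds."""
    clips = timeline.copy()
    c = clips[index]
    clips[index] = Clip(c.name, max(c.start, start), min(c.end, end))
    return clips

COMMANDS = {"cut": cut, "move": move, "trim": trim}

def execute_plan(timeline: list[Clip], plan: list[dict]) -> list[Clip]:
    """The executor's job: apply the planner's atomic steps in order."""
    for step in plan:
        timeline = COMMANDS[step["op"]](timeline, *step["args"])
    return timeline
```

For example, "drop the intro and put Seattle before New York" would arrive as `[{"op": "cut", "args": [0]}, {"op": "move", "args": [1, 0]}]`.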
We also offer a manual video editor, built in Next.js, for editing videos by hand.
Challenges we ran into
Simple operations like cut, trim, or move were not tough; the biggest challenge was breaking complex logical operations down into atomic operations and executing them relative to the correct timestamps.
The easiest way to run operations for most applications is in the backend, but for a video editor latency is paramount, so we had to figure out a way to do the processing in the frontend.
Building a fully functioning manual video editor in Next.js and Python was also complicated.
Accomplishments that we're proud of
We developed an Agent Workforce system where one agent acts as the Planner, mapping out complex editing tasks, while another serves as the Executor, performing the planned operations. Coordinating the Planner and Executor to keep complex edits like scene rearrangement or object removal efficient and fluid required careful logic and synchronization. To keep latency low and process everything in real time, synced with the manual video editor, we used function calling along with WebAssembly frameworks in the frontend.
What we learned
Reasoning models are very strong in planning scenarios. We initially built our planner with OpenAI's GPT-4o, but its reasoning could not account for the downstream consequences of the actions it planned. Gemini Flash Thinking was better at both taking and giving instructions!
What's next for Buzz.cut
We want to add image generation using DALL·E, so users can imagine frames and have the generated images added to their library.