Inspiration

Asian variety shows have a unique ability to spark joy through high-energy, chaotic, and heartwarming visual storytelling. However, the professional editing required to create them is grueling. I wanted to democratize this "joy-filled" aesthetic, making it possible for anyone to transform simple clips into broadcast-quality entertainment with zero manual effort.

What it does

VarietyShow AI Director transforms raw mobile footage into viral-ready reality show content. By analyzing the "vibe" and context of a video, the tool automatically overlays iconic captions and reactive effects. It's designed not just for social media growth, but as a way to preserve personal memories in a fun, cinematic, and professional format.

How we built it

The project was "vibe-coded" using Google AI Studio, powered by the Gemini 3 Flash model. We built a custom "Director Service" that treats the AI as a professional editor, providing it with a spatial coordinate system so that effects are placed intelligently. Key features of our build include the following, with a sketch of the prompt round-trip after the list:

  • Multimodal Reasoning: We use Gemini 3 Flash to "watch" the video and analyze emotional beats, awkward silences, and comedic timing.
  • Spatial Awareness: The AI calculates x, y coordinates based on the subject's position, ensuring reactive effects (like "Zoom Clones" or "Shocked" bubbles) appear in the negative space without covering the action.
  • Dynamic Prompting: We implemented a system that translates user "Director's Notes" into specific editing instructions, allowing for personalized styles like "more chaotic" or "focus on the cat."
  • Safe-Zone Mapping: A custom normalization layer maps the AI's 0-100 coordinates into global video space, handling complex math for cropped frames and different aspect ratios.
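
To make the round-trip concrete, here is a minimal sketch of how the Director Service might call the model, assuming the @google/genai TypeScript SDK. The `DirectorCue` shape, the `directClip` function, the `"gemini-3-flash"` model id, and the prompt wording are all illustrative stand-ins, not our production code.

```typescript
// Minimal sketch of the Director Service round-trip (illustrative names).
// Assumes the @google/genai SDK; "gemini-3-flash" stands in for the model id.
import { GoogleGenAI } from "@google/genai";

// One editing decision the model returns for a moment in the clip.
interface DirectorCue {
  timestamp: number; // seconds into the clip
  effect: string;    // e.g. "zoom_clone", "shocked_bubble", "caption"
  text?: string;     // caption text, if any
  x: number;         // 0-100, relative to the visible (possibly cropped) frame
  y: number;         // 0-100
}

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function directClip(
  videoBase64: string,
  directorsNotes: string, // free-form user notes, e.g. "more chaotic"
): Promise<DirectorCue[]> {
  const response = await ai.models.generateContent({
    model: "gemini-3-flash",
    contents: [
      { inlineData: { mimeType: "video/mp4", data: videoBase64 } },
      {
        text:
          "You are a variety-show editor. Watch the clip and find emotional " +
          "beats, awkward silences, and comedic timing. Place each effect in " +
          "the negative space around the subject, using x/y coordinates from " +
          `0-100. Director's notes: ${directorsNotes}. ` +
          "Reply with a JSON array of cues: {timestamp, effect, text, x, y}.",
      },
    ],
    config: { responseMimeType: "application/json" },
  });
  return JSON.parse(response.text ?? "[]") as DirectorCue[];
}
```

Returning structured JSON keeps the downstream renderer deterministic while the creative decisions stay with the model.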

Challenges we ran into

The biggest challenge was the "Autonomy Balance": we wanted to offer users creative freedom through simple interactions while letting the AI keep enough control to ensure a polished, coherent, and fun output every time. On the technical side, while Gemini 3 Flash is excellent at identifying funny moments, it doesn't inherently know where the "edges" of a user's screen or a cropped video are. We had to solve two specific technical issues:

  • Coordinate Mapping: We developed a transformation engine to map the AI's relative 0-100 coordinates into global video space. Using the formula $$GlobalPos = CropStart + \frac{RelativePos}{100} \times CropSize$$ we ensured that even if a user is zoomed in on a specific subject, the effects land exactly where intended.
  • The Safe-Zone Constraint: To prevent the AI from placing captions behind UI elements or cutting them off at the screen edge, we implemented a Safe Margin Clamping system (see the sketch after this list).
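
A minimal sketch of both fixes follows; the `Crop` shape, the function names, and the 5% margin are assumptions for illustration, not our exact values.

```typescript
// Map a cue from crop-relative 0-100 space into global video pixels,
// then clamp it into a safe zone away from the screen edges.
interface Crop {
  x: number;      // crop origin in global pixels
  y: number;
  width: number;  // crop size in global pixels
  height: number;
}

const SAFE_MARGIN = 0.05; // keep effects 5% away from every edge (assumed value)

// GlobalPos = CropStart + (RelativePos / 100) × CropSize
function toGlobal(relX: number, relY: number, crop: Crop) {
  return {
    x: crop.x + (relX / 100) * crop.width,
    y: crop.y + (relY / 100) * crop.height,
  };
}

// Clamp a global position so captions never touch the frame edge.
function clampToSafeZone(x: number, y: number, videoW: number, videoH: number) {
  const clamp = (v: number, lo: number, hi: number) =>
    Math.min(Math.max(v, lo), hi);
  return {
    x: clamp(x, videoW * SAFE_MARGIN, videoW * (1 - SAFE_MARGIN)),
    y: clamp(y, videoH * SAFE_MARGIN, videoH * (1 - SAFE_MARGIN)),
  };
}

// Example: a cue at (90, 10) inside a 960×540 crop starting at (480, 270)
// of a 1920×1080 video lands at (1344, 324) and stays inside the safe zone.
const pos = toGlobal(90, 10, { x: 480, y: 270, width: 960, height: 540 });
const safe = clampToSafeZone(pos.x, pos.y, 1920, 1080);
```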

Accomplishments that we're proud of

I'm incredibly proud that the tool can now take a simple, unedited clip and reliably produce a high-engagement video that feels like it was edited by a professional!

What we learned

I am in awe: what I figured out is that as long as I keep talking and communicating with the model, the project keeps moving forward. Modern AI doesn't just write code or process pixels; it appears to "understand" intent, humor, and timing, and that is amazing.

What's next for VarietyShow AI Director

We're going to launch it and let more people create with it!
