Inspiration

Asian variety shows have a unique ability to spark joy through high-energy, chaotic, and heartwarming visual storytelling. However, the professional editing required to create them is grueling. I wanted to democratize this "joy-filled" aesthetic, making it possible for anyone to transform simple clips into broadcast-quality entertainment with zero manual effort.

What it does

VarietyShow AI Director transforms raw mobile footage into viral-ready reality show content. By analyzing the "vibe" and context of a video, the tool automatically overlays iconic captions and reactive effects. It's designed not just for social media growth, but as a way to preserve personal memories in a fun, cinematic, and professional format.

How we built it

The project was "vibe-coded" using Google AI Studio, powered by the Gemini 3 Flash model. We built a custom "Director Service" that treats the AI as a professional editor, providing it with a spatial coordinate system so that effects are placed intelligently. Key features of our build include the following, with a sketch of the prompt round-trip after the list:

  • Multimodal Reasoning: We use Gemini 3 Flash to "watch" the video and analyze emotional beats, awkward silences, and comedic timing.
  • Spatial Awareness: The AI calculates x, y coordinates based on the subject's position, ensuring reactive effects (like "Zoom Clones" or "Shocked" bubbles) appear in the negative space without covering the action.
  • Dynamic Prompting: We implemented a system that translates user "Director's Notes" into specific editing instructions, allowing for personalized styles like "more chaotic" or "focus on the cat."
  • Safe-Zone Mapping: A custom normalization layer maps the AI's 0-100 coordinates into global video space, handling complex math for cropped frames and different aspect ratios.
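
To make the round-trip concrete, here is a minimal sketch of how the Director Service might call the model, assuming the @google/genai TypeScript SDK. The `DirectorCue` shape, the `directClip` function, the `"gemini-3-flash"` model id, and the prompt wording are all illustrative stand-ins, not our production code.

```typescript
// Minimal sketch of the Director Service round-trip (illustrative names).
// Assumes the @google/genai SDK; "gemini-3-flash" stands in for the model id.
import { GoogleGenAI } from "@google/genai";

// One editing decision the model returns for a moment in the clip.
interface DirectorCue {
  timestamp: number; // seconds into the clip
  effect: string;    // e.g. "zoom_clone", "shocked_bubble", "caption"
  text?: string;     // caption text, if any
  x: number;         // 0-100, relative to the visible (possibly cropped) frame
  y: number;         // 0-100
}

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function directClip(
  videoBase64: string,
  directorsNotes: string, // free-form user notes, e.g. "more chaotic"
): Promise<DirectorCue[]> {
  const response = await ai.models.generateContent({
    model: "gemini-3-flash",
    contents: [
      { inlineData: { mimeType: "video/mp4", data: videoBase64 } },
      {
        text:
          "You are a variety-show editor. Watch the clip and find emotional " +
          "beats, awkward silences, and comedic timing. Place each effect in " +
          "the negative space around the subject, using x/y coordinates from " +
          `0-100. Director's notes: ${directorsNotes}. ` +
          "Reply with a JSON array of cues: {timestamp, effect, text, x, y}.",
      },
    ],
    config: { responseMimeType: "application/json" },
  });
  return JSON.parse(response.text ?? "[]") as DirectorCue[];
}
```

Returning structured JSON keeps the downstream renderer deterministic while the creative decisions stay with the model.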

Challenges we ran into

The biggest challenge was the "Autonomy Balance": we wanted to offer users creative freedom through simple interactions while letting the AI keep enough control to ensure a polished, coherent, and fun output every time. On the technical side, while Gemini 3 Flash is excellent at identifying funny moments, it doesn't inherently know where the "edges" of a user's screen or a cropped video are. We had to solve two specific technical issues:

  • Coordinate Mapping: We developed a transformation engine to map the AI's relative 0-100 coordinates into global video space. Using the formula $$GlobalPos = CropStart + \frac{RelativePos}{100} \times CropSize$$ we ensured that even if a user is zoomed in on a specific subject, the effects land exactly where intended.
  • The Safe-Zone Constraint: To prevent the AI from placing captions behind UI elements or cutting them off at the screen edge, we implemented a Safe Margin Clamping system (see the sketch after this list).
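
A minimal sketch of both fixes follows; the `Crop` shape, the function names, and the 5% margin are assumptions for illustration, not our exact values.

```typescript
// Map a cue from crop-relative 0-100 space into global video pixels,
// then clamp it into a safe zone away from the screen edges.
interface Crop {
  x: number;      // crop origin in global pixels
  y: number;
  width: number;  // crop size in global pixels
  height: number;
}

const SAFE_MARGIN = 0.05; // keep effects 5% away from every edge (assumed value)

// GlobalPos = CropStart + (RelativePos / 100) × CropSize
function toGlobal(relX: number, relY: number, crop: Crop) {
  return {
    x: crop.x + (relX / 100) * crop.width,
    y: crop.y + (relY / 100) * crop.height,
  };
}

// Clamp a global position so captions never touch the frame edge.
function clampToSafeZone(x: number, y: number, videoW: number, videoH: number) {
  const clamp = (v: number, lo: number, hi: number) =>
    Math.min(Math.max(v, lo), hi);
  return {
    x: clamp(x, videoW * SAFE_MARGIN, videoW * (1 - SAFE_MARGIN)),
    y: clamp(y, videoH * SAFE_MARGIN, videoH * (1 - SAFE_MARGIN)),
  };
}

// Example: a cue at (90, 10) inside a 960×540 crop starting at (480, 270)
// of a 1920×1080 video lands at (1344, 324) and stays inside the safe zone.
const pos = toGlobal(90, 10, { x: 480, y: 270, width: 960, height: 540 });
const safe = clampToSafeZone(pos.x, pos.y, 1920, 1080);
```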

Accomplishments that we're proud of

I'm incredibly proud that the tool can now take a simple, unedited clip and reliably produce a high-engagement video that feels like it was edited by a professional!

What we learned

I am in awe: what I figured out is that as long as I keep talking and communicating with the model, the project keeps moving forward. Modern AI doesn't just write code or process pixels; it appears to "understand" intent, humor, and timing, and that is amazing.

What's next for VarietyShow AI Director

We're going to launch it and let more people create with it!
