Inspiration
We’ve all been there: you’re deep into a chill, lo-fi study stream or a beautiful travel vlog, and suddenly—BAM—a loud, neon-colored insurance commercial ruins the entire mood. It’s annoying for the viewer, and honestly, it’s a missed opportunity for the brand. We wanted to fix that "ad-break whiplash." We wondered: what if an ad actually cared about the video it was sitting in? With Gemini 3’s ability to actually "watch" and "feel" a video, we realized we could finally make ads that don't suck.
What it does
GeminiAd is basically a "vibe-matching" engine for YouTube. It watches a video just like a human would—noticing the colors, the music, and the overall mood. It then uses that context to instantly generate an ad that fits right in. If you're watching a minimalist tech review, you get a sleek, quiet ad. If it’s a high-energy fitness vlog, the ad matches that intensity. It creates everything from the script to the visuals, making sure the transition from content to commercial feels seamless, not jarring.
How we built it
We took a "Vibe Coding" approach using Antigravity. Instead of getting bogged down in the boilerplate, we used the agentic IDE to scaffold the app while we focused on the creative logic. The heavy lifting is done by Gemini 3 Pro, which natively handles YouTube URLs to process video, audio, and text all at once. We then hooked that up to Imagen 3 and Veo to turn those "vibes" into actual images and video clips.
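A minimal sketch of the "vibe analysis" step, using the google-genai Python SDK to hand Gemini a YouTube URL directly. The model name, prompt, and JSON schema here are our illustrative assumptions, not a documented contract; the `parse_vibe` helper just cleans up the reply before the result feeds into image and video generation.

```python
# Sketch: ask Gemini to summarize a video's "vibe" from a YouTube URL,
# then parse the JSON it returns. Prompt and response keys are assumptions.
import json

VIBE_PROMPT = (
    "Watch this video and return JSON with keys "
    "'mood', 'palette', 'energy' (1-10), and 'music_style'."
)

def parse_vibe(raw: str) -> dict:
    """Strip an optional ```json fence and load the model's reply."""
    text = raw.strip()
    if text.startswith("```"):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[4:]
    return json.loads(text)

def analyze_vibe(youtube_url: str) -> dict:
    # Requires a Gemini API key in the environment; imported lazily so the
    # parsing helper above stays usable offline.
    from google import genai
    from google.genai import types

    client = genai.Client()
    resp = client.models.generate_content(
        model="gemini-2.0-flash",  # swap in whichever Gemini 3 model you have access to
        contents=[
            types.Part.from_uri(file_uri=youtube_url, mime_type="video/mp4"),
            VIBE_PROMPT,
        ],
    )
    return parse_vibe(resp.text)
```

The returned dict is what we'd then forward as conditioning context to Imagen and Veo prompts.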
Challenges we ran into
The hardest part was getting the timing right. A 15-minute video changes a lot—the vibe at the beginning might be totally different by the end. We spent a lot of time teaching the model to find "the right moment"—the specific timestamps where a certain ad would actually make sense. Also, we had some classic "developer luck" with setting up our GCP billing and UPI payments mid-hackathon, which was a stress test we didn't plan for, but we made it through!
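The "right moment" logic above can be sketched as a small heuristic: once the model has labeled the video as timestamped segments with a mood each, we only consider ad slots at scene boundaries where the mood is the same on both sides, so the generated ad has a single vibe to match. The segment format and threshold are illustrative assumptions.

```python
# Heuristic sketch: pick ad slots at segment boundaries where the mood
# is stable on both sides. Segment dicts are an assumed shape
# ({'start': sec, 'end': sec, 'mood': str}), produced upstream by the model.

def pick_ad_slots(segments: list[dict], min_runtime: float = 60.0) -> list[dict]:
    """Return candidate slots as {'at': seconds, 'mood': str}."""
    slots = []
    for prev, nxt in zip(segments, segments[1:]):
        stable = prev["mood"] == nxt["mood"]
        deep_enough = nxt["start"] >= min_runtime  # don't interrupt the intro
        if stable and deep_enough:
            slots.append({"at": nxt["start"], "mood": nxt["mood"]})
    return slots

segments = [
    {"start": 0,   "end": 90,  "mood": "calm"},
    {"start": 90,  "end": 210, "mood": "calm"},
    {"start": 210, "end": 300, "mood": "energetic"},
]
print(pick_ad_slots(segments))  # → [{'at': 90, 'mood': 'calm'}]
```

Note how the calm→energetic boundary at 210s is rejected: dropping an ad there would force it to match two conflicting vibes at once.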
Accomplishments that we're proud of
We’re really proud of the "Vibe Consistency." There was this moment during testing where the AI saw a somber, cinematic documentary and generated an ad with a soft piano score and muted colors to match. Seeing the AI "get it"—understanding the emotional weight of a video rather than just keyword-matching—was a huge win for us.
What we learned
We learned that the future of AI isn't just about processing text; it’s about context. Letting Gemini 3 "see" the video changed everything. It also changed how we work—using Antigravity showed us that we can move so much faster when we focus on the intent of the code and let the AI handle the syntax.
What's next for GeminiAd
We want to take this live. Imagine this running in real-time on a Twitch stream or a YouTube Live, where the ads change as the streamer changes their tone. We also want to add a "Personalization Layer" so the ad matches both the video and the person watching it. The goal is to turn ads from "annoying interruptions" into "relevant recommendations."