Inspiration
Digital marketers spend hours analyzing ad creatives, iterating on variants, and crafting platform-specific captions, often by hand, which is slow and produces inconsistent results. With Gemini's multimodal reasoning and generation capabilities, I saw an opportunity to build a streamlined agent that handles the full workflow, from analysis to optimization to deployment-ready assets. Inspired by the Gemini 3 Hackathon's focus on practical, innovative applications, GemAd Catalyst aims to democratize ad optimization for small teams and solo creators.
What it does
GemAd Catalyst is a Gemini-powered multimodal ad optimizer:
- Analyzes uploaded images and videos with structured reasoning (composition, emotions, compliance, hypothetical performance metrics).
- Generates optimized text variants (headlines, CTAs, scripts) with explanations.
- Offers variant selection for regeneration, triggering Imagen for images or Veo for short videos when quota allows.
- Provides SEO captions, hashtags, and step-by-step manual guides for deployment on GMB, email marketing, and social platforms (Instagram, Facebook, LinkedIn, X, WhatsApp).
- Follows an agentic flow: proactive offers, user feedback integration, and a "no changes needed" branch.
It delivers actionable, professional outputs. Projected lifts are hypothetical, based on patterns rather than guaranteed data.
How we built it
Built entirely as a custom Gem in Google AI Studio using Gemini multimodal models (1.5/2.0 series).
- System prompt engineered for a strict workflow: JSON-structured analysis, variant options and selection before generation, conditional Imagen/Veo triggers, and platform-specific guides.
- Tested extensively with diverse creatives (images and short videos) for reasoning depth and output consistency.
- No external code or API integrations: pure Gemini, in keeping with the hackathon focus (multimodal input, chained reasoning, agentic iteration).
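The Gem produces its JSON-structured analysis through prompt rules alone. If the workflow were ported to code, a reply could be checked against the expected shape before the next step runs. The sketch below is only an illustration: the field names and the sample values are assumptions, since the Gem's actual schema is not published here.

```python
import json

# Hypothetical field names; the Gem's real JSON schema may differ.
REQUIRED_FIELDS = {
    "composition": str,
    "emotions": list,
    "compliance": str,
    "hypothetical_metrics": dict,
}


def parse_analysis(raw: str) -> dict:
    """Parse a model reply and check it matches the expected analysis shape."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("analysis must be a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected_type):
            raise ValueError(f"field {name} should be {expected_type.__name__}")
    return data


# Made-up sample reply, for illustration only.
sample = (
    '{"composition": "centered product, low contrast", '
    '"emotions": ["calm"], '
    '"compliance": "no issues found", '
    '"hypothetical_metrics": {"ctr_lift_pct": 5}}'
)
analysis = parse_analysis(sample)
```

Rejecting a malformed reply early is one way to keep later steps, such as variant selection, from working on partial analysis.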
Challenges we ran into
- Quota limits on the free tier for Imagen/Veo generation: availability was inconsistent, so the Gem falls back to detailed prompts.
- Ensuring strict variant selection before generation: prompt drift required reinforcement rules.
- Video processing occasionally stalled or got stuck during extraction (mitigated by recommending short clips or images).
- Balancing creativity with consistency in outputs (temperature tuning helped).
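The quota fallback described above lives in the prompt today. In a coded version it might look roughly like the following sketch, where `QuotaExceeded` and the `generate` callable are hypothetical stand-ins for whatever a real generation backend exposes.

```python
from typing import Callable


class QuotaExceeded(Exception):
    """Hypothetical error standing in for a backend's quota-limit failure."""


def generate_or_fallback(prompt: str, generate: Callable[[str], bytes]) -> dict:
    """Try media generation; on a quota failure, return the detailed prompt instead."""
    try:
        return {"kind": "asset", "payload": generate(prompt)}
    except QuotaExceeded:
        # Fallback: hand back the detailed prompt so the user can run it later.
        return {"kind": "prompt", "payload": prompt}


def always_over_quota(prompt: str) -> bytes:
    """Stub generator that simulates an exhausted free-tier quota."""
    raise QuotaExceeded("free-tier limit reached")


result = generate_or_fallback("Flat-lay product photo, warm light", always_over_quota)
```

Returning a tagged result rather than raising keeps the conversation moving: the caller can present either the asset or the prompt without special-casing errors.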
Accomplishments that we're proud of
- End-to-end workflow in a single chat Gem: from raw creative upload to deployment-ready assets, practical for real marketers.
- Deep Gemini integration: multimodal analysis, agentic loops, and conditional generation, showcasing reasoning chains effectively.
- Transparent, ethical design: manual guides only (no overclaimed automation), with hypothetical metrics clearly labeled.
- Polished, user-friendly interaction despite prototype constraints.
What we learned
- Gemini's multimodal strength shines in subjective tasks like ad critique, but outputs vary, so prompt engineering is critical for consistency.
- Agentic flows (selection before generation) enhance control but require strict rules to prevent drift.
- Hackathon prototypes benefit from simplicity (a Gem over a complex app): focusing on core Gemini features yields stronger impact.
- Real-world utility matters: manual guides add value where automation is impossible.
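The "selection before generation" rule is enforced in the Gem by prompt instructions. Expressed in code, the same gate could be a small state check, as in this sketch. The class and stage names are illustrative assumptions, not part of the project.

```python
from enum import Enum, auto
from typing import List, Optional


class Stage(Enum):
    ANALYZED = auto()
    VARIANT_SELECTED = auto()
    GENERATED = auto()


class AdSession:
    """Tracks one creative through the workflow; names are illustrative."""

    def __init__(self, variants: List[str]):
        self.variants = variants
        self.stage = Stage.ANALYZED
        self.selected: Optional[str] = None

    def select(self, index: int) -> None:
        """Record the user's chosen variant."""
        if not 0 <= index < len(self.variants):
            raise IndexError("no such variant")
        self.selected = self.variants[index]
        self.stage = Stage.VARIANT_SELECTED

    def generate(self) -> str:
        """Build a generation request, but only after a variant was chosen."""
        # The gate: refuse to generate until the user has selected a variant.
        if self.stage is not Stage.VARIANT_SELECTED:
            raise RuntimeError("select a variant before generating")
        self.stage = Stage.GENERATED
        return f"generation request for: {self.selected}"
```

A hard check like this cannot drift the way a prompt rule can, which is one argument for moving the workflow into code later.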
What's next for GemAd Catalyst
- Upgrade to a coded frontend (e.g., Streamlit with the Gemini API) for a custom UI, session history, and exports.
- Integrate trend inputs (manual or via a safe API) for adaptive variants.
- Expand audits (accessibility and inclusivity scoring).
- Explore Vertex AI for scaled deployment if demand grows.
Built With
- api/ai
- gemini