Inspiration
The spark came from a frustrating reality in digital marketing: creating a single product photoshoot costs $5,000-15,000 and takes weeks of coordination between photographers, studios, models, and post-production teams.
We asked ourselves: What if AI could understand not just what a product looks like, but how it behaves physically? Glass refracts light with an index of refraction (IOR) of about 1.52. Metal reflects light at the angle it arrives. Shadows fall opposite the light source. These are physical laws that even the best AI image generators often ignore.
The Gemini 3 Global Hackathon was the perfect opportunity to prove that AI can reason causally about the physical world and generate commercially viable marketing campaigns in 60 seconds instead of 6 weeks.
What it does
AetherSnap is an autonomous AI agent that transforms a single product photo (or video) into a complete marketing campaign with three key innovations:
🔬 Physics DNA Extraction — Analyzes Index of Refraction (IOR), light vectors, shadow softness, color temperature, and typography DNA using cause-effect reasoning.
🔄 Autonomous Self-Correction Loop — Generates images with Imagen 4, audits quality using Gemini 2.0 (scores 0-100), and automatically refines prompts if score < 70%. No human intervention required.
🎬 Spatial-Temporal Video Understanding — Analyzes product videos for motion, physics changes, and cause-effect chains like "Hand picks up bottle → label becomes visible → light creates highlight".
Result: Professional-grade marketing campaigns at 95%+ quality scores in under a minute.
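The generate, audit, refine cycle described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `generateImage`, `auditImage`, and `refinePrompt` are hypothetical stand-ins for the Imagen and Gemini calls, and the single-retry limit is an assumption chosen to mirror a one-shot regeneration flag.

```javascript
// Hypothetical sketch of the self-correction loop. The three injected
// functions are placeholders, not real SDK calls.
const QUALITY_THRESHOLD = 70; // from the write-up: refine when the score is below 70
const MAX_REFINEMENTS = 1;    // assumption: one automatic retry, so the loop always ends

async function runCampaignLoop(prompt, { generateImage, auditImage, refinePrompt }) {
  let currentPrompt = prompt;
  let best = null;
  let attempts = 0;

  for (let i = 0; i <= MAX_REFINEMENTS; i++) {
    attempts += 1;
    const image = await generateImage(currentPrompt);
    // Assumed audit shape: { score: number from 0 to 100, issues: string[] }
    const audit = await auditImage(image);

    // Keep the highest-scoring result seen so far.
    if (best === null || audit.score > best.score) {
      best = { image, score: audit.score, prompt: currentPrompt };
    }

    // Stop as soon as the audit passes.
    if (audit.score >= QUALITY_THRESHOLD) break;

    // Otherwise rewrite the prompt from the audit findings, if a retry remains.
    if (i < MAX_REFINEMENTS) {
      currentPrompt = await refinePrompt(currentPrompt, audit.issues);
    }
  }

  return { ...best, attempts };
}
```

Injecting the three functions keeps the loop testable without network access, and returning the best-scoring attempt means a failed retry never replaces a better first result.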
How we built it
| Layer | Technology | Purpose |
|---|---|---|
| 🧠 Analysis | Gemini 2.0 Flash | Physics extraction, quality audits, prompt refinement |
| 🎨 Generation | Imagen 4 | 16K photorealistic campaign images |
| ☁️ Platform | Vertex AI | Enterprise-grade API with service account auth |
| 🔒 Security | Node.js/Express | Secure proxy (no API keys in browser) |
The entire pipeline runs autonomously—demonstrating true agentic behavior where the AI makes decisions, evaluates outcomes, and self-corrects without human prompts.
Challenges we ran into
JSON Parsing Instability — Gemini 2.5 Flash returned malformed JSON. Solution: Multi-step parsing with regex fallback + switched to Gemini 2.0 Flash.
Quality Consistency — Early Imagen 3 outputs scored 65-75%. Solution: Upgraded to Imagen 4 with quality boost keywords. Result: 95%+ scores.
Autonomous Loop Termination — Risk of infinite loops. Solution: Added an `autoRegenerationAttempted` flag to limit retries.
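To illustrate the multi-step parsing idea from the first challenge, here is one possible shape for a tolerant parser. It is a sketch under assumptions, not the project's implementation: the function name `parseModelJson` and the specific recovery steps (fence stripping, outermost-brace extraction, trailing-comma removal) are illustrative choices.

```javascript
// Hypothetical tolerant parser for model responses that should contain JSON.
// Returns the parsed value, or null if every strategy fails.
function parseModelJson(raw) {
  // Step 1: the response may already be valid JSON.
  try {
    return JSON.parse(raw);
  } catch {
    // fall through to the next strategy
  }

  // Step 2: regex fallback for JSON wrapped in a markdown code fence.
  const fenced = raw.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/i);
  if (fenced) {
    try {
      return JSON.parse(fenced[1]);
    } catch {
      // fall through to the next strategy
    }
  }

  // Step 3: take the outermost {...} span and drop trailing commas.
  // Caveat: the comma cleanup is naive and could alter string values
  // that happen to contain ",}" or ",]".
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start !== -1 && end > start) {
    const candidate = raw.slice(start, end + 1).replace(/,\s*([}\]])/g, "$1");
    try {
      return JSON.parse(candidate);
    } catch {
      // nothing left to try
    }
  }

  return null;
}
```

Returning `null` instead of throwing lets the calling loop treat an unparseable audit as a failed attempt and decide whether to retry.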
Accomplishments that we're proud of
🏆 95%+ Quality Scores — Self-correction loop ensures commercial-grade outputs
🎬 First Physics-Aware Video-to-Campaign Pipeline — Cause-effect reasoning from product videos
🔄 True Agent Autonomy — Zero human intervention from upload to final assets
⚡ 60-Second Campaigns — What takes agencies 6 weeks, we do in a minute
What we learned
Physics constraints improve AI outputs — Teaching AI about IOR, light vectors, and materials produces more realistic images than generic prompts.
Self-correction is transformative — The jump from 75% to 95% quality came from letting the AI audit and fix itself.
Gemini 2.0 Flash is production-ready — Rock-solid JSON parsing, perfect for agentic loops.
Imagen 4 is a quantum leap — The quality difference from Imagen 3 is immediately visible.
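As an illustration of the first lesson, extracted physics attributes could be folded into a generation prompt along these lines. The function `buildPhysicsPrompt`, the field names on `dna`, and the prompt wording are all hypothetical; the write-up does not specify how the Physics DNA is actually represented or injected.

```javascript
// Hypothetical sketch: turn extracted physics attributes into explicit
// prompt constraints. Field names on `dna` are assumptions.
function buildPhysicsPrompt(subject, dna) {
  const [x, y, z] = dna.lightVector;
  const constraints = [
    `material index of refraction ${dna.ior.toFixed(2)}`,
    `key light direction (${x}, ${y}, ${z})`,
    `shadow softness ${Math.round(dna.shadowSoftness * 100)}%`,
    `color temperature ${dna.colorTemperatureK}K`,
  ];
  return `${subject}. Physical constraints: ${constraints.join("; ")}.`;
}

// Example input with made-up values for a glass product shot.
const examplePrompt = buildPhysicsPrompt("Glass perfume bottle on marble", {
  ior: 1.52,
  lightVector: [-0.5, 0.8, 0.3],
  shadowSoftness: 0.35,
  colorTemperatureK: 5600,
});
console.log(examplePrompt);
```

Stating the constraints as explicit values, instead of adjectives like "realistic lighting", gives the audit step something concrete to check the generated image against.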
What's next for AetherSnap
📅 Q1 2026: Campaign Timeline Planner — Multi-week scheduling with A/B variants
🧠 Q1 2026: Brand Memory — Store brand guidelines for consistent multi-product campaigns
🎬 Q2 2026: Video Output — Generate product videos, not just stills
🏢 Q3 2026: Enterprise API — White-label solution for agencies and e-commerce
Vision: AetherSnap becomes the physics-aware operating system for marketing content—where every brand can create agency-quality campaigns at the speed of thought.
Built With
- css3
- express.js
- gemini-2.0-flash
- google-cloud
- html5
- imagen-4
- javascript
- material-design-3
- node.js
- vertex-ai