Inspiration
The inspiration for this project came from a simple observation: online shopping is currently a lonely, static experience. Most "Virtual Try-On" tools are just fancy image filters: you upload a photo, wait, and get a result. Having showcased a VTO project at the IBM Pre-AI Summit, I realized that what users actually want is a stylist, not just a mirror. I wanted to build an agent that can see what you’re wearing, listen to your needs, and talk you through a fashion transformation in real time.
What it does
The project is an AI-powered "Live Fashion Stylist." Instead of a "Text-In/Text-Out" interface, it is a fully immersive agent that:
Sees: Uses a live camera feed to analyze the user’s build, skin tone, and current outfit.
Hears & Speaks: Uses the Gemini Live API to have a natural, bidirectional conversation. Users can interrupt the agent to change colors, styles, or occasions.
Creates: Generates photorealistic "Try-On" visuals on the fly, weaving them into the conversation stream using interleaved output.
Together, these capabilities transform a website into a digital dressing room where the AI acts as a creative director for your personal style.
How I built it
The Brain: Gemini 2.0 Flash handles the multimodal reasoning, while the Gemini Live API (via WebSockets) manages the real-time voice and "barge-in" logic.
The Vision: I integrated MediaPipe Pose and OpenCV to track body landmarks, so that the AI-generated garments stay aligned with the user's body in the live video feed.
The Infrastructure: The backend is containerized with Docker and hosted on Google Cloud Run. I used Vertex AI for model orchestration and Cloud Storage for garment assets.
Automation: To ensure reproducibility, the entire infrastructure was provisioned using Terraform (Infrastructure as Code).
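To illustrate the landmark-alignment idea from "The Vision", here is a minimal sketch of how shoulder landmarks could drive garment placement. It is not the project's actual code: `garment_placement`, `Placement`, and the `width_ratio` default are hypothetical names and values chosen for illustration. The only detail taken from MediaPipe Pose is that landmarks 11 and 12 are the left and right shoulders, with coordinates normalized to the frame size.

```python
import math
from dataclasses import dataclass

# MediaPipe Pose landmark indices for the shoulders (33-point pose model).
LEFT_SHOULDER = 11
RIGHT_SHOULDER = 12


@dataclass
class Placement:
    center_x: float   # pixel x of the midpoint between the shoulders
    center_y: float   # pixel y of the midpoint between the shoulders
    width_px: float   # target width for the scaled garment image
    angle_deg: float  # tilt of the shoulder line, in degrees


def garment_placement(landmarks, frame_w, frame_h, width_ratio=1.4):
    """Compute where to draw a garment from normalized (x, y) landmarks.

    `landmarks` is a sequence indexed like MediaPipe Pose output.
    `width_ratio` (an assumed value) scales shoulder width to garment width.
    """
    lx, ly = landmarks[LEFT_SHOULDER]
    rx, ry = landmarks[RIGHT_SHOULDER]

    # Convert normalized coordinates to pixels.
    lx, rx = lx * frame_w, rx * frame_w
    ly, ry = ly * frame_h, ry * frame_h

    # In an unmirrored frame the subject's left shoulder has the larger x,
    # so a level pose gives an angle of 0 degrees.
    dx, dy = lx - rx, ly - ry
    shoulder_px = math.hypot(dx, dy)

    return Placement(
        center_x=(lx + rx) / 2,
        center_y=(ly + ry) / 2,
        width_px=shoulder_px * width_ratio,
        angle_deg=math.degrees(math.atan2(dy, dx)),
    )
```

In a real pipeline, the resulting placement would feed an OpenCV resize, rotate, and blend step for each video frame; that part is omitted here.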
Challenges I ran into
The biggest technical hurdle was Latency Synchronization. Keeping a live audio conversation going while simultaneously generating and overlaying high-resolution AI garments required intense optimization.
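One common way to approach this kind of synchronization is to start the slow image generation concurrently with speech, so the audio stream never waits on the image. The sketch below simulates that pattern with `asyncio`. It makes no real API calls: `generate_garment_image`, `speak`, and `respond` are hypothetical stand-ins, and the delays are arbitrary.

```python
import asyncio


async def generate_garment_image(prompt, delay=0.05):
    """Stand-in for a slow image-generation call (hypothetical)."""
    await asyncio.sleep(delay)
    return f"<image:{prompt}>"


async def speak(chunks, events, chunk_delay=0.01):
    """Stand-in for streaming audio chunks to the client (hypothetical)."""
    for chunk in chunks:
        await asyncio.sleep(chunk_delay)
        events.append(("audio", chunk))


async def respond(chunks, prompt):
    """Run speech and image generation concurrently, recording what is emitted."""
    events = []

    async def image_job():
        # The image is appended whenever it becomes ready,
        # without blocking the audio coroutine.
        events.append(("image", await generate_garment_image(prompt)))

    await asyncio.gather(speak(chunks, events), image_job())
    return events


def run_demo():
    return asyncio.run(respond(["Let's", "try", "a navy blazer"], "navy blazer"))
```

The total time is roughly the longer of the two jobs rather than their sum, which is the point of overlapping them.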
Accomplishments that I'm proud of
Successful "Live" Transition: I’m incredibly proud of moving from a static photo-processor to a stateful agent that can handle real-time interruptions.
Seamless Multimodal Flow: Achieving a fluid "Interleaved Output" where the agent talks about a blazer while simultaneously showing it on the user’s body was a major milestone.
Scalable Architecture: I built a production-ready GCP environment that can scale to handle live video streams efficiently.
What I learned
This hackathon was a masterclass in Agentic Design. I learned that building an agent is fundamentally different from building a chatbot; it requires managing state, handling asynchronous interruptions, and prioritizing "context" over "prompts." I also gained deep experience in Google Cloud's advanced AI offerings, specifically how to utilize the GenAI SDK to create cohesive, mixed-media user experiences.
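As a rough illustration of what "managing state" and "handling asynchronous interruptions" can look like, here is a simulated barge-in loop. It is a sketch under stated assumptions, not the Gemini Live API: `StylistSession` and its methods are hypothetical, and audio playback is faked with short sleeps.

```python
import asyncio


class StylistSession:
    """Hypothetical session that keeps context across turns and supports barge-in."""

    def __init__(self):
        self.context = {}   # preferences that persist across turns
        self.spoken = []    # sentences the agent actually finished saying
        self._task = None   # the in-flight speech task, if any

    async def _speak(self, sentences):
        for sentence in sentences:
            await asyncio.sleep(0.01)  # simulated audio playback
            self.spoken.append(sentence)

    async def interrupt(self, **updates):
        """Barge-in: cancel in-flight speech, then merge new preferences."""
        if self._task is not None and not self._task.done():
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass  # expected: the speech task was cancelled on purpose
        self.context.update(updates)

    async def respond(self, sentences):
        """Start a new spoken response, replacing any response still playing."""
        await self.interrupt()
        self._task = asyncio.create_task(self._speak(sentences))

    async def wait(self):
        if self._task is not None:
            await self._task


async def demo():
    session = StylistSession()
    session.context["occasion"] = "gala"
    await session.respond([
        "Here is a red dress.",
        "It pairs with gold heels.",
        "Shall I add a clutch?",
    ])
    await asyncio.sleep(0.015)            # the user cuts in mid-response
    await session.interrupt(color="blue")
    ctx = session.context
    await session.respond([f"Switching to {ctx['color']} for the {ctx['occasion']}."])
    await session.wait()
    return session
```

The reply after the interruption is built from the accumulated context (occasion plus the new color) rather than from a fresh prompt, which is the distinction between "context" and "prompts" drawn above.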
What's next for virtual Try-On
The future of this project lies in Environmental Simulation. I plan to add a feature where the agent can "change the lighting" or the background of the video feed, allowing a user to see how a dress looks under the dim lights of a gala or the bright sun of a beach wedding. I am also exploring Social Try-On, where friends can join a live styling session to give feedback in real time.
Built With
- agent-development-kit-(adk)
- gemini-2.5-flash-live
- gemini-3.1-flash
- gemini-3.1-flash-image-(nano-banana-2)
- vertex-ai