Inspiration

The inspiration for this project came from a simple observation: online shopping is currently a lonely, static experience. Most "Virtual Try-On" tools are just fancy image filters: you upload a photo, wait, and get a result. Having showcased a VTO project at the IBM Pre-AI Summit, I realized that what users actually want is a stylist, not just a mirror. I wanted to build an agent that can see what you’re wearing, listen to your needs, and talk you through a fashion transformation in real time.

What it does

The project is an AI-powered "Live Fashion Stylist." Instead of a "Text-In/Text-Out" interface, it is a fully immersive agent that:

Sees: Uses a live camera feed to analyze the user’s build, skin tone, and current outfit.

Hears & Speaks: Uses the Gemini Live API to have a natural, bidirectional conversation. Users can interrupt the agent to change colors, styles, or occasions.

Creates: Generates photorealistic "Try-On" visuals on the fly, weaving them into the conversation stream using interleaved output.

Together, these capabilities transform a website into a digital dressing room where the AI acts as a creative director for your personal style.

How I built it

The Brain: Gemini 2.0 Flash handles the multimodal reasoning, while the Gemini Live API (via WebSockets) manages the real-time voice and "barge-in" logic.
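The barge-in behavior can be illustrated with a minimal sketch that uses only Python's asyncio, not the Gemini Live API itself. The names `speak`, `run_turn`, and `demo` are hypothetical, and the `asyncio.sleep` call stands in for audio playback; the project's actual WebSocket handling is not shown here.

```python
import asyncio


async def speak(chunks, spoken, delay=0.01):
    """Stand-in for streaming audio chunks to the user, one at a time."""
    for chunk in chunks:
        await asyncio.sleep(delay)  # simulates the time needed to play one chunk
        spoken.append(chunk)


async def run_turn(chunks, interrupt, delay=0.01):
    """Play a reply until it finishes or the user barges in.

    Returns (chunks_actually_spoken, was_interrupted).
    """
    spoken = []
    speaking = asyncio.create_task(speak(chunks, spoken, delay))
    barge_in = asyncio.create_task(interrupt.wait())

    # Wake up as soon as either the reply ends or the interrupt fires.
    done, _ = await asyncio.wait(
        {speaking, barge_in}, return_when=asyncio.FIRST_COMPLETED
    )
    interrupted = barge_in in done and not speaking.done()

    # Cancel whichever task is still pending, then let both settle.
    for task in (speaking, barge_in):
        if not task.done():
            task.cancel()
    await asyncio.gather(speaking, barge_in, return_exceptions=True)
    return spoken, interrupted


async def demo():
    interrupt = asyncio.Event()
    chunks = ["This", "navy", "blazer", "would", "suit", "you"]
    # Hypothetical user interruption arriving partway through the reply.
    asyncio.get_running_loop().call_later(0.035, interrupt.set)
    return await run_turn(chunks, interrupt)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The key design choice in this sketch is that the spoken reply runs as a cancellable task, so an interruption stops playback immediately instead of waiting for the sentence to finish.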

The Vision: I integrated MediaPipe Pose and OpenCV to track body landmarks, so that the AI-generated garments stay aligned with the user's body in the live video feed.
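As a rough illustration of landmark-based alignment, the sketch below derives a garment's position, width, and tilt from the two shoulder landmarks. It assumes landmarks arrive as normalized (x, y) pairs indexed the way MediaPipe Pose indexes them (11 for the left shoulder, 12 for the right). The `Placement` type, the `garment_placement` function, and the `width_ratio` value are hypothetical, and the code does not call MediaPipe or OpenCV directly.

```python
import math
from dataclasses import dataclass

# MediaPipe Pose landmark indices for the shoulders.
LEFT_SHOULDER, RIGHT_SHOULDER = 11, 12


@dataclass
class Placement:
    center_x: float   # pixel x of the garment's shoulder-line midpoint
    center_y: float   # pixel y of the garment's shoulder-line midpoint
    width_px: float   # target garment width in pixels
    angle_deg: float  # shoulder-line tilt; 0 when the shoulders are level


def garment_placement(landmarks, frame_w, frame_h, width_ratio=1.4):
    """Compute where to draw a garment overlay for one video frame.

    landmarks: sequence of (x, y) pairs normalized to [0, 1].
    width_ratio: assumed tuning factor (garment width / shoulder width).
    """
    lx, ly = landmarks[LEFT_SHOULDER]
    rx, ry = landmarks[RIGHT_SHOULDER]

    # Convert normalized coordinates to pixels.
    lx, ly = lx * frame_w, ly * frame_h
    rx, ry = rx * frame_w, ry * frame_h

    shoulder_px = math.hypot(lx - rx, ly - ry)
    # In an unmirrored frame the subject's left shoulder has the larger x.
    angle = math.degrees(math.atan2(ly - ry, lx - rx))
    return Placement((lx + rx) / 2, (ly + ry) / 2, shoulder_px * width_ratio, angle)
```

Recomputing the placement on every frame is what keeps the overlay attached to the body as the user moves.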

The Infrastructure: The backend is containerized with Docker and hosted on Google Cloud Run. I used Vertex AI for model orchestration and Cloud Storage for garment assets.

Automation: To ensure reproducibility, the entire infrastructure was provisioned using Terraform (Infrastructure as Code).

Challenges I ran into

The biggest technical hurdle was Latency Synchronization. Keeping a live audio conversation going while simultaneously generating and overlaying high-resolution AI garments required intense optimization.
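One possible way to keep visuals in step with speech is to tag each generated image with the point in the audio it belongs to, then show only images that are current at the playhead. The sketch below is an illustrative assumption, not the optimization actually used in this project; `TryOnFrame`, `pick_frame`, and the 500 ms tolerance are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TryOnFrame:
    garment: str
    speech_ms: int  # position in the agent's audio that this image belongs to


def pick_frame(
    frames: Sequence[TryOnFrame], playhead_ms: int, max_lag_ms: int = 500
) -> Optional[TryOnFrame]:
    """Return the newest frame that is due and not stale, or None.

    A frame is due once the audio playhead has reached its speech_ms, and
    stale once it trails the playhead by more than max_lag_ms.
    """
    eligible = [
        f for f in frames
        if playhead_ms - max_lag_ms <= f.speech_ms <= playhead_ms
    ]
    return max(eligible, key=lambda f: f.speech_ms, default=None)
```

Dropping stale frames trades completeness for responsiveness: a late image is skipped rather than shown after the conversation has moved on.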

Accomplishments that I'm proud of

Successful "Live" Transition: I’m incredibly proud of moving from a static photo-processor to a stateful agent that can handle real-time interruptions.

Seamless Multimodal Flow: Achieving a fluid "Interleaved Output" where the agent talks about a blazer while simultaneously showing it on the user’s body was a major milestone.

Scalable Architecture: Building a production-ready GCP environment that can scale to handle live video streams efficiently.

What I learned

This hackathon was a masterclass in Agentic Design. I learned that building an agent is fundamentally different from building a chatbot; it requires managing state, handling asynchronous interruptions, and prioritizing "context" over "prompts." I also gained deep experience with Google Cloud's advanced AI offerings, specifically how to use the GenAI SDK to create cohesive, mixed-media user experiences.

What's next for Virtual Try-On

The future of this project lies in Environmental Simulation. I plan to add a feature where the agent can "change the lighting" or the background of the video feed, allowing a user to see how a dress looks under the dim lights of a gala or the bright sun of a beach wedding. I am also exploring Social Try-On, where friends can join a live styling session to give feedback in real time.

Built With

  • Agent Development Kit (ADK)
  • Gemini 3.1 Flash
  • Gemini 3.1 Flash Image (Nano Banana 2)
  • Gemini 2.5 Flash Live
  • Vertex AI Agent Engine
  • computer vision