Inspiration

Online eyewear shopping has a frustrating problem: 40% of glasses purchased online get returned because they don't fit properly. The core issue? Customers can't accurately measure their Pupillary Distance (PD) at home, and existing virtual try-on solutions look fake and unreliable.

I wanted to solve both problems: precise measurement AND realistic visualization.

What it does

Real Virtual TryOn is a complete pipeline that:

  1. Measures your PD precisely from selfies, using a credit card as a size reference (±2 mm accuracy)
  2. Positions glasses correctly based on your actual facial measurements
  3. Generates photorealistic portraits using Gemini 3's image generation

The user takes a selfie holding a credit card, selects glasses, and gets a studio-quality image showing exactly how they'll look.
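The calibration behind step 1 comes down to a single ratio. Credit cards follow the ISO/IEC 7810 ID-1 format, which is 85.60 mm wide, so the card's width in pixels gives millimetres per pixel at the card's depth. Below is a minimal sketch, assuming the card's pixel width (e.g. measured from the SAM3 mask) and the two pupil centres are already known. The function names are illustrative, not the project's code.

```python
import math

# ISO/IEC 7810 ID-1 width; standard credit cards use this format.
CARD_WIDTH_MM = 85.60


def mm_per_pixel(card_width_px: float) -> float:
    """Scale factor derived from the card's measured width in pixels."""
    if card_width_px <= 0:
        raise ValueError("card width must be positive")
    return CARD_WIDTH_MM / card_width_px


def pupillary_distance_mm(left_pupil, right_pupil, card_width_px: float) -> float:
    """PD in mm from two (x, y) pupil centres and the card width in pixels.

    Assumes the card and the pupils are roughly the same distance from the camera.
    """
    distance_px = math.dist(left_pupil, right_pupil)
    return distance_px * mm_per_pixel(card_width_px)


pd = pupillary_distance_mm((412.0, 530.0), (668.0, 530.0), card_width_px=342.4)
print(f"PD: {pd:.1f} mm")  # prints "PD: 64.0 mm"
```

The same-distance assumption is where the depth correction described under the challenges comes in.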

How I built it

The pipeline has two main stages:

Stage 1 - Measurement & Overlay:

  • SAM3 for credit card segmentation and scale calibration
  • MediaPipe/dlib for facial landmark detection
  • Custom perspective correction for tilted selfies
  • Precise positioning based on real mm-to-pixel ratios
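Once the card fixes the mm-to-pixel ratio, positioning reduces to a little geometry. This is a minimal sketch, assuming pupil centres from the landmark detector and a hypothetical `frame_width_mm` taken from the product's specifications. `place_glasses` and `Placement` are illustrative names, not the project's code.

```python
import math
from dataclasses import dataclass


@dataclass
class Placement:
    center: tuple[float, float]  # pixel position for the centre of the frame's bridge
    width_px: float              # rendered frame width in pixels
    angle_deg: float             # head roll along the eye line


def place_glasses(left_pupil, right_pupil, mm_per_px: float,
                  frame_width_mm: float) -> Placement:
    """Scale and orient a frame overlay from its real-world width in millimetres.

    left_pupil is the pupil on the left side of the image. In image coordinates
    (y pointing down), a positive angle means the right-hand pupil sits lower.
    """
    (lx, ly), (rx, ry) = left_pupil, right_pupil
    center = ((lx + rx) / 2, (ly + ry) / 2)
    width_px = frame_width_mm / mm_per_px  # hypothetical frame spec converted to pixels
    angle_deg = math.degrees(math.atan2(ry - ly, rx - lx))
    return Placement(center, width_px, angle_deg)
```

The overlay would then be resized to `width_px`, rotated by `angle_deg` (check the rotation sign convention of the imaging library you use) and pasted at `center`. A production version likely also needs a per-frame vertical offset.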

Stage 2 - Gemini 3 Post-processing:

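```python
from google import genai
from google.genai import types

client = genai.Client()  # picks up the Gemini API key from the environment

# prompt: text instructions; tryon_image: the Stage 1 overlay;
# reference_glasses: product photo of the selected frames (e.g. PIL images)
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[prompt, tryon_image, reference_glasses],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(aspect_ratio="4:3", image_size="2K")
    )
)
```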

Two API calls refine the output:

  1. The first call removes overlay artifacts while preserving the details of the glasses
  2. The second call generates a clean studio portrait
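The two refinement calls can be chained roughly as follows. This is a sketch, not the project's code. The prompt strings are hypothetical, `client` is a `genai.Client`, and `config` is a `types.GenerateContentConfig` like the one in the snippet above. The sketch takes the last image part of the first response, skipping any part flagged as a model thought in case interim images are returned. It passes that part straight back into the second call together with the reference glasses image. This relies on the SDK accepting `Part` objects in `contents` alongside strings and PIL images.

```python
# Hypothetical prompts; the wording the project actually uses is not published.
CLEANUP_PROMPT = (
    "Remove the overlay artifacts from the first image. Keep the face identical "
    "and keep the glasses exactly as they appear in the second image."
)
PORTRAIT_PROMPT = (
    "Turn the first image into a clean studio portrait. Keep the face identical "
    "and keep the glasses exactly as they appear in the second image."
)


def final_image_part(response):
    """Return the last image part of a response that is not flagged as a thought."""
    image = None
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None and not getattr(part, "thought", False):
            image = part
    if image is None:
        raise RuntimeError("the model returned no image")
    return image


def two_pass_refine(client, config, tryon_image, reference_glasses,
                    model="gemini-3-pro-image-preview"):
    """Call 1 removes overlay artifacts; call 2 produces the studio portrait.

    The reference glasses image goes into both calls so the frames stay unchanged.
    """
    cleaned = final_image_part(client.models.generate_content(
        model=model,
        contents=[CLEANUP_PROMPT, tryon_image, reference_glasses],
        config=config,
    ))
    return final_image_part(client.models.generate_content(
        model=model,
        contents=[PORTRAIT_PROMPT, cleaned, reference_glasses],
        config=config,
    ))
```

On the returned part, `inline_data.data` holds the encoded image bytes and `inline_data.mime_type` their format.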

Challenges I ran into

  • Perspective distortion: Selfies taken at arm's length suffer from strong close-range perspective distortion. I implemented a depth correction factor based on the camera's 35 mm-equivalent focal length.

  • Maintaining glasses identity: With early prompts, Gemini would subtly alter the design of the glasses. I solved this by always passing the reference glasses image and explicitly instructing the model to preserve its details.

  • Aspect ratio matching: Gemini 3's image output supports only a fixed set of aspect ratios, so I built auto-detection that picks the supported ratio closest to the input image's.
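The write-up doesn't spell out the depth correction, so here is one plausible reading under a pinhole-camera model. It is not the project's formula. The 35 mm-equivalent focal length (many phones record it in EXIF as FocalLengthIn35mmFilm) converts to pixels via the 43.27 mm diagonal of a 36 × 24 mm frame. The card's pixel width then gives its distance from the camera. A hypothetical `eye_offset_mm`, meaning how much farther from the camera the pupils are than the card, rescales the card-plane measurement to the eye plane.

```python
import math

FULL_FRAME_DIAGONAL_MM = math.hypot(36.0, 24.0)  # 35 mm film frame, about 43.27 mm
CARD_WIDTH_MM = 85.60  # ISO/IEC 7810 ID-1


def focal_length_px(f35_mm: float, image_w_px: int, image_h_px: int) -> float:
    """Convert a 35 mm-equivalent focal length into pixels for this image."""
    return f35_mm * math.hypot(image_w_px, image_h_px) / FULL_FRAME_DIAGONAL_MM


def depth_correction_factor(card_width_px: float, f_px: float,
                            eye_offset_mm: float) -> float:
    """Ratio of the mm-per-pixel scale at the eye plane to that at the card plane.

    Pinhole model: an object of width W at distance Z spans f * W / Z pixels,
    so Z_card = f * W_card / w_card. eye_offset_mm is a hypothetical parameter
    for the extra distance from the camera to the pupils.
    """
    z_card = f_px * CARD_WIDTH_MM / card_width_px
    return (z_card + eye_offset_mm) / z_card
```

Under this model, the corrected PD is the card-scale PD multiplied by `depth_correction_factor(...)`. A factor above 1 means the eyes are farther away than the card, so each pixel at the eye plane covers more millimetres.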
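The aspect-ratio auto-detection can be as simple as a nearest-neighbour search. Below is a sketch, assuming the list matches the ratios the Gemini image models accept (check the current API docs before relying on it). `closest_aspect_ratio` is an illustrative name.

```python
import math

# Assumed set of values accepted by ImageConfig.aspect_ratio; verify against the docs.
SUPPORTED_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]


def closest_aspect_ratio(width: int, height: int) -> str:
    """Pick the supported ratio nearest to width / height.

    Distances are compared in log space so that, for example, 2:1 and 1:2
    count as equally far from 1:1.
    """
    target = math.log(width / height)

    def distance(ratio: str) -> float:
        w, h = (int(v) for v in ratio.split(":"))
        return abs(math.log(w / h) - target)

    return min(SUPPORTED_RATIOS, key=distance)
```

With a PIL image `img`, this plugs into the existing config as `types.ImageConfig(aspect_ratio=closest_aspect_ratio(*img.size), image_size="2K")`, because `img.size` is `(width, height)`.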

What I learned

Gemini 3's multi-image understanding is genuinely impressive. Unlike traditional inpainting models, it can follow complex instructions like "keep the face identical, keep the glasses from the second image, but change the background", and it actually does so consistently.

What's next

  • Real-time video try-on using Gemini's streaming capabilities
  • Integration with e-commerce platforms via API
  • Support for sunglasses with different lens tints
