Real Virtual TryOn | VTO

Inspiration

Online eyewear shopping has a frustrating problem: 40% of glasses purchased online get returned because they don't fit properly. The core issue? Customers can't accurately measure their Pupillary Distance (PD) at home, and existing virtual try-on solutions look fake and unreliable.

I wanted to solve both problems: precise measurement AND realistic visualization.

What it does

Real Virtual TryOn is a complete pipeline that:

Measures your PD from a selfie, using a credit card as a size reference (±2 mm accuracy)
Positions glasses correctly based on your actual facial measurements
Generates photorealistic portraits using Gemini 3's image generation

The user takes a selfie holding a credit card, selects glasses, and gets a studio-quality image showing exactly how they'll look.

How I built it

The pipeline has two main stages:

Stage 1 - Measurement & Overlay:

SAM3 for credit card segmentation and scale calibration
MediaPipe/dlib for facial landmark detection
Custom perspective correction for tilted selfies
Precise positioning based on real mm-to-pixel ratios
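
The scale calibration step can be illustrated with a minimal sketch. It assumes the card is a standard ISO/IEC 7810 ID-1 card (85.60 mm wide) and that the segmentation and landmark stages have already produced the card's width in pixels and the two pupil centres. The function names, the diagonal-based conversion of the 35 mm equivalent focal length, and the `depth_ratio` parameter are illustrative assumptions based on a simple pinhole camera model, not the project's actual code.

```python
import math

# ISO/IEC 7810 ID-1 card width in millimetres (standard credit card size).
CARD_WIDTH_MM = 85.60
# Diagonal of a full-frame 35 mm sensor (36 mm x 24 mm).
FULL_FRAME_DIAGONAL_MM = math.hypot(36.0, 24.0)


def mm_per_pixel(card_width_px: float) -> float:
    """Scale at the card's depth: millimetres covered by one pixel."""
    if card_width_px <= 0:
        raise ValueError("card width must be positive")
    return CARD_WIDTH_MM / card_width_px


def focal_length_px(f35_mm: float, image_w: int, image_h: int) -> float:
    """Convert a 35 mm equivalent focal length to pixels (assumed: via the diagonal)."""
    return f35_mm * math.hypot(image_w, image_h) / FULL_FRAME_DIAGONAL_MM


def card_distance_mm(card_width_px: float, f_px: float) -> float:
    """Pinhole model: distance = focal length (px) * real width / width in pixels."""
    return f_px * CARD_WIDTH_MM / card_width_px


def pupillary_distance_mm(left_pupil, right_pupil, card_width_px, depth_ratio=1.0):
    """PD in millimetres from two (x, y) pupil centres in pixels.

    depth_ratio is (eye distance from camera) / (card distance from camera);
    it is 1.0 when the card is held in the same plane as the eyes.
    """
    pixel_distance = math.dist(left_pupil, right_pupil)
    return pixel_distance * mm_per_pixel(card_width_px) * depth_ratio


if __name__ == "__main__":
    f_px = focal_length_px(26.0, 3600, 2400)       # hypothetical phone camera
    z_card = card_distance_mm(428.0, f_px)         # card 428 px wide in the image
    ratio = (z_card + 20.0) / z_card               # eyes assumed 20 mm behind the card
    pd = pupillary_distance_mm((100, 200), (415, 200), 428.0, depth_ratio=ratio)
    print(f"card distance: {z_card:.0f} mm, PD: {pd:.1f} mm")
```

Under the pinhole model, the millimetres covered by one pixel grow linearly with distance from the camera, which is why a card held slightly in front of the face needs the `depth_ratio` correction before its scale is applied to the eyes.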

Stage 2 - Gemini 3 Post-processing:

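from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

# prompt, tryon_image and reference_glasses are prepared earlier in the pipeline
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[prompt, tryon_image, reference_glasses],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(aspect_ratio="4:3", image_size="2K")
    )
)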

Two API calls refine the output:

First call removes overlay artifacts while preserving glasses details
Second call generates a clean studio portrait
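
The two-call refinement can be sketched as a small chain in which the output of the cleanup pass feeds the portrait pass. This is only an outline under stated assumptions: `generate` is a hypothetical wrapper around the Gemini call that takes a prompt plus a list of images and returns a single image, and both prompt strings are illustrative placeholders, not the prompts used in the project.

```python
from typing import Any, Callable, Sequence

# Hypothetical wrapper signature: (prompt, images) -> one generated image.
GenerateFn = Callable[[str, Sequence[Any]], Any]

# Illustrative prompts only; the project's real prompts are not shown in the write-up.
CLEANUP_PROMPT = (
    "Remove overlay artifacts around the glasses in the first image. "
    "Keep the face identical and keep the glasses exactly as in the second image."
)
PORTRAIT_PROMPT = (
    "Turn the first image into a clean studio portrait. "
    "Keep the face identical and keep the glasses exactly as in the second image."
)


def refine_tryon(generate: GenerateFn, tryon_image: Any, reference_glasses: Any) -> Any:
    """Run the cleanup pass, then the studio-portrait pass on its output."""
    # Pass 1: clean up the raw overlay, anchored on the reference glasses.
    cleaned = generate(CLEANUP_PROMPT, [tryon_image, reference_glasses])
    # Pass 2: restyle the cleaned image, again passing the reference glasses
    # so the frame design is less likely to drift between calls.
    return generate(PORTRAIT_PROMPT, [cleaned, reference_glasses])
```

Keeping the model call behind a plain callable makes the chain easy to exercise with a stub before spending API quota.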

Challenges I ran into

Perspective distortion: Selfies taken at arm's length show significant perspective distortion, so anything closer to the camera appears larger than it really is. I implemented a depth correction factor based on the 35 mm equivalent focal length.
Maintaining glasses identity: Early Gemini prompts would subtly change the glasses design. I solved this by always passing the reference glasses image and explicitly instructing the model to preserve its details.
Aspect ratio matching: Gemini 3 only supports specific aspect ratios. I built auto-detection to find the closest match to the input image.
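
A closest-match lookup for the aspect ratio might look like the sketch below. The list of supported ratios is an assumption and should be checked against the current Gemini image generation documentation; the function name and the choice to compare ratios on a log scale (so that, for example, 2:1 and 1:2 are treated as equally far from 1:1) are illustrative.

```python
import math

# Assumed set of supported ratios; verify against the current API documentation.
SUPPORTED_RATIOS = (
    "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
)


def closest_aspect_ratio(width: int, height: int, supported=SUPPORTED_RATIOS) -> str:
    """Return the supported "W:H" label nearest to the input image's ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(label: str) -> float:
        w, h = label.split(":")
        # Compare in log space so wide and tall deviations are symmetric.
        return abs(math.log(int(w) / int(h)) - target)

    return min(supported, key=distance)


if __name__ == "__main__":
    # A portrait phone selfie and a landscape camera frame as examples.
    print(closest_aspect_ratio(1080, 1920))
    print(closest_aspect_ratio(4000, 3000))
```

The returned label can be passed directly as the `aspect_ratio` value in the image configuration.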

What I learned

Gemini 3's multi-image understanding is genuinely impressive. Unlike traditional inpainting models, it can follow complex instructions like "keep the face identical, keep the glasses from the second image, but change the background" - and actually do it consistently.

What's next

Real-time video try-on using Gemini's streaming capabilities
Integration with e-commerce platforms via API
Support for sunglasses with different lens tints

Built With

amazon-web-services
dlib
gemini-3-api
mediapipe
numpy
opencv
pillow
python
pytorch
s3
sam3

Updates

resshara Ali Asia started this project — Feb 08, 2026 08:21 AM EST
