Inspiration
Online eyewear shopping has a frustrating problem: 40% of glasses purchased online get returned because they don't fit properly. The core issue? Customers can't accurately measure their Pupillary Distance (PD) at home, and existing virtual try-on solutions look fake and give no reliable sense of fit.
I wanted to solve both problems: precise measurement AND realistic visualization.
What it does
Real Virtual TryOn is a complete pipeline that:
- Measures your PD precisely from selfies using a credit card as reference (±2mm accuracy)
- Positions glasses correctly based on your actual facial measurements
- Generates photorealistic portraits using Gemini 3's image generation
The user takes a selfie holding a credit card, selects glasses, and gets a studio-quality image showing exactly how they'll look.
How I built it
The pipeline has two main stages:
Stage 1 - Measurement & Overlay:
- SAM3 for credit card segmentation and scale calibration
- MediaPipe/dlib for facial landmark detection
- Custom perspective correction for tilted selfies
- Precise positioning based on real mm-to-pixel ratios
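The scale calibration step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the card segmentation yields the card's width in pixels, that the card faces the camera, and that the landmark detector returns pupil centers as (x, y) pixel coordinates. The function names are hypothetical. The 85.60 mm width comes from the ISO/IEC 7810 ID-1 card format.

```python
import math

# Width of a standard ID-1 card (credit card) in millimetres, per ISO/IEC 7810.
CARD_WIDTH_MM = 85.60


def px_per_mm(card_width_px: float) -> float:
    """Image scale at the card's depth, in pixels per millimetre."""
    if card_width_px <= 0:
        raise ValueError("card_width_px must be positive")
    return card_width_px / CARD_WIDTH_MM


def pupillary_distance_mm(left_pupil, right_pupil, card_width_px: float) -> float:
    """Convert the pixel distance between pupil centers to millimetres."""
    pd_px = math.dist(left_pupil, right_pupil)
    return pd_px / px_per_mm(card_width_px)


def frame_width_px(frame_width_mm: float, card_width_px: float) -> float:
    """Width in pixels at which to draw a frame of known physical width."""
    return frame_width_mm * px_per_mm(card_width_px)
```

With a card measured at 428 px wide, the scale is about 5 px/mm, so pupils 315 px apart correspond to a PD of roughly 63 mm, and a 140 mm frame would be drawn about 700 px wide.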
Stage 2 - Gemini 3 Post-processing:
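from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

# prompt is the instruction text; tryon_image and reference_glasses are images (e.g. PIL)
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[prompt, tryon_image, reference_glasses],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(aspect_ratio="4:3", image_size="2K"),
    ),
)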
Two API calls refine the output:
- First call removes overlay artifacts while preserving glasses details
- Second call generates a clean studio portrait
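The two-call chain could be organized roughly like this. It is a sketch under stated assumptions: `generate` is a hypothetical callable that wraps the Gemini request and returns the generated image, and both prompt strings are illustrative placeholders, not the prompts used in the project. The point it shows is that the reference glasses image is passed to both calls, so the model always sees the original design.

```python
from typing import Any, Callable, Sequence

# Illustrative placeholder prompts (assumptions, not the project's real prompts).
CLEANUP_PROMPT = (
    "Remove overlay artifacts from the first image. Keep the face identical "
    "and keep the glasses exactly as shown in the second image."
)
PORTRAIT_PROMPT = (
    "Turn the first image into a clean studio portrait. Keep the face identical "
    "and keep the glasses exactly as shown in the second image."
)


def refine_tryon(
    generate: Callable[[Sequence[Any]], Any],
    tryon_image: Any,
    reference_glasses: Any,
) -> Any:
    """Run the two refinement passes and return the final image.

    `generate` takes the list of contents (prompt plus images) and returns
    the generated image; how it calls the model is left to the caller.
    """
    # Pass 1: clean up the raw overlay, anchored on the reference glasses.
    cleaned = generate([CLEANUP_PROMPT, tryon_image, reference_glasses])
    # Pass 2: produce the studio portrait from the cleaned image,
    # passing the reference glasses again so the design does not drift.
    return generate([PORTRAIT_PROMPT, cleaned, reference_glasses])
```

Injecting `generate` keeps the chaining logic separate from the API client, which makes it possible to exercise the flow with a stub before spending API calls.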
Challenges I ran into
Perspective distortion: Selfies taken at arm's length show significant perspective distortion, so objects at different depths appear at different scales. I implemented a depth correction factor based on the 35mm equivalent focal length.
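One way such a correction can work is sketched below with a simple pinhole camera model. This is not necessarily the project's method, and it rests on several assumptions: the 35mm equivalent focal length is converted to pixels via the frame diagonal, the card faces the camera, and the eyes sit a known distance `face_offset_mm` behind the card (a hypothetical parameter). All function names are illustrative.

```python
import math

CARD_WIDTH_MM = 85.60  # ISO/IEC 7810 ID-1 card width
# Diagonal of a 36 mm x 24 mm full-frame sensor, about 43.27 mm.
FULL_FRAME_DIAGONAL_MM = math.hypot(36.0, 24.0)


def focal_length_px(f35_mm: float, image_width_px: int, image_height_px: int) -> float:
    """Approximate focal length in pixels from the 35mm equivalent focal length.

    Assumes the equivalence is defined on the sensor diagonal.
    """
    diagonal_px = math.hypot(image_width_px, image_height_px)
    return f35_mm * diagonal_px / FULL_FRAME_DIAGONAL_MM


def corrected_pd_mm(pd_px: float, card_width_px: float, f_px: float,
                    face_offset_mm: float) -> float:
    """PD in mm, corrected for the card and the eyes being at different depths."""
    # Pinhole model: size_px = f_px * size_mm / distance_mm.
    card_distance_mm = f_px * CARD_WIDTH_MM / card_width_px
    face_distance_mm = card_distance_mm + face_offset_mm
    return pd_px * face_distance_mm / f_px
```

When `face_offset_mm` is zero, the result reduces to the plain card-based conversion `pd_px * 85.60 / card_width_px`. A positive offset scales the measurement up, because the eyes are farther from the camera than the card and therefore appear smaller.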
Maintaining glasses identity: Early Gemini prompts would subtly change the glasses design. I solved this by always passing the reference glasses image and explicitly instructing the model to preserve its details.
Aspect ratio matching: Gemini 3 only supports specific aspect ratios. I built auto-detection to find the closest match to the input image.
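The auto-detection can be sketched as a nearest-match search. The list of supported ratios below is an assumption and should be checked against the current Gemini image generation documentation. Comparing in log space is a design choice of this sketch: it treats a ratio and its inverse symmetrically, so portrait and landscape inputs are matched equally well.

```python
import math

# Assumed set of supported aspect ratios; verify against the current API docs.
SUPPORTED_RATIOS = (
    "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
)


def closest_aspect_ratio(width: int, height: int, supported=SUPPORTED_RATIOS) -> str:
    """Return the supported 'W:H' label closest to the image's aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(label: str) -> float:
        w, h = label.split(":")
        return abs(math.log(int(w) / int(h)) - target)

    return min(supported, key=distance)
```

For example, a 4000x3000 input maps to "4:3" and a 1080x1920 portrait maps to "9:16". The returned label can be passed directly as the `aspect_ratio` value in the image config.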
What I learned
Gemini 3's multi-image understanding is genuinely impressive. Unlike traditional inpainting models, it can follow complex instructions like "keep the face identical, keep the glasses from the second image, but change the background" - and actually do it consistently.
What's next
- Real-time video try-on using Gemini's streaming capabilities
- Integration with e-commerce platforms via API
- Support for sunglasses with different lens tints
Built With
- amazon-web-services
- dlib
- gemini-3-api
- mediapipe
- numpy
- opencv
- pillow
- python
- pytorch
- s3
- sam3