Inspiration
Perfect Corp's YouCam APIs cover makeup, hair color, skin smoothing, and accessories, but each call wants a specific endpoint with a specific arg shape. A real shopper does not think in API names. They say "summer beach look, less makeup, lighter hair." There is a gap between the way a shopper talks and the way the API listens. This concierge bridges that gap with one chat turn and one ordered plan of calls. The goal was a demo that turns one English sentence into a sequence of real YouCam transformations and shows the user the before, the after, and every intermediate state so the chain is visible.
What it does
The user describes a vibe in plain English. An intent extractor running Gemini 2.0 Flash on Vertex AI classifies the request into the right YouCam capabilities and emits a typed IntentPlan. A planner orders the API calls so each transformation runs on the output of the previous one (skin smoothing first, then makeup, then hair color, then accessories), keeping the visual delta clean. The Streamlit app shows the source selfie, the planned sequence of calls, and a side-by-side before/after with each intermediate state visible. Every API call is rate-tracked and a per-session cost panel shows the user what the makeover would actually cost to run at scale.
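The fixed ordering rule can be sketched in a few lines. This is a minimal illustration, not the project's actual planner: the capability labels and the `order_steps` helper are hypothetical names chosen for this example, and they are not YouCam endpoint names.

```python
from typing import List

# Fixed stage order described above. These labels are hypothetical,
# not real YouCam endpoint names.
STAGE_ORDER = ["skin_smoothing", "makeup", "hair_color", "accessories"]


def order_steps(requested: List[str]) -> List[str]:
    """Return the requested capabilities sorted into the fixed pipeline order."""
    unknown = [step for step in requested if step not in STAGE_ORDER]
    if unknown:
        raise ValueError(f"unsupported capabilities: {unknown}")
    # De-duplicate, then sort by each capability's position in STAGE_ORDER.
    return sorted(set(requested), key=STAGE_ORDER.index)
```

Under these assumptions, a plan that asks for hair color, then skin smoothing, then makeup comes back as skin smoothing, makeup, hair color, whatever order the extractor emitted.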
How I built it
Python 3.10+ with a Streamlit front end. Gemini 2.0 Flash on Vertex AI handles intent extraction; the YouCam API wrapper handles execution. Cost rates ship in rates.json so the demo self-prices without an external call. The observability layer captures every call as a trace row that feeds the in-app cost panel. The intent extractor emits a typed IntentPlan object that the planner validates against a schema before ordering the calls, so a bad classification never wastes YouCam API quota.
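As a rough sketch of the validate-before-calling idea, the snippet below checks a decoded JSON dict and builds a typed plan using only the standard library. The field names (`vibe`, `steps`, `capability`, `params`), the capability labels, and the `parse_intent_plan` helper are assumptions made for illustration; the project's real IntentPlan schema may look different.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Hypothetical capability labels; the real schema may use other names.
ALLOWED_CAPABILITIES = {"skin_smoothing", "makeup", "hair_color", "accessories"}


@dataclass(frozen=True)
class IntentStep:
    capability: str
    params: Dict[str, Any]


@dataclass(frozen=True)
class IntentPlan:
    vibe: str
    steps: List[IntentStep]


def parse_intent_plan(raw: Dict[str, Any]) -> IntentPlan:
    """Validate a decoded JSON dict and build an IntentPlan, or raise ValueError."""
    vibe = raw.get("vibe")
    if not isinstance(vibe, str) or not vibe.strip():
        raise ValueError("plan needs a non-empty 'vibe' string")
    raw_steps = raw.get("steps")
    if not isinstance(raw_steps, list) or not raw_steps:
        raise ValueError("plan needs a non-empty 'steps' list")
    steps: List[IntentStep] = []
    for item in raw_steps:
        if not isinstance(item, dict):
            raise ValueError("each step must be an object")
        capability = item.get("capability")
        if not isinstance(capability, str) or capability not in ALLOWED_CAPABILITIES:
            raise ValueError(f"unknown capability: {capability!r}")
        params = item.get("params", {})
        if not isinstance(params, dict):
            raise ValueError("'params' must be an object")
        steps.append(IntentStep(capability, params))
    return IntentPlan(vibe, steps)
```

Because the parser raises before anything is executed, a plan with an unrecognized capability fails here and no API quota is spent on it.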
Challenges I ran into
Mapping vague consumer language to a discrete API surface was the hard part. "Beachy" maps to lighter hair plus warmer makeup plus brighter skin, but only sometimes. I solved this by having Gemini emit a typed IntentPlan object validated against a schema, then logging the per-vibe routing decisions so the prompts could be tuned against real shopper phrases. The second hard part was call ordering: makeup applied before skin smoothing looked fake, and hair color applied after accessories sometimes clipped the hairline. The fixed ordering rule in the planner solved both problems.
Accomplishments I'm proud of
Chaining the calls so the smoothed skin feeds into the makeup pass, which feeds into the hair pass, and seeing the gallery animate in the demo. The intent extractor reliably classifies one-sentence vibes into the right four-or-five-step makeover. The cost panel makes the per-session price visible so a real product team can reason about quotas.
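The chaining pattern, with every intermediate state kept for the gallery and one trace row per call for the cost panel, might look roughly like this. It is a sketch under stated assumptions: `run_chain` is a hypothetical helper, the transforms are stand-in callables rather than real YouCam calls, and the flat step-to-price `rates` mapping is an assumed shape for rates.json.

```python
from typing import Callable, Dict, List, Tuple

# A transform takes image bytes and returns new image bytes.
# Real YouCam calls would be wrapped to fit this (assumed) signature.
Transform = Callable[[bytes], bytes]


def run_chain(
    source: bytes,
    steps: List[Tuple[str, Transform]],
    rates: Dict[str, float],
) -> Tuple[List[Tuple[str, bytes]], List[Dict[str, object]], float]:
    """Apply each step to the previous output, keeping states and trace rows."""
    states: List[Tuple[str, bytes]] = [("source", source)]
    trace: List[Dict[str, object]] = []
    total = 0.0
    current = source
    for name, transform in steps:
        current = transform(current)      # output of one step feeds the next
        states.append((name, current))    # intermediate state for the gallery
        cost = rates.get(name, 0.0)       # steps without a listed rate cost 0
        trace.append({"step": name, "cost": cost})
        total += cost
    return states, trace, total
```

Keeping `states` as an ordered list means the gallery can render the source, each intermediate image, and the final result without re-running any call.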
What I learned
Consumer-facing AI demos live or die on the latency of the visible feedback. The side-by-side gallery with each intermediate state was the unlock. The user gets to see why each step happened, not just the final image. The typed IntentPlan also turned out to be the right place to enforce safety: malformed plans get rejected before any YouCam call goes out.
What's next for Perfect Corp Try-On Concierge
Wire the live YouCam API key into a deployed Cloud Run instance so the demo runs end-to-end without a fake provider. Add a save-the-look feature that exports the IntentPlan plus the gallery as a shareable link. Add a second intent class for "occasion shopping" (wedding, interview, beach) that pre-loads vibe presets.
