StyleVision – The World's First Agentic Fashion Mirror
Inspiration
We’ve all experienced the "mirror moment"—that flicker of doubt where you stare at your reflection wondering, "Does this actually work?" or "What if this jacket were leather instead of denim?" We realized that while AI chatbots exist, they lack the physicality and immediacy of a real-world styling session. We wanted to build a "Magic Mirror" that doesn't just reflect who you are, but reimagines who you could be, democratizing the luxury of a professional personal stylist through ambient, hands-free technology.
What it does
StyleVision is a multimodal, agentic fashion mirror powered by Google’s Gemini Live API.
- Real-Time Stylist: It sees what you're wearing via a live video feed and converses with you in real time; no buttons, just natural voice.
- The "Imagine" Engine: Users can verbally request outfit swaps (e.g., "Show me this in emerald green velvet"). The AI captures a frame and generates a photorealistic visualization of the user in that exact outfit while maintaining face and pose consistency.
- Proactive Agency: It’s not a passive bot; it proactively manages tools, deciding when to take photos, generate looks, or trigger a "Runway Slideshow" to compare options side-by-side.
How we built it
The backbone of StyleVision is the Google Gemini Multimodal Live API over WebSockets; minimal sketches of the key pieces follow the list below.
- Frontend: Built with React 19 and TypeScript for a slick, responsive UI.
- Audio/Video Pipeline: We implemented raw 16-bit LPCM audio streaming to ensure ultra-low latency, bypassing standard STT/TTS for a native, emotional voice experience.
- Image Generation: We utilized Gemini 2.5 Flash Image combined with custom prompt engineering to handle the generative virtual try-on.
- Agentic Logic: We used Dynamic Tool Calling to allow the agent to manipulate the UI, trigger the camera, and manage the "Runway" comparison montage.
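For the curious, here is a minimal sketch of how the agentic session wires together, assuming the `@google/genai` JavaScript SDK. The tool names (`capture_frame`, `generate_look`, `start_runway`), the model ID, and the `dispatchTool` helper are illustrative stand-ins, not our exact production code:

```typescript
import { GoogleGenAI, Modality, Type } from "@google/genai";

// Hypothetical app-side handler that actually executes each tool
// (drives the camera, UI, etc.).
declare function dispatchTool(name: string, args: unknown): unknown;

const ai = new GoogleGenAI({ apiKey: "YOUR_API_KEY" });

// Tools the agent can invoke on its own: snap a frame, render an
// outfit swap, or kick off the side-by-side "Runway" comparison.
const functionDeclarations = [
  { name: "capture_frame", description: "Capture a still from the live camera feed." },
  {
    name: "generate_look",
    description: "Render the user wearing a requested outfit variation.",
    parameters: {
      type: Type.OBJECT,
      properties: { outfitDescription: { type: Type.STRING } },
      required: ["outfitDescription"],
    },
  },
  { name: "start_runway", description: "Show generated looks side by side." },
];

const session = await ai.live.connect({
  model: "gemini-2.5-flash-preview-native-audio-dialog", // illustrative model ID
  config: {
    responseModalities: [Modality.AUDIO],
    tools: [{ functionDeclarations }],
  },
  callbacks: {
    onmessage: (msg) => {
      // The model decides when to use a tool; we run it and reply.
      for (const call of msg.toolCall?.functionCalls ?? []) {
        const result = dispatchTool(call.name!, call.args);
        session.sendToolResponse({
          functionResponses: [{ id: call.id, name: call.name, response: { result } }],
        });
      }
    },
    onerror: (e) => console.error("live session error", e),
    onclose: () => console.log("live session closed"),
  },
});
```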
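The raw-audio path, sketched under the same assumptions (the `session` object comes from the snippet above; real code would also handle mic capture and resampling to 16 kHz):

```typescript
// Convert Float32 samples from the Web Audio graph into the 16-bit
// little-endian PCM the Live API expects.
function floatTo16BitPCM(float32: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(float32.length * 2);
  const view = new DataView(buffer);
  float32.forEach((sample, i) => {
    const s = Math.max(-1, Math.min(1, sample)); // clamp to [-1, 1]
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  });
  return buffer;
}

function base64Encode(buffer: ArrayBuffer): string {
  let binary = "";
  for (const byte of new Uint8Array(buffer)) binary += String.fromCharCode(byte);
  return btoa(binary);
}

// Called for each microphone chunk (e.g. from an AudioWorklet at 16 kHz).
function sendAudioChunk(chunk: Float32Array) {
  session.sendRealtimeInput({
    audio: {
      data: base64Encode(floatTo16BitPCM(chunk)),
      mimeType: "audio/pcm;rate=16000",
    },
  });
}
```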
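The try-on generation itself is a single multimodal call: captured frame in, edited image out. A hedged sketch (the model ID and prompt wording are illustrative, not our tuned production prompt):

```typescript
// Generative try-on: send the captured frame plus an edit instruction
// to the image model and pull the generated image from the response.
async function generateLook(
  frameBase64: string,
  outfit: string,
): Promise<string | undefined> {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash-image",
    contents: [
      { inlineData: { mimeType: "image/jpeg", data: frameBase64 } },
      {
        text:
          `Re-render this exact person wearing ${outfit}. ` +
          `Keep the face, body pose, lighting, and background unchanged.`,
      },
    ],
  });
  // The generated image comes back as an inlineData part.
  for (const part of response.candidates?.[0]?.content?.parts ?? []) {
    if (part.inlineData?.data) return part.inlineData.data; // base64 image
  }
  return undefined;
}
```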
Challenges we ran into
The biggest hurdle was latency and state management. Maintaining a live audio/video WebSocket while simultaneously triggering heavy generative image tasks required a carefully layered asynchronous architecture. We also wrestled with voice activity detection (VAD): ensuring the AI could handle "barge-in" interruptions naturally, so the session felt like a human conversation rather than a walkie-talkie exchange.
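For barge-in specifically, the Live API flags the interruption server-side; the client's job is simply to throw away stale audio. A minimal sketch, with hypothetical names for our player state:

```typescript
import type { LiveServerMessage } from "@google/genai";

// Hypothetical local playback state: decoded chunks awaiting
// scheduling, plus the Web Audio node currently playing.
const audioQueue: AudioBuffer[] = [];
let currentSource: AudioBufferSourceNode | null = null;

// Called from the live session's onmessage callback.
function handleBargeIn(msg: LiveServerMessage) {
  // The server sets serverContent.interrupted when the user's voice
  // cuts in over the model's reply; everything buffered locally is
  // now stale and must be dropped so the mirror falls silent at once.
  if (msg.serverContent?.interrupted) {
    audioQueue.length = 0; // flush chunks not yet scheduled
    currentSource?.stop(); // cut off the chunk mid-playback
    currentSource = null;
  }
}
```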
Accomplishments that we're proud of
We are incredibly proud of achieving a Native Multimodal Loop. Most apps "fake" multimodality by stitching together different APIs; StyleVision feels like a single, living entity that sees, hears, and creates simultaneously. Achieving "Face & Pose Consistency" during generative swaps was a major win, as it moves the app from a "fun filter" to a legitimate fashion tool.
What we learned
Building StyleVision taught us that the future of AI isn't in text boxes—it's in Ambient Computing. We learned how to optimize binary data streams for real-time performance and realized that "Agentic" behavior (the AI choosing its own tools) creates a much more "magical" user experience than traditional command-based interfaces.
What's next for StyleVision – The World's First Agentic Fashion Mirror
The vision is to move from a personal tool to a Retail Revolution. We plan to integrate with brand catalogs so users can say, "Show me this with those boots from the new Zara collection," and see it instantly. We are also looking into long-term memory integration to allow the mirror to "remember" your existing wardrobe, offering suggestions based on what you already own and your personal style evolution over time.
Built With
- gemini-function-calling
- gemini-live-api
- google-ai-studio
- nano-banana