Inspiration

We lie to ourselves constantly. In job interviews, we project confidence while our eyes betray anxiety. On first dates, we say "I'm fine" while our shoulders tense up. In difficult conversations, we avoid the real issue while our faces scream discomfort.

Traditional self-improvement apps validate us. They tell us what we want to hear. But real growth requires confrontation with uncomfortable truths.

Inspired by emotional unlocking techniques used in high-performance coaching, we asked: what if AI could be the brutally honest mirror we need?

What it does

Unlok uses Gemini 3's native multimodal video understanding to analyze practice sessions across six life scenarios:

  • 🪞 Emotional Mirror — General self-discovery and awareness
  • 💼 Job Interview — Practice professional presence
  • 🚀 Business Pitch — Rehearse investor presentations
  • 💕 First Date — Develop authentic conversation skills
  • 😰 Difficult Conversation — Prepare for confrontational discussions
  • 🎤 Public Speaking — Master stage presence

Users record themselves speaking (15-60 seconds), and Gemini 3 analyzes facial micro-expressions, voice tone, posture, and body language. The AI delivers confrontational insights — not comfort, but truth.

Feedback is delivered via text AND synthesized audio, creating an intimate coaching experience.

Supports 11 languages with full UI translation and real-time AI insight localization.

How we built it

Core Architecture:

  • Gemini 3 Pro for deep multimodal video analysis (2M token context)
  • Gemini 3 Flash for real-time translation of AI insights
  • Google Cloud Text-to-Speech for audio feedback generation
  • Next.js 15 with App Router and React 19
  • MediaRecorder API for browser-based video capture
  • Zustand for lightweight state management

Key Technical Decisions:

  1. Native video-to-Gemini pipeline: We send the raw video blob directly to Gemini's multimodal endpoint instead of extracting frames, enabling holistic temporal analysis of emotional patterns.

  2. BYOK (Bring Your Own Key): Users input their own Gemini API key, making the app free to operate while preserving privacy and giving users cost control.

  3. Confrontational prompt engineering: Our system prompts are designed to bypass LLM politeness defaults, pushing toward uncomfortable but growth-oriented revelations.

  4. On-the-fly localization: UI is pre-translated to 11 languages, but AI-generated insights are translated in real-time using Gemini Flash for natural, context-aware localization.
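Decisions 1 and 2 can be sketched together: the raw recording is base64-encoded and sent inline to the Gemini REST endpoint, authenticated with the user's own key. This is a minimal illustration, not the actual Unlok source; the helper names are ours, and the model id simply mirrors the write-up above.

```typescript
// Sketch of the native video-to-Gemini pipeline plus BYOK.
// Helper names are illustrative; only the REST shape is real.
const GEMINI_URL = "https://generativelanguage.googleapis.com/v1beta/models";

// Build the multimodal request body: video and prompt travel together,
// so the model can reason over the whole clip, not isolated frames.
export function buildVideoRequest(base64Video: string, prompt: string) {
  return {
    contents: [
      {
        parts: [
          { inline_data: { mime_type: "video/webm", data: base64Video } },
          { text: prompt },
        ],
      },
    ],
  };
}

// BYOK: the key lives only in the browser and is appended directly to
// the request URL -- no backend ever sees the key or the video.
export async function analyzeSession(blob: Blob, apiKey: string, prompt: string) {
  const base64 = Buffer.from(await blob.arrayBuffer()).toString("base64");
  const res = await fetch(
    `${GEMINI_URL}/gemini-3-pro:generateContent?key=${apiKey}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(buildVideoRequest(base64, prompt)),
    },
  );
  const data = await res.json();
  return data.candidates?.[0]?.content?.parts?.[0]?.text as string;
}
```

Because the whole clip arrives as one inline part, temporal cues (a smile that fades mid-sentence, posture that collapses at a hard question) stay intact for the model.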

Challenges we ran into

Making AI honest, not nice: LLMs default to validation and comfort. Engineering prompts that produce confrontational-but-constructive feedback required extensive iteration and cross-cultural testing.

Video compression tradeoffs: Browser-recorded videos can exceed API limits. We optimized MediaRecorder settings (VP9 codec, 720p, controlled bitrate) to balance quality with size.
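The tuning described above can be sketched as capture constraints plus recorder options. The 720p cap and VP9 preference come from the write-up; the exact bitrates here are illustrative assumptions, not Unlok's shipped values.

```typescript
// Sketch of the recorder tuning: cap resolution at the source and
// keep the bitrate controlled so a 60-second clip stays small.
// Bitrate numbers are illustrative, not the project's actual settings.

// getUserMedia constraints: ask for at most 720p video plus audio.
export const captureConstraints = {
  video: { width: { ideal: 1280 }, height: { ideal: 720 } },
  audio: true,
};

// MediaRecorder options: prefer VP9 for quality-per-byte, fall back
// to the browser's default WebM profile where VP9 is unsupported.
export function recorderOptions() {
  const preferred = "video/webm;codecs=vp9,opus";
  const MR = (globalThis as any).MediaRecorder; // absent outside the browser
  return {
    mimeType: MR?.isTypeSupported?.(preferred) ? preferred : "video/webm",
    videoBitsPerSecond: 2_500_000,
    audioBitsPerSecond: 128_000,
  };
}
```

At ~2.5 Mbps, a 60-second clip lands around 20 MB, which keeps typical uploads within inline-payload limits without visibly degrading facial detail.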

Emotional universality vs. cultural context: Micro-expressions have universal elements but cultural variations. We tuned analysis prompts to acknowledge this nuance.

Audio-text synchronization: Generating TTS that matches the emotional weight of written insights required careful voice selection and pacing adjustments per language.
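The per-language voice and pacing tuning might look like the sketch below, building a Cloud Text-to-Speech `text:synthesize` request body. The voice names follow real Cloud TTS naming conventions, but the specific voices and speaking rates are illustrative assumptions, not Unlok's actual configuration.

```typescript
// Sketch of per-language TTS tuning. Voice ids and rates are
// illustrative; only the text:synthesize request shape is real.
type VoiceTuning = { name: string; speakingRate: number };

// Slightly slower delivery where translated insights run longer,
// so the audio keeps the same deliberate, direct weight.
const voiceByLanguage: Record<string, VoiceTuning> = {
  en: { name: "en-US-Neural2-D", speakingRate: 0.95 },
  "pt-BR": { name: "pt-BR-Neural2-B", speakingRate: 0.92 },
  ja: { name: "ja-JP-Neural2-C", speakingRate: 0.9 },
};

// Build the JSON body for POST /v1/text:synthesize.
export function buildTtsRequest(text: string, lang: string) {
  const tuning = voiceByLanguage[lang] ?? voiceByLanguage.en;
  return {
    input: { text },
    voice: { languageCode: tuning.name.slice(0, 5), name: tuning.name },
    audioConfig: {
      audioEncoding: "MP3",
      speakingRate: tuning.speakingRate,
    },
  };
}
```

Keeping the tuning table per language makes the "same insight, same weight" goal testable: each locale gets an explicit voice and pace rather than inheriting English defaults.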

Accomplishments that we're proud of

  • 11 languages fully supported (EN, PT-BR, ES, FR, DE, JA, ZH, HE, HI, KO, AR)
  • Sub-30-second analysis for typical recordings
  • Audio feedback that transforms text insights into personal coaching moments
  • Zero-backend architecture: Runs entirely client-side + direct Gemini API calls
  • BYOK model: Sustainable, privacy-first, infinitely scalable
  • Successfully built an AI that tells users what they don't want to hear

What we learned

  1. Gemini 3's multimodal capabilities are transformative: Sending raw video and receiving nuanced emotional analysis feels like magic. The model understands context across time, not just individual frames.

  2. Confrontation requires trust: Users need psychological safety before accepting hard truths. The UX journey from onboarding to revelation is as important as the AI itself.

  3. Audio changes everything: Reading "you're avoiding eye contact" is informative. Hearing it spoken directly to you is transformative.

  4. i18n ≠ translation: True localization means adapting emotional concepts and cultural contexts, not just words.

What's next for Unlok Cam

  • Progress tracking: Session history with visual comparison of emotional patterns over time
  • AI coaching conversations: Interactive follow-up dialogues about detected patterns
  • Talking avatar feedback: Animated AI coach face delivering insights
  • Team/Enterprise mode: Practice workplace conversations with feedback for both parties
  • Therapist/Coach integration: Export session data to human professionals
  • Mobile app: Native iOS/Android for on-the-go practice

Built With

  • cloud
  • gemini-3-flash
  • gemini-3-pro
  • google
  • google-cloud-text-to-speech
  • mediarecorder-api
  • next.js-15
  • react-19
  • tailwind-css
  • typescript
  • vercel
  • web-audio-api
  • zustand