Inspiration

We lie to ourselves constantly. In job interviews, we project confidence while our eyes betray anxiety. On first dates, we say "I'm fine" while our shoulders tense up. In difficult conversations, we avoid the real issue while our faces scream discomfort.

Traditional self-improvement apps validate us. They tell us what we want to hear. But real growth requires confrontation with uncomfortable truths.

Inspired by emotional unlocking techniques used in high-performance coaching, we asked: what if AI could be the brutally honest mirror we need?

What it does

Unlok uses Gemini 3's native multimodal video understanding to analyze practice sessions across six life scenarios:

  • 🪞 Emotional Mirror — General self-discovery and awareness
  • 💼 Job Interview — Practice professional presence
  • 🚀 Business Pitch — Rehearse investor presentations
  • 💕 First Date — Develop authentic conversation skills
  • 😰 Difficult Conversation — Prepare for confrontational discussions
  • 🎤 Public Speaking — Master stage presence

Users record themselves speaking (15-60 seconds), and Gemini 3 analyzes facial micro-expressions, voice tone, posture, and body language. The AI delivers confrontational insights — not comfort, but truth.

Feedback is delivered via text AND synthesized audio, creating an intimate coaching experience.

Supports 11 languages with full UI translation and real-time AI insight localization.

How we built it

Core Architecture:

  • Gemini 3 Pro for deep multimodal video analysis (2M token context)
  • Gemini 3 Flash for real-time translation of AI insights
  • Google Cloud Text-to-Speech for audio feedback generation
  • Next.js 15 with App Router and React 19
  • MediaRecorder API for browser-based video capture
  • Zustand for lightweight state management

Key Technical Decisions:

  1. Native video-to-Gemini pipeline: We send the raw video blob directly to Gemini's multimodal endpoint instead of extracting frames, enabling holistic temporal analysis of emotional patterns.

  2. BYOK (Bring Your Own Key): Users input their own Gemini API key, making the app free to operate while preserving privacy and giving users cost control.

  3. Confrontational prompt engineering: Our system prompts are designed to bypass LLM politeness defaults, pushing toward uncomfortable but growth-oriented revelations.

  4. On-the-fly localization: UI is pre-translated to 11 languages, but AI-generated insights are translated in real-time using Gemini Flash for natural, context-aware localization.
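Decisions 1 and 2 can be sketched together: the raw recording is base64-encoded and sent inline to the Gemini REST endpoint, authenticated with the user's own key. This is a minimal illustration, not the actual Unlok source; the helper names are ours, and the model id simply mirrors the write-up above.

```typescript
// Sketch of the native video-to-Gemini pipeline plus BYOK.
// Helper names are illustrative; only the REST shape is real.
const GEMINI_URL = "https://generativelanguage.googleapis.com/v1beta/models";

// Build the multimodal request body: video and prompt travel together,
// so the model can reason over the whole clip, not isolated frames.
export function buildVideoRequest(base64Video: string, prompt: string) {
  return {
    contents: [
      {
        parts: [
          { inline_data: { mime_type: "video/webm", data: base64Video } },
          { text: prompt },
        ],
      },
    ],
  };
}

// BYOK: the key lives only in the browser and is appended directly to
// the request URL -- no backend ever sees the key or the video.
export async function analyzeSession(blob: Blob, apiKey: string, prompt: string) {
  const base64 = Buffer.from(await blob.arrayBuffer()).toString("base64");
  const res = await fetch(
    `${GEMINI_URL}/gemini-3-pro:generateContent?key=${apiKey}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(buildVideoRequest(base64, prompt)),
    },
  );
  const data = await res.json();
  return data.candidates?.[0]?.content?.parts?.[0]?.text as string;
}
```

Because the whole clip arrives as one inline part, temporal cues (a smile that fades mid-sentence, posture that collapses at a hard question) stay intact for the model.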

Challenges we ran into

Making AI honest, not nice: LLMs default to validation and comfort. Engineering prompts that produce confrontational-but-constructive feedback required extensive iteration and cross-cultural testing.

Video compression tradeoffs: Browser-recorded videos can exceed API limits. We optimized MediaRecorder settings (VP9 codec, 720p, controlled bitrate) to balance quality with size.
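The tuning described above can be sketched as capture constraints plus recorder options. The 720p cap and VP9 preference come from the write-up; the exact bitrates here are illustrative assumptions, not Unlok's shipped values.

```typescript
// Sketch of the recorder tuning: cap resolution at the source and
// keep the bitrate controlled so a 60-second clip stays small.
// Bitrate numbers are illustrative, not the project's actual settings.

// getUserMedia constraints: ask for at most 720p video plus audio.
export const captureConstraints = {
  video: { width: { ideal: 1280 }, height: { ideal: 720 } },
  audio: true,
};

// MediaRecorder options: prefer VP9 for quality-per-byte, fall back
// to the browser's default WebM profile where VP9 is unsupported.
export function recorderOptions() {
  const preferred = "video/webm;codecs=vp9,opus";
  const MR = (globalThis as any).MediaRecorder; // absent outside the browser
  return {
    mimeType: MR?.isTypeSupported?.(preferred) ? preferred : "video/webm",
    videoBitsPerSecond: 2_500_000,
    audioBitsPerSecond: 128_000,
  };
}
```

At ~2.5 Mbps, a 60-second clip lands around 20 MB, which keeps typical uploads within inline-payload limits without visibly degrading facial detail.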

Emotional universality vs. cultural context: Micro-expressions have universal elements but cultural variations. We tuned analysis prompts to acknowledge this nuance.

Audio-text synchronization: Generating TTS that matches the emotional weight of written insights required careful voice selection and pacing adjustments per language.
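The per-language voice and pacing tuning might look like the sketch below, building a Cloud Text-to-Speech `text:synthesize` request body. The voice names follow real Cloud TTS naming conventions, but the specific voices and speaking rates are illustrative assumptions, not Unlok's actual configuration.

```typescript
// Sketch of per-language TTS tuning. Voice ids and rates are
// illustrative; only the text:synthesize request shape is real.
type VoiceTuning = { name: string; speakingRate: number };

// Slightly slower delivery where translated insights run longer,
// so the audio keeps the same deliberate, direct weight.
const voiceByLanguage: Record<string, VoiceTuning> = {
  en: { name: "en-US-Neural2-D", speakingRate: 0.95 },
  "pt-BR": { name: "pt-BR-Neural2-B", speakingRate: 0.92 },
  ja: { name: "ja-JP-Neural2-C", speakingRate: 0.9 },
};

// Build the JSON body for POST /v1/text:synthesize.
export function buildTtsRequest(text: string, lang: string) {
  const tuning = voiceByLanguage[lang] ?? voiceByLanguage.en;
  return {
    input: { text },
    voice: { languageCode: tuning.name.slice(0, 5), name: tuning.name },
    audioConfig: {
      audioEncoding: "MP3",
      speakingRate: tuning.speakingRate,
    },
  };
}
```

Keeping the tuning table per language makes the "same insight, same weight" goal testable: each locale gets an explicit voice and pace rather than inheriting English defaults.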

Accomplishments that we're proud of

  • 11 languages fully supported (EN, PT-BR, ES, FR, DE, JA, ZH, HE, HI, KO, AR)
  • Sub-30-second analysis for typical recordings
  • Audio feedback that transforms text insights into personal coaching moments
  • Zero-backend architecture: Runs entirely client-side + direct Gemini API calls
  • BYOK model: Sustainable, privacy-first, infinitely scalable
  • Successfully built an AI that tells users what they don't want to hear

What we learned

  1. Gemini 3's multimodal capabilities are transformative: Sending raw video and receiving nuanced emotional analysis feels like magic. The model understands context across time, not just individual frames.

  2. Confrontation requires trust: Users need psychological safety before accepting hard truths. The UX journey from onboarding to revelation is as important as the AI itself.

  3. Audio changes everything: Reading "you're avoiding eye contact" is informative. Hearing it spoken directly to you is transformative.

  4. i18n ≠ translation: True localization means adapting emotional concepts and cultural contexts, not just words.

What's next for Unlok Cam

  • Progress tracking: Session history with visual comparison of emotional patterns over time
  • AI coaching conversations: Interactive follow-up dialogues about detected patterns
  • Talking avatar feedback: Animated AI coach face delivering insights
  • Team/Enterprise mode: Practice workplace conversations with feedback for both parties
  • Therapist/Coach integration: Export session data to human professionals
  • Mobile app: Native iOS/Android for on-the-go practice

Built With

  • cloud
  • gemini-3-flash
  • gemini-3-pro
  • google
  • google-cloud-text-to-speech
  • mediarecorder-api
  • next.js-15
  • react-19
  • tailwind-css
  • typescript
  • vercel
  • web-audio-api
  • zustand