InterVU

Inspiration

Job interviews are nerve-wracking, and candidates rarely receive honest, real-time feedback on their performance. When feedback does come, it usually covers what they said but rarely how they communicated: eye contact, posture, speaking clarity, and confidence.

Most existing mock interview tools are passive systems. They simply ask questions and wait for answers. Real interviewers behave differently — they interrupt, probe deeper, and observe body language.

We wanted to build something closer to a real interview experience: sitting across from a tough but fair senior engineering manager who:

  • Challenges your claims
  • Notices behavioral signals like eye contact and posture
  • Interrupts when answers are unclear
  • Switches to coaching mode when you're truly stuck

This idea led to InterVU.


What it does

InterVU is a real-time AI mock interview platform powered by Google's Gemini Live API.

Users upload a job description and their resume, then enter a live audio/video interview with Wayne — an AI Senior Engineering Hiring Manager.

Wayne operates using a 3-state behavior engine:

1. The Interrogation

Wayne cross-references the candidate’s resume against the job description and probes deeper to verify whether the candidate actually used the skills they claim.

2. The Visual & Audio Evaluator

Wayne continuously monitors body language and communication signals, including:

  • Eye contact
  • Posture
  • Speaking duration

Wayne will interrupt the candidate if they:

  • Break eye contact for more than 5 seconds
  • Slouch
  • Ramble beyond 30 seconds

3. The Tutor Pivot

If the candidate is fundamentally stuck, Wayne temporarily drops the interviewer persona and switches to teaching mode, explaining the concept using simple analogies before returning to evaluation.


Post-Interview Feedback

After the interview, users receive a structured performance report that includes:

  • Per-skill evaluation scores (1–10)
  • Body language analysis
  • Resume accuracy verification
  • Communication clarity feedback
  • A personalized 2-week coaching plan

The system supports 14 languages. Questions remain in English, but Wayne dynamically switches to the user's native language for hints when vocabulary becomes a barrier — ensuring that language does not limit talent recognition.


How we built it

Backend

  • FastAPI with asynchronous WebSocket handling
  • Three concurrent asyncio tasks per interview session:
    • Browser → Gemini streaming
    • Gemini → Browser streaming
    • Interview countdown timer
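The three concurrent tasks above can be sketched with `asyncio`. This is a minimal illustration of the pattern, not InterVU's actual code: the task bodies, queue-based plumbing, and `None` sentinel are assumptions standing in for the real WebSocket and Gemini session calls.

```python
import asyncio

async def browser_to_gemini(uplink: asyncio.Queue, log: list) -> None:
    """Forward media chunks from the browser WebSocket toward Gemini."""
    while True:
        chunk = await uplink.get()
        if chunk is None:          # sentinel: stream closed
            return
        log.append(("up", chunk))  # stand-in for a session.send(...) call

async def gemini_to_browser(downlink: asyncio.Queue, log: list) -> None:
    """Forward Gemini's audio responses back to the browser."""
    while True:
        chunk = await downlink.get()
        if chunk is None:
            return
        log.append(("down", chunk))

async def interview_timer(seconds: float, uplink: asyncio.Queue,
                          downlink: asyncio.Queue) -> None:
    """End the interview after the time limit by closing both streams."""
    await asyncio.sleep(seconds)
    await uplink.put(None)
    await downlink.put(None)

async def run_session(log: list) -> None:
    uplink, downlink = asyncio.Queue(), asyncio.Queue()
    await uplink.put("mic-frame")     # pretend the browser sent audio
    await downlink.put("ai-audio")    # pretend Gemini replied
    await asyncio.gather(
        browser_to_gemini(uplink, log),
        gemini_to_browser(downlink, log),
        interview_timer(0.05, uplink, downlink),
    )

log: list = []
asyncio.run(run_session(log))
```

Running all three under one `asyncio.gather` keeps the session single-threaded while still letting uplink, downlink, and the countdown make progress independently.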

AI Layer

  • Google Gemini Live API
  • Real-time bidirectional audio and video streaming
  • A ~250-line dynamic system prompt encoding Wayne's 3-state behavior engine

Voice Activity Detection

Custom client-side Voice Activity Detection (VAD) using the Web Audio API, including:

  • Adaptive noise floor detection (exponential smoothing)
  • Dynamic thresholding
  • Hangover and release timing
  • Echo suppression

This allows natural conversations without a "press-to-speak" button.
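The adaptive noise-floor idea can be sketched as follows. The actual VAD runs client-side on Web Audio API samples; this Python version only illustrates the exponential smoothing and dynamic thresholding, and the coefficients are illustrative rather than InterVU's tuned values.

```python
class AdaptiveVAD:
    """Sketch of adaptive noise-floor voice detection.

    Tracks ambient energy with exponential smoothing and flags speech
    when a frame's RMS energy rises well above the learned floor.
    """

    def __init__(self, alpha: float = 0.05, margin: float = 3.0) -> None:
        self.alpha = alpha        # smoothing factor for the noise floor
        self.margin = margin      # speech must exceed floor * margin
        self.noise_floor = 1e-4   # start from a small non-zero floor

    def is_speech(self, rms: float) -> bool:
        speaking = rms > self.noise_floor * self.margin
        if not speaking:
            # Adapt the floor only during silence, so speech energy
            # never inflates the ambient estimate.
            self.noise_floor += self.alpha * (rms - self.noise_floor)
        return speaking

vad = AdaptiveVAD()
quiet = [vad.is_speech(0.0001) for _ in range(50)]  # ambient room noise
loud = vad.is_speech(0.02)                          # a spoken syllable
```

Because the threshold is a multiple of the learned floor rather than a fixed constant, the same detector works in a quiet home office and a noisy coffee shop.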

Video Analysis

  • Capture 1 frame every 10 seconds
  • Resize to 512×512 JPEG
  • Stream frames to Gemini for body language analysis
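The one-frame-every-10-seconds cadence amounts to a clock-driven sampler like the sketch below (the class and its API are hypothetical; resizing to 512×512 JPEG and the actual streaming call are omitted).

```python
class FrameSampler:
    """Emit one frame per `interval` seconds of a video stream (sketch)."""

    def __init__(self, interval: float = 10.0) -> None:
        self.interval = interval
        self.next_due = 0.0   # stream timestamp at which to capture next

    def should_capture(self, t: float) -> bool:
        """t is the current stream timestamp in seconds."""
        if t >= self.next_due:
            self.next_due = t + self.interval
            return True
        return False

sampler = FrameSampler()
# With one timestamp per second over 35 seconds, frames are captured
# at t = 0, 10, 20, and 30.
captured = [t for t in range(0, 35) if sampler.should_capture(t)]
```

Sampling sparsely like this keeps bandwidth low while still giving the model enough visual context to judge posture and eye contact over time.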

Report Generation Pipeline

Three-stage report generation:

  1. Structured interview transcript analysis
  2. Skill-level scoring
  3. Personalized coaching plan

All generated via Gemini Chat with structured output.
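The structured output from the three stages maps naturally onto a typed schema. The dataclasses below are an illustrative sketch of the report's shape; the field names and example values are assumptions, not InterVU's actual schema.

```python
from dataclasses import dataclass

@dataclass
class SkillScore:
    skill: str
    score: int            # 1-10, per the report spec

@dataclass
class InterviewReport:
    """Hypothetical shape of the structured post-interview report."""
    skills: list          # stage 2: per-skill scores
    body_language: str    # stage 1: transcript + visual analysis
    resume_accuracy: str  # resume-claim verification summary
    coaching_plan: list   # stage 3: entries of the 2-week plan

report = InterviewReport(
    skills=[SkillScore("Python", 7), SkillScore("System Design", 5)],
    body_language="Broke eye contact during system design answers.",
    resume_accuracy="Kubernetes claim verified; Kafka claim unsupported.",
    coaching_plan=["Day 1: practice STAR answers",
                   "Day 2: mock system design session"],
)
```

Asking the model for output that validates against a fixed schema like this is what makes the three stages composable: each stage consumes the previous stage's structured result rather than free-form prose.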

Storage

  • Google Cloud Storage for media and reports
  • Automatic local fallback
  • SQLAlchemy Async ORM for transcripts and confidence samples

Frontend

  • Vanilla HTML / CSS / JavaScript
  • Modular architecture
  • No heavy frameworks, keeping overhead low

Challenges we ran into

Turn Detection with Gemini Live

Determining when the candidate finished speaking without a manual button was difficult.

We built an adaptive VAD system that:

  • Learns ambient noise levels
  • Uses dynamic thresholding

Timing parameters:

  • 250ms hangover
  • 900ms release timing

This handles natural speech pauses effectively.
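One way to combine the two timing parameters is the small state machine below: silence shorter than the 250ms hangover is treated as part of the utterance (audio keeps streaming), while 900ms of continuous silence releases the turn. The timing values mirror the document; the state machine itself is our illustrative interpretation.

```python
class TurnDetector:
    """Sketch of end-of-turn detection with hangover and release timing."""

    def __init__(self, hangover_ms: int = 250, release_ms: int = 900) -> None:
        self.hangover_ms = hangover_ms
        self.release_ms = release_ms
        self.silence_ms = 0
        self.speaking = False

    def update(self, is_speech: bool, frame_ms: int = 20) -> bool:
        """Feed one audio frame's VAD result; returns True when the turn ends."""
        if is_speech:
            self.speaking = True
            self.silence_ms = 0
            return False
        if not self.speaking:
            return False                 # nothing said yet
        self.silence_ms += frame_ms
        if self.silence_ms <= self.hangover_ms:
            return False                 # brief gap: still inside the utterance
        if self.silence_ms >= self.release_ms:
            self.speaking = False        # long silence: turn complete
            self.silence_ms = 0
            return True
        return False                     # tentative pause: keep waiting

det = TurnDetector()
# 200ms of speech, a 200ms pause (bridged by hangover), more speech,
# then a full second of silence that ends the turn exactly once.
frames = [True] * 10 + [False] * 10 + [True] * 5 + [False] * 50
ends = [det.update(s) for s in frames]
```

The key property is that mid-sentence pauses never trigger a turn change, while a genuinely finished answer is detected within about a second.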


Echo Suppression

The AI’s spoken responses sometimes leaked from the speakers back into the microphone, falsely triggering the VAD as if the candidate were speaking.

Solution:

  • Mode-specific cooldown timers:
    • 500ms (standard mode)
    • 1500ms (native-audio mode)

We also added input locking during AI responses.
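The mode-specific cooldown amounts to a gate that ignores microphone activity for a fixed window after AI playback ends. The cooldown values mirror the document; the gate class and its API are an illustrative sketch.

```python
class EchoGate:
    """Ignore microphone activity for a short cooldown after AI playback."""

    COOLDOWN_MS = {"standard": 500, "native-audio": 1500}

    def __init__(self, mode: str = "standard") -> None:
        self.cooldown = self.COOLDOWN_MS[mode]
        self.playback_ended_at = None   # timestamp in ms; None = never played

    def on_playback_end(self, now_ms: float) -> None:
        """Called when the AI's audio finishes playing."""
        self.playback_ended_at = now_ms

    def mic_allowed(self, now_ms: float) -> bool:
        """True once the cooldown has elapsed since playback ended."""
        if self.playback_ended_at is None:
            return True
        return now_ms - self.playback_ended_at >= self.cooldown

gate = EchoGate("native-audio")
gate.on_playback_end(1000.0)
early = gate.mic_allowed(1800.0)   # 800ms after playback: still locked
late = gate.mic_allowed(2600.0)    # 1600ms after playback: accepted
```

Native-audio mode needs the longer window because its playback tail lingers in the room longer before the echo decays below the VAD threshold.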


Silence Deadlocks

Occasionally both the user and AI waited for each other.

Solution:

A 2.5-second silence watchdog that nudges the model:

"The candidate has paused. Please continue with your next question."
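A watchdog like this fits naturally as a fourth asyncio task. The sketch below is an assumed implementation: the `activity` event would be set by the audio pipeline whenever speech is detected, and appending to `nudges` stands in for sending the nudge prompt to Gemini.

```python
import asyncio

NUDGE = "The candidate has paused. Please continue with your next question."

async def silence_watchdog(activity: asyncio.Event, nudges: list,
                           timeout: float = 2.5) -> None:
    """Nudge the model whenever neither side speaks for `timeout` seconds."""
    while True:
        try:
            await asyncio.wait_for(activity.wait(), timeout)
            activity.clear()              # speech seen: restart the clock
        except asyncio.TimeoutError:
            nudges.append(NUDGE)          # deadlock: prompt the model

async def demo() -> list:
    activity = asyncio.Event()
    nudges: list = []
    # Shortened timeout so the demo completes quickly.
    task = asyncio.create_task(silence_watchdog(activity, nudges, timeout=0.05))
    await asyncio.sleep(0.12)             # long enough for at least one nudge
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return nudges

nudges = asyncio.run(demo())
```

Because the watchdog resets on any detected speech, it only ever fires during a genuine mutual silence, never in the middle of an answer.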


Persona Consistency

Maintaining Wayne’s three behavioral states required extensive prompt engineering.

Particularly challenging was enforcing mandatory interruption rules for body language violations.


Accomplishments we're proud of

Mandatory Interruption System

Wayne interrupts candidates mid-sentence for:

  • Poor eye contact
  • Rambling answers
  • Weak posture

Most AI interview systems are passive — InterVU behaves like a real interviewer.


Zero-Button Voice Interaction

Our adaptive VAD system allows completely natural conversation even in noisy environments like coffee shops or home offices.


Resume-to-Reality Verification

The system cross-references resume claims against the job description, forcing candidates to prove depth of knowledge.


Multilingual Coaching

Candidates are evaluated in English, but Wayne can switch to the user's native language for hints, ensuring language barriers don't mask true ability.


Production-Ready Resilience

Robust infrastructure includes:

  • GCS with local fallback
  • WebSocket disconnect recovery
  • Timer-triggered report generation
  • Graceful transcript flushing during disconnections

What we learned

Building real-time AI interview systems revealed several insights:

  • Real-time audio/video streaming with LLMs is fundamentally different from chat-based systems.
  • Latency, turn management, and echo suppression are critical engineering challenges.
  • Persona-driven prompt engineering creates far more natural interactions than basic instruction prompts.
  • Client-side audio processing significantly improves responsiveness.
  • Vanilla JavaScript with modular architecture can deliver high-performance real-time applications without heavy frameworks.

What's next for InterVU

Panel Interviews

Multiple AI interviewers with different personas:

  • Technical Lead
  • Behavioral Interviewer
  • Culture Fit Evaluator

All participating in a single interview session.


Interview Analytics Dashboard

Users will be able to track improvement through:

  • Confidence curves
  • Skill progression graphs
  • Body language heatmaps

Company-Specific Preparation Modes

Wayne will simulate interview styles for specific companies, such as:

  • FAANG system design interviews
  • Startup technical rounds
  • Consulting case interviews

Peer Practice Rooms

Two users can interview each other while Wayne provides live coaching and feedback to both participants.


Mobile Support

Optimized audio/video capture pipelines for smartphone cameras and microphones, allowing candidates to practice anywhere.
