InterVU

Inspiration

Job interviews are nerve-wracking, and candidates rarely receive honest, real-time feedback on their performance. When feedback does come, it usually covers what they said but rarely how they communicated: eye contact, posture, speaking clarity, and confidence.

Most existing mock interview tools are passive systems. They simply ask questions and wait for answers. Real interviewers behave differently — they interrupt, probe deeper, and observe body language.

We wanted to build something closer to a real interview experience: sitting across from a tough but fair senior engineering manager who:

  • Challenges your claims
  • Notices behavioral signals like eye contact and posture
  • Interrupts when answers are unclear
  • Switches to coaching mode when you're truly stuck

This idea led to InterVU.


What it does

InterVU is a real-time AI mock interview platform powered by Google's Gemini Live API.

Users upload a job description and their resume, then enter a live audio/video interview with Wayne — an AI Senior Engineering Hiring Manager.

Wayne operates using a 3-state behavior engine:

1. The Interrogation

Wayne cross-references the candidate’s resume against the job description and probes deeper to verify whether the candidate actually used the skills they claim.

2. The Visual & Audio Evaluator

Wayne continuously monitors body language and communication signals, including:

  • Eye contact
  • Posture
  • Speaking duration

Wayne will interrupt the candidate if they:

  • Break eye contact for more than 5 seconds
  • Slouch
  • Ramble beyond 30 seconds

3. The Tutor Pivot

If the candidate is fundamentally stuck, Wayne temporarily drops the interviewer persona and switches to teaching mode, explaining the concept using simple analogies before returning to evaluation.


Post-Interview Feedback

After the interview, users receive a structured performance report that includes:

  • Per-skill evaluation scores (1–10)
  • Body language analysis
  • Resume accuracy verification
  • Communication clarity feedback
  • A personalized 2-week coaching plan

The system supports 14 languages. Questions remain in English, but Wayne dynamically switches to the user's native language for hints when vocabulary becomes a barrier — ensuring that language does not limit talent recognition.


How we built it

Backend

  • FastAPI with asynchronous WebSocket handling
  • Three concurrent asyncio tasks per interview session:
    • Browser → Gemini streaming
    • Gemini → Browser streaming
    • Interview countdown timer
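The three concurrent tasks above can be sketched with `asyncio`. This is a minimal illustration of the pattern, not InterVU's actual code: the task bodies, queue-based plumbing, and `None` sentinel are assumptions standing in for the real WebSocket and Gemini session calls.

```python
import asyncio

async def browser_to_gemini(uplink: asyncio.Queue, log: list) -> None:
    """Forward media chunks from the browser WebSocket toward Gemini."""
    while True:
        chunk = await uplink.get()
        if chunk is None:          # sentinel: stream closed
            return
        log.append(("up", chunk))  # stand-in for a session.send(...) call

async def gemini_to_browser(downlink: asyncio.Queue, log: list) -> None:
    """Forward Gemini's audio responses back to the browser."""
    while True:
        chunk = await downlink.get()
        if chunk is None:
            return
        log.append(("down", chunk))

async def interview_timer(seconds: float, uplink: asyncio.Queue,
                          downlink: asyncio.Queue) -> None:
    """End the interview after the time limit by closing both streams."""
    await asyncio.sleep(seconds)
    await uplink.put(None)
    await downlink.put(None)

async def run_session(log: list) -> None:
    uplink, downlink = asyncio.Queue(), asyncio.Queue()
    await uplink.put("mic-frame")     # pretend the browser sent audio
    await downlink.put("ai-audio")    # pretend Gemini replied
    await asyncio.gather(
        browser_to_gemini(uplink, log),
        gemini_to_browser(downlink, log),
        interview_timer(0.05, uplink, downlink),
    )

log: list = []
asyncio.run(run_session(log))
```

Running all three under one `asyncio.gather` keeps the session single-threaded while still letting uplink, downlink, and the countdown make progress independently.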

AI Layer

  • Google Gemini Live API
  • Real-time bidirectional audio and video streaming
  • A ~250-line dynamic system prompt encoding Wayne's 3-state behavior engine

Voice Activity Detection

Custom client-side Voice Activity Detection (VAD) using the Web Audio API, including:

  • Adaptive noise floor detection (exponential smoothing)
  • Dynamic thresholding
  • Hangover and release timing
  • Echo suppression

This allows natural conversations without a "press-to-speak" button.
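The adaptive noise-floor idea can be sketched as follows. The actual VAD runs client-side on Web Audio API samples; this Python version only illustrates the exponential smoothing and dynamic thresholding, and the coefficients are illustrative rather than InterVU's tuned values.

```python
class AdaptiveVAD:
    """Sketch of adaptive noise-floor voice detection.

    Tracks ambient energy with exponential smoothing and flags speech
    when a frame's RMS energy rises well above the learned floor.
    """

    def __init__(self, alpha: float = 0.05, margin: float = 3.0) -> None:
        self.alpha = alpha        # smoothing factor for the noise floor
        self.margin = margin      # speech must exceed floor * margin
        self.noise_floor = 1e-4   # start from a small non-zero floor

    def is_speech(self, rms: float) -> bool:
        speaking = rms > self.noise_floor * self.margin
        if not speaking:
            # Adapt the floor only during silence, so speech energy
            # never inflates the ambient estimate.
            self.noise_floor += self.alpha * (rms - self.noise_floor)
        return speaking

vad = AdaptiveVAD()
quiet = [vad.is_speech(0.0001) for _ in range(50)]  # ambient room noise
loud = vad.is_speech(0.02)                          # a spoken syllable
```

Because the threshold is a multiple of the learned floor rather than a fixed constant, the same detector works in a quiet home office and a noisy coffee shop.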

Video Analysis

  • Capture 1 frame every 10 seconds
  • Resize to 512×512 JPEG
  • Stream frames to Gemini for body language analysis
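The one-frame-every-10-seconds cadence amounts to a clock-driven sampler like the sketch below (the class and its API are hypothetical; resizing to 512×512 JPEG and the actual streaming call are omitted).

```python
class FrameSampler:
    """Emit one frame per `interval` seconds of a video stream (sketch)."""

    def __init__(self, interval: float = 10.0) -> None:
        self.interval = interval
        self.next_due = 0.0   # stream timestamp at which to capture next

    def should_capture(self, t: float) -> bool:
        """t is the current stream timestamp in seconds."""
        if t >= self.next_due:
            self.next_due = t + self.interval
            return True
        return False

sampler = FrameSampler()
# With one timestamp per second over 35 seconds, frames are captured
# at t = 0, 10, 20, and 30.
captured = [t for t in range(0, 35) if sampler.should_capture(t)]
```

Sampling sparsely like this keeps bandwidth low while still giving the model enough visual context to judge posture and eye contact over time.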

Report Generation Pipeline

Three-stage report generation:

  1. Structured interview transcript analysis
  2. Skill-level scoring
  3. Personalized coaching plan

All generated via Gemini Chat with structured output.
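The structured output from the three stages maps naturally onto a typed schema. The dataclasses below are an illustrative sketch of the report's shape; the field names and example values are assumptions, not InterVU's actual schema.

```python
from dataclasses import dataclass

@dataclass
class SkillScore:
    skill: str
    score: int            # 1-10, per the report spec

@dataclass
class InterviewReport:
    """Hypothetical shape of the structured post-interview report."""
    skills: list          # stage 2: per-skill scores
    body_language: str    # stage 1: transcript + visual analysis
    resume_accuracy: str  # resume-claim verification summary
    coaching_plan: list   # stage 3: entries of the 2-week plan

report = InterviewReport(
    skills=[SkillScore("Python", 7), SkillScore("System Design", 5)],
    body_language="Broke eye contact during system design answers.",
    resume_accuracy="Kubernetes claim verified; Kafka claim unsupported.",
    coaching_plan=["Day 1: practice STAR answers",
                   "Day 2: mock system design session"],
)
```

Asking the model for output that validates against a fixed schema like this is what makes the three stages composable: each stage consumes the previous stage's structured result rather than free-form prose.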

Storage

  • Google Cloud Storage for media and reports
  • Automatic local fallback
  • SQLAlchemy Async ORM for transcripts and confidence samples

Frontend

  • Vanilla HTML / CSS / JavaScript
  • Modular architecture
  • No heavy frameworks, keeping overhead low

Challenges we ran into

Turn Detection with Gemini Live

Determining when the candidate finished speaking without a manual button was difficult.

We built an adaptive VAD system that:

  • Learns ambient noise levels
  • Uses dynamic thresholding

Timing parameters:

  • 250ms hangover
  • 900ms release timing

This handles natural speech pauses effectively.
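One way to combine the two timing parameters is the small state machine below: silence shorter than the 250ms hangover is treated as part of the utterance (audio keeps streaming), while 900ms of continuous silence releases the turn. The timing values mirror the document; the state machine itself is our illustrative interpretation.

```python
class TurnDetector:
    """Sketch of end-of-turn detection with hangover and release timing."""

    def __init__(self, hangover_ms: int = 250, release_ms: int = 900) -> None:
        self.hangover_ms = hangover_ms
        self.release_ms = release_ms
        self.silence_ms = 0
        self.speaking = False

    def update(self, is_speech: bool, frame_ms: int = 20) -> bool:
        """Feed one audio frame's VAD result; returns True when the turn ends."""
        if is_speech:
            self.speaking = True
            self.silence_ms = 0
            return False
        if not self.speaking:
            return False                 # nothing said yet
        self.silence_ms += frame_ms
        if self.silence_ms <= self.hangover_ms:
            return False                 # brief gap: still inside the utterance
        if self.silence_ms >= self.release_ms:
            self.speaking = False        # long silence: turn complete
            self.silence_ms = 0
            return True
        return False                     # tentative pause: keep waiting

det = TurnDetector()
# 200ms of speech, a 200ms pause (bridged by hangover), more speech,
# then a full second of silence that ends the turn exactly once.
frames = [True] * 10 + [False] * 10 + [True] * 5 + [False] * 50
ends = [det.update(s) for s in frames]
```

The key property is that mid-sentence pauses never trigger a turn change, while a genuinely finished answer is detected within about a second.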


Echo Suppression

The AI’s spoken responses sometimes leaked from the speakers back into the microphone, falsely triggering the VAD as if the candidate were speaking.

Solution:

  • Mode-specific cooldown timers:
    • 500ms (standard mode)
    • 1500ms (native-audio mode)

We also added input locking during AI responses.
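The mode-specific cooldown amounts to a gate that ignores microphone activity for a fixed window after AI playback ends. The cooldown values mirror the document; the gate class and its API are an illustrative sketch.

```python
class EchoGate:
    """Ignore microphone activity for a short cooldown after AI playback."""

    COOLDOWN_MS = {"standard": 500, "native-audio": 1500}

    def __init__(self, mode: str = "standard") -> None:
        self.cooldown = self.COOLDOWN_MS[mode]
        self.playback_ended_at = None   # timestamp in ms; None = never played

    def on_playback_end(self, now_ms: float) -> None:
        """Called when the AI's audio finishes playing."""
        self.playback_ended_at = now_ms

    def mic_allowed(self, now_ms: float) -> bool:
        """True once the cooldown has elapsed since playback ended."""
        if self.playback_ended_at is None:
            return True
        return now_ms - self.playback_ended_at >= self.cooldown

gate = EchoGate("native-audio")
gate.on_playback_end(1000.0)
early = gate.mic_allowed(1800.0)   # 800ms after playback: still locked
late = gate.mic_allowed(2600.0)    # 1600ms after playback: accepted
```

Native-audio mode needs the longer window because its playback tail lingers in the room longer before the echo decays below the VAD threshold.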


Silence Deadlocks

Occasionally both the user and AI waited for each other.

Solution:

A 2.5-second silence watchdog that nudges the model:

"The candidate has paused. Please continue with your next question."
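A watchdog like this fits naturally as a fourth asyncio task. The sketch below is an assumed implementation: the `activity` event would be set by the audio pipeline whenever speech is detected, and appending to `nudges` stands in for sending the nudge prompt to Gemini.

```python
import asyncio

NUDGE = "The candidate has paused. Please continue with your next question."

async def silence_watchdog(activity: asyncio.Event, nudges: list,
                           timeout: float = 2.5) -> None:
    """Nudge the model whenever neither side speaks for `timeout` seconds."""
    while True:
        try:
            await asyncio.wait_for(activity.wait(), timeout)
            activity.clear()              # speech seen: restart the clock
        except asyncio.TimeoutError:
            nudges.append(NUDGE)          # deadlock: prompt the model

async def demo() -> list:
    activity = asyncio.Event()
    nudges: list = []
    # Shortened timeout so the demo completes quickly.
    task = asyncio.create_task(silence_watchdog(activity, nudges, timeout=0.05))
    await asyncio.sleep(0.12)             # long enough for at least one nudge
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return nudges

nudges = asyncio.run(demo())
```

Because the watchdog resets on any detected speech, it only ever fires during a genuine mutual silence, never in the middle of an answer.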


Persona Consistency

Maintaining Wayne’s three behavioral states required extensive prompt engineering.

Particularly challenging was enforcing mandatory interruption rules for body language violations.


Accomplishments we're proud of

Mandatory Interruption System

Wayne interrupts candidates mid-sentence for:

  • Poor eye contact
  • Rambling answers
  • Weak posture

Most AI interview systems are passive — InterVU behaves like a real interviewer.


Zero-Button Voice Interaction

Our adaptive VAD system allows completely natural conversation even in noisy environments like coffee shops or home offices.


Resume-to-Reality Verification

The system cross-references resume claims against the job description, forcing candidates to prove depth of knowledge.


Multilingual Coaching

Candidates are evaluated in English, but Wayne can switch to the user's native language for hints, ensuring language barriers don't mask true ability.


Production-Ready Resilience

Robust infrastructure includes:

  • GCS with local fallback
  • WebSocket disconnect recovery
  • Timer-triggered report generation
  • Graceful transcript flushing during disconnections

What we learned

Building real-time AI interview systems revealed several insights:

  • Real-time audio/video streaming with LLMs is fundamentally different from chat-based systems.
  • Latency, turn management, and echo suppression are critical engineering challenges.
  • Persona-driven prompt engineering creates far more natural interactions than basic instruction prompts.
  • Client-side audio processing significantly improves responsiveness.
  • Vanilla JavaScript with modular architecture can deliver high-performance real-time applications without heavy frameworks.

What's next for InterVU

Panel Interviews

Multiple AI interviewers with different personas:

  • Technical Lead
  • Behavioral Interviewer
  • Culture Fit Evaluator

All participating in a single interview session.


Interview Analytics Dashboard

Users will be able to track improvement through:

  • Confidence curves
  • Skill progression graphs
  • Body language heatmaps

Company-Specific Preparation Modes

Wayne will simulate interview styles for specific companies, such as:

  • FAANG system design interviews
  • Startup technical rounds
  • Consulting case interviews

Peer Practice Rooms

Two users can interview each other while Wayne provides live coaching and feedback to both participants.


Mobile Support

Optimized audio/video capture pipelines for smartphone cameras and microphones, allowing candidates to practice anywhere.
