InterVU
Inspiration
Job interviews are nerve-wracking, and most candidates rarely receive honest, real-time feedback about their performance. Feedback usually focuses on what they said, but rarely covers how they communicated — eye contact, posture, speaking clarity, and confidence.
Most existing mock interview tools are passive systems. They simply ask questions and wait for answers. Real interviewers behave differently — they interrupt, probe deeper, and observe body language.
We wanted to build something closer to a real interview experience: sitting across from a tough but fair senior engineering manager who:
- Challenges your claims
- Notices behavioral signals like eye contact and posture
- Interrupts when answers are unclear
- Switches to coaching mode when you're truly stuck
This idea led to InterVU.
What it does
InterVU is a real-time AI mock interview platform powered by Google's Gemini Live API.
Users upload a job description and their resume, then enter a live audio/video interview with Wayne — an AI Senior Engineering Hiring Manager.
Wayne operates using a 3-state behavior engine:
1. The Interrogation
Wayne cross-references the candidate’s resume against the job description and probes deeper to verify whether the candidate actually used the skills they claim.
2. The Visual & Audio Evaluator
Wayne continuously monitors body language and communication signals, including:
- Eye contact
- Posture
- Speaking duration
Wayne will interrupt the candidate if they:
- Break eye contact for more than 5 seconds
- Slouch
- Ramble beyond 30 seconds
3. The Tutor Pivot
If the candidate is fundamentally stuck, Wayne temporarily drops the interviewer persona and switches to teaching mode, explaining the concept using simple analogies before returning to evaluation.
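In InterVU the 3-state engine actually lives inside Wayne's system prompt, but the transition logic it encodes can be sketched as a small state machine. This is an illustrative sketch only: the state names, thresholds, and function signature are ours, not the production prompt.

```python
from enum import Enum, auto

class State(Enum):
    INTERROGATION = auto()   # probe resume claims against the job description
    EVALUATOR = auto()       # interrupt on body-language / rambling violations
    TUTOR = auto()           # drop the persona and teach, then return

def next_state(state, *, eye_contact_lost_s=0.0, answer_s=0.0,
               slouching=False, stuck=False):
    """Pick Wayne's next behavioral state from the latest observations."""
    if stuck:
        return State.TUTOR                     # the Tutor Pivot
    if eye_contact_lost_s > 5 or answer_s > 30 or slouching:
        return State.EVALUATOR                 # mandatory interruption
    return State.INTERROGATION                 # default: keep probing
```

The thresholds (5 seconds of broken eye contact, 30 seconds of rambling) mirror the interruption rules described above.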
Post-Interview Feedback
After the interview, users receive a structured performance report that includes:
- Per-skill evaluation scores (1–10)
- Body language analysis
- Resume accuracy verification
- Communication clarity feedback
- A personalized 2-week coaching plan
The system supports 14 languages. Questions remain in English, but Wayne dynamically switches to the user's native language for hints when vocabulary becomes a barrier — ensuring that language does not limit talent recognition.
How we built it
Backend
- FastAPI with asynchronous WebSocket handling
- Three concurrent tasks run per session using async workflows:
  - Browser → Gemini streaming
  - Gemini → Browser streaming
  - Interview countdown timer
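The fan-out of the three per-session tasks can be sketched with asyncio. The handler names, queues, and the shortened timer below are illustrative, not the actual InterVU code.

```python
import asyncio

async def browser_to_gemini(inbox, log):
    # Forward mic/camera chunks from the browser WebSocket toward Gemini Live.
    while (chunk := await inbox.get()) is not None:
        log.append(("up", chunk))

async def gemini_to_browser(outbox, log):
    # Forward Gemini's audio/text responses back to the browser.
    while (chunk := await outbox.get()) is not None:
        log.append(("down", chunk))

async def countdown(seconds, log):
    # End the interview when time runs out (shortened here for the demo).
    await asyncio.sleep(seconds)
    log.append(("timer", "interview over"))

async def session():
    inbox, outbox, log = asyncio.Queue(), asyncio.Queue(), []
    tasks = [asyncio.create_task(c) for c in (
        browser_to_gemini(inbox, log),
        gemini_to_browser(outbox, log),
        countdown(0.01, log),
    )]
    await inbox.put(b"mic-chunk")
    await outbox.put(b"reply-chunk")
    await inbox.put(None)    # close both streams
    await outbox.put(None)
    await asyncio.gather(*tasks)
    return log

log = asyncio.run(session())
```

Running all three as independent tasks lets streaming continue in both directions while the timer counts down.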
AI Layer
- Google Gemini Live API
- Real-time bidirectional audio and video streaming
- A ~250-line dynamic system prompt encoding Wayne's 3-state behavior engine
Voice Activity Detection
Custom client-side Voice Activity Detection (VAD) using the Web Audio API, including:
- Adaptive noise floor detection (exponential smoothing)
- Dynamic thresholding
- Hangover and release timing
- Echo suppression
This allows natural conversations without a "push-to-talk" button.
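The real VAD runs in the browser on Web Audio API frames; the core math — an exponentially smoothed noise floor with a dynamic threshold — is language-agnostic, so here is a minimal sketch in Python. The constants are illustrative.

```python
class AdaptiveVAD:
    """Sketch of the adaptive noise-floor logic; the production version runs
    client-side on Web Audio API frames and adds hangover/release timing."""

    def __init__(self, alpha=0.05, margin=2.5):
        self.alpha = alpha    # smoothing factor for the noise floor
        self.margin = margin  # speech must exceed floor * margin
        self.floor = None     # learned ambient noise level (RMS)

    def update(self, rms):
        # Learn the ambient level only while no speech is detected,
        # via exponential smoothing: floor += alpha * (rms - floor).
        if self.floor is None:
            self.floor = rms
            return False
        speaking = rms > self.floor * self.margin  # dynamic threshold
        if not speaking:
            self.floor += self.alpha * (rms - self.floor)
        return speaking
```

Because the floor keeps adapting during silence, a constant hum (fan, café chatter) raises the threshold instead of triggering false speech detections.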
Video Analysis
- Capture 1 frame every 10 seconds
- Resize to 512×512 JPEG
- Stream frames to Gemini for body language analysis
Report Generation Pipeline
Three-stage report generation:
- Structured interview transcript analysis
- Skill-level scoring
- Personalized coaching plan
All generated via Gemini Chat with structured output.
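The three-stage chain can be sketched as a simple pipeline. The `llm` callable below stands in for a Gemini structured-output call, and the prompts and dictionary keys are illustrative.

```python
def generate_report(transcript, llm):
    """Three-stage report pipeline; `llm` abstracts the Gemini call."""
    # Stage 1: structured analysis of the raw interview transcript.
    analysis = llm(f"Analyze this interview transcript:\n{transcript}")
    # Stage 2: per-skill 1-10 scores derived from the analysis.
    scores = llm(f"Score each skill 1-10 given this analysis:\n{analysis}")
    # Stage 3: a personalized 2-week coaching plan from the scores.
    plan = llm(f"Write a 2-week coaching plan from these scores:\n{scores}")
    return {"analysis": analysis, "scores": scores, "plan": plan}
```

Feeding each stage's output into the next keeps every prompt small and focused, which tends to make structured output more reliable than one monolithic request.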
Storage
- Google Cloud Storage for media and reports
- Automatic local fallback
- SQLAlchemy Async ORM for transcripts and confidence samples
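The automatic local fallback is essentially a try-cloud-first pattern. In this sketch, `upload` stands in for the Google Cloud Storage client call, and the return-value format is illustrative.

```python
from pathlib import Path

def save_artifact(path, data, upload, local_dir="fallback"):
    """Try cloud storage first; fall back to local disk on any failure."""
    try:
        upload(path, data)          # e.g. a GCS blob upload
        return f"gcs://{path}"
    except Exception:
        dest = Path(local_dir) / path
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(data)      # local fallback keeps the report safe
        return str(dest)
```

The caller gets back wherever the artifact actually landed, so report links keep working even when the cloud write fails mid-demo.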
Frontend
- Vanilla HTML / CSS / JavaScript
- Modular architecture
- No heavy frameworks to reduce overhead
Challenges we ran into
Turn Detection with Gemini Live
Determining when the candidate finished speaking without a manual button was difficult.
We built an adaptive VAD system that:
- Learns ambient noise levels
- Uses dynamic thresholding
Timing parameters:
- 250ms hangover
- 900ms release
This handles natural speech pauses effectively.
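Hangover and release can be sketched as a frame-by-frame classifier: short level drops are bridged by the hangover, and the turn only ends after sustained silence. The frame size and state names below are illustrative; the 250ms/900ms values come from the text.

```python
def classify(frames, frame_ms=50, hangover_ms=250, release_ms=900):
    """Label each audio frame given loud/quiet flags (sketch)."""
    states, silence = [], 0
    for loud in frames:
        silence = 0 if loud else silence + frame_ms
        if silence <= hangover_ms:
            states.append("speaking")    # hangover bridges short drops
        elif silence < release_ms:
            states.append("pausing")     # waiting out a natural pause
        else:
            states.append("turn_over")   # hand the turn to Wayne
    return states
```

A 200ms breath stays inside the hangover and never registers as a pause, while a real stop crosses the release window and cleanly ends the turn.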
Echo Suppression
The AI’s audio responses sometimes leaked into the microphone and triggered the voice activity detector.
Solution:
- Mode-specific cooldown timers:
  - 500ms (standard mode)
  - 1500ms (native-audio mode)
We also added input locking during AI responses.
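The combined gate — input locking plus mode-specific cooldowns — reduces to one small predicate. The function shape is our sketch; the cooldown values are the ones above.

```python
def mic_enabled(now_ms, ai_speaking, ai_stopped_at_ms, native_audio=False):
    """Echo-suppression gate: is microphone input accepted right now?"""
    if ai_speaking:
        return False                          # input locked during AI speech
    cooldown = 1500 if native_audio else 500  # mode-specific cooldown
    return now_ms - ai_stopped_at_ms >= cooldown
```

The longer native-audio cooldown compensates for that mode's longer audio tail, which otherwise bleeds into the mic after playback nominally ends.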
Silence Deadlocks
Occasionally both the user and AI waited for each other.
Solution:
A 2.5-second silence watchdog that nudges the model:
"The candidate has paused. Please continue with your next question."
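The watchdog itself is a small asyncio loop. The function and parameter names below are illustrative; the 2.5-second timeout and nudge text come from the description above.

```python
import asyncio

NUDGE = "The candidate has paused. Please continue with your next question."

async def silence_watchdog(last_activity, send_to_gemini, timeout=2.5):
    """Break silence deadlocks: if neither side has produced audio for
    `timeout` seconds, nudge the model once (sketch)."""
    loop = asyncio.get_running_loop()
    while True:
        await asyncio.sleep(0.1)                      # poll interval
        if loop.time() - last_activity() > timeout:
            await send_to_gemini(NUDGE)               # wake the model up
            return
```

Any audio from either party resets `last_activity`, so the nudge fires only when the conversation has genuinely stalled.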
Persona Consistency
Maintaining Wayne’s three behavioral states required extensive prompt engineering.
Particularly challenging was enforcing mandatory interruption rules for body language violations.
Accomplishments we're proud of
Mandatory Interruption System
Wayne interrupts candidates mid-sentence for:
- Poor eye contact
- Rambling answers
- Weak posture
Most AI interview systems are passive — InterVU behaves like a real interviewer.
Zero-Button Voice Interaction
Our adaptive VAD system allows completely natural conversation even in noisy environments like coffee shops or home offices.
Resume-to-Reality Verification
The system cross-references resume claims against the job description, forcing candidates to prove depth of knowledge.
Multilingual Coaching
Candidates are evaluated in English, but Wayne can switch to the user's native language for hints, ensuring language barriers don't mask true ability.
Production-Ready Resilience
Robust infrastructure includes:
- GCS with local fallback
- WebSocket disconnect recovery
- Timer-triggered report generation
- Graceful transcript flushing during disconnections
What we learned
Building real-time AI interview systems revealed several insights:
- Real-time audio/video streaming with LLMs is fundamentally different from chat-based systems.
- Latency, turn management, and echo suppression are critical engineering challenges.
- Persona-driven prompt engineering creates far more natural interactions than basic instruction prompts.
- Client-side audio processing significantly improves responsiveness.
- Vanilla JavaScript with modular architecture can deliver high-performance real-time applications without heavy frameworks.
What's next for InterVU
Panel Interviews
Multiple AI interviewers with different personas:
- Technical Lead
- Behavioral Interviewer
- Culture Fit Evaluator
All participating in a single interview session.
Interview Analytics Dashboard
Users will be able to track improvement through:
- Confidence curves
- Skill progression graphs
- Body language heatmaps
Company-Specific Preparation Modes
Wayne will simulate interview styles for specific companies, such as:
- FAANG system design interviews
- Startup technical rounds
- Consulting case interviews
Peer Practice Rooms
Two users can interview each other while Wayne provides live coaching and feedback to both participants.
Mobile Support
Optimized audio/video capture pipelines for smartphone cameras and microphones, allowing candidates to practice anywhere.