Inspiration
Technical interviews are high-stakes but hard to practice. Mock interviews with friends are hard to schedule, platforms like LeetCode give no conversational feedback, and hiring a coach is expensive. I wanted to build something that feels like a real interview, where you speak out loud, write actual code, and get honest feedback, available any time and for free.
What it does
AI Mock Interviewer conducts a full technical interview in real time using voice and vision. You speak to an AI interviewer named Alex, answer concept questions out loud, write code in your editor while Alex watches your screen, and receive a scored coaching report the moment the interview ends. The entire session from greeting to feedback report runs autonomously with no human involvement.
How we built it
The core is Google Gemini Live for real-time bidirectional audio and vision. The browser captures microphone audio at 16kHz via an AudioWorklet and screen frames as JPEGs every 5 seconds; both are streamed to a FastAPI backend over WebSocket. The backend forwards audio and video to Gemini Live and plays back the AI's response at 24kHz.

The hardest part was the audio pipeline. Gemini Live requires explicit stream_end signals to know when the user has finished speaking, so we built a custom VAD system from scratch: computing RMS energy on every 256ms PCM chunk, detecting sustained silence, applying a cooldown after the AI finishes speaking so echo doesn't trigger false stream_ends, and handling barge-in when the candidate speaks over Alex.

The second challenge was making the interview follow a reliable structure. A tool-based state machine enforces the correct sequence (concept questions, coding phase, wrap-up) by blocking tool calls that fire out of order and returning error messages so Gemini self-corrects. Five concurrent async tasks per session handle audio sending, audio receiving, VAD, stuck-candidate nudging, and timer watching. The final report is generated by Gemini 2.5 Flash and rendered as markdown in the browser.
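The silence-detection part of the VAD can be sketched roughly like this. This is a minimal illustration, not our actual code: the class name, thresholds, and chunk sizes here are hypothetical placeholders.

```python
import struct


class SilenceDetector:
    """Toy RMS-based voice activity detector (illustrative thresholds)."""

    def __init__(self, silence_threshold=500.0,
                 silence_chunks_needed=4, cooldown_chunks=8):
        self.silence_threshold = silence_threshold          # RMS below this counts as silence
        self.silence_chunks_needed = silence_chunks_needed  # sustained silence before stream_end
        self.cooldown_chunks = cooldown_chunks              # ignore input right after AI speech (echo guard)
        self.silent_run = 0
        self.cooldown = 0

    def on_ai_finished_speaking(self):
        # Start the echo-guard cooldown window so playback bleed
        # doesn't trigger a false stream_end.
        self.cooldown = self.cooldown_chunks
        self.silent_run = 0

    def rms(self, pcm_chunk: bytes) -> float:
        # 16-bit little-endian PCM -> root-mean-square energy.
        samples = struct.unpack(f"<{len(pcm_chunk) // 2}h", pcm_chunk)
        if not samples:
            return 0.0
        return (sum(s * s for s in samples) / len(samples)) ** 0.5

    def process(self, pcm_chunk: bytes) -> bool:
        """Return True when a stream_end signal should be sent."""
        if self.cooldown > 0:
            self.cooldown -= 1
            return False
        if self.rms(pcm_chunk) < self.silence_threshold:
            self.silent_run += 1
            if self.silent_run >= self.silence_chunks_needed:
                self.silent_run = 0
                return True
        else:
            self.silent_run = 0  # any speech resets the silence run
        return False
```

In the real pipeline this runs inside an async task per session; barge-in is a separate path that interrupts playback as soon as speech energy is detected.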
Challenges we ran into
- 1011 WebSocket errors - Gemini Live closes idle connections. Fixed with a keepalive loop sending silent audio every 20 seconds with guards to prevent it firing during tool calls or active speech.
- Duplicate audio - Gemini sometimes sends the same sentence twice as separate audio turns. Fixed by comparing each turn's transcript against the previous and sending an interrupted signal to the browser to discard duplicates.
- Model vocalising tool calls - Gemini occasionally speaks "log behavioral note" aloud instead of calling the tool silently. Fixed with transcript cleaning that strips function-call syntax and auto-rescue that detects and executes spoken tool calls directly in Python.
- Premature interview endings - Without guardrails, Gemini would call end_interview after 30 seconds. Fixed with a closing_spoken gate that only unblocks end_interview after a turn containing actual closing words like "thanks for your time."
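The tool-gating pattern behind these fixes can be sketched as a small phase machine. This is a simplified illustration under assumed names: the phase labels, tool names, and error strings are hypothetical, not the project's actual implementation.

```python
from enum import Enum, auto


class Phase(Enum):
    CONCEPTS = auto()
    CODING = auto()
    WRAP_UP = auto()


# Which tools are legal in each phase (hypothetical tool names).
ALLOWED = {
    Phase.CONCEPTS: {"ask_concept_question", "advance_to_coding"},
    Phase.CODING:   {"present_problem", "advance_to_wrap_up"},
    Phase.WRAP_UP:  {"end_interview"},
}

CLOSING_WORDS = ("thanks for your time", "that concludes")


class InterviewState:
    def __init__(self):
        self.phase = Phase.CONCEPTS
        self.closing_spoken = False  # gate: end_interview blocked until a real closing turn

    def note_ai_turn(self, transcript: str):
        # Unblock end_interview only after a genuine closing sentence.
        if any(w in transcript.lower() for w in CLOSING_WORDS):
            self.closing_spoken = True

    def call_tool(self, name: str) -> str:
        if name not in ALLOWED[self.phase]:
            # Returning an error string (instead of raising) lets the
            # model read it and self-correct on the next turn.
            return f"ERROR: {name} is not allowed during {self.phase.name}"
        if name == "end_interview" and not self.closing_spoken:
            return "ERROR: say a proper closing before calling end_interview"
        if name == "advance_to_coding":
            self.phase = Phase.CODING
        elif name == "advance_to_wrap_up":
            self.phase = Phase.WRAP_UP
        return "OK"
```

The key design choice is that out-of-order calls return errors rather than raise exceptions, so the model sees the message in its tool result and retries in the correct order.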
Accomplishments that we're proud of
A fully autonomous AI interviewer that conducts a real technical interview from greeting to scored report - completely hands-free, in real time, with voice and vision. The VAD pipeline handles natural speech with pauses, barge-in, and echo cancellation without any third-party VAD library. The tool state machine reliably enforces interview structure across every session.
What we learned
Gemini Live is powerful, but it requires careful state management on the application side. The model is capable of conducting a great interview when given the right guardrails; the challenge is building those guardrails reliably in an async real-time environment where timing matters at the millisecond level. Pure "let the LLM decide everything" doesn't work in production: structured tool validation and state machines are essential for agentic systems that need to follow a reliable flow.
What's next for AI Mock Interviewer
- Support for multiple interview types: system design, behavioral-only, frontend
- Session history so candidates can track improvement over time across multiple sessions
- Configurable difficulty levels: junior, mid, senior
- Multi-language support for non-English speakers
- Integration with job descriptions: paste a JD and Alex tailors the interview to that specific role
Built With
- fastapi
- gemini-2.5-flash
- gemini-live
- google-adk
- google-cloud-run
- python
- web-audio-api
- websockets