Inspiration

Technical interviews are broken. Candidates spend weeks preparing with flashcards and LeetCode, but freeze when a real person asks them to explain their thinking out loud. The gap isn't knowledge — it's the experience of being in a live conversation under pressure. Existing tools are either text-based chatbots (nothing like a real interview) or expensive human coaches ($150+/hour). I wanted to build something that actually feels like sitting across from a real interviewer — one that listens, adapts, and pushes you to be better.

What it does

InterviewPilot is a real-time AI interview coach that conducts voice-based mock interviews using Gemini's native audio capabilities. You talk to it like a real interviewer — no typing, no scripted flows. It adapts question difficulty based on how you're performing using Item Response Theory, transitions naturally between topics, and generates a detailed scorecard when you're done. The system supports HR, behavioral, and technical interview phases with distinct interviewer personas, each with their own voice and style. For technical roles, questions are pulled from curated banks covering Python, ML/AI, system design, and more — calibrated from junior to staff level.
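As a sketch of how a calibrated question bank and difficulty matching could fit together (the schema, field names, and `pick_question` helper below are illustrative assumptions, not the project's actual data model):

```python
# Hypothetical question-bank entry: each question carries an IRT-style
# difficulty value so it can be matched to the candidate's current ability.
QUESTION_BANK = {
    "python": [
        {"id": "py-001", "prompt": "What does a list comprehension return?",
         "difficulty": -1.0, "level": "junior"},
        {"id": "py-042", "prompt": "How would you diagnose a GIL-bound service?",
         "difficulty": 1.5, "level": "staff"},
    ],
}

def pick_question(bank: dict, topic: str, theta: float) -> dict:
    """Return the question whose calibrated difficulty is closest to theta."""
    return min(bank[topic], key=lambda q: abs(q["difficulty"] - theta))
```

Keeping difficulty as a continuous value (rather than only junior/mid/senior buckets) is what lets the selector track the candidate's ability estimate smoothly.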

How I built it

The core insight was using Gemini 2.5 Flash with native audio I/O through the Live API — no separate speech-to-text or text-to-speech pipelines. This gives sub-second conversational latency that feels natural, not robotic. I built a three-service architecture: a Next.js frontend for the interview UI and scorecards, a FastAPI backend managing sessions and evaluation data, and a LiveKit Agent running the Gemini-powered interviewer. Real-time coaching feedback flows through LiveKit data channels, not REST polling, so the candidate sees live notes as they speak. The adaptive difficulty engine tracks a theta parameter (inspired by Item Response Theory) that adjusts in real time — strong answers increase difficulty, struggles ease it back. A background evaluator powered by Gemini Flash Lite scores responses on specificity, structure, and depth without interrupting the conversation flow. Everything runs on Google Cloud — Compute Engine for the services, Artifact Registry for container images, with Docker Compose orchestrating the stack.
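The theta adjustment described above can be sketched as a simple 2PL-style logistic update (a minimal illustration; the learning rate, clamping range, and function name are assumptions, not the production tuning):

```python
import math

def update_theta(theta: float, difficulty: float, score: float,
                 lr: float = 0.4) -> float:
    """Nudge the ability estimate after each scored answer.

    score is the evaluator's grade in [0, 1]. The logistic gives the
    expected score for a question of this difficulty at the current
    ability, and theta moves in proportion to the surprise.
    """
    expected = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
    theta += lr * (score - expected)   # strong answers raise theta, struggles lower it
    return max(-3.0, min(3.0, theta))  # clamp to a conventional IRT range
```

The next question is then drawn near the updated theta, so difficulty tracks performance turn by turn instead of following a fixed script.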

Challenges I ran into

Getting the timing right for Gemini's live audio session was the hardest part. The SDK's generate_reply call needs the realtime connection fully established before it works, but the agent lifecycle fires on_enter before that's ready. I had to restructure the greeting flow to happen after session.start() completes rather than in the agent's entry hook. I also tried building a context reset mechanism to keep long interviews from hitting token limits — it caused the agent to go completely silent. It turned out that Gemini's native sliding_window compression with session_resumption handles this far better than any manual approach. Deploying a three-service stack with WebRTC, WebSocket upgrades, and audio streaming through Nginx required careful proxy configuration to keep all the real-time channels working.
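For the proxy piece, the essential part is forwarding the WebSocket upgrade headers — without them the handshake silently fails. A minimal Nginx sketch (the path, port, and timeout are illustrative, not the deployed config):

```nginx
location /ws/ {
    proxy_pass http://127.0.0.1:8000;        # hypothetical backend address
    proxy_http_version 1.1;                  # required for WebSocket upgrades
    proxy_set_header Upgrade $http_upgrade;  # pass the client's upgrade request through
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_read_timeout 3600s;                # keep long audio sessions from idling out
}
```

The long read timeout matters for interviews: a quiet stretch while the candidate thinks should not tear down the real-time channel.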

What I learned

Native audio changes everything. The difference between piping text through TTS versus having Gemini speak directly is massive — the pacing, intonation, and conversational feel are on another level. It's the difference between talking to Siri and talking to a person. I also learned that adaptive difficulty matters more than question quality. A mediocre question asked at the right difficulty level teaches more than a perfect question that's too easy or too hard.

What's next for InterviewPilot

Multi-language support (the question banks already cover 5 languages), webcam-based body language feedback, interview recording and playback for self-review, and a collaborative mode where a real mentor can shadow the AI interview and add their own feedback in real time.
