Inspiration

As someone actively pursuing backend software engineering roles and applying to MSCS programs, I spend a lot of time preparing for technical interviews. But grinding coding problems in isolation isn't enough; the hardest part is verbally communicating your thought process, weighing algorithmic trade-offs, and explaining system design choices out loud. I wanted to build a realistic, proactive mock interviewer that could actually see what I was coding and talk back to me in real-time, removing the friction and cost of scheduling human mock interviews.

What it does

Interview Helper is an immersive mock interview agent powered by the Gemini 2.5 Flash Live API. It acts as a proactive Senior Software Engineer. By capturing your screen and microphone, it watches you write code or draw system architecture diagrams. If you start writing a brute-force O(n^2) solution or make a suboptimal design choice, the agent gently interrupts you via voice to ask for optimizations or edge cases, just like a real human interviewer.

How I built it

The project relies on a low-latency, bidirectional streaming architecture:

  • Client: A local Python application using mss, cv2, and sounddevice to capture 1080p screen frames and raw microphone audio.
  • Backend: A FastAPI server managing secure WebSocket (wss://) connections.
  • Cloud Hosting: To ensure the low latency required for real-time voice conversations, the backend is containerized with Docker and deployed to Google Cloud Run.
  • AI Engine: The backend utilizes the google-genai SDK to stream the multimodal data (raw audio bytes and screen captures) directly to the Gemini Live API, leveraging its native audio capabilities for instantaneous voice responses.
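
To keep the capture and transmission paths from stalling one another, everything funnels into one asyncio queue that a single sender task drains. Below is a minimal, stdlib-only sketch of that pattern under stated assumptions: the stubbed capture coroutines, the message tags, and the `sent` list are hypothetical stand-ins for the real mss/sounddevice capture and the WebSocket `send` call.

```python
import asyncio

async def capture_frames(out: asyncio.Queue, n: int = 3) -> None:
    """Push tagged screen frames (stubbed; real code captures via mss at ~1 fps)."""
    for i in range(n):
        await out.put(("frame", f"jpeg-bytes-{i}".encode()))
        await asyncio.sleep(0)  # yield to the loop; real code sleeps ~1s

async def capture_audio(out: asyncio.Queue, n: int = 3) -> None:
    """Push tagged PCM chunks (stubbed; real code reads from sounddevice)."""
    for i in range(n):
        await out.put(("audio", f"pcm-bytes-{i}".encode()))
        await asyncio.sleep(0)

async def sender(out: asyncio.Queue, sent: list, total: int) -> None:
    """Drain the shared queue and forward each message downstream."""
    for _ in range(total):
        kind, payload = await out.get()
        sent.append((kind, payload))  # real code: await ws.send(...)

async def main() -> list:
    q: asyncio.Queue = asyncio.Queue()
    sent: list = []
    # Capture and send run concurrently, so neither blocks the other.
    await asyncio.gather(capture_frames(q), capture_audio(q), sender(q, sent, 6))
    return sent

messages = asyncio.run(main())
print(len(messages))  # 6 tagged messages forwarded
```

The single-queue design means the sender never has to poll two sources, and backpressure applies uniformly to both media streams.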

Challenges I ran into

Managing the asynchronous event loops in Python was incredibly tricky. I had to ensure that capturing the screen, recording audio, and playing back the AI's voice never blocked one another, which required careful use of asyncio queues and thread-safe callbacks. I also hit 502 Bad Gateway errors during my Google Cloud deployment caused by strict schema validation of LiveConnectConfig, and I spent time fine-tuning the Voice Activity Detection (VAD) silence_duration_ms so the agent interrupted at a natural pace.
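
The thread-safe callback piece boils down to handing data from the audio driver's thread to the event loop via loop.call_soon_threadsafe. Here is a stdlib-only sketch of that hand-off: a plain threading.Thread plays the role of sounddevice's PortAudio callback thread, and the callback name and chunk contents are illustrative, not the project's actual code.

```python
import asyncio
import threading

def audio_callback(loop: asyncio.AbstractEventLoop,
                   q: asyncio.Queue, chunk: bytes) -> None:
    # Runs on the capture thread, NOT the event-loop thread, so the
    # queue must only be touched via call_soon_threadsafe.
    loop.call_soon_threadsafe(q.put_nowait, chunk)

async def main() -> list:
    loop = asyncio.get_running_loop()
    q: asyncio.Queue = asyncio.Queue()

    def device_thread() -> None:
        # Simulates the audio driver firing callbacks off-loop.
        for i in range(3):
            audio_callback(loop, q, f"pcm-{i}".encode())

    threading.Thread(target=device_thread, daemon=True).start()
    # The consumer awaits on the loop side without blocking playback.
    return [await q.get() for _ in range(3)]

chunks = asyncio.run(main())
print(chunks)  # [b'pcm-0', b'pcm-1', b'pcm-2']
```

Because call_soon_threadsafe schedules callbacks in FIFO order from a given thread, chunk ordering is preserved without any explicit locking.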

What I learned

I leveled up my understanding of real-time transport protocols by building and debugging live WebSockets. I also learned how to containerize an ASGI application, deploy it seamlessly to Google Cloud Run, and interact with cutting-edge multimodal AI models using raw byte streaming instead of traditional text-based REST requests.

What's next for Interview Helper

I plan to add customizable interview personas (e.g., "Strict Principal Engineer" vs. "Helpful Startup CTO"), support for ingesting PDFs of a user's resume for tailored behavioral questions, and session transcripts for post-interview review.
