InterviewOS

AI-powered multi-panel mock interviews with real-time coaching, industry-specific depth, and actionable post-interview insights.


Inspiration

Traditional mock interviews are hard to scale, inconsistent in quality, and rarely capture the full picture of how a candidate performs under pressure. Most tools focus only on question-and-answer text, ignoring delivery, timing, and industry context.

With the Gemini 3 family and Gemini 2.5 Live, we saw an opportunity to build something closer to a real hiring panel: multiple distinct interviewers, continuous context, real-time audio and video, and structured feedback that feels like a debrief from a seasoned hiring committee.

InterviewOS is our attempt to turn “practice interviews” into high-fidelity simulations that measure not just what you say, but how you say it and how you evolve over the course of a full session.


What it does

InterviewOS simulates a full panel-style interview and produces a rich, structured evaluation:

  • Multi-panel AI interviewers

    • Generates three distinct panelists with a mix of Indian and global names, each with their own role, focus area, and questioning style.
    • Assigns gender-matched voices and avatar colors for clarity in the UI.
  • Real-time live interview

    • Uses Gemini 2.5 Flash Live in the browser for ultra-low-latency audio.
    • Streams your microphone and camera to drive live conversation with the panel.
    • Shows a live transcript with clear “You vs. Panelist” speaker attributions.
  • Adaptive orchestration

    • A dedicated InterviewOrchestrator tracks topics, depth (1–5), and panel balance.
    • A server-side orchestration WebSocket sends hints back to the client: which topic to explore next, how deep to go, and which panelist should lead.
  • Emotion and body language snapshots

    • Periodically captures short video segments during the call.
    • Sends them to backend endpoints for body language and emotion analysis, with rate limiting and safe fallbacks.
    • Feeds a dashboard-like view of posture, eye contact, and general composure across the interview.
  • Industry-specialized evaluation

    • Supports profiles for FAANG, Finance, Consulting, Medical, Legal, Startup, and General via IndustrySpecialist.
    • Can generate industry-specific questions and evaluations (scores, strengths, weaknesses, recommendations).
  • Final multi-dimensional report

    • Uses Gemini 3 Pro to synthesize:
      • Technical, Communication, and Culture Fit scores.
      • Panelist-level comments and improvement suggestions.
    • Augments the report with sample-based body/voice/temporal analytics, clearly labeled as demonstration data when APIs are rate-limited.
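
The adaptive orchestration hints can be sketched as a small pure function. This is an illustrative simplification: the names (`nextHint`, `PanelistLoad`) and the thresholds are invented for the example, not the actual InterviewOrchestrator internals.

```typescript
// Sketch of the orchestration hint logic described above.
// Names and thresholds are illustrative, not the real service code.
interface PanelistLoad { id: string; questionsAsked: number; }

interface Hint {
  action: "follow_up" | "switch_topic";
  depth: number;        // the 1-5 depth scale from the write-up
  leadPanelist: string; // which panelist should ask the next question
}

function nextHint(
  currentDepth: number,
  exchangesOnTopic: number,
  panel: PanelistLoad[],
): Hint {
  // Rotate toward the least-loaded panelist so no interviewer dominates.
  const lead = [...panel].sort((a, b) => a.questionsAsked - b.questionsAsked)[0];

  // Heuristic: keep drilling until depth 5 or the topic has run long.
  if (currentDepth < 5 && exchangesOnTopic < 4) {
    return { action: "follow_up", depth: currentDepth + 1, leadPanelist: lead.id };
  }
  // Otherwise reset depth and move on to a fresh topic.
  return { action: "switch_topic", depth: 1, leadPanelist: lead.id };
}
```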

How we built it

  • Frontend (React + TypeScript + Vite)

    • A single-page React app written in TypeScript, styled with Tailwind CSS and animated with Framer Motion.
    • Core UI pieces:
      • LiveInterview.tsx for the full interview experience (camera feed, controls, live transcript, timers).
      • Dashboard.tsx, PanelConfiguration.tsx, and ResumeUploader.tsx for setup and post-interview views.
    • Uses:
      • @google/genai in the browser to open a Gemini Live session and stream audio/video.
      • A custom audio worklet with a ScriptProcessor fallback to send 16 kHz PCM packets to Gemini in near real time.
      • A useVAD hook for client-side voice activity detection, triggering end-of-speech events.
      • A useVideoAnalysis hook to periodically record VP8 clips, convert them to base64, and call /api/analyze-body-language.
  • Backend (Node.js + Express + WebSocket)

    • Express API in server/src/index.ts exposing:
      • /api/health, /api/parse-resume, /api/generate-panelists, /api/generate-report.
      • Advanced endpoints: /api/analyze-emotion, /api/analyze-body-language, /api/analyze-speech.
      • Industry endpoints: /api/industry/:industry, /api/industry-questions, /api/industry-evaluate.
    • WebSocket orchestration server at /ws/interview:
      • Uses LiveInterviewHandler to receive transcript updates and speech_end events.
      • Calls InterviewOrchestrator to compute hints (topic, depth, panelist) and time/phase updates.
  • Core services

    • GeminiService:
      • Wraps Gemini 3 Flash and Gemini 3 Pro with typed responseSchema for structured JSON.
      • Implements retry with exponential backoff, skipping 4xx client errors.
      • Generates panelists and final reports, and augments reports with sample analytics when needed.
    • InterviewOrchestrator:
      • Tracks interview phase (opening/active/closing/completed), topics covered, depth, and panelist workloads.
      • Uses simple but effective heuristics to decide when to follow up, when to switch topics, and when to rotate panelists.
    • EmotionAnalyzer and PresentationCoach:
      • Handle text/audio/video analysis for emotion and body language, wrapped with rate limiting and fallback defaults.
    • IndustrySpecialist:
      • Encodes reusable profiles for industries and drives question generation and answer evaluation on top of Gemini Pro.
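
To give a flavor of the retry-with-backoff behavior in GeminiService, here is a minimal sketch. The function name `withRetry` and the error shape (a `status` field on the thrown error) are assumptions for illustration, not the actual service code.

```typescript
// Illustrative retry with exponential backoff that skips 4xx client errors,
// mirroring the GeminiService behavior described above.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const status = err?.status ?? 0;
      // 4xx means the request itself is bad: retrying won't help, so rethrow.
      if (status >= 400 && status < 500) throw err;
      if (attempt + 1 >= maxAttempts) throw err;
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```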

Challenges we ran into

  • Low-latency audio streaming

    • Ensuring smooth, gap-free playback while decoding base64 audio from Gemini in the browser required a queue + pre-decoding strategy.
    • Handling edge cases when the audio worklet fails and falling back to ScriptProcessor without breaking the user experience.
  • Transcript consistency

    • Gemini Live often sends cumulative partial transcripts; we had to carefully reconcile input transcriptions (user speech) and output transcriptions (panel speech) into a single, readable chat-style log without duplicates or missing chunks.
  • Panel orchestration without over-complication

    • Designing InterviewOrchestrator to be smart enough (topics, depth, panel rotation, timing) without making it brittle or overfit to a specific conversation pattern.
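
A minimal sketch of the reconciliation idea, assuming each update arrives as a cumulative transcript string (the real logic also handles speaker attribution and revising text already shown in the log):

```typescript
// Given the previously seen cumulative transcript and the latest one,
// return only the new text to append, so the chat log never duplicates.
function newChunk(previous: string, cumulative: string): string {
  // Typical case: the new transcript extends the old one; emit the suffix.
  if (cumulative.startsWith(previous)) {
    return cumulative.slice(previous.length);
  }
  // The model revised earlier words: fall back to the longest common
  // prefix and emit only the text past it, avoiding duplicated output.
  let i = 0;
  while (i < previous.length && i < cumulative.length && previous[i] === cumulative[i]) i++;
  return cumulative.slice(i);
}
```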

Accomplishments that we’re proud of

  • Multi-panel interview that feels coherent

    • Three AI interviewers with distinct personas, voices, and focus areas, tied together by a shared orchestrator, produce a session that feels more like a real panel than a single model “persona”.
  • End-to-end real-time experience

    • From microphone input to Gemini 2.5 Live to audio playback and transcript, the pipeline is tuned for low latency and stable behavior, instrumented with timing logs for continuous tuning.
  • Thoughtful orchestration layer

    • The orchestration WebSocket and InterviewOrchestrator give us a place to experiment with “Marathon Agent”-style logic: tracking depth, managing phases and timing, and enabling dynamic panel handoffs.
  • Industry-aware evaluation

    • The IndustrySpecialist service lets the same core engine feel tailored to FAANG vs Finance vs Consulting vs Medical, without forking the rest of the system.
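
The gap-free playback mentioned above boils down to one piece of scheduling arithmetic: each decoded chunk starts at whichever is later, “now” or the end of the previously scheduled chunk. A sketch of that arithmetic (names are illustrative; in the browser this runs against AudioContext.currentTime, and the real player also pre-decodes base64 PCM off the hot path):

```typescript
// Gap-free audio queue scheduling: a chunk plays back-to-back with the
// previous one, unless the queue has drained, in which case it starts now.
function scheduleChunk(
  currentTime: number,   // playback clock at enqueue time
  nextStartTime: number, // end time of the previously scheduled chunk
  durationSec: number,   // duration of this chunk
): { startAt: number; newNextStartTime: number } {
  const startAt = Math.max(currentTime, nextStartTime);
  return { startAt, newNextStartTime: startAt + durationSec };
}
```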

What we learned

  • Gemini Live is powerful, but demands careful UX

    • The technology is capable of near-conversational latency, but the user’s perceived smoothness depends on how transcripts are updated, how audio is buffered, and how clear the UI is about who is speaking.
  • Orchestration is where “application-level intelligence” lives

    • The biggest leap in realism came not from tweaking prompts, but from explicit orchestration: tracking state, planning next actions, and feeding hints back into the Live session.
  • Rate limiting and fallbacks are part of product design

    • Designing a good experience meant assuming APIs will occasionally say “no,” and making sure the app still responds quickly, shows something meaningful, and clearly labels any sampled/demo data.
  • Typed schemas reduce friction

    • Using Gemini’s structured output via responseSchema removed a lot of fragile parsing logic and made the system more robust to prompt drift.
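
The rate-limit-and-fallback lesson can be captured in one small wrapper. Names like `analyzeWithFallback` and the `isSampleData` flag are illustrative stand-ins for how our analyzers label demo data, not the exact production code:

```typescript
// Sketch of the "fallback, clearly labeled" pattern described above:
// if the analysis API is rate-limited or fails, return labeled demo data
// instead of surfacing an error to the candidate mid-interview.
interface Analysis { composure: string; isSampleData: boolean; }

const FALLBACK: Analysis = { composure: "steady", isSampleData: true };

async function analyzeWithFallback(
  call: () => Promise<Omit<Analysis, "isSampleData">>,
): Promise<Analysis> {
  try {
    const real = await call();
    return { ...real, isSampleData: false };
  } catch {
    // Rate-limited or failed: degrade gracefully with labeled sample data.
    return FALLBACK;
  }
}
```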

What’s next for InterviewOS

  • Deeper “Marathon Agent” behavior

    • Expand InterviewOrchestrator with richer Thought Signatures and explicit self-correction loops so the panel can critique and refine its own questions across long sessions.
  • Richer post-interview analytics

    • Turn temporal trends (confidence, nervousness, engagement) into comparative views across multiple sessions and personalized recommendation plans over time.
  • More granular industry & role templates

    • Add role-specific panels (e.g. “Staff Backend at FAANG”, “Product Manager in FinTech”) with targeted question banks and scoring rubrics.
  • Interactive replay

    • Allow candidates to replay key moments: jump to points with high stress or low clarity, and see recommendations tied to specific transcript segments.
  • Team & recruiter dashboards

    • Extend InterviewOS from a solo practice tool into a team training and evaluation platform, where mentors or recruiters can review reports and annotate sessions.

Tech Stack (at a glance)

  • Frontend

    • React 19, TypeScript 5.8, Vite
    • Tailwind CSS, Framer Motion, Recharts
    • @google/genai (browser), custom audio worklet, useVAD, useVideoAnalysis
  • Backend

    • Node.js 20+, Express, WebSocket (ws)
    • @google/genai, Multer, dotenv
  • AI Models

    • Gemini 3 Flash – resume parsing, panelist generation
    • Gemini 3 Pro – final evaluation, industry-specific reasoning
    • Gemini 2.5 Flash Live – real-time audio and transcript

Built With

  • express.js
  • framer-motion
  • gemini-2.5-flash-live
  • google/genai (gemini-3-flash/pro)
  • html2canvas
  • jspdf
  • mediarecorder
  • multer
  • node.js-20+
  • react-19
  • react-router
  • recharts
  • tailwind-css
  • typescript
  • vite
  • web-audio-api
  • webrtc/getusermedia
  • websocket-(ws)