Inspiration
Preparing for interviews is one of the most stressful and uncertain parts of getting a job. I’ve personally felt the frustration of not knowing whether I’m ready, what I’m doing wrong, or why I’m not getting selected despite putting in the effort.
Most students don't fail because they lack talent; they fail because they lack direction, feedback, and realistic practice.
I wanted to build something I genuinely needed: a system that doesn’t just let you practice, but actually tells you how to improve and what’s holding you back. That’s how IntervAI was born.
What it does
IntervAI is a voice-powered mock interview platform. You enter your target role, company, and interview type (technical, HR, or mixed), optionally upload your resume, and get dropped into a live interview with Alex, an AI interviewer powered by GPT-4o.
- Alex speaks each question out loud using ElevenLabs voice synthesis
- You answer using your microphone; your speech is transcribed in real time
- Questions escalate in difficulty: warm-up → core → advanced
- Questions are tailored to your role, company, and resume
- After ending, you get a full report: per-question scores, model answers showing exactly how a great candidate would have answered, "What You Missed," coaching tips, communication insights, study recommendations, and an overall performance score
IntervAI doesn’t just help you practice — it helps you understand exactly what to fix to land the job.
How I built it
- Frontend: Next.js 15 (App Router) with TypeScript; fully client-side voice interaction
- AI backbone: UMD's TerpAI (GPT-4o) as the primary model, with Featherless.ai (Mistral 7B) as fallback
- Voice: ElevenLabs API for text-to-speech (Rachel voice); Web Speech API for real-time speech-to-text transcription
- Auth: Auth0 for user authentication and session management
- Resume parsing: Featherless AI extracts structured information from uploaded resumes to tailor questions
- Performance: question prefetching during the listening phase, so the next question and its audio are ready before the user finishes answering, giving near-instant transitions
- Reports: parallel API calls; the final_report summary and the individual analyze_answer calls for each question run simultaneously, cutting report generation from ~20 seconds to ~6 seconds
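The parallel report strategy can be sketched like this (a minimal sketch; `callModel` and the task names stand in for the real TerpAI/Featherless calls, which we are not reproducing here):

```typescript
// Sketch of the parallel report generation: fire the overall summary and all
// per-question analyses at once, then await everything together.
type QA = { question: string; answer: string };

async function generateReport(
  qas: QA[],
  callModel: (task: string, payload: unknown) => Promise<string>,
) {
  // One final_report call plus one analyze_answer call per question,
  // all in flight simultaneously instead of one after another.
  const [summary, ...perQuestion] = await Promise.all([
    callModel("final_report", { qas }),
    ...qas.map((qa) => callModel("analyze_answer", qa)),
  ]);
  return { summary, perQuestion };
}
```

With N questions at a few seconds per model call, the sequential cost of N+1 calls collapses to roughly the latency of the slowest single call.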
Challenges I ran into
- TerpAI integration: there was no public documentation. We reverse-engineered the SSE streaming protocol from the DevTools network tab; it streams base64-encoded chunks with custom event types that we had to decode manually.
- React StrictMode double invocation: in development, useEffect runs twice, which caused two simultaneous AI calls and the interviewer speaking over itself. Fixed with a hasStartedRef guard.
- LLM JSON reliability: Mistral 7B frequently wraps JSON output in markdown fences or adds preamble text, causing silent parse failures and empty reports. Solved with a multi-strategy JSON extractor and automatic retry logic.
- Speech recognition gaps: the Web Speech API fires onend even mid-sentence when there's a pause, dropping words. We used continuous mode with an accumulator pattern to capture all final transcripts.
- Answer loss on early exit: when users clicked "End Interview" mid-answer, the in-progress transcript was silently discarded. Fixed by capturing transcriptRef.current in endInterview before navigating.
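A multi-strategy JSON extractor for flaky LLM output looks roughly like this (a minimal sketch, not the actual IntervAI extractor; the real one also drives the retry logic):

```typescript
// Multi-strategy JSON extraction for LLM output: try a direct parse, then
// strip markdown fences, then fall back to the outermost brace pair.
function extractJson(raw: string): unknown | null {
  const attempts = [
    // Strategy 1: the output is already clean JSON
    raw,
    // Strategy 2: contents of a ```json ... ``` (or bare ```) fence
    raw.match(/```(?:json)?\s*([\s\S]*?)```/)?.[1],
    // Strategy 3: widest {...} span, ignoring any preamble/epilogue text
    raw.slice(raw.indexOf("{"), raw.lastIndexOf("}") + 1),
  ];
  for (const candidate of attempts) {
    if (!candidate) continue;
    try {
      return JSON.parse(candidate.trim());
    } catch {
      // fall through to the next strategy
    }
  }
  return null; // caller triggers a retry with a stricter prompt
}
```

The null return is the important part: a failed parse becomes an explicit retry signal instead of a silently empty report.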
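The accumulator pattern for speech recognition can be sketched as a small helper (illustrative names; in the real app this sits behind `recognition.onresult` with `continuous = true`, and `onend` simply restarts recognition):

```typescript
// Transcript accumulator: the Web Speech API may end a recognition session
// mid-answer, so rather than trusting any single session we append every
// finalized segment to a running buffer that survives restarts.
function createTranscriptAccumulator() {
  const finals: string[] = []; // segments already marked final
  return {
    // Call from `onresult` with (text, isFinal) for each recognition result
    push(text: string, isFinal: boolean) {
      if (isFinal && text.trim()) finals.push(text.trim());
    },
    // Full answer so far; safe to read at any time, including on early exit
    read(): string {
      return finals.join(" ");
    },
  };
}
```

Because `read()` works at any moment, the same buffer is what an early "End Interview" click captures, so an in-progress answer is never lost.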
Accomplishments that we're proud of
- The interview genuinely feels like a real conversation: voice in, voice out, escalating difficulty, personalized to your resume
- Near-instant question transitions through prefetching: the next question and its audio are fetched in the background while you're still answering the current one
- The report is specific and actionable: model answers are written in first person, demonstrating exactly what a great answer sounds like, not just bullet points of what you missed
- Successfully integrated TerpAI (GPT-4o) with no official API docs, purely from inspecting network requests
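The prefetching idea is simple to sketch (hypothetical names; the real app also prefetches the ElevenLabs audio alongside the question text):

```typescript
// Question prefetching: kick off the next-question request while the user is
// still answering, so awaiting it later is usually instant.
function createQuestionPrefetcher(
  fetchQuestion: (index: number) => Promise<string>,
) {
  let pending: Promise<string> | null = null;
  return {
    // Called as soon as the current question starts playing
    prefetch(nextIndex: number) {
      pending = fetchQuestion(nextIndex);
    },
    // Called when the user finishes answering; resolves immediately if the
    // background fetch already completed, otherwise fetches on demand
    async next(nextIndex: number): Promise<string> {
      const p = pending ?? fetchQuestion(nextIndex);
      pending = null;
      return p;
    },
  };
}
```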
What we learned
- How to work with streaming SSE APIs and decode base64-chunked responses in real time
- The complexity of building a reliable voice UI: audio timing, speech-recognition edge cases, and keeping state consistent across async audio/recognition lifecycles
- LLM output is non-deterministic; production apps need robust JSON extraction, validation, and retry layers, not just a JSON.parse() call
- Perceived performance matters as much as actual performance: showing the question text immediately while the audio loads makes a 3-second wait feel instant
What's next for IntervAI
- Live feedback during the interview: real-time filler-word detection ("um", "like", "you know") and pacing suggestions shown subtly on screen
- Industry-specific question banks: curated question sets for FAANG, consulting, finance, and product management
- Progress tracking: score trends across sessions, weak areas highlighted over time, streaks
- Video mode: webcam analysis for eye contact, posture, and facial confidence alongside voice
- Shareable reports: generate a PDF or shareable link of your interview performance to send to a mentor or coach
Built With
- api
- auth0
- elevenlabs
- featherless.ai
- gpt-4o
- mistral-7b
- next.js
- node.js
- react
- speech
- terpai
- typescript
- web