Inspiration

Coding interviews are terrifying. I had one recently for a club on campus, and what made it hard was not the material: I had never had to talk out loud and answer spontaneous questions while my mind was already occupied with a LeetCode question. LeetCode is a great tool, but it only trains you to code. We wanted to build something that puts you in the hot seat the same way a real interview does: a voice at the other end pushing back, asking follow-up questions, and, most importantly, keeping you on the clock. That's where ARIA (AI Readiness Interview Assistant) was born.

What it does

ARIA simulates a full technical coding interview with a voice-driven AI interviewer. You configure your session by question style (LeetCode, system design, debugging), difficulty, interviewer persona (ranging from friendly coach to tough pressure), and time limit. Then you're dropped into a live interview: the AI introduces itself, gives you a LeetCode problem, and listens as you explain your thinking, asking (and, if needed, answering) clarifying questions and pushing back against incorrect assumptions as you code in the Monaco editor.

Once you finish, ARIA generates a detailed scorecard across 6 dimensions (correctness, algorithm choice, communication clarity, confidence, etc.), follows up with a few system design trade-off questions (if applicable), and gives you a full transcript and an optimal-solution comparison to review. If enabled, it can also use Google's MediaPipe + OpenCV to monitor your camera, tracking gaze and presence for an extra layer of realism.

How we built it

  • Next.js 16 + React 19 for the full-stack framework
  • Claude API for real-time interviewer responses and post-session scoring
  • ElevenLabs TTS with 3 distinct voice IDs per persona to give each interviewer a unique feel
  • Browser Web Speech API for in-browser speech-to-text
  • Monaco Editor for the in-browser code editor with syntax highlighting and multi-language support
  • MediaPipe for optional camera-based gaze tracking and presence scoring
  • Tailwind CSS v4 + shadcn/ui + Allotment for the resizable split-panel interview layout
  • Framer Motion for animated scorecard reveals
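
For the speech-to-text piece, the browser delivers results as a list in which each entry is flagged interim or final. Below is a minimal sketch of how those could be separated. The helper name `splitTranscripts` and the `ResultLike` types are our own illustrative stand-ins (structural copies of the browser's result objects so the snippet compiles outside a DOM environment), not ARIA's actual code.

```typescript
// Structural stand-ins for the browser's SpeechRecognitionResult objects.
interface AlternativeLike {
  transcript: string;
}

interface ResultLike {
  isFinal: boolean;
  [index: number]: AlternativeLike;
}

// Walk the results starting at resultIndex (the first entry that changed in
// this event) and separate settled text from text that may still be revised.
function splitTranscripts(
  results: ArrayLike<ResultLike>,
  resultIndex: number
): { finalText: string; interimText: string } {
  let finalText = "";
  let interimText = "";
  for (let i = resultIndex; i < results.length; i++) {
    const result = results[i];
    const best = result[0].transcript; // top-ranked alternative
    if (result.isFinal) {
      finalText += best;
    } else {
      interimText += best;
    }
  }
  return { finalText: finalText.trim(), interimText: interimText.trim() };
}
```

In a browser this would typically be called from a `SpeechRecognition` `onresult` handler with `event.results` and `event.resultIndex`; only `finalText` would be sent on to the interviewer, while `interimText` could drive a live caption.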

Challenges we ran into

The hardest part was managing the async voice loop without race conditions. Browser STT fires interim and final transcript events continuously, and our AI response pipeline (STT -> Claude -> ElevenLabs -> Audio Playback) has multiple async stages that can collide. We ended up using React refs instead of state for in-flight values, and added guards to prevent a single utterance from being processed twice.
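
A rough sketch of that guard idea follows, under a few assumptions: every utterance carries an increasing numeric id (a hypothetical counter here), and anything arriving while a response is still in flight is dropped. `UtteranceGate` and `runGuarded` are illustrative names. In a React component the gate instance would live in a `useRef` so that event handlers always see the current value instead of a stale render.

```typescript
// Mutable fields play the role of React refs: reads always see the latest
// value, and writes do not trigger a re-render.
class UtteranceGate {
  private inFlight = false;
  private lastHandledId = -1;

  // Returns true if the caller may process this utterance.
  tryBegin(id: number): boolean {
    if (this.inFlight) return false; // a response is still being produced
    if (id <= this.lastHandledId) return false; // already processed once
    this.inFlight = true;
    this.lastHandledId = id;
    return true;
  }

  // Must be called when the pipeline finishes, whether it succeeded or failed.
  finish(): void {
    this.inFlight = false;
  }
}

// Wraps one pass through the pipeline (STT -> LLM -> TTS -> playback).
// `work` is a placeholder for that async chain.
async function runGuarded(
  gate: UtteranceGate,
  id: number,
  work: () => Promise<void>
): Promise<boolean> {
  if (!gate.tryBegin(id)) return false;
  try {
    await work();
    return true;
  } finally {
    gate.finish();
  }
}
```

Dropping input while a response is in flight is only one possible policy; queuing the utterance or interrupting playback would be alternatives with different trade-offs.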

Parsing structured scorecard data out of Claude's streamed response was also a bit messy - we needed reliable JSON extraction from a streaming text response, and had to handle edge cases like malformed arrays and inconsistent field names across question types.
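
One way to approach this kind of extraction is sketched below: scan for the first balanced `{...}` in the text accumulated so far, skip braces that appear inside string literals, and only then hand the slice to `JSON.parse`. The function names and the field aliases in the usage note are hypothetical, chosen for illustration rather than taken from ARIA's code.

```typescript
// Returns the first complete JSON object found in `text`, or null if no
// object has been fully received yet (or the candidate fails to parse).
function extractFirstJsonObject(text: string): unknown {
  const start = text.indexOf("{");
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  let escaped = false;

  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Inside a string literal, braces are data, not structure.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(text.slice(start, i + 1));
        } catch {
          return null; // balanced braces but still not valid JSON
        }
      }
    }
  }
  return null; // the stream has not delivered the closing brace yet
}

// Reads a numeric score that may appear under any of several field names.
function readScore(
  obj: Record<string, unknown>,
  aliases: string[]
): number | null {
  for (const key of aliases) {
    const value = obj[key];
    if (typeof value === "number" && Number.isFinite(value)) return value;
  }
  return null;
}
```

Calling `extractFirstJsonObject` on each new chunk of accumulated text returns null until the object closes, and `readScore(card, ["communicationClarity", "communication_clarity"])` tolerates either spelling of a field.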

Getting ElevenLabs latency low enough to feel conversational took tuning too - we proxied it server-side and cached audio blobs on the client to minimize perceived delay.
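
The client-side cache could look roughly like the sketch below. It assumes entries are keyed by voice id plus the exact text, and it stores the pending promise rather than the finished audio so that two simultaneous requests for the same line share one network call. `AudioCache` and the `fetcher` callback are hypothetical; in a browser, `T` would typically be `Blob` and `fetcher` would call the server-side proxy route.

```typescript
class AudioCache<T> {
  private readonly entries = new Map<string, Promise<T>>();

  // Returns the cached promise for (voiceId, text), creating it on first use.
  get(voiceId: string, text: string, fetcher: () => Promise<T>): Promise<T> {
    const key = JSON.stringify([voiceId, text]);
    const cached = this.entries.get(key);
    if (cached !== undefined) return cached;

    // Evict on failure so a transient error is not cached forever.
    const pending = fetcher().catch((err: unknown) => {
      this.entries.delete(key);
      throw err;
    });
    this.entries.set(key, pending);
    return pending;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

This sketch never evicts successful entries, which is fine for a single interview session but would need a size cap for anything longer-lived.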

Accomplishments that we're proud of

  • The full voice interview loop actually works!! More than just speaking, you can get a response, get pushed back on, or get a hint if you need one. All of this runs end to end in a browser, with no accounts.
  • Three distinct interviewer personas with unique voices and conversation styles that genuinely feel different
  • A feedback page with animated circular score indicators, per-dimension breakdowns, and a side-by-side code comparison that makes the post-session review genuinely useful

What we learned

Real-time voice UX is hard. The gap between "technically works" and "feels natural to use" is enormous when speech is involved - timing, interruption handling, answering at the 'correct' time, and perceived latency all matter in ways that text-based UIs don't really have to worry about. We also learned how much prompt engineering matters for interview simulation: getting Claude to stay in character as a specific interviewer persona while still asking technically rigorous questions required a lot of iteration on the system prompt.

What's next for A.R.I.A.

  • Company-Specific Questions and Interviewer Personas - we've been researching how much interviews differ from company to company: not just in the questions they ask, but in overall style and vibe. Accommodating that would be very helpful for candidates preparing for a specific company's interviews.
  • Session Replay - even just the ability to play back audio snippets helps candidates understand what the other side hears
  • Filler word heatmap
  • Resume-aware questions

Built With

  • claudeapi
  • elevenlabs
  • mediapipes
  • monaco-editor
  • nextjs
  • opencv
  • python
  • shadcn
  • tailwind