A.R.I.A.

Inspiration

Coding interviews are terrifying; I had one recently for a club on campus. Not because I was unsure of the material, but because I had never had to talk out loud and answer spontaneous questions while my mind was already occupied with a LeetCode problem. LeetCode is a great tool, but it only trains you to code. We wanted to build something that puts you in the hot seat the same way a real interview does: a voice on the other end pushing back, asking follow-up questions, and, most importantly, keeping you on the clock. That's how ARIA (AI Readiness Interview Assistant) was born.

What it does

ARIA simulates a full technical coding interview with a voice-driven AI interviewer. You can configure your session by question style (LeetCode, system design, debugging), difficulty, interviewer persona (ranging from a friendly coach to a tough, high-pressure interviewer), and time limit. Then you're dropped into a live interview. The AI introduces itself, gives you a LeetCode problem, and listens as you explain your thinking. Along the way it asks (and, if needed, answers) clarifying questions and pushes back on incorrect assumptions while you code in the Monaco editor.
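The session options above map naturally onto a small typed config. Below is a minimal sketch, assuming a shape we made up for illustration. The identifiers, the difficulty values, and the name of the middle persona are hypothetical and are not ARIA's actual schema. The clock is written as a pure function of start time and current time.

```typescript
// Hypothetical session-config shape; option values mirror the choices described
// above, but every identifier here is illustrative.
type QuestionStyle = "leetcode" | "system-design" | "debugging";
type Difficulty = "easy" | "medium" | "hard"; // assumed difficulty levels
type Persona = "friendly" | "balanced" | "tough"; // middle persona name is assumed

interface SessionConfig {
  style: QuestionStyle;
  difficulty: Difficulty;
  persona: Persona;
  timeLimitMinutes: number;
}

// Returns human-readable problems; an empty list means the config is usable.
function validateConfig(cfg: SessionConfig): string[] {
  const errors: string[] = [];
  if (!Number.isInteger(cfg.timeLimitMinutes) || cfg.timeLimitMinutes <= 0) {
    errors.push("timeLimitMinutes must be a positive whole number");
  }
  return errors;
}

// Milliseconds left on the interview clock, clamped at zero once time is up.
function remainingMs(cfg: SessionConfig, startedAtMs: number, nowMs: number): number {
  return Math.max(0, cfg.timeLimitMinutes * 60_000 - (nowMs - startedAtMs));
}
```

Deriving the remaining time from timestamps, rather than decrementing a counter in a timer callback, avoids drift when the browser delays or throttles timers.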

Once you finish, ARIA generates a detailed scorecard across 6 dimensions (correctness, algorithm choice, communication clarity, confidence, etc.). It then follows up with a few system design trade-off questions (if applicable) and gives you a full transcript and a comparison against an optimal solution to review. If enabled, it can also use Google's MediaPipe and OpenCV to monitor your camera, tracking gaze and presence for an extra layer of realism.
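On the client, a scorecard like this is mostly a list of named scores. The sketch below assumes a 0-10 scale per dimension and an unweighted overall mean, and both are our assumptions. This write-up names only four of the six dimensions, so dimension names stay plain strings instead of a fixed union. It also shows how a feedback page might order dimensions weakest-first.

```typescript
// Hypothetical scorecard shape; the 0-10 scale is an assumption.
interface DimensionScore {
  name: string;
  score: number; // assumed 0-10
  note: string;
}

interface Scorecard {
  dimensions: DimensionScore[];
}

// Unweighted mean across dimensions, rounded to one decimal place.
function overallScore(card: Scorecard): number {
  if (card.dimensions.length === 0) return 0;
  const total = card.dimensions.reduce((sum, d) => sum + d.score, 0);
  return Math.round((total / card.dimensions.length) * 10) / 10;
}

// Dimensions ordered weakest-first, without mutating the original array.
function weakestFirst(card: Scorecard): DimensionScore[] {
  return [...card.dimensions].sort((a, b) => a.score - b.score);
}
```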

How we built it

Next.js 16 + React 19 for the full-stack framework
Claude API for real-time interviewer responses and post-session scoring
ElevenLabs TTS with 3 distinct voice IDs, one per persona, to give each interviewer a unique feel
Browser Web Speech API for in-browser speech-to-text
Monaco Editor for the in-browser code editor with syntax highlighting and multi-language support
MediaPipe for optional camera-based gaze tracking and presence scoring
Tailwind CSS v4 + shadcn/ui + Allotment for the resizable split-panel interview layout
Framer Motion for animated scorecard reveals
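MediaPipe handles the per-frame face analysis. Turning that output into a presence score is then ordinary bookkeeping. The sketch below leaves out the MediaPipe call entirely and starts from per-frame flags. It assumes each processed frame yields "face detected" and "looking at screen" booleans, sorted by timestamp. Those field names, and the idea of reporting the longest absence, are illustrative rather than ARIA's actual metric.

```typescript
// One sample per processed camera frame. In a real pipeline these flags would be
// derived from MediaPipe output; here they are plain inputs (hypothetical fields).
interface FrameSample {
  timestampMs: number;
  faceDetected: boolean;
  lookingAtScreen: boolean;
}

interface PresenceReport {
  presenceRatio: number;    // fraction of frames with a face
  gazeRatio: number;        // fraction of face frames looking at the screen
  longestAbsenceMs: number; // longest stretch with no face detected
}

// Assumes samples are sorted by timestamp.
function presenceReport(samples: FrameSample[]): PresenceReport {
  if (samples.length === 0) return { presenceRatio: 0, gazeRatio: 0, longestAbsenceMs: 0 };
  let present = 0;
  let gazing = 0;
  let longestAbsenceMs = 0;
  let absenceStart: number | null = null;
  for (const s of samples) {
    if (s.faceDetected) {
      present++;
      if (s.lookingAtScreen) gazing++;
      if (absenceStart !== null) {
        // The candidate came back: close the current absence window.
        longestAbsenceMs = Math.max(longestAbsenceMs, s.timestampMs - absenceStart);
        absenceStart = null;
      }
    } else if (absenceStart === null) {
      absenceStart = s.timestampMs; // first frame of a new absence
    }
  }
  // An absence that lasts until the final frame.
  if (absenceStart !== null) {
    const last = samples[samples.length - 1].timestampMs;
    longestAbsenceMs = Math.max(longestAbsenceMs, last - absenceStart);
  }
  return {
    presenceRatio: present / samples.length,
    gazeRatio: present === 0 ? 0 : gazing / present,
    longestAbsenceMs,
  };
}
```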

Challenges we ran into

The hardest part was managing the async voice loop without race conditions. Browser STT fires interim and final transcript events continuously, and our AI response pipeline (STT -> Claude -> ElevenLabs -> Audio Playback) has multiple async stages that can collide. We ended up using React refs instead of state for in-flight values, and added guards to prevent a single utterance from being processed twice.
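The guard described above can be sketched outside React as a small gate object. In a component, the gate would live in a `useRef`, so updates are visible synchronously and don't trigger re-renders. This sketch assumes each final STT result has a stable key to dedupe on, such as its index in the Web Speech API results list. `respond` stands in for the Claude, ElevenLabs, and playback stages. The class and function names are hypothetical.

```typescript
// Hypothetical gate that lets at most one utterance through the pipeline at a time
// and never processes the same final utterance twice.
class UtteranceGate {
  private readonly processed = new Set<string>();
  private inFlight = false;

  // Returns true if the caller should send this utterance down the pipeline.
  tryAcquire(key: string, isFinal: boolean): boolean {
    if (!isFinal) return false;                // ignore interim transcripts
    if (this.inFlight) return false;           // a response is already being generated
    if (this.processed.has(key)) return false; // this utterance was handled already
    this.processed.add(key);
    this.inFlight = true;
    return true;
  }

  release(): void {
    this.inFlight = false;
  }
}

// Runs one pass of the pipeline under the gate; returns whether it ran.
async function handleTranscript(
  gate: UtteranceGate,
  key: string,
  text: string,
  isFinal: boolean,
  respond: (text: string) => Promise<void>,
): Promise<boolean> {
  if (!gate.tryAcquire(key, isFinal)) return false;
  try {
    await respond(text);
  } finally {
    gate.release(); // always reopen the gate, even if a stage throws
  }
  return true;
}
```

This sketch drops any final utterance that arrives while a response is still in flight. A fuller implementation might buffer it and send it once the current response finishes.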

Parsing structured scorecard data out of Gemini's streamed response was also messy. We needed reliable JSON extraction from a streaming text response, and we had to handle edge cases like malformed arrays and field names that varied across question types.
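One way to approach the extraction problem is sketched below. It is one reasonable technique, not necessarily the one ARIA uses. The code scans the accumulated stream for the first balanced `{...}` object while skipping braces inside string literals, strips trailing commas, then maps inconsistent field names onto canonical keys. The alias table is hypothetical, and the trailing-comma cleanup is a regex heuristic that does not understand strings.

```typescript
// Find the first balanced top-level JSON object in `text`, ignoring braces inside
// string literals. Returns null if the object has not finished streaming yet.
function findJsonObject(text: string): string | null {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // incomplete: wait for more chunks
}

// Hypothetical aliases mapping inconsistent field names to one canonical key.
const FIELD_ALIASES = new Map<string, string>([
  ["algorithm_choice", "algorithmChoice"],
  ["algorithm", "algorithmChoice"],
  ["communication", "communicationClarity"],
  ["communication_clarity", "communicationClarity"],
]);

function parseScorecard(text: string): Record<string, unknown> | null {
  const raw = findJsonObject(text);
  if (raw === null) return null;
  // Drop trailing commas before ] or } (heuristic: does not skip string contents).
  const cleaned = raw.replace(/,\s*([\]}])/g, "$1");
  let parsed: unknown;
  try {
    parsed = JSON.parse(cleaned);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) return null;
  const out: Record<string, unknown> = {};
  for (const [k, v] of Object.entries(parsed as Record<string, unknown>)) {
    out[FIELD_ALIASES.get(k) ?? k] = v;
  }
  return out;
}
```

Because `findJsonObject` returns null until the closing brace arrives, the same function can be called after every streamed chunk.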

Getting ElevenLabs latency low enough to feel conversational took some tuning, too. We proxied the TTS requests server-side and cached audio blobs on the client to minimize perceived delay.
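Below is a minimal sketch of a client-side audio cache like the one described. The cache stores promises instead of resolved blobs, so two concurrent requests for the same line share one network call. Keys combine voice ID and text, so the same sentence spoken by different personas is cached separately. The class name, the LRU eviction, and the 50-entry default are our assumptions. The fetcher is injected so the cache doesn't depend on any specific proxy route.

```typescript
// Hypothetical LRU cache of in-flight or completed TTS requests.
class TtsCache<T> {
  private readonly entries = new Map<string, Promise<T>>();
  private readonly fetchAudio: (voiceId: string, text: string) => Promise<T>;
  private readonly maxEntries: number;

  constructor(fetchAudio: (voiceId: string, text: string) => Promise<T>, maxEntries = 50) {
    this.fetchAudio = fetchAudio;
    this.maxEntries = maxEntries;
  }

  get(voiceId: string, text: string): Promise<T> {
    const key = `${voiceId}\u0000${text}`;
    const hit = this.entries.get(key);
    if (hit !== undefined) {
      // Map keeps insertion order; re-inserting marks the entry most recently used.
      this.entries.delete(key);
      this.entries.set(key, hit);
      return hit;
    }
    const pending = this.fetchAudio(voiceId, text);
    // Don't cache failures, so a later call retries the request.
    pending.catch(() => {
      if (this.entries.get(key) === pending) this.entries.delete(key);
    });
    this.entries.set(key, pending);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry (first in iteration order).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    return pending;
  }
}
```

In the browser, `fetchAudio` might POST to a server-side proxy route (for example a hypothetical `/api/tts` that holds the ElevenLabs key) and return `response.blob()`. The caller can then play the result via `URL.createObjectURL(blob)` on an `Audio` element.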

Accomplishments that we're proud of

The full voice interview loop actually works! Beyond just speaking, you get responses, pushback, and hints when you need them, all end to end in the browser with no account required
Three distinct interviewer personas with unique voices and conversation styles that genuinely feel different
A feedback page with animated circular score indicators, per-dimension breakdowns, and a side-by-side code comparison that makes the post-session review actually useful

What we learned

Real-time voice UX is hard. The gap between "technically works" and "feels natural to use" is enormous when speech is involved - timing, interruption handling, answering at the 'correct' time, and perceived latency all matter in ways that text-based UIs don't really have to worry about. We also learned how much prompt engineering matters for interview simulation: getting Claude to stay in character as a specific interviewer persona while still asking technically rigorous questions required a lot of iteration on the system prompt.
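One pattern that helps with the in-character-but-rigorous problem is keeping the non-negotiable interview rules separate from persona-specific tone. A persona then changes how the interviewer sounds without loosening what it checks. The sketch below is purely illustrative. The persona IDs, their wording, and the "two or three sentences" rule are our assumptions, not ARIA's actual prompt.

```typescript
// Hypothetical persona definitions; ARIA's real prompt text is not reproduced here.
interface PersonaSpec {
  role: string;
  tone: string;
  hintPolicy: string;
}

type PersonaId = "friendly" | "balanced" | "tough";

const PERSONAS: Record<PersonaId, PersonaSpec> = {
  friendly: {
    role: "a supportive interview coach",
    tone: "warm and encouraging",
    hintPolicy: "offer a small hint whenever the candidate seems stuck",
  },
  balanced: {
    role: "a typical industry interviewer",
    tone: "neutral and professional",
    hintPolicy: "give a hint only if the candidate asks for one",
  },
  tough: {
    role: "a demanding senior engineer",
    tone: "direct and skeptical, pushing back on unsupported claims",
    hintPolicy: "give a hint only if explicitly asked, and keep it minimal",
  },
};

// Fixed rules come after the persona lines so every persona shares them.
function buildSystemPrompt(id: PersonaId, problem: string, minutesLeft: number): string {
  const p = PERSONAS[id];
  return [
    `You are ${p.role} conducting a live technical interview. Stay in this role for the entire session.`,
    `Tone: ${p.tone}.`,
    `Hints: ${p.hintPolicy}.`,
    "Always ask the candidate to explain their reasoning before commenting on code.",
    "Challenge incorrect assumptions directly, but never reveal the full solution.",
    "Keep each reply to two or three sentences, because it will be spoken aloud.",
    `Problem: ${problem}`,
    `Time remaining: ${minutesLeft} minutes.`,
  ].join("\n");
}
```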

What's next for A.R.I.A.

Company-Specific Questions and Interviewer Personas - we've been researching how different companies' interviews are: not just in the questions they ask, but in their overall style and vibe. Accounting for that would be very helpful for candidates preparing for a specific company's interviews.
Session Replay - even just the ability to play back audio snippets would help candidates understand what the interviewer hears
Filler word heatmap
Resume-aware questions

Built With

claudeapi
elevenlabs
mediapipe
monaco-editor
nextjs
opencv
python
shadcn
tailwind

Submitted to

DiamondHacks 2026

Created by

I came up with the idea for this website, which lets you prep for interviews with confidence. I added MediaPipe for the camera and integrated the Anthropic API for answers and ElevenLabs for voice. I wanted to add facial recognition to give feedback on eye movement, body movement, etc. We also thought about pulling data from Git for more accurate answers to system design questions, but time ran out.

Omar Masad
I worked on parts of both the front end (e.g., the landing and info pages) and the back end (especially the Gemini API and ElevenLabs integration for interviewer-interviewee interactions like follow-up questions and hints). I also helped storyboard and build the video and deck for the final pitch, and wrote up the description.

Rohan George
Ryan Soe
Christina Lin

Updates

Ryan Soe started this project — Apr 05, 2026 12:12 PM EDT
