Inspiration
Across the world, thousands of talented candidates have the qualifications needed to land a given job. However, an interviewer's perception of an interviewee goes beyond the content of their answers. Many interviewees perform poorly in interviews due to slouched posture, poor eye contact, hand fidgeting, or rapid speech, and unfortunately, most are unaware of these shortcomings.
Traditional mock interview tools fail to capture these details: they focus solely on the content of an interviewee's answers. As such, we built RecruitReady to provide objective, real-time feedback on an area that is often overlooked.
What it does
RecruitReady is a mock interview application that monitors a user in real time through their webcam and microphone. Our AI interviewer, powered by Google Gemini, conducts a practice interview while our computer vision system tracks eye contact through iris position tracking; posture, including shoulder alignment, head tilt, and forward lean; body language, such as head motion and hand fidgeting; and speech patterns, including words per minute, volume, and clarity.
After each answer, RecruitReady provides personalized, actionable feedback on these features.
How we built it
We built a pipeline combining three core technologies:
Computer Vision: Using OpenCV and MediaPipe, we track 33 body landmarks, hand positions, and facial features, including iris tracking for precise eye contact detection. The system calculates real-time metrics for posture, head motion, hand fidgeting, and gaze direction.
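Below is a minimal sketch of how one such metric can be derived from MediaPipe's pose landmarks, using shoulder alignment as the example. The function name and the way the metric is read are illustrative, not the exact RecruitReady implementation.

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

def shoulder_tilt(frame_bgr, pose) -> float | None:
    """Return the vertical offset between the shoulders (normalized units)."""
    results = pose.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not results.pose_landmarks:
        return None  # no person detected in this frame
    lm = results.pose_landmarks.landmark
    left = lm[mp_pose.PoseLandmark.LEFT_SHOULDER]
    right = lm[mp_pose.PoseLandmark.RIGHT_SHOULDER]
    return abs(left.y - right.y)  # ~0 when the shoulders are level

cap = cv2.VideoCapture(0)
with mp_pose.Pose(min_detection_confidence=0.5) as pose:
    ok, frame = cap.read()
    if ok:
        print(f"shoulder tilt: {shoulder_tilt(frame, pose)}")
cap.release()
```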
Speech Analysis: OpenAI's Whisper model transcribes speech and analyzes speaking patterns. We implemented text-based Voice Activity Detection (VAD) to track words per minute, volume in decibels, and speech clarity scores.
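As a rough sketch of the speech side, assuming a saved WAV file of one answer (the file name and model size here are placeholders, not our production configuration):

```python
import whisper

# Transcribe one recorded answer and estimate words per minute
# from Whisper's segment timestamps.
model = whisper.load_model("base")
result = model.transcribe("answer.wav")

words = len(result["text"].split())
segments = result.get("segments", [])
duration = segments[-1]["end"] - segments[0]["start"] if segments else 0.0
wpm = words / (duration / 60) if duration > 0 else 0.0
print(f"{words} words in {duration:.1f}s -> {wpm:.0f} WPM")
```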
AI Agent: Google's Gemini 2.0 Flash powers our interview coach. Using Google's Agent Development Kit, we created an autonomous agent with specialized analysis tools for evaluating eye contact, posture, movement, speech pace, volume, and clarity. The agent receives continuous metric streams during responses and provides natural, conversational feedback.
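The sketch below shows, in simplified form, how an analysis tool can be handed to an ADK agent; the tool body, scoring heuristic, and instruction text are illustrative stand-ins for our real tools.

```python
from google.adk.agents import Agent

def analyze_posture(shoulder_tilt: float, forward_lean: float) -> dict:
    """Score posture metrics and return structured feedback (example heuristic)."""
    score = 10 - min(10, int(shoulder_tilt * 50 + forward_lean * 20))
    return {"score": score,
            "feedback": "Sit up straight" if score < 6 else "Posture looks good"}

interview_coach = Agent(
    name="interview_coach",
    model="gemini-2.0-flash",
    instruction="Conduct a mock interview. After each answer, call the "
                "analysis tools and give short, encouraging feedback.",
    tools=[analyze_posture],
)
```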
The main application orchestrates everything: streaming camera metrics at 30 FPS, detecting speech start/end automatically, aggregating metrics during each response, and sending comprehensive summaries to the AI agent for analysis.
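A simplified sketch of the per-answer aggregation step: per-frame metrics are buffered while the user is speaking, then collapsed into a summary for the agent. Field names and the averaging rule are illustrative.

```python
from statistics import mean

class ResponseAggregator:
    def __init__(self):
        self.frames: list[dict] = []

    def add_frame(self, metrics: dict) -> None:
        # Called for every camera frame while the user is answering.
        self.frames.append(metrics)

    def summary(self) -> dict:
        # Collapse the buffered frames into one averaged snapshot.
        if not self.frames:
            return {}
        keys = self.frames[0].keys()
        return {k: round(mean(f[k] for f in self.frames), 2) for k in keys}

agg = ResponseAggregator()
agg.add_frame({"eye_contact": 0.92, "head_motion_px": 12.0})
agg.add_frame({"eye_contact": 0.88, "head_motion_px": 20.0})
print(agg.summary())  # averaged metrics for the whole answer
```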
Challenges we ran into
Calibrating Body Detection Metrics: Getting accurate, consistent measurements across different users, lighting conditions, and camera angles was challenging. MediaPipe provides raw landmark coordinates, but translating those into meaningful metrics like "good posture" or "eye contact" required extensive testing and calibration. We established precise thresholds (like iris position ranges of 2.75-2.85 for the left eye) through trial and error, then implemented smoothing algorithms with moving averages to reduce jitter and false positives from natural movement.
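To illustrate the smoothing idea, here is a minimal moving-average filter over the last N frames; the window size is an arbitrary example, while the 2.75-2.85 iris range comes from the calibration described above.

```python
from collections import deque

class SmoothedMetric:
    def __init__(self, window: int = 15):
        self.values = deque(maxlen=window)  # keep only the last N readings

    def update(self, value: float) -> float:
        self.values.append(value)
        return sum(self.values) / len(self.values)

iris_x = SmoothedMetric()
for raw in [2.74, 2.91, 2.79, 2.80, 2.78]:   # noisy per-frame readings
    smoothed = iris_x.update(raw)
print(2.75 <= smoothed <= 2.85)  # eye-contact check on the smoothed value
```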
Building an Autonomous AI Agent: Creating an agent that could conduct natural interviews while analyzing complex metrics in real-time was more involved than anticipated. Google's ADK required understanding session management with InMemorySessionService and Runners, async event handling, and proper tool function design. We had to architect specialized analysis tools (for eye contact, posture, movement, speech) that return structured scores and feedback, then teach the agent when and how to use them.
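For reference, the wiring looks roughly like the following, based on ADK's documented quickstart pattern; the agent definition, app name, and IDs are placeholders, and exact session APIs may differ between ADK versions.

```python
import asyncio
from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types

coach = Agent(name="coach", model="gemini-2.0-flash",
              instruction="Act as a mock interviewer.")

async def ask(question: str) -> None:
    sessions = InMemorySessionService()
    await sessions.create_session(app_name="recruitready",
                                  user_id="u1", session_id="s1")
    runner = Runner(agent=coach, app_name="recruitready",
                    session_service=sessions)
    msg = types.Content(role="user", parts=[types.Part(text=question)])
    async for event in runner.run_async(user_id="u1", session_id="s1",
                                        new_message=msg):
        if event.is_final_response():
            print(event.content.parts[0].text)

asyncio.run(ask("Tell me about yourself."))
```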
Making Metrics Meaningful: Raw numbers like "head motion: 18.5 px/frame" or "volume: -62 dB" mean nothing without context. We spent significant time establishing what actually counts as "good" versus "needs improvement". We calibrated realistic thresholds that balance being helpful against being overly critical, ensuring feedback feels constructive rather than nitpicky. The goal was to give users actionable insights, not overwhelm them with technical data.
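As a small example of this mapping, raw words-per-minute can be bucketed into plain-language ratings; the band edges here are illustrative, not our calibrated values.

```python
def rate_speaking_pace(wpm: float) -> str:
    # Translate a raw metric into feedback a user can act on.
    if wpm < 110:
        return "a bit slow - try adding more detail"
    if wpm <= 160:
        return "good conversational pace"
    return "a bit fast - take a breath between points"

print(rate_speaking_pace(175))
```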
Accomplishments that we're proud of
This experience taught us to develop in a professional environment, work with natural-sounding AI, and build a complete end-to-end pipeline from raw video/audio input to AI feedback.
What we learned
We learned that MediaPipe is powerful for real-time pose and facial landmark detection, but requires careful calibration and smoothing. This experience also taught us that agent frameworks like ADK provide structure but require an understanding of session management, tool calling, and async patterns. Finally, tuning the UX to show real-time transcription and visual feedback makes the system feel responsive and trustworthy.
What's Next
We plan to add facial micro-expression detection, filler word detection ("um", "like", "you know"), and sentiment analysis for improved interview evaluation. Future points of expansion include an Interview Avatar for conversing with a visual AI agent, and Resume and Job Posting Integration to tailor the interview to a specific role.
Built With
- adk
- gemini
- javascript
- mediapipe
- npm
- numpy
- openai
- opencv
- pyaudio
- pydantic
- python
- typescript
- wave
- whisper
