ReadTheRoom

Inspiration

Preparing for interviews is stressful because most candidates never get clear feedback on how they actually come across. You can record yourself answering practice questions, but rewatching your own videos is awkward and time-consuming, and it is easy to misjudge your own performance. You might not notice that you are rambling, using filler words, speaking too quickly, avoiding eye contact, or giving answers that sound less structured than they felt in the moment.

ReadTheRoom was inspired by the idea that interview coaching should be more accessible, more immediate, and less intimidating. Instead of waiting for a mentor, career coach, or mock interviewer, candidates can get fast AI-powered feedback after every practice answer.

What it does

ReadTheRoom is an AI-powered interview coach that helps users understand both what they say and how they say it during practice interviews.

Users choose an interview question, record their answer in the browser, and receive a feedback report in under two minutes. The report includes:

Communication scores: confidence, clarity, and conciseness
Speech analytics: filler words, hedging phrases, pacing, duration, and long pauses
Answer quality: relevance, STAR-method structure, strengths, and improvement areas
Video insights: engagement and facial expression signals from sampled frames
Actionable coaching: specific next steps to improve before the real interview

The goal is not just to score users, but to help them practice, improve, and build confidence faster.

How we built it

Frontend: Vue 3, Vite, and Tailwind CSS

Browser-based video recording with the MediaRecorder API
Interview question audio playback powered by ElevenLabs text-to-speech
Smooth three-step flow with animated modals and progress indicators
Professional landing page and feedback visualization

Backend: FastAPI and Python

FFmpeg extracts audio from uploaded interview videos
ElevenLabs transcribes the extracted audio into text
Custom audio analysis counts filler words and hedging phrases, and measures pacing, duration, and long pauses
Video frames are sampled for expression and engagement analysis
Google Gemini generates structured interview coaching from the transcript
Results are combined into one feedback report for the frontend
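
The audio extraction step at the start of the backend pipeline could look roughly like the sketch below. It is only an illustration: the function names, the mono 16 kHz WAV output, and the file names are assumptions, not details taken from the project. It also assumes an `ffmpeg` binary is available on the PATH.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path: Path, audio_path: Path) -> list[str]:
    """Build an FFmpeg command that writes a mono 16 kHz WAV from a video."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it already exists
        "-i", str(video_path),
        "-vn",                # drop the video stream
        "-ac", "1",           # downmix to a single audio channel
        "-ar", "16000",       # resample to 16 kHz (an assumed rate)
        "-c:a", "pcm_s16le",  # encode as 16-bit PCM
        str(audio_path),
    ]


def extract_audio(video_path: Path, audio_path: Path) -> Path:
    """Run FFmpeg and return the audio path.

    Raises subprocess.CalledProcessError if FFmpeg exits with an error.
    """
    subprocess.run(
        build_ffmpeg_command(video_path, audio_path),
        check=True,
        capture_output=True,
    )
    return audio_path
```

Keeping the command builder separate from the function that runs it makes the flags easy to check without invoking FFmpeg.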

Challenges we ran into

AI orchestration: combining video recording, FFmpeg processing, ElevenLabs transcription, custom metrics, vision analysis, and Gemini feedback into one smooth flow
Speech analysis accuracy: detecting filler words and hedging phrases without overcounting normal speech
Prompt engineering: making Gemini respond like a supportive coach instead of a generic evaluator
Performance: keeping the record-to-feedback flow fast enough for users to practice repeatedly
User experience: presenting feedback in a way that feels useful and confidence-building, not discouraging
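
One way to approach the overcounting problem named above is to split phrases into those that are almost always fillers and those that are ambiguous. The sketch below is a simplified illustration, not the project's actual logic: the word lists are hypothetical, and the rule that "like" only counts when set off by commas assumes the transcript includes punctuation.

```python
import re
from collections import Counter

# Hypothetical phrase lists; the project's real lists are not given here.
ALWAYS_FILLERS = ("um", "uh", "er", "you know")
HEDGES = ("i think", "i guess", "kind of", "sort of", "maybe")


def _count_phrase(text: str, phrase: str) -> int:
    """Count whole-word occurrences so 'er' does not match inside 'users'."""
    return len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))


def _count_all(text: str, phrases: tuple[str, ...]) -> Counter:
    counts = Counter()
    for phrase in phrases:
        hits = _count_phrase(text, phrase)
        if hits:
            counts[phrase] = hits
    return counts


def count_fillers(transcript: str) -> Counter:
    """Count filler words, treating 'like' as a filler only between commas."""
    text = transcript.lower()
    counts = _count_all(text, ALWAYS_FILLERS)
    # "what users like" is normal speech; "I would, like, start" is a filler.
    like_hits = len(re.findall(r",\s*like\s*,", text))
    if like_hits:
        counts["like"] = like_hits
    return counts


def count_hedges(transcript: str) -> Counter:
    """Count hedging phrases as whole-phrase matches."""
    return _count_all(transcript.lower(), HEDGES)
```

For the sentence "I would, like, start by asking what users like", this counts one "like" rather than two. A phrase such as "you know" can still be legitimate speech ("do you know"), so a rule-based counter like this remains an approximation.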

Accomplishments that we're proud of

ReadTheRoom turns a raw interview recording into a structured coaching report in under two minutes. By combining browser-native video recording, FFmpeg audio extraction, ElevenLabs transcription, custom speech analytics, and Gemini-powered feedback, we built an end-to-end pipeline that transforms unstructured video into measurable interview insights.

We are especially proud of our custom audio metrics engine. Instead of relying only on an LLM, we extract concrete communication signals such as filler word frequency, hedging phrases, speaking pace, long pauses, buzzwords, and STAR-method indicators. These metrics are then presented alongside AI-generated coaching so users can see both objective speech patterns and qualitative feedback in one dashboard.
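
As a rough illustration of how signals like speaking pace and long pauses can be computed without an LLM, here is a minimal sketch. It assumes the transcription step provides word-level start and end times in seconds. The `Word` type, the function names, and the 1.5-second pause threshold are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float


def words_per_minute(words: list[Word]) -> float:
    """Speaking pace over the span from the first word to the last."""
    if not words:
        return 0.0
    duration = words[-1].end - words[0].start
    if duration <= 0:
        return 0.0
    return len(words) * 60.0 / duration


def long_pauses(words: list[Word], min_gap: float = 1.5) -> list[tuple[float, float]]:
    """Return (pause_start, pause_end) for every silence of at least min_gap seconds."""
    pauses = []
    for prev, cur in zip(words, words[1:]):
        if cur.start - prev.end >= min_gap:
            pauses.append((prev.end, cur.start))
    return pauses
```

Because each pause is returned with its timestamps, the same data could later drive features such as marking pauses on a playback timeline.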

The result is a fast, repeatable interview practice loop: record an answer, get detailed feedback, improve, and try again.

What we learned

How to orchestrate multiple AI and media-processing tools in one product
How to extract useful communication signals from transcripts and audio
How to design LLM prompts for constructive coaching
How to build a full-stack Vue + FastAPI application under hackathon time pressure
How to turn raw AI output into a dashboard users can actually understand and act on
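
Turning raw model output into something a dashboard can render usually means validating it first. The sketch below assumes the LLM was asked to reply with a JSON object containing three lists. The key names (`strengths`, `improvements`, `next_steps`) and the function name are hypothetical and are not the project's actual schema.

```python
import json

FENCE = "`" * 3  # a Markdown code fence, which models sometimes wrap JSON in
REQUIRED_KEYS = ("strengths", "improvements", "next_steps")  # hypothetical schema


def parse_coaching_response(raw: str) -> dict[str, list[str]]:
    """Parse and validate a JSON coaching reply before it reaches the frontend."""
    text = raw.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]         # drop the closing fence line
        text = "\n".join(lines)

    data = json.loads(text)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")

    # Coerce every item to a string so the dashboard can render it directly.
    return {key: [str(item) for item in data[key]] for key in REQUIRED_KEYS}
```

Failing loudly on a malformed reply lets the backend retry or fall back to the rule-based metrics instead of sending a broken report to the user.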

What's next for ReadTheRoom

Video playback with feedback overlays showing exactly where filler words, pauses, or weak moments happened
Interview history and progress tracking over time
Multi-language interview practice
Harder mock interview scenarios with follow-up questions