Our Story: Building Tremolo
What Inspired Us
As Carnegie Mellon students, we hear about interviews everywhere. They're essential to getting jobs—especially since the lockdown, when nearly 80% of employers began conducting virtual interviews. But not everyone has someone to review how they come across, and many don't even want to ask for that kind of feedback.
That's where Tremolo came from: a real-time interview coaching overlay that helps you improve how you come across during interviews.
Tremolo gives you live feedback on your pace, vocal energy, eye contact, and filler words, while automatically analyzing your responses against the STAR method (Situation, Task, Action, Result), a structured storytelling technique. When you're speaking too fast, using too many filler words, or missing key story components, Tremolo alerts you instantly so you can adjust in real time. Beyond interviews, Tremolo is widely applicable: from defending your thesis to presenting slideshows, it helps you communicate more effectively while you speak.
What We Learned
Building Tremolo taught us a lot about:
- Real-time audio processing: We integrated Python-based audio analysis (using librosa and sounddevice) with Electron's main process to track vocal energy, pitch variance, and volume dynamics in real time
- Computer vision integration: Using the Overshoot SDK, we learned how to track eye contact and video framing through webcam feeds
- LLM-powered analysis: We used Google Gemini 2.0 Flash via OpenRouter to automatically detect and score STAR method interview responses, understanding how to structure prompts for concise, actionable feedback
- Electron IPC architecture: We used inter-process communication between the renderer and main processes to coordinate audio, transcription, and vision APIs without lag
How We Built It
Architecture Overview
Tremolo is built as an Electron overlay that sits on top of any video call application. The architecture splits responsibilities:
- Frontend (React + TypeScript): Translucent UI with draggable, resizable widgets that display real-time metrics and STAR analysis
- Electron Main Process: Coordinates Python audio analysis, OpenRouter API calls, and manages IPC communication
- Python Backend: Handles real-time microphone input processing using librosa for pitch/variance analysis
- APIs & Services:
- Deepgram for real-time transcription
- OpenRouter/Gemini for STAR method analysis
- Overshoot SDK for eye contact and video framing tracking
Key Features
Real-Time Metric Tracking:
- Pace: Words per minute (WPM) calculated from transcription stream
- Vocal Energy: Composite score combining pitch variance and volume dynamics from Python audio analysis
- Eye Contact: Computer vision tracking via Overshoot SDK
- Filler Words: Regex-based detection on transcript using sliding window analysis
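As a concrete illustration of the pace metric, here is a minimal sketch of how a rolling WPM figure can be derived from a live transcript stream. The 15-second window is an assumed value for illustration, not Tremolo's actual setting:

```python
import time
from collections import deque

class PaceTracker:
    """Rolling words-per-minute over a sliding time window.
    The 15 s window is an illustrative assumption."""
    def __init__(self, window_s=15.0):
        self.window_s = window_s
        self.events = deque()  # (timestamp, word_count) per transcript chunk

    def add(self, text, now=None):
        now = time.monotonic() if now is None else now
        self.events.append((now, len(text.split())))
        # Drop chunks that have fallen out of the window.
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()

    def wpm(self, now=None):
        now = time.monotonic() if now is None else now
        words = sum(n for t, n in self.events if now - t <= self.window_s)
        return words * (60.0 / self.window_s)

tracker = PaceTracker()
tracker.add("I led the migration of our billing service", now=0.0)
tracker.add("to a new queue based architecture", now=5.0)
print(tracker.wpm(now=10.0))  # 14 words in a 15 s window -> 56.0 WPM
```

In practice the timestamps would come from Deepgram's streaming transcript events rather than a local clock.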
STAR Method Analysis:
- Two-stage detection: regex pre-filter for instant story detection, followed by LLM analysis for scoring
- Scores each component (Situation, Task, Action, Result) from 1-10
- Provides ultra-concise tips (3-5 words) for the weakest component
Live Alerts System:
- Queue-based alert system that surfaces critical feedback (e.g., "TOO MANY FILLER WORDS", "TOO SLOW")
- Alerts clear automatically as metrics improve
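The alert lifecycle above can be sketched as a small state machine: each alert has a condition over the current metrics, gets raised when the condition holds, and clears itself once the metric recovers. The thresholds below are illustrative guesses, not Tremolo's tuned values:

```python
class AlertQueue:
    """Minimal sketch of a metric-driven alert queue.
    Threshold values here are illustrative assumptions."""
    THRESHOLDS = {
        "TOO MANY FILLER WORDS": lambda m: m["filler_rate"] > 0.15,
        "TOO SLOW": lambda m: m["wpm"] < 100,
        "TOO FAST": lambda m: m["wpm"] > 180,
    }

    def __init__(self):
        self.active = []

    def update(self, metrics):
        """Raise alerts whose condition holds; clear ones that recovered."""
        for name, cond in self.THRESHOLDS.items():
            if cond(metrics) and name not in self.active:
                self.active.append(name)
            elif not cond(metrics) and name in self.active:
                self.active.remove(name)
        return list(self.active)

q = AlertQueue()
print(q.update({"filler_rate": 0.2, "wpm": 90}))    # both alerts raised
print(q.update({"filler_rate": 0.05, "wpm": 130}))  # both cleared
```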
Challenges We Faced
Challenge 1: Key Metrics Took Many Iterations
Our biggest technical challenge was defining and calculating the key metrics. We went through multiple iterations before finding the right approach. Initially, we used simple thresholds like "filler words > 5 in the last 10 words," but this proved too noisy. We solved this by implementing sliding window analysis with normalized scoring (0-100 scale) and context-aware thresholds that adapt to speaking style.
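The sliding-window idea with normalized 0-100 scoring can be sketched as follows. The filler list, window size, and `ceiling` calibration (the filler fraction treated as maximally noisy) are all illustrative assumptions, not the values Tremolo actually ships:

```python
import re

# Illustrative single-word fillers; the real list is larger and
# also handles multi-word phrases like "you know".
FILLER = re.compile(r"um+|uh+|like|basically|actually", re.IGNORECASE)

def filler_score(words, window=30, ceiling=0.3):
    """0-100 score over the last `window` words.
    `ceiling` is the filler fraction mapped to 100 (assumed calibration)."""
    recent = words[-window:]
    if not recent:
        return 0
    rate = sum(1 for w in recent if FILLER.fullmatch(w)) / len(recent)
    return min(100, round(100 * rate / ceiling))

sample = "so um I was like working on uh the deploy and um it failed".split()
print(filler_score(sample))  # 4 fillers in 14 words -> 95
```

Normalizing to a 0-100 scale makes it easy to apply a single alert threshold regardless of how much transcript has accumulated.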
For vocal energy, we initially just tracked RMS volume, but this didn't capture engagement—a monotone voice at high volume isn't actually "energetic." We solved this by combining pitch variance and volume dynamics into a composite score, calibrated against a personal baseline established during the first 30 seconds of speech.
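A minimal sketch of such a composite, assuming pitch and RMS frames have already been extracted (e.g. by librosa in the Python backend). The 60/40 weighting and the baseline fields are guesses at a reasonable design, not Tremolo's actual parameters:

```python
import numpy as np

def vocal_energy(pitches, rms, baseline):
    """Composite energy score: pitch variability plus volume dynamics,
    each normalized against a per-speaker baseline from the calibration
    phase. Weights and normalization are illustrative assumptions."""
    pitch_var = np.std(pitches) / max(baseline["pitch_std"], 1e-6)
    vol_dyn = (np.max(rms) - np.min(rms)) / max(baseline["rms_range"], 1e-6)
    # 60/40 split between pitch movement and loudness dynamics (assumed).
    score = 100 * (0.6 * pitch_var + 0.4 * vol_dyn) / 2
    return float(np.clip(score, 0, 100))

baseline = {"pitch_std": 20.0, "rms_range": 0.10}  # from first ~30 s of speech
pitches = np.array([180, 220, 160, 240, 200], dtype=float)  # Hz estimates
rms = np.array([0.04, 0.12, 0.06, 0.15], dtype=float)
print(round(vocal_energy(pitches, rms, baseline), 1))  # 64.4
```

Dividing by the speaker's own baseline is what keeps a naturally quiet or low-pitched voice from being unfairly scored as "low energy."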
Challenge 2: Story Detection Issues
Our biggest technical setback was STAR method story detection. Our initial approach used pure regex-based detection, looking for keywords like "I was", "the challenge was", and "I implemented." This caused too many false positives (normal conversation triggered story mode) and false negatives (well-told STAR stories were missed because they didn't use our keywords). We had to redesign the logic around a two-stage approach. Stage 1 is a fast, client-side regex pre-filter with expanded pattern matching that flags potential stories. Stage 2 is LLM analysis: only when the pre-filter detects story indicators do we call OpenRouter/Gemini, which scores each component (1-10) against specific criteria and generates concise improvement tips (3-5 words max). This hybrid approach gave us the best of both worlds: instant detection without API delays, and intelligent analysis when needed.
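The Stage 1 pre-filter might look like the sketch below. The cue patterns and the two-cue threshold are illustrative stand-ins for the expanded set the app actually uses:

```python
import re

# Illustrative story-indicator cues; the real expanded set is larger.
STORY_CUES = re.compile(
    r"\b(i was|the challenge was|i implemented|my task was|"
    r"we faced|as a result|i decided to|the outcome was)\b",
    re.IGNORECASE,
)

def maybe_story(transcript, min_cues=2):
    """Stage 1: cheap client-side pre-filter. Only when it fires does the
    app pay for a Stage 2 LLM call to score the S/T/A/R components."""
    return len(STORY_CUES.findall(transcript)) >= min_cues

chat = "yeah like the weather was bad so i was late"
story = ("I was leading a data migration and the challenge was downtime, "
         "so I implemented a dual-write layer; as a result we cut outages.")
print(maybe_story(chat), maybe_story(story))  # False True
```

Requiring multiple independent cues (rather than any single keyword) is one way to cut the false positives that plagued the pure-regex version.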
The Result
Tremolo is now a fully functional, deployed desktop app that serves as a real-time interview coach, helping users improve their performance during interviews. The system tracks multiple metrics simultaneously, provides actionable feedback, and automatically analyzes behavioral interview responses.
Built With
- deepgram
- electron
- gemini
- javascript
- openrouter
- overshoot
- python
- react
- typescript
- vite