Inspiration

We realized that hiring for customer-facing roles is fundamentally broken. A candidate can have a perfect resume and charm a hiring manager in a calm interview, but that doesn't prove they can handle the actual job: an angry customer screaming at them at 8:00 AM because their refund is late.

We call this the "Resume Gap." Companies are hiring blindly, hoping their candidates have emotional resilience. We wanted to build a tool that doesn't just ask about soft skills, but actually stress-tests them. We were inspired by the idea of a "Flight Simulator" for human conversation: a safe place to crash, burn, and learn before talking to a real human.

What it does

EscalateConvo is an AI-powered roleplay platform that simulates high-pressure customer service scenarios.

The Setup: A company creates an "Interview Session" (e.g., "The Lost Refund" scenario) and sends a link to a candidate.

The Simulation: The candidate puts on a headset and enters a voice call with our AI Agent. The AI is not a polite chatbot; it is prompted to be frustrated, sarcastic, or angry. It acts out the role of a difficult customer using hyper-realistic voice emotion.

The Interaction: The candidate must de-escalate the situation. If they interrupt or sound dismissive, the AI reacts dynamically, getting angrier or demanding a manager.

The Analysis: Once the call ends, Google Gemini analyzes the entire audio transcript. It generates a "Composure Report" that highlights:

De-escalation Score: Did they calm the AI down?

Empathy Gaps: Specific moments where the candidate failed to listen.

Actionable Feedback: "You interrupted the customer 3 times. Try active listening."
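To make the report concrete, here is a minimal sketch of how a Composure Report could be assembled from per-turn annotations. All names and the scoring formula are illustrative, not our production schema; it assumes an upstream step (e.g., the Gemini analysis) has already labeled each turn with the customer's anger level and an empathy flag.

```javascript
// Hypothetical Composure Report assembly. Assumes each turn has been
// annotated upstream with: anger (0-10, how angry the AI customer is),
// empathetic (did the candidate acknowledge the customer?), and
// interrupted (did the candidate talk over the customer?).
function buildComposureReport(turns) {
  const first = turns[0].anger;
  const last = turns[turns.length - 1].anger;
  // De-escalation score: how far the candidate brought the anger down,
  // scaled to 0-100. A guard avoids dividing by zero on a calm start.
  const drop = first > 0 ? Math.max(0, (first - last) / first) : 0;
  const deEscalationScore = Math.round(drop * 100);
  // Empathy gaps: specific moments where the candidate failed to listen.
  const empathyGaps = turns
    .filter((t) => t.empathetic === false)
    .map((t) => ({ timestamp: t.timestamp, text: t.text }));
  const interruptions = turns.filter((t) => t.interrupted).length;
  return {
    deEscalationScore,
    empathyGaps,
    feedback: `You interrupted the customer ${interruptions} time(s). Try active listening.`,
  };
}
```

The point of the structure is that every score ties back to a timestamped moment, so feedback is never just "Good job."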

How we built it

We built EscalateConvo as a real-time, event-driven application to ensure the conversation feels natural and fluid.

Frontend: We used React to build the interactive interview interface and the company dashboard.

Backend: We built a Node.js & Express server to handle the complex audio streaming pipeline.

The Brain (AI): We utilized the Gemini API for a post-processing step that grades the candidate's performance against industry standards.

The Voice: We integrated ElevenLabs Turbo v2.5 API via streaming. This was crucial for the "Wow" factor: it allows the AI to express genuine anger, sighs, and pitch changes that standard TTS engines cannot match.

Database & Auth: We used Firebase to handle secure user authentication (for companies) and to store the generated interview reports and logs.

Challenges we ran into

Overcoming the "Politeness" Bias

We struggled to bypass the inherent "helpful assistant" guardrails of the Gemini API to create a truly frustrated persona. Refining the system prompts required deep iteration to ensure the AI stayed in character as a difficult customer without defaulting to a polite chatbot. This taught us that prompting for emotional volatility is a distinct art compared to prompting for logical reasoning or code generation.
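A sketch of the kind of persona prompt that kept the model in character. This is an illustrative builder, not our production prompt; the lessons baked in are to state the role explicitly, forbid assistant behavior, and give concrete escalation rules instead of a vague "be angry" instruction.

```javascript
// Illustrative persona prompt builder (not the actual production prompt).
// scenario fields (customerName, title, mood) are hypothetical names.
function buildPersonaPrompt(scenario) {
  return [
    `You are ${scenario.customerName}, a customer in the scenario "${scenario.title}".`,
    `You are ${scenario.mood}. You are NOT an assistant: never apologize on the company's behalf, never offer help, and never break character.`,
    'If the agent interrupts you or sounds dismissive, get angrier and threaten to ask for a manager.',
    'If the agent acknowledges your frustration and offers a concrete fix, gradually calm down.',
    'Keep replies short and spoken-word natural: sighs, sarcasm, and raised-voice phrasing are allowed.',
  ].join('\n');
}
```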

Precision in Performance Evaluation

Quantifying subjective qualities like "empathy" and "composure" into an objective report posed a significant data processing hurdle. We had to build a complex rubric for Gemini to ensure it didn't just analyze the transcript for keywords, but understood the tone and timing of the candidate's responses. Synchronizing the audio analysis with specific timestamps was essential to provide the actionable, minute-by-minute feedback our "Composure Report" promised.
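One timing signal the timestamp synchronization makes possible is interruption detection. A hedged sketch, assuming per-turn start/end times in seconds derived from the transcript (field names are illustrative): a candidate turn counts as an interruption if it starts before the customer's previous turn has finished.

```javascript
// Hypothetical timestamp-based interruption detection. A candidate turn
// "interrupts" if it starts before the preceding customer turn ends.
// start/end are seconds from call start; the real pipeline would derive
// these from word-level timestamps in the audio transcript.
function findInterruptions(turns) {
  const hits = [];
  for (let i = 1; i < turns.length; i++) {
    const prev = turns[i - 1];
    const cur = turns[i];
    if (cur.speaker === 'candidate' && prev.speaker === 'customer' && cur.start < prev.end) {
      hits.push({ at: cur.start, text: cur.text });
    }
  }
  return hits;
}
```

Signals like this can then be fed to the grading model alongside the transcript, so "you interrupted the customer" is backed by an exact moment rather than a guess.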

Accomplishments that we're proud of

Emotional Realism: The first time we heard the AI actually "scream" at us through ElevenLabs, we knew we had something special. It genuinely raises your heart rate.

The "Coaching" Insight: The reports generated by Gemini are genuinely useful. They don't just say "Good job"; they point out specific sentences where the candidate could have been more empathetic.

What we learned

Voice is the new UI: We learned that in a voice-first app, latency is the user experience. Even a 500ms delay breaks the immersion.

Prompt Engineering is distinct from Logic: We learned that instructing an AI to behave a certain way (e.g., "be passive-aggressive") requires a completely different approach than asking it to solve a coding problem.

What's next for EscalateConvo

Custom Scenario Builder: allowing companies to upload their own "Previous Worst Calls" scripts to train against specific real-world problems.

Multi-Speaker Support: Simulating a noisy background or a "conference call" with multiple angry stakeholders.

ATS Integration: Plugging directly into tools like Greenhouse so a failed simulation automatically flags the candidate in the hiring pipeline.

We plan to implement a feature that retrieves the full, time-stamped conversation history directly from ElevenLabs. This will allow hiring managers to replay specific moments of tension and listen to the candidate's actual tone, not just the text transcript.
