Inspiration

What if games could respond to the emotional tone of your voice? We built a hostage negotiation game where you try to talk down an armed robber using only your voice. JAX, the robber, analyzes how you speak: yelling, whispering, stammering, hesitation. Every vocal cue changes the outcome. We wanted to prove that tone of voice could be the core mechanic of a game.

What it does

Under Pressure is a voice-controlled negotiation game. You're a negotiator on the phone with JAX, a bank robber holding hostages. The game analyzes your voice in real time: yelling makes him aggressive, whispering triggers paranoia, stammering erodes your credibility, and hesitation increases his anxiety. JAX is powered by Gemini 3 and remembers everything you say. Succeed and he surrenders. Fail and people die.
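One way to picture the cue-to-state mapping described above. This is an illustrative sketch only: the field names (`aggression`, `paranoia`, `trust`, `anxiety`) and the numeric deltas are assumptions, not the game's actual state model.

```typescript
// Hypothetical sketch of how each detected vocal cue could nudge JAX's
// internal state. Values are clamped to a 0-100 range.
type Cue = "yell" | "whisper" | "stammer" | "hesitation";

interface JaxState {
  aggression: number;
  paranoia: number;
  trust: number;
  anxiety: number;
}

// Illustrative deltas: yelling raises aggression, whispering raises
// paranoia, stammering erodes trust, hesitation raises anxiety.
const CUE_EFFECTS: Record<Cue, Partial<JaxState>> = {
  yell: { aggression: 15 },
  whisper: { paranoia: 10 },
  stammer: { trust: -10 },
  hesitation: { anxiety: 8 },
};

function applyCue(state: JaxState, cue: Cue): JaxState {
  const next = { ...state };
  for (const [key, delta] of Object.entries(CUE_EFFECTS[cue])) {
    const k = key as keyof JaxState;
    next[k] = Math.max(0, Math.min(100, next[k] + (delta as number)));
  }
  return next;
}
```

With a model like this, the accumulated state can be injected into the prompt on every turn so the AI's responses track the player's vocal history, not just the latest utterance.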

How we built it

We used Next.js for the frontend, Gemini 3 Flash as our AI model, the Web Audio API for audio processing, Phaser 3 for graphics, and Zustand for state management. We built a real-time audio engine that captures your voice and extracts features like RMS energy and zero-crossing rate. A biometric analyzer detects emotional signals. The API sends audio and biometrics to Gemini and streams back JAX's response. Everything calibrates to your voice at startup.
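The two named features are standard signal-processing measures, so they can be sketched as pure functions. A minimal sketch, assuming each frame is a `Float32Array` of PCM samples in [-1, 1] (e.g. from `AnalyserNode.getFloatTimeDomainData()`); function names are illustrative, not the project's actual code.

```typescript
// Root-mean-square energy: a rough proxy for loudness.
function rmsEnergy(frame: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < frame.length; i++) {
    sum += frame[i] * frame[i];
  }
  return Math.sqrt(sum / frame.length);
}

// Zero-crossing rate: fraction of adjacent sample pairs that change sign.
// Noisy, unvoiced sounds (hissing, whispering) tend to score higher than
// voiced speech.
function zeroCrossingRate(frame: Float32Array): number {
  let crossings = 0;
  for (let i = 1; i < frame.length; i++) {
    if (frame[i - 1] >= 0 !== frame[i] >= 0) crossings++;
  }
  return crossings / (frame.length - 1);
}

// Example input: a pure 440 Hz tone at 44.1 kHz, amplitude 0.5.
const sampleRate = 44100;
const frame = new Float32Array(1024);
for (let i = 0; i < frame.length; i++) {
  frame[i] = 0.5 * Math.sin((2 * Math.PI * 440 * i) / sampleRate);
}
console.log(rmsEnergy(frame));        // ≈ 0.5 / √2 for a sine wave
console.log(zeroCrossingRate(frame)); // ≈ 2 crossings per cycle
```

Running these per frame yields a feature stream that the biometric analyzer can threshold against the player's calibrated baseline.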

Challenges we ran into

Yelling detection doesn't work on raw volume alone, because microphone sensitivity varies; we added a calibration phase to establish per-player baselines. Streaming from Gemini sometimes breaks character, so we added conversation history to every call. Audio processing, biometrics, AI, and rendering all need to happen simultaneously, which required careful optimization. Getting Gemini to understand biometrics as audio metadata took prompt engineering. Building the UI to feel like a real command center was iterative.
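The calibration idea can be sketched as baseline statistics plus relative thresholds. Everything here is a hypothetical illustration: the z-score cutoffs and names (`calibrate`, `classifyLoudness`) are assumptions, not the project's actual implementation.

```typescript
// Baseline captured during the startup calibration window.
interface Baseline {
  meanRms: number;
  stdRms: number;
}

// Compute mean and standard deviation of RMS values sampled while the
// player speaks at their normal level.
function calibrate(calibrationRms: number[]): Baseline {
  const n = calibrationRms.length;
  const mean = calibrationRms.reduce((a, b) => a + b, 0) / n;
  const variance =
    calibrationRms.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
  return { meanRms: mean, stdRms: Math.sqrt(variance) };
}

// Classify loudness as a z-score against the player's OWN baseline, so a
// quiet mic and a hot mic produce comparable labels.
function classifyLoudness(
  rms: number,
  b: Baseline
): "whisper" | "normal" | "yell" {
  const z = (rms - b.meanRms) / (b.stdRms || 1e-6);
  if (z > 3) return "yell";
  if (z < -1.5) return "whisper";
  return "normal";
}
```

The key design choice is that thresholds are relative: "yell" means several standard deviations above your own normal speaking level, not an absolute volume.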

Accomplishments that we're proud of

We don't know of another game that does real-time voice biometric analysis with AI reaction. JAX feels genuinely real: he stays in character and makes decisions that make sense. The whole thing is polished despite being built quickly. The architecture is modular enough to carry over to future projects.

What we learned

Voice carries emotional data that games completely ignore. Just detecting yelling, whispering, and stammering opens up new design possibilities. Sending raw audio to Gemini instead of transcribing first makes NPCs feel far more real. Voice biometrics are harder than they seem; calibration is crucial. The best moment was watching our teammates genuinely plead with an AI because they were emotionally invested. Real-time streaming keeps interactions snappy.

What's next

Different scenarios: hostage situations, bomb defusals, angry customer service calls. Showing JAX's emotional state in real time so players see how their tactics affect him. Multiplayer teams where negotiators coordinate while managing their own tension. Testing with Gemini Pro. Mobile support. Smarter voice detection using ML. Procedurally generated suspects so every game is different. Voice-controlled gameplay with multimodal AI is the future.

Built With

Next.js, Gemini 3 Flash, Web Audio API, Phaser 3, Zustand
