Inspiration

I grew up loving mystery games like Phoenix Wright and Sherlock Holmes, but the genre has one tragic flaw: once a case is solved, it's over forever. I wanted a game that never runs out of cases. With the reasoning capabilities of Gemini 3.1 Pro and the low-latency streaming of the Gemini Live API, I realized I could finally build an AI "Oracle" that creates coherent, highly detailed, procedurally generated murder mysteries on demand, and lets players talk directly with the suspects to crack the case.

What it does

Infinite Mysteries: Case Vault is an episodic, voice-driven detective simulator. Starting from a simple creative seed (e.g., "Cyberpunk Tokyo" or "Victorian London"), the application:

  • Procedurally Generates a Coherent Story: Creates a victim, murder method, a hidden sequence of true events, and 4 unique suspects with complex alibis and secrets (where only one is guilty).
  • Supports Real-Time Live Interrogation: You speak into your microphone to interrogate the suspects. Using the Gemini Live API and voice activity detection, they listen to your questions and answer aloud with minimal delay, staying in character throughout.
  • Visualizes the Investigation: Every location and character portrait is generated on demand with Nano Banana 2, requested directly from the client.
  • Includes Real Stakes: This is not an open-ended chatbot. Wrong accusations lead to immediate case failure, while smart questioning forces suspects to reveal hidden Evidence logged in your case file.
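The "four suspects, exactly one guilty" rule and the win/lose outcome can be sketched as a small data check. This is a minimal illustration, not the project's actual schema: the field names (`suspects`, `guilty`, `alibi`), the helpers `validateCase` and `resolveAccusation`, and the sample case itself are all hypothetical.

```javascript
// Invented sample data. Field names are assumptions, not the project's schema.
const exampleCase = {
  victim: "Dr. Aiko Mori",
  method: "poisoned tea",
  suspects: [
    { id: "s1", name: "Kenji", alibi: "At the arcade", guilty: false },
    { id: "s2", name: "Hana", alibi: "Working late", guilty: true },
    { id: "s3", name: "Ryo", alibi: "On a train", guilty: false },
    { id: "s4", name: "Mei", alibi: "Asleep at home", guilty: false },
  ],
};

// Accept a generated case only if it has four distinct suspects and one culprit.
function validateCase(caseFile) {
  const suspects = Array.isArray(caseFile?.suspects) ? caseFile.suspects : [];
  const guiltyCount = suspects.filter((s) => s.guilty === true).length;
  const uniqueIds = new Set(suspects.map((s) => s.id)).size;
  return suspects.length === 4 && uniqueIds === 4 && guiltyCount === 1;
}

// One accusation ends the case: "solved" if correct, "failed" otherwise.
function resolveAccusation(caseFile, suspectId) {
  const accused = caseFile.suspects.find((s) => s.id === suspectId);
  if (!accused) throw new Error(`Unknown suspect: ${suspectId}`);
  return accused.guilty ? "solved" : "failed";
}
```

Because the outcome is read from stored data rather than asked of a model, the same accusation always produces the same result.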

How we built it

The application is a Vite + Vanilla JS single-page application that runs directly in the browser, supported by a lightweight Node.js/Express backend that is ready for Google Cloud Run deployment.

The entire engine is powered by the Google Generative AI SDK, utilizing a specialized multi-agent architecture:

  • Story Engine: gemini-3.1-pro-preview handles the complex logic of generating the initial "Truth File" (the JSON ground truth for the murder).
  • Live Interrogation Agent: gemini-2.5-flash-native-audio-latest manages the bidirectional WebSocket connection, capturing 16 kHz PCM audio from the user's mic and streaming back 24 kHz character dialogue.
  • Visual Engine: gemini-3.1-flash-image-preview (Nano Banana 2) handles all asynchronous image generation for the scenes and suspect portraits.
  • Identity Provider: Firebase Authentication manages user sessions and restricts the experience to signed-in users.
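The PCM handling mentioned for the Live Interrogation Agent comes down to converting between the Web Audio API's Float32 samples and 16-bit signed integers. The sketch below shows one common way to do that. The function names are hypothetical, and it assumes the microphone audio has already been captured at (or resampled to) 16 kHz and that samples are sent in the platform's native little-endian byte order.

```javascript
// Convert Web Audio samples (Float32, nominally -1..1) to 16-bit signed PCM.
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    pcm[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return pcm;
}

// Convert 16-bit PCM received from the model back to Float32 for playback.
function pcm16ToFloat(int16Samples) {
  const out = new Float32Array(int16Samples.length);
  for (let i = 0; i < int16Samples.length; i++) {
    out[i] = int16Samples[i] / 0x8000;
  }
  return out;
}
```

In a browser, the output of `pcm16ToFloat` could be copied into an `AudioBuffer` created with a 24000 Hz sample rate before playback.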

Challenges we ran into

  • Hallucination Control: Infinite narrative generation often spirals out of control. We solved this by generating a rigid JSON "Truth File" first. This file acts as the undeniable ground truth for all subsequent AI agents. Guilt is deterministic, not invented on the fly.
  • Latency & Audio Artifacts: Implementing bidirectional audio streams natively in the browser required careful Web Audio API state management. We had to build a custom sequential PCM queue to prevent NaN scheduler crashes and overlapping audio frames when the AI generated speech faster than the browser could play it.
  • "Ghost in the Machine" Leaks: The live audio models sometimes tried to narrate their internal thoughts (e.g., "Crafting a nervous response..."). We had to heavily harden the system prompts and build a multi-pass client-side regex filter to ensure players only ever heard the character's direct spoken dialogue.
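The sequential PCM queue described above can be reduced to one piece of bookkeeping: remember when the previous chunk ends, and never start the next one earlier. The class below is a minimal sketch of that idea rather than the project's code. `PcmPlaybackQueue` and its methods are hypothetical names, and the 24 kHz default is taken from the output rate mentioned earlier.

```javascript
// Hypothetical scheduler: tracks when the next chunk may start so chunks never overlap.
class PcmPlaybackQueue {
  constructor(sampleRate = 24000) {
    this.sampleRate = sampleRate;
    this.nextStartTime = 0; // seconds on the audio clock
  }

  // Returns the start time for a chunk, or null if the chunk must be skipped.
  schedule(sampleCount, currentTime) {
    const duration = sampleCount / this.sampleRate;
    // Keep NaN, Infinity, and empty chunks away from the audio scheduler.
    if (!Number.isFinite(duration) || duration <= 0 || !Number.isFinite(currentTime)) {
      return null;
    }
    // Never schedule in the past; otherwise start right after the previous chunk.
    const startTime = Math.max(this.nextStartTime, currentTime);
    this.nextStartTime = startTime + duration;
    return startTime;
  }

  // Call on interruption so leftover timing does not delay the next reply.
  reset() {
    this.nextStartTime = 0;
  }
}
```

In the browser, `currentTime` would come from `audioContext.currentTime`, and a non-null result would be passed to `start()` on an `AudioBufferSourceNode`. When chunks arrive faster than they play, each one is simply scheduled further ahead instead of overlapping the previous one.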

Accomplishments that we're proud of

  • Real-Time Voice Interrogation: Running the BidiGenerateContent WebSocket protocol directly in the browser, with no third-party audio translation layers.
  • A True Game State: Creating a mystery with a verifiable win/lose state and deductive logic, elevating the project from a simple "chatbot" to a real game.
  • Automated Evidence Extraction: Building a custom parsing system where the Live Audio agent can seamlessly embed JSON tags into its stream when pressured, automatically unlocking Evidence items in the player's UI without breaking the voice immersion.
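The evidence extraction idea can be sketched as a function that pulls tagged JSON out of the agent's text and returns the remaining dialogue separately. The tag format here is an assumption made for illustration (`<evidence>{...}</evidence>`), as are the `id` and `title` fields and the name `extractEvidence`. The write-up does not specify the real format.

```javascript
// Assumed tag format: <evidence>{"id":"...","title":"..."}</evidence>
const EVIDENCE_TAG = /<evidence>([\s\S]*?)<\/evidence>/g;

// Splits agent text into player-facing dialogue and unlocked evidence items.
function extractEvidence(text) {
  const evidence = [];
  const dialogue = text.replace(EVIDENCE_TAG, (_match, json) => {
    try {
      const item = JSON.parse(json);
      if (item && typeof item.id === "string") evidence.push(item);
    } catch {
      // Malformed JSON: drop the tag rather than show raw markup to the player.
    }
    return "";
  });
  // Collapse the whitespace left behind where tags were removed.
  return { dialogue: dialogue.replace(/\s{2,}/g, " ").trim(), evidence };
}
```

With streamed output, a tag may be split across chunks, so a caller would likely buffer text until a closing tag arrives before running this function.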

What we learned

Building with the Gemini Live API completely changed how we think about human-computer interaction. We learned that latency is the enemy of immersion. Stripping down the prompts, removing complex reasoning instructions from the real-time agent, and relying on the pre-computed "Truth File" instead was the key to achieving sub-second voice response times.
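One way to picture the lean-prompt approach is a helper that builds each suspect's prompt from facts that were computed ahead of time, leaving no reasoning for the real-time agent. This is a hypothetical sketch: `buildSuspectPrompt`, the Truth File field names (`victim`, `suspects`, `alibi`, `secret`, `guilty`), and the prompt wording are all assumptions, not the project's actual prompts.

```javascript
// Hypothetical helper: builds a short persona prompt from pre-computed facts only.
function buildSuspectPrompt(truthFile, suspectId) {
  const suspect = truthFile.suspects.find((s) => s.id === suspectId);
  if (!suspect) throw new Error(`Unknown suspect: ${suspectId}`);
  return [
    `You are ${suspect.name}, a suspect in the death of ${truthFile.victim}.`,
    `Your alibi: ${suspect.alibi}`,
    `Your secret: ${suspect.secret}`,
    // Guilt is read from the Truth File, never decided during the conversation.
    suspect.guilty
      ? "You are guilty. Deflect, but never contradict the facts above."
      : "You are innocent. Never confess.",
    "Reply only with spoken dialogue, in character, in one to three sentences.",
  ].join("\n");
}
```

Each prompt carries only the one suspect's own facts, which keeps it short and avoids leaking other characters' secrets into the conversation.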

What's next for Infinite Mysteries

  1. Multiplayer Investigation: We are currently planning Supabase Realtime integration so 2-4 friends can join a room via a 6-digit code and play "Good Cop / Bad Cop" on the same suspect simultaneously.
  2. Persistent Detective Careers: Allowing authenticated users to save their case histories and build detective "reputations" based on their accuracy and speed across dozens of generated cases.
  3. Motion Evidence: Moving beyond static Nano Banana portraits and integrating Veo to generate short motion CCTV footage as collectible clues.

Built With

  • firebase
  • google-gemini-3.1-pro
  • google-gemini-flash-native-audio
  • nano-banana-2
  • vanilla-js
  • vite
  • web-audio-api