Inspiration
The best crime films, heist series, and noir thrillers have always been more than entertainment. They are masterclasses in human behaviour. Watch a great interrogation scene, and you are studying how people construct narratives under pressure, how they read the room, and how they adapt in real time.
Watch a con unfold, and you are learning persuasion, perception, and the architecture of trust. Witness was born from that realisation: a game with human experience at its centre, where you are not watching the story unfold. You are the detective. The AI is your witness, sitting across the table, in the very room you are in. Your job is to question it, read it, and catch it out. Hello, detective.
We built a tool that makes learning immersive, interactive, and personal by placing you inside the scene in your own physical space, with an AI that responds to everything you say and see. To some, Witness is pure entertainment. A noir adventure where your real room becomes the crime scene and an AI detective puts you under pressure. To organisations, it is a training environment in disguise: scenario-based, live, and adaptive, for investigators, legal teams, compliance officers, HR practitioners, or anyone whose work requires them to ask hard questions or hold their ground under scrutiny. To students of criminology, psychology, or communication, it is a laboratory that does not feel like one.
But in every case, you are playing or being played. And with Witness, you are also learning without realising it.
The Problem
Immersive games are not truly immersive. Even the most sophisticated titles ask players to step into a constructed world on a screen. The physical environment you are actually in, the room around you, and the objects on your desk are invisible to the game.
At the same time, AI-powered NPCs in games today are largely scripted. They respond from dialogue trees. They cannot see what you see, cannot respond to what you describe, and cannot hold a genuinely dynamic conversation that changes based on information only you possess.
The result is a gap: players are present in the game but absent from it. Their real context, their space, their body language cues, and their hesitation are entirely lost.
What it does
Witness is a multimodal noir investigative game in which the player is a detective questioning an AI witness about a crime that occurred in the player's actual room. Using the Gemini Live API, the AI witness conducts a real-time voice conversation, responding to your questions, describing what it saw, and managing what it reveals.
The core loop works as follows:
The game generates a crime scenario set in the player's physical location.
The player takes the role of the detective, leading the interrogation.
A Gemini-powered witness character responds to questions in real time via live voice.
The detective can request visual confirmation, asking the witness to reference specific objects visible through the device camera.
Gemini's vision capability analyses the room, and the witness responds based on what it actually sees.
The witness adapts dynamically, evading, contradicting itself, or revealing details under pressure.
At the end, the detective delivers a verdict: credible witness, person of interest, or primary suspect.
How we built it
Gemini Live API
The core of the experience. Gemini Live API powers the real-time voice interaction between the player detective and the AI witness character. It handles streaming audio and turn-taking, and maintains conversational context across the full interrogation session. Without Gemini Live, the witness would be a chatbot. With it, the witness feels like a real person under pressure trying to manage what they reveal.
Gemini Multimodal Vision
When the detective asks the witness about something in the room, Gemini analyses the camera feed, and the witness responds based on what it actually observes. This grounds the fiction in the player's real space. A witness who claims the desk was empty when the camera clearly shows a coffee mug, a notebook, and a phone will contradict itself under further questioning.
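The desk example can be illustrated with a minimal sketch. It assumes the vision step yields a set of object labels and that the witness's statements about the room have already been extracted into a simple structure. The names `Claim` and `find_contradictions` are hypothetical and are not taken from the Witness codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A witness statement about whether an object was present (hypothetical shape)."""
    object_label: str
    claimed_present: bool


def find_contradictions(claims: list[Claim], observed: set[str]) -> list[Claim]:
    """Return the claims that disagree with what the camera shows."""
    normalised = {label.lower() for label in observed}
    return [
        claim for claim in claims
        if claim.claimed_present != (claim.object_label.lower() in normalised)
    ]


# Example from the text: the witness says the desk was empty,
# but the camera shows a coffee mug, a notebook, and a phone.
observed_objects = {"coffee mug", "notebook", "phone"}
witness_claims = [
    Claim("coffee mug", claimed_present=False),
    Claim("notebook", claimed_present=False),
    Claim("lamp", claimed_present=False),  # consistent: no lamp was observed
]
contradictions = find_contradictions(witness_claims, observed_objects)
```

In this sketch the mug and notebook claims are flagged while the lamp claim is not, which is the kind of signal further questioning could then press on.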
Gemini AI Studio
Used during development to prototype and iterate on the detective's persona, interrogation logic, and prompt architecture. Five core AI Studio prompts were developed and refined to shape how the detective opens, probes, escalates, and closes the interrogation.
Model Configuration via Environment Variable
The Gemini model name is managed through a MODEL_NAME environment variable, keeping the codebase flexible and easy to update across model versions without hardcoding.
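Reading the MODEL_NAME variable might look like the sketch below. Only the variable name comes from this write-up. The helper `resolve_model_name` and the placeholder fallback value are assumptions for illustration, not the project's actual code.

```python
import os

# Hypothetical placeholder used only when MODEL_NAME is unset;
# replace it with a real model identifier before use.
DEFAULT_MODEL_NAME = "example-model-name"


def resolve_model_name(env=None) -> str:
    """Read the model name from MODEL_NAME, falling back to the placeholder."""
    env = os.environ if env is None else env
    value = env.get("MODEL_NAME", "").strip()
    return value or DEFAULT_MODEL_NAME


model_name = resolve_model_name()
```

Passing the environment in as an argument keeps the lookup easy to test without touching the real process environment.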
Challenges we ran into
Maintaining character consistency across a fully live, unscripted conversation was harder than expected. The detective needed to feel grounded and persistent, not generic. This required significant prompt engineering to establish a stable persona that would not drift mid-session.
Handling turn-taking naturally in voice interactions is a known hard problem. We had to carefully manage how the detective yields, interrupts, and waits without the experience feeling robotic or laggy.
Grounding the fiction in a real space without being intrusive required careful UX design. The camera interaction needed to feel like a natural escalation of the interrogation, not a jarring technical prompt.
Designing a noir atmosphere in a mobile-first interface meant every screen had to carry visual weight. The typography, colour choices, and motion work had to do heavy lifting because there is no ambient sound or environment to rely on.
Keeping the AI's verdict meaningful required the detective to track and weigh contradictions across the session, which meant the system prompt needed to encode a clear internal logic for credibility assessment.
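The project encodes its credibility logic in the system prompt, so the sketch below is not its implementation. It only illustrates, in code, what a clear rule for weighing contradictions could look like. The three verdict names come from this write-up; the weights, the thresholds (1.0 and 3.0), and the function name `assess_credibility` are assumptions.

```python
from enum import Enum


class Verdict(Enum):
    CREDIBLE_WITNESS = "credible witness"
    PERSON_OF_INTEREST = "person of interest"
    PRIMARY_SUSPECT = "primary suspect"


def assess_credibility(contradiction_weights: list[float]) -> Verdict:
    """Map the accumulated weight of contradictions to a verdict.

    Each entry is the assumed severity of one contradiction logged
    during the session. The thresholds are illustrative only.
    """
    total = sum(contradiction_weights)
    if total < 1.0:
        return Verdict.CREDIBLE_WITNESS
    if total < 3.0:
        return Verdict.PERSON_OF_INTEREST
    return Verdict.PRIMARY_SUSPECT


# A session with one minor and one moderate contradiction.
verdict = assess_credibility([0.5, 1.0])
```

Summing weights rather than counting contradictions lets a single serious inconsistency outweigh several trivial ones.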
Accomplishments and Learning
What we are proud of
We built a genuinely novel game format. Witness is not a chatbot with a skin. It is a coherent experience with a beginning, middle, and end, driven entirely by a live AI.
The multimodal integration works. When the detective asks to see something and then responds specifically to what it observes, the moment is genuinely surprising and immersive.
The noir aesthetic holds together. The 7-screen UI, typography system, and interaction design create a consistent world.
We designed a complete interrogation arc, with the detective capable of opening casually, escalating with pressure, and closing with a verdict, all within a live, unscripted session.
What we learned
Persona stability in live AI conversations requires as much design work as the feature logic itself. Character is a product decision, not just a prompt.
Multimodal interactions create moments of genuine surprise that purely text or voice interactions cannot. The camera reveal is a mechanic that deserves its own design language.
Real-time voice UX has very different constraints from text-based product design. Latency, interruption handling, and conversational rhythm all need explicit design decisions.
What's next for WITNESS
Witness has a clear path forward if developed beyond the hackathon:
Procedurally generated crime scenarios based on a player's specific physical environment, using vision analysis to customise the crime to what Gemini actually sees in the room.
Multiplayer mode where one player is the detective and another is the witness, with Gemini mediating and scoring the interrogation.
A case file system that logs each interrogation session, tracks verdicts, and builds a player profile across multiple sessions.
Difficulty levels that calibrate how aggressive, perceptive, and consistent the detective character is.
Localisation to support interrogations in multiple languages, leveraging Gemini's multilingual capability.
Integration with wearable sensors or environmental audio to give the detective additional signals beyond the camera feed.
We believe the core insight, that your real space is the game world, has significant untapped potential well beyond the noir crime genre.
