WitnessReplay

Phone view
Authentication view from PC
View from PC
Case list view in the Admin Center (click the top right to log in)
Case view in the Admin Center, which automatically groups related reports and generates scene images

Inspiration

Witness interviews are one of the most important parts of an investigation, but they are also one of the easiest places to lose critical information. People remember events in fragments, details arrive out of order, multiple witnesses describe the same incident differently, and investigators spend too much time turning raw testimony into usable reports.

We wanted to build an AI agent that could listen like a calm investigator, guide a witness naturally, and transform messy real-time testimony into something structured, visual, and actionable.

What it does

WitnessReplay is a Gemini Live-powered witness interview and scene reconstruction system.

A witness can speak to Detective Ray, the AI interviewer, interrupt at any point, or switch to typing when needed. The system transcribes the audio, asks follow-up questions, extracts structured details, generates visual scene reconstructions, assigns a report ID, and groups related witness reports into the same case when they describe the same incident.
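Grouping reports into cases can be pictured with a small sketch. It is a simple stand-in heuristic, not the Gemini-based case matching WitnessReplay uses: it assumes each report has already been reduced to a `location` string and an `incident_time`, and it treats two reports as the same incident when the normalized locations match and the times fall within a 30-minute window. The `Report`, `Case`, and `assign_to_case` names, the window, and the `R-0001`/`C-0001` ID formats are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta
from itertools import count

# Hypothetical matching window (the real system delegates matching to Gemini).
MATCH_WINDOW = timedelta(minutes=30)

_report_ids = count(1)
_case_ids = count(1)


@dataclass
class Report:
    location: str
    incident_time: datetime
    summary: str
    # Hypothetical ID format, assigned when the report is created.
    report_id: str = field(default_factory=lambda: f"R-{next(_report_ids):04d}")


@dataclass
class Case:
    case_id: str
    reports: list[Report] = field(default_factory=list)


def _same_incident(a: Report, b: Report) -> bool:
    """Heuristic: same normalized location and incident times close together."""
    same_place = a.location.strip().lower() == b.location.strip().lower()
    close_in_time = abs(a.incident_time - b.incident_time) <= MATCH_WINDOW
    return same_place and close_in_time


def assign_to_case(report: Report, cases: list[Case]) -> Case:
    """Attach the report to the first matching case, or open a new case."""
    for case in cases:
        if any(_same_incident(report, existing) for existing in case.reports):
            case.reports.append(report)
            return case
    case = Case(case_id=f"C-{next(_case_ids):04d}", reports=[report])
    cases.append(case)
    return case
```

A real matcher would weigh far richer signals (descriptions, vehicles, people); the heuristic only shows where a report ID is assigned and how a new report either joins an existing case or opens a new one.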

On the investigator side, WitnessReplay provides an admin portal to review reports, cases, timelines, previews, and scene reconstructions in one place.

How we built it

We built WitnessReplay with a mobile-first frontend in vanilla HTML, CSS, and JavaScript and a FastAPI backend in Python.

Real-time conversations run over WebSockets, and the AI stack is powered by the Google GenAI SDK and Gemini Live API for low-latency voice interaction, interruption handling, and multimodal reasoning.
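Interruption handling (often called barge-in) is commonly implemented by running the agent's spoken reply as a cancellable task and cancelling it as soon as new witness speech arrives. The sketch below shows that pattern with plain `asyncio` only; `TurnManager`, `speak_reply`, and `on_witness_audio` are hypothetical names, the reply "audio" is simulated with text chunks, and nothing here calls the Gemini Live API or a real WebSocket.

```python
from __future__ import annotations

import asyncio


class TurnManager:
    """Tracks the agent's current spoken reply so the witness can interrupt it."""

    def __init__(self) -> None:
        self._reply_task: asyncio.Task | None = None
        self.spoken_chunks: list[str] = []

    async def speak_reply(self, chunks: list[str], delay: float = 0.05) -> None:
        # Hypothetical stand-in for streaming synthesized audio to the client.
        for chunk in chunks:
            await asyncio.sleep(delay)
            self.spoken_chunks.append(chunk)

    def start_reply(self, chunks: list[str]) -> asyncio.Task:
        self._reply_task = asyncio.create_task(self.speak_reply(chunks))
        return self._reply_task

    async def on_witness_audio(self) -> bool:
        """Called when new witness speech arrives; cancels any reply in progress."""
        task = self._reply_task
        if task is None or task.done():
            return False  # nothing to interrupt
        task.cancel()
        try:
            await task  # wait until playback has really stopped
        except asyncio.CancelledError:
            pass
        return True


async def demo() -> tuple[bool, list[str]]:
    turns = TurnManager()
    turns.start_reply(["Thanks.", "Can you", "describe", "the car?"])
    await asyncio.sleep(0.07)  # the witness starts talking mid-reply
    interrupted = await turns.on_witness_audio()
    return interrupted, turns.spoken_chunks
```

Awaiting the cancelled task before returning ensures playback has actually stopped before the new witness turn is processed.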

Gemini is used for:

Real-time witness interviewing
Audio transcription
Structured extraction of scene details
Report summarization
Contradiction detection and case matching
Image generation fallback and scene refinement
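Structured extraction usually means asking the model for JSON in a fixed shape and validating it before it is stored. The sketch below assumes a hypothetical `SceneDetails` schema (the field names are illustrative, not the project's) and takes the model's raw text as input, so it does not depend on any particular SDK call. It strips an optional Markdown code fence, ignores keys outside the schema, and wraps single values into lists for the list fields.

```python
from __future__ import annotations

import json
import re
from dataclasses import dataclass, field, fields

TICKS = "`" * 3  # a Markdown code-fence marker


@dataclass
class SceneDetails:
    """Hypothetical schema for details pulled from a witness statement."""
    location: str | None = None
    time_of_day: str | None = None
    vehicles: list[str] = field(default_factory=list)
    people: list[str] = field(default_factory=list)
    weather: str | None = None


# Matches an opening fence (optionally tagged json) or a closing fence.
_FENCE = re.compile(rf"^{TICKS}(?:json)?\s*|\s*{TICKS}$", re.MULTILINE)
_LIST_FIELDS = ("vehicles", "people")


def parse_scene_details(raw: str) -> SceneDetails:
    """Parse model output into SceneDetails, dropping keys outside the schema."""
    text = _FENCE.sub("", raw.strip())
    data = json.loads(text)  # raises ValueError (JSONDecodeError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    allowed = {f.name for f in fields(SceneDetails)}
    kept = {k: v for k, v in data.items() if k in allowed}
    for key in _LIST_FIELDS:
        if key in kept and not isinstance(kept[key], list):
            kept[key] = [str(kept[key])]
    return SceneDetails(**kept)
```

Validating on the application side keeps a malformed or over-creative model response from reaching stored reports.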

For cloud services, the project integrates with:

Google Firestore for sessions, reports, and cases
Google Cloud Storage for generated media
Google Cloud Run deployment configuration
Google Cloud Build for CI/CD
Terraform for infrastructure automation

We also built a multi-account Gemini failover layer that prioritizes a Primary account and automatically fails over to Secondary and Tertiary accounts when rate limits or quota exhaustion occur, using passive response-header tracking instead of extra status calls.
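A minimal sketch of priority-ordered failover, assuming each account is represented by an API key and that every normal call surfaces an HTTP status code and response headers. `KeyPool`, `Account`, `record_response`, the 60-second default cooldown, and the use of the standard `Retry-After` header are assumptions for illustration, not the project's code or documented Gemini behavior. It shows the passive part of the design: cooldowns are updated from responses the app already receives, so no separate status calls are needed, and selection returns to the highest-priority account as soon as its cooldown expires.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable, Mapping

DEFAULT_COOLDOWN_S = 60.0  # assumed backoff when no Retry-After header is present


@dataclass
class Account:
    name: str
    api_key: str
    cooldown_until: float = 0.0


class KeyPool:
    """Pick the highest-priority account that is not cooling down."""

    def __init__(self, accounts: list[Account], clock: Callable[[], float] = time.monotonic):
        self.accounts = accounts  # ordered by priority: Primary, Secondary, Tertiary
        self.clock = clock

    def current(self) -> Account:
        now = self.clock()
        for account in self.accounts:
            if account.cooldown_until <= now:
                return account
        # Every account is limited: use the one that recovers first.
        return min(self.accounts, key=lambda a: a.cooldown_until)

    def record_response(self, account: Account, status: int, headers: Mapping[str, str]) -> None:
        """Passively update state from a normal API response; no extra calls."""
        if status != 429:  # only "Too Many Requests" triggers a cooldown here
            return
        retry_after = headers.get("retry-after") or headers.get("Retry-After")
        try:
            delay = float(retry_after) if retry_after is not None else DEFAULT_COOLDOWN_S
        except ValueError:
            delay = DEFAULT_COOLDOWN_S  # e.g. an HTTP-date value, not parsed here
        account.cooldown_until = self.clock() + delay
```

Injecting the clock keeps the pool testable and lets the same logic run against `time.monotonic` in production.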

Challenges we ran into

The hardest problems were reliability, latency, and trust.

We had to fix:

Voice sessions that kept listening too long.
Manual "finish speaking" behavior that could cut off audio early.
Delays between a model response appearing in text and being spoken aloud.
Low-quality or template-like scene images.
Reports about the same incident being split into separate cases instead of grouped into one.
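Sessions that keep listening too long come down to deciding when the witness has finished a turn. One common approach, shown here as a sketch rather than the project's implementation, is energy-based end-of-turn detection: compute the RMS level of each audio frame, and close the turn only after speech has been heard and quiet frames have lasted long enough. The threshold, the 20 ms frame size, and the 800 ms hangover are hypothetical values, and frames are assumed to be 16-bit little-endian mono PCM.

```python
from __future__ import annotations

import math
import struct

SILENCE_RMS = 500.0  # hypothetical amplitude threshold for "quiet"
FRAME_MS = 20        # assumed duration of each incoming audio frame
HANGOVER_MS = 800    # hypothetical quiet time required before ending the turn


def frame_rms(frame: bytes) -> float:
    """RMS amplitude of a 16-bit little-endian mono PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


class EndOfTurnDetector:
    """Signals end of turn after speech followed by sustained silence."""

    def __init__(self) -> None:
        self.heard_speech = False
        self.quiet_ms = 0

    def push(self, frame: bytes) -> bool:
        """Feed one frame; returns True when the witness's turn should end."""
        if frame_rms(frame) >= SILENCE_RMS:
            self.heard_speech = True
            self.quiet_ms = 0  # any speech resets the silence timer
            return False
        if not self.heard_speech:
            return False  # still waiting for the witness to start talking
        self.quiet_ms += FRAME_MS
        return self.quiet_ms >= HANGOVER_MS
```

The hangover is the trade-off knob: too short and a witness pausing to remember a detail gets cut off, too long and the session appears to keep listening after they have finished.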

We also had to harden the system against quota problems. Because this is a real-time multimodal experience, a single exhausted account or model can break the flow. To solve that, we implemented model fallback and multi-account API-key rotation, with rate limits tracked passively from the responses to normal requests.
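Model fallback can be sketched as trying an ordered list of models and moving to the next one only on quota or rate-limit errors, while letting any other error surface immediately. The `QuotaExceeded` exception, the `call_model` callable, and the model names in the usage are hypothetical placeholders for whatever the project's SDK wrapper actually raises and calls.

```python
from __future__ import annotations

from typing import Callable, Sequence


class QuotaExceeded(Exception):
    """Hypothetical error raised by the API wrapper on HTTP 429 / exhausted quota."""


def generate_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model in priority order; return (model_used, response_text)."""
    last_error: QuotaExceeded | None = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except QuotaExceeded as err:
            last_error = err  # quota problem: move on to the next model
    raise RuntimeError("all models exhausted") from last_error
```

Only quota errors trigger fallback, so genuine bugs (bad requests, parsing errors) are not silently masked by switching models.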

Accomplishments that we're proud of

Custom Failover Logic: Built a multi-account rotation system that keeps the live voice experience running when an individual account hits rate limits or quota exhaustion, even during heavy multimodal usage.
Low-Latency Multimodal Flow: Successfully integrated Gemini Live to handle real-time interruptions and complex scene reasoning with minimal delay.
Infrastructure as Code: Fully automated the Google Cloud environment using Terraform, allowing for rapid deployment and consistency across environments.

What we learned

We learned that building a strong live agent takes much more than writing prompts. A production-ready multimodal agent needs solid turn-taking logic, streaming-friendly UX, structured extraction, good fallback paths, and quota resilience. Operational resilience matters just as much, because a real-time system built on multimodal APIs has to cope with shifting quotas and tight latency constraints.

What's next for WitnessReplay

Next, we want to expand evidence intake, make scene reconstruction even more controllable, improve explainability for case matching, and continue hardening the cloud-native deployment path for real-world law enforcement pilots.

Built With

css
docker
fastapi
gemini-2.5-flash
gemini-live-api
google-cloud
google-cloud-build
google-cloud-run
google-firestore
google-genai-sdk
html
imagen-4
javascript
python
sqlite
terraform
websockets

Updates

Gilbert De Leon started this project — Mar 15, 2026 07:59 PM EDT
