Inspiration
Witness interviews are one of the most important parts of an investigation, but they are also one of the easiest places to lose critical information. People remember events in fragments, details arrive out of order, multiple witnesses describe the same incident differently, and investigators spend too much time turning raw testimony into usable reports.
We wanted to build an AI agent that could listen like a calm investigator, guide a witness naturally, and transform messy real-time testimony into something structured, visual, and actionable.
What it does
WitnessReplay is a Gemini Live-powered witness interview and scene reconstruction system.
A witness can speak to Detective Ray, the AI interviewer, interrupt mid-response, or switch to typing when needed. The system transcribes audio, asks follow-up questions, extracts structured details, generates visual scene reconstructions, assigns a report ID, and groups related witness reports into the same case when they refer to the same incident.
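To make the case-grouping step concrete, here is a minimal sketch of how a new report might be attached to an existing case. It is illustrative only: in WitnessReplay the matching is done with Gemini, so this hand-written heuristic is a stand-in, and the `Report`, `Case`, `same_incident`, and `assign_case` names, the two-hour window, and the keyword-overlap threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Report:
    report_id: str
    occurred_at: datetime
    location: str
    keywords: frozenset  # hypothetical: terms extracted from the testimony


@dataclass
class Case:
    case_id: str
    reports: list = field(default_factory=list)


def same_incident(a, b, window=timedelta(hours=2), min_overlap=0.3):
    """Heuristic stand-in for model-based matching (thresholds are assumptions)."""
    if a.location.strip().lower() != b.location.strip().lower():
        return False
    if abs(a.occurred_at - b.occurred_at) > window:
        return False
    union = a.keywords | b.keywords
    if not union:
        return False
    # Jaccard similarity between the two keyword sets
    return len(a.keywords & b.keywords) / len(union) >= min_overlap


def assign_case(report, cases):
    """Attach the report to the first matching case, or open a new case."""
    for case in cases:
        if any(same_incident(report, other) for other in case.reports):
            case.reports.append(report)
            return case
    case = Case(case_id=f"CASE-{len(cases) + 1:04d}", reports=[report])
    cases.append(case)
    return case
```

Under these assumptions, two reports that name the same location within two hours and share enough keywords land in one case, while a report from a different location opens a new one.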
On the investigator side, WitnessReplay provides an admin portal to review reports, cases, timelines, previews, and scene reconstructions in one place.
How we built it
We built WitnessReplay with a mobile-first frontend in vanilla HTML, CSS, and JavaScript and a FastAPI backend in Python.
Real-time conversations run over WebSockets, and the AI stack is powered by the Google GenAI SDK and Gemini Live API for low-latency voice interaction, interruption handling, and multimodal reasoning.
Gemini is used for:
- Real-time witness interviewing
- Audio transcription
- Structured extraction of scene details
- Report summarization
- Contradiction detection and case matching
- Image generation fallback and scene refinement
For cloud services, the project integrates with:
- Google Firestore for sessions, reports, and cases
- Google Cloud Storage for generated media
- Google Cloud Run deployment configuration
- Google Cloud Build for CI/CD
- Terraform for infrastructure automation
We also built a multi-account Gemini failover layer that prioritizes a Primary account and automatically fails over to Secondary and Tertiary accounts when rate limits or quota exhaustion occur, using passive response-header tracking instead of extra status calls.
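The failover idea can be sketched roughly as follows. This is not the project's actual code: `Account`, `AccountPool`, `RateLimited`, and the caller-supplied `send` function are hypothetical, and the `x-ratelimit-remaining` and `retry-after` header names are placeholders for whatever quota hints the real responses carry.

```python
import time
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    api_key: str
    blocked_until: float = 0.0  # epoch seconds; 0.0 means available now


class RateLimited(Exception):
    """Raised by the caller-supplied send function on a rate-limit error."""

    def __init__(self, retry_after=60.0):
        super().__init__("rate limited")
        self.retry_after = retry_after


class AccountPool:
    def __init__(self, accounts, clock=time.time):
        self.accounts = list(accounts)  # priority order: primary first
        self.clock = clock

    def observe(self, account, headers):
        """Passive tracking: read quota hints from a normal response."""
        remaining = headers.get("x-ratelimit-remaining")  # assumed header name
        if remaining is not None and int(remaining) <= 0:
            cooldown = float(headers.get("retry-after", 60))  # assumed header name
            account.blocked_until = self.clock() + cooldown

    def call(self, send):
        """Try accounts in priority order, skipping any that are cooling down."""
        now = self.clock()
        for account in self.accounts:
            if now < account.blocked_until:
                continue
            try:
                result, headers = send(account)
            except RateLimited as exc:
                account.blocked_until = self.clock() + exc.retry_after
                continue
            self.observe(account, headers)
            return result
        raise RuntimeError("all accounts are rate limited")
```

Because the pool always walks the list from the top, traffic returns to the primary account as soon as its cooldown expires, and no extra status request is needed to learn that an account is exhausted.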
Challenges we ran into
The hardest problems were reliability, latency, and trust.
We had to fix:
- Voice sessions that kept listening too long.
- Manual "finish speaking" behavior that could cut off audio early.
- Delays between a model response appearing in text and being spoken aloud.
- Low-quality or template-like scene images.
- Reports that should belong to the same case but were being split apart.
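The two voice issues above pull in opposite directions: end the turn too late and the session keeps listening, end it too early and audio gets cut off. One generic way to balance them is a silence-based end-of-turn detector with a hard cap, sketched below. This is an assumption-laden illustration, not the fix the project shipped: the `TurnDetector` class, the RMS threshold, and every timing value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TurnDetector:
    silence_threshold: float = 0.02  # RMS below this counts as silence (assumed)
    end_silence_ms: int = 800        # trailing silence that ends a turn
    min_speech_ms: int = 300         # speech required before a turn can end
    max_turn_ms: int = 30000         # hard cap so a session never listens forever
    frame_ms: int = 20               # duration of each audio frame
    speech_ms: int = 0
    silence_ms: int = 0
    total_ms: int = 0

    def feed(self, rms: float) -> bool:
        """Consume one frame's RMS level; return True when the turn should end."""
        self.total_ms += self.frame_ms
        if rms >= self.silence_threshold:
            self.speech_ms += self.frame_ms
            self.silence_ms = 0  # any speech resets the trailing-silence counter
        else:
            self.silence_ms += self.frame_ms
        heard_enough = self.speech_ms >= self.min_speech_ms
        if heard_enough and self.silence_ms >= self.end_silence_ms:
            return True
        return self.total_ms >= self.max_turn_ms
```

Requiring a minimum amount of speech guards against ending the turn during the pause before a witness starts talking, while the hard cap bounds how long a noisy session can stay open.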
We also had to harden the system against quota problems. Because this is a real-time multimodal experience, a single exhausted account or model can break the flow. To solve that, we implemented model fallback and multi-account API-key rotation, with passive rate-limit tracking that reads limit information from the responses to regular requests instead of making separate status calls.
Accomplishments that we're proud of
- Custom Failover Logic: Built a multi-account rotation system that keeps the live voice experience available when an individual account hits a rate limit or exhausts its quota, even during heavy multimodal usage.
- Low-Latency Multimodal Flow: Successfully integrated Gemini Live to handle real-time interruptions and complex scene reasoning with minimal delay.
- Infrastructure as Code: Fully automated the Google Cloud environment using Terraform, allowing for rapid deployment and consistency across environments.
What we learned
We learned that building a strong live agent is much more than writing prompts. A production-ready multimodal agent needs strong turn-taking logic, streaming-friendly UX, structured extraction, good fallback paths, and quota resilience. We also learned how important operational resilience is when a real-time system depends on multimodal APIs with shifting quotas and latency constraints.
What's next for WitnessReplay
Next, we want to expand evidence intake, make scene reconstruction even more controllable, improve explainability for case matching, and continue hardening the cloud-native deployment path for real-world law enforcement pilots.
Built With
- css
- docker
- fastapi
- gemini-2.5-flash
- gemini-live-api
- google-cloud
- google-cloud-build
- google-cloud-run
- google-firestore
- google-genai-sdk
- html
- imagen-4
- javascript
- python
- sqlite
- terraform
- websockets