Inspiration

After hearing firsthand accounts of his father-in-law’s experience in law enforcement, Umer realized how critical every second is during high-risk situations. Even a brief delay in securing backup can drastically escalate danger for officers and civilians. At the same time, officers often spend up to 30% of their shifts on administrative tasks, tying up resources and slowing response times. An AI agent that streamlines reporting and accelerates backup coordination can save precious minutes, enhance officer and civilian safety, and improve day-to-day operations.

Team Information

Umer Haider

Responsible for the Front-End
Responsible for the Real-Time Streaming
Responsible for AI Agent Alerting

Joshua Shu

Responsible for the Transcription Service
Responsible for Footage Procurement

Problem Statement

The Issue

High-Risk Encounters: Law enforcement officers regularly face urgent, high-stakes situations where every second counts in ensuring personal and public safety.
Excessive Paperwork: Officers can spend up to 30% of their shifts on administrative tasks, significantly reducing their time and availability for active duty.
Overloaded Dispatchers: Rising call volumes, increasingly complex emergencies, and poor public perception of law enforcement have left dispatch centers understaffed and in need of tools that assist dispatchers in their jobs.

Why It Matters

Critical Delays: Delays in backup requests or coordination can escalate danger for both officers and civilians.
Operational Inefficiency: Time-intensive documentation ties up resources.

Solution Overview

Approach

Integrate real-time audio/video streaming from officer body cams.
Transcribe and analyze these streams using AI (LLMs) to identify critical moments and take automated actions, such as requesting backup or running background checks.
Provide a dashboard where command centers can monitor multiple officers in real time.

Key Features

Real-Time Video & Audio Processing: Body cam or webcam feeds are streamed and transcribed on the fly (we are using existing body cam footage).
Actionable Alerts & Coordination: Automated flags when threats are detected or an officer needs backup.
Officer Dashboard: A unified interface showing all active streams, with quick access to transcripts, threat level indicators, and location-based maps.

Impact

Officer Safety: Automated triggers for backup requests and hazard alerts.
Reduced Administration: Less time spent on manual reporting and documentation.
Faster Response: Officers remain focused on critical tasks, improving public safety overall.

Technical Details

Architecture

Audio/Video Input → Transcription Service → Database (Supabase) -- We simulated the streaming of audio from a body cam. We took the audio from real body cam footage and transcribed it using OpenAI’s Whisper API. To simulate live streaming, we wrote the transcription into the database in chunks.
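
The chunking step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `TranscriptChunk` fields, the word-count chunk size, and the `insert_row` callback (which stands in for a database write) are all assumptions.

```python
from dataclasses import dataclass, asdict


@dataclass
class TranscriptChunk:
    officer_id: str  # hypothetical column: which body cam the chunk belongs to
    seq: int         # position of the chunk within the recording
    text: str        # the words contained in this chunk


def chunk_transcript(transcript: str, officer_id: str, words_per_chunk: int = 8):
    """Split a finished transcript into small, ordered chunks."""
    if words_per_chunk < 1:
        raise ValueError("words_per_chunk must be at least 1")
    words = transcript.split()
    chunks = []
    for seq, start in enumerate(range(0, len(words), words_per_chunk)):
        text = " ".join(words[start:start + words_per_chunk])
        chunks.append(TranscriptChunk(officer_id, seq, text))
    return chunks


def replay(chunks, insert_row):
    """Hand chunks to insert_row one at a time, in order (assumed DB write)."""
    for chunk in chunks:
        insert_row(asdict(chunk))
```

A replay that should feel live would likely pause between inserts (for example with `time.sleep`), with `insert_row` wrapping whatever database client is in use.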

Transcription Streaming -- We set up a real-time database (Supabase) and subscribed to it on the frontend, so that new transcription rows appeared as if they were being streamed in real time.
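
To make the subscription pattern concrete, here is a small in-memory stand-in written for illustration. It does not use the Supabase client; the `InMemoryFeed` class and the `"transcripts"` table name are hypothetical. It only shows the idea that every insert into a table is pushed to each registered listener.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Row = Dict[str, object]


class InMemoryFeed:
    """Stand-in for a real-time table: inserts are pushed to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Row], None]]] = defaultdict(list)

    def subscribe(self, table: str, callback: Callable[[Row], None]) -> None:
        # Register a listener that will be called for each new row in `table`.
        self._subscribers[table].append(callback)

    def insert(self, table: str, row: Row) -> None:
        # Deliver the new row to every listener registered for this table.
        for callback in self._subscribers[table]:
            callback(row)


feed = InMemoryFeed()
dashboard_lines: List[str] = []
feed.subscribe("transcripts", lambda row: dashboard_lines.append(str(row["text"])))
feed.insert("transcripts", {"officer_id": "officer-1", "text": "Dispatch, this is unit 12."})
```

Because listeners are independent, more than one consumer can attach to the same table without changing the code that writes rows.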

LLM/Agent subscribes to new transcripts in the database to determine if actions are needed (e.g. background checks and backup requests).
-- The AI agent listened on the same subscription that fed the transcription stream, so each new chunk reached the agent directly and could trigger automated actions.
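
The shape of the agent's decision step might look like the sketch below. The keyword rules are only a placeholder for the LLM call, which is not shown in this write-up; the action names, trigger phrases, and row fields are assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical action names; the write-up names backup requests and background checks.
REQUEST_BACKUP = "request_backup"
BACKGROUND_CHECK = "background_check"


def keyword_classifier(text: str) -> List[str]:
    """Placeholder for the LLM call: maps phrases to actions with simple rules."""
    lowered = text.lower()
    actions = []
    if any(p in lowered for p in ("shots fired", "need backup", "send backup")):
        actions.append(REQUEST_BACKUP)
    if any(p in lowered for p in ("license plate", "run the name", "date of birth")):
        actions.append(BACKGROUND_CHECK)
    return actions


def handle_chunk(row: dict, classify: Callable[[str], List[str]] = keyword_classifier) -> List[dict]:
    """Turn one transcript row into zero or more action rows."""
    return [
        {"officer_id": row["officer_id"], "action": action, "source_text": row["text"]}
        for action in classify(row["text"])
    ]
```

Passing the classifier in as a parameter keeps the surrounding plumbing the same whether the decision comes from rules, as here, or from a model response.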

Alerts/Actions are written back to a table that the dashboard queries in real time, updating officer status and threat levels on a map UI.
-- While rendering the transcription, we tried to include moments where a threat was detected, e.g. a gunshot. Our AI agent would read these threats and decide whether to alert for backup.
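
One way the dashboard could derive officer status from the alerts table is to fold the alert rows into a single record per officer. The threat scale, field names, and action name below are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable

# Hypothetical threat scale; a higher rank means a more severe threat.
THREAT_RANK = {"low": 0, "elevated": 1, "critical": 2}


def officer_status(alerts: Iterable[dict]) -> Dict[str, dict]:
    """Reduce alert rows to one status per officer: the highest threat
    level seen so far and whether backup was ever requested."""
    status: Dict[str, dict] = {}
    for alert in alerts:
        current = status.setdefault(
            alert["officer_id"], {"threat_level": "low", "backup_requested": False}
        )
        # Keep the most severe threat level seen for this officer.
        if THREAT_RANK[alert["threat_level"]] > THREAT_RANK[current["threat_level"]]:
            current["threat_level"] = alert["threat_level"]
        if alert.get("action") == "request_backup":
            current["backup_requested"] = True
    return status
```

This version never lowers a threat level once it has been raised; a real dashboard would probably also need a rule for clearing alerts.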

Backup Call -- When our AI agent fired off an alert, we used ElevenLabs to generate audio from the alert text. We then used Twilio to place a call to a phone number and play that audio. This required setting up a webhook that Twilio would call to retrieve the audio file.
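
The write-up does not show what the webhook returned. One common pattern with Twilio voice calls is for the webhook to respond with a TwiML document whose `<Play>` verb points at a hosted audio file. The sketch below builds such a document with the standard library only; the function name and the example URL are assumptions, and the web server that would serve this response is left out.

```python
import xml.etree.ElementTree as ET


def backup_call_twiml(audio_url: str) -> str:
    """Build a TwiML document that tells Twilio to play an audio file."""
    response = ET.Element("Response")
    play = ET.SubElement(response, "Play")
    play.text = audio_url  # assumed to be a publicly reachable audio URL
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body


# Example with a placeholder URL for the generated alert audio.
twiml = backup_call_twiml("https://example.com/alerts/backup.mp3")
```

Building the XML with `ElementTree` rather than string formatting means characters such as `&` in the URL are escaped correctly.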

Challenges We Ran Into

Speech Diarization

Separating different speakers (e.g. officer vs. suspect) within noisy or overlapping audio was a major challenge. Accurate identification of who is speaking at any given moment is crucial for contextual understanding and actionable insights. We experimented with various diarization techniques, but background noise, multiple overlapping voices, and varied accents made it an ongoing area for refinement.

Real-Time Streaming Complexity

Keeping transcription updates continuous while keeping the database, the LLM, and the UI synchronized almost instantly was difficult.

Finding Useful Body Cam Footage

Finding publicly accessible footage that is simple to transcribe is more challenging than it seems. Privacy and legal issues combined with restrictive police department policies mean that footage generally isn’t found on sites like YouTube unless an individual went out of their way to request it from their local police station and post it online.

Conclusion & Future Plans

Next Steps:

Integrate with Police Databases: Real-time checking of suspect information against official records.
Connect with Radio Telecommunication: Instead of communicating by phone, we would likely need to relay information to other officers over radio channels.
Enhanced Analytics: Use advanced NLP models for sentiment analysis, threat profiling, and officer well-being monitoring.
Speech Diarization: Correctly identify who is speaking in any given situation.
Sound Classification: Parse out important non-speech sounds like gunshots, screams, and explosions from all active audio feeds and tie them to useful alerts.
Integrate with Other Departments: Connect with other emergency services, such as EMS and fire departments.