OmniWatch

Real-Time AI Emergency Detection & Smart Medical Response System

"Watch. Detect. Act."

Built at the Google DevFest AI Hackathon 2026 · Washington University in St. Louis
Team: Sanjit Subhash, Joel Joby Akkarkudil, Preetham, Lomesh Vanapalli


Inspiration

Every year, emergencies happen in places where people are supposed to be safe — office lobbies, hospital corridors, university hallways. Someone falls. A person collapses. A fight breaks out. The tragedy is rarely the emergency itself. It's the delay.

Three things stuck with us: every second costs lives, security systems are built to record rather than respond, and the gap between noticing and acting is the real problem. A single security guard watches 12 camera feeds at once and misses the one that matters. A person screams for help in an empty corridor. By the time someone notices, calls it in, and decides where to send help — minutes have already passed. In cardiac arrest, survival probability drops roughly 10% per minute without intervention.

We built OmniWatch to collapse that delay from minutes to seconds.


What it does

OmniWatch is a multimodal AI that monitors live camera and audio feeds, automatically identifying life-threatening incidents and routing the fastest medical response — all within 10 seconds (demo: https://youtu.be/qUovxo02Ixg).

The system operates through three intelligent layers:

  • Vision AI — Uses the Google Cloud Vision API to analyze live camera frames via optical flow, detecting falls, fights, and motionless persons. It flags anomalies the moment they occur and scores the incident trauma level.

  • Audio AI — Uses the Cloud Speech-to-Text API to listen continuously for screams, distress calls, and impact sounds. It acts as a confirmation layer, refining and confirming the incident level when video and audio agree.

  • Agentic AI — Powered by Gemini 2.5 Pro. It fuses the vision and audio signals, calculates severity (LOW → MEDIUM → HIGH → CRITICAL), finds the optimal route based on incident level and nearest help available, and provides ETA via the Google Maps API with a live feed.
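
The fusion step described above can be sketched as follows. The weights, thresholds, and the agreement-boost rule here are illustrative assumptions, not OmniWatch's actual calibration:

```python
# Illustrative sketch of the vision/audio fusion step. All weights and
# thresholds below are hypothetical, not OmniWatch's tuned values.

def fuse_severity(vision_score: float, audio_score: float,
                  w_vision: float = 0.7, w_audio: float = 0.3) -> str:
    """Combine per-modality trauma scores (0.0-1.0) into a severity level.

    Audio acts as a confirmation layer: when the two modalities agree,
    the combined score is boosted; otherwise the raw weighted sum stands.
    """
    combined = w_vision * vision_score + w_audio * audio_score
    if abs(vision_score - audio_score) < 0.2:   # modalities agree
        combined = min(1.0, combined * 1.15)    # confirmation boost
    if combined >= 0.85:
        return "CRITICAL"
    if combined >= 0.6:
        return "HIGH"
    if combined >= 0.35:
        return "MEDIUM"
    return "LOW"

print(fuse_severity(0.9, 0.85))  # agreeing high scores escalate
print(fuse_severity(0.2, 0.9))   # loud but harmless audio is damped
```

The key design point is that a loud-but-harmless sound (high audio score, low vision score) cannot escalate severity on its own, because the agreement boost only fires when both modalities point the same way.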

The result: 30x faster response — compressing the entire detection-to-dispatch chain from 2.5–8 minutes down to under 10 seconds. And a +41% survival gain — by catching cardiac and trauma events in the first moments, OmniWatch preserves the intervention window where survival odds are highest.


How we built it

| Layer | Tool |
| --- | --- |
| Vision AI | Google Cloud Vision API |
| Audio AI | Cloud Speech-to-Text API |
| Agentic AI / Reasoning | Gemini 2.5 Pro |
| Video Capture | OpenCV / WebRTC |
| Audio Streaming | Web Audio API |
| Routing & ETA | Google Maps API |
| Dashboard | React + WebSockets |

The architecture flows in five steps: Vision Detection scores the incident trauma level from the camera feed → Audio Detection refines and confirms the incident level → Decision Making calculates severity and decides on a response → Optimal Route is selected based on incident level and nearest available help → ETA is provided via Google Maps with a live feed to the security dashboard.
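
The five-step flow can be sketched as a minimal async pipeline. Every stage below is a placeholder standing in for the real API call (Vision, Speech-to-Text, Gemini, Maps), not OmniWatch's actual implementation:

```python
# Minimal sketch of the five-step pipeline. Each stage is a stub for the
# real service call named in its comment; values are illustrative only.
import asyncio

async def vision_detect(frame):           # 1. Google Cloud Vision + optical flow
    return 0.9                            #    trauma score from camera feed

async def audio_confirm(chunk):           # 2. Cloud Speech-to-Text
    return 0.85                           #    distress score from audio

async def decide(vision, audio):          # 3. Gemini 2.5 Pro reasoning
    return "CRITICAL" if min(vision, audio) > 0.8 else "LOW"

async def route(severity):                # 4. pick nearest capable responder
    return {"responder": "ER-1", "severity": severity}

async def eta(plan):                      # 5. Google Maps API
    return {**plan, "eta_min": 4}

async def handle_incident(frame, chunk):
    # Stages 1-2 run concurrently; stages 3-5 depend on their results.
    vision, audio = await asyncio.gather(vision_detect(frame),
                                         audio_confirm(chunk))
    severity = await decide(vision, audio)
    plan = await route(severity)
    return await eta(plan)                # pushed to dashboard via WebSocket

print(asyncio.run(handle_incident(None, None)))
```

Running vision and audio detection concurrently, rather than sequentially, is the kind of async discipline that keeps the end-to-end chain under the 10-second budget.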


Challenges we ran into

  • Multimodal fusion tuning — Getting the vision and audio confidence weights right was the hardest calibration problem. Too much weight on audio created false positives from loud but harmless sounds. We had to iterate on the weighting to get reliable, low false-alarm behavior across different environments.

  • Optical flow thresholds — Distinguishing a genuine fall from someone quickly sitting down or a camera shake required a lot of testing. Getting the stillness threshold and wait window right without making the system either too sensitive or too slow was a careful balance.

  • Real-time latency — Keeping the full pipeline — frame extraction, Vision API call, Speech-to-Text, Gemini reasoning, Maps routing, WebSocket push — under 10 seconds end-to-end required disciplined async design throughout the backend.

  • Real routing over real data — Making the routing decisions actually reflect real-world emergency response, not just raw proximity, meant building a proper composite scoring model that accounts for trauma level and live travel time, not just who is closest.

  • Privacy by design from day one — Building face blurring and conditional clip storage into the architecture from the start, rather than adding it later, added complexity but was non-negotiable for us.
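
The composite routing scoring mentioned above can be illustrated with a toy example. The field names, penalty value, and the two responders are hypothetical, not OmniWatch's actual model or data:

```python
# Hypothetical composite routing score: ranks responders by live travel
# time plus a capability penalty, rather than raw proximity alone.
# Field names and the penalty weight are illustrative assumptions.

def routing_score(responder: dict, severity: str) -> float:
    """Lower is better: travel time in minutes, penalized when the
    responder cannot handle the incident's trauma level."""
    needs_trauma_center = severity in ("HIGH", "CRITICAL")
    penalty = 30.0 if needs_trauma_center and not responder["trauma_center"] else 0.0
    return responder["travel_min"] + penalty

responders = [
    {"name": "Clinic A", "travel_min": 3, "trauma_center": False},
    {"name": "Hospital B", "travel_min": 8, "trauma_center": True},
]

best = min(responders, key=lambda r: routing_score(r, "CRITICAL"))
print(best["name"])  # the closer clinic loses to the capable hospital
```

For a CRITICAL incident the nearer clinic is penalized for lacking trauma capability, so the routing correctly sends the case to the hospital that can actually treat it — exactly the behavior raw proximity would get wrong.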


Accomplishments that we're proud of

  • Achieved sub-10-second end-to-end detection-to-recommendation latency on live camera and audio input.
  • Built a three-layer AI pipeline — Vision, Audio, Agentic — where each layer has a distinct role and compounds the others' reliability rather than simply running independently in parallel.
  • Grounded the routing in real hospital and emergency data so the recommendations reflect actual response capability, not simulated proximity.
  • Delivered a fully working demo with a live security dashboard, priority alerts, hospital routing, and ETA — all within a hackathon timeframe.
  • The system is deployable anywhere CCTV infrastructure already exists — hospitals, campuses, offices, public spaces — with no hardware overhaul required.

What we learned

  • Agentic AI is only as good as its tools — Gemini 2.5 Pro's reasoning is powerful, but the routing quality depended entirely on the structure of what we fed it. Designing the tool definitions and data pipeline carefully made the difference between useful recommendations and generic ones.

  • Multimodal systems need explicit fusion logic — Asking a model to just decide "is this an emergency?" from raw video and audio wasn't reliable. Fusing the signals with explicit scoring and thresholds gave us predictable, tunable behavior we could actually reason about and improve.

  • Real data is humbling — Live hospital locations, trauma designations, and ETA from real APIs behaved very differently from any assumptions we had going in. Using real data forced us to build a more robust system than if we had worked with static mock data.

  • Security systems are passive by design — and that is the problem — The more we dug into how existing CCTV infrastructure works, the more it became clear that the industry has optimized entirely for recording, not responding. OmniWatch is a fundamentally different design philosophy.


What's next for OmniWatch

  • Wearable integration — Pulling smartwatch accelerometer data into the fall detection model alongside optical flow for higher accuracy.
  • Multi-camera coordination — Tracking an incident across multiple cameras so a person does not disappear between camera zones.
  • Crowd density risk analysis — Detecting dangerous crowd density before an incident occurs, not just reacting after.
  • Direct 911 dispatch integration — Automatic handoff to emergency dispatch when confidence is high enough, removing the last human bottleneck for the most critical events.
  • AI self-improvement — Online learning to refine detection thresholds from confirmed versus false positive incident history over time.
  • Smart city dashboards — City-wide incident heatmaps and resource optimization for public infrastructure deployments at scale.

Built With

Google Cloud Vision API · Cloud Speech-to-Text API · Gemini 2.5 Pro · Google Maps API · OpenCV · WebRTC · Web Audio API · React · WebSockets