Sentinel-V2: Multimodal Guardian

Inspiration

Falls are the leading cause of injury-related death for adults over 65. The current market is flooded with wearables that seniors forget to wear or "Big Brother" cameras that invade privacy. We wanted to build something better: a passive, privacy-first guardian that doesn't just watch, but understands.

We originally prototyped a basic pose detection script in 2025 (Sentinel V1). However, we realized that vision alone wasn't enough: it couldn't hear a cry for help, and it couldn't scale beyond a single laptop.

For DeveloperWeek 2026, we completely re-engineered the system. We migrated from a local Python script to a Hybrid Cloud Architecture, containerizing the application with Docker, deploying it on Akamai Linode, and integrating Deepgram Nova-2 for acoustic intelligence. This isn't just an update; it's a new nervous system for elderly care.

What it does

Sentinel-V2 is a multimodal fall detection system that acts as a "smart appliance" for the home. It uses two senses to confirm emergencies, which is designed to cut down on false alarms:

The Eyes (Edge Computing): A local camera tracks human pose skeletons in real-time. It analyzes velocity (dY) and geometric state to detect sudden drops (falls) while preserving privacy (no video is sent to the cloud, only coordinate data).
The Ears (Deepgram Cloud Intelligence): When a potential fall is detected, the system activates Deepgram’s Nova-2 model to listen for distress keywords (e.g., "Help me," "I'm hurt") in under 300ms.
The Brain (Akamai Linode): The heavy lifting of the API and alert logic is containerized and hosted on Akamai Linode, ensuring that even if the local device struggles, the alert pipeline remains robust and accessible from anywhere.
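
The way the two senses combine can be sketched in a few lines. This is a simplified illustration, not the project's actual alert logic: the `Alert` levels and the `fuse` function are hypothetical names, and the choice to still raise a lower-priority alert when a fall is seen but nothing is heard is our assumption (a person who falls may be unable to speak).

```python
from enum import Enum


class Alert(Enum):
    NONE = 0          # nothing to report
    UNCONFIRMED = 1   # fall seen, no distress speech heard (assumed behavior)
    CONFIRMED = 2     # fall seen and distress speech heard


def fuse(vision_fall: bool, distress_heard: bool) -> Alert:
    """Combine the camera and microphone signals into one alert level."""
    if not vision_fall:
        # Audio is only consulted after the camera flags a potential fall.
        return Alert.NONE
    return Alert.CONFIRMED if distress_heard else Alert.UNCONFIRMED
```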

How we built it

We adopted a Hybrid Cloud Architecture to balance privacy with power.

1. Akamai Linode Deployment (Infrastructure) ☁️

We moved away from "it works on my machine" to a production-grade deployment.

Containerization: We wrapped the entire application in Docker, handling complex dependencies like opencv-python-headless and libGL.
Edge Compute: We deployed the container to an Akamai Linode (Shared CPU) instance. This gives us a public API endpoint (/detect_fall) that can manage alerts from multiple edge devices globally.
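
The write-up names the /detect_fall endpoint but not its request format, so the sketch below assumes one: a JSON body with `device_id`, `timestamp`, and `keypoints` fields. All three field names and the `handle_detect_fall` function are hypothetical. The function is framework-agnostic and rejects any field outside the schema, which is one way to make sure image data never slips into the alert pipeline.

```python
import json

REQUIRED_FIELDS = {"device_id", "timestamp", "keypoints"}  # hypothetical schema


def handle_detect_fall(raw_body: str) -> tuple[int, dict]:
    """Validate an alert payload and return (HTTP status, response body)."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, {"error": "body is not valid JSON"}
    if not isinstance(payload, dict):
        return 400, {"error": "body must be a JSON object"}

    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        return 400, {"error": f"missing fields: {sorted(missing)}"}

    # Reject anything outside the schema so video frames cannot be attached.
    extra = payload.keys() - REQUIRED_FIELDS
    if extra:
        return 400, {"error": f"unexpected fields: {sorted(extra)}"}

    keypoints = payload["keypoints"]
    ok = isinstance(keypoints, list) and all(
        isinstance(p, list) and len(p) == 2
        and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in p)
        for p in keypoints
    )
    if not ok:
        return 400, {"error": "keypoints must be a list of [x, y] pairs"}

    return 200, {"status": "received", "device_id": payload["device_id"]}
```

In a Flask app, a route registered for POST /detect_fall could pass `request.get_data(as_text=True)` to this function and return the resulting body and status code.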

2. Deepgram Audio Intelligence (The "Ears") 👂

We pivoted from simple volume detection to semantic understanding.

Integration: We used the Deepgram Python SDK to capture audio buffers during fall events.
Model: We utilized the Nova-2 model for its superior speech-to-text accuracy. It filters out background noise (TV, radio) and only triggers an alert if specific distress keywords are identified in the transcript.
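
The keyword check on the transcript might look something like the sketch below. It does not call Deepgram; it takes a transcript string as returned by any speech-to-text service. The phrases "help me", "i'm hurt", and "help" come from this write-up, while the function names and the whole-word matching rule are our assumptions.

```python
import re

# Phrases taken from the write-up; a real deployment would likely use a longer list.
DISTRESS_PHRASES = ("help me", "i'm hurt", "help")


def normalize(text: str) -> str:
    """Lowercase, unify apostrophes, and replace other punctuation with spaces."""
    text = text.lower().replace("\u2019", "'")
    text = re.sub(r"[^a-z' ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def find_distress(transcript: str) -> list[str]:
    """Return the distress phrases that appear as whole words in the transcript."""
    clean = normalize(transcript)
    hits = []
    for phrase in DISTRESS_PHRASES:
        # Lookarounds keep "help" from matching inside words such as "helper".
        pattern = r"(?<![a-z'])" + re.escape(phrase) + r"(?![a-z'])"
        if re.search(pattern, clean):
            hits.append(phrase)
    return hits
```

An alert would then fire only when `find_distress` returns a non-empty list for the audio captured after a fall event.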

3. Computer Vision Core (The "Eyes") 🎥

MediaPipe & Physics: We use a custom Geometric State Machine that calculates the aspect ratio of the bounding box and the vertical velocity of the hip joints.
Privacy: The video feed is processed 100% locally. Only the mathematical skeleton data and audio snippets leave the device, so no video frames ever reach the cloud.
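
A geometric state machine of this kind can be sketched as follows. This is an illustration rather than the project's code: it assumes normalized image coordinates where y grows downward (as in MediaPipe pose landmarks), and the `FallDetector` class, the three states, and both thresholds are hypothetical values that would need tuning on real footage.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    UPRIGHT = "upright"
    FALLING = "falling"
    DOWN = "down"


@dataclass
class Frame:
    t: float      # timestamp in seconds
    hip_y: float  # mean hip height, normalized 0..1, larger means lower in the image
    box_w: float  # bounding-box width
    box_h: float  # bounding-box height


class FallDetector:
    DROP_SPEED = 0.8    # assumed threshold, in image heights per second
    LYING_ASPECT = 1.2  # assumed threshold: width / height above this looks horizontal

    def __init__(self) -> None:
        self.state = State.UPRIGHT
        self.prev = None

    def update(self, frame: Frame) -> State:
        aspect = frame.box_w / frame.box_h if frame.box_h > 0 else 0.0
        velocity = 0.0
        if self.prev is not None and frame.t > self.prev.t:
            # Positive velocity means the hips are moving down the image.
            velocity = (frame.hip_y - self.prev.hip_y) / (frame.t - self.prev.t)
        self.prev = frame

        if self.state is State.UPRIGHT and velocity > self.DROP_SPEED:
            self.state = State.FALLING
        elif self.state is State.FALLING:
            # Single-frame confirmation keeps the sketch short;
            # a real system would likely wait several frames.
            self.state = State.DOWN if aspect > self.LYING_ASPECT else State.UPRIGHT
        elif self.state is State.DOWN and aspect < 1.0:
            self.state = State.UPRIGHT  # the person is vertical again
        return self.state
```

Requiring both a fast drop and a horizontal bounding box is what separates a fall from someone sitting down quickly or lying down slowly.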

Challenges we ran into

The "Dependency Hell" of Cloud Deployment: Moving our local OpenCV/PyTorch models to a headless Linux server (Linode) was a nightmare. We faced massive library conflicts (libGL.so, soundfile). We solved this by building a multi-stage Docker container that isolates the environment.
Audio Latency: Initially, waiting for audio transcription delayed the alert by 3-4 seconds. We switched to Deepgram’s API, which reduced transcription latency to <300ms, allowing the "Eyes" and "Ears" to make a decision almost instantly.
Refactoring Legacy Code: We had to strip down our old, heavy monolithic code (from the V1 research prototype) and refactor it into lean microservices suitable for a cloud-native hackathon submission.

Accomplishments that we're proud of

Deployed on Akamai: We successfully took a local-only script and turned it into a live, dockerized cloud API accessible via a public IP.
Deepgram Integration: We moved beyond simple volume detection ("loud noise = fall") to semantic understanding. The system now knows the difference between dropping a book and someone yelling "Help!"

Privacy-First Design: We achieved our goal of "Zero-Knowledge" video monitoring. The Akamai server never sees a single frame of video, only mathematical skeleton data.

What we learned

The Power of Edge + Cloud: We learned that you don't have to choose between Edge (privacy) and Cloud (power). A hybrid approach is the future of IoT.
Docker is Non-Negotiable: Containerization wasn't just a "nice to have"; it was the only way to manage the complex dependencies of modern AI models across different environments (Windows for development, Linux in the cloud).

What's next for Sentinel-V2

Sensor Fusion: Addressing previous feedback on accelerometers, we plan to integrate ESP32-based wearables that "talk" to the Sentinel Hub for triple-verification (Video + Audio + G-Force).
Two-Way Communication: Using Deepgram Aura (Text-to-Speech) to allow the system to talk back to the fallen senior: "I heard you. Calling emergency contacts now."
Fleet Management: Using Akamai's global reach to manage thousands of Sentinel nodes from a single dashboard.

Built With

akamai-linode
deepgram
elevenlabs
flask
mediapipe
opencv
python
xgboost

Updates

Zaynul Abedin Miah started this project — Feb 11, 2026 01:47 PM EST
