Inspiration
The genesis of Sentinel AI stems from a harrowing reality: for the elderly, a fall is rarely just an accident; it is often the beginning of a rapid decline in independence. Existing solutions fall into two flawed camps: invasive wearables that users forget to charge, or "dumb" cameras that trigger frantic false alarms because a user sat down too quickly.
We were inspired by the concept of "Vibe Engineering": creating a system that doesn't just watch, but understands the environment. We wanted to build a "Sentinel" that respects privacy while providing the diagnostic precision of a human observer.
What it does
Sentinel AI is a privacy-first, multimodal fall detection engine. Unlike traditional systems, it doesn't just look for a body on the floor. It employs a Temporal Voting Tribunal to analyze three distinct "expert" perspectives:
The Physics Expert: Monitors joint angles (Hip/Knee) and velocity (dY).
The Detective: An XGBoost model calculating fall probability based on historical patterns.
The Listener: An Audio Spectrogram Transformer (AST) that "hears" the specific acoustic signature of a fall impact.
The Empath (the response layer): We integrated ElevenLabs to give the Sentinel a voice. Instead of a jarring alarm, the system uses an empathetic, human-like voice to ask, "I detected a fall, are you okay?" This reduces panic for the elderly user while the system awaits a response.
When these three perspectives align, Sentinel AI triggers an emergency response, virtually eliminating the "cry wolf" syndrome of modern home security.
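The tribunal's consensus check can be sketched as a simple vote over the three expert scores. The function name, thresholds, and 3-of-3 requirement below are illustrative assumptions, not the project's actual code:

```python
def tribunal_verdict(physics_score, xgb_prob, audio_prob,
                     threshold=0.7, votes_needed=3):
    """Return True only when enough experts agree a fall occurred.

    physics_score: 0-1 score from the joint-angle/velocity rules
    xgb_prob:      fall probability from the XGBoost classifier
    audio_prob:    impact probability from the audio transformer
    (threshold and votes_needed are illustrative values)
    """
    votes = sum(score >= threshold
                for score in (physics_score, xgb_prob, audio_prob))
    return votes >= votes_needed

# All three experts agree -> the alert fires
print(tribunal_verdict(0.9, 0.85, 0.8))   # True
# Only vision fires (e.g. user sat down quickly) -> no alert
print(tribunal_verdict(0.9, 0.2, 0.1))    # False
```

Requiring unanimity is what suppresses the "cry wolf" alarms: a sitting motion may fool the vision channel, but it produces no impact sound.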
How we built it
We leaned heavily into Speed Building and modular AI architecture:
Vision Core: Integrated MediaPipe for real-time pose estimation, calculating the Geometric State Machine (AR and joint vectors).
Audio Intelligence: Implemented an Audio Spectrogram Transformer (AST) to process environmental sounds, distinguishing a falling body from a dropped object.
Voice Synthesis: We used the ElevenLabs Text-to-Speech API to generate high-fidelity, low-latency audio for the user interaction layer. We specifically chose a calming voice model to ensure the system feels like a companion rather than a surveillance device.
The Brain: A Python Flask backend orchestrates the XGBoost classifier and the sliding-window temporal logic.
Vibe Coding Tools: We utilized Google's Antigravity and Claude Code to rapidly iterate on the frontend dashboard, allowing for real-time visualization of the "Thinking Levels" of our AI agents.
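As a sketch of the Vision Core's geometry, a hip or knee angle can be computed from three pose landmarks. The coordinates below are hypothetical stand-ins for MediaPipe landmark output, and this is a minimal illustration rather than the project's state machine:

```python
import math

def joint_angle(a, b, c):
    """Angle at point b (in degrees) between segments b->a and b->c.
    Points are (x, y) tuples, e.g. hip-knee-ankle for the knee angle."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to guard against floating-point drift before acos
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

# Hip, knee, and ankle roughly collinear -> a straight leg, ~180 degrees
print(round(joint_angle((0.5, 0.4), (0.5, 0.6), (0.5, 0.8))))  # 180
```

The same landmark stream yields the vertical velocity signal (dY) by differencing a landmark's y coordinate across consecutive frames.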
Challenges we ran into
The primary hurdle was temporal consistency. A fall isn't a single frame; it's a sequence. We initially struggled with "pose jitter," where a fast sitting motion looked like a fall. We solved this by building a Temporal Voting Tribunal, requiring the AI to maintain a high-confidence "Fall" state across a specific sliding window of time before alerting authorities.
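The sliding-window requirement can be illustrated with a small debounce buffer: an alert fires only once the fall state persists across most of a window of recent frames. Class name, window size, and thresholds here are illustrative assumptions:

```python
from collections import deque

class TemporalDebouncer:
    """Suppress single-frame 'pose jitter' by requiring a sustained
    high-confidence fall state across a sliding window of frames."""

    def __init__(self, window=15, min_hits=12, threshold=0.7):
        self.buffer = deque(maxlen=window)  # True/False per frame
        self.min_hits = min_hits
        self.threshold = threshold

    def update(self, fall_confidence):
        """Feed one frame's confidence; return True when an alert should fire."""
        self.buffer.append(fall_confidence >= self.threshold)
        return sum(self.buffer) >= self.min_hits

deb = TemporalDebouncer(window=5, min_hits=4, threshold=0.7)
print(deb.update(0.9))          # False -- one jittery frame is ignored
for conf in (0.8, 0.85, 0.9):   # sustained fall state fills the window
    deb.update(conf)
print(deb.update(0.95))         # True -- the alert now fires
```

A brief spike from sitting down fast never accumulates enough hits, while a genuine fall, which keeps the person on the floor, quickly saturates the window.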
Accomplishments that we're proud of
Multimodal Synergy: Successfully syncing the Audio Spectrogram with Vision data at near-zero latency.
Privacy-First Design: The system processes data locally, ensuring that no sensitive video feed ever leaves the home environment; only the "Alert" signal does.
High Precision: Achieving a significantly lower false-positive rate compared to standard pose-estimation models.
What we learned
We learned that "More Data" isn't the solution; "Better Context" is. By combining physics-based rules (Geometric State Machine) with deep learning (AST), we created a system that is more robust than the sum of its parts. We also realized the power of Vibe Engineering: by using AI-assisted coding tools, we spent less time debugging syntax and more time refining the "vibe" and accuracy of our detection logic.
What's next for Sentinel AI
Gemini Live Integration: We aim to use the Gemini Live API to create a "Real-Time Teacher" for the system, where it can ask the user, "I heard a loud noise, are you okay?" and process the voice response to cancel false alarms.
Edge Deployment: Porting the AST and XGBoost models to lightweight hardware such as an NVIDIA Jetson or Raspberry Pi for truly decentralized monitoring (if sponsored).
Predictive Analytics: Moving from detecting falls to predicting them by analyzing gait changes over time.