Inspiration
Personal safety is a universal concern. In moments of sudden panic or danger, the simple act of reaching for a phone, unlocking it, and dialing emergency services can be nearly impossible. We were inspired by a critical question: What if your device could act as a proactive, private guardian?
We wanted to leverage the power of advanced, on-device AI to create an always-on safety assistant, one that doesn't compromise user privacy by streaming sensitive, continuous audio to the cloud. The vision was to build a system that could intelligently "listen" to the environment, understand the emotional and verbal context of a situation, and automatically coordinate life-saving help when you cannot do it yourself.
How we built it
GuardianAI is built upon a sophisticated, decentralized multi-agent architecture. It runs primarily on-device and is divided into three core intelligent agents, all managed through a unified, interactive dashboard:
- The Observer Agent: This agent securely processes real-time microphone streams. It utilizes a local ASR (Automatic Speech Recognition) model, specifically a local instance of Whisper, to generate continuous transcripts. Beyond just words, it analyzes vocal inflections to gauge real-time emotional stress, anger, and panic levels.
- The Context Analyzer Agent: Working in tandem with the Observer, this agent assesses the external environment. It takes the parsed transcripts and the user's current GPS coordinates, running them against a multi-lingual threat dictionary and a local threat classifier (toxic-bert). Simultaneously, it queries for nearby emergency services relevant to the specific threat detected.
- The Orchestrator Agent: This is the central decision-making hub. It fuses real-time data from the Observer and Context Analyzer to calculate a final Risk Assessment Score.
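As a rough illustration of the Context Analyzer's dictionary lookup, the sketch below scores a transcript against a small phrase table. The entries, severity values, and the names `THREAT_DICTIONARY`, `normalize`, and `threat_severity` are all hypothetical; the project's actual dictionary and matching rules are not described here, and the toxic-bert classifier is left out entirely.

```python
import re
import unicodedata

# Hypothetical entries: phrase -> severity in [0, 1].
# The project's real multi-lingual dictionary is not published.
THREAT_DICTIONARY = {
    "kidnap": 0.9,
    "kill": 1.0,
    "help me": 0.6,
    "secuestro": 0.9,  # Spanish: kidnapping
}


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents)
    return " ".join(no_punct.split())


def threat_severity(transcript: str) -> float:
    """Return the highest severity of any dictionary phrase found, else 0.0."""
    text = normalize(transcript)
    hits = [
        severity
        for phrase, severity in THREAT_DICTIONARY.items()
        # \b anchors the phrase to the start of a word, so "kidnapped"
        # matches "kidnap" but "skills" does not match "kill".
        if re.search(r"\b" + re.escape(normalize(phrase)), text)
    ]
    return max(hits, default=0.0)
```

Prefix matching at word boundaries is a deliberate simplification: it tolerates inflected forms such as "kidnapped" but will also flag benign words that merely start with a listed phrase, which is one reason a classifier alongside the dictionary is useful.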
To determine the severity of a situation, the Orchestrator evaluates the risk $\mathcal{R}$ at any given time $t$ using a weighted mathematical model:
$$ \mathcal{R}(t) = \alpha \cdot \mathcal{E}(t) + \beta \cdot \mathcal{T}(t) + \gamma \cdot \mathcal{V}(t) $$
Where:
- $\mathcal{E}(t)$ is the emotional distress score (from the Observer).
- $\mathcal{T}(t)$ is the verbal threat severity (from the Context Analyzer).
- $\mathcal{V}(t)$ is contextual vulnerability (e.g., proximity to safe zones).
- $\alpha, \beta, \gamma$ are tunable weights such that $\alpha + \beta + \gamma = 1$.
If $\mathcal{R}(t)$ exceeds a critical threshold $\tau$, the Orchestrator bypasses manual confirmation and automatically dispatches alerts to the identified local authorities.
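A minimal sketch of this scoring rule follows. It assumes each component score is normalized to $[0, 1]$, and the default weights and the threshold value 0.75 are illustrative placeholders rather than the tuned values used in GuardianAI; the names `RiskWeights`, `risk_score`, and `should_auto_dispatch` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskWeights:
    """Weights alpha, beta, gamma. The defaults are illustrative only."""

    alpha: float = 0.4  # weight on emotional distress E(t)
    beta: float = 0.4   # weight on verbal threat severity T(t)
    gamma: float = 0.2  # weight on contextual vulnerability V(t)

    def __post_init__(self) -> None:
        # Enforce the model's constraint alpha + beta + gamma = 1.
        if abs(self.alpha + self.beta + self.gamma - 1.0) > 1e-9:
            raise ValueError("weights must sum to 1")


def risk_score(
    distress: float,
    threat: float,
    vulnerability: float,
    weights: RiskWeights = RiskWeights(),
) -> float:
    """Compute R(t) = alpha*E(t) + beta*T(t) + gamma*V(t)."""
    inputs = (("distress", distress), ("threat", threat), ("vulnerability", vulnerability))
    for name, value in inputs:
        # Assumption: every component score is already scaled to [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return (
        weights.alpha * distress
        + weights.beta * threat
        + weights.gamma * vulnerability
    )


def should_auto_dispatch(score: float, tau: float = 0.75) -> bool:
    """True when R(t) exceeds the critical threshold tau (placeholder value)."""
    return score > tau
```

With these placeholder values, $\mathcal{E} = 0.9$, $\mathcal{T} = 1.0$, and $\mathcal{V} = 0.5$ give $\mathcal{R} = 0.36 + 0.40 + 0.10 = 0.86$, which exceeds the example threshold. Because the weights sum to 1 and the inputs are assumed to lie in $[0, 1]$, $\mathcal{R}(t)$ also stays in $[0, 1]$, so a single $\tau$ remains meaningful when the weights are retuned.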
Challenges we faced
- Edge-Computing Constraints: Running robust AI models (Whisper and toxic-bert) entirely locally is incredibly resource-intensive. We faced significant challenges with battery drain and thermal throttling. We had to implement strict battery and thermal profiling scripts, eventually developing a hybrid runtime fallback mechanism to maintain system stability without crashing the user's device.
- ASR Accuracy in High-Stress Scenarios: During testing, we noticed the ASR struggled to accurately transcribe critical keywords (like "kidnap" or "kill") when spoken rapidly or distorted by panic. We had to invest heavily in refining a custom, multi-lingual threat dictionary and adjusting the audio preprocessing pipeline to isolate voice frequencies from background noise.
- Real-time Agent Synchronization: Managing the asynchronous flow of data between three distinct agents, and ensuring that the Orchestrator always had the most up-to-date emotional and contextual data before making a life-or-death decision, required complex state management and rigorous race-condition testing in our JavaScript logic.
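One common way to guard against stale or out-of-order agent data is to timestamp every reading and let the Orchestrator decide only when all inputs are fresh. The sketch below illustrates that idea in Python for brevity; it is not the project's JavaScript implementation, and the class name `LatestReadings`, the agent labels, and the 2-second freshness window are assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Reading:
    value: float
    timestamp: float  # seconds on a monotonic clock


class LatestReadings:
    """Holds the newest reading per agent and hands out fresh snapshots only."""

    def __init__(self, max_age_s: float = 2.0) -> None:
        self._latest: Dict[str, Reading] = {}
        self._max_age_s = max_age_s

    def publish(self, agent: str, value: float, now: Optional[float] = None) -> None:
        """Store a reading unless a newer one from the same agent already exists."""
        stamp = time.monotonic() if now is None else now
        current = self._latest.get(agent)
        if current is None or stamp >= current.timestamp:
            self._latest[agent] = Reading(value, stamp)

    def fresh_snapshot(
        self, agents: Iterable[str], now: Optional[float] = None
    ) -> Optional[Dict[str, float]]:
        """Return every requested value, or None if any is missing or stale."""
        current_time = time.monotonic() if now is None else now
        snapshot: Dict[str, float] = {}
        for agent in agents:
            reading = self._latest.get(agent)
            if reading is None or current_time - reading.timestamp > self._max_age_s:
                return None
            snapshot[agent] = reading.value
        return snapshot
```

In a single-threaded event loop, each method runs to completion without interleaving, so a snapshot is internally consistent; the timestamps handle the remaining risk that one agent's value is much older than another's. Returning `None` forces the caller to wait or fall back explicitly instead of scoring risk on outdated inputs.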
What we learned
Building GuardianAI was a masterclass in edge AI and privacy-preserving application design. We learned the intricate realities of deploying machine learning models in constrained environments, specifically the delicate balance between inference speed, accuracy, and device temperature.
More importantly, working on the multi-agent architecture deepened our understanding of sensor fusion. We learned how to combine disparate data streams (raw audio, semantic meaning, emotional tone, and geographic location) into a single, cohesive, and actionable safety protocol. We walked away with a profound appreciation for how local, decentralized AI can be harnessed for social good.
Built With
- api
- bert
- css3
- html5
- javascript
- onrender
- python
- transformers
- whisper