AURA — AI Vision for Every Student and lifeguardian

1. SYSTEM OVERVIEW

LifeGuardianAI is an autonomous, real-time safety monitoring system designed to protect vulnerable populations (elderly individuals and children) through continuous AI-powered surveillance. The system combines multimodal AI (vision + audio), voice interaction, and intelligent escalation protocols to detect emergencies and automatically trigger appropriate interventions.

Core Objectives:
• Elder Safety: Detect medical emergencies (heart attacks, falls, incapacitation)
• Child Safety: Identify dangerous objects (knives, fire) and hazardous behaviors
• Autonomous Response: Voice warnings, emergency calls, family notifications
• Low-latency Operation: Real-time video analysis at 3 FPS with spoken feedback within a few hundred milliseconds

2. SYSTEM COMPONENTS

The architecture follows a modular, event-driven design with clear separation of concerns:

2.1 Frontend Layer (React + TypeScript)

Dashboard Component (Dashboard.tsx)
• Role: System orchestrator and UI controller
• Responsibilities:
  • Lifecycle management (START/STOP monitoring)
  • GPS location acquisition via Browser Geolocation API
  • Log aggregation and visualization
  • Emergency call UI triggers
  • State management for monitoring modes (IDLE, MONITORING, ALERT)

VideoMonitor Component (VideoMonitor.tsx)
• Role: Webcam interface and frame processor
• Responsibilities:
  • Captures live video feed via getUserMedia() API
  • Renders at 640x480 resolution
  • Extracts JPEG frames at 3 FPS
  • Base64 encoding for Gemini transmission
  • Visual overlays (recording indicators, emergency dialing animation)

2.2 AI Service Layer

GeminiLiveService (geminiLiveService.ts)
• Role: Core AI reasoning and multimodal processing engine
• Architecture: WebSocket-based bidirectional streaming
• Key Features:
  • Model: gemini-2.5-flash-native-audio-preview-09-2025
  • Input Modalities: Video frames (JPEG) + PCM audio (16kHz)
  • Output Modality: Spoken audio (24kHz, Kore voice)
  • System Instructions: Dynamic, location-aware prompts

AudioUtils (audioUtils.ts)
• Role: Audio codec and format conversion
• Functions:
  • PCM Float32 → Int16 conversion
  • Base64 encoding/decoding
  • Audio buffer reconstruction for playback
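The Float32 → Int16 step listed above can be sketched as follows. This is a minimal illustration, not the project's actual audioUtils.ts: the function names floatTo16BitPCM and int16ToFloat32 are hypothetical, and Base64 encoding is left out.

```typescript
// Hypothetical helpers; the real audioUtils.ts may be structured differently.

/** Convert Web Audio samples in [-1, 1] to 16-bit signed PCM. */
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const output = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, input[i]));
    // Int16 is asymmetric: negatives scale by 32768, positives by 32767.
    output[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return output;
}

/** Convert 16-bit signed PCM back to floats for Web Audio playback. */
function int16ToFloat32(input: Int16Array): Float32Array {
  const output = new Float32Array(input.length);
  for (let i = 0; i < input.length; i++) {
    output[i] = input[i] / 0x8000;
  }
  return output;
}
```

Clamping before scaling matters because microphone processing can occasionally produce samples slightly outside [-1, 1].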

2.3 Type System (types.ts)

Defines the contract for incident logging, system states, and call management:

• GuardianMode: IDLE | MONITORING | ALERT
• IncidentLog: Structured event records (FALL, HAZARD, MEDICAL, CALL, WHATSAPP)
• CallInfo: Tracks emergency call state and recipient
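One plausible shape for these types is sketched below. Only the type names, the mode values, the incident categories, and the CallInfo fields isActive, recipient and status appear in this write-up; the remaining fields and the makeIncidentLog helper are assumptions added for illustration.

```typescript
type GuardianMode = "IDLE" | "MONITORING" | "ALERT";

type IncidentType = "FALL" | "HAZARD" | "MEDICAL" | "CALL" | "WHATSAPP";

interface IncidentLog {
  id: string;        // assumed field
  timestamp: number; // assumed field: epoch milliseconds
  type: IncidentType;
  message: string;   // assumed field: human-readable description
}

interface CallInfo {
  isActive: boolean;
  recipient: string; // e.g. "911" or "PARENTS"
  status: string;    // 'DIALING' is the only value shown in this write-up
}

/** Hypothetical helper that builds a log entry with a deterministic id. */
function makeIncidentLog(type: IncidentType, message: string, now: number): IncidentLog {
  return { id: `${now}-${type}`, timestamp: now, type, message };
}
```

Passing the timestamp in as an argument, rather than calling Date.now() inside the helper, keeps log creation easy to test.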

3. GOOGLE AI STUDIO (GEMINI) INTEGRATION

3.1 Why Gemini 2.5 Flash?
• Multimodal Native Processing: Simultaneously ingests video + audio without separate pipelines
• Low Latency: Optimized for real-time applications (<500ms response time)
• Advanced Vision Reasoning: Can distinguish between similar objects (e.g., comb vs. knife) through context
• Live Audio Synthesis: Responds with natural spoken voice (no separate TTS step)

3.2 Connection Architecture

[Image: connection architecture diagram]

3.3 System Instructions (AI Behavior Programming)

The AI receives dynamic system instructions that include:

1. Role Definition: "AUTONOMOUS SECURITY CAMERA & EMERGENCY DISPATCH AI"
2. Location Context: GPS coordinates embedded in prompt
3. Priority Rules: Medical emergencies (chest clutching) override all other detections
4. Decision Trees:
  • PHASE 1: Immediate voice intervention ("Are you in pain?")
  • PHASE 2: 5-second silence check (assumes unconsciousness)
  • PHASE 3: Emergency call trigger with scripted dialogue
5. Tag Protocol: AI outputs structured tags (e.g., [DIALING: 911]) that trigger UI actions
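A location-aware prompt of this kind could be assembled roughly as shown here. The function name buildSystemInstruction, the LatLng type, and the exact wording of each line are hypothetical; the project's real prompt text is not reproduced in this write-up.

```typescript
interface LatLng {
  latitude: number;
  longitude: number;
}

/** Build the system instruction string; falls back to "unknown" without GPS. */
function buildSystemInstruction(coords: LatLng | null): string {
  const location = coords
    ? `${coords.latitude.toFixed(5)}, ${coords.longitude.toFixed(5)}`
    : "unknown";
  return [
    "ROLE: AUTONOMOUS SECURITY CAMERA & EMERGENCY DISPATCH AI",
    `LOCATION: ${location}`,
    "PRIORITY: Medical emergencies (chest clutching) override all other detections.",
    'PHASE 1: Speak immediately, for example "Are you in pain?"',
    "PHASE 2: If there is no reply within 5 seconds, assume unconsciousness.",
    "PHASE 3: Output [DIALING: 911] and deliver the emergency report.",
  ].join("\n");
}
```

Because the coordinates are interpolated when the prompt is built, the session would need a fresh instruction string if the location changes.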

4. CLINE'S ROLE IN DEVELOPMENT

Cline, an AI coding assistant, was instrumental in the project's development workflow:

4.1 Development Assistance
• Code Generation: Created React components, TypeScript services, and utility functions
• API Integration: Implemented Gemini Live API WebSocket connection
• Audio Pipeline: Built PCM encoding/decoding system for browser compatibility
• UI/UX Design: Designed the cyberpunk-themed dashboard with real-time status indicators

4.2 Debugging & Optimization
• Browser Compatibility: Resolved AudioContext initialization issues (Safari, Chrome)
• Frame Rate Tuning: Optimized video capture to 3 FPS for bandwidth efficiency
• State Management: Implemented proper React lifecycle for WebSocket connections

4.3 Architectural Decisions
• Modular Service Pattern: Separated AI logic (GeminiLiveService) from UI components
• Event-Driven Communication: Used callbacks for log updates, call triggers, and WhatsApp notifications
• Type Safety: Enforced TypeScript types across the entire codebase

Note: Cline does NOT run during production—it's a development-time assistant only.

5. SAFETY PROTOCOLS

5.1 Elder Safety (Medical Emergencies)

Detection Mechanisms:

Visual Cues:
• Levine's Sign: Hand clutching chest (heart attack indicator)
• Fall Detection: Sudden position change + horizontal body orientation
• Mobility Issues: Prolonged stillness, inability to stand

Audio Cues:
• Groans, cries for help
• Silence after warning (non-responsiveness)

Response Protocol: DETECT → WARN → WAIT (5 sec) → ESCALATE

Example Flow:

1. AI sees hand on chest
2. AI speaks: "You are holding your chest. Are you in pain?"
3. If no verbal response in 5 seconds → Assume unconscious
4. AI triggers: [DIALING: 911]
5. UI shows dialing animation + plays AI-generated emergency call script
6. Log records: "MEDICAL - AI Reporting chest clutching at [GPS coordinates]"

5.2 Child Safety (Hazard Prevention)

Detection Mechanisms:

Object Recognition:
• Knives: Distinguishes between utensils and dangerous weapons via context
• Fire Sources: Matches, lighters, stove flames
• Toxic Substances: Pills, cleaning products (visual label reading)

Behavioral Analysis:
• Child approaching hazardous areas (stove, windows)
• Playing with sharp objects

Response Protocol: DETECT → COMMAND → VERIFY → ESCALATE

Example Flow:

1. AI sees child holding knife
2. AI commands: "Drop the knife immediately!"
3. Wait 3 seconds for compliance
4. If still holding → [DIALING: PARENTS]
5. Log records: "HAZARD - Child refuses to drop sharp object"
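Both protocols share one pattern: warn, wait a fixed time, then escalate if nothing changed. In the design described here that rule lives in the model's system instructions; the sketch below restates it as plain client-side code purely for illustration. The names GuardianProtocol, PROTOCOLS and nextStep are hypothetical, while the wait times and recipients come from the two example flows.

```typescript
type GuardianProtocol = "ELDER_MEDICAL" | "CHILD_HAZARD";

interface ProtocolConfig {
  waitMs: number;    // how long to wait after the spoken warning
  recipient: string; // who is dialed on escalation
}

const PROTOCOLS: Record<GuardianProtocol, ProtocolConfig> = {
  ELDER_MEDICAL: { waitMs: 5000, recipient: "911" },
  CHILD_HAZARD: { waitMs: 3000, recipient: "PARENTS" },
};

type Step =
  | { kind: "RESOLVED" }
  | { kind: "WAIT"; remainingMs: number }
  | { kind: "ESCALATE"; tag: string };

/**
 * Decide what to do after a warning has been spoken.
 * `resolved` means the person answered or the child dropped the object.
 */
function nextStep(protocol: GuardianProtocol, elapsedMs: number, resolved: boolean): Step {
  if (resolved) return { kind: "RESOLVED" };
  const { waitMs, recipient } = PROTOCOLS[protocol];
  if (elapsedMs < waitMs) return { kind: "WAIT", remainingMs: waitMs - elapsedMs };
  return { kind: "ESCALATE", tag: `[DIALING: ${recipient}]` };
}
```

For example, nextStep("CHILD_HAZARD", 3000, false) yields an ESCALATE step carrying the tag [DIALING: PARENTS].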

6. DETECTION → DECISION → VOICE → ESCALATION PIPELINE

6.1 Detection Stage (Vision + Audio Analysis)

Input Sources:

• Video: 3 JPEG frames per second (640×480 @ 70% quality)
• Audio: Continuous PCM stream (16kHz mono)

Processing:

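    // Frame Capture (VideoMonitor.tsx)
    // Draw the current video frame onto a canvas, then export it as a JPEG at 70% quality.
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    // toDataURL returns "data:image/jpeg;base64,<payload>"; keep only the payload.
    const base64 = canvas.toDataURL('image/jpeg', 0.7).split(',')[1];
    geminiService.sendVideoFrame(base64);

    // Audio Capture (geminiLiveService.ts)
    // Wrap the 16kHz microphone samples and stream them to the live session.
    const pcmBlob = pcmToGeminiBlob(inputData, 16000);
    session.sendRealtimeInput({ media: pcmBlob });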

AI Analysis:

• Gemini processes frames in real-time
• Identifies: body postures, objects, facial expressions, environmental hazards
• Listens for: verbal commands, distress sounds, silence duration

6.2 Decision Stage (AI Reasoning)

Decision Tree Implementation:

    IF (hand_on_chest) THEN
        priority = CRITICAL
        action = IMMEDIATE_INTERVENTION
    ELSE IF (knife_detected) THEN
        priority = HIGH
        action = VERBAL_WARNING
    ELSE IF (fall_detected) THEN
        priority = HIGH
        action = CHECK_RESPONSIVENESS
    ELSE
        priority = NORMAL
        action = CONTINUE_MONITORING
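The same decision tree can be written as a small typed function. This is an illustrative translation of the pseudocode only: the Detections interface and the decide function are hypothetical names, and in the described system the model applies these rules through its prompt rather than through client code.

```typescript
interface Detections {
  handOnChest: boolean;
  knifeDetected: boolean;
  fallDetected: boolean;
}

type GuardianPriority = "CRITICAL" | "HIGH" | "NORMAL";

type GuardianAction =
  | "IMMEDIATE_INTERVENTION"
  | "VERBAL_WARNING"
  | "CHECK_RESPONSIVENESS"
  | "CONTINUE_MONITORING";

interface Decision {
  priority: GuardianPriority;
  action: GuardianAction;
}

/** Branch order encodes precedence: chest clutching overrides everything else. */
function decide(d: Detections): Decision {
  if (d.handOnChest) {
    return { priority: "CRITICAL", action: "IMMEDIATE_INTERVENTION" };
  }
  if (d.knifeDetected) {
    return { priority: "HIGH", action: "VERBAL_WARNING" };
  }
  if (d.fallDetected) {
    return { priority: "HIGH", action: "CHECK_RESPONSIVENESS" };
  }
  return { priority: "NORMAL", action: "CONTINUE_MONITORING" };
}
```

Since the branches are checked in order, a frame showing both a knife and a hand on the chest is handled as a medical emergency.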

Context-Aware Logic:

• Time-Based Rules: 5-second silence = unconsciousness assumption
• Location-Aware: Includes GPS in emergency reports
• History Tracking: Repeated warnings escalate faster

6.3 Voice Interaction Stage (Spoken Responses)

Audio Generation:

• Model: Gemini native audio synthesis (Kore voice)
• Latency: ~200-400ms from detection to first audio byte
• Playback: Web Audio API with queued buffers for smooth streaming
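Queued playback usually comes down to one piece of bookkeeping: each incoming chunk is scheduled to start exactly when the previous one ends, or immediately if the queue has run dry. The sketch below isolates that arithmetic. The PlaybackScheduler class is a hypothetical name and not the project's code; the 24000 default matches the output sample rate stated earlier.

```typescript
/** Computes gapless start times for a stream of PCM chunks. */
class PlaybackScheduler {
  // Time (in seconds) at which the most recently queued chunk finishes.
  private nextStartTime = 0;

  /**
   * @param now         current audio clock time in seconds
   * @param sampleCount number of samples in the incoming chunk
   * @param sampleRate  samples per second of the incoming audio
   * @returns the time at which this chunk should begin playing
   */
  schedule(now: number, sampleCount: number, sampleRate = 24000): number {
    const duration = sampleCount / sampleRate;
    // If the queue ran dry, start right away; otherwise append to the queue.
    const startAt = Math.max(now, this.nextStartTime);
    this.nextStartTime = startAt + duration;
    return startAt;
  }
}
```

In a browser, `now` would typically be AudioContext.currentTime and the returned value would be passed to AudioBufferSourceNode.start(), so consecutive chunks play back to back without clicks or gaps.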

Voice Scripts:

[Image: voice scripts table]

6.4 Escalation Stage (Multi-Channel Alerts)

Escalation Matrix:

[Image: escalation matrix]

Implementation:

    // Tag Detection (geminiLiveService.ts)
    // The square brackets are escaped so they match literally
    // instead of being parsed as a regex character class.
    const callMatch = transcription.match(/\[DIALING:\s*(.*?)\]/);
    if (callMatch) {
      const recipient = callMatch[1].trim(); // "911" or "PARENTS"
      this.onCallTrigger(recipient);
    }

    // UI Trigger (Dashboard.tsx)
    const handleCallTrigger = (recipient: string) => {
      setCallInfo({ isActive: true, recipient, status: 'DIALING' });
      // Shows 15-second emergency call animation
      setTimeout(() => {
        setCallInfo(prev => ({ ...prev, isActive: false }));
      }, 15000);
    };

Notification Channels:

1. Emergency Services (911): Simulated phone call with AI voice report
2. Family Members: WhatsApp message trigger [WHATSAPP: Dad]
3. Event Logs: Persistent, timestamped incident records in UI
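Since both channels are driven by bracketed tags in the model's transcription, one parser can cover [DIALING: ...] and [WHATSAPP: ...] together. The following is a sketch under that assumption; parseTags and ParsedTag are hypothetical names, not identifiers from the project.

```typescript
type TagKind = "DIALING" | "WHATSAPP";

interface ParsedTag {
  kind: TagKind;
  recipient: string;
}

/** Extract every [DIALING: x] and [WHATSAPP: x] tag, in order of appearance. */
function parseTags(transcription: string): ParsedTag[] {
  // Escaped brackets match literally; the lazy group captures the recipient
  // without surrounding whitespace.
  const pattern = /\[(DIALING|WHATSAPP):\s*([^\]]*?)\s*\]/g;
  const tags: ParsedTag[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(transcription)) !== null) {
    const recipient = match[2];
    // Skip tags with an empty recipient such as "[DIALING:]".
    if (recipient.length > 0) {
      tags.push({ kind: match[1] as TagKind, recipient });
    }
  }
  return tags;
}
```

A caller could then route DIALING tags to the call UI and WHATSAPP tags to the notification callback.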

7. DATA FLOW DIAGRAM

[Image: data flow diagram]

8. DEPLOYMENT & RUNTIME SPECIFICATIONS

Technical Stack:
• Frontend: React 19.2 + TypeScript 5.8 + Vite 6.2
• AI SDK: @google/genai (Latest)
• Browser APIs: MediaDevices, Web Audio, Geolocation
• Hosting: Vite dev server (local) / Static hosting (production)

Performance Metrics:
• Video Processing: 3 FPS (333ms per frame)
• Audio Latency: ~200-400ms (detection to voice response)
• Network Bandwidth: ~2-3 Mbps (video + audio streams)
• Memory Usage: ~150-250 MB (browser runtime)
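As a rough cross-check of the bandwidth figure, the arithmetic can be sketched as follows. Several inputs are assumptions rather than measurements from the project: 16-bit samples in both directions, Base64 transport for every stream (which adds a 4/3 overhead), and a hypothetical average JPEG size passed in by the caller.

```typescript
// Base64 encodes every 3 bytes as 4 characters.
const BASE64_OVERHEAD = 4 / 3;

/** Raw PCM bit rate for a mono stream. */
function pcmBitsPerSecond(sampleRateHz: number, bytesPerSample: number): number {
  return sampleRateHz * bytesPerSample * 8;
}

/** Raw bit rate for a sequence of JPEG frames. */
function jpegBitsPerSecond(fps: number, avgFrameBytes: number): number {
  return fps * avgFrameBytes * 8;
}

/** Estimated total traffic in Mbps for a given average frame size (assumed). */
function estimateMbps(avgFrameBytes: number): number {
  const upstreamAudio = pcmBitsPerSecond(16000, 2);   // microphone, 16-bit assumed
  const downstreamAudio = pcmBitsPerSecond(24000, 2); // spoken reply, 16-bit assumed
  const video = jpegBitsPerSecond(3, avgFrameBytes);  // 3 FPS from the write-up
  return ((upstreamAudio + downstreamAudio + video) * BASE64_OVERHEAD) / 1e6;
}
```

With a hypothetical 40 KB average frame, estimateMbps(40000) comes to about 2.13 Mbps, which sits inside the ~2-3 Mbps range listed above. The actual figure depends on scene complexity and on how often the model is speaking.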

Security:
• API Key: Stored in .env.local (never committed to Git)
• Data Privacy: No video/audio storage; streams are ephemeral
• GPS Data: Shared only with Gemini API for location context

9. KEY INNOVATIONS

1. Zero-Human-Loop Emergency Response: AI makes life-saving decisions autonomously
2. Context-Aware Object Recognition: Distinguishes harmless vs. dangerous objects
3. Silence-as-Signal: Interprets lack of response as medical incapacitation
4. Multimodal Fusion: Combines vision + audio for higher accuracy
5. Scriptable AI Behavior: System instructions act as programmable "instincts"

10. SUMMARY

LifeGuardianAI is a production-ready, real-time safety monitoring system built on:

• Google Gemini 2.5 Flash for multimodal AI reasoning
• React + TypeScript for a responsive, type-safe UI
• Web Audio/Video APIs for low-latency browser-based capture
• Event-driven architecture for modular, maintainable code

The system protects vulnerable individuals through continuous surveillance, intelligent hazard detection, and autonomous emergency response—all running entirely in the browser with cloud AI support.

🛡️ LifeGuardianAI

LifeGuardianAI is an AI-powered safety and assistance system designed to protect elderly people living alone and children left at home, by continuously monitoring for dangerous situations and responding intelligently, calmly, and proactively.

🚨 Problem Statement

Many elders live alone without immediate help during emergencies such as falls, chest pain, or sudden immobility. Similarly, children may be left unsupervised while parents work, increasing the risk of accidents involving dangerous objects.

Existing solutions are often:

• Reactive instead of preventive
• Expensive or hardware-dependent
• Emotionless and difficult to interact with

💡 Solution Overview

LifeGuardianAI acts as a digital guardian that can:

• Observe through a camera
• Detect potentially dangerous situations
• Reason about risk levels
• Interact with humans using calm voice prompts
• Escalate intelligently when no response is received

The system is designed to think before acting, avoiding unnecessary panic while ensuring safety.

🧠 How the System Works (High Level)

1. Detection
  • Potential events such as falls, no movement, chest pain indicators, or a child holding a dangerous object are identified.

2. Decision Engine
  • A tier-based decision system evaluates risk: NORMAL → ALERT → WARNING → EMERGENCY

3. Human Interaction
  • The AI speaks calmly to confirm safety.
  • Volume and urgency increase only if there is no response.

4. Escalation
  • Emergency contacts are notified.
  • Emergency services can be contacted if required.

This design prioritizes human-like reasoning over blind automation.
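The tier progression can be pictured as a small state machine. The sketch below is one possible reading, not the project's decision engine: the TierInput values and the rule that any response returns the system to NORMAL are assumptions made for illustration.

```typescript
const TIERS = ["NORMAL", "ALERT", "WARNING", "EMERGENCY"] as const;
type Tier = (typeof TIERS)[number];

// Assumed inputs to the decision engine.
type TierInput = "EVENT_DETECTED" | "NO_RESPONSE" | "RESPONSE_RECEIVED";

function nextTier(current: Tier, input: TierInput): Tier {
  switch (input) {
    case "RESPONSE_RECEIVED":
      // Assumption: a confirmed-safe reply de-escalates fully.
      return "NORMAL";
    case "EVENT_DETECTED":
      // A new event opens an incident; an ongoing incident keeps its tier.
      return current === "NORMAL" ? "ALERT" : current;
    case "NO_RESPONSE": {
      // Silence only matters once an incident is open.
      if (current === "NORMAL") return "NORMAL";
      const index = TIERS.indexOf(current);
      return TIERS[Math.min(index + 1, TIERS.length - 1)];
    }
  }
}
```

Stepping one tier at a time is what keeps the system from jumping straight to an emergency call on a single ambiguous detection.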

🤖 AI Stack

🔹 Google AI Studio (Gemini)

Used as the multimodal AI execution layer:

• Vision understanding (posture, movement, objects)
• Multimodal reasoning
• Language understanding and response generation

🔹 Cline

Used as an AI system architect and reasoning assistant:

• Designing system architecture
• Defining safety tiers and escalation logic
• Validating edge cases and false positives
• Preparing documentation and demo scenarios

Cline was used in Plan Mode to reason about how the AI should think and act, while the runtime logic is implemented separately.

🧩 System Architecture

The system is designed with a clean separation of concerns:

• Perception Layer – Detects events (vision/audio)
• Event Interface – Converts detections into structured events
• Decision Engine – Determines risk level and next action
• Interaction Layer – Voice-based human interaction
• Escalation Layer – Alerts family or emergency services

This modular design allows easy future integration with real-time sensors and external services.

🎭 Demo Scenarios

1️⃣ Elder Fall Scenario

• A fall is detected with high confidence
• AI calmly asks if the person is okay
• No response → escalation through warning and emergency tiers
• Emergency contacts are notified

2️⃣ Child Safety Scenario

• A knife-like object is detected
• AI gently instructs the child to put it down
• Parents are notified immediately

Demo scripts are available in the /demo folder.

🧪 Current Status

• Core architecture and reasoning logic designed
• Event-based decision flow validated
• Demo-ready scenarios prepared
• Future-ready for real-time integration

🚀 Future Work

• Real-time camera and audio streaming
• Direct emergency service integration
• Wearable device support
• Multi-language voice interaction
• Mobile companion app for caregivers

❤️ Why This Matters

LifeGuardianAI is not just about detecting danger — it’s about responding with empathy, intelligence, and responsibility.

By combining AI reasoning with human-centered design, LifeGuardianAI aims to make homes safer, calmer, and more connected.

📌 Hackathon Note

This project emphasizes AI reasoning, safety design, and real-world impact over raw automation, demonstrating how AI systems can act responsibly in sensitive human environments.

PROOF OF CLINE USAGE: [Images: three screenshots]

Built With

react

Updates

Arya Ariya started this project — Jun 01, 2026 06:29 AM EDT
