Inspiration
Emergency situations demand immediate action, but panic often paralyzes bystanders who lack medical training. Every year, thousands of lives are lost not because help wasn't nearby, but because people didn't know what to do. We asked ourselves: What if a device could watch over someone, detect when they're in trouble, and calmly guide a bystander through exactly what to do, hands-free, in real time?
We were inspired by the gap between emergency detection and emergency response. Existing solutions like fall detection in smartwatches only alert—they don't guide. ERGO bridges that gap by not just detecting distress, but actively coaching bystanders through life-saving first aid using AI-generated voice instructions.
What it does
ERGO (Emergency Responsive Ground Observer) is a real-time distress detection and emergency guidance system. A Raspberry Pi 5 with a NoIR camera continuously monitors a scene, while a laptop runs YOLOv8 Pose estimation on every frame to detect distress postures such as falls, collapses, fetal positions, hands-on-head signals, and more.
When distress is detected across 5 consecutive frames, an emergency alarm plays through a Bluetooth speaker. A physical panic button on the Pi triggers the AI pipeline:
The scene is captured and analyzed by Google Gemini 2.5 Flash.
ERGO provides anatomical landmark context (sternum for CPR, outer thigh for EpiPen, neck for pulse check).
The medical guidance is converted to natural speech via ElevenLabs TTS.
Audio is played wirelessly through the speaker to guide any person nearby.
The alarm automatically stops when the person is no longer in distress, and the system seamlessly returns to monitoring mode.
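The landmark step above is pure geometry on the pose keypoints. A minimal illustration, assuming COCO keypoint ordering (index 0 = nose, 5/6 = shoulders, 11/12 = hips, 13 = knee); the anatomical approximations and prompt wording here are our assumptions, not ERGO's exact code:

```python
# Hypothetical sketch: derive medical landmark coordinates from COCO-order
# pose keypoints and fold them into a prompt for the multimodal model.

def midpoint(a, b):
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

def medical_landmarks(kpts):
    """kpts: list of (x, y) pairs in COCO order."""
    shoulders = midpoint(kpts[5], kpts[6])
    sternum = shoulders                          # CPR compression site (approx.)
    neck = midpoint(shoulders, kpts[0])          # pulse-check site: between nose and shoulders
    outer_thigh = midpoint(kpts[11], kpts[13])   # EpiPen site: between hip and knee
    return {"sternum": sternum, "neck": neck, "outer_thigh": outer_thigh}

def build_prompt(landmarks):
    lines = [f"- {name}: pixel ({x:.0f}, {y:.0f})" for name, (x, y) in landmarks.items()]
    return ("A person appears to be in distress. Using these landmark "
            "locations in the image, give step-by-step first-aid guidance:\n"
            + "\n".join(lines))
```

Passing pixel coordinates alongside the captured frame lets the model anchor its spoken instructions ("place your hands here") to the actual scene.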
How we built it
We built ERGO as an edge-cloud hybrid system:
Edge (Raspberry Pi 5): Handles camera streaming (MJPEG over SSH), GPIO button input via gpiozero, and Bluetooth audio output through PulseAudio. The Pi runs headless and communicates entirely over persistent SSH connections using Paramiko, eliminating the ~3 second handshake overhead on every operation.
Compute (Laptop): Runs the heavy-lifting components: YOLOv8n-Pose for real-time pose estimation (~12ms per frame), a custom distress detection engine with 7 distinct posture checks, and anatomical landmark mapping for medical sites. It also orchestrates the Gemini 2.5 Flash multimodal analysis and ElevenLabs synthesis.
State Machine: The core application runs a three-state machine (MONITORING, ALARM, PIPELINE) with debounced detection, automatic recovery, and a 5-second button cooldown to prevent duplicate triggers.
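The control flow above can be sketched as a small class; constants and method names are illustrative, not ERGO's actual identifiers:

```python
import time
from enum import Enum, auto

class State(Enum):
    MONITORING = auto()
    ALARM = auto()
    PIPELINE = auto()

DEBOUNCE_FRAMES = 5      # consecutive distress frames before alarming
BUTTON_COOLDOWN_S = 5.0  # ignore repeat button presses within this window

class ErgoStateMachine:
    def __init__(self):
        self.state = State.MONITORING
        self.distress_streak = 0
        self.last_button = -BUTTON_COOLDOWN_S

    def on_frame(self, distress: bool):
        if self.state == State.MONITORING:
            # debounce: only 5 consecutive distress frames trigger the alarm
            self.distress_streak = self.distress_streak + 1 if distress else 0
            if self.distress_streak >= DEBOUNCE_FRAMES:
                self.state = State.ALARM
        elif self.state == State.ALARM and not distress:
            # auto-recovery: silence the alarm and resume monitoring
            self.state = State.MONITORING
            self.distress_streak = 0

    def on_button(self, now=None):
        now = time.monotonic() if now is None else now
        if now - self.last_button < BUTTON_COOLDOWN_S:
            return False  # duplicate press inside the cooldown window
        self.last_button = now
        self.state = State.PIPELINE  # hand off to the Gemini + TTS pipeline
        return True
```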
The entire system is written in Python and connected over a mobile hotspot, making it truly portable.
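As an example of the geometric posture checks, a fall/lying-down test needs nothing beyond the torso keypoints. This is one plausible version of the idea (COCO indices assumed), not the project's exact rule:

```python
# Illustrative fall check from COCO-order keypoints: a torso that is more
# horizontal than vertical suggests the person is lying on the ground.
def is_lying_down(kpts, ratio=1.2):
    """kpts: (x, y) pairs in COCO order (5/6 shoulders, 11/12 hips)."""
    shoulder_mid = ((kpts[5][0] + kpts[6][0]) / 2, (kpts[5][1] + kpts[6][1]) / 2)
    hip_mid = ((kpts[11][0] + kpts[12][0]) / 2, (kpts[11][1] + kpts[12][1]) / 2)
    dx = abs(shoulder_mid[0] - hip_mid[0])  # horizontal torso extent
    dy = abs(shoulder_mid[1] - hip_mid[1])  # vertical torso extent
    return dx > ratio * dy
```

Checks like hands-on-head (wrists above the nose) or fetal position (knees near the chest) follow the same pattern of midpoints, distances, and ratios.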
Challenges we ran into
University WiFi Portal: The Raspberry Pi 5 runs headless, so we couldn't authenticate through the university’s web-based captive portal. We solved this by switching to a mobile phone hotspot, which had the added benefit of making the system truly portable.
Audio Routing on Pi: Getting audio to play through Bluetooth was a multi-step battle. We moved from ALSA and aplay to mpg123, eventually settling on PulseAudio with the Bluetooth module to ensure reliable streaming to the external speaker.
Alarm Termination: Our alarm repeated via a bash while-true loop, so a simple pkill on the audio player would cause the loop to instantly respawn it. We had to refactor the shutdown logic to kill both the loop and the player process simultaneously.
SSH Connection Overhead: Opening a new connection for every command introduced a 3-second lag—unacceptable in an emergency. We refactored to persistent Paramiko connections (one for the alarm, one for the button, one for the pipeline) to make operations near-instant.
SD Card Corruption: Switching from wall power to power banks caused unexpected reboots, leading to corrupted SD cards. We learned the hard way that robust power management and clean shutdowns are non-negotiable for Pi-based projects.
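The alarm-termination fix boils down to signalling the whole process group rather than a single process. A minimal POSIX sketch of the pattern (the command string is a placeholder for the real alarm loop):

```python
import os
import signal
import subprocess

def start_alarm_loop(cmd="while true; do sleep 0.2; done"):
    # start_new_session puts the shell loop (and any player it spawns)
    # into a fresh process group whose pgid equals the shell's pid
    return subprocess.Popen(["sh", "-c", cmd], start_new_session=True)

def stop_alarm(proc):
    # signal the whole group: the loop cannot respawn the player, because
    # the loop itself receives the signal at the same moment
    os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
    proc.wait(timeout=5)
```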
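The persistent-connection refactor follows a standard Paramiko pattern: authenticate once, then reuse the transport for every command. A hedged sketch (host and credentials are placeholders; assumes paramiko is installed):

```python
try:
    import paramiko  # third-party: pip install paramiko
except ImportError:  # keep the sketch importable even without paramiko
    paramiko = None

class PersistentSSH:
    """One long-lived SSH session per role (alarm, button, pipeline):
    the TCP + SSH handshake (~3 s) is paid once, not on every command."""

    def __init__(self, host, user, password):
        self.client = paramiko.SSHClient()
        self.client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        self.client.connect(host, username=user, password=password)

    def run(self, command):
        # exec_command opens a new channel on the existing transport,
        # so no fresh handshake is needed
        _, stdout, stderr = self.client.exec_command(command)
        return stdout.read().decode(), stderr.read().decode()

    def close(self):
        self.client.close()
```

Keeping one connection per concern (alarm, button, pipeline) also means a slow pipeline command never blocks the latency-critical button channel.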
Accomplishments that we're proud of
End-to-End Pipeline: Achieved a flow from visual distress detection to spoken medical guidance in seconds, fully automated.
Geometric Posture Analysis: Detected 7 distinct distress postures using only geometric analysis of YOLO keypoints—zero external training data required.
Medical Specificity: Implemented anatomical landmark mapping that provides Gemini with precise coordinates for CPR, EpiPen, and pulse check locations.
Intelligent Auto-Recovery: The system identifies when a person has recovered and automatically silences the alarm without manual intervention.
True Portability: The system is fully functional using only a power bank and a mobile hotspot.
What we learned
Linux Audio Stacks: The journey from ALSA to PulseAudio to Bluetooth sinks provided a deep dive into how Linux handles hardware peripherals.
Latency Optimization: We learned that persistent SSH connections are the backbone of responsive IoT systems.
Reliability vs. API Dependency: We learned to pre-generate static assets (like the alarm sound) to avoid API rate limits or network failures during critical emergency loops.
The Power of Keypoints: We discovered that pose estimation keypoints are incredibly versatile; by computing midpoints and ratios, we turned a standard computer vision model into a specialized medical tool.
What's next for ERGO
On-Device Inference: Porting YOLOv8 to run directly on the Pi 5 (e.g., on a Hailo NPU via the Raspberry Pi AI Kit) to eliminate the need for a laptop entirely.
Multi-Person Tracking: Extending the logic to assess multiple people in a single frame for crowded environments.
Wearable Form Factor: Miniaturizing the tech into a body-worn camera + earpiece for elderly care or security personnel.
Emergency Dispatch Integration: Automatically notifying emergency services via Twilio with the GPS location and the AI-assessed severity of the scene.
Training Mode: Adding a "Practice Mode" where users can rehearse CPR or the recovery position with real-time pose feedback.
