USF RapidCam – Real-Time Fall and Emergency Detection
Built by a freshman at the University of South Florida (USF)
About the Project Inspiration
Most safety camera systems report incidents only after they occur. I wanted to build a system that could evaluate emergencies as they happen, detect when someone needs help, and escalate automatically without waiting for human review. The goal was to explore how AI, computer vision, and voice interfaces can be combined to improve campus safety.
What I Learned
How to design and coordinate multiple ROS2 nodes for vision, audio, escalation logic, and system monitoring.
How to set up hardware on the Jetson Orin Nano, including CUDA acceleration and GStreamer CSI pipelines.
How to securely manage secret keys, avoid committing .env files, and work around GitHub push protection.
How AI systems can be used to provide real safety benefits.
How to debug real-time, multi-threaded systems where audio, video, and LLM inference must run together without delays.
How I Built It
Implemented pose detection with YOLO11s-pose running on the Jetson Orin Nano GPU.
Built a VoiceAlertNode that listens for phrases such as “help” or “I am fine” to confirm the situation.
Created an LLM Node using Gemini 2.5 to manage escalation dialogues and monitor user responsiveness.
Integrated Twilio for automated emergency phone calls triggered by ROS2 events.
Used ROS2 Humble to orchestrate communication between all nodes.
Implemented a live video stream using a GStreamer CSI pipeline.
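As a sketch of the CSI pipeline step above, here is one common way to build a GStreamer string for the Jetson's nvarguscamerasrc and hand it to OpenCV. The exact pipeline parameters (resolution, flip method) are assumptions, not the project's exact configuration.

```python
# Hypothetical sketch: a GStreamer CSI pipeline string for the Jetson
# Orin Nano's camera, suitable for cv2.VideoCapture with CAP_GSTREAMER.
def csi_pipeline(width=1280, height=720, fps=30, flip=0):
    """Return a GStreamer pipeline string for the Jetson CSI camera."""
    return (
        f"nvarguscamerasrc ! "
        f"video/x-raw(memory:NVMM), width={width}, height={height}, "
        f"framerate={fps}/1 ! "
        f"nvvidconv flip-method={flip} ! "
        f"video/x-raw, format=BGRx ! videoconvert ! "
        f"video/x-raw, format=BGR ! appsink drop=1"
    )

# Usage (requires OpenCV built with GStreamer support, as on the Jetson):
# import cv2
# cap = cv2.VideoCapture(csi_pipeline(), cv2.CAP_GSTREAMER)
```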
Challenges I Faced
Audio device routing, microphone selection, and system configuration on the Jetson Orin Nano.
Synchronizing communication between multiple ROS2 nodes with different timing constraints.
Ensuring Twilio integration remained secure and compatible with push-protection requirements.
Balancing real-time performance across vision, audio, and LLM processes while keeping the system responsive.
How It Works
- Vision Detection
The camera feed runs through a GStreamer CSI pipeline into YOLO11s-pose on the Jetson Orin Nano. The pose model produces keypoints and detects abnormal posture patterns such as falls. When a fall is detected, the system publishes a rapidcam/person_down event to ROS2.
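One simple way to turn pose keypoints into a fall signal, sketched below under the assumption that YOLO11s-pose emits the standard 17 COCO keypoints per person: flag a fall when the torso is closer to horizontal than vertical. The real node would wire this check into a ROS2 publisher on rapidcam/person_down; the keypoint names and threshold here are illustrative.

```python
# Hypothetical fall heuristic over pose keypoints (COCO naming assumed).
def is_person_down(keypoints, ratio_threshold=1.0):
    """keypoints: dict of name -> (x, y) in image pixels.
    Returns True when the torso is wider than it is tall (lying down)."""
    shoulder_x = (keypoints["left_shoulder"][0] + keypoints["right_shoulder"][0]) / 2
    shoulder_y = (keypoints["left_shoulder"][1] + keypoints["right_shoulder"][1]) / 2
    hip_x = (keypoints["left_hip"][0] + keypoints["right_hip"][0]) / 2
    hip_y = (keypoints["left_hip"][1] + keypoints["right_hip"][1]) / 2
    torso_height = abs(hip_y - shoulder_y)
    torso_width = abs(hip_x - shoulder_x)
    # A standing torso is tall and narrow; a fallen torso is wide and flat.
    return torso_width > ratio_threshold * torso_height
```

A production system would add temporal smoothing (several consecutive "down" frames) before publishing, to avoid firing on a single noisy detection.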
- Audio Response Verification
The VoiceAlertNode activates when a fall event is received. It asks the user if they need help. If the user says:
“Help” → the system immediately escalates.
“I am fine” → the alert is cancelled.
If there is no response, a countdown begins.
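The decision logic above can be sketched as a small function. The transcript is assumed to come from a speech-to-text backend; the phrase matching and return values are placeholders, not the project's exact code.

```python
# Hypothetical sketch of the VoiceAlertNode's response classification.
# transcript is the recognized speech (None if nothing was heard);
# timeout_expired is True once the countdown has run out.
def classify_response(transcript, timeout_expired=False):
    """Map a heard phrase to an action: 'escalate', 'cancel', or 'countdown'."""
    if transcript is not None:
        text = transcript.lower()
        if "help" in text:
            return "escalate"   # "Help" -> escalate immediately
        if "i am fine" in text or "i'm fine" in text:
            return "cancel"     # "I am fine" -> cancel the alert
    # No (or unrecognized) response: keep counting down, escalate on expiry.
    return "escalate" if timeout_expired else "countdown"
```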
- LLM Escalation Logic
The LLM Node uses Gemini 2.5 to check for verbal intent, analyze user responses, and handle escalation dialogue. If the user remains unresponsive or indicates distress, the node publishes rapidcam/escalate.
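One way to keep an LLM in that loop safely is to ask for a constrained verdict and parse it defensively. The prompt wording and parsing below are assumptions about how such a node could work, not the project's actual Gemini integration; the key design choice is that ambiguous model output fails safe toward escalation.

```python
# Hypothetical sketch of the LLM Node's escalation check. The prompt is
# sent to Gemini 2.5 with the user's reply filled in; the model is asked
# for a single-word verdict, which is parsed conservatively.
ESCALATION_PROMPT = (
    "A fall was detected. The person was asked if they need help and "
    "replied: '{reply}'. Answer with exactly one word: ESCALATE if they "
    "need help or are incoherent, CANCEL if they are clearly fine."
)

def parse_verdict(llm_reply, default="ESCALATE"):
    """Parse the model's verdict; anything unclear escalates (fail safe)."""
    verdict = llm_reply.strip().upper()
    if verdict.startswith("CANCEL"):
        return "CANCEL"
    if verdict.startswith("ESCALATE"):
        return "ESCALATE"
    return default  # unexpected output -> err on the side of escalation
```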
- Emergency Calling
The CallNode uses Twilio to automatically call a preset emergency contact number. All nodes operate independently but communicate through ROS2 topics to maintain reliability even if one node fails.
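A minimal sketch of that call, assuming credentials live in environment variables (never committed, in line with GitHub push protection). The environment variable names and TwiML URL are placeholders; `Client.calls.create` is Twilio's real REST helper, shown commented out because it needs live credentials.

```python
# Hypothetical sketch of the CallNode's outbound Twilio call.
import os

def build_call_kwargs(to_number, twiml_url, from_number=None):
    """Assemble keyword arguments for twilio's Client.calls.create()."""
    return {
        "to": to_number,
        "from_": from_number or os.environ.get("TWILIO_FROM_NUMBER", ""),
        "url": twiml_url,  # TwiML instructions read aloud to the contact
    }

# Actual call (requires `pip install twilio` and real credentials):
# from twilio.rest import Client
# client = Client(os.environ["TWILIO_ACCOUNT_SID"],
#                 os.environ["TWILIO_AUTH_TOKEN"])
# call = client.calls.create(**build_call_kwargs(
#     "+15555550100", "https://example.com/emergency.xml"))
```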
- Live Monitoring
A lightweight Flask server exposes an MJPEG stream for monitoring the system in action. This can be used on a local dashboard or web interface.
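The MJPEG pattern above can be sketched as a Flask route that yields JPEG frames in a multipart/x-mixed-replace response, which browsers render as live video. The frame source here is a placeholder; the real server would pull the latest frame from the camera node.

```python
# Hypothetical sketch of the monitoring server's MJPEG endpoint.
from flask import Flask, Response

app = Flask(__name__)

def get_latest_jpeg():
    # Placeholder: the real node returns the most recent encoded frame.
    return b"\xff\xd8\xff"  # minimal JPEG magic bytes as a stand-in

def mjpeg_generator(frame_source):
    """Yield frames in the multipart format browsers play as video."""
    while True:
        frame = frame_source()  # raw JPEG bytes from the camera node
        yield (b"--frame\r\n"
               b"Content-Type: image/jpeg\r\n\r\n" + frame + b"\r\n")

@app.route("/stream")
def stream():
    return Response(mjpeg_generator(get_latest_jpeg),
                    mimetype="multipart/x-mixed-replace; boundary=frame")

# Run with: app.run(host="0.0.0.0", port=8080), then open /stream locally.
```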
Technical Architecture
All nodes run on the Jetson Orin Nano:

[Camera Input - CSI]
          |
          v
+------------------------------+
| GStreamer CSI Pipeline       |
+------------------------------+
          |
          v
+------------------------------+
| YOLO11s-pose (GPU)           |
| Publishes:                   |
| rapidcam/person_down         |
+------------------------------+
          |
          v
+------------------------------+      +------------------------------+
| VoiceAlertNode               | <--> | LLM Node (Gemini 2.5)        |
| Listens for speech           |      | Escalation logic             |
| "help" / "I am fine"         |      | Dialogue + countdown         |
+------------------------------+      +------------------------------+
          |
          v
+------------------------------+
| CallNode (Twilio)            |
| Publishes phone call alerts  |
+------------------------------+
          |
          v
+------------------------------+
| Monitoring Server (Flask)    |
| Live MJPEG video stream      |
+------------------------------+
Future Improvements
Improved Vision Model
Upgrade to higher-accuracy pose models or integrate temporal fall detection to reduce false positives.
Better Audio Understanding
Incorporate local speech-to-text or on-device keyword spotting to handle noisy environments.
Multi-Camera Support
Expand the system to handle multiple camera feeds through a single ROS2 network.
Secure Remote Dashboard
Add authenticated web controls, event logs, and live system status indicators.
Hardware Integration
Add LED indicators, sirens, or haptic feedback for accessibility.
Explore battery-powered or portable versions for wider use cases.