Inspiration
We kept thinking about the gap between when something happens to a dementia patient and when anyone finds out. A fall at 2am. Getting confused and not knowing where the cable box is. Wandering into a dangerous situation. Current medical alert systems are passive — they wait for the patient to press a button. But dementia patients often can't or don't. We wanted something that watches, listens, and acts on its own.
What it does
The system runs on two pieces of custom hardware the patient wears — a pair of glasses with a camera and a glove with sensors — plus a PC that ties everything together.
Fall detection: The glove's accelerometer constantly measures motion. The moment it detects a fall, the system saves a video clip of what happened, buzzes the glove to alert the patient, and automatically places a real phone call to the emergency contact. The contact gets connected to a live AI that can answer their questions about the situation.
Hands-free voice assistant: The patient just talks. Say "scan" and the AI describes what the glasses camera sees. Say "track" and it logs the event. Say anything else and the chatbot agent answers. No buttons, no apps, no learning curve.
Vision agent: The glasses stream a first-person view back to the AI. If an obstacle is detected by the IR sensor on the glove, the AI narrates what's ahead and alerts the patient.
Activity tracking: Everything that happens — falls, conversations, IR alerts, voice interactions — gets logged. Caregivers can ask natural language questions like "did anything happen this morning?" and get real answers.
Patient onboarding: Scan a QR code with the patient's medical info and the system dynamically enables the right agents based on their conditions.
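To make the voice routing described above concrete, here is a minimal sketch of how a transcribed utterance could be dispatched. The function name `route_utterance` and the handler callables are hypothetical and only illustrate the "scan" / "track" / everything-else split; they are not the project's actual code.

```python
import re
from typing import Callable, Dict

Handler = Callable[[str], str]

def route_utterance(text: str, handlers: Dict[str, Handler], fallback: Handler) -> str:
    """Send an utterance to the first keyword handler it mentions, else to the fallback."""
    # Lowercase and strip punctuation so "Scan." and "scan" match the same keyword.
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    for keyword, handler in handlers.items():
        if keyword in tokens:
            return handler(text)
    # Anything without a known keyword goes to the general chatbot agent.
    return fallback(text)

if __name__ == "__main__":
    # Stand-in handlers; real ones would call the vision agent, the logger, and the chatbot.
    handlers = {
        "scan": lambda t: "describing camera view",
        "track": lambda t: "event logged",
    }
    print(route_utterance("Scan.", handlers, lambda t: "chatbot reply"))
```

Matching on whole words rather than substrings avoids triggering "scan" on a word like "scanner", at the cost of missing inflected forms such as "scanning".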
How we built it
The hardware is built on two ESP32-CAM modules. The glasses node streams MJPEG video. The glove node has its own camera plus an MPU-6050 IMU for fall detection, an IR proximity sensor, a passive buzzer for alerts, and a tiny TFT screen showing live stats. The glove sends sensor data to the backend every 100 ms over UDP.
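As an illustration of the glove-to-backend link, the sketch below receives and decodes one sensor packet per UDP datagram. The wire format (six little-endian float32 values followed by one byte for the IR flag), the port number, and the names `GloveSample`, `parse_packet`, and `listen` are all assumptions made for this example; the real packet layout is not specified here.

```python
import socket
import struct
from dataclasses import dataclass

# Assumed (hypothetical) wire format: ax, ay, az, gx, gy, gz as float32, then an IR flag byte.
PACKET_FORMAT = "<6fB"
PACKET_SIZE = struct.calcsize(PACKET_FORMAT)  # 25 bytes, no padding with "<"

@dataclass
class GloveSample:
    ax: float
    ay: float
    az: float
    gx: float
    gy: float
    gz: float
    ir_triggered: bool

def parse_packet(data: bytes) -> GloveSample:
    """Decode one datagram; reject anything that is not exactly one packet long."""
    if len(data) != PACKET_SIZE:
        raise ValueError(f"expected {PACKET_SIZE} bytes, got {len(data)}")
    ax, ay, az, gx, gy, gz, ir = struct.unpack(PACKET_FORMAT, data)
    return GloveSample(ax, ay, az, gx, gy, gz, bool(ir))

def listen(host: str = "0.0.0.0", port: int = 5005):
    """Yield samples as they arrive. The port is an arbitrary placeholder."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, _addr = sock.recvfrom(64)
            try:
                yield parse_packet(data)
            except ValueError:
                continue  # UDP gives no delivery guarantees, so drop malformed packets
```

A fixed-size binary layout keeps each datagram small enough to send ten times per second from a microcontroller, though a JSON payload would work just as well at this rate.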
The backend is FastAPI in Python. It pulls together the camera streams, processes IMU data in real time, runs agents, and coordinates everything. The AI is powered by Google's Gemma 4. Voice uses ElevenLabs for text-to-speech and the Web Speech API for hands-free listening in the browser. Emergency calls go through Twilio — when a fall is detected, Twilio dials the contact and connects them to a Gemma-powered conversation that can answer their questions about what happened.
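One common way to turn raw accelerometer readings into a fall signal is to look for a brief free-fall phase followed shortly by an impact spike. The sketch below shows that heuristic; it is not necessarily the method used in this project, and the thresholds (0.4 g, 2.5 g) and the ten-sample window are illustrative assumptions, as is the class name `FallDetector`.

```python
import math
from typing import Optional

class FallDetector:
    """Flags a fall when near-weightlessness is followed by a hard impact within a short window."""

    def __init__(self, free_fall_g: float = 0.4, impact_g: float = 2.5, window_samples: int = 10):
        self.free_fall_g = free_fall_g        # assumed threshold, in g
        self.impact_g = impact_g              # assumed threshold, in g
        self.window_samples = window_samples  # 10 samples is about 1 s at one sample per 100 ms
        self._since_free_fall: Optional[int] = None

    def update(self, ax: float, ay: float, az: float) -> bool:
        """Feed one accelerometer sample (in g); returns True on the sample that confirms a fall."""
        magnitude = math.sqrt(ax * ax + ay * ay + az * az)
        if magnitude < self.free_fall_g:
            self._since_free_fall = 0  # start (or restart) the impact window
            return False
        if self._since_free_fall is None:
            return False  # no recent free fall, so a spike alone is ignored
        self._since_free_fall += 1
        if magnitude > self.impact_g:
            self._since_free_fall = None
            return True
        if self._since_free_fall > self.window_samples:
            self._since_free_fall = None  # window expired without an impact
        return False
```

Requiring both phases reduces false alarms from a hand simply hitting a table, which produces an impact without the preceding drop in acceleration.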
The frontend is a React dashboard showing dual camera feeds, live agent status, a reasoning log of what the AI is thinking, and a voice assistant panel.
Challenges
Getting the ESP32-CAM to simultaneously handle the camera, SPI display, I2C IMU, IR sensor, and WiFi without GPIO conflicts was genuinely painful. The camera alone uses most of the pins. We also had to think carefully about how to prevent the system from flooding the patient with alerts: a fall cooldown, IR debouncing, and TTS queuing all had to work together.
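The alert-throttling ideas mentioned above (a fall cooldown and IR debouncing) can be sketched in a few lines. The classes `Cooldown` and `Debouncer`, the 30-second cooldown, and the three-sample requirement are hypothetical values chosen for the example, not the project's actual settings.

```python
from typing import Optional

class Cooldown:
    """Allows an alert only if enough time has passed since the last allowed one."""

    def __init__(self, seconds: float):
        self.seconds = seconds
        self._last: Optional[float] = None

    def allow(self, now: float) -> bool:
        # `now` is passed in (e.g. from time.monotonic()) so the logic is easy to test.
        if self._last is not None and now - self._last < self.seconds:
            return False
        self._last = now
        return True

class Debouncer:
    """Fires once when a noisy boolean input has been True for `required` samples in a row."""

    def __init__(self, required: int = 3):
        self.required = required
        self._count = 0

    def update(self, raw: bool) -> bool:
        if not raw:
            self._count = 0
            return False
        self._count += 1
        # Equality (not >=) means one alert per continuous detection, not one per sample.
        return self._count == self.required

if __name__ == "__main__":
    fall_cooldown = Cooldown(seconds=30.0)
    print(fall_cooldown.allow(0.0), fall_cooldown.allow(10.0), fall_cooldown.allow(30.0))
```

With samples arriving every 100 ms, a three-sample debounce delays an obstacle alert by roughly 0.3 s in exchange for ignoring single-sample glitches.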
On the software side, making everything fail gracefully was harder than the happy path. If Firebase isn't configured, fall back to local storage. If ElevenLabs is down, still show the text response. If the hardware isn't connected, the dashboard still loads.
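The fallback pattern described above can be captured in one small helper: try the preferred service, and if it raises, log the problem and use a local substitute. This is a sketch under assumptions; `with_fallback`, `save_to_cloud`, and `save_locally` are hypothetical names, and the cloud function here simply simulates an unconfigured service.

```python
import logging
from typing import Callable, List, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)

def with_fallback(primary: Callable[[], T], fallback: Callable[[], T], label: str) -> T:
    """Run `primary`; on any exception, log a warning and return `fallback()` instead."""
    try:
        return primary()
    except Exception as exc:  # broad on purpose: any failure should degrade, not crash
        log.warning("%s unavailable, using fallback: %s", label, exc)
        return fallback()

# Hypothetical usage: store an event in the cloud, or in a local list if that fails.
local_events: List[dict] = []

def save_to_cloud(event: dict) -> str:
    raise RuntimeError("cloud storage not configured")  # simulated outage

def save_locally(event: dict) -> str:
    local_events.append(event)
    return "local"

if __name__ == "__main__":
    event = {"type": "fall"}
    where = with_fallback(lambda: save_to_cloud(event), lambda: save_locally(event), "cloud storage")
    print(where)
```

Keeping the fallback logic in one place makes it easier to apply the same behavior to storage, text-to-speech, and hardware connections without repeating try/except blocks.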
What we're proud of
The phone call flow is probably the coolest thing we built. When a fall happens, the system doesn't just send a text — it picks up the phone, dials the family member, and holds a real two-way conversation with them powered by Gemma 4. The family can ask "is she okay?" and the AI answers based on what the sensors recorded.
What's next
GPS location in the fall alert, medication reminders through the voice assistant, and longer-term memory so the AI can notice patterns ("she seems to fall more in the mornings").