Inspiration
Dementia is a profound and intensely challenging journey, not just for the patients who slowly lose their grasp on the people and memories that define them, but also for their loved ones and caregivers. We witnessed how the loss of recognition creates an isolating gap between patients and their families, leading to confusion, anxiety, and heartbreak. The inspiration for Dementia Assist (D-Vision) was born out of a desire to bridge this gap. We wanted to build a non-intrusive, real-time "memory prosthetic"—smart glasses that can act as a companion, quietly helping patients navigate social interactions and providing peace of mind to their caregivers.
What it does
D-Vision is an AI-powered smart glasses application designed to act as an external memory aid for individuals with dementia. Through a heads-up display (HUD), the system continuously scans the patient's field of vision and performs real-time face recognition. When a familiar person approaches, the smart glasses visually overlay their name, their relationship to the patient, and any important caregiver notes directly onto the patient's field of view.
Key features include:
- Confident Recognition Framework: A dynamic two-threshold identification system ensures that the system doesn't confidently provide the wrong name to a stranger (preventing the worst failure mode).
- Vitals & Emotion Tracking: The system uses facial landmark analysis (FaceMesh) to detect the emotional state (e.g., Happy, Confused, Sad) of the person the patient is talking to, helping patients decode social cues.
- Caregiver Voice Notes: Using Whisper STT and natural language processing, caregivers can leave context-rich text or voice notes that pop up as AR-style cards when the patient encounters a specific person.
- Audio Cues: Leveraging ElevenLabs Text-to-Speech (TTS), the glasses can discreetly whisper the approaching person's name and details into the patient's ear.
- Headless Web Streaming: The entire HUD and AR interface can be streamed dynamically over a local network (MJPEG) directly to a mobile device or a dedicated smart glasses display module.
- Caregiver Dashboard: A dedicated React Native companion app where caregivers can remotely manage the MongoDB database, review unfamiliar faces, and push daily agendas to the patient's HUD.
How we built it
The core engine of D-Vision is written in Python, designed with portability in mind so it can be deployed on lightweight, wearable hardware like the Raspberry Pi Zero 2 W.
- Computer Vision & AR: We used OpenCV and the face_recognition library (dlib) for detecting and encoding faces. The AR UI was custom-built using a hybrid of OpenCV drawing primitives and PIL (Python Imaging Library) to render modern, dynamic HUD cards with rounded corners and frosted glass aesthetics that scale to the device's native resolution.
- Data Storage: We integrated MongoDB Atlas as the primary backend to store face embeddings, interaction histories, and caregiver notes. By utilizing SciPy's vectorized cosine distance (cdist), the system can match every detected face against all stored embeddings in a single batched call. The cost still grows linearly, \( \mathcal{O}(n) \), with the number of stored embeddings, but the vectorized call keeps matching real-time. We also implemented a running-average algorithm to dynamically refine a person's embedding over time based on multiple sightings.
- Multithreading: To ensure the camera feed remains stutter-free, we decoupled the heavy face encoding and FaceMesh vitals tracking from the main rendering loop using background Python threads and locks.
- Accessibility Add-ons: We utilized OpenAI's Whisper model for recording caregiver notes via microphone and the ElevenLabs API to provide conversational text-to-speech audio cues.
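The batched matching step mentioned under Data Storage can be sketched roughly as follows. This is a minimal illustration, not our production code: the function name best_matches and the 3-dimensional toy vectors are assumptions (face_recognition encodings are 128-dimensional), and the embeddings are assumed to be already loaded from the database into NumPy arrays.

```python
import numpy as np
from scipy.spatial.distance import cdist


def best_matches(query_embeddings, known_embeddings):
    """For each query embedding, return the index of the closest known
    embedding and the cosine distance to it (hypothetical helper)."""
    queries = np.atleast_2d(np.asarray(query_embeddings, dtype=float))
    known = np.atleast_2d(np.asarray(known_embeddings, dtype=float))
    # One call computes every query-vs-known cosine distance
    # (cosine distance = 1 - cosine similarity).
    distances = cdist(queries, known, metric="cosine")
    best_idx = distances.argmin(axis=1)
    best_dist = distances[np.arange(len(queries)), best_idx]
    return best_idx, best_dist


# Toy data standing in for stored embeddings and two detected faces.
known = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
queries = np.array([[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]])
idx, dist = best_matches(queries, known)
```

The work is proportional to the number of stored embeddings, but it runs inside one vectorized SciPy call instead of a Python loop.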
Challenges we ran into
Building a high-performance computer vision application for constrained hardware is incredibly difficult.
- Performance Bottlenecks: Face encoding is computationally expensive. If run synchronously, the camera feed drops to unusable framerates. We overcame this by moving inference to a background worker thread, implementing frame-skipping (only every Nth frame is sent for recognition), and downscaling frames before detection, which keeps rendering near 60 FPS even though recognition itself runs less often.
- Network & SSL Connectivity: Transitioning from a local JSON database to MongoDB Atlas introduced TLS/SSL handshake errors (TLSV1_ALERT_INTERNAL_ERROR) due to root certificate issues on our dev hardware. We had to patch our PyMongo client to use certifi's CA bundle for secure validation and resolve IP access rules.
- UI/UX on Smart Glasses: Drawing beautiful AR UI in raw OpenCV is notoriously clunky. We spent a significant amount of time optimizing PIL text-rendering overlays to make sure the HUD wasn't overwhelming, and implemented responsive bounding boxes that stick tightly to moving subjects.
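The background-worker and frame-skipping approach from the Performance Bottlenecks item can be sketched like this. It is a simplified illustration under stated assumptions: the class name BackgroundRecognizer, the every_nth parameter, and the encode_fn callback (a stand-in for the real face-encoding call) are all hypothetical.

```python
import threading


class BackgroundRecognizer:
    """Runs an expensive encode function off the render thread and only
    looks at every Nth frame (hypothetical sketch)."""

    def __init__(self, encode_fn, every_nth=5):
        self._encode_fn = encode_fn
        self._every_nth = every_nth
        self._lock = threading.Lock()
        self._pending = None      # newest frame waiting to be processed
        self._latest = None       # most recent finished result
        self._frame_count = 0
        self._wake = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, frame):
        """Called by the render loop for each frame; never waits on inference."""
        self._frame_count += 1
        if self._frame_count % self._every_nth != 0:
            return  # frame skipping: ignore all but every Nth frame
        with self._lock:
            self._pending = frame  # overwrite any stale unprocessed frame
        self._wake.set()

    def latest(self):
        """Most recent result, or None if nothing has finished yet."""
        with self._lock:
            return self._latest

    def _run(self):
        while not self._stop.is_set():
            self._wake.wait(timeout=0.1)
            self._wake.clear()
            with self._lock:
                frame, self._pending = self._pending, None
            if frame is None:
                continue
            result = self._encode_fn(frame)  # heavy work, done outside the lock
            with self._lock:
                self._latest = result

    def stop(self):
        self._stop.set()
        self._wake.set()
        self._thread.join()
```

The render loop calls submit(frame) and latest() every iteration and draws whatever result is currently available, so a slow encode only makes the labels slightly stale instead of stalling the display.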
Accomplishments that we're proud of
We are incredibly proud of the Two-Threshold Security System. In dementia care, the only thing worse than not recognizing a son is confidently calling a stranger by the son's name. Our system separates "Confident Matches" (Green) from "Possible Matches" (Yellow / "Maybe ?"), drastically improving the safety of the interaction.
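A minimal sketch of the two-threshold idea, assuming each candidate match comes with a distance where lower means more similar. The threshold values, the Label type, and the exact "Maybe ...?" wording are illustrative assumptions, not our tuned settings.

```python
from dataclasses import dataclass

# Illustrative values only; real thresholds would need tuning on real data.
CONFIDENT_MAX_DISTANCE = 0.35
POSSIBLE_MAX_DISTANCE = 0.50


@dataclass
class Label:
    status: str  # "confident", "possible", or "unknown"
    text: str    # what the HUD would show
    color: str   # overlay color


def classify_match(name, distance,
                   confident_max=CONFIDENT_MAX_DISTANCE,
                   possible_max=POSSIBLE_MAX_DISTANCE):
    """Map the best match and its distance to a HUD label (hypothetical helper)."""
    if distance <= confident_max:
        return Label("confident", name, "green")
    if distance <= possible_max:
        # Close, but not close enough to state the name as fact.
        return Label("possible", f"Maybe {name}?", "yellow")
    # Too far from every known face: show no name at all.
    return Label("unknown", "", "none")
```

The gap between the two thresholds is what keeps a borderline match from being presented as a certain one.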
We're also proud of our dynamic embedding refinement system. Every time the glasses successfully recognize a known person, the system updates that person's stored embedding in MongoDB using a weighted average of the old and new encodings. This means the system gets better at recognizing family members over time as it sees them under different lighting and angles!
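One simple way to express such an update is a running mean weighted by the number of previous sightings. This is a sketch under that assumption: the function name refine_embedding and the sightings counter are hypothetical, and other weightings (for example an exponential moving average) would fit the same description.

```python
import numpy as np


def refine_embedding(stored, new, sightings):
    """Blend a new encoding into the stored one as a running mean.

    `sightings` is how many encodings the stored embedding already
    averages over (hypothetical bookkeeping field).
    """
    stored = np.asarray(stored, dtype=float)
    new = np.asarray(new, dtype=float)
    updated = (stored * sightings + new) / (sightings + 1)
    return updated, sightings + 1


# Toy 2-dimensional example: one prior sighting, then a second one.
embedding, count = refine_embedding([1.0, 0.0], [0.0, 1.0], sightings=1)
```

Restricting the update to confident matches avoids blending a stranger's face into a family member's stored embedding.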
What we learned
We learned a tremendous amount about threading and the Global Interpreter Lock (GIL) in Python, specifically how to keep a UI loop non-blocking while intensive ML workloads run in the background. We also learned how to leverage MongoDB Atlas for storing large arrays (facial embeddings) and matching them efficiently using cosine similarity instead of traditional relational joins. Lastly, we discovered the intricacies of streaming multipart JPEG data (MJPEG) efficiently over a Flask server for low-latency network viewing.
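The multipart framing behind an MJPEG stream can be sketched in plain Python. The generator name mjpeg_chunks and the boundary name "frame" are assumptions for illustration; in Flask, a generator like this is typically wrapped in a Response with the mimetype "multipart/x-mixed-replace; boundary=frame".

```python
from typing import Iterable, Iterator

BOUNDARY = b"frame"  # hypothetical boundary name; must match the response mimetype


def mjpeg_chunks(jpeg_frames: Iterable[bytes],
                 boundary: bytes = BOUNDARY) -> Iterator[bytes]:
    """Wrap each already-encoded JPEG frame as one part of a multipart stream."""
    for jpeg in jpeg_frames:
        header = b"".join([
            b"--", boundary, b"\r\n",
            b"Content-Type: image/jpeg\r\n",
            b"Content-Length: ", str(len(jpeg)).encode("ascii"), b"\r\n\r\n",
        ])
        # Each part replaces the previous image in the viewer.
        yield header + jpeg + b"\r\n"


# Stand-in bytes for one encoded frame (JPEG start and end markers only).
fake_frame = b"\xff\xd8AB\xff\xd9"
chunks = list(mjpeg_chunks([fake_frame]))
```

Because the generator is lazy, frames are encoded and sent one at a time as the client reads them.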
What's next for Dementia Assist
- Hardware Integration: We plan to officially deploy the Python application onto a Raspberry Pi Zero attached to a physical prismatic smart-glass display (like the Vufine or Brilliant Labs Frame).