Inspiration
The world is facing a "Silver Tsunami." By 2050, 1.5 billion people will be over the age of 65. For the millions suffering from dementia and Alzheimer's, the world becomes a frightening, lonely place.
We were deeply moved by the concept of "Validation Therapy." In dementia care, when a patient asks for a deceased spouse, correcting them with the "truth" ("She died 3 years ago") causes fresh grief every single time. The compassionate approach is to validate their emotion ("You miss her? She is safe. Tell me about her.").
Current voice assistants are robotic, forgetful, and logic-driven. They fail at empathy. We built ElderKeep to be more than a chatbot—we built a Cognitive Prosthetic. It sees for those with failing eyes, remembers for those with failing memory, and calls for help when the user cannot.
What it does
ElderKeep is a multimodal AI companion that runs on a tablet or phone, synced with a caregiver dashboard. It operates on four core pillars:
- The Empathy Engine: A voice companion that uses long-term memory to converse about the user's specific life history. Crucially, it adheres to "Validation Therapy" protocols—it never argues with the user's confused reality, but instead validates their feelings to reduce anxiety.
- The Visual Guardian: Using the camera, the user can ask, "Who is this person in the photo?" ElderKeep analyzes the image using Gemini Vision and identifies family members based on visual descriptors stored in the database.
- The Safety Loop: If ElderKeep sees a face it doesn't recognize, it doesn't just say "I don't know." It triggers a real-time alert to the Family Dashboard. The caregiver can remotely label the face, and ElderKeep updates its memory instantly.
- The Safety Sentinel: A background process that listens for crisis keywords (e.g., "Help," "I fell," "Pain"). If detected, it overrides the AI persona and triggers a massive Red Alert on the Family Dashboard.
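The Safety Sentinel's keyword check runs before any LLM call, so a crisis is never delayed by generation latency. Below is a minimal sketch of that pre-check; the keyword list, the `alerts` collection name, and the field names are illustrative assumptions, not our exact production code.

```python
# safety_sentinel.py -- illustrative sketch of the crisis pre-check.
# Keyword patterns, collection name, and fields are assumptions.
import re
from datetime import datetime, timezone

from google.cloud import firestore

CRISIS_PATTERNS = re.compile(
    r"\b(help|i fell|fell down|pain|hurt|can't breathe)\b", re.IGNORECASE
)

db = firestore.Client()

def check_for_crisis(user_id: str, transcript: str) -> bool:
    """Return True (and fire a Red Alert) if the transcript contains a crisis keyword."""
    if not CRISIS_PATTERNS.search(transcript):
        return False
    # Write an alert document; the Family Dashboard subscribes to this
    # collection in real time, so the Red Alert appears immediately.
    db.collection("alerts").add({
        "userId": user_id,
        "type": "RED_ALERT",
        "transcript": transcript,
        "createdAt": datetime.now(timezone.utc),
    })
    return True
```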
How we built it
We architected a real-time, event-driven system to minimize latency, which is critical for elderly users who get confused by silence.
- The Brain (Google Gemini 2.5 Flash): We used Gemini via Google AI Studio for its multimodal capabilities. On every interaction we inject a dynamic `SYSTEM_PROMPT` containing the user's biography and family context from Firestore.
- The Voice (ElevenLabs): Voice is the UI. We used the ElevenLabs Python SDK with specific tuning: the `eleven_turbo_v2` model for sub-second latency, and `stability=0.8` to ensure a soothing, consistent tone that doesn't fluctuate erratically.
- The Nervous System (FastAPI): We built a Python backend using FastAPI and WebSockets. This acts as a "man-in-the-middle" between the user and the LLM, performing safety checks before generating a response (see the sketch after this list).
- The Memory (Firebase Firestore): We used Firestore in Native Mode to store user profiles, memory graphs, and real-time alerts. This allows the React Dashboard and the React Native App to stay perfectly in sync.
- The Interface: The patient app is built with React Native (Expo) for cross-platform compatibility, while the caregiver dashboard is a React/Vite web app.
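To make these pieces concrete, here is a compressed sketch of one WebSocket turn: fetch the profile from Firestore, build the system prompt (including the validation-therapy rules), call Gemini, and stream ElevenLabs audio back. The model name, voice ID, Firestore schema, and prompt wording are illustrative assumptions; the SDK calls follow the `google-generativeai` and `elevenlabs` v1 Python APIs, not our exact code.

```python
# voice_loop.py -- minimal sketch of the WebSocket voice loop.
# Model name, voice_id, Firestore schema, and prompt text are assumptions.
import google.generativeai as genai
from elevenlabs import VoiceSettings
from elevenlabs.client import ElevenLabs
from fastapi import FastAPI, WebSocket
from google.cloud import firestore

app = FastAPI()
db = firestore.Client()
tts = ElevenLabs()  # reads ELEVENLABS_API_KEY from the environment
genai.configure(api_key="...")  # GOOGLE_API_KEY in practice

VALIDATION_RULES = (
    "Never contradict the user's version of reality. If they mention a deceased "
    "loved one, validate the feeling and redirect gently; do not state the death."
)

def build_system_prompt(user_id: str) -> str:
    """Assemble the per-user system prompt from Firestore on every turn."""
    profile = db.collection("users").document(user_id).get().to_dict() or {}
    return (
        f"You are ElderKeep, a gentle companion for {profile.get('name', 'the user')}. "
        f"Biography: {profile.get('biography', '')} "
        f"Family: {profile.get('family', '')} "
        + VALIDATION_RULES
    )

@app.websocket("/ws/{user_id}")
async def voice_loop(ws: WebSocket, user_id: str):
    await ws.accept()
    while True:
        transcript = await ws.receive_text()  # speech-to-text happens client-side
        # The Safety Sentinel pre-check (sketched earlier) runs here, before the LLM.
        model = genai.GenerativeModel(
            "gemini-2.5-flash", system_instruction=build_system_prompt(user_id)
        )
        reply = model.generate_content(transcript).text
        # eleven_turbo_v2 + high stability: fast and soothing, not expressive.
        audio = tts.text_to_speech.convert(
            voice_id="<calm-voice-id>",
            model_id="eleven_turbo_v2",
            text=reply,
            voice_settings=VoiceSettings(stability=0.8, similarity_boost=0.75),
        )
        for chunk in audio:
            await ws.send_bytes(chunk)
```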
Challenges we ran into
- The "Truth" Bias: Standard LLMs are trained to be helpful and truthful. When a user asked "Where is my wife?" (who was deceased in the database), the AI initially replied, "She died in 2012." This is traumatic for a dementia patient. We had to perform rigorous prompt engineering to implement a "Therapeutic Lying" protocol that prioritizes emotional safety over factual accuracy.
- Latency vs. Quality: Initially, the round-trip time (Audio -> Text -> LLM -> TTS -> Audio) was nearly 5 seconds. By switching to `eleven_turbo_v2`, optimizing image compression in Python, and using WebSockets instead of REST for the voice loop, we brought this down to conversational speed.
- Visual Hallucinations: When users showed a photo and said, "This is my daughter, Taekwondo," the AI often extracted "Daughter" as the name. We solved this by forcing Gemini to output structured JSON, `{ "name": "...", "relation": "..." }`, and prioritizing proper nouns (see the extraction sketch after this list).
Accomplishments that we're proud of
- The "Safety Loop": We successfully built a feature where the AI detects an unknown face, alerts the dashboard, the caregiver labels it remotely, and the AI recognizes the face in real-time seconds later. It feels like magic.
- Therapeutic Voice Design: Tuning the ElevenLabs voice to sound genuinely caring rather than robotic was a huge win for the user experience. It feels like a presence, not a computer.
- Full-Stack Integration: Connecting a mobile app, a Python backend, a multimodal LLM, and a web dashboard into a single cohesive ecosystem.
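The Safety Loop's "seconds later" behavior comes down to a real-time listener on the backend. Below is a minimal sketch using `on_snapshot`, the Firestore Python client's real-time listener; the `faces` collection name and in-memory cache are assumptions.

```python
# safety_loop.py -- sketch of how the backend hears a caregiver's label instantly.
# Collection and field names are assumptions; on_snapshot is the Firestore
# Python client's real-time listener.
from google.cloud import firestore

db = firestore.Client()
known_faces: dict[str, dict] = {}  # in-memory cache keyed by face document ID

def on_faces_changed(snapshot, changes, read_time):
    """Fires within seconds of the caregiver labeling a face on the dashboard."""
    for change in changes:
        if change.type.name in ("ADDED", "MODIFIED"):
            # The newly labeled face is now recognizable on the next glance.
            known_faces[change.document.id] = change.document.to_dict()

watch = db.collection("faces").on_snapshot(on_faces_changed)
```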
What we learned
- AI as a Medical Device: We learned that "Prompt Engineering" isn't just about getting the right answer; in healthcare, it's about safety guidelines and psychological protocols.
- Multimodal Complexity: Handling audio streams and image streams simultaneously over WebSockets requires careful state management to prevent race conditions.
- The Importance of Fallbacks: We implemented a text fallback system so that if the voice generation API hangs or fails, the user still gets a response on the screen, preventing the app from appearing "frozen."
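The fallback wrapper is small but it changed how the app feels under failure. This is a sketch under assumptions: the timeout value and the `synthesize` helper (an async callable returning audio bytes) are hypothetical, and the client is assumed to render the JSON message as an on-screen caption.

```python
# fallback.py -- sketch of the text fallback around TTS generation.
# Timeout value and the `synthesize` helper are assumptions.
import asyncio

async def speak_or_show(ws, reply_text: str, synthesize) -> None:
    """Try voice first; if TTS hangs or fails, send text so the app never looks frozen."""
    try:
        audio = await asyncio.wait_for(synthesize(reply_text), timeout=4.0)
        await ws.send_bytes(audio)
    except Exception:  # covers timeouts, API errors, and network failures
        # Degrade gracefully: the client renders this as an on-screen caption.
        await ws.send_json({"type": "text_fallback", "text": reply_text})
```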
What's next for ElderKeep
- Hardware Integration: Deploying this software onto a locked-down tablet ("Kiosk Mode") in a rugged case for use in nursing homes.
- Clinical Pilots: Partnering with assisted living facilities to measure the reduction in "sundowning" (evening agitation) episodes among residents using ElderKeep.
- Predictive Health: Using voice biomarkers to detect the progression of cognitive decline over time, providing doctors with early warning signs.
Built With
- elevenlabs
- expo.io
- fastapi
- firebase
- firestore
- gemini-flash
- javascript
- nosql
- python
- react
- react-native
- render
- typescript
- vercel
- vite