Inspiration
Reminiscence therapy is backed by 47 peer-reviewed studies showing it reduces anxiety, maintains identity, and slows cognitive decline in Alzheimer's patients. It works. But it costs $150 to $300 per professional session, requires a trained therapist in the room, and leaves caregivers completely in the dark about what happens during the session. We built Remember Eleanor because the clinical evidence exists, the need is massive, and the technology to make it accessible has never been more ready. 55 million patients worldwide. 500 million family caregivers affected. Zero tools that connect a patient and their caregiver in real time during an actual therapy session — until now.
What it does
Remember Eleanor is a camera-powered, multi-agent reminiscence therapy tool for Alzheimer's patients and their caregivers. Eleanor Park is 78 years old. She has Alzheimer's. She cannot always find the words to describe her memories; Alzheimer's takes language first. But she can show you a photograph.
Eleanor holds an old photograph up to her laptop camera. Gemini 2.0 Flash Vision reads the image in real time, extracting objects, lighting, decade, emotional tone, and people, and generates a natural-language transcript as if Eleanor were describing the memory herself. That transcript feeds into a Gemma 4 31B memory-parsing agent, which reconstructs the memory as a warm illustrated scene in the browser. Eleanor hears her younger voice narrating the scene through ElevenLabs Voice Design. A Gemini 2.0 Flash journal agent then streams Eleanor's memory as a personal journal entry, word by word, in real time. She can read it in English, Korean, Spanish, Hindi, Mandarin, or Arabic.
While Eleanor plays through her memory, a hidden camera loop runs every 4 seconds. Gemini Vision analyzes Eleanor's facial expression and pushes an emotion label (joyful, smiling, confused, distressed) to Firebase Realtime Database. Hannah, Eleanor's daughter, watches this live on her phone from her office in San Jose. She sees three smile emojis appear on the timeline. When Eleanor looks confused, Hannah types a gentle hint into her dashboard. That text converts to Young Eleanor's voice via ElevenLabs, uploads to Cloudinary, and arrives in Eleanor's session as a whisper in her own younger voice, played back through the official Cloudinary React SDK. Eleanor does not feel managed. She feels guided by something familiar. When Eleanor smiles remembering a moment, Hannah clicks "Save this moment." Gemini writes one warm sentence for the family journal. That sentence goes to Eleanor's son James in Portland. He calls his mother. Eleanor says: "I had a good day today."
How we built it
We built the entire pipeline on Google Cloud using Google's Agent Development Kit (ADK) to orchestrate four specialised AI agents.
Agent 1 - Memory Parser (Gemma 4 31B): Takes the patient's memory transcript from camera input or text and outputs a structured scene JSON including location type, decade, hero objects, emotional tone, and cognitive puzzle parameters. We chose Gemma 4 specifically because it is open-source. In a production hospital deployment, the entire Gemma 4 model can be self-hosted on the facility's own Google Cloud infrastructure. Patient memory descriptions — among the most sensitive data imaginable — never leave the institution's environment. This is a HIPAA compliance guarantee no closed proprietary model can offer. The open architecture is not a feature. It is the foundation.
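To make the contract concrete, here is a minimal sketch of the scene JSON Agent 1 targets. The field names follow the description above; the exact shapes and puzzle parameters are illustrative, not the production schema.

```python
# Illustrative scene contract for Agent 1 (not the production schema).
from pydantic import BaseModel, Field

class PuzzleParams(BaseModel):
    difficulty: int = Field(ge=1, le=5)   # scaled to cognitive ability
    hidden_object_count: int = 3          # hero objects to find in the scene

class SceneJSON(BaseModel):
    location_type: str        # e.g. "public library"
    decade: str               # e.g. "1980s"
    hero_objects: list[str]   # e.g. ["card catalog", "reading lamp"]
    emotional_tone: str       # e.g. "warm nostalgia"
    puzzle: PuzzleParams
```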
Agent 2 - World Builder (Gemini 2.0 Flash): Takes the scene JSON and generates the illustrated scene description, NPC dialogue, and emotional arc for the session.
Agent 3 - Journal Narrator (Gemini 2.0 Flash): Writes Eleanor's personal journal entry and streams it word by word to the browser using server-sent events. Also handles multilingual translation with cultural adaptation across six languages.
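A minimal sketch of that stream, assuming the Flask backend listed under Built With and a hypothetical generate_journal() helper that yields words from Gemini 2.0 Flash:

```python
# Word-by-word journal streaming over server-sent events (Flask).
from flask import Flask, Response, stream_with_context

app = Flask(__name__)

@app.route("/journal/stream")
def journal_stream():
    def events():
        for word in generate_journal():      # hypothetical Gemini helper
            yield f"data: {word}\n\n"        # one SSE frame per word
        yield "event: done\ndata: end\n\n"   # let the browser close cleanly
    return Response(stream_with_context(events()),
                    mimetype="text/event-stream")
```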
Agent 4 - Memory Curator (Gemini Embedding 2 Preview): Embeds each memory document as a 768-dimensional vector and stores it in MongoDB Atlas. MongoDB Vector Search finds semantically similar memories across anonymised patient sessions, powering the Memory Resonance feature: "Your memory resonates with 23 others who remember libraries."
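A sketch of the resonance lookup, assuming the google-genai and pymongo clients; the index name, field names, and candidate count are illustrative, and the embedding model id is simply the one named above:

```python
# Memory Resonance: embed a memory, then $vectorSearch in Atlas.
import os
from google import genai
from pymongo import MongoClient

gem = genai.Client()
memories = MongoClient(os.environ["MONGO_URI"])["eleanor"]["memories"]

def resonance(memory_text: str, k: int = 23) -> list[dict]:
    # 768-dimensional embedding of the memory document.
    vec = gem.models.embed_content(
        model="gemini-embedding-2-preview",   # model id as named above
        contents=memory_text).embeddings[0].values
    # Semantic neighbours across anonymised sessions.
    return list(memories.aggregate([{
        "$vectorSearch": {
            "index": "memory_index",          # illustrative index name
            "path": "embedding",
            "queryVector": vec,
            "numCandidates": 200,
            "limit": k,
        }
    }]))
```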
The real-time caregiver layer uses Firebase Realtime Database as the live bridge between Eleanor's session and Hannah's dashboard. Emotion events, whisper audio URLs, and moment captures all flow through Firebase with zero polling. The whisper system generates audio on-demand via ElevenLabs, uploads it using the Cloudinary Python SDK with audio normalisation applied, pushes the Cloudinary public ID via Firebase, and the frontend plays it back using the official @cloudinary/react SDK AdvancedAudio component. All pre-generated NPC voices are delivered through Cloudinary CDN with automatic format optimisation for elderly patients on any device.
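The whole whisper path fits in a few calls. A hedged sketch using the SDKs named above; the voice id, Firebase paths, and ElevenLabs model id are placeholders, and firebase_admin is assumed to be initialised with the database URL elsewhere:

```python
# Whisper pipeline: hint text -> ElevenLabs -> Cloudinary -> Firebase.
import io
import cloudinary.uploader
from elevenlabs.client import ElevenLabs
from firebase_admin import db   # assumes firebase_admin.initialize_app ran

tts = ElevenLabs()

def send_whisper(session_id: str, hint: str, voice_id: str) -> None:
    # 1. Synthesize the hint in Young Eleanor's voice.
    audio = b"".join(tts.text_to_speech.convert(
        voice_id=voice_id, text=hint,
        model_id="eleven_multilingual_v2"))    # placeholder model id
    # 2. Upload to Cloudinary (audio uses the "video" resource type).
    up = cloudinary.uploader.upload(
        io.BytesIO(audio), resource_type="video",
        folder=f"whispers/{session_id}")
    # 3. Push the public id down a shallow path for low read latency.
    db.reference(f"/sessions/{session_id}/whispers").push(
        {"public_id": up["public_id"]})
```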
World ID verifies caregiver accounts — only proven humans can access a vulnerable patient's emotional monitoring data or assign memory prompts. Cognition MCP validates every AI-generated output before it reaches the patient. The Fetch.ai Agentverse agent handles caregiver notification routing and family journal sharing workflows around the Google ADK pipeline.
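For the human-verification step, a hedged sketch of server-side World ID proof checking against the Developer Portal's cloud verify endpoint; the app id and action name are placeholders, and the payload shape reflects our understanding of that API rather than a confirmed contract:

```python
# Verify a caregiver's World ID proof before granting dashboard access.
import requests

def verify_caregiver(proof: dict, app_id: str) -> bool:
    resp = requests.post(
        f"https://developer.worldcoin.org/api/v2/verify/{app_id}",
        json={**proof, "action": "caregiver-login"},  # placeholder action
        timeout=10)
    return resp.ok   # success means the proof is valid and unused
```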
Challenges we ran into
Getting Gemini Vision to produce consistent structured JSON from low-quality or partially obscured photographs was the hardest technical problem. Old photographs are often faded, overexposed, or damaged — exactly the kinds of images our target patients would hold up to their cameras. We spent significant time engineering the vision prompt to be robust against these edge cases without hallucinating scene details that would feel wrong to the patient.
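The shape of the fix, as a condensed sketch assuming the google-genai SDK; the prompt abbreviates the real one, whose core rule is "report only what is clearly visible":

```python
# Constrained vision call for faded or damaged photographs.
from google import genai
from google.genai import types

client = genai.Client()

PROMPT = (
    "Describe this photograph as the person holding it would. It may be "
    "faded, overexposed, or damaged. Report only what is clearly visible; "
    "if a region is unreadable, say so instead of guessing. Return JSON "
    "with keys: objects, lighting, decade, emotional_tone, people."
)

def read_photo(image_bytes: bytes) -> str:
    return client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[types.Part.from_bytes(data=image_bytes,
                                        mime_type="image/jpeg"), PROMPT],
        config=types.GenerateContentConfig(
            response_mime_type="application/json"),   # force valid JSON
    ).text
```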
The whisper system latency was a real challenge. ElevenLabs TTS, Cloudinary upload, Firebase push, and frontend playback had to happen in under 5 seconds for the interaction to feel natural. We achieved this by pre-normalising audio on upload and keeping the Firebase path shallow for minimum read latency.
Running emotion detection every 4 seconds without disrupting the patient's session required the hidden camera loop to fail completely silently — any error state that interrupted Eleanor's experience was unacceptable. The EmotionMonitor component catches all errors and continues regardless.
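The server side of that loop reduces to one fail-silent tick. A sketch with a hypothetical classify_emotion() wrapper around a Gemini Vision call like the one above:

```python
# One emotion tick: classify a frame, push the label, never raise.
from firebase_admin import db

def emotion_tick(session_id: str, frame: bytes) -> None:
    try:
        label = classify_emotion(frame)   # hypothetical vision wrapper
        db.reference(f"/sessions/{session_id}/emotions").push(
            {"label": label, "ts": {".sv": "timestamp"}})  # server timestamp
    except Exception:
        pass   # any failure stays invisible to the patient's session
```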
Accomplishments that we're proud of
The whisper system works end to end. Hannah types a hint on her phone. Four seconds later Eleanor hears it in her own younger voice inside her memory session. That specific moment — a daughter guiding her mother through a 1985 memory in her mother's own younger voice, mediated entirely by AI — is something we had not seen built before. We are proud it works reliably enough to demo live.
We are proud of the camera-first input design. By removing the need for an Alzheimer's patient to type or describe their memory verbally, we removed the primary barrier between the technology and the population it is designed to serve. Eleanor holds up a photograph. That is the entire interaction.
We are proud of the HIPAA architecture argument. Choosing Gemma 4 as an open-source model means this tool can be deployed by actual care facilities on their own infrastructure — not as a theoretical future feature, but as a genuine deployment path available today on Google Cloud.
What we learned
Alzheimer's patients lose verbal articulation before they lose visual recognition. This single clinical fact drove every major design decision: the camera as primary input, the illustrated scene as output, the voiced narration rather than text to read, the puzzle as active engagement rather than passive consumption. Designing for this population forced precision in every interaction. Nothing could require typing, navigating menus, or remembering instructions.
We learned that the emotional peak of a technology demo is not always the most technically impressive feature. The whisper system is a straightforward pipeline of four API calls. But when a judge watches Hannah type a hint and hears it play in Eleanor's younger voice four seconds later, the reaction is consistently stronger than anything else in the demo. Emotional clarity matters more than technical complexity.
Google ADK's agent orchestration gave us retry logic and structured tool calling that we would have had to build manually with Fetch.ai. The tradeoff was that ADK is newer and its documentation is still sparse. We learned it by reading the source.
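For reference, a hedged sketch of how three of the agents line up under ADK; the names mirror the pipeline above, the instructions are abbreviated, and a self-hosted Gemma model would be wired in through a model adapter rather than the bare string shown here:

```python
# Sequential ADK pipeline: parse memory -> build world -> narrate journal.
from google.adk.agents import LlmAgent, SequentialAgent

memory_parser = LlmAgent(
    name="memory_parser", model="gemma-4-31b",   # via a model adapter
    instruction="Turn the patient transcript into scene JSON.",
    output_key="scene_json")                     # lands in session state

world_builder = LlmAgent(
    name="world_builder", model="gemini-2.0-flash",
    instruction="Build the illustrated scene from {scene_json}.",
    output_key="scene")

journal_narrator = LlmAgent(
    name="journal_narrator", model="gemini-2.0-flash",
    instruction="Write a warm journal entry for {scene}.")

pipeline = SequentialAgent(
    name="eleanor_pipeline",
    sub_agents=[memory_parser, world_builder, journal_narrator])
```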
What's next for Remember Eleanor
The immediate next step is a clinical pilot with an Alzheimer's care facility. The HIPAA-compliant deployment path using self-hosted Gemma 4 on Google Cloud infrastructure is ready for a real institutional partner.
The Memory Resonance feature becomes more meaningful with scale. At 1,000 patient sessions the vector search returns genuine connections between strangers who share the same memory of the same type of place. "23 others remember libraries" becomes "23 others remember this specific neighborhood in Oakland in the 1980s." The specificity of the connection is what matters to patients with Alzheimer's — not that others remember something vaguely similar, but that others remember the same precise world they remember.
Long-term, the voice profiles Eleanor builds across sessions — the specific texture of her younger voice, the places she returns to, the objects she remembers most clearly — become a permanent memory archive her family can access after she can no longer articulate her own past. Remember Eleanor becomes the record of who Eleanor was, built by Eleanor herself while she still can.
Built With
- agentverse
- cloudinary-react-sdk
- cognition-mcp
- elevenlabs-voice-design
- fetch.ai
- firebase-realtime-database
- flask
- gemini-2.0-flash
- gemini-2.0-flash-vision
- gemini-embedding-2-preview
- gemma-4-31b
- google-adk
- google-cloud
- mongodb-atlas
- mongodb-vector-search
- node.js
- pm2
- python
- react
- vite
- vultr
- world-id
