## Inspiration
My grandmother had Alzheimer's. Every day, she'd ask "Where are my glasses?" and we'd search the house together. Sometimes she'd forget she already asked.
55 million people worldwide live with dementia. The first cognitive function to decline is episodic memory: the ability to remember WHAT happened, WHERE and WHEN it happened, WHO you were with, and HOW things happened.
I built GEM to give that memory back.
## What it does
GEM (Gemini Episodic Memory) implements Tulving's (1972) 5 dimensions of human episodic memory:
| Dimension | Human Example | GEM Implementation |
|---|---|---|
| WHAT | "I saw my keys" | Object + Activity detection |
| WHERE | "On the kitchen counter" | Scene location + spatial position |
| WHEN | "This morning around 8 AM" | Timestamps + time-based queries |
| WHO | "I was with John" | Audio names + visual person detection |
| HOW | "I put them there after shopping" | Movement tracking + causal narratives |
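The five dimensions above can be sketched as a single record type. This is a minimal illustration, not GEM's actual schema; all field names are assumptions:

```python
from dataclasses import dataclass, field
import time

@dataclass
class Episode:
    """One episodic memory covering Tulving's five dimensions."""
    what: str                                        # object or activity, e.g. "keys"
    where: str                                       # scene location, e.g. "kitchen counter"
    when: float = field(default_factory=time.time)   # Unix timestamp of the capture
    who: list[str] = field(default_factory=list)     # audio names and/or visual descriptions
    how: str = ""                                    # causal narrative, e.g. "placed after shopping"

ep = Episode(what="glasses", where="kitchen counter", who=["John"])
```

Keeping each dimension as its own field is what makes the later dimension-specific queries ("WHERE are my keys?", "WHO did I meet?") straightforward to route.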
Core Features:
- 🔍 "Where are my keys?" → Shows the location with a photo and bounding box
- 💊 "Did I take my medication?" → Activity detection confirms actions
- 👥 "Who did I meet today?" → Names from audio + visual descriptions
- 💡 Smart suggestions → Suggests likely locations using Gemini's world knowledge
## How we built it
Hardware ($50): Raspberry Pi Zero 2W + Whisplay HAT (camera, LCD, mic, speaker)
6 Gemini 3 Capabilities:
| Capability | Purpose |
|---|---|
| Vision | Object + activity + person detection |
| Speech-to-Text | Voice queries |
| NLU | Intent classification for all 5 dimensions |
| Text-to-Speech | Spoken responses |
| Audio Transcription | Extract names from conversations |
| Thinking Mode | Causal reasoning generation |
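One way to picture how these capabilities chain together for a single voice query is as a four-stage pipeline. The function bodies below are stand-ins for the real Gemini calls, so every name here is an assumption:

```python
def answer_query(audio_bytes: bytes, stt, nlu, recall, tts) -> str:
    """Voice query pipeline: Speech-to-Text -> NLU -> memory lookup -> Text-to-Speech.
    Each callable stands in for one capability from the table above."""
    text = stt(audio_bytes)       # transcribe the spoken question
    intent = nlu(text)            # classify which of the 5 dimensions is asked about
    answer = recall(intent, text) # look the answer up in the memory index
    return tts(answer)            # speak the answer back to the user
```

Passing the stages in as callables keeps the pipeline testable offline, with the Gemini-backed implementations injected on the device.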
Key Architectural Decisions:
- O(1) hash-based lookup (no embedding model needed; fits in 512 MB of RAM)
- Temporal graph for movement tracking (HOW dimension)
- Dual WHO detection: audio names + visual descriptions
- Zero-shot detection for any object or activity
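The O(1) lookup decision can be illustrated with a plain hash map keyed on normalized object names. This is a minimal sketch under that assumption, not GEM's actual index code:

```python
from collections import defaultdict

class MemoryIndex:
    """Hash-based index: object name -> list of sightings (newest last).
    Average O(1) insert and lookup with no embedding model or vector math,
    which is why it fits in the Pi Zero 2W's 512 MB of RAM."""

    def __init__(self):
        self._by_object = defaultdict(list)

    def record(self, obj: str, location: str, timestamp: float) -> None:
        # Normalize the key so "Keys" and "keys" hit the same bucket.
        self._by_object[obj.lower().strip()].append((timestamp, location))

    def last_seen(self, obj: str):
        sightings = self._by_object.get(obj.lower().strip())
        return sightings[-1] if sightings else None

idx = MemoryIndex()
idx.record("Keys", "kitchen counter", 1000.0)
idx.record("keys", "hallway table", 2000.0)
print(idx.last_seen("keys"))  # (2000.0, 'hallway table')
```

Because each object's sightings are kept in insertion order, the same structure also yields the movement history the temporal graph needs for the HOW dimension.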
## Marathon Agent
GEM is designed as a Marathon Agent: an AI that runs autonomously for extended periods without user intervention.
- Always-on daemon: Captures memories every 10-30 seconds continuously
- Headless operation: `python gem.py --headless` runs on a battery-powered wearable
- Persistent state: memories survive restarts, indexed for instant O(1) recall
- Self-managing: Automatic cleanup of old memories (mimics human forgetting curve)
- Hours of autonomy: optimized for the Pi Zero 2W's limited 512 MB of RAM
The daemon continuously monitors the environment, building episodic memories in the background. Users query anytime with voice: "Where are my glasses?" and get instant answers with photos.
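A minimal sketch of that capture-and-forget loop follows. The interval, the retention window, and `capture_memory` are illustrative placeholders, not GEM's real values or functions:

```python
import time

CAPTURE_INTERVAL_S = 20        # somewhere in the 10-30 s range stated above
RETENTION_S = 7 * 24 * 3600    # illustrative one-week retention window

def cleanup(memories: list[dict], now: float) -> list[dict]:
    """Drop memories older than the retention window (a crude forgetting curve)."""
    return [m for m in memories if now - m["when"] <= RETENTION_S]

def daemon_loop(capture_memory, memories: list[dict], iterations: int,
                interval: float = CAPTURE_INTERVAL_S) -> list[dict]:
    """Background loop: snap a memory, prune stale ones, sleep, repeat."""
    for _ in range(iterations):
        memories.append(capture_memory())
        memories = cleanup(memories, time.time())
        time.sleep(interval)
    return memories
```

Pruning inside the loop keeps memory usage bounded over hours of autonomy instead of growing without limit.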
## Challenges we ran into
- 512 MB of RAM: not enough to run an embedding model → solved with hash-based indexing
- Completing the WHO dimension: added visual person detection ("man in blue shirt") linked with audio-extracted names
- Activity vs. object queries: "Did I take my medication?" differs from "Where are my pills?" → added dedicated activity detection
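The activity-vs-object distinction can be sketched with a toy rule-based router. GEM itself uses Gemini's NLU for this step, so the keyword lists below are purely illustrative:

```python
# Hypothetical cue lists for a toy intent router (GEM's real NLU is Gemini-based).
ACTIVITY_CUES = ("did i", "have i", "was i")   # questions about past actions
LOCATION_CUES = ("where",)                     # questions about object locations

def classify_query(query: str) -> str:
    """Route a query to activity detection or location lookup."""
    q = query.lower().strip()
    if q.startswith(ACTIVITY_CUES):
        return "activity"   # e.g. "Did I take my medication?"
    if any(cue in q for cue in LOCATION_CUES):
        return "location"   # e.g. "Where are my pills?"
    return "unknown"

print(classify_query("Did I take my medication?"))  # activity
print(classify_query("Where are my pills?"))        # location
```

The point is that the two query classes need different back ends: one checks detected activities in the timeline, the other hits the object index.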
## Accomplishments that we're proud of
- All 5 Tulving dimensions implemented
- 6 Gemini capabilities integrated
- Zero-shot object AND activity detection
- Dual WHO: audio names + visual descriptions
- Runs on ~$50 of hardware (the Pi Zero 2W board itself costs $15)
## What we learned
- Gemini's `box_2d` is accurate for objects AND people
- Tulving's 1972 framework maps perfectly to assistive memory
- Edge AI success is about architecture, not hardware power
## What's next for GEM
- Smart glasses integration (camera + bone conduction speaker)
- MedGemma integration for medical-grade memory assistance