Inspiration

My grandmother had Alzheimer's. Every day, she'd ask "Where are my glasses?" and we'd search the house together. Sometimes she'd forget she already asked.

Worldwide, 55 million people live with dementia. Among the first cognitive functions to decline is episodic memory: the ability to remember WHAT happened, WHERE and WHEN it happened, WHO was there, and HOW it came about.

I built GEM to give that memory back.

What it does

GEM (Gemini Episodic Memory) implements Tulving's (1972) 5 dimensions of human episodic memory:

| Dimension | Human Example | GEM Implementation |
|-----------|---------------|--------------------|
| WHAT | "I saw my keys" | Object + activity detection |
| WHERE | "On the kitchen counter" | Scene location + spatial position |
| WHEN | "This morning around 8 AM" | Timestamps + time-based queries |
| WHO | "I was with John" | Audio names + visual person detection |
| HOW | "I put them there after shopping" | Movement tracking + causal narratives |
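
In code, a single episode can cover all five dimensions in one record. A minimal sketch (field names here are illustrative, not GEM's actual schema):

```python
from dataclasses import dataclass, field
from time import time

@dataclass
class Episode:
    """One episodic memory covering Tulving's five dimensions."""
    what: str                                      # object or activity, e.g. "keys"
    where: str                                     # scene location, e.g. "kitchen counter"
    when: float = field(default_factory=time)      # Unix timestamp
    who: list[str] = field(default_factory=list)   # names and/or visual descriptions
    how: str = ""                                  # causal narrative

ep = Episode(what="keys", where="kitchen counter", who=["John"],
             how="put down after returning from shopping")
```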

Core Features:

  • πŸ” "Where are my keys?" β†’ Shows location with photo and bounding box
  • πŸ’Š "Did I take my medication?" β†’ Activity detection confirms actions
  • πŸ‘₯ "Who did I meet today?" β†’ Names from audio + visual descriptions
  • πŸ’‘ Smart suggestions β†’ Suggests likely locations using Gemini's world knowledge

How we built it

Hardware ($50): Raspberry Pi Zero 2W + Whisplay HAT (camera, LCD, mic, speaker)

6 Gemini 3 Capabilities:

| Capability | Purpose |
|------------|---------|
| Vision | Object + activity + person detection |
| Speech-to-Text | Voice queries |
| NLU | Intent classification for all 5 dimensions |
| Text-to-Speech | Spoken responses |
| Audio Transcription | Extract names from conversations |
| Thinking Mode | Causal reasoning generation |

Key Architectural Decisions:

  • O(1) hash-based lookup (embeddings would not fit in 512MB RAM)
  • Temporal graph for movement tracking (HOW dimension)
  • Dual WHO detection: audio names + visual descriptions
  • Zero-shot detection for any object or activity
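
A rough sketch of the hash-based lookup idea (illustrative only, not GEM's actual code): a dict maps each object name to its sightings, so the latest location is an O(1) fetch, and the per-object history doubles as a simple temporal trail for the HOW dimension.

```python
from collections import defaultdict

class MemoryIndex:
    """O(1) hash-based recall: object name -> list of sightings (newest last)."""
    def __init__(self):
        self._index = defaultdict(list)

    def record(self, obj: str, location: str, timestamp: float):
        self._index[obj.lower()].append((timestamp, location))

    def where_is(self, obj: str):
        """Latest known location, or None if the object was never seen."""
        sightings = self._index.get(obj.lower())
        return sightings[-1][1] if sightings else None

    def movement(self, obj: str):
        """Chronological trail of locations (feeds the HOW dimension)."""
        return [loc for _, loc in self._index.get(obj.lower(), [])]

idx = MemoryIndex()
idx.record("Keys", "hallway table", 1.0)
idx.record("keys", "kitchen counter", 2.0)
idx.where_is("keys")   # -> "kitchen counter"
idx.movement("keys")   # -> ["hallway table", "kitchen counter"]
```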

Marathon Agent

GEM is designed as a Marathon Agent: an AI that runs autonomously for extended periods without user intervention.

  • Always-on daemon: Captures a memory every 10-30 seconds
  • Headless operation: python gem.py --headless runs on a battery-powered wearable
  • Persistent state: Memories survive restarts, indexed for instant O(1) recall
  • Self-managing: Automatic cleanup of old memories (mimics the human forgetting curve)
  • Hours of autonomy: Optimized for the Pi Zero 2W's limited 512MB RAM

The daemon continuously monitors the environment, building episodic memories in the background. Users can ask by voice at any time ("Where are my glasses?") and get an instant answer with a photo.
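
The loop behind this can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the real daemon also drives the camera and Gemini calls, and `CAPTURE_INTERVAL`, `MAX_AGE`, and the JSON store are hypothetical names, not GEM's actual configuration.

```python
import json
import time
from pathlib import Path

CAPTURE_INTERVAL = 20        # seconds between captures (10-30 s in practice)
MAX_AGE = 7 * 24 * 3600      # drop memories older than a week (forgetting curve)
STORE = Path("memories.json")

def load_memories():
    return json.loads(STORE.read_text()) if STORE.exists() else []

def save_memories(memories):
    STORE.write_text(json.dumps(memories))   # persistent: survives restarts

def prune(memories, now):
    """Automatic cleanup: keep only memories younger than MAX_AGE."""
    return [m for m in memories if now - m["when"] < MAX_AGE]

def daemon_loop(capture_fn, iterations=None):
    """Capture an episode every CAPTURE_INTERVAL seconds, pruning old ones."""
    memories = load_memories()
    while iterations is None or iterations > 0:
        now = time.time()
        memories.append(capture_fn(now))     # e.g. camera frame -> Gemini -> episode dict
        memories = prune(memories, now)
        save_memories(memories)
        if iterations is not None:
            iterations -= 1
        time.sleep(CAPTURE_INTERVAL)
```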

Challenges we ran into

  • 512MB RAM: Can't run embeddings → solved with hash-based indexing
  • Completing the WHO dimension: Added visual person detection ("man in blue shirt") linked with audio-extracted names
  • Activity vs. object: "Did I take my medication?" needs different handling than "Where are my pills?" → added dedicated activity detection
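
The activity-vs-object challenge boils down to routing each query to the right kind of memory. GEM uses Gemini's NLU for this; the keyword heuristic below is only a toy illustration of the split:

```python
def classify_intent(query: str) -> str:
    """Toy router: activity recall vs. object recall vs. person recall."""
    q = query.lower()
    if q.startswith("did i"):
        return "activity"   # e.g. "Did I take my medication?"
    if q.startswith("where"):
        return "object"     # e.g. "Where are my pills?"
    if q.startswith("who"):
        return "person"     # e.g. "Who did I meet today?"
    return "unknown"

classify_intent("Did I take my medication?")  # -> "activity"
classify_intent("Where are my pills?")        # -> "object"
```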

Accomplishments that we're proud of

  • All 5 Tulving dimensions implemented
  • 6 Gemini capabilities integrated
  • Zero-shot object AND activity detection
  • Dual WHO: audio names + visual descriptions
  • Runs on a $15 Raspberry Pi Zero 2W ($50 including the HAT)

What we learned

  • Gemini's box_2d is accurate for objects AND people
  • Tulving's 1972 framework maps naturally onto assistive memory
  • Edge AI success is about architecture, not hardware power

What's next for GEM

  • Smart glasses integration (camera + bone conduction speaker)
  • MedGemma integration for medical-grade memory assistance

Built With

  • gemini-3-audio-transcription
  • gemini-3-nlu
  • gemini-3-speech-to-text
  • gemini-3-text-to-speech
  • gemini-3-vision-api
  • numpy
  • picamera2
  • pil
  • python
  • raspberry-pi-zero-2w
  • tulving-episodic-memory
  • whisplay-hat