Inspiration

My grandmother had Alzheimer's. Every day, she'd ask "Where are my glasses?" and we'd search the house together. Sometimes she'd forget she already asked.

Worldwide, 55 million people live with dementia. Among the first cognitive functions to decline is episodic memory: the ability to remember WHAT happened, WHERE and WHEN it happened, WHO was there, and HOW it came about.

I built GEM to give that memory back.

What it does

GEM (Gemini Episodic Memory) implements Tulving's (1972) 5 dimensions of human episodic memory:

| Dimension | Human Example | GEM Implementation |
|-----------|---------------|--------------------|
| WHAT | "I saw my keys" | Object + activity detection |
| WHERE | "On the kitchen counter" | Scene location + spatial position |
| WHEN | "This morning around 8 AM" | Timestamps + time-based queries |
| WHO | "I was with John" | Audio names + visual person detection |
| HOW | "I put them there after shopping" | Movement tracking + causal narratives |
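
In code, a single episode can cover all five dimensions in one record. A minimal sketch (field names here are illustrative, not GEM's actual schema):

```python
from dataclasses import dataclass, field
from time import time

@dataclass
class Episode:
    """One episodic memory covering Tulving's five dimensions."""
    what: str                                      # object or activity, e.g. "keys"
    where: str                                     # scene location, e.g. "kitchen counter"
    when: float = field(default_factory=time)      # Unix timestamp
    who: list[str] = field(default_factory=list)   # names and/or visual descriptions
    how: str = ""                                  # causal narrative

ep = Episode(what="keys", where="kitchen counter", who=["John"],
             how="put down after returning from shopping")
```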

Core Features:

  • πŸ” "Where are my keys?" β†’ Shows location with photo and bounding box
  • πŸ’Š "Did I take my medication?" β†’ Activity detection confirms actions
  • πŸ‘₯ "Who did I meet today?" β†’ Names from audio + visual descriptions
  • πŸ’‘ Smart suggestions β†’ Suggests likely locations using Gemini's world knowledge

How we built it

Hardware ($50): Raspberry Pi Zero 2W + Whisplay HAT (camera, LCD, mic, speaker)

6 Gemini 3 Capabilities:

| Capability | Purpose |
|------------|---------|
| Vision | Object + activity + person detection |
| Speech-to-Text | Voice queries |
| NLU | Intent classification for all 5 dimensions |
| Text-to-Speech | Spoken responses |
| Audio Transcription | Extract names from conversations |
| Thinking Mode | Causal reasoning generation |

Key Architectural Decisions:

  • O(1) hash-based lookup (embeddings would not fit in 512MB RAM)
  • Temporal graph for movement tracking (HOW dimension)
  • Dual WHO detection: audio names + visual descriptions
  • Zero-shot detection for any object or activity
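
A rough sketch of the hash-based lookup idea (illustrative only, not GEM's actual code): a dict maps each object name to its sightings, so the latest location is an O(1) fetch, and the per-object history doubles as a simple temporal trail for the HOW dimension.

```python
from collections import defaultdict

class MemoryIndex:
    """O(1) hash-based recall: object name -> list of sightings (newest last)."""
    def __init__(self):
        self._index = defaultdict(list)

    def record(self, obj: str, location: str, timestamp: float):
        self._index[obj.lower()].append((timestamp, location))

    def where_is(self, obj: str):
        """Latest known location, or None if the object was never seen."""
        sightings = self._index.get(obj.lower())
        return sightings[-1][1] if sightings else None

    def movement(self, obj: str):
        """Chronological trail of locations (feeds the HOW dimension)."""
        return [loc for _, loc in self._index.get(obj.lower(), [])]

idx = MemoryIndex()
idx.record("Keys", "hallway table", 1.0)
idx.record("keys", "kitchen counter", 2.0)
idx.where_is("keys")   # -> "kitchen counter"
idx.movement("keys")   # -> ["hallway table", "kitchen counter"]
```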

Marathon Agent

GEM is designed as a Marathon Agent: an AI that runs autonomously for extended periods without user intervention.

  • Always-on daemon: Captures a memory every 10-30 seconds
  • Headless operation: python gem.py --headless runs on a battery-powered wearable
  • Persistent state: Memories survive restarts, indexed for instant O(1) recall
  • Self-managing: Automatic cleanup of old memories (mimics the human forgetting curve)
  • Hours of autonomy: Optimized for the Pi Zero 2W's limited 512MB RAM

The daemon continuously monitors the environment, building episodic memories in the background. Users can ask by voice at any time ("Where are my glasses?") and get an instant answer with a photo.
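
The loop behind this can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the real daemon also drives the camera and Gemini calls, and `CAPTURE_INTERVAL`, `MAX_AGE`, and the JSON store are hypothetical names, not GEM's actual configuration.

```python
import json
import time
from pathlib import Path

CAPTURE_INTERVAL = 20        # seconds between captures (10-30 s in practice)
MAX_AGE = 7 * 24 * 3600      # drop memories older than a week (forgetting curve)
STORE = Path("memories.json")

def load_memories():
    return json.loads(STORE.read_text()) if STORE.exists() else []

def save_memories(memories):
    STORE.write_text(json.dumps(memories))   # persistent: survives restarts

def prune(memories, now):
    """Automatic cleanup: keep only memories younger than MAX_AGE."""
    return [m for m in memories if now - m["when"] < MAX_AGE]

def daemon_loop(capture_fn, iterations=None):
    """Capture an episode every CAPTURE_INTERVAL seconds, pruning old ones."""
    memories = load_memories()
    while iterations is None or iterations > 0:
        now = time.time()
        memories.append(capture_fn(now))     # e.g. camera frame -> Gemini -> episode dict
        memories = prune(memories, now)
        save_memories(memories)
        if iterations is not None:
            iterations -= 1
        time.sleep(CAPTURE_INTERVAL)
```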

Challenges we ran into

  • 512MB RAM: Can't run embeddings → solved with hash-based indexing
  • Completing the WHO dimension: Added visual person detection ("man in blue shirt") linked with audio-extracted names
  • Activity vs. object: "Did I take my medication?" needs different handling than "Where are my pills?" → added dedicated activity detection
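
The activity-vs-object challenge boils down to routing each query to the right kind of memory. GEM uses Gemini's NLU for this; the keyword heuristic below is only a toy illustration of the split:

```python
def classify_intent(query: str) -> str:
    """Toy router: activity recall vs. object recall vs. person recall."""
    q = query.lower()
    if q.startswith("did i"):
        return "activity"   # e.g. "Did I take my medication?"
    if q.startswith("where"):
        return "object"     # e.g. "Where are my pills?"
    if q.startswith("who"):
        return "person"     # e.g. "Who did I meet today?"
    return "unknown"

classify_intent("Did I take my medication?")  # -> "activity"
classify_intent("Where are my pills?")        # -> "object"
```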

Accomplishments that we're proud of

  • All 5 Tulving dimensions implemented
  • 6 Gemini capabilities integrated
  • Zero-shot object AND activity detection
  • Dual WHO: audio names + visual descriptions
  • Runs on a $15 Raspberry Pi Zero 2W ($50 including the HAT)

What we learned

  • Gemini's box_2d is accurate for objects AND people
  • Tulving's 1972 framework maps naturally onto assistive memory
  • Edge AI success is about architecture, not hardware power

What's next for GEM

  • Smart glasses integration (camera + bone conduction speaker)
  • MedGemma integration for medical-grade memory assistance

Built With

  • gemini-3-audio-transcription
  • gemini-3-nlu
  • gemini-3-speech-to-text
  • gemini-3-text-to-speech
  • gemini-3-vision-api
  • numpy
  • picamera2
  • pil
  • python
  • raspberry-pi-zero-2w
  • tulving-episodic-memory
  • whisplay-hat