Inspiration
People living with dementia and Alzheimer’s disease often struggle with the most basic aspects of daily life.
- They may not recognize who just entered the room.
- Sometimes, they don’t even recognize their own family members.
- They forget to take essential medication.
Most assistive solutions assume a level of independence that simply isn't there: they expect the user to open an app, read a screen, or tap a button. What happens when someone can't do any of those?
We built Memex.AI because the people who need technology the most are often the ones least able to use it.
What It Does
Memex.AI is a two-component assistive system for dementia and Alzheimer’s patients.
Wearable Patient Device (Rubik Pi 3)
- Continuously scans for faces using a USB camera and announces who is in the room: "Aisha, your daughter, is in front of you"
- Listens for a wake word, then answers spoken questions and accepts voice commands through an agentic voice loop
- Supports agentic voice entry: the patient can say "remind me to take my pill at 9pm" and the agent parses, confirms, and saves it without any screen interaction
- Speaks scheduled medication reminders at the right time through earphones
- Runs entirely offline. No screen, no tapping, no internet required
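The announcement step above can be sketched as an embedding lookup: compare the detected face's embedding against each enrolled person and speak the best match. The names, the similarity threshold, and the data layout here are illustrative assumptions, not the device's actual code.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def announce(embedding, enrolled, threshold=0.45):
    """Return the spoken announcement for the closest enrolled person,
    or None if nobody scores above the similarity threshold.

    `enrolled` maps name -> (relationship, reference embedding); the
    0.45 threshold is an illustrative value, not the tuned one.
    """
    best_name, best_rel, best_score = None, None, threshold
    for name, (relationship, reference) in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_name, best_rel, best_score = name, relationship, score
    if best_name is None:
        return None
    return f"{best_name}, your {best_rel}, is in front of you"
```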
Caregiver iOS App (iPhone)
- Enrolls people by uploading 1 to 5 photos with a name and relationship
- Reads prescription photos using on-device OCR, then uses an on-device LLM to extract structured medication data
- Supports agentic voice entry for medications
- Instantly retrieves the patient's full current medication list, giving doctors a structured summary at the start of every visit and reducing the time spent asking "what are you currently taking?"
- Connects to the patient device over an encrypted Tailscale network
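The structuring step can be sketched as parsing the LLM's output into validated records before anything is saved. The name/dose/schedule field set is an assumed schema (the real one may differ); failing loudly on missing fields is what lets the caregiver correct a bad parse instead of silently storing it.

```python
import json
from dataclasses import dataclass

@dataclass
class Medication:
    name: str
    dose: str
    schedule: str  # e.g. "21:00" for a 9pm reminder

def parse_medications(llm_json: str):
    """Parse the LLM's JSON array into Medication records.

    Raises ValueError on missing fields so malformed output is surfaced
    to the caregiver for correction rather than silently saved.
    The field names (name/dose/schedule) are an assumed schema.
    """
    records = []
    for item in json.loads(llm_json):
        missing = {"name", "dose", "schedule"} - item.keys()
        if missing:
            raise ValueError(f"medication entry missing fields: {sorted(missing)}")
        records.append(Medication(item["name"], item["dose"], item["schedule"]))
    return records
```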
How We Built It
Patient Device — Rubik Pi 3 running a FastAPI server backed by MongoDB. InsightFace handles face detection and embedding. openWakeWord listens continuously for the wake word. Whisper tiny (ONNX) handles speech-to-text once triggered. An MCP agent then processes the query, calls the right tools (set reminder, get medications, add medication), and Kokoro TTS speaks the response through earphones.
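The agent's tool-calling step can be sketched as a plain dispatch table keyed by tool name. The tool names follow the ones listed above, but the registration decorator, the argument format, and the placeholder return values are illustrative assumptions, not the MCP implementation.

```python
# Minimal tool dispatch, assuming each tool takes a dict of arguments.
TOOLS = {}

def tool(fn):
    """Register a function as an agent-callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def set_reminder(args):
    return f"Reminder set for {args['time']}: {args['text']}"

@tool
def get_medications(args):
    return "Donepezil 10 mg at 21:00"  # placeholder data, not a real lookup

def call_tool(name, args):
    """Look up a tool by the name the LLM produced and invoke it."""
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    return TOOLS[name](args)
```

An `add_medication` tool would register the same way; keeping the table flat makes it easy for the agent loop to enumerate what it may call.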
iOS App — SwiftUI with Apple Vision Framework for OCR and SFSpeechRecognizer for voice input. We used the ZETIC Melange SDK to run gemma-4-E2B-it on-device via NPU for medication structuring, and MediaPipe Face Detection for face processing. All AI inference runs locally with no cloud calls.
Networking — Tailscale provides an encrypted peer-to-peer connection between the iPhone and the Pi with a fixed private IP, so the system works across any network.
Challenges We Faced
Rubik Pi 3
🎥 USB Camera Bandwidth
Raw YUYV frames were saturating the Pi's USB bus, causing the camera loop to drop frames and block other processes. Switching to the MJPEG codec, hardware-compressed in the camera itself, reduced bandwidth by roughly 10x and stabilized the entire pipeline.
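The roughly-10x figure can be sanity-checked with back-of-envelope arithmetic. The resolution, frame rate, and flat compression ratio below are assumed values for illustration, not measurements from the device.

```python
def usb_bandwidth_mbps(width, height, fps, bytes_per_pixel=2.0, compression=1.0):
    """Approximate camera bandwidth in megabits per second.

    YUYV carries 2 bytes per pixel uncompressed; MJPEG is modeled here
    as a flat compression factor (real ratios vary with scene content).
    """
    bytes_per_second = width * height * bytes_per_pixel * fps / compression
    return bytes_per_second * 8 / 1e6

raw_mbps = usb_bandwidth_mbps(640, 480, 30)                    # YUYV: ~147 Mbps
mjpeg_mbps = usb_bandwidth_mbps(640, 480, 30, compression=10)  # ~15 Mbps
```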
🧠 Face Recognition on CPU-Only ARM64
InsightFace's buffalo_l model was designed for GPU inference. Reaching roughly one-second latency on CPU-only ARM64 required keeping one shared model instance loaded across all requests and removing the Haar cascade pre-filter that was silently rejecting valid faces at odd angles.
🎧 Bluetooth Earphone Pairing on Headless Ubuntu
Pairing Bluetooth earphones to a headless ARM64 device with no GUI meant doing everything through bluetoothctl over CLI. Audio routing had to be configured manually through ALSA and PulseAudio to ensure aplay directed output to the paired device, and handling automatic reconnection after the device sleeps required scripting persistent pairing rules.
🔁 Agentic Loop Correction
The MCP voice agent occasionally entered runaway tool-calling loops, calling get_reminders repeatedly without terminating whenever the LLM was not confident in its response. We solved this by adding a max-iteration guard and tightening the system prompt to force the agent to always produce a final spoken response rather than looping back for more tool calls.
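A minimal sketch of the max-iteration guard, with `llm_step` standing in for one LLM call. The tuple protocol and the fallback wording are illustrative assumptions, not the MCP agent's real interface.

```python
MAX_TOOL_ITERATIONS = 5  # illustrative cap, not the tuned value

def run_agent(llm_step, max_iters=MAX_TOOL_ITERATIONS):
    """Drive the tool-calling loop with a hard iteration cap.

    `llm_step(history)` stands in for one LLM call and returns either
    ("tool", name, result) or ("final", text). If the cap is hit, the
    agent is forced to speak an answer from what it already gathered
    instead of looping forever.
    """
    history = []
    for _ in range(max_iters):
        step = llm_step(history)
        if step[0] == "final":
            return step[1]
        history.append(step)
    fallback = "Sorry, I couldn't finish that."
    if history:
        fallback += f" Here is what I found: {history[-1][2]}"
    return fallback
```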
🔒 Privacy-First Architecture Under Hackathon Pressure
Every decision, from local MongoDB to the Redis queue to the Whisper offline fallback, had to pass one hard constraint: does this ever send data out? That is a fundamentally different mindset from typical app development and added real overhead to every technical choice made under time pressure.
iOS App
⚡ ZETIC Melange NPU Access
The free tier restricts NPU usage, causing Gemma to fall back to CPU and take over 10 minutes per inference. We reached out on Discord and received a Pro+ access code mid-hackathon, which brought inference down to seconds.
💬 Gemma Chain-of-Thought Output
gemma-4-E2B-it outputs a <|channel|>thought thinking block before its actual response. We had to parse and strip this before extracting the JSON medication array.
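One robust way to strip the preamble is to scan for the first parseable JSON array rather than matching the thought-block markup, whose exact delimiters are model-specific. This sketch assumes the medication data always arrives as a JSON array somewhere after the thought text.

```python
import json

def extract_medication_array(raw: str):
    """Pull the first valid JSON array out of the model's raw output,
    skipping any <|channel|>thought preamble before it.

    Scanning for a parseable '[...]' avoids depending on the exact
    thought-block delimiters, which vary between models and versions.
    """
    decoder = json.JSONDecoder()
    start = raw.find("[")
    while start != -1:
        try:
            value, _ = decoder.raw_decode(raw[start:])
            if isinstance(value, list):
                return value
        except json.JSONDecodeError:
            pass
        start = raw.find("[", start + 1)
    raise ValueError("no JSON array found in model output")
```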
📄 TexTeller Was the Wrong Model
We initially tried ZETIC's TexTeller encoder/decoder for OCR, only to discover it is designed for mathematical formula recognition, not printed text. We pivoted to Apple Vision OCR paired with Gemma for structuring.
🗂️ Complex Prescription Formats
Table-style prescriptions with merged cells and multi-line drug names broke simple regex parsers. On-device LLM inference was the right solution once NPU access was unlocked.
What We Learned
- On-device AI is ready for real healthcare applications. Latency and privacy requirements make cloud inference a non-starter for this use case.
- ZETIC Melange dramatically simplifies NPU deployment across device families.
- The caregiver confirmation step before saving any medication is non-negotiable for a medical device. Automated parsing errors must never reach the patient.
- Voice-first, screen-free design is genuinely hard and genuinely important.
What's Next
- Deploy Whisper and InsightFace through ZETIC Melange on the Rubik Pi 3 for unified NPU-accelerated inference
- Add multi-language support for caregivers and patients
- Expand the agent to handle more complex caregiver queries
- Clinical validation with dementia care specialists