Inspiration
People living with dementia and Alzheimer’s disease often struggle with the most basic aspects of daily life.
- They may not recognize who just entered the room.
- Sometimes, they don’t even recognize their own family members.
- They forget to take essential medication.
Most assistive solutions assume a level of independence that simply isn't there: they expect the user to open an app, read a screen, or tap a button. What happens when someone can't do any of those?
We built Memex.AI because the people who need technology the most are often the ones least able to use it.
What It Does
Memex.AI is a two-component assistive system for dementia and Alzheimer’s patients.
Wearable Patient Device (Rubik Pi 3)
- Continuously scans for faces using a USB camera and announces who is in the room: "Aisha, your daughter, is in front of you"
- Listens for a wake word, then answers spoken questions and accepts voice commands through an agentic voice loop
- Supports agentic voice entry: the patient can say "remind me to take my pill at 9pm" and the agent parses, confirms, and saves it without any screen interaction
- Speaks scheduled medication reminders at the right time through earphones
- Runs entirely offline. No screen, no tapping, no internet required
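The announcement step above can be sketched as an embedding lookup: compare the detected face's embedding against each enrolled person and speak the best match. The names, the similarity threshold, and the data layout here are illustrative assumptions, not the device's actual code.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def announce(embedding, enrolled, threshold=0.45):
    """Return the spoken announcement for the closest enrolled person,
    or None if nobody scores above the similarity threshold.

    `enrolled` maps name -> (relationship, reference embedding); the
    0.45 threshold is an illustrative value, not the tuned one.
    """
    best_name, best_rel, best_score = None, None, threshold
    for name, (relationship, reference) in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_name, best_rel, best_score = name, relationship, score
    if best_name is None:
        return None
    return f"{best_name}, your {best_rel}, is in front of you"
```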
Caregiver iOS App (iPhone)
- Enrolls people by uploading 1 to 5 photos with a name and relationship
- Reads prescription photos using on-device OCR, then uses an on-device LLM to extract structured medication data
- Supports agentic voice entry for medications
- Instantly retrieves the patient's full current medication list, giving doctors a structured summary at the start of every visit and reducing the time spent asking "what are you currently taking?"
- Connects to the patient device over an encrypted Tailscale network
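The structuring step can be sketched as parsing the LLM's output into validated records before anything is saved. The name/dose/schedule field set is an assumed schema (the real one may differ); failing loudly on missing fields is what lets the caregiver correct a bad parse instead of silently storing it.

```python
import json
from dataclasses import dataclass

@dataclass
class Medication:
    name: str
    dose: str
    schedule: str  # e.g. "21:00" for a 9pm reminder

def parse_medications(llm_json: str):
    """Parse the LLM's JSON array into Medication records.

    Raises ValueError on missing fields so malformed output is surfaced
    to the caregiver for correction rather than silently saved.
    The field names (name/dose/schedule) are an assumed schema.
    """
    records = []
    for item in json.loads(llm_json):
        missing = {"name", "dose", "schedule"} - item.keys()
        if missing:
            raise ValueError(f"medication entry missing fields: {sorted(missing)}")
        records.append(Medication(item["name"], item["dose"], item["schedule"]))
    return records
```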
How We Built It
Patient Device — Rubik Pi 3 running a FastAPI server backed by MongoDB. InsightFace handles face detection and embedding. openWakeWord listens continuously for the wake word. Whisper tiny (ONNX) handles speech-to-text once triggered. An MCP agent then processes the query, calls the right tools (set reminder, get medications, add medication), and Kokoro TTS speaks the response through earphones.
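The agent's tool-calling step can be sketched as a plain dispatch table keyed by tool name. The tool names follow the ones listed above, but the registration decorator, the argument format, and the placeholder return values are illustrative assumptions, not the MCP implementation.

```python
# Minimal tool dispatch, assuming each tool takes a dict of arguments.
TOOLS = {}

def tool(fn):
    """Register a function as an agent-callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def set_reminder(args):
    return f"Reminder set for {args['time']}: {args['text']}"

@tool
def get_medications(args):
    return "Donepezil 10 mg at 21:00"  # placeholder data, not a real lookup

def call_tool(name, args):
    """Look up a tool by the name the LLM produced and invoke it."""
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    return TOOLS[name](args)
```

An `add_medication` tool would register the same way; keeping the table flat makes it easy for the agent loop to enumerate what it may call.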
iOS App — SwiftUI with Apple Vision Framework for OCR and SFSpeechRecognizer for voice input. We used the ZETIC Melange SDK to run gemma-4-E2B-it on-device via NPU for medication structuring, and MediaPipe Face Detection for face processing. All AI inference runs locally with no cloud calls.
Networking — Tailscale provides an encrypted peer-to-peer connection between the iPhone and the Pi with a fixed private IP, so the system works across any network.
Challenges We Faced
Rubik Pi 3
🎥 USB Camera Bandwidth
Raw YUYV frames were saturating the Pi's USB bus, causing the camera loop to drop frames and block other processes. Switching to the MJPEG codec, hardware-compressed in the camera itself, reduced bandwidth by roughly 10x and stabilized the entire pipeline.
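The roughly-10x figure can be sanity-checked with back-of-envelope arithmetic. The resolution, frame rate, and flat compression ratio below are assumed values for illustration, not measurements from the device.

```python
def usb_bandwidth_mbps(width, height, fps, bytes_per_pixel=2.0, compression=1.0):
    """Approximate camera bandwidth in megabits per second.

    YUYV carries 2 bytes per pixel uncompressed; MJPEG is modeled here
    as a flat compression factor (real ratios vary with scene content).
    """
    bytes_per_second = width * height * bytes_per_pixel * fps / compression
    return bytes_per_second * 8 / 1e6

raw_mbps = usb_bandwidth_mbps(640, 480, 30)                    # YUYV: ~147 Mbps
mjpeg_mbps = usb_bandwidth_mbps(640, 480, 30, compression=10)  # ~15 Mbps
```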
🧠 Face Recognition on CPU-Only ARM64
InsightFace's buffalo_l model was designed for GPU inference. Reaching roughly one-second latency on CPU-only ARM64 required keeping one shared model instance loaded across all requests and removing the Haar cascade pre-filter that was silently rejecting valid faces at odd angles.
🎧 Bluetooth Earphone Pairing on Headless Ubuntu
Pairing Bluetooth earphones to a headless ARM64 device with no GUI meant doing everything through bluetoothctl over CLI. Audio routing had to be configured manually through ALSA and PulseAudio to ensure aplay directed output to the paired device, and handling automatic reconnection after the device sleeps required scripting persistent pairing rules.
🔁 Agentic Loop Correction
The MCP voice agent occasionally entered runaway tool-calling loops, calling get_reminders repeatedly without terminating whenever the LLM was not confident in its response. We solved this by adding a max-iteration guard and tightening the system prompt to force the agent to always produce a final spoken response rather than looping back for more tool calls.
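A minimal sketch of the max-iteration guard, with `llm_step` standing in for one LLM call. The tuple protocol and the fallback wording are illustrative assumptions, not the MCP agent's real interface.

```python
MAX_TOOL_ITERATIONS = 5  # illustrative cap, not the tuned value

def run_agent(llm_step, max_iters=MAX_TOOL_ITERATIONS):
    """Drive the tool-calling loop with a hard iteration cap.

    `llm_step(history)` stands in for one LLM call and returns either
    ("tool", name, result) or ("final", text). If the cap is hit, the
    agent is forced to speak an answer from what it already gathered
    instead of looping forever.
    """
    history = []
    for _ in range(max_iters):
        step = llm_step(history)
        if step[0] == "final":
            return step[1]
        history.append(step)
    fallback = "Sorry, I couldn't finish that."
    if history:
        fallback += f" Here is what I found: {history[-1][2]}"
    return fallback
```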
🔒 Privacy-First Architecture Under Hackathon Pressure
Every decision, from local MongoDB to the Redis queue to the Whisper offline fallback, had to pass one hard constraint: does this ever send data out? That is a fundamentally different mindset from typical app development and added real overhead to every technical choice made under time pressure.
iOS App
⚡ ZETIC Melange NPU Access
The free tier restricts NPU usage, causing Gemma to fall back to CPU and take over 10 minutes per inference. We reached out on Discord and received a Pro+ access code mid-hackathon, which brought inference down to seconds.
💬 Gemma Chain-of-Thought Output
gemma-4-E2B-it outputs a <|channel|>thought thinking block before its actual response. We had to parse and strip this before extracting the JSON medication array.
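One robust way to strip the preamble is to scan for the first parseable JSON array rather than matching the thought-block markup, whose exact delimiters are model-specific. This sketch assumes the medication data always arrives as a JSON array somewhere after the thought text.

```python
import json

def extract_medication_array(raw: str):
    """Pull the first valid JSON array out of the model's raw output,
    skipping any <|channel|>thought preamble before it.

    Scanning for a parseable '[...]' avoids depending on the exact
    thought-block delimiters, which vary between models and versions.
    """
    decoder = json.JSONDecoder()
    start = raw.find("[")
    while start != -1:
        try:
            value, _ = decoder.raw_decode(raw[start:])
            if isinstance(value, list):
                return value
        except json.JSONDecodeError:
            pass
        start = raw.find("[", start + 1)
    raise ValueError("no JSON array found in model output")
```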
📄 TexTeller Was the Wrong Model
We initially tried ZETIC's TexTeller encoder/decoder for OCR, only to discover it is designed for mathematical formula recognition, not printed text. We pivoted to Apple Vision OCR paired with Gemma for structuring.
🗂️ Complex Prescription Formats
Table-style prescriptions with merged cells and multi-line drug names broke simple regex parsers. On-device LLM inference was the right solution once NPU access was unlocked.
What We Learned
- On-device AI is ready for real healthcare applications. Latency and privacy requirements make cloud inference a non-starter for this use case.
- ZETIC Melange dramatically simplifies NPU deployment across device families.
- The caregiver confirmation step before saving any medication is non-negotiable for a medical device. Automated parsing errors must never reach the patient.
- Voice-first, screen-free design is genuinely hard and genuinely important.
What's Next
- Deploy Whisper and InsightFace through ZETIC Melange on the Rubik Pi 3 for unified NPU-accelerated inference
- Add multi-language support for caregivers and patients
- Expand the agent to handle more complex caregiver queries
- Clinical validation with dementia care specialists