Inspiration

Emory sprouted from our love for family members whose worlds became quieter, not because they stopped caring, but because the connection between a face and everything it used to mean could slip mid-conversation. Those small pauses, spent reaching for details you once knew by heart, carry weight for everyone involved. Emory is our attempt to meet that moment gently: a steady hand that reminds those living with neurodegenerative disease of the people and conversations that matter most to them.

What it does

Emory is a memory assistant: when someone you’ve enrolled appears in the Meta Ray-Ban glasses' camera, it can start an encounter, record conversation-linked audio while they’re in view, transcribe it, and extract structured memories tied to that person. You can then use the app to retrieve information about people, their related conversations, and their relationships to you.

Flow: glasses → phone relay → desktop → phone relay → playback through the Ray-Bans via Bluetooth
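The relay hop above can be sketched as a small set of message types flowing over the phone–desktop link. The names and fields here are illustrative, not the real protocol:

```typescript
// Hypothetical message envelope for the phone ↔ desktop relay hop.
// Kind/field names are illustrative, not taken from the actual codebase.
type RelayMessage =
  | { kind: "frame"; encounterId: string; jpegBase64: string; ts: number } // camera → desktop
  | { kind: "audio"; encounterId: string; pcmBase64: string; ts: number }  // mic → desktop
  | { kind: "tts"; encounterId: string; audioUrl: string };                // desktop → glasses

// Decide which leg of the pipeline should receive a message:
// capture flows toward the desktop "brain"; synthesized speech
// flows back toward the phone for Bluetooth playback on the glasses.
function routeToward(msg: RelayMessage): "desktop" | "phone" {
  return msg.kind === "tts" ? "phone" : "desktop";
}
```

In practice each leg would ride on a WebSocket connection, with the phone acting as a dumb forwarder and the desktop doing all recognition and extraction work.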

How we built it

Monorepo (Turborepo):

  • Electron + React desktop app
  • ONNX SCRFD + ArcFace face pipeline in @emory/core
  • SQLite via @emory/db for people, embeddings, encounters, recordings, and memories
  • Deepgram (STT), OpenRouter (memory extraction + grounded Q&A), Cartesia (TTS)

Still in progress: the iOS relay for Meta glasses streaming, WebSocket/WebRTC-style streaming between phone and desktop, and the finalized server–client split (desktop/hub as the long-running “brain,” phone as AV bridge).
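At its core, recognizing an enrolled person comes down to comparing a fresh ArcFace embedding against stored ones. A minimal sketch of that matching logic, where the threshold value and helper names are assumptions rather than the real @emory/core code:

```typescript
// Cosine similarity between two face embeddings (ArcFace outputs 512-dim vectors).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface Person { id: string; name: string; embedding: number[] }

// Return the best-matching enrolled person, or null if nothing clears the cutoff.
// 0.35 is a placeholder; real thresholds are tuned per model and capture conditions.
function matchFace(probe: number[], people: Person[], threshold = 0.35): Person | null {
  let best: Person | null = null;
  let bestSim = threshold;
  for (const p of people) {
    const sim = cosine(probe, p.embedding);
    if (sim > bestSim) { bestSim = sim; best = p; }
  }
  return best;
}
```

A match then keys the running encounter (audio, transcript, extracted memories) to that person's row in SQLite.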

Challenges we ran into

  • Face recognition accuracy under real lighting, angles, and motion: tuning enrollment, thresholds, and quality gating.
  • Settling the server–client architecture: deciding which work lives on the desktop “brain” versus the phone relay, and how the two stay in sync.
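The quality gating mentioned above can be as simple as rejecting detections that are too small, too blurry, or too low-confidence before they ever reach the embedding step. A hypothetical sketch; every field and cutoff here is illustrative, not from our code:

```typescript
// Hypothetical per-frame detection record from an SCRFD-style face detector.
interface Detection {
  score: number;   // detector confidence, 0..1
  width: number;   // face box width in pixels
  height: number;  // face box height in pixels
  blur: number;    // blur metric, 0 (sharp) .. 1 (very blurry)
}

// Gate out frames unlikely to yield a reliable embedding.
// All thresholds are placeholders; real values come from tuning on live footage.
function passesQualityGate(d: Detection): boolean {
  if (d.score < 0.6) return false;                  // low detector confidence
  if (d.width < 80 || d.height < 80) return false;  // face too small in frame
  if (d.blur > 0.5) return false;                   // motion blur / defocus
  return true;
}
```

Gating like this trades a little recall for much cleaner embeddings, which matters more than catching every frame when the camera is on someone's head.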

Accomplishments that we're proud of

  • Seamless facial recognition and LLM memory extraction
  • Automatic facial recognition training

What we learned

  • Working with the Meta SDK is challenging: it requires an extensive access-approval process and is very restrictive about hardware usage (mic, camera).
  • Encoding and decoding video frames with compression and optimization to support low latency and stable frame rates.
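One lesson from the low-latency point above: for a live relay, it is better to drop stale frames than to queue them. A small sketch of that policy (names and the staleness budget are illustrative):

```typescript
// For live streaming, latency beats completeness: send the newest
// frame that is still "fresh" and silently drop everything older.
// The 150 ms budget is a placeholder, not a tuned value.
function pickFrameToSend(
  queue: { ts: number }[],
  nowMs: number,
  maxAgeMs = 150
): { ts: number } | null {
  // Keep only frames young enough to still feel live.
  const fresh = queue.filter(f => nowMs - f.ts <= maxAgeMs);
  // Newest fresh frame wins; stale backlog is discarded, not replayed.
  return fresh.length ? fresh[fresh.length - 1] : null;
}
```

Combined with aggressive compression on the phone side, dropping stale frames keeps the perceived frame rate stable even when the link hiccups.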

What's next for Emory

  • Push into deep conversational contextualization: beyond memories keyed to one recognized face, toward speaker-aware, multi-party dialogue—so a single rich moment can produce correctly attributed memories across several speakers (who said what, who it matters to, how each person’s graph updates)
  • Expand assistance to learn a user's daily tasks and routine, giving helpful reminders of how to complete those tasks when those routines are interrupted.

Built With
