Inspiration

One of our team members has nurses in the family and has heard their frustrations for years: constantly juggling devices while trying to focus on patients, breaking sterile technique to look up information, never enough instructor time for students. During a hospital visit, we watched it happen: a nurse interrupted patient care more than five times in one hour to verify medications and check allergies, then struggled to access records while wearing sterile gloves, forced to choose between breaking sterile technique and asking for help. Nursing students practiced on mannequins without feedback, waiting for instructors stretched too thin. Healthcare workers spend more time managing information systems than with patients. We asked: what if clinical knowledge were hands-free, and training happened with AR guidance instead of textbooks? We built MediSnap for the nurses in our family, and for every nurse who deserves better.

What it does

MediSnap transforms Snap Spectacles into a hands-free AI medical assistant with two core modes:

Training Mode

Provides immersive, real-time guidance for learning medical procedures. While our demo focuses on pulse-taking, the architecture supports any physical skill: IV insertion, wound care, CPR, catheterization. When activated by voice, it:

  • Displays AR overlays with anatomical markers and directional arrows showing exactly where to position hands
  • Provides step-by-step voice instructions that keep eyes on the patient
  • Delivers live technique feedback using MediaPipe Hands computer vision with multi-factor pressure detection

Clinical Mode

Serves practicing healthcare workers by:

  • Retrieving patient records in <3 seconds via voice command and displaying AR patient cards with critical data
  • Processing voice-documented symptoms in real-time and generating AI diagnoses with clinical reasoning
  • Performing automated drug interaction and allergy checking and blocking dangerous prescriptions
  • Maintaining conversation context across entire patient sessions

The system scales from bedside care to home health, emergency medicine, and telemedicine. To our knowledge, MediSnap is the only solution combining hands-free voice interaction, intelligent AR overlays, AI clinical reasoning, and dual training/clinical functionality.

How we built it

Architecture: Snap Spectacles with custom Lens Studio app → Node.js/Express backend on Railway with LettaCloud/FishAudio/Gemini → Supabase PostgreSQL database

Snap Spectacles

We pushed Snap Spectacles beyond standard AR filters by using the MediaPipe Hands computer vision model to detect hand landmarks in real time. We designed three types of medical-optimized overlays:

  • Pulsing circles for pulse points, injection sites, measurement locations
  • 3D arrows showing optimal hand positioning and movement paths
  • Floating patient data displays in upper peripheral vision
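These overlays are data-driven, so adding a procedure means adding descriptor entries rather than new rendering code. A minimal sketch of that descriptor shape (field names here are illustrative, not actual Lens Studio APIs):

```typescript
// Sketch of overlay descriptors as plain data. A new procedure defines
// its landmarks and steps; the renderer interprets the descriptors.
// Field names are illustrative, not Lens Studio's API.
type Overlay =
  | { kind: "pulse-circle"; landmark: string; radiusCm: number }
  | { kind: "arrow-3d"; from: string; to: string }
  | { kind: "data-card"; anchor: "upper-left" | "upper-right"; fields: string[] };

// Example: the overlays for one step of pulse-taking training.
const radialPulseStep: Overlay[] = [
  { kind: "pulse-circle", landmark: "radial-artery", radiusCm: 1.5 },
  { kind: "arrow-3d", from: "index-fingertip", to: "radial-artery" },
];
```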

Fish Audio

Our AR audio optimizations include:

  • Different voices for different agents (a calm "Training Coach" and confident "Clinical Advisor") to help users mentally separate contexts
  • Volume normalization for noisy clinical environments
  • Streaming integration for <800ms time-to-first-word latency

Letta Stateful Agent

Letta powers dual memory systems for patient clinical history and student training progress:

  • Maintains decades of history per patient with a rolling window (20 turns / 4,000 tokens)
  • Preserves full context when nurses hand off patients between shifts
  • Stores finger placement corrections and technique feedback from CV
  • Considers complete patient history for clinical recommendations
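The rolling-window policy above can be sketched roughly as follows. Letta manages this internally; the types and the crude token estimate here are illustrative, not Letta's actual API:

```typescript
// Illustrative sketch of a 20-turn / 4,000-token rolling window.
interface Turn { role: "user" | "assistant"; text: string }

const MAX_TURNS = 20;
const MAX_TOKENS = 4000;

// Crude token estimate (~4 characters per token), for illustration only.
const estimateTokens = (t: Turn): number => Math.ceil(t.text.length / 4);

function rollWindow(history: Turn[]): Turn[] {
  // Keep at most the last MAX_TURNS turns...
  let window = history.slice(-MAX_TURNS);
  let total = window.reduce((n, t) => n + estimateTokens(t), 0);
  // ...then drop the oldest turns until the token budget is met.
  while (window.length > 1 && total > MAX_TOKENS) {
    total -= estimateTokens(window[0]);
    window = window.slice(1);
  }
  return window;
}
```

Older turns falling out of the window are not lost; they are compressed into long-term memory so full patient context survives shift handoffs.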

Gemini API

Gemini powers clinical reasoning by processing structured data (vital signs, medications, labs) alongside unstructured voice-documented symptoms. Key capabilities:

  • Differential diagnosis: Ranked possibilities with confidence scores and clinical reasoning
  • Real-time drug interaction checking: 8-medication demo database, <2 second response
  • Evidence-based recommendations: Grounded in clinical guidelines
  • Natural language understanding: Extracts structured data from conversational speech
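The interaction gate that runs before a prescription is accepted can be sketched like this. The pair table below is illustrative; the real system checks the proposed drug against its medication database with Gemini's reasoning on top:

```typescript
// Minimal sketch of the drug interaction / allergy gate.
// The INTERACTIONS table is illustrative demo data, not clinical guidance.
type Interaction = {
  pair: [string, string];
  severity: "major" | "moderate";
  alternative?: string;
};

const INTERACTIONS: Interaction[] = [
  { pair: ["warfarin", "ibuprofen"], severity: "major", alternative: "acetaminophen" },
  { pair: ["lisinopril", "spironolactone"], severity: "moderate" },
];

// Return every known interaction between the proposed drug and the
// patient's current medications; a non-empty result blocks the order.
function checkInteractions(current: string[], proposed: string): Interaction[] {
  const meds = current.map((m) => m.toLowerCase());
  const p = proposed.toLowerCase();
  return INTERACTIONS.filter(({ pair: [a, b] }) =>
    (p === a && meds.includes(b)) || (p === b && meds.includes(a))
  );
}
```

For example, `checkInteractions(["Warfarin"], "Ibuprofen")` flags a major interaction and surfaces acetaminophen as the suggested alternative, matching our demo scenario.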

Patient Dashboard

We built a sophisticated frontend dashboard that serves as the visual interface for clinical decision-making:

  • Command palette search (cmdk): Real-time patient lookup by name, ID, or complaint
  • Color-coded vital signs: Instant visual status indicators (BP, HR, temp, O2)
  • AI diagnosis panel: Toggleable synthesis of all patient data into differential diagnoses
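The color coding reduces each vital to a traffic-light status. A sketch of the rule, using illustrative adult reference ranges rather than the dashboard's exact configuration:

```typescript
// Sketch of the traffic-light status rule behind the vitals panel.
// Thresholds are illustrative textbook adult ranges, not clinical config.
type Status = "green" | "yellow" | "red";

function heartRateStatus(bpm: number): Status {
  if (bpm < 40 || bpm > 130) return "red";    // critical
  if (bpm < 60 || bpm > 100) return "yellow"; // outside normal adult range
  return "green";
}

function o2Status(spo2: number): Status {
  if (spo2 < 90) return "red";
  if (spo2 < 95) return "yellow";
  return "green";
}
```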

End-to-end orchestration in <1500ms average:

  1. Snap Spectacles captures voice via built-in STT
  2. Backend queries Supabase + Letta Stateful Agent in parallel
  3. Gemini processes context and generates clinical response
  4. Fish Audio streams speech with appropriate voice profile
  5. Coordinated AR overlays display while audio plays
  6. Dashboard panels update in real-time with coordinated AR overlays
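The parallel fan-out in step 2 is what keeps the pipeline under budget: Supabase and the Letta agent are queried concurrently, and the merged context goes to Gemini. A sketch of the backend handler, where `fetchPatient`, `fetchAgentContext`, and `askGemini` are hypothetical stand-ins for the real SDK calls:

```typescript
// Sketch of the step-2 parallel fan-out. The three dependency functions
// are stand-ins injected for illustration, not real SDK signatures.
async function handleVoiceCommand(
  patientId: string,
  transcript: string,
  deps: {
    fetchPatient: (id: string) => Promise<object>;      // Supabase row
    fetchAgentContext: (id: string) => Promise<object>; // Letta state
    askGemini: (ctx: object) => Promise<string>;        // clinical reasoning
  }
): Promise<string> {
  // Both lookups run concurrently instead of back-to-back.
  const [record, memory] = await Promise.all([
    deps.fetchPatient(patientId),
    deps.fetchAgentContext(patientId),
  ]);
  return deps.askGemini({ record, memory, transcript });
}
```

Injecting the dependencies also made the handler easy to test against fakes during the hackathon.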

Challenges we ran into

Snap Spectacles/Lens Studio Integration

Documentation for real-time backend communication is sparse, and Lens Studio's networking APIs are limited. We dug through past community projects for working examples, and AI-assisted coding was nearly impossible given the lack of Lens Studio material in current models, so we hand-wrote all of the Lens Studio-facing scripts. We also faced a steep learning curve with Unity, since none of us had significant prior experience with it.

Latency Crisis

Our initial pipeline took 3-4 seconds because of sequential API calls (Letta → Gemini → Fish Audio → Snap). We parallelized independent calls and implemented aggressive caching, cutting average latency from 4,300 ms to under 1,500 ms.
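The caching layer serves repeated lookups (for example, the same patient record within a session) from memory instead of re-querying. A minimal sketch of an in-memory TTL cache of the kind we used; this is illustrative, not the production implementation:

```typescript
// Minimal in-memory TTL cache sketch. Entries expire after ttlMs and
// are lazily evicted on read.
class TtlCache<V> {
  private store = new Map<string, { value: V; expires: number }>();
  constructor(private ttlMs: number) {}

  get(key: string): V | undefined {
    const hit = this.store.get(key);
    if (!hit) return undefined;
    if (Date.now() > hit.expires) {
      this.store.delete(key); // expired: evict and miss
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
  }
}
```

A short TTL keeps clinical data fresh while still absorbing the burst of repeated reads that happens during a single bedside session.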

CV Model

MediaPipe Hands provides 21 hand landmarks but doesn't detect pressure/touch intensity, which is critical for pulse-taking training. We needed to determine if fingers were applying correct pressure (too light = can't feel pulse, too heavy = occludes artery) without any force sensors. As a solution, we developed a multi-factor pressure estimation algorithm combining three heuristic signals: finger curvature (50% weight), visibility score (25% weight), and depth compression (25% weight).
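The weighted combination above can be sketched as follows. Each signal is assumed to be normalized to [0, 1] upstream from the MediaPipe landmarks; the weights match the 50/25/25 split we describe, while the feedback thresholds are illustrative:

```typescript
// Sketch of the multi-factor pressure estimate. Inputs are assumed
// normalized to [0, 1] from MediaPipe Hands landmarks.
interface PressureSignals {
  fingerCurvature: number;  // flexion of the pressing fingers
  visibilityScore: number;  // landmark occlusion as fingers press in
  depthCompression: number; // apparent z-axis flattening of fingertips
}

function estimatePressure(s: PressureSignals): number {
  return 0.5 * s.fingerCurvature + 0.25 * s.visibilityScore + 0.25 * s.depthCompression;
}

// Map the score to training feedback. Thresholds here are illustrative,
// not the calibrated values from our demo.
function pressureFeedback(score: number): "too-light" | "good" | "too-heavy" {
  if (score < 0.35) return "too-light"; // can't feel the pulse
  if (score > 0.75) return "too-heavy"; // occludes the artery
  return "good";
}
```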

Accomplishments that we're proud of

Performance Metrics:

  • <1500ms average latency from voice command to AI response to AR display
  • <2 second latency even during simultaneous drug interaction checking + differential diagnosis
  • ~90% CV accuracy on hand landmark detection

Technical Innovations:

  • First hands-free AR medical assistant combining training + clinical decision support in one platform
  • Live drug interaction prevention (e.g., caught ibuprofen + warfarin and suggested acetaminophen)
  • Scalable architecture where new procedures require defining landmarks/criteria, not rebuilding
  • Decades of patient context queryable through Letta

Execution Excellence:

Delivered fully functional dual-mode system in 48 hours with real CV, AI, voice, and AR.

What we learned

Technical Insights:

  • Context compression through Letta's summarization enables production-ready long-term memory
  • Parallel orchestration is ~4x faster than sequential processing
  • Custom CV work delivered large accuracy gains in a specialized medical domain

Healthcare Domain:

  • Speed and flexibility beat feature count. Nurses want five instant tools over twenty that require navigation.
  • Error prevention impresses more than AI diagnostics.

Product Design:

  • Multimodal interaction (voice + AR) is essential. Either one alone fails, especially in a context-heavy medical setting.
  • Seamless mode switching with preserved context maintains trust in medical settings.

What's next for MediSnap

Immediate Next Steps (3-6 months):

  • User testing with university nursing program on 3-5 core procedures
  • Build 5-7 additional procedure modules based on curriculum priorities
  • Refine CV models across diverse skin tones, lighting, and patient demographics
  • Implement data encryption and anonymization for patient data handling

Growth Phase (6-12 months):

  • Pilot with 1-2 nursing schools for independent skills lab practice
  • Clinical workflow study with 20-30 nurses measuring time savings and user satisfaction
  • Expand drug interaction database from 8 to 2,000+ medications

Longer-Term Vision (12+ months):

  • Teaching hospital collaboration for supervised clinical trial programs
  • Home healthcare expansion where solo nurses benefit from hands-free support
  • Institutional licensing for nursing schools and hospitals

Healthcare adoption is slow and cautious. We'll prove value in education first, gather evidence, and expand gradually. MediSnap keeps hands on patients, eyes on care, and AI at their service.
