Inspiration
One of our team members has nurses in the family and has heard their frustrations for years: constantly juggling devices while trying to focus on patients, breaking sterile technique to look up information, never enough instructor time for students. During a hospital visit, we watched it happen: a nurse interrupted patient care 5+ times in one hour to verify medications and check allergies, then struggled to access records while wearing sterile gloves, forced to choose between breaking sterility and asking for help. Nursing students practiced on mannequins without feedback, waiting for instructors stretched too thin. Healthcare workers spend more time managing information systems than with patients. We asked: what if clinical knowledge were hands-free, and training happened with AR guidance instead of textbooks? We built MediSnap for the nurses in our family, and for every nurse who deserves better.
What it does
MediSnap transforms Snap Spectacles into a hands-free AI medical assistant with two core modes:
Training Mode
Provides immersive, real-time guidance for learning medical procedures. While our demo focuses on pulse-taking, the architecture supports any physical skill: IV insertion, wound care, CPR, catheterization. When activated by voice, it:
- Displays AR overlays with anatomical markers and directional arrows showing exactly where to position hands
- Provides step-by-step voice instructions that keep eyes on the patient
- Delivers live technique feedback using MediaPipe Hands computer vision with multi-factor pressure detection
Clinical Mode
Serves practicing healthcare workers by:
- Retrieving patient records in <3 seconds via voice command and displaying AR patient cards with critical data
- Processing voice-documented symptoms in real-time and generating AI diagnoses with clinical reasoning
- Performing automated drug interaction and allergy checking and blocking dangerous prescriptions
- Maintaining conversation context across entire patient sessions
The system scales from bedside care to home health, emergency medicine, and telemedicine. To our knowledge, MediSnap is the only solution combining hands-free voice interaction, intelligent AR overlays, AI clinical reasoning, and dual training/clinical functionality.
How we built it
Architecture: Snap Spectacles with custom Lens Studio app → Node.js/Express backend on Railway with LettaCloud/FishAudio/Gemini → Supabase PostgreSQL database
Snap Spectacles
We pushed Snap Spectacles beyond standard AR filters by using the MediaPipe Hands computer vision model to detect hand landmarks in real-time. We designed three types of medical-optimized overlays:
- Pulsing circles for pulse points, injection sites, measurement locations
- 3D arrows showing optimal hand positioning and movement paths
- Floating patient data displays in upper peripheral vision
Fish Audio
Our AR audio optimizations include:
- Different voices for different agents (a calm "Training Coach" and confident "Clinical Advisor") to help users mentally separate contexts
- Volume normalization for noisy clinical environments
- Streaming integration for <800ms time-to-first-word latency
Letta Stateful Agent
Letta powers dual memory systems for patient clinical history and student training progress:
- Manages decades of history per patient with a rolling window (20 turns / 4,000 tokens)
- Preserves full context when nurses hand off patients between shifts
- Stores finger placement corrections and technique feedback from CV
- Considers complete patient history for clinical recommendations
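The rolling window above can be sketched as a simple bounded buffer. This is a simplified stand-in for illustration, not Letta's actual API: the class, field names, and token heuristic are our own.

```javascript
// Minimal sketch of a rolling-window conversation buffer, in the spirit of
// the 20-turn / 4,000-token window described above. Not Letta's real API;
// Letta additionally summarizes evicted turns rather than dropping them.
class RollingContext {
  constructor(maxTurns = 20, maxTokens = 4000) {
    this.maxTurns = maxTurns;
    this.maxTokens = maxTokens;
    this.turns = []; // { role, text, tokens }
  }

  // Rough heuristic: ~4 characters per token.
  static estimateTokens(text) {
    return Math.ceil(text.length / 4);
  }

  add(role, text) {
    this.turns.push({ role, text, tokens: RollingContext.estimateTokens(text) });
    // Evict oldest turns until both limits are satisfied.
    while (
      this.turns.length > this.maxTurns ||
      this.totalTokens() > this.maxTokens
    ) {
      this.turns.shift();
    }
  }

  totalTokens() {
    return this.turns.reduce((sum, t) => sum + t.tokens, 0);
  }
}

const ctx = new RollingContext(20, 4000);
for (let i = 0; i < 30; i++) ctx.add("nurse", `observation ${i}`);
console.log(ctx.turns.length); // 20
```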
Gemini API
Gemini powers clinical reasoning by processing structured data (vital signs, medications, labs) alongside unstructured voice-documented symptoms. Key capabilities:
- Differential diagnosis: Ranked possibilities with confidence scores and clinical reasoning
- Real-time drug interaction checking: database of 8+ medications, <2 second response
- Evidence-based recommendations: Grounded in clinical guidelines
- Natural language understanding: Extracts structured data from conversational speech
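The interaction/allergy gate can be illustrated with a toy version. The data, field names, and function below are hypothetical; the real system checks a larger medication database via Gemini.

```javascript
// Illustrative sketch of the drug interaction + allergy gate.
// Table contents and shapes are made up for the example.
const INTERACTIONS = {
  "ibuprofen|warfarin": {
    severity: "major",
    reason: "increased bleeding risk",
    alternative: "acetaminophen",
  },
};

// Canonical key so lookup is order-insensitive.
function pairKey(a, b) {
  return [a, b].map((d) => d.toLowerCase()).sort().join("|");
}

function checkPrescription(drug, patient) {
  // Hard-block on documented allergies first.
  if (patient.allergies.some((a) => a.toLowerCase() === drug.toLowerCase())) {
    return { allowed: false, reason: `patient allergic to ${drug}` };
  }
  // Then scan current medications for known interactions.
  for (const current of patient.medications) {
    const hit = INTERACTIONS[pairKey(drug, current)];
    if (hit && hit.severity === "major") {
      return {
        allowed: false,
        reason: `${drug} + ${current}: ${hit.reason}`,
        alternative: hit.alternative,
      };
    }
  }
  return { allowed: true };
}

const patient = { allergies: ["penicillin"], medications: ["Warfarin"] };
const result = checkPrescription("Ibuprofen", patient);
// result.allowed === false, result.alternative === "acetaminophen"
```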
Patient Dashboard
We built a sophisticated frontend dashboard that serves as the visual interface for clinical decision-making:
- Command palette search (cmdk): Real-time patient lookup by name, ID, or complaint
- Color-coded vital signs: Instant visual status indicators (BP, HR, temp, O2)
- AI diagnosis panel: Toggleable synthesis of all patient data into differential diagnoses
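The lookup behind the command palette is conceptually a multi-field filter; cmdk renders the palette UI. The sketch below shows the matching logic only, with made-up patient records and field names.

```javascript
// Sketch of patient lookup by name, ID, or complaint.
// cmdk handles the palette rendering; this is just the filter.
const patients = [
  { id: "P-001", name: "Maria Lopez", complaint: "chest pain" },
  { id: "P-002", name: "James Chen", complaint: "shortness of breath" },
  { id: "P-003", name: "Ana Silva", complaint: "fever" },
];

function searchPatients(query, list) {
  const q = query.trim().toLowerCase();
  if (!q) return list; // empty query shows everyone
  return list.filter((p) =>
    [p.id, p.name, p.complaint].some((field) =>
      field.toLowerCase().includes(q)
    )
  );
}

console.log(searchPatients("chen", patients).map((p) => p.id)); // [ 'P-002' ]
console.log(searchPatients("pain", patients).map((p) => p.id)); // [ 'P-001' ]
```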
End-to-end orchestration in <1500ms average:
- Snap Spectacles captures voice via built-in STT
- Backend queries Supabase + Letta Stateful Agent in parallel
- Gemini processes context and generates clinical response
- Fish Audio streams speech with appropriate voice profile
- Coordinated AR overlays display while audio plays
- Dashboard panels update in real-time with coordinated AR overlays
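The orchestration steps above can be sketched with stubbed services standing in for Supabase, Letta, Gemini, and Fish Audio. All function names and payload shapes here are ours, not the real SDKs.

```javascript
// Sketch of the request flow, with stubs in place of the real services.
const fetchRecords = async (patientId) => ({ patientId, vitals: { hr: 72 } });
const fetchMemory = async (patientId) => ({ patientId, turns: [] });
const generateResponse = async (ctx) => `Assessment for ${ctx.records.patientId}`;
const streamSpeech = async (text) => ({ spokenText: text });

async function handleVoiceCommand(patientId) {
  // Query Supabase and the Letta agent in parallel, not sequentially.
  const [records, memory] = await Promise.all([
    fetchRecords(patientId),
    fetchMemory(patientId),
  ]);
  // Gemini builds the clinical response from the merged context.
  const text = await generateResponse({ records, memory });
  // Fish Audio streams TTS while AR overlays and dashboard panels update.
  const audio = await streamSpeech(text);
  return { text, audio };
}

handleVoiceCommand("P-001").then((r) => console.log(r.text));
// prints "Assessment for P-001"
```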
Challenges we ran into
Snap Spectacles/Lens Studio Integration
Documentation for real-time backend communication is sparse, and Lens Studio's networking APIs are limited. We had to dig through past projects for examples, and AI-assisted coding was nearly impossible due to the lack of support, so we hand-wrote all of the Lens Studio-facing scripts. We also faced a steep learning curve with Unity, since none of us had significant experience with it.
Latency Crisis
Our initial build had 3-4 seconds of latency from sequential API calls (Letta → Gemini → Fish Audio → Snap). We parallelized independent calls and implemented aggressive caching, cutting latency from 4300ms to <1500ms.
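One of the caching tricks can be sketched as a small TTL memoizer, so repeated lookups (e.g. the same patient record within a session) skip the network. The helper name and TTL below are illustrative, not the production values.

```javascript
// TTL cache wrapper: returns a cached value until it expires.
function withTtlCache(fn, ttlMs = 30_000) {
  const cache = new Map(); // key -> { value, expires }
  return async (key) => {
    const hit = cache.get(key);
    if (hit && hit.expires > Date.now()) return hit.value;
    const value = await fn(key);
    cache.set(key, { value, expires: Date.now() + ttlMs });
    return value;
  };
}

let calls = 0;
const slowLookup = async (id) => { calls++; return { id }; };
const cachedLookup = withTtlCache(slowLookup);

(async () => {
  await cachedLookup("P-001");
  await cachedLookup("P-001"); // served from cache, no second call
  console.log(calls); // 1
})();
```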
CV Model
MediaPipe Hands provides 21 hand landmarks but doesn't detect pressure/touch intensity, which is critical for pulse-taking training. We needed to determine if fingers were applying correct pressure (too light = can't feel pulse, too heavy = occludes artery) without any force sensors. As a solution, we developed a multi-factor pressure estimation algorithm combining three heuristic signals: finger curvature (50% weight), visibility score (25% weight), and depth compression (25% weight).
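The weighted combination can be sketched directly from the weights above. The input signals are assumed to be pre-normalized to [0, 1], and the classification thresholds here are illustrative, not our tuned values.

```javascript
// Multi-factor pressure estimate: finger curvature (50%), landmark
// visibility (25%), depth compression (25%), each normalized to [0, 1].
function estimatePressure({ curvature, visibility, depthCompression }) {
  return 0.5 * curvature + 0.25 * visibility + 0.25 * depthCompression;
}

// Illustrative thresholds for the training feedback.
function classifyPressure(score) {
  if (score < 0.35) return "too light"; // can't feel the pulse
  if (score > 0.75) return "too heavy"; // risks occluding the artery
  return "good";
}

const score = estimatePressure({
  curvature: 0.6,        // finger flexion from the 21 MediaPipe landmarks
  visibility: 0.5,       // visibility drops as fingers press into the wrist
  depthCompression: 0.4, // apparent z-depth change at the contact point
});
console.log(score.toFixed(3), classifyPressure(score)); // ~0.525 "good"
```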
Accomplishments that we're proud of
Performance Metrics:
- <1500ms average latency from voice command to AI response to AR display
- <2 second latency even during simultaneous drug interaction checking + differential diagnosis
- ~90% CV accuracy on body landmark detection
Technical Innovations:
- First hands-free AR medical assistant combining training + clinical decision support in one platform
- Live drug interaction prevention (e.g., caught Ibuprofen + Warfarin and suggested acetaminophen)
- Scalable architecture where new procedures require defining landmarks/criteria, not rebuilding
- Decades of patient context searchable through Letta
Execution Excellence:
Delivered fully functional dual-mode system in 48 hours with real CV, AI, voice, and AR.
What we learned
Technical Insights:
- Context compression through Letta's summarization enables production-ready long-term memory
- Parallel orchestration is ~4x faster than sequential processing
- Custom CV training delivered substantial accuracy gains in the specialized medical domain
Healthcare Domain:
- Speed and flexibility beat feature count. Nurses want 5 instant tools over 20 that require navigating menus.
- Error prevention impresses more than AI diagnostics.
Product Design:
- Multimodal (voice + AR) is essential. Either alone fails, especially in a context-heavy medical setting.
- Seamless mode switching with preserved context maintains trust in medical settings.
What's next for MediSnap
Immediate Next Steps (3-6 months):
- User testing with university nursing program on 3-5 core procedures
- Build 5-7 additional procedure modules based on curriculum priorities
- Refine CV models across diverse skin tones, lighting, and patient demographics
- Implement data encryption and anonymization for patient data handling
Growth Phase (6-12 months):
- Pilot with 1-2 nursing schools for independent skills lab practice
- Clinical workflow study with 20-30 nurses measuring time savings and user satisfaction
- Expand drug interaction database from 8 to 2,000+ medications
Longer-Term Vision (12+ months):
- Teaching hospital collaboration for supervised clinical trial programs
- Home healthcare expansion where solo nurses benefit from hands-free support
- Institutional licensing for nursing schools and hospitals
Healthcare adoption is slow and cautious. We'll prove value in education first, gather evidence, and expand gradually. MediSnap keeps hands on patients, eyes on care, and AI at their service.
Built With
- express.js
- fish-audio
- gemini
- javascript
- lens-studio
- letta
- next.js
- snap
- snap-spectacles
- typescript

