Inspiration

One of our team members has nurses in the family and has heard their frustrations for years: constantly juggling devices while trying to focus on patients, breaking sterile technique to look up information, never enough instructor time for students. During a hospital visit, we watched it happen: a nurse interrupted patient care more than five times in one hour to verify medications and check allergies, then struggled to access records while wearing sterile gloves, forced to choose between breaking sterile technique and asking for help. Nursing students practiced on mannequins without feedback, waiting for instructors stretched too thin. Healthcare workers spend more time managing information systems than with patients. We asked: what if clinical knowledge were hands-free, and training happened with AR guidance instead of textbooks? We built MediSnap for the nurses in our family, and for every nurse who deserves better.

What it does

MediSnap transforms Snap Spectacles into a hands-free AI medical assistant with two core modes:

Training Mode

Provides immersive, real-time guidance for learning medical procedures. While our demo focuses on pulse-taking, the architecture supports any physical skill: IV insertion, wound care, CPR, catheterization. When activated by voice, it:

  • Displays AR overlays with anatomical markers and directional arrows showing exactly where to position hands
  • Provides step-by-step voice instructions that keep eyes on the patient
  • Delivers live technique feedback using MediaPipe Hands computer vision with multi-factor pressure detection

Clinical Mode

Serves practicing healthcare workers by:

  • Retrieving patient records in <3 seconds via voice command and displaying AR patient cards with critical data
  • Processing voice-documented symptoms in real-time and generating AI diagnoses with clinical reasoning
  • Performing automated drug interaction and allergy checking and blocking dangerous prescriptions
  • Maintaining conversation context across entire patient sessions

The system scales from bedside care to home health, emergency medicine, and telemedicine. To our knowledge, MediSnap is the only solution combining hands-free voice interaction, intelligent AR overlays, AI clinical reasoning, and dual training/clinical functionality.

How we built it

Architecture: Snap Spectacles with custom Lens Studio app → Node.js/Express backend on Railway with LettaCloud/FishAudio/Gemini → Supabase PostgreSQL database

Snap Spectacles

We pushed Snap Spectacles beyond standard AR filters by using the MediaPipe Hands computer vision model to detect hand landmarks in real time. We designed three types of medical-optimized overlays:

  • Pulsing circles for pulse points, injection sites, measurement locations
  • 3D arrows showing optimal hand positioning and movement paths
  • Floating patient data displays in upper peripheral vision
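These overlays are data-driven, so adding a procedure means adding descriptor entries rather than new rendering code. A minimal sketch of that descriptor shape (field names here are illustrative, not actual Lens Studio APIs):

```typescript
// Sketch of overlay descriptors as plain data. A new procedure defines
// its landmarks and steps; the renderer interprets the descriptors.
// Field names are illustrative, not Lens Studio's API.
type Overlay =
  | { kind: "pulse-circle"; landmark: string; radiusCm: number }
  | { kind: "arrow-3d"; from: string; to: string }
  | { kind: "data-card"; anchor: "upper-left" | "upper-right"; fields: string[] };

// Example: the overlays for one step of pulse-taking training.
const radialPulseStep: Overlay[] = [
  { kind: "pulse-circle", landmark: "radial-artery", radiusCm: 1.5 },
  { kind: "arrow-3d", from: "index-fingertip", to: "radial-artery" },
];
```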

Fish Audio

Our AR audio optimizations include:

  • Different voices for different agents (a calm "Training Coach" and confident "Clinical Advisor") to help users mentally separate contexts
  • Volume normalization for noisy clinical environments
  • Streaming integration for <800ms time-to-first-word latency

Letta Stateful Agent

Letta powers dual memory systems for patient clinical history and student training progress:

  • Maintains decades of history per patient with a rolling window (20 turns / 4,000 tokens)
  • Preserves full context when nurses hand off patients between shifts
  • Stores finger placement corrections and technique feedback from CV
  • Considers complete patient history for clinical recommendations
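The rolling-window policy above can be sketched roughly as follows. Letta manages this internally; the types and the crude token estimate here are illustrative, not Letta's actual API:

```typescript
// Illustrative sketch of a 20-turn / 4,000-token rolling window.
interface Turn { role: "user" | "assistant"; text: string }

const MAX_TURNS = 20;
const MAX_TOKENS = 4000;

// Crude token estimate (~4 characters per token), for illustration only.
const estimateTokens = (t: Turn): number => Math.ceil(t.text.length / 4);

function rollWindow(history: Turn[]): Turn[] {
  // Keep at most the last MAX_TURNS turns...
  let window = history.slice(-MAX_TURNS);
  let total = window.reduce((n, t) => n + estimateTokens(t), 0);
  // ...then drop the oldest turns until the token budget is met.
  while (window.length > 1 && total > MAX_TOKENS) {
    total -= estimateTokens(window[0]);
    window = window.slice(1);
  }
  return window;
}
```

Older turns falling out of the window are not lost; they are compressed into long-term memory so full patient context survives shift handoffs.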

Gemini API

Gemini powers clinical reasoning by processing structured data (vital signs, medications, labs) alongside unstructured voice-documented symptoms. Key capabilities:

  • Differential diagnosis: Ranked possibilities with confidence scores and clinical reasoning
  • Real-time drug interaction checking: 8-medication demo database, <2 second response
  • Evidence-based recommendations: Grounded in clinical guidelines
  • Natural language understanding: Extracts structured data from conversational speech
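The interaction gate that runs before a prescription is accepted can be sketched like this. The pair table below is illustrative; the real system checks the proposed drug against its medication database with Gemini's reasoning on top:

```typescript
// Minimal sketch of the drug interaction / allergy gate.
// The INTERACTIONS table is illustrative demo data, not clinical guidance.
type Interaction = {
  pair: [string, string];
  severity: "major" | "moderate";
  alternative?: string;
};

const INTERACTIONS: Interaction[] = [
  { pair: ["warfarin", "ibuprofen"], severity: "major", alternative: "acetaminophen" },
  { pair: ["lisinopril", "spironolactone"], severity: "moderate" },
];

// Return every known interaction between the proposed drug and the
// patient's current medications; a non-empty result blocks the order.
function checkInteractions(current: string[], proposed: string): Interaction[] {
  const meds = current.map((m) => m.toLowerCase());
  const p = proposed.toLowerCase();
  return INTERACTIONS.filter(({ pair: [a, b] }) =>
    (p === a && meds.includes(b)) || (p === b && meds.includes(a))
  );
}
```

For example, `checkInteractions(["Warfarin"], "Ibuprofen")` flags a major interaction and surfaces acetaminophen as the suggested alternative, matching our demo scenario.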

Patient Dashboard

We built a sophisticated frontend dashboard that serves as the visual interface for clinical decision-making:

  • Command palette search (cmdk): Real-time patient lookup by name, ID, or complaint
  • Color-coded vital signs: Instant visual status indicators (BP, HR, temp, O2)
  • AI diagnosis panel: Toggleable synthesis of all patient data into differential diagnoses
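The color coding reduces each vital to a traffic-light status. A sketch of the rule, using illustrative adult reference ranges rather than the dashboard's exact configuration:

```typescript
// Sketch of the traffic-light status rule behind the vitals panel.
// Thresholds are illustrative textbook adult ranges, not clinical config.
type Status = "green" | "yellow" | "red";

function heartRateStatus(bpm: number): Status {
  if (bpm < 40 || bpm > 130) return "red";    // critical
  if (bpm < 60 || bpm > 100) return "yellow"; // outside normal adult range
  return "green";
}

function o2Status(spo2: number): Status {
  if (spo2 < 90) return "red";
  if (spo2 < 95) return "yellow";
  return "green";
}
```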

End-to-end orchestration in <1500ms average:

  1. Snap Spectacles captures voice via built-in STT
  2. Backend queries Supabase + Letta Stateful Agent in parallel
  3. Gemini processes context and generates clinical response
  4. Fish Audio streams speech with appropriate voice profile
  5. Coordinated AR overlays display while audio plays
  6. Dashboard panels update in real-time with coordinated AR overlays
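The parallel fan-out in step 2 is what keeps the pipeline under budget: Supabase and the Letta agent are queried concurrently, and the merged context goes to Gemini. A sketch of the backend handler, where `fetchPatient`, `fetchAgentContext`, and `askGemini` are hypothetical stand-ins for the real SDK calls:

```typescript
// Sketch of the step-2 parallel fan-out. The three dependency functions
// are stand-ins injected for illustration, not real SDK signatures.
async function handleVoiceCommand(
  patientId: string,
  transcript: string,
  deps: {
    fetchPatient: (id: string) => Promise<object>;      // Supabase row
    fetchAgentContext: (id: string) => Promise<object>; // Letta state
    askGemini: (ctx: object) => Promise<string>;        // clinical reasoning
  }
): Promise<string> {
  // Both lookups run concurrently instead of back-to-back.
  const [record, memory] = await Promise.all([
    deps.fetchPatient(patientId),
    deps.fetchAgentContext(patientId),
  ]);
  return deps.askGemini({ record, memory, transcript });
}
```

Injecting the dependencies also made the handler easy to test against fakes during the hackathon.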

Challenges we ran into

Snap Spectacles/Lens Studio Integration

Documentation for real-time backend communication is sparse, and Lens Studio's networking APIs are limited. We dug through past community projects for working examples, and AI-assisted coding was nearly impossible given the lack of Lens Studio material in current models, so we hand-wrote all of the Lens Studio-facing scripts. We also faced a steep learning curve with Unity, since none of us had significant prior experience with it.

Latency Crisis

Our initial pipeline took 3-4 seconds because of sequential API calls (Letta → Gemini → Fish Audio → Snap). We parallelized independent calls and implemented aggressive caching, cutting average latency from 4,300 ms to under 1,500 ms.
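The caching layer serves repeated lookups (for example, the same patient record within a session) from memory instead of re-querying. A minimal sketch of an in-memory TTL cache of the kind we used; this is illustrative, not the production implementation:

```typescript
// Minimal in-memory TTL cache sketch. Entries expire after ttlMs and
// are lazily evicted on read.
class TtlCache<V> {
  private store = new Map<string, { value: V; expires: number }>();
  constructor(private ttlMs: number) {}

  get(key: string): V | undefined {
    const hit = this.store.get(key);
    if (!hit) return undefined;
    if (Date.now() > hit.expires) {
      this.store.delete(key); // expired: evict and miss
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
  }
}
```

A short TTL keeps clinical data fresh while still absorbing the burst of repeated reads that happens during a single bedside session.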

CV Model

MediaPipe Hands provides 21 hand landmarks but doesn't detect pressure/touch intensity, which is critical for pulse-taking training. We needed to determine if fingers were applying correct pressure (too light = can't feel pulse, too heavy = occludes artery) without any force sensors. As a solution, we developed a multi-factor pressure estimation algorithm combining three heuristic signals: finger curvature (50% weight), visibility score (25% weight), and depth compression (25% weight).
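The weighted combination above can be sketched as follows. Each signal is assumed to be normalized to [0, 1] upstream from the MediaPipe landmarks; the weights match the 50/25/25 split we describe, while the feedback thresholds are illustrative:

```typescript
// Sketch of the multi-factor pressure estimate. Inputs are assumed
// normalized to [0, 1] from MediaPipe Hands landmarks.
interface PressureSignals {
  fingerCurvature: number;  // flexion of the pressing fingers
  visibilityScore: number;  // landmark occlusion as fingers press in
  depthCompression: number; // apparent z-axis flattening of fingertips
}

function estimatePressure(s: PressureSignals): number {
  return 0.5 * s.fingerCurvature + 0.25 * s.visibilityScore + 0.25 * s.depthCompression;
}

// Map the score to training feedback. Thresholds here are illustrative,
// not the calibrated values from our demo.
function pressureFeedback(score: number): "too-light" | "good" | "too-heavy" {
  if (score < 0.35) return "too-light"; // can't feel the pulse
  if (score > 0.75) return "too-heavy"; // occludes the artery
  return "good";
}
```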

Accomplishments that we're proud of

Performance Metrics:

  • <1500ms average latency from voice command to AI response to AR display
  • <2 second latency even during simultaneous drug interaction checking + differential diagnosis
  • ~90% CV accuracy on hand landmark detection

Technical Innovations:

  • First hands-free AR medical assistant combining training + clinical decision support in one platform
  • Live drug interaction prevention (e.g., caught ibuprofen + warfarin and suggested acetaminophen)
  • Scalable architecture where new procedures require defining landmarks/criteria, not rebuilding
  • Decades of patient context queryable through Letta

Execution Excellence:

Delivered fully functional dual-mode system in 48 hours with real CV, AI, voice, and AR.

What we learned

Technical Insights:

  • Context compression through Letta's summarization enables production-ready long-term memory
  • Parallel orchestration is ~4x faster than sequential processing
  • Custom CV work delivered large accuracy gains in a specialized medical domain

Healthcare Domain:

  • Speed and flexibility beat feature count. Nurses want five instant tools over twenty that require navigation.
  • Error prevention impresses more than AI diagnostics.

Product Design:

  • Multimodal interaction (voice + AR) is essential. Either one alone fails, especially in a context-heavy medical setting.
  • Seamless mode switching with preserved context maintains trust in medical settings.

What's next for MediSnap

Immediate Next Steps (3-6 months):

  • User testing with university nursing program on 3-5 core procedures
  • Build 5-7 additional procedure modules based on curriculum priorities
  • Refine CV models across diverse skin tones, lighting, and patient demographics
  • Implement data encryption and anonymization for patient data handling

Growth Phase (6-12 months):

  • Pilot with 1-2 nursing schools for independent skills lab practice
  • Clinical workflow study with 20-30 nurses measuring time savings and user satisfaction
  • Expand drug interaction database from 8 to 2,000+ medications

Longer-Term Vision (12+ months):

  • Teaching hospital collaboration for supervised clinical trial programs
  • Home healthcare expansion where solo nurses benefit from hands-free support
  • Institutional licensing for nursing schools and hospitals

Healthcare adoption is slow and cautious. We'll prove value in education first, gather evidence, and expand gradually. MediSnap keeps hands on patients, eyes on care, and AI at their service.
