MediLink AI: The Voice-First Spatial Medical Compass 🧭
💡 Inspiration
It started with a painful realization: medical emergencies drastically reduce cognitive capacity. Existing platforms (Doctolib, Google Maps) assume users are calm and tech-savvy, but when a patient is in acute pain or holding a crying child, filtering by zip code becomes a real barrier. We asked: What if finding a doctor was as natural as saying, "I hurt"? We aimed to bridge the biological signal (voice/pain) and the spatial solution (nearest doctor) using a natively multimodal AI.
💻 What it does
MediLink AI is a "Zero-UI" triage assistant powered by Gemini 3 Pro.
Multimodal Ingestion: The user speaks naturally. Gemini 3 Pro processes the raw audio directly (without separate STT/TTS layers), capturing tone and urgency nuances.
Clinical Reasoning: The model analyzes the symptoms, infers the medical specialty (e.g., Orthopedist vs. Neurologist), and assigns an urgency score.
Location & Routing: The system grabs real-time GPS coordinates and renders an interactive map with the optimal route to the nearest specialist, factoring in live traffic.
⚙️ How we built it
We architected a Spatial-Audio Web Application using a modern stack:
Frontend: React, TypeScript, and Tailwind CSS ("Glassmorphism" design).
Core Intelligence: Gemini 3 Pro (via Vertex AI). We utilize its native audio capabilities for low-latency, empathetic dialogue and its Function Calling feature to extract structured JSON data (Specialty, Urgency) from the conversation in real-time.
Mapping: Google Maps JavaScript API, Directions Service, and Distance Matrix API.
Orchestration: Custom React Hooks synchronize the Gemini audio stream with visual DOM updates.
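The function-calling extraction described above can be sketched as a declaration plus a client-side handler. This is a minimal illustration only: the names `recordTriageResult` and `handleTriageCall`, and the field names, are our assumptions, not the project's actual schema.

```typescript
// Illustrative Gemini function declaration for extracting structured triage
// data (specialty + urgency) from the conversation. All names are hypothetical.
const recordTriageDeclaration = {
  name: "recordTriageResult",
  description:
    "Record the medical specialty and urgency score inferred from the patient's speech.",
  parameters: {
    type: "object",
    properties: {
      specialty: {
        type: "string",
        description: "Inferred specialty, e.g. 'Orthopedist' or 'Neurologist'",
      },
      urgency: {
        type: "integer",
        description: "Urgency score from 1 (routine) to 5 (emergency)",
      },
    },
    required: ["specialty", "urgency"],
  },
};

// Hypothetical handler invoked when the model emits a recordTriageResult call;
// in the real app this would kick off the map search.
function handleTriageCall(args: { specialty: string; urgency: number }): string {
  return `Searching for the nearest ${args.specialty} (urgency ${args.urgency}/5)...`;
}
```

The point of the structured call is that the map layer never has to parse free-form conversation; it only consumes validated JSON.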
The "Secret Sauce": Hybrid Routing Algorithm
To ensure reliability, routing runs in two layers:
Primary Layer: Google Distance Matrix API for live traffic data.
Fallback Layer: Client-side Haversine Formula (Great-Circle distance) to sort doctors instantly if API quotas are hit or connectivity drops.
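The fallback layer above amounts to sorting candidates by great-circle distance. Here is a self-contained sketch of that Haversine fallback; the `Doctor` interface and `sortByDistance` helper are illustrative assumptions, not the project's actual code.

```typescript
// Client-side Haversine fallback (assumption: each doctor record carries
// lat/lng). Used only when the Distance Matrix API is unavailable.
interface Doctor {
  name: string;
  lat: number;
  lng: number;
}

const EARTH_RADIUS_KM = 6371;

// Great-circle distance in kilometres between two lat/lng points.
function haversineKm(lat1: number, lng1: number, lat2: number, lng2: number): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Sort doctors by straight-line distance from the user, nearest first.
function sortByDistance(userLat: number, userLng: number, doctors: Doctor[]): Doctor[] {
  return [...doctors].sort(
    (a, b) =>
      haversineKm(userLat, userLng, a.lat, a.lng) -
      haversineKm(userLat, userLng, b.lat, b.lng)
  );
}
```

Haversine ignores roads and traffic, so ordering can differ from the Distance Matrix result, but it runs instantly on-device and never fails on quota or connectivity.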
🚧 Challenges we ran into
The biggest hurdle was the Multimodal Sync/Race Condition. Gemini 3 Pro generates responses incredibly fast. Initially, the agent would say "I found a doctor..." before the map finished rendering the route.
The Fix: We implemented await locks on the client-side function execution. We force the audio stream to buffer until the Google Maps DirectionsService returns a valid OK status, ensuring the visual map and audio guidance are perfectly synchronized.
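The await-lock pattern described above can be sketched as follows. This is a simplified model under stated assumptions: `requestRoute` simulates the DirectionsService round-trip (the real app awaits `google.maps.DirectionsService`), and `bufferAudio`/`speakWhenRouteReady` are hypothetical stand-ins for the app's audio plumbing.

```typescript
// Simplified await-lock: hold the agent's speech until the route resolves.
type RouteStatus = "OK" | "ERROR";

// Stand-in for the async DirectionsService call (resolves after ~50 ms here).
async function requestRoute(): Promise<RouteStatus> {
  return new Promise((resolve) => setTimeout(() => resolve("OK"), 50));
}

const audioBuffer: string[] = [];

// Buffer incoming speech chunks instead of playing them immediately.
function bufferAudio(chunk: string): void {
  audioBuffer.push(chunk);
}

// Flush the buffered audio only once the map has a valid route.
async function speakWhenRouteReady(): Promise<string[]> {
  const status = await requestRoute(); // the "await lock"
  if (status !== "OK") throw new Error("Route failed; keep audio buffered");
  const played = [...audioBuffer];
  audioBuffer.length = 0;
  return played;
}
```

Gating playback on the route's status (rather than a fixed delay) is what keeps the voice and the map in lockstep regardless of how fast the model responds.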
🧠 What we learned
Native Multimodality > Pipelines: Using Gemini 3 Pro's native audio understanding reduced latency significantly compared to daisy-chaining Whisper (STT) -> LLM -> TTS.
Resilience Engineering: The Haversine fallback proved that in healthcare tech, "Error 404" is not an option.
Empathy via Latency: The speed of Gemini 3 Pro creates a conversation that feels "live" rather than "processed," which is crucial for reducing patient anxiety.
🚀 What's next for MediLink AI
Visual Diagnosis (Gemini Vision): Leveraging Gemini 3 Pro's vision capabilities to allow users to show visible symptoms (rashes, swelling) via camera for higher-accuracy triage.
Telemedicine Hand-off: Auto-generating video links if no physical doctor is within 15km.
Hospital ERP Integration: Connecting to HL7/FHIR standards for real-time waiting room analytics.