MediLink AI: The Voice-First Spatial Medical Compass 🧭

💡 Inspiration

It started with a painful realization: medical emergencies drastically reduce cognitive capacity. Existing platforms (Doctolib, Google Maps) assume users are calm and tech-savvy. But when a patient is in acute pain or holding a crying child, filtering by zip code is a real friction point. We asked: what if finding a doctor were as natural as saying, "I hurt"? We set out to bridge the biological signal (voice, pain) and the spatial solution (the nearest doctor) with a natively multimodal AI.

💻 What it does

MediLink AI is a "Zero-UI" triage assistant powered by Gemini 3 Pro.

Multimodal Ingestion: The user speaks naturally. Gemini 3 Pro processes the raw audio directly (without separate STT/TTS layers), capturing tone and urgency nuances.

Clinical Reasoning: The model analyzes the symptoms, infers the medical specialty (e.g., Orthopedist vs. Neurologist), and assigns an urgency score.

Location & Routing: The system grabs real-time GPS coordinates and renders an interactive map with the optimal route to the nearest specialist, factoring in live traffic.

βš™οΈ How we built it

We architected a Spatial-Audio Web Application using a modern stack:

Frontend: React, TypeScript, and Tailwind CSS ("Glassmorphism" design).

Core Intelligence: Gemini 3 Pro (via Vertex AI). We utilize its native audio capabilities for low-latency, empathetic dialogue and its Function Calling feature to extract structured JSON data (Specialty, Urgency) from the conversation in real-time.

Mapping: Google Maps JavaScript API, Directions Service, and Distance Matrix API.

Orchestration: Custom React Hooks synchronize the Gemini audio stream with visual DOM updates.
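The structured-extraction side of that pipeline can be sketched as a function declaration plus a defensive parser. The tool name `report_triage`, its field names, and the `parseTriage` helper are illustrative assumptions, not the project's actual code; the schema follows the OpenAPI-style shape Gemini function calling expects:

```typescript
// Hypothetical tool declaration asking the model to report structured triage
// data (specialty + urgency) extracted from the live conversation.
const triageTool = {
  name: "report_triage",
  description:
    "Report the inferred medical specialty and urgency from the conversation.",
  parameters: {
    type: "object",
    properties: {
      specialty: {
        type: "string",
        description: "Medical specialty, e.g. 'Orthopedist' or 'Neurologist'",
      },
      urgency: {
        type: "integer",
        description: "Urgency score from 1 (routine) to 5 (emergency)",
      },
    },
    required: ["specialty", "urgency"],
  },
};

interface TriageResult {
  specialty: string;
  urgency: number;
}

// Validate the model's function-call arguments before acting on them:
// model output is untrusted, so reject anything malformed.
function parseTriage(args: unknown): TriageResult | null {
  if (typeof args !== "object" || args === null) return null;
  const { specialty, urgency } = args as Record<string, unknown>;
  if (typeof specialty !== "string") return null;
  if (typeof urgency !== "number" || urgency < 1 || urgency > 5) return null;
  return { specialty, urgency };
}
```

Validating before dispatching to the map layer means a hallucinated or partial function call degrades gracefully instead of crashing the routing step.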

The "Secret Sauce": a Hybrid Routing Algorithm. To ensure reliability, routing runs in two layers:

Primary Layer: Google Distance Matrix API for live traffic data.

Fallback Layer: Client-side Haversine Formula (Great-Circle distance) to sort doctors instantly if API quotas are hit or connectivity drops.
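The fallback layer is a straightforward great-circle computation. A minimal sketch (the `Doctor` shape and function names are illustrative):

```typescript
// Haversine great-circle distance in kilometres between two coordinates.
// Serves as the client-side fallback when the Distance Matrix API is
// quota-limited or the network drops.
const EARTH_RADIUS_KM = 6371;

function haversineKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

interface Doctor {
  name: string;
  lat: number;
  lon: number;
}

// Sort the doctor list by straight-line distance from the patient,
// so the nearest specialist surfaces instantly even offline.
function sortByDistance(doctors: Doctor[], lat: number, lon: number): Doctor[] {
  return [...doctors].sort(
    (a, b) => haversineKm(lat, lon, a.lat, a.lon) - haversineKm(lat, lon, b.lat, b.lon)
  );
}
```

Straight-line distance ignores roads and traffic, but as a tiebreaker when the API is down it is instant, dependency-free, and close enough to keep triage moving.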

🚧 Challenges we ran into

The biggest hurdle was the Multimodal Sync/Race Condition. Gemini 3 Pro generates responses incredibly fast. Initially, the agent would say "I found a doctor..." before the map finished rendering the route.

The Fix: We implemented await locks on the client-side function execution. We force the audio stream to buffer until the Google Maps DirectionsService returns a valid OK status, ensuring the visual map and audio guidance are perfectly synchronized.

🧠 What we learned

Native Multimodality > Pipelines: Using Gemini 3 Pro’s native audio understanding reduced latency significantly compared to daisy-chaining Whisper (STT) -> LLM -> TTS.

Resilience Engineering: The Haversine fallback proved that in healthcare tech, "Error 404" is not an option.

Empathy via Latency: The speed of Gemini 3 Pro creates a conversation that feels "live" rather than "processed," which is crucial for reducing patient anxiety.

🚀 What's next for MediLink AI

Visual Diagnosis (Gemini Vision): Leveraging Gemini 3 Pro’s vision capabilities to allow users to show visible symptoms (rashes, swelling) via camera for higher-accuracy triage.

Telemedicine Hand-off: Auto-generating video links if no physical doctor is within 15km.

Hospital ERP Integration: Connecting to HL7/FHIR standards for real-time waiting room analytics.
