Inspiration

Doctors spend a significant portion of their day on clinical documentation rather than patient care. SOAP notes, referral letters, prescription records: the paperwork piles up fast. We built VoxMed to give that time back.

What it does

VoxMed is an AI medical scribe that turns a consultation recording into a complete clinical report. Upload an audio file, and VoxMed transcribes it with automatic speaker diarization (Doctor vs. Patient), extracts a structured SOAP note, validates the treatment plan for safety concerns, and suggests ICD-10 diagnosis codes. The doctor reviews, edits, and generates a branded PDF report and referral letter in minutes. VoxMed supports English, French, Arabic, and Vietnamese.

How we built it

The frontend is built with Next.js and Zustand for state management; the backend runs on Supabase Edge Functions. Audio uploads go directly to Supabase Storage to bypass serverless request-size limits.
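A minimal sketch of the client-side decision behind the direct upload. The 4 MB body limit and the storage path layout are our assumptions for illustration, not VoxMed's actual values; the upload itself is a single supabase-js Storage call from the browser.

```typescript
// Assumed serverless request-body cap; check your platform's real limit.
const FUNCTION_BODY_LIMIT = 4 * 1024 * 1024;

// Files above the cap can't pass through the Edge Function, so they go
// straight to Supabase Storage instead.
function needsDirectUpload(sizeBytes: number): boolean {
  return sizeBytes > FUNCTION_BODY_LIMIT;
}

// Hypothetical storage key: one folder per consultation avoids name
// collisions between uploads.
function storagePath(consultationId: string, fileName: string): string {
  const ext = fileName.includes(".") ? fileName.split(".").pop() : "bin";
  return `consultations/${consultationId}/recording.${ext}`;
}

// The upload itself is then one supabase-js call from the browser, e.g.:
//   await supabase.storage.from("recordings").upload(storagePath(id, f.name), f);
```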

The AI pipeline uses two Qwen models via DashScope. Qwen3.5-Omni handles transcription with a two-pass approach: a first pass detects the language, then a second pass transcribes with a language-specific prompt that labels Doctor and Patient turns with timestamps. Qwen3-max handles structured reasoning: SOAP extraction with strict JSON schema enforcement, plan validation, and ICD-10 suggestions. The latter two run as background tasks so doctors can start editing immediately. Clinical PDFs are generated with pdf-lib running natively in Deno.
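The schema enforcement on the extraction step can be sketched as a strict parse-and-validate gate. The field names below are illustrative, not VoxMed's actual schema; the point is that a malformed model generation is rejected at the boundary rather than reaching the editing UI.

```typescript
// Illustrative SOAP shape; the project's real schema is richer.
interface SoapNote {
  subjective: string;
  objective: string;
  assessment: string;
  plan: string;
}

const SOAP_FIELDS: (keyof SoapNote)[] = ["subjective", "objective", "assessment", "plan"];

// Parse the model's raw output and reject anything that doesn't match the
// schema, so only well-formed notes reach the doctor's editor.
function parseSoap(raw: string): SoapNote {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error("model output is not valid JSON");
  }
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("model output is not a JSON object");
  }
  const obj = parsed as Record<string, unknown>;
  for (const field of SOAP_FIELDS) {
    if (typeof obj[field] !== "string") {
      throw new Error(`missing or non-string field: ${field}`);
    }
  }
  return obj as unknown as SoapNote;
}
```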

Challenges we ran into

Getting accurate speaker diarization required a two-pass approach. A single prompt asking Qwen3.5-Omni to transcribe and label speakers simultaneously produced inconsistent results. We split it: a fast first pass detects the language, then a second pass transcribes using a language-specific prompt tuned to the correct Doctor and Patient labels for that language. This improved accuracy significantly but added latency we had to budget carefully against the Edge Function timeout.
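The second pass can be sketched as a prompt builder keyed on the detected language. The label table and prompt wording below are our assumptions; VoxMed's actual prompts are tuned per language and certainly phrased differently.

```typescript
// Assumed speaker labels per supported language (illustrative only).
const SPEAKER_LABELS: Record<string, { doctor: string; patient: string }> = {
  en: { doctor: "Doctor", patient: "Patient" },
  fr: { doctor: "Médecin", patient: "Patient" },
  ar: { doctor: "الطبيب", patient: "المريض" },
  vi: { doctor: "Bác sĩ", patient: "Bệnh nhân" },
};

// Second-pass prompt: once the fast first pass has detected the language,
// transcription is re-prompted with the correct speaker labels for it.
function diarizationPrompt(lang: string): string {
  const labels = SPEAKER_LABELS[lang] ?? SPEAKER_LABELS.en;
  return (
    `Transcribe the consultation. Prefix each turn with a timestamp and ` +
    `either "${labels.doctor}:" or "${labels.patient}:".`
  );
}
```

Splitting detection from transcription keeps each call simple, at the cost of one extra round trip that has to fit inside the Edge Function timeout.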

Accomplishments that we're proud of

A working end-to-end pipeline from raw audio to a downloadable clinical PDF in under three minutes for a typical consultation. The plan validation step caught a drug-diagnosis mismatch in a test recording that a reviewer could easily have missed. Full multilingual support, including Arabic RTL layout, language-specific prompts, and localised ICD-10 descriptions.

What we learned

Use the right model for each task. Qwen3.5-Omni is excellent at understanding audio but not ideal for structured extraction. Qwen3-max is faster and more consistent for JSON reasoning tasks. Separating the pipeline into focused model calls with clear input/output contracts made everything easier to debug and improve.
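The "right model per task" routing amounts to a small lookup at each pipeline stage. The task names and the transcript contract below are illustrative identifiers of ours, not VoxMed's actual types.

```typescript
// Illustrative stage contract: each step declares what it consumes and
// produces, so a bad handoff fails at the boundary, not deep in a later call.
interface Transcript {
  language: string;
  turns: { speaker: string; text: string; timestamp: string }[];
}

type Task = "transcription" | "soap_extraction" | "plan_validation" | "icd10_suggestion";

// One model per job: the omni model understands audio; the text model is
// faster and more consistent at schema-constrained JSON reasoning.
function modelFor(task: Task): string {
  return task === "transcription" ? "qwen3.5-omni" : "qwen3-max";
}
```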

What's next for VoxMed

Live consultation mode (record directly in-app instead of uploading). EHR integration to push notes directly to patient records. Medication interaction checking in the plan validation step. And replacing the mock patient data with a real patient lookup system.

Built With

  • nextjs
  • qwen3-max
  • qwen3.5-omni
  • react
  • supabase
  • supabase-edge-functions
  • ts
  • vercel