Inspiration

The amount of time doctors waste on non-clinical work (like transcribing notes and verifying ICD-10 codes) — time that could instead go toward making patient care more available.

What it does

An interactive dashboard designed to survive ED triage in a multi-room / multi-bay emergency department, where multiple doctor-patient pairs are triaging simultaneously, each with background noise, overlapping speech, and multilingual patients.

How we built it

Tech Stack

Frontend

  • Next.js (App Router) + React Hooks (useState, useRef, useEffect) + Tailwind CSS
  • HTML5 Web Audio API for acoustic noise gating, recording buffers, and instantaneous byte-range playback
  • Dark mode clinical UI theme

Backend (runs on Google Colab T4 GPU, 16GB VRAM)

  • FastAPI + uvicorn + websockets + pyngrok (ngrok tunnel on port 8000)
  • nvidia/nemotron-speech-streaming-en-0.6b — audio→transcript only: streaming STT with word-level timestamps
  • google/medgemma-4b — all medical NLP: ICD-10 encoding, summarization, ambiguity flagging (4-bit/8-bit quantized to fit alongside Nemotron)

Communication

  • WebSocket endpoint: /ws/triage?pair_id={pair_id}
  • Frontend sends binary audio chunks tagged with pair_id; backend returns structured JSON scoped to that pair
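The pair-scoped exchange above can be sketched as a simple binary frame — pair_id length, pair_id bytes, then the audio payload — with the backend replying in JSON scoped to that pair. This framing and the reply fields are illustrative, not the actual wire format:

```python
import json
import struct

def encode_chunk(pair_id: str, audio: bytes) -> bytes:
    """Frame an audio chunk: 2-byte pair_id length, pair_id, payload."""
    pid = pair_id.encode("utf-8")
    return struct.pack("<H", len(pid)) + pid + audio

def decode_chunk(frame: bytes) -> tuple[str, bytes]:
    """Split a frame back into (pair_id, audio payload)."""
    (n,) = struct.unpack_from("<H", frame, 0)
    pid = frame[2 : 2 + n].decode("utf-8")
    return pid, frame[2 + n :]

def scoped_reply(pair_id: str, transcript: str) -> str:
    """Structured JSON reply scoped to one doctor-patient pair."""
    return json.dumps({"pair_id": pair_id, "transcript": transcript})

pid, audio = decode_chunk(encode_chunk("room3-bay1", b"\x01\x02"))
```

Scoping every message by pair_id is what keeps simultaneous triage sessions from bleeding into each other's transcripts.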

Challenges we ran into

Fine-tuning Nemotron 0.6B (after switching over from Whisper) on a dataset of 272 doctor-patient conversations, and getting MedGemma to reliably output valid JSON.

Accomplishments that we're proud of

Making it edge-compute friendly: Nemotron is only a 600M-parameter model, small enough to run even on smartphones.

What we learned

Edge compute is feasible and worth implementing: it provides localized serving without any external calls to a large AI provider.

What's next for NotAScribe

Switch from Nemotron to a Whisper model for 100+ language support, to accurately capture different accents and languages.
