Inspiration
Doctors waste a large share of their time on non-doctor tasks, like transcribing notes and verifying ICD-10 codes, instead of caring for patients. We wanted to give that time back and make patient care more available.
What it does
NotAScribe is an interactive dashboard designed to survive ED triage in a multi-room / multi-bay emergency department, where multiple doctor-patient pairs are triaging simultaneously amid background noise, overlapping speech, and multilingual patients.
How we built it
Tech Stack
Frontend
- Next.js (App Router) + React Hooks (`useState`, `useRef`, `useEffect`) + Tailwind CSS
- HTML5 Web Audio API for acoustic noise gating, recording buffers, and instantaneous byte-range playback
- Dark mode clinical UI theme
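The acoustic noise gate itself runs in the browser via the Web Audio API, but the underlying idea is language-agnostic: only pass audio frames whose RMS energy clears a threshold, so background hum never reaches the transcriber. A minimal Python sketch of that idea (the threshold value here is illustrative, not the one used in the app):

```python
import math

def rms(frame):
    """Root-mean-square energy of a frame of float samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def noise_gate(frames, threshold=0.02):
    """Yield only frames loud enough to plausibly contain speech.

    `threshold` is an illustrative value; a real gate would be tuned
    per room/bay against the ambient noise floor.
    """
    for frame in frames:
        if rms(frame) >= threshold:
            yield frame

# Example: one quiet frame (background hiss) and one loud frame (speech)
quiet = [0.001] * 160
loud = [0.5, -0.5] * 80
kept = list(noise_gate([quiet, loud]))
```

Gating on the client keeps silent chunks off the WebSocket entirely, which saves both bandwidth and GPU time on the backend.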
Backend (runs on Google Colab T4 GPU, 16GB VRAM)
- FastAPI + uvicorn + websockets + pyngrok (ngrok tunnel on port 8000)
- nvidia/nemotron-speech-streaming-en-0.6b — audio→transcript only: streaming STT with word-level timestamps
- google/medgemma-4b — all medical NLP: ICD-10 encoding, summarization, ambiguity flagging (4-bit/8-bit quantized to fit alongside Nemotron)
Communication
- WebSocket endpoint: `/ws/triage?pair_id={pair_id}`
- Frontend sends binary audio chunks tagged with `pair_id`; backend returns structured JSON scoped to that pair
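The key property of this protocol is isolation: every chunk is routed by `pair_id`, so simultaneous triages never mix streams. A stdlib-only sketch of that routing idea (the real backend feeds each buffer into Nemotron/MedGemma rather than just counting bytes, and the class name here is hypothetical):

```python
import json
from collections import defaultdict

class TriageSessions:
    """Keeps each doctor-patient pair's audio stream isolated.

    Sketch of the routing behind /ws/triage?pair_id={pair_id}: one
    buffer per pair, and every JSON update is scoped to that pair.
    """

    def __init__(self):
        self._buffers = defaultdict(bytearray)

    def on_chunk(self, pair_id: str, chunk: bytes) -> str:
        """Append a binary chunk and return a JSON update for that pair only."""
        self._buffers[pair_id].extend(chunk)
        return json.dumps({
            "pair_id": pair_id,
            "buffered_bytes": len(self._buffers[pair_id]),
        })

sessions = TriageSessions()
reply = sessions.on_chunk("bay-3", b"\x00\x01\x02")
```

Because the `pair_id` travels in the WebSocket query string, the frontend can send raw binary audio with no per-message framing overhead.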
Challenges we ran into
Fine-tuning Nemotron 0.6B from Whisper on a dataset of 272 doctor-patient conversations, and getting MedGemma to output in the correct JSON format.
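Small models like MedGemma 4B often wrap their JSON in prose or code fences. One dependency-free recovery trick we can sketch (this is an illustration, not necessarily the exact fix used) is to scan for the first balanced `{...}` span and parse that:

```python
import json

def extract_json(model_text: str) -> dict:
    """Pull the first balanced {...} object out of free-form model output.

    Naive brace counting is enough when the payload's string values
    contain no braces; a production version would track string state.
    """
    start = model_text.find("{")
    if start == -1:
        raise ValueError("no JSON object found")
    depth = 0
    for i, ch in enumerate(model_text[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(model_text[start:i + 1])
    raise ValueError("unbalanced JSON object")

# Hypothetical MedGemma reply wrapping the payload in prose
raw = 'Sure! Here is the code: {"icd10": "R07.9", "ambiguous": false}'
parsed = extract_json(raw)
```

If parsing still fails, the chunk can simply be re-prompted, since each response is small and scoped to one pair.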
Accomplishments that we're proud of
Making it "edge-compute" friendly: Nemotron is a 600M-parameter model, small enough to run even on smartphones.
What we learned
Edge compute is feasible and worth implementing: it provides localized serving without any external calls to an AI giant.
What's next for NotAScribe
Switch from Nemotron to a Whisper model for 100+ language support, so the system accurately captures different accents and languages.
Built With
- fastapi
- nextjs
- uvicorn
- websockets