Inspiration

Doctors spend a significant portion of their time on manual documentation instead of patient care. Clinical note-taking is repetitive, time-consuming, and prone to inconsistency. This inspired us to build a system that can convert doctor voice notes into structured medical documentation instantly.

What it does

MedScript AI converts doctor voice input into structured clinical outputs including:

  • Clinical Summary
  • SOAP Note (Subjective, Objective, Assessment, Plan)
  • Medication Safety Check
  • Structured JSON for system integration

This makes medical documentation faster, more consistent, and standardized.
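As an illustration of the structured JSON output listed above, here is a minimal sketch of what such a record could look like. The class and field names (`ClinicalDocument`, `SoapNote`, `medication_warnings`) are assumptions made for this example and are not MedScript AI's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SoapNote:
    subjective: str
    objective: str
    assessment: str
    plan: str


@dataclass
class ClinicalDocument:
    # Field names are illustrative assumptions, not the project's real schema.
    summary: str
    soap: SoapNote
    medication_warnings: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested SoapNote dataclass.
        return json.dumps(asdict(self), indent=2)


doc = ClinicalDocument(
    summary="Adult patient with three days of sore throat.",
    soap=SoapNote(
        subjective="Reports sore throat for three days.",
        objective="Temperature 38.1 C.",
        assessment="Likely viral pharyngitis.",
        plan="Rest, fluids, review in one week.",
    ),
)
print(doc.to_json())
```

Keeping the four SOAP sections as required fields means a record with a missing section fails at construction time instead of reaching a downstream system.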

How we built it

The system works in two stages:

  1. Speech Recognition
    Doctor voice notes are converted into text using Whisper.

  2. AI Structuring
    The transcript is processed by a language model (via the Groq API) to generate structured clinical documentation.

The interface is built using Gradio and deployed on Hugging Face Spaces for easy access.
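The two stages described above can be sketched as a small pipeline function. This is only an outline under stated assumptions: `run_pipeline`, `fake_transcribe`, and `fake_structure` are hypothetical names, and the two stub functions stand in for the real Whisper and Groq calls so the example runs without either service.

```python
from typing import Callable

# Hypothetical stage signatures: audio bytes -> text, text -> structured dict.
Transcriber = Callable[[bytes], str]
Structurer = Callable[[str], dict]


def run_pipeline(audio: bytes, transcribe: Transcriber, structure: Structurer) -> dict:
    """Run stage 1 (speech to text), then stage 2 (text to structured note)."""
    transcript = transcribe(audio).strip()
    if not transcript:
        # Refuse to call the language model with nothing to ground it.
        raise ValueError("empty transcript; nothing to structure")
    result = dict(structure(transcript))
    result["transcript"] = transcript  # keep the source text for later review
    return result


# Stand-ins for the real stages so the sketch runs without Whisper or Groq.
def fake_transcribe(audio: bytes) -> str:
    return "Patient reports headache for two days. "


def fake_structure(transcript: str) -> dict:
    return {"summary": transcript, "soap": {"subjective": transcript}}


output = run_pipeline(b"\x00\x01", fake_transcribe, fake_structure)
print(output["transcript"])
```

Passing the stages in as functions keeps the speech model and the language model independently replaceable, and lets the pipeline be tested without network access.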

Challenges we ran into

  • Handling transcription errors in medical terms
  • Preventing AI hallucinations in clinical outputs
  • Structuring outputs into strict formats like SOAP
  • Ensuring consistent and reliable results across inputs
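One possible way to approach the first challenge, transcription errors in medical terms, is fuzzy matching against a known vocabulary. The sketch below uses Python's standard `difflib` module. The term list, the `correct_terms` name, and the 0.8 cutoff are assumptions for illustration, not the approach the project necessarily took.

```python
import difflib

# Tiny illustrative vocabulary; a real list would be far larger.
MEDICAL_TERMS = ["metformin", "amoxicillin", "ibuprofen", "lisinopril"]


def correct_terms(transcript, vocabulary=MEDICAL_TERMS, cutoff=0.8):
    """Replace near-miss spellings with vocabulary terms and report each change."""
    corrected, changes = [], []
    for word in transcript.split():
        core = word.strip(".,;:")
        if core:
            matches = difflib.get_close_matches(
                core.lower(), vocabulary, n=1, cutoff=cutoff
            )
            if matches and core.lower() != matches[0]:
                changes.append((core, matches[0]))
                word = word.replace(core, matches[0])
        corrected.append(word)
    return " ".join(corrected), changes


text, changes = correct_terms("Start metformine 500 mg daily.")
print(text)
print(changes)
```

Returning the list of changes alongside the corrected text lets the interface show each substitution to the doctor for confirmation, which matters because a fuzzy match can also turn a correctly transcribed word into the wrong drug name.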

What we learned

  • How to combine speech AI with language models in a real-world workflow
  • Importance of constraining AI outputs for reliability
  • Building end-to-end AI systems, not just isolated models
  • Designing simple but effective user interfaces for practical use

What's next

  • Integration with Electronic Health Record (EHR) systems
  • Multi-language doctor support
  • Real-time transcription during consultations
  • Advanced clinical decision support with safety constraints

What it does

MedScript AI converts doctor voice notes into structured clinical documentation. It generates a clinical summary, SOAP note (Subjective, Objective, Assessment, Plan), medication safety check, and structured JSON for integration. This reduces manual documentation effort and improves consistency in medical records.


How we built it

The system is built in two stages. First, Whisper converts the doctor's voice input to text. Second, a language model accessed through the Groq API processes the transcript and generates structured clinical outputs. The interface is built using Gradio and deployed on Hugging Face Spaces for accessibility.


Challenges we ran into

We faced challenges with transcription accuracy for medical terms and units. Another major issue was controlling AI hallucinations and ensuring the model does not generate information beyond the provided transcript. Structuring outputs into strict clinical formats like SOAP also required careful prompt design.
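Prompt design can be complemented by checks on the model's output after it is generated. The sketch below shows two such checks under stated assumptions: it presumes the model returns the SOAP note as a JSON object with lowercase section keys, and it uses a deliberately crude grounding test that flags any number in the note that never appears in the transcript. The function names are hypothetical.

```python
import json
import re

REQUIRED_SOAP_KEYS = {"subjective", "objective", "assessment", "plan"}
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_soap(raw: str) -> dict:
    """Parse model output and reject it unless all four SOAP sections exist."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("SOAP note must be a JSON object")
    missing = REQUIRED_SOAP_KEYS - data.keys()
    if missing:
        raise ValueError(f"SOAP note missing sections: {sorted(missing)}")
    return data


def ungrounded_numbers(transcript: str, note: dict) -> list[str]:
    """Return numbers that appear in the note but not in the transcript."""
    seen = set(NUMBER.findall(transcript))
    flagged = []
    for section in note.values():
        for number in NUMBER.findall(str(section)):
            if number not in seen and number not in flagged:
                flagged.append(number)
    return flagged


transcript = "Blood pressure 140 over 90. Start lisinopril 10 mg."
raw = json.dumps({
    "subjective": "No complaints reported.",
    "objective": "BP 140/90.",
    "assessment": "Hypertension.",
    "plan": "Lisinopril 20 mg daily.",
})
note = parse_soap(raw)
print(ungrounded_numbers(transcript, note))
```

In this example the dose in the plan does not match the dictated dose, so it is flagged for review. The check only works when the transcript renders numbers as digits, and it says nothing about non-numeric claims, so it is a safety net, not a guarantee.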


Accomplishments that we're proud of

We built a working end-to-end system that converts voice input into structured clinical documentation. The system produces multiple formats (clinical summary, SOAP note, medication safety check, and JSON) in a single pipeline and runs as a live deployed application. Getting reliable outputs by constraining the model's behavior is a key accomplishment.


What we learned

We learned how to combine speech recognition and language models into a real-world workflow. We saw how important prompt constraints are for avoiding hallucinations, and why practical applications need structured outputs. We also gained experience deploying AI applications on Hugging Face Spaces.


What's next for MedScript AI – Clinical Documentation Assistant

We plan to integrate the system with Electronic Health Record (EHR) systems, support multiple languages, and enable real-time transcription during consultations. Future improvements also include adding clinical decision support with strict safety controls.
