The Story Behind MedScribe+
What Inspired Us
This project is personal.
In the span of a few months, three people close to me went through the same broken system.
A doctor under pressure ordered unnecessary tests for me. My partner wasn't feeling better after treatment and only found out something had been missed when he went to a different hospital. And my wife, heavily pregnant, was prescribed hypertension medication she didn't need. We only caught it because I happened to have a pharmacist friend I could call.
Most people don't have that.
The problem isn't careless doctors. It's overwhelmed ones. Studies show physicians spend nearly 2 hours on documentation for every 1 hour of patient care. That cognitive load has consequences for the doctor and the patient.
MedScribe+ is the second pair of eyes that should have been there.
What We Built
MedScribe+ is an AI-powered clinical documentation and quality assurance system that sits inside a physician's workflow.
The pipeline:
- Audio → Transcript — Nova 2 Sonic transcribes the consultation in real time (WebSocket) or from an uploaded file
- Transcript → SOAP Note — Nova 2 Lite generates a structured clinical note with ICD-10 codes, CPT codes, medications, and follow-up instructions
- RAG Enrichment — before note generation, the system silently retrieves relevant clinical guidelines and drug references from a vector store (via Nova Multimodal Embeddings) to enrich context
- Quality Evaluation — A dedicated EvaluationAgent runs four checks in parallel:
- Hallucination detection (claims not grounded in the transcript)
- Drug interaction screening against a clinical reference database
- Guideline alignment scoring per detected condition
- Documentation completeness scoring
- EHR Insert — Physician reviews, approves with one click, note is committed with a full audit trail
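Under illustrative assumptions (model calls stubbed, names hypothetical, not the project's actual API), the stages above can be sketched as a single pass:

```python
from dataclasses import dataclass, field

@dataclass
class PipelineResult:
    transcript: str
    context_docs: list = field(default_factory=list)
    soap_note: str = ""
    checks: dict = field(default_factory=dict)

def run_pipeline(transcript: str) -> PipelineResult:
    """One pass over the MedScribe+ stages (each model call stubbed)."""
    result = PipelineResult(transcript=transcript)
    # 1. RAG enrichment: pull guideline/drug context before generation
    result.context_docs = ["guideline: hypertension management"]
    # 2. SOAP generation from transcript + retrieved context
    result.soap_note = f"S/O/A/P note derived from: {transcript[:40]}"
    # 3. Quality evaluation: the four parallel checks, stubbed as scores
    result.checks = {
        "hallucination": "LOW",
        "drug_interaction": "NONE",
        "guideline_alignment": 0.92,
        "completeness": 0.88,
    }
    # 4. The physician review + EHR commit step happens outside this function
    return result
```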
Tech stack:
- Amazon Nova 2 Sonic — real-time audio transcription (bidirectional streaming)
- Amazon Nova 2 Lite — SOAP generation and clinical reasoning via Bedrock
- Amazon Nova Multimodal Embeddings — 1024-dim vectors for RAG retrieval
- FastAPI + WebSocket — real-time and file-upload paths
- React + TypeScript — physician-facing UI
- Redis — session caching and transcript accumulation
- Clean architecture — agents, tools, services, prompts fully separated
- Pytest — unit testing suite
What We Learned
Agentic pipelines are only as reliable as their sequencing. The biggest engineering challenge wasn't the LLM calls — it was ensuring the evaluation agent always ran after SOAP generation, with the correct context, regardless of how Nova batched its tool calls. We built explicit result-storage patterns and re-run guards to handle race conditions in tool execution ordering.
RAG enrichment needs to be invisible. Early versions surfaced retrieval status to the physician. That was wrong. The right design is silent enrichment — retrieve, use the context internally, discard gracefully if nothing useful comes back. The physician should never know or care.
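A minimal sketch of that silent-enrichment design, assuming a hypothetical retriever that returns (score, text) pairs; the threshold and all names are illustrative:

```python
MIN_SCORE = 0.75  # illustrative cutoff: below this, retrieval is discarded

def enrich_context(query: str, retriever) -> str:
    """Return extra prompt context, or '' when nothing useful comes back."""
    try:
        hits = retriever(query)
    except Exception:
        return ""  # retrieval failure stays invisible to the physician
    useful = [text for score, text in hits if score >= MIN_SCORE]
    return "\n".join(useful)  # empty string when nothing clears the bar

def build_prompt(transcript: str, context: str) -> str:
    base = f"Generate a SOAP note for:\n{transcript}"
    # Context is appended only when present; the prompt shape never
    # advertises whether retrieval succeeded.
    return f"{base}\n\nRelevant references:\n{context}" if context else base
```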
Hallucination detection catches real noise. Nova 2 Sonic occasionally picked up ambient audio artifacts as transcript text. The evaluation layer correctly flagged a spurious "cold" reference that appeared in the transcript — grounded: false — and still scored LOW overall risk. The system worked exactly as designed.
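As a toy stand-in for that check (the real system uses LLM judgment, not keyword overlap), grounding can be illustrated as term overlap between each claim and the transcript:

```python
def grounding_flags(claims: list[str], transcript: str) -> dict[str, bool]:
    """Map each claim to grounded True/False by term overlap (toy rule)."""
    words = set(transcript.lower().split())
    flags = {}
    for claim in claims:
        terms = set(claim.lower().split())
        # grounded only if some term from the claim appears in the transcript
        flags[claim] = bool(terms & words)
    return flags
```

A spurious "cold" picked up from ambient audio would have no support in the real consultation text and come back grounded: false.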
Challenges We Faced
Tool-call batching vs. sequential execution — the evaluation LLM would batch all four tool calls in a single response, so aggregate_scores would fire before the other three checks had stored their results. We solved this with a result-storage pattern in which each tool writes to a shared kwargs store immediately on completion, plus a guard that detects and recovers from out-of-order execution.
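The pattern can be sketched as follows; the store layout and check names are assumptions, not the project's actual code:

```python
REQUIRED = {"hallucination", "drug_interaction", "guideline", "completeness"}

class ResultStore:
    """Shared store each evaluation tool writes into on completion."""

    def __init__(self):
        self.results: dict[str, float] = {}

    def store(self, check: str, score: float) -> None:
        # written immediately when the tool finishes, not at pipeline end
        self.results[check] = score

    def aggregate_scores(self) -> dict:
        missing = REQUIRED - self.results.keys()
        if missing:
            # guard: aggregation fired before all checks landed
            # (out-of-order batch); report pending instead of a bogus score
            return {"status": "pending", "missing": sorted(missing)}
        overall = sum(self.results.values()) / len(REQUIRED)
        return {"status": "done", "overall": overall}
```

When the guard reports pending, the agent simply re-runs aggregation after the remaining tools complete.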
SOAP note context threading — The generated SOAP note needed to reach the evaluation agent, but the ScribeAgent never updated its own kwargs after tool execution. We solved this by having SOAPTools write the result directly to its own kwargs on completion, and giving ScribeEvaluationTools a direct reference to read from — eliminating the cache dependency entirely.
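A sketch of that direct-reference wiring, with the model call stubbed; the class names follow the writeup, but the attribute layout is an assumption:

```python
class SOAPTools:
    def __init__(self):
        self.kwargs: dict = {}

    def generate_soap(self, transcript: str) -> str:
        note = f"SOAP({transcript})"  # stand-in for the Nova 2 Lite call
        self.kwargs["soap_note"] = note  # written directly on completion
        return note

class ScribeEvaluationTools:
    def __init__(self, soap_tools: SOAPTools):
        self._soap_tools = soap_tools  # direct reference, no cache lookup

    def evaluate(self) -> str:
        note = self._soap_tools.kwargs.get("soap_note")
        if note is None:
            raise RuntimeError("SOAP note not generated yet")
        return f"evaluated:{note}"
```

Because the evaluation tools hold a reference to the same SOAPTools instance, the note is visible the instant it is written, with no intermediate cache to fall out of sync.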
Guideline alignment with real conditions — The keyword matching against clinical guidelines was strict. "Type 2 Diabetes Mellitus" in the SOAP note wouldn't match "type 2 diabetes" in the guidelines dictionary. We normalized condition strings at multiple levels — SOAP output, patient context, and the evaluation agent's condition inference — to ensure consistent matching.
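One way to sketch such a normalizer; the suffix list and cleanup rules are illustrative assumptions, not the project's exact logic:

```python
import re

# clinical suffixes that rarely appear in guideline dictionary keys
DROP_SUFFIXES = ("mellitus",)

def normalize_condition(name: str) -> str:
    """Lowercase, strip punctuation, and drop formal suffixes so that
    SOAP output, patient context, and guideline keys match consistently."""
    s = re.sub(r"[^a-z0-9\s]", " ", name.lower().strip())
    tokens = [t for t in s.split() if t not in DROP_SUFFIXES]
    return " ".join(tokens)
```

Applying the same function at every level means "Type 2 Diabetes Mellitus" and "type 2 diabetes" collapse to one key.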
Building two audio paths — The file upload path and the real-time WebSocket path needed to converge on the same evaluation pipeline. Keeping the architecture clean required careful separation: Sonic handles audio, Bedrock handles reasoning, and the agent layer never cares which path produced the transcript.
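The convergence can be illustrated like this, with Sonic stubbed out; only the transcript string crosses the boundary into the agent layer:

```python
def transcript_from_file(audio_bytes: bytes) -> str:
    # stand-in for a batch Sonic transcription of an uploaded file
    return f"transcript[{len(audio_bytes)} bytes]"

def transcript_from_stream(chunks: list[bytes]) -> str:
    # stand-in for accumulating partial Sonic results over a WebSocket
    return f"transcript[{sum(len(c) for c in chunks)} bytes]"

def evaluate(transcript: str) -> str:
    # the agent layer sees only the transcript, never the audio path
    return f"evaluated:{transcript}"
```

Identical input audio yields an identical transcript regardless of path, so the downstream SOAP and evaluation agents need no path-specific branches.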
The Result
A system that a physician can actually use — not a demo, a workflow. Drop an audio file or speak live, and in seconds you have a verified, EHR-ready SOAP note with a quality score, drug safety check, and guideline gap analysis.
Built because we lived the alternative. And because most people don't have a pharmacist friend to call.
Built With
- amazon-bedrock
- amazon-nova-2-lite
- amazon-nova-2-sonic
- amazon-titan-embeddings
- amazon-nova-multimodal-embeddings
- fastapi
- postgresql (pgvector)
- python
- react
- redis
- tailwind-css
- typescript
- websocket