Inspiration
Physicians in developing countries such as Pakistan and India, and across Sub-Saharan Africa, are often the only doctors serving thousands of patients. Despite this crushing load, they spend 35–40% of their time on documentation — typing the same structured data into Electronic Health Record (EHR) systems, over and over, instead of treating patients.
We asked: what if an AI agent could listen to a doctor describe a patient, see the uploaded scans and lab reports, reason through the clinical picture, and then act on the EHR system automatically — all in real time?
That question became MediNova.
What it does
MediNova is a multimodal, multi-agent clinical intelligence system built entirely on Amazon Nova. A doctor speaks a patient case description in natural language. The system simultaneously:
- Transcribes and understands the voice input via Amazon Nova 2 Sonic (speech-to-speech, with crossmodal text/voice switching)
- Retrieves similar past cases using Amazon Nova Multimodal Embeddings — searching a knowledge base using both the spoken description and an uploaded ECG/X-ray image as a unified cross-modal query
- Reasons through differential diagnoses, flags drug interactions, and identifies missing workup items using Amazon Nova 2 Lite with extended thinking enabled at medium budget — the reasoning trace is surfaced to the doctor in real time
- Automatically fills the EHR encounter form on the hospital's web portal using Amazon Nova Act, with >90% task reliability at scale
- Reads the clinical summary back to the doctor via Nova 2 Sonic voice output, completing the full voice-in, voice-out loop
The result: a complete clinical encounter — from spoken description to structured EHR entry — in under 60 seconds, with the AI's full reasoning chain visible and auditable.
How we built it
Architecture
MediNova uses a three-agent orchestration pattern built on the Strands Agents SDK deployed to Amazon Bedrock AgentCore:
Agent 1 — Intake Agent (Nova 2 Sonic + Nova Multimodal Embeddings)
- Receives bidirectional audio stream via LiveKit integration (full-duplex, voice activity detection included)
- Processes crossmodal input: spoken case description + uploaded images (ECG, X-ray, lab PDFs)
- Generates unified embeddings via Nova Multimodal Embeddings and queries an Amazon OpenSearch Service vector index of 500+ synthetic past cases — retrieving the top-3 similar cases by image+text combined similarity
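
To make the retrieval step concrete, here is a minimal sketch of how the top-3 k-NN query against the past-cases index might be built. The vector field name (`case_vector`) and the query shape are illustrative assumptions, not the exact schema we'd publish; the unified vector itself would come from a Nova Multimodal Embeddings call over the audio transcript plus the uploaded image.

```python
# Sketch of Agent 1's retrieval step. "case_vector" is a hypothetical
# field name for the stored Nova Multimodal Embeddings vectors; the
# actual index mapping would be defined at ingestion time.

def build_knn_query(query_vector: list[float], k: int = 3) -> dict:
    """Build an OpenSearch k-NN query for the top-k similar past cases."""
    return {
        "size": k,
        "query": {
            "knn": {
                "case_vector": {  # hypothetical vector field name
                    "vector": query_vector,
                    "k": k,
                }
            }
        },
    }

# The real query_vector would be the unified text+image embedding.
query = build_knn_query([0.12, -0.03, 0.41], k=3)
```

The same query body works against both OpenSearch Service and OpenSearch Serverless vector collections, which is why the retrieval code doesn't care which flavor backs the index.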
Agent 2 — Reasoning Agent (Nova 2 Lite)
- Model ID: `us.amazon.nova-2-lite-v1:0`
- Extended thinking enabled at `medium` budget — exposes the step-by-step reasoning trace
- Uses built-in web grounding tool to pull live clinical guidelines (CDC, WHO) as context during reasoning
- Uses built-in code interpreter to run basic statistical risk scoring (HEART score, Wells criteria)
- Returns structured JSON: primary diagnosis, differential, risk flags, recommended workup, proposed EHR note
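
The risk scores the code interpreter computes are plain arithmetic. As an illustration, here is the standard HEART score for chest-pain risk stratification (the function name and signature are ours, not part of any Nova API):

```python
def heart_score(history: int, ecg: int, age_years: int,
                num_risk_factors: int, troponin_x_normal: float) -> tuple[int, str]:
    """HEART score (0-10) for chest-pain risk stratification.

    history, ecg: clinician-assigned sub-scores, each 0-2.
    num_risk_factors: count of cardiovascular risk factors.
    troponin_x_normal: troponin as a multiple of the upper normal limit.
    """
    age = 0 if age_years < 45 else (1 if age_years < 65 else 2)
    risk = 0 if num_risk_factors == 0 else (1 if num_risk_factors <= 2 else 2)
    trop = 0 if troponin_x_normal <= 1 else (1 if troponin_x_normal <= 3 else 2)
    total = history + ecg + age + risk + trop
    band = "low" if total <= 3 else ("moderate" if total <= 6 else "high")
    return total, band

# 58-year-old, moderately suspicious history, non-specific ECG changes,
# three risk factors, normal troponin:
score, band = heart_score(2, 1, 58, 3, 0.8)  # -> (6, "moderate")
```

Running this in the model's code interpreter rather than asking the LLM to "do the math" in prose keeps the arithmetic deterministic and auditable.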
Agent 3 — Automation Agent (Nova Act)
- Receives structured output from the reasoning agent
- Opens the EHR web interface and autonomously fills: Chief Complaint, HPI, Assessment & Plan, and Orders fields
- Deployed via Nova Act IDE extension → Amazon ECR → Bedrock AgentCore Runtime
- Human-in-the-loop escalation built in: any field with confidence < 0.85 is flagged for doctor confirmation before submission
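
The escalation gate itself is a simple threshold check over the automation agent's per-field confidence. A minimal sketch (field names and dict shape are illustrative):

```python
CONFIDENCE_THRESHOLD = 0.85  # threshold from our human-in-the-loop policy

def fields_needing_review(filled_fields: dict[str, dict]) -> list[str]:
    """Return the EHR field names whose fill confidence falls below threshold."""
    return [name for name, field in filled_fields.items()
            if field["confidence"] < CONFIDENCE_THRESHOLD]

# Example: one confidently filled field, one flagged for the doctor.
fields = {
    "Chief Complaint": {"value": "chest pain, 2h onset", "confidence": 0.97},
    "Orders": {"value": "troponin, 12-lead ECG", "confidence": 0.62},
}
needs_review = fields_needing_review(fields)  # -> ["Orders"]
```

Only flagged fields block submission; everything above threshold is committed automatically, which keeps the doctor's confirmation step short.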
Stack
- Orchestration: Strands Agents SDK with `swarm`, `use_agent`, and `think` tools
- Voice: Amazon Nova 2 Sonic (`amazon.nova-2-sonic-v1:0`) + LiveKit Agents framework
- Reasoning: Amazon Nova 2 Lite with `reasoningConfig: enabled`, `maxReasoningEffort: medium`
- Multimodal search: Nova Multimodal Embeddings + Amazon OpenSearch Serverless
- UI automation: Amazon Nova Act (playground → VS Code extension → Bedrock AgentCore)
- Storage: Amazon S3 for scan uploads, Amazon S3 Vectors for embedding store
- Backend: Python 3.12, FastAPI
- Frontend: React + LiveKit JS SDK for the voice interface
Challenges we ran into
Cross-modal embedding alignment: Getting meaningful similarity scores when the query is a spoken sentence but the indexed content contains both text and image embeddings required careful normalization of the embedding space. Nova Multimodal Embeddings handles this natively through its unified vector space, but tuning the retrieval threshold for clinical relevance (where a false positive is dangerous) required careful calibration.
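
The calibration boils down to ranking by similarity and then refusing to return anything below a clinical-relevance floor, even if it is in the top-k. A pure-Python sketch (the 0.75 threshold is a placeholder, not our tuned value):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec: list[float], index: dict[str, list[float]],
             threshold: float = 0.75, k: int = 3) -> list[str]:
    """Top-k cases by similarity, dropping anything below the relevance floor.

    Returning fewer than k results is deliberate: in a clinical setting,
    a weak match is worse than no match.
    """
    scored = sorted(((cosine(query_vec, v), cid) for cid, v in index.items()),
                    reverse=True)
    return [cid for score, cid in scored[:k] if score >= threshold]

index = {"case-a": [1.0, 0.0], "case-b": [0.0, 1.0], "case-c": [0.9, 0.1]}
hits = retrieve([1.0, 0.0], index)  # -> ["case-a", "case-c"]
```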
Extended thinking latency vs. UX: When Nova 2 Lite's extended thinking is enabled at medium budget, the reasoning trace can take 8–12 seconds on complex cases. We solved this by streaming the thinking text to the UI in real time — so the doctor watches the AI think rather than staring at a loading spinner. This turned a UX problem into a feature.
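
The streaming itself is just server-sent-event framing around the model's thinking chunks. A minimal sketch, where `reasoning_chunks()` stands in for the model's streamed thinking output:

```python
from typing import Iterator

def reasoning_chunks() -> Iterator[str]:
    """Stand-in for the streamed extended-thinking text from the model."""
    yield "Chest pain with exertion: considering ACS vs. PE vs. GERD..."
    yield "ST depression in V4-V6 raises suspicion for ischemia..."

def sse_stream(chunks: Iterator[str]) -> Iterator[str]:
    """Wrap each thinking chunk in server-sent-event framing for the UI."""
    for chunk in chunks:
        yield f"data: {chunk}\n\n"

events = list(sse_stream(reasoning_chunks()))
```

In FastAPI, this generator would be returned via a `StreamingResponse` with `media_type="text/event-stream"`, so the React frontend renders each chunk the moment it arrives.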
Nova Act reliability on dynamic EHR forms: Real EHR interfaces are notoriously inconsistent — fields appear conditionally, dropdowns load asynchronously, and session timeouts are aggressive. We used Nova Act's notebook-style builder in the IDE extension to test and harden each step individually, and built in retry logic with human escalation for steps that failed twice.
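
The retry-then-escalate policy is deliberately simple. A sketch of the wrapper we put around each automation step (the dict result shape is ours):

```python
from typing import Callable, Any

def run_with_escalation(step: Callable[[], Any], max_retries: int = 2) -> dict:
    """Run a browser-automation step; escalate to a human after two failures."""
    for _attempt in range(max_retries):
        try:
            return {"status": "ok", "result": step()}
        except Exception:
            continue  # e.g. field not rendered yet, stale session
    return {"status": "escalated"}  # surfaced to the doctor for manual entry

# A step that always fails is escalated after exactly two attempts:
attempts = {"n": 0}
def flaky() -> str:
    attempts["n"] += 1
    raise RuntimeError("field not found")

result = run_with_escalation(flaky)  # -> {"status": "escalated"}
```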
Voice + reasoning latency in a single pipeline: Chaining Nova Sonic (real-time stream) → Nova 2 Lite (reasoning, ~10s) → Nova Act (browser automation, ~20s) → Nova Sonic (output) creates a ~35 second end-to-end pipeline. We parallelized the intake and EHR pre-loading steps to bring perceived latency under 20 seconds for the doctor.
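
The parallelization is standard `asyncio.gather` over the two independent steps. A sketch with placeholder coroutines standing in for the real retrieval and Nova Act calls:

```python
import asyncio

async def retrieve_similar_cases() -> list[str]:
    await asyncio.sleep(0)  # placeholder for the embedding + search call
    return ["case-102", "case-337", "case-410"]

async def preload_ehr_session() -> str:
    await asyncio.sleep(0)  # placeholder for Nova Act opening the encounter form
    return "ehr-session-ready"

async def intake() -> tuple[list[str], str]:
    # Run both steps concurrently; total latency is max() of the two,
    # not their sum, which is where the perceived-latency win comes from.
    cases, session = await asyncio.gather(
        retrieve_similar_cases(), preload_ehr_session()
    )
    return cases, session

cases, session = asyncio.run(intake())
```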
Accomplishments that we're proud of
- First known use of Nova Multimodal Embeddings for cross-modal clinical search — querying a patient database with an ECG image as the input, not text keywords
- Full four-model Nova integration in a single coherent system: Sonic + 2 Lite + Act + Multimodal Embeddings
- Auditable AI reasoning — the extended thinking trace is stored per encounter as a compliance artifact, answering the "why did the AI say that?" question that blocks clinical AI adoption
- Sub-60-second full encounter loop from voice description to completed EHR entry
- Polyglot support via Nova 2 Sonic — the same system works in English, Urdu/Hindi, Spanish, and Portuguese without model switching, making it viable for global deployment
What we learned
- Amazon Nova 2 Lite's built-in web grounding and code interpreter tools are dramatically underutilized — combining them with extended thinking creates a reasoning agent that can cite sources and verify its own outputs
- Strands Agents' `swarm` primitive makes it trivially easy to spawn parallel sub-agents for concurrent tasks — we used this to run EHR pre-population in parallel with the reasoning trace
- Nova Act's "web gym" for testing agents is genuinely one of the most thoughtful developer experiences in the agentic AI ecosystem — prototype in the playground, harden in the IDE, ship to production in one click
- The crossmodal feature in Nova 2 Sonic (switching between text and voice mid-session) unlocks interaction patterns that no prior voice AI system supported — a doctor can type a complex medication name and speak everything else
What's next for MediNova
- Integration with real EHR systems: Epic and OpenMRS (open-source, used across the developing world) are the first targets
- Fine-tuning on clinical data: Nova 2 Lite's customization support on Amazon Bedrock and SageMaker AI means we can fine-tune on de-identified clinical notes for specialty-specific reasoning (cardiology, emergency medicine)
- Fleet deployment: Nova Act's ability to manage fleets of agents means one MediNova instance could handle documentation for an entire hospital ward simultaneously
- Regulatory pathway: Pursuing FDA 510(k) exemption as a clinical decision support tool (not a diagnostic) — the auditable reasoning trace is specifically designed to support this classification
- Community health workers: A simplified voice-only version for community health workers in rural Pakistan and India who have smartphones but no formal medical training — Nova Sonic's Hindi and Urdu support makes this viable today