Inspiration

I've seen nurses spend entire shifts toggling between screens — admitting patients in one system, prescribing meds in another, searching records in a third. It's death by a thousand clicks. I wanted to see if a conversational AI could collapse all of that into natural language. One sentence instead of five screens.

When I saw Amazon Nova had reasoning, embeddings, and voice models all under one roof, I realized I could build the full loop — a nurse types or speaks, the agent reasons about what to do, calls the right tools, searches records semantically, and responds. No tab switching.

What it does

NovaHealth is an AI nursing assistant where nurses manage patients through conversation. Instead of navigating forms, the nurse says:

"Admit John Doe, 45 male, room 305, pneumonia, and prescribe Amoxicillin 500mg every 8 hours."

The agent admits the patient, checks the prescription against documented allergies, prescribes the medication, and confirms, all in one turn. It also does semantic search across clinical documents (searching "heart attack" finds records about "Acute Myocardial Infarction") and supports real-time voice interaction so nurses can work hands-free during procedures.

How I built it

Three Amazon Nova models, each doing what it's best at:

  • Nova 2 Lite with Strands Agents SDK — the agent brain. It reasons about nurse requests, picks from 8 tools (admit, prescribe, administer, search, etc.), chains multiple operations, and streams responses via SSE.
  • Nova Multimodal Embeddings — indexes 46 clinical documents into 1024-dim vectors stored in PostgreSQL + pgvector. Powers semantic search that understands medical meaning, not just keywords.
  • Nova 2 Sonic — real-time bidirectional speech-to-speech via WebSocket. The nurse speaks, the agent responds with audio. Phone-call UX, not push-to-talk.
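
The semantic-search step in the second bullet might look roughly like the sketch below. The table name `clinical_documents`, its columns, and both helper names are assumptions for illustration, not taken from the project. `<=>` is pgvector's cosine-distance operator, and the query is only built here, not executed.

```python
from typing import Sequence

EMBEDDING_DIM = 1024  # dimension quoted in the write-up


def to_pgvector_literal(vector: Sequence[float]) -> str:
    """Format a vector in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


def build_search_query(query_embedding: Sequence[float], limit: int = 5):
    """Return parameterized SQL and its parameters for a nearest-neighbour search.

    Table and column names are hypothetical.
    """
    if len(query_embedding) != EMBEDDING_DIM:
        raise ValueError(
            f"expected {EMBEDDING_DIM} dimensions, got {len(query_embedding)}"
        )
    sql = (
        "SELECT id, title, embedding <=> %s::vector AS distance "
        "FROM clinical_documents "
        "ORDER BY distance "
        "LIMIT %s"
    )
    return sql, (to_pgvector_literal(query_embedding), limit)
```

The returned pair can be handed to any PostgreSQL driver that uses `%s` placeholders; smaller distances mean closer matches.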

Frontend is Next.js with Framer Motion for a polished three-panel dashboard. Backend is FastAPI. Everything runs locally with Docker Compose.
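
For readers unfamiliar with SSE, the wire format behind the streamed responses is simple: each event is one or more `data:` lines followed by a blank line. Here is a minimal sketch of that framing. The helper names and the final `done` event are assumptions, and the code is independent of the project's actual FastAPI routes.

```python
from typing import Iterable, Iterator, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Frame one message as a Server-Sent Event.

    Multi-line payloads need one `data:` line per line of text.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one SSE frame per token, then a closing 'done' event (hypothetical)."""
    for token in tokens:
        yield format_sse(token)
    yield format_sse("", event="done")
```

In FastAPI, a generator like `stream_tokens` can be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.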

Challenges I ran into

Nova 2 Sonic was humbling. Bidirectional streaming audio means juggling WebSocket state, audio buffering, microphone permissions, PCM encoding at 16 kHz, and graceful fallback when connections drop. I spent more time on the voice pipeline than any other feature. The Strands BidiAgent helped a lot, but wiring it through a WebSocket relay with proper lifecycle management was the hardest integration of the project.
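
To make the PCM step concrete, here is a small sketch of turning floating-point microphone samples into 16-bit PCM and slicing them into frames for a WebSocket. It assumes mono, little-endian audio and a 20 ms frame size; those choices and the function names are illustrative, not the project's actual pipeline.

```python
import struct
from typing import List, Sequence

SAMPLE_RATE_HZ = 16_000  # rate mentioned above
BYTES_PER_SAMPLE = 2     # signed 16-bit, mono (assumed)


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM."""
    ints = []
    for sample in samples:
        clamped = max(-1.0, min(1.0, sample))  # clamp so packing cannot overflow
        ints.append(int(round(clamped * 32767)))
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_pcm(pcm: bytes, chunk_ms: int = 20) -> List[bytes]:
    """Split PCM bytes into fixed-duration frames (the last one may be shorter)."""
    chunk_bytes = SAMPLE_RATE_HZ * chunk_ms // 1000 * BYTES_PER_SAMPLE
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]
```

At 16 kHz a 20 ms frame is 320 samples, or 640 bytes, so one second of audio becomes 50 frames.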

Getting the agent to reliably chain multiple tools from a single sentence also took iteration — mostly prompt tuning to make Nova 2 Lite understand that "admit and prescribe" means two sequential tool calls, not one.
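
The decomposition I was aiming for can be sketched as two ordered calls, where the second depends on the first one's result. Everything below is a hypothetical stand-in: the tool names, their arguments, the in-memory `PATIENTS` store, and the `$prev.` placeholder are invented for illustration. In the real agent loop, the model sees the first tool's output and then emits the second call itself.

```python
from typing import Any, Callable, Dict, List

PATIENTS: Dict[int, Dict[str, Any]] = {}  # in-memory stand-in for the database


def admit_patient(name: str, age: int, sex: str, room: str, diagnosis: str) -> Dict[str, Any]:
    patient_id = len(PATIENTS) + 1
    PATIENTS[patient_id] = {"name": name, "age": age, "sex": sex, "room": room,
                            "diagnosis": diagnosis, "prescriptions": []}
    return {"patient_id": patient_id}


def prescribe_medication(patient_id: int, drug: str, dose: str, frequency: str) -> Dict[str, Any]:
    PATIENTS[patient_id]["prescriptions"].append(
        {"drug": drug, "dose": dose, "frequency": frequency})
    return {"patient_id": patient_id, "drug": drug}


TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "admit_patient": admit_patient,
    "prescribe_medication": prescribe_medication,
}


def run_tool_calls(calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Run calls in order; a '$prev.<key>' argument reads from the previous result."""
    results: List[Dict[str, Any]] = []
    for call in calls:
        args = {}
        for key, value in call["args"].items():
            if isinstance(value, str) and value.startswith("$prev."):
                value = results[-1][value[len("$prev."):]]
            args[key] = value
        results.append(TOOLS[call["tool"]](**args))
    return results


# "Admit John Doe ... and prescribe Amoxicillin ..." as two sequential calls.
EXAMPLE_CALLS = [
    {"tool": "admit_patient",
     "args": {"name": "John Doe", "age": 45, "sex": "male",
              "room": "305", "diagnosis": "pneumonia"}},
    {"tool": "prescribe_medication",
     "args": {"patient_id": "$prev.patient_id", "drug": "Amoxicillin",
              "dose": "500mg", "frequency": "every 8 hours"}},
]
```

The point of the sketch is the ordering: the prescription cannot be written until the admission has produced a patient ID, which is why a single merged call does not work.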

Accomplishments that I'm proud of

The allergy safety check. When a nurse asks to give Margaret Chen Penicillin, the agent catches her documented Penicillin allergy, raises an alert, and asks for confirmation before proceeding. It's a small feature but it shows the agent reasoning about safety, not just executing commands blindly.
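
A stripped-down version of that guard might look like the sketch below. The `Patient` class, the function names, and the status strings are assumptions for illustration; the real check runs inside the agent's tools and may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class Patient:
    name: str
    allergies: Set[str] = field(default_factory=set)


def check_allergy(patient: Patient, drug: str) -> Dict[str, Any]:
    """Flag the drug if it matches a documented allergy (exact name match only)."""
    documented = {allergy.strip().lower() for allergy in patient.allergies}
    if drug.strip().lower() in documented:
        return {"status": "needs_confirmation",
                "alert": f"{patient.name} has a documented {drug} allergy."}
    return {"status": "ok", "alert": None}


def administer(patient: Patient, drug: str, confirmed: bool = False) -> Dict[str, Any]:
    """Refuse to proceed on an allergy match unless the nurse confirmed explicitly."""
    check = check_allergy(patient, drug)
    if check["status"] == "needs_confirmation" and not confirmed:
        return check  # stop here and surface the alert
    return {"status": "administered", "drug": drug, "alert": check["alert"]}
```

Note that exact name matching is the weakest possible check: it would not catch class-level conflicts, such as amoxicillin for a patient with a penicillin allergy, so a production version would need a proper drug-class lookup.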

Also, semantic search working end-to-end — uploading a document, watching it flow through the ingest pipeline (upload → extract → embed → index), and then finding it by meaning instead of keywords. That felt like the future.
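
The pipeline stages can be sketched in a few lines. This version keeps everything in memory and takes the embedding function as a parameter, so it says nothing about how Nova embeddings are actually called. `DocumentIndex`, `extract_text`, and the plain-text-only extraction are assumptions made for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Embedder = Callable[[str], Sequence[float]]


def extract_text(raw: bytes) -> str:
    """Decode an upload and collapse whitespace (plain text only in this sketch)."""
    return " ".join(raw.decode("utf-8", errors="replace").split())


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class DocumentIndex:
    """In-memory stand-in for the pgvector table."""

    def __init__(self, embed: Embedder) -> None:
        self._embed = embed
        self._rows: List[Tuple[str, List[float]]] = []

    def ingest(self, title: str, raw: bytes) -> None:
        text = extract_text(raw)            # extract
        vector = list(self._embed(text))    # embed
        self._rows.append((title, vector))  # index

    def search(self, query: str, k: int = 3) -> List[str]:
        """Return the titles of the k documents most similar to the query."""
        q = list(self._embed(query))
        ranked = sorted(self._rows,
                        key=lambda row: cosine_similarity(q, row[1]),
                        reverse=True)
        return [title for title, _ in ranked[:k]]
```

Whether "heart attack" lands near "Acute Myocardial Infarction" depends entirely on the embedding model passed in; the index itself only ranks by cosine similarity.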

What I learned

Strands Agents SDK made the agentic side surprisingly smooth. Define tools as Python functions, hand them to the agent, and it figures out when to call them. Most of my debugging was prompt engineering, not plumbing.
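
To show why "tools as Python functions" works at all, here is a plain-Python stand-in for the idea: a decorator that records each function's name, docstring, and parameter types, which is the kind of metadata a model needs to decide when and how to call it. The SDK ships its own decorator for this; the `tool` decorator, `TOOL_REGISTRY`, and `search_records` below are hypothetical and are not the SDK's implementation.

```python
import inspect
from typing import Any, Callable, Dict

TOOL_REGISTRY: Dict[str, Dict[str, Any]] = {}


def tool(func: Callable) -> Callable:
    """Register a function along with the metadata a model would be shown."""
    signature = inspect.signature(func)
    TOOL_REGISTRY[func.__name__] = {
        "description": inspect.getdoc(func) or "",
        "parameters": {
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in signature.parameters.items()
        },
        "callable": func,
    }
    return func  # the function itself stays directly callable


@tool
def search_records(query: str, limit: int = 5) -> list:
    """Search clinical documents by meaning."""
    return []  # placeholder body for the sketch
```

Because the docstring doubles as the tool description, a vague docstring is effectively a vague prompt, which matches my experience that most debugging was prompt work.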

I also learned that voice AI is a different beast from text AI. Latency, audio quality, state management — it's a whole layer of complexity on top of the reasoning pipeline. But getting it working was the most rewarding part.

What's next for NovaHealth

  • EHR integration via FHIR APIs (Epic, Cerner) to work with real hospital systems
  • Multilingual voice — Nova Sonic supports it, hospitals need it
  • HIPAA-compliant audit logging for every agent action
  • Pilot with a nursing informatics program to test in a real clinical setting
