Inspiration

One in five patients makes a medication error in the week after hospital discharge. They come home with new pills, old pills, and a discharge summary that confuses the people who wrote it. Their doctor is too busy to call. The pharmacist doesn't have their chart. So they guess. And guessing about Metformin or Warfarin sends them right back to the hospital — the most expensive AI use case nobody has solved.

Could an LLM fix it? Yes, but in healthcare one hallucinated drug interaction is worse than no AI at all. We didn't need a smarter chatbot. We needed a system that physically can't fabricate a drug fact.

That became the design constraint: build an AI agent that cannot hallucinate. Not "tries not to" — cannot.

What it does

Medrec Superpower is a multi-agent healthcare AI on the Prompt Opinion platform. It reconciles a patient's pre-admission and discharge medication lists, runs deterministic drug-safety checks, and produces a clinician-defensible report with mandatory citations.

A patient asks: "Should I still be taking my Metformin?"

The system replies with a 4-card report:

  • Medication changes — STARTED / STOPPED / HOLD / DOSE CHANGE, each cited to MedlinePlus
  • Safety verdict: clear, caution, or hold (which mechanically blocks the daily plan)
  • Daily plan — only when safe to generate
  • Questions to ask your doctor — grounded in the actual changes

The Coordinator never sees the patient's identity. It never composes a drug fact. It never writes a URL. Every output traces back to a deterministic source: the workspace FHIR server, RxNav, or MedlinePlus.

How we built it

Stack: Python 3.10+, FastMCP, FastAPI, Pydantic v2, httpx, structlog, tenacity, pyjwt. Strict mypy. Ruff with a maximalist rule set. 190 pytest cases. 85% coverage gate enforced in CI.

Three agents, bounded authority:

  • Reconciliation Coordinator (Gemini / Sonnet) — orchestrates the workflow, calls MCP tools, renders the report.
  • Drug Safety Specialist (P2, designed) — owns the binding SafetyVerdict.
  • Patient Educator (P1, configured) — 6th-grade-reading-level translation with mandatory citations.

Seven MCP tools: get_patient_context, get_pre_admit_meds, get_discharge_meds, parse_discharge_summary, lookup_rxnorm, check_interaction, get_drug_education_handout. Each returns a structured ToolResult envelope. Errors are typed, never opaque exceptions across the MCP boundary.
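
A minimal sketch of what such an envelope could look like, using stdlib dataclasses as a stand-in for the project's Pydantic models. Only ToolResult and its check_succeeded flag come from this write-up; ToolError, ErrorKind, UpstreamTimeout, and run_tool are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional


class ErrorKind(str, Enum):
    """Hypothetical error taxonomy; the real project may use different kinds."""

    UPSTREAM_TIMEOUT = "upstream_timeout"
    UPSTREAM_ERROR = "upstream_error"
    NOT_FOUND = "not_found"


class UpstreamTimeout(Exception):
    """Hypothetical exception raised by an HTTP client wrapper."""


@dataclass(frozen=True)
class ToolError:
    kind: ErrorKind
    message: str


@dataclass(frozen=True)
class ToolResult:
    check_succeeded: bool
    data: Optional[dict[str, Any]] = None
    error: Optional[ToolError] = None


def run_tool(fn: Callable[[], dict[str, Any]]) -> ToolResult:
    """Run a tool body and convert every failure into a typed envelope."""
    try:
        return ToolResult(check_succeeded=True, data=fn())
    except UpstreamTimeout as exc:
        return ToolResult(False, error=ToolError(ErrorKind.UPSTREAM_TIMEOUT, str(exc)))
    except LookupError as exc:
        return ToolResult(False, error=ToolError(ErrorKind.NOT_FOUND, str(exc)))
    except Exception as exc:  # nothing escapes as a raw exception
        return ToolResult(False, error=ToolError(ErrorKind.UPSTREAM_ERROR, str(exc)))
```

Because the caller always receives a ToolResult, the orchestrating agent can branch on check_succeeded and error.kind instead of parsing a stack trace.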

Open standards top to bottom:

  • MCP for the tool protocol
  • A2A for agent-to-agent handoffs
  • FHIR R4 for clinical data
  • SHARP-on-MCP for identity injection via HTTP headers (X-FHIR-Server-URL, X-FHIR-Access-Token, X-Patient-ID) — gated by the ai.promptopinion/fhir-context capability extension
  • SMART scopes for fine-grained authorization (patient/MedicationRequest.rs, etc.)
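
As an illustration of the header-based identity injection, the sketch below pulls the three SHARP headers out of a request and refuses to proceed if any is missing. The header names are the ones listed above; SharpContext, MissingSharpContext, and sharp_context_from_headers are hypothetical names, and the project's actual middleware may be structured differently.

```python
from dataclasses import dataclass
from typing import Mapping

REQUIRED_HEADERS = ("X-FHIR-Server-URL", "X-FHIR-Access-Token", "X-Patient-ID")


class MissingSharpContext(Exception):
    """Hypothetical error for requests that lack the SHARP headers."""


@dataclass(frozen=True)
class SharpContext:
    fhir_server_url: str
    access_token: str
    patient_id: str


def sharp_context_from_headers(headers: Mapping[str, str]) -> SharpContext:
    """Build the per-request context from HTTP headers only, never from tool arguments."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    missing = [h for h in REQUIRED_HEADERS if not lowered.get(h.lower())]
    if missing:
        raise MissingSharpContext("missing headers: " + ", ".join(missing))
    return SharpContext(
        fhir_server_url=lowered["x-fhir-server-url"].rstrip("/"),
        access_token=lowered["x-fhir-access-token"],
        patient_id=lowered["x-patient-id"],
    )
```

Keeping the patient ID out of the tool arguments means the model cannot choose which patient a call applies to; only the platform can.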

Five safety rules (R1–R5), mechanically enforced — not prompt-engineered:

  • R1 — patient identity bound to SHARP context (@requires_sharp decorator + ASGI middleware)
  • R2 — no PHI in plaintext logs (structlog redaction processor)
  • R3 — drug data only from authoritative APIs (ToolResult.check_succeeded=false on every upstream failure mode)
  • R4 — every drug claim cites authority (deterministic RxCUI → MedlinePlus URL resolver)
  • R5: safety.status="hold" blocks the daily plan (Pydantic model_validator raises ValidationError)
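
To make the R5 idea concrete, here is a rough sketch of a report object that cannot be constructed with a daily plan while the verdict is "hold". It uses a stdlib dataclass with __post_init__ as a stand-in for the Pydantic model_validator named above; ReconciliationReport, HoldViolation, and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

VALID_STATUSES = ("clear", "caution", "hold")


class HoldViolation(ValueError):
    """Hypothetical error; the project raises Pydantic's ValidationError instead."""


@dataclass(frozen=True)
class SafetyVerdict:
    status: str
    reasons: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown safety status: {self.status!r}")


@dataclass(frozen=True)
class ReconciliationReport:
    safety: SafetyVerdict
    daily_plan: Optional[tuple[str, ...]] = None

    def __post_init__(self) -> None:
        # The rule lives in the constructor, so no caller (human or LLM)
        # can obtain an instance that pairs "hold" with a daily plan.
        if self.safety.status == "hold" and self.daily_plan is not None:
            raise HoldViolation("daily_plan must be omitted while safety.status is 'hold'")
```

The check runs on every construction path, which is what makes the rule auditable: there is one place to read, and no prompt to trust.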

Challenges we ran into

1. The SHARP protocol we assumed wasn't the SHARP protocol that existed. We initially built around a signed-JWT-in-a-tool-argument model. The actual Prompt Opinion protocol is HTTP headers gated by an MCP capability extension declaration. A meaningful mid-flight refactor — but the architecture survived because of the FhirClient Protocol and the structured ToolResult envelope.

2. FastMCP's streamable_http_path doubled when mounted. Our server was serving /mcp/mcp instead of /mcp. Fix: pass streamable_http_path="/" to the constructor.

3. Mounted Starlette sub-apps don't propagate their lifespan to the outer FastAPI app. Tool calls crashed with "Task group is not initialized." Fix: capture mcp.streamable_http_app() once, delegate its lifespan from the outer FastAPI app.

4. Synthea data doesn't match its own spec. MedicationStatement is documented but rarely emitted. MedicationRequest.intent="discharge" is never used (the intent is always "order"). Our PoFhirClient now falls through three progressively broader queries until one returns results.
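
The fall-through can be sketched roughly as below. The three queries shown are assumptions made for illustration only; this write-up does not list the actual queries PoFhirClient uses. SearchFn stands in for whatever performs the FHIR search.

```python
from typing import Callable, Optional
from urllib.parse import urlencode

Query = tuple[str, dict[str, str]]
SearchFn = Callable[[str, dict[str, str]], list[dict]]


def discharge_med_queries(patient_id: str) -> list[Query]:
    """Hypothetical queries, narrowest first; the real ones may differ."""
    return [
        ("MedicationStatement", {"patient": patient_id}),
        ("MedicationRequest", {"patient": patient_id, "intent": "order", "status": "active"}),
        ("MedicationRequest", {"patient": patient_id}),
    ]


def first_non_empty(search: SearchFn, queries: list[Query]) -> tuple[list[dict], Optional[str]]:
    """Return the first non-empty result set and the query that produced it."""
    for resource_type, params in queries:
        resources = search(resource_type, params)
        if resources:
            return resources, f"{resource_type}?{urlencode(params)}"
    return [], None
```

Returning the matching query alongside the resources lets the trace show which fallback level actually supplied the medication list.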

5. Prompt Opinion's FHIR server uses POST, not PUT. Our first bundle upload failed because we used PUT Patient/P123. Switched to POST with urn:uuid: cross-references so the server assigns IDs at commit time.
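
A minimal sketch of the POST-with-urn:uuid pattern, assuming plain dicts for the resources. build_transaction_bundle is a hypothetical helper and is not taken from scripts/export_demo_fhir_bundle.py.

```python
import uuid
from typing import Any


def build_transaction_bundle(
    patient: dict[str, Any], med_requests: list[dict[str, Any]]
) -> dict[str, Any]:
    """Build a FHIR R4 transaction Bundle in which the server assigns all IDs."""
    patient_ref = f"urn:uuid:{uuid.uuid4()}"
    entries = [
        {
            "fullUrl": patient_ref,
            # Drop any client-side id: with POST the server chooses it.
            "resource": {
                **{k: v for k, v in patient.items() if k != "id"},
                "resourceType": "Patient",
            },
            "request": {"method": "POST", "url": "Patient"},
        }
    ]
    for med in med_requests:
        entries.append(
            {
                "fullUrl": f"urn:uuid:{uuid.uuid4()}",
                "resource": {
                    **{k: v for k, v in med.items() if k != "id"},
                    "resourceType": "MedicationRequest",
                    # Points at the Patient entry's temporary urn:uuid;
                    # the server rewrites it to the real ID at commit time.
                    "subject": {"reference": patient_ref},
                },
                "request": {"method": "POST", "url": "MedicationRequest"},
            }
        )
    return {"resourceType": "Bundle", "type": "transaction", "entry": entries}
```

The resulting dict can be serialised with json.dumps and sent as a single POST to the server's base URL.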

Accomplishments that we're proud of

  • All 5 safety rules are mechanically enforced. Each one is code that physically cannot return a value violating the rule — not a prompt asking the LLM nicely.
  • 190 tests passing, 85.26% coverage. The tests exercise the HTTP layer against respx-mocked FHIR servers, not contrived stubs.
  • Zero LLM-generated drug facts in any code path. Every drug name, dose, RxCUI, interaction, and URL is sourced from RxNav, MedlinePlus, or the workspace FHIR server.
  • All 7 MCP tools work end-to-end through Prompt Opinion with the SMART scopes granted by the user via the FHIR-context capability extension.
  • A reproducible demo bundle. scripts/export_demo_fhir_bundle.py generates a FHIR R4 transaction Bundle judges can re-upload into their own workspace.
  • Twelve architecture diagrams in Mermaid, every one validated against the live renderer.
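
As an illustration of how a citation URL can be derived without an LLM, the sketch below maps an RxCUI to a MedlinePlus Connect link by pure string construction. It assumes the resolver targets the MedlinePlus Connect web application; this write-up does not say which MedlinePlus endpoint the project uses, and medlineplus_url_for_rxcui is a hypothetical name.

```python
from urllib.parse import urlencode

# Assumption: MedlinePlus Connect is the target. The OID identifies RxNorm
# as the code system of the lookup.
MEDLINEPLUS_CONNECT = "https://connect.medlineplus.gov/application"
RXNORM_OID = "2.16.840.1.113883.6.88"


def medlineplus_url_for_rxcui(rxcui: str, language: str = "en") -> str:
    """Build a citation URL from an RxCUI; same input always gives the same URL."""
    if not (rxcui.isascii() and rxcui.isdigit()):
        raise ValueError(f"not a valid RxCUI: {rxcui!r}")
    query = urlencode(
        {
            "mainSearchCriteria.v.cs": RXNORM_OID,
            "mainSearchCriteria.v.c": rxcui,
            "informationRecipient.language": language,
        }
    )
    return f"{MEDLINEPLUS_CONNECT}?{query}"
```

Since the function takes only an identifier returned by RxNav and never free text, the model has no opportunity to write or alter a URL.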

What we learned

  • Open standards work. Once we corrected our reading of SHARP-on-MCP, the rest was protocol mechanics. No glue code. Any compliant agent can call our tools tomorrow.
  • Mechanically-enforced safety beats prompt-engineered safety. A Pydantic validator that refuses to construct an invalid object is auditable. A prompt that asks an LLM not to do something is not.
  • LLMs are great orchestrators, terrible knowledge stores. Treat them as planners over deterministic tools and you get reliable behaviour. Treat them as drug encyclopaedias and you get readmissions.
  • The trace is the product. Showing users (and judges) which tool fired with which arguments is the difference between "trust me" and "you can verify."

Built With

  • a2a-protocol
  • fastapi
  • fhir-r4
  • google-gemini
  • httpx
  • mcp
  • medlineplus
  • mypy
  • ngrok
  • prompt-opinion
  • pydantic
  • pyjwt
  • pytest
  • python
  • ruff
  • rxnav
  • sharp-on-mcp
  • smart-on-fhir
  • structlog
  • tenacity
  • uv