Adverse drug events kill around 125,000 Americans a year, and more than half are preventable.

$$ \text{preventable ADE deaths/year} \geq 0.5 \times 125{,}000 = 62{,}500 $$

that's a 747 going down every 4 days. and the failure pattern is the same one again and again. a patient is on a common chronic med. their PCP adds a routine antibiotic for a sinus infection. nobody catches that the antibiotic is a strong CYP3A4 inhibitor, the chronic med levels triple, and the patient ends up in the ER with rhabdomyolysis or a torsades arrest. the info needed to catch it isn't rare. it's just split across 5 different references, and nobody pulls all 5. interaction database in one place, allergy list in another, eGFR in a third, QT list in a fourth, FDA label in a fifth.

AI agents in clinical workflows are the right place to enforce all 5 checks every time. but only if every new agent doesn't have to rebuild clinical safety from scratch.

What it does

one MCP server with 8 tools any healthcare agent can pick up. drug interactions, allergy cross reactivity, renal dose adjustment, QT prolongation risk, FDA labels, pregnancy and lactation, RxNorm normalization, and live patient med pull from FHIR. the pre-prescribe agent chains them into one brief with a verdict: safe, monitor, consider alternative, or avoid.

How we built it

python 3.13, fastmcp 3.x, httpx for FHIR, RxNav, and openFDA. dockerized, deployed on render. the novel part is SHARP on MCP. the server advertises an ai.promptopinion/fhir-context capability extension on every initialize. the host platform sees it and forwards 3 headers on every tool call: x-fhir-server-url, x-fhir-access-token, x-patient-id. tools read them and call the FHIR server as the user. the server stays stateless. it can't impersonate anyone. portable to any host that speaks SHARP on MCP.
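a minimal sketch of the header handoff described above, using only the standard library. the three header names come from the writeup. everything else is an assumption for illustration: the `FhirContext` type, the function names, the bearer-token auth scheme, and the choice of a `MedicationRequest` search for the med pull. it is not the project's actual code.

```python
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import urlencode

# The three headers the host forwards on every tool call (from the writeup).
REQUIRED_HEADERS = ("x-fhir-server-url", "x-fhir-access-token", "x-patient-id")


@dataclass(frozen=True)
class FhirContext:
    """Per-call FHIR context (hypothetical type). Nothing is stored server-side."""
    server_url: str
    access_token: str
    patient_id: str


def fhir_context_from_headers(headers: Mapping[str, str]) -> FhirContext:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    missing = [name for name in REQUIRED_HEADERS if not lowered.get(name)]
    if missing:
        raise ValueError(f"missing SHARP headers: {', '.join(missing)}")
    return FhirContext(
        server_url=lowered["x-fhir-server-url"].rstrip("/"),
        access_token=lowered["x-fhir-access-token"],
        patient_id=lowered["x-patient-id"],
    )


def active_meds_request(ctx: FhirContext) -> tuple[str, dict[str, str]]:
    """Build (url, headers) for an active-medication search.

    Assumption: meds are read via a MedicationRequest search and the token
    is sent as a bearer token. The request carries the user's own token,
    so the tool can only see what that user can see.
    """
    query = urlencode({"patient": ctx.patient_id, "status": "active"})
    url = f"{ctx.server_url}/MedicationRequest?{query}"
    request_headers = {
        "Authorization": f"Bearer {ctx.access_token}",
        "Accept": "application/fhir+json",
    }
    return url, request_headers
```

the returned pair can be handed straight to an HTTP client such as httpx. because the context is rebuilt from headers on every call, there is no session state to leak between patients.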

severity is a strict order so the agent surfaces the worst finding first:

$$ \text{contraindicated} \succ \text{major} \succ \text{moderate} \succ \text{low} $$
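one way to encode that order, as a rough sketch. the four severity levels and the four verdicts come from the writeup, but the one-to-one mapping between them is an assumption made up for this example, and so are the names `Severity`, `rank_findings`, and `verdict`.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Strict order: a higher value is a worse finding."""
    LOW = 1
    MODERATE = 2
    MAJOR = 3
    CONTRAINDICATED = 4


# Hypothetical mapping from the worst severity to the brief's verdict.
VERDICT_BY_SEVERITY = {
    Severity.CONTRAINDICATED: "avoid",
    Severity.MAJOR: "consider alternative",
    Severity.MODERATE: "monitor",
    Severity.LOW: "safe",
}


def rank_findings(findings: list[tuple[Severity, str]]) -> list[tuple[Severity, str]]:
    """Return findings sorted worst first, so the agent surfaces the worst one."""
    return sorted(findings, key=lambda finding: finding[0], reverse=True)


def verdict(findings: list[tuple[Severity, str]]) -> str:
    """Collapse all findings into a single verdict driven by the worst one."""
    if not findings:
        return "safe"
    worst = max(severity for severity, _ in findings)
    return VERDICT_BY_SEVERITY[worst]


if __name__ == "__main__":
    demo = [
        (Severity.MODERATE, "renal dose adjustment"),
        (Severity.CONTRAINDICATED, "CYP3A4 inhibitor + simvastatin"),
        (Severity.MAJOR, "QT risk on top of amiodarone"),
    ]
    print(rank_findings(demo)[0][1])
    print(verdict(demo))
```

using an `IntEnum` keeps the comparison logic in one place: every tool can emit a severity and the agent never needs string comparisons to find the worst finding.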

the QT tool corrects QT for heart rate using Bazett's formula, with the RR interval in seconds:

$$ QTc = \dfrac{QT}{\sqrt{RR}} $$

and uses sex adjusted cutoffs ($QTc > 450$ ms for men, $QTc > 460$ ms for women) before warning that a Known TdP drug is being added on top.
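the correction and cutoffs above can be sketched in a few lines. the formula and the 450 / 460 ms thresholds are from the writeup. the function names, the `"male"` / `"female"` keys, and deriving RR from heart rate as $RR = 60 / \text{HR}$ are assumptions for this example.

```python
import math

# Sex-adjusted QTc cutoffs in milliseconds (values from the writeup).
QTC_CUTOFF_MS = {"male": 450.0, "female": 460.0}


def bazett_qtc(qt_ms: float, heart_rate_bpm: float) -> float:
    """Bazett correction: QTc = QT / sqrt(RR), with RR in seconds."""
    if qt_ms <= 0 or heart_rate_bpm <= 0:
        raise ValueError("QT and heart rate must be positive")
    rr_seconds = 60.0 / heart_rate_bpm  # one beat-to-beat interval
    return qt_ms / math.sqrt(rr_seconds)


def qtc_prolonged(qt_ms: float, heart_rate_bpm: float, sex: str) -> bool:
    """True when the corrected QT exceeds the cutoff for the given sex."""
    return bazett_qtc(qt_ms, heart_rate_bpm) > QTC_CUTOFF_MS[sex]


if __name__ == "__main__":
    # At 60 bpm RR is exactly 1 s, so QTc equals the measured QT.
    print(bazett_qtc(400.0, 60.0))
    print(qtc_prolonged(420.0, 75.0, "female"))
```

at 60 bpm the correction is a no-op, which makes a handy sanity check. at faster rates RR shrinks and QTc rises above the measured QT.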

Challenges we ran into

  • MCP streamable HTTP is opinionated. the wrong Accept header gets a JSON-RPC error. follow-up calls without the Mcp-Session-Id from the initialize response get rejected. burned an hour on a probe script that wasn't carrying the session through.
  • fastmcp 2.x vs 3.x. the capability monkey patch misbehaves across the major version. not a runtime error. just a subtly malformed initialize payload.
  • PUT as create is not portable. prompt opinion's FHIR server rejected my idempotent seed script with "updateCreate operation is not supported". switched to POST + urn:uuid cross refs in the bundle. the server assigns ids and resolves refs inside the transaction.
  • urn:uuid is validated strictly. my friendly string urn:uuid:Patient-medsafety-demo got rejected. RFC 4122 wants a real UUID. switched to uuid.uuid5() with a fixed namespace so reruns are stable.
  • honest data sourcing. curated DDI / renal / QT tables are labeled as a demo set, sourced from FDA labels, KDIGO, and the CredibleMeds public list. tool signatures don't change when a hospital plugs in First Databank or Lexicomp. judges should trust the architecture, not the dataset.
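the seed-script fix from the PUT and urn:uuid bullets above might look roughly like this. `uuid.uuid5()` with a fixed namespace is what the writeup describes. the namespace seed string, the logical names, and the resource contents are placeholders invented for the example.

```python
import uuid

# Fixed namespace so every rerun derives the same UUIDs.
# Assumption: the seed string is arbitrary; any stable value works.
SEED_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/medsafety-demo")


def stable_urn(logical_name: str) -> str:
    """Deterministic urn:uuid reference for a friendly logical name."""
    return f"urn:uuid:{uuid.uuid5(SEED_NAMESPACE, logical_name)}"


def seed_bundle() -> dict:
    """Transaction bundle that creates resources with POST (no PUT-as-create)."""
    patient_urn = stable_urn("Patient/medsafety-demo")
    med_urn = stable_urn("MedicationRequest/simvastatin-demo")
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [
            {
                "fullUrl": patient_urn,
                "resource": {"resourceType": "Patient", "name": [{"family": "Demo"}]},
                "request": {"method": "POST", "url": "Patient"},
            },
            {
                "fullUrl": med_urn,
                "resource": {
                    "resourceType": "MedicationRequest",
                    "status": "active",
                    "intent": "order",
                    "medicationCodeableConcept": {"text": "simvastatin"},
                    # Points at the Patient entry's fullUrl; the server swaps
                    # in the id it assigns when it processes the transaction.
                    "subject": {"reference": patient_urn},
                },
                "request": {"method": "POST", "url": "MedicationRequest"},
            },
        ],
    }
```

the friendly name still exists, it just goes into `uuid5()` instead of into the URN itself, so the reference is both readable in the script and a real UUID on the wire.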

Accomplishments that we're proud of

  • end to end MCP + A2A + FHIR + SHARP loop works on the live deployed server. confirmed with a probe that pulls 3 active meds for the demo patient via the SHARP headers.
  • the clarithromycin demo flips to avoid on a real contraindication (CYP3A4 + simvastatin) plus a real QT risk (already on amiodarone). the moment that wins the demo is when the verdict is a clean "no", not a wishy washy "be careful".
  • shipped 8 polished tools and resisted the temptation to add 8 more.

What we learned

build the hammer once. every agent picks it up. caught myself thinking about MedSafety as an agent. it's not. it's a hammer. tools belong in MCPs. judgment belongs in agents. once that clicked, every API decision got easier.

also: the MCP transport, the FHIR upload shape, the urn:uuid format, and the fastmcp version are all things that look like details until they silently break the demo. the difference between this submission and a half working one is just being stubborn about each of those in turn.
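as an illustration of the MCP transport detail, here is a small sketch of the header bookkeeping a probe script needs. the `Accept` values and the `Mcp-Session-Id` header follow the MCP streamable HTTP transport. the function names, the client name, and the `protocolVersion` value are assumptions picked for this example, and no network call is made.

```python
import json
from typing import Mapping

# Streamable HTTP clients must accept both JSON and SSE responses.
MCP_ACCEPT = "application/json, text/event-stream"


def base_headers() -> dict[str, str]:
    """Headers every POST to the MCP endpoint should carry."""
    return {"Content-Type": "application/json", "Accept": MCP_ACCEPT}


def initialize_body(request_id: int = 1) -> str:
    """JSON-RPC initialize request (protocolVersion is an assumed value)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "probe", "version": "0.1"},
        },
    })


def followup_headers(init_response_headers: Mapping[str, str]) -> dict[str, str]:
    """Carry the session id from the initialize response into later calls."""
    lowered = {key.lower(): value for key, value in init_response_headers.items()}
    session_id = lowered.get("mcp-session-id")
    if session_id is None:
        # Some servers run without sessions; this sketch assumes one is issued.
        raise ValueError("initialize response had no Mcp-Session-Id header")
    return {**base_headers(), "Mcp-Session-Id": session_id}
```

a probe would POST `initialize_body()` with `base_headers()`, then reuse `followup_headers(response.headers)` for every later request such as `tools/list` or `tools/call`.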

What's next for MedSafety

swap the curated tables for a licensed reference behind an env var. add propose_alternative. given a flagged drug and a reason, return a same indication candidate and rerun the panel automatically. clarithromycin flagged means suggest azithromycin without the human asking. then push the same SHARP on MCP pattern to a second domain: a LabSafety MCP for critical value follow up. prove one prompt opinion agent can pick up multiple MCPs and still produce one coherent brief.
