Inspiration

Adverse event reporting is one of the most under-resourced workflows in clinical pharmacy. A pharmacist who suspects a drug reaction must manually cross-reference the patient chart, score a causality questionnaire, apply FDA seriousness criteria, and transcribe everything into a multi-page MedWatch form. The process can take hours and is error-prone, and it is burdensome enough that the vast majority of reportable adverse drug events (ADEs) never get filed. We wanted to see how much of that burden a conversational AI agent could take on without sacrificing clinical rigor.

What it does

The AE Investigator is a conversational pharmacovigilance assistant that takes a clinical pharmacist from patient FHIR data to a submission-ready FDA MedWatch Form 3500 — entirely in chat.

The agent operates in three modes:

  • Investigation — pulls patient demographics, medication history, past medical history, and family history directly from the FHIR record
  • Reasoning — scores Naranjo causality deterministically from structured data, raises red flags, compares the suspected reaction against known adverse events in the FDA label, and surfaces what still needs clinician input
  • Drafting — generates a complete, field-accurate MedWatch Form 3500 as a live HTML render, incorporating all investigation findings and any clinician overrides

Naranjo questions that cannot be answered from the data are explicitly marked unknown rather than guessed. When a clinician provides a missing answer — such as confirming the reaction recurred on re-challenge — the agent redrafts the report instantly with the updated score.
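
The scoring behavior described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the point values are those of the published Naranjo scale, while the function names, the True/False/None answer encoding, and the return shape are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

# (yes, no) point values for the ten Naranjo questions, per the published
# scale. An unknown answer always contributes 0 points.
NARANJO_WEIGHTS: Dict[int, Tuple[int, int]] = {
    1: (1, 0),    # previous conclusive reports of this reaction
    2: (2, -1),   # event appeared after the suspected drug was given
    3: (1, 0),    # improved on discontinuation or specific antagonist
    4: (2, -1),   # reappeared on readministration (rechallenge)
    5: (-1, 2),   # alternative causes could explain the reaction
    6: (-1, 1),   # reappeared when a placebo was given
    7: (1, 0),    # drug detected at concentrations known to be toxic
    8: (1, 0),    # severity tracked dose increases or decreases
    9: (1, 0),    # similar reaction to same or similar drug before
    10: (1, 0),   # confirmed by objective evidence
}


def categorize(score: int) -> str:
    """Map a total Naranjo score to its standard causality category."""
    if score >= 9:
        return "definite"
    if score >= 5:
        return "probable"
    if score >= 1:
        return "possible"
    return "doubtful"


def score_naranjo(
    from_data: Dict[int, Optional[bool]],
    overrides: Optional[Dict[int, Optional[bool]]] = None,
) -> Dict[str, object]:
    """Score answers derived from structured data plus clinician overrides.

    Answers are True (yes), False (no), or None/missing (unknown).
    Hypothetical helper: names and return shape are illustrative only.
    """
    merged = dict(from_data)
    for question, answer in (overrides or {}).items():
        if answer is not None:  # an override never erases a known answer
            merged[question] = answer

    score = 0
    unknown: List[int] = []
    for question, (yes_pts, no_pts) in NARANJO_WEIGHTS.items():
        answer = merged.get(question)
        if answer is None:
            unknown.append(question)  # reported, never guessed
        elif answer:
            score += yes_pts
        else:
            score += no_pts
    return {"score": score, "category": categorize(score), "unknown": unknown}
```

Calling `score_naranjo(derived)` and then `score_naranjo(derived, {4: True})` mirrors the redraft described above: the same function runs again with the clinician's rechallenge answer merged in, and question 4 drops out of the unknown list.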

How we built it

  • MCP server (Python + FastMCP, hosted on Railway) — 14 tools organized across investigate, reason, and draft namespaces. Patient data is fetched live from the Prompt Opinion FHIR proxy. Naranjo scoring and FDA seriousness determination are deterministic Python algorithms — no LLM calls.
  • BYO agent (Prompt Opinion) — system-prompted AE Investigator with the MCP server attached. Handles the conversational layer and tool orchestration.
  • Demo FHIR bundles — three hand-authored synthetic patient cases (ibuprofen-GI bleed, Daytrana-leukoderma, azathioprine-hepatotoxicity) uploaded to the Prompt Opinion FHIR server, each with realistic lab progressions, medication timelines, and family history.
  • openFDA + NLM RxNorm — integrated for known AE lookups and drug normalization.
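
To illustrate what a deterministic seriousness check with no LLM call might look like, here is a small sketch. The outcome categories are the serious-outcome checkboxes on MedWatch Form 3500; the flag names, function name, and return shape are hypothetical and not taken from the project's code.

```python
from typing import Dict, List

# Serious-outcome categories as listed on FDA MedWatch Form 3500.
# The dictionary keys are hypothetical flag names chosen for this sketch.
SERIOUSNESS_CRITERIA: Dict[str, str] = {
    "death": "Death",
    "life_threatening": "Life-threatening",
    "hospitalization": "Hospitalization (initial or prolonged)",
    "disability": "Disability or permanent damage",
    "congenital_anomaly": "Congenital anomaly / birth defect",
    "required_intervention": (
        "Required intervention to prevent permanent impairment or damage"
    ),
    "other_serious": "Other serious (important medical event)",
}


def assess_seriousness(flags: Dict[str, bool]) -> Dict[str, object]:
    """Return whether any seriousness criterion is met, and which ones.

    Unrecognized flag names raise an error instead of being ignored, so a
    typo cannot silently downgrade a serious event.
    """
    unrecognized = set(flags) - set(SERIOUSNESS_CRITERIA)
    if unrecognized:
        raise ValueError(f"unrecognized seriousness flags: {sorted(unrecognized)}")

    met: List[str] = [
        label for key, label in SERIOUSNESS_CRITERIA.items() if flags.get(key)
    ]
    return {"serious": bool(met), "criteria_met": met}
```

Because the result is a pure function of the flags, the same inputs always yield the same determination, and the list of criteria met can be copied straight into the report.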

Challenges we ran into

Getting Naranjo scoring right without any hallucination required careful separation of concerns: each question is either answerable from structured FHIR data or explicitly deferred to clinician override — there is no middle ground. Designing that boundary, and enforcing it consistently across tool calls, was the hardest design problem.

The Prompt Opinion platform's FHIR proxy also has non-obvious constraints: bundles require RFC 4122 UUIDs in fullUrl fields, and token scopes in consult-flow differ from workspace scopes in ways that affect which FHIR resources are accessible.
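
As an illustration of the fullUrl constraint, the sketch below builds a bundle whose entries carry freshly generated UUIDs and checks them before upload. It assumes the common FHIR `urn:uuid:` form and a transaction-type bundle; neither detail is confirmed by the platform notes above, and the helper names are hypothetical.

```python
import uuid
from typing import Any, Dict, List

URN_PREFIX = "urn:uuid:"  # assumed fullUrl form for entries without a server id


def new_full_url() -> str:
    """Generate a fullUrl backed by a random (version 4) RFC 4122 UUID."""
    return URN_PREFIX + str(uuid.uuid4())


def is_rfc4122_full_url(full_url: str) -> bool:
    """Check that a fullUrl is 'urn:uuid:' plus a canonical RFC 4122 UUID."""
    if not full_url.startswith(URN_PREFIX):
        return False
    raw = full_url[len(URN_PREFIX):]
    try:
        parsed = uuid.UUID(raw)
    except ValueError:
        return False
    # uuid.UUID accepts several spellings; insist on the canonical one.
    return str(parsed) == raw.lower() and parsed.variant == uuid.RFC_4122


def build_bundle(resources: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap resources in a transaction bundle (bundle type is an assumption)."""
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [
            {
                "fullUrl": new_full_url(),
                "resource": resource,
                "request": {"method": "POST", "url": resource["resourceType"]},
            }
            for resource in resources
        ],
    }
```

Running `is_rfc4122_full_url` over every entry before upload catches hand-authored placeholders such as `urn:uuid:patient-1` locally, before the proxy sees them.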

Accomplishments that we're proud of

The Naranjo scoring is fully deterministic: the same patient data and the same clinician inputs always produce the same score, so any result can be reproduced and audited. The agent also surfaces why each question was answered the way it was, giving the reasoning alongside the score, which is what a pharmacist reviewing the output actually needs.

The MedWatch Form 3500 output maps to the canonical September 2025 FDA form fields, rendered as live HTML from structured data — not a template fill.

What we learned

LLMs are excellent at orchestration and natural language interaction, but clinical algorithms should not be LLM calls. The combination — LLM for conversation, deterministic Python for scoring — is the right architecture for a tool that a clinician needs to trust.

Session state across multi-turn tool calls is also more nuanced than it looks on paper. Clinician overrides need to persist across the investigation; FHIR context needs to be scoped to a single patient; and the causality assessment needs to see both the structured data and the clinician's answers in the same call.
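
One way to picture those three requirements is a small in-process session object. This is a sketch under assumptions, not the project's implementation: the class, the registry, and the integer-keyed answer maps are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InvestigationSession:
    """Hypothetical per-conversation state shared across tool calls."""

    patient_id: Optional[str] = None
    overrides: Dict[int, bool] = field(default_factory=dict)

    def bind_patient(self, patient_id: str) -> None:
        # FHIR context is scoped to a single patient for the whole session.
        if self.patient_id not in (None, patient_id):
            raise ValueError("session is already scoped to another patient")
        self.patient_id = patient_id

    def record_override(self, question: int, answer: bool) -> None:
        # Overrides persist for the rest of the investigation.
        if self.patient_id is None:
            raise RuntimeError("bind a patient before recording overrides")
        self.overrides[question] = answer

    def causality_inputs(
        self, structured: Dict[int, Optional[bool]]
    ) -> Dict[int, Optional[bool]]:
        # Structured answers and clinician answers are merged into one map,
        # so the causality assessment sees both in the same call.
        return {**structured, **self.overrides}


# In-process registry keyed by a conversation or session identifier.
_SESSIONS: Dict[str, InvestigationSession] = {}


def get_session(session_id: str) -> InvestigationSession:
    """Return the existing session for this id, creating it on first use."""
    return _SESSIONS.setdefault(session_id, InvestigationSession())
```

Keeping the merge inside the session object means no individual tool has to remember which answers came from the chart and which came from the clinician.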

What's next for Adverse Event Investigator

  • External A2A agent — migrate the AE Investigator out of Prompt Opinion's BYO shell into a standalone agent service, making it invocable from any A2A-compatible platform, not just PO
  • Supabase session & multi-tenancy — externalize in-process session state to Supabase for persistence, audit logging, and support for multiple concurrent users across institutions
  • Signal Detection module — PRR/ROR disproportionality analysis against FAERS, completing the full Pharmacovigilance Suite: investigation → causality → report drafting → signal detection
  • ICH E2B(R3) XML export — structured electronic submission alongside the MedWatch HTML form
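
For the planned Signal Detection item, the two disproportionality measures it names can be computed from a 2x2 table of report counts. The sketch below uses the standard PRR and ROR definitions; the class and function names are illustrative, and the counts in any real analysis would come from FAERS queries that are not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2x2 table of spontaneous-report counts for one drug-event pair."""

    a: int  # reports with the drug of interest and the event of interest
    b: int  # reports with the drug of interest and any other event
    c: int  # reports with all other drugs and the event of interest
    d: int  # reports with all other drugs and any other event


def prr(t: ContingencyTable) -> float:
    """Proportional reporting ratio: [a / (a + b)] / [c / (c + d)]."""
    if t.a + t.b == 0 or t.c + t.d == 0 or t.c == 0:
        raise ValueError("PRR is undefined for this table")
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def ror(t: ContingencyTable) -> float:
    """Reporting odds ratio: (a * d) / (b * c)."""
    if t.b == 0 or t.c == 0:
        raise ValueError("ROR is undefined for this table")
    return (t.a * t.d) / (t.b * t.c)
```

For example, with a = 20, b = 80, c = 100, d = 9800, the event makes up 20% of reports for the drug versus about 1% for all other drugs, giving a PRR of about 19.8 and an ROR of 24.5.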
