Adverse Event Investigator

Inspiration

Adverse event reporting is one of the most under-resourced workflows in clinical pharmacy. A pharmacist who suspects a drug reaction must manually cross-reference the patient chart, score a causality questionnaire, apply FDA seriousness criteria, and transcribe everything into a multi-page MedWatch form. The process can take hours and is error-prone, which helps explain why the vast majority of reportable adverse drug events (ADEs) never get filed. We wanted to see how much of that work a conversational AI agent could take on without sacrificing clinical rigor.

What it does

The AE Investigator is a conversational pharmacovigilance assistant that takes a clinical pharmacist from patient FHIR data to a submission-ready FDA MedWatch Form 3500 — entirely in chat.

The agent operates in three modes:

Investigation: pulls patient demographics, medication history, past medical history, and family history directly from the FHIR record
Reasoning: scores Naranjo causality deterministically from structured data, raises red flags, compares the event against known adverse events in the FDA label, and surfaces what still needs clinician input
Drafting: generates a complete, field-accurate MedWatch Form 3500 as a live HTML render, incorporating all investigation findings and any clinician overrides

Naranjo questions that cannot be answered from the data are explicitly marked unknown rather than guessed. When a clinician provides a missing answer — such as confirming the reaction recurred on re-challenge — the agent redrafts the report instantly with the updated score.
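
The unknown-versus-override behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual tool code: the function name `score_naranjo`, the dictionary shapes, and the example answers are assumptions made for this sketch. The per-question weights and the category cut-offs follow the published Naranjo scale.

```python
from typing import Dict, Optional

# Published Naranjo weights per question as (points for yes, points for no).
# An unknown answer always contributes 0.
NARANJO_WEIGHTS: Dict[int, tuple] = {
    1: (1, 0),    # previous conclusive reports of this reaction
    2: (2, -1),   # event appeared after the suspected drug was given
    3: (1, 0),    # improved on discontinuation or a specific antagonist
    4: (2, -1),   # reappeared on re-administration (re-challenge)
    5: (-1, 2),   # alternative causes could explain the reaction
    6: (-1, 1),   # reappeared when a placebo was given
    7: (1, 0),    # drug detected at toxic concentrations
    8: (1, 0),    # dose-response relationship observed
    9: (1, 0),    # similar reaction on a previous exposure
    10: (1, 0),   # confirmed by objective evidence
}


def score_naranjo(from_fhir: Dict[int, Optional[bool]],
                  overrides: Optional[Dict[int, bool]] = None) -> dict:
    """Score the questionnaire; None means unknown and is never guessed."""
    answers: Dict[int, Optional[bool]] = {q: None for q in NARANJO_WEIGHTS}
    # Clinician overrides are applied last, so they win over structured data.
    for source in (from_fhir, overrides or {}):
        for question, answer in source.items():
            if question not in NARANJO_WEIGHTS:
                raise ValueError(f"unknown Naranjo question: {question}")
            if answer is not None:
                answers[question] = answer

    total = 0
    for question, answer in answers.items():
        if answer is not None:
            yes_pts, no_pts = NARANJO_WEIGHTS[question]
            total += yes_pts if answer else no_pts

    if total >= 9:
        category = "definite"
    elif total >= 5:
        category = "probable"
    elif total >= 1:
        category = "possible"
    else:
        category = "doubtful"

    unknown = sorted(q for q, a in answers.items() if a is None)
    return {"score": total, "category": category, "unknown": unknown}


# Hypothetical case: three questions answerable from the chart.
structured = {2: True, 3: True, 10: True}
before = score_naranjo(structured)
# The clinician confirms the reaction recurred on re-challenge (question 4).
after = score_naranjo(structured, overrides={4: True})
```

In this hypothetical case the score moves from 4 (possible) to 6 (probable) once the re-challenge answer is supplied, and question 4 drops out of the unknown list. Because the function is pure, the same inputs always produce the same result.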

How we built it

MCP server (Python + FastMCP, hosted on Railway) — 14 tools organized across investigate, reason, and draft namespaces. Patient data is fetched live from the Prompt Opinion FHIR proxy. Naranjo scoring and FDA seriousness determination are deterministic Python algorithms — no LLM calls.
BYO agent (Prompt Opinion) — system-prompted AE Investigator with the MCP server attached. Handles the conversational layer and tool orchestration.
Demo FHIR bundles — three hand-authored synthetic patient cases (ibuprofen-GI bleed, Daytrana-leukoderma, azathioprine-hepatotoxicity) uploaded to the PO FHIR server, each with realistic lab progressions, medication timelines, and family history.
openFDA + NLM RxNorm — integrated for known AE lookups and drug normalization.
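
As a rough illustration of what a deterministic seriousness check can look like, the sketch below maps a set of outcome flags to a serious or non-serious determination. The `Outcomes` class, its field names, and the return shape are hypothetical and chosen for this example; the list of outcomes mirrors the seriousness categories on the MedWatch form.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Outcomes:
    """Outcome flags for one event (field names are illustrative)."""
    death: bool = False
    life_threatening: bool = False
    hospitalization: bool = False        # initial or prolonged
    disability: bool = False             # disability or permanent damage
    congenital_anomaly: bool = False
    required_intervention: bool = False  # to prevent permanent impairment
    other_serious: bool = False          # other important medical event


def determine_seriousness(outcomes: Outcomes) -> dict:
    """An event is serious if at least one outcome flag is set."""
    met = [f.name for f in fields(outcomes) if getattr(outcomes, f.name)]
    return {"serious": bool(met), "criteria_met": met}


# Hypothetical GI bleed case that led to an admission.
result = determine_seriousness(Outcomes(hospitalization=True))
```

Returning the list of criteria that were met, rather than a bare boolean, lets the drafting step tick the matching boxes on the form without re-deriving them.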

Challenges we ran into

Getting Naranjo scoring right without any hallucination required careful separation of concerns: each question is either answerable from structured FHIR data or explicitly deferred to clinician override — there is no middle ground. Designing that boundary, and enforcing it consistently across tool calls, was the hardest design problem.

The Prompt Opinion platform's FHIR proxy also has non-obvious constraints: bundles require RFC 4122 UUIDs in fullUrl fields, and token scopes in the consult flow differ from workspace scopes in ways that affect which FHIR resources are accessible.
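
One way to satisfy the UUID constraint is to generate each fullUrl with Python's standard `uuid` module, as sketched below. The helper name `to_bundle` and the choice of a `transaction` bundle with `POST` requests are assumptions for illustration; the source does not say which bundle type the proxy expects. The `urn:uuid:` form itself is the standard FHIR way to write a UUID-based fullUrl.

```python
import uuid
from typing import Dict, List


def to_bundle(resources: List[Dict]) -> Dict:
    """Wrap resources in a Bundle whose entries use urn:uuid fullUrls."""
    entries = []
    for resource in resources:
        entries.append({
            # uuid4() produces an RFC 4122 version 4 UUID.
            "fullUrl": f"urn:uuid:{uuid.uuid4()}",
            "resource": resource,
            # Assumed request shape for a transaction bundle.
            "request": {"method": "POST", "url": resource["resourceType"]},
        })
    return {"resourceType": "Bundle", "type": "transaction", "entry": entries}


# Hypothetical two-resource demo case.
bundle = to_bundle([
    {"resourceType": "Patient", "gender": "female"},
    {"resourceType": "Condition", "code": {"text": "GI bleed"}},
])
```

If resources inside the bundle reference each other, the references would also need to point at these generated `urn:uuid:` values; that step is left out of the sketch.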

Accomplishments that we're proud of

The Naranjo scoring is fully deterministic: given the same patient data and the same clinician inputs, the score is always the same. That's a meaningful bar for a clinical tool. The agent also surfaces why each question was answered the way it was — not just the score, but the reasoning — which is what a pharmacist reviewing the output actually needs.

The MedWatch Form 3500 output maps to the canonical September 2025 FDA form fields, rendered as live HTML from structured data — not a template fill.

What we learned

LLMs are excellent at orchestration and natural language interaction, but clinical algorithms should not be LLM calls. The combination — LLM for conversation, deterministic Python for scoring — is the right architecture for a tool that a clinician needs to trust.

Session state across multi-turn tool calls is also more nuanced than it looks on paper. Clinician overrides need to persist across the investigation; FHIR context needs to be scoped to a single patient; and the causality assessment needs to see both the structured data and the clinician's answers in the same call.
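
The three requirements above can be captured in a small in-process session object. This is a sketch under assumptions, not the server's real state model: `InvestigationSession`, `get_session`, and the module-level `_SESSIONS` dictionary are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InvestigationSession:
    """State for one investigation, pinned to a single patient."""
    patient_id: str
    overrides: Dict[int, bool] = field(default_factory=dict)

    def record_override(self, patient_id: str, question: int, answer: bool) -> None:
        # Refuse overrides that target a different patient than the session's.
        if patient_id != self.patient_id:
            raise ValueError("override does not match the session's patient")
        self.overrides[question] = answer

    def causality_inputs(self, structured: Dict[int, Optional[bool]]) -> Dict[int, Optional[bool]]:
        # One merged view, so the scorer sees chart data and clinician answers together.
        return {**structured, **self.overrides}


_SESSIONS: Dict[str, InvestigationSession] = {}


def get_session(session_id: str, patient_id: str) -> InvestigationSession:
    """Return the existing session, or create one scoped to this patient."""
    session = _SESSIONS.get(session_id)
    if session is None:
        session = _SESSIONS[session_id] = InvestigationSession(patient_id)
    elif session.patient_id != patient_id:
        raise ValueError("session is already scoped to another patient")
    return session


# Overrides recorded in one turn are still present in a later turn.
session = get_session("demo-session", "patient-a")
session.record_override("patient-a", 4, True)
merged = get_session("demo-session", "patient-a").causality_inputs({2: True, 4: None})
```

A dictionary held in process memory disappears on restart and is not shared across workers, which is the limitation that moving session state to an external store would address.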

What's next for Adverse Event Investigator

External A2A agent — migrate the AE Investigator out of Prompt Opinion's BYO shell into a standalone agent service, making it invocable from any A2A-compatible platform, not just PO
Supabase session & multi-tenancy — externalize in-process session state to Supabase for persistence, audit logging, and support for multiple concurrent users across institutions
Signal Detection module — PRR/ROR disproportionality analysis against FAERS, completing the full Pharmacovigilance Suite: investigation → causality → report drafting → signal detection
ICH E2B(R3) XML export — structured electronic submission alongside the MedWatch HTML form
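
For the planned Signal Detection module, both disproportionality measures come from a 2x2 table of report counts. The sketch below uses the standard definitions of PRR and ROR; the function names and the example counts are made up for illustration and are not FAERS data.

```python
def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio.

    a: reports with the drug and the event
    b: reports with the drug and any other event
    c: reports with other drugs and the event
    d: reports with other drugs and any other event
    """
    if a + b == 0 or c == 0:
        raise ValueError("PRR is undefined for these counts")
    return (a / (a + b)) / (c / (c + d))


def ror(a: int, b: int, c: int, d: int) -> float:
    """Reporting odds ratio: (a / b) / (c / d), written as (a * d) / (b * c)."""
    if b == 0 or c == 0:
        raise ValueError("ROR is undefined for these counts")
    return (a * d) / (b * c)


# Made-up counts, for illustration only.
example_prr = prr(a=20, b=980, c=100, d=98900)
example_ror = ror(a=20, b=980, c=100, d=98900)
```

With these made-up counts the PRR is 19.8 and the ROR is roughly 20.18. A real module would also need confidence intervals and a minimum case count before treating a ratio as a signal.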

Built With

fastmcp
fhir-r4
openfda
prompt-opinion
python
railway
rxnorm

Submitted to

Agents Assemble - The Healthcare AI Endgame

Created by

I built the AE Investigator end-to-end — the MCP server (14 tools in Python + FastMCP covering FHIR investigation, Naranjo causality scoring, FDA seriousness determination, and MedWatch Form 3500 generation), the BYO agent system prompt and tool orchestration on Prompt Opinion, three synthetic FHIR R4 patient demo bundles, and the Railway deployment. The Naranjo algorithm is fully deterministic Python — no LLM calls for clinical scoring.

Aditya Mittal
Building production AI systems — RAG pipelines, fine-tuned models, agent orchestration, and eval harnesses that ship.

Updates

Aditya Mittal started this project — May 04, 2026 09:31 AM EDT
