## Inspiration

Medication errors cause an estimated 7,000 deaths a year in the US and around $3.5 billion in preventable costs. The Joint Commission's NPSG.03.06.01 mandates medication reconciliation at every care transition (admission, transfer, discharge), yet in most hospitals it is still done on paper, in 15 minutes, by an exhausted intern at 3 AM.

When Agents Assemble launched, we noticed the median submission was going to be a thin "summarize this patient" wrapper. We wanted to ship something a clinician would actually use — a tool that does the boring, dangerous, regulation-driven work that LLM demos love to skip. That's medication reconciliation.

## What it does

Argus is six composable MCP tools that any healthcare agent on PromptOpinion can pick up:

  1. get_active_medications — RxNorm-normalized, deduplicated medication list. The foundation everything else depends on.
  2. check_drug_interactions — Patient-context-aware DDI analysis with SHAP-explained severity from a trained XGBoost model. Same drug pair scores differently based on age, eGFR, INR, QTc, polypharmacy count.
  3. renal_dose_check — eGFR via CKD-EPI 2021, FDA + KDIGO 2022 dose-adjustment recommendations.
  4. reconcile_home_vs_hospital — Home vs. encounter discrepancy detection with LLM-classified intentionality (intentional vs. needs-review).
  5. generate_med_rec_note — Composes a clinician-ready Markdown SOAP note with inline FHIR resource citations post-validated against the source data.
  6. screen_high_risk_patterns — Beers 2023, QTc-prolonging combos, opioid + benzo, anticholinergic burden, adherence gaps, pregnancy risk, serotonin syndrome.
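
As a rough illustration of the eGFR step behind renal_dose_check, the published CKD-EPI 2021 creatinine equation can be sketched as follows. The function name and signature are hypothetical rather than Argus's actual code, and the dose-adjustment lookup that would follow is omitted.

```python
def egfr_ckd_epi_2021(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """Estimate GFR (mL/min/1.73 m^2) with the race-free CKD-EPI 2021 creatinine equation.

    Hypothetical helper for illustration only; not the Argus implementation.
    """
    if scr_mg_dl <= 0:
        raise ValueError("serum creatinine must be positive")
    # Sex-specific constants from the published equation.
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = scr_mg_dl / kappa
    egfr = (
        142.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.200
        * 0.9938 ** age_years
    )
    if female:
        egfr *= 1.012
    return egfr
```

A caller would pass the latest serum creatinine Observation (in mg/dL) along with the patient's age and sex, then compare the result against the thresholds in the dosing reference.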

Every output carries traceable FHIR citations and a clinician-in-the-loop disclaimer. We also built an A2A agent layer that orchestrates these tools into clinician-level workflows: run_admission_med_rec, run_discharge_med_rec, evaluate_new_prescription, explain_medication_concern.

## How we built it

Stack: Python 3.11 · FastMCP 3.2 · FHIR R4 · RxNav · Gemini 2.5 Flash-Lite · XGBoost · SHAP

The architecture is deliberately stateless: every tool call fetches fresh FHIR data via the SHARP-on-MCP-propagated session token, runs in isolation, and returns a structured response with citations. We built six tools rather than one fat endpoint so that any agent can compose them in whatever order its workflow needs.

The AI factor sits in three places:

  • Contextual DDI severity (XGBoost regressor with 11 patient-specific features → SHAP-explained per-prediction breakdowns).
  • Intentionality classification for reconciliation discrepancies (LLM with structured JSON-schema output, batched into one call to stay under free-tier quotas).
  • Citation-validated note generation (LLM composes prose, regex-validated against the actual FHIR resource IDs we passed in — no hallucinated references).
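
The citation check in the last bullet can be sketched roughly as below. The inline citation format `[ResourceType/id]`, the regex, and the function name are assumptions made for illustration; the writeup does not specify how Argus actually formats or parses citations.

```python
import re

# Assumed citation format: [ResourceType/id], e.g. [MedicationRequest/mr-1].
# The id character set follows the FHIR id datatype (letters, digits, '-', '.').
CITATION_RE = re.compile(r"\[([A-Z][A-Za-z]+/[A-Za-z0-9\-.]{1,64})\]")


def find_unsourced_citations(note: str, allowed_refs: set[str]) -> list[str]:
    """Return cited references that were not among the resources passed to the LLM."""
    cited = CITATION_RE.findall(note)
    return sorted({ref for ref in cited if ref not in allowed_refs})


def has_only_sourced_citations(note: str, allowed_refs: set[str]) -> bool:
    """True when every citation in the note points at a known resource."""
    return not find_unsourced_citations(note, allowed_refs)
```

A note generator built this way could reject or regenerate any draft for which `find_unsourced_citations` returns a non-empty list.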

We seeded a 305-row SQLite reference KB from open public sources: AGS Beers 2023, CredibleMeds QTc list, FDA renal-dosing labels, an anticholinergic burden scale, the Pregnancy & Lactation Labeling Rule categories, and a curated DDI list. RxCUI ingredient resolution goes through RxNav with a 7-day persistent cache to keep things fast.
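
A persistent cache with a 7-day expiry could look something like the sketch below. The class, table layout, and `fetch` callback are hypothetical; the writeup does not say how Argus stores its RxNav cache, so SQLite is assumed here only because the reference KB already uses it.

```python
import sqlite3
import time
from typing import Callable, Optional

SEVEN_DAYS = 7 * 24 * 60 * 60  # cache lifetime in seconds


class TTLCache:
    """Tiny SQLite-backed key/value cache with a time-to-live (illustrative only)."""

    def __init__(self, path: str = ":memory:", ttl: float = SEVEN_DAYS,
                 clock: Callable[[], float] = time.time) -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL, stored_at REAL NOT NULL)"
        )
        self._ttl = ttl
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        row = self._db.execute(
            "SELECT value, stored_at FROM cache WHERE key = ?", (key,)
        ).fetchone()
        # Treat missing and expired entries the same way.
        if row is None or self._clock() - row[1] > self._ttl:
            return None
        return row[0]

    def put(self, key: str, value: str) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO cache (key, value, stored_at) VALUES (?, ?, ?)",
            (key, value, self._clock()),
        )
        self._db.commit()


def resolve_cached(name: str, cache: TTLCache, fetch: Callable[[str], str]) -> str:
    """Return the cached value for name, calling fetch (e.g. an RxNav lookup) on a miss."""
    cached = cache.get(name)
    if cached is not None:
        return cached
    value = fetch(name)
    cache.put(name, value)
    return value
```

Passing a file path instead of `:memory:` makes the cache survive restarts, and injecting `clock` keeps the expiry logic testable without waiting a week.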

For data we used Synthea: we generated a 100-patient elderly cohort, ranked the patients by polypharmacy and lab availability, and picked an 84-year-old with 18 active medications and CKD 3b as the demo patient (real eGFR 38.4, real warfarin × aspirin critical interaction, real Beers flags on metoprolol and verapamil). Zero real PHI ever touches the system.

Deploy: Dockerized, hosted on Render free tier (with Fly.io and the Argus repo as fallback options).

## Challenges we ran into

  • SHARP-on-MCP capability declaration. When the hackathon launched, the spec at sharponmcp.com was clear, but PromptOpinion's actual implementation expects the capability under capabilities.extensions["ai.promptopinion/fhir-context"] with a SMART-scope array, not under the spec's capabilities.experimental.fhir_context_required. We advertise both for forward compatibility.
  • httpx URL-encoding the + in RxNav's tty=IN+PIN+SCD+SBD. It took us hours to notice: the call returned 200 OK with a non-JSON body, every normalization silently failed, and 18 of 19 medications came back unresolved in the demo.
  • Synthea bundle ingestion. Synthea emits transaction bundles with conditional Practitioner / Organization references that PromptOpinion's FHIR server rejects. Wrote a sanitizer that walks the bundle, replaces every conditional reference with an inline stub, and converts the bundle to a batch type.
  • Free-tier LLM quota. Hit Gemini's daily request cap during testing. Switched to gemini-2.5-flash-lite (4× quota), batched the per-discrepancy reconciliation classifications into one LLM call (18 → 1), and added 429 retry-with-backoff.
  • Render free-tier cold starts. Containers suspend after 15 min idle; first request times out. Fixed by dropping the /mcp HTTP healthcheck (FastMCP only serves POST), letting Render fall back to a TCP port check.
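
The RxNav failure in the list above comes down to how a literal + is treated when query parameters are encoded. The sketch below uses only the standard library's urllib.parse to show the effect; it is not the httpx code path itself, and passing the term types space-separated is an assumed fix for illustration, not necessarily the exact change made in Argus.

```python
from urllib.parse import urlencode


def encode_query(params: dict[str, str]) -> str:
    """Form-encode query parameters the way most HTTP clients do by default."""
    return urlencode(params)


# A literal "+" in the value is escaped, so the server no longer sees
# a space-separated list of term types.
broken = encode_query({"tty": "IN+PIN+SCD+SBD"})

# Passing spaces lets the encoder produce the "+" separators itself.
fixed = encode_query({"tty": "IN PIN SCD SBD"})

print(broken)
print(fixed)
```

Because the broken request still returned 200 OK, a guard that checks the response content type (or fails loudly on a JSON decode error) would have surfaced the problem much sooner than the downstream symptom did.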

## Accomplishments we're proud of

  • 51 tests passing, clean lint, GitHub Actions CI green.
  • Real ML, not LLM-only: the DDI severity model is a trained XGBoost regressor with R² 0.98 on synthetic labels and per-prediction SHAP explanations.
  • Citation discipline: every clinical claim in every output traces to a specific FHIR resource ID. The note generator post-validates citations and refuses to emit unsourced claims.
  • Two substantively different submissions from one codebase: the raw MCP server and the A2A agent that orchestrates it.

## What we learned

  • The interesting part of building healthcare AI is the integration plumbing, not the LLM call. SHARP context propagation, FHIR resource hydration, RxNorm code resolution, citation validation — that's where 80% of the bugs live and 100% of the trust comes from.
  • LLMs degrade gracefully when you have a real deterministic backbone. Every Argus tool falls back to rule-based output when the LLM is unavailable. The structure of the response never changes.
  • Data quality is a feature. Every output carries coverage_score, data_quality flags, and explicit missing_data reasons — clinicians don't trust opaque AI, but they do trust AI that admits what it doesn't know.
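
The last two points can be combined in one small sketch: try the LLM step, fall back to rules on failure, and always return the same response shape. The field names coverage_score, data_quality, and missing_data come from the text above; their types, the `source` field, the coverage formula, and the function name are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Mapping, Optional, Sequence


@dataclass
class ToolResponse:
    findings: list[str]
    coverage_score: float                 # assumed: fraction of required inputs present
    data_quality: list[str] = field(default_factory=list)
    missing_data: list[str] = field(default_factory=list)
    source: str = "llm"                   # hypothetical field: "llm" or "rules"


def build_response(
    required: Sequence[str],
    available: Mapping[str, Optional[object]],
    llm_step: Callable[[], list[str]],
    rule_step: Callable[[], list[str]],
) -> ToolResponse:
    """Run the LLM step, falling back to rule-based output with the same structure."""
    missing = [
        f"{name}: not present in fetched FHIR data"
        for name in required
        if available.get(name) is None
    ]
    coverage = (len(required) - len(missing)) / len(required) if required else 1.0
    flags: list[str] = []
    try:
        findings, source = llm_step(), "llm"
    except Exception:
        # Broad catch is deliberate in this sketch: any LLM failure
        # (quota, timeout, malformed output) degrades to the rules path.
        findings, source = rule_step(), "rules"
        flags.append("llm_unavailable")
    return ToolResponse(findings, coverage, flags, missing, source)
```

The caller never has to branch on which path ran; it can read `source` and `data_quality` if it wants to surface the degradation to the clinician.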

## What's next for Argus

  • Real outcome labels for the DDI severity model (currently synthetic).
  • More guideline coverage — STOPP/START for deprescribing, hepatic-impairment dose adjustments, pediatric weight-based dosing.
  • CDS Hooks compatibility so the same tools can fire from EHR write-events, not just on agent request.
  • Multilingual patient handouts wired into the discharge skill.

Built for the Agents Assemble hackathon by PromptOpinion + Darena Health.
