## Inspiration
Medication errors cause an estimated 7,000 deaths a year in the US and around $3.5 billion in preventable costs. The Joint Commission's NPSG.03.06.01 mandates medication reconciliation at every care transition (admission, transfer, discharge), and yet most hospitals still have it done on paper, in 15 minutes, by an exhausted intern at 3 AM.
When Agents Assemble launched, we noticed the median submission was going to be a thin "summarize this patient" wrapper. We wanted to ship something a clinician would actually use — a tool that does the boring, dangerous, regulation-driven work that LLM demos love to skip. That's medication reconciliation.
## What it does
Argus is six composable MCP tools that any healthcare agent on PromptOpinion can pick up:
- `get_active_medications`: RxNorm-normalized, deduplicated medication list. The foundation everything else depends on.
- `check_drug_interactions`: Patient-context-aware DDI analysis with SHAP-explained severity from a trained XGBoost model. The same drug pair scores differently based on age, eGFR, INR, QTc, and polypharmacy count.
- `renal_dose_check`: eGFR via CKD-EPI 2021, with FDA + KDIGO 2022 dose-adjustment recommendations.
- `reconcile_home_vs_hospital`: Home vs. encounter discrepancy detection with LLM-classified intentionality (intentional vs. needs-review).
- `generate_med_rec_note`: Composes a clinician-ready Markdown SOAP note with inline FHIR resource citations post-validated against the source data.
- `screen_high_risk_patterns`: Beers 2023, QTc-prolonging combos, opioid + benzo, anticholinergic burden, adherence gaps, pregnancy risk, serotonin syndrome.
Every output carries traceable FHIR citations and a clinician-in-loop disclaimer. We also built an A2A agent layer that orchestrates these tools into clinician-level workflows: `run_admission_med_rec`, `run_discharge_med_rec`, `evaluate_new_prescription`, and `explain_medication_concern`.
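For readers unfamiliar with the equation behind `renal_dose_check`, here is a minimal sketch of the published race-free CKD-EPI 2021 creatinine formula. The function name and signature are illustrative assumptions, not Argus's actual code, and the sketch covers only the eGFR estimate, not the dose-adjustment lookup.

```python
def egfr_ckd_epi_2021(scr_mg_dl: float, age_years: float, is_female: bool) -> float:
    """Estimate GFR in mL/min/1.73 m^2 from serum creatinine (mg/dL), age, and sex.

    Hypothetical helper for illustration; constants follow the CKD-EPI 2021
    creatinine equation (no race coefficient).
    """
    if scr_mg_dl <= 0 or age_years <= 0:
        raise ValueError("creatinine and age must be positive")
    kappa = 0.7 if is_female else 0.9          # sex-specific creatinine pivot
    alpha = -0.241 if is_female else -0.302    # exponent applied below the pivot
    ratio = scr_mg_dl / kappa
    egfr = (
        142.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.200
        * 0.9938 ** age_years
    )
    if is_female:
        egfr *= 1.012
    return egfr


if __name__ == "__main__":
    # Made-up inputs, not the demo patient's labs.
    print(round(egfr_ckd_epi_2021(1.4, 84, is_female=False), 1))
```

A dose-adjustment rule would then compare this estimate against the thresholds stored in the reference KB.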
## How we built it
Stack: Python 3.11 · FastMCP 3.2 · FHIR R4 · RxNav · Gemini 2.5 Flash-Lite · XGBoost · SHAP
The architecture is deliberately stateless: every tool call fetches fresh FHIR data via the SHARP-on-MCP-propagated session token, runs in isolation, and returns a structured response with citations. We ship six tools rather than one fat endpoint so that any agent can compose them in whatever order its workflow needs.
The AI factor sits in three places:
- Contextual DDI severity (XGBoost regressor with 11 patient-specific features → SHAP-explained per-prediction breakdowns).
- Intentionality classification for reconciliation discrepancies (LLM with structured JSON-schema output, batched into one call to stay under free-tier quotas).
- Citation-validated note generation (LLM composes prose, regex-validated against the actual FHIR resource IDs we passed in — no hallucinated references).
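The citation check in the last bullet can be sketched roughly as follows. The bracketed `[ResourceType/id]` citation syntax, the regex, and the function name are assumptions made for illustration; the note format Argus actually emits may differ.

```python
import re

# Assumed citation syntax: [ResourceType/id], e.g. [MedicationRequest/mr-1].
# The id part follows the FHIR id character set (letters, digits, "-", ".").
CITATION_RE = re.compile(r"\[([A-Z][A-Za-z]+/[A-Za-z0-9\-.]{1,64})\]")


def validate_citations(note: str, allowed_ids: set[str]) -> tuple[list[str], list[str]]:
    """Split the citations found in a note into (valid, unknown).

    `allowed_ids` holds the resource references that were passed to the LLM;
    anything cited outside that set is treated as a hallucinated reference.
    """
    valid: list[str] = []
    unknown: list[str] = []
    # dict.fromkeys de-duplicates while keeping first-appearance order.
    for ref in dict.fromkeys(CITATION_RE.findall(note)):
        (valid if ref in allowed_ids else unknown).append(ref)
    return valid, unknown


if __name__ == "__main__":
    note = "Warfarin is active [MedicationRequest/mr-1]. eGFR is reduced [Observation/obs-9]."
    print(validate_citations(note, {"MedicationRequest/mr-1"}))
```

A generator built this way could reject or regenerate any note whose `unknown` list is non-empty.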
We seeded a 305-row SQLite reference KB from open public sources: AGS Beers 2023, CredibleMeds QTc list, FDA renal-dosing labels, an anticholinergic burden scale, the Pregnancy & Lactation Labeling Rule categories, and a curated DDI list. RxCUI ingredient resolution goes through RxNav with a 7-day persistent cache to keep things fast.
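A 7-day persistent cache of this kind could look like the sketch below, which uses SQLite from the standard library. The class, table layout, and `fetch` callable are hypothetical; `fetch` stands in for the real RxNav lookup, which is not reproduced here.

```python
import sqlite3
import time
from typing import Callable, Optional

TTL_SECONDS = 7 * 24 * 60 * 60  # 7 days, matching the cache lifetime described above


class PersistentCache:
    """Tiny key/value cache with expiry, backed by a SQLite file (or memory)."""

    def __init__(self, path: str = ":memory:", ttl: float = TTL_SECONDS) -> None:
        self.ttl = ttl
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(key TEXT PRIMARY KEY, value TEXT, stored_at REAL)"
        )

    def get(self, key: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT value, stored_at FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is None or now - row[1] > self.ttl:
            return None  # missing or expired
        return row[0]

    def put(self, key: str, value: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.db.execute("INSERT OR REPLACE INTO cache VALUES (?, ?, ?)", (key, value, now))
        self.db.commit()


def resolve_ingredient(name: str, cache: PersistentCache, fetch: Callable[[str], str]) -> str:
    """Return the cached identifier for a drug name, calling `fetch` only on a miss."""
    key = name.strip().lower()
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = fetch(name)
    cache.put(key, value)
    return value
```

Passing a file path instead of `":memory:"` is what would make the cache survive restarts.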
For data we used Synthea — generated a 100-patient elderly cohort, ranked them by polypharmacy + lab availability, and used an 84-year-old with 18 active medications and CKD 3b as the demo patient (real eGFR 38.4, real warfarin × aspirin critical interaction, real Beers flags on metoprolol and verapamil). Zero real PHI ever touches the system.
Deploy: Dockerized, hosted on Render free tier (with Fly.io and the Argus repo as fallback options).
## Challenges we ran into
- SHARP-on-MCP capability declaration. The hackathon dropped, the spec at sharponmcp.com was clear, but PromptOpinion's actual implementation expects the capability under `capabilities.extensions["ai.promptopinion/fhir-context"]` with a SMART-scope array, not the spec's `capabilities.experimental.fhir_context_required`. We advertise both for forward compatibility.
- httpx URL-encoding the `+` in RxNav's `tty=IN+PIN+SCD+SBD`. Took us hours to notice: the call returned 200 OK but with non-JSON, every normalization silently failed, and 18 of 19 medications came back unresolved in the demo.
- Synthea bundle ingestion. Synthea emits transaction bundles with conditional Practitioner / Organization references that PromptOpinion's FHIR server rejects. Wrote a sanitizer that walks the bundle, replaces every conditional reference with an inline stub, and converts the bundle to a `batch` type.
- Free-tier LLM quota. Hit Gemini's daily request cap during testing. Switched to `gemini-2.5-flash-lite` (4× quota), batched the per-discrepancy reconciliation classifications into one LLM call (18 → 1), and added 429 retry-with-backoff.
- Render free-tier cold starts. Containers suspend after 15 min idle; first request times out. Fixed by dropping the `/mcp` HTTP healthcheck (FastMCP only serves POST), letting Render fall back to a TCP port check.
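The `+` encoding pitfall from the list above can be reproduced with the standard library alone. This sketch uses `urllib.parse` rather than httpx, whose exact encoding rules are not verified here, and it assumes the `+` in `tty=IN+PIN+SCD+SBD` stands for a form-encoded space between term types.

```python
from urllib.parse import parse_qs, urlencode

# Passing the already-encoded value: each "+" is treated as a literal plus
# sign and percent-escaped, so the server no longer sees four term types.
broken = urlencode({"tty": "IN+PIN+SCD+SBD"})

# Passing the space-separated value: the encoder emits "+" for each space.
fixed = urlencode({"tty": "IN PIN SCD SBD"})

print(broken)  # tty=IN%2BPIN%2BSCD%2BSBD
print(fixed)   # tty=IN+PIN+SCD+SBD

# Decoding shows what a server would receive in each case.
print(parse_qs(broken)["tty"])  # ['IN+PIN+SCD+SBD']
print(parse_qs(fixed)["tty"])   # ['IN PIN SCD SBD']
```

Because the failure mode was a 200 response with a non-JSON body, checking the response content type before parsing would also surface this kind of bug sooner.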
## Accomplishments we're proud of
- 51 tests passing, clean lint, GitHub Actions CI green.
- Real ML, not LLM-only: the DDI severity model is a trained XGBoost regressor with R² 0.98 on synthetic labels and per-prediction SHAP explanations.
- Citation discipline: every clinical claim in every output traces to a specific FHIR resource ID. The note generator post-validates citations and refuses to emit unsourced claims.
- Two substantively different submissions from one codebase: the raw MCP and the A2A agent that orchestrates it.
## What we learned
- The interesting part of building healthcare AI is the integration plumbing, not the LLM call. SHARP context propagation, FHIR resource hydration, RxNorm code resolution, citation validation — that's where 80% of the bugs live and 100% of the trust comes from.
- LLMs degrade gracefully when you have a real deterministic backbone. Every Argus tool falls back to rule-based output when the LLM is unavailable. The structure of the response never changes.
- Data quality is a feature. Every output carries `coverage_score`, `data_quality` flags, and explicit `missing_data` reasons: clinicians don't trust opaque AI, but they do trust AI that admits what it doesn't know.
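The fallback pattern and the data-quality fields from the list above might fit together as in this sketch. Only the field names `coverage_score`, `data_quality`, and `missing_data` come from the write-up; the dataclass, the rule, the required-field list, and the `llm` callable are hypothetical stand-ins.

```python
from dataclasses import asdict, dataclass, field
from typing import Callable

LABELS = ("intentional", "needs-review")
REQUIRED_FIELDS = ("medication", "home_dose", "hospital_dose")  # assumed inputs


@dataclass
class ToolResponse:
    classification: str
    source: str  # "llm" or "rules"
    coverage_score: float
    data_quality: list[str] = field(default_factory=list)
    missing_data: list[str] = field(default_factory=list)


def rule_based_classify(discrepancy: dict) -> str:
    # Hypothetical rule: a documented reason for the change counts as intentional.
    return "intentional" if discrepancy.get("documented_reason") else "needs-review"


def classify_discrepancy(discrepancy: dict, llm: Callable[[dict], str]) -> dict:
    """Classify with the LLM when possible, otherwise with rules; same shape either way."""
    missing = [name for name in REQUIRED_FIELDS if discrepancy.get(name) is None]
    coverage = 1 - len(missing) / len(REQUIRED_FIELDS)
    flags: list[str] = []
    try:
        label = llm(discrepancy)
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        source = "llm"
    except Exception:
        # Quota errors, timeouts, and malformed output all land here.
        label = rule_based_classify(discrepancy)
        source = "rules"
        flags.append("llm_unavailable")
    return asdict(ToolResponse(label, source, coverage, flags, missing))
```

Because both paths build the same dataclass, a calling agent never has to branch on whether the LLM was reachable.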
## What's next for Argus
- Real outcome labels for the DDI severity model (currently synthetic).
- More guideline coverage — STOPP/START for deprescribing, hepatic-impairment dose adjustments, pediatric weight-based dosing.
- CDS Hooks compatibility so the same tools can fire from EHR write-events, not just on agent request.
- Multilingual patient handouts wired into the discharge skill.
Built for the Agents Assemble hackathon by PromptOpinion + Darena Health.