Medical errors are estimated to kill about 250,000 Americans every year, which would make them the third leading cause of death. Adverse drug events send roughly 1.3 million people to the ER annually. And with
about 40% of elderly patients on five or more medications, clinicians face an overwhelming cognitive burden at the point of care.

We saw most AI healthcare submissions doing the same thing: dump FHIR data into an LLM and hope it doesn't hallucinate a fake drug interaction or invent a stroke risk score. That's not acceptable in clinical
decision support. A physician needs to audit every number.

So we built the opposite.

## What it does

Clinical Decision Support MCP Server exposes 9 production-grade clinical tools via the Model Context Protocol, working on top of any FHIR R4 endpoint:

  1. generate_patient_summary — Aggregates demographics, conditions, medications, labs, allergies, and encounters into a clinician-ready narrative.
  2. calculate_risk_scores — CHA2DS2-VASc (stroke risk), HEART (chest pain), MELD-Na (liver severity) — computed from FHIR data using published, peer-reviewed formulas.
  3. check_drug_interactions — AI pharmacist analyzing polypharmacy with severity, mechanism, and clinical recommendations.
  4. check_contraindications — Prescribing safety cross-referencing conditions, allergies, current meds, and renal/hepatic function with PASS / CAUTION / CONTRAINDICATED verdicts.
  5. interpret_lab_results — LOINC-keyed reference range flagging plus AI clinical interpretation.
  6. suggest_care_plan — Evidence-based recommendations citing AHA/ACC, ADA, KDIGO, AASLD guidelines.
  7. parse_clinical_notes — NLP extraction of diagnoses, medications, procedures, vitals from unstructured documents.
  8. FindPatientId / GetPatientAge — Foundational lookup utilities.
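
To make the LOINC-keyed flagging in interpret_lab_results concrete, here is a minimal sketch of what such a lookup could look like. The table, the `flagLabResult` name, and the ranges are illustrative assumptions (typical adult values, which vary by laboratory), not the project's actual data.

```typescript
// Illustrative sketch only: LOINC-keyed reference-range flagging.
// Ranges are typical adult values picked for illustration; real ranges vary by lab.

type LabFlag = "LOW" | "NORMAL" | "HIGH" | "UNKNOWN";

interface ReferenceRange {
  name: string;
  unit: string;
  low: number;
  high: number;
}

// Hypothetical lookup table keyed by LOINC code.
const REFERENCE_RANGES: Record<string, ReferenceRange> = {
  "2823-3": { name: "Potassium", unit: "mmol/L", low: 3.5, high: 5.0 },
  "2951-2": { name: "Sodium", unit: "mmol/L", low: 135, high: 145 },
  "2160-0": { name: "Creatinine", unit: "mg/dL", low: 0.6, high: 1.3 },
};

// Flags a numeric result against its range; codes without a known range
// are reported as UNKNOWN rather than guessed.
function flagLabResult(loincCode: string, value: number): LabFlag {
  const range = REFERENCE_RANGES[loincCode];
  if (range === undefined) return "UNKNOWN";
  if (value < range.low) return "LOW";
  if (value > range.high) return "HIGH";
  return "NORMAL";
}
```

In a design like this, the flag is computed before any AI interpretation runs, so the model comments on a verified flag instead of deciding what counts as abnormal.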

## How we built it

The core architecture is hybrid AI — and it's the key differentiator:

  • Deterministic Layer: For every clinical score, we use the exact published formula. CHA2DS2-VASc follows Lip et al. (Chest 2010). MELD-Na uses the validated logarithmic formula from Kim et al. (N Engl J Med
    2008). Because these are plain arithmetic, they cannot hallucinate.
  • AI Layer (Claude): Adds contextual interpretation, identifies clinically relevant findings, and produces clinician-ready narratives — on top of verified data, not in place of it.
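
As an illustration of the deterministic layer, the sketch below computes CHA2DS2-VASc with a point-by-point breakdown. The point values are the published ones; the input shape and the `calculateCha2ds2Vasc` name are assumptions for this example, and the mapping from FHIR conditions to these booleans is left out.

```typescript
// Minimal sketch of a deterministic CHA2DS2-VASc calculation.
// Input shape and function name are hypothetical; point values follow the published score.

interface Cha2ds2VascInput {
  age: number;
  sex: "male" | "female";
  congestiveHeartFailure: boolean;
  hypertension: boolean;
  diabetes: boolean;
  priorStrokeTiaOrThromboembolism: boolean;
  vascularDisease: boolean;
}

interface ScoreItem {
  criterion: string;
  points: number;
}

interface ScoreResult {
  total: number;
  breakdown: ScoreItem[];
}

function calculateCha2ds2Vasc(p: Cha2ds2VascInput): ScoreResult {
  // Age contributes 2 points at 75 or older, 1 point from 65 to 74, otherwise 0.
  const agePoints = p.age >= 75 ? 2 : p.age >= 65 ? 1 : 0;

  const breakdown: ScoreItem[] = [
    { criterion: "Congestive heart failure", points: p.congestiveHeartFailure ? 1 : 0 },
    { criterion: "Hypertension", points: p.hypertension ? 1 : 0 },
    { criterion: "Age (>=75: 2, 65-74: 1)", points: agePoints },
    { criterion: "Diabetes mellitus", points: p.diabetes ? 1 : 0 },
    { criterion: "Prior stroke / TIA / thromboembolism", points: p.priorStrokeTiaOrThromboembolism ? 2 : 0 },
    { criterion: "Vascular disease", points: p.vascularDisease ? 1 : 0 },
    { criterion: "Sex category (female)", points: p.sex === "female" ? 1 : 0 },
  ];

  // The total is just the sum of the itemized points, so it can be audited line by line.
  const total = breakdown.reduce((sum, item) => sum + item.points, 0);
  return { total, breakdown };
}
```

For example, a 78-year-old woman with hypertension and diabetes scores 2 (age) + 1 (hypertension) + 1 (diabetes) + 1 (sex) = 5, and each of those contributions appears as its own entry in `breakdown`.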

Stack:

  • TypeScript + Express + @modelcontextprotocol/sdk on Node.js 20 (Alpine Docker)
  • Anthropic Claude (Haiku) for AI interpretation with exponential-backoff retry
  • FHIR R4 via Axios with parallel resource fetching (Promise.allSettled for graceful degradation)
  • SHARP Extension headers (x-fhir-server-url, x-fhir-access-token, x-patient-id). The patient ID is read from the header in preference to any tool argument, which keeps the LLM from inventing IDs
  • Deployed on Render with self-ping keep-alive
  • Published on the Prompt Opinion Marketplace
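
The parallel fetching with Promise.allSettled listed in the stack can be sketched roughly as follows. To keep the example self-contained, the HTTP call is injected as a function instead of using Axios directly; `fetchResources`, `ResourceFetcher`, and `PartialBundle` are hypothetical names, not the project's API.

```typescript
// Sketch of parallel FHIR resource fetching with graceful degradation.
// The fetcher is injected (an assumption for testability); in the real stack it would wrap Axios.

type ResourceFetcher = (resourceType: string) => Promise<unknown>;

interface PartialBundle {
  data: Record<string, unknown>; // resources that loaded successfully
  unavailable: string[];         // resource types whose fetch failed
}

async function fetchResources(
  resourceTypes: string[],
  fetchResource: ResourceFetcher,
): Promise<PartialBundle> {
  // allSettled never rejects: every fetch reports "fulfilled" or "rejected" on its own.
  const settled = await Promise.allSettled(
    resourceTypes.map(async (type) => fetchResource(type)),
  );

  const bundle: PartialBundle = { data: {}, unavailable: [] };
  settled.forEach((result, i) => {
    const type = resourceTypes[i];
    if (result.status === "fulfilled") {
      bundle.data[type] = result.value;
    } else {
      bundle.unavailable.push(type); // surfaced later as a "not available" marker
    }
  });
  return bundle;
}
```

With Promise.all, a single failed request would reject the whole batch; allSettled lets a summary be built from whatever did load, with the failures listed explicitly.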

The platform is a spiritual successor to CDS Hooks: it builds on the same SMART on FHIR ecosystem that Josh Mandel helped bring to certified EHRs in America, now extended to the agentic AI era through MCP.

## Challenges we ran into

  1. Patient ID hallucination: Early on, the LLM would invent FHIR patient IDs. We fixed this by making FhirDataService.getPatientId() strictly prefer the SHARP header over any tool argument.
  2. Render cold starts: The free tier spins down after 15 minutes of inactivity and takes about 50 seconds to spin back up. We added a self-ping to RENDER_EXTERNAL_URL every 4 minutes, which was critical for judging windows.
  3. Polypharmacy deduplication: Patients can have the same drug in both MedicationRequest and MedicationStatement. We dedupe by RxNorm code before passing to the interaction checker.
  4. Graceful degradation: If a FHIR resource fetch fails, we don't fail the whole tool — Promise.allSettled lets us return partial summaries with clear "not available" markers.
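
Two of the fixes above, header-first patient ID resolution and RxNorm deduplication, are simple enough to sketch. The function names and the `MedicationEntry` shape are assumptions for illustration; the actual FhirDataService.getPatientId() may differ in detail.

```typescript
// Hypothetical helper: the x-patient-id header always wins over an LLM-supplied argument.
function resolvePatientId(
  headers: Record<string, string | undefined>,
  toolArgPatientId?: string,
): string {
  const fromHeader = headers["x-patient-id"]?.trim();
  if (fromHeader) return fromHeader;

  // Fall back to the tool argument only when no header was sent.
  const fromArg = toolArgPatientId?.trim();
  if (fromArg) return fromArg;

  throw new Error("No patient ID in the x-patient-id header or tool arguments");
}

// Assumed, simplified view of a medication pulled from either FHIR resource.
interface MedicationEntry {
  rxNormCode?: string;
  display: string;
  source: "MedicationRequest" | "MedicationStatement";
}

// Keeps the first occurrence of each RxNorm code. Entries without a code are
// kept as-is, since there is nothing reliable to match them on.
function dedupeByRxNorm(meds: MedicationEntry[]): MedicationEntry[] {
  const seen = new Set<string>();
  const result: MedicationEntry[] = [];
  for (const med of meds) {
    if (med.rxNormCode === undefined) {
      result.push(med);
      continue;
    }
    if (seen.has(med.rxNormCode)) continue;
    seen.add(med.rxNormCode);
    result.push(med);
  }
  return result;
}
```

Deduplicating before the interaction check matters because a drug listed twice could otherwise be reported as interacting with itself or inflate the polypharmacy count.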

## Accomplishments we're proud of

  • 49 automated tests — 100% passing — covering every deterministic clinical calculation
  • Zero vendor lock-in — works with any FHIR R4 endpoint (Epic, Cerner, HAPI, open-source). A rural community health center gets the same clinical intelligence as Cleveland Clinic.
  • Audit-grade reasoning — every risk score has a point-by-point breakdown a physician can verify
  • Production-ready — Dockerized, health monitoring, parallel queries, exponential backoff retry, clinical disclaimers on every response
  • Synthetic data only — no PHI ever processed; demo uses HAPI FHIR sandbox + Synthea-generated patients
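
The exponential-backoff retry mentioned above might look something like the sketch below. The attempt count, base delay, and cap are illustrative defaults, and `withRetry` and `backoffDelayMs` are hypothetical names, not the project's actual settings.

```typescript
// Delay before the next attempt: base * 2^attempt, capped at maxMs.
// Defaults are illustrative assumptions.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retries an async operation, waiting longer after each failure.
// Rethrows the last error once maxAttempts is exhausted.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // No point sleeping after the final failed attempt.
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt, baseMs);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

A production version would likely retry only on transient failures (rate limits, timeouts) and add jitter; this sketch retries on any error to stay short.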

## What we learned

  • Determinism is non-negotiable for clinical scoring. LLMs can interpret, but they cannot be trusted to calculate. The hybrid pattern is the only responsible architecture.
  • MCP is the right primitive for clinical interoperability. Tool definitions map cleanly to clinical workflows.
  • FHIR-native > vendor-specific. Certified EHRs in the US now expose FHIR R4 endpoints, so building on FHIR means we're EHR-portable from day one.
  • AI in healthcare needs guardrails baked into the architecture, not bolted on. Disclaimers, source attribution, and deterministic verification have to be in the response shape itself.
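
One way to put guardrails "in the response shape itself" is to make the disclaimer and source attribution required fields, so a tool cannot return a result without them. The types, the `wrapResponse` helper, and the disclaimer wording below are all hypothetical.

```typescript
// Hypothetical response envelope: every tool result carries its sources and a disclaimer.

interface SourceAttribution {
  label: string; // e.g. a formula citation or guideline name
  kind: "deterministic-formula" | "guideline" | "ai-interpretation";
}

interface ClinicalToolResponse<T> {
  result: T;
  sources: SourceAttribution[];
  disclaimer: string;
}

// Placeholder wording, not the project's actual disclaimer text.
const DISCLAIMER =
  "For clinical decision support only. Not a substitute for professional judgment.";

// Refuses to build a response that cites nothing, so attribution cannot be skipped.
function wrapResponse<T>(
  result: T,
  sources: SourceAttribution[],
): ClinicalToolResponse<T> {
  if (sources.length === 0) {
    throw new Error("Every response must cite at least one source");
  }
  return { result, sources, disclaimer: DISCLAIMER };
}
```

Because the fields are non-optional, the compiler rejects any tool handler that tries to return a bare result, which is the "baked in, not bolted on" idea expressed at the type level.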

## What's next

  • Pediatric-specific CDS — weight-based dosing, age-adjusted reference ranges
  • Integration with pharmacological databases for fully deterministic drug interaction checking (no LLM in the loop for high-risk interactions)
  • PELD scoring for pediatric liver disease
  • Multi-language support for global health equity — rural community clinics anywhere can plug in
  • Per-tool-call pricing on the Prompt Opinion Marketplace ($0.01–$0.05/call) for sustainable scaling

Built on open standards — because clinical reasoning should be both deterministic and intelligent.

Built With

  • anthropic-claude
  • axios
  • clinical-decision-support
  • docker
  • express.js
  • fhir-r4
  • hapi-fhir
  • healthcare-ai
  • model-context-protocol
  • node.js
  • render
  • sharp-extensions
  • smart-on-fhir
  • synthea
  • typescript