An estimated 250,000 Americans die from medical errors every year, which would make them the third leading cause of death. Adverse drug events send 1.3 million people to the ER annually. And with 40% of elderly patients on five or more medications, clinicians face an overwhelming cognitive burden at the point of care.
We saw most AI healthcare submissions doing the same thing: dump FHIR data into an LLM and hope it doesn't hallucinate a fake drug interaction or invent a stroke risk score. That's not acceptable in clinical
decision support. A physician needs to audit every number.
So we built the opposite.
## What it does
Clinical Decision Support MCP Server exposes 9 production-grade clinical tools via the Model Context Protocol, working on top of any FHIR R4 endpoint:
- `generate_patient_summary`: Aggregates demographics, conditions, medications, labs, allergies, and encounters into a clinician-ready narrative.
- `calculate_risk_scores`: CHA2DS2-VASc (stroke risk), HEART (chest pain), MELD-Na (liver severity), computed from FHIR data using published, peer-reviewed formulas.
- `check_drug_interactions`: AI pharmacist analyzing polypharmacy with severity, mechanism, and clinical recommendations.
- `check_contraindications`: Prescribing safety cross-referencing conditions, allergies, current meds, and renal/hepatic function with PASS / CAUTION / CONTRAINDICATED verdicts.
- `interpret_lab_results`: LOINC-keyed reference range flagging plus AI clinical interpretation.
- `suggest_care_plan`: Evidence-based recommendations citing AHA/ACC, ADA, KDIGO, AASLD guidelines.
- `parse_clinical_notes`: NLP extraction of diagnoses, medications, procedures, vitals from unstructured documents.
- `FindPatientId` / `GetPatientAge`: Foundational lookup utilities.
## How we built it
The core architecture is hybrid AI — and it's the key differentiator:
- Deterministic Layer: For every clinical score, we use the exact published formula. CHA2DS2-VASc uses Lip et al. (Chest 2010). MELD-Na uses the validated logarithmic formula from Kim et al. (Hepatology 2008). These never hallucinate.
- AI Layer (Claude): Adds contextual interpretation, identifies clinically relevant findings, and produces clinician-ready narratives on top of verified data, not in place of it.
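As a minimal sketch of what the deterministic layer could look like, the function below scores CHA2DS2-VASc with the standard published point values and returns a per-criterion breakdown. The names (`Cha2ds2VascInput`, `cha2ds2Vasc`) are illustrative assumptions, not the project's actual code, and the mapping from FHIR resources to these fields is left out.

```typescript
// Hypothetical input shape; deriving these fields from FHIR Conditions is out of scope here.
interface Cha2ds2VascInput {
  congestiveHeartFailure: boolean;
  hypertension: boolean;
  ageYears: number;
  diabetes: boolean;
  priorStrokeTiaOrThromboembolism: boolean;
  vascularDisease: boolean;
  sex: "female" | "male";
}

interface ScoreItem {
  criterion: string;
  points: number;
}

interface ScoreResult {
  total: number;
  breakdown: ScoreItem[]; // one row per criterion so a clinician can audit the sum
}

function cha2ds2Vasc(p: Cha2ds2VascInput): ScoreResult {
  // Age contributes 2 points at 75 or older, 1 point at 65 to 74, otherwise 0.
  const agePoints = p.ageYears >= 75 ? 2 : p.ageYears >= 65 ? 1 : 0;

  const breakdown: ScoreItem[] = [
    { criterion: "Congestive heart failure", points: p.congestiveHeartFailure ? 1 : 0 },
    { criterion: "Hypertension", points: p.hypertension ? 1 : 0 },
    { criterion: "Age (>=75: 2, 65-74: 1)", points: agePoints },
    { criterion: "Diabetes mellitus", points: p.diabetes ? 1 : 0 },
    { criterion: "Prior stroke / TIA / thromboembolism", points: p.priorStrokeTiaOrThromboembolism ? 2 : 0 },
    { criterion: "Vascular disease", points: p.vascularDisease ? 1 : 0 },
    { criterion: "Sex category (female)", points: p.sex === "female" ? 1 : 0 },
  ];

  // The total is a plain sum of the rows above, so it can be re-derived by hand.
  const total = breakdown.reduce((sum, item) => sum + item.points, 0);
  return { total, breakdown };
}
```

For a 78-year-old woman with hypertension and diabetes, this returns a total of 5 (age 2, hypertension 1, diabetes 1, sex category 1), and each row of the breakdown can be checked against the chart.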
Stack:
- TypeScript + Express + `@modelcontextprotocol/sdk` on Node.js 20 (Alpine Docker)
- Anthropic Claude (Haiku) for AI interpretation with exponential-backoff retry
- FHIR R4 via Axios with parallel resource fetching (`Promise.allSettled` for graceful degradation)
- SHARP Extension headers (`x-fhir-server-url`, `x-fhir-access-token`, `x-patient-id`); patient ID is preferred from headers to prevent LLM hallucination of fake IDs
- Deployed on Render with self-ping keep-alive
- Published on the Prompt Opinion Marketplace
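The exponential-backoff retry mentioned in the stack could be sketched roughly as follows. This is an assumption about the general pattern, not the project's implementation: `withBackoff`, `RetryOptions`, and the injectable `sleep` are hypothetical names, and the real retry limits and delays are not stated in this write-up.

```typescript
// Hypothetical helper: retry an async call, doubling the wait after each failure.
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // wait before the second attempt
  sleep?: Sleep;       // injectable so tests do not have to wait in real time
}

async function withBackoff<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  let lastError: unknown = new Error("withBackoff: maxAttempts must be at least 1");

  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait baseDelayMs, then 2x, 4x, ... but not after the final attempt.
      if (attempt < opts.maxAttempts - 1) {
        await sleep(opts.baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}
```

A call to the model would then be wrapped as `withBackoff(() => callModel(prompt), { maxAttempts: 4, baseDelayMs: 500 })`, where `callModel` stands in for whatever client function is actually used. A production version would likely also add jitter and only retry on errors that are known to be transient.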
The platform is the spiritual successor to CDS Hooks: the same FHIR ecosystem that Josh Mandel brought to every certified EHR in America, now extended to the agentic AI era through MCP.
## Challenges we ran into
- Patient ID hallucination: Early LLMs would invent FHIR patient IDs. We fixed this by making `FhirDataService.getPatientId()` strictly prefer the SHARP header over any tool argument.
- Render cold starts: Free tier spins down after 15 min (50s spin-up). We added a self-ping via `RENDER_EXTERNAL_URL` every 4 minutes, which is critical for judging windows.
- Polypharmacy deduplication: Patients can have the same drug in both `MedicationRequest` and `MedicationStatement`. We dedupe by RxNorm code before passing to the interaction checker.
- Graceful degradation: If a FHIR resource fetch fails, we don't fail the whole tool. `Promise.allSettled` lets us return partial summaries with clear "not available" markers.
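Three of the fixes above can be sketched in a few lines each. Only the `x-patient-id` header name, the RxNorm key, and the use of `Promise.allSettled` come from this write-up; the function names, types, and the fallback to the tool argument when the header is absent are assumptions made for illustration.

```typescript
type HeaderMap = Record<string, string | undefined>;

// Prefer the SHARP header; fall back to the tool argument only if the header is absent (assumed behavior).
function resolvePatientId(headers: HeaderMap, toolArg?: string): string {
  const fromHeader = headers["x-patient-id"]?.trim();
  if (fromHeader) return fromHeader;
  const fromArg = toolArg?.trim();
  if (fromArg) return fromArg;
  throw new Error("No patient ID in SHARP headers or tool arguments");
}

// Hypothetical flattened medication record.
interface MedicationEntry {
  rxNormCode: string;
  display: string;
  source: "MedicationRequest" | "MedicationStatement";
}

// Keep the first entry seen for each RxNorm code.
function dedupeByRxNorm(meds: MedicationEntry[]): MedicationEntry[] {
  const seen = new Map<string, MedicationEntry>();
  for (const med of meds) {
    if (!seen.has(med.rxNormCode)) seen.set(med.rxNormCode, med);
  }
  return Array.from(seen.values());
}

// Run all fetchers in parallel; a failed section becomes a marker instead of failing the tool.
async function fetchSections(
  fetchers: Record<string, () => Promise<unknown>>
): Promise<Record<string, unknown>> {
  const entries = Object.entries(fetchers);
  const settled = await Promise.allSettled(entries.map(([, fetcher]) => fetcher()));
  const summary: Record<string, unknown> = {};
  settled.forEach((outcome, i) => {
    summary[entries[i][0]] = outcome.status === "fulfilled" ? outcome.value : "not available";
  });
  return summary;
}
```

With this shape, a failed lab fetch leaves `summary.labs` set to `"not available"` while conditions and medications still come back, which matches the partial-summary behavior described above.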
## Accomplishments we're proud of
- 49 automated tests — 100% passing — covering every deterministic clinical calculation
- Zero vendor lock-in — works with any FHIR R4 endpoint (Epic, Cerner, HAPI, open-source). A rural community health center gets the same clinical intelligence as Cleveland Clinic.
- Audit-grade reasoning — every risk score has a point-by-point breakdown a physician can verify
- Production-ready — Dockerized, health monitoring, parallel queries, exponential backoff retry, clinical disclaimers on every response
- Synthetic data only — no PHI ever processed; demo uses HAPI FHIR sandbox + Synthea-generated patients
## What we learned
- Determinism is non-negotiable for clinical scoring. LLMs can interpret, but they cannot be trusted to calculate. The hybrid pattern is the only responsible architecture.
- MCP is the right primitive for clinical interoperability. Tool definitions map cleanly to clinical workflows.
- FHIR-native > vendor-specific. Certified EHRs in the US are now required to expose a FHIR R4 API. Building on FHIR means we're EHR-portable from day one.
- AI in healthcare needs guardrails baked into the architecture, not bolted on. Disclaimers, source attribution, and deterministic verification have to be in the response shape itself.
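One way to read "guardrails in the response shape" is a response envelope that cannot be constructed without its sources and disclaimer. The sketch below is an assumption about what such a shape might look like; the type names, field names, and the placeholder disclaimer text are all hypothetical.

```typescript
// Hypothetical envelope: every tool result carries its provenance and a disclaimer.
interface SourceRef {
  label: string;                 // e.g. a formula citation or guideline name
  kind: "formula" | "guideline";
}

interface ClinicalToolResponse<T> {
  result: T;
  computedBy: "deterministic" | "ai"; // tells the reader which layer produced the value
  sources: SourceRef[];
  disclaimer: string;
}

// Placeholder wording, not the project's actual disclaimer.
const PLACEHOLDER_DISCLAIMER = "Decision support only; clinical judgment is required.";

function buildResponse<T>(
  result: T,
  computedBy: "deterministic" | "ai",
  sources: SourceRef[]
): ClinicalToolResponse<T> {
  // Refuse to build a response that has no attribution at all.
  if (sources.length === 0) {
    throw new Error("Every clinical response must cite at least one source");
  }
  return { result, computedBy, sources, disclaimer: PLACEHOLDER_DISCLAIMER };
}
```

Because the disclaimer and sources are required fields of the type rather than text appended by each tool, a tool that forgets them fails at compile time or at construction, not in front of a clinician.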
## What's next
- Pediatric-specific CDS — weight-based dosing, age-adjusted reference ranges
- Integration with pharmacological databases for fully deterministic drug interaction checking (no LLM in the loop for high-risk interactions)
- PELD scoring for pediatric liver disease
- Multi-language support for global health equity — rural community clinics anywhere can plug in
- Per-tool-call pricing on the Prompt Opinion Marketplace ($0.01–$0.05/call) for sustainable scaling
Built on open standards — because clinical reasoning should be both deterministic and intelligent.
Built With
- anthropic-claude
- axios
- clinical-decision-support
- docker
- express.js
- fhir-r4
- hapi-fhir
- healthcare-ai
- model-context-protocol
- node.js
- render
- sharp-extensions
- smart-on-fhir
- synthea
- typescript