Inspiration

In the ICU, a single bedside monitor can fire 350+ alarms a day. Studies report that 72-99% of them are clinically insignificant: false positives from probe motion, brief vital fluctuations, or expected drug effects. Nurses learn to tune them out, and patients have died because a real alarm was missed. We wanted to build something that gives every alarm a defensible second opinion before a human even looks at it.

What it does

AlarmIQ is a SHARP-on-MCP server that triages ICU monitor alarms in a single tool call. Given a triggered alarm, it:

  1. Fetches the patient's last 4 hours of FHIR vital observations
  2. Pulls active medications and recent administrations (enriched with known vital-sign effects — opioids → SpO2↓, beta-blockers → HR↓, etc.)
  3. Computes alarm-recurrence statistics across the window
  4. Returns a structured decision: URGENT | MONITOR | LIKELY_ARTIFACT plus a cited rationale, recommended action, escalation flag, and confidence score
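
As a rough illustration of step 3, the recurrence statistics could be computed along these lines. This is a minimal sketch, not the project's actual code: the names RecurrenceStats and recurrence_stats, the (epoch_seconds, value) sample shape, and the low-limit-only breach rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class RecurrenceStats:
    """Hypothetical summary of threshold breaches in the lookback window."""
    event_count: int
    median_duration_s: float


def recurrence_stats(samples, low_threshold):
    """Count breach episodes and their median duration.

    samples: list of (epoch_seconds, value) tuples sorted by time (assumed shape).
    An episode starts at the first sample below low_threshold and ends at the
    next sample at or above it (or at the last sample if the series ends in breach).
    """
    durations = []
    start = None
    last_t = None
    for t, value in samples:
        breaching = value < low_threshold
        if breaching and start is None:
            start = t  # episode begins
        elif not breaching and start is not None:
            durations.append(t - start)  # episode ends
            start = None
        last_t = t
    if start is not None:
        durations.append(last_t - start)  # series ended mid-episode
    med = float(median(durations)) if durations else 0.0
    return RecurrenceStats(event_count=len(durations), median_duration_s=med)
```

Many short episodes with a small median duration point toward a nuisance pattern, while a single long episode does not; the interpretation itself is left to the reasoning step.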

A second tool, suggest_threshold_profile, recommends patient-specific alarm thresholds — gated on prior LIKELY_ARTIFACT history and clamped against hard physiological floors in code, not in the prompt.
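
To make "clamped in code, not in the prompt" concrete, here is a minimal sketch of how a suggested lower alarm limit could be bounded after the LLM responds. The function name clamp_threshold and the example floor values are hypothetical; only the 10% max-delta rule and the idea of a hard floor come from the description above, and the LIKELY_ARTIFACT history gate is not shown.

```python
def clamp_threshold(current, suggested, floor, max_delta_fraction=0.10):
    """Bound an LLM-suggested lower alarm limit.

    current:   the threshold in force today
    suggested: the value proposed by the model (untrusted)
    floor:     hard physiological floor for this vital (hypothetical value)
    """
    max_step = abs(current) * max_delta_fraction
    lowest, highest = current - max_step, current + max_step
    # First limit the change to +/- max_delta_fraction of the current value.
    bounded = min(max(suggested, lowest), highest)
    # Then enforce the floor, which wins over everything else.
    return max(bounded, floor)


# Example: SpO2 low alarm at 90, model suggests 80, illustrative floor of 85.
safe_value = clamp_threshold(current=90, suggested=80, floor=85)
```

Because the clamp runs after the model call, a bad or adversarial suggestion can at worst move the threshold by the allowed step and never below the floor.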

How we built it

  • Server: Python 3.11, FastAPI, MCP Streamable HTTP transport via the official MCP Python SDK (the mcp package)
  • Standards: SHARP-on-MCP FHIR Context Extension (ai.promptopinion/fhir-context) declared in the initialize response with five SMART-on-FHIR scopes
  • FHIR: Calls back to the Prompt Opinion (Po) workspace FHIR server using the injected X-FHIR-Server-URL, X-FHIR-Access-Token, and X-Patient-ID headers
  • Reasoning: Vendor-neutral LLM client supporting GitHub Models (GPT-4o-mini), Google Gemini, and Anthropic Claude — swappable via env var
  • Safety: Code-enforced physiological floors and 10% max-delta clamp on threshold suggestions, independent of LLM output
  • Eval suite: 15 hand-labeled ICU cases (6 MONITOR, 5 URGENT, 4 LIKELY_ARTIFACT) running through the real LLM via pytest. No mocking — the reasoning quality is the product.
  • Deploy: Render (free tier), GitHub auto-deploy on push to main
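
The FHIR callback in the list above can be sketched as follows. This is an illustration under assumptions, not the server's real code: build_vitals_request is a hypothetical helper, the token is assumed to be used as a Bearer credential, and the query uses the standard FHIR Observation search parameters (patient, category, date, _sort). It only builds the request and does not perform any HTTP call.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

REQUIRED_HEADERS = ("X-FHIR-Server-URL", "X-FHIR-Access-Token", "X-Patient-ID")


def build_vitals_request(headers, now, window_hours=4):
    """Return (url, request_headers) for the patient's recent vital signs.

    headers: mapping of incoming HTTP headers (matched case-insensitively)
    now:     timezone-aware datetime used as the end of the lookback window
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    missing = [h for h in REQUIRED_HEADERS if h.lower() not in lowered]
    if missing:
        raise ValueError("missing FHIR context headers: " + ", ".join(missing))

    base = lowered["x-fhir-server-url"].rstrip("/")
    since = now.astimezone(timezone.utc) - timedelta(hours=window_hours)
    query = urlencode({
        "patient": lowered["x-patient-id"],
        "category": "vital-signs",
        "date": "ge" + since.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "_sort": "date",
    })
    request_headers = {
        # Assumption: the injected token is presented as a Bearer credential.
        "Authorization": "Bearer " + lowered["x-fhir-access-token"],
        "Accept": "application/fhir+json",
    }
    return f"{base}/Observation?{query}", request_headers
```

Failing fast on a missing header keeps the tool from silently querying the wrong server or patient.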

Accomplishments that we're proud of

  • 100% safety precision and 100% exact-tier accuracy across the 15-case eval suite. Here, safety precision means that no case labeled URGENT was downgraded to MONITOR or LIKELY_ARTIFACT, the clinical metric that matters most.
  • ~1,260 input tokens per triage decision. Single LLM call, structured JSON output, 4–5 second latency.
  • Scope discipline. Four tools, not eighteen. Every tool earns its place.
  • Real published standard. SHARP-on-MCP is a public spec at sharponmcp.com, not vendor lock-in.
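
For readers who want to see how the two headline eval numbers relate, here is a minimal scoring sketch. The function name score_eval and the (expected, predicted) pair format are assumptions for illustration; the real suite runs through pytest against the live LLM and may be organized differently.

```python
def score_eval(results):
    """Score a list of (expected_tier, predicted_tier) pairs.

    Returns exact-tier accuracy plus the number of URGENT cases that were
    predicted as anything other than URGENT (the safety-critical failure).
    """
    total = len(results)
    exact = sum(1 for expected, predicted in results if expected == predicted)
    urgent = [(e, p) for e, p in results if e == "URGENT"]
    downgrades = sum(1 for _, predicted in urgent if predicted != "URGENT")
    return {
        "exact_tier_accuracy": exact / total if total else 0.0,
        "urgent_downgrades": downgrades,
        "urgent_retained": (len(urgent) - downgrades) / len(urgent) if urgent else 1.0,
    }
```

Tracking urgent_downgrades separately matters because a suite can score high on overall accuracy while still missing the one kind of error that is dangerous.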

Challenges we ran into

  • Free-tier LLM quotas. Gemini 2.5 Flash-Lite caps at 20 requests/day. Solved by adding a third LLM backend (GitHub Models / GPT-4o-mini) with independent quota.
  • 8K input cap on GitHub Models. Solved by downsampling vital-trend observations to ~30 points for LLM-facing responses while keeping the full series available for internal statistical computation.
  • Synthea bundles too sparse for ICU demos. We authored our own three-patient FHIR bundles with minute-granularity vital observations and explicit medication administration timestamps.
  • Distinguishing nuisance alarms from real ones. First prompt iteration over-escalated brief threshold dips. Adding alarm-recurrence statistics (event count, median duration) was the unlock.
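
The downsampling workaround for the 8K input cap could look roughly like this. It is a sketch of one simple approach (evenly spaced selection that keeps the first and last points); the function name downsample is hypothetical and the project may use a different strategy.

```python
def downsample(points, max_points=30):
    """Pick at most max_points evenly spaced items, keeping both endpoints."""
    if max_points < 2:
        raise ValueError("max_points must be at least 2")
    n = len(points)
    if n <= max_points:
        return list(points)
    # Integer arithmetic gives strictly increasing indices from 0 to n - 1.
    return [points[(i * (n - 1)) // (max_points - 1)] for i in range(max_points)]
```

Evenly spaced selection can skip a short dip or spike entirely, which is one reason to compute statistics on the full series and send only the reduced trend to the LLM.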

What we learned

  • Reasoning quality lives or dies on the input shape, not just the prompt. Adding structured recurrence statistics changed Scenario 2 from MONITOR (cautious miss) to LIKELY_ARTIFACT (correct).
  • A labeled eval suite that exercises the real LLM is a project's single most valuable artifact. Mocked tests pass forever and prove nothing.
  • SHARP-on-MCP makes "drop into any FHIR-aware agent" a real claim, not a marketing one. The headers do all the work.

What's next for AlarmIQ

  • Real-time alarm feed integration (HL7 v2 or SDC) instead of one-shot triage
  • Expanded eval suite to 50+ cases including pediatric and OB scenarios
  • Threshold profile auto-application after physician approval, with full audit trail
  • Multi-patient cohort view for charge nurses
