AlarmIQ

Inspiration

In the ICU, a single bedside monitor can fire 350+ alarms a day. Studies show 72-99% are clinically insignificant — false positives from probe motion, brief vital fluctuations, or expected drug effects. Nurses learn to ignore them. Patients have died because of it. We wanted to build something that gives every alarm a defensible second opinion before a human even looks at it.

What it does

AlarmIQ is a SHARP-on-MCP server that triages ICU monitor alarms in a single tool call. Given a triggered alarm, it:

Fetches the patient's last 4 hours of FHIR vital observations
Pulls active medications and recent administrations (enriched with known vital-sign effects — opioids → SpO2↓, beta-blockers → HR↓, etc.)
Computes alarm-recurrence statistics across the window
Returns a structured decision: URGENT | MONITOR | LIKELY_ARTIFACT plus a cited rationale, recommended action, escalation flag, and confidence score
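
The recurrence step can be illustrated with a minimal sketch. It assumes vitals arrive as time-sorted (seconds, value) pairs and that an alarm event is a contiguous run of samples below a low threshold; the function name, input shape, and duration definition are hypothetical, not taken from the AlarmIQ codebase.

```python
from statistics import median


def recurrence_stats(samples, low_threshold):
    """Count below-threshold events and their median duration.

    samples: list of (seconds_since_window_start, value), sorted by time.
    An event starts at the first breaching sample and ends at the first
    sample back at or above the threshold (assumed definition).
    """
    durations = []
    start = None   # time of the first sample in the current breach
    last_t = None  # time of the most recent breaching sample
    for t, value in samples:
        if value < low_threshold:
            if start is None:
                start = t
            last_t = t
        elif start is not None:
            durations.append(t - start)
            start = None
    if start is not None:
        # Breach still open at the end of the window.
        durations.append(last_t - start)
    return {
        "event_count": len(durations),
        "median_duration_s": median(durations) if durations else 0,
    }


# Two SpO2 dips below 90 in a five-minute window.
spo2 = [(0, 97), (60, 88), (120, 96), (180, 87), (240, 86), (300, 95)]
stats = recurrence_stats(spo2, low_threshold=90)
# -> {'event_count': 2, 'median_duration_s': 90.0}
```

Many short events with a low median duration point toward artifact, while a single long breach does not; that is the kind of signal these statistics are meant to hand to the LLM.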

A second tool, suggest_threshold_profile, recommends patient-specific alarm thresholds — gated on prior LIKELY_ARTIFACT history and clamped against hard physiological floors in code, not in the prompt.
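
A code-side clamp of this kind might look like the sketch below. The floor values, dictionary keys, and function name are illustrative assumptions (the writeup does not list AlarmIQ's actual floors), and the numbers are not clinical guidance. The 10% maximum change per suggestion is the figure quoted in the build notes.

```python
# Hypothetical floors for low-alarm limits; placeholders, not AlarmIQ's values.
HARD_FLOORS = {"spo2_low": 85.0, "hr_low": 40.0}
MAX_DELTA_FRACTION = 0.10  # a suggestion may move a threshold by at most 10%


def clamp_threshold(name, current, suggested):
    """Return a threshold that respects the max-delta rule and the hard floor."""
    max_step = abs(current) * MAX_DELTA_FRACTION
    # First limit how far the LLM's suggestion may move from the current value.
    clamped = min(max(suggested, current - max_step), current + max_step)
    # Then apply the hard floor, which always wins.
    floor = HARD_FLOORS.get(name)
    if floor is not None:
        clamped = max(clamped, floor)
    return clamped


# The LLM proposes dropping the SpO2 low alarm from 90 to 70.
# The delta clamp allows at most 81; the floor then raises it to 85.
safe_value = clamp_threshold("spo2_low", current=90.0, suggested=70.0)
```

Because the clamp runs after the model responds, a malformed or overly aggressive suggestion cannot push a limit past the floor regardless of what the prompt says.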

How we built it

Server: Python 3.11, FastAPI, MCP Streamable HTTP transport via the official MCP Python SDK (mcp)
Standards: SHARP-on-MCP FHIR Context Extension (ai.promptopinion/fhir-context) declared in the initialize response with five SMART-on-FHIR scopes
FHIR: Calls back to Po's workspace FHIR server using injected X-FHIR-Server-URL, X-FHIR-Access-Token, and X-Patient-ID headers
Reasoning: Vendor-neutral LLM client supporting GitHub Models (GPT-4o-mini), Google Gemini, and Anthropic Claude — swappable via env var
Safety: Code-enforced physiological floors and 10% max-delta clamp on threshold suggestions, independent of LLM output
Eval suite: 15 hand-labeled ICU cases (6 MONITOR, 5 URGENT, 4 LIKELY_ARTIFACT) running through the real LLM via pytest. No mocking — the reasoning quality is the product.
Deploy: Render (free tier), GitHub auto-deploy on push to main
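
As an illustration of the FHIR callback, the sketch below turns the three injected headers into a vital-signs search request for the last 4 hours. The header names come from the list above; the function name, the plain-dict header lookup, and the choice of standard FHIR search parameters (patient, category, date, _sort) are assumptions about how such a call could be built, not a description of AlarmIQ's code. No network request is made.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def build_vitals_request(headers, now=None, window_hours=4):
    """Build the URL and auth headers for an Observation search.

    headers: dict holding the injected SHARP-on-MCP context headers
    (exact-case keys assumed here; real HTTP headers are case-insensitive).
    """
    now = now or datetime.now(timezone.utc)
    since = now - timedelta(hours=window_hours)
    base = headers["X-FHIR-Server-URL"].rstrip("/")
    query = urlencode({
        "patient": headers["X-Patient-ID"],
        "category": "vital-signs",
        "date": "ge" + since.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "_sort": "date",
    })
    request_headers = {
        "Authorization": f"Bearer {headers['X-FHIR-Access-Token']}",
        "Accept": "application/fhir+json",
    }
    return f"{base}/Observation?{query}", request_headers


# Placeholder values for illustration only.
url, auth = build_vitals_request(
    {
        "X-FHIR-Server-URL": "https://fhir.example.org/r4/",
        "X-FHIR-Access-Token": "example-token",
        "X-Patient-ID": "pat-1",
    },
    now=datetime(2026, 4, 30, 12, 0, tzinfo=timezone.utc),
)
```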

Accomplishments that we're proud of

100% safety precision and 100% exact-tier accuracy across the 15-case eval suite. No URGENT case was downgraded to MONITOR or LIKELY_ARTIFACT, the one clinical metric that matters.
~1,260 input tokens per triage decision. Single LLM call, structured JSON output, 4–5 second latency.
Scope discipline. Four tools, not eighteen. Every tool earns its place.
Real published standard. SHARP-on-MCP is a public spec at sharponmcp.com, not vendor lock-in.
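
For readers who want to reproduce this kind of scoring, here is a minimal sketch that computes exact-tier accuracy and counts URGENT downgrades from (expected, predicted) label pairs. The function name and input shape are hypothetical, and the sketch does not claim to match how "safety precision" is defined in the AlarmIQ eval suite; the sample data is a toy example, not the project's results.

```python
def score_cases(cases):
    """cases: list of (expected_tier, predicted_tier) pairs."""
    exact = sum(1 for expected, predicted in cases if expected == predicted)
    # A downgrade is any URGENT case predicted as a lower tier.
    downgrades = sum(
        1 for expected, predicted in cases
        if expected == "URGENT" and predicted != "URGENT"
    )
    return {
        "exact_tier_accuracy": exact / len(cases),
        "urgent_downgrades": downgrades,
    }


# Toy data: one MONITOR case over-escalated, one URGENT case missed.
toy = [
    ("URGENT", "URGENT"),
    ("URGENT", "MONITOR"),
    ("MONITOR", "URGENT"),
    ("LIKELY_ARTIFACT", "LIKELY_ARTIFACT"),
]
report = score_cases(toy)
# -> {'exact_tier_accuracy': 0.5, 'urgent_downgrades': 1}
```

Tracking downgrades separately from accuracy keeps over-escalation (annoying but safe) distinct from a missed URGENT alarm (unsafe).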

Challenges we ran into

Free-tier LLM quotas. Gemini 2.5 Flash-Lite caps at 20 requests/day. Solved by adding a third LLM backend (GitHub Models / GPT-4o-mini) with independent quota.
8K input cap on GitHub Models. Solved by downsampling vital-trend observations to ~30 points for LLM-facing responses while keeping the full series available for internal statistical computation.
Synthea bundles too sparse for ICU demos. We authored our own three-patient FHIR bundles with minute-granularity vital observations and explicit medication administration timestamps.
Distinguishing nuisance alarms from real ones. First prompt iteration over-escalated brief threshold dips. Adding alarm-recurrence statistics (event count, median duration) was the unlock.
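
The downsampling workaround can be sketched as an evenly spaced pick of about 30 points that always keeps the first and last observation. The writeup does not say which downsampling method AlarmIQ uses, so the uniform-stride approach and the function name here are assumptions.

```python
def downsample(points, target=30):
    """Return at most `target` evenly spaced points, keeping both endpoints."""
    n = len(points)
    if n <= target or target < 2:
        return list(points)
    step = (n - 1) / (target - 1)
    # A set guards against duplicate indices after rounding.
    indices = sorted({round(i * step) for i in range(target)})
    return [points[i] for i in indices]


# 4 hours of minute-granularity readings -> 240 samples -> 30 for the LLM.
series = [(minute, 95 + (minute % 3)) for minute in range(240)]
llm_view = downsample(series)
```

A uniform stride can skip over a brief spike, which is one reason to compute recurrence statistics on the full series and send only the reduced view to the model.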

What we learned

Reasoning quality lives or dies on the input shape, not just the prompt. Adding structured recurrence statistics changed Scenario 2 from MONITOR (cautious miss) to LIKELY_ARTIFACT (correct).
A labeled eval suite that exercises the real LLM is a project's single most valuable artifact. Mocked tests pass forever and prove nothing.
SHARP-on-MCP makes "drop into any FHIR-aware agent" a real claim, not a marketing one. The headers do all the work.

What's next for AlarmIQ

Real-time alarm feed integration (HL7 v2 or SDC) instead of one-shot triage
Expanded eval suite to 50+ cases including pediatric and OB scenarios
Threshold profile auto-application after physician approval, with full audit trail
Multi-patient cohort view for charge nurses

Built With

claude
clinical-decision-support
docker
fastapi
fhir
gemini
gpt-4o-mini
healthcare
mcp
pytest
python
render
sharp-on-mcp

Updates

Janindu Manjuka started this project — Apr 30, 2026 05:32 AM EDT
