-
-
Full node map — Intent, Router, O1/O2/O3 layers, Alert Scanner, Document Node, 2 Human Interrupts.
-
-
18 MCP tools. One triggers the full 69-node pipeline. 17 handle direct FHIR queries. One query does it all.
-
Build an orchestrator for kidney monitoring in o2 with 2 workers
-
Build a worker called sepsis_detector that monitors for signs of sepsis including fever, high heart rate, and elevated WBC. Put it in o2.
-
-
One question. Full chart review. NOT SAFE FOR DISCHARGE — opioid accumulation risk caught in seconds.
-
Human interrupt in action. System pauses mid-analysis to confirm a critical drug finding before proceeding.
-
Save note about this patient: Antonia has CKD stage 3b, Percocet contraindicated, nephrology consult ordered today
-
-
-
List all custom workers
-
-
43 conditions. 6 critical flags. Opioid risk, CKD, polypharmacy — all surfaced from one plain-English query.
-
Percocet contraindicated in seizure disorder. System flags it, explains why, and asks doctor to confirm.
-
Doctor confirms. System resumes — full action plan, contraindication table, lab orders, and documents generated.
-
Session persistence, graceful degradation, 7 deterministic calculators, fuzzy search, empathy wrapper. Built for production.
-
-
One query triggers a 6-stage pipeline. Re-entrant router allows mid-flight changes. Clinical math is Python — LLMs interpret only.
-
17 parallel O1 workers. All fire simultaneously via asyncio. Fetch time = slowest worker, not sum of all.
-
O2 — 12 LLM clinical specialists: drug interaction, risk scoring, dosage calc, trend analysis, differential diagnosis, prognosis.
-
O3 answers "what next?" — triage, discharge, referral, escalation. Clinical reasoning into concrete actions.
-
7 calculators, pure Python — eGFR, qSOFA, CURB-65, NEWS2, CHA2DS2-VASc, HAS-BLED, BMI. Numbers guaranteed correct. LLMs interpret only.
-
5 alerts fired for Antonia — creatinine critical, Percocet contraindicated, CKD trend rising. Runs on every query. Cannot be skipped.
-
Alert: 11 rule-based workers, zero LLM. Document: 9 specialists — discharge notes, referrals, prescriptions. O1 data. O2 meaning. O3 action.
-
Three demo patients across complexity levels. Antonia exercises every layer of the system.
MedicsHelper — Devpost Submission Story
Inspiration
A doctor has roughly 2 minutes per patient during a busy ward round. In those 2 minutes, they are expected to review a full chart, check for drug interactions, assess discharge readiness, catch critical lab values, and document everything — across dozens of patients, back to back.
Clinical AI today doesn't solve this. It either fetches data or answers questions. Nothing reasons across labs, medications, and conditions simultaneously. Nothing catches what the doctor missed and then acts on it. The tools that exist are either passive information retrievers or narrow single-purpose chatbots.
The inspiration was simple: what if a doctor could type one plain-English question and get back a response that felt like a senior colleague had just reviewed the full chart, flagged every risk, and handed over a structured plan — in under 45 seconds?
That question became MedicsHelper.
What It Does
MedicsHelper is a 69-node hierarchical agentic AI system for clinical decision support. A doctor types one question in any language. The system:
- Reads the patient's full chart via live FHIR R4 — labs, medications, conditions, vitals, allergies, imaging, appointments, procedures — fetched in parallel across up to 17 specialized data workers
- Reasons clinically across all of that data simultaneously — drug interactions, risk scoring, dosage safety, trend analysis, prognosis, complication detection — using 12 dedicated LLM-powered reasoning workers
- Decides workflow actions — discharge readiness, referrals, escalation, care gaps, follow-up scheduling — via 10 workflow workers
- Scans for critical values using an always-on alert scanner with 10 rule-based workers that run on every query, zero LLM, zero hallucination risk
- Generates clinical documents — discharge summaries, referral letters, medication reconciliation, patient-facing explanations — via 8 document workers
- Returns a response that reads like a senior colleague talking, not a data dump
- Extends itself — via the AI Builder (O0), doctors or admins can define new monitoring workers in plain English. The system writes, registers, and loads them at runtime.
Antonia example: A 48-year-old patient with epilepsy, chronic pain, and Percocet. Creatinine came back at 2.9. The doctor asked one question: "Is it safe to discharge her today?"
MedicsHelper caught that her eGFR was 24 mL/min (calculated in pure Python via Cockcroft-Gault, not LLM math), that oxycodone accumulates dangerously at that kidney function, that her current dose was double the safe maximum, and that discharge would risk respiratory depression. It returned a structured hold recommendation, adjusted dosing plan, nephrology referral timeline, and full discharge summary — in 33 seconds.
How We Built It
The Core Architecture
The system is built on LangGraph StateGraph with a shared AgentState TypedDict. Every node reads from and writes to this state. The central innovation is the XML Work Order — a shared blackboard that every node in the pipeline reads and writes to. No node talks to another directly. This keeps the system loosely coupled and fully auditable.
<work_order>
<understanding>
<intent>discharge_assessment</intent>
<urgency>high</urgency>
</understanding>
<orchestrators>
<o1 status="complete" iteration="1">
<worker name="labs" status="complete" enabled="true"/>
<worker name="medications" status="complete" enabled="true"/>
</o1>
<o2 status="pending">
<worker name="drug_interaction" status="pending" enabled="true"/>
</o2>
</orchestrators>
</work_order>
The Re-entrant Router
The most technically significant piece is the Universal Router — a re-entrant hub that is called multiple times per query. The routing decision for each orchestrator follows three conditions:
route to O_N iff needs_oN AND NOT oN_complete AND iteration < MAX
When a doctor gives a mid-execution instruction like "actually, fetch only the kidney tests from the last 48 hours", the Interrupt LLM reads the current XML, modifies it (disables most workers, sets a time filter on the labs worker, resets o1_complete = False), and the router sends execution back to O1 for a focused re-run. Each orchestrator is limited to 10 iterations to prevent infinite loops.
This is what separates MedicsHelper from a simple sequential LangGraph pipeline. It's dynamic clinical reasoning, not fixed data fetching.
The Three Orchestration Layers
O1 — Patient Data (17 workers, no LLM): Fires all selected data workers in parallel via asyncio.gather(). Five workers complete in ~4 seconds versus ~20 seconds sequentially.
O2 — Clinical Reasoning (12 workers, LLM-powered): Each worker is a focused specialist — drug interaction checker, risk scorer, dosage calculator, trend analyzer, complication detector, differential diagnosis, prognosis estimator, and more.
O3 — Workflow (10 workers, mixed): Translates clinical findings into actionable decisions — discharge readiness, escalation, triage, referral, care gap detection, follow-up scheduling.
Clinical Math — Never LLM
All clinical calculations run through deterministic Python functions in calculations.py. LLMs interpret results; they never perform the arithmetic.
| Calculator | Formula |
|---|---|
| eGFR | Cockcroft-Gault: $\text{eGFR} = \frac{(140 - \text{age}) \times \text{weight}}{72 \times \text{SCr}} \times (0.85 \text{ if female})$ |
| qSOFA | $\text{qSOFA} = [\text{RR} \geq 22] + [\text{SBP} \leq 100] + [\text{GCS} < 15]$ |
| NEWS2 | Weighted composite of 7 physiological parameters |
| CURB-65 | 5-point pneumonia severity score |
| $\text{CHA}_2\text{DS}_2\text{-VASc}$ | Stroke risk in atrial fibrillation |
| HAS-BLED | Bleeding risk assessment |
FHIR Integration
The FHIR client reads /metadata on startup and auto-discovers all resource types, search parameters, and capabilities. It has been verified against the HAPI FHIR public server — 146 resource types discovered, $everything operation confirmed, 57 patient-linked resource types mapped. Compatible with any R4-compliant server: HAPI, Epic, Cerner, Azure, AWS.
MCP Integration with Prompt Opinion
The system exposes 18 MCP tools. One primary tool triggers the full 69-node pipeline. 17 direct FHIR tools handle targeted queries. The MCP server reads SHARP context from Prompt Opinion headers — FHIR base URL, access token, patient ID — injected automatically on every call. No manual configuration per query.
Graceful Degradation
Every critical path has a fallback:
| Component | Failure | Fallback |
|---|---|---|
| Intent Node LLM | API failure | Smart keyword parser (12 intent patterns) |
| O1 worker | FHIR timeout | Others continue; error summary returned |
| O2 worker | LLM failure | Exponential backoff, then partial results |
| Document worker | Not registered | _fallback_generate() via LLM |
| Synthesis LLM | Fails | _fallback_synthesis() from alert counts |
| XML parsing | Malformed | WorkOrder._empty() skeleton |
The pipeline never fully fails. Worst case is partial results with a clear error message.
Challenges We Ran Into
The re-entrant router was the hardest problem. Making a LangGraph graph that can route back to a previously completed node without creating infinite loops required careful iteration tracking at both the state level and the XML level. The solution was to track iterations in two places — AgentState fields and XML iteration attributes — and check both before routing.
Clinical math correctness. The temptation was to let the LLM calculate eGFR and other scores since it "knows" the formulas. But LLM arithmetic is unreliable, especially at edge cases — very low creatinine, pediatric weights, extreme ages. Every calculator had to be implemented as deterministic Python and tested against known clinical references.
Worker selection without running everything. Running all 35+ workers on every query wastes time and tokens. The Intent Node had to learn to select only the workers relevant to the query type. A drug interaction check doesn't need imaging workers. A population search doesn't need O1 at all. Getting this selection right required carefully designed prompting and a fallback keyword parser that could make reasonable selections without any LLM call.
Human interrupt session persistence. LangGraph interrupts work within a single execution, but Prompt Opinion calls the MCP server fresh each time. Maintaining interrupt state across separate API calls — so the system could pause, return a question, and resume exactly where it stopped — required explicit session state management outside the graph.
FHIR data variability. Real FHIR data is messy. Lab results have inconsistent coding. Medication records use different terminologies. Conditions appear in multiple coding systems. Every O1 worker had to handle missing fields, null values, and unexpected resource shapes without crashing the pipeline.
Accomplishments We're Proud Of
The re-entrant router works. Mid-execution plan modification — disabling workers, changing time filters, forcing re-runs — all works correctly with no infinite loops. This is genuinely novel in a hackathon project.
The clinical math is correct. eGFR, qSOFA, NEWS2, CURB-65, CHA₂DS₂-VASc, HAS-BLED — all implemented, tested against reference values, and integrated into the O2 reasoning layer. The system never hallucinates a number.
The alert scanner runs on every query. Not just when explicitly asked. It scanned Antonia's creatinine at 2.9, flagged it as critical independently of the O2 reasoning workers, and would have caught it even if every LLM call had failed.
The empathy wrapper makes the response feel human. The final response for Antonia doesn't read like a JSON dump. It reads like a colleague who just reviewed the chart and wants to talk you through it. That's a product decision as much as a technical one.
End-to-end tests pass on live FHIR data. Not mock data. Real patient records from the HAPI FHIR public server, three distinct clinical scenarios, all passing.
The Self-Modifying System
After the core pipeline was complete, we added one more capability that changed the system's nature entirely: the AI Builder (O0).
A doctor can now type: "Build a worker called sepsis_detector that monitors fever, heart rate, and WBC. Put it in o2. Admin: admin/admin_123"
The system generates a new Python worker file, registers it, verifies it imports correctly, and on the next query — sepsis_detector is automatically loaded into O2 alongside the default workers.
This makes MedicsHelper not just a fixed clinical AI but a self-extending platform. Hospitals can add their own monitoring protocols without touching the codebase.
What We Learned
Hierarchical agents outperform flat agents on complex tasks. A single LLM asked "is it safe to discharge this patient?" will miss things. Twelve specialized workers each focused on one clinical dimension, running in parallel, catches far more — and each result is traceable.
The blackboard pattern solves the coordination problem. Shared XML state that every node reads and writes to, with no direct node-to-node communication, made the system debuggable, extensible, and auditable in ways that a pure LangGraph state dict wouldn't have.
Deterministic math should never be delegated to LLMs in clinical contexts. This seems obvious in retrospect but required active discipline. Every time an O2 worker needed a calculated value, the reflex was to ask the LLM. The correct answer was always to call calculations.py.
Graceful degradation is a first-class design requirement. A clinical system that crashes on an LLM timeout is dangerous. Every failure mode had to be designed for explicitly, not handled reactively.
Prompt Opinion's SHARP context injection is genuinely powerful. Having the FHIR URL, access token, and patient ID injected automatically into every MCP call — without any manual plumbing — made the integration feel like a native capability rather than a bolted-on connector.
What's Next for MedicsHelper
Real hospital FHIR integration. HAPI is a test server. The architecture is already compatible with Epic and Cerner — the next step is a pilot integration with an R4-compliant production server.
Expanded clinical calculators. The current seven cover the most common acute scenarios. Cardiology (GRACE, TIMI), neurology (NIHSS), and oncology (performance status scoring) are the next targets.
Population-scale discharge planning. The architecture already supports population queries. A ward-level discharge planner — scanning all admitted patients, ranking by discharge readiness, generating a prioritized list — is one configuration change away.
Voice input. The system already handles any language. Adding speech-to-text at the input layer would let a doctor speak a question during a ward round and get the answer read back — without touching a screen.
Longitudinal patient tracking. Right now each query is stateless across sessions. Adding a lightweight patient timeline — tracking how risk scores change across visits — would enable silent deterioration detection across days, not just within a single query.
Validation study. The most important next step is a structured evaluation against real clinical decisions — comparing MedicsHelper's recommendations to what a senior clinician would have caught. That's what turns a hackathon project into a tool that belongs in a hospital.
⚠️ How to Use MedicsHelper (Important for Judges)
To experience MedicsHelper correctly, please use it inside Prompt Opinion BYO Agent.
MedicsHelper is NOT a normal chatbot — it runs a full Hierarchical Agentic pipeline through MCP tools. If configured incorrectly, it will fall back to default tools and the system will NOT behave as intended.
Required Setup:
Go to BYO Agents → Add MedicsHelper
Open Edit → Tools tab → Enable: "Disable Community MCP Server"
Use the official prompts from the repository: Visit the GitHub repo and navigate to:
/byo-prompts/
- system_prompt.txt
- consultation_prompt.txt
Use these prompts exactly in:
- System Prompt
- Consultation Prompt
These prompts are required to activate the full MCP pipeline (all 18 tools).
⚠️ Important:
run_clinical_assessmentis not a simple tool — it is the entry point to the full HAA system- It orchestrates all 69 nodes across data, reasoning, safety, and workflow layers
- Without the correct prompts, this orchestration will NOT trigger
⚠️ Without proper setup:
- MCP tools will not be used
- Clinical pipeline will not execute
- Safety checks and drug interaction analysis will be skipped
- Output will degrade to a basic LLM response
Full Setup & Source Code:
https://github.com/off-rkv/MedicsHelper---Hierarchical-Agentic-AI-for-Clinical-Decision-Support
We strongly recommend using the repo prompts to see the system’s full capability.
Log in or sign up for Devpost to join the conversation.