MedicsHelper Hierarchical Agentic AI for Clinical Decision

off rkv posted an update — May 23, 2026 12:29 AM EDT

Post-submission update -- capabilities added after the video was recorded, with technical notes for those reviewing the architecture in depth.

AI BUILDER (O0) -- Runtime Self-Extension

A doctor types: "Build a worker called sepsis_detector that monitors fever, high heart rate, and elevated WBC. Put it in o2. Admin: admin/admin_123"

What happens:

1. Intent node classifies: ai_builder intent
2. Universal Router bypasses O1/O2/O3 entirely, routes to O0
3. Admin credentials verified against server .env
4. LLM generates a Python worker body from a typed template
5. Safety filter rejects subprocess, eval, exec, open() calls
6. File written to agent/workers/custom/ with UTF-8 encoding
7. registry.json updated with name, layer, description, file path
8. On the next O2 query: orchestrator reads registry, dynamically imports via importlib, fires alongside 13 default workers
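
For reviewers who want to see what the safety filter step might look like, here is a minimal sketch using Python's ast module. It is an illustration, not the code running in MedicsHelper: the function name find_violations and the two deny-lists are assumptions, built only from the four items named above.

```python
import ast

# Hypothetical deny-lists, taken from the items the post names.
BANNED_CALLS = {"eval", "exec", "open"}
BANNED_MODULES = {"subprocess"}


def find_violations(source: str) -> list[str]:
    """Return human-readable violations found in generated worker code."""
    violations = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # Catches "import subprocess" and "import subprocess.x"
            for alias in node.names:
                if alias.name.split(".")[0] in BANNED_MODULES:
                    violations.append(f"import of {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            # Catches "from subprocess import run"
            if node.module and node.module.split(".")[0] in BANNED_MODULES:
                violations.append(f"import from {node.module}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Catches direct calls such as eval(...), exec(...), open(...)
            if node.func.id in BANNED_CALLS:
                violations.append(f"call to {node.func.id}()")
    return violations
```

A worker would be written to disk only when find_violations returns an empty list. A name-based check like this does not catch indirect access (for example via getattr or __import__), so it should be treated as one layer of defense rather than a sandbox.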

No code changes. No server restart. The running system extends itself through natural language. Supported commands: build, remove, remove all, list, explain architecture, save/show patient notes. Screenshots of each in the gallery.
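
The "no restart" part comes down to loading worker files by path at query time. A rough sketch of that idea with importlib follows. The registry layout (a JSON list of entries with "name", "layer", "description", "file" keys) and the run entry point are assumptions made for the example; the real registry.json may be shaped differently.

```python
import importlib.util
import json
from pathlib import Path


def load_custom_workers(registry_path: Path, layer: str) -> dict:
    """Import every registered worker for `layer` and return {name: run_fn}."""
    entries = json.loads(registry_path.read_text(encoding="utf-8"))
    workers = {}
    for entry in entries:
        if entry["layer"] != layer:
            continue
        # Build a module object straight from the file path, no package needed
        spec = importlib.util.spec_from_file_location(entry["name"], entry["file"])
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Assumed convention: each worker file exposes a run(patient) function
        workers[entry["name"]] = module.run
    return workers
```

Because the registry is re-read on each call, a worker added a second ago is picked up by the next query without touching the running process.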

FOR THOSE REVIEWING FHIR STANDARDS DEPTH

The FHIR client calls /metadata on first connection and parses the full CapabilityStatement. On HAPI public server: 146 resource types discovered, $everything confirmed on Patient, 57 patient-linked resource types mapped with their search parameters (patient= vs subject= parameter distinction handled per resource).
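
To make the patient= vs subject= distinction concrete, here is a small sketch that walks a parsed CapabilityStatement (rest[].resource[].searchParam[].name, as defined by FHIR) and records which parameter links each resource type to a patient. The function name and the choice to prefer patient over subject when both exist are assumptions for illustration.

```python
def patient_link_params(capability: dict) -> dict[str, str]:
    """Map each resource type to 'patient' or 'subject', preferring 'patient'."""
    links = {}
    for rest in capability.get("rest", []):
        for resource in rest.get("resource", []):
            names = {p.get("name") for p in resource.get("searchParam", [])}
            if "patient" in names:
                links[resource["type"]] = "patient"
            elif "subject" in names:
                links[resource["type"]] = "subject"
            # Resource types with neither parameter are not patient-linked
    return links
```

With a map like this, a query for a resource type can be built as ResourceType?patient=ID or ResourceType?subject=ID depending on what the server advertises.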

When a server returns 422 on /metadata -- as Prompt Opinion's internal FHIR server does -- the client falls back to a hardcoded 57-resource minimal capability set and continues without interruption. This is the pattern Josh Mandel described in his SMART on FHIR work: graceful degradation when CapabilityStatement is unavailable.
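
A minimal sketch of that fallback decision, assuming the HTTP status and parsed body are already in hand. FALLBACK_RESOURCE_TYPES is shortened to three entries here purely for illustration (the real set has 57), and the function name and return shape are hypothetical.

```python
# Illustrative subset only; the actual fallback set described above has 57 entries.
FALLBACK_RESOURCE_TYPES = ("Patient", "Observation", "Condition")


def resolve_capabilities(status_code: int, body) -> dict:
    """Use the server's CapabilityStatement when valid, else the fallback set."""
    is_statement = (
        status_code == 200
        and isinstance(body, dict)
        and body.get("resourceType") == "CapabilityStatement"
    )
    if is_statement:
        types = [
            res["type"]
            for rest in body.get("rest", [])
            for res in rest.get("resource", [])
        ]
        return {"source": "server", "resource_types": types}
    # Any non-200 status (such as 422) or malformed body degrades gracefully
    return {"source": "fallback", "resource_types": list(FALLBACK_RESOURCE_TYPES)}
```

Recording the source alongside the resource list makes it easy to surface in logs whether a session ran on discovered or hardcoded capabilities.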

SHARP context (X-FHIR-Server-URL, X-FHIR-Access-Token, X-Patient-ID) is read from MCP request headers on every call. Token propagation is separated into three layers: MCP extraction, FHIRClient class-level shared token, per-worker cache keyed on base_url + token prefix. Architecture is compatible with Epic, Cerner, Azure Health Data Services, and AWS HealthLake.
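
The extraction and cache-key layers could look roughly like the sketch below. The header names are the ones listed above; the function names, the returned dict shape, and the 8-character prefix length are assumptions made for the example.

```python
def extract_sharp_context(headers: dict) -> dict:
    """Pull the SHARP values out of request headers (case-insensitive)."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return {
        "base_url": lowered["x-fhir-server-url"].rstrip("/"),
        "token": lowered.get("x-fhir-access-token", ""),
        "patient_id": lowered.get("x-patient-id", ""),
    }


def cache_key(base_url: str, token: str, prefix_len: int = 8) -> str:
    """Key a per-worker cache on server URL plus a short token prefix."""
    # prefix_len is a hypothetical choice; only a prefix is kept so the
    # full access token never appears in cache keys or logs
    return f"{base_url}|{token[:prefix_len]}"
```

Keying on the token prefix means a new token for the same server gets a fresh cache entry, while repeated calls within one session reuse the existing one.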

FOR THOSE FOCUSED ON ICU AND CRITICAL CARE WORKFLOWS

The system includes a qSOFA calculator (pure Python, zero LLM) that flags sepsis risk on every assessment regardless of the query type. The alert scanner runs 10 rule-based workers on every query -- critical electrolytes, deterioration trends, medication accumulation, allergy violations -- with no LLM involvement. Because these checks are deterministic, they carry no hallucination risk, which matters most in ICU settings.
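
For reference, qSOFA is small enough to show in full. The sketch below uses the standard criteria (respiratory rate of 22/min or more, systolic blood pressure of 100 mmHg or less, altered mentation taken as GCS below 15), with a score of 2 or more treated as a positive screen. The function names are illustrative, not the ones in the MedicsHelper codebase.

```python
def qsofa(respiratory_rate: float, systolic_bp: float, gcs: int) -> int:
    """Return the qSOFA score (0-3) from three bedside measurements."""
    score = 0
    if respiratory_rate >= 22:   # breaths per minute
        score += 1
    if systolic_bp <= 100:       # mmHg
        score += 1
    if gcs < 15:                 # altered mentation
        score += 1
    return score


def sepsis_flag(score: int) -> bool:
    """A qSOFA score of 2 or more is a positive screen."""
    return score >= 2
```

For example, a patient with a respiratory rate of 24, systolic pressure of 95, and GCS of 15 scores 2 and is flagged.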

The re-entrant router allows mid-execution modification: if an intensivist says "focus only on kidney labs from the last 48 hours," the Interrupt LLM modifies the XML Work Order, resets O1's completion flag, and the router re-runs O1 with a focused instruction on the labs worker. Previous O2 results are preserved.
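
A toy version of that re-entry logic is sketched below. Everything about the shapes here is assumed for illustration: the Work Order XML schema (layer and worker elements), the state keys (o1_done, o2_results, and so on), and both function names. The real Work Order format is not shown in this post.

```python
import xml.etree.ElementTree as ET


def apply_interrupt(state: dict, worker: str, instruction: str) -> dict:
    """Replace O1's tasks with one focused task and mark O1 as not done."""
    root = ET.fromstring(state["work_order_xml"])
    o1 = root.find("layer[@name='o1']")
    for child in list(o1):
        o1.remove(child)                      # drop the old O1 tasks
    task = ET.SubElement(o1, "worker", {"name": worker})
    task.text = instruction                   # the focused instruction
    new_state = dict(state)                   # other keys (o2_results) carry over
    new_state["work_order_xml"] = ET.tostring(root, encoding="unicode")
    new_state["o1_done"] = False              # forces the router back to O1
    return new_state


def next_layer(state: dict) -> str:
    """Route to the first layer whose completion flag is not set."""
    for layer in ("o1", "o2", "o3"):
        if not state[f"{layer}_done"]:
            return layer
    return "done"
```

Since only O1's flag is reset, the router returns to O1 and then continues past O2, whose earlier results are still in the state.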

FOR THOSE FOCUSED ON PEDIATRIC AND AGENTIC EHR WORKFLOWS

The architecture is built around the same principle as CHIPPER and Epic's agentic direction: a clinician assigns a task in natural language, agents handle it, and a human stays in the loop. MedicsHelper has two explicit human-in-the-loop checkpoints (HI1 after data fetch, HI2 after reasoning) where the system pauses and asks the doctor to confirm before proceeding.

For pediatric dosing: the dosage_calculation worker flags adult-dose medications when the patient is under 18. The Schwartz formula for pediatric eGFR (vs Cockcroft-Gault creatinine clearance for adults) is the next planned addition to the deterministic calculator layer.
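
Since the Schwartz calculator is planned rather than shipped, the following is only a sketch of what the age-based selection could look like. It uses the bedside Schwartz equation (0.413 x height in cm / serum creatinine in mg/dL) and the standard Cockcroft-Gault equation, which estimates creatinine clearance rather than eGFR. Function names and the under-18 cutoff for switching formulas are assumptions that mirror the dosing rule above.

```python
def schwartz_egfr(height_cm: float, creatinine_mg_dl: float) -> float:
    """Bedside Schwartz estimate, in mL/min/1.73 m^2."""
    return 0.413 * height_cm / creatinine_mg_dl


def cockcroft_gault(age_years: float, weight_kg: float,
                    creatinine_mg_dl: float, female: bool) -> float:
    """Cockcroft-Gault creatinine clearance, in mL/min."""
    clearance = (140 - age_years) * weight_kg / (72 * creatinine_mg_dl)
    return clearance * 0.85 if female else clearance


def renal_estimate(age_years: float, weight_kg: float, height_cm: float,
                   creatinine_mg_dl: float, female: bool) -> float:
    """Pick the pediatric or adult formula by age (cutoff is an assumption)."""
    if age_years < 18:
        return schwartz_egfr(height_cm, creatinine_mg_dl)
    return cockcroft_gault(age_years, weight_kg, creatinine_mg_dl, female)
```

Keeping both formulas in the deterministic layer means the LLM workers only ever see the computed value, consistent with the "never arithmetic" rule for reasoning workers.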

The AI Builder pattern -- where a clinician at a children's hospital could say "build a worker that tracks weight-for-age z-scores for patients under 5" and the system deploys it into the live pipeline -- is the direction this architecture is designed for.

FOR THOSE EVALUATING CLINICAL AI SAFETY AND HALLUCINATION RISK

MedicsHelper separates three distinct components with different hallucination risk profiles:

Rule-based alert scanner -- zero LLM, deterministic rules, always runs
Deterministic calculators -- pure Python (eGFR, qSOFA, NEWS2, CURB-65, CHA2DS2-VASc, HAS-BLED, BMI)
LLM reasoning workers -- clinical interpretation only, never arithmetic

The system is honest about data incompleteness. If the FHIR server returns 3 medications for a patient with 43 conditions, the drug interaction worker flags the mismatch explicitly: "Medication count inconsistent with condition complexity. Manual reconciliation recommended."

FOR THOSE EVALUATING AI INFRASTRUCTURE AND SCALABILITY

The LangGraph StateGraph with MemorySaver handles interrupt state across stateless MCP calls. Session persistence serializes the full AgentState to disk at interrupt (patient ID as key, 30-minute TTL). The resume call loads that state and routes directly to O3, skipping the completed O1/O2. A full pipeline run that takes 37 seconds resumes in under 3 seconds.
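
The disk persistence with a TTL can be illustrated independently of LangGraph. The sketch below assumes the state is JSON-serializable and that the patient ID is safe to use as a file name; the file layout, function names, and the injectable now parameter (added so the expiry can be exercised without waiting) are all hypothetical.

```python
import json
import time
from pathlib import Path

TTL_SECONDS = 30 * 60  # 30-minute TTL, as described above


def save_session(directory: Path, patient_id: str, state: dict, now=None) -> Path:
    """Write the state to <directory>/<patient_id>.json with a timestamp."""
    now = time.time() if now is None else now
    path = directory / f"{patient_id}.json"
    path.write_text(json.dumps({"saved_at": now, "state": state}), encoding="utf-8")
    return path


def load_session(directory: Path, patient_id: str, now=None):
    """Return the saved state, or None if it is missing or past its TTL."""
    now = time.time() if now is None else now
    path = directory / f"{patient_id}.json"
    if not path.exists():
        return None
    record = json.loads(path.read_text(encoding="utf-8"))
    if now - record["saved_at"] > TTL_SECONDS:
        path.unlink()  # expired sessions are removed on access
        return None
    return record["state"]
```

On resume, a non-None result would carry the completion flags for O1 and O2, which is what lets the router jump straight to O3. In a multi-instance deployment the same two functions map naturally onto Redis SET with an expiry and GET.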

Current deployment: Azure B2als_v2, 2 vCPU, 4GB RAM, Groq LPU for inference (~500 tokens/sec). Bottleneck is LLM API latency, not compute. Horizontal scaling path: stateless MCP endpoints + shared Redis for sessions + LLM gateway with rate-limit-aware queuing.

System live at http://4.225.164.37:8000/mcp -- Azure VM, 24/7 uptime since April 20, 2026.

Built by Ratnesh (off.rkv) -- 3rd year CS student, India.
