Inspiration
Healthcare is rapidly adopting AI agents: a prescriber agent writes orders, a scheduler books procedures, a records-sync agent reconciles external data, an audit agent reviews logs. Each operates in isolation, with its own context, its own assumptions, its own blind spots. *That isolation is the new failure mode no one is designing for.* A prescriber agent doesn't see what the records agent just changed. A scheduler can't see the prescriber's allergy review. A silent records sync overwrites a documented allergy because some external feed disagrees. The harm comes not from any single agent's mistake but from the cascade none of them could see.
My inspiration was the layer underneath the agents. Every healthcare action should be intercepted by a single coordination and safety substrate that sees the patient's full state, the resources available, and the recent verdicts from other agents, and reasons about all of it before anything executes.
What it does
The Patient State Engine (PSE) is an MCP server that sits between every agent and the patient. Every proposed clinical action (prescription, procedure scheduling, record update, lab order, discharge) is routed through `validate_action`, which returns a structured verdict:
- APPROVED, REJECTED, CAUTION, or QUEUED_FOR_REVIEW
- Risk score, confidence, full clinical reasoning
- Concrete safer alternative when rejecting
- Cost note when a generic-equivalent exists (e.g. warfarin $10/pack vs apixaban $450/pack)
- Resource allocation naming a specific clinician, facility slot, and drug stock impact
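To make the verdict shape concrete, here is a minimal sketch of how a caller might parse and sanity-check a verdict, assuming the LLM is asked to return JSON. The `Verdict` class, `parse_verdict`, and every field name are hypothetical illustrations rather than PSE's actual wire format, and the 0-to-1 ranges for risk and confidence are also assumptions.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical verdict schema; names and ranges are illustrative only.
DECISIONS = {"APPROVED", "REJECTED", "CAUTION", "QUEUED_FOR_REVIEW"}


@dataclass
class Verdict:
    decision: str
    risk_score: float                        # assumed 0.0 (safe) .. 1.0 (dangerous)
    confidence: float                        # assumed 0.0 .. 1.0
    reasoning: str
    safer_alternative: Optional[str] = None  # expected on REJECTED
    cost_note: Optional[str] = None          # e.g. generic-equivalent pricing
    allocation: dict = field(default_factory=dict)  # clinician, slot, stock impact


def parse_verdict(raw: str) -> Verdict:
    """Parse an LLM's JSON verdict and reject malformed ones before use."""
    data = json.loads(raw)
    verdict = Verdict(
        decision=data["decision"],
        risk_score=float(data["risk_score"]),
        confidence=float(data["confidence"]),
        reasoning=data["reasoning"],
        safer_alternative=data.get("safer_alternative"),
        cost_note=data.get("cost_note"),
        allocation=data.get("allocation") or {},
    )
    if verdict.decision not in DECISIONS:
        raise ValueError(f"unknown decision: {verdict.decision!r}")
    for name in ("risk_score", "confidence"):
        value = getattr(verdict, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} out of range: {value}")
    if verdict.decision == "REJECTED" and not verdict.safer_alternative:
        raise ValueError("REJECTED verdict is missing a safer alternative")
    return verdict


if __name__ == "__main__":
    example = json.dumps({
        "decision": "REJECTED",
        "risk_score": 0.9,
        "confidence": 0.97,
        "reasoning": "Documented penicillin allergy; amoxicillin is a penicillin.",
        "safer_alternative": "example alternative",
    })
    print(parse_verdict(example).decision)
```

Raising on a REJECTED verdict without an alternative is a design choice in this sketch: it treats the "concrete safer alternative" as a contract rather than a nice-to-have.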
Beyond per-action safety, PSE includes:
- Resource pool: 9 clinicians, 7 facilities, 13 test types, 28 drugs with class + cost. Real allocation, real depletion.
- Cross-patient insights: a second LLM pass surfaces patterns the per-action validator can't see, such as outbreak clusters, agent misbehaviour, cost drift, and resource pressure.
- Confidence-gated human review: low-confidence verdicts are routed to a clinician review queue instead of being auto-approved.
- Adversarially robust: catches brand-name allergen disguise, allergy stripping, prerequisite bypass, and record tampering via the event log.
- SHARP / FHIR-context native: declares the `ai.promptopinion/fhir-context` extension with six SMART-on-FHIR scopes: Patient, Condition, AllergyIntolerance, MedicationRequest, Observation, Procedure.
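The confidence gate can be sketched as a small post-processing step on each verdict. This is a hypothetical illustration: the 0.75 threshold, `apply_confidence_gate`, and the in-memory `review_queue` are assumptions, and the engine's real gate and queue may differ.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical threshold; would be a per-organisation policy setting.
CONFIDENCE_GATE = 0.75


@dataclass
class GatedVerdict:
    action_id: str
    decision: str
    confidence: float


# In-memory stand-in for the clinician review queue.
review_queue: deque = deque()


def apply_confidence_gate(action_id: str, decision: str, confidence: float,
                          gate: float = CONFIDENCE_GATE) -> GatedVerdict:
    """Queue low-confidence verdicts for human review instead of auto-executing."""
    if confidence < gate:
        gated = GatedVerdict(action_id, "QUEUED_FOR_REVIEW", confidence)
        review_queue.append(gated)
        return gated
    return GatedVerdict(action_id, decision, confidence)


if __name__ == "__main__":
    print(apply_confidence_gate("act-1", "APPROVED", 0.62).decision)  # QUEUED_FOR_REVIEW
    print(apply_confidence_gate("act-2", "APPROVED", 0.96).decision)  # APPROVED
    print(len(review_queue))                                          # 1
```

This sketch gates every decision type, including low-confidence rejections; a stricter policy might keep low-confidence REJECTED verdicts as rejections and only queue approvals.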
How I built it
- Engine: FastAPI server, in-memory state stores for 30 synthetic patients, clinicians, facilities, tests, pharmacy. Audit log persists to JSONL and re-hydrates on restart.
- Reasoning: Pure-LLM, Claude Sonnet 4.6 via the Anthropic API. The system prompt enumerates the safety requirements (allergy cross-reactivity families, CKD dose review, procedural prerequisites, sensitive-record protection, resource feasibility, cost awareness, chain coordination). No rule engine.
- MCP layer: `FastMCP` over streamable HTTP at `/mcp`. Patches the `initialize` response so `capabilities.extensions` includes the `ai.promptopinion/fhir-context` declaration with the required scopes. Honors the `X-Patient-ID`, `X-FHIR-Server-URL`, and `X-FHIR-Access-Token` headers on each tool call.
- Insights: a separate LLM pass that takes the recent audit log, the cohort, and the resource snapshot and returns structured outbreak, agent-misbehaviour, cost-drift, and resource-pressure signals.
- Public reachability: Render as the primary endpoint, with a Cloudflare Quick Tunnel for development.
- Frontend: a single-file `pse-demo.html` with five tabs (Story Mode, Live Demo, Resources, Review Queue, Insights). Story Mode includes a "Without PSE / With PSE" toggle that runs the same actions twice for visceral contrast.
- Adversarial agent: `EvilPrescriberAgent` with six attack classes, plus a dedicated demo script that probes the engine.
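The JSONL audit log with re-hydration on restart could look roughly like the sketch below, assuming one JSON object per line. `AuditLog` and its methods are hypothetical names; the `recent()` tail stands in for the "recent audit log" the insights pass consumes.

```python
import json
import tempfile
from pathlib import Path


class AuditLog:
    """Append-only JSONL audit log that re-hydrates its in-memory view on start."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.entries: list[dict] = []
        if self.path.exists():
            # Re-hydrate: replay every persisted line into memory.
            with self.path.open("r", encoding="utf-8") as fh:
                for line in fh:
                    line = line.strip()
                    if line:
                        self.entries.append(json.loads(line))

    def append(self, entry: dict) -> None:
        # Persist first so memory never gets ahead of the file.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")
        self.entries.append(entry)

    def recent(self, n: int = 50) -> list[dict]:
        """Most recent n entries, e.g. as input to a cross-patient insights pass."""
        return self.entries[-n:] if n > 0 else []


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "audit.jsonl"
        log = AuditLog(path)
        log.append({"patient_id": "P-001", "decision": "REJECTED"})
        restarted = AuditLog(path)  # simulates a server restart
        print(restarted.recent(1))
```

Writing to disk before updating memory keeps the in-memory view from getting ahead of the file. A crash mid-write could still leave a truncated last line, which a production version would need to skip or repair on load.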
Challenges I ran into
- Inverting my own architecture mid-build. My original design was a hybrid rules-first, LLM-second pipeline. Mid-hack I tore out the rule engine and made the LLM the sole reasoner. That gave a better narrative, but it meant rewriting the system prompt to be airtight on every safety axis the rules used to handle.
- MCP extension field discovery. Prompt Opinion expects the FHIR extension under `capabilities.extensions`, not the SDK's `experimental` bucket. Pydantic v2 silently dropped extras set as attributes, so I had to inject the declaration directly via `__pydantic_extra__` for the JSON serialisation to include it.
- Resource ID drift. When asked to allocate clinicians, the LLM kept using human names ("Dr. Patel") instead of resource IDs ("DR-CARDIO-1"). Fixed with a forgiving matcher on the server side plus a tighter prompt.
- Demo recording vs free-tier quotas. Mid-recording I hit the Gemini free-tier daily quota for the platform agent's model. As Magnus and Pawan suggested, I worked around it by switching to another Gemini account, using previous-session footage for the chat beats, and using the HTML demo for the rest.
- Cloudflare tunnel lifetime. Quick Tunnels have no uptime guarantee. Resolved by migrating to the Render free tier.
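A forgiving matcher of the kind described under "Resource ID drift" might accept either an exact resource ID or a fuzzy match on a clinician's name. This sketch uses the standard library's `difflib.SequenceMatcher` on name tokens; the clinician pool, the names, `resolve_clinician`, and the 0.8 cutoff are all hypothetical, not PSE's actual code.

```python
import difflib
import re
from typing import Optional

# Hypothetical clinician pool: resource ID -> display name.
CLINICIANS = {
    "DR-CARDIO-1": "Dr. Priya Patel",
    "DR-NEPHRO-1": "Dr. Samuel Okafor",
    "DR-GP-1": "Dr. Elena Garcia",
}


def _normalise(text: str) -> str:
    """Lowercase, drop the "Dr" honorific and punctuation: "Dr. Patel" -> "patel"."""
    text = re.sub(r"\bdr\b\.?", " ", text.lower())
    text = re.sub(r"[^a-z0-9 ]", " ", text)
    return " ".join(text.split())


def resolve_clinician(ref: str, cutoff: float = 0.8) -> Optional[str]:
    """Map an LLM-supplied clinician reference to a canonical resource ID."""
    candidate = ref.strip().upper()
    if candidate in CLINICIANS:  # exact resource ID, case-insensitive
        return candidate
    wanted = _normalise(ref).split()
    if not wanted:
        return None
    best_id, best_score = None, 0.0
    for resource_id, name in CLINICIANS.items():
        tokens = _normalise(name).split()
        # Average, over the requested tokens, of each one's closest name token.
        score = sum(
            max(difflib.SequenceMatcher(None, w, t).ratio() for t in tokens)
            for w in wanted
        ) / len(wanted)
        if score > best_score:
            best_id, best_score = resource_id, score
    return best_id if best_score >= cutoff else None


if __name__ == "__main__":
    print(resolve_clinician("Dr. Patel"))  # DR-CARDIO-1
    print(resolve_clinician("dr okafr"))   # typo still resolves: DR-NEPHRO-1
    print(resolve_clinician("Dr. Smith"))  # None (no confident match)
```

Returning `None` below the cutoff, rather than the best guess, lets the server reject or re-prompt instead of silently allocating the wrong clinician.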
Accomplishments that I'm proud of
- Cross-patient insights actually find things: a UTI outbreak cluster, a rogue agent attempting six unsafe actions across patients, and cost drift on brand vs generic anticoagulants. These aren't seeded patterns; *they emerged from real audit data the LLM reasoned over*.
- Adversarial robustness. All six red-team attack classes are caught. Brand-name disguise (`Augmentin` = amoxicillin-clavulanate). Allergy-list stripping framed as an external feed sync. Prerequisite bypass with fake "ECG faxed externally" notes. All REJECTED.
- The "Without PSE / With PSE" toggle. One click visualises the cascade failure and its prevention. The most-watched moment of the demo.
- SHARP / FHIR-context done right. Six SMART scopes declared, all authorised via Prompt Opinion's user-consent flow, headers received and acknowledged per tool call.
What I learned
- Multi-agent safety is a distinct failure mode. It is not just "more bugs at scale" but a genuinely new category that no single agent's pipeline can prevent. This convinced me the coordination substrate has to exist.
- Pure-LLM reasoning works when the prompt enumerates the invariants. Sonnet 4.6 gets clinical safety questions right at >0.95 confidence when the system prompt is explicit about allergy classes, CKD dose review, sensitive-field protection, and cross-reactivity families. Vague prompts produced vague verdicts.
- The audit log is more than a record — it's the input to the next LLM pass. Cross-patient pattern detection is just running the LLM over the audit + cohort. I didn't expect the patterns to be as sharp as they are.
- SHARP extension declaration is elegant. Server declares scopes → user authorises per-scope → headers flow per request. A clean trust model.
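The per-request header flow can be sketched as a small extraction step. The three header names come from the MCP layer description; `FhirContext` and `fhir_context_from_headers` are hypothetical, and how the MCP SDK exposes an incoming request's headers is not shown here, so the sketch assumes they arrive as a plain mapping.

```python
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class FhirContext:
    patient_id: Optional[str] = None
    fhir_server_url: Optional[str] = None
    # repr=False keeps the bearer token out of accidental log lines.
    access_token: Optional[str] = field(default=None, repr=False)


def fhir_context_from_headers(headers: Mapping[str, str]) -> FhirContext:
    """Collect the SHARP / FHIR-context headers sent with one tool call."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return FhirContext(
        patient_id=lowered.get("x-patient-id"),
        fhir_server_url=lowered.get("x-fhir-server-url"),
        access_token=lowered.get("x-fhir-access-token"),
    )


if __name__ == "__main__":
    ctx = fhir_context_from_headers({
        "X-Patient-ID": "P-001",
        "X-FHIR-Server-URL": "https://fhir.example.org/r4",
        "X-FHIR-Access-Token": "example-token",
    })
    print(ctx)
```

Excluding the access token from `repr` keeps it out of casual logging, which fits the HIPAA hardening listed under next steps.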
What's next for Patient State Engine
- Real FHIR backend integration. Replace the in-memory cohort with a patient-ID resolution layer that maps FHIR patient references to the engine's canonical clinical reasoning view.
- A2A agent path. Expose the same engine as an A2A-enabled agent for Path B of the challenge ecosystem, so other agents can consult it via Agent-to-Agent rather than only via tool call.
- Streaming verdicts. Token-by-token reasoning trace in the response so agents see the engine "thinking" instead of waiting for a full block.
- Pluggable formulary / cost catalog. Cost data should come from real pharmacy benefit managers, not hand-coded constants. Same for clinician pool and facility calendar — wire to actual scheduling systems.
- Production HIPAA hardening. Audit log encryption, no PII in response strings, configurable retention.
- Replay mode. Feed an existing audit JSONL back into the UI for post-incident analysis — judge the engine on a historical dataset.
- Multi-tenant deployment. Per-organisation cohorts, formularies, and policy thresholds (e.g. confidence gate).
Operational posture & deployment path
PSE ships as a Python service (FastAPI engine + FastMCP server). For the hackathon judging window I ran a tiered deployment:
- Primary endpoint (durable): a Render web service deployed from this repository via the committed `render.yaml`. It has a stable HTTPS URL, auto-restarts on crash, and is independent of any local machine, so it survives laptop sleep, network changes, and operator absence.
- Backup endpoint (development, fast iteration): a Cloudflare Quick Tunnel to a locally running engine. It stands up a TLS-terminated public URL in seconds with no account required, which is useful for live debugging and for verifying changes before deploying.
Render deployment specifics
Service configuration (see render.yaml at repo root):
- Runtime: Python 3.11, free tier
- Build: `pip install -r requirements.txt`
- Start: `bash start_render.sh` boots FastAPI internally on `127.0.0.1:8001`, polls `/health` until it is ready, then runs the MCP server bound to `0.0.0.0:$PORT` (the public port Render injects). The internal REST API stays local; only the MCP transport is publicly exposed.
- Secrets: `ANTHROPIC_API_KEY` is set in the Render dashboard and never committed.
- Cold start: Render's free tier spins the service down after 15 minutes of inactivity. The first request after spin-down takes ~30 seconds; subsequent requests are warm. The Starter plan ($7/mo) eliminates cold starts, but that isn't required for hackathon judging.
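The start sequence and the `$PORT` fallback can be sketched in Python. The real script is Bash (`start_render.sh`), so this only illustrates the same logic: `resolve_port`, `wait_for_health`, and the default port 8002 (the local MCP port used in development) are assumptions, not the repository's code.

```python
import os
import time
import urllib.request
from typing import Mapping

# Assumed fallback: the local MCP port (:8002) used during development.
DEFAULT_MCP_PORT = 8002


def resolve_port(env: Mapping[str, str] = os.environ,
                 default: int = DEFAULT_MCP_PORT) -> int:
    """Bind to the PORT Render injects if present, else the local default."""
    value = env.get("PORT")
    return int(value) if value else default


def wait_for_health(url: str, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll a health endpoint until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # connection refused, socket timeout, or HTTP error: not ready yet
        time.sleep(interval)
    return False
```

In the boot order described above, `wait_for_health("http://127.0.0.1:8001/health")` would gate launching the MCP server on `resolve_port()`, so the public transport never comes up in front of an engine that isn't ready.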
Cloudflare Quick Tunnel (backup) specifics
`cloudflared tunnel --url http://localhost:8002` exposes the local engine for development. Tunnel life is bounded by the local process and network connectivity; Cloudflare openly notes that Quick Tunnels carry no uptime guarantee. The local stack is hardened with `nohup` + `disown` on both server processes (REST on :8001, MCP on :8002) and macOS `caffeinate -d -i -s` to suppress display, idle, and system sleep.
Deployment scope
The architecture cleanly separates application logic from deployment glue:
$$ T_{\text{deploy}} \;\approx\; 15 \text{ to } 25 \text{ minutes}, \quad \Delta_{\text{code, application}} \;=\; 0 \text{ lines}, \quad \Delta_{\text{code, deploy-config}} \;=\; 3 \text{ files}. $$
The three deploy-config files (render.yaml, start_render.sh, and a one-line $PORT fallback in server/mcp_server.py) are committed (4973601) and visible to judges in the public repository.
Availability matrix
| Surface | Submission | Stage One verify | Stage Two judging |
|---|---|---|---|
| Marketplace listing | ✅ | ✅ | ✅ |
| GitHub repository | ✅ | ✅ | ✅ |
| Demo video on YouTube | ✅ | ✅ | ✅ |
| HTML demo (`pse-demo.html`) | ✅ | ✅ | ✅ |
| MCP tool invocation — Render primary | ✅ | ✅ | ✅ (≤50s cold start) |
| MCP tool invocation — Cloudflare backup | ✅ | ✅ | best-effort |
| `/insights` cross-patient LLM call | ✅ | ✅ | ✅ |
The demo video documents every claimed capability end-to-end. With the Render deployment, the AI Factor, Potential Impact, and Feasibility criteria can all be judged against a live, durable, publicly reachable service rather than a laptop-anchored demo.
Built With
- cloudflare
- css
- fast-api
- html
- javascript
- mcp
- promptopinion
- pydantic
- python
- render

