-
CareRelay OS — AI agents coordinate clinical handoffs, detect risks, and prevent medical errors in real time.
-
4 AI agents collaborating live — reading patient data, querying meds, and escalating critical risks in seconds.
-
Hallucination caught: AI validator detects unsafe medication not in FHIR data and blocks it before reaching clinician.
-
Multi-agent clinical reasoning: RiskAgent detects critical issues, ClinicalAgent generates SBAR, Validator ensures FHIR-safe output.
-
Agent in Prompt Opinion
-
Before vs After: manual handoffs miss critical details; CareRelay ensures verified, structured, and safe transitions.
-
Full audit trail: every agent decision logged with timestamps for transparency, safety, and clinical accountability.
Inspiration
Every 11 minutes, a patient in the United States is harmed during a clinical handoff.
That number isn't hypothetical: it is drawn from decades of Joint Commission sentinel event analyses and the landmark CRICO handoff studies. The moment a nursing shift changes, the moment a patient transfers from the ICU to the step-down unit, the moment a specialist signs off before a weekend: these are the fault lines where clinical knowledge evaporates, where critical drug allergies get dropped, where pending labs go unread, where the next clinician walks into a room carrying an incomplete picture of a fragile human being.
I am Samson Ojekunle, a full-stack AI developer from Nigeria. I came to this problem not through academia but through watching healthcare systems — both in Africa and through studying Western clinical workflows — struggle with the same fundamental failure: we have built extraordinary medical knowledge, and then we transmit it via whispered hallway reports, hastily scrawled notes, and overwhelmed memory. The handoff is where medicine's paper-age assumptions collide catastrophically with its 21st-century complexity.
When the Agents Assemble hackathon opened with the theme of AI agents in healthcare, the answer was immediate: build the clinical handoff system that should have existed already.
What I Built: CareRelay OS
CareRelay OS is a 4-agent clinical handoff intelligence system built on three open standards: MCP (Model Context Protocol), A2A (Agent-to-Agent), and FHIR R4. It is deployed as a production-grade platform on Railway. It does not summarize notes. It does not autocomplete templates. It reasons about patient state, flags the risks a sleep-deprived clinician might miss, drafts a structured SBAR handoff narrative, and then validates that narrative against the source record, catching its own hallucinations before they reach the bedside.
The 4-Agent Pipeline
Agent 1: Context Builder (Groq/Llama-3.3-70B)
The pipeline begins with raw FHIR R4 data (medications, problem lists, lab values, vitals, allergies) retrieved from the HAPI FHIR R4 sandbox and structured via the SHARP context propagation protocol through the Node.js API gateway. This agent's sole responsibility is synthesis without opinion: it produces a structured patient context object that is passed downstream, ensuring every subsequent agent reasons from the same canonical source of truth rather than re-parsing unstructured notes.
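To make the idea of a structured patient context object concrete, here is a minimal sketch of how a FHIR R4 Bundle could be grouped into one typed object. The `PatientContext` class, its field names, and the `build_context` helper are illustrative assumptions, not the actual CareRelay OS schema, and the sketch only reads inline `medicationCodeableConcept` values.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class PatientContext:
    """Hypothetical context object; the real schema is not shown in this write-up."""
    patient_id: str
    medications: tuple[str, ...]
    allergies: tuple[str, ...]
    conditions: tuple[str, ...]


def _display(concept: dict[str, Any] | None) -> str:
    """Return a readable label from a FHIR CodeableConcept."""
    if not concept:
        return "unknown"
    if concept.get("text"):
        return concept["text"]
    for coding in concept.get("coding", []):
        if coding.get("display"):
            return coding["display"]
    return "unknown"


def build_context(bundle: dict[str, Any]) -> PatientContext:
    """Group the resources of a FHIR R4 Bundle into one context object."""
    patient_id = ""
    meds: list[str] = []
    allergies: list[str] = []
    conditions: list[str] = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        kind = resource.get("resourceType")
        if kind == "Patient":
            patient_id = resource.get("id", "")
        elif kind == "MedicationRequest":
            # Only the inline form is handled here; references yield "unknown".
            meds.append(_display(resource.get("medicationCodeableConcept")))
        elif kind == "AllergyIntolerance":
            allergies.append(_display(resource.get("code")))
        elif kind == "Condition":
            conditions.append(_display(resource.get("code")))
    return PatientContext(patient_id, tuple(meds), tuple(allergies), tuple(conditions))
```

Because the object is built once from the bundle and holds only tuples, downstream agents can read it freely without any of them being able to alter what the others see.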
Agent 2: Risk Intelligence (Groq/Llama-3.3-70B)
Speed matters in clinical triage. Groq's inference engine processes the structured context at near-real-time latency to generate a risk intelligence report: a composite risk score, flagged medication interactions, pending diagnostic items that could alter the clinical picture within the next 12 hours, and identified care gaps. This is the layer that answers the question clinicians rarely have time to ask: "What is likely to go wrong on the next shift?"
Agent 3: Clinical Reasoning (GPT-4o)
Armed with the patient context and the risk intelligence report, GPT-4o performs the reasoning step that rule-based systems cannot: it synthesizes a clinical narrative in the SBAR format (Situation, Background, Assessment, Recommendation), calibrates the language for the receiving clinician's specialty, and embeds the flagged risks directly into the recommendation layer. This is not template-filling; it is genuine clinical synthesis across heterogeneous data.
Agent 4: Handoff Validation (GPT-4o)
This is where CareRelay OS departs most decisively from existing tools. The validation agent receives both the original FHIR R4 source data and the generated SBAR narrative and performs a systematic hallucination audit, cross-checking every clinical claim in the narrative against the structured record. Fabricated medication doses, invented allergy statuses, hallucinated lab values: these are caught before output. The agent returns a confidence score, a line-by-line verification report, and a corrected narrative where discrepancies are found.
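The cross-check itself is done by an LLM in CareRelay OS, but the core idea can be illustrated with a small deterministic sketch: every medication named in the narrative must be backed by a label in the FHIR record. The function names, the substring matching rule, and the ratio used as a confidence score are all assumptions made for this example; a production check would more likely compare coded identifiers than free-text names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class ClaimCheck:
    claim: str                # medication name as it appears in the narrative
    supported: bool           # True when the record backs the claim
    evidence: Optional[str]   # the record label that matched, if any


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so labels compare consistently."""
    return " ".join(text.lower().split())


def check_medication_claims(claimed: Sequence[str], record: Sequence[str]) -> List[ClaimCheck]:
    """Mark each claimed medication as supported only if a record label contains it."""
    results = []
    for claim in claimed:
        key = _normalize(claim)
        evidence = None
        if key:  # an empty claim can never be supported
            evidence = next((label for label in record if key in _normalize(label)), None)
        results.append(ClaimCheck(claim, evidence is not None, evidence))
    return results


def confidence(results: Sequence[ClaimCheck]) -> float:
    """Illustrative score: the share of claims backed by the record."""
    if not results:
        return 1.0  # nothing was claimed, so nothing is unsupported
    return sum(r.supported for r in results) / len(results)
```

With a record containing "Warfarin 5 MG Oral Tablet" and a narrative that mentions both warfarin and amiodarone, the amiodarone claim comes back unsupported and can be blocked or corrected before the clinician sees it.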
Infrastructure & Interoperability
The system is built for the real world of healthcare interoperability:
- MCP Server: Three tools published to the Prompt Opinion marketplace (get_patient_context, generate_handoff, and validate_handoff), making CareRelay OS composable into any MCP-compatible workflow, EHR plugin, or agent orchestration system.
- A2A Agent: Published with FHIR context extension enabled, allowing other agents to invoke CareRelay OS as a specialized handoff reasoning service within multi-agent healthcare pipelines.
- FHIR R4 Integration: Full bidirectional integration with the HAPI FHIR R4 sandbox, consuming Patient, Encounter, MedicationRequest, Condition, Observation, and AllergyIntolerance resources.
- React Frontend: Live demo interface with real-time agent collaboration view, audit trail, and a dedicated hallucination demonstration mode that shows — in real time — what happens when AI fabricates clinical data, and then how the validation agent catches it.
- Railway Deployment: Both the Python FastAPI AI agent service and the Node.js Express API gateway are deployed on Railway, with environment-isolated secrets and production-grade logging.
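As a rough illustration of the FHIR side of this list, the sketch below builds the read and search URLs for the six resource types named above. The base URL is the public HAPI R4 test server and is an assumption about which sandbox endpoint was used; `resource_urls` and its `count` parameter are hypothetical names. The `patient` search parameter and `_count` are standard FHIR R4 search parameters.

```python
from typing import Dict
from urllib.parse import urlencode

# Assumed endpoint: the public HAPI FHIR R4 test server.
FHIR_BASE = "https://hapi.fhir.org/baseR4"

# Resource types that are searched by patient (Patient itself is read by id).
PATIENT_SCOPED = (
    "Encounter",
    "MedicationRequest",
    "Condition",
    "Observation",
    "AllergyIntolerance",
)


def resource_urls(patient_id: str, base: str = FHIR_BASE, count: int = 50) -> Dict[str, str]:
    """Return one URL per resource type for a single patient."""
    urls = {"Patient": f"{base}/Patient/{patient_id}"}
    for resource_type in PATIENT_SCOPED:
        query = urlencode({"patient": patient_id, "_count": count})
        urls[resource_type] = f"{base}/{resource_type}?{query}"
    return urls
```

Each search URL returns a Bundle, so the results can be merged and handed to the context-building step as a single input.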
How I Built It
The build began with a brutal constraint: the handoff problem is not a UI problem or a summarization problem. It is a trust problem. Clinicians will not use AI-generated handoffs unless they can verify the AI is not fabricating. Every architectural decision flowed from that requirement.
I chose FHIR R4 as the data layer not for compliance checkbox reasons, but because structured clinical data is the prerequisite for verifiable AI output — you cannot catch hallucinations against unstructured free text. The HAPI FHIR sandbox gave me a realistic patient data environment without PHI risk.
The two-LLM architecture (Groq for speed-critical synthesis, GPT-4o for reasoning-critical generation and validation) was a deliberate engineering choice. Groq's sub-second latency on the context and risk layers means the pipeline feels responsive even before the heavier reasoning steps complete. GPT-4o's clinical reasoning depth was demonstrably superior for SBAR generation in internal testing.
The MCP server architecture was the piece I was most excited to build. Publishing CareRelay OS's capabilities as MCP tools means the system is not a standalone application — it is a composable clinical intelligence service. Any AI assistant, EHR copilot, or agent orchestration platform that supports MCP can call CareRelay OS as a specialized module. That is the architecture of interoperable healthcare AI: not monolithic applications, but reasoning services with standard interfaces.
Challenges
The hardest technical challenge was the hallucination validation loop. Getting GPT-4o to reliably cross-reference a peer agent's output against structured FHIR data, without itself fabricating the verification, required careful prompt engineering, structured output constraints, and a validation schema that forced the model to cite the specific FHIR resource field for each claim it verified or flagged. The schema design took three days of iteration.
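A schema that forces a citation for every claim might look something like the sketch below. The field names (`claim`, `verdict`, `fhir_path`), the path format, and the `parse_verdicts` function are hypothetical; the point is only that model output lacking a well-formed FHIR field reference is rejected rather than trusted.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

ALLOWED_VERDICTS = {"verified", "flagged"}
KNOWN_RESOURCES = {
    "Patient", "Encounter", "MedicationRequest",
    "Condition", "Observation", "AllergyIntolerance",
}


@dataclass(frozen=True)
class ClaimVerdict:
    claim: str
    verdict: str
    fhir_path: str  # e.g. "MedicationRequest/med-1.dosageInstruction"


def parse_verdicts(raw: List[Dict[str, Any]]) -> List[ClaimVerdict]:
    """Accept the validator's output only if every item cites a FHIR field."""
    verdicts = []
    for position, item in enumerate(raw):
        verdict = item.get("verdict")
        if verdict not in ALLOWED_VERDICTS:
            raise ValueError(f"item {position}: unknown verdict {verdict!r}")
        path = (item.get("fhir_path") or "").strip()
        resource, _, field_name = path.partition(".")
        resource_type = resource.split("/")[0]  # drop an optional "/id" suffix
        if resource_type not in KNOWN_RESOURCES or not field_name:
            raise ValueError(f"item {position}: missing or malformed FHIR citation")
        verdicts.append(ClaimVerdict(str(item.get("claim", "")), verdict, path))
    return verdicts
```

Rejecting the whole response on a missing citation is a deliberately strict choice for this sketch: it turns an unverifiable verification into a retry instead of a silent pass.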
The second challenge was SHARP context propagation across the agent boundary. Maintaining patient context integrity as a typed object through four agents, across two different LLM providers, through a Node.js gateway, without any state mutation between agents — this required designing a context schema early and treating it as immutable through the pipeline. The discipline paid off: debugging became straightforward because context errors could only originate at the ingestion layer.
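One way to enforce that kind of immutability in Python is a frozen dataclass whose mapping fields are wrapped in read-only views, with each agent written as a function that returns new output instead of editing the context. This is a sketch under assumed names (`Context`, `run_pipeline`), not the CareRelay OS implementation.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Callable, Mapping, Sequence, Tuple


@dataclass(frozen=True)
class Context:
    """Hypothetical context: attributes cannot be reassigned after creation."""
    patient_id: str
    labs: Mapping[str, float]

    def __post_init__(self) -> None:
        # Copy the dict and expose it read-only, so neither the caller's
        # original dict nor any agent can change what the context holds.
        object.__setattr__(self, "labs", MappingProxyType(dict(self.labs)))


# An agent sees the context and the outputs of earlier agents, and returns text.
Agent = Callable[[Context, Tuple[str, ...]], str]


def run_pipeline(context: Context, agents: Sequence[Agent]) -> Tuple[str, ...]:
    """Run agents in order; outputs accumulate while the context stays fixed."""
    outputs: Tuple[str, ...] = ()
    for agent in agents:
        outputs = outputs + (agent(context, outputs),)
    return outputs
```

Any attempt to assign to `context.patient_id` raises `dataclasses.FrozenInstanceError`, and writing to `context.labs` raises `TypeError`, so a context error cannot be introduced between agents without failing loudly.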
What I Learned
I learned that the gap between "AI can do this in a demo" and "AI can be trusted to do this in a hospital" is not primarily a capability gap — it is an auditability gap. CareRelay OS's hallucination validation layer is not technically impressive. But it is clinically essential. The most important feature of a clinical AI system is not what it generates — it is how clearly it shows you what it got wrong.
I also learned that FHIR R4 is both more powerful and more verbose than most AI developers expect. Parsing a MedicationRequest resource correctly (understanding the difference between medicationCodeableConcept and medicationReference, handling the timing and dosageInstruction hierarchy) is domain work, not just API work. The standards exist for good reason.
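The MedicationRequest point can be made concrete with a short sketch. In FHIR R4 the medication is carried either inline as `medicationCodeableConcept` or as a `medicationReference` that points to a contained or external Medication resource, and dosing lives under `dosageInstruction`. The helper names and the `index` lookup (a dict keyed by reference string) are assumptions for illustration.

```python
from typing import Any, Dict, List, Optional

Resource = Dict[str, Any]


def concept_label(concept: Optional[Resource]) -> Optional[str]:
    """Prefer CodeableConcept.text, then the first coding.display."""
    if not concept:
        return None
    if concept.get("text"):
        return concept["text"]
    for coding in concept.get("coding", []):
        if coding.get("display"):
            return coding["display"]
    return None


def medication_label(request: Resource, index: Optional[Dict[str, Resource]] = None) -> str:
    """Resolve the medication name from either form of medication[x]."""
    label = concept_label(request.get("medicationCodeableConcept"))
    if label:
        return label  # inline form
    ref = request.get("medicationReference") or {}
    target = ref.get("reference", "")
    medication = None
    if target.startswith("#"):  # contained resource, matched by local id
        medication = next(
            (c for c in request.get("contained", []) if c.get("id") == target[1:]),
            None,
        )
    elif index:  # external resource, looked up in a caller-supplied index
        medication = index.get(target)
    if medication:
        label = concept_label(medication.get("code"))
        if label:
            return label
    return ref.get("display") or "unknown"


def dosage_summary(request: Resource) -> List[str]:
    """Summarize each dosageInstruction, preferring its free-text form."""
    summaries = []
    for dosage in request.get("dosageInstruction", []):
        if dosage.get("text"):
            summaries.append(dosage["text"])
            continue
        parts = []
        dose = next(
            (d["doseQuantity"] for d in dosage.get("doseAndRate", []) if "doseQuantity" in d),
            None,
        )
        if dose:
            parts.append(f"{dose.get('value')} {dose.get('unit', '')}".strip())
        repeat = dosage.get("timing", {}).get("repeat", {})
        if repeat.get("frequency") and repeat.get("period"):
            parts.append(
                f"{repeat['frequency']}x per {repeat['period']} {repeat.get('periodUnit', '')}".strip()
            )
        if parts:
            summaries.append(", ".join(parts))
    return summaries
```

A parser that only reads `medicationCodeableConcept` will silently report no medication name for every request that uses a reference, which is exactly the kind of gap a validation layer would then misread as a hallucination.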
Impact
If deployed in a 500-bed hospital processing 80 handoffs per day, CareRelay OS's validation layer alone could catch an estimated 3–8 clinically significant documentation errors per day, each of which carries a real risk of patient harm. At scale, across health systems, this is not a quality improvement metric. It is a patient safety intervention.
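For readers who want the estimate as a rate, the short calculation below restates the figures above per handoff and per year. It adds no new data; it only rearranges the 80 handoffs and 3 to 8 errors per day, and assumes the hospital runs 365 days a year.

```python
HANDOFFS_PER_DAY = 80
ERRORS_LOW, ERRORS_HIGH = 3, 8   # estimated errors caught per day
DAYS_PER_YEAR = 365              # assumption: continuous operation

rate_low = ERRORS_LOW / HANDOFFS_PER_DAY
rate_high = ERRORS_HIGH / HANDOFFS_PER_DAY
per_year_low = ERRORS_LOW * DAYS_PER_YEAR
per_year_high = ERRORS_HIGH * DAYS_PER_YEAR

print(f"Implied share of handoffs with a caught error: {rate_low:.2%} to {rate_high:.2%}")
print(f"Implied errors caught per year: {per_year_low} to {per_year_high}")
# Implied share of handoffs with a caught error: 3.75% to 10.00%
# Implied errors caught per year: 1095 to 2920
```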
The system is built to be deployable today. It requires no EHR vendor partnership, no HIPAA BAA negotiation with a proprietary AI vendor, and no custom integration beyond a FHIR R4 endpoint. That is the point. Healthcare AI that is only accessible to well-resourced academic medical centres is not healthcare AI. It is a research project. CareRelay OS is built for the community hospital in Lagos, the district hospital in rural India, the understaffed ED in the American Midwest. Anywhere clinicians hand off patients, the stakes are the same.