ChronoCare
From scattered data to clinical clarity in seconds.
Inspiration
Every clinician knows the feeling. A patient arrives in the ER. Five years of data sits in the EHR across dozens of visits, three different providers, hundreds of lab results. The chart has the answer. Nobody has time to read the chart.
Electronic health records are excellent at answering one question: what is happening to this patient right now? Today's vitals. Today's labs. Today's medications. The snapshot is detailed and immediate.
But they systematically miss two questions that matter just as much:
What has been happening? A patient's creatinine drifts from 0.9 to 1.3 mg/dL over 36 months. Each individual result falls inside the reference range and is flagged normal, so no alert fires. Three different clinicians saw three different snapshots. None of them saw the slope, the trajectory, the cascade in progress.
What is coming? Hypertension diagnosed in 2019 became Stage 2 CKD by 2022, but no system flagged the cardiometabolic cascade developing across those three years. A 12-minute appointment does not leave time to read hundreds of records. The chart has the answer. Nobody has time to read the chart.
Medical errors have been estimated to be the third leading cause of death in the United States. Miscommunication during handoffs is implicated in an estimated 80% of serious medical errors. These are not technology failures. They are information architecture failures. The data exists. The connections do not get made.
The question that drove ChronoCare was simple: what if a clinician could ask one question and get back the full story of a patient, the hidden warning signs, the root causes, and the next steps, all in under 15 seconds? Not a summary. Not a data dump. An actual answer.
That question became a production system.
What it does
ChronoCare is a clinical reasoning engine that answers four questions no EHR answers today, in a single conversation, in about 15 seconds.
1. What happened to this patient? ChronoCare fetches a patient's complete FHIR record, normalizes every condition, lab, medication, and encounter into a unified chronological timeline, identifies the 3 to 5 clinical turning points that changed the patient's trajectory, and generates a 200 to 300 word clinical narrative written the way an experienced physician would brief an incoming colleague. Not a list of events. A story with cause and consequence.
2. What is quietly going wrong right now? The silent deterioration detector analyzes recent signals holistically across vitals, labs, and clinical notes together. Blood pressure at 138/88 sits just under the traditional 140/90 threshold. Creatinine at 1.3 mg/dL sits at the upper edge of the reference range. Fatigue mentioned twice in recent notes is easy to miss. Together, they tell a different story. ChronoCare reasons across weak signals the way an experienced clinician would, flagging the pattern before any individual alarm fires. This is the core product differentiator: per-value threshold rules do not capture this kind of cross-signal pattern.
3. Why did this happen? The root cause analyzer correlates events across the patient's full timeline, identifies plausible causal pairs with confidence levels and rationale, and synthesizes a causal hypothesis explaining how the patient arrived at their current state.
4. What should we do next? Recommendations are generated as 3 to 5 patient-specific actions cross-checked against ADA, JNC, KDIGO, and ACC/AHA clinical guidelines. Every recommendation cites specific findings from this patient's actual data. No generic advice. No boilerplate.
The full pipeline runs in 14 to 16 seconds on a warm server and produces a structured clinical brief that answers all four questions through six components: narrative, early warning risk score with trend direction, causal hypothesis, comorbidity map, guideline gaps, and prioritized recommendations.
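As a rough illustration of what a brief with those six components might look like in code, here is a minimal sketch. The class names, field names, the 0.0 to 1.0 risk scale, and the placeholder values are all assumptions made for this example, not the project's actual schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Literal


@dataclass
class EarlyWarning:
    # Assumed scale: 0.0 (no concern) to 1.0 (high concern).
    risk_score: float
    trend: Literal["improving", "stable", "worsening"]


@dataclass
class ClinicalBrief:
    # One field per component named in the brief description.
    narrative: str
    early_warning: EarlyWarning
    causal_hypothesis: str
    comorbidity_map: dict[str, list[str]] = field(default_factory=dict)
    guideline_gaps: list[str] = field(default_factory=list)
    recommendations: list[str] = field(default_factory=list)


# Placeholder content only; the score and strings are illustrative.
brief = ClinicalBrief(
    narrative="Hypertension diagnosed in 2019 progressed to Stage 2 CKD by 2022.",
    early_warning=EarlyWarning(risk_score=0.7, trend="worsening"),
    causal_hypothesis="<causal hypothesis text>",
    comorbidity_map={"hypertension": ["chronic kidney disease"]},
    guideline_gaps=["<guideline gap>"],
    recommendations=["<patient-specific action>"],
)

# asdict() gives a plain nested dict that can be serialized to JSON.
brief_json_ready = asdict(brief)
```

Keeping the brief as one typed object makes it straightforward to render the same result in both the web app and the chat surface.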
ChronoCare ships as two surfaces: a custom React web app at sriksven.github.io/chronocare, where anyone can run a live analysis on four synthetic demo patients right now, and a ChronoCore A2A agent published to the Prompt Opinion Marketplace, where clinicians can request a patient analysis in plain language inside their existing clinical workspace and get the full brief back in chat.
How we built it
ChronoCare is four replaceable layers held together by open standards. Every layer can be swapped independently without touching the others.
Layer 1: Data
All patient data is synthetic. Four demo patients were built programmatically using hand-crafted FHIR R4 transaction bundles, each covering a distinct clinical archetype:
| Patient | Profile | Clinical story |
|---|---|---|
| John Doe, 62M | HTN + CKD | Slow cardiometabolic cascade over 5 years |
| Maria Rodriguez, 58F | T2DM + nephropathy | Diabetes management failure until SGLT2 added |
| Robert Chen, 71M | CHF + AFib + COPD | Cardiorenal syndrome with multiple interacting conditions |
| Sarah Williams, 45F | Prediabetes | Control case, lifestyle intervention success |
All four patients live on the HAPI public FHIR R4 sandbox. No PHI anywhere in the system. A reproducible generator script produces the bundles deterministically.
FHIR resources are normalized using local lookup tables covering 37 LOINC codes, 32 ICD-10 codes, and 30 RxNorm codes before reaching any LLM, converting raw clinical codes into human-readable descriptions.
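The lookup step can be pictured roughly as follows. This is a minimal sketch, assuming the tables are plain dictionaries keyed by code and selected by the coding system URI. The handful of entries shown are examples only (the real tables cover 37 LOINC, 32 ICD-10, and 30 RxNorm codes), and `describe` is a hypothetical helper name.

```python
# Example entries only; the descriptions are shortened for readability.
LOINC = {
    "2160-0": "Creatinine, serum or plasma",
    "4548-4": "Hemoglobin A1c",
    "8480-6": "Systolic blood pressure",
    "8462-4": "Diastolic blood pressure",
}
ICD10 = {
    "I10": "Essential (primary) hypertension",
    "N18.2": "Chronic kidney disease, stage 2 (mild)",
    "E11.9": "Type 2 diabetes mellitus without complications",
}

# FHIR identifies each terminology by a system URI.
TABLES = {
    "http://loinc.org": LOINC,
    "http://hl7.org/fhir/sid/icd-10-cm": ICD10,
}


def describe(codeable_concept: dict) -> str:
    """Turn a FHIR CodeableConcept into a human-readable description."""
    codings = codeable_concept.get("coding") or []
    # First choice: a code we recognize in a local table.
    for coding in codings:
        table = TABLES.get(coding.get("system", ""), {})
        if coding.get("code") in table:
            return table[coding["code"]]
    # Fallbacks: free text, then any display string the server supplied.
    if codeable_concept.get("text"):
        return codeable_concept["text"]
    for coding in codings:
        if coding.get("display"):
            return coding["display"]
    return "Unknown code"


creatinine = {"coding": [{"system": "http://loinc.org", "code": "2160-0"}]}
label = describe(creatinine)
```

Resolving codes locally means the model sees "Creatinine, serum or plasma" instead of "2160-0", without a network round trip to a terminology service.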
Layer 2: Reasoning server
The reasoning server runs Python 3.11 on Railway with Starlette and Uvicorn, and exposes 14 tools over Streamable HTTP (the MCP transport Prompt Opinion requires).
Multi-model LLM routing across two providers:
| Task | Model | Why |
|---|---|---|
| Narrative, pattern analysis, causal hypothesis, recommendations | GPT-4o | Prose quality and reasoning depth |
| Turning points, warning report, comorbidity map | Llama-3.3-70b via Groq | Fast structured JSON, lower cost |
| Event correlation, guideline matching | GPT-4o-mini | Accuracy without GPT-4o cost |
| Voice synthesis | OpenAI TTS-1 | Natural prosody |
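One way to express the routing table above in code is a static task-to-model map. The sketch below is illustrative: the task keys, the `Route` type, and `route_for` are hypothetical names, and the model identifier strings (in particular the Groq one) are assumptions about how each provider names these models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    provider: str
    model: str


# Model id strings are assumptions; check each provider's model list.
GPT_4O = Route("openai", "gpt-4o")
GPT_4O_MINI = Route("openai", "gpt-4o-mini")
LLAMA_70B = Route("groq", "llama-3.3-70b-versatile")
TTS = Route("openai", "tts-1")

ROUTES: dict[str, Route] = {
    # Prose quality and reasoning depth
    "narrative": GPT_4O,
    "pattern_analysis": GPT_4O,
    "causal_hypothesis": GPT_4O,
    "recommendations": GPT_4O,
    # Fast structured JSON, lower cost
    "turning_points": LLAMA_70B,
    "warning_report": LLAMA_70B,
    "comorbidity_map": LLAMA_70B,
    # Accuracy without GPT-4o cost
    "event_correlation": GPT_4O_MINI,
    "guideline_matching": GPT_4O_MINI,
    # Natural prosody
    "voice_synthesis": TTS,
}


def route_for(task: str) -> Route:
    """Look up the model for a task, failing loudly on unknown tasks."""
    try:
        return ROUTES[task]
    except KeyError:
        raise KeyError(f"no model route configured for task {task!r}") from None
```

Keeping the map in one place means a model swap is a one-line change and never touches the tool code.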
Pipeline topology: Rather than running 13 tools sequentially, the demo endpoint uses a fan-out / fan-in architecture where five independent reasoning branches execute concurrently via asyncio.gather and asyncio.to_thread:
Sequential head:
    FHIR fetch -> chronological ordering -> recent signal filter
Parallel branches (concurrent):
    A: turning points -> narrative (~9s)
    B: pattern analysis -> warning report (~5s)
    C: event correlation -> causal hypothesis (~6s)
    D: comorbidity mapping (~2s)
    E: guideline matching (~5s)
Sequential tail:
    recommendations -> unified brief assembly
Wall-clock latency dropped from 25-35 seconds when the tools ran sequentially to 14-16 seconds with the parallel topology.
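The fan-out / fan-in shape can be sketched with the two asyncio primitives named above. This is a simplified illustration, not the project's code: `blocking_tool` is a hypothetical stand-in for a synchronous LLM call, the branch durations are the approximate figures from the topology listing, and time is scaled down so that one simulated second takes 20 ms.

```python
import asyncio
import time

SCALE = 0.02  # one simulated second = 20 ms of real time

# Branch name -> (ordered tool steps, approximate total seconds).
BRANCHES = {
    "A": (["turning_points", "narrative"], 9),
    "B": (["pattern_analysis", "warning_report"], 5),
    "C": (["event_correlation", "causal_hypothesis"], 6),
    "D": (["comorbidity_mapping"], 2),
    "E": (["guideline_matching"], 5),
}


def blocking_tool(step: str, upstream: list[str], seconds: float) -> list[str]:
    """Stand-in for a synchronous LLM call; returns the steps run so far."""
    time.sleep(seconds * SCALE)
    return upstream + [step]


async def run_branch(steps: list[str], total_seconds: float) -> list[str]:
    # Steps inside a branch depend on each other, so they run in order.
    # to_thread keeps the blocking call off the event loop.
    output: list[str] = []
    for step in steps:
        output = await asyncio.to_thread(
            blocking_tool, step, output, total_seconds / len(steps)
        )
    return output


async def run_parallel_section() -> tuple[dict[str, list[str]], float]:
    start = time.perf_counter()
    # Fan out: all five branches start together. Fan in: gather waits for all
    # of them and returns results in the order the branches were passed.
    results = await asyncio.gather(
        *(run_branch(steps, secs) for steps, secs in BRANCHES.values())
    )
    return dict(zip(BRANCHES, results)), time.perf_counter() - start
```

Using the listed branch times, running the branches back to back would take about 9 + 5 + 6 + 2 + 5 = 27 seconds, while running them concurrently is bounded by the slowest branch at about 9 seconds. The sequential head and tail account for the rest of the 14 to 16 second total.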
The full pipeline costs approximately $0.09 per analysis. That figure comes from routing structured extraction to Groq and reserving GPT-4o for outputs where quality directly affects clinical credibility.
Layer 3: Agent
ChronoCore is an A2A agent configured on Prompt Opinion with GPT-4.1 as its model. Its system prompt encodes a 13-step reasoning protocol. When a clinician asks the general chat agent for a patient analysis, Prompt Opinion matches the skill via A2A, hands off to ChronoCore, ChronoCore executes the full pipeline by calling the MCP tools, and returns the formatted brief. The FHIR context extension injects workspace credentials into every MCP request automatically.
Both the MCP server and the ChronoCore agent are published to the Prompt Opinion Marketplace and publicly installable by any organization.
Layer 4: Frontend
React, TypeScript, Vite, Tailwind CSS with a custom clinical design system, Framer Motion animations. Four pages: landing, how it works, live demo with animated pipeline trace, admin panel for uploading FHIR bundles. Bundle size: 110 KB gzipped. Two independent GitHub Actions CI/CD pipelines handle backend testing and frontend deployment. 42 pytest tests pass on every push to main.
Challenges we ran into
Prompt engineering for clinical specificity. Early LLM outputs sounded clinical but cited no specific patient data. The prompts went through multiple iterations before reliably producing outputs that reference specific dates, specific lab values, and specific medication changes from the actual record. The breakthrough was an explicit anti-hallucination instruction in every reasoning prompt combined with structured JSON output schemas: "Only reference findings present in the data provided. Do not infer information not present." Generic-sounding outputs are worse than useless in a clinical context because they erode clinician trust in the system.
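To make the combination of grounding instruction and structured output concrete, here is a minimal sketch. Only the quoted instruction text comes from our prompts. The JSON shape, the finding ids, and the post-hoc `ungrounded_claims` check are assumptions added for illustration, and the check shown is one possible way to enforce the rule, not necessarily how the pipeline does it.

```python
import json

GROUNDING_RULE = (
    "Only reference findings present in the data provided. "
    "Do not infer information not present."
)

# Hypothetical output shape: every claim must cite the findings it rests on.
OUTPUT_SHAPE = '{"claims": [{"text": "<string>", "finding_ids": ["<id>"]}]}'


def build_prompt(task: str, findings: list[dict]) -> str:
    """Assemble a reasoning prompt with the grounding rule and the data."""
    return "\n\n".join([
        task,
        GROUNDING_RULE,
        f"Respond with JSON only, shaped like: {OUTPUT_SHAPE}",
        "DATA:\n" + json.dumps(findings, indent=2),
    ])


def ungrounded_claims(raw_response: str, findings: list[dict]) -> list[str]:
    """Return claim texts that cite nothing, or cite ids not in the data."""
    known_ids = {finding["id"] for finding in findings}
    claims = json.loads(raw_response).get("claims", [])
    return [
        claim.get("text", "")
        for claim in claims
        if not claim.get("finding_ids")
        or not set(claim["finding_ids"]) <= known_ids
    ]


# Illustrative findings with made-up ids.
FINDINGS = [
    {"id": "obs-1", "type": "creatinine", "value": 1.3, "unit": "mg/dL"},
    {"id": "obs-2", "type": "blood_pressure", "value": "138/88", "unit": "mmHg"},
]

# A hand-written response standing in for model output.
SAMPLE_RESPONSE = json.dumps({
    "claims": [
        {"text": "Creatinine is 1.3 mg/dL.", "finding_ids": ["obs-1"]},
        {"text": "Patient reports chest pain.", "finding_ids": []},
    ]
})

flagged = ungrounded_claims(SAMPLE_RESPONSE, FINDINGS)
```

Requiring an id per claim turns "sounds clinical" into something checkable: a claim with no matching finding can be dropped or sent back for regeneration.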
Parallelizing the reasoning pipeline without breaking trace recording. The initial sequential implementation ran all 13 tools in order, producing correct results at 25 to 35 seconds. The fan-out topology required careful mapping of data dependencies to identify which branches could run concurrently. The shared trace recorder that logs each tool call's result also needed a mutex guard via asyncio.Lock to prevent concurrent branches from racing when appending entries. Debugging async race conditions in a multi-branch pipeline is not straightforward.
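The locking pattern looks roughly like this. It is a minimal sketch with hypothetical names (`TraceRecorder`, `fake_tool`, `run_branch`), and the `asyncio.sleep(0)` inside the critical section is a stand-in for any awaited work, such as writing to a log sink, that would otherwise let another branch interleave.

```python
import asyncio


class TraceRecorder:
    """Collects one entry per tool call, shared by all branches."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self.entries: list[dict] = []

    async def record(self, branch: str, tool: str, result: str) -> None:
        async with self._lock:
            seq = len(self.entries)
            # Stand-in for awaited work inside the critical section. Without
            # the lock, another branch could run here and reuse the same seq.
            await asyncio.sleep(0)
            self.entries.append(
                {"seq": seq, "branch": branch, "tool": tool, "result": result}
            )


def fake_tool(name: str) -> str:
    """Stand-in for a blocking tool call."""
    return f"{name}: ok"


async def run_branch(recorder: TraceRecorder, branch: str, tools: list[str]) -> None:
    for tool in tools:
        # The blocking call runs in a worker thread; recording happens back
        # on the event loop, where asyncio.Lock applies.
        result = await asyncio.to_thread(fake_tool, tool)
        await recorder.record(branch, tool, result)


async def run_all() -> list[dict]:
    recorder = TraceRecorder()
    await asyncio.gather(
        run_branch(recorder, "A", ["turning_points", "narrative"]),
        run_branch(recorder, "B", ["pattern_analysis", "warning_report"]),
        run_branch(recorder, "D", ["comorbidity_mapping"]),
    )
    return recorder.entries
```

One caveat worth stating: `asyncio.Lock` serializes coroutines on the event loop and is not thread-safe. If trace entries were appended from inside the worker threads themselves, a `threading.Lock` would be the appropriate guard instead.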
Prompt Opinion OAuth credentials. The original architecture used Prompt Opinion's workspace FHIR server as the data source, allowing the SHARP context extension to inject live workspace credentials into every MCP call. We could not generate valid OAuth client credentials in time. The pivot to HAPI public R4 required rewriting the FHIR client auth logic, updating the context middleware, and regenerating all four demo patient bundles against the new endpoint. This cost roughly six hours mid-build.
FHIR R4 edge cases in production. The spec is comprehensive but real-world data quality varies even on a reference implementation. Building a normalizer that handles missing codes, malformed dates, empty resource arrays, and inconsistent reference formats gracefully without crashing took more engineering time than any other single component. In healthcare data pipelines, graceful degradation is non-negotiable.
Accomplishments that we are proud of
The silent deterioration detector behaves as designed across all four clinical archetypes. It identifies the concerning multi-signal pattern in the CKD case, flags the rising UACR in the diabetic nephropathy case, identifies cardiorenal syndrome markers in the CHF case, and returns low risk for the healthy control case. No false positives on the healthy patient. No missed patterns in the designed scenarios. On these four synthetic patients, the core product thesis holds.
14 to 16 seconds for 8 LLM calls. The fan-out / fan-in parallel topology makes this possible. Each of the eight model calls receives exactly the context it needs, nothing more, and runs at exactly the right point in the dependency chain. This is the engineering achievement we are most proud of.
A live production system, not a hackathon demo. Both offerings are publicly listed on the Prompt Opinion Marketplace and installable by any organization. The GitHub repo is open source. The FHIR backend is a real public FHIR server. The frontend is deployed on GitHub Pages with automated CI/CD. Anyone can run a full patient analysis right now at the live URL.
$0.09 per full analysis. Multi-model routing produces a pipeline that costs less than ten cents per run despite 8 LLM calls, making it economically viable at scale without compromising quality on the outputs that matter most.
Brand, design system, and product identity. The custom clinical aesthetic (cream background, deep navy ink, teal accent, Source Serif Pro for display text) was designed to feel like a tool clinicians would actually trust. The animated pipeline trace on the demo page makes the AI reasoning visible and legible to non-technical users.
What we learned
The most valuable AI capability in healthcare is not generation. It is connection. Every piece of data ChronoCare needs already exists in the EHR. The gap is not data availability. The gap is the absence of a system that reads it all together and surfaces the pattern. LLMs are uniquely suited to fill that gap because they can reason across unstructured, heterogeneous, temporally distributed information in a way no rule-based system can.
Speed is a product feature, not an engineering metric. A clinical brief that takes 90 seconds to generate will not be used in a 12-minute appointment. A brief that takes 15 seconds will. The parallelization work was not optimization for its own sake. It was a product requirement.
Multi-agent architecture enables specialization without fragmentation. The A2A pattern, where a generalist agent delegates to a specialist, is the right abstraction for clinical AI. The clinician talks to one interface. The specialist reasoning happens behind it. ChronoCore does not need to be the only agent a clinician uses. It needs to be the best agent for one specific task.
Open standards are a competitive moat, not a constraint. Because ChronoCare speaks MCP and FHIR R4 with no vendor-specific code, it can integrate with any compliant EHR, any compliant agent platform, and any compliant data store. The system we built for a hackathon could in principle be pointed at a real Epic environment by changing configuration rather than code, although authentication and validation against real-world data would still need work. That portability is the product's long-term value proposition.
Shipping with production discipline produces better submissions and better products. 42 tests, two CI/CD pipelines, structured logging with hashed patient IDs, runbooks, and architecture decision records are not overhead. They force clarity about how the system works, catch regressions before they reach production, and demonstrate to anyone evaluating the project that the engineering was taken seriously.
What's next for ChronoCare
Live EHR integration. Testing against Epic's sandbox and Cerner's developer program is the immediate next step. The architecture is already compatible. The normalizer needs validation against real EHR data edge cases before being trusted in a clinical environment.
Specialty-specific reasoning modules. Oncology, psychiatry, and pediatrics each have domain-specific turning points, guideline sets, and deterioration patterns that the current general medicine pipeline does not handle optimally. Dedicated modules for each specialty would significantly increase clinical utility.
Streaming brief output. Delivering each section of the brief as it completes rather than waiting for the full pipeline to finish would make the product feel significantly faster and allow clinicians to start reading the narrative while the recommendations are still generating.
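One possible shape for this, sketched with `asyncio.as_completed`. Nothing here exists in the current system: `generate_section` is a hypothetical stand-in for a reasoning branch, and the delays are the approximate branch times from the pipeline topology scaled down to milliseconds.

```python
import asyncio
from collections.abc import AsyncIterator

# Section name -> simulated generation time in seconds (scaled down).
DELAYS = {
    "narrative": 0.09,
    "warning_report": 0.05,
    "comorbidity_map": 0.02,
}


async def generate_section(name: str, delay: float) -> tuple[str, str]:
    """Stand-in for one reasoning branch producing one brief section."""
    await asyncio.sleep(delay)
    return name, f"<{name} content>"


async def stream_brief(delays: dict[str, float]) -> AsyncIterator[tuple[str, str]]:
    """Yield each section as soon as it finishes, fastest first."""
    tasks = [
        asyncio.create_task(generate_section(name, delay))
        for name, delay in delays.items()
    ]
    # as_completed hands back awaitables in completion order, not start order.
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def collect_order() -> list[str]:
    return [name async for name, _content in stream_brief(DELAYS)]
```

A server could forward each yielded section to the client over a streaming response, so the fastest sections render while slower ones are still generating.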
Longitudinal monitoring mode. A scheduled version that runs every 24 hours per patient, diffs the output against the previous brief, and surfaces only what changed would serve inpatient monitoring teams as a fundamentally different and more powerful product.
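The diffing step might look something like the sketch below. It assumes briefs are stored as plain dictionaries with string and list fields. `diff_briefs` and the sample field values are hypothetical, and a real implementation would likely need fuzzier matching for free-text fields.

```python
def diff_briefs(previous: dict, current: dict) -> dict:
    """Return only the fields that changed between two briefs."""
    changes: dict = {}
    for key in sorted(previous.keys() | current.keys()):
        before, after = previous.get(key), current.get(key)
        if isinstance(before, list) and isinstance(after, list):
            # For list fields, report items that appeared or disappeared.
            added = [item for item in after if item not in before]
            removed = [item for item in before if item not in after]
            if added or removed:
                changes[key] = {"added": added, "removed": removed}
        elif before != after:
            changes[key] = {"before": before, "after": after}
    return changes


# Illustrative briefs from two consecutive runs.
yesterday = {
    "trend": "stable",
    "guideline_gaps": ["<gap 1>"],
    "recommendations": ["<action 1>", "<action 2>"],
}
today = {
    "trend": "worsening",
    "guideline_gaps": ["<gap 1>"],
    "recommendations": ["<action 2>", "<action 3>"],
}

delta = diff_briefs(yesterday, today)
```

In this example only `trend` and `recommendations` appear in the result, which is the "surface only what changed" behavior the monitoring mode is after.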
Self-hosted deployment. A hospital system could deploy ChronoCare entirely inside their own infrastructure, pointed at their internal FHIR server, with no patient data ever leaving their network. The open-source codebase makes self-hosting possible today.
Fine-tuned clinical reasoning model. The current system uses general-purpose LLMs with carefully engineered prompts. A model fine-tuned specifically on clinical reasoning tasks with verified outputs would improve consistency, reduce hallucination risk, and eliminate the prompt engineering overhead that currently requires domain expertise to maintain.