ChronoCare
From scattered data to clinical clarity in seconds.
Inspiration
Every clinician knows the feeling. A patient arrives in the ER. Five years of data sits in the EHR across dozens of visits, three different providers, hundreds of lab results. The chart has the answer. Nobody has time to read the chart.
Electronic health records are excellent at answering one question: what is happening to this patient right now? Today's vitals. Today's labs. Today's medications. The snapshot is detailed and immediate.
But they systematically miss two questions that matter just as much:
What has been happening? A patient's creatinine drifts from 0.9 to 1.3 mg/dL over 36 months. Each individual result is flagged normal, inside the reference range, no alert fires. Three different clinicians saw three different snapshots. None of them saw the slope. The trajectory. The cascade in progress.
What is coming? Hypertension diagnosed in 2019 became Stage 2 CKD by 2022, but no system flagged the cardiometabolic cascade developing across those three years. A 12-minute appointment does not leave time to read hundreds of records. The chart has the answer. Nobody has time to read the chart.
Diagnostic errors are the third leading cause of death in the United States. Handoff failures cause 80% of serious medical errors. These are not technology failures. They are information architecture failures. The data exists. The connections do not get made.
This is the problem that generative AI and large language models are uniquely positioned to solve. Not because AI can replace clinicians, but because AI can do one thing clinicians structurally cannot: read five years of scattered records, reason across all of them simultaneously, and surface the pattern before it becomes a crisis.
ChronoCare was built to prove that thesis in production.
What it does
ChronoCare is a clinical reasoning engine built on generative AI and agentic architecture. It answers four questions no EHR answers today, in a single conversation, in under 15 seconds.
1. What happened to this patient? ChronoCare fetches a patient's complete FHIR record, normalizes every condition, lab, medication, and encounter into a unified chronological timeline, identifies the 3 to 5 clinical turning points that changed the patient's trajectory, and generates a 200 to 300 word clinical narrative written the way an experienced physician would brief an incoming colleague.
2. What is quietly going wrong right now? The silent deterioration detector analyzes recent signals holistically across vitals, labs, and clinical notes together. Blood pressure at 138/88 is technically normal. Creatinine at 1.3 is technically normal. Fatigue mentioned twice in recent notes is easy to miss. Together, they tell a different story. The LLM reasons across weak signals the way an experienced clinician would, flagging the pattern before any individual alarm fires. This is the core AI differentiator: no rule-based system can reason across multiple individually-normal values to detect an emerging pattern.
3. Why did this happen? The root cause analyzer uses generative AI to correlate events across the patient's full timeline, identifies plausible causal pairs with confidence levels and clinical rationale, and synthesizes a causal hypothesis explaining how the patient arrived at their current state.
4. What should we do next? Recommendations are generated as 3 to 5 patient-specific actions cross-checked against ADA, JNC, KDIGO, and ACC/AHA clinical guidelines. Every recommendation cites specific findings from this patient's actual data. No generic advice.
The full pipeline produces a structured clinical brief covering all four layers: narrative, early warning risk score with trend direction, causal hypothesis, comorbidity map, guideline gaps, and prioritized recommendations.
ChronoCare is accessible through two surfaces. A custom React web app at sriksven.github.io/chronocare where anyone can run a full analysis on four synthetic demo patients. And a ChronoCore A2A agent on the Prompt Opinion platform, where clinicians can request a patient analysis in plain language inside their existing clinical workspace.
How we built it
ChronoCare is four replaceable layers held together by open standards, with generative AI at the core of every reasoning step.
AI architecture
The system runs 8 LLM calls per pipeline, routed across two providers based on task characteristics:
| Task | Model | Reasoning |
|---|---|---|
| Clinical narrative generation | GPT-4o | Best prose quality for human-readable output |
| Silent deterioration pattern analysis | GPT-4o | Most critical reasoning step, needs best model |
| Causal hypothesis synthesis | GPT-4o | Nuanced multi-step causal reasoning |
| Actionable recommendations | GPT-4o | Consequential output, needs accuracy and depth |
| Clinical turning point extraction | Llama-3.3-70b via Groq | Structured JSON extraction, speed matters |
| Early warning report formatting | Llama-3.3-70b via Groq | Structured output, low latency |
| Comorbidity relationship mapping | Llama-3.3-70b via Groq | Well-defined clinical relationships |
| Guideline cross-checking | GPT-4o-mini | Accuracy without GPT-4o cost |
| Voice synthesis | OpenAI TTS-1 | Natural prosody for hands-free clinical use |
Every LLM reasoning prompt includes an explicit anti-hallucination instruction: "Only reference findings present in the data provided. Do not infer information not present." Temperature is set to 0.2 for reasoning steps (consistency) and 0.5 for narrative generation (readable prose).
Agentic architecture
ChronoCare is built on two open agentic standards:
MCP (Model Context Protocol): The reasoning server exposes 14 tools over Streamable HTTP. Any compliant agent platform can call these tools without custom integration code. The tools are composable and independently callable.
A2A (Agent-to-Agent): The ChronoCore agent on Prompt Opinion uses A2A to receive handoffs from a general-purpose chat agent. When a clinician asks for a patient analysis, the general agent recognizes the intent, hands off to ChronoCore, ChronoCore executes the 13-step pipeline by calling the MCP tools in deliberate order, and returns the formatted brief. This is multi-agent collaboration working in production.
Pipeline topology
The HTTP demo endpoint runs the pipeline as a fan-out / fan-in topology rather than a flat sequential chain. After the sequential head (FHIR fetch and chronological ordering), five independent reasoning branches execute concurrently via asyncio.gather and asyncio.to_thread:
Sequential head:
get_full_patient_history -> order_events_chronologically -> get_recent_signals
Parallel branches (concurrent):
A: identify_turning_points -> generate_narrative (~9s)
B: analyze_weak_patterns -> generate_early_warning (~5s)
C: correlate_events -> generate_causal_hypothesis (~6s)
D: map_comorbidities (~2s)
E: match_clinical_guidelines (~5s)
Sequential tail:
generate_recommendations -> generate_unified_brief
This topology dropped wall-clock latency from a sequential 25 to 35 seconds to a parallel 14 to 16 seconds warm.
Data layer
All patient data is synthetic. Four demo patients cover distinct clinical archetypes: a 62-year-old male with hypertension progressing to CKD, a 58-year-old female with Type 2 diabetes and early diabetic nephropathy, a 71-year-old male with CHF and cardiorenal syndrome, and a 45-year-old female as a control case. All four patients live on the HAPI public FHIR R4 sandbox. No PHI anywhere in the system.
FHIR resources are normalized using local lookup tables for 37 LOINC codes, 32 ICD-10 codes, and 30 RxNorm codes before reaching any LLM, converting raw clinical codes into human-readable descriptions.
Frontend and infrastructure
The React app was built with Vite, TypeScript, Tailwind CSS, and Framer Motion. Four pages: landing, how it works, live demo with animated pipeline trace, and admin panel for uploading FHIR bundles. Two independent GitHub Actions CI/CD pipelines handle backend testing and frontend deployment automatically. 42 pytest tests pass on every push to main.
Challenges we ran into
Prompt engineering for clinical specificity. Early LLM outputs sounded clinical but cited no specific patient data. The prompts went through multiple iterations before reliably producing outputs that reference specific dates, specific lab values, and specific medication changes from the actual record. Generic-sounding outputs are worse than useless in a clinical context because they erode trust. The anti-hallucination instruction and structured output schemas were the breakthrough.
Parallelizing the reasoning pipeline. The initial sequential implementation produced correct results at 25 to 35 seconds. The fan-out topology required careful mapping of data dependencies between all 13 tools to identify which branches could run concurrently without waiting for upstream outputs. The shared trace recorder also needed a mutex guard via asyncio.Lock to prevent concurrent branches from racing when appending entries.
Prompt Opinion OAuth credentials. The original plan used Prompt Opinion's workspace FHIR server, which would have allowed live workspace credentials to be injected into every MCP call via the SHARP context extension. We could not generate valid OAuth client credentials in time. The pivot to HAPI public R4 required rewriting the FHIR client's auth logic to treat the token as optional and updating the context middleware to fall back gracefully to environment variables. This cost roughly six hours mid-build.
Learning FHIR R4 during a build sprint. The spec is comprehensive and real-world data quality varies even on a reference implementation. Handling missing codes, malformed dates, empty resource arrays, and inconsistent reference formats gracefully took more engineering time than any other single component.
Accomplishments that we are proud of
The silent deterioration detector works. The core thesis of ChronoCare is that LLMs can reason holistically across multiple individually-normal signals to detect an emerging pattern. In testing across all four demo patients, the detector correctly identifies the concerning multi-signal pattern in the CKD case, correctly flags the rising UACR in the diabetic nephropathy case, correctly identifies the cardiorenal syndrome markers in the CHF case, and correctly returns low risk for the healthy control case. No false positives. No missed patterns in the designed scenarios.
8 LLM calls in 14 to 16 seconds. The fan-out / fan-in parallel topology makes this possible. Each of the eight models is called exactly once, with exactly the context it needs, at exactly the right point in the dependency chain.
A production system, not a prototype. ChronoCare is fully live. The MCP server and ChronoCore A2A agent are published to the Prompt Opinion Marketplace and publicly installable by any organization. The React frontend is deployed on GitHub Pages. All four patients are analyzable right now.
Multi-model routing that reduces cost without reducing quality. The full pipeline costs approximately $0.09 per analysis by routing structured extraction tasks to Groq Llama-3.3-70b and reserving GPT-4o for the four outputs where prose quality and reasoning depth directly affect clinical credibility.
42 tests and two CI/CD pipelines. Shipping with full test coverage and automated deployment demonstrates that production-grade engineering discipline matters regardless of the project's origin.
What we learned
The specific value of generative AI in healthcare is cross-signal reasoning. Rule-based systems are genuinely good at single-threshold alerts. They fail completely at reasoning across multiple individually-normal signals that together indicate a pattern. That is the specific capability gap where LLMs are irreplaceable, and it is worth building around that gap explicitly rather than using AI as a generic enhancement.
Multi-agent architecture enables specialization without fragmentation. The A2A pattern, where a generalist agent delegates to a specialist agent, is the right abstraction for clinical AI. The clinician talks to one interface. The specialist reasoning happens behind it. Neither system needs to know the other exists in detail.
Temperature is a more powerful lever than model size. Temperature 0.2 for reasoning steps and 0.5 for narrative generation produced more consistent and clinically appropriate outputs than switching between GPT-4o and GPT-4o-mini for the same task. Deterministic reasoning, expressive narrative.
Open standards create real optionality. Because the server speaks MCP and FHIR R4 with no vendor-specific code, the entire FHIR backend can be swapped by changing one environment variable. We exercised this in production when we switched from Prompt Opinion's workspace FHIR to HAPI public sandbox mid-build. The rest of the system did not change.
Graceful degradation is non-negotiable in healthcare data pipelines. Every FHIR resource fetch must handle 404s, empty arrays, malformed codes, and missing dates without crashing. Every LLM call must have a retry policy and a fallback response. In a clinical context, a system that crashes on missing data is worse than a system that returns a partial result with a clear note about what was missing.
What's next for ChronoCare
Live EHR integration. The architecture already supports any FHIR R4 endpoint. The immediate next step is testing against Epic's sandbox environment and Cerner's developer program to validate that the normalizer handles real EHR data edge cases correctly.
Specialty-specific reasoning modules. The current pipeline is designed for general medicine with a focus on cardiometabolic conditions. Oncology, psychiatry, and pediatrics each warrant dedicated reasoning modules with domain-specific turning points, guideline sets, and deterioration patterns.
Streaming brief output. Streaming each section as it completes, narrative first, then early warning, then recommendations, would make the latency feel significantly shorter even if wall-clock time is unchanged.
Longitudinal monitoring. A scheduled version that runs every 24 hours per patient, diffs the output against the previous brief, and surfaces only what changed would serve inpatient monitoring teams differently and more powerfully than the current on-demand model.
Federated deployment. A hospital system could deploy ChronoCare inside their own infrastructure, pointed at their internal FHIR server, with no patient data ever leaving their network. The open-source codebase makes self-hosting possible today. The Prompt Opinion marketplace listing makes discovery easy.
Fine-tuned clinical reasoning model. The current system uses general-purpose LLMs with carefully engineered prompts. A model fine-tuned on clinical reasoning tasks with verified outputs would improve consistency and reduce the prompt engineering burden significantly. This is the long-term research direction.
Log in or sign up for Devpost to join the conversation.