ChronoCare
From scattered data to clinical clarity in seconds.
Inspiration
Every clinician knows the feeling. A patient arrives in the ER. Five years of data sits in the EHR across dozens of visits, three different providers, hundreds of lab results. The chart has the answer. Nobody has time to read the chart.
Electronic health records are excellent at answering one question: what is happening to this patient right now? Today's vitals. Today's labs. Today's medications. The snapshot is detailed and immediate.
But they systematically miss two questions that matter just as much:
What has been happening? A patient's creatinine drifts from 0.9 to 1.3 mg/dL over 36 months. Each individual result is flagged normal: it sits inside the reference range, so no alert fires. Three different clinicians saw three different snapshots. None of them saw the slope, the trajectory, the cascade in progress.
What is coming? Hypertension diagnosed in 2019 became Stage 2 CKD by 2022, but no system flagged the cardiometabolic cascade developing across those three years. A 12-minute appointment does not leave time to read hundreds of records. The chart has the answer. Nobody has time to read the chart.
Medical error has been estimated to be the third leading cause of death in the United States, and the Joint Commission has estimated that about 80% of serious medical errors involve miscommunication during patient handoffs. These are not technology failures. They are information architecture failures. The data exists. The connections do not get made.
As a student studying AI and data science, this felt like exactly the kind of problem worth building toward. Not a toy demo. A real system, on real standards, solving a real gap.
What it does
ChronoCare is a clinical reasoning engine that fills the two missing tenses of patient care. It answers four questions no EHR answers today, in a single conversation, in roughly 15 seconds on a warm server.
1. What happened to this patient? ChronoCare fetches a patient's complete FHIR record, normalizes every condition, lab, medication, and encounter into a unified chronological timeline, identifies the 3 to 5 clinical turning points that changed the patient's trajectory, and generates a 200 to 300 word clinical narrative written the way an experienced physician would brief an incoming colleague.
2. What is quietly going wrong right now? The silent deterioration detector analyzes recent signals holistically across vitals, labs, and clinical notes together. Blood pressure at 138/88 sits just under the common 140/90 threshold. Creatinine at 1.3 mg/dL is still technically in range. Fatigue mentioned twice in recent notes is easy to miss. Together, they tell a different story. ChronoCare reasons across weak signals the way an experienced clinician would, and flags the pattern before any individual alarm fires.
3. Why did this happen? The root cause analyzer correlates events across the patient's timeline, identifies plausible causal pairs with confidence levels and clinical rationale, and synthesizes a causal hypothesis explaining how the patient arrived at their current state.
4. What should we do next? Recommendations are generated as 3 to 5 patient-specific actions cross-checked against ADA, JNC, KDIGO, and ACC/AHA guidelines. Every recommendation cites specific findings from this patient's data. No generic advice.
The output is a structured clinical brief that answers all four questions in six sections: narrative, early warning risk score, causal hypothesis, comorbidity map, guideline gaps, and prioritized recommendations. The full pipeline runs in 14 to 16 seconds on a warm server.
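To make the "every value is normal, the slope is not" idea concrete, here is a minimal numeric sketch. It is not how ChronoCare detects deterioration (that reasoning is done by the LLM pipeline described below); it only illustrates the gap that single-threshold alerts leave open. The creatinine series and the 1.3 mg/dL upper limit are hypothetical values chosen to match the 0.9 to 1.3 mg/dL example, and `slope_per_year` is an illustrative helper name.

```python
from datetime import date


def slope_per_year(points: list[tuple[date, float]]) -> float:
    """Ordinary least-squares slope of value against time, in units per year."""
    if len(points) < 2:
        raise ValueError("need at least two results to fit a trend")
    t0 = min(d for d, _ in points)
    xs = [(d - t0).days / 365.25 for d, _ in points]
    ys = [v for _, v in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all results share the same date")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical creatinine results in mg/dL spanning 36 months.
creatinine = [
    (date(2021, 1, 1), 0.9),
    (date(2022, 1, 1), 1.0),
    (date(2023, 1, 1), 1.2),
    (date(2024, 1, 1), 1.3),
]
UPPER_LIMIT = 1.3  # assumed upper reference limit, for illustration only

# A threshold alert looks at each value alone and stays silent.
all_in_range = all(value <= UPPER_LIMIT for _, value in creatinine)
# A trend view looks at the whole series and sees a steady rise.
trend = slope_per_year(creatinine)

if __name__ == "__main__":
    print(f"every result in range: {all_in_range}")
    print(f"trend: {trend:+.2f} mg/dL per year")
```

A per-result check reports nothing here, while the fitted trend shows a steady rise across the same data.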
ChronoCare is accessible through two surfaces: a custom React web app at sriksven.github.io/chronocare, where anyone can run a full analysis on four synthetic demo patients, and a ChronoCore A2A agent on the Prompt Opinion platform, where clinicians can request a patient analysis in plain language inside their existing workspace.
How we built it
ChronoCare is four replaceable layers held together by open standards.
Data layer
All patient data is synthetic. Four demo patients were generated programmatically using hand-crafted FHIR R4 transaction bundles covering distinct clinical archetypes: a 62-year-old male with hypertension progressing to CKD, a 58-year-old female with Type 2 diabetes and early diabetic nephropathy, a 71-year-old male with CHF and cardiorenal syndrome, and a 45-year-old female as a control case with prediabetes caught early. All four patients live on the HAPI public FHIR R4 sandbox. No PHI anywhere in the system.
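For readers new to FHIR, a transaction bundle is a single JSON document that creates several linked resources in one request. The sketch below builds a tiny one in Python. It is an illustration of the bundle shape only, not one of the actual demo bundles: the birth date, onset date, and the `make_transaction_bundle` helper are assumptions made for the example.

```python
import json
import uuid


def make_transaction_bundle(entries: list[tuple[str, dict]]) -> dict:
    """Wrap (fullUrl, resource) pairs in a FHIR R4 transaction Bundle."""
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [
            {
                "fullUrl": full_url,
                "resource": resource,
                # POST to the resource type asks the server to create it.
                "request": {"method": "POST", "url": resource["resourceType"]},
            }
            for full_url, resource in entries
        ],
    }


# Temporary urn:uuid identifier so other entries can point at the patient
# before the server has assigned a real ID.
patient_ref = f"urn:uuid:{uuid.uuid4()}"

patient = {
    "resourceType": "Patient",
    "name": [{"family": "Doe", "given": ["John"]}],
    "gender": "male",
    "birthDate": "1963-01-01",  # hypothetical
}
condition = {
    "resourceType": "Condition",
    "subject": {"reference": patient_ref},
    "code": {
        "coding": [
            {
                "system": "http://hl7.org/fhir/sid/icd-10-cm",
                "code": "I10",
                "display": "Essential (primary) hypertension",
            }
        ]
    },
    "onsetDateTime": "2019-01-01",  # hypothetical
}

bundle = make_transaction_bundle(
    [(patient_ref, patient), (f"urn:uuid:{uuid.uuid4()}", condition)]
)

if __name__ == "__main__":
    print(json.dumps(bundle, indent=2))
```

Posting a bundle like this to a FHIR server's base URL creates both resources together, and the server rewrites the `urn:uuid:` reference to the patient's assigned ID.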
Reasoning server
The MCP server is Python 3.11 on Railway, built with Starlette and Uvicorn, exposing 14 tools over Streamable HTTP. The pipeline uses a fan-out / fan-in topology where independent reasoning branches run concurrently via asyncio.gather and asyncio.to_thread. This dropped wall-clock latency from a sequential 25 to 35 seconds to a parallel 14 to 16 seconds on warm runs.
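The fan-out / fan-in shape can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the actual server code: `fetch_timeline` and `blocking_branch` are hypothetical stand-ins that sleep instead of calling FHIR or an LLM, and the three branch names are placeholders.

```python
import asyncio
import time


def fetch_timeline(patient_id: str) -> list[str]:
    # Stand-in for the blocking FHIR fetch and normalization step.
    time.sleep(0.05)
    return [f"{patient_id}: event {i}" for i in range(3)]


def blocking_branch(name: str, timeline: list[str]) -> dict:
    # Stand-in for a synchronous LLM client call.
    time.sleep(0.1)
    return {"branch": name, "events_seen": len(timeline)}


async def run_pipeline(patient_id: str) -> dict:
    # Shared prerequisite: every branch needs the timeline first.
    timeline = await asyncio.to_thread(fetch_timeline, patient_id)

    # Fan-out: independent branches run in worker threads at the same time.
    branches = ["narrative", "early_warning", "root_cause"]
    results = await asyncio.gather(
        *(asyncio.to_thread(blocking_branch, name, timeline) for name in branches)
    )

    # Fan-in: merge the branch outputs into one brief.
    return {result["branch"]: result for result in results}


if __name__ == "__main__":
    start = time.perf_counter()
    brief = asyncio.run(run_pipeline("demo-patient"))
    print(sorted(brief), f"{time.perf_counter() - start:.2f}s")
```

`asyncio.to_thread` keeps blocking client calls off the event loop, and `asyncio.gather` waits for all branches, so the branch stage takes roughly as long as the slowest branch rather than the sum of all of them.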
Eight LLM calls run per pipeline, routed across two providers based on task type:
| Task | Model | Reason |
|---|---|---|
| Clinical narrative, weak pattern analysis, causal hypothesis, recommendations | GPT-4o | Prose quality and deep reasoning |
| Turning points, warning report, comorbidity map | Llama-3.3-70b via Groq | Fast structured JSON, lower cost |
| Event correlation, guideline matching | GPT-4o-mini | Accuracy without GPT-4o cost |
| Voice synthesis | OpenAI TTS-1 | Natural prosody for hands-free use |
Multi-provider routing also provides partial-failure resilience. A transient OpenAI outage does not kill the four Groq-backed tools.
FHIR resources are normalized using local lookup tables for 37 LOINC codes, 32 ICD-10 codes, and 30 RxNorm codes, converting raw clinical codes into human-readable descriptions before they reach any LLM. Patient IDs are SHA-256 hashed before logging so they never appear in server logs.
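As an illustration of both steps, here is a minimal sketch of a code lookup and a log-safe patient identifier. The table contents are a tiny sample, and the function names and the 12-character truncation are assumptions for the example rather than the project's actual implementation.

```python
import hashlib

# Tiny sample tables; the real tables are described above as covering
# 37 LOINC, 32 ICD-10, and 30 RxNorm codes.
TABLES: dict[str, dict[str, str]] = {
    "loinc": {
        "2160-0": "Creatinine in serum or plasma",
        "4548-4": "Hemoglobin A1c",
    },
    "icd10": {
        "I10": "Essential (primary) hypertension",
        "N18.2": "Chronic kidney disease, stage 2",
    },
}


def describe(system: str, code: str) -> str:
    """Turn a raw clinical code into text an LLM (or a human) can read."""
    table = TABLES.get(system, {})
    # Degrade gracefully: keep the raw code visible when it is not in the table.
    return table.get(code, f"Unknown {system} code {code}")


def log_safe_id(patient_id: str) -> str:
    """SHA-256 digest of the patient ID, shortened for readable log lines."""
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    return digest[:12]  # truncation length is an arbitrary choice here


if __name__ == "__main__":
    print(describe("loinc", "2160-0"))
    print(describe("icd10", "Z99.9"))
    print(f"analysis started patient={log_safe_id('example-patient-id')}")
```

The same ID always hashes to the same value, so log lines for one patient can still be correlated without the raw ID appearing anywhere.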
Agent layer
ChronoCore is an A2A agent configured on Prompt Opinion with GPT-4.1 as its model. Its system prompt encodes a 13-step reasoning protocol that calls the MCP tools in deliberate order, building context at each step. The FHIR context extension injects workspace credentials into every MCP request automatically.
Frontend
The React app was built with Vite, TypeScript, Tailwind CSS, and Framer Motion. It has four pages: a landing page, a plain-English explainer of how it works, a live demo runner with animated pipeline trace, and an admin panel for uploading new FHIR bundles. The frontend calls the backend directly via a public /api/demo/analyze endpoint, so no authentication is required to run a demo analysis.
Infrastructure
Two independent GitHub Actions pipelines handle CI/CD. The backend pipeline runs 42 pytest tests and three live smoke checks on every push to main. The frontend pipeline runs TypeScript checks, Vite builds, and deploys to GitHub Pages automatically.
Challenges we ran into
Prompt Opinion OAuth credentials. The original plan used Prompt Opinion's workspace FHIR server as the data source. We could not generate valid OAuth client credentials in time. The pivot to HAPI public R4 sandbox required rewriting the FHIR client's auth logic to treat the token as optional, updating the context middleware to fall back gracefully to environment variables, and regenerating all four demo patient bundles against the new endpoint. This cost roughly six hours.
Parallelizing the reasoning pipeline. The initial implementation ran all 13 tools sequentially, producing correct results but at 25 to 35 seconds of wall-clock latency. The fan-out topology required careful analysis of data dependencies between tools to identify which branches could run concurrently. The trace recorder also needed a mutex guard to prevent concurrent branches from racing when appending entries to the shared trace list.
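A minimal sketch of a lock-guarded trace recorder is shown below. The class and field names are hypothetical. It uses `threading.Lock` on the assumption that branches run in worker threads (as they would under `asyncio.to_thread`), where two branches can otherwise interleave between reading the list length and appending.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class TraceRecorder:
    """Collects trace entries from concurrent branches without races."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: list[dict] = []

    def record(self, tool: str, status: str) -> None:
        entry = {"tool": tool, "status": status, "at": time.time()}
        with self._lock:
            # Read-then-append must be atomic so sequence numbers stay unique.
            entry["seq"] = len(self._entries)
            self._entries.append(entry)

    def snapshot(self) -> list[dict]:
        with self._lock:
            return list(self._entries)


if __name__ == "__main__":
    trace = TraceRecorder()

    def branch(name: str) -> None:
        for _ in range(100):
            trace.record(name, "ok")

    with ThreadPoolExecutor(max_workers=5) as pool:
        list(pool.map(branch, [f"branch-{i}" for i in range(5)]))

    print(len(trace.snapshot()), "entries recorded")
```

Returning a copy from `snapshot` means a reader can iterate over the trace while branches are still appending to it.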
Prompt engineering for clinical specificity. Early versions of the narrative and recommendation prompts produced outputs that sounded clinical but cited no specific data. The prompts went through multiple iterations before reliably producing outputs that reference specific dates, specific lab values, and specific medication changes from the patient's actual record. The key was an explicit anti-hallucination instruction in every reasoning prompt: "Only reference findings present in the data provided."
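One way to make that instruction checkable is to number the findings in the prompt and then verify that the model only cites numbers that exist. The sketch below shows the idea. Only the quoted grounding sentence comes from our prompts; the numbering scheme, the citation check, and the function names are illustrative assumptions.

```python
import re

GROUNDING_RULE = "Only reference findings present in the data provided."


def build_prompt(task: str, findings: list[str]) -> str:
    """Assemble a prompt whose data section is an explicit numbered list."""
    if not findings:
        raise ValueError("refusing to build a prompt with no patient data")
    numbered = "\n".join(
        f"[{i}] {text}" for i, text in enumerate(findings, start=1)
    )
    return (
        f"{task}\n\n"
        f"{GROUNDING_RULE} Cite each finding by its bracketed number.\n\n"
        f"Patient data:\n{numbered}"
    )


def invalid_citations(output: str, finding_count: int) -> set[int]:
    """Return cited numbers that do not correspond to any supplied finding."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", output)}
    return {n for n in cited if not 1 <= n <= finding_count}


if __name__ == "__main__":
    findings = [
        "2021-01-01 creatinine 0.9 mg/dL",
        "2024-01-01 creatinine 1.3 mg/dL",
    ]
    print(build_prompt("Summarize the renal trend in two sentences.", findings))
    print(invalid_citations("Creatinine rose from [1] to [2]; see also [5].", 2))
```

A check like this cannot prove the narrative is correct, but it does catch references to findings that were never in the prompt.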
Learning FHIR from scratch. FHIR R4 is a comprehensive standard and learning it during a build sprint was genuinely challenging. Handling missing codes, malformed dates, empty resource arrays, and inconsistent reference formats gracefully took more time than any other single component and is probably the most underestimated part of any real healthcare integration project.
Accomplishments that we are proud of
A production system, not a demo. ChronoCare is fully live. The MCP server and the ChronoCore A2A agent are published to the Prompt Opinion Marketplace and publicly installable. The React frontend is deployed on GitHub Pages. All four patients are analyzable right now at the live URL.
The parallel pipeline topology. The fan-out / fan-in design that runs five concurrent reasoning branches via asyncio is the piece of engineering we are most proud of. It required understanding the exact data dependencies between all 13 tools and is what makes 14 to 16 second warm latency possible despite 8 LLM calls.
Multi-model routing that makes sense. The GPT-4o and Groq Llama-3.3-70b split is not arbitrary. GPT-4o handles the tasks where output quality directly affects clinical credibility. Llama-3.3-70b handles structured JSON extraction tasks where speed matters more than prose quality. The full pipeline costs approximately $0.09 per analysis.
Four clinical archetypes that tell different stories. John Doe demonstrates a slow cardiometabolic cascade. Maria Rodriguez shows diabetes management with emerging nephropathy. Robert Chen presents complex cardiorenal syndrome. Sarah Williams is a control case showing ChronoCare does not over-flag healthy patients.
42 passing tests and two CI/CD pipelines. As a student project, shipping with an automated test suite and automated deployment felt like the right way to demonstrate that production-grade engineering matters regardless of the context.
What we learned
The specific value of AI in healthcare is cross-signal reasoning. Rule-based systems are genuinely good at single-threshold alerts. Where they fail completely is in reasoning across multiple individually-normal signals that together indicate a pattern. That is the specific gap where LLMs add irreplaceable value in clinical settings, and it is where we focused the most engineering effort.
FHIR is both the right standard and a genuinely difficult one. The R4 spec is comprehensive but real-world data quality on even a reference implementation like HAPI varies significantly. Graceful degradation matters enormously in healthcare data pipelines.
Sampling temperature matters more than model choice for reliability. Using temperature 0.2 for reasoning steps and 0.5 for narrative generation made a larger difference to output consistency than switching between model sizes. Low-temperature reasoning steps cite nearly the same data points on repeated runs, while the higher-temperature narrative steps avoid sounding like templates.
Open standards make architecture genuinely replaceable. Because the reasoning server speaks MCP and FHIR R4 with no vendor-specific code, the FHIR backend can be pointed from HAPI public to another FHIR R4 endpoint, such as an Epic or Cerner server, by changing one environment variable and supplying credentials where the server requires them. We actually exercised this when we switched FHIR backends mid-build.
Shipping a real system is a different skill from building a prototype. CI/CD, structured logging, graceful error handling, cold start mitigation, and documentation are not extras. They are what separates something that works once in a demo from something that works reliably when a judge clicks the link.
What's next for ChronoCare
Live EHR integration. The architecture already supports any FHIR R4 endpoint. The next step is testing against Epic's sandbox and Cerner's developer program to validate that the normalizer handles real EHR data edge cases correctly.
Specialty-specific reasoning modules. The current pipeline is designed for general medicine with a focus on cardiometabolic conditions. Oncology, psychiatry, and pediatrics each warrant dedicated reasoning modules with domain-specific turning points and guideline sets.
Streaming brief output. Streaming each section as it completes, narrative first, then early warning, then recommendations, would make the latency feel much shorter even if the wall-clock time is the same.
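A possible shape for this, sketched with `asyncio.as_completed`, is shown below. Nothing here exists in ChronoCare yet: `build_section` is a hypothetical stand-in that sleeps instead of calling a model, and the delays are arbitrary. This version yields sections in completion order; enforcing a fixed display order would need a small buffer on top.

```python
import asyncio
from collections.abc import AsyncIterator


async def build_section(name: str, delay: float) -> tuple[str, str]:
    # Stand-in for one reasoning branch finishing after some time.
    await asyncio.sleep(delay)
    return name, f"{name} ready"


async def stream_brief() -> AsyncIterator[tuple[str, str]]:
    tasks = [
        asyncio.create_task(build_section("recommendations", 0.03)),
        asyncio.create_task(build_section("narrative", 0.01)),
        asyncio.create_task(build_section("early_warning", 0.02)),
    ]
    # Yield each section as soon as its task finishes, not when all are done.
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def collect() -> list[str]:
    return [name async for name, _ in stream_brief()]


if __name__ == "__main__":
    async def main() -> None:
        async for name, text in stream_brief():
            print(f"{name}: {text}")

    asyncio.run(main())
```

The total wall-clock time is unchanged, but the first section can be rendered while the slower branches are still running.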
Longitudinal tracking. A scheduled version that runs every 24 hours per patient, diffs the output against the previous brief, and surfaces only what changed would be a materially different product for inpatient monitoring teams.
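The diffing step could start as simply as the sketch below. The brief field names and `diff_briefs` are hypothetical, and the comparison is deliberately limited to structured fields, on the assumption that free-text narrative would differ between runs even when nothing clinical has changed.

```python
def diff_briefs(previous: dict, current: dict) -> dict:
    """Return only the fields whose values changed between two briefs."""
    changes = {}
    for key in sorted(previous.keys() | current.keys()):
        before, after = previous.get(key), current.get(key)
        if before != after:
            changes[key] = {"before": before, "after": after}
    return changes


if __name__ == "__main__":
    # Hypothetical structured fields from two consecutive daily briefs.
    yesterday = {"risk_score": 0.42, "guideline_gaps": ["annual eye exam"]}
    today = {
        "risk_score": 0.57,
        "guideline_gaps": ["annual eye exam"],
        "new_flags": ["rising creatinine trend"],
    }
    print(diff_briefs(yesterday, today))
```

Unchanged fields drop out entirely, so a monitoring team would see only what moved since the previous run.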
Federated deployment. A hospital system could deploy ChronoCare inside their own infrastructure, pointed at their internal FHIR server, with no patient data ever leaving their network. The open-source codebase makes self-hosting possible today.