Inspiration

This came directly out of my experience building healthcare agents. The same gap kept showing up at every clinic and on every project: the patient's data is there (labs, wearables, EHRs, genetics, medications), but it's spread across a dozen vendors that each name the same thing differently and use different units. So the clinician spends the visit translating data instead of practicing medicine, and any agent built on top of that mess is forced to either guess or refuse to answer. It's one of the biggest unsolved gaps in clinical AI right now.

The gap hits hardest where it matters most — in preventative, proactive care. Longevity, functional, and concierge medicine are the cleanest example: the whole point is to catch disease decades before it shows up by reading across labs, wearables, and genetics together. But that only works if the data layer actually delivers a unified view, and right now it doesn't. So clinicians end up doing intake at the speed of paperwork, the proactive signal gets lost, and the patient — who's paying out of pocket for the clinician's attention — gets a visit that feels reactive.

So I built Longevity Copilot for Prompt Opinion's Agents Assemble — The Healthcare AI Endgame challenge: a multi-agent system that fixes the data layer first, so the reasoning layer can be trusted and proactive care can run at the speed of conversation.

What it does

A longevity, functional, or concierge clinician opens the patient and asks one question in plain English. Behind the scenes, Longevity Copilot:

  1. Pulls normalized data through the MCP server — biomarkers from 18 lab vendors, streams from 55 wearables, records from 19 EHRs, all mapped to FHIR R4 + LOINC + UCUM.
  2. Routes the question through the Orchestrator agent (A2A v1) to the right specialists — Lab Interpreter, Pattern Detective, Genomic Counselor, Drug-Supplement Check, Wearable Correlator, Trend Analyst, Clinical Educator, Report Generator, ED-Copilot.
  3. Runs deterministic clinical math on the data plane — HOMA-IR, ACC/AHA ASCVD, CKD-EPI 2021, FIB-4, FINDRISC, BMI/BSA — with literature citations built in.
  4. Synthesizes one brief that names the dominant pattern, lists findings by body system with color-coded verdicts, ties in genetics and wearable correlation, and proposes a three-protocol plan with doses and a 90-day follow-up panel.
  5. Renders a real PDF through the Report Generator. Six pages, IBM Carbon–styled, customizable in plain English: drop a section, change the brand color, rebrand for the practice, shorten for a same-day visit — and the next brief comes back exactly the way you asked.
  6. Writes back to the chart as a FHIR DiagnosticReport with the PDF attached, plus a derived Observation for any calculator output.
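Step 3 is easiest to see with a concrete calculator. CKD-EPI 2021 is the published race-free creatinine equation, and it is exactly the kind of fixed formula the deterministic engine evaluates instead of the LLM. The equation below is written out as published, for reference. It is not taken from the project's source.

```latex
\mathrm{eGFR} = 142 \times \min\!\left(\frac{S_{cr}}{\kappa},\, 1\right)^{\alpha} \times \max\!\left(\frac{S_{cr}}{\kappa},\, 1\right)^{-1.200} \times 0.9938^{\,\mathrm{age}} \times 1.012\ [\text{if female}]
```

In this equation, S_cr is serum creatinine in mg/dL. κ is 0.7 for females and 0.9 for males. α is −0.241 for females and −0.302 for males. The result is in mL/min/1.73 m². For a 50-year-old male with S_cr = 1.0 mg/dL, the equation gives about 92 mL/min/1.73 m².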

Every number on screen comes from the deterministic math engine. Every sentence comes from Claude Sonnet 4.6. Zero hallucinated numbers.

How I built it

Two engines, on purpose. From day one I split the system into two engines that talk to each other but never overlap:

  • Deterministic engine — a FastAPI MCP server hosted on Render. Owns ingestion, normalization, units, reference ranges, calculators, FHIR R4 read/write, drug-interaction lookups, and PDF rendering. 28 tools across 4 layers (reads, normalizers, calculators, writes).
  • Narrative engine — Claude Sonnet 4.6, surfaced as Prompt Opinion BYO Agents and a single Orchestrator agent, communicating agent-to-agent via A2A v1.

The agents never compute a value themselves. They call a tool, the tool returns a value, and the agents reason on top of it. If a required input is missing from the chart, the agent says exactly which input is missing instead of guessing.
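The "compute, or name what's missing" contract can be sketched as a thin wrapper around each calculator. This is a minimal illustration, not the project's code. `ToolResult`, `run_calculator`, `describe`, and the chart keys are hypothetical names, and a plain dict stands in for the FHIR chart.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional, Tuple


@dataclass(frozen=True)
class ToolResult:
    """Either a computed value or the names of missing inputs, never both."""
    value: Optional[float]
    missing: Tuple[str, ...]


def run_calculator(
    fn: Callable[..., float],
    required: Tuple[str, ...],
    chart: Mapping[str, Any],
) -> ToolResult:
    # Collect every required input the chart lacks, in declared order.
    missing = tuple(name for name in required if chart.get(name) is None)
    if missing:
        # Refuse to compute; the agent reports these names instead of guessing.
        return ToolResult(value=None, missing=missing)
    value = fn(**{name: chart[name] for name in required})
    return ToolResult(value=value, missing=())


def describe(result: ToolResult) -> str:
    """The only two things the agent may say about a calculator result."""
    if result.missing:
        return "Cannot compute: missing " + ", ".join(result.missing) + "."
    return f"Computed value: {result.value:.2f}"
```

Suppose HOMA-IR is wired in with the required inputs `("fasting_glucose_mg_dl", "fasting_insulin_uU_ml")` and the chart has glucose but no insulin. In that case, `run_calculator` returns `missing=("fasting_insulin_uU_ml",)`, and `describe` turns that into "Cannot compute: missing fasting_insulin_uU_ml."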

The normalization pipeline is the moat. A new lab vendor or wearable doesn't break the agents; it just needs an entry in the LOINC + UCUM mapping table. normalize_biomarker resolves "GLU" from LabCorp, "Fasting Blood Sugar" from Boston Heart, and "Glucose, fasting" from Quest to the same LOINC code. It converts the value to mg/dL and attaches the active reference range and a verdict (optimal / suboptimal / outside reference). The agents reason over normalized observations, not vendor records, which is why the system stays stable as new vendors are added.
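A table-driven normalizer along these lines could look like the sketch below. It assumes a three-tier verdict (optimal, suboptimal, outside reference). The vendor keys, the "optimal" band, and the dataclass are hypothetical, and the range bounds are placeholders rather than clinical guidance. LOINC 1558-6 (fasting glucose, serum/plasma) and the mmol/L to mg/dL factor for glucose are standard, but they should be checked against the LOINC and UCUM sources before anyone relies on them.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical mapping table: (vendor, vendor label) -> (LOINC code, canonical UCUM unit).
# 1558-6 is LOINC's fasting glucose in serum/plasma; verify codes against LOINC itself.
VENDOR_MAP: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("labcorp", "GLU"): ("1558-6", "mg/dL"),
    ("boston-heart", "Fasting Blood Sugar"): ("1558-6", "mg/dL"),
    ("quest", "Glucose, fasting"): ("1558-6", "mg/dL"),
}

# Multipliers into the canonical unit, keyed by (LOINC code, source UCUM unit).
# Glucose: 1 mmol/L = 18.016 mg/dL (molar mass 180.16 g/mol).
TO_CANONICAL: Dict[Tuple[str, str], float] = {
    ("1558-6", "mg/dL"): 1.0,
    ("1558-6", "mmol/L"): 18.016,
}

# Placeholder (low, high) bounds in the canonical unit; not clinical guidance.
RANGES: Dict[str, Dict[str, Tuple[float, float]]] = {
    "1558-6": {"reference": (70.0, 99.0), "optimal": (72.0, 90.0)},
}


@dataclass(frozen=True)
class NormalizedObservation:
    loinc: str
    value: float
    unit: str
    verdict: str


def normalize_biomarker(vendor: str, label: str, value: float, unit: str) -> NormalizedObservation:
    # An unmapped vendor label raises KeyError instead of being guessed at.
    loinc, canonical_unit = VENDOR_MAP[(vendor, label)]
    converted = value * TO_CANONICAL[(loinc, unit)]
    ref_lo, ref_hi = RANGES[loinc]["reference"]
    opt_lo, opt_hi = RANGES[loinc]["optimal"]
    if not ref_lo <= converted <= ref_hi:
        verdict = "outside reference"
    elif opt_lo <= converted <= opt_hi:
        verdict = "optimal"
    else:
        verdict = "suboptimal"  # inside the reference range, outside the optimal band
    return NormalizedObservation(loinc, round(converted, 1), canonical_unit, verdict)
```

Adding a vendor is a data change, not a code change. A new row in `VENDOR_MAP` (and, if needed, `TO_CANONICAL`) is enough. This is the property the paragraph above relies on.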

Calculators with citations. Every calculator is a small, audited function. calc_homa_ir returns {value, formula, citation: "Matthews 1985"}. calc_ascvd_10yr runs the ACC/AHA pooled cohort equations with the Goff 2014 citation. The result is something a clinician can defend.
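The return shape `{value, formula, citation}` comes from the description above. The function bodies below are only a sketch of how such calculators might be written. HOMA-IR uses the standard conventional-unit form: glucose in mg/dL times insulin in µU/mL, divided by 405. `calc_fib4` is an additional illustrative example, and all parameter names are assumptions, not the project's signatures.

```python
import math
from typing import Dict, Union

Result = Dict[str, Union[float, str]]


def calc_homa_ir(fasting_glucose_mg_dl: float, fasting_insulin_uU_ml: float) -> Result:
    """HOMA-IR in conventional units: glucose (mg/dL) x insulin (uU/mL) / 405."""
    if fasting_glucose_mg_dl <= 0 or fasting_insulin_uU_ml <= 0:
        raise ValueError("fasting glucose and insulin must be positive")
    value = fasting_glucose_mg_dl * fasting_insulin_uU_ml / 405.0
    return {
        "value": round(value, 2),
        "formula": "glucose_mg_dl * insulin_uU_ml / 405",
        "citation": "Matthews 1985",
    }


def calc_fib4(age_years: float, ast_u_l: float, alt_u_l: float, platelets_10e9_l: float) -> Result:
    """FIB-4 = (age x AST) / (platelets x sqrt(ALT)), with platelets in 10^9/L."""
    if min(age_years, ast_u_l, alt_u_l, platelets_10e9_l) <= 0:
        raise ValueError("all FIB-4 inputs must be positive")
    value = (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))
    return {
        "value": round(value, 2),
        "formula": "(age * AST) / (platelets * sqrt(ALT))",
        "citation": "Sterling 2006",
    }
```

Keeping each calculator a pure function of its inputs makes it straightforward to unit-test against published worked examples. For instance, glucose 90 mg/dL with insulin 9 µU/mL gives a HOMA-IR of exactly 2.0.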

FHIR write-back. The brief is not just a PDF — it's a DiagnosticReport written back to the patient chart with the PDF as the presentedForm. Verified end-to-end against the public HAPI FHIR R4 sandbox. Closes the loop.
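In FHIR R4 JSON, a DiagnosticReport that carries a PDF puts the file in `presentedForm` as a base64 Attachment. The builder below is a hedged sketch, not the project's code. The function name, the text-only `code`, and the choice of `"preliminary"` status (pending clinician sign-off) are assumptions.

```python
import base64
from typing import Any, Dict, List


def build_diagnostic_report(
    patient_id: str, pdf_bytes: bytes, observation_ids: List[str]
) -> Dict[str, Any]:
    """Assemble a FHIR R4 DiagnosticReport that carries the brief PDF as presentedForm."""
    return {
        "resourceType": "DiagnosticReport",
        # Assumption: held as preliminary until a clinician signs off.
        "status": "preliminary",
        # Text-only code for illustration; production would likely add a coded value.
        "code": {"text": "Longevity Copilot brief"},
        "subject": {"reference": f"Patient/{patient_id}"},
        # Links to the derived calculator Observations written alongside the report.
        "result": [{"reference": f"Observation/{oid}"} for oid in observation_ids],
        "presentedForm": [
            {
                "contentType": "application/pdf",
                # Attachment.data is base64-encoded binary in FHIR JSON.
                "data": base64.b64encode(pdf_bytes).decode("ascii"),
                "title": "Longevity Copilot brief",
            }
        ],
    }
```

Writing it back is a standard FHIR create. You POST the JSON to `[base]/DiagnosticReport` with `Content-Type: application/fhir+json`, and the server assigns the resource id.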

The PDF, designed. The renderer uses reportlab with custom page templates built on the IBM Carbon design system. The report is six pages, with color-coded verdict pills (Carbon red / amber / green), structured protocol cards, and a customization log in the appendix. The whole template is steerable through a customization block: accent_color, include_sections, length, clinician_name, practice_name, brand_name. The clinician steers it in plain English, and the orchestrator translates that into a structured payload on the next render. The clinician never sees a parameter name, only conversation.
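The plain-English-to-payload step is the LLM's job. The deterministic side can still validate whatever payload arrives, so a mis-parsed request fails loudly instead of rendering something odd. The sketch below assumes hypothetical section identifiers and length presets. Only the six field names come from the description above.

```python
import re
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# Hypothetical section identifiers and length presets; the real template's names may differ.
KNOWN_SECTIONS: Tuple[str, ...] = (
    "cover", "findings", "genetics", "wearables", "protocols", "follow_up", "appendix",
)
LENGTHS = ("full", "same_day")
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


@dataclass(frozen=True)
class Customization:
    accent_color: str = "#0F62FE"  # Carbon blue as a default accent
    include_sections: Tuple[str, ...] = KNOWN_SECTIONS
    length: str = "full"
    clinician_name: Optional[str] = None
    practice_name: Optional[str] = None
    brand_name: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject anything the renderer cannot honor.
        if not HEX_COLOR.fullmatch(self.accent_color):
            raise ValueError(f"accent_color must be #RRGGBB, got {self.accent_color!r}")
        unknown = set(self.include_sections) - set(KNOWN_SECTIONS)
        if unknown:
            raise ValueError(f"unknown sections: {sorted(unknown)}")
        if self.length not in LENGTHS:
            raise ValueError(f"length must be one of {LENGTHS}")


def apply_changes(current: Customization, **changes) -> Customization:
    """Merge the orchestrator's structured changes; replace() re-runs validation."""
    return replace(current, **changes)
```

Consider a request like "rebrand for Atlas Longevity, drop the wearable section." It might reach this layer as `brand_name="Atlas Longevity"` plus an `include_sections` tuple without `"wearables"`. Every other field keeps its previous value.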

Challenges I ran into

  • Vendor chaos. Eighteen lab vendors really do have eighteen different names and unit conventions for the same biomarker. The first thing I built was the LOINC + UCUM mapping table; everything else got faster after that.
  • Hallucinated numbers were the cardinal sin. I had to design the system so the agents can't make a number up even if they want to. The two-engine split + the "say what's missing" rule together solved this.
  • A2A parallel-dispatch races. Early on, the orchestrator would fan out to four specialists in parallel, and the simultaneous calls tripped rate limits. I capped parallel dispatch at two and chained anything bigger sequentially.
  • PDF design. Default reportlab output looks like a 1998 lab report. I rebuilt the template against the IBM Carbon design system (verdict pills, system-grouped tables, a real cover band with the title block set on it) and kept it to six pages.
  • Tight customization without exposing internals. The clinician should never see accent_color: "#0F62FE" in chat. The orchestrator now parses plain English ("rebrand for Atlas Longevity," "drop the wearable section") into a structured customization payload behind the scenes.
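The dispatch cap from the challenges above can be sketched with `asyncio.gather` over fixed-size batches. Each call in this sketch is a zero-argument coroutine factory standing in for one A2A specialist request. The real A2A client calls are not shown, and the function name is hypothetical. An `asyncio.Semaphore(2)` would be an alternative that keeps two calls in flight continuously rather than in strict pairs.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")


async def dispatch_in_batches(
    calls: Sequence[Callable[[], Awaitable[T]]], width: int = 2
) -> List[T]:
    """Run specialist calls `width` at a time; each batch finishes before the next starts."""
    results: List[T] = []
    for start in range(0, len(calls), width):
        batch = calls[start:start + width]
        # gather() returns results in the order the calls were passed in.
        results.extend(await asyncio.gather(*(call() for call in batch)))
    return results
```

Because results come back in submission order, the orchestrator can map each answer back to the specialist it asked without extra bookkeeping.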

Accomplishments that I'm proud of

  • Trustable math. Every calculated value in every brief is reproducible from a published formula, and every calculator ships with its literature citation.
  • Real read + write FHIR. Not a mock. The system reads from and writes to a live HAPI FHIR R4 sandbox; server-assigned IDs come back and are logged.
  • A real deliverable. The clinician gets a downloadable, multi-page, brand-customizable PDF — not a chat transcript pretending to be a report.
  • Plain-English customization. Each practice can rebrand the deliverable through conversation. No template editor. No engineer in the loop.
  • End-to-end inside Prompt Opinion. Every agent is a Prompt Opinion BYO agent, the Orchestrator is a Prompt Opinion Orchestrator agent, and the MCP is registered in Prompt Opinion's MCP Servers panel. The whole product runs natively on the platform.

What I learned

  • The agentic-vs-algorithmic split is the architecture. When you separate "narrative synthesis" from "deterministic math" cleanly, every other design decision gets easier.
  • Normalization is the moat. Anyone can ask an LLM to read a lab report. Far fewer teams do the LOINC + UCUM groundwork that makes the LLM's read actually portable across vendors.
  • Multi-agent only beats single-agent when the orchestrator is disciplined. A good orchestrator is a router and a synthesizer, not a generalist that re-does the specialists' work.
  • Customizability is a product feature, not a code feature. Letting the clinician rebrand in plain English changes the product category from "AI tool" to "their practice's tool."

What's next for Longevity Copilot

  • Proactive monitoring. Subscribe to wearable webhooks so the agents push briefs the moment HRV drifts, sleep regresses, or glucose excursions cross thresholds — turning the visit cycle from reactive to proactive without adding clinician work.
  • Practice memory. Customizations become saved templates per clinic. The orchestrator learns each clinician's defaults and applies them automatically.
  • Outcomes tracking. When 90-day recheck values come back, the orchestrator computes the delta and surfaces "protocol working / not working" without being asked.
  • Multi-tenant control plane. Single-tenant today; next is per-clinic FHIR endpoints, customization presets, and PHI segmentation at the data layer.
  • Pediatric and obstetric branches. Same architecture, different reference ranges and calculators.
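Proactive monitoring is still on the roadmap. One simple way a drift trigger could work is to compare the recent week's mean with a rolling baseline, measured in baseline standard deviations. The window sizes, the 1.5 SD threshold, and the function name below are hypothetical placeholders, not validated clinical thresholds.

```python
from statistics import mean, stdev
from typing import Sequence


def hrv_drifted(
    daily_hrv_ms: Sequence[float],
    baseline_days: int = 28,
    recent_days: int = 7,
    z_threshold: float = 1.5,
) -> bool:
    """Flag a downward HRV drift: recent mean vs. a rolling baseline, in baseline SDs."""
    if len(daily_hrv_ms) < baseline_days + recent_days:
        return False  # not enough history to judge
    window = list(daily_hrv_ms[-(baseline_days + recent_days):])
    baseline, recent = window[:baseline_days], window[baseline_days:]
    spread = stdev(baseline)
    if spread == 0:
        return False  # flat baseline; a z-score is undefined
    z = (mean(recent) - mean(baseline)) / spread
    return z <= -z_threshold
```

A webhook handler would append each new daily value and, when this returns True, ask the orchestrator for a brief. The check stays deterministic, in keeping with the two-engine split.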

Longevity Copilot is built for licensed-clinician review. Every brief carries a scope-of-practice statement and is not intended for direct patient distribution without clinician sign-off. The system does not diagnose, prescribe, or auto-sign orders. All values are normalized against published reference ranges; all calculators run on published equations with literature citations.

Built With

  • a2a
  • boston-heart
  • claude
  • claude-sonnet-4.6
  • clinical-decision-support
  • epic-fhir
  • fhir
  • fhir-r4
  • functional-medicine
  • hapi
  • healthcare-ai
  • interoperability
  • json
  • labcorp
  • loinc
  • longevity-medicine
  • markdown
  • mcp
  • oauth2
  • oura
  • po
  • prompt-opinion
  • quest-diagnostics
  • sharp
  • smart-health-it
  • whoop