Inspiration

Every year, 12 million Americans are misdiagnosed. The leading cause isn't incompetence; it's anchoring bias.

Once a diagnosis is written in a patient's chart, every subsequent clinician anchors to it. Nobody systematically asks: "What else could this be?" A GP writes "viral infection." The next doctor sees "viral infection" and moves on. Three weeks later the patient is back with bacterial endocarditis.

We asked ourselves: what if the EHR could argue with itself?

Standard clinical AI tools summarise notes. ClinicalLens challenges them. It reads the patient's complete FHIR record and produces a structured Devil's Advocate report: alternative diagnoses, treatment gaps, missing investigations, and red flags, all grounded entirely in the patient's real data.


What it does

ClinicalLens is a multi-agent second-opinion engine integrated into the Prompt Opinion platform.

You select a patient, type "Run a ClinicalLens report", and within seconds you receive:

Section | What you get
🔴 Alternative Diagnoses | 3 diagnoses the patient data could also support, ranked by likelihood, with evidence citations from the actual record
🟡 Treatment Plan Gaps | Conditions with no medication, medications absent for active conditions, drug-condition mismatches
🔵 Missing Investigations | Tests that clinical guidelines require but are absent from the record
⚠️ Red Flags | Dangerous drug combinations, critically abnormal untreated values
Summary Verdict | The single most important thing a reviewing clinician should act on today

Every finding is grounded in the patient's actual FHIR data: not generic advice, not hallucination. If the patient's last HbA1c was 9.2% and there has been no repeat HbA1c check in 6 months, ClinicalLens says exactly that, and explains why it matters.
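To make the report shape concrete, here is a minimal sketch of how the five sections could be modelled so that every finding carries its supporting data points. The class and field names (`Finding`, `ClinicalLensReport`, `evidence`) and the example verdict text are hypothetical illustrations, not taken from the ClinicalLens codebase.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """One report entry plus the record data points that support it."""
    statement: str
    evidence: list[str]  # hypothetical: values quoted from the FHIR record


@dataclass
class ClinicalLensReport:
    """Hypothetical container mirroring the five report sections."""
    alternative_diagnoses: list[Finding] = field(default_factory=list)
    treatment_gaps: list[Finding] = field(default_factory=list)
    missing_investigations: list[Finding] = field(default_factory=list)
    red_flags: list[Finding] = field(default_factory=list)
    summary_verdict: str = ""

    def all_findings(self) -> list[Finding]:
        """Flatten the four finding sections in report order."""
        return (
            self.alternative_diagnoses
            + self.treatment_gaps
            + self.missing_investigations
            + self.red_flags
        )


# Illustrative instance built from the HbA1c example in the text.
report = ClinicalLensReport(
    missing_investigations=[
        Finding(
            statement="No repeat HbA1c check in 6 months",
            evidence=["HbA1c 9.2%"],
        )
    ],
    summary_verdict="Repeat HbA1c.",  # illustrative wording only
)
```

Keeping `evidence` as a required field on each finding is one way to make an uncited finding impossible to construct by accident.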


How we built it

ClinicalLens is a three-agent system built on Google ADK using the A2A v1 protocol, deployed on Google Cloud Run and integrated with the Prompt Opinion multi-agent healthcare platform.

Architecture:

Prompt Opinion (Judge UI)
        │  A2A + FHIR context
        ▼
ClinicalLens Orchestrator  ─── Groq Llama 3.3 70B
        │
        ├──► Clinical Context Agent  ─── Groq Llama 3.3 70B
        │         └── FHIR R4 tools: Patient, Conditions,
        │                            Medications, Observations,
        │                            Documents, Allergies
        │
        └──► Advocate Agent  ─── Groq Llama 3.3 70B
                  └── Pure clinical reasoning over the
                      structured summary — no FHIR access

Clinical Context Agent pulls all six FHIR resource types from the patient's Prompt Opinion workspace (demographics, active conditions, medications, allergies, observations such as vitals and lab results, and clinical documents) and synthesises a structured clinical brief.
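As a rough illustration of the retrieval step, the sketch below builds the FHIR R4 REST URLs a client might request for one patient. It assumes the six tool categories map to the standard R4 resource types `Patient`, `Condition`, `MedicationRequest`, `Observation`, `DocumentReference`, and `AllergyIntolerance`; the actual mapping used by the agent, the function name `build_fhir_urls`, and the example base URL are assumptions. No network call is made.

```python
from urllib.parse import urlencode

# Assumed mapping of the six tool categories to FHIR R4 resource types.
PATIENT_SCOPED_RESOURCES = [
    "Condition",
    "MedicationRequest",
    "Observation",
    "DocumentReference",
    "AllergyIntolerance",
]


def build_fhir_urls(base_url: str, patient_id: str) -> dict[str, str]:
    """Return one URL per resource type for the given patient."""
    base = base_url.rstrip("/")
    # The Patient resource is read directly by its logical id.
    urls = {"Patient": f"{base}/Patient/{patient_id}"}
    # The other resources are searched with the standard `patient` parameter.
    for resource in PATIENT_SCOPED_RESOURCES:
        query = urlencode({"patient": patient_id})
        urls[resource] = f"{base}/{resource}?{query}"
    return urls


# Hypothetical server address, for illustration only.
urls = build_fhir_urls("https://fhir.example.org/r4/", "123")
```

A real client would also send the access token supplied with the request and follow Bundle pagination links, both omitted here.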

Advocate Agent receives the brief and performs pure clinical reasoning: generating alternative diagnoses ranked by evidence, identifying treatment gaps against clinical guidelines, flagging missing investigations, and surfacing dangerous combinations. It has no FHIR access; it reasons from the summary text alone, which means its analysis is reproducible and auditable.

Orchestrator coordinates both agents, injects FHIR credentials securely via A2A metadata, and composes the final formatted report.
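The division of labour between the three agents can be sketched as a two-step pipeline. This is a simplified illustration, not the ADK or A2A implementation: both sub-agents are modelled as plain Python callables, and the names `run_report`, `stub_context_agent`, and `stub_advocate_agent` are hypothetical.

```python
from typing import Callable

# Simplified stand-ins: in the real system these are remote agents.
ContextAgent = Callable[[dict], str]   # FHIR context -> clinical brief
AdvocateAgent = Callable[[str], str]   # clinical brief -> findings text


def run_report(
    fhir_context: dict,
    context_agent: ContextAgent,
    advocate_agent: AdvocateAgent,
) -> str:
    """Run the two sub-agents in sequence and compose the final report."""
    # Only the context agent receives the FHIR credentials.
    brief = context_agent(fhir_context)
    # The advocate sees the brief text and nothing else.
    findings = advocate_agent(brief)
    return f"ClinicalLens Report\n\n{findings}"


def stub_context_agent(fhir_context: dict) -> str:
    """Placeholder for the FHIR-backed agent; returns a canned brief."""
    return f"Brief for patient {fhir_context['patient_id']}: HbA1c 9.2%"


def stub_advocate_agent(brief: str) -> str:
    """Placeholder for the reasoning agent; echoes what it was given."""
    return f"Findings based on: {brief}"


output = run_report(
    {"patient_id": "123", "token": "secret"},
    stub_context_agent,
    stub_advocate_agent,
)
```

Because the advocate's only input is the brief, saving the brief alongside the report is enough to audit what the reasoning step could have seen.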

Tech stack:

  • Agent framework: Google ADK
  • Agent protocol: A2A v1 (JSON-RPC)
  • Healthcare platform: Prompt Opinion
  • Data standard: FHIR R4
  • Model: Groq Llama 3.3 70B (all agents)
  • Deployment: Google Cloud Run + Docker
  • CI/CD: GitHub Actions

Challenges we ran into

FHIR context routing. The A2A v1 specification requires the fhir-context extension URI in the agent card to exactly match the Prompt Opinion platform's schema URL. When the URI pointed to localhost (the default fallback), Prompt Opinion treated FHIR as optional and silently stripped the patient credentials before sending requests. The agent received messages with no metadata and returned empty responses. Fixing this required understanding how Prompt Opinion reads agent cards and ensuring PO_PLATFORM_BASE_URL was always set correctly in production.
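One way to guard against this failure is to refuse to start when the base URL is missing or local, rather than falling back silently. The sketch below assumes the extension URI is derived from `PO_PLATFORM_BASE_URL`; the path `schemas/fhir-context` and the function name `fhir_extension_uri` are hypothetical placeholders, not the platform's real schema location.

```python
import os
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}
EXTENSION_PATH = "schemas/fhir-context"  # hypothetical placeholder path


def fhir_extension_uri(env=None) -> str:
    """Build the extension URI, failing fast on a missing or local base URL."""
    env = os.environ if env is None else env
    base = env.get("PO_PLATFORM_BASE_URL", "").strip()
    if not base:
        raise RuntimeError("PO_PLATFORM_BASE_URL is not set")
    host = urlparse(base).hostname
    if host is None or host in LOCAL_HOSTS:
        raise RuntimeError(f"PO_PLATFORM_BASE_URL is not a public URL: {base}")
    return f"{base.rstrip('/')}/{EXTENSION_PATH}"
```

Calling this once at startup, before the agent card is served, turns a silent empty response into an immediate deployment error.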

Rate limits on multi-agent pipelines. A single ClinicalLens report triggers 5–6 consecutive LLM calls through the orchestration chain. The Gemini free tier (5 requests/minute) was exhausted on the first real request. We switched to Groq Llama 3.3 70B (6,000 requests/day on the free tier), which eliminated rate limiting entirely and improved response speed.
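The arithmetic behind this is simple enough to check directly. The sketch below uses only the figures quoted above (5 to 6 calls per report, 5 requests/minute, 6,000 requests/day); the helper name `reports_per_window` is hypothetical.

```python
def reports_per_window(request_limit: int, calls_per_report: int) -> int:
    """Whole reports that fit inside one rate-limit window."""
    return request_limit // calls_per_report


# 5 requests/minute budget: a 6-call report cannot finish within one window.
per_minute_worst = reports_per_window(5, 6)
per_minute_best = reports_per_window(5, 5)

# 6,000 requests/day budget: roughly a thousand reports per day.
per_day_worst = reports_per_window(6000, 6)
per_day_best = reports_per_window(6000, 5)
```

With a per-minute budget equal to or below the calls needed for one report, a single request consumes the whole window, which matches the behaviour described above.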

Keeping clinical reasoning grounded. The Advocate Agent's most common failure mode was producing generic medical advice instead of patient-specific insights. We iterated the system prompt extensively to enforce citation from specific data points: not "consider hypertension" but "BP 158/94 on a single antihypertensive with CKD Stage 3; guideline-recommended dual therapy is absent."
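Prompting can be complemented by a mechanical check after generation. The sketch below is one possible approach, not the project's actual method: it flags any finding that cites no evidence, or cites text that does not appear in the clinical brief. The function name, the `(statement, evidence)` tuple shape, and the sample brief are all assumptions for illustration.

```python
def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores formatting."""
    return " ".join(text.lower().split())


def ungrounded_findings(findings, brief: str) -> list[str]:
    """Return statements whose evidence is missing or absent from the brief."""
    haystack = _normalize(brief)
    problems = []
    for statement, evidence in findings:
        if not evidence:
            # Generic advice with no citation at all.
            problems.append(statement)
        elif any(_normalize(item) not in haystack for item in evidence):
            # Cites something the context agent never retrieved.
            problems.append(statement)
    return problems


# Illustrative brief and findings based on the example in the text.
brief = "BP 158/94. CKD Stage 3. One antihypertensive prescribed."
findings = [
    ("Consider hypertension", []),
    ("Dual therapy is absent", ["BP 158/94", "CKD Stage 3"]),
    ("Potassium is elevated", ["K+ 6.1"]),
]
flagged = ungrounded_findings(findings, brief)
```

Substring matching is deliberately strict; it will miss paraphrased citations, so it works best when the prompt asks the model to quote values verbatim.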


Accomplishments that we're proud of

  • End-to-end FHIR-grounded reasoning across 6 resource types in a single agent interaction
  • Zero-hallucination design: the Advocate Agent can only reference what the Clinical Context Agent retrieved; it has no internet access or LLM memory to draw from
  • Multi-turn depth: judges can ask follow-up questions ("What diagnoses were missed?", "What investigations should be ordered?") and get contextually accurate responses across the full conversation
  • Production deployment on Google Cloud Run with Docker, running live during the judging period

What we learned

The hardest part of clinical AI is not the AI; it's the trust architecture. A tool that says "consider heart failure" is useless if the clinician can't verify why. ClinicalLens is built around a simple principle: every insight must be traceable to a specific data point in the patient's record. This constraint made the system harder to build but dramatically more useful in practice.

We also learned that A2A v1 agent cards are not just documentation; they're the mechanism by which the platform discovers and configures your agent's capabilities. Getting the FHIR extension URI exactly right was the difference between a working integration and a broken one.


What's next for ClinicalLens: A Second Opinion, Instantly

  • Differential diagnosis scoring: rank alternative diagnoses with a confidence score derived from the number and specificity of supporting data points
  • Guideline integration: ground gap detection in specific clinical guidelines (ACC/AHA for cardiology, ADA for diabetes) rather than general LLM knowledge
  • Time-series analysis: track how the Devil's Advocate report changes across multiple visits to surface slow-developing conditions
  • MCP server layer: expose ClinicalLens as an MCP server so it can be called from any A2A orchestrator, not just Prompt Opinion
  • EHR direct integration: connect to Epic and Cerner FHIR endpoints directly, without requiring the Prompt Opinion workspace as an intermediary

Built With

  • a2a-v1-(json-rpc)
  • fhir-r4
  • github-actions
  • google-adk
  • google-cloud-run+docker
  • groq-llama-3.3-70b(all-agents)
  • prompt-opinion