# Red Team MD

*Every diagnosis needs an opponent.*
## 🚨 The Problem
**40,000–80,000 Americans die each year from diagnostic error.**
Not from wrong medications. Not from surgical mistakes. From a cognitive bias called premature closure: the moment a clinician's brain locks onto an explanation and stops looking.
- The 34-year-old woman whose palpitations and weight loss get labeled anxiety for three months before someone checks her thyroid.
- The 58-year-old man whose shortness of breath gets attributed to deconditioning while his BNP climbs quietly in the chart.
- The chest pain sent home as musculoskeletal that comes back by ambulance.
These are not rare edge cases. They are weekly occurrences in every health system in the country.
Every tool we build for clinicians is designed to support their thinking: to confirm, to summarize, to assist. Nobody built the opponent.
## 💡 Inspiration
When I looked at what AI agents could uniquely do that rule-based software cannot, the answer was clear: AI can argue. It can hold a diagnosis up to the light and find every crack. It can read a chart the way a malpractice attorney reads a chart, looking for what doesn't fit.
That's Red Team MD.
## 🔬 What It Does
Red Team MD is an adversarial clinical reasoning agent that challenges working diagnoses before they cause harm.
A clinician, or any referring agent in the Prompt Opinion ecosystem, provides a patient's FHIR context and a working diagnosis. Red Team MD does not validate it. It attacks it.
The agent performs a four-part adversarial analysis:
### Part 1: Contradictions

Red Team MD scans the patient's full FHIR record (labs, vitals, medications, encounter notes, problem list) and surfaces every data point that does not fit the working diagnosis.
- Abnormal values that point elsewhere
- Normal values that should be abnormal if the diagnosis were correct
- Symptoms documented in notes that are unexplained by the working assessment
Every finding is cited with specific values and dates. No hallucinations. No generalizations.
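The contradiction scan can be sketched as a simple rule pass over the chart's lab observations. This is a minimal illustration, not the platform's implementation: the `Observation` shape, the `expected` mapping (what the working diagnosis predicts each lab should look like), and the output format are all assumptions for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    code: str    # lab display name, e.g. "TSH"
    value: float
    unit: str
    date: str    # ISO date
    low: float   # reference range lower bound
    high: float  # reference range upper bound

def scan_contradictions(working_dx: str, observations: list,
                        expected: dict) -> list:
    """Flag observations that do not fit the working diagnosis.

    `expected` maps an observation code to "high", "low", or "normal",
    i.e. what the working diagnosis predicts. Any mismatch is surfaced
    as a contradiction, cited with its value and date.
    """
    findings = []
    for obs in observations:
        if obs.code not in expected:
            continue
        if obs.value > obs.high:
            actual = "high"
        elif obs.value < obs.low:
            actual = "low"
        else:
            actual = "normal"
        if actual != expected[obs.code]:
            findings.append(
                f"{obs.code} {obs.value} {obs.unit} on {obs.date} is {actual}; "
                f"'{working_dx}' predicts {expected[obs.code]}"
            )
    return findings
```

For the thyroid case above, a suppressed TSH under a working diagnosis of anxiety (which predicts a normal TSH) would produce exactly one cited finding.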
### Part 2: Evidence-Anchored Differential
Based only on contradictions found in the actual chart, the agent generates a ranked differential diagnosis. Every alternative is anchored to real patient data. Every entry includes the specific missing workup that would confirm or rule it out.
| Signal Strength | Meaning |
|---|---|
| STRONG SIGNAL | Multiple chart findings support this alternative |
| POSSIBLE | Some evidence present, needs workup |
| WEAK BUT WORTH RULING OUT | Low signal but high-stakes miss |
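The three-tier signal scale above implies an ordering, and the anchoring rule implies a filter. A minimal sketch of both, under assumed data shapes (the `DifferentialEntry` fields are illustrative, not the agent's actual schema):

```python
from dataclasses import dataclass
from enum import IntEnum

class Signal(IntEnum):
    WEAK_BUT_WORTH_RULING_OUT = 1  # low signal but high-stakes miss
    POSSIBLE = 2                   # some evidence present, needs workup
    STRONG_SIGNAL = 3              # multiple chart findings support it

@dataclass
class DifferentialEntry:
    diagnosis: str
    signal: Signal
    chart_evidence: list   # citations into the actual record
    missing_workup: list   # tests that would confirm or rule it out

def rank_differential(entries: list) -> list:
    """Drop entries with no chart anchoring, then rank by signal strength."""
    anchored = [e for e in entries if e.chart_evidence]
    return sorted(anchored, key=lambda e: e.signal, reverse=True)
```

An entry with an empty `chart_evidence` list never reaches the clinician, no matter how plausible the diagnosis sounds.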
### Part 3: Premature Closure Audit

Red Team MD audits the diagnostic process itself, not just the data.
- How quickly was the diagnosis assigned after first presentation?
- Was minimum required workup completed before the diagnosis was anchored?
- Were any symptoms documented but never reconciled with the working assessment?
- Did any note express clinical uncertainty that was later dropped from the record?
This is the layer no rule-based system can reach.
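The audit questions above reduce to checks on timing, workup completeness, and symptom reconciliation. A minimal sketch, assuming the caller has already extracted these facts from the chart (the function signature and flag wording are illustrative):

```python
from datetime import date

def closure_audit(first_presentation: date, dx_anchored: date,
                  required_workup: set, completed_workup: set,
                  documented_symptoms: set, explained_symptoms: set) -> list:
    """Audit the diagnostic process for signs of premature closure."""
    flags = []
    # How quickly was the diagnosis assigned after first presentation?
    elapsed = (dx_anchored - first_presentation).days
    if elapsed < 1:
        flags.append(f"diagnosis anchored within {elapsed} day(s) of first presentation")
    # Was minimum required workup completed before the diagnosis was anchored?
    missing = required_workup - completed_workup
    if missing:
        flags.append(f"diagnosis preceded minimum workup: {sorted(missing)}")
    # Were any symptoms documented but never reconciled with the assessment?
    unreconciled = documented_symptoms - explained_symptoms
    if unreconciled:
        flags.append(f"documented but unreconciled symptoms: {sorted(unreconciled)}")
    return flags
```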
### Part 4: The Verdict
The agent issues one of three verdicts:
🔴 **DIAGNOSIS UNDER FIRE**: Multiple high-severity contradictions exist. Do not proceed without addressing these findings. Specific urgent workup required.

🟡 **PROCEED WITH CAUTION**: Diagnosis is plausible but gaps exist. Specific follow-up required within a defined timeframe.

🟢 **DIAGNOSIS HOLDS**: We attempted to break this diagnosis and could not. The working assessment is well-supported by available evidence.
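The three-verdict mapping can be sketched as a small decision function. The thresholds here are assumptions for illustration; the actual calibration is the subject of the challenges section below.

```python
def issue_verdict(high_severity_contradictions: int, open_gaps: int) -> str:
    """Map scan results to one of the three verdicts.

    Thresholds are illustrative: two or more high-severity contradictions
    escalate to red; any single contradiction or open gap yields yellow;
    a clean pass earns the green 'we tried and could not break it' verdict.
    """
    if high_severity_contradictions >= 2:
        return "🔴 DIAGNOSIS UNDER FIRE"
    if high_severity_contradictions == 1 or open_gaps > 0:
        return "🟡 PROCEED WITH CAUTION"
    return "🟢 DIAGNOSIS HOLDS"
```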
## ⚙️ How We Built It
Red Team MD is built entirely on the Prompt Opinion platform as an A2A-compliant agent using SHARP Extension Specs for FHIR context propagation.
### Core Architecture
```
Patient FHIR Context (via SHARP headers)
                │
                ▼
       Red Team MD Agent
                │
                ▼
┌──────────────────────────────┐
│ Part 1: Contradiction Scan   │
│ Part 2: Differential Dx      │
│ Part 3: Closure Audit        │
│ Part 4: Verdict              │
└──────────────────────────────┘
                │
                ▼
        Clinician Review
```
The core insight driving the architecture is that adversarial reasoning requires adversarial prompting. Standard AI systems are trained to be helpful and agreeable, which is exactly the wrong disposition for catching diagnostic error.
The system prompt is engineered specifically to override that tendency, giving the model explicit permission and instruction to dissent, contradict, and challenge.
FHIR data flows through the SHARP context layer automatically, giving the agent access to the complete patient record (labs with reference ranges and dates, vitals trends, encounter history, active medications, and clinical notes) without any custom data pipeline.
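As a rough sketch of what consuming that propagated context looks like: the agent receives a FHIR Bundle and picks out the resource types it needs. The header name and base64+JSON encoding below are illustrative assumptions, not the SHARP Extension Specs wire format.

```python
import base64
import json

def fhir_context_from_headers(headers: dict) -> dict:
    """Decode a FHIR Bundle propagated alongside the request.

    ASSUMPTION: header name and encoding are invented for this sketch;
    the real SHARP context layer defines its own propagation format.
    """
    raw = headers.get("X-Sharp-Fhir-Context")
    if raw is None:
        raise ValueError("no FHIR context propagated")
    bundle = json.loads(base64.b64decode(raw))
    if bundle.get("resourceType") != "Bundle":
        raise ValueError("expected a FHIR Bundle")
    return bundle

def resources_of_type(bundle: dict, resource_type: str) -> list:
    """Pull all resources of one type (e.g. Observation) out of the bundle."""
    return [e["resource"] for e in bundle.get("entry", [])
            if e.get("resource", {}).get("resourceType") == resource_type]
```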
### The Agent's Core Directive
```
You are Red Team MD, an adversarial clinical reasoning engine.
Your ONLY job is to challenge the working diagnosis.

You are NOT a supportive assistant. You are a devil's advocate
hired to find every reason the current diagnosis might be
WRONG or INCOMPLETE.

RULES YOU MUST NEVER BREAK:
- Never validate or agree with the working diagnosis.
- Never hallucinate findings. If it is not in the chart, say
  "not documented"; that absence may be significant.
- Always cite specific values with dates.
- If a finding supports both diagnoses, flag it as AMBIGUOUS.
```
## 🧱 Challenges We Ran Into
**Hallucination in the differential.** Early versions generated plausible-sounding alternatives with no anchoring in the patient's chart, which is exactly the failure mode you cannot have in a clinical safety tool.

**Solution:** A hard rule enforced in the system prompt: every differential entry must cite specific chart evidence. "Not documented" became a legitimate output rather than a gap to be filled with inference.
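Beyond the prompt, this rule can also be checked mechanically on the model's output. A minimal sketch of such a guard, with an assumed entry shape (`chart_evidence` as a list of citation strings):

```python
def is_anchored(entry: dict, chart_text: str) -> bool:
    """Enforce the hard rule: every differential entry must cite chart
    evidence that actually appears in the record, or explicitly state
    'not documented'. The entry shape is an illustrative assumption.
    """
    citations = entry.get("chart_evidence", [])
    if not citations:
        return False  # no citation at all -> reject outright
    return all(c == "not documented" or c in chart_text
               for c in citations)
```

Entries that fail the check can be dropped or sent back to the model for regeneration before anything reaches the clinician.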
**Calibrating the verdict system.** An agent that issues 🔴 on every case will be ignored within a week.

**Solution:** The 🟢 verdict ("we tried to break this and couldn't") is as important as the red flag. It gives clinicians confidence when the workup is actually solid.
## 🏆 Accomplishments We're Proud Of
The premature closure audit is the feature we're most proud of. Every other diagnostic support tool focuses on the data: what labs are abnormal, what conditions are present.
Red Team MD is the first agent we're aware of that audits the diagnostic process itself: how fast the diagnosis was anchored, whether the workup preceded or followed the label, whether documented uncertainty was later erased.
This is the difference between reviewing a chart and reading a chart the way a clinician's conscience should.
## 📚 What We Learned
Adversarial AI is an underexplored design pattern in healthcare. The entire field has been focused on AI that assists, summarizes, and confirms. There is enormous untapped value in AI that deliberately challenges, contradicts, and stress-tests.
The absence of a finding is a finding. "Not documented" appears throughout Red Team MD's outputs, and in many cases the missing workup is more clinically significant than any abnormal value.

Composability matters. The most powerful version of this agent is not the standalone version; it's the version called automatically by every other agent before a diagnosis gets acted upon.
## 🚀 What's Next

Red Team MD is designed to become a standard safety layer in any clinical AI workflow: the agent that every other agent calls before a diagnosis proceeds.
- **Discharge integration**: no patient leaves with an unchallenged diagnosis
- **Specialty modes**: cardiology, oncology, and pediatrics red teaming with domain-specific contradiction libraries
- **Longitudinal tracking**: flagging cases where the same diagnosis has been applied across multiple visits without adequate workup evolution
- **Outcome feedback loop**: tracking cases where Red Team MD issued 🔴 to measure real-world diagnostic correction rates
## 🛠️ Built With

Prompt Opinion Platform · A2A Protocol · SHARP Extension Specs · FHIR R4 · Gemini · MCP
> ⚠️ **Red Team MD is an adversarial analysis tool for physician review only.** All findings require clinical judgment before action. This is not a substitute for medical advice.