# Red Team MD

*Every diagnosis needs an opponent.*
## 🚨 The Problem
**40,000–80,000 Americans die each year from diagnostic error.**
Not from wrong medications. Not from surgical mistakes. From a cognitive bias called premature closure: the moment a clinician's brain locks onto an explanation and stops looking.
- The 34-year-old woman whose palpitations and weight loss get labeled anxiety for three months before someone checks her thyroid.
- The 58-year-old man whose shortness of breath gets attributed to deconditioning while his BNP climbs quietly in the chart.
- The chest pain sent home as musculoskeletal that comes back by ambulance.
These are not rare edge cases. They are weekly occurrences in every health system in the country.
Every tool we build for clinicians is designed to support their thinking: to confirm, to summarize, to assist. Nobody built the opponent.
## 💡 Inspiration
When I looked at what AI agents could uniquely do that rule-based software cannot, the answer was clear: AI can argue. It can hold a diagnosis up to the light and find every crack. It can read a chart the way a malpractice attorney reads a chart, looking for what doesn't fit.
That's Red Team MD.
## 🔬 What It Does
Red Team MD is an adversarial clinical reasoning agent that challenges working diagnoses before they cause harm.
A clinician, or any referring agent in the Prompt Opinion ecosystem, provides a patient's FHIR context and a working diagnosis. Red Team MD does not validate it. It attacks it.
The agent performs a four-part adversarial analysis:
### Part 1: Contradictions

Red Team MD scans the patient's full FHIR record (labs, vitals, medications, encounter notes, problem list) and surfaces every data point that does not fit the working diagnosis.
- Abnormal values that point elsewhere
- Normal values that should be abnormal if the diagnosis were correct
- Symptoms documented in notes that are unexplained by the working assessment
Every finding is cited with specific values and dates. No hallucinations. No generalizations.
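The contradiction scan can be sketched as a simple rule pass over the chart's lab observations. This is a minimal illustration, not the platform's implementation: the `Observation` shape, the `expected` mapping (what the working diagnosis predicts each lab should look like), and the output format are all assumptions for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    code: str    # lab display name, e.g. "TSH"
    value: float
    unit: str
    date: str    # ISO date
    low: float   # reference range lower bound
    high: float  # reference range upper bound

def scan_contradictions(working_dx: str, observations: list,
                        expected: dict) -> list:
    """Flag observations that do not fit the working diagnosis.

    `expected` maps an observation code to "high", "low", or "normal",
    i.e. what the working diagnosis predicts. Any mismatch is surfaced
    as a contradiction, cited with its value and date.
    """
    findings = []
    for obs in observations:
        if obs.code not in expected:
            continue
        if obs.value > obs.high:
            actual = "high"
        elif obs.value < obs.low:
            actual = "low"
        else:
            actual = "normal"
        if actual != expected[obs.code]:
            findings.append(
                f"{obs.code} {obs.value} {obs.unit} on {obs.date} is {actual}; "
                f"'{working_dx}' predicts {expected[obs.code]}"
            )
    return findings
```

For the thyroid case above, a suppressed TSH under a working diagnosis of anxiety (which predicts a normal TSH) would produce exactly one cited finding.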
### Part 2: Evidence-Anchored Differential
Based only on contradictions found in the actual chart, the agent generates a ranked differential diagnosis. Every alternative is anchored to real patient data. Every entry includes the specific missing workup that would confirm or rule it out.
| Signal Strength | Meaning |
|---|---|
| STRONG SIGNAL | Multiple chart findings support this alternative |
| POSSIBLE | Some evidence present, needs workup |
| WEAK BUT WORTH RULING OUT | Low signal but high-stakes miss |
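The three-tier signal scale above implies an ordering, and the anchoring rule implies a filter. A minimal sketch of both, under assumed data shapes (the `DifferentialEntry` fields are illustrative, not the agent's actual schema):

```python
from dataclasses import dataclass
from enum import IntEnum

class Signal(IntEnum):
    WEAK_BUT_WORTH_RULING_OUT = 1  # low signal but high-stakes miss
    POSSIBLE = 2                   # some evidence present, needs workup
    STRONG_SIGNAL = 3              # multiple chart findings support it

@dataclass
class DifferentialEntry:
    diagnosis: str
    signal: Signal
    chart_evidence: list   # citations into the actual record
    missing_workup: list   # tests that would confirm or rule it out

def rank_differential(entries: list) -> list:
    """Drop entries with no chart anchoring, then rank by signal strength."""
    anchored = [e for e in entries if e.chart_evidence]
    return sorted(anchored, key=lambda e: e.signal, reverse=True)
```

An entry with an empty `chart_evidence` list never reaches the clinician, no matter how plausible the diagnosis sounds.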
### Part 3: Premature Closure Audit

Red Team MD audits the diagnostic process itself, not just the data.
- How quickly was the diagnosis assigned after first presentation?
- Was minimum required workup completed before the diagnosis was anchored?
- Were any symptoms documented but never reconciled with the working assessment?
- Did any note express clinical uncertainty that was later dropped from the record?
This is the layer no rule-based system can reach.
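The audit questions above reduce to checks on timing, workup completeness, and symptom reconciliation. A minimal sketch, assuming the caller has already extracted these facts from the chart (the function signature and flag wording are illustrative):

```python
from datetime import date

def closure_audit(first_presentation: date, dx_anchored: date,
                  required_workup: set, completed_workup: set,
                  documented_symptoms: set, explained_symptoms: set) -> list:
    """Audit the diagnostic process for signs of premature closure."""
    flags = []
    # How quickly was the diagnosis assigned after first presentation?
    elapsed = (dx_anchored - first_presentation).days
    if elapsed < 1:
        flags.append(f"diagnosis anchored within {elapsed} day(s) of first presentation")
    # Was minimum required workup completed before the diagnosis was anchored?
    missing = required_workup - completed_workup
    if missing:
        flags.append(f"diagnosis preceded minimum workup: {sorted(missing)}")
    # Were any symptoms documented but never reconciled with the assessment?
    unreconciled = documented_symptoms - explained_symptoms
    if unreconciled:
        flags.append(f"documented but unreconciled symptoms: {sorted(unreconciled)}")
    return flags
```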
### Part 4: The Verdict
The agent issues one of three verdicts:
🔴 **DIAGNOSIS UNDER FIRE**: Multiple high-severity contradictions exist. Do not proceed without addressing these findings. Specific urgent workup required.

🟡 **PROCEED WITH CAUTION**: Diagnosis is plausible but gaps exist. Specific follow-up required within a defined timeframe.

🟢 **DIAGNOSIS HOLDS**: We attempted to break this diagnosis and could not. The working assessment is well-supported by available evidence.
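The three-verdict mapping can be sketched as a small decision function. The thresholds here are assumptions for illustration; the actual calibration is the subject of the challenges section below.

```python
def issue_verdict(high_severity_contradictions: int, open_gaps: int) -> str:
    """Map scan results to one of the three verdicts.

    Thresholds are illustrative: two or more high-severity contradictions
    escalate to red; any single contradiction or open gap yields yellow;
    a clean pass earns the green 'we tried and could not break it' verdict.
    """
    if high_severity_contradictions >= 2:
        return "🔴 DIAGNOSIS UNDER FIRE"
    if high_severity_contradictions == 1 or open_gaps > 0:
        return "🟡 PROCEED WITH CAUTION"
    return "🟢 DIAGNOSIS HOLDS"
```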
## ⚙️ How We Built It
Red Team MD is built entirely on the Prompt Opinion platform as an A2A-compliant agent using SHARP Extension Specs for FHIR context propagation.
### Core Architecture
```
Patient FHIR Context (via SHARP headers)
                │
                ▼
       Red Team MD Agent
                │
                ▼
┌──────────────────────────────┐
│ Part 1: Contradiction Scan   │
│ Part 2: Differential Dx      │
│ Part 3: Closure Audit        │
│ Part 4: Verdict              │
└──────────────────────────────┘
                │
                ▼
        Clinician Review
```
The core insight driving the architecture is that adversarial reasoning requires adversarial prompting. Standard AI systems are trained to be helpful and agreeable, which is exactly the wrong disposition for catching diagnostic error.
The system prompt is engineered specifically to override that tendency, giving the model explicit permission and instruction to dissent, contradict, and challenge.
FHIR data flows through the SHARP context layer automatically, giving the agent access to the complete patient record (labs with reference ranges and dates, vitals trends, encounter history, active medications, and clinical notes) without any custom data pipeline.
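As a rough sketch of what consuming that propagated context looks like: the agent receives a FHIR Bundle and picks out the resource types it needs. The header name and base64+JSON encoding below are illustrative assumptions, not the SHARP Extension Specs wire format.

```python
import base64
import json

def fhir_context_from_headers(headers: dict) -> dict:
    """Decode a FHIR Bundle propagated alongside the request.

    ASSUMPTION: header name and encoding are invented for this sketch;
    the real SHARP context layer defines its own propagation format.
    """
    raw = headers.get("X-Sharp-Fhir-Context")
    if raw is None:
        raise ValueError("no FHIR context propagated")
    bundle = json.loads(base64.b64decode(raw))
    if bundle.get("resourceType") != "Bundle":
        raise ValueError("expected a FHIR Bundle")
    return bundle

def resources_of_type(bundle: dict, resource_type: str) -> list:
    """Pull all resources of one type (e.g. Observation) out of the bundle."""
    return [e["resource"] for e in bundle.get("entry", [])
            if e.get("resource", {}).get("resourceType") == resource_type]
```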
### The Agent's Core Directive
```
You are Red Team MD, an adversarial clinical reasoning engine.
Your ONLY job is to challenge the working diagnosis.

You are NOT a supportive assistant. You are a devil's advocate
hired to find every reason the current diagnosis might be
WRONG or INCOMPLETE.

RULES YOU MUST NEVER BREAK:
- Never validate or agree with the working diagnosis.
- Never hallucinate findings. If it is not in the chart, say
  "not documented"; that absence may be significant.
- Always cite specific values with dates.
- If a finding supports both diagnoses, flag it as AMBIGUOUS.
```
## 🧱 Challenges We Ran Into
**Hallucination in the differential.** Early versions generated plausible-sounding alternatives with no anchoring in the patient's chart, which is exactly the failure mode you cannot have in a clinical safety tool.

**Solution:** A hard rule enforced in the system prompt: every differential entry must cite specific chart evidence. "Not documented" became a legitimate output rather than a gap to be filled with inference.
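Beyond the prompt, this rule can also be checked mechanically on the model's output. A minimal sketch of such a guard, with an assumed entry shape (`chart_evidence` as a list of citation strings):

```python
def is_anchored(entry: dict, chart_text: str) -> bool:
    """Enforce the hard rule: every differential entry must cite chart
    evidence that actually appears in the record, or explicitly state
    'not documented'. The entry shape is an illustrative assumption.
    """
    citations = entry.get("chart_evidence", [])
    if not citations:
        return False  # no citation at all -> reject outright
    return all(c == "not documented" or c in chart_text
               for c in citations)
```

Entries that fail the check can be dropped or sent back to the model for regeneration before anything reaches the clinician.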
**Calibrating the verdict system.** An agent that issues 🔴 on every case will be ignored within a week.

**Solution:** The 🟢 verdict ("we tried to break this and couldn't") is as important as the red flag. It gives clinicians confidence when the workup is actually solid.
## 🏆 Accomplishments We're Proud Of
The premature closure audit is the feature we're most proud of. Every other diagnostic support tool focuses on the data: what labs are abnormal, what conditions are present.
Red Team MD is the first agent we're aware of that audits the diagnostic process itself: how fast the diagnosis was anchored, whether the workup preceded or followed the label, whether documented uncertainty was later erased.
This is the difference between reviewing a chart and reading a chart the way a clinician's conscience should.
## 📚 What We Learned
Adversarial AI is an underexplored design pattern in healthcare. The entire field has been focused on AI that assists, summarizes, and confirms. There is enormous untapped value in AI that deliberately challenges, contradicts, and stress-tests.
The absence of a finding is a finding. "Not documented" appears throughout Red Team MD's outputs, and in many cases the missing workup is more clinically significant than any abnormal value.

Composability matters. The most powerful version of this agent is not the standalone version; it's the version called automatically by every other agent before a diagnosis gets acted upon.
## 🚀 What's Next

Red Team MD is designed to become a standard safety layer in any clinical AI workflow: the agent that every other agent calls before a diagnosis proceeds.
- **Discharge integration**: no patient leaves with an unchallenged diagnosis
- **Specialty modes**: cardiology, oncology, and pediatrics red teaming with domain-specific contradiction libraries
- **Longitudinal tracking**: flagging cases where the same diagnosis has been applied across multiple visits without adequate workup evolution
- **Outcome feedback loop**: tracking cases where Red Team MD issued 🔴 to measure real-world diagnostic correction rates
## 🛠️ Built With

Prompt Opinion Platform · A2A Protocol · SHARP Extension Specs · FHIR R4 · Gemini · MCP
> ⚠️ **Red Team MD is an adversarial analysis tool for physician review only.** All findings require clinical judgment before action. This is not a substitute for medical advice.