DenialGPT: The AI Copilot That Turns Denials Into Decisions
Inspiration
Every year, US hospitals see an estimated $262 billion in insurance claims denied, and roughly 85% of those denials are preventable. Yet today, a medical biller's main defense is manually reading 40-page payer policy PDFs before submitting each claim, a process that is slow, error-prone, and impossible to scale.
We kept asking the same question: why does this problem still exist when we have LLMs that can read policy documents, reason over clinical records, and surface exactly what is missing?
The answer was that nobody had connected those pieces into a workflow that fits how billing teams actually operate. That is what DenialGPT is.
What It Does
DenialGPT is a healthcare AI agent that intervenes at the two most critical moments in the claims lifecycle: before a claim is submitted and after a denial comes back. It is currently scoped to orthopedic procedures with Aetna as the primary payer, and the architecture is designed to expand to other specialties by adding their policy documents.
Phase 1 — Pre-Submission Prevention
Before a claim leaves the building, DenialGPT checks it against CMS coverage policies (Local and National Coverage Determinations, the LCD/NCD documents) and known payer denial patterns. It surfaces risk flags such as missing prior authorizations, diagnosis-procedure mismatches, and insufficient documentation of conservative therapy, all before the payer ever sees the claim.
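In DenialGPT these flags come from Claude reasoning over retrieved policy text. Purely to illustrate the three flag types, here is a minimal deterministic sketch in which the rule tables (`PRIOR_AUTH_REQUIRED`, `ALLOWED_DX`, `MIN_CONSERVATIVE_WEEKS`) and the `Claim` fields are hypothetical placeholders, not actual Aetna or CMS criteria.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    cpt: str
    icd10: str
    payer: str
    has_prior_auth: bool
    conservative_therapy_weeks: int


# Hypothetical rule tables for illustration only; NOT real payer or CMS criteria.
PRIOR_AUTH_REQUIRED = {("Aetna", "73721")}
ALLOWED_DX = {"73721": {"M17.11", "M17.12"}}
MIN_CONSERVATIVE_WEEKS = {"73721": 6}


def risk_flags(claim: Claim) -> list[str]:
    """Return pre-submission risk flags for a single claim."""
    flags = []
    # Prior auth is required for this payer + procedure but is not on file.
    if (claim.payer, claim.cpt) in PRIOR_AUTH_REQUIRED and not claim.has_prior_auth:
        flags.append("MISSING_PRIOR_AUTH")
    # The diagnosis code is not among those that support the procedure.
    allowed = ALLOWED_DX.get(claim.cpt)
    if allowed is not None and claim.icd10 not in allowed:
        flags.append("DX_PROCEDURE_MISMATCH")
    # Too little documented conservative therapy before the procedure.
    if claim.conservative_therapy_weeks < MIN_CONSERVATIVE_WEEKS.get(claim.cpt, 0):
        flags.append("INSUFFICIENT_CONSERVATIVE_THERAPY")
    return flags


claim = Claim(cpt="73721", icd10="M17.11", payer="Aetna",
              has_prior_auth=False, conservative_therapy_weeks=2)
print(risk_flags(claim))  # ['MISSING_PRIOR_AUTH', 'INSUFFICIENT_CONSERVATIVE_THERAPY']
```

Hard-coded rules like these go stale as policies change, which is one reason to retrieve the policy text itself instead of maintaining tables by hand.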
Phase 2 — Post-Denial Gap Analysis
When a denial comes back, DenialGPT reads the denial reason, pulls the patient's clinical records through FHIR, and performs a structured gap analysis:
$$\text{Appeal Viability} = f(\text{Evidence Required} - \text{Evidence Found})$$
The output is one of three verdicts:
| Verdict | Meaning |
|---|---|
| STRONG | Evidence exists in FHIR. File the appeal now with these documents. |
| WEAK | Specific evidence is missing. Get it first, then appeal. |
| DO NOT APPEAL | The denial is clinically correct. A write-off memo is generated. |
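Reading the minus sign in the formula above as a set difference, the verdict logic can be sketched as follows. This is a simplification: in DenialGPT the judgment comes from Claude's chain-of-thought reasoning, and the `criteria_met` flag and the evidence labels below are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    STRONG = "STRONG"
    WEAK = "WEAK"
    DO_NOT_APPEAL = "DO NOT APPEAL"


def appeal_viability(required: set[str], found: set[str],
                     criteria_met: bool) -> tuple[Verdict, set[str]]:
    """Map the evidence gap to one of the three verdicts.

    `criteria_met` is a hypothetical stand-in for the model's judgment on
    whether the patient actually meets the payer's clinical criteria.
    """
    missing = required - found  # Evidence Required minus Evidence Found
    if not criteria_met:
        # The denial is clinically correct: no added evidence will change it.
        return Verdict.DO_NOT_APPEAL, missing
    if not missing:
        return Verdict.STRONG, missing
    return Verdict.WEAK, missing


required = {"PT notes", "physician statement", "X-ray report"}
found = {"PT notes", "physician statement"}
verdict, missing = appeal_viability(required, found, criteria_met=True)
print(verdict.value, sorted(missing))  # WEAK ['X-ray report']
```

Checking `criteria_met` first encodes the rule that a clinically correct denial is not worth appealing, however complete the paperwork is.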
What Makes Us Different
Healthcare AI agents today largely focus on two workflows: appeal letter generation (automatically drafting the letter a biller sends to contest a denial) and prior authorization automation (streamlining requests for payer approval before a procedure).
Both of these are valuable. But they address the problem after it has already become expensive. Appeal letter generation assumes you already have a denial and the documentation to fight it. Prior auth automation assumes you know you need auth in the first place.
DenialGPT operates earlier and deeper in the workflow:
- Prevention before submission: we check the claim against CMS policy and payer patterns before it goes out, catching prior auth requirements, diagnosis-procedure mismatches, and documentation gaps at the point where fixing them costs almost nothing.
- Evidence reasoning, not letter writing: instead of generating an appeal letter, we tell the biller whether the evidence needed to win the appeal actually exists in the clinical record. A well-written letter built on missing documentation still loses.
- The write-off justification memo: when a denial is clinically correct and should not be appealed, DenialGPT generates a structured memo the biller hands to their manager for sign-off. We are not aware of another tool in this space that documents why a denial should be accepted, and the memo saves teams from wasting effort on unwinnable appeals.
- Root cause categorization: every denial is classified not just by type but by the workflow failure that caused it, with a prevention pointer that feeds back into Phase 1. Over time, every denial makes the prevention system smarter.
How We Built It
DenialGPT is built on Google's Agent Development Kit (ADK) and deployed as an external Agent2Agent (A2A) agent on the Prompt Opinion platform.
Architecture
Prompt Opinion Platform
│
▼
DenialGPT A2A Agent (Google ADK)
│
├── check_claim_policy (Prevention)
│ ├── Policy KB (ChromaDB + CMS LCD/NCD embeddings)
│ ├── Voyage AI voyage-3 embeddings
│ └── PAYER_PATTERNS intelligence lookup
│
├── analyze_denial (Classification + Root Cause)
│ └── root_cause: PROCESS_FAILURE | DOCUMENTATION_GAP |
│                 CODING_ERROR | CLINICAL_CRITERIA_UNMET
│
├── fetch_clinical_evidence (FHIR via SHARP)
│ └── Condition, Observation, Procedure,
│ MedicationRequest, DocumentReference
│
└── gap_analysis (AI Core)
├── Chain-of-thought evidence reasoning
├── Payer pattern intelligence injection
└── Write-off justification memo (conditional)
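On the FHIR side, `fetch_clinical_evidence` reads the five resource types listed in the diagram. Below is a minimal sketch of the request side. It assumes a plain FHIR R4 REST server that supports the standard `patient` search parameter. The base URL is a placeholder, SHARP context handling and authentication are not modeled, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# The resource types fetch_clinical_evidence reads, per the architecture diagram.
EVIDENCE_RESOURCES = ["Condition", "Observation", "Procedure",
                      "MedicationRequest", "DocumentReference"]


def evidence_queries(base_url: str, patient_id: str) -> list[str]:
    """Build one FHIR search URL per resource type, scoped to one patient."""
    base = base_url.rstrip("/")
    query = urlencode({"patient": patient_id})
    return [f"{base}/{rtype}?{query}" for rtype in EVIDENCE_RESOURCES]


def resources_from_bundle(bundle: dict) -> list[dict]:
    """Extract resources from a searchset Bundle; 'entry' is absent when empty."""
    return [e["resource"] for e in bundle.get("entry", []) if "resource" in e]


for url in evidence_queries("https://fhir.example.org/r4", "123"):
    print(url)
```

Each returned Bundle can be flattened with `resources_from_bundle` before the evidence reaches gap analysis. Real servers page large result sets through `link` entries with relation `next`, which this sketch ignores.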
Policy Knowledge Base
We embedded CMS LCD and NCD documents for orthopedic procedures into ChromaDB using Voyage AI voyage-3 embeddings. The retrieval query is enriched with procedure context:
$$\text{query} = \text{CPT} + \text{ICD-10} + \text{Payer} + \text{Procedure Description} + \text{Coverage Criteria Keywords}$$
Chunks scoring below a minimum relevance threshold are filtered out, so Claude reasons only over policy text that is relevant to the claim at hand.
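The query composition and relevance cutoff can be sketched in plain Python. This sketch makes several assumptions: the `+` in the query formula is read as string concatenation, the retriever returns `(chunk_text, similarity)` pairs, and the 0.5 threshold and example keywords are hypothetical values rather than the ones DenialGPT uses.

```python
def build_query(cpt: str, icd10: str, payer: str,
                description: str, keywords: list[str]) -> str:
    """Concatenate claim context into a single retrieval query string."""
    return " ".join([f"CPT {cpt}", f"ICD-10 {icd10}", payer, description, *keywords])


def filter_chunks(results: list[tuple[str, float]],
                  min_score: float = 0.5) -> list[str]:
    """Drop chunks below the relevance threshold and return the rest best-first."""
    kept = [(text, score) for text, score in results if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in kept]


query = build_query("73721", "M17.11", "Aetna", "MRI knee without contrast",
                    ["conservative therapy", "weight-bearing X-ray"])
print(query)
# CPT 73721 ICD-10 M17.11 Aetna MRI knee without contrast conservative therapy weight-bearing X-ray
```

ChromaDB's `query` returns `distances` rather than similarities. For a collection configured with cosine distance, convert each value with `similarity = 1 - distance` before applying a threshold like this one.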
Root Cause Categorization
Every denial is classified not just by type but by cause — the workflow failure that produced it:
$$\text{root\_cause} \in \{ \text{PROCESS\_FAILURE},\ \text{DOCUMENTATION\_GAP},\ \text{CODING\_ERROR},\ \text{CLINICAL\_CRITERIA\_UNMET} \}$$
Each root cause carries a prevention field that points back to Phase 1, closing a feedback loop in which every denial teaches the prevention system something new.
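The four root causes and the feedback loop can be modeled with an enum and a counter. Here is a minimal sketch, assuming the prevention field is a short pointer to a Phase 1 check. The pointer strings and the `prevention_priorities` helper are hypothetical.

```python
from collections import Counter
from enum import Enum


class RootCause(Enum):
    PROCESS_FAILURE = "PROCESS_FAILURE"
    DOCUMENTATION_GAP = "DOCUMENTATION_GAP"
    CODING_ERROR = "CODING_ERROR"
    CLINICAL_CRITERIA_UNMET = "CLINICAL_CRITERIA_UNMET"


# Hypothetical prevention pointers: the Phase 1 check that should catch each cause.
PREVENTION = {
    RootCause.PROCESS_FAILURE: "verify prior authorization before submission",
    RootCause.DOCUMENTATION_GAP: "confirm required clinical notes are attached",
    RootCause.CODING_ERROR: "validate the CPT / ICD-10 pairing against policy",
    RootCause.CLINICAL_CRITERIA_UNMET: "check coverage criteria before scheduling",
}


def prevention_priorities(root_causes: list[RootCause]) -> list[tuple[str, int]]:
    """Rank Phase 1 checks by how many past denials they would have prevented."""
    return Counter(PREVENTION[cause] for cause in root_causes).most_common()


history = [RootCause.PROCESS_FAILURE, RootCause.CODING_ERROR,
           RootCause.PROCESS_FAILURE]
print(prevention_priorities(history)[0])
# ('verify prior authorization before submission', 2)
```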
Payer Pattern Intelligence
A structured lookup table (PAYER_PATTERNS) stores known denial rates and winning appeal evidence for specific payer + CPT + ICD-10 combinations. For example:
$$P(\text{denial} \mid \text{Aetna},\ \text{CPT 73721},\ \text{M17.11}) = 68\%$$
$$P(\text{appeal win} \mid \text{CARC 50},\ \text{PT notes} + \text{physician statement}) = 41\%$$
This intelligence surfaces in both the prevention check (before submission) and the gap analysis (after denial) — giving billers data-driven context at every decision point.
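Below is a minimal sketch of the lookup and of how a match might be rendered as prompt context for either the prevention check or the gap analysis. The schema, the `PatternKey` type, and the `pattern_context` helper are assumptions. Only the 68% and 41% figures come from the examples above, and filing both under one key is also an assumption.

```python
from typing import NamedTuple, Optional


class PatternKey(NamedTuple):
    payer: str
    cpt: str
    icd10: str


# One illustrative entry built from the figures quoted above. Filing the
# CARC 50 win rate under this same key is an assumption for the sketch.
PAYER_PATTERNS = {
    PatternKey("Aetna", "73721", "M17.11"): {
        "denial_rate": 0.68,
        "winning_evidence": {"CARC 50": (["PT notes", "physician statement"], 0.41)},
    },
}


def pattern_context(payer: str, cpt: str, icd10: str) -> Optional[str]:
    """Render a matching pattern as text to inject into the model's prompt."""
    entry = PAYER_PATTERNS.get(PatternKey(payer, cpt, icd10))
    if entry is None:
        return None  # no known pattern: inject nothing
    lines = [f"{payer} denial rate for CPT {cpt} + {icd10}: {entry['denial_rate']:.0%}"]
    for carc, (evidence, win_rate) in entry["winning_evidence"].items():
        lines.append(f"{carc} appeals with {' + '.join(evidence)} won {win_rate:.0%}")
    return "\n".join(lines)


print(pattern_context("Aetna", "73721", "M17.11"))
```

Returning `None` for unknown combinations keeps the prompt free of empty or misleading context when no pattern has been recorded.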
Challenges We Faced
1. Embedding pipeline retrieval quality
Initial retrieval returned administrative boilerplate (revision history, AHA copyright notices) instead of clinical criteria. We fixed this by switching from PDF parsing to clean .txt extracts of the clinical sections only, and by adding a minimum relevance score filter after retrieval.
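Part of this cleanup can be approximated automatically with a paragraph filter. This is a hypothetical sketch, not the pipeline DenialGPT used (which relied on clean .txt extracts of the clinical sections). The marker regex and the 40-character minimum are illustrative assumptions.

```python
import re

# Hypothetical markers of administrative boilerplate in LCD/NCD text.
BOILERPLATE = re.compile(r"revision history|copyright|all rights reserved",
                         re.IGNORECASE)


def clinical_paragraphs(text: str, min_chars: int = 40) -> list[str]:
    """Split on blank lines and keep substantive, non-boilerplate paragraphs."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text)]
    return [p for p in paragraphs
            if len(p) >= min_chars and not BOILERPLATE.search(p)]


sample = (
    "Revision History: effective date updated.\n\n"
    "Indications: MRI of the knee may be considered reasonable and necessary "
    "after a documented trial of conservative therapy.\n\n"
    "CPT codes and descriptions are copyright the American Medical Association."
)
print(clinical_paragraphs(sample))
```

A keyword filter like this is cheap but brittle, so it complements, rather than replaces, a post-retrieval relevance threshold.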
2. Vector space consistency
The embedding model used at ingestion (voyage-3-lite) did not match the model used at query time (voyage-3), which caused every similarity score to come back negative. We standardized both pipeline stages on voyage-3 and re-embedded the corpus from scratch.
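One way to keep the two stages from drifting apart again is to record the embedding model with the index and check it at query time. Here is a minimal sketch; `IndexInfo` and `check_query_compatible` are hypothetical names. In ChromaDB, this information could be kept in the collection's metadata.

```python
from dataclasses import dataclass

EMBED_MODEL = "voyage-3"  # one constant shared by ingestion and query code


@dataclass(frozen=True)
class IndexInfo:
    """Recorded at ingestion time alongside the vectors (hypothetical schema)."""
    embed_model: str
    dimension: int


def check_query_compatible(index: IndexInfo, query_model: str, query_dim: int) -> None:
    """Refuse to query an index with vectors from a different embedding space."""
    if index.embed_model != query_model or index.dimension != query_dim:
        raise ValueError(
            f"index embedded with {index.embed_model} ({index.dimension} dims) "
            f"but query uses {query_model} ({query_dim} dims); re-embed the corpus"
        )


# Dimension as recorded at ingestion time.
index = IndexInfo(embed_model=EMBED_MODEL, dimension=1024)
check_query_compatible(index, EMBED_MODEL, 1024)  # passes silently
```

Failing loudly on a mismatch is preferable to silently returning scores from two incompatible vector spaces.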
What We Learned
Prevention beats reaction. The most valuable thing DenialGPT does is not analyzing denials; it is stopping them before they happen. The check_claim_policy tool pays for itself on the first missing prior auth it catches.
The DO NOT APPEAL verdict is underrated. Revenue cycle teams waste enormous effort appealing denials they cannot win. Telling a biller to stop — and handing them a memo that explains why — saves more money than winning a marginal appeal.
RAG quality is everything. A prevention check is only as good as the policy chunks it reasons over. Getting clean, relevant text into the knowledge base mattered more than any prompt engineering.
Open standards unlock real deployment. FHIR + SHARP + A2A means DenialGPT can connect to any EHR that exposes a FHIR API without custom integration work. That is not a nice-to-have; it is what makes DenialGPT deployable in a real hospital.
What's Next
- Expand beyond orthopedics to cardiology and behavioral health
- Add real-time payer policy monitoring — flag when a covered procedure becomes subject to new prior auth requirements
- Build a denial pattern dashboard — aggregate root causes across hundreds of claims to surface systemic billing workflow failures
- Connect directly to EHR pre-submission workflows so prevention checks run automatically before any claim is queued
Built With
- a2a
- ai
- anthropic
- chromadb
- claude
- claude-sonnet-4-5
- cms
- context
- data
- database
- deployment
- development
- document
- fhir
- llamaindex
- llm
- mcp
- ncd
- opinion
- platform
- processing
- prompt
- protocols
- pydantic
- python
- python-dotenv
- r4
- render
- sharp
- sources
- standards
- vector
- voyage
- voyage-3