DenialGPT: The AI Copilot That Turns Denials Into Decisions

Inspiration

Every year, US hospitals lose $262 billion to insurance claim denials. 85% of those denials are preventable. Yet today, a medical biller's only defense is manually reading 40-page payer policy PDFs before every single claim — a process that is slow, error-prone, and completely unscalable.

We kept asking the same question: why does this problem still exist when we have LLMs that can read policy documents, reason over clinical records, and surface exactly what is missing?

The answer was that nobody had connected those pieces into a workflow that fits how billing teams actually operate. That is what DenialGPT is.

What It Does

DenialGPT is a healthcare AI agent that intervenes at the two most critical moments in the claims lifecycle — currently scoped to orthopedic procedures with Aetna as the primary payer, with architecture designed to expand to any specialty by adding policy documents.

Phase 1 — Pre-Submission Prevention

Before a claim leaves the building, DenialGPT checks it against CMS coverage policies (LCD/NCD documents) and known payer denial patterns. It surfaces risk flags like missing prior authorizations, diagnosis-procedure mismatches, and insufficient conservative therapy documentation — before the payer ever sees the claim.

Phase 2 — Post-Denial Gap Analysis

When a denial comes back, DenialGPT reads the denial reason, pulls the patient's clinical records from FHIR, and performs a structured gap analysis:

$$\text{Appeal Viability} = f(\text{Evidence Required} - \text{Evidence Found})$$

The output is one of three verdicts:

Verdict	Meaning
STRONG	Evidence exists in FHIR. File the appeal now with these documents.
WEAK	Specific evidence is missing. Get it first, then appeal.
DO NOT APPEAL	The denial is clinically correct. A write-off memo is generated.
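
A minimal sketch of how the gap analysis could map onto the three verdicts, assuming evidence items are represented as plain string labels. The names `GapResult` and `appeal_viability` and the `criteria_unmet` flag are hypothetical and used only for illustration; the real agent reasons over FHIR records with an LLM rather than comparing sets.

```python
from dataclasses import dataclass, field


@dataclass
class GapResult:
    verdict: str                                   # STRONG, WEAK, or DO NOT APPEAL
    missing: list[str] = field(default_factory=list)


def appeal_viability(required, found, criteria_unmet=False):
    """Compare required evidence with evidence found and pick a verdict.

    criteria_unmet is an assumed flag meaning the denial is clinically correct.
    """
    if criteria_unmet:
        return GapResult("DO NOT APPEAL")
    # Evidence Required minus Evidence Found, as in the formula above.
    missing = sorted(set(required) - set(found))
    return GapResult("WEAK" if missing else "STRONG", missing)


result = appeal_viability(
    required={"pt_notes", "physician_statement"},
    found={"pt_notes"},
)
print(result.verdict, result.missing)
```

An empty difference yields STRONG, a non-empty one yields WEAK along with the list of items to collect before appealing.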

What Makes Us Different

The healthcare AI agent space today is largely focused on two workflows: appeal letter generation — automatically drafting the letter a biller sends to contest a denial — and prior authorization automation — streamlining the process of requesting payer approval before a procedure.

Both of these are valuable. But they address the problem after it has already become expensive. Appeal letter generation assumes you already have a denial and the documentation to fight it. Prior auth automation assumes you know you need auth in the first place.

DenialGPT operates earlier and deeper in the workflow:

Prevention before submission — we check the claim against CMS policy and payer patterns before it goes out, catching prior auth requirements, diagnosis-procedure mismatches, and documentation gaps at the point where fixing them costs nothing
Evidence reasoning, not letter writing — instead of generating an appeal letter, we tell the biller whether the evidence to win that appeal actually exists in the clinical record. A well-written letter built on missing documentation still loses.
The write-off justification memo — when a denial is clinically correct and should not be appealed, DenialGPT generates a structured memo the biller hands to their manager for sign-off. This is the only artifact in this space that documents why a denial should be accepted, saving teams from wasting effort on unwinnable appeals.
Root cause categorization — every denial is classified not just by type but by the workflow failure that caused it, with a prevention pointer that feeds back into Phase 1. Over time, every denial makes the prevention system smarter.

How We Built It

DenialGPT is built on the Google ADK framework and deployed as an external A2A agent on the Prompt Opinion platform.

Architecture

Prompt Opinion Platform
        │
        ▼
DenialGPT A2A Agent (Google ADK)
        │
        ├── check_claim_policy (Prevention)
        │       ├── Policy KB (ChromaDB + CMS LCD/NCD embeddings)
        │       ├── Voyage AI voyage-3 embeddings
        │       └── PAYER_PATTERNS intelligence lookup
        │
        ├── analyze_denial (Classification + Root Cause)
        │       └── root_cause: PROCESS_FAILURE | DOCUMENTATION_GAP
        │                       CODING_ERROR | CLINICAL_CRITERIA_UNMET
        │
        ├── fetch_clinical_evidence (FHIR via SHARP)
        │       └── Condition, Observation, Procedure,
        │           MedicationRequest, DocumentReference
        │
        └── gap_analysis (AI Core)
                ├── Chain-of-thought evidence reasoning
                ├── Payer pattern intelligence injection
                └── Write-off justification memo (conditional)
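
As an illustration of the fetch_clinical_evidence step, the sketch below builds one FHIR search URL per resource type listed in the diagram. It assumes the FHIR base URL and patient ID have already been supplied through the SHARP context; `build_evidence_queries` and the example base URL are hypothetical, and authentication and the HTTP calls themselves are omitted.

```python
from urllib.parse import urlencode

# Resource types taken from the architecture diagram above.
EVIDENCE_RESOURCE_TYPES = (
    "Condition",
    "Observation",
    "Procedure",
    "MedicationRequest",
    "DocumentReference",
)


def build_evidence_queries(fhir_base_url, patient_id, count=50):
    """Return {resource_type: search_url} for one patient.

    Uses the standard FHIR `patient` search parameter and the `_count`
    page-size parameter; the default page size is an arbitrary choice.
    """
    base = fhir_base_url.rstrip("/")
    params = urlencode({"patient": patient_id, "_count": count})
    return {rt: f"{base}/{rt}?{params}" for rt in EVIDENCE_RESOURCE_TYPES}


for resource_type, url in build_evidence_queries(
    "https://fhir.example.org/r4", "123"
).items():
    print(resource_type, url)
```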

Policy Knowledge Base

We embedded CMS LCD and NCD documents for orthopedic procedures into ChromaDB using Voyage AI voyage-3 embeddings. The retrieval query is enriched with procedure context:

$$\text{query} = \text{CPT} + \text{ICD-10} + \text{Payer} + \text{Procedure Description} + \text{Coverage Criteria Keywords}$$

Chunks scoring below a minimum relevance threshold are filtered out, ensuring Claude only reasons over clinically relevant policy text.
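
The query enrichment and the relevance filter can be sketched as follows. This is an approximation under stated assumptions: `build_policy_query` and `filter_by_relevance` are hypothetical names, the 0.35 threshold is a placeholder rather than the value we used, and scores are assumed to be similarities where higher means more relevant (a vector store that returns distances would need the comparison flipped).

```python
def build_policy_query(cpt, icd10, payer, description, keywords):
    """Concatenate procedure context into a single retrieval query string."""
    parts = [f"CPT {cpt}", f"ICD-10 {icd10}", payer, description, *keywords]
    return " ".join(p.strip() for p in parts if p and p.strip())


def filter_by_relevance(scored_chunks, min_score=0.35):
    """Keep chunks at or above min_score, most relevant first.

    scored_chunks is an iterable of (text, similarity) pairs.
    """
    kept = [(text, score) for text, score in scored_chunks if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in kept]


query = build_policy_query(
    "73721", "M17.11", "Aetna", "knee MRI", ["conservative therapy"]
)
chunks = filter_by_relevance(
    [("revision history", 0.10), ("coverage criteria", 0.62), ("indications", 0.48)]
)
print(query)
print(chunks)
```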

Root Cause Categorization

Every denial is classified not just by type but by cause — the workflow failure that produced it:

$$\text{root\_cause} \in \{ \text{PROCESS\_FAILURE},\ \text{DOCUMENTATION\_GAP},\ \text{CODING\_ERROR},\ \text{CLINICAL\_CRITERIA\_UNMET} \}$$

The prevention field in each root cause points back to Phase 1 — creating a closed feedback loop where every denial teaches the prevention system something new.
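
One way to represent the four root causes and their prevention pointers is an enum plus a lookup table, sketched below. The pointer strings, `DenialClassification`, and `parse_root_cause` are illustrative assumptions, not the production schema.

```python
from dataclasses import dataclass
from enum import Enum


class RootCause(str, Enum):
    PROCESS_FAILURE = "PROCESS_FAILURE"
    DOCUMENTATION_GAP = "DOCUMENTATION_GAP"
    CODING_ERROR = "CODING_ERROR"
    CLINICAL_CRITERIA_UNMET = "CLINICAL_CRITERIA_UNMET"


# Hypothetical prevention pointers; each one refers back to a Phase 1 check.
PREVENTION_POINTERS = {
    RootCause.PROCESS_FAILURE: "Confirm prior authorization before submission",
    RootCause.DOCUMENTATION_GAP: "Attach the required therapy documentation",
    RootCause.CODING_ERROR: "Check the diagnosis-procedure pairing",
    RootCause.CLINICAL_CRITERIA_UNMET: "Review coverage criteria before scheduling",
}


@dataclass(frozen=True)
class DenialClassification:
    denial_type: str
    root_cause: RootCause

    @property
    def prevention(self) -> str:
        return PREVENTION_POINTERS[self.root_cause]


def parse_root_cause(label: str) -> RootCause:
    """Normalize a model-produced label; raises ValueError if it is unknown."""
    return RootCause(label.strip().upper())


c = DenialClassification("medical necessity", parse_root_cause("documentation_gap"))
print(c.root_cause.value, "->", c.prevention)
```

Rejecting unknown labels with an exception keeps a malformed model output from silently entering the feedback loop.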

Payer Pattern Intelligence

A structured lookup table (PAYER_PATTERNS) stores known denial rates and winning appeal evidence for specific payer + CPT + ICD-10 combinations. For example:

$$P(\text{denial} \mid \text{Aetna},\ \text{CPT 73721},\ \text{M17.11}) = 68\%$$

$$P(\text{appeal win} \mid \text{CARC 50},\ \text{PT notes} + \text{physician statement}) = 41\%$$

This intelligence surfaces in both the prevention check (before submission) and the gap analysis (after denial) — giving billers data-driven context at every decision point.
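
A minimal sketch of how such a lookup might be laid out, using only the two example figures quoted above. The dictionary layout, the split into two tables, and the function names are assumptions made for illustration; the actual PAYER_PATTERNS structure may differ.

```python
from typing import Optional

# Denial rates keyed by (payer, CPT, ICD-10). Layout is assumed.
PAYER_PATTERNS = {
    ("Aetna", "73721", "M17.11"): {"denial_rate": 0.68},
}

# Appeal outcomes keyed by CARC code. Layout is assumed.
APPEAL_PATTERNS = {
    "50": {
        "winning_evidence": ("PT notes", "physician statement"),
        "win_rate": 0.41,
    },
}


def lookup_denial_rate(payer: str, cpt: str, icd10: str) -> Optional[float]:
    """Return the known denial rate, or None when no pattern is recorded."""
    entry = PAYER_PATTERNS.get((payer, cpt, icd10))
    return entry["denial_rate"] if entry else None


def lookup_appeal_pattern(carc_code: str) -> Optional[dict]:
    """Return the recorded winning evidence and win rate for a CARC code."""
    return APPEAL_PATTERNS.get(carc_code)


print(lookup_denial_rate("Aetna", "73721", "M17.11"))
print(lookup_appeal_pattern("50"))
```

Returning None for unknown combinations lets the caller distinguish "no data" from a genuinely low denial rate.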

Challenges We Faced

1. Embedding pipeline retrieval quality. Initial retrieval returned administrative boilerplate (revision history, AHA copyright notices) instead of clinical criteria. We fixed this by switching from PDF parsing to clean .txt extracts of the clinical sections only, and by adding a minimum relevance score filter after retrieval.

2. Vector space consistency. The embedding model used at ingestion (voyage-3-lite) did not match the model used at query time (voyage-3), so query and document vectors were not comparable and every similarity score came back negative. We standardized both pipeline stages on voyage-3 and re-embedded from scratch.
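
A guard against this class of mismatch could look like the sketch below: record the embedding model name alongside the collection at ingestion, then check it before every query. The functions and the plain dictionary standing in for collection metadata are hypothetical; this is one possible safeguard, not what the current pipeline does.

```python
EMBED_MODEL = "voyage-3"  # single source of truth for both pipeline stages


def ingestion_metadata(model: str = EMBED_MODEL) -> dict:
    """Metadata to store with the collection when it is created."""
    return {"embedding_model": model}


def assert_same_model(collection_metadata: dict, query_model: str) -> None:
    """Raise ValueError if the query model differs from the ingestion model."""
    stored = collection_metadata.get("embedding_model")
    if stored != query_model:
        raise ValueError(
            f"Collection was embedded with {stored!r} but the query uses "
            f"{query_model!r}; re-embed or switch the query model."
        )


assert_same_model(ingestion_metadata(), "voyage-3")  # passes silently
try:
    assert_same_model(ingestion_metadata("voyage-3-lite"), "voyage-3")
except ValueError as err:
    print(err)
```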

What We Learned

Prevention beats reaction. The most valuable thing DenialGPT does is not analyzing denials; it is stopping them before they happen. The check_claim_policy tool pays for itself on the first prior auth catch.
The DO NOT APPEAL verdict is underrated. Revenue cycle teams waste enormous effort appealing denials they cannot win. Telling a biller to stop — and handing them a memo that explains why — saves more money than winning a marginal appeal.
RAG quality is everything. A prevention check is only as good as the policy chunks it reasons over. Getting clean, relevant text into the knowledge base mattered more than any prompt engineering.
Open standards unlock real deployment. FHIR + SHARP + A2A means DenialGPT can connect to any EHR system that exposes a standard FHIR API, without custom integration work. That is not a nice-to-have; it is what makes this deployable in a real hospital today.

What's Next

Expand beyond orthopedics to cardiology and behavioral health
Add real-time payer policy monitoring — flag when a covered procedure becomes subject to new prior auth requirements
Build a denial pattern dashboard — aggregate root causes across hundreds of claims to surface systemic billing workflow failures
Connect directly to EHR pre-submission workflows so prevention checks run automatically before any claim is queued

Built With

a2a
ai
anthropic
chromadb
claude
claude-sonnet-4-5
cms
context
data
database
deployment
development
document
fhir
google
llamaindex
llm
mcp
ncd
opinion
platform
processing
prompt
protocols
pydantic
python
python-dotenv
r4
render
sharp
sources
standards
vector
voyage
voyage-3

Updates

Harshitha M started this project — May 11, 2026 10:38 PM EDT
