Inspiration

Every year, insurance companies deny 93 million prior authorization requests in the United States. Of those, fewer than 1% are ever appealed — not because the cases are weak, but because fighting back takes a physician 4 hours of manual chart review and medical-legal writing. A solo practitioner in a rural clinic simply cannot afford that time. The result is a healthcare inequity hidden in plain sight: wealthy hospital systems with dedicated appeals teams win; small clinics and their patients lose.

We built DenialFighter because the appeal process shouldn't require a law degree and half a workday. The evidence is already in the patient's chart. The guidelines already exist. The letter has been written thousands of times before. What was missing was a system that could connect those dots in seconds instead of hours.

What We Learned

Building a healthcare AI agent taught us three things we didn't expect:

  1. Persona specificity drives output quality dramatically. When Agent 2 was prompted as "a board-certified oncology pharmacist with 15 years of NCCN guideline experience," the evidence it selected was categorically different — and more clinically defensible — than when prompted generically as "a medical AI assistant."

  2. Speed matters most when LLM calls run in sequence. Our pipeline makes three LLM calls one after another, so their latencies add: a 3× speedup per call makes the LLM portion of the pipeline roughly 3× faster end to end, and every slow call is paid for in full by the user. Choosing Groq's LPU inference over standard GPU-based APIs was the single highest-leverage architectural decision we made.

  3. FHIR is the unlock for real healthcare AI. Every piece of evidence in our appeal letters (lab results, medication history, diagnosis codes, prior treatment failures) comes directly from structured FHIR resources. This sharply reduces the room for hallucination at the evidence layer, which is the layer that decides whether an appeal is approved or rejected.

How We Built It

DenialFighter is a three-agent pipeline built around a Model Context Protocol (MCP) FHIR tool server:

The FHIR Layer — A FastAPI server exposes five FHIR R4 tools (patient summary, active medications, conditions, diagnostic reports, medication history) published to the Prompt Opinion Marketplace. When called from Prompt Opinion, SHARP context headers inject the patient ID and FHIR server URL per-request, making the same server work standalone or inside the Prompt Opinion ecosystem.

Agent 1 — Denial Intake reads the raw denial letter and outputs structured JSON: denied drug, HCPCS code, denial reason codes (MED_NECESSITY, STEP_THERAPY, MISSING_DOCS), appeal deadline, and urgency level. Temperature 0 — we need determinism here.
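
As an illustration of what "structured JSON" could look like downstream, here is a minimal sketch that parses and validates Agent 1's output. The field names, the `DenialIntake` class, and the sample values (including the placeholder drug and HCPCS code) are assumptions made for this example, not the project's actual schema; only the three reason codes come from the description above. It uses the standard library so it runs anywhere, although pydantic, which the project lists, would likely be the natural choice.

```python
import json
from dataclasses import dataclass
from datetime import date
from enum import Enum


class DenialReason(str, Enum):
    # The three reason codes named in the writeup.
    MED_NECESSITY = "MED_NECESSITY"
    STEP_THERAPY = "STEP_THERAPY"
    MISSING_DOCS = "MISSING_DOCS"


@dataclass(frozen=True)
class DenialIntake:
    # Hypothetical field names; the real schema is not shown in the source.
    denied_drug: str
    hcpcs_code: str
    denial_reasons: tuple
    appeal_deadline: date
    urgency: str


def parse_denial_intake(raw_json: str) -> DenialIntake:
    """Parse the model's JSON and reject unknown reason codes or bad dates."""
    data = json.loads(raw_json)
    return DenialIntake(
        denied_drug=data["denied_drug"],
        hcpcs_code=data["hcpcs_code"],
        # Enum lookup raises ValueError for any code outside the known set.
        denial_reasons=tuple(DenialReason(code) for code in data["denial_reasons"]),
        appeal_deadline=date.fromisoformat(data["appeal_deadline"]),
        urgency=data["urgency"],
    )


# Placeholder values for illustration only.
sample_output = json.dumps({
    "denied_drug": "example-drug",
    "hcpcs_code": "J0000",
    "denial_reasons": ["MED_NECESSITY", "STEP_THERAPY"],
    "appeal_deadline": "2025-03-15",
    "urgency": "standard",
})
intake = parse_denial_intake(sample_output)
```

Validating at this boundary means a malformed or invented reason code fails loudly before Agent 2 ever sees it.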

Agent 2 — Evidence Matching receives the denial JSON and the patient's complete FHIR chart, then returns a medical necessity score (0–100), an appeal strength classification, and itemized evidence mapped to each denial reason — including NCCN guideline citations and step therapy proof from medication history. Every evidence item is FHIR-sourced or guideline-cited. No confabulation.

Agent 3 — Appeal Drafting receives the denial data and evidence summary, then writes a 600–750 word appeal letter across eight mandatory sections, including clinical background, medical necessity argument, step therapy proof, guideline citations, and a formal request. The prompt persona is a board-certified physician who is also a healthcare attorney with a 73% reversal rate.

The whole pipeline runs asynchronously behind a polling API (/run-appeal returns a job_id, which the client then polls via /appeal-status), with a React + Vite frontend showing live step-by-step progress.
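
The start-then-poll pattern can be sketched without any web framework. The version below is a simplified stand-in, not the project's server code: the two endpoints are modeled as plain functions named `run_appeal` and `appeal_status`, the in-memory `JOBS` dict and the status strings are assumptions, and the LLM calls are replaced by a no-op await.

```python
import asyncio
import uuid

# In-memory job store (an assumption; a real service might use a database).
JOBS = {}

PIPELINE_STEPS = ("denial_intake", "evidence_matching", "appeal_drafting")


async def _run_pipeline(job_id: str, denial_letter: str) -> None:
    """Run the three steps in order, recording progress for pollers."""
    for step in PIPELINE_STEPS:
        JOBS[job_id].update(status="running", step=step)
        await asyncio.sleep(0)  # stand-in for an awaited LLM call
    JOBS[job_id].update(status="done", step=None,
                        result=f"appeal drafted from {len(denial_letter)} chars")


async def run_appeal(denial_letter: str):
    """Start a job and return immediately with its id (and the task handle)."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "queued", "step": None, "result": None}
    task = asyncio.create_task(_run_pipeline(job_id, denial_letter))
    return job_id, task


def appeal_status(job_id: str) -> dict:
    """Return a snapshot of the job, as a polling client would see it."""
    return dict(JOBS[job_id])


async def demo():
    job_id, task = await run_appeal("Your request has been denied.")
    first = appeal_status(job_id)   # polled before the pipeline has run
    await task                      # a real client would poll repeatedly instead
    return first, appeal_status(job_id)


first_status, final_status = asyncio.run(demo())
```

The caller gets a job id back right away, and the per-step `step` field is what a frontend would read to show live progress.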

Total time from denial letter to submission-ready appeal packet: ~90 seconds.

Challenges We Faced

FHIR data modeling for legal arguments. Insurance appeals require proving step therapy: you must show the patient tried cheaper alternatives first, they failed, and why they failed. FHIR MedicationRequest resources record what was prescribed but not always why it stopped. We built logic to infer failure reasons from status codes, date gaps, and subsequent escalations — and taught Agent 2 to be explicit about the strength of this inference in the output.
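
A rough sketch of this kind of inference follows. It is not the team's actual logic: the flattened record shape (`drug`, `status`, `status_reason`, `authored_on`, `ended_on`), the 30-day gap threshold, and the `basis` labels are all assumptions for illustration. The status values used are standard FHIR R4 MedicationRequest statuses, and the drug names are placeholders.

```python
from datetime import date

ENDED_STATUSES = {"stopped", "cancelled", "completed"}


def infer_step_therapy(requests, max_gap_days=30):
    """Label why each ended medication stopped, and how strong that label is."""
    ordered = sorted(requests, key=lambda r: r["authored_on"])
    findings = []
    for i, req in enumerate(ordered):
        if req["status"] not in ENDED_STATUSES:
            continue  # still-active therapy is not a prior failure
        # Later prescriptions for a different drug suggest an escalation.
        later = [r for r in ordered[i + 1:] if r["drug"] != req["drug"]]
        if req.get("status_reason"):
            basis, reason = "documented", req["status_reason"]
        elif later and (later[0]["authored_on"] - req["ended_on"]).days <= max_gap_days:
            basis = "inferred"
            reason = f"switched to {later[0]['drug']} within {max_gap_days} days"
        else:
            basis, reason = "unknown", "no reason recorded"
        findings.append({"id": req["id"], "drug": req["drug"],
                         "basis": basis, "reason": reason})
    return findings


# Placeholder chart data for illustration only.
history = [
    {"id": "mr-1", "drug": "drug-A", "status": "stopped",
     "status_reason": "disease progression",
     "authored_on": date(2024, 1, 5), "ended_on": date(2024, 3, 1)},
    {"id": "mr-2", "drug": "drug-B", "status": "stopped",
     "status_reason": None,
     "authored_on": date(2024, 3, 4), "ended_on": date(2024, 6, 1)},
    {"id": "mr-3", "drug": "drug-C", "status": "active",
     "status_reason": None,
     "authored_on": date(2024, 6, 10), "ended_on": None},
]
findings = infer_step_therapy(history)
```

Carrying the `basis` label alongside each finding is one way to let the downstream agent state plainly whether a failure reason was documented or only inferred.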

Keeping evidence grounded. LLMs hallucinate. In healthcare legal documents, a hallucinated lab value or fabricated guideline citation isn't just wrong — it's grounds for appeal rejection and potentially a liability. We enforced a strict rule in Agent 2's system prompt: cite FHIR resource IDs or named NCCN guidelines only. We then verified in testing that the model respected this constraint across varied denial scenarios.
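
A prompt rule like this can also be backed by a programmatic check. The sketch below is one possible form of such a check, under assumptions: each evidence item is taken to carry a `source` string, FHIR sources are written as `ResourceType/id`, and guideline citations start with "NCCN". It confirms that cited FHIR resources exist in the chart that was supplied, but it only checks the shape of a guideline citation, not whether that guideline says what is claimed.

```python
import re

# Accepts any citation that starts with the word "NCCN" followed by more text.
NCCN_PATTERN = re.compile(r"^NCCN\b.+", re.IGNORECASE)


def ungrounded_items(evidence_items, chart_resource_ids):
    """Return evidence items whose source is neither in the chart nor an NCCN citation."""
    problems = []
    for item in evidence_items:
        source = item.get("source", "")
        if source in chart_resource_ids:
            continue  # cites a resource that was actually in the chart
        if NCCN_PATTERN.match(source):
            continue  # named guideline citation (format check only)
        problems.append(item)
    return problems


# Placeholder data for illustration only.
chart_ids = {"Observation/obs-1", "MedicationRequest/mr-2"}
evidence = [
    {"claim": "lab value supports diagnosis", "source": "Observation/obs-1"},
    {"claim": "regimen is guideline-recommended", "source": "NCCN Guidelines (placeholder title)"},
    {"claim": "prior scan showed progression", "source": "Observation/obs-999"},
]
flagged = ungrounded_items(evidence, chart_ids)
```

Here the third item is flagged because `Observation/obs-999` was never part of the chart, which is exactly the kind of invented reference the rule is meant to catch.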

The sequential latency problem. Three LLM calls in sequence means latency adds up. We benchmarked GPT-4o, Claude Haiku, and Groq LLaMA 3.3-70b. Groq was 8–12× faster per call with comparable output quality for structured extraction and prose drafting. That difference made the product viable — 90 seconds feels instant; 8 minutes feels broken.
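
Because the calls run one after another, total wait time is the sum of the per-call latencies plus whatever fixed work surrounds them. The sketch below uses made-up timings (not our benchmark numbers) to show how a per-call speedup carries into the total, and why non-LLM overhead keeps the end-to-end gain below the per-call gain.

```python
def pipeline_latency(call_latencies_s, fixed_overhead_s=0.0):
    """Total latency of sequential calls plus fixed non-LLM work (FHIR fetches, I/O)."""
    return fixed_overhead_s + sum(call_latencies_s)


# Hypothetical per-call latencies in seconds for the three agents.
slow_calls = [40.0, 50.0, 60.0]
fast_calls = [t / 10 for t in slow_calls]  # assume a 10x per-call speedup

slow_total = pipeline_latency(slow_calls, fixed_overhead_s=15.0)
fast_total = pipeline_latency(fast_calls, fixed_overhead_s=15.0)
end_to_end_speedup = slow_total / fast_total
```

With these illustrative numbers a 10× per-call speedup yields a 5.5× end-to-end speedup, since the 15 seconds of fixed overhead does not shrink.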

Making SHARP context transparent. The Prompt Opinion SHARP extension injects context via HTTP headers. We needed the MCP server to work identically whether called directly (with explicit parameters) or via Prompt Opinion (with headers). The middleware layer that resolves context source at runtime — without changing any agent code — took several iterations to get right.
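
The core of such a layer is a single resolution function that every tool calls, so nothing downstream needs to know where the context came from. This is a minimal sketch under assumptions: the header names `x-patient-id` and `x-fhir-server-url` are hypothetical placeholders (the real SHARP header names are not given here), and letting headers take precedence over explicit parameters is a design choice made for the example.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical header names; substitute the ones defined by the SHARP spec.
PATIENT_HEADER = "x-patient-id"
FHIR_URL_HEADER = "x-fhir-server-url"


@dataclass(frozen=True)
class FhirContext:
    patient_id: str
    fhir_base_url: str
    source: str  # "headers" or "explicit", useful for logging


def resolve_context(headers: Mapping[str, str],
                    patient_id: Optional[str] = None,
                    fhir_base_url: Optional[str] = None) -> FhirContext:
    """Pick context from injected headers if present, else from explicit parameters."""
    lowered = {key.lower(): value for key, value in headers.items()}
    header_patient = lowered.get(PATIENT_HEADER)
    header_url = lowered.get(FHIR_URL_HEADER)
    if header_patient and header_url:
        return FhirContext(header_patient, header_url.rstrip("/"), "headers")
    if patient_id and fhir_base_url:
        return FhirContext(patient_id, fhir_base_url.rstrip("/"), "explicit")
    raise ValueError("no patient context in headers or parameters")


# Called via a platform that injects headers:
via_headers = resolve_context(
    {"X-Patient-Id": "pat-123", "X-FHIR-Server-URL": "https://fhir.example.org/r4/"}
)
# Called directly with explicit parameters:
via_params = resolve_context({}, patient_id="pat-456",
                             fhir_base_url="https://fhir.example.org/r4")
```

Both calls produce the same `FhirContext` shape, which is what lets the tools and agents stay unchanged across the two calling modes.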

Built With

  • a2a-protocol
  • anthropic-sdk
  • fastapi
  • fhir-r4-(hl7)
  • gemini
  • groq
  • hapi-fhir-server
  • httpx
  • llama-3.3-70b
  • model-context-protocol-(mcp)
  • prompt-opinion-marketplace
  • pydantic
  • python
  • railway
  • react
  • sharp-extension-specs
  • vite