🩺 Inspiration

Every year, insurers run a quiet math problem on millions of patients.

In 2024, they denied 85 million in-network claims on the ACA marketplace. Fewer than 1% were ever appealed. And when patients actually fight back? 81.7% of appealed prior authorization denials get overturned.

Let that sink in. The fight works. Most people just never fight.

"The denial is never the last word legally. For most people, it is the last word practically."

Why? Because insurers have entire teams of lawyers, AI models, and clinical reviewers designing the denial. Patients have a letter, a deadline, and hold music. That asymmetry is by design. It keeps denials profitable.

We built Overturn to flip the table.


⚡ What it does

Overturn is an autonomous agent swarm that fights insurance denials for you.

You drop in a denial letter. Sixty seconds later, a certified-mail appeal is in the USPS pipeline. Real tracking number. Real medical guideline cited. Real legal precedent cited. Real clinical harm spelled out. All while you were reading this sentence.

🏥 The flow

| Step | What happens |
| --- | --- |
| 📨 Read | Extract ICD-10, CPT codes, denial reason from the letter |
| 🩻 Triage | Score clinical urgency. Pick the right legal framework |
| 🧠 Research | Pull medical guidelines + legal precedents in parallel |
| ⚠️ Harm Model | Translate denial into clinical consequences, not dollar amounts |
| ✍️ Draft | Formal appeal citing both kinds of authority |
| 📬 File | Lob returns a certified mail tracking number |

Patient taps approve. Overturn does the rest.


🎯 Why this is healthcare, not legaltech

Every other "appeal generator" on the internet stops at "here's a draft letter, good luck." They treat denials like paperwork problems. Denials are medical problems.

Three pieces make Overturn a healthcare project:

🧪 Clinical Guideline Grounding

The medical brain.

Retrieval over real medical literature. Not summaries. Not paraphrases. The actual guidelines:

  • 📘 ACR Appropriateness Criteria (American College of Radiology)
  • 🧠 AAN Guidelines (American Academy of Neurology)
  • 🎗️ NCCN Guidelines (National Comprehensive Cancer Network)
  • 🛡️ USPSTF Recommendations (U.S. Preventive Services Task Force)

Every appeal we draft cites real medical authority. Not just case law.

💊 Care Impact Framing

Medical reasoning.

The agent argues patient harm, not patient finances. A denied MRI isn't "$4,237 in out-of-pocket costs." It's:

"Estimated 6 to 14 week diagnostic delay. Opioid exposure extends with each week. Conservative management already failed at 12 weeks per the patient's PT records."

That's the argument that wins. Not the dollar figure.

🚨 Clinical Urgency Triage

Medical decision-making.

Not every denial is equal. An elective MRI is not an insulin denial. Overturn treats them differently:

| Tier | What it looks like | What the agent does |
| --- | --- | --- |
| 🟢 Routine | Elective imaging, PT sessions | Standard appeal flow |
| 🟡 Urgent | Suspected disc herniation, chronic pain | Appeal + surface alternative providers |
| 🔴 Emergent | Insulin, chemo, oxygen | Crisis pathway: emergency fill laws, parallel filing, acts in minutes |

An insulin denial isn't a routine denial with a faster timer. It's a medical emergency. Overturn knows the difference.
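A minimal sketch of how tier routing could look. The tier names come from the tiers described above; the keyword lists and action identifiers are hypothetical stand-ins, since the real Triage agent scores urgency with an LLM rather than string matching.

```typescript
// Illustrative only: keyword lists and action names are assumptions,
// not the actual Triage agent logic.
type UrgencyTier = "routine" | "urgent" | "emergent";

const EMERGENT_TERMS: readonly string[] = ["insulin", "chemo", "oxygen"];
const URGENT_TERMS: readonly string[] = ["disc herniation", "chronic pain"];

function mentionsAny(text: string, terms: readonly string[]): boolean {
  const lower = text.toLowerCase();
  return terms.some((term) => lower.includes(term));
}

// Check the most severe tier first so an emergent term always wins.
function classifyUrgency(deniedService: string): UrgencyTier {
  if (mentionsAny(deniedService, EMERGENT_TERMS)) return "emergent";
  if (mentionsAny(deniedService, URGENT_TERMS)) return "urgent";
  return "routine";
}

// Each tier maps to a different plan, not just a shorter deadline.
const TIER_ACTIONS: Record<UrgencyTier, readonly string[]> = {
  routine: ["standard-appeal"],
  urgent: ["standard-appeal", "surface-alternative-providers"],
  emergent: ["emergency-fill-lookup", "parallel-filing"],
};

function planActions(deniedService: string): readonly string[] {
  return TIER_ACTIONS[classifyUrgency(deniedService)];
}
```

The point of the lookup table is that an emergent denial gets a different set of actions, not the routine set with a tighter clock.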


📖 One real case, start to finish

Meet Maria. 67 years old. Chronic low back pain for two years. Her doctor suspects a disc herniation and orders an MRI.

Anthem denies: "not medically necessary."

Maria, like 99% of patients, doesn't appeal. The opioid prescription continues. The disc gets worse. She waits.

Now imagine Maria had Overturn. Here is what happens in 3 seconds:

🩺 Triage: urgent. Estimated 6 to 14 week diagnostic delay. Opioid exposure extends.

📘 Guideline: ACR Appropriateness Criteria rates MRI lumbar 8 of 9 for this exact presentation.

⚖️ Precedent: 11 prior California IMR overturns of the identical Anthem denial on record.

✍️ Draft: formal appeal generated, both citations inline, clinical harm framed.

Filed certified mail. Tracking number returned.

Care Continuity Score: 82 / 100

We don't optimize for appeal wins. We optimize for continuity of care.


🏗️ How we built it

Six specialized TypeScript services. Each on its own port. Each with persistent state. All coordinating through Redis pub/sub with shared state in Postgres + pgvector.

Not one LLM wearing six hats. A real swarm.
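To make the coordination idea concrete, here is a rough sketch of agents reacting to typed events. It uses an in-memory bus as a stand-in for Redis pub/sub, and the event names, fields, and the hard-coded tier are all hypothetical; the real schemas in shared/events.ts are not shown in this write-up.

```typescript
// Hypothetical event schema: a discriminated union keyed on `type`.
type SwarmEvent =
  | { readonly type: "denial.ingested"; readonly caseId: string; readonly letterText: string }
  | { readonly type: "denial.triaged"; readonly caseId: string; readonly tier: "routine" | "urgent" | "emergent" }
  | { readonly type: "appeal.drafted"; readonly caseId: string; readonly body: string };

type Subscriber = (event: SwarmEvent) => void;

interface Bus {
  readonly subscribe: (subscriber: Subscriber) => void;
  readonly publish: (event: SwarmEvent) => void;
}

// In-memory stand-in for Redis pub/sub: every subscriber sees every event.
function createBus(): Bus {
  const subscribers: Subscriber[] = [];
  return {
    subscribe: (subscriber) => {
      subscribers.push(subscriber);
    },
    publish: (event) => {
      for (const subscriber of subscribers) subscriber(event);
    },
  };
}

// Each agent only reacts to the event type it cares about.
function registerTriage(bus: Bus): void {
  bus.subscribe((event) => {
    if (event.type !== "denial.ingested") return;
    // Placeholder tier; the real agent would score urgency here.
    bus.publish({ type: "denial.triaged", caseId: event.caseId, tier: "urgent" });
  });
}

function registerDrafter(bus: Bus): void {
  bus.subscribe((event) => {
    if (event.type !== "denial.triaged") return;
    bus.publish({
      type: "appeal.drafted",
      caseId: event.caseId,
      body: `Appeal for ${event.caseId} (${event.tier})`,
    });
  });
}
```

Because agents only know the event types, not each other, adding another agent in this sketch means adding one more subscriber.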

🔧 The agents

| Agent | Job |
| --- | --- |
| 🕵️ Watcher | Ingests Gmail, USPS Informed Delivery, Knot card transactions |
| 🎯 Triage | K2 Think V2 extracts codes, scores urgency, picks legal framework |
| 🧠 Dual Researcher | pgvector search across legal + clinical collections in parallel |
| ✍️ Drafter | Structured output forces both citation slots to be filled |
| 📬 Care Pathway + Submitter | Routes by urgency. Files via Lob, Phaxio, or OpenClaw |
| Escalator | Periodic sweep. Wakes on deadlines. Drafts external reviews |
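As a rough illustration of the Escalator's periodic sweep, the sketch below picks out filed appeals whose response window has lapsed. The record shape and the `responseWindowDays` field are assumptions for illustration; the actual deadlines depend on the legal framework chosen at triage.

```typescript
// Hypothetical record shape for an appeal that has already been filed.
interface FiledAppeal {
  readonly caseId: string;
  readonly filedAt: Date;
  readonly responseWindowDays: number; // assumed to be set per legal framework
  readonly resolved: boolean;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the case IDs that are still open and past their response window.
function findOverdue(appeals: readonly FiledAppeal[], now: Date): string[] {
  return appeals
    .filter((appeal) => !appeal.resolved)
    .filter((appeal) => now.getTime() - appeal.filedAt.getTime() >= appeal.responseWindowDays * MS_PER_DAY)
    .map((appeal) => appeal.caseId);
}
```

Passing `now` in as a parameter keeps the sweep deterministic, which makes it easy to test next to the source.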

⚙️ The stack

| Layer | Tech |
| --- | --- |
| Runtime | Bun 1.x |
| Language | TypeScript strict + zod |
| HTTP | Hono |
| Database | Postgres 17 + pgvector |
| Events | Redis pub/sub |
| LLM Core | K2 Think V2 (MBZUAI 70B reasoning model) |
| LLM Backup | Claude Sonnet 4.6 |
| Mail API | Lob (certified mail with tracking) |
| Portals | OpenClaw (Eragon) |
| iMessage | Spectrum (Photon) |
| Transactions | Knot (TransactionLink) |
| Infra | Dedalus Machines (one per agent, persistent) |
| Quality | SonarQube Cloud |
| Velocity | Enter.pro credits |

📚 The data that makes it work

Legal precedents: California's DMHC Independent Medical Review dataset. Every IMR decision publicly reported since 2001. We filtered to medical necessity overturns and embedded the findings text into pgvector. When the Researcher retrieves a precedent, it's real.

Clinical guidelines: four different medical bodies in four completely different formats:

  • 🩻 ACR publishes appropriateness ratings on a 1-to-9 scale.
  • 🧠 AAN uses narrative evidence summaries.
  • 🎗️ NCCN builds decision trees and flowcharts.
  • 🛡️ USPSTF assigns letter grades A, B, C, D, I.

Normalizing all four into one pgvector collection with consistent metadata for cross-condition retrieval was one of the harder problems we solved. Worth it. Every appeal now reaches across all four bodies simultaneously.
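One way such a normalization layer might look, as a sketch. The record shape, format labels, and field names are hypothetical. The ACR bands (7 to 9 usually appropriate, 4 to 6 may be appropriate, 1 to 3 usually not appropriate) and the short USPSTF grade glosses are simplified summaries of the published scales.

```typescript
type UspstfGrade = "A" | "B" | "C" | "D" | "I";

// One raw shape per publishing body (hypothetical field names).
type RawGuideline =
  | { readonly source: "ACR"; readonly topic: string; readonly rating: number }
  | { readonly source: "AAN"; readonly topic: string; readonly summary: string }
  | { readonly source: "NCCN"; readonly topic: string; readonly pathway: readonly string[] }
  | { readonly source: "USPSTF"; readonly topic: string; readonly grade: UspstfGrade };

// The common record: text to embed, plus the original kept in metadata.
interface GuidelineRecord {
  readonly source: RawGuideline["source"];
  readonly topic: string;
  readonly text: string;
  readonly metadata: { readonly originalFormat: string; readonly original: RawGuideline };
}

function acrCategory(rating: number): string {
  if (rating >= 7) return "usually appropriate";
  if (rating >= 4) return "may be appropriate";
  return "usually not appropriate";
}

const USPSTF_MEANING: Record<UspstfGrade, string> = {
  A: "recommended",
  B: "recommended",
  C: "selectively offered",
  D: "recommended against",
  I: "insufficient evidence",
};

function renderGuideline(raw: RawGuideline): { text: string; originalFormat: string } {
  switch (raw.source) {
    case "ACR":
      return { text: `ACR rating ${raw.rating} of 9 (${acrCategory(raw.rating)})`, originalFormat: "appropriateness-rating" };
    case "AAN":
      return { text: raw.summary, originalFormat: "narrative-summary" };
    case "NCCN":
      return { text: raw.pathway.join(" -> "), originalFormat: "decision-pathway" };
    case "USPSTF":
      return { text: `USPSTF grade ${raw.grade} (${USPSTF_MEANING[raw.grade]})`, originalFormat: "letter-grade" };
  }
}

function normalizeGuideline(raw: RawGuideline): GuidelineRecord {
  const { text, originalFormat } = renderGuideline(raw);
  return { source: raw.source, topic: raw.topic, text: `${raw.topic}: ${text}`, metadata: { originalFormat, original: raw } };
}
```

Every record ends up with the same `text` field for embedding, while `metadata.original` preserves the source format so nothing is lost in translation.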

📏 Repo discipline

We wrote these rules in hour one and held them for the full 36:

  • ❌ No classes (except Error)
  • ❌ No any types
  • ❌ No default exports
  • ❌ No barrel files
  • ✅ All LLM calls go through shared/llm.ts
  • ✅ All events go through shared/events.ts
  • ✅ All env validation through shared/config.ts with zod
  • ✅ Tests live next to source
  • ✅ Conventional commits only

The payoff: the swarm feels like a system, not a hack. Adding the sixth agent took 2 hours. The first one took 8.


🔥 Challenges we ran into

🎯 Saying no. Nine sponsor tracks were within reach. On Saturday afternoon we cut Regeneron's clinical trials integration even though the scaffold existed. Shipping it half-wired would have weakened the core demo. Hard call. Right call. The cuts are a feature, not a failure.

🗃️ Seeding real precedents. The demo moment where the Researcher returns a real legal citation only works if real citations exist. We used California DMHC's public IMR dataset, filtered to medical necessity overturns, chunked the findings text, embedded into pgvector. Getting the top-1 retrieval to actually match took multiple rounds of query-side prompt tuning. Retrieval quality isn't glamorous, but it is the product.
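For intuition, top-k retrieval by cosine similarity can be sketched in memory as below. This is a stand-in, not the project's query: in the real stack the ranking happens inside Postgres via pgvector, and the chunk shape and function names here are hypothetical.

```typescript
// Hypothetical shape of one embedded chunk of IMR findings text.
interface PrecedentChunk {
  readonly id: string;
  readonly findings: string;
  readonly embedding: readonly number[];
}

function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  // A zero vector has no direction, so treat it as unrelated.
  return denominator === 0 ? 0 : dot / denominator;
}

// Scores every chunk against the query and keeps the k best matches.
function topK(query: readonly number[], chunks: readonly PrecedentChunk[], k: number): PrecedentChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

The ranking itself is simple. What the query-side tuning changes is the query vector, which is why the phrasing of the retrieval prompt mattered so much for top-1 accuracy.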

🧬 Four formats, one retrieval layer. ACR, AAN, NCCN, USPSTF each publish in ways that do not speak to each other. Appropriateness ratings vs narrative evidence vs decision trees vs letter grades. We ended up building a normalization layer that made all four query-compatible while preserving the original format in metadata. Not sexy. Essential.

⚖️ Forcing dual-source drafting. Our first Drafter prompts leaned heavily on whichever chunk came back first. If legal won, the appeal sounded like a lawyer. If clinical won, it sounded like a radiologist. Neither was right. The fix: structured output with explicit clinical-authority and legal-authority slots so K2 literally cannot generate an appeal unless both are filled.
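A dependency-free sketch of the "both slots or nothing" check. The field names `clinicalAuthority`, `legalAuthority`, `source`, and `excerpt` are hypothetical, and the project itself lists zod in its stack for validation; this hand-written validator only illustrates the idea.

```typescript
interface Citation {
  readonly source: string;
  readonly excerpt: string;
}

interface AppealDraft {
  readonly clinicalAuthority: Citation;
  readonly legalAuthority: Citation;
  readonly body: string;
}

type DraftResult =
  | { readonly ok: true; readonly draft: AppealDraft }
  | { readonly ok: false; readonly missing: readonly string[] };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function nonEmptyString(value: unknown): string | undefined {
  return typeof value === "string" && value.trim().length > 0 ? value : undefined;
}

function readCitation(value: unknown): Citation | undefined {
  if (!isRecord(value)) return undefined;
  const source = nonEmptyString(value["source"]);
  const excerpt = nonEmptyString(value["excerpt"]);
  return source !== undefined && excerpt !== undefined ? { source, excerpt } : undefined;
}

// Rejects any model output that does not fill both authority slots.
function validateDraft(raw: unknown): DraftResult {
  const fields: Record<string, unknown> = isRecord(raw) ? raw : {};
  const clinicalAuthority = readCitation(fields["clinicalAuthority"]);
  const legalAuthority = readCitation(fields["legalAuthority"]);
  const body = nonEmptyString(fields["body"]);
  const missing: string[] = [];
  if (clinicalAuthority === undefined) missing.push("clinicalAuthority");
  if (legalAuthority === undefined) missing.push("legalAuthority");
  if (body === undefined) missing.push("body");
  if (clinicalAuthority === undefined || legalAuthority === undefined || body === undefined) {
    return { ok: false, missing };
  }
  return { ok: true, draft: { clinicalAuthority, legalAuthority, body } };
}
```

A draft that only cites one kind of authority fails validation and can be sent back for another attempt, which is what stops the appeal from sounding like only a lawyer or only a radiologist.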

🚨 Urgency tier calibration. Insulin = emergent. Elective imaging = routine. Easy. But what about a medication for a controlled but progressive condition? What about pain management on day 90? These are clinical ethics questions disguised as engineering problems. We iterated until the tier boundaries were defensible.

🤖 Portal automation edge cases. OpenClaw works beautifully on our test portal. Real insurer portals have CAPTCHAs, session timeouts, and anti-bot measures we'd fight in production. We scoped portal submission to one known-cooperative flow and kept Lob certified mail as the primary channel. Scope discipline over demo flashiness.


🏆 Accomplishments we're proud of

  • 📮 A real certified mail tracking number, live on stage, in under 2 seconds. Not a draft. Not a summary. A filed appeal.
  • 🧠 Dual-source retrieval across 4 clinical guideline bodies + 1 legal precedent database. Every appeal cites both. No other tool does this.
  • 🚦 Urgency triage that actually changes behavior. The agent makes clinical decisions, not just paperwork decisions.
  • 💊 Care impact framing in clinical language, not financial. The medical argument is the strong argument.
  • 🕸️ Six genuinely coordinating agents on Dedalus Machines. Not one LLM pretending to be six. The Escalator already runs a periodic sweep with real endpoints.
  • 🚀 Full portability. bash scripts/setup.sh spins up the entire stack on macOS, Linux, or WSL2. Zero host dependencies.

🎓 What we learned

  1. Legal reasoning is clinical reasoning in disguise. Every strong health insurance appeal argues medicine and law together. We started building a letter drafter and realized we were building a two-headed expert system.

  2. Retrieval quality beats model size. K2 Think V2 is a strong reasoning model. But appeal quality was gated by whether the Researcher pulled the right precedent, not by raw LLM intelligence. The data is the product.

  3. Agent orchestration is 80% plumbing. Once clean event schemas existed, adding a new agent took 2 hours. The first took 8. Invest in infrastructure early and it pays you back every hour after.

  4. Urgency tiers are ethical decisions. Calling a denial "emergent" vs "urgent" changes what the agent does next. We learned to be conservative on boundaries and explicit about the clinical reasoning behind each tier.

  5. Real action beats pretty demos. A Lob tracking number returned live beats any slide deck. Judges stop thinking "cool prototype" and start thinking "this actually works."

  6. Saying no is a feature. We had 9 sponsor tracks available. We picked the ones our core build actually touched. The project is stronger for what we cut.


🚀 What's next for Overturn

| Feature | Why it matters |
| --- | --- |
| 🧬 Clinical trial matching | When care is denied, match patients to active trials via ClinicalTrials.gov |
| 🏥 Medicare Advantage + Medicaid | Highest prior-auth denial volume in the country |
| 🧑‍⚕️ Provider-side flow | Physicians and their staff spend 13 hrs/week on PA paperwork (AMA 2024). We can give it back |
| Longer-horizon escalation | Framework-specific 30/60/180-day deadlines + external review drafting |
| 🗺️ Expanded precedent DB | NY DFS, Maryland IRO, CMS Medicare Appeals Council, specialty guidelines (AHA, ADA, APA) |
| 👪 Caregiver trust model | Spectrum already puts us in iMessage with adult children. Formalize caregiver permissions next |

Every patient deserves a lawyer AND a doctor in their corner.

Overturn is both.


📚 References

[1] Kaiser Family Foundation. Claims Denials and Appeals in ACA Marketplace Plans in 2024. Published March 24, 2026.

https://www.kff.org/patient-consumer-protections/claims-denials-and-appeals-in-aca-marketplace-plans-in-2024/

[2] Kaiser Family Foundation. Medicare Advantage Insurers Made Nearly 53 Million Prior Authorization Determinations in 2024. Published January 28, 2026.

https://www.kff.org/medicare/medicare-advantage-insurers-made-nearly-53-million-prior-authorization-determinations-in-2024/

[3] American Medical Association. 2024 Prior Authorization Physician Survey (1,000 practicing physicians, December 2024).

https://www.ama-assn.org/system/files/prior-authorization-survey.pdf
