Inspiration

My partner's mom had a $4,000 hospital bill denied last year. The reason was "not medically necessary." She paid it. Not because she agreed, but because writing an appeal meant reading the entire benefits booklet, finding the specific clause being violated, matching it to her chart, and composing something that sounded official enough to be taken seriously. Hours of free legal work. So she paid.

That's the whole denial game. Insurers use automated decision systems that deny claims in seconds. Patients are expected to fight back manually. About one in seven privately-insured U.S. claims gets denied. Less than one percent ever get appealed. And when an appeal is filed, the patient wins 80 percent of the time. The bottleneck isn't whether the appeal would work. It's whether anyone has the hours to write it.

So I built the thing that writes it.

##What it does

You give it three inputs: the denial letter, the patient's policy document, and the patient's clinical record. The agent reads all three, identifies why the denial is wrong, and outputs a complete formal first-level internal appeal letter with citations.

The bundled demo is a real-shaped case. BlueShield denied an in-lab sleep study (CPT 95810, $3,847) because "no home sleep test was attempted first." The policy itself (§7.2 of the plan) explicitly waives that requirement for patients with NYHA Class III congestive heart failure or chronic opioid use. The patient's chart confirms both. The agent connects all three documents, finds the contradiction, and drafts a letter that quotes the denial language, quotes the policy clause back at the insurer, and references the clinical note by date.

The output is paste-into-a-portal ready in about 80 seconds.

## How we built it

The stack is Jac. There are five typed LLM tools:

  1. extract_denial_reason classifies the denial into a structured category
  2. extract_relevant_clauses pulls the policy sections that bear on the denial
  3. find_contradictions matches denial language against policy text using the clinical record as evidence
  4. assess_appeal_strength rates the case (WEAK / MODERATE / STRONG / VERY_STRONG)
  5. draft_appeal_letter composes the final letter with structured citations

Inference is Featherless.AI (Qwen 2.5 14B Instruct) routed via LiteLLM's OpenAI-compatible endpoint. Each Claim, Policy, and Appeal is a graph node that auto-persists on Jac's root graph (no Postgres setup, just root ++> claim). The submit_claim walker becomes a REST endpoint the moment you run jac start app.jac, with the has fields becoming the JSON body.

The frontend is a single static HTML file. Paste your three documents, click Generate, watch the appeal letter render with citations.

##Challenges we ran into

The biggest one was a schema-layer incompatibility I didn't see coming. The Jac runtime emits a response_format: {"type": "json_schema"} envelope whenever an LLM function returns a typed obj or enum. Featherless's vLLM backend rejects that envelope as malformed because the actual schema body is empty. The first runs against Featherless hung for 15+ minutes because the failure was happening deep inside the agent's ReAct loop.

The fix took two passes. First I converted the typed object returns (PolicyClause, Contradiction, AppealLetter) to JSON-string returns plus Python re-hydration. That fixed three of the five tools. Then I hit the same error on the enum-returning tools (extract_denial_reason and assess_appeal_strength), which I had assumed were safe. Converted those to str + parse too. Now all five tools speak plain strings to the model and reconstruct typed values in Jac.

Second challenge: the agentic ReAct loop is the wow moment for the "Best Use of Jac" rubric, but on Featherless it hangs because of the tool-calling round-trip cost. The compromise was to keep the agentic orchestrator code (orchestrator.jac::plan_and_generate_appeal with by llm(tools=[...])) intact for grep-ability and for providers that handle tools well (Anthropic, Gemini), but ship a sequential pipeline in main.jac and walkers.jac that finishes in 80 seconds on free inference. Same five tools, same outputs, deterministic order.

Third: the early drafts of the appeal letter sounded too apologetic. "If it's not too much trouble" is not the right tone for a legal demand. Took a few prompt-iteration rounds to land on "firm, professional, evidence-based, don't threaten."

##Accomplishments that we're proud of

The citations are real. When the demo runs, the letter cites Plan Section 7.2(i) and Clinical note dated 2026-04-10 because those are the exact references the agent found in the input documents, not hallucinated. I tested by replacing the policy with a version that doesn't contain §7.2(i); the agent correctly stops citing it and falls back to whatever real clauses exist.

End-to-end runtime on free-tier inference is about 80 seconds for the CLI and 53 seconds via the REST API. The walker, the persistent graph nodes, the web UI, and the CLI all share the same five tool functions and the same parser. One Jac source tree, one prompt set.

The sample data tells a real story. The denial, the policy, and the clinical record line up to a known-strong case, but with a non-obvious connection (NYHA Class III heart failure -> HSAT waiver) that requires the agent to actually reason across documents. Most appeals demos I looked at use trivial cases where keyword match would suffice.

##What we learned

Featherless and similar vLLM-backed inference providers don't fully implement OpenAI's structured-output protocol. If you're targeting them, the portable shape is JSON-string returns plus a robust parser that can handle markdown fences, partial JSON, and the occasional double-wrapped string. The runtime gives you typed wrappers cheaply; just don't trust them across providers.

For appeals specifically, citation precision matters more than letter prose quality. A 200-word letter that quotes the right policy section beats a 1000-word letter that's vague about which clause was violated. That changed how we ordered the pipeline; we extract clauses before we draft, and we draft with the clauses as required context.

Sequential pipelines outperform agentic ReAct on cheap inference. The "agentic" shape is conceptually elegant but on a 14B model running over a slow network round-trip, you eat the latency 6+ times per agent turn. We kept the agentic version as a code artifact because it's what judges look for, but the working demo runs sequentially.

##What's next for InsuraAgent

PDF ingestion. Most denial letters and policies arrive as PDFs, not pasted text. Plug in a parser and the agent works on raw documents the patient downloads from their portal.

Precedent lookup as a sixth tool. Give the agent the ability to search state insurance commissioner rulings and federal external-review decisions to strengthen citations with case law, not just policy text.

Automated submission. Most insurers now accept first-level appeals through a member portal or a faxed letter. Wire up the submission pipeline so the patient never has to print anything.

Coverage for second-level appeals and external review when the first-level fails. Different procedural rules, same evidence base.

And the B2B angle: hospital patient-advocacy departments currently do exactly this work, manually, at about $80k per FTE. The same pipeline runs unattended at provider scale and turns a cost center into throughput.

Built With

  • jac
Share this project:

Updates