Inspiration

ClaimPilot is grounded in field research, not a hackathon brainstorm.

As an MDes candidate at San José State University working on a thesis about Medicaid billing transparency, I interviewed five expert informants across the Medicaid claims pipeline:

  • A Gainwell MMIS specialist who builds state Medicaid systems
  • A CMS quality officer working on regulatory frameworks
  • A Medi-Cal engineering director leading state FHIR adoption
  • A community managed care CEO focused on health equity
  • A frontline optometrist with 30 years of practice

The optometrist said this:

"If we don't have time to have staff check, we are going to lose money. There is a time limit, six months. So if you dispute after, then it's too late."

That's the failure pattern. Roughly 15% of Medicaid claims are denied or rejected. Another 5% are paid less than billed and slip through unnoticed. The dispute window closes whether the clinic catches the error or not. Community health clinics serving dual-enrolled Medicare and Medi-Cal patients lose millions every year to denials they don't have time to fight.

The strategic insight from this research: state Medicaid claims systems (MMIS) are operationally impossible to replace. Innovation can only enter at the FHIR exposure layer that states are standing up now. That insight became ClaimPilot.

What it does

ClaimPilot is a FHIR-aware MCP server with 7 specialized tools that help billing coordinators at community health clinics resolve Medicaid denials and detect silent underpayments.

The 7 tools:

  1. triage_denial_code — rule-based denial code classifier (CO-22, OA-23, and others)
  2. analyze_claim_denial — FHIR-grounded denial analysis with reasoning chain
  3. detect_cob_sequencing_error — coordination-of-benefits error detection for dual enrollees, citing 42 CFR 433.139
  4. generate_appeal_letter — formal payer appeal with real regulatory citations
  5. generate_resubmission_checklist — owner-attributed action plan for biller, payer, and provider
  6. explain_denial_to_patient — patient-facing letter in English, Spanish, Vietnamese, Tagalog, or Chinese at a sixth-grade reading level
  7. detect_underpayment_risk — variance analysis to catch silent underpayments before the dispute window closes
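
As a rough illustration of the rule-based step in triage_denial_code, the sketch below maps a denial code to a category with a plain lookup table. The category names, the nextStep strings, and the function and type names are assumptions made for illustration, not ClaimPilot's actual rule set. Only the two codes named above are included.

```typescript
// Hypothetical sketch of a rule-based denial code triage.
// Category names and next steps are illustrative assumptions.
type DenialCategory = "coordination_of_benefits" | "unclassified";

interface TriageResult {
  code: string; // normalized form, e.g. "CO-22"
  category: DenialCategory;
  nextStep: string;
}

// Adjustment reason codes 22 and 23 both concern another payer's involvement,
// so this sketch routes both to the coordination-of-benefits category.
const RULES = new Map<string, { category: DenialCategory; nextStep: string }>([
  ["CO-22", { category: "coordination_of_benefits", nextStep: "Verify payer order on the claim's coverages" }],
  ["OA-23", { category: "coordination_of_benefits", nextStep: "Attach the prior payer's adjudication" }],
]);

function triageDenialCode(raw: string): TriageResult {
  // Accept "co-22", "CO 22", or "CO22" and normalize to "CO-22".
  const cleaned = raw.trim().toUpperCase();
  const match = /^([A-Z]{2})[\s-]?(\d{1,3})$/.exec(cleaned);
  const code = match ? `${match[1]}-${match[2]}` : cleaned;

  const rule = RULES.get(code);
  if (!rule) {
    // Unknown codes are never guessed at; they fall through to a human.
    return { code, category: "unclassified", nextStep: "Route to manual review" };
  }
  return { code, category: rule.category, nextStep: rule.nextStep };
}
```

Keeping this step as a table rather than a model call means the first classification is deterministic and easy to audit. Anything the table does not cover is labeled unclassified instead of being inferred.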

What makes it different from a generic LLM:

  • Reads real FHIR resources (Coverage, Claim, ExplanationOfBenefit, Patient) rather than flat summary text
  • Cites specific resource IDs as evidence and refuses to invent data
  • Generates appeal letters that cite real federal regulations: 42 CFR 433.139, 42 CFR Part 411, and 42 CFR 424.44
  • Computes regulatory deadlines from FHIR timestamps
  • Patient communication in the five most common Medi-Cal application languages

How I built it

Stack

  • TypeScript on Node.js
  • Model Context Protocol (MCP) server framework
  • FHIR R4 with synthetic patient bundles
  • Anthropic API (Claude) for AI reasoning
  • Railway for production deployment
  • Postgres for state

Architecture

ClaimPilot operates at the FHIR exposure layer between billing software and state Medicaid endpoints. Each tool accepts a claimId and resolves the relevant FHIR context internally (the SHARP pattern). The AI reasons over the actual FHIR resources, not over flat summary text. This is what makes the citations defensible and guards against hallucination.
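
To make the claimId-to-context step concrete, here is a minimal sketch of resolving a Claim and its related resources from a FHIR R4 bundle. The type shapes are trimmed to the few fields the sketch reads, and the names resolveClaimContext and ClaimContext are hypothetical. It also assumes relative references such as "Patient/p1"; absolute URLs and urn:uuid references are not handled.

```typescript
// Minimal FHIR R4 shapes, trimmed to the fields this sketch reads.
interface Reference { reference?: string }
interface Patient { resourceType: "Patient"; id: string }
interface Coverage { resourceType: "Coverage"; id: string }
interface Claim {
  resourceType: "Claim";
  id: string;
  patient: Reference;
  insurance: { sequence: number; focal: boolean; coverage: Reference }[];
}
interface ExplanationOfBenefit { resourceType: "ExplanationOfBenefit"; id: string; claim?: Reference }
type Resource = Patient | Coverage | Claim | ExplanationOfBenefit;
interface Bundle { resourceType: "Bundle"; entry?: { resource?: Resource }[] }

// Hypothetical result shape.
interface ClaimContext {
  claim: Claim;
  patient: Patient | undefined;
  coverages: Coverage[]; // ordered by Claim.insurance.sequence
  eobs: ExplanationOfBenefit[];
  missing: string[]; // references that could not be resolved
}

function resolveClaimContext(bundle: Bundle, claimId: string): ClaimContext | null {
  const resources = (bundle.entry ?? [])
    .map((e) => e.resource)
    .filter((r): r is Resource => r !== undefined);

  // Index every resource by its relative reference, e.g. "Coverage/abc".
  const byRef = new Map<string, Resource>();
  for (const r of resources) byRef.set(`${r.resourceType}/${r.id}`, r);

  const claim = byRef.get(`Claim/${claimId}`);
  if (!claim || claim.resourceType !== "Claim") return null;

  // Record unresolved references instead of inventing data for them.
  const missing: string[] = [];
  const lookup = (ref: Reference, label: string): Resource | undefined => {
    const found = ref.reference ? byRef.get(ref.reference) : undefined;
    if (!found) missing.push(`${label}: ${ref.reference ?? "(no reference)"}`);
    return found;
  };

  const patientRes = lookup(claim.patient, "Claim.patient");
  const coverages: Coverage[] = [];
  const ordered = [...claim.insurance].sort((a, b) => a.sequence - b.sequence);
  for (const ins of ordered) {
    const cov = lookup(ins.coverage, `Claim.insurance[${ins.sequence}].coverage`);
    if (cov && cov.resourceType === "Coverage") coverages.push(cov);
  }

  const eobs = resources.filter(
    (r): r is ExplanationOfBenefit =>
      r.resourceType === "ExplanationOfBenefit" && r.claim?.reference === `Claim/${claimId}`,
  );

  return {
    claim,
    patient: patientRes && patientRes.resourceType === "Patient" ? patientRes : undefined,
    coverages,
    eobs,
    missing,
  };
}
```

Returning a missing list alongside the resolved resources lets a downstream prompt state which evidence is absent rather than fill the gap.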

I built six synthetic patient bundles for the demo, modeling real failure patterns from my research:

  • Maria Gonzalez (dual COB sequencing error, demo flagship)
  • Jerome Washington (diagnosis mismatch denial)
  • Anh Nguyen (clerical error denial)
  • Darius Okafor (pre-denial review case)
  • Rosa Delgado (second COB sequencing case)
  • Carlos Reyes (silent underpayment, 13% variance from billed amount)
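
The variance in the Carlos Reyes case is the share of the billed amount that went unpaid. The sketch below shows that arithmetic under stated assumptions: the 10% flagging threshold, the dollar amounts, and the name checkUnderpayment are all hypothetical, chosen only so the example lands on a 13% variance.

```typescript
// Hypothetical threshold: flag claims paid more than 10% below the billed amount.
const VARIANCE_THRESHOLD = 0.1;

interface UnderpaymentCheck {
  variance: number; // fraction of the billed amount that went unpaid
  flagged: boolean;
}

function checkUnderpayment(
  billed: number,
  paid: number,
  threshold: number = VARIANCE_THRESHOLD,
): UnderpaymentCheck {
  // Reject NaN and nonsensical amounts rather than computing a variance from them.
  if (!(billed > 0)) throw new RangeError("billed amount must be positive");
  if (!(paid >= 0)) throw new RangeError("paid amount must be non-negative");

  const variance = (billed - paid) / billed;
  return { variance, flagged: variance > threshold };
}

// Illustrative amounts only: $200.00 billed, $174.00 paid.
const example = checkUnderpayment(200, 174);
console.log(`${(example.variance * 100).toFixed(1)}% variance, flagged: ${example.flagged}`);
// prints "13.0% variance, flagged: true"
```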

The system handles missing or partial FHIR data, computes regulatory deadlines from Coverage.period.start and similar timestamps, and produces structured output that downstream agents can consume.
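
One way to compute a dispute deadline from a FHIR timestamp is sketched below. It assumes the six-month window the optometrist described; real limits vary by payer and rule, so windowMonths is a parameter. The function names and the choice of anchor timestamp are hypothetical.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Add calendar months in UTC, clamping to the last day of the target month
// (Aug 31 plus six months lands on Feb 28 or 29, not in March).
// Any time-of-day in the input is dropped: the deadline is a calendar date.
function addMonthsUtc(isoDate: string, months: number): Date {
  const start = new Date(isoDate);
  if (Number.isNaN(start.getTime())) throw new RangeError(`Invalid date: ${isoDate}`);

  const target = new Date(Date.UTC(start.getUTCFullYear(), start.getUTCMonth() + months, 1));
  const lastDay = new Date(
    Date.UTC(target.getUTCFullYear(), target.getUTCMonth() + 1, 0),
  ).getUTCDate();
  target.setUTCDate(Math.min(start.getUTCDate(), lastDay));
  return target;
}

interface DisputeWindow {
  deadline: string; // YYYY-MM-DD
  daysRemaining: number; // negative once the window has closed
  expired: boolean;
}

// anchorIso is whichever FHIR timestamp starts the clock for the rule in question.
function disputeWindow(anchorIso: string, now: Date, windowMonths: number = 6): DisputeWindow {
  const deadline = addMonthsUtc(anchorIso, windowMonths);
  const daysRemaining = Math.floor((deadline.getTime() - now.getTime()) / MS_PER_DAY);
  return {
    deadline: deadline.toISOString().slice(0, 10),
    daysRemaining,
    expired: daysRemaining < 0,
  };
}
```

Passing now in as an argument, rather than reading the clock inside the function, keeps the calculation reproducible in tests and in audit logs.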

Challenges I ran into

FHIR bundle imports. Prompt Opinion's FHIR server reassigns resource IDs on import, which broke my initial assumption that claimId would route to the right bundle. The solution was a hybrid approach: precomputed synthetic bundles, with a live fetch as the fallback.
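
The hybrid lookup can be pictured roughly as follows. This is a sketch under assumptions, not the deployed code: the LiveFetcher signature, the loadBundle name, and the decision to treat a failed fetch as "no data" are all hypothetical.

```typescript
// Only the fields this sketch touches; a real Bundle carries much more.
interface Bundle { resourceType: "Bundle"; entry?: unknown[] }

// Hypothetical fetcher signature: resolves to null when the server has no match.
type LiveFetcher = (claimId: string) => Promise<Bundle | null>;

interface LoadedBundle {
  bundle: Bundle;
  source: "precomputed" | "live";
}

async function loadBundle(
  claimId: string,
  precomputed: ReadonlyMap<string, Bundle>,
  fetchLive: LiveFetcher,
): Promise<LoadedBundle | null> {
  // 1. Precomputed synthetic bundles are keyed by the claimId the tools expect,
  //    so they are unaffected by IDs the server reassigns on import.
  const cached = precomputed.get(claimId);
  if (cached) return { bundle: cached, source: "precomputed" };

  // 2. Otherwise fall back to a live fetch.
  try {
    const live = await fetchLive(claimId);
    return live ? { bundle: live, source: "live" } : null;
  } catch (_err) {
    // A failed fetch is reported as missing data, never papered over.
    return null;
  }
}
```

Tagging each result with its source lets the caller say in its output whether the evidence came from a precomputed bundle or from the live server.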

Tool schema design for free-tier LLMs. The default agent runtime had trouble serializing nested objects between chained tool calls. I refactored all 7 tools to accept claimId-only inputs and resolve nested FHIR context internally. This made the agent reliable.
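
A claimId-only input might look like the sketch below: a flat JSON Schema with a single string property, plus a runtime guard that rejects anything nested. The description strings and the parseClaimIdInput helper are assumptions for illustration, and the tool definition is written as a plain object rather than through any particular SDK call.

```typescript
// JSON Schema for a claimId-only tool input: one flat string, nothing nested.
const claimIdInputSchema = {
  type: "object",
  properties: {
    claimId: { type: "string", description: "ID of the FHIR Claim to analyze" },
  },
  required: ["claimId"],
  additionalProperties: false,
} as const;

// Hypothetical tool definition as a plain object (name, description, inputSchema).
const analyzeClaimDenialTool = {
  name: "analyze_claim_denial",
  description: "FHIR-grounded denial analysis for one claim",
  inputSchema: claimIdInputSchema,
};

// Runtime guard mirroring the schema, for runtimes that do not validate inputs.
function parseClaimIdInput(args: unknown): { claimId: string } {
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    throw new TypeError("arguments must be an object");
  }
  const record = args as Record<string, unknown>;

  const extra = Object.keys(record).filter((k) => k !== "claimId");
  if (extra.length > 0) throw new TypeError(`unexpected fields: ${extra.join(", ")}`);

  const claimId = record["claimId"];
  if (typeof claimId !== "string" || claimId.trim() === "") {
    throw new TypeError("claimId must be a non-empty string");
  }
  return { claimId: claimId.trim() };
}
```

With every tool sharing this input shape, chained calls only need to pass one string forward, which is the part a constrained runtime can serialize reliably.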

Regulatory accuracy without legal training. Citing 42 CFR 433.139 by name is one thing. Making sure the regulation actually applies to the specific FHIR claim pattern is harder. I cross-referenced every citation against the actual regulatory text and validated them with my CMS quality officer informant.

Translation at a controlled reading level. Generating Spanish, Vietnamese, Tagalog, and Chinese patient letters at a sixth-grade reading level required iterating on prompts to keep medical and billing terms accessible without losing accuracy or warmth.

Hackathon timeline vs. thesis depth. I wanted to ship something substantial in a week while keeping it defensible against eventual thesis review.

Accomplishments that I'm proud of

  • It actually works in production. ClaimPilot is live in the Prompt Opinion Marketplace and processes real tool calls end to end.
  • FHIR-grounded reasoning, not buzzword FHIR. The tools read actual FHIR resources, cite their IDs, and refuse to invent data.
  • Real regulatory citations. 42 CFR 433.139, 42 CFR Part 411, and 42 CFR 424.44 are not placeholders. The appeal letters cite them correctly because the AI is grounded in the actual claim's FHIR context.
  • Multilingual patient communication. Patient explanation letters in five languages at a sixth-grade reading level are a serious equity feature for the populations that Medicaid serves.
  • Research grounding. Every product decision traces back to something a specific informant said about how Medicaid billing actually works in practice.

What I learned

  • FHIR R4 in depth. Coverage sequencing, EOB adjustment codes, Provenance, and the semantics of dual enrollment.
  • MCP server architecture. Tool design patterns, context propagation, and the SHARP pattern for FHIR.
  • The reality of state Medicaid systems. MMIS is much more entrenched than the policy literature suggests. That is exactly why the FHIR exposure layer matters so much for any innovation to land.
  • Constraints of agent runtimes. Schema design has to account for what the runtime can actually serialize, not just what is semantically clean.

What's next for ClaimPilot

  • Production integration with state Medicaid FHIR endpoints (California, New York, Massachusetts leading the way)
  • Batch claim analysis across patient panels for proactive denial detection
  • Calibrated confidence scoring trained on real appeal outcomes
  • FHIR Provenance audit trails for HIPAA-grade deployment
  • Pilot partnerships with FQHCs and community health managed care plans

The thesis work continues. ClaimPilot is the speculative intervention that emerged from the research, but the research itself argues for a broader rethinking of how billing transparency could work for the patients most likely to fall through the system's cracks.
