Sentinel: Multi-Agent Fraud Investigation System

Inspiration

I got into this because the numbers didn't make sense to me. Medicare loses over $100 billion every year to fraud — that's not a rounding error, that's a systemic failure. The money that gets stolen is money that should be funding actual patient care. That bothered me enough to want to do something about it.

What bothered me even more was learning how investigators work today. They get a risk score from a system, and then they're on their own. No explanation, no evidence, no suggested next steps — just a number and a list of providers to manually dig through. Most flagged cases never get investigated at all because there simply aren't enough hours. I wanted to build something that did the hard part for them.

What It Does

Sentinel is a multi-agent fraud investigation system built on Jac. Five specialized agents walk a knowledge graph of Medicare claims data, each looking for a different fraud signal:

Billing Agent — flags providers billing significantly above peer average in volume or dollar amount
Collusion Agent — maps physician networks shared across multiple providers to detect coordinated fraud rings
Patient Agent — identifies beneficiaries appearing across too many providers, or claims filed after a patient's recorded death
Temporal Agent — catches impossible timelines like overlapping inpatient stays
Synthesis Agent — requires corroborating evidence from multiple agents before escalating to HIGH risk, driving precision to 98%

The output isn't a score. It's a complete case file: cited evidence, risk verdict, recommended action, and an AI assistant that can answer questions about any specific provider.

How I Built It

The Knowledge Graph

The core of Sentinel is a Jac knowledge graph with four node types: Provider, Patient, Claim, and Physician. Edges encode real billing relationships:

root → Provider → Claim → Physician
Patient → Claim

Each agent is a Jac walker that traverses a different path through this graph. The Billing walker moves through Provider → Claim edges. The Collusion walker follows Claim → Physician edges looking for physicians shared across multiple provider networks.
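For readers who don't know Jac, the same schema can be approximated in plain Python. This is only a sketch: the real graph is built from Jac nodes and edges, and every class, field, and ID below is an assumption made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    # Field names are hypothetical; a claim links one provider, patient and physician.
    claim_id: str
    provider_id: str
    patient_id: str
    physician_id: str
    amount: float


class ClaimGraph:
    """Adjacency maps mirroring Provider -> Claim -> Physician and Patient -> Claim."""

    def __init__(self):
        self.claims_by_provider = defaultdict(list)
        self.claims_by_patient = defaultdict(list)

    def add_claim(self, claim):
        self.claims_by_provider[claim.provider_id].append(claim)
        self.claims_by_patient[claim.patient_id].append(claim)

    def physicians_of(self, provider_id):
        # Follow Provider -> Claim -> Physician, the path the Collusion walker takes.
        return {c.physician_id for c in self.claims_by_provider[provider_id]}


graph = ClaimGraph()
graph.add_claim(Claim("C1", "PRV1", "BENE1", "PHY1", 1200.0))
graph.add_claim(Claim("C2", "PRV1", "BENE2", "PHY2", 800.0))
graph.add_claim(Claim("C3", "PRV2", "BENE1", "PHY1", 500.0))
```

Here `graph.physicians_of("PRV1")` returns both physicians on that provider's claims, and `graph.claims_by_patient["BENE1"]` holds the two claims filed for that beneficiary.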

Peer-Average Anomaly Detection

The Billing Agent flags providers whose claim volume or reimbursement amount deviates significantly from the peer group. For each provider, I compute two ratios:

$$r_{claims} = \frac{C_i}{\bar{C}}, \quad r_{amount} = \frac{A_i}{\bar{A}}$$

where $C_i$ is the provider's total claim count, $\bar{C}$ is the peer average, $A_i$ is total reimbursed amount, and $\bar{A}$ is the peer average amount. The Billing Agent rates a provider HIGH when:

$$\max(r_{claims}, \ r_{amount}) > 3.0$$

and MEDIUM when that maximum exceeds 1.5 but not 3.0. This peer-relative approach means the threshold adapts to the dataset: a provider billing 3x the average is suspicious whether the average is $10,000 or $10,000,000.
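The ratio test can be sketched in a few lines of Python. This is an illustration, not the Jac walker itself: the function name, the `LOW` level for everything under 1.5, and the choice to include each provider in its own peer average are all assumptions. It also assumes a non-empty input with positive averages.

```python
from statistics import mean


def billing_risk(totals, high=3.0, medium=1.5):
    """Map provider -> (claim_count, amount) to a peer-relative risk level."""
    avg_claims = mean(count for count, _ in totals.values())
    avg_amount = mean(amount for _, amount in totals.values())
    levels = {}
    for provider, (count, amount) in totals.items():
        # Take the larger of the two peer ratios, as in the max(...) rule.
        ratio = max(count / avg_claims, amount / avg_amount)
        if ratio > high:
            levels[provider] = "HIGH"
        elif ratio > medium:
            levels[provider] = "MEDIUM"
        else:
            levels[provider] = "LOW"  # assumed label for unflagged providers
    return levels


# Hypothetical data: eight ordinary providers plus two outliers.
totals = {f"P{i}": (10, 1000.0) for i in range(8)}
totals["PMED"] = (10, 2000.0)
totals["PHIGH"] = (100, 1000.0)
levels = billing_risk(totals)
```

With these numbers the peer averages are 19 claims and $1,100, so `PHIGH` has a claim ratio of about 5.3 and `PMED` has an amount ratio of about 1.8.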

Collusion Ring Detection

The Collusion Agent builds a bipartite physician-provider graph during traversal. A physician $p$ is flagged when:

$$|\mathrm{providers}(p)| \geq 2 \quad \text{and} \quad \mathrm{total\_amount}(p) > \theta$$

where $\theta$ is set dynamically to the average total amount across all shared physicians. This catches coordinated networks where the same physician appears on claims from multiple providers, which is a strong signal of organized fraud.
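A minimal Python sketch of this rule follows. It assumes that the threshold is the mean of the total claim amount over every physician who appears under two or more providers, which is one reading of the description. The function name, tuple layout, and IDs are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def flag_shared_physicians(claims):
    """claims: iterable of (provider_id, physician_id, amount) tuples."""
    providers = defaultdict(set)
    totals = defaultdict(float)
    for provider_id, physician_id, amount in claims:
        providers[physician_id].add(provider_id)
        totals[physician_id] += amount

    # Physicians that appear on claims from at least two providers.
    shared = [p for p, provs in providers.items() if len(provs) >= 2]
    if not shared:
        return set()
    theta = mean(totals[p] for p in shared)  # assumed definition of the threshold
    return {p for p in shared if totals[p] > theta}


claims = [
    ("PRV1", "PHY1", 5000.0),
    ("PRV2", "PHY1", 7000.0),
    ("PRV1", "PHY2", 1000.0),
    ("PRV3", "PHY2", 1000.0),
    ("PRV2", "PHY3", 50000.0),  # large amount, but only one provider
]
flagged = flag_shared_physicians(claims)
```

Under this reading a lone shared physician can never exceed the threshold, because the threshold equals that physician's own total. A real implementation may handle that case differently.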

Synthesis Corroboration

The Synthesis Agent requires convergent evidence before escalating to HIGH risk:

$$risk = HIGH \iff (billing_{HIGH} \land corroborated) \lor collusion_{HIGH} \lor temporal_{HIGH}$$

where corroborated means at least one other agent also flagged the provider. A billing signal with no corroboration only reaches MEDIUM. This corroboration requirement is what drives precision to 98% on the ground truth labels.
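The formula translates almost word for word into a boolean function. The sketch below is a literal transcription under assumptions: each agent reports `"HIGH"`, `"MEDIUM"`, or `None` per provider, the Patient Agent counts as a corroborating source, and any flagged provider that does not reach HIGH falls back to MEDIUM.

```python
def synthesize(billing, collusion, temporal, patient):
    """Each argument is one agent's level for a provider: 'HIGH', 'MEDIUM' or None."""
    flagged = sum(level is not None for level in (billing, collusion, temporal, patient))
    # Corroborated: billing flagged the provider and so did at least one other agent.
    corroborated = billing is not None and flagged >= 2
    if (billing == "HIGH" and corroborated) or collusion == "HIGH" or temporal == "HIGH":
        return "HIGH"
    return "MEDIUM" if flagged else "LOW"  # assumed fallback levels
```

For example, `synthesize("HIGH", None, None, None)` stays at MEDIUM, while adding any second flag, as in `synthesize("HIGH", "MEDIUM", None, None)`, escalates to HIGH.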

Data Pipeline

The data comes from the real CMS Medicare Provider Fraud Detection dataset — over 500,000 claims across thousands of providers, with ground truth fraud labels. I built a Python ingestion layer to parse the CSVs and prepare graph-ready data structures for the Jac pipeline.
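As a rough idea of what the ingestion step involves, the sketch below aggregates per-provider totals from a claims CSV. The column names and sample rows are assumptions about the file layout, not taken from the project, and an in-memory string stands in for the real file.

```python
import csv
import io
from collections import defaultdict

# Column names and rows are hypothetical stand-ins for the real claims files.
SAMPLE = """Provider,BeneID,ClaimID,AttendingPhysician,InscClaimAmtReimbursed
PRV1,BENE1,CLM1,PHY1,1200
PRV1,BENE2,CLM2,PHY2,800
PRV2,BENE1,CLM3,PHY1,500
"""


def load_provider_totals(handle):
    """Aggregate claim count and reimbursed amount per provider."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in csv.DictReader(handle):
        entry = totals[row["Provider"]]
        entry[0] += 1
        entry[1] += float(row["InscClaimAmtReimbursed"])
    return {provider: tuple(entry) for provider, entry in totals.items()}


provider_totals = load_provider_totals(io.StringIO(SAMPLE))
```

The resulting dictionary maps each provider to a `(claim_count, amount)` pair, which is the shape a peer-average comparison needs.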

The frontend is React with Framer Motion and D3.js for the force-directed investigation graphs. The server is Express with SSE streaming so the UI updates in real time as each agent finishes. I also built a live upload feature — drop in your own Beneficiary, Inpatient, and Outpatient CSVs and the full pipeline runs on your data.
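The wire format behind that streaming is simple enough to show directly. The real server is Express, so the Python below only illustrates what one Server-Sent Events frame looks like. The event name and payload fields are made up for the example.

```python
import json


def sse_event(event, payload):
    """Format one Server-Sent Events frame: event line, data line, blank line."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


# Hypothetical progress message sent when one agent finishes.
frame = sse_event("agent_done", {"agent": "billing", "flagged": 12})
```

The blank line at the end is what tells the browser's `EventSource` that the event is complete, so each agent's result can be pushed as soon as it is ready.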

Accomplishments

98% precision with zero label leakage. Ground-truth fraud labels are never read during detection. Agents detect fraud purely from behavioral signals like billing ratios, physician network overlaps, patient patterns, and timeline impossibilities. Labels are used only in the validation page to measure accuracy after the fact: 39 of 40 HIGH risk flags were confirmed as real fraud, which is 97.5% and rounds to the 98% quoted here.
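The precision figure is confirmed flags divided by all flags, computed only after detection has finished. A small sketch, with hypothetical provider IDs arranged to reproduce the 39-of-40 outcome:

```python
def precision(flagged, fraud_labels):
    """Share of flagged providers whose ground-truth label is fraud."""
    true_positives = sum(1 for provider in flagged if fraud_labels.get(provider, False))
    return true_positives / len(flagged)


# Hypothetical IDs: 40 HIGH risk flags, one of which is not labeled as fraud.
flagged = [f"PRV{i}" for i in range(40)]
fraud_labels = {provider: True for provider in flagged}
fraud_labels["PRV0"] = False
score = precision(flagged, fraud_labels)
```

Keeping the labels as an argument to the scoring function, rather than anything the detection code can reach, is one simple way to keep the two stages separate.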

Five coordinated graph walkers on one knowledge graph. Each agent traverses a different path through the same Jac graph: Billing walks Provider → Claim, Collusion walks Provider → Claim → Physician, Patient walks Patient → Claim → Provider, and Temporal walks Claim → Date dimension. The Synthesis Agent then applies the rule described under Synthesis Corroboration: a HIGH billing signal escalates to HIGH risk only when at least one other agent also flagged the provider. This requirement is what drives precision to 98%.

Physically impossible timelines caught. The Temporal Agent found patients admitted to two different hospitals on overlapping dates, a physical impossibility that means at least one of the claims cannot be accurate as filed. Patient BENE11494 was simultaneously at PRV51501 (Sept 5-15) and PRV51590 (Sept 9-10). A risk score built on aggregate statistics does not surface this. Graph traversal with date arithmetic does.
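The date arithmetic is a standard interval-overlap test. The sketch below runs it on the BENE11494 stays. The write-up gives no year, so the year here is an arbitrary placeholder, and the function name and tuple layout are assumptions.

```python
from datetime import date
from itertools import combinations

YEAR = 2000  # placeholder only: the source gives months and days but no year


def overlapping_stays(stays):
    """stays: list of (provider_id, admit, discharge). Returns conflicting pairs."""
    conflicts = []
    for (prov_a, start_a, end_a), (prov_b, start_b, end_b) in combinations(stays, 2):
        # Two closed intervals overlap when each one starts before the other ends.
        if prov_a != prov_b and start_a <= end_b and start_b <= end_a:
            conflicts.append((prov_a, prov_b))
    return conflicts


stays = [
    ("PRV51501", date(YEAR, 9, 5), date(YEAR, 9, 15)),
    ("PRV51590", date(YEAR, 9, 9), date(YEAR, 9, 10)),
]
conflicts = overlapping_stays(stays)
```

This version treats a shared boundary day as an overlap. A same-day transfer between hospitals would trip it, so a production check would probably need a strict comparison or an explicit transfer exception.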

Real-time investigation assistant. Every case file has an AI assistant that knows the full investigation context. Ask "what should I investigate first?" and it responds with specific physician IDs, patient IDs, dollar amounts, and recommended next steps: not generic advice but cited evidence from the actual case.

Live upload pipeline. Drop your own Medicare CSVs — Beneficiary, Inpatient, Outpatient — and the full 5-agent pipeline runs on your data with SSE progress streaming. The system is not locked to one dataset. It investigates whatever claims data you give it.

$50.6M in estimated fraud exposure identified across 200 CMS Medicare providers, with 40 HIGH risk case files, 8 physician collusion rings, and 4 impossible timeline violations. All of it comes from agents walking a knowledge graph, with thresholds measured relative to peers rather than fixed dollar cutoffs.

Built solo in 4 days. One person, five agents, one knowledge graph, one interactive dashboard. The entire detection pipeline — graph schema, walkers, byllm reasoning, synthesis corroboration — is native Jac.

Challenges

The biggest technical challenge was performance. The Jac pipeline running on the full dataset took over 10 minutes, which was unusable for a live demo. I profiled the bottleneck down to synchronous flush calls through subprocess pipes: 1,380 print statements, each blocking for ~300ms, which adds up to roughly 414 seconds (about 7 minutes) of waiting. Removing unnecessary flush calls and batching output brought total processing time from 10 minutes to under 2 minutes.
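The effect of batching is easy to see by counting flushes. In this sketch an in-memory stream stands in for the subprocess pipe, and the batch size of 100 is an assumption. Only the 1,380 line count comes from the profiling above.

```python
import io


class CountingStream(io.StringIO):
    """In-memory stream that counts flush() calls, standing in for a subprocess pipe."""

    def __init__(self):
        super().__init__()
        self.flushes = 0

    def flush(self):
        self.flushes += 1
        super().flush()


def emit_unbatched(lines, stream):
    for line in lines:
        print(line, file=stream, flush=True)  # one blocking flush per line


def emit_batched(lines, stream, batch_size=100):
    for start in range(0, len(lines), batch_size):
        stream.write("\n".join(lines[start:start + batch_size]) + "\n")
        stream.flush()  # one flush per batch


lines = [f"progress {i}" for i in range(1380)]
slow, fast = CountingStream(), CountingStream()
emit_unbatched(lines, slow)
emit_batched(lines, fast)
```

Both streams end up with identical contents, but the unbatched path flushes 1,380 times and the batched path flushes 14 times. If each flush costs hundreds of milliseconds on a real pipe, that difference dominates the runtime.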

The harder challenge was making the results trustworthy rather than just impressive. Early versions of the synthesis agent were reading ground truth fraud labels during detection — essentially cheating. I stripped all label access from the detection pipeline and rebuilt the synthesis logic to rely purely on behavioral signals. The numbers got smaller (from 96 HIGH risk to 40), but every flag is now genuinely earned by detected evidence, not by reading the answer key.

What I Learned

Jac's walker paradigm genuinely changes how you think about graph problems. In Python you reach for dictionaries and adjacency lists and bolt on a separate agent framework. In Jac, the walker is the agent and the graph is the execution model, so there is no separate framework to bolt on. For a problem like fraud detection, which is fundamentally about relationships between entities, that's a real fit, not just a language choice.

The biggest thing I took away is that the hardest part of building a detection system isn't the detection — it's making the output actionable. The Sentinel AI assistant, the case files, the exportable reports — those came from realizing that a 98% precise model is worthless if the investigator can't figure out what to do with the result.

Built With

cms-medicare-dataset
d3.js
express.js
framermotion
jac
multer
node.js
openaigpt-4o-mini
python
react

Updates

yash patil started this project — May 18, 2026 07:38 PM EDT
