Inspiration
Fraud teams face an impossible choice. Rule-based systems can be explained to a regulator but go stale while fraud keeps moving. Modern AI adapts but can't prove why it flagged anything, so nobody signs their name to it. We built Filum, Latin for thread, to close that gap.
What it does
Filum is a fraud triage desk. The analyst opens a ranked queue, and each case explains itself: the evidence in plain English, a defensible risk score, and a visual built for that fraud type (a network graph for circular flows, a threshold histogram for structuring, a timeline for dormant bursts). One of three clicks decides the case: Escalate, Flag for Review, or Dismiss. The case then closes as a signed report. Every click retunes the detector that caught the case, while Agent 0 studies the transactions no detector explains and builds new detectors when it finds new patterns. We didn't write those rules. It did.
How we built it
Detection is deterministic and auditable: threshold-band clustering for structuring, cycle detection on the transaction graph for circular flows, shared-counterparty analysis for mule networks. Each detector outputs a risk score and a human-readable reason. Agents sit on top and never replace the core: they retune parameters from analyst decisions and propose new detectors, but never decide what's suspicious. All agents share memory, and every recall prints in the on-screen feed. Frontend: React, with React Flow for network graphs and Recharts for statistical evidence. Escalated cases hand drafted outreach to Geodo through a simulated integration that is one API call away from a live one.
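To make the "score plus readable reason" idea concrete, here is a minimal Python sketch of a circular-flow detector built on cycle detection. It is an illustration under stated assumptions, not Filum's actual code: the names `Finding`, `find_cycles`, and `circular_flow_findings`, the `max_len` and `min_len` parameters, and the scoring formula are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    # Hypothetical output shape: who is involved, how risky, and why.
    accounts: list[str]
    risk_score: float
    reason: str


def find_cycles(edges: list[tuple[str, str]], min_len: int = 3, max_len: int = 12) -> list[list[str]]:
    """Enumerate simple directed cycles with a depth-first search.

    Each cycle is reported once, starting from its smallest account ID,
    because the search from a given start only visits larger IDs.
    """
    graph: dict[str, set[str]] = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)

    cycles: list[list[str]] = []

    def walk(start: str, node: str, path: list[str], on_path: set[str]) -> None:
        for nxt in sorted(graph.get(node, ())):
            if nxt == start and len(path) >= min_len:
                cycles.append(list(path))  # closed the loop back to the start
            elif nxt > start and nxt not in on_path and len(path) < max_len:
                on_path.add(nxt)
                path.append(nxt)
                walk(start, nxt, path, on_path)
                path.pop()
                on_path.remove(nxt)

    for start in sorted(graph):
        walk(start, start, [start], {start})
    return cycles


def circular_flow_findings(edges: list[tuple[str, str]]) -> list[Finding]:
    findings = []
    for cycle in find_cycles(edges):
        # Placeholder score (an assumption): longer rings score higher, capped at 1.0.
        score = round(min(1.0, 0.5 + 0.05 * len(cycle)), 2)
        loop = " -> ".join(cycle + [cycle[0]])
        reason = f"Funds move in a closed loop through {len(cycle)} accounts: {loop}."
        findings.append(Finding(cycle, score, reason))
    return findings


# Usage: three accounts pass money in a ring; D only receives.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
findings = circular_flow_findings(edges)
```

The search is deterministic, so the same transactions always produce the same findings and the same sentences. Exhaustive cycle enumeration can be slow on dense graphs, which is why the sketch caps cycle length.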
Challenges we ran into
Making interpretability survive adaptivity: every retune had to leave the rule as readable as it found it. Auto-layout engines tangled our graphs, so we computed ring positions by hand. And six hours forced early feature freezes.
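Placing the accounts of a ring by hand comes down to spacing them evenly on a circle. The Python sketch below shows that arithmetic; the function name `ring_positions` and its `center` and `radius` parameters are hypothetical, and the output would still need mapping into whatever node format the graph library expects.

```python
import math


def ring_positions(
    node_ids: list[str],
    center: tuple[float, float] = (0.0, 0.0),
    radius: float = 200.0,
) -> dict[str, tuple[float, float]]:
    """Place nodes evenly around a circle, in the order given.

    Assumes screen coordinates (y grows downward), so the first node
    lands at the top of the ring and the rest follow clockwise.
    """
    n = len(node_ids)
    positions = {}
    for i, node_id in enumerate(node_ids):
        angle = 2 * math.pi * i / n - math.pi / 2  # shift so index 0 is at the top
        x = center[0] + radius * math.cos(angle)
        y = center[1] + radius * math.sin(angle)
        positions[node_id] = (x, y)
    return positions


# Usage: lay out a four-account ring.
layout = ring_positions(["acct-1", "acct-2", "acct-3", "acct-4"])
```

Passing the accounts in cycle order keeps every edge between neighbours on the circle, so the ring draws without crossings.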
Accomplishments that we're proud of
Filum found the hidden twelve-account ring in the event's 5,000-transaction dataset and reported it as explicit account IDs that can be checked against the answer key. And every flag traces to a sentence a human can sign. Nothing here asks for trust. It shows its work.
What we learned
Regulators and judges want the same thing: show your work. A system that explains itself beats a smarter one that can't. And the analyst's normal workday is the best training data there is.
What's next for Filum
The pattern generalizes. Anywhere a human reviews machine flags and signs the outcome (AML, insurance claims, credit risk), the same design works: an interpretable core kept alive by agents that learn from the reviewer. Next, we want to put Filum in front of a real fraud team and time how much faster a case closes when the evidence explains itself.