The raw/flag columns (straight from the data)

- total_amount = sum of amount per account; txn_count = number of transactions per account (pandas groupby in
account_risk.account_features).
- shares_device = boolean, account shares a device_id with another account (device_ip.shared_devices).
- impossible_travel_count = number of consecutive transaction pairs with different ip_region values within 1h of each
other (device_ip.impossible_travel); 0 in this data.
- is_new_account = age_days < 14 (config.NEW_ACCOUNT_AGE_DAYS).
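
A minimal pandas sketch of how the four columns above could be derived. It is an illustration, not the project's code: the function name sketch_account_features, the input column names (account_id, amount, device_id, ip_region, timestamp, age_days), and the choice to treat "within 1h" as inclusive are all assumptions.

```python
import pandas as pd

NEW_ACCOUNT_AGE_DAYS = 14  # assumed to mirror config.NEW_ACCOUNT_AGE_DAYS


def sketch_account_features(txns: pd.DataFrame, accounts: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical re-derivation of the raw/flag columns, one row per account."""
    # total_amount and txn_count come from a single groupby over transactions.
    feats = txns.groupby("account_id").agg(
        total_amount=("amount", "sum"),
        txn_count=("amount", "size"),
    )

    # shares_device: True if any device the account used was also used by another account.
    accounts_per_device = txns.groupby("device_id")["account_id"].nunique()
    row_on_shared_device = txns["device_id"].map(accounts_per_device) > 1
    feats["shares_device"] = row_on_shared_device.groupby(txns["account_id"]).any()

    # impossible_travel_count: per account, in time order, count consecutive
    # transactions whose ip_region differs and whose gap is at most 1h (assumed inclusive).
    ordered = txns.sort_values(["account_id", "timestamp"])
    by_account = ordered.groupby("account_id")
    region_changed = ordered["ip_region"] != by_account["ip_region"].shift()
    gap = by_account["timestamp"].diff()  # NaT on each account's first row
    within_hour = gap <= pd.Timedelta(hours=1)  # NaT compares as False
    jumps = region_changed & within_hour
    feats["impossible_travel_count"] = jumps.groupby(ordered["account_id"]).sum().astype(int)

    # is_new_account: age_days strictly below the threshold.
    age = accounts.set_index("account_id")["age_days"].reindex(feats.index)
    feats["is_new_account"] = age < NEW_ACCOUNT_AGE_DAYS
    return feats
```

Each account's first transaction has no predecessor, so its time gap is NaT and it never counts as a jump.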

Where the normalization math lives

fraudkit/analytics/_util.py contains robust_z (median/MAD), minmax_0_100, and clip_z_to_score. The whole leaderboard
is therefore deterministic statistics + graph centrality + community detection (no LLM), and the numbers are
reproducible by re-running scripts/02_compute_findings.py.
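
For readers unfamiliar with these normalizations, here is a rough standard-library sketch of what helpers with these names commonly do. None of it is taken from fraudkit/analytics/_util.py: the _sketch function names, the 1.4826 MAD scaling constant, the handling of constant inputs, and especially the z-to-score mapping with a cap of 3.0 are assumptions chosen for illustration.

```python
from statistics import median

MAD_SCALE = 1.4826  # common convention making MAD comparable to a std dev; assumed here


def robust_z_sketch(values):
    """Median/MAD z-scores; returns all zeros when MAD is 0 (assumed fallback)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]
    return [(v - med) / (MAD_SCALE * mad) for v in values]


def minmax_0_100_sketch(values):
    """Linear rescale to [0, 100]; constant input maps to 0 (assumed fallback)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def clip_z_to_score_sketch(z, z_cap=3.0):
    """Clip z to [0, z_cap], then scale to [0, 100]. The mapping is hypothetical."""
    clipped = max(0.0, min(z, z_cap))
    return 100.0 * clipped / z_cap


amounts = [1, 2, 3, 4, 100]
z_scores = robust_z_sketch(amounts)
scores = [clip_z_to_score_sketch(z) for z in z_scores]
```

Because the median and MAD are barely moved by a single extreme value, the outlier 100 gets a very large z-score here instead of inflating the scale for everyone else, which is the usual reason to prefer median/MAD over mean/standard deviation for this kind of ranking.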
