bunq Protect


Inspiration

Authorized push-payment fraud is the fastest-growing fraud category in European banking, and the hardest to undo. Once the user has tapped approve, the money is gone, the IBAN is gone, and the recovery rate is in the low single digits.

What makes modern scams uncatchable by traditional defenses is that the deception lives outside the bank. A "KLM invoice" arrives as a perfectly valid bunq payment request. The convincing part — the WhatsApp screenshot, the AI-cloned voice on the phone, the fake "your son was in an accident" voicemail — never reaches the bank's servers. The bank only sees the result: a request the user is about to authorize while panicking.

bunq Protect is built to close that gap.


What it does: "Check for scam"

When a user is unsure about a pending bunq request, they tap Check for scam. A chat opens, pre-loaded with the bunq payment context. The user attaches whatever they have — a screenshot, a voicemail, or a live phone call — and the agent gives a verdict in seconds.

Three things make this work:

  1. The bank already knows one half. bunq Protect pulls the counterparty IBAN, the amount, the description, and the user's last 50 payments straight from the bunq API. The user doesn't type anything — they just bring the missing context.

  2. The user brings the interesting half. A scam victim almost always has the evidence on their phone already: the screenshot of the SMS, the voicemail, the call that's still ringing. The agent meets them where they are.

  3. The agent decides per-modality, then fuses. This is the technical core and is described in The Build below.
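As a rough illustration of point 1, the bank-side half of the context could be bundled like this. It is a minimal sketch: the `Payment` and `PaymentContext` types, the `build_context` and `seen_before` helpers, and the newest-first ordering are assumptions made for the example and are not taken from the project or from the bunq SDK. The IBANs are made-up placeholders.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class Payment:
    # Hypothetical, simplified view of one payment or pending request.
    counterparty_iban: str
    amount: Decimal
    description: str


@dataclass(frozen=True)
class PaymentContext:
    counterparty_iban: str
    amount: Decimal
    description: str
    history: List[Payment]


def build_context(request: Payment, payments: List[Payment], limit: int = 50) -> PaymentContext:
    """Bundle the pending request with the most recent `limit` payments."""
    # Assumption: `payments` is already ordered newest-first.
    return PaymentContext(
        counterparty_iban=request.counterparty_iban,
        amount=request.amount,
        description=request.description,
        history=payments[:limit],
    )


def seen_before(ctx: PaymentContext) -> bool:
    """True if the user has already paid this IBAN in the recent history."""
    return any(p.counterparty_iban == ctx.counterparty_iban for p in ctx.history)


history = [Payment("NL00TEST0000000001", Decimal("12.50"), "coffee") for _ in range(60)]
request = Payment("NL00TEST0000000002", Decimal("480.00"), "KLM invoice")
ctx = build_context(request, history)
```

A first-time counterparty IBAN is not proof of fraud on its own, but it is the kind of behavioral fact the text modality can reason over without the user typing anything.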

The user, not the agent, makes the final call. Approve triggers PUT request-response status=ACCEPTED; Reject triggers status=REJECTED — both real bunq endpoints.

A second mode (passive auto-reject, fired by bunq's webhook on every incoming request) is also implemented but gated off in the demo build — the simulator cases are tuned high-confidence and would all auto-reject before the user could run the active flow. The architecture is deliberately split so that auto-rejection only ever uses data the bank already has — photos, voicemails, and live calls require the user to share them.
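One way to encode that split is to give the passive path a function that only accepts the bank-side signal. The sketch below assumes a `Signal` shape and two thresholds (`AUTO_REJECT_PROB`, `AUTO_REJECT_CONFIDENCE`) that are hypothetical; the write-up does not state the values the project uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signal:
    name: str
    scam_prob: float
    confidence: float


# Hypothetical thresholds, chosen only for illustration.
AUTO_REJECT_PROB = 0.9
AUTO_REJECT_CONFIDENCE = 0.8


def passive_decision(text_signal: Signal, enabled: bool = False) -> Optional[str]:
    """Return "REJECTED" to auto-reject, or None to leave the request pending.

    Only the text + behavioral signal is accepted here, because it is the only
    one derived from data the bank already holds.
    """
    if not enabled:  # gated off, as in the demo build
        return None
    if (text_signal.scam_prob >= AUTO_REJECT_PROB
            and text_signal.confidence >= AUTO_REJECT_CONFIDENCE):
        return "REJECTED"
    return None
```

Because the function has no parameter for screenshots or audio, the passive path cannot reach user-held evidence even by accident.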


The Build

Multi-modal AI: late fusion with a max-rule combiner

The classifier is intentionally not a single multimodal prompt that sees everything at once. It uses late fusion (also known as decision-level fusion): three Claude calls run in parallel, one per modality, each with its own carefully tuned system prompt, and the combination happens in plain Python.

  • Text + behavioral. Reasons over the bunq RequestResponse plus the user's payment history. Looks for brand-on-personal-IBAN mismatches, IBAN drift on repeat merchants (a known payee suddenly asking for payment to a different IBAN), urgency or fear language, manipulation tactics ("don't tell Mom", "I'll explain later"), and amount anomalies.
  • Vision. Given a user-forwarded screenshot, looks for visual fraud signals — claimed brand vs. visible sender, urgency markers, layout inconsistencies, internal contradictions in the document itself (a "PAID" stamp on a document being used to demand new payment, a corporate brand using a personal email or NL##BUNQ######## IBAN as the beneficiary).
  • Audio. Transcribes via Groq Whisper-large-v3-turbo (or accepts a pre-transcribed string from the browser's Web Speech API for live calls), then asks Claude to detect known voice-scam patterns: relative-in-distress, fake bank fraud-team agents, fake tax officials, AI-cloned voices.
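The parallel, one-call-per-modality structure can be sketched with `asyncio.gather`. The three classifier functions below are stubs standing in for the real Claude calls, and their names, return shape, and hard-coded scores are placeholders invented for this example.

```python
import asyncio
from typing import Dict, Optional


# Each stub stands in for one Claude call with its own system prompt.
# The {"scam_prob": ..., "confidence": ...} shape is an assumption.
async def classify_text(payload: str) -> Dict[str, float]:
    await asyncio.sleep(0)  # placeholder for the network round trip
    return {"scam_prob": 0.87, "confidence": 0.81}


async def classify_vision(payload: str) -> Dict[str, float]:
    await asyncio.sleep(0)
    return {"scam_prob": 0.12, "confidence": 0.74}


async def classify_audio(payload: str) -> Dict[str, float]:
    await asyncio.sleep(0)
    return {"scam_prob": 0.66, "confidence": 0.58}


CLASSIFIERS = {"text": classify_text, "vision": classify_vision, "audio": classify_audio}


async def run_modalities(inputs: Dict[str, Optional[str]]) -> Dict[str, Dict[str, float]]:
    """Run one classifier per supplied modality, concurrently."""
    # Skip modalities the user did not attach.
    present = {name: data for name, data in inputs.items() if data is not None}
    results = await asyncio.gather(
        *(CLASSIFIERS[name](data) for name, data in present.items())
    )
    return dict(zip(present.keys(), results))


out = asyncio.run(run_modalities({"text": "request json", "vision": "screenshot", "audio": None}))
```

Running the calls concurrently keeps the total latency close to that of the slowest single call, which matters when the user is waiting on a verdict.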

Each call returns a calibrated signal — scam_prob and a self-rated confidence, with two-decimal granularity (the prompts explicitly forbid round-number defaults like 0.5 / 0.9 / 0.95).
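A small validation step on the Python side can back up those prompt rules. This is a sketch under assumptions: it supposes each call replies with a JSON object containing `scam_prob` and `confidence`, and the `parse_signal` and `looks_defaulted` helpers are hypothetical names, not the project's code.

```python
import json
from dataclasses import dataclass

# The round-number defaults the prompts forbid, per the text above.
ROUND_DEFAULTS = {0.5, 0.9, 0.95}


@dataclass(frozen=True)
class Signal:
    name: str
    scam_prob: float
    confidence: float


def parse_signal(name: str, raw: str) -> Signal:
    """Parse one modality's JSON reply into a Signal with two-decimal values."""
    data = json.loads(raw)
    scam_prob = round(float(data["scam_prob"]), 2)
    confidence = round(float(data["confidence"]), 2)
    for label, value in (("scam_prob", scam_prob), ("confidence", confidence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{label} out of range: {value}")
    return Signal(name, scam_prob, confidence)


def looks_defaulted(signal: Signal) -> bool:
    """Flag replies that fell back to a forbidden round-number value."""
    return signal.scam_prob in ROUND_DEFAULTS or signal.confidence in ROUND_DEFAULTS
```

What to do with a flagged reply (retry, down-weight, or accept) is a design choice the write-up does not cover.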

Why split, then fuse in Python

A single multimodal prompt that sees text + screenshot + voicemail together tends to blend the signals into one overall impression, which in effect averages them. That is exactly the wrong behavior for fraud detection, because a clean visual is the expected state of a successful scam: the scammer's job is to look legitimate. Averaging "this image looks fine" with "this counterparty IBAN is wildly inconsistent with the claimed brand" hides the smoking gun behind a polished disguise.
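A two-number example makes the problem concrete. The scores are made up for illustration, and a literal mean stands in for whatever blending a single prompt does internally.

```python
from statistics import mean

# Made-up scores: a polished screenshot and a suspicious counterparty IBAN.
scores = {"vision": 0.08, "text": 0.92}

averaged = mean(list(scores.values()))   # the clean visual drags the score down
strongest = max(scores.values())         # the red flag survives intact

print(f"averaged:  {averaged:.2f}")
print(f"strongest: {strongest:.2f}")
```

The averaged score lands in the middle of the range, where it reads as "uncertain", while the strongest single signal still says "almost certainly a scam".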

So fusion uses a max-rule combiner over confidence-gated signals, with a disagreement detector for cross-modal contradictions — short enough to fit on one slide:

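# Security-first: the strongest confident red flag wins.
# Assumes at least one signal clears the confidence gate;
# max() and min() raise ValueError on an empty list.
confident = [s for s in signals if s.confidence > 0.4]
top    = max(confident, key=lambda s: s.scam_prob)
bottom = min(confident, key=lambda s: s.scam_prob)
risk   = top.scam_prob
explanation = None  # set only when modalities contradict each other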

if top.scam_prob - bottom.scam_prob > 0.5:
    # Cross-modal contradiction — flag it, don't average it away.
    explanation = (f"{top.name} shows clear scam patterns, "
                   f"even though {bottom.name} looks polished — "
                   f"a polished disguise is exactly what a real scam aims for.")

The verdict surfaces this in the UI as a per-modality breakdown plus a plain-language summary. The user sees why, not just a score.
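For readers who want to run the combiner, here is a self-contained sketch with example inputs. The 0.4 gate and 0.5 gap come from the snippet above; the `Signal` dataclass, the `fuse` name, and returning `None` when no signal clears the gate are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

CONFIDENCE_GATE = 0.4
DISAGREEMENT_GAP = 0.5


@dataclass(frozen=True)
class Signal:
    name: str
    scam_prob: float
    confidence: float


def fuse(signals: List[Signal]) -> Tuple[Optional[float], Optional[str]]:
    """Max-rule fusion over confidence-gated signals.

    Returns (risk, explanation). Both are None when no signal clears the gate,
    which is a choice made for this sketch.
    """
    confident = [s for s in signals if s.confidence > CONFIDENCE_GATE]
    if not confident:
        return None, None
    top = max(confident, key=lambda s: s.scam_prob)
    bottom = min(confident, key=lambda s: s.scam_prob)
    explanation = None
    if top.scam_prob - bottom.scam_prob > DISAGREEMENT_GAP:
        # Cross-modal contradiction: surface it instead of averaging it away.
        explanation = (f"{top.name} shows clear scam patterns, "
                       f"even though {bottom.name} looks polished.")
    return top.scam_prob, explanation


signals = [
    Signal("Text", 0.93, 0.88),
    Signal("Vision", 0.11, 0.79),
    Signal("Audio", 0.97, 0.22),  # dropped: confidence below the gate
]
risk, why = fuse(signals)
```

In this run the low-confidence audio signal is ignored, the text signal sets the risk, and the gap between text and vision triggers the contradiction message.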

bunq integration

bunq Protect is built directly on the bunq sandbox API and calls the same endpoints the production API exposes.

Used for | bunq endpoint
Pull pending requests + context for "Check for scam" | GET /request-response
Approve a payment | PUT /request-response/{id} {"status":"ACCEPTED"}
Reject a payment | PUT /request-response/{id} {"status":"REJECTED"}
IBAN-history signal for the text modality | GET /payment
Passive-mode webhook (gated off in demo) | POST /notification-filter-url
Block an outgoing draft payment | PUT /draft-payment/{id} {"status":"REJECTED"}
"Nuclear option" card freeze | PUT /card/{id} {"status":"DEACTIVATED"}
Sandbox demo plumbing | POST /sandbox-user-person, POST /request-inquiry
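The approve and reject rows can be sketched as a small request builder. Nothing here talks to the network: the helper only returns the method, path, and JSON body. `build_decision_request` and its `prefix` parameter are hypothetical; `prefix` stands in for the user and monetary-account path segments that the endpoint list above abbreviates away, and authentication and request signing are out of scope.

```python
import json
from typing import Dict, Tuple

# Maps the user's choice to the status values from the endpoint list above.
STATUS_FOR_DECISION: Dict[str, str] = {"approve": "ACCEPTED", "reject": "REJECTED"}


def build_decision_request(request_id: int, decision: str, prefix: str = "") -> Tuple[str, str, str]:
    """Return (method, path, json_body) for the user's decision."""
    try:
        status = STATUS_FOR_DECISION[decision]
    except KeyError:
        raise ValueError(f"unknown decision: {decision!r}") from None
    path = f"{prefix}/request-response/{request_id}"
    return "PUT", path, json.dumps({"status": status})


method, path, body = build_decision_request(7, "reject", prefix="/user/1/monetary-account/2")
```

Keeping the mapping in one table means the UI can only ever send one of the two statuses, and any other input fails loudly before a request is built.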

The UI is styled to feel like a feature inside the bunq app — phone frame, the lowercase bunq wordmark, the deep-navy canvas, the electric blue accent. The Pending tab is the entry point; tapping Check for scam is the only action that escalates beyond text. This is exactly where bunq Protect would slot into the real bunq app: a one-tap button on every pending payment request.


Learnings

  • Multimodal ≠ one big prompt. Throwing every modality into a single Claude call sounds simpler, but it averages the signals — exactly the wrong behavior for fraud, where a clean visual is the expected state of a successful scam. Splitting per-modality and fusing late lets the system surface cross-modal contradictions instead of smoothing them away.
  • Calibrated probabilities matter more than a verdict bit. Asking Claude for a scam_prob and a self-rated confidence (with explicit prompt rules against round-number defaults) gave the fusion layer signals it could actually combine. A boolean-only output would have collapsed the contradiction-detection layer into noise.
  • The bank-vs-user data split is a feature, not a limitation. What the bank already has (text, history) can power aggressive auto-reject; what's on the user's phone (photos, voicemails, calls) requires consent. Encoding that asymmetry into the architecture is what lets the same product be both protective and trustable.

Why this matters

Authorized push-payment fraud is rising sharply across Europe, recovery rates are extremely low, and the status quo — warnings, cooling-off periods, post-hoc disputes — does not work, because the failure mode is a panicked user under pressure approving a payment they shouldn't.

bunq Protect is built to mitigate this.

Built With

  • 4.5
  • anthropic
  • api
  • bunq
  • claude
  • css
  • fastapi
  • groq
  • python
  • react
  • sandbox
  • sonnet
  • speech
  • tailwind
  • vite
  • web
  • whisper-large-v3-turbo