Inspiration
bunq's elite subscription tiers bundle real insurance products, and those products are one of the strongest reasons users upgrade and stay. The insurance is underwritten by Quvos, a third-party partner. We started by asking a simple question: what actually happens the moment a bunq user needs to claim?
The answer is not good. The user is pushed out of the bunq ecosystem entirely, onto a third-party portal, asked to fill out a fourteen-field form (with information bunq already has), and made to wait 7–14 days for a decision. Industry data puts utilization on bundled insurance below 15%, meaning most premium subscribers pay every month for a benefit they will never use. For those who do claim, satisfaction collapses, NPS drops, and they downgrade. At bunq's scale, even a single percentage point of premium-tier churn is worth millions of euros in annual recurring revenue. We saw a multi-million-euro retention leak hiding inside a UX problem, and an innovative place to put multimodal AI to work.
The insight that unlocked the whole project: bunq already holds roughly 80% of the data needed to file a claim (the transaction, the merchant, the amount, the policy). Quvos has none of that. The right question is not "how do we fix the third-party portal?" It is "why does the user need to leave bunq at all?"
What it does
Finn is a multimodal AI claim assistant that lives natively inside the bunq app and replaces the entire form-based claim flow with a 30-second conversation.
The user opens bunq, taps Start a Claim from the home screen, picks a category (device, travel, medical, luggage), photographs what happened, and records a short voice note. Finn then does the rest — automatically:
- A vision model classifies the photo and surfaces a live damage pill on screen ("cracked iPhone screen", "delay board · 4h").
- Speech-to-text transcribes the voice note in real time and extracts the structured facts (when, where, how, severity, third parties involved).
- A reasoning layer fuses photo + transcript + the user's bunq transactions + the relevant Quvos policy clause, and emits a structured decision via forced tool-use.
- The decision is auto-submitted to Quvos through their existing intake API as a clean, pre-validated package.
- For high-confidence, low-amount claims, bunq fronts an instant payout to the user's main account and reconciles with Quvos in the background (see the routing sketch below).
The user's only inputs are a picture and a quick voice memo. The bank's only output is money in the user's account in seconds, or a warm, plain-language explanation if a claim cannot be auto-resolved. The user never leaves the bunq ecosystem.
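To make that payout gate concrete, here is a minimal sketch of the routing logic; the thresholds, argument names, and outcome labels are illustrative, not our production values:

```python
# Illustrative thresholds; the real cutoffs would be agreed with Quvos.
CONFIDENCE_FLOOR = 0.90     # minimum model confidence for auto-resolution
INSTANT_PAYOUT_CAP = 250.0  # EUR; larger approved claims take the normal track

def route_claim(approved: bool, confidence: float, amount_eur: float) -> str:
    """Route a structured claim decision to one of three outcomes."""
    if not approved:
        return "explain_and_escalate"  # warm plain-language reasoning + human hand-off
    if confidence >= CONFIDENCE_FLOOR and amount_eur <= INSTANT_PAYOUT_CAP:
        return "instant_payout"        # bunq fronts the money, reconciles with Quvos later
    return "submit_to_quvos"           # clean, pre-validated package via the intake API
```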
How we built it
The whole system runs on a thin, deliberate stack with a single source of LLM truth.
Frontend is a Next.js 16 app deployed on Vercel (teller-eight.vercel.app). All API calls flow through a Next.js catch-all proxy (/api/[…]) into the FastAPI backend, so the browser never holds AWS or bunq secrets and we get to skip CORS entirely.
Backend is a FastAPI service deployed on Fly.io in the ams (Amsterdam) region (teller-api.fly.dev). Routes are surgical: POST /classify-photo for live vision on the Review screen, POST /chat for an SSE-streamed multi-turn agent over the bunq API, plus the claim-pipeline endpoints.
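Stripped to a skeleton, the route surface looks roughly like this; run_vision_model and run_agent are hypothetical stubs standing in for the real pipeline:

```python
from fastapi import FastAPI, UploadFile
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class ChatTurn(BaseModel):
    message: str
    history: list[dict] = []

async def run_vision_model(image_bytes: bytes) -> str:
    return "cracked iPhone screen"  # stub for the photo_classify.py vision call

async def run_agent(message: str, history: list[dict]):
    yield "ok"  # stub for the Claude agent loop behind /chat

@app.post("/classify-photo")
async def classify_photo(photo: UploadFile):
    # Live vision behind the Review screen's damage pill.
    return {"label": await run_vision_model(await photo.read())}

@app.post("/chat")
async def chat(turn: ChatTurn):
    # SSE: one `data:` event per streamed agent chunk.
    async def events():
        async for chunk in run_agent(turn.message, turn.history):
            yield f"data: {chunk}\n\n"
    return StreamingResponse(events(), media_type="text/event-stream")
```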
AI is centralized. api/app/llm.py exposes a single claude() + model() pair that every call site (agent.py, claims.py, receipts.py, photo_classify.py) goes through. We default to Claude Sonnet 4.5 via AWS Bedrock (us-east-1), with a USE_BEDROCK=false flag to flip back to the direct Anthropic API for local dev when AWS session creds expire. Forced tool-use is what gives us reliable, structured claim decisions instead of free-form prose.
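A condensed sketch of that single entry point with the Bedrock/direct flip; the model identifiers are illustrative, and the real llm.py carries more configuration:

```python
import os
from anthropic import Anthropic, AnthropicBedrock

USE_BEDROCK = os.getenv("USE_BEDROCK", "true").lower() == "true"

def model() -> str:
    # Illustrative IDs; Bedrock and the direct API name models differently.
    return ("anthropic.claude-sonnet-4-5-20250929-v1:0" if USE_BEDROCK
            else "claude-sonnet-4-5")

def claude():
    # Single construction point: agent.py, claims.py, receipts.py, and
    # photo_classify.py all import this instead of building their own client.
    return AnthropicBedrock(aws_region="us-east-1") if USE_BEDROCK else Anthropic()

def complete(messages, tools=None, tool_choice=None, max_tokens=1024):
    kwargs = dict(model=model(), max_tokens=max_tokens, messages=messages)
    if tools:
        kwargs["tools"] = tools
    if tool_choice:
        kwargs["tool_choice"] = tool_choice
    return claude().messages.create(**kwargs)
```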
Speech runs on AWS Transcribe Streaming over HTTP/2. We transcode the browser's webm/opus chunks to 16 kHz PCM in-process with ffmpeg, which got us from ~17 s on the legacy batch path down to ~3 s warm-state — the difference between "feels like AI" and "feels like waiting for AI."
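A sketch of that hot path, assuming the amazon-transcribe streaming SDK and ffmpeg on the PATH; buffering, chunked upload, and error handling are trimmed, and the region is illustrative:

```python
import asyncio
import subprocess
from amazon_transcribe.client import TranscribeStreamingClient
from amazon_transcribe.handlers import TranscriptResultStreamHandler
from amazon_transcribe.model import TranscriptEvent

def webm_to_pcm(webm_bytes: bytes) -> bytes:
    """Transcode browser webm/opus to 16 kHz mono s16le PCM via ffmpeg."""
    proc = subprocess.run(
        ["ffmpeg", "-i", "pipe:0", "-f", "s16le", "-ar", "16000", "-ac", "1", "pipe:1"],
        input=webm_bytes, capture_output=True, check=True,
    )
    return proc.stdout

class PrintFinals(TranscriptResultStreamHandler):
    async def handle_transcript_event(self, event: TranscriptEvent):
        for result in event.transcript.results:
            if not result.is_partial:
                print(result.alternatives[0].transcript)

async def transcribe(pcm: bytes, region: str = "us-east-1"):
    client = TranscribeStreamingClient(region=region)
    stream = await client.start_stream_transcription(
        language_code="en-US", media_sample_rate_hz=16000, media_encoding="pcm",
    )
    async def feed():
        for i in range(0, len(pcm), 3200):  # ~100 ms of 16 kHz 16-bit mono audio
            await stream.input_stream.send_audio_event(audio_chunk=pcm[i:i + 3200])
        await stream.input_stream.end_stream()
    await asyncio.gather(feed(), PrintFinals(stream.output_stream).handle_events())
```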
Voice output uses AWS Polly for the moments where Finn talks back. AWS S3 holds the audio assets and uploaded photos. bunq sandbox API (api-sandbox.bunq.com) provides the transaction history, payment rails, and instant-payout primitives.
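The voice-out path is plain boto3; the bucket name and voice here are illustrative placeholders:

```python
import boto3

polly = boto3.client("polly")
s3 = boto3.client("s3")

def speak(text: str, key: str, bucket: str = "finn-audio-assets") -> str:
    """Synthesize Finn's reply with Polly and stash the MP3 in S3."""
    audio = polly.synthesize_speech(Text=text, OutputFormat="mp3", VoiceId="Matthew")
    s3.put_object(Bucket=bucket, Key=key,
                  Body=audio["AudioStream"].read(), ContentType="audio/mpeg")
    return f"s3://{bucket}/{key}"
```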
A separate Claude Sonnet 4.5 agent loop sits behind /chat and handles multi-turn tool use over the bunq API for proactive nudges, balance queries, and money movement — same model, same Bedrock entry point, different system prompt.
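The loop follows the standard Anthropic tool-use pattern: call the model, execute any requested tools against the bunq API, feed results back, and repeat until the model answers in plain text. A condensed sketch, reusing the complete() helper from the llm.py sketch above and stubbing out the bunq dispatch:

```python
def run_agent_turn(messages: list, tools: list, max_steps: int = 8) -> str:
    """Multi-turn tool-use loop; execute_bunq_tool is a hypothetical stub."""
    for _ in range(max_steps):
        resp = complete(messages, tools=tools)
        if resp.stop_reason != "tool_use":
            return "".join(b.text for b in resp.content if b.type == "text")
        messages.append({"role": "assistant", "content": resp.content})
        messages.append({"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": execute_bunq_tool(b.name, b.input)}
            for b in resp.content if b.type == "tool_use"
        ]})
    return "Sorry, I couldn't finish that request."

def execute_bunq_tool(name: str, args: dict) -> str:
    # Stub: dispatch to balance queries, payments, etc. on api-sandbox.bunq.com.
    return "{}"
```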
Challenges we ran into
Latency budget on the voice note. The first version transcribed the audio after the user finished recording, on a batch endpoint, and the wait felt eternal. Re-architecting to AWS Transcribe Streaming over HTTP/2 — with an in-process ffmpeg transcode from webm/opus to 16 kHz PCM — is what turned the experience from "wait for AI" into "feels instant."
Forcing reliable structured output. Free-form LLM prose was unusable as a claim decision. Forcing tool-use with a tightly-defined schema, and rejecting any decision that didn't match it, is what made the auto-submit pipeline trustworthy enough to sit in front of a real payout.
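Concretely, that pattern could look like the following: one decide_claim tool whose input schema is generated from a Pydantic model, tool_choice pinned to that tool, and a server-side re-validation that rejects anything malformed before it can touch the payout path. Field names are illustrative:

```python
from pydantic import BaseModel, Field, ValidationError

class ClaimDecision(BaseModel):
    approved: bool
    confidence: float = Field(ge=0, le=1)
    amount_eur: float
    policy_clause: str
    reasoning: str  # plain-language explanation surfaced to the user

DECIDE_CLAIM_TOOL = {
    "name": "decide_claim",
    "description": "Emit the final structured claim decision.",
    "input_schema": ClaimDecision.model_json_schema(),
}

def decide(messages) -> ClaimDecision:
    resp = complete(
        messages,
        tools=[DECIDE_CLAIM_TOOL],
        tool_choice={"type": "tool", "name": "decide_claim"},  # forced tool-use
    )
    block = next(b for b in resp.content if b.type == "tool_use")
    try:
        return ClaimDecision.model_validate(block.input)
    except ValidationError:
        # Anything off-schema is rejected outright and routed to manual review.
        raise ValueError("Malformed decision; claim falls back to manual review")
```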
Keeping secrets out of the browser. A consumer-facing app proxying bunq + AWS + Bedrock has to be paranoid. The Next.js catch-all proxy into FastAPI was the cleanest pattern — no CORS, no client-side secrets, and the browser sees only our own API surface.
Making the user feel safe in a stressful moment. The hardest design decision was not what to show on the happy path — it was what to say when Finn cannot auto-resolve. We landed on warm, plain-language reasoning and a one-tap human escalation, because the worst outcome of an AI claim assistant is a user who feels a robot rejected them.
bunq sandbox quirks. PSD2-flavored auth, sandbox card activation, and triggering test payments all required workarounds.
Accomplishments that we're proud of
- We turned a multi-million-euro retention leak into a business case with payback measured in weeks. Conservatively scoped at €15–20M in annual upside for bunq, against an implementation cost that fits inside a small engineering team. This is the line that lands with finance and product equally.
- Premium-tier retention as the headline ROI lever. Finn directly attacks bunq's biggest premium churn driver: bundled insurance that users pay for but never feel. Lifting utilization from industry-average single digits toward double digits compounds into multi-million-euro ARR preservation every single year, on top of the existing book.
- Stronger commercial leverage with Quvos. Cleaner submissions raise approval rates, reduce fraud exposure, and put bunq in real position to renegotiate the commission split: a 1–2% improvement on the insurance commission line is itself a seven-figure annual gain.
- A new top-of-funnel marketing line for the premium plans. "File your claim by talking to Finn" is the kind of feature that lifts premium-plan attach rate at the point of conversion. At bunq's scale, every 1% of additional premium conversions is worth millions in new ARR.
- 30-second median claim-to-decision time: the operational metric that makes the entire ROI possible. Days to under a minute is what causes utilization to climb, churn to fall, and NPS to rise; without the speed, the business case does not exist.
- The user never leaves the bunq ecosystem. The most stressful financial moment of the year happens natively inside bunq, where loyalty and lifetime value are actually earned — converting a third-party trust-breaker into bunq's strongest retention engine.
- End-to-end demo running on real bunq sandbox infrastructure, with a clean architectural pattern that bunq engineering could productionize on Monday — meaning the ROI clock can start almost immediately.
What we learned
- Multimodal AI gets interesting when the modalities replace the interface, not decorate it. Image + voice are not features in Finn; they are the entire claim form.
- The biggest UX win in a financial app is not adding screens. It is removing stress from the user. The whole project came from chasing one specific exit point in the bunq journey.
- Latency is a product decision, not an infra detail. Moving from batch transcription to streaming is what made the difference between the demo feeling magical and feeling like a chatbot.
- Forced tool-use is the unlock for AI in regulated workflows. Claim decisions only become trustworthy once the model is constrained to emit a strict, validated schema.
- AI should explain itself when it says no. A warm, plain-language rejection costs nothing extra and is the difference between "this AI hates me" and "this AI is on my side."
What's next for Multimodal AI Claims Assistant for bunq
- Production integration with the real Quvos API, including direct webhook reconciliation for instant payouts that bunq fronts.
- Expanding modality coverage: flight/train delay screenshots auto-recognized from a banner notification, medical bill OCR, and lost-luggage tag scanning from a phone snapshot of the airline tag.
- Proactive triggers: the bunq app already detects when a user is abroad. The next step is Finn surfacing itself the moment a covered event likely happened (a flight cancellation, a card declined at a hospital), instead of waiting for the user to start a claim.
- Anti-fraud co-pilot: pointing the same multimodal stack at the inverse problem, catching fraudulent claims before they reach Quvos, raising approval rates further, and improving bunq's commercial leverage on the commission split.
- A/B-tested rollout to Easy Travel and Elite cohorts, instrumented end-to-end on the retention metrics the business case rests on: bundled-insurance utilization, premium-tier churn, NPS, and support ticket volume on claims.
Built With
- amazon-web-services
- anthropic-claude-sonnet-4.5
- aws-bedrock
- aws-polly
- aws-transcribe
- bunq-api
- fastapi
- ffmpeg
- fly.io
- next.js
- python
- react
- server-sent-events
- tailwind-css
- typescript
- vercel
