Project Story

The first thing we threw out was the screen as a primary surface. Lynn lives on voice and tap. You unlock the app by tapping a rhythm — tap-tap-PAUSE-tap-tap — instead of seeing and entering a PIN. You hold the screen to speak, and release to send. "How much did I spend on groceries last month?" — and Lynn answers from your real bunq transaction history, not from a hallucination. "Send €10 to Sophie for pizza." — and Lynn reads back the details, waits for a single tap to confirm, and never moves money without your explicit go-ahead.
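For flavor, here's a minimal sketch (not our production gesture code) of how a rhythm unlock can be matched: compare the gaps between taps against a stored pattern, normalized for tempo. `matches_rhythm` and the constants are illustrative.

```python
# Illustrative sketch of the rhythm unlock (tap-tap-PAUSE-tap-tap).
# We compare the gaps between taps to a stored pattern, normalized by
# tempo, so fast and slow tappers both get in.
UNLOCK_GAPS = [0.25, 1.0, 0.25]  # seconds: short, long (the PAUSE), short
TOLERANCE = 0.4                  # allowed relative error per gap

def matches_rhythm(tap_times: list[float]) -> bool:
    if len(tap_times) != len(UNLOCK_GAPS) + 1:
        return False
    gaps = [b - a for a, b in zip(tap_times, tap_times[1:])]
    scale = min(gaps) / min(UNLOCK_GAPS)  # tempo normalization
    return all(
        abs(gap - want * scale) <= want * scale * TOLERANCE
        for gap, want in zip(gaps, UNLOCK_GAPS)
    )

print(matches_rhythm([0.0, 0.3, 1.5, 1.8]))  # -> True
```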
Underneath, Lynn is a Claude-driven agent with 36 bunq tools wired in: payments, drafts, scheduled transfers, cards, requests, contact lookup, and a custom financial reasoning tool that synthesizes affordability, anomalies, and cashflow forecasts on demand. OpenAI Whisper handles speech-to-text. ElevenLabs handles the voice. Every interaction is logged through an accessibility event stream so you can see, in real time, why Lynn did what she did.
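To make the wiring concrete, here's a condensed, illustrative sketch of how one such tool might be declared against the Anthropic Messages API. The tool name, schema fields, and model alias are assumptions for the example, not our actual registry.

```python
import anthropic

# Illustrative declaration of one bunq tool; the real app registers 36
# of these. Tool name and schema fields are made up for this sketch.
CREATE_DRAFT_TOOL = {
    "name": "create_payment_draft",
    "description": "Create a draft payment the user must confirm by tap "
                   "before any money moves.",
    "input_schema": {
        "type": "object",
        "properties": {
            "amount_eur": {"type": "number"},
            "counterparty": {"type": "string"},
            "description": {"type": "string"},
        },
        "required": ["amount_eur", "counterparty"],
    },
}

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    tools=[CREATE_DRAFT_TOOL],
    messages=[{"role": "user", "content": "Send €10 to Sophie for pizza."}],
)
```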
The hard parts were never about the APIs. They were about the seams.
The agentic loop confirmed a payment correctly and then immediately created a duplicate draft, because Claude re-read the conversation history and decided the original "send to Sophie" request was still unfulfilled. We hard-stopped the loop after a successful confirm.
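In sketch form, the hard stop looks like this in a simple tool-calling loop. The dispatcher `run_tool` and the `is_confirmed_payment` flag are illustrative names (the dispatcher is left undefined here):

```python
# Sketch of the loop with the hard stop. Once a tool result reports a
# confirmed payment, we return immediately instead of handing the
# updated history back to Claude; re-reading it is what produced the
# duplicate draft.
def agent_loop(client, messages, tools):
    while True:
        response = client.messages.create(
            model="claude-3-5-sonnet-latest",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        tool_calls = [b for b in response.content if b.type == "tool_use"]
        if not tool_calls:
            return response  # plain answer: the loop is done
        messages.append({"role": "assistant", "content": response.content})
        results, confirmed = [], False
        for call in tool_calls:
            outcome = run_tool(call.name, call.input)  # our dispatcher
            confirmed = confirmed or outcome.is_confirmed_payment
            results.append({"type": "tool_result",
                            "tool_use_id": call.id,
                            "content": outcome.text})
        messages.append({"role": "user", "content": results})
        if confirmed:
            return response  # hard stop: never re-enter after a confirm
```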
The tap state machine had a subtle race: a single tap fires a balance shortcut after 400ms — but if a payment draft was created during those 400ms, the shortcut would still fire on top of the confirmation flow. Two intents, one gesture, the wrong one winning. We added a guard.
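The guard is just a re-check of the confirmation state when the deferred shortcut finally fires. `pending_confirmation` is an illustrative name for that flag:

```python
import threading

# Sketch of the race fix. A single tap defers the balance shortcut by
# 400 ms (so a double tap can still cancel it), but the deferred
# callback re-checks whether a payment draft appeared in the meantime.
class TapStateMachine:
    def __init__(self):
        self.pending_confirmation = False  # set when a draft awaits a tap
        self._timer = None

    def on_single_tap(self):
        if self.pending_confirmation:
            self.confirm_payment()  # the tap belongs to the draft
            return
        self._timer = threading.Timer(0.4, self._fire_balance_shortcut)
        self._timer.start()

    def _fire_balance_shortcut(self):
        if self.pending_confirmation:
            return  # the guard: a draft was created during the 400 ms
        self.speak_balance()

    def confirm_payment(self):
        print("payment confirmed")  # stub

    def speak_balance(self):
        print("reading balance")    # stub
```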
Speech-to-text mishearing names — "Wang" became "Wayne" — sent Lynn into clarification dead-ends. We added fuzzy matching: now Lynn asks "Did you mean Chase Wang?" instead of giving up.
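A minimal sketch of that matching with stdlib difflib, assuming we match the misheard word against individual name tokens rather than full names; in the app the candidates come from the bunq contact lookup tool:

```python
from difflib import get_close_matches

# Sketch of the fuzzy contact match. We match the misheard word against
# individual name tokens ("Wang", not the full "Chase Wang") so one bad
# word doesn't sink the whole name.
def resolve_contact(heard: str, contacts: list[str]) -> str | None:
    tokens = {tok.lower(): name
              for name in contacts for tok in name.split()}
    match = get_close_matches(heard.lower(), list(tokens), n=1, cutoff=0.6)
    return tokens[match[0]] if match else None

# Whisper heard "Wayne"; "wayne" vs "wang" scores ~0.67, so Lynn can ask
# "Did you mean Chase Wang?" instead of dead-ending.
print(resolve_contact("Wayne", ["Chase Wang", "Sophie de Vries"]))
```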
And the system prompt told Lynn to say "tap once to confirm" before every consequential action. But the tap-confirm mechanism only existed for outgoing payments. Users were asking Lynn to request money and watching her stop mid-flow, waiting for a tap that did nothing. We rewrote the rule to scope it to outgoing payments only.
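Paraphrased, the scoped rule looks something like this (the exact wording in our prompt differs):

```python
# Paraphrased sketch of the rewritten system-prompt rule.
TAP_CONFIRM_RULE = (
    "Say 'tap once to confirm' ONLY before an outgoing payment, because "
    "that is the only flow with a tap-confirm mechanism. For every other "
    "action (money requests, lookups, scheduling), proceed without "
    "asking for a tap and narrate what you did."
)
```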
Each of these bugs taught us the same lesson: when you're building a voice-first product, latency, ambiguity, and contradiction are the enemy. Every dead-end loop, every silent moment, every "tap that does nothing" — they don't just feel bad, they break trust. And in banking, trust is the entire product.
What we ended up with is something we genuinely think we'd hand to a blind family member — not as a charity demo, but as a real way to bank. Lynn never invents data. She never moves money without an unambiguous gesture. She speaks amounts as words, IBANs in NATO phonetic, and refuses to leak visual assumptions into her speech. She's fast: prompt caching keeps the tail latency low enough that conversations actually feel like conversations.
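As an illustration of the IBAN rule, here's a sketch of the conversion. The phonetic table is standard NATO; the grouping and the function name are ours:

```python
# Sketch of the IBAN-to-speech rendering: letters in NATO phonetic,
# digits as words, grouped in fours so the TTS pauses naturally.
NATO = {
    "A": "Alfa", "B": "Bravo", "C": "Charlie", "D": "Delta", "E": "Echo",
    "F": "Foxtrot", "G": "Golf", "H": "Hotel", "I": "India", "J": "Juliett",
    "K": "Kilo", "L": "Lima", "M": "Mike", "N": "November", "O": "Oscar",
    "P": "Papa", "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "X-ray",
    "Y": "Yankee", "Z": "Zulu",
}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

def speak_iban(iban: str) -> str:
    words = [NATO[ch] if ch.isalpha() else DIGITS[int(ch)]
             for ch in iban.replace(" ", "").upper()]
    groups = [" ".join(words[i:i + 4]) for i in range(0, len(words), 4)]
    return ", ".join(groups)  # commas become natural TTS pauses

print(speak_iban("NL91 ABNA 0417 1643 00"))
# "November Lima nine one, Alfa Bravo November Alfa, zero four one seven, ..."
```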
It's still a prototype. There's no multi-user auth yet. The ElevenLabs TTS key still leaves the backend, so it's exposed on the client. We haven't tested with real visually impaired users — and that's the most important next step, because nothing teaches you about accessibility like watching someone who actually depends on it use what you built.
But it works. Hold the screen, speak, release, hear the answer. Tap once to confirm, twice to cancel. Close your eyes — and bank.
Built With
- claude
- python
