Inspiration

bunq has one of the best public banking APIs in Europe and almost no consumer ever touches it. Every "AI agent" demo today is either read-only or YOLO-executes your money. We wanted the missing middle: an agent that proposes a precise change to your bank state, and a human who executes it. Voice is the natural input — money intents are faster spoken than tapped through six screens.

What it does

Vox is a voice-first control plane for your bunq account.

  • Hold the mic, say "move €400 to rent, split my next salary 60/30/10 across rent/groceries/savings, freeze my travel card after 10pm".
  • An LLM planner converts the transcript into a typed Plan: sub-account transfers, recurring splits, conditional card freezes, per-tx limits.
  • The plan renders as diff cards. You tick the actions you want and approve.
  • Only then does the backend hit the real bunq API. The LLM never moves a euro.
  • Active rules live server-side; when one fires (salary lands, a bar spend posts, a large tx goes through) the backend pushes the event over SSE, and the UI hot-flashes the affected sub-account and shows a toast.
  • Demo buttons fire fake salary / bar spend / large tx so the rule engine reacts live on stage.
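
To make "typed Plan" concrete, here is a minimal sketch of the four action types as Python dataclasses. The class names, field names, the "main" source account, and the describe helper are assumptions for illustration, not the schema Vox actually ships.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Literal, Union


@dataclass(frozen=True)
class Transfer:
    from_account: str
    to_account: str
    amount_eur: Decimal
    kind: Literal["transfer"] = "transfer"


@dataclass(frozen=True)
class RecurringSplit:
    trigger: str              # hypothetical trigger name, e.g. "salary"
    shares: dict[str, int]    # sub-account name -> percent
    kind: Literal["recurring_split"] = "recurring_split"

    def __post_init__(self) -> None:
        # Reject a split that does not account for the whole amount.
        if sum(self.shares.values()) != 100:
            raise ValueError("split percentages must sum to 100")


@dataclass(frozen=True)
class ConditionalFreeze:
    card: str
    after_hour: int           # 0-23, local time
    kind: Literal["conditional_freeze"] = "conditional_freeze"


@dataclass(frozen=True)
class TxLimit:
    card: str
    max_eur: Decimal
    kind: Literal["tx_limit"] = "tx_limit"


Action = Union[Transfer, RecurringSplit, ConditionalFreeze, TxLimit]


@dataclass(frozen=True)
class Plan:
    transcript: str
    actions: tuple[Action, ...]


def describe(action: Action) -> str:
    """One-line summary, the kind of text a diff card might show."""
    if isinstance(action, Transfer):
        return f"Transfer EUR {action.amount_eur} from {action.from_account} to {action.to_account}"
    if isinstance(action, RecurringSplit):
        parts = ", ".join(f"{name} {pct}%" for name, pct in action.shares.items())
        return f"Split each {action.trigger}: {parts}"
    if isinstance(action, ConditionalFreeze):
        return f"Freeze card {action.card} after {action.after_hour:02d}:00"
    return f"Limit card {action.card} to EUR {action.max_eur} per transaction"


# The spoken example above, written out by hand as a Plan.
plan = Plan(
    transcript="move 400 to rent, split my next salary 60/30/10, freeze my travel card after 10pm",
    actions=(
        Transfer("main", "rent", Decimal("400")),
        RecurringSplit("salary", {"rent": 60, "groceries": 30, "savings": 10}),
        ConditionalFreeze("travel", 22),
    ),
)
for action in plan.actions:
    print(describe(action))
```

Because every action is plain data with a kind tag, one generic card component can render all four types from a summary line like the one describe returns.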

How we built it

  • Backend: Python · FastAPI · bunq SDK · LLM planner · SQLite. Two core endpoints — /plan (text → typed Plan) and /execute (selected indices → bunq calls). /events is a Server-Sent Events stream for live rule firings.
  • Web: React + Vite + TypeScript + Tailwind + framer-motion, Web Speech API for transcription, animated diff cards, status pills for bunq / llm / events.
  • Mobile: native Android + iOS apps sharing one backend, each using the platform-native speech recognizer (android.speech.SpeechRecognizer, SFSpeechRecognizer).
  • Shape: voice → transcript → /plan → diff cards → user-selected indices → /execute → bunq → SSE firings → toasts. Same loop on every client.
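
The /execute step in that loop can be sketched as a pure function: it receives the proposed actions plus the indices the user ticked, and only those reach the bank client. This is an illustrative sketch, not our actual handler; Bank, RecordingBank, and execute_selected are hypothetical names, and RecordingBank stands in for the real bunq SDK calls.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Transfer:
    from_account: str
    to_account: str
    amount_cents: int


class Bank(Protocol):
    """Hypothetical bank interface; the real backend would wrap the bunq SDK."""

    def transfer(self, from_account: str, to_account: str, amount_cents: int) -> None: ...


class RecordingBank:
    """Test double that records calls instead of moving money."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str, int]] = []

    def transfer(self, from_account: str, to_account: str, amount_cents: int) -> None:
        self.calls.append((from_account, to_account, amount_cents))


def execute_selected(actions: Sequence[Transfer], selected: Sequence[int], bank: Bank) -> list[int]:
    """Run only the approved actions, in plan order, and return their indices."""
    # Validate every index first so a malformed request executes nothing.
    bad = [i for i in selected if not 0 <= i < len(actions)]
    if bad:
        raise IndexError(f"unknown action indices: {bad}")
    executed = []
    for i in sorted(set(selected)):  # de-duplicate so nothing runs twice
        action = actions[i]
        bank.transfer(action.from_account, action.to_account, action.amount_cents)
        executed.append(i)
    return executed


proposed = [
    Transfer("main", "rent", 40_000),
    Transfer("main", "groceries", 15_000),
    Transfer("main", "savings", 5_000),
]
bank = RecordingBank()
print(execute_selected(proposed, [2, 0, 2], bank))  # the user ticked cards 0 and 2
print(bank.calls)
```

The planner output never appears in the executor's call path except as data the user has already approved, which is the property the loop depends on.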

Challenges we ran into

  • Forcing the LLM to only emit a structured diff (no tool calls, no side effects) without hallucinating account IDs took aggressive schema constraints + retry-on-parse-fail.
  • bunq sandbox quirks: sub-account rate limits, the OAuth/installation/device-server dance, and callbacks needing a publicly reachable URL during a hackathon.
  • No first-class SSE client on mobile, so we parsed the event: / data: framing by hand from the raw HTTP response body.
  • Native speech parity: Android partial results, iOS permission prompts, and Safari's silent Web Speech all had to be hidden behind one SpeechRecognizer interface.
  • Per-platform backend URLs (10.0.2.2 on the Android emulator vs localhost vs a LAN IP), solved with a runtime-overridable config.
  • Designing four different action types (transfer, recurring split, conditional freeze, tx limit) so they all render as scannable, tickable cards without bespoke components per type.
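
The hand-rolled SSE parsing mentioned above comes down to a small state machine over lines. The sketch below is in Python for consistency with the backend (the mobile clients do the equivalent in their own languages); it handles only the event and data fields plus comment lines, ignores id and retry, and the rule_fired event name is a made-up example.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event_name, data) pairs from the lines of an SSE response body."""
    event, data = "message", []  # "message" is the default event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; dispatch only if it had data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)  # multiple data lines are joined with newlines


body = [
    "event: rule_fired\n",
    'data: {"rule": "salary"}\n',
    "\n",
    ": keep-alive\n",
    "data: a\n",
    "data: b\n",
    "\n",
]
for name, payload in parse_sse(body):
    print(name, repr(payload))
```

An event that is still open when the stream ends is dropped on purpose: without the terminating blank line the client cannot know the event is complete.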

Accomplishments that we're proud of

  • Three clients (web + Android + iOS) on one backend, at feature parity, in 70 hours.
  • A real plan → diff → approve → execute loop against the live bunq API — not a mock.
  • The "LLM proposes, human executes" boundary is enforced architecturally, not by prompting.
  • End-to-end live rule firings: demo button → backend rule engine → SSE → mobile toast → sub-account hot-flash, sub-second.

What we learned

  • For agentic apps that touch real-world state, the diff is the product. Voice is just the input modality.
  • Structured outputs + a strict schema beat clever prompting for consumer-grade reliability.
  • SSE is still the lowest-friction way to push events to a mobile app — no socket infra, no FCM/APNs round-trip, just a long-lived HTTP stream.
  • bunq's sandbox teaches you what a well-designed banking API looks like; most of our planner schema is shaped by what bunq endpoints actually accept.
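
The "strict schema plus retry" lesson can be sketched as a validator that rejects anything not matching the expected shape, including account names the user does not own, wrapped in a bounded retry loop. Everything here is an assumption for illustration: the JSON shape, the field names, and the generate callable that stands in for the LLM call. Only transfers are validated to keep the sketch short.

```python
import json
from typing import Callable


class PlanError(ValueError):
    """Raised when planner output does not match the expected shape."""


def validate_plan(raw: str, known_accounts: set[str]) -> list[dict]:
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise PlanError(f"not valid JSON: {exc}") from exc
    if not isinstance(doc, dict) or not isinstance(doc.get("actions"), list):
        raise PlanError("expected an object with an 'actions' list")
    for i, action in enumerate(doc["actions"]):
        if not isinstance(action, dict) or action.get("type") != "transfer":
            raise PlanError(f"action {i}: unsupported or missing type")
        for key in ("from", "to"):
            name = action.get(key)
            # Only accounts that really exist may appear; this catches invented IDs.
            if not isinstance(name, str) or name not in known_accounts:
                raise PlanError(f"action {i}: unknown account in '{key}'")
        amount = action.get("amount_cents")
        if isinstance(amount, bool) or not isinstance(amount, int) or amount <= 0:
            raise PlanError(f"action {i}: amount_cents must be a positive integer")
    return doc["actions"]


def plan_with_retry(
    generate: Callable[[str], str],
    transcript: str,
    known_accounts: set[str],
    attempts: int = 3,
) -> list[dict]:
    """Call the planner until its output validates, feeding the error back each time."""
    prompt = transcript
    last_error = None
    for _ in range(attempts):
        try:
            return validate_plan(generate(prompt), known_accounts)
        except PlanError as exc:
            last_error = exc
            prompt = f"{transcript}\n\nPrevious output was rejected: {exc}"
    raise PlanError(f"planner failed after {attempts} attempts: {last_error}")


# A fake planner that invents an account first, then answers correctly.
replies = iter([
    '{"actions": [{"type": "transfer", "from": "main", "to": "yacht", "amount_cents": 40000}]}',
    '{"actions": [{"type": "transfer", "from": "main", "to": "rent", "amount_cents": 40000}]}',
])
actions = plan_with_retry(lambda prompt: next(replies), "move 400 to rent", {"main", "rent"})
print(actions)
```

Checking account names against the user's real sub-accounts is what turns a hallucinated ID into an ordinary validation failure instead of a bad API call.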

What's next for Vox

  • Production bunq OAuth instead of sandbox installation/device-server.
  • On-device LLM for the planner so transcripts never leave the phone — the executor stays server-side because it needs bunq credentials.
  • Richer rule grammar: time windows, geofencing, payee allowlists, merchant-category limits.
  • Undo / rewind 5 minutes — reverse the last executed plan in one tap, since every action is already a typed diff.
  • Shared household accounts where any member can speak an intent but execution requires the owner's approval.
  • Plan templates — save a frequently-spoken intent ("monthly bills") as a one-tap card.
  • Open the planner — publish the action schema so other Open Banking providers (Revolut, Monzo, N26) plug in behind the same voice + diff UX.
