Chill Bill — Automating the Boring Side of Money

Inspiration

As international students in a foreign country, we constantly dealt with unfamiliar bureaucracy: tax letters, fines, invoices, and forms in a language we didn’t fully understand. Handling them was slow, frustrating, and error-prone.

We realized this isn’t just our problem. Many users of bunq, especially expats and digital nomads, face the same challenges. At the same time, everyday tasks like splitting bills with friends are repetitive and tedious.

This led to a simple idea:

What if you could just upload a document or receipt, say what you want, and everything gets handled automatically?


What We Built

Chill Bill is a multimodal AI agent that automates financial and bureaucratic tasks.

Users can:

  • Upload receipts, invoices, fines, or tax letters (images, PDFs, screenshots)
  • Add instructions via text or voice
  • Let the agent:

    • Extract and understand the content
    • Translate and explain complex documents
    • Split expenses and generate payment links
    • Initiate payments via the bunq API

At a high level, the system maps user intent to structured actions:

\[ \text{Input (image/audio/text)} \rightarrow \text{Understanding} \rightarrow \text{Intent Extraction} \rightarrow \text{Action (API)} \]
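As a rough illustration of the last step in that chain, here is a minimal sketch of how an extracted intent might be mapped to an action. The `Action` schema, the intent names, and the handler functions are all hypothetical stand-ins, not the project's actual code, and the handlers only return strings instead of calling any API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    """A structured action extracted from user input (hypothetical schema)."""
    intent: str
    params: dict = field(default_factory=dict)


def explain_document(params: dict) -> str:
    # Stand-in handler: a real one would return a translated explanation.
    return f"Explaining document in {params.get('language', 'en')}"


def split_bill(params: dict) -> str:
    # Stand-in handler: a real one would create payment requests.
    return f"Splitting {params['total_cents']} cents among {len(params['people'])} people"


# Only intents registered here can ever trigger a handler.
HANDLERS: dict[str, Callable[[dict], str]] = {
    "explain_document": explain_document,
    "split_bill": split_bill,
}


def execute(action: Action) -> str:
    """Dispatch a structured action to its handler, rejecting unknown intents."""
    handler = HANDLERS.get(action.intent)
    if handler is None:
        raise ValueError(f"Unsupported intent: {action.intent!r}")
    return handler(action.params)
```

Keeping the set of executable intents in an explicit table means anything the model produces outside that table is rejected rather than acted on.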


How We Built It

We combined financial APIs with a multimodal AI layer:

  • AI Layer

    • Anthropic API with Claude Sonnet 4.6
    • Handles document understanding, reasoning, and intent extraction
    • Processes text, images, PDFs, and audio (transcribed to text first)
  • Backend

    • Python + FastAPI for orchestration
    • Structured outputs (JSON) for safe execution
    • Session-based chat + memory
  • Financial Layer

    • bunq API
    • Payment links, transactions, and account interactions
  • Multimodal Pipeline

    • Audio → transcription → intent
    • Image/PDF → OCR + semantic understanding
    • Unified into one reasoning flow
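The multimodal pipeline above could be sketched roughly as follows: each attachment is routed by file type, converted to text, and merged with the user's instruction into one context for the reasoning step. The function names are assumptions made for illustration, and `transcribe_audio` and `read_document` are placeholders that do not call any real transcription or OCR service.

```python
import mimetypes


def transcribe_audio(path: str) -> str:
    # Placeholder: a real system would call a speech-to-text service here.
    return f"[transcript of {path}]"


def read_document(path: str) -> str:
    # Placeholder: a real system would run OCR or pass the file to the model.
    return f"[text extracted from {path}]"


def modality_of(path: str) -> str:
    """Classify a file as 'audio' or 'document' based on its guessed MIME type."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"Cannot determine file type: {path}")
    if mime.startswith("audio/"):
        return "audio"
    if mime.startswith("image/") or mime == "application/pdf":
        return "document"
    raise ValueError(f"Unsupported file type {mime!r}: {path}")


def build_context(instruction: str, attachments: list[str]) -> str:
    """Merge the typed instruction and all attachments into one text context."""
    parts = [f"User instruction: {instruction}"]
    for path in attachments:
        if modality_of(path) == "audio":
            parts.append(f"Voice note: {transcribe_audio(path)}")
        else:
            parts.append(f"Document: {read_document(path)}")
    return "\n".join(parts)
```

For example, `build_context("split this", ["receipt.jpg", "note.mp3"])` yields a three-line context: the instruction, one document entry, and one voice-note entry.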

What We Learned

  • Multimodal AI is powerful but brittle: small ambiguities in user input can break downstream actions
  • Structured outputs are critical: free-form LLM responses are not safe for financial execution
  • User intent is harder than extraction: understanding what the user wants done is the real challenge
  • APIs + LLMs = agents: the real value comes from connecting reasoning to action
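To make the point about structured outputs concrete, here is a minimal sketch of validating a model's JSON reply before anything is executed. The field names (`intent`, `amount`) and the allowed intents are hypothetical; the actual schema used in the project is not described here.

```python
import json
from decimal import Decimal, InvalidOperation

# Hypothetical whitelist of intents the backend is willing to act on.
ALLOWED_INTENTS = {"explain_document", "split_bill", "pay_invoice"}


def parse_action(raw: str) -> dict:
    """Parse and validate a model reply; raise ValueError on anything unexpected."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")

    intent = data.get("intent")
    if intent not in ALLOWED_INTENTS:
        raise ValueError(f"unsupported intent: {intent!r}")

    result = {"intent": intent}
    if intent == "pay_invoice":
        try:
            # Decimal avoids binary floating-point rounding on money values.
            amount = Decimal(str(data["amount"]))
        except (KeyError, InvalidOperation) as exc:
            raise ValueError("missing or malformed amount") from exc
        if not amount.is_finite() or amount <= 0:
            raise ValueError("amount must be a positive number")
        if amount.as_tuple().exponent < -2:
            raise ValueError("amount has more than two decimal places")
        result["amount"] = amount
    return result
```

A free-form reply such as "Sure, I'll pay 12.50 for you" fails at the first step, so it never reaches the payment code.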

Challenges

  • Ambiguity in user instructions: users might say “split this fairly”, which requires interpretation, not just extraction

  • Document variability: receipts and government letters have inconsistent formats, layouts, and languages

  • Safety & reliability: financial actions require high precision, \( \text{Error cost} \gg \text{typical AI tolerance} \)

  • End-to-end orchestration: connecting multimodal input → reasoning → API execution without breaking the pipeline
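One possible reading of “split this fairly” is an equal split. The sketch below assumes that interpretation and works in integer cents so the shares always add up to the total; the function name and the rule of giving leftover cents to the first people in the list are illustrative choices, not the project's documented behavior.

```python
def split_equally(total_cents: int, people: list[str]) -> dict[str, int]:
    """Split a total (in cents) so the shares sum exactly to the total."""
    if total_cents < 0:
        raise ValueError("total must not be negative")
    if not people:
        raise ValueError("at least one person is required")
    if len(set(people)) != len(people):
        raise ValueError("names must be unique")

    base, remainder = divmod(total_cents, len(people))
    # The first `remainder` people pay one extra cent to cover the leftover.
    return {
        name: base + (1 if index < remainder else 0)
        for index, name in enumerate(people)
    }


shares = split_equally(1000, ["Ana", "Ben", "Chen"])
print(shares)  # {'Ana': 334, 'Ben': 333, 'Chen': 333}
```

Other readings of "fairly", such as splitting by what each person ordered, would need item-level data from the receipt and a confirmation step with the user.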


Why It Matters

Chill Bill reduces friction in everyday financial life, especially for people navigating unfamiliar systems. By automating both social finance (splitting bills) and bureaucracy (documents, taxes, fines), it turns frustrating tasks into a single interaction.

Upload. Say what you need. Done.

Built With

  • adb
  • amazon-web-services
  • android
  • anthropic-api
  • anthropic-claude-sonnet-4.6
  • audio
  • aws-transcribe
  • bash
  • bunq-api
  • conda
  • docker-(optional)
  • fastapi
  • git
  • http/multipart
  • images
  • jetpack-compose
  • json
  • kotlin
  • mediarecorder
  • multimodal-ai-(text, pdf)
  • ocr
  • pip
  • python
  • rest-apis
  • speech-to-text