Chill Bill — Automating the Boring Side of Money
Inspiration
As international students in a foreign country, we constantly dealt with unfamiliar bureaucracy: tax letters, fines, invoices, and forms in a language we didn’t fully understand. It was slow, frustrating, and error-prone.
We realized this isn’t just our problem. Many users of bunq, especially expats and digital nomads, face the same challenges. At the same time, everyday tasks like splitting bills with friends are repetitive and tedious.
This led to a simple idea:
What if you could just upload a document or receipt, say what you want, and everything gets handled automatically?
What We Built
Chill Bill is a multimodal AI agent that automates financial and bureaucratic tasks.
Users can:
- Upload receipts, invoices, fines, or tax letters (images, PDFs, screenshots)
- Add instructions via text or voice
The agent can then:
- Extract and understand the content
- Translate and explain complex documents
- Split expenses and generate payment links
- Initiate payments via the bunq API
At a high level, the system maps user intent to structured actions:
\[ \text{Input (image/audio/text)} \rightarrow \text{Understanding} \rightarrow \text{Intent Extraction} \rightarrow \text{Action (API)} \]
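The last step of that mapping, from extracted intent to action, can be sketched as a small dispatcher. This is only an illustration under assumptions, not the code we shipped: the `Intent` type, the action names `split_bill` and `explain_document`, and both handlers are hypothetical, and the handlers return a description instead of calling any real API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Intent:
    """Hypothetical result of the intent-extraction step."""
    action: str
    params: dict = field(default_factory=dict)


def handle_split_bill(params: dict) -> str:
    # Placeholder: a real handler would call the payments API here.
    return f"split {params['total']} among {len(params['people'])} people"


def handle_explain_document(params: dict) -> str:
    # Placeholder: a real handler would ask the model for an explanation.
    return f"explain document in {params['language']}"


# Only actions listed here can ever be executed.
HANDLERS: Dict[str, Callable[[dict], str]] = {
    "split_bill": handle_split_bill,
    "explain_document": handle_explain_document,
}


def dispatch(intent: Intent) -> str:
    """Route an extracted intent to its handler and reject unknown actions."""
    handler = HANDLERS.get(intent.action)
    if handler is None:
        raise ValueError(f"unsupported action: {intent.action!r}")
    return handler(intent.params)
```

Keeping the set of actions in an explicit table means an unexpected model output fails loudly instead of triggering something unintended.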
How We Built It
We combined financial APIs with a multimodal AI layer:
AI Layer
- Anthropic API with Claude Sonnet 4.6
- Handles document understanding, reasoning, and intent extraction
- Processes text, images, PDFs, and audio (transcribed)
Backend
- Python + FastAPI for orchestration
- Structured outputs (JSON) for safe execution
- Session-based chat + memory
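To illustrate why structured outputs matter for safe execution, here is a minimal sketch of validating a model's JSON before anything touches money. The field names (`action`, `amount`, `currency`, `description`) and the allowed action names are assumptions made for this example, not the schema we actually used.

```python
import json
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation

# Hypothetical action names; anything else is rejected.
ALLOWED_ACTIONS = {"create_payment_link", "split_bill"}


@dataclass(frozen=True)
class PaymentAction:
    action: str
    amount: Decimal
    currency: str
    description: str


def parse_action(raw: str) -> PaymentAction:
    """Parse and validate model output; raise ValueError on anything unexpected."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    action = data.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")

    try:
        # Go through str() so the amount is read from its decimal text.
        amount = Decimal(str(data["amount"]))
    except (KeyError, InvalidOperation) as exc:
        raise ValueError("missing or malformed amount") from exc
    if not amount.is_finite() or amount <= 0:
        raise ValueError("amount must be a positive, finite number")

    currency = data.get("currency")
    if not isinstance(currency, str) or len(currency) != 3:
        raise ValueError("currency must be a 3-letter code")

    return PaymentAction(action, amount, currency.upper(),
                         str(data.get("description", "")))
```

For example, `parse_action('{"action": "create_payment_link", "amount": "12.50", "currency": "eur"}')` yields a `PaymentAction` with a `Decimal` amount, while free-form text or a negative amount raises `ValueError` before any API call is attempted.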
Financial Layer
- bunq API
- Payment links, transactions, and account interactions
Multimodal Pipeline
- Audio → transcription → intent
- Image/PDF → OCR + semantic understanding
- Unified into one reasoning flow
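The pipeline above can be sketched as a normalization step that turns every input into labeled text before reasoning. This is a simplified illustration: `Attachment`, `transcribe_audio`, and `read_document` are hypothetical names, and the two extractor functions are stubs that decode bytes where a real system would call a speech-to-text service or a document-understanding model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Attachment:
    kind: str       # "audio", "image", "pdf", or "text"
    payload: bytes


def transcribe_audio(payload: bytes) -> str:
    # Hypothetical stub standing in for a speech-to-text call.
    return payload.decode("utf-8")


def read_document(payload: bytes) -> str:
    # Hypothetical stub standing in for OCR plus document understanding.
    return payload.decode("utf-8")


def read_text(payload: bytes) -> str:
    return payload.decode("utf-8")


EXTRACTORS: Dict[str, Callable[[bytes], str]] = {
    "audio": transcribe_audio,
    "image": read_document,
    "pdf": read_document,
    "text": read_text,
}


def build_context(attachments: List[Attachment]) -> str:
    """Convert each attachment to text and merge everything into one labeled context."""
    parts = []
    for item in attachments:
        extractor = EXTRACTORS.get(item.kind)
        if extractor is None:
            raise ValueError(f"unsupported attachment kind: {item.kind!r}")
        parts.append(f"[{item.kind}] {extractor(item.payload)}")
    return "\n".join(parts)
```

Labeling each part with its source modality lets a single reasoning step see, for instance, that the total came from a receipt image while the instruction came from a voice note.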
What We Learned
- Multimodal AI is powerful but brittle: small ambiguities in user input can break downstream actions
- Structured outputs are critical: free-form LLM responses are not safe for financial execution
- User intent is harder than extraction: understanding what the user wants done is the real challenge
- APIs + LLMs = agents: the real value comes from connecting reasoning to action
Challenges
- Ambiguity in user instructions: users might say “split this fairly”, which requires interpretation, not just extraction
- Document variability: receipts and government letters have inconsistent formats, layouts, and languages
- Safety and reliability: financial actions require high precision: \[ \text{Error cost} \gg \text{typical AI tolerance} \]
- End-to-end orchestration: connecting multimodal input → reasoning → API execution without breaking the pipeline
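One way to tame an instruction like “split this fairly” is to fall back to a deterministic default and show the result to the user before acting. The sketch below assumes that “fairly” means equal shares, which is an interpretation chosen for this example, and works in integer cents so the shares always add up to the total. The function name `split_equally` is hypothetical.

```python
from typing import Dict, List


def split_equally(total_cents: int, people: List[str]) -> Dict[str, int]:
    """Split an amount in integer cents; the first people listed absorb leftover cents."""
    if total_cents <= 0:
        raise ValueError("total must be positive")
    if not people or len(set(people)) != len(people):
        raise ValueError("people must be a non-empty list of unique names")
    base, remainder = divmod(total_cents, len(people))
    return {
        name: base + (1 if index < remainder else 0)
        for index, name in enumerate(people)
    }
```

Splitting 10.00 (1000 cents) among Ana, Ben, and Chi gives 334, 333, and 333 cents. No cent is lost to rounding, which matters given how costly errors are in financial actions.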
Why It Matters
Chill Bill reduces friction in everyday financial life, especially for people navigating unfamiliar systems. By automating both social finance (splitting bills) and bureaucracy (documents, taxes, fines), it turns frustrating tasks into a single interaction.
Upload. Say what you need. Done.