potbot

Inspiration

Shared-finance apps fail because logging an expense means leaving the conversation where the expense happened. Most users can't complete moderately complex flows in dedicated apps, let alone prompt an AI. Meanwhile, every "agent" I've seen in Slack or Teams is noisy and annoying.

So the question became: what if the agent shut up? What if it sat quietly in your group chat, the one you already share with your partner or roommates, and just listened?

What it does

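Add the bot to any group, then live normally: type "kupil som pivo za 3" (Slovak for "I bought beer for 3"), forward a receipt photo, send a voice note saying "andrej mne 10 za benzin" (roughly "Andrej owes me 10 for petrol"), or just write "obed 15" ("lunch 15"). Text and voice notes are parsed and logged silently into a local SQLite ledger; attachments are saved to the message audit log.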

The bot never initiates and never replies to free-form messages — no confirmations, no "got it!", nothing. When you want to see state, you ask:

/expenses [N] — last N active entries.
/balance — net per-member balance in this chat.
/seed name1 name2 … — add synthetic participants (test helper).
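
For illustration, here is a minimal sketch of how a net per-member balance like the one /balance reports could be computed. The table layout (chat_id, payer, ower, cents, note, active) and the function name net_balances are assumptions made for this example, not the project's actual schema.

```python
import sqlite3
from collections import defaultdict


def net_balances(conn, chat_id):
    """Return net cents per member; a positive value means the member is owed money."""
    balances = defaultdict(int)
    rows = conn.execute(
        "SELECT payer, ower, cents FROM expenses WHERE chat_id = ? AND active = 1",
        (chat_id,),
    )
    for payer, ower, cents in rows:
        balances[payer] += cents  # the payer fronted the money
        balances[ower] -= cents   # the ower is in debt by the same amount
    return dict(balances)


# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE expenses ("
    "chat_id TEXT, payer TEXT, ower TEXT, cents INTEGER, note TEXT, "
    "active INTEGER DEFAULT 1)"
)
conn.executemany(
    "INSERT INTO expenses (chat_id, payer, ower, cents, note) VALUES (?, ?, ?, ?, ?)",
    [("g1", "me", "andrej", 1000, "benzin"), ("g1", "andrej", "me", 300, "pivo")],
)
print(net_balances(conn, "g1"))  # {'me': 700, 'andrej': -700}
```

Because every entry adds to one member and subtracts the same amount from another, the balances in a chat always sum to zero, which makes a cheap sanity check.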

How we built it

Messaging. WhatsApp via neonize. We went this route deliberately after the Meta Cloud API turned out not to support group messages — a dealbreaker for the whole premise. Neonize gives us full group visibility (members, display names, media, voice notes) on a personal number.

The ReAct agent. Messages that pass the filter go to a DSPy ReAct agent (agent.py) wired to four SQLite-backed tools: add_expense, remove_expense, list_expenses, show_balances. The signature is aggressive about not replying: the prompt literally says "you do NOT reply to the user, the user reads no text from you." The agent parses direction (payer vs. ower), auto-creates unknown people as they're mentioned, and deduplicates tool calls within a turn so ReAct loops can't double-insert.
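
A sketch of what an add_expense-style tool with auto-created people might look like, using only the standard library. The schema, the get_or_create_person helper, the chat_id parameter, and the returned status strings are all assumptions for this example; the real tools in agent.py may be shaped differently.

```python
import sqlite3

# Hypothetical schema; the project's real tables may differ.
SCHEMA = """
CREATE TABLE people (
    id INTEGER PRIMARY KEY,
    chat_id TEXT NOT NULL,
    name TEXT NOT NULL,
    UNIQUE (chat_id, name)
);
CREATE TABLE expenses (
    id INTEGER PRIMARY KEY,
    chat_id TEXT NOT NULL,
    payer_id INTEGER NOT NULL REFERENCES people(id),
    ower_id INTEGER NOT NULL REFERENCES people(id),
    cents INTEGER NOT NULL,
    note TEXT NOT NULL DEFAULT ''
);
"""


def get_or_create_person(conn, chat_id, name):
    """Return the id for a name in this chat, inserting the person on first mention."""
    key = name.strip().lower()  # "Andrej" and "andrej" map to the same person
    row = conn.execute(
        "SELECT id FROM people WHERE chat_id = ? AND name = ?", (chat_id, key)
    ).fetchone()
    if row is not None:
        return row[0]
    return conn.execute(
        "INSERT INTO people (chat_id, name) VALUES (?, ?)", (chat_id, key)
    ).lastrowid


def add_expense(conn, chat_id, payer, ower, amount, note=""):
    """Record that `ower` owes `payer` the amount; returns a short status string for the agent."""
    cents = round(amount * 100)  # integer cents avoid float drift in the ledger
    if cents <= 0:
        return "error: amount must be positive"
    payer_id = get_or_create_person(conn, chat_id, payer)
    ower_id = get_or_create_person(conn, chat_id, ower)
    expense_id = conn.execute(
        "INSERT INTO expenses (chat_id, payer_id, ower_id, cents, note) "
        "VALUES (?, ?, ?, ?, ?)",
        (chat_id, payer_id, ower_id, cents, note),
    ).lastrowid
    conn.commit()
    return f"ok: expense {expense_id} recorded"


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(add_expense(conn, "g1", "me", "Andrej", 10, "benzin"))
print(add_expense(conn, "g1", "andrej", "me", 3, "pivo"))
```

Returning a short string rather than raising keeps the tool easy for an agent loop to consume: the result goes back into the reasoning trace and never reaches the chat.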

Voice. ElevenLabs scribe_v2 via their SDK. The transcript is prepended to any caption text and then fed to the same classifier/agent pipeline — voice is just text with extra steps.
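
The "transcript first, then caption" step can be sketched as a small pure function. The name build_message_text and the newline separator are assumptions for this example; the transcription call itself is left out.

```python
from typing import Optional


def build_message_text(transcript: Optional[str], caption: Optional[str]) -> str:
    """Join the voice transcript and any caption, transcript first, skipping empty parts."""
    parts = [p.strip() for p in (transcript, caption) if p and p.strip()]
    return "\n".join(parts)


print(build_message_text("andrej mne 10 za benzin", "vcera"))
```

Downstream code then sees one string, whether the message started as typed text, a voice note, or both.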

Receipts / PDFs. Images and documents are downloaded and stored in the message audit log (conversation_log.jsonl). A separate DSPy extractor (extractor.py), accessible via the CLI, uses the attachments library to feed receipt images and PDFs to a multimodal LM and pull out from_whom / to_whom / what / price. Wiring this into the live WhatsApp path is the obvious next step but isn't in yet — the live bot currently uses text + voice-transcript only.

Challenges we ran into

WhatsApp access. We started on the Meta Cloud API and bounced off it: it's gated behind a verified business number and doesn't deliver group messages, which rules out the entire use case. Switching to neonize (WhatsApp Web session) cost us a weekend but unblocked everything.

ReAct loops. Duplicate tool calls during reasoning were silently inserting the same expense twice. We added a per-turn _seen_writes guard keyed on (payer, ower, cents, note) that refuses duplicates and returns a sentinel string so the agent moves on.
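
A minimal sketch of such a per-turn guard, assuming a fresh guard object is created for each incoming message. The class name TurnWriteGuard, the sentinel text, and the insert callback are hypothetical; only the _seen_writes name and the (payer, ower, cents, note) key come from the description above.

```python
DUPLICATE_SENTINEL = "duplicate: already recorded this turn, continue"


class TurnWriteGuard:
    """Wraps a write function and refuses identical writes within one agent turn."""

    def __init__(self, insert):
        self._insert = insert       # callable that performs the real insert
        self._seen_writes = set()   # keys already written during this turn

    def add_expense(self, payer, ower, cents, note):
        key = (payer, ower, cents, note)
        if key in self._seen_writes:
            # Tell the agent the write already happened so it moves on.
            return DUPLICATE_SENTINEL
        self._seen_writes.add(key)
        return self._insert(payer, ower, cents, note)


ledger = []


def insert(payer, ower, cents, note):
    ledger.append((payer, ower, cents, note))
    return "ok"


guard = TurnWriteGuard(insert)  # one guard per incoming message
first = guard.add_expense("me", "andrej", 1000, "benzin")
second = guard.add_expense("me", "andrej", 1000, "benzin")  # repeated call from the loop
print(first, "|", second, "|", len(ledger))
```

Scoping the set to a single turn matters: two genuinely identical expenses sent in separate messages should both be recorded, so the guard must not outlive the message that created it.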

Accomplishments that we're proud of

A live WhatsApp group demo where receipts, voice notes, and free-text expenses all flow into a structured ledger without a single prompt-engineering moment for the user.
Getting the "silent" interaction model right: the bot is genuinely invisible until called, which is rarer than it should be.

What we learned

Distribution beats features. A mediocre tool in WhatsApp beats a great tool in a standalone app.
Agents don't have to be chatty. There's a whole design space for agents that are mostly silent and occasionally useful.
Official platform APIs lie about their scope. "Supports messaging" and "supports group messaging" are very different sentences.
LLMs are great at the messy middle. Parsing "riso mi dal 20 za benzin" (Slovak for "Riso gave me 20 for petrol") into (riso → me, 20 EUR, petrol) is now one signature.