Brim Expense Intelligence

Inspiration

Small and medium businesses generate thousands of card transactions every month, but they're flying blind — the data sits in spreadsheets nobody reads, expense policies live in a forgotten PDF, and finance teams spend their days chasing receipts and rubber-stamping approvals instead of finding the money they're losing. Brim's challenge was simple: make the data talk. We took it further — we wanted a tool that doesn't just chart spend, but proactively finds waste, enforces policy, and drafts the fix, while always leaving the human in command.

What it does

Brim Expense Intelligence is an AI spend-intelligence co-pilot for finance teams, built on real (anonymized) card data from a US/Canada heavy-haul trucking fleet: about $1.5M across 4,235 transactions over six months.

Talk to your data — ask any question in plain English; an agent writes a live MongoDB aggregation, reads the real rows, and answers with a data-driven chart and follow-up context.
Findings Review — an AI analyst investigates each anomaly by querying the data itself (duplicates, inflated fees, outliers), then drafts the fix. You triage by keyboard; nothing is ever auto-sent.
Policy Compliance Engine — three views: Reality Gap shows where spend has no controls; Rule Studio lets you author rules in plain English and simulate them against the six months of transaction history before enforcing; Compliance catches violations live, including split-charge structuring, repeat offenders, and missing receipts.
AI Pre-Approval — an inbox that surfaces vendor history, department budget headroom, policy checks, and an approve/review/deny recommendation with reasoning.
Receipt matching — every charge shows whether a receipt is linked, matched, or mismatched, with a viewable receipt document.
Expense Reports & department budget forecasting — CFO-ready reports, plus next-month overrun projections.
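
To make "Talk to your data" concrete, here is a minimal sketch in Python of the kind of aggregation the agent might write for a question like "who are our top merchants by spend?". The field names `merchant` and `amount` and the helper `top_merchants_pipeline` are assumptions for illustration; only the `txns` collection name comes from the project.

```python
from typing import Any


def top_merchants_pipeline(limit: int = 5) -> list[dict[str, Any]]:
    """Build a read-only pipeline: total spend per merchant, largest first.

    The field names "merchant" and "amount" are hypothetical.
    """
    return [
        # Sum the amounts and count the charges for each merchant.
        {"$group": {
            "_id": "$merchant",
            "total": {"$sum": "$amount"},
            "count": {"$sum": 1},
        }},
        # Largest total first.
        {"$sort": {"total": -1}},
        # Keep only the top `limit` merchants.
        {"$limit": limit},
    ]


# With PyMongo this could be run as:
#   rows = list(db["txns"].aggregate(top_merchants_pipeline()))
```

The pipeline uses only `$group`, `$sort`, and `$limit`, so it reads data without writing anything, and its output rows map directly onto a bar chart.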

Throughout: AI recommends, you decide. The AI only ever drafts; its database access is read-only.
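
One way to enforce the read-only guarantee is to reject any model-written pipeline that contains a write stage before it reaches the database. The sketch below is in Python and is an assumption about the approach, not the project's actual code; `assert_read_only` is a hypothetical helper. It relies on `$out` and `$merge` being the MongoDB aggregation stages that write to a collection.

```python
from typing import Any

# Aggregation stages that write results to a collection.
WRITE_STAGES = {"$out", "$merge"}


def assert_read_only(pipeline: Any) -> None:
    """Raise ValueError if the pipeline could write data at any nesting depth."""

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                if key in WRITE_STAGES:
                    raise ValueError(f"write stage {key} is not allowed")
                walk(value)  # descend into sub-pipelines ($lookup, $facet, ...)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    if not isinstance(pipeline, list):
        raise ValueError("pipeline must be a list of stages")
    walk(pipeline)
```

Pairing a check like this with a database user that only has read privileges gives two independent layers, so a mistake in either one does not expose the data to writes.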

How we built it

Frontend — Vite + React SPA, Recharts for visualizations, a clean Brim-inspired design system.
Backend — Express + Node, with MongoDB Atlas as the single source of truth (the txns collection).
AI — the Anthropic Claude API via a custom agentic loop: the model is given a read-only run_query tool, decides what MongoDB aggregations to run, reads the results, and self-corrects — producing a visible reasoning trail rather than a one-shot answer.
Robust anomaly engine — pure, deterministic statistics (median + MAD, shrinkage baselines, noisy-OR signal combination) so dollar figures are never hallucinated; the AI supplies judgment on top of real numbers.
Deterministic guardrails — read-only aggregation enforcement, canonical policy-rule dedup, and seeded/curated data (a simulated roster and a receipts CSV) layered on top of the real transactions.
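
The statistics named in the anomaly engine can be sketched in a few lines of Python. This is an illustrative reconstruction, not the project's code: the function names, the prior weight `k`, and the exact shrinkage form are assumptions. The 1.4826 factor is the standard constant that makes the MAD comparable to a standard deviation for normally distributed data.

```python
import math
from statistics import median


def robust_z(value: float, history: list[float]) -> float:
    """Distance of `value` from the median of `history`, in scaled-MAD units."""
    med = median(history)
    mad = median(abs(x - med) for x in history)
    if mad == 0:
        # No spread in the history: any deviation at all is infinitely unusual.
        return 0.0 if value == med else math.copysign(math.inf, value - med)
    return (value - med) / (1.4826 * mad)


def shrunk_baseline(local_mean: float, n: int, global_mean: float, k: float = 5.0) -> float:
    """Pull a small-sample local mean toward the global mean.

    `k` (an assumed tuning knob) acts like k pseudo-observations at the global mean.
    """
    return (n * local_mean + k * global_mean) / (n + k)


def noisy_or(probabilities: list[float]) -> float:
    """Combine independent signal probabilities: 1 - prod(1 - p_i)."""
    miss = 1.0
    for p in probabilities:
        miss *= 1.0 - p
    return 1.0 - miss
```

For example, `noisy_or([0.5, 0.5])` gives 0.75: two moderate signals together are stronger than either alone, but the combined score never exceeds 1.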

Challenges we ran into

Hallucination-proofing the numbers — every dollar figure had to come from a real MongoDB aggregation, never the model. We split each feature into a deterministic data layer and an AI judgment layer.
The $50-threshold trap — the policy flags everything over $50, so naïve checks flagged every charge as "needs review." We had to make the AI reason decisively from context instead of restating the rule.
No employee data — the dataset has no people or departments, so we deterministically simulated a roster on top of real transactions. This also broke our first split-charge detector (clustering by department fragmented every group) until we re-keyed it to merchant + day.
Agentic reliability — getting the tool-use loop to self-correct on bad queries, converge, and return clean JSON took careful prompt and parser design.
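
The re-keyed split-charge detector can be sketched as follows, in Python. Grouping by merchant and day comes from the description above; the `Txn` shape, the function name, and the exact flagging rule (two or more charges that each fit within the limit but together exceed it) are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class Txn(NamedTuple):
    merchant: str
    day: date
    amount: float


def find_split_charges(txns: list[Txn], limit: float) -> list[list[Txn]]:
    """Return groups of same-merchant, same-day charges that look structured."""
    groups: dict[tuple[str, date], list[Txn]] = defaultdict(list)
    for t in txns:
        groups[(t.merchant, t.day)].append(t)

    flagged = []
    for group in groups.values():
        total = sum(t.amount for t in group)
        # Each charge fits within the limit on its own, but the group does not.
        if len(group) >= 2 and all(t.amount <= limit for t in group) and total > limit:
            flagged.append(group)
    return flagged
```

With a $500 limit, two $300 charges at the same merchant on the same day are flagged as one group, while a single $300 charge, or two $300 charges on different days, are not.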

Accomplishments that we're proud of

A genuinely agentic analyst that investigates before it judges — not a single-prompt wrapper.
Split-charge / structuring detection with AI contextual review — catching the exact "two $300s to duck a $500 limit" pattern from the brief.
Every answer in Talk to your data is guaranteed to include a chart generated from real Atlas data.
A cohesive, polished product where all features share one design language and one principle: human-in-command.

What we learned

The strongest AI products pair deterministic math with model judgment — let MongoDB compute the truth, let the model explain and decide.
Context beats rules: a $200 team dinner isn't a $200 solo dinner, and good UX makes that distinction legible to a non-technical finance manager.
Agentic loops with a visible reasoning trail are what make "AI depth" believable to a user — showing the work matters as much as the answer.
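
The shape of such a loop can be sketched in Python with the model and the database stubbed out. Everything here is hypothetical: `ask_model` stands in for the Claude API call and `run_query` for the read-only query tool, and the reply format is an assumption, not the project's actual protocol. The point is that failed queries are recorded in the trail and fed back so the model can correct itself, and that the loop is bounded.

```python
from typing import Any, Callable

Step = dict[str, Any]


def run_agent(
    question: str,
    ask_model: Callable[[str, list[Step]], dict[str, Any]],
    run_query: Callable[[list[dict[str, Any]]], list[dict[str, Any]]],
    max_steps: int = 5,
) -> tuple[str, list[Step]]:
    """Let the model query until it answers; return the answer and the trail."""
    trail: list[Step] = []
    for _ in range(max_steps):
        reply = ask_model(question, trail)  # the model sees every prior step
        if reply["type"] == "answer":
            return reply["text"], trail
        pipeline = reply["pipeline"]
        try:
            trail.append({"pipeline": pipeline, "result": run_query(pipeline)})
        except Exception as exc:
            # Keep the error in the trail so the next model call can self-correct.
            trail.append({"pipeline": pipeline, "error": str(exc)})
    raise RuntimeError("agent did not converge within max_steps")
```

The returned trail is what a UI would render as the visible reasoning: each query the agent tried, what came back, and which attempts failed.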

What's next for Brim Expense Intelligence

Persistence — move in-memory rules, limits, and receipt submissions into MongoDB for a multi-session, multi-user product.
Real receipt OCR — match uploaded images/PDFs to charges automatically instead of curated metadata.
Live ingestion & alerts — stream new transactions and push proactive overrun/anomaly alerts ("at this burn rate, Maintenance exceeds budget by week 8").
Per-employee profiles & peer benchmarking once real cardholder data is available.
Deeper vendor consolidation analysis — "you're paying 4 fuel vendors; here's what you'd save."