Inspiration

Two facts collided in our heads in early 2026.

One: Pennylane just raised €175M at a $4.25B valuation on the pitch "AI assists the accountant." Capital One paid $5.15B for Brex on "agent + spend attribution." Meanwhile, every CFO of every European scale-up we spoke to had the same two problems: their books were 30 days dark, and their Anthropic bill was climbing every month with no idea who,
internally, was triggering it.

Two: the EU AI Act starts full enforcement on August 2, 2026. Article 6 will require that automated decisions inside regulated workflows — and accounting absolutely qualifies — be explainable, auditable, and traceable to the policy the model relied on. Today, every competitor logs agent activity as an opaque JSON sidecar and calls it "agents cite their
work." Nobody exposes (journal_line_id, decision_id, wiki_revision_id) as a queryable contract.

We saw a 12-month window to plant a flag, and built it.

## What it does

Agnes is the first self-improving accounting agent for European SMEs — an autonomous CFO that:

  • Posts journal entries in under five seconds of a Swan webhook firing — no nightly
    batch, no eventually-consistent reconciliation. The journal entry, the budget envelope decrement, the per-employee cost attribution, and the audit trace all land inside one
    BEGIN IMMEDIATE transaction.
  • Attributes every AI token to the employee who triggered it. One SQL GROUP BY answers "Anthropic billed us €1,240 this month — Marie burned €480 on the pricing agent, Sophie
    €320 on document extraction, Paul €110."
    Per-employee × per-pipeline × per-model, at micro-USD precision.
  • Re-writes its own rulebook. Every monthly close, a post-mortem agent observes what
    slipped through ("a €300 dinner passed the €250 limit"), drafts the policy tweak, and files it to a markdown Living Rule Wiki that the agents themselves re-read on the very
    next run. The CFO ratifies the change in one click. The system gets cheaper and more accurate every month.
  • Stamps EU AI Act provenance on every line. Click any number on the P&L → see the agent that ran, the model, the prompt-hash, the wiki revision it cited, the alternatives it
    rejected, and the cost in cents. Provenance is a SQL JOIN, not a PDF export.
  • Speaks MCP natively. The full agentic surface — run pipelines, query the GL, approve entries, read and write the wiki — is exposed as an MCP server, so any MCP-aware client (Claude Desktop, Cursor, a custom CFO copilot) drives Agnes the same way the dashboard does. ## How we built it

The architecture is uncompromising on the boring parts, because that's where the moat lives.

Three SQLite databases, one writer each. accounting.db (the GL truth), orchestration.db (run metadata, wiki pages + revisions, prompt cache), and audit.db
(decisions, costs, gamification). A per-DB asyncio.Lock + BEGIN IMMEDIATE gives us serializable writes without the ops cost of Postgres at the SME scale we target. Every write goes through store.writes.write_tx — no conn.commit() calls anywhere else, ever.

Pipelines are data, not code. Every workflow — transaction_booked, document_ingested, period_close, vat_return, year_end_close — is a YAML DAG. A new event type ships as YAML + a tool + one routing line. Never executor surgery. That's how we ship a custom monthly close per customer in days, not quarters.

One chokepoint for journal writes. gl_poster.post is the only code path outside
migrations that inserts into journal_entries. CI greps for it. It also enforces period-lock backdate protection: posting an entry_date inside a closed accounting_periods row raises RuntimeError. Backdating is impossible by construction.

Money is integer cents. No floats on a money path, ever. CI grep enforces it. AI cost is integer micro-USD with the same rule.

The Living Rule Wiki is the compounding moat. Wiki pages live in orchestration.db with full revision history and an FTS5 index. Every reasoning agent calls wiki_reader.fetch(tags=[…]) at the start of its run, threads (page_id, revision_id) into its prompt-hash, and stamps the citation onto its agent_decisions row. So a wiki edit
invalidates exactly the cached agent calls that read that page — surgical cache correctness, exact audit replay, no drift.

Multi-provider runners with env-var swap. Anthropic Sonnet 4.6 is the default (and the only choice for vision — document OCR runs on Claude). Cerebras gpt-oss-120b via OpenAI-compat for sub-second classification on the cheap path. AGNES_LLM_PROVIDER=anthropic|cerebras flips the runtime; cost is computed from a pinned
per-(provider, model) micro-USD rate table.

MCP server wraps the FastAPI surface in-process. No business-logic duplication: the MCP tools call the FastAPI routers via httpx.ASGITransport, so anything the dashboard can do, an MCP client can do — and they go through the exact same audit trail.

Frontend is Vite + React 18 + TypeScript + shadcn/ui + TanStack Query, talking to the backend over a local Vite proxy. Five live pages: Dashboard, AI-Spend (the killer SQL query, charted), Reports (period close + VAT return), Adoption (the gamification leaderboard), and Wiki.
## How we built it

The architecture is uncompromising on the boring parts, because that's where the moat lives.

Three SQLite databases, one writer each. accounting.db (the GL truth), orchestration.db (run metadata, wiki pages + revisions, prompt cache), and audit.db
(decisions, costs, gamification). A per-DB asyncio.Lock + BEGIN IMMEDIATE gives us serializable writes without the ops cost of Postgres at the SME scale we target. Every write goes through store.writes.write_tx — no conn.commit() calls anywhere else, ever.

Pipelines are data, not code. Every workflow — transaction_booked, document_ingested, period_close, vat_return, year_end_close — is a YAML DAG. A new event type ships as YAML + a tool + one routing line. Never executor surgery. That's how we ship a custom monthly close per customer in days, not quarters.

One chokepoint for journal writes. gl_poster.post is the only code path outside
migrations that inserts into journal_entries. CI greps for it. It also enforces period-lock backdate protection: posting an entry_date inside a closed accounting_periods row raises RuntimeError. Backdating is impossible by construction.

Money is integer cents. No floats on a money path, ever. CI grep enforces it. AI cost is integer micro-USD with the same rule.

The Living Rule Wiki is the compounding moat. Wiki pages live in orchestration.db with full revision history and an FTS5 index. Every reasoning agent calls wiki_reader.fetch(tags=[…]) at the start of its run, threads (page_id, revision_id) into its prompt-hash, and stamps the citation onto its agent_decisions row. So a wiki edit
invalidates exactly the cached agent calls that read that page — surgical cache correctness, exact audit replay, no drift.

Multi-provider runners with env-var swap. Anthropic Sonnet 4.6 is the default (and the only choice for vision — document OCR runs on Claude). Cerebras gpt-oss-120b via OpenAI-compat for sub-second classification on the cheap path. AGNES_LLM_PROVIDER=anthropic|cerebras flips the runtime; cost is computed from a pinned
per-(provider, model) micro-USD rate table.

MCP server wraps the FastAPI surface in-process. No business-logic duplication: the MCP tools call the FastAPI routers via httpx.ASGITransport, so anything the dashboard can do, an MCP client can do — and they go through the exact same audit trail.

Frontend is Vite + React 18 + TypeScript + shadcn/ui + TanStack Query, talking to the backend over a local Vite proxy. Five live pages: Dashboard, AI-Spend (the killer SQL query, charted), Reports (period close + VAT return), Adoption (the gamification leaderboard), and Wiki.

Built With

  • mcp
Share this project:

Updates