TunaTax 🐟

Automated multi-jurisdiction tax compliance, powered by AI, with human oversight.


Inspiration

Tax compliance is one of those problems that's deceptively brutal. On the surface it sounds simple (file your taxes on time), but for any business operating across borders it quickly becomes a nightmare of overlapping deadlines, jurisdiction-specific forms, currency conversions, and regulatory changes that shift under your feet.

We spoke to accountants at small and mid-sized firms who described their workflow: manually matching bank transactions to email invoices in spreadsheets, tracking filing deadlines across 4+ countries on paper calendars, and spending days assembling the data needed for a single VAT return. One told us she keeps a personal notebook of every deadline because "if I rely on my memory, someone gets fined."

That felt like a problem worth solving. Not with another dashboard that shows you data, but with an agentic system that actually does the work: it pulls in invoices and transactions, reconciles them, tracks every deadline, and generates the filing documents automatically. The human stays in control (every AI output requires approval), but the tedious 90% is handled.

The tuna? Every good product needs a mascot. Tuna are migratory fish that cross international waters, just like the tax obligations we're tracking.

What it does

TunaTax is an end-to-end tax compliance platform with three core capabilities:

1. Intelligent Reconciliation

Bank transactions flow in from Open Banking feeds. Invoices arrive from Gmail. Our linking engine (till) uses AI to match them, comparing amounts, dates, vendor names, payment references, and IBANs to produce match proposals with confidence scores.

For a transaction of €2,450.00 to HubSpot, the engine might find the corresponding invoice email, extract the PDF, parse it, and produce a match at 98.5% confidence. The tax worker reviews and approves in one click. For ambiguous cases (amount discrepancies, missing references), it flags them for human review.

The reconciliation threshold follows a simple model:

$$P(\text{match}) \geq 0.85 \implies \text{auto-matched (pending approval)}$$
$$0.50 \leq P(\text{match}) < 0.85 \implies \text{needs\_review}$$
$$P(\text{match}) < 0.50 \implies \text{no\_match}$$
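
As a minimal sketch, the three bands above can be written as one small function. The names `MatchStatus` and `classify_match` are hypothetical and only illustrate the thresholds stated here; they are not taken from the TunaTax codebase.

```python
from enum import Enum

# Thresholds as stated in the reconciliation model above.
AUTO_MATCH_THRESHOLD = 0.85
REVIEW_THRESHOLD = 0.50


class MatchStatus(str, Enum):
    AUTO_MATCHED = "auto_matched"  # still pending worker approval
    NEEDS_REVIEW = "needs_review"
    NO_MATCH = "no_match"


def classify_match(confidence: float) -> MatchStatus:
    """Map a match confidence in [0, 1] to one of the three bands."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= AUTO_MATCH_THRESHOLD:
        return MatchStatus.AUTO_MATCHED
    if confidence >= REVIEW_THRESHOLD:
        return MatchStatus.NEEDS_REVIEW
    return MatchStatus.NO_MATCH
```

Even an auto-matched proposal only reaches the books after a worker approves it, so the enum value describes the proposal, not the final state.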

2. Compliance Calendar & Deadline Tracking

The platform maintains a knowledge base of tax obligations across the US, UK, Germany, and Australia, covering corporate income tax, VAT/GST, payroll, estimated quarterly payments, and more. Given a company's jurisdictions and entity types, it generates a personalised compliance calendar.

A background scheduler service monitors deadlines continuously. When a filing is $\leq 30$ days away, it automatically triggers document preparation. At $\leq 7$ days, it escalates priority. Past deadline? Marked overdue immediately.
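
The scheduler's rules can be sketched as a pure function over dates. This is an illustration under assumptions: the function name and the returned labels are hypothetical, and treating the due date itself as "escalate" rather than "overdue" is our reading of "past deadline".

```python
from datetime import date

PREPARE_WINDOW_DAYS = 30   # start document preparation
ESCALATE_WINDOW_DAYS = 7   # raise priority


def deadline_action(due: date, today: date) -> str:
    """Return the scheduler action for a filing due on `due`, as seen on `today`."""
    days_left = (due - today).days
    if days_left < 0:
        return "overdue"
    if days_left <= ESCALATE_WINDOW_DAYS:
        return "escalate"
    if days_left <= PREPARE_WINDOW_DAYS:
        return "prepare_documents"
    return "scheduled"
```

Keeping the rule free of I/O means the background service only has to call it once per obligation on each tick and act on the result.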

3. AI-Generated Filing Documents

This is the most technically interesting part. When a compliance document needs to be produced (say, a UK VAT return), the system:

  1. Queries all reconciled transactions for the relevant period and jurisdiction
  2. Prompts the AI agent with the obligation rules, tax rates, and data schema
  3. The AI writes a Python computation script specific to that filing
  4. The script runs in a sandboxed subprocess with the financial data injected
  5. Output is a structured document with computed figures (net VAT payable, tax liability, etc.)
  6. The worker reviews, approves, and submits

The AI doesn't just fill in a template; it reasons about the data and adapts to edge cases. The generated script is stored for audit.

Two User Roles

  • Workers (accountants): Full access. Reconcile transactions, approve AI outputs, manage compliance filings, generate documents.
  • Businesses (clients): Read-only financial overview covering P&L, expenses, and tax exposure across jurisdictions. No compliance management.

How we built it

Architecture

┌──────────────────────┐     ┌──────────────────────┐
│    React Frontend    │────▶│   FastAPI Backend    │
│   Vite + Tailwind    │     │     Python 3.12      │
└──────────────────────┘     └──────────┬───────────┘
                                        │
            ┌───────────────────────────┼──────────────────┐
            │                           │                  │
     ┌──────▼───────┐            ┌──────▼───────┐   ┌──────▼───────┐
     │   LiteLLM    │            │  Scheduler   │   │   Sandbox    │
     │ (LLM proxy)  │            │ (background) │   │   Executor   │
     └──────┬───────┘            └──────────────┘   └──────────────┘
            │
     ┌──────┴──────────────────────┐
     │ Gemini │ Anthropic │ Ollama │
     └─────────────────────────────┘

Backend: FastAPI with async SQLite (aiosqlite). The compliance engine, document generator, and agent are all Python modules behind a REST API.

LLM Layer: We used LiteLLM as a unified interface to swap between providers without changing application code. During development we ran Ollama locally on an RTX 4070 (12GB VRAM) with gemma3:12b for text and llama3.2-vision:11b for invoice OCR. For the demo we could switch to Gemini or Anthropic by changing one environment variable.

# Same code works for any provider
from litellm import completion
response = completion(
    model="ollama/gemma3:12b",  # or "gemini/gemini-2.5-flash" or "anthropic/claude-sonnet-4-20250514"
    messages=[{"role": "user", "content": "..."}]
)
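
The "one environment variable" switch could look roughly like the sketch below. The variable name `TUNATAX_LLM_MODEL` and the helper `resolve_model` are assumptions made for illustration; the writeup does not name the actual variable.

```python
import os
from typing import Mapping, Optional

# Local default used during development, per the text above.
DEFAULT_MODEL = "ollama/gemma3:12b"


def resolve_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the LiteLLM model string from the environment (hypothetical variable name)."""
    env = os.environ if env is None else env
    value = env.get("TUNATAX_LLM_MODEL", "").strip()
    return value or DEFAULT_MODEL
```

The returned string would then be passed as `model=` to `completion(...)`, so switching to `gemini/gemini-2.5-flash` is a deployment setting rather than a code change.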

Linking Engine (till): A separate service that processes bank transactions against email invoices. It uses AI to extract invoice data from PDFs, then scores match quality across multiple dimensions (amount, date, vendor, reference, IBAN).
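
To make "scores match quality across multiple dimensions" concrete, here is a deterministic stand-in for the scoring step. The real engine uses AI; the weights, field names, and the 30-day date decay below are all hypothetical choices for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Transaction:
    amount: Decimal
    booked_on: date
    counterparty: str
    reference: str
    iban: str


@dataclass(frozen=True)
class Invoice:
    total: Decimal
    issued_on: date
    vendor: str
    number: str
    iban: str


# Hypothetical weights; they sum to 1 so the result stays in [0, 1].
WEIGHTS = {"amount": 0.35, "date": 0.15, "vendor": 0.20, "reference": 0.20, "iban": 0.10}


def _normalise_iban(iban: str) -> str:
    return iban.replace(" ", "").upper()


def score_match(tx: Transaction, inv: Invoice) -> tuple[float, dict[str, float]]:
    """Return (overall confidence, per-dimension scores) for one candidate pair."""
    days_apart = abs((tx.booked_on - inv.issued_on).days)
    parts = {
        "amount": 1.0 if tx.amount == inv.total else 0.0,
        "date": max(0.0, 1.0 - days_apart / 30),
        "vendor": SequenceMatcher(None, tx.counterparty.lower(), inv.vendor.lower()).ratio(),
        "reference": 1.0 if inv.number and inv.number.lower() in tx.reference.lower() else 0.0,
        "iban": 1.0 if tx.iban and _normalise_iban(tx.iban) == _normalise_iban(inv.iban) else 0.0,
    }
    overall = sum(WEIGHTS[name] * value for name, value in parts.items())
    return overall, parts
```

Returning the per-dimension scores alongside the total is what makes human-readable match reasons possible: a reviewer can see that the amount matched exactly but the IBAN was missing.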

Sandboxed Document Generation: The AI generates Python scripts that are executed in restricted subprocesses with no network access, no filesystem access, whitelisted imports only, and a 30-second timeout. Financial data is injected as a JSON variable. This gives us the flexibility of AI-generated code with the safety of a sandbox.
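
A rough sketch of the executor is shown below, covering only the import allowlist, the timeout, and the JSON injection. It is not a complete sandbox: the static import check is a first filter that determined code can bypass, and blocking network and filesystem access needs OS-level isolation (containers, seccomp, or similar) that is not shown. The allowlist contents, the `DATA` variable name, and the function names are assumptions.

```python
import ast
import json
import subprocess
import sys

ALLOWED_IMPORTS = {"json", "math", "decimal", "datetime"}  # hypothetical allowlist
TIMEOUT_SECONDS = 30

# Trusted prelude: reads the injected financial data from stdin into DATA.
PRELUDE = "import json, sys\nDATA = json.loads(sys.stdin.read())\n"


def check_imports(source: str) -> None:
    """Reject scripts that import anything outside the allowlist."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name.split(".")[0] not in ALLOWED_IMPORTS:
                raise ValueError(f"import of {name!r} is not allowed")


def run_generated_script(source: str, data: dict) -> dict:
    """Run an AI-generated script in a child interpreter and parse its JSON output."""
    check_imports(source)
    proc = subprocess.run(
        [sys.executable, "-I", "-c", PRELUDE + source],  # -I: isolated mode
        input=json.dumps(data),
        capture_output=True,
        text=True,
        timeout=TIMEOUT_SECONDS,  # raises subprocess.TimeoutExpired
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return json.loads(proc.stdout)
```

Passing the data over stdin keeps it out of the command line and out of the generated source, so the stored script stays reusable for audit against the same inputs.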

Frontend: React with Plus Jakarta Sans typography, a #4059AD blue primary palette, and a clean light-theme fintech aesthetic. The animated tuna on the homepage is pure CSS @keyframes.

Deployment: A single docker compose up. A multi-stage Dockerfile builds the React frontend, bundles it with the Python backend, and serves everything from one container.

Tax Knowledge Base

We built a structured JSON knowledge base covering obligations for four jurisdictions:

Country | Obligations Tracked
--------|--------------------
🇺🇸 US | Corporate Income Tax (1120), S-Corp (1120-S), Estimated Quarterly, Sales Tax, Payroll (941)
🇬🇧 UK | Corporation Tax (CT600), VAT Return (quarterly), PAYE RTI
🇩🇪 DE | Körperschaftsteuer, Gewerbesteuer, Umsatzsteuer (monthly/quarterly)
🇦🇺 AU | Company Tax, Business Activity Statement (quarterly), PAYG

Each obligation includes form numbers, due date rules, applicable entity types, tax rates, penalty information, and the data fields required to generate the filing.
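
As an illustration of how such entries could drive the calendar, the sketch below uses a heavily trimmed, hypothetical schema (`id`, `jurisdiction`, `entity_types`, `due_rule`). The two due-date rules are simplified for the example, assume periods that end on the last day of a month, and should be checked against the tax authorities' guidance before being relied on.

```python
import calendar
from datetime import date, timedelta

# Hypothetical, trimmed-down knowledge base entries.
OBLIGATIONS = [
    {
        "id": "uk_vat_return",
        "jurisdiction": "UK",
        "entity_types": ["limited_company", "sole_trader"],
        # one calendar month after period end, plus 7 days
        "due_rule": {"months_after_period_end": 1, "extra_days": 7},
    },
    {
        "id": "us_form_941",
        "jurisdiction": "US",
        "entity_types": ["c_corp", "s_corp"],
        # last day of the month following the quarter
        "due_rule": {"months_after_period_end": 1, "extra_days": 0},
    },
]


def due_date(period_end: date, rule: dict) -> date:
    """Last day of the month N months after period_end, plus extra days."""
    month_index = period_end.month - 1 + rule["months_after_period_end"]
    year = period_end.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day) + timedelta(days=rule["extra_days"])


def build_calendar(jurisdictions: set, entity_type: str, period_end: date) -> dict:
    """Return {obligation id: due date} for the obligations that apply."""
    return {
        o["id"]: due_date(period_end, o["due_rule"])
        for o in OBLIGATIONS
        if o["jurisdiction"] in jurisdictions and entity_type in o["entity_types"]
    }
```

Expressing due dates as data rather than code is what lets a new jurisdiction be added by extending the JSON instead of the engine.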

Challenges we ran into

LLM provider inconsistency. Function calling works differently across Gemini, Anthropic, and Ollama. Gemini auto-executes tools; Anthropic returns tool-use blocks; Ollama's support varies by model. LiteLLM normalises most of this, but edge cases (especially around vision + tool calling in the same request) required careful handling.

Sandboxed code execution is hard to get right. The AI occasionally generates scripts that import blocked libraries or produce malformed JSON. We added a retry mechanism (if the script fails, the error message is fed back to the AI for a second attempt), which improved the success rate from ~70% to ~95%.
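
The retry loop can be sketched independently of any LLM client. In the sketch below, `generate` and `execute` are hypothetical callables standing in for the agent and the sandbox, and `parse_output` represents the malformed-JSON failure mode mentioned above.

```python
import json
from typing import Callable, Optional


def parse_output(stdout: str) -> dict:
    """Parse a script's output; raises json.JSONDecodeError on malformed JSON."""
    return json.loads(stdout)


def generate_with_retry(
    generate: Callable[[Optional[str]], str],
    execute: Callable[[str], dict],
    max_attempts: int = 2,
) -> dict:
    """Call generate(previous_error), run the result, and retry with the error as feedback."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        script = generate(error)  # error is None on the first attempt
        try:
            return execute(script)
        except Exception as exc:  # any failure becomes feedback for the next attempt
            error = f"{type(exc).__name__}: {exc}"
    raise RuntimeError(f"script failed after {max_attempts} attempts: {error}")
```

With `max_attempts=2` this matches the "second attempt" described above; the error string is what gets appended to the next prompt.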

Realistic demo data. Tax data needs to be plausible to be convincing. Amounts can't be round numbers, dates need to align with actual fiscal calendars, and VAT calculations need to be correct to the penny. We spent significant time crafting seed data that an accountant would find believable.
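
Getting figures right to the penny is easier with decimal arithmetic than with binary floats. A minimal sketch follows; the helper name and the half-up rounding rule are assumptions, and the rounding a given tax authority requires may differ.

```python
from decimal import Decimal, ROUND_HALF_UP

PENNY = Decimal("0.01")


def vat_from_net(net: Decimal, rate: Decimal) -> Decimal:
    """VAT due on a net amount, rounded to the nearest penny (half-up, by assumption)."""
    return (net * rate).quantize(PENNY, rounding=ROUND_HALF_UP)


def gross_from_net(net: Decimal, rate: Decimal) -> Decimal:
    """Gross amount = net + VAT, with the VAT rounded first."""
    return net + vat_from_net(net, rate)
```

Amounts and rates should be constructed from strings, for example `Decimal("0.20")`, because `Decimal(0.20)` inherits the float's binary representation error.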

The compliance detail panel was the most complex frontend interaction: fetching transactions for a specific period, computing VAT stats client-side, and handling the approve → generate → preview → submit flow in a single slide-out panel without losing state.

Running vision models locally. The RTX 4070's 12GB VRAM is enough for llama3.2-vision:11b, but only just; we had to be careful about not loading multiple models simultaneously. Ollama's auto-unload behaviour saved us here.

Accomplishments that we're proud of

The full loop works. In the demo, we go from raw bank transactions and email invoices → AI reconciliation → worker approval → compliance document generation → PDF output → filed. That's a complete business workflow, not just a proof of concept.

Provider-agnostic AI. The same codebase runs on a self-hosted RTX 4070 via Ollama, Google Gemini, or Anthropic Claude. One environment variable. No code changes. This matters for businesses with data sovereignty concerns.

The linking engine achieves >85% auto-match rates on our test data, with meaningful confidence scoring that accountants can trust. The match reasons are human-readable explanations, not just numbers.

The scheduler is genuinely useful: it doesn't just show you deadlines, it actually starts preparing documents ahead of time. By the time an accountant opens a filing due next week, the draft is already waiting.

The fish. The animated tuna swimming into the logo is objectively delightful, and the CSS-only implementation means zero JavaScript overhead.

What we learned

Tax is deeply jurisdiction-specific. UK VAT quarters don't align with calendar quarters. German trade tax (Gewerbesteuer) has municipality-dependent multipliers. Australian BAS combines GST and PAYG in one form. You can't build a generic "tax engine"; you need jurisdiction-aware logic at every layer.
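
The Gewerbesteuer case shows why a single flat rate per country is not enough. The sketch below is deliberately simplified: it applies the 3.5% federal base rate (Steuermesszahl) and a municipal multiplier (Hebesatz) to trade income, and ignores allowances, add-backs, and rounding rules. The function name is hypothetical.

```python
from decimal import Decimal

BASE_RATE = Decimal("0.035")  # Steuermesszahl: federal base rate
CENT = Decimal("0.01")


def gewerbesteuer(trade_income: Decimal, hebesatz_percent: int) -> Decimal:
    """Simplified trade tax: income x base rate x municipal multiplier (in percent)."""
    return (trade_income * BASE_RATE * Decimal(hebesatz_percent) / 100).quantize(CENT)
```

The same income produces a different liability in each municipality, so the knowledge base has to carry the multiplier per location rather than per country.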

Accountants want control, not replacement. Every accountant we talked to was clear: they don't want AI filing taxes autonomously. They want AI doing the grunt work (data entry, matching, calculations) while they retain approval authority. The human-in-the-loop design isn't a compromise; it's the product.

LLM-generated code is surprisingly good at tax calculations when given precise rules and clean data. The key is constraining the output format tightly and providing the tax rules as structured context, not expecting the model to "know" tax law.

Docker Compose is underrated for hackathon demos. One command to start everything, including the database, scheduler, and frontend. No "let me just start this other terminal" moments during the presentation.

What's next for TunaTax

Real integrations. The connector architecture is built but uses mock data. Next steps are plugging in the Gmail API for invoice ingestion, Open Banking (PSD2/CDR) for live transaction feeds, and QuickBooks/Xero for accounting data sync.

More jurisdictions. The knowledge base is designed to be extensible. We want to add the EU VAT One-Stop Shop (OSS), India GST, Singapore GST, and Canadian HST/PST.

Regulatory change monitoring. Using the LLM's web search grounding to monitor government gazettes and tax authority announcements, then automatically flagging changes that affect specific businesses.

Invoice OCR at scale. Using the vision model to extract structured data from any invoice format, not just the ones we've trained the linking engine on: multi-page invoices, handwritten receipts, foreign-language documents.

Filing API integrations. Direct submission to HMRC (Making Tax Digital), the IRS (Modernized e-File), the BZSt (ELSTER), and the ATO, replacing the "Submit" button with an actual API call.

Multi-tenant SaaS. The current architecture supports one company. The database schema and auth system need to support accounting firms managing multiple clients, each with their own jurisdictions and data sources.


Built with Python, React, LiteLLM, Ollama, FastAPI, and an unreasonable number of fake invoices.
