FinanceGPT

Your AI-powered personal CPA. Connect bank accounts, upload tax forms, and get instant insights on spending, investments, tax refunds, and credit card optimization—all in one privacy-focused app.


Inspiration

Every year, millions of Americans face the same frustrating ritual: gathering scattered financial documents, deciphering cryptic tax forms, and wondering if they're leaving money on the table. I was one of them.

After spending hours manually categorizing transactions, cross-referencing W2s with pay stubs, and googling "what is Box 12 code DD?", I had a realization: we have AI that can write poetry and generate images, but nothing that can simply tell me if I'm using the right credit card for groceries.

Three observations drove this project:

  1. Financial literacy is gatekept: quality financial advice costs $200+/hour from a CPA
  2. Our financial data is fragmented: bank accounts, brokerages, tax documents, and credit cards all live in different silos
  3. AI can bridge this gap: large language models can now understand context, perform calculations, and provide personalized advice

I wanted to build the equivalent of a knowledgeable friend who happens to be a CPA, is available 24/7, and already knows your complete financial picture.


What it does

FinanceGPT is an AI-powered personal finance assistant that:

  • Connects to your bank accounts via Plaid to aggregate transactions, balances, and investment holdings
  • Parses tax documents (W2, 1099-INT, 1099-DIV, 1099-B, 1099-MISC, 1095-C) using LLM-powered extraction
  • Answers financial questions using RAG (Retrieval-Augmented Generation) over your complete financial data
  • Estimates tax refunds using federal tax brackets and your uploaded forms
  • Optimizes credit card usage by analyzing spending patterns and recommending the best card per category
  • Finds recurring subscriptions and flags "zombie" subscriptions you forgot about
  • Analyzes portfolio allocation and compares against investment philosophies (Bogleheads, Three-Fund Portfolio)

All of this comes with a privacy-first design: PII (SSN, EIN) is masked before any LLM processing, and you can self-host with local models.
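To make the refund estimate concrete, here is a minimal sketch of the bracket arithmetic, not FinanceGPT's actual code. The function names `federal_tax` and `estimate_refund` are hypothetical. The thresholds and standard deduction are assumed to be the 2024 federal figures for a single filer and should be verified against current IRS tables. The sketch ignores credits, other income, and state taxes.

```python
from decimal import Decimal

# Assumed 2024 federal brackets for a single filer: (upper bound, marginal rate).
# Verify against current IRS tables before relying on these numbers.
BRACKETS_SINGLE_2024 = [
    (Decimal("11600"), Decimal("0.10")),
    (Decimal("47150"), Decimal("0.12")),
    (Decimal("100525"), Decimal("0.22")),
    (Decimal("191950"), Decimal("0.24")),
    (Decimal("243725"), Decimal("0.32")),
    (Decimal("609350"), Decimal("0.35")),
    (None, Decimal("0.37")),  # top bracket has no upper bound
]
STANDARD_DEDUCTION_SINGLE_2024 = Decimal("14600")  # assumed figure


def federal_tax(taxable_income: Decimal) -> Decimal:
    """Sum the tax owed in each bracket the income passes through."""
    tax = Decimal("0")
    lower = Decimal("0")
    for upper, rate in BRACKETS_SINGLE_2024:
        if taxable_income <= lower:
            break
        top = taxable_income if upper is None else min(taxable_income, upper)
        tax += (top - lower) * rate
        if upper is None:
            break
        lower = upper
    return tax.quantize(Decimal("0.01"))


def estimate_refund(wages: Decimal, withheld: Decimal) -> Decimal:
    """Positive result is an estimated refund, negative an estimated balance due."""
    taxable = max(wages - STANDARD_DEDUCTION_SINGLE_2024, Decimal("0"))
    return withheld - federal_tax(taxable)


# Example: W2 Box 1 wages of 85,000 and Box 2 federal withholding of 12,000.
refund = estimate_refund(Decimal("85000"), Decimal("12000"))
print(refund)
```

Using `Decimal` rather than floats keeps the cent-level arithmetic exact, which matters once several forms are summed.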


How I built it

Tech Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Frontend | Next.js 14 + TypeScript | Server components, type safety |
| Backend | FastAPI + Python | Async API with auto-generated docs |
| Database | PostgreSQL + pgvector | Relational + vector search |
| Task Queue | Celery + Redis | Async document processing |
| AI | LiteLLM | Provider-agnostic (OpenAI, Anthropic, Ollama) |
| Banking | Plaid API | Financial data aggregation |
| Auth | Better Auth | Secure authentication |

Key Implementation Details

Tax Form Processing Pipeline:

```python
# 1. Mask PII before LLM call
masked_text = mask_pii_in_text(raw_text)
# SSN: 123-45-6789 → XXX-XX-XXXX

# 2. LLM extracts structured data
extracted = await llm.parse(masked_text, response_format=W2ExtractedData)

# 3. Store in normalized tables
await db.save(W2Form(**extracted.model_dump()))
```

Challenges I ran into

1. Tax Form Variability

W2 forms look different from every employer—grids, columns, varying layouts.

Initial approach: Regex patterns like r'Box 1[:\s]+\$([\d,]+)'

Problem: Matched "Box 12: $1,500" incorrectly (captured just "1")

Solution: LLM-first parsing with Pydantic structured output. The LLM understands context and handles layout variations that regex cannot.
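A small demonstration of why the regex approach is brittle. `STRICT` is the pattern quoted above. `LOOSE` is a hypothetical loosened variant, an assumption for illustration and not the project's actual code, which reproduces the kind of wrong-box capture described. The sample strings are made up.

```python
import re

# The pattern quoted above: expects the literal label "Box 1" and a dollar sign.
STRICT = re.compile(r'Box 1[:\s]+\$([\d,]+)')
# Hypothetical loosened variant (assumption) that tolerates extra text
# between the label and the amount.
LOOSE = re.compile(r'Box 1.*?\$(\d+)')


def first_group(pattern: re.Pattern, text: str):
    """Return the first capture group, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.group(1) if match else None


# Works on the one layout the pattern was written for.
clean = first_group(STRICT, "Box 1: $85,000")
# Misses a grid-style layout with no "Box" label and no dollar sign.
grid = first_group(STRICT, "1 Wages, tips, other compensation 85000.00")
# The loosened variant latches onto Box 12 and stops at the comma.
wrong_box = first_group(LOOSE, "Box 12: $1,500")

print(clean, grid, wrong_box)
```

Tightening the pattern fixes one layout and breaks another, which is the trade-off that motivated handing the layout problem to the LLM.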

2. Tax Year Detection

Users upload "W2_2024.pdf" but the document might be for tax year 2023.

Solution: Cascade detection:

  1. Filename pattern → 2. Document content search → 3. Default to previous year
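The cascade above can be sketched as follows. The function name `detect_tax_year` and the simple four-digit year regex are assumptions for illustration; a real implementation would likely search for more specific cues in the document text.

```python
import re
from datetime import date
from typing import Optional

# A four-digit year starting with 20 that is not part of a longer number.
YEAR = re.compile(r'(?<!\d)(20\d{2})(?!\d)')


def detect_tax_year(filename: str, text: str, today: Optional[date] = None) -> int:
    """Try the filename, then the document text, then fall back to last year."""
    today = today or date.today()

    # 1. Filename pattern, e.g. "W2_2024.pdf"
    match = YEAR.search(filename)
    if match:
        return int(match.group(1))

    # 2. Document content search
    match = YEAR.search(text)
    if match:
        return int(match.group(1))

    # 3. Default to the previous calendar year
    return today.year - 1


print(detect_tax_year("W2_2024.pdf", ""))
print(detect_tax_year("scan.pdf", "Form W-2 Wage and Tax Statement 2023"))
print(detect_tax_year("scan.pdf", "no year here", today=date(2025, 3, 1)))
```

Passing `today` explicitly keeps the default branch deterministic in tests.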

3. Orphaned Database Records

Deleting documents left tax form records with document_id = NULL, causing duplicate data.

Solution: An ON DELETE CASCADE foreign key constraint, so deleting a document also deletes the tax form records that reference it.
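The effect of the constraint can be shown in a few lines. This sketch uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs without a server. The table and column names are hypothetical, and the `PRAGMA` line is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        filename TEXT NOT NULL
    );
    CREATE TABLE w2_forms (
        id INTEGER PRIMARY KEY,
        document_id INTEGER NOT NULL
            REFERENCES documents(id) ON DELETE CASCADE,
        wages TEXT NOT NULL
    );
    INSERT INTO documents (id, filename) VALUES (1, 'W2_2024.pdf');
    INSERT INTO w2_forms (document_id, wages) VALUES (1, '85000.00');
""")

# Deleting the parent document removes its tax form rows as well.
conn.execute("DELETE FROM documents WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM w2_forms").fetchone()[0]
print(remaining)
```

Declaring `document_id` as `NOT NULL` also rules out the orphaned state at the schema level.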

4. LLM Cost Management

GPT-4 extraction cost ~$0.10/document.

Solution:

  • User-configurable LLMs (including free local models via Ollama)
  • Smaller models for simple tasks
  • Caching extracted data

5. PII Security

Financial documents contain SSNs and EINs, which can't be sent to external LLMs.

Solution: Regex-based masking before any LLM call, with local extraction for sensitive fields.
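A minimal sketch of what regex-based masking might look like. The function name `mask_pii_in_text` and the SSN mask format come from the pipeline snippet earlier; the patterns themselves and the EIN mask format are assumptions. This version only catches the dashed formats, so undashed nine-digit identifiers would need extra handling.

```python
import re

# Dashed SSN (123-45-6789) and dashed EIN (12-3456789); assumed formats.
SSN_PATTERN = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')
EIN_PATTERN = re.compile(r'\b\d{2}-\d{7}\b')


def mask_pii_in_text(text: str) -> str:
    """Replace SSNs and EINs with fixed placeholders before any LLM call."""
    text = SSN_PATTERN.sub("XXX-XX-XXXX", text)
    return EIN_PATTERN.sub("XX-XXXXXXX", text)


raw_text = "SSN: 123-45-6789 EIN: 12-3456789 Wages: 85,000.00"
masked_text = mask_pii_in_text(raw_text)
print(masked_text)
```

The dollar amounts pass through untouched, so the LLM still sees everything it needs for extraction.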


Accomplishments that I am proud of

  • Zero PII Exposure: SSNs and EINs never leave the server; they are masked before any LLM processing
  • Provider Agnostic: Works with OpenAI, Anthropic, Google, or fully local with Ollama
  • Intelligent Tax Parsing: 95%+ accuracy on W2/1099 extraction using LLM + Pydantic structured output
  • Real-time Credit Card Optimization: Fetches current rewards rates and calculates exact missed rewards
  • Sub-second RAG Queries: pgvector enables fast semantic search across all financial documents
  • Full Self-Hosting: Docker Compose setup for complete privacy
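The missed-rewards calculation behind the credit card optimization can be illustrated with a short sketch. The card names, reward rates, and function names are all hypothetical, and the rates are hard-coded here rather than fetched.

```python
from decimal import Decimal

# Hypothetical cards: fraction earned back per category, "default" elsewhere.
CARD_RATES = {
    "Card A": {"groceries": Decimal("0.03"), "default": Decimal("0.01")},
    "Card B": {"dining": Decimal("0.04"), "default": Decimal("0.015")},
}


def rate_for(card: str, category: str) -> Decimal:
    """Reward rate a card earns in a category, falling back to its default."""
    rates = CARD_RATES[card]
    return rates.get(category, rates["default"])


def best_card(category: str) -> str:
    """Card with the highest reward rate for the category."""
    return max(CARD_RATES, key=lambda card: rate_for(card, category))


def missed_rewards(transactions) -> Decimal:
    """Total rewards lost by not using the best card on each transaction."""
    missed = Decimal("0")
    for amount, category, card_used in transactions:
        best_rate = rate_for(best_card(category), category)
        missed += amount * (best_rate - rate_for(card_used, category))
    return missed


transactions = [
    (Decimal("200.00"), "groceries", "Card B"),  # earned 1.5%, could earn 3%
    (Decimal("100.00"), "dining", "Card B"),     # already the best card
    (Decimal("50.00"), "gas", "Card A"),         # earned 1%, could earn 1.5%
]
total_missed = missed_rewards(transactions)
print(best_card("groceries"), total_missed)
```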

What I learned

Technical

  • Structured LLM Output > Free-form: Pydantic models with LiteLLM dramatically improved extraction reliability
  • Fallbacks are Essential: LLM → Heuristic → Manual review flag cascade ensures no data is lost
  • Database Normalization Matters: Separate tables for W2Form, Form1099Int, etc. simplified queries vs. JSON blobs
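The fallback cascade from the list above might be structured like this. It is a sketch under assumptions: `ExtractionResult`, `heuristic_wages`, and `extract_wages` are hypothetical names, the LLM call is passed in as a plain callable, and the heuristic is deliberately crude.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExtractionResult:
    wages: Optional[str]
    source: str          # "llm", "heuristic", or "none"
    needs_review: bool


def heuristic_wages(text: str) -> Optional[str]:
    """Crude fallback: first amount with cents after the word 'wages'."""
    match = re.search(r'wages\D*?([\d,]+\.\d{2})', text, re.IGNORECASE)
    return match.group(1) if match else None


def extract_wages(text: str,
                  llm_extract: Callable[[str], Optional[str]]) -> ExtractionResult:
    # 1. LLM first; any provider failure falls through to the heuristic.
    try:
        value = llm_extract(text)
    except Exception:
        value = None
    if value is not None:
        return ExtractionResult(value, "llm", False)

    # 2. Heuristic fallback
    value = heuristic_wages(text)
    if value is not None:
        return ExtractionResult(value, "heuristic", False)

    # 3. Nothing worked: keep the record and flag it for manual review.
    return ExtractionResult(None, "none", True)


def failing_llm(text: str) -> Optional[str]:
    """Stand-in for an unavailable LLM provider."""
    raise RuntimeError("provider unavailable")


recovered = extract_wages("1 Wages, tips, other compensation 85,000.00", failing_llm)
flagged = extract_wages("illegible scan", failing_llm)
print(recovered, flagged)
```

Returning a flagged result instead of raising is what keeps a failed extraction from silently dropping the document.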

Product

  • Privacy is Non-Negotiable: Users won't connect financial accounts without trust
  • Context is Everything: Answering "What are my expenses?" requires knowing which accounts, what time period, and what categories the user means
  • Financial Jargon is a Barrier: The AI must translate "Box 12 Code DD" to plain English

What's next for FinanceGPT

  • Multi-user households: Joint accounts and family financial planning
  • Proactive tax optimization: Suggestions for HSA contributions, 401k limits, estimated payments
  • Investment benchmarking: Compare portfolio performance against S&P 500, target-date funds
  • Bill negotiation assistant: Identify bills that could be reduced and draft negotiation scripts
  • Mobile app: React Native companion for on-the-go financial insights
  • Scheduled reports: Weekly spending summaries, monthly net worth updates

Built with ❤️ for anyone who's ever stared at a W2 wondering what it all means.
