FinanceGPT
Your AI-powered personal CPA. Connect bank accounts, upload tax forms, and get instant insights on spending, investments, tax refunds, and credit card optimization—all in one privacy-focused app.
Inspiration
Every year, millions of Americans face the same frustrating ritual: gathering scattered financial documents, deciphering cryptic tax forms, and wondering if they're leaving money on the table. I was one of them.
After spending hours manually categorizing transactions, cross-referencing W2s with pay stubs, and googling "what is Box 12 code DD?", I had a realization: we have AI that can write poetry and generate images, but nothing that can simply tell me if I'm using the right credit card for groceries.
Three observations drove this project:
- Financial literacy is gatekept: quality financial advice from a CPA costs $200+/hour
- Our financial data is fragmented: bank accounts, brokerages, tax documents, and credit cards all live in different silos
- AI can bridge this gap: large language models can now understand context, perform calculations, and provide personalized advice
I wanted to build something like having a knowledgeable friend who happens to be a CPA, available 24/7, who already knows your complete financial picture.
What it does
FinanceGPT is an AI-powered personal finance assistant that:
- Connects to your bank accounts via Plaid to aggregate transactions, balances, and investment holdings
- Parses tax documents (W2, 1099-INT, 1099-DIV, 1099-B, 1099-MISC, 1095-C) using LLM-powered extraction
- Answers financial questions using RAG (Retrieval-Augmented Generation) over your complete financial data
- Estimates tax refunds using federal tax brackets and your uploaded forms
- Optimizes credit card usage by analyzing spending patterns and recommending the best card per category
- Finds subscriptions and identifies "zombie" subscriptions you forgot about
- Analyzes portfolio allocation and compares against investment philosophies (Bogleheads, Three-Fund Portfolio)
All of this comes with a privacy-first design: PII (SSNs, EINs) is masked before any LLM processing, and you can self-host with local models.
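The refund estimate works by applying each marginal federal rate only to the slice of taxable income inside its bracket, then comparing the resulting tax with the withholding reported on the W2. The sketch below is a simplified illustration, not FinanceGPT's code. It assumes a single filer, the 2024 brackets and $14,600 standard deduction, and W2 wages as the only income (no credits or itemized deductions). The names `federal_tax` and `estimate_refund` are hypothetical.

```python
from decimal import Decimal
from typing import Optional

# 2024 federal income tax brackets for a single filer (assumption: single status only):
# (upper bound of the bracket, marginal rate); None marks the top bracket.
BRACKETS_2024_SINGLE: list[tuple[Optional[Decimal], Decimal]] = [
    (Decimal("11600"), Decimal("0.10")),
    (Decimal("47150"), Decimal("0.12")),
    (Decimal("100525"), Decimal("0.22")),
    (Decimal("191950"), Decimal("0.24")),
    (Decimal("243725"), Decimal("0.32")),
    (Decimal("609350"), Decimal("0.35")),
    (None, Decimal("0.37")),
]
STANDARD_DEDUCTION_2024_SINGLE = Decimal("14600")


def federal_tax(taxable_income: Decimal) -> Decimal:
    """Apply each marginal rate only to the income that falls inside its bracket."""
    tax = Decimal("0")
    lower = Decimal("0")
    for upper, rate in BRACKETS_2024_SINGLE:
        if upper is None or taxable_income <= upper:
            tax += (taxable_income - lower) * rate
            break
        tax += (upper - lower) * rate
        lower = upper
    return tax


def estimate_refund(w2_wages: Decimal, federal_withheld: Decimal) -> Decimal:
    """Hypothetical helper: positive = estimated refund, negative = estimated amount owed."""
    taxable = max(Decimal("0"), w2_wages - STANDARD_DEDUCTION_2024_SINGLE)
    return (federal_withheld - federal_tax(taxable)).quantize(Decimal("0.01"))


# Usage: Box 1 wages and Box 2 federal withholding from a W2
print(estimate_refund(Decimal("60000"), Decimal("7000")))  # 1784.00
```

For $60,000 in wages, taxable income is $45,400, so the tax is $1,160 (10% bracket) plus $4,056 (12% on the remaining $33,800), for a total of $5,216. The $7,000 withheld therefore yields an estimated refund of $1,784. `Decimal` keeps the cents arithmetic exact.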
How I built it
Tech Stack
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Next.js 14 + TypeScript | Server components, type safety |
| Backend | FastAPI + Python | Async API with auto-generated docs |
| Database | PostgreSQL + pgvector | Relational + vector search |
| Task Queue | Celery + Redis | Async document processing |
| AI | LiteLLM | Provider-agnostic (OpenAI, Anthropic, Ollama) |
| Banking | Plaid API | Financial data aggregation |
| Auth | Better Auth | Secure authentication |
Key Implementation Details
Tax Form Processing Pipeline:
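```python
# Simplified. process_w2 is an illustrative wrapper; mask_pii_in_text, llm, db,
# W2ExtractedData and W2Form are FinanceGPT internals not shown here.
async def process_w2(raw_text: str) -> None:
    # 1. Mask PII before the LLM call
    #    SSN: 123-45-6789 → XXX-XX-XXXX
    masked_text = mask_pii_in_text(raw_text)
    # 2. LLM extracts structured data into a Pydantic model
    extracted = await llm.parse(masked_text, response_format=W2ExtractedData)
    # 3. Store in normalized tables
    await db.save(W2Form(**extracted.model_dump()))
```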
Challenges we ran into
1. Tax Form Variability
W2 forms look different from employer to employer: some use grids, some use columns, and box placement varies.
Initial approach: Regex patterns like r'Box 1[:\s]+\$([\d,]+)'
Problem: Matched "Box 12: $1,500" incorrectly (captured just "1")
Solution: LLM-first parsing with Pydantic structured output. The LLM understands context and handles layout variations that regex cannot.
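Here is roughly what "Pydantic structured output" buys. The LLM is asked to return JSON that matches a schema, and Pydantic validates that JSON before anything is stored, so a garbled value fails loudly instead of being saved as a wrong number. This is a sketch under stated assumptions: the fields are a hypothetical subset of a W2 (the project's real `W2ExtractedData` schema is not shown), and the LLM call is replaced by a hard-coded JSON string so the example runs offline. It uses the Pydantic v2 `model_validate_json` API.

```python
from decimal import Decimal

from pydantic import BaseModel, Field, ValidationError


class W2ExtractedData(BaseModel):
    # Hypothetical subset of W2 boxes; the real schema is not shown in this writeup
    employer_name: str
    tax_year: int = Field(ge=2000, le=2100)
    wages: Decimal = Field(ge=0, description="Box 1: wages, tips, other compensation")
    federal_income_tax_withheld: Decimal = Field(ge=0, description="Box 2")


# Stand-in for the JSON an LLM returns when asked to fill this schema
llm_json = (
    '{"employer_name": "Acme Corp", "tax_year": 2023, '
    '"wages": "85000.00", "federal_income_tax_withheld": "9500.00"}'
)
w2 = W2ExtractedData.model_validate_json(llm_json)
print(w2.tax_year, w2.wages)

# A garbled value is rejected instead of being stored as a wrong number
try:
    W2ExtractedData.model_validate_json(llm_json.replace('"85000.00"', '"n/a"'))
    rejected = False
except ValidationError:
    rejected = True
print(rejected)  # True
```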
2. Tax Year Detection
Users upload "W2_2024.pdf" but the document might be for tax year 2023.
Solution: Cascade detection:
(1) Filename pattern → (2) document content search → (3) default to the previous year
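A cascade like this can be expressed as a chain of checks, where each step runs only if the previous one found nothing. The sketch below assumes a particular interpretation: the regexes, the "Tax Year: 2023" content pattern, and the `detect_tax_year` name are hypothetical illustrations, not FinanceGPT's actual rules. The step order follows the description above.

```python
import re
from datetime import date
from typing import Optional

# A standalone four-digit year in the 2000s (not part of a longer number)
YEAR_IN_NAME = re.compile(r"(?<!\d)(20\d{2})(?!\d)")
# Hypothetical content pattern: phrases like "Tax Year: 2023" or "tax year 2023"
YEAR_IN_TEXT = re.compile(r"tax\s+year\s*:?\s*(20\d{2})", re.IGNORECASE)


def detect_tax_year(filename: str, text: str, today: Optional[date] = None) -> int:
    today = today or date.today()
    # 1. Filename pattern, e.g. "W2_2024.pdf"
    match = YEAR_IN_NAME.search(filename)
    if match:
        return int(match.group(1))
    # 2. Search the extracted document content
    match = YEAR_IN_TEXT.search(text)
    if match:
        return int(match.group(1))
    # 3. Default to the previous calendar year (forms usually arrive the year after)
    return today.year - 1


print(detect_tax_year("scan.pdf", "Form W-2 Tax Year: 2023"))  # 2023
```

Passing `today` explicitly keeps the default step deterministic in tests.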
3. Orphaned Database Records
Deleting documents left tax form records with document_id = NULL, causing duplicate data.
Solution: ON DELETE CASCADE foreign key constraint.
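The effect of the constraint can be seen in a self-contained demo. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL; the `REFERENCES ... ON DELETE CASCADE` clause is written the same way in both. The table and column names are hypothetical. Declaring `document_id` as `NOT NULL` additionally makes rows with a NULL parent impossible, which rules out the orphaned-record shape described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on; PostgreSQL always enforces them
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL
);
CREATE TABLE w2_forms (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    wages NUMERIC
);
""")
conn.execute("INSERT INTO documents (id, filename) VALUES (1, 'W2_2024.pdf')")
conn.execute("INSERT INTO w2_forms (document_id, wages) VALUES (1, 85000)")

# Deleting the parent document removes its extracted W2 row as well
conn.execute("DELETE FROM documents WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM w2_forms").fetchone()[0]
print(remaining)  # 0
```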
4. LLM Cost Management
GPT-4 extraction cost about $0.10 per document.
Solution:
- User-configurable LLMs (including free local models via Ollama)
- Smaller models for simple tasks
- Caching extracted data
5. PII Security
Financial documents contain SSNs and EINs, which must never be sent to external LLMs.
Solution: Regex-based masking before any LLM call, with local extraction for sensitive fields.
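Regex-based masking of this kind might look like the sketch below. It is an illustration of what a function such as the pipeline's `mask_pii_in_text` could do, not the project's actual implementation. It assumes only the hyphenated formats (123-45-6789 for SSNs, 12-3456789 for EINs). Unformatted nine-digit numbers would need additional patterns, and those patterns risk masking account or routing numbers as well.

```python
import re

# Hypothetical patterns for hyphenated SSNs (123-45-6789) and EINs (12-3456789)
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EIN_RE = re.compile(r"\b\d{2}-\d{7}\b")


def mask_pii_in_text(text: str) -> str:
    """Replace SSNs and EINs with fixed placeholders before any LLM call."""
    text = SSN_RE.sub("XXX-XX-XXXX", text)
    return EIN_RE.sub("XX-XXXXXXX", text)


print(mask_pii_in_text("Employee SSN 123-45-6789, Employer EIN 12-3456789"))
# Employee SSN XXX-XX-XXXX, Employer EIN XX-XXXXXXX
```

Dollar amounts such as "85,000.00" match neither pattern, so they survive masking and remain available for extraction.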
Accomplishments that I am proud of
- Zero PII Exposure: SSNs and EINs are masked before any LLM processing, so they never leave the server
- Provider Agnostic: Works with OpenAI, Anthropic, Google, or fully local with Ollama
- Intelligent Tax Parsing: 95%+ accuracy on W2/1099 extraction using LLM + Pydantic structured output
- Real-time Credit Card Optimization: Fetches current rewards rates and calculates exact missed rewards
- Sub-second RAG Queries: pgvector enables fast semantic search across all financial documents
- Full Self-Hosting: Docker Compose setup for complete privacy
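The "missed rewards" calculation comes down to a per-transaction comparison. For each purchase, find the card with the highest rate in that category, then add the difference between that rate and the rate of the card actually used. The sketch below hard-codes made-up cards and rates instead of fetching current ones. `Card` and `missed_rewards` are hypothetical names, and transactions are assumed to be already categorized as `(amount, category, card_used)` tuples.

```python
from dataclasses import dataclass


@dataclass
class Card:
    name: str
    rates: dict[str, float]  # category -> rewards rate (0.04 means 4%)
    base_rate: float = 0.01  # rate for any category not listed

    def rate_for(self, category: str) -> float:
        return self.rates.get(category, self.base_rate)


def missed_rewards(transactions: list[tuple[float, str, str]], cards: list[Card]):
    """transactions are (amount, category, name of card used).

    Returns the best card per category and the total rewards missed by not using it.
    """
    by_name = {card.name: card for card in cards}
    best: dict[str, str] = {}
    missed = 0.0
    for amount, category, card_used in transactions:
        top = max(cards, key=lambda card: card.rate_for(category))
        best[category] = top.name
        missed += amount * (top.rate_for(category) - by_name[card_used].rate_for(category))
    return best, round(missed, 2)


# Usage with made-up cards and transactions
cards = [Card("Grocery Plus", {"groceries": 0.04}), Card("Flat 2%", {}, base_rate=0.02)]
txns = [(200.0, "groceries", "Flat 2%"), (100.0, "dining", "Grocery Plus")]
best, missed = missed_rewards(txns, cards)
print(best)    # {'groceries': 'Grocery Plus', 'dining': 'Flat 2%'}
print(missed)  # 5.0
```

Here, $200 of groceries on the flat card forgoes 2 points of rate ($4), and $100 of dining on the grocery card forgoes 1 point ($1).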
What I learned
Technical
- Structured LLM Output > Free-form: Pydantic models with LiteLLM dramatically improved extraction reliability
- Fallbacks are Essential: LLM → Heuristic → Manual review flag cascade ensures no data is lost
- Database Normalization Matters: Separate tables for W2Form, Form1099Int, etc. made queries simpler than storing each form as a JSON blob
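The LLM → heuristic → manual-review cascade can be sketched as a loop over extractors that treats any error or empty result as "try the next one". If every extractor fails, the function returns the raw text with a review flag instead of discarding it. The extractor signatures, the `needs_review` key, and the stand-in extractors below are hypothetical illustrations of the pattern, not FinanceGPT's code.

```python
from typing import Callable, Optional

Extractor = Callable[[str], Optional[dict]]


def extract_with_fallbacks(text: str, llm_extract: Extractor, heuristic_extract: Extractor) -> dict:
    """Try the LLM, then a heuristic parser; if both fail, flag the document for manual review."""
    for extractor in (llm_extract, heuristic_extract):
        try:
            result = extractor(text)
        except Exception:  # timeouts, malformed JSON, schema errors, ...
            result = None
        if result:
            return {**result, "needs_review": False}
    # Nothing is discarded: keep the raw text and mark it for a human
    return {"raw_text": text, "needs_review": True}


# Usage with stand-in extractors
def flaky_llm(text: str) -> Optional[dict]:
    raise TimeoutError("provider timed out")


def box1_heuristic(text: str) -> Optional[dict]:
    return {"wages": "85000.00"} if "Box 1 85000.00" in text else None


print(extract_with_fallbacks("W2 Box 1 85000.00", flaky_llm, box1_heuristic))
# {'wages': '85000.00', 'needs_review': False}
print(extract_with_fallbacks("smudged scan", flaky_llm, box1_heuristic)["needs_review"])  # True
```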
Product
- Privacy is Non-Negotiable: Users won't connect financial accounts without trust
- Context is Everything: Answering "What are my expenses?" requires knowing which accounts, what time period, and which categories the user means
- Financial Jargon is a Barrier: The AI must translate "Box 12 Code DD" to plain English
What's next for FinanceGPT
- [ ] Multi-user households — Joint accounts and family financial planning
- [ ] Proactive tax optimization — Suggestions for HSA contributions, 401k limits, estimated payments
- [ ] Investment benchmarking — Compare portfolio performance against S&P 500, target-date funds
- [ ] Bill negotiation assistant — Identify bills that could be reduced and draft negotiation scripts
- [ ] Mobile app — React Native companion for on-the-go financial insights
- [ ] Scheduled reports — Weekly spending summaries, monthly net worth updates
Built with ❤️ for anyone who's ever stared at a W2 wondering what it all means.
Built With
- docker
- electric
- fastapi
- postgresql
- python
- react
- typescript