Inspiration

I wanted to build an application that could not only teach finance but also visualize it. As the idea developed, I realized that a Financial OS is the complete version of that idea. From there, I settled on the name "Folia", and the rest followed.

What it does

Folia is a universal financial intelligence platform built for every life stage, from age 13 to retirement. It gives users a complete picture of their financial life through seven core capabilities: a RAG-powered AI advisor grounded in IRS, CFPB, and SEC documents that maintains memory across sessions; a life simulator that projects net worth across 5, 10, and 20-year horizons using Monte Carlo analysis with inflation adjustment, Social Security modeling, and real estate appreciation; a six-component financial health score benchmarked against Federal Reserve Survey of Consumer Finances data; a 90-day cash flow forecasting engine that flags the specific calendar dates when cash drops below a safe threshold; a spending anomaly detection system that uses statistical z-score analysis to identify unusual patterns and explain what caused them; a document intelligence layer powered by Gemini that extracts structured data from pay stubs, W-2s, brokerage statements, and financial aid letters; and a paper trading environment where users can practice investing with $100,000 in virtual cash against live market data.
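As a rough illustration of the cash flow forecasting idea, the sketch below walks a balance forward one day at a time and records every date on which it sits below a threshold. The function name, the `scheduled` mapping, and the example amounts are hypothetical and are not taken from Folia's codebase.

```python
from datetime import date, timedelta


def flag_low_cash_dates(start_balance, start_date, scheduled, threshold, horizon_days=90):
    """Return (date, balance) pairs for each day the projected balance is below threshold.

    `scheduled` maps a date to the net cash movement expected that day
    (positive for income, negative for bills). Days with no entry move nothing.
    """
    balance = start_balance
    flagged = []
    for offset in range(horizon_days):
        day = start_date + timedelta(days=offset)
        balance += scheduled.get(day, 0.0)
        if balance < threshold:
            flagged.append((day, balance))
    return flagged


# Hypothetical example: rent is due on Jan 5 and a paycheck lands on Jan 15.
upcoming = {date(2025, 1, 5): -800.0, date(2025, 1, 15): 1500.0}
for day, balance in flag_low_cash_dates(1000.0, date(2025, 1, 1), upcoming, threshold=500.0):
    print(day.isoformat(), balance)
```

In this example the balance stays below the threshold from the rent payment until the paycheck arrives, so those are the dates a user would be warned about.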

How we built it

The stack is Next.js 15 on the frontend, deployed on Vercel; FastAPI on the backend, deployed on Render; and Supabase as the primary database, with 28 tables, RLS policies, and Realtime subscriptions for live UI updates. Clerk handles authentication with Google and GitHub OAuth, webhook sync to Supabase, and JWT verification on every API endpoint. Vectors are stored in Pinecone with namespace isolation: one shared namespace for the government knowledge base and a private namespace per user for uploaded documents. We use four LLMs routed by task: GPT-4o for the advisor because it needs deep reasoning, Groq llama-3.3-70b-versatile for all narration and summaries because it is near-instant and available on a free tier, gpt-4o-mini for glossary definitions because it produces structured JSON output at 90% less cost, and Gemini 2.0 Flash for multimodal document parsing. Transactional email goes through SendGrid with six dynamic templates. Redis sits in front of the database with fingerprint-versioned cache keys, so personalized results are never served stale: if any underlying user data changes, the cache key changes automatically.
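A minimal sketch of how a fingerprint-versioned cache key can work, assuming the fingerprint is a hash of the user data a result depends on. The `folia:` prefix, the function names, and the field names are illustrative assumptions, not the project's actual key scheme.

```python
import hashlib
import json


def fingerprint(user_data: dict) -> str:
    """Hash a canonical JSON form of the data so equal data always yields the same digest."""
    canonical = json.dumps(user_data, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def cache_key(user_id: str, feature: str, user_data: dict) -> str:
    """Build a key that changes whenever the underlying user data changes."""
    return f"folia:{feature}:{user_id}:{fingerprint(user_data)}"


before = cache_key("u1", "health_score", {"income": 70000, "debt": 1200})
after = cache_key("u1", "health_score", {"income": 70000, "debt": 1300})
print(before != after)  # the data changed, so the old entry is simply never read again
```

With this pattern there is no explicit invalidation step. Entries stored under old keys are never looked up again, so in practice they would need a TTL or an eviction policy to keep them from accumulating.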

Challenges we ran into

The hardest problem was making the RAG pipeline return correct, grounded answers instead of hallucinating. IRS publications are long and dense, and naive word-count chunking would split a paragraph about contribution limits across two chunks, making neither chunk useful in isolation. We rebuilt the chunker to use semantic boundaries, splitting on paragraph breaks first and then on sentence ends, with adaptive chunk sizes: 512 tokens for government documents to preserve context and 256 tokens for user-uploaded personal documents for precision. We also discovered that returning too many chunks to GPT-4o made it ignore the low-relevance ones and sometimes synthesize across unrelated sections. The fix was a relevance scoring filter that drops chunks below a 0.45 cosine similarity threshold before they ever reach the prompt.

The second major challenge was the simulator. A multi-decade projection that assumes fixed expenses, no salary growth, and no Social Security produces numbers that are meaningfully wrong for older users. Modeling inflation-adjusted expenses, 2% annual salary growth for W-2 earners, Social Security income kicking in at 67, and separate return rates per asset class required rebuilding the simulation engine from scratch. The Monte Carlo implementation draws returns from a lognormal distribution instead of a normal one. This matters because a lognormal growth factor is always positive, so no simulated year can lose more than 100%, and because returns compound multiplicatively across the horizon. Real market returns also have fatter tails than a normal distribution, which systematically underestimates crash severity.
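The simulator ideas above can be sketched in a few lines. This is a simplified illustration, not Folia's engine: it uses a single asset class, assumes salary is replaced by a Social Security benefit at 67, and every default value other than the 2% salary growth and the age of 67 is a placeholder chosen for the example.

```python
import random


def simulate_net_worth(start_net_worth, salary, expenses, age, years,
                       mean_log_return=0.05, log_return_sd=0.15,
                       inflation=0.03, salary_growth=0.02,
                       ss_age=67, ss_income=24000.0,
                       runs=2000, seed=0):
    """Monte Carlo projection of net worth; returns the 10th, 50th, and 90th percentiles."""
    rng = random.Random(seed)  # seeded so repeated calls give the same projection
    outcomes = []
    for _ in range(runs):
        worth, pay, spend, benefit = start_net_worth, salary, expenses, ss_income
        for year in range(years):
            # Lognormal growth factor: always positive, so a year never loses more than 100%.
            growth = rng.lognormvariate(mean_log_return, log_return_sd)
            # Assumption: salary stops and the benefit starts at ss_age.
            income = benefit if age + year >= ss_age else pay
            worth = worth * growth + income - spend
            pay *= 1 + salary_growth
            spend *= 1 + inflation    # expenses rise with inflation
            benefit *= 1 + inflation  # benefit is inflation-adjusted too (an assumption)
        outcomes.append(worth)
    outcomes.sort()
    return {
        "p10": outcomes[runs // 10],
        "p50": outcomes[runs // 2],
        "p90": outcomes[runs * 9 // 10],
    }


print(simulate_net_worth(50_000.0, 70_000.0, 50_000.0, age=30, years=20))
```

Reporting percentiles rather than a single average gives the user a range of outcomes, which is the main reason to run a Monte Carlo simulation instead of a fixed-rate projection.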

Accomplishments that we're proud of

The spending anomaly detection system. Most financial apps show you that you spent more this month. Folia tells you that your dining spending is 3.2 standard deviations above your three-month baseline and is statistically significant, or that 60% of your monthly spending happened on three days, which is a binge pattern. Five detection algorithms run simultaneously, and the AI synthesizes all flagged patterns into a single causation sentence that answers "why did my budget break this month" rather than just showing the total.

The health score benchmarking. The trajectory component compares your actual net worth against the Fidelity income-multiplier benchmark (1x salary at 30, 3x at 40, 6x at 50, 10x at 67), cross-referenced with Fed SCF 2022 medians by age bracket. The tax efficiency component measures what percentage of your assets are in tax-advantaged accounts and calculates the approximate annual tax cost of the gap. This is the kind of analysis that previously required a fee-only financial advisor.

The advisor memory system. Conversation history is persisted to Supabase and loaded at the start of every session, with a compression algorithm that summarizes older turns into a single context message when the conversation gets long. The advisor genuinely remembers what you discussed last week.
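For readers unfamiliar with the z-score check behind the anomaly detection, here is a minimal sketch. It assumes the baseline is a list of per-period totals for one category, such as weekly dining spend over the baseline window. The function names and the cutoff of 2.0 are hypothetical and do not come from Folia's code.

```python
import statistics


def spending_z_score(baseline, current):
    """Number of standard deviations `current` sits above the baseline mean.

    Returns None when the baseline has no variation, since the z-score is undefined.
    """
    mean = statistics.fmean(baseline)
    sd = statistics.stdev(baseline)  # sample standard deviation; needs at least 2 points
    if sd == 0:
        return None
    return (current - mean) / sd


def is_anomalous(baseline, current, threshold=2.0):
    """Flag spending that is more than `threshold` standard deviations above the mean."""
    z = spending_z_score(baseline, current)
    return z is not None and z > threshold


weekly_dining = [100, 110, 90, 100, 105, 95]  # hypothetical baseline weeks
print(is_anomalous(weekly_dining, 130))  # well above the usual range
print(is_anomalous(weekly_dining, 105))  # within normal variation
```

A one-sided test is used here because only overspending is interesting for a budget alert; a two-sided check would also flag unusually low spending.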

What we learned

Prompt engineering for financial advice is substantially harder than general-purpose prompting because the cost of a hallucinated number is real. We went through four versions of the system prompt before landing on the current one — the critical breakthrough was adding a structured output requirement (Answer / Why it matters / Next step) that forces the model to connect every response to the user's specific financial profile rather than giving generic advice. We also learned that model routing matters more than model quality at the top end. Groq llama-3.3-70b is genuinely as good as GPT-4o for one-paragraph narration tasks and returns in under a second. Using GPT-4o for everything would have cost 15-20x more per user session with no quality improvement for most features. On the infrastructure side, fingerprint-versioned cache keys solved a class of cache invalidation bugs that explicit invalidation never fully handles — it's a pattern we'll carry into every future project.
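Task-based model routing can be as simple as a lookup table. The sketch below mirrors the model assignments listed under "How we built it"; the task names, the exact model identifier strings, and the `pick_model` helper are assumptions made for illustration.

```python
# Hypothetical task names mapped to the models described in the write-up.
ROUTES = {
    "advisor": "gpt-4o",                      # deep reasoning
    "narration": "llama-3.3-70b-versatile",   # fast one-paragraph text via Groq
    "summary": "llama-3.3-70b-versatile",
    "glossary": "gpt-4o-mini",                # cheap structured JSON output
    "document_parsing": "gemini-2.0-flash",   # multimodal input
}


def pick_model(task: str) -> str:
    """Return the model assigned to a task, failing loudly on unknown tasks."""
    try:
        return ROUTES[task]
    except KeyError:
        raise ValueError(f"no model route for task {task!r}") from None


print(pick_model("narration"))
```

Failing on an unknown task, instead of silently defaulting to the most capable model, keeps a new feature from quietly running on the most expensive route.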

What's next for Folia

The immediate priority is the cron job layer — weekly email digests, quarterly tax reminders for freelance users, and nightly net worth snapshots are all built on the backend but not yet scheduled. The second priority is open banking integration via Plaid to auto-import transactions instead of requiring manual entry, which is the biggest friction point in the current flow. Longer term, the most interesting direction is social benchmarking — the community module already stores anonymized simulation outcomes by life stage and income bracket, and with enough users those peer percentiles become genuinely meaningful data. A 28-year-old making $70k could see that their net worth puts them at the 62nd percentile of their cohort and understand exactly which decisions account for the gap. We also want to build a tax optimization planner that runs quarterly — connecting the tax engine to the user's actual transaction data to calculate projected tax liability in real time, flag Roth conversion windows, and identify tax-loss harvesting opportunities before year-end.
