Financial Copilot

Inspiration

Every month I'd look at my bank statement and wonder where my grocery money went. I tried expense apps, but they only tracked totals. They couldn't tell me that milk had quietly gone up 40% over six months, or that I was buying the same items repeatedly without realizing it. I wanted an AI that didn't just record my spending but actually understood it and talked to me about it like a smart friend would.

What it does

Financial Copilot turns every grocery receipt into actionable intelligence:

Receipt Scanning - Upload a photo or PDF of any receipt. Gemini 2.5 Flash Vision extracts every line item, price, quantity, and store name automatically.
Price Anomaly Detection - Every item is compared against your personal price history. If milk costs 30% more than usual, the AI agent asks: "Did you switch brands or buy a larger size?"
Buying Cycle Prediction - After 3+ purchases of the same item, the app calculates your average restock frequency and predicts when you'll need it next.
Predictive Grocery List - Automatically populated based on your detected buying cycles. No manual input needed.
RAG-Grounded AI Agent - The conversational agent is given your real purchase history as context, so every price and date it mentions is pulled directly from your Firestore database rather than invented.
Spending Analytics - Category breakdown, per-item price history, and spending trends across 7-, 30-, and 90-day windows.

How we built it

The backend is a single FastAPI Python file deployed to Google Cloud Run. Receipt images are processed by Gemini 2.5 Flash Vision using a strict JSON output prompt to extract structured line items. These are stored in Google Firestore keyed by user and receipt ID.
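As a rough sketch of the extraction step, the snippet below shows one way a strict-JSON reply could be validated and turned into typed line items before storage. The prompt wording, the field names (store, date, items, unit_price), the LineItem class, and the users/.../receipts/... path layout are all assumptions made for illustration; the project's actual prompt and schema are not shown in this write-up, and the Gemini call itself is left out.

```python
import json
from dataclasses import dataclass

# Hypothetical prompt wording, not the project's actual prompt.
EXTRACTION_PROMPT = (
    "Extract this receipt as JSON only, with no prose and no markdown fences. "
    'Schema: {"store": str, "date": "YYYY-MM-DD", '
    '"items": [{"name": str, "quantity": number, "unit_price": number}]}'
)

FENCE = "`" * 3  # a markdown code fence, which models sometimes add anyway


@dataclass
class LineItem:
    name: str
    quantity: float
    unit_price: float


def parse_receipt(raw: str) -> dict:
    """Validate the model's reply and convert it into typed line items."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the surrounding fence and an optional "json" language tag.
        text = text.strip("`").strip()
        if text.startswith("json"):
            text = text[len("json"):]
    data = json.loads(text)  # raises ValueError on malformed output
    items = [
        LineItem(
            name=str(entry["name"]).strip().lower(),  # normalize for history lookups
            quantity=float(entry.get("quantity", 1)),
            unit_price=float(entry["unit_price"]),
        )
        for entry in data["items"]
    ]
    return {"store": data["store"], "date": data["date"], "items": items}


def firestore_path(user_id: str, receipt_id: str) -> str:
    """Assumed document path, keyed by user and receipt ID."""
    return f"users/{user_id}/receipts/{receipt_id}"
```

Normalizing item names at parse time is a design choice in this sketch: it lets later price-history lookups match "Milk" and "milk" without extra work.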

The intelligence layer is pure Python — an AnomalyDetector class compares current prices against historical averages, and a CyclePredictor calculates average gaps between purchases using time-series data from Firestore.
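A minimal sketch of what these two classes might look like is shown below. Only the class names come from the project; the method names, the 30% threshold (borrowed from the milk example), and the rounding of the average gap to whole days are assumptions. The type hints use Optional so the code also runs on Python 3.9.

```python
from datetime import date, timedelta
from statistics import mean
from typing import List, Optional


class AnomalyDetector:
    """Flags a price that is well above the item's historical average."""

    def __init__(self, threshold: float = 0.30):
        # 0.30 mirrors the "30% more than usual" example; the real value is an assumption.
        self.threshold = threshold

    def check(self, current_price: float, history: List[float]) -> Optional[float]:
        """Return the fractional increase if it meets the threshold, else None."""
        if not history:
            return None  # no baseline to compare against yet
        baseline = mean(history)
        if baseline <= 0:
            return None
        change = (current_price - baseline) / baseline
        return change if change >= self.threshold else None


class CyclePredictor:
    """Predicts the next purchase date from the average gap between purchases."""

    MIN_PURCHASES = 3  # matches the "3+ purchases" rule in the feature list

    def predict_next(self, purchase_dates: List[date]) -> Optional[date]:
        if len(purchase_dates) < self.MIN_PURCHASES:
            return None
        ordered = sorted(purchase_dates)
        # Day gaps between each consecutive pair of purchases.
        gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
        return ordered[-1] + timedelta(days=round(mean(gaps)))
```

For example, purchases on January 1, 8, and 15 give an average gap of 7 days, so this sketch would predict January 22 as the next restock date.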

When a user opens the voice agent, a RAG context is built fresh — pulling the latest receipt, detected anomalies, buying cycle predictions, and 30-day spending summary — and injected as the system prompt into Gemini 2.5 Flash. The agent responds through a WebSocket connection in real-time.
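The context-building step could be sketched roughly as follows. The function name, the argument shapes, and the instruction wording are hypothetical; the sketch only illustrates the idea of assembling the four pieces of user data into one system prompt, and it leaves out the Firestore reads and the Gemini call.

```python
from typing import Dict, List


def build_system_prompt(
    latest_receipt: Dict,
    anomalies: List[str],
    predictions: List[str],
    spending_30d: Dict[str, float],
) -> str:
    """Assemble user-specific facts into a single grounding prompt."""

    def bullets(lines: List[str]) -> str:
        # An explicit "none" tells the model the section is empty, not missing.
        return "\n".join(f"- {line}" for line in lines) if lines else "- none"

    items = [
        f"{item['name']}: ${item['unit_price']:.2f} x {item['quantity']}"
        for item in latest_receipt["items"]
    ]
    spending = [f"{cat}: ${total:.2f}" for cat, total in sorted(spending_30d.items())]
    sections = [
        "You are a grocery spending assistant. Use ONLY the facts below; "
        "if something is not listed, say you do not know.",
        f"Latest receipt ({latest_receipt['store']}, {latest_receipt['date']}):\n{bullets(items)}",
        f"Price anomalies:\n{bullets(anomalies)}",
        f"Buying cycle predictions:\n{bullets(predictions)}",
        f"Spending, last 30 days:\n{bullets(spending)}",
    ]
    return "\n\n".join(sections)
```

Rebuilding this string on every session, rather than caching it, keeps the agent's answers aligned with the most recent receipt at the cost of a few extra database reads.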

The frontend is a single React HTML file (no build tools needed) with a clean dark UI, Chart.js for analytics, and the Web Speech API for voice input. The entire infrastructure is provisioned with Terraform and deployed with a one-command shell script using Google Cloud Build.

Challenges we ran into

Gemini model compatibility - The google-generativeai SDK was deprecated mid-build. Migrated to the new google-genai SDK and updated all API call patterns.
Python 3.9 type hints - The dict | None union syntax isn't supported in Python 3.9. Fixed by removing the union type hints for backwards compatibility.
Image processing - Different receipt formats (JPEG, PNG, HEIC from iPhone, PDF) needed different handling. Built a Pillow preprocessing pipeline that normalizes all formats to clean JPEG before sending to Gemini Vision.
Gemini Live API model access - The gemini-2.0-flash-live-001 model wasn't available on our API key. Discovered the correct model name gemini-2.5-flash-native-audio-latest by querying the models list API.

Accomplishments that we're proud of

What we learned

Gemini 2.5 Flash Vision is production-ready for document parsing when given a strict JSON schema in the prompt. Temperature 0.1 with explicit format instructions gives near-perfect structured output.
RAG is more powerful than fine-tuning for personalized agents. In our experience, injecting user-specific context at inference time gives better results than training a model.
Firestore + Cloud Run is an excellent pairing for AI apps. Both are serverless, both scale to zero, and Firestore's document model maps naturally to receipt data.
The Web Speech API (built into Chrome) is dramatically more accurate than any custom transcription approach, and it's completely free.
Terraform makes hackathon deployments reproducible. Being able to run terraform destroy and terraform apply from scratch saved hours of debugging.

What's next for Financial Copilot

Full Gemini Live API integration — real-time bidirectional audio streaming so the agent speaks back to you in a natural voice with true interruption support
Multi-store price comparison — "You paid $5 for milk at Walmart but you got it for $3.50 at Aldi last month — want me to add Aldi to your route?"
Inflation tracker — track how your personal grocery basket cost has changed month over month vs national CPI
Shared household mode — multiple family members uploading to the same account
Mobile app — wrap the web app in Capacitor for iOS/Android with native camera receipt scanning

Built With

chart.js
css
docker
fastapi
gemini-vision-api
google-cloud-run
google-cloud-build
google-firestore
google-gemini-2.5-flash
google-genai-sdk
html
javascript
pillow
python
react
terraform
uvicorn
web-speech-api
websockets

Updates

Isha Shetye started this project — Mar 16, 2026 07:58 PM EDT
