Inspiration
Every month I'd look at my bank statement and wonder where my grocery money went. I tried expense apps, but they only tracked totals. They couldn't tell me that milk had quietly gone up 40% over six months, or that I was buying the same items repeatedly without realizing it. I wanted an AI that didn't just record my spending but actually understood it and talked to me about it like a smart friend would.
What it does
Financial Copilot turns every grocery receipt into actionable intelligence:
- Receipt Scanning - Upload a photo or PDF of any receipt. Gemini 2.5 Flash Vision extracts every line item, price, quantity, and store name automatically.
- Price Anomaly Detection - Every item is compared against your personal price history. If milk costs 30% more than usual, the AI agent asks: "Did you switch brands or buy a larger size?"
- Buying Cycle Prediction - After 3+ purchases of the same item, the app calculates your average restock frequency and predicts when you'll need it next.
- Predictive Grocery List - Automatically populated based on your detected buying cycles. No manual input needed.
- RAG-Grounded AI Agent - The conversational agent is injected with your real purchase history as context, so it answers from your data instead of hallucinating: every price and date it mentions is pulled directly from your Firestore database.
- Spending Analytics - Category breakdown, per-item price history, and spending trends across 7-, 30-, and 90-day windows.
How we built it
The backend is a single FastAPI Python file deployed to Google Cloud Run. Receipt images are processed by Gemini 2.5 Flash Vision using a strict JSON output prompt to extract structured line items. These are stored in Google Firestore keyed by user and receipt ID.
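Even with a strict JSON prompt, the model's reply still has to be parsed and checked before it is written to Firestore. Here is a minimal sketch of that step using only the standard library. The field names (`store`, `items`, `name`, `price`, `quantity`) and the `parse_receipt_json` helper are assumptions for illustration, not the project's actual schema.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class LineItem:
    name: str
    price: float
    quantity: float


@dataclass
class Receipt:
    store: str
    items: List[LineItem]


def parse_receipt_json(raw: str) -> Receipt:
    """Parse the model's text reply into a Receipt, raising on malformed output."""
    text = raw.strip()
    fence = "`" * 3
    # Models sometimes wrap JSON in a Markdown fence even when told not to.
    if text.startswith(fence):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    items = [
        LineItem(
            name=str(entry["name"]),
            price=float(entry["price"]),
            quantity=float(entry.get("quantity", 1)),  # assumed default of 1
        )
        for entry in data["items"]
    ]
    return Receipt(store=str(data["store"]), items=items)
```

Failing loudly on a missing key or a non-numeric price keeps a bad extraction from silently polluting the price history that the later analysis depends on.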
The intelligence layer is pure Python — an AnomalyDetector class compares current
prices against historical averages, and a CyclePredictor calculates average gaps
between purchases using time-series data from Firestore.
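The two classes can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the method names, the 30% default threshold (taken from the milk example above), flagging deviations in either direction, and rounding the average gap to whole days are all choices made here for the example.

```python
from datetime import date, timedelta
from statistics import mean
from typing import List, Optional


class AnomalyDetector:
    """Flags a price that deviates from the item's historical average."""

    def __init__(self, threshold: float = 0.30):
        self.threshold = threshold  # assumed default: 30% deviation

    def check(self, current_price: float, history: List[float]) -> Optional[float]:
        """Return the fractional change vs. the historical mean, or None if within threshold."""
        if not history:
            return None
        baseline = mean(history)
        if baseline <= 0:
            return None
        change = (current_price - baseline) / baseline
        return change if abs(change) >= self.threshold else None


class CyclePredictor:
    """Predicts the next purchase date from the average gap between past purchases."""

    def __init__(self, min_purchases: int = 3):
        self.min_purchases = min_purchases

    def next_purchase(self, purchase_dates: List[date]) -> Optional[date]:
        """Return the predicted restock date, or None with too little history."""
        if len(purchase_dates) < self.min_purchases:
            return None
        ordered = sorted(purchase_dates)
        gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
        return ordered[-1] + timedelta(days=round(mean(gaps)))
```

For example, three purchases spaced a week apart give an average gap of 7 days, so the predicted restock date is one week after the most recent purchase.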
When a user opens the voice agent, a RAG context is built fresh — pulling the latest receipt, detected anomalies, buying cycle predictions, and 30-day spending summary — and injected as the system prompt into Gemini 2.5 Flash. The agent responds through a WebSocket connection in real-time.
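One way to assemble that context is as plain text, rebuilt on every session. The sketch below assumes purchases arrive as dictionaries with `date`, `category`, `price`, and `quantity` keys, and that anomalies and predictions are already rendered as short strings; the function names and prompt wording are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional


def spending_summary(purchases: List[dict], today: date, window_days: int = 30) -> Dict[str, float]:
    """Sum spend per category for purchases dated within the last `window_days` days."""
    cutoff = today - timedelta(days=window_days)
    totals: Dict[str, float] = {}
    for p in purchases:
        if cutoff <= p["date"] <= today:
            totals[p["category"]] = totals.get(p["category"], 0.0) + p["price"] * p["quantity"]
    return totals


def build_system_prompt(
    latest_receipt: Optional[dict],
    anomalies: List[str],
    predictions: List[str],
    summary: Dict[str, float],
) -> str:
    """Assemble the grounding context as plain text for use as a system prompt."""
    lines = [
        "You are a grocery spending assistant.",
        "Use only the facts listed below. If the answer is not in them, say so.",
        "",
        "LATEST RECEIPT:",
    ]
    if latest_receipt:
        lines.append(f"{latest_receipt['store']} on {latest_receipt['date'].isoformat()}")
        for item in latest_receipt["items"]:
            lines.append(f"- {item['name']}: ${item['price']:.2f} x {item['quantity']}")
    else:
        lines.append("(none)")
    lines += ["", "PRICE ANOMALIES:"] + ([f"- {a}" for a in anomalies] or ["(none)"])
    lines += ["", "PREDICTED RESTOCKS:"] + ([f"- {p}" for p in predictions] or ["(none)"])
    lines += ["", "SPENDING, LAST 30 DAYS:"]
    lines += [f"- {cat}: ${total:.2f}" for cat, total in sorted(summary.items())] or ["(none)"]
    return "\n".join(lines)
```

Writing "(none)" for empty sections, rather than omitting them, tells the model explicitly that there is nothing to report, which leaves it less room to fill the gap with invented details.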
The frontend is a single React HTML file (no build tools needed) with a clean dark UI, Chart.js for analytics, and the Web Speech API for voice input. The entire infrastructure is provisioned with Terraform and deployed with a one-command shell script using Google Cloud Build.
Challenges we ran into
- Gemini model compatibility - The google-generativeai SDK was deprecated mid-build. Migrated to the new google-genai SDK and updated all API call patterns.
- Python 3.9 type hints - The dict | None union syntax isn't supported in Python 3.9 (it requires 3.10 or later). Fixed by removing the union type hints for backwards compatibility.
- Image processing - Different receipt formats (JPEG, PNG, HEIC from iPhone, PDF) needed different handling. Built a Pillow preprocessing pipeline that normalizes all formats to clean JPEG before sending to Gemini Vision.
- Gemini Live API model access - The gemini-2.0-flash-live-001 model wasn't available on our API key. Discovered the correct model name gemini-2.5-flash-native-audio-latest by querying the models list API.
What we learned
- Gemini 2.5 Flash Vision is production-ready for document parsing when given a strict JSON schema in the prompt: temperature 0.1 with explicit format instructions gives near-perfect structured output
- RAG is more powerful than fine-tuning for personalized agents: injecting user-specific context at inference time gives better results than any model training
- Firestore + Cloud Run is an excellent pairing for AI apps: both are serverless, both scale to zero, and Firestore's document model maps naturally to receipt data
- The Web Speech API (built into Chrome) is dramatically more accurate than any custom transcription approach, and it's completely free
- Terraform makes hackathon deployments reproducible: being able to terraform destroy and terraform apply from scratch saved hours of debugging
What's next for Financial Copilot
- Full Gemini Live API integration — real-time bidirectional audio streaming so the agent speaks back to you in a natural voice with true interruption support
- Multi-store price comparison — "You paid $5 for milk at Walmart but you got it for $3.50 at Aldi last month — want me to add Aldi to your route?"
- Inflation tracker — track how your personal grocery basket cost has changed month over month vs national CPI
- Shared household mode — multiple family members uploading to the same account
- Mobile app — wrap the web app in Capacitor for iOS/Android with native camera receipt scanning
Built With
- chart.js
- css
- docker
- fastapi
- gemini-vision-api
- google-cloud-run
- google-cloud-build
- google-firestore
- google-gemini-2.5-flash
- google-genai-sdk
- html
- javascript
- pillow
- python
- react
- terraform
- uvicorn
- web-speech-api
- websockets