Inspiration

Every month I'd look at my bank statement and wonder where my grocery money went. I tried expense apps, but they only tracked totals. They couldn't tell me that milk had quietly gone up 40% over six months, or that I was buying the same items repeatedly without realizing it. I wanted an AI that didn't just record my spending but actually understood it and talked to me about it like a smart friend would.

What it does

Financial Copilot turns every grocery receipt into actionable intelligence:

  • Receipt Scanning - Upload a photo or PDF of any receipt. Gemini 2.5 Flash Vision extracts every line item, price, quantity, and store name automatically.
  • Price Anomaly Detection - Every item is compared against your personal price history. If milk costs 30% more than usual, the AI agent asks: "Did you switch brands or buy a larger size?"
  • Buying Cycle Prediction - After 3+ purchases of the same item, the app calculates your average restock frequency and predicts when you'll need it next.
  • Predictive Grocery List - Automatically populated based on your detected buying cycles. No manual input needed.
  • RAG-Grounded AI Agent - The conversational agent is given your real purchase history as context, so every price and date it mentions is pulled directly from your Firestore database rather than hallucinated.
  • Spending Analytics - Category breakdown, per-item price history, and spending trends across 7, 30, and 90-day windows.

How we built it

The backend is a single FastAPI Python file deployed to Google Cloud Run. Receipt images are processed by Gemini 2.5 Flash Vision using a strict JSON output prompt to extract structured line items. These are stored in Google Firestore keyed by user and receipt ID.
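As a rough illustration of the extraction step, the sketch below parses a strict-JSON model reply into typed line items and builds a storage path keyed by user and receipt ID. The field names (`store`, `items`, `name`, `price`, `quantity`), the `parse_receipt` and `receipt_path` helpers, and the `users/.../receipts/...` layout are assumptions made for this example, not the project's actual schema. It sticks to `typing` imports so it also runs on Python 3.9.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LineItem:
    name: str
    price: float
    quantity: float


def parse_receipt(raw: str) -> Tuple[str, List[LineItem]]:
    """Parse a JSON reply into (store name, line items).

    Assumed (hypothetical) shape:
    {"store": "...", "items": [{"name": "...", "price": 1.0, "quantity": 1}]}
    """
    fence = "`" * 3
    text = raw.strip()
    # Models sometimes wrap JSON in a Markdown fence even when told not to.
    if text.startswith(fence):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[len("json"):]
    data = json.loads(text)
    items = [
        LineItem(
            name=str(entry["name"]),
            price=float(entry["price"]),
            quantity=float(entry.get("quantity", 1)),  # default to one unit
        )
        for entry in data["items"]
    ]
    return str(data["store"]), items


def receipt_path(user_id: str, receipt_id: str) -> str:
    """Hypothetical document path keyed by user and receipt ID."""
    return f"users/{user_id}/receipts/{receipt_id}"
```

Parsing into a dataclass up front means a malformed reply fails loudly at `json.loads` or at the `float` conversion, before anything is written to the database.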

The intelligence layer is pure Python — an AnomalyDetector class compares current prices against historical averages, and a CyclePredictor calculates average gaps between purchases using time-series data from Firestore.
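A minimal sketch of how these two classes might look is shown here. Only the class names and the three-purchase minimum come from the description above; the 30% default threshold, the method names, and the choice to flag price drops as well as increases are assumptions for illustration.

```python
from datetime import date, timedelta
from statistics import mean
from typing import List, Optional


class AnomalyDetector:
    """Flags a price that deviates from the item's historical average."""

    def __init__(self, threshold: float = 0.30):  # assumed default
        self.threshold = threshold

    def check(self, current_price: float, past_prices: List[float]) -> Optional[float]:
        """Return the fractional change if it meets the threshold, else None."""
        if not past_prices:
            return None
        baseline = mean(past_prices)
        if baseline <= 0:
            return None
        change = (current_price - baseline) / baseline
        # Assumption: flag changes in either direction.
        return change if abs(change) >= self.threshold else None


class CyclePredictor:
    """Predicts the next purchase date from the average gap between purchases."""

    def __init__(self, min_purchases: int = 3):
        self.min_purchases = min_purchases

    def next_purchase(self, purchase_dates: List[date]) -> Optional[date]:
        if len(purchase_dates) < self.min_purchases:
            return None  # not enough history to estimate a cycle
        ordered = sorted(purchase_dates)
        gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
        return ordered[-1] + timedelta(days=round(mean(gaps)))
```

For example, three weekly purchases on 1, 8, and 15 January 2024 give an average gap of 7 days, so `next_purchase` returns 22 January 2024.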

When a user opens the voice agent, a fresh RAG context is built from the latest receipt, detected anomalies, buying cycle predictions, and a 30-day spending summary, then injected as the system prompt into Gemini 2.5 Flash. The agent responds over a WebSocket connection in real time.
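The context-building step could be sketched roughly as follows. The `build_system_prompt` function, its parameters, the dictionary keys, and the prompt wording are all hypothetical; the sketch only shows the idea of assembling user-specific facts into one system prompt string, and leaves out the Firestore reads and the model call.

```python
from typing import Dict, List


def build_system_prompt(
    latest_receipt: Dict,
    anomalies: List[str],
    predictions: List[str],
    spend_30d: float,
) -> str:
    """Assemble user-specific facts into a single system prompt string."""
    items = ", ".join(
        f"{item['name']} (${item['price']:.2f})" for item in latest_receipt["items"]
    )
    sections = [
        "You are a grocery spending assistant. Use only the facts below; "
        "if something is not listed, say you do not know.",
        f"Latest receipt: {latest_receipt['store']} on {latest_receipt['date']}: {items}",
        "Price anomalies: " + ("; ".join(anomalies) or "none"),
        "Predicted restocks: " + ("; ".join(predictions) or "none"),
        f"Spending, last 30 days: ${spend_30d:.2f}",
    ]
    return "\n".join(sections)
```

Rebuilding this string each time the agent opens keeps the prompt in step with the newest receipt, at the cost of a few database reads per session.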

The frontend is a single React HTML file (no build tools needed) with a clean dark UI, Chart.js for analytics, and the Web Speech API for voice input. The entire infrastructure is provisioned with Terraform and deployed with a one-command shell script using Google Cloud Build.

Challenges we ran into

  • Gemini model compatibility - The google-generativeai SDK was deprecated mid-build. Migrated to the new google-genai SDK and updated all API call patterns.
  • Python 3.9 type hints - The dict | None union syntax isn't supported in Python 3.9. Fixed by removing the union type hints for backwards compatibility.
  • Image processing - Different receipt formats (JPEG, PNG, HEIC from iPhone, PDF) needed different handling. Built a Pillow preprocessing pipeline that normalizes all formats to clean JPEG before sending to Gemini Vision.
  • Gemini Live API model access - The gemini-2.0-flash-live-001 model wasn't available on our API key. Discovered the correct model name gemini-2.5-flash-native-audio-latest by querying the models list API.

Accomplishments that we're proud of

What we learned

  • Gemini 2.5 Flash Vision is production-ready for document parsing when given a strict JSON schema in the prompt: temperature 0.1 with explicit format instructions gave us near-perfect structured output
  • RAG suited our personalized agent better than fine-tuning: injecting user-specific context at inference time gave us grounded answers with no model training
  • Firestore + Cloud Run is an excellent pairing for AI apps: both are serverless, both scale to zero, and Firestore's document model maps naturally to receipt data
  • The Web Speech API (built into Chrome) was far more accurate in our experience than a custom transcription approach, and it's completely free
  • Terraform makes hackathon deployments reproducible: being able to run terraform destroy and terraform apply from scratch saved hours of debugging

What's next for Financial Copilot

  • Full Gemini Live API integration — real-time bidirectional audio streaming so the agent speaks back to you in a natural voice with true interruption support
  • Multi-store price comparison — "You paid $5 for milk at Walmart but you got it for $3.50 at Aldi last month — want me to add Aldi to your route?"
  • Inflation tracker — track how your personal grocery basket cost has changed month over month vs national CPI
  • Shared household mode — multiple family members uploading to the same account
  • Mobile app — wrap the web app in Capacitor for iOS/Android with native camera receipt scanning
