Inspiration

It's tax season, and it's the same story every year: hundreds of physical receipts have piled up, and I feel bad handing the whole stack to my CPA.

I knew there had to be a better way and an opportunity to make this process easier for everyone. I wanted to create a "digital CPA" that could not only see and read my receipts in real time but also understand the nuances of tax categorization, saving me time and making my accountant much happier.

What it does

Blue Hills Tax Automator is a multimodal AI agent that transforms messy, physical receipts into structured, audit-ready tax data.

Users simply snap a photo of a receipt. Using Gemini 3 Flash's vision capabilities, the agent extracts the vendor, date, and amount. It then goes a step further and reasons about the purchase to assign the expense to a standard IRS Schedule C category. All data is stored securely in Firestore, creating a real-time ledger that can generate a full tax-season report from a single voice command.
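To make the extraction step concrete, here is a minimal sketch of how the model's output could be validated before it reaches the ledger. It assumes the model is prompted to return JSON with `vendor`, `date`, `amount`, and `category` keys. That shape, the `ReceiptRecord` type, and the `parse_extraction` helper are all hypothetical, and the category set is only an illustrative subset of Schedule C expense lines, not the project's actual list.

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

# Illustrative subset of Schedule C expense categories (assumption:
# the real agent may use a longer or differently worded list).
SCHEDULE_C_CATEGORIES = {
    "Advertising",
    "Car and truck expenses",
    "Office expense",
    "Supplies",
    "Travel",
    "Utilities",
    "Other expenses",
}


@dataclass(frozen=True)
class ReceiptRecord:
    """One validated ledger row (hypothetical shape)."""
    vendor: str
    purchased_on: date
    amount: Decimal
    category: str


def parse_extraction(raw_json: str) -> ReceiptRecord:
    """Validate the model's JSON output and convert it to a typed record."""
    data = json.loads(raw_json)

    category = data["category"]
    if category not in SCHEDULE_C_CATEGORIES:
        raise ValueError(f"Unknown category: {category!r}")

    # Decimal avoids floating-point rounding on currency values.
    amount = Decimal(str(data["amount"]))
    if amount <= 0:
        raise ValueError("Amount must be positive")

    return ReceiptRecord(
        vendor=data["vendor"].strip(),
        purchased_on=date.fromisoformat(data["date"]),  # expects YYYY-MM-DD
        amount=amount,
        category=category,
    )
```

Rejecting unknown categories at this boundary means a hallucinated label fails loudly instead of landing in the ledger.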

How I built it

The project is built on a robust Google-native stack:

Orchestration: Built with the Google Agent Development Kit (ADK) to manage the agentic workflow and tool calls.

Intelligence: Utilizes Gemini 3 Flash for its low-latency multimodal processing, allowing it to "see" and "reason" simultaneously.

Infrastructure: Deployed on Google Cloud Run with Firebase handling authentication and database needs.

Development: Developed entirely within Google Antigravity, utilizing "vibe coding" principles to iterate rapidly on complex agentic logic.

Challenges I faced

The road to a functional agent wasn't as smooth as the initial "vibe coding" sessions suggested. I faced a steep learning curve integrating the Google ADK with the broader Google Cloud infrastructure. There was significant back-and-forth: debugging service account permissions, managing Vertex AI quotas, and making sure the environment variables in Antigravity matched the ones in the Cloud Run deployment.
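One way to catch environment drift early is a startup check that fails fast when a required variable is missing. The sketch below is an assumption about how that could look, not the project's actual code. The three variable names are the ones commonly used to point the Google Gen AI SDK at Vertex AI, but which variables this deployment really needs is not stated above, and `missing_env_vars` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional, Sequence

# Assumed list of required variables; adjust to the real deployment.
REQUIRED_VARS = (
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_LOCATION",
    "GOOGLE_GENAI_USE_VERTEXAI",
)


def missing_env_vars(
    env: Optional[Mapping[str, str]] = None,
    required: Sequence[str] = REQUIRED_VARS,
) -> list:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```

Running the same check locally and in the container's entrypoint surfaces a mismatch before the first model call rather than in the middle of a request.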

Furthermore, real-time image integration remains a persistent challenge. While the agent's "eyes" work well with static uploads, achieving a seamless, low-latency live feed that allows the agent to process receipts as they move through the camera's view is an ongoing optimization. Balancing high-resolution extraction with the speed required for a "live" interaction has been a masterclass in multimodal trade-offs.
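The resolution-versus-speed trade-off can be illustrated with simple arithmetic: capping the longest edge of each frame shrinks the payload while keeping the aspect ratio. This is a hypothetical sketch. The `fit_within` helper and the idea of a fixed `max_edge` cap are assumptions, not something the current build is known to use.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_edge: int) -> Tuple[int, int]:
    """Scale (width, height) down so the longest edge is at most max_edge.

    Frames already within the cap are returned unchanged; the aspect
    ratio is preserved up to integer rounding.
    """
    if min(width, height, max_edge) <= 0:
        raise ValueError("Dimensions and max_edge must be positive")

    longest = max(width, height)
    if longest <= max_edge:
        return width, height

    scale = max_edge / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, capping a 4000x3000 frame at 1000 pixels gives 1000x750, which is one sixteenth of the original pixel count. A live preview could use a small cap for speed and re-send the full-resolution still only when the small version proves hard to read.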

Accomplishments that I'm proud of

I am incredibly proud of creating a system that handles "Decision Boundaries" effectively. It doesn't just blindly save data; it understands the difference between a business-related AutoZone trip and a personal grocery run. Seeing the agent successfully process a crumpled receipt and correctly categorize it in Firestore for the first time was a huge "aha" moment.
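A decision boundary like this can be pictured as a small routing rule that sits between classification and storage. The sketch below is illustrative only. It assumes the classifier reports a business/personal flag plus a confidence score between 0 and 1, and the `Classification` type, the `decide` function, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    SAVE_AS_BUSINESS = "save_as_business"
    SKIP_PERSONAL = "skip_personal"
    ASK_USER = "ask_user"


@dataclass(frozen=True)
class Classification:
    is_business: bool
    confidence: float  # assumed to be in [0, 1]


def decide(result: Classification, threshold: float = 0.8) -> Decision:
    """Route a classified receipt: save it, skip it, or ask the user."""
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")

    # Below the threshold the agent does not guess; it asks a human.
    if result.confidence < threshold:
        return Decision.ASK_USER

    return Decision.SAVE_AS_BUSINESS if result.is_business else Decision.SKIP_PERSONAL
```

The explicit `ASK_USER` branch is the point of the design: an ambiguous receipt goes to a person instead of being filed on a guess.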

What I learned

This project was a deep dive into the world of agentic systems. I learned how to move beyond basic prompting and into governance-first architecture, where AI agents are treated as digital employees with specific roles, permissions, and audit trails.
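As a rough illustration of the "digital employee" idea, the sketch below gives each agent role an allow-list of tools and records every attempt, allowed or not, in an audit trail. The role names, tool names, and the `authorize` helper are hypothetical, and a real deployment would persist the log rather than keep it in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set

# Hypothetical roles and the tools each one may call.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "receipt_intake": {"extract_receipt", "save_expense"},
    "report_reader": {"generate_report"},
}


@dataclass
class AuditLog:
    """In-memory audit trail; every authorization attempt is recorded."""
    entries: List[dict] = field(default_factory=list)

    def record(self, role: str, tool: str, allowed: bool) -> None:
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "role": role,
            "tool": tool,
            "allowed": allowed,
        })


def authorize(role: str, tool: str, log: AuditLog) -> None:
    """Log the attempt, then raise PermissionError if the role lacks the tool."""
    allowed = tool in ROLE_PERMISSIONS.get(role, set())
    log.record(role, tool, allowed)
    if not allowed:
        raise PermissionError(f"Role {role!r} may not call {tool!r}")
```

Logging before raising means denied attempts show up in the trail too, which is often the part an auditor cares about most.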

What's next for Blue Hills Tax Automator

The goal is to expand the agent's capabilities. My plan is to create an agent for my accountant, so I'm exploring A2A (Agent-to-Agent) protocols to enable my Tax Automator to communicate directly with my accountant's AI. I also intend to develop a predictive model for estimating expenses and tax returns.

Built With

  • antigravity
  • fastapi
  • firebase-authentication
  • firebase-firestore
  • gemini-3-flash
  • google
  • google-agent-development-kit-(adk)
  • google-cloud-run
  • next.js
  • python
  • tailwind-css
  • vertex-ai