Taxia
Links
Presentation
Video
Blog on https://builder.aws.com/ and use the tag Amazon-Nova
Repositories
Architecture Diagram
Live Demo URL
Inspiration
Tax compliance is one of the most universal sources of anxiety across the globe. In Spain alone, 4.7 million self-employed workers (autónomos) and 3.2 million SMEs navigate a labyrinth of obligations every quarter — Modelo 303 for VAT, Modelo 130 for income estimates, Modelo 200 for corporate tax — each with different deadlines, different rules, and penalties that can reach 150% of the unpaid amount. In the United States, the IRS estimates that taxpayers spend an average of 13 hours preparing a single return, and small businesses spend between $3,000 and $8,000 per year on professional tax advisory.
We kept asking: why does this process still feel like it belongs in 1995?
The inspiration for TAXIA came from a personal frustration shared by every member of our team. One of us is a freelance developer who moved from Colombia to Spain and had to figure out Spanish tax obligations from scratch — in a second language, with no context, and no affordable help. Another manages finances for a small startup and spends two full days every quarter just preparing VAT filings. A third member's family runs a gestoría (tax management firm) in Madrid, manually copying data between client spreadsheets and the Agencia Tributaria portal for dozens of clients.
When Amazon announced the Nova AI Hackathon, we saw the opportunity to build something real. Nova's portfolio — voice AI, multimodal document understanding, browser automation, and semantic search — mapped perfectly onto every pain point in the tax compliance journey. No single AI capability solves the problem alone. But together? We could build a system that talks to you about your taxes, reads your receipts, understands your W-2, calculates your obligations, searches tax law for answers, and files your return on the government portal — all without you ever touching a spreadsheet.
That's TAXIA: your AI tax team, from first question to filed return.
What it does
TAXIA is an end-to-end AI-powered tax compliance platform that serves three audiences:
For individuals (B2C): TAXIA replaces the expensive tax advisor or the terrifying DIY approach. You connect your bank, upload your documents, and talk to your AI assistant — by voice or text, in English, Spanish, Portuguese, French, German, Italian, or Hindi. TAXIA figures out what you owe, generates the paperwork, and files it for you.
For businesses and startups (B2B): TAXIA acts as a fractional tax department. It onboards your entity, understands your activity and jurisdiction, monitors obligations, classifies every transaction, generates compliance reports, and automates filings — reducing tax compliance costs by up to 80%.
For gestores and finance managers (multi-tenant B2B): TAXIA is a practice management platform. One login manages all client accounts — personal and business — with AI doing the heavy lifting while the professional reviews and approves. A solo practitioner can scale from 20 clients to 200.
The complete user journey:
- Voice onboarding — Call TAXIA or open the web app and start talking. Nova 2 Sonic conducts a natural conversation: "Are you filing as an individual or a business? What's your activity? Where are you located?" It builds your tax profile while you talk, hands-free, in your preferred language.
- Document scanning — Photograph receipts, invoices, or tax forms. Nova 2 Lite's multimodal vision extracts every field — merchant names, amounts, dates, tax IDs — and auto-categorizes them as deductible expenses. Upload a W-2 or 1099 PDF, and it extracts every box value instantly.
- Intelligent classification — Import your bank transactions (CSV or bank API). Nova 2 Lite's tool-calling capability classifies each transaction: office supplies, meals, travel, equipment, personal. Conservative by default — uncertain items get flagged for your review.
- Tax calculation with deep reasoning — Nova 2 Lite's extended thinking capability analyzes your complete financial picture: compares itemized vs. standard deductions, checks AMT thresholds, calculates federal and state liabilities bracket-by-bracket, and projects estimated quarterly payments. The code interpreter runs precise arithmetic — no rounding errors.
- Knowledge-grounded Q&A — Ask "Can I deduct my home office?" and TAXIA searches a knowledge base of tax regulations using Nova Multimodal Embeddings — the only embedding model that unifies text, images, documents, video, and audio in a single vector space. Answers come with citations to specific IRS publications or AEAT circulars.
- Live law updates — Nova 2 Lite's web grounding searches official sources in real-time for the latest deadline changes, new deduction rules, or regulatory announcements — always cited.
- Automated filing — When you're ready, Nova Act opens the government tax portal, navigates the interface, fills in every field with your calculated data, and pauses for your approval before submitting. Credentials are handled securely through Playwright (never sent to the AI model). For gestores, a fleet of parallel Nova Act agents can file 10+ client returns simultaneously, each with its own human-in-the-loop checkpoint.
- PDF reports and audit trails — TAXIA generates professional tax summary PDFs, compliance reports, and maintains a complete audit trail including video recordings of every Nova Act filing session.
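The bracket-by-bracket calculation mentioned in the journey above can be sketched in a few lines. This is a minimal illustration, not TAXIA's actual calculator: the bracket thresholds and rates are made-up placeholders (not real IRS or AEAT figures), and the function name `progressive_tax` is hypothetical. It uses `Decimal` so amounts are exact decimal values rather than binary floats.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical progressive schedule: (upper bound of bracket, marginal rate).
# Illustrative numbers only; these are NOT real IRS or AEAT brackets.
ILLUSTRATIVE_BRACKETS = [
    (Decimal("10000"), Decimal("0.10")),
    (Decimal("40000"), Decimal("0.20")),
    (None, Decimal("0.30")),  # None marks the open-ended top bracket
]


def progressive_tax(taxable_income, brackets=ILLUSTRATIVE_BRACKETS):
    """Return (total_tax, breakdown) for a progressive schedule."""
    income = Decimal(str(taxable_income))
    lower = Decimal("0")
    total = Decimal("0")
    breakdown = []  # one (lower, top, rate, tax) tuple per bracket touched
    for upper, rate in brackets:
        if income <= lower:
            break
        top = income if upper is None else min(income, upper)
        slice_tax = (top - lower) * rate
        breakdown.append((lower, top, rate, slice_tax))
        total += slice_tax
        if upper is None:
            break
        lower = upper
    # Round once, at the end, to whole cents.
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP), breakdown
```

With the placeholder schedule, an income of 45,000 is taxed as 10,000 at 10%, 30,000 at 20%, and 5,000 at 30%. Keeping the per-bracket breakdown makes each step easy to show to the user or a reviewer.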
The platform is fully real-time: WebSocket connections push every AI response, filing progress update, and document extraction result to the frontend the instant it happens. Voice conversations use WebRTC for sub-100ms audio latency. The experience feels instant.
How we built it
Architecture
TAXIA is a four-service architecture with clean separation of concerns:
| Service | Tech | Role |
|---|---|---|
| taxia-web | Next.js 15, Bun, TypeScript, CSS Modules | Frontend: dashboard, chat, voice, document management |
| taxia-back | Rust (Axum + Tungstenite + Tonic) | Backend: REST API, WebSocket server, gRPC bridge, WebRTC signaling |
| taxia-back-db | Rust, PostgreSQL, sqlx | Database: migrations, queries, connection pooling |
| taxia-ai | Python, Strands Agents SDK | AI: multi-agent orchestration with all four Nova models |
Amazon Nova integration — all four models, 30+ capabilities
Nova 2 Lite (us.amazon.nova-2-lite-v1:0) is TAXIA's reasoning core. We use 10 distinct sub-capabilities within a single model:
- Text reasoning for conversational tax Q&A
- Image analysis with native OCR for receipt and invoice scanning (supports up to 8,000×8,000px resolution)
- PDF/document understanding via vision-based parsing of W-2s, 1099s, and financial statements (up to 400 pages in the 1M token context window)
- Video understanding for summarizing recorded tax consultation sessions
- Tool use with constrained decoding for transaction classification and tax calculations
- Web grounding (nova_grounding system tool) for live tax law updates with citations
- Code interpreter (nova_code_interpreter system tool) for precise arithmetic: bracket calculations, depreciation schedules, amortization
- Extended thinking at configurable effort levels for complex multi-factor tax optimization (itemized vs. standard, AMT analysis)
- MCP support for connecting to external accounting systems
- 1M token context window for analyzing a full year of financial data in a single request
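As an illustration of the tool-use item in this list, here is a minimal sketch of a transaction-classification tool definition plus a conservative review policy. The dictionary follows the general `toolSpec` / `inputSchema` shape used by the Bedrock Converse API; the tool name, the `confidence` field, and the 0.8 threshold are assumptions made for this example, not details taken from the project.

```python
CATEGORIES = ["office_supplies", "meals", "travel", "equipment", "personal"]

# Tool definition in the toolSpec shape used by the Bedrock Converse API.
# The tool name and its fields are hypothetical.
CLASSIFY_TOOL = {
    "toolSpec": {
        "name": "classify_transaction",
        "description": "Assign one expense category to a bank transaction.",
        "inputSchema": {
            "json": {
                "type": "object",
                "properties": {
                    "category": {"type": "string", "enum": CATEGORIES},
                    "confidence": {"type": "number", "minimum": 0, "maximum": 1},
                },
                "required": ["category", "confidence"],
            }
        },
    }
}


def review_decision(tool_input, threshold=0.8):
    """Accept a classification only if it is well-formed and confident enough."""
    category = tool_input.get("category")
    confidence = tool_input.get("confidence")
    # Malformed or unknown output is never trusted; it goes to human review.
    if category not in CATEGORIES or not isinstance(confidence, (int, float)):
        return {"category": None, "needs_review": True}
    return {"category": category, "needs_review": confidence < threshold}
```

Validating the model's tool input again on the application side means a malformed or low-confidence answer is flagged for review instead of silently becoming a deduction.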
Nova 2 Sonic (amazon.nova-2-sonic-v1:0) powers voice interactions via bidirectional streaming over HTTP/2:
- Polyglot voices: Tiffany voice speaks 7 languages and switches mid-conversation without lag
- Asynchronous tool calling: continues talking while tax lookups run in the background
- Configurable turn-taking: MEDIUM sensitivity for general Q&A, HIGH for quick confirmations
- Emotional awareness: calmer tone when users are stressed about audits
- 8 kHz telephony support: works over standard phone lines via Amazon Connect
Nova Act (nova-act Python SDK) automates government portal filing:
- Natural language + Python hybrid: nova.act("Fill the gross income field with 75000") for navigation, nova.page.keyboard.type(password) for secure credential entry
- Human-in-the-loop checkpoints: custom HumanInputCallbacksBase implementation sends screenshots to the frontend for gestor approval before submission
- Parallel agent fleets: ThreadPoolExecutor runs up to 10 simultaneous filing sessions for batch gestor workflows
- Video recording: every filing session is recorded for audit compliance
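The parallel fleet with per-return approval described above can be sketched with the standard library alone. In this example `file_return` is a hypothetical stand-in for one browser filing session (it does not call the Nova Act SDK), and `approve` is any callable that represents the human checkpoint.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def file_return(client_id, approve):
    """Hypothetical stand-in for one browser filing session."""
    draft = {"client_id": client_id, "status": "filled"}
    # Human-in-the-loop checkpoint: nothing is submitted without approval.
    if not approve(draft):
        return {"client_id": client_id, "status": "rejected"}
    return {"client_id": client_id, "status": "submitted"}


def file_batch(client_ids, approve, max_workers=10):
    """Run filing sessions in parallel and collect one result per client."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(file_return, cid, approve): cid for cid in client_ids}
        for future in as_completed(futures):
            cid = futures[future]
            try:
                results[cid] = future.result()
            except Exception as exc:
                # One failing session must not abort the rest of the batch.
                results[cid] = {"client_id": cid, "status": "error", "detail": str(exc)}
    return results
```

Collecting results per client, including errors, lets a gestor see which returns were submitted, rejected, or need a retry after the batch finishes.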
Nova Multimodal Embeddings (amazon.nova-2-multimodal-embeddings-v1:0) powers the tax knowledge base:
- 5-modality embeddings (text, image, document, video, audio) in a unified 1024-dimensional vector space
- Bedrock Knowledge Bases integration for managed RAG — tax regulations, IRS publications, AEAT circulars indexed and retrievable
- Cross-modal search — query "charitable donation receipt" and retrieve matching receipt images alongside relevant tax code sections
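Cross-modal retrieval in a shared vector space reduces to a nearest-neighbor search over embeddings. The sketch below shows the idea with cosine similarity over a toy in-memory index. The 3-dimensional vectors and item ids are invented for readability (the real space is 1024-dimensional), and in the described setup a managed knowledge base would do this work instead of hand-written code.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, index, top_k=2):
    """Return the top_k index items most similar to the query vector."""
    scored = [(cosine(query_vec, item["vector"]), item) for item in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:top_k]]


# Toy index: items of different modalities share one vector space.
TOY_INDEX = [
    {"id": "receipt_photo", "modality": "image", "vector": [0.9, 0.1, 0.0]},
    {"id": "donation_rules", "modality": "text", "vector": [0.8, 0.3, 0.1]},
    {"id": "consultation_clip", "modality": "video", "vector": [0.0, 0.1, 0.9]},
]
```

Because images and text live in the same space, a single query vector can return a receipt photo and a regulation passage in the same result list.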
Multi-agent orchestration
We used the Strands Agents SDK with the Graph pattern for deterministic routing. An Intake Agent classifies every request and routes it to the appropriate specialist:
User message → Intake Agent (classify)
├→ Document Agent (Nova 2 Lite multimodal)
├→ Tax Calc Agent (Nova 2 Lite + extended thinking + code interpreter)
├→ Knowledge Agent (Nova MME + Bedrock KB RAG)
├→ Web Grounding Agent (Nova 2 Lite web grounding)
├→ Classification Agent (Nova 2 Lite tool calling)
├→ Filing Agent (Nova Act browser automation)
└→ General Agent (conversation, onboarding)
Each agent has its own system prompt, tool set, and model configuration optimized for its task. The Tax Calc Agent uses maxReasoningEffort: "high" for complex scenarios; the Classification Agent uses temperature: 0 for consistent categorization.
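The deterministic routing and per-agent configuration can be illustrated with a plain lookup table. This sketch does not use the Strands Agents SDK; the intent labels, the `AgentConfig` class, and the task kinds in `EFFORT_BY_KIND` are hypothetical names chosen for the example. The effort tiers mirror the low, medium, and high levels the write-up describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentConfig:
    name: str
    temperature: Optional[float] = None     # None = leave the model default
    reasoning_effort: Optional[str] = None  # "low", "medium", or "high"


# Hypothetical intent labels produced by the Intake Agent.
AGENTS = {
    "document": AgentConfig("Document Agent"),
    "tax_calc": AgentConfig("Tax Calc Agent", reasoning_effort="high"),
    "knowledge": AgentConfig("Knowledge Agent"),
    "web_grounding": AgentConfig("Web Grounding Agent"),
    "classification": AgentConfig("Classification Agent", temperature=0.0),
    "filing": AgentConfig("Filing Agent"),
    "general": AgentConfig("General Agent"),
}

# Hypothetical task kinds mapped to reasoning effort tiers.
EFFORT_BY_KIND = {
    "lookup": "low",
    "standard_calculation": "medium",
    "multi_factor_optimization": "high",
}


def route(intents):
    """Map intake labels to specialists in order; unknown labels go to General."""
    plan = [AGENTS.get(label, AGENTS["general"]) for label in intents]
    return plan or [AGENTS["general"]]


def effort_for(kind):
    """Pick a reasoning effort tier; defaulting to medium is an assumption."""
    return EFFORT_BY_KIND.get(kind, "medium")
```

A request that spans two specialists, such as a document upload followed by a calculation, becomes an ordered plan like `route(["document", "tax_calc"])`, which keeps the routing predictable and easy to audit.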
Real-time infrastructure
The Rust backend is the communication hub:
- WebSocket server (tokio-tungstenite) — persistent bidirectional connection for chat streaming, filing progress, and document extraction events
- gRPC (tonic) — high-performance binary protocol between backend and AI service, with streaming RPCs for voice and filing events
- WebRTC signaling — negotiates peer connections for sub-100ms voice latency between browser microphone and Nova 2 Sonic
- Event bus (tokio broadcast channels) — internal pub/sub for cross-cutting real-time events
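To make the multi-hop streaming path concrete, here is a small asyncio sketch in which chunks pass through two relay stages before reaching a consumer. It stands in for the gRPC and WebSocket hops rather than implementing them: the queues, the sequence numbers, and the `None` end-of-stream sentinel are assumptions of this example. Bounded queues provide backpressure, so a slow consumer slows the producer instead of letting buffers grow.

```python
import asyncio


async def produce(tokens, out_q):
    """Emit (sequence, token) pairs, then an end-of-stream sentinel."""
    for seq, tok in enumerate(tokens):
        await out_q.put((seq, tok))
    await out_q.put(None)


async def relay(in_q, out_q):
    """Forward chunks unchanged, failing loudly on gaps or reordering."""
    expected = 0
    while True:
        item = await in_q.get()
        if item is None:
            await out_q.put(None)
            return
        seq, _ = item
        if seq != expected:
            raise RuntimeError(f"chunk out of order: got {seq}, expected {expected}")
        expected += 1
        await out_q.put(item)


async def collect(in_q):
    """Reassemble the streamed tokens into the final text."""
    parts = []
    while (item := await in_q.get()) is not None:
        parts.append(item[1])
    return "".join(parts)


async def run_pipeline(tokens, maxsize=2):
    q1, q2, q3 = (asyncio.Queue(maxsize=maxsize) for _ in range(3))
    results = await asyncio.gather(
        produce(tokens, q1), relay(q1, q2), relay(q2, q3), collect(q3)
    )
    return results[-1]
```

Calling `asyncio.run(run_pipeline(["Your ", "return ", "is ", "ready."]))` returns the reassembled sentence. The per-hop sequence check is one simple way to surface dropped or reordered chunks early.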
Database
PostgreSQL with a multi-tenant schema from day one. The accounts table supports three types — personal, business, and client — linked to users through account_members with role-based access (owner, admin, member, viewer). A gestor logs in once and switches between their personal account, their firm's business account, and any client accounts they manage.
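The role-based membership check can be sketched as follows. This is an in-memory illustration only: the `memberships` dictionary stands in for rows of account_members, and the ordering of roles (owner above admin above member above viewer) is an assumption, since the write-up lists the roles without ranking them.

```python
# Assumed privilege ordering; the source lists the roles but not their rank.
ROLE_RANK = {"viewer": 0, "member": 1, "admin": 2, "owner": 3}


def has_access(memberships, user_id, account_id, required_role="viewer"):
    """True if the user belongs to the account with at least the required role."""
    role = memberships.get((user_id, account_id))
    if role is None:
        return False  # no membership row means no access at all
    return ROLE_RANK[role] >= ROLE_RANK[required_role]


def switchable_accounts(memberships, user_id):
    """Accounts the user can switch between after a single login."""
    return sorted(acct for (uid, acct) in memberships if uid == user_id)


# Hypothetical rows: (user_id, account_id) -> role
EXAMPLE_MEMBERSHIPS = {
    ("gestor_1", "personal_gestor_1"): "owner",
    ("gestor_1", "firm_madrid"): "admin",
    ("gestor_1", "client_a"): "member",
    ("client_user", "client_a"): "owner",
}
```

In this example the gestor can switch between three accounts but holds only the member role on the client account, so an action that requires admin rights there is refused.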
Frontend
Next.js 15 with App Router, server components where possible, and client components for real-time features. Custom SVG icon components referencing Material UI's Outlined set (zero external icon library dependencies). CSS Modules for scoped styling with CSS custom properties for theming (light/dark mode). Geist variable font for clean, modern typography.
All services are containerized with individual Dockerfiles and composed via a root docker-compose.yml that brings up PostgreSQL, runs migrations, starts the Rust backend, launches the Python AI service, and serves the Next.js frontend, all with a single docker compose up.
Challenges we ran into
1. Bridging WebRTC audio to Nova 2 Sonic's streaming API. Nova 2 Sonic expects PCM audio via its bidirectional HTTP/2 stream, but WebRTC delivers Opus-encoded audio via RTP. We had to build an audio bridge in Python that decodes Opus frames to raw PCM, resamples them to 16 kHz, chunks them appropriately, and forwards them to Sonic's audioInput events, while simultaneously routing Sonic's audioOutput back through the WebRTC peer connection. Getting the sample rate conversion right without introducing perceptible latency was the hardest real-time engineering challenge we faced.
2. Nova Act and government portal variability. Tax portals are notoriously inconsistent — different layouts across jurisdictions, unpredictable load times, CAPTCHAs, multi-factor authentication prompts. We learned that keeping Nova Act workflows under 30 steps per sequence dramatically improves reliability, and that using nova.page (Playwright) directly for all sensitive input (passwords, SSNs, tax IDs) is critical for security. The human-in-the-loop callbacks became essential — not just for approval, but for handling CAPTCHA challenges and MFA that the agent can't bypass.
3. Multi-agent state coordination. When the Document Agent extracts data from a W-2 and the Tax Calc Agent needs that data for calculation, the state handoff between agents had to be clean and reliable. Strands' Graph pattern helped, but we still needed a shared state store (PostgreSQL-backed) that agents could read from and write to atomically. Getting the orchestrator to route correctly — especially for ambiguous requests like "I uploaded my W-2, now calculate my taxes" that span two agents — required careful prompt engineering on the Intake Agent.
4. Extended thinking token costs. Nova 2 Lite's extended thinking with maxReasoningEffort: "high" produces remarkably accurate tax calculations, but reasoning tokens are billed at output rates and can reach 128K tokens for complex scenarios. We implemented a tiered strategy: "low" effort for simple lookups, "medium" for standard calculations, and "high" only for multi-factor optimization problems (itemized vs. standard with AMT consideration). This reduced our per-calculation cost by roughly 70% without sacrificing accuracy on routine queries.
5. Real-time streaming through three layers. A single chat response flows: Nova 2 Lite → Python gRPC stream → Rust gRPC client → Rust WebSocket → Browser. Each hop adds latency and complexity. Getting token-by-token streaming to work smoothly through the entire chain — without buffering artifacts, dropped chunks, or race conditions — required careful async programming in both Rust (tokio) and Python (asyncio/grpcio).
6. Multi-tenant data isolation. Every query to the AI must be scoped to the correct account. When a gestor switches from Client A to Client B, the entire AI context — knowledge base filters, document access, tax profile — must switch atomically. We solved this by threading account_id through every gRPC call and enforcing row-level security at the database layer.
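For the audio bridge in challenge 1, the resample-and-chunk step can be sketched with the standard library. This assumes the Opus decoder outputs 16-bit mono PCM at 48 kHz (a common WebRTC configuration, not confirmed by the write-up) and that 20 ms frames are an acceptable chunk size. Averaging groups of three samples is a crude low-pass filter; a production bridge would more likely use a proper resampler.

```python
import array


def downsample_48k_to_16k(pcm48):
    """Convert 16-bit mono PCM from 48 kHz to 16 kHz by averaging sample triples."""
    samples = array.array("h")  # signed 16-bit, native byte order
    samples.frombytes(pcm48)
    usable = len(samples) - len(samples) % 3  # drop a trailing partial triple
    out = array.array(
        "h",
        (
            (samples[i] + samples[i + 1] + samples[i + 2]) // 3
            for i in range(0, usable, 3)
        ),
    )
    return out.tobytes()


def frames(pcm16, sample_rate=16000, frame_ms=20):
    """Yield fixed-duration frames; the final frame may be shorter."""
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    for start in range(0, len(pcm16), frame_bytes):
        yield pcm16[start:start + frame_bytes]
```

At 16 kHz a 20 ms frame is 320 samples, or 640 bytes. Small fixed-size frames keep the added buffering delay bounded, which matters for the latency concern raised above.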
Accomplishments that we're proud of
We integrated all four Amazon Nova models with 30+ distinct sub-capabilities into a single coherent product. Most hackathon projects use one model for one task. TAXIA uses Nova 2 Lite (text reasoning, image OCR, PDF analysis, video understanding, tool calling, web grounding, code interpreter, extended thinking, MCP), Nova 2 Sonic (voice with polyglot, async tools, emotional awareness), Nova Act (browser automation with HITL and parallel fleets), and Nova Multimodal Embeddings (5-modality semantic search) — all working together through a multi-agent orchestration layer. We believe this is one of the most comprehensive Nova integrations in the hackathon.
Voice-first tax onboarding actually works. Talking to TAXIA about your taxes feels natural. You say "I'm a freelance designer in Barcelona, I earn about 45,000 euros a year, and I have some business expenses" — and by the end of a 2-minute conversation, your tax profile is built, your obligations are identified, and the system knows which forms you need to file and when. The polyglot switching is genuinely magical: start in Spanish, ask a technical question in English, get an answer that code-switches seamlessly.
The Nova Act filing automation is visceral. Watching the AI navigate a tax portal, fill in every field with your data, pause for your approval with a screenshot, and then click submit — it's the kind of demo moment that makes people lean forward. The parallel fleet capability means a gestor can file 10 client returns simultaneously, each with its own approval checkpoint.
Real-time everything. Every interaction streams. Chat responses appear token-by-token. Document extractions push to the UI the instant they're complete. Filing progress updates show screenshots in real-time. Voice responses arrive in under 250ms. The Rust backend handles all of this with minimal resource usage — a single binary serving REST, WebSocket, gRPC, and WebRTC signaling concurrently.
Multi-tenancy from day one. The same user can be an individual taxpayer, a startup founder, and a gestor managing client accounts — all from one login, with real-time account switching and complete data isolation.
What we learned
Nova 2 Lite's multimodal capabilities are deeper than we expected. We initially planned to use separate OCR services for receipts and a different tool for PDF parsing. Then we discovered that Nova 2 Lite handles images, PDFs, and video natively — with surprisingly accurate extraction from messy real-world receipts, handwritten notes, and complex multi-page financial statements. The 1M token context window means we can load an entire year of financial documents in a single request. The key insight: one well-prompted multimodal model replaces an entire pipeline of specialized tools.
Extended thinking changes what's possible with tax calculations. Standard LLM responses for tax math are unreliable — they hallucinate numbers and skip bracket transitions. With extended thinking at "high" effort, Nova 2 Lite produces step-by-step calculations that match our manual verification to the cent. The tradeoff is latency and cost, but for a tax calculation that a user will rely on, correctness is non-negotiable.
Nova 2 Sonic's async tool calling is an underappreciated feature. The ability to say "Let me look that up for you" and keep talking while a database query runs in the background transforms voice AI from a call-and-response pattern to a genuine conversation. Users don't sit in silence waiting for a tool call to complete — the assistant fills the gap naturally.
Browser automation is hard, but Nova Act makes it viable. Traditional RPA (Selenium scripts) breaks every time a portal changes a button label. Nova Act's natural language approach is fundamentally more resilient — "click the Submit button" works regardless of whether the portal calls it "Submit", "File", "Enviar", or "Presentar". The Python + Playwright hybrid gives you the best of both worlds: AI-driven navigation and programmatic control for sensitive operations.
Rust is the right choice for real-time backends. Our WebSocket server handles hundreds of concurrent connections with microsecond message routing and zero garbage collection pauses. The type system caught dozens of serialization bugs at compile time that would have been runtime errors in Node.js or Python. The build takes longer, but the production reliability is worth it.
Strands Agents SDK simplified what would have been months of plumbing. Defining specialist agents with their own tools and system prompts, then wiring them into a deterministic graph with conditional routing, took hours instead of weeks. The Graph pattern is the right abstraction for tax compliance, where you need predictable routing (not autonomous chain-of-thought that might go off the rails with someone's money).
What's next for TAXIA
Bank API integration. We currently support CSV transaction imports. The next step is direct bank connectivity via Open Banking APIs (PSD2 in Europe, Plaid in the US) for real-time transaction streaming and automatic expense classification.
Jurisdiction expansion. TAXIA's architecture is jurisdiction-agnostic — the tax calculation logic and portal automation are parameterized by country. We plan to expand from our initial US (IRS) and Spain (AEAT) support to the UK (HMRC), Germany (ELSTER), Brazil (Receita Federal), and Mexico (SAT). Each jurisdiction requires a new Knowledge Base corpus and Nova Act filing workflow, but the core platform is ready.
Mobile app with camera-first experience. A native iOS/Android app where you point your camera at any receipt, invoice, or tax form and TAXIA instantly extracts, categorizes, and files it. Nova 2 Lite's image analysis at 8,000×8,000px resolution makes this viable without compression.
Real-time tax projection dashboard. A live dashboard that updates your estimated tax liability as new transactions flow in from connected bank accounts — showing how each purchase affects your bottom line. "You just spent $500 at an office supply store — this is deductible and reduces your estimated Q2 payment by $120."
Gestor marketplace. Connect individuals who need professional tax help with verified gestores who use TAXIA. The platform already supports the multi-tenant relationship — the next step is onboarding, matching, and payment.
Automated quarterly filing. For self-employed users in jurisdictions with quarterly obligations (Spain's Modelo 303, US estimated payments), TAXIA can automatically prepare and file quarterly returns on schedule — with a simple notification and one-click approval.
Compliance monitoring. Continuous scanning of regulatory changes using Nova 2 Lite's web grounding, with proactive alerts: "Spain has updated the digital nomad tax regime — here's how it affects your filing."
TAXIA started as a hackathon project, but the problem it solves is real, urgent, and universal. Taxes aren't going away. The complexity isn't decreasing. And the people who need help the most — freelancers, small businesses, immigrants navigating a new tax system — are the ones who can least afford professional help. We're building TAXIA to change that.
#AmazonNova