Inspiration

Over the past few years, my leaseholder friends and I watched our service charges jump relentlessly. When we tried to challenge these increases, we discovered professional audits cost hundreds to thousands of pounds, and we also lacked the expertise to know what to audit.

This isn't just my story. Five million UK leaseholders collectively pay £11 billion annually in service charges that rose 11% in 2024, four times faster than inflation! London's Mayor called for a cap after finding charges "unaffordable" and "poorly explained". BBC investigations found cases where charges nearly tripled, leaving leaseholders feeling "helpless".

It's not that the UK lacks expertise: Chartered Accountants exist and audit standards are well established. The problem is accessibility. Professional audits require significant investment and specialised knowledge, and only a small fraction of leaseholders are willing to pay thousands of pounds to verify whether their charges are reasonable. The rest remain uncertain.

What if AI could make professional audits accessible to everyone? That's why I built Countable.

What it does

Countable replicates the exact workflow a Chartered Accountant follows when auditing service charges, bringing professional methodology to every leaseholder.

The Audit Workflow

  1. Users upload their service charge statement. Countable applies audit standards to define a materiality threshold and flags the items that exceed it.
  2. For each flagged item, it identifies the evidence required. A wages charge, for example, requires payroll breakdowns.
  3. Countable then auto-generates letters citing the correct legal provisions to request these documents professionally.
  4. Once the requested documents are uploaded, it designs sampling tests, selecting payment records, invoices, tender documents and similar items for verification, exactly as Chartered Accountants do.
  5. After users upload the audit evidence, Countable cross-references everything: wages against payroll records, invoices against bank statements, and quotes against competitive tendering requirements. It also benchmarks costs using UK market data (Office for National Statistics), then generates a tribunal-ready report with confidence scores and action plans.
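The materiality step in item 1 can be sketched roughly as follows. This is a minimal illustration, not Countable's actual rule: it assumes the threshold is a flat percentage of total expenditure (1% here, chosen purely for illustration), and the `ChargeLine` shape, function names and figures are all made up.

```typescript
// Hypothetical shape of one line in a service charge statement.
interface ChargeLine {
  description: string;
  amount: number; // pounds
}

// Assumption: materiality is a flat percentage of total expenditure.
// The 1% default is illustrative only, not Countable's actual rule.
function materialityThreshold(lines: ChargeLine[], rate = 0.01): number {
  const total = lines.reduce((sum, line) => sum + line.amount, 0);
  return total * rate;
}

// Return the lines whose amount meets or exceeds the threshold.
function flagMaterialLines(lines: ChargeLine[], rate = 0.01): ChargeLine[] {
  const threshold = materialityThreshold(lines, rate);
  return lines.filter((line) => line.amount >= threshold);
}

// Made-up statement totalling £100,000, so the threshold is about £1,000.
const sampleStatement: ChargeLine[] = [
  { description: "Staff wages", amount: 54000 },
  { description: "Buildings insurance", amount: 30000 },
  { description: "Cleaning", amount: 15000 },
  { description: "Light bulbs", amount: 400 },
  { description: "Bank charges", amount: 600 },
];

const flaggedLines = flagMaterialLines(sampleStatement);
console.log(flaggedLines.map((line) => line.description));
```

With these made-up figures, only the three largest lines are flagged, so evidence requests concentrate on the amounts that could change the outcome.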

Validation

We tested the complete workflow using a real anonymised service charge statement, together with mocked detailed listings and audit evidence representative of typical cases. Countable detected a £17,187 overcharge in staff wages, demonstrating that it can identify misstatements requiring professional intervention for under £1 in API costs. That makes professional audit methodology financially accessible.

How we built it

Countable demonstrates how Gemini 3's advanced reasoning capabilities transform complex professional workflows from theoretical to buildable. What traditionally requires months of state management code became autonomous agent coordination.

Stateful Sessions with Interactions API

The breakthrough was Gemini 3's Interactions API with previous_interaction_id. When users upload supplemental documents, the system resumes exactly where it paused, with no manual history reconstruction needed.

Thought Signatures (Gemini 3's encrypted reasoning state) are automatically managed by the SDK, preserving the model's reasoning chain across the iterative "request documents → user uploads → continue testing" loop that mirrors real auditors' workflows.
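To make the resume-by-id idea concrete, here is a small sketch using an in-memory stand-in for the server. `FakeInteractionStore` and its methods are hypothetical and are not the Gemini SDK. Only the idea of linking each turn to a previous interaction id comes from the description above.

```typescript
// One stored turn. previousInteractionId links it to the turn before it.
interface Interaction {
  id: string;
  input: string;
  previousInteractionId: string | undefined;
}

// In-memory stand-in for a server that keeps interaction history.
// Hypothetical: this is NOT the Gemini SDK, only the chaining idea.
class FakeInteractionStore {
  private interactions = new Map<string, Interaction>();
  private counter = 0;

  // Create a turn; passing a previous id is what "resumes" the session.
  create(input: string, previousInteractionId?: string): Interaction {
    if (previousInteractionId !== undefined && !this.interactions.has(previousInteractionId)) {
      throw new Error(`Unknown interaction: ${previousInteractionId}`);
    }
    this.counter += 1;
    const interaction: Interaction = {
      id: `int-${this.counter}`,
      input,
      previousInteractionId,
    };
    this.interactions.set(interaction.id, interaction);
    return interaction;
  }

  // Walk the chain backwards to rebuild the history, oldest turn first.
  history(id: string): string[] {
    const inputs: string[] = [];
    let current = this.interactions.get(id);
    while (current !== undefined) {
      inputs.unshift(current.input);
      current = current.previousInteractionId === undefined
        ? undefined
        : this.interactions.get(current.previousInteractionId);
    }
    return inputs;
  }
}

const auditStore = new FakeInteractionStore();
const uploaded = auditStore.create("Service charge statement uploaded");
const paused = auditStore.create("Request payroll breakdown", uploaded.id);
const resumed = auditStore.create("Payroll breakdown uploaded", paused.id);
console.log(auditStore.history(resumed.id));
```

The client only ever holds the latest id. The full context lives with the store, which is why no history needs to be rebuilt when the user comes back with documents.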

Adaptive Thinking Levels for Agent Specialisation

Each agent uses task-appropriate thinking levels to balance reasoning depth with latency:

  • Phase 1 Router (thinking_level: "low"): Fast agent selection decisions
  • Phase 2 Router (thinking_level: "high"): Complex UI decisions requiring multi-step reasoning
  • Testing Agent (thinking_level: "medium"): Substantive audit tests demanding careful evidence evaluation
  • Legal/Market/Standards Agents (thinking_level: "medium"): Balanced reasoning for specialised tasks
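The mapping above could be kept in one typed lookup so that every agent call reads its level from the same place. This is a sketch only: the agent key names and the `thinkingLevelFor` helper are hypothetical, and only the level values come from the list.

```typescript
type ThinkingLevel = "low" | "medium" | "high";

// Hypothetical keys for the agents named in the list above.
type AgentName =
  | "phase1Router"
  | "phase2Router"
  | "testing"
  | "legal"
  | "market"
  | "standards";

// Level values taken from the list; Record forces every agent to have one.
const THINKING_LEVELS: Record<AgentName, ThinkingLevel> = {
  phase1Router: "low",
  phase2Router: "high",
  testing: "medium",
  legal: "medium",
  market: "medium",
  standards: "medium",
};

function thinkingLevelFor(agent: AgentName): ThinkingLevel {
  return THINKING_LEVELS[agent];
}

console.log(thinkingLevelFor("phase2Router"));
```

Because the record is keyed by the `AgentName` union, adding a new agent without choosing a level becomes a compile-time error instead of a silent default.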

Production-Grade Structured Outputs

Every agent enforces type-safe JSON schemas using zod validation:

// Example: Testing Agent guaranteed output structure
responseMimeType: "application/json",
responseJsonSchema: z.toJSONSchema(TestingAgentOutputSchema)

Zero parsing errors occurred across 6 specialised agents and 15+ workflow decision points. The Phase 2 Router returns guaranteed-valid { action: "render_ui" | "auto_continue" } decisions, enabling our fully dynamic UI, where frontend components are 100% generated from agent responses.

Why this matters: Gemini 3's structured outputs work seamlessly with function calling, unlike Gemini 2.5 where combining these features caused schema conflicts.

Multi-Evidence Document Understanding

The DocumentParser leverages Gemini 3's multimodal vision to extract structured data from complex PDFs containing multiple evidence types.

Critical capability: The system automatically detected 3 distinct document types within a single upload, extracted structured data from each format (legal prose, tables, formulas), and enabled the Testing Agent to perform cross-document reconciliation.

Temperature = 1.0 (Gemini 3 Best Practice)

Following Google's guidance, I never override temperature from the default 1.0. Previous models benefited from temperature: 0 for deterministic outputs, but Gemini 3's reasoning is optimised for the default. Changing it caused looping issues during development.

Two-Phase Router Pattern

I separated agent execution (Phase 1) from UI rendering (Phase 2):

Phase 1 Router (thinking_level: "low") orchestrates agents through the state machine using compositional function calling:

ingestion → planning → standards → testing → market → report

Phase 2 Router (thinking_level: "high") emits structured JSON decisions controlling UI behavior:

{ "action": "render_ui" | "auto_continue", "nextInteractionGoal": "..." }

This architecture showcases Gemini 3's ability to execute up to 7 sequential function calls autonomously in Phase 1, while Phase 2 performs complex reasoning about when to show user checkpoints vs. continuing silently.
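The Phase 1 ordering can be pictured as a simple linear state machine. The sketch below is a simplification: it assumes a strictly linear path through the six stages listed above and ignores pauses for user uploads. `nextStage` and `remainingStages` are hypothetical helper names.

```typescript
// Stage order as listed in the pipeline above.
const STAGES = ["ingestion", "planning", "standards", "testing", "market", "report"] as const;
type Stage = (typeof STAGES)[number];

// Return the stage that follows `current`, or null after the final stage.
function nextStage(current: Stage): Stage | null {
  const index = STAGES.indexOf(current);
  return STAGES[index + 1] ?? null;
}

// List every stage still to run, starting from (and including) `from`.
function remainingStages(from: Stage): Stage[] {
  const path: Stage[] = [];
  let stage: Stage | null = from;
  while (stage !== null) {
    path.push(stage);
    stage = nextStage(stage);
  }
  return path;
}

console.log(remainingStages("testing"));
```

Resuming from a checkpoint then amounts to calling `remainingStages` with the stage where the session paused, rather than starting again from ingestion.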

Fully Dynamic UI

The interface is 100% generated from agent responses with no hardcoded workflows. When agents return render_ui, the frontend dynamically constructs components based on guaranteed-valid JSON schemas. This means the UI automatically adapts to any audit complexity without frontend code changes.
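One way to picture this is a registry that maps a component type in the agent response to a renderer. The sketch below is illustrative only: the `UiComponent` shape, the component type names and the plain-text output are assumptions, not Countable's actual frontend.

```typescript
// Hypothetical descriptor for one UI component in an agent response.
interface UiComponent {
  type: string;
  props: Record<string, unknown>;
}

type Renderer = (props: Record<string, unknown>) => string;

// Registry of known component types. Plain-text output keeps the sketch
// framework-free; a real frontend would return framework components.
const RENDERERS: Record<string, Renderer> = {
  heading: (props) => `# ${String(props["text"] ?? "")}`,
  fileUpload: (props) => `[Upload: ${String(props["label"] ?? "document")}]`,
};

// Render one component, falling back to a placeholder for unknown types.
function renderComponent(component: UiComponent): string {
  const renderer = RENDERERS[component.type];
  return renderer !== undefined
    ? renderer(component.props)
    : `[Unsupported component: ${component.type}]`;
}

// Render a whole screen in the order the agent returned the components.
function renderScreen(components: UiComponent[]): string {
  return components.map((component) => renderComponent(component)).join("\n");
}

console.log(
  renderScreen([
    { type: "heading", props: { text: "Payroll evidence" } },
    { type: "fileUpload", props: { label: "Payroll breakdown" } },
  ]),
);
```

The agent decides which components appear and in what order. The frontend only needs to know how to draw each component type.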

Real-Time Streaming

Firebase callable functions with async generators stream agent updates in real-time.

IndexedDB stores session state locally, enabling offline viewing and eliminating Firestore setup complexity for the hackathon.

System Architecture

View complete orchestration state machine 🔗

The diagram shows our two-phase router pattern with 15+ decision points coordinating 6 specialised agents, powered by:

  • Interactions API for stateful session continuity
  • Adaptive thinking levels per agent specialisation
  • Structured outputs for zero-error agent coordination
  • Compositional function calling for autonomous multi-step workflows

Challenges We Ran Into

Free Tier Rate Limits with Multi-Agent Orchestration

Gemini 3 Flash's free tier caps at 20 requests per day. With 6 agents orchestrated through multiple router phases, a single complete audit consumed 12-15 API calls, limiting me to 1-2 full workflows daily during development.

Solution

  1. Checkpoint-based development: using previous_interaction_id to resume from specific agents (e.g., testing TestingAgent by reusing a DetailedListing phase checkpoint) turned 12 calls into 3-4.
  2. Mock-driven development: recording Gemini responses as JSON fixtures and building an auditMock function with realistic streaming decoupled UI development from API quota while keeping interfaces identical to production.

Structured Output Schema Depth Limits

Phase 2 Router needed deeply nested objects for complex UI components, but Gemini 3's JSON schema validator has depth constraints.

Solution

I used a hybrid validation approach: a loose schema (z.any()) for component data, plus detailed prompt-based validation in router_phase2.md. Routing decisions keep strict types, while UI data uses flexible types.
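The strict-envelope, loose-payload split might look like the sketch below. It uses a hand-written check in plain TypeScript as a stand-in for the zod schema, and the `componentData` field name is an assumption. Only `action` and `nextInteractionGoal` appear in the router output shown earlier.

```typescript
// Strict envelope: action and goal are fully typed.
// Loose payload: componentData is `unknown`, mirroring z.any() in the text.
type RouterDecision =
  | { action: "render_ui"; nextInteractionGoal: string; componentData: unknown }
  | { action: "auto_continue"; nextInteractionGoal: string };

// Parse a raw model response and validate only the routing fields.
function parseRouterDecision(raw: string): RouterDecision {
  const value: unknown = JSON.parse(raw);
  if (typeof value !== "object" || value === null) {
    throw new Error("Decision must be a JSON object");
  }
  const record = value as Record<string, unknown>;
  const goal = record["nextInteractionGoal"];
  if (typeof goal !== "string") {
    throw new Error("nextInteractionGoal must be a string");
  }
  switch (record["action"]) {
    case "render_ui":
      // Payload passes through unchecked, however deeply it is nested.
      return { action: "render_ui", nextInteractionGoal: goal, componentData: record["componentData"] };
    case "auto_continue":
      return { action: "auto_continue", nextInteractionGoal: goal };
    default:
      throw new Error(`Unknown action: ${String(record["action"])}`);
  }
}

const parsedDecision = parseRouterDecision(
  '{"action":"render_ui","nextInteractionGoal":"collect payroll","componentData":{"rows":[{"cells":[1,2]}]}}',
);
console.log(parsedDecision.action);
```

The routing fields fail loudly when malformed, while the nested UI payload never touches the schema depth limit because it is not described in the schema at all.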

Agent Hallucination in Data Transformation

Phase 2 Router initially transformed raw agent outputs into UI-ready formats, causing hallucinations.

Solution

I delegated transformation to the specialised agents and made Phase 2 a pass-through layer that returns their exact outputs. This separation of concerns improved reliability.

Accomplishments that we're proud of

Technical Achievements

  • Made professional audit methodology accessible to everyone: Applied ISA 315 (risk assessment) and ISA 500 (audit evidence) standards to replicate what Chartered Accountants do
  • Autonomous multi-round workflows: Built true stateful sessions with Interactions API. The system pauses, generates legal Section 22 letters, then resumes exactly where it left off when users upload supplemental documents
  • Simultaneous multi-document recognition: Uploaded a 4-page PDF combining employment contract, payroll records, and NI calculation worksheet. DocumentParser identified all three evidence types in one pass, completing multiple audit requirements at once
  • Cross-document evidence linking: Testing Agent autonomously analysed the three PDFs and connected the employment contract (stating £40,000 annual salary for Estate Manager EMP001), the bank statements (showing £2,733.33 net monthly payments), and the detailed staffing schedule (claiming £54,000 annual salary for the same employee). The system flagged a £14,000 salary discrepancy between the contract and reported wages without any explicit instruction to cross-reference these documents
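The salary comparison in the last bullet reduces to grouping claims by employee and comparing the extremes. The sketch below shows that arithmetic with the two gross salary figures quoted above. In Countable the linking is done by the Testing Agent, not by code like this, and the `SalaryClaim` shape and function name are hypothetical. Net bank payments are left out because they are not directly comparable to gross salary.

```typescript
// One salary figure for one employee, as stated by one document.
interface SalaryClaim {
  employeeId: string;
  source: string;
  annualSalary: number; // gross, pounds
}

interface Discrepancy {
  employeeId: string;
  lowest: SalaryClaim;
  highest: SalaryClaim;
  difference: number;
}

// Group claims by employee and report any spread above the tolerance.
function findSalaryDiscrepancies(claims: SalaryClaim[], tolerance = 0): Discrepancy[] {
  const byEmployee = new Map<string, SalaryClaim[]>();
  for (const claim of claims) {
    const group = byEmployee.get(claim.employeeId) ?? [];
    group.push(claim);
    byEmployee.set(claim.employeeId, group);
  }

  const discrepancies: Discrepancy[] = [];
  byEmployee.forEach((group, employeeId) => {
    const sorted = group.slice().sort((a, b) => a.annualSalary - b.annualSalary);
    const lowest = sorted[0];
    const highest = sorted[sorted.length - 1];
    if (lowest === undefined || highest === undefined) return;
    const difference = highest.annualSalary - lowest.annualSalary;
    if (difference > tolerance) {
      discrepancies.push({ employeeId, lowest, highest, difference });
    }
  });
  return discrepancies;
}

const salaryFindings = findSalaryDiscrepancies([
  { employeeId: "EMP001", source: "employment contract", annualSalary: 40000 },
  { employeeId: "EMP001", source: "staffing schedule", annualSalary: 54000 },
]);
console.log(salaryFindings[0]?.difference); // 14000
```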

Impact Achievements

  • ❤️ Makes professional audits financially accessible: Under £1 in API costs, making audit methodology available to leaseholders who previously couldn't afford professional services
  • Market scale: Targets 5 million UK leaseholders paying £11 billion annually in service charges
  • Addresses urgent need: Service charges rose 11% in 2024 (4× inflation), yet the majority of leaseholders never challenge them because of cost barriers

What Surprised Me

I didn't think I could build a system that truly replicates professional methodology. I proved AI can make expert knowledge accessible, not just automate simple tasks.

What We Learned

  • Interactions API enables true stateful agents - Server-side session management with previous_interaction_id eliminates manual history tracking and enables multi-round workflows
  • Thinking Mode powers autonomous reasoning - thinking_level: "high" lets agents self-direct complex audit logic without explicit orchestration
  • Structured Outputs eliminate parsing errors - responseJsonSchema generated from zod schemas produced valid, correctly structured JSON at every decision point

What's Next for Countable

Countable aims to bring professional audit methodology to every leaseholder in the UK and beyond. Our vision is to expand beyond service charges to cover ground rent disputes, major works consultations, and Section 20 challenges. As property costs continue rising faster than inflation, we believe Countable will be at the forefront of making financial scrutiny accessible, helping leaseholders challenge unfair charges and hold managing agents accountable.

Shout Out 🫶

Huge thanks to my Chartered Accountant friend who patiently explained real audit workflows and verified that Countable's methodology actually mirrors professional practice. Her domain expertise transformed this from "what I think auditors do" to "what auditors actually do."
