Inspiration

Four incidents from the last 24 months shaped this project:

  • NYC's own MyCity chatbot (March 2024) told small business owners they could take worker tips, refuse Section 8 tenants, and fire people for reporting harassment. All illegal under NYC law. The chatbot stayed live after The Markup reported it. The city of New York shipped an unguarded LLM into a compliance-sensitive workflow.
  • Air Canada (February 2024) was ruled legally liable for a bereavement fare its chatbot hallucinated. The tribunal rejected the airline's "the AI is a separate entity" defense. AI hallucinations are now a balance-sheet item.
  • JPMorgan banned ChatGPT firm-wide for its 250,000+ employees in 2023 because there was no trust layer between the model and internal workflows. The biggest US bank chose "turn it off" over "deploy without governance."
  • FBI IC3 2023 report: approximately $2.9 billion in reported Business Email Compromise losses in the United States alone. A large share is fake-invoice and vendor-impersonation fraud.

The pattern: enterprises want to deploy autonomous AI agents, but every incident is a variation on "the agent acted on something it shouldn't have." Today's answer is either "don't deploy the agent" or "put a human on every decision" — both defeat the purpose. We wanted to build the third option: a checkpoint the agent has to pass before it acts.


What We Built

TrustGate is a trust layer that sits between an AI agent and its next action. When an agent is about to approve an invoice, pay a vendor, or execute a policy decision, it POSTs the decision to TrustGate. TrustGate runs four guardrails in parallel and returns a verdict the agent can trust.

The four guardrails

  1. Policy Check — Claude (via Lava gateway) evaluates the decision against a written policy document and cites the specific clause violated (e.g., P4: vendor domain verification).
  2. Source Verification — Tavily pulls live web evidence (LinkedIn, PitchBook, ZoomInfo) to confirm the vendor is real. Catches lookalike-domain impersonation — the exact pattern behind billions of dollars of FBI-reported BEC fraud.
  3. Anomaly Detection — pure statistics, no LLM. Flags transactions that deviate from a vendor's historical pattern: $$r = \frac{\text{amount}}{\text{historical\_avg}}$$ Flag if $r \geq 2.0$, block-level alert if $r \geq 3.0$.
  4. Confidence Decomposition — Claude breaks the agent's stated confidence into weighted factors and flags when the factors don't justify the headline number. Turns opaque "94% confident" into inspectable reasoning.

Each guardrail returns {status: "pass" | "flag" | "block", reason, evidence, started_at, completed_at, duration_ms}. Verdicts stream to the UI incrementally as each check lands — the frontend polls and animates a live pipeline graph.
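The anomaly guardrail is simple enough to sketch end to end. The sketch below mirrors the ratio thresholds and return schema described above, but the function and helper names are illustrative, not the project's actual code:

```python
import time

FLAG_RATIO = 2.0   # flag when amount reaches 2x the historical average
BLOCK_RATIO = 3.0  # block-level alert at 3x

def anomaly_check(amount: float, historical_avg: float) -> dict:
    """Pure-statistics guardrail: compare an amount to the vendor's history."""
    started = time.time()
    # A vendor with no history gets an infinite ratio, i.e. an automatic block
    ratio = amount / historical_avg if historical_avg else float("inf")
    if ratio >= BLOCK_RATIO:
        status = "block"
    elif ratio >= FLAG_RATIO:
        status = "flag"
    else:
        status = "pass"
    completed = time.time()
    return {
        "status": status,
        "reason": f"amount is {ratio:.1f}x the vendor's historical average",
        "evidence": {"amount": amount, "historical_avg": historical_avg},
        "started_at": started,
        "completed_at": completed,
        "duration_ms": int((completed - started) * 1000),
    }
```

Because this check is deterministic and LLM-free, it is the one guardrail that never hallucinates and never times out.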

The pipeline

[Agent Submit] → [Extract] → ┬→ [Policy]     ─┐
                             ├→ [Source]     ─┤
                             ├→ [Anomaly]    ─┼→ [Aggregator] → [Verdict] → [Audit]
                             └→ [Confidence] ─┘

Nine nodes. Each one lights up, transitions color, or blocks the flow as real API calls resolve. The pipeline graph is the hero visual — judges watch an AI decision get scored in ~13 seconds, with real PDF invoices rendered alongside and real Tavily source URLs appearing in the verdict panel.

Real documents, real evidence

  • Real invoice PDFs generated via reportlab, rendered to PNG via pypdfium2, and displayed alongside the pipeline so judges see the actual document being processed.
  • Real Tavily searches return real URLs (LinkedIn company profiles, PitchBook, ZoomInfo) — demo-proof evidence of vendor impersonation.
  • Append-only audit log writes every submission, verdict, and human decision to disk as JSON.
  • Hybrid demo safety — cached replay mode reproduces successful runs with realistic pacing if the venue WiFi dies mid-pitch.
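An append-only audit log can be as small as a JSON-lines writer. A minimal sketch — the `audit.jsonl` filename and `audit()` helper are illustrative, not the project's actual module:

```python
import json
import time
from pathlib import Path

AUDIT_LOG = Path("audit.jsonl")  # hypothetical default location

def audit(event_type: str, payload: dict, path: Path = AUDIT_LOG) -> None:
    """Append one JSON record per line; earlier entries are never rewritten."""
    record = {"ts": time.time(), "type": event_type, **payload}
    with Path(path).open("a") as f:
        f.write(json.dumps(record) + "\n")
```

Appending one self-contained JSON object per line means a crash mid-write corrupts at most the final line, and the file stays greppable without a database.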

How We Built It

Architecture

  • Backend: Python FastAPI, single main.py orchestration file. Four guardrails run in parallel via asyncio.gather and asyncio.as_completed, so verdicts land in the UI one at a time.
  • Frontend: Single static HTML file. Tailwind CSS via CDN, vanilla JavaScript, hand-drawn SVG pipeline graph. Zero build step, zero node_modules. The frontend is served directly by FastAPI at the root URL — one process runs the entire app.
  • Dependency management: uv — teammates clone the repo and run uv sync && uv run uvicorn src.main:app --port 8787 to go from zero to running in under 30 seconds.
  • In-memory state + append-only JSON audit log. No database. Keeps the hackathon scope tight and demo-reliable.
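The fan-out pattern described above can be sketched with stub guardrails — the random sleep stands in for real Claude and Tavily calls, and all names are illustrative:

```python
import asyncio
import random

async def run_guardrail(name: str) -> dict:
    # Stand-in for a real check (Claude policy call, Tavily search, stats)
    await asyncio.sleep(random.uniform(0.05, 0.2))
    return {"guardrail": name, "status": "pass"}

async def run_pipeline(decision: dict) -> list[dict]:
    tasks = [asyncio.create_task(run_guardrail(n))
             for n in ("policy", "source", "anomaly", "confidence")]
    verdicts = []
    # as_completed yields each verdict the moment it resolves, so the UI
    # can animate checks one at a time instead of in a single burst
    for fut in asyncio.as_completed(tasks):
        verdicts.append(await fut)
    return verdicts
```

The key property is that total latency is the slowest guardrail, not the sum of all four, while the frontend still gets an incremental stream to animate.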

Integrations (five sponsor tools, each doing real work)

  • Anthropic Claude — powers 3 of 4 guardrails: policy reasoning, source-verification reasoning over Tavily results, and confidence decomposition
  • Lava — all LLM traffic routes through Lava's AI gateway for usage governance (the guardrail platform uses a guardrail platform)
  • Tavily — live web search for vendor-legitimacy verification, the single guardrail that can catch real impersonation fraud
  • reportlab — generates realistic invoice PDFs matching each seed invoice
  • pypdfium2 — renders those PDFs to PNG for bulletproof cross-browser preview

Demo paths

# one command runs everything
uv run uvicorn src.main:app --port 8787
# open http://localhost:8787 → pick invoice → Submit as agent

A secondary CLI simulator (scripts/simulate_agent.py) posts invoices directly to the agent endpoint at configurable cadence, so the demo can be driven from a visible terminal that makes it clear the trigger is external.
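The essence of that simulator fits in a few stdlib lines. This is a sketch, not the actual `scripts/simulate_agent.py`; it assumes the route accepts a JSON body, and `build_request`/`post_invoice` are illustrative names:

```python
import json
import urllib.request

ENDPOINT = "http://localhost:8787/api/agent/submit"

def build_request(invoice: dict, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Construct the POST the agent would send to TrustGate."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(invoice).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def post_invoice(invoice: dict, endpoint: str = ENDPOINT) -> int:
    with urllib.request.urlopen(build_request(invoice, endpoint)) as resp:
        return resp.status

# Driven from a visible terminal at a configurable cadence, e.g.:
#   for inv in ({"id": "INV-001"}, {"id": "INV-002"}):
#       post_invoice(inv); time.sleep(2.0)
```

Keeping the trigger in a separate process makes it obvious to judges that the pipeline reacts to an external agent, not a button wired to the same page.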


Challenges We Ran Into

1. The mid-session pivot

The biggest shift happened 45 minutes into the build. We had designed TrustGate as an approval queue for human AP managers reviewing AI suggestions. Mid-build, we realized the real product is the opposite: guardrails for AI agents, not humans. Agents are the ones who need a checkpoint; humans are observers. That reframe changed the pitch, the hero visual (node view instead of spreadsheet), and the Track 2 fit. Costly in morale to rewrite mid-stream, but the right call. The lesson: in hackathons, a 30-second category-shift insight can be worth more than 30 minutes of code.

2. Demo pacing

The first working demo completed the entire pipeline in under 4 seconds. Judges would never be able to track what happened. We solved this in layers:

  • Backend asyncio.as_completed so verdicts land incrementally, not in one burst
  • Explicit frontend staggers at Submit (1.1s), Extract (1.4s), guardrail activation (0.5s)
  • Cached-mode 1.2s warmup + 2.4s per-verdict pacing
  • CSS transitions bumped from 0.45s to 0.7s for smoother color fades

Final demo: ~13 seconds from button click to final verdict. Readable at hackathon-stage pace.

3. Cross-browser PDF embedding

Our initial UI embedded invoice PDFs via <embed src="*.pdf">. Some browsers rendered it inline; others forced a download. We tried Content-Disposition: inline — helped Chrome, didn't help Safari. We pre-rendered each PDF to PNG via pypdfium2 (Chrome's PDF engine, wrapped in Python) and swapped <embed> for <img>. Works on every browser, no disposition negotiation, no user preferences to fight. The lesson: when cross-browser reliability matters, pre-render server-side and ship an image.

4. Venue WiFi anxiety

Hackathon demos famously fail on network. We built a hybrid mode: /api/agent/submit defaults to auto, which tries live APIs first and falls back to cache/verdicts.json if anything errors. The cache is pre-built from successful live runs by scripts/build_cache.py and committed to the repo. On stage, the pipeline animates identically whether it's live Tavily searches or cached replays — judges can't tell the difference. The live mode is only one URL parameter away (?mode=live).

5. Contract-first collaboration

Splitting backend and frontend work between teammates was a recipe for "I thought you were returning X." We stopped, wrote API.md — a full contract for the three routes the frontend needs, with example JSON for every schema — before writing another line of code. The file took 10 minutes and prevented hours of reconciliation. The best hackathon productivity hack isn't a framework; it's a README.


What We Learned

  • Reframe early. The pivot from "human queue" to "agent checkpoint" was ~30 seconds of conversation that rewrote the pitch, the visual, and the category fit. When you feel a product framing shift while you're building, stop and take it seriously.
  • Parallel, not sequential. Guardrails are independent by design. Running them in parallel with asyncio.as_completed makes them faster AND lets the UI animate each result as it lands. Concurrency is a UX feature, not just a performance one.
  • Rasterize when reliability matters. PDFs in browsers are a 20-year-old cross-browser nightmare. <img src="*.png"> has zero variance. Pre-rendering once was cheaper than debugging five browsers.
  • Contracts before code, when you have a teammate. API.md paid for itself in under 15 minutes of avoided integration pain.
  • Cached fallback is an architecture, not a hack. Graceful degradation from live API to cached replay is the same pattern production systems use for incident handling. Ship it in the demo, learn the reflex.
  • Judges remember the catch, not the architecture. The moment TrustGate blocked a lookalike-domain vendor with LinkedIn and PitchBook URLs cited live — that's the 10-second image a judge walks out with. Everything else is scaffolding.

What's Next

  • Ingest from real internal APIs — today the simulator pushes invoices to /api/agent/submit. Tomorrow it's a webhook from an ERP system (NetSuite, SAP, Coupa).
  • Docling extraction — replace the placeholder Extract node with real PDF ingestion so agents can ship unstructured invoice PDFs into TrustGate directly.
  • Policy editor — today policy.md is a static file. Let enterprises compose and version their own policy docs.
  • Second vertical: claims approval. Same four guardrails, different workflow. Insurance adjusters are the second-most-common AI automation target in the FBI BEC data.
  • Production hardening: auth (SSO), multi-tenancy, per-tenant audit exports, SLAs on guardrail latency, webhook retries with exponential backoff.
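The webhook-retry item above reduces to a delay schedule plus jitter. A sketch under assumed parameters (the helper names and defaults are hypothetical):

```python
import random

def backoff_delays(attempts: int = 5, base: float = 0.5,
                   cap: float = 30.0) -> list[float]:
    """Deterministic exponential schedule: base * 2^n, capped."""
    return [min(cap, base * (2 ** n)) for n in range(attempts)]

def with_jitter(delays: list[float]) -> list[float]:
    # Full jitter: spread retries so failing webhooks don't stampede together
    return [random.uniform(0, d) for d in delays]
```

Separating the deterministic schedule from the jitter keeps the retry policy testable while still randomizing actual send times.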

The core thesis holds for every vertical: autonomous AI agents need a trust layer. TrustGate is what that layer can look like.


Built with: Python 3.11, FastAPI, uv, Anthropic Claude, Lava, Tavily, reportlab, pypdfium2, Tailwind CSS, vanilla JS, SVG, ~3 hours, and one mid-session pivot we'll remember.
