Inspiration

Four incidents from the last 24 months shaped this project:

  • NYC's own MyCity chatbot (March 2024) told small business owners they could take worker tips, refuse Section 8 tenants, and fire people for reporting harassment. All illegal under NYC law. The chatbot stayed live after The Markup reported it. The city of New York shipped an unguarded LLM into a compliance-sensitive workflow.
  • Air Canada (February 2024) was ruled legally liable for a bereavement fare its chatbot hallucinated. The tribunal rejected the airline's "the AI is a separate entity" defense. AI hallucinations are now a balance-sheet item.
  • JPMorgan banned ChatGPT firm-wide for its 250,000+ employees in 2023 because there was no trust layer between the model and internal workflows. The biggest US bank chose "turn it off" over "deploy without governance."
  • FBI IC3 2023 report: approximately $2.9 billion in reported Business Email Compromise losses in the United States alone. A large share is fake-invoice and vendor-impersonation fraud.

The pattern: enterprises want to deploy autonomous AI agents, but every incident is a variation on "the agent acted on something it shouldn't have." Today's answer is either "don't deploy the agent" or "put a human on every decision" — both defeat the purpose. We wanted to build the third option: a checkpoint the agent has to pass before it acts.


What We Built

TrustGate is a trust layer that sits between an AI agent and its next action. When an agent is about to approve an invoice, pay a vendor, or execute a policy decision, it POSTs the decision to TrustGate. TrustGate runs four guardrails in parallel and returns a verdict the agent can trust.

The four guardrails

  1. Policy Check — Claude (via Lava gateway) evaluates the decision against a written policy document and cites the specific clause violated (e.g., P4: vendor domain verification).
  2. Source Verification — Tavily pulls live web evidence (LinkedIn, PitchBook, ZoomInfo) to confirm the vendor is real. Catches lookalike-domain impersonation — the exact pattern behind billions of dollars of FBI-reported BEC fraud.
  3. Anomaly Detection — pure statistics, no LLM. Flags transactions that deviate from a vendor's historical pattern: $$r = \frac{\text{amount}}{\text{historical\_avg}}$$ Flag if $r \geq 2.0$, block-level alert if $r \geq 3.0$.
  4. Confidence Decomposition — Claude breaks the agent's stated confidence into weighted factors and flags when the factors don't justify the headline number. Turns opaque "94% confident" into inspectable reasoning.

Each guardrail returns {status: "pass" | "flag" | "block", reason, evidence, started_at, completed_at, duration_ms}. Verdicts stream to the UI incrementally as each check lands — the frontend polls and animates a live pipeline graph.
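The anomaly guardrail is simple enough to sketch end to end. The sketch below mirrors the ratio thresholds and return schema described above, but the function and helper names are illustrative, not the project's actual code:

```python
import time

FLAG_RATIO = 2.0   # flag when amount reaches 2x the historical average
BLOCK_RATIO = 3.0  # block-level alert at 3x

def anomaly_check(amount: float, historical_avg: float) -> dict:
    """Pure-statistics guardrail: compare an amount to the vendor's history."""
    started = time.time()
    # A vendor with no history gets an infinite ratio, i.e. an automatic block
    ratio = amount / historical_avg if historical_avg else float("inf")
    if ratio >= BLOCK_RATIO:
        status = "block"
    elif ratio >= FLAG_RATIO:
        status = "flag"
    else:
        status = "pass"
    completed = time.time()
    return {
        "status": status,
        "reason": f"amount is {ratio:.1f}x the vendor's historical average",
        "evidence": {"amount": amount, "historical_avg": historical_avg},
        "started_at": started,
        "completed_at": completed,
        "duration_ms": int((completed - started) * 1000),
    }
```

Because this check is deterministic and LLM-free, it is the one guardrail that never hallucinates and never times out.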

The pipeline

[Agent Submit] → [Extract] → ┬→ [Policy]     ─┐
                             ├→ [Source]     ─┤
                             ├→ [Anomaly]    ─┼→ [Aggregator] → [Verdict] → [Audit]
                             └→ [Confidence] ─┘

Nine nodes. Each one lights up, transitions color, or blocks the flow as real API calls resolve. The pipeline graph is the hero visual — judges watch an AI decision get scored in ~13 seconds, with real PDF invoices rendered alongside and real Tavily source URLs appearing in the verdict panel.

Real documents, real evidence

  • Real invoice PDFs generated via reportlab, rendered to PNG via pypdfium2, and displayed alongside the pipeline so judges see the actual document being processed.
  • Real Tavily searches return real URLs (LinkedIn company profiles, PitchBook, ZoomInfo) — demo-proof evidence of vendor impersonation.
  • Append-only audit log writes every submission, verdict, and human decision to disk as JSON.
  • Hybrid demo safety — cached replay mode reproduces successful runs with realistic pacing if the venue WiFi dies mid-pitch.
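An append-only audit log can be as small as a JSON-lines writer. A minimal sketch — the `audit.jsonl` filename and `audit()` helper are illustrative, not the project's actual module:

```python
import json
import time
from pathlib import Path

AUDIT_LOG = Path("audit.jsonl")  # hypothetical default location

def audit(event_type: str, payload: dict, path: Path = AUDIT_LOG) -> None:
    """Append one JSON record per line; earlier entries are never rewritten."""
    record = {"ts": time.time(), "type": event_type, **payload}
    with Path(path).open("a") as f:
        f.write(json.dumps(record) + "\n")
```

Appending one self-contained JSON object per line means a crash mid-write corrupts at most the final line, and the file stays greppable without a database.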

How We Built It

Architecture

  • Backend: Python FastAPI, single main.py orchestration file. Four guardrails run in parallel via asyncio.gather and asyncio.as_completed, so verdicts land in the UI one at a time.
  • Frontend: Single static HTML file. Tailwind CSS via CDN, vanilla JavaScript, hand-drawn SVG pipeline graph. Zero build step, zero node_modules. The frontend is served directly by FastAPI at the root URL — one process runs the entire app.
  • Dependency management: uv — teammates clone the repo and run uv sync && uv run uvicorn src.main:app --port 8787 to go from zero to running in under 30 seconds.
  • In-memory state + append-only JSON audit log. No database. Keeps the hackathon scope tight and demo-reliable.
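The fan-out pattern described above can be sketched with stub guardrails — the random sleep stands in for real Claude and Tavily calls, and all names are illustrative:

```python
import asyncio
import random

async def run_guardrail(name: str) -> dict:
    # Stand-in for a real check (Claude policy call, Tavily search, stats)
    await asyncio.sleep(random.uniform(0.05, 0.2))
    return {"guardrail": name, "status": "pass"}

async def run_pipeline(decision: dict) -> list[dict]:
    tasks = [asyncio.create_task(run_guardrail(n))
             for n in ("policy", "source", "anomaly", "confidence")]
    verdicts = []
    # as_completed yields each verdict the moment it resolves, so the UI
    # can animate checks one at a time instead of in a single burst
    for fut in asyncio.as_completed(tasks):
        verdicts.append(await fut)
    return verdicts
```

The key property is that total latency is the slowest guardrail, not the sum of all four, while the frontend still gets an incremental stream to animate.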

Integrations (five sponsor tools, each doing real work)

  • Anthropic Claude — powers 3 of 4 guardrails: policy reasoning, source-verification reasoning over Tavily results, and confidence decomposition
  • Lava — all LLM traffic routes through Lava's AI gateway for usage governance (the guardrail platform uses a guardrail platform)
  • Tavily — live web search for vendor-legitimacy verification, the single guardrail that can catch real impersonation fraud
  • reportlab — generates realistic invoice PDFs matching each seed invoice
  • pypdfium2 — renders those PDFs to PNG for bulletproof cross-browser preview

Demo paths

# one command runs everything
uv run uvicorn src.main:app --port 8787
# open http://localhost:8787 → pick invoice → Submit as agent

A secondary CLI simulator (scripts/simulate_agent.py) posts invoices directly to the agent endpoint at configurable cadence, so the demo can be driven from a visible terminal that makes it clear the trigger is external.
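The essence of that simulator fits in a few stdlib lines. This is a sketch, not the actual `scripts/simulate_agent.py`; it assumes the route accepts a JSON body, and `build_request`/`post_invoice` are illustrative names:

```python
import json
import urllib.request

ENDPOINT = "http://localhost:8787/api/agent/submit"

def build_request(invoice: dict, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Construct the POST the agent would send to TrustGate."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(invoice).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def post_invoice(invoice: dict, endpoint: str = ENDPOINT) -> int:
    with urllib.request.urlopen(build_request(invoice, endpoint)) as resp:
        return resp.status

# Driven from a visible terminal at a configurable cadence, e.g.:
#   for inv in ({"id": "INV-001"}, {"id": "INV-002"}):
#       post_invoice(inv); time.sleep(2.0)
```

Keeping the trigger in a separate process makes it obvious to judges that the pipeline reacts to an external agent, not a button wired to the same page.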


Challenges We Ran Into

1. The mid-session pivot

The biggest shift happened 45 minutes into the build. We had designed TrustGate as an approval queue for human AP managers reviewing AI suggestions. Mid-build, we realized the real product is the opposite: guardrails for AI agents, not humans. Agents are the ones who need a checkpoint; humans are observers. That reframe changed the pitch, the hero visual (node view instead of spreadsheet), and the Track 2 fit. Costly in morale to rewrite mid-stream, but the right call. The lesson: in hackathons, a 30-second category-shift insight can be worth more than 30 minutes of code.

2. Demo pacing

The first working demo completed the entire pipeline in under 4 seconds. Judges would never be able to track what happened. We solved this in layers:

  • Backend asyncio.as_completed so verdicts land incrementally, not in one burst
  • Explicit frontend staggers at Submit (1.1s), Extract (1.4s), guardrail activation (0.5s)
  • Cached-mode 1.2s warmup + 2.4s per-verdict pacing
  • CSS transitions bumped from 0.45s to 0.7s for smoother color fades

Final demo: ~13 seconds from button click to final verdict. Readable at hackathon-stage pace.

3. Cross-browser PDF embedding

Our initial UI embedded invoice PDFs via <embed src="*.pdf">. Some browsers rendered it inline; others forced a download. We tried Content-Disposition: inline — helped Chrome, didn't help Safari. We pre-rendered each PDF to PNG via pypdfium2 (Chrome's PDF engine, wrapped in Python) and swapped <embed> for <img>. Works on every browser, no disposition negotiation, no user preferences to fight. The lesson: when cross-browser reliability matters, pre-render server-side and ship an image.

4. Venue WiFi anxiety

Hackathon demos famously fail on network. We built a hybrid mode: /api/agent/submit defaults to auto, which tries live APIs first and falls back to cache/verdicts.json if anything errors. The cache is pre-built from successful live runs by scripts/build_cache.py and committed to the repo. On stage, the pipeline animates identically whether it's live Tavily searches or cached replays — judges can't tell the difference. The live mode is only one URL parameter away (?mode=live).

5. Contract-first collaboration

Splitting backend and frontend work between teammates was a recipe for "I thought you were returning X." We stopped, wrote API.md — a full contract for the three routes the frontend needs, with example JSON for every schema — before writing another line of code. The file took 10 minutes and prevented hours of reconciliation. The best hackathon productivity hack isn't a framework; it's a README.


What We Learned

  • Reframe early. The pivot from "human queue" to "agent checkpoint" was ~30 seconds of conversation that rewrote the pitch, the visual, and the category fit. When you feel a product framing shift while you're building, stop and take it seriously.
  • Parallel, not sequential. Guardrails are independent by design. Running them in parallel with asyncio.as_completed makes them faster AND lets the UI animate each result as it lands. Concurrency is a UX feature, not just a performance one.
  • Rasterize when reliability matters. PDFs in browsers are a 20-year-old cross-browser nightmare. <img src="*.png"> has zero variance. Pre-rendering once was cheaper than debugging five browsers.
  • Contracts before code, when you have a teammate. API.md paid for itself in under 15 minutes of avoided integration pain.
  • Cached fallback is an architecture, not a hack. Graceful degradation from live API to cached replay is the same pattern production systems use for incident handling. Ship it in the demo, learn the reflex.
  • Judges remember the catch, not the architecture. The moment TrustGate blocked a lookalike-domain vendor with LinkedIn and PitchBook URLs cited live — that's the 10-second image a judge walks out with. Everything else is scaffolding.

What's Next

  • Ingest from real internal APIs — today the simulator pushes invoices to /api/agent/submit. Tomorrow it's a webhook from an ERP system (NetSuite, SAP, Coupa).
  • Docling extraction — replace the placeholder Extract node with real PDF ingestion so agents can ship unstructured invoice PDFs into TrustGate directly.
  • Policy editor — today policy.md is a static file. Let enterprises compose and version their own policy docs.
  • Second vertical: claims approval. Same four guardrails, different workflow. Insurance adjusters are the second-most-common AI automation target in the FBI BEC data.
  • Production hardening: auth (SSO), multi-tenancy, per-tenant audit exports, SLAs on guardrail latency, webhook retries with exponential backoff.
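The webhook-retry item above reduces to a delay schedule plus jitter. A sketch under assumed parameters (the helper names and defaults are hypothetical):

```python
import random

def backoff_delays(attempts: int = 5, base: float = 0.5,
                   cap: float = 30.0) -> list[float]:
    """Deterministic exponential schedule: base * 2^n, capped."""
    return [min(cap, base * (2 ** n)) for n in range(attempts)]

def with_jitter(delays: list[float]) -> list[float]:
    # Full jitter: spread retries so failing webhooks don't stampede together
    return [random.uniform(0, d) for d in delays]
```

Separating the deterministic schedule from the jitter keeps the retry policy testable while still randomizing actual send times.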

The core thesis holds for every vertical: autonomous AI agents need a trust layer. TrustGate is what that layer can look like.


Built with: Python 3.11, FastAPI, uv, Anthropic Claude, Lava, Tavily, reportlab, pypdfium2, Tailwind CSS, vanilla JS, SVG, ~3 hours, and one mid-session pivot we'll remember.
