Inspiration
Four incidents from the last 24 months shaped this project:
- NYC's own MyCity chatbot (March 2024) told small business owners they could take worker tips, refuse Section 8 tenants, and fire people for reporting harassment. All illegal under NYC law. The chatbot stayed live after The Markup reported it. The city of New York shipped an unguarded LLM into a compliance-sensitive workflow.
- Air Canada (February 2024) was ruled legally liable for a bereavement fare its chatbot hallucinated. The tribunal rejected the airline's "the AI is a separate entity" defense. AI hallucinations are now a balance-sheet item.
- JPMorgan banned ChatGPT firm-wide for 250,000+ employees in 2023 because there was no trust layer between the model and internal workflows. The biggest US bank chose "turn it off" over "deploy without governance."
- FBI IC3 2023 report: approximately $2.9 billion in reported Business Email Compromise losses in the United States alone. A large share is fake-invoice and vendor-impersonation fraud.
The pattern: enterprises want to deploy autonomous AI agents, but every incident is a variation on "the agent acted on something it shouldn't have." Today's answer is either "don't deploy the agent" or "put a human on every decision" — both defeat the purpose. We wanted to build the third option: a checkpoint the agent has to pass before it acts.
What We Built
TrustGate is a trust layer that sits between an AI agent and its next action. When an agent is about to approve an invoice, pay a vendor, or execute a policy decision, it POSTs the decision to TrustGate. TrustGate runs four guardrails in parallel and returns a verdict the agent can trust.
The four guardrails
- Policy Check — Claude (via the Lava gateway) evaluates the decision against a written policy document and cites the specific clause violated (e.g., `P4: vendor domain verification`).
- Source Verification — Tavily pulls live web evidence (LinkedIn, PitchBook, ZoomInfo) to confirm the vendor is real. Catches lookalike-domain impersonation — the exact pattern behind billions of dollars in FBI-reported BEC fraud.
- Anomaly Detection — pure statistics, no LLM. Flags transactions that deviate from a vendor's historical pattern: $$r = \frac{\text{amount}}{\text{historical\_avg}}$$ Flag if $r \geq 2.0$, block-level alert if $r \geq 3.0$.
- Confidence Decomposition — Claude breaks the agent's stated confidence into weighted factors and flags when the factors don't justify the headline number. Turns opaque "94% confident" into inspectable reasoning.
Each guardrail returns `{status: "pass" | "flag" | "block", reason, evidence, started_at, completed_at, duration_ms}`. Verdicts stream to the UI incrementally as each check lands — the frontend polls and animates a live pipeline graph.
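The anomaly guardrail above is simple enough to sketch directly. The thresholds come from the formula in the text; the function name and return shape are illustrative, modeled on the result schema each guardrail emits:

```python
def anomaly_check(amount: float, historical_avg: float) -> dict:
    """Pure-statistics guardrail: compare an invoice amount to the
    vendor's historical average. Thresholds mirror the text:
    r >= 2.0 -> flag, r >= 3.0 -> block."""
    r = amount / historical_avg
    if r >= 3.0:
        status = "block"
    elif r >= 2.0:
        status = "flag"
    else:
        status = "pass"
    return {
        "status": status,
        "reason": f"amount is {r:.1f}x the vendor's historical average",
        "evidence": {"amount": amount, "historical_avg": historical_avg, "ratio": r},
    }
```

Because this check is pure arithmetic, it costs nothing per call and cannot hallucinate — a useful counterweight to the three LLM-backed guardrails.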
The pipeline
```
[Agent Submit] → [Extract] → ┬→ [Policy] ─────┐
                             ├→ [Source] ─────┤
                             ├→ [Anomaly] ────┼→ [Aggregator] → [Verdict] → [Audit]
                             └→ [Confidence] ─┘
```
Eight nodes. Each one lights up, transitions color, or blocks the flow as real API calls resolve. The pipeline graph is the hero visual — judges watch an AI decision get scored in ~13 seconds, with real PDF invoices rendered alongside and real Tavily source URLs appearing in the verdict panel.
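The writeup doesn't spell out how the Aggregator combines the four guardrail results, so here is a minimal sketch under an assumed worst-status-wins precedence (any block wins, then any flag, else approve) — an assumption for illustration, not the project's documented rule:

```python
def aggregate(results: list[dict]) -> str:
    """Hypothetical Aggregator node: collapse four guardrail statuses
    into one verdict. Precedence (block > flag > pass) is assumed."""
    statuses = {r["status"] for r in results}
    if "block" in statuses:
        return "blocked"       # any hard violation stops the action
    if "flag" in statuses:
        return "needs_review"  # escalate to a human observer
    return "approved"          # all four guardrails passed
```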
Real documents, real evidence
- Real invoice PDFs generated via `reportlab`, rendered to PNG via `pypdfium2`, and displayed alongside the pipeline so judges see the actual document being processed.
- Real Tavily searches return real URLs (LinkedIn company profiles, PitchBook, ZoomInfo) — demo-proof evidence of vendor impersonation.
- Append-only audit log writes every submission, verdict, and human decision to disk as JSON.
- Hybrid demo safety — cached replay mode reproduces successful runs with realistic pacing if the venue WiFi dies mid-pitch.
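The append-only audit log above can be sketched as a JSON-lines writer; the field names and file layout here are assumptions for illustration, not the project's actual schema:

```python
import json
import time
from pathlib import Path

def audit(path: Path, event_type: str, payload: dict) -> None:
    """Append one JSON record per line. The file is only ever opened
    in append mode, so past entries are never rewritten — that is the
    whole audit guarantee. Field names are illustrative."""
    record = {"ts": time.time(), "type": event_type, **payload}
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")
```

One line per event keeps the log trivially greppable and lets the demo replay any past submission.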
How We Built It
Architecture
- Backend: Python FastAPI, single `main.py` orchestration file. Four guardrails run in parallel via `asyncio.gather` and `asyncio.as_completed`, so verdicts land in the UI one at a time.
- Frontend: Single static HTML file. Tailwind CSS via CDN, vanilla JavaScript, hand-drawn SVG pipeline graph. Zero build step, zero `node_modules`. The frontend is served directly by FastAPI at the root URL — one process runs the entire app.
- Dependency management: `uv` — teammates clone the repo and run `uv sync && uv run uvicorn src.main:app --port 8787` to go from zero to running in under 30 seconds.
- In-memory state + append-only JSON audit log. No database. Keeps the hackathon scope tight and demo-reliable.
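The parallel-with-incremental-results pattern described above can be sketched with stdlib `asyncio` alone. The guardrail stubs and their delays are stand-ins for real API calls:

```python
import asyncio

async def run_guardrail(name: str, delay: float) -> dict:
    # Stand-in for a real guardrail API call; the delay simulates latency.
    await asyncio.sleep(delay)
    return {"guardrail": name, "status": "pass"}

async def run_all() -> list[dict]:
    """Launch all four guardrails at once, then handle each verdict the
    moment it resolves — this is what lets the UI animate results one
    at a time instead of in a single burst at the end."""
    tasks = [
        asyncio.create_task(run_guardrail("policy", 0.03)),
        asyncio.create_task(run_guardrail("source", 0.01)),
        asyncio.create_task(run_guardrail("anomaly", 0.00)),
        asyncio.create_task(run_guardrail("confidence", 0.02)),
    ]
    verdicts = []
    for fut in asyncio.as_completed(tasks):
        verdicts.append(await fut)  # fastest guardrail lands first
    return verdicts
```

With plain `asyncio.gather` the caller only sees results after the slowest check finishes; `as_completed` is what makes the streaming UI possible.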
Integrations (five sponsor tools, each doing real work)
| Tool | Role |
|---|---|
| Anthropic Claude | Powers 3 of 4 guardrails (policy reasoning, source-verification reasoning over Tavily results, confidence decomposition) |
| Lava | All LLM traffic routes through Lava's AI gateway for usage governance — the guardrail platform uses a guardrail platform |
| Tavily | Live web search for vendor-legitimacy verification — the single guardrail that can catch real impersonation fraud |
| reportlab | Generates realistic invoice PDFs matching each seed invoice |
| pypdfium2 | Renders those PDFs to PNG for bulletproof cross-browser preview |
Demo paths
```shell
# one command runs everything
uv run uvicorn src.main:app --port 8787
# open http://localhost:8787 → pick invoice → Submit as agent
```
A secondary CLI simulator (`scripts/simulate_agent.py`) posts invoices directly to the agent endpoint at a configurable cadence, so the demo can be driven from a visible terminal, making it clear the trigger is external.
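A minimal stand-in for that simulator, using only the stdlib, might look like the sketch below. The endpoint path comes from the writeup; the payload field names are assumptions, not the project's actual schema:

```python
import json
import time
import urllib.request

def build_submission(invoice: dict) -> bytes:
    """Encode one invoice as the JSON body for the agent endpoint.
    Field names here are illustrative."""
    return json.dumps({"source": "simulator", "invoice": invoice}).encode()

def simulate(invoices: list[dict], url: str, cadence_s: float = 2.0) -> None:
    """Post each invoice at a fixed cadence, like scripts/simulate_agent.py."""
    for inv in invoices:
        req = urllib.request.Request(
            url,
            data=build_submission(inv),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)  # fire the submission
        time.sleep(cadence_s)       # pacing so judges can follow along
```

Usage would be something like `simulate(invoices, "http://localhost:8787/api/agent/submit")` from a second terminal while the UI is on screen.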
Challenges We Ran Into
1. The mid-session pivot
The biggest shift happened 45 minutes into the build. We had designed TrustGate as an approval queue for human AP managers reviewing AI suggestions. Mid-build, we realized the real product is the opposite: guardrails for AI agents, not humans. Agents are the ones who need a checkpoint; humans are observers. That reframe changed the pitch, the hero visual (node view instead of spreadsheet), and the Track 2 fit. Costly in morale to rewrite mid-stream, but the right call. The lesson: in hackathons, a 30-second category-shift insight can be worth more than 30 minutes of code.
2. Demo pacing
The first working demo completed the entire pipeline in under 4 seconds. Judges would never be able to track what happened. We solved this in layers:
- Backend `asyncio.as_completed` so verdicts land incrementally, not in one burst
- Explicit frontend staggers at Submit (1.1s), Extract (1.4s), and guardrail activation (0.5s)
- Cached-mode 1.2s warmup + 2.4s per-verdict pacing
- CSS transitions bumped from 0.45s to 0.7s for smoother color fades
Final demo: ~13 seconds from button click to final verdict. Readable at hackathon-stage pace.
3. Cross-browser PDF embedding
Our initial UI embedded invoice PDFs via `<embed src="*.pdf">`. Some browsers rendered it inline; others forced a download. We tried `Content-Disposition: inline` — helped Chrome, didn't help Safari. We pre-rendered each PDF to PNG via `pypdfium2` (Chrome's PDF engine, wrapped in Python) and swapped `<embed>` for `<img>`. Works on every browser, no disposition negotiation, no user preferences to fight. The lesson: when cross-browser reliability matters, pre-render server-side and ship an image.
4. Venue WiFi anxiety
Hackathon demos famously fail on network. We built a hybrid mode: `/api/agent/submit` defaults to `auto`, which tries live APIs first and falls back to `cache/verdicts.json` if anything errors. The cache is pre-built from successful live runs by `scripts/build_cache.py` and committed to the repo. On stage, the pipeline animates identically whether it's live Tavily searches or cached replays — judges can't tell the difference. The live mode is only one URL parameter away (`?mode=live`).
5. Contract-first collaboration
Splitting backend and frontend work between teammates was a recipe for "I thought you were returning X." We stopped, wrote `API.md` — a full contract for the three routes the frontend needs, with example JSON for every schema — before writing another line of code. The file took 10 minutes and prevented hours of reconciliation. The best hackathon productivity hack isn't a framework; it's a README.
What We Learned
- Reframe early. The pivot from "human queue" to "agent checkpoint" was ~30 seconds of conversation that rewrote the pitch, the visual, and the category fit. When you feel a product framing shift while you're building, stop and take it seriously.
- Parallel, not sequential. Guardrails are independent by design. Running them in parallel with `asyncio.as_completed` makes them faster AND lets the UI animate each result as it lands. Concurrency is a UX feature, not just a performance one.
- Rasterize when reliability matters. PDFs in browsers are a 20-year-old cross-browser nightmare. `<img src="*.png">` has zero variance. Pre-rendering once was cheaper than debugging five browsers.
- Contracts before code, when you have a teammate. `API.md` paid for itself in under 15 minutes of avoided integration pain.
- Cached fallback is an architecture, not a hack. Graceful degradation from live API to cached replay is the same pattern production systems use for incident handling. Ship it in the demo, learn the reflex.
- Judges remember the catch, not the architecture. The moment TrustGate blocked a lookalike-domain vendor with LinkedIn and PitchBook URLs cited live — that's the 10-second image a judge walks out with. Everything else is scaffolding.
What's Next
- Ingest from real internal APIs — today the simulator pushes invoices to `/api/agent/submit`. Tomorrow it's a webhook from an ERP system (NetSuite, SAP, Coupa).
- Docling extraction — replace the placeholder Extract node with real PDF ingestion so agents can ship unstructured invoice PDFs into TrustGate directly.
- Policy editor — today `policy.md` is a static file. Let enterprises compose and version their own policy docs.
- Second vertical: claims approval. Same four guardrails, different workflow. Insurance adjusters are the second-most-common AI automation target in the FBI BEC data.
- Production hardening: auth (SSO), multi-tenancy, per-tenant audit exports, SLAs on guardrail latency, webhook retries with exponential backoff.
The core thesis holds for every vertical: autonomous AI agents need a trust layer. TrustGate is what that layer can look like.
Built with: Python 3.11, FastAPI, uv, Anthropic Claude, Lava, Tavily, reportlab, pypdfium2, Tailwind CSS, vanilla JS, SVG, ~3 hours, and one mid-session pivot we'll remember.
Built With
- claude
- lava
- tavily