Agent Immunity

Inspiration

AI agents can now browse, shop, and book on people's behalf, often while the person is not reviewing every click. But much of the web has been optimized for years to steer people toward spending more, sharing more data, and accepting commitments they did not intend: pre-checked subscriptions, fake countdowns, and "discount" bundles that hide recurring charges.

A recent benchmark study, DECEPTICON, found that dark patterns steered web agents toward manipulated outcomes in over 70% of tested tasks, compared with a reported human average of 31%. The study also found that greater model capability did not automatically make agents more resistant.

Most high-stakes systems we trust already have approval workflows. Code is reviewed before it merges. Large transactions require authorization. Autonomous AI agents often have neither. We wanted to build the missing approval layer for delegated web actions.

What it does

Agent Immunity is an authorization runtime that sits between an AI agent and the websites it acts on. In this MVP, before the agent completes one of a selected set of consequential checkout actions (a subscription, a paid add-on, a price change, or an optional data request), our system checks the action against the user's explicit, stated authorization.

It uses a hybrid approach:

Deterministic rules catch clear-cut violations instantly, with zero LLM calls — a pre-checked $49.99/month membership, an over-budget total, an optional phone number field.
Real LLM reasoning (Claude) handles genuinely ambiguous cases — a "VIP Savings Bundle" that looks like a 15% discount but is actually a disguised recurring subscription. Instead of matching a keyword, it asks: does this still serve the user's stated goal, or does it exceed it?
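
To make the split between the two layers concrete, here is a minimal sketch of how a deterministic pass might block clear violations and hand everything else to the reasoning layer. It is not our actual engine: the contract fields (maxTotal, allowSubscriptions, allowOptionalData), the action shape, and the unclassifiedOffers list are all hypothetical names chosen for illustration.

```javascript
// Hypothetical authorization contract; field names are assumptions for illustration.
const contract = {
  maxTotal: 40.0,            // budget ceiling in dollars
  allowSubscriptions: false, // user did not authorize recurring charges
  allowOptionalData: false,  // user did not authorize sharing optional fields
};

// Each rule returns a violation code, or null when the action passes that rule.
const rules = [
  (a, c) => (a.recurring && !c.allowSubscriptions ? "unauthorized_subscription" : null),
  (a, c) => (a.total > c.maxTotal ? "over_budget" : null),
  (a, c) =>
    a.optionalFields.length > 0 && !c.allowOptionalData ? "optional_data_request" : null,
];

// Returns { verdict, reasons }. No LLM call happens here.
// "escalate" means the deterministic pass found nothing wrong but the page
// contains an offer it could not classify, so the LLM layer should look at it.
function checkDeterministic(action, c = contract) {
  const reasons = rules.map((rule) => rule(action, c)).filter(Boolean);
  if (reasons.length > 0) return { verdict: "block", reasons };
  if (action.unclassifiedOffers.length > 0) return { verdict: "escalate", reasons: [] };
  return { verdict: "allow", reasons: [] };
}

// Example: a pre-checked monthly membership added to an in-budget cart.
const membership = {
  total: 24.0,
  recurring: true,
  optionalFields: [],
  unclassifiedOffers: [],
};
console.log(checkDeterministic(membership));
```

Running the rules first means an ambiguous offer such as the "VIP Savings Bundle" only reaches the LLM when no clear-cut rule has already blocked the action.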

Every decision — allowed, blocked, or auto-corrected — is logged as a structured "Authorization Delta": what was authorized, what was attempted, why, and what we did about it. Nothing is a black box.
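
As a rough illustration of what one Authorization Delta record could look like, the sketch below builds a frozen log entry from the four parts named above. The function name, the field names, and the "source" field are assumptions for this example, not the exact schema we shipped.

```javascript
// Build one structured log entry. All field names are illustrative assumptions.
function buildAuthorizationDelta({ authorized, attempted, decision, reason, source }) {
  const knownDecisions = ["allowed", "blocked", "auto_corrected"];
  if (!knownDecisions.includes(decision)) {
    throw new Error(`Unknown decision: ${decision}`);
  }
  // Freeze the record so later code cannot silently rewrite the audit trail.
  return Object.freeze({
    timestamp: new Date().toISOString(),
    authorized, // what the user's contract permits
    attempted,  // what the agent was about to do
    decision,   // allowed | blocked | auto_corrected
    reason,     // human-readable explanation of the decision
    source,     // "rule" or "llm": which layer produced the decision
  });
}

const delta = buildAuthorizationDelta({
  authorized: "one-time purchase, total at or under $40.00",
  attempted: "checkout with pre-checked $49.99/month membership",
  decision: "auto_corrected",
  reason: "Recurring charge was not authorized; membership box unchecked.",
  source: "rule",
});
console.log(JSON.stringify(delta, null, 2));
```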

We built and tested this against two complete, realistic 3-page flows: a sock retailer (Heritage Wool Co.) and an airline (SkyLine Air), each running through a naive agent (no protection, gets manipulated into extra charges) and a protected agent (catches and declines every manipulative offer, books clean). Across our controlled checkout scenarios, the protected agent intercepted the seeded policy violations while allowing the clean checkout path to continue.

How we built it

Browserbase + Stagehand drive the actual browser session — the agent navigates real pages, reads live page state, and interacts with actual browser elements in Browserbase cloud sessions (and locally, for clean recording).
Claude (Anthropic) powers both Stagehand's natural-language actions and our intent-mismatch reasoning layer.
A custom decision engine (plain Node.js) runs the deterministic policy checks: fast, explainable, and with no LLM round trip for the obvious cases.
Redis caches LLM reasoning verdicts on manipulative patterns by content fingerprint. In our runs this worked across sites: a pattern reasoned about fresh on the flights site was later served from cache on the socks site, with no new LLM call and the same correct verdict.
Sentry turns every blocked or corrected decision into a real, tagged, searchable event with the full Authorization Delta attached as context — so the system's behavior is auditable after the fact, not just trusted blindly.
A live dashboard (Express + WebSocket) renders the authorization contract and every decision in real time as the agent runs, instead of raw terminal logs.
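
The verdict cache described above can be sketched as follows. This is an illustration under stated assumptions, not our production code: the normalization step, the "verdict:" key prefix, and the getVerdict and makeMemoryStore names are hypothetical, and a Map-backed stub with async get and set stands in for the Redis client so the example runs on its own.

```javascript
const { createHash } = require("node:crypto");

// Normalize the offer text so trivially different renderings share one fingerprint.
// The key depends only on content, not on the site URL, which is what allows a
// verdict reached on one site to be reused on another.
function fingerprint(offerText) {
  const normalized = offerText.toLowerCase().replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized).digest("hex");
}

// In-memory stand-in for Redis (assumption): async get returns null on a miss.
function makeMemoryStore() {
  const entries = new Map();
  return {
    get: async (key) => (entries.has(key) ? entries.get(key) : null),
    set: async (key, value) => {
      entries.set(key, value);
    },
  };
}

// Return a cached verdict if one exists; otherwise call the LLM once and store the result.
async function getVerdict(store, offerText, reasonWithLlm) {
  const key = `verdict:${fingerprint(offerText)}`;
  const cached = await store.get(key);
  if (cached !== null) return { ...JSON.parse(cached), cacheHit: true };
  const verdict = await reasonWithLlm(offerText); // only reached on a cache miss
  await store.set(key, JSON.stringify(verdict));
  return { ...verdict, cacheHit: false };
}
```

One design consequence: because the fingerprint ignores the surrounding page, two offers with identical wording but different actual terms would share a verdict, so the text that is fingerprinted should include the terms that matter (price, billing period).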

Challenges we ran into

Getting Stagehand's exact API surface right took real debugging. Methods we expected from typical browser-automation libraries (.uncheck(), .scrollIntoViewIfNeeded(), .hover()) either didn't exist or behaved inconsistently between a local browser and a real Browserbase cloud session. We ended up replacing fragile waits with explicit timeouts on every step, so a hung action fails loudly in seconds instead of hanging silently for minutes.
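
One common way to put an explicit timeout on a step is to race it against a timer. The sketch below shows that general pattern with Promise.race; the withTimeout name and the labels are hypothetical, and it deliberately avoids any Stagehand-specific calls.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
// Note: this stops waiting for the step; it does not cancel the underlying action.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer whichever side wins, so it never keeps the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Example with a stand-in step that never finishes.
const hungStep = new Promise(() => {});
withTimeout(hungStep, 50, "decline add-on").catch((err) => {
  console.error(err.message);
});
```

Wrapping each browser step this way turns a silent hang into an error that names the step, which is what made failures quick to locate.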

During development, we found ArmorIQ, which focuses on enterprise agent-intent verification inside organizational systems. Rather than claim a new category, we focused Agent Immunity on delegated browser actions for an individual: subscriptions, hidden costs, and unnecessary data requests on ordinary web flows.

Accomplishments that we're proud of

Getting all three sponsor integrations genuinely working together in one real run — Browserbase driving a real session, Redis serving an actual cross-site cache hit, and Sentry logging real tagged events — without any of them being decorative or faked. Watching the LLM reasoning layer generate a correct, specific explanation for a disguised subscription offer, live, with no scripting behind it, was the moment the project felt real rather than theoretical.

What we learned

That the gap between "an agent completing a task" and "an agent completing a task the way the user actually intended" is wider than it looks, and that in our scenarios closing it did not take a more powerful model; it took an explicit, checkable contract sitting in front of the model. We made two decisions explicit rather than implicit throughout: which prior ideas we rejected and why, and how the system's design structurally limits misuse. Both are detailed in our ethics and process answers below.

What's next for Agent Immunity

Extending the same authorization-contract pattern beyond checkout flows to other consequential agent actions (account changes, data sharing across multiple sites, multi-step bookings), and exploring a lighter-weight version that could run as a browser extension layer in front of any agent framework, not just Stagehand.

Built With

anthropic
browserbase
claude
css
express.js
git
html
javascript
ngrok
node.js
redis
sentry
stagehand
websocket

Updates

Tirth Thakkar started this project — Jun 21, 2026 01:26 PM EDT
