Inspiration

Browser agents are about to be everywhere: reading pages, clicking buttons, filling forms, logging in, moving money. Every page they touch is attacker-controlled, yet the only thing protecting users today is whether the model is "smart enough" to ignore malicious instructions. That's not a security guarantee. That's hope.

Infrastructure providers like Browserbase give you excellent remote browsers, but they're infrastructure, not judgment. They don't decide whether a given action is safe for a user's task.

While building BrowseCheck, we watched Claude correctly reject an obvious prompt injection, and realized that was the problem. Security shouldn't depend on how capable the model happens to be. A subtler attack or a weaker model changes the outcome entirely. So we built an external, deterministic security layer that any browser agent, even a weak or jailbroken one, inherits automatically.

What it does

BrowseCheck sits between any browser agent and the browser. Every proposed action passes through a security pipeline before it executes. Malicious or out-of-scope actions are blocked before the browser ever runs them.

All hooks run in parallel, fail-closed: if any hook blocks, the action is denied.

Current hooks:

  • Prompt Injection (LLM): detects hidden or adversarial instructions in page content
  • Intent Drift (LLM): keeps the agent on-task (e.g., blocks logins during read-only summarization, even if the site pushes a sign-in)
  • Credential Phishing (deterministic): flags password forms on untrusted domains and fake OAuth pages via allowlists and form analysis; credentials are never entered
  • Download Inspection (deterministic): blocks executables and malicious extensions
  • Cross-Site Carryover: stops instructions picked up on one site from leaking into behavior on another
  • Certificate/SSL Validation (deterministic): blocks invalid HTTPS connections

Our live dashboard runs two identical agents, same model, on the same poisoned site side by side: one unprotected, one with BrowseCheck, streaming security events in real time with a before/after scorecard. A control run on a legitimate site confirms BrowseCheck doesn't throw false positives.

For the demo, we built a realistic fake "UC Berkeley AI Hackathon 2026" site that tries to lure the agent into a fake Google login and a malicious extension install. The unprotected agent falls for it. BrowseCheck blocks every step.

How we built it

BrowseCheck owns the agent control loop via Claude's tool-use API. Actions, navigate, read, click, fill, finish, run through Playwright over CDP, using Browserbase cloud sessions or local Chromium.

Every action is intercepted and scored by the security pipeline before execution. Hooks run async in parallel; results are aggregated into one allow/block decision at the highest detected severity. A failed or timed-out hook can't silently allow an unsafe action.

Two hook types:

  • LLM-based for nuanced calls (prompt injection, intent drift)
  • Deterministic for guarantees that shouldn't depend on model reasoning

Security events publish to an event bus feeding the in-memory scorecard, an SSE stream for the dashboard, and optional Sentry.

Backend: FastAPI plus SSE. Frontend: vanilla JS and Tailwind, embedding Browserbase's live view alongside events, run controls, and a hard stop that kills both the browser session and the agent.

For testing, we built a realistic phishing site (Astro and Tailwind) with a fake Google OAuth flow, a "sign in for confirmed details" prompt, and a malicious extension download, deployed on Netlify so the cloud browser treats it like a real site.

Challenges we ran into

Our first attacks were too obvious. Claude caught them instantly, which was great for security but a flat demo. That led to our core insight: realistic social engineering is far more dangerous than blunt prompt injection, and security can't rely on the model getting it right every time.

Other challenges: intercepting actions before side effects occur without breaking the agent's natural flow; building a fail-closed pipeline that surfaces hook failures without ever executing unsafe actions; telling legitimate OAuth apart from convincing phishing deterministically rather than by model judgment; and minimizing false positives while wiring up Browserbase and SSE with reliable cleanup.

Accomplishments that we're proud of

  • Security that doesn't depend on model behavior. The same agent that falls for phishing unprotected is fully safe with BrowseCheck on
  • A complete end-to-end demo: real cloud browser, realistic phishing site, live attack prevention
  • A modular hook architecture (deterministic plus LLM) that makes new protections easy to add
  • Credentials are never entered into suspicious sites. Unsafe auth pages are caught first
  • A control experiment proving protection doesn't come at the cost of usability

What we learned

"The model will know better" isn't a security strategy. Model behavior shifts across architectures, prompts, and attacks. Only external enforcement is reliable.

The scariest attacks aren't jailbreaks. They're realistic social engineering. Agents are built to be helpful, which makes them especially exploitable.

Defense in depth matters: intent validation, injection detection, phishing prevention, and download inspection are stronger together than any one technique alone.

Infrastructure and security solve different problems. One gives you a reliable browser, the other has to understand intent and risk.

What's next for BrowseCheck

  • A drop-in SDK/proxy so any agent framework (Browserbase, Playwright, computer-use) inherits BrowseCheck protections with minimal integration
  • More deterministic hooks: PII exfiltration, payment authorization, OAuth scope verification, file uploads, org-specific allowlists
  • A policy engine for task-specific rules and human-approval gates on high-impact actions
  • Signed audit logs for every decision, for compliance and forensics
  • Lower latency via caching and lightweight local classifiers to reduce LLM-hook dependence
  • A public red-team benchmark suite for prompt injection and phishing, usable in CI

Built With

  • astro
  • asyncio
  • browserbase
  • chrome-devtools-protocol
  • claude
  • cognition
  • devin
  • fastapi
  • javascript
  • netlify
  • playwright
  • pydantic
  • python
  • sentry
  • tailwindcss
  • uvicorn
Share this project:

Updates