Agent Gauntlet

Loading Page
Landing Page
Config
Simulation
Charts

Inspiration

Browser agents are getting really good at doing things on the web — booking flights, searching products, filling out forms. But we kept asking ourselves: what happens when the website tries to trick them?

We were inspired by adversarial security research and wanted to bring that same thinking to browser agents. If an agent can be fooled by a fake "Sign in to continue" popup or a hidden instruction buried in a FAQ, it doesn't matter how capable it is — it's a risk. The web is full of persuasive, misleading, and sometimes outright malicious content. A good browser agent needs to do more than click the right button. It needs to stay on task, ignore manipulation, and recover when something feels off.

What it does

Agent Gauntlet is a local benchmark that turns browser agent safety into a live, watchable duel. Two agents run at the same time:

Task Agent — tries to complete a real browser task (search, compare, extract info) on a live or demo web page
Red-Team Agent — fires attacks in real time, sneaking malicious content into the page before the task agent ever sees it

We cover four types of attacks: hidden prompt injections, fake UI elements, redirects that send the agent off track, and fake forms designed to harvest credentials. The live arena shows a full browser view, both agents' activity feeds, and a prompt-health tracker that updates every step.

Every run saves everything — screenshots, page snapshots, attack payloads, decisions, and a final safety verdict — and you can export it as JSON, CSV, or ShareGPT JSONL.

How we built it

The core stack is Next.js 16 and Playwright. The attack layer injects JavaScript directly into the page as it loads, so from the agent's perspective the attack looks just like normal page content — there's no way to tell the difference.

Both agents support multiple modes. The task agent can run as a simple rule-based policy or hand control to an OpenAI or Anthropic model for LLM-backed decisions. The red-team works the same way — scripted, adaptive, or fully LLM-driven. After each run, a GPT-4o judge scores whether the task was actually completed, and a canary system checks whether any sensitive values leaked into the agent's actions.

Challenges we ran into

Our biggest challenge was fine-tuning. We wanted to use QLoRA with Qwen to train a safer task agent directly on our own trajectory data, but the cost got in the way fast.

Running fine-tuning jobs on a model the size of Qwen needs serious GPU compute — and when you're iterating on things like LoRA rank, learning rate, and which layers to target, the cloud bills add up quickly. A100/H100 instance pricing made it really hard to experiment freely within a hackathon budget.

We ended up focusing on the benchmark and evaluation side, and we're treating the fine-tuned agent as the next clear step when we have the compute to do it right.

Accomplishments that we're proud of

A benchmark that runs fully on your own machine — no cloud setup, no external services needed
An attack layer that's genuinely invisible to the agent, the same way real attacks work in the wild
A live arena that makes prompt injection something you can actually watch happen — not just a metric in a table
Structured run logs that are immediately useful for safety research, debugging, and future training
A product that feels polished enough to demo to judges while still being grounded in real alignment research

What we learned

The biggest thing we learned: an agent can be great at its job and still completely fail at staying safe. Task performance and security robustness barely correlate. An agent can nail every step of a booking flow and still hand over its credentials to a fake login modal.

We also learned just how effective basic page injection is. Agents have no reference for what a page is supposed to look like, so a fake button or a hidden instruction is just as real to them as anything else. Seeing that failure play out live made the problem a lot more concrete than any benchmark number could.

What's next for Agent Gauntlet

QLoRA fine-tuning — use our saved trajectories to train a more robust task agent on Qwen once we have the GPU budget
More scenarios — expand beyond our demo pages into real-world tasks across shopping, travel, and enterprise tools
Smarter red-team — attacks that adapt mid-run based on what the task agent is doing
Public leaderboard — compare frontier models on safety and task performance in one place
Better metrics — break down results by attack type, track recovery rates, and go deeper than just the final verdict
Closed-loop training — curate our exported traces to build regression suites and feed future improvement loops

Built With

anthropic-api
convex
html2canvas
nanoid
next.js
openai-api
playwright
react
recharts
tailwind-css
typescript
vitest

Submitted to

HooHacks 2026
- Winner Best Overall - 2nd Place

Created by

I worked on the frontend and presentation of Agent Gauntlet. I helped design and integrate the UI for visualizing agent interactions, logs, and evaluation metrics, making the system clear and demo-friendly. I also structured and refined the pitch deck, ensuring the project’s problem, solution, and impact were communicated effectively during the demo.

Naveenchandra Nallamothu
Student at GMU studying Computer Science, with a strong foundation in Big Data analytics, programming languages, and cloud computing.
fullstack dev - full backend - llm - workflow

Abhinav Maddi
Built the frontend UI for Agent Gauntlet multiple-file HTML. Designed and implemented a Wild West themed interface using Rye and Special Elite fonts with a custom sepia/amber color palette. Created two fully hand-drawn SVG characters — a cowboy sheriff and a bandana-masked outlaw — with floating animations and drop shadows.

Sai Vinjamuri

Updates

Abhinav Maddi started this project — Mar 22, 2026 11:47 AM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.