Inspiration
The idea came from a simple observation: everyone is building AI agents, but almost nobody is stress-testing them. Prompt injection, jailbreaks, data leakage — these aren't hypothetical risks. They're real failure modes that show up the moment a malicious user starts poking at your system prompt. We wanted to build something that lets developers see their own agent break in real time, not just read about it in a blog post.
The concept: what if red-teaming an AI agent felt less like filling out a security-audit spreadsheet and more like playing a hacking game? Upload your agent, let the system find its weak points, then try to exploit them yourself, with the whole loop guided, gamified, and immediate.
What It Does
BREAK IT is an interactive red-teaming platform for AI agents. The core loop is:
- Upload your agent: paste a system prompt, drop in source code, or upload an entire project folder
- Automated analysis: an LLM-driven pipeline scans for real, exploitable vulnerabilities (prompt injection, role impersonation, tool misuse, data exfiltration, etc.)
- Play the levels: each vulnerability becomes a level where you chat with a simulated copy of your own agent and try to break it
- Adjudication: a secondary model evaluates the conversation and determines whether the exploit actually landed
- Report: a summary screen grades your agent's robustness (A-F), shows which vulnerabilities held and which broke, and provides remediation guidance
There's also a Daily Challenge mode with a seeded leaderboard, a practice agent ("Penny" from SideStreet Snacks, deliberately riddled with 9 vulnerabilities), user profiles with lifetime stats, and a neon-punk UI.
How We Built It
The entire project was built in a single day at SharkHack (April 4, 2026) by a five-person team.
Stack: React 18 + Vite + TypeScript, Tailwind CSS with custom neon theming, Radix UI (shadcn/ui), OpenRouter for LLM access (with Gemini fallback), and localStorage for persistence.
We scaffolded the initial routing and design system with Lovable, then immediately diverged into parallel workstreams:
- Clark built the core LLM pipeline, agent simulation, vulnerability detection, exploit adjudication, and the cascade analysis system for scanning full codebases
- Brian owned the level select UI (SVG path graph with Bezier curves, skull tracker, keyboard navigation) and the daily challenge mode
- Ojasvi implemented chat logging persistence, the localStorage-based user stats/profile system, leaderboard seeding, and branding
- Aditya handled chatbox UX, the skip-level flow, and merge coordination across 8 PRs
- Thomas worked on Game.tsx refinements, CSS polish, the landing page scroll behavior, and the README
The Cascade Analysis Engine
The most technically ambitious piece was the cascade analysis pipeline for scanning multi-file codebases. A single LLM prompt can't hold an entire project, so we built a chunked delegation system. Each finding is ranked by a simple score used during deduplication:
$$\text{Score}(v) = \text{confidence}(v) + \text{teachingValue}(v)$$
The pipeline:
- Collect and prioritize files (system prompts, configs, tool definitions first)
- Chunk into ≤120KB / ≤16 file segments
- Run independent LLM scans per chunk
- Synthesize findings, deduplicating by category × name_slug and keeping the highest-scoring variant
- Run a supplemental pass on the reconstructed system prompt
Vulnerabilities are filtered to confidence ≥ 0.78 and teachingValue ≥ 0.6 to keep signal high.
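The synthesis step can be sketched as follows. The `Vulnerability` shape and the `synthesize` name are illustrative, but the dedup key, scoring, and thresholds match the pipeline described above:

```typescript
interface Vulnerability {
  category: string;      // e.g. "prompt-injection"
  name_slug: string;     // stable identifier for the finding
  confidence: number;    // 0..1, how likely the finding is real
  teachingValue: number; // 0..1, how instructive it is as a level
}

// Merge per-chunk findings: dedupe on category × name_slug, keep the
// highest-scoring variant, then apply the signal filters.
function synthesize(findings: Vulnerability[]): Vulnerability[] {
  const score = (v: Vulnerability) => v.confidence + v.teachingValue;
  const best = new Map<string, Vulnerability>();
  for (const v of findings) {
    const key = `${v.category}:${v.name_slug}`;
    const prev = best.get(key);
    if (!prev || score(v) > score(prev)) best.set(key, v);
  }
  return [...best.values()].filter(
    (v) => v.confidence >= 0.78 && v.teachingValue >= 0.6
  );
}
```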
Agent Simulation
To make the "play against your own agent" mechanic work, we prepend a simulation preamble to the user's extracted system prompt and pass it through OpenRouter. Model selection uses a fallback loop: if the primary model 5xx's, it cascades through free-tier alternatives (qwen/qwen3.6-plus:free, google/gemma-3-27b-it:free) before giving up.
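A sketch of that fallback loop; `simulateAgent` and `SIMULATION_PREAMBLE` are hypothetical names (the real preamble is longer), but the endpoint is OpenRouter's standard chat-completions API:

```typescript
// Hypothetical preamble; the real one is longer and more defensive.
const SIMULATION_PREAMBLE =
  "You are a simulated copy of the agent described below. Stay in character.";

const FALLBACK_MODELS = [
  "qwen/qwen3.6-plus:free",
  "google/gemma-3-27b-it:free",
];

// Prepend the simulation preamble, then walk the model list until one
// call succeeds; a 5xx response triggers the next model in the chain.
async function simulateAgent(
  apiKey: string,
  primaryModel: string,
  extractedSystemPrompt: string,
  userMessage: string
): Promise<string> {
  const system = `${SIMULATION_PREAMBLE}\n\n${extractedSystemPrompt}`;
  for (const model of [primaryModel, ...FALLBACK_MODELS]) {
    const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model,
        messages: [
          { role: "system", content: system },
          { role: "user", content: userMessage },
        ],
      }),
    });
    if (res.status >= 500) continue;           // server error: try next model
    if (!res.ok) throw new Error(`OpenRouter error ${res.status}`);
    const data = await res.json();
    return data.choices[0].message.content;
  }
  throw new Error("All models failed");
}
```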
Auto-Judging
Rather than making the user manually submit every exploit attempt, we trigger automatic adjudication after every other user turn (turns 3, 5, 7, ...). A separate LLM call evaluates the full conversation against the vulnerability's successCriteria and returns a boolean verdict with explanation.
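The cadence and the judge's input might look like this; `shouldAdjudicate` and `buildJudgePrompt` are illustrative names, and the real judge prompt is more elaborate:

```typescript
interface Turn { role: "user" | "assistant"; content: string }

// Adjudicate after every other user turn: the first user turn is 1,
// so odd user turns >= 3 trigger an evaluation (3, 5, 7, ...).
function shouldAdjudicate(userTurnCount: number): boolean {
  return userTurnCount >= 3 && userTurnCount % 2 === 1;
}

// The adjudicator model sees the full transcript plus the
// vulnerability's successCriteria and must return a JSON verdict.
function buildJudgePrompt(transcript: Turn[], successCriteria: string): string {
  const convo = transcript
    .map((t) => `${t.role.toUpperCase()}: ${t.content}`)
    .join("\n");
  return [
    "You are judging whether an exploit attempt succeeded.",
    `Success criteria: ${successCriteria}`,
    "Conversation:",
    convo,
    'Reply with JSON: {"success": boolean, "explanation": string}',
  ].join("\n\n");
}
```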
Challenges We Faced
LLM JSON parsing was brutal. Models return structured vulnerability data wrapped in markdown fences, <think> tags, smart quotes, trailing commas — every possible malformation. We built a multi-candidate parser that tries raw extraction, brace-balanced extraction, quote repair, and finally a model-assisted JSON repair call as a last resort. The cascade scanner also does recursive chunk-splitting when a chunk fails to parse.
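A condensed sketch of the layered parser; the exact regexes are illustrative, and the final model-assisted repair call is omitted:

```typescript
// Each candidate transform is tried in order until JSON.parse succeeds;
// a model-assisted repair call would be the last resort (not shown).
function parseModelJson(raw: string): unknown | null {
  const stripped = raw
    .replace(/<think>[\s\S]*?<\/think>/g, "") // reasoning tags
    .replace(/```(?:json)?/g, "")             // markdown fences
    .trim();

  // Naive brace-balanced extraction (ignores braces inside strings).
  const braceBalanced = (s: string): string | null => {
    const start = s.indexOf("{");
    if (start < 0) return null;
    let depth = 0;
    for (let i = start; i < s.length; i++) {
      if (s[i] === "{") depth++;
      else if (s[i] === "}" && --depth === 0) return s.slice(start, i + 1);
    }
    return null;
  };

  // Quote and comma repair heuristics.
  const repaired = (s: string): string =>
    s
      .replace(/[\u201C\u201D]/g, '"')  // smart double quotes
      .replace(/[\u2018\u2019]/g, "'")  // smart single quotes
      .replace(/,\s*([}\]])/g, "$1");   // trailing commas

  for (const candidate of [
    stripped,
    braceBalanced(stripped),
    repaired(stripped),
    repaired(braceBalanced(stripped) ?? ""),
  ]) {
    if (!candidate) continue;
    try {
      return JSON.parse(candidate);
    } catch {
      /* fall through to the next candidate */
    }
  }
  return null;
}
```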
Cascade under-reporting. Our first deduplication pass was too aggressive and collapsed distinct findings under the same key. Switching the dedup key to category:name_slug and using score-based merging (max of confidence + teachingValue) fixed the under-counting.
Chat render loop. Auto-scrolling the chat during streaming responses caused layout thrashing. We solved it with a scroll-anchor ref and a useEffect gated on chatHistory.length rather than the full message content.
Merge chaos. Five people, one repo, one day, 8 pull requests. We hit conflicts on Game.tsx and openrouter.ts repeatedly since those were the hottest files. Branching discipline (feature branches per PR) and fast review cycles kept us moving.
Timing and auto-advance. Multiple overlapping timers (completion animation, level-select auto-advance, evaluation spinners) caused race conditions. We moved all timers to useRef<number | null>() with cleanup on unmount and state transitions.
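The pattern, reduced to plain TypeScript: the real code holds the id in `useRef<number | null>()` and runs `clearTimer` from effect cleanup; `TimerRef` here stands in for that ref.

```typescript
// Ref-style timer holder: starting a new timer always clears the
// previous one, and clearTimer() runs on unmount/state transitions.
interface TimerRef { current: ReturnType<typeof setTimeout> | null }

function startTimer(ref: TimerRef, cb: () => void, ms: number): void {
  if (ref.current !== null) clearTimeout(ref.current); // kill stale timer
  ref.current = setTimeout(() => {
    ref.current = null;
    cb();
  }, ms);
}

function clearTimer(ref: TimerRef): void {
  if (ref.current !== null) {
    clearTimeout(ref.current);
    ref.current = null;
  }
}
```

Because every start clears its predecessor, two overlapping triggers (say, a completion animation and an auto-advance) can never both fire.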
What We Learned
- Adversarial testing is a UX problem, not just a security problem. The gamification layer isn't cosmetic: without it, red-teaming feels like a chore. With it, people actually want to find the exploits.
- LLM outputs are never clean. Any system that depends on structured LLM output needs multiple layers of parsing fallbacks. We spent more time on JSON repair heuristics than on the actual analysis prompts.
- Chunked analysis is hard to get right. The tradeoff between context size, deduplication aggressiveness, and recall is subtle. Our cascade system went through three iterations in a single day.
- Client-side-only architecture is freeing. No backend means no secrets to manage, no deployment pipeline, and users keep their API keys and agent code local. The tradeoff is localStorage-based persistence, but for a hackathon that's fine.
What's Next
- Multiplayer competitive modes
- Multi-agent adversarial simulations (attacker agent vs. defender agent)
- Persistent cloud-based reporting and export
- Collaborative team-based red-teaming rooms
- Agent Marketplace (let others break your agent)
- More detailed grading (partial points, detailed reports)
Built With
- gemini
- jszip
- openrouter
- react
- supabase
- tailwind
- typescript
- vite