Inspiration
Banks are rushing to deploy AI assistants that handle sensitive customer data — SSNs, account numbers, balances. But there's no good way to test whether these AI agents can withstand social engineering. Traditional pen testing doesn't work on conversational AI. We wanted to build an automated red team that could attack, judge, and harden an AI agent's defenses — all in real time.
What it does
Siege Forge pits three AI agents against each other in a live, visual red team exercise. A Hacker agent (Jamie) uses social engineering tactics — impersonation, urgency, pretexting — to trick a BankBot (Claude) into leaking sensitive customer data. A Judge agent monitors every exchange and declares breaches when data is leaked. After each breach, the system automatically updates BankBot's security policy, making it harder to exploit in the next round. The whole exercise plays out in a Google Meet-style interface where you can watch the attack unfold turn by turn, with an audit log of every interaction.
How we built it
- Frontend: React + TypeScript + Vite, styled with Tailwind CSS v4, animations via Framer Motion
- BankBot (Defender): Claude Sonnet 4 via the Anthropic API with 20 banking tools (transactions, KYC, compliance, loans, transfers, reports) and a rich mock database of 5 customers with full financial profiles
- Hacker + Judge (Attackers): Hosted on Airia as a single pipeline — one API call routes to either the tester or judge agent based on conversation context
- Self-Improving Loop: Policy incidents from breaches are injected into BankBot's system prompt each round, creating adaptive defenses
- Game Engine: A full round-based game loop with configurable rounds/turns, automatic conversation routing between all three agents, and real-time UI updates
Challenges we ran into
- Stateless attacker: The Airia pipeline has no session memory, so we had to embed all context — round history, previous breaches, BankBot's last response — into every single API call's prompt
- Browser-side API orchestration: Running a three-agent game loop entirely from the browser (no backend) meant carefully managing async state, conversation refs, and race conditions across React renders
- Tool-use loop complexity: BankBot can call multiple tools per turn (search customer → get details → check compliance), and each tool result feeds back into Claude before producing a final response — coordinating this inside the game loop required careful state management
- Balancing attack difficulty: Making the hacker agent sophisticated enough to actually breach while keeping BankBot useful (not paranoid to the point of being unhelpful) required careful prompt engineering on both sides
Accomplishments that we're proud of
- The self-improving feedback loop actually works — BankBot measurably gets harder to breach in later rounds after policy updates
- 20 fully functional banking tools — not a toy demo, but a realistic enterprise banking portal with transactions, loans, compliance checks, KYC verification, fund transfers, and risk analysis
- Zero-backend architecture — the entire three-agent game loop runs from the browser with no server, making it instantly deployable
- The live visual experience — watching an AI attacker and defender go back and forth in real time, with video tiles lighting up and an audit trail scrolling, makes AI security tangible and dramatic
What we learned
- Prompt-only security is fundamentally fragile — even with explicit restrictions, a clever enough social engineering attack can extract data through indirect means
- Adaptive defense through incident injection is surprisingly effective — BankBot's breach rate drops significantly after just 1-2 policy updates
- Building multi-agent orchestration in the browser is doable but demands careful attention to conversation state isolation between rounds
- The Airia platform's single-pipeline routing (tester vs judge from one endpoint) simplified our architecture significantly
What's next for Siege Forge
- ElevenLabs voice integration — give each agent a distinct voice so you can hear the social engineering attack happen in real time
- Custom agent targets — let users plug in their own AI assistant (any LLM, any system prompt) as the defender instead of our BankBot
- Attack taxonomy & scoring — classify attack types (impersonation, authority, urgency, pretexting) and score defense effectiveness per category
- Persistent policy learning — save and version defense policies across sessions so organizations can track security improvement over time
- Multi-industry templates — expand beyond banking to healthcare (HIPAA), legal (privilege), and government (clearance) scenarios
- Compliance reporting — generate audit-ready reports showing what attacks were attempted, what succeeded, and what policy changes were made
Log in or sign up for Devpost to join the conversation.