siege forge

Inspiration

Banks are rushing to deploy AI assistants that handle sensitive customer data — SSNs, account numbers, balances. But there's no good way to test whether these AI agents can withstand social engineering. Traditional pen testing doesn't work on conversational AI. We wanted to build an automated red team that could attack, judge, and harden an AI agent's defenses — all in real time.

What it does

Siege Forge pits three AI agents against each other in a live, visual red team exercise. A Hacker agent (Jamie) uses social engineering tactics — impersonation, urgency, pretexting — to trick a BankBot (Claude) into leaking sensitive customer data. A Judge agent monitors every exchange and declares breaches when data is leaked. After each breach, the system automatically updates BankBot's security policy, making it harder to exploit in the next round. The whole exercise plays out in a Google Meet-style interface where you can watch the attack unfold turn by turn, with an audit log of every interaction.

How we built it

Frontend: React + TypeScript + Vite, styled with Tailwind CSS v4, animations via Framer Motion
BankBot (Defender): Claude Sonnet 4 via the Anthropic API with 20 banking tools (transactions, KYC, compliance, loans, transfers, reports) and a rich mock database of 5 customers with full financial profiles
Hacker + Judge (Attackers): Hosted on Airia as a single pipeline — one API call routes to either the tester or judge agent based on conversation context
Self-Improving Loop: Policy incidents from breaches are injected into BankBot's system prompt each round, creating adaptive defenses
Game Engine: A full round-based game loop with configurable rounds/turns, automatic conversation routing between all three agents, and real-time UI updates

Challenges we ran into

Stateless attacker: The Airia pipeline has no session memory, so we had to embed all context — round history, previous breaches, BankBot's last response — into every single API call's prompt
Browser-side API orchestration: Running a three-agent game loop entirely from the browser (no backend) meant carefully managing async state, conversation refs, and race conditions across React renders
Tool-use loop complexity: BankBot can call multiple tools per turn (search customer → get details → check compliance), and each tool result feeds back into Claude before producing a final response — coordinating this inside the game loop required careful state management
Balancing attack difficulty: Making the hacker agent sophisticated enough to actually breach while keeping BankBot useful (not paranoid to the point of being unhelpful) required careful prompt engineering on both sides

Accomplishments that we're proud of

The self-improving feedback loop actually works — BankBot measurably gets harder to breach in later rounds after policy updates
20 fully functional banking tools — not a toy demo, but a realistic enterprise banking portal with transactions, loans, compliance checks, KYC verification, fund transfers, and risk analysis
Zero-backend architecture — the entire three-agent game loop runs from the browser with no server, making it instantly deployable
The live visual experience — watching an AI attacker and defender go back and forth in real time, with video tiles lighting up and an audit trail scrolling, makes AI security tangible and dramatic

What we learned

Prompt-only security is fundamentally fragile — even with explicit restrictions, a clever enough social engineering attack can extract data through indirect means
Adaptive defense through incident injection is surprisingly effective — BankBot's breach rate drops significantly after just 1-2 policy updates
Building multi-agent orchestration in the browser is doable but demands careful attention to conversation state isolation between rounds
The Airia platform's single-pipeline routing (tester vs judge from one endpoint) simplified our architecture significantly

What's next for Siege Forge

ElevenLabs voice integration — give each agent a distinct voice so you can hear the social engineering attack happen in real time
Custom agent targets — let users plug in their own AI assistant (any LLM, any system prompt) as the defender instead of our BankBot
Attack taxonomy & scoring — classify attack types (impersonation, authority, urgency, pretexting) and score defense effectiveness per category
Persistent policy learning — save and version defense policies across sessions so organizations can track security improvement over time
Multi-industry templates — expand beyond banking to healthcare (HIPAA), legal (privilege), and government (clearance) scenarios
Compliance reporting — generate audit-ready reports showing what attacks were attempted, what succeeded, and what policy changes were made

Built With

airia
deepmind
elevenlabs
gemini
google
node.js
python
react
typescript

Updates

Tanish Vardhineni started this project — Feb 21, 2026 04:29 PM EST

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.