Inspiration
Similar to Netflix's chaos monkey which deletes servers to test how the infrastructure handles such issues. I wanted to build something similar for agents. What if we can try to break the agent in a secure environment so that we can be aware of vulnerabilities before the agent hits production
What it does
An attacker agent generates numerous attacks ( prompt injection, tool manipulation, data leaks etc) and then tries to break the agent in a secure environment by spinning up and testing in daytona sandboxes.
How we built it
-Attack Library: Curated database of 35+ proven attack vectors across 5 vulnerability categories (prompt injection, tool manipulation, data leakage, resource exhaustion, session bleeding)
- Attacker Agent V2: Advanced test generator using GPT-4o that creates context-aware attacks through multiple strategies - proven templates, diverse sampling, LLM-based adaptation, intelligent mutations, and parallel execution for speed
- Target Agent: Vulnerable customer support agent with realistic tools (database queries, email sending) serving as our test subject
- Chaos Executor: Orchestrates test execution with optional Daytona sandbox isolation and LLM-powered vulnerability evaluation for accurate detection
What's next for ChaosAgent
Finetune a model on the open data available on from hackaprompt, jailbreak etc to make a agent specifically for creating more robust and specific targetted attacks.
Built With
- daytona
- nextjs
- typescript
Log in or sign up for Devpost to join the conversation.