Inspiration

Similar to Netflix's chaos monkey which deletes servers to test how the infrastructure handles such issues. I wanted to build something similar for agents. What if we can try to break the agent in a secure environment so that we can be aware of vulnerabilities before the agent hits production

What it does

An attacker agent generates numerous attacks ( prompt injection, tool manipulation, data leaks etc) and then tries to break the agent in a secure environment by spinning up and testing in daytona sandboxes.

How we built it

-Attack Library: Curated database of 35+ proven attack vectors across 5 vulnerability categories (prompt injection, tool manipulation, data leakage, resource exhaustion, session bleeding)

  • Attacker Agent V2: Advanced test generator using GPT-4o that creates context-aware attacks through multiple strategies - proven templates, diverse sampling, LLM-based adaptation, intelligent mutations, and parallel execution for speed
  • Target Agent: Vulnerable customer support agent with realistic tools (database queries, email sending) serving as our test subject
  • Chaos Executor: Orchestrates test execution with optional Daytona sandbox isolation and LLM-powered vulnerability evaluation for accurate detection

What's next for ChaosAgent

Finetune a model on the open data available on from hackaprompt, jailbreak etc to make a agent specifically for creating more robust and specific targetted attacks.

Built With

Share this project:

Updates