Inspiration

As AI systems become increasingly agentic calling tools, accessing sensitive data, and taking real actions—the security risks extend far beyond traditional prompt injection. Existing red-teaming tools primarily evaluate chat responses, but they often overlook failures unique to autonomous agents, such as goal hijacking and unsafe tool execution. We wanted to build a system that performs the work of a professional AI red team automatically and continuously, making advanced security testing accessible to every developer.

What it does

RedAgent is an autonomous AI red-teaming platform that tests AI chatbots and agents through a black-box HTTP interface. It first performs reconnaissance to understand the target’s capabilities and attack surface, plans attacks mapped to the OWASP LLM and Agentic Top 10 risks, executes adaptive single-turn, multi-turn, and agentic attacks, and records every interaction as forensic evidence.

Unlike conventional scanners, RedAgent closes the loop after finding vulnerabilities. It proposes defensive prompt hardening and guard rules, re-runs the successful attacks to verify that the fixes work, and exports the resulting tests into CI/CD regression suites so previously fixed issues cannot silently return.

How we built it

We implemented RedAgent as a pipeline of specialized AI agents using Google ADK and Gemini 2.5 on Vertex AI. Each stage has a clearly defined responsibility:

-Recon discovers the target’s capabilities through benign probing. -Strategist selects attack categories based on OWASP guidance. -Attacker executes single-shot, crescendo, and agentic attacks using techniques retrieved from a RAG--backed knowledge base. -Analyst generates deterministic vulnerability reports with severity assessments. -Defender proposes hardened prompts and guard rules for human review. -Verifier replays previously successful attacks to confirm the mitigations are effective.

Arize Phoenix is deeply integrated into the workflow, capturing prompts, responses, tool calls, verdicts, and severity as trace data that serves as forensic evidence for every finding. ChromaDB and Gemini embeddings provide persistent attack memory and technique retrieval, enabling RedAgent to improve its testing over time.

Challenges we ran into

One of the biggest challenges was building reliable autonomous recon that could infer an unknown agent’s capabilities without prior knowledge. Another was orchestrating multiple specialist agents while ensuring deterministic outputs and avoiding free-form coordination failures.

Designing meaningful agentic attacks also required simulating realistic multi-turn interactions that gradually manipulated tool usage rather than relying on isolated prompts. Finally, ensuring trustworthy reporting meant separating AI reasoning from deterministic computation so that metrics such as breach counts and success rates are produced by Python rather than generated by an LLM.

Accomplishments that we're proud of

-Built a complete autonomous red-team workflow rather than a standalone attack generator. -Implemented black-box reconnaissance that discovers attack surfaces automatically. -Demonstrated realistic agentic attacks involving goal hijacking and unsafe tool use. -Integrated Arize Phoenix as forensic evidence for every attack and verdict. -Added a human approval gate before defensive recommendations. -Created a verification stage that proves mitigations are effective instead of merely suggesting them. -Enabled export of discovered failures as regression tests for long-term CI/CD security.

What we learned

Developing RedAgent reinforced that AI security is not only about preventing harmful text generation but also about governing autonomous behaviors and tool interactions. We also learned the importance of deterministic evaluation, structured agent communication, and comprehensive observability when building multi-agent systems. Most importantly, proving that a fix works is as valuable as finding the vulnerability itself.

What's next for RedAgent

Our roadmap includes packaging RedAgent as a GitHub Action for one-command CI integration, expanding support for additional agent frameworks and protocols such as MCP and LangServe, continuously re-testing deployed systems using accumulated attack memory, and generating compliance-oriented adversarial testing reports to support emerging AI governance requirements. We also plan to broaden the attack catalog and further improve adaptive planning for increasingly sophisticated AI agents.

Built With

  • adk
  • api
  • arize-phoenix
  • bun
  • chromadb
  • fastapi
  • gemini2.5
  • github-actions
  • google-agent-developement
  • google-cloud
  • google-cloud-run
  • google-gemini-2.5
  • javascript
  • next.js
  • owasp
  • phoenix-mcp
  • pytest
  • python
  • react
  • server-sent-events
  • typescript
  • vertex-ai
  • vertexai
Share this project:

Updates