Inspiration

Traditional API scanners are loud, dumb, and easily blocked. I wanted to build something that thinks like a human pentester: a tool that doesn't just "fuzz" but understands the logical flow of a system. Behemoth was born from the idea of "Spirit Circuits": delegating complex hacking tasks to a specialized hierarchy of AI agents to achieve autonomous, deep-reasoning exploitation.

What it does

Behemoth is an autonomous offensive security framework. It maps an API attack surface using The Warlock, identifies logical relationships between data with Spirit-Eye, executes complex BOLA and Injection attacks via The Berserker and The Alchemist, and finally generates professional-grade remediation reports using The Paladin. It doesn't just find bugs; it understands the "why" and "how" of a vulnerability.

How we built it

The core engine is built in Python using the Google Generative AI SDK.

  • The Intelligence: We utilized Gemma 3 for rapid recon and Gemini 3 Flash for its massive context window and reasoning capabilities.
  • The Evolution: Development began with local testing against a custom Flask-based project. This "sandbox" environment allowed us to fine-tune the agents' ability to detect subtle logic flaws in a controlled setting before scaling the framework to larger, more complex targets such as the OWASP Juice Shop.
  • The Architecture: We engineered a persistent Shadow Memory layer that allows agents to share session state (like JWTs) in real-time.
  • The Interface: A sleek CLI built with Typer and Rich for a premium hacker aesthetic.
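
As a rough illustration of the Shadow Memory idea, here is a minimal sketch of a shared store that one agent writes a token into and another reads from. The class name, method names, and key names are assumptions made for this example, not Behemoth's actual code, and the sketch keeps state in memory only, so the persistence described above is left out.

```python
import threading
from typing import Any, Dict, Optional


class ShadowMemory:
    """Thread-safe key/value store shared by agents (hypothetical sketch)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Writes are serialized so concurrent agents never see a torn update.
        with self._lock:
            self._state[key] = value

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        with self._lock:
            return self._state.get(key, default)


memory = ShadowMemory()

# One agent stores a session token it obtained (placeholder value).
memory.put("session.jwt", "example.jwt.token")

# Another agent picks it up and builds its request headers from it.
token = memory.get("session.jwt")
headers = {"Authorization": f"Bearer {token}"}
```

A single lock-protected dictionary is the simplest design that lets agents running in separate threads share state safely; a persistent version would need a backing store on top of this.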

Challenges we ran into

The biggest hurdle was "Hallucination Control" during exploitation. We didn't want the AI to just guess endpoints. We solved this by creating Spirit-Eye, a logical parsing engine that forces the model to ground its attacks in the actual structure of the ingested OpenAPI specification.
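
The grounding step can be sketched as a check that rejects any model-proposed request whose method and path do not appear in the ingested specification. This is only an illustration of the idea under stated assumptions: the `SPEC` dictionary is a made-up, trimmed-down OpenAPI document, and `matches` and `is_grounded` are hypothetical helper names, not Spirit-Eye's real interface.

```python
# Hypothetical, heavily trimmed OpenAPI document used only for this example.
SPEC = {
    "paths": {
        "/api/users/{id}": {"get": {}, "put": {}},
        "/api/orders": {"get": {}, "post": {}},
    }
}


def matches(template: str, path: str) -> bool:
    """Return True if a concrete path fits an OpenAPI path template."""
    t_parts = template.strip("/").split("/")
    p_parts = path.strip("/").split("/")
    if len(t_parts) != len(p_parts):
        return False
    for t, p in zip(t_parts, p_parts):
        is_param = t.startswith("{") and t.endswith("}")
        # A parameter segment accepts any non-empty value;
        # a literal segment must match exactly.
        if is_param and p:
            continue
        if t != p:
            return False
    return True


def is_grounded(spec: dict, method: str, path: str) -> bool:
    """Accept a proposed request only if the spec declares it."""
    for template, operations in spec.get("paths", {}).items():
        if method.lower() in operations and matches(template, path):
            return True
    return False


# Proposals from the model are filtered before anything is sent.
proposals = [("GET", "/api/users/42"), ("GET", "/api/admin"), ("DELETE", "/api/users/42")]
grounded = [p for p in proposals if is_grounded(SPEC, *p)]
```

Here only the first proposal survives, because `/api/admin` is not in the specification and `DELETE` is not declared for `/api/users/{id}`.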

Accomplishments that we're proud of

We successfully demonstrated Behemoth executing a full Admin Bypass and Mass Data Exfiltration on the OWASP Juice Shop environment with zero human intervention. Seeing the "Berserker" harvest a token and instantly pass it to the "Alchemist" via Shadow Memory was a true "Aha!" moment.

What we learned

We learned that Multi-Agent Orchestration is the future of cybersecurity.

  • Scaling Logic: Moving from a small personal Flask project to a much larger audit target revealed the importance of "Cognitive Throttling."
  • Agent Efficiency: By giving models specific "personae" and restricted scopes, the overall accuracy of the vulnerability research increased by nearly 40% compared to a single-model prompt approach.
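
One way to picture "personae with restricted scopes" is a router that hands each task only to the agent whose scope covers it. The sketch below is an assumption-laden illustration: the task labels, prompt strings, and the `Persona` and `route` names are invented for this example, and only the agent names and their roles come from the description above.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Persona:
    """An agent role with a fixed prompt and a restricted set of tasks."""
    name: str
    system_prompt: str
    allowed_tasks: FrozenSet[str]

    def accepts(self, task: str) -> bool:
        return task in self.allowed_tasks


# Hypothetical task labels; the real framework may scope agents differently.
PERSONAE = [
    Persona("The Warlock", "You map API attack surfaces.", frozenset({"recon"})),
    Persona("The Paladin", "You write remediation reports.", frozenset({"report"})),
]


def route(task: str) -> Persona:
    """Pick the persona whose scope covers the task, or refuse."""
    for persona in PERSONAE:
        if persona.accepts(task):
            return persona
    raise ValueError(f"no persona is scoped for task: {task!r}")
```

Refusing out-of-scope tasks, rather than falling back to a general-purpose prompt, is what keeps each model focused on a narrow job.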

What's next for Behemoth

The future of Behemoth lies in Adaptive Evasion. We plan to implement a "Ghost Protocol" where the agents automatically modify their traffic signatures to bypass WAFs (Web Application Firewalls) in real-time based on the 403/429 error feedback they receive.

Built With

  • gemini-2.5-flash
  • gemini-3-flash
  • gemma-3
  • google-genai
  • multi-agent-systems
  • offensive-security
  • python
  • rich
  • typer