Inspiration
Traditional API scanners are loud, dumb, and easily blocked. We wanted to build something that thinks like a human pentester: an entity that doesn't just "fuzz," but understands the logical flow of a system. Behemoth was born from the idea of "Spirit Circuits": delegating complex hacking tasks to a specialized hierarchy of AI agents to achieve autonomous, deep-reasoning exploitation.
What it does
Behemoth is an autonomous offensive security framework. It maps an API attack surface using The Warlock, identifies logical relationships between data with Spirit-Eye, executes complex BOLA and Injection attacks via The Berserker and The Alchemist, and finally generates professional-grade remediation reports using The Paladin. It doesn't just find bugs; it understands the "why" and "how" of a vulnerability.
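As a rough illustration of that hand-off order, here is a minimal sketch of a sequential pipeline. The agent names come from the description above; the `Context` class, the function signatures, and the placeholder values are assumptions made for this example, and each stage is a stub that does no real scanning or exploitation.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical shared context handed from one agent to the next."""
    endpoints: list[str] = field(default_factory=list)
    relationships: list[str] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)
    report: str = ""


# Each stage below is a stub standing in for an agent; the values are placeholders.
def warlock(ctx: Context) -> None:
    ctx.endpoints.append("GET /users/{id}")  # attack-surface mapping


def spirit_eye(ctx: Context) -> None:
    ctx.relationships.append("users.id -> orders.user_id")  # data relationships


def berserker(ctx: Context) -> None:
    ctx.findings.append("BOLA candidate (placeholder)")


def alchemist(ctx: Context) -> None:
    ctx.findings.append("injection candidate (placeholder)")


def paladin(ctx: Context) -> None:
    ctx.report = (
        f"{len(ctx.findings)} finding(s) across {len(ctx.endpoints)} endpoint(s)"
    )


PIPELINE = [warlock, spirit_eye, berserker, alchemist, paladin]

ctx = Context()
for stage in PIPELINE:
    stage(ctx)  # every agent reads and extends the same context

print(ctx.report)
```

Keeping the stages as plain callables over one shared context makes the order explicit and lets a stage be swapped or tested in isolation.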
How we built it
The core engine is built in Python using the Google Generative AI SDK.
- The Intelligence: We used Gemma 3 for rapid recon and Gemini 3 Flash for its large context window and reasoning capabilities.
- The Evolution: Development began with local testing against a custom Flask-based project. This "sandbox" environment allowed us to tune the agents' ability to detect subtle logic flaws in a controlled setting before scaling the framework to larger, more complex targets like the OWASP Juice Shop.
- The Architecture: We engineered a persistent Shadow Memory layer that allows agents to share session state (like JWTs) in real time.
- The Interface: A sleek CLI built with Typer and Rich for a premium hacker aesthetic.
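To make the Shadow Memory idea concrete, here is a minimal sketch of a shared store that one agent writes a token into and another reads from. The `ShadowMemory` class, its `put`/`get` methods, and the `session.jwt` key are hypothetical names chosen for this example, not Behemoth's actual interface, and the sketch keeps state in process memory only, so the persistence described above is left out.

```python
import threading
from typing import Any, Optional


class ShadowMemory:
    """Hypothetical thread-safe key-value store shared by all agents."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # The lock keeps concurrent agents from seeing a half-written update.
        with self._lock:
            self._state[key] = value

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        with self._lock:
            return self._state.get(key, default)


memory = ShadowMemory()

# One agent stores a session token (dummy value here)...
memory.put("session.jwt", "example.jwt.token")

# ...and another agent builds its request headers from the shared state.
auth_header = {"Authorization": f"Bearer {memory.get('session.jwt')}"}
```

A single lock around a dictionary is the simplest option that stays correct when agents run in separate threads; a persistent version would swap the dictionary for a file or database behind the same two methods.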
Challenges we ran into
The biggest hurdle was "Hallucination Control" during exploitation. We didn't want the AI to just guess endpoints. We solved this by creating Spirit-Eye, a logical parsing engine that forces the model to ground its attacks in the actual structure of the ingested OpenAPI specification.
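One way to picture this grounding step is a filter that rejects any model-proposed request whose method and path are not declared in the ingested specification. The sketch below assumes the spec is already parsed into a dictionary and that proposals use the spec's templated path form; `allowed_operations`, `is_grounded`, and the sample spec are illustrative names and data, not Spirit-Eye's real code.

```python
from typing import Any

# HTTP method keys that OpenAPI allows inside a path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def allowed_operations(spec: dict[str, Any]) -> set[tuple[str, str]]:
    """Collect every (METHOD, path) pair declared under the spec's `paths`."""
    ops: set[tuple[str, str]] = set()
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Skip non-method keys such as "parameters" or "summary".
            if key.lower() in HTTP_METHODS:
                ops.add((key.upper(), path))
    return ops


def is_grounded(method: str, path: str, spec: dict[str, Any]) -> bool:
    """Return True only if the proposed request exists in the specification."""
    return (method.upper(), path) in allowed_operations(spec)


# Tiny sample spec for illustration only.
spec = {
    "openapi": "3.0.0",
    "paths": {
        "/users/{id}": {"get": {}, "parameters": []},
        "/login": {"post": {}},
    },
}

# Pretend these came from the model: the second one is a guessed endpoint.
proposals = [("GET", "/users/{id}"), ("DELETE", "/admin/secrets")]
grounded = [p for p in proposals if is_grounded(*p, spec)]
```

Here only the first proposal survives, because `/admin/secrets` never appears in the spec. Matching on the templated path keeps the check a plain set lookup; matching concrete URLs such as `/users/42` would need an extra template-matching step.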
Accomplishments that we're proud of
We successfully demonstrated Behemoth executing a full Admin Bypass and Mass Data Exfiltration on the OWASP Juice Shop environment with zero human intervention. Seeing the "Berserker" harvest a token and instantly pass it to the "Alchemist" via Shadow Memory was a true "Aha!" moment.
What we learned
We learned that Multi-Agent Orchestration is the future of cybersecurity.
- Scaling Logic: Moving from a small-scale personal Flask project to a global-level audit revealed the importance of "Cognitive Throttling."
- Agent Efficiency: By giving models specific "personae" and restricted scopes, the overall accuracy of the vulnerability research increased by nearly 40% compared to a single-model prompt approach.
What's next for Behemoth
The future of Behemoth lies in Adaptive Evasion. We plan to implement a "Ghost Protocol" in which the agents automatically modify their traffic signatures in real time to bypass WAFs (Web Application Firewalls), based on the 403 and 429 responses they receive.
Built With
- gemini-2.5-flash
- gemini-3-flash
- gemma-3
- google-genai
- multi-agent-systems
- offensive-security
- python
- rich
- typer
