Inspiration

In the high-stakes world of Capture The Flag (CTF) competitions, the bottleneck is rarely the tools; it is the human ability to synthesize vast amounts of data under pressure. We built PwnGPT to answer a simple but ambitious question: could we build an agent that "thinks" like a top-tier security researcher? We wanted a system that doesn't just suggest code but actually captures flags across all categories, from Reverse Engineering to Web Exploitation, autonomously and safely.

What it does

PwnGPT is an agentic AI assistant designed to solve CTF challenges autonomously. It runs a ReAct (reason and act) loop to observe, reason, and act within a secure environment. The system features a "Thinking Console" that visualizes its logic in real time, a "Web-Eye" browser integration for visual vulnerability analysis, and an Expert Panel of specialized sub-agents that debate strategies before execution. Upon finding a flag, it automatically generates a professional, branded PDF write-up.

How we built it

PwnGPT was built by a team of three: Abdelali Saadali, Abdelbarie Rhayour, and Kawtar Khallouf.

We engineered PwnGPT as a multi-layered agentic system:

The Brain: We used Gemini 3 Flash for its fast reasoning and its 1.05M-token context window, which lets the agent keep every failed attempt from a session in context and "remember" it.

The Logic: We used LangGraph to build a stateful ReAct Loop, ensuring the agent could self-correct when a tool failed.

The Sandbox: Every command runs in an isolated Kali Linux Docker container governed by our custom Guardian Protocol.

The Frontend: We used Streamlit to create a real-time "Neural Link" dashboard for monitoring and approving risky actions.
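
The loop described under "The Logic" can be sketched roughly as follows. This is a minimal illustration in plain Python rather than LangGraph, and it is not PwnGPT's actual code: `AgentState`, `react_loop`, the scripted policy, and the two toy tools are all hypothetical names, and the `flag{` prefix check is an assumed flag format. The point it shows is that a tool failure becomes an observation the next reasoning step can react to, instead of ending the run.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AgentState:
    """State carried between steps (hypothetical structure)."""
    goal: str
    history: list = field(default_factory=list)
    flag: Optional[str] = None


def react_loop(state: AgentState,
               decide: Callable[[AgentState], tuple],
               tools: dict,
               max_steps: int = 10) -> AgentState:
    """Reason, act, observe; record failures so the policy can self-correct."""
    for _ in range(max_steps):
        tool_name, arg = decide(state)            # reason: pick the next action
        try:
            observation = tools[tool_name](arg)   # act: run the chosen tool
        except Exception as exc:                  # a failure is just another observation
            observation = f"ERROR from {tool_name}: {exc}"
        state.history.append(f"{tool_name}({arg!r}) -> {observation}")
        if observation.startswith("flag{"):       # assumed flag format
            state.flag = observation
            break
    return state


# Toy stand-ins for an LLM policy and sandboxed tools (illustrative only).
def scripted_policy(state: AgentState) -> tuple:
    if any("ERROR" in entry for entry in state.history):
        return "strings", "chall.bin"             # change strategy after a failure
    return "file", "missing.bin"


def file_tool(path: str) -> str:
    raise FileNotFoundError(path)


def strings_tool(path: str) -> str:
    return "flag{demo}"


result = react_loop(AgentState(goal="find the flag"),
                    scripted_policy,
                    {"file": file_tool, "strings": strings_tool})
```

In a real agent the policy would be a model call that receives the full history, and each tool would run inside the sandbox.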

Challenges we ran into

The primary hurdle was staying within free-tier API quotas (HTTP 429 errors) while running a high-frequency agentic loop. We overcame this by implementing exponential backoff and by mocking sub-agent responses in complex demos to conserve quota. We also spent significant time fine-tuning the Guardian Protocol so that the AI could be aggressive in its hacking while remaining strictly confined within its read-only Docker sandbox.
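
A minimal sketch of exponential backoff for quota errors, assuming the API client signals HTTP 429 with an exception. `RateLimitError` and `call_with_backoff` are hypothetical names, and the delay values are illustrative defaults rather than the settings PwnGPT uses.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever the API client raises on HTTP 429."""


def call_with_backoff(request: Callable[[], T],
                      max_retries: int = 5,
                      base_delay: float = 1.0,
                      max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `request` on rate limits, doubling the wait after each failure."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitError:
            if attempt == max_retries:
                raise                                  # out of retries: surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            jitter = random.uniform(0, delay * 0.1)    # spread out simultaneous retries
            sleep(delay + jitter)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function easy to test, and capping the delay at `max_delay` stops the wait from growing without bound during a long session.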

Accomplishments that we're proud of

We are particularly proud of our Expert Panel logic, which allows multiple sub-agents to debate strategies in parallel. Successfully integrating native multimodality, where Gemini "sees" a website screenshot captured with Playwright and identifies a vulnerability, was a major technical milestone. Additionally, generating a professional PDF report automatically after each captured flag turns raw data into a valuable educational resource.
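
One simple way to picture the Expert Panel is a set of sub-agents that each propose a strategy concurrently, followed by a vote. The sketch below is a deliberate simplification: it runs a single round with a majority vote instead of a multi-turn debate, and the expert functions are hypothetical stand-ins for model calls. `run_panel` and the expert names are illustrative, not PwnGPT's API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_panel(experts: dict, challenge: str) -> tuple:
    """Ask every expert in parallel, then pick the most common proposal."""
    if not experts:
        raise ValueError("the panel needs at least one expert")
    with ThreadPoolExecutor(max_workers=len(experts)) as pool:
        futures = {name: pool.submit(fn, challenge) for name, fn in experts.items()}
        proposals = {name: future.result() for name, future in futures.items()}
    votes = Counter(proposals.values())
    winner, _count = votes.most_common(1)[0]   # ties go to the first proposal seen
    return winner, proposals


# Hypothetical experts; in practice each would be a separately prompted model call.
def web_expert(challenge: str) -> str:
    return "sql injection"


def recon_expert(challenge: str) -> str:
    return "sql injection"


def pwn_expert(challenge: str) -> str:
    return "buffer overflow"


strategy, all_proposals = run_panel(
    {"web": web_expert, "recon": recon_expert, "pwn": pwn_expert},
    "login form returns a database error on a quote character",
)
```

Threads suit this shape of work because each expert mostly waits on network I/O; keeping `all_proposals` lets a dashboard show dissenting opinions alongside the chosen strategy.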

What we learned

This project was a deep dive into agentic AI architecture. We learned that while a single LLM can hallucinate, a multi-agent consensus model significantly improves accuracy on technical tasks. We also learned the trade-offs between RAG and long context, ultimately choosing a hybrid approach that relies on Gemini's large context window for session history and on keyword-based RAG for a lightweight knowledge base.
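
The hybrid approach can be illustrated with a small sketch: the full session history goes into the prompt unchanged, while knowledge-base notes are selected by keyword overlap. Everything here is assumed for illustration, including the `NOTES` contents, the scoring rule (count of shared words), and the function names `retrieve` and `build_prompt`.

```python
import re

# Hypothetical knowledge base: title -> short note.
NOTES = {
    "sql injection": "Probe login forms and query parameters with quote characters.",
    "buffer overflow": "Check binaries for unsafe copies into fixed-size stack buffers.",
    "format string": "Look for printf-style calls that take user input as the format.",
}


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, notes: dict, top_k: int = 2) -> list:
    """Return up to top_k note titles ranked by words shared with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(f"{title} {body}")), title)
              for title, body in notes.items()]
    matches = [item for item in scored if item[0] > 0]   # drop notes with no overlap
    matches.sort(key=lambda item: (-item[0], item[1]))    # best score first, then title
    return [title for _score, title in matches[:top_k]]


def build_prompt(query: str, history: list, notes: dict) -> str:
    """Long context for history, keyword retrieval for reference notes."""
    selected = retrieve(query, notes)
    note_lines = [f"- {title}: {notes[title]}" for title in selected]
    return "\n".join(["History:", *history, "Notes:", *note_lines, "Task:", query])


prompt = build_prompt("login form with a sql error",
                      ["attempt 1: directory scan found /login"],
                      NOTES)
```

Keyword matching needs no embedding model or vector store, which keeps the knowledge base lightweight; the cost is that it misses synonyms and inflected forms ("form" does not match "forms" here).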

What's next for PwnGPT

Our roadmap includes Dynamic Tool Injection, allowing the agent to install niche tools on the fly, and Collaborative Mode, where multiple humans can work alongside the AI in a shared session. We also aim to extend PwnGPT's capabilities into Autonomous Pentesting for real-world authorized security audits and deeper integration with tools like Ghidra for advanced reverse engineering.

Built With

  • docker
  • google-gemini
  • langchain
  • langgraph
  • streamlit