Inspiration

I've always believed that penetration testing is one of the most important parts of cybersecurity, but I've also seen how tedious it can be. Security teams spend hours manually running tools, reading output, and deciding what to test next. At the same time, AI has made "vibe coding" the norm: developers now ship entire applications in hours using AI-generated code.

The problem I noticed is that this speed often comes with hidden security issues. AI-written code usually works, but I've found it frequently includes insecure defaults, outdated dependencies, missing validation, or broken authentication. Most of these issues aren't obvious and are easy for me to miss when I'm moving fast.

I was inspired by AI coding agents that operate in a loop: read output, reason about it, take action, and repeat. That led me to a simple idea. If AI is writing the code, why can't AI also think like a hacker and secure it? I wanted to build an autonomous system that adapts in real time and fixes the problems introduced by AI-driven development.

What It Does

pentest.agent is a web-based AI-powered security assessment platform I built. You enter a target domain, launch the scan, and watch an AI agent perform a full penetration test on its own.

The agent I created runs reconnaissance, discovers the attack surface, and scans for vulnerabilities such as outdated services, misconfigurations, and common web flaws. What makes it different is that it adapts. If it finds an open port, it probes the service behind it. If it detects an outdated version, it searches for known vulnerabilities. If it finds a login page, it attempts common credential checks. It reasons about each result instead of following a fixed checklist.

The entire process streams live to a terminal-style interface I designed. A central animated blob visualizes the agent's state, and thought bubbles show the AI's reasoning as it works. When the scan finishes, my platform generates a clear report with severity levels, evidence, and remediation steps. If nothing is found, the scan ends with a visible all-clear state.

How I Built It

I built the backend with Python and FastAPI, running inside WSL to access a full Linux security toolchain. The AI agent I created runs as an asynchronous loop that sends context to the model, executes shell commands with root access, captures output, and feeds the results back into the next reasoning step. I stream the output to the browser in real time.
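The loop above can be sketched in a few lines of asyncio. This is a minimal illustration, not the project's actual code; `think` stands in for the model call, and the function names and truncation limit are my own assumptions.

```python
import asyncio


async def agent_loop(think, target, max_steps=5):
    """Illustrative agent loop: ask the model for the next command,
    run it in a shell, and feed the output back as context."""
    context = [f"Target: {target}"]
    transcript = []
    for _ in range(max_steps):
        # `think` is a placeholder for the model call; it returns the
        # next shell command, or None when the assessment is complete.
        decision = await think("\n".join(context))
        if decision is None:
            break
        proc = await asyncio.create_subprocess_shell(
            decision,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,
        )
        out, _ = await proc.communicate()
        # Truncate tool output so it fits in the model's context window.
        output = out.decode(errors="replace")[:4000]
        context.append(f"$ {decision}\n{output}")
        transcript.append((decision, output))
    return transcript
```

In a real system each step would also be streamed to the browser as it happens, rather than collected in a transcript.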

I built the frontend with React, TypeScript, and Tailwind CSS, focused on a clean terminal-first experience. The animated blob I designed is built entirely in CSS using layered animations with prime-number timing so the motion never visibly repeats.
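The prime-period trick is easy to verify: layered animations whose durations are distinct primes only realign after the least common multiple of those durations. The durations below are illustrative, not the actual values used in the CSS.

```python
from math import lcm

# Example animation durations in seconds (assumed primes, for illustration).
durations = [7, 11, 13]

# The layers only return to their starting alignment after the LCM,
# so with primes the full cycle is the product of the durations.
full_cycle = lcm(*durations)
print(full_cycle)  # 1001 seconds before the combined motion repeats
```

With non-prime durations like 6, 8, and 12, the cycle collapses to just 24 seconds, which is why primes keep the motion from visibly repeating.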

I powered the AI with a DigitalOcean GenAI agent exposed through an OpenAI-compatible API, backed by a custom knowledge base I created covering pentesting methodology, tools, and severity classification. I built a robust JSON parser to handle the wide variety of formats returned by different models, and I integrated over twenty industry-standard security tools into the system.
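Tolerant JSON extraction usually means stripping markdown fences and pulling the first balanced object out of surrounding prose. This is a simplified sketch of that idea, not the parser the project ships; the function name is mine.

```python
import json
import re


def parse_model_json(text):
    """Extract the first JSON object from messy model output:
    strip code fences, then scan for a balanced {...} span."""
    text = re.sub(r"```(?:json)?", "", text)  # drop markdown fences
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    for i, ch in enumerate(text[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:  # found the matching close brace
                try:
                    return json.loads(text[start:i + 1])
                except json.JSONDecodeError:
                    return None
    return None  # never balanced: give up
```

Note the brace counting here is naive about braces inside string values; a production parser would need to handle that, plus retries when the model returns no JSON at all.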

Challenges and Learnings

One major challenge I faced was silent AI failures caused by context overflow and content safety filters. I solved this by shrinking prompts, truncating output, reframing language, and adding retry logic. Parsing inconsistent AI output and preventing infinite command loops also required strong guardrails enforced in code.
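Two of those guardrails can be sketched concisely: a loop guard that refuses a command the agent keeps reissuing, and output truncation that keeps both the head and the tail of a tool's output. The function names and thresholds are illustrative assumptions, not the project's real values.

```python
def guard_command(cmd, history, max_repeats=2):
    """Reject a command the agent has already issued too many times,
    breaking infinite command loops (threshold is illustrative)."""
    if history.count(cmd) >= max_repeats:
        return False
    history.append(cmd)
    return True


def truncate_output(output, limit=4000):
    """Trim long tool output while keeping the start and end, so the
    model still sees both the banner and the final verdict."""
    if len(output) <= limit:
        return output
    half = limit // 2
    return output[:half] + "\n...[truncated]...\n" + output[-half:]
```

Keeping the tail matters because many scanners print their summary last; truncating only from the end would hide exactly the lines the agent needs to reason about.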

Through this process, I learned that autonomous AI systems need strict constraints to be reliable, that content safety failures are often invisible, and that prompt engineering is more about compression than verbosity.

What's Next

Next, I plan to add multi-target campaigns, human-in-the-loop collaboration, vulnerability database integration, configurable scan profiles, and professional report exports. Long term, I envision pentest.agent securing software built in the era of vibe coding by using AI to find and fix the problems AI introduces.
