Inspiration

Security testing is expensive, slow, and doesn't scale. Companies spend weeks hiring penetration testers to probe applications for vulnerabilities manually. I wanted to see if I could automate this using LLM agents, giving every development team access to autonomous security testing that runs 24/7. The idea: what if AI agents could think like hackers, adapt their strategies, and automatically validate their findings?

What it does

AgentRed is an autonomous security testing platform that deploys AI agents to find vulnerabilities in web applications. It runs three parallel attack lanes (SQL injection, privilege escalation, and WAF bypass), each powered by an LLM agent from AWS Bedrock. The agents can browse pages, submit forms, analyze responses, and adapt their attack strategies in real time.

How I built it

Built entirely on AWS serverless architecture:

AWS Bedrock powers the LLM agents (using Amazon Nova Pro)
Lambda functions execute the attack logic, scoring, and summarization
Step Functions orchestrates parallel attack lanes and manages workflow
API Gateway exposes trigger and status endpoints for the dashboard
S3 hosts the static dashboard and stores run artifacts
DynamoDB tracks lane state and run metadata
Parameter Store manages configuration for all lanes
VPC + EC2 runs the target application (DVWA)
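To make the orchestration piece concrete, here is a minimal sketch of how a Step Functions definition with one parallel branch per lane could be generated. The lane names, the Lambda ARN, and the `run_id` input field are placeholders I made up for illustration, not the actual AgentRed configuration, and the scoring and summarization steps are left out.

```python
import json

# Hypothetical lane names and Lambda ARN; the real values are not given in this write-up.
LANES = ["sqli", "privesc", "waf_bypass"]
LANE_WORKER_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:lane-worker"


def build_state_machine(lanes, worker_arn):
    """Return an Amazon States Language definition with one branch per lane."""
    branches = []
    for lane in lanes:
        state_name = f"Run_{lane}"
        branches.append({
            "StartAt": state_name,
            "States": {
                state_name: {
                    "Type": "Task",
                    "Resource": worker_arn,
                    # Pass the lane name plus the run id taken from the execution input.
                    "Parameters": {"lane": lane, "run_id.$": "$.run_id"},
                    "End": True,
                }
            },
        })
    return {
        "Comment": "Sketch: run all attack lanes in parallel",
        "StartAt": "AttackLanes",
        "States": {
            "AttackLanes": {"Type": "Parallel", "Branches": branches, "End": True}
        },
    }


definition = build_state_machine(LANES, LANE_WORKER_ARN)
print(json.dumps(definition, indent=2))
```

The printed JSON can be pasted into the Step Functions console editor, which fits the console-only deployment approach.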

The dashboard is pure HTML/CSS/JavaScript with no build toolchain. It polls the API every 5 seconds to show live attack progress. I wrote property-based tests using Hypothesis (Python) and fast-check (JavaScript) to validate correctness properties. Everything is deployed via AWS Console for maximum accessibility.

Challenges I ran into

Git and GitHub struggles: Pushing to GitHub was a nightmare. The repo kept rejecting pushes due to large files (lambda_package/ with 100+ MB of dependencies, node_modules/, zip files). Had to learn .gitignore patterns, remove files from git history, and eventually start fresh with a clean repo. Merge conflicts with README files added more complexity.

AWS Console deployment complexity: Configuring 50+ parameters in Parameter Store manually would have taken hours. Created an automated seed script to speed this up, but had to carefully coordinate the order (VPC first to get DVWA IP, then parameters, then Lambdas).
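A seed script along these lines might look roughly like the sketch below. The `/agentred` prefix, the parameter names, the values, and the example IP are all hypothetical. The write function is passed in so the script can be dry-run without AWS credentials. The dictionary keys match the keyword arguments of the boto3 SSM `put_parameter` call.

```python
def build_parameters(dvwa_ip, lanes, prefix="/agentred"):
    """Build keyword arguments for SSM put_parameter calls.

    The prefix, parameter names, and values are hypothetical placeholders.
    """
    params = [{"Name": f"{prefix}/target/base_url", "Value": f"http://{dvwa_ip}"}]
    for lane in lanes:
        params.append({"Name": f"{prefix}/lanes/{lane}/enabled", "Value": "true"})
        params.append({"Name": f"{prefix}/lanes/{lane}/max_steps", "Value": "10"})
    # Every parameter is a plain string and may overwrite an earlier seed.
    return [dict(p, Type="String", Overwrite=True) for p in params]


def seed(params, put_parameter):
    """Write each parameter using the supplied callable and return the count."""
    for kwargs in params:
        put_parameter(**kwargs)
    return len(params)


# Dry run: record the calls instead of sending them to AWS.
# The IP below is a made-up example; the real one is only known after the VPC/EC2 step.
recorded = []
params = build_parameters("10.0.1.25", ["sqli", "privesc", "waf_bypass"])
seed(params, lambda **kwargs: recorded.append(kwargs))
print(f"would write {len(recorded)} parameters")
```

For a real run, pass `boto3.client("ssm").put_parameter` as the second argument to `seed`. Taking the DVWA IP as an input makes the ordering constraint explicit: the script cannot run until the VPC step has produced that value.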

Lambda networking: Getting Lambda functions to communicate with the EC2 target in a VPC required understanding security groups, subnets, and VPC endpoints. The lane worker Lambda needed VPC access while others didn't.

CORS configuration: The dashboard couldn't talk to API Gateway initially. Had to configure CORS headers on both Lambda responses AND API Gateway routes, handling OPTIONS preflight requests correctly.
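As an illustration of the Lambda side of this, the sketch below attaches CORS headers to every response and answers OPTIONS preflight requests directly. The wildcard origin, the allowed methods, and the response body are assumptions for a demo setup, not the values AgentRed uses. The handler reads the method from either the REST API or the HTTP API proxy event shape, since the write-up does not say which one is in use.

```python
import json

# Assumption: a wildcard origin, which is only reasonable for a demo deployment.
ALLOWED_ORIGIN = "*"

CORS_HEADERS = {
    "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
    "Access-Control-Allow-Methods": "GET,POST,OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
}


def _method(event):
    """Read the HTTP method from a REST API (v1) or HTTP API (v2) proxy event."""
    if "httpMethod" in event:
        return event["httpMethod"]
    return event.get("requestContext", {}).get("http", {}).get("method", "")


def respond(status, body):
    """Build a proxy-integration response that always carries the CORS headers."""
    return {
        "statusCode": status,
        "headers": {**CORS_HEADERS, "Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def handler(event, context):
    # Answer the browser's preflight request without running any business logic.
    if _method(event) == "OPTIONS":
        return {"statusCode": 204, "headers": CORS_HEADERS, "body": ""}
    # Placeholder payload; a real status endpoint would look up the run state.
    return respond(200, {"status": "RUNNING"})
```

Routing every response through one helper like `respond` makes it harder to forget the headers on error paths, which is where a missing CORS header tends to show up in the browser as an opaque network failure.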

Property-based testing: Writing formal correctness properties for security exploits was challenging. How do you specify "this is a valid SQL injection"? I ended up using two proxies: reproducibility (does the exploit succeed in at least 80% of repeated attempts?) and evidence markers (does the response contain SQL error messages?).
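One way to combine the two checks is sketched below. The marker strings and the function names are illustrative assumptions, and treating "an attempt succeeded" as "its response contains a marker" is my simplification rather than the exact scoring logic.

```python
# Hypothetical markers; real ones depend on the target's database and error settings.
SQL_ERROR_MARKERS = (
    "you have an error in your sql syntax",
    "warning: mysql",
    "unclosed quotation mark",
)


def has_evidence(response_body):
    """True if the response contains any known SQL error marker (case-insensitive)."""
    text = response_body.lower()
    return any(marker in text for marker in SQL_ERROR_MARKERS)


def is_validated(responses, threshold=0.8):
    """True if the share of replayed attempts showing evidence meets the threshold.

    `responses` holds the response bodies from replaying the same payload.
    """
    if not responses:
        return False
    hits = sum(1 for body in responses if has_evidence(body))
    return hits / len(responses) >= threshold
```

A pure function like `is_validated` is also a convenient target for Hypothesis, for example by asserting that adding one more evidence-bearing response never turns a validated finding into an unvalidated one.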

Dashboard state management: Building a polling-based dashboard with proper error handling, retry limits, and terminal status detection required careful JavaScript state management without any frameworks.
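The transition logic behind such a poller is small enough to show as a pure function. The dashboard itself is JavaScript; the sketch below uses Python only for brevity. The retry limit of 3 and the set of terminal statuses (borrowed from Step Functions execution statuses) are assumptions, not the dashboard's actual values.

```python
# Assumption: the status endpoint echoes Step Functions execution statuses.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}
# Assumption: give up after this many failed polls in a row.
MAX_CONSECUTIVE_ERRORS = 3


def next_state(state, result):
    """Return the new poller state after one poll.

    `state` is {"status": str, "errors": int, "polling": bool}.
    `result` is the status string from the API, or None if the request failed.
    """
    if result is None:
        errors = state["errors"] + 1
        # Keep the last known status; stop polling once the retry limit is reached.
        return {**state, "errors": errors, "polling": errors < MAX_CONSECUTIVE_ERRORS}
    # A successful poll resets the error counter and stops on a terminal status.
    return {"status": result, "errors": 0, "polling": result not in TERMINAL_STATUSES}
```

Keeping the transition free of timers and network calls means the 5-second interval and the fetch logic can live in a thin wrapper, while the part that is easy to get wrong stays testable on its own.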

Accomplishments that I'm proud of

100% serverless - No servers to manage, scales automatically
Property-based testing - Formal correctness guarantees for a security tool
Console-deployable - Anyone can deploy this without CLI experience
Real-time dashboard - Live monitoring with no build toolchain required
Comprehensive documentation - 400+ line deployment guide with step-by-step instructions
Clean architecture - Separation of concerns between orchestration, execution, scoring, and summarization
Actually works - The system successfully orchestrates parallel attack lanes and reports results

Most importantly, I learned an incredible amount about AWS services, serverless architecture, LLM agents, and property-based testing in just a few days.

What we learned

Step Functions, Lambda, API Gateway, and Parameter Store work beautifully together, but the configuration surface area is huge. Understanding IAM roles, VPC networking, and service integrations took significant time.

What's next for AgentRed

More attack lanes: Add XSS, CSRF, authentication bypass, and API security testing
Smarter agents: Implement memory and learning so agents improve over time
CI/CD integration: Run AgentRed automatically on every deployment
Multi-target support: Test multiple applications in parallel
