Nyx Autonomous White Hat Hacker

Inspiration

Nyx was inspired by the DARPA Artificial Intelligence Cyber Challenge, which brought together cybersecurity experts from around the world to build AI systems that automatically secure critical software. In that spirit, we created an AI white hat hacker that can rapidly penetration-test a website and expose its vulnerabilities, including previously unknown (zero-day) ones.

What it does

Nyx takes a website and uses an AI agent to perform penetration testing that follows generally accepted white hat hacking principles. Nyx WILL NOT take down a production app, expose user data, or violate terms of service. Nyx WILL perform a full website analysis, check for common vulnerabilities and exposures (CVEs), and safely find new vulnerabilities. Once it finishes penetration testing, it returns a full security report that the user can use to patch their website.

How we built it

We used Next.js to handle the UI. The Next.js app communicates with our AI agent, built atop Openclaw, through Discord webhooks and Pusher Pub/Sub. Openclaw runs on Ubuntu and uses multiple cybersecurity tools: nuclei and nmap for general vulnerability scanning, and nikto for identifying outdated server software and configuration issues. Openclaw can also drive a browser to interact with the website and find more complex vulnerabilities.
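To make the scanning step more concrete, here is a minimal sketch of how a target URL might be turned into argument lists for the three scanners named above. The function and type names (`buildScanPlan`, `ScanCommand`) are hypothetical and do not come from the Nyx codebase, and the flags shown are only the most basic ones each tool accepts (`nmap -sV`, `nikto -h`, `nuclei -u`); the flags Nyx actually passes are not documented here.

```typescript
// Hypothetical sketch: these names are illustrative, not Nyx's actual code.
type ScanCommand = { tool: "nmap" | "nikto" | "nuclei"; args: string[] };

// Build one argument list per scanner for a given target.
// Returning argument arrays (rather than one shell string) means the target
// is never interpreted by a shell when the command is eventually spawned.
function buildScanPlan(targetUrl: string): ScanCommand[] {
  const url = new URL(targetUrl); // throws a TypeError on malformed input
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${url.protocol}`);
  }
  return [
    { tool: "nmap", args: ["-sV", url.hostname] }, // service/version detection
    { tool: "nikto", args: ["-h", url.origin] },   // web server and config checks
    { tool: "nuclei", args: ["-u", url.origin] },  // template-based vulnerability scan
  ];
}
```

Each entry could then be handed to something like Node's `child_process.spawn(tool, args)`; how the agent really invokes its tools is an assumption here.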

Challenges we ran into

At the start, we had difficulty setting up Openclaw and figuring out how to connect it to our frontend. We eventually settled on Discord webhooks and Pusher, which solved the problem. Our chosen LLM, Grok 4.20, initially refused to do penetration testing. We resolved this by narrowing the system prompt: it explicitly instructs the model to do no harm and excludes actions that could impact production stability.
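As a rough illustration of what a narrowed prompt could look like, the sketch below assembles a system prompt from an explicit list of out-of-scope actions. The wording, the `EngagementScope` type, and the `buildSystemPrompt` function are all hypothetical; Nyx's actual prompt is not reproduced here.

```typescript
// Hypothetical sketch: illustrative wording and names, not Nyx's real prompt.
interface EngagementScope {
  target: string;      // site the user asked Nyx to test
  forbidden: string[]; // action classes the agent must never attempt
}

// Turn the scope into a system prompt with one numbered rule per forbidden action.
function buildSystemPrompt(scope: EngagementScope): string {
  const rules = scope.forbidden.map((rule, i) => `${i + 1}. Do not ${rule}.`);
  return [
    `You are assisting with a penetration test of ${scope.target}.`,
    "Do no harm. The following actions are out of scope:",
    ...rules,
  ].join("\n");
}

const prompt = buildSystemPrompt({
  target: "https://example.com",
  forbidden: ["take down a production app", "expose user data", "violate terms of service"],
});
```

Keeping the exclusions as data rather than hard-coded prose makes it easier to review and tighten them per target.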

Accomplishments that we're proud of

We're proud that we were able to mostly achieve DARPA's goal of creating an autonomous AI system that helps developers identify and fix vulnerabilities, and, beyond that, to wrap the complexity of penetration testing in an elegant, easy-to-use interface.

What we learned

We learned a lot about cybersecurity. Not only did we learn about white hat methodology and ethical disclosure, we also had to learn what it means to write a secure app. We learned about common exploits like cross-site scripting, SQL injection, and broken access control. We tested our mettle against hackthissite.org and went back to the drawing board numerous times during development to ensure that Nyx could find all of that site's vulnerabilities. We've seen recently, with the LiteLLM attack and the Claude Code source code leak, that the speed of exploitation and human error can sometimes outpace the speed of manual patching. This is especially the case as developers become more dependent on AI coding tools and less personally involved with the minutiae of their codebase.

What's next for Nyx

We hope to continue improving its penetration testing capabilities so that it can find more complex exploits. We also hope to pair our scanner with a coding agent that automatically deploys the appropriate patches the moment a vulnerability is found. Our end goal is to have Nyx continuously crawl the internet, look for vulnerabilities, and automatically patch them.

Link to slides: https://www.figma.com/deck/QxEBvuXImccUYbuCvdN8aU/Nyx-Slides?node-id=6-30