Inspiration
I built Striker to test an AI Transcript Advisor app that I created to help me understand which classes could transfer into UMD. If someone broke the app before I did, they might see past grades that I'm not so proud of... So I got overwhelmed manually finding and patching vulnerabilities, and that pushed me to think beyond normal QA and build something that could test software the way an attacker would.
What it does
Striker is an autonomous and locally run AI pen-testing agent. It explores the target software, identifies risky trust boundaries and sensitive actions, and systematically tests them like an attacker would. It adopts the logic of the Cyber Kill Chain and MITRE ATT&CK frameworks. The output shows what's broken so I can focus on fixing the most important weaknesses.
How we built it
We built Striker around a simple but hard idea: an AI agent should not just flag vague security issues, it should autonomously test the software to prove what would actually fail. That meant designing it to observe the target software, infer risky trust boundaries, generate attack paths, execute those tests systematically, and return undeniable proof. We grounded its behavior in the Cyber Kill Chain and MITRE ATT&CK so it would act with a purpose rather than being a random prompt wrapper.
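That observe → infer → plan → execute → prove loop can be sketched as a small pipeline. This is a minimal illustrative sketch, not Striker's actual code: every function, field, and endpoint name here is a hypothetical stand-in, and the "execution" step is simulated against a toy target description rather than real software.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an autonomous pen-testing loop:
# observe the target, infer risky trust boundaries, plan attack
# paths, execute them, and collect replayable proof.
# All names and data below are illustrative assumptions.

@dataclass
class Finding:
    boundary: str                     # the trust boundary that was probed
    technique: str                    # ATT&CK-style technique label
    evidence: str                     # concrete proof the test produced
    replay: list = field(default_factory=list)  # steps to reproduce

def observe(target):
    # In the real agent, an LLM would explore the software's surface;
    # here the target is just a static description.
    return target["endpoints"]

def infer_boundaries(endpoints):
    # Flag endpoints where untrusted input reaches a sensitive action.
    return [e for e in endpoints if e["accepts_user_input"] and e["sensitive"]]

def plan_attacks(boundary):
    # Map each boundary to candidate attack paths (one toy example here).
    return [{"technique": "input tampering", "payload": "' OR 1=1 --"}]

def execute(boundary, attack):
    # Run the test inside safe limits; this toy version just checks a flag.
    if not boundary["validates_input"]:
        return Finding(
            boundary=boundary["name"],
            technique=attack["technique"],
            evidence=f"payload {attack['payload']!r} accepted without validation",
            replay=[f"POST {boundary['name']}", f"body={attack['payload']!r}"],
        )
    return None

def run(target):
    findings = []
    for b in infer_boundaries(observe(target)):
        for attack in plan_attacks(b):
            result = execute(b, attack)
            if result:
                findings.append(result)
    return findings

# Toy target: one risky endpoint, one harmless one.
target = {"endpoints": [
    {"name": "/transcript/upload", "accepts_user_input": True,
     "sensitive": True, "validates_input": False},
    {"name": "/health", "accepts_user_input": False,
     "sensitive": False, "validates_input": True},
]}

findings = run(target)
```

The point of the structure is the last field: every finding carries evidence and replay steps, so the output is proof a developer can act on rather than a vague warning.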
Challenges we ran into
The biggest challenge was making Striker do meaningful adversarial testing instead of producing noise. We had to make the agent explore software, infer risky boundaries, generate attack paths, execute them systematically, and produce usable proof.
Another challenge was control: a system that behaves like an attacker is only useful if it stays inside safe boundaries and remains understandable to the developer using it.
The hard part was making it disciplined rather than simply aggressive.
Accomplishments that we're proud of
We are proud that Striker is not just lipstick on an API. It tackles a real technical problem: bringing red-team logic into autonomous QA for target software. Instead of static checks or shallow scanning, it acts like a controlled attacker and backs its findings with clear evidence.
We are also proud that the product stays focused on a clear workflow: run the agent locally, let it test the software, and review concrete findings with proof and replay steps.
What we learned
We learned that normal QA is not enough. Vibe-coded software can look polished but fail badly when someone tampers with inputs, abuses permissions, or pushes workflows outside their intended path.
We also learned that autonomy alone is not impressive unless it is structured, explainable, and useful.
The real value comes from combining autonomy with disciplined attack logic, safety boundaries, and actionable evidence.
Built With
- claude
- gemini