What is OpenZerg?

OpenZerg is a red-teaming framework that attacks your own web app before anyone else does. Not with a checklist but with a swarm that learns. Each generation of attackers reads what the last one found, mutates, and hits harder. It stops when it finds a breach or runs out of angles. Either way, you know the truth before you ship.

Inspiration

Security testing hasn't kept up with how fast software ships. Teams run the same probes, get a green light, and deploy. But those probes were written by humans who anticipated threats in advance. Real attackers don't work from your checklist: they probe, they learn what almost worked, and they come back with something you didn't anticipate.

We wanted to build that. An attacker that runs automatically, gets smarter with every attempt, and blocks the deploy the moment it finds a crack. No human red team required. Also, it's a StarCraft reference, if you're old enough to get it, haha.

How We Built It

Each attacker is a Go-spawned agent running inside a real Kubernetes pod on DigitalOcean. It targets a live web application — OWASP Juice Shop — across five attack categories: SQL injection, XSS, JWT tampering, broken object-level authorization, and path traversal.

The key thing that makes it work: agents use Nimble to actually navigate the app. Juice Shop is an Angular SPA, so static curl requests miss half the attack surface. Nimble renders the DOM, handles sessions, and runs the full browser flow. Our agents see what a real attacker sees.

Between generations, Gemma 4 reads the survivors' findings and writes the next wave's attack instructions. It doesn't do this randomly: it reasons about what got close and what the logical next move is. If a tautology SQL injection returned a syntax error, Gemma tries a UNION SELECT. If a JWT was rejected with a 401, Gemma tries alg:none. Generation 2 is smarter than generation 1 because it had to be.
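To make the hand-off between generations concrete, here is a minimal sketch of how the survivors' findings could be turned into a prompt for the next wave. It is an illustration, not our actual code: the `Finding` type, its fields, `TopFindings`, `BuildPrompt`, and the prompt wording are all hypothetical names chosen for this example, and the call to the model itself is left out.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Finding is a hypothetical record of one probe from the previous generation.
type Finding struct {
	Category string  // e.g. "sqli" or "jwt"
	Payload  string  // what the agent sent
	Response string  // what the target returned
	Fitness  float64 // partial-credit score in [0, 1]
}

// TopFindings returns up to n findings, highest fitness first.
// The input slice is copied so the caller's ordering is left untouched.
func TopFindings(findings []Finding, n int) []Finding {
	sorted := append([]Finding(nil), findings...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Fitness > sorted[j].Fitness
	})
	if n < 0 {
		n = 0
	}
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

// BuildPrompt lists the closest attempts and asks for one specific
// follow-up per attempt, rather than a generic "try harder".
func BuildPrompt(findings []Finding, n int) string {
	var b strings.Builder
	b.WriteString("These probes came closest in the last generation:\n")
	for _, f := range TopFindings(findings, n) {
		fmt.Fprintf(&b, "- [%s] fitness=%.2f payload=%q response=%q\n",
			f.Category, f.Fitness, f.Payload, f.Response)
	}
	b.WriteString("For each one, say why it fell short and propose one specific next payload.\n")
	return b.String()
}

func main() {
	// Example data only; these are not real results.
	prev := []Finding{
		{Category: "sqli", Payload: "' OR 1=1--", Response: "SQL syntax error", Fitness: 0.6},
		{Category: "xss", Payload: "<script>alert(1)</script>", Response: "input escaped", Fitness: 0.1},
	}
	fmt.Print(BuildPrompt(prev, 1))
}
```

Sorting by fitness before building the prompt is the design choice that matters here: the model only sees the attempts that got closest, along with the exact response each one triggered.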

The fitness model is what makes the evolution loop work. A probe that got a SQL syntax error scores 0.6, not zero. That partial signal is what Gemma needs to know where to push harder. Binary pass/fail kills the signal. Partial credit keeps it alive.
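A partial-credit scorer can be sketched in a few lines. Only the 0.6 for a SQL syntax error comes from our setup as described above; the `Outcome` type, the other outcome names, and their scores are assumptions made up for this example.

```go
package main

import "fmt"

// Outcome is a hypothetical classification of how far a probe got.
type Outcome int

const (
	NoSignal       Outcome = iota // target ignored or cleanly rejected the probe
	InputReflected                // payload visibly influenced the response
	BackendError                  // e.g. a SQL syntax error leaked back
	Breach                        // exploit confirmed
)

// Score maps an outcome to a fitness value in [0, 1].
// Near misses keep a nonzero score so the next generation
// can tell which directions are worth pushing on.
func Score(o Outcome) float64 {
	switch o {
	case Breach:
		return 1.0 // assumed value
	case BackendError:
		return 0.6 // the value mentioned in the writeup
	case InputReflected:
		return 0.3 // assumed value
	default:
		return 0.0
	}
}

func main() {
	for _, o := range []Outcome{NoSignal, InputReflected, BackendError, Breach} {
		fmt.Printf("outcome=%d fitness=%.1f\n", o, Score(o))
	}
}
```

With a binary scorer, `InputReflected` and `BackendError` would both collapse to zero and be indistinguishable from `NoSignal`.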

If any generation crosses a fitness threshold of 0.80, the deployment gate closes automatically. No approval flow. No ticket. The swarm found a crack and the deploy doesn't ship.
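The gate check itself is simple. The sketch below assumes a generation's fitness is the best score any of its probes reached, and that reaching exactly 0.80 counts as crossing the threshold; both are assumptions for illustration, as are the names `GateThreshold` and `GateClosed`.

```go
package main

import "fmt"

// GateThreshold is the fitness at which the deploy is blocked.
const GateThreshold = 0.80

// GateClosed reports whether any probe in a generation reached the threshold.
// Assumption: one probe at or above the threshold is enough to block.
func GateClosed(scores []float64) bool {
	for _, s := range scores {
		if s >= GateThreshold {
			return true
		}
	}
	return false
}

func main() {
	// Example scores only; these are not real results.
	gen1 := []float64{0.0, 0.3, 0.6}
	gen2 := []float64{0.6, 0.85}
	fmt.Println(GateClosed(gen1), GateClosed(gen2)) // prints "false true"
}
```

In a pipeline, a closed gate would typically translate into a nonzero exit code so the deploy step never runs.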

Challenges

Getting agents to behave reliably under a hard 60-second pod deadline was the hardest part, specifically parsing their output cleanly enough to score it and feed it into the next generation in real time.
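One way to handle both halves of that problem, the deadline and the parsing, is sketched below. It assumes agents emit a single JSON object; the `Report` shape, `ParseReport`, and `RunWithDeadline` are hypothetical names for this example, and the agent here is a stand-in function rather than a real pod.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// Report is a hypothetical shape for one agent's output.
type Report struct {
	Category string `json:"category"`
	Payload  string `json:"payload"`
	Response string `json:"response"`
}

// ParseReport decodes raw agent output and rejects anything that
// cannot be scored, so bad output never reaches the next generation.
func ParseReport(raw []byte) (Report, error) {
	var r Report
	if err := json.Unmarshal(raw, &r); err != nil {
		return Report{}, fmt.Errorf("malformed agent output: %w", err)
	}
	if r.Category == "" {
		return Report{}, errors.New("agent output missing category")
	}
	return r, nil
}

// RunWithDeadline runs an agent and gives up once the deadline passes.
func RunWithDeadline(parent context.Context, d time.Duration,
	agent func(context.Context) ([]byte, error)) (Report, error) {
	ctx, cancel := context.WithTimeout(parent, d)
	defer cancel()

	type result struct {
		raw []byte
		err error
	}
	done := make(chan result, 1) // buffered so a late agent does not block forever
	go func() {
		raw, err := agent(ctx)
		done <- result{raw, err}
	}()

	select {
	case <-ctx.Done():
		return Report{}, ctx.Err()
	case res := <-done:
		if res.err != nil {
			return Report{}, res.err
		}
		return ParseReport(res.raw)
	}
}

func main() {
	// Stand-in agent that returns canned output immediately.
	fake := func(ctx context.Context) ([]byte, error) {
		return []byte(`{"category":"sqli","payload":"' OR 1=1--","response":"SQL syntax error"}`), nil
	}
	report, err := RunWithDeadline(context.Background(), 60*time.Second, fake)
	fmt.Println(report.Category, err)
}
```

Treating unparseable output as an error, rather than scoring it as zero, keeps noise out of the findings the next generation reads.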

Writing prompts that made generation 2 actually escalate rather than repeat generation 1 took more iteration than expected. The difference between "try harder" and "here's what almost worked, now try this specific angle" is everything.

What We Learned

The insight that unlocked the whole system: partial credit. If you score attacks as pass/fail, you lose all signal about what's worth evolving. A probe that found the door but couldn't open it is more valuable than one that never found the door. That gap is exactly what Gemma needs to reason about. Once we had that, the evolution loop actually worked.
