About the project

agent-harden started with a simple idea:

If CI can stop broken code from reaching production, it should also stop broken agents.

This began as two guys from Brisbane, an OpenClaw in a Discord server, and a shared sense that agent security was going to matter a lot more than most teams realized. We kept coming back to the same problem: teams already gate merges on tests, secrets, and dependencies, but almost nobody gates merges on whether an AI agent just became easier to jailbreak, socially engineer, or trick into leaking internal instructions.

So we built agent-harden to make that a normal GitLab CI check.

What inspired us

Most prompt-security tools feel static. Real attackers are not.

They adapt. They retry. They rephrase. They push on whatever almost worked.

We wanted a tool that did the same thing: attack a live agent, judge the response, mutate promising attacks into stronger ones, and fail the pipeline when the failures are real.

How we built it

We built agent-harden as a Go CLI that plugs directly into GitLab CI.

The flow is simple:

Seed corpus -> Run attacks -> Heuristic score -> LLM judge -> Mutate strong attacks -> Store variants -> Emit JUnit -> Pass/Fail CI

It attacks a live agent endpoint, filters responses with fast heuristics, escalates ambiguous cases to an LLM judge, evolves near-successful prompts into stronger attacks, stores the most effective variants for future runs, and emits JUnit so GitLab surfaces the results as native test failures.

We also added a GitLab Duo skill so installation itself can be automated inside a repo.

Challenges we faced

The hardest part was avoiding a fake-feeling demo.

We did not want a static checklist dressed up as agent security. We wanted something adaptive enough to feel real, but still fast and cheap enough to fit into CI.

The second challenge was compression: there is a lot going on under the hood, but judges do not have time for a wall of text or a long explainer. So we kept pushing the project toward one clear outcome:

an unsafe agent should fail CI before it ships.

What we learned

Three things stood out:

  • prompt security is much more useful when treated as a regression problem
  • adaptive attacks are far more interesting than fixed prompt lists
  • JUnit output and GitLab-native workflow matter more than fancy theory

That last point was important. The more agent-harden looked like a normal engineering control, the more obvious its value became.

Why this project matters

agent-harden is our attempt to make agent safety concrete.

Not a policy document. Not a one-off audit. A real CI gate.

Attack the agent. Learn what nearly broke it. Fail the pipeline when it matters. Help fix it before production.

That is the whole story.

Built With

  • anthropic
  • gitlab
  • golang
  • junit
  • vector-database
Share this project:

Updates