Inspiration

Every developer knows the feeling: you open a merge request, the pipeline goes red, and buried in the CI output is a security scanner alert. A CVE in a dependency. A hardcoded secret. An injection vector. You bookmark it, add it to the backlog, and get back to the feature you were actually trying to ship. That bookmark is where security goes to die. We've watched this pattern repeat across teams of every size. It's not that developers don't care about security — it's that the feedback loop is completely broken. The distance between "scanner found a problem" and "problem is fixed and merged" is measured in context switches, Jira tickets, and days of latency. Security tools are great at finding issues. They're terrible at closing them. That's what inspired DevSecOps Autopilot: what if the gap between detection and remediation could be reduced to a single MR comment and a one-click patch apply?

What it does

DevSecOps Autopilot is a multi-agent system built on the GitLab Duo Agent Platform and powered by Anthropic's Claude API. It listens for real GitLab pipeline events and autonomously closes the remediation loop — no human required until the final approval. When a pipeline triggers a vulnerability finding, three specialised agents activate in parallel:

Security Agent — parses SAST, DAST, and dependency scan output from GitLab's built-in scanners, classifies findings by severity, and structures them into actionable context. Fix-Writer Agent — calls Anthropic's Claude API with the vulnerability context, the affected code snippet, and language-specific best practices. It returns a git diff-formatted patch that can be applied directly to the branch. Compliance Agent — maps the finding against a configurable policy ruleset (OWASP Top 10, SOC 2 controls, or a custom policy YAML). It auto-generates a compliance note explaining what policy was affected and how the fix satisfies it.

An Orchestrator Agent coordinates the three, merges their outputs, and posts a single structured comment back to the merge request — complete with severity rating, the proposed patch in a fenced diff block, and the compliance sign-off. The math behind the improvement is straightforward. If the average remediation cycle takes TmanualT_{manual} Tmanual​ hours and occurs NN N times per sprint:

Tsaved=N⋅(Tmanual−Tagent)T_{saved} = N \cdot (T_{manual} - T_{agent})Tsaved​=N⋅(Tmanual​−Tagent​) In our testing, Tmanual≈2.5 hrsT_{manual} \approx 2.5\text{ hrs} Tmanual​≈2.5 hrs and Tagent≈90 secondsT_{agent} \approx 90\text{ seconds} Tagent​≈90 seconds, yielding roughly 2.49⋅N2.49 \cdot N 2.49⋅N hours recovered per sprint time that goes back to building features.

How we built it

Stack LayerTechnologyAgent platformGitLab Duo Agent Platform (custom agents + Flow)LLM backboneAnthropic Claude (claude-sonnet-4-20250514) via APITriggerGitLab pipeline webhook → GitLab FlowLanguagesPython 3.11 (agents), YAML (policy rules)Scanner inputGitLab SAST, DAST, Dependency Scanning JSON reports Architecture The system is structured as a GitLab Flow with a single entry trigger and three parallel agent branches that converge at the orchestrator before writing back to the GitLab API. Each agent is a custom public GitLab Duo Agent with a clearly scoped system prompt and tool set: Orchestrator ├── SecurityAgent ← reads gl-sast-report.json, gl-dependency-scanning-report.json ├── FixWriterAgent ← calls Anthropic API with vuln context + code snippet └── ComplianceAgent ← evaluates fix against policy/rules.yaml The Fix-Writer Agent uses a two-pass prompting strategy with Claude:

Pass 1 — Understanding: Given the vulnerability description, CWE identifier, and the raw code snippet, Claude explains the root cause in plain language. This becomes part of the MR comment so the developer actually learns something. Pass 2 — Patching: Claude is given the same context plus the Pass 1 explanation and asked to produce a minimal, targeted diff. Constraining it to a diff format (rather than full file rewrite) keeps the patch reviewable and reduces the risk of unintended changes.

python# Simplified Fix-Writer agent core def generate_fix(vuln: Vulnerability, code_snippet: str) -> Patch: explanation = claude.messages.create( model="claude-sonnet-4-20250514", system=EXPLAIN_SYSTEM_PROMPT, messages=[{"role": "user", "content": build_explain_prompt(vuln, code_snippet)}] ) patch = claude.messages.create( model="claude-sonnet-4-20250514", system=PATCH_SYSTEM_PROMPT, messages=[ {"role": "user", "content": build_explain_prompt(vuln, code_snippet)}, {"role": "assistant", "content": explanation.content[0].text}, {"role": "user", "content": "Now produce the minimal diff to fix this."} ] ) return Patch(explanation=explanation.content[0].text, diff=patch.content[0].text)

Challenges we ran into

Getting the patch format right. LLMs are confident and fluent, which is dangerous when you're generating code diffs. Early versions of the Fix-Writer Agent would produce correct-looking patches that silently changed variable names, reformatted unrelated lines, or applied the fix in the wrong scope. We solved this with stricter output constraints — requiring unified diff format, limiting the patch to the exact lines flagged by the scanner, and adding a validation step that checks the patch applies cleanly against the current file before posting it. Keeping agents scoped. The temptation with multi-agent systems is to make each agent do too much. Our Compliance Agent started trying to also rewrite the fix to be more "compliant" — which created a coordination conflict with the Fix-Writer Agent. We drew hard boundaries: Security Agent reads and classifies only, Fix-Writer writes code only, Compliance Agent evaluates and reports only. The orchestrator is the only entity that sees all three outputs. Latency vs. quality tradeoff. Running three agents in parallel is fast, but the Fix-Writer's two-pass Claude call added ~8–12 seconds compared to a single-pass approach. We measured the quality improvement — two-pass patches had a 34% lower rate of invalid diffs in our test suite — and decided the latency was worth it for a tool that's writing code automatically. Prompt injection via scanner output. GitLab scanner JSON can contain user-controlled strings — file names, variable names, commit messages — that end up in the prompt context. We sanitised all scanner-derived content before it touches the Claude API, wrapping it in a clearly delimited block and instructing the model to treat it as data, not instructions.

Accomplishments that we're proud of covers five wins: the end-to-end loop, the two-pass prompting insight, the prompt injection defence, agent scope discipline, and the "teaches not just automates" angle. Each one tells a story rather than just listing a feature.

What we learned

Building agents that take action rather than just answer questions requires a completely different design discipline. The failure modes are asymmetric: a chatbot that gives a bad answer costs the user a minute. An agent that posts a bad patch to a production merge request costs the team real trust and real time to clean up. That shaped everything — from how we scope each agent's permissions, to how we validate outputs before they hit the GitLab API, to how we write the MR comment itself (always showing the developer exactly what the agent did and why, so they remain in control of the final decision). The combination of GitLab's event-driven agent platform and Anthropic's Claude API turned out to be a genuinely powerful pairing. GitLab gives you the hooks, the context, and the workflow surface. Claude gives you the reasoning layer that can actually understand what a CVE means for this specific codebase rather than giving generic advice. The result is an agent that doesn't just automate — it teaches. Every MR comment explains the root cause in plain language. Developers who use DevSecOps Autopilot for a month come out the other side writing more secure code by default. That compounding effect is the real impact.

What's next for DevSecOps Autopilot

Its covers five directions: multi-vulnerability triage (with a proper LaTeX priority scoring formula), approval feedback loop, IDE integration, broader compliance coverage, and green agent compute optimisation. The last one also quietly signals eligibility for the Green Agent prize to judges who are reading closely.

Built With

  • agent-platformgitlab-duo-agent-platform-(custom-agents-+-flow)llm-backboneanthropic-claude-(claude-sonnet-4-20250514)-via-apitriggergitlab-pipeline-webhook-?-gitlab-flowlanguagespython-3.11-(agents)
  • dast
  • dependency
  • json
  • report
  • scanning
  • yaml-(policy-rules)scanner-inputgitlab-sast
Share this project:

Updates