Inspiration

Every development team knows the pain: a security scan flags vulnerabilities in a merge request, and then begins the slow, manual grind — triaging findings, writing patches, adding tests, documenting compliance, updating the MR. It's disconnected from the developer workflow, it's tedious, and it's the reason vulnerabilities slip into production. Not because teams don't care, but because the remediation process is slow, manual, and fragmented.

We asked ourselves: What if a single @mention on a merge request could trigger an entire security remediation pipeline — autonomously? Not a chatbot that gives advice. Not a tool that highlights problems. A fully autonomous pipeline that finds, fixes, tests, documents, and finalizes — with zero manual steps.

That's the idea behind ShieldFlow.

What it does

ShieldFlow is an autonomous security remediation pipeline built entirely on GitLab Duo Flow. When a developer mentions the ShieldFlow service account on any merge request, five AI agents activate in sequence:

  1. Triage Agent — Analyzes SAST/DAST/dependency scan results, classifies vulnerabilities by severity (Critical, High, Medium, Low), filters false positives, and posts a structured triage report with CVE/CWE references.
  2. Fix Agent — Reads the vulnerable source code, generates minimal targeted patches using industry-standard fix patterns (parameterized queries for SQLi, output encoding for XSS, path sanitization for traversal, env vars for hardcoded secrets), and commits the fixes directly to the MR branch.
  3. Test Agent — Examines the project's existing test structure, generates regression tests that verify each vulnerability is no longer exploitable, and commits them following the project's testing conventions.
  4. Compliance Agent — Produces an audit-ready remediation report with executive summary, detailed findings table, CWE/OWASP Top 10 references, remediation timeline, and residual risk assessment. Creates tracking issues for anything requiring manual follow-up.
  5. Deploy Agent — Verifies all fixes and tests are committed, updates the MR description, and posts a final summary marking the MR ready for human review. It never auto-merges — the final decision always belongs to a human.
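The Fix Agent's four fix patterns can be sketched in Python. The helper names and signatures below are our own illustration, not ShieldFlow's actual generated patches:

```python
import html
import os
import sqlite3

# Illustrative fixes for the four vulnerability classes; function names
# are hypothetical, not taken from ShieldFlow's output.

def find_user(conn: sqlite3.Connection, username: str):
    # SQLi (CWE-89) fix: parameterized query instead of string interpolation.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchone()

def render_comment(comment: str) -> str:
    # XSS (CWE-79) fix: encode untrusted output before embedding it in HTML.
    return "<p>" + html.escape(comment) + "</p>"

def safe_read(base_dir: str, filename: str) -> str:
    # Path traversal (CWE-22) fix: resolve the path and verify it stays
    # inside base_dir before opening.
    path = os.path.realpath(os.path.join(base_dir, filename))
    if not path.startswith(os.path.realpath(base_dir) + os.sep):
        raise ValueError("path escapes base directory")
    with open(path) as f:
        return f.read()

# Hardcoded secret (CWE-798) fix: read from the environment, never the source.
API_KEY = os.environ.get("API_KEY")
```

The Test Agent's regression tests assert the same properties: the query rejects injection payloads, the rendered HTML contains no live markup, and traversal paths raise instead of reading.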

Live Results

In our demo run on MR !4, ShieldFlow processed a deliberately vulnerable Python application and achieved:

| Metric | Result |
| --- | --- |
| Vulnerabilities detected | 4 (2 Critical, 2 High) |
| Vulnerabilities fixed | 4/4 (100%) |
| Regression tests written | 8 |
| Security score | 0/100 → 100/100 |
| Total time | < 4 minutes |
| Manual intervention | Zero |

The vulnerabilities found and fixed:

  • SQL Injection (CWE-89) — Critical
  • Hardcoded Secrets (CWE-798) — Critical
  • Path Traversal (CWE-22) — High
  • Cross-Site Scripting (CWE-79) — High

How we built it

ShieldFlow is built entirely on GitLab Duo Flow's multi-agent orchestration framework. The core architecture consists of:

  • flows/shieldflow.yaml — A 408-line flow definition that declares all 5 agent components, their prompts, tool permissions, input routing, and sequential orchestration via routers.

  • flows/flow.yml — The catalog sync wrapper that registers ShieldFlow with 4 trigger types: mention, assign, assign_reviewer, and pipeline_hooks.

  • Agent Prompts — Each agent has a carefully engineered system prompt with:

    • Explicit workflow steps
    • Security fix pattern libraries (for the Fix Agent)
    • Early-exit conditions (skip downstream work when no vulnerabilities exist)
    • Structured output formats for inter-agent communication
  • Tool Permissions — Each agent is granted only the GitLab API tools it needs (principle of least privilege):

    • Triage: list_vulnerabilities, get_vulnerability_details, list_security_findings
    • Fix: get_repository_file, create_commit, grep
    • Test: list_repository_tree, create_commit, grep
    • Compliance: create_issue, create_merge_request_note
    • Deploy: update_merge_request, get_merge_request
  • Inter-Agent Data Flow — Each agent's final_answer is piped as input to the next agent via context: references, creating a chain where each stage builds on the previous one's output.
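Putting those pieces together, a single component in the flow definition might look like the sketch below. The key names are paraphrased from the structure described above (component, prompt, scoped toolset, `context:` input, router); they are illustrative and may differ from the real 408-line `shieldflow.yaml` and from the exact Duo Flow schema:

```yaml
# Illustrative excerpt only — not copied from flows/shieldflow.yaml.
components:
  - name: fix_agent
    prompt_id: fix_agent_prompt
    toolset:                   # least privilege: only what this agent needs
      - get_repository_file
      - create_commit
      - grep
    inputs:
      - from: context
        name: triage_report    # previous agent's final_answer, piped in

routers:
  - from: triage_agent
    to: fix_agent              # sequential orchestration
```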

The project was registered in GitLab's AI Catalog, with both group-level and project-level consumers configured to enable flow execution across the hackathon workspace.

Challenges we ran into

1. Schema validation for Duo Flow. GitLab Duo Flow is a new framework, and its schema requirements are strict. We went through multiple iterations to get the YAML structure exactly right: from component types and prompt template formats to router definitions and tool name spellings. Each catalog sync validated the schema, and early versions failed on subtle issues like incorrect prompt_id references and unsupported tool names.

2. Prompt engineering for autonomous multi-agent chains. Getting five agents to communicate effectively through their final_answer outputs required careful prompt design. The Triage Agent's output format had to be parseable by the Fix Agent, which had to produce commit SHAs the Test Agent could reference, and so on. We designed structured output formats with explicit status lines (CONFIRMED_VULNERABILITIES: <count> / NO_VULNERABILITIES_FOUND) so downstream agents could make early-exit decisions.
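The downstream check on that status line can be illustrated with a small parser. This function is our own sketch of the protocol, not ShieldFlow's actual code (in practice the check lives in each agent's prompt, not in Python):

```python
import re

def should_proceed(triage_output: str) -> bool:
    """Return True if downstream agents should run, i.e. the Triage
    Agent's status line confirms at least one vulnerability."""
    if "NO_VULNERABILITIES_FOUND" in triage_output:
        return False
    match = re.search(r"CONFIRMED_VULNERABILITIES:\s*(\d+)", triage_output)
    return bool(match) and int(match.group(1)) > 0
```

Each downstream agent applies the same rule before doing any work, which is what makes the early-exit path cheap and deterministic.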

3. Tool permission scoping. Each agent needed exactly the right set of GitLab API tools: too few and it couldn't complete its task; too many and it risked taking unintended actions. We iterated on the toolsets to find the minimal effective set for each agent, while ensuring the Fix Agent could read files and commit changes and the Compliance Agent could create issues and post notes.

4. Handling the "no vulnerabilities" path. A robust pipeline must handle the clean-scan case gracefully. Without explicit early-exit logic, downstream agents would attempt to generate fixes for nonexistent vulnerabilities. We solved this with a status-line protocol in the Triage Agent's output that all downstream agents check before proceeding.

Accomplishments that we're proud of

What we learned

  • Multi-agent orchestration is powerful but demands precision. The chain-of-agents pattern amplifies both good and bad prompt design. A vague instruction in one agent cascades into confusion downstream.
  • GitLab Duo Flow's context: routing is elegant. Piping final_answer outputs between agents via declarative YAML is a clean abstraction that avoids complex state management.
  • Security remediation is a natural fit for AI agents. The workflow is highly structured (scan → classify → fix → test → document → review), the patterns are well-documented (OWASP, CWE), and the human review step at the end provides a safety net.
  • Early-exit conditions prevent waste. Without them, a 5-agent pipeline runs all stages even when there's nothing to do. Status-line protocols between agents save time and prevent hallucinated fixes.
  • Tool scoping is a form of security. Limiting each agent's API access isn't just good practice — it's a security control that prevents an agent from taking actions outside its intended role.

What's next for ShieldFlow

  • Expanded vulnerability coverage — Add support for container scanning, infrastructure-as-code misconfigurations (Terraform, Kubernetes manifests), and license compliance violations.
  • Parallel agent execution — For independent vulnerability types, run Fix Agent instances in parallel to reduce the fix stage's remediation time from O(n) to O(1).
  • Learning from past fixes — Build a feedback loop where ShieldFlow learns from previously accepted/rejected fixes to improve patch quality over time.
  • Custom policy integration — Allow teams to define organizational security policies (e.g., "never use eval()", "all secrets must use Vault") that the Fix Agent enforces during remediation.
  • Multi-language support expansion — Extend fix patterns beyond Python to JavaScript/TypeScript, Go, Java, Ruby, and C# with language-specific secure coding patterns.
  • Pipeline hooks integration — Trigger ShieldFlow automatically on CI pipeline failures related to security scans, removing even the need for an @mention.

Built With

  • dast
  • gitlab-ai-catalog
  • gitlab-api
  • gitlab-ci/cd
  • gitlab-duo-flow
  • pytest
  • python
  • sast
  • yaml