Aegis Security Automation Pipeline
Autonomous Vulnerability Remediation for GitLab CI/CD
What Is Aegis?
Aegis is a multi-agent security automation system that integrates directly into GitLab CI/CD pipelines. It automatically detects, triages, fixes, and validates security vulnerabilities across five domains (SAST, DAST, SCA, secrets, and IaC) with minimal human intervention.
The system operates as a linear pipeline. A master orchestrator collects the repository's files and scanner findings, then domain-specific agents process those findings in sequence, generating unified diffs and committing validated fixes directly to the branch.
How It Works
The Master Orchestrator
The pipeline triggers with a goal string containing a project ID and branch name. The Security Master validates this input, then recursively fetches every relevant file from the repository—source code, manifests, configuration files, IaC templates, and CI/CD definitions. It also pulls scanner results from GitLab's security APIs.
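The input validation step can be sketched as follows. The write-up does not specify the goal string's format, so the `project_id=<digits> branch=<name>` layout, the `Goal` type, and `parse_goal` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    project_id: int
    branch: str


# Hypothetical goal format: "project_id=<digits> branch=<name>".
_GOAL_RE = re.compile(r"^project_id=(\d+)\s+branch=([\w./-]+)$")


def parse_goal(goal: str) -> Goal:
    """Validate a goal string and return its parts, or raise ValueError."""
    match = _GOAL_RE.match(goal.strip())
    if match is None:
        raise ValueError(f"malformed goal string: {goal!r}")
    return Goal(project_id=int(match.group(1)), branch=match.group(2))
```

Rejecting a malformed goal before any repository fetch keeps downstream agents from running against a missing project or branch.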
The Master outputs structured data with exact section headers. Each section contains formatted findings with file paths, line numbers, evidence, and severity. If a domain has no findings, it outputs exactly "No findings."
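A downstream agent could split that output into per-domain findings along these lines. The exact header text is not given in this write-up, so the `## <DOMAIN>` header style and the `split_sections` helper are hypothetical; only the literal "No findings." marker comes from the description above.

```python
DOMAINS = ("SAST", "DAST", "SCA", "SECRETS", "IAC")
NO_FINDINGS = "No findings."


def split_sections(master_output: str) -> dict[str, list[str]]:
    """Map each domain to its finding lines; an empty list means no findings."""
    sections: dict[str, list[str]] = {domain: [] for domain in DOMAINS}
    current = None
    for raw in master_output.splitlines():
        line = raw.strip()
        if line.startswith("## ") and line[3:] in DOMAINS:
            # Assumed header style: "## SAST", "## DAST", and so on.
            current = line[3:]
        elif current and line and line != NO_FINDINGS:
            sections[current].append(line)
    return sections
```

Because every domain key is always present, a triage agent can tell "no findings" (an empty list) apart from a missing section without extra checks.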
The Domain Triads
Each security domain has three agents that run in sequence:
Triage Agent extracts findings from the Master's output, pulls file content, and outputs structured blocks with vulnerable code and context. The Secrets Triage Agent additionally scans git history to find credentials committed months ago.
Fix Agent is read-only. It takes the triage output and generates minimal unified diffs that remediate the vulnerability. For SAST, this might mean replacing string concatenation with parameterized queries such as Java's PreparedStatement. For SCA, it upgrades the vulnerable dependency line. For secrets, it replaces hardcoded credentials with environment variable references and provides a rotation checklist. For IaC, it labels each fix as non-destructive or destructive.
Validate Agent is the only component that can write to the repository. It verifies each diff against the actual file content, applies the change, commits with a descriptive message, and confirms the vulnerability as resolved. For destructive IaC fixes, it creates an issue instead of committing.
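The verify-then-apply step of the Validate Agent can be illustrated with a deliberately simplified stand-in. Instead of parsing a full unified diff, this sketch represents a fix as a single line replacement; the `LineFix` type and `validate_and_apply` function are hypothetical names, not part of Aegis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineFix:
    path: str
    line_no: int  # 1-indexed line the fix targets
    old: str      # text the Fix Agent expects to find there
    new: str      # replacement text


def validate_and_apply(content: str, fix: LineFix) -> str:
    """Return patched content, or raise if the file no longer matches the fix."""
    lines = content.splitlines()
    if not 1 <= fix.line_no <= len(lines):
        raise ValueError(f"{fix.path}: line {fix.line_no} is out of range")
    if lines[fix.line_no - 1] != fix.old:
        # The file changed since triage, so the fix must not be applied.
        raise ValueError(f"{fix.path}: line {fix.line_no} does not match the fix")
    lines[fix.line_no - 1] = fix.new
    return "\n".join(lines) + "\n"
```

The key property is that a stale or mismatched fix raises before anything is written, which is what lets the read-only Fix Agent and the writing Validate Agent stay separate.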
Domain-Specific Logic
SAST handles injection vulnerabilities, XSS, and unsafe deserialization. Fixes include parameterized queries, HTML escaping, and input validation.
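As an illustration of the parameterized-query fix, here is a minimal example using Python's standard `sqlite3` module. The `users` table and the `find_user` function are made up for the example; the diffs Aegis actually generates depend on the target language and framework.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # Vulnerable form (do not use): f"... WHERE name = '{name}'"
    # The "?" placeholder keeps `name` as data, never as SQL text.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


# Small in-memory database for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")
```

With the placeholder in place, an injection payload such as `' OR '1'='1` is treated as a literal name and matches no rows.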
DAST addresses runtime vulnerabilities like CSRF and missing security headers. Fixes add CSRF tokens and configure Content Security Policy headers.
SCA upgrades vulnerable dependencies in manifests. It changes only the affected line, never touching unrelated dependencies.
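A single-line upgrade of this kind might look like the sketch below, which uses the standard library's `difflib.unified_diff`. It assumes a `requirements.txt`-style manifest with `name==version` pins; the `upgrade_dependency` helper and the file name in the diff headers are assumptions for illustration.

```python
import difflib


def upgrade_dependency(manifest: str, package: str, fixed: str) -> str:
    """Return a unified diff that bumps only `package` to version `fixed`."""
    old_lines = manifest.splitlines(keepends=True)
    new_lines = [
        f"{package}=={fixed}\n"
        if line.split("==")[0].strip() == package
        else line  # unrelated dependencies pass through unchanged
        for line in old_lines
    ]
    diff = difflib.unified_diff(
        old_lines, new_lines, "a/requirements.txt", "b/requirements.txt"
    )
    return "".join(diff)
```

Matching on the exact package name, rather than a substring, avoids touching similarly named dependencies.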
Secrets Detection scans both current files and git history. It finds AWS keys, database passwords, and API tokens. Fixes replace them with environment variable references and provide step-by-step rotation checklists.
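Detection of one secret type can be sketched with a regular expression. The pattern below matches the common `AKIA` prefix followed by 16 uppercase alphanumeric characters used by AWS access key IDs; the `find_aws_key_ids` helper is hypothetical, and the detectors Aegis relies on are not described in this write-up.

```python
import re

# AWS access key IDs commonly start with "AKIA" followed by 16 characters.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_aws_key_ids(text: str) -> list[tuple[int, str]]:
    """Return (line number, key ID) pairs for every match in `text`."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in AWS_KEY_ID.finditer(line):
            hits.append((number, match.group(0)))
    return hits
```

The same function could be run over patch text from `git log -p` to cover history, although line numbers would then refer to the patch output rather than to a file.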
IaC scans Terraform and Kubernetes manifests for misconfigurations. It identifies destructive changes—fixes that would destroy or recreate resources—and escalates those to issues with manual remediation steps. Non-destructive fixes are auto-committed.
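One way to detect destructive changes for Terraform is to inspect the JSON form of a plan, where each entry in `resource_changes` carries a `change.actions` list such as `["update"]` or `["delete", "create"]`. Whether Aegis uses plan output at all is an assumption; the `route_changes` function below is only a sketch of the commit-versus-issue routing described above.

```python
def is_destructive(actions: list[str]) -> bool:
    """A change is destructive if it deletes (or deletes and recreates)."""
    return "delete" in actions


def route_changes(plan: dict) -> dict[str, list[str]]:
    """Sort resource addresses into auto-commit and escalate-to-issue buckets."""
    routed: dict[str, list[str]] = {"commit": [], "issue": []}
    for resource in plan.get("resource_changes", []):
        actions = resource["change"]["actions"]
        if actions == ["no-op"]:
            continue  # nothing to do for unchanged resources
        bucket = "issue" if is_destructive(actions) else "commit"
        routed[bucket].append(resource["address"])
    return routed
```

Treating replacement (`delete` plus `create`) the same as deletion errs on the side of escalation, which matches the cautious handling described for destructive fixes.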
Challenges We Ran Into
Linear Pipeline Constraint
The architecture required agents to run in strict sequence, even though the security domains are largely independent and could in principle be processed in parallel. Because SAST, DAST, SCA, secrets, and IaC could not run simultaneously, overall execution time was longer than necessary. A parallel architecture would have been more efficient, but the dependency chain, particularly the need for the Master's output to propagate correctly, forced a linear design.
Permission Issues with CI/CD
Agents required fine-grained permissions to read repository files, fetch security findings, and create commits. Striking the right balance between functionality and security was difficult. Some agents needed read-only access while others needed commit permissions, and managing these distinct roles within GitLab's CI/CD permission model required careful scoping.
Context Propagation Between Agents
Ensuring that every agent received the correct project ID, branch name, and mode required a consistent context block at the top of every output. When this propagation failed—often due to race conditions or timing issues—agents would attempt to work with missing data. Making the system resilient to these failures required explicit dependencies at every step.
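A context block with an explicit check for required fields might look like the following. The `[CONTEXT]` delimiters, the `key: value` layout, and both helper functions are hypothetical; the write-up only states that the block carries the project ID, branch name, and mode.

```python
REQUIRED = ("project_id", "branch", "mode")


def render_context(ctx: dict[str, str]) -> str:
    """Serialize the context block that leads every agent's output."""
    body = [f"{key}: {ctx[key]}" for key in REQUIRED]
    return "\n".join(["[CONTEXT]", *body, "[/CONTEXT]"])


def extract_context(output: str) -> dict[str, str]:
    """Parse the leading context block, failing loudly if it is incomplete."""
    lines = output.splitlines()
    if not lines or lines[0] != "[CONTEXT]":
        raise ValueError("missing context block")
    ctx: dict[str, str] = {}
    for line in lines[1:]:
        if line == "[/CONTEXT]":
            break
        key, _, value = line.partition(": ")
        ctx[key] = value
    else:
        # The loop ended without seeing the closing delimiter.
        raise ValueError("unterminated context block")
    missing = [key for key in REQUIRED if key not in ctx]
    if missing:
        raise ValueError(f"context block missing fields: {missing}")
    return ctx
```

An agent would call `extract_context` on its input and prepend `render_context(ctx)` to its own output, so a broken block stops the pipeline at the first agent that sees it instead of surfacing later as missing data.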
What We Learned
Agent Communication Patterns — A simple, structured context block at the top of every agent's output is the most reliable way to pass information through a multi-agent pipeline. Each agent copies the block forward, ensuring every downstream agent has exactly what it needs.
The Power of Read-Only Fix Agents — Separating fix generation from fix application eliminated the risk of agents accidentally committing incorrect changes. The Validate Agent acts as a safety gate, verifying every diff before it touches the repository.
Destructive Change Awareness — Infrastructure changes that would destroy or recreate resources need special handling. Building detection into the IaC agents prevented potentially disruptive changes from being auto-committed by escalating them to issues instead.
What's Next for Aegis
Parallel Processing — Moving from a linear pipeline to a parallel architecture would significantly reduce execution time. The Master could fan out to domain triads simultaneously, with a final aggregation agent collecting results and committing all fixes. This requires careful synchronization but is achievable.
Expanded IaC Coverage — Adding support for additional infrastructure tools like CloudFormation, Pulumi, and Ansible would broaden Aegis's reach. Each platform has its own destructive change patterns that need detection and escalation.
Rollback Capabilities — Currently, Aegis commits fixes but cannot roll them back if something breaks. Adding automatic rollback on pipeline failure would make the system more resilient. This could be implemented by tagging remediation commits and triggering reverts when tests fail.
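The parallel fan-out idea from the list above can be sketched with a thread pool. This is a possible shape for future work, not something Aegis does today; `run_triad` is a placeholder for a domain's triage and fix steps, and committing is left to a single aggregation step after all results are collected.

```python
from concurrent.futures import ThreadPoolExecutor

DOMAINS = ("SAST", "DAST", "SCA", "SECRETS", "IAC")


def run_triad(domain: str, findings: list[str]) -> list[str]:
    """Placeholder for triage and fix; returns proposed diffs, commits nothing."""
    return [f"{domain}: fix for {finding}" for finding in findings]


def fan_out(sections: dict[str, list[str]]) -> list[str]:
    """Run every domain concurrently, then aggregate diffs in a fixed order."""
    with ThreadPoolExecutor(max_workers=len(DOMAINS)) as pool:
        results = pool.map(
            lambda domain: run_triad(domain, sections.get(domain, [])), DOMAINS
        )
        # map() yields results in input order, so aggregation is deterministic.
        return [diff for diffs in results for diff in diffs]
```

Keeping the write step out of the workers means only the aggregator needs commit permission, which preserves the read-only role of the fix stage.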
Summary
Aegis automates the entire security remediation lifecycle. The Master discovers and structures findings. Triage agents extract context. Fix agents generate diffs. Validate agents commit changes.
The system handles SAST, DAST, SCA, secrets, and IaC in one linear flow, transforming security remediation from a manual backlog into an autonomous pipeline. While challenges around parallel execution, permissions, and context propagation required careful design, the result is a resilient system that closes vulnerabilities in minutes rather than weeks.
Built With
- gitlab
- yaml