Inspiration

Every engineering team I've worked with shares the same two pain points — and they happen at the worst times.
At 2am, a pipeline fails. Security scanners surface 3 CRITICAL vulnerabilities. Nobody sees them until morning. By then, it's a fire drill: triage the findings, cross-reference CVE databases, read the code, write a fix, open a PR, get it reviewed. Half a day gone. And this repeats every sprint.
At 9am, the team standup begins. Each developer spends 3 minutes recalling what they did yesterday. One person is blocked — an MR has been idle for 6 hours waiting on a review. Nobody knows until 9:22am. The meeting ends. Nobody acts immediately. Context evaporates by noon.
These aren't hard problems. They're recall and coordination toil — exactly what AI agents should absorb. That was the inspiration: build one platform that eliminates both, with no custom infrastructure, no external APIs, and no agent that ever takes an irreversible action.
What I Built

Autonomous DevOps Agents is a single platform with two complementary feedback loops, both running entirely on the GitLab Duo Agent Platform with Anthropic Claude as the reasoning layer.
Reactive Loop — SecureGuard AI 🛡️

Fires when a pipeline completes with security findings. Five agents execute in sequence:

Pipeline Fails
      │
      ▼
SOC Triage → Threat Intel → Remediation → Validation → Green Agent
- SOC Triage maps every finding to MITRE ATT&CK, calculates a composite risk score, and creates a ranked GitLab issue
- Threat Intel enriches CRITICAL/HIGH findings with CVE details, CVSS v3.1 vectors, and CISA KEV status
- Remediation reads the vulnerable code, generates a minimal targeted fix, and opens a branch + MR + security test
- Validation verifies the root cause was fixed (not just the symptom), labels the MR, and emits a Security Posture Report
- Green Agent estimates the CO₂ cost of the agent run and recommends pipeline optimizations

The composite risk formula is:
risk = min(100, cvss × 10 × exploitability × blast_radius × confidence)
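The formula can be sketched in a few lines of Python. This is a hypothetical illustration of the scoring described above, not the project's actual code; the parameter ranges (CVSS 0–10, multipliers around 0–1.5) are assumptions.

```python
# Hypothetical sketch of the composite risk formula; variable names
# and ranges are assumptions, not the project's real implementation.
def composite_risk(cvss: float, exploitability: float,
                   blast_radius: float, confidence: float) -> float:
    """Scale CVSS (0-10) to a 0-100 base, weight it by the three
    multipliers, and clamp the result at 100."""
    return min(100.0, cvss * 10 * exploitability * blast_radius * confidence)

# A CVSS 9.8 finding with a known exploit and high scanner confidence
# saturates the scale; a moderate finding lands mid-range.
print(composite_risk(9.8, 1.2, 1.0, 0.9))            # 100.0
print(round(composite_risk(5.0, 0.8, 0.8, 0.9), 1))  # 28.8
```

The clamp at 100 keeps the score on a stable scale even when a known-exploited, high-blast-radius finding would otherwise overflow it.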
Demo result: 3 CRITICAL + 3 HIGH findings → 0 in 4.2 minutes. Max risk score dropped from 95 → 32. 8 validated MRs, all awaiting human review.
Proactive Loop — StandupAI

Fires every weekday morning at 9am. Five agents execute in sequence:

9:00 AM trigger
      │
      ▼
Activity Collector → Blocker Detector → Summary Generator → Delivery → Green Agent
- Activity Collector reads every MR, pipeline, commit, issue, and comment from the past 24 hours
- Blocker Detector identifies real blockers (MR idle >4h, broken main, approved MR not merged) and filters noise (flaky tests, intentional WIPs, still-running pipelines)
- Summary Generator writes a personal standup for each developer and a team health report with action items
- Delivery creates a daily standup GitLab issue, per-blocker issues assigned to the person who needs to act, and comments on stuck MRs
- Green Agent identifies workflow waste patterns and estimates compute savings

Demo result: Standup information delivered in 8 minutes vs. a 25-minute manual meeting. Blockers surfaced before the meeting begins. Overnight SecureGuard fix MRs appear in the morning report — closing the full DevOps feedback cycle.
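The Blocker Detector's filtering can be pictured as a few ordered checks. This is a minimal sketch under assumed MR fields and the 4-hour threshold mentioned above; the real agent reasons over richer context via Claude rather than fixed rules.

```python
# Illustrative blocker heuristics; field names and thresholds are
# assumptions, not the agent's actual logic.
from dataclasses import dataclass

@dataclass
class MergeRequest:
    idle_hours: float
    is_wip: bool
    pipeline_running: bool
    approved: bool
    merged: bool

def is_blocker(mr: MergeRequest, idle_threshold: float = 4.0) -> bool:
    # Noise filters: intentional WIPs and still-running pipelines
    # are not blockers, no matter how long they sit idle.
    if mr.is_wip or mr.pipeline_running:
        return False
    # An approved-but-unmerged MR is a blocker regardless of idle time.
    if mr.approved and not mr.merged:
        return True
    # Otherwise, flag MRs idle longer than the review threshold.
    return mr.idle_hours > idle_threshold
```

The ordering matters: noise filters run first, so a WIP that happens to be approved is still suppressed, which is the kind of distinction a pure "idle > N hours" rule gets wrong.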
Shared Design Principles

Both loops use only GitLab platform-native tools — list_security_findings, create_merge_request, create_issue, list_mrs, list_commits, etc. No external APIs. No custom infrastructure. One shared Green Agent backbone.
Agents create issues and MRs. They never merge, close, or reassign. Humans retain full authority over all code and workflow changes.
Challenges
Irreversibility as a design constraint. The hardest part wasn't getting agents to produce good output — it was ensuring they could never cause harm. Every action is a GitLab draft or an issue. Nothing gets merged without a human.
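One way to picture this constraint is as an allowlist at the tool layer: the flows simply never get access to irreversible operations. The action names below mirror the platform tools mentioned elsewhere in this writeup, but the check itself is a hypothetical sketch, not how the platform enforces it.

```python
# Hypothetical guardrail sketch: agents may only take reversible,
# draft-style actions. Action names are illustrative; real enforcement
# lives in which tools the flows are permitted to call.
ALLOWED_ACTIONS = {"create_issue", "create_merge_request", "create_branch", "comment"}
IRREVERSIBLE_ACTIONS = {"merge_mr", "close_issue", "reassign", "delete_branch"}

def authorize(action: str) -> bool:
    """Permit an action only if it is explicitly allowlisted;
    anything unknown or irreversible is denied by default."""
    return action in ALLOWED_ACTIONS and action not in IRREVERSIBLE_ACTIONS
```

Deny-by-default is the important property: an action the allowlist has never heard of is refused, rather than trusting the agent to classify it correctly.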
Blocker detection requires reasoning, not rules. A rule-based system flags every idle MR as a blocker. An agent can reason: "This MR has been idle 6 hours, but the author marked it WIP and the pipeline is still running — not a blocker." That distinction required careful prompt engineering and testing.
Building within the platform constraints. The group CI policy uses override_project_ci, meaning no custom CI jobs can run in the pipeline itself. The agents had to orchestrate entirely through the GitLab Duo Agent Platform flows — no escape hatches.
Minimal fixes, not rewrites. Left unchecked, Claude tended to refactor surrounding code while fixing vulnerabilities. Strict prompting was required: fix only the vulnerable line, add only the minimum necessary test. Code reviewers don't want to audit a rewrite just to approve a parameterized-query fix.
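As a concrete illustration of "fix only the vulnerable line", here is the shape of a minimal SQL-injection remediation — a generic sqlite3 example, not the demo app's actual code:

```python
# Generic example of a one-line SQL-injection fix: replace string-built
# SQL with a parameterized query, touching nothing else.
import sqlite3

def get_user(conn: sqlite3.Connection, username: str):
    # Vulnerable version (for contrast):
    #   cur = conn.execute(f"SELECT * FROM users WHERE name = '{username}'")
    # Minimal fix: bind the value instead of interpolating it.
    cur = conn.execute("SELECT * FROM users WHERE name = ?", (username,))
    return cur.fetchone()
```

The diff a reviewer sees is one changed line plus a test, which is exactly the scope the Remediation agent is prompted to keep.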
What I Learned

- Agent orchestration shines when each agent has a single, well-scoped responsibility with a clear input/output contract
- The Green Agent as a shared component across both loops demonstrates that sustainability analysis is a cross-cutting concern, not a feature you bolt on
- "Human in the loop" isn't just compliance theater — it's the right design when agents operate on production code at 2am
- GitLab's platform-native tools are sufficient for real autonomous workflows; external APIs aren't needed for most DevOps automation

Built With

- GitLab Duo Agent Platform — agent orchestration, flows, triggers
- Anthropic Claude — reasoning layer for all 10 agents
- GitLab APIs — security findings, MRs, issues, commits, pipelines (platform-native tools only)
- Python 3.11 — demo app with intentional vulnerabilities for testing
- MITRE ATT&CK Framework — vulnerability classification and triage
- OWASP Top 10 — remediation guidance mapping
- CVSS v3.1 / CISA KEV — threat intelligence scoring
- Docker / Cloud Run — demo app containerization
- YAML — agent and flow definitions (10 agents, 3 flows)

Tags: GitLab Duo · Anthropic Claude · Python · Docker · YAML · MITRE ATT&CK · OWASP · CVSS · GitLab CI/CD · Cloud Run