OpsOrchestrator: The Autonomous SDLC Engine
Inspiration
Every engineering team we've worked on has the same invisible tax: the work around the code. Grooming sessions that eat up Tuesday afternoons. Tickets so vague that developers spend the first day just figuring out what "done" means. Security findings that sit unreviewed until the night before a release. Deployment decisions made by gut feel because nobody has time to audit the pipeline properly.
AI made this worse before it made it better. When code generation accelerated, planning and oversight became the new bottlenecks. You can ship a feature in an afternoon — but only if the ticket was clear, the dependencies were mapped, the security scan passed, and someone remembered to check whether the branch was safe to merge. None of that got faster.
The cost is quantifiable. If a team of 10 engineers each loses 7 hours per week to planning and administrative overhead at a fully loaded cost of $150/hour:
$$\text{Weekly waste} = 10 \times 7 \times \$150 = \$10{,}500 \text{ per week}$$
$$\text{Annual waste} = \$10{,}500 \times 52 = \$546{,}000 \text{ per team per year}$$
We built OpsOrchestrator because we wanted a teammate that never forgets, never skips the checklist, and never needs to be asked twice.
What It Does
OpsOrchestrator is a unified AI command center built natively on the GitLab Duo Agent Platform. It orchestrates eight specialized agents that cover every non-coding bottleneck in the software development lifecycle:
| Agent | Role |
|---|---|
| Intent Router | Central entry point in Duo Chat. Triages natural-language requests and delegates each to the right specialist by triggering a session. |
| Sprint Planner | Decomposes vague feature ideas into structured, sprint-ready child issues with Fibonacci estimates, skill-based assignments, and a tracking MR. |
| Security Reviewer | Scans MR diffs for 20+ vulnerability types (hardcoded secrets, SQL injection, PII). Posts inline severity badges with remediation code. |
| Compliance Checker | Enforces 10 project policy gates. Generates a persistent Audit Trail JSON for SOC2/GDPR compliance and can auto-patch minor violations. |
| DevOps CI/CD Doctor | Performs root-cause analysis on real job logs. Uses edit_file to commit the fix directly and restore the pipeline to green. |
| Dependency Agent | Monitors the supply chain. Identifies vulnerable or outdated packages and opens a safe version-bump MR. |
| Deployment Advisor | Evaluates six readiness signals and posts a structured GO / CAUTION / NO-GO report with rollback plan. |
| Standup Digest | Synthesizes project health, developer workload, and blockers into a single personalized daily digest. |
One trigger. Eight agents. Zero bottlenecks.
The Deployment Advisor's verdict is computed as a weighted signal score:
$$\text{DeployScore} = \sum_{i=1}^{6} w_i \cdot s_i \quad \text{where } s_i \in \{0,\ 0.5,\ 1\},\quad \sum_{i=1}^{6} w_i = 1$$
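The scoring can be sketched in a few lines of Python. The signal names, weights, and verdict thresholds below are illustrative assumptions — the actual agent derives its signals from project state:

```python
# Hypothetical signal weights (must sum to 1); the real agent's
# weights and signal names may differ.
WEIGHTS = {
    "pipeline_green": 0.25,
    "security_approved": 0.25,
    "compliance_passed": 0.20,
    "no_open_blockers": 0.10,
    "dependency_health": 0.10,
    "review_coverage": 0.10,
}

def deploy_score(signals: dict) -> float:
    """Each signal is 0 (fail), 0.5 (warning), or 1 (pass)."""
    return sum(w * signals[name] for name, w in WEIGHTS.items())

def verdict(score: float) -> str:
    # Illustrative cutoffs for the three-way report.
    if score >= 0.8:
        return "GO"
    if score >= 0.5:
        return "CAUTION"
    return "NO-GO"
```

With every signal passing, the score is 1.0 and the verdict is GO; a single failed high-weight signal (e.g. security) drops the score to 0.75 and the verdict to CAUTION.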
How We Built It
The entire system is a showcase of the GitLab Duo Agent Platform's extensibility.
The Execution Layer
Each of our eight agents is a Custom Agent defined as a self-contained YAML flow. We used the full suite of Duo primitives — edit_file, create_commit, list_merge_request_diffs, and get_job_logs — to move beyond simple chat into autonomous action.
The Reasoning Layer
We utilized Claude Sonnet 4.6 via the GitLab Duo AI Gateway. Claude powers our Two-Step ChatOps Protocol:
- Mode 1 — Assessment: The agent analyzes, reports, and appends a specific copy-paste authorization command. It does not write a single line of code.
- Mode 2 — Execution: Upon receiving the exact phrase "apply fixes", the agent gains human-in-the-loop authorization to modify the repository.
This means every autonomous commit in the project history has a matching human approval comment above it. The AI is auditable by design.
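The mode guard at the heart of the protocol can be sketched as a simple gate. The function and trigger handling below are an illustrative reconstruction, not the actual flow definition:

```python
# Hypothetical sketch of the Two-Step ChatOps Protocol's mode guard:
# execution is only unlocked by the exact authorization phrase, and
# only after an assessment has already been posted.
TRIGGER = "apply fixes"

def select_mode(comment: str, assessment_posted: bool) -> str:
    if comment.strip().lower() == TRIGGER and assessment_posted:
        return "EXECUTION"   # human sign-off received: repository edits allowed
    return "ASSESSMENT"      # default: analyze and report, write nothing
```

Because the trigger is an exact-match phrase rather than a fuzzy intent, a vague "please fix this" never escalates the agent into write mode.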
The Integration Layer
Our agents communicate via a Label-as-State-Machine architecture. Labels like security::approved or compliance::failed act as the shared "memory" of the project, allowing the Deployment Advisor and Standup Digest to see exactly what every other agent has found — without needing an external database or message bus.
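Parsing scoped labels into shared state is straightforward. The sketch below uses the security::/compliance:: convention from the text; the aggregation rule is an illustrative assumption:

```python
# Scoped labels like "security::approved" act as a key-value store
# shared by all agents; no external database or message bus needed.
def read_state(labels: list) -> dict:
    """Parse scoped labels ('scope::value') into a {scope: value} map."""
    state = {}
    for label in labels:
        if "::" in label:
            scope, _, value = label.partition("::")
            state[scope] = value
    return state

def gates_passed(labels: list) -> bool:
    # Illustrative readiness rule a downstream agent might apply.
    state = read_state(labels)
    return state.get("security") == "approved" and state.get("compliance") != "failed"
```

Any agent that can list a project's labels can reconstruct the full state other agents have left behind.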
Challenges We Ran Into
The "Silent Exit" Problem: Early in development, agents would sometimes exit without posting a report if they couldn't resolve a Project ID. We solved this by hardening the Absolute Rules section in every YAML flow — ensuring context:project_id is always extracted and validated before any tool call.
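The validation rule amounts to "fail loudly, never silently." A minimal sketch of that guard, with hypothetical function and error wording:

```python
# Illustrative hardening against the "Silent Exit" failure mode:
# resolve and validate the project ID before any tool call, and raise
# (so the flow posts an error report) instead of exiting quietly.
def resolve_project_id(context: dict) -> int:
    raw = context.get("project_id")
    if raw is None:
        raise ValueError("context:project_id missing; post an error report instead of exiting")
    try:
        return int(raw)
    except (TypeError, ValueError):
        raise ValueError(f"context:project_id is not a valid ID: {raw!r}")
```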
Human-in-the-Loop Balancing: We had to ensure the agents were autonomous but safe. Building the Two-Step Protocol was a core design challenge: how do you make an agent smart enough to fix a failing pipeline but "polite" enough to ask first? We landed on a phrase-based trigger system that requires a human to explicitly sign off before the AI commits anything.
Token Optimization: Reading full CI logs or large MR diffs can quickly exceed context windows. We implemented a Priority Filter for the CI/CD Doctor that extracts only the final 15 lines and specific error patterns, allowing the agent to diagnose failures with surgical precision without consuming the full log.
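The filter itself is simple: keep the final 15 lines, plus any earlier line matching known error signatures. The pattern list below is an illustrative assumption:

```python
import re

# Hypothetical error signatures; the real Priority Filter's pattern
# set may differ.
ERROR_PATTERNS = re.compile(r"(error|failed|exception|assert|traceback)", re.IGNORECASE)
TAIL_LINES = 15

def filter_log(log: str) -> str:
    """Reduce a full CI log to its last 15 lines plus earlier error hits."""
    lines = log.splitlines()
    tail = lines[-TAIL_LINES:]
    hits = [line for line in lines[:-TAIL_LINES] if ERROR_PATTERNS.search(line)]
    return "\n".join(hits + tail)
```

On a 100-line log this hands the model ~16 lines instead of 100, while still surfacing an AssertionError buried near the top.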
Accomplishments We're Proud Of
- The Intent Router — Building a central brain that handles multi-intent requests (e.g., "Plan this issue AND check this MR for security") and delegates to two agents simultaneously, each triggering an independent session.
- Self-Healing SDLC — Seeing the CI/CD Doctor identify an AssertionError in a job log and physically push a commit to fix the code, restoring the pipeline to green in under 60 seconds.
- Audit Trail Generation — The Compliance Checker doesn't just comment; it creates a structured JSON audit record, making the AI an active participant in regulatory governance rather than a passive advisor.
- 181 Tests, All Passing — Every flow definition is validated by a CI pipeline with 181 unit tests on every push, ensuring the agents are always production-ready.
What We Learned
Tools > Prompts: An agent's intelligence is fundamentally limited by its toolset. By giving our agents access to list_project_audit_events, they became Project Historians — able to provide context and evidence that no standalone LLM could ever know.
Native is Better: We initially considered an external hosting model, but building natively inside GitLab provides better security, lower latency, and a tighter integration with the platform's RBAC and token model. The AI Gateway is a game-changer for enterprise-grade agents.
LLM Prompt Architecture is Software Engineering: Writing a 600-line YAML flow is not prompt engineering — it is software engineering. Phase-ordered execution, explicit mode guards, negative constraints, and fallback branches are the same disciplines as writing reliable code. We approached every flow as a state machine, not a chat.
What's Next for OpsOrchestrator
Our roadmap focuses on Self-Improving Estimation. We are developing a feedback loop in which the Sprint Planner compares its original Fibonacci estimates to the actual time-to-close recorded by the Standup Digest.
Using an exponential moving average to converge on real team velocity:
$$v_{n+1} = \alpha \cdot v_{\text{actual}} + (1 - \alpha) \cdot v_n \qquad \alpha = 0.3$$
Over time, the agents will learn the team's specific throughput and become measurably more accurate with every sprint — without any manual calibration.
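The update rule above translates directly into code:

```python
# Exponential moving average over observed sprint velocity:
# recent sprints weigh more, old estimates decay geometrically.
ALPHA = 0.3

def update_velocity(v_prev: float, v_actual: float) -> float:
    return ALPHA * v_actual + (1 - ALPHA) * v_prev
```

For example, a prior velocity of 10 points/sprint and an observed 20 points/sprint yields an updated velocity of 13 — a measured step toward the new evidence rather than a jump.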
We also plan to release OpsOrchestrator to the GitLab AI Catalog so any organization can install their own team of autonomous SDLC specialists with a single click.
Powered by OpsOrchestrator on GitLab Duo · Claude via Anthropic