Phoenix: Self-Healing DevSecOps Agent Orchestra

Inspiration

Developers spend only 52 minutes a day actually writing code. The other 7+ hours disappear into the DevSecOps tax: waiting for slow pipelines, chasing reviewers on merge requests, manually gathering SOC2 compliance evidence before audits, and triaging security vulnerabilities that sit in a backlog for weeks.

AI has made writing code faster, but nobody has automated everything else. Planning, security, compliance, deployments: every handoff in the software lifecycle is still manual, and every handoff still waits on a human.

We wanted to build something that eliminates the handoffs entirely.

What it does

Phoenix is a team of four specialized AI agents, orchestrated through the GitLab Duo Agent Platform, that cover the entire software development lifecycle automatically, from the moment a bug is filed to the moment compliance evidence is written.

The Fixer — Triggered when @ai-phoenix-fixer-gitlab-ai-hackathon is mentioned in a GitLab issue. It queries the project's history (related issues, commits, MRs, and past CVEs) using a 4-phase approach:

Phase 1 (Knowledge Graph): Searches for legacy risks before writing a single line of code
Phase 2 (Vibe Session): Uses Claude to creatively plan and weigh two fix approaches
Phase 3 (Spec Session): Hardens the plan into an exact specification
Phase 4 (Execute): Creates a branch, pushes the fix, and opens an MR with a full Context Report
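
The four phases above can be read as a fixed chain in which each phase consumes what the previous one produced. The following is only a minimal sketch of that ordering, not the actual agent code: the `FixContext` fields, the function names, and the stubbed phase bodies are all hypothetical stand-ins for the real history search, model calls, and GitLab API calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FixContext:
    """State handed from one phase to the next (hypothetical shape)."""
    issue_text: str
    legacy_risks: list[str] = field(default_factory=list)
    approaches: list[str] = field(default_factory=list)
    spec: str = ""
    completed: list[str] = field(default_factory=list)


def knowledge_graph(ctx: FixContext) -> FixContext:
    # Stand-in for searching related issues, commits, MRs, and past CVEs.
    ctx.legacy_risks = [f"history search for: {ctx.issue_text}"]
    return ctx


def vibe_session(ctx: FixContext) -> FixContext:
    # Stand-in for the model proposing two candidate fix approaches.
    ctx.approaches = ["approach A", "approach B"]
    return ctx


def spec_session(ctx: FixContext) -> FixContext:
    # Stand-in for hardening the chosen approach into an exact specification.
    chosen = ctx.approaches[0]
    ctx.spec = f"spec based on {chosen}; risks reviewed: {len(ctx.legacy_risks)}"
    return ctx


def execute(ctx: FixContext) -> FixContext:
    # Stand-in for creating a branch, pushing the fix, and opening an MR.
    if not ctx.spec:
        raise ValueError("execute requires a spec from the previous phase")
    return ctx


PHASES: list[tuple[str, Callable[[FixContext], FixContext]]] = [
    ("knowledge_graph", knowledge_graph),
    ("vibe_session", vibe_session),
    ("spec_session", spec_session),
    ("execute", execute),
]


def run_fixer(issue_text: str) -> FixContext:
    """Run the phases strictly in order, recording each one as it completes."""
    ctx = FixContext(issue_text=issue_text)
    for name, phase in PHASES:
        ctx = phase(ctx)
        ctx.completed.append(name)
    return ctx
```

Keeping the phases in one ordered list makes it hard to reach the execute step without a spec, which is the point of planning before writing code.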

The Skeptic — Triggered when assigned as MR reviewer. Uses Claude as a hostile senior engineer — auditing the diff for SQL injection, XSS, missing validation, and logic errors. Runs a multi-round debate loop, posting Round N — REJECTED or Round N — APPROVED as public MR comments until zero critical issues remain.
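
The Skeptic's debate loop could be sketched roughly as follows. This is an illustration under stated assumptions rather than the deployed agent: `review` and `revise` are hypothetical callables standing in for the Skeptic's model-backed audit and the Fixer's revision, the toy helpers exist only to make the sketch runnable, and the `max_rounds` cap is an added safety limit that the description above does not mention.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoundResult:
    number: int
    verdict: str        # "REJECTED" or "APPROVED"
    issues: list[str]   # critical issues reported in this round


def run_debate(
    diff: str,
    review: Callable[[str], list[str]],
    revise: Callable[[str, list[str]], str],
    max_rounds: int = 5,  # assumption: a cap so the loop always terminates
) -> list[RoundResult]:
    """Review, reject, and revise until no critical issues remain."""
    rounds: list[RoundResult] = []
    for number in range(1, max_rounds + 1):
        issues = review(diff)
        if not issues:
            rounds.append(RoundResult(number, "APPROVED", []))
            break
        rounds.append(RoundResult(number, "REJECTED", issues))
        diff = revise(diff, issues)
    return rounds


def toy_review(diff: str) -> list[str]:
    # Toy reviewer: flags a marker string instead of really auditing the diff.
    return ["string-built SQL query"] if "UNSAFE_QUERY" in diff else []


def toy_revise(diff: str, issues: list[str]) -> str:
    # Toy reviser: swaps the flagged marker for a safe one.
    return diff.replace("UNSAFE_QUERY", "PARAMETERIZED_QUERY")
```

Each `RoundResult` corresponds to one public MR comment, so the list returned by `run_debate` doubles as the round-by-round log.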

The Eco-Cop — Triggered on every git push before the CI/CD pipeline starts. Scans the diff for errors guaranteed to fail the build. If found, cancels the pipeline before it wastes compute and posts the exact CO₂ saved (minutes_saved × 0.0002 kg/min). Maintains a Flaky Test Memory to avoid cancelling known-flaky test failures.
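
As a rough illustration of the Eco-Cop's two rules, the sketch below applies the CO₂ formula quoted above and a cancel decision that skips known-flaky tests. The 0.0002 kg/min factor comes from the description. Everything else is an assumption: the class and function names are hypothetical, and an in-memory set stands in for the Redis-backed store.

```python
CO2_KG_PER_PIPELINE_MINUTE = 0.0002  # factor quoted in the Eco-Cop description


def co2_saved_kg(minutes_saved: float) -> float:
    """CO2 saved by cancelling a pipeline: minutes_saved x 0.0002 kg/min."""
    if minutes_saved < 0:
        raise ValueError("minutes_saved must be non-negative")
    return minutes_saved * CO2_KG_PER_PIPELINE_MINUTE


class FlakyTestMemory:
    """In-memory stand-in for the flaky test store (assumed interface)."""

    def __init__(self) -> None:
        self._flaky: set[str] = set()

    def remember(self, test_name: str) -> None:
        self._flaky.add(test_name)

    def is_flaky(self, test_name: str) -> bool:
        return test_name in self._flaky


def should_cancel(predicted_failures: list[str], memory: FlakyTestMemory) -> bool:
    """Cancel only if at least one predicted failure is not a known-flaky test."""
    return any(not memory.is_flaky(name) for name in predicted_failures)
```

For example, cancelling a pipeline that would have run for 12 minutes yields roughly 0.0024 kg of CO₂ saved under this factor.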

The Auditor — Triggered when any MR is merged to main. Collects SOC2 compliance evidence — who approved, security scan result, pipeline result, branch protection status — and writes a structured record to COMPLIANCE_LOG.md automatically. Turns weeks of manual audit prep into zero seconds of work.
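
One way to picture the Auditor's output is as an append-only log with one entry per merged MR. The sketch below is an assumption about the shape of such an entry, not the real COMPLIANCE_LOG.md format: the four evidence fields come from the description above, while the `ComplianceRecord` name, the `merge_request` label, and the entry layout are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ComplianceRecord:
    merge_request: str        # hypothetical label for the merged MR
    approved_by: list[str]    # who approved
    security_scan: str        # security scan result
    pipeline: str             # pipeline result
    branch_protected: bool    # branch protection status


def format_record(record: ComplianceRecord) -> str:
    """Render one record as a text entry (layout is an assumption)."""
    approvers = ", ".join(record.approved_by) or "none recorded"
    protection = "enabled" if record.branch_protected else "disabled"
    lines = [
        f"## {record.merge_request}",
        f"- Approved by: {approvers}",
        f"- Security scan: {record.security_scan}",
        f"- Pipeline: {record.pipeline}",
        f"- Branch protection: {protection}",
    ]
    return "\n".join(lines) + "\n\n"


def append_record(log_path: Path, record: ComplianceRecord) -> None:
    """Append the entry so earlier evidence is never overwritten."""
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(format_record(record))
```

Appending rather than rewriting keeps the log usable as an audit trail, since each merge only ever adds evidence.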

How we built it

Each agent is defined as a GitLab Duo Agent (agent.yml) published to the AI Catalog, orchestrated through a GitLab Duo Flow (flow.yml).

Claude (Anthropic) powers the Fixer and Skeptic agents via the GitLab AI Gateway. The Fixer uses structured JSON prompting with a 4-phase reasoning chain. The Skeptic uses an adversarial system prompt that forces multi-round debate with the Fixer.

Google Cloud Run hosts the Eco-Cop's backend service, a Python FastAPI app that handles diff analysis, CO₂ calculations, and Redis-backed flaky test memory.

Tech stack:

GitLab Duo Agent Platform (agents + flows)
Claude Sonnet 4 (claude-sonnet-4-20250514, Anthropic) via GitLab AI Gateway
Google Cloud Run + Redis (Eco-Cop backend)
Python 3.12 (agent logic)
GitLab API (issues, MRs, pipelines, files)

Challenges we ran into

The GitLab Duo Flows schema was different from what we expected: the template requires a specific "definition:" structure with components, prompts, routers, and flow sections. Getting this right took several iterations.

Prompt engineering for safety — we didn't want the Fixer to guess when context was insufficient. Adding the sufficient_context flag and refusal condition (post a comment and stop rather than hallucinate a fix) was crucial for real-world reliability.
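
The refusal condition can be sketched as a small gate on the model's JSON reply. This is a minimal illustration, assuming the reply is a JSON object carrying the `sufficient_context` flag. The `missing` and `plan` keys, the `FixerDecision` type, and the action names are hypothetical and only serve the example.

```python
import json
from dataclasses import dataclass


@dataclass
class FixerDecision:
    action: str    # "open_mr" or "comment_and_stop" (hypothetical names)
    message: str


def decide(raw_reply: str) -> FixerDecision:
    """Proceed only when the reply explicitly sets sufficient_context to true."""
    try:
        reply = json.loads(raw_reply)
    except json.JSONDecodeError:
        return FixerDecision(
            "comment_and_stop",
            "Could not parse the model reply; no fix attempted.",
        )
    if not isinstance(reply, dict) or reply.get("sufficient_context") is not True:
        # A missing or false flag is treated the same way: refuse, do not guess.
        missing = reply.get("missing", "unspecified") if isinstance(reply, dict) else "unspecified"
        return FixerDecision(
            "comment_and_stop",
            f"Not enough context to propose a fix safely. Missing: {missing}",
        )
    return FixerDecision("open_mr", str(reply.get("plan", "")))
```

Treating an absent flag as a refusal is the conservative choice here: the agent has to opt in to writing code rather than opt out.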

Accomplishments we're proud of

The Skeptic debate loop produces public, round-by-round argument logs in MR comments — judges and developers can see the agents reasoning in real time
The Knowledge Graph approach in the Fixer — searching issues, commits, and MRs for legacy risks before writing any code — goes beyond simple code generation
The Eco-Cop's Flaky Test Memory — ignoring known-flaky tests rather than cancelling all failed pipelines — solves the most common CI/CD annoyance in real teams
The COMPLIANCE_LOG.md auto-generation on every merge eliminates SOC2 audit prep entirely

What we learned

GitLab Duo Agent Platform is genuinely powerful for multi-agent orchestration: the ability to publish agents to a public catalog and reference them in flows is exactly what enterprise DevSecOps needs
Adversarial AI patterns (the Skeptic as a hostile reviewer) are more reliable than a single-pass code reviewer — forcing Claude to look for flaws rather than approve produces better security outcomes
Chain-of-thought prompting (Knowledge Graph → Vibe → Spec → Execute) dramatically reduces hallucinations in code generation compared to a single "write a fix" prompt

What's next for Phoenix

MR conflict resolution: an agent that detects and resolves merge conflicts automatically
Incident post-mortem writer: triggered on pipeline failures, generates structured post-mortems with root cause, timeline, and action items
Carbon-aware scheduling: Eco-Cop suggests running heavy CI jobs during hours when the local power grid uses renewable energy (using the WattTime API)