Inspiration
Every team I've worked on has the same ritual when a pipeline goes red - someone drops what they're doing, digs through logs, traces the offending commit, figures out what else might break, writes up an issue, maybe drafts a fix. It eats 30 to 60 minutes every time, and the steps are almost identical. I kept thinking: if the pattern is this predictable, why are humans still doing it?
When GitLab opened up the Duo Agent Platform, it clicked. I could build a chain of specialized agents - each one handling a specific slice of the investigation - and wire them together into a flow that runs the moment a pipeline fails. No one has to context-switch. No one has to dig. The system just handles it.
The carbon angle came from a side conversation about how much compute we waste on failed CI runs. It's not just time - it's electricity, emissions, money. Once I realized I could quantify that per incident, it felt irresponsible not to.
What it does
AIO watches for pipeline failures via a GitLab webhook. When one hits, it kicks off a chain of 7 agents:
- Log Analysis reads the CI output and classifies the failure - syntax error, missing dependency, timeout, flaky test, infra issue
- Commit Correlation figures out which commit most likely caused it, using file overlap, recency, and commit message heuristics scored by AI
- Blast Radius parses the CI config to map which downstream jobs and services are affected
- Confidence Engine runs a weighted formula across all three signals and decides what action is safe to take - auto-fix, draft for review, or just investigate
- Narrator writes a 5-Whys postmortem with root cause, timeline, and actionable recommendations
- Carbon & Compute calculates the CI minutes wasted, grams of CO2, dollar cost, and equivalent car miles
- GitLab Actor creates the actual artifacts - an issue with the full postmortem, a pipeline comment, and optionally a hotfix MR (draft or ready depending on confidence)
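The Carbon & Compute step boils down to a few multiplications over the wasted CI minutes. A minimal TypeScript sketch of that math, where the emission, power, and cost factors are assumed illustrative constants (AIO's actual factors aren't shown in this writeup):

```typescript
// Sketch of the Carbon & Compute agent's math. All four constants are
// illustrative assumptions, not AIO's real configuration values.
const GRID_KG_CO2_PER_KWH = 0.4;   // assumed grid carbon intensity
const RUNNER_KW = 0.05;            // assumed average CI runner draw (50 W)
const COST_PER_CI_MINUTE = 0.008;  // assumed price per CI minute, USD
const KG_CO2_PER_CAR_MILE = 0.4;   // rough figure for an average car

interface CarbonReport {
  ciMinutesWasted: number;
  gramsCO2: number;
  dollarCost: number;
  carMilesEquivalent: number;
}

function carbonReport(wastedMinutes: number): CarbonReport {
  const kwh = (wastedMinutes / 60) * RUNNER_KW;   // energy burned by the run
  const kgCO2 = kwh * GRID_KG_CO2_PER_KWH;        // emissions from that energy
  return {
    ciMinutesWasted: wastedMinutes,
    gramsCO2: kgCO2 * 1000,
    dollarCost: wastedMinutes * COST_PER_CI_MINUTE,
    carMilesEquivalent: kgCO2 / KG_CO2_PER_CAR_MILE,
  };
}
```

With the assumed constants, a one-hour wasted run works out to roughly 20 g of CO2; the point is less the absolute numbers than attaching a concrete figure to every incident.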
The whole chain runs in about 14 seconds. There's a real-time dashboard built in Next.js that shows the incident feed, analytics, failure trends, and carbon metrics.
How we built it
The agent definitions are pure YAML using the GitLab Duo Agent Platform v1 spec - 7 agent prompts with structured JSON contracts and a sequential flow. Each agent gets context from its predecessors via context:<agent>.final_answer bindings.
The backend is Node.js with Express and TypeScript, deployed on Cloud Run. It receives the GitLab webhook, queues the event through Pub/Sub, and orchestrates the agent chain. Gemini 2.0 Flash handles the fast classification tasks; Claude Sonnet 4 kicks in for deeper reasoning like postmortems. There's a provider-agnostic router so you can switch with an env var.
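The router is what makes the env-var switch possible. A minimal sketch of the idea, where the ModelProvider interface, the stub providers, and the MODEL_PROVIDER variable name are all illustrative assumptions rather than AIO's actual API:

```typescript
// Sketch of a provider-agnostic model router selected by env var.
// Interface and names are assumptions for illustration, not AIO's code.
type Task = "classify" | "reason";

interface ModelProvider {
  complete(prompt: string): Promise<string>;
}

// Stubs standing in for the real Gemini / Claude SDK calls.
const gemini: ModelProvider = { complete: async (p) => `gemini:${p}` };
const claude: ModelProvider = { complete: async (p) => `claude:${p}` };

function routeModel(task: Task): ModelProvider {
  // Setting MODEL_PROVIDER forces every task through one provider;
  // otherwise fast classification goes to Gemini, deep reasoning to Claude.
  const forced = process.env.MODEL_PROVIDER;
  if (forced === "claude") return claude;
  if (forced === "gemini") return gemini;
  return task === "classify" ? gemini : claude;
}
```

Keeping the agents coded against a single interface means swapping providers never touches orchestration logic, only the router.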
State lives in Firestore for real-time subscriptions and BigQuery for analytics. The dashboard is Next.js 14 with Tailwind, using SWR for data fetching and Recharts for visualization. Auth is NextAuth with three RBAC tiers.
The monorepo is managed with npm workspaces and Turborepo. Deployment is a single script that sets up all GCP resources, builds via Cloud Build, and deploys both services to Cloud Run.
Challenges we ran into
Getting the confidence formula right was the hardest part. Early versions were either too aggressive (auto-fixing things that shouldn't be auto-fixed) or too conservative (investigating everything). The current weighted approach - 35% log signal, 40% commit signal, 25% blast radius - took a lot of iteration with real failure scenarios to feel trustworthy.
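The weights above are the real ones; the signal names and the cutoffs between auto-fix, draft, and investigate are assumptions for illustration. A sketch of the gate:

```typescript
// Sketch of the confidence gate. Weights (0.35 / 0.40 / 0.25) come from
// the writeup; the 0.8 and 0.5 action thresholds are assumed examples.
interface Signals {
  log: number;    // 0..1 confidence from Log Analysis
  commit: number; // 0..1 confidence from Commit Correlation
  blast: number;  // 0..1 safety signal from Blast Radius (high = small radius)
}

type Action = "auto-fix" | "draft-for-review" | "investigate";

function decide(s: Signals): { score: number; action: Action } {
  const score = 0.35 * s.log + 0.4 * s.commit + 0.25 * s.blast;
  const action: Action =
    score >= 0.8 ? "auto-fix" :
    score >= 0.5 ? "draft-for-review" :
    "investigate";
  return { score, action };
}
```

Because every score maps to some action, the chain always produces something useful rather than stalling on low confidence.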
Structured JSON output from LLMs was another headache. Agents would sometimes return markdown-wrapped JSON or add commentary. Zod validation at every boundary catches this, and the deterministic fallback templates mean the chain never stalls even if the AI misbehaves.
Making the agents work well within GitLab Duo's tool constraints required careful scoping - each agent only gets the specific Duo tools it actually needs, and the prompt engineering had to be tight to get reliable JSON outputs with minimal hallucination.
Accomplishments that we're proud of
The confidence-gated safety system is probably the thing I'm most proud of. AI-powered auto-remediation sounds great in a pitch, but shipping it without guardrails would be reckless. The three-tier system (auto, draft, investigate) means the system is genuinely useful at every confidence level - it's never just shrugging and saying "I don't know."
The carbon tracking turned out to be more impactful than I expected. Seeing actual CO2 numbers next to each incident makes CI waste visceral in a way that "pipeline failed" doesn't.
The whole thing running end-to-end in 14 seconds - from webhook to GitLab issue with postmortem, confidence score, carbon metrics, and optionally a hotfix MR - still feels satisfying every time.
What we learned
Prompt engineering for structured output is a completely different discipline from conversational prompting. The system prompts for these agents are tight, example-driven, and leave zero room for the model to freestyle.
Building on the Duo Agent Platform taught me a lot about designing agent-to-agent contracts. When agents pass structured data to each other, the schema IS the API - get it wrong and the whole chain falls apart.
Also learned that green computing metrics are surprisingly easy to add and surprisingly effective at changing behavior. People pay attention when you show them their pipelines are burning real carbon.
What's next for AIO
- Self-learning loop - feed resolved incident outcomes back into the confidence model so it calibrates over time
- Multi-pipeline correlation - detect when failures across different projects share a common root cause
- Confidence threshold UI - let teams tune the auto-fix/draft/investigate thresholds from the dashboard instead of env vars
- GitLab Duo catalog integration - publish the agent flow to the Duo agent catalog so any GitLab project can enable it with one click
Built With
- bigquery
- claude
- cloudbuild
- cloudrun
- docker
- express.js
- firestore
- gemini
- gitlabduo
- next.js
- nextauth
- node.js
- pubsub
- react
- recharts
- swr
- tailwindcss
- turborepo
- typescript
- zod
