Every engineering team has merged a "quick infrastructure scaling" MR without realizing it added $700/month in cloud costs or 53 kg of monthly CO2 emissions. We've all done it. The cloud bill shows up a month later, and by then nobody remembers which MR caused it.

Meanwhile, regulations are catching up. California's SB 253 requires companies with over $1B revenue to report Scope 2 computing emissions by August 2026. The EU's CSRD requires companies with 250+ employees to disclose emissions under ESRS E1. Engineering teams will need auditable trails of infrastructure decisions tied to code changes.

We asked: what if governance happened automatically, at the merge request level, before the code ships?

What it does

Equilibrium is a 3-agent GitLab Duo Flow that audits every infrastructure merge request across three dimensions:

  • Cost: Calculates monthly cloud spend delta from Terraform diffs using GCP pricing data
  • Carbon: Estimates CO2 emissions using the formula:

$$\text{kgCO}_2\text{e/month} = \text{vCPUs} \times 0.007 \text{ kW} \times 730 \text{ hrs} \times 1.1 \text{ PUE} \times \frac{\text{gCO}_2\text{/kWh}}{1000}$$

  • Compliance: Checks approvals, linked issues, pipeline status, and security scan results
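The carbon formula translates directly to code. Here's a minimal sketch (the function name and the example grid intensity are illustrative, not the project's actual implementation):

```python
def monthly_carbon_kg(vcpus: int, grid_gco2_per_kwh: float,
                      kw_per_vcpu: float = 0.007,
                      hours_per_month: float = 730,
                      pue: float = 1.1) -> float:
    """Estimate monthly CO2e for an always-on instance.

    vCPUs x kW/vCPU x hours x PUE gives kWh/month; multiplying by
    grid intensity (gCO2/kWh) and dividing by 1000 converts to kg.
    """
    return vcpus * kw_per_vcpu * hours_per_month * pue * grid_gco2_per_kwh / 1000

# An 8-vCPU instance in a ~350 gCO2/kWh region:
# 8 * 0.007 * 730 * 1.1 * 350 / 1000 ≈ 15.74 kg CO2e/month
print(monthly_carbon_kg(8, 350))
```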

The flow reads a single Markdown policy file (GOVERNANCE.md) where teams set their priority (cost, sustainability, deadline, or balanced) and thresholds. The same MR gets a different verdict depending on the policy. Under balanced, a threshold breach means BLOCKED. Switch to priority: deadline and context: ship-by-friday, and the flow approves with conditions and creates a follow-up issue instead.
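As a sketch, a GOVERNANCE.md might look like the following (field names here are illustrative, not the project's exact schema):

```markdown
# Governance Policy

priority: balanced        # cost | sustainability | deadline | balanced
context: none             # e.g. ship-by-friday

## Thresholds
- max_monthly_cost_delta: $500
- max_monthly_carbon_delta: 25 kgCO2e
```

Changing `priority: balanced` to `priority: deadline` is a one-word diff that flips the verdict logic for every subsequent audit.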

The flow doesn't just comment. It takes autonomous action:

  • Posts structured audit reports with cost/carbon tables
  • Creates fix commits (e.g., migrating to a low-carbon GCP region)
  • Opens compliance tracking issues with labels
  • Maintains a running governance dashboard with budget forecasts

How we built it

Equilibrium is built entirely on the GitLab Duo Agent Platform using a chained flow architecture:

  1. Evidence Analyzer — Reads the MR diff, looks up instance pricing from data/pricing.json (with MCP fallback to live GCP billing APIs), calculates carbon using region-specific grid intensity from data/carbon.json (with MCP fallback to Climatiq), and checks compliance signals. Outputs a structured JSON evidence summary. Does not post comments or make decisions.

  2. Governance Reasoner — Reads GOVERNANCE.md, applies priority logic to the evidence, and determines the verdict. Posts a formatted audit comment on the MR. Creates fix commits when a cheaper or greener alternative exists. Opens compliance tracking issues when gaps are detected.

  3. Governance Tracker — Reads the reasoner's output, updates cumulative cost/carbon totals in data/dashboard.json, calculates budget forecasts, and posts a dashboard summary comment.
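The evidence summary handed from the Analyzer to the Reasoner might look like this (shape, field names, and values are illustrative):

```json
{
  "cost": {"monthly_delta_usd": 700, "source": "data/pricing.json"},
  "carbon": {"monthly_delta_kg": 53, "region": "us-central1",
             "source": "data/carbon.json"},
  "compliance": {"approvals": 1, "linked_issue": true,
                 "pipeline": "passed", "security_scan": "clean"}
}
```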

The agents are wired sequentially in flows/equilibrium-flow.yml:

evidence_analyzer → governance_reasoner → governance_tracker → end

Each agent has a single responsibility and a focused toolset. The Evidence Analyzer can read files and diffs but cannot post comments. The Reasoner can post comments and create commits but doesn't collect evidence. This separation makes the system predictable and debuggable.

Key design decisions:

  • Policy as code: GOVERNANCE.md is a plain Markdown file anyone can read and edit. No YAML configs, no external dashboards. Commit a change, and the next audit uses the new rules.
  • MCP with fallback: The flow tries live data sources first (Google Cloud MCP for pricing, Climatiq MCP for carbon intensity) and falls back to static JSON files. This means it works offline and in demo environments.
  • GPU-aware carbon: The Evidence Analyzer detects GPU instance types (a2-, g2-, a3-) and applies a $2.5\times$ energy multiplier for accurate carbon estimation.
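The GPU adjustment reduces to a prefix check on the machine type. A sketch, reusing the 0.007 kW/vCPU baseline from the formula above (helper name illustrative):

```python
GPU_FAMILIES = ("a2-", "g2-", "a3-")  # GCP GPU machine-type prefixes
GPU_ENERGY_MULTIPLIER = 2.5

def effective_kw_per_vcpu(machine_type: str, base_kw: float = 0.007) -> float:
    """Apply the GPU energy multiplier when the machine type is GPU-backed."""
    if machine_type.startswith(GPU_FAMILIES):
        return base_kw * GPU_ENERGY_MULTIPLIER
    return base_kw
```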

Challenges we faced

Getting the agent chain right. The hardest part was ensuring clean data handoff between agents. The Evidence Analyzer's output needs to be structured enough for the Reasoner to parse, but the platform passes it as a string. We solved this by having the Evidence Analyzer emit both a human-readable summary and a JSON block, and instructing the Reasoner to parse the JSON.
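The extraction step can be sketched in a few lines (helper name is illustrative; it assumes the Analyzer emits its JSON in a fenced block, as described above):

```python
import json
import re

def extract_json_block(agent_output: str) -> dict:
    """Pull the first fenced JSON block out of a free-text agent reply."""
    match = re.search(r"`{3}json\s*(\{.*?\})\s*`{3}", agent_output, re.DOTALL)
    if not match:
        raise ValueError("no JSON evidence block found in agent output")
    return json.loads(match.group(1))

# The Analyzer's reply mixes prose with a fenced JSON block:
fence = "`" * 3
reply = f"Cost delta looks high.\n{fence}json\n" + \
        '{"cost": {"monthly_delta_usd": 700}}' + f"\n{fence}"
evidence = extract_json_block(reply)
```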

Priority logic complexity. The interaction between priority and context creates a matrix of behaviors. Under priority: deadline + context: ship-by-friday, the Reasoner must approve with conditions instead of blocking, and create follow-up issues instead of fix commits. Getting the prompt to reliably follow this logic across edge cases took significant iteration.

Carbon calculation accuracy. Real-world carbon accounting is complex. We simplified to a per-vCPU model using publicly available grid intensity data, but had to account for PUE (Power Usage Effectiveness), GPU workloads, and regional variation. The formula is transparent and shown in every audit comment so teams can verify it.

Dashboard state persistence. The Governance Tracker needs to read and update data/dashboard.json on every run. If two flows run concurrently, they could overwrite each other. We mitigated this by having the Tracker read from main and commit back, relying on Git's conflict detection.

What we learned

  • Policy-as-code changes everything. Making governance a Markdown file that lives in the repo means developers actually read it and change it. It's version-controlled, diff-able, and reviewable in MRs.
  • Agent separation matters. A single agent trying to collect evidence, reason about policy, and update dashboards produces inconsistent results. Three focused agents with clear boundaries are far more reliable.
  • The "same code, different verdict" moment is powerful. Showing that changing one word in a policy file changes the AI's reasoning from BLOCKED to APPROVED WITH CONDITIONS is the clearest demonstration of why agentic flows matter.
  • Sustainability in software is an unsolved problem. Most engineering teams have zero visibility into the carbon impact of their infrastructure decisions. The tooling gap is enormous, and regulations are arriving faster than solutions.

What's next

  • Auto-close compliance issues when the MR receives the required approvals
  • Multi-cloud support for AWS and Azure pricing/carbon data
  • GitLab Pages dashboard with historical trends and team-level carbon budgets
  • Slack/Discord notifications for BLOCKED verdicts
  • Integration with real Terraform state to compare planned vs. actual resource usage
