Inspiration
Every software team ships code fast, but few ask: what's the real cost of this release?
We kept seeing the same patterns across organizations: merge requests merged without traceability to planning issues, CI pipelines burning compute with no caching, zombie review environments running for weeks after their MRs were closed, and security findings buried in scan reports that nobody read. The governance, sustainability, and security dimensions of software delivery are treated as afterthoughts: separate dashboards, separate teams, separate workflows.
We asked: what if a single AI agent could watch every merge request and assess it across all three ESG dimensions, then actually fix what it finds?
The GitLab Duo Agent Platform gave us the perfect foundation. With native access to 70+ GitLab API tools and Anthropic Claude built in, we could build something that doesn't just report problems but resolves them autonomously, auditably, and with minimal carbon footprint.
What it does
ESG-Ops is a multi-agent AI orchestrator that evaluates every merge request across three pillars:
Governance (Compliance Sentinel): Verifies MR-to-issue traceability, analyzes SAST/DAST findings, checks approval status, detects copyleft licenses, and generates a SHA-256 compliance snapshot for cryptographic auditability.
Environmental (Eco-Impact Optimizer): Estimates pipeline \( CO_2 \) emissions using a physics-based model:
$$CO_2 \text{ (grams)} = \frac{\text{duration (min)} \times \text{power (W)} \times PUE \times \text{carbon intensity (g/kWh)}}{60{,}000}$$
where power ranges from \( 10\text{W} \) (lint jobs) to \( 150\text{W} \) (GPU workloads), \( PUE = 1.1 \), and carbon intensity is \( 430 \text{ g } CO_2/\text{kWh} \) (IEA 2024 global average). The divisor \( 60{,}000 \) converts watt-minutes to kilowatt-hours. It also hunts zombie environments: idle review apps wasting cloud resources and energy.
Security (Security Auto-Fixer): Reads SAST/DAST findings, generates targeted code patches for vulnerabilities (SQL injection, XSS, hardcoded secrets, command injection, path traversal), and maps every fix to OWASP Top 10, CWE, SOC 2 Type II, and ISO 27001.
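As a rough illustration of the mapping step in the Security pillar, the five vulnerability classes named above can be tied to their CWE identifiers and OWASP Top 10 categories with a lookup table. This is a minimal sketch, not the project's actual mapper: the dictionary keys and the `map_finding` function are hypothetical names, the 2021 edition of the OWASP Top 10 is an assumption (the text does not say which edition is used), and the SOC 2 and ISO 27001 mappings are left out because the text does not name specific controls.

```python
# Hypothetical lookup table. CWE IDs are the standard ones for each class;
# OWASP categories assume the 2021 edition of the Top 10.
FINDING_MAP = {
    "sql_injection": {"cwe": "CWE-89", "owasp": "A03:2021 Injection"},
    "xss": {"cwe": "CWE-79", "owasp": "A03:2021 Injection"},
    "command_injection": {"cwe": "CWE-78", "owasp": "A03:2021 Injection"},
    "path_traversal": {"cwe": "CWE-22", "owasp": "A01:2021 Broken Access Control"},
    "hardcoded_secret": {
        "cwe": "CWE-798",
        "owasp": "A07:2021 Identification and Authentication Failures",
    },
}


def map_finding(kind: str) -> dict[str, str]:
    """Return the compliance references for one finding type."""
    if kind not in FINDING_MAP:
        # Fail loudly rather than silently emitting an unmapped fix.
        raise ValueError(f"unmapped finding type: {kind}")
    return {"kind": kind, **FINDING_MAP[kind]}


if __name__ == "__main__":
    print(map_finding("sql_injection"))
```

Raising on an unknown finding type keeps the audit trail honest: a fix without a compliance reference is surfaced instead of being recorded with an empty mapping.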
The ESG Orchestrator chains its components sequentially (Compliance Sentinel → Eco-Impact Optimizer → ESG Report Generator) and posts a unified ESG Scorecard (\( 0 \)–\( 100 \), graded A–F) as an MR comment.
Beyond the orchestrator, four standalone agents are available via the Duo Chat sidebar or @mention in MR comments. The Auto-Remediator creates branches, commits fixes, and opens MRs for CI/CD optimizations. The Security Auto-Fixer does the same for vulnerability patches, with full compliance documentation. The Zombie Environment Killer hunts idle cloud resources. The Impact Predictor simulates best/expected/worst scenarios across 5 dimensions over a 7-day horizon, identifying butterfly effects before code is merged.
Every decision is Ed25519-signed and hash-chained, producing a tamper-evident, independently verifiable audit trail inspired by SLSA provenance.
How we built it
We designed ESG-Ops as 7 declarative YAML flows running on the GitLab Duo flow registry v1 specification, powered by 7 specialized agents.
The architecture follows a sequential pipeline pattern: the ESG Orchestrator chains the Compliance Sentinel, Eco-Impact Optimizer, and ESG Report Generator in sequence, with each component's findings passed to the next. Four additional standalone agents (Auto-Remediator, Security Auto-Fixer, Zombie Environment Killer, and Impact Predictor) are independently invokable via the Duo Chat sidebar or by @mentioning their service account in MR comments.
Under the hood, we built 7 Python engines that power the agents:
- Carbon Calculator: Physics-based \( CO_2 \) estimation with per-job power profiles and relatable comparisons ("equivalent to charging 12 smartphones")
- Provenance Engine: Ed25519 key generation, signing, hash-chaining, and verification
- Cascade Scorer: A tiered analysis system (Skip/Quick/Full) that classifies changes and routes them to the appropriate depth of analysis, reducing the agent's own token consumption by up to \( 75\% \)
- Security Mapper: Automated mapping of findings to 4 compliance frameworks
- Auto-Remediator: YAML-aware patch generator that creates concrete .gitlab-ci.yml fixes
- Pipeline Optimizer: Detects anti-patterns like missing caches, non-interruptible jobs, and oversized images
- Dashboard Generator: Produces a static HTML dashboard with Chart.js visualizations, deployed via GitLab Pages
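To make the Pipeline Optimizer's anti-pattern detection concrete, here is a minimal sketch that scans an already-parsed .gitlab-ci.yml (a plain dictionary) for jobs without a cache and jobs that are not interruptible. The rule IDs CACHE-001 and INTERRUPT-001 appear elsewhere in this write-up, but their exact semantics, the function name `find_anti_patterns`, and the finding format are assumptions. The sketch also ignores `extends` and `include`, which a real checker would need to resolve.

```python
from typing import Any

# Top-level .gitlab-ci.yml keywords that are configuration rather than jobs.
RESERVED_KEYS = {
    "stages", "variables", "default", "include", "workflow", "cache",
    "image", "services", "before_script", "after_script",
}


def find_anti_patterns(ci_config: dict[str, Any]) -> list[dict[str, str]]:
    """Flag jobs with no cache and jobs not marked `interruptible: true`."""
    defaults = ci_config.get("default") or {}
    has_global_cache = "cache" in ci_config or "cache" in defaults
    default_interruptible = defaults.get("interruptible", False)

    findings = []
    for name, job in ci_config.items():
        # Skip reserved keywords and hidden template jobs (leading dot).
        if name in RESERVED_KEYS or name.startswith(".") or not isinstance(job, dict):
            continue
        if "cache" not in job and not has_global_cache:
            findings.append(
                {"rule": "CACHE-001", "job": name, "message": "no cache configured"}
            )
        if job.get("interruptible", default_interruptible) is not True:
            findings.append(
                {"rule": "INTERRUPT-001", "job": name, "message": "job is not interruptible"}
            )
    return findings


if __name__ == "__main__":
    config = {
        "stages": ["test"],
        "lint": {"script": ["ruff check ."], "interruptible": True},
        "unit": {"script": ["pytest"], "cache": {"paths": [".cache/pip"]}},
    }
    for finding in find_anti_patterns(config):
        print(finding)
```

In the sample configuration, `lint` is flagged for the missing cache and `unit` for not being interruptible. Attaching a stable rule ID to each finding is what lets a later patch reference the exact rule it fixes.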
The CI pipeline is self-auditing: it lints and tests the agents, validates all flow YAML schemas, runs the carbon calculator on its own pipeline, and deploys the ESG dashboard — all with caching, DAG optimization, and interruptible jobs.
We backed everything with 184 unit tests covering all engines.
Challenges we ran into
Balancing autonomy with safety. Auto-remediation is powerful but dangerous. We spent significant effort ensuring the Security Auto-Fixer never auto-applies a patch that could break functionality — it always creates a separate MR for human review. The Zombie Environment Killer recommends stops but never auto-deletes. Drawing the line between "helpful automation" and "reckless robot" required careful judgment at every step.
Carbon estimation without ground truth. There's no standard way to measure the \( CO_2 \) of a CI pipeline. We built our model from first principles (job type classification, power profiles based on Cloud Carbon Footprint methodology, PUE overhead factors), but validation is inherently difficult. We chose transparency over precision: every estimate shows its assumptions and methodology.
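The "transparency over precision" idea can be sketched by returning each estimate together with the inputs that produced it. The snippet below applies the formula and constants given under "What it does" (PUE of 1.1, 430 g/kWh); the function name `estimate_co2` and the shape of the returned dictionary are illustrative assumptions, not the Carbon Calculator's actual interface.

```python
def estimate_co2(
    duration_min: float,
    power_w: float,
    pue: float = 1.1,                 # data-center overhead factor, from the text
    carbon_intensity: float = 430.0,  # g CO2 per kWh, from the text
) -> dict:
    """Estimate grams of CO2 for one CI job and report the assumptions used."""
    # minutes * watts / 60,000 gives kWh (60 min per hour, 1,000 W per kW).
    energy_kwh = duration_min * power_w / 60_000
    grams = energy_kwh * pue * carbon_intensity
    return {
        "co2_grams": grams,
        "assumptions": {
            "duration_min": duration_min,
            "power_w": power_w,
            "pue": pue,
            "carbon_intensity_g_per_kwh": carbon_intensity,
        },
    }


if __name__ == "__main__":
    # The two power endpoints mentioned in the text: lint (10 W) and GPU (150 W).
    for label, minutes, watts in [("lint", 5, 10), ("gpu", 10, 150)]:
        result = estimate_co2(minutes, watts)
        print(f"{label}: {result['co2_grams']:.3f} g CO2", result["assumptions"])
```

For example, a 10-minute GPU job at 150 W uses 0.025 kWh, which becomes 0.025 × 1.1 × 430 ≈ 11.8 g of CO2. Because the assumptions travel with the number, a reader can swap in a regional carbon intensity and recompute.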
Making the agent practice what it preaches. It's ironic to build a sustainability agent that wastes tokens. The Cascade Scorer was born from this tension: we needed the agent to be intelligent enough to know when not to think. Classifying a whitespace-only commit as SKIP (\( 0 \) tokens) versus a dependency update as FULL (\( {\sim}5000 \) tokens) required careful heuristic design.
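A toy version of this tiering might look like the following. Only the two anchor cases come from the text (whitespace-only changes map to SKIP, dependency updates map to FULL); the rule that everything else is QUICK, the list of dependency manifest names, and the `classify_change` signature are assumptions made for illustration.

```python
from enum import Enum
from pathlib import PurePosixPath


class Tier(Enum):
    SKIP = "skip"    # no LLM analysis at all
    QUICK = "quick"  # lightweight analysis
    FULL = "full"    # full multi-agent analysis


# Assumed set of manifest file names that signal a dependency update.
DEPENDENCY_FILES = {
    "requirements.txt", "pyproject.toml", "package.json",
    "package-lock.json", "go.mod", "Cargo.toml",
}


def _without_whitespace(text: str) -> str:
    return "".join(text.split())


def classify_change(changes: list[tuple[str, str, str]]) -> Tier:
    """Classify a change set given (path, old_text, new_text) tuples."""
    # Whitespace-only: every file is identical once whitespace is removed.
    if all(_without_whitespace(old) == _without_whitespace(new)
           for _, old, new in changes):
        return Tier.SKIP
    # Any touched dependency manifest escalates to the deepest tier.
    if any(PurePosixPath(path).name in DEPENDENCY_FILES for path, _, _ in changes):
        return Tier.FULL
    return Tier.QUICK


if __name__ == "__main__":
    print(classify_change([("README.md", "a  b\n", "a b\n")]))
    print(classify_change([("requirements.txt", "requests==2.31.0\n", "requests==2.32.0\n")]))
    print(classify_change([("app/main.py", "x = 1\n", "x = 2\n")]))
```

One caveat with this naive whitespace check: in indentation-sensitive files such as Python or YAML, a whitespace-only edit can change behavior, so a production heuristic would likely exempt those file types from SKIP.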
Cryptographic provenance at the flow level. Implementing Ed25519 signing within the constraints of the Duo Agent Platform meant the provenance engine had to be stateless across flow steps. Each decision record carries enough context to reconstruct and verify the entire chain independently.
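The hash-chaining half of this design can be sketched with the standard library alone. In the snippet below each record stores the hash of its predecessor, so the whole chain can be re-verified from the records themselves without any shared state between steps. The record layout and the function names are hypothetical, and Ed25519 signing is deliberately omitted: in a fuller version each record hash would additionally be signed and the signature checked during verification.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body: dict) -> str:
    # Canonical JSON so the same content always hashes to the same value.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_record(chain: list[dict], decision: dict) -> list[dict]:
    """Return a new chain with `decision` linked to the previous record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {"decision": decision, "prev_hash": prev_hash}
    return chain + [{**body, "hash": _digest(body)}]


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash and link; any edit or reordering fails."""
    prev_hash = GENESIS
    for record in chain:
        body = {"decision": record["decision"], "prev_hash": record["prev_hash"]}
        if record["prev_hash"] != prev_hash or record["hash"] != _digest(body):
            return False
        prev_hash = record["hash"]
    return True


if __name__ == "__main__":
    chain: list[dict] = []
    chain = append_record(chain, {"agent": "compliance-sentinel", "verdict": "pass"})
    chain = append_record(chain, {"agent": "eco-impact-optimizer", "verdict": "warn"})
    print(verify_chain(chain))
```

Because `append_record` takes the chain as input and returns a new one, each flow step only needs the records handed to it, which matches the stateless constraint described above.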
Accomplishments that we're proud of
End-to-end auto-remediation: ESG-Ops doesn't just find problems; it opens MRs with fixes. The Auto-Remediation Engine generates concrete YAML patches with rule IDs (CACHE-001, INTERRUPT-001) for auditability. The Security Auto-Fixer produces code patches with full OWASP/CWE/SOC 2/ISO 27001 compliance documentation.
The agent is itself green: Cascade scoring reduces token usage by up to \( 75\% \). The CI pipeline uses slim images, cached dependencies, DAG optimization, and interruptible jobs. We estimated our own pipeline's carbon footprint and optimized it.
Cryptographic trust, not platform trust: Every ESG assessment produces an Ed25519-signed, hash-chained provenance bundle. You don't have to trust the agent; you can verify it mathematically.
184 tests passing: Every engine is thoroughly tested, from carbon math edge cases to provenance chain verification to security mapping coverage across all OWASP Top 10 categories.
Impact prediction with butterfly effects: The Impact Predictor doesn't just score risk; it identifies non-obvious second-order consequences of changes, simulating cascading effects across security, performance, compliance, carbon, and team velocity.
What we learned
ESG in DevOps is an unsolved and underserved problem. Most teams treat compliance, sustainability, and security as separate concerns with separate tools. Unifying them into a single score per merge request creates a fundamentally different relationship with release quality.
AI agents need to be auditable to be trustworthy. LLM-powered decisions in CI/CD pipelines are only acceptable if every decision can be independently verified. Cryptographic provenance transforms AI from "magic black box" to "verifiable advisor."
Sustainability is a design constraint, not a feature. Building a green agent forced us to think about efficiency at every level, from token usage to Docker image sizes to artifact retention policies. These constraints produced a better architecture overall.
The Duo Agent Platform is remarkably capable. Declarative YAML flows with 70+ built-in tools and native Claude integration meant we could focus on what the agents should do rather than how to wire them together. The platform handled LLM execution, tool orchestration, and trigger routing.
What's next for ESG-Ops Agent
- Real-time carbon dashboards: Stream per-pipeline carbon data to a time-series backend, enabling trend analysis and carbon budgets per team
- Policy-as-code ESG gates: Let teams define custom ESG thresholds (e.g., "block merges with score \( < 60 \)") as declarative policies in their repositories
- Multi-project ESG rollups: Aggregate scores across an entire GitLab group to give engineering leadership a portfolio-level ESG view
- Regional carbon intensity: Replace the global average with real-time grid carbon data (e.g., Electricity Maps API) based on runner location
- Learning from remediation outcomes: Track whether auto-fix MRs get merged or closed, and feed that signal back to improve future recommendations
- SBOM integration: Incorporate Software Bill of Materials analysis into the compliance pillar for deeper supply chain governance
