Inspiration

Every day, teams merge code blind. A 2-line auth change and a 200-line docs update get the same review process. Nobody knows which MRs spike CI bills until the monthly invoice arrives. Software sustainability is an afterthought. I built ShadowMerge to make risk, cost, and carbon visible — before you merge.

What it does

ShadowMerge is a multi-agent GitLab Duo Flow powered by Anthropic Claude that automatically forecasts the blast radius, CI cost, and carbon footprint of every Merge Request. Mention @ai-shadowmerge-gitlab-ai-hackathon in any MR comment, and a 4-agent pipeline activates:

  1. Blast Radius Profiler — reads MR diffs, checks vulnerabilities, loads team-configurable rules from .shadowmerge.yml, classifies risk (LOW/MEDIUM/HIGH)
  2. Green FinOps Estimator — reads historical pipeline baselines, forecasts CI runtime delta, pipeline cost, and CO₂ using Cloud Carbon Footprint methodology
  3. ShadowMerge Reporter — posts a formatted forecast comment, auto-labels the MR (risk tier + green score), creates a security tracking issue for HIGH risk MRs
  4. Security Advisor — reads actual high-risk source code files, identifies anti-patterns (hardcoded secrets, SQL injection, missing auth), assigns CVSS severity ratings, posts a detailed security advisory with remediation checklists
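
The hand-off between agents can be sketched roughly like this — field names here are illustrative, not the exact ShadowMerge schema:

```python
import json

# Hypothetical example of Agent 1's (Blast Radius Profiler) final answer.
# Field names are invented for illustration.
risk_output = json.dumps({
    "risk_tier": "HIGH",            # LOW | MEDIUM | HIGH
    "risk_score": 8.2,
    "matched_rules": ["auth-change", "secrets-touched"],
    "files_of_concern": ["app/auth/session.py"],
})

# A downstream agent receives the previous agent's final_answer as text
# and parses it back into structured data before acting on it.
risk = json.loads(risk_output)
needs_advisory = risk["risk_tier"] == "HIGH"
print(needs_advisory)  # True
```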

For a high-risk MR, ShadowMerge takes 7 automated actions: forecast comment, 3 labels, security issue, and security advisory comment.

How I built it

I used the GitLab Duo Agent Platform with 4 specialized Claude-powered agents orchestrated in a sequential flow. Each agent's output feeds the next via context:{name}.final_answer, enabling true multi-agent data passing — not just parallel calls.

The key architectural decisions:

  • Structured JSON contracts between agents: Agent 1 outputs a strict risk JSON schema, Agent 2 consumes it and outputs a finops JSON schema, Agent 3 reads both to generate actions. This chain-of-reasoning pattern lets each Claude agent build on the previous one's analysis.
  • Configurable behavior: Teams drop a .shadowmerge.yml file in their repo to customize risk rules, cost parameters, carbon intensity, and which actions ShadowMerge takes. The Blast Radius agent reads this config before every analysis.
  • Conditional execution: The Security Advisor only activates for HIGH risk MRs — Claude checks the risk tier and skips gracefully for low/medium, avoiding unnecessary processing.
  • Historical baselines: The Green FinOps agent reads baseline.json for data-driven estimates instead of static heuristics, with a BigQuery pipeline architecture designed for production deployment.
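
A `.shadowmerge.yml` might look like the sketch below — the key names are my illustration of the idea, not the exact schema ShadowMerge ships with:

```yaml
# Hypothetical .shadowmerge.yml (illustrative keys, not the real schema)
risk_rules:
  high_risk_paths:
    - "app/auth/**"
    - "db/migrations/**"
  docs_only_is_low_risk: true
finops:
  cost_per_runner_minute_usd: 0.008
  carbon_intensity_kg_per_kwh: 0.177   # e.g. GCP runners
actions:
  auto_label: true
  create_security_issue_on_high: true
```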

Carbon estimation follows Cloud Carbon Footprint methodology: energy per runner-minute (0.003 kWh) × grid carbon intensity (0.475 kgCO₂e/kWh global, 0.177 for GCP runners).
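
With the constants above, the estimate reduces to a one-line formula; this is a minimal sketch (the function name is mine, the numbers are from the methodology as quoted):

```python
ENERGY_KWH_PER_RUNNER_MINUTE = 0.003                # Cloud Carbon Footprint figure
GRID_INTENSITY = {"global": 0.475, "gcp": 0.177}    # kgCO2e per kWh

def estimate_co2_kg(runner_minutes: float, grid: str = "global") -> float:
    """Energy per runner-minute times grid carbon intensity."""
    return runner_minutes * ENERGY_KWH_PER_RUNNER_MINUTE * GRID_INTENSITY[grid]

# A 20-minute pipeline on the global average grid:
print(round(estimate_co2_kg(20), 5))         # 0.0285
# The same pipeline on GCP runners:
print(round(estimate_co2_kg(20, "gcp"), 5))  # 0.01062
```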

Challenges I ran into

  • Context passing: The biggest challenge was getting agents to access the MR. The flow's context:project_id needed to be explicitly passed as input to every component, and prompts had to instruct agents how to extract the MR IID from the trigger context.
  • Structured output enforcement: Getting Claude to consistently output valid JSON across 4 agents required careful prompt engineering with explicit schema definitions and "Output ONLY this JSON" constraints.
  • Sandbox limitations: The hackathon sandbox blocks external MCP servers (GITLAB_WORKFLOW_SANDBOX=true), so the BigQuery integration runs from a committed baseline file, with the CI pipeline architecture ready for production.
  • Catalog sync: The .ai-catalog-mapping.json file caused pipeline failures when agent files were renamed, which required manual cleanup and re-tagging.
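
For the structured-output problem, a defensive parse along these lines (my own sketch, not the flow's actual code) helps when a model wraps its JSON in prose or code fences:

```python
import json
import re

def extract_json(answer: str) -> dict:
    """Pull the first JSON object out of a model answer, tolerating
    markdown code fences and surrounding prose."""
    # Strip ```json ... ``` fences if present.
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", answer, re.DOTALL)
    candidate = fenced.group(1) if fenced else answer
    # Fall back to the outermost braces.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in answer")
    return json.loads(candidate[start:end + 1])

answer = 'Here is the forecast:\n```json\n{"risk_tier": "HIGH", "score": 8.2}\n```'
print(extract_json(answer)["risk_tier"])  # HIGH
```

Pairing a parser like this with "Output ONLY this JSON" prompt constraints gives two layers of defense instead of one.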

Accomplishments that I'm proud of

  • 4-agent sequential flow with real structured data passing between Claude agents
  • 7 automated actions on high-risk MRs (comment, 3 labels, issue, security advisory)
  • Claude performing deep code analysis — reading actual source files, identifying CVSS-rated vulnerabilities, generating specific remediation steps
  • Team-configurable risk rules via .shadowmerge.yml — not a one-size-fits-all tool
  • Green Score (A-F) with per-MR carbon footprint estimates and historical trend tracking
  • Production-grade GCP architecture with Workload Identity Federation and BigQuery schema
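
A letter-grade mapping like the Green Score can be sketched as a simple banding function — the thresholds below are invented for illustration, not ShadowMerge's actual bands:

```python
def green_score(co2_delta_kg: float) -> str:
    """Map a per-MR CO2 delta to an A-F grade.
    Band limits are hypothetical, for illustration only."""
    bands = [(0.005, "A"), (0.02, "B"), (0.05, "C"), (0.1, "D"), (0.25, "E")]
    for limit, grade in bands:
        if co2_delta_kg <= limit:
            return grade
    return "F"

print(green_score(0.01))  # B
```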

What I learned

  • Anthropic Claude excels at multi-step reasoning chains — having 4 agents pass structured data creates surprisingly sophisticated analysis compared to a single prompt
  • Prompt engineering for structured JSON output requires explicit schema definitions, "ONLY this JSON" constraints, and careful field naming
  • The context:{component}.final_answer pattern enables powerful multi-agent pipelines but requires careful coordination of data contracts between agents
  • Making carbon footprint visible per-MR is surprisingly actionable — developers immediately start asking "how do I get a better Green Score?"
  • Configurable risk rules (.shadowmerge.yml) are essential for real-world adoption — every team has different risk thresholds

What's next for ShadowMerge

  • ML-powered predictions: Train a model on historical pipeline data to replace heuristic multipliers with learned patterns
  • Slack/Teams integration: Real-time notifications for HIGH risk MRs
  • Team dashboards: Visualize risk distribution and carbon trends using auto-applied labels
  • Auto-suggest reviewers: Assign reviewers based on risk surface areas (auth expert for auth changes)
  • Custom risk rules marketplace: Share .shadowmerge.yml templates across teams

Built With

  • anthropic-claude
  • cloud-carbon-footprint
  • gcp-workload-identity-federation
  • gitlab-duo-agent-platform
  • google-cloud-bigquery
  • python
  • sql
  • yaml