DeployGuard


Inspiration

Modern software development moves at breakneck speed. A developer might write a perfectly valid SQL migration or a clean REST API endpoint that passes all unit tests, but if they forget to document a rollback plan, omit a feature flag, or skip a runbook update, the deployment becomes dangerously fragile.

Worse, resource-heavy code patterns — unbounded queries, infinite loops, always-on infrastructure — silently slip into production, degrading system performance and increasing cloud costs.

The Gap: Why native GitLab isn't enough

GitLab's existing features are exceptional at access control and manual gates (who can deploy, who must approve). DeployGuard introduces intelligent content analysis: what changed, what's missing, and how risky it is. The two are complementary, not overlapping.

What native GitLab does not provide:

Automated operational readiness analysis of MR diffs
Auto-generated deployment checklists based on change types
Deployment risk scoring (0-100) based on change complexity
Sustainability/green scoring for cloud deployments
Automatic follow-up issue creation for missing operational requirements
Intelligent classification of changes into operational categories

What it does

DeployGuard acts as an AI-powered Site Reliability Engineer living directly inside your Merge Requests. It analyzes diffs, generates operational checklists, computes risk and sustainability scores, and enforces deployment gates — all autonomously, triggered by the CI pipeline on every MR event.

The DeployGuard Advantage

Unlike standard CI/CD security scanners or basic manual approval thresholds, DeployGuard adds four capabilities to the deployment workflow:

Pioneering Green Computing: Standard pipelines ignore the environmental and operational cost of newly merged code. DeployGuard evaluates code patterns (such as heavy payload queries) to compute a Green Score, helping organizations reduce their cloud carbon footprint.
Deep Semantic Code Understanding: Rather than only scanning for security vulnerabilities, DeployGuard reviews changes the way a Site Reliability Engineer would. It uses Vertex AI to categorize what the code is doing (e.g., recognizing a db_migration) and then demands the matching operational checklist items, such as a "Rollback_Plan".
Deterministic "No-Hallucination" Math: We do not let a conversational LLM make the deployment decision. The AI only supplies semantic categorizations, which are passed to a deterministic Node.js scoring engine that computes the 0-100 Risk and Green scores. The same inputs always produce the same scores, which makes DeployGuard much safer for enterprise deployment gating.
Single-Agent Cost Efficiency: DeployGuard performs its operational review with a single agent pipeline, which reduces AI token costs and pipeline latency compared to multi-agent setups.

Key Features

1. Vertex AI Primary Path (Gemini 2.5 Flash): Every MR automatically triggers a GitLab CI job that authenticates to GCP via Workload Identity Federation, calls Vertex AI Gemini 2.5 Flash for semantic diff analysis, and posts a full DeployGuard report to the MR thread. No manual trigger is needed.

2. GitLab Duo Automatic Fallback (Anthropic): If the Vertex AI path fails for any reason (auth error, model error, timeout, empty response), entrypoint.ts automatically posts a fallback comment mentioning @ai-deploy-guard-analysis-gitlab-ai-hackathon. This triggers the GitLab Duo agent (powered by Anthropic) to perform the review instead, so the MR author still gets a report.

3. Hybrid Semantic Diff Analysis: Rule-based parsing combined with Gemini 2.5 Flash semantic categorization classifies every file change into operational categories (new_api_endpoint, db_migration, infrastructure_modification, config_change, dependency_update).

4. Dynamic Checklists: These are not static PR templates. If DeployGuard detects a database migration, it demands a rollback plan. If it detects a new API, it requires a feature flag and runbook update. Checklists are generated per MR based on what actually changed.

5. Dual-Scoring Engine

Risk Score (0-100): Weighted combination of code complexity, unmet checklist items, and historical deployment failure rates for similar change types.
Green Score (0-100): Penalizes resource-heavy anti-patterns — unbounded loops, unpaginated SELECT * queries, always-on infrastructure additions. Rewards net code deletion.

6. Automated Follow-Up Issues: When critical thresholds aren't met, DeployGuard opens linked GitLab tracking issues for each unmet checklist item, so gaps become actionable project management tickets rather than just a comment.

7. Configurable Deployment Gating: A .deployguard.yml file lets teams map risk levels to enforcement policies (block, warn, allow). A block policy fails the CI job with exit code 2, which halts the dangerous MR.

8. Secret Scrubbing: All diff content is sanitized before reaching any AI endpoint. AWS keys, Bearer tokens, passwords, and environment variable secrets are redacted.
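
To make the secret scrubbing step more concrete, here is a minimal TypeScript sketch of regex-based redaction. The rule names, the regular expressions, and the scrubSecrets function are assumptions made for illustration; they are not the patterns DeployGuard actually ships.

```typescript
// Hypothetical redaction rules; the real DeployGuard patterns are not shown in this write-up.
interface ScrubRule {
  name: string;
  pattern: RegExp;
  replacement: string;
}

const SCRUB_RULES: ScrubRule[] = [
  // AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
  { name: "aws_access_key_id", pattern: /\bAKIA[0-9A-Z]{16}\b/g, replacement: "[REDACTED_AWS_KEY]" },
  // Bearer tokens, for example inside an Authorization header.
  { name: "bearer_token", pattern: /\bBearer\s+[A-Za-z0-9\-._~+\/]+=*/g, replacement: "Bearer [REDACTED]" },
  // NAME=value or NAME: value assignments whose name suggests a secret.
  {
    name: "secret_assignment",
    pattern: /\b([A-Z0-9_]*(?:PASSWORD|SECRET|TOKEN|API_KEY)[A-Z0-9_]*)\s*([=:])\s*\S+/gi,
    replacement: "$1$2[REDACTED]",
  },
];

// Applies every rule in order and returns the sanitized diff text.
function scrubSecrets(diff: string): string {
  return SCRUB_RULES.reduce(
    (text, rule) => text.replace(rule.pattern, rule.replacement),
    diff
  );
}
```

A sketch like this deliberately errs on the side of over-redaction: a harmless line such as a variable named token_count would also be masked, which is usually an acceptable trade-off before sending text to an external model.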

How we built it

DeployGuard is architected across three decoupled layers:

Layer 1 — Vertex AI CI Pipeline (Primary Path)

The primary path runs automatically on every MR event via .gitlab-ci.yml. The authentication flow was the hardest part to get right:

GitLab OIDC token (id_tokens block, hardcoded audience)
        ↓
STS REST API — exchange for federated token with cloud-platform scope
        ↓
IAM Credentials REST API — impersonate SA, get cloud-platform scoped token
        ↓
GOOGLE_OAUTH_ACCESS_TOKEN → entrypoint.ts → Vertex AI REST API

We deliberately bypass gcloud entirely. The reason: gcloud auth print-access-token silently ignores --scopes for external account (WIF) credentials, returning a token with the runner's default scopes instead of cloud-platform. By calling the STS and IAM Credentials REST APIs directly via curl, we explicitly request the correct scope at each step.
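
The two REST calls can be sketched as plain request descriptions in TypeScript. This is only an illustration of the exchange described above: the pipeline itself uses curl, and the HttpRequest type and both function names are hypothetical. The form fields follow the OAuth token exchange format that the STS endpoint accepts.

```typescript
// Plain description of an HTTP request; pass it to fetch(url, { method, headers, body }).
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform";

// Step 1: exchange the GitLab OIDC JWT for a federated token via the STS API.
// `audience` is the full Workload Identity Federation provider resource name.
// The JSON response carries the federated token in its "access_token" field.
function buildStsExchangeRequest(gitlabOidcJwt: string, audience: string): HttpRequest {
  const fields: Array<[string, string]> = [
    ["grant_type", "urn:ietf:params:oauth:grant-type:token-exchange"],
    ["audience", audience],
    ["scope", CLOUD_PLATFORM_SCOPE],
    ["requested_token_type", "urn:ietf:params:oauth:token-type:access_token"],
    ["subject_token_type", "urn:ietf:params:oauth:token-type:jwt"],
    ["subject_token", gitlabOidcJwt],
  ];
  const body = fields
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");
  return {
    url: "https://sts.googleapis.com/v1/token",
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body,
  };
}

// Step 2: use the federated token to mint a service account access token.
// The JSON response carries the final token in its "accessToken" field.
function buildImpersonationRequest(federatedToken: string, serviceAccountEmail: string): HttpRequest {
  return {
    url:
      "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/" +
      serviceAccountEmail +
      ":generateAccessToken",
    method: "POST",
    headers: {
      Authorization: "Bearer " + federatedToken,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ scope: [CLOUD_PLATFORM_SCOPE] }),
  };
}
```

Note that the scope is stated explicitly in both requests, which is exactly the control that gcloud did not give us.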

src/vertex-ai-client.ts makes direct HTTPS calls to the Vertex AI REST endpoint:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/{project}/locations/us-central1/publishers/google/models/gemini-2.5-flash:generateContent

We also bypass the @google-cloud/vertexai SDK for the same reason — the SDK ignores GOOGLE_OAUTH_ACCESS_TOKEN and attempts its own auth flow, which fails in the WIF context.
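
As a rough sketch of what such a direct call involves, the TypeScript below builds the generateContent request and reads the text out of the response. The function names, the options object, and the generationConfig values are assumptions for this example; the actual src/vertex-ai-client.ts also adds retry, timeout, and secret scrubbing, which are omitted here.

```typescript
interface VertexRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

interface VertexOptions {
  project: string;
  location: string; // e.g. "us-central1"
  model: string; // e.g. "gemini-2.5-flash"
  accessToken: string; // the token exported as GOOGLE_OAUTH_ACCESS_TOKEN
  prompt: string;
}

// Builds the REST request; the caller passes the result to fetch().
function buildGenerateContentRequest(opts: VertexOptions): VertexRequest {
  const url =
    `https://${opts.location}-aiplatform.googleapis.com/v1/projects/${opts.project}` +
    `/locations/${opts.location}/publishers/google/models/${opts.model}:generateContent`;
  return {
    url,
    method: "POST",
    headers: {
      Authorization: `Bearer ${opts.accessToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      contents: [{ role: "user", parts: [{ text: opts.prompt }] }],
      // Assumed settings: low temperature keeps categorization output stable.
      generationConfig: { temperature: 0 },
    }),
  };
}

// Only the response fields this sketch reads.
interface GenerateContentResponse {
  candidates?: Array<{ content?: { parts?: Array<{ text?: string }> } }>;
}

// Joins the text parts of the first candidate; an empty result is treated as a failure.
function extractVertexText(response: GenerateContentResponse): string {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  const text = parts.map((part) => part.text ?? "").join("");
  if (text.trim() === "") {
    throw new Error("Vertex AI returned an empty response");
  }
  return text;
}
```

Because the Authorization header is set by hand, the token obtained from the WIF exchange is used as-is and no SDK auth discovery runs.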

Vertex AI CI Path vs Duo Agent Tools — Honest Comparison

A natural question is: can the Vertex AI CI path do everything the Duo agent tools do? Mostly yes — with one honest gap.

What the Duo agent tools give you natively:

build_review_merge_request_context — fetches the actual MR diff securely within GitLab's trust boundary
create_merge_request_note — posts back to the MR thread
list_security_findings / list_vulnerabilities — reads GitLab security scan results (Ultimate tier)
create_issue — opens linked follow-up issues
All of this happens inside GitLab's auth context — no tokens needed

What the Vertex AI CI path can do:

Call Gemini for semantic analysis ✓ (same or better model)
Post MR comments via GitLab REST API using GITLAB_API_TOKEN ✓
Create issues via GitLab REST API ✓
Read the full MR diff via /projects/:id/merge_requests/:iid/changes ✓

The one gap: list_security_findings and list_vulnerabilities are GitLab Ultimate tier Duo-specific tools with no direct public REST API equivalent. The Vertex AI path cannot read GitLab security scan results.

For the core review flow — diff reading, Gemini analysis, comment posting, issue creation — the Vertex AI CI path is functionally equivalent to the Duo agent and in some ways more powerful (explicit model selection, deterministic scoring, configurable gating via exit codes). The security findings integration is the only capability that requires the Duo agent path.

This is why the dual-path architecture is the right design: Vertex AI handles the primary analysis with full control, Duo handles the fallback and provides the security findings layer that the REST API cannot replicate.

Layer 2 — GitLab Duo Agent (Automatic Fallback)

The Duo agent (agent.yml / flow.yml) serves as the automatic fallback. A deeply customized system prompt with XML-tagged <execution_instructions> forces deterministic tool execution:

build_review_merge_request_context — securely fetches the raw MR diff
create_merge_request_note — posts the formatted Markdown report
create_issue — opens follow-up tracking issues for unmet checklist items

The fallback is triggered automatically by entrypoint.ts when Vertex AI fails — no human intervention required.
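
The fallback decision itself is small. The following TypeScript sketch shows one way it could look, with the Vertex AI call and the MR note posting injected as functions. Only the agent handle comes from this write-up; ReviewDeps, buildFallbackNote, and reviewWithFallback are hypothetical names, and the wording of the fallback note is invented for the example.

```typescript
// Agent handle as given in this write-up.
const DUO_AGENT_HANDLE = "@ai-deploy-guard-analysis-gitlab-ai-hackathon";

interface ReviewDeps {
  analyzeWithVertex: () => Promise<string>; // resolves to the Markdown report
  postMrNote: (body: string) => Promise<void>; // posts a note on the MR
}

// The mention at the start of the note is what triggers the Duo agent.
function buildFallbackNote(reason: string): string {
  return (
    DUO_AGENT_HANDLE +
    " DeployGuard could not complete the Vertex AI review (" +
    reason +
    "). Please review this merge request."
  );
}

// Returns which path produced the review.
async function reviewWithFallback(deps: ReviewDeps): Promise<"vertex" | "duo"> {
  let report = "";
  let failure: string | null = null;
  try {
    report = await deps.analyzeWithVertex();
    if (report.trim() === "") failure = "empty response";
  } catch (error) {
    // Auth errors, model errors, and timeouts all end up here.
    failure = error instanceof Error ? error.message : String(error);
  }
  if (failure !== null) {
    await deps.postMrNote(buildFallbackNote(failure));
    return "duo";
  }
  await deps.postMrNote(report);
  return "vertex";
}
```

Keeping the note posting outside the try block means a failure to post the Vertex AI report is not mistaken for a Vertex AI failure.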

Layer 3 — TypeScript Deterministic Engine (`src/`)

A Node.js backend built from single-purpose classes:

DiffAnalyzer — hybrid rule + AI classification
ChecklistGenerator — category-driven checklist generation with AI augmentation
RiskScorer — weighted math engine (complexity + unmet items + history)
GreenScorer — resource pattern detection and penalty scoring
GateEnforcer — policy evaluation, exit code 2 on block
CommentFormatter — Markdown report generation
HistoryStore — deployment outcome tracking
VertexAIClient — direct HTTPS to Gemini with retry, timeout, and secret scrubbing
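
To show how these pieces fit together, here is a compact TypeScript sketch of the rule-based part of the flow: classify a path, derive checklist items, compute a risk score, and map it to an exit code. The category names, the two checklist mappings, and exit code 2 come from this write-up. The path patterns, weights, and thresholds are hypothetical placeholders, and the functions stand in for the real classes.

```typescript
type Category =
  | "new_api_endpoint"
  | "db_migration"
  | "infrastructure_modification"
  | "config_change"
  | "dependency_update";

// DiffAnalyzer, rule-based half: hypothetical path patterns, first match wins.
const PATH_RULES: Array<{ pattern: RegExp; category: Category }> = [
  { pattern: /(^|\/)migrations\/|\.sql$/, category: "db_migration" },
  { pattern: /(^|\/)(routes|controllers|api)\//, category: "new_api_endpoint" },
  { pattern: /\.tf$|(^|\/)Dockerfile$/, category: "infrastructure_modification" },
  { pattern: /(^|\/)package(-lock)?\.json$/, category: "dependency_update" },
  { pattern: /\.(ya?ml|env|ini)$/, category: "config_change" },
];

function classifyPath(path: string): Category | null {
  for (const rule of PATH_RULES) {
    if (rule.pattern.test(path)) return rule.category;
  }
  return null; // no rule matched; the hybrid design would defer to the AI classifier
}

// ChecklistGenerator: only the two mappings named in this write-up are filled in.
const REQUIRED_ITEMS: Record<Category, string[]> = {
  db_migration: ["rollback_plan"],
  new_api_endpoint: ["feature_flag", "runbook_update"],
  infrastructure_modification: [],
  config_change: [],
  dependency_update: [],
};

function requiredItems(categories: Category[]): string[] {
  const items: string[] = [];
  for (const category of categories) {
    for (const item of REQUIRED_ITEMS[category]) {
      if (items.indexOf(item) === -1) items.push(item);
    }
  }
  return items;
}

// RiskScorer: three inputs, each normalized to the range 0..1.
interface RiskInputs {
  complexity: number;
  unmetRatio: number; // unmet checklist items divided by total items
  historicalFailureRate: number; // for similar change types
}

// Hypothetical weights; the write-up names the factors but not their weights.
const RISK_WEIGHTS = { complexity: 0.4, unmet: 0.4, history: 0.2 };

function riskScore(inputs: RiskInputs): number {
  const clamp = (x: number): number => Math.min(1, Math.max(0, x));
  const raw =
    RISK_WEIGHTS.complexity * clamp(inputs.complexity) +
    RISK_WEIGHTS.unmet * clamp(inputs.unmetRatio) +
    RISK_WEIGHTS.history * clamp(inputs.historicalFailureRate);
  return Math.round(raw * 100);
}

// GateEnforcer: hypothetical thresholds standing in for a parsed .deployguard.yml.
type Policy = "allow" | "warn" | "block";

function gatePolicy(score: number): Policy {
  if (score >= 70) return "block";
  if (score >= 40) return "warn";
  return "allow";
}

// The entrypoint would pass this value to process.exit().
function gateExitCode(policy: Policy): number {
  return policy === "block" ? 2 : 0;
}
```

Every step after classification is pure arithmetic and table lookup, which is what keeps the gate decision reproducible for a given set of categories.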

Challenges we ran into

Getting the GitLab CI pipeline working end-to-end in the hackathon sandbox was a gauntlet of compounding failures. Every layer had a hidden problem. Here's the full journey in order — nine separate issues, each one only visible after fixing the previous:

1. TypeScript execution — `ERR_UNKNOWN_FILE_EXTENSION`

The very first CI run failed before any application code ran: Node couldn't execute .ts files directly. The original script used ts-node --esm, which conflicts with "type": "commonjs" in package.json. Fix: switched to npx tsx src/entrypoint.ts. tsx was already in the lock file, so no package changes were needed.

2. MR comments silently failing

The pipeline ran successfully but no comment appeared on the MR. gitlabRequest() wasn't checking HTTP status codes, so 401/403 errors were swallowed silently. Added status code checking. The actual root cause was a missing GITLAB_API_TOKEN — the hackathon sandbox's free tier has no project access tokens, so a personal access token with api scope was needed as a CI variable.
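
A status check along these lines is enough to surface the problem. This TypeScript sketch is an assumption about the shape of the fix, not the project's actual gitlabRequest(): the signature, the ResponseLike and SendFn types, and the error wording are all hypothetical. The PRIVATE-TOKEN header is the standard way to send a GitLab personal access token.

```typescript
// Minimal view of an HTTP response; the global fetch Response satisfies it.
interface ResponseLike {
  status: number;
  text(): Promise<string>;
}

type SendFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string }
) => Promise<ResponseLike>;

// Throws on any non-2xx status instead of letting the failure pass silently.
function assertGitLabOk(status: number, body: string, endpoint: string): void {
  if (status >= 200 && status < 300) return;
  const hint =
    status === 401 || status === 403
      ? " (check that GITLAB_API_TOKEN is set and has api scope)"
      : "";
  throw new Error(
    "GitLab API " + endpoint + " failed with HTTP " + status + hint + ": " + body.slice(0, 200)
  );
}

// `send` is injected so the sketch has no runtime dependency; pass the global fetch in Node 18+.
async function gitlabRequest(
  send: SendFn,
  url: string,
  token: string,
  method: string,
  body?: string
): Promise<unknown> {
  const response = await send(url, {
    method,
    headers: { "PRIVATE-TOKEN": token, "Content-Type": "application/json" },
    body,
  });
  const text = await response.text();
  assertGitLabOk(response.status, text, url);
  return text === "" ? null : JSON.parse(text);
}
```

With a check like this in place, a missing or under-scoped token fails the job with a readable message on the first run.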

3. Diff fetching returning 0 files

The pipeline reported 0 files, +0/-0 lines on every run. Two compounding issues:

The /diffs endpoint returns metadata only — no actual diff content, especially for new files
Switching to /changes fixed the content, but its response shape is { changes: [...] }, not a plain array

Fix: Array.isArray(diffsData?.changes) ? diffsData.changes : Array.isArray(diffsData) ? diffsData : []
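
The same normalization can be written as a small typed helper, together with the file and line counts that the pipeline reports. This is a sketch: extractChanges and summarizeChanges are hypothetical names, and MrChange lists only the response fields used here.

```typescript
// Subset of the fields GitLab returns for each changed file.
interface MrChange {
  old_path: string;
  new_path: string;
  diff: string;
  new_file: boolean;
  deleted_file: boolean;
}

// Accepts either { changes: [...] } or a plain array and always returns an array.
function extractChanges(payload: unknown): MrChange[] {
  if (Array.isArray(payload)) return payload as MrChange[];
  if (typeof payload === "object" && payload !== null) {
    const changes = (payload as { changes?: unknown }).changes;
    if (Array.isArray(changes)) return changes as MrChange[];
  }
  return [];
}

// Counts files and added or removed lines, skipping any "+++" or "---" file headers.
function summarizeChanges(changes: MrChange[]): { files: number; additions: number; deletions: number } {
  let additions = 0;
  let deletions = 0;
  for (const change of changes) {
    for (const line of (change.diff || "").split("\n")) {
      if (line.charAt(0) === "+" && line.slice(0, 3) !== "+++") additions += 1;
      else if (line.charAt(0) === "-" && line.slice(0, 3) !== "---") deletions += 1;
    }
  }
  return { files: changes.length, additions, deletions };
}
```

A summary of 0 files with +0/-0 lines is then a clear signal that the response shape, not the MR, is the problem.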

4. `id_tokens.aud` does not support variable interpolation

GitLab does not expand CI variables inside the id_tokens block. Using $WIF_PROVIDER_NAME as the audience produced invalid_request: Invalid value for "audience" from the STS API. The audience must be a hardcoded literal string. This is underdocumented — the error message pointed at the audience format, not the variable expansion failure.

5. Vertex AI SDK ignores `GOOGLE_OAUTH_ACCESS_TOKEN`

With WIF set up, the next error was GoogleAuthError from the @google-cloud/vertexai SDK — even though a valid token was present in GOOGLE_OAUTH_ACCESS_TOKEN. The SDK runs its own auth discovery and ignores externally-provided tokens entirely. Fix: replaced the SDK with direct HTTPS calls to us-central1-aiplatform.googleapis.com, giving full control over the Authorization: Bearer header.

6. `gcloud` silently ignores `--scopes` for WIF credentials

With the SDK replaced, the next error was 403 ACCESS_TOKEN_SCOPE_INSUFFICIENT. The token obtained via gcloud auth print-access-token --scopes=https://www.googleapis.com/auth/cloud-platform was missing the cloud-platform scope. The CLI printed a warning — --scopes flag may not work as expected and will be ignored for account type gce — and silently returned a token with the runner's default scopes.

A || fallback attempt (--impersonate-service-account ... || --scopes=...) also failed: the impersonation command succeeded (exit 0) so the fallback never ran — and the impersonated token also lacked the correct scope.

Fix: bypass gcloud entirely. Two direct curl calls:

POST https://sts.googleapis.com/v1/token — exchange the GitLab OIDC JWT for a GCP federated token, explicitly requesting cloud-platform scope
POST https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/{SA}:generateAccessToken — exchange the federated token for a service account token scoped to cloud-platform

This is what gcloud was supposed to do internally, but now we control the scope at every step.

7. JSON parsing breaks on pretty-printed API responses

The GCP REST APIs return pretty-printed JSON with spaces after colons ("access_token": "..."). The grep -o '"access_token":"[^"]*"' pattern matched nothing. python3 wasn't available in node:20-slim. Fix: pipe through node -e to parse JSON properly using the Node.js runtime already present in the image.
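
The difference is easy to reproduce. The TypeScript sketch below applies the same pattern as the grep command to a pretty-printed body and then reads the field with a real JSON parser. The readTokenField helper and the sample body are made up for illustration; the pipeline does the equivalent with a node -e one-liner.

```typescript
// Reads a token field from an API response body, regardless of JSON whitespace.
function readTokenField(responseBody: string, field: "access_token" | "accessToken"): string {
  const parsed: unknown = JSON.parse(responseBody);
  const value =
    typeof parsed === "object" && parsed !== null
      ? (parsed as Record<string, unknown>)[field]
      : undefined;
  if (typeof value !== "string" || value === "") {
    throw new Error("Response has no '" + field + "' field");
  }
  return value;
}

// A pretty-printed body, shaped like the STS response described above.
const prettyBody = '{\n  "access_token": "abc123",\n  "token_type": "Bearer"\n}';

// Same pattern as the grep command: it expects no space after the colon.
const grepLikePattern = /"access_token":"[^"]*"/;

const grepMatched = grepLikePattern.test(prettyBody); // false: the space breaks the match
const parsedToken = readTokenField(prettyBody, "access_token"); // "abc123"
```

Parsing also fails loudly when the field is missing, for example when the API returns an error object instead of a token.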

8. Model not found — `gemini-2.0-flash` returns 404

After auth was finally working, Vertex AI returned 404 NOT_FOUND for gemini-2.0-flash. The model wasn't enabled in the GCP project. Checked Vertex AI Studio's model dropdown to find what was actually available — gemini-2.5-flash was enabled and working.

9. LLM Tool "Blindness" in the Duo Agent (fallback path)

The GitLab Duo agent repeatedly hallucinated generic risk scores without reading any actual code. The fix was aggressive <execution_instructions> XML blocks in flow.yml that chain build_review_merge_request_context as a mandatory first step before generating any text.

The Sandbox Constraint That Forced Two Repos

The hackathon enforces a group-level pipeline execution policy that overrides participant .gitlab-ci.yml files entirely. The key log line:

User-defined CI/CD variables are ignored in this job (except for CATALOG_SYNC_TOKEN) according to the pipeline execution policy.

The policy runs its own enforcement jobs and suppresses all custom CI jobs. There is no workaround without group-level settings access. This forced us to develop and demonstrate the Vertex AI CI pipeline outside the hackathon sandbox repo, while the Duo agent ran inside it. The Vertex AI path is fully working — the constraint is the competition infrastructure, not the implementation.

Each of these issues was only discoverable after fixing the previous one. The total debugging cycle spanned multiple CI pipeline runs per issue — push, wait for runner, read truncated logs, hypothesize, repeat.

Accomplishments that we're proud of

Fully working Vertex AI primary path: After working through gcloud scope issues, SDK auth interference, wrong API endpoints, and model availability, the pipeline now authenticates to GCP via WIF, calls Gemini 2.5 Flash, and posts AI-assisted DeployGuard reports to MRs automatically on every push.

Resilient dual-path architecture: Vertex AI is the primary path and GitLab Duo (Anthropic) is the automatic fallback. If Vertex AI fails for any reason, the MR author still gets a review.

Zero-secret public repo: The entire codebase, including GCP identifiers, is public, yet authentication to GCP remains secure. WIF removes the need to store GCP service account keys in CI variables or the repository.

The Green Score: CI/CD pipelines almost never consider the environmental cost of merged code. DeployGuard applies deterministic penalties to resource-heavy patterns on every MR, giving developers sustainability feedback before code reaches production.
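
As an illustration of how such a penalty-based score can work, here is a minimal TypeScript sketch. The three anti-patterns and the reward for net deletion come from the feature description; the regular expressions, point values, and the greenScore function are hypothetical and much simpler than a real detector would need to be.

```typescript
interface GreenPenalty {
  name: string;
  pattern: RegExp;
  points: number;
}

// Hypothetical patterns and point values.
const GREEN_PENALTIES: GreenPenalty[] = [
  // Loops with no exit condition in the loop header.
  { name: "unbounded_loop", pattern: /while\s*\(\s*true\s*\)|for\s*\(\s*;\s*;\s*\)/, points: 20 },
  // SELECT * with no LIMIT later in the same statement.
  { name: "unpaginated_select_star", pattern: /SELECT\s+\*\s+FROM(?![^;\n]*\bLIMIT\b)/i, points: 15 },
  // A non-zero minimum instance count keeps infrastructure always on.
  { name: "always_on_infrastructure", pattern: /min_instances\s*[:=]\s*[1-9]/, points: 15 },
];

// Starts at 100, subtracts points per offending added line, and rewards net deletion.
function greenScore(addedLines: string[], removedLineCount: number): number {
  let score = 100;
  for (const line of addedLines) {
    for (const penalty of GREEN_PENALTIES) {
      if (penalty.pattern.test(line)) score -= penalty.points;
    }
  }
  const netDeleted = removedLineCount - addedLines.length;
  if (netDeleted > 0) score += Math.min(10, Math.floor(netDeleted / 10));
  return Math.max(0, Math.min(100, score));
}
```

For example, an MR that adds one while (true) loop and one unpaginated SELECT * query would drop from 100 to 65 under these assumed point values.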

What we learned

The hardest part of AI-powered CI is not the AI — it's the auth. Getting a correctly-scoped GCP access token from a GitLab CI job required understanding the full WIF token exchange chain, discovering that gcloud silently ignores scope flags for external account credentials, and ultimately bypassing the gcloud CLI entirely. The Vertex AI call itself was three lines once the token was correct.

gcloud is an abstraction that can hide failures. --scopes being silently ignored with just a warning — not an error — is the kind of bug that's nearly impossible to diagnose without knowing to look for it. Direct REST API calls are more verbose but give you explicit control and explicit errors.

SDKs can fight you. The @google-cloud/vertexai SDK's auth discovery is incompatible with externally-obtained tokens. When a library's auth model doesn't match your environment, replacing it with direct HTTP is the right call.

id_tokens.aud is a literal, not a template. GitLab's OIDC token generation does not interpolate variables in the audience field. This is underdocumented and cost significant debugging time — the error message pointed at the audience format, not the variable expansion failure.

CI debugging is slow by nature. Each hypothesis required a push, a runner queue wait, and reading truncated logs. Nine separate issues meant nine separate debugging cycles. Adding explicit diagnostic output (printing raw API responses, token lengths, response shapes) at each step was essential — without it, failures were invisible.

Agent prompts are not chat prompts. Duo agents need explicit, ordered execution constraints — not conversational instructions. The difference between a hallucinating agent and a working one was adding CRITICAL: You MUST call build_review_merge_request_context FIRST to the system prompt.

Every obstacle made the architecture stronger. The gcloud scope issue forced us to understand the WIF token exchange at a deeper level. The SDK auth interference forced us to write a cleaner, more portable HTTP client. The Duo agent blindness forced us to write better system prompts. The final system is more robust than anything we would have designed upfront.

What's next for DeployGuard

Live APM Integration: Connect to Datadog or Prometheus to lock the deployment gate if production CPU/Memory is currently degraded — not just based on code analysis, but on live system state
Google Cloud Run deployment: Host the TypeScript pipeline as a Cloud Run service triggered by GitLab webhooks, completely decoupled from CI
Historical learning: Persist deployment outcomes to Firestore (already integrated via @google-cloud/firestore) to improve risk scoring accuracy over time
Multi-model routing: Route complex diffs to Gemini 2.5 Pro, simple ones to Gemini 2.5 Flash Lite, based on diff size and change category

Built With

gemini-2.5-flash
gitlab-ci
gitlab-duo
gitlab-duo-agent
google-cloud
node.js
oidc
sts-api
typescript
vertex-ai
workload-identity-federation