Inspiration
The EU's Digital Operational Resilience Act (DORA) has applied to financial entities since 17 January 2025. Article 9 requires that every change to an ICT system protects data: encryption at rest, structured security logging, authorization before data access, audit trails, and safe error handling. In practice this is checked by a senior engineer reading each merge request by hand, which takes ~23 minutes per MR, is done inconsistently, and usually happens after the risky code is already written. We wanted to move that check left: onto every merge request, automatically, in seconds, without taking humans out of the loop on anything consequential.
What it does
CodeGuard is an autonomous, human-gated compliance reviewer for GitLab merge requests. For each MR it:
- Reads the MR and its changed-file diffs (GitLab MCP / REST).
- Pre-scans the added code with a deterministic DORA/NIS2/GDPR rule engine.
- Reasons with Gemini 3 to confirm real violations, cut false positives, and draft a before/after fix with the exact article cited.
- Decides pass / warn / block.
- Verifies every fix by re-scanning it — it never shows a fix it can't prove clears the rule.
Then, only after a human approves, it acts through GitLab: posts the review comment, applies a
compliance-blocked label, blocks the MR, opens a training issue for repeat offenders — and, as the
headline feature, pushes a verified fix branch and opens a remediation MR. It also runs as a
blocking CI gate (a non-compliant MR turns the pipeline red) and an MR webhook. Every scan emits
a SHA-256 tamper-evident evidence record, and confirmed violations become tracked obligations in
a compliance-debt ledger exportable to CSV for auditors. A one-click 🎬 Judge Mode runs the
entire story unattended.
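
As a rough illustration of the tamper-evident evidence record mentioned above, the sketch below hashes a canonical JSON form of a scan result with SHA-256 and re-checks it later. The record fields (mr, verdict, findings) and the function names are hypothetical, chosen for illustration only; this is not CodeGuard's actual record format.

```javascript
const crypto = require("node:crypto");

// Serialize with sorted object keys so the same record always yields the same bytes.
// Assumes the record holds only JSON values (no undefined, functions, or dates).
function canonicalize(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonicalize(value[k])).join(",") + "}";
  }
  return JSON.stringify(value);
}

function sha256Hex(text) {
  return crypto.createHash("sha256").update(text).digest("hex");
}

// Attach a SHA-256 digest of the canonical record (hypothetical field name: sha256).
function sealEvidence(record) {
  return { ...record, sha256: sha256Hex(canonicalize(record)) };
}

// Recompute the digest over everything except the stored hash and compare.
function verifyEvidence(sealed) {
  const { sha256, ...record } = sealed;
  return sha256 === sha256Hex(canonicalize(record));
}
```

A digest stored next to the record only reveals accidental or naive edits. For stronger guarantees the hash would need to be kept somewhere the editor of the record cannot also rewrite, for example in the MR comment or in a chained log.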
How we built it
- Gemini 3 (gemini-3-flash-preview) for the reasoning step, via the @google/genai SDK. The code auto-detects either the AI Studio (Gemini Developer API) or Vertex AI backend.
- Google Cloud Agent Builder holds the judged agent definition (agent-builder/agent.json): Gemini 3 + the GitLab MCP server as the tool source, with humanInTheLoop approval on every write.
- A Node + Express service mirrors the agent for the hosted UI, CI gate, webhook, and auto-remediation; a single-file dark "DevSecOps" dashboard drives it.
- Cloud Run hosts it (built with Cloud Build, image in Artifact Registry); Secret Manager holds the Gemini key, GitLab token, and CI shared secret.
- A deterministic golden-set eval (npm run eval) scores detection + fix-verification with no credentials.
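
One way the AI Studio vs Vertex AI choice could be made is from configuration alone. The sketch below is an assumption about how such detection might look, not the project's actual logic: the helper name resolveGenAIOptions, the environment variable names, and the default location are all illustrative. It returns an options object in the shape the @google/genai GoogleGenAI constructor accepts (an apiKey, or vertexai with project and location).

```javascript
// Hypothetical helper: decide which Gemini backend to use from environment variables.
// Variable names and the "global" default location are assumptions for this sketch.
function resolveGenAIOptions(env) {
  const wantsVertex = String(env.GOOGLE_GENAI_USE_VERTEXAI || "").toLowerCase() === "true";

  // Vertex AI needs a Google Cloud project; credentials come from the runtime (ADC).
  if (wantsVertex && env.GOOGLE_CLOUD_PROJECT) {
    return {
      backend: "vertex",
      options: {
        vertexai: true,
        project: env.GOOGLE_CLOUD_PROJECT,
        location: env.GOOGLE_CLOUD_LOCATION || "global",
      },
    };
  }

  // Otherwise fall back to the Gemini Developer API (AI Studio) with an API key.
  const apiKey = env.GEMINI_API_KEY || env.GOOGLE_API_KEY;
  if (apiKey) {
    return { backend: "ai-studio", options: { apiKey } };
  }

  throw new Error("No Gemini credentials found: set an API key or Vertex AI project");
}
```

In a service, the result would be passed on as new GoogleGenAI(resolveGenAIOptions(process.env).options). Keeping the decision in a pure function makes it testable without network access or real credentials.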
Challenges we ran into
- Finding the right Gemini 3 model id. Our project is an AI Studio (gen-lang-client) project, so Vertex returned 404 for every generative model. On the Developer API, gemini-3-pro-preview is listed but responds "no longer available." We discovered that gemini-3-flash-preview serves and returns clean JSON, and made the backend auto-detect AI Studio vs Vertex so the same code runs on both.
- Keeping the demo bulletproof. Live Gemini + GitLab calls take ~8s and need credentials, which is risky on stage. We added a demo/... sandbox that the server always serves as canned data, so Judge Mode never fails, while any real project path hits live APIs.
- Making a real red CI pipeline fail for the right reason. Our first .gitlab-ci.yml used node:20-alpine, which has no curl (exit 127). We switched to alpine + apk add curl so the job fails specifically on the DORA block, with CodeGuard's verdict in the build log.
- Auto-remediation matching. Gemini's fix snippets don't always match source whitespace, so we apply fixes with tolerant exact-then-trimmed-line matching before committing the fix branch.
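
The exact-then-trimmed-line matching from the last point can be sketched roughly as follows. This is a minimal illustration under assumptions, not the project's code: applyFix and reindent are hypothetical names, line endings are assumed to be "\n", and only the first match is replaced.

```javascript
// Re-indent a snippet to the given indent while keeping its relative indentation.
function reindent(snippet, indent) {
  const lines = snippet.replace(/^\n+/, "").replace(/\s+$/, "").split("\n");
  const widths = lines.filter((l) => l.trim()).map((l) => l.match(/^[ \t]*/)[0].length);
  const base = widths.length ? Math.min(...widths) : 0;
  return lines.map((l) => (l.trim() ? indent + l.slice(base) : ""));
}

// Replace `before` with `after` in `source`. Returns the new text, or null if no match.
function applyFix(source, before, after) {
  // Pass 1: exact substring match.
  const idx = source.indexOf(before);
  if (idx !== -1) {
    return source.slice(0, idx) + after + source.slice(idx + before.length);
  }

  // Pass 2: compare line by line, ignoring leading and trailing whitespace.
  const wanted = before.trim().split("\n").map((l) => l.trim());
  if (wanted.length === 1 && wanted[0] === "") return null;

  const srcLines = source.split("\n");
  for (let start = 0; start + wanted.length <= srcLines.length; start++) {
    const window = srcLines.slice(start, start + wanted.length);
    if (window.every((line, i) => line.trim() === wanted[i])) {
      // Reuse the indentation of the first matched source line for the replacement.
      const indent = window[0].match(/^[ \t]*/)[0];
      return [
        ...srcLines.slice(0, start),
        ...reindent(after, indent),
        ...srcLines.slice(start + wanted.length),
      ].join("\n");
    }
  }
  return null;
}
```

Returning null on a miss lets the caller skip the commit instead of pushing a fix branch whose content was never actually changed.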
Accomplishments that we're proud of
- It's genuinely live and tested — a real GitLab MR returns BLOCK with 3 confirmed DORA/GDPR/NIS2 findings in ~8 seconds, and the agent opens a real auto-fix MR.
- An agent that closes the loop: detect → reason → prove → fix, not just lint.
- Human-in-the-loop is real, not cosmetic — writes are 403'd server-side until approval.
- Audit-ready by design: a verifiable evidence hash per scan and a CSV-exportable obligation ledger.
- 100% / 100% on our reproducible detection + fix-verification eval.
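
To make the server-side 403 gate concrete, here is a minimal sketch of an Express-compatible middleware that refuses write actions until a human has approved them. The in-memory store, the x-approval-token header, the action names, and the mrId route parameter are all assumptions made for illustration; the real service may gate writes differently.

```javascript
const crypto = require("node:crypto");

// Hypothetical approval gate: a human approval mints a single-use token that is
// bound to one MR and one action. Writes without a matching token get a 403.
function createApprovalGate() {
  const approvals = new Map(); // token -> { mrId, action }

  return {
    // Called when a reviewer clicks "approve" for a specific action on a specific MR.
    approve(mrId, action) {
      const token = crypto.randomUUID();
      approvals.set(token, { mrId, action });
      return token;
    },

    // Middleware factory with the usual (req, res, next) signature.
    requireApproval(action) {
      return (req, res, next) => {
        const token = req.headers["x-approval-token"];
        const grant = approvals.get(token);
        if (!grant || grant.action !== action || grant.mrId !== req.params.mrId) {
          return res.status(403).json({ error: "human approval required" });
        }
        approvals.delete(token); // single use: a replayed token is rejected
        next();
      };
    },
  };
}
```

Mounted on a route such as app.post("/mr/:mrId/comment", gate.requireApproval("post_comment"), handler), the check runs on the server, so the dashboard cannot skip it by hiding or showing a button.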
What we learned
- Pairing a cheap deterministic pre-scan with an LLM confirmation step gives high recall and low false positives — the regex finds candidates, Gemini 3 makes the judgment call.
- Treating a code review as governance (evidence + obligations) is what turns a neat demo into something a regulated bank could actually adopt.
- The most convincing hackathon proof isn't a slide — it's a clickable URL a judge can run themselves.
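
The deterministic half of that pairing, plus the "re-scan the fix" check, might look something like the sketch below. The two rules, their ids, and their article labels are invented for illustration and are far cruder than a real rule pack; the Gemini confirmation step is left out entirely.

```javascript
// Hypothetical rule set: each rule is a regex applied line by line to added code.
const RULES = [
  {
    id: "hardcoded-secret",
    article: "DORA Art. 9",
    pattern: /(password|secret|api[_-]?key)\s*[:=]\s*["'][^"']+["']/i,
  },
  {
    id: "sensitive-data-in-log",
    article: "DORA Art. 9",
    pattern: /console\.log\([^)]*(password|token|iban)/i,
  },
];

// Pre-scan: return one candidate finding per (line, rule) hit.
function scan(code) {
  const findings = [];
  code.split("\n").forEach((line, i) => {
    for (const rule of RULES) {
      if (rule.pattern.test(line)) {
        findings.push({ rule: rule.id, article: rule.article, line: i + 1 });
      }
    }
  });
  return findings;
}

// Fix verification: accept a proposed fix only if re-scanning it
// no longer triggers the rule it is supposed to clear.
function verifyFix(finding, fixedCode) {
  return !scan(fixedCode).some((f) => f.rule === finding.rule);
}
```

The candidates returned by scan are what an LLM step would then confirm or discard, and verifyFix is the cheap proof that a drafted fix clears the rule before it is shown to anyone.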
What's next for CodeGuard
- Persist the ledger (Firestore) with trend dashboards over time.
- Per-language rule packs and an org-specific policy editor.
- Multi-file auto-remediation with generated tests.
- Additional rule packs on the same engine: NIS2, GDPR Art.32, PCI-DSS, SOC 2.
Built With
- ai-studio
- artifact-registry
- cloud-build
- cloud-run
- css
- docker
- express.js
- gemini
- gemini-3
- gitlab
- gitlab-mcp
- google-cloud
- google-cloud-agent-builder
- html
- javascript
- model-context-protocol
- node.js
- rest-api
- secret-manager
- vertex-ai
