Inspiration

Every day, development teams push code that handles personal data, health records, and financial information. A single missed compliance violation can result in massive fines Meta was fined €1.2B under GDPR, Amazon €746M.

I researched the market and found a critical gap: Compliance platforms (Vanta, Drata) manage organizational policies but never look at code. Security scanners (Snyk, Semgrep) find vulnerabilities but aren't designed for regulatory compliance. PII scanners (Privado.ai) detect data fields but don't analyze regulatory compliance. AI code reviewers (CodeRabbit) post comments but lack real GDPR/HIPAA/SOC 2 knowledge.

No existing tool reads a merge request diff, analyzes it against actual regulatory frameworks, posts violation reports with specific legal article references, and auto-generates fixes — all inside the developer workflow. I built ComplianceGuard AI to fill this gap.

What it does

ComplianceGuard AI is a GitLab Duo Flow powered by a single intelligent agent that scans merge request code changes for regulatory compliance violations across 7 frameworks: GDPR, SOC 2, HIPAA, PCI DSS v4.0, ISO 27001:2022, CCPA/CPRA, and LGPD.

By default, it scans for GDPR, SOC 2, and HIPAA. Teams can enable additional frameworks (PCI DSS, ISO 27001, CCPA, LGPD) via a simple config file no code changes needed.

When triggered by a mention or reviewer assignment, the agent runs five phases:

  1. Config Reads project settings from .gitlab/duo/complianceguard-config.yml (frameworks, auto-fix toggle)
  2. Knowledge base Dynamically loads detailed compliance rules from the repository (3,400+ lines across 7 framework files) using get_repository_file
  3. Green filter Skips non-code files and whitespace-only changes to save tokens
  4. Scan Analyzes code against loaded knowledge base rules, posts violation reports as MR comments with regulation articles, severity levels, and suggested fixes
  5. Auto-fix Reads each violated file, generates corrected code, commits fixes, and opens a fix MR (set auto_fix: false in config for scan-only mode)

The agent also includes false positive prevention it ignores test files, mock data, already-masked values, and environment variable reads. It only reports high and medium confidence findings.

Example violations it catches: hardcoded secrets (SOC 2 CC6.2), PII logged in plaintext (GDPR Article 5(1)(f)), PHI transmitted without encryption (HIPAA §164.312(e)(1)), PAN stored without encryption (PCI DSS Req 3.5.1), missing data masking (ISO 27001 A.8.11), missing right to delete (CCPA §1798.105), and excessive data collection (LGPD Art. 6 III).

Additionally, a Chat Agent is available in GitLab Duo Chat for interactive compliance Q&A developers can ask questions about GDPR, SOC 2, HIPAA and get actionable guidance with specific regulation references.

How I built it

I built ComplianceGuard AI using the GitLab Duo Agent Platform with two artifacts:

  • Duo Flow (flows/complianceguard.yml) A single-agent flow with a 5-phase compliance pipeline that handles config reading, knowledge base loading, green filtering, scanning, and auto-fixing within the same context
  • Chat Agent (agents/complianceguard-chat.yml) An interactive compliance assistant available in GitLab Duo Chat for Q&A

The flow uses 8 GitLab tools: list_merge_request_diffs to read code changes, get_repository_file to load compliance knowledge bases and read full file contents for fixing, get_merge_request for MR context, create_merge_request_note for posting violation reports, create_commit for committing fixes (one file per commit), create_merge_request for opening fix MRs, and gitlab_api_get for additional API access.

The key innovation is dynamic knowledge base loading: instead of relying on the LLM's built-in knowledge, the agent reads 3,400+ lines of detailed compliance rules from 7 knowledge base files in the repository. Each file maps specific code patterns to specific regulation articles (e.g., "plaintext PII in logs → GDPR Article 5(1)(f)", "CVV stored after auth → PCI DSS Req 3.3.1"), ensuring accurate, consistent, and auditable references. This approach means the knowledge base can be updated independently of the flow — teams can even customize rules for their specific industry.

Challenges I ran into

  1. Multi-agent context loss: My initial design used separate Scanner and Resolver agents. The Resolver lost project context and couldn't access the MR. I solved this by merging both into a single agent that scans and fixes in the same execution no context handoff needed.

  2. Project ID resolution: The agent was receiving incorrect project context from GitLab, causing 404 errors when accessing MR data. I solved this by explicitly passing context:project_id as a flow input and referencing it in the user prompt, ensuring the agent always operates on the correct project.

  3. Large MR failures: My initial single-comment approach failed on multi-file MRs because the combined report exceeded tool limits. I redesigned to post one comment per file and one commit per file, which scales to any MR size.

  4. Tool retry loops: When a tool call failed, the agent would retry endlessly (151 tool calls in one session). I added explicit retry limits in the prompt "NEVER retry more than once, skip and continue."

  5. External agent limitations: I initially built a full Python agent runtime with structured code, but discovered that custom external agents are not supported on GitLab.com (requires Self-Managed + feature flag). I pivoted to the Duo Flow approach with dynamic knowledge base loading to achieve similar depth within the flow format.

Accomplishments that I'm proud of

  • The full pipeline works end-to-end: Mention the agent → compliance scan runs → violation reports appear → fix MR is automatically created. No manual steps.
  • 11 violations detected in MR !16 across GDPR, SOC 2, and HIPAA with 8 critical and 2 warning, all with specific article references. Auto-fix MR !17 created automatically with all violations resolved.
  • Dynamic knowledge base: 3,400+ lines of compliance rules loaded at runtime from 7 framework files not hardcoded in prompts, not hallucinated.
  • 7 regulatory frameworks the most comprehensive code-level compliance scanner I've seen.
  • Zero-config by default: Enable the flow, mention the agent, done. Optionally add a config file to enable more frameworks or disable auto-fix.
  • Green Agent design: Skips non-code files, whitespace-only changes, and only loads knowledge bases for active frameworks to minimize token usage.
  • False positive prevention: Ignores test files, mock data, already-masked values, and environment variable reads reducing noise in reports.
  • Filling a real market gap: No existing tool combines MR-level code analysis with regulatory framework knowledge and automated fixes.

What I learned

  • GitLab Duo Agent Platform is powerful but still evolving the flow YAML format doesn't support multi-agent data passing, so single-agent architecture with phased execution is the most reliable approach.
  • Custom external agents are not available on GitLab.com (requires Self-Managed + ai_catalog_create_third_party_flows feature flag). This pushed me to maximize what's possible within the Duo Flow format dynamic knowledge base loading was the breakthrough that brought depth to the flow approach.
  • Building custom agents and flows for my own GitLab projects showed me how much easier and more systematic they make the development process. Seeing how a purpose-built agent can automate an entire review workflow was eye-opening I'll be creating project-specific agents for my future work.
  • The per-file comment pattern is essential for production-grade MR review tools single giant comments break at scale.
  • Embedding domain knowledge in repository files (not just prompts) and loading them dynamically creates an auditable, versioned, and customizable knowledge system.
  • Real-world compliance scanning requires understanding business logic, not just pattern matching which is exactly where LLMs excel.
  • Researching 7 regulatory frameworks taught me how critical compliance is when building products for global markets. Understanding GDPR, HIPAA, PCI DSS and others at the article level changed how I think about my own development practices.

What's next for ComplianceGuard AI

  • Organizational Learning Track patterns across a team's merge requests over time. If a team repeatedly makes the same mistake (e.g., 47 PII logging violations in 3 months), surface targeted warnings before they even write the code. The system gets smarter as the team works.

  • Compliance-as-Code Marketplace Let companies and communities publish industry-specific rule packages. A fintech company shares their PCI DSS + PSD2 ruleset, a healthtech startup shares HIPAA + HITECH rules, an e-commerce team shares GDPR + CCPA patterns. Install a package, get instant coverage.

  • Regulatory Change Tracker Laws evolve. When GDPR gets amended, when the EU AI Act takes effect, or when a new HIPAA update drops, ComplianceGuard automatically re-scans existing codebases and alerts: "3 projects now have 12 new violations under the updated regulation." Compliance that keeps itself current.

Built With

  • claude-ai-(anthropic)
  • gitlab-duo-agent-platform
  • gitlab-duo-flows
  • python
  • yaml
Share this project:

Updates