STIX Guardian - AI Compliance Agent for GitLab

Inspiration

Last year, a Fortune 500 healthcare company deployed an AI coding agent. Within two weeks, it pushed a commit that logged patient diagnoses to stdout in a production service. Nobody caught it for six days. The HIPAA fine was $1.3 million.

The root cause wasn't the AI - it was the gap between "the AI generated code" and "someone verified it followed policy." Traditional approaches fail because logs can be tampered with, manual review doesn't scale to AI speed, and guardrail outputs aren't provable.

We asked: what if every AI agent action had to pass through a policy engine before execution - and every decision was cryptographically signed so you could prove it happened?

That's STIX Guardian.

What it does

STIX Guardian is a compliance enforcement agent built on the GitLab Duo Agent Platform. It sits between AI intent and code execution, ensuring every merge request meets regulatory requirements before it can ship.

When a developer creates a merge request:

1. STIX Guardian triggers automatically - the GitLab Duo custom flow activates when an MR is created or updated.
2. Claude (via Duo) reads the diff - the Duo agent IS Claude Sonnet 4. It analyzes code changes with deep regulatory knowledge, citing specific sections like HIPAA 164.502(a) or PCI-DSS Requirement 3.4.
3. Pattern scanner detects sensitive data - SSNs, credit cards, patient records, API keys, private keys, and 20+ other compliance signals in added lines.
4. STIX Enforcer evaluates policy - YAML policy packs (HIPAA, SOX, PCI-DSS) are evaluated locally and deterministically, with zero LLM calls.
5. Decision is cryptographically enforced:
   - ALLOW: MR approved, stix:compliant label added, Ed25519 proof token minted
   - DENY: MR blocked, inline violation comments with regulatory citations and remediation code
   - QUARANTINE: Routed to compliance officer with Claude's risk summary for human review
6. Evidence is hash-chained - every decision is recorded in a SHA-256 hash-chained, Ed25519-signed audit log. Tamper with one byte and the chain breaks at the exact event.

The critical insight: Claude reasons about compliance. STIX enforces it with mathematics. The proof token proves it happened.
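
To make the three outcomes concrete, here is a minimal sketch of how a decision could be mapped to merge request actions. The type and function names (Decision, MrAction, actionsFor) are hypothetical and chosen for illustration; only the stix:compliant label and the three decision names come from the description above.

```typescript
// Hypothetical types for illustration; not the project's actual API.
type Decision = "ALLOW" | "DENY" | "QUARANTINE";

interface MrAction {
  kind: "approve" | "label" | "block" | "comment" | "route_to_reviewer";
  detail?: string;
}

// Map a policy decision to the GitLab-side actions described above.
function actionsFor(decision: Decision): MrAction[] {
  switch (decision) {
    case "ALLOW":
      return [
        { kind: "approve" },
        { kind: "label", detail: "stix:compliant" },
      ];
    case "DENY":
      return [
        { kind: "block" },
        { kind: "comment", detail: "violation, regulatory citation, remediation code" },
      ];
    case "QUARANTINE":
      return [{ kind: "route_to_reviewer", detail: "compliance officer" }];
  }
}
```

Keeping this mapping a pure function of the decision means the enforcement step stays deterministic regardless of how the reasoning layer arrived at its analysis.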

How we built it

GitLab Duo Integration - We built two Duo configurations: a custom agent (STIX Guardian) with compliance review tools, and a custom flow (Compliance Gate) that orchestrates the full scan-evaluate-decide-record pipeline. Inside GitLab, the Duo agent IS Claude - no separate Anthropic API call needed. Claude reads the diff through read_merge_request_diff, reasons about compliance using our detailed system prompt with regulatory citations, and takes action using GitLab tools (comments, labels, approvals, issue creation).

GitLab Adapter (packages/agent-sdk/src/integrations/gitlab.ts) - A 1,000-line TypeScript adapter that normalizes GitLab MR, pipeline, issue, and note webhook events into STIX authorization requests. Includes a diff scanner with pattern matchers for PII (SSN, credit card, email, phone, DOB, passport, driver's license), PHI (patient data, diagnoses, medical records, prescriptions, lab results), PCI (CVV, cardholder, expiry), and secrets (API keys, passwords, private keys, AWS/GitHub/GitLab/OpenAI credentials). Maps STIX decisions back to GitLab API actions.
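
As a rough illustration of the diff scanner idea, the sketch below checks only added lines of a unified diff against a handful of regular expressions. The rule names, the Finding shape, and the three patterns are assumptions made for this example; the adapter's actual rule set is not reproduced here.

```typescript
// Illustrative rules only; the real adapter's patterns are not shown in this write-up.
interface Finding {
  line: number; // 1-based line number within the diff text
  category: "PII" | "SECRET";
  rule: string;
}

const RULES: { rule: string; category: Finding["category"]; pattern: RegExp }[] = [
  { rule: "ssn", category: "PII", pattern: /\b\d{3}-\d{2}-\d{4}\b/ },
  { rule: "aws-access-key-id", category: "SECRET", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  { rule: "private-key", category: "SECRET", pattern: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
];

function scanDiff(diff: string): Finding[] {
  const findings: Finding[] = [];
  diff.split("\n").forEach((text, i) => {
    // Only added lines are scanned; "+++" is the file header, not content.
    if (!text.startsWith("+") || text.startsWith("+++")) return;
    for (const { rule, category, pattern } of RULES) {
      if (pattern.test(text)) findings.push({ line: i + 1, category, rule });
    }
  });
  return findings;
}
```

Scanning only added lines keeps removed or unchanged content from triggering findings, which matters when a change is deleting sensitive data rather than introducing it.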

Compliance Analyzer SDK (packages/agent-sdk/src/reasoning/compliance-analyzer.ts) - For environments outside GitLab (other CI platforms, standalone SDK usage), this module calls Claude directly via the Anthropic API. Inside GitLab, the Duo agent handles this - the SDK module is the fallback for non-Duo environments.

Policy Engine - Four microservices: Enforcer (policy evaluation + Ed25519 proof tokens), Policy (YAML pack management), Evidence (SHA-256 hash chain + Ed25519 signatures), Approvals (human-in-the-loop workflows). Policy packs cover HIPAA, SOX, PCI-DSS, GDPR, and CCPA.
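
The deterministic evaluation step can be sketched as a first-match rule walk over an already-parsed policy pack. Everything below is an assumption for illustration: the PolicyRule fields, the rule ids, the first-match ordering, and the default of ALLOW when nothing matches are not taken from the actual Enforcer service.

```typescript
type Decision = "ALLOW" | "DENY" | "QUARANTINE";

interface AuthRequest {
  targetBranch: string;
  signals: string[]; // e.g. categories reported by a diff scanner
}

// Hypothetical shape of one rule after a YAML pack has been parsed.
interface PolicyRule {
  id: string;
  decision: Decision;
  anySignal?: string[]; // matches if the request carries at least one of these
  targetBranch?: string; // matches if equal; omitted means any branch
}

// First matching rule wins; no LLM call is involved at any point.
function evaluate(
  rules: PolicyRule[],
  req: AuthRequest
): { decision: Decision; ruleId: string | null } {
  for (const rule of rules) {
    const signalOk = !rule.anySignal || rule.anySignal.some((s) => req.signals.includes(s));
    const branchOk = !rule.targetBranch || rule.targetBranch === req.targetBranch;
    if (signalOk && branchOk) return { decision: rule.decision, ruleId: rule.id };
  }
  return { decision: "ALLOW", ruleId: null }; // assumed default for this sketch
}

// Example pack with made-up rule ids.
const examplePack: PolicyRule[] = [
  { id: "phi-to-production", decision: "DENY", anySignal: ["PHI"], targetBranch: "production" },
  { id: "phi-elsewhere", decision: "QUARANTINE", anySignal: ["PHI"] },
];
```

Because the result depends only on the rules and the request, the same input always yields the same decision, which is what makes replaying historical decisions against updated policies possible.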

Dashboard - Next.js with Recharts decision timeline (stacked area chart of ALLOW/DENY/QUARANTINE over time), GitLab MR compliance status page, policy simulator with 5 preset compliance scenarios, evidence chain visualizer, and sustainability metrics showing LLM calls saved.

CI/CD Integration - A compliance gate job in .gitlab-ci.yml runs scripts/gitlab/guardian-ci.mjs on every MR pipeline, independently of the Duo agent. This gives two enforcement points: Duo for interactive review, and CI for the automated gate.

Challenges we ran into

Understanding the Duo platform architecture - We initially built a separate Claude API integration, then realized the Duo custom agent IS Claude. Restructuring to use Duo as the primary reasoning layer (with the SDK module as fallback for non-GitLab environments) made the architecture cleaner and eliminated redundant API calls.

Pattern detection precision - We had to balance false positives (flagging every phone-number-like string) against false negatives (missing obfuscated secrets). Our two-tier approach addresses this: fast regex patterns catch obvious signals, and Claude (via Duo) provides nuanced analysis for edge cases. The QUARANTINE decision exists specifically for cases where automated detection is uncertain.

Recharts data transformation - The evidence API returns time series as separate arrays per decision type ({allowed: [], denied: [], quarantined: []}), but Recharts needs a single flat array with one row per timestamp holding all three values. We built a mergeTimeSeries() helper that groups by timestamp across decision types.
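
A helper along these lines might look like the sketch below. The shape of each point ({ timestamp, count }) is an assumption, since only the outer {allowed, denied, quarantined} structure is described; timestamps are assumed to be ISO 8601 strings so that plain string comparison sorts them chronologically.

```typescript
// Assumed point shape; only the outer object shape is known from the description.
interface Point { timestamp: string; count: number }
interface SeriesByDecision { allowed: Point[]; denied: Point[]; quarantined: Point[] }
interface Row { timestamp: string; allowed: number; denied: number; quarantined: number }

function mergeTimeSeries(series: SeriesByDecision): Row[] {
  const rows = new Map<string, Row>();
  const keys = ["allowed", "denied", "quarantined"] as const;
  for (const key of keys) {
    for (const { timestamp, count } of series[key]) {
      // Create a zero-filled row the first time a timestamp is seen.
      const row = rows.get(timestamp) ?? { timestamp, allowed: 0, denied: 0, quarantined: 0 };
      row[key] += count;
      rows.set(timestamp, row);
    }
  }
  // Sort chronologically so a stacked area chart renders left to right.
  return Array.from(rows.values()).sort((a, b) =>
    a.timestamp < b.timestamp ? -1 : a.timestamp > b.timestamp ? 1 : 0
  );
}
```

Zero-filling missing decision types matters for a stacked chart: a timestamp with only DENY events still needs explicit zeros for the other two series, or the stack will have gaps.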

Making crypto accessible - Ed25519 signatures and SHA-256 hash chains are powerful but opaque. We invested in the dashboard's evidence chain visualizer and proof token decoder so judges and users can actually see and understand the cryptographic enforcement, not just trust that it's happening.

Accomplishments that we're proud of

Zero-trust compliance enforcement - Every ALLOW decision produces a self-contained Ed25519 proof token. Target systems (databases, APIs, email servers) can independently verify proof tokens with just the public key. No trust in STIX required at runtime.
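
The independent-verification property can be illustrated with Node's built-in crypto module. This is a sketch under stated assumptions: the token layout (base64url payload, a dot, base64url signature) and the ProofClaims fields are invented for the example and are not the actual STIX proof token format.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical claim set for illustration.
interface ProofClaims { decision: "ALLOW"; policyId: string; mrIid: number; issuedAt: string }

function mintProofToken(claims: ProofClaims, privateKey: KeyObject): string {
  const payload = Buffer.from(JSON.stringify(claims), "utf8");
  // Ed25519 takes no separate digest algorithm, hence the null first argument.
  const signature = sign(null, payload, privateKey);
  return `${payload.toString("base64url")}.${signature.toString("base64url")}`;
}

// Needs only the public key: returns the claims if the signature is valid, else null.
function verifyProofToken(token: string, publicKey: KeyObject): ProofClaims | null {
  const [payloadPart, signaturePart] = token.split(".");
  if (!payloadPart || !signaturePart) return null;
  const payload = Buffer.from(payloadPart, "base64url");
  const signature = Buffer.from(signaturePart, "base64url");
  if (!verify(null, payload, publicKey, signature)) return null;
  return JSON.parse(payload.toString("utf8")) as ProofClaims;
}

// Usage: the verifier never sees the private key.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const token = mintProofToken(
  { decision: "ALLOW", policyId: "example-policy", mrIid: 42, issuedAt: new Date().toISOString() },
  privateKey
);
console.log(verifyProofToken(token, publicKey)?.decision);
```

A target system that holds only the public key can run the verification half on its own, which is what removes the need to call back into the issuing service at runtime.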

Tamper-evident audit trail - Run stix-verify chain and it walks the entire SHA-256 hash chain, verifying every link and Ed25519 signature. We built a live demo where you tamper one byte in the evidence log and the verifier catches the exact event where the chain breaks.
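
The chain walk itself is simple to sketch. The version below covers only the SHA-256 linking, leaving out the per-event Ed25519 signature for brevity; the EvidenceEvent fields, the all-zero genesis value, and the function names are assumptions for illustration rather than the stix-verify implementation.

```typescript
import { createHash } from "node:crypto";

// Hypothetical event shape for illustration.
interface EvidenceEvent { index: number; payload: string; prevHash: string; hash: string }

const GENESIS = "0".repeat(64); // assumed placeholder for the first event's prevHash

function hashEvent(index: number, payload: string, prevHash: string): string {
  return createHash("sha256").update(`${index}\n${prevHash}\n${payload}`).digest("hex");
}

function appendEvent(chain: EvidenceEvent[], payload: string): EvidenceEvent[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  const index = chain.length;
  return [...chain, { index, payload, prevHash, hash: hashEvent(index, payload, prevHash) }];
}

// Returns the index of the first event that fails verification, or -1 if the chain is intact.
function firstBrokenIndex(chain: EvidenceEvent[]): number {
  let expectedPrev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const e = chain[i];
    const linkOk = e.index === i && e.prevHash === expectedPrev;
    const hashOk = e.hash === hashEvent(e.index, e.payload, e.prevHash);
    if (!linkOk || !hashOk) return i;
    expectedPrev = e.hash;
  }
  return -1;
}
```

Because each event's hash covers the previous hash, editing an event without recomputing every later hash is detected at that event, and recomputing later hashes is what the per-event signatures are there to prevent.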

98% LLM call reduction - Most compliance decisions are deterministic: "does this diff contain PHI? Is the target branch production?" Pattern matching and YAML policy rules handle these locally. Claude is only invoked through the Duo agent for nuanced analysis. This makes the system both faster and cheaper than "send everything to an LLM."

65 tests for the GitLab integration alone - Comprehensive test coverage for the diff scanner (PII, PHI, PCI, secrets patterns), event normalizers (MR, pipeline, issue, note), decision mapper (ALLOW/DENY/QUARANTINE actions), and comment formatter. All passing.

The one-line pitch - "Duo agent reasons. STIX enforces. The proof token proves it." We're proud that the architecture is simple enough to explain in one line.

What we learned

- The Duo agent IS Claude - Understanding that GitLab Duo custom agents run Claude Sonnet internally changed our architecture. Instead of two parallel Claude calls, we use Duo as the reasoning layer inside GitLab and the SDK's ComplianceAnalyzer as fallback for other platforms.
- Cryptographic enforcement is fundamentally different from logging - A log says "this happened." A proof token says "this was authorized by this policy at this time, and here's the Ed25519 signature that proves it." Regulators care about the difference.
- 80% of compliance is deterministic - Pattern matching catches most PII/PHI/secrets. YAML policy rules handle most branching logic. Claude's value is in the 20% that requires regulatory reasoning - the ambiguous cases, the nuanced data flows, the human-readable explanations.
- GitLab's agent platform enables real enforcement - The combination of Duo agents (Claude reasoning + GitLab tools) with CI/CD pipelines (automated gates) creates two independent enforcement points. Bypassing one does not bypass the other.

What's next for STIX Guardian - AI Compliance Agent for GitLab

- Policy pack marketplace - Community-contributed compliance rules for industry-specific regulations (FERPA for education, FedRAMP for government, DORA for financial services)
- Multi-platform enforcement - Extend beyond GitLab to GitHub Actions, Azure DevOps, and Bitbucket Pipelines using the same SDK adapter pattern
- Real-time evidence streaming - WebSocket updates from the evidence chain to the dashboard for live compliance monitoring
- SOC 2 Type II report generation - Automatically generate audit reports from the evidence chain, mapping every decision to control objectives
- Policy drift detection - Replay historical decisions against updated policies to catch regressions before they reach production (replay engine already built, needs GitLab integration)

Built With

anthropic-api
claude-sonnet-4
docker
ed25519
express.js
gitlab-api
gitlab-ci/cd
gitlab-duo-agent-platform
next.js
node.js
pnpm
react
recharts
sha-256
typescript
vitest
yaml

Submitted to

GitLab AI Hackathon

Created by

STIX Guardian is an AI compliance enforcement agent built on the GitLab Duo Agent Platform. When a developer creates a merge request, STIX Guardian triggers automatically - Claude (via Duo) analyzes the diff for regulatory violations (HIPAA, SOX, PCI-DSS), a pattern scanner detects PII/PHI/secrets, and the STIX policy engine makes a deterministic ALLOW/DENY/QUARANTINE decision with zero LLM calls for 98% of cases. Every decision is signed with an Ed25519 cryptographic proof token and recorded in a SHA-256 hash-chained audit log. Denied MRs get inline comments with regulatory citations and remediation code. Quarantined MRs route to compliance officers with Claude's risk summary. The result: provable, tamper-evident compliance enforcement for AI-assisted development - not "trust me, the AI followed policy," but mathematical proof.

Mahesh Vaikri

Updates

Mahesh Vaikri started this project — Mar 04, 2026 09:00 PM EST
