Inspiration

AI repositories introduce risks that traditional DevSecOps tools were never designed to catch — unsafe model loading, PII-laden datasets, prompt injection vulnerabilities, missing model documentation, and silent performance regressions. Organizations ship AI features every day without any automated check on whether those features are safe, compliant, or reproducible.

We wanted to fix that without asking developers to leave GitLab. AIRE embeds governance directly into the workflow — issues, merge requests, and pipelines — so that every AI change is reviewed, scored, and acted upon automatically.

What it does

AIRE deploys four specialized agents that cover every layer of AI risk:

| Agent | Trigger | What it catches |
| --- | --- | --- |
| DataTrust | MR opened / updated | Dataset origin, PII, GDPR/HIPAA/CCPA violations, commercial usage |
| AI Governor | MR opened / updated | Model documentation, license compliance, dependency risks |
| Security Scanner | MR opened / updated | Unsafe model loading, hardcoded keys, prompt injection, MLOps pipeline risks |
| Prompt Analyser | MR with prompt changes | Injection risk, toxicity, hallucination probability, token waste |

Two GitLab Flows orchestrate the agents, aggregate scores, make a risk decision, and take automated action — commenting, labeling, blocking, creating issues, and committing compliance reports back to the source branch.

How we built it

We built on the GitLab Duo Agent Platform with Google Cloud Platform as the infrastructure backbone.

The AI Governor and DataTrust agents run as external agents powered by Vertex AI (Gemini) via LangChain orchestration, communicating with GitLab through MCP requests and responses. Vertex AI Vector Search handles semantic retrieval for datasource license lineage, dependency licenses, and GDPR/HIPAA/CCPA compliance knowledge — giving the agents precise, context-aware answers rather than generic retrieval. Redis (Memorystore) caches context across agent calls to keep response times within budget.

The Security Scanner and Prompt Analyser agents run natively as GitLab Repo Agents — no external orchestration needed. They list file changes, compute their respective scores, and generate reports directly within the GitLab pipeline.

The two GitLab Flows — AIRE Risk Analyser and AIRE Report Generator — tie everything together, aggregating agent outputs, running the Risk Decision Agent, and taking automated action via the Report Reader Agent.

The entire MCP server is hosted on GCP Compute, making the Vertex AI-backed agents accessible to GitLab as a persistent, low-latency backend.

Architecture

Agents

1. DataTrust Agent

Platform: External agent via MCP + Vertex AI (Gemini), hosted on GCP Compute
Communication: MCP request/response with GitLab
Trigger: MR opened or updated

What it does:

  • Validates datasource credibility and flags unknown origins
  • Evaluates data quality and metadata completeness
  • Runs GDPR, HIPAA, and CCPA compliance verification via RAG
  • Checks commercial usage rights via dedicated RAG retriever
  • Merges context across all retrievers for a final credibility score

RAG Sources:

  • Datasource origin / lineage index (Vertex AI Vector Search)
  • Compliance knowledge base — GDPR, HIPAA, CCPA (Vertex AI Vector Search)
  • Commercial usage dataset (Vertex AI Vector Search)

AIRE Architecture diagram


2. AI Governor Agent

Platform: External agent via MCP + Vertex AI (Gemini), hosted on GCP Compute
Communication: MCP request/response with GitLab
Trigger: MR opened or updated

What it does:

  • Parses model metadata from changed files
  • Evaluates documentation completeness
  • Analyses license details — general, dependency, and intended use
  • Verifies compliance with dependency licenses via RAG
  • Checks commercial usage restrictions
  • Emits observability traces to Cloud Logging

RAG Sources:

  • Dependency license index (Vertex AI Vector Search)
  • Commercial usage dataset (Vertex AI Vector Search)

AIRE Architecture diagram


3. Security Scanner Agent

Platform: GitLab-native Repo Agent
Trigger: MR opened or updated

What it does:

  • Lists all file changes in the MR diff
  • Runs AI/ML-specific security checks across four domains:
    • Model & Data Security — unsafe deserialization, unverified model loading
    • AI Infrastructure & Data Security — hardcoded credentials, insecure data pipelines
    • Agent & Tool-Use Security — prompt injection patterns, tool call abuse
    • MLOps & Pipeline Security — insecure CI steps, unverified artifact sources
  • Calculates a Risk Score (0–100)
  • Generates a structured security report
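A minimal sketch of how such pattern-based checks could roll up into a 0–100 Risk Score. The patterns and weights below are illustrative placeholders, not the agent's actual rules:

```python
import re

# Illustrative patterns and weights per domain; the real agent's checks are richer.
CHECKS = {
    "model_data":     (re.compile(r"pickle\.load|torch\.load\("), 40),
    "infrastructure": (re.compile(r"(api_key|password)\s*=\s*['\"]"), 30),
    "agent_tool_use": (re.compile(r"ignore previous instructions", re.I), 20),
    "mlops_pipeline": (re.compile(r"curl .*\|\s*(bash|sh)"), 10),
}

def scan_diff(diff_text: str) -> dict:
    """Return per-domain findings and an aggregate 0-100 risk score."""
    findings = {name: bool(pat.search(diff_text)) for name, (pat, _) in CHECKS.items()}
    score = sum(weight for name, (_, weight) in CHECKS.items() if findings[name])
    return {"findings": findings, "risk_score": min(score, 100)}
```

The structured report is then rendered from the `findings` dict, one section per domain.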

AIRE Architecture diagram


4. Prompt Analyser Agent

Platform: GitLab-native Repo Agent
Trigger: MR opened or updated (only when prompt files are changed)

What it does:

  • Detects whether prompts were modified in the diff
  • If yes, runs a full Prompt Health Check:
    • Prompt Quality Assessment
    • Prompt Stability Analysis
    • Injection Risk Assessment
    • Toxicity Risk Assessment
    • Hallucination Probability Assessment
    • Token Optimisation Assessment
  • Calculates a Prompt Health Score
  • Generates a structured prompt analysis report

AIRE Architecture diagram


Flows

AIRE Report Generator Flow

Orchestrates all four agents in parallel, collects their output scores, and aggregates them into a unified risk signal across five dimensions:

| Dimension | Source Agent |
| --- | --- |
| Security | Security Scanner Agent |
| Compliance | AI Governor Agent |
| Credibility | DataTrust Agent |
| Data | DataTrust Agent |
| Performance | All agents combined |

The aggregated score is passed to the Risk Decision Agent.

  1. Example MR: https://gitlab.com/gitlab-ai-hackathon/participants/34852787/-/merge_requests/32
  2. Report file added by the Flow: https://gitlab.com/gitlab-ai-hackathon/participants/34852787/-/blob/feature/base-demo-app/report.md?ref_type=heads

AIRE Architecture diagram


AIRE Risk Analyser Flow

Takes the aggregated risk signal from the four agents and drives automated action:

  1. Risk Decision Agent — evaluates the combined score against thresholds
  2. Report Reader Agent — interprets the decision and takes action:
    • ✅ Approve → commits the compliance report to the source branch
    • 🚫 Block → creates a GitLab issue, tags the MR, applies a risk label

AIRE Architecture diagram


Risk Scoring Model

Overall risk R is computed as a weighted sum of agent scores:

$$R = w_1 S + w_2 C + w_3 D + w_4 P$$

Where:

  • S = Security severity score (Security Scanner Agent)
  • C = Compliance gap score (AI Governor Agent)
  • D = Data trust score (DataTrust Agent)
  • P = Prompt health score (Prompt Analyser Agent)
  • w_1 + w_2 + w_3 + w_4 = 1

Risk Categories:

| Score | Category | Action |
| --- | --- | --- |
| 0–30 | 🟢 Low Risk | Approve, commit report |
| 31–70 | 🟡 Medium Risk | Comment with findings, label MR |
| 71–100 | 🔴 High Risk | Block MR, create issue, require manual review |
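The weighted sum and the category bands above can be sketched in a few lines. The weights here are illustrative (the formula only requires they sum to 1); the thresholds follow the table:

```python
def risk_score(s: float, c: float, d: float, p: float,
               weights=(0.4, 0.3, 0.2, 0.1)) -> float:
    """R = w1*S + w2*C + w3*D + w4*P, with weights summing to 1.
    Example weights only; the deployed weighting may differ."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * x for w, x in zip(weights, (s, c, d, p)))

def categorize(r: float) -> str:
    """Map R onto the three action bands from the table."""
    if r <= 30:
        return "low"      # approve, commit report
    if r <= 70:
        return "medium"   # comment with findings, label MR
    return "high"         # block MR, create issue, require manual review
```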

Quick Demo

1. Install the four agents

Install the four AIRE agents in your GitLab AI repository from the GitLab AI Catalog. You may need the GitLab Duo Agent Platform for this. Find our catalog mapping here: https://gitlab.com/gitlab-ai-hackathon/participants/34852787/-/blob/main/.ai-catalog-mapping.json?ref_type=heads.

2. Create an MR

After installing the AIRE agents, make your AI changes in your repository and create an MR against the parent working branch. In an MR comment, mention the AIRE Report Generator Flow like this: https://gitlab.com/gitlab-ai-hackathon/participants/34852787/-/merge_requests/32#note_3191744120. The Flow may take 10–15 minutes to analyse the full change set, feed it to the four AIRE agents (including the external agents over MCP), aggregate the results, and commit them as a report file to your current source branch.
Example report file: https://gitlab.com/gitlab-ai-hackathon/participants/34852787/-/blob/feature/base-demo-app/report.md.

3. Mention the Risk Analyser Flow

Since we didn't have access to custom CI/CD flows in the GitLab sandbox, we mimic that behaviour in an MR comment instead. Mention the AIRE Risk Analyser Flow in the deployment MR like this: https://gitlab.com/gitlab-ai-hackathon/participants/34852787/-/merge_requests/32#note_3191804421. After analysing the report file, the decision engine classifies the risk, then blocks the deployment and creates an issue for high-risk reports, or approves the deployment for low-risk ones.
Example issue: https://gitlab.com/gitlab-ai-hackathon/participants/34852787/-/work_items/4

Tech Stack

| Layer | Technology |
| --- | --- |
| Agent Platform | GitLab Duo Agent Platform |
| External Agent Orchestration | LangChain |
| LLM | Vertex AI — Gemini |
| Semantic Search / RAG | Vertex AI Vector Search |
| Agent Communication | MCP (Model Context Protocol) |
| Cache | GCP Memorystore (Redis) |
| MCP Server Hosting | GCP Compute Engine |
| Observability | GCP Cloud Logging & Trace |
| CI/CD Integration | GitLab CI/CD Pipelines |

Challenges we ran into

Developing AIRE on the GitLab Duo Agent Platform during the hackathon presented several practical challenges:

1. MCP server latency across agent boundaries

The DataTrust and AI Governor agents communicate with GitLab via MCP request/response cycles hosted on GCP Compute. Early in development, round-trip latency between GitLab's flow orchestrator and our external MCP server was causing agent timeouts mid-flow. We resolved this by implementing Redis (Memorystore) as a context cache between agent calls, pre-warming the session state so each MCP call arrived with context already loaded rather than fetching it cold. This brought response times within the flow's execution budget.
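A minimal sketch of the pre-warming pattern, with the cache client injected so any Redis-compatible object works (in our setup, `redis.Redis` pointed at Memorystore). Key format and TTL are illustrative:

```python
import json

class ContextCache:
    """Pre-warmed MR context cache. `client` is any Redis-like object
    exposing get/setex (e.g. redis.Redis against GCP Memorystore)."""

    def __init__(self, client, ttl_seconds: int = 900):
        self.client = client
        self.ttl = ttl_seconds

    def get_mr_context(self, project_id: int, mr_iid: int, fetch_fn) -> dict:
        key = f"aire:ctx:{project_id}:{mr_iid}"   # key scheme is illustrative
        cached = self.client.get(key)
        if cached is not None:
            return json.loads(cached)       # warm path: no cold MCP round-trip
        ctx = fetch_fn(project_id, mr_iid)  # cold path: fetch via GitLab MCP
        self.client.setex(key, self.ttl, json.dumps(ctx))
        return ctx
```

Each subsequent agent call in the same flow hits the warm path, which is what kept the MCP round-trips inside the flow's execution budget.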

2. Vertex AI Vector Search index quality

Retrieval quality for the compliance knowledge base (GDPR, HIPAA, CCPA) degraded significantly when policy text was chunked naively. Regulatory language has dense cross-references: a chunk containing "Article 17" without "right to erasure" context would return misleading similarity matches. We iterated on a semantic chunking strategy that preserves clause-level coherence, and structured the index with separate corpora for each regulatory framework rather than a single mixed index. This reduced false-positive compliance flags substantially.
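The core idea of clause-level chunking can be sketched as follows. The "Article N" heading pattern is an illustrative assumption about the corpus layout, not our exact parser:

```python
import re

def chunk_regulation(text: str, max_chars: int = 1200) -> list[str]:
    """Split regulatory text at article boundaries so each chunk keeps its
    clause heading (e.g. 'Article 17  Right to erasure') with its body."""
    # Assumes each article starts on a line matching "Article <number>".
    parts = re.split(r"(?m)^(?=Article \d+)", text)
    chunks = []
    for part in parts:
        part = part.strip()
        if not part:
            continue
        if len(part) <= max_chars:
            chunks.append(part)
        else:
            # Oversized articles are split on paragraphs, repeating the
            # heading so no chunk loses its clause-level context.
            heading, _, body = part.partition("\n")
            for para in body.split("\n\n"):
                if para.strip():
                    chunks.append(f"{heading}\n{para.strip()}")
    return chunks
```

Keeping the heading attached to every sub-chunk is what stops "Article 17" from matching queries with no erasure-related context.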

3. Orchestrating parallel agents within a single GitLab Flow

The AIRE Report Generator Flow runs all four agents in parallel and aggregates their scores into a unified risk signal. Getting the flow to correctly wait for all four agent outputs before passing the aggregated result to the Risk Decision Agent required careful output schema design: each agent had to emit a strict JSON structure with a defined score field so the aggregator could consume them deterministically regardless of completion order. Early versions where agents returned free-form text caused the aggregation step to hallucinate scores.
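The validation side of that contract can be sketched like this. The field names (`agent`, `score`) stand in for our actual schema:

```python
import json

REQUIRED_FIELDS = {"agent", "score"}

def parse_agent_output(raw: str) -> dict:
    """Reject anything that is not strict JSON with a numeric 0-100 score."""
    out = json.loads(raw)  # free-form text fails here instead of polluting scores
    missing = REQUIRED_FIELDS - out.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not (isinstance(out["score"], (int, float)) and 0 <= out["score"] <= 100):
        raise ValueError("score must be a number in [0, 100]")
    return out

def aggregate(raw_outputs: list[str]) -> dict:
    """Deterministic aggregation regardless of agent completion order."""
    outputs = sorted((parse_agent_output(r) for r in raw_outputs),
                     key=lambda o: o["agent"])
    return {o["agent"]: o["score"] for o in outputs}
```

Because malformed output raises instead of passing through, the aggregator can only ever see scores an agent actually emitted.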

4. LangChain + Vertex AI Gemini tool-calling reliability

The DataTrust and AI Governor agents use LangChain to orchestrate multi-step reasoning across Vertex AI Vector Search retrievers. We encountered cases where Gemini would skip retriever calls entirely and answer from parametric memory, returning plausible-sounding but ungrounded compliance verdicts. We solved this by restructuring the agent prompts with explicit retrieval gates: the agent is instructed to treat any compliance claim as invalid unless it is supported by a retrieved chunk, with the chunk ID cited in the output. This forced grounded reasoning and eliminated hallucinated regulatory verdicts.
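The post-hoc side of the retrieval gate can be sketched as a citation check over the model's answer. The `[chunk:<id>]` citation format is an illustrative assumption:

```python
import re

def enforce_retrieval_gate(answer: str, retrieved_chunk_ids: set) -> dict:
    """Accept a compliance verdict only if every cited chunk ID was actually
    retrieved; verdicts with no citation at all are rejected outright."""
    cited = set(re.findall(r"\[chunk:([\w-]+)\]", answer))
    if not cited:
        return {"accepted": False, "reason": "no retrieval citations"}
    ungrounded = cited - retrieved_chunk_ids
    if ungrounded:
        return {"accepted": False,
                "reason": f"cites unretrieved chunks: {sorted(ungrounded)}"}
    return {"accepted": True, "reason": "all claims grounded"}
```

Rejected verdicts are retried with the retrieval instruction re-emphasised, so a parametric-memory answer can never reach the aggregator.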

Accomplishments that we're proud of

1. Five Vertex AI Vector Search indexes in production

We built and deployed separate indexes for GDPR/HIPAA/CCPA compliance, datasource origin lineage, dependency licenses, and commercial usage rights — all live and serving real retrieval requests. Getting retrieval quality right across three regulatory frameworks was the hardest part of the build.

2. Four agents, two flows, one coherent decision

Running all four agents in parallel and aggregating their scores into a single weighted risk decision — without any agent contaminating another's context — required strict output schema design across the entire pipeline. The system produces a structured, reproducible risk score on every MR.

3. AI Governance in action

AI Governance that blocks, not just warns. AIRE takes five automated actions on a high-risk MR: posts a structured findings comment, applies a risk label, creates a GitLab issue with per-finding remediation steps, tags the MR, and commits a compliance report back to the source branch. The demo showed this working end-to-end on a real AI application with hardcoded credentials, unsafe model loading, and missing GDPR documentation.

What we learned

Building governance into the workflow rather than bolting it on changes how developers respond to it. When the agent lives inside GitLab and speaks the language of MRs, issues, and labels, it stops feeling like overhead and starts feeling like help.

We also learned that multi-agent systems live or die by their boundaries — overlapping responsibilities create noise, not safety. And that Vertex AI Vector Search is genuinely powerful when the corpus is well-structured — retrieval quality is only as good as what you index.

What's next for AIRE — AI That Governs AI

  • Auto-remediation — not just flagging issues, but opening fix MRs automatically
  • Custom policy engine — teams define their own governance rules in YAML, AIRE enforces them
  • Model drift detection — continuous monitoring post-deployment, not just at merge time
  • Broader compliance packs — EU AI Act, SOC 2, ISO 42001 out of the box

Built With

  • certbot
  • docker
  • docker-compose
  • gcp
  • gcp-memorystore
  • gitlab
  • langchain
  • mcp
  • nginx
  • python
  • redis
  • vertex-ai-vector-search
  • vertexai