Inspiration
Modern code review is a bottleneck. Developers wait hours for feedback, security issues slip through under time pressure, and compliance violations are only caught after the fact — during audits or incidents. We wanted to automate the boring, critical parts: catch the SQL injections, flag the HIPAA violations, and enforce SOC2 controls before a single line merges — every time, automatically.
What it does
MR Review Pipeline is a 3-agent AI system that automatically reviews every GitLab merge request the moment it's opened. It runs three specialized agents in sequence:
- Code Reviewer — Reviews the diff for bugs, logic errors, race conditions, poor naming, missing tests, and performance issues. Posts actionable inline comments on the MR.
- Security Fixer — Scans for OWASP Top 10 vulnerabilities (SQL injection, hardcoded secrets, broken auth, insecure deserialization). Generates ready-to-apply patch code for Critical and High findings, with CWE references.
- Compliance Auditor — Audits every change against SOC2, GDPR, HIPAA, and PCI-DSS. Returns a PASS/FAIL verdict per framework with specific control IDs, affected files, and remediation guidance.

Each agent's output is posted directly as notes on the GitLab MR. No human is needed to start it — it triggers automatically via CI/CD.
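The sequential hand-off between the three agents can be sketched as follows. This is a minimal illustration, not the shipped code: run_agent is a stand-in for a real LLM call, and the prompts are placeholders.

```python
# Minimal sketch of the 3-agent sequential pipeline. run_agent is a
# placeholder for a real model call; names and prompts are illustrative.

AGENTS = [
    ("reviewer", "Review this diff for bugs and missing tests."),
    ("security_fixer", "Scan the diff for OWASP Top 10 issues."),
    ("compliance_auditor", "Audit the diff against SOC2/GDPR/HIPAA/PCI-DSS."),
]

def run_agent(name: str, system_prompt: str, context: str) -> str:
    # Placeholder: a real implementation would call an LLM here.
    return f"[{name}] reviewed: {context[:40]}"

def run_pipeline(diff: str) -> dict:
    outputs = {}
    context = diff
    for name, prompt in AGENTS:
        outputs[name] = run_agent(name, prompt, context)
        # Chain: the next agent sees the diff plus all prior outputs.
        context = diff + "\n" + "\n".join(outputs.values())
    return outputs
```

The key design choice is the growing context: each downstream agent sees everything its predecessors produced, which is what lets the Compliance Auditor reason over the security findings.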
How we built it
- GitLab Duo Workflows — The 3-agent orchestration flow is defined in mr_review_flow.yml using GitLab's AI Catalog flow format (AgentComponent, a sequential pipeline with output chaining between steps).
- Agent definitions — Each agent (reviewer.yml, security_fixer.yml, compliance_auditor.yml) has a tailored system prompt and a specific toolset (get_merge_request, list_merge_request_diffs, create_merge_request_note, list_security_findings).
- AGENTS.md — A compliance rules document the Compliance Auditor reads via read_file to apply project-specific overrides for PHI field names, PAN masking rules, and GDPR data-minimization requirements.
- CI/CD — .gitlab-ci.yml validates the YAML on every MR and triggers the flow via the GitLab AI Catalog API (invoke_flow.py).
- Local simulation — simulate_flow.py reproduces the full pipeline locally using Groq (Llama 3), Gemini, OpenAI, or Anthropic, with automatic fallback between providers.
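The output chaining between steps boils down to template substitution: each step's prompt may reference earlier steps' outputs as {{variable}} placeholders. A minimal sketch of that substitution (the placeholder syntax is from our flow definition; the helper itself is illustrative):

```python
import re

def render_prompt(template: str, variables: dict) -> str:
    """Replace every {{name}} placeholder with its value. Unknown
    placeholders raise, so a typo in the flow YAML fails loudly."""
    def sub(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"unbound template variable: {name}")
        return variables[name]
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", sub, template)

# The Compliance Auditor's prompt references both upstream agents:
AUDITOR_TEMPLATE = (
    "Audit this MR for compliance.\n"
    "Code review findings:\n{{reviewer_output}}\n"
    "Security findings:\n{{security_fixer_output}}"
)
```

Failing loudly on an unbound variable is deliberate: a silently empty placeholder would hand the auditor an incomplete context with no visible error.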
Challenges we ran into
- GitLab Duo Enterprise access — The full on-platform agent pipeline requires Duo Enterprise, which isn't available on free/premium tiers. We built a complete local simulation (simulate_flow.py) as a fallback so the project is fully demonstrable without it.
- Gemini free tier quota — Gemini's free tier enforces a daily quota that is effectively zero in some regions (including India). We added automatic multi-provider fallback: Gemini → Groq → OpenAI → Anthropic, so the demo works regardless of which key you have.
- google-generativeai deprecation — Mid-development, Google deprecated the google-generativeai SDK in favour of google-genai. We migrated to the new SDK and updated the model invocation API.
- Agent output chaining — Ensuring the Compliance Auditor receives meaningful context from both previous agents required careful prompt-template design, with {{reviewer_output}} and {{security_fixer_output}} variable substitution.
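The multi-provider fallback logic can be sketched like this. It is a simplified illustration: real provider calls go through each vendor's SDK, wrapped behind a common callable interface, and the exact error types differ per SDK.

```python
# Sketch of the multi-provider fallback. Provider calls are simplified
# to callables taking a prompt; real code wraps each SDK the same way.

PROVIDER_ORDER = ["gemini", "groq", "openai", "anthropic"]

class ProviderError(Exception):
    """Raised by a provider wrapper on quota, auth, or network failure."""

def complete_with_fallback(prompt: str, providers: dict):
    """Try each configured provider in order; return (provider, text)."""
    last_error = None
    for name in PROVIDER_ORDER:
        call = providers.get(name)
        if call is None:
            continue  # no API key configured for this provider
        try:
            return name, call(prompt)
        except ProviderError as e:
            last_error = e  # e.g. regional quota exhausted; try next
    raise RuntimeError(f"all providers failed: {last_error}")
```

Skipping unconfigured providers (rather than erroring) is what makes the demo work with whichever single API key a judge happens to have.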
Accomplishments that we're proud of
- A single python simulate_flow.py --demo command runs 3 AI agents end-to-end, catching a SQL injection (CWE-89), a hardcoded password (CWE-798), and HIPAA PHI violations — all from a realistic code diff.
- The Compliance Auditor maps every finding to a specific control ID (e.g., HIPAA §164.312(a), PCI-DSS Req 6, GDPR Article 5) — not just generic warnings.
- The system is provider-agnostic: it works with Groq, Gemini, OpenAI, or Anthropic with zero code changes.
- Fully CI/CD integrated — the pipeline runs automatically on every MR with no manual intervention.
What we learned
- GitLab's AI Catalog flow format is powerful but requires Duo Enterprise — building a local simulation in parallel is essential for development and demos.
- Prompt chaining between agents dramatically improves output quality: the Compliance Auditor produces far better results when it sees the Security Fixer's findings first.
- System prompts with explicit output formats (numbered sections, severity levels, control IDs) produce far more structured and useful agent output than open-ended prompts.
- Free LLM tiers have unpredictable regional quota limits — multi-provider fallback is a must for hackathon reliability.
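To illustrate what we mean by an explicit output format, here is an example output contract and a simple structure check. The prompt wording and the pipe-delimited line format are illustrative, not the exact shipped prompt.

```python
import re

# Illustrative system prompt that mandates a machine-checkable format
# (example wording, not the exact prompt we shipped).
SYSTEM_PROMPT = """You are a compliance auditor.
Respond in EXACTLY this format:
1. VERDICT: PASS or FAIL per framework (SOC2, GDPR, HIPAA, PCI-DSS)
2. FINDINGS: one line each, as SEVERITY | CONTROL_ID | FILE | SUMMARY
3. REMEDIATION: numbered steps
Severity must be one of CRITICAL, HIGH, MEDIUM, LOW."""

FINDING_LINE = re.compile(r"^(CRITICAL|HIGH|MEDIUM|LOW) \| \S+ \| \S+ \| .+$")

def parse_findings(agent_output: str) -> list:
    """Keep only lines that match the mandated finding format."""
    return [ln for ln in agent_output.splitlines() if FINDING_LINE.match(ln)]
```

Because the format is regex-checkable, malformed model output can be detected and retried instead of silently posting garbage to the MR.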
What's next for MR Review Pipeline
- Auto-push fix commits — The Security Fixer currently generates patch code; the next step is having it automatically commit fixes to the MR branch for Critical findings.
- Wiki reporting — Post a structured compliance audit report to the GitLab project wiki after every MR, building a persistent audit trail for SOC2 evidence.
- Custom rule engine — Let teams define their own compliance rules in AGENTS.md beyond the four built-in frameworks (e.g., internal coding standards, data residency policies).
- Slack/Teams notifications — Alert security or compliance teams when a CRITICAL finding is detected, before the MR can be merged.
- Historical trend dashboard — Track security debt and compliance posture over time across all MRs in a project.