Inspiration

Every DevOps team has experienced the frustration of a broken pipeline after a merge, a security vulnerability discovered in production, or a risky deployment that should have been caught in review. We noticed that every past GitLab AI Hackathon winner built a reactive tool, one that fixes problems after they happen. We asked: what if AI could predict failures before they occur?

That's GitLab Sentinel — shifting DevOps from reactive firefighting to proactive prevention.

What it does

When a developer submits a Merge Request, Sentinel automatically analyzes the code changes and predicts three types of risk:

  1. Pipeline Risk — Will CI/CD break? Detects dependency version conflicts, broken CI configs, missing test coverage
  2. Security Risk — Are there vulnerabilities? Catches hardcoded secrets (CWE-798), SQL injection (CWE-89), RCE risks (CWE-94)
  3. Delivery Risk — Is this MR too risky to merge? Flags large blast radius changes, missing tests, breaking API changes

Sentinel posts a structured analysis report directly as an MR comment with risk scores (0-10), specific findings with CWE references, and actionable prevention recommendations.
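
To make the report structure concrete, here is a minimal Python sketch of how such a comment could be modeled and rendered. The class names (Finding, RiskSection), the field names, and the plain-text layout are illustrative assumptions; the write-up does not specify Sentinel's actual output schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Finding:
    """One detected issue (hypothetical shape, not Sentinel's real schema)."""
    title: str
    cwe: Optional[str]       # e.g. "CWE-798"; None when no CWE applies
    recommendation: str      # actionable prevention advice


@dataclass
class RiskSection:
    """One of the three risk types with its 0-10 score."""
    name: str                # "Pipeline", "Security", or "Delivery"
    score: int               # 0 = no risk, 10 = highest risk
    findings: List[Finding] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject scores outside the 0-10 range described above.
        if not 0 <= self.score <= 10:
            raise ValueError(f"score must be in 0..10, got {self.score}")


def render_comment(sections: List[RiskSection]) -> str:
    """Render the sections as the text body of an MR comment."""
    lines = ["Sentinel Risk Report", ""]
    for section in sections:
        lines.append(f"{section.name} Risk: {section.score}/10")
        for finding in section.findings:
            tag = f" ({finding.cwe})" if finding.cwe else ""
            lines.append(f"  - {finding.title}{tag}: {finding.recommendation}")
    return "\n".join(lines)


if __name__ == "__main__":
    security = RiskSection(
        "Security",
        8,
        [Finding("Hardcoded API key", "CWE-798", "Move the key to a CI/CD variable")],
    )
    print(render_comment([security]))
```

Validating the score in the constructor means a malformed analyzer result fails early, before anything is posted to the MR.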

How we built it

Sentinel is built entirely on the GitLab Duo Agent Platform using Custom Agents and Custom Flows:

  • 2-Agent Lightning Architecture: A unified Sentinel Analyzer (combines triage + pipeline prediction + security scanning in one pass) feeds into a Sentinel Reporter that posts the structured MR comment
  • Strict Tool Budget: Maximum 12 tool calls per analysis (typical: 5-8), preventing agent over-exploration and ensuring fast response times
  • MR-Diff-First Analysis: The analyzer reads only the files in the MR diff and never scans the entire repository, which is faster and avoids findings unrelated to the change
  • Industry-Standard Evaluation: 4 benchmark scenarios with metrics including pass@k, tool trajectory, detection completeness, and false positive rate
  • 74 Offline Tests: pytest suite validating YAML constraints, output schemas, flow integrity, and prompt quality in 0.13s
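
For readers unfamiliar with the evaluation metrics named above, the following Python sketch shows one common way to compute pass@k and a false positive rate. This is an assumption about the definitions: the write-up does not say how Sentinel's benchmark computes them. pass_at_k uses the widely used unbiased estimator 1 - C(n-c, k) / C(n, k), where n is the number of runs and c the number that passed.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k runs, drawn from n runs of which
    c passed, is a passing run: 1 - C(n - c, k) / C(n, k)."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the result is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """FP / (FP + TN); defined as 0.0 when there are no negative cases."""
    total = false_positives + true_negatives
    return false_positives / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not results from the project.
    print(pass_at_k(n=10, c=3, k=1))
    print(false_positive_rate(false_positives=1, true_negatives=3))
```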

Tech stack: GitLab Duo Agent Platform, Anthropic Claude (via GitLab sandbox), Google Cloud Platform (BigQuery), Python (pytest)

Challenges we ran into

  • WebSocket Timeout: Our initial 4-agent serial chain took ~10 minutes and caused WebSocket disconnects (code 1006). We redesigned to a 2-agent architecture that completes in ~4 minutes
  • Agent Over-Exploration: Early versions made 153+ tool calls, scanning entire repos. We added strict tool budgets and "answer immediately" directives to keep calls under 15
  • Security Scanner Scope: The scanner initially read files from the main branch instead of the MR diff, which produced incorrect findings. We fixed this by making the analyzer start from list_merge_request_diffs
  • Platform Constraints: Through trial and error, we discovered undocumented rules (no DeterministicStep, no model field, string inputs cause a WebSocket disconnect)
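
The tool budget idea can be illustrated outside the platform with a small Python sketch. In Sentinel the budget is enforced through prompt directives, as described above; the ToolBudget wrapper below is a hypothetical programmatic analogue and is not part of the GitLab Duo Agent Platform.

```python
from typing import Any, Callable


class ToolBudgetExceeded(RuntimeError):
    """Raised when an analysis tries to exceed its tool call budget."""


class ToolBudget:
    """Counts tool calls and refuses any call beyond max_calls."""

    def __init__(self, max_calls: int = 12) -> None:
        self.max_calls = max_calls
        self.calls = 0

    @property
    def remaining(self) -> int:
        return self.max_calls - self.calls

    def call(self, tool: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        # Check the budget before invoking the tool so an over-budget call
        # never reaches the underlying API.
        if self.calls >= self.max_calls:
            raise ToolBudgetExceeded(
                f"tool budget of {self.max_calls} calls exhausted"
            )
        self.calls += 1
        return tool(*args, **kwargs)


if __name__ == "__main__":
    budget = ToolBudget(max_calls=2)
    # `len` stands in for a real tool such as a diff listing call.
    print(budget.call(len, "first"), budget.remaining)
    print(budget.call(len, "second"), budget.remaining)
    try:
        budget.call(len, "third")
    except ToolBudgetExceeded as exc:
        print("stopped:", exc)
```

A hard stop like this turns runaway exploration into an explicit error, where a prompt-only budget relies on the agent following instructions.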

Accomplishments that we're proud of

  • Predictive, not Reactive: To our knowledge, the first GitLab AI Hackathon entry to predict failures before they happen
  • 2-Agent Lightning Design: Solved the timeout problem by consolidating 4 agents into 2 without losing analysis depth
  • Comprehensive Testing: 74 offline tests + 4 benchmark evaluation scenarios with industry-standard metrics
  • Green Agent Design: Strict token and tool call budgets for efficiency

What we learned

  • The GitLab Duo Agent Platform is powerful but has many undocumented constraints that require careful testing
  • Fewer, smarter agents beat more, specialized agents when platform timeouts are a factor
  • Prompt engineering with strict tool budgets and procedural instructions dramatically improves agent reliability
  • MR-diff-first analysis is both faster and more accurate than full-repo scanning
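
As an illustration of diff-first analysis, the Python sketch below scans only the lines added by a unified diff for hardcoded secrets (CWE-798). It assumes the diff text for one file has already been fetched, for example from the output of list_merge_request_diffs. The regular expression is a deliberately simple, hypothetical pattern; Sentinel itself relies on an LLM agent and not on this rule.

```python
import re
from typing import Iterator, List, Tuple

# Hunk header such as "@@ -1,3 +1,4 @@"; group 1 is the new-file start line.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")

# Hypothetical pattern: a secret-like name assigned a quoted literal.
SECRET_PATTERN = re.compile(
    r"""(?i)(api[_-]?key|secret|password|token)\w*\s*[:=]\s*["'][^"']{8,}["']"""
)


def added_lines(diff_text: str) -> Iterator[Tuple[int, str]]:
    """Yield (new_file_line_number, text) for each line added by one file's diff."""
    new_no = None  # stays None until the first hunk header is seen
    for line in diff_text.splitlines():
        match = HUNK_HEADER.match(line)
        if match:
            new_no = int(match.group(1))
        elif new_no is None:
            continue  # file headers such as "--- a/x" and "+++ b/x"
        elif line.startswith("+"):
            yield new_no, line[1:]
            new_no += 1
        elif line.startswith(" "):
            new_no += 1  # context line: present in the new file, unchanged
        # Removed ("-") lines do not exist in the new file, so no increment.


def scan_diff_for_secrets(diff_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, text) for added lines that look like hardcoded secrets."""
    return [
        (number, text.strip())
        for number, text in added_lines(diff_text)
        if SECRET_PATTERN.search(text)
    ]


if __name__ == "__main__":
    sample = "\n".join([
        "--- a/app/config.py",
        "+++ b/app/config.py",
        "@@ -1,3 +1,4 @@",
        " import os",
        "-DEBUG = True",
        "+DEBUG = False",
        '+API_KEY = "sk_live_1234567890abcdef"',
        " PORT = 8080",
    ])
    for number, text in scan_diff_for_secrets(sample):
        print(f"line {number}: possible hardcoded secret: {text}")
```

Because removed and unchanged lines are never inspected, a secret that already exists on the main branch is not attributed to the MR under review.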

What's next for GitLab Sentinel: Predictive DevOps Intelligence

  • Auto-Fix Suggestions: Generate MR suggestions that fix detected issues automatically
  • Historical Learning: Use BigQuery to learn from past pipeline failures and improve predictions over time
  • CI/CD Integration: Trigger Sentinel automatically on every MR via GitLab webhooks
  • Custom Rule Engine: Let teams define project-specific risk rules and thresholds

Built With

  • gitlab-duo-agent-platform