GitLab Sentinel: Predictive DevOps Intelligence

Inspiration

Every DevOps team has experienced the frustration of a broken pipeline after merge, a security vulnerability discovered in production, or a risky deployment that should have been caught in review. We noticed that every past GitLab AI Hackathon winner built reactive tools — they fix problems after they happen. We asked: what if AI could predict failures before they occur?

That's GitLab Sentinel — shifting DevOps from reactive firefighting to proactive prevention.

What it does

When a developer submits a Merge Request, Sentinel automatically analyzes the code changes and predicts three types of risk:

Pipeline Risk: Will CI/CD break? Detects dependency version conflicts, broken CI configs, and missing test coverage.
Security Risk: Are there vulnerabilities? Catches hardcoded secrets (CWE-798), SQL injection (CWE-89), and code injection that can lead to remote code execution (CWE-94).
Delivery Risk: Is this MR too risky to merge? Flags changes with a large blast radius, missing tests, and breaking API changes.
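
To make the security check concrete, here is a minimal sketch of scanning only the lines an MR adds for hardcoded credentials (CWE-798). It is not Sentinel's implementation, which relies on an agent prompt. The regular expression, the `added_lines` helper, and the finding format are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for illustration; real secret detection needs many more rules.
SECRET_PATTERN = re.compile(
    r"""(?i)(password|passwd|secret|api_key|token)\s*[:=]\s*["'][^"']{8,}["']"""
)


def added_lines(diff_text):
    """Yield (new_file_line_number, text) for each line added in a unified diff."""
    new_line = 0
    for raw in diff_text.splitlines():
        if raw.startswith("@@"):
            # Hunk header: @@ -old_start,old_len +new_start,new_len @@
            match = re.match(r"@@ -\d+(?:,\d+)? \+(\d+)", raw)
            new_line = int(match.group(1)) if match else 0
        elif raw.startswith("+++") or raw.startswith("---"):
            continue  # file headers (simplification: content starting this way is skipped)
        elif raw.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        elif raw.startswith("+"):
            yield new_line, raw[1:]
            new_line += 1
        elif raw.startswith("-"):
            continue  # removed lines do not advance the new-file counter
        else:
            new_line += 1  # unchanged context line


def find_hardcoded_secrets(diff_text):
    """Return one finding per added line that looks like a hardcoded credential."""
    return [
        {"cwe": "CWE-798", "line": number, "text": text.strip()}
        for number, text in added_lines(diff_text)
        if SECRET_PATTERN.search(text)
    ]
```

Because only added lines are inspected, a secret that already exists on the target branch is not reported against the MR, which matches the diff-only scope described in this writeup.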

Sentinel posts a structured analysis report directly as an MR comment with risk scores (0-10), specific findings with CWE references, and actionable prevention recommendations.
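
The report described above could be modeled roughly as follows. This is a sketch only: the class names, field names, and comment layout are assumptions, since the actual output schema is not shown here. The only constraints taken from the text are the three risk categories, the 0-10 score range, the optional CWE reference, and a recommendation per finding.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    category: str        # "pipeline", "security", or "delivery"
    summary: str
    recommendation: str
    cwe: str = ""        # e.g. "CWE-798"; empty when no CWE applies


@dataclass
class RiskReport:
    pipeline_risk: int
    security_risk: int
    delivery_risk: int
    findings: list = field(default_factory=list)

    def __post_init__(self):
        # Scores are defined on a 0-10 scale; reject anything outside it.
        for name in ("pipeline_risk", "security_risk", "delivery_risk"):
            score = getattr(self, name)
            if not 0 <= score <= 10:
                raise ValueError(f"{name} must be between 0 and 10, got {score}")

    def to_comment(self):
        """Render the report as the plain-text body of an MR comment."""
        lines = [
            "Sentinel Risk Report",
            f"Pipeline risk: {self.pipeline_risk}/10",
            f"Security risk: {self.security_risk}/10",
            f"Delivery risk: {self.delivery_risk}/10",
        ]
        for item in self.findings:
            ref = f" ({item.cwe})" if item.cwe else ""
            lines.append(
                f"- [{item.category}] {item.summary}{ref}. Fix: {item.recommendation}"
            )
        return "\n".join(lines)
```

Validating the score range at construction time means a malformed analyzer output fails before anything is posted to the MR.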

How we built it

Sentinel is built entirely on the GitLab Duo Agent Platform using Custom Agents and Custom Flows:

2-Agent Lightning Architecture: A unified Sentinel Analyzer (triage, pipeline prediction, and security scanning in one pass) feeds into a Sentinel Reporter that posts the structured MR comment.
Strict Tool Budget: Maximum 12 tool calls per analysis (typical: 5-8), which prevents agent over-exploration and keeps response times short.
MR-Diff-First Analysis: The analyzer reads only the files in the MR diff and never scans the entire repository. This is faster, and it avoids the incorrect findings we saw when the scanner read files from the main branch.
Industry-Standard Evaluation: 4 benchmark scenarios scored on pass@k, tool trajectory, detection completeness, and false positive rate.
74 Offline Tests: A pytest suite that validates YAML constraints, output schemas, flow integrity, and prompt quality, and runs in 0.13s.
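
In Sentinel the tool budget is enforced through prompt instructions. As an illustration of the same idea in code, the sketch below shows a host-side wrapper that counts tool calls and refuses to go past a limit. The `ToolBudget` class and its method names are hypothetical and are not part of the GitLab Duo Agent Platform.

```python
class ToolBudgetExceeded(RuntimeError):
    """Raised when an analysis tries to use more tool calls than allowed."""


class ToolBudget:
    def __init__(self, max_calls=12):
        self.max_calls = max_calls
        self.calls = []  # names of the tools invoked so far, in order

    @property
    def remaining(self):
        return self.max_calls - len(self.calls)

    def call(self, tool, *args, **kwargs):
        """Invoke `tool` if budget remains; otherwise stop the analysis."""
        if self.remaining <= 0:
            raise ToolBudgetExceeded(
                f"budget of {self.max_calls} tool calls exhausted after {self.calls}"
            )
        self.calls.append(tool.__name__)
        return tool(*args, **kwargs)
```

The recorded `calls` list doubles as a tool trajectory, so the same wrapper could feed an evaluation that checks which tools were used and in what order.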

Tech stack: GitLab Duo Agent Platform, Anthropic Claude (via GitLab sandbox), Google Cloud Platform (BigQuery), Python (pytest)

Challenges we ran into

WebSocket Timeout: Our initial 4-agent serial chain took ~10 minutes and caused WebSocket disconnects (code 1006). We redesigned it as a 2-agent architecture that completes in ~4 minutes.
Agent Over-Exploration: Early versions made 153+ tool calls and scanned entire repos. We added strict tool budgets and "answer immediately" directives to keep calls under 15.
Security Scanner Scope: The scanner initially read files from the main branch instead of the MR diff, which produced incorrect findings. We fixed this by making the analyzer start from list_merge_request_diffs.
Platform Constraints: We discovered undocumented rules (no DeterministicStep, no model field, string inputs cause a WebSocket disconnect) through trial and error.
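
Constraints like these are natural targets for offline checks. The sketch below lints a flow definition that has already been parsed into a Python dict. The key names (`components`, `name`, `type`, `model`, `inputs`) are assumptions about the flow layout, and treating "string inputs" as bare string entries in an input list is one possible reading of that rule, not a documented platform behavior.

```python
def lint_flow(flow):
    """Return a list of human-readable problems found in a parsed flow definition."""
    problems = []
    for index, component in enumerate(flow.get("components", [])):
        name = component.get("name", f"component[{index}]")
        # Rule 1: DeterministicStep components were rejected by the platform.
        if "DeterministicStep" in str(component.get("type", "")):
            problems.append(f"{name}: DeterministicStep is not allowed")
        # Rule 2: a per-component model field was rejected.
        if "model" in component:
            problems.append(f"{name}: 'model' field is not allowed")
        # Rule 3 (assumed reading): inputs must be structured, not bare strings.
        for item in component.get("inputs", []):
            if isinstance(item, str):
                problems.append(f"{name}: input {item!r} is a bare string")
    return problems
```

A pytest case can then assert that `lint_flow` returns an empty list for every flow file in the repository, which catches a regression without contacting the platform.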

Accomplishments that we're proud of

Predictive, not Reactive: To our knowledge, the first GitLab AI Hackathon entry to predict failures before they happen.
2-Agent Lightning Design: Solved the timeout problem by consolidating 4 agents into 2 without losing analysis depth.
Comprehensive Testing: 74 offline tests plus 4 benchmark evaluation scenarios with industry-standard metrics.
Green Agent Design: Strict token and tool call budgets for efficiency.
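
For readers unfamiliar with two of the metrics named above, here is a minimal sketch of how pass@k and false positive rate are commonly computed. It uses the widely cited unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), for n runs of which c passed. Whether Sentinel's evaluation uses exactly this formula is an assumption, and the function names are illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate the probability that at least one of k sampled runs passes,
    given n total runs of which c passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k runs failed, giving pass@k = 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def false_positive_rate(false_positives, true_negatives):
    """Share of clean cases that were wrongly flagged as risky."""
    total = false_positives + true_negatives
    if total == 0:
        raise ValueError("need at least one negative case")
    return false_positives / total
```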

What we learned

The GitLab Duo Agent Platform is powerful but has many undocumented constraints that only surface through careful testing.
A few broad agents beat many specialized agents when platform timeouts are a factor.
Prompts with strict tool budgets and procedural instructions dramatically improve agent reliability: our tool calls dropped from 153+ to under 15.
MR-diff-first analysis is faster than full-repo scanning and avoids the false results we got from reading files outside the diff.

What's next for GitLab Sentinel

Auto-Fix Suggestions: Generate MR suggestions that fix detected issues automatically.
Historical Learning: Use BigQuery to learn from past pipeline failures and improve predictions over time.
CI/CD Integration: Trigger Sentinel automatically on every MR via GitLab webhooks.
Custom Rule Engine: Let teams define project-specific risk rules and thresholds.

Built With

gitlab-duo-agent-platform

Updates

LIUWEI Wei started this project — Mar 25, 2026 12:44 AM EDT
