SentryFlow

Inspiration

Every DevOps team has experienced this: a pipeline fails at 2 AM, a production alert fires, and engineers scramble to diagnose and fix the issue. We asked ourselves — what if GitLab could heal itself?

SentryFlow was inspired by the concept of self-healing systems in Site Reliability Engineering. Instead of engineers manually reading logs, tracing errors, and writing fixes, we built an AI-powered agent flow that does it all automatically — from detection to diagnosis to fix deployment.

What it does

SentryFlow is a four-agent self-healing DevOps flow built on the GitLab Duo Agent Platform:

🔍 Sentinel Agent — Ingests pipeline failure events and GCP Cloud Monitoring alerts, normalizing them into a unified incident schema
🔬 Diagnostician Agent — Performs root cause analysis by reading job logs, analyzing code diffs, checking recent commits, and classifying failures (test failures, dependency issues, config errors, infrastructure timeouts)
🔧 Surgeon Agent — Takes action based on confidence level: creates auto-fix merge requests (high confidence), rollback MRs (production incidents), or triage issues (low confidence)
📋 Reporter Agent — Posts structured summaries, links related artifacts, and maintains an audit trail
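
The Surgeon's confidence-based choice can be sketched in Python as below. This is an illustration rather than the project's code: the names `Action` and `choose_action`, the string value "high", and the rule that production incidents are checked before confidence are all assumptions.

```python
from enum import Enum


class Action(Enum):
    AUTO_FIX_MR = "auto_fix_merge_request"
    ROLLBACK_MR = "rollback_merge_request"
    TRIAGE_ISSUE = "triage_issue"


def choose_action(confidence: str, is_production_incident: bool) -> Action:
    """Pick one action for a diagnosed incident.

    Assumptions (not confirmed by the project): production incidents are
    checked first, and anything other than "high" confidence falls back
    to a triage issue for a human to review.
    """
    if is_production_incident:
        return Action.ROLLBACK_MR
    if confidence == "high":
        return Action.AUTO_FIX_MR
    return Action.TRIAGE_ISSUE
```

Defaulting to a triage issue keeps the risky path (an automatic merge request) opt-in: it is only taken when confidence is explicitly high.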

The flow handles edge cases like flaky test detection, multi-job failure analysis, and duplicate alert suppression.
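
One common way to suppress duplicate alerts is to fingerprint each incident and drop repeats seen within a time window. The sketch below shows that general technique only. The fingerprint fields, the 600-second window, and the names `fingerprint` and `DuplicateSuppressor` are hypothetical and not taken from SentryFlow.

```python
import hashlib


def fingerprint(source: str, project: str, summary: str) -> str:
    """Build a stable key for an incident (field choice is an assumption)."""
    raw = "|".join([source, project, summary.strip().lower()])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class DuplicateSuppressor:
    """Flags incidents whose fingerprint was seen within a sliding window."""

    def __init__(self, window_seconds: float = 600.0) -> None:
        self.window_seconds = window_seconds
        self._last_seen: dict[str, float] = {}

    def is_duplicate(self, key: str, now: float) -> bool:
        last = self._last_seen.get(key)
        # Record every sighting, so a steady stream of repeats stays suppressed.
        self._last_seen[key] = now
        return last is not None and now - last < self.window_seconds
```

Passing `now` explicitly (for example from `time.time()`) keeps the class deterministic and easy to test.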

How we built it

SentryFlow is built entirely on the GitLab Duo Agent Platform using:

Custom Agents (YAML) — Four agents with specialized system prompts and curated tool sets
Custom Flow (YAML v1 schema) — Linear orchestration: Sentinel → Diagnostician → Surgeon → Reporter
AGENTS.md — Project-level context defining the unified incident schema, failure categories, confidence levels, and coding conventions
SKILL.md files — Reusable patterns for pipeline diagnosis, incident correlation, and GCP integration
Unified Incident Schema — A JSON contract that flows between all four agents, ensuring structured communication

The architecture follows a clear separation of concerns — each agent has a single responsibility and communicates through a well-defined schema.
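
To make the idea of a JSON contract concrete, here is a minimal Python sketch of an incident record that each agent could read, extend, and pass on. The field names are illustrative guesses based on the descriptions above. The real schema lives in the project's AGENTS.md and may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Incident:
    # Every field name below is hypothetical, not the project's real schema.
    incident_id: str
    source: str                               # "pipeline", "gcp_alert", or "mention"
    summary: str
    failure_category: Optional[str] = None    # set by the Diagnostician
    confidence: Optional[str] = None          # set by the Diagnostician
    actions: list[str] = field(default_factory=list)  # appended by the Surgeon

    def to_json(self) -> str:
        """Serialize to the JSON string handed to the next agent."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "Incident":
        """Parse a payload; unknown or missing required keys raise TypeError."""
        return cls(**json.loads(payload))
```

Because `from_json` rejects unexpected keys, a mismatch between two agents' view of the contract fails loudly at the handoff instead of silently dropping data.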

Challenges we faced

Group CI Policy Override — The hackathon group's pipeline execution policy overrides project-level .gitlab-ci.yml, preventing us from triggering real pipeline failures. We adapted by using the Mention trigger with pre-filled incident context.
No Pipeline Event Trigger — GitLab Duo flows currently only support Mention, Assign, and Assign Reviewer triggers — not pipeline failure events. We designed the Sentinel agent to handle mention-based triggers as a third path alongside pipeline failures and GCP alerts.
Template Variable Mapping — We discovered that flow component input names must exactly match prompt template variable names ({{sentinel}}, not {{incident_context}}). Debugging this required analyzing session logs and understanding the flow engine's variable injection mechanism.
Platform Configuration — We encountered rootNamespaceId resolution issues and token scope limitations in the hackathon sandbox environment, which required careful debugging of the Duo CLI execution logs.
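
The template variable mismatch described above is the kind of error a small standalone check can catch before a flow runs. The Python sketch below assumes only the `{{name}}` placeholder form shown in this writeup. It does not call or model the flow engine itself, and `unmatched_placeholders` is a hypothetical helper name.

```python
import re

# Matches {{name}} with optional inner whitespace; this syntax is an assumption.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def unmatched_placeholders(template: str, input_names: set[str]) -> set[str]:
    """Return placeholder names in the template that no declared input supplies."""
    return set(PLACEHOLDER.findall(template)) - input_names
```

For example, checking the prompt "Diagnose this incident: {{incident_context}}" against the declared input name "sentinel" reports "incident_context" as unmatched, which is the mistake described in the challenge.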

What we learned

The GitLab Duo Agent Platform is a powerful framework for building multi-agent workflows
Designing a clear data contract (unified incident schema) between agents is critical for reliable multi-agent systems
Self-healing systems need confidence-based routing — not every fix should be auto-applied
Edge case handling (flaky tests, cascading failures, duplicate alerts) separates a toy demo from a production-ready system

What's next for SentryFlow

GCP Cloud Function webhook that triggers the agent flow automatically whenever a pipeline fails
BigQuery audit trail for tracking all incidents and resolutions
Learning loop — agents improve diagnosis accuracy based on whether their MRs were accepted or rejected
Multi-project support — one SentryFlow instance monitoring an entire group's pipelines

Built With

ai-agents
ci-cd
cloud-monitoring
devops
gcp
gitlab
gitlab-duo
python
yaml

Updates

Subbarao Sanka started this project — Mar 24, 2026 12:05 AM EDT
