The Problem

Every software developer knows the "Post-Incident Dread." After pulling an all-nighter to fix a production bug, the last thing anyone wants to do is spend three hours manually digging through Slack logs, GitLab MRs, and commit histories to write a postmortem. As a result, documentation is often rushed, bias creeps in, and remediation tasks get lost in the backlog. This is Institutional Amnesia: repeating the same mistake because the lessons learned were never turned into actionable GitLab issues.

The Solution

ProdFireDoctor is an automated agent workflow built directly into GitLab. It doesn't just "summarize text"; it reconstructs the technical narrative of an incident. By connecting Claude Sonnet to the GitLab API, we’ve built a system that autonomously investigates the root cause and generates the documentation required to prevent recurrence.

How It Works

We use a multi-agent orchestration pattern:

  1. The Trigger: A simple @ai-incident-management-flow-gitlab-ai-hackathon mention of the bot in any GitLab Incident or Issue kicks off the process.
  2. Agent 1 (The Incident Management Agent): This agent performs the heavy lifting. It uses the GitLab API to "crawl" the context: pulling the incident timeline, scraping linked MRs, and analyzing the commit diffs immediately preceding the outage. Claude’s reasoning allows it to distinguish between "noise" and the actual breaking change.
  3. Agent 2 (The Post Mortem Agent): This agent takes the Incident Management Agent’s technical findings and formats them into a standardized postmortem template. Beyond just a summary, it identifies specific contributing factors and drafts the follow-up GitLab Issues needed for a permanent fix.
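
The three steps above can be sketched in plain Python. This is an illustrative outline rather than the project's actual code: the Commit class, the function names, the six-hour lookback window, and the template layout are all assumptions, and the real agents fetch their data from the GitLab API instead of receiving it as arguments.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# The mention string is taken from the write-up; everything else is hypothetical.
BOT_MENTION = "@ai-incident-management-flow-gitlab-ai-hackathon"


@dataclass
class Commit:
    sha: str
    title: str
    committed_at: datetime


def is_trigger(comment_body: str) -> bool:
    """Step 1: the flow starts when a comment mentions the bot."""
    return BOT_MENTION in comment_body


def commits_before_outage(commits, outage_start, lookback=timedelta(hours=6)):
    """Step 2: keep commits inside the lookback window, newest first."""
    window_start = outage_start - lookback
    candidates = [
        c for c in commits if window_start <= c.committed_at <= outage_start
    ]
    return sorted(candidates, key=lambda c: c.committed_at, reverse=True)


def draft_postmortem(incident_title, suspects, follow_ups):
    """Step 3: render the findings into a plain-text postmortem template."""
    lines = [f"Postmortem: {incident_title}", "", "Suspect changes:"]
    lines += [f"  - {c.sha[:8]} {c.title}" for c in suspects] or ["  - none found"]
    lines += ["", "Follow-up issues to file:"]
    lines += [f"  - {item}" for item in follow_ups] or ["  - none proposed"]
    return "\n".join(lines)


if __name__ == "__main__":
    outage = datetime(2024, 1, 1, 12, 0)
    history = [
        Commit("a" * 40, "Refactor cache layer", outage - timedelta(hours=10)),
        Commit("b" * 40, "Bump connection pool size", outage - timedelta(hours=1)),
    ]
    if is_trigger(f"{BOT_MENTION} please investigate"):
        suspects = commits_before_outage(history, outage)
        print(draft_postmortem("API outage", suspects, ["Add pool size alert"]))
```

In the real workflow, deciding which of the candidate commits is the breaking change is left to the model; the time window only narrows what it has to read.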

Why Claude?

We leaned heavily on Claude Sonnet for two specific reasons:

  1. Complex Synthesis: Claude excels at taking messy, unstructured incident timelines and turning them into professional, cohesive narratives.
  2. Precision: Using Claude’s tool-use capabilities, the agent can accurately query GitLab’s GraphQL API to fetch only the relevant code changes, reducing noise and cost.
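
To make the precision point concrete, here is a minimal sketch of how a tool could be described to Claude and how its input could be turned into a narrow GraphQL request. The tool name fetch_mr_diff_stats, its parameters, and the helper build_graphql_request are hypothetical. The dictionary follows the name / description / input_schema shape used for Anthropic tool definitions, and the query fields are assumed from GitLab's public GraphQL schema, so check them against the GraphQL reference for your GitLab version before relying on them.

```python
import json

# Hypothetical tool definition in the shape used for Anthropic tool use.
FETCH_MR_DIFF_STATS_TOOL = {
    "name": "fetch_mr_diff_stats",
    "description": "Fetch the title and diff size of one GitLab merge request.",
    "input_schema": {
        "type": "object",
        "properties": {
            "project_path": {"type": "string", "description": "e.g. group/project"},
            "mr_iid": {"type": "string", "description": "Merge request IID"},
        },
        "required": ["project_path", "mr_iid"],
    },
}

# Requests only the fields the agent needs. Field names are an assumption
# based on GitLab's public GraphQL schema; verify before use.
MR_QUERY = """
query($fullPath: ID!, $iid: String!) {
  project(fullPath: $fullPath) {
    mergeRequest(iid: $iid) {
      title
      webUrl
      diffStatsSummary { additions deletions fileCount }
    }
  }
}
"""


def build_graphql_request(tool_input: dict) -> dict:
    """Turn the input of a tool_use block into a GraphQL request body."""
    required = FETCH_MR_DIFF_STATS_TOOL["input_schema"]["required"]
    missing = [key for key in required if key not in tool_input]
    if missing:
        raise ValueError(f"missing tool input fields: {missing}")
    return {
        "query": MR_QUERY,
        "variables": {
            "fullPath": tool_input["project_path"],
            "iid": str(tool_input["mr_iid"]),
        },
    }


if __name__ == "__main__":
    body = build_graphql_request({"project_path": "group/project", "mr_iid": "42"})
    print(json.dumps(body["variables"]))
```

The resulting dictionary would be sent as the JSON body of a POST to the instance's GraphQL endpoint. Asking for a few named fields rather than full diffs is what keeps both the response and the prompt small.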

GitLab Integration

Our project lives where the code lives. By utilizing GitLab Issues, Incidents, and the API, we’ve created a seamless experience that feels like a native part of the GitLab Duo suite.

What’s Next?

  • Predictive Prevention: Training the agent to flag similar patterns in MRs before they are merged.
  • Slack/Teams Integration: Triggering the workflow directly from a Slack or Teams chat.

Built With

  • anthropic
  • claude
  • gitlab