### 🌟 Inspiration

Modern engineering teams spend 8-12 hours per sprint on manual issue triage: classifying types, assessing risks, assigning priorities, tracking sprint capacity, and producing stakeholder reports.

GitLab provides powerful Foundational Agents - Code Suggestions, Chat, Explain Code but they are interactive assistants, not autonomous systems. They help humans think through problems, but they don't operate independently.

This project started with a focused question:

What if GitLab issue triage worked like a real engineering team: multiple specialists, each with a defined role, collaborating autonomously, and delivering transparent, auditable decisions?

The GitLab Multi-Agent Triage Observatory demonstrates exactly that: a production-grade multi-agent architecture where each agent has a single responsibility, generates observable reasoning, and takes real action inside GitLab through the API.

Not a chatbot. Not a wrapper. A real autonomous triage pipeline.

🤖 What It Does

The GitLab AI Triage System is a fully autonomous multi-agent pipeline that analyzes, classifies, prioritizes, and summarizes GitLab issues at scale - without human intervention.

Core Capabilities:

  • Automatic issue classification - type (security, performance, feature, bug), complexity (low/medium/high), technical uncertainty
  • Proactive risk assessment - security impact, business risk, scalability concerns, regulatory implications (OWASP, GDPR, CVSS scoring)
  • Sprint capacity planning - velocity tracking, team availability, WIP limits, blocking dependencies
  • Executive digests - stakeholder-ready summaries with automated Slack/email distribution
  • Batch processing - triage 50+ issues in under 2 minutes (vs 8-12 hours manual)
  • Transparent reasoning - every decision includes structured audit trail with agent-by-agent justification
  • GitLab API integration - automatic label application, priority updates, comment posting
  • Fault-tolerant execution - circuit breakers, health checks, graceful degradation when agents fail Architecture:

Five specialized agents, each running as an independent FastAPI microservice, communicating through the Model Context Protocol (MCP):

  1. Orchestrator Agent (port 8500) - coordinates pipeline, manages GitLab API, applies results
  2. Planner Agent (port 8501) - NLP classification, complexity scoring, decomposition
  3. Progress Agent (port 8502) - sprint velocity, team capacity, scheduling recommendations
  4. Risks Agent (port 8503) - security assessment, business impact, compliance validation
  5. Digest Agent (port 8504) - executive summaries, label generation, stakeholder reports

The result is not "AI in a chat window." It's an autonomous triage team operating inside GitLab.

🏗️ How We Built It

The system is implemented as a distributed microservice architecture using modern DevOps practices:

Technology Stack:

  • Agents: Python 3.12 + FastAPI (5 independent services)
  • AI: Anthropic Claude for reasoning and classification
  • Protocol: Model Context Protocol (MCP) for structured agent communication
  • Containerization: Docker + Docker Compose (multi-container orchestration)
  • API Integration: GitLab REST API v4 with OAuth2 scoping
  • Frontend: Vanilla JavaScript + WebSocket for real-time updates
  • Observability: Prometheus metrics, structured logging, health checks

Each agent exposes:

  • /health endpoint for circuit breaker monitoring
  • /mcp tool interface with structured request/response
  • Reasoning logs with step-by-step justification
  • Unified metrics (latency, error rates, throughput)

Engineering Focus Areas:

  1. Observability-first design - every agent decision is logged and traceable
  2. Failure isolation - one agent failing doesn't break the entire pipeline (graceful degradation)
  3. Deterministic fallbacks - when LLMs hit rate limits, use baseline heuristics
  4. Parallel execution - sub-agents process issues concurrently (5× throughput improvement)
  5. Stateless services - horizontal scaling via Kubernetes or Cloud Run
  6. Input sanitization - prompt injection protection at every agent boundary
  7. Circuit breakers - auto-disable failing agents after 3 consecutive errors

Architecture Philosophy:
This mirrors real engineering teams: clear roles, accountability, predictable behavior, and observable decision-making.

Python vs Go:
We intentionally implemented the entire MCP stack in Python for maximum transparency, debuggability, and UI observability during live demonstrations. The architecture cleanly separates reasoning from orchestration, making it straightforward to migrate the control plane to Go using Anthropic's official go-mcp implementation in future iterations.


⚠️ Challenges We Ran Into

Building a real multi-agent system not a single LLM wrapper came with real engineering challenges:

1. Avoiding Overlap with GitLab Foundational Agents

Challenge: GitLab already has Planner, Security Analyst, and Data Analyst agents.

Solution: We focused on autonomous triage, not interactive chat assistance. Our agents act (update GitLab, apply labels, schedule sprints), while Foundational Agents assist (help humans write issues, explain CVEs). Zero functional overlap - fully complementary.

Key differentiator: Our Planner auto-classifies issues in batch; GitLab's Planner helps write issues interactively.

2. Designing Agents That Act, Not Just Talk

Challenge: Most AI agent demos are conversational - they suggest, but don't execute.

Solution: Direct GitLab API integration. Our Orchestrator has scoped write permissions (labels, comments, priorities only - no code changes, no repository access). Every triage run produces real GitLab updates.

Security approach: Only Orchestrator touches GitLab API. Reasoning agents (Planner, Risks, Progress, Digest) have zero external access - they analyze and decide, but cannot act. This enforces separation of concerns.

3. Ensuring Transparent Reasoning

Challenge: LLM decisions are often opaque black boxes.

Solution: Every agent returns a structured reasoning trail:

{
  "step_number": 3,
  "agent": "Planner",
  "description": "Classified as security issue",
  "input": {"title": "JWT tokens not expiring"},
  "output": {"type": "security", "complexity": "high", "cvss": 8.1},
  "timestamp": "2026-03-13T10:23:45Z"
}

🏆 Accomplishments We're Proud Of

True multi-agent architecture - not a single LLM wrapper with prompt engineering

Full autonomous pipeline - GitLab API updates without human approval (in production mode)

Transparent reasoning - every decision includes structured audit trail visible in UI

Proactive risk assessment - identifies security issues before development begins (something GitLab's Security Analyst doesn't cover - it explains CVEs post-implementation)

Sprint health monitoring - velocity tracking and capacity planning (beyond GitLab's Data Analyst capabilities)

Complementary design - enhances GitLab Duo, doesn't duplicate Foundational Agents

Production-ready deployment - fully containerized, health-checked, observable

Clean separation of concerns - each agent behaves like a real teammate with a single job

Demo Mode with visual indicator - safe hackathon demonstration without repository risk

Performance: 90% time reduction (20 issues: 12 minutes → 38 seconds)

Security: Input sanitization at every boundary, prompt injection protection, scoped API access

📚 What We Learned

1. Multi-agent systems shine when designed like engineering teams, not chatbots

Traditional LLM wrappers use a single prompt blob. We learned that separating responsibilities into focused agents produces more reliable, maintainable systems. Each agent has a clear job just like a real team.

2. Observability is essential for AI systems

AI needs health checks, metrics, and circuit breakers too. We learned to treat agents like microservices: expose /health, log reasoning, track error rates, auto-disable on failure. This makes production deployment viable.

3. Structured protocols like MCP make agent ecosystems scalable

Without MCP, we'd have ad-hoc JSON schemas and brittle communication. MCP gave us standardized tool invocation, error handling, and logging. This made adding new agents trivial — just implement the MCP interface.

4. Hybrid intelligence (LLM + deterministic rules) is the key to reliability

Pure LLM systems fail under rate limits or malformed output. Pure rules are brittle. We learned that LLM reasoning with deterministic fallbacks achieves 99.9% uptime while maintaining intelligent decision-making.

5. Autonomous systems must be explainable to earn trust

No one trusts a black-box AI that applies labels automatically. We learned to make every decision auditable with step-by-step reasoning trails. This transparency is non-negotiable for production adoption.

🚀 What's Next for GitLab AI Triage System

Long-term (6-12 months):

  • Cloud-native scale - deploy on Google Cloud Run or AWS Fargate (auto-scaling)
  • Multi-repository support - triage across 100+ repos from single control plane
  • GitLab Duo API integration - use Code Suggestions for automated fix proposals
  • Go migration - rewrite orchestrator in Go using go-mcp for 10x performance

Mission:

Automate the painful parts of issue triage using agents built like real systems — not chatbots.

Built With

Share this project:

Updates