Inspiration
I ran a 170-person technical support help desk for a Fortune 500 company. Every week, the same patterns repeated: a deployment would break something, three engineers would spend hours triaging, someone would find the root cause, fix it, and move on. The postmortem would get deprioritized. Two months later, the exact same failure pattern would take down a different service — and a completely different team would burn another four hours rediscovering the same root cause from scratch.
The knowledge was there. It lived in Slack threads, in someone's head, in a Confluence page nobody could find. But it never made it back into the system in a way that prevented the next incident.
At scale — 170 engineers, hundreds of services, thousands of deployments a week — this isn't a process problem. It's an immune system problem. The human body doesn't re-learn every pathogen from scratch. It remembers. It recognizes. It acts faster the second time. I wanted that for repositories.
What it does
Reflex is an adaptive immune system for GitLab repositories. It's 17 AI agents across 4 orchestrated flows built on the GitLab Duo Agent Platform:
Flow 1 — Incident Response (8 agents): When a pipeline fails, Reflex searches its knowledge graph for similar past incidents, triages severity, then runs an adversarial debate protocol where the Root Cause Agent proposes a hypothesis and the Challenger Agent independently attacks it. Only debate-verified diagnoses reach the Fix Agent. After deployment, a blameless postmortem is written automatically and the knowledge graph is updated.
Flow 2 — Sentinel (3 agents): Every new merge request is scanned against the knowledge graph. If incoming code matches patterns that previously caused incidents, developers are warned with full context before the code merges.
Flow 3 — Harden (3 agents): After resolving an incident, Reflex generalizes the vulnerability into abstract patterns and scans the entire codebase for similar weaknesses, creating hardening MRs proactively.
Flow 4 — CrossProject (3 agents): The same pattern broadcast extends across the entire GitLab group. One incident resolved in one project triggers hardening MRs in sibling projects.
The knowledge graph is committed to Git — versioned, diffable, code-reviewable. It's not a black box. It's organizational memory that lives alongside the code it protects.
How we built it
The core insight was that incident response isn't a linear pipeline — it's a feedback loop. Every resolved incident should make the system stronger. That meant building three things:
1. The Knowledge Graph Engine (Python) — A persistent, JSON-backed graph stored in .reflex/knowledge/incidents.json. Each incident node contains the root cause, fix strategy, normalized error signatures, and searchable patterns with regex. Similarity matching uses a weighted combination of signature and name similarity:
$$\text{score} = 0.6 \times \text{sig_similarity} + 0.4 \times \text{name_similarity}$$
Recurrence detection automatically increments counters when new incidents match existing patterns, building a frequency map of organizational failure modes.
2. The Debate Protocol — Inspired by adversarial ML verification. The Root Cause Agent and Challenger Agent work independently on the same evidence. The Challenger issues one of three verdicts: Confirmed (use original diagnosis), Refined (mostly right, adjusted), or Rejected (wrong, alternative proposed). Low-confidence diagnoses skip the Fix Agent entirely and go straight to postmortem for human investigation. This prevents the most dangerous failure mode in AI-assisted incident response: confidently applying the wrong fix.
3. Carbon-Aware Scheduling — Every agent step logs token counts and wall-clock time. Energy consumption is estimated using published inference benchmarks, and carbon footprint is calculated using EPA eGRID 2023 regional grid intensity data adjusted for cloud provider renewable energy usage:
$$CO_2 = \text{tokens} \times \frac{0.002 \text{ kWh}}{1000 \text{ tokens}} \times 0.39 \frac{\text{kg } CO_2}{\text{kWh}} \times 0.6_{\text{renewables}}$$
Every postmortem includes a sustainability comparison against the human baseline of ~3 hours of engineer time with laptop, monitor, and meeting overhead (~250g CO₂).
The flows are built entirely on GitLab Duo Agent Platform's v1 flow registry, with Google Cloud Logging and Monitoring integration for richer diagnostic context than pipeline logs alone.
Challenges we ran into
Getting the debate protocol right. Early versions had the Challenger Agent rubber-stamping everything because it received the Root Cause output as context and anchored on it. The fix was making the Challenger read the raw evidence independently and only receive the hypothesis after forming its own assessment. The prompt engineering to prevent anchoring bias while still enabling productive disagreement took significant iteration.
Knowledge graph cold start. An empty knowledge graph provides zero value. We solved this by seeding it with three realistic incidents that demonstrate the full pattern — including one recurrence (INC-001 and INC-003 share the same "unsafe direct access on nullable" pattern). This shows evaluators the system's memory in action rather than asking them to imagine it.
Flow schema compliance. The GitLab Duo Agent Platform's flow registry schema has strict validation — additionalProperties: false on components, specific tool name enums, wrapper format requirements. Getting 4 flows with 17 total components to pass validation required careful iteration against the schema.
What we learned
The biggest lesson from running that 170-person help desk was that the problem was never the individual incident — it was the forgetting. Smart engineers solved hard problems every day. But the solutions evaporated into Slack threads and tribal knowledge. Reflex is our attempt to make forgetting structurally impossible.
Building it reinforced that AI agents are most powerful not as individual actors, but as systems with memory, disagreement, and feedback loops. The debate protocol alone catches failure modes that no single-agent pipeline ever would. And the knowledge graph turns a reactive tool into a proactive immune system — one that gets stronger with every incident it resolves.
What's next
- Real-time telemetry integration — Stream Cloud Monitoring metrics directly into the triage agent for faster severity classification
- Confidence calibration — Track debate outcomes over time to calibrate when the system should auto-merge vs. escalate to humans
- Multi-org federation — Share anonymized vulnerability patterns across organizations (with consent) to build collective immunity
Log in or sign up for Devpost to join the conversation.