Guardian — AI-Governed Incident Response

Incidents dashboard — Guardian-triggered P1 alerts appear the moment the webhook fires.
Incident detail — [GUARDIAN] payment-gateway degradation automatically classified, assigned and escalating.
Guardian pipeline canvas (v15.00 · Published) — all five nodes wired end-to-end in Airia Agent Studio.
Run steps — each node completes in sequence; entire pipeline succeeds in 3.6 seconds.
Run timeline — Gantt view of node execution: Triage (61ms) → Runbook (664ms) → HITL Gate (285ms) → War Room (2.3s) → Compliance (64ms).
Node 02 retrieves the exact runbook from Confluence via the Airia MCP Gateway + Knowledge Graph.
Slack Guardian DM — multiple war-room activation messages arrive for each triggered incident.
War room channel — #inc-*-payment-gateway auto-created, fully briefed: triage result, runbook steps, Jira ticket, HITL approver, and on-cal.
Priority Highest, labelled dora-tracked + guardian-automated, linked to the Guardian audit session — zero manual input.
Pipeline JSON output — full audit trail: HITL decision, war-room URLs, and compliance_status: DORA_SOX_COMPLIANT — all in one structur.

Inspiration

Guardian grew out of fifteen years working inside Dutch financial institutions, watching P1 incidents at 2:47 AM unfold the same way every time. The first 12 minutes are pure coordination: who owns this? What severity? Where is the runbook? Create the Slack channel. Open the Jira ticket. Page the on-call team. Transaction failures accumulate while engineers scramble.

That 12-minute overhead is not just inefficient — it is a compliance liability. The EU's Digital Operational Resilience Act (DORA Article 11) requires financial institutions to document every AI-assisted decision in their incident management workflow. Manual processes produce no audit trail. Neither do most agentic AI tools. Guardian was built to eliminate the overhead and produce the compliance record simultaneously.

What it does

Guardian is a 5-node autonomous incident response pipeline that handles the entire coordination phase (triage, runbook retrieval, human approval, war room setup, and regulatory post-mortem) with no manual step other than the approval click itself.

Node 01 — Triage Sentinel: Receives the PagerDuty webhook and applies deterministic P1/P2/P3 threshold logic in Python (same input always yields the same classification). Claude 3.5 Sonnet then generates a human-readable explanation of the reasoning — not a black box.
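
A minimal sketch of what deterministic threshold logic of this kind can look like. The metric (`error_rate_pct`), the threshold values, the critical-service set, and the multiplier are illustrative assumptions, not Guardian's actual rules.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; not Guardian's real configuration.
CRITICAL_SERVICES = {"payment-gateway"}
CRITICAL_MULTIPLIER = 2.0
P1_SCORE = 10.0
P2_SCORE = 5.0


@dataclass(frozen=True)
class Alert:
    service: str
    error_rate_pct: float  # percentage of failing requests


def classify(alert: Alert) -> str:
    """Return 'P1', 'P2', or 'P3' from fixed thresholds (no model call, no randomness)."""
    score = alert.error_rate_pct
    # Critical services weigh more heavily, so they reach P1 at a lower error rate.
    if alert.service in CRITICAL_SERVICES:
        score *= CRITICAL_MULTIPLIER
    if score >= P1_SCORE:
        return "P1"
    if score >= P2_SCORE:
        return "P2"
    return "P3"
```

Because the function is pure, the same alert always maps to the same severity, and the boundary cases can be unit-tested directly before any model is asked to explain the result.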

Node 02 — Runbook Agent: Queries Confluence through Airia's MCP Gateway using semantic Knowledge Graph search. Zero credentials in code — all stored in Airia's vault. Returns the top runbook steps in under 2 seconds.

Node 03 — HITL Gate: Uses Airia's MCP Apps (launched Feb 12, 2026) to render an interactive Slack approval card with clickable Approve / Escalate / Reject buttons. DORA Article 11 requires human oversight before automated action. Every approval is timestamped and tied to an identity — this is the compliance audit entry.
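
The shape of such an audit entry might be sketched as follows. This is plain Python, not Airia's HITL node API: the `AuditEntry` fields, the `resolve_hitl` function, and the `system:timeout` identity are hypothetical names, and the 15-minute timeout is taken from the feature list under "How we built it".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

HITL_TIMEOUT = timedelta(minutes=15)
VALID_DECISIONS = {"approve", "escalate", "reject"}


@dataclass(frozen=True)
class AuditEntry:
    incident_id: str
    decision: str
    approver: str      # identity of the human, or a system marker on timeout
    decided_at: str    # ISO 8601 timestamp in UTC


def resolve_hitl(
    incident_id: str,
    requested_at: datetime,
    now: datetime,
    decision: Optional[str] = None,
    approver: Optional[str] = None,
) -> Optional[AuditEntry]:
    """Return an audit entry once a decision exists, or None while still waiting."""
    if decision is not None:
        if decision not in VALID_DECISIONS:
            raise ValueError(f"unknown decision: {decision}")
        if not approver:
            raise ValueError("a decision must be tied to an approver identity")
        return AuditEntry(incident_id, decision, approver, now.isoformat())
    # No human response yet: auto-escalate once the timeout has elapsed.
    if now - requested_at >= HITL_TIMEOUT:
        return AuditEntry(incident_id, "escalate", "system:timeout", now.isoformat())
    return None
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the timeout path testable.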

Node 04 — War Room Coordinator: Spawns nested SlackSubAgent + JiraTicketSubAgent in parallel. Creates the #inc-*-payment-gateway channel, posts the AI-generated incident brief, and opens a Priority: Highest Jira ticket with all context pre-filled. Completes in under 4 seconds after approval.
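
The parallel fan-out can be illustrated with a short sketch. It uses Python's standard `concurrent.futures` rather than Airia's nested-agent mechanism, and `create_slack_channel` and `create_jira_ticket` are hypothetical stubs standing in for the real sub-agents.

```python
from concurrent.futures import ThreadPoolExecutor


def create_slack_channel(incident_id: str, service: str) -> dict:
    """Stub for the Slack sub-agent; a real one would call Slack via the gateway."""
    return {"channel": f"#inc-{incident_id}-{service}"}


def create_jira_ticket(incident_id: str, summary: str) -> dict:
    """Stub for the Jira sub-agent; returns the fields the write-up describes."""
    return {
        "incident_id": incident_id,
        "summary": summary,
        "priority": "Highest",
        "labels": ["dora-tracked", "guardian-automated"],
    }


def open_war_room(incident_id: str, service: str, summary: str) -> dict:
    """Run both sub-tasks concurrently and collect their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        slack_future = pool.submit(create_slack_channel, incident_id, service)
        jira_future = pool.submit(create_jira_ticket, incident_id, summary)
        # .result() blocks until each task finishes and re-raises any exception.
        return {"slack": slack_future.result(), "jira": jira_future.result()}
```

With two independent network calls, total latency is roughly that of the slower call rather than the sum of both.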

Node 05 — Compliance Narrator: Triggered on resolution. Generates a post-mortem with a full AI decision audit trail — every model call, its input data, reasoning, confidence score, and whether a human overrode it. Output: compliance_status: DORA_SOX_COMPLIANT. Your compliance officer can sign it. Your auditor will accept it.
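
One way to picture a single entry in that audit trail is the sketch below. The field names and the `render_audit_trail` helper are assumptions made for illustration; Guardian's actual output schema is not shown in this write-up.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class DecisionRecord:
    node: str               # pipeline node that made the call
    model: str              # model identifier used for the call
    input_summary: str      # the data the model was given
    reasoning: str          # the model's stated explanation
    confidence: float       # expected to lie between 0.0 and 1.0
    human_overridden: bool  # True if a human changed the outcome


def render_audit_trail(incident_id: str, records: List[DecisionRecord]) -> str:
    """Serialise decision records to JSON, rejecting out-of-range confidence."""
    for record in records:
        if not 0.0 <= record.confidence <= 1.0:
            raise ValueError(f"confidence out of range for node {record.node}")
    trail = {"incident_id": incident_id, "decisions": [asdict(r) for r in records]}
    return json.dumps(trail, indent=2, sort_keys=True)
```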

Zero manual coordination. Full regulatory audit trail. Automatically.

How we built it

Built entirely on Airia Agent Studio using 16 distinct platform features across the 5-node pipeline:

Webhook Trigger → receives PagerDuty Events API v2 alerts
Python Code Blocks ×5 → deterministic severity logic, orchestration
AI Model Calls → Claude 3.5 Sonnet for reasoning and post-mortem generation
Structured Output → type-safe JSON schema enforcement between all nodes
Agent Variables → incident context passed through pipeline without re-prompting
Knowledge Graph → semantic runbook search across Confluence RUNBOOKS space
MCP Gateway → zero-credential connections to Confluence, Jira, and Slack
MCP Apps → interactive Slack HITL approval card (Airia's newest feature)
Human-in-the-Loop Node → 15-minute timeout with auto-escalation
Nested Agent Architecture → Slack + Jira sub-agents executing in parallel
Slack Bot Deployment → war room channel creation + HITL message delivery
API Endpoint Deployment → production webhook receiver for PagerDuty
Document Generator → 6-section post-mortem PDF with compliance template
Governance Dashboard → full AI decision audit trail, timestamped per node
Compliance Automation → DORA Article 11 + SOX Section 404 record generation
Airia Community ×3 → Triage Sentinel, War Room Coordinator, Compliance Narrator

All production secrets live in Airia's secrets vault. Zero credentials in code.

Challenges we ran into

MCP Apps discovery: Airia's MCP Apps launched February 12, 2026 — mid-build. Integrating interactive Slack buttons through an undocumented feature required active changelog monitoring and live platform testing, not local emulation.

HITL + nested agent timing: The HITL Gate introduces variable human latency. Coordinating that with deterministic nested agent output (channel + ticket in ≤5s after approval) required careful Agent Variables schema design across nodes 03 → 04.

DORA/SOX accuracy: The Compliance Narrator had to produce content a real compliance officer would accept. This required mapping each pipeline event to specific DORA Article 11 sub-requirements and SOX 404 IT General Controls — back to the primary legislation, not generic checklists.

Knowledge Graph indexing latency: Confluence updates took up to 90 seconds to reflect in semantic search. Added a fallback cached runbook JSON in Airia Knowledge Base for demo resilience.
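
The fallback pattern can be sketched in a few lines. Here `search` stands in for the semantic lookup and `cache` for the cached runbook JSON; both names, and the return shape, are hypothetical.

```python
from typing import Callable, Dict, List


def fetch_runbook(
    service: str,
    search: Callable[[str], List[str]],
    cache: Dict[str, List[str]],
) -> dict:
    """Prefer live search results; fall back to the cache if search fails or is empty."""
    try:
        steps = search(service)
    except Exception:
        # Treat any search failure (timeout, stale index, outage) as "no result".
        steps = []
    if steps:
        return {"source": "knowledge_graph", "steps": steps}
    if service in cache:
        return {"source": "cache", "steps": cache[service]}
    raise LookupError(f"no runbook available for {service}")
```

Recording the `source` alongside the steps means the audit trail can show whether a cached runbook was used.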

Accomplishments that we're proud of

The Governance Dashboard moment: Building an AI system that produces a machine-readable, auditor-acceptable explanation for every decision it made — with input data, model version, confidence score, and human override status — is what DORA Article 11 actually requires. Most AI incident tools don't produce this. Guardian does, automatically.

MCP Apps HITL card: Getting interactive Approve / Escalate / Reject buttons working through Airia's newest feature — pausing the pipeline, capturing approver identity, and resuming with the decision in Agent Variables context — creates a demo moment that visibly impresses engineers who have built Slack integrations before.

127 passing tests: Every node has full unit coverage including P1/P2/P3 boundary conditions, critical-service multiplier logic, HITL timeout paths, and error handling. The pipeline is not just a demo — it is production-grade.

3 forkable Community modules that work standalone without any Airia dependency. The adapter layer in Triage Sentinel normalises PagerDuty, OpsGenie, Datadog, CloudWatch, and Prometheus — genuinely reusable, not theoretically reusable.
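
A normalisation layer of this kind might look like the sketch below, shown for two of the five sources. The common output shape and all function names are assumptions, not the module's published interface. The PagerDuty field names follow the Events API v2 `payload` object; the Alertmanager `service` and `severity` labels are a common convention rather than a guarantee.

```python
from typing import Callable, Dict


def from_pagerduty(event: dict) -> dict:
    """Map a PagerDuty Events API v2 trigger event to the common shape."""
    payload = event["payload"]
    return {
        "provider": "pagerduty",
        # `component` is optional in Events API v2; `source` is required.
        "service": payload.get("component") or payload["source"],
        "summary": payload["summary"],
        "severity": payload["severity"],
    }


def from_alertmanager(alert: dict) -> dict:
    """Map one entry of a Prometheus Alertmanager webhook's `alerts` list."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    return {
        "provider": "prometheus",
        "service": labels.get("service", "unknown"),
        "summary": annotations.get("summary", labels.get("alertname", "")),
        "severity": labels.get("severity", "warning"),
    }


ADAPTERS: Dict[str, Callable[[dict], dict]] = {
    "pagerduty": from_pagerduty,
    "prometheus": from_alertmanager,
}


def normalise(provider: str, raw: dict) -> dict:
    """Dispatch to the adapter registered for the given provider."""
    try:
        adapter = ADAPTERS[provider]
    except KeyError:
        raise ValueError(f"no adapter for provider: {provider}") from None
    return adapter(raw)
```

Adding a new source then means registering one more function in `ADAPTERS`; the triage logic downstream only ever sees the common shape.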

What we learned

Design compliance-first, not compliance-after: Guardian was designed with DORA Article 11 as a hard requirement from Day 1. The HITL gate is not optional. The governance logging is not an add-on. Every node's output schema carries audit-relevant data forward. If compliance is bolted on at the end, it shows.

Airia's platform depth rewards investment: Using 16 distinct features forced deep engagement with every layer. The Knowledge Graph + MCP Gateway combination for runbook retrieval is substantially more powerful than a direct Confluence API call — semantic search produces better matches for edge-case alert patterns.

The 40% failure statistic is a design constraint, not a tagline: Every engineering decision — deterministic algorithm before AI call, cached fallbacks, zero credentials in code, variable timing handling — is a direct counter to the failure modes that explain the 40%.

What's next for Guardian

Multi-cloud ingestion: AWS CloudWatch, Datadog, and OpsGenie adapters alongside PagerDuty. The Triage Sentinel Community module's normalisation layer is already structured for this.

Predictive triage: Run Node 01 against Datadog anomaly streams before PagerDuty fires — alert engineers 2–3 minutes before threshold breach.

Multi-jurisdiction compliance: HIPAA (healthcare), FISMA (government), and MAS TRM (Singapore finance) forks of the Compliance Narrator. The frameworks/ directory in the Community module is already structured for this.

Real-time Governance Dashboard: Surface the AI decision audit trail live during the incident — not just in the post-mortem PDF — for compliance-team visibility during active response.

Built With

airia-agent-studio
airia-compliance-automation
airia-document-generator
airia-governance-dashboard
airia-hitl-node
airia-knowledge-graph
airia-mcp-gateway
atlassian-confluence
atlassian-jira
claude-3.5-sonnet
dora-article-11
jest
node.js-20-esm
pagerduty-events-api-v2
python
section
slack-mcp-apps
sox

Submitted to

Airia AI Agents Hackathon
- Winner 1st Place Active Agents

Created by

I designed and built Guardian end-to-end as a solo project — architecture,
all five pipeline nodes, integrations, tests, and demo.

The idea came from 15 years working inside Dutch financial institutions
(ING, ABN AMRO) watching P1 incidents play out the same way every time:
12 minutes of manual coordination before anyone starts actually fixing the
problem. I knew exactly what the pain was because I lived it.

I designed the 5-node pipeline from scratch on Airia Agent Studio — using
16 platform features including MCP Gateway, MCP Apps (Airia's newest
feature, launched Feb 2026), HITL Node, Knowledge Graph, Nested Agents,
Governance Dashboard, and Compliance Automation. Every architecture decision
maps to a real DORA Article 11 or SOX 404 requirement, not a generic
compliance checklist.

I also wrote 127 Jest unit and integration tests, published 3 standalone
Community modules (Triage Sentinel, War Room Coordinator, Compliance
Narrator), and recorded the full 4-minute live demo against real production
services — no mocks.

Built to prove that agentic AI projects don't have to be in the 40% that fail.

Manoj Mallick
With over 15 years of experience in software development

Updates

Manoj Mallick started this project — Mar 19, 2026 09:41 PM EDT
