Safety Governor: A Reliability Layer for Autonomous SRE Agents

Inspiration

While testing various AI agents in my software development workflows, I noticed that one of the most deterministic tasks in the software lifecycle (incident remediation and operational response) had seen the least trustworthy AI adoption. A deeper dive pointed to silent failures that are difficult to detect with vanilla LLMs. LLMs are decent at drafting code or summarizing logs, but when it comes to high-stakes operational decisions they fail silently. They sound confident. They cite authority. They propose actions. But they often do so without grounding their claims in live evidence.

That realization led me to a deeper research arc around model reliability under stress. The perfect high-stakes use case emerged naturally: SRE incident remediation, where a hallucinated action can escalate a minor blip into a production outage.

The solution is not better prompting but architectural separation between reasoning and enforcement.

What It Does

Safety Governor is a reliability layer that wraps every AI-generated remediation plan in a deterministic pipeline before any action reaches production:

$$ \textbf{Plan} \rightarrow \textbf{Stress} \rightarrow \textbf{Compress} \rightarrow \textbf{Gate} \rightarrow \textbf{Execute} $$

The workflow is genuinely agentic. It retrieves its own evidence autonomously via MCP tools attached to an Elastic Agent Builder agent. But the final safety gate is enforced by Python, independent of the LLM.

The LLM cannot talk its way past the gate.


How I Built It

Stack

  • Elastic Agent Builder (Elastic Cloud): the agentic runtime; handles tool planning, tool calling, and synthesis
  • FastMCP 3.0.2: Python MCP server framework using @mcp.tool() decorators with Streamable HTTP transport (a minimal tool sketch follows this list)
  • Elasticsearch: stores runbooks, evidence, policies, live logs, incident records, workflow events
  • Kibana: surfaces the agent, dashboards, and audit trail
  • Slack: approval workflow with human in the loop; thread replies parsed for APPROVE / FORCE_OVERRIDE
  • Jira: autonomous ticket lifecycle; ticket ID propagated through gate decision and Slack message
  • Rich: terminal TUI for demo; 4 worker threads (logs, audit, agent reasoning, Slack)
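
To make the tool layer concrete, here is a minimal sketch of how an evidence-retrieval tool could be exposed to Agent Builder over Streamable HTTP with FastMCP. The tool name, index, and query shape are illustrative assumptions, not the project's exact implementation, and the API shown follows the FastMCP 2.x style.

```python
# Minimal FastMCP sketch (illustrative names; not the project's exact tools).
from fastmcp import FastMCP
from elasticsearch import Elasticsearch

mcp = FastMCP("safety-governor-tools")
es = Elasticsearch("http://localhost:9200")  # assumption: local dev cluster

@mcp.tool()
def fetch_evidence(service: str, minutes: int = 15) -> dict:
    """Return recent error-level log lines for a service as grounding evidence."""
    resp = es.search(
        index="live-logs",
        query={
            "bool": {
                "must": [
                    {"term": {"service.name": service}},
                    {"term": {"log.level": "error"}},
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
                ]
            }
        },
        size=20,
    )
    return {"hits": [h["_source"] for h in resp["hits"]["hits"]]}

if __name__ == "__main__":
    # Streamable HTTP transport so the Agent Builder agent can call the tools remotely.
    mcp.run(transport="http", host="0.0.0.0", port=8000)
```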

I started by building the basic workflow entirely in Python and incrementally replaced different components with the Agent Builder agent, MCP server, and Jira and Slack integrations. This approach allowed the workflow to evolve naturally — each tool was introduced because it improved the system’s reliability, not simply to make the system appear agentic.
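The Slack approval step reduces to scanning the incident thread for the two keywords. A minimal sketch using slack_sdk, with the channel and thread identifiers assumed to be supplied by the workflow (the helper name and details are illustrative, not the project's exact code):

```python
# Sketch of the human-in-the-loop check (hypothetical helper, illustrative only).
from slack_sdk import WebClient

APPROVE, FORCE_OVERRIDE = "APPROVE", "FORCE_OVERRIDE"

def read_approval(client: WebClient, channel: str, thread_ts: str) -> str | None:
    """Return APPROVE or FORCE_OVERRIDE if a reviewer replied with one, else None."""
    replies = client.conversations_replies(channel=channel, ts=thread_ts)
    for msg in replies["messages"][1:]:  # skip the original approval request
        text = msg.get("text", "").strip().upper()
        if FORCE_OVERRIDE in text:
            return FORCE_OVERRIDE
        if APPROVE in text:
            return APPROVE
    return None
```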


Deterministic Safety Gate

The gate is intentionally independent of the LLM.

Hard constraints include:

  • Any policy conflict → block_and_escalate
  • $n_{\text{contradictions}} \ge 2$ → block_and_escalate
  • Insufficient evidence coverage → block_and_escalate
  • Critical severity → block_and_escalate (requires FORCE_OVERRIDE)

Formally, for a remediation plan $p$ and its stress-test results $s$:

$$ G(p, s) = \begin{cases} \texttt{block\_and\_escalate}, & \text{if any safety constraint is violated} \\ \texttt{execute}, & \text{otherwise} \end{cases} $$
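
A minimal sketch of this gate in plain Python, assuming the stress-test results arrive as simple counts and flags (the field names and the coverage threshold are illustrative assumptions):

```python
# Deterministic gate sketch: plain Python, no LLM involvement (field names are illustrative).
from dataclasses import dataclass

@dataclass
class StressResult:
    policy_conflicts: int        # matches against indexed policies
    contradictions: int          # evidence statements that contradict the plan
    evidence_coverage: float     # fraction of plan claims backed by retrieved evidence
    severity: str                # "low" | "medium" | "high" | "critical"

def gate(stress: StressResult, min_coverage: float = 0.7) -> str:
    """Return 'execute' only if every hard constraint passes; otherwise block."""
    if stress.policy_conflicts > 0:
        return "block_and_escalate"
    if stress.contradictions >= 2:
        return "block_and_escalate"
    if stress.evidence_coverage < min_coverage:
        return "block_and_escalate"
    if stress.severity == "critical":   # requires FORCE_OVERRIDE downstream
        return "block_and_escalate"
    return "execute"
```

Because the function is pure Python over structured inputs, identical inputs always yield identical decisions, which is exactly what the audit trail records.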

This separation ensures:

  • Auditability
  • Deterministic behavior
  • Model-independence
  • Tamper resistance

The LLM influences reasoning, not enforcement.


Challenges I Ran Into

Designing Gate Thresholds

If thresholds are too strict, every incident escalates.
If too lenient, unsafe actions slip through.

Balancing safety and autonomy required multiple scenario simulations.

Preserving Gate Independence

It was tempting to let the LLM evaluate safety directly.

However, if the gate also ran inside the same LLM context, the trust boundary would collapse. Deterministic Python enforcement was necessary to preserve audit guarantees.


Accomplishments I'm Proud Of

  • True multi-step agentic orchestration using Elastic Agent Builder and MCP tools
  • A deterministic trust boundary between AI reasoning and production
  • Live contradiction detection via Elasticsearch evidence queries (see the query sketch after this list)
  • Refusal logic: the system can deny APPROVE for critical incidents
  • Full forensic audit trail: every tool call, confidence shift, gate decision, and override is indexed
  • Closed-loop runbook learning after successful resolution
  • A live demo that shows both:
    • Safe autonomous execution
    • Hard-blocked incidents with enforced override
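
The contradiction-detection item above boils down to counting evidence documents that conflict with the plan's claims. A minimal sketch of such a query, with hypothetical index and field names:

```python
# Sketch of contradiction counting against live evidence (index and field names are hypothetical).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def count_contradictions(claims: list[str]) -> int:
    """Count evidence docs whose observed state conflicts with a plan's claims."""
    contradictions = 0
    for claim in claims:
        resp = es.search(
            index="evidence",
            query={
                "bool": {
                    "must": [{"match": {"claim": claim}}],
                    "filter": [{"term": {"supports_plan": False}}],
                }
            },
            size=0,
        )
        contradictions += min(1, resp["hits"]["total"]["value"])
    return contradictions
```

The resulting count is what feeds the $n_{\text{contradictions}} \ge 2$ rule enforced by the gate.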

What I Learned

Governance Is a Systems Problem

Trust does not come from better prompts.

It comes from:

  • Evidence retrieval
  • Contradiction measurement
  • Deterministic enforcement
  • Auditability

Determinism Matters in Production

If identical inputs can produce different enforcement outcomes due to temperature or model drift, the system cannot be trusted.

Separating reasoning from enforcement is essential.

Reliability Scores Should Calibrate, Not Dictate

CDCT, DDFT, and EECT adjust the sensitivity of the pipeline. They do not directly control the gate. This keeps enforcement legible and auditable.
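
As a purely illustrative sketch of this calibration idea, the scores might scale how deeply the Stress step probes for evidence, while the gate's hard constraints stay untouched; the function name and formula below are assumptions, not the project's actual scoring logic.

```python
# Illustrative only: reliability scores tune how hard the Stress step probes,
# never the gate's decision logic (name and formula are hypothetical).
def stress_depth(cdct: float, ddft: float, eect: float, base_queries: int = 3) -> int:
    """Lower scores trigger deeper evidence retrieval before the deterministic gate runs."""
    avg = (cdct + ddft + eect) / 3          # scores assumed to lie in [0, 1]
    return base_queries + round((1.0 - avg) * 5)
```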


What’s Next for Safety Governor

Closed-Loop Threshold Tuning

Human overrides should dynamically recalibrate gate thresholds over time.

Multi-Model Adversarial Planning

Planner and Verifier agents running on different LLM backends to increase robustness.

Anomaly-First Prevention

Integrate Elastic ML anomaly detection to shift from remediation to prevention.

Production-Grade MCP Hosting

Deploy the MCP services as hardened microservices rather than exposing them in development mode.

Policy-as-Code Integration

Allow enterprises to define custom enforcement logic declaratively.


Safety Governor does not attempt to make LLMs perfect.

It assumes they are fallible and builds a governed execution layer around them.

It executes only when its reasoning survives stress-testing.

Built With

Elastic Agent Builder · Elasticsearch · Kibana · FastMCP · Python · Slack · Jira · Rich