Modern incident response teams struggle with two extremes: manual triage that is slow and error-prone, or automated systems that act too aggressively without sufficient context or auditability. This creates risk, especially in high-severity incidents where decisions must be fast, explainable, and safe. DecisionTrace AI was built to solve this exact problem.

DecisionTrace AI is a governed, multi-step AI agent system built using Elasticsearch Agent Builder that transforms incident response into an evidence-driven, auditable workflow. Instead of a single prompt or black-box automation, the system uses three specialized agents that reason, verify, and act, each with a clearly defined role.

The core problem it solves is unsafe automation. Many AI systems either over-automate without enough data or require heavy human involvement at every step. DecisionTrace AI introduces a middle path: automation happens only when supporting evidence exists, and control passes to a human when it does not.

The system consists of three agents:

  • A PlannerAgent that uses ES|QL and Elasticsearch indexes to analyze incidents, assess available telemetry, and propose an evidence-based action plan.
  • A ReviewerAgent that enforces safety and governance by validating whether each automated step has sufficient supporting data.
  • An ExecutorAgent that performs only approved, low-risk actions and records exactly what was done and why.
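
To make the hand-off between the three roles concrete, here is a minimal sketch in plain Python. It is not the Agent Builder configuration used in the project: the `Step` fields, the `"low"`/`"high"` risk labels, the `min_evidence` threshold, and the three function names are all assumptions chosen for illustration, and the planner here reads candidate steps from a dict instead of querying Elasticsearch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Step:
    action: str          # hypothetical action name, e.g. "restart_worker"
    risk: str            # "low" or "high" (illustrative labels)
    evidence_rows: int   # number of telemetry rows supporting this step


@dataclass
class Verdict:
    approved: List[Step] = field(default_factory=list)
    escalated: List[Step] = field(default_factory=list)


def plan(incident: Dict) -> List[Step]:
    """Stand-in for the PlannerAgent: turn candidate steps into a plan."""
    return [Step(**raw) for raw in incident["candidate_steps"]]


def review(steps: List[Step], min_evidence: int = 1) -> Verdict:
    """Stand-in for the ReviewerAgent: approve only low-risk steps with evidence."""
    verdict = Verdict()
    for step in steps:
        if step.risk == "low" and step.evidence_rows >= min_evidence:
            verdict.approved.append(step)
        else:
            # Anything risky or unsupported goes back to a human.
            verdict.escalated.append(step)
    return verdict


def execute(verdict: Verdict) -> List[Dict]:
    """Stand-in for the ExecutorAgent: act on approved steps and record why."""
    return [
        {"action": step.action, "reason": f"{step.evidence_rows} supporting rows"}
        for step in verdict.approved
    ]
```

With a plan containing a supported low-risk step, a high-risk step, and a low-risk step with no evidence, only the first would be executed; the other two would land in `escalated` for human review.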

A key feature is the DecisionTrace, a structured audit record stored in Elasticsearch. Each trace captures the planner’s reasoning, reviewer’s verdict, executor’s actions, ES|QL evidence used, and an Automation Readiness Score that quantifies how safe automation was for that incident. This makes every decision explainable and reviewable after the fact.
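
As a rough illustration of what such a trace document might look like, the sketch below builds one as a JSON-serializable dict. The field names mirror the items listed above but are my own naming, and the score formula (share of planned steps that had supporting evidence) is purely an assumption for the example; the formula actually used by the project is not described here.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Dict, List


def readiness_score(steps_with_evidence: int, total_steps: int) -> float:
    """Assumed formula: fraction of planned steps backed by evidence."""
    if total_steps == 0:
        return 0.0
    return round(steps_with_evidence / total_steps, 2)


@dataclass
class DecisionTrace:
    incident_id: str
    planner_reasoning: str
    reviewer_verdict: str
    executor_actions: List[str]
    esql_evidence: List[str]          # the ES|QL queries that were run
    automation_readiness_score: float
    created_at: str                   # ISO 8601 timestamp, UTC


def build_trace(
    incident_id: str,
    planner_reasoning: str,
    reviewer_verdict: str,
    executor_actions: List[str],
    esql_evidence: List[str],
    steps_with_evidence: int,
    total_steps: int,
) -> Dict:
    """Assemble one audit record as a plain dict ready to be indexed."""
    trace = DecisionTrace(
        incident_id=incident_id,
        planner_reasoning=planner_reasoning,
        reviewer_verdict=reviewer_verdict,
        executor_actions=executor_actions,
        esql_evidence=esql_evidence,
        automation_readiness_score=readiness_score(steps_with_evidence, total_steps),
        created_at=datetime.now(timezone.utc).isoformat(),
    )
    return asdict(trace)
```

Because the result is a flat dict of strings, lists, and a float, `json.dumps(build_trace(...))` works directly, and the same dict could be stored as a single document in an audit index.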

Technically, the project makes heavy use of ES|QL, Elasticsearch indexes, and Agent Builder tools to retrieve logs, extract error patterns, and build time-aware reasoning. Rather than relying on prompts alone, agents actively query real data and adapt their decisions based on what exists and what is missing.
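
For a sense of what one of these evidence queries could look like, here is a hedged sketch that builds an ES|QL string for recent error patterns and checks whether the result contains anything to act on. The index pattern, the ECS-style field names (`service.name`, `log.level`, `error.type`), and both helper functions are assumptions for illustration, not the queries used in the project. The string is only constructed here, not sent to a cluster; in real use, passing values as query parameters would likely be safer than string interpolation.

```python
from typing import List


def build_error_pattern_query(
    index_pattern: str, service: str, window_minutes: int = 15
) -> str:
    """Build an ES|QL query counting recent errors by type for one service."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    # Escape backslashes and double quotes so the service name stays a literal.
    safe_service = service.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"FROM {index_pattern}\n"
        f"| WHERE @timestamp >= NOW() - {window_minutes} minutes\n"
        f'| WHERE service.name == "{safe_service}" AND log.level == "error"\n'
        "| STATS error_count = COUNT(*) BY error.type\n"
        "| SORT error_count DESC\n"
        "| LIMIT 10"
    )


def evidence_is_missing(rows: List[dict]) -> bool:
    """Treat an empty result set as missing evidence, so automation can stop."""
    return len(rows) == 0
```

The time window in the `WHERE` clause is what keeps the reasoning tied to the incident's timeframe, and the explicit empty-result check is one simple way an agent could tell "no errors found" apart from "safe to proceed".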

Two aspects I especially liked while building this project were:

  1. ES|QL as a reasoning backbone, which allowed agents to ground decisions in real telemetry instead of assumptions.
  2. Agent separation, which made safety rules enforceable and easy to explain in a demo.

The biggest challenge was designing a system where the AI does not overstep, so that automation stops gracefully when evidence is incomplete. Solving this led to a more realistic, production-ready agent architecture.

DecisionTrace AI demonstrates how Elasticsearch Agent Builder can be used not just to answer questions, but to run safe, real-world workflows that teams can trust.
