Inspiration

Fleet Server issues are a classic “death by a thousand cuts” ops problem: the same enrollment/policy failures recur across hosts, and engineers waste time re-triaging them. A real example is the “context canceled” enrollment failure that users commonly report: an agent enrolls successfully and then goes inactive minutes later. We wanted a fast, repeatable way to go from symptom → root-cause checklist → tracked remediation.

What it does

FleetFix is a custom multi-step agent that:

  1. clusters the most frequent Fleet/Agent failure signatures (ES|QL),
  2. retrieves the best matching runbook (index search + exact signature lookup),
  3. and (with confirmation) creates a ticket via an Elastic Workflow that stores a record in Elasticsearch.
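To make step 1 concrete, here is a minimal Python sketch of what "clustering failure signatures" means. It is not the ES|QL that FleetFix runs; it only illustrates the idea of masking per-host values (IDs, addresses, numbers) so that identical failures collapse into one countable signature. The masking rules, the function names `signature` and `top_signatures`, and the sample messages are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical masking rules: replace values that differ per host so that
# otherwise identical failures share one signature. Order matters: UUIDs and
# addresses are masked before bare numbers.
_MASKS = [
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"), "<uuid>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<addr>"),
    (re.compile(r"\b\d+\b"), "<n>"),
]


def signature(message: str) -> str:
    """Lowercase a log message and mask its variable parts."""
    sig = message.lower().strip()
    for pattern, token in _MASKS:
        sig = pattern.sub(token, sig)
    return sig


def top_signatures(messages, limit=5):
    """Return the most frequent signatures as (signature, count) pairs."""
    return Counter(signature(m) for m in messages).most_common(limit)


# Illustrative messages only; real Fleet/Agent logs will look different.
sample = [
    "Enroll failed for agent 3f2b8c1e-1a2b-4c3d-9e8f-0123456789ab: context canceled",
    "Enroll failed for agent 9a8b7c6d-5e4f-4a3b-8c2d-ba9876543210: context canceled",
    "Policy 42 rollout timed out after 30 seconds",
]
ranked = top_signatures(sample)
```

In this sketch the two enrollment failures end up under the same signature, which is the property the runbook lookup in step 2 relies on: one signature, one runbook.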

How we built it

We built FleetFix in Elastic Agent Builder by combining Elasticsearch data + tool-driven automation. Agent Builder supports custom tools (ES|QL tools, index search tools, and workflow tools), which we used to make the agent reliable and repeatable rather than “prompt-only.” We enabled Workflows in Kibana (workflows:ui:enabled) and defined the ticket workflow in YAML. We also packaged everything into scripts so a fresh install can be reproduced quickly.

Challenges we ran into

  1. ES|QL time filtering and parameter typing required careful handling.
  2. Workflow tools depend on a workflow existing in the current Kibana space, so wiring workflow_id during setup was tricky.
  3. Kibana APIs are strict about request payload schema, so exports/imports had to be “create-safe.”
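Challenges 2 and 3 can be sketched in a few lines of Python. The sketch assumes an exported tool definition is a plain dictionary, that the server adds keys a create request would reject, and that the workflow reference lives under `configuration.workflow_id`. The key names in `SERVER_MANAGED_KEYS`, the payload layout, and both helper names are hypothetical; the real schema should be checked against the Kibana API in use.

```python
import copy

# Hypothetical list of server-managed keys that an export may contain but a
# create request would reject. Adjust to the real schema.
SERVER_MANAGED_KEYS = frozenset({"created_at", "updated_at", "readonly"})


def make_create_safe(exported: dict, drop_keys=SERVER_MANAGED_KEYS) -> dict:
    """Return a copy of an exported object with server-managed keys removed.

    Keys are removed at every nesting level, so drop_keys should only name
    fields that are never legitimate user input.
    """
    def scrub(value):
        if isinstance(value, dict):
            return {k: scrub(v) for k, v in value.items() if k not in drop_keys}
        if isinstance(value, list):
            return [scrub(v) for v in value]
        return value

    return scrub(copy.deepcopy(exported))


def with_workflow_id(tool: dict, workflow_id: str) -> dict:
    """Return a copy of a tool definition pointing at the given workflow.

    Assumes (hypothetically) that the reference is stored under
    tool["configuration"]["workflow_id"].
    """
    wired = copy.deepcopy(tool)
    wired.setdefault("configuration", {})["workflow_id"] = workflow_id
    return wired
```

A setup script following this pattern would first create the workflow in the target space, read back its ID, and only then post `with_workflow_id(make_create_safe(exported_tool), new_id)`. Both helpers return copies, so the exported file on disk is never modified.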

Accomplishments that we're proud of

  • A demo-ready agent that does detect → explain → act with guardrails (explicit confirmation before ticket creation).
  • Reproducible “fork-and-run” setup scripts.

What we learned

Reliable agents come from deterministic tools + tight outputs. A simple impact model is:

$$ \text{Time Saved} \approx N \times (\text{manual triage time} - \text{FleetFix triage time}) $$

where N is the number of recurring incidents and both triage times are measured per incident.
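As a quick worked example of the impact model, the Python sketch below plugs in made-up numbers. The function name and the figures (40 incidents, 30 minutes of manual triage, 5 minutes with FleetFix) are hypothetical and are not measurements from the project.

```python
def time_saved_minutes(incidents: int, manual_minutes: float, fleetfix_minutes: float) -> float:
    """Estimate total minutes saved: N x (manual triage - FleetFix triage)."""
    if incidents < 0:
        raise ValueError("incidents must be non-negative")
    return incidents * (manual_minutes - fleetfix_minutes)


# Hypothetical inputs, for illustration only.
saved = time_saved_minutes(incidents=40, manual_minutes=30, fleetfix_minutes=5)
print(f"Estimated time saved: {saved} minutes")  # 40 * (30 - 5) = 1000
```

The model is linear in N, so the estimate is only as good as the per-incident averages fed into it.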

What's next for FleetFix Agent

Add deeper automation (auto-link similar historical incidents), richer dashboards, and an MCP-based ticket writer for fully automated setup across spaces.

Built With

  • elastic-agent-builder
  • elastic-cloud-serverless
  • elasticsearch
  • elasticsearch-bulk-api
  • elasticsearch-indices
  • es|ql
  • index-search-tool
  • kibana
  • kibana-agent-builder-api
  • ndjson
  • powershell
  • workflow
  • yaml