INSPIRATION

When a new advisory drops (a CVE, a vendor write-up), every security team faces the same question: "Are we covered, and can you prove it?" Today, answering it takes a detection engineer anywhere from hours to days: translate the prose into ATT&CK techniques, check whether the right data is even being collected, check whether a detection already exists, and hand-write SPL. The result usually ships untested, so it either fires on noise or silently never fires. We wanted an agent that does that work and proves the result before anything goes live.

WHAT IT DOES

Paste an advisory and a governed, human-in-the-loop pipeline runs:

  1. Extracts the MITRE ATT&CK techniques from the advisory.
  2. Interrogates your Splunk through the Splunk MCP Server to classify each technique as COVERED, BLIND (no data), or GAP (data present, no detection).
  3. Writes an SPL detection for each gap, grounded in real sampled events.
  4. Backtests each detection against replayed attack data and self-corrects: when a draft fires on benign activity, the agent reads the actual false-positive events and rewrites the SPL until the backtest is clean.
  5. Deploys as a scheduled saved search via the Splunk Python SDK — only after you click Approve.
  6. Reports before/after coverage with an ATT&CK Navigator layer and a full audit log.
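
The three-way classification in step 2 can be sketched as a small decision function. This is a minimal illustration rather than the project's actual code: the names Coverage, TechniqueEvidence, and classify are hypothetical, and the two counts are assumed to have been gathered beforehand through read-only Splunk queries. Treating a technique that has a detection but no data as BLIND is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Coverage(Enum):
    COVERED = "COVERED"  # data present and a detection exists
    BLIND = "BLIND"      # no data, so nothing could ever fire
    GAP = "GAP"          # data present, no detection


@dataclass(frozen=True)
class TechniqueEvidence:
    """Hypothetical summary of what the read-only queries found."""
    technique_id: str     # ATT&CK ID, e.g. "T1003.001" (LSASS memory)
    event_count: int      # events seen in the data sources the technique needs
    detection_count: int  # existing detections mapped to the technique


def classify(evidence: TechniqueEvidence) -> Coverage:
    # Assumption: without data a detection cannot fire, so data is checked first.
    if evidence.event_count == 0:
        return Coverage.BLIND
    if evidence.detection_count == 0:
        return Coverage.GAP
    return Coverage.COVERED


if __name__ == "__main__":
    sample = TechniqueEvidence("T1003.001", event_count=120, detection_count=0)
    print(sample.technique_id, classify(sample).value)
```

Only techniques classified as GAP would move on to step 3; BLIND techniques need new data onboarding before any detection is worth writing.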

The standout moment is the self-correcting loop: a draft that fires on 100+ false positives refines itself to 52 true positives and 0 false positives, live.
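
The shape of that self-correcting loop can be sketched roughly as follows. This is a hedged outline under stated assumptions, not the project's implementation: refine, BacktestResult, and the two injected callables are hypothetical names, and the toy stand-ins at the bottom replace the real Splunk backtest and the model-driven repair. The index name and field values are illustrative only.

```python
from typing import Callable, List, NamedTuple, Tuple


class BacktestResult(NamedTuple):
    true_positives: int
    false_positives: List[dict]  # the benign events the draft matched


def refine(
    draft: str,
    backtest: Callable[[str], BacktestResult],
    repair: Callable[[str, List[dict]], str],
    max_passes: int = 3,
) -> Tuple[str, BacktestResult, int]:
    """Backtest, and while false positives remain, repair using that evidence."""
    spl = draft
    result = backtest(spl)
    passes = 0
    while result.false_positives and passes < max_passes:
        # The repair step sees the actual false-positive events, not just a count.
        spl = repair(spl, result.false_positives)
        result = backtest(spl)
        passes += 1
    return spl, result, passes


# Toy stand-ins (assumptions) so the sketch runs without Splunk or a model.
def fake_backtest(spl: str) -> BacktestResult:
    if "NOT" in spl:
        return BacktestResult(true_positives=3, false_positives=[])
    return BacktestResult(3, [{"SourceImage": "scanner.exe"}])


def fake_repair(spl: str, false_positives: List[dict]) -> str:
    excluded = " ".join(
        f'NOT SourceImage="{e["SourceImage"]}"' for e in false_positives
    )
    return f"{spl} {excluded}"


if __name__ == "__main__":
    final, result, passes = refine(
        "index=sysmon EventCode=10", fake_backtest, fake_repair
    )
    print(passes, final)
```

Capping the number of passes keeps a draft that never converges from looping forever; a caller would also want to confirm that true_positives is still above zero, since a query that matches nothing has no false positives either.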

HOW WE BUILT IT

A Python agent (a clean library plus a Streamlit UI) on Splunk Enterprise 10.4. Claude is the reasoning engine — Haiku for extraction and the first-draft detection, Sonnet for the evidence-driven repair. The agent is an MCP client: every read (metadata, existing detections, backtests) goes through the official Splunk MCP Server (#7931); the single write — creating the approved saved search — goes through the Splunk Python SDK, with the human approval gate between them. It's grounded in the real Splunk security ecosystem: splunk/attack_data for replayable, ATT&CK-mapped datasets, and the MITRE ATT&CK model for the coverage map. Every action is written to an append-only audit log, and coverage exports as an ATT&CK Navigator layer.
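
The approval gate and audit log described above can be illustrated with a short sketch. Everything here is an assumption made for illustration: audit, deploy_if_approved, the JSON-lines log format, and the injected write callable (a stand-in for the SDK call that creates the saved search) are hypothetical names, not the project's code.

```python
import json
from datetime import datetime, timezone
from typing import Callable


def audit(log_path: str, action: str, **details) -> None:
    """Append one JSON line per action; the file is only ever opened in append mode."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "action": action,
        **details,
    }
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def deploy_if_approved(
    name: str,
    spl: str,
    approved: bool,
    write: Callable[[str, str], None],
    log_path: str,
) -> bool:
    """Call the single write path only when a human has approved."""
    if not approved:
        audit(log_path, "deploy_blocked", name=name)
        return False
    write(name, spl)  # stand-in for the SDK write
    audit(log_path, "deploy", name=name, spl=spl)
    return True
```

Keeping the write behind one function makes the governance split easy to check: no other code path has access to the write callable. Append mode alone does not make the log tamper-proof, so a production setup would likely add file permissions or forward the log elsewhere.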

CHALLENGES

Making the self-correction reliable meant feeding the backtest's real false-positive evidence back to the model so that it converges in one or two passes instead of playing whack-a-mole. Connecting our MCP client to the official Splunk MCP Server (#7931) meant getting streamable HTTP working with encrypted-token auth and a self-signed certificate. And replayed attack data had to be reshaped (re-timestamping, Sysmon field extraction) before backtests were meaningful.
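
One way to picture the re-timestamping step is a uniform shift that moves the newest replayed event to the present while preserving the gaps between events. The write-up does not say how the project does it, so this is a sketch under that assumption; retimestamp and the dictionary event shape are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def retimestamp(events, now=None, time_field="_time"):
    """Shift all events by one offset so the latest lands at `now`.

    Returns new dicts and leaves the input events untouched.
    """
    if not events:
        return []
    now = now or datetime.now(timezone.utc)
    offset = now - max(e[time_field] for e in events)
    return [{**e, time_field: e[time_field] + offset} for e in events]


if __name__ == "__main__":
    old = datetime(2021, 6, 1, 8, 0, tzinfo=timezone.utc)
    replay = [
        {"_time": old, "EventCode": 10},
        {"_time": old + timedelta(minutes=5), "EventCode": 10},
    ]
    for event in retimestamp(replay):
        print(event["_time"].isoformat(), event["EventCode"])
```

Preserving the relative spacing matters because a scheduled search looks at a recent time window: events left with their original timestamps would fall outside it, and a backtest would report zero hits for the wrong reason.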

ACCOMPLISHMENTS

The self-correcting backtest loop that proves a detection before deploy. A clean governance split — read through the MCP Server, write through the SDK, nothing live without human approval. And a working end-to-end flow from raw advisory to a deployed, validated Splunk detection.

WHAT'S NEXT

A response-adapter so the full pipeline runs directly on #7931's native tools; more techniques and data sources beyond LSASS; and pointing the reasoning layer at Splunk Cloud hosted models.

BUILT WITH

  • attack-data
  • claude-(anthropic-api)
  • mitre-att&ck
  • model-context-protocol
  • python
  • splunk-enterprise
  • splunk-mcp-server
  • splunk-python-sdk
  • streamlit