Inspiration

Traditional SOAR systems run static playbooks: alert -> predefined steps -> action. They handle known patterns well but break on alert variants, multi-stage incidents that need entity context, and ambiguous alerts where "investigate vs suppress" is a judgment call. When the playbook can't decide, the analyst gets paged on every alert and queue fatigue sets in.

We built an IR triage agent that thinks like a tier-1 SOC analyst, autonomously pulls context from Splunk via MCP, and emits a structured triage card with explicit uncertainty flags. The differentiator is honesty: when data is sparse the agent says so on the card instead of fabricating findings.

What it does

On alert fire:

  1. Receives the alert payload (search name, SID, SPL, matching event fields, time, owner, app).
  2. Autonomously queries the Splunk MCP Server: surrounding events for any host/user/process named in the alert, historical firings of the same alert signature, knowledge-object lookups (saved searches, alerts).
  3. Emits a strict JSON triage card: classification, severity, entity_context, historical_pattern, recommended_action (escalate / contain / investigate / suppress), reasoning, confidence (0.0–1.0), and an explicit uncertainty_flags array.

The discrete recommended_action values mean downstream automation (PagerDuty, ServiceNow, ticketing) can wire the card in without a human pre-filter.
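
A consumer of the card could check it before routing, along the following lines. This is a minimal sketch, not the project's code: the field names and the four action values come from the list above, while `validate_card`, the type checks, and the sample card values are hypothetical.

```python
# Field names and action values are taken from the triage card description;
# everything else in this sketch is an illustrative assumption.
ALLOWED_ACTIONS = {"escalate", "contain", "investigate", "suppress"}
REQUIRED_FIELDS = (
    "classification", "severity", "entity_context", "historical_pattern",
    "recommended_action", "reasoning", "confidence", "uncertainty_flags",
)


def validate_card(card: dict) -> list[str]:
    """Return a list of problems; an empty list means the card is usable."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in card]
    if problems:
        return problems
    if card["recommended_action"] not in ALLOWED_ACTIONS:
        problems.append(f"unknown recommended_action: {card['recommended_action']!r}")
    confidence = card["confidence"]
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0.0 <= confidence <= 1.0:
        problems.append("confidence must be a number in [0.0, 1.0]")
    if not isinstance(card["uncertainty_flags"], list):
        problems.append("uncertainty_flags must be a list")
    return problems


# Hypothetical card, for illustration only.
sample_card = {
    "classification": "suspicious_login",
    "severity": "medium",
    "entity_context": {},
    "historical_pattern": {},
    "recommended_action": "investigate",
    "reasoning": "No surrounding events were returned for the named host.",
    "confidence": 0.4,
    "uncertainty_flags": ["no_entity_history"],
}
print(validate_card(sample_card))
```

Rejecting any action outside the four known values is what lets a ticketing integration branch on the field without inspecting the free-text reasoning.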

How we built it

  • Splunk Enterprise 10.4.0 running locally (60-day trial).
  • Splunk MCP Server v1.1.3 from Splunkbase (app #7931) — 10 tools at https://<host>:8089/services/mcp: splunk_run_query, splunk_get_indexes, splunk_get_index_info, splunk_get_metadata, splunk_get_knowledge_objects, splunk_run_saved_search, splunk_get_info, splunk_get_user_info, splunk_get_user_list, splunk_get_kv_store_collections.
  • Gemini 2.5 Flash via Vertex AI as the agent brain, using function calling. The agent loop in agent.py dynamically translates MCP tool schemas to Gemini function declarations, so adding new MCP tools requires no code changes.
  • Python 3.11+ for orchestration; direct streamable-HTTP to Splunk's /services/mcp endpoint (no mcp-remote proxy needed).
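
The schema translation mentioned above can be pictured roughly as follows. This is a sketch under assumptions rather than the code in agent.py: it relies on the MCP convention that each listed tool carries `name`, `description`, and `inputSchema`, and it produces plain dicts in the name/description/parameters shape that function declarations use. The helper names and the `query` parameter in the sample are hypothetical.

```python
def mcp_tool_to_declaration(tool: dict) -> dict:
    """Map one MCP tool descriptor to a function-declaration-shaped dict."""
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        # Fall back to an empty object schema when a tool takes no arguments.
        "parameters": tool.get("inputSchema") or {"type": "object", "properties": {}},
    }


def build_declarations(tools: list[dict]) -> list[dict]:
    """Translate every tool the server lists, so new tools need no code changes."""
    return [mcp_tool_to_declaration(tool) for tool in tools]


# Hypothetical tool listing; the real schemas come from the MCP server.
sample_tools = [
    {
        "name": "splunk_run_query",
        "description": "Run an SPL query.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {"name": "splunk_get_indexes", "description": "List indexes."},
]
declarations = build_declarations(sample_tools)
print([d["name"] for d in declarations])
```

Because the mapping is driven entirely by what the server returns, the agent picks up a newly added tool on the next listing.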

The triage system prompt in triage.py defines the JSON output schema and budget guardrails: max ~6 tool calls, conclude with low confidence + uncertainty flags when data is sparse, never invent fields not seen in the data.
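
The guardrails above live in the prompt. A code-side backstop for the tool-call budget might look like the sketch below. It is an illustration, not what triage.py does: `run_triage`, `next_step`, `call_tool`, the step dict shape, and the fallback card values are all hypothetical.

```python
from typing import Any, Callable

MAX_TOOL_CALLS = 6  # mirrors the "max ~6 tool calls" budget in the prompt


def run_triage(
    alert: dict,
    next_step: Callable[[list, int], dict],
    call_tool: Callable[[str, dict], Any],
    max_tool_calls: int = MAX_TOOL_CALLS,
) -> dict:
    """Drive the model until it returns a card or the tool budget runs out.

    next_step(history, calls_left) stands in for the model and returns either
    {"type": "tool", "name": ..., "args": ...} or {"type": "final", "card": ...}.
    """
    history = [{"role": "alert", "content": alert}]
    for calls_made in range(max_tool_calls + 1):
        step = next_step(history, max_tool_calls - calls_made)
        if step["type"] == "final":
            return step["card"]
        if calls_made == max_tool_calls:
            break  # the model asked for a tool but the budget is spent
        result = call_tool(step["name"], step["args"])
        history.append({"role": "tool", "name": step["name"], "content": result})
    # Fallback: low confidence plus an explicit flag, never invented findings.
    # The placeholder values below are assumptions about the field types.
    return {
        "classification": "unknown",
        "severity": "unknown",
        "entity_context": {},
        "historical_pattern": {},
        "recommended_action": "investigate",
        "reasoning": "Tool-call budget reached before the model produced a card.",
        "confidence": 0.0,
        "uncertainty_flags": ["tool_budget_exhausted"],
    }
```

The fallback follows the same rule as the prompt: when the agent cannot conclude, it reports that on the card instead of guessing.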

Challenges we ran into

  • Splunk MSI on a non-ASCII Windows hostname: serverName validation rejects hostnames that are non-ASCII or contain dashes. The fix was to rename the host and do a clean reinstall.
  • Gemini AI Studio free tier quota: gemini-2.5-flash hit 503 high-demand and gemini-2.0-flash hit 429 daily quota within a handful of dev runs. Switched to Vertex AI for stable capacity.
  • JSON Schema dialect mismatch: Splunk MCP tool schemas include keys (pattern, examples) that Gemini's function declaration parser rejects. We strip unsupported keys in _clean_schema.
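
The key stripping described in the last bullet could be done with a small recursive pass such as the one below. It is a sketch of the idea, not the project's `_clean_schema`: only `pattern` and `examples` are named in the write-up, and the function name, the traversal, and the sample schema are assumptions.

```python
UNSUPPORTED_KEYS = frozenset({"pattern", "examples"})  # the two keys named above


def clean_schema(node, unsupported=UNSUPPORTED_KEYS):
    """Return a copy of a JSON Schema fragment without the unsupported keywords."""
    if isinstance(node, list):
        return [clean_schema(item, unsupported) for item in node]
    if not isinstance(node, dict):
        return node
    cleaned = {}
    for key, value in node.items():
        if key in unsupported:
            continue
        if key == "properties" and isinstance(value, dict):
            # Keys under "properties" are parameter names, not schema keywords,
            # so a parameter that happens to be called "pattern" is kept.
            cleaned[key] = {
                name: clean_schema(sub, unsupported) for name, sub in value.items()
            }
        else:
            cleaned[key] = clean_schema(value, unsupported)
    return cleaned


# Hypothetical tool schema, for illustration only.
raw_schema = {
    "type": "object",
    "properties": {
        "pattern": {"type": "string", "pattern": "^index=", "examples": ["index=main"]},
        "limit": {"type": "integer"},
    },
    "required": ["pattern"],
}
print(clean_schema(raw_schema))
```

The pass returns a new structure and leaves the original schema untouched, so the unmodified version remains available for logging or debugging.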

Accomplishments that we're proud of

  • End-to-end agent runs on two sample alerts, each finishing in under three turns with valid JSON output and sensible severity, action, confidence, and uncertainty flags.
  • Zero hallucination on our sparse-data runs: when Splunk returns no rows, the triage card says so with explicit flags instead of inventing entity history.
  • Strict JSON output schema downstream automation can consume without parsing free text.

What we learned

LLM-on-SOC demos usually impress by being bold ("Critical attack detected, isolate host!"). The harder, more useful pattern is calibrated confidence with explicit uncertainty surfacing. A SOC analyst can trust confidence=0.4 with three uncertainty flags more than confidence=0.95 with a confident lie.

What's next for Splunk IR Triage Agent

  • Wire the agent into Splunk's alert action framework so it triggers automatically on saved-search alerts, attaching the triage card to the modmail-style conversation in Splunk Web.
  • Add a Splunk app wrapper so the agent installs as a Splunkbase app rather than a separate Python process.
  • Extend recommended_action to suggest specific SOAR playbook IDs for the contain path.
