Why this exists
Production-incident response is a high-pressure pattern. The on-call gets paged at 2 a.m., reads a half-baked alert summary, then has to manually chain together half a dozen Splunk searches, Observability detector lookups, and saved-search joins just to figure out what's actually broken. That ten-minute scramble is exactly what a Splunk-aware agent should automate.
gemini-splunk-agent does that walk for you. Hand it the user's symptom in plain English and it works the case through the official Splunk MCP server tools, end-to-end, citing alert IDs, detector rules, SPL fragments, and underlying timeseries numbers verbatim from the tool output. Same shape as a senior SRE's debugging notebook, fully automated.
What it does
The agent's system prompt forces a five-step workflow over the Splunk MCP tool surface:
- list_alerts(status="active"): surface currently firing alerts (both saved-search alerts and Splunk Observability detectors).
- get_detector(detector_id): pull the detector rule + current value + baseline so the verdict is grounded.
- run_search(spl): execute the SPL behind the alert, mirroring Splunk's services/search/jobs REST shape. Pulls back raw event records with verbatim timestamps and metrics.
- run_observability_query(metric, window_minutes): pull the underlying Splunk Observability Cloud timeseries so the verdict has a graph, not just a summary.
- list_indexes(): only when the user asks what data exists on the cluster.
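The tool order above can be sketched in plain Python. This is a minimal illustration, not the agent's actual code: the five functions are hypothetical in-memory stand-ins named after the MCP tools, their return shapes (including the detector_id field on an alert) are assumptions, and the values are taken from the canned case on this page.

```python
from typing import Any

# Hypothetical in-memory stand-ins for the five MCP tools.
# Return shapes are assumptions; values come from the canned case.

def list_alerts(status: str = "active") -> list[dict[str, Any]]:
    alerts = [{
        "id": "ALRT-2026-0518-1432-A",
        "name": "checkout-api p95 latency > 1500ms (15-min window)",
        "severity": "CRITICAL",
        "status": "active",
        "detector_id": "DTC-checkout-latency-p95",  # assumed link field
    }]
    return [a for a in alerts if a["status"] == status]

def get_detector(detector_id: str) -> dict[str, Any]:
    return {
        "id": detector_id,
        "rule": "p95(duration_ms) > 1500 over 15m",
        "current_ms": 1842,
        "baseline_ms": 220,
    }

def run_search(spl: str) -> list[dict[str, Any]]:
    # A real tool would run the SPL as a search job; this stand-in returns no events.
    return []

def run_observability_query(metric: str, window_minutes: int) -> list[float]:
    # A real tool would return timeseries points; this stand-in returns none.
    return []

def list_indexes() -> list[str]:
    return ["app_logs"]

def walk(spl: str, metric: str, wants_indexes: bool = False) -> dict[str, Any]:
    """Call the tools in the fixed order, keeping raw outputs for quoting."""
    trail: dict[str, Any] = {}
    trail["alerts"] = list_alerts(status="active")
    alert = trail["alerts"][0]
    trail["detector"] = get_detector(alert["detector_id"])
    trail["events"] = run_search(spl)
    trail["timeseries"] = run_observability_query(metric, window_minutes=15)
    if wants_indexes:  # the last tool runs only when the user asks about data
        trail["indexes"] = list_indexes()
    return trail
```

Keeping every raw tool result in one dictionary is what makes verbatim quoting possible later: the final answer can copy strings from the trail instead of regenerating them.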
After tool-walking, the agent emits a labeled triage with six required sections:
ANSWER: one sentence — firing alert + root cause.
ACTIVE ALERT: alert id + name + severity + status.
DETECTOR: detector id + rule (verbatim) + current vs baseline.
EVIDENCE: 2-4 bullets — SPL output, p95 numbers, timestamps (verbatim).
ROOT CAUSE: one sentence, grounded in the evidence above.
NEXT STEP: one concrete action for the on-call.
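One way to treat the six labels as a machine-readable format is to split the agent's output on them. The parser below is a sketch under assumptions: it expects each label at the start of a line followed by a colon, which matches the list above but is not confirmed as the exact output layout. The names parse_triage and missing_sections are illustrative, not part of the project.

```python
import re

SECTIONS = ["ANSWER", "ACTIVE ALERT", "DETECTOR", "EVIDENCE", "ROOT CAUSE", "NEXT STEP"]

# A label counts only at the start of a line, followed by a colon.
_LABEL = re.compile(
    r"^(%s):[ \t]*" % "|".join(re.escape(s) for s in SECTIONS),
    re.MULTILINE,
)

def parse_triage(text: str) -> dict[str, str]:
    """Split a triage into {label: body}, each body running to the next label."""
    matches = list(_LABEL.finditer(text))
    parsed: dict[str, str] = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        parsed[m.group(1)] = text[m.end():end].strip()
    return parsed

def missing_sections(text: str) -> list[str]:
    """Return required labels that are absent or have an empty body."""
    parsed = parse_triage(text)
    return [s for s in SECTIONS if not parsed.get(s)]
```

An empty list from missing_sections means all six sections are present and non-empty; anything else names exactly which section the model dropped.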
How it does it
- Model: Gemini 2.5 Flash on Vertex AI.
- Agent runtime: google.adk.agents.LlmAgent with McpToolset, bound to the Splunk MCP server via StdioConnectionParams.
- MCP server: ships with a faithful stub of the official Splunk MCP server (@splunk/splunk-mcp). The agent talks to the stub during the demo so judges can reproduce the run without provisioning a Splunk Cloud tenant. To target a real tenant, swap stub=False and provide SPLUNK_HOST + SPLUNK_TOKEN + SPLUNK_O11Y_TOKEN.
- Surface: Streamlit dashboard on Cloud Run + a runner.ask() Python entrypoint for headless use.
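The stub-versus-tenant switch could be guarded with a small check like the one below. This is a sketch, not the project's code: tenant_settings is a hypothetical helper, and only the three variable names and the stub flag come from the description above. How the values are actually handed to the MCP server process is not shown here.

```python
import os
from typing import Mapping, Optional

REQUIRED_VARS = ("SPLUNK_HOST", "SPLUNK_TOKEN", "SPLUNK_O11Y_TOKEN")

def tenant_settings(stub: bool, env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return the credentials a real-tenant run needs.

    Stub mode needs none; real-tenant mode requires all three variables.
    """
    if stub:
        return {}
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Fail before the agent starts rather than midway through a tool call.
        raise RuntimeError("stub=False requires: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing fast on a missing variable keeps a misconfigured real-tenant run from looking like an agent or model error.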
The canned case
ALRT-2026-0518-1432-A — "checkout-api p95 latency > 1500ms (15-min window)". Severity CRITICAL. Detector DTC-checkout-latency-p95 is firing with current value 1842 ms against a baseline p95 of 220 ms. The saved-search SPL on the alert is:
search index=app_logs sourcetype="checkout-api" earliest=-30m
| stats perc95(duration_ms) as p95_ms by _time
| where p95_ms > 1500
The agent walks all five tools, quotes the detector rule character-for-character (p95(duration_ms) > 1500 over 15m), pulls the SPL output (30 records, half pre-spike at ~218 ms and half post-spike at ~1786 ms), confirms the spike with run_observability_query on the checkout-api.duration_ms.p95 metric, and emits a verdict that quotes the tool output verbatim and points to a 14:32 UTC deploy as the most likely cause.
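The arithmetic behind "current vs baseline" for this case can be worked through in a few lines. The sketch below parses the quoted rule string and compares it with the detector values from the canned case. The rule grammar in the regular expression is an assumption inferred from this single rule, and parse_rule and assess are illustrative names.

```python
import re

RULE = "p95(duration_ms) > 1500 over 15m"

def parse_rule(rule: str) -> dict:
    """Parse rules shaped like 'agg(field) > N over Mm' (assumed grammar)."""
    m = re.fullmatch(
        r"(\w+)\((\w+)\)\s*>\s*(\d+(?:\.\d+)?)\s+over\s+(\d+)m", rule
    )
    if m is None:
        raise ValueError(f"unrecognised rule: {rule!r}")
    agg, field, threshold, window = m.groups()
    return {
        "agg": agg,
        "field": field,
        "threshold": float(threshold),
        "window_minutes": int(window),
    }

def assess(rule: str, current: float, baseline: float) -> dict:
    """Compare the current value with the rule threshold and the baseline."""
    parsed = parse_rule(rule)
    return {
        "firing": current > parsed["threshold"],
        "over_threshold_ms": current - parsed["threshold"],
        "times_baseline": round(current / baseline, 2),
    }

print(assess(RULE, current=1842, baseline=220))
# {'firing': True, 'over_threshold_ms': 342.0, 'times_baseline': 8.37}
```

So the detector is 342 ms past its threshold and roughly 8.4 times its baseline p95, which is the kind of figure the EVIDENCE section is meant to carry alongside the verbatim numbers.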
Live Vertex AI smoke test
Reproducible via scripts/smoke.py against the stub MCP. The latest live run passed all ten verbatim-output checks:
[PASS] has ANSWER
[PASS] has ACTIVE ALERT
[PASS] has DETECTOR
[PASS] has EVIDENCE
[PASS] has ROOT CAUSE
[PASS] has NEXT STEP
[PASS] names alert id ALRT-2026-0518-1432-A
[PASS] names detector id DTC-CHECKOUT-LATENCY-P95
[PASS] quotes detector rule "P95(DURATION_MS) > 1500"
[PASS] names checkout-api
Bonus prizes targeted
- Best Use of Splunk MCP Server ($1K): the agent's entire tool surface IS the Splunk MCP server. Five tools (list_alerts, get_detector, list_indexes, run_search, run_observability_query) match the official server's shape, stubbed for demos and real-tenant-ready via one env-var swap.
Challenges we ran into
Writing a system prompt that forces verbatim quoting from tool output is harder than it sounds. Early drafts had Gemini paraphrase numbers ("around 1.8 seconds" instead of "1842 ms"). The fix was to require labeled sections and explicit "verbatim" rules in the prompt, then add a smoke test that greps for specific identifiers (alert ID, detector ID, detector rule string) in the final output. The test is the contract.
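A grep-style contract of this kind can be sketched as follows. This is not the contents of scripts/smoke.py: the function names are illustrative, and the case-insensitive matching of identifiers is an assumption prompted by the uppercase detector ID and rule in the smoke-test log.

```python
SECTIONS = ["ANSWER", "ACTIVE ALERT", "DETECTOR", "EVIDENCE", "ROOT CAUSE", "NEXT STEP"]

# Identifiers from the canned case that must survive into the final output.
IDENTIFIERS = [
    "ALRT-2026-0518-1432-A",
    "DTC-checkout-latency-p95",
    "p95(duration_ms) > 1500",
    "checkout-api",
]

def run_checks(output: str) -> list[tuple[str, bool]]:
    """Return (check name, passed) pairs: six section checks, four identifier checks."""
    folded = output.casefold()  # assumption: identifiers match case-insensitively
    checks = [(f"has {label}", f"{label}:" in output) for label in SECTIONS]
    checks += [
        (f"contains {needle!r}", needle.casefold() in folded)
        for needle in IDENTIFIERS
    ]
    return checks

def report(output: str) -> bool:
    """Print one PASS/FAIL line per check and return True only if all pass."""
    results = run_checks(output)
    for name, ok in results:
        print(f"[{'PASS' if ok else 'FAIL'}] {name}")
    return all(ok for _, ok in results)
```

A paraphrased rule such as "p95 above 1.5 seconds" fails the third identifier check even when every section is present, which is the behaviour the prompt rules are meant to enforce.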
What's next
Wire against a real Splunk Cloud tenant during the contest window (the env-var swap is already there). Add a "second opinion" mode where two Gemini instances independently produce verdicts and a reconciler agent flags disagreements. Pipe the verdict trail back into Splunk as a structured _internal event so the SELF-CORRECTION rationale ends up in the case file.
License
Apache 2.0. Standalone repo created during the Splunk Agentic Ops contest period.
