Inspiration

Enterprise downtime now costs over $600 billion globally every year, and the most expensive part isn't fixing the problem. It's the 45 minutes before anyone even agrees on what the problem is.

When an incident hits, observability data, security logs, and platform metrics live in three separate worlds. An SRE stares at a latency graph. A security analyst stares at firewall logs. A platform engineer stares at scheduled search performance. Nobody sees the whole picture — so the team gets on a bridge call and manually stitches the story together while the clock keeps running and the business keeps losing money.

Splunk's own 2026 vision talks about the "Agentic SOC" — moving from reactive alerting to proactive, autonomous investigation. We took that seriously. We asked: what if the stitching happened automatically, continuously, before a human ever opened their laptop?

That question became TRIDENT-AI.

What it does

TRIDENT-AI is an autonomous incident intelligence swarm that runs continuously against Splunk Cloud, with three specialized AI agents working in parallel across all three hackathon tracks:

🔱 TelemetrySentinel (Observability) — runs Splunk's native ML-SPL predict algorithm against live metrics, generating quantile confidence bands (upper95/lower95) for zero-shot anomaly detection. No training data. No manual thresholds.

🔱 ThreatMarshall (Security) — the moment TelemetrySentinel flags an anomaly, ThreatMarshall retrieves correlated security logs through the authenticated Splunk REST API and maps detected threats to MITRE ATT&CK v14 techniques — IOCs, attack timeline, the works.

🔱 PlatformAuditor (Platform & DevEx) — simultaneously diagnoses platform health via Splunk's REST API, catching resource-hogging scheduled searches and configuration drift that may be compounding the incident.
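To make the TelemetrySentinel step concrete, here is a minimal sketch of how a search ending in Splunk's predict command could be assembled and how points outside the 95% band could be flagged. The index name, metric name, span, and the field aliases (observed, forecast, upper, lower) are illustrative assumptions rather than the project's actual values, and result rows are assumed to arrive as dictionaries of strings.

```python
from typing import Iterable


def build_predict_query(index: str, metric: str, span: str = "1m") -> str:
    """Build an SPL search that forecasts `metric` and names the 95% band fields.

    The index, metric, and span are placeholders chosen for illustration.
    """
    return (
        f"search index={index} "
        f"| timechart span={span} avg({metric}) AS observed "
        "| predict observed AS forecast algorithm=LLP5 "
        "upper95=upper lower95=lower"
    )


def find_anomalies(rows: Iterable[dict]) -> list[dict]:
    """Return the rows whose observed value falls outside [lower, upper]."""
    flagged = []
    for row in rows:
        try:
            observed = float(row["observed"])
            lower = float(row["lower"])
            upper = float(row["upper"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric fields
        if observed < lower or observed > upper:
            flagged.append(row)
    return flagged
```

Renaming the band fields inside the query keeps the Python side independent of predict's default output field names.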

All three run asynchronously, in parallel — zero human trigger. Their findings converge at the Splunk Model Context Protocol Server, where structured tool calls and results are exchanged. AWS Bedrock (Claude 3.5 Sonnet) synthesizes everything into ONE incident package: an executive summary for leadership, a MITRE ATT&CK attack chain, a unified timeline, a quantified business impact estimate, and ranked remediation options — each expressed as a structured MCP tool call.
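The parallel fan-out described above can be sketched with asyncio.gather. This is an illustration under stated assumptions, not the project's code: the three agent coroutines are stubs, and the shape of their return values is hypothetical. The 60-second default mirrors the polling interval given in the build notes.

```python
import asyncio
from typing import Any, Awaitable, Callable

AgentFn = Callable[[], Awaitable[dict[str, Any]]]


# Stub agents: real ones would query Splunk; these return placeholder findings.
async def telemetry_sentinel() -> dict[str, Any]:
    return {"agent": "TelemetrySentinel", "findings": []}


async def threat_marshall() -> dict[str, Any]:
    return {"agent": "ThreatMarshall", "findings": []}


async def platform_auditor() -> dict[str, Any]:
    return {"agent": "PlatformAuditor", "findings": []}


async def run_cycle(agents: dict[str, AgentFn]) -> dict[str, Any]:
    """Run every agent concurrently and collect results or errors by name."""
    names = list(agents)
    results = await asyncio.gather(
        *(agents[name]() for name in names),
        return_exceptions=True,  # one failing agent must not cancel the others
    )
    report: dict[str, Any] = {}
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            report[name] = {"error": repr(result)}
        else:
            report[name] = result
    return report


async def run_forever(agents: dict[str, AgentFn], interval_seconds: float = 60.0) -> None:
    """Poll continuously; a real daemon would hand each report to the synthesis step."""
    while True:
        report = await run_cycle(agents)
        print(report)
        await asyncio.sleep(interval_seconds)
```

Using return_exceptions=True is a design choice in this sketch: a failure in one domain is recorded in the report while the other two agents still contribute their findings.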

The engineer doesn't investigate. They review, and approve with one click. TRIDENT writes the resolved incident permanently back into Splunk Cloud via HTTP Event Collector — fully audited, fully accountable.
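As a rough sketch of the write-back step, the function below builds an HTTP Event Collector request for a resolved incident without sending it. The base URL, token, sourcetype name, and the incident fields are hypothetical placeholders; only the /services/collector/event path, the "Splunk <token>" authorization scheme, and the event envelope follow HEC's documented format.

```python
import json
import time
import urllib.request


def build_hec_request(
    base_url: str,
    token: str,
    incident: dict,
    sourcetype: str = "trident:incident",  # hypothetical sourcetype name
) -> urllib.request.Request:
    """Wrap an incident dict in an HEC event envelope and return a POST request."""
    payload = {
        "time": time.time(),       # event timestamp in epoch seconds
        "sourcetype": sourcetype,
        "event": incident,         # the incident record itself
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would send it; keeping construction separate from sending makes the payload easy to inspect and log for the audit trail.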

The result: 45 minutes of cross-team chaos becomes under 3 minutes of autonomous clarity, plus one human decision.

How we built it

🔧 Backend: Python + FastAPI, running an autonomous daemon thread that polls every 60 seconds and uses asyncio.gather for parallel agent execution.

🔌 Splunk Integration: Authenticated, session-based access to Splunk Cloud via Splunk Web's REST proxy (port 443). We executed Splunk's native ML-SPL predict command through the Splunk MCP Server (JSON-RPC 2.0) and verified it live, receiving real quantile prediction bands back over the network. Incidents are written back via HTTP Event Collector (port 8088).

🧠 AI Synthesis: AWS Bedrock (Claude 3.5 Sonnet) consumes structured JSON findings from all three agents and returns a schema-validated incident package containing an executive summary, MITRE ATT&CK mapping, business impact, and remediation options as MCP tool calls.

🎨 Frontend: Custom React "Dark Ops" command console with live agent status and pulsing investigation indicators, an autonomous incident queue, MITRE ATT&CK chain visualization, a D3 network graph of affected services, and a one-click human-in-the-loop approval panel with full audit trail.

📐 Architecture: A full sequence diagram in the repo root maps every data flow (Splunk Cloud → Agent Swarm → MCP Server → Bedrock → React UI → Human approval → back to Splunk Cloud).
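For readers unfamiliar with the MCP wire format, the sketch below shows what a JSON-RPC 2.0 tools/call request and its response handling might look like. The envelope follows the Model Context Protocol's tools/call shape; the tool name used in the usage note and its argument keys are hypothetical, since the write-up does not list the tools the Splunk MCP Server exposes.

```python
import itertools

# Monotonic ids let each response be matched to the request that caused it.
_request_ids = itertools.count(1)


def make_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request that invokes an MCP tool by name."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def unwrap_response(request: dict, response: dict) -> dict:
    """Validate a JSON-RPC 2.0 response and return its result object."""
    if response.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 response")
    if response.get("id") != request["id"]:
        raise ValueError("response id does not match request id")
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    return response["result"]
```

A call such as make_tool_call("run_splunk_query", {"query": "search index=main | head 5"}) would produce the request body; the tool name here is a placeholder, not a confirmed Splunk MCP Server tool.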

Challenges we ran into

Splunk Cloud free trial instances restrict the management port (8089), the port the MCP Server and REST API normally use, even with IP allowlisting configured. Our first instinct was to fall back to pure simulation. Instead, we dug into Splunk Web's architecture and found that its REST proxy on port 443 forwards authenticated requests to splunkd internally.

We rebuilt our entire auth layer around session-based authentication through this proxy — and successfully executed Splunk's native predict ML-SPL algorithm through the MCP Server, receiving real, computed quantile confidence bands (upper95/lower95) back over the network with a 200 OK. This let TRIDENT prove genuine Splunk AI execution on a free-tier instance — something we believe most trial-tier participants won't attempt.

Accomplishments that we're proud of

✅ A verified, authenticated connection to Splunk Cloud that executes Splunk's native machine-learning forecasting (predict, quantile bands) through the MCP Server, with proof rather than claims.

✅ A genuinely autonomous loop — three agents detect, investigate, and synthesize a complete incident with zero human trigger.

✅ A human-in-the-loop design that mirrors Splunk's own Agentic SOC philosophy: agents investigate exhaustively, humans decide on irreversible action. Autonomy without sacrificing accountability.

✅ Genuine cross-domain correlation — Observability, Security, and Platform DevEx unified into ONE incident narrative, not three separate dashboards.

✅ Incidents written back into Splunk Cloud for permanent, queryable audit — closing the loop, not just displaying alerts.

What we learned

We learned that Splunk Cloud's REST API remains reachable through Splunk Web's authenticated proxy even when the management port is restricted on free-tier instances, a path we hope helps other developers building against Splunk Cloud trials. We learned how Splunk's native predict command computes quantile-based confidence intervals for zero-shot forecasting. And we learned how to structure multi-agent findings as Model Context Protocol tool calls for clean, auditable LLM synthesis, the exact pattern Splunk's Agentic SOC vision describes.
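One way to picture the schema-validated incident package with remediation options expressed as tool calls is the sketch below. It uses standard-library dataclasses as a dependency-free stand-in for the pydantic models the project lists, and every field name is an assumption made for illustration rather than the project's actual schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RemediationOption:
    rank: int         # 1 = most recommended
    tool_name: str    # MCP tool to invoke if a human approves
    arguments: dict   # arguments for that tool call


@dataclass(frozen=True)
class IncidentPackage:
    executive_summary: str
    mitre_techniques: list
    business_impact: str
    remediation: list


REQUIRED_FIELDS = ("executive_summary", "mitre_techniques", "business_impact", "remediation")


def parse_incident_package(raw_json: str) -> IncidentPackage:
    """Parse model output and reject it unless every required field is present."""
    data = json.loads(raw_json)
    missing = [key for key in REQUIRED_FIELDS if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(data["executive_summary"], str) or not data["executive_summary"].strip():
        raise ValueError("executive_summary must be a non-empty string")
    options = [
        RemediationOption(
            rank=int(item["rank"]),
            tool_name=str(item["tool_name"]),
            arguments=dict(item.get("arguments", {})),
        )
        for item in data["remediation"]
    ]
    options.sort(key=lambda option: option.rank)  # present best option first
    return IncidentPackage(
        executive_summary=data["executive_summary"],
        mitre_techniques=[str(t) for t in data["mitre_techniques"]],
        business_impact=str(data["business_impact"]),
        remediation=options,
    )
```

Rejecting malformed output before it reaches the approval panel means the human reviewer only ever sees packages whose remediation options can actually be turned into tool calls.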

What's next

🚀 Production-tier integration with Splunk's Foundation AI Security Model for deeper semantic log classification, replacing our REST-based retrieval with native security-model inference.

🚀 SOAR-integrated remediation playbooks — turning PlatformAuditor's findings into automated, approved firewall and search-termination actions.

🚀 Multi-tenant deployment for MSSPs — one TRIDENT instance monitoring multiple customer Splunk environments, with per-tenant incident isolation.

🚀 Splunkbase packaging — distributing TRIDENT as an installable Splunk app, lowering the barrier for any Splunk Cloud customer to deploy autonomous incident intelligence.

Built With

  • asyncio
  • aws-bedrock
  • claude-3.5-sonnet
  • d3.js
  • fastapi
  • httpx
  • jsonrpc-2.0
  • mitre-attack
  • ml-spl
  • model-context-protocol
  • pydantic
  • python
  • react
  • splunk
  • splunk-cloud
  • splunk-mcp-server