Inspiration

The inspiration for SentinelOps came from observing the chaotic reality of Security Operations Centers (SOCs). When a critical alert fires, the clock starts ticking. Yet human analysts typically spend the first 30 to 45 minutes manually assembling context: running Splunk queries, parsing logs, mapping downstream dependencies, and hunting for lateral movement.

We realized that this triage phase is highly deterministic but incredibly time-consuming. We asked ourselves: What if we could parallelize this investigation using specialized AI agents? We wanted to build a system where the moment an alert triggers, an autonomous "War Room" is assembled, doing the heavy lifting so human operators can focus purely on decision-making.

What it does

SentinelOps is an autonomous incident command system that intercepts critical security alerts the second they fire. Instead of a human analyst manually hunting through Splunk, SentinelOps deploys a team of specialized AI agents to investigate the incident in parallel.

Within seconds, the system:

  • Hunts for Threats: Automatically generates and runs optimized Splunk searches to uncover the attacker's footprint (like compromised accounts, lateral movement, or malware drops).
  • Maps the Blast Radius: Analyzes your internal network topology to determine exactly which critical business services are at risk.
  • Builds an Evidence Board: Compiles all findings into a centralized, chronological timeline and maps the attacker's actions directly to the MITRE ATT&CK framework.
  • Plans Remediation: Generates a specific, actionable response plan (e.g., isolating specific hosts or forcing password resets) that a human operator can execute with a single click.
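One way to picture the Evidence Board is as a list of findings that every agent contributes to, sorted by timestamp and optionally tagged with an ATT&CK technique ID. The sketch below assumes that shape; the `Finding` fields, agent names, and sample events are hypothetical, while the technique IDs are real ATT&CK entries (T1078 Valid Accounts, T1021 Remote Services, T1105 Ingress Tool Transfer).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Set


@dataclass(frozen=True)
class Finding:
    """One piece of evidence reported by an agent (hypothetical schema)."""
    timestamp: datetime
    agent: str
    description: str
    mitre_technique: Optional[str] = None  # ATT&CK technique ID, e.g. "T1078"


def build_evidence_board(findings: List[Finding]) -> List[Finding]:
    """Merge findings from every agent into a single chronological timeline."""
    return sorted(findings, key=lambda f: f.timestamp)


def techniques_seen(board: List[Finding]) -> Set[str]:
    """Collect the distinct ATT&CK technique IDs that appear on the board."""
    return {f.mitre_technique for f in board if f.mitre_technique}


# Illustrative findings arriving out of order from different agents.
board = build_evidence_board([
    Finding(datetime(2024, 5, 1, 10, 7), "threat_hunter", "Payload written to disk", "T1105"),
    Finding(datetime(2024, 5, 1, 10, 2), "threat_hunter", "Anomalous login for svc_backup", "T1078"),
    Finding(datetime(2024, 5, 1, 10, 5), "rca", "SMB session to file server", "T1021"),
    Finding(datetime(2024, 5, 1, 10, 6), "blast_radius", "File server feeds payments-api"),
])
for item in board:
    print(item.timestamp.time(), item.agent, item.mitre_technique or "-", item.description)
```

Because each agent only appends findings and the board is re-sorted on read, agents never need to coordinate on ordering while they run in parallel.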

How we built it

SentinelOps is a multi-agent AI command system orchestrated using LangGraph and deeply integrated with the Splunk MCP Server.
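In SentinelOps the fan-out and fan-in of the three agents listed next is handled by a LangGraph StateGraph. The framework-free sketch below shows the same pattern with plain asyncio so it runs without LangGraph installed; it is not LangGraph's API, and the agent functions, alert fields, and returned keys are hypothetical placeholders.

```python
import asyncio


# Hypothetical stand-ins for the three investigation agents. In SentinelOps each
# would call an LLM plus Splunk MCP tools; here they just return canned findings.
async def threat_hunter(alert: dict) -> dict:
    await asyncio.sleep(0.1)  # simulate Splunk query latency
    return {"iocs": [f"suspicious login on {alert['host']}"]}


async def root_cause(alert: dict) -> dict:
    await asyncio.sleep(0.1)
    return {"timeline": [f"{alert['time']} alert fired on {alert['host']}"]}


async def blast_radius(alert: dict) -> dict:
    await asyncio.sleep(0.1)
    return {"impacted_services": ["payments-api"]}


async def investigate(alert: dict) -> dict:
    """Fan out to all agents concurrently, then fan in by merging their results."""
    results = await asyncio.gather(
        threat_hunter(alert), root_cause(alert), blast_radius(alert)
    )
    state: dict = {"alert": alert}
    for partial in results:
        state.update(partial)  # each agent writes distinct keys, so nothing is overwritten
    return state


if __name__ == "__main__":
    report = asyncio.run(investigate({"host": "ws-042", "time": "10:02"}))
    print(sorted(report))
```

The three sleeps overlap, so the whole investigation takes roughly as long as the slowest agent rather than the sum of all three. In LangGraph, parallel branches that write to the same state key need a reducer (declared with `Annotated`); giving each agent its own keys, as here, sidesteps that.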

  • The Orchestrator: A LangGraph StateGraph acts as the incident commander. When an alert webhook is received, it fans out the investigation to three specialized agents.
  • Parallel Investigation Agents:
      • Threat Hunter: Uses the saia_generate_spl and splunk_run_query tools to hunt for indicators of compromise (IOCs) and maps them to MITRE ATT&CK.
      • Root Cause (RCA) Agent: Analyzes the event sequence to build a chronological attack timeline.
      • Blast Radius Agent: Queries our service_topology.csv lookup table to recursively calculate downstream business impact.
  • Splunk AI Assistant (SAIA) Integration: The agents autonomously write, optimize, and explain their own SPL queries using Splunk's AI capabilities, completely removing the need for pre-baked queries.
  • The War Room Dashboard: We built a real-time FastAPI backend that streams the agents' reasoning to a sleek, glassmorphic UI via WebSockets, culminating in a centralized Evidence Board and actionable Remediation Plan.

The Mathematics of Blast Radius

To quantify the severity of an incident, our Blast Radius Agent calculates an impact score using a weighted recursive function over the service topology graph. The impact $I$ of a compromised node $n$ is defined as:

$$I(n) = W_n + \sum_{c \in \mathrm{children}(n)} \big( I(c) \times D_c \big)$$

Where:

  • $W_n$ is the intrinsic criticality weight of node $n$ (e.g., Domain Controller = 10, Workstation = 2).
  • $D_c$ is the dependency decay factor (typically $0.8$), representing the probability of lateral impact to child node $c$.

This allows the system to instantly prioritize incidents that threaten Tier 0 infrastructure.
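The impact formula translates directly into a memoized recursive function. This is a minimal sketch under stated assumptions: the CSV layout (`service,weight,downstream`) is hypothetical because the real schema of service_topology.csv is not described, a single decay factor of 0.8 is applied to every edge, and only the Domain Controller (10) and Workstation (2) weights come from the text; the file-server and payments-api weights are illustrative.

```python
import csv
import io
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical layout for service_topology.csv: each row gives a service, its
# criticality weight W_n, and its downstream (child) services separated by ";".
TOPOLOGY_CSV = """service,weight,downstream
domain-controller,10,file-server;payments-api
file-server,4,workstation-1
payments-api,8,
workstation-1,2,
"""

DECAY = 0.8  # D_c; this sketch uses the same decay factor for every child


def load_topology(text: str) -> Tuple[Dict[str, float], Dict[str, List[str]]]:
    """Parse the CSV into a weight table and a child-adjacency list."""
    weights: Dict[str, float] = {}
    children: Dict[str, List[str]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        weights[row["service"]] = float(row["weight"])
        children[row["service"]] = [c for c in row["downstream"].split(";") if c]
    return weights, children


def impact(
    node: str,
    weights: Dict[str, float],
    children: Dict[str, List[str]],
    memo: Optional[Dict[str, float]] = None,
    path: FrozenSet[str] = frozenset(),
) -> float:
    """I(n) = W_n + sum over children c of I(c) * D_c, memoized per node."""
    if memo is None:
        memo = {}
    if node in memo:
        return memo[node]
    if node in path:
        # The recursion only terminates on an acyclic topology.
        raise ValueError(f"dependency cycle through {node!r}")
    total = weights.get(node, 0.0)  # services missing from the table weigh 0
    for child in children.get(node, []):
        total += impact(child, weights, children, memo, path | {node}) * DECAY
    memo[node] = total
    return total


weights, children = load_topology(TOPOLOGY_CSV)
print(round(impact("domain-controller", weights, children), 2))
```

For this toy topology the domain controller scores 10 + 0.8 × (4 + 0.8 × 2) + 0.8 × 8 = 20.88. Memoization means a shared dependency is computed once, even though the formula counts its contribution along every path that reaches it; the cycle check matters because the recursion as written assumes the topology is a DAG.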

Challenges we ran into

  • Agent Hallucinations in SPL: Initially, the LLM would occasionally generate invalid Splunk Search Processing Language (SPL). We solved this by implementing an autonomous feedback loop: if the Splunk MCP Server returns an error, the agent passes the error back to saia_generate_spl to correct its own syntax before proceeding.
  • State Management: Keeping track of parallel agent executions required migrating from a simple chain to a LangGraph StateGraph with a MemorySaver checkpointer, ensuring no evidence was lost during the fan-out phase.
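The self-correction loop can be sketched as a bounded retry in which each failure message becomes extra context for the next generation attempt. In this sketch the calls to saia_generate_spl and splunk_run_query are replaced by injected callables, `SplError` is a hypothetical exception type, and the retry cap of three attempts is an assumption rather than a documented SentinelOps setting.

```python
from typing import Callable, List, Optional


class SplError(Exception):
    """Raised by the query runner when Splunk rejects a search (hypothetical)."""


def run_with_self_correction(
    question: str,
    generate_spl: Callable[[str, Optional[str]], str],
    run_query: Callable[[str], List[dict]],
    max_attempts: int = 3,
) -> List[dict]:
    """Generate SPL, run it, and feed any error back to the generator."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        spl = generate_spl(question, error)  # error is None on the first attempt
        try:
            return run_query(spl)
        except SplError as exc:
            error = str(exc)  # becomes context for the next generation attempt
    raise RuntimeError(f"SPL still invalid after {max_attempts} attempts: {error}")


# Stubs standing in for saia_generate_spl and splunk_run_query.
def fake_generate(question: str, error: Optional[str]) -> str:
    # The first draft misspells `stats`; a draft produced with error context fixes it.
    return "index=auth | stats count by user" if error else "index=auth | stat count by user"


def fake_run(spl: str) -> List[dict]:
    if "| stat " in spl:
        raise SplError("Unknown search command 'stat'.")
    return [{"user": "svc_backup", "count": "42"}]


rows = run_with_self_correction("failed logins by user", fake_generate, fake_run)
print(rows)
```

Capping the attempts keeps a persistently confused model from looping forever and surfaces the last error to the operator instead.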

What we learned

  • Splunk MCP is a Game Changer: The Model Context Protocol (MCP) transformed how our agents interact with Splunk. Instead of building custom REST API wrappers, we gave the agents standardized MCP tools (splunk_run_query, saia_explain_spl), which let them query Splunk and interpret results largely on their own.
  • Parallelization beats Chain-of-Thought: Breaking the investigation down into specialized roles (Threat Hunter vs. Blast Radius) and running them concurrently reduced our incident resolution time from 4 minutes to under 45 seconds.
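Part of why MCP tools are easy for agents to use is that every tool, whatever it does, is invoked with the same JSON-RPC 2.0 `tools/call` request; the agent only supplies a tool name and an arguments object matching the schema the server advertises via `tools/list`. A minimal sketch of that message follows, leaving out the transport (stdio or HTTP) and the initialization handshake; the `query` argument name is hypothetical.

```python
import json
from itertools import count

_request_ids = count(1)  # request IDs should not be reused within a session


def mcp_tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# The "query" argument name is an assumption; the real input schema for each
# tool is whatever the server advertises in its `tools/list` response.
request = mcp_tool_call(
    "splunk_run_query",
    {"query": "index=auth action=failure | stats count by user"},
)
print(request)
```

Swapping splunk_run_query for saia_explain_spl changes only the name and arguments, which is what makes adding new capabilities cheap compared with writing a bespoke REST wrapper per endpoint.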

Built With

  • docker
  • docker-compose
  • fastapi
  • gemini
  • javascript/html/css
  • langchain
  • langgraph
  • mermaid.js
  • python-(backend)
  • splunk-enterprise
  • splunk-mcp-server-v1.1
  • uvicorn