OpsWitness

Inspiration

AI agents can investigate operational incidents quickly, but speed is not enough when teams cannot prove which context, tools, queries, and evidence influenced a decision.

Operational teams need more than an agent transcript. They need an independently verifiable record showing why an investigation reached a conclusion, whether the evidence supports it, what is likely to happen next, and whether a proposed remediation is safe.

OpsWitness was created to make AI-driven operations predictive, observable, explainable, and accountable.

[Figure: OpsWitness evidence-first incident response architecture]

What it does

OpsWitness is an evidence-first incident investigation and governance layer for AI agents using Splunk through the Model Context Protocol.

It sits between an MCP-capable AI agent and Splunk MCP Server, records real MCP requests and responses, writes evidence events to Splunk through the HTTP Event Collector (HEC), and reconstructs each investigation as a causal context graph.

OpsWitness:

Discovers the live tools exposed by Splunk MCP Server before using them
Discovers the connected Splunk AI Toolkit (AITK) algorithm inventory through MCP
Executes scoped operational investigations through splunk_run_query
Executes Splunk AITK DensityFunction over real HEC evidence volume to detect statistically unusual agent investigation activity
Records agent tool selections, calls, generated SPL, and query results
Uses the Cisco Deep Time Series Model (CTSM) to forecast incident trajectories as mean forecasts with p5 and p95 bounds
Uses Foundation-Sec to produce validated, advisory-only security reasoning tied to real evidence references
Detects prompt injection, poisoned tool metadata, broad searches, raw exports, sensitive-index access, and unsafe query windows
Correlates deployments with operational incidents
Requires incident conclusions to cite evidence nodes that exist in the recorded run
Discovers organization-approved saved searches and KV Store-backed response policy, failing closed when those governance assets are unavailable
Rewrites unsafe SPL into scoped investigation queries
Sends evidence-backed incident briefs to Slack
Keeps remediation behind explicit human approval
Provides an independent Splunk dashboard for verifying the evidence trail
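
The broad-search, raw-export, sensitive-index, and query-window checks and the scoped SPL rewrite listed above could be sketched as follows. This is a minimal illustration under stated assumptions: the sensitive index names, the commands treated as raw exports, the 24-hour window limit, the 60-minute rewrite window, and every function name are hypothetical, not OpsWitness's actual policy, and the pipe splitting is deliberately naive.

```python
import re
from dataclasses import dataclass, field

# Hypothetical policy values; real rules would come from organization-approved policy.
SENSITIVE_INDEXES = {"hr", "payroll", "security_audit"}
EXPORT_COMMANDS = {"outputcsv", "outputlookup", "sendemail"}
MAX_WINDOW_HOURS = 24
SCOPED_WINDOW = "-60m"


@dataclass
class PolicyResult:
    findings: list = field(default_factory=list)
    safe_query: str = ""


def window_hours(spl):
    """Lookback in hours for earliest=-<n><m|h|d>; None if missing or unparseable."""
    m = re.search(r"\bearliest\s*=\s*-(\d+)([mhd])\b", spl)
    if not m:
        return None
    n, unit = int(m.group(1)), m.group(2)
    return {"m": n / 60, "h": n, "d": n * 24}[unit]


def find_risks(spl):
    findings = []
    if re.search(r"\bindex\s*=\s*\*", spl):
        findings.append("broad_search")
    for idx in re.findall(r"\bindex\s*=\s*(\w+)", spl):
        if idx in SENSITIVE_INDEXES:
            findings.append(f"sensitive_index:{idx}")
    if any(re.search(rf"\|\s*{cmd}\b", spl) for cmd in EXPORT_COMMANDS):
        findings.append("raw_export")
    hours = window_hours(spl)
    # A missing or unparseable window is treated as unsafe rather than trusted.
    if hours is None or hours > MAX_WINDOW_HOURS:
        findings.append("unsafe_window")
    return findings


def rewrite_scoped(spl, scoped_index):
    # Naive pipe split: does not handle "|" inside quoted strings.
    segments = [s.strip() for s in spl.split("|")]
    base = re.sub(r"\b(index|earliest)\s*=\s*\S+", "", segments[0])
    base = re.sub(r"^search\b", "", " ".join(base.split())).strip()
    head = f"search index={scoped_index} earliest={SCOPED_WINDOW} {base}".strip()
    # Drop pipeline stages that would export raw results.
    tail = [s for s in segments[1:] if s and s.split()[0] not in EXPORT_COMMANDS]
    return " | ".join([head] + tail)


def check_and_rewrite(spl, scoped_index):
    findings = find_risks(spl)
    safe = rewrite_scoped(spl, scoped_index) if findings else spl
    return PolicyResult(findings, safe)
```

Rebuilding the base search around an approved index and window, rather than patching each risky term in place, keeps the rewrite predictable and easy to audit.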

The result is an incident room where responders can see what happened, what the models predict will happen next, why the agent reached its conclusion, and which actions remain blocked until a human approves them.

How we built it

OpsWitness uses a Python FastAPI backend and a Next.js Incident Room frontend.

The FastAPI service acts as a transparent MCP proxy between AI agents and Splunk MCP Server. MCP JSON-RPC traffic is normalized into structured Pydantic events and written to Splunk through HEC. Capability-aware preflight discovers the live MCP tool inventory before OpsWitness executes any investigation. In the connected Splunk Cloud stack, preflight discovered 45 AI Toolkit algorithms.
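
One way a proxied request/response pair might be normalized into a flat evidence event is sketched below. It uses a stdlib dataclass in place of the Pydantic models the backend uses; the `McpEvidenceEvent` fields, the `opswitness` index, and the `opswitness:mcp` sourcetype are hypothetical. The JSON-RPC shapes (`tools/call` with `name` and `arguments`, a response carrying `result` or `error`, and the `isError` flag on tool results) follow the MCP specification.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class McpEvidenceEvent:
    # Hypothetical evidence schema; the real backend uses Pydantic models.
    run_id: str
    request_id: Any
    method: str
    tool_name: Optional[str]
    arguments: dict
    response: Any
    is_error: bool
    response_sha256: str
    observed_at: float


def normalize(run_id, request, response, now=None):
    """Turn one JSON-RPC request/response pair into an evidence event."""
    params = request.get("params") or {}
    is_tool_call = request.get("method") == "tools/call"
    payload = response.get("result", response.get("error"))
    # A JSON-RPC error, or an MCP tool result flagged isError, both count as failures.
    is_error = "error" in response or bool((response.get("result") or {}).get("isError"))
    # Digest of the canonical JSON payload, so later tampering is detectable.
    digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    return McpEvidenceEvent(
        run_id=run_id,
        request_id=request.get("id"),
        method=request.get("method", ""),
        tool_name=params.get("name") if is_tool_call else None,
        arguments=params.get("arguments", {}) if is_tool_call else {},
        response=payload,
        is_error=is_error,
        response_sha256=digest,
        observed_at=time.time() if now is None else now,
    )


def to_hec_event(evt, index="opswitness"):
    # Standard HEC envelope keys: time, index, sourcetype, event.
    return {"time": evt.observed_at, "index": index,
            "sourcetype": "opswitness:mcp", "event": asdict(evt)}
```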

OpsWitness selected DensityFunction because it learns the distribution of normal activity without labeled incident data and returns inspectable IsOutlier(...) flags and BoundaryRanges. It runs through splunk_run_query over real HEC evidence volume, turning unusual agent investigation activity into auditable Splunk evidence.
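
To illustrate the idea behind IsOutlier(...) and BoundaryRanges, here is a stdlib approximation of density-based outlier flagging with a fitted normal distribution, similar in spirit to DensityFunction with `dist=norm`. It is not the AI Toolkit implementation, which also supports other distributions, `by` splits, and automatic distribution selection. The threshold and the baseline counts are illustrative values, not OpsWitness results.

```python
from statistics import NormalDist, fmean, pstdev


def fit_normal(baseline):
    """Fit a normal distribution to baseline counts (population standard deviation)."""
    sigma = pstdev(baseline)
    if sigma == 0:
        # NormalDist.inv_cdf is undefined when sigma is zero.
        raise ValueError("baseline has no variance; cannot derive boundaries")
    return NormalDist(fmean(baseline), sigma)


def boundary_range(dist, threshold=0.01):
    """Two-sided interval containing (1 - threshold) of the fitted probability mass."""
    return dist.inv_cdf(threshold / 2), dist.inv_cdf(1 - threshold / 2)


def is_outlier(value, bounds):
    """1 if the value falls outside the boundary range, else 0 (like an IsOutlier flag)."""
    low, high = bounds
    return int(value < low or value > high)


# Illustrative baseline: evidence events per interval during normal investigations.
baseline = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11]
bounds = boundary_range(fit_normal(baseline), threshold=0.01)
flags = [(v, is_outlier(v, bounds)) for v in (11, 40)]
```

Because the boundaries come from a fitted density rather than a hand-tuned limit, the same threshold adapts as normal investigation volume changes.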

A NetworkX-based graph engine reconstructs causal relationships between prompts, context, MCP tools, Splunk searches, results, model forecasts, incidents, remediation proposals, notifications, and approval decisions. Conclusions are accepted only when every cited evidence node exists in the recorded graph.
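
A minimal stdlib stand-in for the graph engine follows, assuming directed edges point from cause to effect and nodes are identified by string IDs (both assumptions; the backend uses NetworkX, and the node ID scheme here is hypothetical). It shows two checks: tracing which recorded nodes influenced a conclusion, and accepting a conclusion only when every cited node exists.

```python
from collections import defaultdict, deque


class EvidenceGraph:
    def __init__(self):
        self.kinds = {}                  # node id -> kind (prompt, tool, search, result, ...)
        self.parents = defaultdict(set)  # node id -> direct causes

    def add_node(self, node_id, kind):
        self.kinds[node_id] = kind

    def add_edge(self, cause, effect):
        # Refuse edges to unrecorded nodes so the graph never references phantom evidence.
        if cause not in self.kinds or effect not in self.kinds:
            raise KeyError(f"unknown node in edge {cause!r} -> {effect!r}")
        self.parents[effect].add(cause)

    def ancestors(self, node_id):
        """All nodes with a causal path into node_id (breadth-first over parents)."""
        seen, queue = set(), deque([node_id])
        while queue:
            for parent in self.parents[queue.popleft()]:
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return seen

    def validate_citations(self, cited_ids):
        """Return (accepted, missing); reject if any cited node was never recorded."""
        missing = sorted(set(cited_ids) - set(self.kinds))
        return (not missing, missing)
```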

For predictive operations, OpsWitness integrates the official Cisco Deep Time Series Model through the self-hosting path provided by the Splunk hackathon organizers, calling its authenticated, AITK-compatible inference API. Forecast values, p5 and p95 bounds, predicted peaks, model provenance, and input source are recorded as evidence.
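
The sketch below shows how a forecast might be turned into an evidence record with its predicted peak. It assumes the forecast arrives as parallel `mean`, `p5`, and `p95` lists; that response shape, the record fields, and the function name are hypothetical, since the writeup does not show the model's actual response format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForecastEvidence:
    # Hypothetical evidence record; field names are illustrative.
    model: str
    input_source: str
    mean: tuple
    p5: tuple
    p95: tuple
    peak_step: int
    peak_mean: float
    peak_p95: float


def forecast_to_evidence(resp, model, input_source):
    """Validate a quantile forecast and record it with its predicted peak."""
    mean, p5, p95 = resp["mean"], resp["p5"], resp["p95"]
    if not mean or not (len(mean) == len(p5) == len(p95)):
        raise ValueError("forecast series must be non-empty and equally long")
    if any(lo > hi for lo, hi in zip(p5, p95)):
        raise ValueError("p5 bound exceeds p95 bound")
    # Index of the highest mean value (the first one if several tie).
    peak = max(range(len(mean)), key=mean.__getitem__)
    return ForecastEvidence(
        model=model,
        input_source=input_source,
        mean=tuple(mean),
        p5=tuple(p5),
        p95=tuple(p95),
        peak_step=peak,
        peak_mean=mean[peak],
        peak_p95=p95[peak],
    )
```

Recording the p95 value at the predicted peak alongside the mean gives responders a worst-plausible-case figure without treating the forecast as an instruction.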

For security reasoning, Foundation-Sec returns strict structured assessments. OpsWitness validates the response, removes fabricated evidence references, marks the result as advisory-only, and never allows model output to bypass human approval.
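
A sketch of that validation step, assuming the model returns a JSON object with `summary`, `risk`, and `evidence_refs` keys; that schema, the allowed risk levels, and the function name are hypothetical. It keeps only references that exist in the recorded run and sets the advisory flags unconditionally, whatever the model output claims.

```python
import json

ALLOWED_RISK = {"low", "medium", "high", "critical"}  # hypothetical schema


def validate_assessment(raw, recorded_ids):
    """Parse a structured assessment and strip references not in the recorded run."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on non-JSON output
    if not isinstance(data, dict):
        raise ValueError("assessment must be a JSON object")
    summary, risk = data.get("summary"), data.get("risk")
    refs = data.get("evidence_refs", [])
    if not isinstance(summary, str) or risk not in ALLOWED_RISK or not isinstance(refs, list):
        raise ValueError("assessment does not match the expected schema")
    kept = [r for r in refs if isinstance(r, str) and r in recorded_ids]
    dropped = [r for r in refs if r not in kept]
    return {
        "summary": summary,
        "risk": risk,
        "evidence_refs": kept,
        "dropped_refs": dropped,          # fabricated or unknown references, kept for audit
        "advisory_only": True,            # set unconditionally; the model cannot change it
        "requires_human_approval": True,
    }
```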

The Next.js frontend uses Cytoscape.js to display the evidence graph, ordered investigation timeline, live integration stages, policy findings, forecast evidence, incident impact, safe query proposals, and approval state.
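
Cytoscape.js accepts graph elements as objects with a `group` ("nodes" or "edges") and a `data` member, where nodes carry an `id` and edges carry `source` and `target`. A small Python sketch of serializing the evidence graph into that shape for the frontend; the `kind` and `label` data fields and the edge ID scheme are hypothetical.

```python
import json


def to_cytoscape_elements(nodes, edges):
    """nodes: {id: kind}; edges: iterable of (source, target). Returns Cytoscape.js elements."""
    elements = [
        {"group": "nodes", "data": {"id": nid, "kind": kind, "label": nid}}
        for nid, kind in nodes.items()
    ]
    for source, target in edges:
        # Edges must point at nodes that were actually recorded.
        if source not in nodes or target not in nodes:
            raise KeyError(f"edge references unknown node: {source} -> {target}")
        elements.append({
            "group": "edges",
            "data": {"id": f"{source}->{target}", "source": source, "target": target},
        })
    return elements


def to_json(nodes, edges):
    # JSON payload the backend could serve to the Incident Room frontend.
    return json.dumps(to_cytoscape_elements(nodes, edges))
```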

Splunk provides the operational data, MCP tools, searchable evidence index, indexer acknowledgement, native SPL investigation, AI Toolkit algorithm discovery and execution, and independent evidence dashboard. Optional approved saved searches and KV Store policy are capability-discovered and fail closed when unavailable. Slack delivers the incident brief, while OpsWitness records the explicit human approval decision as evidence.
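
A sketch of fail-closed evidence capture when HEC indexer acknowledgement is enabled. The `/services/collector/event` and `/services/collector/ack` endpoints, the `Authorization: Splunk <token>` header, and the `X-Splunk-Request-Channel` header required for acknowledgement are standard HEC features; the host and the helper names are placeholders. The code only builds requests and interprets response bodies, so it runs without a live Splunk stack.

```python
import json
import urllib.request
import uuid

HEC_BASE = "https://hec.example.invalid"  # placeholder host


def new_channel():
    # HEC acknowledgement requires a client-chosen channel GUID.
    return str(uuid.uuid4())


def build_event_request(token, channel, event):
    return urllib.request.Request(
        f"{HEC_BASE}/services/collector/event",
        data=json.dumps(event).encode(),
        headers={"Authorization": f"Splunk {token}",
                 "X-Splunk-Request-Channel": channel,
                 "Content-Type": "application/json"},
        method="POST",
    )


def build_ack_request(token, channel, ack_ids):
    return urllib.request.Request(
        f"{HEC_BASE}/services/collector/ack",
        data=json.dumps({"acks": list(ack_ids)}).encode(),
        headers={"Authorization": f"Splunk {token}", "X-Splunk-Request-Channel": channel},
        method="POST",
    )


def ack_id_from_send(body):
    """Fail closed: any non-success send response means the evidence was not captured."""
    resp = json.loads(body)
    if resp.get("code") != 0 or "ackId" not in resp:
        raise RuntimeError(f"HEC rejected evidence event: {resp}")
    return resp["ackId"]


def is_indexed(body, ack_id):
    """HEC replies {"acks": {"<id>": true|false}}; only an explicit true counts."""
    return json.loads(body).get("acks", {}).get(str(ack_id)) is True
```

Treating a missing or false acknowledgement as a capture failure means an investigation step is never reported as recorded until Splunk confirms it was indexed.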

Challenges we ran into

The largest challenge was making the system use real Splunk capabilities without assuming every Splunk environment exposes the same features.

OpsWitness solves this with capability-aware MCP preflight. It discovers the connected server's live tool inventory and only calls tools that are actually advertised. The same MCP path discovered all 45 available AI Toolkit algorithms and executed the selected DensityFunction algorithm.
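
A minimal sketch of capability-aware preflight, assuming the server's `tools/list` result has already been fetched. The `tools/list` and `tools/call` method names and message shapes come from the MCP specification; the `Preflight` class and the `required` parameter are hypothetical. Calls to unadvertised tools are refused, and a missing required capability fails closed instead of falling back to a guess.

```python
import itertools


class CapabilityMissing(RuntimeError):
    pass


class Preflight:
    def __init__(self, tools_list_result, required=()):
        # tools/list returns {"tools": [{"name": ..., "description": ..., "inputSchema": ...}]}.
        # Simplification: ignores pagination via nextCursor.
        self.tools = {t["name"]: t for t in tools_list_result.get("tools", [])}
        missing = sorted(set(required) - set(self.tools))
        if missing:
            raise CapabilityMissing(f"required MCP tools not advertised: {missing}")
        self._ids = itertools.count(1)

    def supports(self, name):
        return name in self.tools

    def tool_call(self, name, arguments):
        """Build a JSON-RPC tools/call request, but only for an advertised tool."""
        if not self.supports(name):
            raise CapabilityMissing(f"tool {name!r} was not advertised by the server")
        return {"jsonrpc": "2.0", "id": next(self._ids), "method": "tools/call",
                "params": {"name": name, "arguments": arguments}}
```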

Splunk Cloud trial environments can restrict direct REST access on port 8089. We used Splunk's raw web proxy MCP endpoint over port 443 while preserving real MCP execution.

Our Splunk Cloud stack recognized the managed Cisco Deep Time Series Model, but it could not resolve the tenant's hosted-model endpoint. Following the self-hosting guidance supplied by the Splunk hackathon team, we deployed the official Cisco model as an authenticated local FastAPI service and integrated its AITK-compatible forecast responses into OpsWitness.

We also needed to ensure that model responses and incident conclusions could not fabricate evidence. OpsWitness validates model output, filters unsupported citations, and rejects conclusions containing evidence node IDs that do not exist in the recorded run.

Accomplishments that we're proud of

Live integration with Splunk Cloud HEC
Live initialization and tool discovery through Splunk MCP Server
Real splunk_run_query execution through the OpsWitness MCP proxy
Live discovery of 45 Splunk AI Toolkit algorithms through MCP
Live Splunk AITK DensityFunction execution over real HEC evidence
Fail-closed evidence capture with optional HEC indexer acknowledgement
Live zero-shot forecasts from the official Cisco Deep Time Series Model
Evidence-bound Foundation-Sec security reasoning
Causal graph reconstruction for AI investigations
Evidence validation before accepting incident conclusions
Fail-closed discovery of approved saved searches and KV Store response policy
Safe SPL rewriting and human remediation approval
Live Slack incident notifications
Native Splunk evidence dashboard
Three distinct live incident drills
Thirty-four automated backend and integration tests
Complete open-source setup and Mermaid architecture documentation

What we learned

Agent observability requires more than logs. Teams need causal evidence showing how prompts, context, tool metadata, queries, forecasts, and human decisions influenced an operational action.

We learned that predictive models are most useful when their forecasts become auditable evidence rather than unverified instructions. The Cisco model helps OpsWitness show where an incident is heading, while Splunk evidence and policy determine what responders should trust and what actions are allowed.

We also learned that different models should have distinct operational jobs. DensityFunction detects statistically unusual agent evidence activity inside Splunk, while the Cisco Deep Time Series Model forecasts where an incident signal is heading.

Optional platform capabilities, likewise, must be discovered rather than assumed. Capability-aware integrations make AI operational systems more reliable, portable, and honest.

Most importantly, Splunk can observe not only infrastructure, but also the AI agents operating that infrastructure.

What's next for OpsWitness

Next, OpsWitness can add organization-specific policy packs, richer deployment integrations, role-based approvals, production graph-database deployments, and additional incident-management integrations.

We also plan to connect the self-hosted Cisco Deep Time Series Model (CTSM) directly through a locally managed Splunk AI Toolkit deployment, add organization-specific approved response integrations, expand the evidence graph across multiple collaborating agents, and use historical Splunk evidence to identify recurring unsafe investigation patterns.

Built With

ctsm
cytoscape.js
fastapi
foundation-sec
hugging-face
kuzu
mcp
networkx
next.js
pydantic
python
react
slack
splunk
splunkaitoolkit
splunkcloud
typescript

Updates

N DIVIJ started this project on Jun 15, 2026, 10:38 AM EDT
