Inspiration

At RSAC 2026, leaders from Microsoft, Cisco, CrowdStrike, and Splunk all converged on the same warning: AI agent governance is becoming one of the biggest gaps in enterprise security. AI agents are now accessing sensitive data, calling APIs, and taking autonomous actions across organizations, but there's no system of record for what they actually do. If an agent goes rogue, exfiltrates data, or escalates its own permissions, nobody finds out until it's too late.

That gap was the spark for AgentWatch, an AI agent governance and trust auditor built entirely on Splunk.

What it does

AgentWatch turns Splunk into the system of record for AI agent behavior:

  • Ingests every AI agent action (tool calls, data access, permission changes) in real time via Splunk's HTTP Event Collector (HEC)
  • Detects anomalies using SPL: unauthorized tool access, data exfiltration attempts, privilege escalation, credential access, excessive API calls, and off-hours activity
  • Visualizes agent activity and risk on a live dashboard: total events, anomaly counts, critical alerts, and per-agent risk breakdowns, refreshing automatically every 10 seconds
  • Reports with one click: a full executive governance report with an overall risk score, agent-by-agent risk summary, anomaly breakdown by type, and a timeline of the most serious incidents

How we built it

I started by designing five simulated AI agents (security, ops, data, dev, and HR agents), each with a defined set of "allowed tools" representing their normal scope of work. A Python simulator generates realistic activity for these agents, both normal behavior and six categories of anomalous behavior, and streams every event into Splunk via HEC as JSON.
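
As a rough illustration of that event flow, the sketch below builds one agent-action record and wraps it in the JSON envelope that HEC expects. The agent names, tool names, and the `authorized` field are hypothetical placeholders, not the actual AgentWatch schema; only the `agentwatch` index name comes from this writeup.

```python
import json
import time

# Hypothetical agent scopes; the real AgentWatch tool names are not listed here.
ALLOWED_TOOLS = {
    "security-agent": {"scan_host", "read_alerts"},
    "hr-agent": {"read_employee_record", "send_email"},
}


def make_event(agent, tool):
    """Build one agent-action record and flag out-of-scope tool use."""
    return {
        "agent": agent,
        "tool": tool,
        # True only when the tool is inside the agent's allowed set.
        "authorized": tool in ALLOWED_TOOLS.get(agent, set()),
        "timestamp": time.time(),
    }


def to_hec_payload(event, index="agentwatch"):
    """Wrap a record in the HEC event envelope (time, index, sourcetype, event)."""
    return json.dumps({
        "time": event["timestamp"],
        "index": index,
        "sourcetype": "_json",
        "event": event,
    })
```

Keeping the allowed-tool check in the simulator means every event carries its own scope verdict, which a search can then filter on without re-deriving the agent's permissions.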

From there, SPL queries running against the agentwatch index calculate risk scores, group anomalies by type, and surface per-agent statistics. A Flask app queries Splunk's REST API and serves a custom live dashboard (built with Chart.js) plus a one-click HTML governance report designed to be readable by a CISO or auditor, not just an engineer.
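
The sketch below shows one way the Flask side could submit an SPL search to Splunk's REST API and turn the results into a score. The SPL field names (`agent`, `anomaly_type`), the bearer-token authentication, the severity weights, and the scoring formula are all assumptions made for illustration; the writeup does not specify how AgentWatch computes its risk score.

```python
import urllib.parse
import urllib.request

# Illustrative SPL; the field names agent and anomaly_type are assumed.
SPL = (
    "search index=agentwatch anomaly_type=* "
    "| stats count AS anomalies BY agent, anomaly_type"
)

# Hypothetical severity weights, not taken from the project.
SEVERITY_WEIGHTS = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}


def build_search_request(spl, base_url="https://localhost:8089", token="<token>"):
    """Prepare a blocking one-shot search against the Splunk management port."""
    body = urllib.parse.urlencode({
        "search": spl,
        "output_mode": "json",
        "exec_mode": "oneshot",
    }).encode()
    return urllib.request.Request(
        f"{base_url}/services/search/jobs",
        data=body,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


def risk_score(counts_by_severity, total_events):
    """Weighted anomalies per event, scaled to 0-100 and capped at 100."""
    if total_events == 0:
        return 0.0
    weighted = sum(
        SEVERITY_WEIGHTS.get(severity, 0) * count
        for severity, count in counts_by_severity.items()
    )
    return min(100.0, 100.0 * weighted / total_events)
```

The request is only constructed here, not sent; passing it to `urllib.request.urlopen` with a suitable TLS context would run the search.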

The whole stack runs in Docker inside a GitHub Codespace: Splunk Enterprise, the simulator, and the Flask dashboard all talk to each other locally.

Challenges we ran into

  • Splunk's web UI and Codespaces don't play nicely together: every login redirected to localhost and failed. Rather than fight it, I pivoted to building a custom Flask dashboard that talks directly to Splunk's REST API. This turned out to be a better demo anyway, because it shows a complete, purpose-built UI rather than a generic Splunk dashboard.
  • Docker permission issues inside the Codespace meant CLI-based Splunk admin commands (like creating an index) failed with permission errors. The fix was to do everything through Splunk's HEC and REST API instead, which worked perfectly and didn't require root access.
  • HTTPS vs HTTP on the HEC endpoint caused silent connection failures at first: HEC has SSL enabled by default, so it expects HTTPS even on localhost.
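
For the HEC issue in particular, a minimal sketch of an HTTPS request to a local Splunk instance might look like the following. It assumes the default HEC port 8088 and a self-signed certificate, which is why certificate verification is switched off; that setting is only appropriate for a local development container. The function names are illustrative.

```python
import ssl
import urllib.request


def build_hec_request(payload, token, host="localhost", port=8088):
    """Prepare a POST to the HEC event endpoint over HTTPS."""
    url = f"https://{host}:{port}/services/collector/event"
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def dev_ssl_context():
    """TLS context that skips certificate checks; local self-signed Splunk only."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def send(payload, token):
    """Send one payload and return the HTTP status code."""
    request = build_hec_request(payload, token)
    with urllib.request.urlopen(request, context=dev_ssl_context(), timeout=5) as resp:
        return resp.status
```

Setting an explicit timeout and letting `urlopen` raise on connection errors makes a wrong scheme or port fail loudly instead of silently.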

Accomplishments that we're proud of

  • A fully working, end-to-end pipeline: from simulated AI agent activity, through Splunk HEC ingestion and SPL-based anomaly detection, to a live dashboard and a generated governance report. Every layer actually works with real data, not mockups.
  • 806+ real events processed with nearly 100 anomalies correctly detected and categorized across 6 distinct risk types, including critical ones like data exfiltration attempts and privilege escalation.
  • The one-click governance report, which turns raw Splunk search results into an executive-readable risk score, agent-by-agent breakdown, and incident timeline: the kind of output a real CISO or auditor could use immediately.
  • Building this as a complete solo project in a short timeframe, including learning Splunk's HEC and REST API for the first time, debugging Docker/Codespace environment issues, and shipping a polished, demo-ready product.
  • A genuinely novel angle: rather than building "yet another SOC triage agent" (which Splunk already ships natively), AgentWatch tackles the emerging, largely unsolved problem of governing the AI agents themselves.

What we learned

This project was my first hands-on experience with Splunk, HEC ingestion, the REST search API, and SPL for anomaly detection. I learned how quickly raw event data can become an actionable risk score and report when you combine structured logging with the right SPL queries. I also learned that "AI agent observability" isn't just a buzzword; it's a real, unsolved problem that tools like Splunk are well-positioned to address.

What's next for AgentWatch

  • Integrate with Splunk's MCP Server so real AI agents (not just simulated ones) can be monitored the same way
  • Use Splunk's Foundation-sec hosted model to generate natural-language explanations of why an anomaly is risky
  • Add automated response actions, e.g., auto-suspending an agent's credentials when a CRITICAL anomaly is detected
  • Build a baseline-learning mode so "normal behavior" is learned per-agent rather than hardcoded
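
One possible shape for the baseline-learning idea, sketched under the assumption that events carry `agent` and `tool` fields: count each agent's tool usage over a history window, treat tools seen at least `min_count` times as normal, and flag anything else. The threshold and function names are hypothetical, and a real version would likely also model time of day and call volume.

```python
from collections import Counter, defaultdict


def learn_baseline(events, min_count=3):
    """Derive each agent's normal tool set from historical events."""
    usage = defaultdict(Counter)
    for event in events:
        usage[event["agent"]][event["tool"]] += 1
    return {
        agent: {tool for tool, n in counts.items() if n >= min_count}
        for agent, counts in usage.items()
    }


def is_anomalous(event, baseline):
    """Flag a tool call that falls outside the agent's learned baseline."""
    return event["tool"] not in baseline.get(event["agent"], set())
```

An agent with no history has an empty baseline, so all of its actions are flagged until enough events accumulate.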
