AgentWatch — AI Observability for Splunk AI Agents

End-to-end AgentWatch architecture for AI observability, quality monitoring, forecasting, and self-healing in Splunk.
Real-time MCP tool latency and call volume. AgentWatch instantly surfaces slow tools and error spikes across all AI agent calls.
7-day AI quality trend showing hallucination and relevance scores. Drift detected on May 26th — alert fired automatically.
MLTK-powered token cost forecast with 95% confidence bounds. 6 real anomaly spikes detected before they hit billing.
Auto-generated root cause analysis and SPL corrections for failed agent calls. Full audit trail with severity and confidence scores.
AgentWatch — 8-panel AI observability dashboard for Splunk. Traces, quality, forecasting, and self-healing in one place.

Inspiration

Every organization deploying AI agents inside Splunk faces the same blind spot. Your MCP Server is running queries, your AI Assistant is generating SPL — but when it returns a hallucinated result, when quality drifts over days, when token costs are about to spike — nobody catches it. AgentWatch was built to solve this: who watches your AI?

What it does

AgentWatch is a real-time AI observability platform with four core capabilities:

Agent Trace Collection — captures every MCP tool call with latency, token usage, and status. Logs 459+ trace events to a dedicated Splunk index with unique trace IDs.

AI Quality Scoring — uses an LLM-as-judge to score every agent response for hallucination, relevance, and completeness (0.0–1.0). Fires drift alerts when quality drops below 0.7 threshold.

Token Cost Forecasting — uses Splunk MLTK predict command to forecast token usage 24 hours ahead with 95% confidence bounds. Detected 6 real anomaly spikes in testing.

Self-Healing Engine — when a tool call fails, AgentWatch pulls context from Splunk, generates a root cause and corrected SPL query using local AI, and writes a notable event back to Splunk automatically.

How we built it

Splunk MCP Server 1.2 — intercepts and traces every AI agent tool call
Splunk AI Toolkit 5.7.4 — MLTK predict for time-series forecasting
Splunk Python SDK — all data ingestion, search, and index management
Ollama llama3.2 — local LLM running as quality judge and fix generator
FastAPI — REST API exposing POST /trace and POST /score endpoints
Splunk Classic Dashboard — 8 panels covering latency, quality, forecast, and healing

Challenges we ran into

The self-healing engine was the hardest component to make reliable. Ollama sometimes returns inconsistent JSON mixed with explanatory text. We solved this with a multi-strategy JSON parser that tries direct parsing, bracket extraction, and regex matching in sequence.

Generating 7 days of realistic historical data with correct timestamps for the MLTK forecast required careful handling of Splunk's time indexing.

Accomplishments that we're proud of

Complete AI observability stack built in under 2 weeks as a solo developer
Quality scorer correctly flags hallucinated responses with zero false negatives in testing
MLTK forecaster detected 6 real token usage anomalies with zero manual configuration
Self-healing engine generates actionable SPL corrections automatically

What we learned

Building observability for AI systems requires thinking recursively — you need AI to watch AI. The LLM-as-judge pattern for quality scoring is surprisingly effective at catching hallucinations that rule-based systems would miss. The Splunk MLTK predict command is powerful for anomaly detection with minimal setup.

What's next for AgentWatch — AI Observability for Splunk AI Agents

Integration with Splunk Enterprise Security to surface AI quality issues as ES notable events
Support for Splunk Cloud with hosted Foundation-sec model as the quality judge
Real-time streaming dashboard using Splunk WebSocket API
Multi-agent workflow tracing across full agentic pipelines
Cost attribution per team or use case for AI budget management

Built With

fastapi
mcp
mltk
ollama
python
splunk
splunksdk

Submitted to

Splunk Agentic Ops Hackathon

Created by

As the sole contributor to AgentWatch, I was responsible for the complete lifecycle of the project, from ideation and system architecture to implementation, testing, and documentation. I designed and developed the end-to-end AI observability platform, including Splunk integration, MCP trace collection, AI quality evaluation, token usage forecasting with Splunk MLTK, automated self-healing workflows, REST APIs, dashboards, and supporting infrastructure. I also created the project documentation, architecture diagrams, repository structure, and submission materials.

Farhan Ahmad Ansari

Updates

Farhan Ahmad Ansari posted an update — Jun 03, 2026 01:15 AM EDT

AgentWatch has been successfully submitted to the Splunk Agentic Ops Hackathon.

Built as a solo project, AgentWatch provides observability for AI agents through trace collection, AI quality scoring, token usage forecasting, and automated self-healing recommendations. Looking forward to refining the project further and sharing additional demos and screenshots.

Log in or sign up for Devpost to join the conversation.

Farhan Ahmad Ansari started this project — Jun 03, 2026 01:13 AM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.