Inspiration

Teams building with LLM agents are flying blind. We kept hitting the same wall: the standard observability stack (Splunk, Datadog, New Relic) was built for deterministic applications, not for non-deterministic agentic systems. When a user input hijacks an agent into ignoring its instructions, no log line says "compromised." It just looks like a normal LLM call returning 200 OK. From the outside, a 1500-token runaway loop burning through your API budget looks identical to a healthy 500-token call.

I wanted to close that gap using infrastructure teams already trust: not another SaaS dashboard that requires shipping your prompts to a third party, but something that runs inside your own Splunk instance, on-prem or air-gapped, using the role-based access controls and retention policies you already have. For regulated industries, that's the difference between "we can use this" and "compliance vetoed it."

What it does

AgentLens is a two-part system that turns Splunk into a live observability and security platform for LLM agents.

The Python SDK wires any CrewAI or LangGraph application into Splunk with a single line of code:

import agentlens
agentlens.instrument(service_name="my-agent")

After this call, every LLM invocation, tool call, and agent reasoning step is captured as an OpenTelemetry span and streamed to Splunk via the HTTP Event Collector (HEC): no manual logging, no schema design, no proxy server.

The Splunk Detection App provides defense-in-depth using classical ML native to Splunk:

  • Layer 1: Prompt injection detection. A TF-IDF vectorizer plus GradientBoostingClassifier trained on 550 labeled prompts catches injection attempts in near real time, scoring new events every 5 minutes.
  • Layer 2: Token anomaly detection. A DensityFunction model trained on the token-usage distribution flags statistical outliers the classifier misses, such as cost-runaway loops where an agent is manipulated into burning thousands of tokens.
  • A live 7-panel dashboard built in Dashboard Studio showing total events, both detection counters, agent activity timeline, anomaly trends, token usage with estimated cost, and agent execution flow.
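
To make the Layer 2 idea concrete, here is a minimal Python sketch of density-based outlier flagging on token counts. It is not the Splunk DensityFunction implementation: it assumes a plain normal distribution, and the names fit_token_baseline and is_token_outlier, the threshold value, and the baseline numbers are all made up for illustration.

```python
from statistics import NormalDist

def fit_token_baseline(token_counts):
    """Fit a normal distribution to historical per-call token counts."""
    return NormalDist.from_samples(token_counts)

def is_token_outlier(dist, tokens, threshold=0.01):
    """Return True when `tokens` falls in the outer `threshold` tail mass."""
    # Two-sided tail probability: how likely is a value at least this
    # far from the baseline mean, in either direction?
    cdf = dist.cdf(tokens)
    tail = 2 * min(cdf, 1 - cdf)
    return tail < threshold

# Illustrative baseline of healthy calls (hypothetical numbers).
baseline = [480, 510, 495, 530, 470, 505, 520, 490, 515, 500]
dist = fit_token_baseline(baseline)

print(is_token_outlier(dist, 505))    # an ordinary call
print(is_token_outlier(dist, 29357))  # a runaway loop on the scale we saw in testing
```

A single fitted distribution is the simplest possible baseline; per-agent or per-tool baselines would likely reduce false positives, at the cost of needing more history.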

How we built it

The SDK uses OpenInference instrumentors as the semantic layer, ensuring the captured data matches what the broader observability ecosystem expects. Spans flow through an OpenTelemetry TracerProvider with a BatchSpanProcessor into a custom SplunkHECSpanExporter that POSTs to Splunk HEC, landing in index=agentlens.
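
The last hop of that pipeline can be sketched with the standard library alone. This is an assumption-laden illustration, not the actual SplunkHECSpanExporter: the function name build_hec_request, the span fields, the host, and the token are hypothetical. Only the HEC event endpoint path, the "Splunk <token>" authorization header, and the event envelope keys follow Splunk's documented HEC format.

```python
import json
import time
import urllib.request

def build_hec_request(span, hec_url, token, index="agentlens"):
    """Wrap one span dict in a Splunk HEC event envelope (does not send it)."""
    payload = {
        "time": span.get("start_time", time.time()),  # epoch seconds
        "sourcetype": "_json",
        "index": index,
        "event": span,
    }
    return urllib.request.Request(
        url=f"{hec_url.rstrip('/')}/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Hypothetical span; the attribute names are illustrative only.
span = {
    "name": "llm.call",
    "trace_id": "0af7651916cd43dd8448eb211c80319c",
    "start_time": 1700000000.0,
    "attributes": {"service.name": "my-agent", "llm.token_count.total": 512},
}
req = build_hec_request(
    span,
    "https://splunk.example.com:8088",       # placeholder host
    "00000000-0000-0000-0000-000000000000",  # placeholder HEC token
)
print(req.get_method(), req.full_url)
# Sending would be: urllib.request.urlopen(req), omitted so the sketch runs offline.
```

In the real SDK this step sits behind the BatchSpanProcessor, so spans are exported in batches rather than one request per span.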

On the Splunk side, I used the Splunk AI Toolkit's native ML algorithms (TFIDF, GradientBoostingClassifier, and DensityFunction) rather than calling external LLMs to judge LLMs. This means detection runs deterministically, costs nothing per request, and keeps working offline.

To prove it works, I built WanderBot, a CrewAI travel booking assistant with three agents (researcher, booking specialist, customer communications) and five deliberately planted vulnerabilities: prompt injection, hallucination, cost runaway, data exfiltration, and system-prompt leak. I also shipped a LangGraph variant to show that the SDK is framework-agnostic.

Challenges we ran into

  • Making injection visible. Crafting a training set that reflected reality meant embedding injection patterns inside genuine CrewAI Task wrappers, not just raw malicious strings: 270 benign travel queries and 280 malicious prompts across six categories.
  • Two layers beat one. Pattern-matching alone missed the cost-runaway attack; only the token-distribution model caught the loop that consumed 29,357 tokens. That validated the defense-in-depth approach.
  • Splunk environment setup. Getting the Python for Scientific Computing add-on to extract on Windows required enabling long paths in the Registry, and the trained ML models needed explicit global permissions before the dashboard could read them.

What we learned

Classical ML still earns its keep. We didn't need an LLM to police LLMs: deterministic, Splunk-native models gave us near-real-time detection with zero per-request cost and full offline capability. I also learned how much value comes from meeting teams where they already are: by living inside Splunk, AgentLens inherits an entire compliance and alerting infrastructure for free.

What's next for AgentLens

  • v0.2.0 — StateSpaceForecast for 24-hour token cost projection, LocalOutlierFactor as a third detection layer, per-agent cost budgets with throttling
  • v0.3.0 — Autonomous alert agent via Splunk MCP Server, self-tuning detection thresholds
  • v0.4.0 — Expanded framework support: OpenAI Agents SDK, Anthropic Claude SDK, AutoGen, Pydantic AI
  • v0.5.0 — Hallucination detection using a Splunk-hosted model as judge, with citation grounding checks
  • v1.0.0 — Async exporter, retry queue, offline buffer, full test suite, PyPI and Splunkbase publication
