Warrant

"Don't trust your agent. License it."
"Warrant — a licensing authority for AI agents"
"Live dashboard: proving ground, license registry, and error-rate vs. the learned control limit"
"Every license is an auditable certificate, bound to a tamper-evident ledger"
"Architecture: three planes joined by MCP — Splunk MCP in, Warrant MCP out"

Inspiration

Splunk's 2026 roadmap is full of agents that act (a Triage Agent, a Guided Response Agent, an Autonomous Response Agent), all marketed as "transparent, auditable, and under analyst control." That quietly admits the unsolved problem: nothing decides when the human can let go. Industry-wide, "can this agent act on its own yet?" is answered with a vibe, a policy, or "never, officially." I think trust shouldn't be a vibe. It should be a measured quantity, earned the way a pilot earns a license: in a simulator, under graded emergencies, not by crashing real planes.

What it does

Warrant issues a license per action class (e.g. restart_connection_pool). An agent earns one only by passing exams in a proving ground, and only when three independent conditions hold:

- Confidence: the Wilson score lower bound on its hit-rate clears a threshold, so one lucky success can never license an action.
- Evidence: enough graded outcomes stand behind it.
- Calibration: its Brier score is low, so a confidently-wrong agent fails even with a decent hit-rate.
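
The three conditions above can be sketched roughly as follows. This is a minimal illustration, not Warrant's actual code: the function names and the thresholds (min_wilson, min_evidence, max_brier) are assumptions chosen for the example.

```python
import math


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial hit-rate."""
    if trials == 0:
        return 0.0
    p_hat = successes / trials
    z2 = z * z
    centre = p_hat + z2 / (2 * trials)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials))
    return (centre - margin) / (1 + z2 / trials)


def brier_score(forecasts, outcomes) -> float:
    """Mean squared gap between stated probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)


def may_license(forecasts, outcomes,
                min_wilson=0.70, min_evidence=10, max_brier=0.20) -> bool:
    """All three gates must hold; the thresholds here are hypothetical."""
    n = len(outcomes)
    if n < min_evidence:                                   # Evidence
        return False
    if wilson_lower_bound(sum(outcomes), n) < min_wilson:  # Confidence
        return False
    return brier_score(forecasts, outcomes) <= max_brier   # Calibration
```

With these assumed thresholds, a single success has a Wilson lower bound of only about 0.21, so it cannot license anything. An agent with 36 hits out of 40 passes the confidence gate but still fails calibration if it was most confident on exactly the four cases it got wrong.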

Every outcome is graded against the agent's own falsifiable prediction — a control limit learned from healthy data, committed before it acts. Reality, not the model, decides if it was right. Licenses are revoked when a prediction is violated, invalidated when the agent's brain changes (model/prompt drift), and they decay if left unused.
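
One way to picture the "prediction committed before acting" idea is sketched below. It assumes the control limit is the healthy mean plus a multiple of the standard deviation (the 4-sigma figure comes from the build notes further down); the Prediction class and the function names are hypothetical.

```python
import statistics
from dataclasses import dataclass


def learn_control_limit(healthy_samples, sigmas: float = 4.0) -> float:
    """Upper control limit: healthy mean plus `sigmas` sample standard deviations."""
    mean = statistics.fmean(healthy_samples)
    stdev = statistics.stdev(healthy_samples)
    return mean + sigmas * stdev


@dataclass(frozen=True)
class Prediction:
    """A falsifiable claim, frozen so it cannot be edited after the action."""
    metric: str
    upper_limit: float


def grade(prediction: Prediction, observed_after) -> bool:
    """The prediction holds only if every post-action sample stays under the limit."""
    return all(x <= prediction.upper_limit for x in observed_after)
```

The agent would commit a Prediction first, act, and only then have grade called on the measured telemetry, so the outcome is decided by the data rather than by the agent's own report.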

Because a licensing authority must itself be trustworthy, the registry is hardened the way an auditor would demand:

- Licenses are pinned to a per-agent fingerprint, so a different brain can't spend a license it didn't earn.
- MCP-reported outcomes are trust-but-verify: pass a metric_url and Warrant measures the result itself; bare claims are flagged as self-reported.
- The ledger is a tamper-evident sha256 hash chain.
- Production failures trigger probation, and each strike raises the re-licensing bar.
- Autonomy is graduated: a thin margin earns ALLOW_WITH_MONITORING, not free rein.
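
A tamper-evident hash chain of the kind described above can be sketched in a few lines. The entry layout and function names are assumptions for illustration, not the project's actual ledger format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """sha256 over the previous hash plus a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(ledger: list, payload: dict) -> dict:
    """Link a new entry to the hash of the one before it."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    entry = {"payload": payload, "prev_hash": prev_hash,
             "hash": _digest(prev_hash, payload)}
    ledger.append(entry)
    return entry


def verify_ledger(ledger: list) -> bool:
    """Recompute every link; any edited payload or broken link fails the check."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the previous one, rewriting an old outcome changes every later hash, which is what makes after-the-fact edits detectable.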

The whole story is four acts: ① Proving ground (earn licenses on ~15 manufactured incidents) → ② Production (act autonomously on a real leak, then a decoy fools it → it catches its own violated prediction, rolls back, and the license is suspended) → ③ Model updated overnight (fingerprint changes, every license drops to provisional) → ④ Re-certify.

How I built it

- Python + FastAPI for the control loop, a parameterised "flight-simulator" sandbox, the proving ground, a self-contained live dashboard (no CDNs), and the Warrant MCP server (FastMCP).
- Warrant as a client of the Splunk MCP Server: every read of Splunk data and every hosted-model call goes through it (splunk_run_query over _internal, saia_generate_spl).
- Warrant as an MCP server: it exposes the trust gate (warrant_request_action, warrant_report_outcome, warrant_check_license, warrant_list_licenses, warrant_verify_ledger) so any external agent can be licensed over MCP. A working proof (warrant.mcp_demo) shows an independent agent earn, use, and lose autonomy, and a second brain being refused a license it never earned.
- The math: a 4-sigma statistical control limit learned from healthy telemetry (the SPC idea behind Splunk's own anomaly detection); a Wilson confidence bound; a Brier calibration score; a brain fingerprint for drift. The same Wilson+Brier logic ships as SPL (splunk/trust_ledger.spl).
- Pluggable brain: a deterministic heuristic or Google Gemini drives diagnosis through the identical safety harness; if the LLM is unavailable, the heuristic takes over so the system never stalls.
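
The brain fingerprint mentioned in the list above could work roughly like this. It is a sketch under the assumption that the fingerprint is a hash over the model identifier, prompt, and parameters; the exact inputs Warrant hashes are not stated here, and the function names are hypothetical.

```python
import hashlib
import json


def brain_fingerprint(model_id: str, prompt: str, params: dict) -> str:
    """Hash everything that defines the agent's behaviour (assumed inputs)."""
    material = json.dumps(
        {"model": model_id, "prompt": prompt, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def license_status(licensed_fingerprint: str, current_fingerprint: str) -> str:
    """A license stays active only for the brain that earned it."""
    if licensed_fingerprint == current_fingerprint:
        return "active"
    return "provisional"  # drift detected: the agent must re-certify
```

Under this scheme a model update or a prompt edit changes the hash, so every license earned under the old fingerprint drops to provisional until the agent re-certifies.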

Challenges I ran into

- Free Splunk Cloud trials don't expose external ingestion (no HEC, no management port), so I adopted a hybrid topology: the sandbox holds live telemetry, Splunk is the reasoning brain over real _internal data via MCP, and the SPL trust ledger is ready for a HEC tenant.
- A naive hit-rate grants autonomy after one lucky success, and a high hit-rate can hide a badly-calibrated agent, so I moved to a Wilson bound + Brier gate, then added probation and trust decay so autonomy can't be gamed.
- Corporate TLS interception broke every HTTPS client until I routed Python through the OS trust store.

Accomplishments I'm proud of

A genuinely agentic loop where an LLM acts on a live system, wrapped in an architecture that makes it safe: falsifiable predictions, reversible-only actions, earned and revocable per-action licenses, and drift detection no shipping ops tool has. And a demo that proves the point most demos avoid: the agent being wrong, catching itself, losing its license, and re-earning it.

What I learned

The bottleneck for autonomous operations isn't model intelligence. It's calibrated, auditable trust. Framing an ops agent as a Popperian system that must state how it could be proven wrong, then licensing it like a pilot, turns trust from a leap of faith into a number on a dashboard.

What's next

- Telemetry natively in Splunk (HEC) on a non-trial tenant, so the SPL trust ledger drives a real Splunk license-registry dashboard end to end.
- More action classes and per-environment proving-ground scenarios.
- Wiring warrant_request_action in front of a Splunk SOAR playbook so a real Splunk agent is gated by its own earned license.
- Authentication on the Warrant MCP server and a shared, durable ledger for multi-agent production use.

Built With

fastapi
google-gemini
html
javascript
model-context-protocol
python
server-sent-events
splunk
splunk-ai-assistant
splunk-mcp-server
vercel

Updates

Uthmannabeel Uthman started this project — Jun 15, 2026 03:04 AM EDT
