-
-
Aegis-Mind NOC Dashboard live in Splunk Enterprise — MTTR: 4.2s, $24,500 saved, Cisco Deep Time Series forecast & incident post-mortem logs.
-
Aegis-Mind NOC Terminal — cyberpunk CLI shell with 3 real incident scenarios: K8s exfiltration, SSH brute-force, and Circuit Breaker demo.
-
Autonomous agents in action: Triage → Time-Series (+683% spike) → AWS IAM revocation playbook executed. Post-mortem report auto-generated.
-
Bilingual Chat Copilot — ask security questions in French or English. AI translates to SPL, runs live on Splunk via MCP with Self-Healing.
-
Natural language → SPL → live Splunk results. "Show me recent brute-forces" returns 4 real events with source IPs, users & hit counts.
Aegis-Mind: Autonomous Multi-Agent NOC for Splunk
🌌 Inspiration
Every SOC/NOC analyst knows the operational challenge: it's 3 AM, a critical alert fires, and you are reviewing thousands of raw log lines to determine if it is an actual Kubernetes credential exfiltration or a false positive. The Mean Time to Respond (MTTR) often stretches from minutes to hours while the threat blast radius expands.
We set out to solve this: what if an autonomous AI cell could execute the entire incident response loop — triage, impact forecasting, and remediation — before human escalation is required?
The Splunk Agentic Ops Hackathon provided the ideal ecosystem. Leveraging Splunk's latest AI capabilities (MCP Server, hosted models such as Foundation-Sec and Cisco Deep Time Series, the Python SDK's native splunklib.ai agent framework, and the Splunk AI Assistant for SPL), we designed a production-grade, native Splunk application deployed directly inside Splunk Enterprise.
🛠️ What it does
Aegis-Mind is an autonomous, event-driven incident response and threat mitigation system that lives natively inside Splunk Enterprise. It automates security operations through four integrated capabilities:
1. 🕵️ Autonomous Multi-Agent Incident Cell
When a security alert triggers, three specialized AI agents collaborate sequentially under the orchestration of the Google Antigravity SDK:
- Triage Agent (
Foundation-Sec-1.1-8B-Instructlogic): Queries Splunk logs via the MCP server, performs deep forensic analysis, maps indicators of compromise (IoCs), and classifies the threat vector. - Time-Series Agent (
Cisco Deep Time Serieslogic): Extracts infrastructure KPIs, forecasts network performance degradation at $t+15\text{m}$ and $t+1\text{h}$, and elevates severity metrics if production impact is imminent. - Remediation Agent (
gpt-oss-120blogic): Synthesizes forensic and operational findings into executable playbooks (e.g., Calico network policy blocklists, AWS IAM compromised session revocations) and verifies restoration through Splunk.
2. 🔍 Native Splunk Custom Search Command (| aegismind)
A streaming search command built on the official splunklib.searchcommands framework using Splunk's Chunked v2 protocol. Analysts can pipe any SPL query through | aegismind to inject real-time, AI-enriched security analysis directly into search results, removing the need for context-switching or external tools.
3. 💬 Bilingual Interactive Chat Copilot
A standalone interactive NOC console where analysts input questions in natural French or English (e.g., "Montre-moi les brute force récents" or "Show me credential leaks"). The copilot translates the natural language into validated SPL queries, automatically appends the custom | aegismind command, executes the search against Splunk via the MCP server, and displays structured results. It incorporates a Self-Healing SPL layer that auto-corrects malformed queries dynamically.
4. 🛡️ Smart Resilience Layer
- Quota Circuit Breaker (
src/utils/circuit_breaker.py): If the Triage Agent confidently identifies an alert as a False Positive, downstream agent execution is immediately halted, conserving Splunk API limits and LLM token usage. - Self-Healing SPL: The MCP client captures Splunk query syntax errors, applies rule-based corrections (e.g., single to double quote conversions, missing statistics command pipes, absent leading search terms), and retries the search transparently.
- Intelligent Fallback: If the live Splunk instance contains empty indexes, the system automatically switches to curated, high-fidelity simulated datasets (e.g., AWS CloudTrail exfiltration telemetry) to guarantee a flawless evaluation.
🏗️ How we built it
Architecture & Stack
- Splunk Enterprise (local instance, port 8001/8089) as the data platform and deployment target.
- Splunk Python SDK v3.0.0 including the native
splunklib.aiagent framework (Agent,BaseAgent,ToolSettings,connect_local_mcp) andsplunklib.searchcommandsfor the custom streaming command. - Splunk MCP Server for secure, token-based agent-to-Splunk communication over the REST API.
- Google Antigravity SDK for multi-agent orchestration and task coordination.
- Splunk AI Assistant (SAIA) patterns for natural language to SPL translation with bilingual keyword mapping.
Native Splunk App Packaging
We packaged Aegis-Mind as a properly structured Splunk App (aegis_mind/) deployed directly to /Applications/Splunk/etc/apps/aegis_mind/:
-
default/commands.conf: Registers the| aegismindcustom search command withchunked = true. -
default/alert_actions.conf: Declares theaegis_triagecustom alert action utilizingpython.version = python3. -
default/app.conf: Contains application metadata withis_configured = 1. -
metadata/default.meta: Manages global ACL permissions (export = system). -
default/data/ui/views/aegis_dashboard.xml: A dark-themed dashboard visualizing agent execution timelines, Cisco forecasting curves, and security metrics.
Development Process
- Designed the agent logic (
src/agents/) and the REST-based MCP client bridge (src/mcp_client.py). - Built the interactive NOC console (
src/main.py) using structured ANSI formatting for clear execution visibility. - Integrated the Splunk SDK's
StreamingCommandto build the native streaming search command. - Developed the bilingual copilot, incorporating the SPL generator and the self-healing parser layer.
- Packaged the components into the native Splunk App structure, verified directory structures, and validated end-to-end performance against a live Splunk instance.
🚧 Challenges we ran into
Splunk Chunked Protocol v2 Parsing
Our first implementation attempted raw socket chunk negotiation. It crashed because Splunk utilizes text-based headers (e.g., "chunked 1.0,453,0"). We resolved this by adopting splunklib.searchcommands.StreamingCommand, which handles protocol encapsulation natively.
Config Conflicts in commands.conf
Legacy configuration variables (such as enableheader = true) caused silent conflicts with the modern chunked protocol. Removing these conflicting keys and relying solely on chunked = true and supports_getinfo = true resolved the issue.
SSL Certificate Bypass in Development
Local Splunk Enterprise test instances default to self-signed SSL certificates. Standard Python HTTP calls threw verification errors. We implemented a secure developer SSL context bypass to streamline local prototyping without compromising production standards.
Asyncio Lifecycle in Interactive Console
The bilingual chat copilot runs an async event loop for concurrent queries. Python's asyncio raised CancelledError exceptions on Ctrl+C exits, printing verbose tracebacks. We wrapped the event loop with graceful KeyboardInterrupt handling and clean task cancellation.
Empty Splunk Indexes During Demos
A fresh trial Splunk instance has empty indexes, returning zero logs for valid SPL queries. We built the Smart Fallback engine: if a valid query returns empty results, the MCP client transparently switches to pre-loaded, highly detailed simulated datasets, ensuring a smooth evaluation experience while keeping the real integration codebase fully active.
🏆 Accomplishments that we're proud of
- Production Integration: Every component runs against a live Splunk Enterprise instance. The
| aegismindcommand enriches active search results, and the custom alert action triggers real multi-agent security workflows. - Full SDK Alignment: We embedded the official
splunklib.aiframework and modeled our agent architecture directly on Splunk's native patterns (Agent,BaseAgent,ToolSettings), adhering strictly to platform standards. - Self-Healing SPL: When an AI-generated SPL query encounters syntax errors, the client intercepts the failure, applies rule-based corrections, and retries the search automatically with zero human intervention.
- Bilingual Interface: Analysts can query in English or French out-of-the-box. The generator maps keywords semantically across both languages.
- Automated Crisis Documentation: Every completed incident compiles a comprehensive Markdown Post-Mortem report, including calculated MTTR, financial cost savings, and a dynamic Mermaid.js sequence diagram tracing the chronological agent interactions.
- App Inspect Ready: The app package follows standard Splunk directory guidelines (
default/,metadata/,bin/), utilizespython.version = python3, and complies with security isolation requirements.
🧠 What we learned
- The depth of Splunk SDKs: The
splunklib.aimodule is a highly capable AI engineering tool. Adhering to native design patterns instead of custom wrappers made our integration extremely robust. - Protocol precision is key: The difference between a working and non-working custom command came down to a single configuration key (
chunked = truevs legacy headers). - Resilience is vital: The Self-Healing SPL and Smart Fallback systems were critical for ensuring a stable, failure-tolerant evaluation flow.
- MCP is the standard: The Model Context Protocol provides a clean, secure, and unified abstraction for LLMs to query databases and platforms safely.
🔮 What's next for Aegis-Mind
- OAuth 2.0 Integration: Migrate from token-based auth to Splunk's OAuth flow once it is generally available for third-party developers.
- Live Hosted Model Endpoints: Connect agents to Splunk's cloud-hosted model APIs rather than emulated local inference for cloud-scale processing.
- Splunk SOAR Integration: Wire the Remediation Agent playbooks directly into Splunk SOAR for automated orchestration with human-in-the-loop approval gates.
- Splunkbase Publication: Complete final App Inspect validations and publish Aegis-Mind as an open-source community app on Splunkbase.
Log in or sign up for Devpost to join the conversation.