Inspiration
Modern attackers use AI-driven malware that can move from initial access to full domain takeover in under 8 minutes — completely automated. Meanwhile, enterprise security teams are drowning in 500–5,000 alerts per day and can realistically review only 50–100. Human defenders are fighting at human speed against machine-speed threats.
This project is our unified submission to both the Splunk App Development Hackathon (autonomous SIEM triage) and the Finding Evil: Cybersecurity Hackathon (autonomous SIFT forensic investigation) — two different threat-response use cases powered by the exact same architecture.
What it does
Sentinel Zero is an autonomous AI incident-response agent powered by Google Gemini 2.5 Flash and the Model Context Protocol (MCP). It operates in two modes:
🔴 Splunk Mode (Splunk Hackathon Track)
- Ingests live Splunk SIEM alerts and ranks them by severity (CRITICAL / HIGH / MEDIUM)
- Autonomously triages the selected alert through a 5-iteration self-correcting agent loop
- Real alert tested: "Unusual Volume Shadow Copy Deletion (vssadmin.exe)" — CRITICAL
- Streams every reasoning step live to the analyst dashboard via Server-Sent Events
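The severity ranking step above can be illustrated with a small sketch. The Alert class, the rank table, and the rule that unknown severities sort last are assumptions made for illustration; only the three severity levels come from the description above.

```python
from dataclasses import dataclass

# Assumed ordering: a lower rank sorts first. Only these three levels appear in the write-up.
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2}


@dataclass
class Alert:
    """Hypothetical alert record; a real Splunk alert carries many more fields."""
    title: str
    severity: str


def rank_alerts(alerts):
    """Return alerts sorted most severe first; unknown severities go last.

    sorted() is stable, so alerts of equal severity keep their arrival order.
    """
    unknown = len(SEVERITY_RANK)
    return sorted(alerts, key=lambda a: SEVERITY_RANK.get(a.severity.upper(), unknown))
```

The analyst would then pick an alert from the top of the ranked list to hand to the agent loop.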
🟢 SIFT Forensics Mode (Finding Evil Hackathon Track)
- Connects to a custom-built FastMCP server with read-only SIFT forensic tools
- Loads real forensic targets: SEC-PROD-SRV01_disk.raw (45 GB) + SEC-PROD-SRV01_memory.dmp (16 GB)
- Runs objective: audit filesystem for unauthorized executables + check for process hollowing
- Detects and eliminates hallucinations through an independent Self-Correction auditor
- Generates a complete Incident Response Runbook (Incident ID: IR-APM-20231027-001)
Both modes share:
- Real-time SSE streaming of agent reasoning to the UI (no black box — full transparency)
- Self-Correction engine: independent Gemini call that catches and removes unsupported claims
- Multi-key API rotation: pool of 10 Gemini API keys with exponential backoff on 429 errors
- Demonstrated in live session: 2 key rotations per run, 5 hallucinations caught per run
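The SSE streaming shared by both modes could be framed roughly as follows. This is a minimal sketch using only the standard library; the event names (step, done) and the payload fields are assumptions, not the project's actual wire format.

```python
import json


def sse_event(event, data):
    """Format one Server-Sent Events frame: event name, one JSON data line, blank line."""
    # json.dumps escapes newlines, so the payload always fits on a single data line.
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"


def stream_steps(steps):
    """Yield one SSE frame per reasoning step, then a final 'done' frame."""
    for i, step in enumerate(steps, start=1):
        yield sse_event("step", {"iteration": i, "text": step})
    yield sse_event("done", {"steps": len(steps)})
```

In FastAPI, a generator like this can be returned through StreamingResponse with media_type="text/event-stream", and the dashboard can consume it with the browser's EventSource API.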
How we built it
Architecture: A dual-mode MCP client/server system with a self-correcting agentic loop.
Agent Core (core/agent.py): Gemini 2.5 Flash runs an autonomous loop (max 5 iterations), calling MCP tools, collecting tool outputs, and building confidence-scored findings on each pass.
MCP Layer:
- Splunk Mode: integrates with Splunk's official MCP Server for live alert triage
- SIFT Mode: connects to our custom sift_mcp_server (FastMCP), exposing read-only forensic tool wrappers (fls, volatility3, grep) as native Python functions
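One way a read-only tool wrapper can be kept read-only is to build every command from a fixed allowlist and never pass free-form shell text. The sketch below is standard-library only and does not reproduce the project's sift_mcp_server; the tool name list_files, the evidence directory, and the validation rules are all assumptions. The fls flags used (-r for recursive, -p for full paths) are standard Sleuth Kit options.

```python
from pathlib import PurePosixPath

# Hypothetical allowlist: each exposed tool maps to a fixed, read-only argv prefix.
READ_ONLY_TOOLS = {
    "list_files": ["fls", "-r", "-p"],  # Sleuth Kit: recursive listing with full paths
}

EVIDENCE_DIR = PurePosixPath("/cases/evidence")  # hypothetical evidence mount point


def build_command(tool, image_name):
    """Return the argv list for an allowlisted tool, or raise if the request is unsafe."""
    if tool not in READ_ONLY_TOOLS:
        raise PermissionError(f"tool not exposed: {tool}")
    # Accept only a bare file name: no path separators, no traversal, no option injection.
    if (
        image_name in ("", ".", "..")
        or "/" in image_name
        or "\\" in image_name
        or image_name.startswith("-")
    ):
        raise ValueError(f"invalid image name: {image_name!r}")
    return READ_ONLY_TOOLS[tool] + [str(EVIDENCE_DIR / image_name)]
```

The returned list is meant to be passed to subprocess.run without shell=True, so the model never controls more than one validated argument. In a FastMCP server, a function wrapping this would be registered as a tool, which is what makes the allowlist the only surface the agent can reach.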
Self-Correction Engine (core/self_correct.py): An independent second Gemini call that acts as a forensic auditor. It compares every proposed finding against raw tool outputs and explicitly flags any claim not backed by hard evidence. Demonstrated live: caught 5 hallucinations per session, drove confidence to 0%.
Multi-Key API Resilience: A pool of 10 Gemini API keys rotates automatically on 429 RESOURCE_EXHAUSTED errors. Demonstrated live: key rotations at 5:44:55 AM and 5:45:24 AM (Splunk run), and at 5:46:22 AM and 5:46:48 AM (SIFT run).
Frontend: Premium glassmorphic cyberpunk dashboard built with vanilla HTML/CSS/JS, featuring 400vh scroll-driven storytelling, requestAnimationFrame canvas animations, and lazy SSE connections.
Backend: FastAPI with SSE streaming. Deployed on Vercel + Hugging Face Spaces.
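The Agent Core loop described in this section can be sketched as a bounded loop around two callables. This is not the code in core/agent.py: the decide callable stands in for the Gemini call, call_tool stands in for the MCP client, and the dictionary shapes are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

MAX_ITERATIONS = 5  # the cap stated in the write-up


@dataclass
class AgentState:
    objective: str
    tool_outputs: list = field(default_factory=list)
    findings: list = field(default_factory=list)


def run_agent(objective, decide, call_tool, max_iterations=MAX_ITERATIONS):
    """Run a bounded tool-calling loop.

    decide(state) stands in for the model call and returns a dict with optional
    keys "tool", "args", and "findings". A missing "tool" means the model is done.
    call_tool(name, args) stands in for the MCP client and returns raw output.
    """
    state = AgentState(objective)
    for iteration in range(1, max_iterations + 1):
        decision = decide(state)
        # Keep the latest confidence-scored findings proposed by the model.
        state.findings = decision.get("findings", state.findings)
        tool = decision.get("tool")
        if tool is None:
            break
        output = call_tool(tool, decision.get("args", {}))
        state.tool_outputs.append({"iteration": iteration, "tool": tool, "output": output})
    return state
```

Because the loop is a plain for over a fixed range, a model that never decides to stop still ends after max_iterations tool calls.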
Challenges we ran into
Gemini API quota exhaustion mid-investigation: Built a dynamic key rotation pool that catches 429 errors in real time and resumes the exact same request. Demonstrated live with 4 key rotation events across 2 sessions.
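A minimal sketch of key rotation with exponential backoff, under stated assumptions: QuotaExhausted stands in for whatever exception the Gemini client raises on a 429, send is a hypothetical callable that performs the request with a given key, and the delay schedule (1s, 2s, 4s, ...) is illustrative rather than the project's actual tuning.

```python
import time


class QuotaExhausted(Exception):
    """Stand-in for an HTTP 429 RESOURCE_EXHAUSTED response."""


def call_with_rotation(request, keys, send, base_delay=1.0, max_attempts=None, sleep=time.sleep):
    """Retry the same request on the next key each time the quota error is raised.

    By default every key is tried once; a larger max_attempts wraps around the pool.
    """
    if not keys:
        raise ValueError("key pool is empty")
    attempts = max_attempts if max_attempts is not None else len(keys)
    for attempt in range(attempts):
        key = keys[attempt % len(keys)]
        try:
            return send(request, key)
        except QuotaExhausted:
            if attempt == attempts - 1:
                raise  # pool exhausted: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # exponential backoff before rotating
```

The request object is passed through unchanged on every attempt, which is what lets the investigation resume the exact same call after a rotation. Injecting sleep as a parameter keeps the function testable without real delays.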
LLM hallucination in security context: Built a dedicated SelfCorrector class as a second independent Gemini call. In every live test, the auditor correctly flagged and discarded hallucinated findings, driving confidence to 0% until real evidence existed.
Preventing destructive tool calls: Used MCP's tool schema to architecturally restrict the agent. The server exposes only read-only tools, so the AI cannot invoke any command that modifies forensic data.
Vercel 10-second serverless timeout: Agent loop runs 15–90 seconds. Solved with graceful SSE error messaging and local-run mode (localhost:8001).
Accomplishments that we're proud of
- ✅ Self-correction actually works — 5 hallucinations caught and rejected per session
- ✅ Zero-hallucination architecture — Confidence: 0% is the correct output when evidence is absent. The system refuses to lie.
- ✅ 10-key API resilience — 4 live key rotation events demonstrated across two sessions
- ✅ Evidence integrity guaranteed — read-only MCP toolchain; agent cannot alter evidence
- ✅ Full audit trail — every tool call logged with timestamps to execution_log.json
- ✅ Complete IR Runbook generated — IR-APM-20231027-001 with 3-phase remediation plan
- ✅ Dual hackathon architecture — one codebase, one deployment, two winning tracks
What we learned
MCP is the future of agentic security tooling. A sandboxed tool schema enforces constraints that prompts alone cannot. The tool schema is your safety net.
Autonomous loops need explicit safety caps. max_iterations = 5 is not optional. Without it, unconstrained agents spiral into compounding hallucinations.
Self-correction requires full independence. A second, completely separate Gemini call with explicit auditor instructions catches real errors that self-review misses.
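The independent audit pass can be sketched as a filter that sees only the proposed findings and the raw tool outputs. In the project the judge is a second Gemini call; here a naive substring check stands in for it so the example runs offline. The finding fields (claim, evidence, confidence) and the rule for the overall confidence are assumptions.

```python
def evidence_is_quoted(finding, tool_outputs):
    """Naive stand-in for the auditor model.

    A finding passes only if its cited evidence appears verbatim in some raw tool output.
    """
    evidence = finding.get("evidence", "")
    return bool(evidence) and any(evidence in out for out in tool_outputs)


def self_correct(findings, tool_outputs, judge=evidence_is_quoted):
    """Split findings into supported and rejected, using only raw tool outputs."""
    kept, rejected = [], []
    for finding in findings:
        (kept if judge(finding, tool_outputs) else rejected).append(finding)
    # Assumed aggregation: highest surviving confidence, or 0.0 when nothing survives.
    confidence = max((f["confidence"] for f in kept), default=0.0)
    return {"findings": kept, "rejected": rejected, "confidence": confidence}
```

The judge receives no part of the first model's reasoning, only its claims and the evidence, which is the independence property described above. When every finding is rejected the result reports 0.0 confidence instead of an unsupported claim.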
Streaming UX (SSE) builds trust. When judges and operators watch every reasoning step arrive live — including the corrections — trust increases dramatically.
Resilience is a first-class feature. API rate limits are not edge cases for autonomous agents. Multi-key rotation and exponential backoff must be in the core.
Built With
- alienvault-otx
- css3
- fastapi
- fastmcp
- gemini-2.5-flash
- glassmorphism
- google-gemini
- html5
- hugging-face-spaces
- javascript
- mcp
- model-context-protocol
- python
- sans-sift
- scrollytelling
- server-sent-events
- sleuthkit
- splunk
- sse
- uvicorn
- vercel
- volatility3
