Inspiration

Modern attackers use AI-driven malware that can move from initial access to full domain takeover in under 8 minutes — completely automated. Meanwhile, enterprise security teams are drowning in 500–5,000 alerts per day and can realistically review only 50–100. Human defenders are fighting at human speed against machine-speed threats.

This project is our unified submission to both the Splunk App Development Hackathon (autonomous SIEM triage) and the Finding Evil: Cybersecurity Hackathon (autonomous SIFT forensic investigation) — two different threat-response use cases powered by the exact same architecture.


What it does

Sentinel Zero is an autonomous AI incident-response agent powered by Google Gemini 2.5 Flash and the Model Context Protocol (MCP). It operates in two modes:

🔴 Splunk Mode (Splunk Hackathon Track)

  • Ingests live Splunk SIEM alerts and ranks them by severity (CRITICAL / HIGH / MEDIUM)
  • Autonomously triages the selected alert through a 5-iteration self-correcting agent loop
  • Real alert tested: "Unusual Volume Shadow Copy Deletion (vssadmin.exe)" — CRITICAL
  • Streams every reasoning step live to the analyst dashboard via Server-Sent Events
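
As a minimal sketch of the ranking step in the first bullet, assuming alerts arrive as simple records with a severity label: the `Alert` class, the `SEVERITY_RANK` table, and `rank_alerts` are hypothetical names for illustration, and the two non-CRITICAL alert titles are made-up sample data. Only the vssadmin alert comes from the tested session.

```python
from dataclasses import dataclass

# Assumed ordering: the write-up names only these three levels.
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2}


@dataclass(frozen=True)
class Alert:
    title: str
    severity: str


def rank_alerts(alerts):
    """Return alerts sorted most severe first; unknown levels sort last."""
    unknown = len(SEVERITY_RANK)
    return sorted(alerts, key=lambda a: SEVERITY_RANK.get(a.severity.upper(), unknown))


alerts = [
    Alert("Repeated failed logins", "MEDIUM"),  # hypothetical sample
    Alert("Unusual Volume Shadow Copy Deletion (vssadmin.exe)", "CRITICAL"),
    Alert("New local administrator account", "HIGH"),  # hypothetical sample
]
ranked = rank_alerts(alerts)
for alert in ranked:
    print(f"{alert.severity:<8} {alert.title}")
```

Because `sorted` is stable, alerts that share a severity keep their arrival order, which is a reasonable default when no finer-grained score is available.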

🟢 SIFT Forensics Mode (Finding Evil Hackathon Track)

  • Connects to a custom-built FastMCP server with read-only SIFT forensic tools
  • Loads real forensic targets: SEC-PROD-SRV01_disk.raw (45 GB) + SEC-PROD-SRV01_memory.dmp (16 GB)
  • Runs objective: audit filesystem for unauthorized executables + check for process hollowing
  • Detects and eliminates hallucinations through an independent Self-Correction auditor
  • Generates a complete Incident Response Runbook (Incident ID: IR-APM-20231027-001)

Both modes share:

  • Real-time SSE streaming of agent reasoning to the UI (no black box — full transparency)
  • Self-Correction engine: independent Gemini call that catches and removes unsupported claims
  • Multi-key API rotation: pool of 10 Gemini API keys with exponential backoff on 429 errors
  • Demonstrated in live session: 2 key rotations per run, 5 hallucinations caught per run
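
The SSE streaming listed above relies on a simple wire format: an optional `event:` line, one or more `data:` lines, and a terminating blank line. The sketch below frames agent reasoning steps that way. The function names, event names, and payload fields are assumptions for illustration, not the project's actual schema.

```python
import json


def sse_event(event: str, payload: dict) -> str:
    """Format one Server-Sent Events message.

    JSON-encoding the payload keeps it on a single line, so one
    `data:` field is enough.
    """
    data = json.dumps(payload, separators=(",", ":"))
    return f"event: {event}\ndata: {data}\n\n"


def stream_steps(steps):
    """Yield one SSE message per reasoning step, then a final 'done' message."""
    for number, text in enumerate(steps, start=1):
        yield sse_event("reasoning", {"step": number, "text": text})
    yield sse_event("done", {"steps": len(steps)})


frames = list(stream_steps(["Query alert context", "Audit findings"]))
print("".join(frames), end="")
```

In a FastAPI backend, a generator like this could be handed to `StreamingResponse` with `media_type="text/event-stream"`; the browser side would then read it with `EventSource`.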

How we built it

Architecture: A dual-mode MCP client/server system with a self-correcting agentic loop.

  1. Agent Core (core/agent.py): Gemini 2.5 Flash runs an autonomous loop (max 5 iterations), calling MCP tools, collecting tool outputs, and building confidence-scored findings on each pass.

  2. MCP Layer:

    • Splunk Mode — integrates with Splunk's official MCP Server for live alert triage
    • SIFT Mode — connects to our custom sift_mcp_server (FastMCP), exposing read-only forensic tool wrappers (fls, volatility3, grep) as native Python functions

  3. Self-Correction Engine (core/self_correct.py): An independent second Gemini call that acts as a forensic auditor. It compares every proposed finding against raw tool outputs and explicitly flags any claim not backed by hard evidence. Demonstrated live: caught 5 hallucinations per session, drove confidence to 0%.

  4. Multi-Key API Resilience: Pool of 10 Gemini API keys rotates automatically on 429 RESOURCE_EXHAUSTED errors. Demonstrated live: key rotations at 5:44:55 AM, 5:45:24 AM (Splunk run) and 5:46:22 AM, 5:46:48 AM (SIFT run).

  5. Frontend: Premium glassmorphic cyberpunk dashboard — vanilla HTML/CSS/JS with 400vh scroll-driven storytelling, requestAnimationFrame canvas animations, and lazy SSE connections.

  6. Backend: FastAPI with SSE streaming. Deployed on Vercel + Hugging Face Spaces.
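
The interaction between the capped agent loop (item 1) and the auditor (item 3) can be sketched roughly as follows. This is not the code in core/agent.py or core/self_correct.py: the model and tools are replaced by plain callables, the auditor is reduced to a substring check against raw tool output, and confidence is assumed to be the share of findings that survive the audit. All names here are hypothetical.

```python
from dataclasses import dataclass, field

MAX_ITERATIONS = 5  # hard cap from the write-up


@dataclass
class Finding:
    claim: str
    evidence: str  # text that must appear in some raw tool output


@dataclass
class RunResult:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)
    iterations: int = 0

    @property
    def confidence(self) -> float:
        # Assumed definition: fraction of proposed findings that passed the audit.
        total = len(self.accepted) + len(self.rejected)
        return len(self.accepted) / total if total else 0.0


def audit(finding, tool_outputs):
    """Stand-in auditor: accept a finding only if its evidence appears in raw output."""
    return any(finding.evidence in output for output in tool_outputs)


def run_agent(propose, call_tool, max_iterations=MAX_ITERATIONS):
    """Run the bounded loop: call a tool, propose findings, audit each one."""
    result = RunResult()
    tool_outputs = []
    for iteration in range(1, max_iterations + 1):
        result.iterations = iteration
        tool_outputs.append(call_tool(iteration))
        findings = propose(iteration, tool_outputs)
        if not findings:  # the model has nothing further to claim
            break
        for finding in findings:
            if audit(finding, tool_outputs):
                result.accepted.append(finding)
            else:
                result.rejected.append(finding)
    return result


def fake_tool(iteration):
    return f"fls pass {iteration}: no matching entries"


def fake_model(iteration, outputs):
    # Always claims something the tool output does not support.
    return [Finding(f"malicious binary found (pass {iteration})", "evil.exe")]


result = run_agent(fake_model, fake_tool)
print(result.iterations, len(result.rejected), f"{result.confidence:.0%}")
```

With a model stub that never produces supported evidence, every finding is rejected and the run ends at the iteration cap with a confidence of 0%, which mirrors the behavior reported for the live sessions.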


Challenges we ran into

  • Gemini API quota exhaustion mid-investigation: Built a dynamic key rotation pool that catches 429 errors in real time and resumes the exact same request. Demonstrated live with 4 key rotation events across 2 sessions.

  • LLM hallucination in security context: Built a dedicated SelfCorrector class as a second independent Gemini call. In every live test, the auditor correctly flagged and discarded hallucinated findings, driving confidence to 0% until real evidence existed.

  • Preventing destructive tool calls: Used MCP's tool schema to restrict the agent architecturally. The server exposes only read-only tools, so the agent has no way to invoke a command that modifies forensic data.

  • Vercel 10-second serverless timeout: The agent loop runs 15–90 seconds, well past the limit. Worked around with graceful SSE error messaging and a local-run mode (localhost:8001).
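
The read-only restriction described in the third challenge comes down to an allowlist: the agent can request tools only by name, and only non-mutating tools are registered. The sketch below illustrates that idea with the standard library alone. It is a stand-in, not FastMCP's API or the project's sift_mcp_server, and `read_only_tool`, `dispatch`, `list_files`, and `grep_text` are hypothetical names.

```python
import os
import tempfile
from pathlib import Path

READ_ONLY_TOOLS = {}


def read_only_tool(fn):
    """Register a function as a tool the agent is allowed to call."""
    READ_ONLY_TOOLS[fn.__name__] = fn
    return fn


@read_only_tool
def list_files(root: str) -> list:
    """List entry names under a directory without touching them."""
    return sorted(p.name for p in Path(root).iterdir())


@read_only_tool
def grep_text(path: str, needle: str) -> list:
    """Return lines containing `needle`; the file is opened read-only."""
    with open(path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if needle in line]


def dispatch(tool_name: str, **kwargs):
    """Run a registered tool; anything outside the allowlist is refused."""
    if tool_name not in READ_ONLY_TOOLS:
        raise PermissionError(f"tool not exposed: {tool_name}")
    return READ_ONLY_TOOLS[tool_name](**kwargs)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "notes.txt").write_text("vssadmin delete shadows\nbenign line\n")
    listing = dispatch("list_files", root=tmp)
    hits = dispatch("grep_text", path=os.path.join(tmp, "notes.txt"), needle="vssadmin")

try:
    dispatch("delete_file", path="/evidence/disk.raw")  # never registered
    blocked = False
except PermissionError:
    blocked = True

print(listing, hits, blocked)
```

The design point is that safety does not depend on the prompt: a request for an unregistered tool fails at dispatch, whatever the model was told or decided.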


Accomplishments that we're proud of

  • Self-correction works in practice: 5 hallucinations caught and rejected per session
  • Hallucination-filtering architecture: a confidence of 0% is the correct output when evidence is absent. The system refuses to lie.
  • 10-key API resilience: 4 live key rotation events demonstrated across two sessions
  • Evidence integrity by design: read-only MCP toolchain; the agent cannot alter evidence
  • Full audit trail: every tool call logged with timestamps to execution_log.json
  • Complete IR Runbook generated: IR-APM-20231027-001 with a 3-phase remediation plan
  • Dual hackathon architecture: one codebase, one deployment, two tracks

What we learned

  • MCP is the future of agentic security tooling. A sandboxed tool schema enforces constraints that prompts alone cannot. The tool schema is your safety net.

  • Autonomous loops need explicit safety caps. A hard limit such as max_iterations = 5 is not optional. Without one, an agent can spiral into compounding hallucinations.

  • Self-correction requires full independence. A second, completely separate Gemini call with explicit auditor instructions catches real errors that self-review misses.

  • Streaming UX (SSE) builds trust. When judges and operators watch every reasoning step arrive live — including the corrections — trust increases dramatically.

  • Resilience is a first-class feature. API rate limits are not edge cases for autonomous agents. Multi-key rotation and exponential backoff must be in the core.
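
The last lesson, key rotation combined with exponential backoff, can be sketched as below. This is an illustration under assumptions rather than the project's implementation: `RateLimited` stands in for a 429 RESOURCE_EXHAUSTED error, `send` is a placeholder for the real API call, and the key names, attempt limit, and delays are invented.

```python
import time
from itertools import cycle


class RateLimited(Exception):
    """Hypothetical stand-in for a 429 RESOURCE_EXHAUSTED response."""


def call_with_rotation(request, keys, send, max_attempts=6, base_delay=0.5, sleep=time.sleep):
    """Retry the same request, switching keys and doubling the delay on each 429.

    Returns (response, number_of_rotations).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    key_pool = cycle(keys)
    key = next(key_pool)
    rotations = 0
    for attempt in range(max_attempts):
        try:
            return send(request, key), rotations
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the rate limit to the caller
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
            key = next(key_pool)
            rotations += 1


exhausted = {"key-0", "key-1"}  # pretend the first two keys are over quota


def fake_send(request, key):
    if key in exhausted:
        raise RateLimited(key)
    return f"{request} via {key}"


delays = []  # record delays instead of actually sleeping
response, rotations = call_with_rotation(
    "triage alert", [f"key-{i}" for i in range(10)], fake_send, sleep=delays.append
)
print(response, rotations, delays)
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting, and retrying the identical `request` is what lets an investigation resume where it stopped.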

Built With

  • alienvault-otx
  • css3
  • fastapi
  • fastmcp
  • gemini-2.5-flash
  • glassmorphism
  • google-gemini
  • html5
  • hugging-face-spaces
  • javascript
  • mcp
  • model-context-protocol
  • python
  • sans-sift
  • scrollytelling
  • server-sent-events
  • sleuthkit
  • splunk
  • sse
  • uvicorn
  • vercel
  • volatility3