Inspiration
Security teams often receive alerts faster than analysts can validate them. In incident response, the hard part is not just finding something suspicious; it is proving what happened with enough evidence to act safely. We built SentinelMCP to explore what an AI-assisted forensic analyst could look like if it were constrained to read-only tools, forced to cite evidence, and measured against ground truth.
What it does
SentinelMCP is an AI-powered forensic triage and realtime incident response system. It connects a Claude-based agent to a curated Model Context Protocol server that exposes read-only DFIR tools for disk, memory, PCAP, event log, and correlation analysis.
Given a case directory, the agent can run forensic tools, parse their output into structured evidence, generate findings, self-evaluate whether each finding is confirmed, inferred, or unsupported, detect gaps, rerun targeted tools, and produce JSON, HTML, and Sigma-style outputs.
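As a rough illustration of how the confirmed / inferred / unsupported labels could drive targeted reruns, here is a minimal sketch. The `Finding` fields, the `Status` enum, and `plan_reruns` are hypothetical names chosen for this example and are not taken from the SentinelMCP codebase.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(str, Enum):
    CONFIRMED = "confirmed"
    INFERRED = "inferred"
    UNSUPPORTED = "unsupported"


@dataclass
class Finding:
    title: str
    status: Status
    # Tools whose output is already cited as evidence (hypothetical field).
    evidence_tools: list[str] = field(default_factory=list)
    # Tools that could corroborate the finding (hypothetical field).
    corroborating_tools: list[str] = field(default_factory=list)


def plan_reruns(findings: list[Finding]) -> list[str]:
    """Return tools worth running for findings that are not yet confirmed."""
    planned: list[str] = []
    for finding in findings:
        if finding.status is Status.CONFIRMED:
            continue  # nothing left to prove for this finding
        for tool in finding.corroborating_tools:
            already_cited = tool in finding.evidence_tools
            if not already_cited and tool not in planned:
                planned.append(tool)
    return planned
```

In this sketch a confirmed finding never triggers more work, while inferred and unsupported findings only request tools they have not already cited, which keeps each rerun round small.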
The project also includes a realtime pipeline. Sysmon and Microsoft Defender XDR-style events are normalized, correlated into alerts, and captured into burst micro-cases. Each micro-case is triaged by a heuristic or prompt-driven backend and converted into an incident, which is then dispatched to local ticket/webhook artifacts, tracked through its lifecycle state, and exposed through a FastAPI incident tracker API.
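The normalize-then-correlate step can be sketched in a few lines. This is an illustration under assumptions, not the project's implementation: the `Event` schema, the 60-second window, the two-event threshold, and the input field names (`Computer`/`Image`/`UtcTime` for Sysmon-style records, `DeviceName`/`FileName`/`Timestamp` for Defender-style records) are all choices made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    host: str
    image: str
    timestamp: datetime
    source: str


def normalize(raw: dict) -> Event:
    """Map a Sysmon-style or Defender-style record onto one schema."""
    if "UtcTime" in raw:  # assumed Sysmon-style shape
        return Event(raw["Computer"], raw["Image"],
                     datetime.fromisoformat(raw["UtcTime"]), "sysmon")
    # Otherwise assume a Defender-style shape.
    return Event(raw["DeviceName"], raw["FileName"],
                 datetime.fromisoformat(raw["Timestamp"]), "defender")


def correlate(events, window=timedelta(seconds=60), threshold=2):
    """Group events per host into bursts and keep bursts of >= threshold events."""
    by_host: dict[str, list[Event]] = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        by_host.setdefault(ev.host, []).append(ev)

    alerts = []
    for host, evs in by_host.items():
        burst = [evs[0]]
        for ev in evs[1:]:
            if ev.timestamp - burst[-1].timestamp <= window:
                burst.append(ev)  # still inside the current burst
            else:
                if len(burst) >= threshold:
                    alerts.append((host, burst))
                burst = [ev]  # gap too large: start a new burst
        if len(burst) >= threshold:
            alerts.append((host, burst))
    return alerts
```

Each returned `(host, burst)` pair corresponds loosely to what the write-up calls a burst micro-case: a small, time-bounded bundle of events that can be triaged on its own.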
How we built it
We built the core system in Python 3.11. The MCP layer uses FastMCP and exposes typed, read-only tools for MFT timelines, Prefetch, Amcache, registry keys, deleted files, event logs, Volatility process and network data, injected code checks, command-line extraction, PCAP conversations, DNS queries, HTTP requests, and disk-memory correlation.
The agent loop runs in stages: initial evidence collection, Claude triage, Claude self-evaluation, deterministic gap analysis, targeted reruns, convergence tracking, MITRE tagging, and report generation. Shared dataclass models define findings, cases, tool executions, execution logs, scores, and forensic records.
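Convergence tracking can be thought of as repeating a refinement round until the set of findings stops changing. The sketch below assumes findings are identified by plain string IDs and that one round (triage, self-evaluation, gap analysis, reruns) is wrapped in a single `step` callable; both are simplifications made for this example.

```python
from typing import Callable


def run_until_converged(
    step: Callable[[frozenset[str]], frozenset[str]],
    initial: frozenset[str],
    max_rounds: int = 5,
) -> tuple[frozenset[str], int]:
    """Repeat a refinement step until the finding set stops changing.

    Returns the final finding set and the number of rounds executed.
    """
    current = initial
    for round_no in range(1, max_rounds + 1):
        updated = step(current)
        if updated == current:
            return current, round_no  # no change: converged
        current = updated
    return current, max_rounds  # round budget exhausted


def example_step(findings: frozenset[str]) -> frozenset[str]:
    # Hypothetical round: a rerun surfaces persistence once execution is known.
    if "execution" in findings:
        return findings | {"persistence"}
    return findings


final, rounds = run_until_converged(example_step, frozenset({"execution"}))
```

The `max_rounds` cap matters as much as the equality check: it bounds cost when a model keeps rewording findings without adding evidence.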
For realtime response, we added collectors, normalizers, transports, correlation rules, burst capture, micro-case building, incident formatting, response planning, dispatch artifacts, lifecycle tracking, journaling, replay support, and a FastAPI API. The benchmark harness compares findings against ground truth and Protocol SIFT baseline results using precision, recall, F1, and hallucination rate.
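The four benchmark metrics can be computed from sets of finding identifiers. Precision, recall, and F1 follow their standard definitions. The hallucination rate below is an assumption made for this sketch: it is taken to be the share of reported findings whose cited evidence could not be located, and the project may define it differently.

```python
def score(predicted: set[str], truth: set[str], unsupported: set[str]) -> dict[str, float]:
    """Compare reported findings against ground truth.

    predicted:   IDs of findings the agent reported
    truth:       IDs of findings in the ground-truth set
    unsupported: IDs of reported findings with no locatable evidence (assumed input)
    """
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    hallucination_rate = (
        len(predicted & unsupported) / len(predicted) if predicted else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "hallucination_rate": hallucination_rate,
    }
```

Running the same function over both the agent's findings and a baseline's findings gives directly comparable numbers for the same case.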
Challenges we ran into
The biggest challenge was keeping the AI useful without letting it become unsafe or vague. We designed the MCP server as a security boundary: no arbitrary shell execution, no write operations against evidence, and structured parser outputs instead of raw command dumps.
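The boundary idea can be shown without any MCP library: the agent may only name a tool from a fixed registry, and every tool returns structured data. This plain-Python sketch is not FastMCP code, and `read_only_tool`, `dispatch`, and `list_dns_queries` are hypothetical names used only for illustration.

```python
from typing import Any, Callable

# Registry of the only callables the agent is allowed to invoke.
_TOOLS: dict[str, Callable[..., dict[str, Any]]] = {}


def read_only_tool(func: Callable[..., dict[str, Any]]) -> Callable[..., dict[str, Any]]:
    """Register a function as an allowed tool under its own name."""
    _TOOLS[func.__name__] = func
    return func


@read_only_tool
def list_dns_queries(case_dir: str) -> dict[str, Any]:
    # Placeholder body: a real tool would parse PCAP-derived data read-only.
    return {"tool": "list_dns_queries", "case_dir": case_dir, "records": []}


def dispatch(name: str, **kwargs: Any) -> dict[str, Any]:
    """Run a registered tool by name; anything else is refused."""
    if name not in _TOOLS:
        raise PermissionError(f"tool not allowed: {name}")
    return _TOOLS[name](**kwargs)
```

Because there is no generic "run command" entry in the registry, a request such as `dispatch("run_shell", cmd="...")` fails with `PermissionError` instead of reaching the operating system.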
Another challenge was making the system testable without requiring every judge or developer to have full forensic images and SIFT tooling installed. We separated parser logic, tool contracts, scoring, realtime behavior, and report generation so they can be unit tested independently, while still supporting real forensic tools in a proper DFIR environment.
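Separating parsing from tool execution means a parser can be exercised against a captured text sample with no forensic tooling installed. The sketch below assumes a simple CSV layout with `timestamp`, `client`, and `name` columns; that layout and the `DnsQuery` record are invented for this example and do not reflect the output of any particular tool.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class DnsQuery:
    timestamp: str
    client: str
    name: str


def parse_dns_queries(text: str) -> list[DnsQuery]:
    """Turn CSV text into typed records, skipping rows with no query name."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        # Normalize names: drop the trailing dot and lowercase.
        DnsQuery(row["timestamp"], row["client"], row["name"].rstrip(".").lower())
        for row in reader
        if row.get("name")
    ]


# Invented sample input, standing in for a saved tool output fixture.
SAMPLE = (
    "timestamp,client,name\n"
    "2024-01-01T10:00:00,10.0.0.5,Example.COM.\n"
    "2024-01-01T10:00:02,10.0.0.5,\n"
)

records = parse_dns_queries(SAMPLE)
```

A unit test then only needs to assert on `records`, so it runs the same way on a laptop as in a full DFIR environment.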
We also had to balance offline forensic depth with realtime incident response. The final architecture supports both: deep case-based triage and event-driven burst micro-cases.
Accomplishments that we're proud of
We are proud that SentinelMCP is more than a chatbot wrapper. It has an MCP tool boundary, typed forensic parsers, a self-correction loop, deterministic gap detection, benchmark scoring, MITRE tagging, Sigma export, HTML/JSON reporting, and a realtime incident workflow.
We are also proud of the safety posture. The agent cannot call arbitrary shell commands through MCP, and evidence handling is designed around read-only analysis. The project includes tests across the agent loop, parsers, MCP contracts, scoring, reporting, realtime phases, API behavior, and output formatting.
What we learned
We learned that AI is most valuable in security when it is paired with narrow tools, strong evidence models, and measurable outputs. A useful incident response agent needs more than a model call: it needs structured data, clear confidence levels, gap detection, reproducible reports, and a way to compare performance against ground truth.
We also learned that MCP is a strong fit for security workflows because it lets us expose powerful tools through carefully scoped interfaces instead of giving an agent unrestricted system access.
What's next
Next, we want to harden the local setup, add a packaged demo dataset, improve dependency and vulnerability scanning, parallelize independent MCP tool calls, add richer report previews, and expand the realtime connectors beyond local/file-backed adapters. We also want to improve benchmark coverage with more incident types and publish a judge-friendly demo video showing the full flow from evidence to verified findings to incident response actions.