Inspiration

An AI attacker can breach a system and take full control in under eight minutes. Meanwhile, a human incident responder is still opening their toolkit.

Protocol SIFT showed that connecting AI to forensic tools is possible, but it gives the AI raw shell access, which can lead to hallucinated commands and potential evidence corruption.

I wanted to solve this problem at the architectural level instead of relying on prompts that simply tell the AI what not to do.

What it does

SIFT MCP Forensic Agent autonomously investigates memory images using two core components.

Purpose-Built MCP Server

The server exposes 11 typed forensic functions through the Model Context Protocol. Instead of constructing shell commands, the AI calls functions such as get_process_list(image).

Destructive commands do not exist in the server. Safety is enforced by design rather than through prompts.
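
To make the "safety by design" idea concrete, here is a minimal sketch of a tool registry. It is an illustration, not the project's server code: the project is built with fastmcp, whereas this sketch uses a plain dictionary so it runs without dependencies. Everything except get_process_list(image) is a hypothetical name (ProcessEntry, TOOLS, tool, call_tool), and the process list is stubbed.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ProcessEntry:
    pid: int
    name: str


# Hypothetical stand-in for an MCP server's tool table: the model can only
# reach functions that were explicitly registered here.
TOOLS: Dict[str, Callable[..., object]] = {}


def tool(func):
    """Register a read-only forensic function under its own name."""
    TOOLS[func.__name__] = func
    return func


@tool
def get_process_list(image: str) -> List[ProcessEntry]:
    """Return processes found in a memory image (stubbed for illustration)."""
    # Assumed extension allow-list; the real server's checks may differ.
    if Path(image).suffix.lower() not in {".raw", ".mem", ".img"}:
        raise ValueError(f"unsupported image type: {image}")
    # A real implementation would invoke a Volatility 3 plugin here.
    return [ProcessEntry(pid=4, name="System")]


def call_tool(name: str, **kwargs):
    """Dispatch a model request; anything outside the registry is rejected."""
    if name not in TOOLS:
        raise PermissionError(f"no such tool: {name}")
    return TOOLS[name](**kwargs)
```

Because no delete or write function is ever registered, a request such as call_tool("delete_file", path="x") fails with PermissionError regardless of what the prompt says.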

Self-Correcting LangGraph Agent

The agent follows a state machine that performs triage, deep analysis, AI analysis, self-correction, and reporting.

After analyzing the evidence, it validates its own findings against the underlying data. If it detects gaps or inconsistencies, it loops back and performs additional analysis. The workflow allows up to three self-correction rounds.
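
The bounded loop described above can be sketched in plain Python. This is an assumption-laden illustration rather than the project's LangGraph graph: validate, investigate, and fake_analyze are hypothetical names, and the only validation shown is a check that each finding cites a PID that actually exists in the process list.

```python
from typing import Callable, Dict, List, Set

MAX_ROUNDS = 3  # the writeup's limit on self-correction rounds


def validate(findings: List[Dict], known_pids: Set[int]) -> List[Dict]:
    """Return the findings that cite a PID absent from the process list."""
    return [f for f in findings if f["pid"] not in known_pids]


def investigate(analyze: Callable[..., List[Dict]], known_pids: Set[int]) -> Dict:
    """Run analysis, then re-run it while validation still reports problems."""
    findings = analyze(feedback=[])
    problems = validate(findings, known_pids)
    rounds = 0
    while problems and rounds < MAX_ROUNDS:
        rounds += 1
        # Feed the failed findings back so the next pass can correct them.
        findings = analyze(feedback=problems)
        problems = validate(findings, known_pids)
    return {"findings": findings, "rounds": rounds, "unresolved": problems}


def fake_analyze(feedback: List[Dict]) -> List[Dict]:
    """Stand-in for the LLM step: wrong PID first, corrected after feedback."""
    if not feedback:
        return [{"pid": 9999, "note": "suspicious"}]
    return [{"pid": 4242, "note": "suspicious"}]


result = investigate(fake_analyze, known_pids={4, 4242})
```

Anything still failing validation after the last round is returned under "unresolved" instead of being silently dropped, which keeps the final report honest about what could not be confirmed.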

Every tool execution is logged with UTC timestamps so each finding can be traced back to the exact tool invocation that produced it.
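
One way to get this kind of traceability is a decorator that records each call before it runs. The sketch below is an assumption about how such logging could look, not the project's logger: AUDIT_LOG, audited, and the entry fields are hypothetical, and the log lives in memory rather than in a file.

```python
import functools
import json
from datetime import datetime, timezone

AUDIT_LOG = []  # in-memory stand-in for a persistent log


def audited(func):
    """Record tool name, arguments, UTC start time, and outcome for each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        entry = {
            "id": len(AUDIT_LOG) + 1,
            "tool": func.__name__,
            "args": json.dumps({"args": args, "kwargs": kwargs}, default=str),
            "started_utc": datetime.now(timezone.utc).isoformat(),
        }
        AUDIT_LOG.append(entry)
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            entry["status"] = f"error: {exc}"
            raise
        entry["status"] = "ok"
        return result
    return wrapper


@audited
def get_process_list(image):
    # Stubbed result; a real tool would parse forensic tool output here.
    return [{"pid": 4, "name": "System"}]


processes = get_process_list("case.raw")
invocation_id = AUDIT_LOG[-1]["id"]  # a finding can store this id as its source
```

Storing the invocation id alongside each finding lets a reviewer look up the exact call, arguments, and timestamp that produced it.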

How I built it

  • Built a custom MCP server in Python that wraps Volatility 3 and Sleuth Kit as typed, validated functions
  • Added input validation, subprocess timeouts, structured output parsing, and error handling for every tool
  • Designed a LangGraph workflow with five nodes: Triage → Deep Analysis → AI Analysis → Self-Correction → Report
  • Implemented a self-correction step that cross-checks PIDs against the process list and identifies coverage gaps
  • Created a Streamlit dashboard for tracking investigations in real time
  • Tested the system against the 18 GB Rocba memory image from the hackathon's standard forensic case
  • Deployed the application on a Google Cloud VM running Ubuntu 22.04 with SIFT forensic tools installed
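
The validation, timeout, and error-handling steps in the list above can be sketched as a small wrapper around subprocess. This is a hedged illustration, not the project's wrapper: validate_image_path, run_tool, the evidence-root rule, and the result fields are all hypothetical, and the usage line runs the Python interpreter as a harmless stand-in for a forensic tool.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, List


def validate_image_path(image: str, evidence_root: str) -> Path:
    """Require an existing file located under the evidence directory."""
    root = Path(evidence_root).resolve()
    path = Path(image).resolve()
    if root not in path.parents:
        raise ValueError(f"{image} is outside the evidence directory")
    if not path.is_file():
        raise ValueError(f"{image} is not a file")
    return path


def run_tool(argv: List[str], timeout_s: float = 600.0) -> Dict:
    """Run a fixed argument vector without a shell; return a structured result."""
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "error": f"timed out after {timeout_s}s", "lines": []}
    except FileNotFoundError:
        return {"ok": False, "error": f"executable not found: {argv[0]}", "lines": []}
    if proc.returncode != 0:
        return {"ok": False, "error": proc.stderr.strip(), "lines": []}
    return {"ok": True, "error": None, "lines": proc.stdout.splitlines()}


# Stand-in command; a real tool would pass a Volatility 3 or Sleuth Kit argv.
demo = run_tool([sys.executable, "-c", "print('pid 4')"])
```

Passing the command as a list keeps the model's input out of any shell, so a crafted image name cannot inject extra commands, and failures come back as data the agent can reason about instead of as raw tracebacks.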

Challenges I ran into

M4 Mac vs. SIFT

SIFT Workstation is designed for x86 systems, while my M4 MacBook uses the ARM architecture. I had to deploy the environment on a Google Cloud VM instead of running it locally.

Token limits

The Rocba image contains 2,186 processes. Free-tier LLM context limits required aggressive truncation, which caused the agent to miss the most suspicious finding, MRC.exe, during its initial analysis.

LLM hallucinations

The AI occasionally flagged normal Windows processes as suspicious and referenced incorrect tool names. I documented every hallucination honestly in the accuracy report.

18 GB evidence transfer

Moving 18 GB of forensic evidence between my local machine and the cloud VM required creative use of Google Cloud Storage as the transfer path.

Accomplishments I'm proud of

Architectural enforcement over prompts

Many submissions rely on configuration files that tell the AI not to run destructive commands.

I built an MCP server where those commands simply do not exist. This approach enforces security at the architecture level rather than depending on prompt instructions.

Real self-correction on real evidence

The agent genuinely loops back when validation identifies missing information. The execution logs show three real self-correction rounds with actual re-analysis.

Honest accuracy reporting

I documented every hallucination the AI made, including false positives, missed indicators due to truncation, and incorrect tool references.

The hackathon values honesty over perfection, and I built that principle into the project.

Real-world forensic data

The system was tested against the full 18 GB Rocba memory image rather than a small demonstration dataset.

It identified suspicious artifacts including MRC.exe running from D:\Tools and more than 1,000 anomalous Teams.exe processes.

Learning from scratch

I started this hackathon with no forensic background.

Along the way, I learned cloud infrastructure, MCP, Volatility, forensic workflows, LangGraph, and memory analysis, then applied those skills to build a working forensic agent.

What I learned

  • The difference between prompt-based security and architectural security
  • How forensic investigations are conducted in practice: triage first, follow the evidence, validate findings, and document everything
  • How to build an MCP server from scratch
  • How to design LangGraph workflows with conditional routing and self-correction loops
  • How to use forensic tools such as Volatility 3 for memory analysis and Sleuth Kit for disk and file system analysis

What's next

  • Add Windows process baselines to reduce false positives
  • Implement anomaly detection before truncation so suspicious processes are prioritized
  • Add disk image analysis for cross-source correlation
  • Support larger-context LLMs that can analyze complete evidence without truncation
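
The first two items could work together: score each process against a baseline, then truncate by score instead of by position. The sketch below is one possible approach under stated assumptions, not an implemented feature. BASELINE, suspicion_score, truncate_for_llm, the scoring weights, and the sample process paths are all hypothetical.

```python
from typing import Dict, List

# Hypothetical baseline: expected path prefix for well-known Windows processes.
BASELINE = {
    "svchost.exe": "c:\\windows\\system32\\",
    "lsass.exe": "c:\\windows\\system32\\",
}


def suspicion_score(proc: Dict) -> int:
    """Score 2 for a known name in the wrong place, 1 for unknown, 0 for expected."""
    expected = BASELINE.get(proc["name"].lower())
    if expected is None:
        return 1
    return 0 if proc["path"].lower().startswith(expected) else 2


def truncate_for_llm(procs: List[Dict], limit: int) -> List[Dict]:
    """Keep the highest-scoring processes; ties keep their input order."""
    ranked = sorted(procs, key=suspicion_score, reverse=True)
    return ranked[:limit]


sample = [
    {"name": "svchost.exe", "path": "C:\\Windows\\System32\\svchost.exe"},
    {"name": "MRC.exe", "path": "D:\\Tools\\MRC.exe"},
    {"name": "svchost.exe", "path": "C:\\Temp\\svchost.exe"},
]
kept = truncate_for_llm(sample, limit=2)
```

With this ordering, a process such as MRC.exe would survive truncation even when thousands of ordinary processes precede it in the raw list.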

Built With

  • cloud
  • fastmcp
  • google
  • google-cloud
  • groq
  • langchain
  • langgraph
  • llama
  • llama-3.1
  • model-context-protocol
  • model-context-protocol-(mcp)
  • python
  • sans-sift-workstation
  • sleuth-kit
  • streamlit
  • volatility-3