SIFT MAD (Malicious Agent Detection)

Inspiration

Reading the hackathon description, one thing stood out immediately: if AI is conducting attacks, it almost certainly behaves in ways that humans do not. If that is true, then learning to recognize those behavioral traits gives you the foundation for a detection system built around the specific signatures of malicious AI activity, not just the general signatures of malicious activity.

What I didn't anticipate was that there were no public datasets capturing AI-driven attacks. Firsthand reports and primary sources described such attacks, but no one had translated those descriptions into labeled data that a detection system could be built and validated against. I had to create the ground truth myself. That became the first and in some ways most important part of the project: before I could build a detector, I had to build a synthetic dataset generator grounded in what the empirical record actually said about how AI agents behave during network intrusions.

What It Does

MAD is a forensic detection stack for AI-driven cyberattacks on the SANS SIFT Workstation. It combines three things: a synthetic dataset generator, a behavioral detection layer, and an agentic investigation workflow.

The generator, MABE (Malicious Agent Behavior Emulator), produces labeled session bundles of Windows Security Events and Sysmon records emulating AI-driven network attacks against a simulated enterprise environment. It ships with a pre-generated dataset of 1,425 sessions (1,350 benign, 75 attack) that can be fully reproduced from a single command.

The detection layer runs against those bundles looking specifically for behavioral attributes characteristic of AI: machine-speed inter-event timing, exhaustive network enumeration across all reachable hosts, and credential harvest followed rapidly by privilege escalation. These signals are computed deterministically against unsupervised per-account baselines, with no ground truth labels required. The goal is to surface not just accounts that look like attackers, but accounts that look like AI attackers. On the shipped MABE dataset, the detector reaches 100% precision and recall across all 1,425 sessions.
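To make the timing signal concrete, here is a minimal sketch of comparing a session's inter-event gaps against a per-account baseline. It is an illustration only, not MAD's actual code: the function names, the use of medians, and the `ratio_threshold` of 10 are all assumptions chosen for the example.

```python
from statistics import median


def inter_event_gaps(timestamps):
    """Return gaps (seconds) between consecutive event timestamps."""
    ordered = sorted(timestamps)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def timing_signal(session_ts, baseline_ts, ratio_threshold=10.0):
    """Flag a session that is much faster than the account's own baseline.

    ratio_threshold is a hypothetical value, not one taken from MAD.
    No labels are used: the only reference is the account's past activity.
    """
    session_gaps = inter_event_gaps(session_ts)
    baseline_gaps = inter_event_gaps(baseline_ts)
    if not session_gaps or not baseline_gaps:
        # Not enough events to compute a gap on one side.
        return {"fired": False, "speedup": None}
    # Guard against a zero median gap when events share a timestamp.
    session_median = max(median(session_gaps), 1e-6)
    speedup = median(baseline_gaps) / session_median
    return {"fired": speedup >= ratio_threshold, "speedup": speedup}
```

Because the comparison is a pure function of the timestamps, the same inputs always produce the same result, which is the property the deterministic design relies on.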

An agentic layer built on Claude Code and a custom MCP server drives the workflow. It runs autonomously through detection and calibration, then hands off to an interactive investigation loop where a human analyst works through flagged accounts, receives prioritized Protocol SIFT tool recommendations, and builds a traceable record of findings down to the specific event IDs in the raw bundle.

How I Built It

MABE's behavioral model was strongly informed by the three-engine architecture from LMDG (arXiv 2508.02942), which I used as the foundation for the benign user simulation. The attack agent was built on top of that, with behavioral parameters drawn from primary sources: sub-second velocity and exhaustive enumeration from Anthropic's GTG-1002, the 47-158x human velocity multiplier from SANS/Lee, scope expansion behavior from the Dragos water utility report, credential chaining from arXiv 2502.04227, and dead-end backtracking from arXiv 2310.11409. Every parameter in the generator is annotated with its empirical source.
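One way to keep every parameter tied to its source is to store the citation next to the value. The sketch below is hypothetical: the `SourcedParam` class and the parameter names are invented for illustration, and only the values and citations come from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcedParam:
    """A generator parameter that carries its empirical citation with it."""
    name: str      # hypothetical parameter name
    value: object
    source: str    # citation as given in the write-up


ATTACK_PARAMS = [
    SourcedParam("velocity_multiplier_range", (47, 158), "SANS/Lee"),
    SourcedParam("credential_chaining", True, "arXiv 2502.04227"),
    SourcedParam("dead_end_backtracking", True, "arXiv 2310.11409"),
]


def unsourced(params):
    """Return names of parameters that are missing a citation."""
    return [p.name for p in params if not p.source.strip()]
```

A check like `unsourced(ATTACK_PARAMS)` could run in a test suite so that a new parameter cannot be added without a citation.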

For the detection layer, a conversation between Dwarkesh Patel and Ron Minsky from Jane Street shaped how I thought about the architecture. The question came up of when it makes sense to run large expensive models versus simpler ones in quantitative finance, and the answer was essentially to think across time horizons: the faster a decision needs to be made, the simpler and cheaper the mechanism should be. I applied that directly to detection. Each of the three mechanisms has three gated layers ordered by computational cost, where a session only proceeds to the next layer if the previous one fired. This keeps the system efficient across a large corpus and keeps compute costs low, while still surfacing rich, multi-level signal data on sessions that do warrant deeper analysis.
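The gating idea can be sketched as a list of checks ordered from cheapest to most expensive, where evaluation stops at the first check that does not fire. Everything below is an assumption made for illustration: the layer names, the thresholds, and the session fields (`hosts_contacted`, `reachable_hosts`) are hypothetical and are not MAD's actual mechanisms.

```python
from typing import Callable, Dict, List, Tuple

Layer = Tuple[str, Callable[[dict], bool]]


def run_gated(session: dict, layers: List[Layer]) -> Dict[str, object]:
    """Run layers in cost order; stop at the first layer that does not fire."""
    fired = []
    for name, check in layers:
        if not check(session):
            break  # later, more expensive layers never run
        fired.append(name)
    return {"fired_layers": fired, "depth": len(fired)}


# Hypothetical layers for an enumeration mechanism, cheapest first.
ENUMERATION_LAYERS: List[Layer] = [
    ("host_count", lambda s: len(s["hosts_contacted"]) >= 5),
    ("coverage_ratio",
     lambda s: len(set(s["hosts_contacted"])) / s["reachable_hosts"] >= 0.9),
    ("sequential_sweep",
     lambda s: s["hosts_contacted"] == sorted(s["hosts_contacted"])),
]
```

Most benign sessions would exit at the first layer, so the cost of the deeper checks is paid only on sessions that already look unusual, and the `depth` value records how far each session got.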

The MCP server was the key architectural decision for the agentic layer. Rather than having Claude Code execute shell commands and interpret text output, the MCP server exposes typed functions that return structured data. A confidence score comes back as a float field in a dict, not as a sentence the model has to parse. This eliminates a whole class of hallucination risk at the architectural level rather than relying on prompt instructions, which can be ignored.
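The difference between typed output and parsed text can be shown without any MCP library. The sketch below is plain Python and assumes a hypothetical tool function and result schema; registering such a function with an MCP server is omitted, and none of the field names are taken from MAD.

```python
from typing import List, TypedDict


class AccountScore(TypedDict):
    """Hypothetical structured result for one flagged account."""
    account: str
    confidence: float
    fired_mechanisms: List[str]
    event_record_ids: List[int]


def get_account_score(account: str, results: dict) -> AccountScore:
    """Tool-style function: look up precomputed detector output.

    The values are copied from the deterministic results, so the model
    reads a float field instead of extracting a number from prose.
    """
    record = results[account]
    return AccountScore(
        account=account,
        confidence=float(record["confidence"]),
        fired_mechanisms=list(record["fired_mechanisms"]),
        event_record_ids=[int(e) for e in record["event_record_ids"]],
    )
```

With this shape, the model's job narrows to interpreting fields that the detector already computed, and an unknown account raises a `KeyError` rather than producing a plausible-looking answer.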

Challenges I Ran Into

The most significant challenge is one I want to be transparent about. I built detection mechanisms around the exact behavioral characteristics I also built into MABE, which creates a real circularity risk. The detector performs well on MABE not necessarily because it would generalize to real AI-driven attacks, but because both were designed from the same inferences drawn from the same primary sources. The solution is collaboration with practitioners who have actually observed these attacks, and testing against sandboxed AI systems conducting attacks against simulated infrastructure.

The SIFT workstation and the forensic investigation workflow were entirely new to me going into this. I had to learn what forensic analysts actually need from a tool, what it means for a finding to be traceable, and how Protocol SIFT tools fit together, all in parallel with building the tool itself. That created a lot of surface area for getting things wrong early.

One concrete example: EvtxECmd, log2timeline, and YARA, as used in this workflow, all expect native Windows EVTX binary files. MABE produces JSON exports, so these tools can't ingest the bundles directly. Rather than failing silently, the recommendation engine flags affected actions as NEEDS NATIVE EVTX and explains why, while the MCP tools handle the equivalent analysis against the JSON data. Discovering that constraint mid-build, and designing around it without hiding the limitation, added real scope.
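The flagging behavior can be sketched as a small lookup that annotates each recommendation instead of dropping it. This is an illustrative sketch only: the function name, the `READY` status, the `bundle_format` values, and the wording of the reason are assumptions, while the tool names and the NEEDS NATIVE EVTX label come from the description above.

```python
# Tools the write-up lists as requiring native EVTX input.
NATIVE_EVTX_TOOLS = {"EvtxECmd", "log2timeline", "YARA"}


def annotate_recommendation(tool: str, bundle_format: str) -> dict:
    """Attach an explicit status to a tool recommendation.

    bundle_format is a hypothetical field, e.g. "json" or "evtx".
    """
    if tool in NATIVE_EVTX_TOOLS and bundle_format != "evtx":
        return {
            "tool": tool,
            "status": "NEEDS NATIVE EVTX",
            "reason": f"{tool} expects EVTX input; bundle is {bundle_format}.",
        }
    return {"tool": tool, "status": "READY", "reason": ""}
```

Returning a status and a reason for every recommendation keeps the limitation visible to the analyst rather than silently omitting the tool.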

What I Learned

I learned a lot about what forensic investigation actually looks like in practice: what analysts need from their tools, what it means for evidence to have integrity, and how different that is from building a real-time alerting system. Coming in with no background in the field, that was the steepest part of the learning curve.

Building the MCP server was also a first for me. Designing a clean interface between a deterministic detection system and an LLM-driven investigation workflow, in a way that prevents the model from fabricating the outputs it's supposed to be interpreting, turned out to be a genuinely interesting architecture problem.

What's Next for MAD

The dataset gap is both the biggest obstacle and the clearest opportunity. MABE addresses the absence of any public ground truth for AI-driven attacks, but its behavioral parameters are inferences from published sources rather than observations from real deployments. The next step is collaboration with practitioners who have access to real or sandboxed AI-driven attack data, whether from red team exercises or forensic artifacts from environments where AI-assisted attackers have been detected. Testing against that data would either validate MABE's behavioral model or identify where it diverges, and both outcomes are useful.

The architecture is designed with this evolution in mind. The core detection layer is platform-neutral and mechanism-agnostic, and MABE's behavioral parameters are all configurable and traceable to sources. The hope is that this project serves as a starting point for a broader community conversation as more primary data becomes available.

Built With

anthropic
claude-code
mcp
networkx
numpy
python
sift

Updates

Luca Popescu started this project — Jun 15, 2026 11:24 PM EDT
