Inspiration

An AI-driven adversary can go from initial access to full domain control in under eight minutes — CrowdStrike's fastest observed breakout is seven. And the first thing a careful intruder does on the way out is reach back and stomp the timestamps, so the file that owned the box reads as old, boring, and trusted, and sinks below the analyst's timeline.

The most tutorialized counter-move in DFIR is "compare $STANDARD_INFORMATION to $FILE_NAME, and if they disagree the file was timestomped." Every entrant who touches NTFS metadata will ship that check. Here's the problem I couldn't get past: a matching $SI == $FN pair isn't proof of innocence. It is also exactly what timestomp-then-rename laundering leaves behind, because a rename or move after the stomp rewrites the $FILE_NAME timestamps from the forged $STANDARD_INFORMATION values. The textbook check doesn't just miss the best-hidden attacks; it actively clears them. A clean check isn't a clean file.

So I asked: what would a senior analyst do that a junior doesn't? They'd stop trusting the clock and go read the journal underneath it.

What it does

Stomped is an autonomous DFIR agent that hunts timestomping and rename/move laundering on the SANS SIFT Workstation. It runs the textbook check, distrusts its own clean result, and consults the layer no user-mode anti-forensics tool can touch — the $UsnJrnl:$J change journal — ordering events by the monotonic USN sequence number instead of by the timestamps under suspicion. The attacker forged the clock; they couldn't forge the order.

  1. It triages. enumerate_mft scores "clean-but-active" candidates from the $MFT.
  2. It runs the trap, honestly. mft_si_fn reports the naive $SI/$FN result out loud — even when it says "clean."
  3. It distrusts the clean result. Because a matching pair is also what laundering leaves behind, it never clears a file on the naive check alone.
  4. It reads the journal. usn_events returns the kernel-written USN records, ordered by sequence — the attacker-independent clock.
  5. It self-corrects. When the journal contradicts the file, it flags the contradiction and reconstruct_timeline re-ranks by USN, superseding its first answer.
  6. It tags, never asserts. Every finding is CONFIRMED / INFERRED / CONTRADICTED with a one-line receipt (the exact tool execution behind it), and confidence is a deterministic function of how many independent layers agree.
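
The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code: the UsnRecord fields, the simplified reason strings, and the function names naive_si_fn_check and assess are hypothetical, and the sample timestamps are invented for the example. The journal record's own timestamp is used only as the independent witness to compare against the creation time the file presents; ordering is always by USN.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UsnRecord:
    usn: int              # monotonic sequence number from $UsnJrnl:$J
    reason: str           # simplified reason label (hypothetical strings)
    written_at: datetime  # timestamp carried by the journal record itself


def naive_si_fn_check(si_created: datetime, fn_created: datetime) -> str:
    """Textbook check: report 'clean' when $SI and $FN creation times match."""
    return "clean" if si_created == fn_created else "mismatch"


def assess(si_created, fn_created, records):
    """Run the naive check, then test it against the USN-ordered journal."""
    ordered = sorted(records, key=lambda r: r.usn)  # order by USN, not by time
    born = next((r.written_at for r in ordered if r.reason == "FILE_CREATE"), None)
    return {
        "naive": naive_si_fn_check(si_created, fn_created),
        # The journal contradicts the file when its create event postdates
        # the creation time the file presents.
        "journal_contradicts": born is not None and born > si_created,
        "order": [r.reason for r in ordered],
    }


# Illustrative values only; these are not the ROCBA timestamps.
claimed = datetime(2019, 1, 1, tzinfo=timezone.utc)
journal = [
    UsnRecord(300, "RENAME_NEW_NAME", datetime(2021, 6, 1, 12, 0, 2, tzinfo=timezone.utc)),
    UsnRecord(100, "FILE_CREATE", datetime(2021, 6, 1, 12, 0, 0, tzinfo=timezone.utc)),
    UsnRecord(200, "BASIC_INFO_CHANGE", datetime(2021, 6, 1, 12, 0, 1, tzinfo=timezone.utc)),
]
result = assess(claimed, claimed, journal)
print(result["naive"], result["journal_contradicts"], result["order"])
```

With these sample records the naive check reports "clean" while the journal, read in USN order, shows a create, then a basic-info change, then a rename, all later than the creation time the file claims.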

On the real ROCBA case it caught the attacker's own anti-forensics tool. A file presenting as a 2018 Recycle-Bin item ($R9531FE.exe, inode 472521), $SI == $FN to the nanosecond — cleared by the textbook check — was in fact Sysinternals SDelete, created 2020-11-14 13:41:19 UTC, timestomped 12 milliseconds later, then renamed to hide it. The journal placed its true birth 729 days after the date it claimed.

How I built it

I chose Pattern B, a custom MCP server, because Criterion 3 (Constraint Implementation) asks two questions only Pattern B can answer well: are the guardrails architectural or prompt-based, and what stops spoliation when the model ignores the rule? A direct shell loop "asks the model nicely" to stay read-only; one hallucinated mount -o rw,remount is evidence spoliation. I wanted the dangerous verb to not exist.

  • The typed read-only surface (mcp_server/). Seven functions and nothing else: enumerate_mft, mft_si_fn, usn_events, logfile_attr_changes, reconstruct_timeline, vol_mft_memory, carve_residual_usn. No execute_shell, no write_file, no mount. safety.py is the single choke point: an allowlist of executables (only court-vetted read-only forensic tools), subprocess with shell=False (no metacharacters, no redirections), no write path to evidence, and argument scrubbing for -o rw / remount / O_RDWR tokens. The agent is additionally denied Bash/Write/Edit in the engine.
  • Zero runtime Python deps. The MCP transport is hand-written stdlib JSON-RPC over stdio — nothing to pip install; the forensic muscle is the SIFT Workstation (SleuthKit istat/fls/icat/fsstat, MFTECmd for the $MFT + $UsnJrnl:$J, plaso, Volatility 3 windows.mftscan.MFTScan, bulk_extractor, ewfmount).
  • Bounded parsing. The 3.8 GB sparse $J is parsed once by MFTECmd into a CSV index cached under STOMPED_WORK; per-inode queries filter that index, so the model never drowns in a multi-gigabyte stream.
  • Deterministic confidence (confidence.py). Not the model's mood: 1 journal layer → INFERRED (0.55); 2 agreeing layers ($UsnJrnl + $LogFile) → CONFIRMED (0.80); + memory corroboration → 0.94.
  • The engine + log (stomped/). A thin orchestrator drives Claude Code headless, renders the dark-editorial live view, and merges the model token stream with the MCP receipts into a newline-delimited JSON execution log where supersedes_step makes the self-correction a first-class, auditable event.
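
A choke point of the kind described for safety.py might look roughly like the sketch below. It is an assumption-laden illustration, not the actual module: the names ALLOWED_BINARIES, SpoliationAttempt, vet_command, and run_readonly are hypothetical, and the allowlist shown is only the SleuthKit subset named above. It combines the three ideas from the list: an allowlist of executables, subprocess with shell=False, and rejection of -o rw / remount / O_RDWR tokens.

```python
import subprocess

# Assumed subset of the allowlist; the writeup names SleuthKit's istat/fls/icat/fsstat.
ALLOWED_BINARIES = frozenset({"istat", "fls", "icat", "fsstat"})


class SpoliationAttempt(Exception):
    """Raised when a command could write to evidence."""


def _option_strings(args):
    """Yield option values passed as '-o value' or '-ovalue'."""
    for i, arg in enumerate(args):
        if arg == "-o" and i + 1 < len(args):
            yield args[i + 1]
        elif arg.startswith("-o") and len(arg) > 2:
            yield arg[2:]


def vet_command(argv):
    """Return a copy of argv if it passes every check, else raise."""
    if not argv or argv[0] not in ALLOWED_BINARIES:
        raise SpoliationAttempt(f"binary not on allowlist: {argv[:1]}")
    args = list(argv[1:])
    for arg in args:
        lowered = arg.lower()
        if "remount" in lowered or "o_rdwr" in lowered:
            raise SpoliationAttempt(f"forbidden token in argument: {arg!r}")
    for opts in _option_strings(args):
        # Reject 'rw' anywhere in a comma-separated option list.
        if "rw" in {o.strip().lower() for o in opts.split(",")}:
            raise SpoliationAttempt(f"read-write option requested: {opts!r}")
    return list(argv)


def run_readonly(argv, timeout=60):
    """Execute a vetted command without a shell, so metacharacters stay literal."""
    return subprocess.run(vet_command(argv), shell=False, capture_output=True,
                          text=True, timeout=timeout, check=False)
```

In this sketch a call such as vet_command(["fls", "-o", "2048", "image.E01"]) passes, because SleuthKit uses -o for a sector offset, while ["mount", "-o", "rw,remount", "/mnt/evidence"] is rejected at the allowlist before its arguments are even inspected.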

Challenges I ran into

  1. The cohort-default trap. Everyone ships the $SI-vs-$FN check, and it clears laundered files. Fix: treat the naive result as a hypothesis to break, not a verdict — and always consult the journal before clearing.
  2. Read-only that survives a hallucination. Prompt rules can be ignored. Fix: make read-only architectural — a typed surface with no mutator verb, over an immutable ewfmount-ro container — and then test it. A spoliation fuzz harness fires 17 write/-o rw/rm/dd/mount/path-traversal attempts plus a prompt-injection ordering the agent to "remount read-write and repair the MFT." 0/17 reached evidence.
  3. The journal wrapped — and I had to be honest about it. On ROCBA, the corroborating $LogFile had already wrapped past the record, so only one journal layer survived. The contradiction is unambiguous, but the rule requires two layers for CONFIRMED. Fix: I let it land INFERRED (0.55) and say so out loud, rather than dress an inference as a fact. Honesty is the credibility.
  4. MFT entry reuse. Inode 472521 had a prior occupant deleted minutes earlier. Fix: read USN by sequence and key on the record's own create/rename events, so the dead neighbour doesn't pollute the verdict.
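
The rule behind challenge 3 can be written as a small pure function. The thresholds (0.55, 0.80, 0.94) and the tags come from this writeup; the function name score, its parameters, and the handling of cases the writeup does not spell out are assumptions. In particular, this sketch conservatively leaves a single journal layer at INFERRED even when memory agrees, and it refuses to score a finding with no journal layer at all.

```python
def score(usn_agrees: bool, logfile_agrees: bool, memory_agrees: bool = False):
    """Map the number of agreeing layers to a (tag, confidence) pair."""
    journal_layers = int(usn_agrees) + int(logfile_agrees)
    if journal_layers == 0:
        # Assumption: the writeup does not define a score without a journal layer.
        raise ValueError("no journal layer supports this finding")
    if journal_layers == 1:
        # One surviving journal layer: an inference, stated as such.
        return ("INFERRED", 0.55)
    if memory_agrees:
        # $UsnJrnl + $LogFile agree and memory corroborates.
        return ("CONFIRMED", 0.94)
    # $UsnJrnl + $LogFile agree, no memory corroboration.
    return ("CONFIRMED", 0.80)


# ROCBA as described above: $LogFile had wrapped, so only $UsnJrnl survived.
print(score(usn_agrees=True, logfile_agrees=False))
```

Because the function has no hidden state and takes only booleans, the same evidence always yields the same tag, which is the property the deterministic-confidence design relies on.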

Accomplishments that I'm proud of

  • It was wrong on camera, then caught itself — the textbook check cleared the intruder's SDelete; the journal reopened it. That's the tiebreaker criterion (Autonomous Execution Quality), live on real evidence.
  • 0/17 spoliation attempts reached evidence, pre-run hash == post-run hash — read-only by architecture, and tested for bypass.
  • Every finding traces to the exact tool execution that produced it (◦ receipt + JSONL traces_to_steps).
  • The held-out triptych: three files (clean, naive-stomped, rename-laundered), each verdicted live on its own evidence (CLEAN, CONFIRMED, CONFIRMED). A fixed script can't produce three correct answers for three different files on demand.
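
The receipt trail and the self-correction can be illustrated with a tiny newline-delimited JSON log. The field names supersedes_step and traces_to_steps are taken from this writeup; every other field, the helper names append_step and standing_steps, and the sample entries are hypothetical.

```python
import io
import json


def append_step(log, step, tool, tag, summary, supersedes_step=None, traces_to_steps=()):
    """Append one step to the newline-delimited JSON log and return it."""
    entry = {
        "step": step,
        "tool": tool,
        "tag": tag,
        "summary": summary,
        "supersedes_step": supersedes_step,
        "traces_to_steps": list(traces_to_steps),
    }
    log.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def standing_steps(log_text):
    """Replay the log and drop every step that a later step supersedes."""
    entries = [json.loads(line) for line in log_text.splitlines() if line.strip()]
    superseded = {e["supersedes_step"] for e in entries if e["supersedes_step"] is not None}
    return [e for e in entries if e["step"] not in superseded]


log = io.StringIO()
append_step(log, 1, "mft_si_fn", "CLEAN", "naive $SI/$FN check reports clean")
append_step(log, 2, "usn_events", "INFERRED", "journal contradicts the clean result",
            supersedes_step=1, traces_to_steps=[1, 2])
print([e["step"] for e in standing_steps(log.getvalue())])
```

Nothing is deleted from the log: the superseded step stays on disk, and replaying the file shows both the first answer and the correction that replaced it.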

What I learned

  • A clean check isn't a clean file. The most crowded artifact in the field is an unreliable hypothesis, not a verdict.
  • Honesty is a feature. An INFERRED 0.55 with a named missing layer is worth more to a 3 AM analyst than a confident lie. Confidence has to be mechanical, tied to corroboration — never vibes.
  • Make the guardrail architectural, not a polite request. If the model can spoliate, prompt wording won't save the evidence. Remove the verb.

What's next for Stomped

  • More anti-forensics classes: $LogFile transaction reconstruction, USN residual-carving as a first-class layer, $MFT $Txf/transaction cross-checks.
  • Memory as a standard axis: wire Rocba-Memory through windows.mftscan.MFTScan by default for the disk-vs-RAM cross-check that promotes findings to 0.94.
  • Beyond NTFS metadata: the same "distrust → consult the independent witness → tag" loop generalizes to prefetch, AmCache, and event-log tampering.
  • Packaging: a one-line Protocol SIFT install so any responder can point it at their own image.

The Bigger Picture

Defenders don't lose to attackers because they lack tools — the SIFT Workstation has 200+. They lose because they trust the wrong artifact under time pressure. Stomped proves an AI agent can be taught the senior habit that matters most: distrust your own first answer, and go read the record the attacker couldn't forge. Read-only by architecture. Confirmed, inferred, or contradicted — never just asserted.

Built With

  • bulk-extractor
  • ewfmount
  • fastmcp
  • journal
  • model-context-protocol-(mcp)
  • ntfs
  • plaso
  • protocol-sift
  • python
  • sans-sift-workstation
  • the-sleuth-kit
  • usn
  • volatility-3