Find Evil! — Verifiable Autonomous IR Agent

Inspiration

AI-powered attackers move in minutes. The obvious fix — point an AI agent at a forensic workstation — breaks two ways that make it unusable for real casework: the agent can destroy the evidence it's analyzing (an agent with a shell can rm, dd, or overwrite a disk image), and it hallucinates findings you can't verify ("PsExec ran at 14:22" — no artifact, no proof). In a real incident, an unverifiable finding is worse than no finding: it wastes the one thing you don't have at 3 AM — analyst time. We built Find Evil! to prove both problems are solvable architecturally, not with a longer prompt.

What it does

Find Evil! is an autonomous incident-response agent for the SANS SIFT Workstation. Point it at a mounted disk image (and optionally a memory dump) and it runs a six-phase analysis the way a senior analyst does — triage, timeline, memory, artifacts, correlation, report — sequencing tools, recognizing anomalies, and self-correcting.

It runs in two modes: a deterministic, reproducible pipeline (court-defensible — the same case always runs the same way), and an autonomous mode where a Claude model drives the investigation: choosing the next tool from what it has found, narrating its reasoning, forming and testing hypotheses. The crucial property: the architectural guarantees hold in both modes.

Two things are made architecturally impossible:

Evidence tampering — the agent has no shell. rm, dd, curl, ssh do not exist in its tool surface; destructive/exfil commands are blocked in code before any process spawns.
Hallucinated findings — every CONFIRMED finding must carry a tool call_id present in the audit log, or the report refuses to generate it. 0% hallucination in the CONFIRMED tier, by construction.

Any finding can be verified in under 10 seconds: grep its call_id against the audit log and get the exact tool, the exact arguments, and the SHA256 of the output that produced it. Full chain of custody.

How we built it

Language: Python 3.11
Tool layer: a custom MCP server (FastMCP) exposing 10 typed forensic tools wrapping SANS SIFT tooling — log2timeline/plaso, Volatility 3 (pslist/malfind/netscan), RegRipper, YARA, plus timestomping detection ($STANDARD_INFORMATION vs $FILE_NAME).
Guardrails: a single _safe_run() chokepoint runs every command with shell=False; Pydantic validates every path; a BLOCKED_COMMANDS/PROTECTED_WRITE_PATHS list rejects destructive/exfil actions and writes to evidence paths — all before a subprocess spawns.
Agent: a deterministic 6-phase orchestrator with bounded self-correction, plus an autonomous Claude-driven mode (manual tool-use loop) that keeps the same audited tool surface.
Integrity: an append-only JSONL audit log; every tool call (executed or blocked) gets a UUID and a SHA256 of its output. The report generator rejects any untraceable CONFIRMED finding.
Quality bar: 56 automated tests + a reproducible accuracy benchmark that runs with no evidence required.

Challenges we ran into

Injection through tool arguments, not just the command line — e.g. parsers="mft; rm /cases/x". Solved with argument-level screening plus shell=False, so even an unscreened payload is an inert string.
Keeping integrity under an autonomous LLM. When a model drives tool selection, what stops it confirming a finding it can't prove? Our answer: the report generator verifies every CONFIRMED call_id against the audit log — we have a test proving an LLM-invented call_id is rejected. Full autonomy, zero loss of integrity.
Defining "hallucination" rigorously enough to enforce in code: traceability to a logged call_id, checked at report-generation time, not at prompt time.
Resisting the temptation to fake demo numbers. We built a synthetic, fully-known dataset instead, so the metrics are real and reproducible.

Accomplishments that we're proud of

12 documented bypass attempts (injection, command-chaining, exfiltration, prompt injection) — all blocked, each turned into a passing regression test.
A 0%-hallucination guarantee that's mechanically enforced, not promised — and that holds even when an autonomous LLM is in control.
A report where any finding is verifiable in under 10 seconds via its call_id.
56 passing tests, 10 typed forensic tools, and a one-command install.

What we learned

Architectural constraints beat prompt-based ones. "The agent can't run rm" is only true if the capability doesn't exist — telling a model not to is not a control.
Autonomy and safety aren't a trade-off if the safety is structural: we gave a reasoning LLM full control of the investigation and it still can't tamper or fabricate.
For forensics, traceability is the product. A finding you can't grep back to a tool call is, for an analyst, indistinguishable from a guess.

What's next for Find Evil!

macOS/Linux artifact phases (currently Windows-focused).
Encrypted-volume handling when keys are available.
A web UI over the existing JSON report + audit log. ```

After pasting, change the last header from "What's next for Untitled" to "What's next for Find Evil!" (Devpost auto-fills your project name there).

Built With

anthropic-claude
fastmcp
log2timeline
mcp
plaso
pydantic
pytest
python
regripper
sans-sift
volatility3
yara

Updates

Manoj Mallick started this project — Jun 15, 2026 06:46 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.