find-evil!

Adversarial Multi-Agent DFIR Correlation Engine

Find Evil! Hackathon submission — SANS SIFT Workstation · Protocol SIFT · LangGraph Multi-Agent Framework track


What it does

find-evil closes the hallucination gap in autonomous DFIR by running every analyst finding through a second adversarial agent whose only job is to attack it.

Three core innovations, none of which exist in the Protocol SIFT baseline:

1. Adversarial Self-Verification Engine

Every claim the analyst produces must cite exact text from raw tool output. A second Claude instance (the Adversary) receives both the claims and the original tool output, and attacks each claim with specific, typed challenges:

  • hallucinated_citation — the cited text is not in the raw output → claim suppressed
  • unsupported_inference — evidence exists but doesn't support the conclusion → confidence penalty
  • false_positive_malfind — .NET JIT / packed legitimate binary → confidence penalty
  • alternative_explanation — benign explanation ignored → reinvestigation triggered

Only claims that survive adversarial review reach the final report. The hallucination rate is measured on every run and compared against the Protocol SIFT baseline (12%).
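The mapping from challenge type to outcome can be sketched roughly as follows. This is a minimal illustration, not the project's code: the Claim fields, the PENALTY value, and the apply_attack / check_citation helpers are hypothetical names, and the verbatim substring check is an assumed deterministic pre-check that would sit alongside the LLM Adversary rather than replace it.

```python
from dataclasses import dataclass, field
from enum import Enum


class AttackType(Enum):
    HALLUCINATED_CITATION = "hallucinated_citation"
    UNSUPPORTED_INFERENCE = "unsupported_inference"
    FALSE_POSITIVE_MALFIND = "false_positive_malfind"
    ALTERNATIVE_EXPLANATION = "alternative_explanation"


@dataclass
class Claim:
    statement: str
    cited_text: str              # exact text the analyst says appears in tool output
    confidence: float = 0.8
    suppressed: bool = False
    reinvestigate: bool = False
    attacks: list = field(default_factory=list)


PENALTY = 0.3  # assumed penalty size, not taken from the project


def apply_attack(claim: Claim, attack: AttackType) -> Claim:
    """Apply the outcome listed above for one typed challenge."""
    claim.attacks.append(attack)
    if attack is AttackType.HALLUCINATED_CITATION:
        claim.suppressed = True
    elif attack in (AttackType.UNSUPPORTED_INFERENCE, AttackType.FALSE_POSITIVE_MALFIND):
        claim.confidence = max(0.0, claim.confidence - PENALTY)
    elif attack is AttackType.ALTERNATIVE_EXPLANATION:
        claim.reinvestigate = True
    return claim


def check_citation(claim: Claim, raw_output: str) -> Claim:
    """Suppress the claim unless its cited text occurs verbatim in the raw output."""
    if claim.cited_text not in raw_output:
        apply_attack(claim, AttackType.HALLUCINATED_CITATION)
    return claim
```

Keeping the citation check as plain string containment means a fabricated quote is caught without any model call, whatever the Adversary decides about the remaining challenge types.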

2. Algorithmic Timestomp Detection

Pure code, zero LLM. Compares $STANDARD_INFORMATION vs $FILE_NAME timestamps from MFT records. Detects:

  • $SI.created < $FN.created — not expected in normal operation and a strong timestomping indicator, though archive extraction and some installers can also backdate $SI
  • $SI.modified < $FN.created — file modified before it existed
  • All timestamps with microsecond = 0 — characteristic of timestomping tools
  • Suspiciously round timestamps — attackers often use exact values

Zero hallucination possible. Every detection is a deterministic computation.
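A minimal sketch of the first three checks, assuming the MFT record has already been parsed into Python datetime values. MftTimes and timestomp_flags are hypothetical names, and applying the zero-microsecond test to the $SI timestamps only is an assumption; the project's TimestompDetector may differ.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MftTimes:
    si_created: datetime   # $STANDARD_INFORMATION creation time
    si_modified: datetime  # $STANDARD_INFORMATION modification time
    fn_created: datetime   # $FILE_NAME creation time


def timestomp_flags(t: MftTimes) -> list[str]:
    """Return the names of the deterministic checks that this record trips."""
    flags = []
    if t.si_created < t.fn_created:
        flags.append("si_created_before_fn_created")
    if t.si_modified < t.fn_created:
        flags.append("si_modified_before_fn_created")
    # Assumption: only the $SI timestamps are tested for zero sub-second precision.
    if t.si_created.microsecond == 0 and t.si_modified.microsecond == 0:
        flags.append("zero_subsecond_precision")
    return flags
```

Because each flag is a plain comparison, the same record always yields the same flags, which is what makes the detector auditable.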

3. Live Evidence Graph

Every artifact is a NetworkX node. Every relationship is a typed edge. Agents run graph queries instead of reading JSON blobs:

  • "Find processes with no disk executable AND active network connection" — one graph traversal
  • "Find orphaned network connections with no process" — rootkit indicator
  • "Find suspicious parent-child chains" — Word spawning PowerShell, etc.
  • PageRank on suspicious nodes surfaces the highest-priority pivot points
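The first two queries might look like this in NetworkX. This is a sketch only: the node IDs, the kind attribute, and the backed_by / owns_connection edge types are assumed for illustration and are not the project's schema. PageRank is left out to keep the example short.

```python
import networkx as nx

g = nx.DiGraph()
g.add_node("proc:1484", kind="process", name="svchost.exe")
g.add_node("proc:2204", kind="process", name="powershell.exe")
g.add_node("file:ps", kind="file", path=r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe")
g.add_node("conn:1", kind="connection", remote="203.0.113.7:443")
g.add_node("conn:2", kind="connection", remote="198.51.100.9:8080")
g.add_edge("proc:1484", "conn:1", type="owns_connection")
g.add_edge("proc:2204", "file:ps", type="backed_by")


def neighbors_by_edge(graph, node, edge_type):
    """Targets of outgoing edges of the given type."""
    return [v for _, v, d in graph.out_edges(node, data=True) if d.get("type") == edge_type]


def fileless_networked_processes(graph):
    """Processes with no disk executable AND at least one network connection."""
    return [
        n for n, d in graph.nodes(data=True)
        if d.get("kind") == "process"
        and not neighbors_by_edge(graph, n, "backed_by")
        and neighbors_by_edge(graph, n, "owns_connection")
    ]


def orphaned_connections(graph):
    """Connections that no process owns."""
    return [
        n for n, d in graph.nodes(data=True)
        if d.get("kind") == "connection"
        and not any(e.get("type") == "owns_connection"
                    for _, _, e in graph.in_edges(n, data=True))
    ]
```

In this toy graph, fileless_networked_processes(g) returns the svchost.exe node and orphaned_connections(g) returns conn:2.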

Pipeline

Case data ──► [Ingest: all SIFT tools] ──► [Evidence Graph + Timeline Resolver]
                                                        │
                                              [Analyst Agent: cited claims]
                                                        │
                                           [Adversary Agent: attacks claims]
                                                        │
                                          ┌─── disputed? ───┐
                                          │                  │
                                    [Targeted         [Promote: filter
                                    Reinvestigation]   by confidence]
                                          │                  │
                                          └────► analyst ◄───┘
                                                        │
                                           [Correlate: graph queries]
                                                        │
                                              [Finalize: report + metrics]

Architecture

Pattern: Multi-Agent Framework (LangGraph StateGraph)

Security boundaries:

  • _assert_read_only() called before every SIFT tool → architectural enforcement
  • ALLOWED_BINARIES set → no arbitrary shell execution
  • All tools run via subprocess.run(capture_output=True) with an argument list and no shell → no shell injection
  • LLM has no run_shell() function → prompt injection cannot cause spoliation
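The allow-list and no-shell boundaries could be combined in a wrapper along these lines. This is a sketch under stated assumptions: run_tool is a hypothetical name, the contents of ALLOWED_BINARIES shown here are illustrative, and the project's _assert_read_only() check is not reproduced.

```python
import os
import subprocess

# Hypothetical allow-list; the project's actual ALLOWED_BINARIES contents are not shown.
ALLOWED_BINARIES = {"vol", "fls", "mmls", "icat"}


def run_tool(argv, allowed=ALLOWED_BINARIES, timeout=300):
    """Run an allow-listed binary with an argument list and no shell."""
    if not argv or os.path.basename(argv[0]) not in allowed:
        raise PermissionError(f"binary not allow-listed: {argv[:1]}")
    # A list argv with the default shell=False means metacharacters such as
    # ';' or '|' inside an argument are passed literally, never interpreted.
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout, check=False)
```

The protection comes from passing a list and never setting shell=True; capture_output only controls where stdout and stderr go.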

Context window management:

  • Smart truncation: first 60 + last 20 lines per tool (tail often has summary data)
  • Agents exchange typed Claim objects with EvidenceCitation — not raw text dumps
  • Evidence graph queries return typed node lists — not JSON blobs
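The head-plus-tail truncation can be sketched as below. smart_truncate and the omission marker format are hypothetical; only the 60 and 20 line counts come from the description above.

```python
def smart_truncate(output: str, head: int = 60, tail: int = 20) -> str:
    """Keep the first `head` and last `tail` lines, marking what was dropped."""
    lines = output.splitlines()
    if len(lines) <= head + tail:
        return output  # short enough to pass through unchanged
    omitted = len(lines) - head - tail
    marker = f"[... {omitted} lines omitted ...]"
    return "\n".join(lines[:head] + [marker] + lines[-tail:])
```

The explicit marker tells the agent that the middle was removed, so it does not mistake a truncated listing for a complete one.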

Self-correction:

  • Adversary flags claims with reinvestigate=True and specifies which tools to re-run
  • Targeted reinvestigation re-runs only the disputed tools, not the full pipeline
  • Hard max_iterations cap prevents runaway loops
  • Full execution trace logged to JSONL for every iteration
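The bounded reinvestigation loop can be illustrated in plain Python. This is a stand-in for the control flow only, not the LangGraph StateGraph itself; run_pipeline and its callback signatures are assumptions, and the trace entries are kept in memory where the project writes them to JSONL.

```python
def run_pipeline(analyze, attack, reinvestigate, max_iterations=2):
    """
    analyze(extra_output) -> list of claims
    attack(claims)        -> list of (claim, tools_to_rerun) disputes
    reinvestigate(tools)  -> fresh raw output from just those tools
    """
    trace = []
    claims = analyze(None)
    for iteration in range(1, max_iterations + 1):
        disputes = attack(claims)
        trace.append({"iteration": iteration, "disputed": len(disputes)})
        # Stop when nothing is disputed, or when the hard cap is reached.
        if not disputes or iteration == max_iterations:
            break
        # Re-run only the tools named in the disputes, not the full pipeline.
        tools = sorted({tool for _, wanted in disputes for tool in wanted})
        claims = analyze(reinvestigate(tools))
    return claims, trace
```

Even with an Adversary that disputes something on every pass, the loop ends after max_iterations rounds of attack.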

Try it out

Prerequisites

  • SIFT Workstation — download from sans.org/tools/sift-workstation
  • Protocol SIFT: curl -fsSL https://raw.githubusercontent.com/teamdfir/protocol-sift/main/install.sh | bash
  • Python 3.11+
  • ANTHROPIC_API_KEY set in environment

Install

git clone https://github.com/<handle>/find-evil
cd find-evil
pip install -r requirements.txt

Option 1 — CLI (fastest path for judges)

# Disk + memory
python cli.py \
  --case CASE001 \
  --disk   /mnt/evidence/disk.E01 \
  --memory /mnt/evidence/mem.vmem \
  --max-iter 2

# Memory only (works with Volatility public samples)
python cli.py --case DEMO_001 --memory /path/to/cridex.vmem

# Verbose debug output
python cli.py --case CASE001 --disk /mnt/evidence/disk.E01 --verbose

Option 2 — API + React dashboard

# Terminal 1 — API server
python -m server.api

# Terminal 2 — React UI  
cd ui && npm install && npm run dev
# Open http://localhost:5173

Mount evidence read-only (recommended)

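sudo mkdir -p /mnt/ewf /mnt/evidence
# An E01 file is an EWF container, not a raw filesystem image, so a direct loop mount will fail.
# Expose it as a raw image first (ewfmount ships with libewf), then mount that image read-only.
sudo ewfmount /path/to/disk.E01 /mnt/ewf
# For a partitioned image, add offset=<partition start in bytes> (find the start sector with mmls).
sudo mount -o ro,loop /mnt/ewf/ewf1 /mnt/evidence
python cli.py --case CASE001 --disk /mnt/evidence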

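Using a Volatility Foundation public sample (no SIFT needed for memory-only)

# The Cridex public memory sample is linked from the Volatility wiki page below.
# That URL is an HTML index, not the image itself, so download the sample it links to
# and save the extracted image as ./cridex.vmem.
#   https://github.com/volatilityfoundation/volatility/wiki/Memory-Samples
python cli.py --case CRIDEX_001 --memory ./cridex.vmem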

Output files

output/<case_id>/
  triage_report.json     ← full structured findings with all claims
  graph.json             ← Cytoscape.js evidence graph

logs/<case_id>/
  execution.jsonl        ← full agent trace (one JSON event per line)
  summary.json           ← human-readable run summary

benchmark/
  <case_id>_history.jsonl ← hallucination rate over iterations

Reading execution logs

# All adversarial results with hallucination rates
jq 'select(.event_type == "adversarial_result")' logs/CASE001/execution.jsonl

# All tool calls with timing
jq 'select(.event_type == "tool_call") | {tool, duration_ms, error}' logs/CASE001/execution.jsonl

# Claims that triggered reinvestigation
jq 'select(.event_type == "node_start" and .node == "reinvestigate")' logs/CASE001/execution.jsonl

# Total token usage
jq -s '[.[].tokens // 0] | add' logs/CASE001/execution.jsonl

# Timestomping anomalies
jq '.timestamp_anomalies[]' output/CASE001/triage_report.json
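
Where jq is not available, the same log can be summarized with a few lines of Python. This is a sketch: summarize_log is a hypothetical helper, and it relies only on the event_type and tokens fields used in the jq queries above.

```python
import json
from collections import Counter


def summarize_log(lines):
    """Count events by type and total the token usage from JSONL lines."""
    events = Counter()
    tokens = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        events[event.get("event_type", "unknown")] += 1
        tokens += event.get("tokens") or 0  # missing or null counts as 0
    return {"events": dict(events), "total_tokens": tokens}


# Usage (path as in the jq examples):
# with open("logs/CASE001/execution.jsonl", encoding="utf-8") as fh:
#     print(summarize_log(fh))
```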

Project structure

find-evil/
├── core/
│   ├── schema.py          ← Claim, EvidenceCitation, AdversarialAttack data model
│   ├── adversarial.py     ← Analyst + Adversary agents, reinvestigation logic
│   ├── timeline.py        ← MFT parser, TimestompDetector, TimelineResolver
│   └── graph.py           ← NetworkX evidence graph, typed queries, PageRank
├── agents/
│   └── orchestrator.py    ← LangGraph StateGraph pipeline
├── tools/
│   └── sift_tools.py      ← SIFT CLI wrappers (read-only enforced)
├── logs/
│   └── execution_logger.py ← JSONL audit trail + SSE streaming
├── benchmark/
│   └── harness.py         ← Hallucination rate measurement, ground truth scoring
├── server/
│   └── api.py             ← FastAPI + SSE streaming
├── ui/src/
│   └── App.jsx            ← React dashboard (graph viz, claim explorer, benchmark)
├── data/ground_truth/
│   └── demo_001.json      ← Cridex ground truth for benchmark scoring
├── docs/
│   ├── accuracy_report.md
│   └── dataset.md
├── cli.py                 ← Rich terminal CLI
└── requirements.txt
