find-evil!
Adversarial Multi-Agent DFIR Correlation Engine
Find Evil! Hackathon submission — SANS SIFT Workstation · Protocol SIFT · LangGraph Multi-Agent Framework track
What it does
find-evil closes the hallucination gap in autonomous DFIR by running every analyst finding through a second adversarial agent whose only job is to attack it.
Three core innovations, none of which exists in the Protocol SIFT baseline:
1. Adversarial Self-Verification Engine
Every claim the analyst produces must cite exact text from raw tool output. A second Claude instance (the Adversary) receives both the claims and the original tool output, and attacks each claim with specific, typed challenges:
- hallucinated_citation: the cited text is not in the raw output → claim suppressed
- unsupported_inference: evidence exists but doesn't support the conclusion → confidence penalty
- false_positive_malfind: .NET JIT / packed legitimate binary → confidence penalty
- alternative_explanation: benign explanation ignored → reinvestigation triggered
Only claims that survive adversarial review reach the final report. The hallucination rate is measured and compared against the Protocol SIFT baseline (12%).
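The citation rule above lends itself to a mechanical pre-check before any LLM review. The following is a minimal sketch, not the project's implementation: the `Claim` and `EvidenceCitation` classes here are simplified stand-ins, and the field names, the `normalize` helper, and the sample tool output are all assumptions made for illustration.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-ins for the project's Claim / EvidenceCitation types.
@dataclass
class EvidenceCitation:
    tool: str         # name of the tool whose output is cited
    quoted_text: str  # text the analyst says appears in that output


@dataclass
class Claim:
    statement: str
    citation: EvidenceCitation
    confidence: float = 0.8


def normalize(text: str) -> str:
    """Collapse whitespace so line wrapping alone cannot break a match."""
    return " ".join(text.split())


def citation_is_grounded(claim: Claim, raw_outputs: dict[str, str]) -> bool:
    """True only if the quoted text occurs in the cited tool's raw output."""
    raw = raw_outputs.get(claim.citation.tool, "")
    quote = normalize(claim.citation.quoted_text)
    return bool(quote) and quote in normalize(raw)


def filter_claims(claims: list[Claim], raw_outputs: dict[str, str]):
    """Split claims into (kept, suppressed) using the citation check."""
    kept, suppressed = [], []
    for claim in claims:
        target = kept if citation_is_grounded(claim, raw_outputs) else suppressed
        target.append(claim)
    return kept, suppressed


# Made-up tool output and claims, for illustration only.
raw_outputs = {"pslist": "PID 1484 explorer.exe\nPID 1640 reader_sl.exe"}
claims = [
    Claim("reader_sl.exe was running",
          EvidenceCitation("pslist", "PID 1640 reader_sl.exe")),
    Claim("mimikatz.exe was running",
          EvidenceCitation("pslist", "PID 999 mimikatz.exe")),
]
kept, suppressed = filter_claims(claims, raw_outputs)
print([c.statement for c in kept])
print([c.statement for c in suppressed])
```

A substring check like this can only catch the hallucinated_citation case. The other three challenge types require judgement about what the evidence supports, which is why they are left to the Adversary agent.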
2. Algorithmic Timestomp Detection
Pure code, zero LLM. Compares $STANDARD_INFORMATION vs $FILE_NAME timestamps from MFT records. Detects:
- $SI.created < $FN.created: physically impossible without timestamp manipulation
- $SI.modified < $FN.created: file modified before it existed
- All timestamps with microsecond = 0: characteristic of timestomping tools
- Suspiciously round timestamps: attackers often use exact values
Zero hallucination possible. Every detection is a deterministic computation.
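The first three checks can be sketched as plain comparisons. This is an illustrative sketch only: the `MftTimes` container, its field names, and the flag strings are assumptions, not the project's `TimestompDetector`, and it assumes the MFT timestamps have already been parsed into `datetime` values. The "round timestamp" heuristic is left out because the source does not define it.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MftTimes:
    """Hypothetical container for one MFT record's parsed timestamps."""
    si_created: datetime   # $STANDARD_INFORMATION creation time
    si_modified: datetime  # $STANDARD_INFORMATION modification time
    fn_created: datetime   # $FILE_NAME creation time


def timestomp_flags(t: MftTimes) -> list[str]:
    """Return the names of every deterministic check the record trips."""
    flags = []
    if t.si_created < t.fn_created:
        flags.append("si_created_before_fn_created")
    if t.si_modified < t.fn_created:
        flags.append("si_modified_before_fn_created")
    # Assumption: "all timestamps" means every timestamp held for the record.
    if all(ts.microsecond == 0
           for ts in (t.si_created, t.si_modified, t.fn_created)):
        flags.append("zero_subsecond_precision")
    return flags


# Made-up records for illustration.
stomped = MftTimes(
    si_created=datetime(2010, 1, 1, 0, 0, 0),
    si_modified=datetime(2010, 1, 1, 0, 0, 0),
    fn_created=datetime(2023, 5, 4, 10, 11, 12, 345678),
)
clean = MftTimes(
    si_created=datetime(2023, 5, 4, 10, 11, 12, 345678),
    si_modified=datetime(2023, 5, 6, 9, 0, 1, 120000),
    fn_created=datetime(2023, 5, 4, 10, 11, 12, 345678),
)
print(timestomp_flags(stomped))
print(timestomp_flags(clean))
```

Because each flag is a direct comparison of parsed values, the same input always yields the same flags, which is the property the section relies on.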
3. Live Evidence Graph
Every artifact is a NetworkX node. Every relationship is a typed edge. Agents run graph queries instead of reading JSON blobs:
- "Find processes with no disk executable AND active network connection" — one graph traversal
- "Find orphaned network connections with no process" — rootkit indicator
- "Find suspicious parent-child chains" — Word spawning PowerShell, etc.
- PageRank on suspicious nodes surfaces the highest-priority pivot points
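The first two queries in the list above can be sketched with NetworkX as follows. The node identifiers, the attribute names (`kind`, `on_disk`, `rel`), and the sample data are assumptions made for illustration; the project's actual graph schema is not shown here. PageRank is omitted from the sketch to keep it small.

```python
import networkx as nx

G = nx.DiGraph()
# Illustrative nodes; attribute names are assumed, not the project's schema.
G.add_node("proc:1484", kind="process", name="explorer.exe", on_disk=True)
G.add_node("proc:1640", kind="process", name="reader_sl.exe", on_disk=False)
G.add_node("conn:203.0.113.7:8080", kind="connection")
G.add_node("conn:203.0.113.9:445", kind="connection")
# Typed edges: the "rel" attribute names the relationship.
G.add_edge("proc:1484", "proc:1640", rel="spawned")
G.add_edge("proc:1640", "conn:203.0.113.7:8080", rel="opened")


def fileless_with_network(g: nx.DiGraph) -> list[str]:
    """Processes with no disk executable that opened at least one connection."""
    hits = []
    for node, attrs in g.nodes(data=True):
        if attrs.get("kind") != "process" or attrs.get("on_disk", True):
            continue
        if any(d.get("rel") == "opened"
               for _, _, d in g.out_edges(node, data=True)):
            hits.append(node)
    return hits


def orphaned_connections(g: nx.DiGraph) -> list[str]:
    """Connections that no process node claims to have opened."""
    return [
        node for node, attrs in g.nodes(data=True)
        if attrs.get("kind") == "connection"
        and not any(d.get("rel") == "opened"
                    for _, _, d in g.in_edges(node, data=True))
    ]


print(fileless_with_network(G))
print(orphaned_connections(G))
```

Returning node identifiers rather than serialized records keeps the result small enough to hand to an agent, which can then ask for the attributes of only the nodes it cares about.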
Case data ──► [Ingest: all SIFT tools] ──► [Evidence Graph + Timeline Resolver]
                                                            │
                                              [Analyst Agent: cited claims]
                                                            │
                                            [Adversary Agent: attacks claims]
                                                            │
                                                   ┌─── disputed? ───┐
                                                   │                 │
                                               [Targeted     [Promote: filter
                                           Reinvestigation]   by confidence]
                                                   │                 │
                                                   └────► analyst ◄───┘
                                                            │
                                               [Correlate: graph queries]
                                                            │
                                              [Finalize: report + metrics]
Architecture
Pattern: Multi-Agent Framework (LangGraph StateGraph)
Security boundaries:
- _assert_read_only() called before every SIFT tool → architectural enforcement
- ALLOWED_BINARIES set → no arbitrary shell execution
- All tools run via subprocess.run(capture_output=True) with an argument list and no shell → no shell injection
- LLM has no run_shell() function → prompt injection cannot cause spoliation
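A wrapper along these lines would enforce the first three boundaries. It is a sketch under stated assumptions: the names `ALLOWED_BINARIES` and `_assert_read_only` come from the list above, but the allowlist contents, the writability check, and the `run_tool` signature are hypothetical and may differ from the project's code.

```python
import os
import subprocess

# Illustrative allowlist; the project's real ALLOWED_BINARIES set is not shown.
ALLOWED_BINARIES = {"fls", "icat", "mmls", "vol"}


def _assert_read_only(evidence_path: str) -> None:
    """Assumed check: refuse to run if the evidence path is writable."""
    if os.access(evidence_path, os.W_OK):
        raise PermissionError(f"evidence is writable: {evidence_path}")


def run_tool(argv: list[str], evidence_path: str, timeout: int = 300) -> str:
    """Run one allowlisted tool against read-only evidence, return its stdout."""
    binary = os.path.basename(argv[0])
    if binary not in ALLOWED_BINARIES:
        raise PermissionError(f"binary not allowlisted: {binary}")
    _assert_read_only(evidence_path)
    # argv is a list and shell=False (the default), so no shell ever parses
    # the arguments; metacharacters such as ';' stay literal strings.
    result = subprocess.run(argv, capture_output=True, text=True,
                            timeout=timeout)
    return result.stdout


# A disallowed binary is rejected before anything is executed.
try:
    run_tool(["rm", "-rf", "/mnt/evidence"], "/mnt/evidence")
except PermissionError as exc:
    print(f"blocked: {exc}")
```

Note that it is the list-form `argv` without `shell=True` that keeps a shell out of the picture. `capture_output=True` only collects stdout and stderr.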
Context window management:
- Smart truncation: first 60 + last 20 lines per tool (tail often has summary data)
- Agents exchange typed Claim objects with EvidenceCitation, not raw text dumps
- Evidence graph queries return typed node lists, not JSON blobs
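The head-and-tail truncation rule can be sketched in a few lines. The function name, the default parameters, and the wording of the omission marker are assumptions; only the 60/20 split comes from the list above.

```python
def smart_truncate(output: str, head: int = 60, tail: int = 20) -> str:
    """Keep the first `head` and last `tail` lines, marking what was dropped."""
    lines = output.splitlines()
    if len(lines) <= head + tail:
        return output  # short enough to pass through unchanged
    omitted = len(lines) - head - tail
    marker = f"[... {omitted} lines omitted ...]"
    return "\n".join(lines[:head] + [marker] + lines[-tail:])


# 200 made-up lines shrink to 60 + 1 marker + 20 = 81 lines.
sample = "\n".join(f"line {i}" for i in range(200))
truncated = smart_truncate(sample).splitlines()
print(len(truncated))
print(truncated[60])
```

Keeping an explicit marker line tells the agent that output was dropped and how much, so it does not mistake a truncated listing for a complete one.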
Self-correction:
- Adversary flags claims with reinvestigate=True and specifies which tools to re-run
- Targeted reinvestigation re-runs only the disputed tools, not the full pipeline
- Hard max_iterations cap prevents runaway loops
- Full execution trace logged to JSONL for every iteration
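The loop guard described above reduces to a small routing decision. The sketch below is plain Python and does not import LangGraph; the `PipelineState` fields and the function name are hypothetical, and the returned node names follow the diagram and log examples in this README. In a LangGraph StateGraph, a function of this shape is the kind of callable that would typically be passed to `add_conditional_edges`.

```python
from typing import TypedDict


class PipelineState(TypedDict):
    """Hypothetical subset of the pipeline state relevant to routing."""
    iteration: int
    max_iterations: int
    disputed_tools: list[str]  # tools the Adversary asked to re-run


def route_after_adversary(state: PipelineState) -> str:
    """Pick the next node: reinvestigate only while under the hard cap."""
    if state["disputed_tools"] and state["iteration"] < state["max_iterations"]:
        return "reinvestigate"
    return "promote"


# Simulate a case where the Adversary disputes the same tool every round.
state: PipelineState = {"iteration": 0, "max_iterations": 2,
                        "disputed_tools": ["malfind"]}
path = []
while True:
    step = route_after_adversary(state)
    path.append(step)
    if step == "promote":
        break
    state["iteration"] += 1  # one reinvestigation round consumed
print(path)
```

Even when a dispute never resolves, the cap forces the run on to promotion after `max_iterations` rounds.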
Try it out
Prerequisites
- SIFT Workstation — download from sans.org/tools/sift-workstation
- Protocol SIFT: curl -fsSL https://raw.githubusercontent.com/teamdfir/protocol-sift/main/install.sh | bash
- Python 3.11+
- ANTHROPIC_API_KEY set in environment
Install
git clone https://github.com/<handle>/find-evil
cd find-evil
pip install -r requirements.txt
Option 1 — CLI (fastest path for judges)
# Disk + memory
python cli.py \
--case CASE001 \
--disk /mnt/evidence/disk.E01 \
--memory /mnt/evidence/mem.vmem \
--max-iter 2
# Memory only (works with Volatility public samples)
python cli.py --case DEMO_001 --memory /path/to/cridex.vmem
# Verbose debug output
python cli.py --case CASE001 --disk /mnt/evidence/disk.E01 --verbose
Option 2 — API + React dashboard
# Terminal 1 — API server
python -m server.api
# Terminal 2 — React UI
cd ui && npm install && npm run dev
# Open http://localhost:5173
Mount evidence read-only (recommended)
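sudo mkdir -p /mnt/ewf /mnt/evidence
# An E01 file is an EWF container, so expose it as a raw image first (ewfmount is part of libewf)
sudo ewfmount /path/to/disk.E01 /mnt/ewf
# Add offset=<bytes> to the mount options if the image contains a partition table
sudo mount -o ro,loop /mnt/ewf/ewf1 /mnt/evidence
python cli.py --case CASE001 --disk /mnt/evidence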
Using Volatility Foundation public sample (no SIFT needed for memory-only)
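# Download the Cridex public memory sample from the link on the Memory Samples wiki page
# and save it as ./cridex.vmem (fetching the wiki URL itself with wget only saves the HTML page):
# https://github.com/volatilityfoundation/volatility/wiki/Memory-Samples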
python cli.py --case CRIDEX_001 --memory ./cridex.vmem
Output files
output/<case_id>/
triage_report.json ← full structured findings with all claims
graph.json ← Cytoscape.js evidence graph
logs/<case_id>/
execution.jsonl ← full agent trace (one JSON event per line)
summary.json ← human-readable run summary
benchmark/
<case_id>_history.jsonl ← hallucination rate over iterations
Reading execution logs
# All adversarial results with hallucination rates
jq 'select(.event_type == "adversarial_result")' logs/CASE001/execution.jsonl
# All tool calls with timing
jq 'select(.event_type == "tool_call") | {tool, duration_ms, error}' logs/CASE001/execution.jsonl
# Claims that triggered reinvestigation
jq 'select(.event_type == "node_start" and .node == "reinvestigate")' logs/CASE001/execution.jsonl
# Total token usage
jq -s '[.[].tokens // 0] | add' logs/CASE001/execution.jsonl
# Timestomping anomalies
jq '.timestamp_anomalies[]' output/CASE001/triage_report.json
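Where jq is not available, the same trace can be summarized with the Python standard library. This sketch assumes only what the jq queries above imply, namely that each line is one JSON object with an `event_type` field and an optional `tokens` field. The function name and the sample events are made up for illustration.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_trace(lines: Iterable[str]) -> tuple[Counter, int]:
    """Count events per event_type and total the optional tokens field."""
    events: Counter = Counter()
    tokens = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        event = json.loads(line)
        events[event.get("event_type", "unknown")] += 1
        tokens += event.get("tokens") or 0
    return events, tokens


# Made-up events in the assumed shape of execution.jsonl.
sample = [
    '{"event_type": "tool_call", "tool": "pslist", "tokens": 0}',
    '{"event_type": "adversarial_result", "tokens": 1200}',
    '{"event_type": "tool_call", "tool": "malfind"}',
]
events, total_tokens = summarize_trace(sample)
print(dict(events), total_tokens)
```

To run it against a real trace, pass an open file handle instead of the sample list, for example `summarize_trace(open("logs/CASE001/execution.jsonl"))`.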
Project structure
find-evil/
├── core/
│ ├── schema.py ← Claim, EvidenceCitation, AdversarialAttack data model
│ ├── adversarial.py ← Analyst + Adversary agents, reinvestigation logic
│ ├── timeline.py ← MFT parser, TimestompDetector, TimelineResolver
│ └── graph.py ← NetworkX evidence graph, typed queries, PageRank
├── agents/
│ └── orchestrator.py ← LangGraph StateGraph pipeline
├── tools/
│ └── sift_tools.py ← SIFT CLI wrappers (read-only enforced)
├── logs/
│ └── execution_logger.py ← JSONL audit trail + SSE streaming
├── benchmark/
│ └── harness.py ← Hallucination rate measurement, ground truth scoring
├── server/
│ └── api.py ← FastAPI + SSE streaming
├── ui/src/
│ └── App.jsx ← React dashboard (graph viz, claim explorer, benchmark)
├── data/ground_truth/
│ └── demo_001.json ← Cridex ground truth for benchmark scoring
├── docs/
│ ├── accuracy_report.md
│ └── dataset.md
├── cli.py ← Rich terminal CLI
└── requirements.txt
Every submission lives on as a community tool.
Built With
- claude
- javascript
- python
