Inspiration
AI-powered adversaries can own a domain in under eight minutes. Protocol SIFT gives defenders speed on the SANS SIFT Workstation — but it still hallucinates. GRAVEYARD exists to close that gap: hunt ghost artifacts in memory (processes the OS no longer lists but Volatility still sees) and block every report until findings are provably grounded in tool output. We built autonomous self-correction without a human approval portal — a machine verifier that won't ship unproven claims.
What it does
GRAVEYARD is an autonomous verification layer for Protocol SIFT on Windows memory forensics.
- graveyard_engine.py — Detects ghost processes (psscan vs pslist), orphan sockets (netscan PIDs absent from pslist), timeline parity contradictions, and severity-scored artifacts.
- verify_findings.py — Architectural gate: REJECTs attribution in observations, phantom artifacts, and citation mismatches. Reports generate only on exit code 0.
- scripts/agent_loop.sh — Deterministic self-correction: draft v1 REJECT → auto-correct from engine facts (no LLM) → draft v2 PASS.
- scripts/benchmark_accuracy.py — Measured F1, false positive rate, and hallucination catch vs documented ground truth.
- mcp_graveyard_server.py — 8 typed, read-only MCP tools (no shell, no evidence writes).
- tests/test_spoliation.py — 22 spoliation tests with demo runner.
- scripts/run_live_triage.sh — One-command Volatility pipeline on SIFT.
Workflow: Volatility exports → engine → targeted netscan → draft findings → verifier → report only on PASS.
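The ghost and orphan checks in the engine step come down to set differences over PIDs. Below is a minimal sketch of that idea, not the code in graveyard_engine.py. It assumes the Volatility output was exported as CSV with a `PID` column; the function names and the result keys are hypothetical.

```python
import csv
import io


def pids(csv_text, column="PID"):
    """Return the set of integer PIDs found in one CSV export.

    Assumes a header row containing `column`; rows whose PID is empty or
    non-numeric (for example "-") are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        int(row[column])
        for row in reader
        if (row.get(column) or "").strip().isdigit()
    }


def find_ghosts_and_orphans(pslist_csv, psscan_csv, netscan_csv):
    """Compare three exports and report PIDs that look inconsistent."""
    listed = pids(pslist_csv)          # processes on the active process list
    scanned = pids(psscan_csv)         # processes recovered by pool scanning
    socket_owners = pids(netscan_csv)  # PIDs that own a network object

    ghosts = scanned - listed          # seen by psscan, absent from pslist
    orphans = socket_owners - listed   # socket owner absent from pslist
    return {
        "ghosts": sorted(ghosts),
        "orphan_sockets": sorted(orphans),
        # Same PID flagged by both checks: the signal to surface first.
        "ghost_with_socket": sorted(ghosts & orphans),
    }
```

A psscan-only PID can also be an ordinary process that has already exited, so the result is an observation to cite, not a verdict.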
Measured on bundled sample: 100% ghost/orphan recall, 0 false positives, 100% hallucination catch on injected overclaim tests.
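For readers who want to check how numbers like these are derived, here is a small sketch of the standard precision, recall, F1, and false positive rate calculations over sets of PIDs. It is not the code in scripts/benchmark_accuracy.py; the `score` function and its `universe` argument (every PID that could have been flagged) are assumptions made for illustration.

```python
def score(predicted, truth, universe):
    """Score flagged PIDs against ground truth.

    predicted: PIDs the engine flagged
    truth:     PIDs the ground truth marks as ghost/orphan
    universe:  every PID that was a candidate (needed for true negatives)
    """
    predicted, truth, universe = set(predicted), set(truth), set(universe)
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    tn = len(universe - predicted - truth)

    # Guard every division so an empty category yields 0.0, not an exception.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}
```

With an illustrative universe of four PIDs, one true ghost, and that same PID flagged, every metric is 1.0 except the false positive rate, which is 0.0. Flagging one extra benign PID drops precision to 0.5 and raises the false positive rate to 1/3.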
How we built it
We extended Protocol SIFT with a custom MCP server and Cursor agent rules (AGENTS.md, .cursor/rules/graveyard.mdc) on the SANS SIFT Workstation.
Prompt guardrails: ghost-first triage sequencing, observation vs interpretation split, tee all tool output to ./exports/.
Architectural guardrails: Python engine parses exports programmatically; verifier enforces citation substring match; spoliation guard blocks evidence-path writes; MCP exposes read-only typed functions only.
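The citation substring match can be illustrated in a few lines. This is a sketch under assumptions, not the logic in verify_findings.py: the finding schema (`source`, `citation`) and the function name `verify_citations` are hypothetical, and each finding is assumed to name one export file and quote text from it verbatim.

```python
from pathlib import Path


def verify_citations(findings, exports_dir):
    """Return a list of problems; an empty list means every citation is grounded.

    Each finding is a dict with a hypothetical schema:
      "source":   file name of an export inside exports_dir
      "citation": text that must appear verbatim in that export
    """
    problems = []
    for i, finding in enumerate(findings):
        citation = finding.get("citation", "")
        if not citation.strip():
            # An empty string is a substring of everything, so reject it.
            problems.append(f"finding {i}: empty citation")
            continue
        source = Path(exports_dir) / finding.get("source", "")
        if not source.is_file():
            problems.append(f"finding {i}: export not found")
            continue
        text = source.read_text(encoding="utf-8", errors="replace")
        if citation not in text:
            problems.append(f"finding {i}: citation not present in export")
    return problems
```

A caller could map an empty list to exit code 0 and anything else to a non-zero code, which is what lets report generation hang off the verifier result.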
Self-correction is not prompt retry — agent_loop.sh loops verify → auto-correct until exit 0, with full JSONL audit trails in docs/execution_logs/.
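The control flow of such a loop can be sketched as follows. The real loop lives in a shell script; this Python version only illustrates the shape and makes several assumptions: `verify` and `correct` are caller-supplied functions, the JSONL record fields are invented for the example, and `max_rounds` is a hypothetical safety cap.

```python
import json


def self_correct(draft, verify, correct, log_path, max_rounds=3):
    """Run verify -> correct until the verifier passes or rounds run out.

    verify(draft)            -> list of problems (empty list means PASS)
    correct(draft, problems) -> new draft, built deterministically
    Returns (draft, exit_code), where exit_code 0 means the draft passed.
    """
    with open(log_path, "a", encoding="utf-8") as log:
        for round_no in range(1, max_rounds + 1):
            problems = verify(draft)
            record = {
                "round": round_no,
                "verdict": "REJECT" if problems else "PASS",
                "problems": problems,
            }
            log.write(json.dumps(record) + "\n")  # one audit line per round
            if not problems:
                return draft, 0
            draft = correct(draft, problems)
    # The last correction was never verified, so the caller must not ship it.
    return draft, 1
```

Because the corrector is an ordinary function rather than a model call, the same inputs produce the same audit trail on every run.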
Repo: https://github.com/let-the-dreamers-rise/graveyard
Challenges we ran into
- Agents default to "malicious C2" in observations — we built a 14-term attribution guard and mandatory correction loop.
- Balancing honest metrics vs marketing — we report measured sample-case numbers and label simulated baseline comparisons clearly.
- SIFT VM setup and live memory access — offline demo runs fully on bundled exports; live triage documented in README and DATASETS.md.
- Evidence integrity on alternative IDEs — documented prompt-only limits for destructive commands; architectural enforcement via verifier and spoliation tests.
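An attribution guard like the one in the first challenge can be approximated with a term list and a word-boundary match. The sketch below is illustrative only: the five terms are placeholders, not the project's 14-term list, and `attribution_hits` is a hypothetical name.

```python
import re

# Placeholder terms for illustration; the real guard uses its own 14-term list.
ATTRIBUTION_TERMS = ("malicious", "c2", "apt", "threat actor", "backdoor")

# Word boundaries keep "apt" from matching inside words such as "capture".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in ATTRIBUTION_TERMS) + r")\b",
    re.IGNORECASE,
)


def attribution_hits(observation):
    """Return the sorted, lowercased attribution terms found in an observation."""
    return sorted({m.group(1).lower() for m in _PATTERN.finditer(observation)})
```

An observation such as "PID 3344 appears in psscan but not in pslist" returns an empty list, while "PID 3344 is a malicious C2 beacon" returns both flagged terms and would be sent back for correction.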
Accomplishments that we're proud of
- Live SIFT demonstration of REJECT → auto-correct → PASS with no human in the loop.
- 100% ghost/orphan recall and hallucination catch on bundled ground truth.
- 22 spoliation tests passing, reproducible with bash scripts/spoliation_test.sh.
- 8 read-only MCP tools judges can wire into Protocol SIFT today.
- All 8 hackathon deliverables: public MIT repo, architecture diagram, dataset docs, accuracy report, execution logs, try-it-out instructions, and demo video.
What we learned
Exit-code gates beat prompt pleading — if the report cannot ship until citations match exports, hallucinations stop at the door. Ghost + orphan socket on the same PID is the highest-value signal to surface first. Measured, reproducible benchmarks matter more than unverifiable "zero hallucination" claims.
What's next for Graveyard
- Full live SRL-2018 ground truth after memory image triage.
- malfind hollow-memory correlation on ghost PIDs inside the engine.
- CI: benchmark + spoliation + agent_loop on every push.
- Deeper multi-source correlation (prefetch, Amcache) beyond the current lightweight timeline parity check.
Built With
- cursor
- model-context-protocol-(mcp)
- protocol-sift
- python
- sans-sift-workstation
- volatility-3
- windows-memory-forensics