Inspiration

Triaging a single suspicious binary can consume 60–90 minutes of an analyst's day: hashing, YARA scanning, Volatility runs, IOC extraction, and report writing, all done by hand. Meanwhile, adversaries churn out thousands of samples per hour. SIFT-AID was built to close that gap. It is a local-first, privacy-preserving triage agent that compresses the same workflow to under 8 minutes without sacrificing audit integrity. Inspired by the SANS SIFT Workstation, it demonstrates that an LLM-guided pipeline can reason like a senior analyst without evidence ever leaving the workstation.


What it does

SIFT-AID is a fully autonomous malware triage agent running entirely on the SANS SIFT Workstation. Feed it a suspicious binary or memory image and it handles everything end-to-end:

  • SHA-256 hashing and VirusTotal queries
  • YARA scanning with rules covering 15+ malware families
  • IOC extraction via strings and regex
  • Volatility 3 memory forensics with 12 whitelisted plugins
  • Docker sandbox behavioral analysis with evidence mounted read-only (:ro)
  • MITRE ATT&CK technique mapping
  • Cross-validation of findings across tools, with confidence scoring
  • Analyst-reviewed containment rules for iptables/nftables
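
The hashing and strings-plus-regex IOC steps can be sketched in a few lines of standard-library Python. This is a minimal illustration, not SIFT-AID's code: the 4-byte minimum run length mirrors the default of the `strings` utility, and the three IOC patterns and the `triage_static` name are assumptions chosen for brevity.

```python
import hashlib
import re

# Printable-ASCII runs of 4+ bytes, similar to the default behaviour of `strings`.
_PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")

# Illustrative IOC patterns (assumptions, not SIFT-AID's actual rules).
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"),
    "url": re.compile(r"""\bhttps?://[^\s"'<>]+""", re.IGNORECASE),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
}


def extract_strings(data: bytes) -> list[str]:
    """Return the printable ASCII strings found in a binary blob."""
    return [m.group().decode("ascii") for m in _PRINTABLE_RUN.finditer(data)]


def triage_static(data: bytes) -> dict:
    """Hash a sample and pull candidate IOCs out of its printable strings."""
    iocs: dict[str, set[str]] = {kind: set() for kind in IOC_PATTERNS}
    for s in extract_strings(data):
        for kind, pattern in IOC_PATTERNS.items():
            iocs[kind].update(pattern.findall(s))
    return {"sha256": hashlib.sha256(data).hexdigest(), "iocs": iocs}
```

A fuller extractor would also scan UTF-16LE strings, which Windows binaries use heavily.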

Output is a dual-format report (JSON + Markdown) plus a valid STIX 2.1 bundle ready for SIEM/SOAR ingestion, all produced by a single automated pipeline.
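
A minimal STIX 2.1 bundle holding one hash-based indicator looks roughly like this. It is a sketch using only the standard library; the helper names and the indicator `name` are hypothetical, and a production pipeline would more likely use the OASIS `stix2` Python library, which validates objects against the specification.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_timestamp() -> str:
    """RFC 3339 UTC timestamp with millisecond precision, e.g. 2024-05-01T12:00:00.123Z."""
    now = datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%S.") + f"{now.microsecond // 1000:03d}Z"


def sha256_indicator(sha256: str, name: str) -> dict:
    """Build a STIX 2.1 Indicator that matches a file by its SHA-256 hash."""
    ts = stix_timestamp()
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": ts,
        "modified": ts,
        "name": name,
        "pattern": f"[file:hashes.'SHA-256' = '{sha256.lower()}']",
        "pattern_type": "stix",
        "valid_from": ts,
    }


def make_bundle(objects: list[dict]) -> dict:
    """Wrap STIX objects in a bundle (in STIX 2.1 the bundle has no spec_version)."""
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}


if __name__ == "__main__":
    bundle = make_bundle([sha256_indicator("a" * 64, "Hypothetical triaged sample")])
    print(json.dumps(bundle, indent=2))
```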


How we built it

The orchestration layer is LangGraph 0.2+, with 12 named nodes and cyclic self-correction. An MCP server acts as the security boundary, exposing exactly 12 typed, read-safe functions and no generic shell execution.

Ten specialist agents handle the pipeline:

Hash · YARA · Volatility · IOC · BinaryAnalysis · EntropyAnalysis · VulnerabilityCheck · NetworkIntel · DynamicAnalysis · Containment

These are backed by a MITRE ATT&CK mapper and a LanceDB IOC store for cross-incident correlation. Evidence is bind-mounted read-only (:ro) inside Docker, so the kernel rejects any write to it; the service runs as a non-root sentinel user; and a FastAPI dashboard streams real-time LangGraph node progress over WebSockets.
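
The container isolation described above can be expressed as a `docker run` invocation. The sketch below only builds the argument list; the image name, the `sentinel` user existing inside the image, the hardening flags beyond `:ro` (`--read-only`, `--cap-drop ALL`, `no-new-privileges`), and `--network none` are assumptions rather than SIFT-AID's documented configuration.

```python
import shlex
import uuid


def sandbox_argv(evidence_dir: str, sample_name: str,
                 image: str = "sift-aid-sandbox:latest") -> list[str]:
    """Build a `docker run` command for a throwaway, per-sample analysis container."""
    if "/" in sample_name or sample_name in ("", ".", ".."):
        raise ValueError("sample_name must be a bare file name")
    return [
        "docker", "run",
        "--rm",                                  # container is deleted when it exits
        "--name", f"sift-aid-{uuid.uuid4().hex[:12]}",
        "--user", "sentinel",                    # assumes the image defines this non-root user
        "--read-only",                           # root filesystem is read-only...
        "--tmpfs", "/tmp",                       # ...except a scratch tmpfs
        "--cap-drop", "ALL",
        "--security-opt", "no-new-privileges",
        "--network", "none",                     # assumption: fully isolated
        "-v", f"{evidence_dir}:/evidence:ro",    # evidence bind mount, read-only
        image,
        f"/evidence/{sample_name}",              # hypothetical argument to the image's entrypoint
    ]


if __name__ == "__main__":
    # Printed rather than executed; subprocess.run(argv, check=True) would launch it.
    print(shlex.join(sandbox_argv("/cases/case42/evidence", "sample.exe")))
```

Passing an argument list (never a shell string) keeps sample names from being interpreted as commands.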


Challenges we ran into

  • Volatility output bloat: malfind output can balloon past 10 MB. We trim it to 64 KB and add an explicit [TRIMMED] marker so the model knows when data was cut.
  • Resource contention: running plugins in parallel against the same image caused conflicts, so we switched to sequential execution.
  • YARA portability: rule-compatibility differences between yara-python and the YARA CLI required a try/except fallback path.
  • Dynamic analysis without CAPE: we built an ephemeral per-sample Docker sandbox that is created and torn down automatically.
  • Hallucination prevention: every finding in the JSON report must cite the exact MCP function, timestamp, and raw output snippet it came from. The validate node rejects any finding that lacks this provenance.
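
Two of these fixes, output trimming and the provenance check, are small enough to sketch directly. The 64 KB limit comes from the text above; the exact `[TRIMMED: ...]` wording, the finding field names, and keying the tool log by function name (a real log would key by individual call) are assumptions.

```python
TRIM_LIMIT = 64 * 1024  # 64 KB, the limit named in the writeup
REQUIRED_FIELDS = ("mcp_function", "timestamp", "raw_snippet")


def trim_output(text: str, limit: int = TRIM_LIMIT) -> str:
    """Cap tool output at `limit` UTF-8 bytes and state explicitly how much was cut."""
    raw = text.encode("utf-8")
    if len(raw) <= limit:
        return text
    kept = raw[:limit].decode("utf-8", errors="ignore")  # drops a split multi-byte char
    return f"{kept}\n[TRIMMED: {len(raw) - limit} bytes omitted of {len(raw)}]"


def has_provenance(finding: dict, tool_log: dict[str, str]) -> bool:
    """Accept a finding only if it names a logged MCP call and quotes that call's output."""
    if not all(isinstance(finding.get(k), str) and finding[k].strip() for k in REQUIRED_FIELDS):
        return False
    logged_output = tool_log.get(finding["mcp_function"])
    return logged_output is not None and finding["raw_snippet"] in logged_output


def validate(findings: list[dict], tool_log: dict[str, str]) -> tuple[list[dict], list[dict]]:
    """Split findings into (accepted, rejected) by provenance."""
    accepted = [f for f in findings if has_provenance(f, tool_log)]
    rejected = [f for f in findings if not has_provenance(f, tool_log)]
    return accepted, rejected
```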


Accomplishments that we're proud of

SIFT-AID consistently triages complex NIST CFReDS and DFRWS forensic images in 144–184 seconds, well inside the 8-minute SLA, with 100% precision (zero false positives) across four malicious datasets and 15 clean-software baselines.

The architectural guardrails are genuinely novel:

  • An MCP server with no generic execution endpoints
  • Kernel-level read-only evidence mounting
  • A Volatility plugin whitelist enforced before any subprocess call
  • Every finding in every report is programmatically traceable to its source
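
Enforcing the plugin whitelist before any subprocess call, and running plugins sequentially as described under the challenges, might look like the sketch below. The plugin set is an illustrative subset rather than SIFT-AID's actual list of 12, and the command assumes the Volatility 3 `vol` entry point with its `-q` (quiet), `-r json` (renderer), and `-f` (memory image) options.

```python
import subprocess

# Illustrative subset; the writeup mentions 12 whitelisted plugins but does not list them.
ALLOWED_PLUGINS = frozenset({
    "windows.info.Info",
    "windows.pslist.PsList",
    "windows.pstree.PsTree",
    "windows.cmdline.CmdLine",
    "windows.netscan.NetScan",
    "windows.malfind.Malfind",
})


def build_vol_argv(image: str, plugin: str) -> list[str]:
    """Refuse non-whitelisted plugins before any process is spawned."""
    if plugin not in ALLOWED_PLUGINS:
        raise PermissionError(f"Volatility plugin not whitelisted: {plugin}")
    # argv list with no shell: the image path cannot smuggle in extra commands.
    return ["vol", "-q", "-r", "json", "-f", image, plugin]


def run_plugins(image: str, plugins: list[str], timeout: int = 900) -> dict[str, str]:
    """Validate every plugin up front, then run them one at a time against the image."""
    argvs = {p: build_vol_argv(image, p) for p in plugins}
    results = {}
    for plugin, argv in argvs.items():  # sequential, to avoid contention on one image
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout, check=True)
        results[plugin] = proc.stdout
    return results
```

Validating the whole batch first means a single bad plugin name aborts the run before any Volatility process starts.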

We're especially proud that the self-correction loop correctly escalated the ambiguous DFRWS 2005 steganography challenge to "Analyst Review" rather than forcing a verdict.
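
The escalation behaviour can be illustrated with a tiny framework-free state machine in which each node is named and the route out of `validate` is a conditional edge, as it would be with LangGraph's `add_conditional_edges`. The threshold, the retry budget, and the node names are hypothetical; `score` stands in for the specialist agents.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

ACCEPT_THRESHOLD = 0.7  # hypothetical confidence needed to emit a verdict
MAX_RETRIES = 2         # hypothetical self-correction budget


@dataclass
class TriageState:
    confidence: float = 0.0
    retries: int = 0
    verdict: Optional[str] = None
    path: List[str] = field(default_factory=list)  # audit trail of visited nodes


def route_after_validate(state: TriageState) -> str:
    """Conditional edge: accept, try again, or hand off to a human."""
    if state.confidence >= ACCEPT_THRESHOLD:
        return "report"
    if state.retries < MAX_RETRIES:
        return "reanalyze"
    return "analyst_review"


def run_triage(score: Callable[[int], float]) -> TriageState:
    """Walk the graph; `score(retries)` stands in for the specialist agents."""
    state, node = TriageState(), "analyze"
    while True:
        state.path.append(node)
        if node in ("analyze", "reanalyze"):
            if node == "reanalyze":
                state.retries += 1
            state.confidence = score(state.retries)
            node = "validate"
        elif node == "validate":
            node = route_after_validate(state)
        elif node == "report":
            state.verdict = "Reported"
            return state
        else:  # analyst_review: escalate instead of forcing a verdict
            state.verdict = "Analyst Review"
            return state
```

With a score that never clears the threshold, the run ends in `analyst_review` after two retries, and `path` records every transition for the audit trail.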


What we learned

  • Design the MCP tool interface first. Defining what the LLM is allowed to do before writing any agent code forces real architectural thinking, and it is far more effective than prompt-level guardrails, which can be bypassed.
  • LangGraph's cyclic graph with conditional routing makes self-correction explicit and auditable: every state transition is a named edge.
  • Without a calibration dataset, voting beats probability calibration for confidence scoring: each corroborating tool adds a vote, which makes results more transparent and explainable.
  • A local LLM can orchestrate complex forensic toolchains effectively, showing that AI-assisted DFIR needs neither cloud APIs nor sending evidence off the workstation.
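
Vote-based confidence, as described in the voting lesson above, can be sketched as follows: each tool that independently reports an indicator contributes one vote, and the score is the fraction of tools run that agree. Normalising by the number of tools run, and the function name, are assumptions; the point is that the output lists exactly which tools voted, so every score can be explained.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def vote_confidence(observations: Iterable[Tuple[str, str]],
                    tools_run: int) -> Dict[str, Tuple[float, List[str]]]:
    """Score each indicator by the fraction of tools that independently reported it.

    Returns indicator -> (score, sorted list of voting tools).
    """
    if tools_run <= 0:
        raise ValueError("tools_run must be positive")
    voters: Dict[str, Set[str]] = defaultdict(set)
    for tool, indicator in observations:
        voters[indicator].add(tool)  # a tool votes at most once per indicator
    return {ind: (len(tools) / tools_run, sorted(tools)) for ind, tools in voters.items()}
```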


What's next for SIFT-AID

  • Multi-sample batch mode with queue management for high-volume triage
  • Expanded MITRE ATT&CK coverage to 100+ techniques
  • MISP integration for bidirectional threat intelligence sharing
  • Timeline analysis: super-timeline generation from MFT, journal, and log sources, plus automated spoliation detection
  • Cross-organization threat intel feed evolving from the LanceDB IOC memory
  • Extended forensic image support beyond Windows and Linux to macOS and Android
