- SubEight Startup Dashboard: Selecting between the Autonomous IR Loop and the Interactive Natural Language DFIR Shell.
- ReAct Loop & Live MCP Triage: The agent queries Volatility memory APIs and formulates the initial hypothesis that the user was browsing with Edge.
- Dynamic Self-Correction Alert: The validator rejects the "legitimate Edge" hypothesis because of an unusual AppData path and a STUN.exe parent process.
- Structured Narrative & Audit Trail Output: The complete forensic report is displayed in the terminal and compiled into a styled executive PDF.
- Natural Language Interactive DFIR Shell: Analyst welcome screen showcasing query examples for active memory, MFT, Registry, and Event logs.
- DFIR Shell Query Execution: Network connections for PID 1912 retrieved via FastMCP and displayed with Rich panels.
SubEight — Devpost Project Story
Inspiration
Digital Forensics and Incident Response (DFIR) is facing a critical speed problem. While modern AI-powered adversaries can execute a complete domain compromise in under 8 minutes, human analysts often take hours or days to parse, correlate, and make sense of raw disk images, event logs, and memory dumps.
We were inspired by the SANS Institute's experimental Protocol SIFT research initiative. The vision of combining the SANS SIFT Workstation's vast tool library with AI agents via the Model Context Protocol (MCP) felt like the ultimate way to close this defensive gap. We set out to build SubEight—an autonomous, self-correcting DFIR agent designed to hunt threats, challenge its own assumptions, and produce audit-grade reports at machine speed.
What it does
SubEight is an autonomous, self-correcting incident response agent that operates over a customized read-only Model Context Protocol (MCP) server.
Given an incident case (such as the Stark Research Labs challenge), SubEight:
- Triages Active Memory: Searches active processes (Volatility) to identify execution timelines and suspicious child processes.
- Correlates disk & registry evidence: Queries the Master File Table (MFT) and Windows Registry hives to find anomalies like process masquerading (e.g., a process pretending to be `msedge.exe` running from a user's local startup folder instead of `Program Files` and spawned by a C2 utility, `STUN.exe`).
- Performs Dynamic Self-Correction: When discovering parent-child or path inconsistencies, it automatically refutes its initial hypothesis (e.g., that Edge browsing was legitimate) and pivots to investigate the malicious service persistence (`pssdnsvc`).
- Verifies Reputations: Validates binary SHA-256 hashes against signature intelligence databases.
- Generates Executive Reports: Writes a polished Markdown narrative and compiles it into an executive-styled PDF report (via WeasyPrint) in under 2 minutes.
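The masquerading check and the self-correction step described above can be sketched as follows. This is a minimal illustration, not SubEight's actual code: `ProcessRecord`, `masquerade_findings`, `revise_hypothesis`, the allow-lists, the user name, and the PIDs are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath

# Hypothetical allow-lists for illustration only; a real baseline would need vetting.
EXPECTED_DIRS = {
    "msedge.exe": (
        r"C:\Program Files (x86)\Microsoft\Edge\Application",
        r"C:\Program Files\Microsoft\Edge\Application",
    ),
}
EXPECTED_PARENTS = {"msedge.exe": {"explorer.exe", "msedge.exe"}}


@dataclass(frozen=True)
class ProcessRecord:
    """One process as it might be returned by a memory-triage tool (assumed shape)."""
    pid: int
    image_path: str
    parent_name: str


def masquerade_findings(proc: ProcessRecord) -> list:
    """Return reasons why a well-known binary name looks out of place."""
    path = PureWindowsPath(proc.image_path)
    name = path.name.lower()
    if name not in EXPECTED_DIRS:
        return []  # no baseline for this binary, so nothing to compare against
    findings = []
    parent_dir = str(path.parent).lower()
    if parent_dir not in {d.lower() for d in EXPECTED_DIRS[name]}:
        findings.append(f"unexpected path: {path.parent}")
    if proc.parent_name.lower() not in EXPECTED_PARENTS[name]:
        findings.append(f"unexpected parent: {proc.parent_name}")
    return findings


def revise_hypothesis(initial: str, proc: ProcessRecord) -> str:
    """Refute the initial hypothesis when any inconsistency is found."""
    findings = masquerade_findings(proc)
    if findings:
        return f"REFUTED: {initial} ({'; '.join(findings)})"
    return f"RETAINED: {initial}"


suspicious = ProcessRecord(
    pid=4242,  # hypothetical PID
    image_path=r"C:\Users\alice\AppData\Roaming\Microsoft\Windows"
               r"\Start Menu\Programs\Startup\msedge.exe",
    parent_name="STUN.exe",
)
print(revise_hypothesis("Edge browsing was legitimate", suspicious))
```

Keeping the check as a pure function over typed records means the refutation is reproducible: the same evidence always yields the same findings, independent of what the language model says.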
How we built it
We designed SubEight with a strict decoupled architecture (Antigravity Architecture):
- Orchestration: Built with LangGraph to model the ReAct decision loop and provide a resilient state machine that guides the agent from triage to correlation and reporting.
- Custom MCP Server: Built with FastAPI/FastMCP exposing typed, read-only tools. Because no write actions are exposed, the agent cannot alter evidence through the tool layer, which preserves forensic soundness by design.
- Cache Database: Implemented an intermediate SQLite cache layer that parses and indexes raw tool outputs (CSV/JSON from MFTECmd, EvtxECmd, Volatility). Instead of flooding the LLM's context window with gigabytes of raw data, the agent queries structured, typed APIs, eliminating database query hallucinations.
- CLI Experience: Leveraged the `rich` library to build a stunning, professional terminal interface showing ASCII art, styled panel updates, grey italicized agent thoughts, and red self-correction warnings.
- PDF Compilation: Designed a CSS print-styled template with HSL color tokens and Inter/Roboto Mono typography to compile the final Markdown report to PDF.
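To make the cache-layer idea concrete, here is a minimal sketch of a typed query function sitting in front of a SQLite index, using only the standard library. The table layout, the `MftEntry` type, and the function names `build_cache` and `find_files_by_name` are assumptions made for illustration; the project's real schema and tool names are not shown in this write-up.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class MftEntry:
    """Typed row handed back to the agent instead of raw tool output (assumed shape)."""
    entry_number: int
    parent_path: str
    file_name: str
    created_utc: str


def build_cache(rows) -> sqlite3.Connection:
    """Index parsed MFT rows into an in-memory SQLite cache (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mft ("
        "entry_number INTEGER PRIMARY KEY, "
        "parent_path TEXT NOT NULL, "
        "file_name TEXT NOT NULL, "
        "created_utc TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO mft VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def find_files_by_name(conn: sqlite3.Connection, file_name: str, limit: int = 50) -> list:
    """Semantic lookup: the caller supplies values, never SQL text."""
    if not isinstance(file_name, str) or not file_name:
        raise ValueError("file_name must be a non-empty string")
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    cursor = conn.execute(
        "SELECT entry_number, parent_path, file_name, created_utc "
        "FROM mft WHERE file_name = ? COLLATE NOCASE "
        "ORDER BY entry_number LIMIT ?",
        (file_name, limit),  # bound parameters only
    )
    return [MftEntry(*row) for row in cursor]


cache = build_cache([
    (1, r"C:\Program Files (x86)\Microsoft\Edge\Application", "msedge.exe", "2023-01-01T00:00:00Z"),
    (2, r"C:\Users\alice\Startup", "msedge.exe", "2023-02-01T00:00:00Z"),
    (3, r"C:\Windows", "notepad.exe", "2022-12-01T00:00:00Z"),
])
for entry in find_files_by_name(cache, "msedge.exe"):
    print(entry.entry_number, entry.parent_path)
```

Since the query text is fixed and only values are bound, there is no table or column name left for the model to invent; it can only choose which function to call and with which arguments.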
Challenges we ran into
- SQL Query Hallucinations: Initially, the agent tried to generate raw SQL queries to search the cache, frequently hallucinating tables or columns. We resolved this by removing SQL query capabilities entirely and exposing only strictly-typed, semantic API endpoints validated by Pydantic.
- 503 / Rate-Limit Overloads: Running multiple sequential LLM calls can trigger rate limits or API overloads. We programmed a robust fallback router inside our LangGraph orchestrator that automatically pivots to a stable backup model in case of transient 503 errors.
- Evidence Integrity vs. CPU Overhead: Traditionally, ensuring forensic soundness requires hashing entire disk images (often 50 GB+) repeatedly. We avoided this computational bottleneck by mounting the evidence files in strict read-only mode at the OS kernel level, so the evidence cannot be modified during analysis and repeated re-hashing adds no CPU cost.
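The fallback behaviour for transient 503 and rate-limit errors can be sketched in plain Python. This is a simplified stand-in rather than the LangGraph router itself: `TransientModelError`, `call_with_fallback`, and the two stub model functions are hypothetical, and the set of retryable status codes is an assumption.

```python
from typing import Callable, Sequence, Tuple

# Assumed set of status codes worth failing over on.
RETRYABLE_STATUS = {429, 503}


class TransientModelError(Exception):
    """Raised by a model client when the provider returns an HTTP error (hypothetical)."""

    def __init__(self, status_code: int):
        super().__init__(f"model call failed with HTTP {status_code}")
        self.status_code = status_code


def call_with_fallback(
    prompt: str,
    models: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, client) in order and return (name, reply) from the first success."""
    last_error = None
    for name, client in models:
        try:
            return name, client(prompt)
        except TransientModelError as exc:
            if exc.status_code not in RETRYABLE_STATUS:
                raise  # non-transient failures should surface immediately
            last_error = exc  # remember the failure and move to the next model
    raise RuntimeError("all configured models failed") from last_error


def overloaded_primary(prompt: str) -> str:
    raise TransientModelError(503)  # simulate a provider overload


def stable_backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"


used, reply = call_with_fallback(
    "summarize process tree",
    [("primary", overloaded_primary), ("backup", stable_backup)],
)
print(used, "->", reply)
```

Returning the name of the model that actually answered lets the audit trail record which model produced each step, which matters when a report has to be defended later.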
Accomplishments that we're proud of
- True Self-Correction: Watching the agent dynamically challenge its own starting hypothesis, declare a `[⚠️ SELF-CORRECTION DETECTED]` warning, and pivot its search trajectory automatically when confronted with abnormal paths and parent PIDs.
- Extreme Speed: Completing a full analysis of the Stark Research Labs challenge, validating threat signatures, and exporting PDF reports in under 2 minutes.
- Professional Terminal Aesthetics: Building a CLI dashboard that feels like a premium, state-of-the-art security tool worthy of presentation to senior cybersecurity leaders.
What we learned
- Prompt Engineering is not enough: When designing autonomous agents for high-stakes domains like incident response, architectural boundaries (like read-only APIs and OS-level write blockades) are essential to guarantee safety and reproducibility.
- Strict Schema Contracts: Restricting tool parameters through Pydantic fields dramatically improves agent reliability, making it virtually immune to parameter syntax errors.
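A schema contract of this kind can be illustrated with a small validator. The project uses Pydantic; the sketch below deliberately swaps in a standard-library dataclass so it runs without dependencies, and `ConnectionQuery`, `parse_tool_call`, the allowed sources, and the limits are all hypothetical.

```python
from dataclasses import dataclass, fields

# Assumed evidence sources, taken from the kinds of data the write-up mentions.
ALLOWED_SOURCES = ("memory", "mft", "registry", "evtx")


@dataclass(frozen=True)
class ConnectionQuery:
    """Parameters for a hypothetical 'network connections by PID' tool."""
    pid: int
    source: str = "memory"
    limit: int = 100

    def __post_init__(self):
        # bool is a subclass of int, so reject it explicitly
        if isinstance(self.pid, bool) or not isinstance(self.pid, int) or self.pid < 0:
            raise ValueError("pid must be a non-negative integer")
        if self.source not in ALLOWED_SOURCES:
            raise ValueError(f"source must be one of {ALLOWED_SOURCES}")
        if isinstance(self.limit, bool) or not isinstance(self.limit, int) or not 1 <= self.limit <= 1000:
            raise ValueError("limit must be an integer between 1 and 1000")


def parse_tool_call(raw: dict) -> ConnectionQuery:
    """Validate model-supplied arguments before any tool code runs."""
    allowed = {f.name for f in fields(ConnectionQuery)}
    unknown = set(raw) - allowed
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if "pid" not in raw:
        raise ValueError("missing required parameter: pid")
    return ConnectionQuery(**raw)


print(parse_tool_call({"pid": 1912}))
for bad_call in ({"pid": "1912"}, {"pid": 1912, "sql": "SELECT * FROM netscan"}):
    try:
        parse_tool_call(bad_call)
    except ValueError as err:
        print("rejected:", err)
```

Rejecting unknown keys is the important part: a malformed or invented argument fails loudly at the boundary, and the error message can be fed back to the agent so it can retry with valid parameters.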
What's next for SubEight
- Native Protocol SIFT Integration: Submitting SubEight to be reviewed for integration into the official Protocol SIFT codebase.
- Multi-Host Orchestration: Extending the LangGraph orchestrator to coordinate a team of specialized sub-agents running across separate compromised endpoints to perform cluster-wide threat hunting.
- Live Endpoint Telemetry: Integrating real-time Event Tracing for Windows (ETW) and Linux eBPF telemetry streaming into our SQLite cache layer.