Custom Go MCP Server: LLM calls typed functions only. No raw shell. No injection. Architectural enforcement, not prompt rules.
100% precision, 92.86% recall, 0 hallucinations. 13/14 APT IOCs found on real SRL-2018 dataset. Benchmark vs documented ground truth
CONFIRMED: usbclient.exe PID 6648, C2 to 108.79.235.64:33000, 11+ beacons to 172.16.4.10:8080. Real SRL-2018 APT dataset.
Self-correction live: pslist empty → DKOM IOC flagged → malfind empty → VAD hook confirmed. Agent catches its own blind spots.
Every tool call logged: intent, hypothesis, result, delta. Any finding traced back to exact execution. JSONL + Markdown output

LogPoseSIFT — Project Story

Inspiration

The average attacker moves from initial access to domain control in 7 minutes. The average incident responder takes 7 hours to even start triage.

That gap is not a skill problem — it is a speed problem. Attackers have automated their kill chain. Defenders are still typing commands manually.

I built LogPoseSIFT to close that gap. Not with a chatbot wrapper around existing tools, but with a genuinely autonomous agent that thinks like a DFIR analyst — forming hypotheses, running tools, catching its own mistakes, and producing findings that are traceable back to raw evidence.

The name comes from the Log Pose — the compass in One Piece that records and follows magnetic signatures to navigate unknown seas. LogPoseSIFT records the forensic signature of a compromise and navigates toward the truth.

What It Does

LogPoseSIFT is an autonomous DFIR triage agent that connects Claude (with Gemini failover) to the full SANS SIFT Workstation toolchain through a Custom MCP Server written in Go — the most architecturally sound pattern available.

Given a memory dump or disk image, LogPoseSIFT autonomously:

Pre-triages in Go before the LLM starts: it runs psscan and netscan directly, parses the real findings into a structured fact sheet, and embeds that sheet in the initial prompt, so the key facts are in Claude's context from the first turn instead of depending on recall.
Executes 12 typed MCP tools across 6 categories (memory, disk, registry, YARA, hashing, correlation) in an agentic loop of up to 10 iterations.
Self-corrects — when pslist returns empty because a rootkit unlinked the EPROCESS chain, the memory agent detects this, flags it as a CONFIRMED DKOM IOC, escalates to pool tag scanning (psscan), and runs a psxview diff to prove which processes are hidden.
Correlates memory findings against disk findings to detect fileless malware and timestomping.
Tags every finding CONFIRMED / INFERRED / UNVERIFIED via a Go validator before it reaches the context window.
Writes a full audit trail — structured JSONL plus human-readable Markdown with intent, hypothesis, result, and delta per tool call.

Benchmark results — SRL-2018 APT dataset (real-world intrusion, documented ground truth)

Metric	Result
True Positives	13 / 14 IOCs
False Positives	0
Hallucinations	0
Precision	100%
Recall	92.86%

Architectural Pattern — Custom MCP Server

This is the hardest of the four supported approaches. Most competitors used Direct Agent Extension — prompt-based, two hours of work. We built architectural enforcement.

The LLM cannot run arbitrary shell commands. It calls typed Go functions registered as MCP tools. Go constructs the exec.Command args from validated typed inputs — never from LLM output. Shell metacharacter injection is rejected at the input layer before execution.
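
A minimal sketch of that idea, under stated assumptions: the VolatilityInput type, the plugin allowlist, and the metacharacter set are hypothetical and are not taken from the project's source. What it aims to show is the pattern itself: the argument vector is assembled by Go from validated fields and handed to exec.Command, which does not invoke a shell.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// VolatilityInput is a hypothetical typed input for one MCP tool call.
type VolatilityInput struct {
	ImagePath string // path to the memory image
	Plugin    string // Volatility 3 plugin name, e.g. "windows.psscan"
}

// allowedPlugins is an illustrative allowlist, not the project's real one.
var allowedPlugins = map[string]bool{
	"windows.pslist":  true,
	"windows.psscan":  true,
	"windows.netscan": true,
	"windows.malfind": true,
}

// shellMeta lists characters rejected outright (an assumed set).
const shellMeta = ";&|`$<>(){}[]!*?~\\\"'\n\r"

// Validate rejects inputs before any process is started.
func (in VolatilityInput) Validate() error {
	if in.ImagePath == "" {
		return errors.New("image path is required")
	}
	if strings.HasPrefix(in.ImagePath, "-") {
		return errors.New("image path must not look like a flag")
	}
	if strings.ContainsAny(in.ImagePath, shellMeta) {
		return errors.New("image path contains shell metacharacters")
	}
	if !allowedPlugins[in.Plugin] {
		return fmt.Errorf("plugin %q is not allowlisted", in.Plugin)
	}
	return nil
}

// BuildCommand returns a command whose argv comes only from validated fields.
// The command is built but not run here.
func BuildCommand(in VolatilityInput) (*exec.Cmd, error) {
	if err := in.Validate(); err != nil {
		return nil, err
	}
	return exec.Command("vol", "-f", in.ImagePath, in.Plugin), nil
}

func main() {
	cmd, err := BuildCommand(VolatilityInput{ImagePath: "/cases/mem.raw", Plugin: "windows.psscan"})
	if err == nil {
		fmt.Println("argv:", cmd.Args)
	}
	_, err = BuildCommand(VolatilityInput{ImagePath: "/cases/mem.raw; rm -rf /", Plugin: "windows.psscan"})
	fmt.Println("rejected:", err)
}
```

Because each argument is passed as its own argv element, a string such as "; rm -rf /" would reach the tool as a literal file name even without the metacharacter check. The check simply fails earlier and leaves a clearer log entry.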

Claude / Gemini (LLM)
        │  MCP protocol — typed tool calls only
        ▼
cmd/sift-mcp/main.go   ← SECURITY BOUNDARY
        │  12 registered tools
   ┌────┴────┐
agents/     internal/
   │             │
   ├─ orchestrator      ├─ wrappers/  (7 typed tool wrappers)
   ├─ memory_agent      ├─ validator/ (hallucination guard)
   ├─ disk_agent        ├─ correlator/(disk vs memory cross-ref)
   └─ reasoning_logger  └─ registry/  (tool allowlist, 30+ entries)
        │
   SIFT Tools  (READ-ONLY)
   vol · fls · log2timeline · rip.pl · yara · hashdeep

Key distinction: A prompt-injected instruction saying "delete the evidence file" will fail because rm is not in the tool registry and the MCP server has no shell execution capability. This is not a guardrail — it is an absence of capability.

How We Built It

Phase 1 — Architecture

We committed to the Custom MCP Server pattern from day one. Every SIFT tool became a typed Go struct with validated input parameters and a JSON output parser.

volatility.go was the template. We repeated the pattern for RegRipper, TSK (fls/mactime/icat), bulk_extractor, foremost, log2timeline, YARA, and hashdeep — building a library of type-safe forensic tool wrappers that the LLM can call without ever touching a shell.

Phase 2 — The Hallucination Problem

In early versions, Claude received 8,000 characters of real psscan data in iteration 1 and then wrote "tools returned nothing" in iteration 6. By then the earlier tool output had dropped out of its effective attention.

The fix: pre-triage fact injection. Go runs psscan and netscan before the LLM loop starts, parses real process names and IP addresses into a structured fact sheet, and embeds it as confirmed facts in the very first message Claude reads. It is much harder for Claude to claim "no output" when the process names are sitting in the first message of its own context.

Phase 3 — The Rootkit Problem

The SRL-2018 image has a DKOM rootkit that unlinks every process from the EPROCESS ActiveProcessLinks chain. Standard pslist returns only a header row.

Early versions treated this as a tool failure. That was wrong.

Empty pslist on a live 90-process Windows system is not a failure — it is a CONFIRMED IOC. We rewrote the memory agent to explicitly report empty malfind and cmdline as rootkit indicators, added psxview diff as the self-correction step, and switched primary process enumeration from pslist to psscan (pool tag scanning, which bypasses DKOM).
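
A hedged sketch of that decision logic follows. The Finding type and both function names are hypothetical, and the PID lists are assumed to be parsed already. It flags an empty pslist as a finding when psscan still recovers processes, and computes a psxview-style difference between the two listings.

```go
package main

import (
	"fmt"
	"sort"
)

// Finding is a hypothetical tagged result.
type Finding struct {
	Tag    string // CONFIRMED, INFERRED or UNVERIFIED
	Detail string
}

// HiddenPIDs returns PIDs that psscan recovered but pslist did not report,
// in ascending order. psscan can also recover exited processes, so the
// result is a lead for review rather than proof on its own.
func HiddenPIDs(pslist, psscan []int) []int {
	listed := make(map[int]bool, len(pslist))
	for _, pid := range pslist {
		listed[pid] = true
	}
	var hidden []int
	for _, pid := range psscan {
		if !listed[pid] {
			hidden = append(hidden, pid)
		}
	}
	sort.Ints(hidden)
	return hidden
}

// AssessProcessListing treats an empty pslist alongside a non-empty psscan
// as an indicator instead of a tool failure.
func AssessProcessListing(pslist, psscan []int) (Finding, bool) {
	if len(pslist) == 0 && len(psscan) > 0 {
		return Finding{
			Tag:    "CONFIRMED",
			Detail: fmt.Sprintf("pslist empty while psscan recovered %d processes: consistent with DKOM unlinking", len(psscan)),
		}, true
	}
	return Finding{}, false
}

func main() {
	psscan := []int{4, 512, 6648} // illustrative PIDs
	if f, ok := AssessProcessListing(nil, psscan); ok {
		fmt.Println(f.Tag, f.Detail)
	}
	fmt.Println("hidden:", HiddenPIDs(nil, psscan))
}
```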

Phase 4 — Accuracy Verification

We built a benchmark harness (benchmark/run_benchmark.sh) that runs the agent autonomously against the SRL-2018 evidence, then scores the output against a documented ground truth JSON covering 14 known IOCs across 4 categories. The harness produces TP/FP/FN counts, precision, and recall — written to benchmark/results/ on every run.
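
The scoring step can be sketched as a set comparison. The ground truth format here (a flat JSON array of IOC strings) and the function names are assumptions, since the actual file layout is not shown, and the sample identifiers in main are placeholders and not real IOCs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Score holds the counts and ratios the harness reports.
type Score struct {
	TP, FP, FN        int
	Precision, Recall float64
}

// ParseGroundTruth reads an assumed format: a JSON array of IOC strings.
func ParseGroundTruth(data []byte) ([]string, error) {
	var iocs []string
	err := json.Unmarshal(data, &iocs)
	return iocs, err
}

// ScoreFindings compares reported IOCs with the ground truth by exact match.
// Duplicate reports are counted once.
func ScoreFindings(truth, reported []string) Score {
	truthSet := make(map[string]bool, len(truth))
	for _, t := range truth {
		truthSet[t] = true
	}
	seen := make(map[string]bool, len(reported))
	var s Score
	for _, r := range reported {
		if seen[r] {
			continue
		}
		seen[r] = true
		if truthSet[r] {
			s.TP++
		} else {
			s.FP++
		}
	}
	s.FN = len(truthSet) - s.TP
	if s.TP+s.FP > 0 {
		s.Precision = float64(s.TP) / float64(s.TP+s.FP)
	}
	if s.TP+s.FN > 0 {
		s.Recall = float64(s.TP) / float64(s.TP+s.FN)
	}
	return s
}

func main() {
	truth, err := ParseGroundTruth([]byte(`["ioc-a", "ioc-b", "ioc-c"]`))
	if err != nil {
		fmt.Println("bad ground truth:", err)
		return
	}
	s := ScoreFindings(truth, []string{"ioc-a", "ioc-b"})
	fmt.Printf("TP=%d FP=%d FN=%d precision=%.2f%% recall=%.2f%%\n",
		s.TP, s.FP, s.FN, 100*s.Precision, 100*s.Recall)
}
```

With the counts reported above (13 true positives, 0 false positives, 1 missed IOC), the same formulas give a precision of 13/13 = 100% and a recall of 13/14, about 92.86%.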

Challenges

Context window degradation. Volatility netscan returns 12,000+ characters. Passing raw terminal output to Claude fills the context window with noise and causes the LLM to lose track of earlier findings. Solution: Go parsers that extract only semantically relevant rows — ESTABLISHED connections, suspicious ports, non-RFC1918 addresses — before returning to the LLM.
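
One way to express that filter, assuming the netscan rows have already been parsed into a struct: the Conn type, the suspiciousPorts set, and the sample rows are illustrative assumptions. The private-range test uses netip.Addr.IsPrivate from the standard library, which covers the RFC 1918 ranges and also IPv6 unique local addresses.

```go
package main

import (
	"fmt"
	"net/netip"
)

// Conn is a hypothetical parsed netscan row.
type Conn struct {
	RemoteAddr string
	RemotePort int
	State      string
}

// suspiciousPorts is illustrative only; the two entries are the ports
// named in the findings on this page.
var suspiciousPorts = map[int]bool{33000: true, 8080: true}

// IsExternal reports whether addr parses as a routable, non-private IP.
func IsExternal(addr string) bool {
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return false // "*" and similar placeholders are not addresses
	}
	return !ip.IsPrivate() && !ip.IsLoopback() && !ip.IsUnspecified() && !ip.IsLinkLocalUnicast()
}

// Relevant keeps a row if any of the three criteria applies.
func Relevant(c Conn) bool {
	return c.State == "ESTABLISHED" || suspiciousPorts[c.RemotePort] || IsExternal(c.RemoteAddr)
}

// Filter returns only the rows worth sending to the model.
func Filter(rows []Conn) []Conn {
	var kept []Conn
	for _, c := range rows {
		if Relevant(c) {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	rows := []Conn{
		{RemoteAddr: "108.79.235.64", RemotePort: 33000, State: "ESTABLISHED"},
		{RemoteAddr: "172.16.4.10", RemotePort: 445, State: "CLOSED"},
		{RemoteAddr: "0.0.0.0", RemotePort: 0, State: "LISTENING"},
	}
	for _, c := range Filter(rows) {
		fmt.Printf("%s:%d %s\n", c.RemoteAddr, c.RemotePort, c.State)
	}
}
```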

The self-correction trigger. Early versions only triggered self-correction if malfind output contained "Process:" — which never fires on a rootkit-compromised image because malfind is also suppressed. New trigger logic: empty malfind on a system where psscan finds 90+ processes is definitionally anomalous. The agent now reports this explicitly as a VAD walk suppression IOC.
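
The change in trigger logic can be shown side by side. The function names and the minProcs parameter are assumptions for illustration; the "Process:" marker and the 90-process figure come from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// OldTrigger fires only when malfind printed at least one hit,
// which never happens when malfind itself is suppressed.
func OldTrigger(malfindOutput string) bool {
	return strings.Contains(malfindOutput, "Process:")
}

// NewTrigger also fires when malfind is empty although psscan shows a
// busy system, treating the silence itself as the anomaly.
func NewTrigger(malfindOutput string, psscanCount, minProcs int) bool {
	if OldTrigger(malfindOutput) {
		return true
	}
	return strings.TrimSpace(malfindOutput) == "" && psscanCount >= minProcs
}

func main() {
	const minProcs = 90 // threshold taken from the text above
	fmt.Println("old:", OldTrigger(""))
	fmt.Println("new:", NewTrigger("", 94, minProcs))
}
```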

Gemini type system. Go's type checker rejects assigning genai.FunctionResponse to a variable declared as genai.Text. The fix was to declare the loop message variable as genai.Part, the interface both types implement, so the agentic loop can carry either a text prompt or a tool result to Gemini.
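
The typing issue can be reproduced without the SDK. The types below are local stand-ins that only mirror the relationship described above (an interface implemented by both a text type and a function response type); they are not the generative-ai-go definitions, and NextMessage is a hypothetical helper.

```go
package main

import "fmt"

// Part, Text and FunctionResponse are local stand-ins for the
// correspondingly named genai types. Only their relationship matters here.
type Part interface{ isPart() }

type Text string

func (Text) isPart() {}

type FunctionResponse struct {
	Name     string
	Response map[string]any
}

func (FunctionResponse) isPart() {}

// NextMessage picks what to send on the next loop iteration: a tool result
// if one is pending, otherwise a plain text prompt.
func NextMessage(pending *FunctionResponse, prompt string) Part {
	if pending != nil {
		return *pending
	}
	return Text(prompt)
}

func main() {
	// Declaring msg as Text would make the second assignment a compile error.
	var msg Part = Text("begin triage")
	fmt.Printf("%T\n", msg)

	msg = FunctionResponse{Name: "psscan", Response: map[string]any{"rows": 94}}
	fmt.Printf("%T\n", msg)
}
```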

plaso path mismatch. The disk agent called log2timeline using a hardcoded output path from the registry entry, but then checked for the plaso file at a different computed path — a CONFIRMED check that always failed silently. Fixed by calling SafeExec directly with an explicit --storage-file argument, bypassing the registry path entirely.

The final report problem. Even with real tool data, Claude would write "all tools returned empty" in the final report. The root cause: by iteration 6, tool results from iteration 1 had scrolled out of effective attention. Fixed with pre-triage injection — the key facts are in the first message, not buried in tool results from 5 iterations ago.

What We Learned

Architectural enforcement beats prompt engineering every time. Every hour spent making Go wrappers type-safe saved ten hours of fighting hallucinations and prompt injection.

Empty tool output is a forensic finding, not a failure. A rootkit that hides processes produces empty pslist. Treating empty output as "no findings" is forensically wrong and will cause an analyst to miss the most important IOC in the image.

Pre-inject facts — don't hope the LLM remembers. The LLM does not reliably recall data from five iterations ago. The solution is not better prompting. It is running key tools in Go before the loop starts and making the results part of the system context.

Build the benchmark first. We built the accuracy harness after the agent. We should have built it first. Knowing what "correct" looks like from the start would have shortened Phase 3 significantly.

What's Next

Disk triage against SRL-2018 file server snapshot — the base-file-snapshot5.7z disk image contains MFT artefacts, registry hives, and prefetch files for full timeline correlation
YARA against raw memory pages — scan the entire memory image with Cobalt Strike, Mimikatz, and DKOM rootkit signatures for pattern-based confirmation of findings
Cross-image correlation — run memory agents on all 7 SRL-2018 hosts simultaneously and correlate findings across the enterprise to map the full attacker lateral path
Live SIEM integration — connect the MCP server to a SIEM or EDR for real-time autonomous triage on live endpoints
Persistent learning loop — write session failures to progress.json so the agent learns from previous runs on the same case and improves iteration over iteration

Built With

claude
custom-mcp-server
flash
framework
gemini-3.1
generative-ai-go
go
go-anthropic
hashdeep
mcp-go
multi-agent
plaso
regripper
sift
sonnet-4-6
the-sleuth-kit
volatility-3
workstation
yara

Updates

Amaresh Hebbar started this project — May 25, 2026 08:38 AM EDT
