Paladin Autonomous Security System

Paladin — Autonomous Corporate Security System

Inspiration

In late 2025, Anthropic's security team disclosed GTG-1002, a state-sponsored operation that used an AI agent to run reconnaissance, exploitation, and lateral movement at 80-90% autonomy, at request rates the report described as "physically impossible" for human operators. Around the same time, research kept reaching the same conclusion: autonomous attack tooling now operates tens of times faster than a human SOC analyst can react.

That asymmetry is the whole problem. Attackers already have autonomous agents. Defenders are still tabbing between a SIEM, a ticketing system, and a terminal, manually correlating three different log sources while the clock runs.

We wanted to find out: if an attacker can run an agent loop against you, can a defender run one back — locally, transparently, and safely enough that a SOC would actually trust its output? That question became Paladin: a fully local, end-to-end autonomous security system that doesn't just detect — it investigates, self-checks its own findings, and proposes (or executes) a response, with a forensic layer built specifically to keep a local LLM honest when it's reasoning about disk and memory evidence.

What it does

Paladin is a six-layer autonomous security pipeline that runs entirely on local infrastructure (Ollama + Qwen, Neo4j, PostgreSQL, Docker):

Layer 1 (Ingestion): simulates corporate telemetry across four channels (system/auth logs, emails, chat messages, and call transcripts), covering 14 built-in attack scenarios (brute force, data exfiltration, phishing, insider conspiracy, and more).

Layer 2 (SAP Core): a spaCy/Natasha NLP pipeline parses every event and maps entities into a Neo4j knowledge graph (16 node labels, 25+ relationship types). A correlation engine then runs six weighted detectors (brute force, mass download, cross-channel correlation, clearance violations, off-hours activity, external sensitive comms) to produce a severity score.

Layer 3 (LLM Triage): a local Qwen 3.5:9B model reviews each incident's graph context and proposes a response action (NOTIFY, FLAG, ISOLATE, BLOCK, BLOCK_IP, QUARANTINE_FILE, REVOKE_SESSIONS). A multi-signal Action Verifier gates every proposal against severity policy and quality metrics before it can reach the graph.

Layer 4 (Autonomous Executor): a background service enforces approved/timed-out actions, with a toggle between Autonomous mode (instant enforcement) and Human-in-the-Loop mode (operator review window).

Layer 5 (Forensic Investigation, v2.0): when an incident crosses a severity threshold (≥0.65), Paladin spins up an isolated, network-disconnected Docker sandbox, mounts the evidence read-only, and lets the LLM plan and execute a real forensic investigation through a custom MCP server exposing 11 typed, read-only SIFT functions (parse_mft, extract_registry_hive, analyze_process_list, scan_network_connections, extract_loaded_modules, compute_hash, and more). Every finding is checked by a Self-Correction Loop for contradictions, cross-checked by a Correlation Engine for cross-source discrepancies (ghost processes, temporal paradoxes), and verified word-for-word against raw tool output by a Hallucination Tracker, which produces a final Accuracy Report.

Layer 6 (Dashboard): a FastAPI + WebSocket SOC dashboard over TLS/JWT, with live incident feeds, a graph viewer, and forensic plan/finding/accuracy endpoints.
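To make the Layer 3 gate concrete, here is a minimal sketch of how an Action Verifier could check a proposed action against a severity policy and a couple of quality signals. The action names come from the description above; the minimum-severity values, the `Proposal` fields, and the confidence threshold are assumptions made for illustration, not Paladin's actual policy.

```python
from dataclasses import dataclass

# Action names are from the write-up; the minimum severities are hypothetical.
MIN_SEVERITY = {
    "NOTIFY": 0.0,
    "FLAG": 0.3,
    "REVOKE_SESSIONS": 0.5,
    "QUARANTINE_FILE": 0.6,
    "BLOCK_IP": 0.6,
    "ISOLATE": 0.7,
    "BLOCK": 0.8,
}


@dataclass(frozen=True)
class Proposal:
    action: str
    severity: float      # incident severity score in [0, 1]
    confidence: float    # model's self-reported confidence in [0, 1]
    evidence_count: int  # number of graph facts cited in support


def verify(p: Proposal, min_confidence: float = 0.6) -> tuple[bool, str]:
    """Return (approved, reason). Anything not explicitly allowed is rejected."""
    if p.action not in MIN_SEVERITY:
        return False, "unknown action"
    if p.severity < MIN_SEVERITY[p.action]:
        return False, "action too strong for severity"
    if p.confidence < min_confidence:
        return False, "confidence below threshold"
    if p.evidence_count == 0:
        return False, "no supporting evidence"
    return True, "ok"
```

Under these assumed thresholds, a BLOCK proposed for an incident scored 0.5 is rejected before it reaches the executor, while a NOTIFY on the same incident passes.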

In our test run against a real Windows 7 disk and memory image, Paladin's forensic layer independently planned and executed a 5-step investigation that identified an actual Zeus/Zbot banking trojan (sdra64.exe), its registry persistence mechanism, an injected memory module, and an active C2 connection to an external IP — with 4 of 5 findings verified by exact match against raw tool output, and the one unverified finding correctly flagged rather than silently trusted.
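A minimal sketch of the kind of quote check that produces these verification counts, following the three tiers the Hallucination Tracker applies (exact match, case-insensitive match, then at least 70% overlap). Simple token overlap stands in for the "semantic overlap" measure here, which is an assumption; the function names and return labels are also hypothetical.

```python
import re


def _tokens(text: str) -> list[str]:
    # Lowercased word-like tokens; dots are kept so "sdra64.exe" stays whole.
    return re.findall(r"[\w.]+", text.lower())


def match_quote(quote: str, raw_output: str, min_overlap: float = 0.70) -> str:
    """Classify a cited quote as 'exact', 'case_insensitive', 'overlap', or 'unverified'."""
    if not quote.strip():
        return "unverified"  # an empty quote proves nothing
    if quote in raw_output:
        return "exact"
    if quote.lower() in raw_output.lower():
        return "case_insensitive"
    quote_tokens = _tokens(quote)
    if not quote_tokens:
        return "unverified"
    output_tokens = set(_tokens(raw_output))
    shared = sum(1 for token in quote_tokens if token in output_tokens)
    if shared / len(quote_tokens) >= min_overlap:
        return "overlap"
    return "unverified"


def unverified_rate(results: list[str]) -> float:
    """Share of findings whose quote could not be matched at any tier."""
    return results.count("unverified") / len(results) if results else 0.0
```

The tiers run from strictest to loosest, so a report can say not only whether a quote was found but how loosely it had to be matched.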

How we built it

We chose the Custom MCP Server architectural pattern because it's the only one of the four where evidence safety is enforced by the absence of capability, not by the model choosing to behave. The LLM agent never gets a shell. It gets 11 strictly-typed Pydantic functions, and that's the entire attack surface.
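As a rough illustration of "safety by absence of capability", the sketch below exposes a single typed, read-only tool through a registry and rejects any name that is not registered. It uses standard-library dataclasses as a stand-in for the Pydantic models mentioned above, and the `ComputeHashArgs` fields, the `TOOLS` registry, and `dispatch` are hypothetical names rather than Paladin's actual code.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class ComputeHashArgs:
    path: str
    algorithm: str = "sha256"

    def __post_init__(self) -> None:
        # Validate at construction time, as a typed model would.
        if self.algorithm not in {"md5", "sha1", "sha256"}:
            raise ValueError(f"unsupported algorithm: {self.algorithm}")


def compute_hash(args: ComputeHashArgs) -> str:
    digest = hashlib.new(args.algorithm)
    with Path(args.path).open("rb") as fh:  # opened read-only
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


# The registry is the whole tool surface: a capability that is not listed
# here cannot be invoked, whatever the model asks for.
TOOLS: dict[str, tuple[type, Callable]] = {
    "compute_hash": (ComputeHashArgs, compute_hash),
}


def dispatch(name: str, raw_args: dict) -> str:
    if name not in TOOLS:
        raise KeyError(f"no such tool: {name}")
    args_type, func = TOOLS[name]
    # Unexpected or missing fields raise TypeError before the tool runs.
    return func(args_type(**raw_args))
```

Reviewing a surface like this means reading the registry and each function body, which is a bounded task in a way that auditing a prompt is not.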

Defense-in-depth across four boundaries:

ForensicActionVerifier: a programmatic SAFE / REQUIRES_APPROVAL / FORBIDDEN classifier with shell-injection (; && | `` $()) and path-traversal detection, default-deny.

MCP Server API surface: no rm, dd, chmod, or mkfs, and no shell access, period. Only 11 read-only functions exist in the codebase.

Sandbox filesystem: /evidence mounted :ro, container itself --read-only, tmpfs for scratch space.

Sandbox network: networking disabled (network: none), plus --cap-drop ALL and --pids-limit 256. No DNS, no C2, no exfiltration path even if the model tried.
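The first boundary can be sketched as a small default-deny classifier. The three verdicts and the function names come from the text; which functions fall into which tier, the `/evidence` root check, and the convention that path arguments have names ending in `path` are assumptions made for this example.

```python
import posixpath
import re
from enum import Enum


class Verdict(Enum):
    SAFE = "SAFE"
    REQUIRES_APPROVAL = "REQUIRES_APPROVAL"
    FORBIDDEN = "FORBIDDEN"


# Function names are from the write-up; the tier assignment is hypothetical.
SAFE_FUNCTIONS = {"parse_mft", "analyze_process_list", "scan_network_connections", "compute_hash"}
APPROVAL_FUNCTIONS = {"extract_registry_hive", "extract_loaded_modules"}

# Shell metacharacters listed in the write-up: ; && | ` $(
SHELL_META = re.compile(r";|&&|\||`|\$\(")
EVIDENCE_ROOT = "/evidence"


def _escapes_evidence(path: str) -> bool:
    # Collapse ".." segments, then require the result to stay under the root.
    normalized = posixpath.normpath(path)
    return normalized != EVIDENCE_ROOT and not normalized.startswith(EVIDENCE_ROOT + "/")


def classify(function: str, args: dict[str, str]) -> Verdict:
    for name, value in args.items():
        if SHELL_META.search(value):
            return Verdict.FORBIDDEN
        if name.endswith("path") and _escapes_evidence(value):
            return Verdict.FORBIDDEN
    if function in SAFE_FUNCTIONS:
        return Verdict.SAFE
    if function in APPROVAL_FUNCTIONS:
        return Verdict.REQUIRES_APPROVAL
    return Verdict.FORBIDDEN  # default-deny: unknown functions never run
```

The argument checks run before the allow-list lookup, so a known-safe function called with a hostile argument is still refused.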

On top of that sandbox, the analytical pipeline:

SAP Core (spaCy/Natasha) does morpho-semantic parsing of raw events and feeds a Neo4j graph via an async client with 30+ CRUD methods.

Correlator runs six independent pattern detectors with additive scoring, feeding an Incident Manager that routes by severity (Tool Mode below 0.65, full Pipeline Mode at or above it, auto-escalation at 0.70 and 0.90).

ForensicPlanManager prompts Qwen 3.5:9B to generate an investigation plan (TodoItems), executes each step through the MCP server, and runs a self-correction pass after every finding.

Correlation Engine is deliberately LLM-free: ghost-process, temporal-paradox, registry-mismatch, and invisible-connection checks are deterministic graph queries, so they can't hallucinate.

Hallucination Tracker re-reads the raw MCP tool output and checks every evidence_quote the LLM cited via exact match, case-insensitive match, or ≥70% semantic overlap, then produces a per-plan accuracy report.

Everything is stored hot in Neo4j and archived cold (tool execution logs, accuracy metrics) in PostgreSQL, both behind internal mTLS via a local CA. The dashboard sits behind JWT + self-signed TLS.
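The additive scoring and severity routing described above can be sketched in a few lines. The detector names and the 0.65, 0.70, and 0.90 thresholds are from the text; the individual weights are placeholder values, and representing "auto-escalation" as an integer level is an assumption.

```python
# Detector names are from the write-up; the weights are hypothetical.
WEIGHTS = {
    "brute_force": 0.30,
    "mass_download": 0.25,
    "cross_channel": 0.20,
    "clearance_violation": 0.15,
    "off_hours": 0.10,
    "external_sensitive_comms": 0.20,
}


def severity(fired: set[str]) -> float:
    """Sum the weights of the detectors that fired, capped at 1.0."""
    total = sum(WEIGHTS[name] for name in sorted(fired))
    # Rounding keeps float noise from nudging a score across a threshold.
    return round(min(1.0, total), 4)


def route(score: float) -> dict:
    """Pick the handling mode and escalation level for a severity score."""
    if score >= 0.90:
        escalation = 2
    elif score >= 0.70:
        escalation = 1
    else:
        escalation = 0
    return {
        "mode": "pipeline" if score >= 0.65 else "tool",
        "escalation": escalation,
    }
```

With these placeholder weights, brute force plus mass download plus off-hours activity sums to exactly 0.65, which is enough to trigger Pipeline Mode but not yet an escalation.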

Stack: Python, FastAPI, Ollama (Qwen 3.5:9B), Neo4j 5.18 + APOC, PostgreSQL 16, Docker, Volatility3, analyzeMFT, regripper, tshark, hindsight — all running locally, no cloud dependency.

Challenges we ran into

Local LLM forensic planning is slow and occasionally inconsistent. A 9B model generating a 5-step investigation plan and then reasoning over each tool's output took roughly 1.5–2 minutes per step end-to-end. We had to design the pipeline to be asynchronous and resumable rather than assuming fast turnaround, and to cap iterations so a confused model can't loop forever.

Hallucination is real, even with typed tools. In our test run, 1 of 5 findings could not be exactly or semantically matched back to raw tool output. Rather than treat that as a failure to hide, we built the Hallucination Tracker to surface it explicitly as a 20% unverified rate, and made that number a first-class part of the system's own self-report.

Self-correction vs. infinite loops. Letting the model re-check its own findings for contradictions is powerful, but without a hard iteration cap and a COMPLETED_WITH_GAPS terminal state, a model that's "not quite sure" can re-run forever. We had to add explicit version snapshots (HAS_VERSION) so every re-check is auditable rather than overwriting history.

Severity routing tuning. Getting the six correlation detectors' weights to produce sensible escalation chains (brute force → account lockout → mass download, each at increasing severity) took several iterations: too aggressive and everything is CRITICAL; too lax and a real chain doesn't escalate.

Reporting/aggregation bugs. Our own showcase script revealed mismatches between live pipeline logs and final summary counters (e.g., a finding count of 0 when 5 findings existed). It was a useful reminder that the analytical core being correct doesn't mean the reporting layer is, and that judges (and analysts) will trust the dashboard, so it has to be exactly right.
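The iteration cap and the COMPLETED_WITH_GAPS terminal state can be illustrated with a small bounded loop. This is a sketch under stated assumptions: the `Finding` class, the `COMPLETED` status name, and the two callbacks are hypothetical, and an in-memory list stands in for the HAS_VERSION snapshots that Paladin keeps in its graph.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Finding:
    text: str
    versions: list[str] = field(default_factory=list)  # earlier texts, oldest first


def self_correct(
    finding: Finding,
    find_contradiction: Callable[[str], Optional[str]],
    revise: Callable[[str, str], str],
    max_iterations: int = 3,
) -> str:
    """Revise a finding until it is consistent or the iteration cap is hit."""
    for _ in range(max_iterations):
        problem = find_contradiction(finding.text)
        if problem is None:
            return "COMPLETED"
        # Snapshot before revising so history is never overwritten.
        finding.versions.append(finding.text)
        finding.text = revise(finding.text, problem)
    # Cap reached: check once more, then stop regardless of the outcome.
    if find_contradiction(finding.text) is None:
        return "COMPLETED"
    return "COMPLETED_WITH_GAPS"
```

Because the loop always terminates after `max_iterations` revisions, a model that never converges yields an auditable finding marked as incomplete instead of an endless re-run.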

Accomplishments that we're proud of

A fully local autonomous IR stack: no API keys, no cloud LLM calls, runs on a single GPU box.

The forensic pipeline correctly identified a real Zeus/Zbot infection chain (dropper, registry persistence, process injection, and live C2) from raw disk/memory artifacts, with confidence scores in the 85–95% range that actually correlated with verification outcomes.

A Correlation Engine that catches the model's own contradictions without needing another LLM call. Ghost processes and temporal paradoxes were flagged automatically across five findings.

A Hallucination Tracker that doesn't grade on a curve. It reported a non-zero hallucination rate on its own output, which we think is exactly the kind of honesty Stage One judges are looking for.

An architecture where the security boundary isn't a prompt instruction. It's the fact that destructive functions don't exist in the MCP server's code at all.

What we learned

Architectural guardrails beat prompt guardrails by a wide margin, and they're cheaper to verify. Reviewing 11 typed Pydantic functions for missing capabilities is a finite task; auditing a prompt for every way a model might be talked out of "please don't delete files" is not.

An agent that checks its own work is more useful than an agent that's "more accurate." The single most valuable signal in our pipeline wasn't a finding. It was the contradiction the Correlation Engine raised between two findings, because that's the exact moment a human analyst should look closer.

Local models change the constraints, not just the cost. A 9B model is good enough to plan a forensic investigation and write a defensible finding, but it's slow enough that the whole system has to be designed around asynchronous, queryable state (Neo4j + Postgres) rather than a single long-lived chat session.

Severity scoring is a product decision, not just a math problem. The "right" thresholds depend on what an autonomous executor is allowed to do at each level, so we had to design the action-severity mapping and the scoring weights together, not in sequence.

What's next for Paladin Autonomous Security System

Expand the MCP toolset beyond the current 11 functions, particularly parse_pcap and extract_browser_artifacts workflows for network-centric investigations, and additional Volatility3 plugins for deeper memory analysis.

Multi-agent decomposition of the forensic layer (per the Multi-Agent Framework pattern): a memory-analysis agent, a disk-timeline agent, and a synthesis agent, so no single context window has to hold an entire case, and so agent-to-agent messages become an additional, structured audit trail.

Benchmark against ground-truth case data to turn our current single-run accuracy report into a repeatable accuracy benchmark, tracking hallucination rate and confidence-accuracy correlation across many cases over time.

Tune for SIFT Workstation deployment: package the forensic layer to run directly against SIFT's tool library on a downloadable SIFT OVA, so practitioners can point Paladin at their own case data with no additional setup.

Close the reporting-layer gaps our own showcase surfaced: unify the accuracy/hallucination metrics shown across the pipeline log, dashboard, and accuracy report so the numbers an analyst sees are always the same numbers the system computed.

Operator feedback loop: let a human analyst's accept/reject of a finding feed back into future severity scoring and self-correction decisions, closing the loop between Layer 3's autonomous actions and Layer 5's forensic verification.

Built With

Updates

Artikul T started this project — Jun 14, 2026 07:41 AM EDT
