Vigil — Autonomous SOC Pipeline

Brief Description (~400 words)

The Problem

Security Operations Centers face an unsustainable workload: analysts receive over 4,000 alerts daily, each requiring roughly 25 minutes of manual investigation, while more than 95% turn out to be false positives. The handoff between security investigation and operational remediation adds another 15–45 minutes of context-switching per incident. The result is analyst burnout, missed threats, and a mean time to resolution measured in hours — not minutes. Vigil eliminates this gap entirely.

What Vigil Does

Vigil is an autonomous SOC platform that deploys 11 specialized AI agents — Coordinator, Triage, Investigator, Threat Hunter, Sentinel, Commander, Executor, Verifier, Analyst, Reporter, and Chat — orchestrated in a hub-and-spoke topology over the A2A protocol. From the moment an alert fires to full resolution, Vigil detects in under 30 seconds, investigates in under 60 seconds, and remediates in under 3 minutes. Incidents that previously consumed hours of human effort are resolved autonomously, with a human-in-the-loop approval gate for high-impact actions.

Elastic Features Used

Vigil is built entirely on Elasticsearch and Agent Builder. Each agent is defined through Kibana Agent Builder with scoped tool access and a dedicated system prompt. The platform uses 29 tools: 21 parameterized ES|QL tools for analytical reasoning and 8 Search tools for retrieval. The ES|QL tools cover attack chain tracing, MITRE ATT&CK mapping, and change correlation via LOOKUP JOIN, which bridges deployment events from GitHub with error spikes to pinpoint the exact commit that caused an outage. The Search tools span keyword, hybrid (BM25 + kNN with RRF), and pure vector retrieval across runbooks, threat intelligence, incident history, and asset inventories, all powered by 1024-dimensional embeddings through Elastic's inference endpoint. 7 Elastic Workflows handle actuation (blocking IPs through Cloudflare, suspending accounts via Okta, rolling back Kubernetes deployments, sending Slack notifications with interactive approval buttons, and creating Jira tickets), all with immutable audit logging.

What I Liked and Found Challenging

I loved how ES|QL parameterized queries made it possible to build safe, injection-proof analytical tools that agents could call with dynamic inputs from untrusted alert data. The composability felt natural.

Hybrid search with RRF was a highlight — combining lexical precision with semantic recall for runbook retrieval meant the Commander agent could find relevant remediation procedures even when the terminology didn't match exactly.

The biggest challenge was designing the reflection loop. When the Verifier detects that health metrics haven't recovered post-remediation, the Coordinator re-enters an investigating state for up to 3 cycles before escalating. Getting the state machine transitions, concurrency control with _seq_no, and deadline budgets right across all 11 agents required careful orchestration — but it's what makes Vigil truly autonomous rather than just automated.

An unexpected highlight was the self-improving triage system. After every resolved incident, the Analyst agent computes F1 scores and confusion matrices comparing triage predictions against actual outcomes, then auto-calibrates the priority scoring weights. Over time, Vigil literally gets smarter — fewer false positives slip through, and genuine threats are prioritized faster. Building the feedback loop between resolution outcomes and triage accuracy felt like closing the final gap between "automated" and "autonomous."


Project Story

Inspiration

Every 19 seconds, a SOC analyst clicks into another alert they already know is probably nothing. The numbers behind modern Security Operations Centers are brutal: 4,484 alerts per day, each requiring ~25 minutes of investigation. With a false positive rate exceeding 95%, the vast majority of that effort is wasted. Burnout pushes annual analyst turnover above 65%. And when fatigue causes the team to miss one of the 5% of alerts that actually matter, the cost is $5,600 per minute of downtime or, worse, a breach that drifts unnoticed for weeks.

But the deeper problem isn't volume — it's fragmentation. Detection lives in the SIEM. Investigation happens in analyst notebooks. Remediation is a ticket tossed to a different team. Verification is "did anyone complain again?" There is no closed loop. Context is lost at every handoff. And nobody learns from the outcome.

We asked: what if the entire SOC workflow — triage, investigate, plan, fix, verify, learn — could be orchestrated by a system of specialized AI agents, all backed by the same Elasticsearch data analysts already rely on? Not a chatbot. Not a copilot. A fully autonomous pipeline that resolves incidents end-to-end, with human oversight where it counts and self-correction when things go wrong.

That's Vigil.


What it does

Vigil is an autonomous Security Operations Center built on Elasticsearch Agent Builder. It deploys 11 specialized AI agents in a hub-and-spoke topology, coordinated through an A2A (Agent-to-Agent) protocol with typed contracts and self-validation at every boundary.

When a security alert or operational anomaly arrives, Vigil's pipeline engages:

1. Triage — The Triage agent enriches the alert with correlated events, asset criticality, and historical false positive rates via three ES|QL tools, then computes a composite priority score:

$$P = (S_{\text{threat}} \times 0.3) + (A_{\text{crit}} \times 0.3) + (\sigma(r) \times 0.25) + ((1 - f) \times 0.15)$$

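where $S_{\text{threat}}$ is the threat score, $A_{\text{crit}}$ is the asset criticality, $\sigma(r) = \frac{1}{1 + e^{-0.07(r - 40)}}$ is a sigmoid normalization of the corroboration signal $r$, and $f$ is the historical false positive rate. The four weights sum to 1. Alerts scoring below 0.4 are auto-suppressed; a score above 0.7 triggers immediate investigation.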

2. Investigation — The Investigator traces multi-hop attack chains via ES|QL, maps to MITRE ATT&CK techniques through hybrid vector search (BM25 + kNN with Reciprocal Rank Fusion), calculates blast radius, and — for operational incidents — uses ES|QL LOOKUP JOIN to correlate deployment events from GitHub with error spikes, identifying the exact commit, author, and PR that caused the outage.

3. Threat Hunting — When the Investigator finds indicators of compromise, the Threat Hunter sweeps the entire environment for lateral movement, behavioral anomalies, and additional IoC matches.

4. Remediation Planning — The Commander agent searches a runbook knowledge base using dense vector search (1024-dim embeddings, int8_hnsw quantization) and builds an ordered action plan with explicit success criteria and approval requirements.

5. Approval Gates — Critical actions (IP blocking, credential rotation, production rollback) trigger interactive Slack approval buttons. The Executor polls for human decisions with configurable timeouts and auto-escalation to PagerDuty.

6. Execution — Actions are dispatched through 7 Elastic Workflows to real integration targets: Cloudflare WAF (IP blocking), Okta (user suspension), Kubernetes (rollback, scale, restart), Jira (ticketing), Slack (notification), and PagerDuty (paging). Every action is logged to an immutable audit trail.

7. Verification — The Verifier waits for metric stabilization, then runs a dual-comparison health check via ES|QL: each success criterion must pass BOTH the Commander's threshold AND a statistical baseline verdict (current value within 2 standard deviations of the 7-day rolling mean).

8. Self-Correction (Reflection Loop) — If verification fails, Vigil enters a reflection loop: re-investigate with the failure analysis as new context, re-plan a different approach, re-execute, re-verify — up to 3 iterations before escalating to a human. The system doesn't just try once and give up. It adapts.
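
The composite priority score from the Triage step can be sketched in a few lines of TypeScript. This is a minimal illustration, not Vigil's actual code: the field names, the assumption that the threat score and asset criticality are normalized to [0, 1], and the "queue" outcome for scores between 0.4 and 0.7 are all hypothetical, since the description above only specifies the two outer thresholds.

```typescript
// Inputs to the composite priority score. Field names are illustrative.
interface TriageSignals {
  threatScore: number;       // S_threat, assumed normalized to [0, 1]
  assetCriticality: number;  // A_crit, assumed normalized to [0, 1]
  corroboration: number;     // r, the raw corroboration signal
  falsePositiveRate: number; // f, historical false positive rate in [0, 1]
}

type TriageDecision = "suppress" | "queue" | "investigate";

// Sigmoid normalization: 1 / (1 + e^(-0.07 * (r - 40))).
function sigmoid(r: number): number {
  return 1 / (1 + Math.exp(-0.07 * (r - 40)));
}

// P = 0.3 * S_threat + 0.3 * A_crit + 0.25 * sigmoid(r) + 0.15 * (1 - f)
function priorityScore(s: TriageSignals): number {
  return (
    s.threatScore * 0.3 +
    s.assetCriticality * 0.3 +
    sigmoid(s.corroboration) * 0.25 +
    (1 - s.falsePositiveRate) * 0.15
  );
}

function decide(score: number): TriageDecision {
  if (score < 0.4) return "suppress";
  if (score > 0.7) return "investigate";
  // Assumption: the middle band is not specified in the writeup.
  return "queue";
}
```

Because the sigmoid is centered at r = 40, a corroboration signal of exactly 40 contributes half of its 0.25 weight.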

The entire pipeline is governed by a 12-state incident state machine with guard conditions on every transition. Post-resolution, the Analyst agent auto-calibrates triage weights against actual outcomes, generates new runbooks from successful ad-hoc remediations (with vector-based deduplication), and tunes per-service anomaly thresholds. The Reporter agent generates compliance evidence mapped to SOC 2, ISO 27001, and GDPR Article 33 controls.

Result: sub-3-minute mean time to resolution, versus the industry average of 45–90 minutes.


How we built it

Vigil's architecture rests on four Elastic pillars:

Agent Builder — 11 agents with ReAct reasoning, each with a dedicated system prompt, scoped tool access, and least-privilege API keys. The Coordinator acts as the hub, delegating to spoke agents via A2A messages with typed request/response contracts. Local deterministic handlers ensure contract compliance by default, with an opt-in path to Agent Builder's LLM-powered A2A when reasoning flexibility is needed.

ES|QL — 21 parameterized ES|QL tools covering alert enrichment, attack chain tracing, change correlation (using LOOKUP JOIN to bridge github-events-* with vigil-metrics-*), health monitoring, blast radius assessment, triage calibration, and 5 compliance report generators. Tools are defined as JSON descriptors and executed through a shared engine that handles parameter expansion, injection prevention, and column-index mapping (we never assume ES|QL column order).

Search (Dense Vector) — 8 search tools using BM25, kNN, and hybrid retrieval strategies across runbooks, threat intelligence, incident similarity, MITRE ATT&CK mappings, and baselines. A dedicated vigil-embedding-model inference endpoint powers 4 ingest pipelines that embed runbook content, threat intel descriptions, investigation summaries, and root cause analyses into 1024-dimensional vectors with int8_hnsw quantization.

Elastic Workflows — 7 YAML-defined automation pipelines for containment, remediation, notification, ticketing, human approval, reporting, and report delivery. Each workflow routes to real external APIs (Cloudflare, Okta, Kubernetes, Slack, Jira, PagerDuty, GitHub) with circuit breakers and graceful degradation when integrations are unavailable.

The data layer uses 4 data streams (alerts, actions, metrics, GitHub events) with ILM policies and 10 standard indices with full mapping templates. Alert claiming uses a separate regular index because data streams are append-only — a constraint that shaped several architectural decisions.

Engineering patterns include: deadline racing (Promise.race with finally { clearTimeout }) so agents always return partial results rather than hanging, optimistic concurrency control with retry loops for state transitions, Promise.allSettled for parallel tool execution, progressive time-window expansion for ES|QL queries (1h → 6h → 24h), and self-validation before every A2A response.
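
The deadline-racing pattern mentioned above can be sketched as follows. This is an illustrative helper under stated assumptions: the name `withDeadline` and the `onTimeout` callback that supplies the partial result are hypothetical and not taken from Vigil's codebase.

```typescript
// Race a unit of work against a deadline. If the deadline wins, return a
// partial result from onTimeout() instead of hanging.
async function withDeadline<T>(
  work: Promise<T>,
  deadlineMs: number,
  onTimeout: () => T,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(onTimeout()), deadlineMs);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so a finished race leaves no pending timeout.
    clearTimeout(timer);
  }
}
```

Clearing the timer in `finally` matters on both paths: when the work wins, it prevents a stray timeout from firing later, and when the work rejects, the timer is still cleaned up before the error propagates.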

The frontend is a Next.js 16 + React 19 dashboard with keyboard-first UX (Cmd+K command palette, j/k navigation), real-time incident tracking, agent trace visualization via Cytoscape.js, and a demo mode that runs entirely on mock data.


Challenges we ran into

ES|QL LOOKUP JOIN timing. Our change correlation query — joining deployment events with error logs to answer "what deployed right before this service broke?" — was one of the most powerful tools but also the most fragile. Getting the time-gap arithmetic right (DATE_DIFF in seconds, filtered to positive values under a configurable max_gap_seconds) required multiple iterations and a two-query fallback path for environments where LOOKUP JOIN isn't yet available.
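
To make the time-gap arithmetic concrete, here is a rough sketch of what a client-side join in the two-query fallback path might look like. It assumes the two result sets have already been fetched separately; the type names, field names, and the choice to treat `maxGapSeconds` as an inclusive upper bound are assumptions for illustration, not the actual query logic.

```typescript
interface DeploymentEvent {
  service: string;
  commitSha: string;
  deployedAt: string; // ISO 8601 timestamp
}

interface ErrorSpike {
  service: string;
  spikeAt: string; // ISO 8601 timestamp
}

interface Correlation {
  service: string;
  commitSha: string;
  gapSeconds: number;
}

// For each error spike, find the closest deployment to the same service that
// happened before the spike and no more than maxGapSeconds earlier.
function correlateChanges(
  deployments: DeploymentEvent[],
  spikes: ErrorSpike[],
  maxGapSeconds: number,
): Correlation[] {
  const matches: Correlation[] = [];
  for (const spike of spikes) {
    let best: Correlation | undefined;
    for (const dep of deployments) {
      if (dep.service !== spike.service) continue;
      const gapSeconds = (Date.parse(spike.spikeAt) - Date.parse(dep.deployedAt)) / 1000;
      // Keep only positive gaps (deploy preceded the spike) within the window.
      if (gapSeconds <= 0 || gapSeconds > maxGapSeconds) continue;
      if (best === undefined || gapSeconds < best.gapSeconds) {
        best = { service: dep.service, commitSha: dep.commitSha, gapSeconds };
      }
    }
    if (best !== undefined) matches.push(best);
  }
  return matches;
}
```

The positive-gap filter is what excludes deployments that landed after the errors began, which is the mistake the text describes as easy to make.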

Data stream constraints. We initially tried client.update() on vigil-alerts-default and silently lost data. Data streams only accept op_type: 'create'. We had to architect a separate alert-claims index and use targeted write patterns — a design that is correct but non-obvious, and one that cost us a debugging session before we understood the root cause.

Verifier dual comparison. Early versions checked only whether post-remediation metrics met the Commander's thresholds. This passed too easily when metrics happened to be low rather than genuinely healthy. Adding baseline verdicts (current value within 2σ of the 7-day mean) via the health comparison ES|QL tool caught false positives in our own verification — ironic for a system designed to catch false positives in alerts.
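
The dual comparison can be illustrated with a short TypeScript sketch. The `Criterion` shape and the `comparator` field are hypothetical; only the two conditions themselves (the Commander's threshold and a value within 2 standard deviations of the 7-day mean) come from the description above.

```typescript
interface Criterion {
  metric: string;
  current: number;
  threshold: number;         // Commander's success threshold
  comparator: "lte" | "gte"; // hypothetical: direction of the threshold
  baselineMean: number;      // 7-day rolling mean
  baselineStdDev: number;    // 7-day rolling standard deviation
}

function passesThreshold(c: Criterion): boolean {
  return c.comparator === "lte" ? c.current <= c.threshold : c.current >= c.threshold;
}

// Baseline verdict: the current value lies within `sigmas` standard
// deviations of the rolling mean.
function withinBaseline(c: Criterion, sigmas = 2): boolean {
  return Math.abs(c.current - c.baselineMean) <= sigmas * c.baselineStdDev;
}

// A criterion passes only if BOTH checks pass; the incident is verified only
// if every criterion passes.
function verify(criteria: Criterion[]): { passed: boolean; failures: string[] } {
  const failures = criteria
    .filter((c) => !(passesThreshold(c) && withinBaseline(c)))
    .map((c) => c.metric);
  return { passed: failures.length === 0, failures };
}
```

For example, a latency of 2 ms against a threshold of 200 ms passes the threshold check, but if the 7-day mean is 120 ms with a standard deviation of 15 ms, the baseline verdict rejects it: the metric is abnormally low rather than healthy.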


Accomplishments that we're proud of

The reflection loop genuinely works. When verification fails, the pipeline re-investigates with the Verifier's failure analysis as new context. In our self-healing demo scenario, the first remediation (pod restart) fails because error rates remain elevated. The system re-investigates, discovers through LOOKUP JOIN that a connection leak was introduced two deploys ago, plans a completely different fix (pool resize + hotfix deployment), and verification passes on the second attempt. This isn't retry logic — it's cognitive self-correction.
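
The control flow of the reflection loop might look roughly like the sketch below. The step interface is hypothetical, and the sketch assumes the cap of 3 counts total attempts; the writeup does not say whether the first pass is included in that count.

```typescript
interface Verdict {
  passed: boolean;
  failureAnalysis?: string;
}

// Hypothetical step interface; each step would be an agent call in practice.
interface PipelineSteps<Plan> {
  investigate(priorFailures: string[]): Promise<string>;
  plan(findings: string): Promise<Plan>;
  execute(plan: Plan): Promise<void>;
  verify(plan: Plan): Promise<Verdict>;
}

interface Outcome {
  status: "resolved" | "escalated";
  attempts: number;
}

async function runWithReflection<Plan>(
  steps: PipelineSteps<Plan>,
  maxAttempts = 3,
): Promise<Outcome> {
  const priorFailures: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Each pass re-investigates with all earlier failure analyses as context.
    const findings = await steps.investigate(priorFailures);
    const plan = await steps.plan(findings);
    await steps.execute(plan);
    const verdict = await steps.verify(plan);
    if (verdict.passed) return { status: "resolved", attempts: attempt };
    priorFailures.push(verdict.failureAnalysis ?? "verification failed");
  }
  // Budget exhausted: hand off to a human.
  return { status: "escalated", attempts: maxAttempts };
}
```

What separates this from plain retry logic is that the failure analysis feeds back into `investigate`, so the second plan can differ from the first.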

29 production-quality Elastic tools. Each ES|QL query is parameterized, tested, and handles missing data gracefully. The column-index pattern (buildColIndex) ensures we never break when ES|QL column order changes — a small abstraction that prevented an entire class of bugs.
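
As a rough illustration of the column-index pattern, the sketch below reads ES|QL results by column name rather than by position. The response shape (`columns` with `name` and `type`, plus row arrays in `values`) follows the ES|QL query API; the signature of `buildColIndex` and the `readRows` helper are assumptions, since the writeup only names the function.

```typescript
interface EsqlResponse {
  columns: { name: string; type: string }[];
  values: unknown[][];
}

// Map each column name to its position in the row arrays.
function buildColIndex(columns: EsqlResponse["columns"]): Map<string, number> {
  return new Map<string, number>(
    columns.map((col, i): [string, number] => [col.name, i]),
  );
}

// Hypothetical helper: project rows into objects keyed by column name.
// Columns that are missing from the response come back as null.
function readRows(res: EsqlResponse, fields: string[]): Record<string, unknown>[] {
  const idx = buildColIndex(res.columns);
  return res.values.map((row) => {
    const out: Record<string, unknown> = {};
    for (const field of fields) {
      const pos = idx.get(field);
      out[field] = pos === undefined ? null : (row[pos] ?? null);
    }
    return out;
  });
}
```

Two responses that carry the same data in a different column order produce identical objects, so downstream code never depends on positional access.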

Real integrations, not mocks-only. Vigil detects credential availability at runtime and routes to real Cloudflare WAF rules, real Okta user suspensions, real Kubernetes rollbacks — or gracefully falls back to mock mode. The approval flow chains through Slack interactive buttons with genuine human-in-the-loop decisions.

A self-improving knowledge base. When the Commander builds a novel remediation (no matching runbook) and the Verifier confirms first-attempt success, the Analyst auto-generates a new runbook with vector embeddings and deduplication. The next similar incident gets matched via hybrid search. The system literally gets smarter with every resolved incident.

53 test files covering unit, integration, agent behavior, and pipeline orchestration — including deadline injection for deterministic timing tests and in-memory Elasticsearch mocks that simulate _seq_no versioning and bulk operations.


What we learned

Contracts are non-negotiable in multi-agent systems. Without typed request/response contracts and self-validation at every agent boundary, pipelines produce subtle data corruption that compounds through each stage. Our ContractValidationError with field-level error arrays caught dozens of integration bugs during development that would have been invisible in production.
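
A minimal sketch of self-validation at an agent boundary is shown below. `ContractValidationError` is named in the writeup, but its constructor shape, the `TriageResponse` contract, and its fields are hypothetical and chosen only to show field-level error collection.

```typescript
interface FieldError {
  field: string;
  message: string;
}

class ContractValidationError extends Error {
  readonly contract: string;
  readonly errors: FieldError[];

  constructor(contract: string, errors: FieldError[]) {
    super(`${contract} failed validation: ${errors.map((e) => e.field).join(", ")}`);
    this.name = "ContractValidationError";
    this.contract = contract;
    this.errors = errors;
  }
}

// Hypothetical contract for a Triage agent response.
interface TriageResponse {
  incidentId: string;
  priorityScore: number;
  disposition: "suppress" | "investigate";
}

// Collect every field-level problem before throwing, so a single failure
// report describes the whole payload.
function validateTriageResponse(payload: unknown): TriageResponse {
  const p = (payload ?? {}) as Record<string, unknown>;
  const incidentId = p["incidentId"];
  const priorityScore = p["priorityScore"];
  const disposition = p["disposition"];
  const errors: FieldError[] = [];

  if (typeof incidentId !== "string" || incidentId.length === 0) {
    errors.push({ field: "incidentId", message: "must be a non-empty string" });
  }
  if (typeof priorityScore !== "number" || !(priorityScore >= 0 && priorityScore <= 1)) {
    errors.push({ field: "priorityScore", message: "must be a number in [0, 1]" });
  }
  if (disposition !== "suppress" && disposition !== "investigate") {
    errors.push({ field: "disposition", message: "must be 'suppress' or 'investigate'" });
  }
  if (errors.length > 0) throw new ContractValidationError("TriageResponse", errors);
  return p as unknown as TriageResponse;
}
```

Running the validator on the sending side before the A2A response leaves the agent keeps a malformed payload from propagating into later stages.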

ES|QL is the right abstraction for agent tools. Parameterized queries in JSON descriptors, executed through a shared engine, give you the composability of an API with the analytical power of a query language. The shift from "agents call APIs" to "agents run queries" was the single biggest insight that shaped Vigil's architecture.

Guard conditions make state machines trustworthy. A state machine without guards is just a graph. With guards — suppression thresholds, approval requirements, reflection limits, verifier verdicts — it becomes a system you can reason about, audit, and explain to a compliance officer. Every transition in Vigil has a documented reason.
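
To show what a guarded transition table can look like, here is a small sketch covering a subset of transitions. The state names and context fields are hypothetical (the writeup does not list the 12 states); the guards mirror the ones named above: suppression threshold, approval requirement, reflection limit, and verifier verdict.

```typescript
// Hypothetical subset of states; the real machine has 12.
type State =
  | "triaged" | "suppressed" | "investigating" | "awaiting_approval"
  | "executing" | "verifying" | "resolved" | "escalated";

interface IncidentContext {
  priorityScore: number;
  approved: boolean;
  verifierPassed: boolean;
  reflectionCount: number;
}

interface Transition {
  from: State;
  to: State;
  reason: string; // documented reason, recorded for audit
  guard: (ctx: IncidentContext) => boolean;
}

const MAX_REFLECTIONS = 3;

const TRANSITIONS: Transition[] = [
  { from: "triaged", to: "suppressed", reason: "score below suppression threshold", guard: (c) => c.priorityScore < 0.4 },
  { from: "triaged", to: "investigating", reason: "score above investigation threshold", guard: (c) => c.priorityScore > 0.7 },
  { from: "awaiting_approval", to: "executing", reason: "human approved the action", guard: (c) => c.approved },
  { from: "verifying", to: "resolved", reason: "verifier verdict passed", guard: (c) => c.verifierPassed },
  { from: "verifying", to: "investigating", reason: "verification failed, reflection budget remains", guard: (c) => !c.verifierPassed && c.reflectionCount < MAX_REFLECTIONS },
  { from: "verifying", to: "escalated", reason: "verification failed, reflection budget exhausted", guard: (c) => !c.verifierPassed && c.reflectionCount >= MAX_REFLECTIONS },
];

// A transition is allowed only if it is defined AND its guard accepts the context.
function tryTransition(from: State, to: State, ctx: IncidentContext): { allowed: boolean; reason: string } {
  const edge = TRANSITIONS.find((t) => t.from === from && t.to === to);
  if (edge === undefined) return { allowed: false, reason: `no transition from ${from} to ${to}` };
  if (!edge.guard(ctx)) return { allowed: false, reason: `guard rejected: ${edge.reason}` };
  return { allowed: true, reason: edge.reason };
}
```

Keeping the reason next to the guard means every accepted or rejected transition can be logged with an explanation a reviewer can read.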

Graceful degradation is the line between demo and production. Promise.allSettled for parallel tool execution, sensible defaults when enrichment fails, and fire-and-forget for non-critical side effects mean the pipeline never halts on a single tool failure. In a system with 29 tools and 7 external integrations, something will be unavailable — and the agents need to keep working.
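
The parallel-execution pattern can be sketched with `Promise.allSettled` as below. The `ToolCall` shape and the per-tool `fallback` value are assumptions used to illustrate "sensible defaults when enrichment fails"; they are not Vigil's actual types.

```typescript
interface ToolCall<T> {
  name: string;
  run: () => Promise<T>;
  fallback: T; // default used when the tool fails
}

interface ToolBatchResult<T> {
  results: Record<string, T>;
  degraded: string[]; // names of tools that fell back to defaults
}

// Run all tools in parallel; a failing tool degrades to its fallback
// instead of failing the whole batch.
async function runTools<T>(calls: ToolCall<T>[]): Promise<ToolBatchResult<T>> {
  // Wrapping in an async arrow also captures synchronous throws from run().
  const settled = await Promise.allSettled(calls.map(async (call) => call.run()));
  const results: Record<string, T> = {};
  const degraded: string[] = [];
  settled.forEach((outcome, i) => {
    const call = calls[i];
    if (outcome.status === "fulfilled") {
      results[call.name] = outcome.value;
    } else {
      results[call.name] = call.fallback;
      degraded.push(call.name);
    }
  });
  return { results, degraded };
}
```

Returning the list of degraded tools alongside the results lets the caller proceed while still recording which inputs were defaults.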


What's next for Vigil

Adaptive triage weights. The Analyst agent already calibrates scoring accuracy against actual outcomes. The next step is closing the loop so calibration results automatically update the priority formula coefficients — making triage accuracy a continuously improving metric.

MITRE ATT&CK coverage mapping. Correlating Vigil's detection and response history against the full ATT&CK matrix to identify and prioritize coverage gaps — turning incident data into a strategic security roadmap.

Interactive reflection. The Chat agent already supports natural-language queries via Kibana. Extending this to allow analysts to guide the reflection loop mid-flight — "try rolling back to the previous deployment instead" — would blend autonomous capability with human domain expertise.

Federated runbook exchange. Multiple Vigil instances sharing anonymized remediation patterns across organizations, so every participant's knowledge base grows from the collective incident history — making the entire ecosystem more resilient.

SOC-as-a-Service. With the Elastic Agent Builder foundation, Vigil is designed to be packaged as a managed service: organizations connect their Elasticsearch cluster and get autonomous incident response without deploying agent infrastructure.
