💡 Inspiration

Every security team we spoke to had the exact same problem: too many alerts, not enough context.

The industry standard is to surface a CVSS score and call it a day. A 9.8 severity notification lands in a queue, and someone—eventually—decides whether it actually matters. That decision can take hours. Sometimes days.

Meanwhile, the CISA Known Exploited Vulnerabilities list grows every week with CVEs that real adversaries are actively weaponizing right now, against organizations exactly like yours.

We were inspired by a simple but powerful question: What if Splunk could not only detect the threat, but understand it, prioritize it, and act on it—autonomously?

That question became VulnSentinel.

🚀 What it does

VulnSentinel is an autonomous vulnerability management agent built natively inside Splunk Enterprise. It transforms the entire vulnerability lifecycle—from threat discovery to containment to executive reporting—into a closed-loop agentic workflow that runs without human intervention.

The pipeline runs in five stages:

  1. Discover: Continuously ingests the CISA KEV live threat feed, deduplicating CVEs via a local SQLite state database so each vulnerability is processed exactly once.
  2. Reason: Routes raw CVE descriptions to Foundation-Sec-1.1-8B-Instruct (via Splunk Hosted Models). Using the model's CTI-VSP severity-prediction capability, it predicts CVSS v3 scores directly from threat text and maps the MITRE ATT&CK tactic, before the NVD even publishes an official score.
  3. Score: A custom Splunk SPL pipeline cross-references an internal asset lookup table and calculates a Contextual Business Risk Score (0–100) using four deterministic pillars: base severity, internet exposure, asset criticality, and active exploitation status.
  4. Act: When risk exceeds the critical threshold (>85), the Splunk MCP Server agent autonomously executes a network containment webhook, generates an ITSM ticket, and writes an immutable audit log back into Splunk.
  5. Measure: The VulnSentinel Command Dashboard calculates and displays the Enterprise Risk Reduction Yield %—the exact percentage of business risk eliminated by the autonomous workflow.

The result: The time between vulnerability disclosure and containment drops from hours to seconds.

🛠️ How we built it

VulnSentinel is structured as five integrated layers, all operating natively within the Splunk ecosystem.

Layer 1: Threat Intelligence Ingestion

A stateful Python worker (cve_ingestor.py) continuously polls the CISA KEV JSON feed. A local SQLite state database handles deduplication, ensuring historical CVEs are never reprocessed across system restarts.
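
The deduplication step can be sketched roughly as follows. This is a minimal illustration, not the actual contents of cve_ingestor.py: the table name `seen_cves` and the helpers `open_state_db` and `filter_new` are hypothetical, and the sketch assumes the KEV feed exposes a top-level `vulnerabilities` array whose entries carry a `cveID` field.

```python
import sqlite3


def open_state_db(path=":memory:"):
    """Open (or create) the dedup state database.

    Pass a file path instead of ":memory:" so the state survives restarts.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_cves ("
        "cve_id TEXT PRIMARY KEY, "
        "first_seen TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def filter_new(conn, feed):
    """Return feed entries not seen before, recording each one as seen."""
    fresh = []
    for entry in feed.get("vulnerabilities", []):
        cve_id = entry.get("cveID")
        if not cve_id:
            continue  # skip entries without an identifier
        # The primary key makes the insert a no-op for known CVEs;
        # rowcount is 1 only when a new row was actually written.
        cur = conn.execute(
            "INSERT OR IGNORE INTO seen_cves (cve_id) VALUES (?)", (cve_id,)
        )
        if cur.rowcount == 1:
            fresh.append(entry)
    conn.commit()
    return fresh
```

One trade-off in this sketch: an entry is marked as seen before downstream processing finishes, so a crash between the two steps would skip that CVE. A production worker might record the ID only after the event has been accepted by Splunk.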

Layer 2: AI Reasoning (Splunk Hosted Models)

Raw CVE descriptions are forwarded to Foundation-Sec-1.1-8B-Instruct. We specifically targeted its CTI-VSP (Vulnerability Severity Prediction) capability to predict CVSS v3 base scores directly from unstructured threat text, removing the dependency on rate-limited external NVD APIs. The model simultaneously classifies each CVE against the MITRE ATT&CK framework and generates structured remediation context.

Layer 3: Deterministic Risk Engine

Enriched AI events are posted to Splunk via the HTTP Event Collector (HEC). A custom SPL pipeline evaluates four telemetry pillars to produce a highly auditable risk score capped at 100 points:

$$\text{Risk Score} = (\text{CVSS} \times 5) + \text{Exposure} + \text{Criticality} + \text{Exploitation}$$

| Pillar | Max Points | Logic |
| --- | --- | --- |
| Base Severity | 50 pts | CVSS × 5 |
| Internet Exposure | 20 pts | +20 if is_internet_facing = true |
| Business Criticality | 20 pts | +20 if High / +10 if Medium |
| Active Exploitation | 10 pts | +10 if actively on the CISA KEV list |
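
For readers who prefer code to SPL, the four pillars can be mirrored in a few lines of Python. This is only an illustrative sketch of the arithmetic in the table, not the project's SPL pipeline. The function name `risk_score` and its parameter names are assumptions, as is the choice to give 0 points to any criticality other than High or Medium.

```python
def risk_score(cvss, is_internet_facing, criticality, on_kev):
    """Compute the 0-100 contextual risk score from the four pillars."""
    # Base severity: CVSS (clamped to 0-10) times 5, so at most 50 points.
    base = max(0.0, min(float(cvss), 10.0)) * 5
    # Internet exposure: flat 20 points for internet-facing assets.
    exposure = 20 if is_internet_facing else 0
    # Business criticality: 20 for High, 10 for Medium, 0 otherwise (assumed).
    crit = {"high": 20, "medium": 10}.get(str(criticality).lower(), 0)
    # Active exploitation: 10 points when the CVE is on the CISA KEV list.
    exploitation = 10 if on_kev else 0
    # The pillar maxima already sum to 100; min() just makes the cap explicit.
    return min(100.0, base + exposure + crit + exploitation)
```

For example, a CVSS 9.8 vulnerability on an internet-facing, High-criticality asset that is on the KEV list scores 49 + 20 + 20 + 10 = 99, which is above the critical threshold of 85.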

Layer 4: Agentic Remediation (Splunk MCP)

When risk breaches the threshold, mcp_agent.py executes. It fires a network containment webhook (simulated via Discord), creates a structured ITSM ticket in a local JSON store, and posts an immutable audit event back into Splunk to close the loop.
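
The decision logic might look something like the sketch below. It only builds the three payloads and performs no network calls. Everything here is hypothetical rather than taken from mcp_agent.py: the function name `plan_remediation`, the ticket and event field names, and the sourcetype `vulnsentinel:audit`. The audit payload follows the usual HEC shape of a JSON object with an `event` key plus optional metadata such as `time` and `sourcetype`.

```python
import json
import time

CRITICAL_THRESHOLD = 85  # the write-up acts only when risk > 85


def plan_remediation(cve_id, asset, risk, now=None):
    """Return the three actions for one scored finding, or None below threshold."""
    if risk <= CRITICAL_THRESHOLD:
        return None
    ts = time.time() if now is None else now
    # Hypothetical webhook body for the simulated containment message.
    webhook = {"content": f"CONTAIN {asset}: {cve_id} scored {risk}"}
    # Hypothetical ticket record, destined for the local JSON store.
    ticket = {
        "id": f"VS-{cve_id}-{asset}",
        "cve": cve_id,
        "asset": asset,
        "risk_score": risk,
        "status": "open",
    }
    # HEC-style audit event; sourcetype and field names are assumptions.
    audit = {
        "time": ts,
        "sourcetype": "vulnsentinel:audit",
        "event": {
            "action": "contain",
            "cve": cve_id,
            "asset": asset,
            "risk_score": risk,
            "ticket_id": ticket["id"],
        },
    }
    return {"webhook": webhook, "ticket": ticket, "audit": audit}


if __name__ == "__main__":
    plan = plan_remediation("CVE-2024-0001", "web-01", 99.0)
    print(json.dumps(plan["audit"], indent=2))
```

Keeping the planning step free of side effects makes the threshold behavior easy to test; a separate function would then send each payload.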

Layer 5: Executive Measurement

The Command Dashboard calculates the Enterprise Risk Reduction Yield using before-and-after risk states.

$$\text{Mitigated Risk} = (\text{CVSS} \times 5) \times 0.4$$

$$\text{Risk Reduction \%} = \frac{\text{Initial Risk} - \text{Mitigated Risk}}{\text{Initial Risk}} \times 100$$

Note: The 0.4 multiplier (a 60% suppression of the base severity points) reflects the fact that while containment eliminates network exposure, the vulnerability still persists on disk until a human applies the patch. This keeps the residual risk figure honest.
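
A short worked example of the two formulas above, written as a Python sketch. The names `RESIDUAL_FACTOR`, `mitigated_risk`, and `risk_reduction_pct` are illustrative, and the sample inputs (CVSS 9.8 with an initial risk of 99) are an assumed scenario: an internet-facing, High-criticality asset with a CVE on the KEV list.

```python
RESIDUAL_FACTOR = 0.4  # share of base severity points that remains after containment


def mitigated_risk(cvss):
    """Residual risk after containment: base severity points times 0.4."""
    return cvss * 5 * RESIDUAL_FACTOR


def risk_reduction_pct(initial_risk, cvss):
    """Percentage of the initial risk removed by containment."""
    if initial_risk <= 0:
        raise ValueError("initial_risk must be positive")
    return (initial_risk - mitigated_risk(cvss)) / initial_risk * 100


if __name__ == "__main__":
    # Initial risk 99 = 49 (base) + 20 (exposure) + 20 (criticality) + 10 (KEV).
    print(round(mitigated_risk(9.8), 1))            # residual points
    print(round(risk_reduction_pct(99.0, 9.8), 1))  # percent reduction
```

In this scenario the residual risk is 19.6 points, so the reduction is (99 − 19.6) / 99 × 100 ≈ 80.2%.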

⚡ Challenges we ran into

  • Structured output from Foundation-Sec: Getting Foundation-Sec-1.1-8B-Instruct to return consistently parseable JSON (CVSS scores and MITRE classifications) from free-form CVE descriptions required rigorous prompt engineering. We built a validation layer to handle malformed responses without stalling the ingestion pipeline.
  • Deduplication at scale: The CISA KEV feed is cumulative. On a cold start, the ingestor would attempt to process thousands of existing entries. Building a SQLite-backed deduplication layer that persists across restarts and efficiently identifies delta entries required careful state management.
  • Deterministic scoring vs. AI scoring: We made a deliberate architectural decision to use AI for threat classification but deterministic SPL math for risk scoring. Pure LLM risk scoring produces non-reproducible results that compliance teams cannot audit. By using SPL, every point in our model traces back to a specific, documented data source.
  • Closed-loop audit architecture: Writing remediation outcomes back into Splunk as structured, queryable events—rather than flat text logs—required careful HEC schema design so the same index could power both real-time alerting and retrospective dashboard metrics.
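
The validation layer mentioned in the first challenge could take a shape like the sketch below. It is an assumption-laden illustration, not the project's code: the function name `parse_model_output` and the JSON keys `cvss` and `mitre_tactic` are hypothetical, since the actual prompt and response schema are not shown here. The idea is to return `None` for any malformed response so the caller can skip or retry without stalling the pipeline.

```python
import json
import re


def parse_model_output(raw):
    """Extract and validate the JSON object in a model response.

    Returns a dict with 'cvss' and 'mitre_tactic', or None if malformed.
    """
    # Models often wrap JSON in extra prose, so grab the outermost braces.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    try:
        cvss = float(data["cvss"])
    except (KeyError, TypeError, ValueError):
        return None
    tactic = data.get("mitre_tactic")
    # CVSS base scores range from 0.0 to 10.0; NaN fails this comparison too.
    if not 0.0 <= cvss <= 10.0:
        return None
    if not isinstance(tactic, str) or not tactic.strip():
        return None
    return {"cvss": cvss, "mitre_tactic": tactic.strip()}
```

A stricter variant might also check the tactic against the list of known ATT&CK tactic names before accepting it.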

🏆 Accomplishments that we're proud of

  • Genuine AI integration, not AI decoration: Foundation-Sec is doing real cognitive work—predicting severity scores from raw threat text before the NVD has even processed them. This is not a chatbot wrapper; the AI output directly drives a deterministic downstream pipeline.
  • A defensible risk methodology: We designed the four-pillar scoring model to be defensible to a CISO, auditable to a compliance officer, and transparent to an engineer. No black boxes.
  • A complete agentic loop inside Splunk: Observe → Reason → Act → Measure. Most projects demonstrate one or two of these. VulnSentinel demonstrates all four, end-to-end, with data flowing continuously.
  • The Enterprise Risk Reduction Yield metric: This is the number that moves executives. Not "we processed 47 CVEs"—but "we eliminated 80.2% of your business risk exposure in this cycle." That framing separates a security tool from a security program.

📚 What we learned

  • Business context is the missing layer in vulnerability management. The same CVE can be a P1 incident or a low-priority backlog ticket depending on three things: internet exposure, criticality, and whether adversaries are actively exploiting it. Building that context layer inside Splunk—where the asset inventory already lives—is immensely powerful.
  • Foundation-Sec's CTI-VSP capability is genuinely underutilized. The ability to predict vulnerability severity from unstructured descriptions gives organizations a detection speed advantage that matters when the exploitation window is measured in hours.
  • Agentic systems need honest residual risk accounting. Early versions of our model dropped the risk score to zero after containment. That is misleading. The 60% suppression we implemented (a 0.4 multiplier on base severity) reflects the actual security posture: contained, but not remediated. Honesty in metrics builds trust.
  • Splunk is a remarkably capable agentic platform. SPL is not just a query language. Combined with HEC, lookup tables, alert actions, and Hosted Models, it becomes the backbone of a highly complex autonomous workflow.

🔮 What's next for VulnSentinel

  • Native Splunk SOAR integration: Replace the webhook simulation with production-grade SOAR playbooks for real network isolation, firewall rule injection, and EDR quarantine actions.
  • Multi-asset campaign correlation: When one CVE affects multiple assets, VulnSentinel should trigger a coordinated campaign-level response rather than N individual containment actions.
  • Predictive risk forecasting: Use historical KEV ingestion data to model which asset classes are trending toward higher exposure, enabling proactive hardening before exploitation occurs.
  • Executive PDF reporting: Auto-generate a weekly risk reduction report from the Command Dashboard for board-level distribution, complete with trend lines, remediation velocity, and exposure heatmaps.
