Inspiration

Every cyber incident starts a regulatory clock: 24 hours for NIS2's early warning, 72 hours for GDPR Article 33, and insurers typically expect notice within 48. Yet the evidence package needed to file a claim is still assembled by hand, over days or weeks. At the same time, W3C PROV offers a powerful idea: capture why a system produced a result, not just the result, so that a claim becomes something verifiable. SOAR playbooks already execute a structured response. What if that execution became the provenance record, and that record became the backbone of a defensible insurance claim?

What it does

CyberProof is a zero-touch pipeline that turns a completed Splunk SOAR playbook run into a court-grade cyber insurance evidence package, automatically.

When a playbook fires:

  1. Provenance capture: builds a W3C PROV-JSON graph (Activities, Agents, wasInformedBy causal chain) from the SOAR REST API, SHA-256 hashed and rendered to SVG.
  2. Forensic enrichment: extracts the actual SPL queries the playbook ran from SOAR's logs and re-runs them via the Splunk MCP Server against BOTS v3 for real attacker timelines.
  3. Legal evidence generation: SaulLM-7B generates a multi-section insurance package: incident summary, causal-chain proof, regulatory deadlines, financial impact, coverage clauses, chain of custody, and forensic evidence.
  4. Dashboard delivery: posted to Splunk via HEC and visualized in Dashboard Studio: NIS2/GDPR/insurance countdowns, total claim, attack timeline, and links to every artifact.

From "playbook finished" to "claim amount + regulatory status + signed evidence document" in about a minute, with no human in the loop.
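
The regulatory countdowns on the dashboard come down to simple date arithmetic. The following is a minimal sketch of that calculation, not the project's actual code: the function name, the row fields, and the choice of a single detection timestamp as the start of every clock are assumptions made for illustration. The three windows are the ones named above.

```python
from datetime import datetime, timedelta, timezone

# Notification windows named in this write-up, in hours.
# Labels and the shared "detected_at" anchor are illustrative assumptions.
DEADLINE_HOURS = {
    "NIS2 early warning": 24,
    "Insurer notice": 48,
    "GDPR Article 33": 72,
}

def deadline_status(detected_at, now):
    """Return one row per deadline with its due time and hours remaining."""
    rows = []
    for label, hours in DEADLINE_HOURS.items():
        due = detected_at + timedelta(hours=hours)
        remaining = (due - now).total_seconds() / 3600
        rows.append({
            "deadline": label,
            "due": due.isoformat(),
            "hours_remaining": round(remaining, 1),
            "status": "OVERDUE" if remaining < 0 else "OPEN",
        })
    return rows

# Example: the dashboard is viewed 30 hours after detection.
detected = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
now = datetime(2024, 1, 2, 14, 0, tzinfo=timezone.utc)
rows = deadline_status(detected, now)
for row in rows:
    print(row["deadline"], row["status"], row["hours_remaining"])
```

Using timezone-aware datetimes throughout avoids off-by-hours errors when the SOAR server and the viewer sit in different time zones.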

How it has been built

  • SOAR adapter for W3C PROV, following yProv4WFs's existing plugin pattern: container → Activity(level0), playbook_run → Activity(level1), action_run → Activity(level2), app/asset → Agent, cb_fn → wasInformedBy. Validated against a real SOAR 8.5.0 instance and a custom playbook on the BOTS v3 "Operation Frothly" scenario.
  • Dynamic MCP enrichment: parses the For Parameter: {...} Message: JSON in each app_run to recover the exact query the playbook ran, then dispatches it to the Splunk MCP Server. Fully playbook-agnostic.
  • Evidence generation: iterated prompts for consistent sections, deduplicated timelines, accurate deadline math, and currency-separated financials, with rates/metadata in a single config file.
  • Dashboard: built in Dashboard Studio with spath-based extraction, color-coded deadline table, financial breakdown, and a live attack-timeline table from index=botsv3.
  • Auto-trigger: a Flask listener receives on_finish()'s POST and runs the pipeline in the background, with zero manual steps.
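
The dynamic MCP enrichment step hinges on pulling the parameter JSON out of each app_run message. Below is a rough sketch of how that extraction might look. The helper name, the sample message, and the "query" key are assumptions for illustration only; the real message layout should be checked against the SOAR instance in use.

```python
import json

MARKER = "For Parameter:"

def extract_parameters(message):
    """Return the JSON object that follows 'For Parameter:' in a message, or None."""
    start = message.find(MARKER)
    if start == -1:
        return None
    brace = message.find("{", start + len(MARKER))
    if brace == -1:
        return None
    try:
        # raw_decode parses exactly one JSON value starting at `brace`
        # and ignores the trailing "Message: ..." text.
        params, _end = json.JSONDecoder().raw_decode(message, brace)
    except json.JSONDecodeError:
        return None
    return params if isinstance(params, dict) else None

# Hypothetical message; the "query" key is an assumed parameter name.
sample = (
    'For Parameter: {"query": "index=botsv3 sourcetype=stream:http | head 5"} '
    'Message: "Search completed"'
)
params = extract_parameters(sample)
print(params["query"])
```

Parsing with a JSON decoder instead of a regular expression keeps the extraction correct when the query itself contains braces or quotes, which is common in SPL.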

What I learned

  • Provenance is a trust layer. Separating deterministic provenance capture from LLM narrative means the graph is the source of truth, and the LLM just summarizes it.
  • Chain-of-custody needs proof, not assertion. A SHA-256 hash turned "unaltered" from a claim into something verifiable, cheaply.
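
Both lessons can be illustrated together. The sketch below builds a tiny PROV-JSON document from a list of action runs, then hashes a canonical serialization so that any later change is detectable. It is an assumption-laden simplification rather than the CyberProof adapter: the input field names, the "ex" namespace, and the strictly linear wasInformedBy chain are invented for the example.

```python
import hashlib
import json

def build_prov(actions):
    """Build a small PROV-JSON document from an ordered list of action runs."""
    doc = {
        "prefix": {"ex": "https://example.org/cyberproof#"},  # placeholder namespace
        "activity": {},
        "agent": {},
        "wasAssociatedWith": {},
        "wasInformedBy": {},
    }
    previous = None
    for i, action in enumerate(actions):
        act_id = f"ex:action_run_{action['id']}"
        agent_id = f"ex:app_{action['app']}"
        doc["activity"][act_id] = {"prov:label": action["name"]}
        doc["agent"][agent_id] = {"prov:label": action["app"]}
        doc["wasAssociatedWith"][f"_:assoc{i}"] = {
            "prov:activity": act_id,
            "prov:agent": agent_id,
        }
        if previous is not None:
            # Assumed linear chain: each action is informed by the one before it.
            doc["wasInformedBy"][f"_:inf{i}"] = {
                "prov:informed": act_id,
                "prov:informant": previous,
            }
        previous = act_id
    return doc

def digest(doc):
    """SHA-256 over a canonical (sorted keys, compact) JSON serialization."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical action runs; field names are illustrative.
actions = [
    {"id": 1, "name": "run query", "app": "splunk"},
    {"id": 2, "name": "geolocate ip", "app": "maxmind"},
]
doc = build_prov(actions)
sealed = digest(doc)

# Later verification: recompute the digest and compare with the stored value.
print(digest(doc) == sealed)
```

Sorting keys and fixing separators before hashing matters: without a canonical form, two semantically identical graphs could serialize differently and produce different hashes.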

What's next for CyberProof

  • Splunk AI Assistant (SAIA) integration for natural-language → SPL generation.
  • Branching/parallel provenance for multi-path investigations.
  • More playbook types: ransomware, BEC, data exfiltration.
  • Local SaulLM deployment for air-gapped/data-sensitive environments.
  • MITRE ATT&CK annotation of the provenance graph.
  • Blockchain-anchored provenance hashes for independent, third-party-verifiable chain of custody between insurer and insured.

Built With

  • botsv3
  • hec
  • huggingface
  • mcp
  • prov
  • python
  • saullm-7b
  • soar
  • splunk-mcp
  • w3c
  • yprov4wfs