Inspiration
Every cyber incident starts a regulatory clock: 24 hours for NIS2's early warning, 72 hours for GDPR Article 33, and insurers typically expect notice within 48 hours. Yet the evidence package needed to file a claim is still assembled by hand, over days or weeks. At the same time, W3C PROV offers a powerful idea: capture why a system produced a result, not just the result, turning a claim into something verifiable. SOAR playbooks already execute a structured response. What if that execution became the provenance record, and that record became the backbone of a defensible insurance claim?
What it does
CyberProof is a zero-touch pipeline that turns a completed Splunk SOAR playbook run into a court-grade cyber insurance evidence package, automatically.
When a playbook's on_finish() fires:
- Provenance capture: builds a W3C PROV-JSON graph (Activities, Agents, wasInformedBy causal chain) from the SOAR REST API, SHA-256 hashed and rendered to SVG.
- Forensic enrichment: extracts the actual SPL queries the playbook ran from SOAR's logs and re-runs them via the Splunk MCP Server against BOTS v3 for real attacker timelines.
- Legal evidence generation: SaulLM-7B generates a multi-section insurance package: incident summary, causal-chain proof, regulatory deadlines, financial impact, coverage clauses, chain of custody, and forensic evidence.
- Dashboard delivery: posted to Splunk via HEC and visualized in Dashboard Studio: NIS2/GDPR/insurance countdowns, total claim, attack timeline, and links to every artifact.
From "playbook finished" to "claim amount + regulatory status + signed evidence document" in about a minute, with no human in the loop.
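The deadline countdowns reduce to simple date arithmetic. Below is a minimal sketch, assuming every clock starts at a single detection timestamp and using the three windows quoted in the Inspiration section (24 hours, 72 hours, 48 hours). The names DEADLINE_HOURS and deadline_status are illustrative and are not taken from CyberProof's code.

```python
from datetime import datetime, timedelta, timezone

# Notification windows in hours, as quoted in the Inspiration section.
# The 48 h insurer window is a typical value, not a universal rule.
DEADLINE_HOURS = {
    "NIS2 early warning": 24,
    "GDPR Article 33": 72,
    "Insurer notice": 48,
}


def deadline_status(detected_at: datetime, now: datetime) -> dict:
    """Return the due time and remaining hours for each deadline.

    Assumption: every clock starts at `detected_at`.
    Both datetimes must be timezone-aware.
    """
    status = {}
    for name, hours in DEADLINE_HOURS.items():
        due = detected_at + timedelta(hours=hours)
        remaining = (due - now).total_seconds() / 3600
        status[name] = {
            "due": due.isoformat(),
            "hours_remaining": round(remaining, 2),
            "overdue": remaining < 0,
        }
    return status


if __name__ == "__main__":
    detected = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
    now = detected + timedelta(hours=30)
    for name, info in deadline_status(detected, now).items():
        print(name, info)
```

Keeping the windows in one dictionary mirrors the idea of holding rates and metadata in a single config file, so a policy with a different notice period only needs one value changed.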
How it has been built
- SOAR adapter for W3C PROV, following yProv4WFs's existing plugin pattern: container → Activity (level 0), playbook_run → Activity (level 1), action_run → Activity (level 2), app/asset → Agent, cb_fn → wasInformedBy. Validated against a real SOAR 8.5.0 instance and a custom playbook on the BOTS v3 "Operation Frothly" scenario.
- Dynamic MCP enrichment: parses the "For Parameter: {...} Message:" JSON in each app_run to recover the exact query the playbook ran, then dispatches it to the Splunk MCP Server. Fully playbook-agnostic.
- Evidence generation: iterated prompts for consistent sections, deduplicated timelines, accurate deadline math, and currency-separated financials, with rates/metadata in a single config file.
- Dashboard: built in Dashboard Studio with spath-based extraction, color-coded deadline table, financial breakdown, and a live attack-timeline table from index=botsv3.
- Auto-trigger: a Flask listener receives on_finish()'s POST and runs the pipeline in the background, with zero manual steps.
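The query-recovery step in the enrichment bullet can be sketched as follows. This assumes each app_run record carries a "message" string in which a JSON object follows the text "For Parameter:", and that the SPL lives under a "query" key. Both field names, the sample record, and the helper names are assumptions for illustration, not the confirmed SOAR schema. Dispatching the query to the Splunk MCP Server is left out.

```python
import json

MARKER = "For Parameter:"


def extract_parameters(message: str) -> list:
    """Return every JSON object that follows a 'For Parameter:' marker."""
    decoder = json.JSONDecoder()
    found = []
    pos = 0
    while True:
        idx = message.find(MARKER, pos)
        if idx == -1:
            break
        start = message.find("{", idx + len(MARKER))
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value and reports where it ended,
            # so nested braces inside the parameters are handled correctly.
            obj, end = decoder.raw_decode(message, start)
        except json.JSONDecodeError:
            pos = start + 1
            continue
        if isinstance(obj, dict):
            found.append(obj)
        pos = end
    return found


def extract_queries(app_runs: list) -> list:
    """Collect unique SPL queries, preserving first-seen order."""
    queries = []
    for run in app_runs:
        for params in extract_parameters(run.get("message", "")):
            query = params.get("query")  # assumed parameter name
            if query and query not in queries:
                queries.append(query)
    return queries


if __name__ == "__main__":
    # Hypothetical app_run record, shaped only for this example.
    sample = [
        {"message": 'For Parameter: {"query": "index=botsv3 | head 5"} Message: "ok"'}
    ]
    print(extract_queries(sample))
```

Because the parser looks for the marker rather than for a specific playbook or action name, the same code applies to any playbook whose actions log their parameters this way.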
What I learned
- Provenance is a trust layer. Separating deterministic provenance capture from LLM narrative means the graph is the source of truth, and the LLM just summarizes it.
- Chain-of-custody needs proof, not assertion. A SHA-256 hash turned "unaltered" from a claim into something verifiable, cheaply.
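To make the chain-of-custody point concrete, here is a minimal sketch of hashing a small PROV-JSON document and verifying it later. The identifiers, the example.org prefix, and the canonicalization choice (sorted keys, compact separators, UTF-8) are assumptions made for this example. The write-up states only that the graph is SHA-256 hashed, not how it is serialized first.

```python
import hashlib
import json


def build_prov_document() -> dict:
    """Tiny PROV-JSON document with illustrative identifiers."""
    return {
        "prefix": {"ex": "https://example.org/cyberproof/"},
        "activity": {
            "ex:playbook_run_1": {"prov:label": "playbook_run"},
            "ex:action_run_1": {"prov:label": "action_run"},
        },
        "agent": {"ex:splunk_app": {"prov:label": "app/asset"}},
        "wasAssociatedWith": {
            "_:assoc1": {
                "prov:activity": "ex:action_run_1",
                "prov:agent": "ex:splunk_app",
            }
        },
        "wasInformedBy": {
            "_:inf1": {
                "prov:informed": "ex:action_run_1",
                "prov:informant": "ex:playbook_run_1",
            }
        },
    }


def canonical_bytes(doc: dict) -> bytes:
    """Serialize deterministically so key order cannot change the hash."""
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def graph_digest(doc: dict) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(doc)).hexdigest()


def verify(doc: dict, expected_digest: str) -> bool:
    """True only if the document still matches the recorded digest."""
    return graph_digest(doc) == expected_digest


if __name__ == "__main__":
    doc = build_prov_document()
    recorded = graph_digest(doc)
    print("recorded digest:", recorded)
    doc["activity"]["ex:action_run_1"]["prov:label"] = "tampered"
    print("still valid after edit?", verify(doc, recorded))
```

Anyone holding the recorded digest can recompute it from the graph, so "unaltered" becomes a check a third party can run rather than a statement to be trusted.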
What's next for CyberProof
- Splunk AI Assistant (SAIA) integration for natural-language → SPL generation.
- Branching/parallel provenance for multi-path investigations.
- More playbook types: ransomware, BEC, data exfiltration.
- Local SaulLM deployment for air-gapped/data-sensitive environments.
- MITRE ATT&CK annotation of the provenance graph.
- Blockchain-anchored provenance hashes for independent, third-party-verifiable chain of custody between insurer and insured.
Built With
- botsv3
- hec
- huggingface
- mcp
- prov
- python
- saullm-7b
- soar
- splunk-mcp
- w3c
- yprov4wfs