OrcaStrike

Inspiration

Enterprise teams are shipping multi-agent workflows into production, but when one of those workflows does something it shouldn't — approves a suspicious payment, calls a tool with attacker-controlled arguments, bypasses an approval gate — the forensic story isn't there.

Logs and traces show what each agent did. They don't explain why the system became unsafe, which handoff promoted untrusted data, or which specific control would have broken the chain earliest.

Frameworks like IBM watsonx Orchestrate and the ADK cover orchestration, evaluation, and tracing. What's missing is the layer that sits after a run and reconstructs the trust failure. OrcaStrike is that layer.

What it does

OrcaStrike is a post-execution forensic debugger for agentic workflows. It runs a five-agent LangGraph finance-approval workflow against a poisoned invoice and produces a structured forensic report that answers:

  • Which trust boundary failed? The exact edge where unvalidated data was promoted (for example, document_analyst → planner without schema validation).
  • How did taint propagate? A field-level lineage chain: doc_text → summary → payment_instruction → vendor_id → erp_payment.args.vendor.
  • What was the blast radius? A numeric score plus the count of agents influenced, tools reached, privileged actions enabled, and whether an irreversible side effect was reached.
  • What is the earliest fix? A counterfactual control from a small registry (summary_validator, handoff_schema, approval_gate, taint_stripping), inserted at the cheapest boundary, with a guarded rerun proving the attack is blocked at the predicted step.
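Rolled together, those four answers form one structured object. A minimal sketch of what such a report could look like — field names here are illustrative, not the exact Pydantic Finding schema:

```python
# Illustrative sketch of the structured forensic report described above.
# Field names are hypothetical; the real Finding schema may differ.
finding = {
    "failed_boundary": {
        "edge": ("document_analyst", "planner"),
        "rule": "taint_promotion",        # unvalidated data promoted across a handoff
    },
    "propagation_path": [
        "doc_text", "summary", "payment_instruction", "vendor_id",
        "erp_payment.args.vendor",
    ],
    "blast_radius": {
        "score": 8.5,                     # numeric severity
        "agents_influenced": 3,
        "tools_reached": 2,
        "privileged_actions": 1,
        "irreversible_side_effect": True, # the ERP payment executed
    },
    "recommended_fix": {
        "control": "handoff_schema",      # drawn from the small control registry
        "boundary": ("document_analyst", "planner"),
    },
}
```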

Demo Flow

  1. Load the workflow graph.
  2. Run Baseline — the attack succeeds, and ERP payment executes with a tainted vendor.
  3. Render the forensic report with propagation path, blast-radius score, and recommended fix.
  4. Run Guarded — the same attack is blocked at the exact boundary OrcaStrike predicted.

A separate red-team coverage matrix view renders a probes × controls grid from pre-generated Promptfoo artifacts, showing which guardrail catches which injection variant.

How we built it

Backend

  • FastAPI + LangGraph
  • A single StateGraph holds all five agents:
    • intake
    • document_analyst
    • planner
    • policy_check
    • payment

Routing conditionally passes through the policy_check node, based on an is_guarded flag on shared state: guarded runs detour through policy_check before payment, baseline runs skip it.
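The routing decision can be sketched as a plain function — a simplified stand-in for the real LangGraph conditional edge (node names match the list above; the function itself is illustrative):

```python
# Simplified stand-in for the LangGraph conditional edge after the planner:
# guarded runs detour through policy_check before payment; baseline runs skip it.
def route_after_planner(state: dict) -> str:
    """Return the next node name based on the is_guarded flag on shared state."""
    return "policy_check" if state.get("is_guarded") else "payment"
```

In the real graph this kind of function would be registered on the planner node via LangGraph's conditional-edge mechanism, keeping the attack path and the guarded path in a single StateGraph.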

Tools used

  • doc_search
  • vendor_lookup
  • erp_payment
  • notifier

Trust Metadata Layer

Every agent call, handoff, and tool invocation emits a TraceEvent carrying:

  • trust_labels
  • tainted_fields
  • lineage
  • changed_fields

Trust classes are a small closed enum:

  • external
  • agent_derived
  • agent_tainted
  • tool_derived
  • policy_validated
  • human_approved

Field-level lineage is tracked for the four fields on the exploit path; handoff-level trust is tracked everywhere else.
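Putting the two together, a TraceEvent carrying the trust enum can be sketched roughly like this (a dataclass stand-in — the real schema is a Pydantic model, and exact field types may differ):

```python
from dataclasses import dataclass, field
from enum import Enum

class TrustLabel(str, Enum):
    """The small closed enum of trust classes listed above."""
    EXTERNAL = "external"
    AGENT_DERIVED = "agent_derived"
    AGENT_TAINTED = "agent_tainted"
    TOOL_DERIVED = "tool_derived"
    POLICY_VALIDATED = "policy_validated"
    HUMAN_APPROVED = "human_approved"

@dataclass
class TraceEvent:
    """One event per agent call, handoff, or tool invocation."""
    source: str                                              # emitting node
    trust_labels: dict[str, TrustLabel] = field(default_factory=dict)
    tainted_fields: set[str] = field(default_factory=set)
    lineage: dict[str, str] = field(default_factory=dict)    # field -> upstream field
    changed_fields: set[str] = field(default_factory=set)

# Example: the document_analyst derives a summary from attacker-controlled doc_text.
event = TraceEvent(
    source="document_analyst",
    trust_labels={"summary": TrustLabel.AGENT_TAINTED},
    tainted_fields={"summary"},
    lineage={"summary": "doc_text"},
    changed_fields={"summary"},
)
```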

Forensic Pipeline

The forensic layer is a pipeline of services:

  • trace_collector writes events to Mongo
  • boundary_checker evaluates five hardcoded rules against the trace:
    • unverified handoff
    • taint promotion
    • approval bypass
    • privilege escalation
    • doc text in tool args
  • propagation_engine reconstructs the chain
  • blast_radius_scorer produces the numeric score
  • counterfactual_engine picks the earliest-breaking control
  • finding_generator assembles the final Finding object
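To make one of the five rules concrete, the taint-promotion check can be pictured as a scan over the ordered trace, flagging any event where a field that was tainted upstream reappears with an elevated trust label. This is a hypothetical sketch, not the production boundary_checker, and the event shape here is a plain dict:

```python
# Hypothetical sketch of the taint-promotion rule: flag any trace event where a
# field tainted upstream is later promoted to an elevated trust label.
ELEVATED = {"policy_validated", "human_approved"}

def find_taint_promotions(events: list[dict]) -> list[tuple[str, str]]:
    """Return (source_node, field) pairs where a tainted field was promoted."""
    tainted: set[str] = set()
    violations: list[tuple[str, str]] = []
    for ev in events:
        tainted.update(ev.get("tainted_fields", []))
        for field_name, label in ev.get("trust_labels", {}).items():
            if field_name in tainted and label in ELEVATED:
                violations.append((ev["source"], field_name))
    return violations

# The poisoned-invoice exploit path, boiled down to two events:
trace = [
    {"source": "document_analyst", "tainted_fields": ["summary"], "trust_labels": {}},
    {"source": "planner", "tainted_fields": [],
     "trust_labels": {"summary": "policy_validated"}},
]
```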

policy_engine consumes YAML policy configs and enforces them on guarded reruns, wired into both the planner and payment nodes.
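A policy config in that YAML format might look like the following — the keys are illustrative, showing only the shape the policy_engine consumes, not the exact production schema:

```yaml
# Illustrative shape only — the real policy config keys may differ.
policy:
  name: finance-approval-guards
  controls:
    - type: handoff_schema
      boundary: document_analyst -> planner
      require_fields: [summary]
    - type: approval_gate
      boundary: policy_check -> payment
      block_on: [agent_tainted]
```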

Security Integrations

  • llm-guard: the PromptInjection scanner runs inside the policy_check node on the summary before it can be promoted to policy_validated
  • presidio-analyzer + presidio-anonymizer: power a redactor service for PII on tainted fields
  • pdfplumber: backs a scenario upload endpoint so invoices can be ingested as real PDFs

LLM

  • Groq llama-3.3-70b-versatile via AsyncGroq
  • Ollama fallback exposed through the OpenAI-compatible client

The client is a single module that switches on LLM_PROVIDER.
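The selection logic in that module can be sketched like this. The real module wraps AsyncGroq and an OpenAI-compatible Ollama client; this hypothetical sketch shows only the part LLM_PROVIDER controls, and the Ollama base URL is the common default rather than a confirmed setting:

```python
import os

# Hypothetical sketch of the single-module provider switch. Only the selection
# logic is shown; the real module constructs the actual async clients.
def resolve_provider() -> dict:
    """Pick LLM client settings based on the LLM_PROVIDER environment variable."""
    provider = os.environ.get("LLM_PROVIDER", "groq")
    if provider == "groq":
        return {"provider": "groq", "model": "llama-3.3-70b-versatile"}
    if provider == "ollama":
        # Ollama serves an OpenAI-compatible API on localhost by default.
        return {"provider": "ollama", "base_url": "http://localhost:11434/v1"}
    raise ValueError(f"Unknown LLM_PROVIDER: {provider}")
```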

Frontend

  • Next.js 15 App Router
  • React 19
  • Tailwind v4
  • @xyflow/react v12 for the trust graph
  • framer-motion for transitions
  • lucide-react for icons

Components are split into:

  • graph/
  • findings/
  • run/
  • redteam/
  • dashboard/
  • shared/

  • All fetches go through lib/api.ts
  • Pydantic schemas are mirrored in lib/types.ts
  • lib/mock-data.ts backs a NEXT_PUBLIC_USE_MOCK=true mode so the UI can demo without the backend running

Persistence

  • MongoDB via Motor (async)
  • Three collections:
    • runs
    • events
    • findings

There are no migrations; Pydantic enforces shape on write.

Red-Team Matrix

Promptfoo was used offline to:

  • harvest injection probes into evals/redteam/probes.yaml
  • record fixture results into fixtures.json

The red-team route serves the coverage matrix to the frontend.

Challenges we ran into

  • Field-level lineage through LangGraph's shared state
    Updates in LangGraph are partial dict merges, which made it easy to lose track of which upstream field produced which downstream field. We solved this with instrumentation at graph edges rather than rewriting agent implementations, and by scoping field-level lineage to the four critical fields on the exploit path.

  • Determinism vs. live LLM calls
    The demo needed to be reproducible, but we still wanted the LLM in the loop for believability. We added a cached_responses fixture layer and a baseline run fixture so the poisoned-invoice scenario replays identically.

  • Parallel-team contract drift
    Four people worked in parallel across backend runtime, forensic logic, frontend graph/replay, and integration. We kept Pydantic schemas (TraceEvent, Finding, BlastRadius, PropagationStep) and their TypeScript mirrors in lib/types.ts as the hard contract.

  • React Flow v12 typing
    v12 changed the custom-node generic signature, which silently broke early graph work. We had to rework node component types mid-build.

  • Environment quirk
    A Python 3.11 virtual environment created inside an iCloud-synced directory made imports hang for minutes. We documented a symlink workaround and fell back to a local ~/.cache/orcastrike-venv.

Accomplishments we're proud of

  • A real, end-to-end forensic report — not a generic scanner.
    It identifies the entry point, propagation path, blast-radius score, failure taxonomy, and an earliest-break counterfactual.

  • A working counterfactual-proof loop:
    compromised baseline → forensic diagnosis → earliest fix → guarded rerun that blocks at the predicted boundary

  • A probes × controls coverage matrix that turns “we think this guardrail helps” into a concrete grid showing which control catches which variant.

  • A UI that runs fully in mock mode, making the demo resilient to backend flakiness.

  • Integrated llm-guard, Presidio, and PDF ingestion without rewriting the agents — the indirection through the scenario runner and policy engine paid off.

What we learned

  • Agentic failures are rarely single-point failures; they are trust-degradation chains where each handoff nudges tainted data one step closer to an irreversible action.
  • The useful forensic output is not “agent X failed” — it is “control Y at boundary Z breaks the chain earliest at the lowest cost.”
  • LangGraph's shared state is convenient for developers and dangerous for security because fields flow implicitly unless you add explicit edge-level checks.
  • Determinism is a feature. A small fixture layer around the LLM was worth more than any clever prompt engineering.

What's next

  • Generalize beyond the finance workflow by inferring trust boundaries from arbitrary LangGraph topologies
  • Add a side-by-side baseline/guarded diff overlay on the trust graph
  • Export forensic reports in a format suitable for compliance and incident review
  • Expand the red-team matrix with more probe families and automatic regression tracking across runs
  • Build a richer policy authoring flow so teams can codify their own controls against the same rule registry
