Inspiration

Anthropic's Frontier Red Team published "Mapping AI-enabled cyber threats: Insights from the LLM ATT&CK Navigator", an analysis of 832 real accounts weaponizing AI across all 14 MITRE ATT&CK tactics. It gives you the taxonomy of what AI-enabled attacks look like. It doesn't tell you whether your deployment is actually vulnerable to any of them.

That gap is Riposte: turn a fixed threat taxonomy into a runnable, evidence-based verification suite you can point at your own agent.

What it does

Point Riposte at a target endpoint, a source repo, and a few lines of canary data (a private corpus + a benign baseline), and it runs a closed loop:

  1. Plan - selects MITRE ATT&CK techniques and generates adversarial fuzz seeds.
  2. Verify - drives a real headless browser (Browserbase + Stagehand) against the live target and runs each technique's scenario, capturing the DOM before/after and the network log as forensic evidence, not just the chat transcript.
  3. Evaluate - scores every response with ARiES, a calibrated composite metric: 0.35·M + 0.35·L + 0.20·A + 0.10·J, anomaly (PCA + Mahalanobis distance against a benign baseline), leakage (cosine + entity + token overlap against the private corpus), control failure (evidence-based, not text-based), and an ensemble LLM judge.
  4. Repair - on a critical finding (ARiES ≥ 75 or a confirmed control failure), drafts a defensive patch and opens a human-reviewed pull request. Nothing merges without a human.

Global ARiES is the maximum score across every attack in the run, not the average — one critical failure shouldn't get to hide behind ninety-nine successful defenses.

How we built it

  • Backend: Python/FastAPI, strictly layered (Routers → Services → Repositories), an asynchronous producer–consumer pipeline with no global singletons, four phases (plan/verify/evaluate/repair) wired through asyncio.Queues.
  • Frontend: Next.js + React, ports-and-adapters architecture, polling a typed AuditService interface so the transport can be swapped without touching components.
  • Browserbase + Stagehand drive the live verification scenarios.
  • Redis Stack (RediSearch) runs HNSW vector search so leakage detection against the private corpus is O(log N) instead of a brute-force scan.
  • MiniMax powers the ensemble judge and drafts the remediation patch.
  • GitHub API opens the actual HITL pull request.
  • Sentry instruments the pipeline, prompts and PII are never logged.

Challenges we ran into

  • The fuzzer is black-box by necessity. We don't have gradient access to the target, so instead of backpropagating to find adversarial tokens, we run simulated annealing: swap one token in the suffix, score the response against a cross-entropy loss over two fixed prototypes (compliant-leak vs. refusal), and accept worse mutations with Metropolis probability so the search doesn't get stuck in the first local trap it finds.
  • A scrolling bug that took real debugging to find. Our dashboard panels kept growing instead of scrolling as findings accumulated. The actual root cause was two layers deep: a shared GlassPanel component's inner wrapper was a plain <div> with no flex context, silently breaking flex-1 / overflow-y-auto for every panel that used it, not just the one we first noticed.
  • Calibrating ARiES itself. Early on, pure in-subspace Mahalanobis distance scored leaked secrets identically to benign text, the anomaly signal lived in the residual subspace, not the principal components. Fixed by combining T² with the reconstruction residual (SPE) and max-pooling over sentences so a single leaked sentence buried in an otherwise-normal response still gets caught.

What's next

Expanding the registered MITRE ATT&CK technique library, adding an SSE/live transport behind the same AuditService port, and persistent regression storage so repeat audits can flag re-introduced vulnerabilities.

Built With

Share this project:

Updates