Riposte - Break the model. Prove it. Patch it.

Riposte: autonomous red-teaming for AI agents, fuzz, verify against MITRE ATT&CK, score with ARiES, and ship a human-reviewed fix.
The math behind Riposte, explained: black-box simulated annealing, ARiES's four-signal blend, plain English first, formal math on click.
Live audit dashboard: 10 MITRE ATT&CK scenarios, real Browserbase sessions, Global ARiES scoring, and an adversarial fuzzer, all in one run.

Inspiration

Anthropic's Frontier Red Team published "Mapping AI-enabled cyber threats: Insights from the LLM ATT&CK Navigator", an analysis of 832 real accounts weaponizing AI across all 14 MITRE ATT&CK tactics. It gives you the taxonomy of what AI-enabled attacks look like. It doesn't tell you whether your deployment is actually vulnerable to any of them.

That gap is Riposte: turn a fixed threat taxonomy into a runnable, evidence-based verification suite you can point at your own agent.

What it does

Point Riposte at a target endpoint, a source repo, and a few lines of canary data (a private corpus + a benign baseline), and it runs a closed loop:

Plan - selects MITRE ATT&CK techniques and generates adversarial fuzz seeds.
Verify - drives a real headless browser (Browserbase + Stagehand) against the live target and runs each technique's scenario, capturing the DOM before/after and the network log as forensic evidence, not just the chat transcript.
Evaluate - scores every response with ARiES, a calibrated composite metric: 0.35·M + 0.35·L + 0.20·A + 0.10·J, anomaly (PCA + Mahalanobis distance against a benign baseline), leakage (cosine + entity + token overlap against the private corpus), control failure (evidence-based, not text-based), and an ensemble LLM judge.
Repair - on a critical finding (ARiES ≥ 75 or a confirmed control failure), drafts a defensive patch and opens a human-reviewed pull request. Nothing merges without a human.

Global ARiES is the maximum score across every attack in the run, not the average — one critical failure shouldn't get to hide behind ninety-nine successful defenses.

How we built it

Backend: Python/FastAPI, strictly layered (Routers → Services → Repositories), an asynchronous producer–consumer pipeline with no global singletons, four phases (plan/verify/evaluate/repair) wired through asyncio.Queues.
Frontend: Next.js + React, ports-and-adapters architecture, polling a typed AuditService interface so the transport can be swapped without touching components.
Browserbase + Stagehand drive the live verification scenarios.
Redis Stack (RediSearch) runs HNSW vector search so leakage detection against the private corpus is O(log N) instead of a brute-force scan.
MiniMax powers the ensemble judge and drafts the remediation patch.
GitHub API opens the actual HITL pull request.
Sentry instruments the pipeline, prompts and PII are never logged.

Challenges we ran into

The fuzzer is black-box by necessity. We don't have gradient access to the target, so instead of backpropagating to find adversarial tokens, we run simulated annealing: swap one token in the suffix, score the response against a cross-entropy loss over two fixed prototypes (compliant-leak vs. refusal), and accept worse mutations with Metropolis probability so the search doesn't get stuck in the first local trap it finds.
A scrolling bug that took real debugging to find. Our dashboard panels kept growing instead of scrolling as findings accumulated. The actual root cause was two layers deep: a shared GlassPanel component's inner wrapper was a plain <div> with no flex context, silently breaking flex-1 / overflow-y-auto for every panel that used it, not just the one we first noticed.
Calibrating ARiES itself. Early on, pure in-subspace Mahalanobis distance scored leaked secrets identically to benign text, the anomaly signal lived in the residual subspace, not the principal components. Fixed by combining T² with the reconstruction residual (SPE) and max-pooling over sentences so a single leaked sentence buried in an otherwise-normal response still gets caught.

What's next

Expanding the registered MITRE ATT&CK technique library, adding an SSE/live transport behind the same AuditService port, and persistent regression storage so repeat audits can flag re-introduced vulnerabilities.