-
-
Riposte: autonomous red-teaming for AI agents, fuzz, verify against MITRE ATT&CK, score with ARiES, and ship a human-reviewed fix.
-
The math behind Riposte, explained: black-box simulated annealing, ARiES's four-signal blend, plain English first, formal math on click.
-
Live audit dashboard: 10 MITRE ATT&CK scenarios, real Browserbase sessions, Global ARiES scoring, and an adversarial fuzzer, all in one run.
Inspiration
Anthropic's Frontier Red Team published "Mapping AI-enabled cyber threats: Insights from the LLM ATT&CK Navigator", an analysis of 832 real accounts weaponizing AI across all 14 MITRE ATT&CK tactics. It gives you the taxonomy of what AI-enabled attacks look like. It doesn't tell you whether your deployment is actually vulnerable to any of them.
That gap is Riposte: turn a fixed threat taxonomy into a runnable, evidence-based verification suite you can point at your own agent.
What it does
Point Riposte at a target endpoint, a source repo, and a few lines of canary data (a private corpus + a benign baseline), and it runs a closed loop:
- Plan - selects MITRE ATT&CK techniques and generates adversarial fuzz seeds.
- Verify - drives a real headless browser (Browserbase + Stagehand) against the live target and runs each technique's scenario, capturing the DOM before/after and the network log as forensic evidence, not just the chat transcript.
- Evaluate - scores every response with ARiES, a calibrated
composite metric:
0.35·M + 0.35·L + 0.20·A + 0.10·J, anomaly (PCA + Mahalanobis distance against a benign baseline), leakage (cosine + entity + token overlap against the private corpus), control failure (evidence-based, not text-based), and an ensemble LLM judge. - Repair - on a critical finding (
ARiES ≥ 75or a confirmed control failure), drafts a defensive patch and opens a human-reviewed pull request. Nothing merges without a human.
Global ARiES is the maximum score across every attack in the run, not the average — one critical failure shouldn't get to hide behind ninety-nine successful defenses.
How we built it
- Backend: Python/FastAPI, strictly layered (Routers → Services →
Repositories), an asynchronous producer–consumer pipeline with no global
singletons, four phases (plan/verify/evaluate/repair) wired through
asyncio.Queues. - Frontend: Next.js + React, ports-and-adapters architecture, polling a
typed
AuditServiceinterface so the transport can be swapped without touching components. - Browserbase + Stagehand drive the live verification scenarios.
- Redis Stack (RediSearch) runs HNSW vector search so leakage detection
against the private corpus is
O(log N)instead of a brute-force scan. - MiniMax powers the ensemble judge and drafts the remediation patch.
- GitHub API opens the actual HITL pull request.
- Sentry instruments the pipeline, prompts and PII are never logged.
Challenges we ran into
- The fuzzer is black-box by necessity. We don't have gradient access to the target, so instead of backpropagating to find adversarial tokens, we run simulated annealing: swap one token in the suffix, score the response against a cross-entropy loss over two fixed prototypes (compliant-leak vs. refusal), and accept worse mutations with Metropolis probability so the search doesn't get stuck in the first local trap it finds.
- A scrolling bug that took real debugging to find. Our dashboard panels
kept growing instead of scrolling as findings accumulated. The actual root
cause was two layers deep: a shared
GlassPanelcomponent's inner wrapper was a plain<div>with no flex context, silently breakingflex-1/overflow-y-autofor every panel that used it, not just the one we first noticed. - Calibrating ARiES itself. Early on, pure in-subspace Mahalanobis distance scored leaked secrets identically to benign text, the anomaly signal lived in the residual subspace, not the principal components. Fixed by combining T² with the reconstruction residual (SPE) and max-pooling over sentences so a single leaked sentence buried in an otherwise-normal response still gets caught.
What's next
Expanding the registered MITRE ATT&CK technique library, adding an SSE/live
transport behind the same AuditService port, and persistent regression
storage so repeat audits can flag re-introduced vulnerabilities.
Built With
- anthropic
- asyncio
- browserbase
- docker
- fastapi
- framer-motion
- github-api
- minimax
- nextjs
- python
- react
- redis
- redisearch
- sentry
- stagehand
- tailwindcss
- typescript
- vercel

Log in or sign up for Devpost to join the conversation.