Resilience Lab

Replay the moment your AI agent breaks

Inspiration

AI agents are easy to demo and hard to trust in production. Most demos show the happy path, but real systems fail through model brownouts, Model Context Protocol (MCP) tool errors, stale retrieval, schema breaks, and blocked handoffs. Resilience Lab was built to answer one question: before an agent reaches users, can we prove how it behaves when the stack breaks?

What it does

Resilience Lab is an agent flight recorder and chaos replay console. It injects realistic failures across the agent stack, replays the incident, scores launch readiness, blocks unsafe paths, and exports evidence for launch reviews or regression tests.
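To make the scoring and gating step concrete, here is a minimal sketch of how a launch-readiness score and a block decision could be derived from replayed incidents. The names `IncidentResult`, `LaunchDecision`, and `scoreLaunchReadiness`, the scoring formula, and the default threshold of 80 are all assumptions for illustration, not Resilience Lab's actual data model.

```typescript
// Hypothetical shapes for illustration; not the project's real data model.
interface IncidentResult {
  failure: string;           // e.g. "model_brownout", "stale_retrieval"
  recovered: boolean;        // the agent reached a safe end state
  answeredUnsafely: boolean; // the agent answered despite degraded context
}

interface LaunchDecision {
  score: number;     // 0 to 100, share of incidents the agent recovered from
  blocked: boolean;  // true when the launch gate should stay closed
  reasons: string[]; // human-readable explanation for the gate decision
}

// Assumed policy: any unsafe answer blocks the launch outright,
// and so does a recovery score below the threshold.
function scoreLaunchReadiness(
  results: IncidentResult[],
  minScore: number = 80
): LaunchDecision {
  const reasons: string[] = [];
  const recovered = results.filter((r) => r.recovered).length;
  const score =
    results.length === 0 ? 0 : Math.round((recovered / results.length) * 100);

  for (const r of results) {
    if (r.answeredUnsafely) {
      reasons.push(`unsafe answer during ${r.failure}`);
    }
  }
  if (score < minScore) {
    reasons.push(`score ${score} below threshold ${minScore}`);
  }
  return { score, blocked: reasons.length > 0, reasons };
}
```

Treating an unsafe answer as an automatic block, independent of the score, reflects the idea that a high average should not hide a single dangerous path.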

Judges can run a one-click demo that shows a claims agent recovering from model, MCP, retrieval, and handoff failures.

How we built it

We built Resilience Lab with React, TypeScript, Vite, and a deterministic replay engine. The engine generates scenario-specific incidents, timelines, dependency health, resilience scores, regression checks, remediation tasks, and JSON evidence reports.
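As a rough illustration of what "deterministic" means here, the sketch below drives a replay from a seeded pseudo-random generator so the same seed and failure toggles always produce the same timeline. The `FailureKind` values come from the failure modes named above; `TimelineEvent`, `replay`, the mulberry32-style generator, and the 0.7 recovery probability are assumptions, not the engine's real implementation.

```typescript
// Hypothetical types; names are illustrative, not Resilience Lab's actual API.
type FailureKind =
  | "model_brownout"
  | "mcp_tool_error"
  | "stale_retrieval"
  | "schema_break"
  | "blocked_handoff";

interface TimelineEvent {
  tick: number;         // position of the event in the replay
  failure: FailureKind; // which failure was injected
  recovered: boolean;   // simulated outcome for this injection
}

// Small seeded PRNG (mulberry32-style): the same seed yields the same sequence,
// unlike Math.random(), which cannot be seeded.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Build a timeline for the enabled failures. The 0.7 recovery probability
// is an arbitrary assumption chosen for the example.
function replay(seed: number, enabled: FailureKind[]): TimelineEvent[] {
  const rng = seededRandom(seed);
  return enabled.map((failure, tick) => ({
    tick,
    failure,
    recovered: rng() < 0.7,
  }));
}

// The JSON evidence report is then a plain serialization of the timeline.
const report = JSON.stringify(replay(42, ["model_brownout", "mcp_tool_error"]));
```

Because the output depends only on the seed and the toggles, a stored report can be regenerated byte for byte, which is what makes replay sessions usable as evidence.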

The UI is designed as a production-style resilience dashboard with local session history, failure toggles, judge mode, and exportable reports.

Challenges we ran into

The hardest challenge was avoiding a generic monitoring dashboard. We wanted the product to prove something specific: whether an AI agent can recover safely under dependency failure.

We also had to make the demo feel realistic without requiring live infrastructure, so we built a deterministic replay engine that simulates the kinds of failures teams actually face in production.

Accomplishments that we're proud of

We built a full interactive prototype with failure injection, replay timelines, resilience scoring, dependency health, launch gating, remediation tasks, local history, and report export.

We are proud that the product is not just another AI assistant. It is reliability infrastructure for AI agents, focused on what happens when systems fail.

What we learned

Agent reliability is not just uptime. It is about whether the agent knows when not to answer, how it communicates degraded context, how it routes recovery, and how teams can prove that behavior before launch.

What's next for Resilience Lab

Next, we plan to connect Resilience Lab to real model gateways and MCP servers, capture live traces, and turn replay sessions into CI regression tests.
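One way a replay session could become a CI regression check is to compare a fresh run against a stored baseline report and flag any failure the agent used to recover from but no longer does. This is a sketch under assumptions: `EvidenceReport` and `findRegressions` are hypothetical names, and the report shape is invented for the example.

```typescript
// Hypothetical report shape; the real exported JSON may differ.
interface EvidenceReport {
  scenario: string;
  outcomes: Record<string, boolean>; // failure name -> did the agent recover?
}

// A regression is a failure the baseline recovered from
// that the current run did not (or did not exercise at all).
function findRegressions(
  baseline: EvidenceReport,
  current: EvidenceReport
): string[] {
  const regressions: string[] = [];
  for (const failure of Object.keys(baseline.outcomes)) {
    const recoveredBefore = baseline.outcomes[failure] === true;
    const recoveredNow = current.outcomes[failure] === true;
    if (recoveredBefore && !recoveredNow) {
      regressions.push(failure);
    }
  }
  return regressions;
}
```

A CI job could fail the build whenever the returned list is non-empty, so a lost recovery path is caught before release rather than in production.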

We also want to add team workspaces, hosted incident reports, launch approvals, and integrations with engineering tools so agent recovery becomes part of the release process.

Updates

VT VT started this project — May 24, 2026 03:51 PM EDT
