Inspiration

AI agents are usually judged by their best-case output, but real users meet them when something breaks: an LLM provider errors out, an MCP server times out, tool JSON is malformed, or latency spikes. FailoverPilot makes that failure path visible, auditable, and understandable.

What it does

FailoverPilot is an interactive web demo for resilience testing an agent runtime. Judges can inject common failure modes and watch the agent respond: primary LLM outage detection, MCP server fallback, latency-aware routing, malformed response handling, recovery policies, and user-facing confidence explanations.

How I built it

The demo is built as a static HTML, CSS, and JavaScript app deployed on Netlify. The UI models the user-side control plane for resilient agents: chaos inputs, recovery policy, timeline events, provider routing, MCP strategy, health score, and final user-facing explanation.

Challenges

The key challenge was making infrastructure resilience understandable without requiring a complex backend setup. The demo focuses on what users see, what the agent discloses, and how confidence changes during failure.

Accomplishments

  • Complete browser-based resilience demo
  • Clear fallback policy modes
  • Human-readable failure explanations
  • Live timeline for audits and judging
  • Public Netlify deployment

What I learned

Resilient agents are not only about retries and failover. They also need honest UX: users should know when data is cached, when a provider changed, and when an answer has lower confidence.

What's next

Next steps are real provider health checks, live MCP probes, signed run reports, observability traces, and team-level policy templates.

Built With

Share this project:

Updates