FailoverPilot

Inspiration

AI agents are usually judged by their best-case output, but real users meet them when something breaks: an LLM provider errors out, an MCP server times out, tool JSON is malformed, or latency spikes. FailoverPilot makes that failure path visible, auditable, and understandable.

What it does

FailoverPilot is an interactive web demo for resilience-testing an agent runtime. Judges can inject common failure modes and watch how the agent responds. The demo covers primary LLM outage detection, MCP server fallback, latency-aware routing, malformed response handling, recovery policies, and user-facing confidence explanations.
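To make the routing and malformed-response behaviors above concrete, here is a minimal sketch of how they could work. It is not the demo's actual code: the provider names, the `healthy`/`latencyMs` fields, and the functions `chooseProvider` and `parseToolJson` are all hypothetical and exist only for illustration.

```javascript
// Hypothetical provider records in priority order; names and fields are assumptions.
const providers = [
  { name: "primary-llm", healthy: false, latencyMs: 420 },
  { name: "backup-llm", healthy: true, latencyMs: 900 },
  { name: "local-cache", healthy: true, latencyMs: 30 },
];

// Latency-aware routing with failover:
// 1. skip unhealthy providers,
// 2. take the highest-priority healthy provider that fits the latency budget,
// 3. if none fits, fall back to the fastest healthy provider.
function chooseProvider(list, budgetMs) {
  const healthy = list.filter((p) => p.healthy);
  if (healthy.length === 0) return null;
  const preferred = healthy.find((p) => p.latencyMs <= budgetMs);
  if (preferred) return preferred;
  return healthy.reduce((best, p) => (p.latencyMs < best.latencyMs ? p : best));
}

// Malformed tool JSON becomes a structured failure instead of an uncaught exception.
function parseToolJson(raw) {
  try {
    return { ok: true, value: JSON.parse(raw) };
  } catch (err) {
    return { ok: false, error: `malformed tool JSON: ${err.message}` };
  }
}
```

With a 1000 ms budget this sketch routes to `backup-llm`, because the primary is marked unhealthy. Tightening the budget to 500 ms moves the request to `local-cache`. Returning a result object from `parseToolJson` lets the caller log the failure and apply a recovery policy rather than crash.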

How I built it

The demo is built as a static HTML, CSS, and JavaScript app deployed on Netlify. The UI models the user-side control plane for resilient agents: chaos inputs, recovery policy, timeline events, provider routing, MCP strategy, health score, and final user-facing explanation.
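As a rough illustration of how timeline events and a health score might fit together in a browser-only app, consider the sketch below. The event types, the penalty weights, and the `createTimeline` helper are assumptions made for this example; the demo's real scoring rules are not described here.

```javascript
// Hypothetical penalty weights per event type (assumed values, not the demo's).
// A negative penalty means the event restores some health.
const PENALTIES = {
  outage: 30,
  timeout: 20,
  malformed: 15,
  latency: 10,
  recovered: -10,
};

// Small in-memory timeline: records ordered events and derives a 0-100 health score.
function createTimeline() {
  const events = [];
  return {
    // Append an event with a sequence number so the audit order is explicit.
    record(type, detail) {
      events.push({ seq: events.length + 1, type, detail });
    },
    // Return a copy so callers cannot mutate the audit trail.
    events() {
      return events.slice();
    },
    // Start from 100, subtract penalties, and clamp to the 0-100 range.
    healthScore() {
      const penalty = events.reduce(
        (sum, e) => sum + (PENALTIES[e.type] ?? 0),
        0
      );
      return Math.max(0, Math.min(100, 100 - penalty));
    },
  };
}
```

Under these assumed weights, recording an `outage` followed by a `recovered` event yields a score of 80. Keeping the score as a pure function of the event list means the timeline stays the single source of truth for audits.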

Challenges

The key challenge was making infrastructure resilience understandable without requiring a complex backend setup. The demo focuses on what users see, what the agent discloses, and how confidence changes during failure.

Accomplishments

Complete browser-based resilience demo
Clear fallback policy modes
Human-readable failure explanations
Live timeline for audits and judging
Public Netlify deployment

What I learned

Resilient agents are not only about retries and failover. They also need honest UX: users should know when data is cached, when a provider changed, and when an answer has lower confidence.
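One way to turn that disclosure principle into code is a small function that builds the user-facing note from the run's state. This is a sketch under stated assumptions: the `explainRun` name, its input fields, and the 0.7 confidence threshold are hypothetical and not taken from the demo.

```javascript
// Assumed threshold below which the user is told confidence is reduced.
const LOW_CONFIDENCE = 0.7;

// Build a plain-language disclosure from hypothetical run metadata.
function explainRun({ provider, primary, cached, confidence }) {
  const notes = [];
  if (provider !== primary) {
    notes.push(`Answered by ${provider} because ${primary} was unavailable.`);
  }
  if (cached) {
    notes.push("Some data came from a cache and may be out of date.");
  }
  if (confidence < LOW_CONFIDENCE) {
    notes.push(`Confidence is reduced (${Math.round(confidence * 100)}%).`);
  }
  return notes.length > 0
    ? notes.join(" ")
    : "Answered normally with no fallbacks.";
}
```

Each condition maps to one sentence, so the user sees exactly which fallbacks applied and nothing more.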

What's next

Next steps are real provider health checks, live MCP probes, signed run reports, observability traces, and team-level policy templates.

Built With

ai-agents
css
html
javascript
mcp
netlify
observability
resilience
ux

Updates

K He started this project — May 12, 2026 01:59 AM EDT
