Inspiration
The spark came from a video by Underscore_ featuring Renaud Heitz (CTO of Exotec), detailing the orchestration of hundreds of robots in 24-hour delivery warehouses. As a 5th-year student (UQAC/ESGI) new to robotics, I dove into the academic side and noticed a glaring gap: the benchmarks I found assumed a "perfect world." No mechanical failures, no sensor drops, no deadlocks. While most researchers were racing to shave off the last millisecond, I realized that in the real world a single "dead" robot can trigger a cascade of failures that brings a multi-million dollar facility to a halt. MAFIS was built to bridge theoretical "perfect" pathfinding and the messy reality of warehouse logistics.
What it does
MAFIS is a Fault Resilience Observatory for lifelong Multi-Agent Pathfinding (MAPF). It lets researchers inject faults into robot fleets (simulating stuck or delayed agents) and observe how failures propagate through the system. Every faulted run is paired with a deterministic, fault-free baseline, so every metric (throughput, delay, heat) becomes a measure of deviation from that baseline. It features a real-time 3D simulation and a dashboard that run entirely in the browser, giving researchers an environment for testing how robotics algorithms hold up when things go wrong.
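To make the paired-baseline idea concrete, here is a minimal sketch in Rust. It is not MAFIS's actual code: the toy fleet model (every healthy agent finishes one task every `period` ticks) and the names `Fault`, `simulate`, and `deviation` are assumptions made up for illustration. The point is only that the same deterministic run, with and without a fault, turns a raw metric into a deviation.

```rust
/// Hypothetical fault description (illustrative, not the MAFIS API):
/// from `at_tick` onward, `agents_lost` agents stop completing tasks.
#[derive(Debug, Clone, Copy)]
struct Fault {
    at_tick: u32,
    agents_lost: u32,
}

/// Toy deterministic fleet model: every healthy agent completes one task
/// each time the tick counter is a multiple of `period`.
/// Returns the cumulative number of completed tasks after each tick.
fn simulate(agents: u32, ticks: u32, period: u32, fault: Option<Fault>) -> Vec<u32> {
    assert!(period > 0, "period must be positive");
    let mut completed = 0;
    let mut history = Vec::with_capacity(ticks as usize);
    for tick in 1..=ticks {
        // Agents lost to the fault at this tick (none in the baseline run).
        let lost = match fault {
            Some(f) if tick >= f.at_tick => f.agents_lost.min(agents),
            _ => 0,
        };
        if tick % period == 0 {
            completed += agents - lost;
        }
        history.push(completed);
    }
    history
}

/// Per-tick deviation of the faulted run from the fault-free baseline.
/// Negative values mean the faulted fleet has fallen behind.
fn deviation(baseline: &[u32], faulted: &[u32]) -> Vec<i64> {
    baseline
        .iter()
        .zip(faulted)
        .map(|(&b, &f)| i64::from(f) - i64::from(b))
        .collect()
}
```

With 10 agents, 20 ticks, and a period of 5, a fault that removes 2 agents at tick 8 leaves the deviation at 0 through tick 9 and then grows it by 2 tasks at each completion tick. Because the model has no hidden randomness, re-running the baseline always yields the same series, which is what makes the comparison meaningful.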
How we built it
Engine: Powered by Rust 2024 and the Bevy ECS (Entity Component System) for high-concurrency simulation and a clean, data-driven architecture. Compiled to WebAssembly (WASM) for a zero-install, 120+ FPS experience in the browser, while keeping a headless CLI for high-speed statistical "Monte Carlo" experiments. (Note: parallel computation and multi-threading are not fully optimized on the web. For maximum performance, run the desktop version; the web version offers the more polished user experience.)
Deterministic Timeline: We built a custom state-management system that allows for "Time-Travel" rewinding. You can rewind the simulation to a specific tick and re-watch a failure event without breaking the internal states of the solver, scheduler, or metrics.
SOTA Solvers: Ported state-of-the-art multi-agent algorithms (PIBT, RHCR, Token Passing) from research papers and C++ research code into idiomatic, memory-safe Rust.
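The Deterministic Timeline described above can be sketched with periodic snapshots plus deterministic replay. This is an assumed design for illustration, not the actual MAFIS implementation: `World`, `Timeline`, the snapshot `interval`, and the toy movement rule are all hypothetical. The key idea it demonstrates is that everything affecting future ticks, including the random number generator state, lives inside the snapshotted value.

```rust
/// Hypothetical simulation state. The RNG state is stored in the world so
/// that a snapshot captures everything that influences future ticks.
#[derive(Debug, Clone, PartialEq)]
struct World {
    tick: u64,
    rng_state: u64,
    positions: Vec<i64>,
    total_moves: u64, // a cumulative metric that must survive rewinds
}

impl World {
    fn new(agents: usize, seed: u64) -> Self {
        // xorshift needs a non-zero state.
        World { tick: 0, rng_state: seed.max(1), positions: vec![0; agents], total_moves: 0 }
    }

    /// xorshift64: a tiny deterministic pseudo-random generator.
    fn next_random(&mut self) -> u64 {
        let mut x = self.rng_state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.rng_state = x;
        x
    }

    /// Advance one tick: each agent moves forward with probability 1/2.
    fn step(&mut self) {
        for i in 0..self.positions.len() {
            if self.next_random() % 2 == 0 {
                self.positions[i] += 1;
                self.total_moves += 1;
            }
        }
        self.tick += 1;
    }
}

/// Keeps a snapshot every `interval` ticks; `snapshots[k]` is tick `k * interval`.
struct Timeline {
    world: World,
    interval: u64,
    snapshots: Vec<World>,
}

impl Timeline {
    fn new(world: World, interval: u64) -> Self {
        assert!(interval > 0, "interval must be positive");
        assert_eq!(world.tick, 0, "timeline must start at tick 0");
        Timeline { snapshots: vec![world.clone()], world, interval }
    }

    fn advance(&mut self) {
        self.world.step();
        if self.world.tick % self.interval == 0 {
            let idx = (self.world.tick / self.interval) as usize;
            if idx == self.snapshots.len() {
                self.snapshots.push(self.world.clone());
            }
        }
    }

    /// Restore the nearest snapshot at or before `target`, then replay forward.
    fn rewind_to(&mut self, target: u64) {
        assert!(target <= self.world.tick, "cannot rewind into the future");
        let idx = (target / self.interval) as usize;
        self.world = self.snapshots[idx].clone();
        // Drop later snapshots; they are rebuilt as the run advances again.
        self.snapshots.truncate(idx + 1);
        while self.world.tick < target {
            self.world.step();
        }
    }
}
```

Snapshotting every tick would make rewinds instant but costs memory; the interval trades memory for a bounded amount of replay (at most `interval - 1` ticks). A real solver would also need its internal search state either included in the snapshot or rebuilt deterministically from it.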
Challenges we ran into
The "Mathematical Wall" of state-of-the-art MAPF algorithms was significant: translating abstract paper logic into high-performance Rust required a deep dive into graph theory and priority inheritance. Technically, the Timeline/Rewind feature was the hardest puzzle. Keeping the solver's internal search tree, the UI's reactive state, and the cumulative metrics perfectly synchronized during a rewind, without leaking state or introducing non-determinism, required a rigorous approach to data ownership and snapshotting.
Accomplishments that we're proud of
I'm incredibly proud of the Real-Time Metric Synchronization. There was a "Eureka" moment when, after weeks of debugging, the UI and the simulation engine finally spoke the same language: seeing the live charts dip and recover in perfect sync with the 3D robots navigating a "traffic jam" was amazing. Additionally, reaching 120+ FPS in WASM with 200+ active agents shows that high-fidelity research tools can be accessible and performant in a web browser.
What we learned
This project completely shifted my perspective from "Optimal" to "Robust." I learned that Chaos Engineering is dangerously underestimated in robotics. In a world where we ship software faster than ever, "Optimal" is a fragile goal - "Reliable" is what actually keeps the lights on. Building this from first principles as a newcomer taught me that curiosity and the right tools (like Rust and ECS) can bridge the gap between "having no background" and "contributing to the field."
What's next for MAFIS - Multi-Agent Fault Injection Simulator
- Heterogeneous agents: Simulate robots of different sizes and with different specs.
- Machine learning: Predict metrics under faults for each scenario, look for recurring patterns, and possibly find a formal model that explains why they occur.
- Rescue robot: Add a rescue robot that picks up a dead robot and moves it to a safe zone, then compare whether throughput improves.
Links
Credits / Acknowledgments
Thanks to these communities for their incredible work and contributions:
Thanks to researchers for their incredible work:
PIBT (Priority Inheritance with Backtracking): Okumura, K., Machida, M., Défago, X., & Tamura, Y. (2019). Priority Inheritance with Backtracking for Iterative Multi-agent Path Finding. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). arXiv:1901.11282
RHCR (Rolling-Horizon Collision Resolution): Li, J., Tinka, A., Kiesel, S., Durham, J. W., Kumar, T. K. S., & Koenig, S. (2021). Lifelong Multi-Agent Path Finding in Large-Scale Warehouses. In Proceedings of the AAAI Conference on Artificial Intelligence. arXiv:2005.07371
TP (Token Passing): Ma, H., Li, J., Kumar, T. K. S., & Koenig, S. (2017). Lifelong Multi-Agent Path Finding for Online Pickup and Delivery Tasks. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS). arXiv:1705.10868
Special thanks to Professor Okumura, a pillar of state-of-the-art MAPF research, who said that my project looks cool :>
Built With
- astro-framework
- bevy
- rust
- webassembly