MAFIS - Multi-Agent Fault Injection Simulator

Configuration - Set up the run
Run - Observe
Experiments - Headless runs

Inspiration

The spark came from a video by Underscore_ featuring Renaud Heitz (CTO of Exotec), detailing the orchestration of hundreds of robots in 24-hour delivery warehouses. As a fifth-year student (UQAC/ESGI) new to robotics, I dove into the academic side and noticed a glaring gap: every benchmark assumed a "perfect world." No mechanical failures, no sensor drops, no deadlocks. While most researchers were racing to shave off milliseconds, I realized that in the real world, a single "dead" robot can trigger a cascade of failures that brings a multi-million dollar facility to a halt. MAFIS was built to be the bridge between theoretical "perfect" pathfinding and the messy reality of warehouse logistics.

What it does

MAFIS is a Fault Resilience Observatory for lifelong Multi-Agent Pathfinding (MAPF). It lets researchers inject faults into robot fleets, simulating stuck or delayed agents, to observe how failures propagate through the system. It pairs every faulted run with a deterministic, fault-free baseline, turning every metric (throughput, delay, heat) into a measure of deviation. It features a real-time 3D simulation with a dashboard that runs entirely in the browser, giving researchers an environment for testing how robotics algorithms hold up when things go wrong.
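To make the baseline pairing concrete, here is a minimal sketch of how a faulted run could be compared tick by tick against its fault-free twin. The type and function names (`TickMetrics`, `Deviation`, `deviation`, `throughput_loss`) are assumptions made for illustration and are not taken from the MAFIS codebase.

```rust
/// Metrics recorded for one tick of one run.
/// Hypothetical shape: MAFIS's real metric types may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TickMetrics {
    /// Tasks completed during this tick.
    pub throughput: f64,
    /// Mean delay (in ticks) across agents at this tick.
    pub mean_delay: f64,
}

/// Faulted-minus-baseline difference at a single tick.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Deviation {
    pub throughput_delta: f64,
    pub delay_delta: f64,
}

/// Pair each faulted tick with the baseline tick at the same index.
/// If the runs differ in length, the extra ticks are ignored.
pub fn deviation(baseline: &[TickMetrics], faulted: &[TickMetrics]) -> Vec<Deviation> {
    baseline
        .iter()
        .zip(faulted.iter())
        .map(|(b, f)| Deviation {
            throughput_delta: f.throughput - b.throughput,
            delay_delta: f.mean_delay - b.mean_delay,
        })
        .collect()
}

/// Fraction of baseline throughput lost over the whole run.
/// Returns `None` when the baseline completed no tasks.
pub fn throughput_loss(baseline: &[TickMetrics], faulted: &[TickMetrics]) -> Option<f64> {
    let base: f64 = baseline.iter().map(|m| m.throughput).sum();
    if base == 0.0 {
        return None;
    }
    let fault: f64 = faulted.iter().map(|m| m.throughput).sum();
    Some((base - fault) / base)
}
```

Because the baseline is deterministic, a nonzero delta at a given tick can be attributed to the injected fault rather than to run-to-run noise.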

How we built it

Engine: Powered by Rust 2024 and the Bevy ECS (Entity Component System) for high-concurrency simulation and a clean, data-driven architecture. Compiled to WebAssembly (WASM) for a zero-install, 120+ FPS experience in the browser, while keeping a headless CLI for high-speed statistical "Monte Carlo" experiments. (Note: parallel computation and multithreading are not fully optimized on the web. For maximum performance, run the desktop version; the web version offers the more polished user experience.)
Deterministic Timeline: We built a custom state-management system that allows for "Time-Travel" rewinding. You can rewind the simulation to a specific tick and re-watch a failure event without breaking the internal states of the solver, scheduler, or metrics.
SOTA Solvers: Ported and implemented complex Multi-Agent algorithms (PIBT, RHCR, Token Passing) from C++ research papers into idiomatic, memory-safe Rust.
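The headless Monte Carlo experiments mentioned under Engine depend on each run being reproducible from its seed. The sketch below shows one way a seeded fault schedule could be generated. Everything in it is an assumption made for illustration: the names `FaultEvent` and `fault_schedule` are hypothetical, and SplitMix64 is a stand-in generator chosen so the example needs no external crates, not necessarily what MAFIS uses.

```rust
/// SplitMix64: a small, well-known deterministic PRNG.
/// Used here only to keep the sketch dependency-free.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform float in [0, 1), built from the top 53 bits.
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// One injected fault: `agent` becomes stuck at `tick`. Hypothetical type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FaultEvent {
    pub tick: u32,
    pub agent: u32,
}

/// Build a fault schedule in which every (tick, agent) pair faults
/// independently with probability `p_fault`.
/// The same seed always yields the same schedule.
pub fn fault_schedule(seed: u64, ticks: u32, agents: u32, p_fault: f64) -> Vec<FaultEvent> {
    let mut rng = SplitMix64::new(seed);
    let mut events = Vec::new();
    for tick in 0..ticks {
        for agent in 0..agents {
            if rng.next_f64() < p_fault {
                events.push(FaultEvent { tick, agent });
            }
        }
    }
    events
}
```

A Monte Carlo experiment would then loop over many seeds, run the simulation once per schedule, and aggregate the resulting metrics. Keeping the seed alongside each result makes any single run replayable.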

Challenges we ran into

The "Mathematical Wall" of state-of-the-art MAPF algorithms was significant: translating abstract paper logic into high-performance Rust required a deep dive into graph theory and priority inheritance. Technically, the Timeline/Rewind feature was the hardest puzzle. Keeping the solver's internal search tree, the UI's reactive state, and the cumulative metrics perfectly synchronized during a rewind, without leaking state or introducing non-determinism, required a rigorous approach to data ownership and snapshotting.
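The snapshotting idea can be illustrated with a toy model: store an owned copy of the full state at each tick, and on rewind drop everything after the target tick before resuming. This is a simplified sketch under stated assumptions, not the MAFIS implementation. `SimState`, `Timeline`, and `step` are hypothetical names, and the real system would also have to capture solver, scheduler, and metric state inside the snapshot.

```rust
/// Everything needed to resume the simulation at a given tick.
/// Hypothetical and deliberately tiny.
#[derive(Debug, Clone, PartialEq)]
pub struct SimState {
    pub tick: u64,
    pub positions: Vec<(i32, i32)>,
    pub tasks_done: u64,
}

/// History of owned snapshots, one per recorded tick, in tick order.
pub struct Timeline {
    snapshots: Vec<SimState>,
}

impl Timeline {
    pub fn new(initial: SimState) -> Self {
        Self { snapshots: vec![initial] }
    }

    /// Store a full copy, so later mutation of the live state
    /// cannot leak into the recorded history.
    pub fn record(&mut self, state: &SimState) {
        self.snapshots.push(state.clone());
    }

    /// Return a copy of the state at `tick` and discard every later
    /// snapshot, or return `None` if that tick was never recorded.
    pub fn rewind_to(&mut self, tick: u64) -> Option<SimState> {
        let idx = self.snapshots.iter().position(|s| s.tick == tick)?;
        self.snapshots.truncate(idx + 1);
        Some(self.snapshots[idx].clone())
    }
}

/// Toy deterministic update: every agent moves one cell to the right
/// and one task completes per tick. It depends only on its input.
pub fn step(state: &SimState) -> SimState {
    SimState {
        tick: state.tick + 1,
        positions: state.positions.iter().map(|&(x, y)| (x + 1, y)).collect(),
        tasks_done: state.tasks_done + 1,
    }
}
```

Because `step` is a pure function of the snapshot, replaying from a rewound tick reproduces the original run exactly. Any hidden state living outside the snapshot would break that property, which is why ownership of the data matters so much here.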

Accomplishments that we're proud of

I'm incredibly proud of the Real-Time Metric Synchronization. There was a "Eureka" moment when, after weeks of debugging, the UI and the simulation engine finally spoke the same language. Seeing the live charts dip and recover in perfect sync with the 3D robots navigating a "traffic jam" was amazing. Additionally, achieving 120+ FPS in WASM with 200+ active agents shows that high-fidelity research tools can be accessible and performant in a web browser.

What we learned

This project completely shifted my perspective from "Optimal" to "Robust." I learned that Chaos Engineering is dangerously underestimated in robotics. In a world where we ship software faster than ever, "Optimal" is a fragile goal - "Reliable" is what actually keeps the lights on. Building this from first principles as a newcomer taught me that curiosity and the right tools (like Rust and ECS) can bridge the gap between "having no background" and "contributing to the field."

What's next for MAFIS - Multi-Agent Fault Injection Simulator

Heterogeneous agents: Simulate robots of different sizes and with different specs.
Machine learning: Predict data under faults for each scenario, look for patterns, and perhaps derive a formal model that explains them.
Rescue Robot: Add a rescue robot that picks up a dead robot and moves it to a safe zone, then compare whether this improves throughput.

Links

GitHub

Website

Credits / Acknowledgments

Thanks to these communities for their incredible work and contributions:

Rust

Bevy - Game engine in Rust

Astro Framework

Thanks to these researchers for their incredible work:

PIBT (Priority Inheritance with Backtracking): Okumura, K., Machida, M., Défago, X., & Tamura, Y. (2019). Priority Inheritance with Backtracking for Iterative Multi-agent Path Finding. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). arXiv:1901.11282
RHCR (Rolling-Horizon Collision Resolution): Li, J., Tinka, A., Kiesel, S., Durham, J. W., Kumar, T. K. S., & Koenig, S. (2021). Lifelong Multi-Agent Path Finding in Large-Scale Warehouses. In Proceedings of the AAAI Conference on Artificial Intelligence. arXiv:2005.07371
TP (Token Passing): Ma, H., Li, J., Kumar, T. K. S., & Koenig, S. (2017). Lifelong Multi-Agent Path Finding for Online Pickup and Delivery Tasks. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS). arXiv:1705.10868

Special thanks to Professor Okumura, a pillar of state-of-the-art MAPF research, who said that my project looks cool :>

Built With

astro-framework
bevy
rust
webassembly

Updates

Teddy Truong posted an update — Apr 12, 2026 05:25 PM EDT

Website not up to date

The tool I developed is evolving much faster than I expected. I have so many ideas and reliability checks to add that my website has fallen behind the actual simulator. So if you see differences between the docs on the website and the tool, I'm aware, and I'll update the site later! :>

Thanks to everyone who is using my tool. It would be motivating if you gave my repo a star!

Best regards, Teddy


Teddy Truong started this project — Apr 12, 2026 04:49 PM EDT

