### Inspiration
The inspiration for convLSTM-battlesnake came from the realization that standard Battlesnake bots often play too "instinctively," making decisions based only on the current frame. We wanted to build a snake with temporal memory: one that could "see" the history of its opponents' trails to predict where they are heading, not just where they are.
### What it does
convLSTM-battlesnake is a deep reinforcement learning agent that competes in 4-snake multiplayer matches. It doesn't just avoid walls; it learns high-level tactical behaviors like "coiling" to protect territory, "cutoff" maneuvers to trap opponents, and health-based aggression. A safety-first heuristic layer ensures it never makes a basic mistake such as hitting a wall, while the neural network handles the more complex strategy.
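
One way to picture a safety-first layer of this kind is a move filter that runs on top of the network's output. The sketch below is an illustration under stated assumptions, not the project's actual code: the names `safe_moves` and `choose_move` are hypothetical, the policy output is assumed to be a plain dictionary of per-move scores, and the coordinate convention follows the Battlesnake API, where `(0, 0)` is the bottom-left cell and "up" increases `y`.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]

# Battlesnake convention: (0, 0) is bottom-left, so "up" increases y.
MOVES: Dict[str, Cell] = {
    "up": (0, 1),
    "down": (0, -1),
    "left": (-1, 0),
    "right": (1, 0),
}


def safe_moves(head: Cell, occupied: Set[Cell], width: int, height: int) -> List[str]:
    """Return the moves that stay on the board and avoid occupied cells."""
    safe = []
    for name, (dx, dy) in MOVES.items():
        x, y = head[0] + dx, head[1] + dy
        in_bounds = 0 <= x < width and 0 <= y < height
        if in_bounds and (x, y) not in occupied:
            safe.append(name)
    return safe


def choose_move(
    policy_scores: Dict[str, float],
    head: Cell,
    occupied: Set[Cell],
    width: int,
    height: int,
) -> str:
    """Pick the highest-scoring move among the safe ones (hypothetical helper)."""
    candidates = safe_moves(head, occupied, width, height)
    if not candidates:
        # No safe move exists: fall back to the network's overall preference.
        candidates = list(MOVES)
    return max(candidates, key=lambda move: policy_scores[move])


if __name__ == "__main__":
    scores = {"up": 0.1, "down": 0.2, "left": 0.6, "right": 0.1}
    # Head in the bottom-left corner with a body segment to the right.
    print(choose_move(scores, (0, 0), {(1, 0)}, 11, 11))
```

This version is deliberately conservative: it treats every body segment as blocked, including tails that may move away next turn, and it does not reason about head-to-head collisions.
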
### How we built it
- Framework: Built with PyTorch and FastAPI.
- Model: A custom architecture featuring a ConvLSTM backbone for temporal feature extraction and Multi-Head Self-Attention for cross-snake interaction.
- Compute: Optimized for the RTX 4060 locally, but scaled to 8x NVIDIA A100 GPUs on the Wulver Supercomputer cluster.
- Algorithm: Used Proximal Policy Optimization (PPO) with a custom curriculum designed for high-density combat.

### Challenges we ran into
- Distributed Scaling: Distributed Data Parallel (DDP) orchestration was a significant hurdle; in particular, we had to resolve rendezvous timeouts between the compute nodes and the login node.
- Out-of-Bounds Edge Cases: We spent critical time debugging a rare `IndexError` that only appeared during high-intensity 4-snake collisions on the cluster. We solved this with robust state clipping in the model's head-feature extraction.
- The Clock: Designing and training a 500-million-step model with matches only an hour away forced us to pivot to a Tournament Sprint curriculum, jumping straight into combat training.

### Accomplishments that we're proud of
- 8-GPU Parallelism: Successfully scaling training across 2 nodes (8 GPUs) to reach a throughput of over 4 million steps per minute.
- Stability: Fixing a critical crash-on-death bug 15 minutes before the tournament deadline.
- Unified Pipeline: Creating a standardized setup that works seamlessly across local hardware and supercomputing clusters.

### What we learned
We learned the intricacies of multi-node distributed training, the importance of differentiable state encoding, and how to effectively "force-curriculum" a model when time is the primary constraint. We also gained deep experience in debugging remote SLURM environments.

### What's next for convLSTM
- Royale Mode Mastery: Further tuning for the 1000+ turn "Royale" endgame where the board shrinks.
- Territory Heuristics: Deepening the reward function to prioritize area control even when enemies are far away.
- Long-Term Evolution: Running the full 500-million-step curriculum over multiple days to see the limits of its strategic depth.
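
As an illustration of what an area-control signal could look like, a common approach in grid games is a Voronoi-style flood fill: each free cell is credited to the snake whose head can reach it first. The sketch below is a generic example of that technique and an assumption on our part, not the reward function used in this project; the name `territory_counts` and its inputs are hypothetical.

```python
from collections import deque
from typing import Dict, Optional, Set, Tuple

Cell = Tuple[int, int]


def territory_counts(
    heads: Dict[str, Cell], blocked: Set[Cell], width: int, height: int
) -> Dict[str, int]:
    """Count the cells each snake reaches strictly before any other snake.

    Runs one breadth-first search from all heads at once. Cells that two
    snakes can reach in the same number of moves are credited to nobody.
    """
    dist: Dict[Cell, int] = {}
    owner: Dict[Cell, Optional[str]] = {}
    queue = deque()
    for snake_id, pos in heads.items():
        dist[pos] = 0
        owner[pos] = snake_id
        queue.append(pos)

    while queue:
        cell = queue.popleft()
        for dx, dy in ((0, 1), (0, -1), (-1, 0), (1, 0)):
            nxt = (cell[0] + dx, cell[1] + dy)
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            if nxt in blocked:
                continue
            if nxt not in dist:
                # First arrival: inherit the owner of the cell we came from.
                dist[nxt] = dist[cell] + 1
                owner[nxt] = owner[cell]
                queue.append(nxt)
            elif dist[nxt] == dist[cell] + 1 and owner[nxt] != owner[cell]:
                # Another snake arrives at the same time: mark as contested.
                owner[nxt] = None

    counts = {snake_id: 0 for snake_id in heads}
    for snake_id in owner.values():
        if snake_id is not None:
            counts[snake_id] += 1
    return counts


if __name__ == "__main__":
    # Two snakes at opposite ends of a 5x1 corridor; the middle cell is contested.
    print(territory_counts({"a": (0, 0), "b": (4, 0)}, set(), 5, 1))
```

A reward term could then compare a snake's count against its opponents' counts each turn, which gives a signal even when enemies are far away. How to weight such a term against survival and food rewards is left open here.
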