convLSTM

💡 Inspiration

The inspiration for convLSTM-battlesnake came from the realization that standard Battlesnake bots often play too "instinctively," making decisions based only on the current frame. We wanted to build a snake with temporal memory: one that could "see" the history of its opponents' trails to predict where they are heading, not just where they are.

🐍 What it does

It is a deep reinforcement learning agent that competes in 4-snake multiplayer matches. Beyond avoiding walls, it learns high-level tactical behaviors such as "coiling" to protect territory, "cutoff" maneuvers to trap opponents, and health-based aggression. A safety-first heuristic layer screens out basic mistakes, such as moving into a wall, while the neural network handles the complex strategy.
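The safety-first heuristic layer could be implemented in many ways; what follows is a minimal sketch of one possibility, not the project's actual code. It assumes (hypothetically) a grid with (0, 0) in the bottom-left corner, snakes given as head-first lists of (x, y) segments, and a network that outputs one score per move. The names `MOVES`, `safe_moves`, and `choose_move` are illustrative. The filter discards moves that leave the board or hit a body, then lets the network choose among what remains.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]

# Hypothetical move encoding: Battlesnake's four moves mapped to (dx, dy),
# with y increasing upward.
MOVES: Dict[str, Point] = {
    "up": (0, 1),
    "down": (0, -1),
    "left": (-1, 0),
    "right": (1, 0),
}


def safe_moves(head: Point, width: int, height: int,
               snakes: List[List[Point]]) -> List[str]:
    """Return the moves that stay on the board and avoid every snake body."""
    # Simplification: each snake's last segment is treated as free because the
    # tail normally moves away next turn (this is wrong right after eating).
    occupied = {seg for body in snakes for seg in body[:-1]}
    result = []
    for move, (dx, dy) in MOVES.items():
        nx, ny = head[0] + dx, head[1] + dy
        if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in occupied:
            result.append(move)
    return result


def choose_move(logits: Dict[str, float], head: Point, width: int,
                height: int, snakes: List[List[Point]]) -> str:
    """Pick the network's highest-scoring move among the safe ones."""
    candidates = safe_moves(head, width, height, snakes)
    if not candidates:
        # Every move is fatal: defer to the network rather than return nothing.
        candidates = list(MOVES)
    return max(candidates, key=lambda m: logits[m])


if __name__ == "__main__":
    me = [(0, 0), (1, 0), (2, 0)]  # head in the bottom-left corner
    logits = {"up": 0.5, "down": 0.1, "left": 2.0, "right": 1.5}
    print(choose_move(logits, me[0], 11, 11, [me]))  # -> up
```

Here the network prefers "left", but that move leaves the board and "right" runs into the snake's own body, so the filter falls back to "up". Because the layer only removes options, the network's strategic preferences still decide the move whenever more than one safe move exists.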

🛠️ How we built it

Framework: Built with PyTorch and FastAPI.
Model: A custom architecture featuring a ConvLSTM backbone for temporal feature extraction and Multi-Head Self-Attention for cross-snake interaction.
Compute: Optimized for the RTX 4060 locally, but scaled to 8x NVIDIA A100 GPUs on the Wulver Supercomputer cluster.
Algorithm: Used Proximal Policy Optimization (PPO) with a custom curriculum designed for high-density combat.

🚧 Challenges we ran into

Distributed Scaling: Orchestrating Distributed Data Parallel (DDP) training was a major hurdle, especially resolving rendezvous timeouts between the compute nodes and the login node.
Out-of-Bounds Edge Cases: We spent critical time debugging a rare IndexError that appeared only during high-intensity 4-snake collisions on the cluster. We fixed it with robust state clipping in the model's head-feature extraction.
The Clock: With matches only an hour away, there was no time to design and train a full 500-million-step model, so we pivoted to a "Tournament Sprint" curriculum that jumped straight into combat training.

🏆 Accomplishments that we're proud of

8-GPU Parallelism: Successfully scaling training across 2 nodes (8 GPUs) to reach a throughput of over 4 million steps per minute.
Stability: Fixing a critical crash-on-death bug 15 minutes before the tournament deadline.
Unified Pipeline: Creating a standardized setup that works seamlessly across local hardware and supercomputing clusters.

📖 What we learned

We learned the intricacies of multi-node distributed training, the importance of differentiable state encoding, and how to effectively "force-curriculum" a model when time is the primary constraint. We also gained deep experience in debugging remote SLURM environments.

🚀 What's next for convLSTM

Royale Mode Mastery: Further tuning for the 1000+ turn "Royale" endgame where the board shrinks.
Territory Heuristics: Deepening the reward function to prioritize area control even when enemies are far away.
Long-Term Evolution: Running the full 500-million-step curriculum over multiple days to see the limits of its strategic depth.
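One common way to quantify the area control mentioned under Territory Heuristics is a Voronoi-style partition: run a breadth-first search from every snake head at once and credit each reachable cell to the snake that gets there strictly first. The sketch is a hypothetical illustration of such a reward term, not part of the current agent; the function names, the `blocked` set of body cells, and the `scale` coefficient are all assumptions.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int]
CONTESTED = -1  # marker for cells two or more snakes reach at the same distance


def voronoi_territory(width: int, height: int, heads: List[Point],
                      blocked: Set[Point]) -> List[int]:
    """Count the cells each snake reaches strictly first (multi-source BFS)."""
    dist: Dict[Point, int] = {}
    owner: Dict[Point, int] = {}
    queue = deque()
    for i, head in enumerate(heads):
        dist[head] = 0
        owner[head] = i
        queue.append(head)
    while queue:
        cell = queue.popleft()
        x, y = cell
        for dx, dy in ((0, 1), (0, -1), (-1, 0), (1, 0)):
            nxt = (x + dx, y + dy)
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height) or nxt in blocked:
                continue
            d = dist[cell] + 1
            if nxt not in dist:
                # First visit: whoever owns `cell` (or a tie) claims `nxt`.
                dist[nxt] = d
                owner[nxt] = owner[cell]
                queue.append(nxt)
            elif dist[nxt] == d and owner[nxt] != owner[cell]:
                # Reached at the same distance from a different source: a tie.
                owner[nxt] = CONTESTED
    counts = [0] * len(heads)
    for o in owner.values():
        if o != CONTESTED:
            counts[o] += 1
    return counts


def territory_bonus(counts: List[int], me: int, scale: float = 0.01) -> float:
    """Hypothetical shaping term: `scale` times my share of claimed cells."""
    total = sum(counts)
    return scale * counts[me] / total if total else 0.0


if __name__ == "__main__":
    counts = voronoi_territory(5, 1, [(0, 0), (3, 0)], blocked=set())
    print(counts)  # -> [2, 3]
    print(territory_bonus(counts, me=1, scale=1.0))  # -> 0.6
```

Because the search covers the whole board, the bonus stays informative even when enemies are far away, which matches the goal stated above. Leaving tied cells unclaimed is one convention among several; splitting them or awarding them to the longer snake are equally plausible choices.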

Built With

pytorch

Updates

giovannettif Giovannetti started this project — Mar 08, 2026 12:58 PM EDT