💡 Inspiration

The inspiration for convLSTM-battlesnake came from the realization that standard Battlesnake bots often play too "instinctively," making decisions based only on the current frame. We wanted to build a snake with a temporal memory, one that could "see" the history of its opponents' trails to predict where they are heading, not just where they are.

🐍 What it does

convLSTM-battlesnake is a Deep Reinforcement Learning agent that competes in 4-snake multiplayer matches. It doesn't just avoid walls; it learns high-level tactical behaviors like "coiling" to protect territory, "cutoff" maneuvers to trap opponents, and health-based aggression. A safety-first heuristic layer guards against basic mistakes such as hitting a wall, while the neural network handles the complex strategy.
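
One way to picture the split between the safety layer and the network is a move filter applied on top of the policy's scores. The snippet below is a minimal sketch under assumptions, not the project's actual code: `safe_moves`, `choose_move`, and the `scores` dictionary (standing in for the network's per-move logits) are hypothetical names, and the filter only checks walls and occupied cells.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]

# Battlesnake API v1 convention: (0, 0) is the bottom-left cell, so "up" is +y.
MOVES: Dict[str, Cell] = {
    "up": (0, 1),
    "down": (0, -1),
    "left": (-1, 0),
    "right": (1, 0),
}


def safe_moves(head: Cell, occupied: Set[Cell], width: int, height: int) -> Set[str]:
    """Return the moves that stay on the board and avoid occupied cells.

    Simplification (assumption): every body segment counts as occupied,
    even tail cells that would move away on the next turn.
    """
    safe = set()
    for name, (dx, dy) in MOVES.items():
        x, y = head[0] + dx, head[1] + dy
        if 0 <= x < width and 0 <= y < height and (x, y) not in occupied:
            safe.add(name)
    return safe


def choose_move(scores: Dict[str, float], head: Cell, occupied: Set[Cell],
                width: int, height: int) -> str:
    """Pick the highest-scoring move among the safe ones.

    `scores` is a hypothetical stand-in for the policy network's output.
    If no move is safe, fall back to the network's top choice overall.
    """
    candidates = safe_moves(head, occupied, width, height) or set(MOVES)
    return max(candidates, key=lambda move: scores[move])
```

With this shape, the network never needs to learn that walls are fatal; it only ranks the moves the filter leaves available.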

πŸ› οΈ How we built it

  • Framework: Built with PyTorch and FastAPI.
  • Model: A custom architecture featuring a ConvLSTM backbone for temporal feature extraction and Multi-Head Self-Attention for cross-snake interaction.
  • Compute: Optimized for the RTX 4060 locally, but scaled to 8x NVIDIA A100 GPUs on the Wulver Supercomputer cluster.
  • Algorithm: Used Proximal Policy Optimization (PPO) with a custom curriculum designed for high-density combat.

🚧 Challenges we ran into

  • Distributed Scaling: We faced significant hurdles with Distributed Data Parallel (DDP) orchestration, specifically solving rendezvous timeouts between compute nodes and the login node.
  • Out-of-Bounds Edge Cases: We spent critical time debugging a rare IndexError that only appeared during high-intensity 4-snake collisions on the cluster. We solved this with robust state clipping in the model's head-feature extraction.
  • The Clock: Designing and training a 500-million-step model with matches only an hour away forced us to pivot to a Tournament Sprint curriculum, jumping straight into combat training.

🏆 Accomplishments that we're proud of

  • 8-GPU Parallelism: Successfully scaling training across 2 nodes (8 GPUs) to reach a throughput of over 4 million steps per minute.
  • Stability: Fixing a critical crash-on-death bug 15 minutes before the tournament deadline.
  • Unified Pipeline: Creating a standardized setup that works seamlessly across local hardware and supercomputing clusters.

📖 What we learned

We learned the intricacies of multi-node distributed training, the importance of differentiable state encoding, and how to effectively "force-curriculum" a model when time is the primary constraint. We also gained deep experience in debugging remote SLURM environments.

🚀 What's next for convLSTM

  • Royale Mode Mastery: Further tuning for the 1000+ turn "Royale" endgame where the board shrinks.
  • Territory Heuristics: Deepening the reward function to prioritize area control even when enemies are far away.
  • Long-Term Evolution: Running the full 500-million-step curriculum over multiple days to see the limits of its strategic depth.
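
For the territory idea in the list above, one common way to quantify area control on a grid is to count the cells each snake can reach strictly before any opponent, using a multi-source breadth-first search. The sketch below is an illustration under assumptions, not the project's reward function: `territory`, `CONTESTED`, and the input layout are hypothetical, and ties between snakes are treated as contested regardless of snake length.

```python
from collections import deque
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]
CONTESTED = "__contested__"  # hypothetical marker for cells reached in a tie


def territory(heads: Dict[str, Cell], blocked: Set[Cell],
              width: int, height: int) -> Dict[str, int]:
    """Count the free cells each snake reaches strictly before any other snake."""
    owner: Dict[Cell, str] = {}
    dist: Dict[Cell, int] = {}
    queue = deque()
    # Seed the search with every head at distance 0.
    for snake_id, head in heads.items():
        owner[head] = snake_id
        dist[head] = 0
        queue.append(head)

    while queue:
        x, y = queue.popleft()
        here = (x, y)
        for dx, dy in ((0, 1), (0, -1), (-1, 0), (1, 0)):
            nxt = (x + dx, y + dy)
            on_board = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if not on_board or nxt in blocked:
                continue
            if nxt not in dist:
                # First arrival claims the cell for the current owner.
                dist[nxt] = dist[here] + 1
                owner[nxt] = owner[here]
                queue.append(nxt)
            elif dist[nxt] == dist[here] + 1 and owner[nxt] != owner[here]:
                # A different owner arrives at the same distance: a tie.
                owner[nxt] = CONTESTED

    counts = {snake_id: 0 for snake_id in heads}
    for cell, who in owner.items():
        if who != CONTESTED and dist[cell] > 0:  # skip the head cells themselves
            counts[who] += 1
    return counts
```

A reward term could then compare a snake's count against its opponents' counts each turn, which gives a signal even when no enemy is nearby.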
