TauZero

As a team, we've always loved playing adversarial games like poker and chess that have complex but interesting optimal strategies. We were naturally inspired by Google's AlphaZero AI, not only because it beat the world's top chess-playing algorithms like stockfish, but also because it played such far-sighted, positional moves that had strategic consequences long after the move was played. AlphaZero displayed a true strategic 'understanding' of the game that is reteaching us as humans how to think about this ancient game. And we wanted to build a chess AI like it. A chess AI requires two main parts: a valuator, which tells you how good a certain board state is, and a searcher, which helps search for the best board state using that valuator.

The first iteration of TauZero used a simple search algorithm (minimax) that brute forced the search. And for the valuator we hard coded a rule-based system to value certain board states (eg. a knight on your side contributes +3 to a total, while a pawn on your opponent's side contributes -1 to the total board value).

We quickly discovered two things:

(1) we needed a more accurate valuator, because the agent wasn't taking into account 'positional advantages' in chess (i.e. are my pieces on better squares than yours?), and positional threats (i.e. am I about to move a piece to a square that will make your game significantly harder to play in the future). In order to correct the issues caused by a purely heuristic approach, we needed a model that was capable of encapsulating a complex understanding of how pieces interact on the board. To do this, we used a convolutional neural net implemented by extending nn.Module and using PyTorch's object oriented API. The CNN is an optimal choice for our model given that convolutional filters can capture spatial relationships and thus take into account the interaction between pieces located at differing points on the board. By training the model using PyTorch's implementation of autograd to execute supervised learning on Stockfish board evaluations, our engine began playing more conventional openings and setups in chess theory and in human games in general. Overall an improvement.

(2) We needed to speed up our game-tree search as the current method allowed for a very limited search depth.We looked to decrease the number of possibilities our agent had to consider (alpha-beta pruning), as well as some other "position probing" techniques where we'd only fully explore possibilities we previously deemed promising, saving computational time.

And finally, with our experience, we implemented a drastically different algorithm--the same algorithm in the AlphaZero paper, with a few modifications. This was probably the most difficult part of the project, as we read through difficult academic literature and coded up complex data structures in order to store what we needed to run the algorithm. But in the end we recreated AlphaZero, excepting that we pre-trained the network before letting it perform self play, in order for it to converge faster. In order to carry out move selection, we used PyTorch for the entirety of the training framework and move selection algorithm. The the architecture is essentially a deep residual convolutional neural net that was trained using backpropagation on the cross-entropy between predicted prior probabilities and posterior probabilities improved using Monte-Carlo Tree Search. In order to implement this algorithm, we heavily relied on Torch's utilities that allow us to calculate and retain gradients through unconventional sampling routines, which would not have been easy using a less flexible machine learning framework. Furthermore, Torch's easy GPU acceleration that allowed us to quickly transfer tensors between the CPU and GPU helped us vectorize our code and make the most out of limited computational resources.

Our next steps for this project will be to rewrite everything in C++ using PyTorch's ATen library, where we can take advantage of multithreading and feasibly run multiple workers on a threaded game tree data structure so as to search the state space rapidly and enable faster training.

Overall it was so interesting being able to create algorithms in a space we understood from a non-RL perspective so well, and to leverage that knowledge to build something tangible.

Built With

Updates

Collin Wang started this project — Aug 20, 2020 10:13 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.