MineSweeper

Yuchen Zhou posted an update — Dec 08, 2020 10:59 PM EST

Final Reflection:

How do you feel your project ultimately turned out and how did you do relative to your base/target/stretch goals? We reported a 62% winning rate after approximately 20,000 iterations with a batch size of 256. Compared to the state-of-the-art winning rate of 90.2% achieved by deep Q-learning, our winning rate is fairly low. However, that Q-learning method was trained for 440,000 update iterations with a batch size of 400. Additionally, we found that the winning rate of our model kept increasing with more epochs of training. We believe that decreasing the learning rate while training for more epochs could increase the final winning rate. Given the time constraints, we think a 62% winning rate is good enough for this final project.

Did your model work out the way you expected it to? We set a baseline benchmark of a 5% winning rate because that is the winning rate of an SVM-based supervised learning algorithm. Our 62% winning rate is well above this baseline. We also observed a decrease in loss and a corresponding increase in accuracy.

How did your approach change over time? What kind of pivots did you make, if any? What would you have done differently if you could do your project over again? We did change the methodology. We originally decided to train our model with reinforcement learning. After reading the paper EVOLUTION STRATEGIES AND REINFORCEMENT LEARNING FOR A MINESWEEPER AGENT, which also reports the state-of-the-art winning rate, we gave up on reinforcement learning because its authors already have a better training strategy for deep Q-learning and policy gradient. Our backup plan, training a supervised learning model, then became the major component of this project. We further observed that Minesweeper is a game in which a locally optimal choice in the early stage does not lead to a bad state later in the game. This means supervised learning fits Minesweeper well. If we could do this project again, we would take the same approach.

What do you think you can further improve on if you had more time? One thing we could definitely do is train for more epochs and lower the learning rate as the loss converges. We could also run other experiments, for example testing whether our model is affected by the size of the board and whether our extra feature encoding affects the final winning rate.
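
As a rough illustration of lowering the learning rate as the loss converges, here is a minimal sketch of a reduce-on-plateau schedule in plain Python. It is not the code from our project: the class name `ReduceOnPlateau`, the parameters `factor`, `patience`, `min_delta`, and `min_lr`, and the loss values are all hypothetical and chosen only for the example.

```python
class ReduceOnPlateau:
    """Hypothetical helper: shrink the learning rate when the loss stops improving."""

    def __init__(self, lr, factor=0.5, patience=2, min_delta=1e-3, min_lr=1e-6):
        self.lr = lr                # current learning rate
        self.factor = factor        # multiplier applied on a plateau
        self.patience = patience    # epochs without improvement before reducing
        self.min_delta = min_delta  # smallest drop in loss that counts as improvement
        self.min_lr = min_lr        # lower bound on the learning rate
        self.best = float("inf")    # best loss seen so far
        self.stale = 0              # consecutive epochs without improvement

    def step(self, loss):
        """Record one epoch's loss and return the learning rate for the next epoch."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.stale = 0
        return self.lr


# Made-up per-epoch losses, only to show when the rate drops.
scheduler = ReduceOnPlateau(lr=1e-3)
losses = [0.90, 0.70, 0.60, 0.60, 0.60, 0.59, 0.59, 0.59]
lrs = [scheduler.step(loss) for loss in losses]
print(lrs)
```

With these made-up losses the rate stays at its initial value while the loss is still falling and is halved each time two consecutive epochs bring no meaningful improvement. Deep learning frameworks usually ship an equivalent scheduler, which would be the more natural choice in a real training loop.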

What are your biggest takeaways from this project/what did you learn? Having read several papers, we became familiar with Minesweeper as a research topic and with the work others have done on using AI to play Minesweeper.
