We train a Minesweeper solver in a supervised learning fashion with DNN. Please check the following links for videos, posters, and writeups.

Oral presentation https://www.youtube.com/watch?v=relWi1_cGoA

Final report: https://docs.google.com/document/d/1gVKpW4PxELbyKKCYk0zPtbkX_A8HGNW_XJjSpgQwC0s/edit

Digital poster: https://docs.google.com/presentation/d/15-u-bcvtFNMAGkFawJOBygf9jK9Q1weO/edit#slide=id.p1

Github: https://github.com/zhou671/MineSweeper

You can see all the reflections on the post we have updated recently.

Built With

python
tensorflow

Updates

Yuchen Zhou posted an update — Dec 08, 2020 10:59 PM EST

Final Reflection:

How do you feel your project ultimately turned out and how did you do relative to your base/target/stretch goals? We reported a 62% winning rate after approximately 20,000 iterations across a batch size of 256. Comparing to a state of art winning rate of 90.2% of a deep q-learning, our winning rate is fairly low. However, this q-learning method is trained with 440,000 iterations updates and a batch size of 400. Additionally, we found that the winning rate of our model keeps increasing for more epoch of training. We believe that decreasing the learning rate during the training for more epochs could increase the final winning rate. Due to the limitation of the time, we think a 62% winning rate is good enough for this final project.

Did your model work out the way you expected it to? We set up a base benchmark of a 5% winning rate because it is a winning rate of an SVM supervised training algorithm. We also observed a decrease in loss and a corresponding increase in accuracy rate.

How did your approach change over time? What kind of pivots did you make, if any? would you have done differently if you could do your project over again? We did change the methodology. We decided to train our model in a Reinforcement Learning way. After reading the paper EVOLUTION STRATEGIES AND REINFORCEMENT LEARNING FOR A MINESWEEPER AGENT, which also has the state-of-art winning rate, we gave up reinforcement learning because they have a better training strategy on deep q-learning and policy gradient. However, our backup plan: training a supervised learning model became the major component of this project. We further observed that Minesweeper is a game that a local optimum in the early stage of the game would not lead to a bad state of the game. This means supervised learning fits to minesweeper well. If we could do this project again, we would do the same thing again.

What do you think you can further improve on if you had more time? One thing we could do is definitely train more epochs and lower the learning rate as the loss converges. Additionally, we could also try other experiments. For example, whether our model is affected by the size of the board, whether our extra feature encoding affects the final winning rate.

What are your biggest takeaways from this project/what did you learn? Since we have read several papers, we are getting familiar with the topics of Minesweeper, and work has been done for other people using AI to play Minesweeper.

Log in or sign up for Devpost to join the conversation.

Yuchen Zhou posted an update — Nov 23, 2020 03:59 PM EST

Introduction

We still want to make a minesweeper solver. However, after reading several papers we decide to change our benchmark. We will include them as related works later. Most of the trainings and testings are done on 4x4, 5x5, 6x6 grids and find the winning rate versus training iteration. We have seen some winning rates of over 90%. Our 32x32 grids might be too huge for an agent to learn anything, but we would still try it. We also would like to give an introduction of what is a minesweeper, more importantly, it has been found that determine whether a given mine configuration satisfies the board constraints is NP-complete. Enumerating all possible situations belongs to #p-complete class problem. This minesweeper solver is trying to predict the tile of gird that has the lowest possibility to be a mine when uncovered. We try to design a policy network and a value network with CNN layers. We would also try to use a pure convolutional network as a benchmark for prediction.

Challenges

We are reading papers to make sure our idea hasn't been implemented by others. It is not challenging but a little bit time-consuming.

Insights

We are still designing the model. We plan to have residual connections and batch normalization layers. But we might need to change our model once we find someone has implemented such a model.

Plan

We are spending time reading papers right now, and it might take several more days. We plan to set up the game environment recently, we expect these two jobs to finish at the same time. We might spend more time tuning our model. We just changed our benchmark. And our goal right now is just achieving a higher winning rate instead of getting a high score.

Log in or sign up for Devpost to join the conversation.

Yuchen Zhou started this project — Nov 12, 2020 04:38 PM EST

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.