See the second checkpoint here: https://docs.google.com/document/d/17suAialptR3Np_P_x1qedufz3L55FbiSIPRoqJhaykw/edit

See the full version here: https://docs.google.com/document/d/1tSpgPeaGS4yO1oC5Gpsq2IhGqi6hxR6hseq_ZHJbxDk/edit?usp=sharing

Advanced Tetris Bot

by Tea Spin: Naphat Permpredanun (npermpre), Treetased Vividhwara (tvividhw)

1. Introduction Since research on reinforcement learning began to accelerate, bots have been written for many Atari and puzzle games, and Tetris is one of them. Tetris is a game with very simple rules: you can only move the pieces in specific ways, the game is over if your pieces reach the top of the screen, and you can only remove pieces from the screen by filling all the blank spaces in a line. With these simple, finite rules, several programmers have written Tetris bots (such as this GitHub repository: https://github.com/nuno-faria/tetris-ai). However, we found another game similar to Tetris called “Jstris”. Jstris combines the Tetris rules with more complex ones. For example, each time a player clears a certain number of lines, they can send lines (called “garbage”) to other players to increase their chance of losing, and a special move called a “T-spin” creates more garbage for the same number of lines cleared. You can read the other rules here: https://jstris.jezevec10.com/guide#overview. With these additional rules, our goal is to build a bot that can play this game efficiently, which we define as the capability to perform T-spins while continuing to play with reasonable moves.

1.1. Related Work Even though Tetris bots have received more attention in the game development field, there are some papers on this more complex problem. For example, the paper Playing Tetris with Deep Reinforcement Learning [2] introduced the idea of using deep Q-learning for this task instead of the REINFORCE algorithm we studied in class. Another resource is the Medium article Reinforcement Learning on Tetris [1], which outlines several ways to tackle the problem. Finally, we found the open-source code in this GitHub repository: https://github.com/zeroize318/tetris_ai, and we adapted some of its structure, as we will mention later.

2. Methodology The paper and the Medium article suggest two different architectures for the learning process: deep Q-learning, and deep Q-learning with two Conv2D layers. As the Medium article notes, “The major issue for Q-learning to apply on Tetris is that many steps will be trivial. The action {a} for the game includes {left, right, down, turn left, turn right, hold, soft drop and hard drop}, but it is not our priority to teach the AI learn how to move a piece. When a human is playing Tetris, it is trivial to consider how to move a piece to a certain location; instead, we consider to where we should move it.” [1] Consequently, following the article's suggestion, we implemented only a deep Q-learning algorithm with Conv2D, treating each legal final placement of the current piece as an action rather than each keystroke.
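To make the architecture concrete, here is a minimal sketch of the kind of value network we mean, assuming a PyTorch implementation; the layer sizes and names are illustrative rather than the exact architecture of the open-source code.

```python
# Minimal sketch of a Conv2D Q-network for Tetris; layer sizes are illustrative.
import torch
import torch.nn as nn

class TetrisQNetwork(nn.Module):
    def __init__(self, board_height=20, board_width=10, n_features=16):
        super().__init__()
        # Conv2D branch reads the 0/1 board grid.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
        )
        conv_out = 64 * board_height * board_width
        # Fully connected head combines the conv output with the 1D feature
        # array (column heights, holes, hold/current/next tetrominoes, ...).
        self.head = nn.Sequential(
            nn.Linear(conv_out + n_features, 128),
            nn.ReLU(),
            nn.Linear(128, 1),  # one Q-value for the resulting board state
        )

    def forward(self, board, features):
        # board: (batch, height, width) of 0/1; features: (batch, n_features)
        x = self.conv(board.unsqueeze(1).float())
        return self.head(torch.cat([x, features.float()], dim=1))
```

Under this placement-as-action view, at each step the bot enumerates every legal final placement of the current piece, scores each resulting board with the network, and picks the highest-valued placement.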

However, we found that the open-source code does not train the model with a focus on the T-spin. Therefore, our group decided to split the work into three models based on the training process: training with a purely random piece generator, training with a purely random generator and a focus on T-spins, and training with an adapted randomizer (discussed later) and a focus on T-spins.

2.1. Data The state used in the model paper consists of two arrays describing the current game status. The first, a 2D array with the dimensions of the board (height × width), represents the board our bot is playing on; it is filled with 0s and 1s, where 0 represents a blank cell and 1 represents a filled cell. The second, a 1D array, represents the properties of the board: the height of each column, the number of holes in each column, a flag indicating whether holding is allowed, the tetromino on hold, the tetromino the bot is currently playing, and all upcoming tetrominoes.
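As an illustration, the following sketch shows how these two arrays could be assembled; the helper names and the exact feature ordering are our own for this example, not the paper's code.

```python
# Sketch of the two-array state representation; helper names are illustrative.
import numpy as np

def column_heights(board):
    """Height of each column: distance from the topmost filled cell to the floor."""
    h, w = board.shape
    heights = np.zeros(w, dtype=int)
    for col in range(w):
        filled = np.nonzero(board[:, col])[0]
        heights[col] = h - filled[0] if filled.size else 0
    return heights

def column_holes(board, heights):
    """Holes per column: empty cells with at least one filled cell above them."""
    h, w = board.shape
    holes = np.zeros(w, dtype=int)
    for col in range(w):
        if heights[col] > 0:
            holes[col] = int(np.sum(board[h - heights[col]:, col] == 0))
    return holes

def make_state(board, hold_allowed, held_piece, current_piece, next_pieces):
    """board is a (height, width) array of 0/1; pieces are integer ids 0-6."""
    heights = column_heights(board)
    holes = column_holes(board, heights)
    features = np.concatenate([
        heights,
        holes,
        [int(hold_allowed), held_piece, current_piece],
        next_pieces,
    ])
    return board, features
```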

2.2. Metrics To assess model efficiency, we evaluate the points per piece in a game using the 7-bag randomizer. The 7-bag randomizer is a type of randomizer in Tetris games: it generates the next pieces in bags, where each bag consists of the 7 different Tetris pieces (S, Z, T, I, O, L, J) in a random order. The points used to evaluate the model are based on the number of lines cleared at a time (clearing 4 lines at once scores higher than clearing 1 line 4 times), consecutive line clears (clearing lines back to back gives more points), and the nature of the clear (a T-spin scores more points than a normal clear). A move counts as a T-spin if the T piece is placed with a rotation as its final move and its final position is locked in by at least 3 surrounding blocks.
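For reference, a 7-bag generator takes only a few lines; this is a minimal sketch rather than the exact code we plugged into the open-source project.

```python
# Minimal 7-bag randomizer: shuffle one copy of each of the 7 tetrominoes,
# deal them out, then refill and reshuffle the bag.
import random

PIECES = ["S", "Z", "T", "I", "O", "L", "J"]

class SevenBagRandomizer:
    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.bag = []

    def next_piece(self):
        if not self.bag:
            self.bag = PIECES.copy()
            self.rng.shuffle(self.bag)
        return self.bag.pop()
```

Note how this makes consecutive pieces dependent on each other: every run of 7 pieces contains each tetromino exactly once, unlike a completely random generator.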

The original model scores around 9 points per piece in a completely random game; we aim for a similar or higher points-per-piece result in our 7-bag game.

Since the focus of the model is executing T-spins, another measurement is the number of T-spins the bot executes over a given number of pieces. These can be counted separately by the number of lines each T-spin clears (0, 1, 2, or 3 lines).
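The T-spin check described in 2.2 can be sketched as follows; the function name, and the assumption that walls and the floor count as filled corners, are our own simplifications for this example.

```python
# Sketch of a three-corner T-spin check: the piece's last move must be a
# rotation, and at least 3 of the 4 cells diagonally adjacent to the centre
# of the locked T piece must be filled. Out-of-bounds cells count as filled.
def is_tspin(board, row, col, last_move_was_rotation):
    if not last_move_was_rotation:
        return False
    h, w = len(board), len(board[0])
    corners = [(row - 1, col - 1), (row - 1, col + 1),
               (row + 1, col - 1), (row + 1, col + 1)]
    filled = 0
    for r, c in corners:
        if r < 0 or r >= h or c < 0 or c >= w or board[r][c]:
            filled += 1
    return filled >= 3
```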

3. Challenges Our main challenge after reimplementing the paper was the randomizer in the open-source code. Its tetromino randomizer generates pieces completely at random using Python's random library, whereas the Jstris guide specifies the 7-bag randomizer as the default. Therefore, we had to reimplement the randomizer and the other functions involved in retrieving the next tetrominoes, such as class parameters and the function that produces a new tetromino. This was sometimes complicated because the next pieces change from being independent of each other to being dependent on each other.

Another challenge is some impractical play from the bot in Jstris. We found that when our modified reward function rewarded the chance of a T-spin, the bot would build an extremely tall stack, which could easily be disrupted in real gameplay.
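To illustrate the kind of reward shaping involved, here is a sketch with made-up weights; the function and its constants are our own for this example, not the values in our code.

```python
# Illustrative shaped reward: line clears, combos, and a T-spin bonus as in
# section 2.2. All weights here are invented for the example.
def shaped_reward(lines_cleared, was_tspin, combo, max_height):
    reward = {0: 0, 1: 40, 2: 100, 3: 300, 4: 1200}[lines_cleared]
    if was_tspin:
        reward += 400 * (lines_cleared + 1)  # T-spins out-score normal clears
    reward += 50 * combo                     # consecutive clears score more
    # Without a height term (e.g. reward -= 2 * max_height), rewarding T-spin
    # setups can push the bot toward the extremely tall stacks we observed.
    return reward
```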

4. Results Our best-performing bot scores 13.56 points per piece after pre-training for 26 epochs. We measured points per piece and the number of T-spins on the three bots described above. You can see a sample of the training results here: https://youtu.be/ABGmrZNNI5Q

5. Reflection and Discussion

Our team had two main expectations: the capability to make a special move such as the T-spin, and the capability to optimize the score based on the reward. Our model achieves both goals after pre-training for 26 epochs and training in the environment for 10 epochs: it earns around 13 points per piece after 1000 pieces are placed, exceeding the given open-source code's 9-10 points per piece. Also, by observing and counting, we detected consistent T-spins from the model trained on the 7-bag randomizer with a focus on T-spins: it performed 116 T-spins, while the expected number of T tetrominoes in 1000 pieces is around 1000/7 ≈ 143, giving roughly an 81% T-spin rate, which exceeds our expectation of around 70%.

On the way to this 81% rate and 13 points per piece, we made two major changes. The first was the choice of model. At first, we planned to apply both deep Q-learning and REINFORCE with Conv2D. However, we ran into complications in defining the loss function for REINFORCE, so we decided to drop it. The other concern was the training process. Before we discovered the randomizer problem, we trained the model and obtained one practical model; however, this model performed poorly on the 7-bag randomizer for reasons we could not explain. Therefore, we set up the three models described in the methodology and trained each with its own randomizer.

If we had more time, we would pursue two further improvements. First, we noticed that our randomization is not entirely correct, because some functions from both the paper and the open-source code are built around the completely random randomizer, so over a long game some inaccuracies in the randomization accumulate. We could further improve the randomization to suit Jstris in case the bot has to play a longer game. Second, we could apply the REINFORCE algorithm with Conv2D instead of DQN, since it makes the exploration-exploitation trade-off easier to handle, and we would consider doing so given more time.

In this project, we learned how complex reinforcement learning (RL) can be, especially when the actions are not binary, in contrast to Homework 6, where we tackled the CartPole game with only two choices (left or right). We also learned that RL and CNNs can be combined to tackle game problems. Our biggest takeaway is that interesting deep learning problems can arise from combining ideas, as in this Jstris project, where we combined RL and a CNN.

6. Code Repository Our codebase is accessible on GitHub at: https://github.com/NaphatPRM/advanced_tetris_bot

7. References
[1] L, R. (2021, October 5). Reinforcement Learning on Tetris. Medium. Retrieved December 9, 2021, from https://medium.com/mlearning-ai/reinforcement-learning-on-tetris-707f75716c37
[2] Stevens, M., & Pradhan, S. (n.d.). Playing Tetris with Deep Reinforcement Learning. Retrieved December 9, 2021, from http://cs231n.stanford.edu/reports/2016/pdfs/121_Report.pdf


Updates


Introduction: Our group is trying to find an optimization strategy for playing the game Jstris. This game is an adaptation of Tetris with additional rules, such as the ability to attack other players and techniques such as the T-spin, following this overview: https://jstris.jezevec10.com/guide#overview. This problem relates to reinforcement learning since we are interested in making the machine learn from the rewards of the scores it can earn rather than from specific labels. We may also explore options such as switching from Deep Q-Learning, which the research implements, to REINFORCE, which we studied in class.

Challenges: What has been the hardest part of the project you've encountered so far? The hardest part so far has been adding new metrics for calculating our scores. For example, we have to include a scoring system that coordinates with keyboard movements, such as the T-spin. We are currently considering spotting T-spins by comparing the current state and the next state to see whether a T tetromino was placed in a more complex way than a simple drop. Another problem is which kind of T-spin the agent played, since each yields a different score (based on https://n3twork.zendesk.com/hc/en-us/articles/360046263052-Scoring). In our game, we may simplify this to one score for all T-spins.

Insights: Are there any concrete results you can show at this point? So far, we have results for the traditional way of playing Tetris, but we still cannot produce a promising output for the T-spin action. How is your model performing compared with expectations? It does not work as we expected yet, since we have not been able to incorporate the T-spin into the reward model.

Plan: Are you on track with your project? What do you need to dedicate more time to? We think the main priorities are writing the new reward function that accounts for T-spins and detecting them. What are you thinking of changing, if anything? We may choose between DQN and REINFORCE, and we may simplify the T-spin score for an easier implementation.
