Two Agent Q Learning (angel problem)

Screenshot from during training

Inspiration

Last week I was watching the AlphaGo documentary about how the DeepMind team used machine learning to beat the world champion Go player. I really liked the idea of training an AI to play a game by just having it play hundreds of thousands or millions of games. I also watched a video about the solutions to the angel problem (https://en.wikipedia.org/wiki/Angel_problem), and it made me curious to see if I could mash the two together and have the AI figure out the solution on its own.

What it does

The game is fairly simple and takes place on a grid with an "angel" (the mouse) spawned randomly near the middle. On each turn, the "trapper" (the devil player) places one wall, and then the mouse moves one square in any of the cardinal directions. If the mouse reaches the outer edge of the board, it wins; if it runs into a wall, it loses. The two agents play a ton of games against each other and use Q-learning to create strategies and improve over time.
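The turn order above can be sketched in a few lines. This is only an illustration of the rules as described, not the project's actual game manager: the names `play_turn`, `MOVES`, and the outcome strings are made up for this example, and it assumes the mouse starts strictly inside the board. Edge cases the write-up doesn't cover (for example, a wall placed on the mouse's own square) are left out.

```python
import numpy as np

EMPTY, WALL = 0, 1  # cell values from the write-up: 0 = empty, 1 = wall

# Hypothetical move names mapped to (row, column) offsets.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def play_turn(board, mouse, wall_cell, move):
    """Apply one full turn: the trapper places a wall, then the mouse moves.

    Returns (new_mouse_position, outcome), where outcome is "mouse_wins",
    "trapper_wins", or None if the game continues.
    """
    board[wall_cell] = WALL  # trapper's move; modifies the board in place

    d_row, d_col = MOVES[move]
    row, col = mouse[0] + d_row, mouse[1] + d_col

    # Running into a wall loses the game for the mouse.
    if board[row, col] == WALL:
        return (row, col), "trapper_wins"

    # Reaching any square on the outer edge wins the game for the mouse.
    last_row, last_col = board.shape[0] - 1, board.shape[1] - 1
    if row in (0, last_row) or col in (0, last_col):
        return (row, col), "mouse_wins"

    return (row, col), None
```

A game is then just a loop that asks the trapper for `wall_cell`, asks the mouse for `move`, and calls `play_turn` until the outcome is no longer `None`.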

How I built it

Everything was coded in Python, with two classes to control the trapper and the mouse and a third class to manage the Q-learning for both. On top of that sits the game manager and displayer, which processes the moves from each agent and displays the grid using pygame. The game board is a simple 2D NumPy array, with different values corresponding to different things (0 = empty, 1 = wall, etc.). The board is fed into the trapper, which uses Q-learning to select the square it wants to block, and that wall is added to the game board. The updated board is then fed into the mouse class, which uses a similar process to update its Q-table and select an action. Rinse and repeat. For the rewards, the mouse was given a default of -1 to incentivize moving around, while the trapper was rewarded by default because its goal is to stall out the game.
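For anyone curious what the Q-learning part boils down to, here is a minimal sketch of a tabular Q-learner with epsilon-greedy action selection. It is not the class from this project: the name `QLearner`, the dict-based table, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions picked for illustration.

```python
import random
from collections import defaultdict


class QLearner:
    """Minimal tabular Q-learner (illustrative, not the project's class)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration rate (assumed value)
        self.rng = random.Random(seed)
        # Unseen states start with a Q-value of 0.0 for every action.
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def choose(self, state):
        """Pick a random action with probability epsilon, else the best known."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return values.index(max(values))

    def update(self, state, action, reward, next_state, done):
        """Standard Q-learning update toward reward + gamma * max Q(next)."""
        if done:
            target = reward
        else:
            target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

States need to be hashable to work as dictionary keys, so a NumPy board would have to be converted first, for example with `board.tobytes()`. With the mouse's default reward of -1, a non-terminal move would be recorded as `learner.update(state, action, -1, next_state, False)`.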

Challenges I ran into

Originally I wanted to feed both players data from the entire board, but I quickly realized that was unrealistic. It ran really well on small boards (10x10 or so) but started to really struggle past that because there were just too many possible game states. To solve this, I decided to limit both players to seeing a small subsection of the grid centered around the mouse, but that led to the mouse being much stronger and winning almost 100% of the time. To fix this, I let the trapper see one square further in each direction, meaning it could place walls outside the vision of the mouse, which made the games much more interesting. The final sizes I settled on were 5x5 for the mouse and 7x7 for the trapper.
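Here is a rough sketch of how a mouse-centered window like that can be cut out of a NumPy board. The function names and the choice to fill off-board cells with -1 are assumptions made for this example, not the project's actual code. The useful property is that the window has a fixed number of cells no matter how big the board gets, which keeps the Q-table from blowing up.

```python
import numpy as np

EMPTY, WALL = 0, 1  # cell values from the write-up: 0 = empty, 1 = wall


def local_view(board, center, size, pad_value=-1):
    """Return the size x size window of `board` centered on `center`.

    Cells that fall off the board are filled with `pad_value` (an assumed
    marker for "outside the grid").
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the window has a center cell")
    half = size // 2
    padded = np.pad(board, half, mode="constant", constant_values=pad_value)
    row, col = center
    # After padding, board cell (row, col) sits at (row + half, col + half),
    # so the window starts at (row, col) in padded coordinates.
    return padded[row:row + size, col:col + size]


def state_key(view):
    """Turn a window into a hashable key for a dict-based Q-table."""
    return view.tobytes()
```

With this, the mouse's state would be `state_key(local_view(board, mouse, 5))` and the trapper's would be `state_key(local_view(board, mouse, 7))`.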

Accomplishments that I'm proud of

Machine learning has always seemed like this cool black box that I never really tried to understand, but these past few weeks I've been doing a lot of ML projects, and it feels really badass being able to just whip up a Q-learner on the spot. I've come to realize it's much easier to understand than I thought (although definitely not easy). I also think I did a good job designing the game rules because, at least to me, it's super mesmerizing to just sit back and watch it learn.

What I learned

Writing the Q-learning class from the ground up really forced me to understand everything a lot better, and I'm really glad I did it. This was also the first time I ever tried using Q-learning in a "competitive" game where agents play against other agents, in addition to being one of the first games I've ever made. Overall, I'm pretty happy with it, especially for only a 24-hour hackathon. Also, looking back at what I've written, I've identified so many things that I wish I had done at the beginning. They would be too time-consuming to go back and fix now, but they will 100% be added next time I make a project like this.

What's next for Two Agent Q Learning (angel problem)

One thing I really wanted to add but didn't have time for was the ability to play against the AI. I had a brief hacky way to play against it, but it involved typing moves into a console and viewing the game board there too, so I ended up removing it. The next step is definitely a revamp of the display code to let someone take control of either player, with a nice UI to control movement. My computer is also quite old and slow, so I want to set this up on a better machine and run millions of games instead of hundreds of thousands, which I think would yield some of the more complex strategies I wanted to see.

(If you want to try this for yourself, make sure to create an empty folder called qtables, with an empty mouse_q.txt and trapper_q.txt inside!)
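If you'd rather not create those by hand, a small helper like this does the same thing. The function name is made up for this example, and it assumes the qtables folder is expected in the directory you run the project from, so adjust `root` if that's not the case. Existing files are left untouched.

```python
from pathlib import Path


def ensure_qtable_files(root="."):
    """Create qtables/mouse_q.txt and qtables/trapper_q.txt if missing."""
    folder = Path(root) / "qtables"
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in ("mouse_q.txt", "trapper_q.txt"):
        path = folder / name
        path.touch(exist_ok=True)  # creates an empty file, keeps existing ones
        paths.append(path)
    return paths


if __name__ == "__main__":
    ensure_qtable_files()
```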