Introduction:
We all agreed that, ideally, we would implement reinforcement learning for our final project. After looking at a few different ideas, it seemed that most existing simple games already had a reinforcement learning model designed for them. We thought that by creating our own game, we could build not only an entirely new reinforcement learning model but also a cool 2D game.
Our goal is to develop an agent that can learn an optimized policy for collecting coins while avoiding its enemy. We will train it on a fixed map initially, but if that succeeds, we hope to expand the project so the agent can operate on random, procedurally generated maps.
Challenges:
The hardest part of the project so far has been coming up with a way to measure reward/loss for our model. We cap each episode at a maximum number of steps, and we measure how well the agent performs on each level by the number of coins it can collect within that limit.
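To make the scoring scheme concrete, here is a minimal sketch of a step-capped episode loop scored by coins collected. The toy environment, reward values, and all names here are illustrative assumptions, not our project's actual code.

```python
MAX_STEPS = 50       # assumed episode step cap
COIN_REWARD = 1.0    # assumed reward for picking up a coin
STEP_PENALTY = -0.01 # assumed small per-step cost to discourage wandering

class ToyCoinEnv:
    """Toy 1-D stand-in for the game: coins sit at fixed positions."""
    def __init__(self):
        self.coins = {3, 7, 12}
        self.pos = 0

    def reset(self):
        self.__init__()
        return self.pos

    def step(self, action):
        # action is -1 (left) or +1 (right)
        self.pos += action
        if self.pos in self.coins:
            self.coins.remove(self.pos)
            return self.pos, COIN_REWARD, not self.coins  # done when all coins taken
        return self.pos, STEP_PENALTY, False

def run_episode(env, policy, max_steps=MAX_STEPS):
    """Run one capped episode; return (total reward, coins collected)."""
    state = env.reset()
    total, coins = 0.0, 0
    for _ in range(max_steps):
        state, reward, done = env.step(policy(state))
        total += reward
        coins += reward >= COIN_REWARD
        if done:
            break
    return total, coins
```

A policy that always moves right, `run_episode(ToyCoinEnv(), lambda s: 1)`, collects all three coins within the cap.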
Another challenge we faced was that initially, the model kept deliberately running into a wall, accruing negative rewards. We weren't sure why the model seemed to be minimizing its reward. After experimenting with the reward settings, we eventually fixed the problem; we believe our rewards and learning rate were too large, causing the model training to diverge.
Insight:
We are monitoring the reward that the model earns over sequential episodes. After training for several hundred episodes, we can plot the rewards and see that the reward per episode generally increases as the model trains, so we know that the model is learning. We also have a function that prints the game state so that we can watch the model moving and collecting coins.
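Because per-episode rewards are noisy, a common trick when plotting them is to smooth with a moving average so the learning trend stands out. A small sketch (the window size and function name are illustrative, not from our project):

```python
def moving_average(rewards, window=50):
    """Smooth a list of per-episode rewards with a trailing window.

    Early entries average over whatever history exists so the output
    has the same length as the input, which keeps plotting simple.
    """
    out = []
    for i in range(len(rewards)):
        lo = max(0, i - window + 1)
        chunk = rewards[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

The smoothed curve can then be passed straight to a plotting library in place of the raw rewards.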
Plan:
We believe we are on track with this project: our model is successfully learning to move about the map and collect coins, and trains successfully on two different level maps, so we have accomplished our base goal. We're now going to dedicate our time to improving the model. So far it is implemented with the plain REINFORCE algorithm; we plan to improve it by implementing REINFORCE with Baseline, which subtracts a learned baseline from the return to reduce the variance of the gradient estimate.
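The core of the planned change is small: instead of scaling the policy gradient by the raw return G, REINFORCE with Baseline scales it by the advantage G − b, where b tracks the average return. A minimal sketch on a toy two-armed bandit (the bandit, constants, and names are illustrative; the same update rule applies over the game's own states and actions):

```python
import math
import random

def softmax(prefs):
    """Turn action preferences (logits) into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    s = sum(exps)
    return [e / s for e in exps]

def train(episodes=2000, alpha=0.1, baseline_lr=0.05, seed=0):
    rng = random.Random(seed)
    prefs = [0.0, 0.0]        # action preferences for a 2-action policy
    baseline = 0.0            # running-average return, the "baseline"
    arm_reward = [0.2, 1.0]   # arm 1 pays more on average
    for _ in range(episodes):
        probs = softmax(prefs)
        a = 0 if rng.random() < probs[0] else 1
        G = arm_reward[a] + rng.gauss(0, 0.1)   # noisy episode return
        adv = G - baseline                      # advantage = return - baseline
        # REINFORCE step: grad of log pi(a) w.r.t. pref_i is 1[i==a] - pi(i)
        for i in range(2):
            grad = (1.0 if i == a else 0.0) - probs[i]
            prefs[i] += alpha * adv * grad
        baseline += baseline_lr * (G - baseline)  # drift baseline toward mean return
    return softmax(prefs)
```

Setting `baseline = 0` everywhere recovers plain REINFORCE; the baseline changes only the variance of the updates, not their expected direction.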
Next steps include creating more complicated maps to test the model on, adding the enemy to the game and training the model to avoid it, adding the ability for the agent to attack the enemy, and finally, possibly developing a model that can handle a new procedurally generated map in each episode.