Title: Agent of Capitalism
Designing a simple agent to collect coins and attack opponents

Poster (with animated GIFs!):
https://docs.google.com/presentation/d/14aqBrEfeqA5ZnwtBVUfIzenN6ti2bm6B5Ak5Szxmobg/edit?usp=sharing
Final Reflection:
https://docs.google.com/document/d/1reU8fU0eTK9qTOVzGnplOtiSY_pbKGr022CAIsYuD2U/edit?usp=sharing
Runs (with enemy):
https://docs.google.com/document/d/1YrWdpqESJdfzKvV6yPFDiKcYOMl8j6ArDn3ogXZqYq8/edit?usp=sharing

Who:
Andrew Cooke - acooke1
Daniya Seitova - dseitova
Long Do - ldo6
Maxime Hendrikse Liu - mhendrik

Introduction:
We all agreed that, ideally, we would implement reinforcement learning for our final project. After looking at a few different ideas, it seemed that most existing games already had reinforcement learning models designed for them. We thought that by creating our own game, we could build not only an entirely new reinforcement learning model but also a cool 2D game.
Our goal is to develop an agent that can find an optimized policy for collecting coins while avoiding its enemy. We will be training it on a fixed map initially, but if it succeeds on this, we hope to expand the project to train it to operate on random, procedurally-generated maps.

Related Work:
Gene, our mentor TA, recommended a paper called "Rogue-Gym: A New Challenge for Generalization in Reinforcement Learning," linked here: https://arxiv.org/pdf/1904.08129.pdf
Rogue-Gym describes a system that procedurally generates a simple single-player roguelike and trains reinforcement-learning agents to play it.

Data:
We will be doing reinforcement learning on a simple game on a 2D map. The data will be the agent's play-throughs of the given map, from start to finish: the sequence of states, actions, and rewards in each episode (an episode ends when all the coins have been collected, or when the player fails to avoid the enemy).
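As a concrete sketch of what one play-through's data might look like, each step can be stored as a state/action/reward transition and an episode as the ordered list of those transitions. The field names and reward values below are our own illustrative assumptions, not a fixed design:

```python
from dataclasses import dataclass
from typing import List, Tuple

# One step of experience; the field names are illustrative, not a spec.
@dataclass
class Transition:
    state: Tuple[int, int]   # agent's (row, col) on the grid
    action: int              # e.g. 0=up, 1=down, 2=left, 3=right
    reward: float            # e.g. +1 for a coin, a small penalty otherwise

def episode_return(episode: List[Transition], gamma: float = 0.99) -> float:
    """Discounted return of a whole episode, accumulated back-to-front."""
    g = 0.0
    for t in reversed(episode):
        g = t.reward + gamma * g
    return g
```

Storing episodes this way keeps the training data in exactly the shape a policy-gradient or Q-learning update consumes.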

Methodology:
Because we are still learning about Reinforcement Learning in class, this methodology may change as we learn more.
For now, we expect to experiment with both Deep Q-Learning and a REINFORCE policy network to determine which better selects the agent's actions.
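To illustrate the REINFORCE side of that choice, here is a minimal NumPy sketch of a linear softmax policy updated with the policy-gradient rule theta += lr * G_t * grad log pi(a_t | s_t). The feature vectors and hyperparameters are placeholders, not our final design:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = logits - logits.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Linear softmax policy: theta (n_actions x n_features) maps state features to logits.
def sample_action(theta, phi):
    probs = softmax(theta @ phi)
    return rng.choice(len(probs), p=probs), probs

# One REINFORCE update for an episode of (features, action, reward) steps,
# with G_t the discounted return from step t onward.
def reinforce_update(theta, episode, lr=0.01, gamma=0.99):
    g = 0.0
    for phi, a, r in reversed(episode):
        g = r + gamma * g
        probs = softmax(theta @ phi)
        grad_log = -np.outer(probs, phi)   # gradient of log softmax, every row
        grad_log[a] += phi                 # plus the chosen action's feature row
        theta = theta + lr * g * grad_log
    return theta
```

Deep Q-Learning would instead regress action values toward bootstrapped targets; this sketch only covers the policy-gradient option.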

Metrics:
Our base goal is to have an agent that can move about the map and collect coins; we will test the model on several maps of escalating size and complexity.
Our target goal is to have the agent effectively collect all the coins on the map while avoiding or attacking the enemy, which will always move toward the player.
A stretch goal is to train the agent to collect coins and avoid the enemy on procedurally generated maps, rather than a single fixed map.
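The chasing enemy in the target goal could, for example, follow a simple greedy rule, stepping one cell toward the player along the axis of greatest distance. This particular rule is an assumption for illustration, not a committed design:

```python
def enemy_step(enemy, player):
    """Greedy chase: move one grid cell along the axis of largest distance
    to the player. Coordinates are (row, col) tuples."""
    er, ec = enemy
    pr, pc = player
    if abs(pr - er) >= abs(pc - ec):
        er += (pr > er) - (pr < er)   # step toward the player's row
    else:
        ec += (pc > ec) - (pc < ec)   # step toward the player's column
    return (er, ec)
```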

Ethics:
What implications does this project have, beyond the 2D game?
One interesting ethical question surrounding video games is whether they promote violence.
We have discussed first training the agent with only the option to move around the map, forcing it to learn to avoid the enemy rather than attack it. If we then give the agent the option to attack, we are interested to see whether its optimized policy includes attacking the enemy in order to collect the coins more quickly. If so, it may be that including violent options for solving problems in video games normalizes those approaches beyond the game.

Why is Deep Learning an interesting approach to this problem?
By studying how the agent learns to use the actions provided to it, we can observe how a deep learning agent may optimize its behavior on other problems. If given the option to attack, will the model use it? The answer has implications for how we should limit the actions available to other deep learning agents.

Division of Labor:
Long and Maxime will work on developing the game API—designing the map(s) for training and coding how, given an action, the game will generate a game state to return to the deep learning network.
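The game API described here might, for example, follow the common gym-style reset()/step() convention, returning a new game state and reward for each action. Everything below (the class name, reward values, and map layout) is a hypothetical sketch, not the actual implementation:

```python
class CoinGame:
    """Minimal sketch of the game API: a grid with coins and a gym-style step()."""
    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up/down/left/right

    def __init__(self, size=5, coins=((1, 1), (3, 4))):
        self.size = size
        self.start_coins = set(coins)
        self.reset()

    def reset(self):
        self.agent = (0, 0)
        self.coins = set(self.start_coins)
        return self._state()

    def _state(self):
        # The state handed to the network: agent position plus remaining coins.
        return (self.agent, frozenset(self.coins))

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        r = min(max(self.agent[0] + dr, 0), self.size - 1)  # clip at the walls
        c = min(max(self.agent[1] + dc, 0), self.size - 1)
        self.agent = (r, c)
        reward = -0.1                      # small per-step penalty (assumed value)
        if self.agent in self.coins:
            self.coins.remove(self.agent)
            reward = 1.0                   # coin collected
        done = not self.coins              # episode ends when all coins are taken
        return self._state(), reward, done
```

Keeping the environment behind this narrow interface lets the network side of the project treat the game as a black box.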
Andrew and Daniya will develop the deep learning network, determining how to take in a game state, pass it through a reinforcement learning framework, and return the optimal next action.

Updates


Challenges:
The hardest part of the project so far has been devising a way to measure reward/loss for our model. We cap the number of steps the agent can take in an episode, and we measure how well it performs on each level by the number of coins it collects within that limit.
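The step cap can be sketched as a simple rollout loop that terminates after a fixed number of steps and scores the episode by coins collected. The `env`/`policy` interfaces here are assumptions for illustration:

```python
def run_episode(env, policy, max_steps=200):
    """Roll out one capped episode; the score is coins collected within the cap.
    `env` is assumed to expose reset()/step(), and `policy` maps state -> action."""
    state = env.reset()
    coins = 0
    for _ in range(max_steps):     # hard cap so a bad policy cannot loop forever
        state, reward, done = env.step(policy(state))
        if reward > 0:             # treat any positive reward as a coin pickup
            coins += 1
        if done:
            break
    return coins
```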
Another challenge we faced was that, initially, the model kept running into a wall, accruing negative rewards as if on purpose, and we weren't sure why it seemed to be minimizing its reward. After experimenting with the reward settings, we eventually fixed the problem; we suspect our rewards and learning rate were too large, causing training to diverge.

Insight:
We are monitoring the reward the model earns over sequential episodes. After training for several hundred episodes, we can plot the rewards and see that the reward per episode generally increases as the model trains, so we know the model is learning. We also have a function that prints the game state, so we can watch the model move and collect coins.

Plan:
We believe we are on track with this project: our model successfully learns to move about the map and collect coins, and it trains successfully on two different level maps, so we have accomplished our base goal. We are now dedicating our time to improving the model. So far it is implemented with the REINFORCE framework; we plan to improve it by implementing REINFORCE with baseline.
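The baseline idea can be sketched as subtracting a baseline from each per-step discounted return before the policy-gradient update, which lowers gradient variance without biasing the expected update. Using the mean return of the episode, as below, is one simple choice (a learned value function is another); this is an illustrative sketch, not our implementation:

```python
def discounted_returns(rewards, gamma=0.99):
    """Per-step discounted returns G_t, computed back-to-front."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

def baseline_advantages(rewards, gamma=0.99):
    """REINFORCE with baseline: centre the returns with their mean,
    so the policy gradient weighs each action by G_t - b."""
    returns = discounted_returns(rewards, gamma)
    b = sum(returns) / len(returns)   # simple mean baseline
    return [g - b for g in returns]
```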
Next steps include creating more complicated maps to test the model on, adding the enemy to the game and training the model to avoid it, adding the possibility for the model to attack the enemy, and finally possibly attempting to develop a model that can work while getting a new procedurally-generated map for each episode.
