
First we looked at BidexHands, a reinforcement learning project that trains a pair of robotic hands to open jars, doors, and perform other tasks. However, when we talked with the professor, she suggested starting with something that has fewer parameters. So we opted for a game-playing agent based on this paper, whose deep reinforcement learning method combines Q-learning with experience replay to train the agent. The paper uses the classic game Snake as its example, but we found that OpenAI Gym has a Super Mario module, so we are training Mario to clear a stage instead. We still think BidexHands would be really cool to implement, especially since watching the virtual agent improve incrementally was so interesting. We are planning to implement everything in TensorFlow, since we've been using it all semester, even though both the paper and BidexHands use PyTorch.

This is a reinforcement learning project combined with a CNN: the Mario agent learns to make its way through the stage incrementally. The reward is the distance reached from the start, and a CNN processes the screen to detect any external agents that can harm Mario along the way.

The hardest part of this project so far was figuring out where to start. We began with the idea of building a CNN to work as a geolocator: the model would predict the coordinates of where a given photo was taken. However, we slowly realized that we probably did not have enough data to make this feasible, and without the Google Maps API (which is quite expensive) we were forced to pivot. We then began experimenting with OpenAI Gym, a toolkit for reinforcement learning projects, and considered making an RL program that would learn to use prosthetic hands to perform basic tasks.
But after talking with Professor Singh, we concluded that this project would be a bit out of scope for the class. Now we are focused on building an RL model that can play the first level of Mario, which is a much more reasonable task.

The hardest part so far has been getting Mario to learn and optimize quickly enough, so we are going to invest in Google Colab to let it train enough to beat a level. We can already get the model running and output a visualization of our agent playing the game; the concrete results we can show are the model's loss over time and a video of the agent attempting to beat a given level.

We expected the model's behavior to generalize, so that the agent would learn techniques like jumping over pipes and other obstacles. Instead, the model trains on the first level in a way that only works for that specific path and fails on other levels. Even after training for a while, it struggles to show any generalized learning, so we need to dedicate more time to training the model on Google Cloud for a large number of training steps. Reinforcement learning isn't as straightforward a process as we thought: the model's loss varies over time rather than steadily decreasing. We are also thinking of changing the model itself by trying out different architectures, since we should invest our computing resources in the best one instead of training many of them. Finally, we may change the objective of the project to solving a specific level rather than beating the whole game: the goal would be an agent that has found an optimal path to successfully completing the first level.
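To make the Q-learning-with-experience-replay idea from the paper concrete, here is a minimal tabular sketch in NumPy. Everything here is illustrative: the corridor environment, hyperparameters, and buffer size are made up, and the real agent uses a deep network over screen frames rather than a Q-table. Only the mechanism (store transitions, learn from random minibatches) matches what we described.

```python
import random

import numpy as np

# Toy stand-in for the Mario stage: a 1-D corridor of 10 cells where the
# agent starts at cell 0 and the episode ends at the last cell.
N_STATES, N_ACTIONS = 10, 2  # actions: 0 = move left, 1 = move right

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    done = nxt == N_STATES - 1
    reward = 1.0 if done else -0.01  # small step cost, terminal reward
    return nxt, reward, done

# Experience replay: keep past transitions and learn from random
# minibatches, so updates aren't dominated by runs of correlated frames.
buffer = []
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.99, 0.1
rng = random.Random(0)

for episode in range(200):
    state = 0
    for _ in range(50):
        # epsilon-greedy: mostly exploit current Q, sometimes explore
        if rng.random() < epsilon:
            action = rng.randrange(N_ACTIONS)
        else:
            action = int(np.argmax(Q[state]))
        nxt, reward, done = step(state, action)
        buffer.append((state, action, reward, nxt, done))
        if len(buffer) > 10_000:
            buffer.pop(0)

        # Q-learning update on a random minibatch of past experience
        for s, a, r, s2, d in rng.sample(buffer, min(32, len(buffer))):
            target = r if d else r + gamma * np.max(Q[s2])
            Q[s, a] += alpha * (target - Q[s, a])

        state = nxt
        if done:
            break

policy = np.argmax(Q, axis=1)  # greedy policy per state
```

In the deep version, the Q-table becomes a CNN that maps stacked frames to one Q-value per action, but the replay buffer and the update target are the same.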

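The distance-from-start reward can be sketched as a Gym-style wrapper that replaces the game's own reward with per-step horizontal progress. This is only a sketch: `ToyEnv` and `DistanceReward` are made-up names, and in the real setup the wrapper would subclass `gym.Wrapper` and read Mario's horizontal position from the `info` dict (`x_pos` in gym-super-mario-bros) instead of this stand-in environment.

```python
class ToyEnv:
    """Stand-in environment whose info dict reports an x position."""

    def reset(self):
        self.x = 0
        return self._obs(), {"x_pos": self.x}

    def step(self, action):
        self.x += action  # action here is simply how far we move right
        return self._obs(), 0.0, self.x >= 10, {"x_pos": self.x}

    def _obs(self):
        return self.x


class DistanceReward:
    """Wrapper that rewards horizontal progress made since the last step."""

    def __init__(self, env):
        self.env = env
        self._last_x = 0

    def reset(self):
        obs, info = self.env.reset()
        self._last_x = info["x_pos"]
        return obs, info

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        progress = info["x_pos"] - self._last_x  # negative on backtracking
        self._last_x = info["x_pos"]
        return obs, progress, done, info


env = DistanceReward(ToyEnv())
env.reset()
_, r1, _, _ = env.step(3)   # moved 3 to the right -> reward 3
_, r2, _, _ = env.step(-1)  # moved 1 back -> reward -1
```

Rewarding the per-step delta rather than the absolute distance keeps the reward signal dense while still summing, over an episode, to the total distance reached from the start.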