The Deep Rats: Raj Paul, Kyle Reyes, Floria Tsui, Nick Young

Quick Links

Check-in #2: Reflection

Final Write-up


Initial Check-in


  • Our project centers on reimplementing the following paper, perhaps with a few variations. This is primarily a reinforcement learning problem, and we chose a game-related problem for a few reasons. First, progress is both measurable and satisfying, so we all found this to be an appealing problem. Second, because the project draws on RL, we think the techniques in this paper will transfer to more unique and open-ended problems than, say, classification. Finally, there are interesting software engineering challenges around bootstrapping an emulator and connecting it to our model.

Related Work

  • This project was inspired in part by the “NeuralKart: A Real-Time Mario Kart 64 AI” paper published at Stanford, which used convolutional neural networks to learn the features of the terrain (from 4 different maps) and imitation learning to learn steering controls from a dataset. Their model struggled to handle sharp turns, narrow roads, and obstacles. In particular, they concluded their paper by noting that reinforcement learning, via Deep Q-Learning or Policy Gradients, would be able to reward preferable performance and penalize error conditions for the model. We hope to tackle exactly this problem.

  • Existing implementations:


  • While we won’t be using any premade datasets, we will be running Mario Kart 64 for the N64 in an emulator (BizHawk or Mupen64Plus) to simulate our Mario Kart games.


  • We intend to use reinforcement learning in training.
  • The paper uses imitation learning and acknowledges its limitations, including the inability of its model to address sharp turns, narrow roads, and sudden obstacles. To address this, we are hoping to use reinforcement learning instead. Because we are using an architecture different from theirs, this will be the most difficult part.
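To make the contrast concrete: where imitation learning only mimics expert actions, a policy-gradient method adjusts the policy in proportion to the reward actually received. Below is a minimal REINFORCE-style update for a linear softmax policy over a few discrete steering actions; the action set, feature size, and learning rate are all hypothetical stand-ins, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

ACTIONS = 3   # hypothetical discrete steering: left, straight, right
FEATURES = 4  # toy stand-in for features extracted from the screen

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_step(weights, features, action, reward, lr=0.1):
    """One REINFORCE update: scale the log-probability gradient of the
    chosen action by the reward, so rewarded actions become more likely
    and penalized actions less likely."""
    probs = softmax(weights @ features)
    # gradient of log pi(action | state) for a linear softmax policy
    grad = -np.outer(probs, features)
    grad[action] += features
    return weights + lr * reward * grad

weights = np.zeros((ACTIONS, FEATURES))
state = rng.normal(size=FEATURES)

before = softmax(weights @ state)[1]
weights = reinforce_step(weights, state, action=1, reward=1.0)
after = softmax(weights @ state)[1]
# a positive reward makes the chosen action more likely in that state
```

Unlike an imitation loss, the same update with a negative reward pushes probability away from the action, which is exactly the "punish error conditions" behavior the NeuralKart authors suggested.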


  • What experiments do you plan to run?

    • We plan to evaluate our trained model on maps it was not trained on. We hope to train our model such that it can handle maps both with and without walls.
  • For most of our assignments, we have looked at the accuracy of the model. Does the notion of “accuracy” apply for your project, or is some other metric more appropriate?

    • For us, finishing the race is a good notion of success, as is how quickly the agent is able to finish.
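These two success criteria (finishing at all, and finishing fast) can be encoded directly as a reward signal. The sketch below is one hypothetical shaping we might try, not the paper's reward; the constants and the `crashed` penalty are our own assumptions.

```python
def lap_reward(progress_delta, finished, crashed):
    """Hypothetical per-step reward combining our two metrics:
    forward progress each frame rewards finishing quickly, and a
    terminal bonus rewards finishing at all."""
    r = 10.0 * progress_delta  # fraction of track covered this frame
    if finished:
        r += 100.0             # completing the race is the base goal
    if crashed:
        r -= 50.0              # discourage error states
    return r
```

Summed over an episode, faster laps accumulate the same total progress reward in fewer (discounted) steps, so time-to-finish and the reward signal align.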
  • If you are implementing an existing project, detail what the authors of that paper were hoping to find and how they quantified the results of their model.

    • The authors of this paper quantified results of their model by measuring the time it took to finish a map.
  • What are your base, target, and stretch goals?

    • Base: completing maps.
    • Target: performance similar to the models in the paper.
    • Stretch: beating the in-game bots while appearing human (i.e., incorporating stochasticity).


  • What broader societal issues are relevant to your chosen problem space?

    • Self-driving cars - Dataset collection: the Stanford paper’s imitation-learning approach raises a known issue: “However, imitation learning controllers suffer from a fundamental distribution mismatch problem. In practical terms, experts are often too good, and rarely find themselves in error states from which they must recover. Thus, the controller never learns to correct itself or recover, and small errors in prediction accumulate over time.”
    • Gaming fairness - It’s the year 2020, and gaming is as hot as ever. With esports tournaments racking up millions in prize pools, an effective gaming agent could be devastating to competitive fairness.
  • Why is Deep Learning a good approach to this problem?

    • In general, it is difficult to hand-code features of a driving game’s environment (i.e., its maps), and we want our model to succeed on maps it hasn’t seen before (the game may update, and retraining shouldn’t be necessary), so we use CNNs (and deep learning) to learn the features of the maps instead.
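The core operation a CNN uses to learn such features is the convolution. As a toy illustration (not our actual architecture, which will be a learned multi-layer network), here is a hand-rolled 2D convolution applied to a synthetic frame, where a simple gradient kernel picks out a road/grass boundary:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution (cross-correlation, as CNNs compute it)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# synthetic grayscale frame: left half "grass" (0), right half "road" (1)
frame = np.zeros((8, 8))
frame[:, 4:] = 1.0
edge_kernel = np.array([[-1.0, 1.0]])  # horizontal-gradient kernel
response = conv2d(frame, edge_kernel)
# the response is nonzero only at the grass/road boundary
```

In a trained CNN the kernels are learned rather than hand-picked, which is exactly why the approach can generalize to maps it hasn’t seen.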

Division of labor:

  1. Emulation, Input mapping, Data processing (Nick, Kyle)
  2. CNN to extract visual features (Floria, Raj)
  3. Actor-critic / RL model (All)
  4. Poster design/demo creation (All)
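For item 3, a minimal sketch of the actor-critic interaction we have in mind, using linear function approximators as placeholders for the CNN-backed networks (all shapes, learning rates, and the discount factor here are illustrative assumptions):

```python
import numpy as np

def actor_critic_step(actor_w, critic_w, s, a, r, s_next, lr=0.05, gamma=0.99):
    """One advantage actor-critic update. The critic learns state values;
    the actor is pushed toward actions whose TD error (an advantage
    estimate) is positive, and away from those where it is negative."""
    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    v, v_next = critic_w @ s, critic_w @ s_next
    td_error = r + gamma * v_next - v          # advantage estimate
    critic_w = critic_w + lr * td_error * s    # move V(s) toward the target
    probs = softmax(actor_w @ s)
    grad = -np.outer(probs, s)                 # grad of log pi(a|s)
    grad[a] += s
    actor_w = actor_w + lr * td_error * grad   # reinforce a if advantage > 0
    return actor_w, critic_w

# one illustrative step with a positive reward
s, s_next = np.ones(3), np.zeros(3)
actor_w, critic_w = np.zeros((2, 3)), np.zeros(3)
actor_w, critic_w = actor_critic_step(actor_w, critic_w, s, a=0, r=1.0,
                                      s_next=s_next)
```

In the full project, `s` would be the CNN feature vector from item 2 and the reward would come from the emulator pipeline in item 1, which is why item 3 spans the whole team.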

Github Repository
