Inspiration
Our group was inspired by the incredible research being done today in reinforcement learning for autonomous cars and robotics. We set out to build an environment that is simple to conceptualize and explain, so we could showcase a technology that many people fear and are skeptical of in a friendly way.
What it does
Simply put, it plays Super Mario Bros. on the Nintendo Entertainment System.\
But that's not the point of our model.
The goal of Deep-NES is to humanize AI: when given only 24 hours to practice, it's just as bad at video games as the rest of us. Deep-NES is a learning tool that makes AI relatable to people with little background in the field, and shows that we are far away from the Terminator scare factor that many news outlets are trying to sell to the public.
How we built it
Like most hacks, we started by setting up an environment for our software to live in. Just as a website relies on the internet and its protocols to exist, our AI relies on Gym to train: Gym emulates video games and exposes a programmable interface to them. And just as a competent programmer uses library functions instead of reinventing the wheel, we implemented our RL agent with TensorFlow, a library that provides a highly optimized interface for building machine learning models. On top of these tools, we designed a model entirely from scratch to accomplish our goal of an AI Mario speedrunner.
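For readers new to Gym, the programmable interface boils down to a `reset()`/`step()` loop: the environment hands the agent an observation, the agent picks an action, and the environment returns the next observation, a reward, and whether the episode is over. Here is a minimal sketch of that loop, using a toy stand-in environment (not the actual Super Mario Bros. emulation, which requires the NES emulator and ROM):

```python
import random

class ToyEnv:
    """A toy stand-in that mimics Gym's reset()/step() interface.

    Episodes last 10 steps; action 1 earns a reward of 1.0, action 0 earns 0.
    """
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # initial observation

    def step(self, action):
        self.t += 1
        obs = self.t                          # next observation
        reward = 1.0 if action == 1 else 0.0  # reward for the chosen action
        done = self.t >= 10                   # episode ends after 10 steps
        return obs, reward, done, {}          # (obs, reward, done, info)

random.seed(0)
env = ToyEnv()
obs = env.reset()
total = 0.0
done = False
while not done:
    action = random.choice([0, 1])  # a real agent samples from its policy here
    obs, reward, done, info = env.step(action)
    total += reward
print("episode return:", total)
```

Swap `ToyEnv` for a Gym Mario environment and the random choice for a trained policy, and this is the skeleton of the training loop.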
To do this, we implemented a modified policy gradient learning algorithm to optimize the parameters of the model we designed (a deep multilayer perceptron). Our variant builds on the REINFORCE algorithm with some modifications of our own: it plays several games, computes the gradients of our model, and holds off on applying them for several episodes while checking the score. If the score for those actions is positive, it applies the accumulated gradients to nudge the model in the right direction. We then run the model through a gradient descent algorithm to optimize it further.
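To give a flavor of the core idea, here is a minimal sketch of textbook REINFORCE (not our exact variant) on a toy two-action problem, using plain NumPy instead of TensorFlow. The policy is a softmax over two logits; each step we sample an action, observe a reward, and nudge the logits along `grad log pi(a) * (reward - baseline)`:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy problem: action 1 always pays reward 1.0, action 0 pays 0.0.
def reward(action):
    return 1.0 if action == 1 else 0.0

theta = np.zeros(2)   # policy logits (stand-in for the MLP's parameters)
baseline = 0.0        # running-average reward, reduces gradient variance
lr = 0.5

for episode in range(200):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    r = reward(a)
    # REINFORCE update: grad of log pi(a | theta) for a softmax policy
    # is one_hot(a) - probs; scale it by the advantage (r - baseline).
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += lr * (r - baseline) * grad_log_pi
    baseline = 0.9 * baseline + 0.1 * r

print("learned policy:", softmax(theta))  # should strongly prefer action 1
```

In the real project this idea is applied over full game episodes, with TensorFlow computing the gradients of the multilayer perceptron and the score check deciding when to apply them.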
TL;DR: we used modified versions of state-of-the-art machine learning concepts to design our own model from scratch, and then trained it.
Challenges we ran into
Our hardware is very slow and not built for the sheer volume of matrix computations our optimization required. To work around this, we trained several models overnight, for hours at a time. For reference, most models are trained on state-of-the-art supercomputers for days. So our Mario may not be the best, but he tries! His dads are just poor.\
Accomplishments that we're proud of
We went from three guys with no RL knowledge (and one with some ML knowledge) to designing our own custom RL model!
What we learned
Reinforcement learning, and the math behind it!
What's next for Deep-NES
Training a LOT longer so Mario gets a smarter brain!
Showcasing our results to demystify modern AI and RL models for the public. AI is coming, and we need to be educated about it to make smart decisions regarding policy. Educating the public is the first step in making a change (our "Delta" hack)!
Built With
- gym
- python
- tensorflow