Inspiration

We were inspired by games like the rocket mode in Geometry Dash, where you control a rocket and navigate through obstacles. Building on this concept, we developed a similar game and implemented a reinforcement learning agent that learns to play it using Q-Learning.

What it does

The agent learns to control the plane by deciding whether to activate thrusters and move the plane higher or not. The environment provides the agent 7 observable values - the vertical position, the current vertical velocity and 5 rays which measure distance to the closest obstacle in front of the plane.

How we built it

The game is built in python using the libraries

  • pygame, for the game visualization,
  • pytorch, for training the neural network.

The level obstacles are generated procedurally using Perlin Noise, a techinique that is often used in games in order to obtain natural looking terrain. The levels are being generated progressively harder by making the tunel smaller and smaller.

Challenges we ran into

Tic-Tac-Toe

At first, to learn more about reinforcement learning, we implemented a simple Q-learning agent to play Tic-Tac-Toe. It was trained against another agent that selected moves at random every turn. the agent learned how to exploit the fact that the random opponent often leaves spots open that lead to completing three in a row. It also learned to preventively block when the opponent has already two marks in a row. We tested the agent by playing against it ourselves.

Furthermore, we tried to implement the natural next step in this task, and that was for the agent to play against itself (self-play). However, we have encountered training instability, the training process didn't converge and we were forced to abandon this project.

Cave Descent game environment

Instead of focusing on self-play problem, we wanted to experiment with Deep Q-Learning instead. That's why we have started the Cave Descent game environment project.

First we implemented a simple table based Q-Learning agent that avoids touching the floors or the ceilings. Afterwards, we made the more complex obstacles that change in time.

The final agent we built was using a Deep Neural Network Q-Learning. This approach does not use a large table to store the values, but estimates the q-value function using the network. Initially, this approach was not able to successfully train the network. Soon we have realized, that we needed to increase the model size to prevent underfitting. After this change, the neural network agent became the best one we have came across, getting much longer flight times compared to the previous table Q-Learning agent.

Accomplishments that we're proud of

Overall, we have managed to obtain an excelent result. The agent is able to learn how to avoid obstacles using very limited information about its surrounding. Throught the task, we have incrementally improved both the table based solution and later the neural network one until the agents can survive the whole episode without crashing.

What we learned

Most of all, we have learned a great lot about reinforcement learning and especially about using neural networks to determine the policy of an agent. Even though we have faced significant challenges when trying to implement self-play in the Tic-Tac-Toe task, it still enabled us to learn a lot. Also, one shouldn't forget about the impact of different parameters on the convergence of the method and how to structure the code in order to iterate quickly and implement and try new features.

What's next for Cave Descent Deep Reinforcement Learning

There are a lot of possible directions where to go from here. First of all, other features could be added as an input to the DQN, increasing the agent's ability to capture more complex behaviours. Furthermore, adding different enemies in the cave and the action of shooting them from the plane could provide an added challenge to the whole environment.

Built With

Share this project:

Updates