Lunar Lander Agent - TEAM INOV8

[GIF] Early episodes...
[GIF] 250 episodes later...
[GIF] 760 episodes later...

Inspiration

We were inspired by the idea of full automation and by the passion to build a system that learns and improves on its own and carries out an assigned task without any human intervention.

What it does

It is a lunar lander agent that maneuvers on its own through the lunar environment and lands safely without crashing.

How we built it

We used reinforcement learning and trained the agent over many episodes to maximize its reward, which in turn reflects how accurately it lands.

We used OpenAI Gym to create and interact with the environment, and DQN (Deep Q-Network, i.e. deep Q-learning) to build an agent that learns how not to crash by crashing many times.
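To make the DQN idea more concrete, here is a minimal sketch of three building blocks a DQN agent typically relies on: a replay buffer, epsilon-greedy action selection, and the one-step Q-learning target. This is not our project code. The names `ReplayBuffer`, `epsilon_greedy` and `td_target` and the default `gamma=0.99` are assumptions chosen for illustration, and the neural network and the Gym environment loop are left out so the sketch runs with the standard library alone.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # The oldest transitions are dropped automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch, which breaks correlation between consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a'), or r if terminal."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

In a full agent, `q_values` and `next_q_values` would come from the Q-network, and the network would be trained to move its prediction for the taken action towards `td_target`. A crash ends the episode, so its large negative reward is used directly as the target, which is how the agent "learns by crashing".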

Challenges we ran into

Training and testing on the GPU was challenging, and running episodes until we reached the desired result was time-consuming. Finding the right hyperparameters was also challenging and took the majority of our time.
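As an illustration of the kind of hyperparameters involved, the sketch below groups a few common DQN settings in one place and shows an exponential epsilon decay schedule. Every value and name here (`DQNConfig`, `epsilon_at`, the learning rate, the decay factor and so on) is a hypothetical placeholder, not the configuration we ended up with.

```python
from dataclasses import dataclass


@dataclass
class DQNConfig:
    # All defaults are illustrative placeholders, not tuned values.
    learning_rate: float = 5e-4
    gamma: float = 0.99            # discount factor for future rewards
    batch_size: int = 64           # transitions sampled per training step
    buffer_capacity: int = 100_000
    epsilon_start: float = 1.0     # start fully exploratory
    epsilon_end: float = 0.01      # floor so some exploration always remains
    epsilon_decay: float = 0.995   # multiplicative decay per episode


def epsilon_at(episode, config):
    """Exploration rate after `episode` episodes of exponential decay."""
    decayed = config.epsilon_start * config.epsilon_decay ** episode
    return max(config.epsilon_end, decayed)
```

Keeping the settings in a single object like this makes it easier to rerun training with a different combination and to record which combination produced which result.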

Results

The following were the results:

First few episodes: negative rewards (poor accuracy, ~5%)
~250 episodes later: relatively positive rewards (~100 reward points; improved accuracy, ~70%)
~760 episodes later: improved accuracy with controlled speed (~200 reward points; ~98% accuracy)

Accomplishments that we're proud of

The environment is considered solved when the agent averages roughly 200 reward points over 100 episodes. After approximately 9 grueling hours of testing and debugging, we finally reached that target.
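The 200-point criterion above can be checked with a rolling average over the most recent episodes. The helper below is a small sketch of that check, assuming the per-episode total rewards are collected in a list. The function name `first_solved_episode` and its parameters are hypothetical.

```python
from collections import deque


def first_solved_episode(episode_rewards, target=200.0, window=100):
    """Return the 1-based episode at which the mean reward over the last
    `window` episodes first reaches `target`, or None if it never does."""
    recent = deque(maxlen=window)
    for episode, reward in enumerate(episode_rewards, start=1):
        recent.append(reward)
        # Only judge once a full window of episodes is available.
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None
```

Calling it after each training run, for example `first_solved_episode(rewards)`, gives a single number that is easy to compare across hyperparameter settings.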

What we learned

The most important lesson we learned was not to give up. Testing again and again with different hyperparameters and debugging the code felt like an endless loop. But