Inspiration
We were inspired by the idea of full automation and by a passion for building a system that learns and improves on its own and carries out an assigned task without any human intervention.
What it does
It is a lunar lander agent that independently maneuvers around the lunar environment to land safely without any collisions.
How we built it
We used reinforcement learning to achieve the end result and analyzed numerous episodes to maximize the reward, which in turn reflects how accurately the lander touches down.
We used OpenAI Gym to create and experiment with the environment, and DQN (deep Q-learning) to build an agent that learns how not to crash by crashing many times.
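As a rough illustration of the DQN building blocks mentioned above, here is a minimal sketch in plain Python. It is not our training code: the names `ReplayBuffer`, `epsilon_greedy`, and `td_target`, the discount factor of 0.99, and the use of plain lists in place of a neural network's output are all assumptions made for the example.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest transition once full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch, which breaks the correlation
        # between consecutive steps of the same episode.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    When the episode has ended (for example, after a crash) there is
    no next state, so the target is just the reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

In a full agent, `q_values` and `next_q_values` would come from a neural network evaluated on the environment's state vector, and the network would be trained to move its prediction for the chosen action toward `td_target`. The large negative reward for a crash flows through that target, which is how the agent "learns how not to crash by crashing".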
Challenges we ran into
Training and testing on the GPU was challenging, and running episodes until we reached the desired result was time-consuming. Finding the right hyperparameters was difficult and took the majority of the time.
Results
The following were the results:
- First few episodes: negative rewards (poor accuracy, ~5%)
- ~250 episodes later: relatively positive rewards (~100), improved accuracy (~70%)
- ~760 episodes later: improved accuracy with controlled speed (~200 reward points, ~98% accuracy)
Accomplishments that we're proud of
The environment is considered beaten if the agent averages roughly 200 reward points over 100 episodes. After approximately 9 rough hours of testing and debugging, we finally reached that mark.
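The "beaten" criterion above can be checked with a rolling average over the most recent episodes. The sketch below assumes the per-episode rewards are already collected in a list; the function name `first_solved_episode` and its default arguments are hypothetical and chosen only to mirror the 200-point, 100-episode rule.

```python
from collections import deque


def first_solved_episode(episode_rewards, threshold=200.0, window=100):
    """Return the 1-based episode at which the rolling mean first reaches
    the threshold, or None if it never does."""
    recent = deque(maxlen=window)  # keeps only the last `window` rewards
    for episode, reward in enumerate(episode_rewards, start=1):
        recent.append(reward)
        # Only judge once a full window of episodes is available.
        if len(recent) == window and sum(recent) / window >= threshold:
            return episode
    return None
```

Calling this on the reward history after each training run gives a single number to compare across hyperparameter settings: the earlier the returned episode, the faster that configuration reached the threshold.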
What we learned
The most important quality we learnt was to not give up. Testing again and again with different hyperparameters and debugging the code seemed like an endless loop. But
What's next for Lunar Lander Agent
We intend to improve on this by running the algorithm in a real physical environment, connecting it to a hardware setup with servos and rotors.