
Project Check-In #2: GoDeep

Introduction: We are going to reimplement a version of DeepMind’s AlphaGo Zero: “Mastering the Game of Go Without Human Knowledge.”
In 2016, DeepMind built the first artificial intelligence system that could defeat a champion Go player: AlphaGo. That system first learned to mimic the moves of human professionals by analyzing their games, and then improved by playing against itself. After AlphaGo, DeepMind wanted to take the challenge a step further and build a system that masters the game without any human knowledge, purely through self-play and reinforcement learning. We chose this paper because, as two Go enthusiasts, we were really interested in building our own AlphaGo Zero. This is a reinforcement learning problem.

Challenges: The hardest part of the project so far has been understanding Minigo and the hundreds of files that make it up. We resolved this by deciding not to use most of Minigo: we kept only the relevant parts, which for now is just a couple of files (Go.py and coords.py). Without the rest of Minigo, testing will have to be done manually against online AI agents.

Insights: The results we have to show so far are the environment and a player that can play random moves, which is where we expected to be at this point. We also have a way of collecting data from self-play for training.
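To make the data-collection idea concrete, here is a minimal sketch of a self-play loop with a random player. The environment interface (`ToyGoEnv`, `legal_moves`, `play`, `is_over`, `result`) is our hypothetical stand-in, not the actual project code; the point is the pattern of recording one (state, move, outcome) example per move for later value-network training.

```python
import random

class ToyGoEnv:
    """Hypothetical stand-in for the real Go environment; the game
    simply ends after a fixed number of moves so the loop is runnable."""
    def __init__(self, max_moves=10):
        self.history = []
        self.max_moves = max_moves
    def legal_moves(self):
        return list(range(9))            # pretend 3x3 board positions
    def play(self, move):
        self.history.append(move)
    def is_over(self):
        return len(self.history) >= self.max_moves
    def result(self):
        return random.choice([+1, -1])   # +1 = black wins, -1 = white wins

def self_play_game(env):
    """Play one game with a uniformly random player and return
    (state_snapshot, move, outcome_from_mover's_view) examples."""
    examples = []
    to_play = +1                          # +1 = black to move, -1 = white
    while not env.is_over():
        move = random.choice(env.legal_moves())
        examples.append((tuple(env.history), move, to_play))
        env.play(move)
        to_play = -to_play
    outcome = env.result()
    # Label every recorded position with the final result, seen from
    # the perspective of the player who moved at that position.
    return [(state, move, outcome * player) for (state, move, player) in examples]

data = self_play_game(ToyGoEnv())
print(len(data))  # 10: one training example per move played
```

The key design choice is labeling each position with the game's final outcome rather than any intermediate signal, which is exactly what the value network is trained to predict in AlphaGo Zero.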

Plan: We are on track with the project. Next we need to start the deep learning section: connecting self-play to value network training, then building the MCTS to train the policy network. We also need to spend more time thinking about the internal architecture of the policy network and the value network. We are well-positioned to succeed :)
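As a sketch of the MCTS step in that plan, the selection rule AlphaGo Zero uses is PUCT: pick the child action maximizing Q(s,a) + c·P(s,a)·√(Σ N)/(1 + N(s,a)), where P comes from the policy network. The class and function names below are our own illustration (the full search also needs expansion, network evaluation, and move selection by visit count), but the selection and backup logic matches the paper's formulas.

```python
import math

class Node:
    """One (state, action) edge's statistics in the search tree."""
    def __init__(self, prior):
        self.prior = prior        # P(s, a) from the policy network
        self.visits = 0           # N(s, a)
        self.value_sum = 0.0      # W(s, a)
    @property
    def q(self):                  # Q(s, a) = W(s, a) / N(s, a)
        return self.value_sum / self.visits if self.visits else 0.0

def puct_select(children, c_puct=1.5):
    """Return the action maximizing Q + c_puct * P * sqrt(sum N) / (1 + N)."""
    total_visits = sum(child.visits for child in children.values())
    def score(item):
        _, child = item
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visits)
        return child.q + u
    return max(children.items(), key=score)[0]

def backup(path, value):
    """Propagate a leaf evaluation up the visited path, flipping the
    sign each ply since players alternate."""
    for node in reversed(path):
        node.visits += 1
        node.value_sum += value
        value = -value
```

Unvisited moves with high prior get a large exploration bonus U, so the network's policy guides the search toward promising branches before their Q values are reliable; the visit counts at the root then become the improved policy target for training.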
