Donglin Liu posted an update — Nov 15, 2021 08:46 PM EST


Donglin Liu (dliu67)

Introduction

https://arxiv.org/pdf/2106.06135.pdf
This project reimplements a self-play deep reinforcement learning AI for the card game "Doudizhu". Doudizhu is a popular card game in China with many different variants. My goal is to reimplement the AI for the basic version described in the paper and then try to implement some of the other variants.

Related Work

https://arxiv.org/pdf/1910.04376.pdf
In this paper, the authors present RLCard, an open-source toolkit for reinforcement learning research in card games, including Doudizhu. They analyze the rules of Doudizhu, implement a Doudizhu environment, and provide an interface for testing agents. Here is the GitHub link: https://github.com/datamllab/rlcard

The authors of the paper I am reimplementing have also shared their code on GitHub: https://github.com/kwai/DouZero

Data

Since this is a self-play reinforcement learning project, the data is generated by three agents playing against each other.
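
As a rough illustration of how self-play turns into training data, the sketch below takes the log of one finished game and groups the moves by position, labeling every (state, action) pair with that position's final payoff. It is a minimal sketch under assumptions, not the paper's code: the position names, the string encodings of states and actions, and the payoff values are all hypothetical placeholders. Grouping by position reflects the point that each of the three positions plays a different strategy.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence, Tuple

# One logged move: (position, state, action). Encodings here are hypothetical.
Move = Tuple[str, object, object]
# One training sample: (state, action, final payoff for the player who moved).
Sample = Tuple[object, object, float]


def label_self_play_game(game_log: Sequence[Move],
                         payoffs: Mapping[str, float]) -> Dict[str, List[Sample]]:
    """Turn one finished self-play game into per-position training samples.

    The reward is only known once the game ends, so every move a position
    made is labeled with that position's final payoff.
    """
    samples: Dict[str, List[Sample]] = defaultdict(list)
    for position, state, action in game_log:
        samples[position].append((state, action, payoffs[position]))
    return dict(samples)


if __name__ == "__main__":
    # Hypothetical game: the landlord wins, both peasants lose.
    log = [("landlord", "s0", "3-4-5-6-7"), ("peasant_1", "s1", "pass"),
           ("peasant_2", "s2", "pass"), ("landlord", "s3", "K")]
    payoffs = {"landlord": 1.0, "peasant_1": -1.0, "peasant_2": -1.0}
    print(label_self_play_game(log, payoffs)["landlord"])
```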

Methodology

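In the paper, the authors use the Deep Monte-Carlo (DMC) method for learning. They encode the history of moves with an LSTM and then feed the encoded history, together with the state and action features, into an MLP that estimates the Q-value of that state-action pair. I expect the DMC training procedure to be the hardest part to implement.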
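
To make the Monte Carlo part of DMC concrete, here is a minimal pure-Python sketch, not the paper's implementation. It assumes the only reward arrives when a game ends, uses epsilon-greedy exploration over legal moves, and replaces the LSTM + MLP network with a hypothetical linear model `LinearQ` over hand-made feature vectors `phi(s, a)`, so that the core update is easy to see: each move a player made is regressed toward the return that player actually received.

```python
import random
from typing import List, Sequence

Features = List[float]


def mc_returns(num_steps: int, final_reward: float, gamma: float = 1.0) -> List[float]:
    """Monte Carlo return G_t for every step of one player's episode.

    Assumes the only non-zero reward arrives at the end of the game, so
    G_t = gamma ** (T - 1 - t) * final_reward (gamma = 1.0 is undiscounted).
    """
    return [final_reward * gamma ** (num_steps - 1 - t) for t in range(num_steps)]


class LinearQ:
    """Hypothetical linear stand-in for the LSTM + MLP network: Q(s, a) = w . phi(s, a)."""

    def __init__(self, dim: int, lr: float = 0.1) -> None:
        self.w = [0.0] * dim
        self.lr = lr

    def q(self, phi: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, phi))

    def sgd_step(self, phi: Sequence[float], target: float) -> float:
        # Gradient of 0.5 * (Q - G)^2 with respect to w is (Q - G) * phi.
        err = self.q(phi) - target
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, phi)]
        return 0.5 * err * err


def epsilon_greedy(qnet: LinearQ, legal_action_feats: Sequence[Features],
                   epsilon: float, rng: random.Random) -> int:
    """Return an index into the legal moves: random with prob. epsilon, else argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(legal_action_feats))
    scores = [qnet.q(phi) for phi in legal_action_feats]
    return max(range(len(scores)), key=scores.__getitem__)


def dmc_update(qnet: LinearQ, episode_feats: Sequence[Features], final_reward: float) -> float:
    """Push Q(s_t, a_t) toward the Monte Carlo return for every step; return mean loss."""
    targets = mc_returns(len(episode_feats), final_reward)
    losses = [qnet.sgd_step(phi, g) for phi, g in zip(episode_feats, targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    qnet = LinearQ(dim=2, lr=0.5)
    episode = [[1.0, 0.0], [0.0, 1.0]]  # phi(s_t, a_t) for one player's two moves
    for _ in range(50):
        dmc_update(qnet, episode, final_reward=1.0)
    print(round(qnet.q([1.0, 0.0]), 4))
    print(epsilon_greedy(qnet, [[0.0, 0.0], [1.0, 0.0]], epsilon=0.0, rng=random.Random(0)))
```

Swapping `LinearQ` for a neural network keeps the same structure: compute Monte Carlo returns from finished self-play games, then minimize the squared error between Q(s, a) and those returns.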

Metrics

I plan to evaluate my bot by running many simulated games against other pretrained models, and to compare it with the paper's model to see the difference. The authors of the paper aimed to build a Doudizhu AI that could beat all other AI agents available at the time. To measure this, they ran simulations against different models and quantified the results with two metrics: the Winning Percentage (WP) and the Average Difference in Points (ADP).
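
The two metrics can be sketched as follows. This assumes a hypothetical per-game record `GameResult` for an evaluation between agents A and B, and follows my reading of the paper's definitions: WP is the fraction of games A wins, and ADP is the average per-game difference between the points scored by A and by B.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class GameResult:
    """Hypothetical record of one evaluation game between agents A and B."""
    a_won: bool
    a_points: float  # points scored by A's side in this game (negative if it lost)
    b_points: float  # points scored by B's side in this game


def wp_and_adp(results: Sequence[GameResult]) -> Tuple[float, float]:
    """Return (WP, ADP) for agent A against agent B.

    WP  = games won by A / total games
    ADP = mean over games of (A's points - B's points)
    """
    if not results:
        raise ValueError("need at least one game")
    n = len(results)
    wp = sum(r.a_won for r in results) / n
    adp = sum(r.a_points - r.b_points for r in results) / n
    return wp, adp


if __name__ == "__main__":
    games = [GameResult(True, 2, -2), GameResult(False, -1, 1), GameResult(True, 4, -4)]
    wp, adp = wp_and_adp(games)
    print(f"WP={wp:.3f}, ADP={adp:.3f}")
```

Reporting both numbers matters because an agent can win often by small margins yet lose points overall, so WP alone can be misleading.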
My base goal is to reimplement the method in a different framework and train agents that perform similarly to the paper's. My target goal is to change the rules of the game and train a strong agent for another variant of this card game. My stretch goal is to change the model architecture (e.g., replacing the LSTM with a Transformer) and compare the results.

Ethics

Why is Deep Learning a good approach to this problem?
Doudizhu is a very complex game: there are thousands of possible moves and game situations, and each of the three positions (the landlord and the two peasants) requires a different strategy. This makes it hard to hand-craft a heuristic function that predicts the next move. Reinforcement learning, in contrast, can learn good moves from thousands of simulated games.

What is your dataset? Are there any concerns about how it was collected, or labeled? Is it representative?
My dataset is generated by simulated games between my bot and various pretrained models. Since it is generated by AI agents, no human labeling is involved, and the rewards are unambiguous because they come directly from the game outcomes. It is hard to say whether it is representative, but with enough simulations it should be reasonably representative.
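
One way to make "enough simulations" concrete is the normal-approximation margin of error for an estimated win rate. The sketch below is a standard statistics calculation, not something taken from the paper; `z = 1.96` corresponds to a 95% confidence level, and `p = 0.5` is the worst case for the required number of games.

```python
import math


def games_needed(margin: float, p: float = 0.5, z: float = 1.96) -> int:
    """Games needed so a win-rate estimate has the given margin of error.

    Normal approximation: margin = z * sqrt(p * (1 - p) / n),
    so n = z^2 * p * (1 - p) / margin^2.
    """
    if not 0 < margin < 1:
        raise ValueError("margin must be in (0, 1)")
    n = z * z * p * (1 - p) / (margin * margin)
    return math.ceil(n - 1e-9)  # small tolerance guards against float round-off


def margin_of_error(wins: int, games: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation confidence interval for a win rate."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    return z * math.sqrt(p * (1 - p) / games)


if __name__ == "__main__":
    print(games_needed(0.01), games_needed(0.05))
    print(margin_of_error(5000, 10000))
```

For example, pinning a win rate down to within about 1 percentage point takes roughly 9,604 games, while 5 points takes about 385. This only bounds the noise of the estimate against the opponents actually simulated; it does not cover opponents outside that pool.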

Division of labor

Donglin Liu: All parts of the project
