Team member
Donglin Liu (dliu67)
Introduction
https://arxiv.org/pdf/2106.06135.pdf
This project reimplements a self-play deep reinforcement learning AI
for the card game "Doudizhu". Doudizhu is a popular card game in
China with many variants. My goal is to reimplement the paper's AI
for the basic version and then try to extend it to some of the other
variants.
Related Work
https://arxiv.org/pdf/1910.04376.pdf
In this paper, the authors present an open-source toolkit for RL research
in card games, including Doudizhu. They analyze the rules of Doudizhu
and implement an environment for it. They also provide an interface for
testing. Here is the GitHub link: https://github.com/datamllab/rlcard
The authors of the paper I want to reimplement also share their code on GitHub: https://github.com/kwai/DouZero
Data
Since this is a self-play reinforcement learning project, the data is generated by three agents playing against each other.
Methodology
In the paper, the authors use the Deep Monte-Carlo (DMC) method for learning. They use an LSTM to encode the historical moves and an MLP that takes the encoded state and action together to estimate Q-values (a neural network approximates the Q-function rather than a literal Q-table). I think the hardest part will be the DMC method.
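To make the Monte-Carlo idea concrete for myself, here is a minimal sketch of every-visit Monte-Carlo value estimation. It is not the paper's implementation: a plain dictionary stands in for the LSTM + MLP network, and the names `mc_targets` and `TabularMCQ` are hypothetical. The point it illustrates is that each (state, action) pair seen in an episode is pushed toward the return actually observed from that step onward.

```python
from collections import defaultdict


def mc_targets(rewards, gamma=1.0):
    """Return the discounted return G_t for every step of one episode."""
    targets = []
    g = 0.0
    # Walk backwards so each return reuses the one computed after it.
    for r in reversed(rewards):
        g = r + gamma * g
        targets.append(g)
    targets.reverse()
    return targets


class TabularMCQ:
    """Toy stand-in for a Q-network: running mean of returns per (state, action)."""

    def __init__(self):
        self.q = defaultdict(float)  # estimated return for each (state, action)
        self.n = defaultdict(int)    # number of visits for each (state, action)

    def update(self, episode, gamma=1.0):
        """episode: list of (state, action, reward) tuples from one finished game."""
        rewards = [r for _, _, r in episode]
        for (state, action, _), g in zip(episode, mc_targets(rewards, gamma)):
            key = (state, action)
            self.n[key] += 1
            # Incremental mean: move the estimate toward the observed return.
            self.q[key] += (g - self.q[key]) / self.n[key]

    def greedy(self, state, legal_actions):
        """Pick the legal action with the highest estimated return."""
        return max(legal_actions, key=lambda a: self.q[(state, a)])


agent = TabularMCQ()
# One toy game: the reward only arrives at the final step.
agent.update([("s0", "pass", 0.0), ("s1", "play", 1.0)])
best = agent.greedy("s0", ["bomb", "pass"])
```

In a deep version, the dictionary update would be replaced by a gradient step on the squared error between the network's prediction and the same Monte-Carlo target.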
Metrics
I plan to run many simulated games against other pretrained models to
test my bot. I also plan to compare my bot with the paper's model to see
the difference. The authors of the paper set out to build a Doudizhu AI
that could beat all other AI agents available at the time. To that end,
they ran simulations against different models and used the Winning
Percentage (WP) and the Average Difference in Points (ADP) to quantify
their results.
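As a rough sketch of how I might compute these two metrics from my own simulation logs, assuming each game is recorded as a `(won, num_bombs)` pair from the evaluated side's perspective. The function name `evaluate` and the log format are hypothetical, and the scoring rule (a base score of 1 that doubles with every bomb played) is a simplifying assumption on my part rather than the paper's exact evaluation code.

```python
def evaluate(games):
    """Compute (WP, ADP) from a list of (won, num_bombs) game records.

    won       -- True if the evaluated side won the game
    num_bombs -- number of bombs played in that game (assumed to double the stake)
    """
    if not games:
        raise ValueError("need at least one game to evaluate")
    n = len(games)
    wins = sum(1 for won, _ in games if won)
    # Assumed scoring: +2**bombs points for a win, -2**bombs for a loss.
    points = sum((1 if won else -1) * 2 ** bombs for won, bombs in games)
    return wins / n, points / n


# Four toy games: two wins and two losses with different bomb counts.
wp, adp = evaluate([(True, 0), (False, 1), (True, 2), (False, 0)])
```

WP only counts who won, while ADP also rewards winning the high-stakes games, so two bots with the same WP can still differ in ADP.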
My base goal is to change the framework and produce agents similar to
the paper's. My target goal is to change the rules of the game and
produce a good agent for another rule variant of this card game. My
stretch goal is to change the model structure (e.g., LSTM to Transformer)
to see the difference.
Ethics
Why is Deep Learning a good approach to this problem?
Doudizhu is a really complicated game: it has thousands of possible
actions and situations, and each of the three positions calls for a
different strategy. It is hard to hand-craft a heuristic function to
predict the next move. RL, on the other hand, can learn good moves from
thousands of simulated games.
What is your dataset? Are there any concerns about how it was collected,
or labeled? Is it representative?
My dataset is generated from simulated games between my bot and
different pretrained models. It is created by AI, and the rewards are
clear-cut. It is hard to say whether it is representative, but if we
simulate enough games, it should be relatively representative.
Division of labor
Donglin Liu: All parts of the project