Introduction:
We'd like to explore deep reinforcement learning by implementing a model that plays about as well as an intermediate-level human player at Trex, a four-player card game played predominantly in the Middle East. The game is relatively straightforward for a human to learn, but it has a very specific and comparatively complex structure built from a fixed number of sub-games. There are 4 Kingdoms, each consisting of 5 contracts (20 contracts in total), and, glossing over the nature of the Trex contract, each contract is effectively 13 tricks. Each contract requires strategy to play consistently well. We find the challenge compelling because at any given time (particularly early in a round, or at any point if one doesn't count cards) a player has only imperfect information about the game state. It will therefore be interesting to see how, and to what extent, a model can cope with a mix of very constrained choices at some times and more arbitrary actions ('hedging its bets') at others. Finally, the model must play to benefit both itself and its partner, whose hand is unknown but can be inferred over time, which adds another layer to this goal.
Challenges:
So far, our resources have been dedicated almost exclusively to modeling Trex in Python and implementing a non-deep AI player that outperforms players that select from legal cards at random. This is therefore where the bulk of the challenges have arisen. Our initial repository modeled the game as stateful players and stateless “contracts” that operate on those players, but this approach proved difficult to apply to the Monte Carlo search algorithm that, as recently became apparent, we would need in order to support an AI player in a game of imperfect information. The next challenge was to adapt a Python implementation of ISMCTS (Information Set Monte Carlo Tree Search), originally designed for Knockout Whist, to Trex. The adaptation required substantial changes, both to the game logic and in selectively converting code to NumPy for performance. The new way of modeling the game divides the ‘world’ into polymorphic game states, which are used to construct the game tree, and stateless agents, which perform move selection. It seems very promising for our transition into the Deep Learning portion of the project.
The other main difficulty we've been working through is planning the architecture of the actual Deep Learning model, discussed below.
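To make the imperfect-information search concrete, here is a minimal sketch of the determinization step that ISMCTS relies on: before each search iteration, the cards the searching player cannot see are dealt at random to the other players, and the tree is explored as if that deal were the real one. It assumes a standard 52-card deck and four players. All names (`DECK`, `determinize`, `RandomAgent`) are hypothetical and are not taken from the project repository or from the Knockout Whist implementation, and the sketch ignores any inference about opponents' hands (for example, known void suits).

```python
import random

SUITS = "CDHS"
RANKS = "23456789TJQKA"
DECK = [rank + suit for suit in SUITS for rank in RANKS]  # 52 cards


def determinize(observer, own_hand, played, hand_sizes, rng):
    """Sample one full deal consistent with what `observer` can see.

    own_hand:   cards currently held by the observer
    played:     cards already played face up (visible to everyone)
    hand_sizes: dict mapping each player to the number of cards they hold
    rng:        a random.Random instance, passed in so agents stay stateless
    """
    hidden = set(own_hand) | set(played)
    unseen = [card for card in DECK if card not in hidden]
    rng.shuffle(unseen)

    deal = {observer: list(own_hand)}
    start = 0
    for player, size in hand_sizes.items():
        if player == observer:
            continue
        # Hand the next `size` unseen cards to this player.
        deal[player] = unseen[start:start + size]
        start += size
    return deal


class RandomAgent:
    """Stateless baseline agent: picks uniformly among the legal moves."""

    def choose(self, legal_moves, rng):
        return rng.choice(legal_moves)


rng = random.Random(0)
my_hand = DECK[:13]  # illustrative hand, not a real deal
sizes = {0: 13, 1: 13, 2: 13, 3: 13}
sampled_deal = determinize(0, my_hand, [], sizes, rng)
```

Because the agent keeps no state of its own, the same `RandomAgent` instance can be asked for a move in the real game or inside any sampled deal.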
Insights:
We don’t yet have any concrete results to show for the deep model. However, as mentioned above, we have a baseline implementation of the game, as well as an AI agent that beats naive (random) players with statistical significance over the course of dozens of games.
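One way such a significance claim can be checked is with a one-sided exact binomial test on the number of games won. The sketch below is only an illustration of that test, not the procedure actually used here: the function name `binomial_p_value` is hypothetical, the chance-level win rate `p0 = 0.5` is an assumption (two partnerships of equal strength), and the win counts are made-up numbers rather than measured results.

```python
from math import comb


def binomial_p_value(wins, games, p0):
    """Return P(X >= wins) for X ~ Binomial(games, p0).

    This is the one-sided p-value for the null hypothesis that the
    agent's true win rate is no better than p0.
    """
    return sum(
        comb(games, k) * p0**k * (1 - p0) ** (games - k)
        for k in range(wins, games + 1)
    )


# Illustrative numbers only (not measured results): 30 wins in 40 games.
p_value = binomial_p_value(30, 40, 0.5)
significant = p_value < 0.05  # conventional 5% threshold, also an assumption
```

With only dozens of games, an exact test like this is a safer choice than a normal approximation.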
Plan:
We’re on track with the project. Our current priority is to learn more about the model we have chosen, DeepStack. We’re finalizing our understanding of it and will start implementing the deep model tomorrow. Since the AI models we’ll use as a baseline are done, the only remaining milestone is to implement and train the Deep Learning model.