Getting Started
We started by discussing the rulesets. From there, we debated whether the bot should be a trained model or something more like Stockfish, and after stepping away to play the game and coming back, we concluded that a Stockfish-style engine would be better. In short, we talked about minimax pretty much the entire time. I also asked ChatGPT, which agreed that training a model wouldn't be as good as just using a regular chess-engine approach like Stockfish. Finally, we set up a running game and randomly paired players to create our testing environment.
Coming Up with Basic Rewards
Rewards
- 3 in a row: +INF
- If 2 pieces are ever touching: +1
Punish
- Losing: -INF
Version 1
We got a simple model going: all it did was run a minimax function for the placement stage of the game, recursively searching the game tree. The evaluate function checked whether two of our pieces were next to each other and gave a positive reward when they were. This model went 213 Wins / 86 Losses / 1 Tie.
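The placement-phase search can be sketched roughly like this. This is a minimal illustration, not the actual contest code: the board size, the grid representation, and the helper names are all assumptions.

```python
from typing import List, Optional

N = 5  # hypothetical board size
Board = List[List[Optional[int]]]  # None = empty, 0/1 = player id

def touching_pairs(board: Board, player: int) -> int:
    """Count adjacent (including diagonal) pairs of `player` pieces.

    Only forward directions are scanned, so each pair is counted once."""
    pairs = 0
    for r in range(N):
        for c in range(N):
            if board[r][c] != player:
                continue
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < N and 0 <= nc < N and board[nr][nc] == player:
                    pairs += 1
    return pairs

def evaluate(board: Board, player: int) -> float:
    # Version 1 reward: +1 for every pair of our own pieces that touch
    return touching_pairs(board, player)

def minimax(board: Board, depth: int, maximizing: bool, player: int) -> float:
    """Recursively try every placement, alternating who places a piece."""
    if depth == 0:
        return evaluate(board, player)
    mover = player if maximizing else 1 - player
    best = float("-inf") if maximizing else float("inf")
    for r in range(N):
        for c in range(N):
            if board[r][c] is None:
                board[r][c] = mover          # place, recurse, undo
                score = minimax(board, depth - 1, not maximizing, player)
                board[r][c] = None
                best = max(best, score) if maximizing else min(best, score)
    return best
```

With two adjacent pieces already down, a depth-1 search finds the square that touches both of them, since that placement creates the most new pairs.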
Version 2
Version 2 implemented the other half of the algorithm. The first version only handled the placement phase of the game; once all the pieces are placed, a new set of rules is needed for the next phase, where you pick up a piece and move it somewhere else. Version 2 added that functionality. It was only slightly better than Version 1 in results, but it now played the full game as intended, going 214 Wins / 86 Losses / 0 Ties.
Rewards
- 3 in a row: +INF
- If 2 pieces are ever touching: +3
- Placing in the center: +0.5
- Breaking up enemy pieces: + (FREE)
- Avoiding more than 4 pieces in a 3x3 grid (don't want too many neighbors): +0.1
- WIN CONDITION: no DOT space of any kind!
Punish
- Losing: -INF
- Making 2 previously touching pieces no longer touch: -0.3
- Edges: -0.3
- Corners: -1
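One way the reward table above could be folded into a single evaluation is a weighted feature sum. The weights below come straight from the table; the feature names and the idea of passing pre-counted features are my own sketch, not the contest implementation.

```python
# Sketch: turn the reward/punish table into one weighted score.
INF = float("inf")

WEIGHTS = {
    "three_in_a_row": INF,   # win
    "touching_pair": 3.0,
    "center": 0.5,
    "crowded_3x3": 0.1,      # reward avoiding >4 pieces in a 3x3 area
    "broke_own_pair": -0.3,  # two touching pieces no longer touch
    "edge": -0.3,
    "corner": -1.0,
    "loss": -INF,
}

def evaluate_features(features: dict) -> float:
    """Weighted sum of feature counts, e.g. {"touching_pair": 2, "edge": 1}.

    Terminal outcomes short-circuit so INF never mixes into arithmetic."""
    if features.get("three_in_a_row"):
        return INF
    if features.get("loss"):
        return -INF
    return sum(WEIGHTS[name] * count for name, count in features.items())
```

For example, two touching pairs and one edge placement would score 2 * 3.0 - 0.3 = 5.7 under these weights.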
Version 3
Version 3 was the first version where everything came together, producing our strongest model yet. We fixed many of the major bugs that had been preventing the bot from working properly. I updated the judge bots to swap which side each player starts on and to collect more analytics in that scenario, which was a lot of fun and made testing the bot much more reliable. I also added conditions to discourage placing pieces near the edge and to prioritize pieces in open spaces. This model went 258 Wins / 42 Losses / 0 Ties.
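The side-swapping judge could look something like the sketch below. The `play_game(first, second)` interface, returning the index of the winning seat (or `None` for a tie), is an assumption; the point is that each pairing alternates who moves first so first-move advantage cancels out of the totals.

```python
def judge(bot_a, bot_b, games, play_game):
    """Play `games` matches, swapping which bot moves first each game."""
    wins = {"A": 0, "B": 0, "tie": 0}
    for g in range(games):
        # even games: A moves first; odd games: B moves first
        first, second = (bot_a, bot_b) if g % 2 == 0 else (bot_b, bot_a)
        result = play_game(first, second)  # 0 = first seat won, 1 = second, None = tie
        if result is None:
            wins["tie"] += 1
        elif (result == 0) == (g % 2 == 0):
            wins["A"] += 1
        else:
            wins["B"] += 1
    return wins
```

A nice sanity check: against a stub `play_game` where the first mover always wins, the tally comes out even, confirming the swap removes seat bias.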
Rewards
- Bank Square: +1 per surrounding friendly piece | +0.5 per surrounding enemy piece
Punish
- Empty square: -2
Version 4
In Version 4, we fixed a bunch of bugs, most notably the evaluate function not detecting when a player wins, along with issues in the alpha-beta functions. After these fixes, the bot's performance improved significantly: it beat the random bots by a wide margin and even beat Version 3, going 298 Wins / 2 Losses / 0 Ties. One improvement I would suggest is a randomized start seed, so we have a randomized baseline for comparing different bots.
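For reference, the pruning those alpha/beta bounds implement looks like this generic sketch, run here over a toy game tree (nested lists with numeric leaves) rather than real board states:

```python
def alphabeta(node, alpha, beta, maximizing):
    """Minimax with alpha-beta pruning over a nested-list game tree."""
    if isinstance(node, (int, float)):    # leaf: terminal evaluation
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:             # beta cutoff: opponent avoids this line
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:                 # alpha cutoff
            break
    return value
```

Pruning returns exactly the same value as plain minimax; the cutoffs only skip branches that provably cannot change the result, which is why a bug here silently weakens the search rather than crashing it.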
Version 5
Version 5 brought significant improvements over Version 4. I fixed a bug in how the search handled games that had already ended, so the program is now faster and significantly more accurate, potentially enough to beat our new rivals from another team who used Stockfish. I believe we can beat them, but more testing and tuning are required. Version 5 went 184 Wins / 113 Losses / 3 Ties against Version 4.
Version 6
Version 6 was essentially an upgrade of Version 5: I took what I learned from Version 5 and applied it. I also made the search switch phases after the eighth move, once all the players' pieces have been placed, so it starts minimaxing over moving pieces instead of placing them. Version 6 went 199 Wins / 101 Losses / 0 Ties against Version 3.
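That phase switch amounts to changing what counts as a legal move once placement is done. A minimal sketch, assuming a flat-list board, four pieces per side (so eight placements total), and hypothetical helpers:

```python
PIECES_PER_SIDE = 4  # assumption: 8 placements total before phase 2

def empty_squares(board):
    return [i for i, v in enumerate(board) if v is None]

def squares_of(board, player):
    return [i for i, v in enumerate(board) if v == player]

def legal_moves(board, player, moves_played):
    """Placement phase: drop on any empty square.
    Movement phase: lift one of our pieces and put it on any empty square."""
    if moves_played < 2 * PIECES_PER_SIDE:
        return [("place", sq) for sq in empty_squares(board)]
    return [("move", src, dst)
            for src in squares_of(board, player)
            for dst in empty_squares(board)]
```

The same minimax loop can then iterate `legal_moves(...)` in both phases, so only move generation changes at the eighth move, not the search itself.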
Version 7
Version 7 brought the model up to par with Version 6 while running about twice as fast: using bitmasking, I was able to massively increase speed, though it took hours of work. In the end, this model scored 5-6% worse than Version 6, so Version 6 is the one that will be used for the competition. Given more time, I would add hardcoded, precomputed openings to help the bot get ahead.
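The core idea behind the bitmasking is to pack the whole board into one integer, one bit per square, so occupancy tests and piece counts become single bitwise operations instead of list scans. A small sketch (the board size is an assumption, and this is the general technique rather than the contest code):

```python
N = 5                      # hypothetical 5x5 board
FULL = (1 << (N * N)) - 1  # mask covering every square

def bit(r, c):
    """The single-bit mask for square (r, c)."""
    return 1 << (r * N + c)

def place(occ, r, c):
    """Return a new occupancy with a piece added at (r, c)."""
    return occ | bit(r, c)

def is_occupied(occ, r, c):
    return occ & bit(r, c) != 0

def popcount(occ):
    """Number of pieces on the board."""
    return bin(occ).count("1")
```

Because a position is just an int, copying it for a recursive search is free, which is where most of the 2x speedup in a minimax loop tends to come from.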