Title:
SLM (Scrabble Learning Model)
Who:
Morgan Lo (mlo5), WaTae Mickey (wmickey), Moise Gasana (mgasana), Ben Kang (bhkang).
Introduction:
Online, there is a small but dedicated community that follows the professional Scrabble scene. In any given year, there are likely 15-25 professional Scrabble tournaments streamed on Twitch and YouTube. Each stream covers the top board using multiple camera feeds, the main one being a top-down view of the Scrabble board.
These livestreams include broadcasters, typically Scrabble Grandmasters, who evaluate positions and possible plays, often with the help of Scrabble evaluation engines. Unfortunately, because tournament boards are not digitized the way professional chess boards are, these commentators (as well as viewers) must manually enter the board position into these engines themselves. This is tedious, since a full Scrabble game involves placing up to one hundred letter tiles.
Our project's goal is to leverage both computer vision and convolutional neural networks to convert a Scrabble board from a livestream screenshot into programmatically useful data, which could then be used for a variety of applications, such as computer analysis to return best moves, a website to display digitized live game results, or a continuous analysis bot that processes the frames from a livestream itself.
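As an illustration of what "programmatically useful data" could look like, here is a minimal sketch that stores the board as a 15x15 grid of characters and reads the words back out. The function names and the "." marker for an empty square are our own assumptions for this example, not a settled format.

```python
BOARD_SIZE = 15  # a standard Scrabble board is 15x15 squares
EMPTY = "."      # assumed marker for an empty square


def empty_board():
    """Return a 15x15 grid filled with the empty marker."""
    return [[EMPTY] * BOARD_SIZE for _ in range(BOARD_SIZE)]


def place_word(board, word, row, col, across=True):
    """Write a word onto the grid, left-to-right or top-to-bottom."""
    for i, letter in enumerate(word.upper()):
        r, c = (row, col + i) if across else (row + i, col)
        board[r][c] = letter


def board_to_text(board):
    """Serialize the grid as 15 lines of 15 characters."""
    return "\n".join("".join(row) for row in board)


def find_words(board):
    """List every run of two or more letters, rows first, then columns."""
    lines = ["".join(row) for row in board]
    lines += ["".join(col) for col in zip(*board)]
    return [run for line in lines for run in line.split(EMPTY) if len(run) >= 2]


board = empty_board()
place_word(board, "HELLO", 7, 5)              # across, through the center row
place_word(board, "LOW", 7, 8, across=False)  # down, sharing the second L
print(board_to_text(board))
print(find_words(board))
```

A plain text grid like this is easy to hand to an analysis engine or a website, which is the main reason we would consider it over something more elaborate.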
The implementation details are still unsettled, but our preliminary plan (subject to change) is to use OpenCV to detect the board and locate its corners, slice the board into images of individual squares, and then apply a CNN to classify each one. As for a dataset, we intend to train the CNN with transfer learning on a printed character dataset (we have not yet chosen which one), ideally one with fonts similar to those used on Scrabble tiles.
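The slicing step can be sketched without any image library. The example below assumes the board has already been rectified into a square, top-down image (for example by a perspective warp from the four detected corners) and that the playing grid fills that image exactly; both are assumptions, and `cell_boxes` and `crop` are hypothetical helper names.

```python
def cell_boxes(board_px, grid=15):
    """Map (row, col) to an (x0, y0, x1, y1) pixel box on a square board image.

    Assumes the image is board_px by board_px pixels and the grid fills it.
    """
    # Grid-line positions, rounded so boxes tile the image with no gaps.
    edges = [round(i * board_px / grid) for i in range(grid + 1)]
    return {
        (r, c): (edges[c], edges[r], edges[c + 1], edges[r + 1])
        for r in range(grid)
        for c in range(grid)
    }


def crop(image, box):
    """Cut one box out of an image stored as a list of pixel rows."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


# Tiny stand-in image: 30x30 "pixels", so each of the 15x15 cells is 2x2.
image = [[y * 30 + x for x in range(30)] for y in range(30)]
boxes = cell_boxes(30)
tile = crop(image, boxes[(1, 2)])  # second row, third column
print(len(boxes), tile)
```

Each cropped cell would then be resized and passed to the classifier. In practice the boxes would probably need a small inward margin so that grid lines do not bleed into the tile image.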
Related Work:
Project members were not aware of any prior work before proposing this project; however, after researching existing implementations, we found three:
An implementation by a software dev agency for Mattel, the makers of Scrabble, to develop an app that counts the points on a Scrabble board (not open source, but there is a page on it here): https://hookbang.com/portfolio/scrabblevision/
An implementation for the default Scrabble board that uses OpenCV with k-nearest neighbor (not relevant to us), but which contains some pertinent details about the finer points of classification: https://github.com/jheidel/scrabble-opencv
An implementation for point counting only, limited to one specific board and image, with OpenCV and Keras: https://github.com/piotr-walen/Scrabble-Detector
Our implementation will differ from all three in that we expect to run our program on a video feed rather than static images, which presents additional challenges related to image orientation, and in the specific technique we intend to use.
Data:
For our project, we will employ a Printed Character Dataset to train our convolutional neural network (CNN) for recognizing individual Scrabble tiles from screenshots of livestream footage. The chosen dataset will need to include a variety of fonts that closely resemble those used on Scrabble tiles, so that the model transfers well to real boards. We anticipate the need for significant data preprocessing, such as resizing images for uniformity, augmenting the dataset to improve generalization (via transformations like rotation, scaling, and noise addition), and potentially manually annotating a subset of tile images cropped from screenshots of actual Scrabble games to address any discrepancies between standard printed characters and the fonts used on Scrabble tiles.
We aim to compile a sufficiently large dataset to train a robust model, estimating thousands of labeled images across various font styles. To streamline the collection and labeling process, we will explore using automated scripts to extract and label images from video streams of Scrabble tournaments, followed by manual corrections to ensure accuracy.
Methodology:
Model Architecture
Our model will utilize a convolutional neural network (CNN) architecture, which is well-suited for image classification tasks. Specifically, we plan to leverage a pre-trained model such as VGG16 or ResNet, employing transfer learning to adapt it to our specific task of recognizing Scrabble tiles. This approach allows us to benefit from the pre-trained model's ability to detect general features and fine-tune it to specialize in the peculiarities of Scrabble tile fonts.
Training the Model
The CNN will be trained using a combination of real and augmented data to enhance its ability to generalize across different game conditions and tile orientations. We will use a standard supervised learning approach, where the model learns from labeled data (tile images labeled with their corresponding letters). Training will involve typical steps such as dividing the data into training and validation sets, using batch processing, and applying optimizers like Adam or SGD to minimize the classification error. The model's performance will be evaluated using accuracy metrics and a confusion matrix to understand specific areas of strength and weakness in tile recognition.
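The evaluation side of this plan can be sketched independently of the network itself. The snippet below is a minimal illustration in plain Python, with made-up labels; the function names, the 80/20 split, and the sample predictions are assumptions for the example, not results.

```python
import random
from collections import Counter


def train_val_split(samples, val_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and hold out a validation fraction."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def accuracy(true_labels, predicted_labels):
    """Fraction of tiles whose predicted letter matches the true letter."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def confusion_counts(true_labels, predicted_labels):
    """Count each (true letter, predicted letter) pair."""
    return Counter(zip(true_labels, predicted_labels))


def most_confused(counts, top=3):
    """Return the most frequent misclassified (true, predicted) pairs."""
    errors = [(pair, n) for pair, n in counts.items() if pair[0] != pair[1]]
    return sorted(errors, key=lambda item: -item[1])[:top]


# Made-up labels for illustration only.
true_labels = ["O", "O", "Q", "Q", "I", "I"]
predicted = ["O", "Q", "Q", "Q", "I", "L"]
counts = confusion_counts(true_labels, predicted)
print(accuracy(true_labels, predicted))
print(most_confused(counts))
```

Listing the most confused pairs is the part we expect to be most useful, since visually similar letters (such as O and Q) are the likely weak spots.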
Implementation Challenges and Backup Plans
Given the dynamic and varied nature of video data, particularly from different lighting conditions and camera angles in livestreams, one of the main challenges will be ensuring the model is robust enough to handle these variations. Additionally, the segmentation of individual tiles from a complex, moving video frame will require precise and efficient image processing techniques.
If initial results are not satisfactory, we might consider alternative approaches such as:
Enhancing the data preprocessing step to include more sophisticated image normalization and augmentation techniques.
Exploring different CNN architectures that may be more adept at handling the specific variations in our data.
Implementing a more complex system that first detects the entire board and its geometry before focusing on individual tiles, potentially using techniques from geometric deep learning.
Metrics:
For our model, success means taking a picture of a Scrabble board and accurately capturing the board state in a form usable by websites or more in-depth models. Accuracy needs to be very high: for the model to be used in competitions, there must be little to no error in word and letter recognition. We will assess our model on individual letters to confirm that it can recognize every possible Scrabble tile. It will need to read each individual tile correctly even on a cluttered board. The base goal is to recognize tiles with decent accuracy (90%) on large boards with many tiles. The target goal is to accurately (95%+) recognize all board states regardless of clutter, given a clear image of the board. A stretch goal is additional functionality, such as returning a list of possible moves sorted by points.
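To make the link between per-tile accuracy and whole-board accuracy concrete, here is a rough sketch. It assumes tile errors are independent of one another, which is a simplification (bad lighting would likely affect many tiles at once), and the function names are our own.

```python
def tile_accuracy(predicted, truth, empty="."):
    """Fraction of occupied squares read correctly.

    Boards are sequences of equal-length rows. A square is scored if either
    the prediction or the ground truth has a tile on it.
    """
    scored = correct = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            if p == empty and t == empty:
                continue  # both agree the square is empty; not scored
            scored += 1
            correct += p == t
    return correct / scored if scored else 1.0


def exact_board_probability(per_tile_accuracy, tiles_on_board):
    """Chance that every tile is read correctly, assuming independent errors."""
    return per_tile_accuracy ** tiles_on_board


def expected_errors(per_tile_accuracy, tiles_on_board):
    """Average number of misread tiles on a board."""
    return (1 - per_tile_accuracy) * tiles_on_board


truth = ["CAT", ".O.", ".P."]
predicted = ["CAT", ".Q.", ".P."]  # one tile misread
print(tile_accuracy(predicted, truth))

for acc in (0.90, 0.95, 0.999):
    print(acc, exact_board_probability(acc, 100), expected_errors(acc, 100))
```

Under the independence assumption, both 90% and 95% per-tile accuracy leave less than a 1% chance of reading a full 100-tile board with no mistakes, which is why we treat those figures as per-tile goals and why some human correction step may still be needed.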
Ethics:
What broader societal issues are relevant to your chosen problem space?
Privacy and Consent: Our project involves processing livestreams, so an ethical consideration is the privacy of the players and their consent to have their game analyzed. It would be important to ensure that all players are aware of and consent to the use of AI in analyzing their gameplay. Furthermore, the data derived from these videos, such as player strategies or frequently made moves, should respect player privacy.
Bias and Fairness: AI systems can inadvertently learn and perpetuate biases present in their training data, so, in the context of our project, the system could favor certain strategies or plays over others if the dataset is not diverse in terms of strategy or game style. Ensuring the dataset reflects a wide range of playing styles and levels is essential.
Accessibility: Our project has the potential to make Scrabble more accessible to people who might otherwise not engage with it, so it is important to ensure that this would be equally accessible to a wide variety of people.
Why is Deep Learning a good approach to this problem?
Deep Learning is a good approach to this problem for a variety of reasons.
Pattern Recognition: Deep Learning models are exceptionally good at recognizing complex patterns in images and videos, which makes them well-suited for identifying Scrabble tiles and their arrangements on the board.
Transfer Learning: The use of pre-trained models allows us to significantly reduce the amount of Scrabble-specific data needed to train our model.
Real-Time Processing: Deep learning models can analyze and interpret data in real-time, so it would be ideal for live-streamed content.
Adaptability: Deep learning models can be continuously improved with new data, so as more games are analyzed, the model can learn from the new data and become more accurate.
Division of labor:
Data preprocessing - Moise
Corner detection and image splitting - WaTae
Training - Ben
Optimizing - Morgan