Summary
I did this as my final project in MATH 725 (The Mathematics of Data and Networks I). I have always enjoyed playing and watching soccer, and my dream job is doing data analysis within a soccer club. For this project I created a GUI in C# so that I could easily collect passing data from a match. The GUI is very simple: for each of the 11 players it includes buttons to record that the player made a pass, received a pass (enabled once a pass has been recorded), took a shot on target, took a shot off target, or turned the ball over. The information is stored in a simple array and written to a file. I used the GUI to log the passing data from the first half of Chelsea vs Manchester United in the Premier League on November 28, 2021. The data I collected is in the file 'Chelsea11-28.csv', which is used in the main file. There is also an 'Example.csv' file, in which I made up some numbers for Paris Saint Germain so that the simulated game would be a little more interesting.
I wrote the match simulator in Python, as the libraries NetworkX and Pandas let me easily create the graph, run the weighted random walks, and process the data. I created methods that generate the graph from the file, make a single play (take one step in the random walk), simulate a single possession (make single plays until a turnover or shot occurs), visualize and print the possessions, and finally simulate an entire game (simulate a fixed number of possessions per team). The weights in the random walk are based on the number of times each event happened in the data. For a very simple example, assume Messi, Ronaldo, and Neymar are on the same team. Messi passes to Ronaldo 2 times and to Neymar 3 times. For simplicity's sake, assume Messi does nothing else according to the data. In this example, when Messi is in possession of the ball, he passes to Neymar with a probability of 3/5 (60%) and to Ronaldo with a probability of 2/5 (40%).
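To make the weighted walk concrete, here is a minimal sketch of the idea rather than the project's actual code. It uses plain dictionaries and the standard library instead of NetworkX so it runs without extra dependencies. The function names, the "SHOT" and "TURNOVER" end states, and every count other than Messi's two from the example above are made up for illustration.

```python
import random

# Assumed end states: a possession stops when the walk reaches one of these.
TERMINAL = {"SHOT", "TURNOVER"}

# counts[a][b] = number of times a's action went to b.
# Messi's counts come from the example above; the rest are invented.
counts = {
    "Messi": {"Ronaldo": 2, "Neymar": 3},
    "Ronaldo": {"Messi": 1, "SHOT": 1},
    "Neymar": {"Messi": 1, "TURNOVER": 1},
}


def transition_probabilities(counts, player):
    """Turn a player's raw counts into probabilities that sum to 1."""
    total = sum(counts[player].values())
    return {target: n / total for target, n in counts[player].items()}


def make_play(counts, player, rng):
    """Take one step of the walk, weighted by the observed counts."""
    targets = list(counts[player])
    weights = [counts[player][t] for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]


def simulate_possession(counts, start, rng, max_steps=100):
    """Make single plays until a shot or turnover (or a safety cap) is hit."""
    path = [start]
    while path[-1] not in TERMINAL and len(path) <= max_steps:
        path.append(make_play(counts, path[-1], rng))
    return path


rng = random.Random(0)  # seeded so repeated runs give the same walk
probs = transition_probabilities(counts, "Messi")
path = simulate_possession(counts, "Messi", rng)
print(probs)
print(" -> ".join(path))
```

Simulating a whole game would then amount to calling `simulate_possession` a fixed number of times per team and counting the possessions that end in a shot.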
I do not think that this would accurately simulate a match, as many more factors play into one, but I would be interested to see how well it performs with more data. If you are interested in looking at the code, the GitHub repository is linked below.
