In soccer, multiple options are available to each player at a given moment. The decisions that a player makes can increase or decrease his or her team’s chances of victory. A player will often take a low-likelihood shot, dribble into trouble, or pass when a shot may be the better choice, to the frustration of themselves, their teammates, and the fans. Given this, we set out to understand an individual player’s decision making in the final third.

What it does

The Player Decision Projection portion of our project models each player’s decision making process, using a Gaussian process classifier built on the spatial location of his or her attacking actions. To visualize that decision-making, we create maps displaying the likelihood of a player deciding to pass, shoot, or dribble in their attacking 40 yards. This information, combined with video, holds potential for opposition scouting, as a way to evaluate the types of decisions attacking players might make, and to gameplan effectively against them. It’s also useful as a coaching tool, to understand and improve the decisions of players on a coach’s own team.

Complementing the decision projection modeling, our xShot model estimates the likelihood that a possession ending within 40 yards of goal will end in a shot, along with the xG associated with each possession. This lets us evaluate both a player’s propensity to shoot and the number of goals a player scores at the end of a possession.

The third piece of our project employed causal estimation to infer the value of the opportunities passed up when a player takes a shot. Many players are inefficient in their decision-making, and this estimation is an attempt to account for those inefficiencies.

How we built it

The Gaussian process models were tuned for multiclass classification on coordinate data from the 2011 MLS season. A model was built per player to predict whether his next action would be to pass, shoot, or dribble while on the ball in the attacking 40 yards, and an optimal kernel was selected by comparing the mean micro-average AUC score across all players. The modeling approach was then validated on 2018 data against a naive baseline to ensure its value. To get per-player plots, a model was trained on the available data for a given year and then scored and plotted across a grid.
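The kernel selection metric described above can be sketched in plain Python. This is a minimal illustration, not our actual pipeline: the function names, the class ordering in `CLASSES`, and the input layout (a list of true actions alongside a list of per-class probabilities for each player) are all assumptions made for the example. Micro-averaging here means flattening every one-vs-rest (label, score) pair into a single binary problem before computing AUC.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed class order for the probability vectors (hypothetical convention).
CLASSES = ("pass", "shoot", "dribble")


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the rank-sum (Mann-Whitney U) form; tied scores share an average rank."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_statistic = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


def micro_average_auc(true_actions: Sequence[str],
                      predicted_probs: Sequence[Sequence[float]]) -> float:
    """Flatten one-vs-rest indicators and scores, then compute a single AUC."""
    flat_labels: List[int] = []
    flat_scores: List[float] = []
    for action, probs in zip(true_actions, predicted_probs):
        for cls_index, cls in enumerate(CLASSES):
            flat_labels.append(1 if action == cls else 0)
            flat_scores.append(probs[cls_index])
    return binary_auc(flat_labels, flat_scores)


def mean_micro_auc(per_player: Dict[str, Tuple[Sequence[str], Sequence[Sequence[float]]]]) -> float:
    """Average the micro-average AUC over players; one value per candidate kernel."""
    scores = [micro_average_auc(actions, probs) for actions, probs in per_player.values()]
    return sum(scores) / len(scores)
```

Under this sketch, each candidate kernel would produce one `mean_micro_auc` value from held-out predictions, and the kernel with the highest value would be kept.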

For the xShot modeling, we developed a simplified expected goals model and an algorithm to chain individual events to possessions using the 2011-2013 MLS seasons as the training set. The characteristics of each possession ending within 40 yards of goal were summarized and a generalized linear model for shots and xG per possession (xGPoss) was developed based upon spatial, temporal, and other possession characteristics. The model was then applied to possessions for the other competitions and compared for individual players and teams.
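As a rough sketch of the event-chaining idea, the code below groups a time-ordered event stream into possessions. The `Event` fields, the event kind names, the coordinate convention (`x` as yards from the attacking goal line), and the rule that a possession ends on a shot, a stoppage, or a change of team are assumptions for illustration; the real algorithm handles more event types and edge cases.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    team: str
    kind: str        # hypothetical labels: "pass", "take_on", "shot", "foul", "out"
    x: float         # assumed: yards from the goal line being attacked
    xg: float = 0.0  # only meaningful for shots


@dataclass
class Possession:
    team: str
    events: List[Event] = field(default_factory=list)

    @property
    def ends_in_shot(self) -> bool:
        return self.events[-1].kind == "shot"

    @property
    def ends_within_40(self) -> bool:
        return self.events[-1].x <= 40.0


# Event kinds that close a possession even if the same team acts next (assumed set).
CLOSING_KINDS = {"shot", "foul", "out"}


def chain_possessions(events: List[Event]) -> List[Possession]:
    """Chain time-ordered events into possessions, one team per possession."""
    possessions: List[Possession] = []
    current = None
    for ev in events:
        # A change of team (turnover) or a previously closed possession starts a new one.
        if current is None or ev.team != current.team:
            current = Possession(team=ev.team)
            possessions.append(current)
        current.events.append(ev)
        if ev.kind in CLOSING_KINDS:
            current = None
    return possessions


def final_third_possessions(possessions: List[Possession]) -> List[Possession]:
    """Keep only possessions whose last event is within 40 yards of goal."""
    return [p for p in possessions if p.ends_within_40]
```

The filtered possessions, summarized by their spatial and temporal characteristics, are what a shot or xGPoss model would then be fit on.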

For causal estimation, we followed the strategy employed by Yam and Lopez (2018) to infer the value lost by conservative play calling in NFL fourth-down situations. We used the xG model discussed above, which incorporates factors such as distance from goal and the amount of goal mouth available, to assess the value of shots taken. We then built a propensity matching model to group possessions by their similarity. This incorporated covariates including the possession’s starting point, path, and speed of play. To assess the value of the alternative possessions that did not end in a shot, we used the xGPoss model noted above. The last step combined these three pieces: we compared the xG value of each shot to the ball progression xG values derived from the 20 possessions most similar to that shot. The difference between those two is the value added (or subtracted) by taking that shot.
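The final comparison step can be sketched as follows. This is an illustration under stated assumptions rather than our exact implementation: it assumes similarity is measured as the absolute difference between scalar propensity scores, that the 20 matches are drawn from possessions that did not end in a shot, and that the baseline is their simple mean xGPoss. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    propensity: float  # assumed: score from the propensity model for this possession
    xg: float          # xG of the shot actually taken


@dataclass
class NonShotPossession:
    propensity: float
    xg_poss: float     # xGPoss value of continuing the possession


def shot_value_added(shot: Shot, pool: List[NonShotPossession], k: int = 20) -> float:
    """xG of the shot minus the mean xGPoss of its k most similar possessions."""
    if not pool:
        raise ValueError("need at least one comparison possession")
    # Nearest neighbours by propensity-score distance (assumed similarity measure).
    nearest = sorted(pool, key=lambda p: abs(p.propensity - shot.propensity))[:k]
    baseline = sum(p.xg_poss for p in nearest) / len(nearest)
    return shot.xg - baseline
```

A positive result means the shot was worth more than the comparable possessions that kept the ball moving; a negative result means the player gave up value by shooting.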

We built a demo frontend with Bootstrap to browse a sample of the data and visualize the Decision Projection.

Challenges we ran into

The most difficult piece of the decision-making modeling was the end product - preparing visually appealing plots that were informative and still easy to comprehend. Computationally, creating a model and plots for every player is also time intensive, so we had to do that smartly.

Accomplishments that we're proud of

Our results hit the sweet spot of being interesting and informative. They land in the nice 80-20 ratio of insights that confirm intuition to more novel insights that contradict it, and this makes it fascinating to pore through. Constructing a project that quickly captures that curiosity is difficult to do in a few weeks, much less a few hours.

We are also proud of our use of causal estimation. Causal methods are difficult to learn and difficult to employ appropriately, and as a result they are underused in sports analytics in spite of their power. We’re pretty happy with our ability to pick up that area and apply it to soccer.

What we learned

This project was enlightening in a variety of ways. On the Gaussian process heatmaps, we learned that most players seem to be playing efficiently. They pass when outside the box and shoot when inside, give or take a few feet for each individual.

The xShot model illuminated the difference between target players who hold the ball and distribute, versus those that are closer to traditional strikers, taking shots themselves.

A nice example of the two parts, the Gaussian decision model and the xShot model, coming together is Michael Bradley’s 2018 season. Bradley scored a long goal vs Azteca midway through the season, and the difference in his play before and after this goal is readily apparent. Before the Azteca goal, Bradley doesn’t shoot as often, especially not from distance. Afterwards his heatmap lights up, as does his xShots value, as he lets loose on more shots.

What's next for Modeling decision making in the final third

On the Gaussian process side, a lot of low-hanging fruit exists that would both improve the models and make them more interesting. Using a Gaussian process makes the models flexible, so it’d be worthwhile to incorporate additional covariates on things like game state and the opposition team. The ability to see how a player’s decision making changes when his or her team is trailing or leading would be fascinating. Additionally, making the model hierarchical, while computationally intensive, would be a great next step. For many players, especially those without much data, the model isn’t especially good, so borrowing modeling strength from similar players is likely worthwhile for improving the tool’s usefulness for opposition scouting.

With all measured offensive actions chained - by possession - into either a shot (with an associated xG value), interruption in play (foul, out, or loose ball), or turnover - it would be a useful next step to connect the value of the result to the preceding actions. It would then be straightforward to assess a general value of actions (take-ons, pass types) leading to a shot. For example: Are successful take-ons associated with more dangerous shots (higher xG)? Does that increased goal danger justify the increased risk of losing the ball (either from a failed take-on or from extending possession near goal)? How is the likelihood of scoring increased or decreased by extended possession (by time or aggregate distance the ball travels between actions)?

These measurements could, in turn, be used to test the player behavioral model in greater depth.
