Machine Learning Model Building & Opta Soccer Data Analysis - Team: 34-Hours-No-Sleep

Objective:

Back-End
Develop a new model for evaluating team performance based on team composition (team average rating and team average age), the competitor, the home-away factor, rest days, and a few other features.
Develop a team match-style strategy model. It takes 77 match event types (such as pass and corner awarded) as features and identifies which features matter most for excellent team performance. Using this model, a coach can build a well-founded, data-driven match strategy.

Front-End
A web app for evaluating team performance and comparing team playing styles based on the FIFA and OPTA data.
Publish the website on AWS (the AWS server is already set up).
Current Project Description:

Machine Learning Part:

Data Analysis: The data fall into two categories: the rating dataset for individual players provided by FIFA 18, and the Opta dataset, which contains details of the events that happened in a single match. We decided to work at the team level rather than the individual level, and chose two goals:
“Predict win, draw, or loss for an upcoming match between two teams”
“Find the factors that most influence the game result”
Data Processing: This procedure can be divided into two steps: “convert individual data into team data” and “split data into history features and on-site features”.
Convert individual data into team data: Using the FIFA 18, OPTA, USMNT and World Cup datasets, generate the features for the next step.
Split data into history features and on-site features: Both historical features and current features can influence the game result. Historical data may include playing style, statistics from previous games, rest days, etc. Current data can include the average rating and age composition of today’s lineup, as well as features describing the opponent.
Feature engineering and selection: We finally selected 84 features in total for the 301 games played by international teams. The features included:
0: game_id
1: date
2: homeOraway
3: home_team_id
4: avg rating
5: avg age
6: home performances list
7: away_team_id
8: away avg rating
9: away avg age
10: away performance list
11: Final class
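As a rough illustration of the "convert individual data into team data" step, the sketch below collapses a lineup of player records into a team average rating and average age, and computes rest days from two match dates. The record layout, field names, sample values, and helper names (`team_averages`, `rest_days`) are assumptions made for this example, not the project's actual code.

```python
from datetime import date
from statistics import mean

# Hypothetical player records for one lineup; field names are assumptions.
lineup = [
    {"name": "Player A", "rating": 80, "age": 24},
    {"name": "Player B", "rating": 76, "age": 29},
    {"name": "Player C", "rating": 84, "age": 31},
]


def team_averages(players):
    """Collapse individual player rows into (average rating, average age)."""
    avg_rating = mean(p["rating"] for p in players)
    avg_age = mean(p["age"] for p in players)
    return avg_rating, avg_age


def rest_days(previous_match, current_match):
    """Number of days between a team's previous match and the current one."""
    return (current_match - previous_match).days


avg_rating, avg_age = team_averages(lineup)
days_rested = rest_days(date(2018, 6, 14), date(2018, 6, 19))
print(avg_rating, avg_age, days_rested)
```

The same aggregation would be applied to both the home and the away lineup to fill the "avg rating" and "avg age" columns listed above.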
Final features we used for training model and predicting:
Current Day
0: rest date
1: homeOraway
2: self average rating
3: self average age
4: opposite average rating
5: opposite average age
6-83 (#6): home last game performances
84 (#7): Class (0: loss, 1: draw, 2: win)
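One possible way to assemble a training row in the layout listed above is sketched here. The count of 78 last-game values is taken from the index range 6-83; the function name, argument names, and sample numbers are illustrative assumptions.

```python
# Number of last-game performance values, taken from the index range 6-83.
N_EVENT_FEATURES = 78

# Class encoding as listed in the feature description.
LABELS = {0: "loss", 1: "draw", 2: "win"}


def build_feature_vector(rest_days, is_home, self_rating, self_age,
                         opp_rating, opp_age, last_game_events):
    """Return one 84-element feature row in the documented column order."""
    if len(last_game_events) != N_EVENT_FEATURES:
        raise ValueError(
            f"expected {N_EVENT_FEATURES} event values, got {len(last_game_events)}"
        )
    return [
        rest_days,      # 0: rest days since the previous match
        int(is_home),   # 1: home (1) or away (0)
        self_rating,    # 2: own team average rating
        self_age,       # 3: own team average age
        opp_rating,     # 4: opponent average rating
        opp_age,        # 5: opponent average age
        *last_game_events,  # 6-83: last game performance values
    ]


# Hypothetical sample row with all event counts set to zero.
row = build_feature_vector(5, True, 80.0, 28.0, 77.5, 26.4, [0] * N_EVENT_FEATURES)
print(len(row))
```

Validating the length up front keeps a malformed event list from silently shifting every later column.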
Model Selection & Development
We chose the following machine learning models. Because of the time limit we could not collect a large enough sample set, so deep learning would not suit our goal well. We therefore used traditional machine learning models:
Logistic Regression (three different solvers with L2 penalty)
SVM (linear, poly and rbf kernels; ovo vs ovr)
Random Forests
K Nearest Neighbours

Model Validation & Performance
In our results, the SVM with a linear kernel achieved the best overall performance.
Best: Three-Class Classification: Accuracy - 0.474
Here are some results:
Algorithm name: Sklearn SVM - Linear Kernel
F1-measure: 0.474 Accuracy: 0.47 Recall: 0.474 Precision: 0.474
Confusion matrix:
[[16  6 10]
 [ 8  8  3]
 [10  3 12]]

Algorithm name: Sklearn SVM - rbf Kernel
F1-measure: 0.421 Accuracy: 0.42 Recall: 0.421 Precision: 0.421
Confusion matrix:
[[32  0  0]
 [19  0  0]
 [25  0  0]]

Algorithm name: Sklearn SVM - poly Kernel
F1-measure: 0.368 Accuracy: 0.37 Recall: 0.368 Precision: 0.368
Confusion matrix:
[[ 9  3 20]
 [ 9  3  7]
 [ 7  2 16]]

Algorithm name: Sklearn Logistic Regression Model (newton-cg)
F1-measure: 0.408 Accuracy: 0.41 Recall: 0.408 Precision: 0.408
Confusion matrix:
[[16  5 11]
 [ 8  5  6]
 [11  4 10]]

Algorithm name: Sklearn Random Forests
F1-measure: 0.434 Accuracy: 0.43 Recall: 0.434 Precision: 0.434
Confusion matrix:
[[19  2 11]
 [11  2  6]
 [12  1 12]]

Algorithm name: Sklearn KNN
F1-measure: 0.421 Accuracy: 0.42 Recall: 0.421 Precision: 0.421
Confusion matrix:
[[32  0  0]
 [19  0  0]
 [25  0  0]]
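The reported scores can be rechecked directly from the confusion matrices. The sketch below assumes the scikit-learn convention that rows are true classes and columns are predicted classes; the write-up does not state the orientation, and the helper names are illustrative. The identical F1, recall and precision values are consistent with micro-averaging, under which all three equal accuracy for single-label classification, although the averaging mode is not stated either.

```python
def accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm):
    """Recall per class, assuming each row holds one true class."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


def predicted_counts(cm):
    """How many samples were assigned to each class (column sums)."""
    return [sum(row[j] for row in cm) for j in range(len(cm))]


# Matrices copied from the results above (class order: loss, draw, win).
linear = [[16, 6, 10], [8, 8, 3], [10, 3, 12]]
rbf = [[32, 0, 0], [19, 0, 0], [25, 0, 0]]

print(round(accuracy(linear), 3))
print(round(accuracy(rbf), 3))
print(predicted_counts(rbf))
print([round(r, 3) for r in per_class_recall(linear)])
```

Under this reading, the rbf SVM and KNN matrices show every test sample assigned to class 0, so their 0.421 accuracy is just the share of class 0 in the 76 test games.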
Website
We built the web application with native JavaScript, jQuery, jQuery UI, Ajax and the Bootstrap framework. We integrated the Google Translate API so that people all over the world can use our website.
The initial page of the website is shown in Figure 1. To supply the inputs the back-end calculation needs, the page provides drop-down lists for league, teams and the home-away factor, and input fields for match date, average team rating and average team age.
Figure 1 Initial website interface
After the user fills in the data and clicks predict, a bar chart and a pie chart are displayed; the interface is presented below. The bar chart shows the match factors that matter most to the result: the x value is the factor name and the y value is its importance. Moving the mouse over a bar shows a tooltip with the factor name and its exact value. Beside the bar chart, the pie chart shows the predicted result. For a regular match, the rates of win, loss and tie are displayed. If the match is a championship match, the rate of a tie is zero.
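The write-up does not say how the tie rate is forced to zero for a championship match. One simple possibility, shown purely as an assumption, is to drop the draw probability and rescale win and loss so they still sum to one. The function name and sample probabilities are hypothetical.

```python
def knockout_probabilities(p_loss, p_draw, p_win):
    """Set the draw probability to zero and renormalize loss and win.

    This is an assumed approach for illustration, not the project's
    documented method.
    """
    remaining = p_loss + p_win
    if remaining <= 0:
        raise ValueError("loss and win probabilities must not both be zero")
    return {
        "loss": p_loss / remaining,
        "draw": 0.0,
        "win": p_win / remaining,
    }


# Hypothetical three-class prediction for a regular match.
probs = knockout_probabilities(0.3, 0.2, 0.5)
print(probs)
```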
Figure 2 Prediction Page

Database
To make the data convenient to use and browse, we created a database with SQLite3. We first merged the data we wanted to use from the dataset files, then wrote a Python program to insert the data into the table.
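A minimal sketch of such an insert program, using Python's built-in `sqlite3` module, might look like the following. The table name, column set, and sample rows are assumptions chosen to mirror the feature list, and an in-memory database stands in for the project's actual database file.

```python
import sqlite3

# In-memory database for illustration; a real run would pass a file path.
conn = sqlite3.connect(":memory:")

# Hypothetical schema based on a few of the fields listed earlier.
conn.execute(
    """
    CREATE TABLE matches (
        game_id      INTEGER PRIMARY KEY,
        date         TEXT,
        home_team_id INTEGER,
        away_team_id INTEGER,
        final_class  INTEGER  -- 0: loss, 1: draw, 2: win
    )
    """
)

# Made-up sample rows standing in for the merged dataset.
rows = [
    (1, "2018-06-14", 10, 20, 2),
    (2, "2018-06-15", 30, 40, 1),
]

# The connection as a context manager commits on success, rolls back on error.
with conn:
    conn.executemany("INSERT INTO matches VALUES (?, ?, ?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM matches").fetchone()[0]
draws = conn.execute(
    "SELECT game_id FROM matches WHERE final_class = ?", (1,)
).fetchall()
print(count, draws)
```

Parameterized queries (the `?` placeholders) keep the insert safe even when values come from raw dataset files.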
Distribution server set-up
Drawing on our experience with large cluster set-ups, we provide a solid working environment with a reliable connection from terminal to server. We installed and set up the working environment for TensorFlow, Python and Flask so that our data analysts have a better experience analyzing the Opta soccer data.
Our project can be developed further and scales beyond US soccer to the whole soccer world: Because of the time limit we could not prepare all the training data. We implemented over 140 features from Opta and set up all the OPTA data (including MLS, NWSL, USMNT, USWNT, WC, WWC), but we were only able to build the module for MLS and the World Cup. With more time, we would focus on women’s football matches.
