The inspiration for this project was, first of all, to try using ML to solve a real-world problem. Football is an incredibly hard sport to predict given how fickle match outcomes are, so we tried to use ML to predict the score and which team won. Predictions like this are valuable in the sports industry because a huge number of people bet online.
What it does
The user chooses two teams, one home and one away; the program then runs and outputs the predicted winner along with the score.
How we built it
- We used pandas to clean the data: we mapped each team name to a number and dropped string columns such as the date and division.
- We encoded each match result as 1 if the home team won, -1 if the home team lost, and 0 for a draw.
- We built the input and output matrices and trained a random forest classifier on them.
- We wrote functions that take the inputs and output the predictions we wanted (the goals scored and who won).
- We built the user interface as a web form using HTML and CSS: the user selects the two teams, and those values are passed into the respective functions.
- Another function is called to output the winning team and score.
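The steps above can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the column names (`Div`, `Date`, `HomeTeam`, `AwayTeam`, `FTHG`, `FTAG`) assume a football-data.co.uk-style CSV, the rows are toy data invented for illustration, and the helper `predict_match` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy rows invented for illustration. Column names assume a football-data.co.uk-style
# CSV: Div (division), Date, HomeTeam, AwayTeam, FTHG / FTAG (full-time home / away goals).
raw = pd.DataFrame({
    "Div": ["E0"] * 6,
    "Date": ["10/08/2019", "17/08/2019", "24/08/2019",
             "31/08/2019", "14/09/2019", "21/09/2019"],
    "HomeTeam": ["Arsenal", "Chelsea", "Liverpool", "Arsenal", "Chelsea", "Liverpool"],
    "AwayTeam": ["Chelsea", "Liverpool", "Arsenal", "Liverpool", "Arsenal", "Chelsea"],
    "FTHG": [2, 1, 3, 0, 1, 2],
    "FTAG": [1, 1, 0, 2, 2, 2],
})

# 1. Cleaning: give every team an integer id and drop the string-only columns.
teams = sorted(set(raw["HomeTeam"]) | set(raw["AwayTeam"]))
team_ids = {name: i for i, name in enumerate(teams)}
df = raw.drop(columns=["Div", "Date"])
df["HomeTeam"] = df["HomeTeam"].map(team_ids)
df["AwayTeam"] = df["AwayTeam"].map(team_ids)

# 2. Labels: 1 = home win, -1 = home loss, 0 = draw (the sign of the goal difference).
df["Result"] = np.sign(df["FTHG"] - df["FTAG"])

# 3. Matrices: X holds the two team ids, Y holds the three things to predict.
X = df[["HomeTeam", "AwayTeam"]]
Y = df[["FTHG", "FTAG", "Result"]]

# scikit-learn's RandomForestClassifier accepts a 2-D target and fits one output per
# column, so a single model predicts home goals, away goals and the result together.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, Y)


def predict_match(home, away):
    """Return (winner, home_goals, away_goals) for a fixture between two known teams."""
    query = pd.DataFrame({"HomeTeam": [team_ids[home]], "AwayTeam": [team_ids[away]]})
    home_goals, away_goals, result = model.predict(query)[0]
    if result == 1:
        winner = home
    elif result == -1:
        winner = away
    else:
        winner = "Draw"
    return winner, int(home_goals), int(away_goals)


print(predict_match("Arsenal", "Chelsea"))
```

Because the score and the result are separate outputs here, they can disagree (for example a predicted 1-1 alongside a home win). One alternative is to derive the winner from the predicted goals instead of predicting it separately.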
Challenges I ran into
There were a LOT of challenges in this program, especially since both of us were completely new to programming. Many problems came from the CSV files: columns came out as floats instead of ints after mapping, and when we combined CSV files by copying and pasting, every column became an object. Another problem was taking the user inputs, storing them as the appropriate variables for the teams, and passing them into the function. Since our data has multiple inputs and multiple outputs, linear regression (what we learned in the workshop) did not really work for us, so we had to find a model that handles this. We used random forest classification, which in itself was pretty tough to understand.
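The two CSV problems above have common causes in pandas. The sketch below reproduces them on toy data invented for illustration; it is an assumption that these were the exact causes in this project. The mapping dictionary and the two "season" strings are hypothetical.

```python
import io

import pandas as pd

team_ids = {"Arsenal": 0, "Chelsea": 1}

# Problem 1: a name the dictionary does not know (here, a stray trailing space)
# maps to NaN, and because NaN is a float, pandas upcasts the column to float64.
names = pd.Series(["Arsenal", "Chelsea ", "Chelsea"])
mapped = names.map(team_ids)              # dtype: float64
unmapped = names[mapped.isna()]           # shows which names failed to map
fixed = names.str.strip().map(team_ids)   # every name now matches, dtype: int64

# Problem 2: pasting one CSV under another repeats the header row, so header text
# lands in the numeric columns and pandas falls back to dtype object.
season1 = "HomeTeam,AwayTeam,FTHG,FTAG\nArsenal,Chelsea,2,1\n"
season2 = "HomeTeam,AwayTeam,FTHG,FTAG\nChelsea,Arsenal,0,0\n"
pasted = pd.read_csv(io.StringIO(season1 + season2))   # FTHG dtype: object

# Reading each file separately and concatenating keeps the numeric dtypes.
combined = pd.concat(
    [pd.read_csv(io.StringIO(text)) for text in (season1, season2)],
    ignore_index=True,
)                                                       # FTHG dtype: int64
```

Checking `mapped.isna()` right after each `map` call is a cheap way to catch spelling mismatches before they silently turn a column into floats.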
Accomplishments that we're proud of
We built an entire program, from the website to the underlying code. We used machine learning to predict something real and learned the basic processes of data science (the field I want to go into). We also got relatively good accuracy on our predictions.
What I learned
I learned how to:
- Clean data using pandas
- Use scikit-learn to create a model
- Understand what random forest classification is (kind of)
- Create a website using HTML
- Take data from an HTML form and feed it into a Python script
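Taking data from an HTML form into a Python script mostly comes down to decoding the submitted form data. This is a minimal sketch using the standard library's `urllib.parse.parse_qs`; the field names `home` and `away`, the `TEAM_IDS` mapping and the helper `read_fixture` are hypothetical, since the project's actual web setup is not described here.

```python
from urllib.parse import parse_qs

# Hypothetical mapping; in practice this would be the same dictionary used to
# encode team names when the training data was cleaned.
TEAM_IDS = {"Arsenal": 0, "Chelsea": 1, "Liverpool": 2, "Man United": 3}


def read_fixture(form_body: str) -> tuple[int, int]:
    """Turn a URL-encoded form body such as 'home=Arsenal&away=Chelsea'
    (assumed field names) into the (home_id, away_id) pair the model expects."""
    fields = parse_qs(form_body)      # values are lists: {'home': ['Arsenal'], ...}
    home = fields["home"][0]
    away = fields["away"][0]
    if home == away:
        raise ValueError("home and away teams must be different")
    return TEAM_IDS[home], TEAM_IDS[away]


# '+' in a form body encodes a space, and parse_qs decodes it back.
print(read_fixture("home=Man+United&away=Chelsea"))
```

Validating the pair before calling the model keeps bad input (an unknown team or the same team twice) from reaching the prediction functions.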
Ideally, we would like to fine-tune the model with more data and extend the input matrix to include not just the teams but also each team's win percentage, which could be computed and fed in automatically to give the model more informative inputs. We could also tackle other areas of football, such as the lucrative World Cup.
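The win-percentage idea could be computed with pandas roughly as follows. This is a hypothetical sketch on toy rows invented for illustration; it reuses the result encoding described above, and the column names `HomeWinPct` and `AwayWinPct` are assumptions.

```python
import pandas as pd

# Toy results invented for illustration; Result uses the writeup's encoding
# (1 = home win, -1 = home loss, 0 = draw).
matches = pd.DataFrame({
    "HomeTeam": ["Arsenal", "Arsenal", "Chelsea", "Chelsea", "Liverpool"],
    "AwayTeam": ["Chelsea", "Liverpool", "Arsenal", "Liverpool", "Arsenal"],
    "Result": [1, 0, -1, 1, 1],
})

# Fraction of each team's home games that it won.
home_win_pct = (matches["Result"] == 1).groupby(matches["HomeTeam"]).mean()
# Fraction of each team's away games that it won (a home loss is an away win).
away_win_pct = (matches["Result"] == -1).groupby(matches["AwayTeam"]).mean()

# Attach the percentages as extra input columns next to the team ids.
matches["HomeWinPct"] = matches["HomeTeam"].map(home_win_pct)
matches["AwayWinPct"] = matches["AwayTeam"].map(away_win_pct)
print(matches)
```

This version uses the whole table, so each match's own result leaks into its features. For real training data, the percentages should be computed only from matches played before each fixture.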