Inspiration

We wanted to work on a data-rich project whose predictions can be checked against real results as future games are played. We aim to keep building on this model, with the goal of outperforming sports betting sites.

What it does

Our model takes the names of the players on the home and visiting teams and returns the predicted score difference for the game.

How we built it

We started by collecting stats for every MLB player over the past ~30 years, along with game records that include each starting roster. We then applied polynomial feature expansion, standardization, and PCA to our features. Finally, we fit an L2-regularized linear regression using _sklearn's Ridge_ model.
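The preprocessing and regression steps described above can be sketched as a single scikit-learn pipeline. This is a minimal illustration, not our exact code: the function name `build_model`, the hyperparameter values (`degree`, `n_components`, `alpha`), and the random stand-in data are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def build_model(degree=2, n_components=10, alpha=1.0):
    """Chain the four steps; all default values here are illustrative."""
    return make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),  # polynomial expansion
        StandardScaler(),                                       # zero mean, unit variance
        PCA(n_components=n_components),                         # dimensionality reduction
        Ridge(alpha=alpha),                                     # L2-regularized linear regression
    )


# Random stand-in data: 200 "games" with 6 numeric features each (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)  # fake score difference

model = build_model()
model.fit(X, y)
predictions = model.predict(X[:5])  # one predicted score difference per game
```

Wrapping the steps in one pipeline means the scaler and PCA are fit only on the training data passed to `fit`, which keeps validation and hyperparameter searches free of leakage from held-out games.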

Challenges we ran into

Runtime was a big problem for us. Because our datasets were large, collecting data, training, validation, and hyperparameter testing all took a long time. We also ran into data-quality problems: player names and other fields were spelled or formatted differently across sources, and reconciling them took time.
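One way to reconcile inconsistent or misspelled player names is to normalize them and fall back to fuzzy matching. The sketch below uses only the Python standard library; the helper names `normalize_name` and `match_name`, the `cutoff` value, and the sample names are assumptions for illustration, not the approach we necessarily used.

```python
import difflib
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents.lower()
    )
    return " ".join(cleaned.split())


def match_name(raw_name, known_names, cutoff=0.85):
    """Return the known name closest to raw_name, or None if nothing is close."""
    normalized_known = {normalize_name(n): n for n in known_names}
    key = normalize_name(raw_name)
    if key in normalized_known:  # exact match after normalization
        return normalized_known[key]
    close = difflib.get_close_matches(key, list(normalized_known), n=1, cutoff=cutoff)
    return normalized_known[close[0]] if close else None


# Illustrative roster of canonical names.
roster = ["José Ramírez", "Mike Trout"]
accent_match = match_name("Jose Ramirez", roster)
typo_match = match_name("Mike Trot", roster)
no_match = match_name("Babe Ruth", roster)
```

A fairly high cutoff keeps distinct players with similar names from being merged; names that fail to match can be logged and reviewed by hand.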

Accomplishments that we're proud of

We are proud of the model we created, especially because sports games are highly variable and hard to predict. Integrating the model into a web app was another accomplishment we found interesting.

What we learned

Along with implementing the machine learning model, we learned how to build an integrated web app using _HTML_, _JavaScript_, and _Flask_. We also deepened our experience with, and were introduced to, Python packages such as _torch_, _sklearn's Ridge_, _pickle_, and others.

What's next for Baseball Outcome Predictor

The next step for our model is to improve its accuracy, with the goal of beating the sports betting websites. Revising and reselecting the features we use is the first thing we plan to try.
