Many people enjoy watching sports such as basketball, and these sports can often seem unpredictable. We wanted to see how well we could predict basketball outcomes using data science and machine learning. In sports betting and analytics, any competitive edge can prove extremely beneficial in the long run.
What it does
We use Python to build datasets and import them into Splunk, where we analyze the data to find correlations between data points. We then use those correlations to predict which NBA team will win a game given each team's seasonal statistics.
How we built it
We used Python with the requests library to make calls to the ProBasketball API, which provides endpoints that contain information about NBA teams, games, and boxscores. We then fed this information to Splunk, our data science and machine learning toolkit, which helped us find correlations in the data set and predict future values.
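A minimal sketch of this fetch-and-export step might look like the following. The base URL, endpoint names, query parameters, and field names are all placeholders (the actual ProBasketball API paths are not listed in this writeup), and exporting to CSV is just one plausible way to get the rows into Splunk.

```python
import csv
import io

# Placeholder base URL: the real ProBasketball API address is not given here.
BASE_URL = "https://example.invalid/api"


def fetch_json(endpoint, params, api_key):
    """Call a hypothetical endpoint (e.g. "team", "game", "boxscore") and return decoded JSON."""
    import requests  # imported lazily so the CSV helper below works without network access

    response = requests.get(
        f"{BASE_URL}/{endpoint}",
        params={"api_key": api_key, **params},  # assumed auth scheme
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text, keeping only the listed columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The CSV text returned by `rows_to_csv` can be written to a file and uploaded to Splunk; keeping the network call separate from the serialization makes the export easy to check offline.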
Challenges we ran into
The NBA Stats API was poorly documented, often returned missing values, and was generally not well designed. As a result, we needed to make additional REST calls, which added to the time we spent on data engineering.
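One way to handle such gaps is to detect records with missing fields and backfill them with one extra call per incomplete record. The sketch below assumes records are plain dicts and that `fetch_extra` is a hypothetical callable wrapping the additional REST call; it is an illustration of the idea, not our exact script.

```python
def find_missing(record, required_fields):
    """Return the required fields that are absent, None, or empty in a record."""
    return [f for f in required_fields if record.get(f) in (None, "")]


def backfill(records, required_fields, fetch_extra):
    """Fill missing fields using fetch_extra(record) -> dict.

    fetch_extra is a hypothetical stand-in for the additional REST call.
    Returns the completed records and the number of extra calls made.
    """
    completed = []
    extra_calls = 0
    for record in records:
        missing = find_missing(record, required_fields)
        if missing:
            extra = fetch_extra(record)
            extra_calls += 1
            # Copy only the fields that were missing and that the extra call supplied.
            patch = {f: extra[f] for f in missing if extra.get(f) is not None}
            record = {**record, **patch}
        completed.append(record)
    return completed, extra_calls
```

Counting the extra calls makes the cost visible: every incomplete record adds one more round trip, which is where the additional data engineering time comes from.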
Accomplishments that we're proud of
Given data for the entire 2016-2017 season, our machine learning model predicted the winner of an NBA game with approximately 65% accuracy. Accuracy rises to about 80% for teams that are more predictable because they are either very good or very bad, such as the Golden State Warriors or the Brooklyn Nets. The model performs worse (about 50%) for teams whose seasons had a lot of ups and downs, which makes game outcomes harder to predict, such as the Portland Trail Blazers. For a team that performs closer to expectation, such as the Oklahoma City Thunder, accuracy is about 65%.
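Overall and per-team accuracy figures like these can be computed from a list of predictions along the following lines. This is a sketch under assumed field names (`home`, `away`, `predicted_winner`, `actual_winner`), and it credits each game to both participating teams, which is one reasonable convention rather than necessarily the one we used in Splunk.

```python
from collections import defaultdict


def overall_accuracy(games):
    """Fraction of games where the predicted winner matched the actual winner."""
    games = list(games)
    if not games:
        return 0.0
    hits = sum(g["predicted_winner"] == g["actual_winner"] for g in games)
    return hits / len(games)


def accuracy_by_team(games):
    """Per-team accuracy: each game counts toward both teams that played in it."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for g in games:
        hit = g["predicted_winner"] == g["actual_winner"]
        for team in (g["home"], g["away"]):
            totals[team] += 1
            correct[team] += hit  # True counts as 1, False as 0
    return {team: correct[team] / totals[team] for team in totals}
```

Breaking accuracy down by team is what exposes the gap between highly predictable teams and streaky ones, which a single season-wide number hides.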
What we learned
We learned how to use Splunk for data analysis and machine learning. We also learned the differences between several of the available machine learning models.
What's next for Predictheball
Predictheball is very much incomplete. The first step is to switch to an API that lets our Python data engineering script run faster and, ideally, provides more information to increase the accuracy of our machine learning models.