Inspiration
None of us had participated in a hackathon before. We are all massive Georgia Tech sports fans with an interest in data and analytics, so we figured the GT Sports Hackathon would be a great time and a good introduction to hackathons. Our entire team also follows professional sports, and some team members had forecasting experience from taking Regression and Forecasting in previous semesters. This challenge let us apply some of that knowledge to a real problem while helping the other team members learn new skills.
What it does
We built a predictive model to forecast MLB game attendance across the league. More specifically, it forecasts Atlanta Braves home attendance more precisely than using the median season attendance as a rough estimate. It can also help the Braves ticketing group optimize ticket tier pricing based on the predicted attendance. Our model predicted every 2019 Atlanta Braves home game after training on 2012-2018 data. On average, it was only off by 468 people per game (~98% accurate). Our MSE was 35485392, whereas the median baseline MSE was 9907762390. This means our model predicts Braves home game attendance better, by a factor of three, than if we just assumed the median for the season.
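To make the comparison against the median baseline concrete, here is a minimal sketch of how the two error figures can be computed. The attendance numbers are made up for illustration and are not our data, and the helper names are hypothetical. The sketch also assumes the median is taken over the same games being scored, which may differ from how a season baseline is set up in practice.

```python
from statistics import median


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average number of people the prediction is off by per game."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical attendance figures for four games (illustration only).
actual = [30000, 42000, 25000, 38000]
model_pred = [31000, 40000, 26000, 37000]

# Baseline: predict the median attendance for every game.
baseline_pred = [median(actual)] * len(actual)

model_mse = mean_squared_error(actual, model_pred)
baseline_mse = mean_squared_error(actual, baseline_pred)
model_mae = mean_absolute_error(actual, model_pred)

print(f"Model MSE:    {model_mse:,.0f}")
print(f"Baseline MSE: {baseline_mse:,.0f}")
print(f"Model MAE:    {model_mae:,.0f} people per game")
print(f"Baseline MSE / model MSE: {baseline_mse / model_mse:.1f}")
```

Because MSE squares each error, a few badly missed games can dominate it, so it helps to read it alongside the average miss in people per game.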
How we built it
The majority of the project was completed in Python. The first step was to create a data set that would capture as much of the potential variance in stadium attendance for baseball games as possible. We gathered the data through web scraping, APIs, and other sources, and also pulled some data manually into CSVs. After this, the data was cleaned so that a model could use it effectively. We used a random forest regressor, which trains many decision trees on different samples of the data and then averages their predictions to smooth out the errors of any single tree. This worked very well on the training data: our predicted R-squared value was 0.915, which we took as a sign that we did not over-fit the model and that it predicts attendance very well.
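The modeling step can be sketched roughly as follows, using scikit-learn's `RandomForestRegressor`. This is an illustration and not our actual pipeline: the features (`day_of_week`, `temperature`, `opponent_win_pct`), the synthetic attendance formula, and the hyperparameters are all assumptions made up for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical features, not our real data set).
n_games = 400
day_of_week = rng.integers(0, 7, n_games)           # 0 = Monday ... 6 = Sunday
temperature = rng.uniform(50, 95, n_games)          # degrees Fahrenheit
opponent_win_pct = rng.uniform(0.3, 0.7, n_games)
X = np.column_stack([day_of_week, temperature, opponent_win_pct])
y = (
    25000
    + 3000 * (day_of_week >= 5)                     # weekend bump
    + 40 * temperature
    + 10000 * opponent_win_pct
    + rng.normal(0, 1000, n_games)                  # noise
)

# Train on the earlier games and hold out the later ones,
# mirroring a train-on-past, predict-the-next-season setup.
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# The forest's prediction is the average of its individual trees' predictions.
tree_average = np.mean([tree.predict(X_test) for tree in model.estimators_], axis=0)
forest_pred = model.predict(X_test)

train_r2 = r2_score(y_train, model.predict(X_train))
test_r2 = r2_score(y_test, forest_pred)
print(f"Train R-squared:    {train_r2:.3f}")
print(f"Held-out R-squared: {test_r2:.3f}")
```

Comparing R-squared on the training games with R-squared on held-out games is one common way to check for over-fitting: a large gap between the two suggests the forest has memorized the training data.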
Challenges we ran into
None of our team members had experience with web scraping to collect publicly available data, so gathering the inputs for our model was time-consuming and tricky. A lot of free APIs restrict the number of requests you can make per day, so working within this constraint required some creativity. This was also our first time using GitHub, so there were plenty of growing pains with that. Additionally, when we hit minor data errors while fitting various forecasting models, we had to painstakingly back-track through our code to find mistakes in how we were manipulating and cleaning the data. On top of that, the 24 hours went by a lot faster than we anticipated, so the very end was hectic as we pulled everything together into the final model.
Accomplishments that we're proud of
Ultimately, we are proud to have spent what little free time we have during the weekend working together on a challenging, real-world problem that let us apply the knowledge and skills Georgia Tech has taught us.
What we learned
Due to our various backgrounds in Civil Engineering, Industrial Engineering, and Computer Science, we were all able to teach and learn new skills over the course of the hackathon. The CS major taught us how to use GitHub and work with object-oriented programming. The Civil Engineer helped us with collecting, cleaning, and interpreting the data. And finally, the Industrial Engineers taught us how to fit and train a model and work with data in Python.
What's next for THWg
Most of us are graduating in the next 12 months, so we will be pursuing careers in data & analytics as well as Civil Engineering. The rest of us will continue our education, and we all hope to participate in more hackathons after successfully completing our first one!