Basketball has always been my favorite sport. Every year when March Madness came around, I wanted to create the most accurate bracket possible. However, I never found an easy way to compile and analyze data so that I could make sound decisions about which teams would advance the farthest. When I discovered the R programming language, I realized that I could combine my love for data science with my love for sports.
What it does
My hack compiles various regular season stats of March Madness teams over the past four years, organizes and stores them, and determines which characteristics, such as field goal percentage, correlate strongly with March Madness success. Once these characteristics are determined, my hack builds a linear regression equation that estimates how far a team is expected to advance in the tournament based on its regular season stats. I used this equation to predict the outcome of the 2017 March Madness tournament.
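The correlation and regression step can be sketched roughly as follows. The project itself was written in R; this is a minimal Python illustration using only the standard library. The numbers are made up (not real NCAA data), the choice of field goal percentage as the single predictor is an assumption, and `pearson` and `fit_line` are hypothetical helper names.

```python
from math import sqrt

# Made-up illustrative numbers, NOT real NCAA data.
# Each tuple: (field goal percentage, finish rank); a lower rank means a deeper run.
teams = [
    (49.1, 1),
    (47.8, 2),
    (46.5, 4),
    (45.9, 8),
    (44.2, 16),
    (43.0, 32),
]


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


fg_pct = [t[0] for t in teams]
rank = [t[1] for t in teams]

r = pearson(fg_pct, rank)
slope, intercept = fit_line(fg_pct, rank)

# Expected finish rank for a hypothetical team shooting 46.0% from the field.
predicted_rank = slope * 46.0 + intercept
print(f"correlation={r:.3f}, predicted rank={predicted_rank:.1f}")
```

A strongly negative correlation here would mean that better shooting goes with a lower (better) rank. A real model would use several stats at once rather than a single predictor.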
How I built it
To gather data, I downloaded CSV files from the NCAA website and organized the stats in RStudio. I also used the ggplot2 library to build visualizations of the data. To measure tournament success, I used a ranking system that encodes how far each team advanced: the lower the ranking, the farther the team advanced.
Challenges I ran into
One challenge I ran into involved automating more of the modeling. I wanted the computer itself to tell me which type of trend line (linear vs. logistic vs. natural splines) fits my data best.
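A common way to let the computer choose between curve shapes is to compare candidates by cross-validated error. The sketch below is a simplified Python illustration, not the project's code: it compares three simple transforms of the predictor (linear, logarithmic, squared) as stand-ins, and does not implement logistic or natural spline fits. The data and the names `CANDIDATES`, `loocv_error`, and `best_model` are hypothetical.

```python
from math import log


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Candidate curve shapes, each expressed as a transform applied to x
# before fitting a straight line. These are illustrative stand-ins.
CANDIDATES = {
    "linear": lambda x: x,
    "logarithmic": lambda x: log(x),
    "squared": lambda x: x * x,
}


def loocv_error(xs, ys, transform):
    """Mean squared error under leave-one-out cross-validation."""
    total = 0.0
    for i in range(len(xs)):
        train_x = [transform(x) for j, x in enumerate(xs) if j != i]
        train_y = [y for j, y in enumerate(ys) if j != i]
        slope, intercept = fit_line(train_x, train_y)
        prediction = slope * transform(xs[i]) + intercept
        total += (ys[i] - prediction) ** 2
    return total / len(xs)


def best_model(xs, ys):
    """Return the candidate name with the lowest held-out error, plus all errors."""
    errors = {name: loocv_error(xs, ys, f) for name, f in CANDIDATES.items()}
    return min(errors, key=errors.get), errors


# Made-up data generated from y = 3 * log(x) + 1, so the logarithmic
# candidate should fit the held-out points best.
xs = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
ys = [3 * log(x) + 1 for x in xs]

name, errors = best_model(xs, ys)
print(name)
```

Scoring each candidate on held-out points, rather than on the data it was fit to, keeps a more flexible curve from winning simply because it can bend closer to the training data.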
Accomplishments that I'm proud of
I am proud of my method of organizing my data. The CSV files of regular season stats that I gathered contained stats for all 345 teams in each season, but I only cared about the 64 tournament teams. Furthermore, each CSV file contained only one particular stat. For example, one CSV file contained field goal percentages for 2016 while another contained turnovers per game for 2013. I wrote an algorithm that checks whether a team name matches one that made the tournament in a particular year and, if so, inserts that team's stat into the corresponding cell of an overall data table.
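The filtering and merging described above can be sketched as follows. This is a Python illustration rather than the original R code; the column names (`team`, `value`), the in-memory CSV text, and the helpers `load_stat` and `build_table` are all assumptions, and the team names and numbers are placeholders.

```python
import csv
import io

# Made-up CSV contents standing in for the downloaded files.
# Each file holds one stat for one season, covering every team.
FG_PCT_2016 = """team,value
Team A,48.2
Team B,44.9
Team C,46.1
"""

TURNOVERS_2016 = """team,value
Team A,11.3
Team B,13.8
Team C,12.0
"""

# Teams that made the tournament, keyed by year (placeholder names).
tournament_teams = {2016: {"Team A", "Team C"}}


def load_stat(text):
    """Parse one single-stat CSV into {team name: value}."""
    return {
        row["team"]: float(row["value"])
        for row in csv.DictReader(io.StringIO(text))
    }


def build_table(stat_files, tournament_teams):
    """Merge single-stat files into one table keyed by (team, year).

    stat_files maps (stat name, year) to CSV text. Only teams that made
    the tournament in that year are kept.
    """
    table = {}
    for (stat, year), text in stat_files.items():
        qualified = tournament_teams.get(year, set())
        for team, value in load_stat(text).items():
            if team in qualified:
                table.setdefault((team, year), {})[stat] = value
    return table


table = build_table(
    {("fg_pct", 2016): FG_PCT_2016, ("turnovers", 2016): TURNOVERS_2016},
    tournament_teams,
)
print(table)
```

Keying each row by (team, year) means the same school can appear once per season without its stats from different years overwriting each other.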
What I learned
This is the first time that I have used R in a large project. I refined my skills with various R functions, such as those for running regression analyses and creating data visualizations.
What's next for March Madness Predictor
As of March 26, my bracket predictions are in the 99.4th percentile in the world, but I still want to add more machine learning to my hack. For example, I want my computer to be able to pick for me which set of variables creates the most accurate equation for any type of regression.
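One possible approach to automatic variable selection is an exhaustive search over subsets of predictors, scoring each subset by leave-one-out cross-validated error. The Python sketch below illustrates that idea for linear regression only. It is an assumption about how the feature could work, not an existing part of the project; the data are made up and `solve`, `fit`, `loocv_error`, and `best_subset` are hypothetical names.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(rows, ys):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


def loocv_error(rows, ys):
    """Mean squared prediction error under leave-one-out cross-validation."""
    total = 0.0
    for i in range(len(rows)):
        coef = fit(rows[:i] + rows[i + 1:], ys[:i] + ys[i + 1:])
        pred = coef[0] + sum(c * v for c, v in zip(coef[1:], rows[i]))
        total += (ys[i] - pred) ** 2
    return total / len(rows)


def best_subset(data, ys, names):
    """Try every non-empty subset of columns; keep the one with the lowest error."""
    best = None
    for size in range(1, len(names) + 1):
        for subset in combinations(range(len(names)), size):
            rows = [[row[j] for j in subset] for row in data]
            err = loocv_error(rows, ys)
            if best is None or err < best[0]:
                best = (err, [names[j] for j in subset])
    return best[1], best[0]


# Made-up data: the outcome depends only on the first two columns.
names = ["fg_pct", "turnovers", "free_throw_pct"]
data = [
    [48.0, 11.0, 70.0], [46.5, 12.5, 75.0], [45.0, 13.0, 68.0], [47.2, 10.5, 72.0],
    [44.1, 14.2, 74.0], [49.3, 11.8, 69.0], [43.5, 12.9, 71.0], [46.0, 13.6, 73.0],
]
ys = [100 - 1.5 * row[0] + 2.0 * row[1] for row in data]

chosen, error = best_subset(data, ys, names)
print(chosen)
```

Exhaustive search is only practical for a handful of candidate stats, since the number of subsets doubles with each added variable. Stepwise selection or regularization would be alternatives for larger sets.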