Data-Dive-Hackathon

(For more information, check out our PowerPoint presentation in the GitHub repo!)

Hypothesis:

The team that is playing better at half time will carry that momentum through to win the game.

Logistic Regression:

We chose logistic regression because it accepts multiple input variables and predicts a binary outcome (the winner of each game). Our inputs were the score differential at half time, a scaled team advantage (a combination of QBR, pass rating, and previous year's win percentage), and past season standings.
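To illustrate the idea, here is a minimal logistic regression sketch in plain Python. It is not our R code: the feature names (`halftime_score_diff`, `scaled_team_advantage`, `standing_diff`), the toy rows, and the learning-rate settings are all made up for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, bias, features):
    # Estimated probability that the team wins, given its feature values.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def fit_logistic(rows, labels, lr=0.01, epochs=2000):
    # Batch gradient descent on the average log-loss.
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical rows: (halftime_score_diff, scaled_team_advantage, standing_diff)
toy_rows = [
    (10.0, 0.5, 3.0), (7.0, 0.2, 1.0), (3.0, 0.4, 2.0), (14.0, -0.1, 0.0),
    (-10.0, -0.5, -3.0), (-7.0, -0.3, -1.0), (-3.0, -0.4, -2.0), (-14.0, 0.1, 0.0),
]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = team won, 0 = team lost

weights, bias = fit_logistic(toy_rows, toy_labels)
p_lead = predict_proba(weights, bias, (7.0, 0.3, 1.0))      # team ahead at half time
p_trail = predict_proba(weights, bias, (-7.0, -0.3, -1.0))  # team behind at half time
```

On this toy data the model assigns a higher win probability to the team that leads at half time, which is the pattern our hypothesis predicts.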

We used Excel for data cleaning and R for computation.

We used 70% of the data to train our predictive ML model and 30% to test it.
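A 70/30 split can be sketched as follows. This is an illustrative Python version, not the R code we ran; the function name `train_test_split`, the fixed seed, and the placeholder `games` list are assumptions.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    # Shuffle a copy so the original order is untouched, then cut at 70%.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

games = list(range(100))  # placeholder for 100 game records
train, test = train_test_split(games)
```

Shuffling before cutting keeps the test set from being, for example, only the last weeks of a season; fixing the seed makes the split reproducible.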

Results:

We were able to predict game winners with 81% accuracy.

We found that our simpler model actually performed better than our more complex one (which added a turnover feature). Although we were wrong 19% of the time, within those misses 28% of the games went to overtime and our Mean Absolute Score Difference was 4.5 points (3/4 of a touchdown).
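The sketch below shows one way these summary numbers could be computed from per-game results. It is an assumption-heavy illustration: the record layout, the toy values, and the reading of "Mean Absolute Score Difference" as the average absolute final margin among missed games are ours for this example, and a touchdown is counted as 6 points.

```python
from statistics import mean

TOUCHDOWN_POINTS = 6  # assumed: touchdown without the extra point

# Hypothetical records: (predicted_win, actual_win, went_to_overtime, final_margin)
results = [
    (True, True, False, 10),
    (True, True, False, 3),
    (False, False, False, -14),
    (True, False, True, -3),
    (False, True, False, 7),
]

def summarize(results):
    # A miss is any game where the predicted winner differs from the actual one.
    misses = [r for r in results if r[0] != r[1]]
    accuracy = 1 - len(misses) / len(results)
    overtime_share = sum(1 for r in misses if r[2]) / len(misses) if misses else 0.0
    mean_abs_margin = mean(abs(r[3]) for r in misses) if misses else 0.0
    return accuracy, overtime_share, mean_abs_margin

accuracy, overtime_share, mean_abs_margin = summarize(results)
touchdowns = 4.5 / TOUCHDOWN_POINTS  # the 4.5-point figure expressed in touchdowns
```

With a 6-point touchdown, 4.5 points works out to 0.75 of a touchdown, which is where the "3/4" figure comes from.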

For the games we predicted incorrectly, our estimated win probabilities were on average between 35% and 65%, fairly close to our threshold probability. In other words, we were very close to guessing right.
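As a rough illustration of this check, the Python sketch below measures how many missed predictions fall in the 35% to 65% band. The threshold of 0.5 is an assumption (this README does not state the value), and the probabilities are made-up examples.

```python
THRESHOLD = 0.5  # assumed decision threshold

def predict_winner(prob, threshold=THRESHOLD):
    # Predict a win when the estimated probability reaches the threshold.
    return prob >= threshold

def share_near_threshold(probs, low=0.35, high=0.65):
    # Fraction of probabilities that fall inside the [low, high] band.
    if not probs:
        return 0.0
    return sum(1 for p in probs if low <= p <= high) / len(probs)

# Hypothetical estimated probabilities for games the model got wrong.
missed_probs = [0.41, 0.58, 0.62, 0.47, 0.71]
near = share_near_threshold(missed_probs)
```

A high share here would mean most misses were near coin-flip calls rather than confident predictions that turned out wrong.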