Sabermetrics makes a huge assumption: that events are independent. It assumes every hitting metric reflects only the batter's own contribution. It’s time to move past a discrete view of the game and into one of continuity.
What it does
It predicts the OPS production of a batter/on-deck batter duo. How good would Barry Bonds have been if he had Vladimir Guerrero protecting him instead of Edgardo Alfonzo?
How we built it
Using the Retrosheet and Lahman databases, we recorded the production of every batter/on-deck batter combination. We then fit a regression model relating this duo production to the batter's "independent" seasonal contribution and the on-deck batter's "independent" seasonal contribution.
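Below is a minimal sketch of that pipeline. It is not the project's actual code. It assumes plate appearances have already been pulled from Retrosheet event files into simple records, and that the on-deck batter is the next hitter in the lineup. It also assumes each player's "independent" seasonal contribution is his season OPS, for example computed from the Lahman Batting table. The PlateAppearance record, its field names, and the plain two-predictor least-squares form are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

import numpy as np


@dataclass
class PlateAppearance:
    # Hypothetical per-PA record; field names are assumptions, not Retrosheet's.
    batter: str
    on_deck: str
    ab: int
    h: int
    tb: int
    bb: int
    hbp: int
    sf: int


def ops(ab, h, tb, bb, hbp, sf):
    """OPS = OBP + SLG, with OBP = (H + BB + HBP) / (AB + BB + HBP + SF), SLG = TB / AB."""
    obp_denom = ab + bb + hbp + sf
    obp = (h + bb + hbp) / obp_denom if obp_denom else 0.0
    slg = tb / ab if ab else 0.0
    return obp + slg


def duo_ops(pas, min_ab=100):
    """Sum PAs per (batter, on-deck) pair; return OPS for pairs with at least min_ab AB."""
    totals = defaultdict(lambda: [0, 0, 0, 0, 0, 0])  # AB, H, TB, BB, HBP, SF
    for pa in pas:
        t = totals[(pa.batter, pa.on_deck)]
        for i, v in enumerate((pa.ab, pa.h, pa.tb, pa.bb, pa.hbp, pa.sf)):
            t[i] += v
    return {pair: ops(*t) for pair, t in totals.items() if t[0] >= min_ab}


def fit_duo_model(duo, season_ops):
    """Least squares: duo_OPS ~ b0 + b1 * batter_season_OPS + b2 * on_deck_season_OPS."""
    pairs = [p for p in duo if p[0] in season_ops and p[1] in season_ops]
    X = np.array([[1.0, season_ops[b], season_ops[d]] for b, d in pairs])
    y = np.array([duo[p] for p in pairs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [intercept, batter weight, on-deck weight]
```

In this form, the on-deck coefficient b2 is the value that measures protection. Multiplying it by the gap between two on-deck hitters' season OPS gives the predicted change in the duo's OPS.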
Challenges I ran into
The databases are huge. I had wanted to run a regression for each decade, but there was far too much data. With over 2 million entries since 2000 alone, it crashed my computer a couple of times.
Accomplishments that I'm proud of
We obtained a value measuring how much the on-deck batter contributes: as much as 20 points of OPS. This shows that a source of production is being overlooked in current attribution, and it can open many doors in how we view the game. We can start looking at how many Runs Created (RC) a runner on base adds to a batter. We can look at the RC a switch-hitting bench player produces against a bullpen. Though none of these will produce values nearly as high as Mike Trout's wRC+, they are a step toward viewing the game in a new way.
What I learned
What's next for BonDS
Try other regression models. Because of players with extreme statistics, OPS seemed to follow an x^3-shaped curve centered about the mean. We'd also like to find a way to handle players with little playing time, since the current model requires at least 100 AB per duo. It would be nice to find a solution to the "Ripken Effect".
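One way to try the cubic shape mentioned above is to add squared and cubed terms of the mean-centered predictors to an ordinary least-squares fit. The current 100 AB cutoff is applied before fitting. This is a sketch under our own assumptions, not a tested result. The function name, the use of season OPS as the predictors, and the choice of centering on the mean of the kept duos are all hypothetical.

```python
import numpy as np


def fit_cubic_duo_model(batter_ops, on_deck_ops, duo_ops, duo_ab, min_ab=100):
    """Fit duo OPS on centered batter and on-deck OPS with terms up to x^3.

    Only duos with at least `min_ab` at-bats together are kept, mirroring the
    100 AB requirement of the current model.
    """
    batter_ops = np.asarray(batter_ops, dtype=float)
    on_deck_ops = np.asarray(on_deck_ops, dtype=float)
    duo_ops = np.asarray(duo_ops, dtype=float)
    keep = np.asarray(duo_ab) >= min_ab

    # Center each predictor on the mean of the kept duos.
    xb = batter_ops[keep] - batter_ops[keep].mean()
    xd = on_deck_ops[keep] - on_deck_ops[keep].mean()

    # Columns: intercept, then linear, quadratic, cubic terms for each predictor.
    X = np.column_stack([np.ones_like(xb), xb, xb**2, xb**3, xd, xd**2, xd**3])
    coef, *_ = np.linalg.lstsq(X, duo_ops[keep], rcond=None)
    return coef
```

Comparing this fit's residuals against a linear fit on the same duos would show whether the extra terms actually capture the extreme hitters, or merely overfit them.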