Wednesday, May 30, 2012

Tugging On Superman's Cape

Considering the number of hitting statistics, it is curious how few focus on runs.  Runs, after all, are what decide ball games.  It's not hits or home runs or stolen bases, but runs.  The best known statistics that focus on runs are Runs Batted In and Runs (Scored). Many believe that these two statistics can be poor descriptors of an individual batter's performance, since both are so heavily affected by the quality of the teammates batting in the lineup with that batter.  To some extent, I agree.

Bill James created a stat called "Runs Created" in an attempt to measure the contribution of a hitter toward the scoring of runs.  The formula is RC = (H + BB)(TB)/(AB + BB).  I'll be honest.  I'm not a fan of Runs Created.
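For anyone who wants to play along at home, here is a minimal Python sketch of the formula as written above. The function name and the sample stat line are my own inventions for illustration, not a real player's numbers.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic Runs Created as given above: RC = (H + BB)(TB) / (AB + BB)."""
    opportunities = at_bats + walks
    if opportunities == 0:
        # No at-bats and no walks: treat as zero runs created (an assumption).
        return 0.0
    return (hits + walks) * total_bases / opportunities


if __name__ == "__main__":
    # Hypothetical season line, made up for illustration only.
    rc = runs_created(hits=150, walks=50, total_bases=250, at_bats=500)
    print(f"Runs Created: {rc:.1f}")
```

Note that the formula can be read as (times on base) x (total bases) / (opportunities), which is the grouping used in the code.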

My problem here is that there seems to be no logical reason for bringing these variables together in the formula.  It's as though James simply threw together several statistical measures through some combination of mathematical operations, and out spits a result that "seems" to work.  My suspicion is that there is little connection between Runs Created and actual runs scored.  I checked Runs Created against the results of the games from 5/16/12 (yesterday's games from when I started writing this) and found a correlation coefficient of only .739 between Runs Created and actual Runs Scored*. I'll admit this is not nearly enough to fully prove my suspicions, but I'd bet good money I'm on the right track.  And I think this is the perfect time for me to play the I'm-A-15-Year-College-Math-Instructor-And-You're-Just-Gonna-Have-To-Trust-Me-On-This-One card, so I'm calling Runs Created a swing and a miss.

Keeping in mind the quest for the "Holy Grail" hitting statistic and keeping the focus on the scoring of runs, I'd like to end this post by asking two simple but thought-provoking questions. How are runs scored, and how can this process be measured?

Stay thirsty, my friends.

* - For all 30 teams, I used the team's combined statistics to calculate Runs Created and compared it to the actual Runs Scored for that team using a linear regression test, which gave an r-value of .739.
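The check described in the footnote can be sketched in a few lines of Python. This is only an illustration of the method: the team lines below are invented placeholders, not the actual 5/16/12 box scores, and the helper names are my own.

```python
from math import sqrt


def runs_created(hits, walks, total_bases, at_bats):
    """Basic Runs Created: RC = (H + BB)(TB) / (AB + BB)."""
    return (hits + walks) * total_bases / (at_bats + walks)


def pearson_r(xs, ys):
    """Pearson correlation coefficient (the r-value of a simple linear regression)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sample is constant.
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical single-game team lines: (H, BB, TB, AB, actual runs).
    # Made-up numbers for illustration only.
    teams = [
        (10, 4, 18, 35, 6),
        (6, 2, 8, 31, 2),
        (8, 3, 12, 33, 4),
        (12, 5, 22, 38, 9),
    ]
    rc_values = [runs_created(h, bb, tb, ab) for h, bb, tb, ab, _ in teams]
    actual_runs = [r for *_, r in teams]
    print(f"r = {pearson_r(rc_values, actual_runs):.3f}")
```

One thing worth keeping in mind when reading an r-value: squaring it gives the share of variance explained, so an r of .739 works out to an r-squared of roughly .55.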

Thursday, May 17, 2012

"Good" Baseball Statistics

You know, to me, baseball statistics are a lot like medical tests. In the same way a blood test or an MRI or a CAT scan can shed light on something not previously seen, statistics can shed light on an aspect of a player's ability. Home runs reveal how often a player can hit the ball out of the park, while stolen bases indicate how speedy a player can be on the base paths.

The Sabermetrics movement has brought to light many aspects of the game that were previously ignored, giving rise to a good number of new hitting and pitching statistics.  These new statistics have gained traction due in large part to the fantasy sports industry, which has a seemingly insatiable thirst for the WHIPs and the OPSs and the BABIPs of players (I am admittedly one of those fantasy owners).  I'm speculating, though, that the complexity of these new stats is what is keeping them from becoming as widely used and accepted as the older ones.  Two important groups, casual fans and those in the baseball world who either can't do or aren't interested in doing math, need a baseball statistic to be simple before they will accept it.  WHIP has gained some traction, I think, because it can be simply interpreted as "base runners allowed per inning", while OPS and BABIP will likely take much longer to be accepted.

When it comes to hitting statistics, the age-old question is how do you compare the singles-hitting speedsters to the tape-measure-home-run-hitting clean-up hitters?  The "Quest for the Holy Grail" has been the attempt to create a hitting stat that can somehow be used to compare all hitters in a fair manner.  OPS (On Base Percentage Plus Slugging Percentage) is probably the statistic that comes closest to accomplishing this today.

However, I see several things going against OPS.  I'll address the "math" issue first.  Creating a statistic by adding two other statistics is a questionable tactic.  Without knowing the mean or standard deviation of OBP and Slugging Percentage, I'd speculate that there is greater variation in Slugging Percentage, which would allow it to dominate OBP.  Essentially, Slugging Percentage is Batman and On Base Percentage is Robin.  I could exaggerate this effect by creating a new stat, say, Stolen Bases Plus Batting Average.  Because Stolen Bases are counting numbers and Batting Averages are three-digit decimals less than 1, SBPBA is almost entirely reliant on the number of Stolen Bases.
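To see the SBPBA effect in action, here is a small Python sketch. The four players and their numbers are invented for illustration; the point is only that ranking by the sum reproduces the ranking by Stolen Bases and ignores Batting Average.

```python
# Hypothetical players: (name, stolen bases, batting average).
# All numbers are made up for illustration.
players = [
    ("Player A", 40, 0.250),
    ("Player B", 5, 0.340),
    ("Player C", 20, 0.290),
    ("Player D", 0, 0.310),
]


def sbpba(stolen_bases, batting_average):
    """Stolen Bases Plus Batting Average, the deliberately silly stat above."""
    return stolen_bases + batting_average


def ranking(key):
    """Return player names sorted from highest to lowest by the given key."""
    return [p[0] for p in sorted(players, key=key, reverse=True)]


by_sbpba = ranking(lambda p: sbpba(p[1], p[2]))
by_sb = ranking(lambda p: p[1])
by_ba = ranking(lambda p: p[2])

if __name__ == "__main__":
    print("Ranked by SBPBA:", by_sbpba)
    print("Ranked by SB:   ", by_sb)
    print("Ranked by BA:   ", by_ba)
```

With OBP and Slugging Percentage the imbalance would be far less extreme, since both live on a similar scale, but the same mechanism applies: whichever component varies more does most of the sorting.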

A lack of familiarity with both On Base Percentage and Slugging Percentage among casual fans and non-math-savvy baseball enthusiasts, as I stated earlier, may prevent OPS from obtaining the ubiquity of the traditional baseball stats, but ultimately I think its biggest downside is the lack of simplicity. If a player has an OPS of .850, what does that mean exactly? Is he an average hitter, an above-average hitter, or a below-average hitter? Should we be anticipating that a run is about to score? Or that a home run or extra-base hit is about to be hit?

For me then, a "good" baseball stat is one that is simple to derive meaning from, but also shows something about the abilities of players that other stats cannot.

More to come.