Monday, June 4, 2012

What's The Deal With Christhian Martinez?

I'm going to take a break from Runs Credited for this post to focus on pitching for a minute.

I've been in an NL-only old school roto 4x4 fantasy baseball league since 1989. The other owners are sharks. Seriously, like Great White Sharks. I'm a bit ashamed to admit how badly I get my ass kicked some years. The owners are so shrewd that it's nearly impossible to have any kind of edge, since every owner knows every NL player as well as many of the minor leaguers.

Back in February, I was preparing for this year's auction, focusing on pitchers.  I was just thinking about what makes them good and whatnot when I dreamed up a new stat. I can't publicly say what the stat is*, but it does make sense** that good pitchers are good at this and poor pitchers are poor at it, regardless of whether they are starters or relievers.

I calculated this stat (I'll call it Stat X) for every NL pitcher with 25 or more innings pitched. I chose 25 innings because Ryan Franklin pitched 27 2/3 innings, or rather 27 2/3 horrendous innings, in his disastrous 2011 season. I figured if a pitcher didn't pitch more than Ryan Franklin, I wasn't interested in him. I quickly calculated the mean and standard deviation for Stat X, then determined the corresponding z-score for each pitcher. Voilà, Stat X, ready to be scrutinized.
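For anyone who wants to try the same kind of screen on a stat of their own, here is a minimal sketch of the filter-then-z-score step. The pitcher names, innings, and stat values are made up for illustration (Stat X itself stays secret), and two things are assumptions on my part for the sketch: it uses the population standard deviation, and it treats higher values as better.

```python
from statistics import mean, pstdev

MIN_INNINGS = 25  # the innings cutoff described in the post


def z_scores(values):
    """Return (value - mean) / standard deviation for each value."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD: an assumption for this sketch
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mu) / sigma for v in values]


# Hypothetical data: (name, innings pitched, stat value). Not real numbers.
pitchers = [
    ("Pitcher A", 71.0, 0.48),
    ("Pitcher B", 212.0, 0.35),
    ("Pitcher C", 48.0, 0.22),
    ("Pitcher D", 27.7, 0.10),
    ("Pitcher E", 12.0, 0.55),  # below the cutoff, so excluded
]

# Keep only pitchers at or above the innings cutoff.
qualified = [(name, x) for name, ip, x in pitchers if ip >= MIN_INNINGS]

# Map each qualified pitcher to his z-score.
zs = z_scores([x for _, x in qualified])
scores = {name: z for (name, _), z in zip(qualified, zs)}

# Print from best to worst, assuming higher is better.
for name, z in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {z:+.2f}")
```

A z-score near 0 is an average pitcher by this measure, while something like 1.8 sits well above the pack.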

Before going any further, I just want to say that these are the results from 2011, and their predictive value, like any other stat from last year, is entirely questionable and, well, unknown.

There are some interesting z-scores on the list. Let's start with Sergio Romo, the highest on this list. His z-score compelled me to look up his other stats from last year, and I found that he did indeed have an excellent year. I picked him up at the auction along with Joe Blanton (z-score 1.83) and Jeff Karstens (z-score 1.58) based on their Stat X from last year. Our league changed the Saves category to Saves + Holds/2 starting this year, so Romo has been especially valuable. Blanton has not been as good as he was last year, but as of his last outing his Stat X z-score for this year is .93, so he remains on my team for now. Karstens has been hurt, but his return is imminent.

On Philadelphia's staff, Hamels, Lee, and Halladay were quite good at Stat X. Hamels rated a little better last year, and he may be better than Lee and Halladay again this year due to their injuries.

On the other end of the spectrum, Carlos Marmol, Brian Wilson, Edinson Volquez, and Heath Bell rated very poorly last year.  Volquez has been better this year, perhaps because of the change in scenery, but the others are doing poorly this year as well, so beware.

Finally, what's the deal with Christhian Martinez? He had the third-best z-score (1.85) last year. His other stats were also very good. This year, he has pitched well, except for a 10-day stretch in mid-May. The Braves, though, only use him for long relief and mop-up duty. I realize they have a great thing going with O'Flaherty, Venters, and Kimbrel to finish off close games, but Martinez is wasting away. Some team ought to make an offer on him.

*If I gave away publicly what this stat is, the other owners would seriously make me pay for it. I'm willing to show you what I came up with - just click the link that says "Stat X". Write me and convince me you're not in my fantasy league, and I'll even tell you what Stat X is.

**I ran this by a couple of people whose baseball wisdom I respect, and they agreed that Stat X is potentially a valuable pitching stat.

Friday, June 1, 2012

One Of My Favorite Plays Of All-Time...

Bases loaded, Neifi Perez on third, Barry Bonds at the plate, 1 out.  I laugh every time I see this play (you've got to watch the clip all the way through to fully see what happened).

This play is a clue to where I'm headed here. A partial answer to how runs are scored is "In every way you can possibly imagine", including a heads-up base runner sneaking in right in front of fielders who have the ball in their hands and could tag him out easily if they wanted to!

Wednesday, May 30, 2012

Tugging On Superman's Cape

Considering the number of hitting statistics, it is curious how few focus on runs. Runs, after all, are what decide ball games. It's not hits or home runs or stolen bases, but runs. The best known statistics that focus on runs are Runs Batted In and Runs (Scored). Many believe that these two statistics can be poor describers of an individual batter's performance since both are so heavily affected by the quality of the teammates batting in the lineup with that batter. To some extent, I agree.

Bill James created a stat called "Runs Created" in an attempt to measure the contribution of a hitter toward the scoring of runs.  The formula is RC = (H + BB)(TB)/(AB + BB).  I'll be honest.  I'm not a fan of Runs Created.
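To make the formula concrete, here is a small sketch that simply evaluates it as written above. The function name and the sample stat line are hypothetical, chosen only to show the arithmetic.

```python
def runs_created(h, bb, tb, ab):
    """Evaluate RC = (H + BB)(TB) / (AB + BB) as given in the post."""
    denominator = ab + bb
    if denominator == 0:
        raise ValueError("AB + BB must be greater than zero")
    return (h + bb) * tb / denominator


# Hypothetical season line: 150 hits, 50 walks, 250 total bases, 500 at-bats.
rc = runs_created(h=150, bb=50, tb=250, ab=500)
print(round(rc, 1))  # (150 + 50) * 250 / (500 + 50) = 50000 / 550, about 90.9
```

Notice that the first factor over the denominator, (H + BB)/(AB + BB), is a rough on-base rate, which is then multiplied by total bases.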

My problem here is that there seems to be no logical reason for bringing these variables together in the formula. It's as though James has simply thrown together several statistical measures through some combination of mathematical operations, and out spits a result that "seems" to work. My suspicion is that there is little connection between Runs Created and actual runs scored. I checked Runs Created against the results of the games from 5/16/12 (yesterday's games from when I started writing this) and found a correlation of only .739 between Runs Created and actual Runs Scored*. I'll admit this is not nearly enough to fully prove my suspicions, but I'd bet good money I'm on the right track. And I think this is the perfect time for me to play the I'm-A-15-Year-College-Math-Instructor-And-You're-Just-Gonna-Have-To-Trust-Me-On-This-One card, so I'm calling Runs Created a swing and a miss.

Keeping in mind the quest for the "Holy Grail" hitting statistic and keeping the focus on the scoring of runs, I'd like to end this post by asking two simple but thought-provoking questions. How are runs scored, and how can this process be measured?

Stay thirsty my friends.

* - For all 30 teams, I used each team's combined statistics to calculate Runs Created and compared it to that team's actual Runs Scored using the Linear Regression Test, resulting in an r-value of .739.
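If you'd like to run this kind of check yourself, the r-value from a linear regression is the Pearson correlation coefficient. Below is a minimal sketch that computes it by hand. The two lists are made-up placeholder numbers, not the 5/16/12 data, and the function and variable names are my own for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


# Placeholder values only: one Runs Created estimate and one actual run
# total per team for a single day of games.
rc_estimates = [4.5, 2.1, 6.8, 3.3, 5.0, 1.2]
runs_scored = [5, 1, 6, 4, 3, 2]

r = pearson_r(rc_estimates, runs_scored)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

Squaring r gives the share of the variation in runs scored that the estimate accounts for, so an r of .739 works out to an r squared of about .55.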

Thursday, May 17, 2012

"Good" Baseball Statistics

You know, to me, baseball statistics are a lot like medical tests. In the same way a blood test or an MRI or a CAT scan can shed light on something not previously seen, statistics can shed light on an aspect of a player's ability. Home runs reveal how often a player can hit the ball out of the park while stolen bases indicate how speedy a player can be on the base paths.

The Sabermetrics movement has brought to light many aspects of the game that were previously ignored, giving rise to a good number of new hitting and pitching statistics. These new statistics have gained traction due in large part to the fantasy sports industry, which has a seemingly insatiable thirst for the WHIPs and the OPSs and the BABIPs of players (I am admittedly one of those fantasy owners). I'm speculating that the complexity of these new stats is what is keeping them from becoming as widely used and accepted as the older ones, though. Two important groups, casual fans and those in the baseball world who either can't do or aren't interested in doing math, need a baseball statistic to be simple before they will accept it. WHIP has gained some traction, I think, because it can be simply interpreted as "base runners allowed per inning", while OPS and BABIP likely will take much longer to be accepted.

When it comes to hitting statistics, the age-old question is how do you compare the singles-hitting speedsters to the tape-measure-home-run-hitting clean-up hitters? The "Quest for the Holy Grail" has been the attempt at creating a hitting stat that can somehow be used to compare all hitters in a fair manner. OPS (On Base Percentage Plus Slugging Percentage) is probably the statistic that comes closest to accomplishing this today.

However, I see several things going against OPS. I'll address the "math" issue first. Creating a statistic by adding two other statistics is a questionable tactic. Without knowing the mean or standard deviation of OBP and Slugging Percentage, I'd speculate that there is greater variation in Slugging Percentage, which would allow it to dominate OBP. Essentially, Slugging Percentage is Batman and On Base Percentage is Robin. I could exaggerate this effect by creating a new stat, say, Stolen Bases Plus Batting Average. Because Stolen Bases are Counting Numbers and Batting Averages are three-digit decimals, SBPBA is almost entirely reliant on the number of Stolen Bases.
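Here is a quick sketch of the SBPBA point. The five players and their numbers are invented purely for illustration; the idea is just to show that when one component varies far more than the other, the sum ranks players the same way the bigger component does.

```python
from statistics import pvariance

# Invented players: (name, stolen bases, batting average). Not real data.
players = [
    ("Player A", 40, 0.265),
    ("Player B", 5, 0.330),
    ("Player C", 22, 0.290),
    ("Player D", 0, 0.310),
    ("Player E", 12, 0.240),
]

stolen_bases = [sb for _, sb, _ in players]
batting_avgs = [ba for _, _, ba in players]

# Rank the players by stolen bases alone, then by SB + BA.
order_by_sb = [p[0] for p in sorted(players, key=lambda p: p[1], reverse=True)]
order_by_sum = [
    p[0] for p in sorted(players, key=lambda p: p[1] + p[2], reverse=True)
]
same_order = order_by_sb == order_by_sum

# Compare how much each component varies across the group.
variance_ratio = pvariance(stolen_bases) / pvariance(batting_avgs)

print("Same ranking:", same_order)
print(f"SB variance is about {variance_ratio:,.0f} times BA variance")
```

In this made-up group, the best batting average belongs to a player who ranks fourth in SBPBA, because batting average never moves the total by even one stolen base. With OBP and Slugging Percentage the gap in variation would be far smaller, so whether Slugging really dominates is something you'd have to check against actual numbers.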

A lack of familiarity with both On Base Percentage and Slugging Percentage among casual fans and non-math-savvy baseball enthusiasts, as I stated earlier, may prevent OPS from obtaining the ubiquity of the traditional baseball stats, but ultimately I think its biggest downside is the lack of simplicity. If a player has an OPS of .850, what does that mean exactly? Is he an average hitter, an above average hitter, or a below average hitter? Should we be anticipating that a run is about to score? Maybe a home run or extra base hit is about to be hit.

For me then, a "good" baseball stat is one that is simple to derive meaning from, but also shows something about the abilities of players that other stats cannot.

More to come.