Pitching Variance
Posted: 14 October 2006 11:53 PM

This post can also be found—with graphics—at:  Pitching Variance

When comparing starting pitching in a playoff series, one should NOT use season averages (BB/9, K/9, SNLVAR, ERA, etc.) to compare the two teams’ rotations. Because pitchers only pitch once or twice per series, there is little chance that a pitcher’s performance in an individual game will lie close to his regular season average.

For example, Pedro Martinez struck out about 28 percent of the batters that he faced (prior to his injury in late June), but on some days he was red hot (like when he struck out 11 of the 25 batters he faced in San Diego on 22 April). On other days he was not so hot (like when he only struck out 3 of the 25 batters he faced in Washington on 12 April).

Averages are important, but when comparing the pitchers in an individual game, it’s far more important to look at a pitcher’s variance.

Intuitively, a rational manager would like to start a pitcher who gives up few walks and few home runs. Such a manager would also like to be reasonably certain that the number of walks and home runs that the pitcher gives up will lie close to his (low) average. (A similar statement can also be made for strikeouts.)

To that end, I calculated a pitcher’s probability of striking out a batter, walking a batter and giving up a home run to a batter and I also calculated the standard deviation around those averages.

Before getting into the specific details of how the probabilities were calculated, let’s have a little fun and look at the probabilities for each starting pitcher in the Mets, Cardinals and Tigers rotations, as well as the confidence intervals around those probabilities.

Once again, it’s easier to understand the probabilities and confidence intervals by viewing the graphics at the page:  Pitching Variance

Here are the probabilities that a pitcher will strike out a batter:

pitcher         lower 90% CI   SO/Bat   upper 90% CI
STL-Weaver      4%             12%      30%
NYM-Glavine     3%             13%      39%
DET-Robertson   4%             13%      35%
STL-Carpenter   10%            19%      35%
NYM-Maine       4%             16%      46%
DET-Verlander   3%             13%      39%
STL-Suppan      5%             11%      24%
NYM-Trachsel    3%             9%       25%
DET-Rogers      3%             10%      27%
STL-Reyes       3%             15%      53%
NYM-Perez       4%             16%      48%
DET-Bonderman   9%             21%      42%

Here are the probabilities that a pitcher will walk a batter:

pitcher         lower 90% CI   BB/Bat   upper 90% CI
STL-Weaver      1%             5%       18%
NYM-Glavine     1%             6%       22%
DET-Robertson   2%             6%       20%
STL-Carpenter   1%             4%       16%
NYM-Maine       2%             7%       29%
DET-Verlander   1%             6%       21%
STL-Suppan      2%             7%       22%
NYM-Trachsel    3%             9%       23%
DET-Rogers      1%             6%       22%
STL-Reyes       2%             7%       25%
NYM-Perez       3%             11%      37%
DET-Bonderman   2%             6%       17%

Here are the probabilities that a pitcher will give up a home run to a batter:

pitcher         lower 90% CI   HR/Bat   upper 90% CI
STL-Weaver      1%             4%       14%
NYM-Glavine     0%             2%       9%
DET-Robertson   1%             3%       11%
STL-Carpenter   1%             2%       8%
NYM-Maine       1%             3%       15%
DET-Verlander   0%             2%       10%
STL-Suppan      1%             2%       8%
NYM-Trachsel    0%             2%       11%
DET-Rogers      0%             2%       9%
STL-Reyes       1%             3%       16%
NYM-Perez       1%             4%       12%
DET-Bonderman   0%             2%       8%


Technical Notes

Now, the more technically minded among you probably want to know how I computed these probabilities and standard deviations.

First, for each game that the pitcher started, I computed the ratio of strikeouts to the number of batters faced*, the ratio of walks to the number of batters faced and the ratio of home runs to the number of batters faced. This yields estimates of the probabilities of striking out a batter, walking a batter and giving up a home run to a batter (in each game).

Since we’re dealing with percentages, which are bounded between 0 and 100, each probability has to be converted into the natural logarithm of its odds ratio to obtain an unbounded continuous variable:

ln odds ratio = ln(prob) - ln(1-prob)
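
As a quick worked illustration of the formula above, here is a minimal Python sketch applied to the three Pedro Martinez strikeout rates quoted earlier in the post. The function name `log_odds` is only a label chosen for this example.

```python
import math

def log_odds(prob):
    """ln(prob) - ln(1 - prob), the transformation given above."""
    return math.log(prob) - math.log(1.0 - prob)

# Strikeout rates quoted earlier in the post for Pedro Martinez.
season = 0.28          # about 28 percent of batters faced
san_diego = 11 / 25    # 22 April
washington = 3 / 25    # 12 April

for label, p in [("season", season),
                 ("San Diego", san_diego),
                 ("Washington", washington)]:
    print(f"{label}: prob = {p:.2f}, ln odds ratio = {log_odds(p):.3f}")
```

The three values come out at roughly -0.944, -0.241 and -1.992. Unlike the raw percentages, they are free to range over the whole real line, which is what makes an average and a standard deviation meaningful.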

Then, I computed the average and standard deviation of the natural logarithms of the odds ratios, computed the 90% confidence interval and converted those intervals back into percentages:

prob. = exp(ln odds ratio)/(1 + exp(ln odds ratio))
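
The steps above can be sketched end to end in Python. This is only one reading of the method, under two assumptions that the post does not state: the 90% interval is taken as the average plus or minus 1.645 standard deviations of the per-game values (a normal approximation), and a small adjustment (0.5 added to the event count, 1 added to batters faced) keeps games with zero strikeouts, walks or home runs from producing ln(0). All function names are hypothetical.

```python
import math
from statistics import mean, stdev

Z_90 = 1.645  # assumption: two-sided 90% normal quantile

def logit(prob):
    return math.log(prob) - math.log(1.0 - prob)

def inv_logit(x):
    return math.exp(x) / (1.0 + math.exp(x))

def per_game_rate(events, batters):
    # Assumption, not from the post: the +0.5 / +1 adjustment keeps
    # the rate strictly between 0 and 1 so its log odds is finite.
    return (events + 0.5) / (batters + 1.0)

def rate_interval(games):
    """games: list of (events, batters_faced) pairs, one per start.

    Returns (lower, centre, upper) as probabilities.
    """
    values = [logit(per_game_rate(e, n)) for e, n in games]
    centre = mean(values)
    spread = stdev(values)  # needs at least two games
    return (inv_logit(centre - Z_90 * spread),
            inv_logit(centre),
            inv_logit(centre + Z_90 * spread))

# The only per-game figures given in the post: Pedro Martinez's
# strikeouts in San Diego (11 of 25) and Washington (3 of 25).
# Two games are far too few for a real estimate; this only shows
# the mechanics.
lower, centre, upper = rate_interval([(11, 25), (3, 25)])
print(f"{lower:.0%}  {centre:.0%}  {upper:.0%}")
```

Because the interval is built on the log odds scale and then converted back, it is asymmetric around the centre and can never fall below 0% or above 100%, which matches the shape of the intervals in the tables.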


* I took the raw numbers from MLB game logs, so I didn’t get the precise number of batters faced in each game. Instead, I had to add: outs + hits + walks to get the approximate number of batters. The trouble here is that a batter who reaches first base and then gets caught stealing is counted as two batters.
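
The approximation in this footnote can be sketched as a small Python helper. It assumes the game log reports innings pitched in the usual box-score notation, where "6.1" means six and one third innings; the function names are hypothetical.

```python
def outs_from_innings(innings_pitched: str) -> int:
    """Convert box-score innings such as '6.1' (six and one third) to outs."""
    whole, _, thirds = innings_pitched.partition(".")
    return 3 * int(whole) + int(thirds or 0)

def approx_batters_faced(innings_pitched: str, hits: int, walks: int) -> int:
    """Approximate batters faced as outs + hits + walks.

    This overcounts a runner who reaches base and is then put out on
    the bases (for example, caught stealing), as noted in the footnote.
    """
    return outs_from_innings(innings_pitched) + hits + walks

# Example: 6.1 innings, 4 hits, 3 walks -> 19 outs + 4 + 3
print(approx_batters_faced("6.1", 4, 3))  # 26
```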

Posted: 15 October 2006 01:22 AM   [ # 1 ]

When I originally started looking at pitching variance, I was only interested in the confidence intervals, so that we could meaningfully compare starting pitchers for the Mets-Cardinals series and (unfortunately for my Mets) the Cardinals-Tigers series.

Now that my poor Mets are sitting on Tommy Lasorda’s couch watching the post-season on Fox, I’ve started to think what else we could do with the variance.

For example, although sabermetricians have attacked ERA for years, pitchers are still charged with the runs scored by batters who reached base while they were pitching. ... But why does the idea persist that a good pitcher can limit the number of runs scored against him?

Is there any correlation between the number of runs scored against a pitcher and the pitcher’s variance? To my knowledge, previous research has only looked for a correlation between the number of runs scored against a pitcher and the pitcher’s averages. The role of variance has never been analyzed.

I’d like to run some Poisson regressions to see if the number of runs charged to a pitcher depends on the variance of his strikeout rate, the variance of his walk rate and the variance of his home run rate.
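
For anyone curious what such a regression involves, here is a minimal Python sketch of a Poisson regression fitted by Newton-Raphson. It is deliberately simplified to an intercept and a single predictor (say, the spread of one rate), whereas the regression described above would use three; the function name and the check data are made up for illustration and are not real pitcher data.

```python
import math

def fit_poisson(x, y, iterations=25):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0 = math.log(sum(y) / len(y))  # start at the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), 2 x 2 and symmetric.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system directly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up check data where y = 2 ** x exactly, so the fit should
# recover b0 close to 0 and b1 close to ln 2.
b0, b1 = fit_poisson([0, 1, 2, 3], [1, 2, 4, 8])
print(b0, b1, math.log(2))
```

With real game logs, `y` would be the runs charged to each pitcher and `x` the variance measure of interest; a positive `b1` would suggest that less consistent pitchers are charged with more runs.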

To do that, I’ll need game-by-game logs for each starting pitcher.

Assembling that stuff is incredibly time-consuming, so I was wondering if anyone has already compiled those logs and would be willing to share their dataset.

Please let me know if you have a dataset that you could share.

Stats Junkie

Posted: 16 October 2006 03:06 PM   [ # 2 ]



"Cerebus: Lord Julius always said that insanity was the last line for defense for the master bureaucrat
Elf: I don't get it
Cerebus: It's hard to get a refund when the salesman is sniffing your crotch and baying at the moon
Elf: Oh…I get it now
Cerebus: Insanity is a virtually impregnable gambit…
...but you have to lay the groundwork early in the game…"

Cerebus 29

Posted: 16 October 2006 04:17 PM   [ # 3 ]

[post deleted by stats junkie]