Baseball for the Thinking Fan

Primate Studies
— Where BTF's Members Investigate the Grand Old Game

Saturday, August 14, 2004

Research Note: Tangotiger’s Pitch Count Estimators

Information to accompany Walt’s analysis.

In his recent article on seasonal pitch counts over time, Steve Treder used Tangotiger’s “basic” pitch count estimator (bPCE). Tangotiger has also introduced a more expanded version (xPCE).

I want to address two issues here. First is how well these estimators work in predicting actual pitch counts for a moderate-sized sample of pitchers. Second is how these two estimators relate over the study period 1946-2003.

Empirical Verification

I have two small sets of seasonal pitch count data against which to verify these estimators. From ESPN, I downloaded seasonal pitch counts for qualified starters (i.e. at least 1 IP per team game played) in the years 2001 to 2003. Vinay Kumar provided me with summary data compiled from for all pitchers in the years 1988 to 1990. In this latter data set, there were a few PAs for most pitchers for which pitch information wasn’t available, so the pitch total was estimated assuming those missing PAs were average PAs for that pitcher.

The following table summarizes the quality of the fit of the two estimators:
                        Qualified   Qualified   All
                        2001-03     1988-90     1988-90
N                       261         244         1,367

bPCE
r (correlation)         .934        .951        .998
R2 (var. explained)     .872        .904        .996
Mean Error              +61.5       +177        +72

xPCE
r (correlation)         .938        .959        .998
R2 (var. explained)     .879        .920        .996
Mean Error              -53.2       +41         +18

These correlate quite well with actual pitch counts. Even so, at least for qualified starters, 8-12% of the variance remains unaccounted for. Nevertheless, based on these small-sample comparisons, the estimators should work well for noticing trends. The Spearman rank correlations are also quite high, so they appear well suited for ranking pitchers.
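For anyone who wants to run the same kind of check on their own data, here is a minimal Python sketch of the summary statistics used above: Pearson r, R2 (taken here as r squared), the Spearman rank correlation (Pearson r on ranks), and mean error. The function names and the sample numbers are hypothetical, and the sign convention (estimated minus actual, so a positive value means an overestimate) is an assumption consistent with the discussion of overestimates in this note.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def fit_summary(estimated, actual):
    """Summarize how well estimated pitch counts track actual ones."""
    r = pearson(estimated, actual)
    n = len(actual)
    return {
        "r": r,
        "r2": r * r,
        "spearman": pearson(ranks(estimated), ranks(actual)),
        # Assumed convention: positive mean error = overestimate.
        "mean_error": sum(e - a for e, a in zip(estimated, actual)) / n,
    }


# Hypothetical pitcher-seasons (made-up numbers, not from either data set).
estimated = [3100, 3300, 2900, 3500]
actual = [3000, 3250, 2850, 3400]
summary = fit_summary(estimated, actual)
print(summary)
```

Note that a high r says nothing about bias: an estimator that is always 100 pitches too high would still correlate perfectly, which is why the mean error is reported separately.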

The mean errors are a little more troubling. They’re not huge and of relatively little concern for the xPCE, but are higher than we’d like for the bPCE. That’s still only about 2-5 pitches per start, but that may be the difference between eras.
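As a rough check on the per-start figure, the sketch below divides the two bPCE mean errors for qualified starters (+61.5 and +177 pitches per season) by a season's worth of starts. The figure of 33 starts is an assumption for illustration; the note itself does not state the number used.

```python
# Assumption: a qualified starter makes roughly 33 starts in a season.
STARTS_PER_SEASON = 33

# bPCE mean errors (pitches per season) for the two qualified-starter samples.
bpce_mean_errors = [61.5, 177.0]

per_start = [err / STARTS_PER_SEASON for err in bpce_mean_errors]
for season_err, start_err in zip(bpce_mean_errors, per_start):
    print(f"{season_err:+.1f} per season is about {start_err:+.1f} per start")
```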

I wondered if the mean error of the PCEs was stable for different pitch counts:

                  Qualified   Qualified
                  2001-03     1988-90
N                 261         244

bPCE
<3,000 pitches    +86         +201
3,000-3,399       +55         +172
3,400+            +36         +144

xPCE
<3,000 pitches    -24         +67
3,000-3,399       -61         +34
3,400+            -80         +9

There’s clearly a difference based on the number of pitches. Oddly, the raw number of pitches incorrectly estimated is higher with fewer pitches. Note that this is bias, not variance. This downward trend suggests that at higher levels of pitch counts, such as those in the 1970s, both estimators might underestimate actual pitch counts.
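The breakdown above can be sketched as follows in Python. The bin edges match the table (under 3,000, 3,000-3,399, and 3,400 or more), but binning on actual rather than estimated pitches is an assumption, and the function name and sample numbers are hypothetical.

```python
from bisect import bisect_right

BIN_EDGES = [3000, 3400]
BIN_LABELS = ["<3,000", "3,000-3,399", "3,400+"]


def mean_error_by_bin(estimated, actual):
    """Mean of (estimated - actual) within each bin of actual pitch counts."""
    totals = [0.0] * len(BIN_LABELS)
    counts = [0] * len(BIN_LABELS)
    for est, act in zip(estimated, actual):
        # bisect_right puts 3000 in the middle bin and 3400 in the top bin.
        i = bisect_right(BIN_EDGES, act)
        totals[i] += est - act
        counts[i] += 1
    return {
        label: (totals[i] / counts[i] if counts[i] else None)
        for i, label in enumerate(BIN_LABELS)
    }


# Hypothetical pitcher-seasons (made-up numbers, not from either data set).
estimated = [2900, 3100, 3500, 3450]
actual = [2800, 3050, 3480, 3400]
by_bin = mean_error_by_bin(estimated, actual)
print(by_bin)
```

A mean error that shrinks steadily from the lowest bin to the highest, as in the table, is the pattern that points to bias varying with workload rather than to noisier estimates.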

In summary, these estimators appear to be fine for spotting trends across seasons and perhaps for ranking pitchers within a season. The xPCE appears to be better than the bPCE. But care should be taken when using individual pitch count estimates. The estimates appear to be biased, especially the bPCE which overestimates pitch counts consistently in these samples. Like any model, it may fit the sample quite well but still produce poor estimates for at least some individuals. Of course, these results are small sample results and these estimators should be tested against a larger sample.

Relationship over Time

While I don’t have the data to assess how well these estimators predict actual pitch counts over a long period of time, I can look at how they relate to one another over time. For each 10-year period between 1946 and 2003, I calculated the ratio of the bPCE to the xPCE. Remember, the years 1981, 1994, and 1995 are deleted due to the strikes.

Year Ratio
1946-1955 1.053
1956-1965 1.042
1966-1975 1.042
1976-1985 1.049
1986-1995 1.042
1996-2003 1.035

Nothing to be too alarmed about, but the gap does vary by decade and perhaps by the pitch count, so using the bPCE probably overstates era differences.
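A minimal Python sketch of the period comparison, for readers who want to repeat it. It assumes the ratio is taken as the summed bPCE over the summed xPCE within each period (the note does not say whether totals or per-pitcher averages were used), and the input format, function names, and sample numbers are hypothetical.

```python
STRIKE_YEARS = {1981, 1994, 1995}  # seasons dropped from the comparison


def period_start(year, first_year=1946, width=10):
    """First year of the 10-year period containing `year` (1946, 1956, ...)."""
    return first_year + ((year - first_year) // width) * width


def ratio_by_period(season_totals):
    """season_totals maps year -> (bPCE total, xPCE total) for that season."""
    sums = {}
    for year, (bpce, xpce) in season_totals.items():
        if year in STRIKE_YEARS:
            continue
        start = period_start(year)
        b_sum, x_sum = sums.get(start, (0.0, 0.0))
        sums[start] = (b_sum + bpce, x_sum + xpce)
    # Assumed definition: ratio of period totals, not a mean of yearly ratios.
    return {start: b / x for start, (b, x) in sorted(sums.items())}


# Hypothetical season totals (made-up numbers, for illustration only).
season_totals = {
    1946: (105.0, 100.0),
    1955: (210.0, 200.0),
    1956: (104.0, 100.0),
    1981: (999.0, 1.0),  # strike year, ignored
}
ratios = ratio_by_period(season_totals)
print(ratios)
```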

Walt Davis Posted: August 14, 2004 at 08:17 PM | 4 comment(s)

Reader Comments and Retorts



   1. Chris Dial Posted: August 14, 2004 at 08:54 PM (#797781)

I’ll pretty this up later.  Patience is a virtue.  Or Dan/Jim can.

   2. Human Papelbon Virus Posted: August 15, 2004 at 03:47 AM (#798492)

Those numbers are quite good.

I’m assuming Tango only had access to a few years’ worth of actual pitch count numbers, which means that the samples used here are samples that were used to create the pitch count estimator in the first place (so of course the R^2 will be high, it’s optimized for this sample). It’s a given that the estimator will be less accurate for other samples that weren’t used to generate the equation. It’s not Walt’s fault - he’s presenting all he can - but it’s still a little misleading because this accuracy will not hold up for other samples. It’s just a matter of how big a difference should we expect when trying to make estimates?

   3. Walt Davis Posted: August 16, 2004 at 10:08 PM (#800796)

Actually, it’s even more amazing. And the “in-sample” nature is not that big a deal.

First, I did have two sets of data 2001-2003 and 1988-1990 and the results look pretty similar.

Second, the bPCE is just a straight linear function.  I wouldn’t say it was “optimized” really.

Third, I’m not clear if he’s referring to the xPCE only or both, but Tango tells me that he came up with the formula based just on the league average and Randy Johnson and Brad Radke (and theory).  So these really are out-of-sample tests and it really is amazing the estimators fit this well.

I do expect differences across eras but I doubt they’ll be so dramatic as to change the conclusion that it’s a pretty good estimator.

Doing some additional work on the 2001-2003 data, it appears that we get more accuracy if we break BIP into GB, FB, other with GB taking fewer pitches than FB.  Sac bunts allowed and GDP are also significant (fewer pitches).  But that full model only buys us about 3 points of R-square.

There doesn’t appear to be too much heteroscedasticity (i.e. increasing error variance as np increases).  So the big remaining concern is the bias-np relationship which suggests some non-linearity.  I haven’t figured out the relationship yet though and I’m probably not going to put too much more effort into it given how well it already works.

Again, this note looks just at the level of pitcher-season.  It may not work well for game pitch counts.  I’m not super-comfortable in using it for point estimation—i.e. Jose Lima threw X pitches in 1998—even at the seasonal level.  Both these argue against using it to estimate something like PAP3.  But for correlational or trend analysis, it would seem to work quite well as a substitute for seasonal pitch counts and it seems to work just fine for ranking pitchers (at least within a season).

   4. Human Papelbon Virus Posted: August 17, 2004 at 01:59 AM (#801274)

Second, the bPCE is just a straight linear function. I wouldn’t say it was “optimized” really.

Third, I’m not clear if he’s referring to the xPCE only or both, but Tango tells me that he came up with the formula based just on the league average and Randy Johnson and Brad Radke (and theory). So these really are out-of-sample tests and it really is amazing the estimators fit this well.

I assumed tango used linear regression to create his equations using a data set of at least a few years, hence the “optimized” comment.

If he really did just create the model using league average, randy, and radke, it begs the question why didn’t he use multiple regression and a larger sample? I’d assume it would be more accurate and have better generalizability.
