Research Note: Tangotiger’s Pitch Count Estimators
Information to accompany Walt’s analysis.
In his recent article on seasonal pitch counts over time, Steve Treder used Tangotiger’s “basic” pitch count estimator, or bPCE (http://www.tangotiger.net/pitchCounts.html). Tangotiger has also introduced an expanded version, or xPCE (http://www.tangotiger.net/pitchCountEstimator.html).
I want to address two issues here. The first is how well these estimators predict actual pitch counts for a moderate-sized sample of pitchers. The second is how the two estimators relate to each other over the study period, 1946-2003.
I have two small sets of seasonal pitch count data against which to verify these estimators. From ESPN, I downloaded seasonal pitch counts for qualified starters (i.e. at least 1 IP per team game played) in the years 2001 to 2003. Vinay Kumar provided me with summary data compiled from Retrosheet.org for all pitchers in the years 1988 to 1990. In this latter data set, there were a few PAs for most pitchers for which pitch information wasn’t available, so the pitch total was estimated assuming those missing PAs were average PAs for that pitcher.
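The fill-in rule for the 1988-1990 data can be written out in a few lines. This is only a sketch of the rule as described above; the function name, argument names, and the example pitcher are hypothetical and not taken from the Retrosheet summary.

```python
def fill_missing_pitches(known_pitches: int, known_pa: int, missing_pa: int) -> float:
    """Estimate a season pitch total when some PAs have no pitch data.

    Missing PAs are assumed to be average PAs for that same pitcher,
    i.e. they are charged at his observed pitches-per-PA rate.
    """
    if known_pa <= 0:
        raise ValueError("need at least one PA with pitch data")
    pitches_per_pa = known_pitches / known_pa
    return known_pitches + missing_pa * pitches_per_pa


# Hypothetical pitcher: 3,420 pitches recorded over 900 PAs, 10 PAs without data.
example_total = fill_missing_pitches(3420, 900, 10)
```

Because only a few PAs per pitcher were missing, this adjustment should move the season totals very little.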
The following table summarizes the quality of the fit of the two estimators:
N 261 244 1,367
bPCE:
r (correlation) .934 .951 .998
R2 (var. explained) .872 .904 .996
Mean Error +61.5 +177 +72
xPCE:
r (correlation) .938 .959 .998
R2 (var. explained) .879 .920 .996
Mean Error -53.2 +41 +18
Both estimators correlate quite well with actual pitch counts. Even so, for qualified starters roughly 8-13% of the variance remains unaccounted for. Based on these small-sample comparisons, the estimators should work well for spotting trends. The Spearman rank correlations are also quite high, so the estimators appear well suited to ranking pitchers.
The mean errors are a little more troubling. They are small and of relatively little concern for the xPCE, but they are higher than we’d like for the bPCE. That is still only about 2-5 pitches per start, but a gap of that size may be as large as the difference between eras.
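For readers who want to run the same check on their own data, here is a minimal sketch of the three fit statistics in the table (r, R2, and mean error). It assumes mean error is estimated minus actual, which matches the positive values reported for the overestimating bPCE. The function names are hypothetical, and the 33-start workload in the example is an assumption rather than a figure from this note.

```python
from math import sqrt


def fit_summary(estimated, actual):
    """Return n, Pearson r, r squared, and mean error (estimated - actual)."""
    n = len(actual)
    if n != len(estimated) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mean_e = sum(estimated) / n
    mean_a = sum(actual) / n
    sxy = sum((e - mean_e) * (a - mean_a) for e, a in zip(estimated, actual))
    sxx = sum((e - mean_e) ** 2 for e in estimated)
    syy = sum((a - mean_a) ** 2 for a in actual)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    r = sxy / sqrt(sxx * syy)
    # Positive mean error means the estimator overshoots on average.
    return {"n": n, "r": r, "r2": r * r, "mean_error": mean_e - mean_a}


def mean_error_per_start(mean_error, starts):
    """Convert a seasonal mean error into pitches per start."""
    return mean_error / starts


# Assumed workload of 33 starts for a qualified starter (not from the note).
per_start_example = mean_error_per_start(177, 33)
```

Dividing a seasonal bias by an assumed number of starts is only a rough conversion, since qualified starters do not all make the same number of starts.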
I wondered if the mean error of the PCEs was stable for different pitch counts:
N 261 244
bPCE:
<3,000 pitches +86 +201
3,000-3,399 +55 +172
3,400+ +36 +144
xPCE:
<3,000 pitches -24 +67
3,000-3,399 -61 +34
3,400+ -80 +9
The mean error clearly depends on the number of pitches thrown. Oddly, it drifts downward as pitch counts rise (note that this is bias, not variance): the bPCE overestimates by less, and the xPCE moves toward underestimating. This downward trend suggests that at higher levels of pitch counts, such as those of the 1970s, both estimators might underestimate actual pitch counts.
In summary, these estimators appear to be fine for spotting trends across seasons and perhaps for ranking pitchers within a season. The xPCE appears to be better than the bPCE. But care should be taken when using individual pitch count estimates. The estimates appear to be biased, especially those of the bPCE, which overestimates pitch counts consistently in these samples. Like any model, an estimator may fit the sample quite well but still produce poor estimates for at least some individuals. Of course, these are small-sample results, and the estimators should be tested against a larger sample.
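The breakdown by pitch count can be sketched as follows. I am assuming here that pitchers are binned by their actual pitch totals and that the error is estimated minus actual; the names `BINS` and `mean_error_by_bin` are hypothetical.

```python
from collections import defaultdict

# Bin edges follow the table: <3,000, 3,000-3,399, and 3,400+ pitches.
BINS = [
    ("<3,000", 0, 3000),
    ("3,000-3,399", 3000, 3400),
    ("3,400+", 3400, float("inf")),
]


def mean_error_by_bin(estimated, actual):
    """Mean of (estimated - actual), grouped by the actual pitch total."""
    errors = defaultdict(list)
    for est, act in zip(estimated, actual):
        for label, low, high in BINS:
            if low <= act < high:
                errors[label].append(est - act)
                break
    # Bins with no pitchers are simply left out of the result.
    return {label: sum(errs) / len(errs) for label, errs in errors.items()}
```

A mean error that shrinks or changes sign from the lowest bin to the highest is the pattern described above: a shift in bias rather than in scatter.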
Relationship over Time
While I don’t have the data to assess how well these estimators predict actual pitch counts over a long period of time, I can look at how they relate to one another over time. For each 10-year period between 1946 and 2003, I calculated the ratio of the bPCE to the xPCE. Remember, the years 1981, 1994, and 1995 are excluded because of the strikes.
Nothing to be too alarmed about, but the gap does vary by decade and perhaps by the pitch count, so using the bPCE probably overstates era differences.
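A rough sketch of the period comparison follows. Two choices in it are my assumptions, not details given above: the 10-year periods are anchored at 1946 (1946-1955, 1956-1965, and so on, with a short final period ending in 2003), and the ratio is taken as summed bPCE over summed xPCE within each period. The function name and input layout are hypothetical.

```python
STRIKE_YEARS = {1981, 1994, 1995}


def ratio_by_period(seasons, start=1946, end=2003, width=10):
    """Ratio of total bPCE to total xPCE for each period.

    `seasons` is an iterable of (year, bpce_total, xpce_total) tuples.
    Strike years and years outside [start, end] are skipped.
    """
    sums = {}
    for year, bpce, xpce in seasons:
        if year in STRIKE_YEARS or not start <= year <= end:
            continue
        first = start + ((year - start) // width) * width
        last = min(first + width - 1, end)
        b_sum, x_sum = sums.get((first, last), (0.0, 0.0))
        sums[(first, last)] = (b_sum + bpce, x_sum + xpce)
    return {period: b / x for period, (b, x) in sorted(sums.items())}
```

A ratio above 1 for a period means the bPCE runs higher than the xPCE there, so movement in the ratio from period to period is what would inflate era differences measured with the bPCE alone.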
Posted: August 14, 2004 at 08:17 PM