— Where BTF's Members Investigate the Grand Old Game
Saturday, August 14, 2004
Research Note: Tangotiger’s Pitch Count Estimators
Information to accompany Walt’s analysis.
In his recent article on seasonal pitch counts over time, Steve Treder used Tangotiger’s “basic” pitch count estimator (http://www.tangotiger.net/pitchCounts.html). Tangotiger has also introduced a more expanded version (http://www.tangotiger.net/pitchCountEstimator.html).
I want to address two issues here. First is how well these estimators work in predicting actual pitch counts for a moderate-sized sample of pitchers. Second is how these two estimators relate over the study period 1946-2003.
I have two small sets of seasonal pitch count data against which to verify these estimators. From ESPN, I downloaded seasonal pitch counts for qualified starters (i.e. at least 1 IP per team game played) in the years 2001 to 2003. Vinay Kumar provided me with summary data compiled from Retrosheet.org for all pitchers in the years 1988 to 1990. In this latter data set, there were a few PAs for most pitchers for which pitch information wasn’t available, so the pitch total was estimated assuming those missing PAs were average PAs for that pitcher.
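The adjustment for missing PAs in the 1988 to 1990 data can be sketched roughly as follows. This is only an illustration of the "missing PAs are average PAs for that pitcher" assumption; the function name, argument names, and sample numbers are made up here and are not taken from the Retrosheet summary itself.

```python
def impute_total_pitches(observed_pitches, pa_with_pitch_data, total_pa):
    """Scale a pitcher's observed pitches up to all of his PAs.

    Assumes the PAs lacking pitch data took the same number of pitches,
    on average, as the PAs for which pitch data exists.
    """
    if pa_with_pitch_data <= 0:
        raise ValueError("need at least one PA with pitch data")
    pitches_per_pa = observed_pitches / pa_with_pitch_data
    return pitches_per_pa * total_pa


# Hypothetical pitcher: 3,600 pitches recorded over 960 of his 1,000 PAs.
print(impute_total_pitches(3600, 960, 1000))
```

When only a few PAs are missing, as described above, the adjustment is small relative to the season total.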
The following table summarizes the quality of the fit of the two estimators:
Both estimators correlate quite well with actual pitch counts. Even so, at least for qualified starters, 8-12% of the variance remains unaccounted for. Nevertheless, based on these small-sample comparisons, the estimators should work well for spotting trends. The Spearman rank correlations are also quite high, so the estimators appear well suited to ranking pitchers.
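For readers who want to run the same kind of check on their own data, here is a minimal sketch of the two fit measures discussed above: the Pearson correlation (whose square is the share of variance accounted for) and the Spearman rank correlation. The pitch counts in the example are invented for illustration and are not the ESPN or Retrosheet figures.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented seasonal pitch counts for five pitchers (illustration only).
actual = [3100, 3350, 2900, 3600, 3250]
estimated = [3180, 3300, 3010, 3640, 3330]

r = pearson(actual, estimated)
print("variance unaccounted for:", 1 - r ** 2)
print("Spearman rank correlation:", spearman(actual, estimated))
```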
The mean errors are a little more troubling. They're not huge, and they are of relatively little concern for the xPCE, but they are higher than we'd like for the bPCE. That's still only about 2-5 pitches per start, but a gap of that size may be as large as the difference between eras.
I wondered if the mean error of the PCEs was stable for different pitch counts:
There’s clearly a difference based on the number of pitches. Oddly, the raw error in pitches is larger for pitchers with lower pitch counts; note that this is bias, not variance. This downward trend suggests that at higher pitch counts, such as those of the 1970s, both estimators might underestimate actual pitch counts.
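A breakdown like this one can be sketched by grouping pitchers into bins of actual pitch count and averaging the signed error in each bin. The bin width, the sign convention (estimate minus actual, so positive means overestimate), and the sample numbers below are assumptions made for the sketch, not the values used in this note.

```python
from collections import defaultdict


def mean_error_by_bin(actual, estimated, bin_width=500):
    """Mean signed error (estimate - actual) per bin of actual pitch count.

    Keys are the lower edge of each bin. A positive value means the
    estimator overestimates on average in that bin (bias, not variance).
    """
    sums = defaultdict(lambda: [0.0, 0])
    for a, e in zip(actual, estimated):
        lower_edge = (a // bin_width) * bin_width
        sums[lower_edge][0] += e - a
        sums[lower_edge][1] += 1
    return {edge: total / n for edge, (total, n) in sorted(sums.items())}


# Invented seasonal totals (illustration only).
sample_actual = [2600, 2900, 3100, 3400]
sample_estimated = [2700, 2960, 3120, 3420]
print(mean_error_by_bin(sample_actual, sample_estimated))
```

A mean error that shrinks steadily from the lowest bin to the highest is the kind of downward trend described above.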
In summary, these estimators appear to be fine for spotting trends across seasons and perhaps for ranking pitchers within a season. The xPCE appears to be better than the bPCE. But care should be taken when using individual pitch count estimates. The estimates appear to be biased, especially those of the bPCE, which consistently overestimates pitch counts in these samples. Like any model, an estimator may fit the sample quite well but still produce poor estimates for at least some individuals. Of course, these are small-sample results, and the estimators should be tested against a larger sample.
Relationship over Time
While I don’t have the data to assess how well these estimators predict actual pitch counts over a long period of time, I can look at how they relate to one another over time. For each 10-year period between 1946 and 2003, I calculated the ratio of the bPCE to the xPCE. Remember, the years 1981, 1994, and 1995 are excluded because of the strikes.
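The ratio calculation can be sketched as below. Two details are assumptions, since the note does not spell them out: periods are taken to be calendar decades, and the ratio is computed from summed estimates within each decade rather than as an average of per-pitcher ratios. The input tuples are invented.

```python
STRIKE_YEARS = {1981, 1994, 1995}


def ratio_by_decade(seasons):
    """Ratio of summed bPCE to summed xPCE for each calendar decade.

    `seasons` is an iterable of (year, bpce, xpce) tuples. Strike years
    and years outside 1946-2003 are skipped.
    """
    totals = {}
    for year, bpce, xpce in seasons:
        if year in STRIKE_YEARS or not 1946 <= year <= 2003:
            continue
        decade = (year // 10) * 10
        b, x = totals.get(decade, (0.0, 0.0))
        totals[decade] = (b + bpce, x + xpce)
    return {d: b / x for d, (b, x) in sorted(totals.items())}


# Invented pitcher-seasons (illustration only); the 1981 row is dropped.
sample_seasons = [
    (1979, 3300, 3000),
    (1980, 3150, 3000),
    (1981, 9999, 1),
    (1982, 3150, 3000),
]
print(ratio_by_decade(sample_seasons))
```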
Nothing to be too alarmed about, but the gap does vary by decade and perhaps by the pitch count, so using the bPCE probably overstates era differences.