Primate Studies: Where BTF's Members Investigate the Grand Old Game
Saturday, August 14, 2004
Research Note: Tangotiger’s Pitch Count Estimators
Information to accompany Walt’s analysis.

In his recent article on seasonal pitch counts over time, Steve Treder used Tangotiger’s “basic” pitch count estimator (http://www.tangotiger.net/pitchCounts.html). Tangotiger has also introduced a more expanded version (http://www.tangotiger.net/pitchCountEstimator.html). I want to address two issues here. First is how well these estimators work in predicting actual pitch counts for a moderate-sized sample of pitchers. Second is how these two estimators relate over the study period 1946-2003.

Empirical Verification

I have two small sets of seasonal pitch count data against which to verify these estimators. From ESPN, I downloaded seasonal pitch counts for qualified starters (i.e. at least 1 IP per team game played) in the years 2001 to 2003. Vinay Kumar provided me with summary data compiled from Retrosheet.org for all pitchers in the years 1988 to 1990. In this latter data set, there were a few PAs for most pitchers for which pitch information wasn’t available, so the pitch total was estimated assuming those missing PAs were average PAs for that pitcher.

The following table summarizes the quality of the fit of the two estimators:
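For readers who want to run this kind of comparison themselves, here is a minimal sketch of the fit statistics discussed in this note: R-squared, the Spearman rank correlation, and the mean error (estimate minus actual). R-squared is taken here as the squared Pearson correlation, which is an assumption about how it was computed. The function names and the numbers are made up for illustration and are not the study data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def fit_summary(actual, estimated):
    """R-squared, Spearman rho, and mean error (estimated minus actual)."""
    r = pearson(actual, estimated)
    rho = pearson(ranks(actual), ranks(estimated))
    mean_error = sum(e - a for a, e in zip(actual, estimated)) / len(actual)
    return {"r_squared": r * r, "spearman": rho, "mean_error": mean_error}

# Hypothetical seasonal pitch counts, for illustration only.
actual = [3100, 3350, 2900, 3600, 3250]
estimated = [3180, 3400, 3010, 3620, 3330]
summary = fit_summary(actual, estimated)
print(summary)
```

A positive mean error means the estimator overestimates on average, while a high Spearman value means it ranks pitchers in nearly the same order as the actual counts do.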
Both estimators correlate quite well with actual pitch counts. Still, at least for qualified starters, 8-12% of the variance remains unaccounted for. Nevertheless, based on these small sample comparisons, the estimators should work well for noticing trends. The Spearman rank correlations are also quite high, so they appear well suited for ranking pitchers.

The mean errors are a little more troubling. They’re not huge and of relatively little concern for the xPCE, but are higher than we’d like for the bPCE. That’s still only about 25 pitches per start, but that may be as large as the difference between eras.

I wondered if the mean error of the PCEs was stable for different pitch counts:
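One way to check this, sketched under assumptions: group pitcher-seasons by actual pitch count and average the error (estimate minus actual) within each group. The bin edges and the sample numbers below are invented for illustration and are not taken from the ESPN or Retrosheet data.

```python
def mean_error_by_bin(actual, estimated, edges):
    """Mean of (estimated - actual), grouped by actual pitch count.

    edges: ascending boundaries; bin i covers [edges[i], edges[i + 1]).
    Returns a list of (low, high, n, mean_error) tuples, with
    mean_error set to None for empty bins.
    """
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for a, e in zip(actual, estimated):
        for i in range(n_bins):
            if edges[i] <= a < edges[i + 1]:
                sums[i] += e - a
                counts[i] += 1
                break
    return [
        (edges[i], edges[i + 1], counts[i],
         sums[i] / counts[i] if counts[i] else None)
        for i in range(n_bins)
    ]

# Hypothetical pitcher-seasons, for illustration only.
actual = [2600, 2800, 3100, 3300, 3600, 3800]
estimated = [2700, 2880, 3160, 3340, 3610, 3800]
bias_table = mean_error_by_bin(actual, estimated, [2500, 3000, 3500, 4000])
for low, high, n, bias in bias_table:
    print(f"{low}-{high}: n={n}, mean error={bias}")
```

If the mean error shrinks or changes sign as the bins move up, the bias depends on the pitch count itself rather than being a constant offset.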
There’s clearly a difference based on the number of pitches. Oddly, the raw number of pitches incorrectly estimated is higher with fewer pitches (note this is bias, not variance). This downward trend suggests that at higher levels of pitch counts, such as those in the 70s, both estimators might underestimate actual pitch counts.

In summary, these estimators appear to be fine for spotting trends across seasons and perhaps for ranking pitchers within a season. The xPCE appears to be better than the bPCE. But care should be taken when using individual pitch count estimates. The estimates appear to be biased, especially the bPCE, which overestimates pitch counts consistently in these samples. Like any model, it may fit the sample quite well but still produce poor estimates for at least some individuals. Of course, these results are small sample results and these estimators should be tested against a larger sample.

Relationship over Time

While I don’t have the data to assess how well these estimators predict actual pitch counts over a long period of time, I can look at how they relate to one another over time. For each 10-year period between 1946 and 2003, I calculated the ratio of the bPCE to the xPCE. Remember, the years 1981, 1994, and 1995 are deleted due to the strike.

[Table: Year, Ratio]

Nothing to be too alarmed about, but the gap does vary by decade and perhaps by the pitch count, so using the bPCE probably overstates era differences.
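For anyone wanting to repeat the period comparison, here is a rough sketch. It assumes the ratio is the sum of bPCE estimates divided by the sum of xPCE estimates within each period, and that periods run in 10-year blocks starting in 1946 (so the last block is short). Neither detail is spelled out above, so treat both as assumptions. The input rows are invented.

```python
STRIKE_YEARS = {1981, 1994, 1995}  # seasons dropped, as in the note

def ratio_by_period(rows, start=1946, width=10):
    """Ratio of summed bPCE to summed xPCE for each period.

    rows: iterable of (year, bpce_total, xpce_total).
    Returns {period_start_year: ratio}, skipping strike years.
    """
    totals = {}
    for year, bpce, xpce in rows:
        if year in STRIKE_YEARS:
            continue
        period_start = start + ((year - start) // width) * width
        b, x = totals.get(period_start, (0.0, 0.0))
        totals[period_start] = (b + bpce, x + xpce)
    return {p: b / x for p, (b, x) in sorted(totals.items())}

# Hypothetical season totals, for illustration only.
rows = [
    (1946, 1030, 1000),
    (1950, 1010, 1000),
    (1960, 2100, 2000),
    (1981, 9999, 1000),  # strike year, ignored
]
ratios = ratio_by_period(rows)
for period_start, ratio in ratios.items():
    print(period_start, round(ratio, 3))
```

A ratio above 1 means the bPCE runs higher than the xPCE in that period. If the ratio drifts across periods, era comparisons made with the bPCE alone will partly reflect that drift.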
Reader Comments and Retorts
1. Chris Dial Posted: August 14, 2004 at 08:54 PM (#797781)
I’ll pretty this up later. Patience is a virtue. Or Dan/Jim can.
Those numbers are quite good.
I’m assuming Tango only had access to a few years’ worth of actual pitch count numbers, which means that the samples used here are samples that were used to create the pitch count estimator in the first place (so of course the R^2 will be high, it’s optimized for this sample). It’s a given that the estimator will be less accurate for other samples that weren’t used to generate the equation. It’s not Walt’s fault (he’s presenting all he can), but it’s still a little misleading because this accuracy will not hold up for other samples. It’s just a matter of how big a difference we should expect when trying to make estimates.
Actually, it’s even more amazing. And the “in-sample” nature is not that big a deal.
First, I did have two sets of data, 2001-2003 and 1988-1990, and the results look pretty similar.
Second, the bPCE is just a straight linear function. I wouldn’t say it was “optimized” really.
Third, I’m not clear if he’s referring to the xPCE only or both, but Tango tells me that he came up with the formula based just on the league average and Randy Johnson and Brad Radke (and theory). So these really are out-of-sample tests and it really is amazing the estimators fit this well.
I do expect differences across eras but I doubt they’ll be so dramatic as to change the conclusion that it’s a pretty good estimator.
Doing some additional work on the 2001-2003 data, it appears that we get more accuracy if we break BIP into GB, FB, and other, with GB taking fewer pitches than FB. Sac bunts allowed and GDP are also significant (fewer pitches). But that full model only buys us about 3 points of R-square.
There doesn’t appear to be too much heteroscedasticity (i.e. increasing error variance as np increases). So the big remaining concern is the bias-np relationship, which suggests some nonlinearity. I haven’t figured out the relationship yet though, and I’m probably not going to put too much more effort into it given how well it already works.
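For the curious, a crude version of a heteroscedasticity check could look like the sketch below: correlate the size of each error (its absolute deviation from the mean error) with the actual pitch count. This is only one simple way to look at it and is not necessarily the check used above. The function names and data are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def error_spread_vs_np(actual, estimated):
    """Correlation between actual pitch count and absolute error size.

    Values near zero suggest roughly constant error variance; clearly
    positive values suggest the errors fan out as pitch counts grow.
    """
    errors = [e - a for a, e in zip(actual, estimated)]
    mean_err = sum(errors) / len(errors)
    spread = [abs(err - mean_err) for err in errors]
    return pearson(actual, spread)

# Hypothetical data in which the errors grow with the pitch count.
np_actual = [2000, 2500, 3000, 3500, 4000, 4500]
np_estimated = [2010, 2490, 3030, 3470, 4060, 4440]
spread_corr = error_spread_vs_np(np_actual, np_estimated)
print(round(spread_corr, 3))
```

Note that a bias that itself changes with np will leak into this measure, so it is best read alongside a binned mean error rather than on its own.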
Again, this note looks just at the level of pitcher-season. It may not work well for game pitch counts. I’m not super-comfortable using it for point estimation (i.e. Jose Lima threw X pitches in 1998), even at the seasonal level. Both of these argue against using it to estimate something like PAP3. But for correlational or trend analysis, it would seem to work quite well as a substitute for seasonal pitch counts, and it seems to work just fine for ranking pitchers (at least within a season).
Second, the bPCE is just a straight linear function. I wouldn’t say it was “optimized” really.
Third, I’m not clear if he’s referring to the xPCE only or both, but Tango tells me that he came up with the formula based just on the league average and Randy Johnson and Brad Radke (and theory). So these really are out-of-sample tests and it really is amazing the estimators fit this well.
I assumed Tango used linear regression to create his equations using a data set of at least a few years, hence the “optimized” comment.
If he really did just create the model using league average, Randy, and Radke, it raises the question: why didn’t he use multiple regression and a larger sample? I’d assume it would be more accurate and have better generalizability.