— Where BTF's Members Investigate the Grand Old Game
Sunday, February 29, 2004
What does play-by-play data tell us about DIPS?
Several years ago, Voros McCracken published a 2-part thesis, entitled Defense Independent Pitching Stats (DIPS). In it, he introduced two important concepts in evaluating baseball performance. One, that a player's (in this case, a pitcher's) sample performance does not necessarily tell you a lot about his true talent, and thus his likely future performance. And two, that different components of a player's performance do not necessarily convey the same quality of information about that player's true talent with regard to that component.
Voros found that a pitcher's batting average on balls in play ($H or BABIP), defined as non-HR hits per non-HR balls in play, conveys quite a bit less information about a pitcher's talent than does, say, his HR, BB, or SO rate. He concluded, and correctly so, that a pitcher's sample BABIP is not a particularly good predictor of his future BABIP. He found that the correlation between a pitcher's one year BABIP and his next year BABIP was low, as compared to that of his HR, BB, and SO rates. In fact, when Voros ran a year-to-year (y-t-y) linear regression of all pitchers who had at least 162 IP's in 1998 and 1999, he got the following correlation coefficients.
Voros' conclusion, based upon the .153 correlation coefficient he found for $H, was "that the ability for pitchers to prevent balls in play doesn't exist, or if it does, it doesn't really amount to much for almost all pitchers." Unfortunately, Voros did not adequately explain exactly what these correlations mean, the relationship between sample size and correlation, and how team defense and a pitcher's home park factor into the equation.
In baseball, when you correlate a measure of a player's performance from one time period to another (it could be from one year to another but it doesn't have to be), the resultant correlation coefficient simply indicates how useful the measurement in the first time period is in predicting the same measurement in the second time period. While the relatively low correlation for $H in Voros' regression suggests that a pitcher's BABIP in one year is not very useful in predicting his BABIP in the next year, it does not necessarily mean that pitchers have little or no ability to prevent hits on balls in play.
For one thing, if all pitchers in the major leagues, particularly those with most of the IP's, had around the same ability to prevent hits on balls in play, then the y-t-y correlations for BABIP would be near zero, even though preventing hits on balls in play may very much be a skill. Secondly, if the IP's or TBF's (technically, the BIP's) of the average pitcher in the regression were small enough, the correlations would necessarily be small as well, even if preventing hits on BIP's were a skill and there were significant variation in that skill from player to player. Finally, even if pitchers did indeed have unique abilities to prevent hits on balls in play, if those abilities were to change significantly from year to year, the y-t-y correlations might also be small.
Let's review the three things that can affect the y-t-y correlations, and thus the predictability, of any statistical measure in baseball, and in particular, BABIP. One, the spread of true talent in the league. Even if something is indeed a skill, if there is little or no spread in true talent with respect to that skill in the population from which we are sampling (major league pitchers), then the y-t-y correlation for any sample size will be close to zero. That bears repeating and I'll put it another way. If all players in baseball have essentially the same level of skill in any area that we are measuring, then they might as well have no skill at all, as far as predictive value is concerned! Two, the smaller the sample size in our sample measures (the number of BIP's for each pitcher in each year), the smaller the correlation, even if there is significant talent or skill associated with that measure and there is a large spread of talent within the population.
In fact, the size of the correlation coefficient is a direct function of the spread of talent in the population and the size of the samples in each data element. If there is any skill whatsoever related to the performance measure we are sampling and there is any spread of true talent in the population with regard to that skill, then as the sample size gets larger, the correlation always approaches 1. The converse of that is also true. Regardless of how much skill there is and what the spread of talent is, as the sample size gets smaller, the correlation always approaches zero.
In Voros' regression, the sample size of each data element was fairly large (a minimum of 162 IP's, or around 500 BIP), such that a correlation of .153 does indeed suggest that there is little skill or spread of talent in the population with respect to BABIP. Keep in mind though, that if Voros were able to sample pitchers who had millions of IP's in each of two different time periods, the correlation would indeed be close to 1, if there were any skill and spread of talent associated with BABIP (and that talent remained fairly constant across the time periods sampled).
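The relationship between sample size, talent spread, and correlation can be sketched with a simple model. This is a minimal illustration, assuming each BIP is an independent binomial trial and that a pitcher's talent does not change between the two samples; the function name, the league BABIP of .290, and the talent spread of .005 are illustrative assumptions, not figures from Voros' work or from the regressions in this article.

```python
def expected_r(n_bip, talent_sd, league_babip=0.290):
    """Approximate correlation between two independent samples of n_bip
    balls in play each, for pitchers whose true BABIP talent has a
    standard deviation of talent_sd around league_babip (assumed values)."""
    if n_bip <= 0 or talent_sd <= 0:
        # No sample, or no spread of talent: nothing to predict with.
        return 0.0
    # Random (binomial) variance of an observed rate over n_bip trials.
    noise_var = league_babip * (1.0 - league_babip) / n_bip
    talent_var = talent_sd ** 2
    # Share of observed variance that comes from true talent.
    return talent_var / (talent_var + noise_var)


# Hypothetical talent spread of .005; watch r climb toward 1 as BIP grows.
for n in (50, 500, 5000, 50000, 5000000):
    print(n, round(expected_r(n, 0.005), 3))
```

Under these assumptions the correlation is zero when there is no spread of talent, shrinks toward zero as the number of BIP shrinks, and approaches 1 as the number of BIP becomes very large, which is the behavior described above.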
Finally, the third influence on the y-t-y correlations is the amount by which a player's true talent may change from one year to the next. The more that a player's true talent changes from year to year, in a random or semi-random fashion, the smaller the y-t-y correlations will be. That effect can be mitigated by doing in-season regressions, like first half on second half (remember I said that the regressions and correlations can be done across any time periods). In fact, if the requisite data is available, it is indeed better to utilize single season data for the regressions.
Now, because regressions and correlations do not imply cause/effect, only relationships in general, it is important to be aware of, and to note, anything else besides a pitcher's talent for preventing hits on BIP that might influence the correlation. For example, even if there were no true correlation (based on a pitcher's skill and the spread of talent in the league) in a pitcher's y-t-y BABIP, team defense, park influences, or some other systematic factor might create a positive correlation. One way we can control for the bias of a player's defense and home park is to park- and defense-adjust a pitcher's BABIP before running the regressions. Another way would be to run a regression only on pitchers who have switched teams (and home parks of course). If we do the latter, we essentially factor out the park and defensive influences, although there will still be some park and defense noise, which could adversely affect the correlations (push them towards zero). At the least, in doing the latter, we shouldn't get any false (with respect to pitcher skill) correlations created by pitchers pitching in the same home park and in front of the same defense in both years of the regression.
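To see how a shared defense and home park could create a correlation with no pitcher skill at all, here is a small simulation sketch. Everything in it is an assumption made for illustration: every pitcher has identical talent, the combined team effect (defense plus park) has a standard deviation of .010, and binomial noise is approximated with a normal distribution. None of these values are estimated from the PBP data.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(switch_teams, n_pitchers=5000, n_bip=500,
             league_babip=0.290, team_sd=0.010, seed=1):
    """Year-to-year BABIP correlation when pitchers have NO individual skill.

    The only systematic effect is the team (defense + park), which stays
    the same in year 2 unless switch_teams is True. All parameter values
    are hypothetical.
    """
    rng = random.Random(seed)
    # Normal approximation to binomial sampling noise over n_bip trials.
    noise_sd = math.sqrt(league_babip * (1.0 - league_babip) / n_bip)
    year1, year2 = [], []
    for _ in range(n_pitchers):
        team1 = rng.gauss(0.0, team_sd)
        team2 = rng.gauss(0.0, team_sd) if switch_teams else team1
        year1.append(league_babip + team1 + rng.gauss(0.0, noise_sd))
        year2.append(league_babip + team2 + rng.gauss(0.0, noise_sd))
    return pearson(year1, year2)


r_same = simulate(switch_teams=False)
r_switch = simulate(switch_teams=True)
print("same team:", round(r_same, 3), "switched:", round(r_switch, 3))
```

With these assumed inputs, the pitchers who stay put should show a clearly positive correlation even though none of them has any skill, while the correlation for pitchers who switch teams should hover around zero.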
In revisiting Voros' thesis, the first thing I did was to essentially duplicate his regression analysis. Rather than using an estimate of a pitcher's BABIP from his traditional stat line, I used play-by-play (PBP) data to record his exact hits per BIP (BABIP). I ignored all balls in play that were bunted and all foul balls. I also removed all games in Colorado from the database.
I regressed all pitchers who had at least 300 BIP in back-to-back years from 1992 to 2003. There were 496 data pairs. Each data pair was independent (I regressed 1992 on 1993, 1994 on 1995, etc.) and the average number of BIP's in each data pair was around 520. This corresponds to around 168 IP's, a slightly smaller sample size than in Voros' regression - thus we would expect to get smaller correlations, everything else being equal.
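The pairing rule described above might be sketched as follows. The data layout (a dictionary keyed by pitcher and year), the function name, and the sample records are all hypothetical; the sketch also assumes one team per pitcher-season, which glosses over mid-season trades.

```python
def build_pairs(seasons, min_bip=300, first_year=1992):
    """Build independent (year x, year x+1) BABIP pairs.

    seasons: dict mapping (pitcher, year) -> (team, hits, bip).
    Returns a list of (babip_x, babip_x1, switched_teams) tuples.
    """
    pairs = []
    for (pitcher, year), (team, hits, bip) in seasons.items():
        # Only start pairs in 1992, 1994, ... so no season is used twice.
        if (year - first_year) % 2 != 0:
            continue
        following = seasons.get((pitcher, year + 1))
        if following is None:
            continue
        team2, hits2, bip2 = following
        # Both seasons must meet the minimum number of balls in play.
        if bip < min_bip or bip2 < min_bip:
            continue
        pairs.append((hits / bip, hits2 / bip2, team != team2))
    return pairs


# Made-up records, purely to show the filtering.
seasons = {
    ("A", 1992): ("BOS", 150, 500),
    ("A", 1993): ("BOS", 140, 500),
    ("A", 1994): ("NYY", 145, 500),  # no 1995 season, so no 1994-95 pair
    ("B", 1994): ("SEA", 90, 310),
    ("B", 1995): ("TEX", 100, 320),  # switched teams
    ("C", 1992): ("ATL", 80, 250),   # under 300 BIP, pair is dropped
    ("C", 1993): ("ATL", 150, 500),
}
pairs = build_pairs(seasons)
print(pairs)
```

The third element of each tuple flags whether the pitcher changed teams, so the same list can feed both the overall regression and one restricted to pitchers who switched.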
Here is the result:
$H (N=496), r = .137 (1SD=.043)
That is almost exactly what Voros got, considering that my sample size (for BIP) was smaller than Voros'.
Remember, however, that I said that defense and park factors could create a correlation unrelated to pitcher skill. So what happens if we run the same regressions, but this time only on pitchers who switched teams from one year to the next in the data pairs? In other words, if there is any systematic relationship between a pitcher's BABIP in one year and that in the next, since he has a different home park and defense in each of the two years, that relationship would likely be a result of pitcher skill only.
Here is the result for only those pitchers who switched teams:
$H (N=107), r = .036 (1SD=.094)
That's right - once we take defense and home park out of the equation, there appears to be almost no skill in a pitcher's ability to prevent hits on balls in play! Voros was right!
As several people have pointed out subsequent to Voros' original articles, that doesn't necessarily mean that a small percentage of individual pitchers or even certain types of pitchers do not have a unique ability to prevent hits on BIP, such that their $H in one year might somewhat predict their $H in another year. What it does mean is that as a general rule, once the bias of a pitcher's home park and defense is removed, his past $H is a very poor predictor of his future $H.
As I explained before, no matter how small the sample correlation is for x number of BIP, as long as there is any true correlation at all, given a large enough sample size, the correlation will eventually be 1 (assuming that whatever talent a pitcher does have for preventing hits is fairly constant over time). However, given a true correlation of .036 for 500 BIP samples, it would take more than a 13,000 BIP sample (4,200 innings or around 20 years of pitching) to have a correlation of .500. In other words, if you wanted to estimate a pitcher's true $H from a year's worth of his sample data (~500 BIP), you would regress his sample $H over 96% towards the league average; if you wanted to estimate a pitcher's true $H from his 20-year history (~13,000 BIP), you would regress his sample $H only 50% toward the mean (the regression amount always equals 1-r).
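The arithmetic behind those figures can be worked through in a few lines. This sketch assumes the correlation grows with sample size as r = n / (n + k), a common form that is consistent with the numbers quoted above but is not spelled out in the text; the function names, the .290 league average, and the .270 sample pitcher are hypothetical.

```python
def regression_constant(r, n):
    """Solve r = n / (n + k) for k, given an observed r at sample size n."""
    return n * (1.0 - r) / r


def bip_needed(target_r, k):
    """Sample size at which the correlation would reach target_r."""
    return k * target_r / (1.0 - target_r)


def estimate_true_rate(sample_rate, n, k, league_rate):
    """Regress a sample rate toward the league rate by (1 - r)."""
    r = n / (n + k)
    return league_rate + r * (sample_rate - league_rate)


k = regression_constant(0.036, 500)   # implied by r = .036 at 500 BIP
needed = bip_needed(0.500, k)         # BIP required for r = .500

# Hypothetical pitcher: .270 BABIP over 500 BIP in an assumed .290 league.
one_year = estimate_true_rate(0.270, 500, k, 0.290)

print(round(k), round(needed), round(one_year, 4))
```

Under this assumed form, the constant k works out to a little over 13,000 BIP, which is also the sample size at which the correlation reaches .500, and a .270 pitcher over a single 500 BIP season would be estimated at only a hair below the league rate.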
Getting back to the possibility of certain classes of pitchers having unique hit preventing abilities, it should be clear that fly ball pitchers, on the average, will have a different $H than will ground ball pitchers, since a fly ball has a higher out percentage than a ground ball. In fact, extreme ground ball pitchers have a BABIP of .297 (1992-2003), whereas extreme fly ball pitchers have a BABIP of .281 (extreme = top and bottom 10% in G/F ratio for pitchers with at least 100 BIP in a season). Of course, the run value of a FB hit is greater than that of a GB hit, such that the actual run value of all pitchers' BABIP is almost exactly the same, regardless of their G/F ratios.
In order to get a better idea as to why a pitcher appears to have little if any control over the outcome of his BIP, I separated BIP into six categories:
1) Line drive through the infield
2) Line drive in the outfield
3) Pop fly on the infield
4) Pop fly in the outfield
5) Fly ball in the outfield
6) Ground ball (no bunts)
These categories are based on the judgment of the play-by-play scorers. To some extent we can expect some of the subtle distinctions to be at least partially based on the outcome (e.g., if a ball in the outfield could reasonably be scored as either a fly ball or a line drive, if it is not caught, it is probably more likely to be scored the latter). Also, as you will be able to infer from the following charts, it is possible that there are some severe biases among scorers (assuming that the same scorer tends to score the same team from year to year).
First, here are the percentages of BIP and the hit percentages for the above categories in 2003. The second column is the number of balls in that category divided by the total number of BIP. The third column is the hits per BIP for that category.
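The two columns described above could be computed from event records along these lines. This is only a sketch: the `(category, is_hit)` record layout, the function name, and the ten sample events are made up for illustration and do not reflect the actual PBP file format or the 2003 figures.

```python
from collections import defaultdict


def summarize_bip(events):
    """Summarize balls in play by category.

    events: iterable of (category, is_hit) tuples, one per ball in play.
    Returns {category: (share_of_all_bip, hits_per_bip_in_category)}.
    """
    counts = defaultdict(int)
    hits = defaultdict(int)
    total = 0
    for category, is_hit in events:
        counts[category] += 1
        if is_hit:
            hits[category] += 1
        total += 1
    # An empty input yields an empty summary (no division is performed).
    return {c: (counts[c] / total, hits[c] / counts[c]) for c in counts}


# Ten invented balls in play.
events = (
    [("GB", False)] * 3 + [("GB", True)]
    + [("OF fly", False)] * 3
    + [("OF line drive", True)] * 2
    + [("IF pop", False)]
)
summary = summarize_bip(events)
for category, (share, hit_rate) in summary.items():
    print(category, round(share, 2), round(hit_rate, 2))
```

Running the same summary separately for each pitcher-season would give the per-category percentages and hit rates that go into the y-t-y regressions.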
Now, here are the y-t-y correlations for these same categories. Again, pitchers had to have at least 300 BIP in back-to-back years to qualify for the regressions. As in the BABIP regressions, there were 107 pitchers who qualified in the 1992-2003 database and who switched teams from year x to year x+1. There were another 389 pitchers who qualified and who did not switch teams.
Pitchers who switched teams (N=107):
Again, since we only looked at pitchers who switched teams from year x to year x+1, we have essentially removed the home park and defensive influences from the correlations.
The results are quite interesting. Even though the overall correlation on BABIP (the last entry in the third column) is near zero, there appear to be several components that have some predictive value, and are therefore somewhat within a pitcher's control.
Not surprisingly, a pitcher's FB and GB as a percentage of his total BIP (essentially his G/F ratio) are very much within a pitcher's control and appear to be relatively stable from year to year. The number of IF pop flies and to some extent OF pop flies, as a percentage of all non-GB BIP, are somewhat a unique function of the pitcher as well. In other words, good pitchers may tend to get more pop flies than bad pitchers, as a percentage of their total non-ground ball balls in play.
Even though IF and OF line drives individually do not correlate well from year to year, the percentage of all line drives a pitcher allows appears to be somewhat within his control as well. So good pitchers may also give up fewer line drives per BIP.
The last column, or the hit percentage correlations, is even more interesting. The only category that a pitcher appears to have any significant control over is his hits per outfield line drive. Essentially what this means is that the better pitchers allow outfield line drives that are easier to catch. In other words, a ground ball is a ground ball, a fly ball is a fly ball, and a line drive to the infield is a line drive to the infield, regardless of the pitcher. On the other hand, all outfield line drives are not created equal. As well, the small positive correlation in the hits per GB category may suggest that good pitchers allow ground balls that are slightly easier to field.
To summarize the implications of the above chart, even though the y-t-y $H correlation of a 500 BIP pitcher is very small, a pitcher may have a fair amount of control over certain components of those BIP. The regression results suggest that good pitchers give up slightly fewer line drives and slightly more pop flies, as a percentage of their total BIP, and that their line drives hit to the outfield (and perhaps their ground balls) may be softer and therefore easier to catch. Further research using batted ball speed as one of the regression parameters may be useful.
Finally, to get an idea as to how home parks and defense (and perhaps the PBP scorers) affect the above regressions, here is the same chart of correlations, using only those pitchers who played on the same team in year x and year x+1.
Pitchers who did not switch teams (N=389):
As you can see, many of the correlations increase substantially, suggesting that defense, home park, or the PBP scorer plays a significant role in creating these correlations in the first place. Whereas the increased correlations in the last column are not surprising, considering that defense and home park can have a substantial effect on the hit percentages of the various BIP components, the correlations in the second column are surprising. There is no particular or logical reason why a pitcher's defense or his home park should play a substantial role in the percentage of his BIP that are line drives or pop flies, yet from the above chart, that appears to be the case. I can only hypothesize that the PBP scorers may be creating a bias in the data.
In conclusion, while it appears that Voros was essentially correct in that a pitcher has little control over his BABIP, he was not able to investigate this phenomenon on a more granular level, which requires an analysis of PBP data. Such an analysis suggests that pitchers may have more or less control over various components of their BIP than their overall $H y-t-y correlation would imply. In fact, good pitchers probably tend to give up fewer and softer line drives and easier pop flies than do poorer pitchers. It also appears that defense and park factors, and perhaps even PBP scorers, can exert considerable influence on a pitcher's y-t-y correlations for $H or for various of the BIP components. Finally, Voros failed to explain the considerable influence that sample size (the number of BIP, not the number of pitcher seasons in the sample) has on the y-t-y correlations, regardless of the pitcher's skill and the spread of talent in the population.