Baseball for the Thinking Fan

Primate Studies
— Where BTF's Members Investigate the Grand Old Game

Sunday, February 29, 2004

DIPS Revisited

What does play-by-play data tell us about DIPS?

Several years ago, Voros McCracken published a two-part thesis entitled Defense Independent Pitching Stats (DIPS). In it, he introduced two important concepts in evaluating baseball performance. One, that a player's (in this case, a pitcher's) sample performance does not necessarily tell you a lot about his true talent, and thus his likely future performance. And two, that different components of a player's performance do not necessarily convey the same quality of information about that player's true talent with regard to that component.

 

Voros found that a pitcher's batting average on balls in play ($H or BABIP), defined as non-HR hits per non-HR ball in play, conveys quite a bit less information about a pitcher's talent than does, say, his HR, BB, or SO rate. He concluded, correctly, that a pitcher's sample BABIP is not a particularly good predictor of his future BABIP. He found that the correlation between a pitcher's BABIP in one year and his BABIP in the next was low compared to that of his HR, BB, and SO rates. In fact, when Voros ran a year-to-year (y-t-y) linear regression of all pitchers who had at least 162 IP in each of 1998 and 1999, he got the following correlation coefficients.

$BB=.681

$SO=.792

$HR=.505

$H =.153

Voros' conclusion, based upon the .153 correlation coefficient above, was "that the ability for pitchers to prevent balls in play doesn't exist, or if it does, it doesn't really amount to much for almost all pitchers." Unfortunately, Voros did not adequately explain exactly what these correlations mean, the relationship between sample size and correlation, and how team defense and a pitcher's home park factor into the equation.

 

In baseball, when you correlate a measure of a player's performance from one time period to another (it could be from one year to another, but it doesn't have to be), the resultant correlation coefficient simply indicates how useful the measurement in the first time period is in predicting the same measurement in the second time period. While the relatively low correlation for $H in Voros' regression suggests that a pitcher's BABIP in one year is not very useful in predicting his BABIP in the next year, it does not necessarily mean that pitchers have little or no ability to prevent hits on balls in play.

 

For one thing, if all pitchers in the major leagues, particularly those with most of the IP, had around the same ability to prevent hits on balls in play, then the y-t-y correlations for BABIP would be near zero, even though preventing hits on balls in play may very much be a skill. Secondly, if the IP or TBF (technically, the BIP) of the average pitcher in the regression were small enough, the correlations would necessarily be small as well, even if preventing hits on BIP were a skill and there were significant variation in that skill from player to player. Finally, even if pitchers did indeed have unique abilities to prevent hits on balls in play, if those abilities were to change significantly from year to year, the y-t-y correlations might also be small.

 

Let's review the three things that can affect the y-t-y correlations, and thus the predictability, of any statistical measure in baseball, and in particular, BABIP. One, the spread of true talent in the league. Even if something is indeed a skill, if there is little or no spread in true talent with respect to that skill in the population from which we are sampling (major league pitchers), then the y-t-y correlation for any sample size will be close to zero. That bears repeating, and I'll put it another way. If all players in baseball have essentially the same level of skill in any area that we are measuring, then they might as well have no skill at all, as far as predictive value is concerned! Two, the smaller the sample size in our sample measures (the number of BIP for each pitcher in each year), the smaller the correlation, even if there is significant talent or skill associated with that measure and there is a large spread of talent within the population.

 

In fact, the size of the correlation coefficient is a direct function of the spread of talent in the population and the size of the samples in each data element. If there is any skill whatsoever related to the performance measure we are sampling and there is any spread of true talent in the population with regard to that skill, then as the sample size gets larger, the correlation always approaches 1. The converse of that is also true. Regardless of how much skill there is and what the spread of talent is, as the sample size gets smaller, the correlation always approaches zero.
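The dependence on talent spread and sample size can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the regression used in this article: the league BABIP of .290, the talent spread of .009, the 2,000 simulated pitchers, and the function names are all hypothetical, and binomial luck is replaced by a normal approximation.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_yty_r(n_pitchers, bip_per_year, talent_sd, league_babip=0.290, seed=0):
    """Year-to-year BABIP correlation for simulated pitchers.

    Each pitcher gets a fixed true BABIP drawn from a normal distribution
    (assumed), and each season adds independent sampling noise whose size
    shrinks as the number of BIP grows.
    """
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n_pitchers):
        true_babip = rng.gauss(league_babip, talent_sd)
        # Normal approximation to binomial luck over bip_per_year balls in play.
        noise_sd = math.sqrt(true_babip * (1 - true_babip) / bip_per_year)
        year1.append(true_babip + rng.gauss(0, noise_sd))
        year2.append(true_babip + rng.gauss(0, noise_sd))
    return pearson_r(year1, year2)


if __name__ == "__main__":
    print("500 BIP, some spread:   ", round(simulate_yty_r(2000, 500, 0.009), 3))
    print("50,000 BIP, some spread:", round(simulate_yty_r(2000, 50000, 0.009), 3))
    print("500 BIP, no spread:     ", round(simulate_yty_r(2000, 500, 0.0), 3))
```

With the same assumed spread of talent, the simulated correlation is small at 500 BIP and much closer to 1 at 50,000 BIP; with no spread at all it hovers around zero regardless of how real the underlying skill is.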

 

In Voros' regression, the sample size of each data element was fairly large (a minimum of 162 IP, or around 500 BIP), such that a correlation of .153 does indeed suggest that there is little skill or spread of talent in the population with respect to BABIP. Keep in mind, though, that if Voros were able to sample pitchers who had millions of IP in each of two different time periods, the correlation would indeed be close to 1, if there were any skill and spread of talent associated with BABIP (and that talent remained fairly constant across the time periods sampled).

 

Finally, the third influence on the y-t-y correlations is the amount by which a player's true talent may change from one year to the next. The more that a player's true talent changes from year to year, in a random or semi-random fashion, the smaller the y-t-y correlations will be. That effect can be mitigated by doing in-season regressions, like first half on second half (remember, I said that the regressions and correlations can be done across any time periods). In fact, if the requisite data is available, it is indeed better to use single-season data for the regressions.

 

Now, because regressions and correlations do not imply cause and effect, only relationships in general, it is important to note anything besides a pitcher's talent for preventing hits on BIP that might influence the correlation. For example, even if there were no true correlation (based on a pitcher's skill and the spread of talent in the league) in a pitcher's y-t-y BABIP, team defense, park influences, or some other systematic factor might create a positive correlation. One way to control for the bias of a pitcher's defense and home park is to park- and defense-adjust his BABIP before running the regressions. Another way is to run a regression only on pitchers who have switched teams (and home parks, of course). If we do the latter, we essentially factor out the park and defensive influences, although there will still be some park and defense noise, which could push the correlations towards zero. At the least, in doing the latter, we shouldn't get any false (with respect to pitcher skill) correlations created by pitchers pitching in the same home park and in front of the same defense in both years of the regression.
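To see how a shared park and defense can manufacture a correlation on their own, here is a hedged simulation sketch in which every pitcher has exactly the same true BABIP. The size of the park/defense effect (.010), the league BABIP (.290), the 500 BIP per season, and the function names are illustrative assumptions, not values taken from the data in this article.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_group(n_pitchers, switched, team_sd=0.010, bip=500, league=0.290, seed=1):
    """Year-to-year BABIP correlation when pitchers have NO individual skill.

    Every pitcher's own talent equals the league rate. The only systematic
    influence is a park/defense effect, which carries over to year 2 for
    pitchers who stay and is redrawn for pitchers who switch teams.
    """
    rng = random.Random(seed)
    noise_sd = math.sqrt(league * (1 - league) / bip)  # sampling luck per season
    year1, year2 = [], []
    for _ in range(n_pitchers):
        env1 = rng.gauss(0, team_sd)
        env2 = rng.gauss(0, team_sd) if switched else env1
        year1.append(league + env1 + rng.gauss(0, noise_sd))
        year2.append(league + env2 + rng.gauss(0, noise_sd))
    return pearson_r(year1, year2)


if __name__ == "__main__":
    print("stayed:  ", round(simulate_group(5000, switched=False), 3))
    print("switched:", round(simulate_group(5000, switched=True), 3))
```

In this toy setup the pitchers who stay put show a clearly positive correlation that has nothing to do with pitcher skill, while the switchers show roughly none, which is the reasoning behind restricting a regression to pitchers who changed teams.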

 

In revisiting Voros' thesis, the first thing I did was to essentially duplicate his regression analysis. Rather than using an estimate of a pitcher's BABIP from his traditional stat line, I used play-by-play (PBP) data to record his exact hits per BIP (BABIP). I ignored all balls in play that were bunted and all foul balls. I also removed all games in Colorado from the database.

 

I regressed all pitchers who had at least 300 BIP in back-to-back years from 1992 to 2003. There were 312 data pairs. Each data pair was independent (I regressed 1992 on 1993, 1994 on 1995, etc.) and the average number of BIP in each data pair was around 520. This corresponds to around 168 IP, a slightly smaller sample size than in Voros' regression, so we would expect to get smaller correlations, everything else being equal.

 

Here is the result:

$H (N=496), r = .137 (1SD=.043)

That is almost exactly what Voros got, considering that my sample size (for BIP) was smaller than Voros'.

 

Remember, however, that I said that defense and park factors could create a correlation unrelated to pitcher skill. So what happens if we run the same regressions, but this time only on pitchers who switched teams from one year to the next in the data pairs? In other words, if there is any systematic relationship between a pitcher's BABIP in one year and that in the next, since he has a different home park and defense in each of the two years, that relationship would likely be a result of pitcher skill only.

 

Here is the result for only those pitchers who switched teams:

$H (N=107), r = .036 (1SD=.094)

That's right: once we take defense and home park out of the equation, there appears to be almost no skill in a pitcher's ability to prevent hits on balls in play! Voros was right!

 

As several people have pointed out subsequent to Voros' original articles, that doesn't necessarily mean that a small percentage of individual pitchers, or even certain types of pitchers, do not have a unique ability to prevent hits on BIP, such that their $H in one year might somewhat predict their $H in another year. What it does mean is that as a general rule, once the bias of a pitcher's home park and defense is removed, his past $H is a very poor predictor of his future $H.

 

As I explained before, no matter how small the sample correlation is for x number of BIP, as long as there is any true correlation at all, given a large enough sample size, the correlation will eventually be 1 (assuming that whatever talent a pitcher does have for preventing hits is fairly constant over time). However, given a true correlation of .036 for 500 BIP samples, it would take more than a 13,000 BIP sample (4,200 innings, or around 20 years of pitching) to have a correlation of .500. In other words, if you wanted to estimate a pitcher's true $H from a year's worth of his sample data (~500 BIP), you would regress his sample $H over 96% towards the league average; if you wanted to estimate a pitcher's true $H from his 20-year history (~13,000 BIP), you would regress his sample $H only 50% toward the mean (the regression amount always equals 1-r).
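The arithmetic behind those figures can be sketched as follows. The relation r(n) = n / (n + k) is an assumption here (a standard reliability-style formula that the article does not state explicitly), and the function names and the .250 sample $H in the usage lines are hypothetical; under that assumption, the .036 correlation at 500 BIP does imply a break-even point a little above 13,000 BIP.

```python
def ballast_bip(r_observed, n_bip):
    """BIP count k at which r would reach .500, assuming r(n) = n / (n + k)."""
    return n_bip * (1 - r_observed) / r_observed


def expected_r(n_bip, k):
    """Expected y-t-y correlation for samples of n_bip, under the same assumption."""
    return n_bip / (n_bip + k)


def regress_to_mean(sample_rate, league_rate, r):
    """Estimate true talent by regressing the sample (1 - r) toward the league rate."""
    return league_rate + r * (sample_rate - league_rate)


if __name__ == "__main__":
    k = ballast_bip(0.036, 500)
    print("BIP needed for r = .500:", round(k))
    print("Regression toward the mean at 500 BIP:", round(1 - expected_r(500, k), 3))
    # Hypothetical pitcher: .250 sample $H in a .290 league over ~500 BIP.
    print("Estimated true $H:", round(regress_to_mean(0.250, 0.290, 0.036), 4))
```

Note how little a single season moves the estimate: the hypothetical .250 pitcher is still projected within a couple of points of the league rate.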

 

Getting back to the possibility of certain classes of pitchers having unique hit-preventing abilities, it should be clear that fly ball pitchers, on average, will have a different $H than will ground ball pitchers, since a fly ball has a higher out percentage than a ground ball. In fact, extreme ground ball pitchers have a BABIP of .297 (1992-2003), whereas extreme fly ball pitchers have a BABIP of .281 (extreme = top and bottom 10% in G/F ratio for pitchers with at least 100 BIP in a season). Of course, the run value of a FB hit is greater than that of a GB hit, such that the actual run value of all pitchers' BABIP is almost exactly the same, regardless of their G/F ratios.

 

In order to get a better idea as to why a pitcher appears to have little if any control over the outcome of his BIP, I separated BIP into six categories:

 

1) Line drive through the infield

2) Line drive in the outfield

3) Pop fly on the infield

4) Pop fly in the outfield

5) Fly ball in the outfield

6) Ground ball (no bunts)

 

These categories are based on the judgment of the play-by-play scorers. To some extent we can expect some of the subtle distinctions to be at least partially based on the outcome (e.g., if a ball in the outfield could reasonably be scored as either a fly ball or a line drive and it is not caught, it is probably more likely to be scored the latter). Also, as you will be able to infer from the following charts, it is possible that there are some severe biases among scorers (assuming that the same scorer tends to score the same team from year to year).

 

First, here are the percentages of BIP and the hit percentages for the above categories in 2003.  The second column is the number of balls in that category divided by the total number of BIP.  The third column is the hits per BIP for that category.

 

 

 

Type of BIP        Percentage of BIP    Hit percentage
Line Drive IF      .063                 .593
Line Drive OF      .143                 .812
Pop Fly IF         .056                 .038
Pop Fly OF         .016                 .269
Fly ball OF        .248                 .135
Ground Ball        .475                 .228

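As a quick consistency check on the 2003 chart, the overall hit rate on BIP is just the share-weighted average of the six category hit percentages. The sketch below uses only the numbers in the chart; the variable names are hypothetical, and the shares are renormalized because the rounded values sum to slightly more than 1.

```python
# (share of all BIP, hits per BIP) for each category, copied from the 2003 chart.
bip_2003 = {
    "Line Drive IF": (0.063, 0.593),
    "Line Drive OF": (0.143, 0.812),
    "Pop Fly IF": (0.056, 0.038),
    "Pop Fly OF": (0.016, 0.269),
    "Fly ball OF": (0.248, 0.135),
    "Ground Ball": (0.475, 0.228),
}

# The rounded shares add up to 1.001, so divide by their total.
total_share = sum(share for share, _ in bip_2003.values())
overall_hit_rate = sum(share * hit for share, hit in bip_2003.values()) / total_share

# Share of all hits on BIP that comes from each category.
hit_share = {
    name: share * hit / (overall_hit_rate * total_share)
    for name, (share, hit) in bip_2003.items()
}

if __name__ == "__main__":
    print("Overall hits per BIP:", round(overall_hit_rate, 3))
    for name, frac in sorted(hit_share.items(), key=lambda kv: -kv[1]):
        print(f"{name:14s} {frac:.1%} of hits")
```

The weighted average comes out at roughly .30, and ground balls and outfield line drives together account for about three quarters of all hits on BIP, which helps explain why the overall $H correlation is driven mostly by those two categories.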

 

Now, here are the y-t-y correlations for these same categories.  Again, pitchers had to have at least 300 BIP in back-to-back years to qualify for the regressions.  As in the BABIP regressions, there were 107 pitchers who qualified in the 1992-2003 database and who switched teams from year x to year x+1.  There were another 389 pitchers who qualified and who did not switch teams.

Pitchers who switched teams (N=107):

 

Type of BIP        "r" for Percentage of BIP    "r" for Hit percentage
Line Drive IF      .049                         -.054
Line Drive OF      .009                         .365
Pop Fly IF         .264                         .069
Pop Fly OF         .128                         -.062
Fly ball OF        .656                         .023
Ground Ball        .740                         .062
All Line Drives    .140                         .321
All BIP            N/A                          .036


 

Again, since we only looked at pitchers who switched teams from year x to year x+1, we have essentially removed the home park and defensive influences from the correlations.

 

The results are quite interesting. Even though the overall correlation on BABIP (the last entry in the third column) is near zero, there appear to be several components that have some predictive value, and are therefore somewhat within a pitcher's control.

 

Not surprisingly, a pitcher's FB and GB as a percentage of his total BIP (essentially his G/F ratio) are very much within a pitcher's control and appear to be relatively stable from year to year. The number of IF pop flies, and to some extent OF pop flies, as a percentage of all non-GB BIP, is somewhat a unique function of the pitcher as well. In other words, good pitchers may tend to get more pop flies than bad pitchers, as a percentage of their total non-ground ball balls in play.

 

Even though IF and OF line drives individually do not correlate well from year to year, the percentage of all line drives a pitcher allows appears to be somewhat within his control as well.  So good pitchers may also give up fewer line drives per BIP.

 

The last column, or the hit percentage correlations, is even more interesting.  The only category that a pitcher appears to have any significant control over is his hits per outfield line drive.  Essentially what this means is that the better pitchers allow outfield line drives that are easier to catch.  In other words, a ground ball is a ground ball, a fly ball is a fly ball, and a line drive to the infield is a line drive to the infield, regardless of the pitcher.  On the other hand, all outfield line drives are not created equal.  As well, the small positive correlation in the hits per GB category may suggest that good pitchers allow ground balls that are slightly easier to field.

 

To summarize the implications of the above chart, even though the y-t-y $H correlation of a 500 BIP pitcher is very small, a pitcher may have a fair amount of control over certain components of those BIP.  The regression results suggest that good pitchers give up slightly fewer line drives and slightly more pop flies, as a percentage of their total BIP, and that their line drives hit to the outfield (and perhaps their ground balls) may be softer and therefore easier to catch.  Further research using batted ball speed as one of the regression parameters may be useful.

 

Finally, to get an idea as to how home parks and defense (and perhaps the PBP scorers) affect the above regressions, here is the same chart of correlations, using only those pitchers who played on the same team in year x and year x+1.

Pitchers who did not switch teams (N=389):

 

Type of BIP        "r" for Percentage of BIP    "r" for Hit percentage
Line Drive IF      .053                         .073
Line Drive OF      .277                         .390
Pop Fly IF         .459                         .128
Pop Fly OF         .259                         .054
Fly ball OF        .652                         .229
Ground Ball        .777                         .150
All Line Drives    .362                         .336
All BIP            N/A                          .165


 

As you can see, many of the correlations increase substantially, suggesting that defense, home park, or the PBP scorer plays a significant role in creating these correlations in the first place. Whereas the increased correlations in the last column are not surprising, considering that defense and home park can have a substantial effect on the hit percentages of the various BIP components, the correlations in the second column are surprising. There is no particular or logical reason why a pitcher's defense or his home park should play a substantial role in the percentage of his BIP that are line drives or pop flies, yet from the above chart, that appears to be the case. I can only hypothesize that the PBP scorers may be creating a bias in the data.

 

In conclusion, while it appears that Voros was essentially correct in that a pitcher has little control over his BABIP, he was not able to investigate this phenomenon on a more granular level, which requires an analysis of PBP data. Such an analysis suggests that pitchers may have more or less control over various components of their BIP than their overall $H y-t-y correlation would imply. In fact, good pitchers probably tend to give up fewer and softer line drives and easier pop flies than do poorer pitchers. It also appears that defense and park factors, and perhaps even PBP scorers, can exert considerable influence on a pitcher's y-t-y correlations for $H or for various of the BIP components. Finally, Voros failed to explain the considerable influence that sample size (the number of BIP, not the number of pitcher seasons in the sample) has on the y-t-y correlations, regardless of the pitcher's skill and the spread of talent in the population.

 

Mitchel Lichtman Posted: February 29, 2004 at 06:00 AM | 43 comment(s)

Reader Comments and Retorts


Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   1. Robinson Cano Plate Like Home Posted: February 29, 2004 at 04:10 AM (#614737)
Fantastic. Do you have a breakdown for just the knuckleballers (sample size notwithstanding)?
   2. studes Posted: February 29, 2004 at 04:10 AM (#614741)
Awesome, MGL. This is the best work yet on DIPS. I love how the research keeps moving forward. Great job.

Ted, I think this is a big next step over Tippett's work.

I may come back with questions after I study it a bit more.
   3. John Posted: February 29, 2004 at 04:10 AM (#614742)
Fantastic. MGL definitely has the early lead in the race for the 2004 Oscar...er, Primey...nominations.
   4. Human Papelbon Virus Posted: February 29, 2004 at 04:10 AM (#614743)
D from D raises a good point. More info is needed about the two groups before we can say the only difference between the two are y-t-y team
   5. Mike Emeigh Posted: February 29, 2004 at 04:10 AM (#614745)
Ted, I think this is a big next step over Tippett's work.

You still have the problem of selective sampling and correlation across a narrow range of performance variation. Pitchers who pitch enough to get 300 BIP in successive seasons may very well be the ones who have the most control over the results from balls in play, and pitchers who have the least control over the results probably won't hang around enough to get to 300 BIP in successive seasons. (300 BIP is usually around 70-80 IP).

The safest conclusion to be drawn is not that pitchers have *no* effect over the results from BIP - but that among the set of major league pitchers the differences are small enough so that we can treat the pitcher's impact as constant.

-- MWE
   6. studes Posted: February 29, 2004 at 04:10 AM (#614746)
A couple of reactions:

Mike, I'm not sure what you mean. I think MGL is uncovering the correlations that do exist between pitchers. He's finding the real trends under the noise, despite the sample size issues.

MGL, the more I think about it, the more I'm disturbed by the correlations in your last table. Don't large correlations like that undermine the validity of the data?
   7. David Concepcion de la Desviacion Estandar (Dan R) Posted: March 01, 2004 at 04:10 AM (#614749)
A good portion of Zito's consistently low BABIPs allowed can be traced to Oakland's superior defense and his pitchers' park, no?
   8. MGL Posted: March 01, 2004 at 04:10 AM (#614751)
Thanks for the great feedback so far. I'm off to Florida (Spring Training) for a few days, so I'll be out of touch until I get back. AED, the people who score these games tend to call flares, duck snorts, Texas Leaguers, etc., "pop flies to the OF," so yes, they are not caught as often as fly balls. It is just semantics...
   9. ChuckO Posted: March 01, 2004 at 04:10 AM (#614752)
I live in Atlanta and watched Greg Maddux pitch for quite a while. Maddux always said that he tries to get the batter to hit the ball on the ground back through the middle, between the shortstop and the second baseman. That way he figured he or the shortstop or the second baseman would get to a lot of them. It would be interesting to compare his success in this respect with similar pitchers, i.e., groundball control pitchers, especially mediocre pitchers. Is data available without charge to make such an analysis?

Anyway, the conclusions in this article certainly make intuitive sense. As many pitchers have pointed out, pitching is all about disrupting the batter's timing. If you're successful in doing that, it would seem that you'd be more likely to get weakly hit balls, and fewer homeruns as well.
   10. Chris Dial Posted: March 01, 2004 at 04:10 AM (#614754)
Problem with assigning scorer bias is that groundballs also jump. Regardless of how well scorers (and each data point MGL has is scored by more than one person) assess FB/LDs, there's almost no way to screw up a GB.

In fact, good pitchers probably tend to give up fewer and softer line drives and easier pop flies than do poorer pitchers.

That sounds really familiar...pretty much my adjustment for pitching staff quality on ZR about 6 years ago - and one of the big reasons I don't subscribe to "Andruw Jones is a god."

It's some nice work, MGL.
   11. bunyon Posted: March 01, 2004 at 04:10 AM (#614755)
I'm probably missing something, too. Why is it there would be little correlation, y-t-y, if everyone had the same skill?

Let's say I'm a pitcher with a true skill for $H of .333. Let's further say that that is a true skill and it is very repeatable. Why would it matter if other pitchers also had a true, repeatable skill for $H of .333? Wouldn't the correlation then be very strong for everyone?

   12. studes Posted: March 01, 2004 at 04:10 AM (#614756)
Well, good grief, guys. There's no reason to act like snobs. You want every thought and article to be completely original? Great ideas and analyses bear repeating, over and over. That's how they become accepted and used.

For instance, did you see that UZR was quoted in the New York Times today in a Mike Cameron article? No reference to primer, alas.

http://www.nytimes.com/2004/03/01/sports/baseball/01METS.html
   13. Dan 'The Boy' Werr Posted: March 01, 2004 at 04:10 AM (#614757)
I guess I'm missing something big. If the pitchers have more control over various components of BABIP, where does it go? I.e., what causes this control to drop out of the overall y-t-y correlation?

Ben, I'm not sure what you mean. The pitchers have better correlations on some components than overall, but they also have worse correlations on some components than overall. For the pitchers who switched teams, there are even two slightly negative correlations. In both sets, ground balls, the second most common type of BIP, have lower correlations than the overall correlations.

I assume the correlations balance out at around the overall BIP correlations. Right?

MGL, great work.
   14. Dan 'The Boy' Werr Posted: March 01, 2004 at 04:10 AM (#614759)
Let's say I'm a pitcher with a true skill for $H of .333. Let's further say that that is a true skill and it is very repeatable. Why would it matter if other pitchers also had a true, repeatable skill for $H of .333? Wouldn't the correlation then be very strong for everyone?

Bunyon,

Over a sample as short as a year, even if every pitcher has a true skill of .333, there's going to be some variation around that, and it will occur more or less randomly for everybody. So you might see results like this:
              Year 1  Year 2  Year 3
PITCHER A      .338    .333    .341
PITCHER B      .329    .334    .337
PITCHER C      .327    .330    .324
PITCHER D      .330    .339    .325

The overall correlation from year n to year n+1 in that example is -.115. With a real sample and with more pitchers and years, you'd get a better result.
   15. Chris Dial Posted: March 01, 2004 at 04:10 AM (#614760)
Re: pop flies
Pop flies are defined as "fly balls that don't travel 220 feet" in STATS scoring speak.

Duck snorts are problematic, as they will be interchangeably defined as pop flies and line drives. There generally aren't very many, as a percentage of any one pitcher's BIP.

Studes, do you think the article you cite *should* mention Primer?
   16. studes Posted: March 01, 2004 at 04:10 AM (#614761)
David, sorry if I misinterpreted your comment.

Chris, I don't think the Times should reference primer. After all, MGL has basically made his work public, with no copyright protection that I've seen. But it would be nice, particularly for readers who might want to learn more.
   17. Chris Dial Posted: March 01, 2004 at 04:10 AM (#614763)
Studes,
*everything* written down is copyrighted. It's implied (or actually, it's law). Everything that you write is copyrighted by you nowadays.

The answer is, yes, the writer should reference where he saw it. He doesn't have to say "MGL", he can link or say "at Baseball Primer" or whatever - it helps his reader for starters.
   18. Mike Emeigh Posted: March 01, 2004 at 04:10 AM (#614765)
Mike, I'm not sure what you mean. I think MGL is uncovering the correlations that do exist between pitchers. He's finding the real trends under the noise, despite the sample size issues.

Suppose that pitchers really do have an ability to control the percentage of hits on balls in play - IOW, there is a level of "true talent" $H. Assume for the moment that, in order to pitch at all in the majors, you have to have a "true talent" $H of no more than .340. Over 300 BIP (which as I indicated is usually between 70-80 IP), this would mean that you'd allow 102 hits in 70-80 IP, exclusive of HR - somewhere around 11-13 non-HR hits per nine innings, which is about as many as a team can likely tolerate. To pitch well enough to make it through two consecutive seasons of 300+ BIP, you'd have to do better than that, in most if not all cases - a .300 level would be 90 non-HR hits over those 70-80 innings, which your team might be able to live with. The best pitchers usually average somewhere around .270 $H.

Now, if you select pitchers with 300+ BIP in back-to-back seasons, what you are effectively doing is selecting pitchers from the range .270-.300, rather than the .270-.340 range that is typical of all major league pitchers - and whether you realize it or not, you're removing a large group of pitchers from the study who operate at the margins of major league performance. Those pitchers might very well exhibit different effects from the ones that you are studying, because they aren't as good as the ones that you are studying, and it's conceivable that the reason they aren't as good is that they don't control H/BIP as well, for reasons which won't show up in a study limited to just good pitchers.

I know MGL understands all of this, so this comment wasn't really directed at him. The point I wanted to make is that correlation analysis needs to be taken with a grain of salt when applied across a restricted range of performance, especially when attempting to measure an aspect of skill that could - if it existed - directly impact that performance.

-- MWE
   19. studes Posted: March 01, 2004 at 04:10 AM (#614767)
Mike, I understand that. It's been discussed thoroughly.

But what about MGL's conclusions? Specifically:

To summarize the implications of the above chart, even though the y-t-y $H correlation of a 500 BIP pitcher is very small, a pitcher may have a fair amount of control over certain components of those BIP.

Seems to me that MGL has uncovered some interesting findings even within his "restricted range of performance". Do you see it differently?

Chris, you were testing me, right? Thanks for the info. I just might send the guy an e-mail.
   20. tangotiger Posted: March 01, 2004 at 04:10 AM (#614769)
If we remember the results of the Allen/Hsu "Solving DIPS" paper, which was based on pitchers with at least 200 BIP (I think), the implication there is that the true talent level of pitching preventing hits on BIP is 1 SD = .009 hits / BIP. That works out to +/- 10 runs for 95% of the pitchers.

The question is: how can we detect this, and how much of this can we detect.

A lot of this we just won't be able to detect.

It's apparent that it's a lot easier to detect this based on the rate of Line Drives given up.

   21. Mike Emeigh Posted: March 01, 2004 at 04:10 AM (#614770)
Seems to me that MGL has uncovered some interesting findings even within his "restricted range of performance". Do you see it differently?

Is the correlation effect on hits on OF LDs for good pitchers applicable across the range of *all* pitchers? A correlation in a subgroup is not necessarily indicative of a correlation across the entire group, especially if the distribution of the performance being measured is heteroskedastic; there may easily be a subgroup in which a significant correlation is detected that can't be detected in the population at large, because the variance of the statistic is conditioned on the performance level.

That doesn't mean that the detected correlation is not still valuable; as Chris Dial suggests, the ability to generate *catchable* line drives might be a discriminant between good pitchers and bad pitchers. But there's a lot more investigation to do across the entire range of pitching performance before one can be assured that's the correct conclusion to draw.

-- MWE
   22. Human Papelbon Virus Posted: March 01, 2004 at 04:10 AM (#614771)
What if you took the methods behind UZR and applied it to pitchers?

For example: say Greg Maddux allowed 100 ground balls into zone 5 last yr, with 30 going for hits and 70 being turned into outs. Suppose the average GB-out rate for this zone was 60%. Maddux would have been expected to give up 40 hits on ground balls to zone 5. Then from ZR, we can see that between Castilla and Furcal, they saved Maddux 8 hits on ground balls to zone 5. Subtracting out the effects of the defense, we might see that Maddux allowed 2 less hits than expected on ground balls to zone 5. Do this over a number of years, and maybe we can observe a trend. Maybe maddux is able to prevent an average of 5 hits per year on ground balls to zone 5, independent of defense. If he can, then maybe we have quantified his skill at getting right handed batters to pull the ball weakly to the third baseman.

I'm just throwing this out there off the top of my head. Does this seem justifiable?
   23. Michael Humphreys Posted: March 01, 2004 at 04:10 AM (#614773)
MGL,

Thanks for the great article. Although I'm undoubtedly biased, it seems to support DRA:

"The number of IF pop flies and to some extent OF pop flies, as a percentage of all non-GB BIP, are somewhat a unique function of the pitcher as well. In other words, good pitchers may tend to get more pop files than bad pitchers, as a percentage of their total non-ground ball balls in play."

As you may recall, DRA allocates estimated infield pop-outs to fielders, and I speculated in the article that pop-ups might be the key variable (when zone data is unavailable) for determining a pitcher's effect on BABIP.

If I'm understanding your article correctly:

1) Infield pop-ups have the highest out-conversion rate of any category of BIP: 96%. As mentioned in the DRA article, if a pitcher generates a BIP that is *never* fielded (a HR), we charge the pitcher. If a pitcher generates a BIP that is almost *always* fielded (an infield pop-up), shouldn't the pitcher get credit?

2) With the exception of the overall tendency to give up either ground balls or outfield fly balls, the tendency to give up infield pop-ups is more persistent than the tendency to give up other forms of BIP. So generating infield pop-ups seems to be, to some degree at least, a skill. Query whether, given the relative rarity of infield pop-ups (at least compared with the category of all ground balls and all outfield fly balls), using larger samples of data (two-year v. two-year?) would show a higher "r" for infield pop-up generation.

The most surprising result seems to be that the rate of giving up line drives is not that persistent (.14 "r", at least for pitchers who change teams, which I believe is the best data set to use), but *given* the allowance of a line drive, differing out-conversion rates *are* persistent. Perhaps AED is right:

"If there were systematic differences between what is classified as a line drive, you would expect to see significantly lower r's for the players who changed teams. So it seems more likely that you're getting hammered by sample size or selective sampling . . . ."

On the other hand, it might be interesting to determine whether there is a *cross*-correlation between generating infield pop-ups and allowing line drives that are less likely to be hits. Throughout the development of DRA, I kept finding that the run-weight for infield fly outs was a little higher than it "should" be (compared to other BIP outs), but that was probably because infield fly outs correlate with *outfield* fly outs (which have higher out conversion than ground balls) and may correlate with "easier to field" line drives.

Thanks again for a very helpful study.
   24. bunyon Posted: March 01, 2004 at 04:10 AM (#614774)
Thanks, Dan. That seems clear now that you've said it.
   25. Michael Humphreys Posted: March 01, 2004 at 04:10 AM (#614775)
Sorry, meant to say:

"As you may recall, DRA allocates estimated infield pop-outs to *pitchers*."
   26. Human Papelbon Virus Posted: March 02, 2004 at 04:10 AM (#614782)
If that's right then the two intervals don't look that different to me, there's lots of overlap. So I would argue that taking park and defense out of the equation just gets us fewer data points.

But I'm a complete novice when it comes to using these types of statistics, so sorry if this point is completely wrong.


No such thing as standard deviation or confidence intervals for correlations.
   27. Too Much Coffee Man Posted: March 02, 2004 at 04:10 AM (#614784)
Sure, you can calculate a confidence interval for a correlation. The r must be converted to a Fisher's Z', with the standard error of Z' equal to 1/sqrt(N - 3), where N is the number of data pairs.
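
A minimal sketch of that calculation, assuming a two-sided 95% interval (critical value 1.96) and roughly bivariate-normal data. The function name is hypothetical, and the r and N in the usage line are illustrative values, not results from the article.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation via Fisher's Z'.

    r      : sample correlation
    n      : number of data pairs (must be greater than 3)
    z_crit : normal critical value; 1.96 is assumed for a 95% interval
    """
    z_prime = math.atanh(r)            # Fisher transform of r
    std_err = 1.0 / math.sqrt(n - 3)   # standard error of Z'
    lower = z_prime - z_crit * std_err
    upper = z_prime + z_crit * std_err
    # Transform the endpoints back to the correlation scale.
    return math.tanh(lower), math.tanh(upper)


# Illustrative inputs only.
print(fisher_ci(0.15, 312))
```

Because the interval is built on the Z' scale and transformed back, it is not symmetric around r, and it narrows as N grows.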
   28. Human Papelbon Virus Posted: March 02, 2004 at 04:11 AM (#614785)
No, this is not regression. r is just the correlation. we don't have the regression equation or the SEE. only then could we compute confidence intervals.
   29. Too Much Coffee Man Posted: March 02, 2004 at 04:11 AM (#614786)
I have a couple of questions, but I'll start with the less important ones. First, an observation:
In thinking about this:
"Even if something is indeed a skill, if there is little or no spread in true talent with respect to that skill, in the population from which we are sampling (major leagues pitchers), then the y-t-y correlation for any sample size will be close to zero."
This is a pretty important point. My first reaction was that you'd have to argue that the variance of this skill is very close to 0 for this to matter. Then, I realized that this is actually very close to the current thinking about major league pitching (that most have no influence on batting average on balls in play). So, if you believe this, then the expected y-t-y correlation is zero. If you don't believe it, the expected correlation would be some value other than zero.

There's a few points in here I didn't understand. One is this:
"In fact, the size of the correlation coefficient is a direct function of the spread of talent in the population and the size of the samples in each data element. If there is any skill whatsoever related to the performance measure we are sampling and there is any spread of true talent in the population with regard to that skill, then as the sample size get larger, the correlation always approaches 1. The converse of that is also true. Regardless of how much skill and what the spread of talent is, as the sample size gets smaller, the correlation always approaches zero."
Why should the correlations be expected to approach 0 or 1.00 as the sample sizes get larger? Shouldn't they approach the "true" population value? This statement seems to assume that there are either perfectly linear or oblique relationships and nothing else.
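
One way to read the quoted claim is that a y-t-y correlation compares two noisy measurements of the same underlying talent, so the noise shrinks as each year's sample of balls in play grows. The sketch below is a toy model, not the article's method: the talent mean (.300), the talent spread (.010), and the normal approximation to binomial noise are all assumptions, and every name is hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulated_yty_r(bip_per_year, n_pitchers=2000, talent_mean=0.300,
                    talent_sd=0.010, seed=0):
    """Simulate two seasons per pitcher and correlate the observed rates."""
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n_pitchers):
        p = rng.gauss(talent_mean, talent_sd)          # assumed true talent
        noise_sd = math.sqrt(p * (1.0 - p) / bip_per_year)
        year1.append(rng.gauss(p, noise_sd))           # observed rate, year 1
        year2.append(rng.gauss(p, noise_sd))           # observed rate, year 2
    return pearson_r(year1, year2)


def expected_yty_r(bip_per_year, talent_mean=0.300, talent_sd=0.010):
    """Talent variance over total variance under the same toy model."""
    talent_var = talent_sd ** 2
    noise_var = talent_mean * (1.0 - talent_mean) / bip_per_year
    return talent_var / (talent_var + noise_var)


for bip in (50, 3000, 300000):
    print(bip, round(simulated_yty_r(bip), 3), round(expected_yty_r(bip), 3))
```

Under these assumptions the correlation climbs toward 1 as balls in play per season increase and falls toward 0 as they decrease, which matches the quoted passage. If the talent spread is set to zero, the correlation stays near zero at any sample size.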
   30. Too Much Coffee Man Posted: March 02, 2004 at 04:11 AM (#614787)
I don't want to get into a debate about this because it's off-track.
But correlation and (simple) regression are the same thing. In simple linear regression, the beta weight for standardized scores is also the correlation.
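
A small check of that statement, using made-up paired values purely for illustration. The helper names are hypothetical, and the data are not from the article.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def standardize(xs):
    """Convert values to z-scores (mean 0, standard deviation 1)."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [(x - m) / sd for x in xs]


def ols_slope(xs, ys):
    """Least-squares slope of y on x in simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Made-up year 1 / year 2 rates for six hypothetical pitchers.
year1 = [0.281, 0.295, 0.310, 0.302, 0.288, 0.317]
year2 = [0.299, 0.290, 0.305, 0.311, 0.284, 0.301]

print(pearson_r(year1, year2))
print(ols_slope(standardize(year1), standardize(year2)))
```

The two printed values agree up to floating-point rounding, which is the sense in which the standardized beta weight "is" the correlation.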
See the home page for a site that calculates confidence intervals.
   31. Human Papelbon Virus Posted: March 02, 2004 at 04:11 AM (#614791)
Thanks for correcting me TMCM. Sorry for the misinformation.
   32. Michael Humphreys Posted: March 02, 2004 at 04:11 AM (#614792)
Tom N,

"Even if the pitcher has a specific skill at generating various types of near certain outs that doesn't mean we necessarily would credit the pitcher. Imagine if we had even more granular data to work with that included not just the zone, but the precise speed of all BIP. Imagine further that Carlos Zambrano has a skill at inducing slow hit groundballs to the left side of the infield. His sinker is fast enough so that those pitchs usually aren't pulled directly down the line. So if 99% balls hit in the SS's zone hit between 55-80 MPH are converted into outs should the pitcher get credit?"

That is precisely what would happen under a UZR system that also provided a P[itcher] Zone Rating. David Pinto's Probabilistic Model for Range does something similar.

"If inducing pop flies of either variety is a skill, we should take that ability into account when evaluating pitchers, much the same way we take GB/FB ratio into account. That doesn't mean pitchers are contributing to pop flies being converted into outs. The H% r itself seems to suggest that it's the infielders doing the work. Just because OF fly balls are almost always converted into outs (86.5%) doesn't mean that outfielders shouldn't get credit for their defensive contribution."

Simplifying somewhat, under a PZR/UZR system, the pitcher would get almost all of the credit for the out; the fielder only a tiny amount. DRA is a non-zone system that allocates responsibility for BIP to fielders, except that it allocates responsibility for infield fly outs to pitchers. Doing so results in an allocation of pitcher and fielder credit for BIP that comes close to matching the ratio found under the Allen/Hsu "Solving DIPS" paper, as well as fielder ratings that match up well with UZR.
   33. Human Papelbon Virus Posted: March 02, 2004 at 04:11 AM (#614794)
(.050, .222) for the full data set, and (-.154, .224) for the pitchers that moved.

So I think taking park and defense out of the equation doesn't give you anything new. It just gives you wider confidence intervals.


Not to dwell on this point, but I was under the impression that the standard deviation reported was for the pitcher's $H, not for the correlation coefficient. If this is true, then this SD can't be used to create confidence intervals for the correlation coefficient, can it?
   34. Too Much Coffee Man Posted: March 02, 2004 at 04:11 AM (#614795)
This is, as I like to say, at the limits of my knowledge.
The sampling distribution of r is not normal, so you could not use the standard deviation to build a confidence interval.
To build the confidence intervals, you convert r to Z' and add/subtract a weight based on alpha times a multiplier (normally a standard error). I THINK it must be the case that Z' has a distribution that is determined by the sample size, hence the need to only know N to calculate the confidence intervals. But, I could be wrong.
   35. Jim (jimmuscomp) Posted: March 02, 2004 at 04:11 AM (#614798)
"I regressed all pitchers who had at least 300 BIP in back-to-back years from 1992 to 2003. There were 312 data pairs. Each data pair was independent (I regressed 1992 on 1993, 1994 on 1995, etc.)"


Can anyone explain why you wouldn't use overlapping data pairs (i.e. 1992 on 1993, 1993 on 1994, etc.)? I think this is what Tippett did. You would nearly double your sample size, and perhaps pick up on some weird every-other-year patterns (e.g. Saberhagen) that you would otherwise miss. Would this introduce some other selection bias?
   36. MGL Posted: March 04, 2004 at 04:11 AM (#614842)
Dick,

I would be happy to send you the data when I get back home. Anything I can do to help a fellow researcher...
   37. Mike Emeigh Posted: March 05, 2004 at 04:12 AM (#614856)
So cell C3 says that a pitcher has some control over whether a line drive becomes a hit or not, but then B3 says that he has little control over whether his BIPs become OF line drives in the first place. How can you control the second without first controlling the first?

Cell C3 indicates that there is a good y-t-y correlation for these pitchers in the rate at which outfield line drives become hits; it says very little about reasons why that might be true. It can be true, for example, if the rate at which outfield line drives are caught for outs varies little from team to team or pitcher to pitcher (IOW, if a high percentage of outfield line drives are inherently uncatchable). We need to keep open the possibility that there may be effects other than that of the pitcher which could lead to the correlations that we see. That's why range of performance - or, alternately, performance variance, which essentially tells you the same thing - across the data set is so important to know. If the entire range of performance within the data set varies from, say 26% of OF LD caught for outs to 28% of OF LD caught for outs, I'd have no problem concluding that pitchers have little influence over whether or not OF LD became outs, even if there were a strong y-t-y correlation within that group of selected pitchers.

-- MWE
   38. Noffs Posted: March 06, 2004 at 04:12 AM (#614860)
Not to make you do this again, MGL, but what happens if you change "Hit percentage" to "Percent reaching base"? I ask because the error rates associated with groundballs could make a grounder a lot more valuable than it appears.
   39. MGL Posted: March 07, 2004 at 04:12 AM (#614878)
Hoiles,

If we only look at pitchers who play for the same team in the regression "pairs," even if pitchers had no control over any of their BIP components, clearly we might (probably will) see some y-t-y correlations, due to good or bad defense, park effects, etc. Those are the correlations I am trying to "factor out."

Now, if we only look at pitchers who changed teams, we have no "systematic bias" because the assumption is that for a large sample, the average park effects and defense of the new team are league average. Therefore if pitchers had no control over any of the BIP components, the correlations for pitchers who changed teams would have to be zero, as there would be nothing "connecting" their BIP rates from one year to the next, which is essentially the definition of correlation (a "connection," loosely speaking).

If pitchers do have some control over some of their BIP components, then that control, as measured by the correlation coefficient, should survive the "changing of teams," since the variation in y-t-y rates as a result of the new defense and park is random (no bias).

As far as how introducing another variable (even if it isn't biased) affects the magnitude of the correlation coefficients, I don't know. I would have to defer to the statisticians for that answer. I could simulate the effect and come up with an answer I suppose. IOW, if the true y-t-y correlation for, say, OF line drive hit percentage were .200, and we then introduced some noise (changing teams and therefore changing defense and parks, which presumably affects the true OF line drive hit percentage independent of the pitcher), would the correlation change? I don't think so, but I am not sure.

In any case, changing of teams cannot create a correlation where none existed or even increase a correlation where one did exist, since where any given pitcher goes is presumably random, at least as far as his first year BIP rates. If there is no "connection" between his BIP rates in his first year, and where he goes, then there is by definition no bias and there can be no increase in the correlations.
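
A rough sketch of the kind of "correlation sim" discussed above. It is a toy model under stated assumptions, not MGL's simulation: performance is talent plus luck in arbitrary units, and a change of teams is modeled as an extra noise term that is independent of the pitcher. All names and parameter values are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(talent_sd, luck_sd, team_sd, n_pitchers=20000, seed=1):
    """Return (r without added noise, r with independent team noise in year 2)."""
    rng = random.Random(seed)
    year1, year2_base, year2_moved = [], [], []
    for _ in range(n_pitchers):
        talent = rng.gauss(0.0, talent_sd)
        first = talent + rng.gauss(0.0, luck_sd)
        second = talent + rng.gauss(0.0, luck_sd)
        # New park and defense: assumed unrelated to the pitcher's year 1.
        team_effect = rng.gauss(0.0, team_sd)
        year1.append(first)
        year2_base.append(second)
        year2_moved.append(second + team_effect)
    return pearson_r(year1, year2_base), pearson_r(year1, year2_moved)


# Some talent spread (baseline r near .200 by construction), then none.
print(simulate(talent_sd=1.0, luck_sd=2.0, team_sd=3.0))
print(simulate(talent_sd=0.0, luck_sd=2.0, team_sd=3.0))
```

In this toy setup the unbiased noise does not create a correlation when there is no talent spread, and it does not raise one that exists. It does pull an existing correlation toward zero, because the extra variance in year 2 dilutes the talent signal.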

As far as the ROE's on ground balls, there is probably not an extra bias associated with ROE's, so adding it into the mix shouldn't change anything, although since many more ROE's are on GB's to the left side of the infield, we might see some "extra" correlation (higher coefficient) based on a pitcher's handedness. Since we aren't really interested in that kind of "control," it is probably a good thing NOT to include ROE's in the ground balls...
   40. MGL Posted: March 08, 2004 at 04:12 AM (#614902)
Hoiles,

With all respect, I think that you are completely wrong, but not being a statistician, I would have to defer to someone like AED if he is still lurking.

If I get a chance, I'll do some "correlation sims" which should clear up the issue...
   41. Old Matt Posted: March 08, 2004 at 04:13 AM (#614909)
Each data pair was independent (I regressed 1992 on 1993, 1994 on 1995, etc.)

These data pairs are not independent, since it's still the same person pitching in 1992 and 1994. Let's say I'm taking your blood pressure once a year -- you would not say that your blood pressure in 1994 is independent of 1992. There are ways to adjust for this data structure, where you have repeated measurements on the same person, in linear regression. I'd have to pull out my school notes to give you the details, but basically you come up with some assumed correlation matrix that describes where correlations are in the data, then the regression procedure uses that information in its calculations. Probably only affects the standard errors of your calculations, though.
   42. MGL Posted: March 08, 2004 at 04:13 AM (#614921)
Now, how do you know that, say, I'm not a professional statistician myself, with a post-graduate degree in that field?

Come on, that's a silly question.

Could you point out what exactly was wrong in my arguments (other than a couple of typos)? I'll give one more analogy, and I would appreciate it if you would think about it instead of blindly saying it's wrong.

Honestly, I haven't had the time to go through your premise. On the other hand, I did take lots of time to think through my thesis when I wrote the article (several months ago).


As I said, I think you are wrong, but I'm not sure. I did say that I think you are wrong, not that I was sure you are wrong, didn't I? There is a big difference (from my perspective).

I am no statistician either but I do know a fair amount about regressions and correlations, and have done a fair amount of work in baseball analysis using those "tools."

However, I'm not sure I am capable of addressing your specific concerns, so I apologize for just saying that "I think you are wrong," as that is not particularly helpful. As I said, I can probably write a simple sim or two that would tell us how changing parks (park effects and defense) affects a correlation, whether or not there is one in the first place.

I'll try and do that tonight....
   43. MGL Posted: March 09, 2004 at 04:13 AM (#614925)
Hoiles, you may be right, but I just don't have the time to pursue this right now....
