Primate Studies: Where BTF's Members Investigate the Grand Old Game

Sunday, February 29, 2004

DIPS Revisited

What does play-by-play data tell us about DIPS?

Several years ago, Voros McCracken published a 2-part thesis, entitled Defense Independent Pitching Stats (DIPS). In it, he introduced two important concepts in evaluating baseball performance. One, that a player's (in this case, a pitcher's) sample performance does not necessarily tell you a lot about his true talent, and thus his likely future performance. And two, that different components of a player's performance do not necessarily convey the same quality of information about that player's true talent with regard to that component.
Voros found that a pitcher's batting average on balls in play ($H or BABIP), defined as non-HR hits per non-HR balls in play, conveys quite a bit less information about a pitcher's talent than does, say, his HR, BB, or SO rate. He concluded, and correctly so, that a pitcher's sample BABIP is not a particularly good predictor of his future BABIP. He found that the correlation between a pitcher's one year BABIP and his next year BABIP was low, as compared to that of his HR, BB, and SO rates. In fact, when Voros ran a year-to-year (yty) linear regression of all pitchers who had at least 162 IP's in 1998 and 1999, he got the following correlation coefficients.
$BB = .681
$SO = .792
$HR = .505
$H = .153

Voros' conclusion, based upon the .153 correlation coefficient above, was "that the ability for pitchers to prevent balls in play doesn't exist, or if it does, it doesn't really amount to much for almost all pitchers." Unfortunately, Voros did not adequately explain exactly what these correlations mean, the relationship between sample size and correlation, and how team defense and a pitcher's home park factor into the equation.
In baseball, when you correlate a measure of a player's performance from one time period to another (it could be from one year to another but it doesn't have to be), the resultant correlation coefficient simply indicates how useful the measurement in the first time period is in predicting the same measurement in the second time period. While the relatively low correlation for $H in Voros' regression suggests that a pitcher's BABIP in one year is not very useful in predicting his BABIP in the next year, it does not necessarily mean that pitchers have little or no ability to prevent hits on balls in play.
For one thing, if all pitchers in the major leagues, particularly those with most of the IP's, had around the same ability to prevent hits on balls in play, then the yty correlations for BABIP would be near zero, even though preventing hits on balls in play may very much be a skill. Secondly, if the IP's or TBF's (technically, the BIP's) of the average pitcher in the regression were small enough, the correlations would necessarily be small as well, even if preventing hits on BIP's were a skill and there were significant variation in that skill from player to player. Finally, even if pitchers did indeed have unique abilities to prevent hits on balls in play, if those abilities were to change significantly from year to year, the yty correlations might also be small.
Let's review the three things that can affect the yty correlations, and thus the predictability, of any statistical measure in baseball, and in particular, BABIP. One, the spread of true talent in the league. Even if something is indeed a skill, if there is little or no spread in true talent with respect to that skill in the population from which we are sampling (major league pitchers), then the yty correlation for any sample size will be close to zero. That bears repeating and I'll put it another way. If all players in baseball have essentially the same level of skill in any area that we are measuring, then they might as well have no skill at all, as far as predictive value is concerned! Two, the smaller the sample size in our sample measures (the number of BIP's for each pitcher in each year), the smaller the correlation, even if there is significant talent or skill associated with that measure and there is a large spread of talent within the population.
In fact, the size of the correlation coefficient is a direct function of the spread of talent in the population and the size of the samples in each data element. If there is any skill whatsoever related to the performance measure we are sampling and there is any spread of true talent in the population with regard to that skill, then as the sample size gets larger, the correlation always approaches 1. The converse of that is also true. Regardless of how much skill and what the spread of talent is, as the sample size gets smaller, the correlation always approaches zero.
In Voros' regression, the sample size of each data element was fairly large (a minimum of 162 IP's, or around 500 BIP), such that a correlation of .153 does indeed suggest that there is little skill or spread of talent in the population with respect to BABIP. Keep in mind though, that if Voros were able to sample pitchers who had millions of IP's in each of two different time periods, the correlation would indeed be close to 1, if there were any skill and spread of talent associated with BABIP (and that talent remained fairly constant across the time periods sampled).
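The dependence of the correlation on talent spread and sample size can be sketched with a simple binomial model. This is an illustration, not the method used in the article: it assumes each pitcher's talent is constant across both periods, that hits on BIP are independent trials, and that both periods have the same number of BIP. The function name, the league BABIP of .290, and the talent spread of .008 are all assumed values chosen only for demonstration.

```python
def expected_yty_r(talent_sd, n_bip, league_babip=0.290):
    """Expected period-to-period correlation under a simple binomial model.

    talent_sd    -- assumed standard deviation of true BABIP talent
    n_bip        -- balls in play per pitcher in each period
    league_babip -- assumed league-average BABIP (illustrative value)
    """
    if n_bip <= 0:
        raise ValueError("n_bip must be positive")
    talent_var = talent_sd ** 2
    # Binomial sampling variance of an observed rate over n_bip trials.
    noise_var = league_babip * (1.0 - league_babip) / n_bip
    # Share of observed variance that is due to true talent.
    return talent_var / (talent_var + noise_var)


if __name__ == "__main__":
    # Hypothetical talent spread of .008; only the trend matters here.
    for n in (50, 500, 5_000, 1_000_000):
        print(n, round(expected_yty_r(0.008, n), 3))
    # No spread in talent means no correlation at any sample size.
    print(expected_yty_r(0.0, 500))
```

Under these assumptions the correlation climbs toward 1 as the BIP count grows, falls toward zero as it shrinks, and is exactly zero when there is no spread in talent, which is the pattern the text describes.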
Finally, the third influence on the yty correlations is the amount by which a player's true talent may change from one year to the next. The more that a player's true talent changes from year to year, in a random or semi-random fashion, the smaller the yty correlations will be. That effect can be mitigated by doing in-season regressions, like first half on second half (remember I said that the regressions and correlations can be done across any time periods). In fact, if the requisite data is available, it is indeed better to utilize single season data for the regressions.
Now, because regressions and correlations do not imply cause/effect, only relationships in general, it is important to be aware of, and note, anything else besides a pitcher's talent for preventing hits on BIP that might influence the correlation. For example, even if there were no true correlation (based on a pitcher's skill and the spread of talent in the league) in a pitcher's yty BABIP, team defense, park influences, or some other systematic factor might create a positive correlation. One way we can control for the bias of a player's defense and home park is to park- and defense-adjust a pitcher's BABIP before running the regressions. Another way would be to run a regression only on pitchers who have switched teams (and home parks of course). If we do the latter, we essentially factor out the park and defensive influences, although there will still be some park and defense noise, which could adversely affect the correlations (push them towards zero). At the least, in doing the latter, we shouldn't get any false (with respect to pitcher skill) correlations created by pitchers pitching in the same home park and in front of the same defense in both years of the regression.
In revisiting Voros' thesis, the first thing I did was to essentially duplicate his regression analysis. Rather than using an estimate of a pitcher's BABIP from his traditional stat line, I used play-by-play (PBP) data to record his exact hits per BIP (BABIP). I ignored all balls in play that were bunted and all foul balls. I also removed all games in Colorado from the database.
I regressed all pitchers who had at least 300 BIP in back-to-back years from 1992 to 2003. There were 312 data pairs. Each data pair was independent (I regressed 1992 on 1993, 1994 on 1995, etc.) and the average number of BIP's in each data pair was around 520. This corresponds to around 168 IP's, a slightly smaller sample size than in Voros' regression; thus we would expect to get smaller correlations, everything else being equal.
Here is the result:

$H (N=496), r = .137 (1 SD = .043)

That is almost exactly what Voros got, considering that my sample size (for BIP) was smaller than Voros'.
Remember, however, that I said that defense and park factors could create a correlation unrelated to pitcher skill. So what happens if we run the same regressions, but this time only on pitchers who switched teams from one year to the next in the data pairs? In other words, if there is any systematic relationship between a pitcher's BABIP in one year and that in the next, since he has a different home park and defense in each of the two years, that relationship would likely be a result of pitcher skill only.
Here is the result for only those pitchers who switched teams:

$H (N=107), r = .036 (1 SD = .094)

That's right: once we take defense and home park out of the equation, there appears to be almost no skill in a pitcher's ability to prevent hits on balls in play! Voros was right!
As several people have pointed out subsequent to Voros' original articles, that doesn't necessarily mean that a small percentage of individual pitchers or even certain types of pitchers do not have a unique ability to prevent hits on BIP, such that their $H in one year might somewhat predict their $H in another year. What it does mean is that as a general rule, once the bias of a pitcher's home park and defense are removed, his past $H is a very poor predictor of his future $H.
As I explained before, no matter how small the sample correlation is for x number of BIP, as long as there is any true correlation at all, given a large enough sample size, the correlation will eventually approach 1 (assuming that whatever talent a pitcher does have for preventing hits is fairly constant over time). However, given a true correlation of .036 for 500 BIP samples, it would take more than a 13,000 BIP sample (4,200 innings or around 20 years of pitching) to have a correlation of .500. In other words, if you wanted to estimate a pitcher's true $H from a year's worth of his sample data (~500 BIP), you would regress his sample $H over 96% towards the league average; if you wanted to estimate a pitcher's true $H from his 20-year history (~13,000 BIP), you would regress his sample $H only 50% toward the mean (the regression amount always equals 1 - r).
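The 13,000 BIP figure can be reproduced with a short sketch, assuming the correlation grows with sample size as r = n / (n + k) for some constant k. That functional form is an assumption of this illustration (it is consistent with the numbers quoted above, but the article does not state the formula), and all function names, as well as the .320 and .290 rates in the demo, are hypothetical.

```python
def regression_constant(r_obs, n_obs):
    """Solve r = n / (n + k) for k, given an observed r at n_obs BIP."""
    return n_obs * (1.0 - r_obs) / r_obs


def expected_r(n_bip, k):
    """Expected correlation for samples of n_bip BIP."""
    return n_bip / (n_bip + k)


def bip_needed(target_r, k):
    """BIP per sample required to reach target_r."""
    return k * target_r / (1.0 - target_r)


def regress_to_mean(sample_rate, league_rate, r):
    """Estimate true talent by regressing (1 - r) of the way to the mean."""
    return league_rate + r * (sample_rate - league_rate)


if __name__ == "__main__":
    k = regression_constant(0.036, 500)  # r = .036 at roughly 500 BIP
    print(round(k))                      # BIP at which r reaches .500
    print(round(bip_needed(0.5, k)))     # same quantity, computed directly
    # A hypothetical .320 $H over 500 BIP in an assumed .290 league:
    print(round(regress_to_mean(0.320, 0.290, expected_r(500, k)), 4))
```

With r = .036 at 500 BIP, k comes out a little above 13,000, and a single-season $H moves only about 4% of the way from the league average toward the observed value.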
Getting back to the possibility of certain classes of pitchers having unique hit preventing abilities, it should be clear that fly ball pitchers, on the average, will have a different $H than will ground ball pitchers, since a fly ball has a higher out percentage than a ground ball. In fact, extreme ground ball pitchers have a BABIP of .297 (1992-2003), whereas extreme fly ball pitchers have a BABIP of .281 (extreme = top and bottom 10% in G/F ratio for pitchers with at least 100 BIP in a season). Of course, the run value of a FB hit is greater than that of a GB hit, such that the actual run value of all pitchers' BABIP is almost exactly the same, regardless of their G/F ratios.
In order to get a better idea as to why a pitcher appears to have little if any control over the outcome of his BIP, I separated BIP into six categories:
1) Line drive through the infield
2) Line drive in the outfield
3) Pop fly on the infield
4) Pop fly in the outfield
5) Fly ball in the outfield
6) Ground ball (no bunts)
These categories are based on the judgment of the play-by-play scorers. To some extent we can expect some of the subtle distinctions to be at least partially based on the outcome (e.g., if a ball in the outfield could reasonably be scored as either a fly ball or a line drive, if it is not caught, it is probably more likely to be scored the latter). Also, as you will be able to infer from the following charts, it is possible that there are some severe biases among scorers (assuming that the same scorer tends to score the same team from year to year).
First, here are the percentages of BIP and the hit percentages for the above categories in 2003. The second column is the number of balls in that category divided by the total number of BIP. The third column is the hits per BIP for that category.
Now, here are the yty correlations for these same categories. Again, pitchers had to have at least 300 BIP in back-to-back years to qualify for the regressions. As in the BABIP regressions, there were 107 pitchers who qualified in the 1992-2003 database and who switched teams from year x to year x+1. There were another 389 pitchers who qualified and who did not switch teams.

Pitchers who switched teams (N=107):
Again, since we only looked at pitchers who switched teams from year x to year x+1, we have essentially removed the home park and defensive influences from the correlations.
The results are quite interesting. Even though the overall correlation on BABIP (the last entry in the third column) is near zero, there appear to be several components that have some predictive value, and are therefore somewhat within a pitcher's control.
Not surprisingly, a pitcher's FB and GB as a percentage of his total BIP (essentially his G/F ratio) are very much within a pitcher's control and appear to be relatively stable from year to year. The number of IF pop flies and to some extent OF pop flies, as a percentage of all non-GB BIP, are somewhat a unique function of the pitcher as well. In other words, good pitchers may tend to get more pop flies than bad pitchers, as a percentage of their total non-ground-ball balls in play.
Even though IF and OF line drives individually do not correlate well from year to year, the percentage of all line drives a pitcher allows appears to be somewhat within his control as well. So good pitchers may also give up fewer line drives per BIP.
The last column, or the hit percentage correlations, is even more interesting. The only category that a pitcher appears to have any significant control over is his hits per outfield line drive. Essentially what this means is that the better pitchers allow outfield line drives that are easier to catch. In other words, a ground ball is a ground ball, a fly ball is a fly ball, and a line drive to the infield is a line drive to the infield, regardless of the pitcher. On the other hand, all outfield line drives are not created equal. As well, the small positive correlation in the hits per GB category may suggest that good pitchers allow ground balls that are slightly easier to field.
To summarize the implications of the above chart, even though the yty $H correlation of a 500 BIP pitcher is very small, a pitcher may have a fair amount of control over certain components of those BIP. The regression results suggest that good pitchers give up slightly fewer line drives and slightly more pop flies, as a percentage of their total BIP, and that their line drives hit to the outfield (and perhaps their ground balls) may be softer and therefore easier to catch. Further research using batted ball speed as one of the regression parameters may be useful.
Finally, to get an idea as to how home parks and defense (and perhaps the PBP scorers) affect the above regressions, here is the same chart of correlations, using only those pitchers who played on the same team in year x and year x+1.

Pitchers who did not switch teams (N=389):
As you can see, many of the correlations increase substantially, suggesting that defense, home park, or the PBP scorer plays a significant role in creating these correlations in the first place. Whereas the increased correlations in the last column are not surprising, considering that defense and home park can have a substantial effect on the hit percentages of the various BIP components, the correlations in the second column are surprising. There is no particular or logical reason why a pitcher's defense or his home park should play a substantial role in the percentage of his BIP that are line drives or pop flies, yet from the above chart, that appears to be the case. I can only hypothesize that the PBP scorers may be creating a bias in the data.
In conclusion, while it appears that Voros was essentially correct in that a pitcher has little control over his BABIP, he was not able to investigate this phenomenon on a more granular level, which requires an analysis of PBP data. Such an analysis suggests that pitchers may have more or less control over various components of their BIP than their overall $H yty correlation would imply. In fact, good pitchers probably tend to give up fewer and softer line drives and easier pop flies than do poorer pitchers. It also appears that defense and park factors, and perhaps even PBP scorers, can exert considerable influence on a pitcher's yty correlations for $H or for various of the BIP components. Finally, Voros failed to explain the considerable influence that sample size (the number of BIP, not the number of pitcher seasons in the sample) has on the yty correlations, regardless of the pitcher's skill and the spread of talent in the population.
Mitchel Lichtman
Posted: February 29, 2004 at 06:00 AM  43 comment(s)
Reader Comments and Retorts
1. Robinson Cano Plate Like Home Posted: February 29, 2004 at 04:10 AM (#614737)
Ted, I think this is a big next step over Tippett's work.
I may come back with questions after I study it a bit more.
You still have the problem of selective sampling and correlation across a narrow range of performance variation. Pitchers who pitch enough to get 300 BIP in successive seasons may very well be the ones who have the most control over the results from balls in play, and pitchers who have the least control over the results probably won't hang around enough to get to 300 BIP in successive seasons. (300 BIP is usually around 70-80 IP).
The safest conclusion to be drawn is not that pitchers have *no* effect over the results from BIP, but that among the set of major league pitchers the differences are small enough so that we can treat the pitcher's impact as constant.
-- MWE
Mike, I'm not sure what you mean. I think MGL is uncovering the correlations that do exist between pitchers. He's finding the real trends under the noise, despite the sample size issues.
MGL, the more I think about it, the more I'm disturbed by the correlations in your last table. Don't large correlations like that undermine the validity of the data?
Anyway, the conclusions in this article certainly make intuitive sense. As many pitchers have pointed out, pitching is all about disrupting the batter's timing. If you're successful in doing that, it would seem that you'd be more likely to get weakly hit balls, and fewer home runs as well.
In fact, good pitchers probably tend to give up fewer and softer line drives and easier pop flies than do poorer pitchers.
That sounds really familiar...pretty much my adjustment for pitching staff quality on ZR about 6 years ago  and one of the big reasons I don't subscribe to "Andruw Jones is a god."
It's some nice work, MGL.
Let's say I'm a pitcher with a true skill for $H of .333. Let's further say that that is a true skill and it is very repeatable. Why would it matter if other pitchers also had a true, repeatable skill for $H of .333? Wouldn't the correlation then be very strong for everyone?
For instance, did you see that UZR was quoted in the New York Times today in a Mike Cameron article? No reference to primer, alas.
http://www.nytimes.com/2004/03/01/sports/baseball/01METS.html
Ben, I'm not sure what you mean. The pitchers have better correlations on some components than overall, but they also have worse correlations on some components than overall. For the pitchers who switched teams, there are even two slightly negative correlations. In both sets, ground balls, the second most common type of BIP, have lower correlations than the overall correlations.
I assume the correlations balance out at around the overall BIP correlations. Right?
MGL, great work.
Bunyon,
Over a sample as short as a year, even if every pitcher has a true skill of .333, there's going to be some variation around that, and it will occur more or less randomly for everybody. So you might see results like this:
The overall correlation from year n to year n+1 in that example is .115. With a real sample and with more pitchers and years, you'd get a better result.
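The same point can be checked with a small simulation. The sketch below is an illustration only: it assumes every ball in play is an independent trial, and the pitcher counts, BIP counts, seeds, and talent levels are made up for the demonstration. All function names are hypothetical.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_babip(true_rate, n_bip, rng):
    """Observed hits per BIP for one pitcher-season with the given talent."""
    hits = sum(1 for _ in range(n_bip) if rng.random() < true_rate)
    return hits / n_bip


def simulate_yty_r(talents, n_bip, seed=0):
    """Correlation between two simulated seasons for a list of true talents."""
    rng = random.Random(seed)
    year1 = [simulate_babip(t, n_bip, rng) for t in talents]
    year2 = [simulate_babip(t, n_bip, rng) for t in talents]
    return pearson_r(year1, year2)


if __name__ == "__main__":
    # Every pitcher has the same true, repeatable .333 skill.
    print(simulate_yty_r([0.333] * 1000, 500))
    # Same setup, but with an (unrealistically) wide spread in true skill.
    print(simulate_yty_r([0.250, 0.400] * 500, 500))
```

With identical talent the simulated correlation should hover near zero, because the only thing that differs between pitchers is luck; with a wide spread in talent it should be large, even though the skill is equally repeatable in both cases.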
Pop flies are defined as "fly balls that don't travel 220 feet" in STATS scoring speak.
Duck snorts are problematic, as they will be interchangeably defined as pop flies and line drives. There generally aren't very many, as a percentage of any one pitcher's BIP.
Studes, do you think the article you cite *should* mention Primer?
Chris, I don't think the Times should reference primer. After all, MGL has basically made his work public, with no copyright protection that I've seen. But it would be nice, particularly for readers who might want to learn more.
*everything* written down is copyrighted. It's implied (or actually, it's law). Everything that you write is copyrighted by you nowadays.
The answer is, yes, the writer should reference where he saw it. He doesn't have to say "MGL", he can link or say "at Baseball Primer" or whatever; it helps his reader for starters.
Suppose that pitchers really do have an ability to control the percentage of hits on balls in play; IOW, there is a level of "true talent" $H. Assume for the moment that, in order to pitch at all in the majors, you have to have a "true talent" $H of no more than .340. Over 300 BIP (which as I indicated is usually between 70-80 IP), this would mean that you'd allow 102 hits in 70-80 IP, exclusive of HR: somewhere around 11-13 non-HR hits per nine innings, which is about as many as a team can likely tolerate. To pitch well enough to make it through two consecutive seasons of 300+ BIP, you'd have to do better than that, in most if not all cases; a .300 level would be 90 non-HR hits over those 70-80 innings, which your team might be able to live with. The best pitchers usually average somewhere around .270 $H.
Now, if you select pitchers with 300+ BIP in back-to-back seasons, what you are effectively doing is selecting pitchers from the range .270-.300, rather than the .270-.340 range that is typical of all major league pitchers, and whether you realize it or not, you're removing a large group of pitchers from the study who operate at the margins of major league performance. Those pitchers might very well exhibit different effects from the ones that you are studying, because they aren't as good as the ones that you are studying, and it's conceivable that the reason they aren't as good is that they don't control H/BIP as well, for reasons which won't show up in a study limited to just good pitchers.
I know MGL understands all of this, so this comment wasn't really directed at him. The point I wanted to make is that correlation analysis needs to be taken with a grain of salt when applied across a restricted range of performance, especially when attempting to measure an aspect of skill that could, if it existed, directly impact that performance.
-- MWE
But what about MGL's conclusions? Specifically:
To summarize the implications of the above chart, even though the yty $H correlation of a 500 BIP pitcher is very small, a pitcher may have a fair amount of control over certain components of those BIP.
Seems to me that MGL has uncovered some interesting findings even within his "restricted range of performance". Do you see it differently?
Chris, you were testing me, right? Thanks for the info. I just might send the guy an email.
The question is: how can we detect this, and how much of this can we detect.
A lot of this we just won't be able to detect.
It's apparent that it's a lot easier to detect this based on the rate of Line Drives given up.
Is the correlation effect on hits on OF LDs for good pitchers applicable across the range of *all* pitchers? A correlation in a subgroup is not necessarily indicative of a correlation across the entire group, especially if the distribution of the performance being measured is heteroskedastic; there may easily be a subgroup in which a significant correlation is detected that can't be detected in the population at large, because the variance of the statistic is conditioned on the performance level.
That doesn't mean that the detected correlation is not still valuable; as Chris Dial suggests, the ability to generate *catchable* line drives might be a discriminant between good pitchers and bad pitchers. But there's a lot more investigation to do across the entire range of pitching performance before one can be assured that's the correct conclusion to draw.
-- MWE
For example: say Greg Maddux allowed 100 ground balls into zone 5 last yr, with 30 going for hits and 70 being turned into outs. Suppose the average GB-out rate for this zone was 60%. Maddux would have been expected to give up 40 hits on ground balls to zone 5. Then from ZR, we can see that between Castilla and Furcal, they saved Maddux 8 hits on ground balls to zone 5. Subtracting out the effects of the defense, we might see that Maddux allowed 2 fewer hits than expected on ground balls to zone 5. Do this over a number of years, and maybe we can observe a trend. Maybe Maddux is able to prevent an average of 5 hits per year on ground balls to zone 5, independent of defense. If he can, then maybe we have quantified his skill at getting right handed batters to pull the ball weakly to the third baseman.
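The arithmetic in that example can be written out as a short sketch. The function name and parameter names are hypothetical, and the inputs are the made-up numbers from the comment, not real zone data.

```python
def hits_saved_by_pitcher(balls, hits_allowed, league_out_rate,
                          hits_saved_by_defense):
    """Hits prevented in one zone that are not explained by the fielders.

    balls                 -- ground balls hit into the zone
    hits_allowed          -- how many of them went for hits
    league_out_rate       -- average share of such balls turned into outs
    hits_saved_by_defense -- hits credited to the fielders (e.g. from ZR)
    """
    expected_hits = balls * (1.0 - league_out_rate)
    total_hits_saved = expected_hits - hits_allowed
    # Whatever the defense does not account for is left to the pitcher.
    return total_hits_saved - hits_saved_by_defense


if __name__ == "__main__":
    # 100 ground balls, 30 hits, 60% league out rate, defense saved 8 hits.
    print(round(hits_saved_by_pitcher(100, 30, 0.60, 8), 1))
```

Repeating this per zone and per season would give the multi-year series the comment proposes looking at for a trend.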
I'm just throwing this out there off the top of my head. Does this seem justifiable?
Thanks for the great article. Although I'm undoubtedly biased, it seems to support DRA:
"The number of IF pop flies and to some extent OF pop flies, as a percentage of all non-GB BIP, are somewhat a unique function of the pitcher as well. In other words, good pitchers may tend to get more pop flies than bad pitchers, as a percentage of their total non-ground-ball balls in play."
As you may recall, DRA allocates estimated infield popouts to fielders, and I speculated in the article that popups might be the key variable (when zone data is unavailable) for determining a pitcher's effect on BABIP.
If I'm understanding your article correctly:
1) Infield popups have the highest out-conversion rate of any category of BIP: 96%. As mentioned in the DRA article, if a pitcher generates a BIP that is *never* fielded (a HR), we charge the pitcher. If a pitcher generates a BIP that is almost *always* fielded (an infield popup), shouldn't the pitcher get credit?
2) With the exception of the overall tendency to give up either ground balls or outfield fly balls, the tendency to give up infield popups is more persistent than the tendency to give up other forms of BIP. So generating infield popups seems to be, to some degree at least, a skill. Query whether, given the relative rarity of infield popups (at least compared with the category of all ground balls and all outfield fly balls), using larger samples of data (two-year v. two-year?) would show a higher "r" for infield popup generation.
The most surprising result seems to be that the rate of giving up line drives is not that persistent (.14 "r", at least for pitchers who change teams, which I believe is the best data set to use), but *given* the allowance of a line drive, differing out-conversion rates *are* persistent. Perhaps AED is right:
"If there were systematic differences between what is classified as a line drive, you would expect to see significantly lower r's for the players who changed teams. So it seems more likely that you're getting hammered by sample size or selective sampling . . . ."
On the other hand, it might be interesting to determine whether there is a *cross*-correlation between generating infield popups and allowing line drives that are less likely to be hits. Throughout the development of DRA, I kept finding that the run-weight for infield fly outs was a little higher than it "should" be (compared to other BIP outs), but that was probably because infield fly outs correlate with *outfield* fly outs (which have higher out conversion than ground balls) and may correlate with "easier to field" line drives.
Thanks again for a very helpful study.
"As you may recall, DRA allocates estimated infield popouts to *pitchers*."
But I'm a complete novice when it comes to using these types of statistics, so sorry if this point is completely wrong.
No such thing as standard deviation or confidence intervals for correlations.
In thinking about this:
"Even if something is indeed a skill, if there is little or no spread in true talent with respect to that skill, in the population from which we are sampling (major leagues pitchers), then the yty correlation for any sample size will be close to zero."
This is a pretty important point. My first reaction was that you'd have to argue that the variance of this skill is very close to 0 for this to matter. Then, I realized that this is actually very close to the current thinking about major league pitching (that most have no influence on batting average on balls in play). So, if you believe this, then the expected yty correlation is zero. If you don't believe it, the expected correlation would be some value other than zero.
There's a few points in here I didn't understand. One is this:
"In fact, the size of the correlation coefficient is a direct function of the spread of talent in the population and the size of the samples in each data element. If there is any skill whatsoever related to the performance measure we are sampling and there is any spread of true talent in the population with regard to that skill, then as the sample size gets larger, the correlation always approaches 1. The converse of that is also true. Regardless of how much skill and what the spread of talent is, as the sample size gets smaller, the correlation always approaches zero."
Why should the correlations be expected to approach 0 or 1.00 as the sample sizes get larger? Shouldn't they approach the "true" population value? This statement seems to assume that there are either perfectly linear or oblique relationships and nothing else.
But correlation and (simple) regression are the same thing. In simple linear regression, the beta weight for standardized scores is also the correlation.
See the home page for a site that calculates confidence intervals.
"Even if the pitcher has a specific skill at generating various types of near certain outs that doesn't mean we necessarily would credit the pitcher. Imagine if we had even more granular data to work with that included not just the zone, but the precise speed of all BIP. Imagine further that Carlos Zambrano has a skill at inducing slow hit groundballs to the left side of the infield. His sinker is fast enough so that those pitches usually aren't pulled directly down the line. So if 99% of balls hit in the SS's zone between 55-80 MPH are converted into outs should the pitcher get credit?"
That is precisely what would happen under a UZR system that also provided a P[itcher] Zone Rating. David Pinto's Probabilistic Model for Range does something similar.
"If inducing pop flies of either variety is a skill, we should take that ability into account when evaluating pitchers, much the same way we take GB/FB ratio into account. That doesn't mean pitchers are contributing to pop flies being converted into outs. The H% r itself seems to suggest that it's the infielders doing the work. Just because OF fly balls are almost always converted into outs (86.5%) doesn't mean that outfielders shouldn't get credit for their defensive contribution."
Simplifying somewhat, under a PZR/UZR system, the pitcher would get almost all of the credit for the out; the fielder only a tiny amount. DRA is a non-zone system that allocates responsibility for BIP to fielders, except that it allocates responsibility for infield fly outs to pitchers. Doing so results in an allocation of pitcher and fielder credit for BIP that comes close to matching the ratio found under the Allen/Hsu "Solving DIPS" paper, as well as fielder ratings that match up well with UZR.
So I think taking park and defense out of the equation doesn't give you anything new. It just gives you wider confidence intervals.
Not to dwell on this point, but I was under the impression that the standard deviation reported was for the pitcher's $H, not for the correlation coefficient. If this is true, then this SD can't be used to create confidence intervals for the correlation coefficient, can it?
The sampling distribution of r is not normal, so you could not use the standard deviation to build a confidence interval.
To build the confidence intervals, you convert r to Z' and add/subtract a weight based on alpha times a multiplier (normally a standard error). I THINK it must be the case that Z' has a distribution that is determined by the sample size, hence the need to only know N to calculate the confidence intervals. But, I could be wrong.
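For reference, the procedure described above is the standard Fisher transformation: Z' = atanh(r) is approximately normal with standard error 1/sqrt(N - 3), which is why N is the only extra input needed. A minimal sketch follows; the function name is made up for illustration, and the N used in the example call is hypothetical, since the number of pitchers in the original regression is not given here.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson correlation.

    r      : sample correlation
    n      : number of data pairs (must be > 3)
    z_crit : normal critical value (1.96 for a 95% interval)
    """
    z = math.atanh(r)                 # r -> Z'
    se = 1.0 / math.sqrt(n - 3)       # standard error of Z'
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)   # back-transform to the r scale


# Hypothetical example: the .153 figure with an assumed N of 100 pairs.
print(fisher_ci(0.153, 100))
```

Because the interval is built on the Z' scale and then transformed back, it is not symmetric around r, which is the practical consequence of the sampling distribution of r not being normal.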
Can anyone explain why you wouldn't use overlapping data pairs (i.e. 1992 on 1993, 1993 on 1994, etc.)? I think this is what Tippett did. You would nearly double your sample size, and perhaps pick up on some weird every-other-year patterns (e.g. Saberhagen) that you would otherwise miss. Would this introduce some other selection bias?
I would be happy to send you the data when I get back home. Anything I can do to help a fellow researcher...
Cell C3 indicates that there is a good yty correlation for these pitchers in the rate at which outfield line drives become hits; it says very little about reasons why that might be true. It can be true, for example, if the rate at which outfield line drives are caught for outs varies little from team to team or pitcher to pitcher (IOW, if a high percentage of outfield line drives are inherently uncatchable). We need to keep open the possibility that there may be effects other than that of the pitcher which could lead to the correlations that we see. That's why range of performance (or, alternately, performance variance, which essentially tells you the same thing) across the data set is so important to know. If the entire range of performance within the data set varies from, say, 26% of OF LD caught for outs to 28% of OF LD caught for outs, I'd have no problem concluding that pitchers have little influence over whether or not OF LD became outs, even if there were a strong yty correlation within that group of selected pitchers.
-- MWE
If we only look at pitchers who play for the same team in the regression "pairs," even if pitchers had no control over any of their BIP components, clearly we might (probably will) see some yty correlations, due to good or bad defense, park effects, etc. Those are the correlations I am trying to "factor out."
Now, if we only look at pitchers who changed teams, we have no "systematic bias" because the assumption is that for a large sample, the average park effects and defense of the new team are league average. Therefore if pitchers had no control over any of the BIP components, the correlations for pitchers who changed teams would have to be zero, as there would be nothing "connecting" their BIP rates from one year to the next, which is essentially the definition of correlation (a "connection," loosely speaking).
If pitchers do have some control over some of their BIP components, then that control, as measured by the correlation coefficient, should survive the "changing of teams," since the variation in yty rates as a result of the new defense and park is random (no bias).
As far as how introducing another variable (even if it isn't biased) affects the magnitude of the correlation coefficients, I don't know. I would have to defer to the statisticians for that answer. I could simulate the effect and come up with an answer, I suppose. IOW, if the true yty correlation for, say, OF line drive hit percentage were .200, and we then introduced some noise (changing teams and therefore changing defense and parks, which presumably affects the true OF line drive hit percentage independent of the pitcher), would the correlation change? I don't think so, but I am not sure.
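A sim along those lines might look like the sketch below. It is not the author's code, and everything in it is an assumption made for illustration: observed rate = pitcher talent + a team (park/defense) effect + sampling noise, all normally distributed, with the team effect drawn independently each year to stand in for a pitcher who changed teams. The standard deviations are hypothetical and were picked so that the no-team-effect case has an expected correlation of .200.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_yty_r(n_pitchers, talent_sd, noise_sd, team_sd, seed=0):
    """Year-to-year r when every pitcher draws a fresh, unbiased team
    effect each season (assumed additive normal model)."""
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n_pitchers):
        talent = rng.gauss(0.0, talent_sd)  # persists across both years
        year1.append(talent + rng.gauss(0.0, team_sd) + rng.gauss(0.0, noise_sd))
        year2.append(talent + rng.gauss(0.0, team_sd) + rng.gauss(0.0, noise_sd))
    return pearson(year1, year2)


if __name__ == "__main__":
    # Hypothetical SDs: talent .010, sampling noise .020, team effect .010.
    print("no team effect:  ", simulate_yty_r(50000, 0.010, 0.020, 0.0))
    print("with team effect:", simulate_yty_r(50000, 0.010, 0.020, 0.010))
    print("no talent at all:", simulate_yty_r(50000, 0.0, 0.020, 0.010))
```

Under this additive model the expected correlation is var(talent) / (var(talent) + var(team) + var(noise)), so unbiased team noise should pull a real correlation down somewhat, but it has nothing to work with when the talent variance is zero. Whether real park and defense effects behave like independent additive noise is a separate question that the sketch does not address.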
In any case, changing of teams cannot create a correlation where none existed or even increase a correlation where one did exist, since where any given pitcher goes is presumably random, at least as far as his first year BIP rates. If there is no "connection" between his BIP rates in his first year, and where he goes, then there is by definition no bias and there can be no increase in the correlations.
As far as the ROE's on ground balls, there is probably not an extra bias associated with ROE's, so adding it into the mix shouldn't change anything, although since many more ROE's are on GB's to the left side of the infield, we might see some "extra" correlation (higher coefficient) based on a pitcher's handedness. Since we aren't really interested in that kind of "control," it is probably a good thing NOT to include ROE's in the ground balls...
With all respect, I think that you are completely wrong, but not being a statistician, I would have to defer to someone like AED if he is still lurking.
If I get a chance, I'll do some "correlation sims" which should clear up the issue...
These data pairs are not independent, since it's still the same person pitching in 1992 and 1994. Let's say I'm taking your blood pressure once a year; you would not say that your blood pressure in 1994 is independent of 1992. There are ways to adjust for this data structure, where you have repeated measurements on the same person, in linear regression. I'd have to pull out my school notes to give you the details, but basically you come up with some assumed correlation matrix that describes where correlations are in the data, then the regression procedure uses that information in its calculations. Probably only affects the standard errors of your calculations, though.
Come on, that's a silly question.
Could you point out what exactly was wrong in my arguments (other than a couple of typos)? I'll give one more analogy, and I would appreciate it if you would think about it instead of blindly saying it's wrong.
Honestly, I haven't had the time to go through your premise. On the other hand, I did take lots of time to think through my thesis when I wrote the article (several months ago).
As I said, I think you are wrong, but I'm not sure. I did say that I think you are wrong, not that I was sure you are wrong, didn't I? There is a big difference (from my perspective).
I am no statistician either but I do know a fair amount about regressions and correlations, and have done a fair amount of work in baseball analysis using those "tools."
However, I'm not sure I am capable of addressing your specific concerns, so I apologize for just saying that "I think you are wrong," as that is not particularly helpful. As I said, I can probably write a simple sim or two that would tell us how changing parks (park effects and defense) affects a correlation, whether or not there is one in the first place.
I'll try and do that tonight....