Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Friday, February 17, 2006

The Hardball Times: Gassko: Another Look at Batted Balls and DIPS

Retooling DIPS 3.0…with David Gassko.

So here’s the question that every reader is probably asking right now: Why? Why is this important? Why do we want to know DIPS 3.0? Why is this any better than regular old DIPS or FIP?

My answer: Because it includes more data. Generally speaking, the more data you have the better. That especially applies here, when there is so much variation in official pitching statistics. For example, we know that a pitcher has some control over his BABIP. We even know how, for the most part, he can control it. But there was no statistic that reflected that knowledge prior to DIPS 3.0.

This system allows us to understand and evaluate a pitcher’s performance in more granular ways than ever before. It tells us when something fluky has happened, and gives us a better idea of what to expect the rest of the way or in the future. DIPS 3.0 gives us a better picture and understanding of pitcher performance, and that is why it’s important. DIPS 3.0 is the next logical step in defense-independent pitching analysis because it incorporates batted ball data and largely corrects for the noise contained within that data. Voros himself acknowledged the importance of developing such a system three years ago.

Thanks to Guy M

Repoz Posted: February 17, 2006 at 06:54 PM | 98 comment(s)
  Tags: sabermetrics

Reader Comments and Retorts

   1. Joel W Posted: February 17, 2006 at 07:21 PM (#1866144)

I think it’s an important metric to have, I’m just not so sure I think it’s valid on a number of levels.  For one, I think you have a large issue with underestimating the fog.  Actually, Eric Van put it very nicely over on Sons of Sam Horn yesterday. 

“There are still people who think that if BABIP (or any other stat)
correlates at, say, .20 a year, then the individual variations in true
BABIP must be very small, and/or can be ignored as statistically
significant but not “baseball significant.” Which is completely and
utterly wrong. As a simple thought experiment will tell you: imagine
that the true range of BABIP is from .250 to .350. Now, construct two
defensive teams, one composed of Little League fielders and the other
composed of the ghosts of the greatest defensive players in MLB
history. Play two seasons with the pitchers randomly re-assigned
between teams. How large a BABIP correlation do you expect to see?
Almost none, because the variation in defense swamps the variation in
pitching.

“A signal of any strength can be drowned out by sufficient noise. This
is essentially BJ’s “Underestimating the Fog” insight, which I’m proud
to say I was arguing for 5 years ago.”

Typical EV at the end, but good nonetheless.
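
The thought experiment quoted above can be run as a small simulation. This is only a sketch under stated assumptions: true BABIP is drawn uniformly from .250 to .350 as in the quote, but the 500 balls in play per season and the size of the defensive swing (`defense_shift`) are made-up parameters, and `simulate_yty` is a hypothetical helper name.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out by hand."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_yty(n_pitchers=1000, bip=500, defense_shift=0.0, seed=1):
    """Year-to-year correlation of observed BABIP for simulated pitchers.

    Each pitcher has a fixed true BABIP. In each of two seasons he is
    randomly assigned a great or a terrible defense, which moves his hit
    probability down or up by defense_shift (an assumed value).
    """
    rng = random.Random(seed)
    year1, year2 = [], []
    for _pitcher in range(n_pitchers):
        true_babip = rng.uniform(0.250, 0.350)
        observed = []
        for _season in range(2):
            shift = rng.choice([-defense_shift, defense_shift])
            p_hit = true_babip + shift
            hits = sum(rng.random() < p_hit for _ in range(bip))
            observed.append(hits / bip)
        year1.append(observed[0])
        year2.append(observed[1])
    return pearson(year1, year2)


if __name__ == "__main__":
    print("same defense for everyone:", round(simulate_yty(defense_shift=0.0), 3))
    print("Little League vs. ghosts: ", round(simulate_yty(defense_shift=0.15), 3))
```

The spread in true talent is identical in both runs; only the amount of noise layered on top changes, which is the point of the quote.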

I simply think that there are so many things that can affect a pitcher’s line drive rate, and that change from year to year, that there is too much noise, but there is a discernible skill.  For example, if you download last year’s chart, you will see that Curt Schilling was the fifth best pitcher in the league for pitchers with over 400 BF!  That’s just ridiculous, and to chalk that up to luck would be insane.  Pitchers who get injured but still pitch will likely just give up a lot more line drives, or will be hit harder when they do.  Clearly, if Schilling is healthy this year, he will have a lower LD rate, etc., and so there will be no y-t-y correlation, but the effect was clearly real last year.  If Schilling isn’t healthy, and is getting hit like that this year, then he won’t pitch much, and we won’t find his data meaningful because of small sample size.

This may make DIPS 3.0 helpful as a predictive tool, but certainly looking back I don’t think we can call it useful as an analytic tool.  It explains too much. 

Maybe figuring out line drive rates will require tools that aren’t captured in our traditional statistics.  Something that DSG worked on before, with his stuff rating, might tell us about future statistics.  It might be movement or bite, or velocity, but this does not mean that what we have isn’t real in some sense just because the Y-T-Y correlations are off.  So much happens with pitchers, their pitches lose bite or they lose velocity, or like Schilling, everything is just flat and hittable.

Does this mean that I think we should use LD rates right now as a predictive tool? Of course not, but the fact that the y-t-y isn’t there by no means suggests to me that it’s not a skill.

   2. Matt Clement of Alexandria Posted: February 17, 2006 at 07:27 PM (#1866156)

I’m still troubled by a statistic that doesn’t really explain whether it looks forward or backward.  If it looks forward, why has the regression only been done on certain numbers and not others?  If it looks backward, why have they made the leap from “lack of y-t-y correlation” to “didn’t happen”?  I can accept DIPS as a relatively rough projection/regression stat, but the terms “skill” and “control” suggest something backward-looking as well.

What I’m more interested in, though, is the threefold finding which underlies this study:

(1) Pitchers have great control over whether a ball becomes an outfield fly ball, infield fly ball, or ground ball, but show no consistency in the number of line drives they allow from one year to another; (2) Pitchers have pretty much no impact on whether a batted ball becomes a hit, and if it does, what kind of a hit it becomes; and (3) This includes home runs, which seem to be solely a function of the number of outfield flies and line drives a pitcher allows.

The control over BABIP as cited in Repoz’ introduction, then, refers only to a pitcher’s GB/FB ratio, and the expected BABIP of groundballs and flyballs.  It’s got nothing at all to do with “hittability” or anything like that.

I haven’t read THT, and I can’t speak to how they arrived at these conclusions.  I’d be very interested to see how they dealt with the seemingly contradictory conclusions of Tango and Erik Allen, as well as the work on knuckleballers and closers and other weirdos, and how they accounted for the critiques of Mike Emeigh that the method of data selection and analysis has unfairly excluded those pitchers who do lack a certain level of BABIP skill.  The net effect of those studies and arguments has suggested to me that the set of pitchers to whom a DIPS statistic is usefully forward-looking is much smaller than the rhetoric implies, and made me very skeptical of the claims that a DIPS statistic is backward-looking.

   3. Matt Clement of Alexandria Posted: February 17, 2006 at 07:40 PM (#1866182)

THT I read, as posting in this thread makes rather obvious.  I haven’t read the Hardball Times Annual, rather.

   4. GuyM Posted: February 17, 2006 at 07:51 PM (#1866196)

DSG (assuming you’re out there):  Nice work.  Making the spreadsheets available is a great service.  One request, if not a lot of extra work for you: could you include actual RA/G as a final column?  We can all find that elsewhere, of course, but having it side-by-side would be very helpful for putting your results into context.

   5. Kyle S Posted: February 17, 2006 at 07:52 PM (#1866199)

Joel, I agree with a lot of what you say. I understand what David is doing and why he does it; but I think he is straying further and further from the “goal” of DIPS - i.e. pitching statistics that only measure what the pitcher did, rather than a combination of what the pitcher did and what his defense did behind him.

In the instance of Washburn, his metric may do a decent job of that simply because nLD is very close to actual LD. But for a player that gives up way more LD than the model predicts, his model will treat the “extra” LDs as if they were actually GB/FB/IF. Since defenses usually convert more of those to outs, this player “picks up” runs in his DIPS 3.0 ERA. If “DIPS” still meant “defense independent”, we would ascribe the entire difference in DIPS RA and actual RA to the player’s defense, but that’s obviously not true any more, as some portion of a player’s LD allowed will be treated as having a 90% chance of becoming an out (OF or IF) when actually the true percentage was only 50%.

On the other hand, these statistics are probably much better than ERA at predicting ERA next year; that’s good. If that’s your goal, David, then bravo! But in that case, I suggest a new name might be in order.

   6. Los Angeles Waterloo of Black Hawk Posted: February 17, 2006 at 07:55 PM (#1866203)

There might be an easier way of doing this ... I took the spreadsheets from the Annual that have all the %‘s for BIP for pitchers ... the book also has the linear weight run values for each kind of BIP, so you can easily figure out for pitchers what they would have been “expected” to give up by multiplying everything together (compared to the baseline of an average pitcher).

There were roughly 4.59 RPG allowed in the majors last year, so these DIPS 3.0 figures can easily be converted into a Runs Saved figure.  So I did that for all the pitchers on the DIPS 3.0 worksheet.

Then I took the BIP spreadsheet and gave every pitcher a 15% LD rate, and re-figured the Runs Saved from those.

The correlation between the two lists was .9955.
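
The procedure described in this post can be sketched roughly as follows. The run values in `RUN_VALUES` are hypothetical placeholders, not the linear weights printed in the Annual, and rescaling the non-LD types proportionally when the LD rate is forced to 15% is an assumption, since the post does not say how the remaining balls in play were redistributed.

```python
# Hypothetical run value per ball in play of each type.
# Placeholders only: NOT the linear weights from the THT Annual.
RUN_VALUES = {"LD": 0.35, "GB": 0.04, "OF": 0.19, "IF": -0.09}


def expected_runs(bip_counts, run_values=RUN_VALUES):
    """Multiply each batted-ball count by its run value and add them up."""
    return sum(run_values[kind] * count for kind, count in bip_counts.items())


def with_fixed_ld_rate(bip_counts, ld_rate=0.15):
    """Force the LD share to ld_rate, scaling the other types proportionally.

    The proportional rescaling is an assumption made for this sketch.
    """
    total = sum(bip_counts.values())
    new_ld = ld_rate * total
    scale = (total - new_ld) / (total - bip_counts["LD"])
    adjusted = {k: c * scale for k, c in bip_counts.items() if k != "LD"}
    adjusted["LD"] = new_ld
    return adjusted


def runs_saved(bip_counts, league_counts, run_values=RUN_VALUES):
    """Runs saved versus a league-average mix over the same number of BIP."""
    total = sum(bip_counts.values())
    league_total = sum(league_counts.values())
    league_runs = expected_runs(league_counts, run_values) * total / league_total
    return league_runs - expected_runs(bip_counts, run_values)


if __name__ == "__main__":
    league = {"LD": 150, "GB": 440, "OF": 320, "IF": 90}   # made-up league mix
    pitcher = {"LD": 120, "GB": 250, "OF": 180, "IF": 50}  # made-up pitcher
    print(runs_saved(pitcher, league))
    print(runs_saved(with_fixed_ld_rate(pitcher), league))
```

Running both versions over every pitcher and correlating the two lists is the comparison the post reports.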

   7. studes Posted: February 17, 2006 at 08:12 PM (#1866229)

I would also think the correlation between DIPS 3.0 and xFIP would be very high—.95, at any rate.  Not a knock on David’s work, just a comment on how other stats relate to it.

xFIP essentially combines pitcher-specific K and BB rates, with HR set at a league-wide rate per outfield fly and BABIP equal to the league rate.  Can be found on the THT site.
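
A minimal sketch of that idea, assuming the commonly published FIP form with weights of 13, 3 and -2 and a constant near 3.20; neither the weights nor the constant is stated in this thread, and HBP is left out for brevity. The .11 league HR/OF default is the average figure quoted in comment 11.

```python
def fip(hr, bb, k, ip, constant=3.20):
    """Fielding-independent pitching: only HR, BB and K enter the formula."""
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


def xfip(of_flies, bb, k, ip, league_hr_per_of=0.11, constant=3.20):
    """Same as fip(), but HR is replaced by outfield flies times the league rate."""
    expected_hr = of_flies * league_hr_per_of
    return fip(expected_hr, bb, k, ip, constant)


if __name__ == "__main__":
    # Made-up pitcher: 30 HR allowed on 200 outfield flies, above the league rate.
    print(round(fip(30, 50, 150, 200.0), 2))
    print(round(xfip(200, 50, 150, 200.0), 2))
```

A pitcher whose HR/OF sits above the league rate gets a lower xFIP than FIP, and the reverse for a pitcher below it.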

   8. Kyle S Posted: February 17, 2006 at 08:13 PM (#1866231)

Broad question: why not use actual LD allowed rate, coupled with linear weight run values for the BIP types, re-figure IP, and calculate RA from there? Is the goal simply to improve y-2-y correlation with RA?

   9. Joel W Posted: February 17, 2006 at 08:25 PM (#1866244)

We definitely do get into some more philosophical questions on the goal of DIPS:

1) Do we want to see what a pitcher did with the stat or what they will do?

2) Suppose a groundball pitcher is in front of a defense that gives up a lot of hits on ground balls (cough, Yankees), but an outfield defense that does not (Seattle); do we say that these are defense independent pitching statistics, and grant the pitcher the average out rate on those types of balls, or ought we ask them to adjust to their defense when evaluating their past performance (but not their future)?

On the whole, if we really wanted to see what control a pitcher had over LD rates we’d need to know:

1) What the average LD rate was for the hitters the pitcher faced; this could matter a lot.

2) Park factors for types of BIP.

3) Correlation with injury (I think this is big, and drowns out a lot of the effect).

I will certainly use this type of metric as a piece of my prognostications for next year (I expect Schilling to bounce back some), but I will not use it to absolve pitchers of certain results like I am willing to do, to some extent, with the original version of DIPS.

   10. Damon Rutherford Posted: February 17, 2006 at 08:43 PM (#1866269)

First, regarding “By this formula, Washburn was expected to throw 170.4 innings last year, whereas he actually piled up 177.1”: I don’t like changing his IP (and thus outs), since I think IP should be kept as actual at individual, team, and league level.  This simply means adjusting everything else, and the results should be the same, just with his actual IP.

Second, I didn’t read any mention about adjusting for park.  I assume I’ll have to do that on my own when comparing two pitchers.

Third, instead of calculating how many line drives are expected to become hits by using the league average, perhaps use the team average instead?  Perhaps this would more accurately separate the team fielding from the pitching? 

Note that I’m quickly brainstorming here, and don’t fully maintain my position on any of these comments, as I have yet to actually crunch any numbers.

   11. GuyM Posted: February 17, 2006 at 08:47 PM (#1866276)

I think he is straying further and further from the “goal” of DIPS - i.e. pitching statistics that only measure what the pitcher did,

That’s a valid point.  This is really more like “LIPS”—a luck-independent pitching statistic.  It approximates what a pitcher would do if we could re-run the season 1,000 times, with the pitcher always performing the same but a random sampling of fielders and opposing hitters.  Of course, it doesn’t entirely achieve that, since there’s also luck (and park impact) in one year’s K and BB rates, as well as the GB/FB ratio.  But he has certainly purged pitcher’s stats of quite a bit of the “noise,” as he claims. 

My question is: has he gone too far, such that he’s stripped out too much talent along with the noise?  This isn’t an easy question to answer.  We know that assuming all of BABIP reflects a pitcher’s talent is wrong, and assuming none of it reflects talent is also wrong.  Where do we draw the line?

DSG has made defensible decisions, based on his and JCB’s y-t-y correlations.  Personally, I’d do 3 things differently:

1) I would break out HRs and credit pitchers with those. The y-t-y correlation for HR/OF is low, but given the OF sample sizes (150 to 200 per year), that doesn’t mean much.  The difference between .08 and .14 HR/OF is about 12 HRs or 20 runs in a season—huge.  But that difference in talent would be swamped by noise looking at y-t-y correlation.  DIPS 3.0 assumes that Clemens has been lucky to be .09 the last 4 years (vs. .11 average)—and lucky to be below average in HR/9 over his whole career—but I don’t buy it.  I’d say the burden is on those who want to argue this isn’t a skill, and that hasn’t been proven yet.

2) Use actual LD%.  True, high or low LD% is mostly luck, but it’s not impacted by fielders and it DID happen.

3) Leave relievers—or at least closers—out of the analysis.  We know many of them have better than average hit rates on BIP.

Finally, the H% on GBs does correlate from y-t-y.  That may just reflect quality of IF defense.  But I think it’s an open question whether some pitchers allow easier/harder to field GBs.
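
The arithmetic behind point 1 can be checked directly. The rates and the 150 to 200 outfield-fly range come from the post; the run value per home run is not stated there, so the sketch only backs out what the quoted 20 runs would imply.

```python
def hr_gap(of_flies, low_rate=0.08, high_rate=0.14):
    """Extra home runs allowed at the high HR/OF rate versus the low one."""
    return (high_rate - low_rate) * of_flies


if __name__ == "__main__":
    for flies in (150, 200):
        print(flies, "OF flies ->", round(hr_gap(flies), 1), "HR difference")
    # Run value per HR implied by "about 12 HRs or 20 runs" in the post.
    print("implied runs per HR:", round(20 / hr_gap(200), 2))
```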

   12. pkb33 Posted: February 17, 2006 at 08:48 PM (#1866278)

(3) This includes home runs, which seem to be solely a function of the number of outfield flies and line drives a pitcher allows.

I had understood in a previous discussion (and it may have been in relation to DSG’s prior DIPS incarnation) that the above was viewed with skepticism by many.

IIRC the HBT annual has a study on y-t-y correlations on HR% and it supported the above statement, at least insofar as there wasn’t a clearly identifiable correlation between % of LD and FB that became HR for pitchers across years.

So, what’s the consensus at this point on the above conclusion: is it a “fog” issue (i.e. there’s likely something wrong with the assumption but it’s beneath the detectable range in a large study) or is it, in fact, a reasonably well established statistical reality at this point?

   13. RobertMachemer Posted: February 17, 2006 at 08:55 PM (#1866292)

I’m not sure I understand the complaint about Schilling vis-a-vis DIPS 3.0 in post number 1.  DIPS 2.0 liked him even more, at least dERA-wise, although it’s possible that 3.0 likes him more relative to his peers…

Schilling
2.0 dERA: 3.70
3.0 dERA: 4.03

Or are you complaining about DIPS in general?  (Which you may be—it’s unclear to me).

   14. studes Posted: February 17, 2006 at 08:56 PM (#1866299)

IMO, HR/OF regresses to the mean heavily year-to-year, but not completely (THT analysis would indicate you regress one-year stats 90% to the mean).  If you have enough data from prior seasons, you might regress a lot less.  I figured that out at one point, but I can’t find my data. Will keep looking.

BTW, there is a great graph showing this in John Burnson’s “Graphical Pitcher”

   15. Joel W Posted: February 17, 2006 at 09:06 PM (#1866320)

Robert,

I suppose it’s a DIPS 2.0 argument to some extent, but not exactly.  With 2.0, I know what it’s doing, and I know I can say, “you know what, Schilling gave up a lot of line drives, so I think that regressing it that much is ridiculous.”  3.0 is saying “Schilling gave up a lot of line drives, but that’s just luck.” 

That just seems ridiculous to me.  It wasn’t luck, it was obvious that he was getting shelled and giving up line drives.  Maybe you could even normalize his line drive numbers, but not his run values per line drive.

   16. Kyle S Posted: February 17, 2006 at 09:10 PM (#1866325)

i thought DIPS 2.0 says “schilling has a high babip which should be regressed somewhat”. I didn’t think whoever did it (tango right?) had BIP data at the time, or at least he wasn’t using it in dips 2.0. is that wrong?

   17. studes Posted: February 17, 2006 at 09:22 PM (#1866339)

Interesting.  Three-year HR/OF trends aren’t quite as persistent as I thought they would be.  The correlation rises from .10 (one-year) to .165 (three years vs. the fourth year).  I park-adjusted the HR rates to get this.

So, even with three years’ data, you’d regress over 80% to the mean.
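
The step from a correlation to a regression amount can be written out as a one-line shrinkage estimate. This is a sketch that assumes, as the post implicitly does, that the correlation itself is used as the weight on the observed rate; the .09 observed rate and .11 league mean are the Clemens and league figures quoted in comment 11.

```python
def regress_to_mean(observed, league_mean, r):
    """Keep a share r of the observed deviation; regress the rest to the mean."""
    return league_mean + r * (observed - league_mean)


if __name__ == "__main__":
    for label, r in (("one year", 0.10), ("three years", 0.165)):
        estimate = regress_to_mean(0.09, 0.11, r)
        print(f"{label}: regress {1 - r:.1%}, estimated HR/OF {estimate:.4f}")
```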

Personally, I do think there is “fog” in this sort of analysis, but I’m not a good enough statistician to figure out what the next step would be.

   18. GuyM Posted: February 17, 2006 at 09:22 PM (#1866340)

IMO, HR/OF regresses to the mean heavily year-to-year, but not completely (THT analysis would indicate you regress one-year stats 90% to the mean).

Assuming a sample size of about 150 OFs, then your 95% confidence interval will be +/-.05 around a mean of .11.  So yes, unless the true talent differences are enormous the y-t-y correlation will be low. But that doesn’t tell us whether there are important differences in ability.
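
That interval can be reproduced with the usual normal approximation to the binomial. Using that approximation is an assumption of this sketch; the post does not say how the figure was derived.

```python
import math


def binomial_half_width(p, n, z=1.96):
    """Half-width of a normal-approximation confidence interval for a rate."""
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    half = binomial_half_width(0.11, 150)
    print(f"HR/OF of .110 on 150 flies: +/- {half:.3f}")
```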

This also takes us to the larger question of what the stat is for.  If you want DIPS to tell you “what did the pitcher do this year, separate from his fielders?,” then don’t regress (my vote).  If DIPS 3.0 is a projection for next year, you should regress.  But under what circumstances (and this is my problem with PrOPS too) do you want to make a projection for next season based only on the previous year’s stats?  It’s a statistic that answers a question no one would ask…..

   19. DSG Posted: February 17, 2006 at 09:34 PM (#1866353)


I simply think that there are so many things that can affect a pitcher’s line drive rate, and that change from year to year, that there is too much noise, but there is a discernible skill. For example, if you download last year’s chart, you will see that Curt Schilling was the fifth best pitcher in the league for pitchers with over 400 BF! That’s just ridiculous, and to chalk that up to luck would be insane. Pitchers who get injured but still pitch will likely just give up a lot more line drives, or will be hit harder when they do. Clearly, if Schilling is healthy this year, he will have a lower LD rate, etc., and so there will be no y-t-y correlation, but the effect was clearly real last year. If Schilling isn’t healthy, and is getting hit like that this year, then he won’t pitch much, and we won’t find his data meaningful because of small sample size.

This may make DIPS 3.0 helpful as a predictive tool, but certainly looking back I don’t think we can call it useful as an analytic tool. It explains too much.

I figured this would be the main criticism. For the record, Schilling’s DIPS 2.0 ERA is 3.70, in line with the DIPS 3.0 figure. My belief, and I think I made this clear in my article, though I’m happy to discuss it further, is that the point of DIPS is to weed out those components pitchers have little control over, if you define control as the ability to perform stably in a category from year to year. In that case, LDs have to be converted to league average. If you include actual LD-rate, your results will end up almost perfectly replicating component RA not DIPS. My goal here is to refine the basic DIPS methodology with more granular data: The results should be generally pretty close to DIPS.

If it looks backward, why have they made the leap from “lack of y-t-y correlation” to “didn’t happen”?

That’s what every DIPS version has done. We know pitchers have *some* control over these components, it’s just that for a simple stat like DIPS, it’s easier to disregard them totally because they correlate so poorly y-t-y. It’s not that pitchers have no control; it’s that whatever control they do have is so drowned out with noise that it’s simpler to disregard the components completely.

One request, if not a lot of extra work for you: could you include actual RA/G as a final column? We can all find that elsewhere, of course, but having it side-by-side would be very helpful for putting your results into context.

I did originally, then I had a problem with pitchers who played for two different teams in the same season. I’ll add that, and some other information and post the new spreadsheets soon enough, probably.

If “DIPS” still meant “defense independent”, we would ascribe the entire difference in DIPS RA and actual RA to the player’s defense, but that’s obviously not true any more, as some portion of a player’s LD allowed will be treated as having a 90% chance of becoming an out (OF or IF) when actually the true percentage was only 50%.

DIPS, even the original DIPS, isn’t truly defense independent. PZR (the flip-side of UZR) would be defense independent. DIPS is “things pitchers don’t really have much control (using my definition of control) over” independent. In that sense, I think I’m sticking to the spirit of DIPS.

I would also think the correlation between DIPS 3.0 and xFIP would be very high—.95, at any rate. Not a knock on David’s work, just a comment on how other stats relate to it.

Forgot to mention that in my article! Studes is right: They correlate very highly. I do think that the *extra* information in DIPS is very important, though (mostly infield flies).

Broad question: why not use actual LD allowed rate, coupled with linear weight run values for the BIP types, re-figure IP, and calculate RA from there? Is the goal simply to improve y-2-y correlation with RA?

It’s not. I’ve explained above and in the article: IMO, the goal of DIPS is to remove components pitchers don’t really control. LD are one such component.

I will certainly use this type of metric as a piece of my prognostications for next year (I expect Schilling to bounce back some), but I will not use it to absolve pitchers of certain results like I am willing to do, to some extent, with the original version of DIPS.

But original DIPS is even more radical! Let’s say I had worked off the original DIPS formula, but had added adjustments for GB/FB ratio and for IF flies and maybe for HR/OF…then would you be saying the same thing? Original DIPS not only inserts league average LD-rate but league average OFfly-rate, league average GB-rate, league-average IFfly-rate, etc.!

I don’t like changing his IP (and thus outs), since I think IP should be kept as actual at individual, team, and league level. This simply means adjusting everything else, and the results should be the same, just with his actual IP.

Voros changes IP in DIPS. Tangotiger has told me that I *need* to do so. I agree. Outs on balls-in-play are just as much a function of BABIP as hits on balls-in-play are. If you adjust in one place, you must in the other as well.
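
Why the innings have to move can be seen in a deliberately simplified sketch: if hits on balls in play are replaced with an expected figure, the outs on those same balls in play change too, and innings are just outs divided by three. The single `expected_babip` parameter stands in for whatever per-type out rates the real system applies, and `expected_ip` is a hypothetical helper, not the DIPS 3.0 formula.

```python
def expected_ip(strikeouts, balls_in_play, expected_babip, other_outs=0):
    """Innings implied by strikeouts plus expected outs on balls in play."""
    bip_outs = balls_in_play * (1.0 - expected_babip)
    return (strikeouts + bip_outs + other_outs) / 3.0


if __name__ == "__main__":
    # Made-up pitcher: 100 K and 600 balls in play.
    print(round(expected_ip(100, 600, 0.28), 1))  # what a lucky season looked like
    print(round(expected_ip(100, 600, 0.30), 1))  # league-average rate on the same BIP
```

Holding innings fixed while changing the hit rate would credit the pitcher with outs that the adjusted hit total says never happened.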

Second, I didn’t read any mention about adjusting for park. I assume I’ll have to do that on my own when comparing two pitchers.

Yeah, though generally speaking park-adjusting will barely change things because parks have little or no effect on the types of BIP a pitcher allows and to a lesser extent, on BBs, Ks, and HBP.

Third, instead of calculating how many line drives are expected to become hits by using the league average, perhaps use the team average instead? Perhaps this would more accurately separate the team fielding from the pitching?

Maybe, but then shouldn’t you say that DIPS 1.0 and 2.0 should have used team BABIP as well? It’s one and the same, IMO.

   20. pkb33 Posted: February 17, 2006 at 09:38 PM (#1866361)

Yeah, though generally speaking park-adjusting will barely change things because parks have little or no effect on the types of BIP a pitcher allows and to a lesser extent, on BBs, Ks, and HBP.

Is it true that parks have similar FB/HR and LD/HR rates, though?  I’d think not.

Then again, that is a somewhat different question than I think you are trying to answer, too

   21. Joel W Posted: February 17, 2006 at 09:39 PM (#1866363)

Studes,

Does a pitcher’s strikeout rate in year X have any correlation with his HR/OF in year X+1? Walk rate? K/BB?  How about the velocity on his fastball or whether he consistently throws a 2-seam fastball? Or maybe it’s the mix of pitches.  Maybe LD% in year X doesn’t correlate with X+1 LD%, but it does with HR/OF (for example, if a pitcher was hittable, hitters started taking cuts that were designed to hit homers).  I think there are a number of factors that could help us predict HR/OF rate in the future.  I truly feel that an important part of that is the “stuff” type ratings, and having a good grasp of the importance of various “subjective” factors that we can make objective.

   22. studes Posted: February 17, 2006 at 09:44 PM (#1866373)

Joel, yes to strikeout rates impacting HR/OF rates.  Don’t know about the other stuff.

I did raise the barrier of pitchers included in my analysis.  Originally, I included pitchers with at least 50 OF in each of the last four years. 130 of them.

When I raised the barrier to 100 OF (71 of them), I got a correlation of .31, which is more what I was expecting.  So if you get enough data for enough years, you can definitely start to pull back on the regression to the mean.

   23. DSG Posted: February 17, 2006 at 09:46 PM (#1866375)

Is it true that parks have similar FB/HR and LD/HR rates, though? I’d think not.

Obviously, no. But what I’m telling Greg is that DIPS 3.0 is actually largely park-adjusted already, in a way.

   24. Joel W Posted: February 17, 2006 at 09:53 PM (#1866387)

DSG,

You’re right, the earlier versions of DIPS are more radical, but I suppose this is what I wanted to say:

Before we had batted ball data, it made sense to say “a large amount of the variation in the data on BABIP is due to noise in fielding and in luck.”  Now, since we have data that can break down batted ball type, we can take out a lot of the D in dips. 

As somebody said above, that leaves us with LIPS, but not necessarily LIPS.  It leaves us with something which could be a skill but is drowned out by noise.  I think maybe looking at the batters a pitcher faced, for example, might tell us some of that “luck” element on line drives.  I don’t know how I feel about how that works looking backwards, attributing luck when we aren’t sure if something actually is luck.  Now, DIPS 1.0 and 2.0 eliminated luck along with defense, and probably some skill.

The idea of a pitcher having flat stuff and getting shelled is one of the things that made DIPS so initially unappealing.  Then Voros showed that there just isn’t that much of a predictable variation from year to year.  Now we have a way to say, “ok, this pitcher was getting shelled, look at all those line drives.”  Now, maybe it is simply hard to predict what pitchers will do from year to year in the “getting shelled department.”  It simply seems backwards to me to be able to identify who was getting shelled (batted ball data) and still regress it (DIPS 2.0 style).  When I saw Kevin Brown’s hit rate last year, I’d look at his DIPS and basically say, “BS the guy couldn’t get anybody out because his stuff was hittable,” and now we can show that it was hittable, but we’re still going to regress it?

   25. pkb33 Posted: February 17, 2006 at 10:00 PM (#1866395)

A guy getting hit particularly hard also gets washed out in this because his FB/HR and LD/HR rates get normalized.

That may make sense, but I think it doubles the concern about whether that type of actual pitching skill issue is getting overlooked in the metric, in addition to what Joel W noted.

   26. Los Angeles Waterloo of Black Hawk Posted: February 17, 2006 at 10:01 PM (#1866397)

Looking at Schilling for a sec (no park factors whatsoever involved in this post) ...

... last year he was -11.4 in runs allowed.  By DIPS 3.0 he was +6.4.  By using the THT data for all plate appearance results, he was +3.9.  By using the THT data and regressing LD% only (which per my above post correlated at .9955 for all pitchers last season), he was +6.5.  Regressing everything, he was +5.3.

I don’t know what any of this tells us.  Using DIPS and/or batted ball data, we find that he “should” have been around +5 or so last year.  Maybe it was really +4 or really +6, but I think pegging it that close is pretending that we know too much.

Here’s Johan Santana by the above:

Actual Runs Prevented:  +41.1
by DIPS 3.0:  +31.1
by All BFP Results:  +50.0
by All BFP Results, LD% Regressed:  +37.0
by Everything Regressed:  +23.3

He actually bounces around a bit.

I don’t know what any of this means, just throwing it out there ...

   27. Kyle S Posted: February 17, 2006 at 10:01 PM (#1866398)

DIPS, even the original DIPS, isn’t truly defense independent. PZR (the flip-side of UZR) would be defense independent. DIPS is “things pitchers don’t really have much control (using my definition of control) over” independent. In that sense, I think I’m sticking to the spirit of DIPS.

Of course none of the DIPS x.0 have been really “defense independent” thus far, but AFAIK that was their purpose, starting with Voros. I agree with GuyM that what you have done is really Luck IPS (mostly, anyway) and thus should choose a more appropriate name; maybe not LIPS, but not DIPS either.

A stat that uses actual LD rates, league-average BIP->out conversion rates and linear weights (perhaps park-adjusting HR/OF and LD/OF), and adjusts for IP would be a pretty damn good “true” DIPS. It wouldn’t be perfect, because there is still lots of luck in LDs to outs conversion ratio beyond the skill of the defense, but it would be pretty good. I can understand why you haven’t gone down that road, though, as your goal is somewhat different.

   28. Los Angeles Waterloo of Black Hawk Posted: February 17, 2006 at 10:04 PM (#1866404)

I agree that IP needs to be changed in DIPS, if you’re going to present it as an ERA or RA number.  I disagree that we “should” be doing that (though obviously it helps for ease of expression); I think presenting everything as a % of BFP is more meaningful.

   29. RobertMachemer Posted: February 17, 2006 at 10:09 PM (#1866413)

Ok, I just went through the DIPS 3.0 2005 numbers and compared them to the DIPS 2.0 2005 numbers. Here’s what I found for the Red Sox who pitched to over 100 batters…

          dERA-3.0   ERA    dERA-2.0
Schilling   4.03     5.69     3.70
Timlin      4.16     2.24     2.69
Bradford    4.18     3.86     3.76
Wells       4.31     4.45     4.01
Papelbon    4.33     2.65     4.24
Embree      4.46     7.65     4.97
Wakefield   4.67     4.15     4.60
Clement     4.76     4.57     4.18
Myers       4.79     3.13     4.31
Halama      4.87     6.18     4.31
Gonzalez    4.94     6.11     4.63
Miller      5.06     4.95     4.32
Arroyo      5.23     4.51     4.61
Foulke      5.43     5.91     5.20
Mantei      7.03     6.49     5.32

(I hope the pre-tags work… but if/when they don’t, maybe I’ll try reposting the numbers again using something other than <>‘s and pre’s).

Anyway, the only pitcher on the above list whose ‘new’ dERA is better than his ‘old’ dERA is Alan Embree. Every single other pitcher above looks worse after we change from 2.0 to 3.0.

That’s the sort of thing which doesn’t feel right to me. Is it saying that the Red Sox had a better defense than previously thought? Is it saying that the Red Sox were particularly lucky in regards to avoiding line drives?

By the old formula, the Red Sox had a dERA of 4.31 and an ERA of 4.74. I found that to be reasonable and figured that a lot of the difference was explainable: the Red Sox had really bad defense in 2005 (especially at shortstop and in left field), or at least that’s what I believed to be the case.

Now I don’t know what the new formula would suggest, but from the above changes in dERAs, version 3.0 added an average of about half-a-run to each pitcher’s old dERA—I think it’s reasonable to guess that the team dERA would thus be somewhere around 4.70-4.80, give or take some tenths of a run.
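The "about half-a-run" figure can be checked straight from the table. The sketch below is only a sanity check on the posted numbers: it compares the raw columns as listed and makes no adjustment for any difference in scale between the two versions. The names `red_sox_2005` and `dera_shifts` are made up for the illustration.

```python
# (dERA-3.0, ERA, dERA-2.0) for each pitcher, copied from the table above.
red_sox_2005 = {
    "Schilling": (4.03, 5.69, 3.70),
    "Timlin": (4.16, 2.24, 2.69),
    "Bradford": (4.18, 3.86, 3.76),
    "Wells": (4.31, 4.45, 4.01),
    "Papelbon": (4.33, 2.65, 4.24),
    "Embree": (4.46, 7.65, 4.97),
    "Wakefield": (4.67, 4.15, 4.60),
    "Clement": (4.76, 4.57, 4.18),
    "Myers": (4.79, 3.13, 4.31),
    "Halama": (4.87, 6.18, 4.31),
    "Gonzalez": (4.94, 6.11, 4.63),
    "Miller": (5.06, 4.95, 4.32),
    "Arroyo": (5.23, 4.51, 4.61),
    "Foulke": (5.43, 5.91, 5.20),
    "Mantei": (7.03, 6.49, 5.32),
}


def dera_shifts(rows):
    """Return {pitcher: dERA-3.0 minus dERA-2.0}, rounded to two decimals."""
    return {name: round(v3 - v2, 2) for name, (v3, _era, v2) in rows.items()}


shifts = dera_shifts(red_sox_2005)
mean_shift = sum(shifts.values()) / len(shifts)
# Pitchers whose raw dERA is lower under 3.0 than under 2.0.
improved = [name for name, delta in shifts.items() if delta < 0]

print(f"mean shift: {mean_shift:+.2f} runs")
print("lower under 3.0:", improved)
```

On these 15 rows the mean raw shift comes out a little under half a run, and Embree is the only pitcher with a negative shift.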

So is it suggesting that the Sox were very lucky in terms of how many line-drives they were allowing (and that we cannot expect that to continue in 2006)? (An alternate possibility, that the Sox had league-average defense, seems even more unlikely).

I dunno, I don’t have the math skills to dispute the methodology, but the results are pretty different than the ones given by 2.0, and I generally liked the ones in 2.0. It may just be I have to learn to deal with my general discomfort with the new numbers, but I still think the above result (virtually every pitcher’s dERA got worse) seems unlikely enough to raise at least some red flags.

   30. Mike Emeigh Posted: February 17, 2006 at 10:11 PM (#1866416)

It’s not that pitchers have no control; it’s that whatever control they do have is so drowned out with noise that it’s simpler to disregard the components completely.

Correlation analysis of LDIP percentage suffers from the same problem that correlation analysis of BABIP does, as MCoA alludes to in #2. The range of performance is capped on the high end (somewhere around 25% by my data sets, probably higher in the BIS data, for reasons we’ve discussed before), and pitchers who can’t control LDIP at all are weeded out of the picture very quickly. In my analysis, the group LDIP% for “cup-of-coffee” pitchers (those w/ 50 or fewer BIP in a season) was 21.1%; the average for regular pitchers (my baseline group) was 19.4%. The “c-o-c” pitchers also gave up a higher percentage of hits on LDIP. Those group differences are likely not due to chance; there is some level of control being exerted there. The questions are “how much” and “how can we tell”.

I think that rather than looking at y-t-y correlation you need to group pitchers by their level of performance. Take a group of low LDIP pitchers and compare them to a group of high LDIP pitchers, and see how they perform “as a group” each year. I’m trying to do something like this now, and since my wife is away I can actually work on it this weekend!

—MWE

   31. pkb33 Posted: February 17, 2006 at 10:14 PM (#1866418)

Anyway, the only pitcher on the above list whose ‘new’ dERA is better than his ‘old’ dERA is Alan Embree. Every single other pitcher above looks worse after we change from 2.0 to 3.0.

I wonder if that’s because his 8 HR in 38 innings shrank….

   32. Matt Clement of Alexandria Posted: February 17, 2006 at 10:22 PM (#1866425)

Robert -

They’re differently scaled.  3.0 is scaled to RA, and 2.0 to ERA.  If you convert the two stats to the same scale, you get 6 or 7 of the 15 with a higher 3.0 than 2.0.

   33. DSG Posted: February 17, 2006 at 10:27 PM (#1866433)

I think that rather than looking at y-t-y correlation you need to group pitchers by their level of performance. Take a group of low LDIP pitchers and compare them to a group of high LDIP pitchers, and see how they perform “as a group” each year. I’m trying to do something like this now, and since my wife is away I can actually work on it this weekend!

Mike,

I’m sure you know this, but the correlation is always higher when you look just at extremes.

   34. RobertMachemer Posted: February 17, 2006 at 10:40 PM (#1866453)

They’re differently scaled. 3.0 is scaled to RA, and 2.0 to ERA. If you convert the two stats to the same scale, you get 6 or 7 of the 15 with a higher 3.0 than 2.0.


Ah, yes, well, that would explain it, wouldn’t it?  (And now I found where he mentioned that).  Cool, thank you.

So by what would I multiply 2.0 (or 3.0) in order to get them on the same scale?  And is all of the information needed to calculate 3.0 available to the public?  (And if so, does anyone want to help me set up the spreadsheet?  And if not, how different are the results from 2.0 to the ones in 3.0?  Who comes out radically different?)

   35. chrisisasavage Posted: February 17, 2006 at 11:04 PM (#1866493)

Way cool!  I did almost exactly the same thing once using the hardballtimes.com stats copied into Excel (BaseRuns based on batted ball types, neutralizing LD% to league average).  It worked pretty well, but it was in no way an “analysis”; I was more or less having fun.  Being a Mariners fan, it made me really really happy to see Felix Hernandez was nearly as good as his ERA indicated.  His LD% was too low, though, and it will regress.  On the flip side the Washburn example puts a damper on my enthusiasm :(

I thought of taking my original spreadsheet, applying the THT 2006 Annual’s batted ball park factors, and making a drop-down to show RA based on how different stadiums affect the pitcher’s batted ball lines and run values.

   36. Joel W Posted: February 17, 2006 at 11:23 PM (#1866529)

Felix Hernandez is a good example of why I think we should be looking at Quality of Batters Faced when trying to grasp the ability of pitchers to control LD%.  His QBF: .262/.326/.415

Factoring out as much noise as possible is important when trying to look at y-t-y correlation for difficult-to-measure skills.  How good the hitters a pitcher faced were at hitting line drives is a good start.

   37. GuyM Posted: February 17, 2006 at 11:30 PM (#1866540)

if you define control as the ability to perform stably in a category from year to year.

It’s not that pitchers have no control; it’s that whatever control they do have is so drowned out with noise that it’s simpler to disregard the components completely.

To me, these comments capture the main problem underlying DIPS analysis: it assumes stability equals ability.  If a pitcher can be 10th in the league in BABIP one year but 65th the next year, it can’t be much of an ability—right?  Conversely, we don’t see that kind of volatility in K/9 (injury excepted), so it’s a “real” talent. 

But that’s mistaken.  A talent is real if 1) pitchers’ actual ability differs, and 2) the difference impacts RA in a meaningful way. A lot of noise in the statistic doesn’t change that.  Let’s take two pitchers with these BABIP lines over 6 seasons:
A: .320, .260, .290, .310, .270, .290
B: .240, .270, .290, .250, .270, .300

Say pitcher A has true talent of .290, B is .270.  Now, you could say that B only has an advantage over A in 2 of the six years, so his BABIP advantage isn’t very meaningful (that’s in effect what DIPS concludes.).  But now let’s sequence these in terms of luck, from luckiest to unluckiest year for each pitcher:
A:  .260, .270, .290, .290, .310, .320
B:  .240, .250, .270, .270, .290, .300

Now it becomes more clear:  the talent is always there.  Pitcher B is always 20 points better, given equivalent luck.  The fact that in some years A may be very lucky while B is unlucky doesn’t change that.  Who do you want:
The one who will be .300 in his bad years or the one who will be .320? 
The player who will be .240 in his good years, or the .260 guy? 
It’s easy—you always want the better pitcher, because you don’t know and can’t control the luck.  That would be true even if BABIP commonly varied by 100 points per year.  The noise affects our ability to measure the talent—but it can’t change the magnitude or the importance of the talent.
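The two-pitcher example can be worked through mechanically. This is just a sketch of the arithmetic in the comment, using the six-season lines as posted; BABIP is held in thousandths to avoid floating-point noise, and the function names are invented for the illustration.

```python
# BABIP by season in thousandths (.320 -> 320), copied from the example above.
a_seasons = [320, 260, 290, 310, 270, 290]
b_seasons = [240, 270, 290, 250, 270, 300]


def head_to_head(a, b):
    """Count seasons where B beats A, A beats B, and ties (lower BABIP wins)."""
    b_wins = sum(1 for x, y in zip(a, b) if y < x)
    a_wins = sum(1 for x, y in zip(a, b) if x < y)
    return b_wins, a_wins, len(a) - a_wins - b_wins


def luck_matched_gaps(a, b):
    """Pair A's luckiest year with B's luckiest, and so on; return A minus B."""
    return [x - y for x, y in zip(sorted(a), sorted(b))]


season_by_season = head_to_head(a_seasons, b_seasons)
gaps = luck_matched_gaps(a_seasons, b_seasons)
mean_a = sum(a_seasons) / len(a_seasons)
mean_b = sum(b_seasons) / len(b_seasons)

print(season_by_season)  # (2, 2, 2)
print(gaps)              # [20, 20, 20, 20, 20, 20]
print(mean_a, mean_b)    # 290.0 270.0
```

Season by season the two pitchers split 2-2 with two ties, yet the six-year averages and every luck-matched pair show the same 20-point gap.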

   38. David Cameron Posted: February 17, 2006 at 11:36 PM (#1866550)

Felix Hernandez is a good example of why I think we should be looking at Quality of Batters Faced when trying to grasp the ability of pitchers to control LD%. His QBF: .262/.326/.415

AL average hitter: .268/.328/.424

I don’t happen to think 7 points of OPS is a big deal.

   39. Mike Emeigh Posted: February 17, 2006 at 11:45 PM (#1866563)

I’m sure you know this, but the correlation is always higher when you look just at extremes.

If the group of low LDIP% pitchers as a group still have LDIP lower than the norm the following year as a group - even though they regress toward the mean - and the group of high LDIP% still have LDIP higher than the norm the following year as a group - even though they also regress toward the mean - that goes a long way toward indicating the extent of a skill. I’m not talking about doing correlation, but about looking at how the *group* performance varies y-t-y.

Let’s say league norm is 19% LDIP. Mariano Rivera has LDIP 14% in season 1, 18.5% in season 2, and 16% in season 3. There’s a lot of variance there, but he’s always below the league norm - doesn’t that suggest that Mariano is better than the average pitcher in preventing line drives? Or that a guy who posts a 22, 19.5, and 24 is worse? It sure does to me.

—MWE

   40. Mike Emeigh Posted: February 17, 2006 at 11:49 PM (#1866575)

I don’t happen to think 7 points of OPS is a big deal.

I don’t either. I looked at this a couple of years ago, and even with the unbalanced schedule there’s usually not a whole lot of variance in QBF (or QPF, for that matter).

—MWE

   41. Joel W Posted: February 17, 2006 at 11:54 PM (#1866581)

You’re right David, though 7 points of OPS is about a 1% difference in batter quality. 

Jamey Wright: .251/.321/.394
Seth McClung: .270/.337/.434

That’s basically the span of the league.  19 points of average, 16 points of OBP and 40 points of slugging over the course of the season seems significant, even if the OPS difference is only 56 points across the whole league.  That matters when we are dealing w/ percentages of batted balls, where a change from 15% to 16% could mean so much in a regression analysis.
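For anyone who wants to check the span, here is the arithmetic on the two slash lines quoted above, kept in thousandths. The labels `easiest` and `toughest` are just shorthand for the low and high ends of the quality-of-batters-faced range.

```python
# Slash lines in thousandths (AVG, OBP, SLG), copied from the two lines above.
easiest = (251, 321, 394)
toughest = (270, 337, 434)


def ops(line):
    """OPS is on-base percentage plus slugging percentage."""
    _avg, obp, slg = line
    return obp + slg


gaps = tuple(t - e for e, t in zip(easiest, toughest))
ops_gap = ops(toughest) - ops(easiest)

print(gaps)     # (19, 16, 40)
print(ops_gap)  # 56
```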

   42. RobertMachemer Posted: February 18, 2006 at 12:16 AM (#1866599)

Say pitcher A has true talent of .290, B is .270. Now, you could say that B only has an advantage over A in 2 of the six years, so his BABIP advantage isn’t very meaningful (that’s in effect what DIPS concludes.).


You could also say that A only has an advantage over B in two of the six years as well.  Even more importantly, A’s advantage in the years in which A is better is dwarfed by B’s advantage in the years that B is better.  But so what?

The problem is that you arbitrarily decided that A will have a true-talent level in BABIP of .290 and that B’s true-talent level is .270, and you then say that DIPS treats them both the same (and gets it wrong).  But what DIPS actually suggests is that there isn’t a true-talent BABIP (or, rather, that there is only an insignificant one compared to all the other noise).  Your initial assumption (that they have differing true-talent levels) contradicts DIPS from the get-go, so of course it will ‘prove’ DIPS wrong.

   43. chrisisasavage Posted: February 18, 2006 at 12:20 AM (#1866603)

I think Felix’s low LD% was the result of his extreme groundball tendencies.  I suspect an increase in LD% will be tied to how much he regresses in his ability to induce grounders.  I still expect him to take a bit of a hit on line drives this year.

   44. bibigon Posted: February 18, 2006 at 12:37 AM (#1866626)

This is great stuff, and I’m frankly a little confused as to why there’s so much resistance to it.

A lot of the arguments against it are little more than strawmen.

As for Eric’s obsession with underestimating the fog, the point he misses there, and the point he consistently misses when I debate this stuff on SoSH with him, is that just because there’s a lot of noise doesn’t mean we should err on the side of these things being bigger skills. Pretty consistently, he has a tendency to assume that because these studies only indicate “There’s no evidence of the existence of” rather than “There’s evidence to the lack of existence of” that we should operate under the belief that these variations are all indicative of huge swings in true talent.

There are two problems I see with this…

1. He’s randomly assigning the burden of proof to the other side. “The evidence can’t prove anything definitively either way, so my starting assumption is probably right” is the basic way he goes about these issues. By showing the fog is large, he hasn’t shown that these skills exist on a macro level, he’s just shown that they could plausibly still exist, and be beyond our ability to see.

2. If these skills exist in a macro sense, but the noise is so large as to completely drown them out anyways, then exactly what useful conclusions can we reach from that? At the very least, they all need to be regressed very heavily, because the noise is so strong. The very point Eric is making about the noise being strong is exactly why we shouldn’t take the raw values at face value, and yet, he seems to miss that part of it.

   45. Mike Emeigh Posted: February 18, 2006 at 01:02 AM (#1866646)

Well, here’s an initial data point - make of it what you will. (I’m not making anything of it myself yet, since it’s only one data point.)

There were 37 AL pitchers who allowed at least 300 BIP in both 2004 and 2005 for the same team. I divided them into thirds based on LD percentage, 12 “high”, 12 “low” and 13 “mid”. The group of 37 pitchers as a whole allowed LD on 18.4% of BIP in 2004, 18.6% in 2005.

The 12 pitchers on the high side in 2004 allowed LD on 20.6% of BIP in 2004, 20.0% in 2005. The 12 pitchers on the low side in 2004 allowed LD on 16.1% of BIP in 2004, 17.2% in 2005. In both cases, there was some regression toward the mean, but (perhaps surprisingly) not all that much. 6 of the 12 high pitchers were high in both 2004 and 2005, and one (Ryan Franklin) went from high to low. 7 of the 12 low pitchers were low in both 2004 and 2005, and one (Jon Garland) went from low to high.

The high LD group had a G/F ratio (excluding LD) of 1.11 in 2004, 1.15 in 2005. The low LD group had a G/F ratio of 1.55 in 2004, 1.54 in 2005.

—MWE
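One way to put a number on "not all that much" regression is to ask what share of each group's 2004 gap from the 37-pitcher average was still there in 2005. The sketch below uses only the percentages posted above; `share_retained` is a made-up helper name, and with 12 pitchers per group the result is a single small-sample data point, nothing more.

```python
def share_retained(group_y1, group_y2, pool_y1, pool_y2):
    """Fraction of a group's gap from the pool average that survives to year two."""
    return (group_y2 - pool_y2) / (group_y1 - pool_y1)


# LD% of BIP: group in 2004, group in 2005, all 37 pitchers in 2004 and 2005.
high = share_retained(20.6, 20.0, 18.4, 18.6)
low = share_retained(16.1, 17.2, 18.4, 18.6)

print(f"high-LD group kept {high:.0%} of its gap")
print(f"low-LD group kept {low:.0%} of its gap")
```

Both groups keep roughly 60% of their distance from the pool average, which is the "some regression, but not all that much" pattern described above.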

   46. GuyM Posted: February 18, 2006 at 01:14 AM (#1866652)

But what DIPS actually suggests is that there isn’t a true-talent BABIP (or, rather, an insignificant one compared to all the other noise). Your initial assumption (that they have differing true-talent levels) contradicts DIPS from the get-go, so of course it will ‘prove’ DIPS wrong.

You miss the point. DIPS looks at a few years of data, sees weak y-t-y correlation, and concludes there are no significant differences in true talent.  DIPS actually has no idea whether there is a 20-point difference in true talent, because it doesn’t look at sample sizes large enough to answer the question.  In fact, when you do look, differences of this magnitude do exist.  So if my scenario “contradicts DIPS,” that’s only because DIPS is wrong.

Actually, my point is not about DIPS per se, but the emphasis on using y-t-y correlations to answer questions when much better (larger) samples are available.  My perception is that the DIPS work did a lot to encourage use of that method, but I could be mistaken.  But certainly, DIPS is an example of such a focus.

If these skills exist in a macro sense, but the noise is so large as to completely drown them out anyways, then exactly what useful conclusions can we reach from that? At the very least, they all need to be regressed very heavily, because the noise is so strong.

This is true only if you arbitrarily limit your analysis to one year at a time.  This is a good example of the way a focus on y-t-y correlation constrains and distorts our thinking.  Take the example of Mariano above: do you really want to regress last year’s LD% 90% of the way to league mean, when you have 4 years of data showing Rivera is better than average?  Or regress his .06 HR/FB rate (over 4 years) to the lg avg of .11?  Perhaps you think this improves your accuracy; I don’t.
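To illustrate why the amount of regression should shrink as seasons pile up, here is a rough sketch using the Spearman-Brown formula from test theory, which is a swapped-in textbook tool and not anything the DIPS work itself uses. It assumes equal-sized seasons and a talent level that does not change. The one-year figure of 0.10 is simply what "regress 90% of the way" implies; it is not a measured value.

```python
def pooled_reliability(one_year_r, seasons):
    """Spearman-Brown: reliability of the average of several comparable seasons."""
    return seasons * one_year_r / (1 + (seasons - 1) * one_year_r)


def regress(observed, league_mean, reliability):
    """Shrink an observed rate toward the league mean, keeping `reliability` of the gap."""
    return league_mean + reliability * (observed - league_mean)


one_year = 0.10  # hypothetical: keeping 10% of the gap is "regressing 90%"
four_year = pooled_reliability(one_year, 4)

print(f"one season:   regress {1 - one_year:.0%} of the way to the mean")
print(f"four seasons: regress {1 - four_year:.0%} of the way to the mean")
```

Under those assumptions, four seasons call for regressing about 69% of the way, not 90%, so a multi-year record does earn a pitcher more credit than any single season would.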

   47. chrisisasavage Posted: February 18, 2006 at 01:21 AM (#1866660)

Makes sense: the more GB, the fewer LD, since the batters are lifting fewer pitches in the air.

   48. pkb33 Posted: February 18, 2006 at 01:22 AM (#1866662)

Pretty consistently, he has a tendency to assume that because these studies only indicate “There’s no evidence of the existence of” rather than “There’s evidence to the lack of existence of” that we should operate under the belief that these variations are all indicative of huge swings in true talent.

I don’t think that’s what he says, though.  I think he says that there are enough other variables in play that they obscure the same true talent level in the expressed metric (BABIP).

The problem with characterizing the studies as showing the lack of existence of the skill is you assume the other variables are fully accounted for, and we have good reason to think this is not reliably the case.

   49. reno dakota Posted: February 18, 2006 at 01:31 AM (#1866668)

Emeigh: To me, that’s the most surprising finding of all this. If there isn’t any y-t-y correlation for line drives as a % of BIP, then I’m totally on board with Gassko’s findings. But I really, really suspected that there would be. Not a huge amount or anything; perhaps an r-squared of something like .2. But something.

Can anyone point me to the study that proves this assumption? It seems to me that it’s at the very heart of the discussion.

TIA.

   50. misterdirt Posted: February 18, 2006 at 01:43 AM (#1866681)

I convert the pitcher’s new batted ball line into hit-types. For example, the average line drive became a single 50.8% of the time in the American League, so Washburn’s predicted number of singles off line drives would be .508*117 = 59. If we do the same for all the other batted ball types, we find that Washburn “should have” allowed 120 singles last year.
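The quoted step is a single multiplication per batted-ball type. A minimal sketch of the line-drive piece, using only the two figures given in the quote (the other batted-ball rates are not quoted here, so they are left out, and `expected_hits` is a made-up name):

```python
LD_SINGLE_RATE = 0.508      # share of AL line drives that became singles (quoted above)
washburn_line_drives = 117  # Washburn's line-drive count used in the quote


def expected_hits(batted_balls, conversion_rate):
    """Expected hits of one type from one batted-ball type at a league-average rate."""
    return batted_balls * conversion_rate


singles_from_ld = expected_hits(washburn_line_drives, LD_SINGLE_RATE)
print(round(singles_from_ld))  # 59
```

Repeating this for each batted-ball type and each hit type, then summing, gives the "should have" totals the quote describes.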

This step in David’s process creates problems in trying to compare DIPS 3.0 numbers to actual ERA.  There are relatively large differences in the conversion rates of OF’s depending on ball park and outfield position.  Call them Area Park Factors. And they exist for non HR fly balls as well as HR’s.  There are also Area Park Factors for LD’s but they vary less from park to park.  Using a league average conversion factor may give a resulting value that is closer to the pitcher’s “true value” as a pitcher but will not show what his value was in terms of runs prevented for his particular team or give a predictive value for what he will do if he stays with the same team.

The 12 pitchers on the high side in 2004 allowed LD on 20.6% of BIP in 2004, 20.0% in 2005. The 12 pitchers on the low side in 2004 allowed LD on 16.1% of BIP in 2004, 17.2% in 2005.

What I would make of this data point is that there is a difference between pitchers in their ability to prevent LDs, but the reason it doesn’t correlate well from year to year is because its small sample size error is larger than that of OF and GB since LDs occur at only half their rate.
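The sample-size point can be made concrete with the usual binomial standard error, treating each ball in play as an independent trial (a simplifying assumption). The 19% line-drive rate is the league norm mentioned earlier in the thread; the 40% rate for the other categories and the 500 BIP season are round numbers assumed for illustration.

```python
import math


def binomial_se(rate, n):
    """Standard error of an observed proportion over n independent trials."""
    return math.sqrt(rate * (1 - rate) / n)


BIP = 500  # assumed season workload, for illustration only
rows = {}
for label, rate in [("LD", 0.19), ("GB or OF", 0.40)]:
    se = binomial_se(rate, BIP)
    rows[label] = (se, se / rate)
    print(f"{label}: standard error {se:.4f}, or {se / rate:.1%} of the rate itself")
```

In absolute terms the noise on the line-drive rate is slightly smaller, but measured against the size of the rate it is clearly larger, which is the sense in which the rarer event is harder to pin down.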

   51. RobertMachemer Posted: February 18, 2006 at 02:26 AM (#1866716)

You miss the point. DIPS looks at a few years of data, sees weak y-t-y correlation, and concludes there are no significant differences in true talent. DIPS actually has no idea whether there is a 20-point difference in true talent, because it doesn’t look at sample sizes large enough to answer the question.


You’re right, I did miss your point.  And I think it’s a good one to make.  Thank you for explaining that to me.  Have there really been no attempts to look at BABIP over any larger periods of time than just one year?  (And then compared to the next year and the previous year?)

Anyway, it seems to me that if DIPS does a good job of predicting ERA (and it is purported to do so) without its needing to know true talent levels for BABIP, then does it matter if pitchers have that ability to prevent hits on balls in play?  If it is a real ability but is small and inconsequential compared to the ability of pitchers to avoid walks and home runs and to get strikeouts, then do we need to kick ourselves for not knowing it?  We don’t kick ourselves for not knowing how much Derek Jeter’s great baserunning adds runs to the Yankees, so why do we care about BABIP if DIPS does fine without it?

And then there’s the problem of identifying a pitcher’s true talent level for preventing hits on balls put in play amid all the noise of defense.  If pitcher A pitches for the Red Sox and pitcher B pitches for the Athletics, and they had your suggested run of BABIPs, there’d be no way to tell whether or not their numbers were the function of their good pitching or the function of the defenses behind them.  (Or their home parks).  Which is why Voros ended up ignoring it in the first place, essentially—he found he didn’t need it and it was too hard to eliminate the pitcher’s ability from that of his defense on balls put in play.  If it’s hard to find and you don’t need it, then what’s it matter for these purposes whether it’s real or not?  Some, sure, because if it is real, then we shouldn’t pretend it’s not real, because it will affect numbers to at least some extent, but that’s already figured into whatever the error-range for DIPS is.

   52. Mike Emeigh Posted: February 18, 2006 at 02:34 AM (#1866726)

I’m going to give you a quote from Clay Davenport’s April 2005 study of minor league pitchers (available as a premium article on the BPros Web site):

Suppose that there is a clear ability to make batters hit a ball weakly, and that teams can recognize it; clearly, this would be a valuable ability for a pitcher to have. Other things being equal, it would give a pitcher an advantage, like height in the NBA. Assuming that teams can recognize it and select for it, you would produce a major league where the selected population is better than its selection group—just as NBA teams are taller, on average, than NCAA teams (its principal recruiting pool), major league pitchers should be better than minor league pitchers, and you should be able to demonstrate a weeding out of the less able. Reaching the major leagues is a sensational example of Darwinian survival.

When Davenport studied minor league pitchers, this is exactly what he found; minor league pitchers who advance to the majors have better BABIP than their counterparts who do not.

—MWE

   53. Mike Emeigh Posted: February 18, 2006 at 02:43 AM (#1866733)

If there isn’t any y-t-y correlation for line drives as a % of BIP, then I’m totally on board with Gassko’s findings. But I really, really suspected that there would be. Not a huge amount or anything; perhaps an r-squared of something like .2. But something.

You’re not going to find y-t-y correlation very easily, for two reasons:

1. The logistics of the data set. The range of performance is too narrow, and the variance too large, for you to find a relationship. See this 2001 article by Keith Woolner, which talks about the noise in BABIP; the noise in LD rate is worse.

2. Pre-screening of the data set. Pitchers who can’t prevent line drives don’t get to pitch in the majors, or get weeded out very quickly once they get there. James made an astute observation in Win Shares (page 118) on this very topic, albeit in a different context:

If you studied the weights and skills of offensive linemen in the NFL, you might very well find that size was irrelevant to success within the group. All of the linemen would weigh between 290 and 350 pounds, but those who weighed 350 pounds might very well tend to be no more successful than those who weighed 290 pounds. Thus, you might very well find that the correlation between size and success for NFL linemen was zero. You could conclude from that that size was irrelevant to success as an NFL lineman. You could conclude that, but you would be wrong. Size is obviously important in an NFL lineman; it’s just that most of the variance has been eliminated by pre-screening (not allowing players who weigh “only” 250 or 260 to play the position at this level) and the remaining variance is disguised by internal biases … If you started using 185-pound players as offensive linemen, you would find out very quickly that size did matter.

Davenport’s observation, quoted above, is similar in nature.

—MWE
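The pre-screening effect is easy to reproduce with made-up data. In the sketch below, "skill" and "result" are simulated standard-normal quantities, and the cutoff of 1.5 is an arbitrary stand-in for making the majors; none of it is real baseball or football data.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2006)  # fixed seed so the run is repeatable
skill = [rng.gauss(0, 1) for _ in range(20000)]
result = [s + rng.gauss(0, 1) for s in skill]  # result = skill + equal-sized noise

CUTOFF = 1.5  # hypothetical screen: only the most skilled are observed
kept = [(s, r) for s, r in zip(skill, result) if s > CUTOFF]

r_all = pearson_r(skill, result)
r_kept = pearson_r([s for s, _ in kept], [r for _, r in kept])

print(f"whole population: r = {r_all:.2f}")
print(f"after screening:  r = {r_kept:.2f} (n = {len(kept)})")
```

The skill matters exactly as much for every simulated player, yet the correlation inside the screened group is far weaker than in the full population, because most of the variance in skill was removed before anyone looked.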

   54. GuyM Posted: February 18, 2006 at 03:26 AM (#1866771)

Anyway, it seems to me that if DIPS does a good job of predicting ERA (and it is purported to do so) without its needing to know true talent levels for BABIP, then does it matter if pitchers have that ability to prevent hits on balls in play?

I was going to cite Davenport’s study, but Mike beat me to it.  The one thing I would add is that the BABIP difference btwn ML-bound and non-ML-bound pitchers was roughly equal—in terms of RA—to the difference in K/9, BB/9, and HR/9.  That is, the ability to prevent hits on balls in play is about as important as the other 3 talents in distinguishing ML pitchers from those who don’t make it.

You can also see significant differences among MLB pitchers if you use career or multi-year data.  Keith Woolner and Tippett did this in their studies.  A couple of examples I recall:  Pedro has a career BABIP of about .270, Andy Messersmith was about .240.  That said, most pitchers are indeed clustered close to the mean.

   55. pigsooie1000 Posted: February 18, 2006 at 04:23 AM (#1866844)

Emeigh: To me, that’s the most surprising finding of all this. If there isn’t any y-t-y correlation for line drives as a % of BIP, then I’m totally on board with Gassko’s findings. But I really, really suspected that there would be. Not a huge amount or anything; perhaps an r-squared of something like .2. But something.

Can anyone point me to the study that proves this assumption? It seems to me that it’s at the very heart of the discussion.

TIA.

In the Hardball Times Annual, Gassko and JC Bradbury did a study “Do players control batted balls?” in which the correlation of LD per ball in play for pitchers was -.03.

   56. reno dakota Posted: February 18, 2006 at 04:42 AM (#1866857)

I understand the point that teams almost certainly “pre-screen” guys who are flat terrible at preventing line drives on BIP, and perhaps that they even do it as well as football teams screen out 175 pound linemen. The fact remains, though, that even among linemen, all of whom are big, size still matters. Just because there’s a base level of aptitude required to be a pitcher in the major leagues doesn’t mean that every pitcher possesses that exact amount of aptitude.

As for the variance point: I don’t know. I think that Emeigh’s post 45 looks pretty straightforward. I don’t have any issue with assuming every pitcher’s LD% of BIP is league average until they prove otherwise. But I thought Tippett proved a few years ago that everyone really isn’t the same at this.

   57. Srul Itza Posted: February 18, 2006 at 04:44 AM (#1866858)

Emeigh and Davenport are exactly right.  The object of pitching is to avoid having the batter hit the ball squarely—i.e., hit line drives.  You can do it by striking them out, by inducing grounders, by inducing pop ups.  So long as you keep your line drive percentage below a certain threshold, and do not screw it up by walking too many players, your defense will catch the requisite number of catchable balls, you will have a good ERA and good record, and advance in the system.

Guys who give up too many line drives don’t make it to the majors, or don’t last very long.  Because they are just not good pitchers.

   58. DSG Posted: February 18, 2006 at 04:44 AM (#1866859)

This step in David’s process creates problems in trying to compare DIPS 3.0 numbers to actual ERA. There are relatively large differences in the conversion rates of OF’s depending on ball park and outfield position. Call them Area Park Factors. And they exist for non HR fly balls as well as HR’s. There are also Area Park Factors for LD’s but they vary less from park to park. Using a league average conversion factor may give a resulting value that is closer to the pitcher’s “true value” as a pitcher but will not show what his value was in terms of runs prevented for his particular team or give a predictive value for what he will do if he stays with the same team.

Essentially, I’m already adjusting for park.

Emeigh: To me, that’s the most surprising finding of all this. If there isn’t any y-t-y correlation for line drives as a % of BIP, then I’m totally on board with Gassko’s findings. But I really, really suspected that there would be. Not a huge amount or anything; perhaps an r-squared of something like .2. But something.

.2 is huge! If you have 3 years of data, your “r” isn’t going to be that high! As noted above, JC and I found an “r” of -.03.

You miss the point. DIPS looks at a few years of data, sees weak y-t-y correlation, and concludes there are no significant differences in true talent. DIPS actually has no idea whether there is a 20-point difference in true talent, because it doesn’t look at sample sizes large enough to answer the question. In fact, when you do look, differences of this magnitude do exist. So if my scenario “contradicts DIPS,” that’s only because DIPS is wrong.

If you’re trying to figure out true talent, DIPS is of course wrong. But that’s not the point of DIPS! DIPS purposely throws out components that pitchers don’t control much from year-to-year, not because pitchers have zero control over them, but because they have no predictive value in the short term.

(Short term being defined as a year, or sometimes even a few years of data.)

If the group of low LDIP% pitchers as a group still have LDIP lower than the norm the following year as a group - even though they regress toward the mean - and the group of high LDIP% still have LDIP higher than the norm the following year as a group - even though they also regress toward the mean - that goes a long way toward indicating the extent of a skill. I’m not talking about doing correlation, but about looking at how the *group* performance varies y-t-y.

If you ask me, it is a skill. But it shouldn’t be included in DIPS.

To me, these comments capture the main problem underlying DIPS analysis: it assumes stability equals ability.

There is a point to doing this. DIPS is not a projection. DIPS is supposed to *explain* a pitcher’s performance. It’s supposed to tell us why his ERA/RA/ERC may or may not be a good indicator of how he actually did or has done, and thus will be expected to do in the next game, season, decade, whatever. That’s the point of DIPS, and I’m surprised so many people are missing it.

   59. DSG Posted: February 18, 2006 at 04:52 AM (#1866865)

The James analogy is a good one, but everyone quoting it is missing the point. It’s not that there exists no ability to prevent line drives. It’s that the variation in true talent among major league pitchers (and THAT is who we’re measuring) in LD% is so small, and the amount of noise so great, that it’s simpler to ignore LD% completely in single-season analysis (and maybe even if looking at many years of data, depending on the specific amount).

   60. reno dakota Posted: February 18, 2006 at 05:11 AM (#1866883)

Another thing: Tango pretty convincingly backed Tippett up.

This says that of the 713 pitchers with more than 3200 BIP, only 57% of them fell within one standard deviation of their teammates over the course of their careers (what we’d expect is more like 68% to be within one std dev). Guys with over 3200 BIP were more than one standard deviation worse than their teammates over the course of their careers 14% of the time, which is close to the 16% we’d expect given normal distribution.

So that means that guys with over 3200 BIP were more than one std dev better than their teammates 29% of the time, rather than the 16% we’d expect given normal distribution. Tango’s study isn’t nearly granular enough to be dispositive (what of guys like Jamie Moyer or Randy Johnson, who spent a long time in the majors before they really “figured out” how to pitch?) and yet it still seems to me pretty convincing proof that in individual cases you’re very likely to find dissimilar levels of skill.
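The 68%/16% benchmarks in the comment above come straight from the normal curve, and the 29% is just what is left after the 57% and 14%. A minimal sketch of that arithmetic, using only the shares quoted in this comment (the variable names are illustrative):

```python
import math

def normal_cdf(z):
    """Cumulative probability of a standard normal variable at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# What a normal distribution predicts
within_one_sd = normal_cdf(1.0) - normal_cdf(-1.0)  # roughly 68%
one_tail = 1.0 - normal_cdf(1.0)                     # roughly 16% in each tail

# Shares quoted in the comment for pitchers with more than 3200 BIP
observed_within = 0.57
observed_worse = 0.14
observed_better = 1.0 - observed_within - observed_worse  # the remaining share

print(f"expected within 1 SD: {within_one_sd:.3f}, expected per tail: {one_tail:.3f}")
print(f"observed better than 1 SD: {observed_better:.2f}")
```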

Now, whether or not you care is another matter. If this fact futzes the usefulness of a very useful model, or makes the complications too unwieldy or what have you, then perhaps we don’t care. But it seems to me that the likelihood, based on these two studies, is that this is a skill that may take a bit of time to emerge from the data. But that doesn’t mean it isn’t there.

   61. reno dakota Posted: February 18, 2006 at 05:32 AM (#1866891)

If you’re trying to figure out true talent, DIPS is of course wrong. But that’s not the point of DIPS! DIPS purposely throws out components that pitchers don’t control much from year-to-year, not because pitchers have zero control over them, but because they have no predictive value in the short term.

I find that hard to believe. One of the first things I did when I opened up your sheet was look for guys on my fantasy team, one of whom happens to be Tim Wakefield. I was actually pleasantly surprised to see him at a 4.67 RA, because I know that DIPS models underrate him, as they do all knuckleballers. If the difference is as big as what Tippett reported for Charlie Hough, it could be as much as a run every three games, or at least .33 in RA. That’s pretty huge.

The issue isn’t that these things don’t have predictive value in individual cases. Of course they do. If you know that Wakefield is going to be underrated, you apply a fudge factor to the numbers and you move on. It’s the same thing we do when we see that Pedro has thrown 120 pitches and is coming back out for the 9th. I understand that putting all of the things we can learn from doing career studies of every individual pitcher might not be a good use of time; it might overcomplicate the model and yield only minor improvements. But “no predictive value in the short term” seems harsh to me.

   62. DSG Posted: February 18, 2006 at 06:03 AM (#1866903)

DIPS underrates knuckleballers not because of LDs, but IF flies. DIPS 3.0 includes IF flies as a separate variable, and so it has no such problem.

   63. pkb33 Posted: February 18, 2006 at 06:13 AM (#1866909)

Essentially, I’m already adjusting for park.

You aren’t for K and BB (where there are very different park factors) are you? You are taking the pitcher’s actual totals there.

For batted balls, I see the case you are adjusting for park in the sense that you are taking league-average hit assumptions from the batted ball types…and thus, largely eliminating the impact of individual parks.

   64. Psychedelic Red Pants Posted: February 18, 2006 at 06:28 AM (#1866916)

So that means that guys with over 3200 BIP were more than one std dev better than their teammates 29% of the time, rather than the 16% we’d expect given normal distribution. Tango’s study isn’t nearly granular enough to be dispositive (what of guys like Jamie Moyer or Randy Johnson, who spent a long time in the majors before they really “figured out” how to pitch?) and yet it still seems to me pretty convincing proof that in individual cases you’re very likely to find dissimilar levels of skill.

Why would you expect a normal distribution?

   65. mgl Posted: February 18, 2006 at 09:02 AM (#1866985)

True BB and K rates do NOT vary much among parks.  And DIPS has ALWAYS really been LIPS.  DIPS was originally a misnomer and stands as such today.  Most of the “noise” stripped from the data in any DIPS formula is as a result of random variation and NOT variation in fielding talent among teams.  On second thought, I suppose you can call the random variation in BABIP “random variation in fielding talent” if you wanted to.  If you do, just know that with a DIPS formula, you are not separating out the true fielding talent of the defense.  Not even close.  If you tried to do that, that would not make much of a dent in data.

   66. Walt Davis Posted: February 18, 2006 at 09:46 AM (#1866991)

Why would you expect a normal distribution?

If in some population an event occurs randomly with a constant rate for everyone in the population (i.e. the assumption that all pitchers have the same BABIP), then any given event could be treated as a Bernoulli trial and a set of independent Bernoulli trials produces a binomial distribution.  For even a fairly small number of trials, the binomial follows a normal distribution (though it deviates some for small probability events).
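To put a number on the binomial argument above: if every ball in play were an independent trial with the same hit probability, the spread of observed BABIP would shrink with the square root of the number of BIP. A minimal sketch, where the .300 rate is an assumed round figure and not a value taken from this thread:

```python
import math

def binomial_sd_of_rate(p, n):
    """Standard deviation of an observed rate over n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)

ASSUMED_BABIP = 0.300  # hypothetical league-wide hit rate on balls in play

for bip in (500, 3200):
    sd = binomial_sd_of_rate(ASSUMED_BABIP, bip)
    print(f"{bip} BIP: one SD of observed BABIP is about {sd:.4f}")
```

Under these assumptions, a pitcher with 3200 BIP would land within roughly eight points of the common rate about two thirds of the time by chance alone.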

Now I suppose it could follow a beta-binomial distribution of some sort for some reason.

Now if there is a talent and we restrict ourselves to ML pitchers, there’s no reason to expect the talent to be normally distributed.

Now one “problem” with Tango’s and Tippett’s and many other studies (and where DIPS basically just lucked out) is that BABIP really isn’t what we’re interested in.  What we really care about is OPSBIP or BABIP with SLGBIP or BABIP plus XB/BIP. 

We know that FB pitchers have lower BABIP but give up higher SLGBIP.  I don’t have the data to tell, but I suspect that most of the guys who have career BABIP rates significantly better than their teammates are flyball pitchers.  If for some reason we’re going to limit ourselves to looking at BABIP (which David gets away from), then ideally we’d compare a pitcher to the league/team average adjusted for their GB/FB rate.

Take Mike’s preliminary data example.  Combining his results with what we know about FB and GB out conversion, we have something like this:

FB pitchers: low BABIP on non-LDs, high XB/BIP on non-LDs, high LD%, high HR
GB pitchers: high BABIP on non-LDs, low XB/BIP on non-LDs, low LD%, low HR

The original DIPS controlled for HR-rate.  HR-rate is somewhat correlated with the other things, so the original DIPS had some of that other variation taken care of too.  But those other things counteract one another to an extent anyway.  FB pitchers “counteract” the XBs and LDs with lower hit-rates; the GBers counteract higher hit-rates with lower XBs and LDs.  This is how Voros got lucky in his original versions of DIPS—differences in BABIP get largely washed out by differences in the other variables (and also partly controlled for by HR-rates and probably K-rates which were already in the model).

Anyway, I’m not sure that studies which show us that there are more significant career differences in BABIP than we would expect if it was random tell us anything other than that some pitchers are FB pitchers and some are GB pitchers.  Now some studies that examine whether there are significant career differences in OPSBIP or (base runs)/BIP or something like that would be really cool.

   67. Psychedelic Red Pants Posted: February 18, 2006 at 11:16 AM (#1867000)

If in some population an event occurs randomly with a constant rate for everyone in the population (i.e. the assumption that all pitchers have the same BABIP), then any given event could be treated as a Bernoulli trial and a set of independent Bernoulli trials produces a binomial distribution. For even a fairly small number of trials, the binomial follows a normal distribution (though it deviates some for small probability events).

I know all that, but it doesn’t answer the question in the context in which it was asked.

This approach has a pretty clear problem when it comes to looking at pitchers with long MLB careers (3200 BIP’s ~ 1000 IP): pitchers who, by random chance, have a LD% 2 SD’s worse than the average, but are otherwise identical to the others in the sample in matters of known pitcher skill, will be far less likely to pitch long enough to get 3200 BIPs because H/BIP’s are a huge part of the game and, traditionally, have been attributed to the pitcher. Baseball (as a system) would tend to cull the high-LD% population but not the low-LD% population, leading to a non-normal distribution. Even if there is no true talent involved in LD suppression, I’d only expect to see a normal distribution in populations of samples too small for manager/general manager reaction (100-150 BIPs maybe?). (Of course careers of that length would likely show the opposite effect, since they were excluded from the long-career population. You’d have to take short samples out of long careers too to make a real normal distribution.)

Just to illustrate, 1000 sets of 1000 coin flips would yield a normal distribution. If after 800 flips I excluded all sets that, to that point, had 375 heads or fewer and then performed the remaining 200 flips in the nonexcluded sets then my non-excluded population would be head-heavy and non-normal. Likewise if after every 100 flips the sets that did not have at least 45 heads were excluded the resulting population would be head-heavy and non-normal.

The normal distribution is only a reasonable expectation when the criterion being measured for the distribution isn’t being used to select the sample.
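The coin-flip illustration above is easy to simulate. The sketch below follows the first scenario (1000 sets of 1000 flips, with any set at 375 heads or fewer after 800 flips thrown out); the function name, the seed, and the choice to flip every set to completion before filtering are illustrative assumptions, and filtering afterwards gives the same surviving population as stopping the culled sets early.

```python
import random

def simulate(num_sets=1000, flips=1000, check_at=800, cutoff=375, seed=0):
    """Return (totals for every set, totals for sets that survive the cull)."""
    rng = random.Random(seed)
    all_totals, survivor_totals = [], []
    for _ in range(num_sets):
        heads_at_check = sum(rng.random() < 0.5 for _ in range(check_at))
        heads_after = sum(rng.random() < 0.5 for _ in range(flips - check_at))
        total = heads_at_check + heads_after
        all_totals.append(total)
        # A set is excluded if it had `cutoff` heads or fewer at the checkpoint
        if heads_at_check > cutoff:
            survivor_totals.append(total)
    return all_totals, survivor_totals

all_totals, survivor_totals = simulate()
mean_all = sum(all_totals) / len(all_totals)
mean_survivors = sum(survivor_totals) / len(survivor_totals)
print(f"sets kept: {len(survivor_totals)} of {len(all_totals)}")
print(f"mean heads, all sets: {mean_all:.1f}; survivors only: {mean_survivors:.1f}")
```

The surviving sets lose most of their low tail while the high tail is untouched, which is the head-heavy, non-normal shape described above.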

   68. reno dakota Posted: February 18, 2006 at 11:42 AM (#1867001)

Gassko said:

DIPS underrates knuckleballers not because of LDs, but IF flies. DIPS 3.0 includes IF flies as a separate variable, and so it has no such problem.

Didn’t know that. Thanks.

Davis said:

Anyway, I’m not sure that studies which show us that there are more significant career differences in BABIP than we would expect if it was random tell us anything other than that some pitchers are FB pitchers and some are GB pitchers.

Yeah, that’s a fair point. Still: It seems to me the question has yet to be adequately answered. My concerns started when I saw two well-respected sabermetricians come up with results contrary to the assumptions that underpin DIPS theory, and I really don’t think that those results have been integrated into the progress of DIPS analysis.

In any event, I don’t mean to re-hash an argument that I know has taken place many times on these boards and in which I’ve taken no part until now. I know I’m late to the DIPS party, and I know that you guys have worked out a lot of the problems without contrarians like me getting in the way. You’ve done a fine job getting this far—I suppose I just think there’s a little way between here and where I think the data currently available can take you.

   69. reno dakota Posted: February 18, 2006 at 12:02 PM (#1867003)

Inevitable:

I get the selection bias point. To me, the remarkable thing about Tango’s study was that there were a bunch of guys with 3200 BIP (which I think is about 6 seasons as an SP) who were more than one std dev below the team average. The number was 14%, where a *normal* distribution would have it at 16%. That seems to suggest that the bottom is almost where it should be, but the top is, in some way, different.

I’ll grant that these data alone do not prove this point, but it at least gives me something to wonder about. If this were actually my job, I’d probably try to drill those numbers down a bit further.

   70. greenback Posted: February 18, 2006 at 12:46 PM (#1867005)

And DIPS has ALWAYS really been LIPS. DIPS was originally a misnomer and stands as such today. Most of the “noise” stripped from the data in any DIPS formula is as a result of random variation and NOT variation in fielding talent among teams.

To borrow from linear algebra terminology, it seems to me people are confusing ‘independence’ with ‘span’ here.

   71. studes Posted: February 18, 2006 at 01:37 PM (#1867016)

Back to Mike’s study on line drives, Shandler has done some very good work in this area over the past two years.  A couple of points from the 2005 forecaster:

- Using STATS data from 1994-2003, they found a correlation of .314 in LD%.  Judging from their comments, this is R, not R squared, and the correlation was based on YTY, not multiple years.  I could be wrong.  They looked at pitchers with 500 BFP each year.

- They looked at the top 10% outliers each year and found that 60% regressed at least half way back to the mean.

- They also found that GB pitchers allowed LD at a lower rate, but they further found that the more extreme a pitcher was, in either GB or FB rate, the lower his LD rate compared with less extreme GB or FB pitchers.
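For a sense of what a year-to-year r of .314 implies, the usual one-variable regression estimate pulls a pitcher's rate toward the league mean by a factor of (1 - r). The sketch below is only that textbook estimate, assuming a similar spread of LD% in both years; the 20% league rate and 24% pitcher rate are hypothetical inputs, and the result is not the same statistic as the share-of-outliers figure in the second bullet.

```python
def expected_next_rate(observed, league_mean, r):
    """Regression-to-the-mean estimate of next year's rate given a y-t-y r."""
    return league_mean + r * (observed - league_mean)

LEAGUE_LD = 20.0   # hypothetical league LD%, for illustration only
PITCHER_LD = 24.0  # hypothetical outlier season
R_YTY = 0.314      # correlation quoted in the first bullet

estimate = expected_next_rate(PITCHER_LD, LEAGUE_LD, R_YTY)
share_of_gap_closed = 1.0 - R_YTY
print(f"expected next-year LD%: {estimate:.2f}")
print(f"share of the gap to the mean expected to close: {share_of_gap_closed:.0%}")
```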

   72. pkb33 Posted: February 18, 2006 at 03:23 PM (#1867043)

True BB and K rates do NOT vary much among parks.

Raw BB and K rates do vary a good deal amongst parks though, don’t they?

So when you say “true” rates are you regressing them back already?

   73. studes Posted: February 18, 2006 at 03:33 PM (#1867044)

To respond to a couple of MGL points:

True BB and K rates do NOT vary much among parks.

I have found that they do vary by park, sometimes significantly (though that’s certainly a non-statistical judgement on my part).  For instance, I found that the biggest impact park factor at Pro Player is the strikeout rate factor.  I agree about BB.

And DIPS has ALWAYS really been LIPS. DIPS was originally a misnomer and stands as such today.

Perhaps in execution, but it was originally focused on BABIP, not all batted balls.  Therefore, saying that fielding might be related to the results wasn’t out of line.  It was only somewhat worse than saying DER measures fielding.

David has clearly taken it in a more LIP-like direction by normalizing the home run rate.  I’m not saying that’s bad or anything, just that it should be acknowledged as a new direction for the stat.

   74. GuyM Posted: February 18, 2006 at 03:34 PM (#1867045)

The STATS LD% finding is interesting.  It could be that DSG/JCB’s findings were impacted by the problems w/ BIS data we’ve discussed in the PMR threads.  The BIS LD% was stable 2002-2003 and 2004-2005 (but big shift from ‘03 to ‘04), so half of the study was potentially distorted.  DSG:  You may want to exclude 2003-2004 and see what the correlation is.  For that matter, other correlation may have problems as well, since out% for various BIP types changed as well.  Pinto has a good summary of the changes at:  http://www.baseballmusings.com/archives/012841.php.

If the correct r is .31, DIPS 3.0 should probably use actual LD%; if it’s much lower, DSG’s decision to use league average seems reasonable.

Studes:  Does Shandler have any HR data that would argue for/against crediting pitchers with actual HRs allowed?

   75. DSG Posted: February 18, 2006 at 03:55 PM (#1867053)

Mitchel found no correlation for LD% for pitchers who switched teams using Stats data in his spectacular DIPS revisited.

And DIPS has ALWAYS really been LIPS. DIPS was originally a misnomer and stands as such today. Most of the “noise” stripped from the data in any DIPS formula is as a result of random variation and NOT variation in fielding talent among teams. On second thought, I suppose you can call the random variation in BABIP “random variation in fielding talent” if you wanted to. If you do, just know that with a DIPS formula, you are not separating out the true fielding talent of the defense. Not even close. If you tried to do that, that would not make much of a dent in data.

Thank you! If I don’t count enough as a “defensive expert” for people to accept what I’m saying about DIPS, does Mitchel?

   76. GuyM Posted: February 18, 2006 at 04:09 PM (#1867058)

There is a point to doing this. DIPS is not a projection. DIPS is supposed to *explain* a pitcher’s performance. It’s supposed to tell us why his ERA/RA/ERC may or may not be a good indicator of how he actually did or has done, and thus will be expected to do in the next game, season, decade, whatever. That’s the point of DIPS, and I’m surprised so many people are missing it.

You say it’s not a projection, but that it helps tell us what he’ll do in future games and seasons; you say it’s not a measure of true talent, but is a measure of what a pitcher “really” accomplished, which is basically the same thing.  I’d say that DIPS 3.0 is a measure of a pitcher’s true talent based on one season’s data.  Now, that’s going to be more useful for some purposes than others.  It’s a better tool for evaluating young pitchers (where we lack a MLB track record) than veteran pitchers.  It’s especially helpful for determining whether a dramatic change in performance—up or down—represents luck or a change in true talent.

But I can’t see using DIPS to tell me that Pedro “should” have given up 25 HRs rather than 19 (when he has a demonstrated ability to prevent HRs), or that he should have given up 27 more hits than he did.  DIPS will often be wrong on excellent pitchers, because of the interaction of the factors in BsR:  his non-existent HRs will drive in non-existent baserunners who reached on singles that never happened. 

So, if we use DIPS 3.0 appropriately, it can be a useful tool.  But it has its limits…

   77. DSG Posted: February 18, 2006 at 04:23 PM (#1867063)

So, if we use DIPS 3.0 appropriately, it can be a useful tool. But it has its limits…

Of course, I’ve never said anything differently.

   78. studes Posted: February 18, 2006 at 05:12 PM (#1867094)

Guy, as far as I can tell their home run per fly findings appear to be in line with the ones I quoted earlier from our stats.  However, I can’t find an article in which they directly quote a correlation between years.

   79. Mike Emeigh Posted: February 18, 2006 at 05:58 PM (#1867109)

It’s a better tool for evaluating young pitchers (where we lack a MLB track record) than veteran pitchers. It’s especially helpful for determining whether a dramatic change in performance—up or down—represents luck or a change in true talent.

I don’t agree. All it tells you is *what* happened; it doesn’t tell you *why* it happened.

In Curt Schilling’s case, what stands out in 2005 are three things:

—his FB/GB ratio (exclusive of line drives) went from 1.07 to 0.83
—his LDIP% went up only marginally (in the MLB stats, which don’t show the dramatic y-t-y LD fluctuations that the BIS stats do)
—his performances on component stats were worse across the board; he gave up a higher percentage of hits on GBs, on FBs, and on LDs in 2005 than he did in 2004. Boston’s defense was worse at getting outs on BIP in ‘05 than ‘04, but that would only explain a part of the change; Schilling was worse-than-team in all three categories.

The combination of all of these led to Schilling being torched on BIP. Schilling’s DIPS ERA went up, but by only about half a run; the majority of the change in his actual ERA can be traced to his BIP results. But we don’t really *know* whether it was the injury, a talent change, or some awful luck; we can only guess.

—MWE

   80. RobertMachemer Posted: February 18, 2006 at 07:25 PM (#1867155)

DIPS underrates knuckleballers


Does it still?  Version 2.0 has corrective multipliers (or whatever fancy name is appropriate) to account for knuckleballers and lefthanders.  With the multipliers included, doesn’t DIPS account for knuckleballers just fine?  Are you saying that whatever-the-numbers-are-for-knucklers-and-lefties should be different so that DIPS will no longer underrate them?

   81. Mike Emeigh Posted: February 18, 2006 at 08:31 PM (#1867222)

The mistake that everyone is making is the assumption that if something is an ability, correlation analysis will pick it up - and if the y-t-y correlation is weak, it isn’t an ability, and everyone should therefore be regressed to league average. This is untrue.

Suppose you have a group of 10 players. Based on other information we have available to us, we “know” that we can identify 5 of them as having a true LDIP% of 19% and that 5 of them have a true LDIP% of 21%. We can also “know” that the standard deviation of performance for any pitcher is 3% - that is, 2/3 of the time, a pitcher will be within 3% of his true talent.

The first year, these pitchers actually post LDIP% exactly equal to their true talent. In year two, the five guys who posted 19s in year 1 posted, in order, 17.7, 20.7, 21.9, 19, and 19.3; the five guys with 21s posted 21.2, 20.1, 19.3, 19.5, and 22.1. The correlation factor for this group is r=.273, suggesting only a weak relationship. These 10 pitchers actually all ended up within 3% of their assumed real level of talent (the sample STDEV is 1.39%). In year 3, the 19s posted 15.5, 15.2, 17.3, 23, and 21; the 21s posted 19.9, 15.7, 23.9, 19, and 21.8. The correlation with the “known” talent is again r=0.273; the correlation with year 2 data is -0.006. The sample STDEV in this case is 3.2%, closer to the “known” STDEV of 3.0%.

In both year 2 and year 3, the 19s as a group performed better than the 21s as a group. In year 2, the 19s posted an average 19.7% LDIP, the 21s a 20.4. In year 3, the 19s posted an 18.4, the 21s a 20.1. Even that doesn’t exactly tell us that there’s a real difference in ability - the difference may be more like 19.5 to 20.5 rather than 19 to 21 - but it’s telling that instead of regressing everyone to 20, I will probably be closer to the mark regressing to 19.5 for the 19s and 20.5 for the 21s. Correlation doesn’t tell me that.

The point I’m making here is that the correlation analysis is telling me that it can’t find strong evidence of an ability - even when I might “know” up front that there is one, and in the way that I’ve defined this problem, there definitely is one that underlies the data. When we conclude from correlation analysis “alone” that there is no ability because there is not a strong y-t-y correlation, and that regression to the mean is therefore appropriate, we are likely to be making a mistake.

That’s a rather long winded way of saying this:

No correlation does not mean no ability. It also does not mean that the differences are necessarily small enough so that they can be discarded from your analysis.

—MWE
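The ten-pitcher example above can be checked directly from the listed values. The sketch below recomputes the correlations and group means with a hand-written Pearson function; the helper and variable names are illustrative, and the data are exactly the numbers given in the comment. Computed from those values, the year-3 mean for the 19s comes out at 18.4.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# First five entries are the "19" pitchers, last five the "21" pitchers
talent = [19] * 5 + [21] * 5
year2 = [17.7, 20.7, 21.9, 19, 19.3, 21.2, 20.1, 19.3, 19.5, 22.1]
year3 = [15.5, 15.2, 17.3, 23, 21, 19.9, 15.7, 23.9, 19, 21.8]

r_talent_y2 = pearson(talent, year2)
r_talent_y3 = pearson(talent, year3)
r_y2_y3 = pearson(year2, year3)

def group_means(values):
    """Mean LDIP% for the 19 group and the 21 group."""
    return sum(values[:5]) / 5, sum(values[5:]) / 5

print(f"r(talent, year 2) = {r_talent_y2:.3f}")
print(f"r(talent, year 3) = {r_talent_y3:.3f}")
print(f"r(year 2, year 3) = {r_y2_y3:.3f}")
print("year 2 group means:", group_means(year2))
print("year 3 group means:", group_means(year3))
```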

   82. GuyM Posted: February 18, 2006 at 09:08 PM (#1867244)

I don’t agree. All it tells you is *what* happened; it doesn’t tell you *why* it happened.

DIPS isn’t a crystal ball, of course.  But if Schilling were a young pitcher, we’d feel pretty good about his prospects for improvement—whereas we wouldn’t if his poor performance mainly reflected bad K and BB rates.  An injury of course complicates any interpretation of performance, but I don’t think that invalidates the utility of DIPS in analyzing younger and non-injured pitchers. 

Schilling does raise an interesting issue though:  what is the role of BABIP in pitchers’ declining performance as they age?  Does BABIP rise in a pitcher’s final 1-2 seasons?  Do LD% and/or HR/FB rise?  This would shed light on how much of a skill these are.  Tango did one study showing Hit% rising with age, but I haven’t seen any others.

Studes:  Shandler found that high HR/FB pitchers regressed very heavily, but low HR/FB less so.  This would be consistent with the “Emeigh theorem”—if you give up a lot of HRs w/o a track record of prior success, you lose your job.  So the poorly-performing pitchers who remain in this kind of study (min. IP in consecutive years) improve in the following year.  The higher correlation for the low-HR pitchers is probably a better indicator of the durability of the skill.

   83. pkb33 Posted: February 18, 2006 at 10:00 PM (#1867284)

No correlation does not mean no ability. It also does not mean that the differences are necessarily small enough so that they can be discarded from your analysis.

Thanks for saying this, as this reality is so often missed in this discussion.

Correlation studies are good tools.  They aren’t perfect, and they aren’t the only tool.

   84. DSG Posted: February 18, 2006 at 10:03 PM (#1867288)

No correlation does not mean no ability. It also does not mean that the differences are necessarily small enough so that they can be discarded from your analysis.

It’s easier to say what a low correlation doesn’t mean than what it does, isn’t it? No one, except for Voros, has tried explaining what a low correlation does tell us over the past many years. People don’t want to make definitive statements and get grilled. I say screw it. Here is what a low correlation means: It means that the amount of impact skill has on a certain component, combined with the amount of noise, and the sample size, forms a number which is unlikely to be repeated in the next year/month/game/career. So in terms of single-season analysis, of course you should ignore that component, given a low enough y-t-y correlation! Once you get the sample size high enough or remove enough noise, then, and only then, do you keep it in.

   85. studes Posted: February 18, 2006 at 10:11 PM (#1867291)

The higher correlation for the low-HR pitchers is probably a better indicator of the durability of the skill.

Perhaps.  Graphically, it looks like the 04-05 data moved in the other direction (lower HR/F guys regressed more than the higher HR/F guys). Without Shandler’s data (or someone else’s data and analysis), I’ll personally stick to what I know from the BIS data, flawed as it is.

Mike, I certainly agree with what you’re saying.  I think a lot of folks agree (including David) that a lack of correlation doesn’t prove there is no true talent.  The issue is when/how do you feel comfortable saying there is for a specific pitcher?  I’m not a stats expert, but I’m guessing that with a large enough sample, the issue you’re raising rarely occurs over three years’ data.

   86. pkb33 Posted: February 18, 2006 at 10:22 PM (#1867297)

It means that the amount of impact skill has on a certain component, combined with the amount of noise, and the sample size, forms a number which is unlikely to be repeated in the next year/month/game/career.

The issue is that many take this statement (which is likely accurate) and interpret it to mean “no pitcher can significantly affect BABIP.”

There are other variables, and a measurement issue, which stand in the way of that leap, imo.  But maybe I’m wrong…

   87. greenback Posted: February 18, 2006 at 10:25 PM (#1867301)

So in terms of single-season analysis, of course you should ignore that component, given a low enough y-t-y correlation!

Somebody has already addressed this though:

Generally speaking, the more data you have the better.

If you’re doing all this work, then why use single-season analysis?

   88. GuyM Posted: February 18, 2006 at 10:27 PM (#1867303)

So in terms of single-season analysis, of course you should ignore that component, given a low enough y-t-y correlation!

Two questions, DSG:

1) what is low enough?  For example, if Studes is right that r for HR/FB is .31, do you think you should continue to regress 100%?  And if Shandler is right that LD% r is .31? 

2) When and why do you want to do single-season analysis if your goal is determining a pitcher’s true talent level?  Your argument is somewhat circular:  “having decided to use only one year’s data to predict year 2, I should throw out the factors that don’t correlate.”  Well, yes.  But why are you doing that in the first place?

   89. DSG Posted: February 18, 2006 at 11:06 PM (#1867346)

If you’re doing all this work, then why use single-season analysis?

Derek Lowe. His RAs from 2002-04, most recent season first: 6.80 (182.7 IP), 5.00 (203.3), 2.66 (219.7). Three-year weighted average: 5.40 RA with 195.7 IP. Regress that 19% to an average RA of 4.87, and you get 5.30. Add another .14 runs because of his age, and you get 5.44. That was his Marcel last year. But his DIPS RA in 2004 was 4.95. Let’s do the math again with that RA (and using his xIP of 190). Three-year weighted average: 4.55 RA with 199.3 IP. Regress and age adjust and you get 4.75. His actual RA in 2005? 4.58. His actual IP? 222. If you looked at his raw RAs, you’d see a pitcher quickly declining, with no discernible true talent level. Look at his DIPS 3.0 RAs and you’ll see a pitcher who probably over-performed in 2002 but is actually a 4.50-5.00 RA true talent. In fact, I bet if I did DIPS 3.0 calculations for 2002 (I’ll try to get to this), his DIPS 3.0 RA would be around 4.00, and you’d see this even more clearly.
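The Lowe arithmetic above can be reproduced with a few lines. The comment does not state the weighting scheme; the sketch assumes 3/2/1 season weights (most recent first) applied to innings, which is an inference that happens to match the quoted 5.40 and 4.55 averages. The 19% regression, 4.87 league RA and .14 age adjustment are taken from the comment; the function names are illustrative.

```python
def weighted_ra(seasons, weights=(3, 2, 1)):
    """IP-weighted RA and weighted-average IP; seasons are (RA, IP), newest first.

    The 3/2/1 weights are an assumption that reproduces the quoted figures.
    """
    weighted_ip = [w * ip for w, (_, ip) in zip(weights, seasons)]
    runs = sum(wip * ra for wip, (ra, _) in zip(weighted_ip, seasons))
    total = sum(weighted_ip)
    return runs / total, total / sum(weights)

def project(seasons, league_ra=4.87, regress=0.19, age_adj=0.14):
    """Regress the weighted RA toward league average, then add the age term."""
    ra, _ = weighted_ra(seasons)
    return (1.0 - regress) * ra + regress * league_ra + age_adj

# 2004, 2003, 2002 in that order
raw = [(6.80, 182.7), (5.00, 203.3), (2.66, 219.7)]
# Same, but with the 2004 DIPS RA (4.95) and xIP (190) swapped in
with_dips = [(4.95, 190.0), (5.00, 203.3), (2.66, 219.7)]

for label, seasons in (("raw RA", raw), ("2004 DIPS RA", with_dips)):
    ra, ip = weighted_ra(seasons)
    print(f"{label}: weighted {ra:.2f} RA over {ip:.1f} IP, projection {project(seasons):.2f}")
```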

Sometimes we don’t have more than a single-season. Sometimes we want to know if a player’s single-season line represents an actual change in talent or normal fluctuation. That’s where DIPS 3.0 (or any version of DIPS for that matter) comes in.

1) what is low enough? For example, if Studes is right that r for HR/FB is .31, do you think you should continue to regress 100%? And if Shandler is right that LD% r is .31?

I would test for significance. It probably is, but I won’t comment without actually looking at the data.

2) When and why do you want to do single-season analysis if your goal is determining a pitcher’s true talent level? Your argument is somewhat circular: “having decided to use only one year’s data to predict year 2, I should throw out the factors that don’t correlate.” Well, yes. But why are you doing that in the first place?

That is not my goal. My goal is to look at a pitcher’s single-or-half-or-whatever-amount-season performance, and evaluate what it means.

   90. mgl Posted: February 19, 2006 at 03:14 AM (#1867596)

Just to clarify, and this may have already been said (and there are lots of guys on this thread who are better statisticians than I), but a y-t-y correlation (or any correlation of sample performance from one time period to another) tells you the exact spread of talent in the population, not whether there is any “talent” per se, related to that performance. 

In other words, BABIP may very much be a “skill” and the difference between a major leaguer and myself, a little leaguer, or even a minor leaguer may be great, but if the correlation is zero or near zero, then ALL major leaguers have around the same “talent” with regard to BABIP.  This is the same as “no talent” mathematically speaking, but of course it does not technically mean that preventing hits on BIP is not a skill.

The other caveat is that because of the limitation of the size of the data, we are never sure exactly what the true “r” is.  Whatever we come up with, even if we come up with exactly zero, the best we can say is that “We are x percent certain that the spread of “BABIP talent” is from y to z (with a mid-point of zero, although I don’t think it is technically a mid-point with correlations),” or that, “Our best guess is that there is no spread of talent within the population of major league pitchers, with an uncertainty of x,” or, “Our best guess is that there is no spread of talent, but there is an x percent chance that there is some talent and the spread of that talent in the population is as high as y,” etc.

It really is as simple as that when you run correlations.  You get an “r,” the “r” tells you whether and how much spread of talent there is in the population of players you are sampling from, and based on the sample size of the data, you get an uncertainty in that “r” and thus an uncertainty in your estimate of the spread of talent.

There are better (more rigorous, that will give you an answer with a lower uncertainty for the same size data sample) techniques for estimating this spread of talent, but given a large enough data sample, the regression and corresponding “r” value is fine, especially if you are not overly concerned with the magnitude of the true value in the population you are trying to estimate from your sample (in this case, the true spread of BABIP skill).
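One common way to express the uncertainty in a sample “r” is an approximate confidence interval from the Fisher z-transformation. The sketch below applies it to the .31 figure discussed in this thread; the sample size of 100 pitcher pairs is purely hypothetical, since no sample size is given here, and the method assumes roughly bivariate-normal data.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation from n pairs."""
    z = math.atanh(r)                 # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

HYPOTHETICAL_N = 100  # assumed number of pitcher pairs, not from the thread

low, high = fisher_ci(0.31, HYPOTHETICAL_N)
zero_low, zero_high = fisher_ci(0.0, HYPOTHETICAL_N)
print(f"r = .31: roughly {low:.2f} to {high:.2f}")
print(f"r = 0:   roughly {zero_low:.2f} to {zero_high:.2f}")
```

With a sample of that size, an observed r of zero would still be compatible with a modest true correlation, which is the caveat raised above.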

   91. pkb33 Posted: February 19, 2006 at 04:29 AM (#1867677)

In other words, BABIP may very much be a “skill” and the difference between a major leaguer and myself, a little leaguer, or even a minor leaguer may be great, but if the correlation is zero or near zero, then ALL major leaguers have around the same “talent” with regard to BABIP.

If, say, it were the case that defense had a much larger impact on BABIP than pitching, but there was also a large variation in BABIP skill amongst pitchers, then we’d still get very low BABIP y-t-y correlation, wouldn’t we? 

I guess I’m not convinced that we aren’t trying to figure out slugging percentage y-t-y by looking at the correlation of triples hit, in other words, if that makes sense.

   92. mgl Posted: February 19, 2006 at 05:03 AM (#1867726)

If, say, it were the case that defense had a much larger impact on BABIP than pitching, but there was also a large variation in BABIP skill amongst pitchers, then we’d still get very low BABIP y-t-y correlation, wouldn’t we?

I should have said that we have to control for defense. 

Actually to the contrary, if defense played a much larger role than pitcher skill, but there were still pitcher skill, then we would get a very large y-t-y “r”.  Teams with good defense in one year would tend to be good the next.  If a pitcher were randomly assigned a defense each year and the spread of defensive talent were enormous compared to the spread of pitcher talent, then yes, the y-t-y “r” would be close to zero.

You can somewhat control for defense by just using batted ball types and assigning average values to those ball types, but even that assumes that a line drive from pitcher A has the same true value as a line drive from pitcher B, etc., which may or may not be the case, and that is one of the things we are trying to find out by running the regression in the first place.

So yes, it is a little more complicated than I made it out to be because of the darn defense getting in the way.  All the more reason to use some other techniques than a linear regression.
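The two defense scenarios above can be sketched with a small simulation. Everything here is an assumption made for illustration: the spreads (`sd_pitcher`, `sd_defense`, `sd_noise`), the .300 baseline, and the simplification that a "persistent" defense is identical in both years. It is not a model of real BABIP data.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate(persistent_defense, n_pitchers=2000,
             sd_pitcher=0.005, sd_defense=0.015, sd_noise=0.020, seed=1):
    """Year-to-year r of observed BABIP under made-up talent spreads."""
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n_pitchers):
        skill = rng.gauss(0, sd_pitcher)      # pitcher's own BABIP talent
        d1 = rng.gauss(0, sd_defense)         # defense behind him in year 1
        # Same defense next year, or a freshly drawn (random) one.
        d2 = d1 if persistent_defense else rng.gauss(0, sd_defense)
        year1.append(0.300 + skill + d1 + rng.gauss(0, sd_noise))
        year2.append(0.300 + skill + d2 + rng.gauss(0, sd_noise))
    return pearson(year1, year2)

print("defense carries over:    r =", round(simulate(True), 3))
print("defense randomly drawn:  r =", round(simulate(False), 3))
```

Under these assumed spreads the expected correlation is the share of variance that persists from year to year: about 0.38 when the defense carries over (pitcher plus defense variance over total variance) and about 0.04 when the defense is redrawn (pitcher variance alone over total). The observed y-t-y “r” therefore cannot separate pitcher skill from defense without some control for the defense.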

   93. pkb33 Posted: February 19, 2006 at 05:25 AM (#1867741)

Actually to the contrary, if defense played a much larger role than pitcher skill, but there were still pitcher skill, then we would get a very large y-t-y “r”. Teams with good defense in one year would tend to be good the next.

Yeah, probably…but that’s complex too as you have changeover even on the same team, plus aging going on, plus the pitcher’s own variation from their natural skill level.

Anyway, you ended up stating what I was trying to, and more clearly to boot…defense is a complication, we can only estimate its impact, and thus linear regression is limited in what it can do here.

   94. Slivers of Maranville (SdeB) Posted: February 19, 2006 at 05:38 AM (#1867750)

If, say, it were the case that defense had a much larger impact on BABIP than pitching, but there was also a large variation in BABIP skill amongst pitchers, then we’d still get very low BABIP y-t-y correlation, wouldn’t we?

Yes, but then the question would arise: if defense is so important, would you bother paying extra for a pitcher with relatively high BABIP skill rather than one with low skill, when the difference between them is not going to affect the season results 70 or 80 or 90% of the time?

   95. Walt Davis Posted: February 19, 2006 at 06:11 AM (#1867766)

Yeah, that’s a fair point. Still: It seems to me the question has yet to be adequately answered. My concerns started when I saw two well-respected sabermetricians come up with results contrary to the assumptions that underpin DIPS theory

The BABIP finding was what caused all the controversy (and still does for some reason) but the “assumption” of no BABIP “talent” was not a necessary part of DIPS (though I’m not sure Voros recognized this either).  For DIPS to be “exact” what was necessary was for differences in BABIP to have no effect on “defense-independent” ERA after controlling for K, BB, and HR rates (and, by assumption, defense).  For DIPS to be “good enough” what was necessary was for differences in BABIP to have negligible effect after controlling for K, BB, and HR rates.  To my knowledge, no one looked at this until JC Bradbury ... and if memory serves he found that BABIP did not impact after controlling for those other factors.  (Mike Emeigh’s regular point that it’s not clear what population we’re sampling from and making inferences to is certainly valid).

And no offense to Tippett and Tango, but Woolner showed long-term differences (5-year data I think) in BABIP within I think a couple months of DIPS’ publication.  Voros had already released DIPS 2.0, which addresses some of these issues, well before Tippett’s study (Tippett didn’t do a very wide search for later writings, he assumed it would be linked from Voros’ personal website ... a reasonable but incorrect assumption ... I know this from personal correspondence with Tippett).  I’m pretty sure all this work predated Tango’s small study as well.  (That’s not meant as a criticism of the previous poster ... none of us keep up on all the research, there’s no reason Tippett and Tango shouldn’t have been the first he saw).

At this point (for some time now really), I simply don’t see any value in looking at BABIP in isolation.  Even findings that there are differences in BABIP don’t invalidate DIPS because it’s not an appropriate test of the model underlying DIPS.

David’s work does seem a large step forward.  GB, FB, (LD data) and converting things into singles, doubles, etc. and calculating base runs seems like it should soak up most of whatever variation there might be due to “BABIP skill.”

   96. mgl Posted: February 19, 2006 at 10:45 AM (#1867963)

Yes, but then the question would arise: if defense is so important, would you bother paying extra for a pitcher with relatively high BABIP skill rather than one with low skill, when the difference between them is not going to affect the season results 70 or 80 or 90% of the time?

That is not the way it works.  Just because something is overwhelmed by noise or by something else that we can or cannot measure does NOT mean that that something has any more or less impact.  If we pay extra for a player whose BABIP skill provides an extra 10 runs per year on average, we are increasing our mean run prevention by those 10 runs regardless of the “noise” or other factors that may overwhelm that 10 run difference.  So it is still worth whatever we are willing to pay for those 10 extra runs.  What may happen in reality is that the variance around our runs allowed might be larger because of the noise, but the mean value will not change.  I suppose you can make an argument that the change in variance around the 10 extra runs might change the value of those 10 extra runs, but personally I am unaware of too many arguments along those lines.  For example, if I told you that a certain thing or player could change our expected team win total from 85 wins to 90 wins (a 5 win gain), but that 2/3 of the time we will win between 83 and 97 games (an SD of 7 wins), would those 5 extra games be worth any more or less than if I told you that we had a 2/3 chance of winning between 86 and 94 games (an SD of 4 wins)?  I suppose that if you are out of playoff contention, you want a large variance and if you are already well in playoff contention, you want a small variance, but that is getting very esoteric, and things are rarely that simple and straightforward…
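A rough sketch of the 85-versus-90-win example, treating the season win total as normally distributed. The normal approximation and the target win totals in the loop are assumptions chosen only for illustration; the means and SDs are the ones in the example above.

```python
import math

def prob_at_least(threshold, mean, sd):
    """P(wins >= threshold) under a normal approximation to the win total."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

# Means and SDs from the example; the target win totals are hypothetical.
for target in (85, 95):
    for sd in (4, 7):
        before = prob_at_least(target, 85, sd)
        after = prob_at_least(target, 90, sd)
        print(f"target={target}  SD={sd}:  "
              f"P(reach) at mean 85 = {before:.3f},  at mean 90 = {after:.3f}")
```

The 5-win shift in the mean is the same whichever SD applies. What changes with the SD is the chance of clearing a particular bar: a team whose expected wins sit well below the target is helped by the wider spread, and a team sitting well above it is helped by the narrower one, which is the "esoteric" variance argument mentioned above.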

   97. Mike Emeigh Posted: February 19, 2006 at 07:24 PM (#1868209)

For DIPS to be “exact” what was necessary was for differences in BABIP to have no effect on “defense-independent” ERA after controlling for K, BB, and HR rates (and, by assumption, defense).

The difficulty is that, when you control for two of these (K rate and HR rate), you are also indirectly controlling for BABIP as well. K rate and BABIP are interrelated, as are HR rate and BABIP (through a third variable, FB%). It’s another, more subtle way to pre-screen the data set in a way that will limit your ability to find the real impact of BABIP differences. It “may” be true that, in controlling for K and HR rate you’ve indirectly captured things that impact BABIP ability so that you don’t have to account for BABIP separately, and DIPS would “work” - indeed, I think this is in fact what is really happening. But it works not because preventing BABIP in a pitcher isn’t important, but because the other factors captured in DIPS also account for things that affect BABIP most.

I should also note that Voros fitted “original” DIPS to data from 1998 and 1999. To some extent, he got lucky that he looked at this during those two seasons, because there was a significant swing in BABIP between low DIPS and high DIPS ERA pitchers in 1998 and 1999, for some reason. Pitchers who project to have high DIPS ERA also typically have higher BABIP, as they have from 2003-2005. But in 1998 and (especially) 1999, the low DIPS ERA pitchers had higher BABIP (by .015 in 1999). It may have been just an accident of timing that DIPS came out looking so good.

—MWE

   98. GuyM Posted: February 20, 2006 at 02:32 PM (#1869176)

I don’t know why this didn’t occur to me earlier, but Pinto’s PMR provides a true DIPS.  PMR creates an estimate of the defense-independent outcome on all of a pitcher’s BIP:  singles, one base ROE, doubles, etc. Pinto, or DSG if he had access to the data, could use BsR to create a truly defense-independent RA estimate.  Pinto has provided the data at his site for a crude version of this, showing actual and projected outs on BIP for individual pitchers.  Using this, you could make a pretty good estimate of how many runs to add/subtract to account for quality of defense behind a pitcher.  But knowing the actual outcomes of the BIP would of course be even more accurate.  (Note: to use Pinto’s numbers, you should first adjust projected outs to equal actual outs for any given season.)
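One way to read the adjustment in the note above is as a simple rescaling so that projected outs sum to actual outs across the season. The sketch below assumes that reading; the pitcher names, out totals, and function names are made up for illustration and are not Pinto's data or method.

```python
def rescale_projected_outs(rows):
    """Scale projected outs so their season total equals actual outs.

    rows: list of (pitcher, actual_outs, projected_outs) tuples.
    Returns the same tuples with projected_outs rescaled.
    """
    total_actual = sum(actual for _, actual, _ in rows)
    total_projected = sum(projected for _, _, projected in rows)
    factor = total_actual / total_projected
    return [(name, actual, projected * factor) for name, actual, projected in rows]

def outs_above_projection(rows):
    """Actual minus rescaled projected outs on balls in play, per pitcher."""
    return {name: actual - projected
            for name, actual, projected in rescale_projected_outs(rows)}

# Hypothetical season totals for three pitchers.
season = [("Pitcher A", 300, 290.0), ("Pitcher B", 280, 295.0), ("Pitcher C", 310, 300.0)]
for name, diff in outs_above_projection(season).items():
    print(f"{name}: {diff:+.1f} outs relative to projection")
```

After rescaling, the differences sum to zero across the season, so each pitcher's figure is relative to the average defensive support. Turning those extra or missing outs into runs would need a run value per out, which is left out here because it would be a further assumption.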

If we want LIPS (luck-independent pitching statistic), as DSG does, then DSG has indeed given us a good start.  Once he’s produced four years’ worth of DIPS 3.0 (2002-2005), DIPS RA minus actual RA can be analyzed to see if improvements are possible.  For example, using 2002-2004 LD% and/or 2002-2004 HR/FB may (or may not) prove better than regressing those factors 100%.  However, given the discontinuities in the BIS data, it may be a few years before we have definitive answers.
