I solved DIPS. Not joking
Hi all.

So I think I solved DIPS.

What I mean is, there is true variability in BIPA, and it’s caused by P/PA.

What I did in the following plot was take all the pitchers from 2004-2005, group them by P/PA (according to THT’s 2 sig-fig P/PA stats: 4+ P/PA, 3.9 P/PA, 3.8 P/PA, etc.), and plot the average BIPA of each group against its P/PA.

So the X axis is the P/PA of all pitchers in the group, and the Y axis is the average BIPA of all pitchers in the group.

The relationship is linked here. R^2 is over 0.9

http://img412.imageshack.us/img412/183/untitleddh5.th.jpg

I think this is pretty cool.

Let me try and help you out here. A strikeout takes 4.8 pitches; a walk, 5.5. Everything else (mostly BIP) takes about 3.3. If you have a low BABIP, of course you’re going to be throwing fewer pitches, because BIP take fewer pitches to throw. The only reason high K rates don’t lead to higher pitch counts is that Ks are automatic outs and BIP aren’t, and the difference happens to cancel out. But that’s given an average BABIP. Obviously, if you have a low BABIP, it’s going to take you fewer pitches to get through a game, and vice-versa. So sorry, you haven’t solved anything.

What he is saying (I think) is that as pitches per PA change, the BABIP changes. I tried it for the 1998 AL, using 4.8 pitches per K, 5.5 per BB, and 3.3 for every other PA. I got an R value of .168, which is in the neighborhood of the other DIPS studies I’ve seen. Maybe I did something wrong. Could you post your data?
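For reference, the pitch-cost arithmetic in the two posts above can be written out as a short sketch. The per-event costs (4.8, 5.5, 3.3) are the ones quoted in the thread; the function name and the sample K and BB rates are made up for illustration.

```python
# Pitch costs per event, as quoted in the thread.
K_PITCHES = 4.8      # pitches per strikeout
BB_PITCHES = 5.5     # pitches per walk
OTHER_PITCHES = 3.3  # pitches per everything else (mostly BIP)


def estimated_p_per_pa(k_rate, bb_rate):
    """Estimate pitches per PA from strikeout and walk rates (per PA)."""
    other_rate = 1.0 - k_rate - bb_rate
    return (K_PITCHES * k_rate
            + BB_PITCHES * bb_rate
            + OTHER_PITCHES * other_rate)


# Hypothetical pitcher: 17% strikeouts, 9% walks.
print(round(estimated_p_per_pa(0.17, 0.09), 3))
```

Note that BABIP does not appear anywhere in this estimate; only the K and BB rates do.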

No, this is wrong.

Why does the hit rate of BABIP have any effect on the number of BIP I allow? A low BABIP doesn’t mean I’m necessarily allowing more BIP; I might, and I certainly could afford to get away with it, but I don’t have to.

I don’t think it’s obvious at all that a low BABIP will let me throw fewer pitches per batter. It will, however, allow me to throw fewer pitches per INNING or per GAME, which I’m not measuring here.

The key is not to look for a 1:1 scatter plot. E.g., if you plot each individual pitcher’s P/PA against his individual BABIP, you see noise.

But if you make the bucket large enough for BABIP and average it out, say, 10 pitchers per bin, then the noise dissipates and the signal pops out. The way I did it was by grouping pitchers by tenths of a P/PA, since that’s natural to the THT stats because they only have 2 sig figs. Each bin is 3.95+, 3.85-3.95, 3.75-3.85, etc., and there are 7+ pitchers in each bin, with more pitchers in the bins near the peak of the distribution.
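The binning described above might look roughly like this. It is only a sketch: the function name, the input layout (a list of P/PA and BABIP pairs), and the sample numbers are assumptions, and the low end of the bins is left uncapped because the post only says "etc." there.

```python
import math
from collections import defaultdict

TOP_BIN_TENTHS = 40  # everything at 3.95+ P/PA is lumped into one "4+" bin


def bin_by_tenths(pitchers):
    """pitchers: iterable of (p_per_pa, babip) pairs.

    Returns {bin in tenths of a P/PA: (pitcher count, average BABIP)}.
    """
    groups = defaultdict(list)
    for p_per_pa, babip in pitchers:
        # Round to the nearest tenth, so 3.85-3.95 lands in bin 39.
        tenths = math.floor(p_per_pa * 10 + 0.5)
        groups[min(tenths, TOP_BIN_TENTHS)].append(babip)
    return {t: (len(v), sum(v) / len(v)) for t, v in sorted(groups.items())}


# Made-up sample, not real pitcher data.
sample = [(3.72, 0.300), (3.68, 0.310), (3.81, 0.290),
          (4.12, 0.270), (3.97, 0.280)]
for tenths, (count, avg_babip) in bin_by_tenths(sample).items():
    print(f"{tenths / 10:.1f} P/PA: n={count}, avg BABIP={avg_babip:.3f}")
```

Keeping the pitcher count next to each average makes it easy to see which bins rest on only a handful of pitchers.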

1. BIP take fewer pitches to throw than K and BB. For this reason a pitcher having few BIP should have a lower P/PA than a pitcher with many BIP. But having a low BABIP is not the same as having few BIP, nor is a high BABIP the same as having many BIP. (K-rate does move things in that direction, but it’s a nudge, not a shove.) It does not necessarily follow from what you’re saying that low BABIP should produce lower P/PA (or vice-versa).

2. He’s talking about pitches per plate appearance, not per game.

3. I think what dzop is saying is that low BABIP correlates with high P/PA, not low P/PA. That would mean his findings run counter to your explanation. (You say low BABIP should mean fewer pitches; he says low BABIP correlates with high P/PA.)

Having said all that…

4. He still hasn’t solved anything, unless (a) P/PA is predictive - IOW, this year’s P/PA for Pitcher X is predictive of next year’s P/PA for Pitcher X - and (b) the correlation between BABIP and P/PA holds up year to year. If both are true, then the unexplained, seemingly random variance in BABIP from year to year can be predicted. But given that year-to-year BABIP is not well-correlated, I can’t imagine that you can have both (a) and (b). If you have (a), that means year-to-year P/PA is correlated… but y-t-y BABIP isn’t, so the correlation between the two should be unstable. If you have (b), all you’ve done is to find that P/PA is a good proxy for BABIP… but if so, then (a) can’t be true.

If you can’t predict or project BABIP with it, P/PA is useless in this context other than to save a lot of spreadsheet work to get BABIP. That’s still quite useful, mind you. But it doesn’t “solve DIPS”.
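Checking conditions (a) and (b) comes down to pairing each pitcher's season with his next one and correlating the pairs. A minimal sketch, assuming the data is keyed by a (pitcher id, year) pair; the layout, the function names, and the sample values are all made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson linear correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def year_to_year_pairs(seasons):
    """seasons: {(pitcher_id, year): stat value}.

    Returns (this year, next year) value pairs for every pitcher
    who has both seasons in the data.
    """
    return [(value, seasons[(pid, year + 1)])
            for (pid, year), value in seasons.items()
            if (pid, year + 1) in seasons]


def year_to_year_r(seasons):
    """Correlation between a stat in year N and the same stat in year N+1."""
    pairs = year_to_year_pairs(seasons)
    return pearson_r([a for a, _ in pairs], [b for _, b in pairs])


# Made-up P/PA values, not real pitchers.
p_per_pa = {
    ("a", 2004): 3.70, ("a", 2005): 3.82,
    ("b", 2004): 3.91, ("b", 2005): 3.88,
    ("c", 2004): 3.55, ("c", 2005): 3.63,
    ("d", 2005): 3.60,  # no 2004 season, so no pair
}
print(year_to_year_r(p_per_pa))
```

Running the same function on P/PA and on BABIP tests (a); running the P/PA vs. BABIP correlation separately within each year tests (b).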

My very limited research has found that BABIP is a function of count, and the direction is as you expect (3-0 counts being the outlier). That conforms somewhat with what’s discussed here, since the only way to get more than 4 pitches in a PA is to get two strikes.

If anyone has pitches data (not estimated, but real) from pre-2004, it would be greatly appreciated. I want to test year-to-year correlation, but right now I only have 2 years to work with, 3 once 2006 gets added to lahman.

You can email it to me…I think any such file would be too large to attach here.

It would be much appreciated and, if proprietary, properly cited.

Retrosheet has pitch-by-pitch data back to 2000.

You really didn’t solve anything. In the ungrouped data there is a general correlation linking higher P/PA to lower BABIP. r, the linear correlation coefficient, is pretty much calculated from the distance between the data points and a line of best fit. By averaging the data points in each group you made the correlation higher when you probably shouldn’t have. Take P/PA and four other stats from the 2004-06 data. The r correlation with BABIP for each:

.129 - GB%
-.219 - P/PA
-.263 - K/G
-.284 - IF/F
.331 - LD%

All four of the others have a stronger linear correlation than P/PA, except groundball rate. There is less variability in them. Now I am going to divide the group of 253 pitcher seasons into 7 subgroups for each stat like you did, then find r^2 the same way you did:

.3293 - GB%
.6912 - P/PA
.7096 - K/G
.8903 - IF/F
.8947 - LD%

As you can see, they all got a LOT higher (I didn’t square the r’s before). However, they stayed in the exact same order as in the ungrouped data. I must have used slightly different groups for P/PA than you did.
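The inflation from grouping is easy to reproduce on synthetic data. The sketch below is not the 2004-06 data: the P/PA spread, the slope, and the noise level are made-up numbers chosen only to give a weak ungrouped correlation, and splitting the sorted data into 7 equal-size groups is an assumption about how the subgroups were formed.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson linear correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    return pearson_r(xs, ys) ** 2


def grouped_means(xs, ys, k):
    """Sort by x, split into k nearly equal groups, return each group's mean x and mean y."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    mean_xs, mean_ys = [], []
    for i in range(k):
        chunk = pairs[i * n // k:(i + 1) * n // k]
        mean_xs.append(sum(x for x, _ in chunk) / len(chunk))
        mean_ys.append(sum(y for _, y in chunk) / len(chunk))
    return mean_xs, mean_ys


if __name__ == "__main__":
    random.seed(0)
    # Synthetic "pitcher seasons": weak negative link between P/PA and BABIP.
    p_per_pa = [random.gauss(3.75, 0.15) for _ in range(253)]
    babip = [0.295 - 0.03 * (x - 3.75) + random.gauss(0, 0.02) for x in p_per_pa]

    print("ungrouped r^2:", r_squared(p_per_pa, babip))
    gx, gy = grouped_means(p_per_pa, babip, 7)
    print("grouped r^2:  ", r_squared(gx, gy))
```

Averaging within each group cancels most of the pitcher-to-pitcher noise, so the group means sit much closer to a line than the individual seasons do, even though the underlying relationship is no stronger.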

