DIPS Version 2.0
You’ve seen it mentioned on ESPN.com and BaseballProspectus.com. You’ve read about it in Bill James’ new book. Now, read all about the new version right here on BaseballPrimer.com.
It's finally here. After over two years of debate, argument, re-debate and re-argument,
I've actually gotten to the next step in the process of developing pitching
statistics that are not dependent on the quality of defense behind the pitcher.
I'm sure a good number of you have followed the bouncing ball on this so far. But
for those Johnny-come-latelys, here's the state of things (in my opinion, at
least) as they stand right now:
- The amount that MLB pitchers differ in allowing hits on balls in the field
of play is much less than had previously been assumed. Good pitchers are good
pitchers because of their ability to prevent walks and homers and get
strikeouts, in some combination of those three.
- The differences that do exist between pitchers in this regard are small
enough that if you completely ignore them, you still get a very good picture
of the pitcher's overall ability to prevent runs and contribute to winning.
- That said, the small differences do appear to be statistically significant,
if generally not very relevant.
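Everything below hinges on "hits per balls in play." Here is a minimal sketch of one common approximation. It assumes home runs are left out of both the hits and the balls in play, and that walks, hit batsmen and strikeouts are removed from batters faced. The function name and the exact set of exclusions are assumptions for illustration, not the official DIPS formula.

```python
def babip_allowed(hits, home_runs, batters_faced, walks, hit_batsmen, strikeouts):
    """Hits allowed per ball in play, with home runs left out of both terms.

    Balls in play are approximated as batters faced minus the outcomes the
    fielders never get a chance at: walks, hit batsmen, strikeouts and home
    runs. Sacrifices and other edge cases are ignored in this sketch.
    """
    balls_in_play = batters_faced - walks - hit_batsmen - strikeouts - home_runs
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical season line: 200 H, 20 HR, 900 BF, 60 BB, 5 HBP, 150 SO.
rate = babip_allowed(200, 20, 900, 60, 5, 150)
print(f"{rate:.3f}")  # 180 / 665, prints 0.271
```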
These points, how this all came about, and the ramifications have been discussed
before. While the first step of DIPS, as it was used in 1999 and 2000, dealt with
points 1 and 2, it did not deal with point 3 just yet. Now what I will try to
do is outline a necessary first step toward getting at the very small differences
between pitchers in this regard, while still keeping the stats independent
of the defenses behind them.
One of the biggest problems I've had with Defense Independent Pitching
Statistics is a misunderstanding about what it is I am trying to do. Many people
argue that since I agree there are small differences between pitchers in this
regard, I should still count the stat in the numbers, just to a lesser extent.
You run into problems there, however, in that you can no longer safely assume
that the quality of the pitcher's defense didn't help or hurt him. Quite the
contrary: that possibility is as likely as any other.
Let's say you have a pitcher who allowed a .275 rate of hits per balls in play while
his team in total posted a rate of .285, which would indicate a well above average
defense. The argument, as it has been made, is that I should keep the difference
between the pitcher and the team's defense and simply adjust for the defense.
The problem is that this is an inconsistent application. By using that .285 as
the benchmark for the team's defense, you are assuming the pitchers on the team
are not any better at preventing hits per balls in play than anyone else. But by
then using this adjustment to show one pitcher being better in this regard, you've
just contradicted yourself. In other words, the adjustment is only valid if we can
assume either that there's no difference between pitchers in this stat or that
the differences always even out on any particular pitching staff. But the latter
argument is obviously shaky, and if the former is true, there's no need for the
adjustment anyway. It becomes an entangled "chicken and the egg" scenario, at
the end of which we no longer have a pitching stat that is independent of defense.
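A small sketch with a made-up three-man staff shows why the team benchmark quietly assumes that pitching differences even out on every staff. The names and numbers are hypothetical. The team rate is just the ball-in-play-weighted average of the staff's own rates. As a result, the pitcher-minus-team differences always cancel out across the staff, whatever the pitchers' true abilities are.

```python
def team_rate(staff):
    """Staff-wide hits per ball in play: total hits over total balls in play."""
    return sum(p["hits"] for p in staff) / sum(p["bip"] for p in staff)


def pitcher_minus_team(staff):
    """Each pitcher's own rate minus the staff-wide rate (the proposed adjustment)."""
    team = team_rate(staff)
    return {p["name"]: p["hits"] / p["bip"] - team for p in staff}


# Hypothetical three-man staff; names and numbers are made up.
staff = [
    {"name": "A", "hits": 165, "bip": 600},  # .275
    {"name": "B", "hits": 145, "bip": 500},  # .290
    {"name": "C", "hits": 120, "bip": 400},  # .300
]

diffs = pitcher_minus_team(staff)
# Weight each difference by balls in play: the total is zero (up to
# floating-point error) for any staff, because the team rate is built
# from these same pitchers.
weighted_total = sum(p["bip"] * diffs[p["name"]] for p in staff)
print(f"team rate {team_rate(staff):.3f}, weighted total {weighted_total:.1e}")
```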
What's needed, as I have always argued, is to get at these differences in a
different way: one that retains the defense independence of assuming no difference
between pitchers on hits per balls in play, but that can be fine-tuned to pick up
small differences here and there, while still avoiding the stats directly related
to team defense. I'm hoping that what I'll detail here is an improvement in this direction.
Here's a point-by-point breakdown:
IN THE NEW SYSTEM TO FIND DIFFERENCES BETWEEN HITS ON BALLS IN PLAY
Knuckleballers: Well, after two years of wondering on this score, I've
run through the numbers, and the biggest slam-dunk, "no doubt about it" difference
to be found is here. Conclusively, knuckleballers have an advantage with regard
to hits per balls in play. Some of the very lowest rates of the last 50 years
have been posted by these guys. Charlie Hough, Tim Wakefield, Phil Niekro, Hoyt Wilhelm:
they're all a decent bit better than their teammates in this regard. Using Craig
Wright's list of knuckleball pitchers from The Diamond Appraised, and adding
Wakefield, Dennis Springer and Steve Sparks, I found a definitive difference
between knuckleballers and normal pitchers. The knuckleballers tend to have
an advantage of anywhere from .008 to .012, depending on how I choose to slice
the pie. Ten points (.010) is a nice round number and should work well enough.
Lefty/Righty: A small difference appears here, and I'm not really sure
where it comes from, but it appears to be statistically significant. It's only
around a .002 difference, and since I'm not distributing prescription drugs to
orphans here, we can put it in and come up with the "whys" and "wherefores"
along the way. It might be a ground-ball/fly-ball issue; maybe lefties tend
to be ground-ballers more often than righties. But at this point that's really
only a guess.
Strikeouts: Here's where the murkiness starts. Looking at the numbers
over and over and over again, it becomes clear that a pitcher's strikeout rate
in a single season is a slightly better predictor of his hits per balls in play
the following year than his own hits per balls in play that season. This is there
and it's real. Why? I can only come up with two explanations: the obvious and the
hard to show. The obvious explanation is that the more a hitter swings and misses
against a pitcher, the more he also makes poor contact and therefore doesn't hit
the ball hard. Most would favor this, though if this is the case, there are some
questions as to why the differences between pitchers aren't greater.
There's another possibility: the more batters you strike out, the more often
hitters have to hit with two strikes. Since getting hits on balls in play is a
noticeable skill for hitters, if a particular situation causes a hitter to change
his approach, it could effectively make him a worse hitter in this regard. And if
a particular pitcher creates more of these situations than others, it would
effectively reduce the quality of the hitters he faces in this regard. This
dovetails a bit better with what we already know, but it also seems like a little
bit of a reach.
You can draw your own ideas and conclusions here.
Home Runs: I wasn't going to include this until the very end. I asked
myself if the numbers said more with it than without it, and I decided on "with."
While the relationship is shaky, it appears that the more Home Runs a pitcher gives
up, the fewer hits per balls in play he gives up.
The problem here is that the "why" of this is a minefield. There's an obvious
problem: if there were pitchers who gave up lots of homers AND lots of hits per
balls in play, that might greatly reduce their chances of remaining in the league
and therefore in the study. So this is a real problem.
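A toy simulation illustrates the survivorship worry. Every number in it is an assumption, including the cutoff rule for who stays in the league. Home-run and hits-on-balls-in-play tendencies are drawn independently, so there is no true relationship between them. Then the pitchers who are bad at both are removed. Among the survivors, a negative correlation appears even though none exists in the full population.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, cutoff=1.0, seed=1):
    rng = random.Random(seed)
    # Standardized tendencies drawn independently: by construction, there is
    # no real link between home runs allowed and hits on balls in play.
    hr = [rng.gauss(0, 1) for _ in range(n)]
    babip = [rng.gauss(0, 1) for _ in range(n)]
    everyone = pearson(hr, babip)
    # Hypothetical survival rule: pitchers who are bad at both get released.
    kept = [(h, b) for h, b in zip(hr, babip) if h + b < cutoff]
    survivors = pearson([h for h, _ in kept], [b for _, b in kept])
    return everyone, survivors


everyone, survivors = simulate()
print(f"all pitchers: {everyone:+.3f}   survivors only: {survivors:+.3f}")
```

With these settings, the full population comes out near zero. The survivors' correlation is clearly negative, somewhere around -0.3, which is the kind of artifact that checks such as using only established pitchers try to rule out.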
So I ran through every way I knew to minimize this problem (using only
established pitchers, running the numbers over various time periods, etc.), and
at the end of it all, the effect was still there. Balancing these problems is the
fact that Home Runs work as a decent proxy for ground-ball/fly-ball tendencies,
especially when we don't have that data available. They also lack some of the
problems inherent in those numbers (which I'll get to below).
This one does not affect the numbers much at all, but I felt it was worth putting
in. I will provide the details of the system on my web-site; this is the only one
of the above that I think you could leave out, if you so chose, without badly
affecting the other numbers.
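This piece doesn't spell out how the four factors combine, since the details are on the author's web-site. What follows is only a guess at the general shape: a league-average baseline nudged by additive adjustments. The .010 knuckleball edge and the .002 lefty gap come from the text, with the lefty sign inferred from the Jimmy Anderson example later on. The strikeout and home run slopes, the league strikeout and homer rates, and the pitcher are hypothetical placeholders.

```python
from dataclasses import dataclass

# From the text: knuckleballers run about .010 better.
KNUCKLEBALL_ADJ = -0.010
# From the text: a gap of about .002; the sign (lefties worse) is inferred.
LEFTY_ADJ = 0.002
# HYPOTHETICAL slopes, chosen only so the effects are small and in the
# direction described: more strikeouts or more homers, fewer hits in play.
K_SLOPE = -0.05
HR_SLOPE = -0.10


@dataclass
class Pitcher:
    name: str
    knuckleballer: bool
    left_handed: bool
    k_rate: float   # strikeouts per batter faced
    hr_rate: float  # home runs per batter faced


def estimated_babip(p, league_babip, league_k_rate, league_hr_rate):
    """League baseline plus additive nudges for each factor kept in the system."""
    est = league_babip
    if p.knuckleballer:
        est += KNUCKLEBALL_ADJ
    if p.left_handed:
        est += LEFTY_ADJ
    est += K_SLOPE * (p.k_rate - league_k_rate)
    est += HR_SLOPE * (p.hr_rate - league_hr_rate)
    return est


# Hypothetical pitcher and league rates (only the .292 appears in the text).
knuckler = Pitcher("Hypothetical Knuckler", True, False, k_rate=0.14, hr_rate=0.030)
est = estimated_babip(knuckler, league_babip=0.292,
                      league_k_rate=0.165, league_hr_rate=0.028)
print(f"{est:.3f}")  # prints 0.283
```

Keeping each factor as a separate, small additive term makes it easy to drop one, such as the Home Runs term the author says could be left out, without touching the others.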
NOT IN THE SYSTEM TO FIND DIFFERENCES BETWEEN HITS ON BALLS IN PLAY
Walks: If there's any relationship here, it's that the more walks a
pitcher allows, the fewer hits he allows. But it suffers from the same problem
Home Runs suffer from above, and in this case I think the problem dwarfs any
effect that may or may not be here. If there are additional refinements in the
future, this may go in. For now, I think it's best if we leave it out.
Height: Here's something to ponder. Why exactly do I get a statistically
significant correlation between a pitcher's height and his hits per balls
in play? After racking my brain for days, I decided the best possible explanation
is that shorter pitchers tend to be much better fielders than taller
pitchers (say, Greg Maddux versus Randy Johnson). So maybe the shorter guys stop
more line drives up the middle, field bunts better and cover first better,
thereby reducing the hits off them a bit.
But realistically, we should measure this some other way than by how tall the
guy is, and technically it's not pitching anyway but rather defense. I can't
justify including this one, as it's weird and probably prone to a pretty decent
error rate. I still think it's interesting, though.
Ground-Ball/Fly-Ball Ratios: Well, this one ain't going away anytime
soon, so whatever I decide, it is by no means final. It has been proposed
for some time that fly-ball pitchers tend to have an advantage here over
ground-ball pitchers. In the end, I'm pretty sure this is right, but the problems
currently here are tough to overcome.
I'm not convinced that these numbers are defense independent. There are really
three types of batted balls as they are counted: ground-balls, fly-balls and
line-drives. It isn't hard to imagine a batted ball that is difficult to classify
as either a line-drive or a fly-ball. Since humans are making subjective judgments
here, there is some question as to whether the eventual determination of fly-ball
or line-drive might be largely influenced by whether the ball falls in for a hit.
Now granted, this would be a small effect, but we're talking about trying to find
small effects to begin with. The problem with skinning this cat is that even small
biases need to be taken into account. So it might be that hits per balls in play
are driving the ground-ball/fly-ball ratio, rather than the other way around.
Nothing is uglier to deal with than reverse causation.
Also, the differences here are so small that we need lots and lots and lots of
data to be sure of significant tendencies. Because ground-ball and fly-ball data
are a relatively new phenomenon, the amount of data we have to examine is greatly
reduced. So without extended data, this sort of analysis complicates the whole
shooting match. If we did use the ground-ball and fly-ball data, would it then
remove the effects of Home Runs and left-handed pitchers? And even though we can
get the data now, it isn't exactly published in every sports page or on every
web-site with pitching statistics. Using ground-ball and fly-ball data would
make it difficult for those with a desire to run the numbers to be able to do so.
For now, I'll keep working on how to make this work (if indeed, in the end, it
really does), and continue to use Home Runs allowed as its proxy. I'll probably
make a few phone calls to STATS about the nature of the numbers and see if I
can count all of the balls in play not currently in the fly-ball or ground-ball
category as fly-balls. This would once again make them defense independent, but
in my opinion would probably lessen the relationship some as well.
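The reclassification floated above, calling every ball in play that isn't a ground ball a fly ball, fits in a few lines. The function name and the counts are hypothetical. The sketch shows that once line drives and fly balls are lumped together, the scorer's call on which is which can no longer move the ratio, even if that call is colored by whether the ball fell in.

```python
def gb_fb_ratio(ground_balls, fly_balls, line_drives, fold_line_drives=True):
    """Ground-ball/fly-ball ratio for a pitcher's balls in play.

    With fold_line_drives=True, every ball in play that isn't a ground ball
    counts as a fly ball, so the scorer's fly-ball/line-drive call drops out.
    """
    in_the_air = fly_balls + line_drives if fold_line_drives else fly_balls
    return ground_balls / in_the_air


# The same 550 hypothetical balls in play, scored two ways: scorer B calls 20
# of the ambiguous balls line drives that scorer A called fly balls.
scorer_a = (300, 180, 70)
scorer_b = (300, 160, 90)

print(gb_fb_ratio(*scorer_a), gb_fb_ratio(*scorer_b))  # both 1.2
print(gb_fb_ratio(*scorer_a, fold_line_drives=False),
      gb_fb_ratio(*scorer_b, fold_line_drives=False))  # these differ
```

The cost, as noted above, is that genuine line drives get diluted into the fly-ball count, which would likely weaken whatever relationship the ratio has with hits per balls in play.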
When I (or someone else, for that matter) can work this out, we'll have DIPS
Before anybody gets carried away here, I should note that of the 100 pitchers
with the most batters faced, 93 rated within .006 of the league average hits
per balls in play rate using the above adjustments. As you would guess from my
comments, the two lowest estimated rates go to Steve Sparks and Tim Wakefield,
both knuckleball practitioners (.279 and .284 respectively, with a league average
of .292). The highest estimate went to Jimmy Anderson (.300) of the Pirates, a
lefty who didn't strike out much of anybody.
I'm sure that in many instances, the estimates are actually going to be further
from whatever the pitcher's actual ability may be than simply assigning him
a league average rate would be. But I think on the whole this does get us a bit
closer to the true rates. The system was developed independently of this year's
results, so this year is a decent test of whether it helps any. The correlation
between a pitcher's hits per balls in play rate (using pitchers with 100 or more
innings pitched) and that of his team was .415, which is essentially the way the
old DIPS worked. Using these new adjustments, and estimating the differences as
being from his teammates rather than from some "league average," the correlation
is .428: a very modest improvement, but an improvement nonetheless.
That doesn't address the issue above regarding the teammates' rate containing
some level of pitching. So, by then estimating how much pitching affects this
stat for each team under this system, and adjusting each team's rate accordingly,
the correlation goes up a little more, to .433. All of this said, the difference
between doing it this way and simply using the pitcher's team rate is fairly
minimal, but then again the differences between pitchers in this regard are
probably minimal anyway, so every little bit helps.
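The test described above boils down to a Pearson correlation between each pitcher's actual rate and a predicted rate. It is computed once for the plain team rate and once for the adjusted estimate, and the two results are compared. Below is a minimal sketch. The teammate-relative formula is one reading of the text rather than the exact method, and the four pitchers' numbers are invented purely to show the mechanics.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def teammate_relative(team_rate, pitcher_est, staff_avg_est):
    """Team rate shifted by how far the system expects this pitcher to be from
    his own staff's average (an assumed reading, not the exact method)."""
    return team_rate + (pitcher_est - staff_avg_est)


# Invented numbers for four pitchers, only to show the mechanics.
actual = [0.281, 0.297, 0.276, 0.302]
team = [0.288, 0.292, 0.284, 0.295]
estimate = [0.286, 0.294, 0.284, 0.295]
staff_avg = [0.291, 0.292, 0.290, 0.292]

adjusted = [teammate_relative(t, e, s) for t, e, s in zip(team, estimate, staff_avg)]
print("team rate only:", round(pearson(actual, team), 3))
print("with adjustments:", round(pearson(actual, adjusted), 3))
```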
The other adjustment here is to use a new method to come up with the earned run
estimates, to avoid problems with possible differences in extra-base-hit ability.
This problem can be sidestepped for now by a simple reduction of the relative run
value of a non-home-run hit and a slight increase in the relative value of a
home run. Multiple regression analyses confirm this.
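In linear-weights terms, that change amounts to moving two coefficients. Below is a minimal sketch. The weights are placeholders, since the actual values aren't given here, and the two pitchers are hypothetical; both allow the same 160 hits. Shaving the non-home-run hit weight and bumping the home run weight widens the run gap between the homer-prone pitcher and the other one.

```python
# Placeholder run values per event; NOT the weights the new DIPS uses.
OLD_WEIGHTS = {"non_hr_hit": 0.50, "home_run": 1.40, "walk": 0.33}
# Same idea after the change: non-HR hits worth a bit less, homers a bit more.
NEW_WEIGHTS = {"non_hr_hit": 0.47, "home_run": 1.45, "walk": 0.33}


def run_value(events, weights):
    """Linear run estimate: each event count times its run weight, summed."""
    return sum(weights[kind] * count for kind, count in events.items())


# Two hypothetical pitchers with the same hits allowed but different homer totals.
low_hr = {"non_hr_hit": 150, "home_run": 10, "walk": 50}
high_hr = {"non_hr_hit": 130, "home_run": 30, "walk": 50}

old_gap = run_value(high_hr, OLD_WEIGHTS) - run_value(low_hr, OLD_WEIGHTS)
new_gap = run_value(high_hr, NEW_WEIGHTS) - run_value(low_hr, NEW_WEIGHTS)
print(f"gap under old weights: {old_gap:.1f}; under new weights: {new_gap:.1f}")  # 18.0 vs 19.6
```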
I'm sure there's a lot to be discussed about this, and, as always, I'm hoping this
is the beginning of a discussion and not the end of it.
Here's the link to this year's DIPS numbers:
Here's the link to the explanation of how to calculate this new version of DIPS:
Posted: January 25, 2002 at 06:00 AM