— Where BTF's Members Investigate the Grand Old Game
Friday, January 25, 2002
DIPS Version 2.0
You’ve seen it mentioned on ESPN.com and BaseballProspectus.com. You’ve read about it in Bill James’ new book. Now, read all about the new version right here on BaseballPrimer.com.
Well, it's finally here. After more than two years of debate, argument, re-debate and re-argument, I've actually gotten to the next step in the process of developing pitching statistics that are not dependent on the quality of defense behind the pitcher.
Now I'm sure a good number of you have followed the bouncing ball on this so far. But for those Johnny-come-latelys, here's the state of things (in my opinion at least) as it stands right now:
The finer points, how this all came about, and the ramifications have been discussed before. The first step of DIPS, as it was used in 1999 and 2000, dealt with points 1 and 2; it did not deal with point 3 just yet. What I will try to do now is outline a necessary first step toward getting at the very small differences between pitchers in this regard, while still keeping the stats independent of the defenses behind them.
One of the biggest problems I've had with Defense Independent Pitching Statistics is a misunderstanding about what it is I am trying to do. Many people argue that since I agree that there are small differences between pitchers in this regard, I should count the stat in the numbers, just to a lesser extent. You run into problems there, however, in that you can no longer safely assume that the quality of the pitcher's defense didn't help or hurt him. Quite the contrary, that possibility is as likely as any other.
For example, say you have a pitcher who allowed a .275 rate of hits per balls in play while his team in total posted a rate of .285, which would indicate a well above average defense. The argument, as it has been made, is that I should keep the difference between the pitcher and the team's defense, and simply adjust for the defense upward some.
The problem is that this is an inconsistent application. By using that .285 as the benchmark for the team's defense, you are assuming the pitchers on the team are not any better at preventing hits per balls in play than anyone else. But by then using this adjustment to show one pitcher being better in this regard, you've just contradicted yourself. In other words, the adjustment is only valid if we can assume either that there's no difference between pitchers in this stat or that any differences always even out on any particular pitching staff. But the latter assumption is obviously shaky, and if the former is true, there's no need for the adjustment anyway. It becomes an entangled "chicken and the egg" scenario, at the end of which we no longer have a pitching stat that is independent of defense.
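As a point of reference for the rates quoted in the example above, here is a minimal sketch of how a hits per balls in play rate can be computed. It assumes a common definition (non-home-run hits divided by batters faced minus strikeouts, walks, hit batsmen and home runs), which may differ in detail from the exact formula behind the published numbers. The function name and the stat line are hypothetical.

```python
def hits_per_bip(hits, home_runs, batters_faced, strikeouts, walks, hit_batsmen):
    """Rate of non-home-run hits per ball in play.

    Assumed definition (not taken from the article):
      balls in play = batters faced - strikeouts - walks - hit batsmen - home runs
    """
    balls_in_play = batters_faced - strikeouts - walks - hit_batsmen - home_runs
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    # Home runs are removed from the hit total because no fielder can touch them.
    return (hits - home_runs) / balls_in_play


# Hypothetical stat line, for illustration only.
rate = hits_per_bip(hits=206, home_runs=25, batters_faced=900,
                    strikeouts=150, walks=60, hit_batsmen=5)
print(f"{rate:.3f}")
```

Running the same function on a team's combined totals gives the team rate that the pitcher's rate is being compared against.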
Better, I have always argued, to try to get at these differences from a different direction: a way that retains the defense independence of assuming no difference between pitchers on hits per balls in play, but can fine-tune it to pick up small differences here and there, while still avoiding the stats directly related to team defense. I'm hoping that what I'll detail here is an improvement in this direction. Here's a point by point breakdown:
IN THE NEW SYSTEM TO FIND DIFFERENCES BETWEEN HITS ON BALLS IN PLAY
Knuckleballers - Well, after two years of wondering on this score, I've run through the numbers, and the biggest slam-dunk, "no doubt about it" difference to be found is here. Conclusively, knuckleballers have an advantage with regard to hits per balls in play. Some of the very lowest rates of the last 50 years have been by these guys. Charlie Hough, Tim Wakefield, Phil Niekro, Hoyt Wilhelm: they're all a decent bit better than their teammates in this regard. Using Craig Wright's list of knuckleball pitchers from The Diamond Appraised, and adding Wakefield, Dennis Springer and Steve Sparks, I found a definitive difference between knuckleballers and normal pitchers. The knuckleballers tend to have an advantage of anywhere from .008 to .012 depending on how I choose to slice the pie. Ten points (.010) is a nice round number and should work well enough.
Lefty/Righty - A small difference appears here, and I'm not really sure where it comes from, but it appears to be statistically significant. It's only around a .002 difference, and since I'm not distributing prescription drugs to orphans here, we can put it in and come up with the "whys" and "wherefores" along the way. It might be a ground-ball/fly-ball issue. Maybe lefties tend to be more likely ground-ballers than righties. But at this point that's really only a guess.
Strikeouts - Here's where the murkiness starts. Looking at the numbers over and over and over again, it becomes clear that a pitcher's strikeout rate during a single season is a bit better predictor of his hits per balls in play the following year than his own hits per balls in play. This is there and it's real. Why? I can only come up with two explanations: the obvious and the hard to show. The obvious explanation is that the more a hitter swings and misses at a pitcher, the more he also makes poor contact and therefore doesn't hit the ball hard. Most would favor this, though if this is the case, there are some questions as to why the differences between pitchers aren't greater.
To propose another possibility, the more hitters you strike out, the more often hitters hit with two strikes. Since hitting for hits on balls in play is a noticeable ability for hitters, if a particular situation causes a hitter to change his approach, it could effectively make him a worse hitter in this regard. And if a particular pitcher creates more of these situations than others, it would effectively reduce the quality of hitters he faces in this regard. This dovetails a bit better with what we already know, but it also seems like a little bit of a reach.
Draw your own ideas and conclusions here.
Home Runs - I wasn't going to include this until the very end. I asked myself if the numbers said more with it than without, and I decided on "with." While it is a shaky relationship, it appears that the more Home Runs a pitcher gives up, the fewer hits per balls in play he gives up.
The problem here is that the "why" of this is a minefield. The obvious issue is one of selection: pitchers who gave up lots of homers AND lots of hits per balls in play would have a greatly reduced chance of remaining in the league, and therefore in the study. So this is a real problem.
So I ran through every way I knew to minimize this problem (using only established pitchers, running the numbers over various time periods, etc.), and at the end of it, the relationship was still there. Balancing these problems is the fact that Home Runs work as a decent proxy for ground-ball/fly-ball tendencies, especially when we don't have this data available. Home Runs also lack some of the problems inherent in those numbers (which I'll get to below).
This stat does not affect the numbers much at all, but I felt it was worth putting in there. I will provide the details on the system on my web-site, and this is the only one of the above I think you could leave out, if you so chose, and not affect the other numbers badly at all.
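To make the bookkeeping concrete, the adjustments in this list could be applied along the following lines. This is only a sketch: the .010 knuckleball and .002 handedness figures come from the text above, but the direction of the handedness shift (lefties slightly higher), the choice to apply each shift straight to the league rate, and every name in the code are assumptions for illustration. The strikeout and home run terms are left as caller-supplied placeholders because no coefficients are given here.

```python
KNUCKLEBALL_SHIFT = -0.010  # from the text: roughly .008 to .012, rounded to ten points
LEFTY_SHIFT = 0.002         # size from the text; the sign is an assumption


def estimated_hits_per_bip(league_rate, knuckleballer=False, left_handed=False,
                           strikeout_shift=0.0, home_run_shift=0.0):
    """Start from the league rate and apply small per-pitcher shifts.

    strikeout_shift and home_run_shift are hypothetical placeholders; their
    sizes are not given in the article, so they default to no adjustment.
    """
    estimate = league_rate
    if knuckleballer:
        estimate += KNUCKLEBALL_SHIFT
    if left_handed:
        estimate += LEFTY_SHIFT
    return estimate + strikeout_shift + home_run_shift


# Illustration with a hypothetical league rate of .290.
print(round(estimated_hits_per_bip(0.290, knuckleballer=True), 3))
print(round(estimated_hits_per_bip(0.290, left_handed=True), 3))
```

The point of the structure is that nothing in the estimate touches the pitcher's own hits allowed or his team's fielding, so the result stays independent of defense.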
NOT IN THE SYSTEM TO FIND DIFFERENCES BETWEEN HITS ON BALLS IN PLAY
Walks - If there's any relationship here, it's that the more walks a pitcher allows, the fewer hits per balls in play. It suffers from the same problem that Home Runs suffer from above, but in this case I think the problem dwarfs any effect that may or may not be here. Maybe if there are some additional refinements in the future, this may go in. For now, I think it's best if we leave it out.
Height - Here's something to ponder: why exactly do I get a statistically significant correlation between the pitcher's height and his hits per balls in play? After racking my brain for days, I decided the best possible explanation is that the shorter pitchers tend to be much better fielders than the taller pitchers (say Greg Maddux versus Randy Johnson as an example). So maybe the shorter guys stop more line drives up the middle, field bunts better and cover first better, thereby reducing the hits off them a bit.
Of course, realistically, we should measure this in some way other than how tall the guy is, and technically it's not pitching anyway but rather defense. I can't justify including this one as it's weird and probably prone to a pretty decent error rate. I still think it's interesting though.
Ground-Ball/Fly-Ball Ratios - Well, this one ain't going away anytime soon, so whatever I decide, it certainly is by no means final. It has been proposed for some time that fly ball pitchers tend to have an advantage here over ground ball pitchers. In the end, I'm pretty sure this is right, but the problems currently here are tough to overcome.
For starters, I'm not convinced that these numbers are defense independent. There are really three types of batted balls as they are counted: ground-balls, fly-balls and line-drives. It isn't hard to imagine a batted ball where it is difficult to determine whether it was a line-drive or a fly-ball. Since humans are making subjective judgments here, there is some question as to whether the eventual determination of fly-ball or line-drive might be largely influenced by whether the ball falls in for a hit. Now granted, this would be a small effect, but we're talking about trying to find small effects to begin with. The problem with skinning this cat is that even small biases need to be taken into account. So the problem might be that an increase in hits per balls in play causes a decrease in ground-ball/fly-ball ratio, rather than the other way around. Nothing can be more ugly to deal with than reverse causation.
Also, because the differences here are so small, we need lots and lots and lots of data to be able to be sure of significant tendencies. Because ground-ball and fly-ball data are a relatively new phenomenon, the amount of data we have to examine is greatly reduced. So without the extended data, this sort of analysis complicates the whole shooting match. If we did use the ground-ball and fly-ball data, would it then remove the effects of Home Runs and Left Handed pitchers? And even though we can get the data now, it isn't exactly published in every sports page or on every web-site with pitching statistics. The use of ground-ball and fly-ball data would then make it difficult for those with a desire to run the numbers to be able to do so.
For now, I'll keep working on how to make this work (if indeed at the end it really does), and continue to use Home Runs allowed as its proxy. I'll probably try to make a few phone calls to stats as to the nature of the numbers and see if I can't maybe count all of the balls in play not currently in the fly-ball or ground-ball category as fly-balls. This would once again make them defense independent, but in my opinion would probably lessen the relationship some as well. When I (or someone else for that matter) can work this out, we'll have DIPS Version 3.0.
Anyway, before anybody gets carried away here, I should note that of the 100 pitchers with the most batters faced, 93 of them eventually rated within .006 of the league average hits per balls in play rate using the above adjustments. As you would guess from my comments, the two lowest estimated rates go to Steve Sparks and Tim Wakefield, both knuckleball practitioners (.279 and .284 respectively, with a league average of .292). The highest estimate went to Jimmy Anderson (.300) of the Pirates, who is a lefty who didn't strike much of anybody out.
I'm fairly sure that in many instances, the estimates are actually going to be further from whatever the pitcher's actual ability may be than simply assigning him a league average rate. But I think on the whole this does get us a bit closer to the true rates. The system was developed independent of this year's results, so this year would be a decent test to see if it helps any. The correlation between the pitcher's hits per balls in play rate (using pitchers with 100 or more innings pitched) and that of his team was .415, which is essentially the way the old DIPS worked. Using these new adjustments, and estimating the differences as being from his teammates rather than from some "league average," the correlation is .428, a very modest improvement, but an improvement nonetheless. That doesn't address the issue above regarding the teammates' rate containing some level of pitching. So, by then looking at how this system estimates pitching affects this stat for each team, and then adjusting by team with that adjustment made, the correlation goes up a little more to .433. All of this said, the difference between doing it this way and simply using the pitcher's team rate is fairly minimal, but then again the differences between pitchers in this regard are probably minimal anyway, so every little bit helps.
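The comparison described above boils down to computing a correlation twice, once against the plain team rate and once against the adjusted estimate. A minimal sketch of that check follows, assuming the figures quoted are ordinary Pearson correlations. The five-pitcher data set is made up purely so the code runs, and the names are hypothetical. It is not the actual season data and will not reproduce the .415, .428 or .433 figures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up rates for five hypothetical pitchers, for illustration only.
actual_rate = [0.275, 0.301, 0.288, 0.310, 0.283]    # pitcher's own hits per balls in play
team_rate = [0.285, 0.295, 0.290, 0.300, 0.292]      # old approach: assign the team rate
adjusted_rate = [0.280, 0.297, 0.290, 0.303, 0.288]  # team rate plus pitcher adjustments

r_team = pearson(actual_rate, team_rate)
r_adjusted = pearson(actual_rate, adjusted_rate)
print(f"team rate only:    r = {r_team:.3f}")
print(f"with adjustments:  r = {r_adjusted:.3f}")
```

A higher correlation for the adjusted estimates than for the team rate alone is the kind of modest improvement being described.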
The final adjustment here is to use a new method to come up with the earned run estimates, to avoid problems with possible differences in extra base hit ability. This problem can be sidestepped for now by a simple reduction of the relative run value of a non-home-run hit and a slight increase in the relative value of a home run. Multiple regression analyses confirm this.
Obviously there's a lot to be discussed about this, and, as always, I'm hoping this is the beginning of a discussion and not the end of it.
Here is the link to this year's DIPS numbers:
Here is the link to the explanation on how to calculate this new version of DIPS: