User Comments, Suggestions, or Complaints | Privacy Policy | Terms of Service | Advertising
Primate Studies — Where BTF's Members Investigate the Grand Old Game
Friday, January 25, 2002
DIPS Version 2.0
You've seen it mentioned on ESPN.com and BaseballProspectus.com. You've read about it in Bill James' new book. Now, read all about the new version right here on BaseballPrimer.com.
Well, it's finally here. After over two years of debate, argument, re-debate and re-argument, I've actually gotten to the next step in the process of developing pitching statistics that are not dependent on the quality of defense behind the pitcher. Now I'm sure a good number of you have followed the bouncing ball on this so far. But for those Johnny-come-latelys, here's the state of things (in my opinion at least) as they stand right now:
The finer points, how this all came about, and the ramifications have been discussed before. The first step of DIPS, as it was used in 1999 and 2000, dealt with points 1 and 2; it did not deal with point 3 just yet. Now what I will try to do is outline a necessary first step in trying to get at the very small differences between pitchers in this regard, while still keeping the stats independent of the defenses behind them.

One of the biggest problems I've had with Defense Independent Pitching Statistics is a misunderstanding about what it is I am trying to do. Many people argue that since I agree that there are small differences between pitchers in this regard, I should count the stat in the numbers, just to a lesser extent. You run into problems there, however, in that you can no longer safely assume that the quality of the pitcher's defense didn't help or hurt him. Quite the contrary, that possibility is as likely as any other.

For example, say you have a pitcher who allowed a .275 rate of hits per balls in play while his team in total posted a rate of .285, which would indicate a well above average defense. The argument, as it has been made, is that I should keep the difference between the pitcher and the team's defense, and simply adjust for the defense upward some. The problem is that this is an inconsistent application. By using that .285 as the benchmark for the team's defense, you are assuming the pitchers on the team are not any better at preventing hits per balls in play than anyone else. But by then using this adjustment to show one pitcher being better in this regard, you've just contradicted yourself. In other words, the adjustment is only valid if we can assume that there's no difference between pitchers in this stat or that they always even out on any particular pitching staff. But the latter argument is obviously shaky, and if the former is true, there's no need for the adjustment anyway.
It becomes an entangled "chicken and the egg" scenario, at the end of which we no longer have a pitching stat that is independent of defense. Better, I have always argued, to try to get at these differences in a different way: a way that retains the defense independence of assuming no difference between pitchers on hits per balls in play, but can fine-tune it to pick up small differences here and there, while still avoiding the stats directly related to team defense. I'm hoping that what I'll detail here is an improvement in this direction. Here's a point by point breakdown:

IN THE NEW SYSTEM TO FIND DIFFERENCES BETWEEN HITS ON BALLS IN PLAY

Knuckleballers: Well, after two years of wondering on this score, I've run through the numbers, and the biggest slam-dunk, "no doubt about it" difference to be found is here. Conclusively, knuckleballers have an advantage with regards to hits per balls in play. Some of the very lowest rates of the last 50 years have been by these guys. Charlie Hough, Tim Wakefield, Phil Niekro, Hoyt Wilhelm: they're all a decent bit better than their teammates in this regard. Using Craig Wright's list of knuckleball pitchers from The Diamond Appraised, and adding Wakefield, Dennis Springer and Steve Sparks, I found a definitive difference between knuckleballers and normal pitchers. The knuckleballers tend to have an advantage of anywhere from .008 to .012 depending on how I choose to slice the pie. 10 points is a nice round number and should work well enough.

Lefty/Righty: A small difference appears here, and I'm not really sure where it comes from, but it appears to be statistically significant. It's only around a .002 difference, and since I'm not distributing prescription drugs to orphans here, we can put it in and come up with the "whys" and "wherefores" along the way. It might be a ground-ball/fly-ball issue. Maybe lefties tend to be more likely ground-ballers than righties. But at this point that's really only a guess.

Strikeouts:
Here's where the murkiness starts. Looking at the numbers over and over and over again, it becomes clear that a pitcher's strikeout rate during a single season is a bit better predictor of his hits in play the following year than his own hits per balls in play. This is there and it's real. Why? I can only come up with two explanations: the obvious and the hard to show. The obvious explanation is that the more a hitter swings and misses at a pitcher, the more he also makes poor contact and therefore doesn't hit the ball hard. Most would favor this, though if this is the case, there are some questions as to why the differences between pitchers aren't greater. To propose another possibility: the more players you strike out, the more often hitters hit with two strikes. Since this is a noticeable ability for hitters, if a particular situation causes a hitter to change his approach, it could effectively make the hitter a worse hitter in this regard. And if a particular pitcher creates more of these situations than others, it would effectively reduce the quality of hitters he faces in this regard. This dovetails a bit better with what we already know, but it also seems like a little bit of a reach. Draw your own ideas and conclusions here.

Home Runs: I wasn't going to include this until the very end. I asked myself if the numbers said more with it than without, and I decided on "with." While it is a shaky relationship, it appears that the more Home Runs a pitcher gives up, the fewer hits per balls in play he gives up. The problem here is that the "why" of this is a minefield. Clearly there's an obvious problem: pitchers who gave up lots of homers AND lots of hits per balls in play would have a greatly reduced chance of remaining in the league, and therefore in the study. So this is a real problem. I ran through every way I knew how to minimize this problem (using only established pitchers; running the numbers over various times; etc.), and at the end of it all, the relationship was still there. Balancing these problems is the fact that Home Runs work as a decent proxy for ground-ball/fly-ball tendencies, especially when we don't have this data available. It also lacks some of the problems inherent in those numbers (which I'll get to below). This stat does not affect the numbers much at all, but I felt it was worth putting in there. I will provide the details on the system on my web-site, and this is the only one of the above I think you could leave out if you so chose without affecting the other numbers badly at all.

NOT IN THE SYSTEM TO FIND DIFFERENCES BETWEEN HITS ON BALLS IN PLAY

Walks: If there's any relationship here, it's that the more walks a pitcher allows, the fewer hits. But it suffers from the same problem Home Runs suffer from above, and in this case I think the problem dwarfs any effect that may or may not be here. Maybe if there are some additional refinements in the future, this may go in. For now, I think it's best if we leave it out.

Height: Here's something to ponder: why exactly do I get a statistically significant correlation between the pitcher's height and his hits per balls in play? After racking my brain for days, I decided the best possible explanation is that the shorter pitchers tend to be much better fielders than the taller pitchers (say Greg Maddux versus Randy Johnson as an example). So maybe the shorter guys stop more line drives up the middle, field bunts better and cover first better, thereby reducing the hits off them a bit. Of course, realistically, we should measure this in a different way other than how tall the guy is, and technically it's not pitching anyway but rather defense. I can't justify including this one, as it's weird and probably prone to a pretty decent error rate. I still think it's interesting though.

Ground-Ball/Fly-Ball Ratios: Well, this one ain't going away anytime soon, so whatever I decide, it certainly is by no means final.
It has been proposed for some time that fly ball pitchers tend to have an advantage here over ground ball pitchers. In the end, I'm pretty sure this is right, but the problems currently here are tough to overcome.

For starters, I'm not convinced that these numbers are defense independent. There are really three types of batted balls as they are counted: ground-balls, fly-balls and line-drives. It isn't hard to imagine a batted ball where it is difficult to determine whether it was a line-drive or a fly-ball. Since humans are making subjective judgments here, there is some question as to whether the eventual determination of fly-ball or line-drive might be largely influenced by whether the ball falls in for a hit. Now granted, this would be a small effect, but we're talking about trying to find small effects to begin with. The problem with skinning this cat is that even small biases need to be taken into account. So the problem might be that an increase in hits per balls in play might cause a decrease in ground-ball/fly-ball ratio, rather than the other way around. Nothing can be more ugly to deal with than reverse-causation.

Also, because the differences here are so small, we need lots and lots and lots of data to be able to be sure of significant tendencies. Because ground-ball and fly-ball data are a relatively new phenomenon, it greatly reduces the amount of data we have to examine. So without the extended data, this sort of analysis complicates the whole shooting match. If we did use the ground-ball and fly-ball data, would it then remove the effects of Home Runs and Left Handed pitchers?

Even though we can get the data now, it isn't exactly published in every sports page or on every web-site with pitching statistics. The use of ground-ball and fly-ball data would then make it difficult for those with a desire to run numbers to be able to do so.
For now, I'll keep working on how to make this work (if indeed at the end it really does), and continue to use Home Runs allowed as its proxy. I'll probably try to make a few phone calls to STATS as to the nature of the numbers and see if I can't maybe count all of the balls in play not currently in the fly-ball or ground-ball category as fly-balls. This would once again make them defense independent, but in my opinion probably lessen the relationship some as well. When I (or someone else for that matter) can work this out, we'll have DIPS Version 3.0.

Anyway, before anybody gets carried away here, I should note that of the 100 pitchers with the most batters faced, 93 of them eventually rated within .006 of the league average hits per balls in play rate using the above adjustments. As you would guess from my comments, it comes as no surprise that the two lowest estimated rates go to Steve Sparks and Tim Wakefield, both knuckleball practitioners (.279 and .284 respectively, with a league average of .292). The highest estimate went to Jimmy Anderson (.300) of the Pirates, who is a lefty who didn't strike much of anybody out. I'm fairly sure that in many instances, the estimates are actually going to be further from whatever the pitcher's actual ability may be than simply assigning him a league average rate. But I think on the whole this does get us a bit closer to the true rates.

The system was developed independent of this year's results, so this year would be a decent test to see if it helps any. The correlation between the pitcher's hits per balls in play rate (using pitchers with 100 or more innings pitched) and that of his team was .415, which is essentially the way the old DIPS worked. Using these new adjustments, and estimating the differences as being from his teammates rather than some "league average," the correlation is .428, a very modest improvement, but improvement nonetheless.
That doesn't address the issue above regarding the teammates' rate containing some level of pitching. So, by then looking at how this system estimates pitching affects this stat for each team, and then adjusting by team with that adjustment made, the correlation then goes up a little more to .433. All of this said, the difference between doing it this way and simply using the pitcher's team rate is fairly minimal, but then again the differences between pitchers in this regard are probably minimal anyway, so every little bit helps.

The final adjustment here is to use a new method to come up with the earned run estimates, to avoid problems with possible differences in extra base hit ability. This problem can be sidestepped for now by a simple reduction of the relative run value of a non home-run hit and a slight increase in the relative value of a home run. Multiple regression analyses confirm this.

Obviously there's a lot to be discussed about this, and, as always, I'm hoping this is the beginning of a discussion and not the end of it.

Here is the link to this year's DIPS numbers: http://www.baseballstuff.com/mccracken/dips2001.html
Here is the link to the explanation on how to calculate this new version of DIPS: http://www.baseballstuff.com/mccracken/dipsexpl.html
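As a rough illustration of the knuckleballer and handedness adjustments described in the breakdown, here is a minimal Python sketch. It is not the DIPS 2.0 formula, which is documented at the explanation link. The offsets are the article's round numbers; the idea of simply adding them to a baseline rate, the `Pitcher` type, and the function name are assumptions made for illustration, and the strikeout and home run effects are left out because the article gives no coefficients for them.

```python
from dataclasses import dataclass

# Round numbers quoted in the article. Adding them to a baseline is an
# assumption of this sketch, not the published DIPS 2.0 method.
KNUCKLEBALL_OFFSET = -0.010  # knuckleballers run roughly .008 to .012 lower
LEFTY_OFFSET = 0.002         # lefties run roughly .002 higher than righties


@dataclass
class Pitcher:
    """Hypothetical record type holding only the two traits used here."""
    name: str
    throws_left: bool
    knuckleballer: bool


def estimated_hits_per_bip(pitcher: Pitcher, baseline: float) -> float:
    """Add the fixed offsets to a baseline hits-per-balls-in-play rate.

    `baseline` is assumed to be the rate for a right-handed,
    non-knuckleball pitcher.
    """
    rate = baseline
    if pitcher.knuckleballer:
        rate += KNUCKLEBALL_OFFSET
    if pitcher.throws_left:
        rate += LEFTY_OFFSET
    return round(rate, 3)


if __name__ == "__main__":
    # Hypothetical pitchers, not real players.
    staff = [
        Pitcher("righty", throws_left=False, knuckleballer=False),
        Pitcher("lefty", throws_left=True, knuckleballer=False),
        Pitcher("knuckler", throws_left=False, knuckleballer=True),
    ]
    for p in staff:
        print(p.name, estimated_hits_per_bip(p, baseline=0.292))
```

The estimates stay within a few points of the baseline, which is consistent with the article's note that 93 of the 100 busiest pitchers rated within .006 of league average.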
Voros McCracken
Posted: January 25, 2002 at 05:00 AM | 32 comment(s)
About Baseball Think Factory | Write for Us | Copyright © 1996-2021 Baseball Think Factory
Reader Comments and Retorts
Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.
1. Warren Posted: January 25, 2002 at 12:21 AM (#604710)
I'm not sure I understand how "the final adjustment here... [avoids] problems with possible differences in extra base hit ability." Given that pitchers have varied percentages of (non-homer) extra-base hits per hit allowed, does DIPS 2.0 take this into account? For example, each (non-homer) hit that Greg Maddux allows is more likely to be a single than each (non-homer) hit that Rick Helling allows. I realize there are problems with this, since extra-base hits are defense-dependent.
One more comment: after the defense-independent hits total (dH) is found (along with the other defense-independent stats), you figure ERA in a manner similar to component ERA - that is, using a linear system to predict runs allowed. Component ERA (or linear-weights ERA) certainly has its place, since it's been shown to predict future ERA better than plain vanilla ERA. But since you've mentioned that DIPS isn't designed with prediction in mind, wouldn't it make sense to adjust the pitchers' true ERA based on $H?
For example, imagine a pitcher who allowed 100 runs in 200 IP (4.50 ERA). For whatever reason ("clutch" pitching, good pickoff move, etc.) this ERA is lower than his component ERA, which is 4.80. Say he allowed 5 more hits than he "should" have, given his estimated $H rate. Your method essentially adjusts the 4.80 number to account for this, but why not adjust the real ERA (4.50)? This would create a DIPS ERA that has less predictive value, but that's not really the point of DIPS, is it?
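To make the arithmetic in this example concrete, here is a small Python sketch of adjusting the real ERA rather than the component ERA. The ERA formula (nine times earned runs divided by innings) is standard; the run value charged per extra hit is a made-up placeholder, not a DIPS coefficient, and `adjusted_actual_era` is a hypothetical helper name.

```python
def era(earned_runs: float, innings: float) -> float:
    """Earned run average: earned runs allowed per nine innings."""
    return 9.0 * earned_runs / innings


# Placeholder run value for one hit on a ball in play. This number is an
# assumption for illustration only; it does not come from DIPS.
RUNS_PER_HIT = 0.5


def adjusted_actual_era(earned_runs: float, innings: float,
                        extra_hits: float,
                        runs_per_hit: float = RUNS_PER_HIT) -> float:
    """Remove the runs attributed to hits above the estimated $H rate,
    then recompute ERA from the pitcher's actual earned runs."""
    return era(earned_runs - extra_hits * runs_per_hit, innings)


if __name__ == "__main__":
    print(era(100, 200))                      # the 4.50 ERA from the example
    print(adjusted_actual_era(100, 200, 5))   # actual ERA after removing 5 extra hits
```

Under this approach the gap between actual and component ERA is carried into the adjusted figure, which is exactly the part the reply further down argues is itself defense dependent.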
All that being said, very nice work. I wonder if you'd share a few more details about how you determined that handedness, strikeout rate, etc. play a role in $H - can we see some of that data? Peer review and all that :)
-- The "knuckleballer's advantage" is most likely related to two phenomena: errors and K+WP/K+PB (knuckleballers have more of these). I recall reading a study (don't remember where) that showed there tended to be more errors behind knuckleballers, on average, than behind other pitchers; I'm in the process of checking this using PBP data. The K+WP/K+PB will definitely factor into the equation when you use Voros's estimator for BFP, because you're assuming there that all strikeouts are outs.
-- The lefty/righty issue, like many similar issues related to pitching, is actually a hitting issue. LHP face fewer LHB than do RHP, and LHB tend, on average, to be better hitters than RHB, and also to have a higher percentage of hits on balls in play, whether they are facing LHP or RHP.
-- MWE
How would you recommend making the park adjustments? Say from 1933, when all you have is the PPF. Also, would new values need to be computed for older data?
Could you make a few of your relationships a little clearer here? I think I understand them, but when you mention lefty/righty (and to a lesser extent, though more easily figured out, height) you don't make clear which direction the difference is.
But he still scares me.
Warren on ERA: Warren, one problem with this is that theoretically the difference between the pitcher's ERA and his Component ERA (or whatever) is also defense dependent. If the pitcher's defense turns more double plays, throws out more runners on the bases, or makes catches in so-called "clutch" situations, the credit for those kinds of events, if given at all, should go to the defense. Since they have a direct effect on the difference between ERA and a component ERA, that difference is defense dependent.
Mike, the differences I calculated for knuckleballers weren't in runs; they were calculated directly on hits allowed (I believe Tim Wakefield still has the largest career difference between his $H rate and his team's among active pitchers), so the factors you mentioned (except maybe the error rate, but only slightly) probably aren't the driving force there.
As far as the 20 biggest differences, sure I could do that. Most reasonable work requests I get, I'll do and simply post on my site. If someone here finds one particularly interesting, they can "clutch hit" it. I'll whip up a list today.
Righties have a small advantage over lefties in the $H dept. Exactly how much is open for debate, and the causes are as of yet unknown (to me at least), but it's around .002 to .003 higher for lefties. Not much.
Sean,
A good question about changing values for older seasons. It should work, but you'll probably need new coefficients, which can actually be derived from the existing ones. (I'll e-mail you with my thoughts).
As far as park factors go, I think you could almost get away with using strictly the home run park factor on the pitcher's home run total and moving on since the hit rate is irrelevant as long as we stick strictly to pitching. Parks, in my experience, don't affect walks much at all so that leaves strikeouts as the only missing link. I'm guessing you'll get closer to what you need with the Homer factor than the run factor. You could run studies using the current run and home run factors, and see how they relate to the component stats, and then extrapolate backward. I'm sure that would have some error involved with it, but would still be valid, I suppose.
Sam,
I just don't know what I'm going to do with you. You see one bad work badge picture of me, and now I'm Ted Bundy. Now, I have to go. I have to get this human head in the freezer before it goes bad.
I wonder if there could be a variation in the percentage of balls that go to right field against RHP, with RHP getting more balls hit there than LHP (and LHP getting more balls hit to LF)? If so, and assuming teams put better defenders in RF than in LF to begin with, then maybe lefties are subjected to poorer defense by relying more on their LFs?
You won't believe this, but a right-handed batter is 30% more likely than a left-handed batter to hit the ball to RIGHT field. Same story for a left-handed batter and left field. Apparently, there's no tendency for most batters to pull flyballs.
They do pull groundballs. A right-handed batter is FOUR times more likely than a left-handed batter to hit a groundball to third base, and twice as likely to hit a groundball to shortstop.
I had a hard time believing this myself, so I understand your skepticism. IAE, even reversed, the number of lefty and righty batters a team/pitcher faces won't make a huge dent in the fielding statistics of its/his outfielders.
I did use multiple regression. In fact, the rate I got for walks was actually in the opposite direction from yours (I got more walks, fewer hits). And the data set did compare the pitcher's $H with that of his team; it is just that once you've found your relationships, it is time to discard that measure, since it is not defense independent.
Also, I would be worried about using career rates WRT $H, since there's a selection problem.
Finally, you need to era-adjust every stat category in order to make such comparisons useful. I'm not sure if you did, but if not, you wind up with serious problems comparing results for pitchers from the 50s with those of the 90s. I believe I have my regression results if you want me to send them to you.
I'd be most interested in which pitchers show a persistent trend over the last three years (minimum 50 IP per season) of having an H% above or below the predicted percentage after you've made the various adjustments. I think that would allow us to look at the exceptions and see whether they seem to have anything in common or are instead a bunch of random outliers. That might lend support to your assertion that you've made all of the small adjustments that can be found.
On a related note, in the 2002 Baseball Forecaster by Ron Shandler, which you might not have read if you're not interested in future predictions, he mentions several pitchers as seeming to disprove your theory, but (i) Shandler might be using a simplified version of H% and certainly didn't make the smaller adjustments you mention, and (ii) random distribution would still produce a few pitchers that give the appearance of persistence when examined in isolation.
An average pitcher, pitching 200 innings, will have somewhere around 500 balls in play converted into outs by his fielders. That figure doesn't vary much for most pitchers. For the knuckleballer, though, it will vary. His fielders will have to get more outs, because some of the strikeouts that he gets that would usually be outs for other pitchers won't be outs for him. If his teammates make more errors than the norm as well, that's extra non-hits put into play.
A knuckleballer, to get 500 outs from his fielders, might have something like 550 non-hits in play over 200 innings, where a normal pitcher might only have 525 with the same quality of defense behind him. The flutterballer in that case is starting at a 25-runner disadvantage - before considering hits. If the team isn't better at preventing hits behind the knuckleballer than behind a normal pitcher, the knuckleballer will be at a further disadvantage - if the team $H is .300, the normal pitcher will allow 225 hits, the knuckleballer about 236 in the same number of innings. (And that's also before considering extra outs by the fielders needed to make up K-WP/PB).
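The hit totals in this example follow from simple algebra: if a fraction $H of balls in play fall for hits, then hits equal non-hits times $H / (1 - $H). A short Python check of the figures above (the function name is just for illustration):

```python
def hits_allowed(non_hits_in_play: float, hit_rate: float) -> float:
    """Hits implied when `hit_rate` of all balls in play fall for hits.

    balls in play = hits + non-hits, and hits = hit_rate * balls in play,
    so hits = non_hits * hit_rate / (1 - hit_rate).
    """
    return non_hits_in_play * hit_rate / (1.0 - hit_rate)


if __name__ == "__main__":
    normal = hits_allowed(525, 0.300)     # normal pitcher: 525 non-hits in play
    knuckler = hits_allowed(550, 0.300)   # knuckleballer: 550 non-hits in play
    print(round(normal), round(knuckler))  # prints: 225 236
```

So with the same team $H of .300, the extra 25 non-hits in play translate into roughly 11 additional hits for the knuckleballer.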
What I'm saying is that the knuckleballer's advantage *has* to exist; for a knuckleballer to survive, he has to allow fewer hits on balls in play than the average non-knuckleballer, because (a) he needs more outs by his fielders than an average pitcher because of the K-WP/PB factor, and (b) there will be more balls that should be outs that aren't outs. That's probably one reason why there aren't more of them in the majors, because you need better defense behind them to get the same results.
-- MWE
Actually, I may have jumped the gun when I made the comment that knuckleballers need good defense to be successful. We're looking at the results of a ball in play as being a function of something other than the pitcher. The automatic assumption is that, if it's not the pitcher, it's got to be the defense. But there's another factor in this mix: the hitter. Suppose the hitter has the largest effect on the results of balls in play. Most pitchers have fairly predictable movement on their pitches; the fast ball moves one way, the curve moves another, the slider and the splitter move in relatively predictable ways. The hitter therefore can get his timing down, and can guide the ball more easily. But the knuckleball *doesn't* move in a predictable pattern; that's why pitchers use it. Thus the hitter can't anticipate what the ball is going to do, and has less control as a result. Perhaps it is this reduction in the hitter's control over the result of the ball in play that provides the knuckleballer with an advantage.
Or perhaps not. It's a thought, to be pursued. Knuckleballers often end up with bad teams, as Bill James noted, and often pitch much better than one might expect. Bad teams don't often have good defenses, so there's probably something else going on. The knuckleball pitcher may be less affected by *bad* defense than other types of pitchers.
-- MWE
I'm wondering, though, if knuckleballers are winding up on bad teams that are just bad, or on teams that are bad specifically because they don't have enough pitching.
If knuckleballers are, as they seem to be, used primarily as a last resort for teams that don't have enough pitching, those teams could be bad enough to be bad teams without the need to be poor defensively.
When I say "not enough pitching" I mean, often, literally don't have enough major league pitching to throw the 1450 innings they need to pitch... and then you will be happy to let a Phil Niekro throw 330 innings for you, or Joe 250.
The "knuckleballer-as-last-resort" idea might mean that in fact the defenses that play behind knuckleballers are not bad, but that those teams are bad for a wholly different and understandable reason (no pitching).
I think I need to do a study...
Voros,
What are the historical numbers like for pitchers who throw primarily sidearm or underhand?
If it's the dancing of the knuckler at work, perhaps other unusual pitching styles would give similar results as well...
-- MWE
Texas has had an awful defense the last two years, and they haven't made a significant effort to address that in the offseason, from what I can see - although neither Blalock nor Perry can be much worse at 3B than Lamb, to be sure, and Everett isn't a bad defender, I just don't see where they're going to get a significant upgrade. The team $H was .314 in 2000, .312 last year. Davis should come back to the league some on the $H, but he still walks more guys than the league average and strikes out quite a bit fewer hitters than the average. I don't consider it likely that he'll get much better (not that a mid-4s ERA in the AL is too shabby). If he were pitching in Seattle, on the other hand, you might see another Jamie Moyer-type season.
-- MWE
Wow, thanks for the info on righties and lefties. Very interesting indeed.
The rate of errors that allow runners to reach base has fallen considerably. James used 71% in his original calculations for DER, and Clay Davenport used 70% in his defensive system in 1998. The rate over the last three years has been a lot closer to 60% - and more to the point, the rate of throwing errors on infield hits has risen to about 17% of all errors. I am absolutely certain that many of these would have been scored as straight-up errors in times past.
I would suspect that RH-hitters would, in fact, get some benefit from this effect, and that LHP would be hurt by it.
-- MWE
-- MWE
Vinay, in his comment, was asking whether fielding percentages for 1Bs would be higher than 3Bs even when you eliminate plays that 1Bs get for catching throws, and they are; the center for 1Bs is around .975 (between .971 and .977, actually, for 2000) while the center for 3Bs is around .960. Removing a few additional errors would only push that 1B center even higher.
-- MWE
Dave Burba looks like a decent bet despite going somewhere no better than Cleveland.
Jason Johnson ought to sink like a stone.
Joe Mays and Mark Buehrle are unlikely to be Cy Young candidates again this year but could be decent nonetheless.
Glendon Rusch looks like a very good choice.
Watch out for Kerry Wood if he remains healthy, could be a Cy Young candidate.
Derek Lowe as a starter looks to be a strong "value" play (though in a roto situation the lack of saves might mess with things).
I like Terry Adams behind a fairly good defense in Philadelphia.
I don't have a lot of confidence in Jon Garland.
Rick Helling should be better in Arizona.
--
I'll have projections for pitchers finished in as soon as a week and as late as two weeks. It depends on how much I get done this weekend.
Hope that helps.
Voros, I concur with Ray. There's something to be said for doing 90% of the work to gain an extra 10%. My suggestion is that "basic-DIPS" should always be presented, along with an "appendix" for V2.
Mike: I didn't realize that the 2B would get credit for an official assist on an error play.
"An assist shall be credited to each fielder who throws or deflects a batted or thrown ball in such a way that a putout results, or would have resulted except for a subsequent error by any fielder."
The Rules of Baseball can be found at the MLB Web site in the Baseball Basics section.
-- MWE
Also, re: other comments on related effects, noticed that you're almost ready to include walks in your model and, with Ks already there, the name Nolan Ryan immediately came to mind. Same thing re: HRs and Ks for Bert Blyleven. I think you mentioned that this sort of double effect is all taken care of, but maybe you could explain simply how that would happen - are there thresholds that can't be passed or arbitrary rules that no two factors can contribute more than something, or what?
Interesting stuff, but is it really that much more useful than the basic version? I mean, are guys who were nowhere with version 1 now in the elite class, or is it just a case of adding a few points to your correlation coefficients as you briefly described in your piece?
once the decision was made to do DIPS, why not take it as far as possible so that we can get the best possible picture of how much of defense is pitching, and how much is fielding?
There are several reasons why a person might want to stop short of a full-dress treatment:
1. Introducing additional adjustments that add minimal accuracy at the expense of ease of use tends to turn people off. How many people followed all of Bill James's gyrations through the various technical Runs Created formulae, and how many people used one of the simpler forms which were less accurate but easier to remember and use?
2. Like all sabermetric methods, DIPS is a model of reality, and not an exact representation of reality. There will always be areas of performance that the model can't explain, and you cannot logically force the model to try to explain them, so you shouldn't try. It's certainly useful to look at things that the model doesn't explain, such as the knuckleballer's advantage, but it's not necessary to try to account for each and every one of those effects in the model itself.
3. You run the real risk of decreasing the model's overall accuracy in an attempt to model a particular parameter that affects the accuracy of the model in some cases.
Taking into account accuracy, clarity, and ease of understanding and use, DIPS I was (IMO) about as good as one could hope to achieve. I'm not suggesting that Voros shouldn't TRY to make adjustments and attempt to model other aspects of pitching and defense not covered by the model, not at all. I do suggest that the finding that pitchers have little to do with the results of balls in play is FAR more important than nailing down the exact value of "little", and that for almost all aspects of defensive analysis, the assumption of 0% impact implicit in DIPS I does not introduce a significant error if the impact were later found to be some small positive value.
-- MWE
This fact I find loses its impact in the refinements that Voros is bringing in.
Actually, the explanation is a lot simpler than that.
1. When hitters pull the ball on the ground, they tend to hit it into the hole or down the line.
2. When hitters don't pull the ball on the ground, they tend to hit it away from the hole/line, and most often wind up hitting it directly at the fielder.
This is true on both sides of the plate. Because there are more RH-hitters than LH-hitters, RH-hitters have a net advantage in this category, even though the rate of conversion on balls hit into the hole on either side of the diamond is virtually the same. There is actually a (slightly) smaller percentage of balls hit into the 1B-2B hole converted into outs than there is in the SS-3B hole, which is somewhat unexpected, and there are "far" fewer balls hit up the 2B side of the middle converted into outs than there are on the SS side, which is expected since 2Bs are moving away from first on those plays while SSs are moving toward 1B. 2Bs do convert more balls hit directly at them than do SSs, which makes up for some of the difference. Furthermore, as you would also expect, 1Bs convert more ground balls hit down the line into outs than do 3Bs, which also disadvantages LH-hitters. The result of all of this is that, on ground balls, LHB make outs far more often than do RHB.
The argument on fly balls is on more solid ground, in that players who can throw but don't move very well play RF and players who can run but don't throw very well play LF (if they can do both they play CF). Having players who cover less ground in RF also tends to favor RHB, because fly balls are usually hit to the opposite field. Because RHB are favored offensively on both fly balls and ground balls, RHP would be favored defensively, the effect that Voros noted.
I should point out that this effect is, again, mostly independent of the type of pitcher. The distribution of balls in play by hitters does not change significantly based on the handedness or GB/FB tendencies of the pitchers.
-- MWE
FWIW, my last name is pronounced as though it were spelled "Amy".
But your first point boils down to personal preference, and your 3rd point essentially means that Voros must be careful in his evaluation and judgement of his research, which is true enough but is generally true anyway of everyone who deals in sabermetrics.
Actually, it's not generally true that everyone who deals in sabermetrics takes that level of care, although it is most certainly true of Voros - but that's another thread.
I should clarify that when I talk about DIPS being a "model", and when David calls it an "estimator of pitching ability", we are really talking about the same thing. The model underlying DIPS is that you can gauge a pitcher's real level of performance by considering only his ability to prevent HRs and walks and to strike hitters out, and that because hit prevention is NOT primarily a pitching ability you do not need to look at that for an individual pitcher. The numbers that come out of DIPS are estimates of that ability level, based on the assumption that the underlying performance model is an accurate reflection of reality.
In evaluating a model, you need to ask two questions:
-- How accurately does this model reflect reality?
-- How well does this model correlate with reality?
These are NOT the same question, expressed two different ways. The accuracy question asks "How close is the estimator of ability to the actual ability?"; it doesn't ask if the estimator overshoots on one player and comes up short on another, only if the net result is small. The correlation question asks "How likely is it that if the model ranks Player A as being better than Player B, that Player A is actually better than Player B?"; it doesn't ask whether the difference between the two players is 10 performance units or 1000, as long as A>B on both scales.
A model can correlate well with reality and be terribly inaccurate, if it fairly consistently underpredicts or overpredicts. A model can also be quite accurate, yet not correlate very well with reality if the errors tend to offset one another, or if players tend to cluster in performance groups with relatively small variations in performance between groups. You cannot rely on one or the other as your sole indicator of a "good" model.
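To make the distinction concrete, here is a minimal sketch using made-up numbers (not real pitcher data). The `actual` values, the two toy models, and the helper names `pearson` and `mean_error` are all assumptions for illustration. Pearson's r stands in for "correlation" and the average signed error stands in for "accuracy" in the net-result sense described above; a rank correlation would fit the description equally well.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def mean_error(estimates, actual):
    """Average signed error: positive means the model overpredicts on net."""
    return mean(e - a for e, a in zip(estimates, actual))

# Hypothetical "true ability" values in ERA-like units (invented for this sketch).
actual = [3.0, 3.5, 4.0, 4.5, 5.0]

# Model A: orders every player correctly but overpredicts each one by 1.0.
model_a = [a + 1.0 for a in actual]

# Model B: individual errors cancel out on net, but the ordering is scrambled.
model_b = [5.0, 3.0, 4.0, 5.0, 3.0]

for name, model in (("A", model_a), ("B", model_b)):
    print(name, round(pearson(model, actual), 3), round(mean_error(model, actual), 3))
```

In this toy data, Model A has a correlation of 1.0 but a net error of a full run, while Model B has a net error of zero and a negative correlation. Neither number alone would tell you which model to trust.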
And what I tend to see happening is that people forget to balance the two views of the data. People will either distort the model away from a viable correlation with reality in an attempt to get the model to "add up" or they will focus on things that make the model correlate better, by modelling some small additional performance feature, and forget to check whether the model becomes less accurate for players as a group when the small effect is added.
And that's before we even consider the effect of the "ecological fallacy", where in looking at the performance of players we make inferences on aggregate data (team-level data) that may be unreliable as a result of bias resulting from the aggregation of the data. See this link for a discussion of this problem. (If Walt Davis is reading this, thanks for steering me in this direction.)
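A small sketch of how that aggregation problem can arise, again with invented numbers: the three "teams" and their (x, y) player values are hypothetical, as are the names `teams`, `within_team`, and `team_level`. The relationship between x and y runs one way for individual players inside each team and the opposite way once you look only at team averages.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical (x, y) values for individual players, grouped by team.
teams = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}

# Correlation among the players inside each team.
within_team = {
    name: pearson([x for x, _ in players], [y for _, y in players])
    for name, players in teams.items()
}

# Correlation computed from team averages only (the aggregate view).
team_x = [mean(x for x, _ in players) for players in teams.values()]
team_y = [mean(y for _, y in players) for players in teams.values()]
team_level = pearson(team_x, team_y)

print(within_team)
print(team_level)
```

Here every within-team correlation is -1.0 while the team-level correlation is +1.0, so an inference about individual players drawn from the team totals alone would point in the wrong direction.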
The problem that I have with complex models is that they imply a level of accuracy that very likely does not exist, because of the problems that I outline above. It's why I think my first point, which David calls a matter of "personal preference", is really something more than that. Maybe the simplest models are the best not just because they are easy to understand, but because they capture most of the information people need to know without any pretense at being more accurate than they likely really are.
-- MWE
Has anyone followed up on this? Do some batters have consistently high or low $H?
--
Ben
If you ever watch a game with a knuckleballer involved...more times than not you'll see the hitters off balance when swinging. Obviously due to the speed and movement of the ball hitters have to try and adjust. In the end, most hitters can't make the necessary adjustment and will not make "solid" contact. Without solid contact, safe hits will undoubtedly drop.
In the same light, slap hitters like Ichiro would suffer. While hitters like Ichiro count on just making contact and having the pitcher provide the needed energy to put the ball in play...preferably on a line in the gap...knuckleballers do not provide enough energy to just spray the ball. Hitters have to be able to generate more energy on their own for the ball to do what it would against a "normal" pitcher.
In my opinion...which doesn't really mean much, in my opinion...hitters have a big hand in how well knuckleballers' DIPS numbers look.
With that said, catching also plays a role. I would venture to guess that the majority of unearned runs charged to a knuckleballer involve the catcher in some way, whether it's a PB, a WP, or a botched third strike. It could even be throwing errors trying to catch would-be base stealers, because knuckleballers are more vulnerable to stolen bases. I wouldn't blame "poor" defense for some of their unearned runs; I would venture to suggest that they brought it on themselves.
I've never done any research into this matter; this is coming from pure speculation based on experience playing and just taking in a game or two (or a million!!! hehe). I've had the joy of trying to catch for a lefthanded knuckleballer as well as hit against several knuckleballers...and I hate it!!!
Oh, one other thing...as a lefthanded hitter...growing up, in my district Little League...we only had a handful of lefthanded pitchers, so my experience against lefties was very limited. And I still to this day feel uncomfortable against them.
Nowadays, lefties are revered...unlike the old days. So, I think you are going to begin to see more lefthanded pitchers come up through the ranks and as a result, I think their effectiveness will begin to diminish in the next couple of decades...if it hasn't already.
Again...purely observation.
Thanks,
John