Primate Studies — Where BTF's Members Investigate the Grand Old Game

Thursday, November 20, 2003

Defensive Regression Analysis - Part 3

1999-2001 DRA-UZR-DM Ratings, Position-By-Position

I will go through the positions in descending order of skill/importance, what Bill James long ago described as the Defensive Spectrum: shortstop, second base, center field, third base, right field, left field and first base. As promised, I will end with a review of the career DRA ratings (through 2001) for I-Rod and Piazza, the best and worst fielding full-time catchers over the past decade or so, as I lack the most up-to-date UZR ratings at catcher.

UZR infielder ratings include "DP" ratings; UZR outfielder ratings do not include "Arm" ratings. In Part III, in the context of the discussion of historical outfielder ratings from 1974-2001, I will discuss DRA arm ratings in the outfield.

All numerical ratings are denominated in terms of runs saved or allowed relative to a league-average fielder; e.g., +25 means 25 runs "saved"; -12 means 12 runs "allowed".

The "Notes" column addresses examples where DRA and UZR seem to be reaching meaningfully different results. The following "code" of comments applies: "dm=dra" means that DM information strongly supports DRA; "dm~dra" means that DM information is mixed, but on balance, appears to favor DRA over UZR; "?" indicates it is unclear whether DM supports DRA or UZR; "dm~uzr" means that DM information is mixed, but on balance, appears to favor UZR; "dm=uzr" means that DM information strongly supports UZR. The one reference to "dial=dra" refers to an instance in which Chris Dial’s zone rating matches better with DRA. The one reference to "park" refers to a (Fenway) park effect.

DM commentary comes from three separate sources: team essays for 1999 and 2000, which contain capsule summaries of individual player performance, and the "Gold Glove" essay for 2001.
DM’s webpage does not provide team comments for 2001 or Gold Glove essays for 1999 and 2000. As a result, this mix of essays generally does not provide commentary for average or below-average fielders in 2001. The team comments for 1999 and 2000 more than make up for the lack of "Gold Glove" essays for those years.

Shortstop
DRA and UZR agree that Aurilia, Clayton, Renteria, Tejada, Deivi Cruz, Nomar, and Alex S. Gonzalez were basically average over the ’99-’01 period. DRA and UZR agree that Rey Sanchez was outstanding and that Neifi Perez was pretty good, at least in 2000. DRA and UZR also agree that Jeter, Guzman and Alex Gonzalez were clearly below average, with the DRA ratings for Guzman being less extreme and the UZR rating for Jeter being less extreme.

We could quibble about a few single-season ratings: UZR shows Clayton as a viable Gold Glove candidate in 2001; DM’s 2001 Gold Glove Review ("DM GG") does not mention Clayton. On the other hand, DRA shows Renteria as a viable Gold Glove candidate in 2001, and DM GG doesn’t mention him either.

The significant differences are over Vizquel, Ordonez, and, possibly, A-Rod. DM GG had this to say about Omar, my nomination for the most over-rated fielder in history:

"[Vizquel] was one of three Cleveland infielders to be rewarded with Gold Gloves [in 2000]. But that infield was below the league average in turning ground balls into outs. And according to the STATS Major League Handbook, they were fourth worst in the league in converting double plays when grounders were hit in double-play situations.

"The bottom line is that somebody isn't making nearly as many plays as people think . . . .

"[In 2001], Cleveland's infield was 13th in the league in the percentage of ground balls turned into outs. And they were only a hair above the league average in double-play percentage.

"You could argue that the infield looks bad because the corner guys -- Jim Thome at first, Travis Fryman and Russ Branyan at third -- don't cover much ground, and you'd be correct. Problem is, there's absolutely no evidence that their middle infielders are doing more than their share, either . . . .

"Suffice it to say that Vizquel's range wasn't all that good this year."

UZR rates Vizquel above average; DRA rates him below average.
Rey Ordonez has a historically high UZR rating for 1999: +39. DM does not seem to suggest that Ordonez was having a historically outstanding season at short. "Error totals aren't usually a good indication of fielding prowess, but the four errors charged against Ordonez were impressive nonetheless." DM says nothing about his range. DRA rates Ordonez’s 1999 season at +10.

Regarding A-Rod, DM seems to take a middle position between the moderately high rating he has under UZR and the barely above average rating he has under DRA. In 2000, DM’s team comment for Seattle describes A-Rod’s fielding in a manner that supports DRA: "While A-Rod lacks the great range of some other AL shortstops, he does rate above-average and has very good hands." UZR rates A-Rod’s 2000 season at +18; DRA rates it +9. DM has nothing to say about A-Rod in 2001. UZR rates him slightly above average (+8); DRA rates him as slightly below average (-6).

All in all, DRA appears to have "worked" in evaluating full-time shortstops during the 1999-2001 period.

Second Base
DRA and UZR basically agree at second base. In its 2000 Florida Marlins comment, DM classifies Luis Castillo among young players with "great speed and defense", so UZR has probably measured his fielding better than DRA has, though DM does not elaborate at all regarding Castillo’s defense, and does not mention Castillo at all in its 2001 Gold Glove review. If DRA has failed to recognize his talent, it’s not a talent of significant magnitude. DRA, UZR and DM all agree that Pokey Reese was outstanding and that Adam Kennedy was very good, particularly in 2001.

Center Field
Reader Comments and Retorts
I hope to do a 2002-03 DRA article, in which I use the 1974-2001 regression weights "out of sample". Perhaps Chavez will show up better then.
Sweet,
Thank YOU for reading the article.
Gilbert,
After I wrote the article, I realized I should have written something about Junior, who has been given a lot of Gold Gloves, and then came across this e-mail thread by MGL:
"Posted 5:46 a.m., June 21, 2003 (#1) - MGL
Griffey was always VERY overrated in CF and slightly overrated as a hitter, and of course his injuries and perhaps lack of conditioning have caused him to "age" faster than he should have."
I said it to you before, and I'll say it again -- I watched Greg Gagne about as much as Bill James did, and his assessment is too positive. Your ratings have him right. Not only that, but his backups in Minnesota -- usually Al Newman -- were generally pretty good.
I hope you have enough Snickers handy...
Brilliant work Michael.
Call me an official skeptic on the magnitude of some of these numbers. I struggle saying that Pokey Reese saved 39 runs above average in 1999, or Chipper cost his team 28 runs in the same year.
I looked at David Pinto's recent fielding analysis -- particularly focusing on Mark Ellis. Ellis wasn't too far off from Reese: ZR of .890 (Reese's was an awesome .905), same number of innings and peripheral stats. And Pinto says that Oakland second basemen recorded thirty more outs than expected, given where the ball was hit.
I'm guessing that it takes two to three "outs made above average" to save a run. That would mean that Reese would have had to prevent about 100 hits to save that many runs. Oakland second basemen, arguably the best in the majors last year, prevented 30 according to Pinto.
I know that you add in a ton of other factors, so a straight comparison isn't possible. But how is it possible for any second baseman to save his team 39 runs in a single season?
No. Saving an out is worth about 0.8 runs. I had a long discussion/proof on this. I'll see if I can dig it up.
============
Suppose a team with Ozzie at SS gives up on average 12 non-HR hits, and 2.6 walks every game (which of course is 27 outs). Applying .50 runs per non-HR hit (I know it should be closer to .55, but I just want to keep it basic), and .30 runs per BB, and -.10 runs per out, and we get 4.08 runs scored per game. And per game, we see that Ozzie's team faces 41.6 batters (again, let's not worry about DPs, etc).
Now, let's say Ozzie was traded for Spike, and let's say for every 41.6 batters faced, there is one ball that Ozzie gets to that Spike doesn't. So, for those 41.6 batters, Spike's team records 13 non-HR hits (1 more than Oz), 2.6 walks, and 26 outs (1 less than Oz). However, there's still one more out to go! Since Spike's team gives up 13 non-HR hits / 26 outs, we can estimate that this team will give up 13.5 non-HR hits, 2.7 walks, and 27 outs per game (a total of 43.2 batters, a remarkable 1.6 MORE batters than Oz). Anyway, applying our LW constants, we see that Spike's team gives up 4.86 runs per game.
This number is .78 runs MORE than Ozzie. This is the result of Ozzie getting to one more hit than Spike. .50 runs for the hit, and about .30 runs for the out gives you the .80 runs.
====================
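The arithmetic in Tango's example above can be reproduced in a few lines. This is only a sketch: the constants are the deliberately rounded linear weights quoted in the comment (.50 per non-HR hit, .30 per walk, -.10 per out), and the function and variable names are made up for illustration.

```python
# Simplified linear weights quoted in the comment above (runs per event).
RUNS_PER_HIT = 0.50    # non-HR hit
RUNS_PER_WALK = 0.30
RUNS_PER_OUT = -0.10
OUTS_PER_GAME = 27


def per_game(hits, walks, outs):
    """Scale a hits/walks/outs line to 27 outs.

    Returns (runs allowed per game, batters faced per game).
    """
    scale = OUTS_PER_GAME / outs
    runs = hits * RUNS_PER_HIT + walks * RUNS_PER_WALK + outs * RUNS_PER_OUT
    batters = hits + walks + outs
    return runs * scale, batters * scale


# Ozzie's team: 12 non-HR hits, 2.6 walks, 27 outs per game.
ozzie_runs, ozzie_bf = per_game(12, 2.6, 27)

# Spike misses one ball Ozzie reaches: over the same 41.6 batters,
# one out becomes a hit, so only 26 outs are recorded.
spike_runs, spike_bf = per_game(13, 2.6, 26)

runs_per_play = spike_runs - ozzie_runs

print(f"Ozzie: {ozzie_runs:.2f} runs/game over {ozzie_bf:.1f} batters")
print(f"Spike: {spike_runs:.2f} runs/game over {spike_bf:.1f} batters")
print(f"Cost of the one missed play: {runs_per_play:.2f} runs")
```

Run as written, this should reproduce the 4.08, 4.86 and .78 figures from the comment. At roughly .8 runs per play, a +39 season would take about 50 extra plays made rather than 100.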
Thanks, tango.
Does the value of this system (as compared to MGL's UZRs or Pinto's new system) come simply from the fact that it can be used to evaluate the seasons before play-by-play data or do you think there's more than that?
Charles, thanks for the insight re:Gagne.
Repoz, thanks for the compliment, though I'm not sure I'm getting the "Snickers" comment.
Studes, several points. (1) I hope to apply the weights out of sample for 2002-03 ratings. (2) Pokey's +39 runs could easily be an excessive estimate, but I believe his two-year average is pretty close to UZR. (3) Chipper's -28 rating is almost exactly the same as Chris Dial's zone rating for Chipper. It could be wrong (any estimate can be wrong), but I'm pretty sure the overall 1999-2001 rating for Chipper is right. After all, if he only cost a handful of runs a season, why would you move him to left? And I tend to trust Atlanta's judgment here. They have the best DRA team fielding rating in the '90s and the highest rating this year under David Pinto's model.
Tango, thanks for the runs-saved analysis. Spot on.
Tim M, I'd like to do a pitcher rating article (Dick Cramer has made the same suggestion), but it may be some time before I can get to it. DRA is a bit like DIPS, except that (I think) DIPS allocates all BIP outcomes to fielders, whereas DRA allocates estimated infield fly outs to pitchers and the remaining BIP to fielders. Also, DIPS yields a *rate* number ("DIPS" ERA); DRA yields runs-saved numbers. DRA would probably provide a few meaningful adjustments to our all-time ratings for pitchers. I believe it would probably address two issues in Win Shares--the overestimation of pitcher value in the dead-ball era and the underestimation of pitcher value for the real outliers such as Pedro and The Big Unit. That said, I think we all pretty much know who the best pitchers have been. Providing a better estimate of whether Mays' glove enabled him to match Mantle's peak value is something I think a lot of fans feel they don't know and would be interested in.
Joe M., thanks for taking the time with the article. I too was surprised that Manny was OK under DRA. It's certainly possible that he was worse than DRA has determined. It's also possible that he has declined a bit in the past couple of years, which would be consistent with the pattern for almost all outfielders at his age. I agree that he is a bad fielder--I saw him misplay about half a dozen fly balls (without being charged for an error) in two Yankee games this season. I think his Pinto rating this year is not good, but not terrible, either.
J. Cross, DRA's relevance is in (a) providing better historical ratings, (b) evaluating minor-leaguers for whom we lack zone data and (c) providing a "back of the envelope" second opinion for surprising zone ratings. As mentioned in Part IV of the article above, zone ratings have the best data, but it's actually a very big challenge to figure out how best to use it, and sometimes you get ratings that are surprising. When a surprising rating pops up, an analyst could use the DRA rating as a reference point in the course of analyzing, play-by-play, why the zone rating is the way it is. Then the analyst can determine whether it might be appropriate to adjust the zone rating. It's all about trying to answer the same question using *different methods*--UZR, Pinto's probabilistic model, DRA, etc., etc.
I've kept Pete Palmer in the loop regarding DRA, but he said he didn't have the chance to try incorporating the DRA method (as described here) into his latest edition, which will be coming out soon. One possible way of getting DRA before the public would be as part of a future edition of one of Pete's books. We'll see. In the meantime, I hope to provide 2002-03 DRA ratings for Primer readers and will probably reveal all in a book, assuming no major league team is interested.
AED,
If you can point us to any publication (in print or on-line) that uses regression analysis as fully as it is used under DRA, I think we'd all appreciate knowing about it. Regression analysis *has* been used to make certain ad hoc adjustments for certain defensive statistics (I acknowledge in one of the threads to the second installment that I actually got the idea for DRA from a (presumably) regression-based adjustment under one of Bill James' formulas (see p. 222 in Win Shares).) However, I'm not aware that anyone has used regression analysis (i) to find (or attempt to find) *all* of the statistically significant relationships between publicly available pitching and fielding statistics, in order to develop better estimates of context-adjusted fielding plays made, and (ii) to determine the statistically significant weight, in runs, for each context-adjusted pitching and fielding event.
As far as accuracy for shortstop ratings, that's great if your system has a high correlation coefficient with UZR. But that's not the whole story.
First, the sample size here is fifteen. When I was analyzing the results under DRA, I compared the r-squareds for DRA, Win Shares, and Davenport Fielding Translation ("DFT") ratings against updated UZR ratings for all shortstops evaluated in Mike Emeigh's Jeter series (the sample size was possibly somewhat bigger). DRA was significantly better than Win Shares, somewhat better than DFT. Guess who came out best? The completely non-empirical Total Baseball Fielding Linear Weights rating. As Tango says, "Sample size, sample size, sample size."
Second, it might be worth checking the shortstop ratings against the UZR ratings adjusted for Diamond Mind commentary. I don't know what the correlation coefficient for shortstop ratings is, but the overall correlation coefficient at all positions is slightly over 0.8.
Third, almost as important as the correlation is the *coefficient* of the regression result; i.e., getting the "scale" of ratings correct. In Mike Emeigh's "Jeter" sample, Win Shares ratings were too "dampened"; DFTs were about right (as are DRA ratings); Fielding Linear Weights were too "big" (too much spread between high and low ratings). In general, DRA manages to match (almost perfectly) the average "scale" of fielding impact independently determined under UZR.
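The difference between correlation and "scale" in the point above can be illustrated with a small sketch. The ratings below are invented numbers, not actual UZR, DRA or Win Shares values, and the helper name is hypothetical. A system whose ratings are exactly half (or one and a half times) the reference ratings correlates perfectly with the reference, yet the slope from regressing it on the reference shows it is too "dampened" (or too "big").

```python
import math


def correlation_and_slope(reference, ratings):
    """Pearson r, and the slope from regressing `ratings` on `reference`."""
    n = len(reference)
    mean_x = sum(reference) / n
    mean_y = sum(ratings) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    syy = sum((y - mean_y) ** 2 for y in ratings)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, ratings))
    return sxy / math.sqrt(sxx * syy), sxy / sxx


# Invented runs-saved ratings for seven fielders (NOT real data).
reference = [-20, -10, -5, 0, 5, 10, 20]
dampened = [0.5 * x for x in reference]   # spread too narrow
inflated = [1.5 * x for x in reference]   # spread too wide

r_damp, slope_damp = correlation_and_slope(reference, dampened)
r_infl, slope_infl = correlation_and_slope(reference, inflated)

print(f"dampened: r = {r_damp:.2f}, slope = {slope_damp:.2f}")
print(f"inflated: r = {r_infl:.2f}, slope = {slope_infl:.2f}")
```

Both invented systems would look identical on correlation alone; only the slope (ideally near 1 against a trusted reference) reveals whether the runs are on the right scale.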
Fourth, getting ratings approximately correct at *all* positions, including pitcher, and having the ratings add up to the team runs allowed, is also very important. Win Shares does the latter by definition, as does DFT. Fielding Linear Weights does not. DRA doesn't "force" the ratings to add up to team runs allowed, but the DRA estimate of team runs allowed is more accurate than Linear Weights or Runs Created estimates of team runs *scored*.
I'm sure that Primer readers would be very interested in learning more about your system, and it wouldn't be necessary to reveal all of the details. I seem to recall Mike Emeigh writing that certain key details in the DFT methodology are still proprietary, as is Tom Tippett's system at Diamond Mind. I also believe that Tango has described the basic principles of his Leverage Index for relief pitchers without revealing the mechanics.
Went back and checked the DRA ratings for Manny Ramirez. They stop in 1999--his last 130+ game season in right field. So we're missing the last four years of ratings. His Range Factors as reported here at baseballreference.com dropped sharply after 1999 and stayed consistently low. (I know, I know, not the most reliable stat.) Nevertheless, it seems he did really decline after 1999, as evidenced by the fact that his teams wouldn't *let* him play full-time in right field.
Though I'm sure it was just a minor factor, I would not be at all surprised if the sabermetric-savvy Red Sox have figured out that Manny does enough damage in the field to cause his *overall* value to be meaningfully less than commonly perceived.
Studes,
Yes, the run-weights for context-adjusted plays made at the various positions do differ, and, I think, teach us something.
The run weights for plays made at second and first were slightly higher than at short and third. I think the reason is that singles (and doubles) on the right side of the diamond have more baserunner advancement value. If you prevent a hit with a higher value, you save more runs.
The run weight at center was consistently lower than in right and left. Two possible reasons. There may be more doubles and triples "prevented/not prevented" down the lines than through the gaps. The other is that a ball *caught* in centerfield probably has a lower value to a defense, because it is likelier that a runner can tag up and advance. Why? Well, centerfield is deeper than right and left. In addition, *assist* rates at *both* corners are always higher than in center, in spite of the fact that centerfielders field many more BIP.
This is another example of how insights from DRA can complement and improve zone ratings. As far as I know, UZR tracks whether a BIP that falls in for a hit turns into a single, double or triple, as well as the average value of an out. But I don't believe that it tracks the different *baserunner advancement* value of the hit prevented or the play made, which *differs* by position. Yet another variable that could be added to a state-of-the-art zone system.
It hadn't occurred to me that this could be used to evaluate minor league defense. To my mind that's a more important implication than evaluating all-time greats. Of course, it is good to have multiple methods to evaluate defense since none of them is as yet that dependable.
AED, if we don't get to know how your system works do we get to know who AED is or do you, like the system, need to be cloaked in mystery? It's not that ridiculous to think that whoever comes up with the best new tool to evaluate defenses will be hired by a MLB club. It is the next big thing, right?
Thanks. Neither Edmonds nor The Big Hurt played 5 seasons of 130 or more games during the 1974-2001 period of the historical study.
Edmonds played four such seasons. '95 and '98 with Cal/Ana; '00 and '01 with St. Louis. The team ratings in CF for those seasons were +14, +7, 0, and -3. I believe he has battled a fair amount of injuries.
The Big Hurt had only three 130+ seasons: '92, '93, and '96. The ratings are -10, -12, and -4. The first two seasons he played 158 and 150 games; the third season he played 139. His back-ups probably raised the team rating slightly.
As explained in the article, the ratings at first are based only on context-adjusted assists. I explain in the article why (Saeger/James) Estimated Unassisted Putouts at First Base ("EUPO-3") are probably not reliable enough for evaluating *good* or *adequate* first basemen, but *are* useful for evaluating *terrible* first basemen, and I suspect that Frank Thomas' rating would be appropriately reduced below its already poor level if his EUPO-3 were calculated.
In addition, neither DRA nor Win Shares addresses the important factor of catching throws from infielders (i.e., preventing "infielder" (throwing) errors).
J. Cross,
Thanks for your support. DRA can, in theory, be applied to minor league stats. The extent to which minor league DRA ratings would "translate" into major league performance is another question. I tend to think they will, at least as well as batting statistics. I think that is the biggest potential benefit of DRA to a major league team, though I also think it is useful as a "back of the envelope" double-check on zone ratings.
I do think that good fielding systems--UZR, DRA, Diamond Mind--are actually pretty reliable now. The year-to-year correlation in such ratings for individual fielders is at least as high as the BABIP for *hitters*, and possibly as high as batting average (not sure which). Even if the "persistency" is only the same as BABIP for hitters, people routinely accept the significance of the BABIP component of batting performance without batting an eye (pun intended). It will take time, but fans will eventually accept well-designed fielding ratings, just as they have begun to accept OPS.
I'm not sure what you (MH) mean by "whether UZR incorporates into the value of a hit, baserunner advancement." Remember that UZR is context neutral - it only cares about the average value of a s, d, etc., including the average baserunning value of those hits. It does not measure a fielder's actual performance, vis-a-vis what the baserunner and out state is when a ball is hit. Does DRA account for this? If yes, then they are fundamentally different, although the differences will diminish with larger samples.
I am confused as to exactly how DRA can "add" to a UZR rating when that UZR rating is questionable for whatever reasons. I am not implying that it can't. It's just that, as I'm sure you know, it is hard to conceptualize a system that is based on a multiple regression analysis. In fact, it is hard to intuitively grasp how it can actually work and how it can be so precise.
While RF and third base UZR ratings may have the lowest year to year correlations (and hence, may be the most "unreliable" UZR ratings), I think that by far and away the least "accurate" are the OF ratings in general, and CF in particular, for obvious reasons. The principal reason is that many, if not most, fly balls in the OF are easily caught and many of them can be caught by more than one OF'er (and CF bears the biggest brunt of that problem), as well as by an infielder. UZR ratings in the IF are very straightforward, I believe.
Does DRA "pick up" the value of the first baseman taking throws from the IF'ers? UZR does not of course. Also, I don't know if you have them (if I made them publicly available), but it is useful, for various reasons, to have the breakdowns in UZR: the "error" portion of UZR as well as the "range" portion of UZR. I think you said you did not use the "arm" portion of UZR for outfielders. Did you use the DP portion of UZR for IF'ers? Does DRA include arms for OF'ers and DPs for IF'ers? I assume DRA includes errors as well as range for all fielders and they are lumped together (inextricable?).
Also, my catcher ratings are straightforward and are included in Super-lwts. I probably should include them in the UZR ratings although they have nothing to do with "zones." They include SB/CS, errors, and WP and PB's I think (maybe not WP's, I'm not sure). Do you have my catcher ratings from my UZR files? What does DRA "pick up" as far as the catcher ratings?
In conclusion, this is one of the greatest innovations in sabermetrics of all time - at least the equivalent of Palmer's linear weights, probably better. Assuming one has the requisite data (but not the zone data) and is able to use the formulas, no one should ever talk about anything but DRA when they are discussing a player's defensive skill and/or performance. Heck, it might even turn out that DRA is better than UZR even if you have the zone data. I am blown away!
I would love to see you do a quick and dirty estimate of how DRA changes as a player ages (you have to use the "delta" approach of course to avoid selective sampling), since you can use so much data. My research with limited data points suggests that SS, 2B and CF lose 2 runs per year of age almost from the get-go (an early age), sort of like a player's triples age curve, and that all other positions other than 1B lose 1 run per year (also from the get-go). First basemen appear to get better with age (1 or 2 runs per year), peak around age 30 something (I forget exactly), and then decline, but there could be much sampling error in this assessment.
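For readers unfamiliar with the "delta" approach mentioned above, here is a minimal sketch of one common reading of it: compare each player only with himself in consecutive seasons, then average those year-to-year changes by age. The player labels, ages and ratings are invented, the function name is hypothetical, and a real study would presumably weight each pair by playing time.

```python
from collections import defaultdict


def aging_deltas(seasons):
    """Average change in runs-saved rating from age a to age a+1.

    `seasons` is a list of (player, age, runs_saved) tuples. Only players
    with ratings at both ages contribute to a given age's delta, so the
    comparison is always a player against himself.
    """
    by_player = defaultdict(dict)
    for player, age, runs in seasons:
        by_player[player][age] = runs

    deltas = defaultdict(list)
    for ratings in by_player.values():
        for age, runs in ratings.items():
            if age + 1 in ratings:
                deltas[age].append(ratings[age + 1] - runs)

    return {age: sum(d) / len(d) for age, d in sorted(deltas.items())}


# Invented example data (NOT real ratings).
seasons = [
    ("A", 25, 10), ("A", 26, 8), ("A", 27, 6),
    ("B", 26, 0), ("B", 27, -2),
]

curve = aging_deltas(seasons)
print(curve)
```

In this toy data both players lose 2 runs per year, so every age shows an average delta of -2, even though simply averaging all ratings at each age would mix a good fielder with an average one.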
Let me try to address your last post in detail.
AED: Michael, the fact that few publish their systems with enough detail to really evaluate them (you and me included) makes it hard to tell. Personally, I can't really imagine what else these systems would be doing except for some sort of corrections to the stats from regression. But who knows, maybe we're the first!
MAH: Though I haven't provided all of the details of DRA, I have provided the basic "idea" behind DRA--which I have never seen in print or on-line before--as well as several key insights behind DRA (infield fly outs for pitchers). In addition, this article is, as far as I know, the first direct comparison of non-zone ratings with zone ratings at all positions (Mike Emeigh's article only covered results at short.) In that sense at least, I've provided more "detail" than *any* other non-zone system has ever provided (including Win Shares, DFT, you name it), enough for people to evaluate the basic plausibility of the system.
I think everyone here is completely in the dark about what your system is. You say that your rating system "actually sound[s] very much like" DRA, and that it is "also based on season stats rather than play-by-play data," but you don't really come out and say what, if any, role regression analysis has in your system. I don't understand your comment about how "these systems" (yours? mine? somebody else's?) could be a "correction to the stats from regression." There *aren't* any fielding "stats from regression" out there to "correct".
AED: "Perhaps I wasn't explicit, but I was comparing the per-season stats rather than the 3-year average, so the sample size is much larger (N=36). I have gone ahead and done the comparison with all infield positions. For the single-season infield stats (N=124), my correlation with UZR is 0.657 and yours is 0.613. So I think the shortstop numbers I initially checked (I had them handy because I had recently checked against the Jeter data Mike Emeigh wrote up a while back) showed a higher correlation with my rating than is representative, but there is still a difference between the two and I still find that we correlate with each other (0.798) way better than either of us correlates with UZR."
MAH: If anyone knowledgeable about sabermetrics had achieved such results, it seems very surprising that they wouldn't have realized just how much better such results were than Win Shares or DFT, and rushed to publish at least a summary description and results, or already sold the system to a team. (After all, Bill James and Clay Davenport make a lot of money selling their books.)
When you say (in your prior post) that you haven't used PBP data, do you mean that you haven't used zone data, or that you haven't used Retrosheet data? As I mention in the discussion of third base ratings, there is Retrosheet (non-zone) data that I could have used that would probably have significantly increased the accuracy of DRA, particularly at third and right field. I didn't because I wanted to use only stats available throughout the history of baseball.
In addition, I might have tried further adjustments to team DRA ratings at all of the positions to *force* them to add up to team runs allowed (as Bill James does with his Runs Created estimates in Win Shares), but I wanted to stay true to the principle of no subjective weights or factors, beyond the disclosed assumptions, or, as David Smyth calls them, "practical approximations".
Are your ratings denominated in *runs*? Or is it a "rate" stat? If I had focused just on providing a better "rate" or "plays made" estimate *without* having to make the system yield run-weights through a global regression of runs allowed onto pitcher and fielder plays made, I already know of another method (also not used before) to accomplish this. Although I haven't tested that approach, I'm pretty sure that it would provide a rate stat with even more accuracy--as measured by correlation. But the point of DRA was to provide run estimates at each position that add up to a team runs allowed estimate. Does your system do that?
AED: The DM'd UZRs are a nice concept, but setting the "wrong" UZR values equal to your rating undermines its value as a comparison for your rating system. Since those values are set by definition to equal your own rating, they give a spuriously high correlation.
MAH: The purpose of the UZR/DRA/Diamond Mind comparison was to provide readers with the best information available with which to assess DRA. There were a few UZR ratings that seemed "off" without any reference to DRA ratings. Given that, it made sense to look at the results of another good system to resolve differences between UZR and DRA. When Diamond Mind supported UZR, the "DM'd" UZR rating is left entirely alone. When Diamond Mind supports DRA, I'm not just setting UZR equal to my own rating--I'm effectively setting UZR equal to *Diamond Mind's* rating.
There was a part of the article left out by the editors in which I explained all of this. The key sentence was as follows:
"I’m just consulting one well thought-out, empirically supported, publicly available set of defensive evaluations, the content of which I must accept on faith (DM), in order to evaluate surprising results under *another* well thought-out, empirically supported, publicly available set of defensive ratings, the content of which I must accept on faith (UZR)."
AED: I agree that the scale and "adding up" properties are both important. But regardless, UZR is a sufficiently superior system that I never thought there would be much interest in "obsolete" systems using only traditional fielding stats. (You've proven me wrong in that regard!)
MAH: Thanks for acknowledging the "scale" and "adding up" points. I agree that UZR is a terrific system, but there are a lot of baseball fans who buy books (Win Shares/Abstract) because they'd like to know about pre-zone rating fielders. In addition, I think the article demonstrates that DRA is a good "back of the envelope" system for highlighting certain UZR ratings that are worth a second look.
Again, I'm simply baffled that someone who would go to the trouble of developing such a seemingly accurate system as yours would (a) not have determined (before the DRA article came out) how much more accurate it is than other non-zone systems, and (b) have no idea that it would be of any interest to fans. That, and the fact that (a) your "regression" point above doesn't make sense, (b) you're still an anonymous poster, (c) you haven't begun to describe even the basic approach of your system and (d) you haven't provided your ratings, very much makes me wonder. As I mention in the Introduction to the article, one of the key principles of DRA is that "everything has to add up." Something doesn't add up here.
You're right. It was Dwight's longevity that brought his overall rating down a little. Here are his 130+ game season ratings:
+13, [gap for 1977], +11, +18, +4, +15, -5, [gap for 1983--was he hurt?] -7, -2, -4.
Yes, one of the things I'm happy with is that DRA yields very few weird single-season results. When the Appendix of 1974-2001 single-season ratings is posted, you'll see further evidence of this.
Jason,
Haven't had the opportunity to run DRA on minor league data and test predictability of major league performance. I think it will work, because minor league BABIP is fairly consistent (though still lower than major league BABIP), and because hitting can be projected to some degree.
Thanks! It has been a lot of work, and praise from leading analysts such as you is much appreciated. As I hope I’ve made clear, DRA will never displace UZR; we need UZR to capture the many factors that can’t be tracked using traditional data.
Going through your many excellent questions—
(1) Baserunner advancement value. I think what UZR does is track the singles, doubles and triples “allowed” by a fielder in his zones, as well as the outs recorded by a fielder in his zones. I may be misunderstanding the UZR method, but in translating hits allowed/outs created into *runs*, the *average* value of a single / double / triple / out *anywhere* in the field is used. What I was suggesting is that a single hit through the hole on the right side of the field is probably slightly more damaging to a defense than a single hit through the hole on the left side, because of the greater likelihood that a runner can go from first to third. The only reason this issue even occurred to me is that the average play made on the right side of the field has a slightly (just slightly) higher regression weight than a play made on the left. Similarly, the average DRA value of an out recorded in centerfield might be less than in the corners because baserunners are more likely to advance. To consider an extreme situation, a very deep high fly to centerfield in a zone that a good fielder might reach but a poor fielder might not would cause baserunners to “hold back” in anticipation of the ball being caught (thus decreasing the *negative* impact if the ball “falls in”), but would also let them tag up (and advance) if the ball *is* caught (thus decreasing the *positive* impact if the ball is caught). The “spread” of impact per potential play (the “stakes”, as it were) might be lower, thus explaining why regression weights are lower in center than in the corners.
(2) How DRA can complement UZR ratings. In the absence of reliable non-zone systems, it is difficult to assess very high or low UZR ratings. DRA provides a simple alternative rating. If the two ratings differ, it might suggest that it’s worth double-checking how the many factors tracked under UZR impacted the rating. Maybe, after such a second look, the park factor, or the interaction between outfielders (shared zones), or something else might be worth adjusting. Or it might be the case that the UZR rating, after such a close second look, appears sound, in which case the DRA rating is probably incorrect, as it is based on limited data.
(3) How DRA works. Maybe the way to think about it is that defense is the mirror (a very imperfect mirror) of offense. Regression analysis provides good estimates of the value of “positive” events (singles, doubles (well . . .), triples, home runs etc.). DRA provides estimates of the value of “negative” events—each play made is a hit prevented. DRA uses regression analysis to find how things such as left-handed pitching, GB/FB pitching, baserunners, etc., impact plays made at each position, after an adjustment is already made for BIP. DRA then regresses runs-allowed onto *all* context-adjusted plays made (including pitcher “plays made” such as SO, BB, HR, estimated infield fly outs) to find the average value, in runs, of each such context-adjusted play made. Now there are a number of techniques not described in the article that make the system work, which may explain why it’s hard to see exactly how the system can be so precise. But all of these techniques are also simple and theoretically sound. I *greatly* appreciate that readers have been willing to consider the merits of the system without the benefit of all of the details. If I ever write the book (and I do think that is much more likely to happen than my doing something with a major league team), I’m sure that people will be pleasantly surprised by the precise mechanics.
(4) UZR ratings. As I mention in the article, the fact that RF and third base UZR ratings had the lowest year to year correlations might just be the result of the small sample. I agree that outfield ratings are probably the hardest to get right, because there are many more outfield plays that could be made by two outfielders. UZR infield ratings are good, and are in most cases better at third base in the 1999-2001 study than DRA ratings.
(5) Specific ratings points. DRA doesn’t capture the ability of first basemen to prevent throwing errors, as mentioned in the survey of historical first base ratings.
Maybe we should talk about the need to track errors and plays not made separately. Tango’s run weight for getting on base on an error is only .02 runs greater than a single, so I have a hard time seeing the need to track errors if you have a complete UZR record of plays made and total opportunities. Errors (except at pitcher and right field) had no statistically significant relationship with runs allowed (per the global DRA regression) *after* taking into account context-adjusted plays made, so they are *not* included in ratings. This makes sense, to me at least, because errors are *already* accounted for in terms of reduced context-adjusted plays made. Errors are, for the most part, a duplicative stat. However, outfield throwing errors are probably worse than not making the throw at all.
I do use the DP portion of UZR for infielders. I didn’t use outfielder arm ratings in the 1999-2001 survey because I thought (though I never checked it out) that they would likely add too much “noise”, as the ratio of standard deviation to mean for outfield assists is enormous compared with outfield putouts and infield assists. Outfielder arm ratings are included in the historical ratings, and don’t appear to make much difference, except for Barfield.
I didn’t have your catcher ratings in the UZR files forwarded to me from Tango. DRA catcher ratings shown in the article are based only on context-adjusted assists, stolen bases allowed, and WP/PB/BK. They don’t pick up anything that isn’t already better measured under UZR and Tango’s system. I just wanted to show how accurate ratings could potentially be for pre-1970 ratings.
(6) Effect of age on DRA ratings. When the Appendix is released, it will show the single season ratings for players in 1974-2001. Whether that sample of players will be large enough to do even a quick and dirty aging analysis, I don’t know. At a glance, however, you can see that fielders decline with age, but a fair number of infielders seem to maintain their value, and, yes, more than a few first basemen seem to get better with age, though that might be because they get too lazy and make more Buckner elections.
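To make point (3) above a little more concrete, here is a minimal sketch of the two-stage idea: first regress plays made onto context variables and keep the residual as "context-adjusted plays made", then regress runs allowed onto those adjusted plays to get a run weight. Everything in it is an assumption for illustration only: the data are synthetic, the variable names (`bip`, `lhp_share`, `gb_fb`, `skill`) and all coefficients are made up, and it is not the actual DRA specification, whose details are not given in the article.

```python
import numpy as np

rng = np.random.default_rng(0)
n_teams = 200  # synthetic team-seasons (made-up sample size)

# Synthetic context variables; all values are invented for illustration.
bip = rng.normal(4400, 150, n_teams)        # balls in play
lhp_share = rng.uniform(0.1, 0.4, n_teams)  # share of innings by left-handers
gb_fb = rng.normal(1.2, 0.15, n_teams)      # staff ground ball / fly ball ratio
skill = rng.normal(0, 15, n_teams)          # "true" plays above average at short

# Observed shortstop assists = context effects + fielder skill.
ss_assists = 0.11 * bip + 60 * lhp_share + 80 * gb_fb + skill


def residuals(y, *xs):
    """Ordinary least squares of y on xs (with intercept); return residuals."""
    X = np.column_stack([np.ones_like(y), *xs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


# Stage 1: strip out context to get context-adjusted plays made.
adj_plays = residuals(ss_assists, bip, lhp_share, gb_fb)

# Stage 2: regress runs allowed onto adjusted plays and a pitcher "play made".
pitcher_so = rng.normal(0, 60, n_teams)  # strikeouts above average
runs_allowed = (750 - 0.75 * skill - 0.27 * pitcher_so
                + rng.normal(0, 10, n_teams))
X2 = np.column_stack([np.ones(n_teams), adj_plays, pitcher_so])
weights, *_ = np.linalg.lstsq(X2, runs_allowed, rcond=None)

# More plays made means fewer runs allowed, so the coefficient is negative;
# flip the sign to express the rating as runs saved relative to average.
ratings = -weights[1] * adj_plays

print("estimated runs per adjusted play:", round(float(weights[1]), 3))
print("corr(adjusted plays, LHP share):",
      round(float(np.corrcoef(adj_plays, lhp_share)[0, 1]), 6))
```

In this toy setup the stage-one residuals are uncorrelated with the context variables by construction, which is the same property the low cross-correlations are meant to deliver before the run-weight regression is run.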
Many thanks again.
MGL: Essentially, MAH is saying to use the linear weight value by zone. That is, what's the change in run expectancy on 1b/2b to 9-deep, or 7-short, or 78M, etc, etc. We obviously know that an IF single is worth far less than an OF single. Well, how about a LF single compared to a RF single?
It's kind of trivial to figure out, if you know how often a runner goes from 1b to 3b on a single. For example, it's about 30-35% of the time overall. I'll guess it's 20% to LF and 50% to RF. So, a single to LF is worth:
-.15 less than average times
30% of the time that a runner is on 1B
or about -.045 runs less than average.
So, if an OF single is worth .50 runs, then a LF single is worth .45 runs.
For an IF single, it would be:
-.30 less than average (i.e., no runner goes to 3b on an IF single) times
30% of the time that a runner is on 1b
or -.09 runs less than average.
So, an IF single is worth .41 runs.
In any case, going back to the OF, if we are talking about, say, 100 singles to LF, then we would be off by about 4 or 5 runs by using an average run value for the OF, as opposed to a LF-specific value.
You would gain further accuracy by figuring out where in LF the ball was hit, and what the lwt run value is.
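The back-of-the-envelope arithmetic above can be written out as a short sketch. It follows the shortcut used in the post, treating the difference in first-to-third advancement rate directly as a run adjustment; the function name and constants are hypothetical labels for the guessed figures quoted above, not measured values.

```python
def single_run_value(avg_single_runs, advance_diff, runner_on_first_rate):
    """Run value of a single after adjusting for first-to-third advancement.

    advance_diff is the guessed difference from the overall advancement
    rate, used directly as a run adjustment (the shortcut in the post).
    """
    return avg_single_runs + advance_diff * runner_on_first_rate


AVG_OF_SINGLE = 0.50     # assumed average run value of an outfield single
RUNNER_ON_FIRST = 0.30   # assumed share of singles with a runner on first

lf_single = single_run_value(AVG_OF_SINGLE, -0.15, RUNNER_ON_FIRST)
if_single = single_run_value(AVG_OF_SINGLE, -0.30, RUNNER_ON_FIRST)

# Error from using the outfield-average value on 100 left-field singles.
error_per_100_lf = 100 * (AVG_OF_SINGLE - lf_single)

print(f"LF single: {lf_single:.3f} runs")
print(f"IF single: {if_single:.3f} runs")
print(f"Error over 100 LF singles: {error_per_100_lf:.1f} runs")
```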
The reason why it is helpful to separate "errors" UZR and "range" UZR is that, one, it gives a person a better notion of a fielder's defense, e.g., "he has good range but unsteady hands," etc. Two, if you want to incorporate or augment a UZR rating with, say, a subjective evaluation, like from DM, it is also helpful to know what agrees with what. In other words, if DM says that Ordonez only has slightly above average range at SS, and Ordonez's UZR is +30 runs, it might seem as if UZR is wrong. If, however, those +30 runs are +20 in "errors" (he made very few errors) and +10 in range, then the +30 sounds more reasonable. Finally, it is possible that errors and range have a different skill/luck component, such that in taking a sample UZR (or DRA) and translating it into a UZR projection or estimate of true defensive talent, one might need to use a different regression coefficient for the error component than for the range component of UZR (or whatever metric). My suspicion is that there is more of a "luck" component in a fielder's error rate, thus a "higher" regression. An anecdote illustrating this notion is R. Ordonez's UZR jumping all over the place (his error rate is the one jumping; his range UZR is staying fairly constant).
Last question. Do you think that deriving the linear coefficients in your as yet unseen formula (I assume it is similar to Palmer's offensive linear weights formula, with more terms) from empirical data or simulations would yield more accurate results? In fact, isn't that usually the case when there are some cross correlations in the variables? For example, Palmer supposedly used a simulation to come up with his coefficients (values of each of the offensive events). Jarvis uses a regression analysis, and others, including myself, use an empirical analysis, using the RE tables to calculate average changes in state. Others use a Markov chain model. It appears to me that the regression analysis, like that of Jarvis, yields the least accurate coefficients for various reasons, one of which is the problem of cross-correlation and non-linearity of the regression lines, as you alluded to in your first installment, I think...
Right now all I have to evaluate you on are Defensive Win Shares, and Ozzie beats you at short, at least on a career basis. But you were without any doubt one of the top five players of all time, and I agree that you certainly outshone your peers more than Mike (Schmidt) did, and would have acquitted yourself quite well today if you had the same lifelong dietary, medical and conditioning advantages that today's players do.
Depot,
Again, the part of the article that explained why I consulted Diamond Mind was deleted by the editor. In brief, it's the old idea of "two heads are better than one", and Diamond Mind is, I think, probably the best evaluator of fielding excluding UZR. As I mentioned in the deleted part of the article (and repeated to a prior poster):
"I’m just consulting one well thought-out, empirically supported, publicly available set of defensive evaluations, the content of which I must accept on faith (DM), in order to evaluate surprising results under *another* well thought-out, empirically supported, publicly available set of defensive ratings, the content of which I must accept on faith (UZR)."
Tango,
Thanks for the comment. I agree that weighting hits by zone won't make a huge difference, but the idea is a good example of how DRA can yield new insights that can improve zone ratings.
MGL,
I've posted my e-mail address above. Unfortunately, I won't have time to do more catcher ratings anytime soon. Take a look at the I-Rod / Piazza comparison and let me know what you think.
The regression coefficients under DRA could be subject to slight distortion from cross-correlation, but as explained in the second installment, the adjustments I've made have reduced cross-correlations to very low levels--never more than .2 (except for one stat not impacting ratings) and usually .1. Unadjusted pitching and fielding stats often had correlations above .6. The standard errors in the run weight regression coefficients were all less than .03.
The format of the DRA equations for infielders is the following:
[run weight] * [Assists +/- BIP adjustment +/- (regression weight)*LHP adjustment +/- (regression weight)*GB/FB adjustment +/- (regression weight)*baserunner variable adjustment]. The resulting rating is centered to the league average to yield runs saved/allowed relative to the league average. The LHP, GB/FB and baserunner variables all come from publicly available data and involve simple add/subtract/multiply and divide arithmetic formulas.
For outfielders:
[run weight] * [Putouts +/- BIP adjustment +/- (regression weight)*LHP adjustment +/- (regression weight)*GB/FB adjustment] + [run weight] * [Assists]. The resulting ratings for putouts and assists are each centered to the league average to yield runs saved/allowed relative to the league average.
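As a rough illustration of the two equation formats above, the sketch below evaluates them for a couple of made-up players. The dictionary keys, weights and player numbers are all hypothetical, and because the signs of the "+/-" terms are not spelled out here, each adjustment is assumed to arrive already signed. Centering is done by subtracting the mean of the group passed in, standing in for the league average.

```python
from statistics import mean


def center(values):
    """Subtract the group mean so ratings are relative to average."""
    avg = mean(values)
    return [v - avg for v in values]


def infielder_ratings(players, run_weight, w):
    """[run weight] * [assists + signed context adjustments], centered."""
    raw = [
        run_weight * (p["assists"] + p["bip_adj"]
                      + w["lhp"] * p["lhp_adj"]
                      + w["gbfb"] * p["gbfb_adj"]
                      + w["runners"] * p["runner_adj"])
        for p in players
    ]
    return center(raw)


def outfielder_ratings(players, po_run_weight, assist_run_weight, w):
    """Putout and assist parts are each centered, then summed."""
    po_part = center([
        po_run_weight * (p["putouts"] + p["bip_adj"]
                         + w["lhp"] * p["lhp_adj"]
                         + w["gbfb"] * p["gbfb_adj"])
        for p in players
    ])
    a_part = center([assist_run_weight * p["assists"] for p in players])
    return [po + a for po, a in zip(po_part, a_part)]


# Hypothetical regression weights and players, for illustration only.
weights = {"lhp": 0.5, "gbfb": 1.0, "runners": 0.5}
shortstops = [
    {"assists": 480, "bip_adj": -10, "lhp_adj": 5, "gbfb_adj": -8, "runner_adj": 2},
    {"assists": 450, "bip_adj": 6, "lhp_adj": -5, "gbfb_adj": 8, "runner_adj": -2},
]
center_fielders = [
    {"putouts": 350, "assists": 12, "bip_adj": 0, "lhp_adj": 0, "gbfb_adj": 0},
    {"putouts": 330, "assists": 8, "bip_adj": 0, "lhp_adj": 0, "gbfb_adj": 0},
]

ss = infielder_ratings(shortstops, run_weight=0.75, w=weights)
cf = outfielder_ratings(center_fielders, po_run_weight=0.8,
                        assist_run_weight=1.0, w=weights)
print("SS runs saved:", [round(r, 3) for r in ss])
print("CF runs saved:", [round(r, 3) for r in cf])
```

Note how the 30-assist raw gap between the two shortstops shrinks to five context-adjusted plays once the adjustments are applied, which is the point of adjusting before applying the run weight.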
DRA uses regression analysis precisely because change-in-s