Defensive Regression Analysis
Michael introduces his new method for evaluating defensive play. Part 1 of 3.
"As late as June 4, 2002, . . . there were still big questions about baseball crying out for answers; a baseball diamond was still a field of ignorance. . . . No one had established to the satisfaction of baseball intellectuals exactly which part of defense was pitching and which fielding, and no one could say exactly how important fielding was. No one had solved the problem of fielding statistics." Michael Lewis, Moneyball, p. 98 (W. W. Norton & Co. 2003).
Thanks to the efforts of many people, a solution to the problem of estimating the run-impact of fielding using traditional fielding and pitching statistics, and evaluating the run-impact of pitching independent of fielding, may be within reach. In this article, I would like to introduce a new system that provides runs-saved estimates for pitchers and fielders using traditional statistics publicly available throughout the history of baseball; in other words, without recourse to non-public "zone" data, which has been compiled since the late 1980s.
The system is called Defensive Regression Analysis ("DRA"). (The acronym conveniently rhymes with ERA; fans unfamiliar with regression analysis can think of it as Defensive Runs Analysis. I'll provide a simple description of regression analysis in Part I below.) To the best of my knowledge, DRA is the first defensive system that
(i)	deals with pitching and fielding simultaneously, i.e., on a fully integrated basis,
(ii) 	does not rely on any subjective weights or factors,
(iii) 	uses only publicly available statistics in existence throughout the history of baseball (e.g., no Caught Stealing or Sacrifice Hits Allowed), and
(iv) 	estimates, using simple “linear weights” equations (similar in form to Linear Weights equations used for evaluating hitters in the official baseball encyclopedia, Total Baseball), the runs saved (allowed) by pitchers and fielders (a) relative to the league average and (b) independently of each other.
A team’s pitcher and fielder DRA ratings add up to the team DRA rating, i.e., an estimate of the number of runs the team should have allowed. The standard error of such estimate in the 1974-2001 study used to create DRA, 19.7 runs, is less than the standard error for regression analysis models for team runs scored, as well as those of all other systems for evaluating offense described in the recent book co-authored by two former Chairs of the Sports Section of the American Statistical Association. See Albert and Bennett, Curve Ball: Baseball, Statistics, and the Role of Chance in the Game (Copernicus Books 2001) (hereinafter, "Curve Ball"). In other words, the "parts" under DRA "add up" as well or better than the "parts" under Linear Weights or Runs Created or any other system for batting and baserunning of which I am aware.
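To make the idea of a "standard error of estimate" concrete, here is a minimal sketch of how such a figure could be computed from actual and estimated team runs allowed. The function name, the optional degrees-of-freedom adjustment, and the four team-seasons are all hypothetical illustrations; they are not DRA's equations or data.

```python
import math


def standard_error_of_estimate(actual, estimated, n_params=0):
    """Square root of the summed squared residuals divided by (n - n_params).

    n_params is the number of fitted coefficients to subtract as degrees of
    freedom; the default of 0 gives the plain root-mean-square error.
    """
    if len(actual) != len(estimated):
        raise ValueError("actual and estimated must have the same length")
    dof = len(actual) - n_params
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    sse = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    return math.sqrt(sse / dof)


# Hypothetical team-seasons: actual runs allowed vs. the sum of each team's
# pitcher and fielder estimates (numbers invented for illustration only).
actual_runs = [700, 650, 800, 720]
estimated_runs = [710, 640, 780, 730]

print(round(standard_error_of_estimate(actual_runs, estimated_runs), 1))
```

A smaller value means the individual estimates, once summed to the team level, land closer to what teams actually allowed.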
What is probably most exciting about DRA is that it provides individual fielder runs-saved estimates that match up well with runs-saved ratings derived from proprietary zone data, such as Mitchell Lichtman's Ultimate Zone Rating ("UZR") system (available at baseballprimer.com) and zone-based evaluations posted by Diamond Mind ("DM") on its website. (Disclaimer: I have neither sought nor received any endorsement or any other support from Diamond Mind. The DM evaluations that I cite are all publicly available on its webpage. My interpretations of DM evaluations are entirely my own.) DRA ratings have the same "scale" as UZR ratings, and an over 0.8 correlation with UZR ratings, when the latter are adjusted to incorporate DM evaluations.
Zone-type ratings, as you're probably aware if you're reading this, are based on highly detailed records of the actual number of balls hit into each of approximately 80 "zones" on the field. Zone data began to be collected in the 1980s because of the fundamental problem with traditional fielding statistics: they provide no direct record of fielding opportunities. Judging fielders based on gross plays made (e.g., the total number of fly balls caught by an outfielder) was therefore no more reliable (actually far less reliable) than judging batters by the gross number of their hits and walks. Zone ratings essentially compare the number of plays made by the fielder with the number the league-average fielder would make, given the same number and pattern of balls hit into his zones. UZR converts the "plays made" numbers into runs saved, based on the change in expected runs allowed if a given play is made or not made; i.e., depending on whether the play made, on average, prevents a single or extra-base hit.
Provided in Part II are position-by-position charts of DRA and UZR runs-saved ratings for all 82 players who played at least two seasons full-time (130 or more games) at one position during the three-year time period (1999-2001) for which I had access to both UZR and DRA data. I'm not aware that any other non-zone fielder rating system has ever been compared with zone ratings in as systematic a way.
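The zone-rating idea described above can be sketched in a few lines. This is only a simplified illustration of the general concept, not the actual UZR or Diamond Mind calculation; the `ZoneRecord` fields, the league conversion rates, and the per-play run values are all assumed for the example.

```python
from dataclasses import dataclass


@dataclass
class ZoneRecord:
    balls_in_zone: int      # balls hit into this zone while the fielder played
    plays_made: int         # outs the fielder recorded on those balls
    league_out_rate: float  # share of such balls a league-average fielder converts
    runs_per_play: float    # assumed run value of turning a hit into an out here


def plays_above_average(zones):
    """Plays made minus the plays an average fielder would make on the same balls."""
    return sum(z.plays_made - z.balls_in_zone * z.league_out_rate for z in zones)


def runs_saved(zones):
    """Weight each zone's plays above average by that zone's run value."""
    return sum(
        (z.plays_made - z.balls_in_zone * z.league_out_rate) * z.runs_per_play
        for z in zones
    )


# Two invented zones: a routine one where a missed play costs roughly a single,
# and a deep one where a missed play is more likely an extra-base hit.
zones = [
    ZoneRecord(balls_in_zone=100, plays_made=85, league_out_rate=0.80, runs_per_play=0.75),
    ZoneRecord(balls_in_zone=40, plays_made=10, league_out_rate=0.30, runs_per_play=1.00),
]

print(plays_above_average(zones), runs_saved(zones))
```

The point of the per-zone bookkeeping is that the fielder is judged against his actual opportunities rather than against a gross total of plays.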
Although you will have to be the ultimate judge, my sense is that the average rating per player over that time period under DRA essentially matches up with the UZR rating and/or DM evaluation close to 95% of the time. In other words:
Given two or three years of publicly available data, DRA provides a reliable estimate of whether a fielder is basically average (+/- half a dozen runs per season), meaningfully above or below average (+/- a dozen runs a season), or exceptionally above or below average (+/- two dozen runs per season).
I am not promising exact matches. For example, DRA rates Nomar Garciaparra as a +5 (runs saved) shortstop per season in 1999-2000, whereas UZR rates him at -6. Given the imperfection in statistics, including UZR data, as well as the practical significance of runs-saved numbers of that relatively modest magnitude, I view that as an acceptable match, meaning that both systems identify Nomar as an essentially average-fielding shortstop. The relevant findings under DRA (and supported by UZR and DM) would include, for example, that Derek Jeter is costing the Yankees enough runs to justify moving him to third base, and that Pokey Reese was saving his teams so many runs at second base (at least in 1999-2000) that it made sense to let him play, even though he is a weak hitter.
I think it fair to say that out of the 35 players with three years of data, there is only one clear and significant error, and out of the 47 players with only two years of data, there are only four clear errors. In addition, the errors are "conservative". By that I mean there are no false "positives" or "negatives" in the study; DRA might fail to identify a good fielder (in the study, it did not fail to identify any bad fielder), but it doesn't rate fielders in the study as meaningfully good or bad who aren't good or bad. Though DRA might be slightly more conservative than UZR, the runs saved (allowed) ratings under DRA nevertheless have approximately the same "scale" of impact as UZR ratings: the highest DRA ratings are as high as (and no higher than) the highest UZR ratings; the lowest DRA ratings are as low as (and no lower than) the lowest UZR ratings.
One simple way of quantifying the overall match between DRA and UZR/DM is to "regress" the UZR average ratings per player onto the corresponding DRA ratings. Regression analysis reveals that the average DRA rating of players in the study has, on average, almost precisely the same "scale" as the average UZR rating (the "coefficient" for DRA under the regression equation is nearly 1.0, actually, 0.95), and a correlation of 0.7. When, for reasons explained in Part II.A.4, UZR ratings are adjusted to account for DM commentary, admittedly in a somewhat subjective manner, the correlation improves still further, to slightly over 0.8. (All of the above results are provided in detail at the very end of Part II.)
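For readers who want to see what "regressing" one set of ratings onto another involves, here is a minimal sketch of a one-variable least-squares fit together with the correlation coefficient. The player ratings are invented for illustration and are not taken from the study; the function name is likewise hypothetical.

```python
import math


def regress(x, y):
    """Ordinary least squares of y on x: returns (slope, intercept, correlation)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both sequences must vary")
    slope = sxy / sxx                      # the "coefficient" on x
    intercept = mean_y - slope * mean_x
    correlation = sxy / math.sqrt(sxx * syy)
    return slope, intercept, correlation


# Invented average runs-saved ratings for four hypothetical players.
dra = [-10, 0, 10, 20]   # explanatory variable
uzr = [-8, -2, 12, 18]   # variable being regressed onto DRA

slope, intercept, r = regress(dra, uzr)
print(round(slope, 2), round(intercept, 2), round(r, 2))
```

A slope near 1.0 says the two systems spread players over about the same range of runs; the correlation says how tightly the individual ratings track each other.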
Unlike zone systems, DRA can provide ratings for players throughout the history of baseball. Part III (and the Appendix to Part III posted as an Excel file) provides DRA ratings of all players who played at least 130 games at a single position for at least five seasons anytime between 1974 and 2001, the time period for which I had convenient access to the relevant data. I hope you will spend some time looking over these ratings, as they demonstrate even better than the three-year DRA-UZR-DM comparison the basic reliability of the system. In the majority of cases, players peak in their youth. The year-to-year ratings show remarkable consistency, including when players change teams. DRA ratings drop, sometimes sharply, after players are injured. There are almost no weird run-saving values: historically high assists and putout totals do not result in absurdly high ratings. Great as he was, and notwithstanding his historic assists totals, Ozzie Smith was never "saving" 50 runs a season, even at his peak; 20 runs a year was more like it. What made Ozzie probably the greatest shortstop in history (and, therefore, probably the greatest fielder in history) is that he consistently performed at or near that level for about ten years, before declining to an average level of performance.
Although I will provide a description of the principles and methodology of DRA, as well as the results of several diagnostic tests, the "linear weights" equations per position are not provided. (The format of the equations, however, is shown in Part I.A.) I am currently approaching several major league organizations regarding DRA, which can serve as a simple tool for double-checking zone ratings (which, due to their computational complexity, have something of a "black box" quality) and for evaluating minor league fielders (for whom zone data is unavailable). When I say that DRA is a simple tool, I mean that it is far, far simpler than any other pitching and fielding system (zone or non-zone) that has been described in print or a public Internet forum, either fully or in general terms. All of the equations can fit on one page, and although most elements of the equations are completely novel, they would make immediate, intuitive sense to any serious baseball fan after a brief explanation.
I can appreciate that it is somewhat difficult to place much faith in a system when not all of the details about it are available, although most fans seem to feel comfortable accepting UZR and DM ratings without having access to the data and all of the calculations under those systems. In the case of UZR, Mitchell Lichtman has done an excellent job of explaining the core "concept" behind zone ratings and the factors that must be considered in using zone data intelligently. Fans, therefore, feel comfortable "buying into" the system, even though the underlying data is proprietary. Tom Tippett at Diamond Mind is also very clear about the factors he considers in evaluating fielders although, again, his data is not publicly available. My hope here is that
(i) 	the description of the general principles and methodology of DRA will reassure you that the core concepts of the system are sound and that great care has been taken in making the system work,
(ii)	the results under DRA, especially how the ratings in the 1999-2001 data set compare with UZR and DM, will instill confidence in the system and pique interest in learning more about it, and
(iii)	the imperfection of the output will reassure the skeptics among you that the ratings were not just cooked up.
Should it be the case that DRA is of more interest to fans than major league organizations, I would welcome the opportunity to publish the DRA equations in a book that rates the greatest fielders throughout major league history.
Were I to reveal the DRA equations here, this article could be condensed to about ten pages, and anyone who reads Bill James, Pete Palmer, Dick Cramer or TangoTiger would "get it" and endorse it immediately. However, I'm trying to make the case for DRA indirectly, so I have to bring more subtle and lengthier arguments to bear. As I'm using statistical techniques and approaches never tried before, some amount of background information about them is in order. Most importantly, I'm ultimately relying on the output of the system to make the case for the system, and discussing that output is a lengthy task.
* * *
Before we go any further, I'd like to thank several people.
Dick Cramer reviewed several earlier versions of DRA, and was the first person to appreciate the underlying logic of the method. Without his encouragement, I doubt I would have had the patience to keep working on DRA. In addition, Dick's research supports certain key assumptions under DRA. Finally, as a founder of STATS, Inc., Dick made possible the compilation of the data against which DRA could be tested. I'd like to thank Neal Traven at SABR for forwarding my first article about DRA to Dick.
Chris Dial, Mike Emeigh, Mitchell Lichtman ("MGL"), Charlie Saeger, and Tom ("TangoTiger") at baseballprimer.com have all provided key information and insights. Charlie developed years before anyone else many fundamental ideas for fielding evaluation. Throughout this article, I frequently use Charlie's terminology of "context-adjusted" fielding plays, although our systems are different. MGL has provided an invaluable service to the sabermetric community by publishing the results and basic methodology of his UZR system. Mike Emeigh's fielding articles helped me see a way of improving UZR, and MGL graciously accepted the suggested changes, which Chris and TangoTiger had previously suggested as well. Although there will be times when I will point out certain UZR ratings that might still be a little "off", UZR is fundamentally reliable, DRA isn't perfect either, and without the work and insights of Chris, Mike, MGL and Tango, there wouldn't be a system against which I could test DRA. Tango also provided team-level UZR data that helped me make the most recent and very significant improvement in DRA. I'd also like to thank whoever is or was responsible for putting together the baseball-reference.com website, which is extremely well-designed.
I owe an enormous debt to all of the people who have contributed to Retrosheet, as well as to John Jarvis, both for his creative articles and for putting together team-level Retrosheet data in an easy-to-use format on his website, which was the primary source of data for DRA. As will be explained in Part IV.C, DRA would not have been possible without Retrosheet data, but can nevertheless be applied to seasons in baseball history for which we as yet do not have such data.
Pete Palmer is the class act of sabermetrics, and has been an invaluable help to me in understanding the quality and scope of statistics throughout baseball history.
Steve Pappaterra is a great and good friend who first introduced me to the work of Bill James twenty years ago.
Bill James wrote me a kind and encouraging e-mail when I sent him an over-long and under-organized letter in which I tried to describe an early version of the DRA method. Although DRA differs significantly from Bill's defensive Win Shares system, I was inspired to work on this project after reading his Win Shares book. Moreover, I would never have been able to develop DRA had I not learned from Bill that ". . . fielding statistics make vastly more sense if you look at them from the top down than they do if you look at them from the bottom up. * * * To make sense of fielding statistics, sometimes you have to start with what the team accomplished, then ask how they accomplished that, and only then work toward the question of which player gets credit for that success." Win Shares, p. 11. DRA is founded upon the basic principle, introduced in Win Shares, that everything has to add up.
Posted: November 04, 2003 at 06:00 AM