— Where BTF's Members Investigate the Grand Old Game
Tuesday, November 11, 2003
Defensive Regression Analysis - Part 2
Michael explains the methodology behind DRA, step-by-step.
PRINCIPLES AND BASIC METHODOLOGY OF DRA
A.	A "Bare-Bones" Summary
DRA is, as far as I know, the first system to use regression analysis to evaluate fielding and pitching, or at least the first system to use regression analysis so comprehensively. For the sake of brevity, I'm going to provide a two-page summary of the DRA "process", with particular focus on one position (shortstop), and rely on some technical terminology that might be a bit daunting for some of the readers I'm trying to reach; I'll explain the technical terminology in Part I.B immediately below. Under DRA, we
(i)	Collect team-level defensive data of a sufficient sample size
(e.g., the runs allowed, strikeouts generated, walks allowed, shortstop assists, etc., per team; DRA was developed using team-level data for all teams in 1974-2001);
(ii)	"adjust" each defensive variable for each team (strikeouts, shortstop assists, etc.) using simple arithmetic formulas, so that each such event is at least arithmetically independent of the others
(e.g., determine the number of (a) strikeouts generated by the team's pitchers, taking into account the number of batters facing the team's pitchers ("BFP"), not innings pitched, thus yielding "adjusted SO" ("aSO"), (b) assists by the team's shortstops, taking into account the number of batted balls in play allowed by the team's pitchers ("BIP"), not innings played, thus yielding "adjusted" A6 ("aA6"), and (c) the number of shortstop errors, taking into account the number of chances, or "aE6");
(iii)	"regress" the arithmetically adjusted team-level fielding plays data (e.g., arithmetically "adjusted" assists at short, or "aA6") in the sample "onto" all the contextual variables derivable from publicly available data that we think might have an impact on the distribution of BIP, the opportunity to record assists on double play pivots, and the positioning of shortstops for purposes of fielding BIP
(e.g., regress aA6 "onto" pitcher variables, such as variables measuring BIP against left-handed pitching and a variable estimating the relative tendency of the team's pitchers to generate ground balls or fly balls, but also variables such as an estimate for runners on first, which impact positioning and double play opportunities);
(iv)	eliminate the variables that do not have a statistically significant impact on the number of assists made at shortstop, and re-run the regression;
(v)	treat the "residuals" of the final, stripped-down, regression analysis result as the "skill" plays made at shortstop, i.e., the regression-"adjusted" aA6, or "aaA6"
(e.g., if the regression result indicates that each marginal BIP against left-handed pitchers increases aA6 by 0.0150, and the team had 1000 "excess" BIP allowed by left-handed pitchers, we decrease the team's aA6 by 15 to reveal the shortstop plays not "given" to the shortstops by a high level of left-handed pitching, in order to provide a better estimate of fully context-adjusted shortstop assists, or aaA6);
(vi)	repeat steps (ii) through (v) for plays-made data at all of the other fielding positions (including pitcher and catcher), thus yielding the "context-adjusted" plays made at all of the positions (e.g., aaA1, aaA2, aaPO7, aaA7, etc.);
(vii)	"regress" team-level runs-allowed data "onto" team-level context-adjusted pitching data (e.g., aSO, aBB, aHR, aWP, etc.) and the team-level context-adjusted plays made in the field determined in steps (ii) through (vi) (e.g., aaA6, aE6, aaPO7, aE7, etc.);
(viii)	eliminate all fully context-adjusted variables that do not have a statistically significant impact on runs allowed (e.g., all errors, except at pitcher and right field) and re-run the regression; in order to
(ix)	determine the average value, in runs, of each context-adjusted play made
(e.g., the average runs-saved value of each aSO, aHR, aaA6, etc.).
A team's defensive rating at a position is derived from the number of context-adjusted plays made at such position (determined in part through regression analysis), e.g., aaA6, multiplied by the average run-value of each such play (determined through regression analysis).
The formulas for each position (including pitcher) thus resemble (and are no more complicated than) Pete Palmer's "Linear Weights" equation for batters found in Total Baseball. For example:
aaA6 = A6 (adjusted arithmetically for BIP) +/- A*Factor A +/- B*Factor B
+/- C*Factor C, etc.
(where the A, B and C "weights" are derived from regression analysis and contextual Factors A, B and C are derived from publicly available information).
The rating at shortstop equals aaA6, multiplied by the regression-calculated weight in runs for each aaA6, adjusted so that the rating is +/- the league average runs-saved rating at shortstop.
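As a rough illustration of the shortstop formula above (a sketch, not the DRA implementation itself), the back-out and the conversion to runs can be written in a few lines. Only the 0.0150 weight and the 1000 excess BIP come from the summary; the function names, the team's aA6, the league average and the run weight per aaA6 are all hypothetical.

```python
# Sketch of the shortstop formula: back out context, then convert to runs.
# Function names and most numbers are illustrative assumptions; only the
# 0.0150 weight and the 1000 excess BIP come from the example in step (v).

def context_adjust(adjusted_plays, weights, excess_factors):
    """Subtract the plays 'given' to the fielders by each contextual factor."""
    given = sum(weights[name] * excess_factors[name] for name in weights)
    return adjusted_plays - given

def rating_in_runs(team_aa_plays, league_avg_aa_plays, runs_per_play):
    """Runs saved (+) or cost (-) relative to the league-average rating."""
    return (team_aa_plays - league_avg_aa_plays) * runs_per_play

weights = {"bip_vs_lhp": 0.0150}   # regression weight quoted in the text
excess = {"bip_vs_lhp": 1000}      # team's BIP vs. left-handers above average
aA6 = 520.0                        # hypothetical arithmetically adjusted assists

aaA6 = context_adjust(aA6, weights, excess)      # 520 - 15
rating = rating_in_runs(aaA6, 495.0, 0.5)        # hypothetical average and run weight
print(aaA6, rating)
```

More factors are handled by adding entries to `weights` and `excess`; a factor that takes plays away from the shortstops would simply carry a negative weight.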
In the case of pitchers, the formulas work for their individual stats, with one exception described in Part I.D.1. In the case of fielders, an individual's rating at such position is based upon the team rating at that position, pro-rated for his innings fielded, and adjusted up or down for his rate of plays made relative to the team rate of plays made at that position.
All of the ratings add up to the DRA estimate of runs allowed by the team. As mentioned in the Introduction, the standard error of such estimate in the 1974-2001 sample is less than that for any system for rating team offense of which I am aware.
B.	A Little Bit of Theory
A (relatively) brief discussion of the basic theory behind regression analysis and how regression analysis has been applied to evaluate run scoring may help explain why regression analysis has never been tried before to evaluate run prevention.
Multi-variable linear regression analysis ("regression analysis") is a statistical tool for determining the marginal impact each variable in a multi-variable model has on the ultimate outcome being modeled. As explained in the book Curve Ball, regression analysis can be used to estimate the impact of each type of batting and baserunning event on the total number of runs a team scores.
For example, if you provide a computer with a sufficiently large sample of rows of historical annual team-level data consisting of at-bats minus hits, walks, singles, doubles, triples, home runs, stolen bases, etc. (collectively, the variables), as well as the actual number of runs each team scored that season (the ultimate outcome being modeled), and (politely) "ask" the computer to "regress" team runs scored "onto" the variables, the computer can perform a regression analysis that will estimate the marginal increase or decrease in team runs scored that is associated with (loosely speaking, statistically correlated with) each variable. Regressions usually show that for each additional home run hit, a team will, on average, score an additional 1.5 runs, assuming all other variables are held constant. For each two additional home runs, a team will score, on average, an additional 3.0 runs, and so forth.
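To make the "ask the computer" step concrete, here is a minimal sketch of an ordinary least squares regression written with only the Python standard library. The six team-seasons are synthetic: runs are constructed from assumed weights (0.47 per single, 0.33 per walk, 1.5 per home run, around a 700-run average) purely so the regression can be seen to recover them. None of it is real team data, and the helper names are my own.

```python
# Sketch: ordinary least squares "by hand" on made-up team seasons.
# The data are synthetic and noise-free; the weights are assumptions.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """Regress y onto the columns of rows; an intercept is added first."""
    xs = [[1.0] + list(r) for r in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(xs, y)) for i in range(k)]
    return solve(xtx, xty)

# (singles, walks, home runs), each expressed relative to league average
rows = [(30, -20, 10), (-10, 40, -5), (0, 0, 20),
        (-25, 15, -15), (15, -30, 0), (-10, -5, -10)]
assumed = (0.47, 0.33, 1.5)
runs = [700 + sum(w * v for w, v in zip(assumed, r)) for r in rows]

coeffs = ols(rows, runs)   # [intercept, singles, walks, home runs]
print([round(c, 3) for c in coeffs])
```

With real seasons the fit is never exact, and the recovered weights come with standard errors; the normal-equations approach here is chosen for readability, not numerical robustness.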
As explained in Curve Ball, the run-values per offensive event have been verified empirically by "change-in-state" models developed by George Lindsay and Pete Palmer. "Change-in-state" models analyze the observed changes in expected runs scored before and after each offensive event (e.g., a home run) in large numbers of actual baseball game situations. See Curve Ball, pp. 178-205.
For regression analysis to "work" (i) each variable must have a reasonably straight-line relationship to the ultimate outcome and (ii) each variable must be reasonably independent of the other variables. The first assumption enables us to say not only that if a team hits an additional 2 home runs it should score an additional 3 runs, but also that if a team hits 20 more home runs than average it should be expected to score, all other variables being equal, 30 more runs. The second assumption is necessary for the technique to reveal the independent marginal impact of each variable. If the variables are correlated with each other, the computer can't calculate the marginal, independent impact of the variable, because the computer can't "hold" all the other variables constant while it's "calculating" the marginal impact of the variable under consideration; they're "moving" with (or against) the relevant variable.
Although the process of run-scoring is not linear, an individual player's contribution to the number of runs his team scores (and the marginal impact of each element of a team's offense) is approximately and reasonably linear over the range of typical major league run-scoring scenarios over the course of a season. That is why Pete Palmer's Linear Weights equations work so well for batters. The latest version of Bill James' Runs Created is essentially linear. See Curve Ball, pp. 230-41. In addition, the impact in terms of team wins of a given player's run creation is reasonably linear. "Within the range where the teams are clustered, a linear representation of value works perfectly well -- exactly as Pete Palmer has always insisted that it did." Win Shares, p. 108.
Similarly, the process of allowing runs is approximately and reasonably linear. TangoTiger has done interesting work on how extremely good individual pitchers should have a disproportionate, non-linear impact on the number of runs their teams allow. I agree this is true to some extent. Pitchers (unlike batters) "create" their own run "context", and their skills "interact" with each other and with fielders. If The Big Unit strikes out twice as many batters as a typical pitcher, the baserunners he "takes away" significantly reduce the expected impact of whatever singles his fielders might "allow". I performed a DRA analysis of the most dominant starting pitcher in history, inning-per-inning: Pedro Martinez. The estimated runs saved under DRA by Pedro and his fielders (who, over the course of his career, were very slightly better than average) was indeed about 5% less than the number of runs Pedro actually "saved", as measured by runs allowed by his teams while he was pitching. Even assuming there is some non-linearity for extreme pitchers, the effect is fairly modest, and fielding has a linear impact, as team-level pitching staff quality is much less likely to have the effect Tango has described. (For those of you who are familiar with regression analysis, the "residuals" generated under the various regressions used in DRA did not reveal any non-linearity.)
There is, however, one fundamental difference between offensive and defensive statistics: offensive statistics are generally not significantly correlated, whereas defensive statistics are highly, and by definition, cross-correlated.
Just because a team draws a lot of walks does not necessarily "prevent" or "cause" it to hit a lot of singles, or even a lot of homeruns. Each event is thus reasonably independent. Therefore, regression analysis of offensive statistics has long provided reasonable run-value estimates for various discrete offensive events. Doubles and homeruns show some minor cross-correlation (teams that hit more doubles tend also to hit more homeruns), which probably explains why the weights for doubles and home runs are slightly different under regression analysis and Lindsay-Palmer "change-in-state" models. See Curve Ball, p. 203 (home run weight of 1.4 under Palmer methods). But, in general, a simple regression analysis of batting and baserunning data works. When you plug the regression-generated run-weight values into the actual offensive data per team, you obtain an estimate of the number of runs the team should have scored. The "error" in such estimate, that is, the "residual" not explained by such procedure, can be quite low (in a 1954-1999 regression analysis, the standard error was 0.142 runs per game, or 23 runs per 162-game season). See Curve Ball, pp. 178-84.
In contrast, each strikeout by a team's pitchers literally eliminates an opportunity for the team's fielders to make a play. Thus shortstop assists and outfielder putouts are by definition negatively correlated with strikeouts. In addition, "raw" fielding play data at each of the positions has extremely strong positive cross-correlations. For example, unadjusted shortstop assists are strongly correlated with second and third base assists, as all three are affected by "ground ball" pitching, so if you ran a simple regression of team runs allowed on all defensive plays, the computer wouldn't be able to "tell" the marginal impact of the shortstop plays, because in most cases in which the shortstop plays go up, second and third base plays go up as well. This problem probably explains why no one has ever used regression analysis before to evaluate fielding. You can't just "regress" runs allowed "onto" traditional team defensive statistics, and come up with anything useful.
DRA "untangles" the cross-correlations among all the defensive events measured by traditional pitching and fielding statistics (e.g., strikeouts, infielder assists, outfielder putouts, etc.), both arithmetically and through regression analysis, so that each such "event" is context-adjusted and independent and, therefore, can be reliably associated (through regression analysis, and with statistical significance) with runs, the "money of baseball, the common denominator of everything that occur[s] on the field." Moneyball, p. 131.
We first adjust each defensive variable arithmetically, so that each is no longer (negatively) correlated by definition with any other variable. If you just took strikeouts per inning pitched and shortstop assists per inning played, each would be negatively cross-correlated by definition, as each strikeout takes away an opportunity to record a shortstop assist, given the fixed number of outs in a game/season. Therefore, we put each event into the relevant "context" or "denominator" of opportunities, which differs by position: BFP for SO; BIP for A6. (In contrast, offensive events all share the same "context" or "denominator" of opportunities: outs.) At this point we now have arithmetically "adjusted" variables: e.g. aSO and aA6.
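A small worked example may help show why the choice of denominator matters. The two staffs below are invented and only roughly realistic: they work the same innings in front of equally skilled shortstops, but one strikes out far more batters. The exact DRA adjustment formulas are not given here, so plain rates per opportunity are used as a stand-in.

```python
# Sketch: the same shortstop skill looks different per inning but identical
# per ball in play. All numbers are invented for illustration.

def per_nine(count, innings):
    """Events per nine innings (the traditional denominator)."""
    return 9.0 * count / innings

def per_opportunity(count, opportunities):
    """Events per opportunity (BFP for strikeouts, BIP for assists)."""
    return count / opportunities

teams = {
    # innings, batters faced (BFP), strikeouts, balls in play (BIP), SS assists
    "low-K staff":  dict(ip=1450, bfp=6200, so=800,  bip=4700, a6=470),
    "high-K staff": dict(ip=1450, bfp=6200, so=1300, bip=4200, a6=420),
}

for name, t in teams.items():
    print(name,
          round(per_nine(t["a6"], t["ip"]), 2),          # falls as strikeouts rise
          round(per_opportunity(t["so"], t["bfp"]), 3),  # SO per BFP
          round(per_opportunity(t["a6"], t["bip"]), 3))  # A6 per BIP: the same
```

Per inning, the high-strikeout staff's shortstops appear to make fewer plays; per ball in play, the two groups are indistinguishable, which is the point of putting each event over its own opportunities.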
To eliminate the remaining cross-correlations among the variables, we use regression analysis and the fact that baseball defense has a clear direction of causality: pitchers cause more or fewer shortstop chances; shortstops don't "cause" pitchers to allow more ground balls. We can therefore regress defensive events "onto" pitcher variables to reveal the plays made by fielders at each position not "explained" by pitcher variables. That number is the "residual" of the regression analysis: the unexplained part that reflects, on average, fielder skill. To calculate that "skill" residual per team, we just "back out" the effect of the pitching variables for each such team. If the regression result for the 1974-2001 sample indicates that each marginal BIP against left-handed pitchers on average increases aA6 by 0.0150, and a team has 1000 extra BIP against left-handed pitching, we decrease the team's aA6 by 15 to reveal the shortstop plays not "given" to the shortstops by a high level of left-handed pitching. The result, fully context-adjusted shortstop assists, or aaA6, is a better estimate of skill plays made.
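The "back out" step can be sketched with a one-variable regression. The five teams below are hypothetical, and the "skill" column is planted on purpose (and chosen to be uncorrelated with the pitcher variable) so that we can check the residuals recover it. Only the 0.0150 slope echoes the text; the actual model uses several contextual variables at once.

```python
# Sketch: regress aA6 onto one pitcher variable and keep the residuals.
# Hypothetical teams; the planted "skill" values are an assumption used
# only to show that the residuals recover them.

def simple_regression(x, y):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

excess_lhp_bip = [-2000, -1000, 0, 1000, 2000]   # BIP vs. LHP relative to average
planted_skill = [1, -2, 2, -2, 1]                # plays made beyond what context explains
aA6 = [500 + 0.0150 * x + s for x, s in zip(excess_lhp_bip, planted_skill)]

slope, intercept = simple_regression(excess_lhp_bip, aA6)
residuals = [y - (intercept + slope * x) for x, y in zip(excess_lhp_bip, aA6)]
aaA6 = [y - slope * x for x, y in zip(excess_lhp_bip, aA6)]   # context-adjusted assists
print(round(slope, 4), [round(r, 2) for r in residuals])
```

The fourth team has 1000 extra BIP against left-handers, so 15 assists are backed out of its aA6, leaving an aaA6 of 498 and a residual of -2 relative to the fitted average.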
Now we can run a "global" regression of runs allowed data "onto" all of the now statistically independent variables: aSO, aBB, aaA6, etc., etc., to reveal the statistically significant run-impact of each marginal event.
When you apply these run weights to the number of context-adjusted plays made by a team at a particular position, you effectively arrive at an estimate of the likely change the players at such position brought about in the number of runs their team should have allowed, given the statistically significant relationships among all of the publicly available statistics of the team and between those statistics and the actual runs allowed by the team. You can then "individuate" the ratings per-player at each position in the manner described above in the "Bare-Bones Summary".
C.	Some Preliminary Results to Indicate We're On the Right Track
Although the basic "idea" behind DRA, as summarized above, is fairly simple (though, again, entirely new), the devil, as always, is in the details. It has taken me more time than I care to admit figuring out the simplest and most effective ways in which to (i) eliminate the cross-correlations between and among pitching and fielding events and (ii) improve the model's fit to (a) what we all "know" about various events and (b) UZR ratings and DM evaluations.
Under the latest DRA model, with one minor exception having no impact on fielder ratings, no context-adjusted pitching or fielding event has a cross-correlation with any other context-adjusted pitching or fielding event greater than 0.2. Before such transformations, many such events had cross-correlations in excess of 0.6. As a general rule of thumb, correlations below 0.3 are generally viewed as non-significant.
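A check of this kind is easy to sketch: compute the pairwise correlation between every pair of variables and report the largest in absolute value. The columns below are invented (the "raw" pair is built to move together, the "adjusted" pair to be uncorrelated); the 0.2 threshold is the one quoted above, and the helper names are assumptions.

```python
# Sketch: find the largest pairwise cross-correlation among team-level columns.
# Data are invented; only the 0.2 and 0.6 thresholds come from the text.
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def max_abs_cross_correlation(columns):
    """Return (largest |r|, pair of column names) over all distinct pairs."""
    names = list(columns)
    worst = (0.0, None)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = abs(pearson(columns[a], columns[b]))
            if r > worst[0]:
                worst = (r, (a, b))
    return worst

raw = {"A6": [450, 470, 500, 520, 560], "A4": [430, 455, 470, 500, 530]}
adjusted = {"aaA6": [-2, -1, 0, 1, 2], "aaA4": [1, -2, 2, -2, 1]}

print(max_abs_cross_correlation(raw))        # well above 0.6
print(max_abs_cross_correlation(adjusted))   # below 0.2
```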
The resulting run values per context-adjusted event are very much those you would expect. Each marginal walk given up in the 1974-2001 data was associated with giving up an additional 0.34 runs. The Palmer weight for a walk drawn is 0.33. Each homerun was associated with an additional 1.44 runs. The Palmer weight is 1.40 runs. Strikeout values varied according to run environment: approximately .31 runs saved in the high-offense 1990s; closer to .26 runs saved in the low-offense 1970s. Pete has similarly found that out values vary between .30 and .25 in the same way, and MGL has found that outs in the form of strikeouts are just slightly more valuable to a defense than other outs; i.e., outs in the form of strikeouts in the high-offense era have been worth about .31 runs saved rather than .30 runs saved. (DRA fielding ratings take into account, simply and objectively, the changing value of outs over time.) The run-value of a wild pitch/passed ball/balk ("WP") was .37 runs allowed, slightly more than the Palmer weight for a stolen base, but that makes sense, because more than one runner can advance on a WP. The value of a stolen base allowed was not stable; under the current model it appears to be half of the UZR value per stolen base, and well below the Palmer weight. But I'm living with it, and the catcher ratings reflect the lower value. In the catcher ratings section in Part II.B.8 I'll suggest some reasons to believe that the weight might be correct.
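Applying these weights is simple arithmetic, sketched below. The run weights are the ones quoted above; the team line (events relative to league average) is hypothetical, and the function and dictionary names are my own.

```python
# Sketch: turn context-adjusted pitcher events into runs, using the weights
# quoted in the text. The team's event counts are hypothetical.

RUNS_ALLOWED_PER_EVENT = {"aBB": 0.34, "aHR": 1.44, "aWP": 0.37}
RUNS_SAVED_PER_SO = {"1970s": 0.26, "1990s": 0.31}   # approximate, varies by era

def pitching_runs_vs_average(excess_events, era):
    """Estimated runs allowed above (+) or below (-) average from pitcher events."""
    runs = sum(RUNS_ALLOWED_PER_EVENT[k] * v
               for k, v in excess_events.items() if k in RUNS_ALLOWED_PER_EVENT)
    runs -= RUNS_SAVED_PER_SO[era] * excess_events.get("aSO", 0)
    return runs

# Events relative to a league-average staff (hypothetical team)
team = {"aSO": 100, "aBB": -50, "aHR": 10, "aWP": 0}

print(round(pitching_runs_vs_average(team, "1990s"), 1))
print(round(pitching_runs_vs_average(team, "1970s"), 1))
```

The same staff is worth about five more runs in the high-offense environment, entirely because each strikeout saves more there.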
One of the cardinal principles under DRA is that only statistically significant statistics and relationships are used in the model. Non-significant factors are dropped, thus simplifying the model. Errors, except at pitcher and right field, have no statistically significant relationship with runs allowed, after taking into account plays made. This makes sense, to me at least. On average, the errors made at, say, shortstop, are no more damaging than the failure to make a play. A bobbled ball that stays in the infield is less damaging than a ball that finds its way to the outfield. In contrast, an error in a throw to first is sometimes worse than not making the play at all (unless the play is successfully backed up by the catcher or second baseman, the batter can often advance to second). On average, the mix of such errors is essentially run-neutral, after taking into account the fact that the play was not made successfully. Pitchers, of course, can't position themselves to improve their range, and have limited mobility to make a play after delivering a pitch, so errors become significant. In right field, I suspect the reason errors are statistically significant is that throwing errors on throws to third can directly result in the runner scoring.
This finding has a significant impact on how pitchers should be evaluated. The traditional method of rating pitchers is Earned Run Average ("ERA"). ERA is a flawed method for rating pitchers, not only because of the "DIPS" issues discussed in Part I.D immediately below, but because ERA focuses on the wrong fielder factor, errors, for purposes of rating pitchers independent of fielders. A pitcher on a poor fielding team might have an "unfairly" high Earned Run Average because the team made relatively few errors but had poor range, while a pitcher on a strong fielding team might have an "unfairly" low Earned Run Average because the team made more errors but had good range. It is true that errors slightly correlate with poor fielding, but they have no statistically significant impact (except at pitcher and right field) once you take into account context-adjusted plays made.
But the problem is even worse than that. Error rates are much higher on ground balls than fly balls. So ground ball pitchers, who help cause errors, benefit by doing so, as the higher number of errors results in a lower Earned Run Average. The opposite applies for fly ball pitchers. Bill James has noted that
". . . [T]he Robin Roberts family of pitchers . . . tend to have lower component ERAs than actual ERAs. The component ERA is a formula which looks at a pitcher's hits allowed, walks allowed, and home runs allowed, and says, "this is the ERA that these should add up to." It's a good formula, but it's not a perfect formula, and pitchers such as Roberts, Newcombe, Jenkins and Hunter tend to have higher actual ERAs than component ERAs. Robin Roberts never led his league in ERA, but led his league in component ERA twice."
Roberts was an extreme fly ball pitcher, as were Jenkins and Hunter (I don't know about Newcombe). Roberts hurt his ERA by giving his fielders easier chances. DRA would redress this problem in rating pitchers.
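The ERA distortion is easy to see with two invented pitchers who allow exactly the same number of runs in the same innings, but behind whom different numbers of runs are scored as unearned. The lines below are hypothetical and not drawn from any real pitcher.

```python
# Sketch: identical runs allowed, different ERA, purely because of how many
# runs are charged as unearned. Both pitching lines are hypothetical.

def per_nine(runs, innings):
    """Runs per nine innings."""
    return 9.0 * runs / innings

pitchers = {
    "ground-ball pitcher": dict(ip=200.0, runs=90, unearned=12),
    "fly-ball pitcher":    dict(ip=200.0, runs=90, unearned=4),
}

for name, p in pitchers.items():
    era = per_nine(p["runs"] - p["unearned"], p["ip"])   # earned runs only
    ra9 = per_nine(p["runs"], p["ip"])                   # all runs allowed
    print(name, round(era, 2), round(ra9, 2))
```

Both pitchers allow 4.05 runs per nine innings, yet the ground-ball pitcher's ERA comes out roughly a third of a run lower.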
The DRA-UZR-DM ratings comparison is provided and discussed in Part II.
D.	Assumptions Supported by Other Studies
I stated in the Introduction that DRA has no subjective weights or factors. That's still true: each context-adjustment and run weight is the result of an objective determinant of opportunities to make plays (e.g., the number of batters facing the team's pitchers or the number of BIP allowed by the team's pitchers) or a statistically significant regression result, and only statistically significant variables are present in the model. However, I did make exactly four assumptions, each of which is supported by empirical data outside of the model. The assumptions are based in part on new insights regarding DIPS. Readers who already feel "burdened by too much information" should feel free to skip ahead to the ratings; I'm only providing this background information in case the full methodology is ever made public, and somebody would otherwise feel that they had been misled regarding some of the bolder assertions made here.
1.	Responsibility for Estimated Infield Fly Outs
DRA, partly by default and partly by design, assigns responsibility for estimated infield fly outs to a team's pitchers.
Infield fly outs are assigned to a team's pitchers by default because, under the principle that everything has to add up, somebody has to take responsibility for them, and I know of no reliable method for assigning credit for them to fielders.
Put simply, we don't know with any degree of precision how many infield fly outs are actually caught by each fielder. I know of no reliable method for estimating the number of infield fly outs that are caught by middle infielders, due to the large number of putouts from runners caught stealing and force outs/double plays at second. I know of no method for separating out estimated unassisted ground out plays at first from estimated unassisted putouts at first. Probably most putouts at third are fly outs, but Bill James has found no relationship between third base putouts and fielder skill. At third, as well as other infield positions, it is almost always the case that a fly ball caught by one infielder could have been caught by another, so that even if we knew the actual number of fly balls caught by an infielder, it would largely reflect his tendency to "hog" chances, and not his contribution to team fielding success. For all of these reasons, I believe that UZR and DM do not include infield fly outs in infielder ratings.
The number of infield fly outs caught by a team, however, can be easily estimated (infield putouts, including non-strikeout putouts at catcher, minus team assists) and does contribute to team success, as ball-hogging nets out. Sure, there are plenty of first base unassisted ground outs and a few unassisted ground ball putouts at second base, but the variation from team-to-team overwhelmingly reflects variation in fly outs allowed by pitchers (as well as foul territory), as I'll discuss in the first baseman ratings section of Part III. And yes, sometimes infield fly outs overlap with outfield (particularly centerfield) fly outs, but the team benefits whether the play is made by a centerfielder or a middle infielder.
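The team-level estimate just described can be written as a one-line function. Two assumptions are mine: "infield" here means pitcher, first, second, third and short, and non-strikeout catcher putouts are approximated as catcher putouts minus team strikeouts (which ignores dropped third strikes). The season totals are hypothetical.

```python
# Sketch: estimated infield fly outs = infield putouts (including
# non-strikeout putouts at catcher) minus team assists.
# Approximating non-strikeout catcher putouts as PO - SO is an assumption.

def estimated_infield_fly_outs(infield_putouts, catcher_putouts,
                               strikeouts, team_assists):
    """infield_putouts: putouts by P, 1B, 2B, 3B and SS combined."""
    non_strikeout_catcher_po = catcher_putouts - strikeouts
    return infield_putouts + non_strikeout_catcher_po - team_assists

# Hypothetical team season
estimate = estimated_infield_fly_outs(
    infield_putouts=2270,   # P + 1B + 2B + 3B + SS
    catcher_putouts=1100,
    strikeouts=1000,
    team_assists=1750,
)
print(estimate)
```

As the text notes, the figure still contains unassisted ground outs at first and second, so it is best read as a team-to-team comparison rather than a literal count of pop-ups.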
In determining where the credit really belongs for estimated infield fly outs, it is important to remember that infield fly outs represent a peculiar kind of pitcher-generated BIP: with the rare exception of "at 'em" line drives, all such BIP outs represent weakly hit balls that zone data reveals are caught by somebody on the team something like 95% of the time.
In light of these facts, it makes sense to assign credit for infield fly outs to a team's pitchers by design. Infield fly outs represent, in a fun-house mirror sort of way, the mirror opposite of home runs. Home runs are batted balls that by definition fielders can't reach, and so we assign "blame" for them to pitchers. Infield fly balls are batted balls that fielders reach in the overwhelming majority of cases. So shouldn't we assign "credit" for them to pitchers?
The problem, of course, is that traditional data doesn't tell us the number of infield fly outs allowed by individual pitchers. Right now, DRA leaves them as a team-level statistic (which is effectively how they're treated under UZR, I believe). As Voros and others have shown, completely ignoring BIP outcomes gives us pretty good value estimates for individual pitchers. Until more "Retrosheet" data (discussed in Part IV) is available that tracks infield fly outs per pitcher, I think that it might be reasonable, however, to allocate the team credit (blame) for infield fly outs among the team's pitchers, initially pro-rata, then adjusted up or down based upon each pitcher's non-homerun hits allowed ("Hits Allowed") relative to the other pitchers on his team. This is because per-pitcher Hits Allowed differences are probably, on average, largely explained by the relative tendency to give up infield fly balls. A great deal of data, reported at baseballprimer.com and elsewhere, shows that total fly balls are converted into outs at a significantly higher rate than total ground balls. Since infield fly outs are almost always caught, the out-conversion rates for outfield fly balls and infield ground balls might be very similar, on average, so that most of the difference in BIP out-conversion rates by pitchers is probably accounted for by infield fly balls. I readily admit that (i) differences in the relative abilities of an outfield and an infield could impact the relative Hits Allowed of a fly ball or ground ball pitcher on a given team and (ii) centerfielders such as Andruw Jones can play havoc with infield fly out numbers. For both of these reasons, I would report pitcher ratings showing the "stable" and "reliable" SO/BB/HR factors separately from infield fly out estimates.
To recap, we can't really measure, using traditional statistics, how many fly balls an infielder has caught (still less how many he caught that nobody else on his team could have caught) or how many infielder-catchable fly balls a pitcher has generated, but it makes sense to credit a team's pitchers (as a group) rather than infielders (as a group) for the impact of estimated infielder fly outs on the number of runs a team allows. Leaving infield fly outs as a team-level pitching statistic should generally not significantly distort pitcher ratings, and in time we may determine a logically and empirically defensible method for allocating infielder fly outs among a team's pitchers.
2.	Responsibility for BIP Other Than Estimated Infield Fly Outs
Under DRA, fielders are assigned complete responsibility for all ball-in-play ("BIP") outcomes, other than estimated infield fly outs. In other words, the shortstop gets complete credit for each context-adjusted assist. Stated one more way, after accounting for how pitchers influence whether a ball is hit on the ground or in the air, and whether the ball is hit to the left or right side of the field, etc., etc., we attribute all credit or blame for plays made or not made (excluding infield fly outs) to the fielders. This approach implicates the still-raging debate about Voros McCracken's Defense Independent Pitching Statistics ("DIPS").
In a nutshell, Voros is generally credited with first observing that most pitchers have very little impact on whether BIP (at-bats not resulting in a strikeout, walk, hit batsman, or homerun) are converted into outs by their fielders or fall in as hits. Numerous analysts, including Voros, Bill James, MGL, TangoTiger, Tom Tippett of DM, and Dick Cramer have found, by a variety of different methods, that individual pitchers, with the rarest of exceptions, have not historically demonstrated an ability to cause the number of BIP they allow that are converted into outs over the course of a full season of pitching to differ by more than a handful of hits compared with a league-average pitcher. The entire controversy seems to be about whether Voros said that pitchers have "no" impact (he didn't, or if he ever did, he quickly corrected himself) and the extent to which those "handful" of hits per season per "exceptional" pitcher are worth thinking about. Some recent computer simulations by Arvin Hsu and Erik Allen at TangoTiger's Primate Studies page at baseballprimer.com suggest that full-time pitchers might have a larger impact: a pitcher who is one standard deviation better at BIP out conversion might save about seven hits over the course of a season, and the two-standard-deviation outlier might save 14.
My belief, based on the prior discussion and some more analysis provided below, is that (i) pitchers generally control, through their well-known tendency to generate ground balls or fly balls, the level of infield fly outs (the most extreme form of fly ball), which accounts for a significant portion of total variability in BIP out-conversion, team-by-team (consistent with the Hsu/Allen simulation results), and (ii) pitchers who generate more infield fly outs obviously generate more outfield fly balls than infield ground balls (and vice-versa), but (iii) out-conversion of infield ground balls and outfield fly balls is overwhelmingly controlled by fielders.
There is a good deal of indirect evidence to support ascribing responsibility for BIP outcomes (other than infield fly outs) to fielders.
First, making this assumption yields DRA ratings which match up well with UZR and DM ratings, which do take into account virtually all of the ways in which pitching staffs affect BIP. UZR factors in ball placement, speed, whether the ball was hit off a left- or right-handed pitcher, and whether the ball was a grounder, fly ball or line drive. I'm not aware of all of the factors DM considers in its "zone" model, but they are probably similar to the factors included in UZR. If zone systems take every pitcher-controlled variable regarding BIP into account in rating fielders, and zone ratings basically match up with DRA ratings, which assume no pitcher impact, doesn't that strongly suggest that fielders control such BIP out-conversion rates, on average?
Second, zone data shows that individual fielders have a much greater measured impact on non-infield fly out BIP than pitchers. The number of BIP allowed by a starting pitcher per season probably exceeds the number of BIP reaching the "zones" of any particular full-time fielder per season, even at the most important fielding positions, such as shortstop. In other words, individual pitchers have more "opportunities" to affect BIP outcomes than individual fielders. Yet only a small number of the best pitchers in the history of the game have been demonstrated to have had a reliable effect on more than a handful of BIP outcomes per season, while zone data shows that many fielders every year have measured impacts of five-to-eight times that amount. DM reports that "in a typical season, the top fielders at each position make 25-30 more plays than the average. Exceptional fielders have posted marks as high as 40-60 net plays, but those are fairly uncommon. Recent examples include Darin Erstad in 2002, Scott Rolen just about every year, and Andruw Jones in his better seasons. The worst fielders tend to be in the minus 25-40 range." And though extremely high fielder ratings tend not to last for more than a few years, they have more year-to-year consistency than pitcher BIP outcomes, which brings me to the next point.
Third, the "persistency" of individual fielder ratings is at least eight times greater than that of individual pitcher ratings on BIP. Before proceeding with the UZR/DRA/DM comparison, I conducted a test to confirm that full-time fielders' UZR ratings in a given year actually predicted, to a statistically significant extent, their ratings in the following year. The absence of such "persistency" (an idea of Dick Cramer's) would indicate that fielding performance was random, not a skill. The percentage of fielders' plays made in any given year generally explained well more than 20% of the variation in their plays made in the following year. (In other words, when I regressed full-time Year 2 UZR runs for all full-time fielders onto full-time Year 1 UZR runs for the same fielders, the "r-squared" was generally greater than 20%, often much greater. The same held true for DRA.) Using a similar method, Dick Cramer recently conducted a study that showed that for full-time pitchers, the number of their Hits Allowed per BIP relative to the league average rate in any given year "explains" at most 2.6% of the variation in the number of their Hits Allowed in the following year. See Dick Cramer, "Preventing Base Hits", The Baseball Research Journal, Vol. 31, p. 89 (SABR 2003).
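As a rough illustration of the persistency test described above, here is a minimal Python sketch of regressing Year 2 ratings onto Year 1 ratings and reading off the r-squared. The function name and the sample ratings are invented for illustration; they are not UZR or DRA data. For a one-variable regression, r-squared equals the squared correlation between the two years, which is what the function computes.

```python
def r_squared(year1, year2):
    """R-squared of a simple least-squares regression of year2 onto year1.

    For a single predictor this equals the squared Pearson correlation.
    """
    n = len(year1)
    if n != len(year2) or n < 2:
        raise ValueError("need two equal-length samples of at least 2 players")
    mean1 = sum(year1) / n
    mean2 = sum(year2) / n
    sxx = sum((a - mean1) ** 2 for a in year1)
    syy = sum((b - mean2) ** 2 for b in year2)
    sxy = sum((a - mean1) * (b - mean2) for a, b in zip(year1, year2))
    if sxx == 0 or syy == 0:
        raise ValueError("ratings must vary in both years")
    return (sxy * sxy) / (sxx * syy)


# Made-up runs-saved ratings for three hypothetical full-time fielders.
year1_runs = [-10, 0, 10]
year2_runs = [-5, 5, 0]
fielder_persistency = r_squared(year1_runs, year2_runs)  # 0.25 for this toy sample

# Figures quoted in the text: fielders "generally greater than 20%",
# pitchers "at most 2.6%" (Cramer). The ratio of those two bounds:
FIELDER_R2_FLOOR = 0.20
PITCHER_R2_CEILING = 0.026
bound_ratio = FIELDER_R2_FLOOR / PITCHER_R2_CEILING  # about 7.7
```

The two quoted bounds alone give a ratio of roughly 7.7; the "at least eight" figure follows once the typical fielder r-squared sits somewhat above 20%, as the text says it generally does.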
My hunch as to why some analysts do not allocate all BIP outcomes to fielders is that measures of team-level fielding impact on Hits Allowed show not that much more year-to-year persistency than individual pitcher Hits Allowed data. Dick's study suggested there might be about a 6.4% team-level fielding persistency, independent of park effects. Id. There is, however, a good reason why team fielding quality doesn't persist so much: a team's fielders don't "persist" so much. In collecting data for the 1999-2001 UZR/DRA comparison, I discovered yet again how hard it is for players to keep their jobs from year to year. On average, only two, maybe three players played full-time at the same position for the same team for two successive seasons. Due to fielder turnover, the most revealing comparison, for determining the relative impact of pitchers and fielders on BIP, is between individual pitchers who pitch full-time and individual fielders who field full-time. Thanks to Dick's study, we have a reliable measure of the average persistency of the effect of individual full-time pitchers on Hits Allowed. Thanks to MGL's UZR data, we have a reliable measure of the average persistency of the effect of individual full-time fielders on Hits Allowed. Fielders beat pitchers by at least eight to one, even though responsibility for infield fly outs is effectively "ceded" to pitchers under UZR and Dick's persistency tests.
Eventually, UZR and similar systems will provide a definitive answer regarding the relative impact of pitchers and fielders on Hits Allowed. How? By enabling us to perform an exact comparison of the average scale and persistency of pitcher "zone" ratings ("PZR") with the average scale and persistency of fielder UZR ratings. PZR can track for each pitcher the characteristics of all non-home run batted balls allowed, e.g., the actual rate of infield fly balls, ground balls, outfield fly balls, line drives, etc. given up by the pitcher, as well as the distribution of such BIP on the field, and assign run-expectancy values to such BIP independent of the effects of the pitcher's park and fielders. In other words, just as UZR rates fielders "independently" of pitchers and parks, PZR can rate pitcher BIP outcomes independently of fielders and parks.
What I am highly confident we will find is that (i) the variation between pitchers in average PZR ratings (the average "scale" of impact, or how much difference there is between pitchers on average) is almost completely explained by the variation in infield fly balls generated, (ii) the scale/variation in PZR ratings excluding infield fly balls will be very much less than that for UZR fielder ratings, and (iii) the year-to-year persistency of PZR ratings excluding infield fly balls will be very much less than that of UZR ratings.
To summarize, we'll determine that BIP outcomes are explained by
(i)	the number of pitcher-generated, nearly-always-infielder-catchable fly balls (i.e., short, high flies that are "always" caught),
(ii)	pitcher influence over how hard ground balls ("dribbler" or scorching one-hopper?) and outfield fly balls (high fly or "rope"?) are hit,
(iii)	fielder ability,
(iv)	fielder turnover,
(v)	park effects (particularly on infield pop-ups and long outfield drives), and
I'm fairly certain that (iii) and (iv) will be found to be at least five times more important than (ii), and thereby justify the simplifying assumption under DRA that fielders are fully responsible for their context-adjusted plays made, i.e., all BIP outcomes other than estimated infield fly outs. In the meantime, the fact that DRA ratings match up well with UZR ratings reassures me that this is the correct approach.
One intriguing finding under DRA: the number of runs saved/allowed per team on infield fly outs had a standard deviation in the 1974-2001 data set of 27, whereas the standard deviation in runs saved/allowed for all other BIP was 39. The ratio of such standard deviations (approximately 3:4) is very close to the ratio that Hsu and Allen have found in their simulation models between pitcher and fielder impact on BIP.
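To make the arithmetic behind that comparison concrete, here is a small Python sketch using only the two standard deviations quoted above. The variance-share calculation rests on an added assumption, not stated in the article, that the two run components are statistically independent; the names are illustrative.

```python
# Standard deviations of team runs saved/allowed quoted in the text (1974-2001).
SD_INFIELD_FLY_RUNS = 27.0
SD_OTHER_BIP_RUNS = 39.0

# Ratio of the two standard deviations (the text rounds this to about 3:4).
sd_ratio = SD_INFIELD_FLY_RUNS / SD_OTHER_BIP_RUNS  # about 0.69


def variance_share(sd_a, sd_b):
    """Share of combined variance due to component A.

    ASSUMPTION: components A and B are independent, so variances add.
    """
    return sd_a ** 2 / (sd_a ** 2 + sd_b ** 2)


infield_fly_share = variance_share(SD_INFIELD_FLY_RUNS, SD_OTHER_BIP_RUNS)
```

The raw ratio works out to roughly 0.69, a little under the 0.75 that "3:4" implies, so the match to the Hsu and Allen ratio is approximate rather than exact.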
3.	Out Values of BIP Outs
In the course of refining the methodology for determining the number of context-adjusted plays made at various positions, I was presented with two approaches, each to some degree equally compelling from a theoretical perspective. One would yield significantly different out values (as distinguished from "hit-prevented" values) for infield v. outfield plays. I chose the other approach. Aside from the common sense behind such approach, it yielded a better UZR match. Again, I'm only belaboring the point in case years from now somebody wants to say that I was being misleading in saying that DRA has no subjective weights or factors.
4.	Wild Pitches, Passed Balls, Balks
DRA assigns catchers responsibility not only for passed balls ("PB") but also for wild pitches ("WP") and balks ("BK"); provided, however, that an adjustment (admittedly ad hoc) is made for knuckleball pitching.
Why should catchers be responsible for WP and BK?
Because such an approach yields ratings that best match up with what I consider to be the best catcher rating system, albeit one that requires play-by-play Retrosheet data.
TangoTiger has recently put together, using Retrosheet data, a brilliant analysis of the rates at which WP, PB, and BK (as well as SB, CS, Picked Off) occur for each pitcher/catcher pairing for catchers who played a significant amount of time in the 1970s and 1980s. The data permit a calculation of the rate at which catchers "allow" WP or PB per pitcher, compared with all other catchers who caught for that pitcher anytime in that pitcher's career. The data also permit a calculation of the rate at which pitchers "allow" WP or PB per catcher, compared with all other pitchers who pitched to that catcher. Although the data show that WP and PB are clearly more controlled by individual pitchers than catchers, catchers do have a non-trivial impact on WP and PB. When I "credited" catchers with WP and PB (and BK), the catcher DRA ratings generally matched up better with Tango's ratings.
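The pairing comparison described above can be sketched roughly as follows. This is not Tango's actual code or data; the record layout, the use of innings as the opportunity denominator, and every name in the example are assumptions made for illustration. For each pitcher a catcher worked with, the sketch takes the WP+PB rate that pitcher had with all other catchers, applies it to the innings caught by the target catcher, and compares that expectation with what actually happened.

```python
from collections import defaultdict


def catcher_wp_pb_vs_peers(records, catcher):
    """Compare a catcher's WP+PB with what other catchers allowed
    when catching the same pitchers.

    records: iterable of (pitcher, catcher, innings, wp_plus_pb) tuples
             (hypothetical layout, one row per pitcher/catcher pairing).
    Returns (actual, expected) totals over pitchers who also threw to
    at least one other catcher.
    """
    with_c = defaultdict(lambda: [0.0, 0.0])     # pitcher -> [innings, events]
    without_c = defaultdict(lambda: [0.0, 0.0])  # same, for all other catchers
    for pitcher, c, innings, events in records:
        bucket = with_c if c == catcher else without_c
        bucket[pitcher][0] += innings
        bucket[pitcher][1] += events

    actual = 0.0
    expected = 0.0
    for pitcher, (innings, events) in with_c.items():
        other_innings, other_events = without_c.get(pitcher, (0.0, 0.0))
        if other_innings == 0:
            continue  # no comparison sample for this pitcher
        actual += events
        expected += innings * other_events / other_innings
    return actual, expected


# Invented pairings: pitcher, catcher, innings together, WP+PB together.
records = [
    ("P1", "C_A", 100, 4),
    ("P1", "C_B", 50, 4),
    ("P2", "C_A", 80, 2),
    ("P2", "C_C", 40, 2),
    ("P3", "C_A", 30, 1),  # P3 threw only to C_A, so this row is skipped
]
actual, expected = catcher_wp_pb_vs_peers(records, "C_A")
net_saved = expected - actual  # positive means fewer WP+PB than peers allowed
```

Swapping the roles of pitcher and catcher in the same function gives the mirror-image calculation, the rate at which a pitcher "allows" WP or PB relative to the other pitchers who threw to the same catchers.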
Here's another way of thinking about the issue. Individual pitchers absolutely control WP more than catchers; indeed, they control PB more than catchers, on a "rate" basis. However, even dominant starting pitchers pitch only one-fourth as many innings per season as a full-time catcher catches. Furthermore, team-level performance is almost certainly more controlled by the catcher (and his small number of backups) than the pitchers, as the "mix" of pitchers should, on average, randomize away their individual impact. For purposes of making career assessments of catchers, this effect is even more powerful.
Balks are included in the catcher rating because I lumped them together with WP and PB in the regression analyses (for fairly obvious reasons), and it just isn't worth the trouble to take them out. Tango has concluded they're basically random occurrences.