Baseball for the Thinking Fan

Primate Studies
— Where BTF's Members Investigate the Grand Old Game

Tuesday, November 11, 2003

Defensive Regression Analysis - Part 2

Michael explains the methodology behind DRA, step-by-step.

	PRINCIPLES AND BASIC METHODOLOGY OF DRA

 

A.	A "Bare-Bones" Summary

DRA is, as far as I know, the first system to use regression analysis to evaluate fielding and pitching, or at least the first to use regression analysis so comprehensively.  For the sake of brevity, I'm going to provide a two-page summary of the DRA "process", with particular focus on one position (shortstop), and rely on some technical terminology that might be a bit daunting for some of the readers I'm trying to reach; I'll explain the technical terminology in Part I.B immediately below.  Under DRA, we

 

(i)	collect team-level defensive data of a sufficient sample size

 

 

(e.g., the runs allowed, strikeouts generated, walks allowed, shortstop assists, etc., per team; DRA was developed using team-level data for all teams in 1974-2001);

 

(ii)	"adjust" each defensive variable for each team (strikeouts, shortstop assists, etc.) using simple arithmetic formulas, so that each such event is at least arithmetically independent of the others

 

(e.g., determine the number of (a) strikeouts generated by the team's pitchers, taking into account the number of batters facing the team's pitchers ("BFP"), not innings pitched, thus yielding "adjusted SO" ("aSO"), (b) assists by the team's shortstops, taking into account the number of batted balls in play allowed by the team's pitchers ("BIP"), not innings played, thus yielding "adjusted" A6 ("aA6"), and (c) the number of shortstop errors, taking into account the number of chances, or "aE6");

 

(iii)	"regress" the arithmetically adjusted team-level fielding plays data (e.g., arithmetically "adjusted" assists at short, or "aA6") in the sample "onto" all the contextual variables derivable from publicly available data that we think might have an impact on the distribution of BIP, the opportunity to record assists on double play pivots, and the positioning of shortstops for purposes of fielding BIP

 

(e.g., regress aA6 "onto" pitcher variables, such as variables measuring BIP against left-handed pitching and a variable estimating the relative tendency of the team's pitchers to generate ground balls or fly balls, but also variables such as an estimate for runners on first, which impact positioning and double play opportunities);

 

(iv)	eliminate the variables that do not have a statistically significant impact on the number of assists made at shortstop, and re-run the regression;

 

(v)	treat the "residuals" of the final, stripped-down, regression analysis result as the "skill" plays made at shortstop, i.e., the regression-"adjusted" aA6, or "aaA6"

 

(e.g., if the regression result indicates that each marginal BIP against left-handed pitchers increases aA6 by 0.0150, and the team had 1000 "excess" BIP allowed by left-handed pitchers, we decrease the team's aA6 by 15 to reveal the shortstop plays not "given" to the shortstops by a high level of left-handed pitching, in order to provide a better estimate of fully context-adjusted shortstop assists, or aaA6);

 

(vi)	repeat steps (ii) through (v) for plays-made data at all of the other fielding positions (including pitcher and catcher), thus yielding the "context-adjusted" plays made at all of the positions (e.g., aaA1, aaA2, aaPO7, aaA7, etc.);

 

(vii)	"regress" team-level runs-allowed data "onto" team-level context-adjusted pitching data (e.g., aSO, aBB, aHR, aWP, etc.) and the team-level context-adjusted plays made in the field determined in steps (ii) through (vi) (e.g., aaA6, aE6, aaPO7, aE7, etc.);

 

(viii)	eliminate all fully context-adjusted variables that do not have a statistically significant impact on runs allowed (e.g., all errors, except at pitcher and right field) and re-run the regression; in order to

 

(ix)	determine the average value, in runs, of each context-adjusted play made

 

(e.g., the average runs-saved value of each aSO, aHR, aaA6, etc.). 

 

A team's defensive rating at a position is derived from the number of context-adjusted plays made at such position (determined in part through regression analysis), e.g., aaA6, multiplied by the average run-value of each such play (determined through regression analysis).

 

The formulas for each position (including pitcher) thus resemble (and are no more complicated than) Pete Palmer's "Linear Weights" equation for batters found in Total Baseball.  For example:

 

aaA6 = A6 (adjusted arithmetically for BIP) +/- A*Factor A +/- B*Factor B +/- C*Factor C, etc.

 

(where the A, B and C "weights" are derived from regression analysis and contextual Factors A, B and C are derived from publicly available information).

 

The rating at shortstop equals aaA6, multiplied by the regression-calculated weight in runs for each aaA6, adjusted so that the rating is +/- the league average runs-saved rating at shortstop.
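To make the arithmetic concrete, here is a minimal sketch of the shortstop rating calculation.  Every number below is invented for illustration; the real weights come out of the regressions described above:

```python
# Hypothetical numbers, for illustration only: a team's context-adjusted
# shortstop assists (aaA6), the regression-derived runs-saved weight per
# assist, and the league-average aaA6.
team_aaA6 = 480          # context-adjusted assists by the team's shortstops
league_avg_aaA6 = 460    # league-average context-adjusted assists
runs_per_assist = 0.50   # hypothetical regression weight, runs saved per play

# The rating is expressed relative to league average, so an average team
# rates exactly 0.
rating = (team_aaA6 - league_avg_aaA6) * runs_per_assist
print(rating)  # 10.0 runs saved above the league average at shortstop
```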

 

In the case of pitchers, the formulas work for their individual stats, with one exception described in Part I.D.1.  In the case of fielders, an individual's rating at such position is based upon the team rating at that position, pro-rated for his innings fielded, and adjusted up or down for his rate of plays made relative to the team rate of plays made at that position.
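A minimal sketch of how a team rating might be "individuated" in this way.  The exact DRA mechanics are not spelled out in this summary, so the pro-rating and rate-adjustment below are only one plausible reading of the paragraph, with invented numbers throughout:

```python
# Hypothetical inputs: the team's shortstop rating, one fielder's share of
# team innings at the position, and his plays-made rate vs. the team rate.
team_rating = 10.0        # team runs saved at SS vs. league average
player_innings = 1000.0
team_innings = 1450.0
player_rate = 0.32        # player's plays made per opportunity (invented)
team_rate = 0.30          # team's plays made per opportunity at the position
runs_per_play = 0.50      # hypothetical runs-saved weight per marginal play
opportunities = 1500.0    # BIP "chances" while the player was on the field

# Start from the team rating, pro-rated for innings fielded...
base = team_rating * (player_innings / team_innings)
# ...then adjust up or down for the player's rate relative to the team's.
adjustment = (player_rate - team_rate) * opportunities * runs_per_play
player_rating = base + adjustment
```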

 

All of the ratings add up to the DRA estimate of runs allowed by the team.  As mentioned in the Introduction, the standard error of such estimate in the 1974-2001 sample is less than that for any system for rating team offense of which I am aware.

 

B.	A Little Bit of Theory

A (relatively) brief discussion of the basic theory behind regression analysis and how regression analysis has been applied to evaluate run scoring may help explain why regression analysis has never been tried before to evaluate run prevention.

 

Multi-variable linear regression analysis ("regression analysis") is a statistical tool for determining the marginal impact each variable in a multi-variable model has on the ultimate outcome being modeled.  As explained in the book Curve Ball, regression analysis can be used to estimate the impact of each type of batting and baserunning event on the total number of runs a team scores.

 

For example, if you provide a computer with a sufficiently large sample of rows of historical annual team-level data consisting of at-bats minus hits, walks, singles, doubles, triples, home runs, stolen bases, etc. (collectively, the variables), as well as the actual number of runs each team scored that season (the ultimate outcome being modeled), and (politely) "ask" the computer to "regress" team runs scored "onto" the variables, the computer can perform a regression analysis that will estimate the marginal increase or decrease in team runs scored that is associated with (loosely speaking, statistically correlated with) each variable.  Regressions usually show that for each additional home run hit, a team will, on average, score an additional 1.5 runs, assuming all other variables are held constant.  For each two additional home runs, a team will score, on average, an additional 3.0 runs, and so forth.
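The regression described above can be sketched with synthetic team-seasons.  The generating weights below are hypothetical (chosen near the familiar linear-weights values), and the point is only that the regression recovers them from noisy data:

```python
import numpy as np

rng = np.random.default_rng(0)
n_teams = 500  # synthetic team-seasons

# Synthetic offensive events per team-season (all counts invented).
singles = rng.integers(900, 1100, n_teams)
doubles = rng.integers(250, 350, n_teams)
home_runs = rng.integers(100, 220, n_teams)
walks = rng.integers(450, 650, n_teams)

# "True" marginal run values used to generate the data (hypothetical,
# chosen near the familiar linear-weights values).
true_w = np.array([0.47, 0.78, 1.40, 0.33])
X = np.column_stack([singles, doubles, home_runs, walks]).astype(float)
runs = X @ true_w + rng.normal(0, 15, n_teams)  # noise ~ scoring luck

# "Regress" team runs scored "onto" the event counts (with an intercept).
A = np.column_stack([np.ones(n_teams), X])
coef, *_ = np.linalg.lstsq(A, runs, rcond=None)

# The recovered weights should sit close to the generating values, e.g.
# roughly 1.4 marginal runs per additional home run.
print(coef[3])
```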

 

As explained in Curve Ball, the run-values per offensive event have been verified empirically by "change-in-state" models developed by George Lindsay and Pete Palmer.  "Change-in-state" models analyze the observed changes in expected runs scored before and after each offensive event (e.g., a home run) in large numbers of actual baseball game situations.  See Curve Ball, pp. 178-205.

 

For regression analysis to "work", (i) each variable must have a reasonably straight-line relationship to the ultimate outcome and (ii) each variable must be reasonably independent of the other variables.  The first assumption enables us to say not only that if a team hits an additional 2 home runs it should score an additional 3 runs, but also that if a team hits 20 more home runs than average it should be expected to score, all other variables being equal, 30 more runs.  The second assumption is necessary for the technique to reveal the independent marginal impact of each variable: if the variables are correlated with each other, the computer can't calculate the marginal, independent impact of a variable, because it can't "hold" all the other variables constant while it's "calculating" the marginal impact of the variable under consideration; they're "moving" with (or against) the relevant variable.

 

Although the process of run-scoring is not linear, an individual player's contribution to the number of runs his team scores, and the marginal impact of each element of a team's offense, is approximately and reasonably linear over the range of typical major league run-scoring scenarios over the course of a season.  That is why Pete Palmer's Linear Weights equations work so well for batters.  The latest version of Bill James' Runs Created is essentially linear.  See Curve Ball, pp. 230-41.  In addition, the impact in terms of team wins of a given player's run creation is reasonably linear.  "Within the range where the teams are clustered, a linear representation of value works perfectly well, exactly as Pete Palmer has always insisted that it did."  Win Shares, p. 108.

 

Similarly, the process of allowing runs is approximately and reasonably linear.  TangoTiger has done interesting work on how extremely good individual pitchers should have a disproportionate, non-linear impact on the number of runs their teams allow.  I agree this is true to some extent.  Pitchers (unlike batters) "create" their own run "context", and their skills "interact" with each other and with fielders.  If The Big Unit strikes out twice as many batters as a typical pitcher, the baserunners he "takes away" significantly reduce the expected impact of whatever singles his fielders might "allow".  I performed a DRA analysis of the most dominant starting pitcher in history, inning-per-inning: Pedro Martinez.  The estimated runs saved under DRA by Pedro and his fielders (who, over the course of his career, were very slightly better than average) were indeed about 5% less than the number of runs Pedro actually "saved", as measured by runs allowed by his teams while he was pitching.  Even assuming there is some non-linearity for extreme pitchers, the effect is fairly modest, and fielding has a linear impact; a team-level pitching staff, taken as a whole, is much less likely than a single extreme pitcher to show the effect Tango has described.  (For those of you who are familiar with regression analysis, the "residuals" generated under the various regressions used in DRA did not reveal any non-linearity.)

 

There is, however, one fundamental difference between offensive and defensive statistics: offensive statistics are generally not significantly correlated, whereas defensive statistics are highly, and by definition, cross-correlated.

 

Drawing a lot of walks does not necessarily "prevent" or "cause" a team to hit a lot of singles, or even a lot of homeruns.  Each event is thus reasonably independent.  Therefore, regression analysis of offensive statistics has long provided reasonable run-value estimates for various discrete offensive events.  Doubles and homeruns show some minor cross-correlation (teams that hit more doubles tend also to hit more homeruns), which probably explains why the weights for doubles and home runs are slightly different under regression analysis and Lindsay-Palmer "change-in-state" models.  See Curve Ball, p. 203 (home run weight of 1.4 under Palmer methods).  But, in general, a simple regression analysis of batting and baserunning data works.  When you apply the regression-generated run-weight values to the actual offensive data per team, you obtain an estimate of the number of runs the team should have scored.  The "error" in such estimate, that is, the "residual" not explained by such procedure, can be quite low (in a 1954-1999 regression analysis, the standard error was 0.142 runs per game, or 23 runs per 162-game season).  See Curve Ball, pp. 178-84.

 

In contrast, each strikeout by a team's pitchers literally eliminates an opportunity for the team's fielders to make a play.  Thus shortstop assists and outfielder putouts are by definition negatively correlated with strikeouts.  In addition, "raw" fielding play data at each of the positions has extremely strong positive cross-correlations.  For example, unadjusted shortstop assists are strongly correlated with second and third base assists, as they are both affected by "ground ball" pitching, so if you ran a simple regression of team runs allowed on all defensive plays, the computer wouldn't be able to "tell" the marginal impact of the shortstop plays, because in most cases in which the shortstop plays go up, second and third base plays go up as well.  This problem probably explains why no one has ever used regression analysis before to evaluate fielding.  You can't just "regress" runs allowed "onto" traditional team defensive statistics, and come up with anything useful.

 

DRA "untangles" the cross-correlations among all the defensive events measured by traditional pitching and fielding statistics (e.g., strikeouts, infielder assists, outfielder putouts, etc.), both arithmetically and through regression analysis, so that each such "event" is context-adjusted and independent and, therefore, can be reliably associated (through regression analysis, and with statistical significance) to runs, the "money of baseball, the common denominator of everything that occur[s] on the field."  Moneyball, p. 131.

 

We first adjust each defensive variable arithmetically, so that each is no longer (negatively) correlated by definition with any other variable.  If you just took strikeouts per inning pitched and shortstop assists per inning played, each would be negatively cross-correlated by definition, as each strikeout takes away an opportunity to record a shortstop assist, given the fixed number of outs in a game/season.  Therefore, we put each event into the relevant "context" or "denominator" of opportunities, which differs by position:  BFP for SO; BIP for A6.  (In contrast, offensive events all share the same "context" or "denominator" of opportunities: outs.)  At this point we now have arithmetically "adjusted" variables: e.g., aSO and aA6.
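The "right denominator" idea can be expressed in a few lines.  The exact DRA adjustment formulas are not given in this summary, so the plays-above-expectation form below is just one plausible sketch, with invented league rates and team totals:

```python
# One plausible way to express "same event, right denominator": actual
# plays minus the plays an average team would record given the same
# opportunities (BFP for SO, BIP for A6).  All rates and counts invented.
league_SO_per_BFP = 0.16   # league strikeout rate per batter faced
league_A6_per_BIP = 0.115  # league shortstop-assist rate per ball in play

team_SO, team_BFP = 1050, 6200
team_A6, team_BIP = 520, 4400

# "Adjusted" events: strikeouts above expectation given BFP, and shortstop
# assists above expectation given BIP.
aSO = team_SO - league_SO_per_BFP * team_BFP
aA6 = team_A6 - league_A6_per_BIP * team_BIP
```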

 

To eliminate the remaining cross-correlations among the variables, we use regression analysis and the fact that baseball defense has a clear direction of causality: pitchers cause more or fewer shortstop chances; shortstops don't "cause" pitchers to allow more ground balls.  We can therefore regress pitcher variables "onto" defensive events to reveal the plays made by fielders at each position not "explained" by pitcher variables.  That number is the "residual" of the regression analysis: the unexplained part that reflects, on average, fielder skill.  To calculate that "skill" residual per team, we just "back out" the effect of the pitching variables for each such team.  If the regression result for the 1974-2001 sample indicates that each marginal BIP against left-handed pitchers on average increases aA6 by 0.0150, and a team has 1000 extra BIP against left-handed pitching, we decrease the team's aA6 by 15 to reveal the shortstop plays not "given" to the shortstops by a high level of left-handed pitching, yielding fully context-adjusted shortstop assists, or aaA6: a better estimate of the skill plays made.
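On synthetic data, the back-out step looks like this.  The 0.0150 weight is the text's own example; everything else (sample sizes, noise levels) is invented:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300  # synthetic team-seasons

# Pitcher-driven context: excess BIP against left-handed pitching (invented).
lhp_bip = rng.normal(0, 400, n)
true_skill = rng.normal(0, 8, n)          # "true" fielder skill plays
aA6 = 0.0150 * lhp_bip + true_skill       # 0.0150 weight taken from the text

# Regress aA6 onto the pitcher variable; the residual is the part the
# pitcher context can't explain -- the estimated skill plays (aaA6).
A = np.column_stack([np.ones(n), lhp_bip])
coef, *_ = np.linalg.lstsq(A, aA6, rcond=None)
aaA6 = aA6 - A @ coef

# A team with 1,000 excess LHP BIP has roughly 15 plays backed out,
# matching the worked example in the text.
backed_out = coef[1] * 1000
```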

 

Now we can run a "global" regression of runs allowed data "onto" all of the now statistically independent variables: aSO, aBB, aaA6, etc., etc., to reveal the statistically significant run-impact of each marginal event.

 

When you apply these run weights to the number of context-adjusted plays made by a team at a particular position, you effectively arrive at an estimate of the likely change the players at such position brought about in the number of runs their team should have allowed, given the statistically significant relationships among all of the publicly available statistics of the team and between those statistics and the actual runs allowed by the team.  You can then "individuate" the ratings per-player at each position in the manner described above in the "Bare-Bones Summary".

C.	Some Preliminary Results to Indicate We're On the Right Track

Although the basic "idea" behind DRA, as summarized above, is fairly simple (though, again, entirely new), the devil, as always, is in the details.  It has taken me more time than I care to admit to figure out the simplest and most effective ways in which to (i) eliminate the cross-correlations between and among pitching and fielding events and (ii) improve the model's fit to (a) what we all "know" about various events and (b) UZR ratings and DM evaluations.

 

Under the latest DRA model, with one minor exception having no impact on fielder ratings, no context-adjusted pitching or fielding event has a cross-correlation with any other context-adjusted pitching or fielding event greater than 0.2.  Before such transformations, many such events had cross-correlations in excess of 0.6.  As a rule of thumb, correlations below 0.3 are generally viewed as non-significant.
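The before-and-after effect of the adjustments on cross-correlations can be illustrated with synthetic data, with a shared ground-ball tendency standing in for the pitcher context:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000  # synthetic team-seasons

# Raw events share context: a ground-ball tendency drives second-base and
# shortstop assists together, so the raw columns are strongly correlated.
gb_tendency = rng.normal(0, 1, n)
a4 = 3 * gb_tendency + rng.normal(0, 1, n)   # raw 2B assists (synthetic)
a6 = 3 * gb_tendency + rng.normal(0, 1, n)   # raw SS assists (synthetic)

raw_corr = np.corrcoef(a4, a6)[0, 1]         # high, near 0.9 here

# After backing out the shared context (as DRA's adjustments aim to do),
# the residual columns are nearly independent, well under the 0.2 bar.
aa4 = a4 - 3 * gb_tendency
aa6 = a6 - 3 * gb_tendency
adj_corr = np.corrcoef(aa4, aa6)[0, 1]
```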

 

The resulting run values per context-adjusted event are very much those you would expect.  Each marginal walk given up in the 1974-2001 data was associated with giving up an additional 0.34 runs.  The Palmer weight for a walk drawn is 0.33.  Each homerun was associated with an additional 1.44 runs.  The Palmer weight is 1.40 runs.  Strikeout values varied according to run environment: approximately .31 runs saved in the high-offense 1990s; closer to .26 runs saved in the low-offense 1970s.  Pete has similarly found that out values vary between .30 and .25 in the same way, and MGL has found that outs in the form of strikeouts are just slightly more valuable to a defense than other outs; i.e., outs in the form of strikeouts in the high-offense era have been worth about .31 runs saved rather than .30 runs saved.  (DRA fielding ratings take into account, simply and objectively, the changing value of outs over time.)  The run-value of a wild pitch/passed ball/balk ("WP") was .37 runs allowed, slightly more than the Palmer weight for a stolen base, but that makes sense, because more than one runner can advance on a WP.  The value of a stolen base allowed was not stable; under the current model it appears to be half of the UZR value per stolen base, and well below the Palmer weight.  But I'm living with it, and the catcher ratings reflect the lower value.  In the catcher ratings section in Part II.B.8 I'll suggest some reasons to believe that the weight might be correct.

 

One of the cardinal principles under DRA is that only statistically significant statistics and relationships are used in the model.  Non-significant factors are dropped, thus simplifying the model.  Errors (except at pitcher and right field) have no statistically significant relationship with runs allowed, after taking into account plays made.  This makes sense, to me at least.  On average, the errors made at, say, shortstop, are no more damaging than the failure to make a play.  A bobbled ball that stays in the infield is less damaging than a ball that finds its way to the outfield.  In contrast, an error in a throw to first is sometimes worse than not making the play at all (unless the play is successfully backed up by the catcher or second baseman, the batter can often advance to second).  On average, the mix of such errors is essentially run-neutral, after taking into account the fact that the play was not made successfully.  Pitchers, of course, can't position themselves to improve their range, and have limited mobility to make a play after delivering a pitch, so errors become significant.  In right field, I suspect the reason errors are statistically significant is that throwing errors on throws to third can directly result in the runner scoring.
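The drop-and-re-run procedure from steps (iv) and (viii) can be sketched with plain-numpy OLS.  The data below is synthetic, built so that plays made carry the run signal and errors carry none:

```python
import numpy as np

def ols_tstats(X, y):
    """OLS fit plus t-statistics, pure numpy (design X includes intercept)."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)           # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)      # coefficient covariance
    return coef, coef / np.sqrt(np.diag(cov))

rng = np.random.default_rng(3)
n = 400
plays = rng.normal(0, 30, n)    # context-adjusted plays made (synthetic)
errors = rng.normal(0, 8, n)    # errors, built here to carry no run signal
runs_allowed = 700 - 0.5 * plays + rng.normal(0, 20, n)

X = np.column_stack([np.ones(n), plays, errors])
coef, t = ols_tstats(X, runs_allowed)

# Drop any variable whose |t| falls below 2 (here, typically the errors
# column) and re-run the regression on what remains.
keep = np.abs(t) >= 2
keep[0] = True                  # always keep the intercept
coef2, t2 = ols_tstats(X[:, keep], runs_allowed)
```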

 

This finding has a significant impact on how pitchers should be evaluated.  The traditional method of rating pitchers is Earned Run Average ("ERA").  ERA is a flawed method for rating pitchers, not only because of the "DIPS" issues discussed in Part I.D immediately below, but because ERA focuses on the wrong fielder factor, errors, for purposes of rating pitchers independent of fielders.  A pitcher on a poor fielding team might have an "unfairly" high Earned Run Average because the team made relatively few errors but had poor range, while a pitcher on a strong fielding team might have an "unfairly" low Earned Run Average because the team made more errors but had good range.  It is true that errors slightly correlate with poor fielding, but they have no statistically significant impact (except at pitcher and right field) once you take into account context-adjusted plays made.

 

But the problem is even worse than that.  Error rates are much higher on ground balls than fly balls.  So ground ball pitchers, who help cause errors, benefit by doing so, as the higher number of errors results in a lower Earned Run Average.  The opposite applies for fly ball pitchers.  Bill James has noted that

 

". . . [T]he Robin Roberts family of pitchers . . . tend to have lower component ERAs than actual ERAs.  The component ERA is a formula which looks at a pitcher?s hits allowed, walks allowed, and home runs allowed, and says, "this is the ERA that these should add up to."  It?a good formula, but it?s not a perfect formula, and pitchers such as Roberts, Newcombe, Jenkins and Hunter tend to have higher actual ERAs than component ERAs.  Robin Roberts never led his league in ERA, but led his league in component ERA twice."

 

Roberts was an extreme fly ball pitcher, as were Jenkins and Hunter (I don't know about Newcombe).  Roberts hurt his ERA by giving his fielders easier chances.  DRA would redress this problem in rating pitchers.

 

The DRA-UZR-DM ratings comparison is provided and discussed in Part II.

D.	Assumptions Supported by Other Studies  

I stated in the Introduction that DRA has no subjective weights or factors.  That's still true: each context-adjustment and run weight is the result of an objective determinant of opportunities to make plays (e.g., the number of batters facing the team's pitchers or the number of BIP allowed by the team's pitchers) or a statistically significant regression result, and only statistically significant variables are present in the model.  However, I did make exactly four assumptions, each of which is supported by empirical data outside of the model.  The assumptions are based in part on new insights regarding DIPS.  Readers who already feel "burdened by too much information" should feel free to skip ahead to the ratings; I'm only providing this background information in case the full methodology is ever made public, and somebody would otherwise feel that they had been misled regarding some of the bolder assertions made here.

1.	Responsibility for Estimated Infield Fly Outs

 

DRA, partly by default and partly by design, assigns responsibility for estimated infield fly outs to a team's pitchers.

 

Infield fly outs are assigned to a team's pitchers by default because, under the principle that everything has to add up, somebody has to take responsibility for them, and I know of no reliable method for assigning credit for them to fielders.

 

Put simply, we don't know with any degree of precision how many infield fly outs are actually caught by each fielder.  I know of no reliable method for estimating the number of infield fly outs that are caught by middle infielders, due to the large number of putouts from runners caught stealing and force outs/double plays at second.  I know of no method for separating out estimated unassisted ground out plays at first from estimated unassisted putouts at first.  Probably most putouts at third are fly outs, but Bill James has found no relationship between third base putouts and fielder skill.  At third, as well as other infield positions, it is almost always the case that a fly ball caught by one infielder could have been caught by another, so that even if we knew the actual number of fly balls caught by an infielder, it would largely reflect his tendency to "hog" chances, and not his contribution to team fielding success.  For all of these reasons, I believe that UZR and DM do not include infield fly outs in infielder ratings.

The number of infield fly outs caught by a team, however, can be easily estimated (infield putouts, including non-strikeout putouts at catcher, minus team assists) and does contribute to team success, as ball-hogging nets out.  Sure, there are plenty of first base unassisted ground outs and a few unassisted ground ball putouts at second base, but the variation from team-to-team overwhelmingly reflects variation in fly outs allowed by pitchers (as well as foul territory), as I'll discuss in the first baseman ratings section of Part III.  And yes, sometimes infield fly outs overlap with outfield (particularly centerfield) fly outs, but the team benefits whether the play is made by the centerfielder or a middle infielder.
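In code, the team-level estimate is a single subtraction (the counts below are invented):

```python
# Team-level estimate from the text: infield putouts (including
# non-strikeout putouts at catcher) minus team assists.  Counts invented.
infield_putouts = 1650   # PO at P/C/1B/2B/3B/SS, catcher strikeouts excluded
team_assists = 1500      # nearly every assist produces an infield putout

est_infield_fly_outs = infield_putouts - team_assists
```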

In determining where the credit really belongs for estimated infield fly outs, it is important to remember that infield fly outs represent a peculiar kind of pitcher-generated BIP: with the rare exception of "at 'em" line drives, all such BIP outs represent weakly hit balls that zone data reveals are caught by somebody on the team something like 95% of the time.

In light of these facts, it makes sense to assign credit for infield fly outs to a team's pitchers by design.  Infield fly outs represent, in a fun-house-mirror sort of way, the opposite of home runs.  Home runs are batted balls that by definition fielders can't reach, and so we assign "blame" for them to pitchers.  Infield fly balls are batted balls that fielders reach in the overwhelming majority of cases.  So shouldn't we assign "credit" for them to pitchers?

 

The problem, of course, is that traditional data doesn't tell us the number of infield fly outs allowed by individual pitchers.  Right now, DRA leaves them as a team-level statistic (which is effectively how they're treated under UZR, I believe).  As Voros and others have shown, completely ignoring BIP outcomes gives us pretty good value estimates for individual pitchers.  Until more "Retrosheet" data (discussed in Part IV) is available that tracks infield fly outs per pitcher, I think that it might be reasonable, however, to allocate the team credit (blame) for infield fly outs among the team's pitchers, initially pro-rata, then adjusted up or down based upon each pitcher's non-homerun hits allowed ("Hits Allowed") relative to the other pitchers on his team.  This is because per-pitcher Hits Allowed differences are probably, on average, largely explained by the relative tendency to give up infield fly balls.  A great deal of data, reported at baseballprimer.com and elsewhere, shows that total fly balls are converted into outs at a significantly higher rate than total ground balls.  Since infield fly outs are almost always caught, the out-conversion rates for outfield fly balls and infield ground balls might be very similar, on average, so that most of the difference in BIP out-conversion rates by pitchers is probably accounted for by infield fly balls.  I readily admit that (i) differences in the relative abilities of an outfield and an infield could impact the relative Hits Allowed of a fly ball or ground ball pitcher on a given team and (ii) centerfielders such as Andruw Jones can play havoc with infield fly out numbers.  For both of these reasons, I would report pitcher ratings showing the "stable" and "reliable" SO/BB/HR factors separately from infield fly out estimates.
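A heavily hedged sketch of that allocation idea: the pro-rata base follows the text, but the specific hits-allowed adjustment rule, the pitcher names, and all the counts are hypothetical:

```python
# Hypothetical sketch: allocate the team's estimated infield fly outs (IFO)
# pro-rata by BIP, then shift credit toward pitchers with fewer non-HR hits
# allowed than their BIP share predicts.  The adjustment rule below (one IFO
# of credit per hit above/below expectation) is an invented stand-in, not
# the author's formula.
team_ifo = 150.0                          # estimated team infield fly outs
pitchers = {                              # per-pitcher BIP and non-HR hits
    "A": {"bip": 2000, "hits": 560},
    "B": {"bip": 1500, "hits": 450},
    "C": {"bip": 1000, "hits": 310},
}
team_bip = sum(p["bip"] for p in pitchers.values())
team_hits = sum(p["hits"] for p in pitchers.values())

credit = {}
for name, p in pitchers.items():
    pro_rata = team_ifo * p["bip"] / team_bip
    # Hits allowed above/below the team rate, treated as displaced IFO.
    excess_hits = p["hits"] - team_hits * p["bip"] / team_bip
    credit[name] = pro_rata - excess_hits

# Because the excess-hits terms sum to zero, the allocation still sums to
# the team total, honoring "everything has to add up".
```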

 

To recap, we can't really measure, using traditional statistics, how many fly balls an infielder has caught (still less how many he caught that nobody else on his team could have caught) or how many infielder-catchable fly balls a pitcher has generated, but it makes sense to credit a team's pitchers (as a group) rather than infielders (as a group) for the impact of estimated infield fly outs on the number of runs a team allows.  Leaving infield fly outs as a team-level pitching statistic should generally not significantly distort pitcher ratings, and in time we may determine a logically and empirically defensible method for allocating infield fly outs among a team's pitchers.

 

2.	Responsibility for BIP Other Than Estimated Infield Fly Outs

Under DRA, fielders are assigned complete responsibility for all ball-in-play ("BIP") outcomes, other than estimated infield fly outs.  In other words, the shortstop gets complete credit for each context-adjusted assist.  Stated one more way, after accounting for how pitchers influence whether a ball is hit on the ground or in the air, and whether the ball is hit to the left or right side of the field, etc., etc., we attribute all credit or blame for plays made or not made (excluding infield fly outs) to the fielders.  This approach implicates the still-raging debate about Voros McCracken's Defense Independent Pitching Statistics ("DIPS").

 

In a nutshell, Voros is generally credited with first observing that most pitchers have very little impact on whether BIP (at-bats not resulting in a strikeout, walk, hit batsman, or homerun) are converted into outs by their fielders or fall in as hits.  Numerous analysts, including Voros, Bill James, MGL, TangoTiger, Tom Tippett of DM, and Dick Cramer have found, by a variety of different methods, that individual pitchers, with the rarest of exceptions, have not historically demonstrated an ability to make the rate at which their BIP are converted into outs differ, over the course of a full season of pitching, by more than a handful of hits from a league-average pitcher.  The entire controversy seems to be about whether Voros said that pitchers have "no" impact (he didn't, or if he ever did, he quickly corrected himself) and the extent to which those "handful" of hits per season per "exceptional" pitcher are worth thinking about.  Some recent computer simulations by Arvin Hsu and Erik Allen at TangoTiger's Primate Studies page at baseballprimer.com suggest that full-time pitchers might have a larger impact: a pitcher who is one standard deviation better at BIP out conversion might save about seven hits over the course of a season, and the two-standard-deviation outlier might save 14.

 

My belief, based on the prior discussion and some more analysis provided below, is that (i) pitchers generally control, through their well-known tendency to generate ground balls or fly balls, the level of infield fly outs (the most extreme form of fly ball), which accounts for a significant portion of total variability in BIP out-conversion, team-by-team (consistent with the Hsu/Allen simulation results), and (ii) pitchers that generate more infield fly outs obviously generate more outfield fly balls than infield ground balls (and vice-versa), but (iii) out-conversion of infield ground balls and outfield fly balls is overwhelmingly controlled by fielders.

 

There is a good deal of indirect evidence to support ascribing responsibility for BIP outcomes (other than infield fly outs) to fielders.

 

First, making this assumption yields DRA ratings which match up well with UZR and DM ratings, which do take into account virtually all of the ways in which pitching staffs affect BIP.  UZR factors in ball placement, speed, whether the ball was hit off a left- or right-handed pitcher, and whether the ball was a grounder, fly ball or line drive.  I'm not aware of all of the factors DM considers in its "zone" model, but they are probably similar to the factors included in UZR.  If zone systems take every pitcher-controlled variable regarding BIP into account in rating fielders, and zone ratings basically match up with DRA ratings, which assume no pitcher impact, doesn't that strongly suggest that fielders control such BIP out-conversion rates, on average?

 

Second, zone data shows that individual fielders have a much greater measured impact on non-infield fly out BIP than pitchers.  The number of BIP allowed by a starting pitcher per season probably exceeds the number of BIP reaching the "zones" of any particular full-time fielder per season, even at the most important fielding positions, such as shortstop.  In other words, individual pitchers have more "opportunities" to affect BIP outcomes than individual fielders.  Yet only a small number of the best pitchers in the history of the game have been demonstrated to have had a reliable effect on more than a handful of BIP outcomes per season, while zone data shows that many fielders every year have measured impacts of five-to-eight times that amount.  DM reports that "in a typical season, the top fielders at each position make 25-30 more plays than the average. Exceptional fielders have posted marks as high as 40-60 net plays, but those are fairly uncommon. Recent examples include Darin Erstad in 2002, Scott Rolen just about every year, and Andruw Jones in his better seasons. The worst fielders tend to be in the minus 25-40 range."  And though extremely high fielder ratings tend not to last for more than a few years, they have more year-to-year consistency than pitcher BIP outcomes, which brings me to the next point.

 

Third, the "persistency" of individual fielder ratings is at least eight times greater than individual pitcher ratings on BIP.  Before proceeding with the UZR/DRA/DM comparison, I conducted a test to confirm that full-time fielders' UZR ratings in a given year actually predicted, to a statistically significant extent, their ratings in the following year.  The absence of such "persistency" (an idea of Dick Cramer's) would indicate that fielding performance was random, not a skill.  The percentage of fielders' plays made in any given year generally explained well more than 20% of the variation in their plays made in the following year.  (In other words, when I regressed Year 2 UZR runs for all full-time fielders onto Year 1 UZR runs for the same fielders, the "r-squared" was generally greater than 20%, often much greater.  The same held true for DRA.)  Using a similar method, Dick Cramer recently conducted a study showing that for full-time pitchers, the number of their Hits Allowed per BIP relative to the league average rate in any given year "explains" at most 2.6% of the variation in the number of their Hits Allowed in the following year.  See Dick Cramer, "Preventing Base Hits", The Baseball Research Journal, Vol. 31, p. 89 (SABR 2003).
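For the curious, the persistency test described above amounts to a one-variable regression: the r-squared is just the squared correlation between paired Year 1 and Year 2 ratings. A minimal sketch (the ratings below are invented numbers, not actual UZR data):

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between paired samples, which equals
    the r-squared of the one-variable regression of ys onto xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

# Hypothetical runs-saved ratings for the same eight full-time
# fielders in two consecutive seasons:
year1 = [12, -5, 3, 20, -14, 7, -2, 9]
year2 = [10, -2, 5, 15, -10, 4, -4, 6]
persistency = r_squared(year1, year2)
```

The test for skill is simply whether this number is reliably and materially above zero across many fielder pairs; the 20%-plus figure quoted above is the result for real full-time fielders.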

 

My hunch as to why some analysts do not allocate all BIP outcomes to fielders is that measures of team-level fielding impact on Hits Allowed show not that much more year-to-year persistency than individual pitcher Hits Allowed data.  Dick's study suggested there might be about a 6.4% team-level fielding persistency, independent of park effects.  Id.  There is, however, a good reason why team fielding quality doesn't persist so much: a team's fielders don't "persist" so much.  In collecting data for the 1999-2001 UZR/DRA comparison, I discovered yet again how hard it is for players to keep their jobs from year-to-year.  On average, only two, maybe three players played full-time at the same position for the same team for two successive seasons.  Due to fielder turnover, the most revealing comparison, for determining the relative impact of pitchers and fielders on BIP, is individual pitchers who pitch full-time and individual fielders who field full-time.  Thanks to Dick's study, we have a reliable measure of the average persistency of the effect of individual full-time pitchers on Hits Allowed.  Thanks to MGL's UZR data, we have a reliable measure of the average persistency of the effect of individual full-time fielders on Hits Allowed.  Fielders beat pitchers by at least eight to one, even though responsibility for infield fly outs is effectively "ceded" to pitchers under UZR and Dick's persistency tests.   

 

Eventually, UZR and similar systems will provide a definitive answer regarding the relative impact of pitchers and fielders on Hits Allowed.  How?  By enabling us to perform an exact comparison of the average scale and persistency of pitcher "zone" ratings ("PZR") with the average scale and persistency of fielder UZR ratings.  PZR can track for each pitcher the characteristics of all non-home run batted balls allowed, e.g., the actual rate of infield fly balls, ground balls, outfield fly balls, line drives, etc. given up by the pitcher, as well as the distribution of such BIP on the field, and assign run-expectancy values on such BIP independent of the effects of the pitcher's park and fielders.  In other words, just as UZR rates fielders "independently" of pitchers and parks, PZR can rate pitcher BIP outcomes independently of fielders and parks.

 

What I am highly confident we will find is that (i) the variation between pitchers in average PZR ratings (the average "scale" of impact, or how much difference there is between pitchers on average) is almost completely explained by the variation in infield fly balls generated, (ii) the scale/variation in PZR ratings excluding infield fly balls will be very much less than that for UZR fielder ratings, and (iii) the year-to-year persistency of PZR ratings excluding infield fly balls will be very much less than that of UZR ratings.

 

To summarize, we'll determine that BIP outcomes are explained by

 

(i)	the number of pitcher-generated, nearly-always-infielder-catchable fly balls (i.e., short, high flies that are "always" caught),

 

(ii)	pitcher influence over how hard ground balls ("dribbler" or scorching one-hopper?) and outfield fly balls (high fly or "rope"?) are hit,

 

(iii)	fielder ability,

 

(iv)	fielder turnover,

 

(v)	park effects (particularly on infield pop-ups and long outfield drives), and

 

(vi)	luck.

 

I'm fairly certain that (iii) and (iv) will be found to be at least five times more important than (ii), and thereby justify the simplifying assumption under DRA that fielders are fully responsible for their context-adjusted plays made, i.e., all BIP outcomes other than estimated infield fly outs.  In the meantime, the fact that DRA ratings match up well with UZR ratings reassures me that this is the correct approach.

 

One intriguing finding under DRA: the number of runs saved/allowed per team on infield fly outs had a standard deviation in the 1974-2001 data set of 27, whereas the standard deviation in runs saved/allowed for all other BIP was 39.  The ratio of such standard deviations (approximately 3:4) is very close to the ratio that Hsu and Allen have found in their simulation models between pitcher and fielder impact on BIP. 

 

	3.	Out Values of BIP Outs

In the course of refining the methodology for determining the number of context-adjusted plays made at various positions, I was presented with two approaches, each roughly as compelling as the other from a theoretical perspective.  One would yield significantly different out values (as distinguished from "hit-prevented" values) for infield v. outfield plays.  I chose the other approach.  Aside from the common sense behind that approach, it yielded a better UZR match.  Again, I'm only belaboring the point in case years from now somebody wants to say that I was being misleading in saying that DRA has no subjective weights or factors.

 

	4.	Wild Pitches / Passed Balls / Balks

	

DRA assigns responsibility for passed balls ("PB") to catchers, but also responsibility for wild pitches ("WP") and balks ("BK"); provided, however, that an adjustment (admittedly ad hoc) is made for knuckleball pitching.

 

Why should catchers be responsible for WP and BK?

 

Because such an approach yields ratings that best match up with what I consider to be the best catcher rating system, albeit one that requires play-by-play Retrosheet data.

 

TangoTiger has recently put together, using Retrosheet data, a brilliant analysis of the rates at which WP, PB, and BK (as well as SB, CS, Picked Off) occur for each pitcher/catcher pairing for catchers who played a significant amount of time in the 1970s and 1980s.  The data permits a calculation of the rate at which catchers "allow" WP or PB per pitcher, compared with all other catchers who caught for that pitcher anytime in that pitcher's career.  The data also permit a calculation of the rate at which pitchers "allow" WP or PB per catcher, compared with all other pitchers who pitched to that catcher.  Although the data show that WP and PB are clearly more controlled by individual pitchers than catchers, catchers do have a non-trivial impact on WP and PB.  When I "credited" catchers with WP and PB (and BK), the catcher DRA ratings generally matched up better with Tango's ratings.

 

Here's another way of thinking about the issue.  Individual pitchers absolutely control WP more than catchers; indeed, they control PB more than catchers, on a "rate" basis.  However, even dominant starting pitchers pitch only one-fourth as many innings per season as a full-time catcher catches.  Furthermore, team-level performance is almost certainly more controlled by the catcher (and his small number of backups) than the pitchers, as the "mix" of pitchers should, on average, randomize away their individual impact.  For purposes of making career assessments of catchers, this effect is even more powerful. 

 

Balks are included in the catcher rating because I lumped them together with WP and PB in the regression analyses (for fairly obvious reasons), and it just isn't worth the trouble to take them out.  Tango has concluded they're basically random occurrences.

 

Michael Humphreys Posted: November 11, 2003 at 06:00 AM

Reader Comments and Retorts


   1. Damon Rutherford Posted: November 11, 2003 at 03:56 AM (#613899)
First, I think this is great and am looking forward to seeing the final results.

All of the ratings add up to the DRA estimate of runs allowed by the team. As mentioned in the Introduction, the standard error of such estimate in the 1974-2001 sample is less than that for any system for rating team offense of which I am aware.

Have you or are you planning to test your model by using it on data *not* used in the regression? Or, perhaps run the regression on the odd years in your data sample and test using the even years? Or perhaps use the leave-one-out method and run 28 different regressions and test each one using the left-out year? That, however, might involve a lot of time and work. Just curious, though. And perhaps you commented on this and I overlooked it.

The DRA-UZR-DM ratings comparison is provided and discussed in Part II.

So I assume another part will be released soon? I'm a bit confused on the labels of each part. I just read part two, so I'm assuming the ratings comparison will be in part three. And are you using this ratings comparison to be the ultimate test on the validity of this work or the validity of their work? Or just for fun because your work was validated in another way, perhaps in methods I inquired about above?

Sincerely,
Greg Tamer



   2. Michael Humphreys Posted: November 11, 2003 at 03:56 AM (#613900)
Greg,

You raise several good points.

Yes, I should apply the regression weights "out of sample" to verify them. I currently plan to use the weights on 2002 and 2003 data and derive DRA ratings that I hope I'll be able to publish through Primer, along with a comparison of UZR and Diamond Mind ratings, as well as David Pinto's excellent new system, described on Tango's Primate Studies site, assuming they're available. I had hoped to do the out of sample calculations before this article came out, but I am really pressed for time right now (I'm back in school). I hope I can do it sometime in December/January.

I'm very confident they will work out of sample, however, because as I was developing DRA using the 1974-2001 sample I worked on various subsamples of the data and kept deriving very similar weights. The one exception, as noted in the current installment, is the value of outs, which varied in precisely the same manner that Pete Palmer first showed that out-values varied--the higher the general level of offense, the greater the impact of an out. In addition, the DRA sample (1974-2001) was quite large, so "over-fitting" is not at issue.

Major league baseball is a remarkably stable system. A month or so ago, sabermetrician "Patriot" (a frequent poster at Primer) performed a regression analysis of offensive data over approximately the same time period as the DRA study. His regression model standard error using the whole sample was 22.6. When he re-ran the regression on half of the seasons and applied the weights to the other half of the sample, the standard error went up by only 1.1 runs, to 23.7 runs per team per season, which was approximately the same standard error as a number of good and well-known models for offense. I expect to see similar "out of sample" results for DRA. Even if the increase in standard error is *three* times greater than Patriot found for offensive data, the DRA standard error out of sample will still compare favorably to the standard errors for offensive models.

The labeling of the "Parts" has gotten confused, and for that I apologize. The article, as originally written, had an Introduction, four "Parts", and a Conclusion. I believe the editors at Primer originally hoped to post the article in *three* installments, *also* labelled as "Parts". I have the feeling the editors hoped to have published what I call "Part II" (the DRA-UZR-Diamond Mind comparison) in this installment, but the dozen or so Excel spreadsheets that provide the detailed comparisons with all the numbers were probably more difficult to format than expected. (At the end of this post I'll summarize again what's to come.)

Perhaps it is just as well that this installment only includes the explanation of the methodology of the system, as there is a lot there that I hope people can focus on and respond to before we get to the ratings. I believe there are two "DIPS"-related insights here that are new and important, even if they seem incredibly simple:

(i) ERA favors ground ball pitchers compared to fly ball pitchers because many more errors are recorded on ground balls than fly balls. This answers an issue that Bill James raised regarding the "Robin Roberts Family" of pitchers.

(ii) Infield fly outs should be credited to pitchers, because almost any major league team will convert 95% of them into outs. It's the mirror opposite of home runs, which fielders cannot reach. It also helps the "accounting" under the system. UZR and Diamond Mind don't credit infielders for pop-ups. Then who should we credit? As mentioned in the installment above, infield fly outs estimated from non-zone data are obviously "polluted" by line-drives, unassisted ground out putouts, and discretionary chances by outfielders, but in general, the class of estimated infield fly outs are properly creditable to pitchers. Since fly ball pitchers generate more pop-ups (the most extreme form of fly ball), this approach is also consistent with the finding that fly ball pitchers generally allow fewer non-home run hits. I also believe--and I hope readers can contribute their thoughts on the issue--that knuckleballers (e.g. Wakefield) generate more infield pop-ups than average, which would also be consistent with Voros's finding that knuckleballers are unusually good at allowing fewer non-HR hits. (And they'd better be; the WP/PB they cause are costly.)

I'm glad you asked about the comparison of DRA with UZR ratings and Diamond Mind evaluations--the comparison *is* designed to validate DRA. I can't think of any other way it is *possible* to validate a non-zone system. As mentioned in the article, I don't believe that any other non-PBP or non-zone system has ever been submitted to such a test. (It's hard to believe, isn't it? Several books have been published providing non-zone ratings without being tested against zone ratings, even when such tests were possible.) I think the results are good--not perfect, but good. Of course we should use zone data when we have it. But DRA provides a useful "back of the envelope" audit for surprising zone ratings, as I'll discuss in what I call Part IV.

Just to recap where we are and where we're going, what was originally labelled "Part I" (presented here as the second Primer installment) provides an overview of the principles and methodology of DRA, including many new insights about Defense Independent Pitching Statistics ("DIPS") (the relative impact of pitchers and fielders on whether batted balls fall in as hits). What was originally labelled "Part II" (which I hope will be posted soon as the third "installment") compares 1999-2001 DRA results with UZR ratings and DM evaluations. "Part III" provides historical DRA results from 1974-2001. "Part IV" addresses various miscellaneous issues, including how DRA ratings can complement and improve zone ratings, the practical relevance of applying DRA to evaluate minor league fielders, DRA's role in stirring and settling various Hall of Fame debates, how and why DRA can adapt to changing pitching and fielding dynamics and record-keeping over the course of major league history, as well as how DRA ratings (combined with Linear Weights batting and baserunning ratings) could be converted into replacement-level Win Shares and Loss Shares.

Thanks again.
   3. mike green Posted: November 11, 2003 at 03:56 AM (#613901)
Very interesting. It's not clear to me if DRA is intended to be an estimator of defensive value, rather than simply an attribution among defenders of outs made. If it is intended to be an estimator of defensive value, the interactions are not only among pitcher and fielders, but between fielders. For instance, a shortstop may have fewer opportunities because he is playing behind a strikeout staff or because he is playing beside Scott Rolen. Have I missed something, or does the system not attempt to address this?
   4. Michael Humphreys Posted: November 11, 2003 at 03:56 AM (#613902)
Mike,

DRA is an estimate of value. Maybe the way to think about it is that it is a combination of Range Factor (with putouts and assists analyzed separately) and *empirically determined* Defensive Linear Weights. Infielders, for example, are rated based on their assists, adjusted for the number of balls in play (which accounts for strikeout pitching, among other things), left-handed pitching, ground ball / fly ball pitching, baserunners and a few other variables having a statistically significant impact on the number of plays made. We use regression analysis to find the statistically significant impact of left-handed pitching, ground ball / fly ball pitching, etc. Then we use regression analysis to determine the average run value of each such "context-adjusted" play made. The resulting ratings per fielder are denominated in runs saved/allowed, as the subsequent installments will show.

For example, infielders are generally rated as follows:

[Run weight] * [Assists +/- BIP adjustment +/- GB/FB adjustment (determined through regression analysis) +/- LHP adjustment (determined through regression analysis) +/- baserunner variables adjustments (which impact DP opportunities and positioning, and also determined using regression analysis)]. The resulting rating is adjusted to be +/- the league average rate.

That's it. And only using publicly available data. The resulting ratings have an approximately .8 correlation with zone ratings. And almost exactly the same "scale" as zone ratings.

Although it is certainly possible for adjacent fielders to take away chances from each other, this is generally not at all a problem (except in very extreme cases) in the infield and not a terrible problem even in the outfield. The cross-correlations between fully-adjusted DRA ratings at adjacent positions were small (never more than approximately +/-.2, and usually less). Tango may have some posts in Primate Studies that provide further support for this finding.
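To make the bracketed rating template above concrete, here is a toy version in Python. Every adjustment value and the run weight below are placeholders of mine; in DRA the adjustments and the weight come out of the regressions, not from this code:

```python
def infielder_rating(assists, bip_adj, gbfb_adj, lhp_adj, runner_adj,
                     league_avg_adjusted_assists, run_weight):
    """Runs saved (+) or allowed (-) relative to the league average:
    [Run weight] * [context-adjusted assists - league average]."""
    context_adjusted = (assists + bip_adj + gbfb_adj
                        + lhp_adj + runner_adj)
    net_plays = context_adjusted - league_avg_adjusted_assists
    return run_weight * net_plays

# Hypothetical shortstop: 450 raw assists, small contextual bumps for
# BIP, GB/FB mix, LHP, and baserunner situations, compared against a
# league-average 430 adjusted assists, at a made-up 0.8 runs per play.
runs_saved = infielder_rating(450, -10, 5, 2, -3, 430, 0.8)
```

The sign conventions mirror the formula in the comment: each adjustment shifts the raw assist total toward what a league-average shortstop would have recorded in the same context, and only the residual plays get the run weight.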

   5. tangotiger Posted: November 11, 2003 at 03:56 AM (#613903)
Michael,

I just had another thought, regarding the presentation of your system, from a different perspective.

Say that you have Ozzie recording 400 outs, and being worth +30 runs. You extrapolate that out to being +40 outs above league average, given Ozzie's context. So, Ozzie is 400 outs, and the average SS, given Ozzie's context is 360 outs.

If we assume that a lg avg SS would convert 85% of "balls in zone" into outs, then we would guess that the league average SS would have 360/.85 = 424 opps in Ozzie context.

So, now we just estimated the number of balls in zone in Ozzie's context. Ozzie's ZR? 400/424 = .943

What's interesting, and maybe as a validation, is that you should not be able to go over 1.000. I'm not sure if your system kinda does this or not.

On the other hand, while the average ZR is around .84 or so for all positions, the actual DER is about .70, with the difference being "no-man's-land" BIP, and other types of PA that aren't part of ZR (but are part of DER). From that standpoint, you might take the lg ZR for a SS to be .700 if you account for everything. The above calculation would become:

360/.700 = 514 opps
400/514 = .778 = Ozzie's ZR
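Tango's back-of-the-envelope conversion can be written out directly. This sketch just restates his arithmetic; nothing here comes from DRA's internals:

```python
def implied_zone_rating(outs, avg_outs, conversion_rate):
    """Back out an implied ZR from a runs-above-average style rating:
    outs       - plays the fielder actually made
    avg_outs   - plays an average fielder would make in his context
    conversion_rate - assumed league rate of turning balls "in zone"
                      into outs."""
    est_opportunities = avg_outs / conversion_rate  # estimated balls in zone
    return outs / est_opportunities

# Ozzie's example: 400 outs vs. an estimated 360 for the average SS.
print(round(implied_zone_rating(400, 360, 0.85), 3))  # ~0.944
print(round(implied_zone_rating(400, 360, 0.70), 3))  # ~0.778
```

As Tango notes, a sanity check falls out for free: the implied rating can only exceed 1.000 if the system credits a fielder with more outs than the estimated opportunities in his context.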

Personally, however, I prefer the runs above average presentation. But the ZR presentation might have a certain appeal as well.

Great job, and I love the incredible effort required to do this, and to validate it as well.
   6. Michael Humphreys Posted: November 11, 2003 at 03:56 AM (#613904)
Tango,

I'll have to think through your post a bit more, but I can't imagine how it would be possible to impute a zone rating above 1.000 using DRA. I agree that it's better to use runs above or below average. Runs have a practical, measurable impact. They're also a statistic that makes immediate and intuitive sense to the average fan. Bill James has written about how certain baseball statistics have "acquired the powers of language"--they summon up vivid images. Somehow I've never been able to look at a ".835" zone rating and summon any image or intuit any meaning. That's one of the reasons we should all be so grateful to Mitchell for UZR--he took the "meaningless" zone percentages that are publicly available and turned them into something that makes sense to the average fan.
   7. studes Posted: November 11, 2003 at 03:56 AM (#613905)
Michael, this is stunning work. Very exciting. I particularly like your DIPS/infield flyball insight.

Maybe I'm thinking about this incorrectly, but I have some issues with assigning 100% of all BIP outcomes to fielders. We know, for instance, that 20% to 25% of all BIP will fall for base hits, even with the best fielders in the world manning their positions. Why should they be held accountable for these?

I tend to think that pitcher-only stats probably account for about 1/4 to 1/3 of all runs allowed, and BIP outcomes account for the rest. Is this how your DRA proportions responsibility for run-scoring too?
   8. MGL Posted: November 11, 2003 at 03:56 AM (#613907)
Outstanding and well-written! I'll have to check the pitcher-to-pitcher variation in infield pop-ups as compared to other BIPs. Do the Retrosheet files not include hit location data?

Are you suggesting that the "skill" component of a pitcher's BIP rate is reflected in their infield pop-up rate (infield pop-ups divided by non-HR BIPs) or that the random component is mostly reflected in the infield pop-up rate and that the "skill" component is reflected in everything (GB and FB out rate AND infield pop-up rate)?
   9. Ken Arneson Posted: November 11, 2003 at 03:56 AM (#613910)
I'm trying to understand what the infield fly/DIPS insight means, and it brings to mind a bunch of questions.

Should the formula for DIPS be adjusted for the infield fly rate of a pitcher? Would it account for the knuckleballer effect, too? Does it explain why Barry Zito consistently has an actual ERA quite a bit lower than his DIPS ERA? Is Zito's ability to make hitters hit weak popups a skill that is not reflected in DIPS?
   10. Michael Humphreys Posted: November 11, 2003 at 03:56 AM (#613914)
MGL,

Thanks. I don't think there is pre-zone PBP data that provides the exact placement of infield pop-ups, but I would think that Retrosheet files could be queried to determine the estimated infield fly outs (infield putouts (including catcher non-strikeout putouts) minus team assists) whenever a given pitcher is pitching.
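The estimate described in the reply above can be sketched as a one-line calculation. The variable names are mine, not Retrosheet field names, and the sample totals are invented:

```python
def estimated_infield_fly_outs(infield_po, catcher_po, catcher_so_po,
                               team_assists):
    """Team infield fly outs, estimated as infield putouts (plus
    catcher putouts other than strikeouts) minus team assists.
    Subtracting assists nets out the putouts recorded on throws
    (force outs, ground-ball outs), leaving mostly unassisted catches,
    i.e., pop-ups."""
    non_strikeout_catcher_po = catcher_po - catcher_so_po
    return infield_po + non_strikeout_catcher_po - team_assists

# Invented team totals: 2000 infield PO, 1100 catcher PO of which
# 1000 were strikeouts, and 1800 team assists.
ifo = estimated_infield_fly_outs(2000, 1100, 1000, 1800)
```

As noted elsewhere in the installment, the estimate is "polluted" by line drives caught by infielders and unassisted ground-out putouts, so it is a noisy team-level proxy rather than an exact count.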

I do think that pitchers who generate more pop-ups have a skill--the tendency of pitchers to be ground ball or fly ball pitchers is well known, and this is merely the most extreme example of such tendency. For reasons explained at length in this installment, I assume that non-infield fly balls and all ground outs are pretty much controlled by the fielders.

Ken,

I don't know what Zito's infield fly ball generation rate is. I heard that Wakefield generates a ton of them. DRA rates pitchers as well as fielders, and the pitcher DRA ratings are based on K's, BB's, HRs (like DIPS), with the infield fly outs kept as a separate statistic that probably has a lot of noise in it from year to year, but is probably the single best statistic obtainable without zone data about the pitcher's tendency to give up non-HR hits. (There is also a very slight advantage detected under DRA to being a left-handed pitcher, independent of the SO/BB/HR data.)

   11. Charles Saeger Posted: November 11, 2003 at 03:56 AM (#613915)
Perhaps it is just as well that this installment only includes the explanation of the methodology of the system, as there is a lot there that I hope people can focus on and respond to before we get to the ratings. I believe there are two "DIPS"-related insights here that are new and important, even if they seem incredibly simple

There's one even more important -- DER is inversely related to a team's assist rate. That is, the more assists a team records, the lower its DER will be. Even excluding errors, this is true. More groundballs go through for hits, and assist rate is a good indicator of groundball rate. (To be more accurate, I get an even stronger positive correlation between DER and team PO-A-SO, but adjusting for assists works well enough.)
   12. Michael Humphreys Posted: November 11, 2003 at 03:56 AM (#613916)
Tom N,

Thanks for your thoughts. I actually am offering both a fielder and pitcher evaluation system. (Everything adds up.) I'm more confident about the fielder part because I've tested it against UZR and Diamond Mind. The fielder part is also more relevant because evaluating fielding using traditional statistics has been such a problem. DIPS was a very important discovery, but, in general, ERA was not that bad a method of evaluating pitchers. By and large, it is possible to evaluate pitchers fairly well using the stats that have been around for ages.

Though I haven't had the opportunity to prove that infield fly outs solve the DIPS issue, I think there are compelling reasons to think that the DRA approach is a significant advance. The fact is that pitchers do have a tendency to generate ground balls or fly balls, infield fly balls are the most extreme fly ball, and infield fly balls are almost always caught by major league team defenses. Numerous studies have shown that pitchers who persistently have better rates of BABIP tend to be fly ball pitchers or knuckleballers. To me the logic is compelling to assign responsibility to pitchers, even apart from the problem of trying to allocate credit among fielders. Since the idea is new, people may want to think about it for a while.

As mentioned in today's installment, PZR will ultimately verify, refute or refine what is proposed here. My guess is that the current DRA approach to DIPS is a pretty good, simple way of dealing with the issue. Time will tell.

Thanks again for writing.
   13. Michael Humphreys Posted: November 11, 2003 at 03:56 AM (#613917)
Charles,

Great idea. I hadn't thought of that. It would be interesting to compare traditional DER / A / PO-SO-A rates with zone data. Again, more evidence that more fly balls mean fewer non-HR hits.
   14. Michael Humphreys Posted: November 11, 2003 at 03:56 AM (#613922)
Studes,

I'm back and will try to address your post again:

"Maybe I'm thinking about this incorrectly, but I have some issues with assigning 100% of all BIP outcomes to fielders. We know, for instance, that 20% to 25% of all BIP will fall for base hits, even with the best fielders in the world manning their positions. Why should they be held accountable for these?

"I tend to think that pitcher-only stats probably account for about 1/4 to 1/3 of all runs allowed, and BIP outcomes account for the rest. Is this how your DRA proportions responsibility for run-scoring too?"

I think the short answer to why it is not unfair to hold fielders accountable for all BIP outcomes other than estimated infield fly outs is that such allocation, as a practical matter, does not cause fielders to be "blamed" for the 25% or so of BIP that are simply unfieldable. The reason? DRA rates fielders relative to the league average. No player's rating is harmed by the 25% of BIP that drop in, because the *average* fielder allows those BIP to drop in as well.

Mike Emeigh has written about the wisdom of at least theoretically treating each BIP hit as a fielder "failure". Is Mike around?

Regarding the 1/3-to-1/4 allocation to pitchers, I have the feeling that if estimated infield fly outs were adjusted for foul territory and the impact of centerfielder/middle infielder discretionary pop-up chances, that's about where the DIPS line would end up--perhaps closer to the 1/3 level.

Thanks again for writing.
   15. Michael Humphreys Posted: November 11, 2003 at 03:56 AM (#613924)
Blixa,

You raise some very important points. Just so I can keep things straight, I'll repeat what you said and respond to it.

BLIXA: While UZR ratings have a sound theoretical and structural basis (I'm assuming DM ratings do as well), neither have been totally validated. Compared to offensive metrics, even the best defensive metrics are sorely lacking.

MAH: I think the huge advantage of UZR and DM is that they are based on what is by far the most complete set of data. As I'll mention in Part IV, it is worthwhile to continue trying to think about how to improve upon the use of such data, but for now, the data they are based upon represent the closest thing to the "reality" we are trying to model. In addition, UZR now has very strong "persistency"--individual fielder ratings correlate from year to year about as well as batting average (I think). Although that doesn't *prove* they work, it's certainly a good sign. DRA has nearly the same level of persistency. I believe, however, that the process of fielding batted balls is so intrinsically fraught with randomness that fielding ratings will always strike fans as less reliable. Nevertheless, we've come a long, long way from where we were when Bill James began his sabermetric career writing about fielding statistics.

BLIXA: This is why I think your new system has been met with so much enthusiasm. Followers of sabermetrics are still looking for better ways to evaluate defensive performance. Therefore, I am a little miffed at your reliance upon UZR and DM ratings to validate your system. Theoretically speaking, your methods are clear and face valid (with the exception of the infield fly ball confusion). However, it seems when you have had to make a decision, you chose the method that would best correlate with the UZR and DM ratings (for example, including wild pitches and balks because doing so made your results look more like Tango's ratings, not on any theoretical basis [IMO, there is no theoretical basis for including WP and balks in a catcher's defensive ratings unless the regression finds them to be significant contributors]).

MAH: Thanks for appreciating that the DRA method seems clear and plausible on its face. If I had revealed in complete detail how the system works and the resulting equations, the inner logic might have been persuasive on its own, without reference to test results. Since I wasn't prepared to provide all such detail, I felt that readers were entitled to see some empirical validation, and I felt that UZR and DM were in theory almost certainly the most accurate systems out there. And, to be clear, I never made any adjustment to DRA solely on the basis of a better match with UZR/DM. Every adjustment has independent theoretical and empirical justification.

Regarding WP, as explained above, the regression analysis reveals the average impact, in runs, of WP/PB/BK, and the decision to allocate *all* of that impact to catchers was based not only on the Tango study (which really is terrific, see his web site posted in Primate Studies) but also because logic suggest that *at the team level* catchers have to have more impact than pitchers, as the WP/PB tendencies of the twenty or so pitchers per team over the course of a season are randomized away, *except* in the case of knuckleballers and the odd Nolan Ryan. As we'll see in the catcher ratings section, I felt it appropriate to make an add hoc adjustment for the "Charlie Hough" effect on Jim Sundberg's WP and PB. I entirely agree that allocating the whole WP/PB/BK variable to catchers is an imperfect approach, but better, on average, than allocating everthing to pitchers, and I could not determine any systematic method for splitting it up. Balks were left in just because I was too lazy to take them out.

BLIXA: The field of sabermetrics is still looking for advancements in defensive analysis. Advancements can't be made when the goal is to produce similar data to what's already out there.

MAH: Agreed. One of the points I'll make in what I call Part IV is that the *differences* between DRA and UZR/DM (and, I must confess, there are some) can help refine zone systems. And, again, every technique used in DRA has a logical and empirical justification independent of the match with UZR/DM. So what's neat is that such different approaches yield such similar results. That actually should lend more support to UZR and to the general conclusion about the impact of fielding at the major league level.

Thanks again for giving me the opportunity to address these issues.
   16. Michael Humphreys Posted: November 11, 2003 at 03:56 AM (#613925)
Blixa,

You raise some very important points. Just so I can keep things straight, I'll repeat what you said and respond to it.

BLIXA: While UZR ratings have a sound theoretical and structural basis (I'm assuming DM ratings do as well), neither have been totally validated. Compared to offensive metrics, even the best defensive metrics are sorely lacking.

MAH: I think the huge advantage of UZR and DM is that they are based on what is by far the most complete set of data. As I'll mention in Part IV, it is worthwhile to continue trying to think about how to improve upon the use of such data, but for now, the data they are based upon represent the closest thing to the "reality" we are trying to model. In addition, UZR now has very strong "persistency"--individual fielder ratings correlate from year to year about as well as batting average (I think). Although that doesn't *prove* they work, it's certainly a good sign. DRA has nearly the same level of persistency. I believe, however, that the process of fielding batted balls is so intrinsically fraught with randomness that fielding ratings will always strike fans as less reliable. Nevertheless, we've come a long, long way from where we were when Bill James began his sabermetric career writing about fielding statistics.

BLIXA: This is why I think your new system has been met with so much enthusiasm. Followers of sabermetrics are still looking for better ways to evaluate defensive performance. Therefore, I am a little miffed at your reliance upon UZR and DM ratings to validate your system. Theoretically speaking, your methods are clear and face valid (with the exception of the infield fly ball confusion). However, it seems when you have had to make a decision, you chose the method that would best correlate with the UZR and DM ratings (for example, including wild pitches and balks because doing so made your results look more like Tango's ratings, not on any theoretical basis) [IMO, there is no theoretical basis for including WP and balks in a catcher's defensive ratings unless the regression finds them to be significant contributors].

MAH: Thanks for appreciating that the DRA method seems clear and plausible on its face. If I had revealed in complete detail how the system works and the resulting equations, the inner logic might have been persuasive on its own, without reference to test results. Since I wasn't prepared to provide all such detail, I felt that readers were entitled to see some empirical validation, and I felt that UZR and DM were in theory almost certainly the most accurate systems out there. And, to be clear, I never made any adjustment to DRA solely on the basis of a better match with UZR/DM. Every adjustment has independent theoretical and empirical justification.

Regarding WP, as explained above, the regression analysis reveals the average impact, in runs, of WP/PB/BK, and the decision to allocate *all* of that impact to catchers was based not only on the Tango study (which really is terrific, see his web site posted in Primate Studies) but also because logic suggests that *at the team level* catchers have to have more impact than pitchers, as the WP/PB tendencies of the twenty or so pitchers per team over the course of a season are randomized away, *except* in the case of knuckleballers and the odd Nolan Ryan. As we'll see in the catcher ratings section, I felt it appropriate to make an ad hoc adjustment for the "Charlie Hough" effect on Jim Sundberg's WP and PB. I entirely agree that allocating the whole WP/PB/BK variable to catchers is an imperfect approach, but better, on average, than allocating everything to pitchers, and I could not determine any systematic method for splitting it up. Balks were left in just because I was too lazy to take them out.

BLIXA: The field of sabermetrics is still looking for advancements in defensive analysis. Advancements can't be made when the goal is to produce similar data to what's already out there.

MAH: Agreed. One of the points I'll make in what I call Part IV is that the *differences* between DRA and UZR/DM (and, I must confess, there are some) can help refine zone systems. And, again, every technique used in DRA has a logical and empirical justification independent of the match with UZR/DM. So what's neat is that such different approaches yield such similar results. That actually should lend more support to UZR and to the general conclusion about the impact of fielding at the major league level.

Thanks again for giving me the opportunity to address these issues.
   17. Michael Humphreys Posted: November 11, 2003 at 03:56 AM (#613926)
Brent,

"Instrumental variables regression", eh? Incredibly enough, I got the idea for the technique from page 222 in Bill James' "Win Shares", in the context of his analysis of adjustments for left-handed pitching:

"I sorted the data by walks, but the teams which had the most walks also tended to have significantly fewer strikeouts than the low walk teams, so that created some exceptionally annoying cross-correlation effects. To remove those, I created a category called "Walks Minus .16 Strikeouts," to give us a high walk group and a low-walk group with equal strikeouts."

I just guessed he had derived the adjustment from regression analysis. That one stray comment about a technique he apparently uses nowhere else gave me the germ of the idea for DRA. I recently took a statistics course, described to some grad students what I was proposing to do, and they all nodded as if it were the most ordinary thing in the world.

Thanks for letting me know about Craig's conclusions about errors and ERA. Though I can no longer claim credit for the discovery, it's a good thing to know about Craig's work. Eventually the idea that errors are *almost* meaningless (after determining context-adjusted plays made) will take hold. One exception is throwing errors by outfielders, but traditional stats don't break them out.

I agree that giving credit to pitchers for infield fly outs is something that should be tested, though I'm fairly sure that the approach is basically correct, though obviously something that can be improved upon.

Regarding catcher pop-ups, there are undoubtedly catchers who are better at it than others. The key question to ask is, "How many pop-ups does a catcher (or infielder) catch each season that nobody else on his team could have caught and that the average-fielding catcher would not have caught, given the amount of foul territory behind the plate?" I just don't see how we can answer that question based on a season's worth of non-zone data. Perhaps a clear career trend should be noted for purposes of historical ratings, but in the meantime, subjective "credit" for such an observed skill should probably be limited to a "tie-breaker" role for catchers whose other stats are essentially the same.

In the historical ratings section (what I call Part IV) I repeatedly emphasize the continuing need for subjective input--there are real fielding skills that escape quantification. The most important of these is the ability of first basemen to prevent throwing errors by infielders. That is worth much more study. Things like blocking the plate, hustling for soft flies that would otherwise drop in as Texas Leaguers, all have value, but I'm inclined to think it is fairly modest.
   18. Michael Humphreys Posted: November 12, 2003 at 03:56 AM (#613930)
Alex,

Regarding pitcher and right fielder errors, the short answer is that I have included all context-adjusted fielding statistics, including errors at each position (adjusted for total chances), to determine the possible statistical association between such context-adjusted fielding events and runs saved or allowed. The only categories of errors that showed up as having a statistically significant association with runs allowed were errors by pitchers and right fielders.

Again, DRA is about trying to find every possible statistically significant relationship between and among publicly available pitching and fielding statistics, and between those statistics and runs allowed. I was just being true to the method.

Do those results make sense? (Other important principles of DRA are simplicity and common sense.) I'm pretty sure the fact that pitcher errors showed up as significant makes sense. Pitchers are literally the only players in fair territory (i.e., excluding catchers) who can't position themselves to improve their range, and have limited mobility to make a play after delivering a pitch, so surehandedness becomes more important than range alone. Here's another thought: pitchers are closer to the batter than any other fielder, so batted balls get to them so quickly (unless they're bunts or the rare squiggler) that having fast reflexes and being surehanded are the *primary* ways in which pitchers can excel as fielders.

The right field result is more questionable, and actually raises an important issue about the use of regression analysis.

In the book "Curve Ball", the authors show that if you include Sac Flies as a variable in a regression analysis of batting data, they have a higher "weight" than singles (maybe even doubles). The reason is that regression analysis only reveals a statistical *association* between an event and the outcome being modeled--not a "cause and effect" relationship. As explained in "Curve Ball", Sac Flies *always* result directly in a run scoring--by definition--so they would naturally have a large and statistically significant "association" with runs scored. In addition, Sac Flies are a "carrier" variable that "carries" information about *other* aspects of the team's offense: i.e., teams with a lot of Sac Flies have a lot of runners on third base, and teams with a lot of runners on third base score more runs, whether or not through Sac Flies. For all of these reasons, Sac Flies should *not* be included in a regression analysis of team offense.

I suggest in the article that right field errors might show up as significant because right field *throwing* errors are the type of error *most* likely to result *directly* in a run scoring. If a runner goes from first to third and the right fielder throws the ball over the head of the third baseman and into the stands, the run will score. (Throws from center and left are less likely to be thrown beyond the reach of a back-up fielder.) If this theory is correct, it is probably still correct to charge the right fielder, because his action is indeed harmful in theory, and the run weight makes more sense than the run weight for Sac Flies, which are, after all, partly *negative* in their effect, as a run is consumed.

The other possibility is that throwing errors by right fielders are associated with the team allowing more baserunners and those baserunners being more aggressive. The first variable is not the right fielder's fault, but the second one might be.

I ultimately decided to leave in right field errors because outfield *assists* were *not* strongly correlated with the number of baserunners and the total number of batted balls reaching the outfield (H-HR+outfield putouts). The relationship was so weak, and could not be applied separately to each outfield position, that I left those adjustments out. As will be discussed in Part III, simple, unadjusted outfield assists attracted regression weights that had a strong correlation with UZR Outfield Arm Ratings, which track runners "held" as well as runners "killed". Net/net, the evidence regarding assists was that throwing events by outfielders were basically context-independent, thus suggesting that right field (presumably throwing) errors are context-independent, and thus should be charged to the right fielder.

   19. Marc Stone Posted: November 12, 2003 at 03:56 AM (#613933)
The reason why errors count for pitchers and right fielders and not for other positions has a lot to do with the arbitrary nature of "statistical significance". I assume you used a p-value of .05, but there is nothing magical about using that number as a cut-off. There is no real difference between a p-value of .04 and one of .06, but you have to draw a line somewhere, and two positions ended up on one side and seven on the other. Errors likely matter a little more for the two, but the differences are not likely to be enormous.
   20. studes Posted: November 12, 2003 at 03:56 AM (#613934)
Michael, thanks. Makes sense. Your complete and quick responses to our questions are amazing.
   21. strong silence Posted: November 12, 2003 at 03:56 AM (#613939)
Errors should not be included in any defensive analysis unless one knows and has documented the effect of partial scorers who, as we have all seen, often make decisions based on favoritism.
   22. Michael Humphreys Posted: November 12, 2003 at 03:56 AM (#613942)
Marc,

Actually, the closest any other errors had to a statistically significant impact on allowing runs was left field errors, with a p-value of .17.

Strong Silence,

Good point. There are a lot of errors made in judging errors. In addition, errors are really a duplicative stat, if you're keeping track of adjusted plays made. An infielder who is charged with an error is already charged with a reduction in his assists; an outfielder who is charged with an error on a fly ball is already charged with a reduction in his putouts. Throwing errors by outfielders might have more significance, as the ratio of throwing errors to assists is quite high, and the impact of such a throwing error might be greater than not having made a throw at all.

Studes,

Thanks for appreciating the effort.
   23. Michael Humphreys Posted: November 12, 2003 at 03:56 AM (#613943)
Strong Silence,

Maybe you already picked this up, but my last post to the prior installment was to one of your comments:

"As you know putouts and assists don't tell the whole story. The relationship of put outs and assists to runs prevented is unknown."

As you probably gathered from the second installment, that is precisely what DRA does. If you have any other questions on this point, let me know.
   24. Michael Humphreys Posted: November 12, 2003 at 03:57 AM (#613952)
Joe M.--thanks. Have you had the chance to observe knuckleballers other than Wakefield?
   25. Mike Emeigh Posted: November 12, 2003 at 03:57 AM (#613960)
I'm around, but don't really have time right now to do much more than offer a brief comment or two here and there. (I've been going through a career re-evaluation - not sure where that's headed, but I've turned down two job offers in the last month.)

Wakefield and Steve Sparks are definite flyballers, and both do tend to induce a lot of popups. I don't have all of the AL PBP data for 2003 converted yet, but when I do I'll run some more detailed numbers on this. I don't know if this is true of all flutterballers; we have Retrosheet data that covers a lot of the Niekro brothers' careers, as well as Charlie Hough's, and when I get a chance I'll take a look at them as well.

It is devilishly difficult to filter out all of the shared variance from defensive statistics - something of which I know Michael is well aware. Every play made by a defensive player is one less play available to someone else - and every play not made by a defensive player is one more play available to someone else. This is one of the reasons (probably the biggest reason) that existing non-PBP based methods don't work very well. *Someone* has to make the plays eventually, and the best fielder on a terrible defensive team will make more of them relative to his teammates than he normally will on a good defensive team, because in the latter case his teammates won't be leaving as many opportunities on the table.

-- MWE
   26. Michael Humphreys Posted: November 12, 2003 at 03:57 AM (#613961)
David,

I like your idea. "Practical approximations" is a good way of putting it.
   27. Michael Humphreys Posted: November 12, 2003 at 03:57 AM (#613965)
Mike,

Thanks for your comment and good luck on the career re-evaluation.

I'm pretty sure that DRA takes care of the phenomenon that you describe, which Bill James calls the False Normalization of Fielding Statistics. Maybe when we get to the next installment, which provides the DRA/UZR/Diamond Mind comparison, we can consider individual cases. In the meantime, the following may address the issue.

First, plays made are evaluated under DRA in the context of BIP, not innings. If a good fielder plays on a poor fielding team, the fact that 27 outs have to get recorded *will* cause his plays made per inning to go up. But if the "context" used is BIP, the extra BIP that drop in front of his poor-fielding teammates for hits will increase the "denominator" of BIP "opportunities" "charged" to the good fielder, thus keeping his rating from shooting up.
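A toy arithmetic example of the point (all numbers hypothetical):

```python
# The same shortstop on a good and a bad fielding team. On the bad team,
# more BIP fall in for hits, so extra batters come up and he records more
# assists per inning -- but the BIP "denominator" grows right along with them.

def per_inning_and_per_bip(assists, innings, bip):
    return assists / innings, assists / bip

good = per_inning_and_per_bip(assists=450, innings=1400, bip=4300)
bad = per_inning_and_per_bip(assists=470, innings=1400, bip=4600)

print(f"per inning: good {good[0]:.3f}, bad {bad[0]:.3f}")  # bad looks better
print(f"per BIP:    good {good[1]:.3f}, bad {bad[1]:.3f}")  # roughly even
```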

Second, and related to this point, is the fact that fully-adjusted plays under DRA at each of the positions have low cross-correlations, thus suggesting that, on average, there is minimal "play-stealing" between positions. (Tango posted some studies recently that independently back up this point.) This implies that the chance that a good fielder could take a lot of BIP opportunities "ceded" by the poor fielder next to him (and thereby keep the total BIP "denominator" low) is not a terrible problem. There are always exceptions, of course. I note in the historical survey (what I call Part III) that Tim Wallach's ratings during the three seasons Hubie Brooks impersonated a shortstop are probably a bit too high, because Tim had to do a lot of Hubie's work for him.

Third, DRA ratings match up well with UZR/Diamond Mind ratings, which are designed to avoid the False Normalization problem.



   28. Michael Humphreys Posted: November 14, 2003 at 03:57 AM (#614003)
FJM,

Thanks. I've been blessed with great readers.

I hear you about the outfield fly balls. Credit for the soft flies really belongs to the pitcher, to the extent they represent batted balls that virtually any major league outfield would convert into outs.

The problem is that it is not possible with non-zone data to determine which of the outfield fly balls are the easy ones to catch.

Traditional defensive statistics provide us with three categories of BIP: ground outs, infield fly outs and outfield fly outs. Even the latter two are fuzzy categories, as there is an overlap between infield and outfield fly outs and unassisted ground out putouts get lumped into the infield "flyout" estimate. (Doubles and triples allowed provide more information, but I don't believe that doubles and triples *allowed* by a team have been tracked throughout major league history, and furthermore, we don't have a record of *which* outfielder "allowed" the extra-base hit--we can't assign blame the same way we can assign credit for putouts.)

Although it hasn't been proven that generating infield fly outs is a pitcher skill, we know that pitchers have persistent individual tendencies to generate more fly balls or more ground balls. Infield fly balls are the most extreme form of fly ball, and so it would appear reasonable to assume that a pitcher's rate of generating infield fly balls goes up (probably in a non-linear fashion) with his rate of giving up total fly balls. In a sense, an infield fly ball is the most "successful" outcome for a fly ball pitcher, other than a strikeout. He has "won" the contest on his terms.

I think it would be possible to test this using non-zone Retrosheet data, which goes back to about 1970 or so. What one would need to do is query the database for the total estimated infield fly outs, outfield fly outs and ground outs per pitcher per season. We could also determine the extent to which a pitcher's infield fly out rate, relative to the league average (and perhaps adjusted for park effects of foul territories, which are not trivial), (a) is "persistent" year-to-year to a statistically significant extent, (b) is correlated with outfield fly outs, and (c) "explains" the total variation in BABIP rates. My belief is that infield fly outs will be highly persistent and explain pitcher BABIP variation better than any other single statistic that can be derived from non-zone data.

OK. Now what about outfield fly balls?

We know, from zone data, that ground balls are significantly more likely to become hits than fly balls. But those statistics looked at *total* fly balls. What I suggest in the article, though I don't have the data to prove it, is that if you eliminate *infield* fly balls from *total* fly balls, the out-conversion rate for *outfield* fly balls is--on average--probably very similar to that for ground balls.

Assuming that is the case, we have two categories with similar out-conversion rates--ground balls and outfield fly balls.

If ground balls are "bad" in terms of run-prevention, then so are outfield fly balls, on average. If out-conversion of ground balls is controlled primarily by the infielders, it stands to reason that out-conversion of outfield fly balls is controlled primarily by the outfielders.

Think of it another way. If the pitcher has "forced" the batter to hit the ball in the air, and has neither clearly succeeded (infield fly out) nor clearly failed (home run), do we think that the positioning, reflexes, and running speed of the outfielders will be the largest determining factor, on average? It would seem that the fact that the outfield is so big would make positioning and speed all the more important, particularly in running down soft flies that would otherwise "drop in" between fielders.

When all is said and done, the approach described, as David Smyth so aptly put it, is a "practical approximation." However, as we'll see in the upcoming DRA/UZR/Diamond Mind comparison, the runs-saved estimates for outfielders under DRA are probably slightly more conservative than under UZR, which *does* track ball trajectory and speed (as well as zone location). So the approach of crediting everything to the outfielder appears to work, or at least not overweight the impact of outfielders.

Thanks again.
