Adjusted Range Factor (Or Why Darin Erstad Deserves Serious Consideration as AL MVP)
David Vaillette proposes a *new* defensive stat.
I was reading something by Rob Neyer the other day about the fact that there is proof that good defense, not pitching, keeps balls in play (base hits that are not home runs, and outs that are not strikeouts) from becoming hits. That is, there is proof that preventing hits on balls in play is not a skill that stays constant from year to year for pitchers. (The exception is apparently knuckleballers.)
I have also read that a good stat for total team defense is outs per ball in play:
Outs = IP*3-K
Balls in Play = H-HR+IP*3-K
Team Defensive Rating = (IP*3-K)/(H-HR+IP*3-K)
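As a quick check on the arithmetic, here is a minimal Python sketch of the rating above. The function name and the sample team totals are made up for illustration. Like the formula, it treats IP*3 as the defense's total outs, so caught stealing and other baserunning outs end up counted as in-play outs.

```python
def team_defensive_rating(ip: float, h: int, hr: int, k: int) -> float:
    """Outs on balls in play divided by balls in play, per the formula above.

    ip must be true innings (100 2/3, not the box-score "100.2").
    """
    outs_in_play = ip * 3 - k              # every out that was not a strikeout
    balls_in_play = h - hr + outs_in_play  # non-HR hits plus in-play outs
    return outs_in_play / balls_in_play


# Hypothetical season totals, not real team data.
print(round(team_defensive_rating(ip=800, h=750, hr=80, k=600), 3))  # 0.729
```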
It also appears to me that Zone Rating is the single most useless stat ever
computed for baseball.
What I think is a good stat is Range Factor, but it's kind of the batting average of fielding: the most useful single stat (OPS is a composite stat, and RC is an even more composite one) but still potentially misleading. For instance, do Darin Erstad's and Wendell Magee's current (7/15/2001) RFs of 3.71 and 3.36 mean they are both great center fielders? My answer: Erstad yes, Magee no. Read on.
(Note: the following figures are from team stats I compiled on or about July 6th.)
How do RFs differ from team to team? What will the RFs of the Diamondbacks, who have to play behind Randy Johnson and Curt Schilling, look like? Obviously they won't look as good. What about outfielders and infielders who play behind Derek Lowe? The infielders will have their stats padded, and the outfielders' stats will look worse. And what if you are a player on the Atlanta Braves, with the best defense in the league? Over the season they make 12 outs for every 15 balls in play, while an average team makes 12 outs for every 16 balls in play. They have essentially robbed themselves of 1 chance to make a play due to their efficiency.
The answer, obviously, is to adjust.
The first adjustment is to compare each team's Defensive Efficiency (DE) to the league average. For instance, Atlanta's DE is .807 and the league average is .769, so each member of Atlanta has his RF multiplied by .807/.769 (=1.049). Detroit multiplies each member's RF by .736/.769 (=.959).
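A minimal sketch of the first adjustment, using the rounded DE figures quoted above. The function name is hypothetical.

```python
def efficiency_adjustment(team_de: float, league_de: float) -> float:
    """Adjustment 1: team Defensive Efficiency over the league average.

    Following the article's reasoning, a more efficient team leaves its
    fielders fewer chances, so their RFs are scaled up (ratio above 1).
    """
    return team_de / league_de


LEAGUE_DE = 0.769
for team, de in [("Atlanta", 0.807), ("Detroit", 0.736)]:
    print(team, round(efficiency_adjustment(de, LEAGUE_DE), 3))
```

With these rounded inputs, Detroit's factor comes out to about .957 rather than the .959 quoted above; the small gap may come from unrounded team figures.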
The second adjustment is to compute each team's average in-play outs per inning, i.e., outs per inning with the strikeouts taken out. For instance, Detroit, whose pitchers don't strike anybody out, has 2.416 in-play outs per inning pitched, and Arizona, which leads the league in Ks, averages 2.079. The league average is 2.250, so for Detroit the adjustment would be 2.250/2.416 (=.93) and for Arizona 2.250/2.079 (=1.082).
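The second adjustment can be sketched the same way. The helper names are hypothetical, and the two team rates are the ones quoted above.

```python
def in_play_outs_per_inning(ip: float, k: int) -> float:
    """Outs recorded per inning with the strikeouts taken out: (IP*3 - K) / IP."""
    return (ip * 3 - k) / ip


def strikeout_adjustment(team_rate: float, league_rate: float) -> float:
    """Adjustment 2: league in-play outs per inning over the team's figure.

    A low-strikeout staff gives its fielders more chances, so the ratio is
    below 1 for such teams and above 1 for high-strikeout teams.
    """
    return league_rate / team_rate


LEAGUE_RATE = 2.250  # league average quoted in the article
print(round(strikeout_adjustment(2.416, LEAGUE_RATE), 2))  # Detroit: 0.93
print(round(strikeout_adjustment(2.079, LEAGUE_RATE), 3))  # Arizona: 1.082
```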
The third and final adjustment I make is an estimate of ground balls to fly balls. The estimate I use is a little bit lazy. One could, and should, figure out the number of putouts by outfielders, but instead I figure a ratio of assists to outs in play. The team with the highest ratio is the Montreal Expos at .570; the team with the lowest is the Minnesota Twins at .459. The league average is .512. Now, the reason Minnesota outfielders record more outs than any other team's could be that the Twins have fly ball pitchers, or it could be that they have good outfielders and bad infielders. I take a midpoint between these two views and compute an outfield RF adjustment of (.459/.512+1)/2 (=.948) for the Twins outfielders and an infield adjustment of (.512/.459+1)/2 (=1.058) for their infielders.
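A minimal sketch of the third adjustment. It assumes "outs in play" means IP*3 - K, as in the team defense formula earlier; the function names are hypothetical.

```python
def assist_ratio(team_assists: int, ip: float, k: int) -> float:
    """Team assists per out in play, with outs in play taken as IP*3 - K."""
    return team_assists / (ip * 3 - k)


def batted_ball_adjustments(team_ratio: float,
                            league_ratio: float) -> tuple[float, float]:
    """Adjustment 3, returned as (outfield_factor, infield_factor).

    Each factor sits halfway between the full ratio and no adjustment (1.0),
    hedging between "fly ball staff" and "good outfield, weak infield".
    """
    outfield = (team_ratio / league_ratio + 1) / 2
    infield = (league_ratio / team_ratio + 1) / 2
    return outfield, infield


of_adj, if_adj = batted_ball_adjustments(0.459, 0.512)  # Minnesota vs. league
print(round(of_adj, 3), round(if_adj, 3))  # 0.948 1.058
```

For a ground ball team like Montreal (.570), the same function returns an outfield factor above 1 and an infield factor below 1.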
My Adjusted RF subtotal is RF*Adj1*Adj2*Adj3. Then I subtract a player's error rate per 9 innings for my final "Adjusted RF". There are a couple of other things in my numbers that I haven't told you about. I have subtracted out catcher defensive numbers (mostly just caught stealing). There are some other adjustments that could be made, and others that could be made better. (Should there be a park adjustment? A left/right adjustment? Should there be an estimate of infield flies (Team Assists - Outfield Assists - Catcher Assists - 1B PO)?) This is what I am currently going with. And, as Bill James would predict (from what I have read), the results are not what you would expect.
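Putting the pieces together, a minimal sketch of the final figure. Which Adj3 factor a player gets (outfield or infield) depends on his position. The function name and the player line in the usage example are hypothetical.

```python
def adjusted_rf(rf: float, adj1: float, adj2: float, adj3: float,
                errors: int, innings: float) -> float:
    """RF * Adj1 * Adj2 * Adj3, minus the player's errors per 9 innings."""
    subtotal = rf * adj1 * adj2 * adj3
    errors_per_9 = errors * 9 / innings
    return subtotal - errors_per_9


# Hypothetical outfielder: raw RF 2.90, 4 errors in 900 innings, made-up
# Adj1 and Adj2, and an outfield Adj3 like the Twins' .948.
print(round(adjusted_rf(2.90, 1.01, 0.98, 0.948, errors=4, innings=900), 2))  # 2.68
```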
The 2002 All Defensive team so far:
AL
C   Einar Diaz
1B  Mike Sweeney
2B  Adam Kennedy
SS  Mike Bordick
3B  Robin Ventura
    Corey Koskie (due to the imperfections I mention above, I have to consider this a tie)
LF  Jacque Jones
CF  Darin Erstad
RF  Matt Lawton
NL
C   Damian Miller
1B  Eric Karros
2B  Jeff Kent
3B  Craig Counsell
SS  Jose Hernandez
LF  Eric Owens (!?!?)
CF  Jim Edmonds
    Juan Encarnacion (tie)
RF  Richard Hidalgo
AL Defensive MVP: Darin Erstad or Mike Bordick
NL Defensive MVP: Craig Counsell or Jose Hernandez
Note 1: For 1B I use A + DP/2 as a Range Factor rating. For catchers I use a different rating altogether.
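For reference, a sketch of the two ratings. The article doesn't say what denominator it uses, so this assumes the common per-9-defensive-innings form (some sources use per game instead). The function names are hypothetical.

```python
def range_factor(putouts: int, assists: int, innings: float) -> float:
    """Conventional range factor: (PO + A) per 9 defensive innings."""
    return (putouts + assists) * 9 / innings


def first_base_rating(assists: int, double_plays: int, innings: float) -> float:
    """The 1B variant from Note 1, A + DP/2, on the same per-9 basis (assumed)."""
    return (assists + double_plays / 2) * 9 / innings


print(round(range_factor(putouts=310, assists=8, innings=900), 2))              # 3.18
print(round(first_base_rating(assists=100, double_plays=120, innings=1200), 2))  # 1.2
```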
Note 2: One problem with this system is its inability to rate an outfielder's arm. Outfield assists are usually considered a good measure of an outfielder's arm, but they really measure an outfielder's reputation. Nobody runs on Mondesi anymore, so he doesn't get assists anymore. Ivan Rodriguez is among the last in the league in catcher assists because people have learned not to run on him either.
So, using this tool, Darin Erstad has an adjusted RF of 3.56. His team seems to have some fly ball pitchers (or is that just because he is so good?), but they are the 3rd best defensive team in the league.
Meanwhile, Magee plays for the worst defensive team in the league, and the worst strikeout team in the league. His RF is being padded, and his adjusted RF is 2.88. This is still well above the league average of 2.65, but it is not insanely high like Erstad's. In my estimation, Erstad takes 1.16 hits per game away from the other team. If you are watching him every day, he will get to a ball that the average center fielder will not get to once a game, and to one that no other major league center fielder will get to every other game. He has probably saved his pitching staff 46 runs during the year. Offensively he is only above average, but defensively he is in a class by himself. Alex Rodriguez has been good, but he is only a slightly above average SS (4.40 vs. a league average of 4.35). Mike Sweeney has been incredible offensively and defensively, but Erstad is playing for a contender, and Sweeney is not.
Last note: how the hell does a 37-year-old (Mike Bordick) play such good defense at SS?
David Vaillette
Posted: September 03, 2002 at 06:00 AM
Reader Comments and Retorts
1. Mike Emeigh
1. Infielder assists should be adjusted for DPs. This affects both the individual fielder estimates and the assists/outs in play estimate, which explains (to some extent) why the Expos rank so high and the Twins so low.
2. DPs at 1B should be adjusted for the number of runners on first base, otherwise you reward a player for playing behind a lousy pitching staff (see below).
3. Mike Sweeney looks good because (a) he almost never records an unassisted putout at 1B, getting lots of 3-1 assists on balls that other 1Bs take themselves, and (b) he gets extra credit for KC's lousy pitching staff because of the use of DPs in his RF. Bill James noted the first of these effects, and makes an appropriate adjustment for it in the Win Shares system. You can't ignore unassisted putouts when evaluating 1Bs.
I haven't run any of the numbers through the more complicated system that Charlie Saeger and I use, but I'd be stunned if Erstad's impact is anything like 46 runs. My feeling is that the simple ratios proposed here lead to adjustment factors that are *far* too large, overstating the real differences between players.
-- MWE
I don't think so. Hold the variables equal. The rest of the league makes 12 plays in 16 chances, while the Braves will make 12.8 plays in 16 chances. They have essentially turned 0.8 of a hit into an out because of their efficiency.
If you're adjusting RF upward because of a player coming from an efficient defensive team, you're double-counting somewhere.
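The two readings can be put side by side with exact fractions, using the 12-of-15 and 12-of-16 rates from the article. The variable names are illustrative. This only shows the arithmetic behind each framing, not which one is right for adjusting RF.

```python
from fractions import Fraction

LEAGUE_DER = Fraction(12, 16)  # average team: 12 outs per 16 balls in play
BRAVES_DER = Fraction(12, 15)  # Atlanta, per the article: 12 outs per 15

# Article's framing: hold outs fixed at 12 and count the balls in play needed.
chances_saved = 12 / LEAGUE_DER - 12 / BRAVES_DER  # 16 - 15

# Commenter's framing: hold chances fixed at 16 and count the outs made.
extra_outs = 16 * BRAVES_DER - 16 * LEAGUE_DER     # 12.8 - 12

print(chances_saved, extra_outs)  # 1 4/5
```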
And I know this is counter to the stathead "you can't judge defense by watching players", but as soon as I saw Jeff Kent as the NL's best defensive second baseman, my "sanity check" button started glowing red.
Yes, ZR has its bias. What doesn't? What does OBA measure? % of times reached base. So, Andre Dawson doesn't do so well. Does that mean he's not a good hitter?
What does ZR measure? The % of outs per ball in "zone". It's not the "best fielder stat". So, some players don't do so well because they have lots more balls hit in the "outlying" area of the zone (i.e., not all opps are equal).
These things can be adjusted for. If the field, the pitchers, the batters, and teammates all have some effect on a player's ZR, then look for those reasons, and adjust them accordingly.
* First basemen on bad teams are more likely to make the 3-1 flip rather than take the play unassisted. I couldn't tell you why this is either.
* Left/Right numbers are only really important for first and third basemen.
* You always need to adjust for runners on base with double plays and outfield assists.
That is not the only reason why players don't perform well in ZR. Fenway Park left fielders can't perform well in ZR, because they can't play deep enough to cover all of the zones assigned to a left fielder thanks to the Monster. Neither can Houston left fielders or Baltimore right fielders, for the same reason.
The problem is that you don't know why a player's ZR is low, and because you don't know why a player's ZR is low you can't use it to evaluate his fielding skill. It's like using unadjusted OPS to evaluate a hitter's skill. If you do that, you're going to conclude that Colorado has a lot of great hitters.
These adjustments are hardly simple. The area of the field that a fielder can normally cover is highly situation dependent, with all of the factors above plus the baserunner situation, the time in the game (early/late innings), and the game score affecting the fielder's ability to cover certain sections of the field. As an example - middle infielders are positioned at normal depth with no runner on 1B, play at double-play depth (which is a few steps in closer and toward the middle to allow them to shorten the distance they have to go to get to 2B) with a runner on first and fewer than two outs, and often play in with a runner on 3B and fewer than two outs. Those different positions by themselves change the area they are likely to be able to cover.
Personally, I think we ought not to worry about making fine gradations. Just take the Project Scoresheet/Retrosheet ball location diagram, and use the zone definitions there as an upper bound on opportunities. For example, ground balls hit into the area marked "56" are the responsibility of both the SS and the 3B. If the ball gets through for a hit, both players are penalized, period - no effort or judgment needed to figure out which of those fielders should have made the play, or whether the ball was in fact *fieldable*. (This is, more or less, the Defensive Average approach.) Players shouldn't necessarily be excused for *degree of difficulty* - if the ball is going there frequently against them, that suggests that the defensive positioning needs to be changed to make those plays easier.
I'll have more to say on this subject in a later article.
-- MWE
I never said it was easy, but certainly it doesn't make the ZR approach "useless".
I believe that the UZR approach is a great first step, and MGL (Mitchel L) has refined it to include the park effects (including for Fenway and Coors et al). It's just a matter of capturing all the other variables that we know about.
When a player makes a bunch of extra plays, we should probably assume that he turned a few hits into outs, but also he snared a few outs his teammates could have grabbed.
When I figure CAD, the total rating for the infielders summed together is about three times higher than a combined rating (evaluating the infielders as one unit), looking at the plus/minus score versus average. Outfielders are about 50% higher (I divide hits into groundball and flyball). Thus, when finding the run value, multiply the hit value (0.59+lg.runs/lg.bfp) by 1/3 for infielders and 2/3 for outfielders.
The above is a mathematical phenomenon. More to the point, we just are not that certain. Saving 30 runs more than an average fielder is a phenomenal total, almost certainly league-leading (for all positions), if not leading both leagues.
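A minimal sketch of the run-value scaling described in the comment above. The 1/3 and 2/3 factors follow from the 3x and 1.5x figures. The league totals in the usage line are hypothetical, and this covers only the conversion step, not CAD itself.

```python
def cad_run_value_per_play(league_runs: float, league_bfp: float,
                           infield: bool) -> float:
    """Run value of one play above average, per the scaling quoted above.

    Hit value = 0.59 + league runs / league batters facing pitcher, then
    multiplied by 1/3 for infielders or 2/3 for outfielders.
    """
    hit_value = 0.59 + league_runs / league_bfp
    return hit_value * (1 / 3 if infield else 2 / 3)


# Hypothetical league: 11,000 runs over 88,000 batters faced.
per_play = cad_run_value_per_play(11_000, 88_000, infield=False)
print(round(20 * per_play, 1))  # an outfielder 20 plays above average: 9.5
```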
Still, David's approach has merit, largely as a quick-and-dirty measure of fielding. I suppose this would be best:
* For infielders, divide assists per game by the team assists per game, and multiply by the league assists per game.
* For outfielders, divide putouts per game by the team putouts less strikeouts less assists per game, and multiply by the league putouts less strikeouts less assists per game. Same story for third basemen's putouts, if you really care.
* For putouts by first basemen, second basemen and shortstops, you start to run into problems. The biggest issue here is quick-and-dirty does not work well. I suppose Bill James's assumption of 84% of assists by infielders are fielded by first basemen works alright, and you could make similar estimates with third baseman and shortstop assists for second baseman putouts, and with second baseman assists and the Bill James figure of first baseman assists less pitcher putouts for shortstops. Figure as a percentage of team putouts less strikeouts less assists (keep assists for first basemen), adjust to league, add back league average dross.
* Multiply all range factors by league DER/team DER.
* Figure double plays as a percentage of assists times runners divided by batters facing pitcher, adjust to league. Figure outfield assists as a percentage of runners times putouts less strikeouts divided by batters facing pitcher, adjust to league. Figure catcher assists less passed balls as a percentage of runners less second basemen putouts, adjust to league and laugh at the result.
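The first two bullets and the DER step can be sketched as follows. This sketch covers only infielder assists and outfielder putouts. It leaves out the first base, second base, and shortstop putout estimates, and it also omits the double play and assist steps. All inputs are per game, and the function names and sample figures are hypothetical. The DER factor here is league over team, which is the reverse of the article's Adj1.

```python
def share_of_team(player_per_g: float, team_per_g: float,
                  league_per_g: float) -> float:
    """Player's share of his team's plays, restated at the league rate."""
    return player_per_g / team_per_g * league_per_g


def fly_outs_per_g(putouts: float, strikeouts: float, assists: float,
                   games: int) -> float:
    """Putouts less strikeouts less assists, per game."""
    return (putouts - strikeouts - assists) / games


def der_adjust(rating: float, team_der: float, league_der: float) -> float:
    """Multiply by league DER / team DER."""
    return rating * league_der / team_der


# Hypothetical infielder: 2.9 A/G on a team with 17.0 A/G; league 16.0 A/G.
inf = der_adjust(share_of_team(2.9, 17.0, 16.0), team_der=0.72, league_der=0.70)

# Hypothetical center fielder: 2.7 PO/G. Team and league-average team
# totals (PO, K, A) are made-up 162-game figures.
team_fly = fly_outs_per_g(4374, 1100, 1650, 162)
lg_fly = fly_outs_per_g(4374, 1000, 1720, 162)
cf = der_adjust(share_of_team(2.7, team_fly, lg_fly), team_der=0.72, league_der=0.70)
print(round(inf, 2), round(cf, 2))
```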
My biggest issue with all this is you are about two steps away from figuring CAD, Win Shares fielding or BP fielding, so I don't see the reason to stop (though at this point, the really nasty work begins). The biggest advantage is you have a range factor, and people inherently understand this.
I've said it before and I'll say it again. Mike Gimbel's system handles the park problem, because it compares each player against his opponents in his own stadium and on the road. It also takes much of the guesswork out of assigning run values to fielding, because it takes into account extra-base hits as well as plays made and not made.
Cheers,
Alan Shank
Because most of us *don't* have it.
What tends to be forgotten here is that the direction/distance data is only available if you are willing to pay for it. For many of us, that's not an option, so we're forced to work from what we have available to us for free.
As for Mike Gimbel's system, same issue: it doesn't do us any good if it's not freely available.
I am NOT suggesting, by the way, that any of that stuff *should* be freely available; I support the right of folks to make money from their intellectual property if they so choose. But they don't help those of us who want to do the best job we can do without spending significant money out of pocket.
-- MWE
The biggest bias with DA/UZR is inherent in the system from the beginning - the assertion that you can (statically) define a coverage expectation for any player.
Suppose a runner is on first base and breaks for second as the pitcher moves toward the plate. The shortstop moves over to cover. The coverage expectation for that shortstop has just changed! Or suppose the hitter looks like he's going to bunt, and you wheel the infielders around based on the presumption that the hitter will bunt. Guess what? You just changed the coverage expectation again. I saw this happen Wednesday night in the Jacksonville/Carolina Southern League playoff game - Suns' OF Jesus Feliciano hit a weak little flare off the fists that the shortstop would normally have put into his hip pocket, except that the Mudcats had a bunt play on, Ronnie Merrill was racing to cover second and by the time he got turned around it was too late.
The point is that *any* system with an underlying static coverage expectation assignment is biased from the beginning, and the biases can't be adjusted out of the system. If you want to use a zone-based approach, then penalize players equally when a ball gets through an area that either could reach within the range of locations at which he could normally be positioned. If a grounder goes through the SS hole, for example, charge it against both the SS and 3B. If a flare drops in short left, charge it against the SS, 3B, and LF. Or if (as happened last night in the Jacksonville/Carolina game) a popup drops between fielders 40 feet from the plate toward the mound, charge the 3B, 1B, and C with the resulting hit. Don't try to say, "well, the average SS fields 60% of balls hit here and the average 3B fields 40%, so we'll charge the SS with 60% of the hit and the 3B with 40%" - because then you're getting into the realm of assuming a certain static positioning which may not be true for that team at that moment under that circumstance.
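A minimal sketch of the rule "charge everyone who could have reached it". Only zone "56" (SS and 3B) comes from the comments above. The other zone labels and their fielder lists are hypothetical stand-ins for the short-left flare and the infield popup examples, not real Project Scoresheet/Retrosheet codes.

```python
from collections import Counter

# Fielders who could reach a ball in each zone from any normal positioning.
# Only "56" is taken from the text; the other labels are made up.
RESPONSIBLE = {
    "56": ("SS", "3B"),                      # grounder through the SS/3B hole
    "short_left_flare": ("SS", "3B", "LF"),
    "infield_popup": ("3B", "1B", "C"),
}


def charge_balls(balls):
    """balls: iterable of (zone, was_hit). Returns (chances, hits_charged).

    Every responsible fielder gets a chance for each ball in his zone, and
    each of them is charged the full hit if it falls in: no 60/40 split
    based on where fielders are "supposed" to stand.
    """
    chances, hits_charged = Counter(), Counter()
    for zone, was_hit in balls:
        for pos in RESPONSIBLE[zone]:
            chances[pos] += 1
            if was_hit:
                hits_charged[pos] += 1
    return chances, hits_charged


chances, hits = charge_balls([("56", True), ("56", False), ("short_left_flare", True)])
print(dict(hits))  # {'SS': 2, '3B': 2, 'LF': 1}
```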
-- MWE
Mike, good point on the static positioning. What this really means is that you've now added more variables: intent of runner and intent of batter. If you have a good data recording system, these are two more things that an analyst can adjust for.
I agree with Vinay. The best approach is from the ZR perspective, and the more variables you can account for, the clearer the picture you have. Maybe all these variables won't add up to a hill of beans, and maybe they'll add up to a hill.
I agree, not having pbp data is a problem.
Not entirely. If you improve the accuracy for 50 players a little bit, and decrease the accuracy for 10 players a lot, you don't necessarily have a more accurate *system*, even though you've improved the accuracy for more *players* than not. The magnitude of the individual errors is what matters, because we're trying to devise a system that captures each individual player's value as accurately as possible.
There's an old joke about a guy who sticks one foot into a pail of extremely hot water and the other foot into a bucket of ice. *On average*, he won't be uncomfortable - but he's likely to be dealing with one burned foot and one frostbitten foot. So it is with most methods that try to measure individual performance against the combined performance of the group - those individuals who perform in contexts that vary *most* from the group norm will be unfairly evaluated, and the errors in their performance will usually be larger than the increase in accuracy that you get from those who are *closer* to the group norm.
This is also a problem with offensive methods (like the RC approach that James takes in Win Shares) that *force* offensive performance to add to the team totals. The best players and the worst players will see their performance values distorted in ways that likely increase the distance between their actual value to the team and the value that the model projects.
-- MWE