Baseball for the Thinking Fan

Primate Studies
— Where BTF's Members Investigate the Grand Old Game

Tuesday, September 03, 2002

Adjusted Range Factor (Or Why Darin Erstad Deserves Serious Consideration as AL MVP)

David Vaillette proposes a *new* defensive stat.

I was reading something by Rob Neyer the other day about the proof that good defense, and not pitching, is what keeps balls in play (base hits that are not home runs and outs that are not strikeouts) from becoming hits. That is, there is proof that preventing hits on balls in play is not a skill that stays constant from year to year for pitchers. (The exception is apparently knuckleballers.) I have also read that a good stat for total team defense is outs/balls in play:

Outs = IP*3-K

Balls in Play = H-HR+IP*3-K

Team Defensive Rating = (IP*3-K)/(H-HR+IP*3-K)
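As a minimal sketch, the three formulas above can be written out in Python. The function names are mine, the team line is made up for illustration, and IP is assumed to be a true decimal number of innings (not the box-score ".1/.2" thirds notation). The denominator uses the full Balls in Play definition, H-HR+IP*3-K.

```python
def outs_in_play(ip, k):
    """Outs recorded on balls in play: every out except strikeouts."""
    return ip * 3 - k


def balls_in_play(h, hr, ip, k):
    """Non-homer hits plus outs on balls in play."""
    return h - hr + outs_in_play(ip, k)


def team_defensive_rating(h, hr, ip, k):
    """Share of balls in play that the defense turned into outs."""
    return outs_in_play(ip, k) / balls_in_play(h, hr, ip, k)


# Hypothetical team line (not real data): 900 H, 100 HR, 1000 IP, 600 K
rating = team_defensive_rating(h=900, hr=100, ip=1000.0, k=600)
print(round(rating, 3))  # 0.75
```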

It also appears to me that Zone Rating is the single most useless stat ever   computed for baseball.

What I think is a good stat is Range Factor, but it's kind of the batting average of fielding: the most useful single stat (OPS is a composite stat, and RC is even more of a composite stat) but still potentially misleading. For instance, do Darin Erstad's and Wendell Magee's current (7/15/2001) RFs of 3.71 and 3.36 mean they are both great center fielders? My answer: Erstad yes, Magee no. Read on.

(Note the following are from team stats I compiled on or about July 6th.)

How are RFs different from team to team? What will the RFs of the Diamondbacks, who have to perform behind Randy Johnson and Curt Schilling, look like? Obviously they won't look as good. What about outfielders and infielders who play behind Derek Lowe? The infielders will have their stats padded, and the outfielders' stats will look worse. And what if you are a player on the Atlanta Braves, with the best defense in the league? On the season they make 12 outs for every 15 balls in play, while an average team makes 12 outs for every 16 balls in play. They have essentially robbed themselves of 1 chance to make a play due to their efficiency.

The answer, obviously, is to adjust.

The first adjustment is to compare a team's Defensive Efficiency (DE) to the league average. For instance, Atlanta's DE was .807 and the league average was .769, so each member of Atlanta has their RF multiplied by .807/.769 (=1.049). Detroit has each member of their team multiply their RF by .736/.769 (=.959).
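A small sketch of this first adjustment, using the Atlanta figures quoted above. The function name and the constant name are my own labels, not anything from the original spreadsheet.

```python
def de_adjustment(team_de, league_de):
    """First adjustment: team Defensive Efficiency relative to league."""
    return team_de / league_de


LEAGUE_DE = 0.769  # league average quoted in the article

atlanta_adj = de_adjustment(0.807, LEAGUE_DE)
print(f"{atlanta_adj:.3f}")  # 1.049

# An Atlanta fielder with a raw RF of 3.00 (hypothetical value)
# would carry 3.00 * atlanta_adj into the next step.
print(f"{3.00 * atlanta_adj:.2f}")  # 3.15
```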

The second adjustment is to compute the average in-play outs per inning for each team, that is, to take out the strikeouts. For instance Detroit, whose pitchers don't strike anybody out, has 2.416 in-play outs per inning pitched, and Arizona, who leads the league in Ks, has an average of 2.079. The league average is 2.250, so for Detroit the adjustment would be 2.250/2.416 (=.93) and for Arizona the adjustment would be 2.250/2.079 (=1.082).
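Here is a short sketch of the strikeout adjustment with the Detroit and Arizona rates from the paragraph above. The function names are assumptions of mine, and the IP/K pair used to show how a rate is derived is invented so that it lands on Detroit's 2.416.

```python
def in_play_outs_per_inning(ip, k):
    """Outs per inning once strikeouts are removed."""
    return (ip * 3 - k) / ip


def strikeout_adjustment(team_rate, league_rate):
    """Second adjustment: league in-play outs per inning over team's."""
    return league_rate / team_rate


LEAGUE_RATE = 2.250  # league average quoted in the article

print(round(strikeout_adjustment(2.416, LEAGUE_RATE), 3))  # Detroit: 0.931
print(round(strikeout_adjustment(2.079, LEAGUE_RATE), 3))  # Arizona: 1.082

# Hypothetical staff line: 1000 IP and 584 K gives a 2.416 rate
print(in_play_outs_per_inning(1000.0, 584))  # 2.416
```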

The third and final adjustment I make is an estimate of ground balls to fly balls. The estimate I use is a little bit lazy. One could and should figure out the number of POs by outfielders, but instead I figure out a ratio of Assists/Outs in play. The team with the most Assists/Outs in play is the Montreal Expos, with a ratio of .570. The team with the fewest is the Minnesota Twins, with a ratio of .459. The league average is .512. Now, the reason more outs are recorded by Minnesota outfielders than by any other team's could be that they have fly ball pitchers on their team, or it could be that they have good outfielders and bad infielders. I take a midpoint between these two views and compute an RF adjustment of (.459/.512+1)/2 (=.948) for the Twins' outfielders and (.512/.459+1)/2 (=1.0578) for their infielders.
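The midpoint idea is easy to lose in the arithmetic, so here is a sketch of it with the Twins' numbers from above. The function name is mine; the "+1, divide by 2" step is just averaging the raw ratio with 1.0 (no adjustment at all).

```python
def gb_fb_adjustments(team_ratio, league_ratio):
    """Third adjustment, from team and league Assists/Outs-in-play.

    Returns (outfield_adj, infield_adj). Each is the midpoint between
    the raw ratio (all pitching) and 1.0 (all fielder quality).
    """
    outfield_adj = (team_ratio / league_ratio + 1) / 2
    infield_adj = (league_ratio / team_ratio + 1) / 2
    return outfield_adj, infield_adj


# Minnesota: .459 Assists/Outs in play, league average .512
of_adj, if_adj = gb_fb_adjustments(0.459, 0.512)
print(round(of_adj, 3))  # 0.948
print(round(if_adj, 3))  # 1.058
```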

My Adjusted RF subtotal is RF*Adj1*Adj2*Adj3. Then I subtract a player's error rate per 9 innings for my final "Adjusted RF". In my numbers there are a couple of other things I haven't told you about above. I have subtracted out catcher defensive numbers (mostly just caught stealing). There are some other adjustments that could be made, and others that could be made better. (Should there be a park adjustment? A left/right adjustment? Should there be an estimation of infield flies (Team Assists - Outfield Assists - Catcher Assists - 1B PO)?) This is what I am currently going with. And, as Bill James would expect from what I have read, and you might not, the results are not what you would expect.
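Putting the pieces together, a rough sketch of the final number might look like this. The function and parameter names are mine, the sample player is invented, and the catcher and other tweaks mentioned above are left out.

```python
def adjusted_rf(rf, adj1, adj2, adj3, errors, innings):
    """RF*Adj1*Adj2*Adj3, minus the player's errors per 9 innings."""
    subtotal = rf * adj1 * adj2 * adj3
    errors_per_9 = errors / innings * 9
    return subtotal - errors_per_9


# Hypothetical center fielder: raw RF 3.00, 2 errors in 900 innings,
# on a team with adjustments of 1.05, 0.95 and 1.02.
value = adjusted_rf(3.00, 1.05, 0.95, 1.02, errors=2, innings=900.0)
print(round(value, 2))  # 3.03
```

With all three adjustments at 1.0 the result is just raw RF minus errors per 9 innings, which is a handy sanity check.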

The 2002 All Defensive team so far:

AL

C  Einar Diaz

1B Mike Sweeney

2B Adam Kennedy

SS Mike Bordick

3B Robin Ventura

3B Corey Koskie (due to the imperfections I mention above, I have to consider this a tie)

LF Jacque Jones

CF Darin Erstad

RF Matt Lawton

NL

C  Damian Miller

1B Eric Karros

2B Jeff Kent

3B Craig Counsell

SS Jose Hernandez

LF Eric Owens (!?!?)

CF Jim Edmonds

CF Juan Encarnacion (tie)

RF Richard Hidalgo

AL Defensive MVP: Darin Erstad or Mike Bordick

NL Defensive MVP: Craig Counsell or Jose Hernandez.

Note 1: For 1B I use A + DP/2 for a Range Factor rating. For catchers I use a different rating altogether.

Note 2: One problem with this system is its inability to rate an outfielder's arm. For instance, outfield assists are usually considered a good rating of an outfielder's arm, but they are really a rating of an outfielder's reputation. Nobody runs on Mondesi anymore, so he doesn't get assists anymore. Ivan Rodriguez is among the last in the league in catcher assists because people have learned not to run on him either.

So, using this tool, Darin Erstad has an adjusted RF of 3.56. The team he plays on seems to have some fly ball pitchers (or is it just because he is so good?), but they are the 3rd best defensive team in the league. Meanwhile Magee plays for the worst defensive team in the league, and the worst strikeout team in the league. He is having his RF padded, and his adjusted RF is 2.88. This is still well above the league average of 2.65, but it is not insanely high like Erstad's is. In my estimation Erstad takes 1.16 hits per game away from the other team. If you are watching him every day, he will get to a ball that the average center fielder will not get to once a game, and to one that no other major league center fielder will get to every other game. He has probably saved his pitching staff 46 runs during the year. Offensively he is only above average, but defensively he is in a class by himself. Alex Rodriguez has been good, but he is only a slightly above average SS (4.40 vs. league avg of 4.35). Mike Sweeney has been incredible offensively and defensively, but Erstad is playing for a contender, and Sweeney is not.

Last note: How the hell does a 37-year-old (Mike Bordick) play such good defense at SS?

 

David Vaillette Posted: September 03, 2002 at 06:00 AM | 13 comment(s)

Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   1. Mike Emeigh Posted: September 03, 2002 at 12:44 AM (#606083)
Interesting analysis. A couple of comments:

1. Infielder assists should be adjusted for DPs. This affects both the individual fielder estimates and the assists/outs in play estimate, which explains (to some extent) why the Expos rank so high and the Twins so low.

2. DPs at 1B should be adjusted for the number of runners on first base, otherwise you reward a player for playing behind a lousy pitching staff (see below).

3. Mike Sweeney looks good because (a) he almost never records an unassisted putout at 1B, getting lots of 3-1 assists on balls that other 1Bs take themselves, and (b) he gets extra credit for KC's lousy pitching staff because of the use of DPs in his RF. Bill James noted the first of these effects, and makes an appropriate adjustment for it in the Win Shares system. You can't ignore unassisted putouts when evaluating 1Bs.

I haven't run any of the numbers through the more complicated system that Charlie Saeger and I use, but I'd be stunned if Erstad's impact is anything like 46 runs. My feeling is that the simple ratios proposed here lead to adjustment factors that are *far* too large, overstating the real differences between players.

-- MWE
   2. Tom Austin Posted: September 03, 2002 at 12:44 AM (#606088)
On the season they make 12 outs for every 15 balls in play, while an average team makes 12 outs for every 16 balls in play. They have essentially robbed themselves of 1 chance to make a play due to their efficiency.
>>>>>

I don't think so. Hold the variables equal. The rest of the league makes 12 plays in 16 chances, while the Braves will make 12.8 plays in 16 chances. They have essentially turned 0.8 of a hit into an out because of their efficiency.
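The arithmetic behind this objection can be checked in a few lines. This is only a worked version of the numbers in the comment; the variable names are mine, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

braves_rate = Fraction(12, 15)  # outs per ball in play, Braves
league_rate = Fraction(12, 16)  # outs per ball in play, average team
chances = 16                    # hold the number of balls in play equal

braves_outs = braves_rate * chances
league_outs = league_rate * chances
extra_outs = braves_outs - league_outs

print(float(braves_outs))  # 12.8
print(float(league_outs))  # 12.0
print(float(extra_outs))   # 0.8
```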

If you're adjusting RF upward because of a player coming from an efficient defensive team, you're double-counting somewhere.

And I know this is counter to the stathead "you can't judge defense by watching players", but as soon as I saw Jeff Kent as the NL's best defensive second baseman, my "sanity check" button started glowing red.


   3. tangotiger Posted: September 03, 2002 at 12:44 AM (#606098)
I agree with Vinay.

Yes, ZR has its bias. What doesn't? What does OBA measure? % of times reached base. So, Andre Dawson doesn't do so good. Does that mean he's not a good hitter?

What does ZR measure? The % of outs per ball in "zone". It's not the "best fielder stat". So, some players don't do so good because they have lots more balls hit in the "outlying" area of the zone (i.e., not all opps are equal).

These things can be adjusted for. If the field, the pitchers, the batters, and teammates all have some effect on a player's ZR, then look for those reasons, and adjust them accordingly.
   4. Charles Saeger Posted: September 04, 2002 at 12:45 AM (#606101)
A few other notes ... rewriting the article from scratch, since I rewrote a large portion of the system and I was getting questions about how to work things in detail.

* First basemen on bad teams are more likely to make the 3-1 flip rather than take the play unassisted. I couldn't tell you why this is either.

* Left/Right numbers are only really important for first and third basemen.

* You always need to adjust for runners on base with double plays and outfield assists.
   5. Mike Emeigh Posted: September 04, 2002 at 12:45 AM (#606102)
What does ZR measure? The % of outs per ball in "zone". It's not the "best fielder stat". So, some players don't do so good because they have lots more balls hit in the "outlying" area of the zone (i.e., not all opps are equal).

That is not the only reason why players don't perform well in ZR. Fenway Park left fielders can't perform well in ZR, because they can't play deep enough to cover all of the zones assigned to a left fielder thanks to the Monster. Neither can Houston left fielders or Baltimore right fielders, for the same reason.

The problem is that you don't know why a player's ZR is low, and because you don't know why a player's ZR is low you can't use it to evaluate his fielding skill. It's like using unadjusted OPS to evaluate a hitter's skill. If you do that, you're going to conclude that Colorado has a lot of great hitters.

These things can be adjusted for. If the field, the pitchers, the batters, and teammates all have some effect on a player's ZR, then look for those reasons, and adjust them accordingly.

These adjustments are hardly simple. The area of the field that a fielder can normally cover is highly situation dependent, with all of the factors above plus the baserunner situation, the time in the game (early/late innings), and the game score affecting the fielder's ability to cover certain sections of the field. As an example - middle infielders are positioned at normal depth with no runner on 1B, play at double-play depth (which is a few steps in closer and toward the middle to allow them to shorten the distance they have to go to get to 2B) with a runner on first and fewer than two outs, and often play in with a runner on 3B and fewer than two outs. Those different positions by themselves change the area they are likely to be able to cover.

Personally, I think we ought not to worry about making fine gradations. Just take the Project Scoresheet/Retrosheet ball location diagram, and use the zone definitions there as an upper bound on opportunities. For example, ground balls hit into the area marked "56" are the responsibility of both the SS and the 3B. If the ball gets through for a hit, both players are penalized, period - no effort or judgment needed to figure out which of those fielders should have made the play, or whether the ball was in fact *fieldable*. (This is, more or less, the Defensive Average approach.) Players shouldn't necessarily be excused for *degree of difficulty* - if the ball is going there frequently against them, that suggests that the defensive positioning needs to be changed to make those plays easier.

I'll have more to say on this subject in a later article.

-- MWE
   6. tangotiger Posted: September 04, 2002 at 12:45 AM (#606104)
Mike, I agree that the 24 base-out states and the inning/score also play a role in how many balls are converted into outs.

I never said it was easy, but certainly it doesn't make the ZR approach "useless".

I believe that the UZR approach is a great first step, and MGL (Mitchel L) has refined it to include the park effects (including for Fenway and Coors et al). It's just a matter of capturing all the other variables that we know about.
   7. Charles Saeger Posted: September 04, 2002 at 12:45 AM (#606111)
The reason the runs total is so high is because the author is too certain of its value.

When a player makes a bunch of extra plays, we should probably assume that he turned a few hits into outs, but also he snared a few outs his teammates could have grabbed.

When I figure CAD, the total rating for the infielders summed together is about three times higher than a combined rating (evaluating the infielders as one unit), looking at the plus/minus score versus average. Outfielders are about 50% higher (I divide hits into groundball and flyball). Thus, when finding the run value, multiply the hit value (0.59+lg.runs/lg.bfp) by 1/3 for infielders and 2/3 for outfielders.
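As a rough sketch of that last sentence: the 0.59 constant and the 1/3 and 2/3 multipliers come from the comment above, while the function name and the league totals are hypothetical placeholders chosen only to show the calculation.

```python
def run_value_per_extra_play(lg_runs, lg_bfp, group):
    """Discounted run value of one play above average.

    group is "infield" (hit value times 1/3) or "outfield" (times 2/3).
    """
    hit_value = 0.59 + lg_runs / lg_bfp
    scale = {"infield": 1 / 3, "outfield": 2 / 3}[group]
    return hit_value * scale


# Hypothetical league totals: 11,000 runs, 88,000 batters faced
inf_value = run_value_per_extra_play(11000, 88000, "infield")
of_value = run_value_per_extra_play(11000, 88000, "outfield")
print(round(inf_value, 3), round(of_value, 3))
```

Under these made-up totals an outfielder's extra play is worth exactly twice an infielder's, which follows directly from the 2/3 versus 1/3 scaling.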

The above is a mathematical phenomenon. More to the point, we just are not that certain. Saving 30 runs more than an average fielder is a phenomenal total, almost certainly league-leading (for all positions), if not leading both leagues.

Still, David's approach has merit, largely as a quick-and-dirty measure of fielding. I suppose this would be best:

* For infielders, divide assists per game by the team assists per game, and multiply by the league assists per game.
* For outfielders, divide putouts per game by the team putouts less strikeouts less assists per game, and multiply by the league putouts less strikeouts less assists per game. Same story for third basemen's putouts, if you really care.
* For putouts by first basemen, second basemen and shortstops, you start to run into problems. The biggest issue here is quick-and-dirty does not work well. I suppose Bill James's assumption of 84% of assists by infielders are fielded by first basemen works alright, and you could make similar estimates with third baseman and shortstop assists for second baseman putouts, and with second baseman assists and the Bill James figure of first baseman assists less pitcher putouts for shortstops. Figure as a percentage of team putouts less strikeouts less assists (keep assists for first basemen), adjust to league, add back league average dross.
* Multiply all range factors by league DER/team DER.
* Figure double plays as a percentage of assists times runners divided by batters facing pitcher, adjust to league. Figure outfield assists as a percentage of runners times putouts less strikeouts divided by batters facing pitcher, adjust to league. Figure catcher assists less passed balls as a percentage of runners less second basemen putouts, adjust to league and laugh at the result.
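For what it's worth, the first and fourth bullets above (infielder assists and the DER scaling) can be sketched as follows. The function names and every input number are hypothetical; the putout and double play steps are not attempted here.

```python
def normalized_assists(player_a_per_g, team_a_per_g, league_a_per_g):
    """Player's share of team assists per game, restated at league level."""
    return player_a_per_g / team_a_per_g * league_a_per_g


def der_scaled(range_factor, league_der, team_der):
    """Multiply a range factor by league DER / team DER."""
    return range_factor * league_der / team_der


# Hypothetical shortstop: 3.0 A/G on a team with 12.0 A/G,
# in a league averaging 11.0 A/G, with team DER .800 and league DER .750.
base = normalized_assists(3.0, 12.0, 11.0)
print(base)                            # 2.75
print(der_scaled(base, 0.750, 0.800))  # 2.578125
```

Note the direction of the DER step: it scales by league over team, the reverse of the article's first adjustment, so a fielder on an efficient team is marked down rather than up.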

My biggest issue with all this is you are about two steps away from figuring CAD, Win Shares fielding or BP fielding, so I don't see the reason to stop (though at this point, the really nasty work begins). The biggest advantage is you have a range factor, and people inherently understand this.
   8. Alan Shank Posted: September 05, 2002 at 12:45 AM (#606127)
It seems to me that all this adjusting of Range Factor is a lot of guesswork. We have the direction/distance data; why do we have to guess about all this left/right pitching, strikeout pitching, etc.?

I've said it before and I'll say it again. Mike Gimbel's system handles the park problem, because it compares each player against his opponents in his own stadium and on the road. It also takes much of the guesswork about assigning run values to fielding, because he takes into account extra-base hits as well as plays made/not made.

Cheers,
Alan Shank
   9. Mike Emeigh Posted: September 06, 2002 at 12:45 AM (#606135)
It seems to me that all this adjusting of Range Factor is a lot of guesswork. We have the direction/distance data; why do we have to guess about all this left/right pitching, strikeout pitching, etc.?

Because most of us *don't* have it.

What tends to be forgotten here is that the direction/distance data is only available if you are willing to pay for it. For many of us, that's not an option - so we're forced to work from what we have available to us for free.

I've said it before and I'll say it again. Mike Gimbel's system handles the park problem, because it compares each player against his opponents in his own stadium and on the road. It also takes much of the guesswork about assigning run values to fielding, because he takes into account extra-base hits as well as plays made/not made.

Same issue - it doesn't do us any good if it's not freely available.

I am NOT suggesting, by the way, that any of that stuff *should* be freely available; I support the right of folks to make money from their intellectual property if they so choose. But they don't help those of us who want to do the best job we can do without spending significant money out of pocket.

-- MWE
   10. Mike Emeigh Posted: September 06, 2002 at 12:45 AM (#606136)
I still feel that starting with a DA/UZR approach and adjusting for biases is better than starting with a RF and adjusting.

The biggest bias with DA/UZR is inherent in the system from the beginning - the assertion that you can (statically) define a coverage expectation for any player.

Suppose a runner is on first base and breaks for second as the pitcher moves toward the plate. The shortstop moves over to cover. The coverage expectation for that shortstop has just changed! Or suppose the hitter looks like he's going to bunt, and you wheel the infielders around based on the presumption that the hitter will bunt. Guess what? You just changed the coverage expectation again. I saw this happen Wednesday night in the Jacksonville/Carolina Southern League playoff game - Suns' OF Jesus Feliciano hit a weak little flare off the fists that the shortstop would normally have put into his hip pocket, except that the Mudcats had a bunt play on, Ronnie Merrill was racing to cover second and by the time he got turned around it was too late.

The point is that *any* system with an underlying static coverage expectation assignment is biased from the beginning, and the biases can't be adjusted out of the system. If you want to use a zone-based approach, then penalize players equally when a ball gets through an area that either could reach within the range of locations at which he could normally be positioned. If a grounder goes through the SS hole, for example, charge it against both the SS and 3B. If a flare drops in short left, charge it against the SS, 3B, and LF. Or if (as happened last night in the Jacksonville/Carolina game) a popup drops between fielders 40 feet from the plate toward the mound, charge the 3B, 1B, and C with the resulting hit. Don't try to say, "well, the average SS fields 60% of balls hit here and the average 3B fields 40%, so we'll charge the SS with 60% of the hit and the 3B with 40%" - because then you're getting into the realm of assuming a certain static positioning which may not be true for that team at that moment under that circumstance.
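A minimal sketch of the "charge everyone who could have reached it" rule described above. The "56" zone and the fielder groupings come from the examples in these comments; the other two zone labels and the sample list of hits are made up for illustration and are not Project Scoresheet/Retrosheet codes.

```python
from collections import Counter

# Zone -> every fielder who could normally be positioned to reach it
ZONE_FIELDERS = {
    "56": ("SS", "3B"),                   # grounder through the SS hole
    "short_left": ("SS", "3B", "LF"),     # hypothetical label: flare
    "front_of_mound": ("3B", "1B", "C"),  # hypothetical label: popup
}


def charge_hits(hit_zones):
    """Charge one full hit to each responsible fielder, no 60/40 splits."""
    charged = Counter()
    for zone in hit_zones:
        for fielder in ZONE_FIELDERS[zone]:
            charged[fielder] += 1
    return charged


# Hypothetical game log of hits by zone
charges = charge_hits(["56", "56", "short_left", "front_of_mound"])
print(dict(charges))
```

Because each hit is charged in full to every listed fielder, the charges add up to more than the number of hits. That is the point: the total is an upper bound on opportunities, not a division of blame.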

-- MWE
   11. tangotiger Posted: September 07, 2002 at 12:45 AM (#606141)
Suppose a runner is on first base and breaks for second as the pitcher moves toward the plate. The shortstop moves over to cover. The coverage expectation for that shortstop has just changed! Or suppose the hitter looks like he's going to bunt, and you wheel the infielders around based on the presumption that the hitter will bunt. Guess what? You just changed the coverage expectation again.

Mike, good point on the static positioning. What this really means is that you've now added more variables: intent of runner and intent of batter. If you have a good data recording system, these are two more things that an analyst can adjust for.

I agree with Vinay. The best approach is from the ZR perspective, and the more variables you can account for, the clearer the picture you have. Maybe all these variables won't add up to a hill of beans, and maybe they'll add up to a hill.

I agree, not having pbp data is a problem.
   12. Mike Emeigh Posted: September 09, 2002 at 12:45 AM (#606155)
I guess I don't really understand Mike's point. Charging the SS for 60% of the hit and the 3B 40% is not supposed to be taken to mean that this breakdown is correct for every qualifying ball in play, nor does it mean that the 60/40 improves the rating for every player season. It simply means that the cases of improved accuracy outweigh the cases of decreased accuracy, and that, overall, we have a more accurate result. No?

Not entirely. If you improve the accuracy for 50 players a little bit, and decrease the accuracy for 10 players a lot, you don't necessarily have a more accurate *system*, even though you've improved the accuracy for more *players* than not. The magnitude of the individual errors is what matters, because we're trying to devise a system that captures each individual player's value as accurately as possible.
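To make the 50-versus-10 point concrete, here is a toy calculation. Every number in it is invented: it assumes each of 60 players starts with a 2-run rating error, 50 of them improve to 1 run, and 10 get worse at 8 runs. Root-mean-square error is my choice of yardstick, since it weights big misses more heavily.

```python
import math


def rmse(errors):
    """Root-mean-square of a list of per-player rating errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical per-player errors, in runs
old_errors = [2.0] * 60
new_errors = [1.0] * 50 + [8.0] * 10

improved = sum(n < o for n, o in zip(new_errors, old_errors))
worsened = sum(n > o for n, o in zip(new_errors, old_errors))

print(improved, worsened)           # 50 10
print(round(rmse(old_errors), 2))   # 2.0
print(round(rmse(new_errors), 2))   # 3.39
```

More players got better than got worse, yet the system as a whole is less accurate by this measure.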

There's an old joke about a guy who sticks one foot into a pail of extremely hot water and the other foot into a bucket of ice. *On average*, he won't be uncomfortable - but he's likely to be dealing with one burned foot and one frostbitten foot. So it is with most methods that try to measure individual performance against the combined performance of the group - those individuals who perform in contexts that vary *most* from the group norm will be unfairly evaluated, and the errors in their performance will usually be larger than the increase in accuracy that you get from those who are *closer* to the group norm.

This is also a problem with offensive methods (like the RC approach that James takes in Win Shares) that *force* offensive performance to add to the team totals. The best players and the worst players will see their performance values distorted in ways that likely increase the distance between their actual value to the team and the value that the model projects.

-- MWE
   13. Charles Saeger Posted: September 10, 2002 at 12:45 AM (#606157)
Just to clarify a little something ... Win Shares cannot work properly without this reconciliation step. When adapting XRuns or Base Runs or MLV or whatever for Win Shares, you need to make sure you do this. (Yeah, I know, this is already built into Base Runs.)
