— Where BTF's Members Investigate the Grand Old Game
Friday, March 14, 2003
Ultimate Zone Rating (UZR), Part 1
Everything you wanted to know about Ultimate Zone Rating.
This article will describe the basic methodology of the newly revised fielding system used in Super-LWTS.
Background: RF, ZR, UZR
Bill James introduced us to Range Factor (RF) as, essentially, the number of outs made per game. This was convenient at the time, because we had no other context but the game. The problem is that a game is not necessarily nine innings for each fielder. As well, each fielder is dependent on his pitching staff and "luck" for opportunities.
STATS began tracking Zone Rating (ZR) as, essentially, the total number of outs per balls in a fielder's "area of responsibility" (i.e., zone). This addressed some of the shortcomings of RF. However, STATS ZR has many shortcomings of its own. For example, each fielder is only given one zone. We know that it's much easier to convert a ball in play into an out if the ball is hit near you, rather than on the fringes of the zone.
The following table shows the RF and STATS ZR for all 2002 SS in the NL and AL:
STATS expanded on ZR by creating sub-zones. You can take the average out-conversion rates by sub-zone and apply this rate to the number of balls in play for each fielder for each sub-zone to establish a baseline. This baseline will show the number of outs an average fielder would have had, had he received the same number of balls in play for each sub-zone that our specific fielder received. This is, essentially, UZR, or Ultimate Zone Rating. There are other contexts that have not been considered, which we will get to later on.
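As a rough illustration of the baseline idea, here is a minimal Python sketch. The function name, the zone labels used as dictionary keys, and every number except the .57 out rate for zone 56 are hypothetical, chosen only to show the shape of the calculation.

```python
def baseline_outs(league_out_rate, balls_in_play):
    """Outs an average fielder would record on the same mix of balls in play."""
    return sum(league_out_rate[zone] * n for zone, n in balls_in_play.items())

# Hypothetical inputs: only the .57 rate for zone 56 comes from the article.
league_out_rate = {"56": 0.57, "6": 0.90}   # league out-conversion rate per sub-zone
balls_in_play = {"56": 40, "6": 100}        # balls hit into each sub-zone for our fielder
actual_outs = 115                           # outs our fielder actually recorded

expected = baseline_outs(league_out_rate, balls_in_play)
outs_above_average = actual_outs - expected
print(round(expected, 1), round(outs_above_average, 1))
```

A positive difference means the fielder converted more balls into outs than an average fielder would have, given the same distribution of balls by sub-zone.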
The following is the methodology followed in calculating my basic version of UZR (MGL's basic UZR).
Note: Before I begin explaining the UZR system, keep in mind that there are at least two components of defense that UZR does not address: one, an outfielder's "arm", and two, an infielder's skill at turning the double play. Of course, these skills can be measured (and they are in my Super-LWTS system). They are just not included in UZR. Like ZR, UZR is designed to measure and quantify only that skill which enables a fielder to turn batted balls into outs.
UZR rate is expressed as a fraction of 1, the same as a simple ZR. A UZR rate means essentially the same thing as a simple ZR, namely the number of balls fielded (turned into at least one out) divided by the number of chances; however, UZR rate is a weighted average of a player's ZR in each of several zones.
As you will see, UZR rate is really a by-product of UZR runs, and UZR runs is the heart of the UZR system. It represents the value of a fielder's performance expressed as runs saved or cost, in comparison to an average fielder (actually in comparison to the mean performance of all fielders) at that position, in that player's league, and during that particular year. UZR runs is the defensive counterpart of Palmer's offensive linear weights (lwts); thus it can be combined with lwts (among other things) to give you an estimate of a player's total offensive and defensive value. Any player with an average defensive performance will, by definition, have exactly zero UZR runs.
The entire field is broken down into 78 zones. These are the same zones you can find in the hit location diagram in the documentation section of the Retrosheet website (www.retrosheet.org). Of these, UZR uses 64. For infielders, only ground balls, including bunts, are looked at. Pop flies caught or landing in an infield zone are excluded, as are line drives caught by an infielder or hit through the infield. For outfielders, all fair fly balls and line drives are included. None of the foul zones are used in UZR except for 3F and 3L, which are near the first and third base bags (for fair ground balls fielded in foul territory behind the bags). Catchers and pitchers are not included in UZR ratings.
For each zone, the computer keeps track of the following on a league-wide (for a particular year) basis:
At the same time, the computer keeps track of the total number of fielding errors for each fielding position, but not for each zone individually. Actually, it compiles fielding errors in two separate categories. ROE errors are fielding errors that result in the batter reaching on error (ROE). All other errors, such as an error on a hit or a second error on an ROE, are called non-ROE errors.
For example, here is the 2002 league-wide data for zone 56 (the area between the third baseman and the SS):
For each player at each fielding position (e.g. Rey Sanchez at SS is one entity and Rey Sanchez at 2B is another entity), and for each zone, the computer also compiles the following information:
ROE and non-ROE fielding errors are compiled separately for each player, but again, not by individual zones.
For example, the 2002 data in zone 56 for Mike Bordick, while playing SS, looks like this:
Remember that these numbers represent ground ball outs in zone 56 recorded by Bordick while playing SS and all ground ball hits in that zone while Bordick was playing SS.
Now, here comes the tricky part.
How a Player's UZR Runs Is Calculated in Each Zone
For now, all ROE errors are treated as outs (this is necessary because, at first, we use outs and ROEs to establish the number of balls that a fielder gets to and because some ROE errors are committed by the receiver of a throw, in which case the fielder needs to get credit for an out). In the above data, "outs" actually refers to "outs plus ROE errors". Errors (ROE and non-ROE) will be accounted for later on.
Let's use the data to calculate Mike Bordick's UZR runs in zone 56. First we establish the out rate for all ground balls hit into zone 56. That is 1419 divided by 2474 (1419 plus 1055), or .57. That is, 57% of all ground balls hit into zone 56 in 2002 were turned into outs (by all fielders).
Therefore, the "extra" value of a "caught ball" by a fielder in zone 56 is 1 minus .57, or .43 balls. Since Bordick caught 18 balls in zone 56, he has 18 times .43, or 7.7 "extra" caught balls so far.
Now what about the hits? There were 79 hits in zone 56 while Bordick was playing SS. Surely he is not responsible for all of those hits. How many is he responsible for? Well, since shortstops as a group recorded 294 of the 1419 outs in zone 56, or 20.7% of the outs, Bordick is responsible for 20.7% of the 79 hits as well, or 16.4 hits (the third baseman is responsible for the other 62.6 hits). I told you it was going to be tricky!
Now, just like the "extra" positive value of a "caught ball" is 1 minus .57, the "extra" negative value of a hit is the .57 itself (an average ball hit into zone 56 gets caught 57% of the time, so when a ball isn't caught, the responsible fielders, in this case the SS and third baseman, get "docked" .57 balls). Since Bordick is responsible for 16.4 of the 79 hits in zone 56, he has 16.4 times .57, or 9.4 "negative" caught balls set against his 7.7 "positive" ones, for a total of -1.7 "extra" caught balls. In other words, given the number of balls hit into zone 56 while Bordick was at SS, he caught 1.7 fewer balls than the average SS in the AL in 2002.
Now we want to convert those "extra" balls into runs saved or cost. For that, we use the average run value of a hit in zone 56, which is .47 runs. Since a 2002 AL out is worth -.29 runs, the "swing" between an out and a hit is .47 plus .29, or .76 runs. Since Bordick caught 1.7 fewer balls in zone 56 than an average SS, he has cost his team 1.7 times .76, or 1.3 runs so far (i.e., his UZR runs in zone 56 is -1.3).
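The zone 56 arithmetic can be retraced in a few lines of Python. This is only a sketch of the steps described above, not the actual Super-LWTS code; the function and parameter names are invented for illustration, while the inputs are the figures quoted in the text.

```python
def zone_uzr_runs(player_outs, hits_while_playing, position_outs,
                  league_outs, league_hits, hit_run_value, out_run_value):
    """UZR runs for one fielder in one zone (ROE errors counted as outs)."""
    # League-wide out rate for balls hit into the zone.
    out_rate = league_outs / (league_outs + league_hits)
    # The position's share of all outs in the zone sets the fielder's share of hits.
    share = position_outs / league_outs
    responsible_hits = share * hits_while_playing
    # Credit (1 - out_rate) per ball caught, debit out_rate per hit charged.
    extra_balls = player_outs * (1 - out_rate) - responsible_hits * out_rate
    # Run "swing" between a hit and an out (the out value is negative).
    swing = hit_run_value - out_run_value
    return extra_balls, extra_balls * swing

# Bordick at SS, zone 56, 2002 AL figures from the text.
extra_balls, runs = zone_uzr_runs(
    player_outs=18, hits_while_playing=79, position_outs=294,
    league_outs=1419, league_hits=1055,
    hit_run_value=0.47, out_run_value=-0.29,
)
print(round(extra_balls, 1), round(runs, 1))
```

Carrying full precision through each step rather than rounding along the way still lands on roughly -1.7 extra balls and -1.3 runs.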
If we do this for every zone in which any SS made at least one out (i.e., the applicable SS zones), and we add up all the runs Bordick saved or cost in each zone, we get a total of +6.2 runs, or 6.2 runs saved by Bordick while playing SS (he must have done well in the other zones).
But wait! What about Bordick's ROE and non-ROE errors? Remember that I said that a fielder's ROE errors were thus far treated as outs. Bordick's 6.2 runs saved assumes that Bordick, and all other SSs in the AL, turned every ball they got to into an out.
Since that isn't true (that Bordick and all other SSs turned every ball gotten to into an out), we now have to factor in Bordick's ROE errors.
Here's how (this step is simple):
AL shortstops as a group committed 169 ROE errors in 5218 balls gotten to (outs plus ROEs) in all zones. That is an error rate of 169 divided by 5218, or .032. Since Bordick got to a total of 277 balls in all zones, an average SS in his place would have committed .032 times 277, or 8.9 errors. Instead, Bordick committed only 1 error, for a net gain in errors of 7.9. Since an infield error is worth around .49 runs, the swing between an error and an out is .49 plus .29, or .78 runs. Therefore, Bordick saved another .78 times 7.9, or 6.2 runs, by virtue of his "good hands". So far, we have Bordick saving 6.2 runs with his range and another 6.2 runs with his sure hands.
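The ROE adjustment follows the same "expected minus actual" pattern. Here is a short Python sketch of it, with an illustrative function name of my choosing and the inputs taken from the paragraph above.

```python
def error_runs_saved(league_errors, league_balls_reached,
                     player_balls_reached, player_errors, run_swing):
    """Runs saved by committing fewer errors than an average fielder would."""
    error_rate = league_errors / league_balls_reached
    expected_errors = error_rate * player_balls_reached
    return (expected_errors - player_errors) * run_swing

# Swing between an infield error (.49 runs) and an out (-.29 runs).
roe_swing = 0.49 + 0.29

roe_runs = error_runs_saved(
    league_errors=169, league_balls_reached=5218,
    player_balls_reached=277, player_errors=1, run_swing=roe_swing,
)
print(round(roe_runs, 1))
```

A fielder with more errors than expected would come out negative here, i.e. runs cost rather than saved.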
There is one final thing to consider: Bordick's non-ROE errors. Like ROE errors, these are easily handled.
Shortstops as a group committed 45 non-ROE errors and Bordick none. If we do the same calculations as above, using .3 as the value of a non-ROE error, we come up with Bordick saving another .72 runs. So it looks like even at the ripe old age of 36, Mike Bordick saved his team a total of 13 runs last year by virtue of his outstanding play (range and hands) at SS!
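To check the final tally, here is a small Python sketch. It assumes that "the same calculations as above" means using the same 5218 balls gotten to as the denominator, an assumption that does reproduce the .72 figure; the variable names are illustrative only.

```python
# Non-ROE errors, assuming the same denominator as the ROE calculation.
league_non_roe_errors = 45
league_balls_reached = 5218      # assumed denominator (outs plus ROEs, all SS)
bordick_balls_reached = 277
bordick_non_roe_errors = 0
non_roe_run_value = 0.3          # run value of a non-ROE error, from the text

expected_non_roe = league_non_roe_errors / league_balls_reached * bordick_balls_reached
non_roe_runs = (expected_non_roe - bordick_non_roe_errors) * non_roe_run_value

# The three components of Bordick's UZR runs, using the rounded values from the text.
range_runs = 6.2
roe_runs = 6.2
total_runs = range_runs + roe_runs + non_roe_runs
print(round(non_roe_runs, 2), round(total_runs, 1))
```

The sum comes to a little over 13 runs, matching the total quoted above.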
UZR Runs and UZR Rates
As I said at the beginning, UZR runs is the heart of the UZR system, and that's really all you need to know. However, if you want to translate UZR runs into a UZR rate, it can be done, although it's a bit tricky as well. At first glance, it may seem that the simplest way to calculate a player's UZR rate from the above data is to simply add up all of his outs (not including the ROE errors) in all the zones and divide by his total chances. What is a "chance", though, in a UZR system that surveys all of the zones for every fielding position? For example, if in a certain zone, all SSs combined field 3 balls out of 300 in that zone, should all of those 300 balls be considered "chances"?
Actually, we were already halfway towards calculating Bordick's "chances" in zone 56 when we determined how many of the 79 hits he was responsible for (16.4). In fact, the number of "chances" for Bordick in zone 56 is 16.4 (his share of the hits), plus 18 (his outs), or 34.4. In other words, a player's "chances" in any particular zone is defined as that player's number of outs plus the number of hits he is responsible for. As you saw above, the number of hits a player is responsible for in any zone is the total number of hits in that zone multiplied by that player's share of the outs in that zone.
If we add up all of Bordick's "chances" in each individual zone (in most zones, that will be zero, of course), we get his total "chances". If we then divide his total outs by his total "chances", we get sort of a "global" zone rating. Let's do the math. Bordick recorded 277 outs and errors minus 1 error, or 276 outs. His total chances were 354. Therefore his ZR for all zones combined was 276 divided by 354, or .780.
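The definition of a "chance" and the resulting "global" ZR can be sketched in Python as follows. The function name is hypothetical; the inputs are the zone 56 and season totals quoted in the text.

```python
def zone_chances(player_outs, hits_while_playing, position_outs, league_outs):
    """A fielder's "chances" in a zone: his outs plus his share of the hits."""
    share = position_outs / league_outs
    return player_outs + share * hits_while_playing

# Bordick in zone 56: 18 outs plus his share of the 79 hits.
chances_56 = zone_chances(player_outs=18, hits_while_playing=79,
                          position_outs=294, league_outs=1419)

# "Global" ZR over all zones: 277 balls gotten to, less 1 error, over 354 chances.
global_zr = (277 - 1) / 354
print(round(chances_56, 1), round(global_zr, 3))
```

Summing `zone_chances` over every applicable zone would give the 354 total; the per-zone data needed for that sum is not reproduced here.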
Unfortunately, this is not his UZR (it does not correspond to his UZR runs), since it doesn't weight each zone appropriately. For example, if a player happened to get more than his share of "chances" in "high out" zones, he might well have a higher-than-average "global" ZR without necessarily being a better-than-average fielder. Basically, using the above method, the resultant "global" ZR is more like a simple ZR, which is not really what we are looking for.
Ultimately, the best way to construct a UZR rate which represents the true value of a fielder in comparison to the UZR rate of an average fielder, and is the equivalent of UZR runs, is the following. We'll use Mike Bordick as the example again.
First we take the simple ZR for all SSs, including ROE errors as outs. This is the number of total outs plus ROE errors by all SSs, divided by the total number of "chances" (outs and ROE errors plus the number of hits a SS is responsible for). In 2002, SS outs plus ROE errors were 5218, and SS "chances" were 6786, for an average ZR of .769.
Next, we multiply that average ZR of .769 by Bordick's total "chances", which was 354. The result is 272. That is the number of outs plus ROE errors that an average SS would make given those 354 "chances". Bordick, on the other hand, got to (outs and ROE errors) 8.1 more balls than the average SS, for a total of 280.1 balls "gotten to" (272 plus 8.1). Of those 280.1 balls "gotten to", he committed only one error, for a total of 279.1 "outs". As you can see, those 280.1 balls "gotten to" and 279.1 "outs" are a "fiction", as Bordick actually got to 277 balls and recorded 276 outs.
Nevertheless, if we divide the 279.1 "fictional outs" by his 354 chances, we get a UZR rate for Bordick of .788, which should correspond almost exactly to his 13 total runs saved (UZR rate doesn't account for non-ROE errors [UZR runs does], but we could certainly "fudge it in" if we wanted to).
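The steps for the UZR rate can be retraced in Python as a sketch. The 8.1 extra balls gotten to is taken from the text as given rather than derived here, and the variable names are illustrative.

```python
league_reached = 5218        # SS outs plus ROE errors, 2002
league_chances = 6786        # SS "chances", 2002
bordick_chances = 354
extra_balls_reached = 8.1    # taken from the text as given, not derived here
bordick_roe_errors = 1

# Average SS ZR with ROE errors counted as outs.
average_zr = league_reached / league_chances
# Balls an average SS would get to in Bordick's 354 chances.
expected_reached = average_zr * bordick_chances
# "Fictional" outs: expected balls, plus Bordick's extra balls, less his ROE errors.
fictional_outs = expected_reached + extra_balls_reached - bordick_roe_errors
uzr_rate = fictional_outs / bordick_chances
print(round(average_zr, 3), round(uzr_rate, 3))
```

The text rounds the expected 272.2 balls down to 272 before continuing, so carrying full precision gives a rate about .001 higher than the .788 quoted.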
Here is the UZR data for the same NL and AL SS:
If you compare the above charts with those at the beginning of this article, you will see that STATS simple ZR correlates very well with UZR. This suggests that ZR is a pretty good measure of fielder ability, assuming that UZR is the "gold" standard.
On the other hand, RF does not seem to correlate well with either ZR or UZR, suggesting that it is not a good measure of fielding ability (assuming that ZR and UZR are). In fact, if you look closely at the above charts, you will see that a player's RF is almost entirely a function of his number of "chances".
Where do we go from here?
Earlier, I said that there are a number of contexts that have not been considered in the STATS UZR (or in MGL's basic UZR). These are:
In my newly revised UZR, these contexts are considered. In Part II, I will discuss these adjustments, incorporate them into the UZR calculations, and present the final UZR results. To see how much of an effect these adjustments have on a player's UZR, you will be able to compare the initial basic results with the final adjusted ones.
To be continued...