Where BTF's Members Investigate the Grand Old Game
Friday, March 21, 2003
Ultimate Zone Rating (UZR), Part 2
Adjusting for everything under the sun, or dome, or whatever the case may be.
Note: A Primer reader, and researcher in his own right (Shorty), diligently pointed out to me a possible error in the methodology described in Part I of this series. Rather than the two-part (and confusing) process I used to determine the number of balls, and hence, runs, cost or saved by a fielder in each zone, he suggested that a one-part and much simpler process can and should be used. His suggested methodology is as follows:
To determine a fielder's "balls cost or saved" in each zone, first determine the fielder's out percentage by dividing his "balls caught" (his outs) by the total balls in play in that zone (hits plus outs) while that player is on the field. Then subtract the league out percentage in that zone (outs made by fielders at that position only, divided by the league balls in play in that zone), and multiply the difference by the total balls in play in that zone while that player is on the field. I know it still sounds a little complicated, but it is actually simple, straightforward, and obvious. More importantly, it may be more accurate than the old methodology. Basically, it is a fielder's simple zone rating in each zone minus an average fielder's (at that position) simple zone rating in that zone, multiplied by the fielder's total chances in that zone. As I stated in Part I, a UZR is essentially the weighted average of a player's simple ZR in every zone on the field.
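As a rough illustration, Shorty's per-zone calculation can be sketched in Python. The function name, argument names, and sample numbers are hypothetical, and the conversion of balls saved into runs (covered in Part I) is left out.

```python
def balls_saved(player_outs: int, player_bip: int,
                league_outs: int, league_bip: int) -> float:
    """Balls saved (+) or cost (-) by one fielder in one zone.

    player_outs / player_bip: the fielder's outs and balls in play
        (hits plus outs) in the zone while he was on the field.
    league_outs / league_bip: league outs by fielders at the same
        position, and all league balls in play, in the same zone.
    """
    if player_bip == 0:
        return 0.0
    player_rate = player_outs / player_bip  # fielder's simple ZR in the zone
    league_rate = league_outs / league_bip  # average fielder's simple ZR there
    return (player_rate - league_rate) * player_bip
```

For example, a fielder with 60 outs on 100 balls in play, in a zone where fielders at his position convert 5,000 of 10,000 balls in play, comes out about 10 balls above average.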
Since Shorty's comments were posted on Primer, several other equally credible readers and researchers have suggested that the original methodology may in fact be better - for various reasons that I won't go into. To be honest, I have gone back and forth between the two methodologies (and others) for several years now, and I'm not even sure anymore why I settled on the current one. Well, after some more rumination and sleepless nights, and invaluable help from Primer friends and colleagues, I have decided that Shorty is right - the simpler methodology is the more correct one.
At the end of this article, you will find updated unadjusted UZR runs (using the simpler methodology) for the 2002 NL and AL shortstops, as well as the corresponding adjusted results (also using the new methodology), which is the subject of this article.
Anyway, let's get on with the UZR adjustments...
Since my original UZR came out several years ago, it has been suggested that a number of factors other than the zone a ball is hit into might influence a fielder's ability to turn a batted ball into an out. These are noted in Part I and are reiterated below: park factors, batted ball speed, batter handedness, pitcher G/F ratio, and baserunners/outs.
Park Factors
Actually, park factors have always been included in my UZR ratings. This year, I made some minor changes. I'll explain what changes were made, how UZR park factors are calculated, and how they are applied to the UZR player ratings.
Originally, I used separate infield park factors for each of the infield positions, reasoning that the effect a park's peculiarities had on one infield position (e.g., the condition of the turf, lighting, glare from the sun, etc.) might not be the same for another infield position.
At some point, however, I decided to use only one infield park factor for all infield positions. The reason for the change was two-fold. One, this considerably increased my sample size, so the resulting infield park factors were more reliable. Two, I figured that the primary influence on an infield park factor was the condition (and type) of the turf, so the infield park factor should be more or less the same for all infield positions. Right or wrong, that is how it is presently done.
For the outfield park factors, I have always separated the OF into three segments and assigned a separate park factor to each segment: LF, CF, and RF. The LF segment consists of all 7L, 7, and 78 zones (see the Retrosheet zones). The CF segment consists of all 8L and 8R zones, and the RF segment consists of all 89, 9, and 9L zones.
You could certainly make an argument for wanting more granularity in the OF park factors (and probably for the IF park factors as well), but as with all complex methodologies, you often reach a point of diminishing returns. As well, in determining the level of granularity in any methodology, there is always a balance that must be attained between rigor and sample size. In fact, I grappled with this issue (rigor versus sample size) many times throughout the process of adjusting the UZR ratings.
The final change made this year, in terms of the park adjustments, was that I used a larger sample size for each park (up to ten years), and I was more careful in accounting for situations which might have affected the park factors (such as changes in OF dimensions, fence heights, turf type, etc.). As you will see in the 2002 park factor chart below, I used data since the 1993 season, and treated a park as a single unit as long as no material changes were made to it. If a material change was made to a park (such as replacing artificial turf with natural grass, or changing an OF dimension), I treated the "renovated" park as a completely different park. If a change was made to the OF but not to the IF, I treated the renovated park as new for the OF park factors but not for the IF park factors (and vice versa). Thus, some parks (like Wrigley Field and Yankee Stadium) have 10 years of data that go into their IF and OF park factors, other parks (Turner Field, Coors Field, et al.) have fewer than 10 years for their IF and OF factors, while still others (like the new Comiskey Park) have x years for their IF factors and y years for their OF factors (the OF dimensions in Comiskey were changed in 2001).
UZR park factors are calculated in the same way that regular park factors are calculated. For the IF, the home (home and road teams combined) ground ball out percentage is divided by 1/14th (or 1/16th, depending upon the number of parks in the league) of the home GB out percentage plus 13/14th (or 15/16th) of the road (again, home and road teams combined) GB out percentage. (Actually, the computer is programmed to use 1/15th and 14/15th for all leagues and years.) Road game data is, of course, for that team's road games only.
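The park factor formula above can be written as a short Python function. This is a minimal sketch assuming the 1/15 and 14/15 weights the text says are used for all leagues and years; the function and argument names and the sample percentages are hypothetical, and the regression of the published factors toward average is not shown, since its exact form isn't described here. The same function would presumably apply to the LF, CF, and RF segments using fly ball and line drive out percentages.

```python
def park_factor(home_out_pct: float, road_out_pct: float,
                home_weight: float = 1 / 15) -> float:
    """Park factor for one park and one segment (IF, LF, CF, or RF).

    home_out_pct: out percentage in the team's home games, both teams combined.
    road_out_pct: out percentage in the team's road games, both teams combined.
    home_weight: weight on the home park in the league-wide baseline
        (the text uses 1/15 for all leagues and years).
    """
    baseline = home_weight * home_out_pct + (1 - home_weight) * road_out_pct
    return home_out_pct / baseline


pf = park_factor(0.700, 0.720)  # hypothetical GB out percentages
```

With these hypothetical inputs the factor comes out to roughly .974, i.e., a park where ground balls are slightly harder to turn into outs than in the league as a whole.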
For the OF, the same calculations are done, using fly ball and line drive out percentages (FB's and LD's are treated as if they were the same) rather than GB out percentage. The same OF park factor is applied to fly balls and line drives. As I said earlier, separate calculations are done for the LF, CF, and RF zones in the OF. Errors in the IF and OF are treated as outs. A ground ball error park factor is also calculated, using the number of GB errors divided by the number of GB outs plus errors.
When all is said and done, for each "park", we get an IF park factor, a LF PF, a CF PF, a RF PF, and a GB error PF. I put the word "park" in quotes because, as I explained above, two different "parks" may actually be the same park with different dimensions and/or different turf.
The chart below contains the 2002 regressed park factors for all 30 NL and AL parks. These factors are used to park adjust the 2002 UZR player ratings. Park factors are regressed according to the size of the sample data.
The UZR park factors are applied in the same way that some of the other UZR adjustments are applied, and in the same way that most offensive park factors are applied.
(Note: Technically this is not the correct way to apply park factors - however it is close enough.)
When an out is recorded by a particular fielder, rather than crediting that fielder with exactly one out, I credit him with "one divided by the park factor" number of outs. For example, the infield park factor at Coors Field is .97, so for every out recorded by an infielder in Coors Field, he gets credited with one divided by .97, or 1.03 outs. Every out for every fielder is park adjusted in this way, depending upon what park the out was recorded in and what the corresponding park factor is for that park at that position. In other words, outs in a fielder's home park are not the only outs that are park adjusted; all outs are park adjusted.
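Crediting park-adjusted outs amounts to dividing each out by the factor for the park (and position or OF segment) where it was recorded. A minimal sketch, assuming the factors have already been looked up per park; the function name, the "Park B" label, and its factor are hypothetical, and only the Coors Field infield factor of .97 comes from the text.

```python
def park_adjusted_outs(outs_by_park: dict[str, int],
                       park_factors: dict[str, float]) -> float:
    """Sum a fielder's outs, crediting each out as 1 / PF of the park it came in.

    This mirrors the approximation described in the text (dividing each out
    by the park factor), which the author calls close enough rather than exact.
    """
    return sum(outs / park_factors[park] for park, outs in outs_by_park.items())


# Infield park factors; "Park B" and its value are hypothetical.
infield_pf = {"Coors Field": 0.97, "Park B": 1.02}
credited = park_adjusted_outs({"Coors Field": 1, "Park B": 1}, infield_pf)
```

One out at Coors Field is credited as 1 / .97, or about 1.03 outs, while an out in a park with a factor above 1 is credited as slightly less than one out.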
Batted Ball Speed
This one is obvious, yet this is the first year that I was able to obtain the speed of every batted ball (since 1999), as judged by the same people ("stringers" I think they call them) who record batted ball type and location. By the way, all of my play-by-play data is courtesy of two independent sources. One is Gary Gillette and Pete Palmer and the other is STATS Inc. I believe that STATS, at least, uses three "stringers" for each game, and somehow combines their judgments, in order to reduce human error and bias.
Anyway, each ball in play is designated (by the "stringers") as hard, medium, or soft. For ground balls, the meaning of these designations is obvious. For bunts, fly balls, and especially pop flies, it is not (things like height, distance, and trajectory are considered for fly balls and pop-ups). The important thing is that all of the "stringers" are reasonably consistent. From working with the data, I am fairly confident they are.
Here is an example of how the GB out percentages change in the various IF zones, depending upon the speed of the ground ball:
GB Out Percentages by Zone and Batted Ball Speed
How the OF fly ball and line drive out percentages vary with the speed of the ball depends upon the OF zone. For example, softly hit fly balls are harder to catch in the short outfield zones. The opposite is true in the medium and deep OF zones.
FB Out Percentages by Fly Ball Depth and Batted Ball Speed
How is the batted ball speed applied to UZR? Rather than using a park-factor-type adjustment, I opted to "split" each zone into six separate "sub-zones", and keep track of player outs and chances and league outs and chances separately in each "sub-zone". Why six and not three (soft, medium, and hard)? Well, I also "tacked on" the handedness of the batter, which is another important adjustment, as you will see later on. In other words, a fielder's runs saved or cost is calculated six separate times for each zone on the field. I warned you that there were going to be lots of "rigor versus sample size" issues in the UZR adjustments!
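The sub-zone bookkeeping can be sketched as running the per-zone calculation separately for each (zone, speed, batter hand) combination and then summing within the zone. This is a sketch under assumptions: the dictionary layout, key format, function name, and sample counts are hypothetical, league outs are taken to be outs by fielders at the same position, and the conversion to runs is omitted.

```python
from collections import defaultdict

SPEEDS = ("soft", "medium", "hard")
HANDS = ("L", "R")  # batter handedness

# Hypothetical layout: (zone, speed, batter hand) -> (outs, balls in play).
SubZoneCounts = dict[tuple[str, str, str], tuple[int, int]]


def balls_saved_by_zone(player: SubZoneCounts, league: SubZoneCounts) -> dict[str, float]:
    """Balls saved (+) or cost (-) per zone, summed over its six sub-zones.

    Each sub-zone is handled like a zone of its own: the fielder's out rate
    minus the league rate at his position, times his balls in play there.
    """
    saved = defaultdict(float)
    for (zone, speed, hand), (outs, bip) in player.items():
        if speed not in SPEEDS or hand not in HANDS:
            raise ValueError(f"unexpected sub-zone {(zone, speed, hand)}")
        if bip == 0:
            continue
        lg_outs, lg_bip = league[(zone, speed, hand)]
        saved[zone] += (outs / bip - lg_outs / lg_bip) * bip
    return dict(saved)


# Hypothetical counts for a shortstop in zone 56 (the SS hole).
player_56 = {("56", "hard", "R"): (3, 10), ("56", "soft", "L"): (8, 10)}
league_56 = {("56", "hard", "R"): (300, 1000), ("56", "soft", "L"): (600, 1000)}
result = balls_saved_by_zone(player_56, league_56)
```

Here the fielder is exactly average on hard balls from right-handed batters and about 2 balls above average on soft balls from left-handed batters, so `result["56"]` is about 2.0. Keeping the sub-zones separate means a fielder who happens to see an unusual mix of speeds or batter hands is compared against the league rate for that same mix.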
Batter Handedness
Well, I already mentioned above that the handedness of the batter significantly affects the out rate in the various zones for both the IF and the OF. The reason for this is three-fold. One, the positioning of the fielders changes, so that, for example, the SS catches more balls in zone 56 (the SS hole) with a RHB at the plate than with a LHB (he is presumably shaded towards the hole). Two, when a batter pulls a ground ball, it is a weaker hit on average, and when a batter pulls a fly ball or line drive, it is generally hit harder and farther (the opposite of the ground ball). Three, RHB's and LHB's, as a group, hit the ball differently, even after accounting for which side of the field they tend to hit to.
Here are some examples of how out percentages in the various zones are affected by the handedness of the batter:
GB Out Percentage and Batter Handedness
FB Out Percentage and Batter Handedness
LD Out Percentages and Batter Handedness
As I explained above, the way I adjust UZR for batter handedness is the same as the way I adjust for batted ball speed. I keep track of LHB's and RHB's separately.
Pitcher G/F Ratio
The ground/fly ratio of the pitcher also affects the GB and FB (not so much the LD) out rates. Basically, a ground ball pitcher allows ground balls that are easier to field and fly balls that are more difficult to field. The opposite is true for fly ball pitchers. The more extreme a pitcher's G/F ratio, the more pronounced the effect.
Originally, I assumed that if I controlled for ground ball and fly ball speed, the differences would disappear (IOW, that ground ball pitchers allowed a greater percentage of soft ground balls, etc.). I figured, then, that since I was already accounting for batted ball speed, I wouldn't need to account for pitcher G/F ratio. I was wrong. Interestingly, and somewhat inexplicably, when I controlled for batted ball speed, the differences between ground ball and fly ball pitchers were still there and almost as pronounced as before.
(Also, when I controlled for batted ball speed, the differences between LHB's and RHB's did not change much either.)
In the following chart, FB pitchers had an average G/F ratio of around .8 and GB pitchers around 2.0. FB pitchers were around 25% of all pitchers with at least 100 PA in a season. Same for GB pitchers.
As it turns out, pitcher G/F ratios do not have much of an effect on a player's UZR rating for one simple reason (actually two reasons): GB and FB out percentages are not that sensitive to a pitcher's G/F ratio, most pitchers' G/F ratios are near average (around 1.4), and almost all pitching staffs have near-average G/F ratios (well, maybe that was three reasons). In any case, the way I adjust a fielder's UZR for his pitching staff's G/F ratio is to keep track of the average G/F ratio for all pitchers while the fielder is on the field, and then to apply this to his UZR rating - at the end. In fact, I simply adjust a player's UZR rate (and then, eventually, his UZR runs) by .001 per .1 above or below an average pitcher G/F ratio. For example, if an infielder's pitching staff had an average G/F ratio of 1.8, since this is .4 more than the average pitcher G/F ratio of 1.4, the infielder's UZR rate would be reduced by .004 (since those pitchers presumably allow easier ground balls). This is admittedly a very coarse way to do an adjustment; however, considering how relatively unimportant pitcher G/F ratio is, I think it works just fine.
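The pitcher G/F adjustment is simple enough to write in a few lines. A minimal sketch assuming the league average of 1.4 and the .001-per-.1 slope given in the text; the function name and sample rate are hypothetical, and the sign used for outfielders is an assumption inferred from the statement that ground ball pitchers allow fly balls that are harder to field (the text only works through the infield case).

```python
LEAGUE_AVG_GF = 1.4     # average pitcher G/F ratio, per the text
RATE_PER_TENTH = 0.001  # change in UZR rate per .1 of staff G/F above or below average


def gf_adjusted_rate(uzr_rate: float, staff_gf: float, infielder: bool = True) -> float:
    """Adjust a fielder's UZR rate for his pitching staff's average G/F ratio."""
    tenths_above_avg = (staff_gf - LEAGUE_AVG_GF) / 0.1
    adjustment = RATE_PER_TENTH * tenths_above_avg
    if infielder:
        # Ground ball staffs allow easier grounders, so infielders are marked down.
        return uzr_rate - adjustment
    # ASSUMPTION: outfielders get the opposite sign, since ground ball staffs
    # are said to allow fly balls that are harder to field.
    return uzr_rate + adjustment
```

For the infielder in the example, a staff G/F of 1.8 is four tenths above 1.4, so `gf_adjusted_rate(0.500, 1.8)` returns about .496, i.e., the rate reduced by .004.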
Baserunners and Outs
Finally, we get to the last, but certainly not the least, UZR adjustment. Each infielder's (but not each outfielder's) GB out percentage is significantly influenced by the baserunners and the number of outs. (As with the other adjustments, this is not to say that fielders, as a general rule, have markedly different distributions of baserunners and outs; in fact, they don't, as you will see from a comparison of the unadjusted and adjusted UZR ratings.)
This is mostly due to the positioning of the infielders (e.g., with a runner on first, the first baseman has limited range; with a runner on third and fewer than two outs, the infield may be playing up; etc.), and to a much lesser extent to the approach of the pitchers and batters (e.g., with two outs, GB out percentages tend to be higher across the board).
Here are the GB out percentages for each "set of zones" and for each of the 24 bases/outs situations:
The "3" zones (all zones beginning with the number "3")
The "4" zones (all zones beginning with the number "4")
The "5" zones (all zones beginning with the number "5")
The "6" zones (all zones beginning with the number "6")
How are the baserunner/outs adjustments handled? Dare I break each sub-zone down further into 24 (the number of bases/outs combinations) more sub-sub-zones? Not a chance! First, I go through my ten-year database (93-02) to determine the "adjustment factors" for each infield position and for each of the 24 bases/outs combinations. For example, as you can see above, a simple ZR for an average first baseman for 1993-2002 was .513. With a runner on first only and 0 outs, however, it was .402. Therefore, the bases/outs adjustment factor for a first baseman in this particular bases/outs combination is .402/.513, or .784. I use this adjustment factor for all outs recorded by a first baseman, regardless of the zone (yes, I know that each zone should have its own bases/outs adjustment factor, but I can only deal with so much granularity in one lifetime), and I apply the adjustment in the same way that I apply the park factor adjustments: by dividing each out recorded by the adjustment factor. In the case of a first baseman who records an out with a runner on first base only and no outs, he gets credit for 1/.784, or 1.28 outs (remember, this is technically not the correct way to apply an adjustment factor, but it is good enough, IMO).
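The bases/outs adjustment factors and the resulting credited outs can be sketched the same way. A minimal sketch: the situation key format and function names are hypothetical; the .513 and .402 first baseman figures are the ones quoted above, and, as the text notes, a single factor per position and situation is applied across all zones.

```python
# Hypothetical situation key: (runners on base as a "1__"-style string, outs).
Situation = tuple[str, int]


def bases_outs_factors(situational_zr: dict[Situation, float],
                       overall_zr: float) -> dict[Situation, float]:
    """Adjustment factor = simple ZR in a bases/outs state / overall simple ZR."""
    return {sit: zr / overall_zr for sit, zr in situational_zr.items()}


def credited_out(factor: float) -> float:
    """Each out recorded in that state is credited as 1 / factor outs."""
    return 1 / factor


# First baseman, 1993-2002: overall simple ZR .513; runner on first only, 0 outs: .402.
factors_1b = bases_outs_factors({("1__", 0): 0.402}, overall_zr=0.513)
```

Here `factors_1b[("1__", 0)]` is about .784, and `credited_out` of that factor is about 1.28, matching the worked example.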
Well, those are about all the non-trivial adjustments that I could think of. If anyone comes up with any more, please keep them to yourself! I still have to crank out the rest of the 2002 revised (new and improved) Super-lwts!
To get an idea of how all of these adjustments affect a player's UZR rating, here are the same SS charts I printed in Part I with the adjusted UZR runs added. Note again that I redid the original unadjusted ratings using the new methodology, as described at the beginning of this article, so the following charts contain adjusted and unadjusted UZR runs using the new methodology only.
2002 NL SS UZR Data
2002 AL SS UZR Data