Primate Studies: Where BTF's Members Investigate the Grand Old Game
Friday, March 14, 2003
Ultimate Zone Rating (UZR), Part 1
Everything you wanted to know about Ultimate Zone Rating. This article describes the basic methodology of the newly revised fielding system used in Super-LWTS.
Background: RF, ZR, UZR
Bill James introduced us to Range Factor (RF) as, essentially, the number of outs made per game. This was convenient at the time, because we had no context other than the game. The problem is that a game is not necessarily nine innings for each fielder. Also, each fielder depends on his pitching staff and on "luck" for his opportunities.
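The innings problem is easy to see with a minimal sketch. The season totals below are invented for two hypothetical shortstops, and the per-nine-innings variant is a common refinement rather than anything this article uses; it addresses playing time but not the dependence on the pitching staff or on luck.

```python
# Hypothetical season totals (not from the article). Both shortstops make
# outs at the same per-inning pace; B is often replaced late in games.
def range_factor(putouts_plus_assists: int, games: int) -> float:
    """Range Factor: outs a fielder took part in, per game played."""
    return putouts_plus_assists / games

def range_factor_per9(putouts_plus_assists: int, innings: float) -> float:
    """The same count scaled to nine defensive innings (assumed refinement)."""
    return 9 * putouts_plus_assists / innings

players = {
    "A": {"outs": 585, "games": 150, "innings": 1350.0},
    "B": {"outs": 520, "games": 150, "innings": 1200.0},
}

for name, p in players.items():
    rf = range_factor(p["outs"], p["games"])
    rf9 = range_factor_per9(p["outs"], p["innings"])
    print(f"{name}: RF per game {rf:.2f}, per 9 innings {rf9:.2f}")
```

Both fielders make 3.9 outs per nine innings; RF per game separates them only because B plays fewer innings per game.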
STATS began tracking Zone Rating (ZR) as, essentially, the number of outs a fielder makes divided by the number of balls hit into his "area of responsibility" (i.e., his zone). This addressed some of the shortcomings of RF. However, STATS ZR has many shortcomings of its own. For example, each fielder is given only one zone. We know that it's much easier to convert a ball in play into an out if the ball is hit near you than if it is hit on the fringes of the zone.
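The one-zone problem can be made concrete with a small sketch. The sub-zone out rates and ball counts are invented: both hypothetical fielders convert exactly at the league rate in each sub-zone and differ only in where their balls were hit. A single-zone ZR separates them sharply, while a sub-zone baseline (the refinement STATS adopted, described a few paragraphs below) rates both as exactly average.

```python
# Hypothetical league out-conversion rates for two sub-zones of one zone.
LEAGUE_RATE = {"near": 0.90, "fringe": 0.30}

def simple_zr(outs, balls):
    """Single-zone ZR: all outs divided by all balls in the zone."""
    return sum(outs.values()) / sum(balls.values())

def expected_outs(balls):
    """Sub-zone baseline: outs an average fielder makes on this ball mix."""
    return sum(LEAGUE_RATE[z] * n for z, n in balls.items())

fielder_a = {"near": 80, "fringe": 20}
fielder_b = {"near": 40, "fringe": 60}

for name, balls in (("A", fielder_a), ("B", fielder_b)):
    # Each fielder converts exactly at the league rate in every sub-zone.
    outs = {z: LEAGUE_RATE[z] * n for z, n in balls.items()}
    net = sum(outs.values()) - expected_outs(balls)
    print(f"{name}: ZR {simple_zr(outs, balls):.3f}, outs vs baseline {net:+.1f}")
```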
The following tables show the RF and STATS ZR for all 2002 shortstops (SS) in the NL and AL:
National League
American League
STATS expanded on ZR by creating sub-zones. You can take the average out-conversion rate for each sub-zone and apply it to the number of balls in play each fielder received in that sub-zone to establish a baseline. This baseline shows the number of outs an average fielder would have made had he received the same number of balls in play in each sub-zone as our specific fielder. This is, essentially, UZR, or Ultimate Zone Rating. There are other contexts that have not been considered, which we will get to later on.
Defining UZR
The following is the methodology I follow in calculating my basic version of UZR (MGL's basic UZR).
Note: Before I begin explaining the UZR system, keep in mind that there are at least two components of defense that UZR does not address: one, an outfielder's "arm", and two, an infielder's skill at turning the double play. Of course, these skills can be measured (and they are, in my Super-LWTS system). They are just not included in UZR. Like ZR, UZR is designed to measure and quantify only the skill that enables a fielder to turn batted balls into outs.
UZR rate is expressed as a fraction of 1, like a simple ZR. It means essentially the same thing as a simple ZR, namely the number of balls fielded (turned into at least one out) divided by the number of chances; however, UZR rate is a weighted average of a player's ZR in each of several zones.
As you will see, UZR rate is really a by-product of UZR runs, and UZR runs is the heart of the UZR system. It represents the value of a fielder's performance, expressed as runs saved or cost, in comparison to an average fielder (actually, in comparison to the mean performance of all fielders) at that position, in that player's league, during that particular year. UZR runs is the defensive counterpart of Palmer's offensive linear weights (lwts); thus it can be combined with lwts (among other things) to give you an estimate of a player's total offensive and defensive value. Any player with an average defensive performance will, by definition, have exactly zero UZR runs.
Tracking Data
The entire field is broken down into 78 zones. These are the same zones you can find in the hit location diagram in the documentation section of the Retrosheet website (www.retrosheet.org). Of these, UZR uses 64. For infielders, only ground balls, including bunts, are looked at. Pop flies caught in, or landing in, an infield zone are excluded, as are line drives caught by an infielder or hit through the infield. For outfielders, all fair fly balls and line drives are included. None of the foul zones are used in UZR except for 3F and 3L, which are near the first and third base bags (for fair ground balls fielded in foul territory behind the bags). Catchers and pitchers are not included in UZR ratings.
For each zone, the computer keeps track of the following on a league-wide (for a particular year) basis:
At the same time, the computer keeps track of the total number of fielding errors for each fielding position, but not for each zone individually. It compiles fielding errors in two separate categories: ROE errors, which are fielding errors that let the batter reach base on an error (ROE), and non-ROE errors, which are all other errors, such as an error on a hit or a second error on an ROE play.
For example, here is the 2002 league-wide data for zone 56 (the area between the third baseman and the SS):
For each player at each fielding position (e.g. Rey Sanchez at SS is one entity and Rey Sanchez at 2B is another entity), and for each zone, the computer also compiles the following information:
ROE and non-ROE fielding errors are compiled separately for each player, but again, not by individual zones.
For example, the 2002 data in zone 56 for Mike Bordick, while playing SS, looks like this:
Remember that these numbers represent ground ball outs in zone 56 recorded by Bordick while playing SS and all ground ball hits in that zone while Bordick was playing SS.
Now, here comes the tricky part.
How a Player's UZR Runs is Calculated in Each Zone
For now, all ROE errors are treated as outs. This is necessary because, at first, we use outs plus ROEs to establish the number of balls a fielder gets to, and because some ROE errors are committed by the receiver of a throw, in which case the fielder deserves credit for an out. In the above data, "outs" actually means "outs plus ROE errors". Errors (ROE and non-ROE) are accounted for later on.
Let's use the data to calculate Mike Bordick's UZR runs in zone 56. First we establish the out rate for all ground balls hit into zone 56: 1419 divided by 2474 (1419 plus 1055), or .57. That is, 57% of all ground balls hit into zone 56 in 2002 were turned into outs (by all fielders).
Therefore, the "extra" value of a "caught ball" by a fielder in zone 56 is 1 minus .57, or .43 balls. Since Bordick caught 18 balls in zone 56, he has 18 times .43, or 7.7 "extra" caught balls so far.
Now what about the hits? There were 79 hits in zone 56 while Bordick was playing SS. Surely he is not responsible for all of those hits. How many is he responsible for? Well, since an average SS catches 294 balls in zone 56 out of 1419, or 20.7% of the outs, Bordick is responsible for 20.7% of the 79 hits as well, or 16.4 hits (the third baseman is responsible for the other 62.6 hits). I told you it was going to be tricky!
Now, just as the "extra" positive value of a "caught ball" is 1 minus .57, the "extra" negative value of a hit is the .57 itself (an average ball hit into zone 56 gets caught 57% of the time, so when a ball isn't caught, the responsible fielders, in this case the SS and third baseman, get "docked" .57 balls). Since Bordick is responsible for 16.4 of the 79 hits in zone 56, he is docked 16.4 times .57, or 9.4 "negative" caught balls, which, combined with his 7.7 "positive" ones, gives a total of -1.7 "extra" caught balls. In other words, given the number of balls hit into zone 56 while Bordick was at SS, he caught 1.7 fewer balls than the average AL SS in 2002 would have.
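The zone 56 steps above fit in one small function. This is a sketch of the arithmetic as this article describes it; the function and parameter names are my own, and, as the comments below discuss, the author later agreed that a simpler method without the proration of hits is the correct one.

```python
def zone_net_balls(zone_outs, zone_hits, position_outs, player_outs, team_hits):
    """Net "extra" caught balls for one fielder in one zone, as described above.

    zone_outs, zone_hits: league totals for the zone (all fielders)
    position_outs: league outs in the zone made at the player's position
    player_outs: the player's outs (plus ROE) in the zone
    team_hits: hits in the zone while the player was at the position
    """
    out_rate = zone_outs / (zone_outs + zone_hits)   # about .57 for zone 56
    credit = player_outs * (1 - out_rate)            # each catch earns 1 - out_rate
    share = position_outs / zone_outs                # position's share of the zone's outs
    hits_charged = team_hits * share                 # hits the player is responsible for
    debit = hits_charged * out_rate                  # each hit costs out_rate
    return credit - debit, hits_charged

net, charged = zone_net_balls(1419, 1055, 294, 18, 79)
print(f"hits charged {charged:.1f}, net extra balls {net:+.2f}")
```

Unrounded, Bordick nets -1.71 extra balls in zone 56, matching the -1.7 above.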
Now we want to convert those "extra" balls into runs saved or cost. For that, we use the average run value of a hit in zone 56, which is .47 runs. Since a 2002 AL out is worth -.29 runs, the "swing" between an out and a hit is .47 plus .29, or .76 runs. Since Bordick caught 1.7 fewer balls in zone 56 than an average SS, he cost his team 1.7 times .76, or 1.3 runs, in that zone (i.e., his UZR runs in zone 56 is -1.3).
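The run conversion is a single multiplication. A sketch using the rounded figures quoted above (variable names are my own):

```python
net_extra_balls = -1.7   # Bordick's net "extra" caught balls in zone 56
hit_value = 0.47         # average run value of a hit in zone 56
out_value = -0.29        # run value of a 2002 AL out

swing = hit_value - out_value              # .47 + .29 = .76 runs
uzr_runs_zone56 = net_extra_balls * swing  # about -1.3 runs
print(f"swing {swing:.2f} runs, zone 56 UZR runs {uzr_runs_zone56:+.1f}")
```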
If we do this for every zone in which any SS made at least one out (i.e., the applicable SS zones), and we add up all the runs Bordick saved or cost in each zone, we get a total of +6.2 runs, or 6.2 runs saved by Bordick while playing SS (he must have done well in the other zones).
Including Errors
But wait! What about Bordick's ROE and non-ROE errors? Remember that I said that a fielder's ROE errors were thus far treated as outs. Bordick's 6.2 runs saved assumes that Bordick, and all other AL SSs, turned every ball they got to into an out.
Since that isn't true, we now have to factor in Bordick's ROE errors.
Here's how (this step is simple):
AL shortstops as a group committed 169 ROE errors on 5218 balls gotten to (outs plus ROEs) in all zones. That is an error rate of 169 divided by 5218, or .032. Since Bordick got to a total of 277 balls in all zones, an average SS would have committed .032 times 277, or 8.9 errors, on them. Bordick committed only 1, for a net gain of 7.9 errors avoided. Since an infield error is worth around .49 runs, the swing between an error and an out is .49 plus .29, or .78 runs. Therefore, Bordick saved another .78 times 7.9, or 6.2 runs, by virtue of his "good hands". So far, we have Bordick saving 6.2 runs with his range and another 6.2 runs with his sure hands.
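Both error adjustments (this ROE step and the non-ROE step that follows) use the same arithmetic: compare the errors an average SS would make on the balls the player got to with the errors he actually made, then multiply by a run value. A sketch with the figures from the text; the helper name is my own.

```python
def error_runs(league_errors, league_gotten_to, player_gotten_to, player_errors, run_value):
    """Runs saved (+) or cost (-) by a fielder's error rate versus the league's."""
    expected_errors = league_errors / league_gotten_to * player_gotten_to
    return (expected_errors - player_errors) * run_value

SS_GOTTEN_TO = 5218       # outs plus ROE by all 2002 AL shortstops
BORDICK_GOTTEN_TO = 277   # Bordick's outs plus ROE, all zones

# ROE errors: swing between an error (.49) and an out (-.29) is .78 runs
roe_runs = error_runs(169, SS_GOTTEN_TO, BORDICK_GOTTEN_TO, 1, 0.49 + 0.29)
# Non-ROE errors, valued at .3 runs each
non_roe_runs = error_runs(45, SS_GOTTEN_TO, BORDICK_GOTTEN_TO, 0, 0.3)

print(f"ROE: {roe_runs:+.1f} runs, non-ROE: {non_roe_runs:+.2f} runs")
```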
There is one final thing to consider: Bordick's non-ROE errors. Like the ROE errors, they are easy to account for.
AL shortstops as a group committed 45 non-ROE errors, and Bordick committed none. If we do the same calculation as above, using .3 as the value of a non-ROE error, we find that Bordick saved another .72 runs. So it looks like even at the ripe old age of 36, Mike Bordick saved his team a total of 13 runs last year by virtue of his outstanding play (range and hands) at SS!
UZR Runs and UZR Rates
As I said at the beginning, UZR runs is the heart of the UZR system, and that's really all you need to know. However, if you want to translate UZR runs into a UZR rate, it can be done, although it's a bit tricky as well. At first glance, it may seem that the simplest way to calculate a player's UZR rate from the above data is to add up all of his outs (not including the ROE errors) in all the zones and divide by his total chances. What is a "chance", though, in a UZR system that surveys all of the zones for every fielding position? For example, if in a certain zone all SSs combined field 3 balls out of 300, should all of those 300 balls be considered "chances"?
Actually, we were already halfway toward calculating Bordick's "chances" in zone 56 when we determined how many of the 79 hits he was responsible for (16.4). The number of "chances" for Bordick in zone 56 is 16.4 (his share of the hits) plus 18 (his outs), or 34.4. In other words, a player's "chances" in any particular zone are defined as that player's number of outs plus the number of hits he is responsible for. As you saw above, the number of hits a player is responsible for in any zone is the number of hits in that zone while he was playing, multiplied by his position's league-wide share of the outs in that zone.
If we add up all of Bordick's "chances" in each individual zone (in most zones, that will be zero, of course), we get his total "chances". If we then divide his total outs by his total "chances", we get a sort of "global" zone rating. Let's do the math. Bordick got to 277 balls (outs plus ROE errors); subtracting his 1 error leaves 276 outs. His total chances were 354. Therefore his ZR for all zones combined was 276 divided by 354, or .780.
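The chance definition and the "global" ZR, as a sketch with the numbers from the text (the helper name is my own):

```python
def zone_chances(player_outs, team_hits, position_outs, zone_outs):
    """Outs plus the hits charged via the position's league share of the zone's outs."""
    return player_outs + team_hits * position_outs / zone_outs

chances_56 = zone_chances(18, 79, 294, 1419)   # zone 56: about 34.4

outs = 277 - 1          # balls gotten to, minus his one ROE error
total_chances = 354     # his chances summed over all zones (from the text)
global_zr = outs / total_chances

print(f"zone 56 chances {chances_56:.1f}, 'global' ZR {global_zr:.3f}")
```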
Unfortunately, this is not his UZR rate (it does not correspond to his UZR runs), since it doesn't weight each zone appropriately. For example, if a player happened to get more than his share of "chances" in "high out" zones, he would likely have a higher-than-average "global" ZR without necessarily being a better-than-average fielder. Basically, the resulting "global" ZR is more like a simple ZR, which is not really what we are looking for.
Ultimately, the best way to construct a UZR rate that represents the true value of a fielder in comparison to the UZR rate of an average fielder, and that is the equivalent of UZR runs, is the following. We'll use Mike Bordick as the example again.
First we take the simple ZR for all SSs, counting ROE errors as outs. This is the total number of outs plus ROE errors by all SSs, divided by the total number of "chances" (outs plus ROE errors plus the number of hits a SS is responsible for). In 2002, AL SS outs plus ROE errors were 5218, and SS "chances" were 6786, for an average ZR of .769.
Next, we multiply that average ZR of .769 by Bordick's total "chances", which were 354. The result is 272. That is the number of outs plus ROE errors that an average SS would make given those 354 "chances". Bordick, on the other hand, got to (outs plus ROE errors) 8.1 more balls than the average SS (his net "extra" caught balls summed over all zones), for a total of 280.1 balls "gotten to" (272 plus 8.1). Of those 280.1 balls "gotten to", he committed only one error, for a total of 279.1 "outs". As you can see, those 280.1 balls "gotten to" and 279.1 "outs" are a "fiction", as Bordick actually got to 277 balls and recorded 276 outs.
Nevertheless, if we divide the 279.1 "fictional outs" by his 354 chances, we get a UZR rate for Bordick of .788, which should correspond almost exactly to his 13 total runs saved (UZR rate doesn't account for non-ROE errors [UZR runs does], but we could certainly "fudge it in" if we wanted to).
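The UZR-rate construction above, as a sketch using the figures from the text (variable names are my own):

```python
league_ss_gotten_to = 5218   # 2002 AL SS outs plus ROE errors
league_ss_chances = 6786     # outs plus ROE plus hits charged to SS
bordick_chances = 354
extra_balls = 8.1            # Bordick's net extra balls gotten to, all zones
bordick_roe_errors = 1

avg_zr = league_ss_gotten_to / league_ss_chances          # about .769
baseline = avg_zr * bordick_chances                       # about 272.2
fictional_outs = baseline + extra_balls - bordick_roe_errors
uzr_rate = fictional_outs / bordick_chances

print(f"avg ZR {avg_zr:.3f}, baseline {baseline:.1f}, UZR rate {uzr_rate:.3f}")
```

Carrying the unrounded baseline of 272.2 gives .789; the .788 in the text comes from rounding the baseline to 272 first.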
Here is the UZR data for the same NL and AL shortstops:
National League
American League
If you compare the above charts with those at the beginning of this article, you will see that STATS simple ZR correlates very well with UZR. This suggests that ZR is a pretty good measure of fielder ability, assuming that UZR is the "gold" standard.
On the other hand, RF does not seem to correlate well with either ZR or UZR, suggesting that it is not a good measure of fielding ability (assuming that ZR and UZR are). In fact, if you look closely at the above charts, you will see that a player's RF is almost entirely a function of his number of "chances".
Where do we go from here?
Earlier, I said that there are a number of contexts that have not been considered in the STATS UZR (or in my basic UZR). These are:
In my newly revised UZR, these contexts are considered. In part II, I will discuss these adjustments, incorporate them into the UZR calculations, and present the final UZR results. To see how much of an effect these adjustments have on a player?s UZR, you will be able to compare the initial basic results with the final adjusted ones.
To be continued...
About Baseball Think Factory | Write for Us | Copyright © 1996-2021 Baseball Think Factory
Reader Comments and Retorts
1. J. Lowenstein Apathy Club Posted: March 14, 2003 at 01:41 AM (#609440)
Fantastic stuff as always, MGL. Thanks. Looking forward to seeing the nuts and bolts. My primary question is: how is it possible to park-adjust the UZR numbers? Would it be possible to park-adjust this data using only the data of the visiting players?
My guess is that you would have to park-adjust on a zone-by-zone basis, which starts to look as if you might not have enough data, even over three years, to give even a reasonable amount of accuracy in the park factors.
I guess we'll find out in Part II.
Great article!
Again, great stuff MGL. I've been waiting for Tango to publish this. I do have a question about infielders (IF) getting credit only for ground balls (GBs) and not for fly balls (FBs).
Was this just to make the analysis easier? Did you find FBs to infielders insignificant compared to GBs? I would think there is some value in having a SS or 2B who is able to cover certain plays in shallow LF/CF/RF.
ROE = Reached base On Error
+8, -1, -1, -17, -13
Does Jeter have old man legs?
I constantly grapple with whether or not to use anything but GB's for IF'ers. I don't like using line drives to or thru the IF because even though there is some level of skill (positioning and reaction time) involved in catching a line drive, it is mostly luck.
Infield pop flies are almost all caught (96 or 97%, I think), so there doesn't seem to be much skill involved, and who catches which pop flies probably varies considerably from team to team.
If I use pop flies caught by an infielder in the OF, there is definitely skill involved there (some IF'ers are better than others at going back on pop flies and/or making those over-the-shoulder catches). I just don't think I can isolate and capture that skill, because there is too much of an interrelationship between the IF and the OF in catching pop flies in the OF. Basically, I eliminate pop flies that are caught or land in the IF zones completely, for better or for worse.
One more thing. I did not mention in the article (because I hadn't thought of it at the time) that there is one other defensive skill (besides OF arms and IF DPs) that UZR does not cover (and that I don't cover at all): the ability of the first baseman to catch (or not) bad throws from the infielders. For example, it has been suggested that J.T. Snow (who has not had a good UZR rating for several years, BTW) is so good at catching bad throws that he may save 5-15 errors or hits per year with that skill. That may be true, and I am working on seeing whether I can measure and quantify that skill to any reasonable degree...
When STATS came out with their UZR, I don't think they repudiated simple ZR, I think they were merely trying to refine it.
In the hierarchy of fielding metrics, and in comparison to batting metrics, I would say that RF is akin to BA (maybe worse), ZR to OPS (maybe a little worse) and UZR to linear weights (worse).
The best idea I have seen for a fielding metric without using PBP data (except for the initial research) was suggested, I think, by the person who wrote that multi-part series on fielding. The idea (at least for the IF) was to use the initial PBP data to determine what percentage of GBs each infield position fielded, on average, depending on whether a RH or LH pitcher is on the mound, and then to apply this to each team's data, using the G/F ratio of their pitching staff, the number of BIP outs, and the L/R composition of their pitching staff. This is sort of like a "poor man's" ZR. If this is a decent fielding metric (which I think it is), then surely ZR, which basically takes it one step further, is a better than decent metric.
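A minimal sketch of that "poor man's" ZR baseline. Everything here is an assumption for illustration: the league shares are invented, and converting the G/F ratio into a ground-ball fraction of BIP outs (GB / (GB + FB)) is my reading of the idea, not a documented procedure.

```python
# Hypothetical league shares of ground-ball outs fielded by each infield
# position, split by pitcher hand (invented; each row sums to 1).
GB_SHARE = {
    "R": {"1B": 0.12, "2B": 0.28, "3B": 0.20, "SS": 0.32, "P": 0.08},
    "L": {"1B": 0.10, "2B": 0.24, "3B": 0.26, "SS": 0.34, "P": 0.06},
}

def expected_gb_outs(bip_outs, gf_ratio, rhp_fraction, position):
    """GB outs a league-average fielder at `position` would make for this team."""
    gb_outs = bip_outs * gf_ratio / (1 + gf_ratio)   # assumes G/F applies to BIP outs
    by_hand = {"R": rhp_fraction, "L": 1 - rhp_fraction}
    return sum(gb_outs * frac * GB_SHARE[hand][position] for hand, frac in by_hand.items())

# Hypothetical team: 4300 BIP outs, G/F of 1.2, 70% of BIP outs with a RHP.
baseline = expected_gb_outs(4300, 1.2, 0.70, "SS")
actual = 780  # hypothetical SS ground-ball outs
print(f"expected {baseline:.0f}, actual {actual}, difference {actual - baseline:+.0f}")
```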
We can all certainly disagree on the quality of the various metrics (both offensive and defensive) and whether something is bad, decent, good, very good, etc., can be a matter of semantics as well (e.g., what does "decent" actually mean?), but to state or imply that STATS ZR is meaningless or worthless can't possibly be right...
Well, all those months and sleepless nights for naught! Anyway, I'm glad you caught the error, and I thank you (and the sabermetric community should thank you as well). This is one of the reasons why new and important metrics should not be proprietary. Peer review and open source are a must!
I will explain my error in part 2 of the UZR series and present revised UZR ratings for the SS's.
Thanks again! I'm glad that someone took the time to really scrutinize the math and the logic. As Tango likes to say, when you write an article laden with formulas and the like, everyone tends to skip over them and assume they are correct. Can't really blame anyone. Who's got the time to pore over someone else's work?
So, how do we find these things? Well, look at the out-conversion rates *for the league* for each zone, by inning/score/base/out situation. Pick out those game states where the out-conversion rate for the "down the line" zone is 50% higher than average. You've now identified those game situations where the 3B is more likely than not to cover the bag.
Now, how did the *SS* do during these game states? How did Jeter do relative to other SSs?
If the sample size is too small, change that "50%" to "30%" or something.
Look for the opposite as well. Look for those game states where the 3B would *not* cover the line. So, in this case, the 3B will shade towards the SS. How did Jeter and the other SS do in these game states?
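Tango's game-state check can be sketched as a filter plus a comparison. All counts, state labels, and function names below are hypothetical; the "high" cutoff follows his 50% rule, and treating "the opposite" as a rate at or below half the overall rate is my own assumption.

```python
# Hypothetical league counts (balls, outs) for the "down the line" zone.
LINE_ZONE = {"A": (100, 60), "B": (200, 20), "C": (700, 210)}

# Hypothetical SS-zone counts (balls, outs) in the same game states:
# all shortstops in the league, and one shortstop of interest.
SS_LEAGUE = {"A": (400, 220), "B": (800, 400), "C": (2800, 1470)}
SS_PLAYER = {"A": (20, 9), "B": (40, 21), "C": (140, 74)}

def flag_states(counts, high=1.5, low=0.5):
    """States whose out rate is well above (3B guards the line) or below the overall rate."""
    overall = sum(o for _, o in counts.values()) / sum(b for b, _ in counts.values())
    covers = [s for s, (b, o) in counts.items() if o / b >= high * overall]
    shades = [s for s, (b, o) in counts.items() if o / b <= low * overall]
    return covers, shades

def outs_vs_league(player, league, states):
    """Player's outs minus the league-rate expectation over the chosen states."""
    balls = sum(player[s][0] for s in states)
    outs = sum(player[s][1] for s in states)
    rate = sum(league[s][1] for s in states) / sum(league[s][0] for s in states)
    return outs - rate * balls

covers, shades = flag_states(LINE_ZONE)
print("3B likely guarding the line:", covers, f"{outs_vs_league(SS_PLAYER, SS_LEAGUE, covers):+.1f}")
print("3B likely shading toward SS:", shades, f"{outs_vs_league(SS_PLAYER, SS_LEAGUE, shades):+.1f}")
```

If the flagged states are too sparse, lowering `high` toward 1.3 mirrors Tango's "change that 50% to 30%" suggestion.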
I agree, thanks much to MGL for parsing and putting in the effort on the PBP database.
Let's say zone 56 is split so that 60% of the plays made are by the ss, and 40% by the 3b. Let's infer that 60% of the plays not made in that zone also belong to the SS. Let's say the league average was to make 80 outs out of 100 plays there.
So, if the Yankees had 100 plays there, 60 "belong" to Jeter and 40 to Ventura.
Let's say that Ventura shaded greatly in that zone. Let's say he made 42 outs, and Jeter made 38. According to the split above, Ventura made 42 outs in 40 "opps" and Jeter made 38 in 60.
The league average 3b makes 30 outs, and the lg average ss makes 50 outs. So, Ventura is +12, and Jeter is -12 outs per 100 balls in the 56 zone.
I would even say then that you don't have to worry about splitting the zones in 40/60. Just consider all 100 plays in that zone, and compare to the league average. This might be what Mike is talking about when he says NOT to split the shared zones.
Regardless of what the split in the shared zones is, just make the comparison to league average for that position/zone.
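Tango's point that the split does not matter can be checked directly by comparing each fielder to the league average at his position over all balls in the zone. A minimal sketch with his hypothetical numbers; the function name is my own.

```python
def net_outs(player_outs, league_pos_outs, league_zone_balls, team_zone_balls):
    """Outs above an average fielder at the position, using all balls in the zone."""
    league_rate = league_pos_outs / league_zone_balls   # position's outs per ball in the zone
    return player_outs - league_rate * team_zone_balls

# Per 100 balls in the zone, the average 3B makes 30 outs and the average SS 50;
# the Yankees see 100 balls there.
ventura = net_outs(42, 30, 100, 100)
jeter = net_outs(38, 50, 100, 100)
print(f"Ventura {ventura:+.0f}, Jeter {jeter:+.0f}")
```

The assumed 60/40 responsibility split never enters the calculation, so changing it changes neither result.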
We still have the possibility that Ventura is stealing plays from Jeter. Jeter should be stealing plays on his other side, but he doesn't (if I remember Mike's article on the subject).
Since the question being asked at the beginning of the article is "how would an average fielder do, given this player's playing conditions", I suppose we should consider not only the pitcher on the mound, the batter at the plate, and the runners on base, but also the fielders next to him. So, what would an average fielder do with Ventura next to him?
Yes, that is the correct (and easiest - funny how it occasionally works out that way) way to do it! I used to do it that way, but for some reason after cogitating for months, I decided to change my methodology and do it the other way, which turned out to be wrong. I should have listened to my 6th grade teacher who told me to always go with my first answer on the test - the more I think, the worse it gets.
As far as one fielder (Ventura) stealing chances from another fielder (Jeter), if you use Shortie's method (the correct one), that can never be a problem, unless one fielder can field a ground ball that another fielder can also field, which I agree is rare and probably inconsequential.
As far as pitchers snagging balls up the middle, I had never thought of that (more thanks to Shortie). Off the top of my head, I'm not sure how that affects things. Let's see...
Those snags by the pitcher would be credited to zones 13 and 15 and occasionally zones 6MS and 4MS, I guess (I would have to check the database). Let's first see what happens if they are credited to zones 13 and 15...
Let's say an average team has 100 balls hit into the 6M zones and the SS fields 20 of them. If on one team (the Yankees) the pitchers snag a lot more than their fair share, whether it helps or hurts the SS (Jeter) depends on what percentage of those balls would normally go through for hits and what percentage would normally get caught by the SS. If 5 extra balls are snagged by the pitcher (out of those 100), and 4 would have gone through and 1 would have been caught, we now have 95 balls in the 6M zones with the SS (Jeter) catching 19, which is exactly the same rate as 20 out of 100. It seems likely (I think) that for every extra ball snagged by the pitcher, 80% of the time it would have gone through and 20% of the time it would have been caught by the SS, so it shouldn't affect the SS's UZR at all.
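The 6M arithmetic, and the caveat raised next about the pitcher taking the most catchable balls, can be checked in a few lines. A sketch with the hypothetical counts above; the 3-of-5 case is my own illustration.

```python
def ss_zone_rate(balls, ss_outs):
    """Share of the zone's balls that the SS turns into outs."""
    return ss_outs / balls

baseline = ss_zone_rate(100, 20)

# The pitcher snags 5 extra balls. If 4 would have been hits and 1 an SS out
# (the same 80/20 mix as the zone overall), the SS rate is unchanged.
proportional = ss_zone_rate(100 - 5, 20 - 1)

# If instead 3 of the 5 would have been SS outs (the most catchable balls),
# the SS rate drops even though he did nothing differently.
skewed = ss_zone_rate(100 - 5, 20 - 3)

print(f"{baseline:.3f} {proportional:.3f} {skewed:.3f}")
```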
Now what if those extra pitcher snags are in zones 6MS and 4MS? Now I think we have that rare situation where a fielder is in fact "stealing" balls from another fielder (the other fielder might have caught them) AND causing that fielder to have an unfair drop in his UZR rating. (In the previous case, where the snagged balls were in zones 13 and 15, it didn't affect the SS's rating, I don't think, unless of course the ones that the pitchers snag are the most catchable balls for the SS, which could be the case, I guess.) Anyway, let's see what happens with extra balls snagged by the pitcher in zone 6MS...
Now, even if the pitchers snag extra balls, we still have those balls reaching a SS zone (I guess it is a zone shared with the pitcher). So let's say that for an average team we have 20 balls hit into the 6MS zone, the SS makes 10 of those plays, and the pitcher makes 5 (the other 5 stop short in that zone for hits). The SS ZR for that zone is then .500. If the pitcher now makes 6 and the SS only 9 (or 9.25, with 4.75 hits, if the pitcher also took away 1/4 of a hit), then I guess the SS's (Jeter's) rating gets unfairly reduced (to .450 or .4625) in that zone. This is the classic case of a fielder "stealing" a play from another fielder, which we think is a rare occurrence. One way to check that, of course, is to look at the database and see what's going on in zones 6MS (and 4MS).
I guess the bottom line is to identify zones in the field where it is possible for more than one fielder to field the same ball (like 4MS and 6MS). If one fielder is above (or below) average in that zone, should we automatically adjust the other fielder's rating? What if they are both above or below average? Is this prevalent in the infield?
What about in the outfield? This probably goes on much more often in the OF. Should automatic adjustments be made in the OF? If yes, what is the formula (methodology)? DMB says that they do this (adjust one fielder's rating due to an adjacent fielder's rating), but of course, they don't explain HOW they do this...
Let's say we COULD establish responsibility, and that of those 100 plays in the hole, 40 belong to 3b and 60 to SS. I also said that the lg avg out-conversion rate (ZR) was 30/40 for 3b, and 50/60 for SS.
There's no reason to think that their ZR in the same zone would be the same. After all, the 3b is playing in, but he's got the more natural throw. I don't even know if this particular zone would be an advantage for the 3b or ss.
But it doesn't matter! Ventura made 42 plays there, and the avg 3b made 30. That's +12. It doesn't matter if the split was 60/40 or 70/30 or 50/50. All we know is that the avg 3b made a certain number of outs per total plays there. And it becomes irrelevant what we think the shared split of BIP might/should be. Unless we know, don't consider it.
You can figure out the extent of "shared responsibilities" impact by looking at situations as I described with my game state example. You can also do it by looking at players that get a "shift" from fielders, like Bonds. There's no "shared" impact between the SS and 3B in this case.
NL
zone 15: 931 balls, 820 outs, 802 by pitcher, 11 by third baseman, and 3 by SS.
zone 13: 994 balls, 912 outs, 900 by pitcher, 8 by first baseman, and 2 by 2B'man.
zone 6MS: 279 balls, 253 outs, 8 by pitcher, 232 by SS.
zone 4MS: 160 balls, 133 outs, 4 by pitcher, and 102 by 2B'man.
Similar results for the AL. I don't think there should be much, if any, of a problem with the pitcher "stealing" outs from either the second baseman or the SS AND unfairly affecting their UZR, at least according to the way the zones are set up...
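The NL counts listed above can be turned into shares with a few lines. A sketch that uses only those numbers (the names are my own):

```python
# NL counts from the list above: zone -> (balls, outs, outs by pitcher)
NL_ZONES = {
    "15": (931, 820, 802),
    "13": (994, 912, 900),
    "6MS": (279, 253, 8),
    "4MS": (160, 133, 4),
}

def pitcher_share(zone):
    """Fraction of a zone's outs recorded by the pitcher."""
    _, outs, pitcher_outs = NL_ZONES[zone]
    return pitcher_outs / outs

for zone in NL_ZONES:
    print(f"zone {zone}: pitcher records {pitcher_share(zone):.1%} of the outs")
```

Pitchers record nearly all the outs in zones 13 and 15 but only about 3% in 6MS and 4MS, which supports the conclusion that they rarely take plays from the middle infielders.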
You should be able to go through the PBP and look at the BIP in each zone BY BATTER. That will tell you what kind of shift a batter should get. Look at those batters who are skewed to one way or the other, or who forces the CF to play more to the RF but leaves the LF in place. You should be able to see how much overlap really exists.
I don't think there should be much overlap with infielders, even for Jones. This again is easy enough to figure out by doing just what MGL did for the stealing pitchers.
That was an interesting suggestion that "Shortie" made on BPrimer: by eliminating an extra step, you can still get your net outs calculation. But I am surprised, mgl, that you gave in as quickly as you did! His formula seems clearer and simpler, but I don't know if it's necessarily better. The step being eliminated is, essentially, the prorating of average fielders in the gray zones in which more than one position is active, to figure out who's responsible for what hits (and chances). By eliminating it, you are eliminating an extra control in measuring the subject against "league average", which is the whole basis of UZR.
As I understand it, the suggestion was to take Bordick's outs/opp. (or "chances") in zone 56, subtract the league SS outs/lg opp. in that zone, and multiply the difference by Bordick's opp., where "opp" is defined as *all* hits in the zone plus *SS* outs in the zone. So Bordick's calculation would be as follows:
(18/(79+18) - 294/(1055+294)) * (79+18) = -3.1 net outs, which is more negative than the -1.7 you came up with.
Technically this equation is sound, and the zero baseline is maintained (i.e., the net outs sum to zero). But theoretically, by eliminating the SS and 3B "share" of hits allowed via the prorating method you used, you are no longer comparing Bordick to league averages. Instead of comparing Bordick to the league average SS with the league average 3B held constant, which you did originally, you would now be comparing him to the league average SS AND to the ability level of his own third baseman, I think, which is really what you don't want.
The biggest consideration in accounting for gray zones where more than one position is active is to try to minimize the influence of the ability levels of the adjacent fielders involved. For example, let's say Tony Batista is a better third baseman and fielded two balls that Bordick would normally have fielded. So instead of making 18 outs in zone 56, Bordick makes 16. League SS then fielded 292 balls instead of 294, league 3B fielded two more (so the zone's total outs stay at 1419), and hits allowed in that zone remain the same at 1055.
The calculation for Bordick becomes:
(16/(79+16) - 292/(1055+292)) * (79+16) = -4.6 net outs (from -3.1), whereas if you allow for the prorating of hits allowed, it's -2.5 (from -1.7). It still goes to a larger negative number because Batista increased the league average for 3B, but not by as much as with the simpler calculation. I also note there is a greater range in net outs (and therefore a larger std. dev.) using the simpler formula.
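Both calculations can be reproduced side by side. A sketch that follows the two formulas as written in this comment (function names are my own); the "Batista" row moves two zone-56 outs from the SS to the 3B, leaving the zone's total outs at 1419.

```python
def prorated_net_outs(player_outs, team_hits, pos_outs, zone_outs, zone_hits):
    """Article's method: charge hits by the position's share of the zone's outs."""
    out_rate = zone_outs / (zone_outs + zone_hits)
    hits_charged = team_hits * pos_outs / zone_outs
    return player_outs * (1 - out_rate) - hits_charged * out_rate

def direct_net_outs(player_outs, team_hits, pos_outs, zone_hits):
    """Formula as written in this comment: opportunities = all hits + position outs."""
    opps = team_hits + player_outs
    league_rate = pos_outs / (zone_hits + pos_outs)
    return (player_outs / opps - league_rate) * opps

ZONE_OUTS, ZONE_HITS, TEAM_HITS = 1419, 1055, 79   # zone 56, 2002 AL
scenarios = {"as observed": (18, 294), "Batista takes 2": (16, 292)}

for name, (bordick_outs, ss_outs) in scenarios.items():
    pr = prorated_net_outs(bordick_outs, TEAM_HITS, ss_outs, ZONE_OUTS, ZONE_HITS)
    dr = direct_net_outs(bordick_outs, TEAM_HITS, ss_outs, ZONE_HITS)
    print(f"{name}: prorated {pr:+.1f}, direct {dr:+.1f}")
```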
So the problem is that Bordick's net outs would be prejudiced by a better-fielding third baseman, and even more so if there is also a better-fielding second baseman. That effect is minimized by using the extra control. Now, the simpler formula would work well enough if we were measuring "value", but since UZR is an "ability" measure (at least I think it is), we should context-neutralize the averages as much as possible. Also, it may not be that big a problem with infielders, because the fielder closer to the ground ball is going to make the play anyway (and "compete" with the adjacent fielder). It may, however, be an issue for outfielders where the zones converge toward CF, where outfielders do tend to compete.
Prorating is an artificial way of allocating responsiblity of hits and may not be a true measure because it assumes the lg average 3b and lg average SS have the same ZR in that zone, (which probably isn't true), but when estimating the control it still may be better than no control at all because you will be estimating on the side of conservatism.
Anyway, first of all, I am almost positive that the two-step method adds up properly to coincide with a team UZR just as well as the 1-step process. In fact, it is specifically designed to make sure that the net number of balls in any zone for all fielders sums to zero. So I don't think that is a problem.
As far as value versus ability, I don't think that is an issue either as far as the 2 methods are concerned. Value and ability are almost the same anyway. The differences are that ability must be "value regressed" (to account for sample error) and that, as David suggested, technically ability should take each fielder's ZR in each zone and apply league average distributions, rather than the other way around, although as Tango and I pointed out, there are some practical problems in doing that. In any case, I don't think the tension between the 2 methods has much if anyhting to do with value versus ability.
I am also not convinced that the tension between the 2 methods is solely about whether fielder's "overlap" (can steal plays from one another) or not. If it is, then I think that it is clear that your method is best in the infield and that the other method could be better for the outfield.
As far as checking if there is much out stealing in the OF, you can't realy verify it, like I could with the pitcher/SS and pitcher /2B stuff. We just "know" that there isn't much stealing in the IF even though there are plenty of shared zones (like 56 and 34). As you said, if there were too much stealing in the IF, it would probably behoove one infielder to move over.
In the OF, OTOH, we "know" that there are lots of lazy fly balls, or even moderately difficult fly balls or line drives, whereby one fielder can call off another fielder. You can't really verify the extent of this in the OF by looking at the data. All you will see are shared zones like 78 and 89, which will look like shared zones in the IF (56 and 34). In the IF we assume that in most of those shared zones, the fielders do not steal ground balls from one another, and that if those shared zones were broken down further, much of the sharing would disappear.
In the OF, however, we don't think that even if we broke down the OF shared zones, like 78 and 89, we would see most of the sharing disappear. We assume that there is a lot of out stealing, or at least out "calling", especially on lazy fly balls. How to account for that is another story. The suggestion that we perhaps separate the fly balls from the line drives, at least in the shared zones, is a good one. Or we could ignore or minimize the zones where most of the balls are caught. For example, if 98% of the balls in a shared zone are caught, should we be calculating a ZR for each fielder in that zone? Probably not, since whether one fielder catches 60% and the other catches 38%, or vice versa, probably has not much to do with their fielding abilities in that zone, and more to do with who calls off whom, kind of like pop flies in the IF. In fact, the more I think of it, perhaps certain zones and types of balls should be eliminated from the analysis besides infield pop-ups and line drives. Perhaps any zone that has a high out rate should be eliminated altogether?
Anyway, this is an interesting topic, not to mention the fact that it is driving me crazy. I will solve it, though, perhaps through my computer defensive sim work. I appreciate the help...
...consider all 78 zones for every fielder. Determine the out-conversion rate by POSITION for each zone. Take those out-conversion rates and apply them to the opps while Jeter was on the field, for each of the 78 zones.
As for value or ability, unless we define these words, we don't know what we are arguing over. Ability is the manifestation of a player's tools in a context-neutral game. Value is the actual manifestation of a player's tools in a game, after considering context. Sammy Sosa's 3-for-3, 3 HR, 9 RBI game has a lot of value but, by itself, has little impact on determining his ability. Value is about past accountability, and ability is about future potential. So, as MGL is doing UZR here, he's measuring value. To measure ability, he'd first have to apply a player's out-conversion rate against the league-average number of opps in each zone. Then he'd have to regress that total to some degree. He's not doing the first part, and he's certainly not doing the second. This UZR is a value measure.
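The last idea, dropping any zone with a very high out rate before rating fielders, can be sketched as a simple filter. A minimal sketch; the 0.95 cutoff, the function name, and the zone counts in the usage are all hypothetical (only the "98% caught" scenario comes from the paragraph above):

```python
def drop_high_out_rate_zones(zone_totals, cutoff=0.95):
    """Return only the zones whose league out rate is below `cutoff`.

    zone_totals maps a zone label to (league_outs, league_bip).
    Zones with no balls in play are dropped as well.
    """
    kept = {}
    for zone, (outs, bip) in zone_totals.items():
        if bip > 0 and outs / bip < cutoff:
            kept[zone] = (outs, bip)
    return kept


league = {
    "78": (980, 1000),  # hypothetical shared OF zone where 98% are caught
    "56": (600, 1000),  # hypothetical IF hole zone
}
print(list(drop_high_out_rate_zones(league)))  # ['56']
```

Where to put the cutoff is an open question; a softer alternative would be to down-weight such zones rather than remove them.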
And anytime you do a value measure, you must ensure that everything adds up. It's about past accountability. (Whether you decide to include a luck bin is a topic for another discussion).
The bias in sharing zones in the OF will always be toward the position, not the player. And if all CFs have the same bias, well then, don't worry about it. In the IF, the bias is based on individual positioning. But if there's less stealing in the IF (a number we should try to quantify, though a good guess is that it's small enough to ignore), then again, let's not worry about it.
As for errors, as Mike pointed out in Article 8, the error rate in the "easy zones" is not as much higher than in the "hard zones" as you'd first figure (you might figure that errors come mostly on plays in the easy zones). So you might not need to treat the errors separately, and can just treat them as hits.
I know that we are trying to establish responsibility for plays that are ambiguous (non-outs, non-RBOE), but this leads us into more trouble than it is (currently) worth. Taking a step back will lead us 2 steps forward. I think we should apply the KISS principle here.
If the 3B is taking more of the plays in the shared zone than normal, the chances are that the plays left for the SS in that zone will be harder plays than normal. There are two reasons for this:
1. Positioning. As I suggested earlier, if the 3B is playing to get to more balls in the hole, then it's likely that the SS is cheating up the middle, and will have to take an extra step or two just to get to the ball in the hole in the first place.
2. The effect of the 3B "flashing in front of" the SS. If the 3B gets in front of the SS but doesn't make the play, he's probably going to affect the SS's field of vision, causing a hesitation. This isn't likely to have much of an effect when the players are playing normally, because almost all balls deep in the hole that the 3B doesn't get are hits anyway. But with extra 3B range, there will be some plays closer to the SS normal position that the movement of the 3B could affect.
These particular issues are (IMO) pretty unique to the 3B/SS combo. Theoretically they could affect the 1B/2B, but because the 1B usually needs to play pretty close to the base I think it's less likely that he'll range far over toward the 1B/2B hole.
-- MWE
"Definitely, I think last year was my best year... You can't go by number of errors, but I know I got to more balls that I didn't get to in the past. I felt my range was better and all-around agility was better by far."
--Derek Jeter, Yankees shortstop, on his defense in 2002 (Sports Illustrated Online)
As David says, if all CFs take 75% (or whatever percentage) of all fly balls that could be caught by either of 2 fielders, we have no problem, of course. But we know that this will vary by team, depending on who is considered the surer-handed fielder of the two or who is the "designated captain", notwithstanding the fact that the CF is usually the "captain" by default. As well, there is a lot of fluctuation from player to player (the measurement error in those shared OF zones will be greater) in terms of whether those either/or fly balls were closer to the CF or to the other fielder. Again, this is not a huge problem, but it is a problem nonetheless.
One way to address the problem is to see whether the overall out % in a shared zone is greater or less than the league average. If it is around league average, and one fielder is below average while the other is above average, then I think that suggests that one fielder is indeed stealing outs from the other. If the overall rate is above average and one fielder is above average, then I think that suggests that the above-average fielder is NOT stealing from the other fielder.
Tango, also don't forget that while you say that UZR is strictly a value stat, once I do the adjustments it becomes more ability-based, or at least context-neutral value-based. The reason I don't like value stats (without at least some kind of context adjustment) is that no matter what people say about wanting to know the actual "value" of a player, and no matter how much "cleaner" a value stat is, the discussion always ends up being about ability (who is better or worse than whom, etc.)...
Ability = expected manifestation of a player's tools in a neutral context
So, if you strip out the context to neutralize it (by adjusting for opponent, park, etc.), you have to decide what to do with luck. Say Shane Spencer's 1998 September had a lot of (probable) luck, or Pat Tabler's bases-loaded performance had a lot of (probable) luck. You can decide to keep the luck portion with the player that "caused" it or "benefited" from it. In any case, that package is value.
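The diagnostic just described can be written as a small decision rule for a two-fielder zone. A minimal sketch, assuming per-fielder out rates are measured against the team's balls in play in that zone; the `tol` band that counts as "around league average", the function name, and the CF/LF counts in the usage are all hypothetical:

```python
def diagnose_shared_zone(outs_a, outs_b, bip, lg_outs_a, lg_outs_b, lg_bip, tol=0.01):
    """Rough read on a two-fielder shared zone (e.g. zone 78 or 89).

    outs_a, outs_b: outs by our two fielders in the zone; bip: our balls in play there.
    lg_*: the same counts for the league at those two positions.
    tol: hypothetical band (in out rate) treated as "around league average".
    """
    rate_a, rate_b = outs_a / bip, outs_b / bip
    lg_rate_a, lg_rate_b = lg_outs_a / lg_bip, lg_outs_b / lg_bip
    overall, lg_overall = rate_a + rate_b, lg_rate_a + lg_rate_b
    a_above, b_above = rate_a > lg_rate_a, rate_b > lg_rate_b

    if abs(overall - lg_overall) <= tol and a_above != b_above:
        return "stealing"      # one fielder's extra outs are the other's missing outs
    if overall > lg_overall + tol and (a_above or b_above):
        return "not stealing"  # the above-average fielder is adding outs to the zone
    return "inconclusive"


# Hypothetical CF/LF zone: league CF 600 and LF 300 outs on 1000 BIP.
print(diagnose_shared_zone(70, 20, 100, 600, 300, 1000))  # stealing
print(diagnose_shared_zone(68, 28, 100, 600, 300, 1000))  # not stealing
```

With one season of data the shared-zone samples are small, so in practice the band would probably need to reflect sampling error rather than a fixed 0.01.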
You can have a guy with lots of ability, say Mickey Mantle. But if he doesn't perform to his level for whatever reason (luck, design of being drunk or being hurt), then you can have a player whose expected ability would not match his actual value.
(I don't have a good example in baseball... Mick wasn't a good example, but in hockey, it would be Stephane Richer.)
MGL, based on how you are doing it, and unless you define your words otherwise, this version of UZR is describing value.
- for each zone, determine the run value of the hit and out (let's assume it's a static .80 run difference, but we know that's not true)
- for each fielder/zone, determine the number of outs made and BIP for each fielder and for the league
- multiply the lg out-conversion rate by the BIP for the fielder in question, and take the difference between that and his actual outs
- multiply the difference by the .80 value (or the dynamic value for that zone)
That's it. That'll answer this question: "Given the ball distribution and context of Derek Jeter, how many runs better/worse is he compared to a lg average SS?"
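The four steps above fit in a few lines. A minimal sketch, assuming per-zone (outs, BIP) counts are already tallied for the fielder and for the league at his position; the function name and the zone counts in the usage are hypothetical, the static 0.80 is the placeholder from the list, and a per-zone dict stands in for the "dynamic value". Positive results mean more outs (fewer runs) than a league-average fielder:

```python
def zone_runs(player_zones, league_zones, run_value=0.80):
    """Runs above (+) or below (-) a league-average fielder at the same position.

    player_zones: zone -> (outs, bip) while the fielder was on the field.
    league_zones: zone -> (outs, bip) for the league at that position.
    run_value: hit-minus-out run difference; a float applies one static
               value to every zone, a dict gives a per-zone value.
    """
    total = 0.0
    for zone, (outs, bip) in player_zones.items():
        lg_outs, lg_bip = league_zones[zone]
        expected_outs = lg_outs / lg_bip * bip  # average fielder, same BIP
        value = run_value[zone] if isinstance(run_value, dict) else run_value
        total += (outs - expected_outs) * value
    return total


# Hypothetical two-zone example for a shortstop.
player = {"6": (50, 100), "56": (20, 100)}
league = {"6": (550, 1000), "56": (200, 1000)}
print(round(zone_runs(player, league), 1))  # -4.0
```

Here the fielder makes 5 fewer outs than expected in zone 6 and exactly the expected number in zone 56, so the static version gives 5 x 0.80 = 4 runs below average.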
This assumes that sharing is not an issue.
*********
The other thing we are trying to do is figure out how to "share" the zone. You can:
a - share all BIP and split them based on the outs made by each fielder in that zone
b - share all NON-OUT and split them based on the outs made by each fielder in that zone
c - share all non-out, and non-RBOE, and split them based on the outs and RBOE made by each fielder in that zone
For b, you add a fielder's outs made to determine "balls responsible for".
For c, you add a fielder's outs made and RBOE to determine "balls responsible for".
Each of these 3 options has certain validity. You can even compute them each way if you like. My guess is that you will find little difference overall. If that is the case, take option a. It's the easiest one, and it's clean. In all cases, you can make the sum of the parts equal the whole.
Essentially, DER = fielding (or UZR) + pitching (or PZR), and we can figure this out using the zones. Check the thread for a longer explanation.
If the Yankees were to acquire Tejada, since he is probably a much better SS (defensively), I think it would be correct to move Jeter to third, assuming that he was willing and could play third. The assumption, of course, is that if you move a bad SS to third, he will perform better at third relative to the other third basemen. That's a general rule, and since third and SS require different skills (quick reactions and better hands at third, more speed and range at SS), I suppose it is entirely possible that a SS (good or bad) could not play an adequate third base (and vice versa). For example, I doubt that one of the small, wiry good SSs, like Ordonez or Sanchez, could play a good third base (I'm not sure why, but you rarely see a small, wiry player at third, though that may be because they are not good enough hitters).
In any case, since both Jeter and Tejada are bif strong guys, there is no reason to think that one or the other would be better suited for third base. In the absence of any information in that regard, you clearly want to move the worse SS over to third, don't you? There is almost no doubt that, no matter how you look at it, Jeter is one of the worst defensive shortstops in baseball.
In fact, there has to be a point at which you move a good-hitting SS to another position, just like you would move an old SS or any other old fielder to first or to DH. What that point is I don't know off the top of my head. As Tango once explained, you would have to go through all the combinations on your team, look at each player's offensive and defensive ratings, and see which combination gives you the best overall runs scored and runs allowed (also keeping in mind that a run saved is slightly better than a run earned). If you have Jeter and Tejada on your team, whom you put at SS should be a no-brainer, I would think, unless, as I said, one or the other shows some great talent or complete inadequacy at third for some reason. For example, Jeter is probably worth around -15 runs or so per year at SS and Tejada is worth around 0. So if you assume that Jeter would be worth, say, -5 at third, and Tejada +5 at third, you would prefer Tejada over Jeter at SS (to the tune of 5 extra runs).
On the other hand, if the 15-run difference between the 2 players shows up at third as well (such as if Jeter is -5 at third and Tejada is +10, or Jeter is -10 at third and Tejada is +5), then I guess it's a wash where you put them. When I say that the decision is a no-brainer, I am assuming that this is NOT the case - that the spread between a good and bad third baseman is not the same as the spread between a good and bad SS, which is a good assumption, I think (because the SS gets more chances and because range is a big factor at SS and not so big at third).
Getting back to the point at which you move your good-hitting, bad-fielding SS to another position, Jeter has to be somewhere close to that threshold (I am assuming that the worst SS in the league has to be somewhere close). When the Yankees had Tino at first, he was not a very good hitter (-8 batting lwts in '00, I think). If they moved Jeter to first, got an average SS, and benched Tino, they would have lost, say, 15 runs on defense but picked up around 35 runs on offense at first, for a net gain of 20 runs at first. If they put in an average SS, they would have picked up, say, 15 runs on defense, and lost 40 runs on offense, for a net loss of 25 runs at SS. So while this switch may be a 5-run loser, it is close enough to a wash (if you get a better-than-average SS, or Jeter is pretty good at first, etc., it might be a gain) to be worth considering.
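Tango's "go through all the combinations" approach can be sketched for the defensive half only. A minimal sketch, assuming defense-only runs and using MGL's illustrative numbers from the paragraph above (Jeter -15 at SS and -5 at third, Tejada 0 at SS and +5 at third); a full version would add each player's offensive runs too, and the function name `best_alignment` is hypothetical:

```python
from itertools import permutations


def best_alignment(ratings, positions):
    """Try every assignment of players to positions; return the best total.

    ratings: player -> {position: defensive runs per year}.
    Ties keep the first assignment found.
    """
    best = None
    for players in permutations(ratings, len(positions)):
        total = sum(ratings[p][pos] for p, pos in zip(players, positions))
        if best is None or total > best[0]:
            best = (total, dict(zip(positions, players)))
    return best


ratings = {
    "Jeter": {"SS": -15, "3B": -5},
    "Tejada": {"SS": 0, "3B": 5},
}
print(best_alignment(ratings, ["SS", "3B"]))
# (-5, {'SS': 'Tejada', '3B': 'Jeter'})
```

Jeter at SS with Tejada at third totals -10, so the swap is worth the 5 runs MGL mentions. If Tejada were +10 at third instead, both alignments total -5, which is the "wash" case discussed next.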
There are of course, at least 3 reasons why Jeter would not be moved from SS, at least at this point in his career: 1) He might be greatly offended, 2) He is not perceived by the Yankees (I don't think) as nearly as bad on defense as he probably is, and 3) It would probably be a bad public relations move (what would the Yankees say - "We like Jeter's offense, but he just stinks at defense, so we had to move him...").
OTOH, if the Yankees did acquire someone like Tejada or A-Rod (I don't think they would), and they didn't trade Jeter, then I think they could come up with a diplomatic way to move him to third or somewhere else (DH?)...
And Jeter and Tejada are "big" strong guys, not "bif" strong guys. Bif is the guy in "West Side Story", isn't he?
And what the heck is going on with everything being italicized? Is it just my browser (older version of Netscape)? I thought that Tango was just trying to indicate how important his posts were compared to everyone else's! :)
First, David makes a correct observation wrt the "taking of discretionary plays" by CFs. Except Andruw Jones takes *all* of them. That's very important. When the CF is not just the CF but the "defensive rep", he gets far more than 75% of the discretionary plays - he takes 90%. And when pop-ups are included to take away from the MIs, then stats get padded in a hurry. Hell, I forget whether MGL is removing pop-ups from the OF. If not, you have a serious inequity between a team with a rookie SS (Furcal) and a GG CF like Jones vs., say, the Mets with GG Ordonez and a random CF. Isolating the dominant-personality CFs isn't necessarily easy, but AJones is one, and that can be demonstrated in MGL's data wrt the consumption of FBs shorter than 220 feet.
I used to score for STATS, so I have a good understanding of the raw data. In addition, I have a book of the ZR zones for every ballpark, and I've looked at the overlay differences between STATS and Retrosheet (which is Project Scoresheet's diagram, of course).
As Shorty *perfectly* observes, what if the zones were smaller? Well, Shorty, I haven't actually seen the data MGL has, but if it's STATS data, then it is probably broken down into smaller zones. He may have contractual obligations not to use the data "just like" STATS would.
Here's the basic breakdown: Zone 56 represents 10 ZR sub-zones for the 3b and 5 for the SS. There are another 5 zones that are designated "no man's land" in ZR. One of the bigger differences between Defensive Average (please google rsb for Dale Stephenson's DA/DR work - not that he did the work, but that he has the data at his fingertips).
You can see that 57% of plays are made (1419/2474). In reg. ZR 75% are the "responsibility" of either the SS or the 3B - usually, each defender misses ~10% of his zone opps, which would make up the difference between 57% and 75% (9% for 3b and 9% for SS). Of the 1055 hits, ~600 of them are in no-man's land.
What this suggests is - *no*, other than *possible* positioning (which I disagree with Mike about - see no groupthink), the 3B's defense does not impact the SS defensive play making. The STATS zones of responsibility are assigned because those are the areas where 50% of plays are made. There's another 10 feet between the 3B zone and the SS zone - in ZR. Basically, no, Brosius and Ventura do not damage Jeter's ZR. He's not very good. But he isn't as bad as Clay says either.
So, no, the Braves SS couldn't impact Chipper's rating. The SS can't play balls the 3B is responsible for. It's too far away: the SS couldn't be in front of the 3B, and the SS couldn't throw anyone out from over there and that deep anyway. Chipper did okay in ZR. He fared poorly in some systems because his team was a GB staff, but didn't throw GBs to 3B. That sounds screwy, but Chipper didn't have GBs hit into the ZR zone of assignment. It was dramatically depressed. Much more so than Jeter's numbers (which aren't being impacted by the 3B either). Put it this way - if Chipper fielded *every* ball hit into the 5 and half of the 56 zone you are looking at and got the runner every time (100% efficiency), his Range Factor would still be lower than Aramis Ramirez' - the Pirates just threw more GBs to 3B. And systems that rely on "expected chances" fall in the dumper right here. That happens to Jeter to a lesser extent, *and* favors Andruw Jones in a similar way.
Our team has a SS named Derekson, and a 3B named Robinson. Derekson recorded 40 outs and made 10 errors, while Robinson recorded 40 outs and made 0 errors.
That's all we know. What do we do?
Method 1A
Derekson is 40-50 = 10 outs worse than average (or 8 runs worse) for his position in that zone, and Robinson is 10 outs better (or 8 runs better).
Method 1B
Of the 200 BIP, the SS gets 5/8 responsibility, or 125 plays. The 3B gets 75 plays. The avg SS made 50 outs and 75 safe plays, for a LWT run value of +22.5 runs. Derekson made 40 outs and 85 safe plays, for a LWT run value of +30.5 runs. That's 8 runs worse than average. Robinson works out to 8 runs better than average.
As you can see, Methods 1A and 1B are the same thing. 1A is much more straightforward, and you don't have to consider the "sharing" of zones.
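Methods 1A and 1B can be checked side by side. A minimal sketch; the league-average figures (avg SS 50 outs, avg 3B 30 outs on 200 BIP) and the run values of -0.30 per out and +0.50 per safe play are inferred from the worked numbers (the 5/8 split and "50 outs and 75 safe plays = +22.5"), not stated in the excerpt, and the function names are hypothetical. Positive results mean more runs allowed than average:

```python
OUT_RUNS = -0.30   # inferred: 50 outs and 75 safe plays give +22.5 runs
SAFE_RUNS = 0.50   # so a safe play costs 0.80 runs more than an out
BIP = 200
LG_OUTS = {"SS": 50, "3B": 30}    # league-average outs in the zone (inferred)
TEAM_OUTS = {"SS": 40, "3B": 40}  # Derekson, Robinson


def lwt(outs, safe):
    """Linear-weights runs allowed for a mix of outs and safe plays."""
    return outs * OUT_RUNS + safe * SAFE_RUNS


def method_1a(pos):
    """Outs short of average times the out/safe run difference."""
    return (LG_OUTS[pos] - TEAM_OUTS[pos]) * (SAFE_RUNS - OUT_RUNS)


def method_1b(pos):
    """Share all BIP by league outs (5/8 SS, 3/8 3B), then compare LWT runs."""
    plays = BIP * LG_OUTS[pos] / sum(LG_OUTS.values())
    average = lwt(LG_OUTS[pos], plays - LG_OUTS[pos])
    player = lwt(TEAM_OUTS[pos], plays - TEAM_OUTS[pos])
    return player - average


for pos in ("SS", "3B"):
    print(pos, round(method_1a(pos), 1), round(method_1b(pos), 1))
```

Both methods give +8 for the SS and -8 for the 3B. Because 1B credits the player and the average fielder with the same number of plays, their safe-play counts differ by exactly the outs difference, so 1B always collapses to 1A's outs gap times 0.80.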
Method 2
Of the 120 non-outs, we give 5/8 to the SS as his responsibility, or 75 safe plays. The 3B gets 45. Derekson was responsible for 40+75=115 plays, and Robinson for 85.
The average SS has, as you remember, +22.5 LWT runs. Derekson is +25.5 runs, or 3 runs worse than average. Robinson was 3 runs better.
Method 3
Of the 110 non-outs, non-RBOE, we give 56/90 to Derekson, or 68.4. Adding in his 40 outs and 10 errors gives him 118.4 BIP (40 outs and 78.4 safe plays). That's a LWT run value of +27.2 runs, or 4.7 worse than average. Robinson was 4.7 runs better than average.
Method 4
Of the known plays that were not fielded by the avg 3B (166 of them), the avg SS had 50 outs and 6 errors, and 110 were hits. So, the avg SS has a LWT run value of +43 runs.
Derekson's 3B fielded 40 plays, meaning that 160 plays were up for grabs. He recorded 40 outs and 10 errors, and 110 were hits. That's a LWT run value of +48 runs, or 5 runs worse than average.
In this case, the 3B numbers are not simply the reverse, so let's do Robinson. Of the known plays not fielded by the avg SS (144 of them), the avg 3B had 30 outs and 4 errors, and 110 were hits. The avg 3B has a LWT run value of +48 runs.
Robinson's SS fielded 50 plays, meaning that 150 plays were up for grabs. He recorded 40 outs, 0 errors, and 110 were hits. That's a LWT run value of +43 runs, or surprise, 5 runs better than average. Even though we double-counted, we still end up with the result that the team overall was zero.
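The arithmetic for Methods 2 through 4 can be reproduced the same way. A minimal sketch; the league figures (avg SS 50 outs and 6 errors, avg 3B 30 outs and 4 errors, 110 hits on 200 BIP) and the -0.30/+0.50 run values are inferred from the worked numbers (the 5/8 split, 56/90, and the 166 and 144 plays), not stated in the excerpt, and all names are hypothetical. Positive results mean more runs allowed than average:

```python
# Run values inferred from "50 outs and 75 safe plays = +22.5 runs".
OUT_RUNS, SAFE_RUNS = -0.30, 0.50
BIP = 200
LG = {"SS": (50, 6), "3B": (30, 4)}     # league-average (outs, errors), inferred
TEAM = {"SS": (40, 10), "3B": (40, 0)}  # Derekson, Robinson
PARTNER = {"SS": "3B", "3B": "SS"}


def lwt(outs, safe):
    """Linear-weights runs allowed; errors count as safe plays here."""
    return outs * OUT_RUNS + safe * SAFE_RUNS


def baseline(pos):
    """League-average LWT from Method 1B: BIP shared by league outs."""
    lg_outs = LG[pos][0]
    plays = BIP * lg_outs / sum(o for o, _ in LG.values())
    return lwt(lg_outs, plays - lg_outs)


def method_2(pos):
    """Share the team's non-outs by league outs."""
    team_outs = sum(o for o, _ in TEAM.values())
    share = LG[pos][0] / sum(o for o, _ in LG.values())
    return lwt(TEAM[pos][0], (BIP - team_outs) * share) - baseline(pos)


def method_3(pos):
    """Share non-outs, non-RBOE by league outs + errors; add own errors back."""
    team_outs = sum(o for o, _ in TEAM.values())
    team_errs = sum(e for _, e in TEAM.values())
    handled = {p: o + e for p, (o, e) in LG.items()}
    share = handled[pos] / sum(handled.values())
    outs, errs = TEAM[pos]
    safe = errs + (BIP - team_outs - team_errs) * share
    return lwt(outs, safe) - baseline(pos)


def method_4(pos):
    """Everything the partner didn't field (outs or errors) is up for grabs."""
    outs = TEAM[pos][0]
    lg_outs = LG[pos][0]
    up_for_grabs = BIP - sum(TEAM[PARTNER[pos]])
    lg_up_for_grabs = BIP - sum(LG[PARTNER[pos]])
    player = lwt(outs, up_for_grabs - outs)
    average = lwt(lg_outs, lg_up_for_grabs - lg_outs)
    return player - average


for method in (method_2, method_3, method_4):
    print(method.__name__, {p: round(method(p), 1) for p in ("SS", "3B")})
```

Rounded to one decimal, this gives 3.0, 4.7, and 5.0 runs worse for Derekson and the mirror-image credit for Robinson, and under each method the SS and 3B figures sum to zero for the team.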
***
So, where does that leave us? I don't know. But, in this example, which I guess is a bit extreme, we're talking about a 5 run swing per 200 BIP. I don't know how extreme the real Jeter/Ventura is, or how many shared plays you need to worry about.
So, we have to decide which method best captures the contributions of our players.
Anybody else left dizzy with these math gyrations? Sorry about that, but it had to be done!
"My only quibble is that you have used one year of stats to analyze the matter, and we do not know for example if Jeter and Tejada's value in your system last year varied more than their ability or past and projected performances..."
Slapshot, any analysis of a player's defense must be informed by his true ability rather than by measures of his sample ability, so I don't disagree. What goes into "translating" a player's sample defensive rating into an ability rating is regression based on sample size (luck), age adjustments, and injury adjustments.
Here are Jeter's and Tejada's adjusted UZR runs per 162 games for the last 4 years (99-02):
Jeter: -14, -8, -22, -27
Tejada: 9, 7, 0, 4
I think that the numbers speak for themselves...
BTW, this yields the exact same results as the following method:
Take the total BIP for a player (in a certain zone) and prorate them according to that player's share of the total outs in that zone. Call this number that player's "chances". Then take that player's outs divided by his "chances". Do the same for the league. Then subtract the league number (the league outs at that position divided by the league "chances" for that position) from the player's "outs divided by chances", and multiply the difference by the player's "chances". The answer you get is always exactly the same as with the first, much simpler method.
It is NOT correct to divide player and league outs by total outs. It is NOT correct to do my "prorate the hits" method (my method in Part I of the article).
This applies to all zones, shared or not, whether one fielder steals outs from another fielder or not, as long as the out stealing is the same for all players, as David points out (because the out stealing will show up in the league totals).
The problem arises, as Chris points out, when a certain player, like A. Jones, steals outs more or less than the league average. Then something else needs to be done, and I'm not sure what that is right now. I will work on it eventually, but I don't have any more time to devote to tweaking UZR until after I come out with my Super-lwts and finish my projections for Tango (if that is even possible at this point).
BTW, I have STATS data (so I have data from STATS smaller zones), but I do the UZR from the retrosheet type data (larger zones). Actually I transpose the STATS data into retrosheet format and then I do the UZR analysis.
Eventually, I will use the STATS data to address the A. Jones out stealing issue...
My thinking is that just getting to a ball to knock it down is a desirable skill, as opposed to not getting to the ball and allowing it to go through to the outfield. Clearly, there is less opportunity for runner advancement, and hence more "saved-run" value in the former case. Yet, UZR appears to treat both cases as just a hit allowed, with equal "cost" charged. Meaning, a player who can get to more balls (even if he can't turn all of them into outs) may not be receiving all the credit he is due.
In a similar vein, for a player who gets to a lot of balls other players don't get to, but throws away that advantage (pun intended) with more wild throws, he would presumably end up with fewer hits allowed, but more ROEs than his more typical counterpart who never got to the ball in the first place. Which guy is going to come out further ahead in UZR? Which guy should?
Something's been bugging me about how errors and hits are dealt with separately - can't quite put my finger on it, but here goes.
might get dealt with
So, from where I sit, Mike, Shorty (who is also named Michael), Mitchell (lotsa Mickeys around here), and I have all agreed that Method 1A is the simplest method, and probably the most accurate. This looks like the one that MGL is going forward with.
I guess super-geniuses aren't recognized in their own time. ;-)
Perhaps, as the newly revised UZR may show, his somewhat average rating from that time period was biased by the other factors that MGL is considering.
So, let's add Chris to the list in terms of support for how to handle the zones.
Dan, if you are out there, you might want to add Mike, Chris, and Mitchel's past articles on this subject in the "related links".
Other than that, I'm really looking forward to Part II. I have a few quibbles that I'll get to later tonight.
Good point about the infield hits! The value of an infield hit (when an infielder knocks down a ball heading for the outfield) is obviously less than that of an outfield hit (because of the baserunner advances). It would be interesting to see if infielders with greater range do in fact allow more infield hits (knock down more ground balls).
OTOH, if I included that in UZR, how much difference would it make? OTOOH (how many hands are there?), I am including things in the adjustments that have little overall influence (like pitcher G/F ratios).
As far as doing the errors (ROEs) separately (I could just lump them in with the hits), there are several reasons to do so. One, they have a slightly different value than a hit. BTW, I could separate fielding errors from throwing errors (for IF'ers) if I wanted to, since throwing errors may have a higher value. Two, I wanted to present a fielder's UZR "fielding" runs, which is his "range", and his UZR "error" runs, which is his "hands", even though it makes no practical difference - defense is defense. Perhaps, however, as a fielder gets older, his "hands" get better and his "range" gets worse. I don't know. Or perhaps you can teach a fielder, or he can learn with practice, to get better with his hands, but not with his range. So it might be important (for a team, at least) to know the breakdown of a player's UZR runs (hands and range). Also, there is probably a different standard error for "range runs" and "hands runs", although I'm not sure which would be higher. An error is very certain - we know that the fielder reached the ball with reasonable effort and then "booted it" or threw the ball away. With a hit, we have all the problems that we were discussing (is that fielder really responsible for every hit he is "charged" with?). In that sense, it would seem that the hit would have a much higher standard error (uncertainty) than the error.
OTOH, an error is often the result of a bad hop (maybe 50% of the time), in which case any fielder would have made the same error. So how much do a fielder's errors reflect his fielding (hands) ability? What is the year-to-year correlation for errors? Look at R. Ordonez. One year he has like 3 errors and the next, like 20. On the other hand, you had fielders like Sandberg and Elster who were known for few errors every year (I think). Part of that was that they had limited range, of course. Anyway, the hits and outs, even with all the problems apportioning the hits, do reflect a fielder's range, which is a reflection of his ability. So in this sense, maybe the errors have MORE of a standard error (correlate less with ability) than the hits (maybe throwing errors correlate well with ability, but fielding errors do not because of the bad hops).
In any case, I don't think it can hurt to calculate the errors separately (and I THINK I did it correctly, although now I'm not so sure about anything) - it can only help...
1) I long for the day when a player's defensive rating (be it ZR, UZR, or some other reasonably accurate metric (NOT RF!)) is uttered in the same breath as his offensive rating (be it OPS, lwts, xRuns, RC, or BaseRuns), rather than as an afterthought! Yes, these defensive metrics (ZR, UZR, etc.) are not as accurate or reliable as the offensive ones, but they should be far from an afterthought - including them is far better than not including them, IMO!
2) It never ceases to pop up, in discussions about infield defense, how "important" a player's DP rating is (particularly for those that also pivot - the 2B and SS). It is NOT important (relatively speaking)! It pales in "importance" next to a fielder's fielding ability! A typical SS or 2B'man is a MAXIMUM of plus or minus 3 runs (pivot and starting DPs), and a typical first or third baseman is at most plus or minus ONE run (yes, including the all-"important" 3-6-3 DP)! I guess the assertion that DP skill is "so important" falls into the same category as the assertion that a great fielder saves his team "a run or two" per game...
BTW, I'm not sure what category OF arms fall into (are they undervalued or overvalued?). They are, in fact, more important than DP defense. The typical OF'er has an "arm lwt" of plus or minus 4, with an occasional 6.
Keep in mind that when we talk about "plus or minus" 3 for infield DP skill or "plus or minus" 4 for OF arms, these are one-year sample spreads, which means that the typical spread in "ability" is much less!
In fact, the more I think of it, I long for the day when a player's "complete package" (read: Super-Lwts) is uttered in the same breath as his offensive value. As someone pointed out in the Dial article, Piazza is a good example of how you must consider a player's total package. As good a hitter as Piazza is, his other numbers are so bad across the board (throwing, GDP, baserunning, moving runners over) that he is NOT a great player overall. When discussing Piazza (and many other players, read: Jeter on defense), his other numbers should not be an afterthought...
If you just look at his offensive numbers (say, including his base stealing), he looks like a GREAT player, considering he is a SS, and worth every penny he is making.
If you then include his defense, he no longer looks like a GREAT player - he looks like a GOOD player.
If you then include his baserunning and advancing runners, he is now a VERY GOOD player.
Jeter and Piazza are examples of players who we know have major non-hitting deficiencies (although we tend to ignore or at least minimize them). The players that most concern me are the ones whose non-hitting deficiencies (or non-hitting benefits, though this seems to be less prevalent) we, for whatever reason, don't know about or consider. Two who come to mind are Womack and Bernie Williams. Womack can't hit, and I've never heard him referred to as a terrible defensive player, which he is. His last 3 years' adjusted UZR runs per 162 games at SS were -16, -10, and -27. He does not belong on a major league team.
Bernie Williams is usually referred to as a good or excellent CF'er, as far as I know (although those in the "know", other than sabermetricians, probably know otherwise). This is because he is (was?) fast and graceful, or perhaps because he once was a good or great CF'er. His adjusted UZR runs for the last 4 years are -20, -14, -16, and -33, and his arm runs are -2, -6, -5, and -10 (wow, I didn't realize his arm was that bad, and that baserunners apparently know that)!!
Despite his great offense, he is no better than an above average CF'er, and I'm sure he is overpaid.
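To put the Womack and Williams figures above side by side, here is a small sketch that adds Williams's range and arm runs season by season and averages each player's seasons. It assumes the two components simply add within a season and that each season counts equally in an unweighted mean. Neither assumption comes from the post; a playing-time-weighted average could differ. The helper name `combine` is hypothetical.

```python
from statistics import mean

# Adjusted UZR runs per 162 games, as quoted in this thread.
womack_ss_range = [-16, -10, -27]          # last 3 years at SS
williams_cf_range = [-20, -14, -16, -33]   # last 4 years in CF
williams_cf_arm = [-2, -6, -5, -10]        # arm runs, same 4 years

def combine(*components):
    """Add per-season component runs together, season by season."""
    return [sum(season) for season in zip(*components)]

williams_total = combine(williams_cf_range, williams_cf_arm)
print(williams_total)                    # [-22, -20, -21, -43]
print(round(mean(williams_total), 1))    # -26.5 per season, range + arm
print(round(mean(womack_ss_range), 1))   # -17.7 per season, range only
```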
Yes, I recognize that defensive metrics are less "definitive" in many ways than offensive ones. Again, that does not mean that they should be ignored or mentioned as an afterthought! As for the other components of my Super-LWTS, they are as "definitive" (and reliable) as any offensive metric, and if nothing else, they should be lumped in with a player's offensive lwts, so that we can get at least the "total package" sans defense.
One more thing: if we all had a nickel for every time a player coming off a bad year or years said something like "I'm finally healthy..." (if you get my drift).
I think this is a good Bill James rule. A stat that never surprises is boring. A stat that always surprises is wrong. 80/20 is a good split.
So, is Torii Hunter part of the 20?
And position always matters, although I know that is a point of contention at fanhome.
As for turning the DP, the Defensive Average work indicated that the importance of DPs was largely overrated. I ignored them, and your work agrees with the DA findings.
Piazza is a catcher. Is he not the highest ranking catcher?
Womack, in the sabermetric circles I run in, is considered to suck on defense.
While the noise level is high on USENET, some of the sharpest minds around cut their teeth there, and most still frequent it. And the peer review there is outstanding. There's one guy in rsb who is _the_ expert on the Wizard of Oz. I honestly believe he could claim that if he wanted to, but it isn't Ty Cobb.
But you have the right view, and work like yours goes a long way to help - the total player contribution is what's important.
"MGL, I'm not sure how those two things come to mind after reading my articles. I am somewhat well known around USENET for being tyrannical wrt making people *always* consider defense on any "player X is better than Y" commentary."
My "musings" had nothing to do with you per se. In fact, you are one of those advocating presenting the "whole enchilada", right? Those 2 thoughts just popped into my head while I was reading your article, which was very good, BTW. In fact, I think that ZR and RC (or whatever offensive metric you want to use) should have been combined, as you suggest, a long time ago, when talking about a player. I've never had a problem with ZR. I think it captures a fielder's value, albeit not perfectly, 90% of the time. Outside of sabermetric circles, even though ZR has been around for a long time (15 years?), it either gets mentioned as an afterthought, goes unmentioned, or gets scorned.
As for Torii Hunter, he is very much in that 20% (like J.T. Snow), although you always get someone saying, "Yeah, I knew all along he was not a good fielder." I would like, just once, to hear someone say that BEFORE the metric comes out...
Hunter's UZR runs, per 162, 99-02:
-2, -8, +10, -17
He does have a good arm:
.4, 2.3, 1.4, 3.7
Snow:
-1, -12, +3, -8
(He is about a wash in DP runs)
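A quick sketch of the Hunter and Snow numbers above: Hunter's range and arm runs added per season, a 4-year average for each player, and Hunter's best-minus-worst swing. As before, it assumes range and arm runs simply add and that an unweighted mean is a fair summary. Snow's DP runs are left out since the post calls them about a wash.

```python
from statistics import mean

# UZR range runs and arm runs per 162 games, 1999-2002, as quoted above.
hunter_range = [-2, -8, 10, -17]
hunter_arm = [0.4, 2.3, 1.4, 3.7]
snow_range = [-1, -12, 3, -8]   # DP runs (about a wash) not included

# Assumes range and arm runs simply add within a season.
hunter_total = [r + a for r, a in zip(hunter_range, hunter_arm)]

print([round(x, 1) for x in hunter_total])   # [-1.6, -5.7, 11.4, -13.3]
print(round(mean(hunter_total), 1))          # -2.3
print(round(mean(snow_range), 1))            # -4.5
# Season-to-season swing: best year minus worst year.
print(round(max(hunter_total) - min(hunter_total), 1))   # 24.7
```

A swing that large from one season to the next is one more reason to read any single year as ability plus a good deal of noise.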
"Definitely, I think last year was my best year... You can't go by number of errors, but I know I got to more balls that I didn't get to in the past. I felt my range was better and all-around agility was better by far."
--Derek Jeter, Yankees shortstop, on his defense in 2002 (Sports Illustrated Online)
Jeter's "range" UZR (errors not included since, as Jeter says, you can't go by errors, whatever that means) for the last 3 years:
-5 (116 "games"), -19 (129 "games"), -26 (147 "games")
"Games" are Jeter's chances divided by league average SS chances per game.
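The "games" definition above is a simple ratio. The sketch below shows it with made-up inputs (600 chances, 4.0 league-average chances per game), not Jeter's actual chances. It also includes a hypothetical `runs_per_162` helper for scaling a runs total to a 162-"game" rate. The post doesn't say whether Jeter's -5, -19, and -26 are raw totals or already rates, so the helper is not applied to his figures.

```python
def equivalent_games(chances: float, lg_chances_per_game: float) -> float:
    """Playing time in 'games': the fielder's chances divided by the
    league-average chances per game at his position."""
    return chances / lg_chances_per_game

def runs_per_162(runs: float, games: float) -> float:
    """Scale a runs total to a rate per 162 'games'."""
    return runs * 162 / games

# Made-up inputs for illustration only.
games = equivalent_games(600, 4.0)
print(games)                                # 150.0
print(round(runs_per_162(-10, games), 1))   # -10.8
```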
So, if the numbers tell me one thing and my impressions tell me another, then there must be a reason. I guess McRae's numbers weren't that good, and so maybe I have a certain bias for flashy plays in CF, having been subjected to Dawson, Grissom, and "Puck" White all my life. If Big Cat's numbers (in his prime) don't look good, then I'd say there's a problem with the numbers (I don't know what they are).
Maybe the most revealing thing about UZR is in the OF, because positioning, speed, and hang time of the ball are a lot harder to judge in the OF than in the IF. Since 3B is very reactive, and arm strength is easy to spot, I'd guess that 3B and 1B (scooping glove) should be the easiest positions to judge visually, without supporting numbers. Maybe CF is the hardest.
So maybe you can make a good guess 90% of the time at 1B, down to 60% in CF? The "visual fielding spectrum" (VFS) might be 1B, 3B, C, 2B, SS, RF, LF, CF, P??
If you don't catch the ball properly, it's the same as an infield single. The LWT value of an infield single or a miscatch would probably be about .40 runs.
The LWT value of a misthrow would probably be similar to an outfield single, and sometimes equivalent to a double. I'd guess therefore about .50 to .55 runs.
Suppose you have a guy with 5 catching errors and 25 throwing errors, and the reverse: what's the difference? Using the above numbers, you get a difference of 2 runs. I'll guess these are pretty extreme numbers, and you wouldn't get that kind of split on throwing/catching errors by *position*.
My guess is that distinguishing between catching and throwing errors will give you an "error range" (no pun intended) of 1 run.
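The error arithmetic above can be checked with a few lines, using the post's own rough guesses (.40 runs per catching error, .50 to .55 per throwing error). The function and constant names are hypothetical. Since both fielders make 30 errors, the gap is just 20 × (throw value − catch value).

```python
# Rough linear-weight run values per error, as guessed in the post above.
CATCH_ERROR_RUNS = 0.40   # a miscatch plays like an infield single
THROW_ERROR_RUNS = 0.50   # a misthrow plays like an outfield single (low end of .50-.55)

def error_runs(catching: int, throwing: int,
               catch_value: float = CATCH_ERROR_RUNS,
               throw_value: float = THROW_ERROR_RUNS) -> float:
    """Total runs cost of a fielder's catching and throwing errors."""
    return catching * catch_value + throwing * throw_value

# 5 catching + 25 throwing errors versus the reverse split.
print(round(error_runs(5, 25) - error_runs(25, 5), 2))   # 2.0
# Same comparison using the high end (.55) for throwing errors.
print(round(error_runs(5, 25, throw_value=0.55)
            - error_runs(25, 5, throw_value=0.55), 2))   # 3.0
```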
If this is the case, with double plays being worth about half a run apiece (as compared to forceouts... a run apiece compared to all-hands-safe), it follows that the difference between a very good 2B and a very bad one is about ten double plays a year.
I'm not saying that's wrong, but that seems highly unlikely to me; I would have thought the difference would be greater, at least twice that.
I've got Craig Biggio at one end and Jose Vidro at the other, and the difference between the two is about .20 DPs turned as pivot man per DP situation. There are about 100 of those opportunities to turn, so that makes the swing 20 DPs, or +/- 10 DPs.
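Spelled out, the Biggio/Vidro figure is a rate gap times an opportunity count. The sketch below also converts DPs to runs using the rough half-run-per-DP value (relative to a forceout) quoted earlier in the thread. That conversion is an added assumption, not part of the post's calculation, and both helper names are hypothetical.

```python
def dp_swing(rate_gap: float, opportunities: int) -> float:
    """DPs separating two pivot men: the gap in DPs turned per
    DP opportunity times the number of opportunities."""
    return rate_gap * opportunities

def dp_runs(dps: float, run_value: float = 0.5) -> float:
    """Convert DPs to runs; 0.5 is the rough per-DP value (vs. a forceout)
    quoted earlier in the thread."""
    return dps * run_value

swing = dp_swing(0.20, 100)   # Biggio vs. Vidro, per the post above
print(round(swing))           # 20 DPs from one extreme to the other
print(round(swing / 2))       # 10, i.e. +/- 10 DPs around the middle
print(round(dp_runs(swing)))  # 10 runs top to bottom at .5 runs per DP
```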
I talked to Dan Szymborski of this site about putting up the complete UZR files (on this site), but he hasn't gotten back to me on that. I don't think it will be a problem. He's been very helpful with everything else. (BTW, I submitted part II of the UZR article, so that should be up on this site shortly, I assume.)
Maybe Tango can put up the UZR files on his web site and Primer can link to them if they want.
I'll put them in the best format I can think of, something similar to what Shorty suggested (also Tango is a magician in terms of being able to "do" things on a web site, like sorting data with one click of the mouse)...