D - ## ! D- ## !
I love defense. For as long as I have been stumbling around the internets, I have argued long and loud that Statheads undervalue defense. When I started, though, I argued some things I would never say now; at one point I was just as ignorant as Bill Plaschke.
Looking back, I was pretty amazed at the Re-Education of Chris Dial from November of 1997 to February of 1998. In June of 1998, I devised a methodology to evaluate both sides of the ball using Zone Rating data, or about the same thing as SuperLwts/UZR.
I went through a few permutations of offense and defense, began using Jim Furtado’s Extrapolated Runs, and did a ton of work with Dale Stephenson (Google that name for the work of a top-notch sabermetrician) in determining runs for defense. Dan, we still need to get Dale to post his Peak Lists here.
Last year, I wrote a couple of “Who should be MVP?” articles. Right now the methodology link is dead, but it is pretty close to the above rsb post.
I wanted to improve what I had done. So I did some data-mining. I went back to the source – STATS - to better determine what ZR chances occurred at each position, and worked backwards from there to calculate defensive runs.
In doing so, I was able to come up with good averages for balls in play based on 3000-7000 defensive games (up to 60000 innings and 7000-22000 chances depending upon the position), and effectively draw a baseline of where a fielder’s production will lie based on his zone rating. Converting to runs is simple enough, once you figure out chances.
So I have done all this, worked on my calculations, and generated defensive runs. These runs saved (RS) are above average and specific to the individual’s playing time.
Yes, the first critique is that I have to do one of two things. Either I normalize everyone to the same number of chances (the average) and indicate that the player’s rate would result in so many more (or fewer) outs and runs, or I use the average out-conversion rate and take the difference between actual outs and average-player outs. I’d certainly prefer to do it the second way, but I can’t. So I do it the first way.
I have some seasons where I can do it the second way. Working through the math, the difference between defensive plays at shortstop (the position with the most plays on average) converted to outs is plus or minus four plays. That’s three runs. So the first way is going to be within three runs of the second way from best to worst in 525 chances.
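The arithmetic behind that bound can be sketched quickly. This is only a back-of-the-envelope check using the figures quoted above (four plays, three runs, 525 chances); the variable names are mine, and the per-play run value is what those figures imply rather than a published constant.

```python
# Figures quoted in the paragraph above (shortstop).
plays_gap = 4      # plus or minus four plays between the two methods
runs_gap = 3       # "That's three runs"
chances = 525      # chances over which the gap was measured

# Implied run value of one play at shortstop (an inference, not a quoted number).
run_value_per_play = runs_gap / plays_gap

# The same gap expressed as a difference in out-conversion rate.
rate_gap = plays_gap / chances

print(f"implied runs per play: {run_value_per_play:.2f}")
print(f"equivalent rate gap:   {rate_gap:.4f}")
```

In other words, the normalization moves a shortstop's rate by well under one percentage point, which is why the two methods land so close together.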
I took all the positions, made a nice spreadsheet, and have it ready to calculate runs saved above (or below) average.
The basic calculation is:
Player’s ZR times the average number of chances at the position over a full season times the average run value of balls hit to that position.
This yields a Player’s RS(cal), where (cal) is every inning of every game. We then subtract the league average RS(cal), and adjust those runs by the player’s actual playing time.
That yields runs saved relative to a player converting outs at the average rate, given an equal number of chances, normalized to the player’s playing time.
I know: how much does the normalization affect the data? As noted above, not very much.
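The basic calculation can be sketched in a few lines of Python. Treat this as a rough illustration under stated assumptions, not the actual spreadsheet: the function name, the ZR values, and the use of innings as the playing-time measure are all hypothetical, and the 525 chances and 0.75 runs per play are borrowed from the shortstop discussion above purely for illustration.

```python
def runs_saved(player_zr, league_zr, avg_chances, run_value,
               innings_played, full_season_innings):
    """Runs saved above (or below) average, scaled to playing time.

    All arguments are illustrative inputs; none come from real player data.
    """
    # RS(cal): runs over every inning of every game at the player's rate.
    rs_cal_player = player_zr * avg_chances * run_value
    # The same quantity for a league-average fielder.
    rs_cal_league = league_zr * avg_chances * run_value
    # Assumption: playing time is measured as a share of full-season innings.
    playing_time = innings_played / full_season_innings
    return (rs_cal_player - rs_cal_league) * playing_time


# Hypothetical shortstop: .860 ZR against a .840 league average,
# playing 1000 of an assumed 1458 innings (162 games x 9 innings).
example = runs_saved(0.860, 0.840, 525, 0.75, 1000, 1458)
print(f"runs saved: {example:+.1f}")
```

Because the chances are fixed at the positional average, the only player-specific inputs are the ZR and the playing time, which is what makes the numbers easy to generate on your own.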
With these formulas, you can generate good defensive value numbers on your own, anytime you need to. The results are going to be robust too.
What about UZR?
MGL has said that UZR *has* to be better than ZR because it is an extension of ZR. Well, it isn’t exactly. MGL converts all of the data from STATS zones to a different grid (Project Scoresheet) that is far less discriminating.
There are three zones (V, W, X) for a first baseman in ZR, covering about 24 feet. That is from the baseline to about 24-30 feet (depending on a fielder’s depth). These zones are assigned as “first base ones” because first basemen turn groundballs hit into those three zones into outs more than half of the time. The next zone over, U, does not get half of the balls turned into outs, and thus it isn’t included as a first base zone. There are approximately 280 groundballs hit into those three zones. However, when you overlap these three zones with Project Scoresheet, only the first two zones, W and X, are the first baseman’s zone (Zone 3). The third zone, V, is combined with another two zones, T and U, creating a “34” zone, and then UZR tries to average out the chances a first baseman should get from that. In UZR, unfielded groundballs in the T and U zones count the same against a 1B as those in the V zone (ZR zones T, U, V make up the Project Scoresheet (PS) 34 Zone), whereas they don’t count at all in ZR. That’s important. A ball hit 30 feet to the first baseman’s right is his responsibility in UZR. It is not in ZR.
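The zone overlap described above can be laid out as a small lookup. This is a sketch of the first-base example only, built from the zone letters in the paragraph; the names are hypothetical, and it says nothing about how UZR actually weights balls within the 34 zone.

```python
# STATS zones charged to the first baseman in ZR, per the paragraph above.
ZR_FIRST_BASE_ZONES = {"V", "W", "X"}

# How those STATS zones fall into Project Scoresheet zones, per the same paragraph.
PS_ZONE_MEMBERS = {
    "3": {"W", "X"},        # the first baseman's own PS zone
    "34": {"T", "U", "V"},  # shared zone between first and second
}


def exposes_first_baseman(stats_zone, system):
    """Return True if an unfielded grounder in stats_zone can count against a 1B."""
    if system == "ZR":
        return stats_zone in ZR_FIRST_BASE_ZONES
    if system == "PS":
        return any(stats_zone in members for members in PS_ZONE_MEMBERS.values())
    raise ValueError(f"unknown system: {system}")


for zone in "TUVWX":
    print(zone,
          "ZR" if exposes_first_baseman(zone, "ZR") else "--",
          "PS" if exposes_first_baseman(zone, "PS") else "--")
```

Zones T and U are the ones that differ: they show up under the Project Scoresheet grouping but not under ZR.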
STATS Zone Rating Grid
This doesn’t turn out too terribly because Zone T is the responsibility of second basemen in ZR. This is why UZR misses 2B the most (I think). Half of the groundballs hit in that portion of the PS 34 zone are in ZR by default at full credit, whereas in UZR they are not.
And so it goes around the diamond. In addition, the PS 4M zone (up the middle to the 2B side of second) is split in ZR. Half of it is the responsibility of the 2B, but the other half is not.
The 3B/SS side of the field sees the same problems. As I pointed out in Mike Emeigh’s great eight-part series on Jeter’s defense, there was a zone assignment error from Project Scoresheet that rendered one season of Jeter’s data unusable. It’s a big issue.
Project Scoresheet Grid
In a nutshell, UZR is a very nice system. However, I believe the proprietary nature of STATS’ raw data compels the data user to “tweak” it, and in my opinion the use of Project Scoresheet zones makes it less accurate than generating runs saved from ZR, as I have done.
Posted: November 02, 2005 at 04:45 AM | 111 comment(s)