Baseball for the Thinking Fan

Dialed In

Wednesday, November 02, 2005

D - ## !  D- ## !

I love the defense. As long as I have been stumbling around the internets, I have been arguing long and loud about how much defense was undervalued by Statheads. When I started, though, I was arguing some things I would never say now; at one point I was just as ignorant as Bill Plaschke.

Looking back, I was pretty amazed at the Re-Education of Chris Dial from November of 1997 to February of 1998. In June of 1998, I devised some methodology to evaluate both sides of the ball using Zone Rating data, or about the same thing as SuperLwts/UZR.

I went through a few permutations of offense and defense, and began using Jim Furtado’s Extrapolated Runs, and did a ton of work with Dale Stephenson (Google that name for the work of a top-notch sabermetrician) in determining runs for defense. Dan, we still need to get Dale to post his Peak Lists here.

Last year, I wrote a couple of “Who should be MVP?” articles. Right now the methodology link is dead, but it is pretty close to the above rsb post.

I wanted to improve what I had done. So I did some data-mining. I went back to the source – STATS - to better determine what ZR chances occurred at each position, and worked backwards from there to calculate defensive runs.

In doing so, I was able to come up with good averages for balls in play based on 3000-7000 defensive games (up to 60000 innings and 7000-22000 chances depending upon the position), and effectively draw a baseline of where a fielder’s production will lie based on his zone rating. Converting to runs is simple enough, once you figure out chances.

So I have done all this, worked on my calculations, and generated defensive runs. These runs saved (RS) are above average and specific to the individual’s playing time.

Yes, the first critique is that I have to do one of two things. I can normalize everyone to the same number of chances (the average) and indicate that the rate would result in so many more (or fewer) outs and runs. Or I can use the average out conversion rate and subtract average player outs from actual outs. I’d certainly prefer to do it the second way, but I can’t. So I do it the first way.

I have some seasons where I can do it the second way. Working through the math, the difference between defensive plays at shortstop (the position with the most plays on average) converted to outs is plus or minus four plays. That’s three runs. So the first way is going to be within three runs of the second way from best to worst in 525 chances.

I took all the positions, made a nice spreadsheet, and have it ready to calculate runs saved above (or below) average.

The basic calculation is:
Player’s ZR times the average number of chances at the position over a full season times the average run value of balls hit to that position.

This yields a Player’s RS(cal), where (cal) is every inning of every game. We then subtract the league average RS(cal), and adjust those runs by the player’s actual playing time.

That yields runs saved compared to what an average fielder, converting outs at the league rate, would save given an equal number of chances, normalized to the player’s playing time.
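The steps above can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not the actual spreadsheet: the function and parameter names are hypothetical, playing time is assumed to be measured in innings, and 1458 innings (162 games of nine innings) is assumed for a full season. The 525 chances and 0.75 runs per play are read off the shortstop example earlier (four plays being about three runs); the ZR values are made up.

```python
def runs_saved_above_average(player_zr, league_zr, avg_chances, run_value,
                             innings_played, full_season_innings=1458):
    """Runs saved above (or below) average, following the steps in the text.

    All names and the innings-based playing-time adjustment are assumptions.
    """
    # RS(cal): what the player's ZR is worth over every inning of every game.
    player_rs_cal = player_zr * avg_chances * run_value
    # The same quantity for a league-average fielder at the position.
    league_rs_cal = league_zr * avg_chances * run_value
    # Scale the full-season difference to the player's actual playing time.
    playing_time_share = innings_played / full_season_innings
    return (player_rs_cal - league_rs_cal) * playing_time_share


# Illustrative inputs only: a hypothetical shortstop with an .860 ZR against
# an assumed .840 league average, playing 1200 of 1458 innings.
example = runs_saved_above_average(0.860, 0.840, 525, 0.75, 1200)
print(round(example, 1))
```

Because the chances are fixed at the positional average, two fielders with the same ZR and the same playing time always come out with the same runs saved, which is exactly the normalization discussed above.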

I know: how much does the normalization affect the data? As noted above, not very much.

With these formulas, you can generate good defensive value numbers on your own, anytime you need to. The results are going to be robust too.

What about UZR?

MGL has said that UZR *has* to be better than ZR because it is an extension of ZR. Well, it isn’t exactly. MGL converts all of the data from STATS zones to a different grid (Project Scoresheet) that is far less discriminating.

Envision this:
There are three zones (V, W, X) for a first baseman in ZR, covering about 24 feet. That is from the baseline to about 24-30 feet (depending on a fielder’s depth). These zones are assigned as “first base zones” because first basemen turn groundballs hit into those three zones into outs more than half of the time. The next zone over, U, does not get half of the balls turned into outs, and thus it isn’t included as a first base zone. There are approximately 280 groundballs hit into those three zones.

However, when you overlap these three zones with Project Scoresheet, only the first two zones, W and X, are the first baseman’s zone (Zone 3). The third zone, V, is combined with another two zones, T and U, creating a “34” zone, and then UZR tries to average out the chances a first baseman should get from that. In UZR, unfielded groundballs in the T and U zones count the same against a 1B as those in the V zone (ZR zones T, U, V make up the Project Scoresheet (PS) 34 Zone), whereas they don’t count at all in ZR. That’s important. A ball hit 30 feet to the first baseman’s right is his responsibility in UZR. It is not in ZR.
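To make the overlap concrete, here is a small Python sketch of the zone assignments just described. The table and function names are hypothetical, and treating a ball as simply "counting" against the first baseman in UZR is a simplification, since the text says UZR averages out a share of the 34 zone rather than charging it in full.

```python
# ZR zone -> fielder responsible in ZR, as described in the text.
# Zone U belongs to no one because fewer than half of its balls become outs.
ZR_RESPONSIBILITY = {"X": "1B", "W": "1B", "V": "1B", "U": None, "T": "2B"}

# ZR zone -> Project Scoresheet zone it falls into, as described in the text.
PS_ZONE_OF = {"X": "3", "W": "3", "V": "34", "U": "34", "T": "34"}

# PS zones in which an unfielded groundball counts (in some share) against 1B.
PS_ZONES_TOUCHING_1B = {"3", "34"}


def counts_against_1b_in_zr(zr_zone):
    """True if ZR treats the zone as the first baseman's responsibility."""
    return ZR_RESPONSIBILITY.get(zr_zone) == "1B"


def counts_against_1b_in_uzr(zr_zone):
    """True if the zone lands in a PS zone shared with the first baseman."""
    return PS_ZONE_OF.get(zr_zone) in PS_ZONES_TOUCHING_1B


# Zones where the two systems disagree about the first baseman.
disagreements = sorted(
    zone for zone in ZR_RESPONSIBILITY
    if counts_against_1b_in_zr(zone) != counts_against_1b_in_uzr(zone)
)
print(disagreements)
```

The disagreement is confined to the two zones farthest from the bag, which is the point of the example: the coarser grid pulls balls into the first baseman's ledger that ZR never assigns to him.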

STATS Zone Rating Grid

This doesn’t turn out too terribly because Zone T is the responsibility of second basemen in ZR. This is why UZR misses 2B the most (I think). Half of the groundballs hit in that portion of the PS 34 zone are in ZR by default at full credit, whereas in UZR they are not.

And so it goes around the diamond. In addition, the PS 4M zone (up the middle to the 2B side of second) is split in ZR. Half of it is the responsibility of the 2B, but the other half is not.

The 3B/SS side of the field sees the same problems. As I pointed out in Mike Emeigh’s great eight-part series on Jeter’s defense, there was a zone assignment error from Project Scoresheet that rendered one season of Jeter’s data unusable. It’s a big issue.

Project Scoresheet Grid

In a nutshell, UZR is a very nice system. However, I believe the proprietary nature of STATS’ raw data compels the data user to “tweak” it, and in my opinion the use of Project Scoresheet zones makes it less accurate than generating runs saved from ZR, as I have done.

Chris Dial Posted: November 02, 2005 at 03:45 AM | 111 comment(s)

Reader Comments and Retorts


Page 2 of 2 pages  < 1 2
   101. DSG Posted: November 09, 2005 at 09:21 PM (#1725535)
Great stuff, Blackhawk! A 5 run difference is relatively large, however, and that is roughly 1 standard deviation here. It's not a fatal flaw or anything, but that, coupled with other flaws, means that you will have some "wrong" zone ratings, especially with smaller sample sizes. Again, this is great stuff. Nice job.
   102. Spivey Posted: November 09, 2005 at 09:33 PM (#1725560)
How is NEB calculated in this?
   103. Mike Emeigh Posted: November 09, 2005 at 09:59 PM (#1725599)
I don't know how much of a problem it is, but when you make hypotheticals you need to approach realism more - total chances and 5 converted - otherwise, while the math is easier, it's exaggerated for effect.

Well, that's what we don't know - how many OOZ plays there are. But take it to 400 in-zone BIP if you want, with a .750 average ZR. The in-zone guy will catch 300 of those; the OOZ guy will get 5 plays OOZ but if he gives up more than one ball in-zone to catch those five OOZ he'll still have a lower ZR.

Jump the average ZR higher, to .850. The in-zone guy makes 340 plays. The OOZ guy still gets 5 balls OOZ, but if he gives up even one in-zone ball to catch those 5 OOZ balls, his ZR will drop below the in-zone guy's.

-- MWE
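
The arithmetic in the two scenarios above can be checked with a short Python sketch. It assumes that an out-of-zone play adds one to both the plays made and the chances in ZR; that convention is not spelled out in the comment, but it reproduces the numbers given. The function names are hypothetical.

```python
def zone_rating(in_zone_outs, in_zone_balls, ooz_plays=0):
    """ZR, assuming each out-of-zone play adds one play and one chance."""
    return (in_zone_outs + ooz_plays) / (in_zone_balls + ooz_plays)


def ooz_guy_rates_higher(avg_zr, in_zone_given_up,
                         in_zone_balls=400, ooz_plays=5):
    """Compare a fielder who stays in zone with one who trades in-zone
    balls for out-of-zone plays, using the figures from the comment."""
    baseline_outs = round(avg_zr * in_zone_balls)
    in_zone_guy = zone_rating(baseline_outs, in_zone_balls)
    ooz_guy = zone_rating(baseline_outs - in_zone_given_up,
                          in_zone_balls, ooz_plays)
    return ooz_guy > in_zone_guy


# .750 average: giving up one in-zone ball still leaves the OOZ guy ahead
# (304/405 against 300/400), but giving up two does not (303/405).
print(ooz_guy_rates_higher(0.750, in_zone_given_up=1))
print(ooz_guy_rates_higher(0.750, in_zone_given_up=2))

# .850 average: even one in-zone ball given up puts him behind (344/405
# against 340/400).
print(ooz_guy_rates_higher(0.850, in_zone_given_up=1))
```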
   104. Chris Dial Posted: November 09, 2005 at 10:29 PM (#1725652)
means that you will have some "wrong" zone ratings,

Ooof. No, that assumes the *other* method is right, when it uses the larger zones. Of course, maybe that's why you had quotes around it.

My method that he references is slightly different, but maybe it shouldn't be.
   105. Los Angeles Waterloo of Black Hawk Posted: November 09, 2005 at 10:30 PM (#1725654)
Searching through some old posts, it seems that NEB is based on opportunities in a player's zone ...

... here is the CF DA data from 1988-1992. Parsing the explanation, NEB comes out of how many 2B and 3B were hit to a fielder's zone, compared to the league average for that position.

If you just search Google groups for "Dale Stephenson" "fielding runs" and/or "defensive average", you'll find all kinds of great stuff on the topic. It's presented pretty well and I really wish someone was still distributing data this way.
   106. Spivey Posted: November 09, 2005 at 10:35 PM (#1725669)
Thanks Blackhawk. It seems like this is a stat (NEB) that would be very strongly influenced by handedness of staff.
   107. Los Angeles Waterloo of Black Hawk Posted: November 09, 2005 at 10:37 PM (#1725671)
I agree that a five-run difference is "fairly" large, but, if all you have is ZR and some estimate of opportunities (fairly easy for OF, though sketchy for IF), I think pegging a particular CF as +10 runs +/-5 isn't that bad.

Are we confident in pegging an individual hitter's run contribution within a margin of 5? Fairly so ... but I don't know that anyone would argue that a batter that was +10 was indisputably better than one that was +5. There's a margin of error around everything they do.
   108. Chris Dial Posted: November 09, 2005 at 10:37 PM (#1725672)
I really wish someone was still distributing data this way.

Now you know why I pimp rsbb so much.

When Giants walked the Earth, and all that.

Szym and I are working on presenting the data like that all the time. My next piece will be up shortly.

Stupid awards articles... (I kid! I started it, and I love it when Ant writes.)
   109. Chris Dial Posted: November 09, 2005 at 10:39 PM (#1725674)
Are we confident in pegging an individual hitter's run contribution within a margin of 5? Fairly so

No, we aren't. I think the margin is right around 5 runs.

I have generally contended that methodology was about half as close as offense (3-4 and 6-8). But that's mostly due to the limitations of not being able to reach the minutiae of all the context that exists on every BIP. It simply cannot be done.
   110. Chris Dial Posted: November 09, 2005 at 10:40 PM (#1725675)
Oh, and Dale Stephenson's Peak Lists....(whistle)....

Solves more trouble than you can imagine.
   111. Los Angeles Waterloo of Black Hawk Posted: November 09, 2005 at 11:04 PM (#1725711)
Oh, I always knew why you pimped rsbb (I was active on one team's newsgroup, and often lurked in rsbb, though this was in the late 90s, after a lot of this work had been done).

Life is not quite long enough to dig through all the interesting stuff done there; you can lose days just wading through stuff written 12 or 13 years ago ...