Baseball for the Thinking Fan

Dialed In

Friday, November 11, 2005

Dr. StrangeGlove or: How I Learned to Stop Worrying and Love Zone Rating

There is a ton of mistrust of defensive methodologies and of how well they describe defensive play.  MGL’s UZR is recognized at many baseball sites as good enough to cite with considerable confidence in its accuracy.  MGL does a good deal of work to perfect it, and he’s no dummy, so it’s fair for others to want to quote his data.

I’m here to tell you, friends and neighbors, that you, yes, you, can be a defensive runs saved calculator without the need to pray MGL is able to let you in on how your favorite player performed.  MGL always obliges, but he doesn’t have to have the burden.

You will have to do some of the work, so stop watching that baseball game and get your head into a spreadsheet.


Here’s what you are going to need:

1. The player’s name and defensive innings
played at position.

2. The average number of innings a team
played in a season. Usually, it is about 1440,
depending on extra inning games, etc. In 2005,
the average AL team had 1441.0.

3. Obtain the league average ZR at position.
This can be done by “back calculating” from every
other player’s ZR. It’s a bit tricky, but you can do it.

4. All you need now is how many chances each
position has if they play every inning and the average
run value per play at that position.

And those are right... here:

Position AvgZROps Runs/play
1B 281 0.798
2B 507 0.754
3B 430 0.800
SS 532 0.753
LF 348 0.831
CF 462 0.842
RF 365 0.843

5. Mix thoroughly. Sprinkle lightly with arm
and/or double play ratings. You read that correctly. You
don’t need assists or putouts or anything. Yes, that’s
mildly annoying because it makes you think,
“that can’t be right”. But trust me. Just trust me.

6. Example:

Position  AvgIP  AvgOpps  AvgZR  Runs/play
1B        1440   281      0.871  0.798

Last    Team  INN     ZR
Helton  Col   1230.7  .916

Is that all the data I said you needed to gather/gave you?

Let’s start calculating!

Average Plays Made (PM) = Avg Chances (281) * Avg ZR (0.8710)
Average Plays Made (PM) = 244.75

So far, so good. That is the average plays made you
are going to compare every first baseman to.

Now the run value of that:

Runs Saved at Position = PM (244.75) times Runs/play (0.798)
Runs Saved at Position = 195.31

It is very important to remember that this represents the
Runs Saved (RS) for playing every inning. I refer
to this as a “cal”, after Cal Ripken, who played every
inning for about a decade.

At first base then, the Avg RScal = 195.3.

For Helton we get:
RScal = Helton’s ZR * 281 AvgZRChances * 0.798 runs/play
RScal = 205.4

RScal+ = RScal above average (simple subtraction)

Helton’s RScal+ = 205.4 – 195.3 = 10.1 RScal+

Converting to Helton’s playing time, RSpt:

RScal+ * Helton’s Innings / League Avg Innings

RSpt = 10.1 * 1230.7 / 1440 = 8.6

Yes, that can be a one-line formula, but I personally
like having “if he played every inning”.

To summarize Helton’s line:

Last    Team  INN     RScal  RScal+  RSpt  RS/150
Helton  Col   1230.7  205.4  10.1    8.6   9.4
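
For anyone who would rather script this than build a spreadsheet, the steps above can be sketched in a few lines of Python. This is only a sketch: the function and argument names are illustrative assumptions, not taken from the original spreadsheet, and the RS/150 column assumes that 150 defensive games means 1350 innings.

```python
def zone_runs(zr, innings, avg_opps, avg_zr, runs_per_play,
              league_innings=1440.0, innings_per_150=1350.0):
    """Return (RScal, RScal+, RSpt, RS/150) for one fielder."""
    rs_cal = zr * avg_opps * runs_per_play          # runs saved if he played every inning
    avg_rs_cal = avg_zr * avg_opps * runs_per_play  # same figure for the average fielder
    rs_cal_plus = rs_cal - avg_rs_cal               # runs above average per "cal"
    rs_pt = rs_cal_plus * innings / league_innings  # scaled to actual playing time
    # Per 150 games; 1350 innings per 150 games is an assumption of this sketch.
    rs_150 = rs_cal_plus * innings_per_150 / league_innings
    return rs_cal, rs_cal_plus, rs_pt, rs_150

# Helton's line, using the 1B row from the table above
helton = zone_runs(zr=0.916, innings=1230.7, avg_opps=281,
                   avg_zr=0.871, runs_per_play=0.798)
print([round(x, 1) for x in helton])
```

The first three values reproduce the 205.4, 10.1 and 8.6 worked out by hand. With unrounded intermediate values the per-150 figure lands slightly above the 9.4 in the summary line, presumably because of rounding or a slightly different innings base.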

As described previously, it would be better to also take Helton’s actual ZR chances, multiply them by the league average ZR (and the run value per play), and then subtract that from Helton’s RS.

I don’t have Helton’s actual ZR chances.  I think trying to re-estimate them is folly, as my “281” number is generated from some 56,000 innings and 11,000 chances.
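
If actual ZR chances were available, the "better" version described above might look like the sketch below. The chance count used here is made up purely for illustration (Helton's real number is not known), and the function name is an assumption.

```python
def runs_vs_average_actual(zr, actual_chances, avg_zr, runs_per_play):
    """Runs saved above average using a fielder's own ZR chances."""
    plays_made = zr * actual_chances      # plays the fielder actually made
    avg_plays = avg_zr * actual_chances   # plays an average fielder makes on the same chances
    return (plays_made - avg_plays) * runs_per_play

# HYPOTHETICAL chance count, for illustration only
HYPOTHETICAL_CHANCES = 240
example_runs = runs_vs_average_actual(0.916, HYPOTHETICAL_CHANCES, 0.871, 0.798)
print(round(example_runs, 1))
```

No playing-time conversion is needed in this form, because the fielder's own chances already reflect how much he played.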

Now you can set up your own spreadsheets and load the data.  You can post weekly updates, or just be ready to look at defensive value before you post your MVP selections.

It is very important to understand that ZR chances are like plate appearances.  If a fielder has only played enough innings to make 100 fielding plays, then it is too early to judge how good a fielder he is.  It’s like 100 PAs.  So don’t put too much weight, good or bad, on guys who haven’t played much.  It’s very important for catchers too, because they always play partial seasons.

Oh, more info?

Okay, for outfielders’ arms, I calculate the average assists per inning from every outfielder and then multiply by the league average innings.  Then I subtract that from the player’s assists with the proper playing time conversion.  Those are straight runs added to a player’s RSpt (or RScal+).
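
A rough sketch of that arm calculation, with each assist above average counted as one run, as described. The league and player totals below are invented for illustration, and the names are assumptions.

```python
LEAGUE_INNINGS = 1440.0  # innings in one "cal"

def arm_runs(assists, innings, lg_assists, lg_innings):
    """Return (arm RScal+, arm RSpt) from outfield assists."""
    lg_per_cal = lg_assists / lg_innings * LEAGUE_INNINGS  # average assists per cal
    player_per_cal = assists / innings * LEAGUE_INNINGS    # player's assists per cal
    cal_plus = player_per_cal - lg_per_cal                 # assists above average per cal
    pt = cal_plus * innings / LEAGUE_INNINGS               # scaled to playing time
    return cal_plus, pt

# HYPOTHETICAL totals: 300 assists over 43,200 outfield innings league-wide,
# and a player with 12 assists in 1,200 innings.
arm_cal_plus, arm_pt = arm_runs(assists=12, innings=1200.0,
                                lg_assists=300, lg_innings=43200.0)
print(round(arm_cal_plus, 1), round(arm_pt, 1))
```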

I do not park adjust, pitching staff adjust, groundball/flyball adjust.  I am really skeptical of that still – because I haven’t worked it out for myself.  I’m stubborn that way, but I also have never seen the data broken out in that manner.

Catchers are done as an amalgam of caught stealing per inning above average, stolen bases per inning above average, errors per inning above average and passed balls per inning above average, at an average base advancement of 0.31 runs per.  I have vacillated between a couple of catcher run value calculations.  Someone needs to make an argument on which way I should go.
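
One possible reading of that catcher amalgam is sketched below. It is an assumption, not the settled formula: caught stealing above average is counted as a credit, while stolen bases, errors and passed balls above average are counted as debits, all at 0.31 runs each. The league and player totals are invented.

```python
RUNS_PER_BASE = 0.31  # average base advancement value from the text

# Assumed signs: CS helps the catcher; SB, E and PB hurt him.
EVENT_SIGNS = {"cs": +1, "sb": -1, "e": -1, "pb": -1}

def catcher_runs(player, league, innings):
    """Runs above average for a catcher from per-inning event rates."""
    total = 0.0
    for event, sign in EVENT_SIGNS.items():
        expected = league[event] / league["innings"] * innings  # average catcher, same innings
        total += sign * (player[event] - expected)
    return total * RUNS_PER_BASE

# HYPOTHETICAL data
league_totals = {"innings": 10000.0, "cs": 200, "sb": 500, "e": 50, "pb": 50}
player_totals = {"cs": 30, "sb": 40, "e": 5, "pb": 3}
catcher_example = catcher_runs(player_totals, league_totals, innings=1000.0)
print(round(catcher_example, 2))
```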

Is this method any good?

I like my method.  I think defensive evaluations from a zone perspective are the best way.  I think this method is robust due to the sheer volume of data input.  How do I compare it to anything?  I know UZR is pretty good, so I took MGL’s posted numbers in the Gold Glove articles, and David Gassko’s (DSG) from his article at The Hardball Times.  Mr. Smith (Rally) provided all of us with all his numbers for players with 250 innings.

I have 56 data points from MGL.  I have 100 from DSG.  I used 122 from Rally.  The reason I used 122 from Rally is that I stuck to the guys whose posted numbers you have already read.  Besides, you’ll see there isn’t much point in adding more between Rally and me.  We agree very tightly.

I made comparisons to each other, all based on 150 defensive games (1350 innings played) excluding outfield arms and double plays.

Correlations:



        vs MGL  vs DSG  vs Rally
Dial    0.82    0.60    0.97
Rally   0.80    0.61
DSG     0.61

Feel free to square those if you can’t do that in your head.  The complete list of data used can be found here.

What about absolute differences?  You may have seen various discussions (see the AL Gold Glove article) regarding the Nick Swisher number.  MGL has Swisher at +37, while Rally and I had him at average and DSG had him very negative (-23).  Somebody, somewhere has something off.

Average Absolute Difference:


        vs MGL  vs DSG  vs Rally
Dial     9.9    12.5    3.0
Rally   10.0    12.2
DSG     12.2

I’m certain there is a better statistical way to compare these results, but I don’t know what it is.
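
For reference, both comparisons (correlation and average absolute difference) can be computed as in the sketch below. The two rating lists are made-up numbers, not the data behind the tables, and the function names are assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def mean_abs_diff(xs, ys):
    """Average absolute difference, in runs, between two sets of ratings."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# HYPOTHETICAL runs-per-150 ratings for the same four players from two systems
dial = [10.0, -5.0, 0.0, 3.0]
other = [12.0, -9.0, 1.0, 0.0]

r = pearson(dial, other)
mad = mean_abs_diff(dial, other)
print(round(r, 2), round(r * r, 2), mad)  # r, r squared, average absolute difference
```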

By position:

		
Correlations with MGL
Pos   Dial   Rally   DSG
2     1.00   0.98    0.98
3     0.88   0.88   -0.28
4     0.86   0.85    0.91
5     0.84   0.83    0.66
6     0.69   0.59    0.83
7     0.99   0.99    0.95
8     0.88   0.84    0.83
9     0.74   0.78    0.14


As DSG noted in his article at The Hardball Times, he has issues to resolve at first base and right field.  However, at the middle infield positions, shortstop and second base, DSG’s are significantly better than either ZR method.  It appears “Range” is capturing something that agrees better with UZR.  I have no comment on whether or not that makes it more “correct”.

Rally’s method correlates well everywhere with some question at shortstop.  I don’t know if his double play addition would increase that, but MGL’s doesn’t include double plays.

My rating does very well.  The worst correlation is 0.69, with 0.85 or greater (0.9 after rounding) at five of the eight positions.

I am using UZR as a baseline because it is well respected.

If I remove the four worst matches – the players each method disagreed with UZR the most - the correlations increase (duh).  Removing those four moved my correlation to 0.87, Rally’s to 0.84, but most amazingly, DSG’s to 0.83.  That’s a huge jump in DSG’s numbers.  That’s basically saying 7% of these players are problematic when comparing a non-pbp method to a pbp method.  I think that’s really good.  I don’t know if that 7% can be eliminated.
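
The "remove the worst matches" check can be sketched as follows. The ratings are invented (one player is given a Swisher-sized disagreement), and the helper names are assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def drop_worst(xs, ys, k):
    """Drop the k pairs with the largest absolute disagreement."""
    ranked = sorted(range(len(xs)), key=lambda i: abs(xs[i] - ys[i]), reverse=True)
    dropped = set(ranked[:k])
    kept = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in dropped]
    return [x for x, _ in kept], [y for _, y in kept]

# HYPOTHETICAL ratings; the last player is the big disagreement
uzr = [0.0, 5.0, 10.0, 15.0, 37.0]
zr_runs = [1.0, 4.0, 11.0, 14.0, 0.0]

r_before = pearson(uzr, zr_runs)
uzr_kept, zr_kept = drop_worst(uzr, zr_runs, k=1)
r_after = pearson(uzr_kept, zr_kept)
print(round(r_before, 2), round(r_after, 2))
```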

I haven’t worked up all of the 2000-2003 data for which there is a bunch of UZR data available.  One of you more industrious fellows can take my methodology, make your own spreadsheet and compare to UZR.  Or you can wait until I get around to it.  Which will happen.  No, really.

Each of you is now armed with the ability to accurately estimate how many runs a defender saves at any moment.  No more “but how good is his glove”?  No more “will MGL post UZR”?  Well, we still want that, but you can feel pretty confident using this methodology that you are right on it.

Plus, when you vote for MVP, or ROY, you can do it with 50%, no wait, 52% more knowledge.

HTH.

In the spirit of open research, the data used to calculate these defensive ratings can be found here.

Chris Dial Posted: November 11, 2005 at 03:19 AM | 164 comment(s)

Reader Comments and Retorts

   1. Russ Posted: November 11, 2005 at 01:30 AM (#1727243)
You can use something like intraclass correlation to measure agreement between multiple raters and agreement with a gold standard. It's not the "best" way to measure, but it's a bit better than correlation because two things can be correlated, but don't agree very well. That can be important for methods that think they're talking about runs saved (as everyone "knows" what a run is).
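
To illustrate the point about correlation versus agreement, here is a small sketch. It does not implement intraclass correlation; for brevity it swaps in a related agreement measure, Lin's concordance correlation coefficient, and the two sets of ratings are invented.

```python
from math import sqrt

def pearson_and_concordance(xs, ys):
    """Return (Pearson r, Lin's concordance coefficient) for two raters."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    r = cov / sqrt(vx * vy)
    # Concordance penalizes differences in scale and in mean, not just scatter.
    ccc = 2 * cov / (vx + vy + (mx - my) ** 2)
    return r, ccc

# HYPOTHETICAL: rater B always reports exactly twice the runs of rater A
rater_a = [-10.0, -5.0, 0.0, 5.0, 10.0]
rater_b = [-20.0, -10.0, 0.0, 10.0, 20.0]
r_ab, ccc_ab = pearson_and_concordance(rater_a, rater_b)
print(round(r_ab, 2), round(ccc_ab, 2))
```

The two raters are perfectly correlated, yet they disagree about how many runs every non-average player is worth, and the agreement measure drops accordingly.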
   2. DSG Posted: November 11, 2005 at 01:47 AM (#1727263)
Chris,

First off I just want to commend you on a great article. Secondly, I can suggest why my method has some of the problems it does at first base and in right:

1) At first, I'm apparently not doing a great job of measuring independent putouts. I don't know why, maybe it's because I include pitcher putouts, or maybe my infield fly adjustment is off (I'm going to need to vet that w/ PBP data), but the point is that Range disagrees with non-PBP metrics at first where there should be high agreement, and the non-PBP metrics certainly should not be closer to what ZR/UZR are saying.

2) In RF, the only problem I can think of is my treatment of line drives. In Range, it is assumed that right-handed batters hit 1.3x more fly balls and line drives to RF than left-handers. While this is true for all BIP, perhaps the break down is something like 1.5x for FB and .5x for LD (since LD are probably more likely to be pulled). In that case, my ratings for RF should be "funkier" than those for left since there are more right-handed batters, but not by this much. In LF, UZR and Range have a perfect correlation, so there has to be something else here as well. Maybe I entered the data incorrectly (though that is somewhat unlikely). I'll have to check.

Either way, this point really made me feel good:

If I remove the four worst matches – the players each method disagreed with UZR the most - the correlations increase (duh). Removing those four moved my correlation to 0.87, Rally’s to 0.84, but most amazingly, DSG’s to 0.83. That’s a huge jump in DSG’s numbers. That’s basically saying 7% of these players are problematic when comparing a non-pbp method to a pbp method. I think that’s really good. I don’t know if that 7% can be eliminated.

That's what I'm working on. I think Range is trustable everywhere but first and RF (and partially 3B). You'll notice of course that those are all corner positions, so it's possible that the BIP estimates just aren't as good there. In the end, that is what's being compared here: how well each system estimates how many plays a player should have made.
   3. Chris Dial Posted: November 11, 2005 at 01:48 AM (#1727265)
Russ,
send a "how to", or better yet, use the spreadsheet and report your findings.
   4. GGC don't think it can get longer than a novella Posted: November 11, 2005 at 01:59 AM (#1727269)
What does this have to do with Dick Stuart?

Heh, seriously Chris. I like the series so far. I just have to block out an afternoon, maybe this Saturday, and go through it. Defense is one area that I know little about.

I think that I met you, btw, DSG. Were you at a SABR regional in Boston around MLK day?
   5. Los Angeles Waterloo of Black Hawk Posted: November 11, 2005 at 02:57 AM (#1727346)
So this was written by Dial but posted by Szym?

Whoever did it, this is good stuff; this is not dissimilar to how I attempted to convert David Pinto's PMR into runs last offseason (my approach, of course, based on work Dial did 298 years ago on Usenet).
   6. Dan Szymborski Posted: November 11, 2005 at 03:14 AM (#1727358)
Crap.
   7. Dan Szymborski Posted: November 11, 2005 at 03:16 AM (#1727359)
This was all written by Chris Dial. For some reason, I can't select author for posts I open in Dialed In, so it's erroneously credited to me. This should be fixed soon, I hope.
   8. Los Angeles Waterloo of Black Hawk Posted: November 11, 2005 at 03:21 AM (#1727362)
Looking back at this for a second:
Position   AvgZROps  Runs/play   Runs/Season  AvgZR*  Runs/Season
                                  Perfect               Average
   1B         281      .798        224.24     .870       195.09
   2B         507      .754        382.28     .822       314.23
   3B         430      .800        344.00     .783       269.35 
   SS         532      .753        400.60     .835       334.50
   LF         348      .831        289.19     .861       248.99
   CF         462      .842        389.00     .888       345.44
   RF         365      .843        307.70     .873       268.61
*AvgZR is the average of the averages of the two leagues
This might sound stupid, but does this tell us anything about the relative importance of players at different positions? An average SS will prevent 139.41 more runs in a season than an average 1B will. That strikes me as a huge difference, obviously far larger than the offensive difference between the positions, which I doubt is more than 20-25 runs in the modern game.

Let's say:
Pos  Runs Created     Runs Prevented       Total
1B       100                195             295
SS        80                335             415
Total    180                530             710
That's the average. But what if you keep your average SS and swap in a 1B who is 95% of average offensively and defensively:
Pos  Runs Created     Runs Prevented       Total
1B        95                185             280
SS        80                335             415
Total    175                520             695
But what if you keep your average 1B and take the 95% SS?
Pos  Runs Created     Runs Prevented       Total
1B       100                195             295
SS        76                318             394
Total    176                513             689
Your average SS is 21 runs better than the 95% SS, and your average 1B is 15 runs better than the 95% 1B.

I actually have to go, so I ask: does any of that mean anything?
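
The arithmetic in this comment can be re-run with a short sketch. The position figures are the ones from the table in the comment; the function names are assumptions, and the 95% players are rounded to whole runs the way the comment does.

```python
# (AvgZROps, Runs/play, AvgZR) per position, from the table in this comment
POSITIONS = {
    "1B": (281, 0.798, 0.870), "2B": (507, 0.754, 0.822),
    "3B": (430, 0.800, 0.783), "SS": (532, 0.753, 0.835),
    "LF": (348, 0.831, 0.861), "CF": (462, 0.842, 0.888),
    "RF": (365, 0.843, 0.873),
}

def runs_per_season(opps, runs_per_play, zr=1.0):
    """Runs prevented over a full season; zr=1.0 is the 'perfect' fielder."""
    return opps * runs_per_play * zr

average = {pos: runs_per_season(o, rpp, zr) for pos, (o, rpp, zr) in POSITIONS.items()}
ss_minus_1b = average["SS"] - average["1B"]

def scale(player, factor):
    """Scale (runs created, runs prevented) and round to whole runs."""
    return tuple(round(v * factor) for v in player)

def team_total(first_base, shortstop):
    return sum(first_base) + sum(shortstop)

avg_1b, avg_ss = (100, 195), (80, 335)
total_avg = team_total(avg_1b, avg_ss)
total_95_1b = team_total(scale(avg_1b, 0.95), avg_ss)
total_95_ss = team_total(avg_1b, scale(avg_ss, 0.95))
print(round(ss_minus_1b, 2), total_avg, total_95_1b, total_95_ss)
```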
   9. Dan Szymborski Posted: November 11, 2005 at 03:37 AM (#1727371)
Good thinking Russ - comparing defensive runs is kinda akin to grading figure skating judges - we don't have these rock-solid xs and ys to choose from (if this is confusing, Russ will probably explain it a lot better).

        vs MGL  vs DSG  vs Rally
Dial    0.68    0.51    0.96
Rally   0.70    0.52
DSG     0.61
   10. MAH Posted: November 11, 2005 at 03:39 AM (#1727373)
Chris,

Just terrific. I'll want to redo the 2001-03 DRA/ZR/UZR test using your calculation. Can I call it "Dial Zone Runs" ("DZR")?

"3.Obtain the league average ZR at position.
This can be done by “back calculating” from every
other player’s ZR. It’s a bit tricky, but you can do it."

Does this mean that you calculate the league-average ZR at that player's position _excluding_ the player you're rating? Conceptually that makes sense, but is it necessary to go through that extra effort? Does it make a meaningful difference if you just use the league-average ZR?

"4.All you need now is how many chances each
position has if they play every inning and the average
run value per play at that position."

Are the league-average total chances per position per team per 1440 innings provided immediately below the quoted language based on 2005 data, or a larger sample? If the larger sample is used, does that mean that DZRs over the course of a season might not add up to zero each season? Even if that is so, it's not a big deal--just want to know.

I guess that's another way of asking whether the same "total chances" numbers can be used with 2001-03 ZR ratings.

As I understand DZR, you're effectively calculating (i) the number of runs the player would have saved compared to the league-average fielder if he had the league-average number of chances, as opposed to calculating (ii) the number of runs the player saved compared to what the league-average player would have saved if the league-average player had had the same number of chances the player actually had. Again, that's not a criticism, because you're getting better correlations with UZR than I had been able to derive trying to make estimate (ii) using ZR.

For purposes of testing a non-PBP system, do you think it is a good idea to test the non-PBP system against some sort of weighted average or linear combination of ZR and DZR? My personal view is yes, as I think there are sufficient differences in the strengths and limitations of each system to believe a combination is better than either alone.

Thanks again.
   11. Chris Dial Posted: November 11, 2005 at 03:48 AM (#1727379)
An average SS will prevent 139.41 more runs in a season than an average 1B will. That strikes me as a huge difference, obviously far larger than the offensive difference between the positions, which I doubt is more than 20-25 runs in the modern game.

This is true.

This is why I have always argued that you can't judge players as hitters and fielders, but as a combo of the two. Tango prefers to judge them the former way (I think).
   12. Chris Dial Posted: November 11, 2005 at 03:49 AM (#1727380)
Oh, yes I think so, Blackhawk.

It strongly points up that defense is very important.
   13. Chris Dial Posted: November 11, 2005 at 03:59 AM (#1727385)
MAH,
the answer to 3 is that I don't bother to subtract out the individual in question. For 60 players it isn't so necessary.

If the larger sample is used, does that mean that DZRs over the course of a season might not add up to zero each season?

The larger sample was used to generate a "chances per inning" rate. Hmm, I didn't multiply it by the corrected total innings though. That's a slight error in the formula. See, I did this without intending for it to work this well...Thanks, MAH, that may increase my accuracy. However, average innings played each season is not too variable.

I guess that's another way of asking whether the same "total chances" numbers can be used with 2001-03 ZR ratings.

Absolutely. However, it may improve it to multiply by "innings played in season you are looking at/1440". That'll be very close to one.

My personal view is yes, as I think there are sufficient differences in the strengths and limitations of each system to believe a combination is better than either alone.

Yes.

And as far as your i vs ii question - I'd prefer ii, but don't have *actual* ZR opps for every season - this approximation is a good one though.
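
The innings correction mentioned in this comment is a one-line scaling. A minimal sketch, with an assumed function name:

```python
def season_chances(base_chances, season_innings, base_innings=1440.0):
    """Scale the per-1440-inning chance total to a given season's average innings."""
    return base_chances * season_innings / base_innings

# 1B chances for the 2005 AL, where the average team played 1441.0 innings
al_2005_1b = season_chances(281, 1441.0)
print(round(al_2005_1b, 2))
```

As the comment says, the factor is very close to one, so the adjustment barely moves the chance total.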
   14. DSG Posted: November 11, 2005 at 06:04 AM (#1727448)
I think that I met you, btw, DSG. Were you at a SABR regional in Boston around MLK day?

No, sorry. I have yet to attend a SABR thing, or join for that matter. I really should do both.

This might sound stupid, but does this tell us anything about the relative importance of players at different positions?

It's not and it does. I do this sort of thing in an article I'll publish on THT, probably Monday. For example, 139.41 runs will be worth about 6.75 Win Shares per 162. According to Raindrops, the difference in actual Win Shares is 4.81, a difference of less than 7 runs a year between the two, which isn't too bad. BTW, Win Shares pretty much nail the relative value of positions, at least based on the approach I took in my article.
   15. DSG Posted: November 11, 2005 at 06:38 AM (#1727464)
I posted about this on my blog and I want to drop it into the discussion here, because in all honesty, I'm truly baffled. About a week ago, I was playing around with some defensive numbers when it came to me that an easy and fun stat to use for infielders would be A/(A+E). I thought that this should work out pretty well, since taking out PO reduces the noise of F% and errors should stay pretty consistent based on a player's chances on ground balls. Today, I decided to run some correlations with zone rating for the hell of it, expecting a .5 or .6 so that I could post it on my blog and say, guess what? Modified Fielding % (mF%) works pretty well. The results I got were jaw-dropping. The correlations I got were,

2B = .87
3B = .85
SS = .85

You read that right. mF% correlates with ZR in the mid .8s in the infield. Which is, of course, insane. Someone has to perform the tests independently and see if they get the same result. If they do, then clearly this is the best way to evaluate defense at those positions for years where PBP data is unavailable. My jaw is just hanging loose right now, to be honest.
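
For anyone who wants to repeat the test, the two percentages can be computed as below. The fielding line is invented, and the function names are assumptions.

```python
def fielding_pct(putouts, assists, errors):
    """Traditional fielding percentage, F% = (PO + A) / (PO + A + E)."""
    return (putouts + assists) / (putouts + assists + errors)

def modified_fielding_pct(assists, errors):
    """Modified fielding percentage as defined above, mF% = A / (A + E)."""
    return assists / (assists + errors)

# HYPOTHETICAL infielder: 250 PO, 400 A, 15 E
f_pct = fielding_pct(250, 400, 15)
mf_pct = modified_fielding_pct(400, 15)
print(round(f_pct, 3), round(mf_pct, 3))
```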
   16. Los Angeles Waterloo of Black Hawk Posted: November 11, 2005 at 08:49 AM (#1727516)
Do you know the correlation between normal old fielding percentage and ZR?
   17. dcsmyth1 Posted: November 11, 2005 at 01:28 PM (#1727566)
----"Do you know the correlation between normal old fielding percentage and ZR?"

Not OTTOMH, but I've known for many years that Fld% correlates much better with fewest runs allowed on a team level than it "should", given the small number of errors made. Fld% carries lots of "hidden information". I don't think it's true that there are very many "statues" with good Fld% in MLB.

I was playing around with a stat similar to the one just posted by DSG a year ago. I don't recall the exact form, but I'm pretty sure I didn't exclude PO entirely--I had them in at some proportion (10%? 20%?)
   18. MAH Posted: November 11, 2005 at 03:21 PM (#1727610)
DSG,

Those are interesting findings regarding infield errors.

In developing DRA and testing the independent significance of infield errors, I also used a version of fielding percentage that excluded infield putouts, because, as you know, infield errors overwhelmingly occur on ground balls, either by mishandling the batted ball or by failing to make a successful throw to first.

In my February Hardball Times article I mentioned that ZR seemed to favor infielders who avoid errors, which makes sense. It stands to reason that an infielder who avoids errors will make a higher percentage of plays on balls hit reasonably close to him.

So the question, discussed at length above, is raised again about whether, particularly for infield ground balls, which are rarely discretionary plays, ZR should be covering the entire infield, rather than leaving out the "slices" in the gaps and up the middle. It would be interesting to know the percentage of batted balls fielded in the "uncounted" infielder slices. Zones are excluded if less than 50% of BIP in that zone are successfully fielded. Even if a zone has only a 30% success rate, it suggests that a lot of plays are being made there, and it could be precisely those plays that separate the excellent fielders from the merely solid ones.

This is one of the reasons why I think it's a good idea to use both UZR (which covers the whole field) and DZR for rating players. In the absence of UZR, a combination of DRA (which also counts all plays) and DZR would probably be a good "second best".

Duffy Duff,

Errors are, by definition, plays _not_ made. Therefore, a correlation does exist between errors and team runs allowed in the _absence_ of _other_ data regarding plays not made.

DRA generally finds that errors (except in right field) have no statistically significant impact on _team_ runs allowed, _given_ the number of plays _made_ (infield assists and outfield putouts) and batted balls in play.

I'm pretty sure there have been quite a few low-range but sure-handed fielders, particularly in the outfield. Greg Luzinski. Manny Ramirez. Brian Downing. In the infield, Joe Morgan and Larry Bowa come to mind.
   19. DSG Posted: November 11, 2005 at 03:42 PM (#1727634)
Do you know the correlation between normal old fielding percentage and ZR?

2B = .81
3B = .86
SS = .81

So this shocks me more, because I thought that putouts would provide a lot of noise. Let's see what happens if we remove all players with under 725 innings played (about half a season).

mF%

2B = .63
3B = .21
SS = .66

F%

2B = .62
3B = .19
SS = .63

Wow! I knew that restricting the list to only high-innings players would reduce the correlations but I did not expect that it would impact F% just the same as mF%. I would have expected mF% to hold up much better. Either way, it seems that this would only work at the middle infield positions, probably because errors there are more indicative of range than at third base, where they are generally an indication of a strong throwing arm that is perhaps a bit erratic. Either way, it doesn't seem to offer much more information than regular old F%.
   20. Chris Dial Posted: November 11, 2005 at 03:43 PM (#1727637)
You read that right. mF% correlates with ZR in the mid .8s in the infield. Which is, of course, insane. Someone has to perform the tests independently and see if they get the same result. If they do, then clearly this is the best way to evaluate defense at those positions for years where PBP data is unavailable.

I don't think it is.

Here's why: every A+E is included in ZR. So all you are leaving out is "plays not made in zone". Since the way ZR is defined is that you field at 85%, then you aren't *usually* going to be very far off if you use A+E. In fact, you will gain some ground at 2B/SS for DPs turned. They gain assists to make up for plays not made in their zone. It comes up with an interesting number, but the more DPs you turn, the "better" you look without fielding more GBs.
   21. AROM Posted: November 11, 2005 at 04:02 PM (#1727667)
Blackhawk,

In #8 you are looking at the number of runs an average/perfect player saves compared to not having a man at the position. To compare it to offense and be consistent, you have to consider not just RC but the negative runs that come from having a player using up 650 or so outs and never reaching base or advancing a runner.

Another way to compare the relative value of positions is the difference between a "perfect" fielder and an average one. +29 for 1B, +66 for shortstop, etc.
   22. AROM Posted: November 11, 2005 at 04:05 PM (#1727674)
Modified Fielding % (mF%) works pretty well. The results I got were jaw-dropping.

David, this is interesting. Is that correlation just a result of the nature of the numbers? mF% and ZR will almost always be in the .800 to .950 range. I wonder if the correlation will hold if you compare it to runs.

Thanks to all our great veterans, I have the day off to ponder questions like this. I'll look into it and post my results.
   23. DSG Posted: November 11, 2005 at 04:11 PM (#1727680)
I don't think it is.

I now agree. See the post I made right above you.
   24. Tom Cervo, backup catcher Posted: November 11, 2005 at 04:28 PM (#1727703)
So the question, discussed at length above, is raised again about whether, particularly for infield ground balls, which are rarely discretionary plays, ZR should be covering the entire infield, rather than leaving out the "slices" in the gaps and up the middle. It would be interesting to know the percentage of batted balls fielded in the "uncounted" infielder slices. Zones are excluded if less than 50% of BIP in that zone are successfully fielded. Even if a zone has only a 30% success rate, it suggests that a lot of plays are being made there, and it could be precisely those plays that separate the excellent fielders from the merely solid ones.

This is why I want a somewhat altered version of ZR. I asked Stats, Inc. if I could get a ZR that would be:

OIZ/BIZ
OOZ
(OIZ + OOZ) / BIZ

They said it would be about $2,000 for 2005 data, which I can't afford but if anybody else wants to...or wants to get a bunch of people to go in on it and put it somewhere where we'd all have access.

I would think you could also use DSG's range to estimate the number of GB that should have been hit while the fielder was playing and estimate the balls they get to out of their zone compared to others.

I'm also waiting for a response from Baseball Info. Solutions to see if they could do this.
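
The three quantities requested above are easy to compute once the raw counts exist. A sketch, with invented counts and assumed names (OIZ = outs in zone, BIZ = balls in zone, OOZ = outs out of zone):

```python
def altered_zone_ratings(oiz, biz, ooz):
    """Return (OIZ/BIZ, OOZ, (OIZ + OOZ)/BIZ) from raw zone counts."""
    in_zone_rate = oiz / biz
    combined_rate = (oiz + ooz) / biz
    return in_zone_rate, ooz, combined_rate

# HYPOTHETICAL shortstop: 400 balls in zone, 340 outs in zone, 40 outs out of zone
in_zone, out_of_zone, combined = altered_zone_ratings(oiz=340, biz=400, ooz=40)
print(in_zone, out_of_zone, combined)
```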
   25. Mike Green Posted: November 11, 2005 at 04:41 PM (#1727717)
I agree that one must look at offence and defence in combination, but the scale in post 8 is quite clearly inaccurate. When we say that an average offensive shortstop creates 80 runs, we know that this is approximately correct in the following sense. Nine batting order positions times 80 runs is 720 runs which is roughly what an average team scores.

If we say that an average shortstop prevents 335 runs, we are quite clearly not on the same scale as we were for run creation. The average team allows those same 720 runs, and that includes contributions from the pitching staff and team defence. There is little debate that team defence accounts for 10-20% of run prevention, which would allocate 72-144 runs to the defence. The average defensive shortstop will probably have a significant share of these runs, but it is doubtful that we are speaking of more than 25% of this figure or 18-36 runs.
   26. Mike Green Posted: November 11, 2005 at 04:45 PM (#1727724)
Oops. There is little debate that defence accounts for 20-40% of run prevention. The corresponding figures for the defence and shortstop are 144-288 and 36-72 respectively.
   27. Chris Dial Posted: November 11, 2005 at 04:49 PM (#1727732)
Actually, Tony, thanks. I may look into getting that.

As for BIP estimates, that exists. We already know that there are significant problems with that for some players.
   28. AROM Posted: November 11, 2005 at 04:56 PM (#1727748)
After converting mF% to runs, and using a 250 inning cutoff, I get these correlations:

2B .25
SS .45

Makes sense, as there's a lot more variation on error rate among SS than 2B.
   29. Chris Dial Posted: November 11, 2005 at 05:09 PM (#1727762)
If we say that an average shortstop prevents 335 runs, we are quite clearly not on the same scale as we were for run creation.

I'm aware that you say "not the same scale", but Pitchers/defense prevent HBIP at a 70% clip (effectively DER), while offense only achieves it at a 30% clip.

That means that for the 720 RA, there are actually some 1680 runs prevented (720 x 0.70 / 0.30). That's about 20% for the SS in the above example.

I don't think that's necessarily the "right" way to do it - I was more agreeing that a SS has more value than he appears because of a defensive premium.
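
The back-of-the-envelope numbers in this comment work out as in the sketch below. The function name is an assumption; the 70% rate and the 720 and 335 run figures come from the thread.

```python
def runs_prevented(runs_allowed, prevention_rate=0.70):
    """If runs allowed reflect the 30% that got through, scale up to the 70% that did not."""
    return runs_allowed * prevention_rate / (1.0 - prevention_rate)

total_prevented = runs_prevented(720)
ss_share = 335 / total_prevented  # average SS from the example in post 8
print(round(total_prevented), round(ss_share, 2))
```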
   30. DSG Posted: November 11, 2005 at 05:10 PM (#1727763)
Anaheim, your runs figures should be tiny since so few errors are made, even by the biggest offenders (it's tough to be more than 10 or so errors worse than average, I think). Of course, that does not impact correlation.
   31. Chris Dial Posted: November 11, 2005 at 05:11 PM (#1727766)
Mike,
I appreciate your contributions and hope you will have the time for more. Every new set of eyes helps.
   32. MAH Posted: November 11, 2005 at 05:25 PM (#1727788)
Tony,

Is it true that 2005 STATS, Inc. ZR data is only $2,000?

I thought Chris said it was $5,000, and I was under the impression that the data MGL bought a few years ago was about $10,000.

Sounds as though the price of zone data is falling as analysts here at BTF and at THT find ways of generating good fielder ratings from publicly available traditional data and publicly posted ZR percentage ratings.

Also, wouldn't we want to know OIZ/BIZ and OOZ/BOZ?

Chris and Tony,

Would STATS, Inc. sell the OOZ/BOZ and, for outfielders, _line drives_ and _flyballs_ separately?
   33. Chris Dial Posted: November 11, 2005 at 05:40 PM (#1727810)
MAH,
Each of these is a different package.

MGL gets every play. I was asking about each play that was fielded. Tony is asking about summary totals.

OIZ/BIZ
OOZ
(OIZ + OOZ) / BIZ

If STATS is providing ratios instead of raw numbers, that's a problem. We'd want the data, not the math done for us.
   34. Tom Cervo, backup catcher Posted: November 11, 2005 at 05:42 PM (#1727813)
Tony,

Is it true that 2005 STATS, Inc. ZR data is only $2,000?


Yeah. You don't get PBP data, but they would give you what I put there. I believe MGL, however, gets PBP data, which would be upwards of $10,000 they said.


Also, wouldn't we want to know OIZ/BIZ and OOZ/BOZ?

We would...at least BOZ that were hit in a zone that that position sometimes gets to. I'm not sure if that would cost more, though.

Chris and Tony,

Would STATS, Inc. sell the OOZ/BOZ and, for outfielders, _line drives_ and _flyballs_ separately?


Here's the first email I got when I asked them if this was possible:
"We could conceivably provide a zone rating for only balls hit to zone and then separately the total number of outs he turned on balls hit outside of zone – but let me check internally to be sure.

You’re looking for this on every player from 2005 only? And you were intending to post this list on your website?"

Then the second:
"Kyle,

I don’t have a final quote available yet, but I wanted to get back to you with an estimated price range. The cost on this would run between $1,000 - $2,000, depending on the final programming time estimate I receive from one of our developers. If you’d like to move forward on this we can set a time to talk so we can finalize details and I can get a final quote to you.

Let me know if you have any additional questions.

Patrick"

So I think you would get OOZ, but not BOZ or type of ball hit. I'm not sure how much that would cost.
   35. Mike Green Posted: November 11, 2005 at 05:58 PM (#1727839)
Chris,

When we use runs created for the offensive measure, we are not only considering HBIP, but homers, walks and strikeouts. Actually, it's mostly homers, walks and strikeouts. If you like, runs created is the flip side of both the pitcher's and the defence's contribution to run prevention. Apples to apples.
   36. Los Angeles Waterloo of Black Hawk Posted: November 11, 2005 at 07:17 PM (#1727998)
In #8 you are looking at the number of runs an average/perfect player saves compared to not having a man at the position. To compare it to offense and be consistent, you have to consider not just RC but the negative runs that come from having a player using up 650 or so outs and never reaching base or advancing a runner.

Oh, yes -- I figured there was something obvious I was missing. That's what I get for starting a massive post just when I'm supposed to be leaving work.
   37. Los Angeles Waterloo of Black Hawk Posted: November 11, 2005 at 07:36 PM (#1728023)
Another way to compare the relative value of positions is the difference between a "perfect" fielder and an average one. +29 for 1B, +66 for shortstop, etc.

Hmm ... that would seem to imply that a 1B faces a huge up-hill battle in trying to contribute as much as a SS, possibly larger than I'd have expected ...
   38. Los Angeles Waterloo of Black Hawk Posted: November 11, 2005 at 07:38 PM (#1728028)
Wait, that would also mean a 3B had more value than a SS, which doesn't make sense to me ...
   39. GuyM Posted: November 11, 2005 at 08:29 PM (#1728112)
I really don't think it's useful to approach this in terms of some absolute number of runs prevented. In the aggregate, a team "prevents" a virtually infinite number of runs. A 1B alone saves thousands of runs, compared to what a team would give up with first base empty (no more GB outs!). This just isn't meaningful. In order to say how many runs a SS saves a team, we need a different benchmark, such as a generic "average defensive player" (which is probably approx. the same as an average 3B) or -- my preference -- a replacement level player at that position.

The problem with defining a generic defensive player is that different positions require different skills -- a superb 3B might not have the footspeed to be even an adequate CF. While it's obviously true that an avg. SS has better defensive skills than an avg. 1B, it's not clear to me how useful it is to try to quantify that difference.
   40. Chris Dial Posted: November 11, 2005 at 08:36 PM (#1728121)
I don't think any of us are focusing on that, GuyM.

It indicates that there is a reason why the baselines for offense are different and thus it is always needed as a qualifier.

If the whole team required the skill of a SS defensively, Frank Thomas would have been doing something else. That has specific value.

I compare to "runs above average at position". I don't think there is a generic defensive player, and Tango did generate some "run scale", but I don't necessarily buy it (I think his sample sizes had issues).

As for defensive replacement player, defensive subs are generally better fielders than regulars. You can see this by looking at the aggregate ZR of the starters (>1000 Innings) as compared to the league average. The league average is usually higher. That usually means *most* of the non-starters are better fielders than the starters.
   41. Chris Dial Posted: November 11, 2005 at 08:48 PM (#1728142)
Mike,
I'm not sure we're on the same page. What's the value of an out in RC/XR or whatever? What is the value of an out prevented?
   42. GuyM Posted: November 11, 2005 at 09:11 PM (#1728211)
Chris,
I think we agree. I was referring only to the discussion btwn LAWBH and ARM (and agreeing with Mike G's earlier point). To the extent we want to measure the underlying defensive value of positions, differences in avg. offensive performance seems the right way to do it. Some day, when the Dial Zone Rating system is 100% accurate, I suppose we can start to track what happens when players move along the defensive spectrum, and then we'll know that an average SS is +12 runs if moved to 3B, or whatever it would turn out to be.

I agree that replacement players provide above-average defense, but this is a point that I think gets far too little attention. This means that at least half of position players provide below-RL defense (which is made up for by far above RL hitting), while very few provide below-RL hitting (offset by exceptional defense). This has important implications for the Win Shares system, which allocates 17% of total player value -- and 33% of position player value -- to fielding. In fact, if value means performance over replacement level then 0% should be allocated to fielding in the aggregate. Individual players can of course vary tremendously in fielding, but the variance is around a mean of zero (as your system does) because league-average and replacement are essentially one and the same. To handle this, WS needs to allow for negative fielding win shares. And the failure to do this badly distorts the rest of the WS system.
   43. MAH Posted: November 11, 2005 at 09:17 PM (#1728225)
"As for defensive replacement player, defensive subs are generally better fielders than regulars. You can see this by looking at the aggregate ZR of the starters (>1000 Innings) as compared to the league average. The league average is usually higher. That usually means *most* of the non-starters are better fielders than the starters."

Wow. I had always sorta thought that would be true, but it's good to know.
   44. Los Angeles Waterloo of Black Hawk Posted: November 11, 2005 at 09:25 PM (#1728243)
To the extent we want to measure the underlying defensive value of positions, differences in avg. offensive performance seems the right way to do it.

I understand why this is commonly done, but it has never struck me as the proper approach.

I would guess that the difference in the average offensive contribution between 1B and SS has declined significantly over the last twenty years -- or at least declined when A-Rod, Jeter, Nomar, and Miguel Tejada entered the scene.

Does it follow that the difference between the underlying defensive value of SS and 1B has declined? It's possible, but that does not follow from the fact that there are (theoretically) more good hitters at SS now.
   45. GuyM Posted: November 11, 2005 at 09:44 PM (#1728282)
Does it follow that the difference between the underlying defensive value of SS and 1B has declined?

I think it probably does follow. This is a high-K, high-HR, low-BA (relatively) era. In such conditions, the fielding/hitting tradeoff changes. The value of superior defense at SS (which mainly involves preventing singles) relative to offensive prowess (especially power) shifts in favor of offense. A Dal Maxville type (good glove, no hit for you kids) is no longer a viable player in today's game -- whatever defensive gain there is can't equal the offensive drain. But at 1B, this change in the game will have a much smaller impact, if any, since this is already where the least-talented defensive players reside. So as defense gets sacrificed, the gap will inevitably shrink.

At the same time, it may be that the defensive gap btwn today's good-hitting SSs and the Maxville/Belangers of old isn't as great as we think. It could be that an old stereotype -- "big guys can't play MI" -- was overthrown, creating a one-time historical shift in offensive production at these positions.
   46. Los Angeles Waterloo of Black Hawk Posted: November 11, 2005 at 09:51 PM (#1728294)
I mean, amongst regular SS in the AL of 1985 (per BB-ref), there was only one guy with an OPS+ above 100 -- Ripken. The AL of 2005 had six SS with an OPS+ over 100.

Ozzie Smith and Hubie Brooks (102 and 106, respectively) were the only two guys over 100 in the NL that year (Templeton was at 99). Well, the NL only had two this year, so that's not too different ...

... but in 2005, 26.7% of teams had a regular SS with an OPS+ over 100, and in 1985 it was 11.5%.

Are we to believe that SS are less important defensively in 2005 than in 1985? And, to the degree that they might be (due to increased strikeouts), that is true of all positions.
   47. Los Angeles Waterloo of Black Hawk Posted: November 11, 2005 at 09:52 PM (#1728297)
Should have refreshed ... my 46 should be read prior to GuyM's 45 ...
   48. Tom Cervo, backup catcher Posted: November 11, 2005 at 10:11 PM (#1728331)
Chris,

How important is it to do separate players for AL and NL? I'm working on a spreadsheet for qualified players in 2005.
   49. Los Angeles Waterloo of Black Hawk Posted: November 11, 2005 at 10:15 PM (#1728338)
This is a high-K, high-HR, low-BA (relatively) era. In such conditions, the fielding/hitting tradeoff changes. The value of superior defense at SS (which mainly involves preventing singles) relative to offensive prowess (especially power) shifts in favor of offense. A Dal Maxville type (good glove, no hit for you kids) is no longer a viable player in today's game -- whatever defensive gain there is can't equal the offensive drain. But at 1B, this change in the game will have a much smaller impact, if any, since this is already where the least-talented defensive players reside. So as defense gets sacrificed, the gap will inevitably shrink.

I do think this makes sense, and is generally true. But I don't see why there would necessarily be a 1:1 relationship between the differences in offensive and defensive values between two positions.

It strikes me that using offensive difference to set the defensive difference is simply using a proxy. That proxy is somewhat reasonable, as your example demonstrates. But I've differed from orthodox sabermetrics on this for awhile, as I've just never bought the relationship being that simple.

For that reason, I really like the concept behind Tango's "generic defensive player," and I use that as a starting point for my own positional adjustments.
   50. Tom Cervo, backup catcher Posted: November 11, 2005 at 10:50 PM (#1728408)
NAME             TEAM  INN     G      FPCT   ZR     OPP  R    R/162
Neifi Perez      CHC   1063.1  118.1  0.982  0.882  646   21   28
Orlando Cabrera  LAA   1241.2  137.9  0.988  0.844  691   22   26
Jack Wilson      PIT   1360.0  151.1  0.982  0.884  886   22   23
Juan Uribe       CWS   1293.1  143.7  0.977  0.852  806   20   23
Omar Vizquel     SF    1292.1  143.6  0.988  0.872  766   18   21
Jhonny Peralta   CLE   1232.1  136.9  0.970  0.852  750   16   19
Adam Everett     HOU   1292.2  143.6  0.978  0.875  736   14   16
Julio Lugo       TB    1339.2  148.8  0.968  0.841  902    8    9
Derek Jeter      NYY   1353.2  150.4  0.979  0.830  881    8    9
Rafael Furcal    ATL   1306.1  145.1  0.981  0.858  902    7    7
Khalil Greene    SD    1029.2  114.4  0.971  0.860  566    2    3
Jimmy Rollins    PHI   1356.0  150.7  0.981  0.847  745    1    1
Angel Berroa     KC    1360.1  151.1  0.965  0.827  871   -2   -3
J.J. Hardy       MIL    938.2  104.2  0.975  0.848  474   -2   -3
Miguel Tejada    BAL   1395.2  155.0  0.971  0.818  921   -4   -5
Alex Gonzalez    FLA   1087.1  120.8  0.974  0.846  714   -4   -5
David Eckstein   STL   1341.2  149.0  0.981  0.833  932   -7   -8
Michael Young    TEX   1356.0  150.7  0.974  0.807  846   -9  -10
Felipe Lopez     CIN   1175.1  130.6  0.970  0.836  671  -12  -15
Jose Reyes       NYM   1398.1  155.3  0.974  0.821  832  -18  -19
Royce Clayton    ARI   1177.1  130.8  0.982  0.815  730  -17  -21
Edgar Renteria   BOS   1293.0  143.7  0.954  0.809  810  -20  -23
Cristian Guzman  WAS   1161.0  129.0  0.973  0.804  695  -28  -35
Russ Adams       TOR   1100.0  122.2  0.952  0.780  700  -40  -53
   51. Tom Cervo, backup catcher Posted: November 11, 2005 at 10:50 PM (#1728409)
Okay, that looked fine on the live preview. ####.
   52. Tom Cervo, backup catcher Posted: November 11, 2005 at 10:52 PM (#1728413)
And I have a question. Does the (BIZ + BOZ) part include balls the fielder got to but made errors on? IIRC, it does, which is why I went one step further and compared each player to the average fielder on errors to get a +/- and run value on that as well.
   53. GuyM Posted: November 11, 2005 at 10:52 PM (#1728414)
LAWBH:
FYI, my back-of-the-envelope calculation is that in 1985 81.6% of ABs in the NL yielded a BIP, while in 2005 the figure was 77.9%. That strikes me as a pretty big change, with real consequences for the relative importance of defense.

Anyway, I think the link between offensive and defensive gaps is real and logical. Think of it this way: 1) variance in offensive ability is much greater than defensive ability (in terms of run consequences), so therefore 2) the default position for assembling a team is to first get the 8 best hitters you can. Then 3) at each position, you have to decide how much offense you are willing to give up in order to get adequate defense there. The resulting offensive falloff becomes a pretty good estimate of the presumed value provided by the defensive upgrade obtained at any given position. If teams are willing to give up 20 runs on offense at SS (I'm making that # up), then experience tells them they are saving those 20 runs on defense compared to putting a good hitter there who can't handle the position.

Of course, this assumes that ML teams are doing this tradeoff correctly, given the complex distribution of combined hitting/fielding talent available in the pool of position players. I believe that's a reasonable assumption given a lot of teams trying different ways to win over many years (if teams really could win more games by putting good hitters at SS and 2B and ignoring defense, or doing the reverse at 1B/LF/RF, I think we'd know that by now).
   54. AROM Posted: November 11, 2005 at 11:42 PM (#1728501)
How important is it to do separate players for AL and NL? I'm working on a spreadsheet for qualified players in 2005.

It makes a bit of a difference. When I published my ratings, I did not separate the leagues. NL shortstops were better than the AL, while the AL had the better centerfield zone ratings.

At those 2 positions, Chris zorrelates much better than me with UZR, while we are about the same at other positions.

It's worth separating the leagues, and I'll probably do it in the future.
   55. MAH Posted: November 12, 2005 at 02:40 AM (#1728712)
Chris,

Have you calculated the standard deviation of DZR ratings compared to the standard deviation in UZR ratings? If you have, do you have any thoughts regarding that?

Thanks.

Michael
   56. Chris Dial Posted: November 12, 2005 at 06:10 AM (#1728839)
How important is it to do separate players for AL and NL? I'm working on a spreadsheet for qualified players in 2005.

I think it is important. See Rally's response.

Have you calculated the standard deviation of DZR ratings compared to the standard deviation in UZR ratings? If you have, do you have any thoughts regarding that?

I don't think I understand the question. I don't really have enough for 2005 to generate a decent SD at each position for UZR.

And all my data for this year is right there for your calculating...
   57. AROM Posted: November 12, 2005 at 06:42 AM (#1728859)
At those 2 positions, Chris zorrelates much better than me with UZR

D'oh. Should be correlates. Don't know if what I typed is a word, but if it is then it's probably something Antonio Banderas does to your clothes with a rapier.
   58. Mefisto Posted: November 13, 2005 at 12:55 AM (#1729341)
I just want to add my support for LAWoBH on the use of proxy measures for defense. I don't like them.

GuyM makes a good point about declining BIP, but I don't see that the increase in offensive production by SS necessarily tracks that 1-1. Since the 4% decline in BIP must be distributed over all the defensive positions, I'd find it hard to explain the offensive difference between Ozzie Smith and A-Rod by that measure. It's also the case that OPS+ is a relative measure, that is, that if the offense of SS goes up, offense somewhere else must go down. Yet, as I said above, the decline in BIP must affect ALL fielders (SS more, of course). Have 2B and 3B increased their OPS+ in that time? Can we say that there is more of a preference for hitters at all positions these days? Finally, it's clear that MLB tends to go through "fads" at different positions (CF in the 50s, 3B in the 70s, SS today, etc.) which make it harder to justify the assumption that OPS+ tracks defensive demands.

It seems to me that the way to handle it is to (a) decide how many relevant chances the player has to handle (e.g., eliminating IF popups); and (b) adjust for the different degrees of difficulty in the chances handled. If we had a scale for this, we could then create an absolute standard which distinguishes each position. The first part seems easy to do -- in essence, it's what ZR and UZR already do, and we could use IF assists or OF POs as proxies where we lack PBP data. The second part is intuitively clear but harder to quantify -- how do you compare, say, 3B to CF?
   59. studes Posted: November 15, 2005 at 01:34 PM (#1732342)
Ah Chris, thank you. I'm only catching up on the fielding discussions you guys have been having (great job!) but you just helped me realize that I made a mistake in one of the articles in the Hardball Times Annual. I used the wrong run values in an analysis I performed of team defense, because I forgot to include the out prevented. Just a silly oversight. I'll post a correction on the site, of course, but I hate that. The book isn't even out of the print shop yet!

Anyway, I applied a simple fielding analysis to each team, based on outfield flies, groundballs, etc. fielded for outs or not. With the updated values, I found that the difference between the best fielding team (Athletics) and worst team (Royals) was about 210 runs, which is a huge amount (about 2/3 of the total difference in runs allowed between the best and worst major league teams). This was a much larger difference than in 2004 (125 runs between best and worst). Does this make sense to the posters here? And what does it say about the relative value of fielding vs. pitching, especially in 2005?
   60. Chris Dial Posted: November 15, 2005 at 01:53 PM (#1732350)
The trick to relative value of fielding, and I am not sure if you included this, is how many GBs went for hits or what percentage of line drives were hit. BIP distribution is nearly everything.

One reason I struggle with "accepting" traditional stat-based measures is that even with the GBs in play and the percentage going for hits, you can't tell if they were "fieldable" by a fielder at all. ZR tempers that dramatically.

If we know that each team had 2000 GBs (which is a pretty fair estimate), and the A's have an infield ZR of 0.88 and the Royals had an IF ZR of 0.83, and EVERY BALL was in a fielder's zone (not true), we'd only be looking at 100 plays/80 runs. Realistically, on average, only 1800 of those GBs are fieldable (about 90%). That means only about 70 runs on the infield.

In the OF, each team allows about 1450-1500 FBs (not HRs). Approximately 1200 are "catchable", *independent of FB/LD distribution*.

The A's/Royals catch about the same ratios above, yielding another 30 runs.

So I'd "put the fielding blame" on about 100 runs of the defense. 210 runs, without seeing your gymnastics, and using this crude "back of the envelope data" (that is grounded in reality from years of calculating this), seems much too high for a "fielder". I would be more inclined to say the rest of the 210 runs is driven by BIP distribution or hit clustering, assuming you are using actual runs.
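
The infield part of this back-of-the-envelope estimate can be reproduced with a few lines. This is only a sketch: the 0.78 runs-per-play figure is an assumed rough average of the four infield values in the article's table, and the function name is made up for illustration.

```python
# Assumed: rough mean of the article's 1B/2B/3B/SS runs-per-play values.
INFIELD_RUNS_PER_PLAY = 0.78


def runs_gap(balls, zr_good, zr_bad, runs_per_play, fieldable_share=1.0):
    """Plays and runs separating two defenses on the same batted balls."""
    plays = balls * fieldable_share * (zr_good - zr_bad)
    return plays, plays * runs_per_play


# Every ground ball treated as fieldable (the "not true" upper bound).
plays_all, runs_all = runs_gap(2000, 0.88, 0.83, INFIELD_RUNS_PER_PLAY)

# Only about 90% of ground balls treated as fieldable.
plays_90, runs_90 = runs_gap(2000, 0.88, 0.83, INFIELD_RUNS_PER_PLAY, 0.90)

print(round(plays_all), round(runs_all))
print(round(plays_90), round(runs_90))
```

Under those assumptions the two cases land near the 100 plays/80 runs and roughly 70 runs quoted above.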
   61. Mike Emeigh Posted: November 15, 2005 at 02:27 PM (#1732365)
The A's also have a built-in park factor; the relatively large amount of foul territory allows foul pops and fly balls to be caught that would be in the seats in other parks.

-- MWE
   62. Chris Dial Posted: November 15, 2005 at 02:45 PM (#1732373)
Good point, Mike. I thought they had decreased that somewhat, but I could be wrong.
   63. studes Posted: November 15, 2005 at 03:00 PM (#1732388)
Thanks. I looked at out rates by batted ball type, and adjusted for park factors. I obviously agree that zone-based systems are better, but the point of my article was to debunk the "DER measures fielding" trend.

Is team zone rating available anywhere, for this year and previous years? Have you calculated it along the way?
   64. Foghorn Leghorn Posted: November 16, 2005 at 02:13 AM (#1733473)
Studes,
there used to be a team ZR available (and I have several data points on my computer somewhere), but I noticed it wasn't available recently.

Debunking DER as measuring fielding? I wish I had thought of that.
   65. DSG Posted: November 16, 2005 at 05:39 AM (#1733610)
Studes, you can calculate team Zone Rating by finding all the individual "plays made" and "chances" where plays made are assists for infielders and putouts for outfielders, and chances are plays made/zone rating. Add the individual plays made up and the individual chances, and then divide plays made by chances for a team zone rating. There are some minor problems with this approach (and I think I know them all, but if someone wants to point them out and try and add something to the list go ahead; I'm too lazy to list them), but it works pretty well, though you'd have to be slightly crazy to spend your time on that.
   66. Foghorn Leghorn Posted: November 16, 2005 at 05:46 AM (#1733615)
DSG,
I've done that. Care to tell me where you find "plays made" and "chances"?
   67. Danny Posted: November 16, 2005 at 06:19 PM (#1734006)
This is probably a very naive question, but could one use the data that goes into UZR to come up with a fielding independent batting statistic?
   68. Chris Dial Posted: November 16, 2005 at 06:48 PM (#1734075)
Danny,
Yes. You could easily derive a "hits that were where they ain't" stat.
   69. DSG Posted: November 16, 2005 at 06:57 PM (#1734101)
I've done that. Care to tell me where you find "plays made" and "chances"?

***

Read my post again. Plays made = Assists for infielders, Putouts for Outfielders

Chances = Plays Made/Zone Rating
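
A minimal sketch of that team Zone Rating calculation, assuming each player is supplied as a (plays made, zone rating) pair. The sample numbers are invented for illustration and are not real 2005 data.

```python
def team_zone_rating(players):
    """players: list of (plays_made, zone_rating) pairs.

    plays_made is assists for infielders and putouts for outfielders;
    chances are backed out as plays_made / zone_rating.
    """
    total_plays = sum(plays for plays, _ in players)
    total_chances = sum(plays / zr for plays, zr in players)
    return total_plays / total_chances


# Hypothetical infield, for illustration only.
sample = [(400, 0.850), (450, 0.820), (300, 0.780), (100, 0.900)]
print(round(team_zone_rating(sample), 3))
```

The result is a chance-weighted average of the individual ratings, so it always falls between the lowest and highest player ZR.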
   70. Chris Dial Posted: November 16, 2005 at 07:06 PM (#1734117)
I have been doing additional research in ZR chances, and here's what I have found *for 2005*
Pos  AvgZROps  2005 Ops
1B   281       281
2B   507       528
3B   430       443
SS   532       536
LF   348       353
CF   462       458
RF   365       361


What that should say to you is that the data I researched from a few years ago (98-00) has held up pretty well. The 2B is a tad low, so I'll dig out some more years (I am working on 2004 and 2003).

To be clear this average number is defined by "Players who played 500+ innings at the position; average chances per inning * total innings per team (1440 for 2005)."

That strengthens my confidence that using the average is a workable methodology.
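
The definition above can be sketched as follows. Two assumptions are made here: the rate is pooled (total chances over total innings for qualifying players) rather than averaged player by player, and innings are given as plain decimals. The player lines are hypothetical.

```python
def avg_full_season_chances(players, team_innings=1440, min_innings=500):
    """players: list of (innings, chances) at one position.

    Keeps players with at least min_innings, pools their chances per
    inning, and scales to a full team season.
    """
    qualified = [(inn, ch) for inn, ch in players if inn >= min_innings]
    total_innings = sum(inn for inn, _ in qualified)
    total_chances = sum(ch for _, ch in qualified)
    return total_chances / total_innings * team_innings


# Hypothetical shortstops: (innings, chances). The last one is below the cutoff.
shortstops = [(1300, 480), (900, 330), (600, 225), (200, 90)]
print(round(avg_full_season_chances(shortstops)))
```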
   71. Danny Posted: November 16, 2005 at 10:32 PM (#1734576)
yes. You could easily derive a "hits that were where they ain't stat".


If you used multiple years, do you think there'd be any value in that? How much variance do you think there'd be from actual batting stats? Off the top of my head, speed would obviously be a big factor.
   72. Chris Dial Posted: November 16, 2005 at 11:20 PM (#1734702)
I think there is value. I think looking at hitters' distribution of FB/GB/LD may be informative.
   73. Punky Brusstar (orw) Posted: March 13, 2006 at 05:25 AM (#1896264)
Chris, I think that there's a way to derive defensive winning percentages from this. Using the data from the spreadsheet, I took the average runs given up by the various NL infield positions per 1440 innings. [Subtract the average zone rating from 1 and multiply by average opportunities. This is average plays not made by position. Multiply that by runs per play and you have average runs surrendered (created against, whatever) by position. (1B=28.93, 2B=71.58, 3B=72.24, and SS=63.01)] Using the same process with the player's zone rating, you can figure out how many runs he surrendered prorated to 1440 innings. Plug these two figures into the Pythagorean theorem and you get a defensive winning percentage. The next step would be to convert this into defensive wins and losses.
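
The steps in the paragraph above can be sketched as below. Two things are assumptions rather than anything stated in the comment: the exponent of 2, and putting the league-average runs in the numerator so that a fielder who surrenders fewer runs than average comes out above .500. The 0.871 average ZR, 281 opportunities and 0.798 runs per play are the 1B values from the article.

```python
def runs_surrendered(zone_rating, opportunities, runs_per_play):
    """Plays not made, converted to runs, per full season of opportunities."""
    return (1.0 - zone_rating) * opportunities * runs_per_play


def defensive_wpct(player_zr, avg_zr, opportunities, runs_per_play, exponent=2.0):
    """Pythagorean-style percentage; exponent=2 is an assumption."""
    avg_runs = runs_surrendered(avg_zr, opportunities, runs_per_play)
    player_runs = runs_surrendered(player_zr, opportunities, runs_per_play)
    return avg_runs ** exponent / (avg_runs ** exponent + player_runs ** exponent)


# 1B inputs from the article: average ZR 0.871, 281 opportunities, 0.798 runs/play.
avg_1b_runs = runs_surrendered(0.871, 281, 0.798)
print(round(avg_1b_runs, 2))

# A hypothetical first baseman with a 0.900 zone rating.
print(round(defensive_wpct(0.900, 0.871, 281, 0.798), 3))
```

With those inputs the average 1B figure matches the 28.93 quoted above, and an exactly average fielder comes out at .500.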

Here are the defensive winning percentages for all the NL infielders that played at least 720 innings at a position:

1B

Todd Helton      Col  0.702
Nick Johnson     Was  0.668
Sean Casey       Cin  0.561
Daryle Ward      Pit  0.528
J.T. Snow        SF   0.516
Lyle Overbay     Mil  0.500
Derrek Lee       ChC  0.500
Albert Pujols    StL  0.477
Adam LaRoche     Atl  0.400
Lance Berkman    Hou  0.360
Carlos Delgado   Fla  0.357
Ryan Howard      Phi  0.349

2B

Chase Utley        Phi  0.688
Mark Grudzielanek  StL  0.641
Craig Counsell     Ari  0.619
Luis Castillo      Fla  0.545
Miguel Cairo       NYM  0.534
Mark Loretta       SD   0.514
Jeff Kent          LAD  0.511
Todd Walker        ChC  0.472
Ray Durham         SF   0.472
Marcus Giles       Atl  0.467
Craig Biggio       Hou  0.455
Jose Castillo      Pit  0.390
Rickie Weeks       Mil  0.333


3B

Chipper Jones     Atl     0.563
Garrett Atkins    Col     0.545
Morgan Ensberg    Hou     0.534
Joe Randa         Cin/SD  0.527
David Bell        Phi     0.522
Abraham O. Nunez  StL     0.500
Vinny Castilla    Was     0.498
Mike Lowell       Fla     0.491
David Wright      NYM     0.466
Troy Glaus        Ari     0.446
Aramis Ramirez    ChC     0.426
Edgardo Alfonzo   SF      0.399


SS

Jack Wilson      Pit  0.648
Neifi Perez      ChC  0.640
Adam Everett     Hou  0.613
Omar Vizquel     SF   0.602
Khalil Greene    SD   0.558
Rafael Furcal    Atl  0.551
J.J. Hardy       Mil  0.517
Jimmy Rollins    Phi  0.514
Cesar Izturis    LAD  0.511
Alex Gonzalez    Fla  0.511
Felipe Lopez     Cin  0.479
David Eckstein   StL  0.470
Jose Reyes       NYM  0.436
Royce Clayton    Ari  0.420
Cristian Guzman  Was  0.392
   74. Punky Brusstar (orw) Posted: March 13, 2006 at 05:28 AM (#1896267)
Eh, that looks terrible and there are no pre tags anymore? Other than that, are there any flaws?
   75. Punky Brusstar (orw) Posted: March 13, 2006 at 03:30 PM (#1896704)
BTW, I know that there's a way to simplify this, so that you can associate a Zone Rating for a player, with a defensive winning pct, but I'm at work now and the phone is ringing off the hook.
   76. Foghorn Leghorn Posted: March 13, 2006 at 05:19 PM (#1896798)
That's very impressive, orw. Jim and others have asked me to do this - I just didn't take the time to solve it.

It can be cut and pasted into Excel.

We'll have to do a comp to the Fielding Bible.
   77. Punky Brusstar (orw) Posted: March 13, 2006 at 06:00 PM (#1896855)
That's very impressive, orw. Jim and others have asked me to do this - I just didn't take the time to solve it.

It can be cut and pasted into Excel.

We'll have to do a comp to the Fielding Bible.


Thanks, I got the inspiration for this when one of my buddies was over my place and we were looking at old Baseball Abstracts.

Bill used a rating system that included defensive and offensive winning percentages and I thought that there might be a way to use ZR as a substitute for his Form 1040 system for determining defensive winning pct. You might be able to fine-tune it using situational splits (like James does now with Runs Created), but I'm not sure if it's worth the effort or not.

I like the concept of player win-loss records. It may be better than WARP or Win Shares. James ranked the players, IIRC, in descending order of how likely a .350 team would achieve the same win-loss record as the player did. I wonder why he abandoned that system.
   78. Punky Brusstar (orw) Posted: March 14, 2006 at 01:12 AM (#1897402)
Re: Historical Zone Rating stats:

ESPN.com has them at least from 1987 on. Heck, I found Rich Renteria's playing card (http://sports.espn.go.com/mlb/players/stats?playerId=1784&c). Not difficult, his player ID is 1784; right below Barry Bonds's 1785.

I suppose you could go through all the player cards. I'm not sure how ESPN assigns player IDs; Bonds and Renteria debuted in 1987, while player #1786, Ruben Rodriguez (http://sports.espn.go.com/mlb/players/stats?playerId=1786&c), cameo'ed in '88. Someone may know how to write a program for this. (I bought Baseball Hacks, but I got lost in that book.) But it looks like there is close to 20 years of this data out there.
   79. Los Angeles Waterloo of Black Hawk Posted: March 14, 2006 at 03:07 AM (#1897480)
I like it, ORW.

Another way of looking at it, possibly ... say we have Helton at +10.1 runs for 162 games (which we do). That's .06 per game; he played in a context of 5.03 RPG (4.45 NL RPG x 1.13 Park Factor at BB-Ref), so he lowered the RPG to 4.97. That comes out to a defensive winning percentage of .506.

Of course, you can just as easily do this with his offense ... I get his offense at around +.27 per game for 2005, so that gets us to 5.30 Runs Scored Per Game against 4.97 Runs Allowed ... an overall winning percentage of .532. (Interestingly, his offensive winning percentage comes out to .526, so +.026, and his defensive was +.006, and that adds up to +.032.)

Of course, that doesn't include positional adjustments, and it probably should ... but if you took an average team playing in Coors Field (or any 5.03 RPG environment) and replaced their 1B with Todd Helton for 162 games, you'd be expected to have around 86 wins, so Helton comes out as around +5 wins per 162 games ... you would also want to use PythagenPat instead of Pythagorean, though the differences will likely be slight ...

... dunno how correct all those actual numbers are, but the process strikes me as valid ...
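
For anyone who wants to check those figures, here is a sketch using the rounded per-game values from this comment and the plain Pythagorean formula with an assumed exponent of 2 (PythagenPat, mentioned above, would vary the exponent with the run environment).

```python
def pythag(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean winning percentage; exponent=2 is an assumption."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)


CONTEXT_RPG = 5.03   # 4.45 NL RPG x 1.13 park factor, rounded as in the comment
DEF_SAVED = 0.06     # +10.1 runs over 162 games, per game
OFF_ADDED = 0.27     # offensive runs added per game

runs_allowed = CONTEXT_RPG - DEF_SAVED   # 4.97
runs_scored = CONTEXT_RPG + OFF_ADDED    # 5.30

def_wpct = pythag(CONTEXT_RPG, runs_allowed)
off_wpct = pythag(runs_scored, CONTEXT_RPG)
overall_wpct = pythag(runs_scored, runs_allowed)

print(round(def_wpct, 3), round(off_wpct, 3), round(overall_wpct, 3))
print(round(overall_wpct * 162))
```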
   80. Chris Dial Posted: March 14, 2006 at 03:45 AM (#1897531)
orw,
I'd be careful with historical ZRs. I don't know if they are switched over after the changes STATS made (recalculated with new Zone definitions).

Could be, which would be *awesome* because my formulas would allow for a good database.
   81. Punky Brusstar (orw) Posted: March 14, 2006 at 04:34 AM (#1897630)
LAW, I think that the winning percentages I came up with may reflect reality more. There seems to be more separation between the good, the bad, and the ugly players. But I'm just a dilettante. I figured out how to make a table for Dwpct based off of Zone Ratings. If you chart these for the infield positions (that's what I've fooled around with so far), they make a pretty S-curve; meaning that Dwpct increases at an increasing rate as ZR increases until they both get pretty close to 1.000. Unfortunately, I can't seem to get the table to show properly here.
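
A ZR-to-Dwpct table of this kind can be sketched in a few lines. Because the same opportunities and runs-per-play values multiply both the average and the player's runs surrendered, they cancel out of the Pythagorean ratio, leaving only the two miss rates (1 minus ZR). The exponent of 2 is an assumption, and 0.871 is the average 1B zone rating from the article's example.

```python
def dwpct_from_zr(zr, avg_zr, exponent=2.0):
    """Defensive winning pct from zone ratings alone (exponent assumed)."""
    miss_avg = (1.0 - avg_zr) ** exponent
    miss = (1.0 - zr) ** exponent
    return miss_avg / (miss_avg + miss)


AVG_ZR_1B = 0.871  # average 1B zone rating from the article's example

for zr in (0.750, 0.800, 0.850, 0.900, 0.950, 1.000):
    print(f"{zr:.3f}  {dwpct_from_zr(zr, AVG_ZR_1B):.3f}")
```

An average zone rating maps to .500 and a perfect 1.000 zone rating maps to 1.000, with the curve flattening as it approaches that ceiling.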
   82. Punky Brusstar (orw) Posted: March 14, 2006 at 05:16 AM (#1897766)
orw,
I'd be careful with historical ZRs. I don't know if they are switched over after the changes STATS made (recalculated with new Zone definitions).


Yah, it's not like I was gonna get around to that project.

Assuming I cut and pasted everything correctly, here's what I got for the AL IFers:

1B

First    Last        Team   Defensive WPCT
Darin    Erstad      LAA    0.694
Eric     Hinske      Tor    0.655
Mark     Teixeira    Tex    0.627
Justin   Morneau     Min    0.604
Tino     Martinez    NYY    0.544
Ben      Broussard   Cle    0.531
Dan      Johnson     Oak    0.485
Travis   Lee         TB     0.474
Kevin    Millar      Bos    0.460
Chris    Shelton     Det    0.456
Paul     Konerko     CWS    0.417
Rafael   Palmeiro    Bal    0.322
Richie   Sexson      Sea    0.276

2B

Mark       Ellis      Oak       0.646
Brian      Roberts    Bal       0.537
Placido    Polanco    Det       0.537
Orlando    Hudson     Tor       0.524
Adam       Kennedy    LAA       0.515
Ronnie     Belliard   Cle       0.512
Tadahito   Iguchi     CWS       0.509
Robinson   Cano       NYY       0.463
Mark       Bellhorn   Bos/NYY   0.463
Alfonso    Soriano    Tex       0.442
Nick       Green      TB        0.361
Bret       Boone      Min/Sea   0.292

3B

Eric      Chavez      Oak   0.600
Brandon   Inge        Det   0.559
Joe       Crede       CWS   0.554
Adrian    Beltre      Sea   0.547
Melvin    Mora        Bal   0.547
Aaron     Boone       Cle   0.535
Bill      Mueller     Bos   0.523
Michael   Cuddyer     Min   0.509
Alex S.   Gonzalez    TB    0.498
Alex      Rodriguez   NYY   0.417
Hank      Blalock     Tex   0.417
Mark      Teahen      KC    0.364

SS

Bobby     Crosby     Oak   0.631
Juan      Uribe      CWS   0.580
Jhonny    Peralta    Cle   0.580
Orlando   Cabrera    LAA   0.554
Julio     Lugo       TB    0.545
Derek     Jeter      NYY   0.512
Angel     Berroa     KC    0.503
Miguel    Tejada     Bal   0.478
Edgar     Renteria   Bos   0.454
Michael   Young      Tex   0.448
Russ      Adams      Tor   0.385
   83. Harold can be a fun sponge Posted: March 14, 2006 at 05:20 AM (#1897777)
I'm not sure how ESPN assigns player IDs; Bonds and Renteria debuted in 1987, while player #1786, Ruben Rodriguez cameo'ed in '88. Someone may know how to write a program for this.

ESPN uses STATS player IDs. Which is nice, because that's somewhat standardized. For instance, Yahoo also uses STATS IDs. A year or two back, I wrote a script that iterated over all the IDs to get a list of Yahoo's available player cards (Szymborski posted a thread about it here, in case you want to search for it). The mapping of IDs to players would be the same for ESPN, though they may not have the same set of historical playercards as Yahoo. No reason I couldn't change my script a little to work on ESPN.com (that is, if I could find it).
   84. Los Angeles Waterloo of Black Hawk Posted: March 14, 2006 at 06:02 AM (#1897869)
LAW, I think that the winning percentages I came up with may reflect reality more. There seems to be more separation between the good, the bad, and the ugly players.

Well, I think we're asking different questions. I'm asking, "If you had a team of average defenders at every position, and replaced, say, your 1B with Todd Helton, and the team had average offense, what would the winning percentage be?" I say .506.

I'm not sure what you're asking ... look at Helton. His "defensive winning percentage", by your reckoning, is .702. I don't know what that means. (That's not an insult, I literally don't know.) You can't be asking the same question as I am, as there's no way a 1B is worth nearly 33 wins on his lonesome. Are you asking how a team would perform if each of its defenders were as skilled at their positions as Helton is at his?
   85. Punky Brusstar (orw) Posted: March 14, 2006 at 02:25 PM (#1898152)
Well, I think we're asking different questions. I'm asking, "If you had a team of average defenders at every position, and replaced, say, your 1B with Todd Helton, and the team had average offense, what would the winning percentage be?" I say .506.

I'm not sure what you're asking ... look at Helton. His "defensive winning percentage", by your reckoning, is .702. I don't know what that means. (That's not an insult, I literally don't know.) You can't be asking the same question as I am, as there's no way a 1B is worth nearly 33 wins on his lonesome. Are you asking how a team would perform if each of its defenders were as skilled at their positions as Helton is at his?


Yah, I think that we are looking at it differently, but get the same results. I say his Dwpct is .702, but first base is worth, say three games, so his defensive won-lost record is something like 2.1-0.9. So, it makes the team's winning percentage about .06 better.

Vinay, that sounds pretty cool.
   86. Foghorn Leghorn Posted: March 14, 2006 at 03:29 PM (#1898197)
Vinay, that's very intriguing.

Using this article, and having me verify a few ZRs, we could generate a database of about 15 years of DZR data - certainly a good set for everyone presently playing.

I know Bangkok9 could write these scripts, pulling the innings played and ZR with the ID.

Can you? With the work I set up, any one of us could create the spreadsheet with the default calculations I've put forth and have a really good database.

In addition, we could dig up older DA/DR (which is VERY similar to UZR) and see how they look.
   87. Dan Turkenkopf Posted: March 14, 2006 at 03:43 PM (#1898207)
I say his Dwpct is .702, but first base is worth, say three games, so his defensive won-lost record is something like 2.1-0.9. So, it makes the team's winning percentage about .06 better.


How do you get to the number of 3 games for 1B?
   88. Punky Brusstar (orw) Posted: March 14, 2006 at 03:53 PM (#1898216)
How do you get to the number of 3 games for 1B?


That was the figure that Bill James used in one of his early Abstracts; 1983 or 1984. I forget how he derived it. I have the book at home, but not here at work.
   89. Dan Turkenkopf Posted: March 14, 2006 at 04:04 PM (#1898223)
That was the figure that Bill James used in one of his early Abstracts; 1983 or 1984. I forget how he derived it. I have the book at home, but not here at work.


I'd be interested to see his logic there (I only have from 85-88). Do you remember off the top of your head if it's anything like how he determined the relative value of defense in Win Shares?
   90. Punky Brusstar (orw) Posted: March 14, 2006 at 04:13 PM (#1898229)
I'll have to check it when I get home. I think that he assigned 129 out of 162 defensive games to the pitching staff, then divided the rest up by positions; shortstop getting the most and first base the least. I remember the 3 games figure.
   91. Punky Brusstar (orw) Posted: March 14, 2006 at 05:26 PM (#1898314)
Also, you can combine these with offensive win-loss records to come up with a player ranking. On the offensive side, I think that there needs to be some way to adjust for baserunning, but I think most of the formulae like batter runs, runs created, or eXtrapolated Runs will do.
   92. Dan Turkenkopf Posted: March 14, 2006 at 06:12 PM (#1898388)
I'll have to check it when I get home. I think that he assigned 129 out of 162 defensive games to the pitching staff, then divided the rest up by positions; shortstop getting the most and first base the least. I remember the 3 games figure.


Thanks for the info.

I'm trying to wrap my head around the different ways to use these defensive metrics and I'm intrigued by what you and LABHW have come up with here. His method feels a little bit better to me because I'm not sure if there's been a good determination of how much each defensive position is worth in absolute numbers like you're using.

I don't know enough of the particulars of Win Shares to remember exactly how James determined the value of defense versus pitching but I seem to remember it was unconvincing. Most of the other numbers I've seen will discuss the relative value of the positions to each other, but I haven't found a good baseline yet that says the average player is worth this many absolute wins defensively.

To be honest, I'm not sure that's an important number to have since it's not like a team will just leave a position empty. Relative values are probably good enough for most discussions unless you're trying to get to a Win Shares-like stat that measures absolute value.

Which is a long, rambling way of coming to the conclusion that you're probably approximating defensive win shares with a different way of determining individual responsibility - but you still need to rely on some allocation of responsibility to the defense as opposed to the pitcher.

Just as an aside, I compared Todd Helton and Ryan Howard using both methods. In LABHW's method, Helton is 1.5 wins better than Howard defensively. In orw's method (assuming 3 games for a 1b), the difference is 1.06 wins. To make the two methods equivalent, 1B would have to be worth 4.5 games defensively (in this case). If I have time I'll see if this is another method to generalize the positional values.
   93. Punky Brusstar (orw) Posted: March 14, 2006 at 07:20 PM (#1898492)
Just as an aside, I compared Todd Helton and Ryan Howard using both methods. In LABHW's method, Helton is 1.5 wins better than Howard defensively. In orw's method (assuming 3 games for a 1b), the difference is 1.06 wins. To make the two methods equivalent, 1B would have to be worth 4.5 games defensively (in this case). If I have time I'll see if this is another method to generalize the positional values.



I'm at work, but if I had to guess, I'd say that three games is a ballpark figure for a typical 1B. At lunch, I multiplied 281 opps by .87 and came up with roughly 244 plays made, or nine games' worth. Maybe James gave credit to the pitcher for 67-70% of that. I imagine that if a defender makes more/fewer plays, he should be credited for more/fewer games.
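
The lunch-break arithmetic above can be sketched as follows. This is only a rough illustration: the 27 outs per game conversion is my reading of how 244 plays become "nine games", and the function name is hypothetical.

```python
OUTS_PER_GAME = 27  # assumed conversion from plays made to "games"


def fielder_games(opps, zone_rating, pitcher_share):
    """Return (plays made, games' worth of plays, games left for the fielder)."""
    plays_made = opps * zone_rating
    games = plays_made / OUTS_PER_GAME
    return plays_made, games, games * (1.0 - pitcher_share)


# Typical 1B: 281 opportunities at a .87 zone rating, with the pitcher
# credited for somewhere between 67% and 70% of the result.
for share in (0.67, 0.70):
    plays, games, fielder = fielder_games(281, 0.87, share)
    print(f"pitcher share {share:.0%}: {plays:.0f} plays, "
          f"{games:.1f} games, {fielder:.1f} left for the 1B")
```

At a 67% pitcher share, this comes out very close to the three games remembered from the Abstract.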

Now, I really want to get home and look at this stuff.
   94. Los Angeles Waterloo of Black Hawk Posted: March 14, 2006 at 07:45 PM (#1898548)
Well, from above, the list of Zone Rating Opportunities by position:

1B   281
2B   507
3B   430
SS   532
LF   348
CF   462
RF   365

Let's see, there are catchers and pitchers ... thankfully, because of K's counting as PO for the catchers, we can probably just use catcher PO to estimate their combined contribution ... just eyeballing a few teams, it looks like the average team will have roughly 1000 put-outs from their catchers, so let's use that (though it doesn't really matter for the purposes of the illustration I'm about to engage in).

I'm making this up as I go, but anyway:

Pos    Opp    Share of Opp   x 162
P/C   1000       .254         41.3
1B     281       .072         11.6
2B     507       .129         20.9
3B     430       .110         17.7
SS     532       .136         22.0
LF     348       .089         14.4
CF     462       .118         19.1
RF     365       .093         15.1


Well, that doesn't seem quite right. I don't think I'm counting strikeouts enough, and I'm not counting balls-out-of-zone at all. But there might be something to do along those lines ...
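
The table above is just each position's share of total opportunities scaled to 162 games. A minimal sketch, using the same eyeballed 1000 figure for pitchers and catchers (the variable names are only for illustration):

```python
# Zone Rating opportunities by position, plus the rough 1000 put-outs
# guessed above for pitchers and catchers combined.
opps = {
    "P/C": 1000, "1B": 281, "2B": 507, "3B": 430,
    "SS": 532, "LF": 348, "CF": 462, "RF": 365,
}

total = sum(opps.values())
games = {pos: n / total * 162 for pos, n in opps.items()}

for pos, n in opps.items():
    print(f"{pos:<4}{n:>5}  {n / total:.3f}  {games[pos]:5.1f}")
```

Changing the 1000 guess shifts every other row, which is one reason the split is only illustrative.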
   95. Punky Brusstar (orw) Posted: March 14, 2006 at 08:00 PM (#1898573)
test 


test
   96. Los Angeles Waterloo of Black Hawk Posted: March 14, 2006 at 08:31 PM (#1898615)
BTW, I did something stupid in post 29. I park-adjusted Helton's offensive runs, but then put it in the 5.03 RPG context of Coors Field. That's a double penalty; you should just use one.
   97. Mike Emeigh Posted: March 14, 2006 at 09:12 PM (#1898679)
Try looking at it in terms of runs. Fielding independent stats (walks, Ks, and HRs) constitute about 45% of total runs; the other 55% is fielding-affected, and catchers and pitchers contribute to that (through SB/CS and WP/PB as well as their own fielding) along with the rest of the fielders.

-- MWE
   98. Los Angeles Waterloo of Black Hawk Posted: March 14, 2006 at 09:48 PM (#1898722)
I have the sense that Mike Emeigh already knows the answer, and is toying with us.

the other 55% is fielding-affected, and catchers and pitchers contribute to that (through SB/CS and WP/PB as well as their own fielding) along with the rest of the fielders.

Dunno how to account for pitchers and catchers, let's just guess that that's 5% and knock everyone else down to 50% total (guessing that SB/CS balance out for catchers, and that WP/PB and pitcher fielding only accounts for 5%, but that's a total guess):

Pos  AvgZROps  Runs/play   Runs   %TotRuns   x162   x 50%
1B     281       .798       224     .096     15.5    7.8
2B     507       .754       382     .163     26.5   13.2
3B     430       .800       344     .147     23.8   11.9
SS     532       .753       401     .172     27.8   13.9
LF     348       .831       289     .124     20.0   10.0
CF     462       .842       389     .166     27.0   13.5
RF     365       .843       308     .132     21.4   10.7


Of course, that's giving 100% of the responsibility for BIP to the fielders, which is likely incorrect.
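
Here is a short sketch of how the table above can be reproduced: weight each position's opportunities by its run value per play, take shares of the total, and scale to 162 games times the guessed 50% fielder share. The 50% is the same total guess made above, and the names are hypothetical.

```python
# (AvgZROps, Runs/play) by position, from the table at the top of the article.
positions = {
    "1B": (281, 0.798), "2B": (507, 0.754), "3B": (430, 0.800),
    "SS": (532, 0.753), "LF": (348, 0.831), "CF": (462, 0.842),
    "RF": (365, 0.843),
}
FIELDER_SHARE = 0.50  # guessed share of a 162-game season owed to these seven

runs = {pos: n * rv for pos, (n, rv) in positions.items()}
total_runs = sum(runs.values())
games = {pos: r / total_runs * 162 * FIELDER_SHARE for pos, r in runs.items()}

for pos in positions:
    print(f"{pos}  {runs[pos]:5.0f}  {runs[pos] / total_runs:.3f}  {games[pos]:5.1f}")
```

By construction the seven positions add up to 81 games, so any credit handed to the pitcher for balls in play would have to come out of these numbers.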
   99. Dan Turkenkopf Posted: March 14, 2006 at 10:43 PM (#1898776)
Following up on my Helton/Howard example from earlier, I decided to run a comparison for 1B using both LABHW's and orw's methods.

I used LABHW's system for defense only as described in post 79 (called LDWAA). For orw's method, I took the defensive winning percentage, and then calculated wins above average given the positional weight (defined above as 3 for 1B) (called ODWAA). That formula works out pretty nicely as (DW%-.5) x PW.
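
As a quick sketch of the ODWAA formula, along with the won-lost record it implies (function names are hypothetical, and the position weight of 3 is the remembered Bill James figure for 1B, not a settled value):

```python
def defensive_record(dwpct, position_weight=3.0):
    """Split a position's games into defensive wins and losses."""
    wins = dwpct * position_weight
    return wins, position_weight - wins


def wins_above_average(dwpct, position_weight=3.0):
    """ODWAA: (DW% - .5) x PW."""
    return (dwpct - 0.5) * position_weight


# Helton at a .702 Dwpct and Erstad at .694, both at 1B (PW = 3).
for name, dwpct in (("Helton", 0.702), ("Erstad", 0.694)):
    w, l = defensive_record(dwpct)
    print(f"{name}: {w:.1f}-{l:.1f}, ODWAA {wins_above_average(dwpct):+.3f}")
```

Because ODWAA scales directly with the position weight, doubling PW doubles every player's figure.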

The results I get are as follows:

Player      Team   LDWAA    ODWAA    Diff
Erstad      LAA    1.087    0.582   0.505
Johnson     Was    1.036    0.504   0.532
Helton      Col    1.010    0.606   0.404
Hinske      Tor    0.836    0.465   0.371
Teixeira    Tex    0.682    0.381   0.301
Morneau     Min    0.585    0.312   0.273
Casey       Cin    0.357    0.183   0.174
Martinez    NYY    0.252    0.132   0.120
Broussard   Cle    0.201    0.093   0.108
Ward        Pit    0.175    0.084   0.091
Snow        SF     0.103    0.048   0.055
Overbay     Mil    0.000    0.000   0.000
Lee         ChC    0.000    0.000   0.000
Johnson     Oak   -0.091   -0.045   0.046
Pujols      StL   -0.148   -0.069   0.079
Lee         TB    -0.166   -0.078   0.088
Millar      Bos   -0.256   -0.120   0.136
Shelton     Det   -0.288   -0.132   0.156
Konerko     CWS   -0.547   -0.249   0.298
LaRoche     Atl   -0.700   -0.300   0.400
Howard      Phi   -1.089   -0.453   0.636
Berkman     Hou   -1.098   -0.420   0.678
Delgado     Fla   -1.171   -0.429   0.742
Palmeiro    Bal   -1.465   -0.534   0.931
Sexson      Sea   -1.904   -0.672   1.232


Obviously the choice of the position weight affects ODWAA a lot. Interestingly enough the relationship between the two methods is not linear.
   100. Harold can be a fun sponge Posted: March 14, 2006 at 10:49 PM (#1898780)
Using this article, and having me verify a few ZRs, we could generate a database of about 15 years of DZR data - certainly a good set for everyone presently playing.

I know Bangkok9 could write these scripts, pulling the innings played and ZR with the ID.

Can you?


Yeah, it actually looks pretty easy. Let me see if I can take a stab at it tonight.