Baseball for the Thinking Fan

Primate Studies
— Where BTF's Members Investigate the Grand Old Game

Tuesday, November 05, 2002

And the Beat Goes On: Derek Jeter and the State of Fielding Analysis in Sabermetrics - Part 2

First up on the chopping block:  Pete Palmer.

In the Beginning - Fielding Runs

 

In 1984, Pete Palmer and John Thorn wrote The Hidden Game of Baseball.
This book provided the first detailed look at Palmer’s Total Player
Rating (TPR) system, which attempts to rank players based upon the number of
runs that they produced (on offense) or saved (on defense) beyond those
produced or saved by a league-average player. All aspects of a player’s game
are considered, and the components are intended to be additive, so that Batting Runs, Stolen
Base Runs, Fielding Runs, and (for pitchers) Pitching Runs can be added
together and converted to wins using a season-specific value for Runs per Win.
Theoretically, a player who played in a high-run environment could be compared
to a player who played in a low-run environment, since the number of runs per
win would be used to convert the numbers to the same scale.

Palmer’s defensive measurement system, Fielding Runs, has been widely
critiqued and (mostly) criticized. In fairness to Palmer, the system was
developed long before we had the kind of data collection that Project
Scoresheet/Baseball Workshop and STATS, Inc. have brought to the table since
the late 1980s, and the system must be seen - as all pathfinding systems tend
to be - as a first cut at making sense of the data.

I have written an analysis of Fielding Runs in which I
describe how Fielding Runs are calculated, so I won’t repeat that information
here. I will note that instead of using Palmer’s estimator of playing time in
the FR formula, I used actual innings played from the Palmer/Gillette
play-by-play database.

Table 1. AL SS Fielding Runs 1998-2000 (min 800 innings)

1998 TEAM G GS INN FR
M Bordick BAL 150 144 1238.3 18.26
D Cruz DET 135 132 1163.3 16.10
K Stocker TB 110 108 940.0 11.63
O Vizquel CLE 151 149 1316.0 7.44
M Tejada OAK 104 104 915.0 2.16
A Rodriguez SEA 160 160 1389.3 2.06
G DiSarcina ANA 157 155 1370.7 -3.63
M Caruso CWS 131 129 1121.3 -7.33
P Meares MIN 149 145 1270.0 -7.68
A Gonzalez TOR 158 157 1398.3 -9.40
N Garciaparra BOS 143 143 1255.3 -15.26
D Jeter NYY 148 148 1304.7 -20.02
           
1999 TEAM G GS INN FR
M Bordick BAL 159 155 1357.3 35.38
R Sanchez KC 134 131 1131.7 31.94
T Batista TOR 98 98 860.7 10.18
A Rodriguez SEA 129 129 1116.0 7.23
M Tejada OAK 159 156 1377.3 7.13
D Cruz DET 155 151 1302.3 4.68
R Clayton TEX 133 133 1149.0 3.10
O Vizquel CLE 143 140 1214.3 1.57
C Guzman MIN 131 126 1069.0 -7.20
N Garciaparra BOS 134 133 1173.7 -7.53
M Caruso CWS 131 125 1114.7 -19.92
D Jeter NYY 158 158 1395.7 -33.55
           
2000 TEAM G GS INN FR
F Martinez TB 106 103 887.7 31.20
J Valentin CWS 141 136 1212.3 20.59
R Sanchez KC 143 140 1198.0 15.18
A Rodriguez SEA 148 148 1285.0 8.05
D Cruz DET 156 154 1355.3 3.95
M Tejada OAK 160 159 1400.3 3.17
O Vizquel CLE 156 154 1328.7 -0.54
N Garciaparra BOS 136 135 1185.0 -2.55
R Clayton TEX 148 144 1237.0 -2.59
A Gonzalez TOR 141 140 1225.3 -6.82
C Guzman MIN 151 148 1307.0 -14.15
M Bordick BAL 100 100 865.0 -14.38
D Jeter NYY 148 148 1278.7 -36.47

I touch on some of the limitations of Fielding Runs as a measurement
of defensive performance in the article referenced above, and Bill James
also touches on them in his Win Shares book. The biggest issue with
FR, in my opinion, is the implicit assumption that the only factor that affects
the number of plays that a shortstop (or any other fielder) makes is the total
number of balls put into play against a team. But when we look at the
play-by-play data we can see that this assumption is erroneous. There are two other factors
that are known to affect this distribution - the ground ball/fly ball characteristics
of the pitching staff, and the L/R distribution of batters faced (which can be
estimated by using the L/R distribution of the pitching staff). Batters as a group
tend to pull grounders and hit fly balls to the opposite field, so if a team faces
proportionately more RHB than the norm, the 3B, SS, and RF will tend to have more plays
than the norm.

We can see these team-to-team differences in the play-by-play data. Gillette and
Palmer use the zones defined on the Project Scoresheet/Retrosheet ball location diagram to identify where
balls are put into play. If we assume (for the moment) that a shortstop,
regardless of his exact positioning, could conceivably field ground balls hit into the
‘56’, ‘6’, and ‘6M’ zones, we can use that as an upper bound on the number of
opportunities he could have. In this part of the analysis, I used the ‘56’, ‘56D’,
‘6’, ‘6D’, ‘6M’ and ‘6MD’  zones as providing the bounds for the area where a
shortstop could be expected to have a reasonable chance to field a ground ball,
because my initial run-through of the data suggests that either the pitcher or the
3B will field the vast majority of grounders in the ‘56S’, ‘6S’, and ‘6MS’ zones.
Also, in this part of the analysis, I didn’t concern myself with who actually
fielded the ball, but only where the ball was hit, since I was interested
in establishing the boundaries of what might have happened given positioning
variances between teams.

Here’s what the play-by-play data shows for the years 1998-2000. “SSF” is
the number of shortstop fielding opportunities, and “Hole”, “Direct”, and
“Middle” show the distribution of balls in the ‘56’, ‘6’, and ‘6M’ areas
respectively:

Table 2. AL SS BIP, 1998-2000 (min 800 innings)

1998 Team G GS Inn BIP GB FB %GB SSF SSF/9 Hole Direct Middle FR
Cruz, D DET 135 132 1163.3 3605 1783 1822 49.5% 652 5.04 262 280 110 16.10
Caruso, M CHA 131 129 1121.3 3504 1574 1930 44.9% 560 4.49 245 244 71 -7.33
Bordick, M BAL 150 144 1238.3 3764 1787 1977 47.5% 608 4.42 230 289 89 18.26
Stocker, K TBA 110 108 940.0 2848 1272 1576 44.7% 461 4.41 222 181 58 11.63
Tejada, M OAK 104 104 915.0 2898 1286 1612 44.4% 447 4.40 164 210 73 2.16
Rodriguez, A SEA 160 160 1389.3 4167 1873 2294 44.9% 672 4.35 260 290 122 2.06
Meares, P MIN 149 145 1270.0 4096 1756 2340 42.9% 580 4.11 221 268 91 -7.68
Vizquel, O CLE 151 149 1316.0 4084 1894 2190 46.4% 596 4.08 245 260 91 7.44
DiSarcina, G ANA 157 155 1370.7 4138 1901 2237 45.9% 608 3.99 288 245 75 -3.63
Jeter, D NYA 148 148 1304.7 3876 1789 2087 46.2% 578 3.99 252 251 75 -20.02
Garciaparra, N BOS 143 143 1255.3 3835 1694 2141 44.2% 535 3.84 195 260 80 -15.26
Gonzalez, A TOR 158 157 1398.3 4158 1837 2321 44.2% 573 3.69 175 269 129 -9.40
                             
1999 Team G GS Inn BIP GB FB %GB SSF SSF/9 Hole Direct Middle FR
Sanchez, R KCA 134 131 1128.7 3636 1714 1922 47.1% 605 4.82 252 157 196 31.94
Clayton, R TEX 133 133 1149.3 3611 1773 1838 49.1% 593 4.64 215 145 233 3.10
Rodriguez, A SEA 129 129 1114.7 3463 1578 1885 45.6% 547 4.42 170 166 211 7.23
Bordick, M BAL 159 155 1355.0 4097 1997 2100 48.7% 663 4.40 210 188 265 35.38
Cruz, D DET 155 151 1300.3 4043 1873 2170 46.3% 623 4.31 206 190 227 4.68
Batista, T TOR 98 98 860.7 2654 1258 1396 47.4% 409 4.28 128 113 168 10.18
Caruso, M CHA 132 125 1114.7 3547 1588 1959 44.8% 526 4.25 197 121 208 -19.92
Tejada, M OAK 159 156 1377.3 4333 2067 2266 47.7% 642 4.20 243 176 223 7.13
Guzman, C MIN 131 126 1069.0 3454 1527 1927 44.2% 497 4.18 150 155 192 -7.20
Vizquel, O CLE 143 140 1214.3 3635 1737 1898 47.8% 552 4.09 209 157 186 1.57
Garciaparra, N BOS 134 133 1171.7 3489 1602 1887 45.9% 526 4.04 192 166 168 -7.53
Jeter, D NYA 158 158 1395.7 4143 1942 2201 46.9% 565 3.64 205 174 186 -33.55
                             
2000 Team G GS Inn BIP GB FB %GB SSF SSF/9 Hole Direct Middle FR
Martinez, F TBA 106 103 887.7 2801 1351 1450 48.2% 467 4.73 161 143 163 31.20
Tejada, M OAK 160 159 1400.3 4405 2158 2247 49.0% 694 4.46 276 189 229 3.17
Sanchez, R KCA 143 140 1198.0 3751 1783 1968 47.5% 592 4.45 223 158 211 15.18
Garciaparra, N BOS 136 135 1185.0 3519 1695 1824 48.2% 576 4.37 195 186 195 -2.55
Gonzalez, A TOR 141 140 1225.3 3852 1765 2087 45.8% 593 4.36 220 139 234 -6.82
Cruz, D DET 156 154 1355.3 4294 2072 2222 48.3% 653 4.34 215 237 201 3.95
Valentin, J CHA 141 136 1212.3 3645 1681 1964 46.1% 579 4.30 180 193 206 20.59
Clayton, R TEX 148 144 1237.0 4051 1718 2333 42.4% 570 4.15 228 159 183 -2.59
Rodriguez, A SEA 148 148 1285.0 3929 1731 2198 44.1% 568 3.98 169 193 206 8.05
Vizquel, O CLE 156 154 1328.7 3921 1894 2027 48.3% 587 3.98 217 152 218 -0.54
Bordick, M BAL 100 100 865.0 2709 1245 1464 46.0% 379 3.94 137 84 158 -14.38
Guzman, C MIN 151 148 1307.0 4080 1753 2327 43.0% 549 3.78 159 165 225 -14.15
Jeter, D NYA 148 148 1278.7 3934 1693 2241 43.0% 497 3.50 185 144 168 -36.47
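
The derived columns in Table 2 can be rebuilt from the raw counts. The sketch below assumes that SSF is the sum of the Hole, Direct, and Middle counts, that SSF/9 is SSF per nine innings, and that %GB is GB divided by BIP; those definitions are inferred from the table itself, and the names `rows` and `derived` are illustrative only. The innings decimals (.3, .7) appear to stand for thirds of an inning, so using the printed value directly is a small approximation.

```python
# Three rows copied from the 2000 block of Table 2.
# Tuple layout (illustrative): name, innings, BIP, GB, Hole, Direct, Middle
rows = [
    ("Martinez, F", 887.7, 2801, 1351, 161, 143, 163),
    ("Tejada, M", 1400.3, 4405, 2158, 276, 189, 229),
    ("Jeter, D", 1278.7, 3934, 1693, 185, 144, 168),
]


def derived(inn, bip, gb, hole, direct, middle):
    """Recompute SSF, SSF/9 and %GB under the assumed definitions."""
    ssf = hole + direct + middle
    return {
        "SSF": ssf,
        "SSF/9": round(ssf / inn * 9, 2),
        "%GB": round(100 * gb / bip, 1),
    }


table = {name: derived(*counts) for name, *counts in rows}

for name, cols in table.items():
    print(name, cols)
```

For these three rows the recomputed values agree with the printed SSF, SSF/9, and %GB columns, which supports the assumed definitions.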

The correlation coefficient r for the number of fieldable shortstop chances
per nine innings (SSF/9) and Fielding Runs is +0.74. The correlation
coefficient for the percentage of ground balls hit and Palmer’s FR is +0.51.
This strongly suggests that Fielding Runs for SS are highly dependent on the
number of balls hit in the vicinity of the SS. Jeter was at the very bottom of
the list in both 1999 and 2000, and very close to the bottom of the list in
1998.
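
As a rough check on the direction and strength of that relationship, here is a minimal sketch that computes Pearson's r between SSF/9 and FR for the 2000 rows of Table 2 only. It uses a single season, so it should not be expected to reproduce the +0.74 quoted above, which presumably pools more data; the function name `pearson_r` is illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# SSF/9 and FR columns, 2000 block of Table 2, in table order.
ssf9_2000 = [4.73, 4.46, 4.45, 4.37, 4.36, 4.34, 4.30,
             4.15, 3.98, 3.98, 3.94, 3.78, 3.50]
fr_2000 = [31.20, 3.17, 15.18, -2.55, -6.82, 3.95, 20.59,
           -2.59, 8.05, -0.54, -14.38, -14.15, -36.47]

r_2000 = pearson_r(ssf9_2000, fr_2000)
print(f"r (2000 only) = {r_2000:+.2f}")
```

Running the same function over all 37 player-seasons in Table 2 would be the natural way to test the pooled figure.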

Fielding Runs falls short on a number of counts. The method, as noted above,
fails to account for the effects of team defensive context; players who see more
balls will generally rank higher than players who see fewer balls in their vicinity.
It is also obvious that Fielding Runs fails to consider defensive failures, other
than errors, by ignoring hits. Finally, Fielding Runs penalizes fielders for plays
made by other fielders on their team. Charles Saeger, in his 1999 BBBA article in
which he details Context-Adjusted Defense, demonstrates this point nicely:

“Suppose a team allows an average number of hits, but strikes out 100 fewer
batters than average. Also suppose that the left fielder recorded 40 of those
extra outs and the second baseman recorded the other 60. Now, the shortstop
recorded an average number of outs. ... the Palmer method ... would show him
to be a below-average fielder, since (it) posits that he had 100 more chances
to field a ball.”

Mike Emeigh Posted: November 05, 2002 at 06:00 AM | 12 comment(s)

Reader Comments and Retorts

   1. Chris Dial Posted: November 06, 2002 at 02:01 AM (#607076)
We could dance about many of these things for hours. I really appreciate and enjoy the work you are putting out here.

I looked at the GB/FB ratio in your charts. Why is it 0.85 instead of 1.16? Is that a GDP thing?

I looked at the zone % (z56, z6, z6M). There is a scoring change between 1998 and 1999/00. The average z6M% in 1998 is 0.154. In 1999 and 2000 the average z6M% is 0.366 and 0.358. There is a shift from the 98 z6% to the 99/00 z6M%. The z56% goes .408=>0.351/0.351 (same % in 99 & 00).

I also looked at zone% as compared to league average (to generate a PRO+ like rating).
Jeter (NYY) gets:
Year z56% z6% z6M%
1998 109.42 97.72 82.55
1999 103.29 109.81 89.39
2000 106.46 98.71 94.73

ARod (SEA) gets:
Year z56% z6% z6M%
1998 95.96 97.02 119.45
1999 87.32 108.03 106.22
2000 83.65 117.44 102.17

These numbers will be affected by observation 2.

Do I have a point? I don't know. These discrepancies mean something, I'm just not sure what.
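
A league-relative zone index of the kind described above can be sketched as a player's share of chances in a zone, divided by the league-average share, times 100. This is an assumed reading of the calculation, not a confirmed one: the exact averaging method is not stated, and the league shares quoted are rounded, so the result will be close to the posted figures but will not match them exactly. The name `zone_index` is illustrative.

```python
def zone_index(zone_count, total_chances, league_share):
    """Player's zone share relative to the league share, scaled so 100 = average."""
    return 100 * (zone_count / total_chances) / league_share


# Jeter, 2000, from Table 2: 185 Hole (z56) chances out of 497 SSF.
# 0.351 is the league z56 share for 2000 quoted in this comment.
jeter_z56_2000 = zone_index(185, 497, 0.351)
print(f"Jeter 2000 z56 index: {jeter_z56_2000:.1f}")
```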

   2. Charles Saeger Posted: November 06, 2002 at 02:01 AM (#607077)
Chris -- it is lower because the 1.16 figure counts double plays as two outs.
   3. Mike Emeigh Posted: November 06, 2002 at 02:01 AM (#607079)
We could dance about many of these things for hours.

Probably will, too :) Maybe we can host a Primer Chat at some point down the road after I finish a few more installments of this series.

I looked at the zone % (z56, z6, z6M). There is a scoring change between 1998 and 1999/00. The average z6M% in 1998 is 0.154. In 1999 and 2000 the average z6M% is 0.366 and 0.358. There is a shift from the 98 z6% to the 99/00 z6M%. The z56% goes .408=>0.351/0.351 (same % in 99 & 00).

I noticed that, too. I think the zone assignment system used for balls in play was refined between 1998 and 1999. I don't know whether Pete and Gary had access to STATS' stuff; IIRC STATS made some zone changes after 1998, and Pete and Gary might have followed the STATS assignments. The net effect is small, I think, because there's not a whole lot of difference between z6 and z6M in terms of conversion of opportunities - but in light of what the data shows about Jeter's fielding skills in each area (which is coming in later installments) I should split 1998 out and look at 1999-2000 separately.

Note that these are fielding opportunities, not balls actually fielded by the SS.

What I'm most interested in seeing (and hopefully this will come in a later installment) is how well the estimated number of chances a SS would see (based on team balls in play, pitcher handedness and ground-fly ball ratio) correlates with the SSF/9 stat above...

It's coming, in part 4. Context-Adjusted Defense is the first method (and AFAIK the only method) that directly attempts to place SS chances in this context. DFTs (part 3) don't do this directly, and Win Shares doesn't do it at all.

I suppose I should lay out the rest of the installments, just so
people have the plan:

Part 3: Davenport Fielding Translations (DFTs)
Part 4: Context-Adjusted Defense (CAD)
Part 5: Win Shares
Part 6: ZR/UZR
Part 7: What we can learn from the play-by-play data
Part 8: Summary and conclusions

Dan S has Part 3, and I assume he'll post it later in the week. I need Charlie to review Part 4, which is about 90% done (I didn't talk about CAD in Boston because the revisions were still in process, and I wanted to wait until Charlie's article appeared here). Part 5 was written for the SABR32 presentation, but I wanted to present DFTs and CAD first to demonstrate the extent to which James used things that Clay and Charlie had already done. I have outlines for the rest of the series.

-- MWE
   4. Charles Saeger Posted: November 06, 2002 at 02:01 AM (#607080)
Well, Mike, I'd love to look at it, but you need to send it to me first. :-)
   5. Mike Emeigh Posted: November 06, 2002 at 02:01 AM (#607081)
Well, Mike, I'd love to look at it, but you need to send it to me first. :-)

I need to finish it first :) For a variety of personal and professional reasons, I haven't gotten any writing done for a week or so. Sometimes my real life intrudes on my Primer life :)

-- MWE

   6. Chris Dial Posted: November 06, 2002 at 02:01 AM (#607084)
The above comment is from me.
   7. Mike Emeigh Posted: November 06, 2002 at 02:01 AM (#607085)
I have STATS zones/assignments. There weren't any at that time - they changed from double-counting DPs then. The zones wouldn't change wrt BIP anyway.

They did something in the outfield, I know - OF zone ratings went way up between 1998 and 1999. But I suspect you're right and that there was a transcription error in 1998 that caused a number of z6M balls to be labeled as z6. Have to remember to ask Gary why that is.

-- MWE
   8. tangotiger Posted: November 07, 2002 at 02:01 AM (#607093)
As Sean Smith pointed out, Jeter's A/SSF is probably around average.

MGL's published UZR for the last 4 years (98 through 01) for Jeter is:
+8, -1, -1, -17, for an average of -3 per year. I don't know what he has for 2002.

MGL is always tinkering, so he may have come up with something different.
   9. MGL Posted: November 08, 2002 at 02:02 AM (#607122)
I have a little free time this weekend, so I am going to redo my UZR methodology, incorporating some "positioning" and L/R factors. When I'm done, I'll write a little article describing the methodology, and including the results. Hopefully they'll put it up on this web site. Unfortunately I don't have my 2002 data (I will soon), so the results will just be a rehash, with the "tweaked" methodology, of what was already presented in my Superlwts articles...
   10. Boileryard Posted: November 09, 2002 at 02:02 AM (#607125)
Mike, I was going to ask the same question as Doug. I'm sure you'll go back and see this in Part 1, anyway, but I wanted to re-ask the question for those that won't. So, here's Doug's question:

November 5, 2002 - Doug


-- How many of these balls would you estimate there are per team/season? -- Which teams give up more of them? Which teams give up fewer? -- What kind of pitchers give up more of them? What kind of pitchers give up fewer of them? -- What are the techniques that you would use to estimate the number of these balls that are put into play? -- How would you verify that your estimate was of the right order of magnitude?

OK Mike, good questions, but I think there should be ways to answer them in a reasonable way. And, gut feel, I just can't imagine that the unfieldable BIPs are negligible.

You alluded in your piece to the fact that your study makes use of Play-by-Play data. I haven't ever seen these data but, from what I've read in other pieces, included are, among other things, a location code to which every BIP is hit - and it's pretty precise, breaking the field down into several dozen distinct zones. So, if you've got that data, you can tell, pretty much straight-away, in which zones BIPs are seldom turned into outs, and in which zones outs come much more readily. Seems like a reasonable basis for coming up with estimates of how many BIPs really should be considered fieldable.

To take it a step further, if the play-by-play data tells you what zone a BIP was hit to, it probably also tells you who fielded it. So that should also give you a start on coming up with estimates on expectations for which fielder is more likely to field a ball in a certain location (or, to put it another way, to come up with a fairer way to establish expectations on which fielder should handle which BIPs).

If I'm wrong about the play-by-play data, and none of the stuff I've talked about exists, then please disregard my points. But I do think it exists somewhere, based on other pieces I've read on this site. And, obviously, I'm only talking conceptually here and it would undoubtedly be a mountain of work to account for all the nuances, but my point is I think it's quite possible to come up with reasonable answers to your questions. I guess you can tell I've got no problem with doing estimates. My view is I think it's utterly impractical to assume that methods used to try to answer meaningful questions about the game will ever be 100% accurate 100% of the time. Just isn't going to happen. So, instead, why not try to make the best insights you can based on whatever data you've got? Having said that, I completely respect that you may have a different philosophical bent on this.
   11. Mike Emeigh Posted: November 10, 2002 at 02:02 AM (#607136)
So, if you've got that data, you can tell, pretty much straight-away, in which zones BIPs are seldom turned into outs, and in which zones outs come much more readily. Seems like a reasonable basis for coming up with estimates of how many BIPs really should be considered fieldable.

I took a pretty difficult fielding situation, a line drive hit just over the infield into one of the xD zones, where x is an infield position. There are eight of these zones - 3D, 34D, 4D, 4MD, 6MD, 6D, 56D, and 5D. I looked at these using the 2000 PBP data.

3D: 200 line drives/1 out
34D: 603 line drives/29 outs
4D: 535 line drives/145 outs
4MD: 700 line drives/83 outs
6MD: 790 line drives/65 outs
6D: 581 line drives/149 outs
56D: 700 line drives/19 outs
5D: 265 line drives/2 outs

I suppose those who want to make the uncatchable argument would note that the plays in the zones nearest the lines were *rarely* made - but I note that players manage to catch some balls even in those areas of the field, and thus one can't claim that the balls hit in that area were totally uncatchable. This is about the worst case fielding situation that I can imagine, and there aren't any places where plays are never made.
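
For anyone who wants the conversion rates rather than the raw counts, a minimal sketch follows. The counts are copied from the list above; the variable names are illustrative.

```python
# Zone -> (line drives, outs), 2000 PBP data as listed above.
line_drives_2000 = {
    "3D": (200, 1),
    "34D": (603, 29),
    "4D": (535, 145),
    "4MD": (700, 83),
    "6MD": (790, 65),
    "6D": (581, 149),
    "56D": (700, 19),
    "5D": (265, 2),
}

# Fraction of line drives in each zone that were turned into outs.
out_rate = {zone: outs / drives
            for zone, (drives, outs) in line_drives_2000.items()}

for zone, rate in sorted(out_rate.items(), key=lambda item: item[1]):
    print(f"{zone:>4}: {rate:6.1%}")
```

Every zone comes out above zero, which is the point being made: rare is not the same as never.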

-- MWE
   12. Slinger Francisco Barrios (Dr. Memory) Posted: November 10, 2002 at 02:02 AM (#607145)
Nicely done.

I myself have cast a trout eye on Fielding Runs ever since I perused my spankin' new 1989 edition of Total Baseball (plucked out of a bookstore's albatross bin in mid-1990). I have always been taken by Jerry Martin's curious 1976 season, where (as Greg Luzinski's Designated Glove Caddy(TM)) he had 130 games and only 129 PAs. That has to be some kind of record! But I digress.

Anyway, I look up in my 1989 Total Baseball Jerry Martin's Fielding Runs for that year. Imagine my deep and abiding shock to see attached some historically large negatives thereunto. "That ain't right," said I, and so I still do say. And I bet Danny Ozark would say it, too.
