Baseball for the Thinking Fan

Primate Studies
— Where BTF's Members Investigate the Grand Old Game

Sunday, November 10, 2002

And the Beat Goes On: Derek Jeter and the State of Fielding Analysis in Sabermetrics - Part 3

In the third installment, Mike examines Davenport Fielding Translations.

Moving on Up - Davenport Fielding Translations

 

In 1998, Clay Davenport published a new method for evaluating fielding in
Baseball Prospectus. The Davenport Fielding Translations (DFTs) have been
revised on at least two occasions since their original publication.

The DFTs are based on several important underlying concepts, which will
crop up again when we look at Context-Adjusted Defense, Win Shares, and
zone-based defensive methods. These are:

     

  1. Fielder performance needs to be evaluated in the context of the team’s
    defensive performance, because a fielder’s defensive opportunities are
    impacted by the performance of his teammates.

  2. In addition to errors, hits other than home runs should be treated as
    fielding failures. Fielding is properly evaluated in the context of the number
    of balls put into play against a team, less home runs.

  3. Pitching staff characteristics, specifically the groundball/flyball
    tendencies of the staff and the left/right balance of the staff, and in some
    cases (DP, CS, outfield assists) the number of runners on base, affect the
    opportunities that fielders have to make plays.

  4. Park factors, commonly used to adjust offensive and pitching statistics,
    also affect defensive value. Fielders tend to get fewer outs on balls in play
    in Coors Field than they do in Safeco Field.

 

There is a long article describing how the DFTs are determined in Baseball
Prospectus 2002 Edition (pages 4-10). DFTs begin with an estimate of the
proportion of each team’s defensive value (in terms of runs prevented) that
can be attributed to the fielders. Davenport first identifies three different
types of events (the titles listed below are mine):

  1. Pitcher-specific events: BB, K, HBP, and HR. Davenport assigns 100% of
    these events to the pitchers.

  2. Fielder-specific events: Errors and double plays. Davenport assigns 100%
    of these events to the fielders. DPs weren’t included at the team level in the
    published analysis, but Davenport recently posted an update to SABR-L in which
    he notes that he now considers them, and I’ve included them in my analysis
    below.

  3. Fielder-assisted events: Hits and outs on balls in play, other than HR.
    Davenport assigns 70% of these events to the fielders, 30% to the pitchers.
    He justifies this decision by citing Voros McCracken’s Defense Independent
    Pitching Stats (DIPS) analysis, which suggests that pitchers have little
    impact on the results from balls in play, and by also noting that “a 70/30
    split has the result of evening the valuation of pitchers over
    time…somewhere around the 70/30 ratio is the point at which the top
    pitchers in all eras appear nearly equal in value”. This avoids the problem
    of rating 19th-century pitchers as extremely important in an era when the
    outcome of the batter/pitcher confrontation depended less on the pitcher
    than at any time since.

 

Once Davenport has divvied up the events in this manner, he creates
separate defensive batting lines for a team’s pitchers and a team’s fielders,
and then uses his Equivalent Average (EqA) formula, modified to account for
E and DP, to derive a pitcher EqA (PEqA) and a fielder EqA (FEqA) for the
team. Errors are treated as having the value of walks, and double plays are
treated as though they were caught stealing. Davenport then takes the ratio of
the pitcher EqA to the fielder EqA and doubles its difference from 1, since
runs go up as double the difference. For example, if the PEqA/FEqA ratio is
1.15, the pitchers would be given the responsibility for 30% more of the runs
per PA than the fielders, and the fielders’ share of the runs allowed would be
(fielder PA)/(1.30*pitcher PA + fielder PA).
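As a rough illustration of the arithmetic just described, the sketch below turns a PEqA/FEqA pair into a run ratio and then into the fielders' share of runs allowed. The function names and the plate-appearance split in the usage lines are mine, chosen only for illustration; they are not taken from Davenport.

```python
def run_ratio(peqa: float, feqa: float) -> float:
    """Runs-per-PA ratio, pitchers vs. fielders: 1 plus double the EqA ratio's excess over 1."""
    eqa_ratio = peqa / feqa
    return 1.0 + 2.0 * (eqa_ratio - 1.0)


def fielder_share(run_r: float, pitcher_pa: float, fielder_pa: float) -> float:
    """Fraction of a team's runs allowed that is charged to the fielders."""
    return fielder_pa / (run_r * pitcher_pa + fielder_pa)


# The example from the text: an EqA ratio of 1.15 gives a run ratio of 1.30.
example_ratio = run_ratio(1.15, 1.0)

# Hypothetical plate-appearance split, for illustration only.
example_share = fielder_share(example_ratio, pitcher_pa=3000, fielder_pa=3100)
```

Feeding in the rounded 1998 Yankees figures (PEqA 0.793, FEqA 0.647) gives a run ratio of about 1.451, within rounding of the 1.452 listed under RunR in Table 3.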

For 1998-2000, I have BFP and AB against, as well as a breakdown by team of
doubles, triples, and HR allowed. The basic EqA formula, modified as
Davenport indicated in his SABR-L post to include errors and double plays, is:

EqA = (H*2 + 2B + 3B*2 + HR*3 + (BB+HBP+E)*1.5) / (AB + BB + HBP + DP)

For fielders, the HR, HBP, and BB factors will obviously be zero, and
for pitchers the E and DP factors will be zero. Fielder ABs are
assigned by taking 70% of (AB - K - HR - E), then adding the errors back in.
Fielder PA are calculated by taking 70% of (BFP - BB - HBP - K - HR - E), then
adding the errors back in. Fielder IP are calculated by taking 70% of
(IP*3 - K - DP), adding the DPs back in, and dividing by 3 (rounded to the
nearest 1/3 IP). Pitcher totals for AB, PA, and IP are the team totals minus
the fielder totals.
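The split can be sketched in a few lines of Python. This is my reading of the procedure, not Davenport's code: I assume that doubles and triples are divided 70/30 along with the other non-HR hits, and the function and dictionary key names are my own. Applied to the 1998 AL totals, it reproduces the league PEqA and FEqA shown in Table 3.

```python
def eqa(h, d, t, hr, bb, hbp, e, ab, dp):
    """Modified EqA from the text: errors count like walks, DPs like caught stealing."""
    return (2 * h + d + 2 * t + 3 * hr + 1.5 * (bb + hbp + e)) / (ab + bb + hbp + dp)


def split_eqa(team, fielder_pct=0.70):
    """Return (pitcher EqA, fielder EqA) for one team-season stat line."""
    hr, e, dp = team["HR"], team["E"], team["DP"]
    # Fielders get 70% of the non-HR hits (assumed to include 2B and 3B),
    # plus all errors and double plays.
    f_h = fielder_pct * (team["H"] - hr)
    f_2b = fielder_pct * team["2B"]
    f_3b = fielder_pct * team["3B"]
    f_ab = fielder_pct * (team["AB"] - team["SO"] - hr - e) + e
    feqa = eqa(f_h, f_2b, f_3b, 0, 0, 0, e, f_ab, dp)
    # Pitchers get the remainder, plus all HR, BB, and HBP.
    peqa = eqa(team["H"] - f_h, team["2B"] - f_2b, team["3B"] - f_3b,
               hr, team["BB"], team["HBP"], 0, team["AB"] - f_ab, 0)
    return peqa, feqa


# 1998 AL totals from Table 3.
al_1998 = {"AB": 78379, "H": 21210, "2B": 4235, "3B": 414, "HR": 2479,
           "BB": 7704, "HBP": 766, "SO": 14341, "E": 1594, "DP": 2085}
peqa, feqa = split_eqa(al_1998)
print(round(peqa, 3), round(feqa, 3))
```

The rounded results are 0.875 and 0.704, which agree with the AL Totals row of Table 3.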

Table 3 summarizes the team-level calculations for 1998. The expected runs
for fielders are calculated as LgFR/LgInn*TmInn*PF, where LgInn and TmInn are
the league and team fielder innings, and represent the number of runs that a
team with average pitchers or fielders would allow in that ballpark. The
difference between the expected runs for the fielders and the actual runs
estimated to have been allowed by the fielders (+51 for the Yankees) is the
number of runs to be divided among the defenders.
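A minimal sketch of the expected-runs calculation for the 1998 Yankees, using the figures from Table 3. The helper names are mine, and I am assuming that the innings in LgFR/LgInn*TmInn*PF are the fielder innings defined above (that assumption is what reproduces the published totals). Innings are carried as outs to avoid the x.3/x.7 box-score notation.

```python
def to_outs(ip: float) -> int:
    """Convert box-score innings (x.3 = 1/3, x.7 = 2/3) to outs."""
    whole = int(ip)
    thirds = round((ip - whole) * 3)   # .3 -> 1, .7 -> 2
    return whole * 3 + thirds


def fielder_outs(ip, so, dp, fielder_pct=0.70):
    """Outs credited to fielders: 70% of (outs - K - DP), plus all DPs."""
    return fielder_pct * (to_outs(ip) - so - dp) + dp


def expected_fielder_runs(lg_fr, lg_f_outs, tm_f_outs, pf):
    """LgFR / LgInn * TmInn * PF, with innings expressed as outs on both sides."""
    return lg_fr / lg_f_outs * tm_f_outs * pf


lg_outs = fielder_outs(20194.7, 14341, 2085)   # AL totals
ny_outs = fielder_outs(1456.7, 1080, 146)      # New York
ny_exp = expected_fielder_runs(4621, lg_outs, ny_outs, 0.973)
ny_diff = round(ny_exp) - 269                  # 269 = Yankees' FR in Table 3
```

Under these assumptions the Yankees' expected fielder runs round to 320 and the difference comes to +51, the figures given in Table 3.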

Table 3. DFTs for 1998: Pitcher-Fielder Split

Team            PF      IP   BFP    AB     R     H   2B  3B   HR   BB HBP    SO    E   DP
Anaheim      0.989  1444.0  6326  5548   783  1481  327  23  164  630  47  1091  106  146
Baltimore    0.961  1431.3  6213  5535   785  1505  275  29  169  535  46  1065   81  144
Boston       1.014  1436.0  6141  5511   729  1406  267  20  168  504  53  1025  105  128
Chicago      0.987  1438.7  6368  5651   931  1569  308  26  211  580  54   911  140  161
Cleveland    1.043  1460.0  6393  5669   779  1552  303  29  171  563  67  1037  110  146
Detroit      1.009  1446.3  6327  5596   863  1551  283  37  185  595  40   947  115  164
Kansas City  1.035  1436.3  6369  5653   899  1590  291  24  196  568  60   999  125  172
Minnesota    1.021  1447.7  6322  5715   818  1622  320  40  180  457  44   952  108  135
New York     0.973  1456.7  6100  5484   656  1357  262  18  156  466  68  1080   98  146
Oakland      0.964  1434.0  6310  5628   866  1555  314  37  179  529  56   922  141  155
Seattle      1.004  1424.3  6271  5596   855  1530  362  23  196  528  60  1156  125  139
Tampa Bay    1.026  1443.0  6269  5466   751  1425  260  27  171  643  81  1008   94  178
Texas        1.056  1431.3  6357  5695   871  1624  330  47  164  519  45   994  121  140
Toronto      1.008  1465.0  6352  5632   768  1443  333  34  169  587  45  1154  125  131

AL Totals          20194.7 88118 78379 11354 21210 4235 414 2479 7704 766 14341 1594 2085

Team           PEqA   FEqA   REqA   RunR     PR  PExpR   Diff     FR  FExpR   Diff
Anaheim       0.859  0.711  1.207  1.415    465    487     22    318    320      2
Baltimore     0.850  0.700  1.214  1.428    462    467      5    323    310    -13
Boston        0.833  0.662  1.259  1.518    435    488     53    294    332     38
Chicago       0.953  0.701  1.361  1.721    577    454   -123    354    336    -18
Cleveland     0.871  0.712  1.223  1.446    458    508     50    321    348     27
Detroit       0.913  0.703  1.299  1.599    523    471    -52    340    342      2
Kansas City   0.916  0.714  1.283  1.567    544    490    -54    355    343    -12
Minnesota     0.886  0.729  1.216  1.431    462    481     19    356    345    -11
New York      0.793  0.647  1.226  1.452    387    481     94    269    320     51
Oakland       0.904  0.716  1.263  1.527    507    445    -62    359    326    -33
Seattle       0.877  0.744  1.179  1.358    501    502      1    354    313    -41
Tampa Bay     0.882  0.660  1.336  1.673    476    489     13    275    342     67
Texas         0.876  0.752  1.164  1.329    483    501     18    388    348    -40
Toronto       0.834  0.707  1.180  1.359    450    513     63    318    326      8

AL Totals     0.875  0.704  1.242  1.484   6733                 4621

This is the easy part, and the part that is well-documented in BP 2002. The
division of the team runs among individual fielders, however, is the more
difficult part. I can explain in general the nature of Davenport’s
adjustments, but I cannot perform exact calculations because he does not
provide the details of these adjustments.

Davenport first estimates the percentage of playing time that a player gets
at a position, which he converts into an estimated number of games played at
the position. He uses plate appearances, the sum of player games at a
position, and individual player games at multiple positions in order to make
this estimate (G’). This number appears next to the player’s line under
Defense in BP. In 2000, Davenport estimated that Derek Jeter played the
equivalent of 146 full games of the Yankees’ 161 at SS; he actually played
1278 2/3 innings out of 1424 1/3 innings, which is about 144.5 games played
out of 161.
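The innings-based figure is simple proportional arithmetic, sketched here with exact fractions so that the thirds of an inning are not lost to rounding. The function name is mine; this is only the check on Jeter's actual innings, not Davenport's G’ estimate, whose exact formula is not given.

```python
from fractions import Fraction


def innings_to_games(player_inn: Fraction, team_inn: Fraction, team_games: int) -> float:
    """Scale a player's share of team innings to the team's game total."""
    return float(player_inn / team_inn * team_games)


jeter_2000 = innings_to_games(Fraction(1278) + Fraction(2, 3),
                              Fraction(1424) + Fraction(1, 3), 161)
```

The result is roughly 144.5 games, a little below Davenport's estimate of 146.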

Davenport evaluates middle infielders together. As he notes, about 1/3 of
middle infield putouts are optional plays, which can be handled by either the
2B or the SS. On most teams, these plays are divided more or less evenly; when
the adjustment is applied to the 1998 Yankees it results in a net transfer of
seven putouts from the 2B to the SS. However, in some cases one player handles
the bulk of the discretionary plays, as Nap Lajoie did at 2B for the Cleveland
Naps. The DFTs do not reward such a player, nor do they penalize his middle
infield counterpart.

SS are evaluated by comparing their adjusted putout, assist, error, and double
play totals against the league average totals for their position. Davenport
applies three adjustments to all of the SS totals:

     

  1. an adjustment for balls in play. Teams with fewer balls in play than the
    norm (as the 1998 Yankees had, with an estimated 26.75 BIP per nine innings
    vs a league average of 28.00) will give their fielders fewer chances to
    make plays.

  2. an adjustment for the groundball/flyball characteristics of the pitching
    staff. Davenport looks at the ratio of infield assists to outfield putouts and
    also at pitcher chances per inning, adjusted for strikeouts (which I assume is
    ((PO.p+A.p)*3)/(IP*3-K)) to determine the extent of this adjustment.
    Groundball pitchers will generally have more chances/inning than do flyball
    pitchers. The Yankees, in 1998, had virtually a league-average ratio of
    infield assists to outfield putouts (1.259 to a league average of 1.260) and
    an above average rate of pitcher chances per inning (.267 to .259), which
    suggests a mild groundball tendency.

  3. an adjustment for the LH/RH balance on the pitching staff. Teams with an
    abundance of LHP will tend to have more balls hit to the left side of the
    diamond. Davenport looks at the ratio of innings pitched by LHP to innings
    pitched by RHP, and at the ratio of plays made by the left-side fielders
    to plays made by the right-side fielders. The 1998 Yankees had an
    above-average share of innings pitched by LHP (37.6% vs a league average
    of 26.9%); while I don’t have an exact breakdown of plays made on each
    side of the field, the Yankees do have a lower ratio of left-side infield
    assists to right-side infield assists than the league norm (1.26 to 1.34),
    which probably offsets the other adjustment somewhat. I would consider it
    likely that the final adjustment still would be in favor of more balls
    being hit to the left side of the infield.
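The balls-in-play figures quoted in the first adjustment can be recovered from Table 3 if BIP is taken as BFP - BB - HBP - K - HR and expressed per nine innings. That definition is my inference (it happens to reproduce both quoted numbers), not something Davenport spells out, and the function name is mine. How he converts the gap into an adjustment is not stated, so the sketch stops at the two rates.

```python
def bip_per_nine(bfp, bb, hbp, so, hr, outs):
    """Balls in play (BFP minus BB, HBP, K, and HR) per nine innings."""
    bip = bfp - bb - hbp - so - hr
    return bip / (outs / 3) * 9


# Outs = innings * 3: 1456 2/3 IP -> 4370 outs, 20194 2/3 IP -> 60584 outs.
ny_bip9 = bip_per_nine(6100, 466, 68, 1080, 156, 4370)
al_bip9 = bip_per_nine(88118, 7704, 766, 14341, 2479, 60584)
```

With the Table 3 inputs, the two rates round to 26.75 for New York and 28.00 for the league.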

 

Davenport also makes further adjustments to the DP, PO, and A totals for SS.
SS DP are adjusted for runners on 1B - the 1998 Yankees had the fewest runners
on 1B in the league, which should come as no surprise. PO totals are adjusted
for assists that are made to 2B. For the SS, the estimate is that with a man
on 1B, pitchers, 1Bs, and 2Bs will throw to second about 50% of the time, and
that 50% of the team’s CS (the bulk of which are at 2B) will also be taken by
the SS. These plays are removed from the SS PO total after the adjustment for
discretionary plays described earlier is made. PO and A totals are also
adjusted for double plays (after the men-on-1B adjustment) above or below the
league norm, since the player is already getting credit for those extra plays
in the DP category.

Once Davenport makes all of the above adjustments for individual fielders,
he compares each fielder’s totals to the totals that an average player at the
position in the league would be expected to make, given the same amount of
playing time. The result is a set of Deltas - DeltaPO, DeltaA, DeltaE, and
DeltaDP - where a positive total means that the player is above average and a
negative total means that the player is below average for that characteristic.
For SS, Davenport then calculates a league norm,
(LgPO-LgA2+LgA+LgE+LgDP)/LgG, where LgA2 is the estimated number of assists by
the league’s pitchers, 1Bs, and 2Bs that go to the SS covering 2B, and the
other totals are the league totals for SS. The individual shortstop’s Deltas
are summed, and a rate is calculated as (Norm+Deltas/G’)/Norm, where G’, as
noted earlier, is the shortstop’s estimated number of games played. This rate
is converted to a number of Runs Above Average using the calculation
RAA = (Rate-1) * Norm * G’ * .7.

Finally, Davenport reconciles the individual RAAs to the expected team total
by summing the RAAs across all players at all positions. Each player’s rate is
multiplied by the factor (expected FR-player RAA sum)/(expected FR-team RAA
difference), which has the effect of reducing the rate whenever the player RAA
sum exceeds the team RAA difference and increasing the rate when the player
RAA sum is less than the team RAA difference, and the RAA are recalculated.
For example, if the player total is +50, the expected team total is +40, and
the fielder contribution was 300 runs, each player rate would be multiplied by
(300-50)/(300-40), or 250/260.
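The rate, RAA, and reconciliation formulas can be written out as follows. This is only a sketch of the formulas as quoted above; the function names and the shortstop inputs (a norm of 5.5, Deltas summing to +12, 150 games) are hypothetical values of mine, since the individual adjustments cannot be reproduced exactly.

```python
def raa_from_deltas(norm: float, deltas: float, games: float) -> float:
    """Rate = (Norm + Deltas/G') / Norm, then RAA = (Rate - 1) * Norm * G' * 0.7."""
    rate = (norm + deltas / games) / norm
    return (rate - 1.0) * norm * games * 0.7


def reconcile_factor(expected_fr: float, player_raa_sum: float, team_diff: float) -> float:
    """Multiplier applied to each player's rate so that individual RAAs agree with the team total."""
    return (expected_fr - player_raa_sum) / (expected_fr - team_diff)


# Hypothetical shortstop, for illustration only.
raa = raa_from_deltas(norm=5.5, deltas=12.0, games=150.0)

# The example from the text: player total +50, team total +40, 300 fielder runs.
factor = reconcile_factor(300, 50, 40)
```

Note that Norm and G’ cancel in the RAA formula, so before reconciliation a player's RAA is simply 0.7 times his summed Deltas (8.4 runs in the hypothetical case). The norm only matters once the rate is rescaled by the reconciliation factor.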

BP 2002 prints fielding numbers for only 1999 and 2000 for Jeter; Jeter was
rated at -13 runs in 1999 and -27 runs in 2000. The earlier version of the
system, for which the data appears in BP 2001, lists Jeter at -3 runs in 1998,
-12 runs in 1999, and -22 runs in 2000. I would expect that, were the 1998
numbers to be recalculated, Jeter would wind up at something like -4 or -5
runs.

Table 4 shows the DFTs for AL regular SS (800 or more innings) for
1998-2000, with Fielding Runs included for comparison purposes. DFTs for 1998,
and for Mike Caruso in 1999, were from BP 2001 and were based on the earlier
version of the system, but a quick comparison between the numbers in BP 2001
and BP 2002 for other SS in both books suggests that the 1998 numbers would
be of the same sign, and probably about the same magnitude, under the revised
system, so I don’t think the conclusions that I draw below are invalidated.

Table 4. DFTs, 1998-2000

Mike Emeigh Posted: November 10, 2002 at 06:00 AM | 6 comment(s)

Reader Comments and Retorts


   1. tangotiger Posted: November 11, 2002 at 02:02 AM (#607147)
Great stuff, Mike! The last 3 paragraphs are the most important part of this series (so far), in my view.
   2. MattB Posted: November 11, 2002 at 02:02 AM (#607148)
Very interesting.

I see where you are going here, and will be interested to see the "new" results, accounting for the locations the balls are hit.

I am wondering, though, if there aren't also implications here for player placement theory. Is it possible that fielders (or at least shortstops) are "cheating" too much in the direction that the ball will most likely be hit, at the expense of balls that get hit the "wrong" way?

In other words, do shortstops, as a whole, play too close to second with lefties up? Or do the benefits outweigh the costs?

An interesting follow up may be to see if players with more and less extreme middle/hole splits are better or worse fielders on the whole.

Or maybe you were going in that direction anyway. It's hard to avoid jumping the gun in a series of articles that stretches into weeks . . .
   3. RP Posted: November 12, 2002 at 02:02 AM (#607158)
I agree that this series is very interesting, and I want to see where it ends up, but, at the moment, I can't help but think of Occam's razor. Every defensive statistic shows Jeter as being one of the worst shortstops in baseball, and those stats are only confirmed by my (and many others') subjective observations of his fielding. Isn't the most logical (and therefore likely correct) explanation simply that he's a bad fielder?
   4. tangotiger Posted: November 12, 2002 at 02:02 AM (#607159)
I don't know what you mean by "every". MGL's UZR, which looks at the play-by-play, had, as I posted in part 2, numbers of +8,-2,-3,-17 (or something like that) from 1998 to 2001.

And perhaps the "every" statistic that you are looking at has the same biases, as Mike is trying to point out with his series. You must know two things *at least*
1 - How many balls does this player get to field?
2 - Where are those balls being hit?

   5. Mike Emeigh Posted: November 12, 2002 at 02:02 AM (#607160)
And perhaps the "every" statistic that you are looking at has the same biases, as Mike is trying to point out with his series.

Tango is correct. The various defensive methods out there do a better job confirming each other's conclusions than they do in systematically removing biases.

One thing that struck me when I did the original analysis for SABR32 is the extent to which the results from every recent method correlate with Palmer's Fielding Runs. Fielding Runs is a much-maligned statistic among the community of defensive analysts, for good reasons - yet even after you make a careful effort to account for the known problems that affect Fielding Runs, as Davenport and Saeger and James have done, you wind up with rankings that place the players in more or less the same order (although not with the same range of separation between best and worst), and you still end up with a good-sized positive relationship between fielding opportunities and results. It's certainly *possible* that good fielders act as ball magnets, and that pitchers try to pitch in such a way as to maximize the opportunities that their good fielders have to handle the ball, but there's not a lot of evidence that pitchers actually *do* pitch in this fashion - which leads one to the conclusion that there is still some form of opportunity bias for which existing defensive methods do not account.

It's pretty clear that every defensive method out there, to some extent, rewards fielders who happen to have a large number of balls hit into their vicinity, and penalizes fielders who don't. It's also fairly clear that the distribution of BIP against the Yankees gives Jeter very few opportunities to make plays - and that Jeter's low ranking in existing defensive methods stems at least in part from the fact that he isn't given the chance to make plays.

Visual evidence on the quality of a player's defense is pretty unreliable, because when most people are watching a game, they're following the ball, not the fielder, and most of the work that the fielder does to make a play occurs before he ever makes it into the viewer's frame of reference. Those balls to his right on which Jeter is determined to be "pretty bad", for example - who hit them? Does the player normally hit balls in that area? Where was Jeter positioned BEFORE the play? How far did he have to move? Was Jeter positioned in a place where he SHOULD have covered the location where the ball was hit? If you don't know the answers to those questions, you can't really make a judgment as to whether Jeter missed a play that another SS would have made - and unless you are following the fielder instead of the ball, you don't really have the answers to those questions.

-- MWE
   6. RP Posted: November 12, 2002 at 02:02 AM (#607164)
Sean -- I watched Jeter play many times, and to my eyes, he appears to have very little range and to miss a lot of balls. It's true that it could be a matter of perception and that I'm simply applying his poor defensive stats to what I see, but isn't it equally possible that I see the same thing the stats see -- i.e., that he's not very good? Isn't the latter explanation more plausible and logical?


 

1998            Team    G   GS     Inn      FR  DFT
Cruz, D         DET   135  132  1163.3   16.10   17
Stocker, K      TBA   110  108   940.0   11.63   14
Vizquel, O      CLE   151  149  1316.0    7.44   11
Gonzalez, A     TOR   158  157  1398.3   -9.40    5
Bordick, M      BAL   150  144  1238.3   18.26    4
DiSarcina, G    ANA   157  155  1370.7   -3.63    3
Meares, P       MIN   149  145  1270.0   -7.68    0
Tejada, M       OAK   104  104   915.0    2.16    0
Rodriguez, A    SEA   160  160  1389.3    2.06    0
Jeter, D        NYA   148  148  1304.7  -20.02   -3
Garciaparra, N  BOS   143  143  1255.3  -15.26  -11
Caruso, M       CHA   131  129  1121.3   -7.33  -16

1999            Team    G   GS     Inn      FR  DFT
Sanchez, R      KCA   134  131  1128.7   31.94   24
Bordick, M      BAL   159  155  1355.0   35.38   23
Cruz, D         DET   155  151  1300.3    4.68   20
Batista, T      TOR    98   98   860.7   10.18   13
Tejada, M       OAK   159  156  1377.3    7.13    4
Rodriguez, A    SEA   129  129  1114.7    7.23    3
Garciaparra, N  BOS   134  133  1171.7   -7.53    0
Guzman, C       MIN   131  126  1069.0   -7.20   -3
Vizquel, O      CLE   143  140  1214.3    1.57   -7
Clayton, R      TEX   133  133  1149.3    3.10   -7
Caruso, M       CHA   132  125  1114.7  -19.92  -10
Jeter, D        NYA   158  158  1395.7  -33.55  -13

2000            Team    G   GS     Inn      FR  DFT
Sanchez, R      KCA   143  140  1198.0   15.18   28
Rodriguez, A    SEA   148  148  1285.0    8.05   17
Gonzalez, A     TOR   141  140  1225.3   -6.82   16
Cruz, D         DET   156  154  1355.3    3.95   13
Martinez, F     TBA   106  103   887.7   31.20    9
Valentin, J     CHA   141  136  1212.3   20.59    0
Garciaparra, N  BOS   136  135  1185.0   -2.55   -4
Vizquel, O      CLE   156  154  1328.7   -0.54   -4
Tejada, M       OAK   160  159  1400.3    3.17   -4
Clayton, R      TEX   148  144  1237.0   -2.59   -4
Bordick, M      BAL   100  100   865.0  -14.38   -6
Guzman, C       MIN   151  148  1307.0  -14.15   -9

 
