Fun With Leverage: Is Perception Reality?
Almost four years ago, I introduced a concept called Leverage Index. Leverage is the swing in the possible change in win probability. If there is a game with one team leading by ten runs, the possible changes in win probability, whether the event is a home run or a double play, will be very close to negligible. That is, there won’t be much swing in any direction.
But, in a late and close game, the change in win probability among the various events will have rather wild swings. With a runner on first, two outs, down by one, and in the bottom of the ninth, the game can hinge on one swing of the bat—a home run and an out will both end the game, but with vastly different outcomes for the teams involved.
You can spot a high-leverage situation, I can spot them, and pretty much everyone can spot many high-leverage situations. All that’s left for us to do is to quantify every single game state into a number. That number is the Leverage Index.
—from Tango’s article at The Hardball Times.
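One way to turn the quoted idea into a number is sketched below. It illustrates the idea and does not reproduce Tango's tables. It assumes that, for a given game state, you already know the probability of each possible outcome and the change in the batting team's win probability that outcome would cause. The expected absolute swing is the probability-weighted mean of those changes. Dividing it by the same quantity averaged over all game states scales the result so that an average situation scores 1.0. The `Outcome` records, the example probabilities and the `AVERAGE_SWING` constant are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """One possible result of a plate appearance in a given game state."""
    probability: float      # chance the outcome occurs (hypothetical input)
    win_prob_change: float  # change in the batting team's win probability


def expected_swing(outcomes: list[Outcome]) -> float:
    """Probability-weighted mean of the absolute win-probability changes."""
    total = sum(o.probability for o in outcomes)
    if total <= 0:
        raise ValueError("outcome probabilities must sum to a positive value")
    return sum(o.probability * abs(o.win_prob_change) for o in outcomes) / total


def leverage_index(outcomes: list[Outcome], average_swing: float) -> float:
    """Scale the expected swing so that an average game state has LI = 1.0."""
    return expected_swing(outcomes) / average_swing


# Placeholder for the swing averaged over all game states (not a published value).
AVERAGE_SWING = 0.035

# Hypothetical: bottom 9, 2 outs, runner on 1st, home team down by 1.
late_and_close = [
    Outcome(probability=0.03, win_prob_change=+0.80),  # a home run wins it
    Outcome(probability=0.68, win_prob_change=-0.20),  # an out loses it
    Outcome(probability=0.29, win_prob_change=+0.10),  # anything else
]
# Hypothetical: one team ahead by ten runs, so nothing moves the needle much.
blowout = [
    Outcome(probability=0.30, win_prob_change=+0.001),
    Outcome(probability=0.70, win_prob_change=-0.001),
]

print(round(leverage_index(late_and_close, AVERAGE_SWING), 2))  # 5.4
print(round(leverage_index(blowout, AVERAGE_SWING), 2))  # 0.03
```

Only the size of each swing counts, not its direction. That is why a game-winning homer and a game-ending out both push the index up.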
Baseball fans - and sportswriters - who are not oriented toward statistical analysis tend to have a fixation with the concept of “clutch hitting”. In spite of numerous studies over the years that show that clutch ability - if it exists at all - tends to be relatively small, fans still argue that so-and-so is truly a “clutch god” or a “choker”.
One issue that we’ve had in trying to evaluate clutch performance from an analytical standpoint is that it’s been difficult to come up with a consistent definition of “clutch situation” that doesn’t do one of two things:
1. aggregate too many “unlike things” together (e.g. performance with runners in scoring position, which equates runner on second/two outs with bases loaded/no outs even though there is a very different potential impact on the game situation);
2. reduce the sample size to a point where small variations have tremendous impact (e.g. performance with RISP in late/close situations, where many hitters may have no more than 20-30 appearances in a season).
What Leverage Index does is to place every plate appearance on a sliding scale based on potential game impact. As Tango notes in the quote I highlighted above, most people know clutch when they see it, even if they can’t necessarily define it. LI does an excellent job of accurately capturing the relative importance of game situations from the viewpoint of a typical fan.
Suppose we look at some randomly selected game situations (per Tango’s chart):
Leverage Index 1.0
Top 6, no outs, bases empty, home team trailing by 1
Bottom 6, 1 out, bases empty, score tied
Bottom 7, 2 out, bases empty, home team trailing by 1
Top 8, 2 out, bases empty, score tied
Bottom 8, 1 out, runner on 3rd, home team ahead by 1
Leverage Index 2.0
Top 6, 1 out, runner on 2nd, home team ahead by 1
Bottom 6, no outs, runner on 1st, score tied
Bottom 7, 1 out, runner on 1st, score tied
Top 8, 1 out, runners on 1st and 2nd, home team trailing by 1
Bottom 8, no outs, runner on 2nd, score tied
Leverage Index 3.0
Bottom 6, 1 out, runners on 1st and 3rd, home team trailing by 1
Bottom 7, 1 out, runners on 1st and 2nd, score tied
Top 8, 1 out, runners on 2nd and 3rd, score tied
Bottom 8, 1 out, runners on 2nd and 3rd, score tied
Bottom 9, 1 out, bases loaded, home team trailing by 4
While there may be some quibbles about the value assigned to specific situations (like the last one in the 3.0 group), I think that most people would agree that, in general, the relative game importance of the situations as a group mirrors the LI assigned to the group. Most fans would recognize the last group of situations as being more “clutch” than the next-to-last group, in my opinion, and the first group as containing the fewest clutch situations.
LI can be used as a weighting factor, weighting each of a player’s plate appearances by its relative game importance. Consider a .250 hitter who bats in the following game situations over 24 plate appearances:
12 PA with LI = 0.5
8 PA with LI = 1.0
4 PA with LI = 2.0
If we weight his PA by LI, the 12 PA in the lowest LI situation would be the equivalent of 6 “normal” PA, and the 4 PA in the highest LI situation would be the equivalent of 8 “normal” PA, giving him a weighted equivalent of 22 PA. Suppose that player goes 4-12 in the 0.5 LI situations, 2-8 in the 1.0 LI situations, and 0-4 in the 2.0 LI situations. If we weight that performance by LI, we get:
2-6 weighted by 0.5 LI
2-8 weighted by 1.0 LI
0-8 weighted by 2.0 LI
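or 4-22, a weighted performance of .182. If, on the other hand, the player went 2-12 in the low-leverage situations, 2-8 in the middle, and 2-4 in the high-leverage situations, we’d now have
1-6 weighted by 0.5 LI
2-8 weighted by 1.0 LI
4-8 weighted by 2.0 LI
or 7-22, a weighted performance of .318, even though his unweighted line (6-24, .250) is the same in both cases.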
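The arithmetic in both scenarios can be checked in a few lines. This minimal sketch assumes the plate appearances are already grouped into (LI, hits, at-bats) buckets and, like the example, treats every PA as an at-bat. Each bucket's hits and at-bats are multiplied by its LI before the ratio is taken.

```python
# Each bucket: (leverage index, hits, at-bats); PA are treated as at-bats,
# as in the worked example.
Bucket = tuple[float, int, int]


def weighted_average(buckets: list[Bucket]) -> tuple[float, float, float]:
    """Return (weighted hits, weighted at-bats, LI-weighted batting average)."""
    hits = sum(li * h for li, h, _ in buckets)
    at_bats = sum(li * ab for li, _, ab in buckets)
    return hits, at_bats, hits / at_bats


def unweighted_average(buckets: list[Bucket]) -> float:
    """Plain batting average, ignoring LI."""
    return sum(h for _, h, _ in buckets) / sum(ab for _, _, ab in buckets)


first_scenario = [(0.5, 4, 12), (1.0, 2, 8), (2.0, 0, 4)]
second_scenario = [(0.5, 2, 12), (1.0, 2, 8), (2.0, 2, 4)]

for label, buckets in [("first", first_scenario), ("second", second_scenario)]:
    hits, at_bats, avg = weighted_average(buckets)
    print(f"{label}: {hits:g}-{at_bats:g} weighted = {avg:.3f}, "
          f"unweighted = {unweighted_average(buckets):.3f}")
```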
One could, in this manner, develop a weighted performance for each player, weighting each of his PA by the LI of the situation in which it occurred. If the player’s weighted performance was better than his actual performance, one could conclude that he produced more value in game-important situations (i.e. was more “clutch”); if it was worse, one could conclude that he produced less value in game-important situations (i.e. was more of a “choker”). The advantage of this approach is that every plate appearance for every player can be included in the study, and plate appearances are weighted in a more-or-less appropriate manner based on a consistent definition of the value of the PA.
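Applied to individual plate appearances rather than pre-grouped buckets, the same weighting yields a full weighted slash line. The gap between weighted and actual OPS then becomes the rough “clutch” measure described here. The sketch is simplified and hypothetical. The `PlateAppearance` record and its outcome codes are made up for illustration and are not Retrosheet’s event format. Hit-by-pitches, sacrifices and the other special cases in the official OBP formula are ignored.

```python
from dataclasses import dataclass

# Total bases per hit type. "BB" (walk) and "OUT" are the only other codes;
# HBP, sacrifices and similar special cases are deliberately left out.
TOTAL_BASES = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}
VALID_OUTCOMES = set(TOTAL_BASES) | {"BB", "OUT"}


@dataclass
class PlateAppearance:
    li: float     # leverage index of the game state
    outcome: str  # hypothetical code: "1B", "2B", "3B", "HR", "BB" or "OUT"


def slash_line(pas: list[PlateAppearance], weighted: bool) -> tuple[float, float, float]:
    """Return (AVG, OBP, SLG), optionally weighting each PA by its LI."""
    pa = ab = hits = on_base = bases = 0.0
    for p in pas:
        if p.outcome not in VALID_OUTCOMES:
            raise ValueError(f"unknown outcome {p.outcome!r}")
        w = p.li if weighted else 1.0
        pa += w
        if p.outcome == "BB":  # walks count toward OBP but are not at-bats
            on_base += w
            continue
        ab += w
        if p.outcome in TOTAL_BASES:
            hits += w
            on_base += w
            bases += w * TOTAL_BASES[p.outcome]
    return hits / ab, on_base / pa, bases / ab


def ops_gain(pas: list[PlateAppearance]) -> float:
    """Weighted OPS minus actual OPS; positive means better in high-LI spots."""
    _, w_obp, w_slg = slash_line(pas, weighted=True)
    _, obp, slg = slash_line(pas, weighted=False)
    return (w_obp + w_slg) - (obp + slg)


# Hypothetical six-PA sample.
sample = [
    PlateAppearance(0.5, "OUT"), PlateAppearance(0.5, "1B"),
    PlateAppearance(1.0, "BB"), PlateAppearance(1.0, "OUT"),
    PlateAppearance(2.0, "HR"), PlateAppearance(2.0, "OUT"),
]
print(f"OPS gain: {ops_gain(sample) * 1000:+.0f} points")  # OPS gain: +417 points
```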
Now, having said all of that, I don’t think that doing this the simple way actually has a lot of analytical value. There appears to be a small inverse relationship between weighted performance and average leverage - in other words, the higher the average leverage a player sees, the worse his weighted performance is likely to be. Since leverage opportunities are not evenly distributed (they depend on lineup position and team quality, at a minimum), it’s not entirely clear that the weighted performance is fair. That’s why this article is called “Fun with Leverage” - it shouldn’t be taken as a serious attempt to answer the clutch question, but more as a throwaway. I decided to write it anyway, because it’s eerie how well some of the results match the perception that many fans have of certain players, and it may at least give some insight into why people have attached those labels to them.
The play-by-play data from 2003-2006 that I used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet. The LIs were derived from Dave Studenmund’s Win Expectancy worksheet, which is available from the Baseball Graphs site. I didn’t make any sort of year-to-year park or run environment adjustments, in an effort to keep it (relatively) simple.
There were, from 2003-2006, 1029 players who were not pitching at the time and who batted in at least one game. Collectively, these 1029 players hit .270/.338/.433 overall. When their plate appearances were weighted by LI, the collective performance of those players was .271/.345/.431, a net gain of 5 points in OPS. This reflects a fairly typical tradeoff that occurs in high-leverage situations - pitchers are more willing to allow a walk, less inclined to allow an extra-base hit. It may also reflect the “protecting the lines” mentality that permeates baseball teams late in close games.
From that set of 1029 players, I identified a smaller group of 153 players who had at least 250 plate appearances in each of the four seasons 2003-2006. I cast these players as “regulars” - players who got consistent playing time - and the smallest number of PAs any one of them had was 1220 (Juan Castro). These players, collectively, hit .281/.351/.455 - they were a better group of players across the board. Their collective performance weighted by LI was .282/.358/.454, for a net gain of 6 points in OPS - basically the same pattern as shown by all players.
Finally, within that set of regulars, I took the top 36 hitters, all of whom had an OPS of at least .850. These good hitters, collectively, hit .293/.383/.529 unweighted and .296/.395/.531 weighted by LI - a gain of 14 points in OPS. I found it interesting that, even though they had a larger OBP increase than the other groups, the good hitters maintained their isolated power where the other groups lost some of theirs (although the numbers are small and not especially significant). There was virtually no difference in average LI among the three groups.
These group totals set expectations for weighted performance, in my opinion. We would expect modest, OBP-heavy gains in OPS from the typical hitter when his performance is weighted by LI. A really good high-leverage performer would see larger gains; a poor one would see smaller gains, or a decline.
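The group comparisons above come down to differences between two slash lines. The sketch below recomputes them from the published group totals only. It splits each OPS change into its OBP and SLG parts and adds the change in isolated power (SLG minus AVG). The `deltas` and `points` helpers are illustrative names. Differences are expressed in points (thousandths), as in the text.

```python
SlashLine = tuple[float, float, float]  # (AVG, OBP, SLG)

# Unweighted and LI-weighted group totals quoted above, 2003-2006.
GROUPS: dict[str, tuple[SlashLine, SlashLine]] = {
    "all hitters": ((0.270, 0.338, 0.433), (0.271, 0.345, 0.431)),
    "regulars": ((0.281, 0.351, 0.455), (0.282, 0.358, 0.454)),
    "good hitters": ((0.293, 0.383, 0.529), (0.296, 0.395, 0.531)),
}


def points(x: float) -> int:
    """Express a rate difference in points (thousandths)."""
    return round(x * 1000)


def deltas(actual: SlashLine, weighted: SlashLine) -> dict[str, int]:
    """Weighted minus actual OBP, SLG, OPS and isolated power, in points."""
    avg, obp, slg = actual
    w_avg, w_obp, w_slg = weighted
    return {
        "OBP": points(w_obp - obp),
        "SLG": points(w_slg - slg),
        "OPS": points((w_obp + w_slg) - (obp + slg)),
        "ISO": points((w_slg - w_avg) - (slg - avg)),
    }


for name, (actual, weighted) in GROUPS.items():
    d = deltas(actual, weighted)
    print(f"{name}: OPS {d['OPS']:+d} (OBP {d['OBP']:+d}, SLG {d['SLG']:+d}), ISO {d['ISO']:+d}")
```

Run as written, it reports OPS gains of +5, +6 and +14 points. The OBP part accounts for all of each gain, and isolated power slips by 3, 2 and 1 points respectively.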
Looking at the group of good hitters, we have:
Top 5, weighted OPS minus actual OPS:
Carlos Delgado, .285/.391/.566 unweighted, .310/.416/.618 weighted, 77 point gain
Carlos Beltran, .278/.368/.517 unweighted, .295/.388/.550 weighted, 53 point gain
Albert Pujols, .338/.429/.650 unweighted, .345/.443/.688 weighted, 52 point gain
David Ortiz, .294/.391/.609 unweighted, .318/.412/.638 weighted, 50 point gain
Derek Jeter, .316/.387/.464 unweighted, .331/.410/.482 weighted, 41 point gain
Bottom 5, weighted OPS minus actual OPS:
Travis Hafner, .299/.404/.590 unweighted, .289/.399/.563 weighted, 32 point loss
Javy Lopez, .298/.347/.518 unweighted, .283/.350/.486 weighted, 29 point loss
Carlos Guillen, .310/.379/.483 unweighted, .301/.382/.456 weighted, 24 point loss
Miguel Tejada, .306/.356/.505 unweighted, .296/.351/.489 weighted, 21 point loss
Carlos Lee, .290/.344/.513 unweighted, .284/.344/.492 weighted, 21 point loss
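Both lists are the two ends of a single ranking by weighted OPS minus actual OPS. The sketch below builds that ranking from the ten slash lines quoted above; the full 36-player table isn't reproduced in the article. The `ops_change` helper is an illustrative name, not part of any library.

```python
SlashLine = tuple[float, float, float]  # (AVG, OBP, SLG)

# (unweighted, LI-weighted) slash lines for the top and bottom five above.
PLAYERS: dict[str, tuple[SlashLine, SlashLine]] = {
    "Carlos Delgado": ((0.285, 0.391, 0.566), (0.310, 0.416, 0.618)),
    "Carlos Beltran": ((0.278, 0.368, 0.517), (0.295, 0.388, 0.550)),
    "Albert Pujols": ((0.338, 0.429, 0.650), (0.345, 0.443, 0.688)),
    "David Ortiz": ((0.294, 0.391, 0.609), (0.318, 0.412, 0.638)),
    "Derek Jeter": ((0.316, 0.387, 0.464), (0.331, 0.410, 0.482)),
    "Travis Hafner": ((0.299, 0.404, 0.590), (0.289, 0.399, 0.563)),
    "Javy Lopez": ((0.298, 0.347, 0.518), (0.283, 0.350, 0.486)),
    "Carlos Guillen": ((0.310, 0.379, 0.483), (0.301, 0.382, 0.456)),
    "Miguel Tejada": ((0.306, 0.356, 0.505), (0.296, 0.351, 0.489)),
    "Carlos Lee": ((0.290, 0.344, 0.513), (0.284, 0.344, 0.492)),
}


def ops_change(actual: SlashLine, weighted: SlashLine) -> int:
    """Weighted OPS minus actual OPS, in points (thousandths)."""
    _, obp, slg = actual
    _, w_obp, w_slg = weighted
    return round(((w_obp + w_slg) - (obp + slg)) * 1000)


# Highest gain first; sorted() is stable, so ties keep their listed order.
ranking = sorted(PLAYERS, key=lambda name: ops_change(*PLAYERS[name]), reverse=True)
for name in ranking:
    print(f"{name:<16}{ops_change(*PLAYERS[name]):+d}")
```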
The top five have been well publicized for their “clutchiness”. The bottom five aren’t particularly well known as “chokers” - with the possible exception of Tejada - but Alfonso Soriano, who was sixth from the bottom, does have something of an “unclutch” reputation.
ARod, FWIW, hit .299/.396/.562 overall, but had a weighted performance of .297/.403/.557, for a 2-point OPS gain. This placed him 24th among the 36 good hitters, which, especially in comparison to Jeter, probably explains a lot of the perception of ARod as a player who doesn’t produce when it counts. Manny Ramirez, who also has a bit of an “unclutch” reputation, hit .311/.412/.602 overall and .312/.429/.594 weighted, a 9-point OPS gain but with a larger loss of power than the typical good hitter showed.
While there are some mismatches between weighted performance and perception - Bobby Abreu was just behind Jeter, JD Drew and Adam Dunn were also pretty high, and Andruw Jones and Miguel Cabrera are fairly low on the list - as a general rule I think that performance weighted by LI matches perception of clutch value quite well. Whether this has any analytical significance remains to be seen, but I think it offers a starting point.
Posted: July 15, 2007 at 03:43 PM