Baseball for the Thinking Fan

Primate Studies
— Where BTF's Members Investigate the Grand Old Game

Sunday, July 15, 2007

Fun With Leverage: Is Perception Reality?

Almost four years ago, I introduced a concept called Leverage Index. Leverage is the swing in the possible change in win probability. If there is a game with one team leading by ten runs, the possible changes in win probability, whether the event is a home run or a double play, will be very close to negligible. That is, there won’t be much swing in any direction.

But, in a late and close game, the change in win probability among the various events will have rather wild swings. With a runner on first, two outs, down by one, and in the bottom of the ninth, the game can hinge on one swing of the bat—a home run and an out will both end the game, but with vastly different outcomes for the teams involved.

You can spot a high-leverage situation, I can spot them, and pretty much everyone can spot many high-leverage situations. All that’s left for us to do is to quantify every single game state into a number. That number is the Leverage Index.

—from Tango’s article at The Hardball Times.
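As a rough illustration of the quoted definition, here is a minimal Python sketch of one way to turn a game state into a Leverage Index number. It assumes the formulation in Tango's Hardball Times series as we understand it: take the probability-weighted average of the absolute change in win probability over every event that could happen in the state, then divide by that same quantity averaged over all plate appearances, so a typical PA comes out at 1.0. The function names, the two-event outcome lists, the WP changes and AVERAGE_SWING are all invented for illustration; real values would come from a win expectancy table.

```python
from typing import List, Tuple

# (probability of the event, resulting change in the batting team's win probability)
Outcome = Tuple[float, float]


def expected_swing(outcomes: List[Outcome]) -> float:
    """Probability-weighted mean of |change in WP| over every possible event."""
    total = sum(p for p, _ in outcomes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("event probabilities must sum to 1")
    return sum(p * abs(dwp) for p, dwp in outcomes)


def leverage_index(outcomes: List[Outcome], average_swing: float) -> float:
    """Scale a state's swing so that an average plate appearance comes out at 1.0."""
    return expected_swing(outcomes) / average_swing


# All numbers below are invented; each state is simplified to two events (out / reach).
AVERAGE_SWING = 0.035  # hypothetical mean swing across all plate appearances
blowout = [(0.70, -0.002), (0.30, 0.004)]       # ten-run game
late_and_close = [(0.70, -0.08), (0.30, 0.20)]  # tied in the ninth

print(f"blowout LI: {leverage_index(blowout, AVERAGE_SWING):.2f}")
print(f"late-and-close LI: {leverage_index(late_and_close, AVERAGE_SWING):.2f}")
```

With these made-up inputs the blowout state lands far below 1.0 (about 0.07) and the late-and-close state well above it (about 3.31), which is the ordering the quote describes.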

Baseball fans - and sportswriters - who are not oriented toward statistical analysis tend to have a fixation with the concept of “clutch hitting”. In spite of numerous studies over the years that show that clutch ability - if it exists at all - tends to be relatively small, fans still argue that so-and-so is truly a “clutch god” or a “choker”.

One issue that we’ve had in trying to evaluate clutch performance from an analytical standpoint is that it’s been difficult to come up with a consistent definition of “clutch situation” that doesn’t do one of two things:

1. aggregate too many “unlike things” together (e.g. performance with runners in scoring position, which equates runner on second/two outs with bases loaded/no outs even though the two have very different potential impacts on the game); or
2. reduce the sample size to a point where small variations have tremendous impact (e.g. performance with RISP in late/close situations, where many hitters may have no more than 20-30 appearances in a season).

What Leverage Index does is to place every plate appearance on a sliding scale based on potential game impact. As Tango notes in the quote I highlighted above, most people know clutch when they see it, even if they can’t necessarily define it. LI does an excellent job of accurately capturing the relative importance of game situations from the viewpoint of a typical fan.

Suppose we look at some randomly selected game situations (per Tango’s chart):

Leverage Index 1.0
Top 6, no outs, bases empty, home team trailing by 1
Bottom 6, 1 out, bases empty, score tied
Bottom 7, 2 out, bases empty, home team trailing by 1
Top 8, 2 out, bases empty, score tied
Bottom 8, 1 out, runner on 3rd, home team ahead by 1

Leverage Index 2.0
Top 6, 1 out, runner on 2nd, home team ahead by 1
Bottom 6, no outs, runner on 1st, score tied
Bottom 7, 1 out, runner on 1st, score tied
Top 8, 1 out, runners on 1st and 2nd, home team trailing by 1
Bottom 8, no outs, runner on 2nd, score tied

Leverage Index 3.0
Bottom 6, 1 out, runners on 1st and 3rd, home team trailing by 1
Bottom 7, 1 out, runners on 1st and 2nd, score tied
Top 8, 1 out, runners on 2nd and 3rd, score tied
Bottom 8, 1 out, runners on 2nd and 3rd, score tied
Bottom 9, 1 out, bases loaded, home team trailing by 4

While there may be some quibbles about the value assigned to specific situations (like the last one in the 3.0 group), I think that most people would agree that, in general, the relative game importance of the situations as a group mirrors the LI assigned to the group. Most fans would recognize the last group of situations as being more “clutch” than the next-to-last group, in my opinion, and the first group as containing the fewest clutch situations.

LI can be used as a weighting factor, to weight a player’s plate appearances by their relative game importance. Consider a .250 hitter who bats in the following game situations over 24 plate appearances:

12 PA with LI = 0.5
8 PA with LI = 1.0
4 PA with LI = 2.0

If we weight his PA by LI, the 12 PA in the lowest LI situation would be the equivalent of 6 “normal” PA, and the 4 PA in the highest LI situation would be the equivalent of 8 “normal” PA, giving him a weighted equivalent of 22 PA (6 + 8 + 8, since the 8 PA at LI 1.0 count at face value). Suppose that player goes 4-12 in the 0.5 LI situations, 2-8 in the 1.0 LI situations, and 0-4 in the 2.0 LI situations. If we weight that performance by LI, we get:

2-6 weighted by 0.5 LI
2-8 weighted by 1.0 LI
0-8 weighted by 2.0 LI

or 4-22, a weighted performance of .182. If, on the other hand, the player went 2-12 in the low-leverage situations, 2-8 in the middle, and 2-4 in the high-leverage situations, we’d now have:

1-6
2-8
4-8

or 7-22, a weighted performance of .318.
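The arithmetic in the two examples can be checked with a short Python sketch. It follows the text in treating every PA as an at-bat and letting each PA count as LI “normal” PAs in both the numerator and the denominator; Fraction keeps the 4/22 and 7/22 results exact. The function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple

# (hits, plate appearances, LI) for one leverage bucket; as in the example,
# every PA is treated as an at-bat (no walks).
Split = Tuple[int, int, float]


def li_weighted_average(splits: List[Split]) -> Fraction:
    """Each PA counts as LI 'normal' PAs, for both hits and trips to the plate."""
    hits = sum(h * Fraction(str(li)) for h, _, li in splits)
    trips = sum(n * Fraction(str(li)) for _, n, li in splits)
    return hits / trips


def unweighted_average(splits: List[Split]) -> Fraction:
    return Fraction(sum(h for h, _, _ in splits), sum(n for _, n, _ in splits))


def fmt(x: Fraction) -> str:
    """Baseball-style three-decimal rate, e.g. .182."""
    return f"{float(x):.3f}".lstrip("0")


first = [(4, 12, 0.5), (2, 8, 1.0), (0, 4, 2.0)]
second = [(2, 12, 0.5), (2, 8, 1.0), (2, 4, 2.0)]
for name, splits in (("first", first), ("second", second)):
    print(name, fmt(unweighted_average(splits)), fmt(li_weighted_average(splits)))
```

Both versions of the player hit .250 unweighted; the sketch prints .182 and .318 for the weighted figures, matching the text.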

One could, in this manner, develop weighted performance for each player, weighting his PA by the LI of each situation in which he appeared. If the player’s weighted performance was better than his actual performance, one could conclude that he produced more value in game-important situations (i.e. was more “clutch”); if the player’s weighted performance was worse than his actual performance, one could conclude that he produced less value in game-important situations (i.e. was more of a “choker”). The advantage of doing something like this is that every plate appearance for every player can be included in the study, and plate appearances are weighted in a more-or-less appropriate manner based on a consistent definition of the value of the PA.
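A per-PA version of the same idea, with each plate appearance recorded individually rather than bucketed, might look like the Python sketch below. The record layout (an LI and a 1/0 hit flag per PA), the function names and the wording of the labels are assumptions for illustration; the sign of weighted minus actual is what the paragraph above uses to separate “clutch” from “choker”.

```python
from typing import List, Tuple

# One record per plate appearance: (LI of the game state, 1 for a hit / 0 otherwise).
# Real data would come from play-by-play files; this layout is an assumption.
PlateAppearance = Tuple[float, int]


def actual_and_weighted(pas: List[PlateAppearance]) -> Tuple[float, float]:
    """Plain average, and the average with every PA counted LI times."""
    actual = sum(hit for _, hit in pas) / len(pas)
    weighted = sum(li * hit for li, hit in pas) / sum(li for li, _ in pas)
    return actual, weighted


def leverage_tilt(pas: List[PlateAppearance]) -> str:
    """Label a player by the sign of (weighted - actual)."""
    actual, weighted = actual_and_weighted(pas)
    if weighted > actual:
        return "more value in game-important spots"
    if weighted < actual:
        return "less value in game-important spots"
    return "no tilt"


# Rebuild the first example above (4-12, 2-8, 0-4), one PA at a time.
example = ([(0.5, 1)] * 4 + [(0.5, 0)] * 8
           + [(1.0, 1)] * 2 + [(1.0, 0)] * 6
           + [(2.0, 0)] * 4)
print(actual_and_weighted(example), leverage_tilt(example))
```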

Now, having said all of that, I don’t think that doing this the simple way actually has a lot of analytical value. There appears to be a small inverse relationship between weighted performance and average leverage - in other words, the higher the average leverage a player sees, the worse his weighted performance is likely to be. Since leverage opportunities are not evenly distributed (they depend on lineup position and team quality, at a minimum), it’s not entirely clear that the weighted performance is fair. That’s why this article is called “Fun with Leverage” - this shouldn’t be taken as a serious attempt to answer the clutch question but as more of a throwaway. But I decided to write this article anyway, because it’s eerie how well some of the results match the perception that many fans have of certain players - and the results may at least give some insight into why people have picked up those labels.

The play-by-play data from 2003-2006 that I used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet. The LIs were derived from Dave Studenmund’s Win Expectancy worksheet, which is available from the Baseball Graphs site. I didn’t make any sort of year-to-year park or run environment adjustments, in an effort to keep it (relatively) simple.

There were, from 2003-2006, 1029 players who were not pitching at the time and who batted in at least one game. Collectively, these 1029 players hit .270/.338/.433 overall. When their plate appearances were weighted by LI, the collective performance of those players was .271/.345/.431, a net gain of 5 points in OPS. This reflects a fairly typical tradeoff that occurs in high-leverage situations - pitchers are more willing to allow a walk, less inclined to allow an extra-base hit. It may also reflect the “protecting the lines” mentality that permeates baseball teams late in close games.
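The text doesn't spell out exactly how OBP and SLG were weighted. A minimal Python sketch, assuming the same rule as the batting-average example (every PA contributes LI times to both numerator and denominator), is below. The PA record, the function names and the toy data are hypothetical, and OBP is simplified to use all PAs as the denominator rather than AB+BB+HBP+SF.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PA:
    li: float          # Leverage Index of the game state
    is_ab: bool        # official at-bat (False for walks, HBP, sacrifices)
    on_base: bool      # reached on a hit, walk or HBP (errors ignored)
    total_bases: int   # 0 for outs and walks, 1-4 for hits


def slash_line(pas: List[PA], weighted: bool = True) -> Tuple[float, float, float]:
    """AVG/OBP/SLG with every PA counted LI times (or once if weighted=False).
    Simplification: the OBP denominator is all PAs."""
    w = [pa.li if weighted else 1.0 for pa in pas]
    ab = sum(wi for wi, pa in zip(w, pas) if pa.is_ab)
    hits = sum(wi for wi, pa in zip(w, pas) if pa.is_ab and pa.total_bases > 0)
    tb = sum(wi * pa.total_bases for wi, pa in zip(w, pas))
    on = sum(wi for wi, pa in zip(w, pas) if pa.on_base)
    return hits / ab, on / sum(w), tb / ab


def ops_change_points(pas: List[PA]) -> int:
    """Weighted OPS minus unweighted OPS, in points (thousandths)."""
    _, obp_u, slg_u = slash_line(pas, weighted=False)
    _, obp_w, slg_w = slash_line(pas, weighted=True)
    return round(((obp_w + slg_w) - (obp_u + slg_u)) * 1000)


toy = [
    PA(li=0.5, is_ab=True, on_base=True, total_bases=2),   # low-leverage double
    PA(li=0.5, is_ab=True, on_base=False, total_bases=0),  # low-leverage out
    PA(li=2.0, is_ab=False, on_base=True, total_bases=0),  # high-leverage walk
    PA(li=2.0, is_ab=True, on_base=False, total_bases=0),  # high-leverage out
]
print(slash_line(toy, weighted=False), slash_line(toy), ops_change_points(toy))
```

In the toy data the high-leverage trips are a walk and an out, so weighting leaves OBP at .500 while SLG drops, an exaggerated form of the walk-for-power tradeoff described above.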

From that set of 1029 players, I identified a smaller group of 153 players who had at least 250 plate appearances in each of the four seasons 2003-2006. I cast these players as “regulars” - players who got consistent playing time - and the smallest number of PAs any one of these players had was 1220 (Juan Castro). These players, collectively, hit .281/.351/.455 - they were a better group of players across the board. Their collective performance weighted by LI was .282/.358/.454 for a net gain of 6 points in OPS - basically the same pattern as shown by all players.

Finally, within the set of 152 regulars, I took the top 36 hitters, all of whom had an OPS of at least .850. These good hitters, collectively, hit .293/.383/.529 unweighted, and .296/.395/.531 weighted by LI - a gain of 14 points in OPS. I found it interesting that, even though they had a larger OBP increase than the other groups, the good hitters maintained their isolated power where the other groups lost some of theirs (although the numbers are small and not especially significant). There was virtually no difference in average LI among the three groups.
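The isolated-power comparison can be reproduced from the group totals quoted above, using the standard definition ISO = SLG - AVG. A short Python sketch (the function and variable names are our own):

```python
from typing import Dict, Tuple

Slash = Tuple[float, float, float]  # AVG, OBP, SLG


def isolated_power(slash: Slash) -> float:
    """ISO = SLG - AVG."""
    avg, _obp, slg = slash
    return slg - avg


def iso_change_points(unweighted: Slash, weighted: Slash) -> int:
    """Change in ISO from unweighted to LI-weighted, in points (thousandths)."""
    return round((isolated_power(weighted) - isolated_power(unweighted)) * 1000)


# Group totals quoted in the text: (unweighted, LI-weighted).
groups: Dict[str, Tuple[Slash, Slash]] = {
    "all players": ((.270, .338, .433), (.271, .345, .431)),
    "regulars": ((.281, .351, .455), (.282, .358, .454)),
    "good hitters": ((.293, .383, .529), (.296, .395, .531)),
}

for name, (unw, wtd) in groups.items():
    print(f"{name}: ISO {isolated_power(unw):.3f} -> {isolated_power(wtd):.3f} "
          f"({iso_change_points(unw, wtd):+d})")
```

It reports changes of -3, -2 and -1 points for all players, regulars and good hitters respectively, the pattern the paragraph describes, and also shows how small the differences are.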

These group totals set expectations for weighted performance, in my opinion. We would expect modest, OBP-heavy gains in OPS from the typical hitter when his performance is weighted by LI. A really good high-leverage performer would see larger gains; a poor one would see smaller gains, or a decline.

Looking at the group of good hitters, we have:

Top 5, weighted OPS - actual OPS:

Carlos Delgado, .285/.391/.566 unweighted, .310/.416/.618 weighted, 77 point gain
Carlos Beltran, .278/.368/.517 unweighted, .295/.388/.550 weighted, 53 point gain
Albert Pujols, .338/.429/.650 unweighted, .345/.443/.688 weighted, 52 point gain
David Ortiz, .294/.391/.609 unweighted, .318/.412/.638 weighted, 50 point gain
Derek Jeter, .316/.387/.464 unweighted, .331/.410/.482 weighted, 41 point gain

Bottom 5, weighted OPS - actual OPS:

Travis Hafner, .299/.404/.590 unweighted, .289/.399/.563 weighted, 32 point loss
Javy Lopez, .298/.347/.518 unweighted, .283/.350/.486 weighted, 29 point loss
Carlos Guillen, .310/.379/.483 unweighted, .301/.382/.456 weighted, 24 point loss
Miguel Tejada, .306/.356/.505 unweighted, .296/.351/.489 weighted, 21 point loss
Carlos Lee, .290/.344/.513 unweighted, .284/.344/.492 weighted, 21 point loss
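The point changes in the two lists follow directly from OBP + SLG. A small Python sketch that recomputes them from the quoted slash lines and re-sorts the ten players (the function and variable names are ours):

```python
from typing import Dict, Tuple

Slash = Tuple[float, float, float]  # AVG, OBP, SLG


def ops_change_points(unweighted: Slash, weighted: Slash) -> int:
    """Weighted OPS minus actual OPS, in points (thousandths)."""
    return round(((weighted[1] + weighted[2]) - (unweighted[1] + unweighted[2])) * 1000)


# (unweighted, LI-weighted) slash lines as quoted above, 2003-2006.
players: Dict[str, Tuple[Slash, Slash]] = {
    "Carlos Delgado": ((.285, .391, .566), (.310, .416, .618)),
    "Carlos Beltran": ((.278, .368, .517), (.295, .388, .550)),
    "Albert Pujols": ((.338, .429, .650), (.345, .443, .688)),
    "David Ortiz": ((.294, .391, .609), (.318, .412, .638)),
    "Derek Jeter": ((.316, .387, .464), (.331, .410, .482)),
    "Travis Hafner": ((.299, .404, .590), (.289, .399, .563)),
    "Javy Lopez": ((.298, .347, .518), (.283, .350, .486)),
    "Carlos Guillen": ((.310, .379, .483), (.301, .382, .456)),
    "Miguel Tejada": ((.306, .356, .505), (.296, .351, .489)),
    "Carlos Lee": ((.290, .344, .513), (.284, .344, .492)),
}

ranked = sorted(players, key=lambda name: ops_change_points(*players[name]), reverse=True)
for name in ranked:
    print(f"{name:15s} {ops_change_points(*players[name]):+d}")
```

Delgado comes out at +77 and Hafner at -32, matching the lists; Tejada and Lee tie at -21.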

The top 5 have been well publicized for their “clutchiness”. The bottom 5 aren’t particularly well known as “chokers” - with the possible exception of Tejada - but Alfonso Soriano, who was sixth from the bottom, does have something of an “unclutch” reputation.

ARod, FWIW, hit .299/.396/.562 overall, but had a weighted performance of .297/.403/.557, for a 2-point OPS gain. This placed him 24th among the 36 good hitters, which, especially in comparison to Jeter, probably explains a lot of the perception of ARod as a player who doesn’t produce when it counts. Manny Ramirez, who also has a bit of an “unclutch” reputation, hit .311/.412/.602 overall and .312/.429/.594 weighted, a 9-point OPS gain but with a larger loss of power than the typical good hitter showed.

While there are some mismatches between weighted performance and perception - Bobby Abreu was just behind Jeter, JD Drew and Adam Dunn were also pretty high, and Andruw Jones and Miguel Cabrera are fairly low on the list - as a general rule I think that performance weighted by LI matches perception of clutch value quite well. Whether this has any analytical significance remains to be seen, but I think it offers a starting point.

Mike Emeigh Posted: July 15, 2007 at 04:43 PM | 13 comment(s)

1. GGC Posted: July 16, 2007 at 12:58 AM (#2442272)
Interesting stuff, Mike. I'd be interested in seeing someone come up with a Leverage Index that places every game on a sliding scale based on potential pennant impact. I think that some of the flak that Rodriguez gets is stuff like the Mayrod label. Also, I think that Justin Morneau won the AL MVP last year because it was perceived that he hit in the games that counted.
2. Isabel Posted: July 16, 2007 at 02:31 AM (#2442314)
I'd be interested in seeing someone come up with a Leverage Index that places every game on a sliding scale based on potential pennant impact.

A good way to do that would be to look at the Baseball Prospectus postseason odds. To calculate an LI-like statistic for a game between, say, the Phillies and the Qankees(*) you'd compare the probability of the Phillies making the playoffs at the beginning of the day, the probability of the Phillies making the playoffs if they win the game, and the probability of the Phillies making the playoffs if the Qankees win the game.

However, this has a problem: it's possible that a game could be important to one of the teams involved (because they're a contending team but just barely -- the actual Phillies in the last couple seasons are a good example) but not to the other. You don't have this problem with LI, and it's not immediately clear how to fix it; perhaps you'd have to calculate separate statistics for both teams? Also, the postseason odds report seems hard to calculate; I'm wondering if there's a quicker way to calculate approximate playoff odds than actually simulating the entire schedule.

(*) Yes, the Qankees. I generally call the teams in my examples the Phillies and the Qankees, because I'm a Phils fan, and q is the letter after p so clearly the name of the other team has to start with a Q; Qankees is more fun to say than Qets or Qaves (Qraves?), and I went to college in Boston so I kind of got the hating-the-Yankees thing drilled into me there.
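Isabel's proposal can be sketched in a few lines of Python. It computes a separate number for each team, which is one way around the problem she raises about a game mattering to only one side, and scales it against an average swing the way LI scales against an average PA. GameOdds, the playoff probabilities and AVERAGE_SWING are invented for illustration; in practice the probabilities would come from something like the postseason odds report she mentions.

```python
from dataclasses import dataclass


@dataclass
class GameOdds:
    team: str
    p_playoffs_if_win: float   # playoff probability if this team wins the game
    p_playoffs_if_loss: float  # playoff probability if this team loses the game


def pennant_swing(odds: GameOdds) -> float:
    """Change in playoff probability riding on this one game, for this team."""
    return odds.p_playoffs_if_win - odds.p_playoffs_if_loss


def pennant_leverage(odds: GameOdds, average_swing: float) -> float:
    """LI-style scaling: an average game for an average team would score 1.0."""
    return pennant_swing(odds) / average_swing


# Invented numbers: a fringe contender against a near-lock.
phillies = GameOdds("Phillies", 0.34, 0.30)
qankees = GameOdds("Qankees", 0.91, 0.90)
AVERAGE_SWING = 0.02  # hypothetical league-wide mean swing per game

for team in (phillies, qankees):
    print(team.team, round(pennant_leverage(team, AVERAGE_SWING), 2))
```

With these made-up odds the same game rates 2.0 for the fringe contender and 0.5 for the near-lock.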
3. Gold Star - just Gold Star Posted: July 16, 2007 at 04:07 AM (#2442353)
I think that some of the flak that Rodriguez gets is stuff like the Mayrod label. Also, I think that Justin Morneau won the AL MVP last year because it was perceived that he hit in the games that counted.
Think Chipper Jones in 1999. After the ASG, hit 328/464/693/1.157, y'know.
4. Delino DeShields & Yarnell Posted: July 16, 2007 at 04:03 PM (#2442548)
I have a dumb question and am too lazy to go through the original work...
What is the 'favorable' state assumed in LI? A base hit with each runner advancing one base? Some variation where a runner scores from second with two outs?

If just one base, might LI overvalue a hitter who is in fact a bit more likely to produce extra bases with the leveraged AB? Just offhand, note that the unadjusted SLG of the overachievers above is 27 pts higher than the underachievers.

Agreed, though, the operative word is 'fun'.
5. DSG Posted: July 16, 2007 at 04:20 PM (#2442559)
What is the 'favorable' state assumed in LI? A base hit with each runner advancing one base? Some variation where a runner scores from second with two outs?

***

No, the beauty of LI is that it takes into account ALL possibilities of what could happen in determining the leverage of the situation, instead of seeing what would happen given one particular event, like Woolner's Leverage or Drinen's "P". I would suggest reading all of Tango's articles on LI:

http://www.hardballtimes.com/main/article/crucial-situations/
http://www.hardballtimes.com/main/article/crucial-situations-part-2/
http://www.hardballtimes.com/main/article/crucial-situations-part-three/
6. Delino DeShields & Yarnell Posted: July 16, 2007 at 04:35 PM (#2442577)
DSG-
Thanks. I remember reading the THT stuff originally but probably just forgot how amazing the work product really was.

So a player with higher ISOp, say, in those situations will get more credit but with good reason - he's favorably affecting outcomes better than the weighted average of outcomes, not better than some fixed, deterministic outcome.
7.  Posted: July 16, 2007 at 06:44 PM (#2442699)
No, the beauty of LI is that it takes into account ALL possibilities of what could happen in determining the leverage of the situation

There are multiple ways to calculate it, as Tango's series notes, but the definition is:

Leverage is the swing in the possible change in win probability.

so you are looking at endpoints - e.g. with bases loaded and no outs, the swing in WP goes from hitting a grand slam to hitting into a triple play.

So a player with higher ISOp, say, in those situations will get more credit but with good reason - he's favorably affecting outcomes better than the weighted average of outcomes, not better than some fixed, deterministic outcome.

Well, the player will get more credit for being a clutch performer in the minds of the fans. I don't know that the player actually deserves the level of credit that weighting performance by LI gives him - it can be argued that players who perform well in lower-leverage situations early in the game actually help their teams more by reducing the frequency of higher-leverage situations where teams can see wild swings in their ability to win or lose later in the game. I do think there's some value in looking at the shape of performance based on leverage, but I look at this as more of an adjustment to the base evaluation rather than as an overall evaluation system.

-- MWE
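Comments 5 and 7 describe LI in two different ways: as a weighting over every possible outcome, and as the span between the extreme outcomes. The Python sketch below computes both measures for one invented bases-loaded, no-out distribution, so the difference is easy to see. The outcome probabilities and WP changes are made up, and the function names are hypothetical.

```python
from typing import List, Tuple

Outcome = Tuple[float, float]  # (probability, change in win probability)


def endpoint_range(outcomes: List[Outcome]) -> float:
    """Best-case minus worst-case WP change, ignoring how likely each is."""
    changes = [dwp for _, dwp in outcomes]
    return max(changes) - min(changes)


def expected_abs_swing(outcomes: List[Outcome]) -> float:
    """Probability-weighted mean absolute WP change over every outcome."""
    return sum(p * abs(dwp) for p, dwp in outcomes)


# Invented distribution for bases loaded, no outs, late and close.
bases_loaded_no_outs = [
    (0.03, +0.30),  # grand slam
    (0.27, +0.10),  # other hits and walks
    (0.50, -0.05),  # ordinary outs
    (0.19, -0.15),  # double plays
    (0.01, -0.25),  # triple play
]

print("endpoint range:", round(endpoint_range(bases_loaded_no_outs), 3))
print("expected |swing|:", round(expected_abs_swing(bases_loaded_no_outs), 3))
```

The endpoint range here is 0.55, set largely by a triple play that happens 1% of the time; the probability-weighted swing is 0.092 and barely registers that event. Scaling either number to an average of 1.0 would give an LI-style index, but the two can rank game states differently.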
8. Delino DeShields & Yarnell Posted: July 16, 2007 at 07:36 PM (#2442741)
But the 'endpoints' are just two of the (more unlikely) plus/minus outcomes being weighted, yes? That is, with bases loaded, no out, an LI of 3.0+ may have your extreme scenario(s) in there but there's also the double that clears the bases or the double play that scores one run with the 'next state' at two outs.

By 'more credit' I meant that, given that it is a high LI spot, to the extent he has a better weighted distribution of outcomes than the distribution used to get LI (e.g. always hits a double when he does hit), then that is legit credit.

I sort of thought the opposite - this IS the way to see if a guy really does contribute five nickels early in a game for every quarter another player might.
9. Los Angeles Waterloo of Black Hawk Posted: July 18, 2007 at 12:54 AM (#2444305)
The bottom 5 aren’t particularly well-known as “chokers” - with the possible exception of Tejada

Well, Miggy won an MVP on the basis of a handful of high-leverage successes, so I doubt he has a rep as a choker; I've never heard him assailed as such.
10.  Posted: July 18, 2007 at 01:30 PM (#2444888)
Miggy won an MVP on the basis of a handful of high-leverage successes, so I doubt he has a rep as a choker; I've never heard him assailed as such.

Since he moved to Baltimore, there have been a handful of rumblings about his lack of "clutch" performance, and of course his postseason baserunning booboo with the A's is still remembered not-so-fondly. That's why I said "possibly".

I was surprised by how well the numbers, particularly at the top of the lists, supported the popular perceptions of players.

-- MWE
11. More Indecisive than Lonnie Smith on 2nd... Posted: July 20, 2007 at 05:54 PM (#2448226)
I know you only ran 2003-2006, Mike (and thank you for all the work and interesting thought process), but it would be interesting to see how a great season (A-Rod) and a personally average season (Ortiz) might impact the rankings. I doubt one season will change the perceptions of players established over a 4+ year span, but given A-Rod's terrific 9th inning success thus far, I'm wondering how far up the rankings he would move. Similarly, it appears that Jeter has been decidedly "unclutch" this season (from a purely observational perspective), and Ortiz has been battling injury/slip in performance, so he might drop out of the top 5.

I guess the shorthand question is: how volatile are these rankings, given the expected yearly variation + the small sample sizes?
12.  Posted: July 20, 2007 at 06:10 PM (#2448253)
I guess the shorthand question is: how volatile are these rankings, given the expected yearly variation + the small sample sizes?

Hard to say. I would expect them to be pretty volatile, inasmuch as players get only about 60-70 high-leverage PAs (I call any PA with LI of 2.0 and above "high-leverage") per season. ARod, for example, had 262 PAs in high-leverage situations from 2003-2006. Ortiz had 271, Jeter just 226 (which is pretty low for the group of good hitters, but not unusual for a top-of-the-order guy).

-- MWE
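To put rough numbers on that volatility, here is a Python sketch that counts high-leverage PAs using the 2.0 cutoff from the comment and computes a binomial standard error for an on-base-style rate. The .350 true rate is an illustrative assumption, and treating PAs as independent trials with a fixed rate is a simplification; 65 stands in for "about 60-70" per season and 262 is ARod's four-year total quoted above. The function names are hypothetical.

```python
import math
from typing import Iterable

HIGH_LEVERAGE = 2.0  # "any PA with LI of 2.0 and above", per the comment above


def high_leverage_count(lis: Iterable[float]) -> int:
    """Number of PAs at or above the high-leverage cutoff."""
    return sum(1 for li in lis if li >= HIGH_LEVERAGE)


def rate_standard_error(rate: float, n: int) -> float:
    """Binomial standard error of a rate over n PAs.
    Assumes independent PAs with a fixed true rate (a simplification)."""
    return math.sqrt(rate * (1.0 - rate) / n)


ASSUMED_OBP = 0.350  # illustrative true rate, not taken from the data
for n in (65, 262):  # about one season; ARod's 2003-2006 high-leverage total
    print(n, round(rate_standard_error(ASSUMED_OBP, n), 3))
```

Under those assumptions one season's worth of high-leverage PAs carries a standard error of about .059 in OBP, shrinking to about .029 over the four seasons, which suggests the rankings could move a good deal from year to year.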
13. More Indecisive than Lonnie Smith on 2nd... Posted: July 20, 2007 at 11:05 PM (#2448663)
Thanks for the reply, Mike. Do you think you might recalculate these after 2007 to get an idea of volatility? I know you've done a lot of work on this already, so "no" is a good answer as well...just curious if it will be something to look out for.

On a side note, did the proportion of 3.0:2.0 LI situations have any predictive value? Given the increased weight and the (presumably) lower frequency and small sample size, I could see someone's (relatively) poor performance in LI 3.0 situations skewing their overall pretty considerably, if they faced an inordinately high # of PAs in those situations. That said, the likelihood of that affecting the rankings of more than 2-3 guys is, I would assume, pretty low.
