Primate Studies - Where BTF's Members Investigate the Grand Old Game
Sunday, July 15, 2007
Fun With Leverage: Is Perception Reality?
-- from Tango’s article at The Hardball Times.

Baseball fans - and sportswriters - who are not oriented toward statistical analysis tend to have a fixation with the concept of “clutch hitting”. In spite of numerous studies over the years that show that clutch ability - if it exists at all - tends to be relatively small, fans still argue that so-and-so is truly a “clutch god” or a “choker”.

One issue that we’ve had in trying to evaluate clutch performance from an analytical standpoint is that it’s been difficult to come up with a consistent definition of “clutch situation” that doesn’t do one of two things:
1. aggregate too many “unlike things” together (e.g. performance with runners in scoring position, which equates runner on second/two outs with bases loaded/no outs even though there is a very different potential impact on the game situation);
What Leverage Index does is to place every plate appearance on a sliding scale based on potential game impact. As Tango notes in the quote I highlighted above, most people know clutch when they see it, even if they can’t necessarily define it. LI does an excellent job of accurately capturing the relative importance of game situations from the viewpoint of a typical fan. Suppose we look at some randomly selected game situations (per Tango’s chart):
Leverage Index 1.0
Leverage Index 2.0
Leverage Index 3.0
While there may be some quibbles about the value assigned to specific situations (like the last one in the 3.0 group), I think that most people would agree that, in general, the relative game importance of the situations as a group mirrors the LI assigned to the group. Most fans would recognize the last group of situations as being more “clutch” than the next-to-last group, in my opinion, and the first group as containing the fewest clutch situations.

LI can be used as a weighting factor, to weight a player’s plate appearances by their relative game importance. Consider a .250 hitter who bats in the following game situations over 24 plate appearances:
12 PA with LI = 0.5
8 PA with LI = 1.0
4 PA with LI = 2.0
If we weight his PA by LI, the 12 PA in the lowest LI situation would be the equivalent of 6 “normal” PA, and the 4 PA in the highest LI situation would be the equivalent of 8 “normal” PA, giving him a weighted equivalent of 22 PA. Suppose that player goes 4-12 in the 0.5 LI situations, 2-8 in the 1.0 LI situations, and 0-4 in the 2.0 LI situations. If we weight that performance by LI, we get:
2-6 weighted by 0.5 LI
2-8 weighted by 1.0 LI
0-8 weighted by 2.0 LI
or 4-22, a weighted performance of .182. If on the other hand, the player went 2-12 in the low leverage situations, 2-8 in the middle, and 2-4 in the high leverage situations, we’d now have
1-6
2-8
4-8
or 7-22, a weighted performance of .318.

One could, in this manner, develop weighted performance for each player, weighting his PA by the LI of each situation in which he appeared. If the player’s weighted performance was better than his actual performance, one could conclude that he produced more value in game-important situations (i.e. was more “clutch”); if the player’s weighted performance was worse than his actual performance, one could conclude that he produced less value in game-important situations (i.e. was more of a “choker”). The advantage of doing something like this is that every plate appearance for every player can be included in the study, and plate appearances are weighted in a more-or-less appropriate manner based on a consistent definition of the value of the PA.

Now, having said all of that, I don’t think that doing this the simple way actually has a lot of analytical value. There appears to be a small inverse relationship between weighted performance and average leverage - IOW, the higher the average leverage a player sees, the worse his weighted performance is likely to be. Since leverage opportunities are not evenly distributed (they depend on lineup position and team quality, at a minimum), it’s not entirely clear that the weighted performance is fair. That’s why this article is called “Fun with Leverage” - this shouldn’t be taken as a serious attempt to answer the clutch question but as more of a throwaway. But I decided to write this article anyway, because it’s eerie how well some of the results match the perception that many fans have of certain players - and may at least give some insight into why people have picked those labels up.

The play-by-play data from 2003-2006 that I used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet. The LIs were derived from Dave Studenmund’s Win Expectancy worksheet, which is available from the Baseball Graphs site.
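For anyone who wants to play along at home, the weighting arithmetic from the 24-PA example can be sketched in a few lines of Python. This is only an illustration of the simple method described above, not the code used for the study; the function name and the (hits, PA, LI) tuple layout are made up for the example, and like the example it treats every PA as an at-bat.

```python
def li_weighted_average(splits):
    """Weight hits and PA by Leverage Index.

    splits is a list of (hits, plate_appearances, leverage_index) tuples.
    This layout is hypothetical, chosen only for this illustration.
    Returns (weighted_hits, weighted_pa, weighted_average).
    """
    weighted_hits = sum(hits * li for hits, pa, li in splits)
    weighted_pa = sum(pa * li for hits, pa, li in splits)
    return weighted_hits, weighted_pa, weighted_hits / weighted_pa


# First scenario: 4-12 at LI 0.5, 2-8 at LI 1.0, 0-4 at LI 2.0
cold_when_it_counts = [(4, 12, 0.5), (2, 8, 1.0), (0, 4, 2.0)]

# Second scenario: 2-12 at LI 0.5, 2-8 at LI 1.0, 2-4 at LI 2.0
hot_when_it_counts = [(2, 12, 0.5), (2, 8, 1.0), (2, 4, 2.0)]

for label, splits in [("cold", cold_when_it_counts), ("hot", hot_when_it_counts)]:
    hits, pa, avg = li_weighted_average(splits)
    # Both scenarios are 6-24 (.250) unweighted; only the weighting differs.
    print(f"{label}: {hits:g}-{pa:g}, weighted average {avg:.3f}")
```

Both hitters are 6-24 unweighted, so any difference in the weighted line comes entirely from when the hits happened.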
I didn’t make any sort of year-to-year park or run environment adjustments, in an effort to keep it (relatively) simple.

There were, from 2003-2006, 1029 players who were not pitching at the time and who batted in at least one game. Collectively, these 1029 players hit .270/.338/.433 overall. When their plate appearances were weighted by LI, the collective performance of those players was .271/.345/.431, a net gain of 5 points in OPS. This reflects a fairly typical tradeoff that occurs in high-leverage situations - pitchers are more willing to allow a walk, less inclined to allow an extra-base hit. It may also reflect the “protecting the lines” mentality that permeates baseball teams late in close games.

From that set of 1029 players, I identified a smaller group of 153 players who had at least 250 plate appearances in each of the four seasons 2003-2006. These players I cast as “regulars” - players who got consistent playing time - and the smallest number of PAs any one of these players had was 1220 (Juan Castro). These players, collectively, hit .281/.351/.455 - they were a better group of players across the board. Their collective performance weighted by LI was .282/.358/.454 for a net gain of 6 points in OPS - basically the same pattern as shown by all players.

Finally, within the set of 152 regulars, I took the top 36 hitters, all of whom had OPS of at least .850. These good hitters, collectively, hit .293/.383/.529 unweighted, and .296/.395/.531 weighted by LI - a gain of 14 points in OPS. I found it interesting that, even though they had a larger OBP increase than the other groups, the good hitters maintained their isolated power where the other group lost some of theirs (although the numbers are small and not especially significant). There was virtually no difference in average LI among the three groups. These group totals set expectations for weighted performance, in my opinion.
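The OPS gains and isolated-power changes quoted for the three groups can be checked straight from the slash lines. A minimal sketch, assuming OPS is simply OBP + SLG and isolated power is SLG - AVG; the helper names and group labels are invented for the illustration, and everything is kept in integer "points" (thousandths) to avoid rounding noise:

```python
def parse_slash(line):
    """Turn an 'AVG/OBP/SLG' string into a tuple of integer points."""
    return tuple(round(float(part) * 1000) for part in line.split("/"))


def ops_gain(unweighted, weighted):
    """Weighted OPS minus actual OPS, in points."""
    _, obp, slg = parse_slash(unweighted)
    _, w_obp, w_slg = parse_slash(weighted)
    return (w_obp + w_slg) - (obp + slg)


def iso_change(unweighted, weighted):
    """Weighted isolated power minus actual isolated power, in points."""
    avg, _, slg = parse_slash(unweighted)
    w_avg, _, w_slg = parse_slash(weighted)
    return (w_slg - w_avg) - (slg - avg)


# Group slash lines as quoted in the article: (unweighted, weighted by LI)
groups = {
    "all players": (".270/.338/.433", ".271/.345/.431"),
    "regulars": (".281/.351/.455", ".282/.358/.454"),
    "good hitters": (".293/.383/.529", ".296/.395/.531"),
}

for name, (actual, weighted) in groups.items():
    print(f"{name}: OPS {ops_gain(actual, weighted):+d}, "
          f"ISO {iso_change(actual, weighted):+d}")
```

The same two helpers work on any individual player's pair of slash lines.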
We would expect modest, OBP-heavy gains in OPS from the typical hitter when his performance is weighted by LI. A really good high-leverage performer would see larger gains; a poor one would see smaller gains, or a decline. Looking at the group of good hitters, we have:

Top 5, weighted OPS - actual OPS:
Carlos Delgado, .285/.391/.566 unweighted, .310/.416/.618 weighted, 77 point gain
Bottom 5, weighted OPS - actual OPS:
Travis Hafner, .299/.404/.590 unweighted, .289/.399/.563 weighted, 32 point loss
The top five have been well-publicized for their “clutchiness”. The bottom 5 aren’t particularly well-known as “chokers” - with the possible exception of Tejada - but Alfonso Soriano, who was sixth from the bottom, does have something of an “unclutch” reputation.

ARod, FWIW, hit .299/.396/.562 overall, but had a weighted performance of .297/.403/.557, for a 2-point OPS gain. This placed him 24th among the 36 good hitters, which, especially in comparison to Jeter, probably explains a lot of the perception of ARod as a player who doesn’t produce when it counts. Manny Ramirez, who also has a bit of an “unclutch” reputation, hit .311/.412/.602 overall and .312/.429/.594 weighted, a 9-point OPS gain but with a larger loss of power than the typical good hitter showed.

While there are some mismatches between weighted performance and perception - Bobby Abreu was just behind Jeter, JD Drew and Adam Dunn were also pretty high, and Andruw Jones and Miguel Cabrera are fairly low on the list - as a general rule I think that performance weighted by LI matches perception of clutch value quite well. Whether this has any analytical significance remains to be seen, but I think it offers a starting point.
About Baseball Think Factory | Write for Us | Copyright © 1996-2007 Baseball Think Factory
Reader Comments and Retorts
Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.
A good way to do that would be to look at the baseball prospectus postseason odds. To calculate a LI-like statistic for a game between, say, the Phillies and the Qankees(*) you'd compare the probability of the Phillies making the playoffs at the beginning of the day, the probability of the Phillies making the playoffs if they win the game, and the probability of the Phillies making the playoffs if the Qankees win the game.
However, this has a problem: it's possible that a game could be important to one of the teams involved (because they're a contending team but just barely -- the actual Phillies in the last couple seasons are a good example) but not to the other. You don't have this problem with LI, and it's not immediately clear how to fix it; perhaps you'd have to calculate separate statistics for both teams? Also, the postseason odds report seems hard to calculate; I'm wondering if there's a quicker way to calculate approximate playoff odds than actually simulating the entire schedule.
(*) Yes, the Qankees. I generally call the teams in my examples the Phillies and the Qankees, because I'm a Phils fan, and q is the letter after p so clearly the name of the other team has to start with a Q; Qankees is more fun to say than Qets or Qaves (Qraves?), and I went to college in Boston so I kind of got the hating-the-Yankees thing drilled into me there.
Think Chipper Jones in 1999. After the ASG, hit 328/464/693/1.157, y'know.
What is the 'favorable' state assumed in LI? A base hit with each runner advancing one base? Some variation where a runner scores from second with two outs?
If just one base, might LI overvalue a hitter who is in fact a bit more likely to produce extra bases with the leveraged AB? Just offhand, note that the unadjusted SLG of the overachievers above is 27 pts higher than the underachievers.
Agreed, though, the operative word is 'fun'.
***
No, the beauty of LI is that it takes into account ALL possibilities of what could happen in determining the leverage of the situation, instead of seeing what would happen given one particular event, like Woolner's Leverage or Drinen's "P". I would suggest reading all of Tango's articles on LI:
http://www.hardballtimes.com/main/article/crucial-situations/
http://www.hardballtimes.com/main/article/crucial-situations-part-2/
http://www.hardballtimes.com/main/article/crucial-situations-part-three/
Thanks. I remember reading the THT stuff originally but probably just forgot how amazing the work product really was.
So a player with higher ISOp, say, in those situations will get more credit but with good reason - he's favorably affecting outcomes better than the weighted average of outcomes, not better than some fixed, deterministic outcome.
There are multiple ways to calculate it, as Tango's series notes, but the definition is:
so you are looking at endpoints - e.g. with bases loaded and no outs, the swing in WP goes from hitting a grand slam to hitting into a triple play.
Well, the player will get more credit for being a clutch performer in the minds of the fans. I don't know that the player actually deserves the level of credit that weighting performance by LI gives him - it can be argued that players who perform well in lower-leverage situations early in the game actually help their teams more by reducing the frequency of higher-leverage situations where teams can see wild swings in their ability to win or lose later in the game. I do think there's some value in looking at the shape of performance based on leverage, but I look at this as more of an adjustment to the base evaluation rather than as an overall evaluation system.
-- MWE
By 'more credit' I meant that, given that it is a high LI spot, to the extent he has a better weighted distribution of outcomes than the distribution used to get LI (e.g. always hits a double when he does hit), then that is legit credit.
I sort of thought the opposite - this IS the way to see if a guy really does contribute five nickels early in a game for every quarter another player might.
Well, Miggy won an MVP on the basis of a handful of high-leverage successes, so I doubt he has a rep as a choker; I've never heard him assailed as such.
Since he moved to Baltimore, there have been a handful of rumblings about his lack of "clutch" performance, and of course his postseason baserunning booboo with the A's is still remembered not-so-fondly. That's why I said "possibly".
I was surprised by how well the numbers, particularly at the top of the lists, supported the popular perceptions of players.
-- MWE
I guess the shorthand question is: how volatile are these rankings, given the expected yearly variation + the small sample sizes?
Hard to say. I would expect them to be pretty volatile, inasmuch as players get only about 60-70 high-leverage PAs (I call any PA with LI of 2.0 and above "high-leverage") per season. ARod, for example, had 262 PAs in high-leverage situations from 2003-2006. Ortiz had 271, Jeter just 226 (which is pretty low for the group of good hitters, but not unusual for a top-of-the-order guy).
-- MWE
On a side note, did the proportion of 3.0:2.0 LI situations have any predictive value? Given the increased weight and the (presumably) lower frequency and small sample size, I could see someone's (relatively) poor performance in LI 3.0 situations skewing their overall pretty considerably, if they faced an inordinately high # of PAs in those situations. That said, the likelihood of that affecting the rankings of more than 2-3 guys is, I would assume, pretty low.