Baseball Primer Newsblog— The Best News Links from the Baseball Newsstand
Friday, October 09, 2009
What you can’t see won’t hurt you… it’ll clutch you!
Let’s begin with the null hypothesis that player performance in clutch situations is identical to performance in non-clutch situations. A type I error occurs when we reject a correct null hypothesis. Studies of clutch hitting find that performance differences in these situations are small and often not statistically meaningful. The null stands and clutch-hitting skill is seen as a myth. A type II error occurs when we fail to reject an incorrect null hypothesis. When James advocates agnosticism toward clutch hitting as a skill, it is because, despite the studies showing little evidence of clutch hitting, he wants to avoid committing a type II error. The problem is, this choice between type I and type II errors isn’t free. By raising the decision criterion to avoid type II error, you necessarily increase the chance of committing type I error.
Identifying clutch hitting is a practical problem that requires a decision involving real costs. Should a team factor in clutch ability when choosing between free agents? Should it matter for the manager choosing among pinch hitters? Should a historically big-game pitcher start the playoff series over your regular-season ace? Based on the available evidence, if I had to decide between Jeter or A-Rod it’s not even close: Alex Rodriguez is a far superior player to Derek Jeter, and that’s what is relevant. And in cases where the players’ performances are more similar, I wouldn’t consider clutch performance for even a moment. If clutch ability exists, it would show up in bunches using the empirical methods already employed by researchers seeking to study the question.
In my view, the fog is a distraction: something to bring up to keep the argument going. But arguing takes time, which is valuable. Let’s stop it with the fog, already. Of course it’s possible that something exists that just hasn’t been discovered yet (e.g., the Loch Ness Monster, Sasquatch, ergogenic effects of HGH); but the evidence we have says these things don’t exist, and hanging hopes on the possible isn’t a very persuasive argument.
Repoz
Posted: October 09, 2009 at 12:29 PM | 24 comment(s)
Tags:
sabermetrics
Reader Comments and Retorts
Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.
1. AROM Posted: October 09, 2009 at 02:08 PM (#3346183)So I wouldn't put much weight on it for signing free agents or choosing a pinch hitter, and if you choose to ignore it you won't hurt yourself in baseball decision making. But as an academic exercise it's wrong to say it doesn't exist.
I think that's the whole point. If you're looking to advise a team, you probably shouldn't concern yourself with it. But if you're just studying baseball for the sake of it, well, that's a different story.
For the purposes of discussion, let's do this:
"Clutch situation": any plate appearance with LI of 2.0 or higher
Method: look at performance of all hitters other than pitchers in plate appearances where LI is >=2.0, and compare it to the performance of all hitters in plate appearances where LI is <2.0.
When I look at this from 2004-2008, what I see is this:
In high-leverage situations, hitters hit .272/.351/.424, with .300 in-play BA and .380 in-play SLG. When you exclude intentional walks, the OBP drops to .333. Excluding IBB, hitters in high-LI situations hit 1 HR every 40.2 PA, drew an unintentional walk once every 11.9 PA, and struck out once every 5.7 PA.
In all other situations, hitters hit .270/.337/.432, with .300 in-play BA and .385 in-play SLG. When you exclude IBB, the OBP drops to .334. Excluding IBB, hitters managed one HR every 34.8 PA, an unintentional walk once every 12.6 PA, and struck out once every 6.1 PA.
FWIW: The main reason that the BA is higher in high-leverage situations is that sacrifice flies occur more often in those situations. If I treat sacrifice flies as at-bats, the BA in high-leverage situations would be .260 vs .269 in all other situations. In high-LI situations, excluding IBB, players get a hit once every 4.4 PA; in all other situations, once every 4.1 PA.
What seems to be reasonably clear to me from all of this is that player performance in clutch situations is NOT identical to player performance in non-clutch situations (by the definition I used for "clutch", anyway - making no claim that I have the best definition). Players get fewer hits - primarily because they hit fewer HR, since the in-play BA is essentially identical - hit for less power (both on HR and on BIP), walk more (both intentionally and unintentionally), and strike out more. A player who maintains his performance level across high-LI situations, especially his power production, is therefore very likely doing WELL in those situations, and we ought to approach our testing using group expectation for those situations rather than the "no change" assertion. Note that we might still very well find no significant variation among players from the group norm - but by using the group performance norm in those situations as the starting point we have a better chance of finding them, IMO.
-- MWE
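For anyone who wants to see those "once every N PA" figures as relative changes, here is a minimal sketch. It uses only the numbers quoted in the comment above (2004-2008, IBB excluded); the dictionary layout and function name are just for illustration.

```python
# PA per event as quoted above: (high-LI, all other situations), IBB excluded.
pa_per_event = {
    "HR": (40.2, 34.8),
    "UBB": (11.9, 12.6),
    "SO": (5.7, 6.1),
    "H": (4.4, 4.1),
}

def relative_change(high_li_pa, other_pa):
    """Percent change in the per-PA event rate, high-LI relative to all other."""
    high_rate = 1.0 / high_li_pa
    other_rate = 1.0 / other_pa
    return (high_rate / other_rate - 1.0) * 100.0

changes = {event: relative_change(*pa) for event, pa in pa_per_event.items()}
for event, pct in changes.items():
    print(f"{event}: {pct:+.1f}%")
```

On these inputs the HR rate comes out about 13% lower in high-LI plate appearances, hits about 7% lower, strikeouts about 7% higher, and unintentional walks about 6% higher.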
Think of the null hypothesis of a coin being fair. You flip it once and it lands on heads. Is the coin fair? We have insufficient information to reject the null hypothesis. But had the null hypothesis instead been that the coin is unfair, we still wouldn't have had enough information to reject the null hypothesis. In essence, any conclusion you make depends on which null hypothesis you chose and how badly you want to reach a conclusion about it.
Now let's expand the sample size: do 100 coin flips. Or 1,100. Or 12,000. No matter what you choose, there's a margin of error around the result. The fog is still there, but it's smaller as the sample grows. If what you're looking for is small enough that it's within that margin of error, you can't say it doesn't exist. You can say that, if it exists, it's likely small.
Note that in the above we're testing if a coin is "fair". What is "unfair"? How different must the outcome be for us to call it unfair? If someone had rigged a coin to land on heads 50.3% of the time, or 50.0000001% of the time, technically it's unfair. If you have a precise definition of "fair" - exactly 50% - you can always come up with a definition of unfair that can exist within the margin of error, regardless of sample size.
Does this matter? It depends. If all you want to do is to decide whether a coin is suitable for one coin flip, a coin that lands on heads 50.0000001% of the time is good enough. If you're looking for a coin that you can use for 1 trillion flips, that coin might not be good enough for you.
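To put rough numbers on the coin example, here is a small sketch using the usual normal approximation for a proportion. The 95% level and the function names are my assumptions, not anything stated in the thread.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile (assumed confidence level)

def margin_of_error(n, p=0.5):
    """Approximate 95% margin of error for an observed proportion after n flips."""
    return Z_95 * math.sqrt(p * (1.0 - p) / n)

def flips_needed(bias, p=0.5):
    """Flips needed before the 95% margin shrinks to the size of the bias."""
    return math.ceil((Z_95 ** 2) * p * (1.0 - p) / bias ** 2)

for n in (100, 1_100, 12_000):
    print(f"{n:>6} flips: +/- {margin_of_error(n):.3f}")

# A coin rigged to land on heads 50.3% of the time is off by 0.003.
print(flips_needed(0.003))
```

At 12,000 flips the margin is still close to a full percentage point, and by this yardstick it takes roughly 107,000 flips before the margin is as small as the 50.3% coin's bias.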
In baseball, the success rate in a single plate appearance is relatively low. As a result of that, even a slight improvement is important. But the margin of error is significant; we'll only get 12,000 observations in a great career, of which maybe 600 will be clutch situations. To identify it as a repeatable skill you need to split the data further, to confirm that it's consistent in different samples. But baseball statistics are rarely that consistent from one 600 PA sample to another 600 PA sample, never mind fractions thereof. And that's for a player whose career is already over, not one who's maybe one-third of the way to that point.
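A back-of-the-envelope way to see why split samples rarely agree: if the observed rate is true talent plus binomial noise, the expected correlation between two equal-size samples is the talent variance divided by talent-plus-noise variance. The sketch below assumes a 10-point spread in true clutch talent, which is purely hypothetical, and borrows the .333 high-LI OBP (excluding IBB) quoted earlier in the thread.

```python
import math

def split_half_correlation(true_sd, rate, pa_per_half):
    """Expected correlation between two equal-size samples of the same hitters.

    true_sd      spread of real talent differences between hitters (assumed)
    rate         typical per-PA success rate
    pa_per_half  plate appearances in each sample
    """
    noise_var = rate * (1.0 - rate) / pa_per_half  # binomial sampling variance
    true_var = true_sd ** 2
    return true_var / (true_var + noise_var)

# 600 clutch PA split into two 300-PA halves.
# The 0.010 talent spread is a made-up value for illustration only.
r = split_half_correlation(true_sd=0.010, rate=0.333, pa_per_half=300)
print(f"expected split-half correlation: {r:.2f}")
```

Under those assumptions the two halves would correlate at only about 0.12, even though the skill is real by construction.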
In short, in baseball, clutch performance can easily be real, meaningful, and statistically insignificant. I think Bradbury is defining the unfair coin to be something that lands on heads 95% of the time, and when it doesn't show up that way he's certain there's no such thing as an unfair coin. At a minimum there's no such thing as an unfair coin at that level of magnitude; but that's something entirely different from the question posed.
Isn't this probably explained by the fact that teams tend to use better pitchers in high leverage situations?
Isn't Bradbury an economist? In which case our null hypothesis should be that he doesn't know what he's talking about.
In short, it's not that clutch hitting may be real, meaningful, but statistically insignificant, but the reverse: statistically significant but meaningless.
But is this counter-balanced by the fact that bad pitchers give up more baserunners and create more high leverage situations?
This is a given. If a research piece doesn't do this, I'd like to see it.
This, by the way, also applies to regular season v playoffs. Yes, Jeter and O'Neill etc have the same stats, but that means they perform better in the playoffs, since they are facing a lower run environment.
My response is here.
But don't you get meaningful data by adding more careers to your study? I'm sure it's different analysis, but you can determine if a single coin is unfair by flipping it enough times. If you want to find out if pennies are inherently unfair you can flip 100 pennies 1,000 times, or you can flip 10,000 pennies one time. I know it's not that simple, and your levels of confidence probably depend on your method.
I have a problem with measuring clutch over long careers, though, due to selection bias. It's a given that production goes down in clutch situations, right? It's also an assumption that, in general, better pitchers are used in clutch situations. It's theorized that this is the cause of the dip in production. But a 950 OPS player might only go down to a 900 OPS player when facing Joe Nathan, while a 700 OPS player might go down to a 500 OPS player vs. Nathan, regardless of the clutchiness of the particular AB.
If you measure clutch over a long career, you're likely measuring how he does against better pitchers, not how he does in clutch situations. And since the career is long, you'd expect that the guy is a good player. Of course, if you find that there are good hitters with long careers who suffer in the clutch, and that's statistically significant as well, then this might be moot.
The pitchers are different (they pitch differently with men on base), the fielders play differently, and, as one other poster said, even the outs are scored differently, i.e. with sac flies, but this also applies to ground outs. Getting a force out on a fielder's choice is much easier than having to throw to first base. These also count as at bats. Worse, these factors will vary not only from game to game, which can be corrected with a large enough sample size, but also probably from player to player. Albert Pujols is not treated the same way as Neifi Perez with men on base.
It would be very difficult to control for all these things, especially if you are looking for what people seem to agree is a rather small effect which, if it exists, may be smaller than variations in any of the confounding factors.
Tango's response quotes Andy (Not sure on the etiquette of quoting a quote, here, but...):
Is any of that due to the unintentional intentional walk? Pujols doesn't get an IBB, but the pitcher pitches around him?
That's how it should be done, but other than Andy's work and Grabiner's it's not what is actually being done.
What I have seen a number of people do - and which I think Bradbury's statement of the null hypothesis encourages - is to take a group of players who perform better in the clutch than their season line in season 1, look at how they do in season 2, and see how many of them are still better in season 2. Because the baseline performance in a clutch situation (regardless of which situation you pick) is almost always different, and usually worse, than the baseline performance in all situations, this will result in rejecting a performance that is worse than the player's overall average, but better than the expected baseline, as being "not clutch".
-- MWE
Neyer Burn: "I've sort of gotten out of the habit of checking Sabernomics, because J.C. so often writes about his local ballpark woes." ;-)
It was actually on his Twitter page, but I don't know if he was saying it or someone else.
What I found backs up your results as well as what David and Sean were saying. As a group they hit almost precisely what you'd expect, but there was more variation than you'd expect if it was all random. (Only about a .87 correlation between the clutch stats and the overall stats. Not that this is a bad correlation, but it is at least an indication that "something" is going on. It is pretty clear that some hitters do change their approach with RISP and try to hit more singles.)
The effect though is so weak that it's simply not worth sweating. David Grabiner estimated it's around +/- a hit a year and that's in the noise.
Mind you I was looking at two different data sets than what you studied, but I doubt it makes any difference.
So the stats you present aren't conclusive that players aren't using their skills as efficiently as possible in both clutch and non-clutch situations.
Well, since I agree with Andy's basic conclusion:
I can't say that this is unexpected.
I am not arguing for or against the existence of clutch skill, mind you; like Andy, I think that it's hard to find until you have a significant amount of data in the bank, so to speak. But I think that Bradbury's statement of the null hypothesis is misleading. It's not that a player performs "the same" clutch vs non-clutch; it's more that the change in the player's performance clutch vs non-clutch is predictable based on the way that the performance changes across the entire population of players clutch vs non-clutch. If the population sees an HR rate decline of 15%, then we would expect to see a decline of 15% for any individual player we select. If the population strikes out 7% more often, then we would expect to see an increase of 7% in any individual player's strikeout rate. Etc. etc. etc.
-- MWE
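Here is a rough sketch of how the choice of baseline described above can flip the reading of the same clutch line. The hitter is hypothetical (all three of his numbers are invented), the 15% HR decline is the example figure from the comment above, and the z-score uses a plain normal approximation to the binomial.

```python
import math

def z_score(observed, n, expected_rate):
    """Normal-approximation z-score of an observed count against a binomial rate."""
    expected = n * expected_rate
    sd = math.sqrt(n * expected_rate * (1.0 - expected_rate))
    return (observed - expected) / sd

# Hypothetical hitter: numbers invented for illustration only.
overall_hr_rate = 0.050   # HR per PA in all situations
clutch_pa = 600
clutch_hr = 28

group_hr_change = -0.15   # example population-wide HR decline in the clutch

same_as_usual = z_score(clutch_hr, clutch_pa, overall_hr_rate)
vs_group_norm = z_score(clutch_hr, clutch_pa, overall_hr_rate * (1 + group_hr_change))

print(f"vs. own overall rate:    {same_as_usual:+.2f}")
print(f"vs. group-adjusted rate: {vs_group_norm:+.2f}")
```

Against the "no change" null this hitter looks slightly below par; against the group-adjusted expectation he looks slightly above it. Neither z-score is anywhere near significance at 600 PA, which is the sample-size problem all over again.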
Nor were they intended to be. If I were doing a controlled study of the issue (which I am not) I would certainly use tighter controls on skill sets, both hitters and pitchers. I certainly would not include Albert Pujols and Juan Pierre in the same data set, because they are very different types of players, and I would not expect their performances to vary in the same way. I also wouldn't lump ace relievers in with run-of-the-mill starters when looking at hitter performances; I would expect different results when hitters were facing the Riveras and Hoffmans than when they were facing the generic 5th starters. Of course, the tighter these controls become, the smaller the sample sizes with which you have to work.
-- MWE
For example, a random at bat in a high school game in the fourth inning of a 7-1 game doesn't seem like a particularly "clutch situation." But if the player gets word that an MLB scout is there to evaluate him, the situation suddenly becomes extremely clutch.
My point is that, because batters' success rate is already small, and because there's a lot of noise in the data, it's possible for the skill to be materially large enough to matter AND small enough to be impractical to detect. It's one thing to say it's a waste of time to look for it for that reason, and another to say it's a waste of time because it doesn't exist. Bradbury is saying the latter. Worse, he's saying that even considering the possibility that it exists and discussing ways in which we could improve upon prior analyses is a waste of time.
I think Bradbury and I agree on one thing: A-Rod over Jeter. Jeter's clutchiness and/or A-Rod's chokiness would have to be pretty big for the former to be preferred over the latter in clutch situations. Nothing of that magnitude has been demonstrated yet. Thus we should go where the statistically significant data takes us, which is toward A-Rod. Where we disagree is on whether there's more work to do.