Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Friday, October 09, 2009

Sabernomics: Bradbury: Overestimating the Fog

What you can’t see won’t hurt you… it’ll clutch you!

Let’s begin with the null hypothesis that player performance in clutch situations is identical to performance in non-clutch situations. A type I error occurs when we reject a correct null hypothesis. Studies of clutch hitting find that performance differences in these situations are small and often not statistically meaningful. The null stands and clutch-hitting skill is seen as a myth. A type II error occurs from not rejecting an incorrect null hypothesis. When James advocates agnosticism towards clutch hitting as a skill, it is because, despite the studies showing little evidence of clutch hitting, he wants to avoid committing a type II error. The problem is, this choice between type I and II errors isn’t free. By raising the decision criterion to avoid type II error, you necessarily increase the chance of committing type I error.
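
The trade-off between the two error types can be put in numbers. What follows is a minimal sketch, not anything from the article: it assumes a one-sided z-test, and the effect size and standard error are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

def type_ii_rate(alpha, effect, se):
    """Chance of missing a real effect in a one-sided z-test run at level alpha."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)        # rejection threshold, in standard errors
    return z.cdf(critical - effect / se)   # probability the statistic falls short

# Hypothetical: a true 10-point edge measured with a 20-point standard error.
for alpha in (0.01, 0.05, 0.20):
    print(alpha, round(type_ii_rate(alpha, effect=0.010, se=0.020), 3))
```

Under these assumptions, each step toward a more permissive alpha (more type I risk) lowers the type II rate, and each step toward a stricter alpha raises it.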

Identifying clutch hitting is a practical problem that requires a decision involving real costs. Should a team factor in clutch ability when choosing between free agents? Should it matter for the manager choosing among pinch hitters? Should a historically big-game pitcher start the playoff series over your regular season ace? Based on the available evidence, if I had to decide between Jeter and A-Rod it’s not even close: Alex Rodriguez is a far superior player to Derek Jeter, and that’s what is relevant. And in cases where the players’ performances are more similar, I wouldn’t consider clutch performance for even a moment. If clutch ability exists, it would show up in bunches using the empirical methods already employed by researchers seeking to study the question.

In my view, the fog is a distraction: something to bring up to keep the argument going. But arguing takes time, which is valuable. Let’s stop it with the fog, already. Of course it’s possible that something exists that just hasn’t been discovered yet (e.g., the Loch Ness Monster, Sasquatch, ergogenic effects of HGH); but the evidence we have says these things don’t exist, and hanging hopes on the possible isn’t a very persuasive argument.

Repoz Posted: October 09, 2009 at 12:29 PM | 24 comment(s)
  Tags: sabermetrics

Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   1. Rally Posted: October 09, 2009 at 02:08 PM (#3346183)
Researchers have found evidence of clutch hitting ability. And if I recall correctly, statistically significant. It just happens to be small, and by the time you have enough evidence in terms of career PA to detect it, the player is probably nearing the end of his career.

So I wouldn't put much weight on it for signing free agents or choosing a pinch hitter, and if you choose to ignore it you won't hurt yourself in baseball decision making. But as an academic exercise it's wrong to say it doesn't exist.
   2. BDC Posted: October 09, 2009 at 02:17 PM (#3346193)
This is hilarious, the ad I'm seeing at the bottom of the page is Get Your 24K Gold Derek Jeter Half Dollar. AROD can't even get a robo-ad for a blog post that says he's far superior.
   3. GGC for Sale Posted: October 09, 2009 at 02:23 PM (#3346202)
So I wouldn't put much weight on it for signing free agents or choosing a pinch hitter, and if you choose to ignore it you won't hurt yourself in baseball decision making. But as an academic exercise it's wrong to say it doesn't exist.


I think that's the whole point. If you're looking to advise a team, you probably shouldn't concern yourself with it. But if you're just studying baseball for the sake of it, well, that's a different story.
   4. Mike Emeigh Posted: October 09, 2009 at 02:51 PM (#3346266)
Let’s begin with the null hypothesis that player performance in clutch situations is identical to performance in non-clutch situations.


For the purposes of discussion, let's do this:

"Clutch situation": any plate appearance with LI of 2.0 or higher
Method: look at performance of all hitters other than pitchers in plate appearances where LI is >=2.0, and compare it to the performance of all hitters in plate appearances where LI is <2.0.

When I look at this from 2004-2008, what I see is this:

In high-leverage situations, hitters hit .272/.351/.424, with .300 in-play BA and .380 in-play SLG. When you exclude intentional walks, the OBP drops to .333. Excluding IBB, hitters in high-LI situations hit 1 HR every 40.2 PA, drew an unintentional walk once every 11.9 PA, and struck out once every 5.7 PA.

In all other situations, hitters hit .270/.337/.432, with .300 in-play BA and .385 in-play SLG. When you exclude IBB, the OBP drops to .334. Excluding IBB, hitters managed one HR every 34.8 PA, an unintentional walk once every 12.6 PA, and struck out once every 6.1 PA.

FWIW: The main reason that the BA is higher in high-leverage situations is that sacrifice flies occur more often in those situations. If I treat sacrifice flies as at-bats, the BA in high-leverage situations would be .260 vs .269 in all other situations. In high-LI situations, excluding IBB, players get a hit once every 4.4 PA; in all other situations, once every 4.1 PA.

What seems to be reasonably clear to me from all of this is that player performance in clutch situations is NOT identical to player performance in non-clutch situations (by the definition I used for "clutch", anyway - making no claim that I have the best definition). Players get fewer hits - primarily because they hit fewer HR, since the in-play BA is essentially identical - hit for less power (both on HR and on BIP), walk more (both intentionally and unintentionally), and strike out more. A player who maintains his performance level across high-LI situations, especially his power production, is therefore very likely doing WELL in those situations, and we ought to approach our testing using group expectation for those situations rather than the "no change" assertion. Note that we might still very well find no significant variation among players from the group norm - but by using the group performance norm in those situations as the starting point we have a better chance of finding them, IMO.
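
For readers who want the size of those shifts as percentages, here is a small sketch that converts the "once every N PA" figures quoted in this comment into relative changes. It is plain arithmetic on the rounded numbers above, not a rerun of the underlying data, and the function name is made up for illustration.

```python
def relative_change(pa_per_event_clutch, pa_per_event_other):
    """Relative change in an event's per-PA rate, high-LI versus all other PA."""
    clutch_rate = 1 / pa_per_event_clutch
    other_rate = 1 / pa_per_event_other
    return clutch_rate / other_rate - 1

# "Once every N PA" figures quoted above: (high-LI, all other PA), IBB excluded.
quoted = {
    "HR": (40.2, 34.8),
    "unintentional BB": (11.9, 12.6),
    "SO": (5.7, 6.1),
    "hit": (4.4, 4.1),
}
changes = {event: relative_change(*pair) for event, pair in quoted.items()}
for event, change in changes.items():
    print(f"{event}: {change:+.1%}")
```

On these rounded inputs the home run rate comes out roughly 13% lower in high-leverage PA, the hit rate about 7% lower, the strikeout rate about 7% higher, and the unintentional walk rate about 6% higher.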

-- MWE
   5. Chris Dial Posted: October 09, 2009 at 03:00 PM (#3346279)
we ought to approach our testing using group expectation for those situations rather than the "no change" assertion.
You should have mentioned this before...
   6. villageidiom Posted: October 09, 2009 at 03:04 PM (#3346288)
Good statisticians go where statistically significant data takes them. The impression I have from reading James' Underestimating and Bradbury's Overestimating is that James, while claiming not to be a statistician, is a better statistician than Bradbury.

Think of the null hypothesis of a coin being fair. You flip it once and it lands on heads. Is the coin fair? We have insufficient information to reject the null hypothesis. But had the null hypothesis instead been that the coin is unfair, we still wouldn't have had enough information to reject the null hypothesis. In essence, any conclusion you make depends on which null hypothesis you chose and how badly you want to reach a conclusion about it.

Now let's expand the sample size: do 100 coin flips. Or 1,100. Or 12,000. No matter what you choose, there's a margin of error around the result. The fog is still there, but it's smaller as the sample grows. If what you're looking for is small enough that it's within that margin of error, you can't say it doesn't exist. You can say that, if it exists, it's likely small.

Note that in the above we're testing if a coin is "fair". What is "unfair"? How different must the outcome be for us to call it unfair? If someone had rigged a coin to land on heads 50.3% of the time, or 50.0000001% of the time, technically it's unfair. If you have a precise definition of "fair" - exactly 50% - you can always come up with a definition of unfair that can exist within the margin of error, regardless of sample size.

Does this matter? It depends. If all you want to do is to decide whether a coin is suitable for one coin flip, a coin that lands on heads 50.0000001% of the time is good enough. If you're looking for a coin that you can use for 1 trillion flips, that coin might not be good enough for you.
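
To make the coin example concrete, here is a rough sketch using the normal approximation to the binomial. The 1.96 multiplier (a 95% interval) and the function names are assumptions added for illustration, and "margin equals bias" is only a crude threshold, not a full power calculation.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile (an assumed confidence level)

def margin_of_error(n, p=0.5):
    """Approximate 95% margin of error for an observed proportion over n flips."""
    return Z_95 * math.sqrt(p * (1 - p) / n)

def flips_needed(bias, p=0.5):
    """Flips required before the 95% margin shrinks to the size of the bias."""
    return math.ceil((Z_95 / bias) ** 2 * p * (1 - p))

for n in (100, 1_100, 12_000):
    print(n, round(margin_of_error(n), 4))
print(flips_needed(0.003))  # the 50.3% coin
```

Under these assumptions, 12,000 flips still leave a margin of roughly 0.9 percentage points, about three times the 0.3-point bias of the 50.3% coin, and it takes on the order of 100,000 flips before the margin is as small as the bias.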

In baseball, the success rate in a single plate appearance is relatively low. As a result of that, even a slight improvement is important. But the margin of error is significant; we'll only get 12,000 observations in a great career, of which maybe 600 will be clutch situations. To identify it as a repeatable skill you need to split the data further, to confirm that it's consistent in different samples. But baseball statistics are rarely that consistent from one 600 PA sample to another 600 PA sample, never mind fractions thereof. And that's for a player whose career is already over, not one who's maybe one-third of the way to that point.
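
A back-of-envelope version of that margin, as a sketch only: it treats every plate appearance as an independent trial with the same success rate, which real plate appearances are not. The .333 rate is borrowed from the non-IBB OBP quoted in comment 4, and the 600 and 12,000 PA figures come from the paragraph above.

```python
import math

def se_of_split_difference(p, n_clutch, n_other):
    """Standard error of (clutch rate - other rate) when both share true rate p."""
    return math.sqrt(p * (1 - p) * (1 / n_clutch + 1 / n_other))

# Assumed inputs: a .333 success rate, 600 clutch PA out of 12,000 career PA.
se = se_of_split_difference(0.333, 600, 12_000 - 600)
print(f"standard error: {se:.4f}, 95% band: +/-{1.96 * se:.4f}")
```

With these inputs the standard error is close to 20 points, so a career clutch split of nearly 40 points in either direction could arise by chance alone for a hitter with no clutch skill at all.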

In short, in baseball, clutch performance can easily be real, meaningful, and statistically insignificant. I think Bradbury is defining the unfair coin to be something that lands on heads 95% of the time, and when it doesn't show up that way he's certain there's no such thing as an unfair coin. At a minimum there's no such thing as an unfair coin at that level of magnitude; but that's something entirely different than the question posed.
   7. Eric J can SABER all he wants to Posted: October 09, 2009 at 03:10 PM (#3346296)
What seems to be reasonably clear to me from all of this is that player performance in clutch situations is NOT identical to player performance in non-clutch situations (by the definition I used for "clutch", anyway - making no claim that I have the best definition).

Isn't this probably explained by the fact that teams tend to use better pitchers in high leverage situations?
   8. Gaelan Posted: October 09, 2009 at 03:14 PM (#3346302)
The impression I have from reading James' Underestimating and Bradbury's Overestimating is that James, while claiming not to be a statistician, is a better statistician than Bradbury.


Isn't Bradbury an economist? In which case our null hypothesis should be that he doesn't know what he's talking about.
   9. David Nieporent (now, with children) Posted: October 09, 2009 at 03:27 PM (#3346319)
Villageidiom, I think you're misreading what he says. At no point does he suggest that an unfair coin is a 95% heads coin. What he says is, basically, if it's 50.000001, who gives a crap? Spending time looking for it is a waste of time.

In short, it's not that clutch hitting may be real, meaningful, but statistically insignificant, but the reverse: statistically significant but meaningless.
   10. Jeff R. Posted: October 09, 2009 at 03:51 PM (#3346346)
Isn't this probably explained by the fact that teams tend to use better pitchers in high leverage situations?


But is this counter-balanced by the fact that bad pitchers give up more baserunners and create more high leverage situations?
   11. Tango Posted: October 09, 2009 at 04:05 PM (#3346361)
Mike is saying to compare the performance in non-clutch for an individual to the group for non-clutch, and compare the clutch for an individual to the group for clutch.

This is a given. If a research piece doesn't do this, I'd like to see it.

This, by the way, also applies to regular season v playoffs. Yes, Jeter and O'Neill etc have the same stats, but that means they perform better in the playoffs, since they are facing a lower run environment.

My response is here.
   12. Greg Pope Posted: October 09, 2009 at 04:36 PM (#3346397)
we'll only get 12,000 observations in a great career, of which maybe 600 will be clutch situations. To identify it as a repeatable skill you need to split the data further, to confirm that it's consistent in different samples.

But don't you get meaningful data by adding more careers to your study? I'm sure it's different analysis, but you can determine if a single coin is unfair by flipping it enough times. If you want to find out if pennies are inherently unfair you can flip 100 pennies 1,000 times, or you can flip 10,000 pennies one time. I know it's not that simple, and your levels of confidence probably depend on your method.

I have a problem with measuring clutch over long careers, though, due to selection bias. It's a given that production goes down in clutch situations, right? It's also an assumption that, in general, better pitchers are used in clutch situations. It's theorized that this is the cause of the dip in production. But a 950 OPS player might only go down to a 900 OPS player when facing Joe Nathan, while a 700 OPS player might go down to a 500 OPS player vs. Nathan, regardless of the clutchiness of the particular AB.

If you measure clutch over a long career, you're likely measuring how he does against better pitchers, not how he does in clutch situations. And since the career is long, you'd expect that the guy is a good player. Of course, if you find that there are good hitters with long careers who suffer in the clutch, and that's statistically significant as well, then this might be moot.
   13. Scientist guy Posted: October 09, 2009 at 04:37 PM (#3346398)
Something that seems to have been overlooked is that high leverage situations are very different from low leverage situations in terms of factors independent of the batter's skill.

The pitchers are different, they pitch differently with men on base, and the fielders play differently. As another poster said, even the outs are scored differently, e.g. with sac flies, but this also applies to ground outs. Getting a force out on a fielder's choice is much easier than having to throw to first base. These also count as at bats. Worse, these factors will vary not only from game to game, which can be corrected with a large enough sample size, but also probably from player to player. Albert Pujols is not treated the same way as Neifi Perez with men on base.

It would be very difficult to control for all these things especially if you are looking for what people seem to agree is a rather small effect which if it exists, may be smaller than variations in any of the confounding factors.
   14. Greg Pope Posted: October 09, 2009 at 04:43 PM (#3346404)
Albert Pujols is not treated the same way as Neifi Perez with men on base.

Tango's response quotes Andy (Not sure on the etiquette of quoting a quote, here, but...):

About one in six players increases his inherent “OBP skill” by eight points


Is any of that due to the unintentional intentional walk? Pujols doesn't get an IBB, but the pitcher pitches around him?
   15. Tango Posted: October 09, 2009 at 06:17 PM (#3346523)
I thought some might like this paragraph (p. 97-98), especially Mike E:

Well, that’s something, isn’t it? The ten best clutch seasons from 2000 through 2003 were followed by seasons in which the players hit 33 points worse in clutch wOBA than in non-clutch wOBA! To be fair, we need to account for the fact that clutch situations generally bring the best pitchers to the mound; indeed the whole sample of major leaguers had wOBA that were 8% worse in clutch situations than in non-clutch situations over this period (meaning that we expect our .385 non-clutch hitters to hit .354 in clutch). So our all-star clutch performers were only one point of wOBA better than we would have expected—in other words, they performed almost exactly how we would have expected based on their overall wOBA alone.
   16. Mike Emeigh Posted: October 09, 2009 at 06:33 PM (#3346544)
This is a given. If a research piece doesn't do this, I'd like to see it.


That's how it should be done, but other than Andy's work and Grabiner's it's not what is actually being done.

What I have seen a number of people do - and which I think Bradbury's statement of the null hypothesis encourages - is to take a group of players who perform better in the clutch than their season line in season 1, look at how they do in season 2, and see how many of them are still better in season 2. Because the baseline performance in a clutch situation (regardless of which situation you pick) is almost always different, and usually worse, than the baseline performance in all situations, this will result in rejecting a performance that is worse than the player's overall average, but better than the expected baseline, as being "not clutch".

-- MWE
   17. GGC for Sale Posted: October 09, 2009 at 06:39 PM (#3346550)
I found this the most interesting thing on that page:

Neyer Burn: "I've sort of gotten out of the habit of checking Sabernomics, because J.C. so often writes about his local ballpark woes." ;-)

It was actually on his Twitter page, but I don't know if he was saying it or someone else.
   18. Ron Johnson Posted: October 09, 2009 at 06:41 PM (#3346552)
Tango, when I looked at this I used both group average and correlation between clutch situations and overall.

What I found backs up your results as well as what David and Sean were saying. As a group they hit almost precisely what you'd expect but there was more variation than you'd expect if it was all random. (Only about a .87 correlation between the clutch stats and the overall stats. Not that this is a bad correlation, but it is at least an indication that "something" is going on. It is pretty clear that some hitters do change their approach with RISP and try to hit more singles)

The effect though is so weak that it's simply not worth sweating. David Grabiner estimated it's around +/- a hit a year and that's in the noise.

Mind you I was looking at two different data sets than what you studied, but I doubt it makes any difference.
   19. GGC for Sale Posted: October 09, 2009 at 06:43 PM (#3346556)
It says FireStatus underneath that, whatever that means.
   20. valuearbitrageur Posted: October 09, 2009 at 06:43 PM (#3346557)
MWE, you realize that your research here depends upon the definition of "performance". Your comparison defines performance merely as results. But I think the question of clutch should define it as efficiency of application of ability (i.e. how well the player used his skills in the situation). Clearly, players using their skills at the same level in different situations will likely have different results. They'll walk more because pitchers won't give them as many hittable pitches. They'll have a lower BA and fewer home runs for the same reason. They'll try to hit more sacrifices, which means sometimes they'll end up creating more outs. They'll be facing tougher pitchers.

So the stats you present aren't conclusive that players aren't using their skills as efficiently as possible in both clutch and non-clutch situations.
   21. Mike Emeigh Posted: October 09, 2009 at 06:53 PM (#3346569)
The ten best clutch seasons from 2000 through 2003 were followed by seasons in which the players hit 33 points worse in clutch wOBA than in non-clutch wOBA! To be fair, we need to account for the fact that clutch situations generally bring the best pitchers to the mound; indeed the whole sample of major leaguers had wOBA that were 8% worse in clutch situations than in non-clutch situations over this period (meaning that we expect our .385 non-clutch hitters to hit .354 in clutch). So our all-star clutch performers were only one point of wOBA better than we would have expected—in other words, they performed almost exactly how we would have expected based on their overall wOBA alone.


Well, since I agree with Andy's basic conclusion:

the fact that one of three players performs at least .006 [wOBA] better or worse in the clutch doesn’t mean that we can tell which players have this skill, even when looking at several seasons’ worth of data.


I can't say that this is unexpected.

I am not arguing for or against the existence of clutch skill, mind you; like Andy, I think that it's hard to find until you have a significant amount of data in the bank, so to speak. But I think that Bradbury's statement of the null hypothesis is misleading. It's not that a player performs "the same" clutch vs non-clutch; it's more that the change in the player's performance clutch vs non-clutch is predictable based on the way that the performance changes across the entire population of players clutch vs non-clutch. If the population sees an HR rate decline of 15%, then we would expect to see a decline of 15% for any individual player we select. If the population strikes out 7% more often, then we would expect to see an increase of 7% in any individual player's strikeout rate. Etc. etc. etc.
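
The group-baseline idea in the paragraph above can be sketched in a few lines. The function names and the player's numbers are hypothetical; only the 15% population decline comes from the example in the text, and the proportional scaling is one simple reading of "predictable based on the population".

```python
def expected_clutch_rate(player_other_rate, group_clutch_rate, group_other_rate):
    """Scale a player's non-clutch rate by the population's clutch/non-clutch ratio."""
    return player_other_rate * group_clutch_rate / group_other_rate

def clutch_residual(player_clutch_rate, player_other_rate,
                    group_clutch_rate, group_other_rate):
    """Observed clutch rate minus the group-adjusted expectation."""
    expected = expected_clutch_rate(player_other_rate, group_clutch_rate, group_other_rate)
    return player_clutch_rate - expected

# Hypothetical player: HR in 5.0% of non-clutch PA and 4.6% of clutch PA,
# in a population whose HR rate falls 15% in the clutch (0.0300 -> 0.0255).
residual = clutch_residual(0.046, 0.050, 0.0255, 0.0300)
print(f"{residual:+.4f}")
```

In this made-up case the player's raw home run rate drops in the clutch, yet the residual is positive because he declined less than the population did. A "no change" null would score that same player as worse in the clutch.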

-- MWE
   22. Mike Emeigh Posted: October 09, 2009 at 07:14 PM (#3346604)
So the stats you present aren't conclusive that players aren't using their skills as efficiently as possible in both clutch and non-clutch situations.


Nor were they intended to be. If I were doing a controlled study of the issue (which I am not) I would certainly use tighter controls on skill sets, both hitters and pitchers. I certainly would not include Albert Pujols and Juan Pierre in the same data set, because they are very different types of players, and I would not expect their performances to vary in the same way. I also wouldn't lump ace relievers in with run-of-the-mill starters when looking at hitter performances; I would expect different results when hitters were facing the Riveras and Hoffmans than when they were facing the generic 5th starters. Of course, the tighter these controls become, the smaller the sample sizes with which you have to work.

-- MWE
   23. Voros McCracken of Pinkus Posted: October 10, 2009 at 12:53 AM (#3346992)
One of the biggest issues involved with "clutch" hitting is that it would be a psychological issue and defining what a "clutch situation" is by looking at the effects the situation has on a team's fortunes misses the point. What you really want to know are the situations with the biggest effect on a player's psyche, and while there's obviously going to be some overlap between the two, I don't think the sets of situations are going to resemble each other all that closely.

For example, a random at bat in a high school game in the fourth inning of a 7-1 game doesn't seem like a particularly "clutch situation." But if the player gets word that an MLB scout is there to evaluate him, the situation suddenly becomes extremely clutch.
   24. villageidiom Posted: October 10, 2009 at 04:22 AM (#3347493)
Villageidiom, I think you're misreading what he says. At no point does he suggest that an unfair coin is a 95% heads coin. What he says is, basically, if it's 50.000001, who gives a crap? Spending time looking for it is a waste of time.
I'm inferring the 95% thing. He's not assessing the amount of space obscured by fog; rather, he's looking at the amount of space that isn't, and deciding that's enough. For that to be enough, whatever he's testing for has to be large enough that it can't be contained within the fog. And if he's indifferent to the amount of fog there is - and there's a lot - what he's looking for must be pretty freakin' huge.

My point is that, because batters' success rate is already small, and because there's a lot of noise in the data, it's possible for the skill to be materially large enough to matter AND small enough to be impractical to detect. It's one thing to say it's a waste of time to look for it for that reason, and another to say it's a waste of time because it doesn't exist. Bradbury is saying the latter. Worse, he's saying that even considering the possibility that it exists and discussing ways in which we could improve upon prior analyses is a waste of time.

I think Bradbury and I agree on one thing: A-Rod over Jeter. Jeter's clutchiness and/or A-Rod's chokiness would have to be pretty big for the former to be preferred over the latter in clutch situations. Nothing of that magnitude has been demonstrated yet. Thus we should go where the statistically significant data takes us, which is toward A-Rod. Where we disagree is on whether there's more work to do.
