Baseball Primer Newsblog — The Best News Links from the Baseball Newsstand
## Thursday, August 27, 2009
## Freakonomics Blog: Statistical Slumps
Dr. I likes his panda steak medium rare
Posted: August 27, 2009 at 01:15 AM | 39 comment(s)
Tags: sabermetrics
## Reader Comments and Retorts

1. joker24 Posted: August 27, 2009 at 04:06 AM (#3306404)

The answer is 42. I didn't realize that the question would be included.

Really, I think this is the only reason this was written by the author of this piece. I loved that book as a youth.

But snark beat meta-nerd reference by six minutes.

I loved that book as a youth. Why only as a youth?

I don't have enough probability theory training to answer this question analytically.

While it is not the analytical approach, I would expect Monte Carlo methods to take care of this problem pretty quickly. It would be a simple matter of generating a lot of "careers" of home run/no home run trials for ARod, and then looking at the statistics of home run droughts. Of course, this wouldn't really tell you much about how you would actually expect these streaks to unfold (a real baseball player probably doesn't have a constant home run rate for his career), but it would certainly answer this particular question.
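A minimal sketch of the Monte Carlo idea described above, assuming a constant 7% home run rate per at bat and 10,000 at bats per simulated career (both figures are borrowed from later comments in this thread; the function names are made up for illustration):

```python
import random

def longest_drought(n_at_bats, hr_rate, rng):
    """Longest run of consecutive homerless at bats in one simulated career."""
    longest = current = 0
    for _ in range(n_at_bats):
        if rng.random() < hr_rate:
            current = 0                      # a homer ends the drought
        else:
            current += 1
            longest = max(longest, current)
    return longest

def simulate_careers(n_careers=200, n_at_bats=10_000, hr_rate=0.07, seed=1):
    """Longest drought in each of n_careers independent simulated careers."""
    rng = random.Random(seed)
    return [longest_drought(n_at_bats, hr_rate, rng) for _ in range(n_careers)]

if __name__ == "__main__":
    droughts = sorted(simulate_careers())
    share = sum(d >= 42 for d in droughts) / len(droughts)
    print(f"median longest drought: {droughts[len(droughts) // 2]} at bats")
    print(f"careers with a drought of 42+ at bats: {share:.0%}")
```

As the comment notes, the constant rate is the weak point of the model: this answers the narrow question of how long droughts get by chance alone, not how real streaks unfold.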

While TFA seems off in its calculation, I found the author's notion of statistical significance as a tool for sportswriters kind of funny.

Good point. I do not have a way with words. But I guess I never read the Hitchhiker series as an adult. I'm sure I'd still love it though.

Why do you say that? I've read the Hitchhiker's trilogy four or five times and still love it.

The upcoming sixth book worries me, however.

I don't understand how this applies.

I didn't RTFA, but if the author is saying a 42 AB streak is statistically significant, all he's saying is that it isn't necessarily random at this point, and that perhaps something is 'up' here, like maybe he's hurt, maybe his girlfriend dumped him, his uncle died, etc.

Everything isn't random. I think he's saying after 42 AB you should be concerned that he isn't 'normal' as there's a less than 1 in 20 chance this would happen randomly. I think that's entirely reasonable. Players aren't the same random APBA or Strat-O-Matic card every day of the season.

My point is that 42 at bats is about 1/15 of a season. A guy like ARod should have a 42 AB homerless streak every season or two, if everything's random.

Look at it another way: Say I'm playing blackjack. As the player I win just under half the time, but let's call it half the time, to make the math easier. I have a 1/16 chance of winning four hands in a row. But if I play 100 hands, I'd expect that somewhere in there, I'd win four in a row. It wouldn't mean that the odds had suddenly gone in my favor or anything. It's just that unlikely events happen more often when you do the experiment more times.
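The blackjack argument can be checked exactly rather than by intuition. The sketch below is one way to do it, assuming independent hands with a fixed win probability; `prob_run` is a hypothetical helper name, and it tracks the current streak length hand by hand:

```python
def prob_run(n_trials, run_len, p):
    """P(at least one run of run_len consecutive successes in n_trials)."""
    # state[k] = probability of sitting on a streak of k successes
    # without having completed a full run yet.
    state = [1.0] + [0.0] * (run_len - 1)
    done = 0.0
    for _ in range(n_trials):
        nxt = [0.0] * run_len
        for k, mass in enumerate(state):
            nxt[0] += mass * (1 - p)         # a loss resets the streak
            if k + 1 == run_len:
                done += mass * p             # a win completes the run
            else:
                nxt[k + 1] += mass * p       # a win extends the streak
        state = nxt
    return done

if __name__ == "__main__":
    print(prob_run(4, 4, 0.5))      # exactly four hands: 1/16
    print(prob_run(100, 4, 0.5))    # four straight wins somewhere in 100 hands
    # Same idea for homers: a "success" is a homerless at bat (rate 0.93),
    # over an assumed 600 at bat season.
    print(prob_run(600, 42, 0.93))
```

The 600 at bat season length is an assumption for illustration, not a figure from the thread.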

I understand that . . . but what I'm arguing is that maybe it isn't random, and that there is a reason for it. And if you don't start having some concern for it at that point (if you are the player or the hitting coach) perhaps it will continue past 42 AB.

I think it's reasonable to say that if you're down to the point where it's 19:1 that it would be random, there might be something going on there.

I would say that you could argue that it isn't random, that every year or two a slugger has something so wrong that he ends up with a long streak.

Since we are comparing it to standards that include these 'random' events (they are incorporated into the numbers we use to determine how often these things should occur) they may appear to be random, when quite possibly, they aren't.

Well, of course, none of this is really random. Either the guy hits the ball fairly squarely, near the sweet spot of the bat, or he doesn't. His margin for error depends on the pitch speed, the ballpark, the weather, and his bat speed.

My point is that if you calculate what the probability is that you have a random event, on the basis of the guy's rate of home runs per at bat, and you expect that number to mean something, then you should take into account all the batters in the league and all the stretches of at bats for each batter.

And even if you take a 42 at bat homerless stretch out of a typical ARod season, you only increase his home run rate from about 7% of at bats to about 7.5%. It doesn't make much of a difference to the calculation. It would drop the threshold to about 38 at bats for a 5% chance of being random, if viewed in isolation.
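The threshold arithmetic here can be reproduced in a couple of lines, assuming independent at bats with a constant home run rate (the function name is invented for this sketch). The threshold is the smallest n for which (1 - rate)^n falls to 5% or below:

```python
import math

def drought_threshold(hr_rate, alpha=0.05):
    """Smallest n such that n straight homerless at bats has probability <= alpha."""
    return math.ceil(math.log(alpha) / math.log(1 - hr_rate))

print(drought_threshold(0.07))    # 42
print(drought_threshold(0.075))   # 39
```

Rounding up gives 39 for the 7.5% rate, in line with the "about 38" quoted above (the unrounded value is a little over 38).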

What I would say is that looking at the home run rate, rather than watching the consistency of his swing, and seeing how frequently he made fairly solid contact, would be the wrong approach, because home runs are so rare. Let's remember, 42 at bats is only about a week and a half for an everyday player.

95% of these events < 42.

That doesn't imply causation or anything, but it also means you can't use 42 AB = 1/15 season to determine how often the event occurs.

EDIT: Or to put it another way, to measure it the way you want to measure it: from 2007 to present, ARod has 111 HRs in 1426 ABs, which is almost exactly 34 "42 AB" chunks. That's basically 3 HRs every 42 ABs. Calculate the probability that over any given "chunk" A-Rod will hit zero home runs (Poisson: e^-3 * (3^0) / (0!) = .0497870684).

Yes, that's not a bad way to do it. Assuming a Poisson distribution, the chance of getting zero when you expect 3 is also about 5%. So my original points hold.
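For comparison, here is a small sketch that puts the rounded Poisson figure next to the same calculation with the unrounded rate, using the 111 HR in 1426 AB quoted above. The function and key names are illustrative only, and both models assume a single constant rate:

```python
import math

def homerless_chunk_probs(homers=111, at_bats=1426, chunk=42):
    """Chance of zero homers in one chunk under a few simple constant-rate models."""
    rate = homers / at_bats
    expected = rate * chunk                      # expected homers per chunk
    return {
        "expected_homers": expected,
        "poisson_rounded": math.exp(-3),         # mean rounded down to 3
        "poisson": math.exp(-expected),          # Poisson with the unrounded mean
        "binomial": (1 - rate) ** chunk,         # 42 independent at bats
    }

for name, value in homerless_chunk_probs().items():
    print(f"{name}: {value:.4f}")
```

With the unrounded rate the expected count is closer to 3.3 than 3, so the zero-homer probability lands somewhat under the 5% figure; the qualitative point of the exchange is unchanged.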

And the point on when a coach should become concerned is that the coach shouldn't be looking at statistics of relatively rare events to make these determinations.

But the chance of any given 42 at bat chunk having no home runs is trivial to compute, if you assume that the home runs are Poisson distributed with a single rate over all the at bats.

Yeah, but once the numbers are large enough, it's nearly the same. There are far more troubling issues than that one.

Basically, though, the point is that it is not out of the ordinary at all to have a stretch of 70 or so at bats without a homer, even if you average 40 homers a year.

Bingo. That's the problem. ARod is not always 7% or 1:14 to hit a HR. Even if you leave out things he has no control over like pitcher and park, etc.

When everything is clicking, he's 100% healthy and his head is on straight, it might be 11%. If his hip is bothering him, or his hitting mechanics are slightly off, or his ex-wife is being a pain in the ass, or Jeter doesn't help him out in a tough spot with the media, or he made an error last inning and he's still thinking about it, he might be 3%. It's a moving target that averages out at 7%.

So just saying that he hasn't HR'd in 42 AB and you'd expect that doesn't cut it, it's the easy way out.

It could be random, sure. Or it could be that he's going through some of the issues that have moved his true level down from 7% temporarily. I would tend to put my money on the latter.
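The "moving target" idea can be made concrete with a deliberately crude sketch. It assumes, purely for illustration, that a hitter spends half his 42 at bat stretches at the 11% rate and half at the 3% rate mentioned above, with the rate fixed within each stretch; the 50/50 split and the helper names are assumptions, not anything from the article:

```python
def homerless_prob(rate, n=42):
    """Chance of n straight homerless at bats at a constant rate."""
    return (1 - rate) ** n

def mixed_homerless_prob(rates, weights, n=42):
    """Same chance when each stretch is played at one of several rates."""
    return sum(w * homerless_prob(r, n) for r, w in zip(rates, weights))

constant = homerless_prob(0.07)
moving = mixed_homerless_prob([0.03, 0.11], [0.5, 0.5])
print(f"constant 7%:       {constant:.3f}")
print(f"3%/11% mix (avg 7%): {moving:.3f}")
```

Under these assumptions the homerless stretch becomes roughly three times as likely as under a flat 7%, almost entirely because of the low-rate stretches. That is the sense in which a long drought carries information about the current rate if the rate really does move this much.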

In fact, even without outside distracting factors, the "moving target" average concept would seem to be a feature of all head-to-head competitive sports. AROD is sitting on fastballs, hitting 11% home runs, and word gets around and guys start throwing him junk on the outside corner. AROD is thrown off, starts hitting 3% home runs, then adjusts and starts hammering the junk, so he's back to 11% again and starts to see high fastballs. (That's a gross oversimplification of real-life pitching patterns, of course, but it approximates the adjustments and counter-adjustments of competition.)

I suspect that there's really "something wrong" with a player even when a slump is fairly short; it's not just "random" unless you can easily see that he's hitting the ball square but the defense has truly robbed him, that kind of thing. But, all else (like injuries and ex-wives) equal, the better players adjust more quickly, and their brief slumps combine with their successes to produce a more consistent pattern of success.

I did the simulation with a homer in 11% of his at bats. Even then, in 10000 at bats, there were 7 stretches with more than 42 at bats between homers. Since 10000 at bats is about 17 seasons' worth, you'd expect to see one of those stretches per player every 2 or 3 years.

So, for a 42 at bat stretch without a homer to mean anything, the level a guy has to be at when everything is going right has to be extremely high.
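The simulated count of 7 can be sanity-checked with a back-of-the-envelope expectation, again assuming independent at bats at a constant rate (the function name is hypothetical). Each homer starts a new gap, and the gap exceeds 42 at bats if the next 42 at bats are all homerless:

```python
def expected_long_gaps(n_at_bats, hr_rate, gap):
    """Approximate expected number of gaps between homers longer than `gap` at bats."""
    expected_homers = n_at_bats * hr_rate
    p_long_gap = (1 - hr_rate) ** gap    # next `gap` at bats all homerless
    return expected_homers * p_long_gap

print(expected_long_gaps(10_000, 0.11, 42))   # about 8 at an 11% rate
print(expected_long_gaps(10_000, 0.07, 42))   # noticeably more at 7%
```

An expectation of about 8 at the 11% rate is consistent with seeing 7 in a single simulated run.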

I think a good coach/scout could probably see something wrong in the player's approach over 42 at bats. I think the statistics have a hard time showing anything in a 42 at bat stretch.

I'm not disagreeing with you about any of the details about all these calculations being too simple. However, it's pretty clear to me that if you want to read a lot into a 42 at bat stretch, you need to have a very high home run rate.

Another way to look at it is the chance of hitting three homers in a game. For an 11% rate, you'd expect a guy to do this every 50 or so games in which he has 5 at bats. You'd expect a 2 homer game every two 5-at bat games with an 11% rate. If a player really had stretches of higher rates, and stretches of lower rates, you'd expect a lot more 2 and 3 home run games than you actually get from guys who hit 40 homers a year. Now if the fluctuations are at bat to at bat, then they don't have very much impact on long streaks. They'll have some, to be sure, but they'll only matter a lot if you start having very large deviations from the average.

EDIT:

Only every 6 games for a 2 homer game, not every 2.
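Multi-homer game frequencies like these can be computed directly from the binomial distribution. The sketch below assumes exactly 5 independent at bats per game at a constant 11% rate; `prob_at_least` is a name made up for this example:

```python
from math import comb

def prob_at_least(k, n, p):
    """P(at least k successes in n independent trials with success chance p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

for homers in (2, 3):
    p = prob_at_least(homers, 5, 0.11)
    print(f"{homers}+ homer game: p = {p:.4f}, about one game in {1 / p:.0f}")
```

Under these particular assumptions the model gives roughly one game in ten for two or more homers and roughly one in ninety for three or more, so the round numbers quoted in the thread are best read as rough estimates. The comparison being drawn, that persistent hot stretches would inflate these counts, does not depend on the exact figures.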

Maybe, but the question is whether you should be using the statistics to judge when a player is struggling, or the other information you have available. Also, should you use the statistics of rare events like homers, or more common things like base hits as a whole?

As fans, most of us can't just watch every game, so we naturally tend more towards the statistics side. If I were a coach and able to watch every at bat, I'd use both. Or maybe I'd say, you know what, ARod hasn't homered in 2 weeks, maybe I should focus a little more on him (time is a limited resource and you have 12-13 other guys to worry about as well), and see if there's something wrong. Etc.

