Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Saturday, June 14, 2014

Jordan Ellenberg: The Author of How Not to Be Wrong Explains How He Was Wrong

As Horatio Prim once (okay a lot more) said: “Odds bodkins!”

When you write a book called How Not to Be Wrong, you ought to expect to be fact-checked a little. And one of the virtues of the new, data-driven journalism currently in vogue is the habit of going back and checking one’s own old stuff. We’re not supposed to avert our gaze from the howlers in our old columns. We’re supposed to find the mistakes and learn from them.

Overall, my record’s not too bad. Mathematicians over 30 have continued to make major theoretical advances. My criticism of Jonah Lehrer’s scientific sloppiness is looking pretty good. And Stephen Wolfram never did become the world’s most prominent and revolutionary scientist.

But there were some mistakes, too. Here are the three biggest.

Barry Bonds isn’t going to break the home run record. Bonds had 39 home runs in the 88 games making up the first half of the 2001 season, putting him on pace for a record-breaking 72 homers for the year. But I knew the theory of regression to the mean, which reminds us that the league leader in home runs at midseason is likely to have been both good and lucky, and thus isn’t apt to maintain his league-leading pace. Historically, typical league-leaders only hit two-thirds as many home runs in the second half as they did in the first. If that trend held in 2001, Bonds would finish the season with 61 home runs.

In fact, he increased his pace, ending up with 73 home runs and the all-time season record. My reasoning wasn’t bad. It’s just that I’d neglected the possibility that there was another factor besides natural ability and luck that was working in Bonds’ favor.

Thanks to Bill Petti.

Repoz Posted: June 14, 2014 at 04:43 PM | 11 comment(s)
  Tags: giants, history, sabermetrics, steroids

Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   1. Walt Davis Posted: June 14, 2014 at 06:31 PM (#4726111)
It’s just that I’d neglected the possibility that there was another factor besides natural ability and luck that was working in Bonds’ favor.

No apparently it's that you don't understand probability. And apparently not regression to the mean (or maybe that's just poor writing).

Even if we assume Bonds was on steroids and they were effective, why would this make him immune from regression to the mean?

Nor is he particularly good at math. First 88 games is more than half the season -- presumably that was the AS break but if you're going to apply statistical principles, you shouldn't apply baseball definitions of first and second halves. Also 2/3 of 39 is 26 and Bonds ends with 65.

In 1998, McGwire had 37 HR after 81 team games, no additional ones before the AS break and hit 33 in the second, not really any difference. Bonds's second half matched McGwire's -- perhaps unlikely but obviously not unprecedented. In 98, Sosa had 32 in the first 81 games, added zero after the break, had 34 in the second half.

Last year Davis had 28 in the first 81 and so 25 in the second. He did add 8 between game 81 and the AS break but, to point out one problem with using the AS break, the AS game came after game 96 last year -- 60% of games. So for a guy who finished with 53 HR, we'd have expected 32 before the AS break when he had 36 ... not a huge effect.

And if 60% of games happened before the AS break, we would expect the guy to hit only 2/3 as many in the second "half" if he maintained the same pace. (I assume 96 games before the AS break is unusually high)
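
The arithmetic in this comment can be checked with a short sketch. The helper names are hypothetical and the 162-game season is an assumption; the inputs (96 games before the break, a 60% share, 53 HR for the season) are the ones quoted in the comment.

```python
SEASON_GAMES = 162  # assumption: a full regular season


def second_to_first_ratio(games_before_break, season_games=SEASON_GAMES):
    """HR after the break divided by HR before it, for a hitter whose
    per-game pace never changes (hypothetical helper)."""
    return (season_games - games_before_break) / games_before_break


def expected_before_break(season_hr, share_before_break):
    """HR expected before the break if they are spread evenly over games."""
    return season_hr * share_before_break


ratio_at_81 = second_to_first_ratio(81)   # a true midpoint split
ratio_at_96 = second_to_first_ratio(96)   # break after game 96
davis_expected = expected_before_break(53, 0.60)  # 60% share, as in the comment

print(f"second/first ratio, break after 81 games: {ratio_at_81:.3f}")
print(f"second/first ratio, break after 96 games: {ratio_at_96:.3f}")
print(f"expected HR before the break for a 53-HR season: {davis_expected:.1f}")
```

With the break after game 96, a hitter who never slows down still hits only about two-thirds as many home runs afterward, so a "two-thirds" second half at that split does not by itself show any regression.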
   2. Blackadder Posted: June 14, 2014 at 06:54 PM (#4726155)
Nor is he particularly good at math.


I'm going to have to disagree with you there...
   3. Pasta-diving Jeter (jmac66) Posted: June 14, 2014 at 08:33 PM (#4726236)
Nor is he particularly good at math.
I'm going to have to disagree with you there...

True--but he should get out of his "The Diophantine equation A^4 + 2^d B^2 = C^n," and watch a game sometime

his CV is quite impressive, actually
   4. Arbitol Dijaler Posted: June 14, 2014 at 09:04 PM (#4726260)

Nor is he particularly good at math. First 88 games is more than half the season -- presumably that was the AS break but if you're going to apply statistical principles, you shouldn't apply baseball definitions of first and second halves. Also 2/3 of 39 is 26 and Bonds ends with 65.


Are you pinching him at both ends here, snot? What'd Bonds have after 81, and can you add 2/3 to that and get 61?
   5. Kiko Sakata Posted: June 14, 2014 at 09:22 PM (#4726279)
What'd Bonds have after 81, and can you add 2/3 to that and get 61?


39 - he didn't hit any home runs between Team Game 81 and the All-Star break

As to how you get to 61, it's pretty straightforward. Through 88 games, Bonds had 39 HR's - that's 0.443 HR/game. If his pace for the rest of the season (74 games) was 2/3 of that, it'd be 0.295 HR/G times 74 games = 22 HRs (21.864 if you don't do any rounding at any point in the process). And 39 HRs + 22 HRs = 61 HRs.
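
A minimal sketch of that calculation, next to the two other readings that come up in the thread. The function name is hypothetical and the 162-game season is an assumption; the 39 HR, 88 games, and two-thirds factor are from the comments above.

```python
SEASON_GAMES = 162  # assumption: a full regular season


def project_total(first_hr, first_games, rate_factor, season_games=SEASON_GAMES):
    """Season total if the per-game HR rate over the remaining games is
    rate_factor times the rate over the first first_games games."""
    first_rate = first_hr / first_games
    remaining_games = season_games - first_games
    return first_hr + first_rate * rate_factor * remaining_games


same_pace = project_total(39, 88, 1.0)       # no slowdown at all
two_thirds_rate = project_total(39, 88, 2 / 3)  # "two-thirds" read as a rate
two_thirds_count = 39 + 39 * 2 / 3           # "two-thirds as many" read as a count

print(f"same pace:          {same_pace:.3f}")
print(f"2/3 of the rate:    {two_thirds_rate:.3f}")
print(f"2/3 of the HR count: {two_thirds_count:.3f}")
```

Reading "two-thirds" as a rate over the remaining 74 games rounds to 61, reading it as a count of home runs gives 65, and keeping the first-half pace rounds to 72.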
   6. PASTE Thinks This Trout Kid Might Be OK (Zeth) Posted: June 14, 2014 at 09:42 PM (#4726306)
What's the hullaballoo about 61? The home run record was 70.
   7. Kiko Sakata Posted: June 14, 2014 at 09:47 PM (#4726313)
What's the hullaballoo about 61? The home run record was 70.


The issue was whether Bonds would break the record (70). His expected home runs, according to this guy, was 61, which would leave him 9 short of the record (70). Hence, this guy's prediction that Bonds wouldn't break the home run record. The fact that Bonds's expected HR's equaled Maris's old record is just a coincidence.
   8. BDC Posted: June 15, 2014 at 12:34 PM (#4726517)
Even if we assume Bonds was on steroids and they were effective, why would this make him immune from regression to the mean?

Exactly.
   9. escabeche Posted: June 15, 2014 at 01:30 PM (#4726550)
Even if we assume Bonds was on steroids and they were effective, why would this make him immune from regression to the mean?

It doesn't. But it does change what you think the mean is.

I write about this in a little more depth in the book, where I write about why there's no such thing as "The Curse of the Home Run Derby."
   10. Walt Davis Posted: June 16, 2014 at 05:20 AM (#4727097)
Then it was poor writing here ...

Historically, typical league-leaders only hit two-thirds as many home runs in the second half as they did in the first.

"two-thirds as many ..." 2/3 of 39 is 26. If it's 2/3 of the rate, then "2/3 as many" is not quite right and it's rather important to know whether "second half" is 81 games or 74 games. I'd also be curious how many actually hit them at 2/3 the rate vs. how often they continued to hit them at about the same rate (or a sufficient rate) but got hurt -- which would still lead you to predict he'd come up short.

It doesn't. But it does change what you think the mean is.

Possibly (we have no evidence steroids was the main culprit for the increase ... we don't really have any reliable evidence it had any effect) ... but if he hadn't noticed that the mean had changed by 2001 ...

And no matter what the mean is ... if the mean is higher then, yes, the 39 HR is not as far out in the distribution and so the regression towards the mean would be less severe ... but the statement was about the "typical HR leader" ... are we saying that Bonds' 39 was fewer standard deviations above the new mean than the "typical HR leader?" That seems unlikely.

Regardless, statistics is about probability. Therefore the statement would be that Bonds was unlikely (with a specific percentage if you have an estimator) to break the record. Even if he was "wrong" about the mean (bias), the main reason he was "wrong" was that the roll of the dice went against him. If you don't believe in the roll of the dice, you don't believe in statistics and you don't really believe in "regression to the mean."

My issue isn't with the application of statistical principles to conclude that it was highly unlikely Bonds would get the record, it is looking for the explanation -- and offering one without evidence -- that I object to. It's fine to look at big residuals to see if you can detect something missing from your model, but it's pretty impossible to do from a single outlier ... although as I note, both Mac and Sosa seemed to have violated this principle in 98 ... and of course you should test your new model.

Now, I find 18 seasons where a player had at least 32 HR in the "first half" -- defined the way b-r defines it by the AS break although I'm not sure how it's defined for Ruth from 1921 to 1930. I am going to skip Frank Howard because the AS game that year (1969) came after his team's 101st game. (The A's only had 93 games by that point so I'll keep Reggie in).

Of these 17, 6 happened between 1998 and 2001 and that does not include Sosa's 2001. Pujols and Davis have done it since 2001 so it was 6 of 15. Note that three guys had also done it in 94.

Of those 6, 2 ended with over 70, 2 ended with over 60. Griffey and Gonzalez faded. Of those 4 ... Bonds hit 39 first "half", 34 second; Mac 37/33; Sosa 33/33 and 32/31. Sosa added a 29/35 in 2001 and McGwire a 28/37 in 1999.

There have been 17 seasons with a second "half" of 30 or more HR, 7 of those between 1998 and 2001 -- three Sosa, 2 Mac, one Bonds, one Belle. ARod, Howard and Bautista have done it since 2001.

Obviously drawing conclusions from sample sizes of 6 or 13 or 18 or 35 is discouraged. But the basic model doesn't seem to have been working very well from 98-01. Unless you want to pretend that "HR first half leader" is a different population than "big first half HR totals" the basic model was wrong about 6 times over 4 seasons, 4 times in 98-99.

It's not clear the model was working in 94 either. Williams had 33 HR through 89 games then another 10 in 26 games, the same pace. Griffey and Thomas were fading at about the "expected" rate. We'd still expect Williams to miss 62 (he was only on pace for 60.5 as it was) but he was a good bet to not fade to 2/3 the rate he had in the first half (he needed 8 in 47 games to match that).
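
The Williams figures can be checked the same way. This is a sketch under the comment's own framing: it assumes a 162-game schedule and treats "second half" as everything after game 89; the variable names are made up for illustration.

```python
SEASON_GAMES = 162  # assumption: projecting to a full schedule

first_hr, first_games = 33, 89   # through the break, per the comment
post_hr, post_games = 10, 26     # after the break, per the comment

games_played = first_games + post_games
full_season_pace = (first_hr + post_hr) / games_played * SEASON_GAMES

# Second-"half" total if he had hit at 2/3 of his first-half rate.
second_half_games = SEASON_GAMES - first_games
two_thirds_target = (2 / 3) * (first_hr / first_games) * second_half_games

# What was still needed, and over how many games, to reach that target.
hr_still_needed = two_thirds_target - post_hr
games_left = second_half_games - post_games

print(f"full-season pace: {full_season_pace:.2f}")
print(f"2/3-rate second-half target: {two_thirds_target:.2f}")
print(f"needed {hr_still_needed:.2f} more HR in {games_left} games")
```

Under these assumptions the pace lands just above 60.5, and matching a two-thirds fade would have taken only about 8 more home runs in 47 games.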

In 95, seasons started late ... Sosa had 15 HR in 69 games at the break, 21 in 75 games after the break. In 96 he had 27 in 87 games (I think he was the leader but don't know for sure ... a damn fine total regardless) then hit 13 in 37 (a higher pace) before breaking his hand on a HBP. In 1996, McGwire missed the first 18 games of the season but still had 28 HR in 69 team games at the break. He had 24 in 74 games after that which was a slower pace but at about 80% of his first half. Still, he got into only 130 games, 548 PA and 52 HR ... all he needed was a full season. In 1997, he had 31 HR in 89 team games and then 27 in 75 team games (he picked up 2 team games in the trade) which is a slightly higher pace. Griffey was bombing them out in 97 too with 30 in 87 games followed by 26 in 75, the same pace. Sosa led the league in 2000 with "just" 50 ... he hit 23 in 86 games followed by 27 in 76.

So we can add Mac 97, Griffey 97, Sosa 2000 and possibly Mac 96, Sosa 96 and Sosa 95 to the list of "wrongs". That could be due to a radically shifted mean that meant a "typical" big HR hitter would hit one every three games or so but, regardless, there wasn't any reason not to notice the "typical" adjustment wasn't working by 2001.

It's possible that applying the "typical leader" rate to atypical HR hitters is a mistake.

(Injuries, etc. certainly count towards prediction ... i.e. p(injury) is sort of automatically adjusted for by focusing on season totals rather than HR/PA or HR/games played rates and such ... which is fine.)

Note, the split finder doesn't seem to let you save a set of players then check stats only for them ... or at least I'm not sharp enough to do it. I'm too lazy to check all 17 but obviously most of the rest other than Ruth faded. I also don't know how to get p-i to give me AS break (or 81 game) HR leaders.

   11. escabeche Posted: June 16, 2014 at 11:23 AM (#4727221)
Therefore the statement would be that Bonds was unlikely (with a specific percentage if you have an estimator) to break the record.


Oh yeah, absolutely. What I wrote (if this isn't clear, I'm the author of the linked article) was meant to say that Bonds was unlikely to break the record, not that it was in any way IMPOSSIBLE for Bonds to break the record. Here's what I wrote later about this in Slate:

"Many people have written me about my assertion in July that "Barry Bonds isn't going to hit 72 home runs," and asked what went wrong with my analysis. Answer: Nothing. In July, it was extremely unlikely that Bonds would break the home run record. One great thing about baseball is that players sometimes accomplish the unlikely. (Ask Tony Womack.) If you bet a hundred bucks at the All-Star Break that Bonds would hit 73 home runs, you made a dumb bet. Now you've got a hundred bucks; it was still a dumb bet."
