Baseball for the Thinking Fan

Baseball Primer Newsblog

Sunday, November 04, 2012

Colby Cosh: Tarnished Silver: assessing the new king of stats

Nate Silver, Cosh, Tango, PECOTA, Marcel the Monkey!...It’s almost like the golden days of Primer! Backlasher, RossCW come on down!

The whole world is suddenly talking about election pundit Nate Silver, and as a longtime heckler of Silver I find myself at a bit of a loss. These days, Silver is saying all the right things about statistical methodology and epistemological humility; he has written what looks like a very solid popular book about statistical forecasting; he has copped to being somewhat uncomfortable with his status as an all-seeing political guru, which tends to defuse efforts to make a nickname like “Mr. Overrated” stick; and he has, by challenging a blowhard to a cash bet, also damaged one of my major criticisms of his probabilistic presidential-election forecasts. That last move even earned Silver some prissy, ill-founded criticism from the public editor of the New York Times, which could hardly be better calculated to make me appreciate the man more.

...For most players in most years, Silver’s PECOTA worked pretty well. But the world of baseball research, like the world of political psephology, does have its cranky internet termites. They pointed out that PECOTA seemed to blunder when presented with unique players who lack historical comparators, particularly singles-hitting Japanese weirdo Ichiro Suzuki. More importantly, PECOTA produced reasonable predictions, but they were only marginally better than those generated by extremely simple models anyone could build. The baseball analyst known as “Tom Tango” (a mystery man I once profiled for Maclean’s, if you can call it a profile) created a baseline for projection systems that he named the “Marcels” after the monkey on the TV show Friends—the idea being that you must beat the Marcels, year-in and year-out, to prove you actually know more than a monkey. PECOTA didn’t offer much of an upgrade on the Marcels—sometimes none at all.
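To make the "beat the monkey" idea concrete, here is a minimal sketch of a Marcel-style baseline in Python. It assumes the recipe as it is usually summarized: weight the last three seasons 5/4/3, regress toward league average by adding 1200 weighted plate appearances of a league-average hitter, then adjust for age around a peak of 29. Those constants, the multiplicative age adjustment, and the function name `marcel_rate` are illustrative assumptions, not Tango's exact specification.

```python
def marcel_rate(events, pas, league_rate, age):
    """Project a per-PA rate (e.g. HR/PA) in the Marcel style (assumed recipe).

    events, pas: counts for the last three seasons, most recent first.
    league_rate: league-average rate for the same event (an input you supply).
    """
    weights = (5, 4, 3)  # the most recent season counts most
    weighted_events = sum(w * e for w, e in zip(weights, events))
    weighted_pa = sum(w * p for w, p in zip(weights, pas))

    # Regress toward the mean: pretend we also saw 1200 weighted PA
    # from a league-average hitter.
    regression_pa = 1200
    rate = (weighted_events + league_rate * regression_pa) / (weighted_pa + regression_pa)

    # Age adjustment around an assumed peak age of 29.
    if age > 29:
        age_adj = (29 - age) * 0.003  # negative: decline after the peak
    else:
        age_adj = (29 - age) * 0.006  # positive: improvement before the peak
    return rate * (1 + age_adj)


# A 29-year-old who has hit exactly at the league rate stays at the league rate.
print(marcel_rate([30, 30, 30], [600, 600, 600], 0.05, 29))
# A player with no MLB history just gets the league rate.
print(marcel_rate([0, 0, 0], [0, 0, 0], 0.05, 29))
```

Note the second call: with no major league history the sketch simply returns the league rate, which is why a Marcel-type system has little to say about rookies unless minor league data is added.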

PECOTA came under added scrutiny in 2009, when it offered an outrageously high forecast—one that was derided immediately, even as people waited in fear and curiosity to see if it would pan out—for Baltimore Orioles rookie catcher Matt Wieters. Wieters did have a decent first year, but he has not, as PECOTA implied he would, rolled over the American League like the Kwantung Army sweeping Manchuria. By the time of the Wieters Affair, Silver had departed Baseball Prospectus for psephological godhood, ultimately leaving his proprietary model behind in the hands of a friendly skeptic, Colin Wyers, who was hired by BPro. In a series of 2010 posts by Wyers and others called “Reintroducing PECOTA”—though it could reasonably have been entitled “Why We Have To Bulldoze This Pigsty And Rebuild It From Scratch”—one can read between the lines.

Repoz Posted: November 04, 2012 at 07:14 AM | 49 comment(s)
  Tags: history, primer, sabermetrics, special topics

Reader Comments and Retorts



   1. villageidiom Posted: November 04, 2012 at 08:59 AM (#4292491)
"Nate Silver, Cosh, Tango, PECOTA, Marcel the Monkey!...It’s almost like the golden days of Primer! Backlasher, RossCW come on down!"
It's a trap!
   2. Avoid running at all times.-S. Paige Posted: November 04, 2012 at 09:02 AM (#4292492)
Didn't PECOTA usually do better than almost all other projection systems on a yearly basis? That's my memory of these things but I have a shitty memory.
   3. Johnny Temporary Posted: November 04, 2012 at 09:31 AM (#4292500)
I think it was pretty clearly pointed out to BPro at the time that their minor league translations for Wieters were based on bad league/park factors... but BPro just basically ignored that criticism.
   4. Harveys Wallbangers Posted: November 04, 2012 at 09:42 AM (#4292504)
silver is trying to make a buck. why should he give away the secret sauce that is making him said buck? because folks say so?

   5. Swedish Chef Posted: November 04, 2012 at 10:08 AM (#4292509)
"silver is trying to make a buck. why should he give away the secret sauce that is making him said buck?"

That is a very good reason for shunning PECOTA in favor of systems made by people trying to advance the state of knowledge instead. Black boxes are inherently less useful than models where you can see all the gears, maybe even tweak if you feel like tinkering. Of course people should be told that there are things just as good out there that are transparent and free.
   6. zenbitz Posted: November 04, 2012 at 11:13 AM (#4292535)
PECOTA started out fine, but then it went too far.
   7. DKDC Posted: November 04, 2012 at 11:22 AM (#4292538)
PECOTA wasn't wrong about Wieters. They nailed his rookie season with their 5th percentile projection.
   8. Yeaarrgghhhh Posted: November 04, 2012 at 11:37 AM (#4292545)
I prefer Wang [insert Wang joke] at PEC because his model is simpler and more transparent.
   9. calhounite Posted: November 04, 2012 at 11:43 AM (#4292550)
Credit Pecota with getting me in on the ground floor with Bautista. Had him ranked as a 20 homer low quality filler which was good enough to draft late.

Next year Pecota saddled me up with a whole host of bums..weird..like it KNEW how to screw up a team..not just the rooks, injury cases, and psychotic maniacs it routinely overrates..It picked them ALL.

No more Pecota.
   10. Infinite Joost (Voxter) Posted: November 04, 2012 at 11:49 AM (#4292555)
I don't think Silver was running Pecota anymore when it decreed Wieters the new Piazza, was he?
   11. Best Regards, President of Comfort Posted: November 04, 2012 at 12:28 PM (#4292573)
"I think it was pretty clearly pointed out to BPro at the time that their minor league translations for Wieters were based on bad league/park factors... but BPro just basically ignored that criticism."
One of the main critics of the Wieters projection is now running their projections.
   12. Matt Clement of Alexandria Posted: November 04, 2012 at 02:29 PM (#4292641)
My understanding is that over a multi-year sample, the proprietary projection systems (ZiPS, PECOTA, Oliver, CHONE, etc.) outperformed Marcel, but none of them were obviously better than the others. Also, one of the big things that the other systems do, that Marcel doesn't, is to incorporate MLE data for players without significant playing time. So there's a big chunk of added value in that Marcel projections for players with less than 2-3 years of MLB play are kind of useless.

This piece does a good job of arguing that Silver's baseball projections, like his political projections, aren't notably better than the projections put together by other smart folks in the field. In 2008 and 2010, Silver's projections did fine, but not notably better than other folks in the field. This seems like a good and important point - Silver isn't a "wizard", he's a good writer with a good model that spits out results of a quality similar to the models of other folks who aren't as good at writing.**

Then parts of the article seem to hint at something much worse, that Silver is actually terrible at projecting things. The huge blockquote from that Wyers article is ironic: Cosh says we can't take Silver's projections at face value because they're black-boxed, but he takes Wyers' criticism at face value even though Wyers doesn't actually open up the box for anyone to test his critiques either. That part of the article has no actual data behind it, and is entirely unconvincing.

**I wouldn't be surprised at all if Sam Wang's simpler, open-source model is better at projections than Silver's black-boxed, likely overfitted model, but Wang isn't close to Silver's class as a writer and I'm going to keep reading Silver regardless of which models overperform his.
   13. cercopithecus aethiops Posted: November 04, 2012 at 04:08 PM (#4292701)
Sam writes pretty well when he's writing about neuroscience.
   14. zonk Posted: November 04, 2012 at 04:17 PM (#4292704)
"PECOTA started out fine, but then it went too far."


At least it made the Wieters run on time.
   15. vivaelpujols Posted: November 04, 2012 at 04:52 PM (#4292736)
Nate left BP well before the Wieters projection. It's probably not his fault it was so screwy.
   16. AROM Posted: November 04, 2012 at 04:59 PM (#4292744)
"My understanding is that over a multi-year sample, the proprietary projection systems (ZiPS, PECOTA, Oliver, CHONE, etc) outperformed Marcel, but none of them were obviously better than the others. Also, one of the big things that the other system do, that Marcel doesn't, is to incorporate MLE data for players without significant playing time. So there's a big chunk of added value in that Marcel projections for players with less than 2-3 years of MLB play are kind of useless. "

I first did projections in 2006. The first year wasn't that good. From 2007-2010 I had no problem beating PECOTA. Since then I haven't published anything in the public domain. After 2007 frankly I was more worried about competing with ZiPS than PECOTA. Nate's last year running it was 2008, I think.

Saw Silver on CBS Sunday Morning today. His fans for the most part are as clueless as his detractors. Despite the press, this doesn't require anything high-tech or need advanced statistical techniques. You just need the data. Take a simple average of the polls within each state, add up the electoral votes, and you get the same results that Silver is touting. 4th grade math will suffice, if you have the data. And your predictions are only as good as the data. As Nate has said himself, Romney's chance at a win pretty much comes down to the polls being wrong in a systematic fashion.
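The "4th grade math" version AROM describes fits in a few lines. A minimal sketch, assuming each state's polls are reduced to a candidate-A-minus-candidate-B margin in points; the states, margins, electoral-vote counts, and function names are all made up for illustration.

```python
# Hypothetical poll margins (candidate A minus candidate B, in points) and
# electoral votes; none of these numbers are real.
states = {
    "State 1": {"ev": 18, "polls": [2.0, 4.0, 3.0]},
    "State 2": {"ev": 29, "polls": [-1.0, 0.5, -0.5]},
    "State 3": {"ev": 13, "polls": [1.0, 2.0]},
}


def average_margin(polls):
    """Simple unweighted average of a state's poll margins."""
    return sum(polls) / len(polls)


def electoral_votes_for_a(states):
    """Give each state's electoral votes to whoever leads its simple poll average."""
    return sum(s["ev"] for s in states.values() if average_margin(s["polls"]) > 0)


print(electoral_votes_for_a(states))
```

It returns a single electoral-vote total with no sense of how likely that total is, which is the gap the next comment points at.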
   17. Danny Posted: November 04, 2012 at 05:29 PM (#4292770)
"Despite the press, this doesn't require anything high tech or need advanced statistical techniques. You just need the data. Take a simple average of the polls within each state, add up the electoral votes, and you get the same results that Silver is touting. 4th grade math will suffice, if you have the data."

The probabilities can matter. A 5 point lead in Ohio, for example, would make for a very different race than a .5 point lead.
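One way to see Danny's point is to turn a poll lead into a win probability. A minimal sketch, assuming the final margin is normally distributed around the poll lead with a standard deviation of 3 points; the 3-point figure and the `win_probability` name are illustrative assumptions, not anyone's published model.

```python
from math import erf, sqrt


def win_probability(lead, sd):
    """P(final margin > 0) if the final margin ~ Normal(lead, sd).

    sd is an assumed total polling error in points, not an estimate
    taken from any real forecasting model.
    """
    z = lead / sd
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z


for lead in (5.0, 0.5):
    print(lead, round(win_probability(lead, sd=3.0), 3))
```

Under that assumption a 5-point lead is above 90% and a half-point lead is barely better than a coin flip, even though both states "go" to the same candidate in a simple average.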
   18. Zach Posted: November 04, 2012 at 06:03 PM (#4292797)
"Behind the scenes, the PECOTA process has always been like Von Hayes: large, complex, and full of creaky interactions and pinch points… The numbers crunching for PECOTA ended up taking weeks upon weeks every year, making for a frustrating delay for both authors of the Baseball Prospectus annual and fantasy baseball players nationwide. Bottlenecks where an individual was working furiously on one part of the process while everyone else was stuck waiting for them were not uncommon. To make matters worse, we were dealing with multiple sets of numbers."

"…Like a Bizarro-world subway system where texting while drunk is mandatory for on-duty drivers, there were many possible points of derailment, and diagnosing problems across a set of busy people in different time zones often took longer than it should have. But we plowed along with the system with few changes despite its obvious drawbacks; Nate knew the ins and outs of it, in the end it produced results, and rebuilding the thing sensibly would be a huge undertaking. We knew that we weren’t adequately prepared in the event that Nate got hit by a bus, but such is the plight of the small partnership."


This is a point that I was trying to make in the other thread, but got sidetracked away from emphasizing. Part of the reason you want to use a mathematically sound approach is so that you can be really rigorous about what the model is actually telling you and what you're putting in yourself. If you don't have a really good idea of how you're describing a system, you'll tend to write a program that grows exponentially as you try different things, fix some bugs, introduce some others...

If you have a really good idea about the mathematical relationships between different working parts, you can zero in on problems without introducing new complexity. It might not show up the first time you try to test your results, but in the long run a simple model that's mathematically sound will tell you a lot more than a complex model that might seem to fit historical data better.
   19. Harveys Wallbangers Posted: November 04, 2012 at 06:10 PM (#4292801)
gotta confess that most of the stuff directed at silver sounds like jealousy to me. the guy has worked hard. i don't recall him having a sugar daddy or sugar momma out there supporting him and pushing his stuff. he made bp work with wastes of skin like joe sheehan and the doofsy injury guru who is truly the emperor with no clothes. silver has put stuff out there, shared his thoughts, taken all the hits and kept his nose to the grindstone

stat folks should be celebrating these inroads versus sounding like a bunch of pouty pollies claiming this and that about his approach

sounds small and if i was in a harsher mood might write the word pathetic

edit

to be clear if folks have an honest intellectual critique that is fine. but this harping claiming the guy is doing something anyone could do rings hollow. plenty of folks 'could' have done it. and some have tried to compete

have not seen anyone do it like silver. so if it's so easy where are the folks rushing to be the next cool thing? that is how markets work. there is no cost to entry.

oh wait, that's right, it's a lot of work.
   20. Harveys Wallbangers Posted: November 04, 2012 at 06:14 PM (#4292805)
arom

not directed at you. just wanted to clarify
   21. Harveys Wallbangers Posted: November 04, 2012 at 06:15 PM (#4292808)
i have tremendous respect for folks who are entrepreneurs. takes gumption.

anyone can be a critic
   22. Zach Posted: November 04, 2012 at 06:29 PM (#4292815)
I don't mean to single out Silver, by the way. Bill James's Win Shares is a classic example of an initially sound system that gets lost in a blizzard of special rules and exceptions. It still basically works, because a) at its heart its components translate to runs in some sort of rough and ready fashion, and b) Bill James spent several years fine tuning it, and the man has a freakish ability to compare, say, Johnny Pesky in 1942 to Joe Gordon in 1949. But it would be extremely difficult to come out with a Win Shares 2.0 that used the same framework but slightly improved on the original model.

In contrast, something like Linear Weights is trivial, both to calculate and to improve upon. You can add in timeline or park adjustments, calculate new weights for different years or run scoring environments. You can do it in any language you want, and the computer time will be trivial on a modern computer. Rigor isn't just about mathematical pissing contests -- it's a way to save effort and increase reliability.
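As a rough illustration of how little machinery Linear Weights needs, here is a sketch in Python. The run values below are illustrative assumptions in the neighborhood of commonly published weights, not an official set; the point is that moving to a different year or run environment just means passing a different `weights` dictionary.

```python
# Illustrative run values per event (runs relative to average); these are
# assumptions for the example, to be recomputed for a given run environment.
WEIGHTS = {"1B": 0.47, "2B": 0.77, "3B": 1.04, "HR": 1.40, "BB": 0.33, "OUT": -0.27}


def linear_weights_runs(line, weights=WEIGHTS):
    """Runs above average: a dot product of event counts and run values."""
    return sum(weights[event] * count for event, count in line.items())


# Hypothetical season line
season = {"1B": 100, "2B": 30, "3B": 2, "HR": 25, "BB": 60, "OUT": 400}
print(round(linear_weights_runs(season), 2))
```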
   23. Shock Posted: November 04, 2012 at 06:42 PM (#4292822)
Agree wholeheartedly with Harveys on this one. It's part having the idea and part executing the idea and part marketing the idea in a way that appeals to the public. All three components are critical.

It's the same as all the developers who talk about how trivial Twitter is and how they could have done it. Yep, could have. Didn't.
   24. The Clarence Thomas of BBTF (scott) Posted: November 04, 2012 at 07:14 PM (#4292838)
It's my understanding that one of the differences between Wang and Silver is that Wang views each state as individual from the others, whereas Silver's model will see a swing in one state as indicating changes in other surrounding states with similar demographics absent other information.
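A toy Monte Carlo makes the difference scott describes visible. This is not Wang's or Silver's model; it assumes each state's result is the poll lead plus a normally distributed error, split into a national component shared by every state and an independent state component. The five identical states, the 2-point leads, the error sizes, and the function name are hypothetical.

```python
import random


def simulate_win_probability(leads, evs, national_sd, state_sd, sims=10000, seed=1):
    """Monte Carlo probability that the poll leader wins an electoral-vote majority.

    Each state's outcome = poll lead + shared national error + independent state error.
    national_sd = 0 treats states as independent; national_sd > 0 makes them move together.
    """
    rng = random.Random(seed)
    needed = sum(evs) / 2
    wins = 0
    for _ in range(sims):
        shift = rng.gauss(0, national_sd)  # one draw shared by every state
        won = sum(ev for lead, ev in zip(leads, evs)
                  if lead + shift + rng.gauss(0, state_sd) > 0)
        if won > needed:
            wins += 1
    return wins / sims


leads = [2.0] * 5  # hypothetical: a 2-point lead in five equal-sized states
evs = [10] * 5
independent = simulate_win_probability(leads, evs, national_sd=0.0, state_sd=3.0)
# Same total error per state (sd 3), but most of it now shared nationally
correlated = simulate_win_probability(leads, evs, national_sd=2.5, state_sd=(9 - 2.5 ** 2) ** 0.5)
print(independent, correlated)
```

With the same total error per state, sharing most of it nationally lowers the leader's win probability, because a polling miss in one state now tends to come with misses in the others.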
   25. Harveys Wallbangers Posted: November 04, 2012 at 07:35 PM (#4292843)
aaron schatz of footballoutsiders takes the same guff from folks who ask why an econ major from brown is deemed a reliable resource for football stats

well, he did the grunt work. sorry lazy people.
   26. Shoebo Posted: November 04, 2012 at 07:48 PM (#4292847)
Thanks Harvey's for articulating what I was thinking. Have to run off to work. But I had the same thoughts and reaction.
   27. Harveys Wallbangers Posted: November 04, 2012 at 08:00 PM (#4292855)
glad it was you shoe and not levski. if the latter i would have been wondering.
   28. Danny Posted: November 04, 2012 at 08:29 PM (#4292868)
.
   29. zenbitz Posted: November 04, 2012 at 09:12 PM (#4292891)
Silver and football outsiders is a great analogy. Football and elections are hard problems with not great data. I think both sites do a nice, if imperfect quant job. Defensive stats in baseball are similar.
   30. Walt Davis Posted: November 04, 2012 at 09:34 PM (#4292903)
Well, Nate's been out of baseball predicting for 4 years now, I'm not sure what's worth criticizing at this point. How long was PECOTA around before he left anyway? It doesn't seem like it was that long. Anyway, as far as I know, PECOTA was the first one to publish some measures of the uncertainty of the prediction.

"part marketing the idea in a way that appeals to the public"

This was part of my problem with BPro -- it existed before Silver/PECOTA so I see no reason to blame Nate for it. But every edition of the annual bragged about all the stuff they got right the year before, oddly not mentioning all the stuff they'd gotten wrong.

In fairness, marketing anything statistical in an honest fashion is nearly impossible because it's all about probability and uncertainty. We laugh about it, but Wieters at his 5th percentile is not an incorrect prediction -- the model is only wrong if the Wieters of the world end up at their 5th percentile or worse substantially more than 5% of the time. When you evaluate a proposed model like this in the real world, you run thousands of simulations from a pre-determined distribution and see if you reproduce that distribution. In baseball, you get to run that experiment once for Wieters and maybe a dozen times for players like Wieters.
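Walt's criterion for when a 5th-percentile projection is "wrong" can be written as a simple calibration check. A minimal sketch with hypothetical data: count how many players finished at or below their 5th-percentile projection, compare that share to 5%, and express the gap in binomial standard errors. The function name and the sample numbers are assumptions for illustration.

```python
from math import sqrt


def calibration_check(p5_projections, actuals, nominal=0.05):
    """Share of players at or below their 5th-percentile projection, and a z-score.

    For a well-calibrated model the share should be close to `nominal`;
    z measures the gap in binomial standard errors.
    """
    n = len(actuals)
    misses = sum(1 for p5, actual in zip(p5_projections, actuals) if actual <= p5)
    share = misses / n
    se = sqrt(nominal * (1 - nominal) / n)  # binomial standard error of the share
    return share, (share - nominal) / se


# Hypothetical: 100 players' 5th-percentile OPS+ projections and actual results
p5 = [80] * 100
ok_actuals = [75] * 5 + [100] * 95    # 5 of 100 at or below: about right
bad_actuals = [75] * 20 + [100] * 80  # 20 of 100: far too many
print(calibration_check(p5, ok_actuals))
print(calibration_check(p5, bad_actuals))
```

With n = 1 (one Wieters) the standard error is about 0.22, so a single miss says almost nothing; a whole class of similar players missing low is what would move the needle.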

Take Jason Heyward. He's had OPS+s of 131, 93 and 117. Of course no model is going to come close to pegging those on the money. (Neither is any scout or manager or GM.) That's the kind of input data baseball projections work with -- way more noise than signal. Calling any of these guys out for getting one player spectacularly wrong is pointless -- fun, but pointless. The most you can hope a model would do is to look at Heyward's 131 and say "he probably is (or is not) that good." You've got to find entire types that they get wrong or you have to find the model that produces a smaller MSE (or less bias or something) before you can start saying the model is "wrong."
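The MSE-and-bias comparison Walt mentions is equally short. A sketch comparing two hypothetical projection systems against hypothetical actual OPS+ values; none of these numbers come from a real system, and the function names are illustrative.

```python
def mse(projections, actuals):
    """Mean squared error: the average of squared misses."""
    return sum((p - a) ** 2 for p, a in zip(projections, actuals)) / len(actuals)


def bias(projections, actuals):
    """Mean signed error: positive means the system projects too high on average."""
    return sum(p - a for p, a in zip(projections, actuals)) / len(actuals)


# Hypothetical OPS+ projections from two systems and the actual results
actual = [120, 95, 110, 100]
system_a = [112, 102, 108, 101]
system_b = [118, 105, 118, 110]

for name, proj in (("A", system_a), ("B", system_b)):
    print(name, mse(proj, actual), bias(proj, actual))
```

In practice you would run this over every player a system projected, not four, before calling one system better than another.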

But similarly every projection should be saying more than "Heyward is projected to a 133 OPS+". So PECOTA published 5th and 95th percentiles. These tended to be quite wide (nature of the data) but this was also Silver saying quite clearly "even for this broad range, I know I'm still going to be dead, dead wrong 10% of the time." Ten percent is not a small percentage really. ZiPS is projecting something like 1500 players a year and 150 are going to be really, really bad projections. Look back at the old TO Swisher-Betemit post. Swisher's 90% confidence interval for OPS+ was about 75 to 145. All told there was a 15-20% chance Swisher was going to repeat his White Sox season. That's as close as ZiPS could get its prediction (at the time, things may have improved). That's not a bad model, that's highly variable data with low signal.
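The arithmetic in this paragraph can be checked under a normal approximation, which is an assumption here rather than something PECOTA or ZiPS necessarily uses: a symmetric 90% interval spans about 1.645 standard deviations on each side, and a 10% miss rate over 1500 players is 150 expected misses, give or take a binomial standard deviation of about 11.6. The function names are hypothetical.

```python
from math import sqrt

Z90 = 1.645  # half-width of a two-sided 90% interval, in standard deviations


def implied_sd(low, high, z=Z90):
    """Standard deviation implied by a symmetric 90% interval (normal approx.)."""
    return (high - low) / (2 * z)


def expected_misses(n_players, coverage=0.90):
    """Expected number of outcomes outside the interval, and its binomial SD."""
    p_miss = 1 - coverage
    return n_players * p_miss, sqrt(n_players * p_miss * (1 - p_miss))


print(round(implied_sd(75, 145), 1))  # the Swisher OPS+ interval from the post
print(expected_misses(1500))
```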

"Take a simple average of the polls within each state"

You can do better than this. First, the means should be weighted -- a 2 point lead in a poll of 1000 is worth more than a 3 point lead in a poll of 500. Well, kinda and maybe, because it's also easy to calculate a standard error on that. Each sample is independent, so your weighted mean is the mean of a sample equivalent to the full sample size -- assuming reasonably similar methodology, which they are all using. It's quite easy to calculate the standard error of a proportion: sqrt(P(1-P)/n). The standard error is about 1.6% at n=1000 with P near 50% (a margin of error of about +/- 3%), and you can divide that by about 1.4 (the square root of 2) for each doubling in sample size.
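The standard-error arithmetic here, as a minimal Python sketch (normal approximation, with the 95% margin of error taken as 1.96 standard errors; the function names are hypothetical). At P = 0.5 it gives a standard error of roughly 1.6% at n = 1000, a margin of error near 3%, and each doubling of n divides both by sqrt(2), about 1.41.

```python
from math import sqrt


def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)


def margin_of_error(p, n, z=1.96):
    """95% margin of error under the usual normal approximation."""
    return z * standard_error(p, n)


for n in (500, 1000, 2000, 4000):
    print(n, round(standard_error(0.5, n), 4), round(margin_of_error(0.5, n), 4))
```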

If there's a tricky bit, it's in how far back in time you go. Still, this is why Obama "appears" safe (in the RDP meaning of the term!) -- average the last 4-5 polls in any state and you're talking about a sample size of about 4,000, which means a margin of error down to 1.5%. While no single poll may give Obama a comfy lead in Ohio, the pooled analysis puts him outside the margin of error.
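Pooling works the same way. A sketch, assuming independent polls with similar methodology and weighting each poll's share by its sample size; the five polls and the `pool_polls` name are hypothetical, not real Ohio data. With a combined n of 4000, the margin of error on a share near 50% comes out around 1.5 points.

```python
from math import sqrt


def pool_polls(polls):
    """Pool (share, sample_size) polls into one sample-size-weighted estimate.

    Assumes independent samples with similar methodology.
    Returns the pooled share, the total n, and a 95% margin of error on that share.
    """
    total_n = sum(n for _, n in polls)
    pooled = sum(share * n for share, n in polls) / total_n
    moe = 1.96 * sqrt(pooled * (1 - pooled) / total_n)
    return pooled, total_n, moe


# Hypothetical last-five polls in one state: (candidate's share, sample size)
polls = [(0.51, 800), (0.49, 1000), (0.52, 600), (0.50, 900), (0.51, 700)]
print(pool_polls(polls))
```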

Assuming the polls are of sufficient quality. The response rate in phone surveys is pretty abysmal.

Now take that back to baseball. For a sample of 4,000, you can get about a +/- 1.5% margin of error on a proportion (anywhere near 50%). That's 6-7 full seasons' worth of PAs, and let's say the proportion we are interested in is OBP. Well, 1.5% is 15 points of OBP, so your best guess is that the player's true-talent OBP is somewhere in the .320-.350 range (for example). Now, in the upcoming season, that .320-.350 OBP gets to play out across 600 random PAs, which would probably give us a range something like .300-.370, if not wider.
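The baseball version can be sketched the same way. This assumes a true-talent OBP of .335 (the middle of the example range), treats each plate appearance as an independent on-base trial, and combines the uncertainty about talent from 4000 past PAs with the binomial noise of 600 new ones under a normal approximation; `obp_range` is a hypothetical helper.

```python
from math import sqrt


def obp_range(true_obp, history_pa, season_pa, z=1.96):
    """Rough 95% range for next season's OBP (normal approximation).

    Combines uncertainty about true talent (estimated from history_pa)
    with binomial noise over season_pa new plate appearances.
    """
    var = true_obp * (1 - true_obp)
    talent_se = sqrt(var / history_pa)   # how well 4000 PA pin down talent
    season_se = sqrt(var / season_pa)    # luck over one season's PAs
    total_se = sqrt(talent_se ** 2 + season_se ** 2)
    return true_obp - z * total_se, true_obp + z * total_se


low, high = obp_range(0.335, history_pa=4000, season_pa=600)
print(round(low, 3), round(high, 3))
```

Under those assumptions the sketch gives roughly .295 to .375, about what the paragraph above estimates.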

   31. Der-K: Hipster doofus Posted: November 04, 2012 at 11:23 PM (#4292979)
"We laugh about it, but Wieters at his 5th percentile is not an incorrect prediction -- the model is only wrong if the Wieters of the world end up at their 5th percentile or worse substantially more than 5% of the time."

That, effectively, is what happened. Granted, there's only one Wieters, but a whole class of people got bad predictions (principally) because BPro used an insanely optimistic EL league factor (see post 3). It failed the laugh test before and after the season.

Anyway, I think MCoA (among others) is mostly correct on Silver - for presidential elections, I'm not sure the model has much value added, but the mind behind it does and I go to his site daily.
   32. Howie Menckel Posted: November 04, 2012 at 11:31 PM (#4292983)

What did Baseball Prospectus and Nate come along and do that Shandler at BBHQ wasn't doing sooner, in terms of basic philosophy?

   33. Pasta-diving Jeter (jmac66) Posted: November 04, 2012 at 11:44 PM (#4292997)
"Silver and football outsiders is a great analogy. Football and elections are hard problems with not great data. I think both sites do a nice, if imperfect quant job."

DVOA and DYAR are:

1. wonderfully self-consistent
2. utterly useless in the real world

"Defensive stats in baseball are similar."

Bingo
   34. DA Baracus is a "bloodthirsty fan of Atlanta." Posted: November 04, 2012 at 11:48 PM (#4293001)
"Silver and football outsiders is a great analogy. Football and elections are hard problems with not great data. I think both sites do a nice, if imperfect quant job. Defensive stats in baseball are similar."


Brian Burke has a really good post today about it.
   35. JE (Jason Epstein) Posted: November 04, 2012 at 11:54 PM (#4293005)
"Brian Burke has a really good post today about it."

You're velcome! ;-)
   36. DA Baracus is a "bloodthirsty fan of Atlanta." Posted: November 05, 2012 at 12:12 AM (#4293014)
"You're velcome! ;-)"


Full credit to JE for dropping it in the political thread. I should have said that in the first post. My bad.

I think one reason that Silver is getting so much flak in recent weeks is that unlike other notable polling aggregators like Real Clear Politics, he's just one person. It's so much easier to ding a singular person than to ding Generic Polling Aggregator because you disagree with them. It's his model, it's his forecast, it's his credibility on the line. He has few safety nets and buffers if he is wrong. If RCP is wrong, they can conceivably fire people and get back on track. If Nate is wrong, his career is in doubt.
   37. Brian C Posted: November 05, 2012 at 12:29 AM (#4293019)
Don't know if this point has been made in another thread, but one has to remember that 538 rose to prominence back in the 2008 primaries when the Washington press was pretending that Clinton and Obama were running neck-and-neck. Silver merely came in and said, "uh, do the math, Obama has a lead that's probably insurmountable", and he of course turned out to be right, to the extent that even the halfwits in the press noticed.

Essentially, he got crowned as a genius by a bunch of stupid people, for saying a bunch of stuff that was patently obvious to anyone not invested in all the horserace BS that dominates political coverage in major news orgs. That's not Silver's fault, obviously, and as HW said above, it's hard to begrudge the guy for taking advantage of the situation he found himself in. But I think it's worth keeping in mind who his audience really is - not the people who understand basic statistical concepts, and certainly not the people who are creating advanced statistical models themselves. No, his audience are the morons who employ him at the NYT and the braindead children in the media who get distracted by every poll that's released during an election season.

And frankly, I think it's awesome that someone like him, with the patience to explain how these things work, actually finds himself with a prominent voice. Because he's singlehandedly raising the IQ of polling discussions a few desperately needed points.
   38. Baldrick Posted: November 05, 2012 at 02:54 AM (#4293061)
37 is not quite right. Silver was most notable during the primaries not for calling Obama as the eventual winner, but for using a model of geography/demographics/etc. to far more accurately call the actual delegate results of a bunch of particular contests. He was able to figure out trends in racial patterns, in particular, that helped to sort out the likely results of under-polled primaries.
   39. Jack Carter, calling Beleaguered Castle Posted: November 05, 2012 at 03:59 AM (#4293072)
"to be clear if folks have an honest intellectual critique that is fine. but this harping claiming the guy is doing something anyone could do rings hollow. plenty of folks 'could' have done it. and some have tried to compete"

"have not seen anyone do it like silver. so if it's so easy where are the folks rushing to be the next cool thing? that is how markets work. there is no cost to entry."

"oh wait, that's right, it's a lot of work."
Yes it is. It reminds me of the work Sean Forman put into Baseball-Reference.com. The numbers were readily available, but he did the work of entering them all into a fast-loading, easy to navigate site. It looks simple, but takes a hell of a lot of time.

"And frankly, I think it's awesome that someone like him, with the patience to explain how these things work, actually finds himself with a prominent voice. Because he's singlehandedly raising the IQ of polling discussions a few desperately needed points."
Yes he has. He's done a great job. The impatience occasionally shows through, but he's awfully good at explaining what he's doing.
   40. Harveys Wallbangers Posted: November 05, 2012 at 08:35 AM (#4293094)
criticizing silver's use of analogy misses the basic point, which is that silver is trying to help his audience understand. so he uses an imperfect comparative base.

thankfully burke is not nitpicky like so many others.

but it does baffle me about stats people looking to undermine a guy successfully pimping the value of their field. that's some crazy stuff.
   41. Crispix reaches boiling point with lackluster play Posted: November 05, 2012 at 09:12 AM (#4293106)
"but it does baffle me about stats people looking to undermine a guy successfully pimping the value of their field. that's some crazy stuff."

It happens all the time with popularizers. Ask an evolutionary biologist about Stephen Jay Gould back in his heyday and all you would hear is complaints about how he simplifies things, and he acts as if debunked principles are still known to be true because the new theory is too hard to turn into catchy metaphors, and he promotes the work of his allies at the expense of his rivals, etc.
   42. The Id of SugarBear Blanks Posted: November 05, 2012 at 09:30 AM (#4293111)
"Essentially, he got crowned as a genius by a bunch of stupid people, for saying a bunch of stuff that was patently obvious to anyone not invested in all the horserace BS that dominates political coverage in major news orgs."

Mass confusion and delusion is typically most effectively countered by simple truths.
   43. bob gee Posted: November 05, 2012 at 11:39 AM (#4293223)
why i like silver:

he may be wrong. he may be right. but compared to tons of people in the political arena - in fact, virtually everyone no matter what their political beliefs - he looks like a genius.

i'm not comparing him to someone like sam wang, who i only recently discovered. and i'm sure there are others. but compared with anyone on cnn / msnbc / fox, nate comes out far ahead.

i'm reading his book, and i find it an interesting read. he even admits there's a huge opportunity in the political field and he just happens to fill that niche. i have a feeling that when the 'better' punditry is taking place, nate will have moved on to another area - just as he has moved on from poker and baseball.
   44. Scott Ross Posted: November 05, 2012 at 01:22 PM (#4293353)
Weird that Cosh says that "without a long series of tests—i.e., U.S. elections—we don’t really know that Nate is not pulling the numbers out of the mathematical equivalent of a goat’s bum," totally ignoring the hundreds of elections over the two previous cycles that Silver has worked, and instead focuses on one UK election the guy has done.
   45. KT's Pot Arb Posted: November 05, 2012 at 02:57 PM (#4293497)
"That, effectively, is what happened. Granted, there's only one Wieters, but a whole class of people got bad predictions (principally) because BPro used an insanely optimistic EL league factor (see post 3). It failed the laugh test before and after the season."


Did someone at BPro enter bad MLE translations for Wieters' minor league stats after Nate left, or did Silver leave them with a Pecota containing bad MLE translations?

If it's the latter, then Silver should take a (minor) hit for the Wieters forecast error.

From the description of Pecota, it seems like Nate built a rat's nest that's difficult to maintain, let alone continue to develop, and difficult to verify as working as designed (that the inputs to every sub-part and the whole produce the outputs you expect based on your model). I don't have a great deal of experience with Excel, but I do have experience building complex applications and simulations in C, C++, Java, and even Objective-C. It sure seems to me that Excel is a terrible choice for building complex systems. I've built some large Excel spreadsheets and had some basic (non-automated) means of ensuring sub-sheet calculations are verified, so maybe a stud Excel jockey like Nate had no problems ensuring all the parts of the model were correctly calculated (with the exception of some bad inputs like the Wieters MLE translations data).

Making Pecota proprietary probably added another barrier to verification of its operation. If Pecota could have been open sourced (and I understand it probably couldn't and still have the value it had, esp. given Nate's proprietary ideas and insights), it would have benefited from many eyes finding flaws that Nate and his partners might have overlooked.

Weird that Cosh says that "without a long series of tests—i.e., U.S. elections—we don’t really know that Nate is not pulling the numbers out of the mathematical equivalent of a goat’s bum," totally ignoring the hundreds of elections over the two previous cycles that Silver has worked, and instead focuses on one UK election the guy has done.


I think it was the size of the miss that made it example-worthy; it didn't look like a one- or even two-standard-deviation error. The problem is that, for us Statistics 101 dropouts, no one is quantifying how much error margin is reasonable, or how large a sample is needed to show whether Nate's predictions are good or bad. I believe Cosh is saying that many more elections are needed to know, but how many?
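To put rough numbers on the "how many?" question, here is a minimal Python sketch. It assumes, hypothetically, that a forecast comes with a standard error, that a miss is judged by how many standard errors it lands from the prediction, and that elections can be treated as independent yes/no calls (they aren't in practice; state results are correlated). The sample-size formula is a one-sided test using the normal approximation to the binomial, and the 90% and 75% hit rates below are invented illustrations, not anyone's actual record.

```python
import math
from statistics import NormalDist


def miss_in_sigmas(predicted: float, actual: float, std_error: float) -> float:
    """How many standard errors separate the actual result from the forecast."""
    return abs(actual - predicted) / std_error


def calls_needed(p_skilled: float, p_baseline: float,
                 alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate number of independent yes/no calls needed to tell a
    forecaster with hit rate p_skilled apart from a baseline with hit rate
    p_baseline (one-sided test, normal approximation to the binomial)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)       # quantile for the desired power
    spread = (z_alpha * math.sqrt(p_baseline * (1 - p_baseline))
              + z_beta * math.sqrt(p_skilled * (1 - p_skilled)))
    return math.ceil((spread / (p_skilled - p_baseline)) ** 2)


# Hypothetical inputs: a 6-point miss on a forecast with a 2-point standard error,
# and a 90% caller versus a 75% "monkey" baseline.
print(miss_in_sigmas(predicted=35.0, actual=29.0, std_error=2.0))  # 3.0
print(calls_needed(0.90, 0.75))                                     # 42
```

With those made-up inputs, the miss is a three-sigma event, and separating the two forecasters takes about 42 independent calls. Because real elections are correlated, the effective sample is smaller than the raw count, so the true answer would likely be larger.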

I agree with Harvey's well-written post, BTW. I more than admire Nate Silver for "getting into the ring"; the fact is, I'm jealous. Not of the fame he's gained, but because he has worked on some really cool ideas that other people (like me) just spout off about: "I could do this, if I found the time." He found the time, he did it, and he must have had a great deal of fun when it was finally up and running and he could just keep adding more and more ideas to it.

But I'm not jealous of him having to do all that work in Excel. Ugh.
   46. Bitter Mouse Posted: November 05, 2012 at 03:31 PM (#4293525)
it seems like Nate built a rats nest that's difficult to maintain, let alone continue to develop


As someone who has developed a ton of code, I'd say that even if he did, it may not be his fault. There are many factors that go into making a good design. Trying, from the outside, to figure out whom to blame for a single prediction is kind of silly (though he likely gets credit beyond what is deserved, so I guess it's OK).

   47. Eddo Posted: November 05, 2012 at 03:35 PM (#4293530)
Yes, it is. It reminds me of the work Sean Forman put into Baseball-Reference.com. The numbers were readily available, but he did the work of entering them all into a fast-loading, easy-to-navigate site. It looks simple, but it takes a hell of a lot of time.

I'm reminded of this, from Calvin Coolidge:
"Nothing in the world can take the place of Persistence. Talent will not; nothing is more common than unsuccessful men with talent. Genius will not; unrewarded genius is almost a proverb. Education will not; the world is full of educated derelicts. Persistence and determination alone are omnipotent. The slogan 'Press On' has solved and always will solve the problems of the human race."
   48. JJ1986 Posted: November 05, 2012 at 03:38 PM (#4293535)
Here's the explanation for the Wieters thing. Relevant part:

The first thing to note is that the Eastern League players, while retaining more of their production in Triple-A than their counterparts, still lost production. This is not what we would expect if BP's difficulty ratings were correct. In fact, all of the Double-A leagues look pretty similar to each other in level of difficulty.


There almost had to be a mistake with the inputs and not the calculations.
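The logic of the quoted check can be sketched in Python. This is a simplified illustration, not BP's actual MLE method: it assumes, hypothetically, that each league gets one multiplicative factor (MLB-equivalent stat = raw stat × factor). If the factors were right, a player's raw stat should change by roughly factor_from / factor_to when he moves up a level, so a league rated at least as hard as Triple-A should not see its players lose production after promotion. The factor values, the tolerance, and the OPS pairs are all invented.

```python
from statistics import mean


def implied_ratio(f_from: float, f_to: float) -> float:
    """Expected raw(to) / raw(from) if both league factors were correct,
    assuming translated = raw * factor in each league."""
    return f_from / f_to


def observed_ratio(pairs: list[tuple[float, float]]) -> float:
    """Mean of raw(to) / raw(from) for players who played in both leagues."""
    return mean(after / before for before, after in pairs)


def factor_looks_off(pairs: list[tuple[float, float]], f_from: float,
                     f_to: float, tolerance: float = 0.05) -> bool:
    """Flag the factor pair if observed and implied ratios differ by more
    than a (hypothetical) tolerance."""
    return abs(observed_ratio(pairs) - implied_ratio(f_from, f_to)) > tolerance


# Hypothetical (Double-A OPS, Triple-A OPS) pairs for three promoted players.
sample_pairs = [(0.800, 0.760), (0.850, 0.820), (0.700, 0.680)]

# A Double-A factor rated harder than Triple-A implies players should improve,
# but these players all declined, so the check flags it.
print(factor_looks_off(sample_pairs, f_from=0.92, f_to=0.88))  # True
```

A real test would need many players, park adjustments, and aging effects, but the core idea matches the quoted argument: production that drops on promotion is inconsistent with a league factor saying the lower level was the tougher one.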
   49. Eddo Posted: November 05, 2012 at 03:43 PM (#4293541)
criticizing silver's use of analogy misses the basic point, which is that silver is trying to help his audience understand. so he uses an imperfect comparative base.

thankfully burke is not nitpicky like so many others.

but it does baffle me that stats people look to undermine a guy successfully pimping the value of their field. that's some crazy stuff.

I like Burke, and he's done some great work with regard to the NFL, win probability, and decision-making. But I also think he displays some professional jealousy at times towards those whose work has gone more mainstream.

He's quick to criticize Football Outsiders (sometimes in nitpicky ways(*)), and I get the same feeling here. He also tends to only see the numbers side of things when it comes to NFL decision-making. Now, he's absolutely right in the aggregate, but his criticisms of individual coaching decisions - which are affected by plenty of factors besides strict win probability - are often over-the-top.

(*) The "Curse of 370" is the best example, here. Now, his criticisms are certainly valid - there's nothing magical about the 370th carry that will cause a RB to get hurt - but they miss the forest for the trees. FO has often pointed out the very same fact, and really uses "370" as more of an indicator that some sort of regression is coming and that general RB overuse (could be 371 carries, could be 419) should be avoided.
