Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Tuesday, January 10, 2017

Grading the projections: 2016

ZiPS didn’t have a good year. ZiPS was the least accurate of the four systems in each of the five categories, and never by a particularly small margin. You don’t want to conclude too much from a single season of results, but ZiPS didn’t perform very well in last year’s review, either. (I should also note that this is Steamer’s second straight year of leading the pack in convincing fashion.)
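This excerpt doesn't say which error measure the table uses. As a minimal sketch, assuming each category is graded by root-mean-square error (RMSE) between projected and actual values, "least accurate in each category" means the system with the largest RMSE in that category. The function names and the three-player numbers are hypothetical, not the article's data.

```python
import math


def rmse(projected, actual):
    """Root-mean-square error between paired projections and outcomes."""
    if len(projected) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(projected, actual)) / len(actual))


def least_accurate_by_category(projections, actuals):
    """Map each category to the system with the largest RMSE in it.

    projections: {category: {system: [projected value per player]}}
    actuals:     {category: [actual value per player]}
    """
    worst = {}
    for category, by_system in projections.items():
        errors = {system: rmse(values, actuals[category]) for system, values in by_system.items()}
        worst[category] = max(errors, key=errors.get)
    return worst


# Hypothetical OBP projections and outcomes for three players.
actuals = {"OBP": [0.340, 0.310, 0.365]}
projections = {"OBP": {
    "Marcel": [0.335, 0.318, 0.350],
    "Steamer": [0.338, 0.312, 0.358],
    "ZiPS": [0.320, 0.330, 0.340],
}}
print(least_accurate_by_category(projections, actuals))
```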

Marcel does its job. Marcel wasn’t great, but it was almost always in the neighborhood of accurate. It beat ZiPS in four of the five categories and even led the field in OBP. Marcel remains very hard to convincingly beat (or even beat at all), despite its simplicity.

Averaging the projections might be a great idea. The “Average” row in the above table is exactly what you would expect: the accuracy of the average of all four systems. It beat all four systems in four of the five categories and fell short of only Steamer in the fifth. One would expect an average to rarely be egregiously wrong; it’s surprising to see that the average also tended to be closer to right than each individual projection. This could be a quirk of a single season of projections, but at the very least, it suggests that the brute-force method of resolving differences between the projection systems is credible.
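A minimal sketch of the "Average" row, assuming an equal-weight, per-player mean of the systems' projections that is then graded with the same RMSE as the individual systems. The projections and outcomes below are made-up placeholders, and the function names are illustrative.

```python
import math
from statistics import mean


def rmse(projected, actual):
    """Root-mean-square error between paired projections and outcomes."""
    return math.sqrt(mean((p - a) ** 2 for p, a in zip(projected, actual)))


def consensus(by_system):
    """Equal-weight, per-player mean of every system's projection."""
    return [mean(values) for values in zip(*by_system.values())]


# Hypothetical OBP projections and outcomes for three players.
by_system = {
    "Marcel": [0.335, 0.318, 0.350],
    "PECOTA": [0.345, 0.305, 0.370],
    "Steamer": [0.338, 0.312, 0.358],
    "ZiPS": [0.320, 0.330, 0.340],
}
actual = [0.340, 0.310, 0.365]

scores = {name: rmse(proj, actual) for name, proj in by_system.items()}
scores["Average"] = rmse(consensus(by_system), actual)
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:8s} {score:.4f}")
```

Because RMSE is a scaled Euclidean distance, the triangle inequality guarantees that the equal-weight average's RMSE is never larger than the mean of the individual systems' RMSEs, which is why an average should rarely be egregiously wrong. Nothing guarantees that it beats the best single system; in these toy numbers it doesn't.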

RoyalsRetro (AG#1F) Posted: January 10, 2017 at 11:34 AM | 7 comment(s)
  Tags: marcel, pecota, steamer, zips

Reader Comments and Retorts

   1. DJS, the Digital Dandy Posted: January 10, 2017 at 02:30 PM (#5381079)
I had a weaker-than-average year on hitters and a better-than-average one on pitchers (I center systems on league offensive levels first), and, weirdly enough, a very good year on teams.
   2. Shibal Posted: January 10, 2017 at 04:31 PM (#5381181)
Is he looking at pitching, hitting, or both? I can't tell from the article.
   3. PreservedFish Posted: January 10, 2017 at 07:11 PM (#5381283)
How do you grade Zips on teams? Do you use someone else's playing time estimates?
   4. Walt Davis Posted: January 10, 2017 at 07:35 PM (#5381298)
I'm not clear on what correction he made for the different league contexts the systems assume. It's clear he tried something, but it would be good to be more precise. One thing to realize is that, for these sorts of rate stats, the mean and the variance are almost certainly related. A league with a .330 OBP will have a slightly higher variance of OBP than one with a .320 OBP (and therefore a slightly higher standard deviation). This is trivial in magnitude, but all baseball differences are trivial in magnitude. This is part of the point that DanR regularly makes.

Point being that a 15-point error in OBP in the .330 league might be the equivalent of a 14-point error in a .320 league. Trivial, but then the differences among the systems are usually on the order of the third decimal place.
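Walt doesn't say which adjustment he has in mind. Here are two hypothetical ways to translate a miss from one league context to another: rescale in proportion to the league OBP, or rescale by the ratio of standard deviations under a simple binomial model in which each plate appearance independently reaches base with probability p, so the per-PA standard deviation is sqrt(p(1 - p)). That quantity rises with p below .500, which matches his point that a .330 league has slightly more spread than a .320 one. Both function names are illustrative.

```python
import math


def scale_proportional(error, from_league, to_league):
    """Rescale an error in proportion to the league mean."""
    return error * to_league / from_league


def scale_by_binomial_sd(error, from_league, to_league):
    """Rescale an error by the ratio of per-PA binomial SDs, sqrt(p * (1 - p))."""
    sd_from = math.sqrt(from_league * (1 - from_league))
    sd_to = math.sqrt(to_league * (1 - to_league))
    return error * sd_to / sd_from


miss = 0.015  # a 15-point OBP miss in a .330 league
print(round(scale_proportional(miss, 0.330, 0.320) * 1000, 1))    # → 14.5
print(round(scale_by_binomial_sd(miss, 0.330, 0.320) * 1000, 1))  # → 14.9
```

Proportional scaling lands closer to the 14 points in Walt's example; the binomial-SD version moves the miss by only about a tenth of a point.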

But my key gripe is that I don't have a clue what the number he's presenting is. It can't possibly be raw RMSE: the systems weren't "typically" off on OBP by 80-90 points. Is it RMSE/average context? RMSE/actual OBP? 2*RMSE/actual? It can't be the average absolute difference (Tango's other preferred measure) either, for the same reason.
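To make Walt's candidates concrete, here is a sketch that computes several of them for one system: raw RMSE, mean absolute error (MAE), and RMSE divided by the mean actual value. Which of these, if any, the article reports is not stated in this excerpt; the function name and sample values are invented. On OBP, a raw RMSE or MAE of .080 would mean misses of roughly 80 points, which is his reason for ruling those out.

```python
import math
from statistics import mean


def error_summaries(projected, actual):
    """A few candidate accuracy measures for one system on one statistic."""
    diffs = [p - a for p, a in zip(projected, actual)]
    rmse = math.sqrt(mean(d * d for d in diffs))
    return {
        "RMSE": rmse,
        "MAE": mean(abs(d) for d in diffs),
        "RMSE / mean actual": rmse / mean(actual),
    }


# Invented OBP projections and outcomes for three players.
summary = error_summaries([0.335, 0.318, 0.350], [0.340, 0.310, 0.365])
for name, value in summary.items():
    print(f"{name:20s} {value:.4f}")
```

RMSE is always at least as large as MAE, since squaring weights the big misses more heavily.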

There's no point in just putting a number out there and saying "see, this one is lowest, that's the best." What do these numbers mean? Zips gets a .088 on OBP while the 3 others are around .076. Is that .009 difference at all meaningful? Is it likely to be anything but season-to-season prediction error variance?
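One way to probe Walt's question is a paired bootstrap: resample players with replacement, recompute each system's RMSE on the same resampled players, and look at the spread of the gap. This is a sketch that assumes players' errors are independent; it captures only sampling noise across players within one season, not the season-to-season variation he also mentions. The function name and inputs are hypothetical.

```python
import math
import random


def rmse(errors):
    """Root-mean-square of a list of signed errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def bootstrap_rmse_gap(proj_a, proj_b, actual, reps=2000, seed=0):
    """95% percentile interval for RMSE(a) - RMSE(b), resampling players in pairs."""
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(actual)
    errs_a = [p - y for p, y in zip(proj_a, actual)]
    errs_b = [p - y for p, y in zip(proj_b, actual)]
    gaps = []
    for _ in range(reps):
        # Draw the same players for both systems so the comparison stays paired.
        idx = [rng.randrange(n) for _ in range(n)]
        gaps.append(rmse([errs_a[i] for i in idx]) - rmse([errs_b[i] for i in idx]))
    gaps.sort()
    return gaps[round(0.025 * (reps - 1))], gaps[round(0.975 * (reps - 1))]


# Hypothetical OBP projections for five players.
actual = [0.340, 0.310, 0.365, 0.298, 0.352]
system_a = [0.320, 0.330, 0.340, 0.310, 0.335]
system_b = [0.338, 0.312, 0.358, 0.305, 0.349]
print(bootstrap_rmse_gap(system_a, system_b, actual))
```

If the interval comfortably excludes zero, the gap is unlikely to be player-sampling noise alone; with a single season, it still says nothing about whether the ranking repeats next year.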

By the way, I did click through to the linked Tango article. All that article does is review Nate's article and tell you which two of the measures Nate cites are likely to be most useful or meaningful; it notes that they all need to be adjusted to the particular context each method assumed, but it doesn't tell you how to do that.

Sorry if I missed the sentence or two that actually clarify this, but I have looked for them.
   5. DJS, the Digital Dandy Posted: January 10, 2017 at 09:37 PM (#5381386)
Walt, I had some of these questions as well. But there's a bit of a self-serving nature when I make them, so I try not to.
   6. Guy Heckler's Veto Posted: January 10, 2017 at 10:06 PM (#5381394)
Admit no weakness, Szym. Give Murray Chass no quarter.
   7. Russ Posted: January 11, 2017 at 08:07 AM (#5381485)
It would be interesting to look at the distributions of the absolute errors as well (at least providing something like the 25th and 75th percentiles of the absolute errors, rather than just the average). It would be possible for some systems to have more "small" misses but fewer "big" misses, and vice versa. I don't really love marginal summary statistics for these sorts of comparisons, because they bury a lot of what's going on. That's unnecessary here: every player gets projected by each system, so you can look at how much the systems agree on each player rather than averaging across the errors.
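Russ's suggestion, sketched: report the 25th percentile, median, and 75th percentile of each system's absolute errors instead of only the average. `statistics.quantiles` (Python 3.8+) with `n=4` returns the three quartile cut points. The two hypothetical systems below have the same mean absolute error but different shapes: one has four small misses and one big one, the other five medium misses.

```python
from statistics import mean, quantiles


def abs_error_quartiles(projected, actual):
    """Mean plus 25th percentile, median, and 75th percentile of absolute errors."""
    abs_errors = [abs(p - a) for p, a in zip(projected, actual)]
    q1, median, q3 = quantiles(abs_errors, n=4)
    return {"mean": mean(abs_errors), "p25": q1, "median": median, "p75": q3}


# Hypothetical projections for five players who all finished at .330.
actual = [0.330] * 5
steady = [0.342, 0.318, 0.342, 0.318, 0.342]   # five 12-point misses
streaky = [0.335, 0.325, 0.335, 0.325, 0.370]  # four 5-point misses, one 40-point miss
for name, proj in [("steady", steady), ("streaky", streaky)]:
    print(name, {k: round(v, 4) for k, v in abs_error_quartiles(proj, actual).items()})
```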

For example, it would be nice to see a plot of the error by statistic, with players on the x-axis (sorted by decreasing average absolute error in the statistic across the different projection methods) and the raw error on each player for each of the five methods on the y-axis. To make the graph easier to read (i.e., to have fewer points), you could stratify players into blocks by plate appearances, experience, age, or whatever. You would be able to see whether systems are missing both above AND below (or everyone is missing in the same direction), and you could see clearly which players are the "easiest" to predict, which are the "hardest", etc. It would also dramatically show where Marcel does very well and where it does very poorly.
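The plot Russ describes needs a per-player table first. A minimal sketch, assuming each player record carries plate appearances, the actual value, and one projection per method (the record layout, function name, and PA cutoffs are hypothetical). It stratifies players into PA blocks, sorts each block by decreasing average absolute error across methods, and flags players every method missed in the same direction; the rows can then be handed to any plotting library.

```python
from statistics import mean


def player_error_table(players, pa_bins):
    """Per-player signed errors, stratified by PA block.

    players: list of dicts with keys "name", "pa", "actual", "proj" ({method: value}).
    pa_bins: ascending PA cutoffs; block 0 holds players below the first cutoff.
    Rows are ordered by block, then by decreasing average absolute error.
    """
    rows = []
    for p in players:
        errors = {m: v - p["actual"] for m, v in p["proj"].items()}
        rows.append({
            "name": p["name"],
            "block": sum(p["pa"] >= cut for cut in pa_bins),
            "errors": errors,
            "avg_abs_error": mean(abs(e) for e in errors.values()),
            # True when every method over-projected, or every method under-projected.
            "same_direction": all(e > 0 for e in errors.values())
            or all(e < 0 for e in errors.values()),
        })
    rows.sort(key=lambda r: (r["block"], -r["avg_abs_error"]))
    return rows


# Hypothetical players and projections.
players = [
    {"name": "A", "pa": 550, "actual": 0.300, "proj": {"Marcel": 0.320, "Steamer": 0.315}},
    {"name": "B", "pa": 600, "actual": 0.350, "proj": {"Marcel": 0.345, "Steamer": 0.352}},
    {"name": "C", "pa": 150, "actual": 0.310, "proj": {"Marcel": 0.300, "Steamer": 0.305}},
]
for row in player_error_table(players, pa_bins=[200, 400]):
    print(row["block"], row["name"], round(row["avg_abs_error"], 4), row["same_direction"])
```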
