Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Tuesday, January 10, 2017

Grading the projections: 2016

ZiPS didn’t have a good year. ZiPS was the least accurate of the three systems in each of the five categories, and never by a particularly small margin. You don’t want to conclude too much based on a single season of results, but ZiPS didn’t perform very well in last year’s review, either. (I should also note that this is Steamer’s second straight year of leading the pack in convincing fashion.)

Marcel does its job. Marcel wasn’t great, but it was almost always in the neighborhood of accurate. It beat ZiPS in four of the five categories, and even led OBP. Marcel remains very hard to convincingly beat (or even beat at all), despite its simplicity.

Averaging the projections might be a great idea. The “Average” row in the above table is exactly what you would expect: the accuracy of the average of all four systems. It beats all four systems in four of the five categories, and fell short of only Steamer in the fifth. One would expect that an average would rarely be egregiously wrong; it’s surprising to see that the average also tended to be closer to right than each individual projection. This could be a quirk of a single season of projections, but at the very least, it seems to say that the brute-force method of resolving differences between the projection systems is credible.
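To illustrate why a plain average can score better than each of its inputs, here is a minimal sketch. The player values and the names `system_a` and `system_b` are made up for illustration and are not real projections; RMSE is used as the accuracy measure only as an assumption, since the excerpt does not say which measure the table reports.

```python
import math

def rmse(projected, actual):
    """Root mean squared error between two equal-length lists."""
    squared = [(p - a) ** 2 for p, a in zip(projected, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical OBP values for three players (illustrative only).
actual = [0.350, 0.310, 0.330]
systems = {
    "system_a": [0.360, 0.300, 0.345],
    "system_b": [0.335, 0.325, 0.320],
}

# The "Average" projection: the per-player mean across all systems.
average = [sum(vals) / len(vals) for vals in zip(*systems.values())]

scores = {name: rmse(proj, actual) for name, proj in systems.items()}
scores["average"] = rmse(average, actual)

for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```

In this toy data the two systems miss in opposite directions on every player, so the misses largely cancel in the average. The average's RMSE can never exceed the mean of the individual RMSEs, but whether it beats every individual system depends on how often the systems err in opposite directions.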

RoyalsRetro (AG#1F) Posted: January 10, 2017 at 11:34 AM | 7 comment(s)
  Tags: marcel, pecota, steamer, zips

Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   1. DJS, the Digital Dandy Posted: January 10, 2017 at 02:30 PM (#5381079)
I had a weaker than average year on hitters, better than average on pitchers (I center systems on league offensive levels first), very good year on teams, weirdly enough.
   2. Shibal Posted: January 10, 2017 at 04:31 PM (#5381181)
Is he looking at pitching, hitting, or both? I can't tell from the article.
   3. PreservedFish Posted: January 10, 2017 at 07:11 PM (#5381283)
How do you grade Zips on teams? Do you use someone else's playing time estimates?
   4. Walt Davis Posted: January 10, 2017 at 07:35 PM (#5381298)
I'm not clear what correction he made for different league contexts assumed by the systems. It's clear he tried something but it would be good to be more precise. One thing to realize is that, for these sorts of rate stats, the mean and the variance are almost certainly related. A league with a 330 OBP will have a slightly higher variance of OBP than one with a 320 (and therefore slightly higher standard deviation). This is trivial in magnitude but all baseball differences are trivial in magnitude. This is part of the point that DanR regularly makes.

Point being that a 15 point error in OBP in the 330 league might be the equivalent of a 14 point error in a 320 league. Trivial, but then the differences among the systems are usually on the order of the third decimal place.
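A rough sketch of the mean/variance link described above, assuming each plate appearance is an independent on-base trial (a binomial model) and an arbitrary 600 plate appearances. Both are assumptions made for illustration; the comment does not specify a model, and this covers only sampling noise, not the spread of true talent.

```python
import math

def binomial_sd(p, n):
    """Standard deviation of an observed rate over n independent trials."""
    return math.sqrt(p * (1 - p) / n)

PA = 600  # assumed plate appearances, chosen only for illustration

sd_330 = binomial_sd(0.330, PA)
sd_320 = binomial_sd(0.320, PA)

# How much wider the .330 context is relative to the .320 context.
ratio = sd_330 / sd_320

# A 15-point miss in the .330 context, rescaled to the .320 context.
equivalent_error = 15 / ratio

print(f"SD ratio: {ratio:.4f}")
print(f"15 points at .330 is about {equivalent_error:.2f} points at .320")
```

Under this assumption alone the rescaling is a bit under one percent, which fits the "trivial in magnitude" description; the 15-versus-14 figures in the comment read as an illustration rather than a computed value.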

But my key gripe is that I don't have a clue what that number is that he's presenting. It can't possibly be raw RMSE -- the systems weren't "typically" off on OBP by 80-90 points of OBP. Is it RMSE/average context? RMSE/actual OBP? 2*RMSE/actual? It can't be average absolute differences either for the same reason. (Tango's other preferred measure)
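For reference, the candidate measures named above can be computed side by side. This is a sketch on made-up OBP values; which of these (if any) the article actually reports is exactly the open question, so nothing here should be read as the article's method.

```python
import math

def rmse(errors):
    """Root mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mae(errors):
    """Mean (average) absolute error."""
    return sum(abs(e) for e in errors) / len(errors)

# Hypothetical projected and actual OBP for four players (illustrative only).
projected = [0.340, 0.315, 0.360, 0.300]
actual = [0.320, 0.330, 0.350, 0.340]

errors = [p - a for p, a in zip(projected, actual)]

raw_rmse = rmse(errors)  # in OBP units
raw_mae = mae(errors)    # in OBP units
# One possible normalization: RMSE divided by the mean actual OBP.
relative_rmse = raw_rmse / (sum(actual) / len(actual))

print(f"raw RMSE:      {raw_rmse:.4f}")
print(f"raw MAE:       {raw_mae:.4f}")
print(f"relative RMSE: {relative_rmse:.4f}")
```

Raw RMSE and MAE stay in the units of the statistic, while a ratio such as RMSE over the mean is unitless, so the two kinds of number are not comparable without knowing which one is being quoted.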

There's no point just putting a number out there and saying "see this one is lowest, that's the best." What do these numbers mean? Zips gets a .088 on OBP while 3 others are around .076 -- is that .009 difference at all meaningful? Is it likely to be anything but season-to-season prediction error variance?

By the way, I did click through to the Tango article linked. All that article does is review the Nate article, tell you which two of the measures Nate cites are likely to be most useful/meaningful, and note that they all need to be adjusted to the particular context each method assumed. It doesn't tell you how to do that.

Sorry if I missed the sentence or two that actually clarifies this but I have looked for them.
   5. DJS, the Digital Dandy Posted: January 10, 2017 at 09:37 PM (#5381386)
Walt, I had some of these questions as well. But there's a bit of a self-serving nature when I make them, so I try not to.
   6. Guy Heckler's Veto Posted: January 10, 2017 at 10:06 PM (#5381394)
Admit no weakness, Szym. Give Murray Chass no quarter.
   7. Russ Posted: January 11, 2017 at 08:07 AM (#5381485)
It would be interesting to look at the distributions of the absolute errors as well (at least providing something like the 25th and 75th percentiles of the absolute errors, rather than just the average). It would be possible for some systems to have more "small" misses but fewer "big" misses, and vice versa. I don't really love marginal summary statistics for these sorts of comparisons, because they bury a lot of what is going on. That is unnecessary here: every player gets projected by each system, so you can really look at how much they agree on each player, rather than averaging across the error.
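A small sketch of the quartile summary suggested above, using Python's standard `statistics.quantiles`. The two error lists are invented so that both systems have the same average absolute error; the names `system_a` and `system_b` are hypothetical.

```python
import statistics

# Hypothetical absolute OBP errors for eight players per system.
abs_errors = {
    # Steady: always misses by about 20 points.
    "system_a": [0.018, 0.020, 0.022, 0.020, 0.019, 0.021, 0.020, 0.020],
    # Mostly small misses, plus two large ones.
    "system_b": [0.005, 0.006, 0.004, 0.005, 0.006, 0.004, 0.060, 0.070],
}

means = {name: statistics.mean(errs) for name, errs in abs_errors.items()}

# quantiles(..., n=4) returns the three quartile cut points: (Q1, median, Q3).
summary = {
    name: tuple(statistics.quantiles(errs, n=4))
    for name, errs in abs_errors.items()
}

for name in abs_errors:
    q1, median, q3 = summary[name]
    print(f"{name}: mean={means[name]:.4f} "
          f"Q1={q1:.4f} median={median:.4f} Q3={q3:.4f}")
```

The averages match, yet the quartiles differ sharply, which is the kind of difference a single summary number hides.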

For example, what could be nice is to see a plot of the error by statistic with players on the x-axis (sorted by decreasing average absolute error in the statistic across the different projection methods) with the raw error on the player for each of the five methods on the y-axis. To make the graph easier to look at (i.e. to have fewer points), you could stratify players into different blocks by either plate appearances, experience, age, whatever. You would be able to see if systems are missing both above AND below (or everyone is missing the same direction), you could see clearly which players are the "easiest" to predict, which are the "hardest", etc. It also would dramatically show where Marcel does very well and where it does very poorly.
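The data preparation for such a plot could look like the sketch below. Player names, system names, and error values are all hypothetical, and the drawing itself is left to whatever plotting tool is at hand; only the ordering and the same-direction check are shown.

```python
# Signed OBP error (projected minus actual) per player and system.
# All values are made up for illustration.
errors = {
    "player_x": {"system_a": 0.030, "system_b": 0.025, "system_c": 0.040},
    "player_y": {"system_a": -0.005, "system_b": 0.010, "system_c": -0.002},
    "player_z": {"system_a": -0.020, "system_b": -0.015, "system_c": 0.018},
}

def mean_abs(player_errors):
    """Average absolute error across systems for one player."""
    return sum(abs(e) for e in player_errors.values()) / len(player_errors)

def same_direction(player_errors):
    """True when every system missed high, or every system missed low."""
    values = list(player_errors.values())
    return all(e > 0 for e in values) or all(e < 0 for e in values)

# X-axis order: hardest-to-project players first.
ordered = sorted(errors, key=lambda name: mean_abs(errors[name]), reverse=True)

# Players on whom every system erred in the same direction.
shared_miss = [name for name in ordered if same_direction(errors[name])]

for name in ordered:
    print(name, errors[name], "same direction" if name in shared_miss else "")
```

Each player in `ordered` would become one x-axis position, with one point per system at that player's signed error.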
