In-Season ZiPS and Regression
Over the last couple of days at BP, Colin has looked at my 2007-2008 model of Rest-of-Season projections and found the model to be very simple.
He’s correct, it is. ZiPS RoS was originally meant as a quick, simple tool, rather than something that was as rigorous as it could possibly be. In a way, ZiPS RoS was a spiritual descendant of Marcel, another simple tool used to make basic projections. After all, the full-on ZiPS treatment was too complex to apply to players en masse in the same way, and ZiPS was developed with knowledge of season-to-season changes in performance, not in-season ones.
Colin talks a lot about the general weighting of 11*Proj, 8*Season*%Season, which is absolutely the general weighting used in the first version of ZiPS RoS, and is still used in the current method for some things. Why were these the weights? Using gamelogs from the last 30 years for all players who qualified for the batting title, that’s what I arrived at as the best mix of projection (I used Marcel since that’s easier to use historically) and actual performance. As it turned out, 11 and 8 were the best weights (actually 11.1154 and 7.923) to predict future in-season play. The various weightings used to generate baselines for season-to-season projections in every system known to man weren’t handed down from Mt. Sinai, either, but derived in the same way. Perhaps there’s a more novel method for determining weights than the one I used, but the 11/8 certainly wasn’t a figure landed on out of capriciousness.
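For anyone who wants to see the 11/8 weighting in action, here is a minimal sketch. It assumes the two weighted terms are combined as a weighted average, dividing by the sum of the weights; that normalization, along with the function and parameter names, is an illustration rather than the actual ZiPS RoS code.

```python
def ros_blend(proj_rate, season_rate, pct_season, w_proj=11.0, w_season=8.0):
    """Blend a preseason projection with season-to-date performance.

    proj_rate   -- projected rate stat (e.g. batting average)
    season_rate -- the same stat, season to date
    pct_season  -- fraction of the season played so far, from 0.0 to 1.0
    w_proj, w_season -- the 11 and 8 weights from the post

    ASSUMPTION: the terms 11*Proj and 8*Season*%Season are normalized by
    the sum of their weights. The post gives the terms, not the divisor.
    """
    if not 0.0 <= pct_season <= 1.0:
        raise ValueError("pct_season must be between 0 and 1")
    season_weight = w_season * pct_season
    return (w_proj * proj_rate + season_weight * season_rate) / (w_proj + season_weight)


if __name__ == "__main__":
    # A .300 projected hitter batting .350 at the halfway point.
    print(round(ros_blend(0.300, 0.350, 0.5), 4))
```

Under this reading the season-to-date line gets no weight on Opening Day, and even at the end of a full season the projection still carries 11 of the 19 total units of weight, which fits with the projection making up most of the rest-of-season number.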
Today, ZiPS RoS is using a newer model, but as Colin notes, the differences in results are very modest. That’s not unexpected - the previous projection still makes up most of the rest-of-season projection. What the difference amounts to is slightly differing regression-to-projection of a percentage of a projection representing a percentage of a season. In the same way that Marcel gets us most of the way to the projections that ZiPS or CHONE or PECOTA or OLIVER or CAIRO come out with, RoS 1.0 gets us most of the way to what RoS 2.0 does.
Also worth noting is that in-season regression to projection is different from season-to-season regression. To borrow a term from another field, in-season performance is “sticky” and does not regress in-season the same way it regresses season-to-season. For example, in-season $H does not regress toward projected $H as if ~3000 BF were the break-even point, but as if ~1400 BF were.
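To make the break-even idea concrete, here is a rough sketch. It assumes "break-even point" means the number of batters faced at which observed performance and the projection get equal weight, and it uses the common sample / (sample + constant) form of regression. The function name and the example numbers are hypothetical; only the ~1400 and ~3000 BF figures come from the paragraph above.

```python
IN_SEASON_BREAK_EVEN_BF = 1400         # approximate figure from the post
SEASON_TO_SEASON_BREAK_EVEN_BF = 3000  # approximate figure from the post


def regress_to_projection(observed, projected, bf, break_even_bf):
    """Pull an observed rate toward a projected rate.

    ASSUMPTION: the weight on observed performance is bf / (bf + break_even_bf),
    so observed and projected are weighted equally when bf == break_even_bf.
    """
    if bf < 0 or break_even_bf <= 0:
        raise ValueError("bf must be >= 0 and break_even_bf must be > 0")
    weight_observed = bf / (bf + break_even_bf)
    return weight_observed * observed + (1.0 - weight_observed) * projected


if __name__ == "__main__":
    # Hypothetical pitcher: .320 observed $H over 700 BF, .300 projected.
    sticky = regress_to_projection(0.320, 0.300, 700, IN_SEASON_BREAK_EVEN_BF)
    loose = regress_to_projection(0.320, 0.300, 700, SEASON_TO_SEASON_BREAK_EVEN_BF)
    print(sticky, loose)
```

With the smaller break-even point, the same 700 BF pull the estimate further toward the observed figure, which is what "sticky" amounts to in this formulation.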
Could RoS 2.0 have improvements? Absolutely. Right now, I’m striking a balance between something that works and something that can be implemented automatically and easily. The goal of RoS (and ZiPS itself) is not only to be accurate, but also to have something that people can access at any time for free. There probably will be an update for next season, depending on practical considerations - Dave Appelman has a lot more on his plate than simply implementing my stuff.
For anyone who wishes to test the accuracy of ZiPS Rest-of-Season projections against their own, FanGraphs allows CSV/Excel downloads of rest-of-season projections. I encourage people to have at it - when you come up with a better method (and I don’t doubt it’s out there), our knowledge in the field is advanced.
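As a starting point for that kind of test, here is a small sketch of one possible accuracy measure, root mean squared error, computed over paired lists of projected and actual rest-of-season values. The choice of RMSE and the function name are assumptions for illustration; getting the values out of a downloaded file is left to the reader, since the column layout is not described here.

```python
import math


def rmse(projected, actual):
    """Root mean squared error between paired projected and actual values.

    Both arguments are equal-length sequences of numbers, one entry per
    player, in the same player order.
    """
    if len(projected) != len(actual) or not projected:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(projected, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Running the same function on two sets of projections against the same actuals gives a simple head-to-head comparison, with the lower number being the better fit on that sample.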
Posted: June 09, 2011 at 02:12 PM