Baseball Primer Newsblog: The Best News Links from the Baseball Newsstand

## Wednesday, February 25, 2009

## Fack Youk: Gargiulo: He Also Picked Mickey Rourke…

“He symbolizes everything that disgusts me. Obviousness. Unoriginal macho energy. Ladies man…”
Repoz
Posted: February 25, 2009 at 02:18 PM | 23 comment(s)
Tags: projections, sabermetrics, yankees
## Reader Comments and Retorts

1. 1k5v3L Posted: February 25, 2009 at 02:46 PM (#3085743)

That's not what linear means, and the projection is not linear. In fact, it's not even declining at a constant non-linear rate. Any projection ought to predict his home run totals declining by year, with the understanding that random events will cause the actual outcomes to fluctuate around the decline, sometimes resulting in totals increasing.
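To make the "declining mean, noisy outcomes" point concrete, here is a minimal sketch. The mean totals and the season-to-season spread are made-up numbers for illustration (they are not Silver's projection), and treating each season as an independent normal draw is an assumption.

```python
import random

def simulate_totals(mean_by_year, sd, rng):
    """Draw one possible career path around a declining mean projection."""
    return [max(0, round(rng.gauss(mean, sd))) for mean in mean_by_year]

# Hypothetical mean projection that declines every year (not Silver's numbers).
mean_by_year = [33, 31, 29, 26, 23, 19]
rng = random.Random(2009)  # fixed seed so the run is repeatable

paths = [simulate_totals(mean_by_year, sd=8, rng=rng) for _ in range(10_000)]

# Count simulated careers in which at least one season beats the one before it.
with_increase = sum(
    any(later > earlier for earlier, later in zip(path, path[1:]))
    for path in paths
)
print(f"paths with at least one year-over-year increase: {with_increase / len(paths):.0%}")
```

Every mean in the list is lower than the one before it, yet most simulated paths still contain an up year somewhere, which is all the "sometimes resulting in totals increasing" remark is saying.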

> The New Yankee Stadium (that's what we're supposed to call it, right?)

The Bronx Bombshelter?

The House That Ruthless Avarice Built?

John Sterling's Liquor Emporium?

The author here.

JCB - Could you go a little bit further into why that's not linear, just for learning purposes?

Fair point about the projection consistently declining and not accounting for random fluctuations. But if you are always assuming a decline, doesn't that skew the projection downward? If it starts at 33, that is his highest total going forward? Seems unlikely.

> 3. The problem with being on pace to be the greatest home run hitter of all time is that you aren’t going to have too many people similar to you.

This is the most valid point to me. It's kind of like with Jamie Moyer - I don't pay any attention to his projection. How do you even project his numbers? All the pitchers his age I can think of are knuckleballers.

Note: After checking Moyer's b-ref page I see his nearest comp by age is Tommy John, who fell off the shelf for his age 46 season. Still, it is not much of a sample size.

> Fair point about the projection consistently declining and not accounting for random fluctuations. But if you are always assuming a decline, doesn't that skew the projection downward? If it starts at 33, that is his highest total going forward? Seems unlikely.

What's the alternative? Try to predict the up and down bumps along the trend line?

Projections aren't meant to get into that level of granularity. Doesn't stop the critics from using that as a means of outright dismissing them in favor of their infinite personal wisdom.

Obviously that would be foolish, and I'm not saying it should start at 55 either. But he published the results of the simulation on ESPN.com and opened up the discussion. It's just one man's opinion, but I'm betting against it.

> It's just one man's opinion, but I'm betting against it.

How brave. You get EVERY OTHER POSSIBLE COMBINATION OF HR TOTALS and he gets 1. Again, this shows you don't understand how projections work.

Okay... I'll take the over?

Here's what you need to do: Use whatever method you want and project EXACTLY what Silver does: how many more seasons A-Rod will play and the # of homers he will hit in each season.

Apples to apples, right? Then we can compare. Good luck.

> This is the most valid point to me. It's kind of like with Jamie Moyer - I don't pay any attention to his projection. How do you even project his numbers? All the pitchers his age I can think of are knuckleballers.

That's not how projections are done. Projections are primarily based on a large multivariate model run on all players. ZiPS and PECOTA (and maybe others now) then do a small tweak based on comparable players which allows them to essentially run a small, localized regression.

The projection for someone like Moyer isn't based on a handful of knuckleballers, it's based on thousands of pitchers. It is a fair point to say that there are only a handful of "points of support" out at Moyer's end of the age curve and it's possible that the model shifts dramatically for post-45 pitchers ... but there's no reason to think it does and so Moyer's projection is basically an extrapolation of the standard model. Or to put it another way, while there have been only a handful of 46-year-old pitchers in baseball, there have been a lot of pitchers (and a lot of "old" pitchers) who've had 3 seasons fairly comparable to Moyer's last 3 seasons.

Hmmm ... maybe that could have been clearer. Let's just put it this way. This is oversimplified but essentially Moyer's projection looks like this (this should roughly be Marcel):

xFIP09 = .5 xFIP08 + .3 xFIP07 + .2 xFIP06 + .25 Over32

So a weighted average of his last 3 years' performance and a .25 penalty for being a year older and being "old". In the real systems, those coefficients would be based on a regression featuring thousands of pitchers. All that PECOTA or ZiPS does is maybe tweak that number a bit based on comps, so if Moyer's comps didn't decline quite as much as .25, he'll get a bit of a boost, but his projection will be somewhere between the above and the comps. (The underlying models of PECOTA and ZiPS are more complex than the basic Marcel above.)
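As a rough sketch of the weighted-average formula above: the .5/.3/.2 weights and the .25 penalty come from the formula as written, while the function name and the input xFIP values are hypothetical and are not Moyer's actual numbers.

```python
def simple_projection(xfip_last3, over32, weights=(0.5, 0.3, 0.2), age_penalty=0.25):
    """Weighted average of the last three seasons plus a flat age penalty.

    xfip_last3 is ordered most recent first: (2008, 2007, 2006).
    over32 plays the role of the Over32 indicator in the formula.
    """
    weighted = sum(w * x for w, x in zip(weights, xfip_last3))
    return weighted + (age_penalty if over32 else 0.0)

# Hypothetical xFIP values for illustration only.
projected = simple_projection((4.60, 4.80, 4.50), over32=True)
print(round(projected, 2))
```

Since lower xFIP is better, adding .25 is what makes the age term a penalty.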

Or as Dan says in the new ZiPS intro, the comps are mainly just for fun.

Anyway, I can't access Silver's original article -- did he present the confidence bands? I'm guessing things start looking pretty silly pretty quickly.

xFIP09 = (a) xFIP08 + (b) xFIP07 + (c) xFIP06 + (d) age adjustment

where (a)-(d) are found from regressions over all players, but weighted by a similarity score to the player being projected, so the regression is unique to the individual and the coefficients could vary significantly from player to player (obviously, it depends on the scaling of the similarity score).
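One way to read "regressions over all players, but weighted by a similarity score" is as weighted least squares. The sketch below is that reading only, not the actual PECOTA or ZiPS code. The data rows, similarity weights, and helper names are all hypothetical, and treating the age adjustment (d) as a constant column is an assumption.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def weighted_fit(rows, targets, weights):
    """Weighted least squares: minimize sum_i w_i * (y_i - x_i . beta)^2."""
    k = len(rows[0])
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * y for r, y, w in zip(rows, targets, weights))
            for i in range(k)]
    return solve(xtwx, xtwy)

# Hypothetical pitchers: (xFIP08, xFIP07, xFIP06, constant column for the age term).
rows = [
    [4.0, 4.2, 4.5, 1.0],
    [3.5, 3.9, 3.6, 1.0],
    [5.0, 4.6, 4.8, 1.0],
    [4.4, 4.9, 4.1, 1.0],
    [3.8, 3.7, 4.3, 1.0],
    [4.7, 4.1, 3.9, 1.0],
]
true_coefs = [0.5, 0.3, 0.2, 0.25]
# Noise-free targets, so the fit should hand back true_coefs whatever the weights.
targets = [sum(c * x for c, x in zip(true_coefs, row)) for row in rows]
similarity = [0.9, 0.2, 0.5, 0.7, 0.1, 0.4]  # made-up similarity to the target player

coefs = weighted_fit(rows, targets, similarity)
print([round(c, 3) for c in coefs])
```

With noise-free targets the weights cannot change the answer, so this only checks the mechanics. On real data, a different similarity vector pulls (a)-(d) toward whatever fits the most similar players best, which is why the coefficients could differ from player to player.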

> It's just one man's opinion, but I'm betting against it.

No, it's not. It's one man's "mean case" scenario. He's not saying that it's at all likely to actually come out exactly that way.

The standard error on the mean of a group will be quite small. So we can say with great confidence that players who've averaged a 120 OPS+ over the last 3 years will, on average, hit a lot better next year than the group who've averaged a 100 OPS+ over the last 3 years. But for any individual player, the projection is pretty much a crap shoot -- and that's one year out. Multi-year projections for an individual are kinda silly.
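The gap between group-level and individual-level confidence is just the usual standard error formula. A minimal sketch, assuming independent players; the 20-point individual spread and the group size of 100 are illustrative assumptions.

```python
import math

def standard_error_of_mean(sd, n):
    """Standard error of the average of n independent values with spread sd."""
    return sd / math.sqrt(n)

individual_sd = 20.0  # assumed spread of one player's OPS+ around his projection
group_size = 100      # assumed number of players in the group

print(standard_error_of_mean(individual_sd, group_size))  # 2.0
```

The group average is pinned down ten times more tightly than any one member of it, which is why the group claim can be confident while the individual one is not.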

If memory serves, in the analysis of the Swisher-Betemit trade, although Swisher's mean projected OPS+ was about 15 points higher than Betemit's, Betemit still had something like a 40% chance of outhitting Swisher in raw terms (much less position-adjusted). Dan's confidence intervals aren't symmetric (good) but, roughly speaking, the SD on an OPS+ projection is about 15-20 points depending on how much data it's based on. The SD on the difference between any two players (assuming they're independent) is going to be about 20-30 points of OPS+.
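The 20-30 point figure follows from adding variances for independent errors. The sketch below works that out and, under a symmetric normal approximation (an assumption, since the real intervals aren't symmetric), turns a 15-point mean gap into a chance of the lower-projected player coming out ahead. The function names are hypothetical.

```python
import math

def sd_of_difference(sd_a, sd_b):
    """SD of (A - B) when the two projection errors are independent."""
    return math.sqrt(sd_a ** 2 + sd_b ** 2)

def prob_b_beats_a(mean_gap, sd_a, sd_b):
    """P(B > A) under a normal approximation, where mean_gap = mean(A) - mean(B)."""
    z = mean_gap / sd_of_difference(sd_a, sd_b)
    return 0.5 * math.erfc(z / math.sqrt(2))  # standard normal upper tail

for sd in (15, 20):
    print(sd, round(sd_of_difference(sd, sd), 1), round(prob_b_beats_a(15, sd, sd), 2))
```

A symmetric normal curve is only a rough stand-in here, so its output need not match a figure recalled from memory. The qualitative point holds either way: a 15-point edge in the mean leaves a sizable chance of the other player hitting better.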

We really shouldn't express as much confidence on player comparisons and projections as we do.

I'm reminded of some folks who were trying to sell some software for analyzing mortgages. They were very proud of their interest rate projection model. They boasted about how it captured the big downturn in interest rates in the early 00s. It was hilarious. All their model did was take the interest rate from 5 years before and project it forward to be the same with increasing variance. It was a random walk. 5 years out, they were projecting that interest rates would be somewhere between 2% and 16%. Well, duh, anybody in the room could have told them that. (And you wonder how the subprime mess came to pass.)
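For anyone wondering why a random-walk forecast ends up saying so little: the centre never moves and the band widens with the square root of the horizon. A minimal sketch, assuming a driftless walk with normal yearly steps; the 6% starting rate and 1-point step SD are hypothetical, not the vendor's numbers.

```python
import math

def random_walk_band(last_rate, step_sd, years_ahead, z=1.96):
    """Forecast band for a driftless random walk.

    The point forecast stays at last_rate; the half-width grows with sqrt(years_ahead).
    """
    half_width = z * step_sd * math.sqrt(years_ahead)
    return last_rate - half_width, last_rate + half_width

# Hypothetical inputs: a 6% starting rate and a 1-point yearly step SD.
for h in (1, 3, 5):
    low, high = random_walk_band(6.0, 1.0, h)
    print(h, round(low, 1), round(high, 1))
```

A band that wide will "capture" nearly any move in rates, which is not the same thing as having predicted it.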

Hmm, if I actually wanted to go to his site, I'd accept his wager of $100 and put forth mine of 46¢.

Heeeeeey Fridas. Guess what?

http://fackyouk.blogspot.com/2009/02/dueling-projections.html

You got your wish!

