Baseball Primer Newsblog— The Best News Links from the Baseball Newsstand
Friday, July 22, 2011
whenever we use a skill-based metric like xFIP or SIERA….We are using a proxy for regression to the mean that doesn’t explicitly account for the amount of playing time a pitcher has had. We are, in essence, trusting in the formula to do the right amount of regression for us. And like using fly balls to predict home runs, the regression to the mean we see is a side effect, not anything intentional.
Paywall, of course, but interesting nonetheless. Any stat-savvy people want to comment?
Reader Comments and Retorts
1. Miserable, Non-Binary Candy is all we deserve CoB Posted: July 23, 2011 at 03:10 AM (#3883706)
Looks publicly viewable to me.
What multicollinearity affects is the power of your regression to detect effects. The interpretation of a regression coefficient (for say X1) is the impact of a one-unit change in X1 controlling for all the other Xs. If X1 and X2 are highly collinear, this independent effect of X1 (and the independent effect of X2) are harder to detect. All that really means is you need more sample size or, if you don't have that, it means that you can't tell which of those two variables is important.
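To put a rough number on that loss of power, here is a minimal Python sketch of the textbook variance inflation factor for the two-predictor case. The function name and the correlation values are illustrative assumptions, not anything taken from Colin's article.

```python
import math

def variance_inflation(r):
    """Variance inflation factor for one of two predictors whose correlation is r."""
    return 1.0 / (1.0 - r * r)

# Hypothetical correlations between X1 and X2.
for r in (0.0, 0.5, 0.9, 0.95):
    vif = variance_inflation(r)
    # The coefficient's standard error grows by sqrt(VIF) relative to the uncorrelated case.
    print(f"r={r:.2f}  VIF={vif:6.2f}  std. error multiplier={math.sqrt(vif):.2f}")
```

Roughly speaking, the VIF is also the factor by which the sample size would have to grow to get back the precision of the uncorrelated case, which is the "you need more sample size" point above.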
However, in the case of a regression with quadratic and interaction terms, you simply don't care about collinearity because it is impossible to interpret the effect of X1 controlling for X1-squared.
Yhat = b0 + b1*X1 + b2*X1^2
dYhat/dX1 = b1 + 2*b2*X1
So the effect varies with the value of X1 -- i.e. the relationship is curvilinear. That means that you are estimating a parabolic relationship between Y and X1 and, at certain points on that curve, the impact of X1 will be positive, will go to zero, then will go negative (or vice versa). Wyers refers to such sign-switching as a "mistake" but it's not -- the quadratic model is assuming a different functional form from the linear.
From a strict testing perspective, all you care about is whether the coefficient b2 is statistically significant. If it is, then the quadratic provides a better fit than the linear. In short, the linear model is the "mistake" if b2 is significant.
Now Colin appears to be right _in the case of SIERA_ that the extra complexity provides such a minimal improvement to the overall fit that you might as well stick with the simpler linear model.
In essence, he at one point refers to the "fundamental" relationship between two variables but assumes (or strongly implies) that the fundamental relationship between variables is always linear. That of course isn't true. In the extreme, for the function y = k + x^2 with x spread symmetrically around zero, there is no linear relationship (no correlation) between y and x whatsoever but obviously you can perfectly predict y if you know x.
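A small Python sketch of that extreme case, assuming x values placed symmetrically around zero and an arbitrary constant k = 5 (both are illustrative choices). It checks that the Pearson correlation between x and y is zero, and shows the slope of a quadratic changing sign, using the derivative b1 + 2*b2*X1 from the equation above.

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

k = 5.0                                    # arbitrary constant
xs = [float(v) for v in range(-3, 4)]      # -3 .. 3, symmetric around zero
ys = [k + x ** 2 for x in xs]              # y is an exact function of x
r = pearson(xs, ys)
print("correlation between x and y:", r)

def marginal_effect(b1, b2, x):
    """dYhat/dX1 for Yhat = b0 + b1*X1 + b2*X1^2."""
    return b1 + 2 * b2 * x

# With b1 = 0 and b2 = 1 the slope is negative, zero, then positive.
for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  slope={marginal_effect(0.0, 1.0, x):+.1f}")
```

The sign switch is simply what a parabola does; it says nothing about whether the fitted model is wrong.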
Back to the equation, something similar occurs with interactions:
yhat = b0 + b1*X1 + b2*X2 + b3*X1*X2
dyhat/dX1 = b1 + b3*X2
dyhat/dX2 = b2 + b3*X1
So it's impossible and pointless to think about interpreting the impact of X1 independent of X2 (and vice versa) because you explicitly modelled it so you can't.
My main point here being: Don't throw #### into a regression equation unless you know how to interpret the results.
Some commenters on the BP thread are seeing a connection in the timing; Matt, SIERA's point man and apparent progenitor leaves for Fangraphs, and Colin knocks a swipe at SIERA, only his strongest statement boils down to "SIERA's good as 'em all, but it's not worth the upkeep."
BP is going back to FIP (not xFIP) as its primary ERA estimator.
Happy Base Ball
I also noticed that he criticized the "refinements" that Fangraphs is making to the original SIERA, saying that Fangraphs is "welcome to this demonstrably redundant measure." I don't have an interest in either side of this debate, but it is a rather tart response.
Some fights are worth fighting. The fight to replace batting average with better measures of offense was worth fighting. The fight to replace FIP with more complicated formulas that add little in the way of quality simply isn’t. FIP is easy to understand and it does the job it’s supposed to as well as anything else proposed. It isn’t perfect, but it does everything a measure like SIERA does without the extra baggage, so FIP is what you will see around here going forward.
If you haven't been following the debate (and I haven't) and aren't interested in the nuts and bolts (I sometimes am) there's no need to read further.
if you haven't been following the 'debate' about that particular stat, that article is completely incomprehensible.
But at some point, "specialist" literature has to assume you've been following the "debate" because it's aimed at a specialist audience. Not that sabermetrics is at a similar level but nobody expects articles in physics journals to rehash Einstein -- if you don't know Einstein, what are you doing reading the journal?
If a person doesn't know what FIP, xFIP, SIERA and whatever else are; isn't interested in how they differ; isn't interested in whether they produce different results or lead to different conclusions ... why is that person reading an article on the comparison of FIP, xFIP and SIERA?
I will agree it's needlessly long and has other issues. In its way, it's overkill in the same way that SIERA seems to be overkill. I suppose that's partly the result of the monster we've created -- if we ##### at Conlin for not using OPS, some subset of us also ##### at BPro for not using "stat of the day". B-R, Bpro, fangraphs, etc. have to be able to defend the stats they choose to present.
I feel more comfortable using FIP (or in certain cases with pitchers who might have big outlier HR/FBs, xFIP) for my Vorosian evaluations.
Regarding the timing, yeah, it's hard to not see the timing of Colin's article as a slam on the SIERA metric Matt will be bringing to Fangraphs.
Happy Base Ball
Wouldn't be the first time I've been called cranky... I'll just point out that the third paragraph appears to be about 300 words into the article.
But at some point, "specialist" literature has to assume you've been following the "debate" because it's aimed at a specialist audience. Not that sabermetrics is at a similar level but nobody expects articles in physics journals to rehash Einstein -- if you don't know Einstein, what are you doing reading the journal?
Completely fair point. And one I thought about putting in in the prior post.
But at the same time, I don't see BP doing as many of the explanatory articles for the non-math, non-acronym folks. Fangraphs does it more frequently -- their stuff certainly can be much more approachable at times -- but I think that 75% (add your own error bar) of the writing out there is preaching to the converted -- or preaching to a sliver of the already converted.
Colin mentions my name in the article, and links to several minor league pitchers who share my name. I find it funny this case of mistaken identity happens all the time, yet nobody bothers to link to my football stats for the Dolphins.
Double post or not, I still expect a Happy Base Ball.
FWIW, the only word I am certain I could identify and define in Walt's first post was "Colin." That's probably another way of saying I haven't been following the debate.
http://vorosmccracken.com/?page_id=14
So if you have height and weight as variables in a study of 100 and almost everybody in the sample is of a "normal" height/weight relationship and only one or two are much heavier or lighter, those one or two samples will determine almost all of the coefficient for either height or weight.
I do not have a dog in this fight, but I found this piece interesting. It's very rare IMO in sabermetrics that someone asks, "What incremental value does this new formula have over the currently used formulas?". It's unfortunate that this bit of introspection--after much initial touting--was only prompted after a year-and-a-half by a defection to a competitor.
It may be that a behind-the-scenes determination that SIERA wasn't worth the upkeep was the reason for Matt's departure in the first place. So it's not necessarily an after-the-fact swipe. As far as I can tell, Colin's not the intellectually dishonest type.
"I think SIERA is interesting. I also think the differences between SIERA and xFIP are really quite small. I'm glad we have both on the site, but I'll probably keep referring to xFIP most of the time. In cases where the difference is actually meaningful, it will be nice to be able to refer to SIERA and say "here's why xFIP might be missing something on this specific guy", but overall, they're in agreement on 99% of all pitchers."
So I guess BPro isn't alone in seeing a bit of redundancy here, even if they do come off a bit like a scorned lover.
Not just that, but Colin's written about this before, and has posted plenty of comments about the topic on various sites. He very well might not have agreed with using SIERA on the site, or figured, while they have a guy there who wants to maintain it, fine, why pick that particular battle.
Not to worry, it was pure stat-geekery with no baseball content.
As far as I'm concerned, the prime example of the effects of collinearity in sabermetrics is the "BA doesn't matter" mistake. We could start by regressing (correlating) runs with BA and we'd find a strong, positive relationship. However, in the equation:
runs = b0 + b1*BA + b2*OBP + b3*SLG + e
the coefficient for BA is low and/or insignificant which would lead one to the conclusion that, once controlling for OBP and SLG, BA doesn't matter. The question is why. Part of the answer is that BA is the largest component of both OBP and SLG -- they are fairly highly collinear. So whatever impact BA has is being captured in OBP and SLG (but not vice versa). The rest of the answer is that the coefficients in that equation are not interpreted the way people think they are.
Second point first: What does it mean to vary OBP while holding BA and SLG constant? Basically it means you drew a walk, a clearly positive event. What does it mean to vary SLG while holding BA and OBP constant? Basically it means you added an extra base, a clearly positive event. What does it mean to vary BA while holding OBP and SLG constant? Try it sometime, it's not easy to do. But basically it means something like trading a HR and 2 walks for 2 doubles and a single (and even that probably doesn't work out right) which isn't clearly a positive thing -- but who the hell wants to know the impact of that?
So first point: We can rewrite the above equation as:
b1*BA + b2*OBP + b3*SLG = b1*BA + b2*(BA + ISO_OBP) + b3*(BA + ISO_SLG)
= (b1 + b2 + b3)*BA + b2*ISO_OBP + b3*ISO_SLG
So BA has a much larger coefficient than the other two. Also we now have more sensibly interpreted coefficients. A change in BA while holding ISO_OBP and ISO_SLG constant is a single; a change in ISO_OBP is still a walk; a change in ISO_SLG is still an extra base. The main problem of course being that those have different denominators which is just dumb so simplify as:
c1*(H/PA) + c2*((BB + HBP)/PA) + c3*(XB/PA)
and you have a nice, clean, easy to interpret regression ... which will show you that H/PA has the largest coefficient. The various linear run estimators lead you to the same conclusion -- the value of a base is roughly constant regardless of how that base was obtained but every hit adds a substantial run-scoring bonus.
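The rewrite from BA/OBP/SLG to BA/ISO_OBP/ISO_SLG is pure algebra, which a few lines of Python can confirm. The coefficients and the stat line below are made up for illustration (they are not fitted values), and ISO_OBP and ISO_SLG are defined simply as OBP - BA and SLG - BA, ignoring the denominator mismatch mentioned above.

```python
# Hypothetical coefficients from runs = b0 + b1*BA + b2*OBP + b3*SLG.
# Note the small (here even negative) coefficient on BA.
b1, b2, b3 = -0.5, 2.0, 1.0

def original_form(ba, obp, slg):
    return b1 * ba + b2 * obp + b3 * slg

def rewritten_form(ba, obp, slg):
    iso_obp = obp - ba   # on-base contribution beyond hits
    iso_slg = slg - ba   # extra bases beyond one per hit
    return (b1 + b2 + b3) * ba + b2 * iso_obp + b3 * iso_slg

# Made-up stat line: .270 / .340 / .430
line = (0.270, 0.340, 0.430)
print("original :", original_form(*line))
print("rewritten:", rewritten_form(*line))
print("BA coefficient after rewrite:", b1 + b2 + b3)
```

Both forms give the same prediction for any stat line, but BA's coefficient goes from the smallest to the largest of the three, because it now carries the value of the hit itself.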
= (b1 + b2 + b3)*BA + b2*ISO_OBP + b3*ISO_SLG
o kayyyyyyyyyyy
can someone explain to me exactly what SIERA is looking for - leaving out math words i don't understand what they mean like coefficience?
i understand that FIP imagines that fielders don't matter because there is so little statistically significant difference between the best fielders and the worst that if you looked at any pitcher and pretended that he had the worst 8 fielders in the ML or the 8 best fielders in the ML he would have the exact same FIP even if he didn't have the same ERA because he wouldn't change a thing about how he pitches.
ok
i know you use FIP in fantasy baseball to decide what pitchers to pick, but is the change from FIP to SIERA enough to statistically significantly increase your winning percentage?
I can't believe neither this site nor BP comments mentioned what I think was a crucial takeaway from Colin's article:
That's an incredibly incisive indictment of skill-based metrics like FIP, xFIP and SIERA. It calls into question some of the basic tenets of why we are constructing skill-based metrics to begin with. It also opens up a possible avenue into a completely different set of statistics that properly regress their components.
This also opens up possibilities for better methods for analyzing model fit and predictive accuracy.
I think that Colin made an error by introducing these two main points in an article about SIERA vs. FIP, which then became a discussion regarding formula complexity. These two points are independent and distinct, and I'd love to see more discussion about the points made above, rather than the complexity issue.
There are many people around here who are waaaaay more qualified than me to answer, but since no one has chimed in yet:
FIP looks at HR, BB, and K rates and treats these as the basic skills. It doesn't look at how the effects of these skills interact.
So if a pitcher starts walking an extra guy per inning, his FIP goes up by 3, regardless of what his HR or K rates are.
SIERA tries to take into account how the skills interact. For example, if a HR-prone pitcher starts walking extra batters, his SIERA goes up more than if he were a low-HR guy.
I think that's the basic idea. The article introducing SIERA is here.
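For the FIP half of that, a minimal Python sketch using the commonly published formula, (13*HR + 3*(BB + HBP) - 2*K)/IP plus a league constant. The constant of 3.10 and the pitcher lines are assumptions for illustration; the real constant is recalculated every season. SIERA's interaction terms are not reproduced here.

```python
FIP_CONSTANT = 3.10  # assumed value; the actual constant changes by season

def fip(hr, bb, hbp, k, ip, constant=FIP_CONSTANT):
    """Fielding Independent Pitching from raw counting stats."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

ip = 200
# Two made-up pitchers: one stingy with home runs, one homer-prone.
for hr in (10, 35):
    base = fip(hr, 50, 5, 180, ip)
    # Same pitcher with one extra walk per inning (200 more walks).
    wild = fip(hr, 50 + ip, 5, 180, ip)
    print(f"HR={hr:2d}  FIP={base:.2f}  with extra walks={wild:.2f}  change={wild - base:.2f}")
```

The change is the same for both pitchers because every term in FIP is additive; a formula with interaction terms would give the homer-prone pitcher a bigger penalty.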
I agree on both points, although I think there are a couple of (quite fixable) problems with his critique.
One was here:
It seems to me that the most important claim of the article hasn't been tested properly. If you think the claims for SIERA "are as imaginary as they can get", then why not see if they still disappear when a non-crude regression is applied to each of ERA, xFIP, FIP and SIERA? For example, why not, for each predictor, regress to the SD that provides the lowest RMSE? If I want to know whether SIERA is measuring something that other predictors aren't, I want to see which is closest when each is at its best, not which is closest after each has been run through a crude filter. Now maybe he has done something like this; if so, I haven't found it yet, which may very well be my fault.
Also, one of the claims that the creators of SIERA made is that
If SIERA claims to measure the skill of certain pitchers particularly well, then one should probably test those separately before "moving on".
Isn't the purpose of regressing home runs to fly balls to find a proper regression? I don't understand what you mean here.
It definitely does call for more research regarding proper regression of analytic metrics like ERA, FIP, xFIP and SIERA. I think a viable finish to the article would be a call for further research in this area inasmuch as this is the first time I've seen this particular critique, and it's an extremely important matter for these metrics.
#26/tshipman: The argument xFIP makes is that HR/FB should be 100% regressed to league avg, essentially assuming that pitchers have very little ability to influence HR/FB. The fact that xFIP appears to predict ERA better than FIP supports this assumption. Colin is making the argument that the assumption is incorrect. xFIP does better than FIP simply because FB-rate has lower variance than HR/PA. Thus, by using lgAvg HR/FB * pitcher FB/PA, you reduce the variance in your metric. This makes the predictor better because you are "regressing to the mean."
The question with regression is always, "how much do you regress?" xFIP in effect picks an arbitrary regression amount based on the variance difference between HR/PA and FB/PA. This is most likely not the optimal amount to regress, but it's better than doing no regression (FIP). That's why xFIP does better than FIP. It's not because HR/FB is outside of pitcher ability; in fact, Colin gives data that clearly shows pitchers do control HR/CON, which has additional value over and above FB/CON.
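The "how much do you regress?" question can be sketched with a toy simulation in Python. Everything here is an assumption for illustration: the league mean, the talent and noise spreads, the normal distributions, and the grid of weights. It is not Colin's method or data, just a demonstration that some amount of regression beats none, and that the best amount depends on the ratio of talent variance to noise variance.

```python
import random

random.seed(2011)  # fixed seed so the run is repeatable

LEAGUE_MEAN = 4.00   # hypothetical league-average rate
TALENT_SD = 1.0      # hypothetical spread of true talent
NOISE_SD = 1.0       # hypothetical luck in one sample of playing time
N_PITCHERS = 5000

talent = [random.gauss(LEAGUE_MEAN, TALENT_SD) for _ in range(N_PITCHERS)]
first = [t + random.gauss(0.0, NOISE_SD) for t in talent]    # observed sample
second = [t + random.gauss(0.0, NOISE_SD) for t in talent]   # sample to predict

def rmse_for_weight(w):
    """RMSE predicting `second` when `first` keeps weight w and the rest goes to the mean."""
    sq = 0.0
    for f, s in zip(first, second):
        pred = LEAGUE_MEAN + w * (f - LEAGUE_MEAN)
        sq += (s - pred) ** 2
    return (sq / N_PITCHERS) ** 0.5

weights = [i / 20 for i in range(21)]            # 0.00, 0.05, ..., 1.00
results = {w: rmse_for_weight(w) for w in weights}
best_w = min(results, key=results.get)

# w = 0 is full regression to the mean, w = 1 is no regression at all.
for w in (0.0, best_w, 1.0):
    print(f"weight on observed={w:.2f}  RMSE={results[w]:.3f}")
```

With equal talent and noise spreads, the theoretical best weight is TALENT_SD^2 / (TALENT_SD^2 + NOISE_SD^2) = 0.5, and the grid search should land close to it. A metric that happens to regress by some other amount can still beat an unregressed one without telling you anything about what pitchers control.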
He basically calls into question how effective all the newer estimators are outside of their ability to regress to the mean. Is FIP better because pitchers don't influence BABIP or because FIP regresses more than ERA? Is xFIP better than FIP because pitchers don't influence HR/FB or because xFIP regresses more than FIP? He gives evidence that the bulk of the improvement we see for xFIP and SIERA is due to regression, and also that the difference between FIP/xFIP/SIERA and ERA is much smaller than previously thought.
What he doesn't do is a more rigorous analysis (like Jittery is calling for above) of how much of the improvement is due to regression vs. actually reducing noise due to abilities outside of the pitchers' control. This is definitely the next step in this line of analysis. The step after that would be to rethink the way we calculate these metrics and how the different components regress independently, and come up with a new metric that does this properly.
For the record, even having a statistic like FRA to measure the value already contributed by the pitcher, I don't think it's a problem to have "mini projection systems" (as Colin insinuates that they are) like SIERA to fill the role of predicting future performance. Projection systems can't generally be run on a daily basis, so component-based statistics that purport to measure underlying skill can still be useful even if Colin's criticisms are taken to be accurate.
Okay. I see what you mean here now. You're right that it's interesting, but I don't know how much more information even a perfectly regressed statistic provides.
Thank you for explaining it.
Certainly, that glimpse of the "truth" can be improved upon by using better regression techniques. In my mind, that's what projection systems should do. However, I don't feel qualified to state whether SIERA, or any other stat, is an improvement or not. (and reading Colin's article, along with Matt's reply, won't help. Not enough brain cells in this old head of mine.)