Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Wednesday, November 11, 2009

Mike Silva: WAR doesn’t actually lead to real wins in the standings

The Mets looking at Pat Burrell is funny, but not nearly has funny as another case for Mike Cameron- does anyone here watch this game or remember Cameron’s first stint? Guys, hate to break this to you, but WAR doesn’t actually lead to real wins in the standings- you know that right? Part of me would pay to see these guys run a team, I think it might be for some good copy at the very least.

***

There brand is suffering and Mike Cameron isn’t about to help it. It’s about winning, but also attracting customers (i.e. fans) to the seats. The Yankees actually do both. While they field all stars Mets fans sit around and rationalize secondary tier players. Do you think the Yankees rationalize a secondary player because of WAR? Of course not thats why they signed Burnett, Sabathia, Teixeira, etc.

*blank look*

Freeballin' (Tales of Met Power) Posted: November 11, 2009 at 12:25 AM | 105 comment(s)
  Related News: General, Sabermetrics, Milwaukee, NY Mets, NY Yankees, Tampa Bay

Reader Comments and Retorts

   1. Halofan Posted: November 11, 2009 at 12:51 AM (#3384526)
Mike "There Brand" Silva
   2. snapper (history's 42nd greatest monster) Posted: November 11, 2009 at 01:02 AM (#3384539)
Really? Because I'm pretty sure if you added up the WAR of the Yankee players it would be a lot higher than the Mets.
   3. Enrico Pallazzo Posted: November 11, 2009 at 01:06 AM (#3384548)
Good grief, be sure to read all of the comments on his site. It's painful. And it seems like one of the commentors can't wait to see what you guys have to say about this. Though by now, I wouldn't be surprised if you'd all just given up on harassing Silva for his ineptitude.
   4. Shooty: Applying to be Fearless Leader Posted: November 11, 2009 at 01:10 AM (#3384551)
WAR, good god y'all, what is it good for?
   5. SoSHially Unacceptable Posted: November 11, 2009 at 01:10 AM (#3384552)
Mike "There Brand" Silva


It's in the comments, so I'm more inclined to give him a pass on that than on "has funny."
   6. Shooty: Applying to be Fearless Leader Posted: November 11, 2009 at 01:16 AM (#3384557)
And it seems like one of the commentors can't wait to see what you guys have to say about this.

Is that Joey a primate? Out yourself!
   7. James Kannengieser Posted: November 11, 2009 at 01:24 AM (#3384566)
Seems like whenever traffic at New York Baseball Digest bottoms out, Silva turns to old faithful: saber-trolling. It's like that girl in high school no one pays attention to until she starts putting out.
   8. Harry Balsagne's transparent jealousy Posted: November 11, 2009 at 01:24 AM (#3384567)
And it seems like one of the commentors can't wait to see what you guys have to say about this. Though by now, I wouldn't be surprised if you'd all just given up on harassing Silva for his ineptitude.

Honestly, I can barely discern his screed from anyone else's. He's just another droning anus in the MSM choir as far as I'm concerned. The only thing special about Silva is that he decided to create a BBTF profile and openly troll on the site. Otherwise I think his act would hardly even register here.
   9. pkb33 Posted: November 11, 2009 at 01:58 AM (#3384596)
Part of me would pay to see these guys run a team, I think it might be for some good copy at the very least.

Being in the hands of 'these guys' in the front office has worked out just fine in Boston, thanks.
   10. Dan The Mediocre Posted: November 11, 2009 at 02:04 AM (#3384601)
The only thing special about Silva is that he decided to create a BBTF profile and openly troll on the site.


I believe the correct term is "Obvious troll is obvious."
   11. tl; dr (Voxter) Posted: November 11, 2009 at 02:25 AM (#3384616)
Good God, can we stop linking to this dimwit? He's a bottom-feeder.
   12. sunnyday2 Posted: November 11, 2009 at 02:38 AM (#3384621)
WAR doesn’t actually lead to real wins in the standings


Well, actually, this is correct, isn't it? WAR doesn't lead to anything. It is a way of measuring what has already happened, I think. Sort of like batting average and rbi.

Or I could just say ditto #2.

I think it might be for some good copy at the very least.


Good copy which I, Mike Silva, would be incapable of writing.
   13. zonk Posted: November 11, 2009 at 02:52 AM (#3384627)
If not for trouble makers like Mike Silva, we would finally be able to do away with this 'playing the game' thing - which, as we all know, causes us great pain and anguish.

I said we should have gotten spaceships, a hot chick to front us, and promised everyone free health care... but nooo... you guys wanted to go the internet route.

Now the resistance, fronted by people like Silva, remains a constant thorn in our grand scheme.
   14. Hugh Jorgan Posted: November 11, 2009 at 03:05 AM (#3384635)
WAR, good god y'all, what is it good for?

According to Silva....Absolutely nothing, huh!
   15. Dock Ellis on Acid Posted: November 11, 2009 at 03:08 AM (#3384637)
Tolstoy wrote that.
   16. The District Attorney Posted: November 11, 2009 at 03:44 AM (#3384650)
It was his mistress who convinced him to call it War and Peace.
   17. Alex meets the threshold for granular review Posted: November 11, 2009 at 04:28 AM (#3384661)
He did not need inspiration. God spoke through his pen.
   18. Darren Posted: November 11, 2009 at 04:35 AM (#3384666)
What... is.... that.... noise?
   19. Dan The Mediocre Posted: November 11, 2009 at 04:43 AM (#3384668)
Well, actually, this is correct, isn't it? WAR doesn't lead to anything.


WAR correlates to actual wins fairly well (r^2=.83). That's a very strong correlation, so it's not as if it's abstract.
   20. Tom Nawrocki Posted: November 11, 2009 at 04:51 AM (#3384670)
It's an interesting question, if you can put aside the snark for a minute: Is WAR measuring actual team wins? Let's take a look at those Yankees, who according to Fangraphs accumulated 56.9 WAR. Since they won 103 games, we can use this figure to assess replacement level at around 46 wins, or 46-116. That sounds about right.

Let's see how the rest of the AL East measures up. The Red Sox totaled 51 WAR, which should give them 97 wins - and they won 95, which isn't too far off. The Rays also totaled 51 WAR, which should also give them 97 wins - and they won 84 games. Oops.

To finish out the division, the Blue Jays had 39 WAR, which should give them about 85 wins - and they won 75 games. The Orioles had 23 WAR, which adds up to 69 wins, and they really won 64. At this point, that's a pretty good result.

Maybe I have not pegged replacement level properly. (I looked around on the Fangraphs site and couldn't find anything saying how many games a replacement level team should win, but it's possible I missed it.) The average team in the AL East won 84 games, and had 44.2 WAR, which would suggest replacement level is 40 wins. That would give the Yankees 97 projected wins, the Red Sox 91, the Rays 91, the Jays 79 wins, and the O's 63. Now we're getting somewhere; four of the five teams are within four wins of their actual results. But we still have a major problem here: The Red Sox and Rays were separated by 11 games in the standings, yet totaled the same amount of WAR. This really can't be. (This, by the way, is how you end up with people saying nutty things like Ben Zobrist was the MVP of the league.)

Let's check another division, the NL West. The average team there won the same 84 games, and earned 35 WAR. Right away, we see another problem: We've got the same average number of wins as in the AL East, but almost ten less WAR per team. Here we have replacement level pegged at 49 wins, 49-113. Here are the NL West teams:

Dodgers: 43.3 WAR, leading to 92.3 projected wins, 95 actual wins.
Rockies: 42.3 WAR, leading to 91.3 projected wins, 92 actual wins.
Giants: 34 WAR, leading to 83 projected wins, 88 actual wins.
Padres: 21.7 WAR, leading to 70.7 projected wins, 75 actual wins.
Diamondbacks: 33.5 WAR, leading to 82.5 projected wins, 70 actual wins.

Four of the teams, again, are within five wins, except that Arizona is way out of whack. Like Tampa Bay, the Arizona players are going to be seriously overvalued by this system, which thinks they're better than average rather than a last place team. The Diamondbacks accumulated half of one WAR less than the Giants, who won 18 more games.

Now I'm no expert in this stuff, and I will fully grant that the Fangraphs guys know way more about all this than I do. It's very possible that I am missing something. For one thing, I am sure I am not figuring replacement-level wins the way I'm supposed to, but that wouldn't explain the Rays and Red Sox with identical WAR, or the Giants and Diamondbacks with nearly identical WAR. If I'm wrong about this, feel free to explain to me why, but I have to come to the conclusion that WAR isn't really measuring wins.
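The arithmetic in comment 20 can be sketched in a few lines of Python. This is only a rough illustration under that comment's own assumption, namely that replacement level for a division is its average win total minus its average team WAR. The WAR and win figures are the ones quoted in the comment; the function and variable names are made up for this sketch and are not FanGraphs' method.

```python
# Team WAR (FanGraphs figures as quoted in comment 20) and actual 2009 wins.
AL_EAST = {
    "Yankees": (56.9, 103),
    "Red Sox": (51.0, 95),
    "Rays": (51.0, 84),
    "Blue Jays": (39.0, 75),
    "Orioles": (23.0, 64),
}
NL_WEST = {
    "Dodgers": (43.3, 95),
    "Rockies": (42.3, 92),
    "Giants": (34.0, 88),
    "Padres": (21.7, 75),
    "Diamondbacks": (33.5, 70),
}


def implied_replacement_wins(division):
    """Average actual wins minus average WAR (comment 20's assumption)."""
    n = len(division)
    avg_war = sum(war for war, _ in division.values()) / n
    avg_wins = sum(wins for _, wins in division.values()) / n
    return avg_wins - avg_war


def projected_vs_actual(division):
    """Map each team to (projected wins, actual wins)."""
    base = implied_replacement_wins(division)
    return {team: (base + war, wins) for team, (war, wins) in division.items()}


for name, division in (("AL East", AL_EAST), ("NL West", NL_WEST)):
    base = implied_replacement_wins(division)
    print(f"{name}: implied replacement level = {base:.1f} wins")
    for team, (projected, actual) in projected_vs_actual(division).items():
        gap = actual - projected
        print(f"  {team:13s} projected {projected:5.1f}  actual {actual:3d}  gap {gap:+5.1f}")
```

Because the baseline is fitted per division, the gaps within each division sum to roughly zero by construction; the interesting part is how unevenly they are spread across teams.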
   21. Ginger Nut Posted: November 11, 2009 at 05:05 AM (#3384675)
What WAR actually counts is the number of runs you would expect the events a player is responsible for to contribute to his team (or save), and then as a second-order calculation the number of wins that should result in. But all of these are on average, not a prediction of how every team will do. So in order to figure out if WAR is really measuring expected wins in a useful way, you'd have to run a regression analysis to see if there's a statistically significant correlation between WAR and the actual number of games won, with a sufficiently high number of teams in the data set. You would EXPECT that there will be some outliers; teams that overperformed or underperformed their WAR. That's how randomness works; some teams will win a lot of one run games, some teams will have their run-producing (or preventing) events distributed through bad luck in ways that are disadvantageous to them. So you would fully expect that sometimes you will see strange things, like, team A and team B having the same WAR but very different win totals. Just as with pythagorean won-loss, we expect that there will be some outliers every year and once in a while a real doozy.
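As a rough illustration of the check comment 21 describes, here is a minimal Python sketch that computes the Pearson correlation between team WAR and actual wins. It uses only the ten teams quoted in comment 20, which is far too small a sample to settle anything; a real test would use every team over several seasons. The helper name `pearson_r` is made up for this sketch.

```python
from math import sqrt

# (team WAR, actual wins) for the ten 2009 teams quoted in comment 20.
TEAMS = [
    (56.9, 103), (51.0, 95), (51.0, 84), (39.0, 75), (23.0, 64),  # AL East
    (43.3, 95), (42.3, 92), (34.0, 88), (21.7, 75), (33.5, 70),   # NL West
]


def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


r = pearson_r(TEAMS)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

Note that r and r^2 are different numbers, so it matters which one a quoted ".83" refers to.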
   22. Walt Davis Posted: November 11, 2009 at 05:11 AM (#3384677)
I have to come to the conclusion that WAR isn't really measuring wins.

I'm not a fan of uber-statistics so won't really defend WAR but ... c'mon, you know this is silly.

Or maybe you believe that run differential and outscoring your opponents isn't how you measure wins either.

The Marlins out-scored their opponents by 4.8 to 4.7 runs a game. They won one more game than the Atlanta Braves who outscored their opponents by 4.5 to 4.0 runs a game. The Astros were outscored 4.0 to 4.8 runs a game yet won 12 more games than the Pirates who were also outscored 4.0 to 4.8 runs a game.
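The point in comment 22 can be made concrete with the classic Pythagorean expectation, win% = RS^2 / (RS^2 + RA^2). The sketch below applies it to the per-game run figures quoted in that comment. It assumes an exponent of 2 and a 162-game schedule for every team, and the function name is made up for illustration.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from runs scored and allowed (classic exponent of 2 assumed)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


# Runs scored and allowed per game, as quoted in comment 22.
TEAMS = {
    "Marlins": (4.8, 4.7),
    "Braves": (4.5, 4.0),
    "Astros": (4.0, 4.8),
    "Pirates": (4.0, 4.8),
}

for team, (rs, ra) in TEAMS.items():
    print(f"{team:8s} expected wins: {pythagorean_wins(rs, ra):.1f}")
```

On these inputs the Braves come out well ahead of the Marlins and the Astros and Pirates come out identical, even though the actual standings said otherwise, which is the same kind of gap the thread is arguing about with WAR.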
   23. Robinson Cano Plate Like Home Posted: November 11, 2009 at 05:18 AM (#3384680)
A .333 batting average doesn't actually mean you got one third of a hit each time up.
   24. Tom Nawrocki Posted: November 11, 2009 at 05:19 AM (#3384682)
Or maybe you believe that run differential and outscoring your opponents isn't how you measure wins either.


I believe you measure wins by counting how many games each team has won.

Look, if you think that this stat is measuring something significant, that's great. But if the Wins Above Replacement for the players on the 2009 Diamondbacks bear no relation to the games actually won by the 2009 Diamondbacks - and they don't - then don't give yourself more credit than you've earned by calling them Wins. Call them Cameron Points or Ultimate Team Contribution or something. But they're not Wins.
   25. Tom Nawrocki Posted: November 11, 2009 at 05:23 AM (#3384685)
A .333 batting average doesn't actually mean you got one third of a hit each time up.


Hey, good one. But it does mean you got an "average" of a third of a hit every time up, hence the name.
   26. sunnyday2 Posted: November 11, 2009 at 05:34 AM (#3384689)
Well, actually, this is correct, isn't it? WAR doesn't lead to anything.

WAR correlates to actual wins fairly well (r^2=.83). That's a very strong correlation, so it's not as if it's abstract.


You missed my point, which was semantic, I guess. It's a question of what comes first. The event comes first, the WAR comes after as a means of describing what happens. So, no, WAR doesn't lead to wins, wins (and losses) lead to WAR.
   27. Ginger Nut Posted: November 11, 2009 at 05:38 AM (#3384691)
Tom, they're expected wins. Why is it useful? Because teams often get lucky or unlucky, and it's useful to know that when evaluating a team (i.e., a team with 35 WAR that wins 90 games got very lucky--need to know that for next year). In addition, for each individual player it tells us how much the player's actions were worth on average, so it's a tool for evaluating expected performance, e.g., player A had 110 RBIs--that's how many runs he contributed in the real games, but was he lucky? Was it lineup position? How much were his actions on the field worth on average? Do we trade for this guy? Sign him to an extension? etc.

If a team scored 750 runs and gave up 780 runs and had a winning record, do you think that team should expect to stand pat and have a winning record again the following season? Or should they recognize that they got lucky and act accordingly? Yes, you measure wins by counting how many games each team won, but you measure expected wins more effectively by looking at the batting and defensive stats for each play throughout the season, which in baseball unlike other sports happens to be very easy to do.

The position you're taking is akin to saying that the only stats that matter are RBIs and runs scored--after all, those are the runs that actually scored, right? So why do we care about this "batting average"? We can measure how many runs a player actually contributed in the real games by looking at his RBIs and runs scored! That's pretty much exactly what you're saying about team wins.

These are pretty much the most basic starting issues of intelligent baseball analysis, explained very well I think in Pete Palmer and John Thorn's book The Hidden Game of Baseball, among other places, so I'm sort of confused by why such a reliably intelligent and well informed poster as yourself is inciting this discussion.
   28. Eric J is Financed by a Rich Grandpa Posted: November 11, 2009 at 05:38 AM (#3384692)
wins (and losses) lead to WAR.

Or, in concession to Tom's complaint, the things that tend to lead to wins and losses lead to WAR.
   29. SoSHially Unacceptable Posted: November 11, 2009 at 05:46 AM (#3384693)
The position you're taking is akin to saying that the only stats that matter are RBIs and runs scored--after all, those are the runs that actually scored, right?


No, Tom's point, I believe, is to call them something different. The names RBIs and Runs Scored describe perfectly what they measure. The names given to some other more recent statistical measures have a tendency to overstate what it is they measure, which kind of invites these kinds of screeds from the Silvas of the world.
   30. Benji Gil Gamesh is not being paid to be that guy Posted: November 11, 2009 at 06:06 AM (#3384694)
Is Silva the "stat zombie" guy? Or am I getting him confused with someone else?
   31. An Athletic in Powderhorn Posted: November 11, 2009 at 06:17 AM (#3384699)
Yep, he's the guy your handle's based on. He's also suggested that stat zombies don't really like baseball, they just like numbers. Delicious, delicious numbers.
   32. Repoz Posted: November 11, 2009 at 06:25 AM (#3384700)
Is Silva the "stat zombie" guy? Or am I getting him confused with someone else?

Fun-lovin' Paul Lebowitz is the "stat zombie" guy.

Zombies!
   33. Tom Nawrocki Posted: November 11, 2009 at 06:29 AM (#3384703)
Because teams often get lucky or unlucky, and it's useful to know that when evaluating a team (i.e., a team with 35 WAR that wins 90 games got very lucky--need to know that for next year).


So are you saying that the difference between WAR and actual wins is mere luck? That the Rays and Red Sox were of identical quality in 2009? Do you think the 2009 Diamondbacks were an above-average team? It boggles my mind that people can look at discrepancies like that and assume that the wins themselves are somehow wrong, rather than the statistical analysis that led to such conclusions. If this measure thinks the 88-74 Giants and the 70-92 Diamondbacks were of roughly equal quality, don't you start to wonder if maybe there's something amiss in the process that led to that evaluation?

Yes, you measure wins by counting how many games each team won, but you measure expected wins more effectively by looking at the batting and defensive stats for each play throughout the season, which in baseball unlike other sports happens to be very easy to do.


It is certainly not "very easy" to evaluate the defensive stats for each play. Different defensive metrics can provide very different results.

We can measure how many runs a player actually contributed in the real games by looking at his RBIs and runs scored! That's pretty much exactly what you're saying about team wins.


What I'm saying is that assuming that the performance of the 2009 Rays was equally as good as that of the 2009 Red Sox is a poor assumption to make. There is no reason in the world to think that (the two teams weren't even that close in Pythagorean record), other than that's what WAR tells us. And I'm saying that any further assumption derived from that initial estimate is very likely wrong.
   34. Jeff K. Posted: November 11, 2009 at 06:40 AM (#3384704)
Hell, Lebowitz is still using that line. From his review of his preseason predictions:

The Twins always seem to outplay their projections whether they're formulated by stat zombie tenets or the way I come to my conclusions----don't ask.

Don't worry, I won't.

(EDIT) Love this gem:

I vacillated on the Tigers. There were three ways for them to go. Either they were going to utterly collapse into 2008 Padres/Mariners territory and lose over 100 games; they were going to be somewhere in the middle; or they were going to have a bounce back year and contend.

Why yes, up, down, or sideways *are* three ways to go. ####### slap me with a trout.
   35. Jeff K. Posted: November 11, 2009 at 06:43 AM (#3384705)
Okay, which one of you ####### Mets fans is teasing him with crank emails?

There was a rumor that the framework of a three-way deal was being discussed by the Mets, Cubs and Blue Jays that would send Luis Castillo to the Cubs; Milton Bradley to the Blue Jays; and Lyle Overbay to the Mets.
   36. greenback Posted: November 11, 2009 at 06:48 AM (#3384706)
What I'm saying is that assuming that the performance of the 2009 Rays was equally as good as that of the 2009 Red Sox is a poor assumption to make. There is no reason in the world to think that (the two teams weren't even that close in Pythagorean record), other than that's what WAR tells us.

Pythagorean doesn't get quite as fine as WAR does though. My first guess here would be that Boston was very good with RISP (or Tampa Bay was bad), and a quick check of B-R's splits (here and here) confirms that. So at the very least I'd say there's some reason besides WAR to think the 2009 Rays and Sox were closer than what their raw W-L records or Pythags suggest.
   37. NYCTigersfan Posted: November 11, 2009 at 06:53 AM (#3384708)
So are you saying that the difference between WAR and actual wins is mere luck?

Tom, I think your overall point is valid but I don't think comparing to wins is very relevant. As Ginger Nut noted, WAR doesn't purport to predict wins, it purports to predict things that -- broadly speaking -- contribute to wins: run creation and run prevention.

I'm not an investment banker so this may be flawed, but I look at the difference between evaluating a team by WAR rather than wins as similar to valuing a company by EBITDA rather than income.
   38. David Cameron Posted: November 11, 2009 at 07:09 AM (#3384710)
Tom,

I did a post over at FanGraphs about this not too long ago. As noted earlier in the thread, the correlation between projected win total using WAR and actual wins was .83. That's a far cry from the "no relation" that you're suggesting.

But, let me offer up an example. WAR is a measure of an individual's performance without regards to context or the performance of others. It just measures what that guy contributed to the team's goal of winning. Think of it like a production line - if all the guys who are making parts for the computer do their job just fine, but the one assembler at the end doesn't put the pieces together, you have zero finished products, but you wouldn't want to suggest that your entire line was producing nothing.

WAR is measuring how well the individuals create inputs. That the total output doesn't match the total inputs doesn't invalidate the production of the inputs.

Most of the variance that is apparently giving you problems is due to WAR ignoring the timing of events. Because it basically boils down to a lot of different context neutral linear weights, it assumes that the distribution of events will be equal. This is obviously not true in real life, and is the driving cause of the variation between WAR and actual wins. Any statistic that ignores the timing of related events is necessarily going to give up the ability to match what actually happened. If that's the test you're going to apply to statistics, then you'll reject every context neutral metric out there.

However, there are a lot of reasons we want context neutral statistics. They do a better job than situational dependent statistics in projecting nearly everything. They are a better measure of true talent level. They serve a real, significant purpose, but by design, they will not match up with what happened in a season. It's not a flaw - it's a by-product of the design.

It's not evidence of a problem that the Red Sox and Rays had the same WAR but wildly different actual win totals. It's that determination that allows us to notice that the significant gap between the two teams was far more about when the players performed than how they performed. That matters in the final standings, but far less so when thinking about the future.
   39. Jeff K. Posted: November 11, 2009 at 07:10 AM (#3384711)
That's not quite right. To be better, you'd need to have a valuation metric that uses WACC, if WACC were less exactly defined than it is. You need two things: the concept of baseline, and properly weighting outputs by the inputs used to obtain them. EBITDA and income do neither.

Something like Earnings/Working Capital - GDP growth would give you a % number that could be called Cents Against Replacement per Dollar (CARD).
   40. Tuque Posted: November 11, 2009 at 07:18 AM (#3384714)
I just want to say, on a purely intellectual level, the comments on this thread are as fascinating as any article that's been posted all year.

When you get down past the occasional machismo, trolls, and flame wars (it is the Internet), BBTF is an excellent site.
   41. Jeff K. Posted: November 11, 2009 at 07:22 AM (#3384715)
The issue, of course, in comparing financial valuation metrics and baseball metrics is that in raw terms, financial metrics are built out of millions of tiny transactions just like baseball metrics, but the small transactions in business are measured in the same terms as the big ones. This isn't true in baseball. If I want to know earnings for ACME, I could add up the orders from retailers for ACME goods and subtract out the returns. If I want to know the number of hits for the Yankees, I can add up the hits for their hitters. But "Hits total" isn't the be-all, end-all stat for teams, it's wins. Yet I don't measure each plate appearance in wins. So I have to figure out how to get to wins from hits. And then decide whether the approximation of wins I can get from hits is good enough. If not, time to use another stat, or a combination.

It's the conversion from hits to wins that doesn't exist in finance, and what leads to consternation. Believe me, if and when the time comes that business as a whole looks at marketing dollars and their resultant impact on sales, something I wrote a paper in one or another finance class about, you'll see the same consternation. It's a dirty little secret, but ad dollars for MSM outlets are terrible uses of money, but the industries relying on those dollars (network TV, newspapers, magazines) have succeeded in avoiding deep questions. When the first real wave of analysis comes, you can bet the first words out of their mouths will be complaining about the weights applied to the inputs that determine the final value of each campaign.

And that's still in dollars spent/dollars returned totals. To really be like baseball, ad buys would have to be paid for in yen while sales are in dollars. Silva and his ilk would, even given perfectly proven relationships between ad yen spent / $ sales return, be complaining about the exchange rate. This ignores the fact that no matter what the exchange rate is, it's the relationship that matters. The exchange just takes the relative value that the raw numbers and relationship give you and marks it to a common end.
   42. Drew (Primakov, Gungho Iguanas) Posted: November 11, 2009 at 07:52 AM (#3384716)
The only thing special about Silva is that he decided to create a BBTF profile and openly troll on the site.


Oh god, I remember this. It was complete hilarity.
   43. Morally Excellent Posted: November 11, 2009 at 10:08 AM (#3384723)
WAR doesn’t actually lead to real wins in the standings


I doubt that.
   44. Avoid running at all times.-S. Paige Posted: November 11, 2009 at 10:15 AM (#3384726)
Most of the variance that is apparently giving you problems is due to WAR ignoring the timing of events. Because it basically boils down to a lot of different context neutral linear weights, it assumes that the distribution of events will be equal. This is obviously not true in real life, and is the driving cause of the variation between WAR and actual wins. Any statistic that ignores the timing of related events is necessarily going to give up the ability to match what actually happened. If that's the test you're going to apply to statistics, then you'll reject every context neutral metric out there.


Then is it fair to say that WAR would have a much closer relation to Pythagorean record since that also ignores the timing of events?
   45. tjm1 Posted: November 11, 2009 at 12:23 PM (#3384730)
It's not evidence of a problem that the Red Sox and Rays had the same WAR but wildly different actual win totals. It's that determination that allows us to notice that the significant gap between the two teams was far more about when the players performed than how they performed. That matters in the final standings, but far less so when thinking about the future.


Well, the Red Sox scored about 70 more runs than the Rays, and allowed 18 fewer runs. It could be that the difference is all context/timing, or it could be that there are some minor flaws in the weighting of the events. It's probably some combination of the two.
   46. bunyon Posted: November 11, 2009 at 12:47 PM (#3384739)
Then is it fair to say that WAR would have a much closer relation to Pythagorean record since that also ignores the timing of events?


This seems a good question; what is the WAR correlation with Pyth. wins? Is it higher or lower than 0.83?


And, by the way, 0.83 is not equal to 1. I agree with Tom's point completely, if a stat says a 70 win team and 90 win team are roughly equivalent, it's missing something important. If it were just random fluctuation, you'd expect a lot of 70 win teams to turn around into 90 win teams the next year and vice versa; in real life that just doesn't happen.


I'm not well versed in advanced metrics but think they're very useful. But the easiest thing to do in science is make a conclusion grander than the data.

Well, the Red Sox scored about 70 more runs than the Rays, and allowed 18 fewer runs. It could be that the difference is all context/timing, or it could be that there are some minor flaws in the weighting of the events. It's probably some combination of the two.

This seems very reasonable.
   47. Freeballin' (Tales of Met Power) Posted: November 11, 2009 at 01:22 PM (#3384747)
A lot of very reasonable explanations on this thread for why total WAR and wins do not correlate exactly. Another that I don't see, the expression of which might unruffle some feathers -- WAR is not a perfect measure of what it attempts to measure. It's imperfect. It may include imprecisely measured defensive contributions, not properly account for "clutch" or "unclutch" tendencies if there are such things, and may not weight its components exactly right.

In other words, it does a very good, though not perfect, job of measuring the things that players do on both sides of the ball that contribute to wins. Obviously a team wants wins before it wants WAR, but when deciding what players to add or let go, it's a useful tool in evaluating their worth on the field.
   48. Shooty: Applying to be Fearless Leader Posted: November 11, 2009 at 01:35 PM (#3384751)
In other words, it does a very good, though not perfect, job of measuring the things that players do on both sides of the ball that contribute to wins. Obviously a team wants wins before it wants WAR, but when deciding what players to add or let go, it's a useful tool in evaluating their worth on the field.

This is right, to me. I like WAR because it gives me a shorthand way of evaluating a player's total value to his team, even if I am a bit skeptical of defensive stats. I certainly don't use WAR as a hammer in arguments because there are some vagaries there about replacement level and defense and such, but WAR is a great place to START a discussion. Which, bringing it back to Silva and his ilk, it's kind of sad that they don't embrace it. WAR is a great conversation piece. If I'm a sportswriter, I'd probably spend an hour a day at fangraphs looking for anomalous stats to build a piece around. Is Ryan Sweeney better than Torii Hunter? Who is Ben Zobrist and was he the best player in baseball in 2009? And the thing is, these articles wouldn't even be hard to write. Maybe not as easy as casting stones at basement dwellers, but not really hard, either. Which makes me wonder why some of these guys even write about baseball. Nothing about the sport seems to interest them. Do they just like having an audience? Do they love the sound of their own voices that much? It's very puzzling.
   49. snapper (history's 42nd greatest monster) Posted: November 11, 2009 at 02:09 PM (#3384772)
This seems a good question; what is the WAR correlation with Pyth. wins? Is it higher or lower than 0.83?

I think it should be higher, b/c it removes the uneven distribution of runs across games from the situation, but still far from a perfect fit b/c pythag wins still includes the uneven distribution of run scoring events, i.e. hitting with RISP vs. not.

Basically, WAR takes all the component plays that lead to run scoring/prevention and calculates how many wins they'd generate if they were distributed randomly across games and game states.

It won't jibe with pythag wins b/c run scoring events are not distributed randomly within game states (e.g. HR/BB/BB/K/K/K produces a very different run outcome than K/K/BB/BB/HR/K but looks the same to WAR).

It won't jibe with actual wins for the same reasons, and, additionally, b/c of the non-random distribution of runs across games, e.g. unusually good records in 1-run games, or an abnormally large # of blow-outs.
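
To make the ordering point concrete, here is a minimal sketch that scores the two sequences above. The base-running model is an assumption made for illustration (only K, BB and HR are handled, and runners move only when forced by a walk); it is not how WAR or any run estimator is actually computed.

```python
def runs_in_inning(events):
    """Count runs for a sequence of plate-appearance outcomes in one inning.

    Simplified, assumed model: only "K", "BB" and "HR" are supported, and
    runners advance only when a walk forces them.
    """
    bases = [False, False, False]  # first, second, third
    outs = 0
    runs = 0
    for event in events:
        if outs == 3:
            break  # inning is over; later events never happen
        if event == "K":
            outs += 1
        elif event == "BB":
            if all(bases):
                runs += 1  # bases loaded: the walk forces in a run
            elif bases[0] and bases[1]:
                bases[2] = True  # runner on second is forced to third
            elif bases[0]:
                bases[1] = True  # runner on first is forced to second
            bases[0] = True  # batter takes first
        elif event == "HR":
            runs += 1 + sum(bases)  # batter plus everyone on base
            bases = [False, False, False]
        else:
            raise ValueError(f"unsupported event: {event}")
    return runs


early_hr = runs_in_inning(["HR", "BB", "BB", "K", "K", "K"])
late_hr = runs_in_inning(["K", "K", "BB", "BB", "HR", "K"])
print(early_hr, late_hr)
```

Both innings contain the same six events, so a context-neutral stat credits them identically, yet under this toy model the first scores 1 run and the second scores 3.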
   50. Barnaby Jones Posted: November 11, 2009 at 02:20 PM (#3384780)
Then is it fair to say that WAR would have a much closer relation to Pythagorean record since that also ignores the timing of events?


Pythagorean record ignores the timing of runs, but not events generally. WAR will always treat hits/walks/etc. the same way (all else being equal: park factors, yadda yadda), whereas Pythagorean record only accounts for them if they turn into runs.
   51. Who wants to know? Posted: November 11, 2009 at 02:29 PM (#3384795)
I agree with Tom's point completely, if a stat says a 70 win team and 90 win team are roughly equivalent, it's missing something important.


What about a stat that says 100 teams should win about 70 games, and 99 of them do, but 1 of them wins 90 games?

In other words: We shouldn't just point to one example of teams with similar WARs having very different win totals as evidence that WAR is fundamentally flawed. We need to know how common examples like that are.
   52. Ginger Nut Posted: November 11, 2009 at 02:49 PM (#3384812)
No, Tom's point, I believe, is to call them something different. The names RBIs and Runs Scored describe perfectly what they measure. The names given to some other more recent statistical measures have a tendency to overstate what it is they measure, which kind of invites these kinds of screeds from the Silvas of the world.


It probably would be more accurate to call it "Expected Wins above Replacement," but EWAR is less catchy as an acronym. How about Projected Optimal Wins Exceeding Replacement?

B Pro has VORP, which goes with "value" rather than wins or runs, but then they also use "WARP" which is back to "wins".
   53. Ginger Nut Posted: November 11, 2009 at 03:10 PM (#3384828)
I agree with Tom's point completely, if a stat says a 70 win team and 90 win team are roughly equivalent, it's missing something important.


If a stat tells us something surprising, the appropriate question is, does this mean the stat is missing something or that it's telling us something new that we should pay attention to? The answer is probably a combination of both. No "meta stat" is going to capture everything, so there are probably some events from each team's season that are worth maybe a couple of wins that aren't included. It's not in fact implausible that the rest of the difference can be explained by random chance. Here was the original objection:

Dodgers: 43.3 WAR, leading to 92.3 projected wins, 95 actual wins.
Rockies: 42.3 WAR, leading to 91.3 projected wins, 92 actual wins.
Giants: 34 WAR, leading to 83 projected wins, 88 actual wins.
Padres: 21.7 WAR, leading to 70.7 projected wins, 75 actual wins.
Diamondbacks: 33.5 WAR, leading to 82.5 projected wins, 70 actual wins.

Four of the teams, again, are within five wins, except that Arizona is way out of whack. Like Tampa Bay, the Arizona players are going to be seriously overvalued by this system, which thinks they're better than average rather than a last place team. The Diamondbacks accumulated half of one WAR less than the Giants, who won 18 more games.


So WAR thought the Giants were about 5 wins worse than their actual record for the season. Let's say WAR failed to account for 2 wins worth of events (let us suppose baserunning and defence, which maybe it isn't good at capturing), and the other three wins can be explained by random good luck. On the other side, you have the D-Backs underperforming what WAR would predict by 12.5 games. So let's say WAR missed about 2 and a half wins worth of significant events (maybe the D-Backs defence was worse than WAR thinks it is, for example, and maybe they were really bad baserunners or something like that), then we have 10 losses to explain by "bad luck." Is that really impossible? No, it's not. If WAR were consistently doing this, then yeah, that would raise a lot of doubts, but in this case the interesting point seems to be that the D-Backs MIGHT be a lot better than their record from last year. Since the D-Backs were predicted to win the division by many preseason prognosticators, I don't think that possibility should be viewed as ridiculous out of hand.

On the other hand, even a good analytic system is going to have some outliers. People who say that one example of WAR being off from the real win total by 10 wins or so proves the system is deeply flawed don't seem to understand how variation works. It doesn't mean that a team's win total will be exactly what WAR expects, it simply means that it's more likely to be close to what WAR expects and increasingly less likely, but still possible, the farther you get from what WAR expects. But in a normal distribution you would fully expect that there will be some teams at the extremes of each end of the bell curve. WAR and similar stats just tell us where the middle of that curve is located. It's like growing bean sprouts: you'll get a few really tall ones and a few really short ones, this doesn't mean that genetics "doesn't work" or that bean sprout height is "nothing but luck," it means that most of the plants will be clustered close to the mean and a few will be farther away.
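
The arithmetic behind the quoted NL West figures can be sketched as follows. The 49-win replacement baseline is not stated outright in the thread; it is inferred here from the quoted numbers (92.3 projected wins minus 43.3 WAR), so treat it as an assumption.

```python
# Assumed baseline, inferred from the quoted figures: 92.3 - 43.3 = 49.
REPLACEMENT_WINS = 49.0

# (team WAR, actual wins) as quoted in the original objection.
nl_west = {
    "Dodgers": (43.3, 95),
    "Rockies": (42.3, 92),
    "Giants": (34.0, 88),
    "Padres": (21.7, 75),
    "Diamondbacks": (33.5, 70),
}


def projected_wins(war, replacement_wins=REPLACEMENT_WINS):
    """Projected wins = replacement-level wins plus team WAR."""
    return replacement_wins + war


# Positive residual: the team won more games than WAR projected.
residuals = {
    team: actual - projected_wins(war)
    for team, (war, actual) in nl_west.items()
}

for team, miss in residuals.items():
    print(f"{team}: {miss:+.1f}")
```

Four of the five residuals are positive and no larger than 5 wins; Arizona's is -12.5, which is the outlier being argued over.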
   54. Eric J is Financed by a Rich Grandpa Posted: November 11, 2009 at 03:27 PM (#3384844)
This seems a good question; what is the WAR correlation with Pyth. wins? Is it higher or lower than 0.83?

With a quick run of the numbers, I get correlations of .82 to wins and .88 to Pythagorean wins from 2009 data.

Splitting things up by league improves that a decent amount:

2009 NL: .86 with wins, .95 with Pwins
2009 AL: .88 with wins, .91 with Pwins
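
For anyone wanting to repeat this kind of check, a minimal sketch of a Pearson correlation is below. It is not the calculation behind the figures above: the full 30-team data isn't in this thread, so the example uses only the five NL West teams quoted in #53, which is far too small a sample to mean much.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# NL West team WAR and actual wins, as quoted earlier in the thread.
team_war = [43.3, 42.3, 34.0, 21.7, 33.5]
actual_wins = [95, 92, 88, 75, 70]

r = pearson_r(team_war, actual_wins)
print(f"r = {r:.2f}")
```

On these five teams the coefficient comes out around 0.75, dragged down by Arizona.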
   55. SG Posted: November 11, 2009 at 03:34 PM (#3384854)
One thing to keep in mind with Fangraphs' WAR is they use FIP to determine pitching value. I don't know if anyone over there has done any analysis to see how close fielder UZRs + pitcher FIPs gets you to actual runs allowed, but that could skew team totals for WAR depending on any discrepancies there.

Even though we should assume most pitchers will regress towards league average BABIP, there are outliers in either direction where that's just not true.
   56. snapper (history's 42nd greatest monster) Posted: November 11, 2009 at 03:47 PM (#3384873)
One thing to keep in mind with Fangraphs' WAR is they use FIP to determine pitching value. I don't know if anyone over there has done any analysis to see how close fielder UZRs + pitcher FIPs gets you to actual runs allowed, but that could skew team totals for WAR depending on any discrepancies there.

Even though we should assume most pitchers will regress towards league average BABIP, there are outliers in either direction where that's just not true.


Great point! And even the guys who out or underperform their FIP through random chance actually prevent or allow those runs. So, that's a huge source of error between WAR and runs allowed.
   57. CW hits the pinata for the candy Posted: November 11, 2009 at 03:56 PM (#3384887)
B Pro has VORP, which goes with "value" rather than wins or runs, but then they also use "WARP" which is back to "wins".


No. VORP is denominated in runs.
   58. SG Posted: November 11, 2009 at 04:06 PM (#3384897)
And even the guys who out or underperform their FIP through random chance actually prevent or allow those runs. So, that's a huge source of error between WAR and runs allowed.


Well, only if that difference isn't picked up by the defensive metric component of WAR, in this case UZR. That's what I'm unsure of.
   59. snapper (history's 42nd greatest monster) Posted: November 11, 2009 at 04:10 PM (#3384901)
Well, only if that difference isn't picked up by the defensive metric component of WAR, in this case UZR. That's what I'm unsure of.

Well it certainly wouldn't pick up things like strand rate, i.e. performance with RISP, since UZR is context neutral as well, or HR/FB %. It should pick up at least some of BABIP deviation.
   60. SG Posted: November 11, 2009 at 04:31 PM (#3384926)
OK, I ran the differences for 2009 for the hell of it. Since FIP is scaled to ERA, I divided team FIP by 0.92 to get it to RA scale. Here's how all the teams line up on the difference between FIP runs allowed (FIP divided by nine, times IP) minus UZR (so a plus defense removes runs, a negative defense adds them) and actual runs allowed. This is quick and dirty, so I didn't reverse-engineer to split out starters versus relievers.

Team: FIP + UZR / RA / Diff
Angels: 822 / 757 / 65
Astros: 676 / 611 / 65
Athletics: 737 / 672 / 65
Blue Jays: 808 / 765 / 43
Braves: 682 / 640 / 42
Brewers: 763 / 723 / 40
Cardinals: 767 / 732 / 35
Cubs: 738 / 709 / 29
Diamondbacks: 796 / 771 / 25
Dodgers: 665 / 641 / 24
Giants: 775 / 753 / 22
Indians: 892 / 876 / 16
Mariners: 829 / 818 / 11
Marlins: 621 / 611 / 10
Mets: 745 / 740 / 5
Nationals: 748 / 745 / 3
Orioles: 763 / 761 / 2
Padres: 769 / 770 / -1
Phillies: 734 / 736 / -2
Pirates: 767 / 769 / -2
Rangers: 684 / 692 / -8
Rays: 853 / 865 / -12
Red Sox: 702 / 715 / -13
Reds: 856 / 874 / -18
Rockies: 745 / 766 / -21
Royals: 737 / 768 / -31
Tigers: 801 / 842 / -41
Twins: 711 / 761 / -50
White Sox: 712 / 782 / -70
Yankees: 684 / 754 / -70

Those are not trivial differences IMO.

Maybe someone from Fangraphs with access to the more granular data can do a more accurate job of comparing.
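
The calculation described at the top of this post can be sketched as below. The 0.92 factor and the steps come from the post; the function names and the sample inputs are made up for illustration and do not correspond to any real team.

```python
ERA_TO_RA = 0.92  # scaling factor from the post: FIP / 0.92 puts FIP on an RA scale


def fip_uzr_runs_allowed(team_fip, innings, team_uzr, era_to_ra=ERA_TO_RA):
    """Runs allowed implied by team FIP and team UZR.

    A plus defense (positive UZR) removes runs; a minus defense adds them.
    """
    fip_on_ra_scale = team_fip / era_to_ra
    fip_runs = fip_on_ra_scale / 9 * innings
    return fip_runs - team_uzr


def discrepancy(team_fip, innings, team_uzr, actual_ra):
    """The 'Diff' column: (FIP + UZR) runs allowed minus actual runs allowed."""
    return fip_uzr_runs_allowed(team_fip, innings, team_uzr) - actual_ra


# Hypothetical inputs, not a real team: 4.14 FIP, 1440 IP, +10 UZR, 700 RA.
example_diff = discrepancy(4.14, 1440, 10, 700)
print(round(example_diff, 1))
```

A positive difference means the team allowed fewer runs than FIP plus UZR would suggest.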
   61. DL from MN Posted: November 11, 2009 at 04:33 PM (#3384928)
Coming at this without looking at anything but this discussion - is the best fit line linear? If there is a non linear (geometric) fit that would explain some of the lack of correlation. In other words - do below average teams lose more than their fair share? Do above average teams win more than expected?

I would also guess a big part of the problem is in xFIP and UZR. UZR is attempting to break down a defense and analyze it. Is the correlation between adding up all the UZR on a team and the runs actually allowed for that team higher or lower than the correlation between WAR batting wins and the runs actually scored for a team? Sometimes you miss system interactions (scooping, throwing to the wrong base which allows a runner to move up) when you take everything apart and measure it separately.
   62. SG Posted: November 11, 2009 at 05:00 PM (#3384963)
Sometimes you miss system interactions (scooping, throwing to the wrong base which allows a runner to move up) when you take everything apart and measure it separately.


Very good point. There are certainly gaps in what UZR captures and what FIP captures, and that can most definitely cause systemic uncertainties.

You also have the completely missing catcher defense issue, although I understand the reticence to value something that's pretty hard to quantify.

I'm also not sure how Fangraphs handles non SB-baserunning. Eyeballing B-Pro's non-SB baserunning, in 2009 you had a spread of +15 to -15 between the best (Col) and worst (Bal) teams in that, and that could also have an impact.
   63. Gaelan Posted: November 11, 2009 at 05:16 PM (#3384977)
Combine SG's posts with Jeff K's posts and you have the answer to Tom's query.

If you want WAR to add up to real wins it has to be calculated using inputs that are actual and consistent. Since it isn't, it is impossible for it to all add up. Hence Tom is absolutely right to question the use of WAR in this manner.

Now that doesn't mean fangraphs is wrong since their purpose is to attempt to quantify true talent level. That's a valid enterprise. However since we know that true talent does not equal actual performance we should not be surprised when the aggregate of true talent on a team level does not add up to actual wins.
   64. CW hits the pinata for the candy Posted: November 11, 2009 at 05:16 PM (#3384978)
It should be noted that WAR is not intended to capture contributions to team wins, but to capture contributions to wins on a hypothetical neutral team. To vastly oversimplify - the wins a team gets out of Zack Greinke, say, are dependent upon the offense and defense that is fielded along with Greinke. In the context of WAR, Greinke's contributions are treated as though they occurred for a team that, without Greinke, would have a (roughly) .500 record.

There are other features, of course, that (conceivably) make WAR less than accurate at the team level, mostly having to do with splitting credit. I don't want to rehash the issue of positional adjustments, only to say that all methods come with some sort of error bar around them, and that accumulates at the team level.

If you were designing a metric designed purely to account for team wins, of course you wouldn't need to adjust for position - each team has exactly the same amount of shortstop playing time as the other, which is not true at the individual player level.
   65. CW hits the pinata for the candy Posted: November 11, 2009 at 05:28 PM (#3385002)
Now that doesn't mean fangraphs is wrong since their purpose is to attempt to quantify true talent level.


No. Why would you say that?
   66. Blackadder Posted: November 11, 2009 at 05:42 PM (#3385034)
Coming at this without looking at anything but this discussion - is the best fit line linear?


I thought all lines were linear =)
   67. Gaelan Posted: November 11, 2009 at 05:58 PM (#3385062)
No. Why would you say that?


They use FIP. Hence they aren't trying to be descriptive they are trying to be prescriptive.
   68. Karl from NY Posted: November 11, 2009 at 07:02 PM (#3385169)
The average team in the AL East won 84 games, and had 44.2 WAR, which would suggest replacement level is 40 wins. ...
Let's check another division, the NL West. The average team there won the same 84 games, and earned 35 WAR. Right away, we see another problem: We've got the same average number of wins as in the AL East, but almost ten less WAR per team. Here we have replacement level pegged at 49 wins, 49-113.


Isn't this basically strength of schedule? A replacement team in the AL East will win fewer games than a replacement team in the NL West, because the competition is stronger. A difference of 9 wins seems reasonable.

Average number of wins among division teams (that 84 number) does not correct for this. That says the AL East was 3 games per team over .500 against other-AL opposition, and the NL West was the same against other-NL opposition, which aren't the same thing.
   69. GuyM Posted: November 11, 2009 at 07:18 PM (#3385199)
They use FIP. Hence they aren't trying to be descriptive they are trying to be prescriptive.

That doesn't have to follow. You can use FIP/DIPS to estimate the pitchers' actual contribution to run prevention, separate from what the fielders contributed. In fact, you must do something like that if you want to break down defensive value between pitchers and fielders.
   70. Kiko Sakata Posted: November 11, 2009 at 07:25 PM (#3385213)
That doesn't have to follow. You can use FIP/DIPS to estimate the pitchers' actual contribution to run prevention


Actually, doesn't Fangraphs use xFIP, which "normalizes" HR/FB? So, at a minimum, they're not measuring actual home runs allowed on the pitching/defense side of the ball.

Isn't this basically strength of schedule? A replacement team in the AL East will win fewer games than a replacement team in the NL West, because the competition is stronger. A difference of 9 wins seems reasonable.


What about WAR would correct for this, though? Does Fangraphs adjust the raw statistics for strength of opponent before they convert them to WAR?
   71. CW hits the pinata for the candy Posted: November 11, 2009 at 07:50 PM (#3385258)
Actually, doesn't Fangraphs use xFIP, which "normalizes" HR/FB? So, at a minimum, they're not measuring actual home runs allowed on the pitching/defense side of the ball.


No. They use FIP.
   72. Lassus: Posted: November 11, 2009 at 08:00 PM (#3385274)
I thought all lines were linear =)

Just the ones denoting zip codes.
   73. Danny Posted: November 11, 2009 at 08:14 PM (#3385292)
That doesn't have to follow. You can use FIP/DIPS to estimate the pitchers' actual contribution to run prevention, separate from what the fielders contributed. In fact, you must do something like that if you want to break down defensive value between pitchers and fielders.

There's also the issue of FIP ignoring the order of events. A pitcher could have 3 BB, 3 K, and 1 HR in an inning and give up either one run or four runs depending on the order of events. FIP treats both the same, which arguably distorts the "pitchers' actual contribution to run prevention."
   74. Kiko Sakata Posted: November 11, 2009 at 08:18 PM (#3385302)
No. They use FIP.


Thanks, CW. I stand corrected.
   75. Mike Emeigh Posted: November 11, 2009 at 08:28 PM (#3385313)
Any aggregate stat is going to introduce a distortion; you're going to lose details that would be useful to know. The fact that Boston and Tampa Bay had roughly the same WAR but diverging W/L records shouldn't be taken to imply that WAR is somehow invalid, but that you need to look more closely at the details.

-- MWE
   76. AJM Posted: November 11, 2009 at 08:44 PM (#3385341)
not nearly has funny as another case for Mike Cameron- does anyone here watch this game or remember Cameron’s first stint?

I do. I'm not sure what's funny.
   77. fra paolo Posted: November 11, 2009 at 08:48 PM (#3385352)
Nationals: 748 / 745 / 3


Positive? Great Jupiter's ghost!
   78. JPWF13 Posted: November 11, 2009 at 08:52 PM (#3385363)
The fact that Boston and Tampa Bay had roughly the same WAR but diverging W/L records shouldn't be taken to imply that WAR is somehow invalid, but that you need to look more closely at the details.


Also Boston's Pythag was 93-69 and TBs was 86-76

TB's WAR is out of line with both its actual record and its Pythag record, whereas Boston is pretty close on all three. So WAR apparently thinks TB should have scored more runs than they did, or given up fewer runs than they did.

The Angels scored 883 runs with an OPS of .792
TB scored just 803 runs with an OPS of .782
the Twins scored 817 runs with an OPS of .774

TB gave up 754 runs (8th)
TB pitchers gave up an OPS of .741 (4th)


I think TB simply underperformed across the board, they scored 10-20 runs less than their component stats would suggest, AND they gave up 10-20 more runs than their component stats would suggest AND they underperformed their pythag by 2 games.
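
As a quick check on the Pythag figure, here is a sketch of the classic Pythagorean expectation applied to the Tampa Bay run totals above. The exponent of 2 is an assumption; the record quoted in this post may have been computed with a different exponent.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from runs scored and allowed (Pythagorean expectation).

    The exponent of 2 is the classic form and an assumption here; other
    exponents are also in common use.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


# Tampa Bay's 2009 totals as given in this post: 803 scored, 754 allowed.
tb_pythag = pythagorean_wins(803, 754)
print(round(tb_pythag, 1))
```

This lands at roughly 86 wins, in line with the 86-76 Pythag record cited above.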
   79. Random Transaction Generator Posted: November 11, 2009 at 09:18 PM (#3385412)
Team: FIP + UZR / RA / Diff
Angels: 822 / 757 / 65
Astros: 676 / 611 / 65
Athletics: 737 / 672 / 65
Blue Jays: 808 / 765 / 43
Braves: 682 / 640 / 42
Brewers: 763 / 723 / 40
Cardinals: 767 / 732 / 35
Cubs: 738 / 709 / 29
Diamondbacks: 796 / 771 / 25
Dodgers: 665 / 641 / 24
Giants: 775 / 753 / 22
Indians: 892 / 876 / 16
Mariners: 829 / 818 / 11
Marlins: 621 / 611 / 10
Mets: 745 / 740 / 5
Nationals: 748 / 745 / 3
Orioles: 763 / 761 / 2
Padres: 769 / 770 / -1
Phillies: 734 / 736 / -2
Pirates: 767 / 769 / -2
Rangers: 684 / 692 / -8
Rays: 853 / 865 / -12
Red Sox: 702 / 715 / -13
Reds: 856 / 874 / -18
Rockies: 745 / 766 / -21
Royals: 737 / 768 / -31
Tigers: 801 / 842 / -41
Twins: 711 / 761 / -50
White Sox: 712 / 782 / -70
Yankees: 684 / 754 / -70


Something doesn't seem right here.
When you order by difference (FIP+UZR-RA), it just happens to be exactly the same as ordering by alphabetical team name?

If that is REALLY the case (and not some bizarre sorting error before posting the results), then that is the greatest coincidence I have EVER seen in statistics.
   80. snapper (history's 42nd greatest monster) Posted: November 11, 2009 at 09:20 PM (#3385419)
If that is REALLY the case (and not some bizarre sorting error before posting the results), then that is the greatest coincidence I have EVER seen in statistics.

Team names starting in A are the new market inefficiency?

Please give a big hand to your 2010 Kansas City Aardvarks!
   81. SG Posted: November 11, 2009 at 09:26 PM (#3385429)
Something doesn't seem right here.
When you order by difference (FIP+UZR-RA), it just happens to be exactly the same as ordering by alphabetical team name?

If that is REALLY the case (and not some bizarre sorting error before posting the results), then that is the greatest coincidence I have EVER seen in statistics.


Amazing, isn't it? Or I f'ed up my sort. Here's how the list should look.

Team: FIP + UZR / RA / Diff
Mets: 822 / 757 / 65
Dodgers: 676 / 611 / 65
Cubs: 737 / 672 / 65
Twins: 808 / 765 / 43
Cardinals: 682 / 640 / 42
Reds: 763 / 723 / 40
White Sox: 767 / 732 / 35
Phillies: 738 / 709 / 29
Blue Jays: 796 / 771 / 25
Braves: 665 / 641 / 24
Yankees: 775 / 753 / 22
Orioles: 892 / 876 / 16
Brewers: 829 / 818 / 11
Giants: 621 / 611 / 10
Rangers: 745 / 740 / 5
Tigers: 748 / 745 / 3
Angels: 763 / 761 / 2
Astros: 769 / 770 / -1
Red Sox: 734 / 736 / -2
Padres: 767 / 769 / -2
Mariners: 684 / 692 / -8
Indians: 853 / 865 / -12
Rockies: 702 / 715 / -13
Nationals: 856 / 874 / -18
Marlins: 745 / 766 / -21
Pirates: 737 / 768 / -31
Royals: 801 / 842 / -41
Athletics: 711 / 761 / -50
Diamondbacks: 712 / 782 / -70
Rays: 684 / 754 / -70
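
To put a single number on how large these misses are, the Diff column can be summarized as below. This is only a sketch over the 30 differences listed above; root-mean-square is one reasonable summary, not the standard error of any formal model.

```python
import math

# Diff column (FIP + UZR runs allowed minus actual RA) from the table above.
diffs = [65, 65, 65, 43, 42, 40, 35, 29, 25, 24, 22, 16, 11, 10, 5,
         3, 2, -1, -2, -2, -8, -12, -13, -18, -21, -31, -41, -50, -70, -70]

# Mean shows any overall bias; RMS shows the typical size of a miss.
mean_diff = sum(diffs) / len(diffs)
rms_diff = math.sqrt(sum(d * d for d in diffs) / len(diffs))

print(f"teams: {len(diffs)}, mean: {mean_diff:.1f}, RMS: {rms_diff:.1f}")
```

The mean is about 5 runs and the RMS is about 36 runs per team.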
   82. Steve Treder Posted: November 11, 2009 at 09:31 PM (#3385438)
Damn, it was a hell of a lot more fascinating the other way.
   83. Ron Johnson Posted: November 11, 2009 at 09:39 PM (#3385445)
#20, A lot of this is already covered but:

Team WAR should reconcile to a team's pythag as calculated by runs created (or whatever method you're using to calculate offensive contribution) and calculated runs allowed (expressed as calculated runs allowed by pitching and defense).

There are 4 sources of potential error. 3-4 games standard error for pythag itself. ~15 runs if using good offensive metrics, and god knows on the defensive side. I've never seen anybody publish how well (say) FIP and (team total of defensive metric of choice) reconciles (and to me this is an important sanity check on any and all defensive methods. Yes, I'm aware that FIP itself has a standard error, but the better the defensive method the smaller the overall standard error)

Anyhow, for planning purposes let's call the overall error on the runs allowed side 25 runs (it's probably better than that, but I'd like to see that demonstrated)

So if Silva's point is that WAR isn't perfect, no kidding. As with any well thought out method it's a very useful place to start any value discussion. And you ought to be able to offer up some reasons why it missed in any specific case.
   84. snapper (history's 42nd greatest monster) Posted: November 11, 2009 at 09:42 PM (#3385449)
Team: FIP + UZR / RA / Diff
Mets: 822 / 757 / 65


If the Mets really were 65 runs lucky on defense last year, and it reverts this year there are going to be mass suicides on this site.
   85. Ron Johnson Posted: November 11, 2009 at 10:05 PM (#3385480)
So are you saying that the difference between WAR and actual wins is mere luck?


Not exactly. Wouldn't be that tough to tease out how much each component (team offense, team pitching and team defense) is contributing to the error in the estimate of team wins. (I'm not interested enough to do the heavy lifting, but if you want to know how much errors in the offensive estimates are skewing things then simply calculate offensive WAR from actual team runs scored rather than using the derived numbers. You could also use WPA to double-check any given part.)

How much of the discrepancy between WAR's estimate and actual team wins will turn out to be imperfections in the model? I'd be really surprised if the standard error between team WAR and team wins is under 5/162 games played.

And timing is going to be the biggest factor. Close games aren't precisely random, just close enough for planning purposes. You can build a strong bullpen. A good bench doesn't hurt. Beyond that, luck plays a heavier role than at any time.
   86. Ron Johnson Posted: November 11, 2009 at 10:11 PM (#3385489)
By the way, I missed #60. This is the exact type of sanity check that everybody pushing a defensive method should offer up.
   87. Barnaby Jones Posted: November 11, 2009 at 10:12 PM (#3385490)
You can use FIP/DIPS to estimate the pitchers' actual contribution to run prevention, separate from what the fielders contributed.


I understand that this is justification for using FIP instead of WAR, but this seems like a misuse of the concept to me. What we know is that FIP/DIPS has a better correlation to the next year's ERA than does the previous year's. It is quite a leap to go from that probabilistic statement to "In a given year, FIP - ERA = Defense."

Maybe there are studies that underlie the Fangraphs decision to sublimate the FIP/ERA disparity into the defensive ether; I'd be interested in them if that's the case.
   88. DL from MN Posted: November 11, 2009 at 10:15 PM (#3385496)
Twins: 808 / 765 / 43
Cardinals: 682 / 640 / 42


The two teams with gold glove catchers. WAR is underrating Joe Mauer, he does everything well that it doesn't measure.
   89. Tom Nawrocki Posted: November 11, 2009 at 11:51 PM (#3385644)
On the other side, you have the D-Backs underperforming what WAR would predict by 12.5 games. So let's say WAR missed about 2 and a half wins worth of significant events (maybe the D-Backs defence was worse than WAR thinks it is, for example, and maybe they were really bad baserunners or something like that), then we have 10 losses to explain by "bad luck." Is that really impossible? No, it's not.


Ten losses is an awful lot of losses to explain away with luck. If WAR thinks the DBacks earned ~80 wins, but they won only 70 games, and that 70 is still an acceptable error, then 90 wins would have also been an acceptable measure, right? That's giving AZ a whole bunch of good luck, rather than bad luck.

So WAR has estimated the quality of the 2009 Diamondbacks at somewhere between 70 and 90 wins. Is that really useful? Is that an acceptable estimate?

If WAR were consistently doing this, then yeah, that would raise a lot of doubts,


I checked two divisions at random and found two serious anomalies. I don't know how many 10-win outliers (and the DBacks were really more than that, near as I can tell) a correlation of 0.83 would produce, but if it's one per division, or even one per league, then yeah, that raises a lot of doubts.

Thanks to everyone for their input. By the way, I got to thinking about this after reading the thread on Ben Zobrist yesterday, when someone insisted that Zobrist was the MVP because of his 0.3 WAR edge over Joe Mauer, and would brook no disagreement. I had no idea what I'd find when I checked the Rays' team WAR, but the result makes perfect sense: WAR thinks the Rays were a lot better than they actually were last year, so it apportioned a lot more credit for wins than would seem to be merited. Ben Zobrist, the best player on the Rays, would be the biggest beneficiary of that.
   90. Sleepy supports unauthorized rambling Posted: November 12, 2009 at 06:23 AM (#3385851)
Any aggregate stat is going to introduce a distortion


Why? It's just the sum of its inputs, so unless the inputs introduce distortion (error), I don't understand why the sum would. There's an opportunity for larger distortions (error) to exist, but also an opportunity for error to cancel.

great point, #87.
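
On errors adding up versus cancelling: if each player's WAR carries its own independent error, the error on the team total grows with the square root of the number of players rather than linearly. A minimal sketch follows, where the 25-player roster and the 0.5-win per-player error are made-up numbers and independence is an assumption.

```python
import math


def team_error_sd(player_sd, n_players):
    """SD of a sum of n independent errors that each have SD player_sd.

    Variances add, so the total SD is player_sd * sqrt(n). Independence is
    an assumption; correlated errors (e.g. a shared park-factor mistake)
    would not cancel this way.
    """
    return player_sd * math.sqrt(n_players)


# Hypothetical inputs: 25 players, each measured to within about 0.5 wins.
independent_total = team_error_sd(0.5, 25)
worst_case_total = 0.5 * 25  # if every error pointed the same direction

print(independent_total, worst_case_total)
```

Under those assumptions the team total is off by about 2.5 wins, well short of the 12.5 it would be if nothing cancelled.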
   91. Jeff K. Posted: February 04, 2010 at 03:53 PM (#3453984)
By the way, I missed #60. This is the exact type of sanity check that everybody pushing a defensive method should offer up.

Except, of course, that there would be no indication of whether the flaw/cause of discrepancy lies with FIP, UZR, or in the methodology of the sanity check itself. Your larger point is still a good one, but there are going to be hardships and contention in providing a sanity check for almost any stat.
   92. Ron Johnson Posted: February 04, 2010 at 04:21 PM (#3454017)
#91, best case you get a good reconciliation. And if you don't you may get some indication of where to look from the type of teams that consistently fail to reconcile.

Worst case, you get a jumble of randomness which will at least allow you to reject the hypothesis that FIP + chosen defensive method = team run prevention -- and then you have to figure out where the problem lies.

To me the real key for such a test is coming up with an acceptable method before running the study.
   93. Cowboy Popup Posted: February 04, 2010 at 04:29 PM (#3454030)
I understand that this is justification for using FIP instead of WAR, but this seems like a misuse of the concept to me.

This use of FIP is where the sabr movement loses me. I mean, it's great to come up with an imaginary number of what theoretically should have happened and what is more likely to happen next year. But to use it as a method of determining real value contributed in the past just fails for me. Obviously a lot of people disagree with me, but I think it's at least one step removed from reality.
   94. Petooter: 11'6" 355 lbs of scrap and grit Posted: February 04, 2010 at 04:56 PM (#3454053)
Thanks Jeff K., I missed this thread the first time around, and [40] hit it bang on; it was very educational for me.

Question about a question: [89] So WAR has estimated the quality of the 2009 Diamondbacks at somewhere between 70 and 90 wins. Is that really useful? Is that an acceptable estimate?

Nobody actually uses aggregate WAR to rank team quality the way Silva suggests, do they? It's pretty much an individual true-talent-level style metric, isn't it?
   95. Ron Johnson Posted: February 04, 2010 at 05:07 PM (#3454069)
#93, to me it's simple. FIP "makes sense" as an attempt to isolate the raw pitching side of run prevention. It is however essentially unproven as to whether it really works.

To demonstrate that it actually works as intended you should be able to show that FIP + (some defensive metric) models team runs allowed to some acceptable degree. Where "perfection" would be a standard error in the 15 run range.

As Jeff points out, we won't know whether the problem is with the FIP method or (say) UZR, ZR, whatever. For that reason, it's best to run the test with as many metrics as possible. If you get standard errors in the same general range and they're higher than you'd like it's an indication that FIP has a problem.
   96. Jeff K. Posted: February 04, 2010 at 05:57 PM (#3454114)
#91, best case you get a good reconciliation. And if you don't you may get some indication of where to look from the type of teams that consistently fail to reconcile.

I think this is probably correct, but I also think it fails in this particular instance. If #60 had been a chart of discrepancy of Pythag W/L from actual W/L, then sure, looking at the consistent deviators for commonalities might* help you do some bodywork on the Pythag stat. (I * there because in that case, you'd need overwhelming evidence of predictable bias based on team type in order to justify changing Pythag.) But FIP, as noted above, doesn't claim nor is it constructed to reproduce macro runs allowed. So again, I do agree with your general point that these sanity checks are good ideas, and their underutilization is to the detriment of all involved, but I don't find much instructive about those discrepancies in #60.

Worst case, you get a jumble of randomness which will at least allow you to reject the hypothesis that FIP + chosen defensive method = team run prevention -- and then you have to figure out where the problem lies.

Fair enough, but I'd say that the weak rejection of that hypothesis that you get from the jumble of randomness is no better than the rejection of that hypothesis that everyone has already made based on common sense and knowledge of the stats. Are there people out there claiming that FIP+DefMetric=Actual Team Run Prevention? I haven't seen that. It would seem to me that the best you could hope to claim is

FIP+DefMetric=Team Run Prevention (Minus statistically insignificant noise and only a more useful number than Runs Allowed for predictive purposes.)
   97. Jeff K. Posted: February 04, 2010 at 06:09 PM (#3454127)
Question about a question: [89] So WAR has estimated the quality of the 2009 Diamondbacks at somewhere between 70 and 90 wins. Is that really useful? Is that an acceptable estimate?

Nobody actually uses aggregate WAR to rank team quality the way Silva suggests, do they? It's pretty much an individual true-talent-level style metric, isn't it?


I don't know of anybody in particular off the top of my head, but I don't see an inherent reason why it would be bad. It's a question of philosophy that really goes back to the same overall discussion. If you can't add up all the individual numbers and get a pretty close approximation of the team number, why not? And whatever the answer to that is better be a good one, because if it isn't, it very much calls into question the validity of WAR itself. That was Tom's point from up thread.

As for the question you quote, so, to Tom: that 70-90 number is an exaggeration. Nobody's saying "10 wins is the accepted margin of error." This is basic statistics, which you well know. It would be perfectly fine to say that the standard deviation there is, say, 4. So 68% of the teams will be +/- 4 from their actual win total, 95% +/- 8, and the occasional 10 is not an indictment of the stat. It's an acceptable result that in and of itself is indicative of nothing.
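
The 68% / 95% figures in this comment can be checked with a few lines, assuming the misses are roughly normally distributed and taking the "say, 4" standard deviation at face value. The function name is just for illustration.

```python
import math

def share_within(margin, sd):
    """Fraction of a normal distribution lying within +/- margin of its mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return math.erf(margin / (sd * math.sqrt(2)))

sd_wins = 4.0  # the "say, 4" from the comment, an assumed value

within_4 = share_within(4, sd_wins)        # one standard deviation
within_8 = share_within(8, sd_wins)        # two standard deviations
beyond_10 = 1 - share_within(10, sd_wins)  # misses of 10 or more wins

print(f"within 4: {within_4:.1%}, within 8: {within_8:.1%}, beyond 10: {beyond_10:.1%}")
```

Under those assumptions a 10-win miss is a 2.5 standard deviation event, which comes out to a bit over 1% of team seasons: rare, but not something a single occurrence can indict the stat for.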
   98. zenbitz Posted: February 04, 2010 at 06:10 PM (#3454128)
It's like that girl in high school no one pays attention to until she starts putting out.


So we are the guys that schtup her and talk about her behind her back??
   99. zenbitz Posted: February 04, 2010 at 06:28 PM (#3454148)
Probably the one thing I still use BP for is the "current adjusted standings" page. They break down W/L into 1st (RS/RA), 2nd (EQRS/EQRA), and 3rd (AEQRS/AEQRA) order components.

1st order is simply pythagorean w/l
2nd order is "component" runs - i.e., accounting for clutch-y-ness
3rd order is adjusted for strength of opposition

WAR is equivalent to 2nd order W/L here.
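
For reference, the first-order number is easy to reproduce. This is a minimal sketch using the classic exponent of 2 and a 162-game schedule; both are assumptions here, and BP's page may well use a different exponent. The function name is hypothetical.

```python
def pythag_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from runs scored and allowed (Pythagorean expectation)."""
    if runs_scored < 0 or runs_allowed < 0 or runs_scored + runs_allowed == 0:
        raise ValueError("need non-negative run totals, not both zero")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)

# Made-up run totals, NOT a real team.
first_order = pythag_wins(800, 700)
print(f"expected wins: {first_order:.1f}")
```

Going by the breakdown above, feeding the same function component-based run estimates (the EQRS/EQRA pair) in place of actual runs would give the second-order figure.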

So, the Giants +5 wins above WAR turns out to be mostly 2nd order difference. For whatever bizarre reason, they were clutch in getting runs out of their (pathetic) individual batting and (glorious) individual pitching events.

This 5 wins is pretty important if you are a Giants fan, but I don't think one would expect this to carry over to 2010.

FanGraphs also has WPA (win probability added) value stats, which are the opposite of WAR in that they include leverage and clutch performance.

WPA should sum much closer to actual W/L (modulo replacement, or whatever baseline WPA uses)

The "Wins" in WAR is certainly less egregious than the "Wins" given to starting pitchers.
   100. zenbitz Posted: February 04, 2010 at 06:29 PM (#3454150)
Also re: 81... I am not going to graph it, but that Diff column looks like a normal distribution to me... which is exactly what you'd expect if it was due solely to chance.