Baseball Primer Newsblog— The Best News Links from the Baseball Newsstand
Wednesday, November 11, 2009
The Mets looking at Pat Burrell is funny, but not nearly has funny as another case for Mike Cameron- does anyone here watch this game or remember Cameron’s first stint? Guys, hate to break this to you, but WAR doesn’t actually lead to real wins in the standings- you know that right? Part of me would pay to see these guys run a team, I think it might be for some good copy at the very least.
***
There brand is suffering and Mike Cameron isn’t about to help it. It’s about winning, but also attracting customers (i.e. fans) to the seats. The Yankees actually do both. While they field all stars Mets fans sit around and rationalize secondary tier players. Do you think the Yankees rationalize a secondary player because of WAR? Of course not thats why they signed Burnett, Sabathia, Teixeira, etc.
*blank look*
Reader Comments and Retorts
Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.
It's in the comments, so I'm more inclined to give him a pass on that than on "has funny."
Is that Joey a primate? Out yourself!
Honestly, I can barely discern his screed from anyone else's. He's just another droning anus in the MSM choir as far as I'm concerned. The only thing special about Silva is that he decided to create a BBTF profile and openly troll on the site. Otherwise I think his act would hardly even register here.
Being in the hands of 'these guys' in the front office has worked out just fine in Boston, thanks.
I believe the correct term is "Obvious troll is obvious."
Well, actually, this is correct, isn't it? WAR doesn't lead to anything. It is a way of measuring what has already happened, I think. Sort of like batting average and rbi.
Or I could just say ditto #2.
Good copy which I, Mike Silva, would be incapable of writing.
I said we should have gotten spaceships, a hot chick to front us, and promised everyone free health care... but nooo... you guys wanted to go the internet route.
Now, the resistance, fronted by people like Silva, remains a constant thorn in our grand scheme.
According to Silva... Absolutely nothing, huh!
WAR correlates to actual wins fairly well (r^2=.83) That's a very strong correlation, so it's not as if it's abstract.
Let's see how the rest of the AL East measures up. The Red Sox totaled 51 WAR, which should give them 97 wins - and they won 95, which isn't too far off. The Rays also totaled 51 WAR, which should also give them 97 wins - and they won 84 games. Oops.
To finish out the division, the Blue Jays had 39 WAR, which should give them about 85 wins - and they won 75 games. The Orioles had 23 WAR, which adds up to 69 wins, and they really won 64. At this point, that's a pretty good result.
Maybe I have not pegged replacement level properly. (I looked around on the Fangraphs site and couldn't find anything saying how many games a replacement level team should win, but it's possible I missed it.) The average team in the AL East won 84 games, and had 44.2 WAR, which would suggest replacement level is 40 wins. That would give the Yankees 97 projected wins, the Red Sox 91, the Rays 91, the Jays 79 wins, and the O's 63. Now we're getting somewhere; four of the five teams are within four wins of their actual results. But we still have a major problem here: The Red Sox and Rays were separated by 11 games in the standings, yet totaled the same amount of WAR. This really can't be. (This, by the way, is how you end up with people saying nutty things like Ben Zobrist was the MVP of the league.)
Let's check another division, the NL West. The average team there won the same 84 games, and earned 35 WAR. Right away, we see another problem: We've got the same average number of wins as in the AL East, but almost ten less WAR per team. Here we have replacement level pegged at 49 wins, 49-113. Here are the NL West teams:
Dodgers: 43.3 WAR, leading to 92.3 projected wins, 95 actual wins.
Rockies: 42.3 WAR, leading to 91.3 projected wins, 92 actual wins.
Giants: 34 WAR, leading to 83 projected wins, 88 actual wins.
Padres: 21.7 WAR, leading to 70.7 projected wins, 75 actual wins.
Diamondbacks: 33.5 WAR, leading to 82.5 projected wins, 70 actual wins.
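The arithmetic in the list above can be reproduced with a short sketch. It backs replacement level out of the division the same way the comment does (average wins minus average team WAR), which is the commenter's ad hoc method and not FanGraphs' own replacement-level definition. The function names are hypothetical, chosen for illustration.

```python
# Projected wins = team WAR + replacement-level wins, using the 2009 NL West
# figures quoted in the comment. Replacement level is backed out the way the
# commenter does it (an ad hoc assumption, not FanGraphs' definition).
nl_west = {
    # team: (team WAR, actual wins)
    "Dodgers": (43.3, 95),
    "Rockies": (42.3, 92),
    "Giants": (34.0, 88),
    "Padres": (21.7, 75),
    "Diamondbacks": (33.5, 70),
}

def implied_replacement_level(teams):
    """Average wins minus average WAR across the given teams."""
    n = len(teams)
    avg_war = sum(war for war, _ in teams.values()) / n
    avg_wins = sum(wins for _, wins in teams.values()) / n
    return avg_wins - avg_war

def projected_wins(teams, replacement):
    """Return {team: (projected wins, actual wins, actual minus projected)}."""
    return {
        team: (war + replacement, wins, wins - (war + replacement))
        for team, (war, wins) in teams.items()
    }

# 84 average wins minus 34.96 average WAR, rounded as in the comment.
replacement = round(implied_replacement_level(nl_west))
for team, (proj, actual, miss) in projected_wins(nl_west, replacement).items():
    print(f"{team:13s} projected {proj:5.1f}  actual {actual:3d}  miss {miss:+5.1f}")
```

The Diamondbacks row is the outlier: a miss of -12.5 wins, against misses of roughly five wins or less for the other four teams.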
Again, four of the teams are within five wins; Arizona is way out of whack. Like Tampa Bay, the Arizona players are going to be seriously overvalued by this system, which thinks they're better than average rather than a last-place team. The Diamondbacks accumulated half of one WAR less than the Giants, who won 18 more games.
Now I'm no expert in this stuff, and I will fully grant that the Fangraphs guys know way more about all this than I do. It's very possible that I am missing something. For one thing, I am sure I am not figuring replacement-level wins the way I'm supposed to, but that wouldn't explain the Rays and Red Sox with identical WAR, or the Giants and Diamondbacks with nearly identical WAR. If I'm wrong about this, feel free to explain to me why, but I have to come to the conclusion that WAR isn't really measuring wins.
I'm not a fan of uber-statistics so won't really defend WAR but ... c'mon, you know this is silly.
Or maybe you believe that run differential and outscoring your opponents isn't how you measure wins either.
The Marlins out-scored their opponents by 4.8 to 4.7 runs a game. They won one more game than the Atlanta Braves who outscored their opponents by 4.5 to 4.0 runs a game. The Astros were outscored 4.0 to 4.8 runs a game yet won 12 more games than the Pirates who were also outscored 4.0 to 4.8 runs a game.
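For reference, the usual way to turn run differential into expected wins is Bill James's Pythagorean expectation, RS^x / (RS^x + RA^x). A minimal sketch follows, assuming the classic exponent of 2 (1.83 is a common refinement) and the rounded per-game rates quoted in the comment above, so the outputs are approximate.

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)

def pythag_wins(rs_per_game, ra_per_game, games=162, exponent=2.0):
    # Per-game rates work as well as season totals: only the ratio matters.
    return games * pythag_win_pct(rs_per_game, ra_per_game, exponent)

# Rounded per-game run rates from the comment above.
teams = {
    "Marlins": (4.8, 4.7),
    "Braves": (4.5, 4.0),
    "Astros": (4.0, 4.8),
    "Pirates": (4.0, 4.8),
}
for team, (rs, ra) in teams.items():
    print(f"{team:8s} {pythag_wins(rs, ra):5.1f} expected wins")
```

With these inputs the formula expects about 82.7 wins for the Marlins and 90.5 for the Braves, and identical totals (about 66.4) for the Astros and Pirates, which is exactly the gap between expected and actual wins that the comment is pointing at.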
I believe you measure wins by counting how many games each team has won.
Look, if you think that this stat is measuring something significant, that's great. But if the Wins Above Replacement for the players on the 2009 Diamondbacks bear no relation to the games actually won by the 2009 Diamondbacks - and they don't - then don't give yourself more credit than you've earned by calling them Wins. Call them Cameron Points or Ultimate Team Contribution or something. But they're not Wins.
Hey, good one. But it does mean you got an "average" of a third of a hit every time up, hence the name.
You missed my point, which was semantic, I guess. It's a question of what comes first. The event comes first, the WAR comes after as a means of describing what happens. So, no, WAR doesn't lead to wins, wins (and losses) lead to WAR.
If a team scored 750 runs and gave up 780 runs and had a winning record, do you think that team should expect to stand pat and have a winning record again the following season? Or should they recognize that they got lucky and act accordingly? Yes, you measure wins by counting how many games each team won, but you measure expected wins more effectively by looking at the batting and defensive stats for each play throughout the season, which in baseball unlike other sports happens to be very easy to do.
The position you're taking is akin to saying that the only stats that matter are RBIs and runs scored--after all, those are the runs that actually scored, right? So why do we care about this "batting average"? We can measure how many runs a player actually contributed in the real games by looking at his RBIs and runs scored! That's pretty much exactly what you're saying about team wins.
These are pretty much the most basic starting issues of intelligent baseball analysis, explained very well I think in Pete Palmer and John Thorn's book The Hidden Game of Baseball, among other places, so I'm sort of confused by why such a reliably intelligent and well informed poster as yourself is inciting this discussion.
Or, in concession to Tom's complaint, the things that tend to lead to wins and losses lead to WAR.
No, Tom's point, I believe, is to call them something different. The names RBIs and Runs Scored describe perfectly what they measure. The names given to some other more recent statistical measures have a tendency to overstate what it is they measure, which kind of invites these kinds of screeds from the Silvas of the world.
Fun-lovin' Paul Lebowitz is the "stat zombie" guy.
Zombies!
So are you saying that the difference between WAR and actual wins is mere luck? That the Rays and Red Sox were of identical quality in 2009? Do you think the 2009 Diamondbacks were an above-average team? It boggles my mind that people can look at discrepancies like that and assume that the wins themselves are somehow wrong, rather than the statistical analysis that led to such conclusions. If this measure thinks the 88-74 Giants and the 70-92 Diamondbacks were of roughly equal quality, don't you start to wonder if maybe there's something amiss in the process that led to that evaluation?
It is certainly not "very easy" to evaluate the defensive stats for each play. Different defensive metrics can provide very different results.
What I'm saying is that assuming that the performance of the 2009 Rays was equally as good as that of the 2009 Red Sox is a poor assumption to make. There is no reason in the world to think that (the two teams weren't even that close in Pythagorean record), other than that's what WAR tells us. And I'm saying that any further assumption derived from that initial estimate is very likely wrong.
The Twins always seem to outplay their projections whether they're formulated by stat zombie tenets or the way I come to my conclusions----don't ask.
Don't worry, I won't.
(EDIT) Love this gem:
I vacillated on the Tigers. There were three ways for them to go. Either they were going to utterly collapse into 2008 Padres/Mariners territory and lose over 100 games; they were going to be somewhere in the middle; or they were going to have a bounce back year and contend.
Why yes, up, down, or sideways *are* three ways to go. ####### slap me with a trout.
There was a rumor that the framework of a three-way deal was being discussed by the Mets, Cubs and Blue Jays that would send Luis Castillo to the Cubs; Milton Bradley to the Blue Jays; and Lyle Overbay to the Mets.
Pythagorean doesn't get quite as fine as WAR does though. My first guess here would be that Boston was very good with RISP (or Tampa Bay was bad), and a quick check of B-R's splits (here and here) confirms that. So at the very least I'd say there's some reason besides WAR to think the 2009 Rays and Sox were closer than what their raw W-L records or Pythags suggest.
Tom, I think your overall point is valid but I don't think comparing to wins is very relevant. As Ginger Nut noted, WAR doesn't purport to predict wins, it purports to predict things that -- broadly speaking -- contribute to wins: run creation and run prevention.
I'm not an investment banker so this may be flawed, but I look at the difference between evaluating a team by WAR rather than wins as similar to valuing a company by EBITDA rather than income.
I did a post over at FanGraphs about this not too long ago. As noted earlier in the thread, the correlation between projected win total using WAR and actual wins was .83. That's a far cry from the "no relation" that you're suggesting.
But, let me offer up an example. WAR is a measure of an individual's performance without regard to context or the performance of others. It just measures what that guy contributed to the team's goal of winning. Think of it like a production line - if all the guys who are making parts for the computer do their job just fine, but the one assembler at the end doesn't put the pieces together, you have zero finished products, but you wouldn't want to suggest that your entire line was producing nothing.
WAR is measuring how well the individuals create inputs. That the total output doesn't match the total inputs doesn't invalidate the production of the inputs.
Most of the variance that is apparently giving you problems is due to WAR ignoring the timing of events. Because it basically boils down to a lot of different context neutral linear weights, it assumes that the distribution of events will be equal. This is obviously not true in real life, and is the driving cause of the variation between WAR and actual wins. Any statistic that ignores the timing of related events is necessarily going to give up the ability to match what actually happened. If that's the test you're going to apply to statistics, then you'll reject every context neutral metric out there.
However, there are a lot of reasons we want context neutral statistics. They do a better job than situation-dependent statistics in projecting nearly everything. They are a better measure of true talent level. They serve a real, significant purpose, but by design, they will not match up with what happened in a season. It's not a flaw - it's a by-product of the design.
It's not evidence of a problem that the Red Sox and Rays had the same WAR but wildly different actual win totals. It's that determination that allows us to notice that the significant gap between the two teams was far more about when the players performed than how they performed. That matters in the final standings, but far less so when thinking about the future.
Something like Earnings/Working Capital - GDP growth would give you a % number that could be called Cents Against Replacement per Dollar (CARD).
When you get down past the occasional machismo, trolls, and flame wars (it is the Internet), BBTF is an excellent site.
It's the conversion from hits to wins that doesn't exist in finance, and that's what leads to consternation. Believe me, if and when business as a whole starts looking at marketing dollars and their resulting impact on sales (something I wrote a paper about in one finance class or another), you'll see the same consternation. It's a dirty little secret that ad dollars for MSM outlets are a terrible use of money, but the industries relying on those dollars (network TV, newspapers, magazines) have succeeded in avoiding deep questions. When the first real wave of analysis comes, you can bet the first words out of their mouths will be complaints about the weights applied to the inputs that determine the final value of each campaign.
And that's still in dollars spent/dollars returned totals. To really be like baseball, ad buys would have to be paid for in yen while sales are in dollars. Silva and his ilk would, even given perfectly proven relationships between ad yen spent / $ sales return, be complaining about the exchange rate. This ignores the fact that no matter what the exchange rate is, it's the relationship that matters. The exchange just takes the relative value that the raw numbers and relationship give you and marks it to a common end.
Oh god, I remember this. It was complete hilarity.
I doubt that.
Then is it fair to say that WAR would have a much closer relation to Pythagorean record since that also ignores the timing of events?
Well, the Red Sox scored about 70 more runs than the Rays, and allowed 18 fewer runs. It could be that the difference is all context/timing, or it could be that there are some minor flaws in the weighting of the events. It's probably some combination of the two.
This seems a good question; what is the WAR correlation with Pyth. wins? Is it higher or lower than 0.83?
And, by the way, 0.83 is not equal to 1. I agree with Tom's point completely: if a stat says a 70 win team and a 90 win team are roughly equivalent, it's missing something important. If it were just random fluctuation, you'd expect a lot of 70 win teams to turn around into 90 win teams the next year and vice versa; in real life that just doesn't happen.
I'm not well versed in advanced metrics but think they're very useful. But the easiest thing to do in science is make a conclusion grander than the data.
Well, the Red Sox scored about 70 more runs than the Rays, and allowed 18 fewer runs. It could be that the difference is all context/timing, or it could be that there are some minor flaws in the weighting of the events. It's probably some combination of the two.
This seems very reasonable.
In other words, it does a very good, though not perfect, job of measuring the things that players do on both sides of the ball that contribute to wins. Obviously a team wants wins before it wants WAR, but when deciding what players to add or let go, it's a useful tool in evaluating their worth on the field.
This is right, to me. I like WAR because it gives me a shorthand way of evaluating a player's total value to his team, even if I am a bit skeptical of defensive stats. I certainly don't use WAR as a hammer in arguments because there are some vagaries there about replacement level and defense and such, but WAR is a great place to START a discussion. Which, bringing it back to Silva and his ilk, it's kind of sad that they don't embrace it. WAR is a great conversation piece. If I'm a sportswriter, I'd probably spend an hour a day at fangraphs looking for anomalous stats to build a piece around. Is Ryan Sweeney better than Torii Hunter? Who is Ben Zobrist and was he the best player in baseball in 2009? And the thing is, these articles wouldn't even be hard to write. Maybe not as easy as casting stones at basement dwellers, but not really hard, either. Which makes me wonder why some of these guys even write about baseball. Nothing about the sport seems to interest them. Do they just like having an audience? Do they love the sound of their own voices that much? It's very puzzling.
I think it should be higher, b/c it removes the uneven distribution of runs across games from the situation, but still far from a perfect fit b/c pythag wins still includes the uneven distribution of run scoring events, i.e. hitting with RISP vs. not.
Basically, WAR takes all the component plays that lead to run scoring/prevention and calculates how many wins they'd generate if they were distributed randomly across games and game states.
It won't jibe with pythag wins b/c run scoring events are not distributed randomly within game states (i.e. HR/BB/BB/K/K/K produces a very different run outcome than K/K/BB/BB/HR/K but looks the same to WAR).
It won't jibe with actual wins for the same reasons, and, additionally b/c of the non-random distribution of runs across games, e.g. unusually good records in 1-run games, or an abnormally large # of blow-outs.
Pythagorean record ignores the timing of runs, but not events generally. WAR will always treat hits/walks/etc. the same way (all else being equal: park factors, yadda yadda), whereas Pythagorean record only accounts for them if they turn into runs.
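The sequencing point is easy to check with a toy half-inning scorer. This is a sketch under a deliberately simplified model (only walks, strikeouts, and home runs; walks force runners ahead one base; no steals, errors, or other baserunning), and `runs_in_inning` is a hypothetical helper, not part of any WAR implementation. Both sequences from the comment above contain the same events, which is all a context-neutral count sees, yet they score different numbers of runs.

```python
from collections import Counter

def runs_in_inning(events):
    """Score a half-inning made only of walks ("BB"), strikeouts ("K"),
    and home runs ("HR"), under the simplified model described above."""
    first = second = third = False
    outs = runs = 0
    for event in events:
        if outs == 3:
            break  # the inning is already over
        if event == "K":
            outs += 1
        elif event == "BB":
            if first and second and third:
                runs += 1  # bases loaded: the runner on third is forced home
            elif first and second:
                third = True  # runners move up to second and third
            elif first:
                second = True  # runner on first is forced to second
            first = True  # the batter always takes first
        elif event == "HR":
            runs += 1 + first + second + third  # batter plus everyone on base
            first = second = third = False
        else:
            raise ValueError(f"unsupported event: {event}")
    return runs

a = "HR BB BB K K K".split()
b = "K K BB BB HR K".split()
print(Counter(a) == Counter(b))  # True: identical event counts
print(runs_in_inning(a), runs_in_inning(b))  # 1 3
```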
What about a stat that says 100 teams should win about 70 games, and 99 of them do, but 1 of them wins 90 games?
In other words: We shouldn't just point to one example of teams with similar WARs having very different win totals as evidence that WAR is fundamentally flawed. We need to know how common examples like that are.
It probably would be more accurate to call it "Expected Wins above Replacement," but EWAR is less catchy as an acronym. How about Projected Optimal Wins Exceeding Replacement?
B Pro has VORP, which goes with "value" rather than wins or runs, but then they also use "WARP" which is back to "wins".
If a stat tells us something surprising, the appropriate question is, does this mean the stat is missing something or that it's telling us something new that we should pay attention to? The answer is probably a combination of both. No "meta stat" is going to capture everything, so there are probably some events from each team's season that are worth maybe a couple of wins that aren't included. It's not in fact implausible that the rest of the difference can be explained by random chance. Here was the original objection:
So WAR thought the Giants were about 5 wins worse than their actual record for the season. Let's say WAR failed to account for 2 wins worth of events (let us suppose baserunning and defence, which maybe it isn't good at capturing), and the other three wins can be explained by random good luck. On the other side, you have the D-Backs underperforming what WAR would predict by 12.5 games. So let's say WAR missed about 2 and a half wins worth of significant events (maybe the D-Backs defence was worse than WAR thinks it is, for example, and maybe they were really bad baserunners or something like that), then we have 10 losses to explain by "bad luck." Is that really impossible? No, it's not. If WAR were consistently doing this, then yeah, that would raise a lot of doubts, but in this case the interesting point seems to be that the D-Backs MIGHT be a lot better than their record from last year. Since the D-Backs were predicted to win the division by many preseason prognosticators, I don't think that possibility should be viewed as ridiculous out of hand.
On the other hand, even a good analytic system is going to have some outliers. People who say that one example of WAR being off from the real win total by 10 wins or so proves the system is deeply flawed don't seem to understand how variation works. It doesn't mean that a team's win total will be exactly what WAR expects, it simply means that it's more likely to be close to what WAR expects and increasingly less likely, but still possible, the farther you get from what WAR expects. But in a normal distribution you would fully expect that there will be some teams at the extremes of each end of the bell curve. WAR and similar stats just tell us where the middle of that curve is located. It's like growing bean sprouts: you'll get a few really tall ones and a few really short ones, this doesn't mean that genetics "doesn't work" or that bean sprout height is "nothing but luck," it means that most of the plants will be clustered close to the mean and a few will be farther away.
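To put a rough number on that spread, one can treat each game as an independent coin flip won with a fixed true probability, so season wins follow a binomial distribution with standard deviation sqrt(n·p·(1−p)). That independence assumption ignores real effects (pitching matchups, injuries, roster changes) and is only a sketch of the "luck" component; it says nothing about how much of any miss is model error in WAR itself.

```python
import math

def win_total_sd(true_win_pct, games=162):
    """Standard deviation of season wins if every game were an independent
    coin flip won with probability true_win_pct (a simplifying assumption)."""
    return math.sqrt(games * true_win_pct * (1.0 - true_win_pct))

sd = win_total_sd(0.5)
print(f"SD of wins for a true .500 team: {sd:.2f}")  # 6.36
miss = 82.5 - 70  # Diamondbacks: WAR-projected minus actual wins, from upthread
print(f"A {miss} win miss is about {miss / sd:.1f} standard deviations")
```

Under this model a 12.5-win miss is roughly two standard deviations, the kind of result you would expect from about one team in twenty, so seeing one or two per season across 30 teams is not by itself evidence that the system is broken.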
With a quick run of the numbers, I get correlations of .82 to wins and .88 to Pythagorean wins from 2009 data.
Splitting things up by league improves that a decent amount:
2009 NL: .86 with wins, .95 with Pwins
2009 AL: .88 with wins, .91 with Pwins
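For anyone who wants to run numbers like these, here is a minimal sketch of the Pearson correlation, applied to the five NL West teams quoted upthread. Five teams is far too small a sample to mean anything; this only shows the computation. It also shows the difference between r and r²: one comment earlier quotes r² = .83 while others quote a correlation (r) of .83, and those are different quantities (r = .83 corresponds to r² of about .69).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# 2009 NL West team WAR and actual wins, as quoted upthread.
war = [43.3, 42.3, 34.0, 21.7, 33.5]
wins = [95, 92, 88, 75, 70]
r = pearson_r(war, wins)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```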
Even though we should assume most pitchers will regress towards league average BABIP, there are outliers in either direction where that's just not true.
Great point! And even the guys who outperform or underperform their FIP through random chance actually prevent or allow those runs. So, that's a huge source of error between WAR and runs allowed.
No. VORP is denominated in runs.
Well, only if that difference isn't picked up by the defensive metric component of WAR, in this case UZR. That's what I'm unsure of.
Well it certainly wouldn't pick up things like strand rate, i.e. performance with RISP, since UZR is context neutral as well, or HR/FB %. It should pick up at least some of BABIP deviation.
Team: FIP + UZR / RA / Diff
Angels: 822 / 757 / 65
Astros: 676 / 611 / 65
Athletics: 737 / 672 / 65
Blue Jays: 808 / 765 / 43
Braves: 682 / 640 / 42
Brewers: 763 / 723 / 40
Cardinals: 767 / 732 / 35
Cubs: 738 / 709 / 29
Diamondbacks: 796 / 771 / 25
Dodgers: 665 / 641 / 24
Giants: 775 / 753 / 22
Indians: 892 / 876 / 16
Mariners: 829 / 818 / 11
Marlins: 621 / 611 / 10
Mets: 745 / 740 / 5
Nationals: 748 / 745 / 3
Orioles: 763 / 761 / 2
Padres: 769 / 770 / -1
Phillies: 734 / 736 / -2
Pirates: 767 / 769 / -2
Rangers: 684 / 692 / -8
Rays: 853 / 865 / -12
Red Sox: 702 / 715 / -13
Reds: 856 / 874 / -18
Rockies: 745 / 766 / -21
Royals: 737 / 768 / -31
Tigers: 801 / 842 / -41
Twins: 711 / 761 / -50
White Sox: 712 / 782 / -70
Yankees: 684 / 754 / -70
Those are not trivial differences IMO.
Maybe someone from Fangraphs with access to the more granular data can do a more accurate job of comparing.
I would also guess a big part of the problem is in xFIP and UZR. UZR is attempting to break down a defense and analyze it. Is the correlation between adding up all the UZR on a team and the runs actually allowed for that team higher or lower than the correlation between WAR batting wins and the runs actually scored for a team? Sometimes you miss system interactions (scooping, throwing to the wrong base which allows a runner to move up) when you take everything apart and measure it separately.
Very good point. There are certainly gaps in what UZR captures and what FIP captures, and that can most definitely cause systemic uncertainties.
You also have the completely missing catcher defense issue, although I understand the reticence to value something that's pretty hard to quantify.
I'm also not sure how Fangraphs handles non-SB baserunning. Eyeballing B-Pro's non-SB baserunning, in 2009 you had a spread of +15 to -15 between the best (Col) and worst (Bal) teams in that, and that could also have an impact.
If you want WAR to add up to real wins, it has to be calculated using inputs that are actual and consistent. Since it isn't, it is impossible for it to all add up. Hence Tom is absolutely right to question the use of WAR in this manner.
Now that doesn't mean fangraphs is wrong since their purpose is to attempt to quantify true talent level. That's a valid enterprise. However since we know that true talent does not equal actual performance we should not be surprised when the aggregate of true talent on a team level does not add up to actual wins.
There are other features, of course, that (conceivably) make WAR less than accurate at the team level, mostly having to do with splitting credit. I don't want to rehash the issue of positional adjustments, only to say that all methods come with some sort of error bar around them, and that accumulates at the team level.
If you were designing a metric designed purely to account for team wins, of course you wouldn't need to adjust for position - each team has exactly the same amount of shortstop playing time as the other, which is not true at the individual player level.
No. Why would you say that?
I thought all lines were linear =)
They use FIP. Hence they aren't trying to be descriptive they are trying to be prescriptive.
Isn't this basically strength of schedule? A replacement team in the AL East will win fewer games than a replacement team in the NL West, because the competition is stronger. A difference of 9 wins seems reasonable.
Average number of wins among division teams (that 84 number) does not correct for this. That says the AL East was 3 games per team over .500 against other-AL opposition, and the NL West was the same against other-NL opposition, which aren't the same thing.
That doesn't have to follow. You can use FIP/DIPS to estimate the pitchers' actual contribution to run prevention, separate from what the fielders contributed. In fact, you must do something like that if you want to break down defensive value between pitchers and fielders.
Actually, doesn't Fangraphs use xFIP, which "normalizes" HR/FB? So, at a minimum, they're not measuring actual home runs allowed on the pitching/defense side of the ball.
What about WAR would correct for this, though? Does Fangraphs adjust the raw statistics for strength of opponent before they convert them to WAR?
No. They use FIP.
Just the ones denoting zip codes.
There's also the issue of FIP ignoring the order of events. A pitcher could have 3 BB, 3 K, and 1 HR in an inning and give up either one run or four runs depending on the order of events. FIP treats both the same, which arguably distorts the "pitchers' actual contribution to run prevention."
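To make that concrete, here is a sketch of the core FIP formula as FanGraphs publishes it, (13×HR + 3×(BB+HBP) − 2×K) / IP plus a season-specific constant that puts it on the ERA scale. The constant is omitted here because it is the same for every pitcher in a given season. The run totals in the comments are worked out by hand from the two orderings described above.

```python
from collections import Counter

def fip_core_from_events(events, ip):
    """FIP without the league constant, computed from event counts only:
    (13*HR + 3*(BB + HBP) - 2*K) / IP. Order of events is never consulted."""
    c = Counter(events)  # missing keys count as 0
    return (13 * c["HR"] + 3 * (c["BB"] + c["HBP"]) - 2 * c["K"]) / ip

hr_first = ["HR", "BB", "BB", "BB", "K", "K", "K"]    # 1 run scores
grand_slam = ["BB", "BB", "BB", "HR", "K", "K", "K"]  # 4 runs score
print(fip_core_from_events(hr_first, 1.0), fip_core_from_events(grand_slam, 1.0))  # 16.0 16.0
```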
Thanks, CW. I stand corrected.
-- MWE
I do. I'm not sure what's funny.
Positive? Great Jupiter's ghost!
Also Boston's Pythag was 93-69 and TB's was 86-76.
TB's WAR is out of line with both its actual record and its Pythag record, whereas Boston is pretty close on all three. So WAR apparently thinks TB should have scored more runs than they did, or given up fewer runs than they did.
The Angels scored 883 runs with an OPS of .792
TB scored just 803 runs with an OPS of .782
the Twins scored 817 runs with an OPS of .774
TB gave up 754 runs (8th)
TB pitchers gave up an OPS of .741 (4th)
I think TB simply underperformed across the board: they scored 10-20 fewer runs than their component stats would suggest, AND they gave up 10-20 more runs than their component stats would suggest, AND they underperformed their pythag by 2 games.
Something doesn't seem right here.
When you order by difference (FIP+UZR-RA), it just happens to be exactly the same as ordering by alphabetical team name?
If that is REALLY the case (and not some bizarre sorting error before posting the results), then that is the greatest coincidence I have EVER seen in statistics.
Team names starting in A are the new market inefficiency?
Please give a big hand to your 2010 Kansas City Aardvarks!
Amazing, isn't it? Or I f'ed up my sort. Here's how the list should look.
Team: FIP + UZR / RA / Diff
Mets: 822 / 757 / 65
Dodgers: 676 / 611 / 65
Cubs: 737 / 672 / 65
Twins: 808 / 765 / 43
Cardinals: 682 / 640 / 42
Reds: 763 / 723 / 40
White Sox: 767 / 732 / 35
Phillies: 738 / 709 / 29
Blue Jays: 796 / 771 / 25
Braves: 665 / 641 / 24
Yankees: 775 / 753 / 22
Orioles: 892 / 876 / 16
Brewers: 829 / 818 / 11
Giants: 621 / 611 / 10
Rangers: 745 / 740 / 5
Tigers: 748 / 745 / 3
Angels: 763 / 761 / 2
Astros: 769 / 770 / -1
Red Sox: 734 / 736 / -2
Padres: 767 / 769 / -2
Mariners: 684 / 692 / -8
Indians: 853 / 865 / -12
Rockies: 702 / 715 / -13
Nationals: 856 / 874 / -18
Marlins: 745 / 766 / -21
Pirates: 737 / 768 / -31
Royals: 801 / 842 / -41
Athletics: 711 / 761 / -50
Diamondbacks: 712 / 782 / -70
Rays: 684 / 754 / -70
Team WAR should reconcile to a team's Pythag record as calculated from runs created (or whatever method you're using to calculate offensive contribution) and calculated runs allowed (expressed as calculated runs allowed by pitching and defense).
There are 4 sources of potential error: 3-4 games of standard error for pythag itself, ~15 runs if you're using good offensive metrics, and god knows how much on the defensive side. I've never seen anybody publish how well (say) FIP plus (the team total of your defensive metric of choice) reconciles with actual runs allowed (and to me this is an important sanity check on any and all defensive methods; yes, I'm aware that FIP itself has a standard error, but the better the defensive method, the smaller the overall standard error).
Anyhow, for planning purposes, let's call the overall error on the runs-allowed side 25 runs (it's probably better than that, but I'd like to see that demonstrated).
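If the error sources are independent (an assumption), they combine in quadrature (root-sum-of-squares) rather than adding straight up. A rough Python sketch of that error budget using the figures above, with ~10 runs per win as an assumed rule of thumb and 3.5 games as an assumed midpoint of the 3-4 game pythag range; `combined_error` is a made-up helper.

```python
import math


def combined_error(*component_errors):
    """Root-sum-of-squares of independent error components (same units)."""
    return math.sqrt(sum(e * e for e in component_errors))


RUNS_PER_WIN = 10.0   # common rule of thumb; an assumption here
offense_runs = 15.0   # "~15 runs if using good offensive metrics"
defense_runs = 25.0   # the 25-run planning figure for the runs-allowed side
pythag_games = 3.5    # assumed midpoint of the 3-4 game pythag error

run_error = combined_error(offense_runs, defense_runs)      # ~29.2 runs
run_error_wins = run_error / RUNS_PER_WIN                    # ~2.9 wins
total_wins = combined_error(run_error_wins, pythag_games)    # ~4.6 wins
```

If the errors are correlated (say, a defensive metric and FIP mis-splitting the same balls in play), quadrature understates the total.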
So if Silva's point is that WAR isn't perfect, no kidding. As with any well-thought-out method, it's a very useful place to start any value discussion. And you ought to be able to offer up some reasons why it missed in any specific case.
If the Mets really were 65 runs lucky on defense last year, and it reverts this year there are going to be mass suicides on this site.
Not exactly. It wouldn't be that tough to tease out how much each component (team offense, team pitching and team defense) is contributing to the error in the estimate of team wins. (I'm not interested enough to do the heavy lifting, but if you want to know how much errors in the offensive estimates are skewing things, then simply calculate offensive WAR from actual team runs scored rather than using the derived numbers. You could also use WPA to double-check any given part.)
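A minimal sketch of that teasing-out, with pythag standing in (as an assumption) for whatever runs-to-wins conversion your WAR source uses: swap actual runs scored in for the estimate, then actual runs allowed, and see how much the win estimate moves at each step. `split_miss` and `pythag_wins` are hypothetical helper names.

```python
def pythag_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Pythag win total: games * RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


def split_miss(est_rs, actual_rs, est_ra, actual_ra, actual_wins,
               games=162, exponent=2.0):
    """Attribute (estimated wins - actual wins) to the offensive estimate,
    the run-prevention estimate, and the pythag step itself.

    Actual totals are swapped in one at a time (offense first), so the
    three pieces telescope and sum exactly to the total miss. The split
    depends on that substitution order, which is a choice made here.
    """
    estimated = pythag_wins(est_rs, est_ra, games, exponent)
    offense_fixed = pythag_wins(actual_rs, est_ra, games, exponent)
    both_fixed = pythag_wins(actual_rs, actual_ra, games, exponent)
    return {
        "offense": estimated - offense_fixed,
        "run_prevention": offense_fixed - both_fixed,
        "pythag_residual": both_fixed - actual_wins,
        "total": estimated - actual_wins,
    }
```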
How much of the discrepancy between WAR's estimate and actual team wins will turn out to be imperfections in the model? I'd be really surprised if the standard error between team WAR and team wins is under 5 wins per 162 games.
And timing is going to be the biggest factor. Close games aren't precisely random, just close enough to random for planning purposes. You can build a strong bullpen. A good bench doesn't hurt. Beyond that, luck plays a heavier role in close games than anywhere else.
I understand that this is the justification for using FIP instead of WAR, but this seems like a misuse of the concept to me. What we know is that FIP/DIPS correlates better with the next year's ERA than the previous year's ERA does. It is quite a leap to go from that probabilistic statement to "In a given year, FIP - ERA = Defense."
Maybe there are studies that underlie the Fangraphs decision to sublimate the FIP/ERA disparity into the defensive ether; I'd be interested in them if that's the case.
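For reference, FIP as FanGraphs publishes it is (13*HR + 3*(BB + HBP) - 2*K) / IP plus a league constant, and the quantity being handed to defense is the FIP/ERA gap. A small Python sketch; the 3.10 constant is only a placeholder (the real constant is recomputed each season so that league FIP equals league ERA), and the function names are made up here.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching.

    FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant
    `constant` is a PLACEHOLDER; use the league value for the season.
    `ip` must be true innings (180 2/3 as 180.667, not box-score "180.2").
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def fip_minus_era(era, hr, bb, hbp, k, ip, constant=3.10):
    """FIP - ERA: the gap the "FIP - ERA = Defense" reading credits to fielders.

    Negative when runs allowed exceeded what FIP expects.
    """
    return fip(hr, bb, hbp, k, ip, constant) - era
```

Nothing in this arithmetic tells you whether the gap is defense, sequencing or noise; that attribution is exactly the leap being questioned.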
The two teams with Gold Glove catchers. WAR is underrating Joe Mauer; everything it doesn't measure, he does well.
Ten losses is an awful lot of losses to explain away with luck. If WAR thinks the DBacks earned ~80 wins, but they won only 70 games, and that 70 is still an acceptable error, then 90 wins would have also been an acceptable measure, right? That's giving AZ a whole bunch of good luck, rather than bad luck.
So WAR has estimated the quality of the 2009 Diamondbacks at somewhere between 70 and 90 wins. Is that really useful? Is that an acceptable estimate?
I checked two divisions at random and found two serious anomalies. I don't know how many 10-win outliers (and the DBacks were really more than that, near as I can tell) a correlation of 0.83 would produce, but if it's one per division, or even one per league, then yeah, that raises a lot of doubts.
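The outlier question can be roughed out: with correlation r between team WAR and team wins, the spread of actual wins around the best-fit line is SD(wins) * sqrt(1 - r^2), and if those residuals are roughly normal (an assumption), the chance of a 10-win miss follows from the normal tail. In the sketch below, `SD_WINS = 11.0` is a hypothetical spread of team win totals, not a measured one; plug in the real figure. Note this measures misses around the regression line, which is not quite the same thing as raw WAR-minus-wins.

```python
import math


def residual_sd(r, sd_wins):
    """Spread of actual wins around the least-squares line on team WAR."""
    return sd_wins * math.sqrt(1.0 - r * r)


def two_sided_tail(threshold, sd):
    """P(|X| >= threshold) for X ~ Normal(0, sd)."""
    return math.erfc(threshold / (sd * math.sqrt(2.0)))


R = 0.83        # correlation quoted in the comment
SD_WINS = 11.0  # HYPOTHETICAL spread of team win totals
TEAMS = 30

sd_resid = residual_sd(R, SD_WINS)
expected_10_win_outliers = TEAMS * two_sided_tail(10.0, sd_resid)
```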
Thanks to everyone for their input. By the way, I got to thinking about this after reading the thread on Ben Zobrist yesterday, when someone insisted that Zobrist was the MVP because of his 0.3 WAR edge over Joe Mauer, and would brook no disagreement. I had no idea what I'd find when I checked the Rays' team WAR, but the results make perfect sense: WAR thinks the Rays were a lot better than they actually were last year, so it apportioned a lot more credit for wins than would seem to be merited. Ben Zobrist, the best player on the Rays, would be the biggest beneficiary of that.
Why? It's just the sum of its inputs, so unless the inputs introduce distortion (error), I don't understand why the sum would. There's an opportunity for larger distortions (error) to exist, but also an opportunity for error to cancel.
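The cancellation point in a quick Python sketch: if each player's WAR carries an independent error with the same standard deviation, the error on the team sum grows like sqrt(n), not n. The 25-player roster and 1-win-per-player SD used in the test are hypothetical numbers for illustration, and the helper names are made up. If the errors share a common bias (the same metric misjudging a whole park, say), they stop cancelling and add up linearly instead.

```python
import math
import random
import statistics


def independent_sum_sd(per_player_sd, n_players):
    """SD of a sum of n independent, equal-SD errors: sd * sqrt(n)."""
    return per_player_sd * math.sqrt(n_players)


def fully_correlated_sum_sd(per_player_sd, n_players):
    """Worst case: errors move together, so the SDs simply add."""
    return per_player_sd * n_players


def simulated_sum_sd(per_player_sd, n_players, trials=20_000, seed=1):
    """Monte Carlo check of the independent case."""
    rng = random.Random(seed)
    totals = [
        sum(rng.gauss(0.0, per_player_sd) for _ in range(n_players))
        for _ in range(trials)
    ]
    return statistics.stdev(totals)
```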
great point, #87.
Except, of course, that there would be no indication of whether the flaw/cause of discrepancy lies with FIP, UZR, or in the methodology of the sanity check itself. Your larger point is still a good one, but there are going to be hardships and contention in providing a sanity check for almost any stat.
Worst case, you get a jumble of randomness which will at least allow you to reject the hypothesis that FIP + chosen defensive method = team run prevention -- and then you have to figure out where the problem lies.
To me the real key for such a test is coming up with an acceptable method before running the study.
This use of FIP is where the SABR movement loses me. I mean, it's great to come up with an imaginary number of what theoretically should have happened and what is more likely to happen next year. But using it as a method of determining real value contributed in the past just fails for me. Obviously a lot of people disagree with me, but I think it's at least one step removed from reality.
Question about a question: [89] So WAR has estimated the quality of the 2009 Diamondbacks at somewhere between 70 and 90 wins. Is that really useful? Is that an acceptable estimate?
Nobody actually uses aggregate WAR to rank team quality the way Silva suggests, do they? It's pretty much an individual true-talent-level style metric, isn't it?
To demonstrate that it actually works as intended, you should be able to show that FIP + (some defensive metric) models team runs allowed to some acceptable degree, where "perfection" would be a standard error in the 15-run range.
As Jeff points out, we won't know whether the problem is with the FIP method or (say) UZR, ZR, whatever. For that reason, it's best to run the test with as many metrics as possible. If you get standard errors in the same general range and they're higher than you'd like, it's an indication that FIP has a problem.
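The multi-metric version of the test as a minimal Python sketch: for each defensive metric, build FIP-plus-that-metric runs-allowed estimates, compute the standard error against actual team runs allowed, and check it against the 15-run target mentioned above. Metric names are just dictionary keys, the inputs are up to you (the test uses made-up numbers), and `compare_metrics` is a hypothetical helper.

```python
import math


def standard_error(predicted, actual):
    """Root-mean-square difference between predicted and actual runs allowed."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need matching, non-empty lists")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def compare_metrics(estimates_by_metric, actual_ra, target=15.0):
    """Map metric name -> (standard error, meets target?).

    estimates_by_metric: {"UZR": [...], "ZR": [...]}, each list holding
    FIP-based runs allowed adjusted by that metric, one entry per team,
    in the same team order as actual_ra.
    """
    results = {}
    for name, estimates in estimates_by_metric.items():
        se = standard_error(estimates, actual_ra)
        results[name] = (se, se <= target)
    return results
```

If every metric lands in the same too-high range, suspicion shifts to the FIP side, per the reasoning above.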
I think this is probably correct, but I also think it fails in this particular instance. If #60 had been a chart of the discrepancy between Pythag W/L and actual W/L, then sure, looking at the consistent deviators for commonalities might* help you do some bodywork on the Pythag stat. (I put the * there because in that case, you'd need overwhelming evidence of predictable bias based on team type in order to justify changing Pythag.) But FIP, as noted above, neither claims to reproduce macro runs allowed nor is constructed to do so. So again, I do agree with your general point that these sanity checks are good ideas, and their underutilization is to the detriment of all involved, but I don't find much instructive about those discrepancies in #60.
Worst case, you get a jumble of randomness which will at least allow you to reject the hypothesis that FIP + chosen defensive method = team run prevention -- and then you have to figure out where the problem lies.
Fair enough, but I'd say that the weak rejection of that hypothesis you get from the jumble of randomness is no better than the rejection of that hypothesis that everyone has already made based on common sense and knowledge of the stats. Are there people out there claiming that FIP+DefMetric=Actual Team Run Prevention? I haven't seen that. It would seem to me that the best you could hope to claim is:
FIP+DefMetric=Team Run Prevention (Minus statistically insignificant noise and only a more useful number than Runs Allowed for predictive purposes.)
Nobody actually uses aggregate WAR to rank team quality the way Silva suggests, do they? It's pretty much an individual true-talent-level style metric, isn't it?
I don't know of anybody in particular off the top of my head, but I don't see an inherent reason why it would be bad. It's a question of philosophy that really goes back to the same overall discussion. If you can't add up all the individual numbers and get a pretty close approximation of the team number, why not? And whatever the answer to that is had better be a good one, because if it isn't, it very much calls into question the validity of WAR itself. That was Tom's point from upthread.
As for the question you quote, and so to Tom: that 70-90 range is an exaggeration. Nobody's saying "10 wins is the accepted margin of error." This is basic statistics, which you well know. It would be perfectly fine to say that the standard deviation there is, say, 4. Then roughly 68% of teams would land within +/- 4 wins of their actual win total and 95% within +/- 8, and the occasional 10 is not an indictment of the stat. It's an acceptable result that in and of itself is indicative of nothing.
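Working the illustrative SD of 4 wins through a normal distribution (itself an assumption about how the misses are distributed, and treating teams as independent), a quick Python check of the 68%/95% figures and of how often a 10-win miss should turn up across 30 teams:

```python
import math


def share_within(k):
    """Fraction of a normal distribution within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2.0))


SD_WINS = 4.0   # the illustrative SD from the comment
TEAMS = 30      # MLB teams per season

within_4 = share_within(4.0 / SD_WINS)                 # ~0.68
within_8 = share_within(8.0 / SD_WINS)                 # ~0.95
miss_10_or_more = 1.0 - share_within(10.0 / SD_WINS)   # ~0.012 per team
expected_per_season = TEAMS * miss_10_or_more          # ~0.37 teams per season
```

So under those assumptions a 10-win miss would show up about once every three seasons league-wide; several in one season would be the real red flag.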
So we are the guys that schtup her and talk about her behind her back??
1st order is simply pythagorean w/l
2nd order is "component" runs - i.e., accounting for clutch-y-ness
3rd order is adjusted for strength of opposition
WAR is equivalent to 2nd order W/L here.
So, the Giants' +5 wins above WAR turns out to be mostly the 2nd order difference. For whatever bizarre reason, they were clutch in getting runs out of their (pathetic) individual batting and (glorious) individual pitching events.
This 5 wins is pretty important if you are a Giants fan, but I don't think one would expect this to carry over to 2010.
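A Python sketch of the first two orders as described above. The exponent of 2 is an assumed stand-in, not necessarily the formula behind those published standings, and the component run totals are whatever your runs estimator produces. Third order would also need a strength-of-opposition adjustment, which isn't attempted here; `order_breakdown` is a made-up helper.

```python
def pythag_wins(runs_scored, runs_allowed, games, exponent=2.0):
    """Pythag win total: games * RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


def order_breakdown(actual_wins, games, runs_scored, runs_allowed,
                    component_rs, component_ra, exponent=2.0):
    """1st order uses actual runs; 2nd order uses component (estimated) runs."""
    first = pythag_wins(runs_scored, runs_allowed, games, exponent)
    second = pythag_wins(component_rs, component_ra, games, exponent)
    return {
        "first_order_wins": first,
        "second_order_wins": second,
        # close-game / bullpen luck: actual W/L vs. actual runs
        "actual_minus_first": actual_wins - first,
        # sequencing ("clutch") in turning batting and pitching events into runs
        "first_minus_second": first - second,
        # the two gaps above add up to this one
        "actual_minus_second": actual_wins - second,
    }
```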
FanGraphs also has WPA (win probability added) value stats, which are the opposite of WAR in that they include leverage and clutch performance.
WPA should sum much closer to actual W/L (modulo replacement, or whatever baseline WPA uses)
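Why WPA should land close to actual W/L: if every game starts at the same win probability p0 and ends at 1 (win) or 0 (loss), a team's WPA for the game telescopes to final minus start, so over a season total WPA = wins - games * p0. A Python sketch assuming p0 = 0.5 (real win-expectancy tables usually give the home team a bit more), with made-up helper names:

```python
def game_wpa(win_probs):
    """Team WPA for one game: sum of the changes in its win probability,
    given the sequence of win probabilities from first pitch to final out."""
    return sum(after - before for before, after in zip(win_probs, win_probs[1:]))


def wins_from_wpa(total_wpa, games, start_prob=0.5):
    """If every game starts at start_prob and ends at 1 or 0,
    total WPA = wins - games * start_prob, so wins follow directly."""
    return games * start_prob + total_wpa
```

The baseline here is a .500 team, not replacement level, which is the "modulo replacement" caveat.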
The "Wins" in WAR is certainly less egregious than the "Wins" given to starting pitchers.