Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Tuesday, September 06, 2011

IATM: Is WAR the new RBI?

Give me OBP, give me OPS, give me IPO, give me WPA, give me K/BB; just don’t give me RBI!  If you’re going to give me RBI, Mr. McCarver, I’d rather you gave me nothing.

And then came WAR.

The concept was ratified by the sabermetric Godfather, Bill James, who’d created Win Shares according to a similar ideology in 2002.  It was a neoclassical economist’s wet dream, like baseball GDP: an elegant equation which accounted for all the sport’s diverse variables and yielded a single number roughly reducible to the oldest and most hallowed statistic of them all, the win.  Hallelujah.

Wins Above Replacement is a beautiful idea.  Euclidean grace in a quantum world.  A simple answer, not only for age-old baseball conundrums like “Mantle or DiMaggio?”, but also a formula for unprecedented comparisons like “Rickey Henderson v. Johnny Bench” and “Roy Halladay v. Alex Rodriguez”.

There’s only one problem.  It doesn’t work.

At least, not yet.  Not in the fantastically straightforward way we try to use it.  The idea is so good, so clarifying – like democracy or the rational market – that we really, really want it to work; we’re willing to suspend our disbelief just a little while longer in the hope that it might.

...The cruel irony, the I-could’ve-had-Sean-Doolittle-and-all-I-got-was-stupid-Barry-Zito irony, is that the problem with WAR is the same as the problem with RBI.  It frequently measures context as much as performance.  Especially when used to evaluate single seasons, it doesn’t sufficiently account for the inevitable variations in opportunity and environment.

Thanks to Tango...

Repoz Posted: September 06, 2011 at 03:18 PM | 164 comment(s)
  Tags: history, projections, sabermetrics

Reader Comments and Retorts

   1. snapper (history's 42nd greatest monster) Posted: September 06, 2011 at 03:40 PM (#3917707)
Most of this article (I read/skimmed 80% of it) seems to rest on single season defensive stats not being very accurate yet.

It's definitely a flaw in WAR, but not really analogous to the flaws in RBI.
   2. Don Malcolm Posted: September 06, 2011 at 04:11 PM (#3917738)
In one of his comments, Hippeaux notes the following:

I slightly misrepresent how the stat works here, in favor of making the statement more hyperbolic.

Clearly a man after my own heart...
   3. Tricky Dick Posted: September 06, 2011 at 04:15 PM (#3917740)
Most of this article (I read/skimmed 80% of it) seems to rest on single season defensive stats not being very accurate yet.


I suppose it depends on what use you give WAR. But, if the idea is to project future value (as in, how much should a team pay for a free agent?), then all single season stats are flawed. Presumably you would want to look at multiple seasons of WAR in order to eliminate single season variations---and that same concept would apply to OPS, OBP, SLG, or any other statistic. I haven't read the article, but (based on the quoted passage) a big part of the problem is that the author judges WAR with excessive expectations, as if it is supposed to be the holy grail of statistics. WAR is just another tool that can be useful for certain tasks.
   4. Rickey! trades in sheep and threats Posted: September 06, 2011 at 04:21 PM (#3917752)
WAR doesn't tell me anything that OPS, measured with a passing knowledge of the park and league, and a general understanding of the player's defensive skill set doesn't already tell me. But it does provide a number. A number with a decimal place! And god knows, people do love their false precision metrics like water for chocolate.
   5. snapper (history's 42nd greatest monster) Posted: September 06, 2011 at 04:26 PM (#3917759)
But, if the idea is to project future value

No, I think it's always intended to be backward looking as to value.
   6. BDC Posted: September 06, 2011 at 04:46 PM (#3917774)
I like WAR just fine. In practice, it's like Win Shares or even the old Approximate Value: it gives you a "ballpark" number for each player to use in thinking about him. That's good, and it's much more plausibly derived than Win Shares or other such numbers, which is even better.

When I don't like WAR is when you are having a discussion about whether some player was better than another, and you hear an argument that is basically "60 > 50, so there." That's arithmetic at its finest, but it's not a very nuanced position.
   7. Greg Maddux School of Reflexive Profanity Posted: September 06, 2011 at 04:46 PM (#3917775)
WAR doesn't tell me anything that OPS, measured with a passing knowledge of the park and league, and a general understanding of the player's defensive skill set doesn't already tell me.

Feel free to explain the relative values of a .900 OPS slick-fielding first baseman in Dodger Stadium, a .780 OPS average-fielding shortstop in Fenway Park and a .750 OPS catcher with a great arm and not-so-hot pitch-blocking skills in Coors Field without any kind of framework relating all those things to runs or wins.
   8. AROM Posted: September 06, 2011 at 04:49 PM (#3917778)
Feel free to explain the relative values of a .900 OPS slick-fielding first baseman in Dodger Stadium, a .780 OPS average-fielding shortstop in Fenway Park and a .750 OPS catcher with a great arm and not-so-hot pitch-blocking skills in Coors Field without any kind of framework relating all those things to runs or wins.


And then compare those to a starting pitcher with 215 innings with a 120 ERA+, and a closer with a 175 ERA+ in 65 innings. If I could keep track of all the variables in my head I wouldn't have bothered with any WAR system.
   9. AROM Posted: September 06, 2011 at 05:06 PM (#3917790)
Yet, we misuse WAR to insist that it’s better to have Ian Kinsler than Miguel Cabrera


Fangraphs has Kinsler ahead, BB-ref has Cabrera. The difference is not really in their relative defensive ranking, but the offense rating, probably due to different park factors. Either way, they are close enough that you can't rate one definitively ahead of the other.

The writer is making an assumption here that of course Cabrera's great bat is more valuable than Kinsler's good bat, strong baserunning, and solid glove. It's not proven though. Are you better off with Cabrera + bad 2B or with Kinsler + bad 1B?
   10. Rickey! trades in sheep and threats Posted: September 06, 2011 at 05:37 PM (#3917815)
Feel free to explain the relative values of a .900 OPS slick-fielding first baseman in Dodger Stadium, a .780 OPS average-fielding shortstop in Fenway Park and a .750 OPS catcher with a great arm and not-so-hot pitch-blocking skills in Coors Field without any kind of framework relating all those things to runs or wins.


Each of those players' values depends on the team, other options and where they are on the competitive cycle. WAR doesn't really do anything but provide a false security blanket of round numbers. It pretends to account for that complexity without actually accounting for the complexity. I'd rather have the complexity staring me in the face.
   11. flournoy Posted: September 06, 2011 at 05:42 PM (#3917824)
It's definitely a flaw in WAR, but not really analogous to the flaws in RBI.


I agree, but the flaws with WAR are much more serious. I said in another thread a while back that any statistic that tells me that Freddie Freeman has been below average defensively this year is pointless beyond belief. Imagine a stat that said Jose Bautista has been below replacement level with the bat this year. There's just nothing you can do with it. It's completely invalid. It's worse than if it didn't exist at all - a lack of information is better than obviously incorrect information.

However, Freddie Freeman has driven in 64 runs this year. This I know is true. I don't really know what it means, if anything, about Freeman's value as a baseball player, but at least I know it's not wrong.
   12. Bob Evans Posted: September 06, 2011 at 05:59 PM (#3917839)
It pretends to account for that complexity without actually accounting for the complexity.

Well, then the issue isn't WAR, but how it's used.
   13. John Northey Posted: September 06, 2011 at 06:02 PM (#3917842)
I see WAR as a method to rate general stuff. For example, how many WAR did a draft class produce? How many players cracked 10 WAR career-wise from that draft? Easy ways to quickly tell if a draft was a 'good' or 'poor' one for a team or the league. Same for groups of players - ie: how many rookies did a team produce who would have a 4+ WAR season or something like that. A quick, easy method to get a general idea on things.

Now, if you are using it to say which player deserves the MVP, then you'd better be looking at wide gaps. I.e., player A having an 8.1 WAR and player B an 8.0 tells you nothing. But if player C has a 2.7 WAR, then odds are he wasn't as valuable as the 8's (see 1987 NL MVP voting).
   14. snapper (history's 42nd greatest monster) Posted: September 06, 2011 at 06:04 PM (#3917844)
I said in another thread a while back that any statistic that tells me that Freddie Freeman has been below average defensively this year is pointless beyond belief.

You've said that before.

How can you be so sure he's not? Have you scouted Freeman and the 29 other MLB first basemen extensively?
   15. Crosseyed and Painless Posted: September 06, 2011 at 06:04 PM (#3917846)
I said in another thread a while back that any statistic that tells me that Freddie Freeman has been below average defensively this year is pointless beyond belief. Imagine a stat that said Jose Bautista has been below replacement level with the bat this year.


I'd be interested to see WAR calculations using hitting numbers, ballpark adjustment, and position scarcity, but assuming for average defense. I don't know if that would be useful to anyone, I suspect it wouldn't be, but I'd still like to see them.
   16. SoSH U at work Posted: September 06, 2011 at 06:05 PM (#3917848)
Well, then the issue isn't WAR, but how it's used.


Which, of course, would make it identical to RBI.
   17. Ray (RDP) Posted: September 06, 2011 at 06:06 PM (#3917849)
It's definitely a flaw in WAR, but not really analogous to the flaws in RBI.


Yes, though there are some analogous flaws, like lineup position. Someone who leads off for his entire career finishes with more WAR than had he batted 3rd or 4th, despite 3rd and 4th hitters generally being better hitters overall due to their power.

Anyway, the problem with WAR is that it lumps too many things together, and we can't be confident in all of the things it's lumping in.
   18. AROM Posted: September 06, 2011 at 06:08 PM (#3917852)
I'd be interested to see WAR calculations using hitting numbers, ballpark adjustment, and position scarcity, but assuming for average defense. I don't know if that would be useful to anyone, I suspect it wouldn't be, but I'd still like to see them.


Do you know that BB-ref has exactly that, and calls it offensive WAR?
   19. Crosseyed and Painless Posted: September 06, 2011 at 06:19 PM (#3917864)
Do you know that BB-ref has exactly that, and calls it offensive WAR?


I'm an idiot apparently, because I didn't know that. Thanks for cluing me in.
   20. Chris Needham Posted: September 06, 2011 at 06:28 PM (#3917871)
19: In the future, please try to keep up with all 43,000 variants of this one single statistic that's better than all the others. Sheesh.
   21. Never Give an Inge (Dave) Posted: September 06, 2011 at 06:36 PM (#3917884)
Yes, though there are some analogous flaws, like lineup position. Someone who leads off for his entire career finishes with more WAR than had he batted 3rd or 4th, despite 3rd and 4th hitters generally being better hitters overall due to their power.

This is a problem with all counting stats, but less of a problem with WAR than with raw stats since it compares each player to a baseline. It also seems like this would be relatively easy to correct for.
   22. Rickey! trades in sheep and threats Posted: September 06, 2011 at 06:38 PM (#3917886)
I'm an idiot apparently, because I didn't know that. Thanks for cluing me in.


At that point, oWAR is just a pointless summation of OPS+ adjusted for position/park/league, no? You can do that in your head just as well. Granted, you're probably not going to get decimal points, and god knows we love decimal points, but it's something to live without.
   23. Greg K Posted: September 06, 2011 at 06:41 PM (#3917892)
Yes, though there are some analogous flaws, like lineup position. Someone who leads off for his entire career finishes with more WAR than had he batted 3rd or 4th, despite 3rd and 4th hitters generally being better hitters overall due to their power.

Hmm, I wonder who this could be referring to.

Your ceaseless campaign to besmirch the name of a fine player must come to an end! Leave poor Rickey alone.
   24. Crosseyed and Painless Posted: September 06, 2011 at 06:42 PM (#3917894)
At that point, oWAR is just a pointless summation of OPS+ adjusted for position/park/league, no? You can do that in your head just as well. Granted, you're probably not going to get decimal points, and god knows we love decimal points, but it's something to live without.


I don't really think I can do that in my head, but I already said I'm an idiot once in this thread so I won't bother doing so again.
   25. Greg K Posted: September 06, 2011 at 06:48 PM (#3917901)
At that point, oWAR is just a pointless summation of OPS+ adjusted for position/park/league, no? You can do that in your head just as well. Granted, you're probably not going to get decimal points, and god knows we love decimal points, but it's something to live without.

Well I think its sole purpose is to make it easy to plug in your own defensive ratings (which seems somewhat convenient seeing as it is the one thing everyone agrees is the least reliable element of WAR).

I think you might be oversimplifying a little bit. Yes, someone who says Ellsbury is clearly more valuable than Adrian Gonzalez because he beats him in WAR 6.1 to 6.0 is fooling themselves.

Maybe I'm just not as adept at juggling things in my head as you, but take Yunel Escobar hitting .290/.364/.412 in the SkyDome, middling baserunner, seems like a good shortstop. Is that more or less valuable than Paul Konerko hitting .311/.398/.535 in Chicago with as far as I know adequate fielding at first? Off the top of my head I'm not sure. WAR tells me that Escobar's probably better if you rate his defence as very good, and maybe closer to equal if you just think he's above average. It's information. There's no reason that I need to be a slave to it.
   26. Rickey! trades in sheep and threats Posted: September 06, 2011 at 06:49 PM (#3917902)
I don't really think I can do that in my head, but I already said I'm an idiot once in this thread so I won't bother doing so again.


Sure you can. You just don't end up with a number. But you can mentally adjust Curtis Granderson for New Yankee Stadium well enough. You can note that JoeyBats gets extra points for playing 3B, where Pujols doesn't because he plays 1B. You can note that Derek Jeter loses points for being really bad defensively.

You don't need a linear weights calculation to evaluate these things.
   27. AROM Posted: September 06, 2011 at 06:50 PM (#3917904)
At that point, oWAR is just a pointless summation of OPS+ adjusted for position/park/league, no? You can do that in your head just as well. Granted, you're probably not going to get decimal points, and god knows we love decimal points, but it's something to live without.


You're fighting a strawman if you think anyone believes that 4.4 is meaningfully different from 4.3. It is the same information that you get from OPS+ and adjustments. The point is to represent things in a concept - runs - to make it easy to compare the contributions of players in very different roles - pitcher/slugging 1B/defensive shortstop, etc.

But if you want to persist with your act maybe we can talk about the pointlessness of having OPS+ calculated to 3 figures. That is no more meaningful than the decimal places in WAR.
   28. AROM Posted: September 06, 2011 at 06:51 PM (#3917906)
You can note that JoeyBats gets extra points for playing 3B, where Pujols doesn't because he plays 1B.


Pujols can play 3B too - and has in 2011.
   29. Randy Jones Posted: September 06, 2011 at 06:55 PM (#3917913)
Sam is just trolling in this thread.
   30. AROM Posted: September 06, 2011 at 07:00 PM (#3917917)
Here's my reasoning on the decimal places displayed in WAR.

Is 4.3 better than 4.2? I don't know, it's a tossup. But is a 4.4 player better than 3.5? Not absolutely, not to the extent we can be sure Barry Bonds was better than Royce Clayton, but very probably. So I'd rather have the decimal place to distinguish them instead of calling them both 4.

Could I go to 2 decimal places? The answer depends on if I think 4.24 is better than 4.15, and the answer is no. So 1 decimal place it is.

Since the decimal place represents runs, then an intellectually consistent critic would also demand that stats such as runs created also round off to the nearest 10.
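The decimal-place argument above can be sketched in a few lines. This is only an illustration, assuming the rule of thumb of about ten runs per win that the post implies; the constant and both function names are made up for this sketch and are not part of any WAR implementation.

```python
RUNS_PER_WIN = 10  # assumed rule of thumb (about ten runs per win); not an exact constant


def war_to_runs(war):
    """Convert wins to whole runs under the assumed runs-per-win rate."""
    return round(war * RUNS_PER_WIN)


def shown(war, decimals):
    """Format a WAR value with a fixed number of decimal places."""
    return f"{war:.{decimals}f}"


# The two players from the post: 4.4 WAR and 3.5 WAR.
for war in (4.4, 3.5):
    print(shown(war, 0), shown(war, 1), war_to_runs(war))
# With zero decimals both players display as "4";
# with one decimal they display as "4.4" and "3.5", a gap of about 9 runs.
```

Dropping the decimal is the same operation as rounding a runs total to the nearest ten, which is the consistency point being made.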
   31. Morty Causa Posted: September 06, 2011 at 07:02 PM (#3917918)
C'mon, Repoz, where's the youtube link to the song, at least, or, better yet, to the Seinfeld and Elaine War and Peace literary chat?
   32. Rickey! trades in sheep and threats Posted: September 06, 2011 at 07:13 PM (#3917928)
But if you want to persist with your act maybe we can talk about the pointlessness of having OPS+ calculated to 3 figures.


To be perfectly honest, all I ever look at is the three components as a slash stat line.
   33. AROM Posted: September 06, 2011 at 07:26 PM (#3917942)
To be perfectly honest, all I ever look at is the three components as a slash stat line.


Before or after the decimal?
   34. snapper (history's 42nd greatest monster) Posted: September 06, 2011 at 07:31 PM (#3917948)
To be perfectly honest, all I ever look at is the three components as a slash stat line.

And adjust those for park, scoring environment, position, baserunning and defence, all in your head, without referencing run values?

Riiiiight.
   35. Repoz Posted: September 06, 2011 at 07:39 PM (#3917951)
Neyer's take....

You know what makes me want to poke someone in the eye with a sharp stick? When someone writes "we" when he means "you" ... and isn't honest enough or brave enough to identify "you". Are there particular writers or broadcasters who have been abusing WAR this season? Is someone out there actually suggesting that Carlos Lee is just as valuable on defense as Troy Tulowitzki?

Because if nobody is doing those things, then Hippeaux is wasting his considerable talents against a straw man. If somebody is doing those things, then Hippeaux should tell us who. Otherwise we can only assume that he's engaging in mere sophistry.

Hippeaux does make some good points, which I have not excerpted because I can't excerpt everything. Single-season UZR's can be terribly misleading, which everyone's known for a long time. FanGraphs does a poor job with catchers' defense, which was excusable for a long time, but maybe isn't anymore, considering the PITCHf/x work that's been done recently. But considering only hitting statistics and positional scarcity, WAR is a great starting point for any discussion of a player's value. And the defensive stuff is generally reliable, too. As long as you don't go nuts and assume that one season of data tells you everything.

My advice to would-be iconoclasts like Hippeaux? Be specific in your criticisms, with examples of actual people who are making the actual mistakes you say we are making.
   36. DFA Posted: September 06, 2011 at 07:44 PM (#3917956)
It pretends to account for that complexity without actually accounting for the complexity. I'd rather have the complexity staring me in the face.


Well said. I think some want UZR to be something it's not.
   37. Gotham Dave Posted: September 06, 2011 at 07:45 PM (#3917959)
I'm nostalgic for VORP.
   38. Greg K Posted: September 06, 2011 at 07:56 PM (#3917965)
Well said. I think some want UZR to be something it's not.

You better name names or you meant end up getting garotted by flannel in your sleep.
   39. Greg K Posted: September 06, 2011 at 07:56 PM (#3917967)
Well said. I think some want UZR to be something it's not.

You better name names or you might end up getting garotted by flannel in your sleep.

EDIT: Uh oh, the dreaded double post that somehow includes a correction.
   40. Fly should without a doubt be number !!!!! Posted: September 06, 2011 at 08:22 PM (#3917986)
What's the flaw in RBI? Doesn't it tell you how many runs the batter batted in?
   41. PreBeaneAsFan Posted: September 06, 2011 at 08:27 PM (#3917991)

Sure you can. You just don't end up with a number. But you can mentally adjust Curtis Granderson for New Yankee Stadium well enough. You can note that JoeyBats gets extra points for playing 3B, where Pujols doesn't because he plays 1B. You can note that Derek Jeter loses points for being really bad defensively.

You don't need a linear weights calculation to evaluate these things.


In other words, you just make #### up. In that case, why even bother with the slash lines? Why not just use BA and "mentally adjust" for onbase and power skills? Or why use any numbers at all?

I think this is a problem that I see a lot not just in relatively unimportant venues like sports, but also in more important arenas (popular discussions of science, economics, etc.). People correctly point out that we don't have precise answers and that our best quantifications have error bars that are larger than the number of decimal places reported. That's a valuable insight and worth discussing, but then people take it a step further and use that as an excuse to remain completely agnostic on things. By denigrating the best efforts of others to quantify difficult questions and insisting that "I don't need all that fancy stuff, just give me the basics and I'll take my own guess since no one knows" they give themselves a feeling of smugness and superiority to those bookish nerds vainly searching for answers they can't pin down, but they also throw away valuable information that the effort to quantify those things tells us and in most cases behave as though the uncertainty is much greater than it actually is.

So WAR isn't accurate down to the decimal place; that doesn't mean it isn't a useful starting point for discussion. It doesn't mean that you can't learn something by seeing two very different types of players with similar WAR values. And it certainly doesn't mean that the process and idea behind WAR isn't a worthwhile pursuit.
   42. Walt Davis Posted: September 06, 2011 at 08:30 PM (#3918003)
I'm nostalgic for VORP

Then look at oWAR or oRAR if you want it in runs.

And maybe my memory's off but didn't WARP precede Win Shares?

Anyway, I like WAR. Nice, one-stop conversation starter/ender.

One of the things I like about it (and I'd like somebody to investigate as thoroughly as we do defense ... OK, maybe not as thoroughly as it's less important) are the baserunning, roe and dp numbers. We never really had good notions of what was being added/lost through those other than scoffing at all those high GDP guys. We tended to greatly over-value baserunning in my day as it turns out.

But its primary role is as a conversation starter or, occasionally, ender. Upon a time, I would have scoffed at the notion that Adam Dunn could be so bad defensively as to wipe out his bat's value. Yet, from ages 26 to 29, a time when Dunn put up a 131 OPS+, he put up just 6.6 WAR in 2600 PA, so below-average production.
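The "below-average" reading of the Dunn figures can be checked with a quick per-season rate. This is a rough sketch only: the 650 PA full season and the 2.0 WAR benchmark for an average regular are common rules of thumb assumed here, not numbers from the post, and the names are hypothetical.

```python
FULL_SEASON_PA = 650       # assumption: plate appearances for an everyday player
AVERAGE_REGULAR_WAR = 2.0  # assumption: rule-of-thumb WAR for an average starter


def war_per_season(total_war, total_pa, season_pa=FULL_SEASON_PA):
    """Scale a multi-year WAR total to a single full season of playing time."""
    return total_war / total_pa * season_pa


# Figures from the post: 6.6 WAR in 2600 PA.
rate = war_per_season(6.6, 2600)
print(round(rate, 2))              # 1.65
print(rate < AVERAGE_REGULAR_WAR)  # True under the assumed benchmark
```

Under those assumptions the rate comes out a little under two wins per full season, which is consistent with the claim, though a different benchmark would shift the comparison.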

Now that doesn't make WAR right about Dunn but it certainly means I can't reject out-of-hand an argument that he wasn't a good overall player during that time frame. Had anybody ever suggested Bell or Bando for the HoF I would have looked at them as though they were insane but maybe Bell was the Rolen of his day. True, a lot of that is that he was a better hitter than I remember, especially in his prime, and just looking at something like OPS+ would have told me that.

Ray is way out in front of the anti-Ichiro parade but I'd probably be a lot closer to him without WAR with the flip side of the Dunn argument -- yes he's a fine baserunner and fielder but he's still just a 114 OPS+ RF. How many runs could his fielding and baserunning possibly add? Well, as it turns out, they could _possibly_ add 220 runs or 22 wins. Jeepers. I'm still not sure I actually believe that, but a set of objective measures that have had a lot of work go into them says that he has, which clearly puts it in the realm of possibility. I never would have believed that a guy could be adding 2 wins a year over 10 years through defense and baserunning.

We all have plenty of moments when we look at WAR and say "that's crazy." But if you're not having at least as many moments where you look at WAR and think "maybe I over/under-rated that guy", then you're just an obstinate #### who might as well not look at numbers at all.
   43. Zach Posted: September 06, 2011 at 09:09 PM (#3918012)
I don't like WAR, either as a statistic or how it's used in practice. Defensive stats are so poor that they don't have any business being treated on an equal basis with offensive stats.

In common usage, it's even worse due to false precision and small sample size issues.

Eric Hosmer is the perfect encapsulation of the problems with WAR. He's hitting .287/.337/.456 for an OPS+ of 119 this year. Offensive WAR (BBref) of 2.1. Yet his defensive rating is -0.9 WAR, which puts him as the worst first baseman in the league. This despite drawing rave reviews, speculation about future gold gloves and being among the league leaders in Bill James's list of first baseman scoops (not a definitive stat, but it gives some evidence that he's looking alive out there).

Here you've got half of a very good player's offensive value getting canceled out by a negative number that in all probability shouldn't even be negative. Why would I ever want to include that crummy defensive number?
   44. Rickey! trades in sheep and threats Posted: September 06, 2011 at 09:25 PM (#3918020)
I think this is a problem that I see a lot not just in relatively unimportant venues like sports, but also in more important arenas (popular discussions of science, economics, etc.)


People pretending that favored pseudo-scientific "statistics" create a level of certainty and precision far beyond anything that they could ever actually measure or know? I agree!
   45. Ray (RDP) Posted: September 06, 2011 at 09:28 PM (#3918024)
But considering only hitting statistics and positional scarcity, WAR is a great starting point for any discussion of a player's value.


The problem is that people here often begin and end a discussion with WAR -- even as they comically insist that's not what they're doing.
   46. Joe Kehoskie Posted: September 06, 2011 at 09:34 PM (#3918031)
Defensive stats are so poor that they don't have any business being treated on an equal basis with offensive stats.

I like the concept of WAR, but I agree with the above 100 percent. I often find myself all but ignoring the WAR number and looking at oWAR instead.
   47. BDC Posted: September 06, 2011 at 09:37 PM (#3918034)
Why would I ever want to include that crummy defensive number?

a) It might be right
b) It's methodical, and therefore consistently comparable to other players' WAR
c) It's the latest refinement in a process (defensive evaluation) that gets better all the time; you can't just defer putting it all together till we live in the most perfect world
d) It's a SSS and the number is currently provisional, like any other player performance data
   48. Greg K Posted: September 06, 2011 at 09:38 PM (#3918036)
People pretending that favored pseudo-scientific "statistics" create a level of certainty and precision far beyond anything that they could ever actually measure or know? I agree!

It's not quite the same thing, but this seems to be a trend in history. Post-modernists point out that we don't nearly have the certainty about the past that traditional historians seem to suggest (a valid and important point) and move from there to, "so why bother going to the archive! Let's use literary documents as historical sources, or just make #### up!"

Uncertainty is something you should keep in mind as you examine the limited facts we are able to put together, not a reason to throw them out the window.

EDIT: I should point out that literary sources can (I'd maybe even go so far as to say MUST depending on the research) be incorporated into historical research, but they tell you very different things than traditional historical sources.
   49. Greg K Posted: September 06, 2011 at 09:44 PM (#3918041)
Why would I ever want to include that crummy defensive number?

I think the key is to not just take it as the end of the discussion. I remember the days before bbref had WAR defensive numbers up. If I wanted to know how good some 2B in the NL (who I never saw play) was how would I find out? Fielding %? Find a fan of the team and ask? Not the most convenient or reliable methods. Sure every now and then you get a player with -7 one year +15 the next, -19 the year after...which doesn't provide much clarity. But if I look up a player and see he's at least +7 every year of his career I'm going to think...ah ok, this stat isn't 100% reliable, but there seems to be a pretty strong indication that he's kinda good.

WAR is a fine stat, so long as you use your brain at the same time you're using it. And you understand that it's not going to be 100% accurate all the time.
   50. Joe Kehoskie Posted: September 06, 2011 at 09:48 PM (#3918043)
a) It might be right

With respect, I'd suggest that if this is (a), there's no need to go on to (b). I need more than "might" from a stat that's used so widely as the definitive tool for valuing players.
   51. Zach Posted: September 06, 2011 at 09:54 PM (#3918048)
Apologies if the point by point response seems too aggressive, but this list is a good summary of why I think defensive numbers should not be included.

a) It might be right

It isn't.

b) It's methodical, and therefore consistently comparable to other players' WAR

The method has problems that have been well known for thirty odd years.

c) It's the latest refinement in a process (defensive evaluation) that gets better all the time; you can't just defer putting it all together till we live in the most perfect world


The needed improvements are in the data, not the method. Defensive statistics have improved slightly in the last decade, but require extremely strong regression to the mean, which few incorporate.

d) It's a SSS and the number is currently provisional, like any other player performance data

Ignoring issues of whether the statistic is useful even in the large sample limit, a statistic that requires three seasons of data to be meaningful should be regressed so strongly to the mean that it should be impossible for a rookie to be the worst fielder in the league.

Just as a rough estimate, 9 runs below average over 108 games should be regressed to -9*108/(108+3*140) = -1.8, or about -2 runs.

Small sample sizes in error prone statistics should not be able to affect a player's valuation as much as it does in WAR.
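The rough estimate in this post can be sketched as a small shrinkage function. This is only an illustration of the arithmetic, observed * n / (n + k), where the 3*140 games of ballast is the poster's own back-of-envelope figure; the function name and parameters are made up for this sketch and do not describe how any published WAR handles defense.

```python
def regress_to_mean(observed_runs, games, ballast_games, league_mean=0.0):
    """Shrink an observed fielding total toward the league mean.

    The weight on the observation is games / (games + ballast_games),
    so a small sample is pulled most of the way back to the mean.
    """
    weight = games / (games + ballast_games)
    return league_mean + weight * (observed_runs - league_mean)


# Figures from the post: 9 runs below average over 108 games,
# regressed against three seasons of roughly 140 games each.
estimate = regress_to_mean(observed_runs=-9, games=108, ballast_games=3 * 140)
print(round(estimate, 1))  # -1.8, i.e. about -2 runs
```

With no ballast the function returns the raw observation unchanged, which is effectively what an unregressed single-season number does.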
   52. Ray (RDP) Posted: September 06, 2011 at 09:58 PM (#3918054)
Defensive stats are so poor that they don't have any business being treated on an equal basis with offensive stats.


This is my problem as well. You've got a huge chunk of a player's value that is inherently uncertain, and yet you ascribe certainty to it on the level of offensive stats. Then you compound that uncertainty by lumping 5 or 10 or 15 or 20 seasons in. Then you compare that to other players who you've also lumped 5 or 10 or 15 or 20 seasons in. God only knows what mess you end up with. I really don't see how this is anything more than a blunt tool for broadly separating out players. But of course it's being used as far more than that.

Basing a HOF argument on it is inherently problematic. A HOF argument relies on distinctions that are finer than WAR can provide. If the player is a clear HOFer (or is clearly not a HOFer), you don't need WAR to see it. If he's close to the border, WAR can't get you closer. And yet people base their arguments on WAR anyway. This is nothing more than false precision. At least those not basing a HOF argument on WAR aren't claiming to be more precise than they're being; they understand that our methods can only take us so far, and they don't pretend otherwise. But WAR arguments pretend otherwise, and introduce an uber-stat that has a component that could well be taking us in the wrong direction (and is taking us in all kinds of different directions for players). For defense, I'd far sooner rely on an array of defensive metrics than blindly trust what WAR is giving me.
   53. PreBeaneAsFan Posted: September 06, 2011 at 10:18 PM (#3918064)

People pretending that favored pseudo-scientific "statistics" create a level of certainty and precision far beyond anything that they could ever actually measure or know? I agree!


I think Greg picked up my point here to an extent, but I will elaborate more fully.

It's obviously a real problem when people attach a level of certainty and precision beyond that which is actually currently attainable to something (baseball statistic, scientific theory, economic prediction, etc.) To stick with the current context, anyone who looks at WAR and says "See, player x had 4.1 and player y had 3.9 so player x was better" would be doing something dumb. WAR is subject to errors which make it impossible to distinguish between players who are relatively close. For some players it probably has even larger errors. Using WAR uncritically is bad.

But my point is about what one does after this realization. It's a mistake to treat things as being more precise than they actually are and to attach excessive certainty to things. But it is at least as big a mistake to use that as an excuse to write off the whole process. WAR tells us a lot of interesting things. Just because it shouldn't be treated as a final answer doesn't mean that it doesn't add a lot of value and tell us a lot about baseball. Examining why WAR might be right or wrong about a particular player is an enlightening process.

It's intellectually lazy and cheap to simply use difficulty in measuring something (or in some cases simply the small possibility of error) as an excuse to simply believe whatever you want, and when we get outside of baseball it can be downright reckless or dangerous. Getting WAR to a point of precision that comparing a 4.3 to a 4.2 would be meaningful is beyond our current abilities. But I don't believe it's beyond anything we could ever actually measure or know, and even if it is, the process of trying to get it to that point is likely to be more illuminating than simply having each person use whatever ad-hoc "mental adjustment" process they want to use.
   54. Ray (RDP) Posted: September 06, 2011 at 10:28 PM (#3918069)
#53, what's clear is that there are two basic WAR camps. Camp A thinks Camp B is dismissing WAR completely. Camp B is not doing that; Camp B is using WAR to its limitations, and knows that Camp A is using WAR beyond its capabilities.
   55. Joe Kehoskie Posted: September 06, 2011 at 10:33 PM (#3918072)
Getting WAR to a point of precision that comparing a 4.3 to a 4.2 would be meaningful is beyond our current abilities.

But WAR is far from being at the point of accurately comparing a 4.3 to a 4.2. The Hosmer example up in #43 is a good one. So is Carlos Lee. At a glance, Lee is at 3.7 bWAR this year, which isn't great but also makes his $18.5M salary at least a little more palatable. But then you look under the hood, and Lee's dWAR is suddenly 1.7 in 2011, after it was -2.0, -1.4, -0.9, -0.4, -0.9, -1.8, 0.1, 0.3 over the prior 8 seasons. In other words, almost 50 percent of Lee's (alleged) 2011 value might be nothing more than a major hiccup in the data or formula. As much as I like the concept of WAR, it's still asking us to take some rather huge leaps of faith.
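
The arithmetic behind the Carlos Lee example can be checked with a short Python sketch. The figures are the ones quoted in the post; the variable names are illustrative.

```python
from statistics import mean

# Figures as quoted in the post (bWAR and dWAR for Carlos Lee).
total_war_2011 = 3.7
dwar_2011 = 1.7
prior_dwar = [-2.0, -1.4, -0.9, -0.4, -0.9, -1.8, 0.1, 0.3]

# Fraction of the 2011 total that comes from the defensive component.
share_from_defense = dwar_2011 / total_war_2011

# How far 2011 sits from the average of the previous eight seasons.
prior_average = mean(prior_dwar)
swing = dwar_2011 - prior_average

print(f"defensive share of 2011 value: {share_from_defense:.0%}")
print(f"average dWAR, prior 8 seasons: {prior_average:.3f}")
print(f"2011 dWAR vs prior average: {swing:+.3f}")
```

The defensive share comes out a little under half, and the 2011 figure is more than two and a half wins above the prior eight-season average.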
   56. Ray (RDP) Posted: September 06, 2011 at 10:43 PM (#3918076)
The 4.3 to 4.2 discussion is a red herring. If a player has a 4.3, his "true" value could be 2.3. It could be 6.3. That's the problem.

Then you compound that by adding seasons together, and comparing that to players who you've also added multiple seasons together.

It's also a problem that bWAR and fWAR disagree so much, so often. That in itself shows the problem quite well: two very sensibly created stats, both of which are often in utter disagreement as to a player's value.
   57. Ray (RDP) Posted: September 06, 2011 at 10:58 PM (#3918081)
This "We can't be certain, so we should dismiss WAR" is a red herring. No one subscribes to that. At the same time, it doesn't do anyone any good to pretend that there is a level of certainty to WAR that is simply missing. There are things in life we don't know about, can't know about, to the level of certainty we would like. That makes some people uncomfortable, and is why they believe in God, is why they believe in WAR.
   58. DFA Posted: September 06, 2011 at 11:04 PM (#3918083)
You better name names or you might end up getting garroted by flannel in your sleep.


I think it's more to the points that others are making better than I am...defensive metrics aren't as strong as offensive metrics, combining them merely weakens the offensive metrics. I understand the desire to quantify Brett Gardner's defensive value, but I feel like relying on more ambiguous measurements isn't the way to do it.
   59. TDF, situational idiot Posted: September 06, 2011 at 11:06 PM (#3918085)
Sure every now and then you get a player with -7 one year +15 the next, -19 the year after...which doesn't provide much clarity
I've posted this before, but it underlines this statement.

Jay Bruce, who's generally considered a pretty good fielder, is -11, +3, +18, -7 runs defensively in his 4 seasons per bbref. Fangraphs says he's -2.4, +9.2, +19.7, -0.5. Is he a below average fielder, as both sites will show when figuring his WAR this year? Is he an average fielder, as his bbref career looks, or is he a very good fielder as scouts and Fangraphs' career numbers show?

I honestly don't know how to answer that. What I do know is that WAR won't give me an answer.
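
For reference, the career averages implied by those two sets of numbers can be worked out in a few lines of Python. This is only a sketch using the figures quoted in the post; the variable names are illustrative.

```python
from statistics import mean

# Defensive runs per season for Jay Bruce, as quoted in the post.
bbref_runs = [-11, 3, 18, -7]
fangraphs_runs = [-2.4, 9.2, 19.7, -0.5]

bbref_avg = mean(bbref_runs)
fangraphs_avg = mean(fangraphs_runs)

# Season-by-season gap between the two systems (Fangraphs minus bbref).
gaps = [round(f - b, 1) for b, f in zip(bbref_runs, fangraphs_runs)]

print(f"bbref average: {bbref_avg:+.2f} runs per season")
print(f"Fangraphs average: {fangraphs_avg:+.2f} runs per season")
print("season gaps:", gaps)
```

The two systems agree on the direction of each year-to-year move but sit several runs apart in three of the four seasons, which is the disagreement the post is pointing at.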
   60. Robert in Manhattan Beach Posted: September 06, 2011 at 11:08 PM (#3918086)
Do any of the 106 variations of WAR account for clutch hitting/pitching, or at least hitting/pitching with runners on base? If WAR is just a reflective value stat and not a predictive one - which I believe it is - then it should. This was one of the ideas in Win Shares I thought was a good one. Clutch hitting, repeatable or not, will allow a player to create more value in a season than his raw stats would suggest.
   61. Lassus Posted: September 06, 2011 at 11:29 PM (#3918102)
Ray (and Joe) beat me to it, forcing me to agree with him on the same day I applied to work at an animal rescue/shelter. The schism, it hurts my brain.
   62. Ron J Posted: September 06, 2011 at 11:42 PM (#3918106)
Defensive stats are so poor that they don't have any business being treated on an equal basis with offensive stats.


No they aren't.

Yes, there are many potential issues that you have to deal with. But at worst they're good enough that the critic should be able to point out the specific problem with the player in question. (As for instance Chris Dial does when explaining the difference between how ZR sees Andruw Jones and all adjusted range factor systems see him -- essentially Chris argues that Jones took all the discretionary plays -- the plays that somebody was going to turn into an out, it's a matter of accounting as to whom. This incidentally is the big advantage of the DA structure since it used large, overlapping zones)

And not "Freddie Freeman is good. My eyes tell me this." Confirmation bias is a huge issue in our evaluation of defense without using defensive stats. Well that and the fact that nobody really watches defense.

Best I can tell the standard error in the defensive systems is in the range of 7-8 runs (per player) on a seasonal basis. It's more like 5 on the offensive side. And the range is smaller in defense (basically all of the well designed systems get a range of +20 to -20 on the defensive side, so a 7 run error can move you up or down two categories).

What that boils down to is that the defensive side is about as accurate as batting average (Given the discussions I've had with Walt Davis you might think that's damning. Not really. Batting average provides no real signal if you have other information, but if it's all you have it's not devoid of value). Not useless in other words, but easily open to an informed argument.
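
To put those two standard errors on one scale, here is a hedged Python sketch. It assumes the offensive and defensive errors are independent (so they add in quadrature) and uses the common rule of thumb of roughly 10 runs per win; both are assumptions of the sketch, not claims from the post.

```python
import math

# Standard errors from the post, in runs per player-season.
defense_se_runs = 7.5   # midpoint of the quoted 7-8 run range
offense_se_runs = 5.0

# Assumption: roughly 10 runs per win, a common rule of thumb.
runs_per_win = 10.0

# Assumption: independent errors, so they combine as a root sum of squares.
combined_se_runs = math.hypot(defense_se_runs, offense_se_runs)
combined_se_wins = combined_se_runs / runs_per_win

print(f"combined standard error: {combined_se_runs:.1f} runs")
print(f"in wins: {combined_se_wins:.2f}")
```

Under those assumptions a single-season total carries a standard error of a bit under one win, with the defensive side contributing the larger share.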
   63. GGC don't think it can get longer than a novella Posted: September 06, 2011 at 11:42 PM (#3918107)
Sam is just trolling in this thread.


He's being reasonable here. He isn't calling for sharp trauma or anything. If you aren't running a major league team, I'm not sure why you need Willie Bloomquist as a baseline instead of an average player. Hell, you can use eXtrapolated runs if you like. It was Jim Furtado's second best idea. (The first being Primer.)
   64. Ray (RDP) Posted: September 06, 2011 at 11:46 PM (#3918111)
But if you want to persist with your act maybe we can talk about the pointlessness of having OPS+ calculated to 3 figures. That is no more meaningful than the decimal places in WAR.


The decimal places in WAR are not a problem per se. It's when people do things like cite a player's career WAR, and cite it as "38.7"; that shows that people aren't thinking, or that they really do think that the .7 is meaningful. (Sometimes this may be a formatting issue -- it's easier to import the data in their present form with decimal places -- but often it's not.)

The difference with citing an .866 OPS is that everyone knows that .866 is as good as .860 or .870 (and often people do round), but we're used to seeing the number to three decimal places so to cite it as .86 would cause a brief hiccup for the reader. Not so for WAR. Citing 39 instead of 38.7 would not.
   65. Ray (RDP) Posted: September 06, 2011 at 11:58 PM (#3918122)
Yes, there are many potential issues that you have to deal with. But at worst they're good enough that the critic should be able to point out the specific problem with the player in question. (As for instance Chris Dial does when explaining the difference between how ZR sees Andruw Jones and all adjusted range factor systems see him -- essentially Chris argues that Jones took all the discretionary plays -- the plays that somebody was going to turn into an out, it's a matter of accounting as to whom. This incidentally is the big advantage of the DA structure since it used large, overlapping zones)


But the problem is that if you didn't watch the player play, you can't point out something like this.
   66. Ray (RDP) Posted: September 06, 2011 at 11:58 PM (#3918124)
A player's offense is also a bigger factor in his contribution to a game than his defense. That is, 50% of the game is offense, and a player is roughly 1 part out of 9 on offense. But only 15-20% of the game is defense (the other 30-35% is of course pitching), and so a player is 1 or 1.5 parts out of 9 on defense (depending on his position)... out of that 15-20%.
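
Those proportions can be turned into rough per-player shares with a short Python sketch. The helper `player_share` is a hypothetical name, and the inputs are just the percentages stated in the post.

```python
def player_share(phase_share, parts_out_of_nine):
    """Fraction of the whole game attributed to one player in one phase."""
    return phase_share * parts_out_of_nine / 9


# Offense: 50% of the game, one batter out of nine.
offense = player_share(0.50, 1)

# Defense: 15-20% of the game, 1 to 1.5 parts out of nine by position.
defense_low = player_share(0.15, 1)
defense_high = player_share(0.20, 1.5)

print(f"offense: {offense:.1%}")
print(f"defense: {defense_low:.1%} to {defense_high:.1%}")
```

On these numbers a player's offensive share is somewhere between about 1.7 and 3.3 times his defensive share.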
   67. Honkie Kong Posted: September 07, 2011 at 12:05 AM (#3918128)
I understand where it is coming from, but to think that Miguel Cabrera ( oWAR 6.2, OPS+ 171 ) and Ian Kinsler ( oWAR 3.6, OPS+ 110 ) are in the same ballpark in value is a mindbend.

I am assuming the above # already is position and park adjusted. BTW, why is Cabrera not getting any MVP buzz?
   68. Mister High Standards Posted: September 07, 2011 at 12:15 AM (#3918143)
Back in the day when I was calculating a WAR I was regressing defense aggressively.
   69. snapper (history's 42nd greatest monster) Posted: September 07, 2011 at 12:43 AM (#3918160)
I understand where it is coming from, but to think that Miguel Cabrera ( oWAR 6.2, OPS+ 171 ) and Ian Kinsler ( oWAR 3.6, OPS+ 110 ) are in the same ballpark in value is a mindbend.

But it wasn't back in the '40s and '50s when guys like Nellie Fox, Lou Boudreau, Marty Marion and Phil Rizzuto were winning MVPs. Not to mention all the catchers who won.

The early days of sabremetrics caused us to forget position adjustment and defense, and now we're relearning it.

But, at one point it was known.
   70. Eric J can SABER all he wants to Posted: September 07, 2011 at 12:48 AM (#3918163)
It's when people do things like cite a player's career WAR, and cite it as "38.7"; that shows that people aren't thinking, or that they really do think that the .7 is meaningful. (Sometimes this may be a formatting issue -- it's easier to import the data in their present form with decimal places -- but often it's not.)

This doesn't strike me as being too different from any number of other things people cite. Saying that Paul Molitor has 74.8 career WAR (which I would mentally round to 75) is about the same as saying he has a career OPS of .817 (a little higher than .800) or 3319 career hits (over 3300, or fairly rare territory even among the 3000 hit club), or that he has $41.79 in cash in his pocket as we speak (if he does). The distinction between 3319 hits and 3310 or 3326 is just as immaterial as the difference between 74.8 WAR and 75.4.
   71. BDC Posted: September 07, 2011 at 12:57 AM (#3918170)
he has $41.79 in cash in his pocket

Hey, take it to the pocket thread.
   72. Never Give an Inge (Dave) Posted: September 07, 2011 at 12:57 AM (#3918171)
Ignoring issues of whether the statistic is useful even in the large sample limit, a statistic that requires three seasons of data to be meaningful should be regressed so strongly to the mean that it should be impossible for a rookie to be the worst fielder in the league.


I agree with this point, but I don't think anyone actually believes Hosmer is the worst fielder in the league. You're likely correct that dWAR needs a bigger regression factor, although you should regress it towards a player's long-term average, not necessarily towards zero.

Basing a HOF argument on it is inherently problematic. A HOF argument relies on distinctions that are finer than WAR can provide. If the player is a clear HOFer (or is clearly not a HOFer), you don't need WAR to see it. If he's close to the border, WAR can't get you closer.

Why not? Of course WAR alone is insufficient, but why can't you use it as one datapoint among several or many? If you have a player who is a borderline HOFer based on other stats, doesn't one more metric showing him to be a HOFer help his case?

For defense, I'd far sooner rely on an array of defensive metrics than blindly trust what WAR is giving me.

So why don't you do that? If you don't have a serious issue with the non-defensive components of WAR, why not use oWAR with a combination of defensive metrics, including dWAR?

The 4.3 to 4.2 discussion is a red herring. If a player has a 4.3, his "true" value could be 2.3. It could be 6.3. That's the problem.

If his dWAR is only 0.5, then it's highly unlikely he's really a 2.3 or a 6.3 (if you think it is likely, then any offense-only metric is going to have the same problem).

Then you compound that by adding seasons together, and comparing that to players who you've also added multiple seasons together.

If defensive metrics gain validity with larger sample sizes, then why do you think adding multiple seasons together is compounding the problem?

It's also a problem that bWAR and fWAR disagree so much, so often.

I agree.
   73. Harold can be a fun sponge Posted: September 07, 2011 at 01:04 AM (#3918180)
I've been banging this drum a bit lately in threads here and there; let's see if I can combine some thoughts in a way that they make sense here.

Anyone who talks about the "error" or "precision" in WAR is completely missing the point. That implies that there is some problem with the quality/completeness of our data, and that we can work over time to reduce that to zero.

The "problem" is much more fundamental (though I would say the problem is with the people who use WAR, not WAR itself). Uberstats (WAR, WS, WARP, LWTS, etc) are complex beasts based on literally hundreds or thousands of design decisions and assumptions. How do you figure the positional adjustment? Do you use component or run-based park factors? Do you include clutch performance at all? How is credit apportioned between pitchers and fielders? There are no obvious right answers here (and anybody who says there are hasn't given the problem much thought).

It comes down to what question you're asking. Are you asking how many fewer games the Rangers would've won without Kinsler this year? Where to put him on the MVP ballot? What salary the Rangers should look to pay for him next year (if he were a FA)? What's comparable trade value for him? All of those are valid, interesting questions; and depending on what you're trying to answer, you might answer the questions in the previous paragraph differently. WAR is under the constraint that it picked some set of assumptions/design decisions; the idea that those are the best for *every single* question is laughable (hopefully they're the best for the wide range of questions we talk about most).

The problem with WAR is not "error". The problem is that WAR made some tradeoffs that likely make it sub-optimal for the specific question you're asking now (whatever question that is).
   74. AJMcCringleberry Posted: September 07, 2011 at 01:09 AM (#3918184)
Just because someone's defensive WAR fluctuates doesn't mean there is a problem with WAR any more than Vernon Wells' fluctuating OPS means there is a problem with OPS.
   75. Mike Emeigh Posted: September 07, 2011 at 01:16 AM (#3918188)
But it wasn't back in the '40s and '50s when guys like Nellie Fox, Lou Boudreau, Marty Marion and Phil Rizzuto were winning MVPs. Not to mention all the catchers who won.


Until Berra and Campanella in the 50s, three catchers won MVP awards: Mickey Cochrane (twice), Gabby Hartnett, and Ernie Lombardi - and Cochrane's second award was arguably as much for his managing as his playing. Middle infielders didn't necessarily do all that well in the voting, either. Marion won it in 1944 in large part because Musial won it in 1943 and there was still some residual thinking around that no player should win it twice in a row. Rizzuto won it in 1950 while having by far the best offensive season of his career; he was the runner-up a year earlier largely because he was about the only established Yankee regular who played anything like a full season. Boudreau's 1948 award was, again, a recognition of his managing nearly as much as his playing.

Middle infielders still win MVP awards (Pedroia, Rollins, ARod, Tejada, Kent). Catchers really don't; the only catchers to win in the last 30 years are Pudge and Mauer.

-- MWE
   76. PreBeaneAsFan Posted: September 07, 2011 at 01:32 AM (#3918206)
I think there are a few things worth noting.

First, the vast majority of arguments here are about the defensive components of WAR. It is the dWAR that fluctuates wildly and leads to multi-win swings in player value. But there is no reason you have to construct WAR with a particular defensive stat: use whatever defensive stats you want here, regress them to the mean heavily, or simply ignore defense entirely. (Obviously the long-run solution is better defensive metrics, but for now those options are viable.)

Second, aggregating over years will (in the overwhelming majority of cases) reduce the percentage by which a player's WAR differs from their "true WAR" over that period, unless the errors are systematic. But if the errors are systematic someone with knowledge of the player will probably be able to make an argument why and how. If a player's dWAR differs substantially from reported views of a player's defense, that should lead to further investigation to determine whether we think WAR is wrong or the scouting reports were. At any rate, the defensive metrics at least contribute more information to the discussion.
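
The aggregation point can be illustrated with a small Python sketch. It assumes each season's measurement error is independent with the same standard deviation, so the error on a multi-season total grows with the square root of the number of seasons while the total itself grows linearly. The function name and the example values (4.0 true WAR per season, 1.0 WAR of error) are hypothetical.

```python
import math


def relative_error(true_per_season, error_sd_per_season, seasons):
    """Standard deviation of the summed error as a fraction of the summed value.

    Assumes independent, non-systematic errors of equal size each season.
    """
    total_value = true_per_season * seasons
    total_error_sd = error_sd_per_season * math.sqrt(seasons)
    return total_error_sd / total_value


# Hypothetical player: 4.0 true WAR per season, 1.0 WAR of random error.
for n in (1, 4, 9, 16):
    print(n, "seasons:", f"{relative_error(4.0, 1.0, n):.1%}")
```

A systematic error (a bias that points the same way every year) would not shrink like this; it would grow in step with the total, which is the exception the paragraph above carves out.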

It seems to me that the problems people are mentioning here are problems with how (some people) use WAR. I agree with a lot of those objections. What I don't agree with is when people (or perhaps just one person in this case) essentially say "I don't know why you'd want to do that anyway, I'll just make some adjustments in my head." Because with WAR, we can debate whether the defensive components are accurate, or whether the positional adjustments or replacement level are set to the right values. We can debate and discuss these things because by quantifying them we've made things specific, opened the process to scrutiny and (eventually) made the process verifiable (at least to a limited degree.) By discussing things this way our knowledge is enhanced, whereas ad-hoc personal adjustments can't be scrutinized in the same way.
   77. Lassus Posted: September 07, 2011 at 01:42 AM (#3918219)
Just because someone's defensive WAR fluctuates doesn't mean there is a problem with WAR any more than Vernon Wells' fluctuating OPS means there is a problem with OPS.

Are you still feverish? I can't understand how this makes sense.
   78. Mike Emeigh Posted: September 07, 2011 at 01:43 AM (#3918221)
Uberstats (WAR, WS, WARP, LWTS, etc) are complex beasts based on literally hundreds or thousands of design decisions and assumptions. How do you figure the positional adjustment? Do you use component or run-based park factors? Do you include clutch performance at all? How is credit apportioned between pitchers and fielders? There are no obvious right answers here (and anybody who says there are hasn't given the problem much thought).


This.

If you have the 1984 Baseball Abstract (or James's later compilation of his articles, This Time Let's Not Eat the Bones), dig it out and reread the article James wrote about rating baseball players by statistics (it's at the start of the player comments). James writes:

"if you could really measure all of a player's contributions to victory and defeat, then you would have a legitimate basis for ranking players. Of course...you can't. Besides that, winning a pennant in baseball requires diverse skills, combinations of unlike skills...So even if those skills do all acquire their value along the same line, it would be of questionable value to make a linear representation of them."

James also makes the point that uberstats can actually become a barrier to understanding, in part because crushing so many different things into a single number can cause you to forget about the underlying details that make up the number in the first place.

Personally, I think it's more important to have the details than it is to have the single number.

-- MWE
   79. Tom Nawrocki Posted: September 07, 2011 at 01:45 AM (#3918227)
Saying that Paul Molitor has 74.8 career WAR (which I would mentally round to 75) is about the same as saying he has a career OPS of .817 (a little higher than .800) or 3319 career hits (over 3300, or fairly rare territory even among the 3000 hit club),


I don't think those are comparable at all. Paul Molitor had 3,319 career hits; this is a verifiable fact. I suppose you could say it's a fact that Paul Molitor had 74.8 career WAR, but it's not factual in the least to say that Paul Molitor provided his teams with 74.8 career wins over a replacement player. It's an estimate, and as Harold says, it's built upon many other estimates and assumptions that we've gotten so used to that we no longer notice all the subjective factors in there.

There may not be a whole lot of distinction between a player having 3300 hits and a player having 3319 hits, but at least we're telling the truth there. When we say Paul Molitor was responsible for 74.8 wins over the course of his career, we're just guessing.
   80. Brian White Posted: September 07, 2011 at 01:51 AM (#3918234)
Are you still feverish? I can't understand how this makes sense.


It isn't that unreasonable. Players have good offensive years, and bad ones. We can track these easily with our offensive metrics, in which we have pretty good confidence.

Can't the same be true of defensive metrics? Could Joey Votto have registered -7 defensive runs last year because he's a decent defensive player who just had a bad year with the glove?

It's still an open question how much of the movement of defensive metrics is due to players, and how much is due to the quality of the metric. But we should expect some year to year fluctuation.
   81. Mike Emeigh Posted: September 07, 2011 at 01:55 AM (#3918241)
Because with WAR, we can debate whether the defensive components are accurate, or whether the positional adjustments or replacement level are set to the right values.


But we don't really know when we have the *right* values, and what is *right* today might not have been *right* 25 or 30 years ago and may not be *right* 25 or 30 years from now.

-- MWE
   82. Fancy Pants Handles lap changes with class Posted: September 07, 2011 at 01:57 AM (#3918245)
what's clear is that there are two basic WAR camps. Camp A thinks Camp B is dismissing WAR completely. Camp B is not doing that; Camp B is using WAR to its limitations, and knows that Camp A is using WAR beyond its capabilities.


And as we can tell from your post, Camp B thinks Camp A is using WAR beyond its capabilities, while Camp A is saying be aware of the limitations, but don't disregard the valuable information it does provide, which Camp B insists on disregarding.
   83. Lassus Posted: September 07, 2011 at 02:04 AM (#3918253)
Can't the same be true of defensive metrics? Could Joey Votto have registered -7 defensive runs last year because he's a decent defensive player who just had a bad year with the glove?

I think the problem is that on top of the yearly rollercoaster, you have different defensive metrics showing the same player to be both (sometimes wildly) above and below average in the same year. And then numbers that seem to simply bear no resemblance to reality, and not slightly, but jarringly. It's just very hard to take all that and have any confidence in the metrics.
   84. Eric J can SABER all he wants to Posted: September 07, 2011 at 02:05 AM (#3918254)
I don't think those are comparable at all. Paul Molitor had 3,319 career hits; this is a verifiable fact. I suppose you could say it's a fact that Paul Molitor had 74.8 career WAR, but it's not factual in the least to say that Paul Molitor provided his teams with 74.8 career wins over a replacement player. It's an estimate, and as Harold says, it's built upon many other estimates and assumptions that we've gotten so used to that we no longer notice all the subjective factors in there.

There may not be a whole lot of distinction between a player having 3300 hits and a player having 3319 hits, but at least we're telling the truth there. When we say Paul Molitor was responsible for 74.8 wins over the course of his career, we're just guessing.


This is true, for the most part; it does rely a bit on the consistency of the hit-error determination of various official scorers over time, which may or may not introduce a variance on the order of, say, 10 hits over a Molitor-length career (and, if you go quite a bit further back in time than Molitor, it also relies on the accuracy of record keeping, which has not always been immaculate).

Anyway, that's a rather petty point, but I don't find Ray's objection to citing a player's 38.7 career WAR (which I would read as "probably a bit less than 40") to be particularly more noteworthy.
   85. AJMcCringleberry Posted: September 07, 2011 at 02:15 AM (#3918265)
I think the problem is that on top of the yearly rollercoaster

I was just referring to the people who say the yearly roller coaster means there is a problem.
   86. Never Give an Inge (Dave) Posted: September 07, 2011 at 02:20 AM (#3918269)
It comes down to what question you're asking. Are you asking how many fewer games the Rangers would've won without Kinsler this year? Where to put him on the MVP ballot? What salary the Rangers should look to pay for him next year (if he were a FA)? What's comparable trade value for him? All of those are valid, interesting questions; and depending on what you're trying to answer, you might answer the questions in the previous paragraph differently. WAR is under the constraint that it picked some set of assumptions/design decisions; the idea that those are the best for *every single* question is laughable (hopefully they're the best for the wide range of questions we talk about most).

Sure, but who is trying to use WAR (or any other uber-stat) to answer every single question? It's easy to claim that everyone is using the stats incorrectly so we'd better not use them at all. It's more difficult to ask a specific question and design a statistic geared towards answering it, but ultimately those are the processes that lead to advancements in knowledge.

There may not be a whole lot of distinction between a player having 3300 hits and a player having 3319 hits, but at least we're telling the truth there.

Sure, but it's just raw data. It's great if you like memorizing the backs of baseball cards, but if you want to make decisions or take actions based on the data, you're going to need to interpret it in some way. The fact that WAR is denominated in wins is fun, but largely beside the point. Advanced statistics attempt to interpret the raw data to answer specific questions, and if they are used properly they can provide value.
   87. Something Other Posted: September 07, 2011 at 02:33 AM (#3918281)
My advice to would-be iconoclasts like Hippeaux? Be specific in your criticisms, with examples of actual people who are making the actual mistakes you say we are making.
The end of the internet, as we know it!!


I agree with this point, but I don't think anyone actually believes Hosmer is the worst fielder in the league.
Sorry to report you'd be very wrong. On some of the sbnation sites WAR, all of it, is quoted as gospel. People regularly did Marcel projections using WAR out to the five years of Jason Bay's deal, and insisted these were supremely accurate. Elsewhere it's not uncommon for people to scream that short reliever x was worth 0.6 last year, and was therefore worth the $1.5m his team paid him. Now, I'm not going to claim that the folks making these silly arguments would, in the absence of WAR, suddenly reason with great clarity, but there's a great unwashed out there who have adopted WAR as their go-to stat, and who are utterly convinced that 3.4 is clearly better than 3.2.

Anyone looking to preach the truth might want to head on over to sbnation.com. It won't be easy, but there are converts to be made.
   88. Harold can be a fun sponge Posted: September 07, 2011 at 02:36 AM (#3918282)
It seems to me that the problems people are mentioning here are problems with how (some people) use WAR. I agree with a lot of those objections. What I don't agree with is when people (or perhaps just one person in this case) essentially say "I don't know why you'd want to do that anyway, I'll just make some adjustments in my head." Because with WAR, we can debate whether the defensive components are accurate, or whether the positional adjustments or replacement level are set to the right values. We can debate and discuss these things because by quantifying them we've made things specific, opened the process to scrutiny and (eventually) made the process verifiable (at least to a limited degree.) By discussing things this way our knowledge is enhanced, whereas ad-hoc personal adjustments can't be scrutinized in the same way.

Agreed. I'm not dismissing WAR at all. The people that annoy me the most are the ones who basically say, "Well, WAR clearly sucks, so I'm dismissing your entire post because you used it." WAR components are an interesting part of an argument, not the end of one.
   89. Harold can be a fun sponge Posted: September 07, 2011 at 02:37 AM (#3918285)
This is true, for the most part; it does rely a bit on the consistency of the hit-error determination of various official scorers over time, which may or may not introduce a variance on the order of, say, 10 hits over a Molitor-length career (and, if you go quite a bit further back in time than Molitor, it also relies on the accuracy of record keeping, which has not always been immaculate).

No, you're completely missing the point here. Molitor did get 3319 hits. That actually happened. The scorer called them hits, so they're hits.

There are no 74.8 wins anywhere. That's totally an estimate.
   90. Eric J can SABER all he wants to Posted: September 07, 2011 at 02:41 AM (#3918292)
No, you're completely missing the point here. Molitor did get 3319 hits. That actually happened. The scorer called them hits, so they're hits.

That's fair. At the same time, as Tom pointed out, it's fair to point out that 74.8 is the answer you get if you plug Molitor's career accomplishments into AROM's WAR calculator. Of course, they're not discrete events in the same sense that his hits are, but I don't really try to interpret them that way.

Anyway, I think I probably overextended myself a little, and will acknowledge that.
   91. Harold can be a fun sponge Posted: September 07, 2011 at 02:43 AM (#3918293)
But we don't really know when we have the *right* values, and what is *right* today might not have been *right* 25 or 30 years ago and may not be *right* 25 or 30 years from now.

And I'm saying that there is no such thing as a "right" value. You take your best guess based on what you're trying to do. Or if you create an uberstat, you take your best guess based on what you think its users will want to use it for.
   92. Fancy Pants Handles lap changes with class Posted: September 07, 2011 at 02:45 AM (#3918298)
There are no 74.8 wins anywhere. That's totally an estimate.

Only because you are interpreting 'wins' literally. If it were measured in 'value points', you could make the same argument that each of those 'value points' happened. The outs recorded/not recorded on defense for which the points are awarded also actually happened, and the scorer awarded the points. Same for baserunning and offense.
   93. Harold can be a fun sponge Posted: September 07, 2011 at 02:49 AM (#3918301)
That's fair. At the same time, as Tom pointed out, it's fair to point out that 74.8 is the answer you get if you plug Molitor's career accomplishments into AROM's WAR calculator. Of course, they're not discrete events in the same sense that his hits are, but I don't really try to interpret them that way.

Anyway, I think I probably overextended myself a little, and will acknowledge that.


Fair enough, and I get your point in the first paragraph; he really does have exactly 74.8 WAR. But those aren't actual wins; they're the estimated difference between two theoretical teams. His 3319 hits are actually 3319 hits. But, yeah, I think we're on the same page here.
   94. Harold can be a fun sponge Posted: September 07, 2011 at 02:52 AM (#3918306)
Only because you are interpreting 'wins' literally. If it were measured in 'value points', you could make the same argument that each of those 'value points' happened. The outs recorded/not recorded on defense for which the points are awarded also actually happened, and the scorer awarded the points. Same for baserunning and offense.

Or I could choose my own way to measure value points and decide that it's better. We could spend years arguing about value points, and each change each other's minds here and there (umm, did the scorers just award more value points?). It would be lots of fun. And we'd know that all we're talking about is value points, and that there are many reasonable ways to define them.

But when we say that Paul Molitor had 3319 hits, and argue that maybe 10 of them should have been errors, that's a totally different argument. That's all I was saying in that post.
   95. Brian White Posted: September 07, 2011 at 03:05 AM (#3918320)
Only because you are interpreting 'wins' literally. If it were measured in 'value points', you could make the same argument that each of those 'value points' happened.


But some of Molitor's 'value points' have to do with the setting of the level of a hypothetical minor leaguer at whatever position Molitor played, as well as a defensive measurement that requires input from all fielders in the league at that position that aren't Paul Molitor. He doesn't have input into all the things that make up his value points.

For every hit he records, Molitor at least had *something* to do with it. Maybe he only got a hit because the shortstop was out of position, or because a friendly scorer recorded a hit on a ball off someone's glove, but Molitor had a hand in every hit recorded. Because WAR brings in ideas like replacement level, average defense, park factors and whatnot, these points don't happen. They're just concepts.
   96. Tom Nawrocki Posted: September 07, 2011 at 03:27 AM (#3918344)
Sure, but it's just raw data. It's great if you like memorizing the backs of baseball cards, but if you want to make decisions or take actions based on the data, you're going to need to interpret it in some way.


People keep saying this as if it's obvious, but I'm still unclear on what kind of actions I can take if I boil every player's contribution down to a single number. WAR gets used a lot in MVP and Hall of Fame discussions, but if I were running a baseball team, how exactly would I use this? Greg in post 25 talks about the value of figuring out whether Yunel Escobar is having a better year than Paul Konerko, which might be a fun thing to talk about in a bar, but I don't see why an actual GM would care, much less make decisions or take actions based on that.
   97. Never Give an Inge (Dave) Posted: September 07, 2011 at 03:42 AM (#3918357)
People keep saying this as if it's obvious, but I'm still unclear on what kind of actions I can take if I boil every player's contribution down to a single number. WAR gets used a lot in MVP and Hall of Fame discussions, but if I were running a baseball team, how exactly would I use this? Greg in post 25 talks about the value of figuring out whether Yunel Escobar is having a better year than Paul Konerko, which might be a fun thing to talk about in a bar, but I don't see why an actual GM would care, much less make decisions or take actions based on that.

Well, I don't think WAR is the right number to use if you're a GM trying to make a trade decision. But the point is that there are interpretive stats for making such decisions; you can't just look at hits and stolen bases and RBIs. You have to rely on stats with some amount of imprecision or uncertainty.

Or perhaps you're trying to formulate a draft strategy and you want to look at whether certain positions have tended to be overvalued or undervalued in the draft over time. Or maybe you're trying to evaluate the long-term track records of your various scouts in a systematic manner. Certainly WAR isn't the only tool that can be used for such studies, and it may not be the best one, but it can be used in a way that raw numbers can't be.
   98. Fancy Pants Handles lap changes with class Posted: September 07, 2011 at 04:53 AM (#3918392)
Or I could choose my own way to measure value points and decide that it's better.

And you can do the same for hits.
   99. Harold can be a fun sponge Posted: September 07, 2011 at 07:08 AM (#3918415)
"Sure, but it's just raw data. It's great if you like memorizing the backs of baseball cards, but if you want to make decisions or take actions based on the data, you're going to need to interpret it in some way."


"People keep saying this as if it's obvious, but I'm still unclear on what kind of actions I can take if I boil every player's contribution down to a single number."

Right. I see the former post, and wonder if this person has ever had to make any real decisions; and if so, does he do it by reducing everything to a single number?

OK, there are times at work where we do have to reduce a set of variables into a single value (total cost of ownership for a technology buy, for example); but then *we use a framework/variables specific to that decision*. We don't blindly use an all-purpose framework for any real decision.
   100. Harold can be a fun sponge Posted: September 07, 2011 at 07:11 AM (#3918418)
Or I could choose my own way to measure value points and decide that it's better.

And you can do the same for hits.


Forget "hits", and let's go with "hits+ROE".

You want to make the same argument? "Value points" and "Hits+ROE" are not remotely comparable. I think it's reasonable for us to come to a number of different definitions of value points; there's no single right definition.