Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Tuesday, November 21, 2017

A Discussion of WAR Wherein I Ardently Attempt to Avoid any WAR-Related Puns | Sports-Reference.com

It’s really tough to have a debate on Twitter. Comments, posts, and replies get so disjointed. I’m not sure Bill is actually as “steadfast” in the absolute certainty of his position. Now, he may have been at the beginning of this debate but, as the discussion has evolved, his comments, IMO, have softened as he’s heard/read other people’s comments. Again, my opinion.

In any event, Sean makes his case.

Now through years of arguing and debate, I have come to the conclusion that these two differing approaches, when considering how to value a season already in the books, are largely a matter of taste and worldview. Looking forward, I think the latter is better, but in looking back I’m fine with a person taking either approach to evaluating the value a player added during a season. In interacting with Bill on Twitter, I believe that he’s steadfast that his view is correct and ours is wrong (or “nonsense”, “misleading”, “in error” to use his words). I disagree and believe each viewpoint has merit and is, on this issue, largely one of personal preference.

Jim Furtado Posted: November 21, 2017 at 01:37 PM | 73 comment(s)
  Tags: sabermetrics

Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   1. 6 - 4 - 3 Posted: November 21, 2017 at 03:55 PM (#5579658)
The real problem is that too many sabermetric consumers think that 8.3 WAR versus 8.1 WAR is definitive evidence that the first player was more valuable than the second. But failure by some (many?) to properly interpret the numbers isn't really related to whether the process to generate the numbers is valid or not.

Small nitpick: not sure that WPA is really a measure of "clutch" since that would seem to assume the opportunities are equally distributed, which they are not.
   2. snapper (history's 42nd greatest monster) Posted: November 21, 2017 at 04:02 PM (#5579662)
How do people pronounce "WAR" in their heads?

To me, the a has an ahh sound, like "wahr". I feel silly pronouncing it like the Vietnam War ("wore").
   3. 6 - 4 - 3 Posted: November 21, 2017 at 04:06 PM (#5579667)
I've always heard it pronounced "war," as in Vietnam War.
   4. Rally Posted: November 21, 2017 at 04:19 PM (#5579685)
I've always pronounced it like in the book "War and Peace". It's not serious enough that people are getting killed over it, but there sure have been a lot of battles.
   5. What did Billy Ripken have against ElRoy Face? Posted: November 21, 2017 at 04:27 PM (#5579693)
How do people pronounce "WAR" in their heads?

"Hwar." Just like "Cool Whip."
   6. Sean Forman Posted: November 21, 2017 at 04:31 PM (#5579700)
This is the serious discussion I was looking for when I posted this. :P
   7. jmurph Posted: November 21, 2017 at 04:49 PM (#5579732)
It's actually a silent A, bizarrely enough. And you pronounce the R like in the original French. (The French invented Baseball and War but you guys already knew that.)
   8. Pasta-diving Jeter (jmac66) Posted: November 21, 2017 at 04:50 PM (#5579734)
"Throat-Warbler Mangrove"
   9. BDC Posted: November 21, 2017 at 05:09 PM (#5579764)
Were I to vote, my criteria would be: "If I was a GM on April 1 and knew all of the players' performances for the season ahead, which player would I choose?"


Exactly. Who would be the first player to go in a completely open draft starting from zero, with complete foresight.
   10. Eric J can SABER all he wants to Posted: November 21, 2017 at 05:15 PM (#5579775)
How do people pronounce "WAR" in their heads?

To me, the a has an ahh sound, like "wahr". I feel silly pronouncing it like the Vietnam War ("wore").


This is what I've always done as well. If you're going to have two different meanings for a single spelling, it's best to pronounce them differently to avoid ambiguity.
   11. LA Podcasting Hombre of Anaheim Posted: November 21, 2017 at 05:21 PM (#5579783)
Pronunciation issues aside, I do think James has been unrelenting in his criticisms of WAR over the years, and I do think it has to do with the existence of his own baby, Win Shares. This isn't to say that everything James has said about WAR is wrong -- he's made some very insightful critiques with respect to WAR over the years -- but I've always felt that James brings a prejudice into each of these arguments. I sort of give him a pass because (1) he's Bill F. James, and (2) he's always been indelicate when it comes to things he disagrees with, but this isn't the first time he's gone off on WAR in ways that I don't think are entirely fair.
   12. Batman Posted: November 21, 2017 at 05:36 PM (#5579801)
I've always pronounced it like in the book "War and Peace".
война?
   13. BDC Posted: November 21, 2017 at 05:37 PM (#5579804)
the original French

It seemed to me there ought to be a French word oire, but there isn't. I did find the word hoir, the source (and synonym) of the English word "heir." Anyway, hoir sounds like how some here would pronounce WAR.
   14. Fred Garvin is dead to Mug Posted: November 21, 2017 at 06:05 PM (#5579819)
I’m still undecided on which approach I favor. On one hand, I tend to agree (or at least sympathize) with Sean’s view that:

Were I to vote, my criteria would be: "If I was a GM on April 1 and knew all of the players' performances for the season ahead, which player would I choose?"


What I would ask, though, is “wouldn’t some context be useful?” To give an example, suppose that in a context-neutral system, Judge rates as higher than (or at least close to) Altuve. In the vacuum WAR presents, one might take Judge — or at least view it as a toss up. OTOH, what if it could be shown that the context of Judge’s production largely occurred in low-leverage situations? Isn’t this good to know?

I suppose one might say this is irrelevant because it has not yet been shown this will repeat in the future, but I would say (1) it hasn’t been conclusively shown it will not repeat either; and (2) isn’t the whole point of this exercise to weigh the two based on what historically took place?
   15. Blanks for Nothing, Larvell Posted: November 21, 2017 at 06:25 PM (#5579830)
"Hwar." Just like "Cool Whip."


Was that the episode where Stewie also ridiculed Colin Farrell's wool cap?

I like that episode.
   16. Blanks for Nothing, Larvell Posted: November 21, 2017 at 06:27 PM (#5579834)
Exactly. Who would be the first player to go in a completely open draft starting from zero, with complete foresight.


Fair enough, but you wouldn't look to WAR to provide the answer to that question and you'd account for performance in high-leverage situations.
   17. snapper (history's 42nd greatest monster) Posted: November 21, 2017 at 07:17 PM (#5579848)
I sort of give him a pass because (1) he's Bill F. James,

It's funny, I became interested in sabermetrics completely without reading Bill James. I started with Thorn and Palmer in the 1980's, and then got back into it with Rob Neyer's column and some others. The only thing I've read of James' was his HoF book, which was OK, nothing great, IMHO.

So, I have no fond memories of James that prevent me from mostly finding him to be an old crank.

   18. Mike Webber Posted: November 21, 2017 at 07:24 PM (#5579851)
@6 Sean = one of Joe Poz's criticisms was that after all the math, you just divide by 10 to get the WAR.

1) Is that a valid complaint?
2) Is it really not adjusted on a yearly basis, and should it be?
3) If the answer is "it's basically always 10" what's the range?

   19. K-BAR, J-BAR (trhn) Posted: November 21, 2017 at 08:06 PM (#5579863)
This is the serious discussion I was looking for when I posted this. :P


If it makes you feel better, Sean, the dWAR thread on Hot Topics is a discussion of blended scotch and the Joe's Blog one is about Wilco.
   20. Jim Furtado Posted: November 21, 2017 at 08:12 PM (#5579866)
1) Is that a valid complaint?
2) Is it really not adjusted on a yearly basis, and should it be?
3) If the answer is "it's basically always 10" what's the range?

No. It's adjusted every year.

Runs to Wins
   21. Fancy Crazy Town Banana Pants Handle Posted: November 21, 2017 at 09:40 PM (#5579885)
3) If the answer is "it's basically always 10" what's the range?

Just reverse engineering the numbers from RAR and WAR, I get:
1968 was 9.74 runs per win.
2000 was 10.91 runs per win.

That will probably be about your effective range unless you want to go back to the deadball era.
   22. Mike Webber Posted: November 21, 2017 at 10:17 PM (#5579896)
Thanks for the answers Jim and FCTBPH

I thought it seemed like something so easy to do that just dividing by 10 had to be false.
   23. SoSH U at work Posted: November 21, 2017 at 11:13 PM (#5579908)
So, I have no fond memories of James that prevent me from mostly finding him to be an old crank.


I have tremendously fond memories of James. He influenced my views as a baseball fan more than anyone else.

That doesn't prevent me from seeing him as an old crank.
   24. Dr. Chaleeko Posted: November 21, 2017 at 11:15 PM (#5579909)
As for WPA, I've always pronounced that WooPah in my head.

@Mike Webber

If I'm remembering my runs-to-wins correctly, the trick isn't that the league has a converter but that each player has an individual converter. This means that in Roy Halladay's starts, a run is more valuable than in Jimmy Haynes' starts because Halladay reduces the run environment (aka: the context) whereas Haynes inflates it. I might have this wrong, but that's my interpretation.
   25. 6 - 4 - 3 Posted: November 21, 2017 at 11:29 PM (#5579915)
If I'm remembering my runs-to-wins correctly, the trick isn't that the league has a converter but that each player has an individual converter. This means that in Roy Halladay's starts, a run is more valuable than in Jimmy Haynes' starts because Halladay reduces the run environment (aka: the context) whereas Haynes inflates it. I might have this wrong, but that's my interpretation.

Wouldn't that mean that a HR hit in a Halladay game would have been worth more than a HR hit in a Haynes game? That also seems like it would double-count for pitchers.

It's an interesting idea, but I'm sure that's not how BBRef does it (see the article that Jim linked to earlier). And I'm pretty sure that FG doesn't do that either.
   26. Meatwad Posted: November 21, 2017 at 11:36 PM (#5579919)
21, 2017 at 08:06 PM (#5579863)
This is the serious discussion I was looking for when I posted this. :P


If it makes you feel better, Sean, the dWAR thread on Hot Topics is a discussion of blended scotch and the Joe's Blog one is about Wilco.

Lies! There was no Wilco discussion.
   27. Cooper Nielson Posted: November 22, 2017 at 02:23 AM (#5579972)
Were I to vote, my criteria would be: "If I was a GM on April 1 and knew all of the players' performances for the season ahead, which player would I choose?"

This sounds reasonable, but given that the MVP award is voted on after the season is over, it seems like you are missing out on some useful data/evidence if you ignore timing, game results, and team performance. (I'm assuming "players' performances" is context-free here -- looking at the year-end totals only. Is this a correct assumption?)

WAR does a great job identifying theoretical/potential value by separating a player from his context. It tells us what he should have done or deserved to do. Obviously we should not conclude that a guy who hits a leadoff triple and gets stranded at third by his last-place teammates did nothing. He certainly "helped his team try to win games." But after you know what actually happened, you can identify realized value rather than just potential value. If that triple happened with the bases loaded in the bottom of the 9th, down 3, and the player then scored on a wild pitch, I would submit that it was "more valuable" than the other triple even if that's not reflected in WAR.

So, for example, you could have two teammates with essentially the same totals of singles, doubles, home runs, walks, outs, etc., but one of them, through luck, utilization (lineup position) or happenstance had much higher R/RBI numbers or did more of his hitting in close games/clutch situations, or in team wins. Given the "April 1" criteria you would theoretically be indifferent between these two guys; they are interchangeable. Both have the same potential value to a generic opening-day roster. But for the purposes of MVP voting, we can examine realized value in addition to potential value. After the season has already happened, if Player A had, for example, 60 more R+RBI than Player B, or Player A had 10 walk-off hits and Player B had 2, wouldn't it make sense to give your MVP vote to Player A? Both have the same intrinsic value, but Player A got more mileage/value out of it. (To make this a more interesting argument, we can even assume that Player B had a marginally better WAR.)

I guess I am sympathetic to both viewpoints. Generally I agree that the MVP should go to the guy who had the best individual season, and WAR is as good a way as any to determine this. But I also think that a player -- for MVP purposes -- should get credited for things that are out of his control but that he nevertheless takes advantage of. For example, getting to bat with a lot of runners on base, or playing in meaningful September games.
   28. BDC Posted: November 22, 2017 at 08:16 AM (#5579985)
I think your approach fits with the "narrative tiebreaker" philosophy, Cooper, and I would be fine with that. In some years there are a bunch of guys near the top of the leaderboard and WAR doesn't meaningfully distinguish them. WAR makes "gross" assessments just fine most of the time, obviously. Joe Mauer had a WPA of 3 this year, and Aaron Judge was at 2; but it would be hard to defend putting Mauer above Judge on an MVP ballot, even if Mauer arguably made more of a clutch difference to the Twins getting over the top and into the playoffs than Judge to the Yankees. Judge (eight WAR to three-and-a-half for Mauer) clearly made a bigger difference in getting the Yankees to "good enough" in the first place.

That's all quite obvious, but it just underscores that most of this debate is about small differences and marginal or odd cases.
   29. fra paolo Posted: November 22, 2017 at 11:58 AM (#5580175)
James has been unrelenting in his criticisms of WAR over the years, and I do think it has to do with the existence of his own baby, Win Shares. This isn't to say that everything James has said about WAR is wrong -- he's made some very insightful critiques with respect to WAR over the years -- but I've always felt that James brings a prejudice into each of these arguments.

Of course he does.

He spent some time constructing an elaborate system of Win Shares which does something that doesn't happen in the dominant WAR systems, which is to insert an element of team context in assigning credit. WAR mutes that context considerably, but not totally (eg, groundball tendencies of pitching staffs for evaluating fielding).

Now, he could have created a system exactly like WAR, but he didn't. So we must think that his 'prejudice' is based on his understanding of how credit for baseball events should be assigned to individual players.

I don't understand why an apparent honest intellectual difference is characterised as a 'prejudice'.
   30. snapper (history's 42nd greatest monster) Posted: November 22, 2017 at 12:01 PM (#5580179)
I don't understand why an apparent honest intellectual difference is characterised as a 'prejudice'.

Because his criticisms of WAR do not acknowledge an "honest intellectual difference". He badmouths and mis-represents WAR.
   31. fra paolo Posted: November 22, 2017 at 12:22 PM (#5580197)
He badmouths and mis-represents WAR.

He's been like that all his life. Look at

a) his comments about 'acquired tastes' in the 1986 Abstract
b) various things he's said about Chuck Tanner over the years
c) the essay about Pete Palmer in Win Shares

I'm sure other people can volunteer more examples of this sort of behaviour.

But sabermetrics has been ever thus, as a familiarity with the works of people like mgl, Joe Sheehan and David Cameron will attest. The audience laps it up until it's turned on something they like.

James' vicious dissents are not based on prejudices but conclusions that are the result of him thinking and observing, even if he might change his mind later.
   32. Rally Posted: November 22, 2017 at 01:32 PM (#5580277)
a) his comments about 'acquired tastes' in the 1986 Abstract
b) various things he's said about Chuck Tanner over the years
c) the essay about Pete Palmer in Win Shares


A) don't remember that and will have to look it up.
B) He didn't like Tanner and did not hold back on him
C) If I'm remembering the essay, he made it clear that he liked Palmer, considered him a friend, and it wasn't personal before tearing into Pete's fielding system.

Fielding analysis has evolved quite a bit since Palmer and Thorn did Hidden Game of Baseball, but I think Palmer's system would hold up as a viable, rough draft system that usually gets things right if he had simply used balls in play as his denominator instead of innings played.
   33. snapper (history's 42nd greatest monster) Posted: November 22, 2017 at 01:35 PM (#5580279)
But sabermetrics has been ever thus, as a familiarity with the works of people like mgl, Joe Sheehan and David Cameron will attest. The audience laps it up until it's turned on something they like.

Well, I don't like those guys either.

James' vicious dissents are not based on prejudices but conclusions that are the result of him thinking and observing, even if he might change his mind later.

BS. Just because you think you're right doesn't give you the right to be vicious. Especially when it's equally likely you're wrong.
   34. Sean Forman Posted: November 22, 2017 at 01:36 PM (#5580280)
Mike, there is no division of RAR by a run to win value.

What we do is take the runs added and prevented and put that on an average team in the league context and compute their pythag (using a more complicated pythagenpat formula)


Version 2.1, the underlying system is that of runs added and subtracted changing the number of wins for a team. This is modeled very well by the PythagenPat formula cited earlier. To apply this here, we find the number of runs per game the player is better or worse than average. In this case, for Halladay, we have 59.7 runs in 32 games or 1.866 runs/game. So our exponent X = (53.6*.15447 - 1.866)^.285 = 1.698. And then we set RS=4.14 and RA=4.14-1.866 = 2.27 and plug into the PythagenPat formula, W-L% = .735. So an otherwise average team facing an average team should win 73% of the time with 2011 Halladay starting for them. Then to get wins above average: WAA = (.735-.500)*32 games = 7.52 WAA. We then do a similar calculation for a replacement player's runs allowed to get the difference between average and replacement level and add that here to get WAR. For Halladay's # of innings pitched and run-scoring environment this is 1.7 wins, so WAR = 1.7 + 7.52 = 9.22 WAR.

Halladay WAR: version 1.0, 8.57 WAR; version 2.0, 10.25 WAR; version 2.1, 9.22 WAR. As you can see this estimate makes a big difference, and with 2.1, our estimate is now very tight and technically correct.

A couple nice things about this new method. The W-L% is a display of WAR as a rate stat. Note this is just for games the player plays in, so a top starter will always have a higher W-L% than a top position player, but position players are in a lot more games. If you want the team's W-L% per 162-game season, add in enough .500 games to get the season total. One other nice thing is that it handles correctly the effect of the player's runs on run environment, and the differences between high and low run scoring environments. We've ditched the approximation for the real thing.

For position players, we add offensive runs to the team's runs scored and subtract fielding runs from the runs allowed, so an offensive player will affect both sides of that interaction.
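
To make the arithmetic in the quoted passage easier to check, here is a minimal Python sketch of the version 2.1 steps, using only the numbers quoted above (59.7 runs in 32 games, RS = 4.14, the .285 exponent, and 1.7 wins between average and replacement). The function and variable names are assumptions made for illustration, not Baseball-Reference's own code, and the replacement-level step is taken as given rather than recomputed.

```python
def pythagenpat_wpct(rs, ra, power=0.285):
    """Winning percentage from runs scored/allowed per game (PythagenPat)."""
    # The exponent depends on the total run environment of the game.
    x = (rs + ra) ** power
    return rs ** x / (rs ** x + ra ** x)


def wins_above_average(runs_above_avg, games, league_rpg):
    """Put a pitcher's runs saved onto an otherwise average team."""
    runs_per_game = runs_above_avg / games
    rs = league_rpg                  # average offense behind him
    ra = league_rpg - runs_per_game  # he lowers the runs allowed
    wpct = pythagenpat_wpct(rs, ra)
    return wpct, (wpct - 0.5) * games


# Inputs quoted in the passage above for 2011 Halladay.
wpct, waa = wins_above_average(runs_above_avg=59.7, games=32, league_rpg=4.14)

# The 1.7 wins between average and replacement is quoted, not derived here.
war = waa + 1.7

# Implied runs per win above average in this pitcher's own games.
implied_rpw = 59.7 / waa

print(f"W-L%: {wpct:.3f}  WAA: {waa:.2f}  WAR: {war:.2f}  runs/win: {implied_rpw:.2f}")
```

Run as written, this lands at roughly a .735 W-L% and about 7.5 WAA, in line with the quoted figures up to rounding of intermediate values. Dividing the 59.7 runs by that WAA implies about 8 runs per win in Halladay's starts, lower than the roughly 10 discussed upthread, which is the individual-converter effect described in #24.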
   35. fra paolo Posted: November 22, 2017 at 02:14 PM (#5580303)
In keeping with my comment about James' lifetime tendency towards badmouthing, I just found this
Finally, my biggest complaint is probably tone...some passages have (IMHO) an unnecessarily nasty and off-putting tone, e.g., the comments re H. Greenberg, M. Wills, J. Morgan, etc. I don't necessarily disagree with James's views in some cases, but the way he discussed some of these issues seemed unfair. E.g., the discussion of J. Morgan's comments about the ESPN commercial. He's got a point, but couldn't he have just mentioned the incident and said: "Dear Joe -- it's a commercial. It's not real life. Don't worry about it?" He goes out of his way to really slam Morgan, which seemed pointless and beneath him.

here.
   36. Greg K Posted: November 22, 2017 at 02:21 PM (#5580305)
That's interesting reading. The 2001 edition of the Historical Baseball Abstract may have had a deeper influence on me than any other book, and is largely responsible for getting me back into baseball in my early 20s after taking a leave of absence as a teenager.

The immediate leap into the negative reviews there is jarring!
   37. Morty Causa Posted: November 22, 2017 at 03:26 PM (#5580337)
Bill James does not have the scholar's dispassionate, objective mindset. He's a polemicist. He began as one, was one most of his career, and probably only sets it aside when he feels it conflicts with his role on the Red Sox. But, the mindset is still there.
   38. K-BAR, J-BAR (trhn) Posted: November 22, 2017 at 03:34 PM (#5580342)
And I'm pretty sure that FG doesn't do that either.


Fangraphs definitely does (from their WAR for Pitchers):

Dynamic Runs Per Win

As noted above, pitchers directly influence their run environment based on how well they pitch. As a result, we don’t simply use the league average Runs Per Win (RPW) value, we use a dynamic equation that creates a unique RPW value based on the pitcher’s innings per game and pFIPR9.


I wrote a long Comment here a while back arguing it's double counting. It might be unfair, as these things tend to break down at the extremes, but I think I found that a 32 GS CG SHO pitcher with a 0.00 ERA in an otherwise neutral environment had more wins above replacement than games started. The reason I looked into it at all was it seemed odd that WAR would say a SP should win the MVP every year.
   39. Dr. Chaleeko Posted: November 22, 2017 at 05:24 PM (#5580399)
Re #25

Yes, I think the implication is that a homer is worth more in a Halladay start than a Haynes start. Not that it is literally worth more runs (in case that’s what you were getting at), but that the homer against Halladay has more impact on winning against his team because he lowers the run environment just by being amazing.

As someone who plays along at home, I’ve found both BBREF and FG’s explanation pages hard to follow when working through WAR on my own. The page Sean cited required especially close reading. Not trying to be “vicious” as noted above for James, more like a little user feedback. Sean’s page seems a lot more helpful than FG’s did when I last looked at the latter several years ago (don’t know if it has changed but back then it felt like a lot of background and mechanical info was assumed or simply missing). But, hey, this is complicated stuff, and Bill needed a huge chunk of his Win Shares book to explain his system.
   40. BDC Posted: November 22, 2017 at 05:43 PM (#5580408)
The 2001 edition of the Historical Baseball Abstract may have had a deeper influence on me than any other book

For many years, I read a book during lunch, and for many of those years, it was one of the Historical Abstracts, or The Politics of Glory, or Win Shares. I read those books (at least the prose parts) over and over. Bill James is a terrific writer and rhetorician, whatever people say about him, and he has been a major influence on my writing if not my often-vapid thinking.
   41. Fred Garvin is dead to Mug Posted: November 22, 2017 at 06:39 PM (#5580428)
He badmouths and mis-represents WAR.

Funny, but that’s my main gripe about the BPro essay on the subject — that it misrepresents James’s argument. (That and the fact it was hard for me to understand and seemed more focused on self-promoting a proprietary stat.)

The thing is that both sides have dogs in the fight. Bill James doesn’t just favor a context-based approach that matches up with actual wins, but he created a system based on it. Sean Forman and the folks at Fangraphs and BPro did the same thing based on their context-neutral approach.
   42. Walt Davis Posted: November 22, 2017 at 07:29 PM (#5580448)
what if it could be shown that the context of Judge’s production largely occurred in low-leverage situations?

So when instead of Judge I draft 2017 Altuve on April 1, 2017 ... I somehow have to make sure the rest of my team provides him with the exact same number and type of opportunities in the same order with the same game score and equal pitcher/defense because otherwise I don't know how he'll hit in the unique/different contexts he would be presented with if we replayed 2017?

This nicely exposes the paucity of the "context" argument. Even as a thought experiment, there's no way to recreate the context of any player's season-long performance.

If that triple happened with the bases loaded in the bottom of the 9th, down 3, and the player then scored on a wild pitch, I would submit that it was "more valuable" than the other triple even if that's not reflected in WAR.

It's potential vs. kinetic energy. It's not a question of which event produced the better outcome, we all agree a triple with the bases loaded is a better team outcome than a triple with the bases empty (in the same/similar run context). The question is who deserves credit for that better outcome. The triple is "meaningless" without 3 previous batters reaching base safely; the 3 batters who reached base safely are "meaningless" without the triple (or some positive event). None of them deserve individual credit for the context they all created.

One firecracker makes a nice big noise; another is a dud. Why give credit/blame to the guy who lit the fuse?

There is almost no value in baseball that is produced by an individual, especially on offense (pitchers are weird). Value is produced by the interaction (or sequencing) of events and any individual batter has (partial) control over only one event in that sequence. The final realized value of the sequence is shared among those involved. And all a batter can do to maximize the value of a sequence he's in is to maximize his individual outcome. That's true regardless of what the sequence is.

If you really want to adjust for context then you need something like WPA/LI or "how did the actual change in WPA compare with the average or maximum possible WPA?"

In statistics, in a multi-variable regression model, the model estimates the independent (or conditional) effect of each variable on the outcome. However, this is in recognition that much of the variance explained is due to the covariance among variables.
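
The WPA/LI idea mentioned above can be sketched in a few lines of Python: divide each plate appearance's win probability added by the leverage index of the situation, then sum. The two hitters and all of their numbers below are hypothetical, chosen only to show how the same hitting can produce very different raw WPA; the class and function names are likewise assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class PlateAppearance:
    wpa: float  # change in team win probability from this plate appearance
    li: float   # leverage index of the situation (1.0 is average)


def total_wpa(pas):
    """Raw win probability added: rewards hitting in big spots."""
    return sum(p.wpa for p in pas)


def context_neutral_wins(pas):
    """WPA/LI: each event is scaled back to average leverage before summing."""
    return sum(p.wpa / p.li for p in pas)


# Hypothetical hitters with identical events in different situations.
low_leverage_hitter = [PlateAppearance(0.02, 0.5), PlateAppearance(0.03, 0.5)]
high_leverage_hitter = [PlateAppearance(0.08, 2.0), PlateAppearance(0.12, 2.0)]

for name, pas in [("low", low_leverage_hitter), ("high", high_leverage_hitter)]:
    print(name, round(total_wpa(pas), 3), round(context_neutral_wins(pas), 3))
```

In this made-up case the high-leverage hitter has four times the raw WPA, but the two come out the same once leverage is divided out, which is the distinction between credit for the outcome and credit for the opportunity.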
   43. cardsfanboy Posted: November 22, 2017 at 08:35 PM (#5580466)
 9. BDC Posted: November 21, 2017 at 05:09 PM (#5579764)
Were I to vote, my criteria would be: "If I was a GM on April 1 and knew all of the players' performances for the season ahead, which player would I choose?"


Exactly. Who would be the first player to go in a completely open draft starting from zero, with complete foresight.


That is the way I look at the MVP also. (Ignoring salary and any future contract issues) MVP is the individual who helps create the most wins for any team he plays on. I get other arguments about the spot on a particular team (in the past, the Cardinals argued that Jose Oquendo was the MVP because of his versatility, and other players got bonus points for willingness to change positions, which they should---Miguel Cabrera is the most recent excellent example; even if their defensive performance was lacking, it was about the combination of the player and the team needs; the same could be said about Biggio and Musial to a lesser extent also).
   44. cardsfanboy Posted: November 22, 2017 at 08:41 PM (#5580468)
Pronunciation issues aside, I do think James has been unrelenting in his criticisms of WAR over the years, and I do think it has to do with the existence of his own baby, Win Shares. This isn't to say that everything James has said about WAR is wrong -- he's made some very insightful critiques with respect to WAR over the years -- but I've always felt that James brings a prejudice into each of these arguments. I sort of give him a pass because (1) he's Bill F. James, and (2) he's always been indelicate when it comes to things he disagrees with, but this isn't the first time he's gone off on WAR in ways that I don't think are entirely fair.


I work retail, so I have been kinda busy the past week or two, so I've missed a lot of this discussion on WAR that has exploded the past few days, but I agree with the comment posted above (even if I don't know the full scope of the conversation going on right now). I like the concept of Win Shares myself, and think it gets a bit of a short shrift among saber circles, but at the same time, I do think James has a bit of a stick up his butt on WAR because he did so much work on Win Shares and it wasn't embraced.
   45. Rob_Wood Posted: November 22, 2017 at 08:45 PM (#5580471)
I think Walt has hit the nail on the head. However, the answer to this conundrum is not necessarily context-neutral stats.

Suppose, in an alternate universe, our standard stats kept track of each hitter's performance in each of the 8 base situations (bases empty, runner on first, ... , bases loaded). Then our WAR-type stats could take the base situation into account and, presumably, be more "accurate" in valuing a hitter's overall contribution to his team scoring runs, leaving aside for the moment the conversion of runs into wins.

That is, there is a trade-off between the abstraction built into the gathering of the statistical data and the accuracy of the end-resulting WAR-type stats built from those statistical abstractions. This point has been made before and by others in our recent discussions so I claim no originality here.

So, it appears to me, the context-neutral vs. context-adjusted debate is more properly considered to be at what level of context are we most comfortable with. I believe that Kiko Sakata has recently developed a play-by-play based stat that does essentially what Walt "recommends" above to derive the ultimate context-adjusted stat. Kiko's stat requires a ton of detailed data and a complex statistical framework.

I wish I knew a good idiom for this. Something like a good carpenter has many different tools and knows which tool to use for each job. WAR is undoubtedly the proper tool for some jobs and other tools (stats) are undoubtedly better for other jobs.
   46. cardsfanboy Posted: November 22, 2017 at 08:54 PM (#5580473)
For many years, I read a book during lunch, and for many of those years, it was one of the Historical Abstracts, or The Politics of Glory, or Win Shares. I read those books (at least the prose parts) over and over. Bill James is a terrific writer and rhetorician, whatever people say about him, and he has been a major influence on my writing if not my often-vapid thinking.


Agree, that historical abstract was one of the greatest gifts from my gf and it had a sacred place in my bathroom for years (a player's page was perfect bathroom reading).

At one point in time I found a bookstore that had them on clearance for 4.95(or something like that) and I bought about 8 of them and handed them out to friends. It's one of the best books I've ever read, he is a terrific writer, and I love the way his brain works. Same thing with his Politics of Glory, great book and again it doesn't just talk to you, it explains the thought process involved in what he was saying... (same thing with the Win Shares book) so even if you are coming into his books completely blind, he "teaches" you what he is thinking.
   47. cardsfanboy Posted: November 22, 2017 at 09:06 PM (#5580475)
The thing to me about the context debate is that it's often used in MVP discussions, when the numbers WPA produces are based upon league average, and we are discussing league-elite hitters. Let's say you have two hitters for consideration of the MVP, and everything more or less looks similar, but then you see that player A has a larger WPA (or whatever you are using) vs player B.... Instinctively you might use that as a tie breaker (and I'm not going to really say you are wrong there), but there is still more to the discussion than just WPA. You could have two guys who under every situation will perform exactly the same, and being elite hitters, they will both do well above average, but one guy just gets more chances because of his team, so he is putting up a 5 WPA, while his competition is putting up a 3 WPA.


I get giving the guy who did more actual performance the advantage, but at the same time, it's not like he actually did outperform the other guy, he just got lucky by team selection. Going from BDC criteria in post 9, if you swapped the two players you wouldn't have lost anything in the w/l column.
   48. cardsfanboy Posted: November 22, 2017 at 09:10 PM (#5580476)
I am fully on board with criticizing WAR. I've had my share of beefs with it (some of which I don't think are resolvable), but that still doesn't negate its value, or the reason why it's rightfully the first stat that people go to when it comes to starting the MVP debate.


bb-ref clearly states on their website that it's not accurate to the decimal place, and just because we are who we are, we often accept the accuracy to the decimal place when we shouldn't. (I haven't looked at it recently, but originally the Ryan Braun vs Kemp MVP was within that margin of error, yet still some people had issues with Braun winning it over Kemp who had the higher war by a few decimal places---looking at it now, and of course two pitchers were ahead of them--Cliff Lee and Roy Halladay)
   49. GuyM Posted: November 23, 2017 at 09:20 AM (#5580540)
The “wins matter” vs “context-neutral” debate is certainly interesting, and obviously elicits great passion in many quarters. But as a practical matter, it’s much ado about relatively little. Even if you agree that team wins matter, I don't think incorporating them into WAR is going to change individual player assessments very much.

James says the penalty when a team underperforms its pythag record should be proportional to the player’s WAR (lowering Judge's WAR by a substantial 1.3 wins). But why? If we are going to distribute blame for a team’s bad luck in the arrangement of its RS and RA (or bad timing, or whatever you prefer to call it), why should the best players receive nearly all of it while poor performers receive little or none? Bill doesn’t say, and it would be great to hear his argument. But it seems unlikely this is the right answer.

First, it’s hard to imagine that only a team’s “above replacement” runs come at inopportune times. All of NYY’s 858 runs had the potential to impact game outcomes. In actual games we can’t identify the “above replacement” runs, there are just runs. Example: Matt Holliday had 427 PA but zero WAR. James' adjustment says that Holliday gets no adjustment at all (0*.16=0). So even though Holliday made outs, got hits, ran the bases, even played a little in the field, Bill is saying that Holliday—by definition—could not have played *any* role in NYY underperforming its pythag W-L. That can’t generally be true (it might be true for Holliday, I have no idea) -- it’s certainly possible that the timing of Holliday’s outcomes (both hits and outs) was unusually bad. So even if you think the win penalty should be proportional to a player’s production, it should be tied to runs created (RC) not runs above replacement (RAR). In Judge’s case, he created 17% of the team’s runs. Using B-Ref, NYY had 52 WAR + 48 replacement wins = 100 theoretical wins -- but really won 91 -- so NYY got 9 undeserved WAR. We penalize the offense for 60%, or -5.4 wins, so Judge’s share is .17*-5.4= -0.9 wins, or about 30% less than James' estimate.

But as the Holliday example also illustrates, it’s not obvious that a player’s impact on a team’s win deficit is proportional to his RC. When a team underperformed, its positive outcomes (H, BB) were less valuable than usual. But it will also be true that its negative outcomes (outs) were *more damaging* than usual. And therefore, weak hitters can have a big impact too: they may come up more often in important situations, and/or perform even worse than usual at those times. Again, I’d like to hear James' argument, but it’s hard to see why only good outcomes – and thus good players – should see their value changed when we link to team wins.

Instead, our starting assumption should be that players are debited/credited based on playing time – their “footprint” on the game -- not their productivity. For Judge: position players account for 60% of wins, and Judge had 11% of NYY PAs, so his penalty is 9*.6*.11 = 0.6 wins, about half of James' estimate. That reduces Judge's WAR from 8.1 to 7.5 -- not a trivial change, to be sure, but also not a huge deal (and 2017 NYY is about as big a pythag departure as we ever see).
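
For anyone who wants to play along, here is a minimal sketch of the three allocation rules compared above, using only the figures quoted in this post (9-win deficit, 60% assigned to position players, Judge at 8.1 WAR with 17% of runs created and 11% of PAs, and the 0.16 factor from the Holliday example). The function and constant names are made up for illustration.

```python
TEAM_WIN_DEFICIT = 9.0        # theoretical wins (100) minus actual wins (91)
POSITION_PLAYER_SHARE = 0.6   # share of the deficit assigned to the offense
WAR_PENALTY_RATE = 0.16       # per-WAR factor used in the Holliday example


def penalty_by_war(war, rate=WAR_PENALTY_RATE):
    """Penalty proportional to the player's own WAR."""
    return war * rate


def penalty_by_share(share, deficit=TEAM_WIN_DEFICIT,
                     group_share=POSITION_PLAYER_SHARE):
    """Penalty proportional to the player's share of a team total.

    share can be a share of runs created or a share of plate appearances.
    """
    return deficit * group_share * share


judge_war = 8.1
by_war = penalty_by_war(judge_war)    # WAR-proportional rule
by_rc = penalty_by_share(0.17)        # runs-created rule
by_pa = penalty_by_share(0.11)        # playing-time rule
holliday = penalty_by_war(0.0)        # a 0 WAR player under the first rule

print(f"WAR-proportional: {by_war:.1f}, RC share: {by_rc:.1f}, "
      f"PA share: {by_pa:.1f}, Holliday: {holliday:.1f}")
```

The only thing that changes between the last two rules is which share you plug in, which is why the choice of "footprint" ends up driving the size of the adjustment.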
   50. PreservedFish Posted: November 23, 2017 at 10:13 AM (#5580549)
GuyM, clearly you've thought about the math more deeply than James did. I would be surprised if he thought through all of the ramifications of his suggestion, for example, that a 0 WAR player would receive no penalty.
   51. shoewizard Posted: November 23, 2017 at 10:35 AM (#5580556)
One firecracker makes a nice big noise; another is a dud. Why give credit/blame to the guy who lit the fuse?


Love this.
   52. PreservedFish Posted: November 23, 2017 at 10:55 AM (#5580562)
One firecracker makes a nice big noise; another is a dud. Why give credit/blame to the guy who lit the fuse?


It's a very bad analogy, though. It's only relevant if we are proposing that we care about value with perfect hindsight (ie a single that is followed by a homerun is more valuable than a single that is followed by a GDP) - and in these various threads and articles so far on the topic I have only ever seen that mooted as a strawman.
   53. Blanks for Nothing, Larvell Posted: November 23, 2017 at 12:02 PM (#5580577)
The question is who deserves credit for that better outcome. The triple is "meaningless" without 3 previous batters reaching base safely; the 3 batters who reached base safely are "meaningless" without the triple (or some positive event). None of them deserve individual credit for the context they all created.


Actually, the question is whether allocating individual "credit" for things that happen under the rules of baseball is even possible and as your second sentence notes, it isn't. The contextual states an individual hitter faces are definitionally the results of other players' successes or failures. (With I guess some marginal exceptions, like leading off a game.)

What this means among other things, of course, is that any argument that a piece of data that measures things that happened, like RBIs can't fairly be attacked for improperly allocating "credit."
   54. BDC Posted: November 23, 2017 at 12:45 PM (#5580589)
any argument that a piece of data that measures things that happened, like RBIs can't fairly be attacked for improperly allocating "credit."

But this also seems to me a straw man. Nobody's attacking RBIs. As I've always understood it, the usual argument is that a better hitter in a weak context would have had more RBIs than a lesser hitter in a good context, if you switched contexts. More RBIs would be a great thing.

Willie Mays had 112 RBIs in 1965 and Deron Johnson had 130; Mays out-hit Johnson by 30 points and out-slugged him by 130. It is reasonable to assume that if Mays had spent 1965 batting behind Tommy Harper, Pete Rose, Vada Pinson, and Frank Robinson, he'd have had more than 130 RBIs, and that if Johnson had spent it batting behind Dick Schofield and Jesus Alou, he would have had fewer than 112. That seems uncontroversial to me, and not at all an "attack on RBIs."
   55. cardsfanboy Posted: November 23, 2017 at 01:07 PM (#5580594)

What this means among other things, of course, is that any argument that a piece of data that measures things that happened, like RBIs can't fairly be attacked for improperly allocating "credit."


The issue with rbi's is that it doesn't have a rate component. So we don't have a clue if a guy is good or bad at the skill, with homeruns or any other hitting event we have an easy to identify rate component, with rbi we do not. A guy putting up 100 rbi over the course of a full season, might be a guy having a good year offensively or might not be, we just do not know anything from the stat rbi.
   56. fra paolo Posted: November 23, 2017 at 01:52 PM (#5580598)
Walt Davis:
This nicely exposes the paucity of the "context" argument. Even as a thought experiment, there's no way to recreate the context of any player's season-long performance....

There is almost no value in baseball that is produced by an individual, especially on offense (pitches are weird). Value is produced by the interaction (or sequencing) of events and any individual batter has (partial) control over only one event in that sequence. The final realized value of the sequence is shared among those involved.


Rob Wood:
I think Walt has hit the nail on the head. However, the answer to this conundrum is not necessarily context-neutral stats....

So, it appears to me, the context-neutral vs. context-adjusted debate is more properly considered to be at what level of context are we most comfortable with.


Blanks:
Actually, the question is whether allocating individual "credit" for things that happen under the rules of baseball is even possible and as your second sentence notes, it isn't. The contextual states an individual hitter faces are definitionally the results of other players' successes or failures

I have mentioned before that the supporters of context-neutral approaches basically throw up their hands at the prospect of incorporating context because it's impossible to account for everything. They argue to the effect that 'either we include all context or we try to reduce context as far as possible'.

But has anyone on the opposite side of the debate taken the position that we have to pin a number on everything? James argued that Judge wasn't nearly as valuable as WAR implied, because WAR gives numbers closer to Pythagorean wins than real wins. I've argued that we shouldn't eliminate the team context from allocating credit. SBB here implies that we should accept that all our numbers -- traditional or modern -- are imperfect, and a measure of judgment is required when analysing them.

Contextual advocates simply are not interested in the context-neutralist's interpretation of what is to be done regarding context. The numbers are only part of the process in deciding who was the MVP or who should be signed and for how much or whatever question one wants to answer. In the end, that's a position probably almost everyone on both sides agrees with.

GuyM:
Instead, our starting assumption should be that players are debited/credited based on playing time – their “footprint” on the game -- not their productivity. For Judge: position players account for 60% of wins, and Judge had 11% of NYY PAs, so his penalty is 9*.6*.11 = 0.6 wins, about half of James' estimate. That reduces Judge's WAR from 8.1 to 7.5 -- not a trivial change, to be sure, but also not a huge deal

James may have overstated it, but for my part I have often found that to be the case with sabermetric numbers -- that while not trivial, they are often exaggerated in their impact. IIRC, in years gone by at Primer park factors were given greater rhetorical significance than their practical effects. (The impact of groundball staffs on fielders' value is another area where I seem to recall something similar.)
   57. Fred Garvin is dead to Mug Posted: November 23, 2017 at 02:01 PM (#5580599)
So when instead of Judge I draft 2017 Altuve on April 1, 2017 ... I somehow have to make sure the rest of my team provides him with the exact same number and type of opportunities in the same order with the same game score and equal pitcher/defense because otherwise I don't know how he'll hit in the unique/different contexts he would be presented with if we replayed 2017?

No, not at all. All I’m saying is that if Sean views the MVP ballot in terms of drafting players on April 1 with full knowledge of what they will produce, I’m thinking the purpose of having such a draft is to see what teams would win the most games -- and, to that end, it would help to know that most of Judge’s performance took place in lower leverage situations, because although he may have a similar amount of production as Altuve, it didn’t actually help win as many games.

On a related note, suppose that in Mike Trout’s curtailed year, he accumulated the same WAR as both Judge and Altuve. Where would he fit in the mix? Should it make a difference that he amassed his entire production in a short time span? What if Mike Stanton hit all his 59 HRs in one game (presumably the Marlins won 375-6), then he missed the rest of the season and didn’t help the Marlins at all. Should that matter?
   58. Fred Garvin is dead to Mug Posted: November 23, 2017 at 02:12 PM (#5580600)
GuyM (#49): James says the penalty when a team underperforms its pythag record should be proportional to the player’s WAR (lowering Judge's WAR by a substantial 1.3 wins). But why? If we are going to distribute blame for a team’s bad luck in the arrangement of its RS and RA (or bad timing, or whatever you prefer to call it), why should the best players receive nearly all of it while poor performers receive little or none? Bill doesn’t say, and it would be great to hear his argument. But it seems unlikely this is the right answer.

James did say. Rather than just docking all the good players (as you suggest), James began from the premise that if the Yankees won 10% fewer games than they should, perhaps there should be a 10% reduction across the board. He then argues that this is at least justified in Judge’s case, because to the extent differences in actual/projected performance can be attributed to losing in high leverage situations, Judge was especially poor in this respect.
   59. GuyM Posted: November 23, 2017 at 02:52 PM (#5580608)
James did say. Rather than just docking all the good players (as you suggest), James began from the premise that if the Yankees won 10% fewer games than they should, perhaps there should be a 10% reduction across the board.

What you call "across the board" does in fact dock good players more. If you design the penalty as a percentage of each player's WAR -- as James does -- then a high-WAR player like Judge gets a big penalty (1.3 wins) while a low-WAR player like Holliday does not (zero penalty). What James did not say is *why* he thinks this is the right way to distribute the penalty among Yankee players (probably because he just assumed it).

He then argues that this is at least justified in Judge’s case, because to the extent differences in actual/projected performance can be attributed to losing in high leverage situations, Judge was especially poor in this respect.

We really don't know if Judge was "especially poor" at turning runs into wins. James looks at only one piece of the puzzle: how Judge hit with men on base and in high-leverage PA (poorly). That only looks at Judge's hitting from the RBI perspective. Judge led the league in both BB and Runs Scored -- did he perhaps do a good job of getting on base when it counted most, i.e. before other players got hits? James doesn't say (and probably doesn't know). By the same logic, when Judge hit poorly in high-leverage situations, half of the blame must go to the base runners who often got on base at the wrong time (i.e. when Judge failed to hit). When teams succeed or fail to cluster their offensive successes efficiently, there is no obvious reason to credit mainly those who hit later rather than earlier in the inning.

Focusing on leverage considers only what was known about the game when Judge hit, while ignoring lots of other information we now have. Consider Judge’s performance on April 17, when NYY won a game 7-4 and Judge went 2-4 with 1 HR and 3 RBIs. His WPA was just .008, because his RBIs came when NYY was already leading 4-0 and 5-0 and leverage was very low. Had Judge waited until the score was 4-4, and leverage was higher, WPA would have deemed him the hero. We now know that Judge produced NY's last 3 runs -- the winning runs! -- but leverage/WPA didn't know that. Eleven days later, NYY won 14-11 and Judge was 2-for-4 with a BB, 2 HR, and 3 RBI. His WPA was just 0.04, 5th among hitters and far behind a backup catcher who walked in his only PA. Judge’s failing here was the opposite: he hit his HRs when NYY was trailing 4-0 and 9-2. Leverage tells us his HR were virtually worthless, because his team still had no chance to win. But with hindsight, we know that his team *did* win, and Judge made an important contribution. The value of his performance now cannot possibly depend on the inning in which he delivered it.

And then there are the 400 or so balls hit to RF when Judge was in the field -- did he convert those to outs at particularly good or bad times? What about the hundreds of throws he made? His baserunning? All of these may have helped or hurt the team turn runs into wins. James has only scratched the surface....
   60. Blanks for Nothing, Larvell Posted: November 23, 2017 at 03:27 PM (#5580611)
What you call "across the board" does in fact dock good players more.


Because they benefit the most by the overage.

Think of WAR as not really real, but instead provisional -- the equivalent of going into escrow. They aren't really getting "docked" because the overage was never real to begin with. If our employer accidentally sends us a paycheck that's too high, we don't really get "docked" the money, because it was never ours to begin with. Payroll gave the 2017 Yankees too much money.
   61. cercopithecus aethiops Posted: November 24, 2017 at 09:37 AM (#5580670)
Because they benefit the most by the overage.


That doesn't mean that James' adjustment for the overage is mathematically correct. The highest WAR players would still get the biggest downward adjustment if one used Guy's approach.

I also don't get the argument that Judge's own clutch stats in 2017 somehow prove that this particular adjustment is appropriate. What about a player who performs well with RISP and in close/late situations, but whose team still underperforms?
   62. Dr. Chaleeko Posted: November 24, 2017 at 10:30 AM (#5580684)
then there are the 400 or so balls hit to RF when Judge was in the field -- did he convert those to outs at particularly good or bad times? What about the hundreds of throws he made? His baserunning? All of these may have helped or hurt the team turn runs into wins. James has only scratched the surface....

Completely agree. James ignored this entirely. He, in fact, based nearly all of his examples on batting average. Which is very 1980s. He barely, if at all, talked about the kinds of hits, other on-base events, or baserunning. Let alone fielding! In fact, he might have been better off comparing this MVP race to the 1987 MVP races, where the power versus everything-else question was even more prominent. Just makes me think that James hadn’t thought this all the way through or, if so, was using some mental shorthand to talk about it, which weakened his examples. He claimed a level of certainty in his thinking about Judge’s in-context performance that wasn’t fully supported by the incomplete examples. And the context stats on BBREF don’t include fielding either, so they don’t go far enough either. I think Kiko’s do all of this, but, no offense to him, they need more intuitive and expressive labels to make it clear what they are capturing, how they differ, and how their interactions work. Especially because they are not intuitive (at least not yet) and are not easily calculable by someone playing along at home.
   63. Rally Posted: November 25, 2017 at 09:28 AM (#5580913)
No, not at all. All I’m saying is that if Sean views the MVP ballot in terms of drafting players on April 1 with full knowledge of what they will produce, I’m thinking the purpose of having such a draft is to see what teams would win the most games -- and, to that end, it would help to know that most of Judge’s performance took place in lower leverage situations, because although he may have a similar amount of production as Altuve, it didn’t actually help win as many games.


I don't think we can assume that Judge, if drafted by a different team on 4/1, would have the same pattern of hitting well in low leverage and hitting poorly in high leverage. We just don't know the causes of why Judge did so. If it is random luck, then his pattern would be totally different if playing a different schedule. Perhaps Judge was more "in the zone" some days, which just happened to coincide with Yankee blowouts, and he was in a funk other days when the Yankees were playing close games. Maybe his state of mind and body is just related to the calendar, so if he had been on the Angels on a given great day for him, the Angels might have been playing a 1 run game while the Yankees and Kole Calhoun were blowing somebody out.

You'd have to be convinced that under/overperforming in high leverage situations is an inherent skill/deficiency of the player to believe that his leverage pattern would transfer along with the homers, walks, and strikeouts when you move him into a new situation.
   64. Baldrick Posted: November 25, 2017 at 10:22 AM (#5580924)
Ultimately, for me, I'm just not very interested in pegging the assessment of how 'good' a season was to whether the team won games. It's fine if other people are interested in that, but it doesn't seem particularly instructive for comparing quality of players, determining historical value, etc. (i.e. - the stuff that we spend the most time discussing on a site like this). Sure, overperformance can generate more wins, and that's great. But it's already rewarded...by more wins.

Let's say that Whitey Ford's performances happened to come on teams that overperformed their pythag and Warren Spahn's performances didn't. I have a hard time understanding why I would care at the time. I certainly can't think of why I would care in retrospect.
   65. fra paolo Posted: November 25, 2017 at 11:56 AM (#5580938)
You'd have to be convinced that under/overperforming in high leverage situations is an inherent skill/deficiency of the player to believe that his leverage pattern would transfer along with the homers, walks, and strikeouts when you move him into a new situation.

People are indeed convinced of that, and that's representative of the big problem in this conversation.

Listening to Dave Cameron reiterate his case on the Effectively Wild podcast earlier this week highlighted his complete lack of interest in what those people have to say. Which, you know, fair enough.

But it puts us at an impasse in regard to this particular conversation, because Cameron (and others) who didn't like what James had to say have basically argued that either we try to put all the context in or take all of it out.

Neither of those goals is achievable. In my opinion. (Cameron seems to think StatCast will make some difference, and perhaps he's right.)

In fact, the WAR that we have has completely failed to take all the context out. The problem with the Cameron position is that the 'logical conclusion' he assigns to the 'Contextualists' is one that he doesn't demand of his 'Neutralist' position.

James is arguing that some contexts are important to leave in. Cameron correctly raises the question 'Which ones?', but then refuses to answer it on the grounds that 'Putting them all in is too hard, and in any case more people use MY WAR rather than YOUR Win Shares.'

It's clearly a question he doesn't want to discuss.
   66. GuyM Posted: November 25, 2017 at 01:36 PM (#5580958)
James is arguing that some contexts are important to leave in. Cameron correctly raises the question 'Which ones?', but then refuses to answer it on the grounds that 'Putting them all in is too hard, and in any case more people use MY WAR rather than YOUR Win Shares.' It's clearly a question he doesn't want to discuss.

That doesn't seem fair to Cameron. He has written a lot over the years about how and why Fangraphs' version of WAR deals with luck. I don't always agree with where he comes down -- I don't like basing pitcher WAR on FIP, for example -- but he has certainly discussed these issues openly and at significant length.
   67. fra paolo Posted: November 26, 2017 at 03:46 PM (#5581163)
That doesn't seem fair to Cameron. He has written a lot over the years about how and why Fangraphs' version of WAR deals with luck.

I had a longer post, with a quote from Cameron, that got eaten. In summary it said that although Cameron may have written more extensively in the past, hearing this Effectively Wild appearance and reading the accompanying FanGraphs post, one gets the sense he considers -- for his purposes -- the conversation about dependent-versus-neutral over, despite the fact James has now made a valid criticism.
   68. Jay Z Posted: November 27, 2017 at 01:08 AM (#5581303)
Ultimately, for me, I'm just not very interested in pegging the assessment of how 'good' a season was to whether the team won games. It's fine if other people are interested in that, but it doesn't seem particularly instructive for comparing quality of players, determining historical value, etc. (i.e. - the stuff that we spend the most time discussing on a site like this). Sure, overperformance can generate more wins, and that's great. But it's already rewarded...by more wins.

Let's say that Whitey Ford's performances happened to come on teams that overperformed their pythag and Warren Spahn's performances didn't. I have a hard time understanding why I would care at the time. I certainly can't think of why I would care in retrospect.


Because... wins matter?

Wins are the reason the sport is played.

Earl Weaver's teams were a combined 37 wins over their Pythag from the years 1976-82. I have no idea why. Don't know why this just started happening in 1976. But it kind of seems statistically significant in a game where there's often less than a 40 game spread, per year, between the best and the worst.
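
For reference, a minimal sketch of how a "wins over Pythag" figure like that is computed, assuming the classic exponent-2 form of the Pythagorean expectation (published versions often use a slightly different exponent). The function names and the sample team are made up for illustration; these are not Weaver's actual numbers.

```python
def pythag_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from runs scored and allowed (Pythagorean expectation)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


def wins_over_pythag(actual_wins, runs_scored, runs_allowed, games=162):
    """Actual wins minus Pythagorean expected wins for one season."""
    return actual_wins - pythag_wins(runs_scored, runs_allowed, games)


# Hypothetical team: 800 runs scored, 700 allowed, 97 actual wins.
expected = pythag_wins(800, 700)
surplus = wins_over_pythag(97, 800, 700)
print(f"expected {expected:.1f} wins, {surplus:+.1f} vs Pythag")
```

Summing that surplus over several seasons gives a multi-year total of the kind quoted above.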

But it seems like you're just going to ignore stuff like that because it doesn't fit your worldview or you just don't care. Okay then.
   69. Jay Z Posted: November 27, 2017 at 01:13 AM (#5581304)
But this also seems to me a straw man. Nobody's attacking RBIs. As I've always understood it, the usual argument is that a better hitter in a weak context would have had more RBIs than a lesser hitter in a good context, if you switched contexts. More RBIs would be a great thing.

Willie Mays had 112 RBIs in 1965 and Deron Johnson had 130; Mays out-hit Johnson by 30 points and out-slugged him by 130. It is reasonable to assume that if Mays had spent 1965 batting behind Tommy Harper, Pete Rose, Vada Pinson, and Frank Robinson, he'd have had more than 130 RBIs, and that if Johnson had spent it batting behind Dick Schofield and Jesus Alou, he would have had fewer than 112. That seems uncontroversial to me, and not at all an "attack on RBIs."


I don't think anyone that favors context is arguing this point. A win expectancy formula adjusted for replacement value would handle this correctly.
   70. Baldrick Posted: November 27, 2017 at 08:01 AM (#5581311)
But it seems like you're just going to ignore stuff like that because it doesn't fit your worldview or you just don't care. Okay then.

I was going to write a response to this, but realized it was literally just repeating the thing you're quoting. If you're unable to read that and grasp how someone can believe that wins matter without wanting to peg individual player assessments to wins, nothing else I say is probably going to help.

I'll just reiterate that I have no problem with people generating statistics that care about context. Such stats might help construct a narrative story of a season, and some people might want to use them for determining things like MVP. Not my chosen approach, but it doesn't hurt to have more than one way of telling a story.
   71. GuyM Posted: November 27, 2017 at 09:32 AM (#5581333)
although Cameron may have written more extensively in the past, hearing this Effectively Wild appearance and reading the accompanying FanGraphs post, one gets the sense he considers -- for his purposes -- the conversation about dependent-versus-neutral over, despite the fact James has now made a valid criticism.

Well, for Cameron and Fangraphs, the conversation *is* over. They have thought about this a lot, and come down in favor of the context-neutral approach. As I said, I don't agree 100% with them on that, but it's a totally valid approach, and they've made their decision. Why should they pretend to view it as an open question? As for James, his essay hardly provided a novel insight. These issues have been debated and considered for years. The fact that individual player WAR doesn't always sum to team wins was a surprise to James, but not to anyone who has followed these issues for the past decade.

Some who agree with James may try to produce "contextualized WAR" type metrics. But I'll be surprised if any catch on and develop a following. I say this for two reasons:
1) For the vast majority of player seasons, it won't make much difference (less than 0.5 WAR). Even for Judge, on a team that was -9 wins compared to pythag, the adjustment is only 0.6 wins.
2) Doing this in a precise way, trying to assess each player's individual contribution to his team over- or under-performing, is an enormously complex task. There are 100 different ways you could do it, and I doubt there will ever be any kind of consensus -- even among "pro-context" fans -- around a single approach.

Doing this is a vast amount of work, probably much more work than calculating the original WAR estimate, to add at best a little precision. After a few efforts, my guess is the project fizzles out......
   72. Steve Parris, Je t'aime Posted: November 27, 2017 at 10:35 AM (#5581384)
Suppose, in an alternate universe, our standard stats kept track of each hitter's performance in each of the 8 base situations (bases empty, runner on first, ... , bases loaded). Then our WAR-type stats could take the base situation into account and, presumably, be more "accurate" in valuing a hitter's overall contribution to his team scoring runs, leaving aside for the moment the conversion of runs into wins.

RE24/REW doesn't get much traction even in sabermetric circles, but it's something I've looked at in MVP discussions since this Joe Pos article. It's a bit of an apples-and-oranges mix and you would need to add in baserunning but you could use REW as a replacement for oWAR to better gauge run production without getting into the small sample messiness of leverage/WPA.

Judge comes out much better if you look at REW, leading Altuve by 0.7 of a win (though the AL leader is actually Trout). Similar story in the NL: Votto and Stanton are pretty similar by adjusted OPS, but Votto has a big 1.4-win advantage by REW. Interestingly, that lead disappears if you look at WPA, even though Votto has a 100-point OPS advantage in high-leverage situations.
   73. fra paolo Posted: November 27, 2017 at 11:14 AM (#5581419)
Doing this in a precise way, trying to assess each player's individual contribution to his team over- or under-performing, is an enormously complex task. There are 100 different ways you could do it, and I doubt there will ever be any kind of consensus -- even among "pro-context" fans -- around a single approach.

But this is the way of thinking that really distinguishes the two sides. There is no reason to go to these lengths, in my opinion. Win Shares sees a big gap between Altuve and Judge (36.7 vs 30.8), WAR not so much (8.3 vs 8.1). Funnily enough, they both agree on who had more value, either way.*

The nub of the matter as raised by James is 'to what extent do team contexts matter?' in the numbers we actually have to work with. Cameron doesn't even discuss this question seriously in his recent contributions, nor in the links to older ones he has provided.

Cameron 'solves' the problem by weighting all events equally, even though he admits, if not explicitly, to knowing that having Carlos Correa batting behind Jose Altuve has some kind of influence on Altuve's WAR.

Win Shares thinks teams as a whole generate some kind of specific value that must be distributed to or deducted from their players, FanGraphs' WAR not so much. It could be that this value matters more in some areas than others. (I'm thinking of the value of pitching + fielding in particular.)

Following Cameron in dismissing the problem as too complicated isn't going to advance our understanding.
________
* Interestingly, the NL picture is different:
Win Shares: Blackmon 32.4, Stanton 32.2, Votto 31.5
BB-ref WAR: Stanton & Scherzer 7.6, Votto 7.5
