Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Sunday, November 19, 2017

More on WAR – Joe Blogs – Medium

Bill is off base here. Bill seems to want to use WAR to answer questions that WAR is not really suited to answer. He’s not alone, of course. Other people use WAR to answer the MVP question all the time. The reason it’s done is, there really isn’t a tool out there designed to specifically answer the MVP question. Bill tried to do it with Win Shares. Unfortunately the adjustment methods he chose were too broad in nature.

In any event we don’t have to throw out WAR. It’s really useful for answering a lot of questions. I heartily agree with a Tangotiger suggestion:

I have always thought the best way to design a WAR would be to break it down into separate elements, which we later combine in the most appropriate way to best answer specific questions. The breakdown, IMO, should be: 1) offense, 2) defense (further broken down into components), 3) baserunning, 4) positional adjustment, and 5) context adjustment. (They should all also be presented with the related rate stat to help people answer other specific questions.)

Anyway, by introducing a timing/context adjustment as Tangotiger suggested, the value of the current WAR systems would increase. Our current data sets are much better than they were twenty years ago. We can now provide individual contexts, and need not rely on team ratios as Win Shares did. We should do it.

Unfortunately, though, the additions will generate more confusion as many people will still want to use one number to answer all questions.

“But because that is true, I ASSUMED that these were complex, nuanced, sophisticated systems. I never really looked; I just assumed that the details were out of my depth. But sometime in the last year I was doing some research that relied on these WAR systems, so I took a look at them, and … they’re not very impressive. They’re not well thought through; they haven’t made a convincing effort to address many of the inherent difficulties that the undertaking presents. They tend to get so far into the data, throw up their arms and make a wild guess. I don’t know if I’m going to get the time to do better of it, or if it will be left to others, but … we’re not at anything like an end point here. I assumed that these systems were a lot better than they actually are.”

Jim Furtado Posted: November 19, 2017 at 07:28 AM | 208 comment(s)
  Tags: bill james, sabermetrics, war

Reader Comments and Retorts

   1. Captain Supporter Posted: November 19, 2017 at 09:44 AM (#5578474)
I could not agree with Bill James more. WAR may indeed be useful for answering certain questions (certain ones, not a lot of them), but it has become way oversold to the general public as a generic way to do seemingly accurate comparisons of the performance and/or ability of any set of players regardless of the era they played in, the position they played, their performance in 'clutch' situations in a particular year or over a career, their offensive and defensive abilities and the weight they assign to those facets of performance, the accuracy of particular components of what is a composite stat, etc., etc.

I am well aware of the arguments that can be made about how to adjust for some of the above problems as well as for many other problems that I did not mention, but the problem is that the stat is not used that way in the real world. It's a crude measure that does in fact pretend to be an accurate measure to answer all questions, particularly when it comes to historical comparisons or when it compares a strong offensive player to a strong defensive player. Of course, WAR is a poor stat for MVP comparisons, but readers of this site know that the supposedly knowledgeable people here generally start all MVP debates with a list of players and their associated WAR ranking. When I hear David Cone show off his clearly very limited understanding of statistics on Yankee broadcasts by spouting some nonsense about WAR, I cringe. As far as I can tell, the analytic departments of baseball teams (including, obviously, Bill James) don't pay the slightest attention to WAR, and I can well understand why.
   2. shoewizard Posted: November 19, 2017 at 11:34 AM (#5578507)
Couple of questions:

1.) If for example, you magically shifted a player's hits and homers in low leverage numbers to high leverage, then don't you turn some of those low leverage into high leverage ? If you take away a 3 run homer a player hits in the first inning, and the rbi single he hit in the 4th when his team was up 3-0, then this also lowers the leverage for his 6th inning at bat. In Judge's specific case, how many games were the Yankees NOT in a high leverage situation and NOT clinging to a 1 run lead and taxing the back end of their bullpen specifically because Judge went off in the early innings in low leverage situations. This is not necessarily a question about Judge specifically, just in general.


2.) Is Win Shares updated and published in season ? I haven't seen it, but it might be. How do you set up a value metric to start the season when you have no idea what the sequencing is going to be and the sequencing is not predictable ?

3.) What does Win Shares say about Judge vs. Carlos Correa ? Or George Springer ?

4.) Edit: 1 more question: What does WS say about Stanton or Votto vs. Altuve ? Is the advantage Altuve has greater or less in WS compared to WAR ? Note that both Stanton and Votto have higher WPA numbers, but lower overall WAR, and of course on losing teams.


When it's all said and done, I believe, (But am not 100% certain, so compelling arguments can sway me) that Win Shares is going to give more credit to guys who play on winning teams. There may have been a confluence of factors that can be pointed out in THIS particular MVP race to cause a larger than usual gap in how WAR and Win Shares measure. But it seems it simply goes back to crediting guys on winning teams more than players on losing teams. Guys with better teammates, or even guys with Neutral High/Low leverage profiles,but whose teammates performed better in close games, (or who had a lights out bullpen) are going to get a bump.

As for the emotional components to this debate, which clearly there are many, I am not personally invested like Bill, and his closest supporters are, or Sean or the folks at FG. But honestly something doesn't feel right here. I won't say much more than that though about this aspect of the ensuing debate. But it made me think of This scene, starting from the 1 minute mark

   3. cercopithecus aethiops Posted: November 19, 2017 at 01:11 PM (#5578528)
In Judge's specific case, how many games were the Yankees NOT in a high leverage situation and NOT clinging to a 1 run lead and taxing the back end of their bullpen specifically because Judge went off in the early innings in low leverage situations. This is not necessarily a question about Judge specifically, just in general.


Yankees relievers blew 23 saves, thus creating not only team losses, but high leverage situations that might otherwise have been low leverage situations had earlier offensive contributions not been squandered.
   4. The Duke Posted: November 19, 2017 at 02:10 PM (#5578546)
The real problem with WAR and all the sabr stats is a refusal of those that display them to use the same definition. New age analysts hate batting average and RBIs and Wins but no matter where you look those numbers are the same. When you get to WAR fangraphs, bbref, baseball prospectus etc all have different numbers and the differences are large. If WAR was 2.4, 2.5, and 2.6 it would be ok but some players can be 2.5 in one call and 3.5 in another. How is anyone supposed to take that seriously? How can Tommy John have such different WAR totals?

Jay Jaffe has to change his Hall metrics because WAR keeps changing - his idea was a good one but his metric sucks. And why do new stats have incomprehensible acronyms like bWAR and fWAR or SIERA? At least BABIP is straightforward. The stat guys need a course in marketing and stare decisis. Stop changing the metrics and give them easy handles.
   5. Mike Webber Posted: November 19, 2017 at 02:15 PM (#5578548)

3.) What does Win Shares say about Judge vs. Carlos Correa ? Or George Springer ?

4.) Edit: 1 more question: What does WS say about Stanton or Votto vs. Altuve ? Is the advantage Altuve has greater or less in WS compared to WAR ? Note that both Stanton and Votto have higher WPA numbers, but lower overall WAR, and of course on losing teams.


Answering Shoe plus the other awards. These aren't necessarily leader boards..

35 Altuve
29 Judge
28 Jose Ramirez
26 Correa
24 Springer


33 Votto
29 Stanton
29 Goldschmidt
26 Arenado
26 Bryant

23 Bellinger

23 Kluber
20 Sale
19 Kimbrel

21 Scherzer
19 Kershaw

   6. cmd600 Posted: November 19, 2017 at 02:55 PM (#5578555)
The real problem with WAR and all the sabr stats is a refusal of those that display them to use the same definition. New age analysts hate batting average and RBIs and Wins but no matter where you look those numbers are the same. When you get to WAR fangraphs, bbref, baseball prospectus etc all have different numbers and the differences are large. If WAR was 2.4, 2.5, and 2.6 it would be ok but some players can be 2.5 in one call and 3.5 in another. How is anyone supposed to take that seriously? How can Tommy John have such different WAR totals?


Yeah! The way science works is totally that we all just kind of agree before any research is done to look at the data only one way and to agree that the results are ironclad forever, because keeping things simple enough for laymen to understand is way more important than acknowledging that stuff can be complicated and messy.
   7. Kiko Sakata Posted: November 19, 2017 at 02:56 PM (#5578557)
   8. Mike Webber Posted: November 19, 2017 at 03:05 PM (#5578562)
Kiko - is there a way to see the 2017 leaderboards? Or the 2017 results for individuals?
   9. shoewizard Posted: November 19, 2017 at 03:08 PM (#5578565)
just looked, don't think he has updated.
   10. Kiko Sakata Posted: November 19, 2017 at 03:16 PM (#5578567)
Unfortunately, my data source is Retrosheet and they don't release current seasons until the end of November. Obviously, the next thing I need to do is figure out how to pull MLB data myself and update in-season. Maybe next season.
   11. shoewizard Posted: November 19, 2017 at 03:27 PM (#5578570)
I have checked it out several times previously, and enjoy the perspective. Look forward to the next update.

   12. bachslunch Posted: November 19, 2017 at 03:28 PM (#5578571)
It’s not clear to me what’s wrong with WAR if you control for position and either IPs or PAs.
   13. Kiko Sakata Posted: November 19, 2017 at 03:42 PM (#5578573)
I have checked it out several times previously, and enjoy the perspective. Look forward to the next update.


Thank you very much!
   14. Dr. Chaleeko Posted: November 19, 2017 at 06:05 PM (#5578593)
I’m a little unclear about James’ point about Pythagoras records. He phrases things as if WAR were credited to players based on their team’s pythag wins, but as mentioned upthread, WAR is about runs not wins. I recall no place in the WAR framework or calculations where team records of any sort are mentioned. Though I could be quite mistaken too.

As Kiko points out, too, James is only talking about hitting. But clutch situations exist for running and fielding, and Bill probably needs to consider them as well instead of speaking almost exclusively about batting average in the clutch. Even so, if Judge bats .190 in those situations, but he hits four walk off grand slams it’s a whole ‘nother thing. I like extreme examples. What if Altuve hits .330 in the clutch but they are all infield hits? Or if Hosmer hit .300 in the clutch and runners got thrown out at home on every one of his hits? Should Hosmer get credit for hitting well in the clutch? Or were his hits not good enough to score the runners? For that matter, when Judge homers in the clutch, do we credit it to him as a clutch performance or debit it against the pitcher for clutch failure? Because if the pitcher chokes and leaves one in the middle of the plate, and Judge homers on a fat pitch is it really all his glory?
   15. Jay Z Posted: November 19, 2017 at 07:08 PM (#5578599)
I’m a little unclear about James’ point about Pythagoras records. He phrases things as if WAR were credited to players based on their team’s pythag wins, but as mentioned upthread, WAR is about runs not wins. I recall no place in the WAR framework or calculations where team records of any sort are mentioned. Though I could be quite mistaken too.


If WAR is about runs, then perhaps not call it "Wins Above Replacement?" Maybe Estimated WAR at least? Because right now it isn't tied to actual wins at all, why call it that. Pitcher Wins, at least the pitcher's team won the game.
   16. Jay Z Posted: November 19, 2017 at 07:55 PM (#5578609)
1.) If for example, you magically shifted a player's hits and homers in low leverage numbers to high leverage, then don't you turn some of those low leverage into high leverage ? If you take away a 3 run homer a player hits in the first inning, and the rbi single he hit in the 4th when his team was up 3-0, then this also lowers the leverage for his 6th inning at bat. In Judge's specific case, how many games were the Yankees NOT in a high leverage situation and NOT clinging to a 1 run lead and taxing the back end of their bullpen specifically because Judge went off in the early innings in low leverage situations. This is not necessarily a question about Judge specifically, just in general.


I can't answer your question directly. Baseball Reference doesn't explicitly say what is high, medium, low leverage.

Judge did have slightly more high and medium leverage situations than Altuve. Judge and Altuve had almost exactly the same OPS in high and medium leverage situations. In low leverage situations Judge wins 1.115 to .923.

The stats can be all over the place. Judge had more Late + Close PAs than Altuve. He did much worse than Altuve in those PAs. Judge had more PAs in games where the margin was more than 4 runs. His OPS was ridiculous - 1.500! Altuve had .942.

I am comfortable discounting Judge's performance based on his rather poor clutch splits.
   17. K-BAR, J-BAR (trhn) Posted: November 19, 2017 at 07:59 PM (#5578611)
Ha ha. Bill James wants to avoid punching down at bb-ref and fangraphs. Because Bill has been driving the debate for the last decade!

As Tango implies, this whole thing is semantic. People who have paid attention to sabermetrics know that WAR is another word for VORP or WARP and that it relies on some level of abstraction. To the casual fan, it's probably like QB rating, some means of ranking that at least vaguely suggests the magnitude of differences among players. Folks like it because while it may be surprising occasionally, it's rarely outright crazy. And it mostly confirms folks' intuitions.

My main complaint about WAR is something I don't understand on the pitcher side. There's an adjustment in the Runs to Win formula that decreases the number of runs per win based on how pitchers influence their own run environment. Basically Kershaw only needs 7 runs per win whereas Edwin Jackson needs, like 11. Seems like that's double counting.
   18. Slivers of Maranville descends into chaos (SdeB) Posted: November 19, 2017 at 09:13 PM (#5578620)
When it's all said and done, I believe, (But am not 100% certain, so compelling arguments can sway me) that Win Shares is going to give more credit to guys who play on winning teams. There may have been a confluence of factors that can be pointed out in THIS particular MVP race to cause a larger than usual gap in how WAR and Win Shares measure. But it seems it simply goes back to crediting guys on winning teams more than players on losing teams.


Bill James said the whole idea behind Win Shares was NOT to do this. Others can better say how well he succeeded, but for example Steve Carlton earns a whole pile of Win Shares on those shitty Phillies teams.
   19. Baldrick Posted: November 19, 2017 at 09:28 PM (#5578621)
If WAR is about runs, then perhaps not call it "Wins Above Replacement?" Maybe Estimated WAR at least? Because right now it isn't tied to actual wins at all, why call it that. Pitcher Wins, at least the pitcher's team won the game.

The goal of the game is to get wins, but players can't generate wins individually; they can only generate runs (or pieces of runs, really). So you calculate how many runs they contribute and then see how many wins that can be expected to produce.

I find it hard to believe that people are genuinely confused about this. I think they just don't like the idea of acontextual stats and have decided that this is a useful talking point. Maybe I'm wrong.
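
The runs-to-wins step described above can be sketched in a few lines. This is only an illustration, not how any particular site computes WAR: the function name is made up, and the 10 runs per win figure is a common rule of thumb assumed here, while real systems derive the rate from the season's run environment.

```python
# Assumed conversion rate: roughly 10 runs per win is a common rule of
# thumb, but the real value varies by season and run environment.
RUNS_PER_WIN = 10.0


def runs_to_wins(runs_above_replacement: float,
                 runs_per_win: float = RUNS_PER_WIN) -> float:
    """Convert runs above replacement (RAR) into expected wins (WAR)."""
    if runs_per_win <= 0:
        raise ValueError("runs_per_win must be positive")
    return runs_above_replacement / runs_per_win


# A made-up player who is 55 runs better than a replacement-level player.
print(round(runs_to_wins(55.0), 1))
```

The point of the sketch is that the "wins" in WAR are expected wins derived from runs, not wins read off the team's actual record.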
   20. PreservedFish Posted: November 19, 2017 at 09:47 PM (#5578623)
When it's all said and done, I believe, (But am not 100% certain, so compelling arguments can sway me) that Win Shares is going to give more credit to guys who play on winning teams.


I don't even understand why this would follow. Winning teams have more good players.
   21. PreservedFish Posted: November 19, 2017 at 09:52 PM (#5578624)
I’m a little unclear about James’ point about Pythagoras records. He phrases things as if WAR were credited to players based on their team’s pythag wins, but as mentioned upthread, WAR is about runs not wins. I recall no place in the WAR framework or calculations where team records of any sort are mentioned. Though I could be quite mistaken too.


Yes, you are missing the point. Pythagorean records have nothing to do with this except as a rough measure of team ability. A team with a pythag of 100 but 90 wins is more likely to have an accumulated WAR total befitting that of a 100 win team, rather than that of a 90 win team. But they did not win 100 games, they won 90 games, so James would have us discount the WAR of every one of those players to account for that fact.
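
For anyone who wants to see the arithmetic, here is a rough sketch of what such a discount could look like. It is one possible reading of the idea, not James' actual method: the function names are hypothetical, the exponent of 2 is the classic Pythagorean formula, and the 48-win replacement-level team (about .294 over 162 games) is an assumed baseline.

```python
def pythag_wins(runs_scored: float, runs_allowed: float,
                games: int = 162, exponent: float = 2.0) -> float:
    """Expected wins from the classic Pythagorean formula."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


def discount_factor(actual_wins: float, expected_wins: float,
                    replacement_wins: float = 48.0) -> float:
    """Ratio of actual to expected wins above a replacement-level team.

    replacement_wins is an assumed baseline, not an official figure.
    """
    if expected_wins <= replacement_wins:
        raise ValueError("expected_wins must exceed replacement_wins")
    return (actual_wins - replacement_wins) / (expected_wins - replacement_wins)


# The hypothetical above: a 100-win Pythagorean team that actually won 90.
factor = discount_factor(actual_wins=90, expected_wins=100)
print(round(factor, 3))        # multiplier applied to every player's WAR
print(round(6.0 * factor, 2))  # a made-up 6.0 WAR player after the discount
```

Note that a flat multiplier like this charges every player on the roster the same proportional share of the shortfall, which is exactly the assumption argued over in the rest of the thread.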
   22. Kiko Sakata Posted: November 19, 2017 at 10:09 PM (#5578626)
When it's all said and done, I believe, (But am not 100% certain, so compelling arguments can sway me) that Win Shares is going to give more credit to guys who play on winning teams.


Win Shares gives more credit to players on teams which won more games than expected (basically, that outplayed their Pythag, although Win Shares is based on a linear approximation of Pythag). There's probably a slight correlation between teams that win a lot and teams that win more than expected, but there are plenty of winning teams that under-perform their Pythag (the 2016 Cubs immediately come to mind for me).

Where Win Shares falls short, at least when he originally wrote the book on them back in 2002, I believe, is that James adjusts everybody's Win Shares based on team-wide deviations from Pythag, rather than looking at the game level to see who was actually responsible for the team's specific wins. In contrast, my Player won-lost records are calculated at the game level, so player wins (pWins) are associated with specific team wins. [I also calculate expected wins (eWins) which are divorced from context.]
   23. snapper (history's 42nd greatest monster) Posted: November 19, 2017 at 10:51 PM (#5578631)

Yes, you are missing the point. Pythagorean records have nothing to do with this except as a rough measure of team ability. A team with a pythag of 100 but 90 wins is more likely to have an accumulated WAR total befitting that of a 100 win team, rather than that of a 90 win team. But they did not win 100 games, they won 90 games, so James would have us discount the WAR of every one of those players to account for that fact.


But that's dumb. The fact that a 100 Pythag-win team happened to go 10-20 in one-run games tells us exactly nothing about their performance or ability.
   24. PreservedFish Posted: November 19, 2017 at 11:08 PM (#5578634)
It tells you nothing about their ability. It certainly tells you something about their performance: that they sucked in close games. This is just the old value/ability debate. This was in context of the MVP vote and comes from people using WAR as a measure of value - this is an adjustment that perhaps makes it track value even better.
   25. Ziggy: The Platonic Form of Russell Branyan Posted: November 19, 2017 at 11:33 PM (#5578641)
If WAR is about runs, then perhaps not call it "Wins Above Replacement?" Maybe Estimated WAR at least? Because right now it isn't tied to actual wins at all, why call it that. Pitcher Wins, at least the pitcher's team won the game.


I was going to respond to this, but Baldrick did a good job in #19, so just let me add a little justification for using 'wins'. If you took the player and dropped him into an arbitrarily chosen team, WAR tells you how many more games you could expect that team to win than if you'd dropped a AAAA guy into that team. It's not how many games his actual team will win, but it's still connected to events at the game level.

I'll also disagree with PF in #24. WAR does reasonably well as a proxy for value and shouldn't be adjusted based on a team's actual record. If we're measuring an INDIVIDUAL player's value (which is what we're doing for MVP debates, for example), it shouldn't matter how the player's team performed. Because the individual isn't the rest of his team. So whatever we use to measure value should be divorced from actual wins and losses. WAR doesn't measure value because a 4 WAR season is more than twice as valuable as a 2 WAR season (it's getting to the post season that's really valuable, and you need above-average players to get there). But what WAR does right, if we're to measure an individual player's value, is to abstract away from the particular facts about the situations in which the player found himself (what his teammates do, the context in which he comes up to bat, and so on). Otherwise we're measuring what other people are doing, not what our player is doing.
   26. PreservedFish Posted: November 19, 2017 at 11:50 PM (#5578644)
You're mostly disagreeing with James, not with me. I haven't decided if I agree with him or not. But I think it's an interesting argument.

If we're measuring an INDIVIDUAL player's value (which is what we're doing for MVP debates, for example), it shouldn't matter how the player's team performed. Because the individual isn't the rest of his team.


First of all: no, not really, because most people would agree that value is somewhat context dependent. Kirk Gibson's famous homerun was more valuable than Bartolo Colon's, but you can't get there without "measuring what other people are doing." If you think that value should scrupulously ignore all context, you're unusual.

But back to this subject, and let's take it for granted that value is context-dependent. Here James would say that you're making a different error, which is to assume that it was the other 24 guys that entirely caused the failure vs Pythagoras. James merely assumes that Judge shares the blame equally. And in this case, it looks like he was right, or if anything understating it, because Judge was comparatively awful in tight games.
   27. Sunday silence Posted: November 20, 2017 at 12:15 AM (#5578647)

In contrast, my Player won-lost records are calculated at the game level, so player wins (pWins) are associated with specific team wins. [I also calculate expected wins (eWins) which are divorced from context.]



I understand what you are trying to do but when I go to your website I am only confused. I see that Aaron Judge this year was about 7.5 games to the good using the expected parameter and only 5.5 using an actual game related context or however it works.

Ok that sounds maybe correct. I know he hit a bunch of home runs maybe an historical amount and he probably scored 7 WAR or something but...

1. How is Ryan Radmanovich the most similar comp to Aaron Judge? Really?? Some guy who played 25 games for SEA 20 years ago???

2. So I did a comparison of Judge to Altuve. I hit the compare button and that 7.5 wins thing disappears for Judge. And so does the 5.5 wins as well. What's left is something telling me Judge won 2.3 games and lost 2.4 games this year... Oh and Altuve is 6 games below .500 for his career and every year he's been negative.

So these players really suck I guess.

Is it any wonder people throw up their hands when they see these sorts of systems? If this is the best you can do I'll go back to WAR, thank you. At least I understand that.
   28. Shock Posted: November 20, 2017 at 01:09 AM (#5578654)
We have an award for the team that wins the most actual games. It is called the World Series. Its champion is based largely on chance but we accept this because it is fun.

The whole point of these metrics is to measure individual performance irrespective of team wins. That it is not connected to team wins is a feature, not a bug.

If you want to make the MVP trophy based in part on team wins you are free to do so. As far as I know, these trophies are not automatically handed to the WAR leaders and team performance is still a major part of the voting. This is not a strong argument for changing the metric.

BILL JAMES does not like WAR because he did not invent it. It really is as simple and transparently puerile as that.
   29. Hank G. Posted: November 20, 2017 at 03:23 AM (#5578659)
Bill James said the whole idea behind Win Shares was NOT to do this. Others can better say how well he succeeded, but for example Steve Carlton earns a whole pile of Win Shares on those shitty Phillies teams.


Carlton accumulated a lot of WAR while playing on those teams too. In fact, no one in MLB since 1972 has had a higher WAR total than the 12.5 Carlton compiled that year.
   30. Captain Supporter Posted: November 20, 2017 at 08:59 AM (#5578680)
BILL JAMES does not like WAR because he did not invent it. It really is as simple and transparently puerile as that.


Bill James does not like WAR because he sees the weaknesses in it that people like you apparently don't. And ad hominem attacks don't constitute a valid argument, particularly against a guy whose track record should serve to give him the benefit of the doubt.
   31. Blanks for Nothing, Larvell Posted: November 20, 2017 at 09:28 AM (#5578682)
The fact that a 100 Pythag-win team happened to go 10-20 in one-run games tells us exactly nothing about their performance or ability.


??

It tells us a boatload about their performance.
   32. Blanks for Nothing, Larvell Posted: November 20, 2017 at 09:30 AM (#5578683)
The goal of the game is to get wins, but players can't generate wins individually; they can only generate runs (or pieces of runs, really).


Then it's dumb to call it "wins" over replacement.
   33. dlf Posted: November 20, 2017 at 09:30 AM (#5578684)
James' follow up article - I believe it is not behind the paywall.
   34. Blanks for Nothing, Larvell Posted: November 20, 2017 at 09:31 AM (#5578686)
But back to this subject, and let's take it for granted that value is context-dependent.


It is -- but the main context it's dependent on is actual team wins.
   35. Blanks for Nothing, Larvell Posted: November 20, 2017 at 09:43 AM (#5578690)
Posnanski:

And even if we believe that the fight is over, even if we believe that those extra wins are chance — how can we not include chance in our stats? Look, in the end EVERYTHING IN SPORTS AND LIFE has some chance involved.


I still submit that the desire to pretend that away stems ultimately from a mix of psychological and ideological desires and imperatives, having nothing to do with baseball.
   36. snapper (history's 42nd greatest monster) Posted: November 20, 2017 at 09:55 AM (#5578694)

It tells us a boatload about their performance.


Not as individual players. WAR is an individual stat.

A team's record in one-run games has been shown to be almost exclusively the product of luck. It tells us nothing about the performance of the pitchers that they allowed 3 runs on a day the hitters only scored two.

Then it's dumb to call it "wins" over replacement.

Both stats exist. RAR is converted to wins by dividing by the number of runs needed to produce a win, on average in a given season.

It is -- but the main context it's dependent on is actual team wins.

This is no more valid a criticism of WAR than it is a criticism of ERA or BA, or RBI.

Individual stats in baseball have never been dependent on team performance, except for the win and the save, both of which are largely derided in sabermetric circles.
   37. Rally Posted: November 20, 2017 at 09:58 AM (#5578695)
BILL JAMES does not like WAR because he did not invent it. It really is as simple and transparently puerile as that.


If Bill James wanted to claim he did invent it, I would not argue at all. His Rain Delay article in the 1988 Abstract is my inspiration for WAR.
   38. Blanks for Nothing, Larvell Posted: November 20, 2017 at 10:04 AM (#5578696)
It tells us a boatload about their performance.

Not as individual players.


It absolutely does. It's just hard to measure, so WAR devotees just drop the mic and say it's all "chance" or it's "random," yada yada.

The Yankees went 10-20 in one run games. That record had to have resulted from individual performances. There's no other way it could have happened.

Put more broadly, the same thing is true in the difference between the Yankees' and Astros' actual wins, notwithstanding their virtually identical pythag wins. The difference in their actual wins resulted from their players' individual performances. There's no other way it could have happened.
   39. Blanks for Nothing, Larvell Posted: November 20, 2017 at 10:06 AM (#5578697)
Individual stats in baseball have never been dependent on team performance,


Right, but those stats are routinely adjusted for context. Park, era, etc. Why they then wouldn't be adjusted for the most important context of all -- the wins they actually generated -- is ... quite odd. James makes the same point.
   40. snapper (history's 42nd greatest monster) Posted: November 20, 2017 at 10:10 AM (#5578700)
Why they then wouldn't be adjusted for the most important context of all -- the wins they actually generated -- is ... quite odd. James makes the same point.

Read Dave Cameron's piece today at Fangraphs. It thoroughly demolishes James' argument.
   41. BDC Posted: November 20, 2017 at 10:12 AM (#5578701)
Might be as good a time as any to remark that most baseball stats outside of team runs scored and allowed, and team wins and losses, are abstractions. BA and RBI, which seem as straightforward as you can get, are dependent on conventions (certain outs, like sacrifices, can result in RBI but aren't counted as AB, errors complicate scoring and involve judgment calls, etc.)

Even an individual run scored, which is pretty darn basic … Gardner singles, Frazier grounds weakly to second base, forcing Gardner at second, but he's safe at first. Judge hits a home run, Frazier gets a run scored despite doing nothing but hurt the Yankees in the inning. You've just got to roll with the abstractions. WAR may be a second- or third-order abstraction, but it's not like the stats it's based on are absolute stone-cold Dingen an sich :)
   42. Fancy Crazy Town Banana Pants Handle Posted: November 20, 2017 at 10:16 AM (#5578703)
It tells you nothing about their ability. It certainly tells you something about their performance: that they sucked in close games. This is just the old value/ability debate. This was in context of the MVP vote and comes from people using WAR as a measure of value - this is an adjustment that perhaps makes it track value even better.

But how are you distributing that? It makes no sense to ding the starting pitchers for the fact that their closer sucked. They weren't the ones causing the extra losses. It makes no sense to ding the shortstop because the first baseman hit .200/.250/.300 with runners in scoring position. And so on.

And even if you want to distribute that equally among all players: 10 wins divided among 8-9 starters, 5 starting pitchers, an 8 man bullpen, plus all the reserves... Even at a 10 win difference, you are looking at dinging a starter at most half a win. And that is probably stretching it. That is well within the margin of error for WAR. It simply isn't even worth the hassle, even if I philosophically agreed with it (which I don't).
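As a rough check on that arithmetic, here is a minimal sketch. The starter, rotation and bullpen counts come from the post above; the four reserves and the equal split are assumptions for illustration only and do not describe how any actual WAR system would apportion the gap.

```python
# Hypothetical roster. Starters, rotation and bullpen sizes are the ones
# named in the post; the 4 reserves are an assumed figure for illustration.
roster = {"starters": 9, "starting_pitchers": 5, "relievers": 8, "reserves": 4}


def equal_share(win_gap, roster):
    """Split a team-level win gap equally across every player on the roster."""
    n_players = sum(roster.values())
    return win_gap / n_players


# A 10-win gap between actual and expected team wins, shared equally.
share = equal_share(10, roster)
print(f"{share:.2f} wins per player")
```

Under these assumptions each player's share comes out below half a win, which is consistent with the "at most half a win" estimate. Weighting by playing time would move more of the gap onto regulars, but that weighting is a separate design choice not specified here.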
   43. Blanks for Nothing, Larvell Posted: November 20, 2017 at 10:19 AM (#5578704)
Read Dave Cameron's piece today at Fangraphs. It thoroughly demolishes James' argument.


It does no such thing.

WAR, on the other hand, attempts to address a question that a lot of people seem interested in answering. If the WAR leaderboards were posed as a question, they might be written as something like this:

“What did each player do, as an individual, to help his team try to win games?”


That isn't the right question. The right question is "What did each player contribute to the games his team won?" (*)

And when it comes down to assigning value to individuals for the events in which they’re involved, the general consensus in the sabermetric community has been that we want to reward (or penalize) hitters for what they can control. And the context of the situations in which they play is just not something players can create.

That premise is where the argument falls apart. It doesn't matter that the players aren't "responsible" for the contexts in which they hit. They contribute to team wins by, and only by, functioning (**) in the contexts they're presented. There's no other way they do.

(*) And even that kind of gives Cameron's argument too much credit. Even on Cameron's own terms, Jose Altuve "helped" the Astros "try to" win games by hitting really, really well in high-leverage situations. Aaron Judge did not "help" the Yankees win games by hitting so poorly in those situations.

(**) Or not functioning.
   44. Fancy Crazy Town Banana Pants Handle Posted: November 20, 2017 at 10:20 AM (#5578705)
but it's not like the stats it's based on are absolute stone-cold Dingen an sich

'Dinge' not 'Dingen' in the Kasus you are using it there.
   45. Kiko Sakata Posted: November 20, 2017 at 10:32 AM (#5578711)
I understand what you are trying to do but when I go to your website I am only confused. I see that Aaron Judge this year was about 7.5 games to the good using the expected parameter and only 5.5 using an actual game related context or however it works.

Ok that sounds maybe correct. I know he hit a bunch of home runs maybe an historical amount and he probably scored 7 WAR or something but...

1 how is Ryan Radmanovich the most similar comp to Aaron Judge? Really?? Some guy who played 25 games for SEA 20 years ago???

2 so I did a comparison of Judge to Altuve I hit the compare button and that 7.5 wins thing disappears for Judge. And so does the 5.5 wins as well. What's left is something telling me Judge won 2.3 games and lost 2.4 games this year... Oh and Altuve is 6 games below .500 for his career and every year he's been negative.

So these players really suck I guess.

Is it any wonder people throw up their hands when they see these sorts of systems? If this is the best you can do I'll go back to WAR thank you at least I understand that


SS in #27

Thanks for asking the questions.

As I said in comment #10, my source is Retrosheet which hasn't released their official 2017 numbers. That said, I got a preview of their 2017 and did a preliminary update of 2017 last night, which shows up on the player pages. But I didn't update some of the more detailed pages, which is definitely screwing up the PlayerComps page and is probably screwing up the Similars page. I'll try to update the relevant databases today (both of which surprise me - I thought I updated the underlying data for both of those tables).

That said, on the Most Similar players, the default is to compare entire careers. So, for a guy with only one season, it's going to end up comping to guys with very short careers. Although I think the Radmanovich comp is based on 2016 alone. Once I fix it so that Judge's 2017 is used for the analysis, you can modify it by specifying an age range to draw a better set of Similars.

As for Altuve being below average over his career, this is the effect of the context in which he played. Remember, Altuve was around for all of the Astros' horrific rebuild (horrific as in the team lost 106, 107, and 111 games in three straight seasons). The default for Player comps is pWins - which tie to team wins - and the 2011-13 Houston Astros just didn't have a lot of wins to go around - which was mostly not Jose Altuve's fault. On the 'Player Comps' page, in the line 'Comparison Statistic', type "eWins" (or "ew") in the box, and click 'Go'. Now, the comp uses eWins, in which Jose Altuve looks much better for most of his career.

I do have an article on my site that tries to give an overview of the website here. And an article that tries to explain my calculations here.

Sorry about the partial update. It turned out to be more partial than I had hoped. Give me 20 minutes and go back and check things again.
   47. shoewizard Posted: November 20, 2017 at 10:51 AM (#5578725)
From Mike in post #5


33 Votto
29 Stanton
29 Goldschmidt


Based on Bill's articles, I can understand why Goldschmidt would close the gap on Stanton.

Interesting though that Votto, who played on a 68 win team that was 2 wins below its Pythag, maintains a large lead over Goldschmidt.

Maybe this would have been a more interesting debate if Bill gave the same amount of scrutiny to the NL MVP situation. Because the explanation does not seem to apply as well to this set of players.

(I'm not being a homer or trying to suggest Goldy should have been the MVP, I didn't think he should be, I'm just trying to understand what on the surface looks like a contradiction)
   48. Kiko Sakata Posted: November 20, 2017 at 10:56 AM (#5578728)
Okay, I've updated enough so that PlayerComps and Similars should work. Sorry about that.

Here's the players most similar in value to Aaron Judge through age 25.

Here's a comparison of the careers of Jose Altuve and Aaron Judge based on expected wins.
   49. shoewizard Posted: November 20, 2017 at 11:07 AM (#5578731)
Oh well....just realized Bill blocked me on twitter. (I wanted to ask him my question in #47 directly.) I didn't even say anything disrespectful to him (in fact it was the other way around).

I was simply debating several points with another poster last night and he popped in with a terse one liner to say nothing I said made sense. But didn't say why.

Too bad. This entire episode, and the way he is conducting himself has made me lose a lot of respect for him.

He seems way too agenda driven. At the end of the day his proposal is to add 1 win share for players from playoff teams and choose Win Shares over WAR for evaluating MVP candidates? Whatever. Anyone with any sense never used either metric exclusively for evaluating these things. They are data points. Certainly not the only ones that should be considered.




   50. Rally Posted: November 20, 2017 at 11:11 AM (#5578732)
Okay, I've updated enough so that PlayerComps and Similars should work. Sorry about that.


When I saw the question I assumed it had not been updated and was comparing Judge's 2016 September callup, with a 50% strikeout rate, to Radmanovich. Does your system look at MLB only or consider minor league stats? Because Radmanovich looks like he was a similar minor league slugger to what Judge was: lots of strikeouts, some walks, and good power but nothing exceptional.
   51. Jay Z Posted: November 20, 2017 at 11:17 AM (#5578735)
Read Dave Cameron's piece today at Fangraphs. It thoroughly demolishes James' argument.


No, it doesn't. Cameron admits WAR is a poor tool for an MVP vote because it doesn't take context into account.
   52. Kiko Sakata Posted: November 20, 2017 at 11:20 AM (#5578737)
When I saw the question I assumed it had not been updated and was comparing Judge's 2016 September callup, with a 50% strikeout rate, to Radmanovich. Does your system look at MLB only or consider minor league stats? Because Radmanovich looks like he was a similar minor league slugger to what Judge was: lots of strikeouts, some walks, and good power but nothing exceptional.


MLB only. It also doesn't break things down by component - it could and perhaps should, but I haven't done that yet. It looks at overall batting, baserunning, pitching, and fielding - where the latter is measured against replacement level, which ideally sorts players by position - although not always perfectly (nor should it necessarily, but the idea is that to be comparable to an average shortstop, you have to be an above-average third baseman).

Anyway, that's why in #48 I say most similar "in value" to Judge. His top sim through age 25 ends up being Wade Boggs who was, of course, nothing like Aaron Judge, except for the fact that both of them produced a ton of batting value at age 25 (1983 for Boggs) (and had relatively late starts - most guys who hit like Aaron Judge at age 25 were probably major-league regulars for several years beforehand, which would make them less similar to Judge).
   53. PreservedFish Posted: November 20, 2017 at 11:37 AM (#5578749)
Read Dave Cameron's piece today at Fangraphs. It thoroughly demolishes James' argument.


I will be the third to object to this comment. Cameron's piece is very even-handed, and it acknowledges that one could adjust WAR in order to make it better as an MVP vote criterion.

In this entire conversation I find absolutely nothing novel - we have been having this value/ability debate for over a decade! - with one exception: James' shorthand adjustment for WAR based on the gap between wins and pythag. I'm not sure it's a good rule of thumb to follow, but in the case of Aaron Judge and his clutch failures, it appears to work quite well.

Cameron's objection to that new adjustment is not entirely compelling. He basically uses a slippery slope argument ... if we adjust for this, why aren't we adjusting for this? Or that? Or the other? My response to that is, don't let perfect be the enemy of good.
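For reference, the gap between actual wins and pythag that the shorthand relies on can be sketched as follows. This is a minimal sketch using the classic Pythagorean formula with an exponent of 2. The run totals are illustrative placeholders, not any team's real 2017 figures, and how the gap would then be charged to individual players is not specified in this thread, so the sketch stops at the team-level number.

```python
def pythag_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from the classic Pythagorean formula."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


def win_gap(actual_wins, runs_scored, runs_allowed, games=162):
    """Actual wins minus Pythagorean expected wins (negative = underperformed)."""
    return actual_wins - pythag_wins(runs_scored, runs_allowed, games)


# Illustrative numbers only: a team that won 91 games while scoring 850
# runs and allowing 650.
gap = win_gap(91, 850, 650)
print(f"gap vs. pythag: {gap:+.1f} wins")
```

A team with those placeholder totals falls well short of its expected record, which is the kind of gap the rule of thumb would subtract from its players.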
   54. shoewizard Posted: November 20, 2017 at 11:42 AM (#5578755)
Kiko

Can you offer your thoughts regarding how your system breaks down comparisons between the following players. Votto seems too low here but I don't understand why.

Arenado       WOPA 2.9 WORL 4.4
Stanton       WOPA 2.2 WORL 3.9
Blackmon      WOPA 1.6 WORL 3.2
Goldschmidt   WOPA 1.6 WORL 2.9
Votto         WOPA 0.8 WORL 2.0


Thanks
   55. Kiko Sakata Posted: November 20, 2017 at 12:05 PM (#5578766)
Votto seems too low here but I don't understand why.


Joey Votto is low in the numbers you quote above for two reasons.

(1) The Reds were terrible and so Joey Votto spent a lot of time getting on base only to be left there or hitting home runs so that the Reds lost 5-2 instead of 5-1.

And I completely get that such a result is unfair to Joey Votto. Which is why I also calculate eWins. By eWins, Votto beats Goldschmidt in eWOPA 1.7 - 1.4 and in eWORL 3.0 - 2.6.

But even those numbers put Votto farther down the leaderboard than I think one would expect, because of reason (2).

(2) Votto rates relatively poorly (but still top-10 in the NL), even when controlling for context, because of how I calculate positional averages. Positional averages are calculated empirically for the specific season in question. And the 2017 National League, in particular, was loaded at first base.

Here's the top 10 and bottom 10 first basemen (relative to positional average, context-neutral, so Votto's not getting dinged here for being a Red) in MLB in 2017. In the top 10, you have Votto, Goldschmidt, Jose Abreu, Cody Bellinger, Freddie Freeman, Anthony Rizzo, Carlos Santana, etc. In contrast, very few teams played outright BAD first basemen this year - Tommy Joseph of the Phillies was the worst.

Here are positional averages by position by year (note: this table is very wide and probably looks like crap on a phone; sorry). The positional average for first base in the NL is .526 which is the highest it's been since 2009-10 and eyeballing it is perhaps top-10 all-time.

For pWins, I think calculating the value every year is correct - as great as Votto was, the Reds' edge at first base wasn't HUGE when compared to the D-Backs or Dodgers or Cubs. But one thought I've had was to try to normalize positional averages when calculating eWOPA based on historical norms - i.e., allow first basemen to be above average as a group in 2017, which would boost Votto (and Goldschmidt).

As a minor third point, I have Votto rated as one of the worst baserunners in the National League this past season.
   56. Ziggy: The Platonic Form of Russell Branyan Posted: November 20, 2017 at 12:23 PM (#5578777)
we have been having this value/ability debate for over a decade!


Ability really doesn't have much to do with it. To measure ability what you need is a projection system, and WAR isn't one of those. If the choice is between measuring ability or measuring value, then WAR is a value stat.* (Although, as I mentioned above, we need to take into consideration the fact that a 2x WAR season is more valuable than 2 seasons of x WAR.)

*And of course it isn't really, what WAR measures is wins above replacement. It's just that that's a better proxy for value than it is for ability.
   57. shoewizard Posted: November 20, 2017 at 12:35 PM (#5578783)
Thanks for the explanations, Kiko.

My question now becomes, by using your system, which of the metrics you created would MOST influence YOUR vote for NL MVP, and who would you choose among the position players.



   58. Kiko Sakata Posted: November 20, 2017 at 12:42 PM (#5578788)
My question now becomes, by using your system, which of the metrics you created would MOST influence YOUR vote for NL MVP, and who would you choose among the position players.


I do think that pWins are more appropriate than eWins in MVP voting, but not to the exclusion of eWins entirely. I also have to confess that I think I may not be adjusting enough for Colorado's run-scoring environment - there's a maximum adjustment I allow for a specific ballpark to differ from league averages, mainly to avoid the possibility of negative numbers, because of the way I do ballpark adjustments, and I think it may not be large enough for extreme ballparks.

Anyway, per my numbers, Max Scherzer would get my vote for NL MVP. But you specifically asked about position players and fair enough. I'd probably lean Arenado with Stanton second, but would want to drill down a bit more to convince myself that I'm okay with Arenado's (and Charlie Blackmon's) numbers.
   59. PreservedFish Posted: November 20, 2017 at 12:43 PM (#5578790)

Ability really doesn't have much to do with it. To measure ability what you need is a projection system, and WAR isn't one of those. If the choice is between measuring ability or measuring value, then WAR is a value stat.


WAR is a compromise. All of these little adjustments that are getting chewed over here push it closer to one ideal or to the other.
   60. Kiko Sakata Posted: November 20, 2017 at 12:44 PM (#5578792)
Incidentally, at the risk of shameless self-promotion, I wrote an article about how Player won-lost records address the issue being raised by Bill James.
   61. shoewizard Posted: November 20, 2017 at 12:55 PM (#5578804)
Thanks again Kiko, will read the article soon. ;)

   62. fra paolo Posted: November 20, 2017 at 01:05 PM (#5578815)
I still submit that the desire to pretend [chance] away stems ultimately from a mix of psychological and ideological desires and imperatives, having nothing to do with baseball.

While I can quibble with this statement, SBB has got to the central question in this issue. Which brings me to being yet another one finding Cameron's rebuttal a bit of a failure.

The history of sabermetrics, of which I have done a partial study, demonstrates that what Don Malcolm characterised as 'neo-sabermetrics' made a decisive move towards focusing on 'true talent' rather than 'player contributions'.

It's important to see James' turn away from the Abstracts in 1988 in this context, because 'neo-sabermetrics' began to take shape only after that. Before I knew this kerfuffle had erupted, I was looking yesterday at James' 1991 Baseball Book and in the introduction he wrote: 'I stopped writing the Abstract because I was concerned about information pollution.'

It is pretty clear from the context that he had in mind data like 'what a player hit with two out and a runner on third base', wondering how knowing that mattered. (This is a close paraphrase of what he wrote.)

It's important to recognise that he is not dismissing that data, just wondering how it fit into a wider picture. I don't think Dave Cameron is doing any wondering at all, when he writes:
And when it comes down to assigning value to individuals for the events in which they’re involved, the general consensus in the sabermetric community has been that we want to reward (or penalize) hitters for what they can control. And the context of the situations in which they play is just not something players can create.

And then after writing many more words comes to:
Once you adjust for the full context of a player’s input into wins and losses, you’re left with a version of WAR that is so far removed from his own contributions that I don’t know what question it would answer.

I find Cameron here frames the problem incorrectly, because he is still focused on the individual player. The fundamental FanGraphs' question is 'What is this player worth?' And in that way we get to an ideological pillar of WAR, and of 'neo-sabermetrics', which as much as 'palæo-sabermetrics' grew out of a measurement mindset that took hold in society with the advent of relatively cheap computational power way back in the mainframe era.

Meanwhile, James initially framed his questions within the context of what 'experts' said about baseball. And he liked to talk about what the statistics told us about winning (or losing) games. And that meant context was something to be observed, not stripped away. 'Jim Rice wasn't as valuable to his team as his statistics might suggest because of Fenway' does not answer the question 'What is Jim Rice worth?'. It is an answer to a question that comes out of a different ideological basis, one without a dollar sign attached.

James has always challenged the 'experts', and I think he is doing it again here. Psychologically it is just part of who he is. It's just now the 'experts' are 'Our People'.
   63. shoewizard Posted: November 20, 2017 at 01:12 PM (#5578819)
Does anyone want to take on the question in post 47?
   64. Rob_Wood Posted: November 20, 2017 at 01:17 PM (#5578822)
Sorry, I am very late to this discussion.

Can't we distill this debate down to the game level (maybe this is what Kiko has done)?

Is one side of the debate essentially arguing that a batter who hits 4 two-run home runs in a game his team loses 9-8 has provided no "value" to his team winning (since, of course, they lost the game)? And the other side is saying that players can and do provide value even in games that ultimately are team losses?

Many years ago I introduced (I doubt I was the first) a stat that took ex-post conditional probabilities into account to estimate the "average" value a player's performance in a game contributed to his team winning a game. So a hitter who hits 4 two-run home runs in a game would likely receive a tremendous amount of value since in most games that performance would have contributed mightily to his team winning the game.

I realize that the concept of ex-post conditional probabilities is problematic to many, but it seems to me to be a suitable ex-post value measure.

   65. snapper (history's 42nd greatest monster) Posted: November 20, 2017 at 01:19 PM (#5578824)
I will be the third to object to this comment. Cameron's piece is very even-handed, and it acknowledges that one could adjust WAR in order to make it better as an MVP vote criterion.

James makes a blanket criticism of WAR, he doesn't simply point out that it's not perfect for assessing an MVP vote. See:

“But because that is true, I ASSUMED that these were complex, nuanced, sophisticated systems. I never really looked; I just assumed that the details were out of my depth. But sometime in the last year I was doing some research that relied on these WAR systems, so I took a look at them, and … they’re not very impressive. They’re not well thought through; they haven’t made a convincing effort to address many of the inherent difficulties that the undertaking presents. They tend to get so far into the data, throw up their arms and make a wild guess. I don’t know if I’m going to get the time to do better of it, or if it will be left to others, but … we’re not at anything like an end point here. I assumed that these systems were a lot better than they actually are.”


That's what Cameron demolishes. WAR is well thought out, it is intended to be context neutral, and it is not based on guesses.

Is it perfect? Of course not. But James is absurd in acting like it's broken.
   66. Kiko Sakata Posted: November 20, 2017 at 01:32 PM (#5578832)
Can't we distill this debate down to the game level (maybe this is what Kiko has done)?


Yes, my stat is calculated at the game level. It applies win probability play by play, then goes back at the end of the game and adjusts the totals so that every game is worth the same number of player decisions.
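The end-of-game adjustment can be pictured with a minimal sketch. Everything here is a simplified assumption for illustration: the function name, the dict layout, and the figure of 3 decisions per game are placeholders, not the published method. The only point taken from the description above is that raw play-by-play credits are rescaled so each game totals the same amount.

```python
def normalize_game(credits, decisions_per_game=3.0):
    """Rescale one game's raw credits so wins + losses sum to a fixed total.

    credits: dict of player -> (raw_wins, raw_losses), accumulated from
    play-by-play win-probability changes within a single game.
    decisions_per_game: assumed constant; the real value is not given here.
    """
    total = sum(wins + losses for wins, losses in credits.values())
    if total == 0:
        return {player: (0.0, 0.0) for player in credits}
    scale = decisions_per_game / total
    return {
        player: (wins * scale, losses * scale)
        for player, (wins, losses) in credits.items()
    }


# A see-saw game piles up large raw swings; a blowout piles up small ones.
seesaw = {"A": (0.9, 0.4), "B": (0.6, 0.8), "C": (0.5, 0.7)}
blowout = {"A": (0.3, 0.1), "B": (0.1, 0.2), "C": (0.05, 0.15)}

for game in (seesaw, blowout):
    scaled = normalize_game(game)
    print(round(sum(w + l for w, l in scaled.values()), 6))
```

After rescaling, both games carry the same total weight, so a wild back-and-forth game does not count for more than a quiet one in the season totals.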
   67. dlf Posted: November 20, 2017 at 01:33 PM (#5578833)
#65 - I wish that Poz had put a little more context around that James quote. It is not part of his recent articles on the MVP vote. Instead, it was written several years ago discussing not the larger framework of WAR, but specifics with how WAR dealt with apportioning credit for pitching and fielding in, if I remember correctly, a discussion of the use of wins and saves in either WinShares or Season Scores. I'm trying to find the original, but using it as a pull quote makes it more hostile than in context.
   68. PreservedFish Posted: November 20, 2017 at 01:34 PM (#5578834)
Yes snapper, you are right about that. I'm taking James' argument and using it as a tool that can be selectively applied to improve WAR depending on its intended use. But James doesn't put his idea forward in those terms. His blanket criticism of WAR is off-base.
   69. Kiko Sakata Posted: November 20, 2017 at 01:37 PM (#5578836)
WAR is well thought out, it is intended to be context neutral, and it is not based on guesses.


The problem with WAR, which I think James hints at, but doesn't come right out and say (Posnanski is more direct in the article linked here) is that it's derived from runs, not derived from wins. The conversion from runs to wins is something of an afterthought (that word probably isn't fair to the various WAR builders; I have a huge amount of respect for the work that Sean Smith and Sean Forman have done on WAR, as well as the guys at Fangraphs and Prospectus, whose names I'm less sure of). And it turns out that the conversion rate from runs to wins isn't the same across events.
   70. Baldrick Posted: November 20, 2017 at 01:51 PM (#5578847)
The problem with WAR, which I think James hints at, but doesn't come right out and say (Posnanski is more direct in the article linked here) is that it's derived from runs, not derived from wins. The conversion from runs to wins is something of an afterthought (that word probably isn't fair to the various WAR builders; I have a huge amount of respect for the work that Sean Smith and Sean Forman have done on WAR, as well as the guys at Fangraphs and Prospectus, whose names I'm less sure of). And it turns out that the conversion rate from runs to wins isn't the same across events.

This is a complaint about branding.

Call it 'RAR' and don't translate runs into wins and the complaint evaporates with zero information being lost. 'Wins above replacement' is easier to grasp, and produces a number that's easier to get your hands around. It's not integral to any part of the analytic work.

Edit: That was perhaps more abrupt than necessary. Obviously, you have raised interesting questions about different ways of assessing values, and have identified a way in which WAR simply doesn't help answer certain types of questions. And in that sense, it's a definite issue. I think 'problem' is an overbid, but 'limitation' seems fair.
   71. Jay Z Posted: November 20, 2017 at 01:54 PM (#5578851)
62: Players cannot control the situations they are placed into. They can control their results in those situations.

I don't think anyone wants to go back to the days where a slugger was praised for his high RBI count just because he had a zillion runners on base. But an opposite side has developed that wants to always ignore context. "The player was 0 for 100 with runners on third, but we should ignore it because sample size, or I just don't believe that it matters or could happen." There is definitely a part of the community that just wants to plug run components into formulas and disregard how they actually interacted together on the field.

Baseball is still a team sport. Suppose a player has some ability to adjust his performance within context. Take more walks when it's needed, swing for the fences when it's needed. Any such ability is going to be thrown away by those who just want to grind up all of the components into WAR sausage.
   72. Rob_Wood Posted: November 20, 2017 at 01:57 PM (#5578854)
The problem with WAR, which I think James hints at, but doesn't come right out and say (Posnanski is more direct in the article linked here) is that it's derived from runs, not derived from wins. The conversion from runs to wins is something of an afterthought (that word probably isn't fair to the various WAR builders; I have a huge amount of respect for the work that Sean Smith and Sean Forman have done on WAR, as well as the guys at Fangraphs and Prospectus, whose names I'm less sure of). And it turns out that the conversion rate from runs to wins isn't the same across events.


Well, and I am not saying that you are saying this, but after-the-fact adjusting the runs-based seasonal WAR figures for all players on a team to "match" the team's actual wins total does not seem to be the appropriate remedy.
   73. 6 - 4 - 3 Posted: November 20, 2017 at 01:59 PM (#5578857)
The problem with WAR, which I think James hints at, but doesn't come right out and say (Posnanski is more direct in the article linked here) is that it's derived from runs, not derived from wins. The conversion from runs to wins is something of an afterthought (that word probably isn't fair to the various WAR builders; I have a huge amount of respect for the work that Sean Smith and Sean Forman have done on WAR, as well as the guys at Fangraphs and Prospectus, whose names I'm less sure of). And it turns out that the conversion rate from runs to wins isn't the same across events.

But you can model a player's contribution to a run scored or prevented with greater precision than you can his contribution to a win in a way that's more generalizable and comparable to other players. That was basically the heart of David Cameron's response.

In my mind, it's a basic problem of "I am most interested in Z (wins), but can better estimate Y (runs) that is a function of measure X (production) and correlated with Z. So I'll estimate the effect of X on Y and then assume a relationship to Z." Sure it's an imperfect solution, but it's not an unusual empirical strategy when modeling complex phenomena.

Note: I haven't had a chance to read your book yet (and probably won't until I have some time off around Christmas), so very interested in better understanding your approach.
   74. Kiko Sakata Posted: November 20, 2017 at 02:02 PM (#5578859)
This is a complaint about branding.

Call it 'RAR' and don't translate runs into wins and the complaint evaporates with zero information being lost. 'Wins above replacement' is easier to grasp, and produces a number that's easier to get your hands around. It's not integral to any part of the analytic work.


I think the argument is more than mere semantics. When I was putting together Player won-lost records, I started with actual wins and worked backwards from there to get to expected wins. And the results take you to a different place than you get to starting from runs and converting them to wins.
   75. Rally Posted: November 20, 2017 at 02:02 PM (#5578860)
It's not linked to actual team wins, but the runs to wins conversion is necessary since being +75 runs on offense in 2000 is not as valuable as being +75 runs in 1968.
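To see why the run environment matters, here is a minimal sketch of one common back-of-envelope converter. It assumes the Pythagorean formula with exponent 2, linearized around an average team, which gives roughly 2 x (runs per team per game) runs per win. The WAR systems discussed here use their own converters, and the run environments below (about 3.4 for a 1968-like league, about 5.1 for a 2000-like league) are rounded approximations used only for illustration.

```python
def runs_per_win(runs_per_team_game):
    """Approximate runs needed to add one win for an average team.

    Comes from differentiating the exponent-2 Pythagorean formula at
    runs scored == runs allowed; an assumption, not any site's converter.
    """
    return 2.0 * runs_per_team_game


def runs_to_wins(runs_above_average, runs_per_team_game):
    """Convert a run total into wins under the given run environment."""
    return runs_above_average / runs_per_win(runs_per_team_game)


low_scoring = runs_to_wins(75, 3.4)   # 1968-like environment (approximate)
high_scoring = runs_to_wins(75, 5.1)  # 2000-like environment (approximate)
print(f"+75 runs: {low_scoring:.1f} wins vs. {high_scoring:.1f} wins")
```

Under this approximation the same +75 runs buys several more wins in the low-scoring league than in the high-scoring one, which is the point of doing the conversion at all.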

#65 - I wish that Poz had put a little more context around that James quote. It is not part of his recent articles on the MVP vote. Instead, it was written several years ago discussing not the larger framework of WAR, but specifics with how WAR dealt with apportioning credit for pitching and fielding in, if I remember correctly, a discussion of the use of wins and saves in either WinShares or Season Scores. I'm trying to find the original, but using it as a pull quote makes it more hostile than in context.


I thought the quote might have been from his writing about Porcello winning the Cy over Verlander last year, but it does go back a bit further. But Tango posted a link on his blog and it looks like it goes back to 2014.
   76. Kiko Sakata Posted: November 20, 2017 at 02:05 PM (#5578864)
Well, and I am not saying that you are saying this, but after-the-fact adjusting the runs-based seasonal WAR figures for all players on a team to "match" the team's actual wins total does not seem to be the appropriate remedy.


Absolutely. This was the fatal flaw of Win Shares - which, in fact, was what gave me the idea to do Player won-lost records. That if you want to tie to actual team wins, you need to look at the actual team wins to see who contributed to them, rather than adjusting seasonal numbers. And, for that matter, Win Shares are actually run-based as well, so I would certainly not go so far as to say that Win Shares are better than WAR. On the whole, they're probably not.
   77. Rob_Wood Posted: November 20, 2017 at 02:14 PM (#5578867)
Baseball is still a team sport. Suppose a player has some ability to adjust his performance within context. Take more walks when it's needed, swing for the fences when it's needed. Any such ability is going to be thrown away by those who just want to grind up all of the components into WAR sausage.


As people have pointed out above, this is not really a debate about ability/value. Context vs. non-context is the heart of the matter here. I don't even like to introduce "clutch" into this discussion since it is such a loaded word.

WAR and virtually all stats at the seasonal level are essentially non-contextual. This works fine for the most part to estimate a player's "value" and, believe it or not, this was quite an insight that Bill James and others found decades ago. Non-contextual stats fueled the sabermetric revolution way back then. (The fact that non-contextual stats are better at predicting future performance (the ability debate) is not entirely relevant to the current discussion.)

But of course if you add up the WAR figures of every player on a team you will not necessarily arrive at the team's actual win total for the season. In fact, if you back off one level and express everything in terms of runs and add up all the RAR figures of every player on a team you will not necessarily arrive at the team's actual runs scored for the season.

Of course, in the intervening years and with more data available to the masses (i.e. game level data such as play-by-play data), new contextual stats have been developed to "better" estimate player value.

It's not really a fair criticism of WAR-based stats to point out that they are non-contextual.
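
To put rough numbers on the adding-up point, here is a minimal Python sketch. Everything in it is an assumption for illustration: the roster and WAR values are made up, a .294 replacement-level winning percentage is taken as the baseline, and `implied_wins` and `true_up` are hypothetical helper names, not part of any actual WAR implementation.

```python
# Hypothetical illustration only: roster and WAR values are invented.
REPLACEMENT_WIN_PCT = 0.294  # assumption: replacement-level baseline
GAMES = 162


def implied_wins(player_war, games=GAMES, repl_pct=REPLACEMENT_WIN_PCT):
    """Team wins implied by summing player WAR on top of a replacement-level team."""
    return repl_pct * games + sum(player_war.values())


def true_up(player_war, actual_wins, games=GAMES, repl_pct=REPLACEMENT_WIN_PCT):
    """After-the-fact adjustment: scale every player's WAR by one common
    factor so that the team total matches actual wins."""
    target = actual_wins - repl_pct * games
    factor = target / sum(player_war.values())
    return {name: war * factor for name, war in player_war.items()}


# Made-up team: 40 WAR in total, but the team actually won 95 games.
roster = {"star": 8.0, "regular": 5.0, "starter": 3.0, "everyone else": 24.0}

print(round(implied_wins(roster), 1))   # about 87.6, not 95
adjusted = true_up(roster, actual_wins=95)
print({name: round(war, 2) for name, war in adjusted.items()})
```

The true-up multiplies every player by the same factor, so it says nothing about who actually produced the extra wins. That is the objection to adjusting seasonal numbers after the fact.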
   78. dlf Posted: November 20, 2017 at 02:50 PM (#5578891)
I thought the quote might have been from his writing about Porcello winning the Cy over Verlander last year, but it does go back a bit further. But Tango posted a link on his blog and it looks like it goes back to 2014.


The link you provided is to the Poz article on NBCSports. I *think* that it came not from a conversation or email exchange between Joe and Bill as part of that 3 year old article, but from something Bill wrote on BJOL even earlier that had more context around it.
   79. GuyM Posted: November 20, 2017 at 03:04 PM (#5578904)
WAR and virtually all stats at the seasonal level are essentially non-contextual. This works fine for the most part to estimate a player's "value" and, believe it or not, this was quite an insight that Bill James and others found decades ago. Non-contextual stats fueled the sabermetric revolution way back then.

Yes, it's strange to watch the former leader of the revolution make the case for Thermidor. In fact, though, Bill has been doing this for some time, at least as far back as Underestimating the Fog in 2004. He clearly feels that those who followed in his footsteps have often gone too far. (And he has criticized his own past work as well.) One gets the sense that his years working for the Sox have a lot to do with his change in direction, but I assume he's not really at liberty to write about that, at least not with any specificity.
   80. Blanks for Nothing, Larvell Posted: November 20, 2017 at 03:09 PM (#5578908)
But James is absurd in acting like it's broken.


Except it is -- because it's not tied to team wins.

It's certainly not useless, or even close, but it is broken.

Someone in this debate (I can't remember who; maybe it was James) talked about gravity. Let's say someone weighs 200 pounds on Earth and you want to measure the contribution of each of your major body parts to that weight. Why would you dispense with the known 200 pounds data point, just because it's context-dependent and would be different on the moon or Jupiter?
   81. snapper (history's 42nd greatest monster) Posted: November 20, 2017 at 03:16 PM (#5578914)
Except it is -- because it's not tied to team wins.

It is tied to team wins, but it's expected team wins.

The fact that your starting pitcher gives up 7 runs on a day you hit 2 three-run HRs doesn't reduce the value of those home runs, especially if you hit them before he implodes.

e.g. player X hits a HR in the 1st and 4th inning, putting his team up 6-0. He has tremendously increased his team's probability of winning. If the SP gives up a 7-spot in the 5th and undoes it, that doesn't mean your HRs weren't valuable.

Even in the other direction, if the starter gives up 7 runs in the first, and then you hit the 2 HRs, bringing the game back to 7-6 in the 4th, you've again massively raised your team's chances of winning.

Tying value to actual wins produces bizarre results. No performance, no matter how good, produces value in a loss. That's silly.
   82. Shock Posted: November 20, 2017 at 03:17 PM (#5578915)

And ad hominem attacks don't constitute a valid argument,


Except that wasn't my argument. I made my argument above, and you ignored it. The remark you quoted was not my argument, but it was nevertheless the truth.
   83. shoewizard Posted: November 20, 2017 at 03:26 PM (#5578922)
It is tied to team wins, but it's expected team wins.

The fact that your starting pitcher gives up 7 runs on a day you hit 2 three-run HRs doesn't reduce the value of those home runs, especially if you hit them before he implodes.

e.g. player X hits a HR in the 1st and 4th inning, putting his team up 6-0. He has tremendously increased his team's probability of winning. If the SP gives up a 7-spot in the 5th and undoes it, that doesn't mean your HRs weren't valuable.

Even in the other direction, if the starter gives up 7 runs in the first, and then you hit the 2 HRs, bringing the game back to 7-6 in the 4th, you've again massively raised your team's chances of winning.

Tying value to actual wins produces bizarre results. No performance, no matter how good, produces value in a loss. That's silly.


This is exactly the argument I was trying to make last night on Twitter that got me blocked by Bill.
   84. K-BAR, J-BAR (trhn) Posted: November 20, 2017 at 03:26 PM (#5578924)
To measure the thing that people who like WAR claim to want to measure requires some level of abstraction. Is a single worth the average value of a single context neutrally? Is it worth more in April against Scherzer than September vs. Edwin Jackson? Is it worth less for the punchless Padres than the powerful Cubs?

But even adding context requires abstraction. To put it in terms of WPA, the most contextual stat I can think of: Should the value of a single in April vs. Scherzer be the WPA? Or should it be WPA vs. Scherzer (i.e. the average value of a single given that game state against Max)? Or the WPA of a single vs. Scherzer at the 110 pitch mark? Fundamentally, a single is only worth first base. (And even then, who we assign that to may be uncertain...) Any other value we assign to it is going to be incomplete.

When you make an uber stat, you probably have to draw a lot of these lines. For instance, in a WAR-like stat whether to use FIP or RA for pitchers. But wherever you draw the line will cause you to miss something. For instance, Reddick is REALLY unclutch. He looks way worse by WPA than WAR. So much so that you start to wonder. Maybe a win shares-y type stat may catch that whereas a WARy type stat might not? fWAR might catch that John Lackey might be unrosterable in 2018, whereas bWAR or win shares may not. And bWAR might catch that Edwin Jackson has been unrosterable for 5 years, whereas fWAR may not.
   85. Blanks for Nothing, Larvell Posted: November 20, 2017 at 03:30 PM (#5578927)
Context vs. non-context is the heart of the matter here. I don't even like to introduce "clutch" into this discussion since it is such a loaded word.


See, that second sentence is inherently political and ideological and that's why I say what I say on that point. People who tend to be WAR devotees are repelled more than average by the very notion of "clutch" and by the very notion that some people would be better psychologically attuned to perform better in those types of situations. They derive that perspective entirely from factors exogenous to baseball.(*)

In any event, I would actually expand the observation that "Baseball is a team sport." It is, but even more so, baseball is a context sport. It's inherent in the rules.

(*) James hit on this in one of the earlier Abstracts, punctuating the point with, in substance, "Racists will insist that blacks can't hit in the clutch."
   86. Blanks for Nothing, Larvell Posted: November 20, 2017 at 03:36 PM (#5578931)
It is tied to team wins, but it's expected team wins.


Expected team wins aren't wins.

e.g. player X hits a HR in the 1st and 4th inning, putting his team up 6-0. He has tremendously increased his team's probability of winning. If the SP gives up a 7-spot in the 5th and undoes it, that doesn't mean your HRs weren't valuable.


They wouldn't be treated as without value by Win Shares or anything similar. As to the increase in probability, yeah, they do increase it -- just not tremendously. WPA measures the increase, and hitting an HR up 8-0 in the 8th doesn't increase the probability anything close to hitting it tied 3-3 in the bottom of the ninth.

No performance, no matter how good, produces value in a loss. That's silly.


No one's making that argument in the least, and it isn't shocking that James would block someone saying anyone was.

   87. Blanks for Nothing, Larvell Posted: November 20, 2017 at 03:40 PM (#5578932)
No one who's played any of the "hit ball effectively" sports at any level of seriousness would argue with the idea that psychology matters to performance and that psychology differs, even among players at high levels.

Baseball, golf, and tennis all fall within this observation.

There's no reason to engage in the second-order enterprise of trying to infer it from data, when it can be directly observed and directly testified to. Golfers routinely admit it (i.e., being nervous under pressure, with actual performance impact); tennis players a little less so. It's not part of the baseball culture to admit it, but so what?
   88. shoewizard Posted: November 20, 2017 at 03:51 PM (#5578936)
See, that second sentence is inherently political and ideological and that's why I say what I say on that point. People who tend to be WAR devotees are repelled more than average by the very notion of "clutch" and by the very notion that some people would be better psychologically attuned to perform better in those types of situations. They derive that perspective entirely from factors exogenous to baseball.(*)


I've never been one to deny that clutch ability exists. I've just always believed that like other abilities, it's not static.

Just like a hitter's ability to time the pitcher or the pitcher's ability to control his release point can ebb and flow year to year, game to game, inning to inning at times, "clutch" ability ebbs and flows too. It's clearly less projectable than overall results, and by the time you might have a big enough sample size to predict, the player has either aged or retired. And whatever emotional or psychological attributes one believes contribute to the "clutch gene", you will certainly fail to come up with a way to project that performance with any degree of accuracy, at least relative to the overall projections.

If you can't model or project clutch performance up front, then it's highly suspect to bake it into a metric after the fact.

I know this is not a WPA discussion, but WPA is an offshoot of what these efforts to add situational context are at their very core.

When I look at a WPA graph, to me it represents a nice visual of the ebb and flow of a game, and sometimes it can really convey the emotional swings of what it was like to watch that game.

But that's about as far as it goes for me.
   89. Rob_Wood Posted: November 20, 2017 at 03:52 PM (#5578937)
It seems to me that insisting on truing up non-contextual WAR figures to a team's actual wins is tantamount to saying that all hitting performances in a loss have no value. Using seasonal data masks this fact but the two arguments are essentially equivalent. Of course, nobody would admit to the equivalence. Which is partly why we are having this discussion (and Bill James is getting so upset).
   90. shoewizard Posted: November 20, 2017 at 03:57 PM (#5578938)
Thanks Rob. Stated more succinctly than I am capable of, obviously.
   91. snapper (history's 42nd greatest monster) Posted: November 20, 2017 at 04:01 PM (#5578939)
See, that second sentence is inherently political and ideological and that's why I say what I say on that point. People who tend to be WAR devotees are repelled more than average by the very notion of "clutch" and by the very notion that some people would be better psychologically attuned to perform better in those types of situations. They derive that perspective entirely from factors exogenous to baseball.(*)

No, we derive the disbelief in "clutch" from all the studies that have shown it's not a repeatable skill at the major league level.

There are most definitely "chokers" in the world. The issue is that they never get out of low A ball.

Within the major league player population, there is no evidence for "clutch" as a player attribute. Players deliver clutch hits, but there are no "clutch" players.

   92. GuyM Posted: November 20, 2017 at 04:07 PM (#5578943)
It seems to me that insisting on truing up non-contextual WAR figures to a team's actual wins is tantamount to saying that all hitting performances in a loss have no value.
I'm not very sympathetic to Bill's argument, but I don't think it has to lead to this conclusion. One can reasonably say about a game the Astros lost that Altuve contributed 0.2 wins, while his teammates "contributed" -0.7 wins (in WPA terms). Then the total player wins equals actual team wins, which is what Bill feels is important. After all, the game really was played, and some players did far more than others to create that loss, while ignoring all that data would be the same as assuming the game was never played. So, I think there is room to argue the sum of player values should equal actual team wins, without believing all production in losses should be ignored.

BUT, someone could certainly object that Altuve's one-fifth of a win cannot possibly be "real," or represent true "value," when the Astros actually recorded 0.0 wins that day. Indeed, most of what Bill wrote about WAR could be applied with equal force to a system like Win Shares that assigns value even in the context of losses. NONE of these systems can be said to perfectly measure "real" wins -- all are approximations. So it would be nice if James would stop describing as "errors" what are simply different judgments about how closely to connect value metrics to game outcomes.
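
The bookkeeping in the Altuve example can be sketched in a few lines of Python. This assumes the usual WPA convention that each team starts a game at a 0.5 win probability, so a team's player WPA sums to +0.5 in a win and -0.5 in a loss. The function name `team_wins_from_wpa` is hypothetical; the loss uses the numbers from the example above, and the win is made up.

```python
START_WIN_PROB = 0.5  # assumption: each team starts every game at 50%


def team_wins_from_wpa(games):
    """Wins implied by per-player WPA: 0.5 per game plus all player WPA."""
    return sum(START_WIN_PROB + sum(game.values()) for game in games)


# The loss from the example: Altuve +0.2, teammates -0.7 (sum is -0.5).
loss = {"Altuve": 0.2, "teammates": -0.7}
# A hypothetical win: player WPA must sum to +0.5.
win = {"Altuve": 0.1, "teammates": 0.4}

# Rounded to hide floating-point noise.
print(round(team_wins_from_wpa([loss]), 6))
print(round(team_wins_from_wpa([loss, win]), 6))
```

Altuve keeps his +0.2 in the loss, yet player totals still add up to the team's actual wins. That is the sense in which "adding up" does not require zeroing out production in losses.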
   93. PreservedFish Posted: November 20, 2017 at 04:11 PM (#5578945)
It seems to me that insisting on truing up non-contextual WAR figures to a team's actual wins is tantamount to saying that all hitting performances in a loss have no value.


Hey, if you're going to go head first down this slippery slope, why not claim that any event that didn't directly contribute to a championship trophy had no value?
   94. K-BAR, J-BAR (trhn) Posted: November 20, 2017 at 04:13 PM (#5578946)
It seems to me that insisting on truing up non-contextual WAR figures to a team's actual wins is tantamount to saying that all hitting performances in a loss have no value.


If Mike Trout plays on a team that goes 30-132 because of pitching, wouldn't he still be worth 8-10 wins in a wins based system? Or is the problem with the 0-162 team where Trout's 70HRs would be worth zero? That sounds like a problem with system design since a Trout-led team that went 0-162 would have many folks worth negative wins on it.

For hitters, what are the inputs to such a system? Batting events and opportunities? Are those being converted to runs that are then converted to wins? And 'truing it up' means extra credit for teams whose actual record exceeds what you'd imagine based on runs scored? If so, then the argument is about 1) whether that credit is appropriate and 2) semantics / branding.

EDIT: And I'd be of the opinion that you can design a win shares stat that rewards something beyond just the batting events. And that may be appropriate and interesting. But it would measure something different from fWAR or bWAR.
   95. Rob_Wood Posted: November 20, 2017 at 04:15 PM (#5578948)
I agree with GuyM. But that is not what James appears to be saying.
   96. GuyM Posted: November 20, 2017 at 04:27 PM (#5578953)
But that is not what James appears to be saying.
What do you think Bill is saying? To me, his main contention is that wins have to "add up" at the team level. Do you read it differently?
   97. PreservedFish Posted: November 20, 2017 at 04:31 PM (#5578957)
There are most definitely "chokers" in the world. The issue is that they never get out of low A ball.


This is a tangent, I know, feel free to ignore. But this doesn't make sense and has never made sense. Fat players make it to the majors. Stupid players make it to the majors. Slow players make it to the majors. There's no earthly reason that a choker couldn't make it to the majors.

This is particularly true when you acknowledge that different people feel pressure at different times - one can easily imagine a kid that never felt stress until he pulled on that MLB uniform and played in front of a crowd of 30,000.
   98. Blanks for Nothing, Larvell Posted: November 20, 2017 at 04:33 PM (#5578958)
There are most definitely "chokers" in the world. The issue is that they never get out of low A ball.


Nope, they absolutely make it to major league baseball, just as they make it to the PGA Tour.

And you're loading the argument with the word "choker," as if it's inherent to character and immutable. Even "choking" a few times does not make one a "choker."

There are most definitely "chokers" in the world. The issue is that they never get out of low A ball.

Within the major league player population, there is no evidence for "clutch" as a player attribute. Players deliver clutch hits, but there are no "clutch" players.


I'd disagree with that, for the reasons I stated -- but there don't have to be clutch players. All there have to be is players not as good in the clutch. And there's no question, given the reality of psychology, that there are.
   99. snapper (history's 42nd greatest monster) Posted: November 20, 2017 at 04:36 PM (#5578960)
This is a tangent, I know, feel free to ignore. But this doesn't make sense and has never made sense. Fat players make it to the majors. Stupid players make it to the majors. Slow players make it to the majors. There's no earthly reason that a choker couldn't make it to the majors.

But the data has consistently shown that there aren't. There's never been any finding of repeatability in year to year clutch and un-clutch performance.

The one exception would be the Steve Blass-disease sufferers. That could be called choking.
   100. snapper (history's 42nd greatest monster) Posted: November 20, 2017 at 04:37 PM (#5578961)
I'd disagree with that, for the reasons I stated -- but there don't have to be clutch players. All there have to be is players not as good in the clutch. And there's no question, given the reality of psychology, that there are.

But they don't show up in the data.