User Comments, Suggestions, or Complaints | Privacy Policy | Terms of Service | Advertising
Baseball Primer Newsblog: The Best News Links from the Baseball Newsstand
Monday, November 30, 2020
Bill James: The Biggest Problem With WAR
RoyalsRetro (AG#1F)
Posted: November 30, 2020 at 01:29 PM | 197 comment(s)
Tags: war |
About Baseball Think Factory | Write for Us | Copyright © 1996-2014 Baseball Think Factory
Reader Comments and Retorts
Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.
This is, um, a frustrating thing to read from someone as generally respected as Bill James. WAR is intended to be context neutral, but that doesn't mean individual teams have to use it that way to make decisions. If a team has a really good fourth outfielder, they can use the difference between his expected WAR and the expectation for one of their starters to assess how much value they want back in a trade to shore up a position of weakness, for instance. The benefit of a (hopefully) context-neutral measure is that you can always add context back in.
By comparison, if you start off by comparing a player to the actual backups on his actual team, you get some undesirable results and no way to avoid them. (Yogi Berra looks WAY worse after 1955 than before; in 1958, he has a 119 OPS+, significantly above average for a catcher, but would still be below "actual replacement" level because Elston Howard had a 130.)
EDIT: 100 Coke Shares to Eric.
Which is not to say that Win Shares are a perfect solution either. But I think that they do explicitly try to deal with this sort of issue by design.
I don't think that WAR is some sort of correct answer, or anything more than an estimate, either. It wouldn't surprise me if in 50 years it ends up viewed as a relic of its time. It's just that James has had his say on this subject many times before. That he continues to explain why WAR isn't the answer, occasionally sneaking in comparisons to his own system that never took off, makes it hard not to think that there's some truth behind comment #1.
It would if you want WAR to be interpreted very literally. For example let's compare the value of Gary Sanchez to what replacements were available to the Yankees and Yadier Molina to what replacements the Cardinals had. Then you've got a lot of uncertainty on both ends. But WAR is not meant to compare to specific replacements, but a very stable replacement level. That replacement level is set around 20 runs below the league average.
League average shifts a bit every season, but that's the benchmark. It really doesn't matter if a truer value of replacement level is 17 runs, or 23 runs worse than average. That's just a shortcut to having a stable comparison for players that gives them credit for average play, since a league average player has value.
Only part I can agree with here is the first sentence. Wilson's Fangraphs WAR was 4.2 that year, and 5.9 for Baseball reference, at least pitching WAR. Baseball reference is the one that likes Wilson more that year, but it has nothing to do with components, BBref is the one that uses actual runs for pitchers as the base.
Using the more favorable version of WAR for Wilson's 1966 season, it doesn't say he was better than Robinson that year, but that they were tied at 7.7. Bill wants us to say that's really stupid, but he's either reacting without trying to understand what's going on here or intentionally obfuscating here. Earl Wilson was quite a bit better than "some guy whose ERA was not much better than league average".
Wilson in 1966 did indeed have a league average ERA in 100 innings with the Red Sox, but was traded to the Tigers mid season and gave them 163 innings of a 2.59 ERA (134 ERA+). In addition Wilson allowed only 4 unearned runs, so he was better than his ERA says in run prevention. All in all, that's 264 innings of a 118 ERA+. He was third in the league in innings pitched behind Jim Kaat and Denny McLain.
Still, how does he get from 5.9 pitching WAR up to a tie with the MVP at 7.7? It would not have happened if the DH had been around a decade earlier, but thankfully it was not. Wilson was a damn good hitting pitcher. He hit .240 with 7 homers, a .500 slugging percentage. So he picks up 1.9 WAR for being better than the typical pitcher.
Frank Robinson was quite obviously the best hitter in the league in 1966. What would it take to equal that value in other ways? I guess if Ozzie Smith was playing short and hit like, well, better than Ozzie generally did but less than Frank Robinson, that would be one way to get there. I am not going to reflexively dismiss the idea that a guy who pitches 264 innings with excellent run prevention and also slugs .500 is another way it could be done.
Well, in 2012, Miguel Cabrera won the triple crown and he certainly was not the best player or hitter in the league. Interestingly enough, Cabrera had a higher WAR in the year before and the year after his triple crown year.
And that was still better than "In my day, men and boys showered together all the time."
https://chicagobaseballmuseum.org/chicago-baseball-history-news/chicago-baseball-history-feature/glenn-beckert-provoked-smiles-in-his-on-and-off-the-field-chicago-cubs-exploits/
English was at third base at the time and like a number of players says Ruth was holding up his fingers. In fact English says Ruth was holding up two fingers (probably to indicate the two strikes). Ruth if I recall had said he might have made the gesture twice so this sort of suggests that.
If only they had Win Shares!
I mean, can you show me -1 apples? Don't get me started on imaginary numbers!!
Anyway, it's fair enough to say that, statistically speaking, WAR is a bit of a mess for the reasons James states. An estimate plus an estimate plus an estimate ... does blow out a standard error quite quickly. (Standard errors would be useful to know.) Generally taking aggregate-level coefficients and applying them to individuals risks being an ecological fallacy. And rather obviously a walk to Rickey Henderson with nobody on base is more likely to result in a run than a walk to a base-clogger like a Molina with nobody on base. And calibrating to known totals is a perfectly sensible thing to do. (Doing it by deciding by hand on individual player adjustments is totally daft though.) And at the end of the day, all any of us can do is explain the past then assume that the same rules will apply in the near future ... in a relative but not absolute sense cuz you never know when the league is gonna introduce a rabbit ball.
But it's also true that statistics relies day after day, model after model, on the "plug-in principle" -- i.e. that there are parameters in our equations that we will never "know" the value of so we plug in estimates of those parameters. Those estimates are derived using statistical principles (i.e. they come from some other model) and yes, it is important that we incorporate the uncertainty of those estimates as best we can and they add to the uncertainty of our final model. Your choice is to do that or to make #### up. (Let your prior dominate your posterior for you Bayesians out there.)
But all we've got that's concrete is runs scored. Nearly concrete are RBI, singles, doubles, etc. but those involve some rules, scoring decisions, random fluke-y stuff (he pulled a hamstring on what would have been a double) and non-random flukey stuff (Pesky's pole, vines). Strikeouts and walks are similarly nearly concrete but, we learn, are a combination of pitcher, catcher, batter, umpire and apparently park. And probably time of day and wind and whether The Mick popped one or two greenies that morning.
And until we estimate the run value of a double relative to a single, all we've got is common sense telling us that a double is obviously more valuable ... but that doesn't get us anywhere once we have to consider whether 2 doubles is good/better/worse than a HR and a single ... much less the "fact" that the guy with 2 doubles seems to be the better defender.
I mean what is the concrete answer to the question: was 680 PA with 122 RS, 122 RBI, 97 singles, 34 doubles, 2 triples, 49 HRs, 8 SB, 5 CS, 87 BB, 90 K, 24 GDP, 10 HBP, 7 SF, 11 IBB while playing 1181 innings in RF, 167 in LF and 26 at 1B while playing home games in Baltimore's Memorial Stadium in the 1966 AL more valuable than 264 innings pitched (1,059 BF, >50% more PA participated in), 94 RA (90 "earned"), 214 HA including 30 HR, 74 BB (3 IBB), 200 Ks, 6 HBP, 9 WP, 27 GDP plus 113 PA, 20 RS, 22 RBI, 14 singles, 2 triples, 7 HR, 8 BB, 36 K, 0 GDP, 1 HBP, 6 SH, 2 SF split across one very bad team (home field Fenway Park, reportedly a good-hitting park) and a good team that finished a distant third (home field Tiger Stadium, reportedly a good pitchers' park)?
We might refer to rarity here. Robinson's totals were certainly unusual -- do we need to compare just to his specific year? His "era"? How do we compare across? -- while Wilson's pitching totals weren't so much and neither were his hitting totals, but we note that it's very unusual for an individual to combine such hitting and pitching totals in a single season. I'm not sure which achievement is more rare.
Do we need to factor in that the Tigers had to trade actual players to obtain Wilson and the O's had to trade Pappas to obtain Robinson? Does it matter that Wilson was the only Tigers' starter who didn't suck that year (by ERA at least)? Does it matter that the O's 4th OF, Russ Snyder, had a 126 OPS+? Does it matter that Robinson usually had Aparicio and Blefary hitting in front of him, Brooks and Boog behind him? Does the bullpen support that Wilson received in his 24 incomplete games matter? What about his catchers and defense and umpires? Does the quality of Robinson's defense matter? Does it matter less because he had the reportedly great Paul Blair in CF for half the season?
We quickly realize we are nuts for asking the question and insane to think we could possibly ever come up with a reliable answer. And some of that is the crazy stuff WAR tries to answer and some of it is the even more finely grained detail James thinks we need to take into account (while somehow overcoming all the other sources of uncertainty).
James doesn't like context-neutrality. That's fair enough -- obviously a single with 2 outs and a runner on 3B contributes a lot more to a win than a single with 2 outs and nobody on ... unless of course you are ahead/behind by 6+ runs at the time in which case nothing you do really matters. It's reasonable to say that WPA (or whatever) should be incorporated into a backward-looking, here's what happened measure of value. But how much of that extra value goes to the guy on third and how much to the guy at the plate? Do we need a "mutual value" measure? I also suspect that if you took that seriously, you'd find that "great" seasons come down to about 150-200 PAs. Sure, the Babe seemed pretty awesome but how many of those hits and HRs came when the game was already out of reach? I strongly suspect that a heavily contextualized measure will lead to a lot more WTFs and more extreme WTFs than bWAR and fWAR.
Contextual value in baseball is a funny thing. In 1966, Robinson went hitless in 46 games, 30% of his starts. He failed to score a run in 67 of them, had no RBI in 83 of them, had neither in 53 of them. So in 30-40% of his games he was useless. In at least 23 of the 72 games (I counted by hand) in which he had at least 1 RBI, the game was decided by 5+ runs so maybe nearly half the time his contribution was either "negative" or didn't really matter. Now in those other 75-80 games, he may have hit something like 600/800/1200 which probably helped his team win. :-) It seems pretty clear that the only way you can come up with a sensible estimate of his value there is to acknowledge that he actually hurt his team's chances of winning in about 1/3 of his games, had an average-ish contribution 1/3 of his games and was massively valuable in the other 1/3.
Baseball is not, say, basketball. It's a really, really hard game where the PA by PA outcome for hitters is usually utter failure. That has to be acknowledged and accounted for. Any batter except maybe the craziest Bonds and Ruth seasons is more likely to hurt than help his team's chances of winning when he steps into the box. It is obviously possible to hurt so much more often than help as to have negative value.
Speaking of "opportunity" and "value", I note that in 1966 Denny McLain went 20-14 on a 89 ERA+ (Lolich 14-14 on a 73).
He doesn’t seem to understand why an open framework like WAR won out over a proprietary closed system like Win Shares, which has only one person working on it, and not very often at that.
That doesn’t negate his pioneering work on baseball analysis, and showing that someone could make a good living at analyzing baseball.
He remains the patron saint and everybody still loves the early books. But he has made it clear over the years in his comments that he makes no effort to keep up with other people's research. You can see it here where he can't be bothered to even find out which is fWAR and which is bWAR, claims he didn't even really know that fWAR uses FIP, etc. The "shift" may have begun with Win Shares which he sort of promoted as being the end-all of such measures but it was clear he hadn't bothered with anybody else's work on the topic and made some cavalier decisions (no negative value; calibrating to win totals; achieving that calibration via idiosyncratic adjustments; arbitrary multiplication by 3) without really thinking them through. Here we are a couple of decades later and he still seems to think he cracked the code with win shares.
He's a grumpy Paul McCartney -- he did some absolutely groundbreaking stuff, he remains a solid writer but he hasn't tried to keep up (probably a good thing in McCartney's case) and he hasn't done anything impactful in 30(?) years. Unlike McCartney who only seems to have nice things to say about other musicians, James occasionally resurfaces to re-fight the battles of 2000.
Or James is like Joe Morgan -- a decade of utter greatness, a big chunk of other good work, then cranky old-manliness. In short a damn fine life well-lived while most of us only have the old-man crankiness part.
This difference should largely be addressed by including baserunning value, which WAR does. Comparing, say, Tim Raines to Yadier, B-R has a 150-run difference in their baserunning value, Fangraphs has it as 180. (Not sure why there's a gap of 30 runs there; B-R has Molina at -35 compared to -80 for Fangraphs.) The walk is counted the same, but what happens after is addressed separately.
The rest of your comment I generally agree with.
That’s something that has always bugged me about some of James’ studies: he will use some seemingly random number as a multiplier or divisor and say that the results don’t work without using that specific number. That may actually be true, but it gives the impression that he is massaging the numbers to get the results he is looking for. He is still fun to read, but I am no longer surprised at how obvious his conclusions are once you hear them. Maybe all the obvious stuff has been figured out.
And how exactly does Win Shares avoid the problem of inaccurate inputs? It has literally the exact same issue, to the extent it's an issue. "The real problem is ... estimates are never exactly right; they are always just estimates" is one of the dumbest pieces of argumentative rhetoric I've ever heard. Every criticism leveled against WAR here applies to Win Shares, and then Win Shares earns some additional critiques if we're being honest.
If you want a context-dependent value stat, hey, that's a preference. Simply say that's your preference. Explain the pros and cons of it (in an honest way). See who you convert. Or do this, and get called out for your obvious agenda and bias.
However, I'm not sure that measuring the run value of a player is really different from measuring the run value of a replacement player. You're going to do that with linear weights, which are calculated league wide with good statistics. There is no measurement error in counting at bats, singles, doubles, etc. There's quite a lot of measurement error in measuring defense (which is what I hate about WAR -- get those dirty defensive stats away from the nice clean offensive stats!). The run environment of the league as a whole should be accounted for when you're calculating the linear weights.
So the way to avoid stacking tolerances is to use the same value for a single, double, etc for every player and make sure that you don't ever add offensive value and defensive value.
(There's also some error due to park factors, but both systems use those in the same way, so it's not relevant to this discussion)
"The real problem with WAR is that people use it wrong." This is a problem with people, not a problem with WAR. I don't have any particular suggestions on how to fix it, but it'd be nice to see the blame placed where it belongs.
For the unloaded truck
W_t = k * (x_1)
For the loaded truck
(W_t + w_g) = k * (x_2)
so w_g = k * (x_2 - x_1)
Assuming no measurement error in where the needle ends up pointing, the percent error is delta k/k, which is the parameter inspected by the state.
If you use two scales, you run into the problem Bill's dad was noticing.
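A minimal numeric sketch of the one-scale versus two-scale point, in Python. The scale constant, the 1% calibration tolerance, the needle readings, and the function names are all made-up illustrative values, not anything from the posts above.

```python
def load_one_scale(k, x1, x2):
    """Weigh the truck empty and loaded on the same scale: w_g = k * (x2 - x1)."""
    return k * (x2 - x1)


def load_two_scales(k_a, x1, k_b, x2):
    """Weigh empty on scale A and loaded on scale B: w_g = k_b * x2 - k_a * x1."""
    return k_b * x2 - k_a * x1


# Hypothetical numbers: true constant 1000 lb per needle unit,
# truck reads 10 units empty and 12 units loaded, so the true load is 2000 lb.
K_TRUE, X1, X2 = 1000.0, 10.0, 12.0
TRUE_LOAD = K_TRUE * (X2 - X1)

# Assume each scale may be off by 1% in k.
one = load_one_scale(K_TRUE * 1.01, X1, X2)
two = load_two_scales(K_TRUE * 0.99, X1, K_TRUE * 1.01, X2)

err_one = (one - TRUE_LOAD) / TRUE_LOAD
err_two = (two - TRUE_LOAD) / TRUE_LOAD
print(f"one scale:  {err_one:+.1%}")
print(f"two scales: {err_two:+.1%}")
```

With these invented numbers the single scale is off by 1% on the load (delta k/k), while two scales that are each within 1% but in opposite directions are off by 11%, because the calibration errors apply to the full truck weights rather than to the difference.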
In polite company, this is referred to as "calibration via idiosyncratic adjustments." :-)
Fun fact: Wilson was the first black player signed by the Tigers, the last of the then-16 MLB teams to do so.
Good reason to stop writing about it, Bill.
It hasn't historically been used, because
1) Sabermetricians have tended to be hobbyists doing the math by hand or with spreadsheets
2) The other source of sabermetricians is big data, where errors are statistical and quickly average down to zero.
and
3) Nobody has been badly burned yet.
Back in the day, Baseball Prospectus kept doing article after article "evaluating" how well PECOTA had done over the last season. This for a forecasting system that claimed to include error bars! Five minutes plotting Z scores (Z score = (prediction - observed)/standard deviation) would have shown more than every article combined. That finally seems to have caught up with Nate Silver this year, as people noticed that he was calling most states correctly but the winning margins were well outside of the predicted ranges.
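A rough sketch of that Z-score check in Python. The forecasts, outcomes, and standard deviations below are invented for illustration (they are not PECOTA or election data), and the helper names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(predictions, observed, std_devs):
    """Z score = (prediction - observed) / standard deviation, as defined above."""
    return [(p - o) / s for p, o, s in zip(predictions, observed, std_devs)]


def share_within(zs, limit):
    """Fraction of z scores whose absolute value is at or below `limit`."""
    return sum(abs(z) <= limit for z in zs) / len(zs)


# Made-up batting-average forecasts with a claimed standard deviation of .020 each.
predictions = [0.280, 0.265, 0.300, 0.250, 0.310, 0.270]
observed = [0.262, 0.301, 0.295, 0.221, 0.340, 0.268]
std_devs = [0.020] * 6

zs = z_scores(predictions, observed, std_devs)
print([round(z, 2) for z in zs])
print("mean z:", round(mean(zs), 2), "spread of z:", round(pstdev(zs), 2))
print("share within 1 sd:", share_within(zs, 1.0))
```

If the stated error bars are honest and the errors are roughly normal, the Z scores should center near 0 with a spread near 1, and about two-thirds of them should land within one standard deviation. A spread well above 1 suggests the error bars were too narrow.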
People use WAR to stack up and compare MVP candidates all the time. Using Bill’s problem with the baseline being replacement level: it is unlikely that the next person to show up in the Dodgers lineup would be merely replacement level if Mookie Betts had to sit out. However, if Freddie Freeman sat out, Johan Camargo becomes the next man up and he is replacement level or worse. So Mookie’s value in the lineup is overstated and Freeman’s is understated. It doesn’t change what an incredible player Mookie is or how great he was in 2020, but it does reflect on his value to the team.
Ah yes, there's a term in statistics called "p-hacking" where you look through tons of data until you find something statistically significant, but it's really a false correlation because if you look hard enough to find a small p-value you will eventually. That leads to "overfitting" where you design a model based on spurious correlations.
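A small simulation of the effect, sketched in Python with only the standard library. Everything here is an illustrative assumption: the data are pure random noise, the p-value uses a normal approximation rather than a proper t-test, and the function names and sample sizes are made up.

```python
import random
from statistics import NormalDist, mean, stdev


def two_sided_p(sample_a, sample_b):
    """Approximate two-sided p-value for a difference in means (normal approximation)."""
    se = (stdev(sample_a) ** 2 / len(sample_a) + stdev(sample_b) ** 2 / len(sample_b)) ** 0.5
    z = (mean(sample_a) - mean(sample_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_false_positives(n_tests, n_per_group, alpha, seed):
    """Compare two groups of pure noise n_tests times; count the 'significant' results."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sided_p(a, b) < alpha:
            hits += 1
    return hits


hits = count_false_positives(n_tests=1000, n_per_group=50, alpha=0.05, seed=0)
print(f"{hits} of 1000 comparisons of pure noise came out 'significant' at 0.05")
```

Since neither group differs from the other by construction, every "significant" result is a false positive, and something on the order of 5% of the comparisons should clear the 0.05 bar by chance alone. Look at enough splits and a few will always appear to mean something.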
His true crime work, and his dedicated anti-professionalism, is also utter crap.
I thought you were referring specifically to adjustments in calibrating the scale to win totals - didn’t he also just have some sort of “manual override” for a player’s final number?
FR 66: 122 R, 122 RBI, 97 singles, 34 doubles, 2 triples, 49 HRs in 680 PA
EW 66: 120 R, 132 RBI, 84 singles, 0 doubles, 12 triples, 42 HRs
So is 264 IP of good pitching worth 5/6 of Robinson? We could think of that as 1,059 BF vs 567 PA so each Wilson PAA only has to be slightly more than half as far below average as Frank's PAs were above average.
Now some of you clever snots will suggest that Wilson's batting wasn't really worth 1/6 of Frank's. You'd have a point as WAR credits Robinson with 71 Rbat and Wilson with just 5. Darn OBP it seems. Yet Wilson scored and knocked in runs at an equal or better rate so are we sure Frank's meager 111-point OBP edge was really worth all that much? (Better get an estimate!) Or do we need to adjust for their on-base contexts -- surprisingly for Frank almost exactly league-average while Wilson had a lot more?
Now, if Frank had pitched to about 500 batters, we'd have an easy comparison.
I don't think that the part about him pushing back against other people's work fully holds up. He praised DIPS and accepted it as true, mentioning it in his 2001 book and crediting the guy who came up with it. He just really seems to have a hang-up about WAR.
Recall he assailed Linear Weights years ago for doing exactly that.
[Trying to remain silent about his politics, but I can't deny that his blinders there have had a retroactive effect on how I now view his older writings]
It's funny. I came to Sabermetrics without reading James at all. Pete Palmer's The Hidden Game of Baseball introduced me to it, then it was Neyer on ESPN.com, and so forth.
Hey if you think you'd discovered a better system in betamax and everyone went VHS, you'd be angry too!
I thought he was relatively liberal?
Similar path here. The other big influence—not sabermetrics—was Murray Chass, who introduced me to and educated me on the business side of the game. While for me all three of James, Neyer, and Chass went from groundbreaking to somewhere near unbearable, I'm grateful for their peaks and the trails they blazed.
Nowadays, specifically when it comes to James, I keep in mind what Tango has said about him, something like, "Bill often identifies a problem without offering a solution, which he leaves us to work on." Tango's point—and as far as I'm aware he and James get along fine—is that it's best to think of James's work more as a conversation starter than a complete and rigorous analysis.
Still he very rarely starts a study with a desired conclusion in mind and that's huge.
Plus he's a better writer than almost anybody in the field.
Nope. I've read sufficient comments in his weekly column that demonstrate otherwise. Old heroes fall hard. [Tho I still acknowledge #47's 1st point]
If by "impactful" you mean a hit single, or something that has had any impact on the current direction of pop music, I'd agree with you. But McCartney's work from 1997's "Flaming Pie" onwards has featured quite a few magnificent songs. He's actually had a remarkable late career renaissance as far as the quality of his music is concerned, whether it gets played on the radio or not. I recently put together a two-hour playlist of post-1997 McCartney music, and it was wonderful.
Ha, No. Silver has done Z-score analyses of his models in the past, and they do quite well. The people criticising him this year have no idea what Z-scores, normal distributions, sampling errors, systematic error, math, etc. are. A few recognise one or two of the words, and know just enough to misapply them. He had some bad luck getting all 50 presidential states right in 2008, and winning over people too much by dumb luck.
In fairness, groundbreaking revolutionary work deserves massive accolades. And his writing really was compelling.
However, if Freddie Freeman sat out, Johan Camargo becomes the next man up and he is replacement level or worse. So Mookie’s value in the lineup is overstated and Freeman’s is understated. It doesn’t change what an incredible player Mookie is or how great he was in 2020, but it does reflect on his value to the team.
Rosters aren’t static, the Braves can trade for or sign a better replacement if they want to. It’s not an efficient market like the systems imply, but it doesn’t make sense to think about value as dependent on the quality of one’s teammates (or GM). And likewise, if Mookie was unavailable his replacement could get hurt the next day and then they are using more of a replacement-level replacement.
That being said, I think James’ critiques of WAR are partially correct. WAR would be better expressed as a range but it doesn’t seem like anyone has attempted to quantify what the standard error is around the metric. And on the pitching side, there are some big assumptions underlying WAR when it comes to defensive support. Those should be more prominently explained and it should be easier to identify the magnitude of their effects. But Win Shares has very similar problems, as Matt Welch noted in #3.
Is any of it rock and roll?
Wilson was signed by the Red Sox and traded to the Tigers in 1966 (signed in 1953, MLB debut in 1959 - same year as Pumpsie Green). At that point they already had Willie Horton and Gates Brown, who were signed in 1961 and 1960. I don't know if any of these guys was the first black Tiger, but it definitely was not Wilson.
That got him a massive advance on a book deal and the backing/funding to do what he wanted in a media site. If that's bad luck, sign me up.
I hate to single anybody out, cause this is really a sort of minor chord that's been running through this whole discussion: the notion that Bill James has suddenly gotten old. That he no longer has it. That he's cranky.
Emphatically: NO! This is a ridiculous notion.
This has always been Bill James's thing. He takes on controversy, he's a maverick, he finds things that no one else finds because he wants to go against the grain. This is also the beauty of Bill James, so before I lambast James, let me just say that almost all of us have cut our teeth on Bill James, and his body of work casts a shadow over this entire field of Sabermetrics.
It's probably the main reason most of us are here.
OK, having said that, let me just say that, like Lindsay Lohan saying something stoopid, or Bobby Bonilla booting a groundball, or Patton slapping a soldier, Bill James is likely to say something ridiculous at any time, any place.
OK? That's just who he is. I could cite numerous obvious examples where he defends Pete Rose's gambling, or drops hints that a baseball player is gay, or that Hal Chase was a serial philanderer, or decides that Dick Allen never helped any ball club he was on. No. I'm going to cite something more prosaic, something right in Bill James's wheelhouse.
Baseball research 101. Here's the quote, it's the fifth excerpt on the page from this archive site:
http://baseballanalysts.com/archives/2004/07/abstracts_from_12.php
How many times do you think Sal Bando made 20 errors at 3b? Like four times, with a high of 24. Here's some others:
Graig Nettles one of the greatest fielding 3b: 5 times, a season high of 26
Ken Boyer 7x, 3x 24 or more errors, (did you know James ranked Boyer 12th all time at 3b and also plumped for him in the HoF at the same time he was plumping Santo?)
Darrell Evans, 7 times, each time 25 or more, high of 36
Buddy Bell, 4 times, yeah Bell was really good.
Ron Santo. 11 TIMES! 6x 25 or more errors. Effin Ron Santo committed 25 errors at 3b 6 times. His high is like 31 or something.
The point is not that committing 20 or so errors at 3b is unremarkable, well it is unremarkable. Making errors at 3b comes with the territory. And Bill James should know that cause you know BILL JAMES IS A RESEARCHER. It's not the main point. The point is that this is an easy Look Up.
Bill James plumped for Ron Santo for the HoF. He also plumped for Boyer. James is supposed to know baseball statistics. His statement above is him mailing it in. Just saying something off the cuff. Here's another quote from the same excerpt:
I'm not even gonna research this one. I'm gonna go out on a limb and say I am 85% certain that that statement is also utter bullsheet. Why? Cause I know two things:
1. Throwing the ball away with runners on base is just about the worst thing you can do as a 3b. It's usually at least -1.2 weighted runs with just one man on base right there.
2. Even Dick Allen for all his defensive woes didn't throw the ball away all that much. He knew they will yank you off 3b in a heartbeat if you do that often enuf.
Oh I also know that Bill James is doing his Bill James thing right here.
Ozzie Virgil, 1958.
The mistake here is comparing Earl Wilson's bat directly to FRobinson. Presumably the Tigers have a RF who can hit (that would be Northrup OPS+ of 120). Technically we(you) should be comparing Robinson to the avg AL or MLB RFer in terms of hitting/off. Since every position has to be filled by someone who bats.
Wilson OPS+ of 129 which is probably 100pts? more than the avg pitcher but he's only batted 113 times so maybe 15 runs created more than the average pitcher.
While some primates may disagree I think the better way is to compare a man's off contribution vs same guys playing same position. Instead of these bizarre and highly theoretical positional adjustments. I mean if you want to do it that way, then I guess you'd have to make Frank Robinson pitch every fifth day and make Wilson play RF 4 out of every 5 days. Who wins that match up?
So that would be how you do it. Going from memory I think Frank is about 70 runs above avg on off, so vs the average RF he's what +50 or so? in terms of offense.
Wilson maybe +15 runs on off. So that leaves Earl with 35 runs to catch up to Frank in our theoretical MVP revisited race.
Anyone know if there is an equivalent "runs saved" for pitching? I'm sure there is.
We haven't even talked about Frank's fielding which we know is pretty bad.
How bad is Frank's arm really? How many guys take the extra base on him?
And how bad is Wilson's fielding? Is he capable of the "throw the ball into the stands while trying to make the play at 1b with runners on base?"
Who is more clutchy? who costs more money? Tune in tomorrow!
And why I'm likely to go grumpy old man when I see discussions that assume first decimal place precision. The standard error on the offensive components is not smaller than 14 runs (EDIT: per year) for a full time player. And that's the area that I'm pretty sure has the highest precision.
Similarly I'm confident that the standard error at the career level is not smaller than 4 wins (probably more like 5) for anybody involved in a HOF discussion.
EDIT: I should point out that Walt Davis may not raise these issues explicitly but he is good at asking method-related questions and the like.
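On the standard-error point above, here is a rough sketch of how a per-season error could compound over a career. It assumes season errors are independent, a 15-season career, and roughly 10 runs per win. All three are assumptions of this sketch, not figures from the comment.

```python
import math

def career_standard_error_wins(season_se_runs, seasons, runs_per_win=10.0):
    """Standard error of a career total, in wins, if season errors are independent.

    Independent errors add in quadrature, so the career error in runs grows
    with the square root of the number of seasons.
    """
    career_se_runs = season_se_runs * math.sqrt(seasons)
    return career_se_runs / runs_per_win

# 14 runs per season is the figure from the comment; 15 seasons is hypothetical.
print(round(career_standard_error_wins(14, 15), 1))
```

Under those assumptions the career figure lands in the same neighborhood as the "probably more like 5" wins quoted above; correlated errors would make it larger.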
Are we sure about this? Was he limited by injury or something? His -8 fielding for the year is not an outlier or anything. He was -4 the year before, then averaged -8 for 1967-68. After that he's got one good year, one bad year, 2 average years, then became a DH.
Great idea. I like that so much better than what James is doing here. Robinson obviously is so much better than Wilson, ergo, WAR is stupid. Hard to believe that unJamesian sentiment is coming from the same guy who wrote the Mattingly/Clemens and Rice/Guidry comparisons.
He did get blown out a couple of times; on one occasion (19 July) Frank Robinson chased him out of the box in the first inning with a 3-run homer.
But on 18 May (with Boston), he held Robinson to 1-for-5 with a GDP. In the tenth inning, Wilson homered off Jim Palmer to put the Red Sox ahead, and then got Frank on a foul pop to end the game in the bottom of the 10th.
On 14 July, Frank hit a first-inning solo HR to put Baltimore ahead (of the Tigers, now). Wilson settled down and drove in the decisive run with a SF in the sixth, Tigers won.
The next night, Wilson hit a 3-run pinch homer in the bottom of the 13th to beat the Orioles again.
That's pretty good :)
I feel strongly that he is, but I can't prove it with data. A -8 on TZ is quite bad. For one thing, the TZ ratings, or whatever the 1966 equivalent is, just seem to be attenuated in both negative and positive directions. [NOTE: these claims are disputed by certain primates]
Also I think one of those years you're referring to is 1970 in LAD, and I feel that the half season or whatever he played is never enough data pts.
We talked about this in re: Rabbit Maranville. Who was obviously really good. He held down SS for like 20 years. It would follow then that he must have had some sort of peak, some time in which he was really stellar, how else do you hold on for 20 years? Everyone ages, so he must have been coming down from a really high defensive peak. But you can't see that from TZ, I forget what the numbers were.
I just looked up Tris Speaker's numbers. Speaker was considered the greatest fielding CF before 1950. We could argue that. I know Dom DiMaggio was really good at his peak but I don't think he had a long career. I know Bob Meusel was really good for a while. Can we agree Speaker is near the top of his profession?
OK, TZ gives him 10+ def WAR 4x in his career. His high is 14 (all the rest are 10). That can't be a true evaluation of his talent? You can look at Statcast and see the best guys are hitting like 18 OAA (Outs Above Average) and we're not counting assists and holding runners (which probably max out at 10 runs/season and 5/season respectively). Betts has hit 30 def runs against replacement. Perhaps that's the outer limit (but Betts is in RF; what of Mays in CF?). Would 20-25 def runs for 12 seasons be reasonable for Speaker? I think so.
There's some other stuff I've bookmarked that we can reference later. For example, someone did a study on Jeter: how many balls he didn't get to vs the best SS that year. Jeter is almost -40 vs avg and the other guy +30 or so. I question how they value those in weighted runs, but it gives a real indication of what we would expect the spread to be for SS. 3b seems to be the other key inf. position where the spread is that large.
It follows that the same problem exists going in the negative direction. A -8 on TZ would likely be a lot worse on DRS/BIS, which is a more recent development.
Right. I was looking at this a little while ago and there is lots of room for disagreement/areas to be plowed. The biggest problem with Wilson is that FanGraphs seems to treat BABIP as a near constant measure of defense. So Earl Wilson's 6.6 WAR or whatever on Baseball-Reference is being fueled by a .250 BABIP! It was a dead ball era (I think BABIP is around .280 back then) but Wilson's BABIP is freakin awesome.
Is Wilson inducing weak ball contact or is he really lucky? FanGraphs says he's lucky so that .250 BABIP gets adjusted into a FIP that is higher than his ERA+. Thing is Wilson also produced a similar BABIP in 1967, so I don't think it's just luck. We had this discussion a few days ago in the Andy Pettitte thread.
The other guy who's getting jobbed by BABIP/FanGraphs in 1966: Juan Marichal. He's hitting 10 WAR and 9 WAR in 1965-66. Fueled by BABIPs of .238 and .223! That gets cut by FanGraphs to 6.8 and 6.3. Gee, I don't know what to think. Conversely his teammate Gaylord Perry is going in the other direction. His FIP skews equally in the opposite direction, suggesting he's better than that same SFG defense that is helping Marichal.
I don't have any conclusion other than it would be interesting to study teammate situations, and two or three year trends, to see which method is better. It's just so weird that the SFG defense is so good for Marichal and so leaky for Perry. You can also see BABIP trending upward in the careers of both Wilson and Marichal and probably many others. hmm
Other food for thought:
How do folks feel about positional adjustments for this sort of thing? Should a pitcher get more credit for eating innings?
We might as well throw Tommy Agee and Yaz into the 1966 discussion. These guys field better than Frank and we might as well be thorough if we're going this far.
Could throw in Jim Kaat (6.4 WAR on FG) and Sam McDowell in there (4.4 FG vs 5.0 BRef) as well.
Certainly if modern day infielders can save 30 runs vs replacement we should find some AL infielders who are being jobbed by hackneyed TZ. Tom Tresh is one candidate, or Dick McAuliffe or Clete Boyer might be possible. I'm not sure who the catcher would be.
How did Agee get to +18 def. runs or whatever anyhow? His range seems about average and he has 12 assists. That's good, but it doesn't seem to be outstanding.
The NL seems to have all the talent in 1966, both pitching and hitting. It seems really strange. Frank came over and just lit up the league and as a fan you'd probably believe that the AL really is inferior.
Yes.
http://fieldingbible.com/jeter.asp
My only question is his conclusion that this represents only about 30 runs (5th para. from the end). Don't you have to add .45 runs (value of a hit) to .23 runs (the value of an out) to get .68? Also I guess we are assuming that none of these are throwing errors with men on base, which will kill you; I have no reason to think Jeter was exceptionally bad at this though there might be some runs here.
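The hit-plus-out arithmetic in the question above can be sketched as follows. The .45 and .23 run values come from the comment; the -40 plays figure is the rough Jeter number quoted earlier in the thread and is used here only as an illustrative input.

```python
HIT_VALUE = 0.45  # runs the offense gains on a hit (figure from the comment)
OUT_VALUE = 0.23  # runs the offense loses on an out (figure from the comment)

def runs_from_plays(plays_vs_average):
    """Convert plays made above (+) or below (-) average into runs saved.

    A missed play turns an out into a hit, so the swing per play is the
    hit value plus the out value.
    """
    swing = HIT_VALUE + OUT_VALUE
    return plays_vs_average * swing

# -40 plays vs average is an illustrative input, not a measured total.
print(round(runs_from_plays(-40), 1))
```

Whether .68 is the right swing per play is exactly what the comment is asking; the sketch only shows what the total looks like if it is.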
On Agee, it should generally correlate to range factor but not always. The method used for 1966 is to allocate all the hits when he was in the field to the defensive positions. So if a hitter has 20% of his outs on balls in play made by the CF, then if he gets a hit the CF is charged with 0.2 of a hit. Add it up, compare to league average, throw in a park adjustment and you get his TZ.
I used different methods based on what data was available, that method is roughly what was used from the fifties to the 80’s.
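A minimal sketch of the hit-allocation step described above, with made-up batters and shares. The function names, the data layout, and the way the park factor is applied are assumptions of this sketch, not the actual TZ implementation.

```python
from collections import defaultdict

def charge_hits(hit_events, out_shares):
    """Spread each hit across fielders in proportion to the batter's out distribution.

    hit_events: batter ids, one entry per hit allowed while these fielders played.
    out_shares: {batter: {position: share of that batter's ball-in-play outs}}.
    """
    charged = defaultdict(float)
    for batter in hit_events:
        for position, share in out_shares[batter].items():
            charged[position] += share
    return dict(charged)

def hits_saved(charged, league_expected, park_factor=1.0):
    """Hits saved versus a league-average fielder; positive is better than average."""
    return league_expected * park_factor - charged

# Hypothetical batters: A sends 20% of his outs to CF, B sends 50%.
shares = {"A": {"CF": 0.2, "SS": 0.8}, "B": {"CF": 0.5, "SS": 0.5}}
charged = charge_hits(["A", "A", "B"], shares)
print(charged)
print(hits_saved(charged["CF"], league_expected=1.2))
```

In the example the CF is charged 0.2 of a hit for each hit by batter A, matching the 20% case in the comment; a run conversion would still be needed to turn hits saved into a TZ figure.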
Jeter's fielding percentages were actually quite good. Per BB-Ref, his career fielding percentage (.976) was better than league-average over those seasons (.972) and he even led the league in the stat twice (2009, 2010) with six other top-5 finishes. It's one of the explanations for his Gold Gloves.
He was ahead of his time covering the business of baseball in a mainstream publication. He brought that side of the sport to large swaths of people just like how Neyer later did with sabermetrics. In terms of broad reach, the lasting effects of those two can be seen in so much of the baseball content we read these days. I'm sure other people eventually would have done similar, wide-reaching work as those guys, but credit to them for leading the way and I'm grateful for both writers' work.
How It's Going: These statistics are telling me things that I didn't expect. These statistics are bad.
(*) Nor does anything remotely approximating it. Quick, who are the CFs who are going to give you replacement level CF production next year? How exactly are you determining that, and how big are your error bars? Are you adjusting the quality of the replacement player down to account for the risk inherent in these error bars? (**) If you can't go into the marketplace and get predictable replacement player production -- and you can't do anything close -- the concept Challenger-space-station fails.
(**) If this isn't happening, stated WAR is significantly understated.
this is coming off the debacle that was 1994, of course.
anyone interested in his take on specific players - no request too small!
:)
It really isn't wins over any kind of player or even player archetype -- that's basically false advertising at this point -- but instead "Wins Over Baseline," only the baseline is completely arbitrary and makes no inductive or deductive sense and follows from no sensible premise or set of premises. "Mike Trout hit .... and fielded ... and baseran ....; OK, let's say you had a CF who hit ... and fielded ... and baseran ..., Mike Trout was ... wins better than this player." There's no reason to use the imaginary player's production as anything that means anything as a baseline. That production isn't available in the marketplace and doesn't have any independent tangible meaning or significance. Yeah, if you could actually go into the marketplace and get assured replacement level production it would make sense. But you can't. You can't even approximate the concept.
Win shares actually does make sense -- Team X won 92 games, which of its players contributed how many of these wins to the total? That endeavor makes perfect sense.
More like the opposite of that, to be honest. Read through the Win Shares book and you'll see that he's constantly coming up with reasonability tests checking to see that similar players are evaluated similarly. That's his process, and I don't think it works for anybody who doesn't have strong opinions about the relative value of, say, Luis Aparicio and Rabbit Maranville. But James is much more resistant to simply applying a statistical technique and trusting the results than average, not less resistant.
His whole issue here is that he wants a rating system to be completely logically coherent, and he thinks that WAR is cutting corners.
Rally: I wonder if you've seen this extended debate about Dave Winfield's fielding here. It starts with a Bill James article but the comments are real interesting, using I believe the very same method you are describing:
https://www.billjamesonline.com/winfield_and_evans/
What do you think about the conclusions, and how much error do we expect in this method?
Incorrect. Replacement level is calibrated so that an all-replacement-level team would win about 40 games, which is the worst result ever observed.
Kiko: Yes but, the issue isn't his fielding range. It's: Did he make bad throws an inordinate amount of the time? Do we have any data on that? I can't imagine that he would but I didn't really watch him much.
Another question: you have worked on another component of OF defense and that is getting to balls and turning doubles into singles and such. Would this sort of thing be reflected in the baserunner advances data? If someone does give up more baserunner advances, would that not reflect his ability to get to base hits? How much more defense value can be added or subtracted by studying this?
Actually I think the number is 48 wins but you have the right idea. It's like a .300 team. How many teams have you seen play .300 ball? Actually I guess the expansion Mets played .250 ball so I guess the assumption is that they were worse than replacement level.
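A quick check of the winning percentages being tossed around here, assuming a 162-game schedule for the replacement-level team and the 1962 Mets' 40-120 record for the expansion Mets:

```python
def win_pct(wins, games=162):
    """Winning percentage for a given number of wins over a schedule."""
    return wins / games

print(round(win_pct(48), 3))       # a 48-win replacement-level team
print(round(win_pct(40, 160), 3))  # 1962 Mets, 40-120
```

So 48 wins is roughly the ".300 team" described above, and the Mets' record is the ".250 ball" case.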
Right -- that's "Wins over Baseline" as I said. And as I've also said, the "calibration" is impossible in the real world. If the BTF community spent each winter running a parallel project to the Hall of Merit wherein they predicted which five players at each position would generate replacement level production the next season, they'd probably be off by 50% or more when those players' actual production is compared to predicted. I could let you pick five "replacement teams" for next year and you could take the closest one at each position and your projected error bar would have to be at least 20% and that's being generous.
You can't obtain guaranteed replacement level production in the real marketplace, in the way you can obtain risk-free returns in the financial marketplace. You can't even approximate the concept. That collapses the entire endeavor. All that's left is gut feeling a baseline of a "kinda passable player but not very good and in some ways kinda shitty." That's a meaningless baseline.
What would happen is:
1. Some would play better than replacement level
2. Some would play worse than replacement level
3. Many wouldn't play at all, because what team actually goes into a season trying to find playing time for replacement level talent?
But if you add up the sum of value for the ones who do play, as a group they would be very close to replacement level.
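The claim in the list above, that individuals scatter while the group lands near replacement level, can be illustrated with a small simulation. The normal distribution, the 1-win spread, and the pool of 1000 players are all assumptions of this sketch; it does not model the players who never get playing time.

```python
import random

def simulate_replacement_pool(n_players=1000, spread_wins=1.0, seed=0):
    """Draw hypothetical wins-vs-replacement for a pool of replacement-level players.

    Each player's true level is zero; the spread is pure season-to-season noise.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, spread_wins) for _ in range(n_players)]

pool = simulate_replacement_pool()
above = sum(1 for w in pool if w > 0)    # played better than replacement
below = sum(1 for w in pool if w < 0)    # played worse than replacement
average = sum(pool) / len(pool)          # group average, expected near zero
print(above, below, round(average, 3))
```

The sketch says nothing about whether a single team can capture that average, which is the objection raised in the replies.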
I kind of remember it happened but have forgotten anything about it.
It looks like about half his errors in his career-high 24-error 2000 season were on throws, but I don't know what the typical figure is.
If I had to guess, I'd say that absent any specific trend, he would be more likely to get charged with a throwing error than the average player, if only because fielding errors can be scored hits, and a longtime superstar such as Jeter would get that benefit more frequently than a mediocre player. Throwing errors are much harder to write off that way.
But that wouldn't speak to any particular trend he had on miscues fielding v. throwing.
Exactly -- but replacement level isn't adjusted down for that uncertainty of performance as it should be to make any sense.
Be happy to wager on the validity of that one, but even assuming it's true, the real world doesn't let you play 20 CFs at a time and aggregate their performance.
Not sure what your point is. Sometimes the replacement-level guy plays at a below replacement level. Sometimes Edwin Diaz or Andrew Benintendi does the same thing. So what, that doesn't change the fundamental exercise.
Why should it be. If the mean is zero the positive variance offsets the negative.
You can't obtain guaranteed replacement level production in the real marketplace, in the way you can obtain risk-free returns in the financial marketplace.
You can't actually do that either, except for a very short time. One month T-bills might be risk free, but 10 yr. Notes have a lot of interest rate risk.
I'm arguing that there's no such thing as replacement level. As to the "too high" point, the only thing I'm arguing is that the replacement level that's wrongly assumed to exist pegs the level too high. But that's neither here nor there; the choice of level is completely arbitrary in the first instance. It's a level. So are a bunch of other levels. There are a lot more variables and what not, but at the end of the proverbial day there's no more substance to WAR than there is to noting that a .300 batting average is []% above a .275 average and []% above a .250. Or that Mike Trout played 25% better than a player who played 25% worse than him and 45% better than a player 45% worse than him. That's what it reduces to.
That there's no such thing as replacement level.
Because it was premised on the idea of freely available talent. Once that premise fades away, it's just a performance level just like any other performance level. You might just as well use "performance above really good player" or "performance above really shitty player." It's entirely arbitrary.
But if you get the negative, you haven't obtained freely available replacement level performance.
Not if you hold them to maturity. If you hold them to maturity, and they don't default, you know your return exactly. Compare and contrast the "returns" of baseball players. Kinda ... not really that, right? (The risk free rate is a well-known feature of modern finance; no need to relitigate that one. The only weirdo quibble one could make is that US debt isn't cosmically 100% assured against default, but it's 99.999% and that's good enough.)
Nonsense. Locking in a return of 0.91% is not risk free. If rates rise significantly over the period, you have lost substantially. If you want risk-free you have to have near zero-duration.
The one month Tsy is at 0.07%, the 10 year is at 0.91%. The only reason you get 84 extra bps is because you're taking duration risk. If rates rise 2%, the 10-Yr Note will lose something like 15-20% of its value. That's risk.
Holding to maturity only spreads the loss over multiple years. In a trading portfolio, you have to mark the loss to market immediately.
The "risk-free rate" in WACC calculations (for example) is every bit as theoretical as replacement level.
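The duration-risk arithmetic above can be checked with a rough sketch. It assumes a 10-year note issued at par with a 0.91% annual coupon (real Treasury notes pay semiannually) and an immediate 2-point rise in yield to 2.91%; those simplifications are assumptions of the sketch.

```python
def bond_price(coupon_rate, yield_rate, years, face=100.0):
    """Present value of annual coupons plus the face value at maturity."""
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1 + yield_rate) ** t for t in range(1, years + 1))
    pv_face = face / (1 + yield_rate) ** years
    return pv_coupons + pv_face

par = bond_price(0.0091, 0.0091, 10)    # priced at its own coupon rate, so about 100
after = bond_price(0.0091, 0.0291, 10)  # same note after yields rise 2 points
loss_pct = (par - after) / par * 100
print(round(loss_pct, 1))
```

Under these assumptions the mark-to-market loss comes out inside the 15-20% range quoted above.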
Replacement level is just one component of WAR, and it’s pretty much irrelevant when comparing players of equal playing time. So for things like MVP discussions it’s largely moot. You can compare them against average (WAA) or whatever baseline you want to choose.
But if you do want to compare guys who played different amounts of time, where you set the baseline is important. You can’t simply throw your hands up and complain there’s no such thing as freely available talent. I mean you can, but it doesn’t get you any closer to the truth.
Hey, that's my schtick.
On Robinson vs Wilson hitting -- chill man, I was just having fun with the fact that Wilson had (nearly) exactly 1/6 of Robinson's PAs and if you multiplied his numbers by 6, he (nearly) matched Robinson on R, RBI and HR. The proper reaction to that is "damn, Earl Wilson could hit!" not "the proper way to make this comparison ..."
Somebody mentioned Wilson's fielding. bWAR (and near as I can tell fWAR) just assumes its effects are captured in RA which, on average in large samples, it will be. But pitchers have the luxury that fielding miscues that don't actually result in runs essentially don't count (but then the ones that do, count in full) while WARpos gets dinged regardless of the result. TZ isn't even calculated for pitchers; DRS is, but only appears on b-r under a pitcher's advanced fielding stats.
A one trick pony, but it's a good trick.
Terrible PH. Not that you really want a .195 hitter who didn't walk much in most PH scenarios.