WARP3’s systematic flaws with 19th Century players
I’m going to do an overhaul of Pennants Added this week (hopefully; if you want to volunteer for some data entry, let me know!). That includes revising for the WARP 2003 numbers (I’ll leave the old ones up for reference) and adjusting for WARP’s overrating of fielding, which comes from setting the replacement level too low.
One thing I noticed in the Prospectus glossary troubles me:
“FRAR
Fielding Runs Above Replacement. The difference between an average player and a replacement player is determined by the number of plays that position is called on to make. That makes the value at each position variable over time. In the all-time adjustments, an average catcher is set to 39 runs above replacement per 162 games, first base to 12, second to 34, third to 26, short to 38, center field to 30, left and right to 20.” (emphasis mine)
This means that WARP3 systematically overrates 2B and underrates 3B from this time period. It also underrates 1B, who were at least as valuable defensively as LF and RF, and probably a little more valuable.
If I had the time, I could adjust this defensive spectrum, and re-rate the fielding component for each player, but that ain’t happening any time soon.
What I’d suggest is this: using the same total of runs (39-38-34-30-26-20-20-12), I’d apportion them this way for pre-1930 play:
39 C, 38 SS, 34 3B, 26 2B, 25 CF, 21 1B, 19 LF, 17 RF.
You could tweak that to your own specifications, but I think I’m being pretty conservative; you could easily make the case that 1B was equal in value to CF, though I won’t.
Doing that, you get an adjustment (using a 9.7 R/W converter) of:
3B: 34-26 = 8; 8/9.7 = +.82 W/season
2B: 26-34 = -8; -8/9.7 = -.82 W/season
CF: 25-30 = -5; -5/9.7 = -.52 W/season
1B: 21-12 = 9; 9/9.7 = +.93 W/season
LF: 19-20 = -1; -1/9.7 = -.10 W/season
RF: 17-20 = -3; -3/9.7 = -.31 W/season
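The adjustment arithmetic can be checked with a short Python sketch. It assumes, as the post does, a flat 9.7 runs per win and treats each figure as runs above replacement per 162 games; the dictionary and function names are hypothetical. Catchers and shortstops keep their all-time values, so their adjustment comes out to zero.

```python
# Runs above replacement per 162 games for an average fielder at each position.
RUNS_PER_WIN = 9.7  # runs-to-wins converter used in the post

# All-time values quoted from the Prospectus glossary
ALL_TIME = {"C": 39, "SS": 38, "2B": 34, "CF": 30, "3B": 26, "LF": 20, "RF": 20, "1B": 12}

# Proposed pre-1930 split of the same run total
PRE_1930 = {"C": 39, "SS": 38, "3B": 34, "2B": 26, "CF": 25, "1B": 21, "LF": 19, "RF": 17}


def position_adjustments(proposed, baseline, runs_per_win=RUNS_PER_WIN):
    """Wins per season to add to (or subtract from) WARP3 at each position."""
    # The reallocation should move runs between positions, not create new ones.
    if sum(proposed.values()) != sum(baseline.values()):
        raise ValueError("proposed spectrum must keep the same run total")
    return {pos: (proposed[pos] - baseline[pos]) / runs_per_win for pos in baseline}


if __name__ == "__main__":
    for pos, wins in position_adjustments(PRE_1930, ALL_TIME).items():
        print(f"{pos}: {wins:+.2f} W/season")
```

Swapping in a different pre-1930 split with the same total shows how sensitive each position's adjustment is to the spectrum you choose.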
I think that if you do this, you’re going to come out with rankings much more in line with the generally accepted historical rankings, especially for players like Jimmy Collins and Bid McPhee.
I’m not saying that WARP should be eliminated from the tool box when discussing 19th Century players. Just that it needs to be tweaked somewhat.
The mistake that Prospectus is making is that they think (as implied by their formulas) that if Jimmy Collins were playing in 1990 he’d be a 3B. But most likely he’d have played 2B, because that’s where the more skilled fielder played after 1930. That’s the only way to explain how 3B outhit 2B after 1945, while 2B outhit 3B before 1920. From 1920-42 they were about even (who cares about 1943-45 :-)).
Joe Dimino
Posted: December 02, 2003 at 04:53 PM
Reader Comments and Retorts
1. jimd Posted: December 02, 2003 at 07:52 PM (#519651)
[Table not preserved; columns: SO's, C-SO, Infd, Outf]
Can you point me to a thread where you had/have this discussion?
I also agree with Joe's viewpoint that it's rather silly to apply the same fielding standards to each position across time. In weak leagues, whether it's the start of baseball or today in high school, the 3B is a much better fielder than a 2B.
A lot of that, in HS, is because of the much greater number of RH hitters. Taking a quick peek at the Lahman DB, it looks like the share of RH hitters was over 70% before the turn of the century, followed by a steady drop. For 1999-2002, I can tell you it's at 58%.
This might also affect the way you think of pitchers. An LHP had a tremendous advantage in the old days, compared to, say, RJ today, when a manager can stack the lineup so that he faces 50% LH hitters.
Apologies if this has been talked about already.
Tango - isn't this backwards? Shouldn't a LHP be better off today than he would have been facing all those RH hitters in the 19th century?
You are correct. An RHP is disadvantaged today because of the enormous number of LH hitters he'd face. (I've been rereading it for 2 minutes now, just to make sure I didn't mess up again.)
Can you point me to a thread where you had/have this discussion?
The discussion of the positional impact of 1B, such as it is, appears, I think, in the 1913 ballot discussion thread, since that's the year Beckley became eligible. But your inference, Tango, that the positional impact arises from the need for someone to scoop balls from poor fielders, is not one that we had discussed. The main argument for greater 1B defensive value has been that first basemen in this era needed more range and a better throwing arm because of all the bunts. There's some discussion of that issue in the 1913 thread.
> 3B: 34-26 = 8; 8/9.7 = +.82 W/season
WARP1; 2; 3; Name (net adjustment, WARP1 to WARP3)
Flick, NL 1898: WARP1 9.7, WARP3 8.9 (8% down)
I think you're looking for WARP's love in the wrong places. If you look at the WARP2 adjustments for BRAR and FRAR, you'll see that the big losses for Magee and Sheckard are in fielding runs. They are left fielders, who in that period were a lot closer to CF in defensive value than they were to RF. When WARP2 normalizes for all time, LF defense is prorated to equal RF defense in value. So Sheckard and Magee, the leftfielders, drop off a lot in WARP2, while Flick and Keeler, who were rightfielders, lose less.
It's basically the same adjustment principle that leads to early third-basemen being underrated in WARP2 & 3.
Here's their career FRAR adjustment in WARP2:
Keeler: 323 --> 220
But why? Is it because replacement level is assumed to be artificially low in their time? And they are being compared to an average replacement level over time? Is that also right? Does this make sense? It's sort of like saying that we'll pretend they didn't do what they did, (we'll pretend they didn't really contribute what they did contribute toward their team's success) because other players didn't have the same opportunity to do the same things. Would that be one way to characterize the logic here?
Or, to put it another way, this seems to assume that a pennant is NOT a pennant.
"FRAR
I don't use WARP3 for the simple reason that I do not timeline. A pennant is a pennant and value toward that pennant is value. I really don't care what Pete Browning could/should/would have accomplished if he had been plopped down in 1910, 1949, 1994 or any other time. I DO use an AA discount on adjWARP1. And of course, a person could apply whatever timeline they might want to apply to adjust WARP1, if they wanted to do so.
I do understand the problem of comparing Joe Kelley and Charlie Keller using adjWARP1. Is it Keller's fault that he did not have the opportunity to contribute to the extent that Kelley did? No. But to give Keller credit for what he would have done if he had had the opportunity does not seem like a solution. If Keller were as good a fielder as Joe Kelley, maybe he would have played CF or RF and then, perhaps, he would have had the opportunity. Both played in a 27-out context and their managers used each in a team context to the best of their ability to contribute to those 27 outs. In that sense, they had the same opportunity (of course, there's the little matter of pitcher K's).
But I don't understand the reasoning, ultimately, for pretending that either did more or less than what they did. That, it seems to me, is to reduce the question from a "value" question to a "tools" question. What did Kelley and Keller have the "tools" to do in an average environment? That seems to be what the question becomes with the WARP3 adjustment.
But anyway, back to my basic question, is using an adjWARP, with whatever adjustments you personally happen to favor, a "solution" to the adjustments for all-time problems?
Yes, I'd say it is a reasonable solution, if you think that adjusting to an average environment reduces a value question to a tools question. Myself, I think it does, so insofar as I use WARP, I place more weight on WARP1.
There remain level-of-competition adjustments for which WARP2-3 might provide valuable guidance, but I haven't yet begun to really study how the system makes those kinds of adjustment. I hope to learn, though!
In principle, one can tabulate measured LF fielding and RF fielding for every MLB team-season (the collective value, skill, etc., depending on how the measure is interpreted). In turn, that table can be the object of analysis. The three example tables for 1890-1909 can jointly be used to study the Davenport adjustment and the Dimino adjustment as applied to that time period.
TomH, I like your article in the current "By The Numbers". Can you do this article next?
Thanks, Paul, for explaining a reasonable alternative approach. I am not as familiar with linear weights and the rest of Pete Palmer's work as I should be. My thinking about sabermetrics has been largely shaped by Jamesian approaches, and sometimes my assumptions are limited by that. It's possible, I suppose, that WARP1 takes a linear weights approach, in which case the period adjustments would have to be built into WARP2, though their explanation of their adjustments does not imply that they are using a linear weights approach, at least not for fielding.
Understood. I phrased my last post poorly, using "linear weights" as the name of the approach. I should have written, "It's possible, I suppose, that WARP1 takes a Pete-Palmer-style approach and uses a single, all-time formula."
The seasonal adjustment is supposed to be 2/3 of the way from X to 162, right? If Joe Blow's team played 100 games, a full seasonal adjustment is 1.62, but going 2/3 of the way gets you to 141 games, or 1.41, right?
And that adjustment should be from WARP2 to WARP3, right?
Well, Bob Caruthers 1886 works out. WARP1=17.3. WARP2 (adjusted for difficulty) is -28.9% adjustment (the timeline values are a whole different subject, so let's just accept that for the moment) down to 12.3. St. Louis played 139 games that year. 162-139 x 2/3 = 154, so his seasonal adjustment is 154/139 or 1.05, and indeed 12.3 X 1.05 = 12.9, which is what his WARP3 value is. Great.
But take Jim McCormick 1880. WARP1=8.5 (now, forget for the moment that Caruthers 1886 was at ERA+ 148 in 387 IP while McC in '80 was only at 127 but in 657 IP, again, that's another story). WARP2 is -16.4% or 7.1. Now, Cleveland played 85 games. By my calc, his seasonal adjustment should be 1.6 (up 2/3 of the way to 162, i.e. 136, or 136/85=1.6). But no, his adjustment is 12.125%, from 7.1 to 8.0. Why? It seems to me he should be at 11.4.
Not only is this the right adjustment but 11.4 seems right compared to other highly rated pitcher seasons. It would have him slightly behind Caruthers who not only had that 148 but also hit OPS+ 196. But if you add in 387 innings played in the OF, Caruthers still spent only a little more than 100 more innings on the field than McCormick did. So a slight advantage for Caruthers (after the difficulty adjustment) seems right. A 12.9 to 8 advantage does not seem right (but of course, neither does a 17.3 to 8.5 advantage before the difficulty adjustment).
Anyway, back to my main question, aside from all the others. Why is McCormick's seasonal adjustment not 1.6? Thanks anybody.
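A minimal Python sketch of the "2/3 of the way to 162" reading described in this question, applied to the two quoted seasons. The function names are hypothetical, and treating the WARP2 step as a simple percentage of WARP1 is an assumption drawn from how the figures are quoted. Worked this way, 154.3/139 comes to about 1.11, and McCormick's factor is about 1.60, which reproduces the 11.4 estimate.

```python
def warp2_from_warp1(warp1, difficulty_pct):
    """Apply a WARP1 -> WARP2 difficulty adjustment given in percent (e.g. -28.9)."""
    return warp1 * (1 + difficulty_pct / 100)


def linear_season_factor(team_games, full_season=162, fraction=2 / 3):
    """'2/3 of the way' reading: stretch the schedule 2/3 of the way toward 162 games."""
    target_games = team_games + fraction * (full_season - team_games)
    return target_games / team_games


if __name__ == "__main__":
    # WARP1, difficulty percent, and team games as quoted in the comment
    seasons = [
        ("Caruthers 1886", 17.3, -28.9, 139),
        ("McCormick 1880", 8.5, -16.4, 85),
    ]
    for name, warp1, pct, games in seasons:
        warp2 = warp2_from_warp1(warp1, pct)
        factor = linear_season_factor(games)
        print(f"{name}: WARP2 {warp2:.1f}, factor {factor:.2f}, WARP3 {warp2 * factor:.1f}")
```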
The adjustment is not 2/3 of the way, but raised to the 2/3 power.
Adj Factor = (162/team games)**(2/3)
The ** seems to have confused some people. It means exponentiation (^), not multiplication (*).
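The reply's formula translates directly into Python, where `**` is the exponent operator. A minimal sketch; the function name and default arguments are hypothetical. Under this reading a 100-game schedule gets a factor of about 1.38 (versus 1.41 for the "2/3 of the way" reading), and an 85-game schedule about 1.54 (versus 1.60).

```python
def power_season_factor(team_games, full_season=162, exponent=2 / 3):
    """Adj Factor = (162 / team games) ** (2/3); ** is exponentiation."""
    return (full_season / team_games) ** exponent


if __name__ == "__main__":
    for games in (100, 139, 85):
        print(games, round(power_season_factor(games), 2))
```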
If you scan the discussion threads in the years before Spalding's election in 1906 and Galvin's in 1910, there are some discussions about WARP (in an earlier version), and our speculations on its calculations.
That is ridiculous, Andrew. I don't think anyone here can defend it with a straight face, either.
The problem in the assessment appears to lie in the adjustments that turn IP into XIP. There are two. One adjusts innings for decisions (some sort of "leveraged" inning concept, where leverage is inferred from decisions and saves). The other adjusts innings for defense-independent events (BB, HBP, K, HR), of which Browning had more than the usual share.
While these adjustments may be fine for a pitcher with a more "typical" distribution of decisions and DIP events, they apparently "blow up" in Browning's case, giving him 4.3 extra innings at his horrible rate and yielding nonsensical results. Thirty-seven Pitching Runs below Replacement is ridiculous; 6 or 7 at most would be more like it, with Pete taking the majority of the blame for the one loss he pitched in.
It looks like these adjustments need a sanity check for extreme cases; it doesn't mean they "don't work" at all. Win Shares can't handle bad teams; that doesn't mean that whole system is flawed beyond redemption (though some make it out that way).
I noticed that that happens for Burkett as well. My solution was to count PRAR for predominantly position players only if it aided their cause, for example GVH or Bobby Wallace. The results seem palatable.
MP
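MP's rule amounts to flooring PRAR at zero before adding it in. A minimal Python sketch, assuming the components are batting, fielding, and pitching runs above replacement (BRAR, FRAR, PRAR); the function name and the sample values are hypothetical.

```python
def combined_rar(brar, frar, prar):
    """Total runs above replacement for a predominantly position player.

    brar, frar, prar: batting, fielding, and pitching runs above replacement.
    Pitching runs are counted only when they help (are positive).
    """
    return brar + frar + max(prar, 0)


if __name__ == "__main__":
    # Hypothetical component values, for illustration only
    print(combined_rar(60, 20, -37))  # negative pitching ignored
    print(combined_rar(60, 20, 15))   # positive pitching counted
```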
Oh, oh.
Make that a double "oh, oh" from me. :-)
Except that ERA+ is not weighted, while WARP is. It seems the weighting mechanism needs to be revised for WARP.
Pete Browning's pitching is completely unrelated to his HoM case.
I agree with you in spirit, but I don't understand why WARP goes a little haywire with small sample sizes.
TPR, whatever problems it may have, never had a problem like this that I can remember.
756 Ruth (1/1)
I think most of us would agree that 1900 NL was the strongest league we've seen and will see for at least a while.
The regulars from 1900 played as follows (as regulars) in subsequent years:
1901: NL 61-32 AL
Wagner drops due to WARP's assessment of NL weakness in 1902-1910.
From page 33 of Win Shares:
"a rule which says that the Defensive Win Shares of a team cannot be less than .16375 per game played, nor more than .32375 per game played."
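The quoted rule is a clamp on each team's defensive Win Shares. A minimal Python sketch, assuming `raw_dws` is the team total that the rest of the Win Shares procedure produced before this check; the constant and function names are hypothetical, and only the two per-game limits come from the quote.

```python
# Per-game limits on a team's Defensive Win Shares, as quoted from Win Shares
MIN_DWS_PER_GAME = 0.16375
MAX_DWS_PER_GAME = 0.32375


def clamp_defensive_win_shares(raw_dws, games_played):
    """Force a team's defensive Win Shares into the allowed per-game band."""
    low = MIN_DWS_PER_GAME * games_played
    high = MAX_DWS_PER_GAME * games_played
    return min(max(raw_dws, low), high)


if __name__ == "__main__":
    # Hypothetical raw totals for a 154-game team
    for raw in (20.0, 35.0, 60.0):
        print(raw, round(clamp_defensive_win_shares(raw, 154), 2))
```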
Quick response to Casey re WS and 19th-century pitching value:
The evidence you've presented, showing WS calculations of teams' defensive WS through history, is not relevant to the problem. It _assumes_ that WS is representing defensive value correctly, and then uses the evidence that WS provides to prove the validity of WS. Circular logic is no good in this case.
To see the problem with WS (and it's not a fatal problem by any means -- I prefer WS to any other comprehensive metric, but I _adjust_ it where it's demonstrably inaccurate), you have to look not at the system's results, but at the way it reaches those results. We had a discussion of this matter on the 1920 ballot discussion thread, so I'm not going to re-hash the whole substance of that argument here. If you want to read it, check out posts 83, 92, 121, and 123-125. Here's a quick summary:
WS divides credit between pitchers and fielders based on a formula that assigns value to fielding events according to the extent to which teams deviate from the league average for that event. It sets the value of average by means of a constant. The constants in the formula are the same throughout history, regardless of the relative importance of the defensive event to defensive value. To figure runs created, James uses 19 (or something like that) specifically tailored formulas to reflect as accurately as possible the changes in offensive conditions that alter the relative value of hits, walks, and extra-base hits, as well as changes in the data available. Yet for figuring defensive value for runs prevented, he has only one formula. Is it any surprise that it becomes significantly inaccurate under different conditions? Its results on the team level look consistent because it normalizes all its results to the ratio of pitching value to fielding value typical of the post-1930 game.
He does address the historical differences, but rather than altering the constants in the formula to let the historical differences out, he introduces tweaks to the system to suppress the historical differences where they arise even with the normalized formula (altering the value of a passed ball when passed balls become very common, and, as jimd noted from a brilliant reading of the WS fine print, putting an absolute ceiling on the percentage of a team's win shares that can go to fielding; see post #31 above in this thread).
If you want to debate this matter further, I'd be happy to, but let us move the discussion over to the WARP3 thread, which was set up for discussion of the reliability of comprehensive metrics. I'll put a copy of this post there, so if you, or anyone else who wants to get into this discussion, want to reply, please do so there!
When Red Sox and Yankee fans used to debate Williams vs. DiMaggio, they'd always speculate about how each would do in the other's park. These are also "adjustments", just less "precise".
When I violently disagree with one of these mathematical opinions, I try to figure out why, so I can take that into account when using them to evaluate other players.
I hate having to be the BP defender around here, but I guess I will again. Do you know why all the numbers were adjusted? Because they assumed that anyone buying their book was also buying other books and therefore had access to the real numbers. This is nearly a 10-year-old argument (moot now, since they publish the real numbers), but I've never understood why anyone would be "dismayed" by it.
I'd rather watch and admire great players than argue over whether a player's "rating" should be raised or lowered by .068 percent because of conditions over which he had no control.
That's all fine and dandy until you start trying to put together a top 15 list. Then the adjustments suddenly become important. BP-style adjustments are completely unnecessary for being a fan of the game.