## Tuesday, December 02, 2003

#### WARP3’s systematic flaws with 19th Century players

I’m going to overhaul pennants added this week, hopefully (if you want to volunteer for some data entry, let me know!). That includes revising for the 2003 WARP numbers (I’ll leave the old ones up for reference) and adjusting for the overrating of fielding that comes from setting the replacement level too low.

One thing I noticed in the Prospectus glossary troubles me:

“FRAR
Fielding Runs Above Replacement. The difference between an average player and a replacement player is determined by the number of plays that position is called on to make. That makes the value at each position variable over time. In the all-time adjustments, an average catcher is set to 39 runs above replacement per 162 games, first base to 12, second to 34, third to 26, short to 38, center field to 30, left and right to 20.” (emphasis mine)

This means that WARP3 systematically overrates 2B and underrates 3B from this time period. It also underrates 1B, who were at least as valuable defensively as LF and RF, and probably a little more valuable.

If I had the time, I could adjust this defensive spectrum, and re-rate the fielding component for each player, but that ain’t happening any time soon.

What I’d suggest is this: using the same total of runs (39-38-34-30-26-20-20-12, which sums to 219), I’d apportion them this way pre-1930:

39 C, 38 SS, 34 3B, 26 2B, 25 CF, 21 1B, 19 LF, 17 RF.

You could tweak that to your own specifications, but I think I’m being pretty conservative; you could easily make the case that 1B was equal in value to CF, though I won’t.

Doing that, you get an adjustment (using a 9.7 R/W converter) of:

3B: 34-26 = 8; 8/9.7 = +.82 W/season
2B: 26-34 = -8; -8/9.7 = -.82 W/season
CF: 25-30 = -5; -5/9.7 = -.52 W/season
1B: 21-12 = 9; 9/9.7 = +.93 W/season
LF: 19-20 = -1; -1/9.7 = -.10 W/season
RF: 17-20 = -3; -3/9.7 = -.31 W/season
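
For anyone who wants to try different allocations, the arithmetic above can be sketched in a few lines of Python. This is only an illustration: the dictionary and function names are made up here, the baselines are the glossary figures and the proposed pre-1930 split, and 9.7 is the runs-per-win converter used above.

```python
# Glossary's all-time FRAR baselines (runs above replacement per 162 games).
ALL_TIME = {"C": 39, "1B": 12, "2B": 34, "3B": 26, "SS": 38, "LF": 20, "CF": 30, "RF": 20}

# Proposed pre-1930 reallocation of the same run total.
PRE_1930 = {"C": 39, "1B": 21, "2B": 26, "3B": 34, "SS": 38, "LF": 19, "CF": 25, "RF": 17}

RUNS_PER_WIN = 9.7  # converter used in the post


def win_adjustments(old, new, runs_per_win=RUNS_PER_WIN):
    """Wins per season to add at each position when moving from old to new baselines."""
    return {pos: (new[pos] - old[pos]) / runs_per_win for pos in old}


adjustments = win_adjustments(ALL_TIME, PRE_1930)

# The reallocation should leave the total number of runs unchanged.
print("totals:", sum(ALL_TIME.values()), sum(PRE_1930.values()))
for pos, wins in adjustments.items():
    print(f"{pos}: {wins:+.2f} W/season")
```

Because both allocations use the same run total, the adjustments sum to zero across the eight positions; swapping in your own pre-1930 numbers only requires editing `PRE_1930`.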

I think that if you do this, you’re going to come out with rankings much more in line with the generally accepted historical rankings, especially with players like Jimmy Collins and Bid McPhee.

I’m not saying that WARP should be eliminated from the tool box when discussing 19th Century players. Just that it needs to be tweaked somewhat.

The mistake that Prospectus is making is that they think (implied by their formulas) that if Jimmy Collins were playing in 1990 he’d be a 3B. But most likely he’d have played 2B, because that’s where the more skilled fielder played after 1930. That’s the only way to explain how 3B outhit 2B after 1945, and 2B outhit 3B before 1920. From 1920-42 they were about even (who cares about 1943-45 :-)

1. jimd Posted: December 02, 2003 at 06:52 PM (#519651)
I think a case can be made that early Catchers are also underrated by WARP.

-------- SO's C-SO Infd Outf -- SO's C-SO Infd Outf
2. Marc Posted: December 02, 2003 at 08:34 PM (#519653)
TomH, just put Dahlen and Davis (or Davis and Dahlen) at the top and you know that can't be baaaad.
3. tangotiger Posted: December 03, 2003 at 04:51 PM (#519655)
Joe, interesting discussion. I imagine you guys have discussed the positional impact of the 1B, which I suppose would be similar to a weak league that needs a good 1B to scoop balls from poor fielders.

Can you point me to a thread where you had/have this discussion?

I also agree with Joe's viewpoint that it's rather silly to apply the same fielding standards to each position across time. In weak leagues, whether it's the start of baseball or today in high school, the 3B is a much better fielder than a 2B.

A lot of that, in HS, is because of the much greater # of RH hitters. Taking a quick peek at the Lahman DB, it looks like the # of RH hitters was over 70% before the turn of the century, followed by a steady drop. From 1999-2002, I can tell you it's at 58%.

This might also affect the way you think of pitchers. A LP has a tremendous advantage in the old days, compared to say RJ today, where a manager can stack the lineup so that he faces 50% LH.

4. OCF Posted: December 03, 2003 at 05:58 PM (#519656)
> This might also affect the way you think of pitchers. A LP has a tremendous advantage in the old days, compared to say RJ today, where a manager can stack the lineup so that he faces 50% LH.

Tango - isn't this backwards? Shouldn't a LHP be better off today than he would have been facing all those RH hitters in the 19th century?
5. tangotiger Posted: December 03, 2003 at 06:35 PM (#519657)
My head was definitely not screwed on straight when I wrote that. I'll take a mulligan.

You are correct. A RP is disadvantaged today, because of the enormous # of LH he'd face. (I'm reading it for 2 minutes now, just to make sure I didn't mess up again.)
6. Chris Cobb Posted: December 03, 2003 at 06:38 PM (#519658)
> Joe, interesting discussion. I imagine you guys have discussed the positional impact of the 1B, which I suppose would be similar to a weak league that needs a good 1B to scoop balls from poor fielders.
>
> Can you point me to a thread where you had/have this discussion?

The discussion of positional impact of 1B, such as it is, appears, I think, in the 1913 ballot discussion thread, since that's the year Beckley became eligible. But your inference, Tango, that the positional impact arises from the need for someone to scoop balls from poor fielders, is not one that we had discussed. The main arguments for greater 1B defensive value have been that first basemen in this era needed greater range and a better throwing arm because of all the bunts. There's some discussion of that issue in the 1913 thread.
7. MattB Posted: December 03, 2003 at 08:28 PM (#519659)
Right. The discussion also has been scattered around discussions of Joe Start, who made the HoM despite not being an offensive powerhouse. Part of that was pre-NA experience, but some of it was in comparing him to other first basemen of the time. If the 1Bs aren't leading their leagues in offense, then it is extrinsic evidence that their defensive roles must have been greater.
8. Paul Wendt Posted: December 04, 2003 at 05:07 AM (#519661)
Tangotiger:
9. Paul Wendt Posted: December 04, 2003 at 05:12 AM (#519662)
jimd:
10. Marc Posted: December 04, 2003 at 03:02 PM (#519664)
Joe, if I want to use these adjustments:

> 3B: 34-26 = 8; 8/9.7 = +.82 W/season
11. Paul Wendt Posted: December 13, 2003 at 12:58 AM (#519666)
Marc and I suspect that WARP2 stumbles over one or two of these three 1900s outfielders. Can anyone explain the markedly different adjustments?

WARP1; 2; 3; Name (net adjustment, WARP1 to WARP3)
12. Paul Wendt Posted: December 13, 2003 at 01:19 AM (#519667)
Marc and I suspect that WARP2 stumbles over one or more of these three 1900s outfielders. Can anyone explain the markedly different adjustments?

WARP1; 2; 3; Name (net adjustment, WARP1 to WARP3)
13. ronw Posted: December 13, 2003 at 02:37 AM (#519668)
More individual numbers for the three. How is there a different league adjustment number for the same league?

Flick NL 1898 W1-9.7 W3-8.9 - 8% down
14. Chris Cobb Posted: December 13, 2003 at 03:03 AM (#519669)
Paul and Marc,

I think you're looking for WARP's love in the wrong places. If you look at the WARP2 adjustments for BRAR and FRAR, you'll see that the big losses for Magee and Sheckard are in fielding runs. They are left fielders, who in that period were a lot closer to CF in defensive value than they were to RF. When WARP2 normalizes for all time, LF defense is prorated to equal RF defense in value. So Sheckard and Magee, the leftfielders, drop off a lot in WARP2, while Flick and Keeler, who were rightfielders, lose less.

It's basically the same adjustment principle that leads to early third-basemen being underrated in WARP2 & 3.

Here's their career FRAR adjustment in WARP2

Keeler 323 --> 220
15. Marc Posted: December 13, 2003 at 11:35 PM (#519670)
Chris, let me see if I understand. RF of the period in question already have low defensive values in WARP1, so they are not adjusted downward very much for all-time. LF and CF have relatively more value in the period in question, and so when adjusted for all time have a lot of value to lose, and indeed do lose it. Is that right?

But why? Is it because replacement level is assumed to be artificially low in their time? And they are being compared to an average replacement level over time? Is that also right? Does this make sense? It's sort of like saying that we'll pretend they didn't do what they did, (we'll pretend they didn't really contribute what they did contribute toward their team's success) because other players didn't have the same opportunity to do the same things. Would that be one way to characterize the logic here?

Or to put it another way, this seems to assume that--a pennant is NOT a pennant.
16. Paul Wendt Posted: December 13, 2003 at 11:55 PM (#519671)
Regarding Flick v Sheckard, Magee, and Bobby Veach:
17. Paul Wendt Posted: December 14, 2003 at 12:07 AM (#519672)
18. Chris Cobb Posted: December 14, 2003 at 02:28 AM (#519673)
Let me try to clarify what I have tried to point out by quoting the passage from the WARP glossary that Joe quoted when starting this thread:

"FRAR
19. Marc Posted: December 14, 2003 at 06:00 PM (#519675)
Chris and/or Joe, would it be reasonable in light of what you have so patiently tried to explain in this thread to simply use an adjWARP1? The topic of this thread, after all, is WARP3's systematic problems with fielding (a fact which I had sort of forgotten as we got further down the thread).

I don't use WARP3 for the simple reason that I do not timeline. A pennant is a pennant and value toward that pennant is value. I really don't care what Pete Browning could/should/would have accomplished if he had been plopped down in 1910, 1949, 1994 or any other time. I DO use an AA discount on adjWARP1. And of course, a person could apply whatever timeline they might want to apply to adjust WARP1, if they wanted to do so.

I do understand the problem of comparing Joe Kelley and Charlie Keller using adjWARP1. Is it Keller's fault that he did not have the opportunity to contribute to the extent that Kelley did? No. But to give Keller credit for what he would have done if he had had the opportunity does not seem like a solution. If Keller were as good a fielder as Joe Kelley, maybe he would have played CF or RF and then, perhaps, he would have had the opportunity. Both played in a 27-out context and their managers used each in a team context to the best of their ability to contribute to those 27 outs. In that sense, they had the same opportunity (of course, there's the little matter of pitcher K's).

But I don't understand the reasoning, ultimately, for pretending that either did more or less than what they did. That, it seems to me, is to reduce the question from a "value" question to a "tools" question. What did Kelley and Keller have the "tools" to do in an average environment? That seems to be what the question becomes with the WARP3 adjustment.

But anyway, back to my basic question, is using an adjWARP, with whatever adjustments you personally happen to favor, a "solution" to the adjustments for all-time problems?
20. Chris Cobb Posted: December 14, 2003 at 07:35 PM (#519676)
> Is using an adjWARP1, with whatever adjustments you personally happen to favor, a "solution" to the adjustments for all-time problems?

Yes, I'd say it is a reasonable solution, if you think that adjustments to an average environment reduces a value question to a tools question. Myself, I think it does, so insofar as I use WARP, I place more weight on WARP1.

There remain level-of-competition adjustments for which WARP2-3 might provide valuable guidance, but I haven't yet begun to really study how the system makes those kinds of adjustment. I hope to learn, though!
21. Marc Posted: December 15, 2003 at 03:37 PM (#519679)
To me it's a lot easier to start with WARP1 and add in a "timeline" (similar to same for all contemporaneous position players, same for all contemporaneous pitchers) than to use WARP3 and add in separate adjustments for each position. (My "timeline" basically consists of the pitching discount and fielding bonus pre-'93. Not a real timeline.) But anyway the point is WARP1 is a lot easier jumping off point.
22. Paul Wendt Posted: December 16, 2003 at 12:22 AM (#519680)
Jim Spencer (#27)
23. Paul Wendt Posted: December 16, 2003 at 12:40 AM (#519681)
Suppose any particular measure of outfielding value; eg, FRAR, the adjusted version used in WARP2, or the adjusted version JoeDimino proposes for the first five MLB decades.

In principle, one can tabulate measured LF-fielding and RF-fielding for every MLB team-season: the collective value, skill, etc., depending on interpretation of the measure. In turn, that table can be the object of analysis. The three example tables for 1890-1909 can jointly be used to study the Davenport adjustment and the Dimino adjustment applied to that time period.

TomH, I like your article in the current "By The Numbers". Can you do this article next?
24. Paul Wendt Posted: December 16, 2003 at 12:47 AM (#519682)
Chris Cobb (#23)
25. Chris Cobb Posted: December 16, 2003 at 02:31 AM (#519683)
> It might be considered desirable to use a single all-time formula for WARP1. Compare Batting Runs: Pete Palmer's linear weights applied to batting, in which a home run or stolen base has the same coefficient (value) in every MLB team-season. A segmented formula, such as Bill James uses everywhere, is undesirable in some ways.

Thanks, Paul, for explaining a reasonable alternative approach. I am not as familiar with linear weights and the rest of Pete Palmer's work as I should be. My thinking about sabermetrics has been largely shaped by Jamesian approaches, and sometimes my assumptions are limited by that. It's possible, I suppose, that WARP1 takes a linear weights approach, in which case the period adjustments would have to be built into WARP2, though their explanation of their adjustments does not imply that they are using a linear weights approach, at least not for fielding.
26. Paul Wendt Posted: December 16, 2003 at 05:45 PM (#519684)
Chris,
27. Chris Cobb Posted: December 16, 2003 at 06:13 PM (#519685)
Paul,

Understood. I phrased my last post poorly, using "linear weights" as the name of the approach. I should have written, "It's possible, I suppose, that WARP1 takes a Pete-Palmer-style approach and uses a single, all-time formula."
30. Marc Posted: January 04, 2004 at 04:44 AM (#519688)
Another question about the new WARP. I have not adopted it yet because of the various questions. Newest one.

The seasonal adjustment is supposed to be 2/3 of the way from X to 162, right? If Joe Blow's team played 100 games, a full seasonal adjustment is 1.62, but 2/3 is to 141 or 1.41, right?

And that adjustment should be from WARP2 to WARP3, right?

Well, Bob Caruthers 1886 works out. WARP1=17.3. WARP2 (adjusted for difficulty) is -28.9% adjustment (the timeline values are a whole different subject, so let's just accept that for the moment) down to 12.3. St. Louis played 139 games that year. 139 + (162-139) x 2/3 = 154, so his seasonal adjustment is 154/139 or 1.05, and indeed 12.3 X 1.05 = 12.9, which is what his WARP3 value is. Great.

But take Jim McCormick 1880. WARP1=8.5 (now, forget for the moment that Caruthers 1886 was at ERA+ 148 in 387 IP while McC in '80 was only at 127 but in 657 IP, again, that's another story). WARP2 is -16.4% or 7.1. Now, Cleveland played 85 games. By my calc, his seasonal adjustment should be 1.6 (up 2/3 of the way to 162, i.e. 136, or 136/85=1.6). But no, his adjustment is 12.125%, from 7.1 to 8.0. Why? It seems to me he should be at 11.4.

Not only is this the right adjustment but 11.4 seems right compared to other highly rated pitcher seasons. It would have him slightly behind Caruthers who not only had that 148 but also hit OPS+ 196. But if you add in 387 innings played in the OF, Caruthers still spent only a little more than 100 more innings on the field than McCormick did. So a slight advantage for Caruthers (after the difficulty adjustment) seems right. A 12.9 to 8 advantage does not seem right (but of course, neither does a 17.3 to 8.5 advantage before the difficulty adjustment).

Anyway, back to my main question, aside from all the others. Why is McCormick's seasonal adjustment not 1.6? Thanks anybody.
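
To make the question concrete, here is a minimal Python sketch of the "2/3 of the way from X to 162" reading, set against the factors implied by the WARP2 and WARP3 values quoted above. The function names are illustrative only and nothing here is taken from Prospectus's actual code. One thing it surfaces: under this reading 154/139 works out to about 1.11 rather than 1.05, so the linear reading overshoots the implied factor for both seasons.

```python
FULL_SEASON = 162


def linear_target(games, share=2 / 3):
    """Schedule length that lies `share` of the way from `games` to 162."""
    return games + share * (FULL_SEASON - games)


def linear_factor(games, share=2 / 3):
    """Seasonal multiplier under the 'part of the way to 162' reading."""
    return linear_target(games, share) / games


def implied_factor(warp2, warp3):
    """Multiplier actually implied by a published WARP2/WARP3 pair."""
    return warp3 / warp2


# (label, team games, WARP2, WARP3) as quoted in the post.
seasons = [
    ("Caruthers 1886", 139, 12.3, 12.9),
    ("McCormick 1880", 85, 7.1, 8.0),
]

for label, games, warp2, warp3 in seasons:
    print(
        f"{label}: target {linear_target(games):.1f} games, "
        f"linear factor {linear_factor(games):.3f}, "
        f"implied factor {implied_factor(warp2, warp3):.3f}"
    )
```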
32. RobC Posted: January 04, 2004 at 06:50 PM (#519690)
Marc,

The adjustment is not 2/3 of the way, but raised to the 2/3 power.

The ** seems to have confused some people. That is a ^ not a *.
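
A quick sketch of the exponent reading, in Python (where `**` is also the power operator). The post does not spell out which quantity gets raised to the 2/3 power; the assumption here is that it is the ratio 162/G, and the function name is made up for illustration.

```python
def power_factor(games, full_season=162, exponent=2 / 3):
    """Seasonal multiplier assuming (full_season / games) ** exponent."""
    return (full_season / games) ** exponent


for games in (85, 100, 139, 162):
    print(f"{games} games: factor {power_factor(games):.3f}")
```

A full 162-game schedule gives a factor of exactly 1, and shorter schedules are scaled up by less than the straight 162/G ratio.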
33. Paul Wendt Posted: January 13, 2004 at 12:05 AM (#519691)
There may be no "flaw" here.
34. Paul Wendt Posted: January 13, 2004 at 12:39 AM (#519692)
old and new WARP3:
35. Paul Wendt Posted: January 13, 2004 at 03:52 PM (#519693)
I asked Clay Davenport about the revision, especially the impact of including pitcher fielding in WARP (large) and the big gains for Walsh and White in particular. He noted that there is a bug in pitcher ratings to be corrected as soon as this weekend. Pete Browning will gain. I suppose that George Van Haltren will lose.
36. jimd Posted: January 13, 2004 at 06:28 PM (#519699)
I'd like to echo Joe's sentiments. ;-)
37. jimd Posted: January 30, 2004 at 07:12 PM (#519703)
For WARP numbers, go to www.baseballprospectus.com. In the upper right corner, you'll see a box labeled "Player Search". Type in your favorite player, click "Find Player", and then read numbers to your heart's content. Clicking through the column headings takes you to a document giving way-too-brief descriptions of the stats. Have fun.
38. jimd Posted: January 30, 2004 at 11:16 PM (#519705)
VORP and WARP are two different measuring systems. VORP is the older, and IIRC is an offensive measure which factors in defense only by giving an offensive debit/credit based on the position played and how well players at that position hit as a group. (Maybe there was a later VORP that expanded on defense, I don't remember.) To my knowledge there is no historical reference collection for VORP, like there is for WARP at BP and for Win Shares in James' book.

If you scan the discussion threads in the years before Spalding's election in 1906 and Galvin's in 1910, there are some discussions about WARP (in an earlier version), and our speculations on its calculations.
39. jimd Posted: January 30, 2004 at 11:23 PM (#519706)
Also, if you've got any of the annuals published by Baseball Prospectus the past few years, some of them contain articles on WARP or EQA (Davenport's offensive measure) or his fielding measures or the "league quality" adjustments, which are probably similar to Davenport Translations, his version of MLE. Some of this stuff may also be available at the baseballprospectus site if you hunt around.
40. John (You Can Call Me Grandma) Murphy Posted: February 07, 2004 at 10:34 PM (#519709)
> In 1884, Pete Browning faced five batters as a pitcher. He got one out and allowed three runs, two earned. According to WARP, his pitching performance was 37 runs below replacement level. What kind of new math produces that kind of idiocy? Is it a typo or a systemic problem? And isn't that all the evidence we need to junk the whole system?

That is ridiculous, Andrew. I don't think anyone here with a straight face can defend it, either.
41. Devin has a deep burning passion for fuzzy socks Posted: February 09, 2004 at 03:30 AM (#519710)
Andrew - I think that was supposed to be the "adjustment" that was mentioned a few weeks back by Paul Wendt. I had e-mailed Davenport pointing it out, he said that it was caused by a flaw in the system, that was essentially looking at Browning like he was a modern relief pitcher, and that he would fix it shortly. Of course, it still hasn't happened, and it's been at least a month.
42. jimd Posted: February 11, 2004 at 06:35 PM (#519711)
Let's take a closer look at that start by Browning. It was an execrable performance; he faced five batters and got none out (2 hits, 2 walks, 1 HBP). One baserunner was retired on some fielding play (hence the 1/3 IP). Three runs scored, two of them earned, putting the team in a hole they didn't climb out of, with Browning earning the loss.

The problem in the assessment appears to lie in the adjustments that turn IP into XIP. There are two. One adjusts innings for decisions (some sort of a "leveraged" inning concept, where these are inferred from decisions and saves.) The other adjusts innings for Defense-Independent events (BB, HBP, K, HR), of which Browning had more than a usual share.

While the impact of these adjustments may be fine with a pitcher with a more "typical" distribution of decisions and DIP events, they apparently "blow up" in Browning's case, giving him 4.3 extra innings at his horrible rate, yielding the nonsense results. 37 Pitching Runs below Replacement is ridiculous; 6 or 7 tops would be more like it, Pete taking the majority of the blame for the one loss he pitched in.

It looks like these adjustments need a sanity check for extreme cases; it doesn't mean they "don't work" at all. Win Shares can't handle bad teams; that doesn't mean that whole system is flawed beyond redemption (though some make it out that way).
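
A rough Python sketch of why the extra innings matter so much. It assumes the rating is simply (runs allowed per 9 minus a replacement rate) times credited innings over 9; that model, the function name, and the replacement level of 6.0 runs per 9 are all hypothetical stand-ins, not Davenport's actual formula. Only the 3 runs, the 1/3 inning, and the 4.3 extra innings come from the discussion above.

```python
def runs_below_replacement(runs_allowed, innings, credited_innings, replacement_ra9):
    """Runs below replacement if the observed rate is applied to credited innings."""
    ra9 = 9 * runs_allowed / innings
    return (ra9 - replacement_ra9) * credited_innings / 9


ACTUAL_IP = 1 / 3        # one out recorded
EXTRA_IP = 4.3           # extra innings added by the XIP adjustments, per the post
REPLACEMENT_RA9 = 6.0    # hypothetical replacement level, for illustration only

raw = runs_below_replacement(3, ACTUAL_IP, ACTUAL_IP, REPLACEMENT_RA9)
inflated = runs_below_replacement(3, ACTUAL_IP, ACTUAL_IP + EXTRA_IP, REPLACEMENT_RA9)

print(f"actual innings only: {raw:.1f} runs below replacement")
print(f"with XIP innings:    {inflated:.1f} runs below replacement")
print(f"inflation ratio:     {inflated / raw:.1f}x")
```

Whatever replacement level is plugged in, the ratio between the two results depends only on the innings, so crediting 4.3 extra innings multiplies the penalty roughly fourteen-fold.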
43. Max Parkinson Posted: February 11, 2004 at 07:23 PM (#519712)
Jim,

I noticed that that happens for Burkett as well. My solution was to only count PRAR for predominantly position players if it aided their cause, for example GVH or Bobby Wallace. The results seem palatable.

MP
44. Marc Posted: February 11, 2004 at 08:07 PM (#519713)
>he said that it was caused by a flaw in the system, that
45. John (You Can Call Me Grandma) Murphy Posted: February 11, 2004 at 08:17 PM (#519714)
So as we approach the modern era in about 50 years, a relief pitcher who allows 3 runs in 1/3 of an inning will get a -37 PRARP?

Oh, oh.

Make that a double "oh, oh" from me. :-)
46. RobC Posted: February 12, 2004 at 04:05 PM (#519715)
era+ has the same problem with a 3 run/.3 inning season. An era+ of 5? It's a scientific computing problem: when dividing by really small numbers, wonky things tend to happen. The proper response is to ignore results due to ludicrously small sample size.
47. John (You Can Call Me Grandma) Murphy Posted: February 12, 2004 at 05:15 PM (#519716)
> era+ has the same problem with a 3 run/.3 inning season. An era+ of 5?

Except that ERA+ is not weighted, while WARP is. It seems the weighting mechanism needs to be revised for WARP.
48. RobC Posted: February 12, 2004 at 05:37 PM (#519717)
The "weighting mechanism" should be something like "if <10 ip for the season, dont bother weighting" or even better "users of this data should turn on their brain and ignore results caused by small sample size". I hope no one is using ANY number in a vacuum.

Pete Browning's pitching is completely unrelated to his HoM case.
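
As a sketch of what that kind of guard could look like in Python: the function below computes a simplified ERA+ (100 times league ERA over ERA, with no park adjustment) and declines to rate anything under a minimum innings threshold. The function name and the league ERA of 3.00 are assumptions for illustration; the 10-inning cutoff is the one suggested above.

```python
def era_plus(earned_runs, innings, league_era, min_innings=10.0):
    """Simplified ERA+ (no park factor); returns None below the innings threshold."""
    if innings < min_innings:
        return None  # sample too small to say anything useful
    era = 9 * earned_runs / innings
    if era == 0:
        return float("inf")
    return 100 * league_era / era


LEAGUE_ERA = 3.00  # hypothetical league ERA, for illustration only

# Two earned runs in one-third of an inning, with and without the guard.
print(era_plus(2, 1 / 3, LEAGUE_ERA, min_innings=0))
print(era_plus(2, 1 / 3, LEAGUE_ERA))
```

With the guard switched off the result lands in the single digits, which is the kind of number the post is warning about; with it on, the season is simply left unrated.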
49. John (You Can Call Me Grandma) Murphy Posted: February 12, 2004 at 06:18 PM (#519718)
> The "weighting mechanism" should be something like "if <10 ip for the season, dont bother weighting" or even better "users of this data should turn on their brain and ignore results caused by small sample size". I hope no one is using ANY number in a vacuum.

I agree with you in spirit, but I don't understand why WARP goes a little haywire with small sample sizes.

TPR, whatever problems it may have, never had a problem like this that I can remember.
50. jimd Posted: February 18, 2004 at 12:43 AM (#519719)
Top 25 Win Shares (WS rank/WARP-3 rank)

756 Ruth (1/1)
51. jimd Posted: February 18, 2004 at 03:31 AM (#519720)
Pertaining to the AL vs NL quality debate during the early years (as manifested in WARP-3):

I think most of us would agree that 1900 NL was the strongest league we've seen and will see for at least a while.

The regulars from 1900 played as follows (as regulars) in subsequent years:

1901: NL 61-32 AL
52. jimd Posted: February 18, 2004 at 03:32 AM (#519721)
Well, except for 1901, of course.
53. Chris Cobb Posted: February 18, 2004 at 03:56 AM (#519722)
I couldn't say yet how these lists fit with my top 25 careers, but I do note that several of the divergences in ratings near the top occur where we might expect them. Going from WS to WARP --

Wagner drops due to WARP's assessment of NL weakness 1902-1910.
54. DanG Posted: February 18, 2004 at 04:17 AM (#519723)
jimd:
55. RobC Posted: February 18, 2004 at 03:23 PM (#519724)
Just eyeballing the list I would say I agree with Warp's top 15 and WS 16-25. At least according to the "how they rank inside my head" methodology.
56. jimd Posted: February 23, 2004 at 09:55 PM (#519725)
I originally posted this on the 1921 Discussion, but it is probably better discussed here, on the "integrated-numerical-rating-system" thread.

From page 33 of Win Shares:

"a rule which says that the Defensive Win Shares of a team cannot be less than .16375 per game played, nor more than .32375 per game played."
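
In code terms the quoted rule is a clamp. A minimal Python sketch, using the two per-game limits from the quote; the function and constant names are made up here, and how the raw defensive total is computed upstream is outside the sketch.

```python
MIN_DEF_WS_PER_GAME = 0.16375
MAX_DEF_WS_PER_GAME = 0.32375


def clamp_defensive_win_shares(raw_defensive_ws, games):
    """Limit a team's defensive Win Shares to the per-game floor and ceiling."""
    low = MIN_DEF_WS_PER_GAME * games
    high = MAX_DEF_WS_PER_GAME * games
    return max(low, min(raw_defensive_ws, high))


# Over a 140-game schedule the allowed range is roughly 22.9 to 45.3.
for raw in (10.0, 30.0, 60.0):
    print(raw, "->", round(clamp_defensive_win_shares(raw, 140), 3))
```

Any team whose raw total falls outside the band comes out pinned to the limit, so differences beyond the ceiling or below the floor are invisible in the final numbers.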
57. jimd Posted: February 23, 2004 at 10:09 PM (#519726)
Another example is that the Boston Beaneaters were capped at ".32375 per game played" every season from 1889 to 1901 (except for 1894).
58. Chris Cobb Posted: February 27, 2004 at 04:37 AM (#519727)
A copy of a post I made on the 1921 ballot discussion thread:

Quick response to Casey re WS and 19th-century pitching value:

The evidence you've presented, showing WS calculations of team's defensive WS through history is not relevant to the problem. It _assumes_ that WS is representing defensive value correctly, and then uses the evidence that WS provides to prove the validity of WS. Circular logic is no good in this case.

To see the problem with WS (and it's not a fatal problem by any means -- I prefer WS to any other comprehensive metric, but I _adjust_ it where it's demonstrably inaccurate), you have to look not at the system's results, but at the way it reaches those results. We had a discussion of this matter on the 1920 ballot discussion thread, so I'm not going to re-hash the whole substance of that argument here. If you want to read it, check out posts 83, 92, 121, and 123-125. Here's a quick summary:

WS divides credit between pitchers and fielders based on a formula that assigns value to fielding events based on the extent to which teams deviate from the league average for that event. It sets the value of average by means of a constant. The constants in the formula are the same throughout history, regardless of the relative importance of the defensive event to defensive value. To figure runs created, James uses 19 (or something like that) specifically tailored formulas to reflect as accurately as possible the changes in offensive conditions that alter the relative value of hits, walks, extra base hits, as well as in the data available. Yet for figuring defensive value for runs prevented, he has only one formula. Is it any surprise that it becomes significantly inaccurate under different conditions? Its results on the team level look consistent because it normalizes all its results to the ratio of pitching value to fielding value typical of the post-1930 game.

He does address the historical differences, but rather than altering the constants in the formula to let the historical differences out, he introduces tweaks to the system to suppress the historical differences where they arise even with the normalized formula (altering the value of a passed ball when passed balls become very common, and, as jimd noted from a brilliant reading of the WS fine print, putting an absolute ceiling on the percentage of a team's win shares that can go to fielding. See post #31 above in this thread.)

If you want to debate this matter further, I'd be happy to, but let us move the discussion over to the WARP3 thread, which was set up for discussion of the reliability of comprehensive metrics. I'll put a copy of this post there, so if you, or anyone else who wants to get into this discussion, want to reply, please do so there!
59. jimd Posted: February 27, 2004 at 06:16 PM (#519730)
Exactly. Always keep these ratings in perspective. They are just mathematical opinions.

When RedSox and Yankee fans used to debate Williams vs Dimaggio, they'd always speculate how each would do in the other's park. These are also "adjustments", just less "precise".

When I violently disagree with one of these mathematical opinions, I try to figure out why, so I can take that into account when using them to evaluate other players.
60. Marc Posted: February 27, 2004 at 06:53 PM (#519731)
Casey, yes, absolutely, I question WARP's "difficulty" adjustments. The deadball NL gets hammered. Not to mention the 19th century, at least pre-'93. I think the difficulty adjustments generally are too steep and should be reduced to their square roots.
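
One possible reading of "reduced to their square roots", sketched in Python: treat each difficulty adjustment as a multiplier (a 28.9% discount is a multiplier of 0.711) and take the square root of that multiplier. This interpretation is an assumption, as is the function name; the two sample discounts are the Caruthers 1886 and McCormick 1880 figures quoted earlier in the thread.

```python
import math


def softened_discount(discount_pct):
    """Discount (in percent) after taking the square root of the multiplier."""
    multiplier = 1 - discount_pct / 100
    return (1 - math.sqrt(multiplier)) * 100


for label, discount in [("Caruthers 1886", 28.9), ("McCormick 1880", 16.4)]:
    print(f"{label}: {discount:.1f}% -> {softened_discount(discount):.1f}%")
```

Under this reading each discount shrinks to a bit more than half its original size, and a season with no discount stays untouched.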
61. RobC Posted: February 27, 2004 at 07:28 PM (#519732)
> I bought a Baseball Prospectus a few years ago but was dismayed to see that every statistic in the book was adjusted. That's a good sidebar, but a dose of reality would have been nice. I haven't bought one since.

I hate having to be the BP defender around here, but I guess I will again. Do you know why all the numbers were adjusted? Because they assumed that anyone buying their book was also buying other books and therefore had access to the real numbers. This is nearly a 10 year old argument (moot now, since they publish the real numbers), but I've never understood why anyone would be "dismayed" by it.

> I'd rather watch and admire great players than argue whether his "rating" should be raised or lowered by .068 percent because of conditions which he had no control over.

That's all fine and dandy until you start trying to put together a top 15 list. Then the adjustments suddenly become important. BP style adjustments are completely unnecessary for being a fan of the game.

