Baseball for the Thinking Fan


Hall of Merit
— A Look at Baseball's All-Time Best

Monday, October 11, 2004

Battle of the Uber-Stat Systems (Win Shares vs. WARP)!

Don’t ever say that I never gave you anything! :-)

John (You Can Call Me Grandma) Murphy Posted: October 11, 2004 at 02:46 PM | 366 comment(s)

Reader Comments and Retorts



   1. PhillyBooster Posted: October 11, 2004 at 04:06 PM (#909829)
Let's see if I can start:

Win Shares:
-- Replacement Level is too low
WARP
-- Defensive replacement level is too low

Win Shares:
-- Give too much credit for "luck" by using actual wins
WARP:
-- Based upon components. Does not take into account what actually happened.

Win Shares:
-- Point allocations for defensive value seem arbitrary
WARP:
-- Point allocations for defensive value seem inscrutable

Win Shares:
-- Pitching/defense allocations are arbitrary
WARP:
-- Interleague adjustments are arbitrary

Win Shares:
-- Numbers are obviously flawed in 10-20% of cases, but hard to figure out which ones
WARP:
-- Numbers may be perfect or completely wrong, since the formula is unavailable.

WARP:
-- How can you trust a formula that keeps changing every year?
Win Shares:
-- How can you trust a formula that never changes, even after you point out its flaws?

Conclusion: When in doubt, vote for the guy who'll look cutest on his plaque.
   2. DavidFoss Posted: October 11, 2004 at 04:15 PM (#909846)
When in doubt, vote for the guy who'll look cutest on his plaque.

Super Grover!!!

:-)

OK... well, I think we are all in agreement that there are too many flaws and too many issues involved to use any one number for ranking players.

Any advice on how to use WS & WARP together or in conjunction with other numbers would be greatly appreciated.

Arguments such as WS overrates these types of players, yet it still rates player A above player B can be quite useful.
   3. John (You Can Call Me Grandma) Murphy Posted: October 11, 2004 at 04:19 PM (#909850)
Win Shares is more malleable than WARP, IMO. While it is certainly not perfect and has undeniable flaws, I can correct these problems with WS. WARP is not nearly as flexible.

I also hate with a passion negative numbers credited to players. Even Tony Suck should be credited with something like .000000000000000000000000000000000000000000000000000001 rather than a negative number. The idea that a player who played one game could theoretically have a higher WARP rating than another player with a long career is beyond silly.

Conclusion: When in doubt, vote for the guy who'll look cutest on his plaque.

Is there something we should know about you? :-D
   4. John (You Can Call Me Grandma) Murphy Posted: October 11, 2004 at 04:21 PM (#909852)
I should point out that I do utilize BRAR, FRAR and PRAR for the NA, so BP does have its place in my system.
   5. smileyy Posted: October 11, 2004 at 04:38 PM (#909877)
The idea that a player who played one game could theoretically have a higher WARP rating than another player with a long career is beyond silly.

Why, if that player with the long career performed at below replacement value his entire career?

I've worked with plenty of programmers whose contribution to the team has been a net negative.
   6. PhillyBooster Posted: October 11, 2004 at 05:57 PM (#910038)
Is there something we should know about you? :-D


Just a firm belief that aesthetics is an important component of any ranking system. Ideally, there would be some sort of melding of sabermetrics and "Hot or Not".
   7. mommy Posted: October 11, 2004 at 06:30 PM (#910110)
"Why, if that player with the long career performed at below replacement value his entire career?"

do you really think players have long careers of being below replacement level?

i'm sure some players hang around for a season or 3 of below whatever you would calculate replacement level as being.

but a long career? 10+ seasons of being worse than a readily available piece-of-#### replacement? i can't see it, i don't think any front office is really quite that stupid.

can you name someone who would fall in this category?
   8. Chris Cobb Posted: October 11, 2004 at 06:36 PM (#910122)
can you name someone who would fall in this category?

Well, the poster-child for negative batting runs among position players is Bill Bergen. I don't know of any player who was a regular for several years who earns negative WARP for his career.
   9. Jim Sp Posted: October 11, 2004 at 07:34 PM (#910234)
Can someone summarize what we really know about WARP?
   10. Slinger Francisco Barrios (Dr. Memory) Posted: October 11, 2004 at 07:42 PM (#910249)
do you really think players have long careers of being below replacement level?

As long as hitters have been assessed based on their BA, sure, it was at least theoretically possible. Or as long as a player was viewed as having a glove that balanced his offensive shortcomings, it was at least theoretically possible, particularly if the glove wasn't necessarily all that great.

As a practical example, how about Aurelio Rodriguez? I've never seen a rating system more sophisticated than FA that liked his glove, he made outs at an astounding rate, yet he played almost 2000 games at third.
   11. robc Posted: October 11, 2004 at 07:47 PM (#910259)
Jim,

Let's see...
We know how EQA is calculated. We know how RAA is calculated from EQA. Same for RAR, but probably not RARP.

Fielding Runs were explained in a BP a few years ago. Not sure they are using the EXACT formulas from it, but it should be pretty close. It's complicated and involves filling in and balancing a bunch of charts, but should be repeatable.

Pitching runs relate to fielding runs; they are calculated together in a semi-Voros-like manner, but not entirely. See the same BP, though I don't remember which year.

I think we have no friggin clue on the adjustment from WARP1 to WARP2. WARP2 to WARP3 is a straightforward (with weird formula) length of season adjustment.
   12. Mark Shirk (jsch) Posted: October 11, 2004 at 08:18 PM (#910322)
I am pretty sure it was BP 2002; it was the first one I bought. I remember them crying foul that Bill James had said he came up with divining individual defensive performance from team defensive performance, when they had already done it. I wouldn't be surprised if the two systems are very similar, FRAA and Defensive WS I mean.

Also, in the daily EQA reports, there is a chart for Totals by Position. This chart has the number of PAs, outs, etc., along with the average and replacement level EQA for each position, or at least the average; I am not sure. I would presume that RARP (and I guess BRAR) is largely divined from this chart, at least for WARP1.

It is odd: you can find most everything that goes into WARP somewhere on the BP site, but it is so hard to find it all and put it all together that WARP remains a mystery. Don't know if this is our fault for being lazy and not doing any work, BP's fault for not putting it in one place, or the fault of BP being a pay site that is contracted by a few clubs and doesn't want to give away its secrets.

By the way, I try to balance WARP1 and WS equally. WARP3's timeline adjustment seems a little too harsh. When we get further ahead in years I will begin to use WARP3 more, as it will be tough to compare, say, George Van Haltren and Vada Pinson by using only WARP1.
   13. KJOK Posted: October 11, 2004 at 11:01 PM (#910634)
The idea that a player who played one game could theoretically have a higher WARP rating than another player with a long career is beyond silly.

Depends on what you're trying to measure. Since we're trying to determine "HOM Worthiness" then the ideal system would give negative numbers to ANY player who is not "HOM Worthy", give close to zero scores to borderline players, and give positive numbers to those who definitely belong.
   14. Dr. Chaleeko Posted: October 12, 2004 at 12:27 AM (#910773)
To add still another layer of complexity...

Clay Davenport has also implied in the past that WARP numbers change periodically because they are calibrated against historical norms at each position which change with each passing season. This might just be for fielding or for positional adjustments for hitters--I'm not sure--but it's a moving target.

Also, for what it's worth, because I couldn't locate any documentation of it, at some point I tried to reverse engineer a few examples from each position to get a very broad sense of WARP2's positional adjustments, trying to figure out what EQAs they were using as replacement at each position. If I remember correctly, they varied from around .220 (catchers) up to around .250 (1B), with most, of course, centered around .230. But I think that's just for WARP2, because I think WARP1 is calculated based on the replacement level of the particular season in question. So the more generic benchmarks I was trying to figure were, I guess, designed to put players into an all-time historical context.
   15. Paul Wendt Posted: October 14, 2004 at 07:08 PM (#916524)
WARP:
-- Interleague adjustments are arbitrary


Does "arbitrary" mean unpublished?
Re Bill James, it usually means underived or unfounded.

Win Shares:
-- Numbers are obviously flawed in 10-20% of cases, but hard to figure out which ones


How then is it obvious?
   16. PhillyBooster Posted: October 14, 2004 at 09:04 PM (#916750)
WARP:
-- Interleague adjustments are arbitrary

Does "arbitrary" mean unpublished?
Re Bill James, it usually means underived or unfounded.


I can only imagine one way to compare leagues: looking at players who switched from one to the other.

Since very few players switched leagues in any given year, it is impossible to judge league quality to any real level of precision, beyond a vague 'the AL was likely better.'

I can only assume that the WARP adjustments are based upon the few numbers they have. But if those numbers aren't statistically significant or representative, then they just aren't useful, no matter how many digits you calculate them to.

If Player A and Player B both go from NL to AL, and their production drops 10%, you could conclude that the AL was 10% better, or that the changes were due to random chance. The WARP adjustments are large enough that it is clear (to me, at least) that insufficient weight is given to the "chance" hypothesis.
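
To put a rough number on the "chance" hypothesis, here is a minimal sketch. It assumes (this figure is not from any study cited in the thread) that a player's production moves from one season to the next by a normally distributed amount with a standard deviation of 15%, even when league quality is identical. Under that assumption it asks how often the average change of n league-switchers would show a 10% drop by luck alone.

```python
from statistics import NormalDist

# Assumption (not from the thread): with no league difference at all, a
# player's season-to-season change in production is normal, mean 0, SD 15%.
YEAR_TO_YEAR_SD = 0.15

def chance_of_average_drop(drop, n_players, sd=YEAR_TO_YEAR_SD):
    """Probability that n players' average change is <= -drop by luck alone."""
    sd_of_mean = sd / n_players ** 0.5
    return NormalDist(mu=0.0, sigma=sd_of_mean).cdf(-drop)

# Two switchers averaging a 10% drop versus fifty switchers doing the same.
p_two = chance_of_average_drop(0.10, n_players=2)
p_fifty = chance_of_average_drop(0.10, n_players=50)
print(f"2 players: {p_two:.3f}   50 players: {p_fifty:.6f}")
```

With only two players the chance explanation stays quite live under this assumed spread; with fifty it all but disappears, which is the whole sample-size argument in miniature.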

Win Shares:
-- Numbers are obviously flawed in 10-20% of cases, but hard to figure out which ones

How then is it obvious?


The flaws in WS (failure to apply an appropriate replacement level, failure to adequately divvy up batting win shares to account for pitchers batting, failure to rationalize division between pitching and defensive WS) are, by now, obvious. Figuring out how to adjust for them without recreating WS from scratch is not.
   17. jimd Posted: October 14, 2004 at 09:35 PM (#916797)
I can only imagine one way to compare leagues: looking at players who switched from one to the other.

Actually, there is another way which gets around the small sample size problem of only looking at players who switch leagues.

Measure the change in quality of a league each season. That measurement involves all players who played in both seasons; you'll never get a larger sample size. String those measurements together and one gets a graph of the league evolution, a separate picture for each league. Using all of the interleague transfers from 1901 on, the two pictures can then be calibrated relative to each other.

This yields a much more stable overall picture than trying to use a small number of samples from periods when there are very few league interactions to measure the difference during those periods.
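
The chaining idea can be sketched roughly as follows. Every number here is hypothetical, and the names `league_curve` and `calibrate_offset` are made up for illustration; this is a guess at the general shape of such a method, not BP's actual procedure.

```python
from itertools import accumulate

# Hypothetical year-over-year quality changes for each league, as would be
# estimated from all players who appeared in consecutive seasons.
nl_changes = {1902: -0.020, 1903: 0.010, 1904: 0.005}
al_changes = {1902: 0.030, 1903: 0.010, 1904: 0.000}

def league_curve(changes, base_year):
    """Chain yearly changes into a quality level relative to base_year (= 0)."""
    years = sorted(changes)
    levels = accumulate(changes[y] for y in years)
    curve = {base_year: 0.0}
    curve.update(zip(years, levels))
    return curve

nl_curve = league_curve(nl_changes, 1901)
al_curve = league_curve(al_changes, 1901)

# Hypothetical transfers: (year, AL-minus-NL gap implied by one player's
# change in performance). Each one is noisy on its own.
transfers = [(1902, 0.06), (1903, 0.03), (1904, 0.05), (1904, 0.02)]

def calibrate_offset(transfers, nl_curve, al_curve):
    """Estimate the base-year AL-minus-NL gap by pooling every transfer
    after removing each league's own drift since the base year."""
    residuals = [gap - (al_curve[year] - nl_curve[year])
                 for year, gap in transfers]
    return sum(residuals) / len(residuals)

base_gap = calibrate_offset(transfers, nl_curve, al_curve)

# One calibrated AL-minus-NL gap per season, built from all the evidence.
al_minus_nl = {y: base_gap + al_curve[y] - nl_curve[y] for y in nl_curve}
```

The point of the design is that each season's gap leans on every transfer in the pool rather than only on the handful from that season.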
   18. PhillyBooster Posted: October 15, 2004 at 02:24 PM (#917529)
This certainly gives you a larger sample, but I'm not sure if it gives you a better answer. Small errors in expected aging patterns and other assumptions (e.g., did the hitting get better or the pitching get worse?) compound year to year. Using the interleague numbers to "calibrate" is only effective to the extent that the interleague numbers were valid to begin with.
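
A small sketch of how such errors compound, under two stated assumptions: independent random errors in each yearly link add in quadrature, and a consistent bias (a slightly wrong aging curve, say) adds linearly. The magnitudes are hypothetical.

```python
import math

def chained_error(years, random_sd, yearly_bias):
    """Error in a chained league-quality level after `years` links.

    Independent random errors grow with the square root of the chain
    length; a consistent per-year bias grows in proportion to it.
    """
    random_part = random_sd * math.sqrt(years)
    bias_part = yearly_bias * years
    return random_part, bias_part

# Hypothetical magnitudes: 0.5% random error and 0.2% bias per season.
after_10 = chained_error(10, random_sd=0.005, yearly_bias=0.002)
after_40 = chained_error(40, random_sd=0.005, yearly_bias=0.002)
print(after_10, after_40)
```

On these made-up figures the bias term overtakes the random term within a decade, which is why the aging assumption matters more than the sample size over a long chain.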
   19. Paul Wendt Posted: October 15, 2004 at 05:06 PM (#917828)
1937 Ballot Discussion #75
Posted by Chris Cobb (#917022)

jimd, I didn't do an extensive study; I just compared Roush's WARP1 to his WARP2 and Duffy's WARP1 to his WARP2 (didn't mess with WARP3 at all). The ratio of W2 to W1 for Duffy is about .78; the ratio for Roush is typically between .70 and .75.

Maybe I'm making too much of this small bit of data, but it's my impression that it's representative of what I've seen when looking at other comparisons between 1890s stars and NL stars of the teens and twenties.


Does anyone have an e-database with WARP1 and WARP2, some edition, for a large number of seasons
played by HOM candidates?
Probably not, or someone would have studied the matter systematically.
   20. jglassman Posted: October 15, 2004 at 07:30 PM (#918130)
A negative WARP is not a negative value in terms of contribution. It's just below replacement level, a sign of incompetent GMing. If I had a long career in MLB, wouldn't I perform worse than somebody who could easily be found to replace me?
   21. Paul Wendt Posted: October 19, 2004 at 04:40 PM (#925892)
1937 Ballot Discussion #71-76. I don't agree with all of this; indeed, I still have no informed opinion on some points. It is worth moving here.

#71 Posted by jimd (#916762)
> there's pretty clear evidence that fielding and pitching
> improved substantially from the late 1890s through the aughts.

WARP does not contradict this. It considers the overall level of competition in the teens and twenties as higher than in the nineties. (Or at least it did when I did my study of this a couple of WARP revisions ago.) The overall MLB average had improved.

There are two factors involved in the WARP-1 to WARP-3 conversions. One is the "quality of competition", the other is the schedule length. 1890's players will benefit from the latter because they played 132 G most years, not 154 G. Is this getting mixed in with the "quality of competition" comparisons?
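
For the schedule-length piece alone, a purely proportional adjustment would look like the sketch below. This is an assumption for illustration only: the actual WARP formula is not published in this thread, and `scale_to_schedule` is a made-up name.

```python
def scale_to_schedule(value, season_games, target_games=154):
    """Pro-rate a seasonal value to a common schedule length."""
    if season_games <= 0:
        raise ValueError("season_games must be positive")
    return value * target_games / season_games

# A hypothetical 8.0-win season in a 132-game schedule.
adjusted = scale_to_schedule(8.0, season_games=132)
boost = adjusted / 8.0 - 1.0
print(f"{adjusted:.2f} wins, a {boost:.1%} boost")
```

A boost of that size from schedule length alone is large enough to mask part of a quality-of-competition discount when only the combined WARP-1 to WARP-3 change is examined.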

#73 Posted by PhillyBooster (#916773)

Looking at the team record for the 36-117 Philadelphia A's, our example replacement level team, over on BP, I notice that this team, as a team, earned 51.5 WARP1. What could replacement level possibly mean in this case?

This is just a guess, but I would assume that WARP is based upon individual component stats, divorced from actual runs or wins.

The A's underperformed their Pythagorean W/L by five wins, and we would all understand (if not agree with) a system that used 41 wins as the baseline, rather than 36.

But beyond that, the A's actually scored very few runs based on their hits and walks. As just a rough approximation, I looked at (TB+BB)/R. The A's scored 1 run for every 4.42 TB+BB. The 7th place Senators scored 1 run every 3.91. The first place team was in the 3.50s. I would guess that the Phila A's "scattered" their hits a lot in 1916, so that not only did they underperform Pythagoras, but they underperformed the number of "runs scored" that would have gone into the formula by a pretty large amount.
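
Both checks are easy to reproduce. The sketch below uses the standard Pythagorean expectation with an exponent of 2 and the (TB+BB)/R ratio described above; the inputs are hypothetical round numbers, not the 1916 A's actual totals.

```python
def pythagorean_wins(runs_scored, runs_allowed, games, exponent=2.0):
    """Expected wins from runs scored and allowed (Pythagorean expectation)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)

def bases_per_run(total_bases, walks, runs):
    """Total bases plus walks needed per run scored; higher means the
    offense converted its baserunners into runs less efficiently."""
    return (total_bases + walks) / runs

# Hypothetical team: 500 runs scored, 800 allowed, 153 games.
expected = pythagorean_wins(500, 800, 153)
# Hypothetical offense: 1600 total bases, 400 walks, 500 runs.
efficiency = bases_per_run(1600, 400, 500)
print(f"expected wins {expected:.1f}, (TB+BB)/R {efficiency:.2f}")
```

A team can fall short twice over: fewer runs than its hits and walks suggest, then fewer wins than its runs suggest.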

#74 Posted by jimd (#916819)

It seems to me that the new WARP is misnamed; it no longer is Wins Above Replacement, but a kind of WARP-Shares. The team numbers appear to add up to approximately the number of wins in the league. However, as PhillyBooster points out, the players appear to share, not their actual number of wins, but the number of wins they should have had (based on their statistics). The shares are apparently handed out based on total runs-above-replacement.

Personally, I think I'd prefer to have a metric that divvied up the actual wins, but based on the runs-above-replacement measurements (once they get the fielding replacement level adjusted; it's better than it was, but still a little too high).

#75 Posted by Chris Cobb (#917022)
> It considers the overall level of competition in the
> teens and twenties as higher than in the nineties.
> (Or at least it did when I did my study of this
> a couple of WARP revisions ago.)
> The overall MLB average had improved.

jimd, I didn't do an extensive study; I just compared Roush's WARP1 to his WARP2 and Duffy's WARP1 to his WARP2 (didn't mess with WARP3 at all). The ratio of W2 to W1 for Duffy is about .78; the ratio for Roush is typically between .70 and .75.

Maybe I'm making too much of this small bit of data, but it's my impression that it's representative of what I've seen when looking at other comparisons between 1890s stars and NL stars of the teens and twenties.

#76 Posted by Chris Cobb (#917038)
> It seems to me that the new WARP is misnamed;
> it no longer is Wins Above Replacement, but a
> kind of WARP-Shares.
> The team numbers appear to add up to approximately
> the number of wins in the league.

This makes sense, and you're right, the name is now misleading. I assume that wins above replacement means wins above the number that a replacement player would earn.

I nevertheless can't quite wrap my mind around the idea that a team made up entirely of replacement level players would have 0 WARP, but some small number of wins (let's imagine it's 20 out of 162, to go outrageously low), so some team with 65 wins would be 45 games better than the replacement level team, but have 65 WARP. I guess the zero point in win shares creates the same problem, but it's not that important because the zero point is low enough that the difference between the zero point and zero wins really doesn't matter much most of the time . . .
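
The arithmetic of that puzzle, using the deliberately low 20-win figure from the paragraph above (the function name is made up for illustration):

```python
def both_scales(team_wins, replacement_team_wins):
    """Return a team's total on two scales: true wins above a replacement
    team, and a share-style total that simply sums to the team's wins."""
    above_replacement = team_wins - replacement_team_wins
    share_style = team_wins
    return above_replacement, share_style

above, shares = both_scales(team_wins=65, replacement_team_wins=20)
zero_point_gap = shares - above
print(above, shares, zero_point_gap)
```

The gap between the two scales is exactly the replacement team's own win total, so the higher the true replacement level, the more a share-style total overstates wins above replacement.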
   22. Paul Wendt Posted: November 07, 2004 at 05:57 PM (#956922)
WARP3 has been revised again since mid-year. I presume this is a revision of WARP2 if not WARP1. I didn't date the two revisions that we have previously noted, roughly at new year and mid-year 2004. For now, let me call the four known editions 2003, Winter, Summer, Fall.

Beginning early 2003, Dan Greenia provides seven data fields including WARP3 for leading players who become HOM-eligible each year, or eligibility cohorts. Those candidate lists appear many years in advance (thru 1950 at the moment) and Dan revises them, for future years only, when he learns that the measure has been revised. From that ongoing source, I believe that I have
: WARP3 2003 for cohorts 1904-1920
: WARP3 Winter for cohorts 1918-1942
: WARP3 Summer for cohorts 1932-1950
(A few months ago, Dan told me that he had compiled version 2003 thru 1932, but I did not find 1921-1932 in the HOM archive.)

Yesterday, I set out to compile the WARP3 for the 1904-1931 cohorts only, for I thought the Summer edition was current. But I discovered the revision and compiled the newly christened Fall edition for 1904-1950. My data is in one MS Access table that I will try to make available in csv format promptly.
   23. Paul Wendt Posted: November 07, 2004 at 07:27 PM (#956961)
From the "New Eligibles" candidate lists by Dan Greenia, I believe that I have
: WARP3 2003 for cohorts 1904-1920
: WARP3 Winter for cohorts 1918-1942
: WARP3 Summer for cohorts 1932-1950

Note that cohort groups 1918-1920, 1932-1942, and 1932-1950 exhibit the three revisions directly.

<u>The latest revision, Fall 2004</u> (or Summer).
The 1932-1950 exhibit includes 247 players of whom 23 pitchers and no others are down 8 to 16 points (WARP3) in the latest revision. #24 Eddie Collins is -7.9. Collins and 4-year pitcher Babe Ruth -5.5 are the only non-pitchers among the 51 players who are down at least 5. There are 5 more non-pitchers (4 OF, 1 C) among the 74 players down at least 3. Only 2 of 247 players are up 3 points, P-outlier Lefty Grove +4.5 and C Muddy Ruel +3.2. 172 of 247 players are up or down fewer than 3.0, with a majority at each position down.

<u>The second revision, Summer 2004</u> (or Spring).
The 1932-1942 exhibit includes 141 players. The 15 men down 5+ pts played {P 1B 2B}, plus longlived OFs Cobb and Speaker. Hornsby is the only "pure" 1B or 2B among the 30 players up at least 5, and he played some 3B-SS.

280 players in cohorts 1918-1942 exhibit the last two revisions jointly. Nap Lajoie joins Eddie Collins and 33 pitchers among the 35 players down at least 10.5. The other non-pitchers down 6+ pts played {2B 1B}, plus a few longlived OFs. The 36 men up at least 6 pts played {C 3B SS}, plus a few OFs and Rogers Hornsby.

<u>The first revision, Winter 2004</u> (or Fall 2003).
The 1918-1920 exhibit is only 17 players, including 1 down 5 points (Kling) and 9 up 5 points, led by pitcher-teammates Walsh +19.4 and White +16.5. 1B, 2B, SS, 3B, C are represented by one player each, with {1B 2B} up and {C 3B SS} down.

The second and third revisions jointly reverse the first one for the five players just named and, similarly, for almost everyone who played their five positions. Within the small 17-player exhibit, the second and third jointly reverse the first for everyone but 5 OFs (4 ++, 1 --).

<u>162 players in cohorts 1904-1920 exhibit all three revisions jointly</u>. Early pitchers Gus Weyhing -21.9 and Matt Kilroy -18.8 "lead" the 29 men down at least 5 points, who played {P SS 2B 3B} in numbers 12-7-4-2, plus 3 OFs and 1 1B. 31 men up 5+ points played {OF 1B C} 16-4-4, plus 5 pitchers and, barely on the list, 3B/2B Sammy Strang and SS Hughie Jennings.

Within this severely limited population, first eligible 1904-1920, the net revision favors more recent players. Half of the 1918-1920 subgroup (8/17) is among the 31/162 who gained at least 5 points. "Rookie" year is 1890-1902 for the five biggest gainers and 1882-1889 for the five biggest losers.

--
Regarding 3B and especially SS, there is a tension between the last two bold statements. (not a logical contradiction)

The data can be manipulated to yield a summary in terms of revision since first appearance in this forum, rather than revision from any one of the three old editions. I haven't done that.
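
For anyone who wants to repeat the "down at least N" and "up at least N" tallies, here is a minimal sketch. The four changes are the ones quoted above for the latest revision; the helper names are made up, and a full run would load every player's change from the data table instead.

```python
# Changes in WARP3 quoted above for the latest revision.
latest_revision = {
    "Eddie Collins": -7.9,
    "Babe Ruth": -5.5,
    "Lefty Grove": +4.5,
    "Muddy Ruel": +3.2,
}

def down_at_least(deltas, points):
    """Names whose rating fell by at least `points`."""
    return sorted(name for name, d in deltas.items() if d <= -points)

def up_at_least(deltas, points):
    """Names whose rating rose by at least `points`."""
    return sorted(name for name, d in deltas.items() if d >= points)

print(down_at_least(latest_revision, 5))
print(up_at_least(latest_revision, 3))
```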
   24. Paul Wendt Posted: November 07, 2004 at 08:07 PM (#956976)
Yesterday, I set out to compile the WARP3 for the 1904-1931 cohorts only, for I thought the Summer edition was current. But I discovered the revision and compiled the newly christened Fall edition for 1904-1950. My data is in one MS Access table that I will try to make available in csv format promptly.

done, for HallofMerit egroup members.

WARP3.editions.csv (and other HallofMerit files)
WARP3, four editions (2003 and 2004 Winter, Summer, Fall), plus other Dan Greenia data for 531 "New Eligibles" 1904-1950. WARP3, Fall edition, and HOM/HOF flags by Paul Wendt.

The file also includes Name and W3-Fall (WARP3, Fall 2004) for HOM1937 and HOF members who were HOM eligible before 1904. Total, 565 records including HOM1937 members except 9 from NeL (61 of 69) and HOF members eligible thru 1950 except 5 who did not play MLB (Bulkeley, Cartwright, Chadwick, Hulbert, Selee).

For now, HallofMerit distribution is enough.
( To join, visit
http://sports.groups.yahoo.com/group/HallofMerit/ )

Please tell me about problems with the data.
   25. Paul Wendt Posted: November 09, 2004 at 10:45 PM (#959772)
Has any revision incorporated a change in interleague quality judgments?
I don't know.

I wonder how many of the 531 players who represent at least two editions can be easily assigned to a particular league? I regret that I didn't do it while I was gathering WARP3 data.
   26. Paul Wendt Posted: November 16, 2004 at 04:53 AM (#967568)
WARP3-Fall and Win Shares agree in rating HOF executive Clark Griffith above HOM pitcher Joe McGinnity and in rating HOF manager Miller Huggins above HOM shortstop candidate Hugh Jennings.

What else? Someone remarked that the two systems are differently biased regarding fielding positions, but they both identify lots of CFs as the best eligible players not in the HOM, 1938.

Highest rated, not in Hall of Merit 1938

<u>WARP3-Fall</u> (revised since mid-year 2004)
95.8 - Carey (CF)
93.5 - Hooper (CF)
89.9 Jones (CF)
88.9 Sewell
85.7 Cross
85.6 - Leach (3B/CF)
84.9 - Maranville
84.2 - Van Haltren (CF)
84.0 - Ryan (LF/CF)
82.8 Griffin (CF)
80.9 - Beckley
80.7 - Roush (CF)

<u>Win Shares</u>
351 - Carey (CF)
344 - Van Haltren (CF)
328 - Leach (3B/CF)
321 - Hooper (CF)
318 - Beckley
316 - Ryan (OF/CF)
315 Rixey
314 - Roush (CF)
302 - Maranville

The outdented names Jones, Sewell, Cross, Griffin and Rixey are not common to the two lists.
   27. Paul Wendt Posted: November 16, 2004 at 05:01 AM (#967576)
Here are the converse lists of lowest rated HOMers. These and the preceding lists are limited to players newly eligible 1904-1939.

Lowest rated members of Hall of Merit 1938

<u>WARP3-Fall</u> (revised since mid-year 2004)
49.6 - McGinnity (P)
57.8 - Brown (P)
68.2 - Rusie (P)
74.8 - Jackson
74.9 - Walsh (P)
77.4 - Coveleski (P)
79.3 Baker
79.6 Magee

80 or better, 28 others

<u>Win Shares</u>
245 - Coveleski (P)
265 - Walsh (P)
269 - McGinnity (P)
272 Groh
274 CollinsJ
291 Flick
293 - Rusie (P)
294 - Jackson
296 - Brown (P)

300 or better, 27 others
   28. Joey Numbaz (Scruff) Posted: November 16, 2004 at 07:16 AM (#967703)
Cool stuff Paul, thanks!

I guess this holds with my idea that WARP-WS aren't necessarily 1:3 in relation. More like 1:3.75, although that doesn't necessarily hold steady at each position.
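
A quick check of that ratio against the eight names that appear on both of the "highest rated, not in the HOM" lists posted above. Keep in mind these are WARP3 totals and only eight players, so this is a rough look, not a calibration.

```python
# Career totals as posted above: (WARP3-Fall, Win Shares) for the names
# that appear on both lists.
posted = {
    "Carey": (95.8, 351),
    "Hooper": (93.5, 321),
    "Leach": (85.6, 328),
    "Maranville": (84.9, 302),
    "Van Haltren": (84.2, 344),
    "Ryan": (84.0, 316),
    "Beckley": (80.9, 318),
    "Roush": (80.7, 314),
}

# Win Shares per WARP3 for each player, then the simple average.
ratios = {name: ws / warp3 for name, (warp3, ws) in posted.items()}
mean_ratio = sum(ratios.values()) / len(ratios)
low, high = min(ratios.values()), max(ratios.values())
print(f"mean {mean_ratio:.2f}, range {low:.2f} to {high:.2f}")
```

The spread between the lowest and highest ratio is wide enough that no single multiplier fits every player.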
   29. PhillyBooster Posted: November 16, 2004 at 04:00 PM (#968205)
Yes, leaving aside Harry Hooper -- who was a right fielder, not a CF -- as I noted in my 1939 ballot there are currently more eligible players not in the HoM in Bill James's Top 30 than there are at any other position (including Pitcher Top 60).

Here they are by position -- with top candidates not in the Top 30 in brackets. I also put Fielder Jones in brackets, since he was a CF misidentified as a RF, and might have made the Top 30 if put with the correct group.


Cat. (2): Bresnahan (16), Schang (20) [Petway]
1B (2): Sisler (24); Chance (25) [Beckley]
2B (3): Doyle (20); Evers (25); Childs (26) [Monroe]
3B (3): Leach (20); McGraw (26); Gardner (29) [Williamson]
SS (3): Jennings (18); Sewell (23); Bancroft (28) [Moore]
LF (1): Burns (26)
CF (7): Roush (15); Duffy (20); Carey (23); Ryan (26); van Haltren (28); Thomas (29); Seymour (30) [Pike, Browning, F. Jones]
RF (1): Cravath (29)
P (5): Mays (38); Cicotte (50); Waddell (53); Cooper (55); Faber (56) [Rixey, Redding]

So, that's between 7-10 centerfielders (8-11 if you count Leach) -- more than twice as many as any other position on the field!
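
The tally, using only the unbracketed names from the list above. Pitchers are set aside in the comparison because they are drawn from a Top 60 rather than a Top 30.

```python
# Unbracketed names from the positional list above.
top30_not_in_hom = {
    "C": ["Bresnahan", "Schang"],
    "1B": ["Sisler", "Chance"],
    "2B": ["Doyle", "Evers", "Childs"],
    "3B": ["Leach", "McGraw", "Gardner"],
    "SS": ["Jennings", "Sewell", "Bancroft"],
    "LF": ["Burns"],
    "CF": ["Roush", "Duffy", "Carey", "Ryan", "Van Haltren",
           "Thomas", "Seymour"],
    "RF": ["Cravath"],
    "P": ["Mays", "Cicotte", "Waddell", "Cooper", "Faber"],
}

counts = {pos: len(names) for pos, names in top30_not_in_hom.items()}

# Compare CF against the other fielding positions only (pitchers excluded).
fielders = {pos: n for pos, n in counts.items() if pos != "P"}
runner_up = max(n for pos, n in fielders.items() if pos != "CF")
print(counts["CF"], runner_up)
```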

I think before the next election we should look more closely at "positional balance", not with the goal, necessarily, of achieving it, but just to generally consider whether it is appropriate to have the mix we do (heavy on LF and SS, light on 1B, 3B, and CF). I'm not sure what the justification is for the present mix, which is not supported by any uber-stat (as Paul showed), or any argument that the uber-stat at issue is systematically flawed against a higher-rated position.
   30. TomH Posted: November 16, 2004 at 06:12 PM (#968425)
light on 1B, heavy on SS -

Yes, we are. Might just be how talent was distributed 1860-1925, tho. My guess is that if we had a super-HoM and only allowed 4 1B+SS, that the three ABC guys (plus Honus) would all make it, so we'd be heavy on 1Bmen.

In the NBJHA, we have encountered 12 of his top 43 SS to date, but only 7 of his top 43 1Bmen.

In our Survivor exercise Paul W alluded to, (link is http://survivor.dmlco.com/index.html), we had 10 first basemen to only three shortstops at one point.

In the early 1890s, how many of the 'arguably greatest 30 ballplayers' were active? Cy Young. Around 1915, you can count to 8 or 10 without trying hard. Sometimes stuff happens. I'm gonna give a little to 1B this ballot and take some from the SS, but not too much.
   31. Jim Sp Posted: November 16, 2004 at 07:58 PM (#968616)
A number of our tougher choices went to the corner outfielders, such as Kelley, Keeler, Flick, Stovey, Thompson, and Sheckard. Magee and Wheat could be added to that list as well.

On the other hand a good portion of our CF imbalance is related to random variations in positional balance, and the size of our specific hall.

If we were electing a Hall of Very Good, we'd have a lot of CF candidates, more than at the corner positions. Plus we did elect Hines and Gore, who if I recall correctly weren't very high on the Bill James list.

Some HoVG CF candidates would be:

Pete Browning
Jimmy Ryan
Mike Griffin
George Van Haltren
Fielder Jones
Roy Thomas
Ginger Beaumont
Tommy Leach
Cy Williams
Max Carey
Edd Roush
Hugh Duffy

Each of these may have their advocates, but on the whole the group finds it hard to single one out from the crowd.

A quick look through my spreadsheet gathered a much shorter list of LF candidates (of course we elected a bunch, I'm just pointing out the lack of borderline candidates):

Charlie Jones
Tip O'Neill
Bobby Veach
Ken Williams
George Burns

Similarly in RF I have:

Mike Tiernan
Gavvy Cravath

I'm not sure how much these results are biased by the quirks of my spreadsheet, but it's at least suggestive that there may just be an unusual clumping of early CFs who were very good but not quite great.

Keeler, Stovey, Thompson, and Sheckard aren't in my PHom, but I would add pitchers, infielders, and catchers instead of CFs.
   32. jimd Posted: November 16, 2004 at 08:30 PM (#968699)
The relationship between WARP-3 and Win Shares is constantly changing due to the league quality adjustments. Investigation of the relationship between the two systems should use WARP-1 which considers each league as a closed universe, as does Win Shares.
   33. Jim Sp Posted: November 16, 2004 at 08:43 PM (#968738)
In thinking about this more, I've decided that Carey should have been on my ballot, and doesn't really belong in the HoVG candidates listed above.
   34. robc Posted: November 16, 2004 at 09:12 PM (#968817)
Joe,

The relationship between WS and WARP shouldn't be a straight ratio anyway. More like A=3B+k, where A is Win Shares, B is WARP and k is a constant representing the lower WS zero point.
   35. KJOK Posted: November 17, 2004 at 03:33 AM (#969630)
Robc is correct.

Win Shares = WARP x 3 + (.38 - .2) x WINS x 3

and the inverse:

WARP = WinShares / 3 - (.38 - .2) x WINS

or some similar approximation where WINS represents some type of "absolute" WINS/Player, .38 is the approx replacement level for WARP, and .2 is the approximate replacement level for Win Shares.
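
One way to sanity-check the size and direction of the offset is to build both numbers from the two quoted baselines. This is a deliberate simplification: neither system is actually computed this way, and `playing_time` (in team-game equivalents) is a hypothetical stand-in for the WINS term.

```python
WARP_BASELINE = 0.38   # approximate replacement level quoted for WARP
WS_BASELINE = 0.20     # approximate zero point quoted for Win Shares

def warp_like(win_pct, playing_time):
    """Wins above the WARP baseline over a given amount of playing time."""
    return (win_pct - WARP_BASELINE) * playing_time

def win_shares_like(win_pct, playing_time):
    """Three shares per win above the Win Shares zero point."""
    return 3 * (win_pct - WS_BASELINE) * playing_time

def offset(playing_time):
    """The constant k in A = 3B + k for a given amount of playing time."""
    return 3 * (WARP_BASELINE - WS_BASELINE) * playing_time

# Hypothetical player: .600 performance over 10 team-game equivalents.
b = warp_like(0.600, 10)
a = win_shares_like(0.600, 10)
k = offset(10)
print(a, 3 * b + k)
```

Under that reading the Win Shares figure exceeds three times the WARP figure by 3 x (.38 - .2) x playing time, which is the direction the lower Win Shares zero point implies.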
   36. Joey Numbaz (Scruff) Posted: November 17, 2004 at 12:06 PM (#970150)
"In the NBJHA, we have encountered 12 of his top 43 SS to date, but only 7 of his top 43 1Bmen."

That may just be because James systematically underrates 1B too - many of us are using his system as our foundation :-)
   37. Joey Numbaz (Scruff) Posted: November 17, 2004 at 12:07 PM (#970151)
Underrates 1B pre-1920 I mean . . .
   38. Joey Numbaz (Scruff) Posted: November 17, 2004 at 12:15 PM (#970152)
Makes sense Rob and KJOK, I was just guesstimating. But I'd think your replacement constants need to be different at every position. WARP overrates players at key defensive positions, WS overrates guys at key offensive positions.
   39. Paul Wendt Posted: November 17, 2004 at 11:21 PM (#971127)
jimd #
The relationship between WARP-3 and Win Shares is constantly changing due to the league quality adjustments. Investigation of the relationship between the two systems should use WARP-1 which considers each league as a closed universe, as does Win Shares.

There's some truth in that. But investigation should use WARP1, WARP2 and WARP3; single-season rather than career data; league, year and fielding position.

Given the career WARP3 data alone, for only these few hundred players, it is probably useful to look at WARP3 and Win Shares with rookie year (ie, a time trend). Also with rookie year and principal league, which requires augmenting the data with the latter (not too onerous).

Analysis by fielding position will be interesting, too, but the sample of players with nearly pure career at any fielding position is dangerously small {3B 33, 2B 35 . . . 1B 44, CF 54} for any finer analysis.
Indeed, since I transcribed the WARP3-fall data by hand, a single clerical error such as 33 for 83 is plausible. That will bias the analysis seriously if the sample is sliced and diced.
   40. Paul Wendt Posted: December 31, 2004 at 05:29 PM (#1048634)
jonesy "1942 Ballot Discussion" #209
Several years ago on some website I was surfing through I read a Davenport post in which he mentioned reading the Grove/Ferrell posts on SABRL and he agreed that my point --Ferrell facing tougher competition in '30 and '31 -- made a lot of sense and needed to be adjusted for. Looks like someone finally paid attention.

Sometime during the 1990s, Pete Palmer personalized the "Wins per Game" factor for each pitcher (which is incorporated in TPI but not in ERA+). I guess that he would have personalized a schedule factor, incorporated in ERA+, accounting for the ballparks where each pitcher worked and the teams he faced, if that data were available in convenient format for all of MLB history. (re the pattern of work and rest, I guess not)

Pitcher starts by ballpark and opponent are now available in convenient format, the Retrosheet game logs. The mismatches with the official record are small in magnitude, but numerous. Starts aren't innings and there is no similar record for relief appearances, so the data is not really complete for the purpose of adjusting ERA. I don't guess one way or the other, how soon it will be utilized by Palmer or the author of any other comprehensive pitcher rating such as Win Shares or WARP.

--
Chris James has used the game log data to estimate run support for many pitchers with notable careers, presented at the last SABR Convention and published on his website.

Innings per game, for each pitcher, must be a good indicator of the error in estimated run support.
   41. jonesy Posted: December 31, 2004 at 06:46 PM (#1048728)
Here is Grove and Ferrell in 1932. This was a season that I feel they had identical value.

Here are the teams listed in order of offense.

1. NY scored 1002 runs on the season.

Grove had a 5.49 ERA in 41.0 IP
Ferrell was 5.48 ERA in 42.1 IP

Ferrell was 2-4 losing CG by a 0-5 score and winning CG by 6-3 and 5-4 scores. He was knocked out in two other starts, allowing 5 ER in 3 IP and 7 ER in 5.2 IP, taking losses in both.

Grove was 3-2, winning a CG 4-2 score and a CG by a 10-7 score (all seven runs earned). He lost a CG by a 3-9 score (all nine earned). He lost a 3-8 game, allowing 4 ER in 6 IP and won a game in which he allowed just one ER in 8 relief IP.



2. Philly scored 981 runs.

Ferrell was 5.05 ERA in 41.0 IP. He lost a 3-15 game (allowing 7 ER in 6.2 IP). He won a 4-3 CG scoring his team's 3rd and 4th runs. He lost a 1-0 CG in which he allowed but 2 hits over the first 8 innings. He also lost the famous 18 inning game, tossing 11 relief innings just two days after working a CG. His last start was an 0-11 game in which he allowed 6 ER in 4 IP.

3. Cleveland scored 845 runs:

Lefty was 2.18 in 57.2 IP.

4. Washington scored 840 runs.

Grove was 1.61 ERA in 28.0 IP (2-1)
Ferrell 3.27 ERA in 44 IP (3-2)

5. Detroit scored 799 runs.

Grove was 1.97 in 46.0 IP (5-0)
Ferrell 2.85 in 47.1 IP (5-1)

6. STL scored 736 runs.

Grove was 2.33 in 46.1 IP (5-1)
Ferrell 2.73 in 33.0 IP (4-0)

7. Chicago scored 667 runs.

Grove was 3.45 in 44.1 IP (3-2)
Ferrell 3.61 in 52.1 IP (5-1)

8. Boston scored 556 runs.

Grove was 2.86 in 28.1 IP (3-1)
Ferrell 2.00 in 27.1 IP (3-1)


In the teams they mutually faced.

Grove was 21-7 with 78 ER in 234.0 IP for 3.00.
Ferrell 22-9 with 94 ER in 246.2 IP for 3.43.
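The two totals above can be checked with a short Python sketch. It assumes the usual box-score convention that the digit after the decimal point counts thirds of an inning (so 246.2 means 246 and 2/3); the helper names are hypothetical.

```python
def innings_to_float(ip: str) -> float:
    """Convert box-score innings ('246.2' = 246 and 2/3) to a real number."""
    whole, _, thirds = ip.partition(".")
    return int(whole) + (int(thirds) if thirds else 0) / 3.0


def era(earned_runs: int, ip: str) -> float:
    """Earned runs allowed per nine innings."""
    return 9.0 * earned_runs / innings_to_float(ip)


grove = era(78, "234.0")
ferrell = era(94, "246.2")
print(f"Grove {grove:.2f}, Ferrell {ferrell:.2f}")
```

Run on the quoted totals, this reproduces the 3.00 and 3.43 figures.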

Versus those same teams (minus pinch-hitting).

Grove had 6 RS and 10 RBI in 85 ATB.
Ferrell 11 RS and 15 RBI in 95 ATB.

So basically, against mutual teams, counting runs allowed as a pitcher and runs tallied as a batter, Grove was about five runs ahead over the whole season.

AND, I might add, Ferrell was suspended for ten days in September for refusing to leave a game in which Peckinpaugh tried relieving him in the first inning against the Red Sox.

Grove missed significant time in late May or early June. One report was a sprained ankle. Another report said Grove was in a blue funk over something and unavailable to the team.

WARP and WS are one thing. It's OK to give an overview of a season, but the only way to determine true value for a pitcher is a game-by-game analysis.
   42. DavidFoss Posted: December 31, 2004 at 10:22 PM (#1049090)
Jonesy... statistics may often be flawed, but their biases can be explained on a basic theoretical level. Manually going through game logs as you've done can introduce biases that aren't as easily documented.

First it looks like you've double-counted R & RBI for batting... if you choose to use those team-dependent numbers as a metric, it would make sense to average the two. You've shown two instances of missed time and given anti-Grove and pro-Ferrell explanations for them... but each pitcher ended up finishing 2-3 in the league in IP anyways, so this point may be a red herring. Also, you've simply ignored Grove's stellar performance against the Indians. I realize the A's had a more formidable offense than the Indians, but Grove's sizeable advantage in A's & Indians games has to count for something. Looks like Grove beat Ferrell twice head to head in 1932 -- those were the 15-3 and 11-0 games.

The bottom line is that there has to be a systematic way of doing these game log comparisons to decrease observational biases. Chris J has one way to do this comparison. Do you have enough information in your box scores to do a SNWLR-style analysis? Retrosheet only has game logs back to the mid-1960s at the moment, after that we can start taking advantage of SNWLR.

As an extra aside, Grove's PPF is significantly lower than Philly's BPF during the A's heyday due to the fact that he never had to face his own offense. Are WS, WARP and ERA+ all using this lower park factor for him?
   43. Chris Cobb Posted: December 31, 2004 at 11:08 PM (#1049169)
fwiw, baseball-reference gives Grove 6 runs created in 1932, Ferrell 12. Ferrell's superiority with the bat makes up some of the difference between the two that year, but not all of it.

I'd also agree with David that Grove's domination of the Indians needs to be counted in his favor also in this head-to-head comparison with Ferrell.

That Ferrell was not quite as good as Grove in 1932 doesn't materially affect his HoM case, however. He was an outstanding pitcher that year.
   44. jonesy Posted: December 31, 2004 at 11:16 PM (#1049186)
David,

The reasons that Grove and Ferrell missed time in 1932 are lengthy. But like the other instances I have noted, Grove's missed time was due to his depression. Ferrell's was because Peckinpaugh pitched him into the ground.

The Cleveland Plain Dealer during Ferrell's era published a boxed area at the bottom of the page that included a play-by-play description of all Cleveland games. David Smith of retrosheet has that material. I assume the retrosheet gang is not ready to add that material. I do not know the reason.

.............................
I agree stats are flawed.

Cleveland opened a five-game series with the Athletics on July 22, 1931. It was round two of Ferrell-Grove. Grove won 6-3.

"He won not by outpitching the Indian ace," wrote the Cleveland Plain Dealer, "but by having the better ball club behind him. And Ferrell lost a game he richly deserved to win because his infield support cracked wide open in the seventh inning, allowing the A's to score three runs when, with reasonably tight play, they wouldn't have gotten a runner past first base."

Ferrell passed Foxx in the second and Bishop in the third but the only hit he allowed to that point had been an infield one. Simmons clanged a hot-shot off Wes' leg to begin the fourth. Foxx walked and Miller dropped down a sacrifice. A grounder to third scored Simmons and advanced Foxx to third. Dib Williams was passed intentionally and Grove singled for a 2-1 lead.

Wes took a 3-2 lead to the seventh. Haas singled and Cochrane fouled out. Simmons, after also lifting a foul behind first that Morgan failed to get under, walked.

"Foxx hit to Kamm, who, instead of trying for a double play in the orthodox way, tagged Haas for the second out. Then Bing Miller got a scratch single off of Ferrell's glove that filled the bases.

"And that should have been all, but Eddie Montague, after making a nice pickup of McNair's roller, threw wildly to first and Simmons crossed the plate with the tying run. McNair was credited with a hit, but by what manner of reasoning I am unable to state," commented the Plain Dealer. "Montague's throw had him beaten by a full step, but it pulled Morgan off the bag. At any rate, Williams followed with a clean single and two more runs came in."


The home team scoring decisions happened frequently in Philly. And just so it isn't just my own interpretation, here is the Philly Ledger in 1932 after Grove defeated Ferrell.



The final was 15-3 and all of Cleveland's offense came late in the game after Grove had an insurmountable lead. Only seven of the 12 runs Ferrell allowed were earned as the Cleveland defense, although charged with just a lone error, was something less than exemplary. Their pitcher's deportment took a similar beating.

"Ferrell is too easily provoked," wrote Ed Pollack in the Ledger. "He has the experience, the ability, and all the necessary requisites to be of greater value to his club and himself if he would remain undisturbed when the breaks of the game turn against him."

"With perfect support behind him Ferrell could have held the rallies of the A's in the sixth and seventh. But perfect is too much to ask of any club, and the Indians are far from the best defensive team in the American League."

"In justice to Ferrell it must be recorded that had it not been for defensive weaknesses the A's would have scored only two runs in the sixth and seventh instead of nine."

..............................

I'm not sure I follow you on my double-counting the offensive contributions of the two for 1932?

Against mutual opponents:

In 1932 Grove had six runs scored and 10 RBI. He had three homers. So he accounted for 16, 13 if you want to minus the homers.

In 1932 Ferrell had 11 runs scored and 15 RBI. So he had 26, 24 if you want to subtract for the two homers he hit.

So it's 26-16 in Ferrell's favor for 10, or 24-13 in Ferrell's favor for 11.

Grove allowed 78 earned runs in the innings in which they faced mutual opposition and Ferrell 94. That's a 16 run edge to Grove. Knocking off the 10 or 11 extra that Wes accounted for with his bat drops it to a net of 5 or 6. Am I wrong here?

Ferrell had 10 more at bats and pitched 12.2 more innings.
   45. jonesy Posted: December 31, 2004 at 11:31 PM (#1049209)
Shortly after tossing his no-hitter in 1931, and when most of the baseball establishment was openly calling him the best pitcher in baseball, Ferrell hurt his arm. He continued to take his turn throughout the remainder of the season.

He beat the Yanks with a three-hitter in June.

"Just as it was when Ferrell pitched the tribe to victory over the Red Sox a few days ago," wrote Cobbledick, "it was plain today that the boy still is suffering from that crippled shoulder that has made it impossible for him to throw a fast ball for a month or so."

"He didn't throw a fast ball today. Once when he had to make a long throw after fielding a bunt near the third base line he looped the ball fifteen feet in the air, heaving it with a stiff painful motion."

"But his curve, and it was a beauty, was working well enough to hold such clouters as Ruth, Gehrig, Combs and Ruffing himself hitless."

..........................

Ferrell continued pitching this way for the last 3/4 of 1931. In 1932 he still pitched the slow stuff until confronted by Peckinpaugh.

..........................

"I believe Ferrell is making a serious mistake in trying to get by with the least possible use of his fastball," said Peck. "He was at his best a couple of years ago when, in a jam, he would wind up and blow that hard one past the best hitters in the league."

"But all of last season and thus far in this one he had been pitching the slow curves and very little else. This was fairly successful last year for the hitters were constantly looking for that fast ball, and the slow stuff threw them off stride. Now the knowledge has gotten around the league that Wes isn't using his fast ball. It is no longer a threat. The hitters are laying for his slow curve and pounding it hard."
   46. jonesy Posted: December 31, 2004 at 11:37 PM (#1049215)
Peckinpaugh and Ferrell hashed it out. Billy Evans traveled to St. Louis to talk to Wes and came out with this classic line. "You have a great fast ball and you have been pitching nothing but slow curves," Evans told him. "For $18,000 we are entitled to an occasional fastball."


Ferrell announced that he would again be tossing heat and then went out on May 28 to beat the Browns 3-1. "Even his warmup before the game was different," wrote Cobbledick. "He bore down on the smoke ball then, too, and when he needed it in the game it was ready. He ought to be convinced tonight that that's the way for him to pitch. Not only did he limit the Browns to seven scattered hits...but he didn't issue a single base on balls, which is remarkable for the hurler who led the AL in free passes last year."
   47. DavidFoss Posted: December 31, 2004 at 11:42 PM (#1049222)
I'm not sure I follow you on my double-counting the offensive contributions of the two for 1932?

It's the "Runs Produced" argument here.

Almost every run scored has an RBI associated with it as well. (Off the top of my head RBI/R is about 0.9 or so). Adding up a team's R+RBI (or R+RBI-HR) is going to overcount their runs.

So, Grove's 16 run advantage in pitching corresponds to a 30-32 run advantage in opposing R+RBI.

Ferrell's 26-16 advantage in R+RBI is really about a 13-8 advantage in offensive runs (15+11)/2 and (10+6)/2.

Anyhow, Chris Cobb's way of looking at it is much more straightforward in that Ferrell has a 12-6 advantage in runs created. It eliminates all the confusion about whether to subtract homers and the fact that not every R has an RBI.

Sorry, this is a minor nitpick, but a case where an extra 4 runs or so were being attributed to Ferrell.
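A rough Python sketch of the halving argument, using only the R, RBI and earned-run totals quoted in this exchange. The function name is hypothetical, and (R+RBI)/2 is just the shortcut proposed here, not a replacement for Runs Created.

```python
def offensive_runs(runs_scored: int, rbi: int) -> float:
    """Average R and RBI so each run is counted roughly once, not twice."""
    return (runs_scored + rbi) / 2.0


ferrell_bat = offensive_runs(11, 15)  # Ferrell vs. the mutual opponents
grove_bat = offensive_runs(6, 10)     # Grove vs. the mutual opponents
bat_edge = ferrell_bat - grove_bat    # Ferrell's edge at the plate
pitch_edge = 94 - 78                  # Grove's edge in earned runs allowed

print(ferrell_bat, grove_bat, bat_edge, pitch_edge - bat_edge)
```

On this accounting Ferrell's bat is worth about 5 runs over Grove's rather than 10, so most of Grove's 16-run pitching edge survives.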

As for the missed time: Grove & Ferrell had 291.7 and 287.7 IP respectively, ranking 2 & 3 in IP in 1932. They were both very durable that year. Considering they both were to break down in the next 5 years I can't rationalize working either of them any harder in 1932.

Chris Cobb:That Ferrell was not quite as good as Grove in 1932 doesn't materially affect his HoM case, however. He was an outstanding pitcher that year.

Great point. Thanks. Something that may be lost in my answers to some of jonesy's points is that I think that Wes Ferrell was indeed a great pitcher in his prime. I guess when I see jonesy carry an argument a bit too far I get the urge to chime in with a rebuttal. Ferrell's HOM case is not going to hinge on how he matches up with Lefty Grove. It will be on how he matches up with Lyons, Dean, Griffith, Waddell, Rixey, etc.
   48. jonesy Posted: December 31, 2004 at 11:45 PM (#1049228)
Ferrell was on a tear. Soon Joe McCarthy was openly calling him another Matty. When the averages were announced on June 19 Gomez was 12-1, Grove was 12-3 and Ferrell was 12-4.

After going back to the fastball he went 9-1 with a 2.50 ERA in his next ten starts. That brought his record for the year to 16-5 with a 3.37 ERA. Talk began about a 30-win season.

While the Yanks were securely in first place, the Indians were challenging the A's for second, roughly 7-9 games back of the Yanks.

On July 10, just two days after Ferrell tossed a CG against Washington on July 8, Peckinpaugh left Ferrell in for 11 gruelling relief innings against the A's. Wes had the game won with two down in the 9th before a horrible error (Bill Buckner like) by the Tribe's 1bman tied the game. Ferrell lost it on a bad bounce play in the 18th.

In his next turn the Yanks pounded Wes early in the game. Peck confronted him in the clubhouse after the contest. "Hey, why weren't you bearing down out there today?"

Ferrell retorted, "I always bear down, I just had nothing to bear down with, that's all."

The strain and the break happened there.
   49. jonesy Posted: January 01, 2005 at 04:37 PM (#1049819)
David Foss,

A little more on runs produced. While I agree with the concept as a general overview, I am not sure it is valid when doing a game by game analysis.

In 1931 Ferrell tossed a 3-hitter against NY, winning 2-1 on the strength of his own late game homer that broke a 1-1 tie. He scored one run and tallied the RBI on the same run. But 1 RS + 1 RBI minus the HR is still one. Didn't he earn full credit for this run?

In 1934 Ferrell beat the White Sox 3-2 in 10 innings. Trailing 2-1 he hit a solo HR in the 8th to tie the game and another solo, walk-off homer to win the game. Those two at bats accounted for 2 runs scored and 2 RBI. That's four tallies for two actual runs. So it's 2 RS + 2 RBI - 2 HR = 2 full offensive runs to Ferrell's credit?

In 1936 Ferrell beat the Athletics by a 6-4 score. He hit a two-run homer and a grandslam to tally all six of Boston's RBI. So he had 2 RS + 6 RBI for a total of eight but minus the 2 homers he drops to 6.

I know it is a little cloudier here for we can argue that if it weren't for the four teammates being on base, then Ferrell's two homers would have been solo homers. But the fact remains that had Ferrell not hit them, then Boston might not have scored any runs at all.

I do not have any trouble at all crediting Ferrell with the six runs in this situation because here is the flip side to the pitcher as hitter argument; in my mind anyway.

On September 22, 1935, Ferrell lost a 6-4 game to the Yankees. Five of the runs were charged as earned.

"Ferrell's support in the first inning was very bad," wrote the NY Times. "With two out and Chapman on via a pass, Gehrig hit an easy play to short right, whereupon Cooke dropped his glove, stumbled and the fly went for a single. Then Roy Johnson lost Selkirk's fly in the sun for a two-bagger. The inning netted three runs, and Cronin's error cost another in the third."

The ball Johnson lost in the first inning actually hit him on the top of the head, ala Jose Canseco. Mel Almada lost another fly ball in the sun in the fifth. It was scored a triple and accounted for a run.

(Of course play by play analysis based on the boxscore credits this as a clean triple. It's only the deeper research of reading the description of the play that shows it was flawed...this goes to my argument that reading game accounts should always be the preferred method of research, or at least a necessary addition to the statistical analysis)

So Ferrell is pretty much the stud in both of these games, but loses positive credit in his six-RBI game ( based on runs produced formula) and is saddled with negative credit for his teammates failings in the pitched Yankees game?
   50. OCF Posted: January 01, 2005 at 06:43 PM (#1049897)
I'm not sure it's all that helpful to point out anecdotes about runs charged as earned that nonetheless involved shabby fielding. Don't you think that happened to every pitcher? Is there any reason to think Ferrell was singled out? That's part of the reason I prefer using RA to ERA. Just ask how many runs scored, and go back afterwards to deal once with the issue of defensive support. Just as the offensive support won't necessarily even out even for teammates, there's no reason to assume that defensive support evens out either - but how can we tell?

While I acknowledge that Ferrell was an excellent hitter and the best hitter of any pitcher for a wide swath of time on either side of him, let's put some bounds on our enthusiasm. We're talking about a home run hitter with 38 lifetime HR. We're talking about a guy who played in the richest offensive environment of the 20th century, in leagues that scored over 5 runs per game every significant year of his career. His lifetime .280/.351/.446 in 1176 AB barely even registers on my offensive system.

One other thing to account for in Ferrell's case: he has 321 lifetime decisions but only 2623 innings. That's the fewest innings per decision - 8.17 - of any pitcher I have worked up. My RA+ PythPat equivalent record for him is 167-124. From there we need a whole lot of enhancements - his hitting, the selective usage, extra credit for peak seasons - to bring him up into comparability with the likes of Vance, Rixey, or Griffith.
   51. DavidFoss Posted: January 01, 2005 at 09:24 PM (#1050043)
In 1936 Ferrell beat the Athletics by a 6-4 score. He hit a two-run homer and a grandslam to tally all six of Boston's RBI. So he had 2 RS + 6 RBI for a total of eight but minus the 2 homers he drops to 6.

I know it is a little cloudier here for we can argue that if it weren't for the four teammates being on base, then Ferrell's two homers would have been solo homers. But the fact remains that had Ferrell not hit them, then Boston might not have scored any runs at all.

I do not have any trouble at all crediting Ferrell with the six runs in this situation because here is the flip side to the pitcher as hitter argument; in my mind anyway.


Well, it's not like you just gave all six runs to Ferrell. You gave six runs to Ferrell AND you gave four runs to the guys he drove in. A total of TEN runs for Boston in a 6-4 game.

Plus, if Ferrell had hit triples instead of homers and each time been driven in by a sac fly... that would be six runs for Ferrell, four for the guys he drove in and two for the guys who drove him in... for a total of 12 runs for Boston in a 6-4 game. The run total goes down with more home runs hit. In my opinion, subtracting homers is not the right thing to do, but I've heard otherwise as well. Either way, the scale of runs calculated here is going to be different than RA for the pitcher due to the double counting that is occurring.

Something like Runs Created is what I would use here, but if you need an R & RBI method for your game log, I would use (R+RBI)/2.
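To make the double counting concrete, here is a small Python sketch of the 6-4 game under both scenarios described above. The `BatLine` type and function names are hypothetical, and the batting lines are reconstructed only from the description in this post (two homers driving in six, or two triples each followed by a sacrifice fly).

```python
from typing import NamedTuple


class BatLine(NamedTuple):
    runs: int
    rbi: int
    hr: int = 0


def runs_produced(line: BatLine) -> int:
    """R + RBI - HR, the per-player tally under discussion."""
    return line.runs + line.rbi - line.hr


def averaged(line: BatLine) -> float:
    """(R + RBI) / 2, the alternative suggested in this post."""
    return (line.runs + line.rbi) / 2


# Actual game: Ferrell homers twice (a two-run shot and a grand slam).
homer_game = [BatLine(2, 6, 2)] + [BatLine(1, 0)] * 4

# Hypothetical: two triples instead, each followed by a sacrifice fly.
triple_game = [BatLine(2, 4)] + [BatLine(1, 0)] * 4 + [BatLine(0, 1)] * 2

for label, game in (("homers", homer_game), ("triples", triple_game)):
    print(label,
          sum(runs_produced(b) for b in game),
          sum(averaged(b) for b in game))
```

The R + RBI - HR totals come to 10 and 12 for a team that scored 6, while the averaged version sums to 6 in both cases.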
   52. KJOK Posted: January 01, 2005 at 09:26 PM (#1050046)
To get this thread back on subject...

Michael Hoban has posted a couple of interesting articles about HOF analysis:

HOF Test for Position Players

HOF Test for Pitchers

While I've disagreed with Mr. Hoban in the past about his player ranking systems, this is an interesting look using Win Shares of what the criteria HAS BEEN for HOF Induction based on Win Shares...
   53. jonesy Posted: January 02, 2005 at 12:32 AM (#1050258)
David,

OK, I see it now based on the six RBI game.


OCF,

I'm not sure about all other pitchers having the same issue with fielding. I just looked at all of Ferrell's games and Grove while with Boston. Most anecdotal analysis of the observers of the day always referred to Peck's biggest problem at the helm of the Indians was the club's infield defense. Cronin, in '35 and '36, I imagine might have been the worst fielding HOF shortstop of all time.

Several years ago I ran across a random post in which Lee Sinins said that Ferrell's hitting was all era-induced and more hype than substance (I don't know if he still holds that view) and related it somewhat to Mike Hampton.

Ferrell in his day was considered one of the most dangerous hitters in the AL. He was often compared to Ruth and Foxx as a hitter. I have two homers at 450 feet and one (in an exhibition spring game) at 470 feet. I can easily double his 38 homers in number if I count wall-balls and balls hit over outfielders' heads. He hit several balls into what would now be bullpens in Fenway Park, and in 1929 the Cleveland Indians were making mid-season outfield fence adjustments in Dunn/League Park's right field, which Wes was forever putting balls off of.

In 1934 the Red Sox offense hit nine home runs in support of Ferrell. Five guys hit one each and Wes hit the other four.

Likely the only two teammates in his Cleveland and Boston years who were better hitters were Averill and Foxx. Ferrell outhit Cronin in their Boston days in games in which they were both starters.

Here are the guys with 75 at bats in Ferrell's 38 starts in 1935:

ATBs-RS-Hits-Hr-RBI-Ave.

Wes......119-22-42-6-23-.353
Johnson..143-23-48-0-23-.336
Cronin...142-22-45-6-28-.317
Rick F...138-13-42-0-17-.304
Melillo...90-15-27-0-09-.300
Almada...153-25-44-0-17-.288
Dahlgren.133-21-37-0-12-.278
Werber...124-23-34-3-15-.274
Cooke.....82-12-22-2-09-.268

Same in 1931:

Morgan...105-26-47-6-20-.448
Wes......106-24-37-9-29-.349
Hodapp...112-19-39-1-21-.348
Averill..148-37-50-7-35-.338
Vosmik...130-17-39-5-29-.300
Burnett...89-14-25-0-10-.281
Kamm......87-13-24-0-14-.276
Sewell...104-10-26-0-15-.250


Ferrell always batted in the number nine slot while pitching.
   54. jonesy Posted: January 02, 2005 at 12:46 AM (#1050280)
OCF,

I'm not sure I follow your point about IP per decision. Of course I didn't get David Foss's point about runs created either :)

If I have correctly added:

From 1929-1938 (virtually all of Ferrell's career) he had just 21 no-decisions in 315 starts.

Over the same period, Grove had 34 no-decisions in 275 starts.

Since it's a 10 year span, that's:

2.1 each year for 31.5 starts for Ferrell.
3.4 each year for 27.5 starts for Grove.

Either you are overvaluing relief decisions or I am underestimating them.
   55. OCF Posted: January 02, 2005 at 04:16 AM (#1050711)
From 1929-1938 (virtually all of Ferrell's career) he had just 21 no-decisions in 315 starts.

That's exactly my point. Ferrell had an unusually low number of no-decisions, hence an unusually high number of decisions for the amount that he pitched. In asking questions about bulk - about how much he pitched - if you base things on his decisions, you'll think he pitched more than if you base things on innings.

Did Ferrell pitch more or less than Vance and Coveleski (to name two other ~3000 IP pitchers?)

Vance: 337 decisions, 2967 IP
Coveleski: 357 decisions, 3082 IP
Ferrell: 321 decisions, 2623 IP

That's actually the problem: we've been willing to consider 3000 inning pitchers if they have a big enough peak, but Ferrell isn't really even a 3000 inning pitcher.

(I will say that of course I'll put Ferrell ahead of Dean.)

Ferrell had 8.17 IP/decision, an unusually low ratio.
Grove had 8.94 IP/decision, a little above average for a great or very good pitcher.
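For anyone who wants to reproduce the ratio, a short Python sketch using only the three career lines listed above (decisions, innings). The dictionary layout is an illustrative choice, not anything from the RA+ system itself.

```python
# (career decisions, career innings pitched) as quoted in this comment
careers = {
    "Vance": (337, 2967),
    "Coveleski": (357, 3082),
    "Ferrell": (321, 2623),
}

ip_per_decision = {
    name: innings / decisions
    for name, (decisions, innings) in careers.items()
}

for name, ratio in sorted(ip_per_decision.items(), key=lambda kv: kv[1]):
    print(f"{name:<10} {ratio:.2f}")
```

Sorted this way, Ferrell comes out lowest of the three, matching the 8.17 figure.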
   56. jonesy Posted: January 02, 2005 at 05:37 AM (#1050857)
That seems odd, what with Ferrell always being among the league leaders in IPs and CG. I have to admit that I have never thought about it before.

I do recall the first game after he hurt his arm in 1931. He worked to just three batters, giving up three ringing doubles, before he walked off the field.

In late 1932, against Boston, he put the first four batters on -- two hits and two walks -- then got into the argument with Peckinpaugh.

So that's two starts -- both losses -- in which he did not record an out, but that's only two games.

Are you just using starting decisions?
   57. jonesy Posted: January 02, 2005 at 05:55 AM (#1050893)
OCF,

In 1930, Grove and Ferrell each had three poor starts. My definition, I think without going back, was knocked out in less than five innings. Ferrell took the loss each time and Grove got three no-decisions.

Pitcher A makes 13 starts. He goes nine innings in 10 of them, earning a decision in each. In the other 3 he is knocked out after 3 innings. Each time in the three innings of work he gets a no decision.

He pitches a total of 99 innings but with 10 decisions he works a total of 9.9 innings/decision.

Pitcher B makes 13 starts. He goes nine innings in 10 of them, earning a decision in each. In the other 3 he is knocked out after 5 innings. Each time in the five innings of work he gets the loss.

He pitches a total of 105 innings but with 13 decisions he works a total of 8.08 innings/decision.

Am I following correctly?
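The Pitcher A / Pitcher B comparison can be checked with a few lines of Python. This is only a sketch: each start is modeled as a hypothetical (innings, got_decision) pair, which is all the example needs.

```python
from typing import List, Tuple

Start = Tuple[int, bool]  # (innings pitched, whether he got the decision)


def ip_per_decision(starts: List[Start]) -> float:
    """Total innings divided by the number of games with a decision."""
    total_ip = sum(ip for ip, _ in starts)
    decisions = sum(1 for _, decided in starts if decided)
    return total_ip / decisions


# Pitcher A: ten complete games, three 3-inning no-decisions.
pitcher_a = [(9, True)] * 10 + [(3, False)] * 3
# Pitcher B: ten complete games, three 5-inning losses.
pitcher_b = [(9, True)] * 10 + [(5, True)] * 3

print(f"A: {ip_per_decision(pitcher_a):.2f}  B: {ip_per_decision(pitcher_b):.2f}")
```

Pitcher B throws more innings than Pitcher A (105 to 99) yet shows the lower ratio, because the three short outings count as decisions.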
   58. OCF Posted: January 02, 2005 at 07:34 AM (#1051079)
You're reading it correctly. All innings, all decisions, whether as starter or reliever. Having a high or low number of innings per decision is neither good nor bad; it's a neutral element in evaluating a pitcher. Walter Johnson has a moderately low number (8.50) of IP/decision (and in his case it feels like all of the "extra" decisions are losses.) Johnson probably got there by getting more than his share of decisions in relief. I'm not saying Ferrell's unusually low number of IP per decision is a bad thing (or a good thing) but I am saying that he only has a little over 2600 IP career.

Incidentally, IP per decision has held pretty constant over the years, even as the roles of pitchers have changed dramatically. Some examples of IP/decision for active pitchers: Maddux 8.73, Randy Johnson 9.01, Clemens 9.13, Pedro Martinez 8.90, Rueter 8.50, Trachsel 8.44, Smoltz 9.54 (distorted by the fact that closers don't get decisions.)
   59. OCF Posted: January 02, 2005 at 07:54 AM (#1051110)
Of course, if you go down to the single season level, some very freaky things can happen.
Bob Welch, 1991: 6.67 IP/decision (you may remember the year.)
Odalis Perez, 2004: 15.10 IP/decision.

But for their whole careers, Welch and Perez are at a sensible 8.66 and 9.13, respectively.
   60. jonesy Posted: January 02, 2005 at 01:49 PM (#1051278)
Then what actually is the significance of IP/decision if you say it is neither a bad or a good thing?

In 1929 Grove worked 7.26 IP per start but 10.59 IP per decision?

If Ferrell is the outlier on this system, with the lower number (8.17 IP/decision), doesn't that just validate that he was pitched into the ground by Peck and Cronin? He always (relatively speaking) got the decision. There was no expected bullpen help in Ferrell's starts. It didn't matter what the score was, Ferrell was going the distance, or at least left in to take a pounding which made him the pitcher of record.

Being the low man, numberwise, in this system is indicative of his early demise from overuse, thus the low career inning total.

Yes?
   61. OCF Posted: January 02, 2005 at 06:14 PM (#1051376)
My RA+ PythPat equivalent records for a bunch of 30's-centered pitchers. Number of decisions is based on IP, so Ferrell goes into the system as a 2600-inning pitcher. The order is by equivalent FWP.

Name         EqW   EqL   Big years bonus
Grove        295   143   145
Hubbell      249   150    76
Ruffing      269   214    27
Lyons        260   202    22
Hoyt         234   184    18
Bridges      190   124    17
Warneke      184   128    38
Gomez        169   109    46
Root         201   156    12
French       195   155     8
Fitzsimmons  195   163     0
Ferrell      167   124    29
Dean         136    82    35
Haines       193   163    10
Zachary      183   165     3
Whitehill    201   195     4 

The two most important things this ignores are (1) defensive support, and (2) the pitcher's own offensive contribution. Uneven usage patterns (the #1 plank in jonesy's campaign) also affect this.

Does this underrate Ferrell? Sure. It also almost certainly overrates Ruffing and Gomez. Does it overrate Grove? Grove is so obviously elected that it doesn't even matter.

Ferrell had an injury-shortened career. So did Joss, whom we haven't elected, and so did Dean, whom we probably won't elect. Was the injury caused by overwork? I'm not sure it much matters whether it was overwork, a line drive to the toe, a fatal illness, or being drunk and foolish in Niagara Falls - it's part of the history.

The issue is going to be: how does Ferrell compare to Ruffing, Hoyt, Bridges, Warneke, Gomez, and Root? And should any of them be elected? (I assume we will elect Grove, Hubbell, and most likely Lyons.)
   62. OCF Posted: January 03, 2005 at 02:13 AM (#1052027)
I've conceptually figured out a way to include all of the extra information about Ferrell into the RA+ system. It's all a matter of adjustments to context.

The first, and biggest, adjustment is for his offense. The first thing I did was separate out his pinch-hitting from his pitching. That's easy enough to do - all of those games he didn't pitch and didn't play in the field must have been 1 plate appearance each (barring the rare instance of batting around in an inning). 1933 is a little different, with his 13 games in the outfield, but we can estimate that as well. It turns out that most of his hitting was in games in which he was the pitcher, and that's the part we'll apply to his pitching record.

I made the arbitrary assumption that pitchers' RC/out was about 30% of league average. You can find pitchers worse than that; you can find pitchers better than that. (Mental note: remember that Ruffing was a pretty good hitter himself.) So I took the amount by which Ferrell's RC/out exceeded 30% of league average and added it to the adjusted league context for his games. The result in most years is that the context is adjusted upward by about half a run per game.

Run his PythPat equivalent record against that, and Ferrell comes out with an equivalent record of 179-113, with a "big years bonus" of 54. That moves him into the Bridges/Warneke territory, with a bigger peak.
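
A rough sketch of how a context adjustment like this feeds an equivalent record. It assumes "PythPat" is the Pythagorean formula with a floating exponent of (runs scored + runs allowed per game) ** 0.287, and that decisions are IP / 9 as described in #61. The exponent constant, the function names, and the sample run rates are all illustrative assumptions, not the actual inputs behind the 179-113 figure.

```python
def pythpat_win_pct(runs_allowed_per_9, context_runs_per_9, pat_exponent=0.287):
    """Pythagorean win% with an exponent that floats with the run environment."""
    x = (runs_allowed_per_9 + context_runs_per_9) ** pat_exponent
    scored = context_runs_per_9 ** x
    allowed = runs_allowed_per_9 ** x
    return scored / (scored + allowed)


def equivalent_record(innings, runs_allowed_per_9, context_runs_per_9):
    """Turn innings and run rates into an equivalent (wins, losses) pair."""
    decisions = innings / 9.0  # one decision per nine innings
    wins = decisions * pythpat_win_pct(runs_allowed_per_9, context_runs_per_9)
    return wins, decisions - wins


# Hypothetical pitcher: 2600 IP, 4.30 RA/9 against a 5.00 league context,
# then the same pitcher with the context raised half a run for his hitting.
base = equivalent_record(2600, 4.30, 5.00)
with_bat = equivalent_record(2600, 4.30, 5.50)
print(f"pitching only: {base[0]:.0f}-{base[1]:.0f}")
print(f"with hitting:  {with_bat[0]:.0f}-{with_bat[1]:.0f}")
```

Raising the context while leaving runs allowed alone is the same as crediting the pitcher with the extra runs his bat produced, which is why the equivalent winning percentage goes up.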

There's now leftover value in his pinch hitting. But the baseline should be higher - compare him not to the people he was hitting for but to the other guy off the bench his team would have used. Ferrell hit about like an average 1B/LF, which makes him an unusually good pinch hitter. Most teams don't have that good a hitter on the bench - but they have someone. Here I used a baseline of 75% of league average RC. It doesn't make all that much difference: I get 24 RC better than that 75% baseline, 10 of them for his 1933 outfield stint. That amounts to a little over 2 extra wins, maybe 3 or 4 if you account for leverage.

There are still two other adjustments I could do, but don't have the data for. jonesy - for the years in which he faced opponents of non-random quality, can you figure for me the IP-weighted R/G average of his opponents, compared to league average? Chris J. - what does his defensive support look like, and can we estimate that (year by year) in runs/game compared to average?
   63. Paul Wendt Posted: January 04, 2005 at 12:10 AM (#1054338)
I wrote in #40:

Pitcher starts by ballpark and opponent are now available in convenient format, the Retrosheet game logs. The mismatches with the official record are small in magnitude, but numerous. Starts aren't innings and there is no similar record for relief appearances, so the data is not really complete for the purpose of adjusting ERA.
. . .
Chris James has used the game log data to estimate run support for many pitchers with notable careers, presented at the last SABR Convention and published on his website.

Innings per game, for each pitcher, must be a good indicator of the error in estimated run support.


Notes on innings per game and the distribution of pitcher workload among complete games, incomplete starts, and reliefs.

Grove's pitching appearances (Gp) were approximately 48% complete games, 26% incomplete starts (74% starts), 26% reliefs. Career IP/G ~ 6.40
Complete games constitute about 68% of his 3941 career innings.*

Ferrell: Gp approxly 61% complete games, 25% incomplete starts (86% starts), 14% reliefs. Career IP/G ~ 7.01
Complete games constitute about 78% of his 2623 career innings.*

Dazzy Vance: Gp approxly 49% complete games, 30% incomplete starts (79% starts), 21% reliefs. Career IP/G ~ 6.71
Complete games constitute about 66% of his 2967 career innings.

Lefty Gomez: Gp approxly 47% complete games, 40% incomplete starts (87% starts), 13% reliefs. Career IP/Gp ~ 6.80
Complete games constitute about 62% of his 2503 career IP.

I suppose that 9 * CG is a good estimate of innings pitched in complete games.
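
The complete-game shares quoted above follow directly from that 9 * CG estimate: the share of innings in complete games is 9 times the complete-game fraction of appearances, divided by IP per appearance. A small sketch using only the rounded percentages posted above (the function name is just for illustration):

```python
# (complete-game fraction of appearances, career IP per appearance, posted share)
PITCHERS = {
    "Grove": (0.48, 6.40, 0.68),
    "Ferrell": (0.61, 7.01, 0.78),
    "Vance": (0.49, 6.71, 0.66),
    "Gomez": (0.47, 6.80, 0.62),
}


def complete_game_inning_share(cg_fraction, ip_per_game, innings_per_cg=9.0):
    """Estimated share of career innings thrown in complete games."""
    return innings_per_cg * cg_fraction / ip_per_game


for name, (cg_fraction, ip_per_game, posted) in PITCHERS.items():
    share = complete_game_inning_share(cg_fraction, ip_per_game)
    print(f"{name:<8} estimated {share:.3f}  posted {posted:.2f}")
```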

For this epoch, the 1920s-30s, the accuracy of pitcher run support estimated from team statistics should not be taken for granted.

--
Run support bears heavily on W-L record but not at all on several popular sabrmetrics. RS appears here because Chris James' work on RS is here a well-known use of the Retrosheet game logs for individual pitchers.

Does the three-way distribution of pitcher innings {complete games, incomplete starts, reliefs} vary systematically with the quality of the opposing team or with the quality of its offense? Both are plausible: good pitchers completed a greater share of their starts against weak opponents (and against weak offenses?); they worked more often in relief against strong offenses and against strong teams. If so, then the distribution of pitcher innings may indicate where sabrmetrics other than estimated RS and adjusted WL are biased.
   64. OCF Posted: January 04, 2005 at 12:39 AM (#1054385)
Does the three-way distribution of pitcher innings {complete games, incomplete starts, reliefs} vary systematically with the quality of the opposing team or with the quality of its offense?

There's another interesting source of bias which is, I suspect, far more important in the most recent 20 years than at any previous time: A pitcher is more likely to have a complete game on a day on which he is personally pitching well. These days, a pitcher is likely to pitch a complete game, or nearly so, only if he is pitching a shutout or near-shutout. That would tend to bias the ERA+ of starting pitchers upwards a little - but really only the very good pitchers, the ones with a reasonable chance of pitching shutouts. It may play some role in some of our recent extreme ERA+ years by Maddux and Martinez.

The quality of the offense faced has some affect on whether or not the pitcher seems to be pitching well that day, but there are good games pitched against good teams.
   65. Paul Wendt Posted: January 04, 2005 at 12:58 AM (#1054409)
Only a few league averages can be derived without data on the name of pitcher appearances.

1929 league averages
AL: Complete games are about 49% of games, hence constitute about 49% of innings.
8.9 innings per team game played.

NL: Complete games are about 46% of games, hence constitute about 46% of innings.
8.9 innings per team game played.
   66. jimd Posted: January 04, 2005 at 04:44 AM (#1054972)
Ferrell had an injury-shortened career. So did Joss, whom we haven't elected, and so did Dean, whom we probably won't elect.

There are significant differences between the three at their primes.

Joss makes the WARP first-team all-stars (top 4 pitchers) only once, similarly for Win Shares (they disagree on the years, 1907 and 1908, each puts Joss on 2nd team the other year). His relative lack of innings costs him when compared to some of the workhorses in the 00's.

Dean has three or four first-team all-star selections (most valuable pitcher in baseball in 1934, both agree on '35 and '36, WARP likes Dean better in 1932), five selections overall (add 1933).

Ferrell has five or six first-team all-star selections (most valuable pitcher in baseball in 1935, both agree on '30, '31, '32, and '36, Win Shares like Wes better in 1929), six selections overall.

IP ---- IP (translated)
2623.0 2569.3 Ferrell
1967.3 1874.0 Dean
2327.0 1873.0 Joss

BP's "translated" pitching lines indicate that Joss's extra IP relative to Dean are just a by-product of the increased workloads typical of his era. OTOH, Dean was a better pitcher at his peak (relative to his peers), which should be enough to move him past Joss, though that may not be enough for the HOM.

Ferrell has the extra quality years. Through 1936, his ERA+ is 128, with a 163-98 (.625) record. The W-L record is similar to Sandy Koufax (165-87, 131 ERA+) except that Ferrell did it with mediocre teams (slightly over .500, estimated .515). And Ferrell could hit (as jonesy has stressed), Koufax could not (think Randy Johnson). I haven't done a complete analysis of him yet, but I like what I see so far.

OTOH, the comeback attempts are ghastly. System-specific question to be answered by each voter: Can a pitcher pitch himself out of the HOM?
   67. John (You Can Call Me Grandma) Murphy Posted: January 04, 2005 at 04:49 AM (#1054987)
OTOH, the comeback attempts are ghastly. System-specific question to be answered by each voter: Can a pitcher pitch himself out of the HOM?

Only if you're relying on rate stats at the expense of counting stats. IMO, the idea that you could pitch yourself out of the HOM is preposterous.
   68. DanG Posted: January 14, 2005 at 07:38 PM (#1079951)
Recently, someone here brought up a theory that the replacement level (RL) for pitchers is lower than for position players. This had never occurred to me before, and it's been floating around in my brain ever since. It seems to make sense.

It has occurred to me that each position has its own RL. Catchers probably have the lowest RL: DH's the highest.

Naturally, these will change over time. The pitcher RL would be different pre-1893 vs deadball vs post-1920.

It is generally acknowledged that win shares' major flaw is an improper setting of the RL. It seems that if the RL for each position in each era could be calculated then win shares could be adjusted to Win Shares Over Replacement. Setting the RL correctly would take care of the "catcher bonus" that most of us apply. It would fix win shares' (generally acknowledged) overrating of outfielders.

An average team in a 162-game season has 243 win shares. If we assume a general RL of .300, then 146 WS (record of 49-113) is the team total of replacement win shares. However, that is actually true only if win shares uses a RL of .000, right? If WS uses a RL of .100, then the team total of replacement win shares is only 97.
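
The arithmetic in that paragraph, as a short sketch. It assumes three win shares per win and treats the system's built-in baseline as something simply subtracted from the replacement winning percentage; the function and parameter names are made up for illustration.

```python
def replacement_win_shares(games, replacement_pct, system_baseline_pct=0.0,
                           shares_per_win=3):
    """Team win shares that a replacement-level team would still collect."""
    return (replacement_pct - system_baseline_pct) * games * shares_per_win


print(round(replacement_win_shares(162, 0.500)))         # average team
print(round(replacement_win_shares(162, 0.300)))         # RL .300, WS baseline .000
print(round(replacement_win_shares(162, 0.300, 0.100)))  # RL .300, WS baseline .100
```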

That's about as far as my theoretical ruminations have led me. Has anyone already done investigations down this alley?
   69. Paul Wendt Posted: January 15, 2005 at 04:08 AM (#1081014)
I know that others have discussed the implied Win Shares replacement level, if not beat it to death, but I don't know the discussion.

Win Shares are shares of wins (times three because there are three stars in a hockey game). The numbers are additive, as the shares metaphor or pie chart implies.

For Bill James, it is an axiom that the sum of player win shares is team wins (times three). In any league, a team with .666 W-L record will have sum of player win shares double that of a team with .333 W-L record.

Doesn't that imply replacement level zero, for Dan (not that BJ would adopt the term).
   70. KJOK Posted: January 15, 2005 at 05:06 AM (#1081092)
Doesn't that imply replacement level zero, for Dan (not that BJ would adopt the term).

It implies a zero replacement level, but in REALITY Win Shares does not have a zero replacement level, but a replacement level of around 20% (40% of league average performance).

So, if a team won 100 games, Win Shares allocates the 300 Win Shares among the '25 players' based on their 'marginal wins' or contribution above the 20% floor. Players who contribute right at the 20% level will not receive ANY Win Shares, so those shares go to someone else on the team.
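
A toy version of that idea, to show why the floor matters even though the shares always add up to three times team wins. This is a deliberately simplified sketch, not the actual Win Shares procedure: contributions are in arbitrary hypothetical units, and the pool is split in proportion to each player's contribution above the floor.

```python
def allocate_win_shares(team_wins, contributions, floor):
    """Split 3 * team_wins in proportion to contribution above the floor."""
    marginal = {name: max(0.0, value - floor)
                for name, value in contributions.items()}
    total = sum(marginal.values())
    pool = 3 * team_wins
    if total == 0:
        return {name: 0.0 for name in marginal}
    return {name: pool * m / total for name, m in marginal.items()}


# Hypothetical three-man "team": one star, one regular, one player at the floor.
shares = allocate_win_shares(100, {"star": 5.0, "regular": 2.0, "scrub": 1.0},
                             floor=1.0)
print(shares)
```

The player sitting exactly at the floor gets nothing, and his would-be shares are spread over his teammates, which is the effect described above.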
   71. Joey Numbaz (Scruff) Posted: February 02, 2005 at 11:11 AM (#1119808)
Question regarding replacement level in the two systems.

Win Shares I have pretty well set at 8.8 WS per full season for hitters and 337 IP.

For WARP though, this is what I'm thinking. I'm pretty confident that about 2.0 WARP3 per season need to be deducted for position players.

For pitchers I'm not so sure - I'm looking at it, and I don't think anything needs to be deducted replacement level wise for pitchers.

The problems with WARP are with using FRAR instead of FRAA in their WARP calc. They assume a player that is replacement level offensively and defensively is replacement level when he isn't. That's not even really debated at this point.

Does this apply to pitchers? I don't think it does. Am I missing something here, is there some other reason WARP might have too low of a replacement level for pitchers?

Any help would be greatly appreciated ASAP. Thanks!
   72. TomH Posted: February 02, 2005 at 01:59 PM (#1119848)
Joe, I haven't done an extensive study, but I recommend two intuitive tests: MVP-type awards (single season greatness) and career value.

Two possibilities: subtract about 2.0 WARP3 per full season (650 PA) from everyone, or merely position players and not pitchers? Well, actually these are the two binary possibilities, with infinite ways to split the difference in between.

Which method will give a 'better' answer in terms of who the league's best players are each year, and which method will give a better answer to 'who has the highest WARP3 ever?'

Anyone have a 'top career WARP3' list handy? If not, I'll look it up and post it during my lunch today.

For the NL 2004, I found 14 position players with 7.5 WARP3 or greater, and 12 pitchers. That sure doesn't seem like we ought to be taking much more WARP away from the hitters; if anything, pitchers already got their share of the credit last year.
Bonds led with 15.1
Pujols, Edmonds, Rolen, Beltre, Loretta, RJohnson all were in the 11s.
   73. TomH Posted: February 02, 2005 at 05:45 PM (#1120248)
Top 17 MLB WARP3 careers (all above 150)

caveat #1: WARP3 as posted on 11 am on groundhog's day 2005 :)

caveat 2: unless I missed somebody

player WARP3
Ruth 225
Bonds 212
Mays 209
Aaron 200
Cobb 191
-------Johnson 190
Musial 188
Speaker 182
Wagner 179
-------Clemens 178
-------Young 176
Collins 174
Williams 169
-------Spahn 162
Morgan 158
-------Maddux 156
Mantle 155

5 of the top 17 are pitchers. Contrast: By Career Win Shares, only one pitcher in the top 10 (Cy). Walter J gets 11th, followed by at least 5 more hitters. So WARP3 is more pitcher-friendly than Win Shares.

If I take 2 WARP3 away from hitters (but not pitchers) for every full season (it should be by plate appearances, but I will estimate here), the new scale would be

player adjusted WARP3
Ruth 195
-------Johnson 190
-------Clemens 178
-------Young 176
Bonds 174
Mays 162
-------Spahn 162
-------Maddux 156
Aaron 155
Cobb 147
-------Alexander 145
Musial 144
-------Seaver 142
Speaker 140
Wagner 135
Williams 135
Collins 130

Wow, THAT list looks waaay too pitcher-heavy. I'd recommend as a sanity check, at least 1 WARP3 per 200 equivalent IP for pitchers, and probably more, to balance things out.
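
The two lists above imply how many "full seasons" each hitter was charged for, since the deduction is 2 WARP3 per season. A small sketch that backs those out from the posted totals (only hitters appearing on both lists are included; the function names are illustrative):

```python
# (raw career WARP3, adjusted WARP3) for hitters on both lists above.
HITTERS = {
    "Ruth": (225, 195), "Bonds": (212, 174), "Mays": (209, 162),
    "Aaron": (200, 155), "Cobb": (191, 147), "Musial": (188, 144),
    "Speaker": (182, 140), "Wagner": (179, 135), "Collins": (174, 130),
    "Williams": (169, 135),
}


def adjusted_warp3(warp3, full_seasons, per_season=2.0):
    """Career WARP3 after deducting a flat amount per full season."""
    return warp3 - per_season * full_seasons


def implied_full_seasons(raw, adjusted, per_season=2.0):
    """Back out the season count the posted adjustment must have used."""
    return (raw - adjusted) / per_season


for name, (raw, adj) in HITTERS.items():
    print(f"{name:<9}{implied_full_seasons(raw, adj):>5.1f} full seasons")
```

A plate-appearance version would replace the season count with career PA / 650, per the 650 PA definition of a full season in #72.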
   74. jimd Posted: February 02, 2005 at 07:44 PM (#1120580)
The problems with WARP are with using FRAR instead of FRAA in their WARP calc.

No, it's not. We've been over this before. That doesn't work in their model.

To use FRAA when calculating replacement level, you have to calculate a different offensive replacement level at each position. They choose not to do that; instead they incorporate the necessary adjustment for the offensive difference between positions into the fielding runs instead of the hitting runs. Win Shares also does it that way.
   75. EricC Posted: February 02, 2005 at 11:48 PM (#1121192)
Question regarding replacement level in the two systems.

Win Shares I have pretty well set at 8.8 WS per full season for hitters and 337 IP.

For WARP though, this is what I'm thinking. I'm pretty confident that about 2.0 WARP3 per season need to be deducted for position players.

For pitchers I'm not so sure - I'm looking at it, and I don't think anything needs to be deducted replacement level wise for pitchers.


Joe-

FWIW, I've fit player performances to the tail ends of Gaussian distributions to attempt to get empirical replacement level values.

Based on data from 1901 to 1938, I found that replacement level position players earn an average of 1.27 win shares per 100 plate appearances or 6.4 win shares for a typical 154-game player-season of slightly more than 500 plate appearances. This is reasonable, and is in line with other estimates that I've seen. It's a little lower than your 8.8, but I'm counting about 11 position player roster spots per team.

Replacement level pitchers averaged an ERA+ of 82. (Actually I was surprised to find that they were this good.) Perhaps you could look at typical WARP3 rates for pitchers with ERA+ around 82 in order to estimate how much should be deducted from pitcher WARP3's to make the baseline closer to a true replacement-level baseline.
   76. jimd Posted: February 03, 2005 at 02:25 AM (#1121370)
Based on data from 1901 to 1938, I found that replacement level position players earn an average of 1.27 win shares per 100 plate appearances

Using the 154 game seasons (1904-1917,1920-1938) I find that the average team has 5689 AB+BB, which is 632 per batting order position. This normalizes to 665 for a 162 game schedule. (The 8.8 was based on WS normalized to 162.) Using the 1.27 conversion factor noted, this works out to replacement level of 8.45 Win Shares. (Note that this doesn't include S, SF, HBP, etc.)
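
Both estimates can be reproduced in a few lines. The inputs are the figures quoted in #75 and #76 (1.27 WS per 100 PA, roughly 500 PA per player-season, 5689 AB+BB per team); the function names are just for illustration.

```python
WS_PER_100_PA = 1.27  # EricC's fitted replacement rate


def replacement_ws(plate_appearances, ws_per_100_pa=WS_PER_100_PA):
    """Win shares a replacement-level hitter earns over the given PA."""
    return plate_appearances * ws_per_100_pa / 100.0


def pa_per_lineup_slot(team_pa, slots=9, from_games=154, to_games=162):
    """Team PA split over nine lineup slots, rescaled to another schedule."""
    return team_pa / slots * to_games / from_games


print(f"{replacement_ws(500):.2f}")   # EricC: about 6.4 per 154-game player-season
slot_pa = pa_per_lineup_slot(5689)    # jimd: about 665 per slot at 162 games
print(f"{slot_pa:.0f} PA -> {replacement_ws(slot_pa):.2f} WS")
```

The gap between 6.4 and 8.8 is mostly a playing-time definition: about 500 PA per roster spot versus about 665 per lineup slot.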

I'd say the estimates are pretty close.
   77. Joey Numbaz (Scruff) Posted: February 03, 2005 at 02:50 AM (#1121403)
"The problems with WARP are with using FRAR instead of FRAA in their WARP calc.

No, it's not. We've been over this before. That doesn't work in their model."

I know we have been over this before Jim :-)

While what I described may not be their actual model, it's in effect what they are doing. Tango has gone into this many times with me. IIRC we had this semantic debate awhile back, and we ended up agreeing that we were basically saying the same thing . . . I wish I knew where that discussion was.

******

Eric, very promising, I'll do a little checking and see what I can come up with there . . .
   78. KJOK Posted: February 03, 2005 at 03:24 AM (#1121436)
Whatever the methodology that causes it, the CREATOR of WARP posted that the replacement level comes out to be an average AA player, which IMO is a very low base, although not as low as Win Shares...
   79. Joey Numbaz (Scruff) Posted: February 03, 2005 at 03:47 AM (#1121457)
I'll list those I find . . .

player     year ERA+ WARP1 WARP3  IP  DERA*
Capuano    2004  83   1.8   1.8   88  5.15
Mahler     1978  86   1.3   1.1  135  5.25
Beech      1997  84   2.3   2.3  137  4.82
Beech      1998  85    .8    .6  117  5.64
Whitrock   1894  81   -.4  -1.1   74  5.98
Walsh      1928  81    .9    .5   78  4.84
Walsh      1929  76    .9    .2  129  5.75
Walsh      1930  85   1.8   1.3  104  4.34
Maduro     2002  79    .8    .6   57  5.58
Salkeld    1996  80   1.4   1.3  116  5.03
Grimes     1917  80   1.2    .3  194  5.42
Grimes     1919  85    .8   -.3  181  5.54
Grimes     1925  83   4.0   2.5  245  4.88
Grimes     1932  79    .8    .6  141  5.74
Rixey      1919  81   1.1    .3  154  5.06
Faber      1919  83    .0   -.5  162  5.86
*adjusted for season
   80. Joey Numbaz (Scruff) Posted: February 03, 2005 at 03:58 AM (#1121466)
KJOK, I strongly disagree. Well sort of.

I seriously doubt your typical AA player could come up and post a 9 WS season in the majors.

I also think the creator overexaggerated when he said that.

I agree WS has a slightly lower replacement level than WARP (3 wins vs. 2 per season), but neither is at AA level, despite what the creator of the system says.
   81. jimd Posted: February 03, 2005 at 04:29 AM (#1121497)
IIRC, that quote was from a couple of years ago, and the WARP system has changed a few times since then. It seems to me like the fielding replacement level has been bumped up during some of those changes.
   82. Paul Wendt Posted: February 04, 2005 at 02:37 AM (#1123644)
In Fall 2003 or Winter 2004, WARP was revised to incorporate credit/debit for pitcher fielding. Author Clay Davenport was not happy about it then. I don't know the present situation.
   83. Joey Numbaz (Scruff) Posted: February 06, 2005 at 11:32 AM (#1127904)
Based on the numbers I posted above, realizing it's a small data set, here's what I come up with:

All averages are weighted by IP:

ERA+: 82.1
WARP1: 1.39
WARP3: .79
IP: 132
DERA*: 5.30

So if these pitchers truly are replacement level, we are talking 190 IP gives them 2.0 WARP1.
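
For anyone who wants to rerun those averages, a short sketch that recomputes them from the table in #79. Each column is weighted by IP, the IP figure is a plain mean, and the last line scales the WARP1 rate to 190 IP.

```python
# Rows from the table in #79: (player, year, ERA+, WARP1, WARP3, IP, DERA).
ROWS = [
    ("Capuano", 2004, 83, 1.8, 1.8, 88, 5.15),
    ("Mahler", 1978, 86, 1.3, 1.1, 135, 5.25),
    ("Beech", 1997, 84, 2.3, 2.3, 137, 4.82),
    ("Beech", 1998, 85, 0.8, 0.6, 117, 5.64),
    ("Whitrock", 1894, 81, -0.4, -1.1, 74, 5.98),
    ("Walsh", 1928, 81, 0.9, 0.5, 78, 4.84),
    ("Walsh", 1929, 76, 0.9, 0.2, 129, 5.75),
    ("Walsh", 1930, 85, 1.8, 1.3, 104, 4.34),
    ("Maduro", 2002, 79, 0.8, 0.6, 57, 5.58),
    ("Salkeld", 1996, 80, 1.4, 1.3, 116, 5.03),
    ("Grimes", 1917, 80, 1.2, 0.3, 194, 5.42),
    ("Grimes", 1919, 85, 0.8, -0.3, 181, 5.54),
    ("Grimes", 1925, 83, 4.0, 2.5, 245, 4.88),
    ("Grimes", 1932, 79, 0.8, 0.6, 141, 5.74),
    ("Rixey", 1919, 81, 1.1, 0.3, 154, 5.06),
    ("Faber", 1919, 83, 0.0, -0.5, 162, 5.86),
]
COLUMNS = {"era_plus": 2, "warp1": 3, "warp3": 4, "dera": 6}
IP_INDEX = 5


def ip_weighted_mean(column):
    """Average of one stat column, weighting each season by innings pitched."""
    idx = COLUMNS[column]
    total_ip = sum(row[IP_INDEX] for row in ROWS)
    return sum(row[idx] * row[IP_INDEX] for row in ROWS) / total_ip


mean_ip = sum(row[IP_INDEX] for row in ROWS) / len(ROWS)
for column in COLUMNS:
    print(f"{column:<9}{ip_weighted_mean(column):.2f}")
print(f"mean IP  {mean_ip:.0f}")
print(f"WARP1 per 190 IP: {ip_weighted_mean('warp1') / mean_ip * 190:.2f}")
```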

One problem, I did not account for hitting here. I have no idea if this group of pitchers were good or bad hitters, but I assume they are typical.

Does anyone think 82 ERA+/5.30 DERA isn't a reasonable spot to set replacement level? I'll probably be conservative and set 2.0 WARP per 200 IP as replacement level until further notice . . .
   84. Joey Numbaz (Scruff) Posted: February 06, 2005 at 11:55 AM (#1127907)
Further notice :-)

That didn't work right at all. When I tried to subtract 2.0 WARP3 per 200 IP (adjusted for season length) from Pud Galvin, I wound up with him getting -23 WARP3 for his career. Obviously that is way off the mark.

I've got to be missing something here. Anyone have any ideas?
   85. Paul Wendt Posted: February 06, 2005 at 05:36 PM (#1128074)
If the problem is related to the difference between average-level and replacement-level fielding, it can't be a big problem for pitchers.
If replacement-level fielding has been somehow normalized across positions, perhaps it covers only 8 positions.

In Fall 2003 or Winter 2004, WARP was revised to incorporate credit/debit for pitcher fielding. Author Clay Davenport was not happy about it then. I don't know the present situation.

My incomplete reading: C.D. still suspects that there is double counting in the application of 8-position methods to pitchers.
   86. Cblau Posted: April 18, 2005 at 01:18 AM (#1265328)
Bill James let those of us in SABR's Statistical Analysis Committee know that he is working hard on a new version of Win Shares (so hard, in fact, that he is neglecting his Red Sox work.) He identified 4 major flaws, 2 of which are the absence of Loss Shares and overrating 19th Century pitchers. (The low baseline was not one of the major flaws.) It will be a year or two before he publishes anything, though. He describes the system as very different from what was in the book, although it gives similar results in most cases.
   87. John (You Can Call Me Grandma) Murphy Posted: April 18, 2005 at 01:26 AM (#1265345)
He describes the system as very different from what was in the book, although it gives similar results in most cases.

I was very happy to see that when I got the group e-mail from him, Cliff. A major revision would have been disheartening.
   88. jimd Posted: April 18, 2005 at 09:07 PM (#1267708)
He identified 4 major flaws, 2 of which are the absence of Loss Shares and overrating 19th Century pitchers. (The low baseline was not one of the major flaws.)

Did he identify the other two major flaws? If so, what were they?
   89. KJOK Posted: April 18, 2005 at 09:29 PM (#1267773)
Here's the whole posting in SABR-L:

Date: Sat, 16 Apr 2005 13:52:49 EDT
From: BilJames@aol.com
Subject: Re: Loss Shares

I appreciate your interest in Win Shares and Loss Shares.

A year or two after publishing "Win Shares", I made a list of what I had done wrong in constructing Win Shares, and began thinking about how to fix the problems. There are lots of little mistakes, of course, but the four big mistakes I made were:

1. Failure to include Loss Shares.
2. I allowed the system to become too complex, which led inevitably to small errors. I should have given more thought, as I was developing the system, to how to do things in a less complicated (and therefore more transparent) manner.
3. The values for 19th century pitchers are too high.
4. I ignored post-season play.

I didn't include Loss Shares, honestly, because there were certain things about the way they work out that I was not emotionally prepared to accept. There are a couple of things about them that I just simply don't like, or didn't like at the time, and I was fighting to find some way to "repair" or "avoid" things that, in the end, I just had to come to terms with.

Anyway, I have been working on this for the last year or so, and I have essentially completed a system of Win Shares and Loss Shares. I have been working obsessively on it for the last several months, to the point at which I have started worrying about getting fired by the Red Sox if I don't get some Red Sox work done. I have tried and tried and tried to control the amount of time I put into the project, but I just can't.

As I said, the system is essentially done, except that:

1) I keep finding huge gaps in my logic and/or execution, and am reluctant to produce anything until I am more confident that I have gotten rid of these, and

2) I am just building the file of players figured. . .I have figured about 2000 player/seasons so far.

I expect to be publishing something about this within the next year or perhaps two. The system as it is now is very substantially different than it was in 2002, when the Win Shares book was published, although of course the results are very similar in most cases. I have not addressed the post-season play issue; I suspect that one will have to wait until the next generation.


Again, I appreciate your interest.


Bill James
   90. jimd Posted: April 18, 2005 at 11:45 PM (#1268206)
Too bad. Not what I would consider to be the major flaws.

1. Failure to include Loss Shares.

I never considered this a major failing. In most cases, an approximation of Loss Shares can be made from the player's playing time. A full-time average position player should work out to 20-20. This approximation should work fine for most players. Dealing with the players whose value (for 162 games) goes past 40 Win Shares becomes the issue (2nd order issue is their impact on their teammates).
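
A sketch of that approximation, assuming a full-time player accounts for 40 total shares (wins plus losses) and that the total scales with playing time. The 40-share total comes from the 20-20 average above; the function name and the sample values are hypothetical.

```python
def approximate_loss_shares(win_shares, playing_time_fraction,
                            full_time_total=40.0):
    """Estimate loss shares as playing-time shares minus win shares."""
    return full_time_total * playing_time_fraction - win_shares


print(approximate_loss_shares(20, 1.0))   # full-time average player: 20-20
print(approximate_loss_shares(12, 0.5))   # half-time player with 12 WS
print(approximate_loss_shares(45, 1.0))   # past 40 WS the estimate goes negative
```

The last line is the case flagged above: once a player passes 40 Win Shares, the simple playing-time estimate produces negative losses, so something else has to give.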

2. I allowed the system to become too complex, which led inevitably to small errors. I should have given more thought, as I was developing the system, to how to do things in a less complicated (and therefore more transparent) manner.

I never considered this a major failing either. (Esthetically, yes, but I thought the complexity was essentially unavoidable.) Interesting to see where he simplifies the system.

3. The values for 19th century pitchers are too high.

I believe this to be a symptom of an underlying flaw, not a flaw in and of itself. The split between fielding and pitching has problems, and changing it only for the 19th century will not address how it also undervalues modern pitching.

4. I ignored post-season play.

I never considered this a major failing either.
   91. Cblau Posted: April 19, 2005 at 01:14 AM (#1268767)
I wouldn't expect there to be major changes in rankings of players, for the simple reason that any reasonable system will produce roughly the same results. I mean, if you just look at games played, you'll see mostly great players near the top, mediocre ones in the middle, poor ones at the bottom. Career batting average accurately distinguishes among Babe Ruth, Ron Cey, and Mario Mendoza. If you just combine OPS and playing time, you would probably produce rankings not much different than Win Shares does.
   92. Paul Wendt Posted: April 23, 2005 at 05:42 AM (#1281576)
19 Apr 2005, jschmeagol
one reason for our 'failure' to elect the 1890's trio may be the discrepancies between WS and WARP on those three.

To wit:

[ WIN SHARES ]
Name.......Career..WSAA...WSPS
Duffy.......327.....133....41
GVH.........372.....151....26
Ryan........341.....104....26

[ WARP ]
Name.......Career..WAA....WPS
Duffy.......76.8....31.3...5.2
GVH.........82.5....25.....1.5
Ryan........76.2....21.4...3.0

*WS are schedule adjusted to 154 games and WARP3 scores are most likely not the most up to date version.

These are pieces of my system for evaluating players in Win Shares and WARP3. Career is the career totals in the respective areas. WSAA and WAA means Win Shares or WARP Above Average, average being 15 WS and 4.0W. WSPS and WPS is my peak score or WS above 25 and W above 7.
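
One way to read those definitions, sketched with made-up season totals. It assumes only seasons above the threshold count (no penalty for seasons below it), which the description above doesn't state outright; the season list is hypothetical, not any of the three players.

```python
def total_above(season_values, threshold):
    """Sum of each season's value above the threshold, ignoring seasons below it."""
    return sum(max(0.0, value - threshold) for value in season_values)


# Hypothetical career, in Win Shares per season.
seasons = [30, 27, 22, 14, 9]

career = sum(seasons)
wsaa = total_above(seasons, 15)   # Win Shares above an average of 15
wsps = total_above(seasons, 25)   # peak score: Win Shares above 25
print(career, wsaa, wsps)
```

The same function covers the WARP side by swapping in thresholds of 4.0 and 7.0.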


Win Shares may rate these three players relatively highly because it likes center fielders and long careers. Some people would say 'overrates' one or the other.

Is there a theoretical explanation of average WARP, 4.0?
   93. Mark Shirk (jsch) Posted: April 23, 2005 at 02:36 PM (#1281774)
Nope, no theoretical explanation. I picked it by trying 3.0, 5.0 and a few points in between and 4.0 just seemed about right and even if say, 3.8 is 'average' then 4.0 still works for me. I guess it just seems to match up with 15 WS pretty well.

It may be a little high (in fact it probably is) but I would rather err on the side of a high average than a low average. The problem with those three isn't the WSAA (in which they are at least respectable) but my peak measure. And I don't really think that 7.0 is too high a number in WARP3 to skew the stats. In fact I have thought about raising it in the past.

Could there be a problem in the way that I prorate to 154-game seasons? Possible. I usually take the average number of games that a team played in that league that season, and divide 154 by that number then multiply the WS (I think, I haven't done it in a while). So it isn't like I am figuring WS per game and then adding 20 to 30 extra games. I think my way is the more conservative approach.
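
The proration described there is a single scale factor per league-season. A minimal sketch, with a hypothetical 132-game league as the example (the function name and sample numbers are illustrative only):

```python
def prorate_win_shares(win_shares, league_avg_team_games, target_games=154):
    """Scale a season's Win Shares by target schedule / actual league schedule."""
    return win_shares * target_games / league_avg_team_games


print(f"{prorate_win_shares(25, 132):.2f}")  # 25 WS in a 132-game league
print(f"{prorate_win_shares(25, 154):.2f}")  # already a 154-game season
```

Because the factor depends on the league schedule rather than the player's own games, a player who missed time is not credited with extra games he never played.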

So I just think that WS loves CFers. I don't find anything in the methodology that says this (except for their long careers, but as a peak/prime voter that isn't as big a deal for me) but there is a difference between WARP and WS on most every CFer. WARP may vastly underrate them, but I somehow think it is a little of both.
   94. Paul Wendt Posted: April 23, 2005 at 04:26 PM (#1281900)
Bill James is working hard on a new version of Win Shares (so hard, in fact, that he is [. . .]):
"working obsessively on it for the last several months, to the point at which I have started worrying about [. . .]."

There, I have edited in Bill's pecuniary interest.

--
BJ:
2. I allowed the system to become too complex, which led inevitably to small errors.

jimd:
I never considered this a major failing either.


If its complexity "inevitably led to small errors" then it is a problem. I don't know what kind of errors (exposition? calculation? approximation?) or why this may be.

Segmentation is a type of complexity. Relative to all-time parameters, segmented ones provide better approximation to some unknowable truth but [see below #]

BJ:
3. The values for 19th century pitchers are too high.

jimd:
I believe this to be a symptom of an underlying flaw, not a flaw in and of itself. The split between fielding and pitching has problems, and changing it only for the 19th century will not address how it also undervalues modern pitching.

Symptom of what underlying flaw?

Segmentation
The system includes some segmented parameters, which have different magnitudes in different timespans. Further segmentation is a favorite tweak around here, regarding the share of pitching v fielding and, among fielding positions, the basic values of 3B, 2B, 1B. But it is costly and not only in added complexity.

#Segmentatino introduces errors for local purposes near the ends of the segments. Eg, segmentation at 1900, using one parameter value for the 19th century and another for the 20th, introduces errors in comparisons between 90s and 00s.
   95. OCF Posted: April 23, 2005 at 08:01 PM (#1282676)
#Segmentatino introduces errors ...

I'm not quite sure whether that's a Venezuelan right fielder trying to play third base, or whether it's a notation in the printed parts whose meaning the orchestra isn't quite sure of.
   96. jimd Posted: April 25, 2005 at 03:36 PM (#1286092)
Symptom of what underlying flaw?

The split between fielding and pitching has problems,
   97. andrew siegel Posted: April 25, 2005 at 04:12 PM (#1286152)
Here's a question about WARP: fielding replacement level seems to move over time, creeping closer to average as we move forward in time. Therefore, for fulltime 1890's SS's, the baseline appears to be something like 40-50 runs below average; for 1920 SS's, something like 30-35 runs below average; for 1950s SS about 25-30 runs below average; and for modern SS about 20 runs below average.

What do we think of those assumptions?

I ask that question primarily because of its implications for Hughie Jennings. If WARP is right, his fielding value is through the roof, and his five-year peak run when season-length adjusted is one of the top 10 or 15 of All-Time.

However, a substantial percentage of his value comes from simply being an average defensive SS [note: I am NOT saying WARP reads him as an average defensive SS, only that--in addition to his value for being way better than average he is also getting a lot of credit for being average]. If the baseline is too low, then Jennings's peak drops substantially . . . and his candidacy dies.
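
To put rough numbers on that, here is a minimal sketch. The baselines are midpoints of the ranges quoted above; the +20 fielder and the function name are hypothetical, and this is not WARP's actual formula, which isn't published.

```python
# Runs below average at which replacement level sits for a full-time SS.
# Midpoints of the ranges quoted in this post; treat them as approximations.
BASELINE_BELOW_AVG = {"1890s": 45.0, "1920": 32.5, "1950s": 27.5, "modern": 20.0}

def fielding_runs_above_replacement(runs_above_average, era):
    """Simple additive model (an assumption): value over replacement equals
    value over average plus the gap between average and replacement."""
    return runs_above_average + BASELINE_BELOW_AVG[era]

# A hypothetical shortstop who is 20 runs better than average in every era.
for era, baseline in BASELINE_BELOW_AVG.items():
    total = fielding_runs_above_replacement(20.0, era)
    share_from_average = baseline / total
    print(era, total, round(share_from_average, 2))
```

With these figures the same +20 fielder is worth 65 runs above replacement in the 1890s and 40 today, and in the 1890s case 45 of the 65 come from simply being average.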
   98. karlmagnus Posted: April 25, 2005 at 05:10 PM (#1286256)
Andrew, this is surely a philosophical problem rather than a measurement one. As the number of rocks in the infield declined, the balls became rounder and cleaner, and the gloves became softer and larger, everybody's fielding moved closer and closer to perfection. However, this didn't particularly take the form of an increase in range, but of a decline in the error rate, and in the runs resulting from errors and near-errors. An error rate that would have been perfectly acceptable in the 1880s, or in the Negro Leagues, was not acceptable by, say, the 1930s.

Hence the divergence in runs allowed between the best fielders and the worst declined sharply between 1885 and 1935. WARP correctly reflects this, and shows the value of fielding above replacement level as declining. However, the skill of the superior fielder did not decline relative to that of the poor fielder -- once the real stone-gloves had been moved out of the infield in the 1870s (hi, Levi Meyerle!) the replacement level infielder was just as good in 1885 as in 1935, but working with more difficult conditions and inferior equipment.

In terms of "Merit" therefore, you can argue that the declining differential is unreal; in terms of runs it is very real indeed. It's yet another problem with uber-stats. Personally, I would like to rate an A shortstop in 1935 with a 120 OPS+ over 5000 AB the same as an A shortstop in 1885 with a 120 OPS+ over 5000 AB. After all, the 1935 SS is actually a BETTER fielder than the 1885 one -- not just because of better equipment, but presumably because of improved technique. However, WARP won't let me do so, and in many ways, it's right.
   99. karlmagnus Posted: April 25, 2005 at 05:14 PM (#1286265)
Incidentally, the real beneficiaries if we decide to take as real WARP's declining differential between good and bad fielders should surely be Lave Cross and Deacon McGuire rather than Jennings, because their fielding advantage was received in so many more games.
   100. Paul Wendt Posted: April 27, 2005 at 06:28 PM (#1292953)
jimd:
> Symptom of what underlying flaw?

The split between fielding and pitching has problems,


That isn't underlying very deep!
Do you think the split between fielding and pitching should be determined analytically for each season?

andrew siegel:
Here's a question about WARP: fielding replacement level seems to move over time, creeping closer to average as we move forward in time. Therefore, for fulltime 1890's SS's, the baseline appears to be something like 40-50 runs below average; for 1920 SS's, something like 30-35 runs below average; for 1950s SS about 25-30 runs below average; and for modern SS about 20 runs below average.

What do we think of those assumptions?


Is it an assumption or a finding? A finding, I think. In effect, the significance of fielding is determined analytically in each season.