Hall of Merit— A Look at Baseball's All-Time Best
Monday, February 05, 2007
Reader Comments and Retorts
Seriously, we appreciate you taking the time to create this and allowing us here at the HoM to examine it.
Here is the text of the methodology explanation included in the .zip file.
With some trepidation, I’ve decided to leap headfirst into the überstat wars. The problems with both BP WARP and particularly Win Shares have been well-documented, above all regarding their replacement level definitions and opaque or nonexistent timelines. I’m here to offer my own system, which I immodestly believe to be *far* superior to either available system (although I do leech off of them for defense). I’ll post a detailed account of the methodology here, and I have the data available in spreadsheet form. I hope many of you will find this useful for the 1994 and subsequent elections. I look forward to discussions on this thread either about my approach, or about conclusions one might draw from the data and how they might alter voters’ ballots.
I only have data for non-catcher NL starters from 1893-2005, but I will hopefully be able to post estimates for pitchers, NL catchers, and all AL players who are receiving HoM consideration in time for the 1995 ballot.
Methodology
Step 1: Wins above league average
Offense
1. Using Extrapolated Runs (1947-2005) or BaseRuns (1893-1946), find out how many runs a player created.
2. Subtract the player’s batting outs from the average batting outs per team for that league-season to determine the outs left over for teammates on a theoretical average team.
3. Multiply the remaining outs by the league runs scored per out, and add the player’s runs created, to get theoretical team runs scored.
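The three offensive steps above boil down to one line of arithmetic. Here is a minimal sketch in Python; the function name and the sample inputs are made up for illustration and are not taken from the spreadsheet.

```python
def theoretical_team_runs(player_runs_created, player_outs,
                          league_outs_per_team, league_runs_per_out):
    """Runs scored by an otherwise league-average team with this player on it."""
    # Step 2: outs left over for the player's (league-average) teammates.
    remaining_outs = league_outs_per_team - player_outs
    # Step 3: teammates score at the league rate; the player adds his own runs.
    return remaining_outs * league_runs_per_out + player_runs_created


# Illustrative inputs only.
runs = theoretical_team_runs(player_runs_created=120.0, player_outs=400,
                             league_outs_per_team=4100, league_runs_per_out=0.17)
print(round(runs, 1))  # 3700 * 0.17 + 120 = 749.0
```

The runs-created figure itself would come from Extrapolated Runs or BaseRuns as described in step 1; that part is not reproduced here.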
Defense
1. Input raw Fielding Win Shares (FWS) and BP FRAA (1893-1999), Chris Dial’s Zone Rating RSpt (1987-2005), UZR (2000-2003), and Fielding Bible +/- (2003-2005) for every starter (led team in PA at the position) in the league.
2. Calculate the league average FWS per full season at each position for each season. Multiply this by each player-season’s percentage of the season played, and subtract the product from each player-season’s FWS, to get FWS above/below average. Divide by three, and multiply by the league-season’s runs per marginal win (equal to 3.32*((runs per game)^.7103)), to get Win Shares Fielding Runs Above Average (WS-FRAA).
3. Calculate the standard deviation (stdev) of RSpt per full season for each position for the following time periods: 1987-1999, 2000-2003, and 2003-2005. Calculate the stdev of BP FRAA and WS-FRAA per full season for each position from 1987-1999, the stdev of UZR per full season from 2000-2003, and the stdev of +/- from 2003-2005.
4. Multiply each player-season’s BP FRAA and WS-FRAA by the ratio of the RSpt stdev from 1987-1999 to the respective BP FRAA and WS-FRAA stdev for the same time period. Multiply each player-season’s UZR by the ratio of the RSpt stdev from 2000-2003 to the UZR stdev for the same time period. Multiply each player-season’s +/- by the ratio of the RSpt stdev from 2003-2005 to the +/- stdev for the same time period. This standardizes the stdevs for all the different defensive metrics (to a level less than BP FRAA but higher than WS).
5. Average the modified BP FRAA and WS-FRAA scores for each player-season from 1893-1986. Take a weighted average (40% RSpt, 30% modified BP FRAA, 30% modified WS-FRAA) for each player-season from 1987-1999. Take a weighted average (70% modified UZR, 30% RSpt) for each player-season from 2000-2002. Take a weighted average (55% modified UZR, 30% modified +/-, 15% RSpt) for each player-season from 2003. Take a weighted average (65% modified +/-, 35% RSpt) for each player-season from 2004-05. This is the player’s Fielding Runs Above Average (Rosenheck FRAA).
6. For the 1893-1918 period, when LF was more important and difficult than RF, add 2 runs per season to LF and subtract 2 per season from RF.
7. Subtract the player’s Rosenheck FRAA from the league average runs scored per team. This is theoretical team runs allowed.
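For anyone who wants to follow the defensive steps outside of Excel, here is a rough Python sketch of steps 2 and 4 through 7. The function names and the dictionary keys ("bp", "ws", "rspt", "uzr", "plus_minus") are hypothetical labels, the standard deviations have to be supplied by the caller, and the LF/RF adjustment is applied as a flat per-season figure because the post does not say how partial seasons are handled.

```python
def runs_per_marginal_win(runs_per_game):
    """Runs per marginal win, using the formula quoted in step 2."""
    return 3.32 * runs_per_game ** 0.7103


def ws_fraa(fws, league_avg_fws_full_season, season_frac, runs_per_game):
    """Step 2: turn raw Fielding Win Shares into runs above average."""
    fws_above_avg = fws - league_avg_fws_full_season * season_frac
    # Three Win Shares equal one win, hence the division by three.
    return fws_above_avg / 3.0 * runs_per_marginal_win(runs_per_game)


def rescale(value, rspt_stdev, metric_stdev):
    """Step 4: put a metric on the RSpt scale by matching standard deviations."""
    return value * rspt_stdev / metric_stdev


def blend_weights(year):
    """Step 5 weights; the keys are hypothetical labels for each metric."""
    if 1893 <= year <= 1986:
        return {"bp": 0.5, "ws": 0.5}
    if 1987 <= year <= 1999:
        return {"rspt": 0.40, "bp": 0.30, "ws": 0.30}
    if 2000 <= year <= 2002:
        return {"uzr": 0.70, "rspt": 0.30}
    if year == 2003:
        return {"uzr": 0.55, "plus_minus": 0.30, "rspt": 0.15}
    if 2004 <= year <= 2005:
        return {"plus_minus": 0.65, "rspt": 0.35}
    raise ValueError("no weights defined for %d" % year)


def rosenheck_fraa(year, position, rescaled):
    """Steps 5-6: blend already-rescaled metrics, then apply the LF/RF tweak."""
    fraa = sum(w * rescaled[name] for name, w in blend_weights(year).items())
    if 1893 <= year <= 1918:
        fraa += {"LF": 2.0, "RF": -2.0}.get(position, 0.0)
    return fraa


def theoretical_runs_allowed(league_runs_per_team, fraa):
    """Step 7: an average team's runs allowed, improved by the fielder's FRAA."""
    return league_runs_per_team - fraa


# Illustrative inputs only.
fraa = rosenheck_fraa(1990, "SS", {"rspt": 10.0, "bp": 6.0, "ws": 4.0})
print(round(fraa, 1))  # 0.4*10 + 0.3*6 + 0.3*4 = 7.0
```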
Record
1. Input the theoretical team runs scored and runs allowed into the Pythagorean formula (exponent = ((RS+RA)/G)^.285) to get a winning percentage. Multiply this by 162 to get theoretical team wins (W1). This is a straight-line season-length adjustment.
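As a sketch of the Record step: the variable-exponent version of the Pythagorean formula described above (a form often called Pythagenpat) can be written as follows. The function and parameter names are my own; `games` is assumed to be the number of games over which the theoretical runs were scored and allowed.

```python
def pythagenpat_wins(runs_scored, runs_allowed, games, target_games=162):
    """Theoretical team wins (W1), scaled to a 162-game season."""
    # The exponent floats with the run environment: ((RS + RA) / G) ^ .285
    exponent = ((runs_scored + runs_allowed) / games) ** 0.285
    rs_pow = runs_scored ** exponent
    ra_pow = runs_allowed ** exponent
    win_pct = rs_pow / (rs_pow + ra_pow)
    # Straight-line season-length adjustment.
    return win_pct * target_games


# A team that scores exactly as many runs as it allows comes out at 81 wins.
print(pythagenpat_wins(700, 700, 154))  # 81.0
```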
Step 2: League adjustment
1. Conduct a multiple regression analysis on the stdev of W1 above average per season from 1893-2005. (For those who are interested: the formula is .00366 * Year + .1254 * Runs per game - .028777 * NL Teams - .00567 * MLB Teams - .00256 * Season Length - .932 * Win% of worst team in league - .0278 * Years since expansion (Max 12) + .15 * World War (1 or 0) + .00158 * Estimate of player population (14.8M in 1893, 60.1M in 2005) - .2466 * Integration (1 or 0) + 2.789, and the r^2 is .5766. The graph is the second tab of the Excel spreadsheet in the file.)
2. Use the regression equation to get a projected stdev for each league-season.
3. Regress each player-season’s W1 to 81 using the following equation, where Reg = (2005 stdev) / (projected stdev of the league-season in question): W2 = (81*(1-Reg)) + (W1*Reg).
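Here is a small sketch of the regression-to-81 step only. It does not reimplement the stdev regression from step 1, since the post does not spell out how each input is encoded; the projected stdev is simply passed in. The 2.30 and 3.25 figures are the projected standard deviations for the 2005 NL and 1895 NL quoted elsewhere in this thread, and the function name is made up.

```python
STDEV_2005_NL = 2.30  # projected 2005 NL stdev, as quoted in the thread


def regress_to_81(w1_wins, projected_stdev, reference_stdev=STDEV_2005_NL):
    """Shrink theoretical team wins (W1) toward 81 to get W2."""
    reg = reference_stdev / projected_stdev
    return 81.0 * (1.0 - reg) + w1_wins * reg


# A team 7.7 wins above average in a league with a 3.25 projected stdev:
w2 = regress_to_81(81.0 + 7.7, projected_stdev=3.25)
print(round(w2 - 81.0, 1))  # 7.7 * 2.30 / 3.25 = about 5.4 wins above average
```

Because the shrinkage is linear around 81, the same ratio can be applied directly to wins above average, which is what the LeagueAdj column in the spreadsheet appears to do.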
Step 3: Replacement level
1. Calculate W2 above/below average per full season played for every starter.
2. Average the W2 below average for the worst three starters at each position in the league for every league-season.
3. Average these worst-three-regulars averages at each position from 1985-2005.
4. Subtract the worst-three-regulars average from 1985-2005 from Nate Silver’s empirically determined Freely Available Talent (FAT) replacement levels for each position for the same time period.
5. Add the difference to the worst-three-regulars average at each position for each year. This is the FAT level at each position for each season.
6. Take a nine-year moving average of the FAT level for each position over time. This is the replacement level (measured in wins below average per full season). A graph of positional replacement levels over time is the third tab of the Excel spreadsheet.
7. For each player-season, multiply the replacement level by his fraction of the season played, and subtract the product from his W2. This is WARP2.
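The replacement-level steps can be sketched for a single position like this. All names are hypothetical, the Nate Silver FAT level has to be supplied by the caller, and the nine-year moving average is assumed to be centered with a shorter window at the ends of the series; the post does not say how the edges are treated.

```python
from statistics import mean


def worst_three_average(w2aa_per_full_season):
    """Step 2: average W2 above average of the three worst starters."""
    return mean(sorted(w2aa_per_full_season)[:3])


def fat_levels(worst3_by_year, silver_fat_level, ref_years=range(1985, 2006)):
    """Steps 3-5: shift every year's worst-three average by the 1985-2005 gap."""
    ref_avg = mean([worst3_by_year[y] for y in ref_years])
    offset = silver_fat_level - ref_avg
    return {year: level + offset for year, level in worst3_by_year.items()}


def moving_average(levels_by_year, window=9):
    """Step 6: centered moving average (assumed), truncated at the edges."""
    half = window // 2
    years = sorted(levels_by_year)
    return {
        y: mean([levels_by_year[x] for x in years if abs(x - y) <= half])
        for y in years
    }


def warp2(w2_above_avg, replacement_level, season_frac):
    """Step 7: replacement level is negative (wins below average per season)."""
    return w2_above_avg - replacement_level * season_frac


# Medwick 1937 from the spreadsheet: W2AA 6.5, RepW/Yr -0.5, SFrac 1.04.
print(round(warp2(6.5, -0.5, 1.04), 1))  # 7.0
```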
Step 4: Salary estimator
1. Divide a player’s WARP2 by his fraction of the season played (measured by % of the average PA per lineup slot for that year) to get WARP2 per full season.
2. Use Nate Silver’s 2005 salary estimator ($212,730*WARP^2 + $402,530*WARP) to find out how much the player would have earned on the 2005 market had he played a full season. Convert all negative numbers to $0.
3. Multiply this by the player’s fraction of the season played to find out how much he would have earned for that season.
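A minimal sketch of the salary estimator in Python, using the two coefficients quoted above. One assumption of mine: "convert all negative numbers to $0" is read as covering negative WARP rates as well as negative dollar figures, since the quadratic would otherwise turn positive again for very bad seasons.

```python
def market_salary(warp2, season_frac):
    """Estimated 2005-market pay for one player-season."""
    # Step 1: WARP2 per full season.
    rate = warp2 / season_frac
    # Step 2: Nate Silver's 2005 estimator, floored at $0 (see assumption above).
    if rate <= 0:
        full_season = 0.0
    else:
        full_season = 212_730 * rate ** 2 + 402_530 * rate
    # Step 3: prorate by playing time.
    return full_season * season_frac


print(market_salary(1.0, 1.0))  # 615260.0
```

Because the rate term is squared while playing time only enters linearly, two seasons with the same WARP2 are not worth the same: the one compressed into fewer plate appearances earns more.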
Notes
Step 1: The offensive methodology is fairly straightforward, and quite similar to most other approaches. Note that BP’s BRAA are standardized to a 4.5 runs per game league. The FRAA number is simply a weighted average of the best defensive statistics available to us, but with the standard deviations equalized so that one doesn’t count for more than another.
Step 2: Using a regression-predicted stdev rather than an actual stdev should account for all factors that determine stdev EXCEPT for changes in *concentration* of talent (i.e., very many or very few great players in the league at a given time) and random, meaningless fluctuation (no NL player happened to have a big year in 1995, lots did in 2001). If you did real standard deviations, Zack Wheat would probably look like Tris Speaker, since they were both the premier hitters of their era in their respective leagues. By using projected stdev, we can factor out things like integration and expansion that affect the spread of performance in the league while still giving actual talent its due.
And NOTA BENE: DO NOT CONFUSE THIS WITH EITHER A TIMELINE ADJUSTMENT OR A LEAGUE DIFFICULTY ADJUSTMENT. IT IS NEITHER. To be clear, the 1993 NL is regressed MORE than the 1914 NL. Anyone who thinks that the level of play was higher in 1914 than 1993 is crazy. What this corrects for—and NOTHING ELSE—is what is often colloquially called on this site the “ease of domination,” when people talk about it being “easier” to “accumulate win shares or WARP1” in certain years than in others. If you want to timeline or adjust for league difficulty, you still need to make those adjustments—this does NOT account for them.
Step 3: As far as I’m concerned, Nate Silver’s FAT research is the last word on replacement level. Using a percentage of positional or league average is silly: the former suggests that the presence of, say, A-Rod, Jeter, Nomar, and Tejada makes a replacement player (Neifi Pérez, Pat Meares) better than they otherwise would be; the latter is incapable of capturing changes in the relative depth of positions over time. By contrast, the worst-three-regulars average (adjusted for the gap between it and the FAT level) should always track the real empirical replacement level, since they are the most likely to be close to actual replacement players.
Step 4: The salary estimator concept, which I should give credit to David Zylberberg (‘zop on BTF) for coming up with originally, seems to me to be the ideal way to combine the value of career and peak, durability and rate. “What would the market pay for this player-season?,” a University of Chicago economist would ask. Here’s the answer.
1. Is it any coincidence that two of the last three times the single season home run record was set (1961 and 1998) were expansion years? The stdev-regression equation finds that years since expansion (combined with the typically low worst-team win% in expansion years) has one of the strongest correlations to league stdev, so expansion seasons (Willie Mays' and Frank Robinson's 1962, Willie McCovey's 1969-70, Barry Bonds' 1993 and Bagwell's 1994, and McGwire's 1998) tend to get docked pretty harshly in the W1-W2 adjustment. I think this is fair--we tend to not discount big seasons by saying "it was an expansion year" the same way we do by saying "it was 1930 or 1894." I think we should.
2. Most replacement levels seem pretty stable, but the one that bounces all over the place is 2B. Rather than simply "switching places with 3B before 1920," it seems to be around its historical average in the 1890s, SOARS up to around corner outfielder level around 1910 (ouch, Larry Doyle--if it's the same in the NL then Lajoie and Collins are somewhat overrated), then falls back to a historical norm for the 20s through the 70s, then ZOOMS up again in the 1980s--sorry, Ryno--before returning again to its long term average by the late 1990s. I know (I think) what accounts for the deadball spike (although not for why 2B wasn't so strong in the 1890s as well)--but what happened in the 1980's? The major increase in NL 2B production is reflected in the positional averages as well, I believe.
I haven't dug much into the individual player-season data, but I'll start poking around now. (It took forever just to derive it). I'll get cracking on estimates for AL players, pitchers, and NL catchers, but they will be much less accurate than these numbers, unfortunately--it takes a *long* time to crunch this (I enter in all the FRAA, FWS etc. by hand! If you've got a faster way to do that, let me know!)
It's worth reiterating that these replacement levels are *not* measures of the overall strength of a position--they have nothing to do with the performance of the best players at a position. They are only a measure of the *depth* of the position, what the freely available talent level is.
I opened up your spreadsheet, BTW...do you have player totals somewhere?
I don't, but it's as easy as sorting by player name and hitting AutoSum...just be careful for missing partial seasons, since I only did starters (I'm trying to fill in partial seasons for HoM candidates, but there's lots of stuff to do!).
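If you would rather do the sort-and-AutoSum step outside of Excel, something like the sketch below would work on rows exported from the spreadsheet (for example via `csv.DictReader`). The column names "Name" and "WARP2" are assumptions about how the export is labeled, and the same caveat applies: seasons missing from the data are simply not counted.

```python
from collections import defaultdict


def career_totals(rows, value_key="WARP2", name_key="Name"):
    """Sum one numeric column per player across all of his listed seasons."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[name_key]] += float(row[value_key])
    return dict(totals)


# Three Medwick seasons and two Kiner seasons from the tables in this thread.
rows = [
    {"Name": "Medwick", "WARP2": "7.0"},
    {"Name": "Medwick", "WARP2": "5.2"},
    {"Name": "Medwick", "WARP2": "5.0"},
    {"Name": "Kiner", "WARP2": "7.0"},
    {"Name": "Kiner", "WARP2": "6.6"},
]
totals = career_totals(rows)
print(round(totals["Medwick"], 1), round(totals["Kiner"], 1))  # 17.2 13.6
```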
Sigh ... as well as every other year of his entire career ...
Gosh, to me the 1961 AL always stuck out as a crazy stdev outlier.....
Cash, Mantle, Maris, Howard...
I couldn't agree more. One of my biggest problems with previously existing player evaluation systems is that they don't appropriately account for expansion. It dilutes the league and brings down the average and replacement levels against which players are judged. Expansion is largely responsible for the fact that most systems have disproportionate numbers of players who peaked in the 60s/early 70s and 90s/early 00s toward the top of the rankings, with far lower representation from the mid 70s through the 80s. I've only worked with OPS+ and ERA+ data on this issue, but my research has shown that expansion causes a bump in both of these statistics in terms of the league top 10s for about 5 years before returning to their previous levels. What some may find counterintuitive is that OPS+ and ERA+ always move in tandem at the highest levels, rather than the league-leading hitters improving at the expense of the pitchers or vice versa. The top values appear to be much more influenced by changes in the average level, chiefly through expansion, than by changes in the players occupying the top spots on the leaderboard. I'm interested to see how the player rankings differ from other evaluation systems by incorporating this factor into the model.
I am always stupid in these conversations, so feel free to point out why :)
Bob Dernier Cri, you are right that expansion lowers the absolute empirical replacement level relative to average (and in early versions of this system, guys from the 8-team leagues came out poorly since replacement was so close to average). But my approach here is to first standardize all the wins-above/below-average scores for every player to the projected standard deviation of the league, putting all seasons (regardless of expansion, league size, and every other factor under the sun) on an equal footing. THEN I look at the worst three regulars and calculate replacement level. So replacement level is really tied to a z-score--say, 2 standard deviations below the mean (I have no idea what the real number is)--rather than an absolute figure.
My point about expansion was that since those seasons tend to have much higher standard deviations, they get regressed more than other seasons.
Now let's compare him to, say, 1895 Jesse Burkett. After straight-line-adjusting for season length, Burkett was 6.8 wins above average with the bat and 0.9 wins above average with the glove, for 7.7 wins above average total. He too had an above-average PA total, so he was 7.1 wins above average per season. A replacement LF for 1895 would be someone like Tommy McCarthy, who was -0.6 wins per season with the bat and -0.9 wins per season with the glove, for -1.5 wins per season total. The projected standard deviation for the 1895 NL is 3.25 (the real one was 3.27), so Burkett was (7.1-(-1.5))/3.25 = 2.65 standard deviations per season better than McCarthy. Readjust for Burkett's high PA total, and he was 2.89 stdevs above replacement.
Thus, Burkett's raw .409/.486/.524 line obviously dwarfs Concepción's modest .281/.348/.415, his 154 OPS+ is still far superior to Concepción's 107, and his 9.4 wins above replacement exceed Concepción's 6.7. But once you adjust for the "ease of domination" of the 1895 NL, the two seasons were equally valuable: both made a 2.9-standard-deviation-above-replacement contribution to a pennant. In the 2005 NL, with a 2.30 projected standard deviation, a 2.9 stdev-above-replacement contribution is worth 2.3*2.9 = 6.7 WARP.
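The arithmetic in the Burkett example can be checked with a few lines of Python. This is only a sketch with a made-up function name; the readjustment for Burkett's playing time (2.65 to 2.89) is not reproduced because his exact fraction of the season is not given in the post.

```python
def stdevs_above_replacement(waa_per_season, rep_waa_per_season, projected_stdev):
    """Gap between a player and a replacement, in league standard deviations."""
    return (waa_per_season - rep_waa_per_season) / projected_stdev


# Burkett (+7.1 per season) vs. a McCarthy-level replacement (-1.5), 1895 NL.
rate = stdevs_above_replacement(7.1, -1.5, 3.25)
print(round(rate, 2))          # 2.65

# 2.89 stdevs in the 1895 NL (stdev 3.25) is the unadjusted 9.4 wins ...
print(round(2.89 * 3.25, 1))   # 9.4
# ... while a 2.9-stdev season restated on the 2005 NL scale (stdev 2.30) is 6.7.
print(round(2.9 * 2.30, 1))    # 6.7
```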
I hope this example makes clear what this statistic measures, and why I feel it can add to the discussion and perhaps change people's opinions as we move into the final ballots.
Although I recognize that your system is limited at this time, one thing I'd really like to see is a DRpHOM-not-HOM list. In other words, I'd like to know what guys your system would put in and leave out, among the players we've considered that you currently have data for. You could just assume the NgLs and ALs and NL catchers are the same for now, but show us what guys would be different via your system than by HOM consensus. It would just help a bit in understanding the results your system offers.
I appreciate your efforts to adjust for league strength. I'm skeptical that the formula isn't just randomly throwing various quality-of-play factors together, but it's an honest attempt.
The bigger problem for me is that I believe we have no idea how league strength in, say, 1895 compares to 1995. I am personally a believer that league strength is vastly higher in recent years than in the early years of baseball, but I have no idea how much, and frankly, I don't think anyone else does either. Taking expansion into account makes sense when comparing the results of 1961 with 1960, but makes no sense to me when comparing 1993 and 1893.
It's not just that the potential population that MLB draws from is much higher in 1993 than 1893, it's that we don't have the foggiest idea what that population really is. Talent aside, not every 25-year-old male has the potential to play major league ball no matter how good that player is, and it's not just because of segregation and other similar issues. In 1893, how many young people were playing baseball often enough to get good enough to play in the Majors? How many were playing some type of organized baseball? Of those, how many had the potential to be noticed by scouts? How many played for some type of "minor league" team, despite the fact that they were good enough to play in the National League? I have no idea, but my guess is that the real population that MLB was drawing from is much, much smaller than what is estimated in the formula. Not only does modern Major League Baseball draw from a much, much larger population (young males in US + young males in other countries), but there is a much more organized system for funneling the best players at each level to the major leagues.
So what do we do about it? I'm not a believer in using a timeline - not because I don't think the level of competition is higher today, but because I don't think there is any realistic way of measuring the difference. However, if we are not going to adjust for the quality of play over time, I also don't think we should be adjusting for minor differences in year-to-year play either, such as are caused by expansion.
I am aware that people have made attempts to measure differences in quality of play over time, but I have never agreed with the methodologies. I know many, if not most, people on BTF disagree with me on some of these points, but it's my 2 cents.
Thanks for your comments. I'm afraid I can't be clear enough about this (which is why I put it in bold in the methodology description)--I am not attempting to adjust for league strength or quality of play!!!
People conflate standard deviations and quality of play, I imagine due to Stephen Jay Gould's article, but they are by no means the same thing. To repeat, I regress the 1993 NL more than the 1914 NL--this is not a league "strength" adjustment. It's a league standard deviation adjustment, nothing more and nothing less. All I am doing is measuring the spread of performance in a league and standardizing it across eras, so that a two-stdev contribution to a pennant in 1893 is worth the same as a two-stdev contribution to a pennant in 2005. As I said, if you want to adjust for quality of play or timeline, you have to make that adjustment yourself to my WARP2 numbers. The projected standard deviation of the 1914 NL was the same as that of the 1993 NL, but clearly the level of play was higher in 1993, and I'm not accounting for that. I personally am against timelining. I am just trying to be fair to all eras, which requires looking at the distribution of performance in each season. 10 WARP in 1893 "bought" fewer pennants than they do in 2005, because the stdev was higher. That's true no matter whether 1893 was a particularly strong or weak league in terms of absolute quality of competition.
Dan, I see that your system seems to love 70's-80's sluggers like Mike Schmidt, Pedro Guerrero, Jack Clark, and Dale Murphy. Could you explain why?
</straightman>
Name Career Salary
Barry Bonds $355,075,512
Honus Wagner $301,940,127
Willie Mays $240,268,011
Rogers Hornsby $231,526,395
Stan Musial $210,776,870
Hank Aaron $210,319,344
Mike Schmidt $209,650,472
Joe Morgan $181,519,676
Frank Robinson $160,375,579
Mel Ott $154,068,892
Arky Vaughan $152,338,635
INNER CIRCLE--------------------
Barry Larkin $145,872,934
Ozzie Smith $129,998,781
Jeff Bagwell $128,092,272
Gary Sheffield $123,033,450
Bill Dahlen $119,772,919
Ed Delahanty $117,382,556
George Davis $113,291,040
Tim Raines $111,020,960
Eddie Mathews $110,901,266
Billy Hamilton $110,692,353
Tony Gwynn $110,436,526
Johnny Mize $104,493,744
Roberto Clemente $102,761,223
Paul Waner $101,946,007
Jackie Robinson $98,237,421 (no Negro League credit)
Fred Clarke $97,367,980
Larry Walker $96,967,219
Pee Wee Reese $96,496,818
Pete Rose $96,047,274
Dave Concepción $94,666,913
Frankie Frisch $92,862,321
Jim Edmonds $91,590,321
Hughie Jennings $91,588,914
Dick Allen $90,960,121
Ron Santo $90,817,848
Jesse Burkett $90,105,659
Chipper Jones $89,464,583
John McGraw $89,410,662
Sammy Sosa $87,128,960
Ernie Banks $86,323,002
Darrell Evans $84,203,505
Scott Rolen $83,447,841
Joe Kelley $82,819,556
Duke Snider $82,220,482
Billy Williams $81,748,209
Jimmy Sheckard $81,421,173
Reggie Smith $80,330,007
Ron Cey $79,950,646
Heinie Groh $79,794,093
Willie Stargell $79,323,100
Willie McCovey $78,779,130
Ryne Sandberg $78,449,070
Albert Pujols $78,360,169 (no 2006!)
Enos Slaughter $77,129,289
Vladimir Guerrero $76,795,704
TOP OF BORDERLINE---------------
Will Clark $75,655,086
Cupid Childs $75,028,557
Sherry Magee $74,905,275
Brian Giles $74,785,122
Zack Wheat $73,823,975
Billy Herman $73,634,666
Dale Murphy $73,023,171
Willie Keeler $72,632,593
Max Carey $72,202,231
Luis González $71,945,402
Jim Wynn $71,473,562
Jeff Kent $71,029,423
BOTTOM OF BORDERLINE------------
Pedro Guerrero $69,945,660
Craig Biggio $69,804,540
Dave Bancroft $69,791,255
Fred McGriff $68,297,428
Jack Clark $68,281,910
George Foster $67,141,524
Joe Tinker $67,016,210
Stan Hack $66,777,214
Keith Hernández $66,654,616
Andre Dawson $66,071,050
Art Fletcher $66,009,607
Bobby Bonds $65,228,399
Kiki Cuyler $65,179,605
Eric Davis $64,153,929
Rabbit Maranville $63,869,749
Edd Roush $63,190,215
José Cruz Sr. $63,170,898
Joe Medwick $63,124,835
Ken Boyer $63,070,276
Ken Caminiti $62,564,785
Chuck Klein $60,807,729
Cesar Cedeño $60,623,540
Ralph Kiner $60,580,538
Jake Beckley $60,408,747
Bobby Abreu $60,231,369
Richie Ashburn $60,024,942
Tommy Leach $59,731,485
George Burns $59,294,919
George Van Haltren $56,923,002
Hugh Duffy $56,204,790
Bob Elliott $54,701,989
Rusty Staub $54,127,458
Frank Chance $51,290,002
Tony Pérez $51,120,246
Pie Traynor $49,824,287
Bill Terry $47,695,966
Gavvy Cravath $44,893,823 (no minor league credit)
Orlando Cepeda $40,369,597
Bill Mazeroski $37,847,348
Larry Doyle $36,011,518
Guys who I think should clearly be in the HoM and are not:
Dave Concepción
John McGraw
Reggie Smith
Ron Cey
Guys who I think are HoM mistakes:
Stan Hack (particularly since my system doesn't penalize him for wartime competition)
Joe Medwick
Ken Boyer
Ralph Kiner
Richie Ashburn
Of the borderliners, my instinct is to pick the old-time guys (Childs, Magee, Wheat, Herman, and Carey, probably not Keeler) and leave out the more recent ones (W. Clark, Murphy, L. González, Wynn, Kent). B. Giles is clearly in for me given that he hasn't retired yet.
For ease of comparison of Dan's list to the current HoM roster:
By my count, we have actually elected 48 players from among the group from which this list is drawn: National League players who played the bulk of their careers after 1893 and who were neither pitchers nor catchers.
The 48th eligible or elected player on Dan's list is Dave Bancroft. So assuming that our total of 48 elected from this pool is right, then Dan's system finds that we should have elected
Concepcion, McGraw, Smith, Cey, Wynn, and Bancroft
in place of
Hack, Medwick, Boyer, Kiner, Ashburn, and Terry.
Some of the differences between the lists could be artifacts of the election schedule and not "mistakes" -- i.e. we elected the best player available at the time, but vagaries in the supply of talent let one in but kept another out. Others are surely genuine disagreements about value. We've had plenty of chances to elect McGraw and Bancroft but we haven't.
Did you give Kiner any war credit, Dan?
Well, one theory is that Terry benefited from being a Shiny New Toy in a weak year. The fact that Terry slipped in so easily has caused the electorate to be more cautious with borderline new candidates.
By position
Using the groupings Dan provided I made a chart of which positions were represented in the chart and where. I assigned the positions myself, which can be arbitrary as you know. Leach is a 3B and Rose is 2B, etc etc etc....
              1B  2B  3B  SS  LF  CF  RF  TOTAL
-----------------------------------------------
INNER CIRCLE   0   2   1   2   2   1   3     11
HOMERS         5   4   8   8   8   3   9     45
UPPER BORDER   1   3   0   0   3   3   2     12
LOWER BORDER   9   3   6   4   5   7   6     40
===============================================
TOTAL         15  12  15  14  18  14  20    108
I'll be happy to answer any questions about who is at what position, if anyone wants to know.
By decade
Granted decades are arbitrary endpoints and it's often tough to know exactly which decade to put somebody in but I forged ahead. Again, the groupings are Dan's but the decade assignments are mine.
              1890s 1900s 1910s 1920s 1930s 1940s 1950s 1960s 1970s 1980s 1990s 2000s TOTAL
-------------------------------------------------------------------------------------------
INNER CIRCLE      0     1     0     1     2     1     0     3     1     1     0     1    11
HOMERS            8     2     1     1     1     3     4     6     5     4     5     5    45
UPPER BORDER      2     1     2     0     1     0     0     1     0     1     1     3    12
LOWER BORDER      3     3     6     3     4     1     2     3     5     6     3     1    40
===========================================================================================
TOTAL            13     7     9     5     8     5     6    13    11    12     9    10   108
In this chart in particular, it's worth noting that the blurring of a player's career between decades may make some gaps look bigger than they may actually be. I'll happily elaborate on who is in what group if anyone would find that information helpful.
Anyway, like I said, I'm not offering any judgment, but I thought this might provide the group with some interesting information.
My McGraw pick has everything to do with the salary estimator and very little to do with the WARP system. McGraw played at such a high rate that even after adjusting for the high standard deviation of the time period, he looks like a dominant player--I have his 1899 as the 5th most valuable season between 1893 and 1946 (after Wagner '07-'08, Hornsby '24, and Jennings '96). Because the salary estimator rewards rate exponentially (versus playing time linearly), extremely high peak rate seasons are counted as being, well, extremely valuable--this is by design. But fully 31% of McGraw's value, in my book, is the 1899 season--a guy who had three of those and nothing else would be a Hall of Meriter in my book. If you look at career, McGraw's 43.6 WARP2 aren't much of a case (Jack Clark has over 47), and his best five seasons (31.4 WARP2) don't really stand out either (Pedro Guerrero has 31.3), and nobody on this board thinks Pedro Guerrero's peak with Jack Clark's career is a HoM'er. It's just because my salary estimator values rate so highly (which is how I like it) that McGraw comes out so well.
I've posted the data precisely so that people can dig into it and use it in their own systems and draw their own conclusions. This is just the raw information; how you value peak vs. career, and rate vs. durability, is up to you.
Consider, Doc Chaleeko, that the NL represents a varying proportion of the total HoM eligible players over time. For example, we'd expect a dramatic dropoff from the 1890's to the 1900's because of the rise of the AL and a concurrent dilution of talent. We'd expect a rise from the 50's through the 60's as African-American players enter the MLB game in large numbers.
I think when this is taken into consideration, the decade-breakdown looks nearly perfect.
These numbers definitely include my best estimates of AL seasons (and seasons played at catcher, for Frank Chance, and pitcher, for George Van Haltren).
Mark Shirk, that information is in the spreadsheet. But I'll post it here.
RC+: Runs produced per out, relative to league average.
SFrac: Percentage of season played, compared to the league average plate appearances per lineup spot.
W1AA: Wins above average.
WARP1: Wins above a replacement player at the same position.
LeagueAdj: Ratio of the 2005 NL standard deviation to the league's projected standard deviation (the Reg factor from Step 2; multiply W1AA by it to get W2AA).
W2AA: Wins above average, adjusted for standard deviation.
RepW/Yr: Wins below average of a replacement player at the same position per full season, adjusted for standard deviation.
WARP2: Wins above a replacement player at the same position, adjusted for standard deviation.
WARP2/Yr: Wins above a replacement player at the same position, adjusted for standard deviation, projected to a full season.
PennAdd: Pennants Added.
Market Salary: How much the 2005 market would have paid for that performance.
Medwick
<pre>
Year              RC+  FRAA/Yr  SFrac  W1AA  WARP1  LeagueAdj  W2AA  RepW/Yr  WARP2  WARP2/Yr  PennAdd       Salary
1937              205       -1   1.04   7.7    8.3       .847   6.5     -0.5    7.0       6.7     .100  $12,767,893
1936              169        4   1.02   5.8    6.3       .836   4.9     -0.4    5.2       5.1     .071   $7,848,027
1935              167        4   1.02   5.6    6.1       .808   4.5     -0.4    5.0       4.8     .066   $7,105,488
1941              154        9   0.88   4.5    5.1       .833   3.8     -0.6    4.3       4.8     .056   $6,127,671
1938              153        3   1.01   4.5    5.0       .828   3.7     -0.5    4.2       4.1     .054   $5,359,686
THREE YEAR TOTAL                       19.1   20.7             15.9            17.2               .237  $27,721,408
FIVE YEAR TOTAL                        28.1   30.8             23.3            25.6               .347  $39,208,765
</pre>
Kiner
<pre>
Year              RC+  FRAA/Yr  SFrac  W1AA  WARP1  LeagueAdj  W2AA  RepW/Yr  WARP2  WARP2/Yr  PennAdd       Salary
1951              219       -9   1.10   6.9    7.4       .943   6.5     -0.5    7.0       6.4     .100  $12,310,451
1949              211       -9   1.10   6.8    7.1       .937   6.4     -0.2    6.6       6.0     .093  $11,132,974
1947              193       -2   1.10   6.6    6.9       .939   6.2     -0.3    6.5       5.8     .090  $10,637,409
1948              159       11   1.12   5.8    6.2       .950   5.5     -0.3    5.9       5.3     .081   $8,976,516
1950              172      -11   1.12   4.3    4.7       .922   3.9     -0.4    4.4       3.9     .057   $5,343,011
THREE YEAR TOTAL                       20.3   21.4             19.1            20.1               .283  $34,080,833
FIVE YEAR TOTAL                        30.4   32.3             28.5            30.3               .422  $48,400,361
</pre>
DavidFoss, do you mean *complete* AL data (like this), or just estimates for guys on HoM ballots? The latter I hope to have by the 1995 election. The former is probably six months away--like I said, I do this by hand. If anyone's got a faster way (particularly to input WS and BP FRAA data into Excel), I'd love to hear it.
Right, Dan, and that's what I was trying to say above. That the blurring of decades makes it tough to draw particular conclusions about fairness to eras unless there's a really obvious skew somewhere.
You could look at # of HoM'ers playing in each year...
Yeah, complete data like this... OK, I understand these things are not automatic.
I've heard of programs that will scan a website like BP and dump data from its pages into tables. People sometimes do that in mid-season, and it would also be useful for BP's frequent "updates" of their WARP data. I don't know how to do it, though.
I do know that if you store the data in CSV format, it's quite a bit smaller and zips up nicer. I got the entire first sheet of your XLS file into a zip file that is only 218 KB.
Didn't think to stick it in .csv, I guess I'll redo that now for easier downloading.
ftp://ftp.baseballgraphs.com/winshares/
It's "WS only", with the common "playerid" labels for identifying players (e.g. "killeha01" for Killebrew) instead of the mechanism that you have.
Wow, that CSV file is a big help for me! I'm glad I helped you look. :-)
I've made some progress!
Turns out there is a program on Linux called "wget" that will copy a webpage to your local machine. A free Windows version is easy to find with a Google search (I got the one here).
You can feed wget an input file with a list of URLs. Since baseballprospectus URLs are simply www.baseballprospectus.com/dt/playerid.php, you can create a list of 16000+ URLs containing all the unique playerids in the baseballgraphs CSV file. It took about half an hour at work (might take longer at home), but I dumped all the BP webpages onto my local machine. It's about 300 MB. There were a couple of dozen pages that couldn't be found due to name disagreements (e.g. gwynnto02/gwynnan01), but that's a great success rate. With a good HTML parser, you can automatically parse those pages into CSV-style data.
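For anyone who wants to try the URL-list step at home, here is a minimal sketch in Python. It assumes the CSV has a column named "playerid" and that the URL pattern is exactly as described above with an http:// prefix; the sample rows (including the made-up ID "exampl01" and the placeholder ws values) are purely illustrative.

```python
import csv
import io

# Assumed pattern: the post gives www.baseballprospectus.com/dt/playerid.php,
# with "playerid" standing in for each player's ID. The http:// prefix is a guess.
URL_TEMPLATE = "http://www.baseballprospectus.com/dt/{playerid}.php"


def build_url_list(csv_file, id_column="playerid"):
    """Return one URL per unique player ID, in first-seen order."""
    seen = set()
    urls = []
    for row in csv.DictReader(csv_file):
        pid = row[id_column].strip()
        if pid and pid not in seen:
            seen.add(pid)
            urls.append(URL_TEMPLATE.format(playerid=pid))
    return urls


# Tiny stand-in for the Win Shares CSV (hypothetical columns, placeholder values).
sample = io.StringIO(
    "playerid,year,ws\n"
    "killeha01,1969,0\n"
    "killeha01,1970,0\n"
    "exampl01,1987,0\n"
)
urls = build_url_list(sample)
print("\n".join(urls))
```

Writing those lines to a text file (say urls.txt) and running `wget -i urls.txt` should then fetch each page; the name disagreements mentioned above would still show up as pages not found.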
Turns out my work has great in-house software for this. I'll see what I can do. (Though I'm supposed to be working :-)). There may be great freeware out there for that too, for those that want to try at home.
I think I might use VORP+FRAA as a foundation for running a hitter-type spreadsheet similar to my pitching one. I realize VORP only goes back to 1959 on the website (according to Neyer's article), so I'll have to come up with something else pre-1959, but I'll let you guys know how it works out . . . any deficiencies I should be made aware of before starting?
Dan, I think you explained to me that your WARPs are based on freely available talent (as originally defined and studied by Nate Silver, IIRC), whereas Woolner's VORP measures against backups, not FAT. What's the advantage of one over the other?
2. Use Nate Silver’s 2005 salary estimator ($212,730*WARP^2 + $402,530*WARP) to find out how much the player would have earned on the 2005 market had he played a full season. Convert all negative numbers to $0.
Why wouldn't zeroes be equivalent to the minimum major league salary? The minimum salary is paid to players regardless of performance. Alternatively, how about the AAA minimum since, presumably, the parent club bears some of the costs associated with a player it has farmed out?
Three reasons that I can think of right away.
1) The regression on which the formula is based does not produce that result, or
2) Silver acknowledges that WARP's replacement level is too low and accounts for it, or
3) A constant term that adds the MLB minimum salary to the formula has gotten lost somewhere
What's the MLB minimum as of 2005 anyway? 300K or has it gone up?
A WARP of 1.0 should get 615K as a full time player according to the estimator.
A WARP of 0.6 should get 318K as a full time player according to the estimator.
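Those two figures can be reproduced directly from the formula quoted above. A minimal sketch follows; the function name is made up, and treating "convert all negative numbers to $0" as "any non-positive WARP earns $0" is my interpretation rather than something the post states.

```python
def market_salary(warp):
    """Full-season 2005 market salary from the quoted quadratic estimator.

    The quadratic turns positive again below roughly -1.9 WARP, so this
    sketch zeroes out every non-positive WARP instead of only flooring
    negative dollar amounts (an interpretation, not confirmed by the post).
    """
    if warp <= 0:
        return 0.0
    return 212_730 * warp ** 2 + 402_530 * warp


for warp in (1.0, 0.6, -1.0):
    print(f"WARP {warp:+.1f}: ${market_salary(warp):,.0f}")
```

A WARP of 1.0 comes out at $615,260 and a WARP of 0.6 at about $318,101, matching the 615K and 318K quoted above.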
It's 11.8 MB (3.6 MB zipped). Where should I put it? Dan, what is your email address?
Good point about the league minimum salary, although I can't imagine it would change much. I'm just using Nate's equation (no, he doesn't pay me to do his P.R.) found at http://www.baseballprospectus.com/article.php?articleid=4535. The reason why there's no minimum salary in the formula is definitely not jimd's reason #2, because Silver is using the WARP found on the PECOTA cards, which has a realistic replacement level, not the crazy-low one on the DT cards. #3 seems to me the most likely. I could just email Nate and ask if people actually think it's important.
DavidFoss, you are a god...please send it to cooberp@gmail.com. Thanks a zillion.
Yes, I do account for value at different positions, AND the fact that replacement levels vary over time, AND that different leagues were easier to dominate. Also, I agree with Dan that variation is one reasonable proxy to use for ease of domination. Your league-wide regression, Dan, seems very well thought out, good use of dummy variables, and if I were to strap myself to a purely mathematical model I might choose yours over any other I’ve seen.
But. Let me tackle the topics of Replacement Level and Variation
Let’s say all MLB hitters were represented on a scale from 0 to 10.
In one league of 8 shortstops, 7 starting SS were hitters of value 0, 1, 2, 3, 4, 5, and 6 (ignore defense for now).
Dave Concepcion is a 6. The three worst guys were 0, 1, and 2, so replacement level = 1 (avg of those 3, using Dan’s method), and Davey would be +5.
But if a MLB manager had decided that his rotten-hitting SS (0) and mediocre-hitting (3) 2Bman should swap positions for defensive purposes, the 3 lowest shortstops would be 1, 2, 3, average of 2, so Davey is only +4.
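The two scenarios can be worked through in a few lines. This is only a sketch of the toy example as stated (replacement level taken as the average of the three worst starters); the helper name is hypothetical.

```python
def replacement_level(values, n_worst=3):
    """Average hitting value of the n_worst lowest-rated starters."""
    worst = sorted(values)[:n_worst]
    return sum(worst) / len(worst)


concepcion = 6

# Scenario 1: the starting shortstops as listed.
before_swap = [0, 1, 2, 3, 4, 5, 6]
# Scenario 2: the 0-hitter moves to 2B and the 3-hitting 2B takes over at SS.
after_swap = [1, 2, 3, 3, 4, 5, 6]

print(concepcion - replacement_level(before_swap))  # 5.0
print(concepcion - replacement_level(after_swap))   # 4.0
```

Nothing about Concepcion changes between the two lists; only who happens to be standing at shortstop does.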
I don’t think in scenario two Concepcion is, in the big picture, much less valuable. There is some fungibility among players. So, I prefer to take a bit broader brush of position-replacement-level, which puts me somewhere in the middle of the “value vs. ability” debate. I was an advocate of Joe Sewell, partly because shortstops hit so poorly when he played that he was valuable, but I also understand that maybe he was fortunate to come along at that time and place in history.
Variation:
I am not in favor of using the term “.932 * Win% of worst team in league” to measure league variation (ease of dominance). This has been discussed when the Dynasties book came out, and the authors used the variation in team runs scored and runs allowed to determine how many standard deviations one team was above the rest. While in a general sense this is a good idea, there are many strong reasons why this ain’t such a hot idea to strap yourself to. I’d rather not spend another 1000 words on this now.
So, if the 1970s guys played in a time when the worst team wasn’t TOO bad, I’ll account for that somewhat. I am one who places such guys as Joe Morgan, Mike Schmidt, and Tom Seaver higher on all-time lists than most other baseball nuts who rank such things. But I’m skeptical that this should be what seems to be a large boost; bottom line is, Dave C does not make my top 30 if I use Win Shares or WARP3 or OWP/RCAA/RCAP + defense. However, I am Very open to analyzing the reasons why he comes out ranked much higher by Dan’s methods. A very hearty welcome, Dan, and I look forward to more of your work and full-throttle debates.
Thanks very much for your thoughtful comments. Responses:
1. I don't think your hypothetical is realistic. Let's say the SS is 2.5 wins below average with the bat, and the 2B is 1 win below average with the bat, and both are league-average fielders at their positions. Why would the manager switch them?
I know I sound like a walking Nate Silver ad, but he wrote, "I suspect that major league shortstops, as a group, have quite a bit more general athletic ability than major league second basemen. You could put Miguel Tejada at second base if you wanted to, and he’d still outhit pretty much everyone at the position, but there’s no reason to since his defense is more valuable at short. You couldn’t put Jeff Kent at shortstop, however, without your pitching staff chipping in on a bounty against you."
So the SS might pick up 5 runs in the field, but the 2B would lose 15. Thus, you'd have the same offense but give up 10 more runs on defense. Why would a manager do that?
2. If I remove win% of worst team from the regression, I still get .5769 r^2, so it's having an effect of less than 1%. I could easily take it out with no consequences for the results. The late 70s and 80s had low standard deviations not because there were few historically awful teams, but because they were low-scoring integrated leagues many years removed from expansion in an era when the player population per team was quite high. The only reason I put in win % of worst team was because I thought 1899 guys like McGraw just got to beat up on the Spiders all day long. But the effect wound up being almost nonexistent.
3. Of course Win Shares won't like Concepción; it has no replacement level. WARP3 loves him, so I'm surprised he wouldn't rank high on that basis. I don't know how RCAP + defense is calculated, but if it compares Concepción to the replacement level shortstops *of his day*, he should do fairly well. I'm sure it doesn't take standard deviations into account, though, which would further strengthen his case.
What did you mean by "Of course Win Shares won't like Concepción; it has no replacement level"? WARP has a pretty low replacement level too.
In my first example, some guys play 2B or 3B who could play MLB shortstop, but already have a SS on their team (see: A Rod). Or, how about this: let’s say in 2007 the Brewers choose between playing Bill Hall in CF or at SS. If he plays SS and cranks out 35 HR while some rookie plays a poor CF, or if he plays CF and JJ Hardy has a poor year at short, does this really impact Jose Reyes’ value?
What is Concepcion's best year by your WARP score, Dan?
I determine replacement level by averaging 27 different player-seasons for each league-season-position. I'm assuming that situations like the hypothetical Bill Hall one you're describing cancel themselves out in a 27-player sample.
I have Concepción's best year as 1981 (straight line adjusting for season length). His 1974 is right behind.
Because I initially started posting this research in support of Concepción's candidacy (and now because of my handle), people only seem to be discussing this work in regards to him. There are 8,350 player-seasons in here! There are plenty of guys to pay attention to...Ron Cey and Reggie Smith also look like HoM'ers to me and will be on my ballot. Can we talk about them too? :)
Here are the charts, sorted from best season to worst. (Note that tweaks to the stdev regression have caused most career salaries to increase by about $5 million; changes in ranking are insignificant. And the AL data is necessarily less accurate because I'm using NL replacement levels (adjusted for the DH where necessary, as in Smith's 1973) and average Fielding Win Shares at each position).
Glossary
BWAA/Yr: Batting wins above league average per full season played.
BRWAA/Yr: Baserunning wins above league average per full season played.
FWAA/Yr: Fielding wins above positional average per full season played.
Rep: Wins above league average per full season played of a replacement player at the position.
WARP1/Yr: Wins above a replacement player at the position per full season played. This should theoretically equal BWAA/Yr + BRWAA/Yr + FWAA/Yr - Rep (Rep is itself a negative number), but often differs by a few tenths of a win due to Pythagorean effects.
SFrac: Percentage of the season played, as compared to the league average plate appearances per lineup slot.
WARP1: Wins above a replacement player at the position in the given season (WARP1/Yr * SFrac).
LgAdj: Ratio of the 2005 NL projected standard deviation to the projected standard deviation of the season in question.
WARP2: Wins above a replacement player at the position in the given season, adjusted for the standard deviation (“ease of domination”) of the league.
WARP2/Yr: Wins above a replacement player at the position per full season, adjusted for the standard deviation (“ease of domination”) of the league.
PennAdd: Pennants added, adjusted for the standard deviation.
MarketSal: How much the 2005 market would have paid for the player’s performance.
Reggie Smith
Willie Stargell
On Dave C - Win Shares also sees 1974 as his best year. I suppose a peak/prime voter's answer to "why not Concepcion" might be that if in your best season you had a 106 OPS+, while making 30 errors & finishing just above average in range factor, and your team didn't win the division for the only time in a 5-year period, how great were you?
Reggie Smith
Willie Stargell
Dammit there's still a typo on Smith's chart--he averaged 0.2 fielding wins above average per full season for his career, not 0.5.
Anyways, to analyze these charts, briefly--Stargell was clearly the superior hitter as everyone knows (60.3 batting wins above average, versus 45.9 for Smith). But the gap in their defensive value is greater. While Smith played half his career in center, half in the corners, Stargell played 40% of his career at first base, in a very strong era for first base depth (Guillermo "Willie" Montañez was consistently among the very worst-hitting first basemen in the NL with a 105 OPS+), and the rest in the corners. Thus, the players that would have replaced Stargell had he not been playing were, on average, just 0.4 wins below overall league average, while Smith's replacements averaged 1.0 wins below league average. Moreover, Smith fielded his positions at a slightly above-average rate (2.3 fielding wins above average), while Stargell was consistently bad and occasionally a butcher (7.3 fielding wins below average).
Smith's advantages on defense exactly negate Stargell's on offense for their careers; both have between 60 and 61 WARP1. But Smith debuted four years later than Stargell, and thus contributed more of his value in the very low standard deviation era of the late 1970s, while Stargell had more value in the higher-standard deviation 1960s, closer to expansion years. Thus, Stargell's 60.2 WARP would only be worth 57.7 in the 2005 NL, while Smith's would be worth 59.6.
On a peak basis, Smith didn't quite have a year to measure up to Pops's '71 and '73 (although his '77 was close), but had more All-Star caliber seasons overall. The salary estimator is extremely peak-oriented, and it prefers Smith's. Your mileage may vary.
All in all, I think they are extremely close in value, and given the greater uncertainty in defensive numbers, I'd probably still take Stargell. But I think both are deserving HoM'ers.
The Baseball Reference park factors, I believe, are already multiple-year factors - there isn't enough variance in them for this not to be true.
So you might be over-correcting there. You should be able to use the BR factors right out of the box, so to speak. Also those factors (I think) account for situations where the park changed significantly, etc..
Also, not sure if you accounted for this twice somewhere else, but those factors already account for not facing your teammates. So you don't need to adjust for that anywhere if you are using them.
**********
This appears very promising - I can't wait to jump into it further.
I would much prefer using something like Pennants Added (another BPro brainchild) for giving weight to a high peak.
Joe Dimino, is there any way to find out baseball-reference.com's park factor methodology? And yes, I agree that there is one MLB replacement level, not two league-specific ones--I just have to be careful to account for the DH! (That's why Reggie Smith's 1973 replacement level is so much lower than his earlier ones).
Bobby Bonds
Jimmy Wynn
Bob Johnson
I think there's still an issue with the baseball-reference.com park factors being 3 year factors thru 1997, then switching to 1-year factors for 1998-2006.
The calculation page is here:
Baseball Reference Park Factor Calculation
Yes. HOM Reggie Smith Thread
Bob Johnson, done right this time...
<pre>
Year   BWAA/Yr  BRWAA/Yr  FWAA/Yr   Rep  WARP1/Yr  SFrac  WARP1  LgAdj  WARP2  WARP2/Yr  PennAdd       Salary
1944       6.9       0.0     -0.1  -0.4       7.3    .96    7.0   .845    5.9       6.2     .082  $10,197,806
1939       5.6       0.3     -0.5  -0.5       6.1    .98    6.0   .813    4.9       5.0     .065   $7,100,719
1937       4.9       0.2      0.7  -0.4       6.4    .86    5.5   .818    4.5       5.2     .059   $6,776,422
1934       3.8       0.3      1.0  -0.4       5.6    .91    5.1   .837    4.3       4.7     .056   $5,941,892
1938       4.1       0.2     -0.2  -0.9       5.2    .99    5.2   .822    4.3       4.3     .056   $5,612,566
1941       3.3       0.0      0.3  -0.6       4.3    .97    4.2   .861    3.6       3.7     .046   $4,261,108
1942       3.8       0.0     -0.2  -0.5       4.2    .98    4.1   .866    3.6       3.7     .046   $4,228,396
1943       3.1       0.0      1.1  -0.5       4.8    .78    3.8   .831    3.1       4.0     .039   $3,920,868
1936       2.7       0.1      0.3  -0.4       3.7    .97    3.6   .813    2.9       3.0     .036   $2,992,086
1940       3.1       0.2     -0.4  -0.6       3.6    .90    3.2   .838    2.7       3.0     .034   $2,818,820
1935       3.2       0.0     -0.4  -0.4       3.3   1.00    3.3   .843    2.8       2.8     .035   $2,775,427
1945       2.5       0.0      0.3  -0.5       3.4    .93    3.2   .833    2.6       2.8     .032   $2,639,446
1933       3.5       0.2     -1.3  -0.4       2.8    .95    2.7   .849    2.3       2.4     .028   $2,066,036
TOTAL      3.9       0.1      0.0  -0.5       4.7  12.18   56.7   .835   47.4       3.9     .612  $61,331,593
</pre>
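For anyone wanting to sanity-check how the columns relate, here is a rough sketch using the first three rows of the Bob Johnson chart. The relationships tested (WARP1 = WARP1/Yr x SFrac, WARP2 = WARP1 x LgAdj, WARP2/Yr = WARP1/Yr x LgAdj, and WARP1/Yr close to the component sum minus Rep) are my reading of the glossary and the numbers, not a statement of how the spreadsheet actually computes them.

```python
# (year, BWAA/Yr, BRWAA/Yr, FWAA/Yr, Rep, WARP1/Yr, SFrac, WARP1, LgAdj, WARP2, WARP2/Yr)
ROWS = [
    (1944, 6.9, 0.0, -0.1, -0.4, 7.3, 0.96, 7.0, 0.845, 5.9, 6.2),
    (1939, 5.6, 0.3, -0.5, -0.5, 6.1, 0.98, 6.0, 0.813, 4.9, 5.0),
    (1937, 4.9, 0.2, 0.7, -0.4, 6.4, 0.86, 5.5, 0.818, 4.5, 5.2),
]


def check_row(row):
    """Compare each published column with the value implied by its neighbours."""
    year, bwaa, brwaa, fwaa, rep, warp1_yr, sfrac, warp1, lgadj, warp2, warp2_yr = row
    return {
        "year": year,
        # Gap between WARP1/Yr and the component sum (Rep is negative, so subtract it).
        "component_gap": round(warp1_yr - (bwaa + brwaa + fwaa - rep), 1),
        "warp1_err": abs(warp1_yr * sfrac - warp1),
        "warp2_err": abs(warp1 * lgadj - warp2),
        "warp2_yr_err": abs(warp1_yr * lgadj - warp2_yr),
    }


checks = [check_row(row) for row in ROWS]
for c in checks:
    print(c)
```

On these rows the three products agree with the published figures to within rounding, and the component gap is one or two tenths of a win, in line with the "few tenths" the glossary attributes to Pythagorean effects.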
Bob Johnson
Did you say above that being in the DH league helped or hurt Reggie Smith? Also, while I understand that you are trying to factor in everything, I am not sure I would factor in the DH. Being in a DH league isn't something a player can control and I am not sure I would penalize/reward a player for that. However, I realize that, much like Joe's recommending that you use PA instead of the salary estimator, that comes down to personal taste and how I try to balance value vs. ability.
Would using an MLB-wide replacement level help to even out the dips in a league-wide replacement level? I realize that you don't yet have the data for an MLB-wide replacement level, but I was wondering if some players (hate to mention him again, but Concepcion) aren't benefiting a little because of this.
-a few good Campy years
-the bad part of Yount's career
-a couple years of Toby Harrah's misadventure at the position
-Rick Burleson
-most of Roy Smalley's good years
-Alan Trammell's bad years.
I guess that's probably better than Concepcion's league, which included Larry Bowa, Don Kessinger, and the 9 dwarves.
But this raises the following questions for me:
1) Isn't it better to compare a guy to all MLB players at his position (adjusted for differing run contexts), since all MLB teams have a (theoretical) equality of opportunity in competing for amateur talent?
2) Isn't it better to compare a guy only to his own league since that more directly establishes the value of him to his own team?
3) Isn't it better to compare a guy to all MLB players at his position since all MLB teams compete for the same World Series?
4) Isn't it better to compare a guy only to players in his own league up until around 1975-1990, since the leagues had different identities, and players tended to remain in one league until free agency came around (due to the rule which forced teams to take waivers on any cross-league trade during any part of the season, not just after July 31st as it is today)?
5) Isn't this a maddening vortex of supposition?
I will definitely use an MLB-wide rep level...once I finish the AL. That said, I really don't think anything will budge by more than 0.2 wins as a result, or 0.3 max. (to move by 0.3, the 27 worst regulars at a position in the AL would have to be on average over 6 runs better or worse per season than the 27 worst regulars at the same position in the NL, which would be pretty extreme). For what it's worth, I do have data for AL shortstops from 1960-2005 and the AL line almost exactly traces the NL one.
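A rough check of that 0.3-win bound, under two assumptions of mine that the post does not state: about 10 runs per win, and an MLB-wide level that is the simple average of the two league levels.

```python
RUNS_PER_WIN = 10.0  # assumed rule of thumb, not taken from the post


def mlb_wide_shift_wins(league_gap_runs, runs_per_win=RUNS_PER_WIN):
    """Wins by which one league's replacement level moves when it is
    replaced by the simple average of the two leagues' levels."""
    return (league_gap_runs / 2.0) / runs_per_win


shift = mlb_wide_shift_wins(6.0)
print(shift)
```

With a 6-run gap between the leagues, each league's level moves by half of it, 3 runs, or about 0.3 wins.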
Chris Speier, Gary Templeton and a young no-hit-yet Ozzie were his big competition in the NL.
The AL had better shortstops (by WS anyways). Burleson had a three-year fielding peak that (by FWS) no one else in this era can match. Harrah may have been an embarrassment with the glove, but the guy could rake.
> Johnson's career was equally valuable to Smith's, but in an era that was *much* easier to dominate as
> measured by standard deviations
I'd say your method automatically determines a war discount by accounting for the larger standard deviation in the weaker leagues. In that case I think it is essential to actually give the minor league credit when figuring out Bob Johnson. From your data I'd estimate another .035 to .070 pennants added, which gets him right back in the picture with Bobby Bonds and Jimmy Wynn. I believe the AL was also considered the "stronger" league in the time of Bob Johnson's career. I'm curious if comparing him to the bottom MLB player would make him look better or worse by this method.
DL from MN, apparently not--take a look at the LgAdj numbers for Johnson, they're no different for the war years than for the 1930s. I do include a wartime dummy variable in my regression, but its effect is countered by the extremely low run scoring of WWII-era play, compared to the offensive bonanza of the 1930's AL. I don't know what the real actual stdev was for the AL during the war, but for the NL only 1944 had an exceedingly high stdev--1943 was actually lower than '41 and '42 (and the 1930's average), and '45 was slightly high but just the same as 1940. I can't stress enough that the stdev adjustment is *not* a competition or quality of play adjustment, whatever the late Stephen Jay Gould might have you believe to the contrary.
I find this very intriguing, but part of me is loath to accept that a weak bottom level of SS's (or a strong bottom level of 1B) should really affect how a player is viewed in a HOM context. As a GM or someone trying to critique salary disbursement, it is extremely valuable. But we are trying to figure out the best players of all time across eras, not necessarily within one. Again, this comes down to how one weighs value vs. ability. Most of us try and find a balance and Dan's work is all about value. Again, not saying it isn't useful, it is very useful and very helpful.
One more question,
You say that McCovey and Stargell are overrated because of the high replacement level of 1B during their careers; when does this replacement level begin to drop (assuming that it does)? Does the advent of the DH and about 12 more jobs for fringe 1B/corner OF affect 1B replacement level? If it did, I wouldn't exactly know what that means; I was just wondering if there is a connection between the two.
Take a guy who is two wins above league average, three above positional average, and five above positional replacement. Now transfer him to a league where there are some huge superstars at his position and nothing else (like SS in the early 80s or late 90s AL). The positional average will have moved up slightly due to the superstars (let's say, to 0.5 wins below league average), while the replacement level will have dropped (to, say, 4 wins below league average, which is now 3.5 below positional average). Where will our player turn up? If he's "anchored" to the league average, two wins above league average is two wins above league average, and he will be 2.5 wins above positional average ("worse") and 6 wins above positional replacement ("better"). If he's "anchored" to the positional average, he'll remain three wins above positional average, so he "improves" relative to the league average (now 2.5 above) and the positional replacement level (now 6.5 above). And if he's "anchored" to the positional replacement level, he'll decline relative to the league average (just 1 above) and the positional average (1.5 above).
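The three anchoring cases are easy to tabulate. A minimal sketch with made-up names, using only the numbers in the hypothetical above (everything is expressed in wins relative to league average); by this arithmetic the positional-average anchor leaves the player 6.5 wins above replacement.

```python
def rebase(player_vs_lg, pos_avg_vs_lg, rep_vs_lg):
    """Express a player's value against all three baselines."""
    return {
        "vs_league_avg": player_vs_lg,
        "vs_pos_avg": player_vs_lg - pos_avg_vs_lg,
        "vs_replacement": player_vs_lg - rep_vs_lg,
    }


# Original league: player +2, positional average -1, replacement -3.
old_player, old_pos_avg, old_rep = 2.0, -1.0, -3.0
# New league: positional average -0.5, replacement -4.
new_pos_avg, new_rep = -0.5, -4.0

# Where the player lands (vs. league average) under each anchoring assumption.
anchored_player = {
    "league average": old_player,
    "positional average": new_pos_avg + (old_player - old_pos_avg),
    "positional replacement": new_rep + (old_player - old_rep),
}

results = {
    name: rebase(value, new_pos_avg, new_rep)
    for name, value in anchored_player.items()
}
for name, res in results.items():
    print(name, res)
```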
We don't know what the right answer to this question is, and I don't know if we can. But the implicit hypothesis or assumption behind my work is that the latter is true, that players are "anchored" to their positional replacement level. I'm saying that if Dave Concepción, a six- or seven-WARP player at his peak, had played in the 1950's NL, with a much higher rep level for SS and a higher standard deviation, he would have looked a lot like Ernie Banks. Your mileage may vary.
My mileage does vary, Dan :) I think that Concepcion might have looked like Ernie Banks if Davey had played in Coors Field in 2000, soaking wet, with the wind blowing out and the ball juiced and the umpires on the take.
Otherwise, it's a real hard sell. The concept you are working with is like saying that the shortstop position in a given few years' span is like a "league"; the best player by far (above the bottom) in that "league" is then equal to the best player by far (above the bottom) in another era's "league" (shortstops of the 1950s, e.g.), so that Concepcion = Banks. Ultimately I can't see how that's much different from saying that the men's hoops MVP of the Little East conference should be as good an NBA draft prospect as the MVP of the Big East. Choosing a Hall of Whatever is different than judging the local market in shortstops in 1974. Essentially it's like "drafting" players onto a higher plane (and the vagaries of immediate supply don't much matter, because we have all eternity to draft them in). I think your work says an immense amount, don't get me wrong, but I am skeptical about its ability to compare players well for Hall purposes.
The same is true for other debates. For instance, take park factors. There are generally two ways to adjust for park: one is to adjust for the overall run-scoring environment and the other is to adjust for each component. If you are a GM, then you want to know if a certain player's style of play will be effective in your park, but when doing a HOM-like comparison I see no reason to degrade someone like, say, Elston Howard or Mel Ott because they were able to take special advantage of their park over their careers.
My point is that different types of player analysis are useful for different functions. While the 'less useful' version (for lack of a better term) should still be a part of the debate, it may not be ideally suited for the task at hand.
And I hope that you do not feel that I am brushing aside your work here, Dan. That is not my aim.
One reason is because "ease-of-excellence" at a position may be contextual. For example, I think that Dan's data strongly indicates that the SS drought of the 70's-80's (really, a SS-2B-3B drought for part of that period) was self-selected by MLB conventional wisdom; during that period larger shortstops who were capable of hitting at the levels seen in the 50's and 90's were moved off the position.
Through the hindsight of the modern, I can look at that era and judge, "these teams were screwing themselves with these tiny, good-field-no-hit shortstops." And I grant that's a possibility, in which case perhaps we should dock Concepcion for his inferior competition. But I'm also a big believer in the efficiency of the market, and I suspect that over 10 years if a big shortstop was SUCH a big advantage some wily GM or manager would have picked up on the opportunity and exploited it.
Therefore, we have to consider the possibility that there was something inherently difficult about SS defense during the no-hit replacement era that made it a more difficult defensive position compared to other eras, and therefore limited the offensive value of those who played it. Maybe it's much harder to play SS on turf if you're a big guy, and the widespread turf fields forced larger players to the OF and 3B. If there's a "reason" why SS replacement level dropped during Dave Concepcion's era, then Concepcion should be fully rewarded for playing SS as well as it could be played in the conditions of his time. As the HoM constitution says: All eras should be treated equally.
It could be used to come up with 3-year factors for 1998-2006, for example . . .
Just send me an email if you want it . . .
I think this is a very large part of it.
Turf and huge OFs are also a big reason why CF defense became much tougher with the cookie-cutter ballpark designs starting in the mid-1960s . . .
I think turf is a very good hypothesis. But I think the key point is that it's almost certainly something: Dan's replacement level time-series does not look stochastic.