Baseball for the Thinking Fan

Hall of Merit
— A Look at Baseball's All-Time Best

Monday, February 05, 2007

Dan Rosenheck’s WARP Data

WARP Methodology and Results

Thanks, Dan!

EDIT: Link updated 2/23/2009

John (You Can Call Me Grandma) Murphy Posted: February 05, 2007 at 08:59 PM | 763 comment(s)

Reader Comments and Retorts

   401. JPWF13 Posted: September 13, 2007 at 09:38 PM (#2523527)
How does Alfredo Griffin show up? Wasn't he a disaster on the basepaths?


I thought baserunning was his only good attribute: he was insanely aggressive, but got away with it often enough to justify it.

Break-even for SB/CS is around 65-70%. If you are successful 65-70% of the time in trying for the extra base (in situations where the typical runner wouldn't), then you are breaking even; if you are successful a greater percentage of the time, you're ahead.
   402. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 13, 2007 at 10:15 PM (#2523558)
Well, I have real data for Griffin, and he was +10 EqBR for his career. If I run the estimator on him, I get +5.4 for his career. Your basic point is correct, of course--some guys may fit the profile of good runners but actually be poor ones (the poster boy for this is Harold Reynolds in 1987: second baseman, 60 steals, 8 triples, -6.2 EqBR.) My r-squared on EqBR is only 21%. But what that means is that we are further from the truth assuming that all players before 1972 were exactly equal non-SB baserunners than we are if we make the best estimate we can of their non-SB baserunning ability, based on the stats available to us.
   403. Dandy Little Glove Man Posted: September 13, 2007 at 10:28 PM (#2523567)
Dan,

I'm a little confused by the new projected stdevs in the NL. Is something wrong or do these numbers reflect the current methodology? I separated the post-1900 NL into 15-year blocks and compared Actual StDev to Projected StDev. Here are the results:

Seasons Act Proj
1901-15 2.83 2.87
1916-30 2.84 2.90
1931-45 2.86 2.92
1946-60 2.89 2.86
1961-75 2.97 2.93
1976-90 2.76 2.89
1991-05 2.90 2.84

The 1976-90 block easily has the lowest actual standard deviation, but it is in the middle of the pack for the projections. Thirteen of the fifteen years are lower than projected, and the chief variables that I thought you discussed in your previous projections -- expansion, run-scoring, and integration -- all suggest low stdevs. The 1991-05 block has the second highest actual stdev (.14 higher than 1976-90), but it projects as the lowest. This block also includes two periods of expansion and an extremely high run environment. I read that you limited the timeframe of expansion effects, but this doesn't make intuitive sense to me.
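A minimal sketch of the comparison above, using only the seven rows from the table in this post. The names `blocks` and `residuals` are made up for illustration; the residual is simply actual minus projected stdev.

```python
# Actual vs. projected NL standard deviations by 15-year block,
# copied from the table in this post: block -> (actual, projected).
blocks = {
    "1901-15": (2.83, 2.87),
    "1916-30": (2.84, 2.90),
    "1931-45": (2.86, 2.92),
    "1946-60": (2.89, 2.86),
    "1961-75": (2.97, 2.93),
    "1976-90": (2.76, 2.89),
    "1991-05": (2.90, 2.84),
}


def residuals(table):
    """Return actual minus projected stdev per block, rounded to 2 places."""
    return {block: round(act - proj, 2) for block, (act, proj) in table.items()}


res = residuals(blocks)
# Block whose projection misses the actual value by the most, in either direction.
largest_miss = max(res, key=lambda block: abs(res[block]))

for block, r in res.items():
    print(f"{block}: {r:+.2f}")
print("largest miss:", largest_miss)
```

Under this reading, 1976-90 is the block the projection overshoots the most, and 1991-05 is tied for the largest undershoot.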

Perhaps you raised those late 70s and 80s projected stdevs because you're concerned about the reaction if Smith and Concepcion move even higher in your rankings? ;) Once again, thanks for putting all of this together.
   404. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 13, 2007 at 11:40 PM (#2523605)
Dandy Little Glove Man, thanks so much for your careful attention to my research.

The most obvious explanation for such a miss is a star drought--that the league actually wasn't so difficult to dominate, but the players failed to take advantage of it nonetheless. That appears to be the case here. Here are the leaders in WARP1 for the 1976-90 period:

Schmidt (NL), 96
Yount (AL), 82
Brett (AL), 81
Trammell (AL), 76
Henderson (AL), 74
Ripken (AL), 68
Smith (NL), 65
Murray (AL), 65
Boggs (AL), 61
Dw. Evans (AL), 60
Winfield (split, but more AL) 59

Of the top 11 players in the time period you mention, nine played primarily in the AL. Since it is the stars that drive the standard deviation, it would be quite strange if the NL *didn't* register lower stdevs than the regression would project.
   405. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 14, 2007 at 12:02 AM (#2523610)
As for the 1991-05 NL, I actually get it spot on from 1991 to 1999 (2.87 real, 2.86 projected). I then undershoot it tremendously from 2000 to 2004. That is a compelling measure of the impact of steroid use on the game, and in particular that of Barry Bonds' usage. Bonds alone accounts for 16 points of stdev in the 2001 and 2002 NL, 8 in 2003, and 14 in 2004. Sosa and Luis González in 2001, Adrián Beltre in 2004...all these seasons jack up the standard deviation. That is NOT, I think, because those leagues were easier to dominate--the tremendous decline in standard deviations accompanying the introduction of steroid testing in 2004-05 is pretty dramatic. Rather, it is because some players used and others didn't, which increases the variation between them as measured by standard deviation.
   406. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 14, 2007 at 12:20 AM (#2523630)
Briefly, here are the other extended periods of high residual errors, and my theories as to what accounts for them:

Early aughts: expansion-fueled league strength difference
Teens: star glut in the AL
1920s AL: George Herman Ruth, the one-man star glut
Early 40s AL: Teddy Ballgame, the other one
Early 50s AL: Integration
60s and early 70s: star glut in the NL
Late 70s and 80s: star glut in the AL
2000s: Steroids.
   407. Dandy Little Glove Man Posted: September 14, 2007 at 12:53 AM (#2523657)
That makes sense, but I'm not quite sure I buy it. Didn't the NL win just about every All-Star game in the 70s and 80s? It's somewhat like arguing that the NL was the better league in the 90s because Bonds, Larkin, Piazza, Biggio, and Bagwell were 5 of the top 6 players by WARP over the period (just a guess, but I think something close to that is probably true). It seems clear that the AL had the better depth of stars in the average year from the fact that it tended to dominate the All-Star game, just as the same seems true of the NL for the previous period. What do you think?
   408. TomH Posted: September 14, 2007 at 02:03 AM (#2523718)
That is a compelling measure of the impact of steroid use on the game, and in particular that of Barry Bonds' usage.

It might also be in part that Bonds, like Ruth before him, was an outlier because he was really really good.
   409. Paul Wendt Posted: September 14, 2007 at 03:51 AM (#2523799)
Dan,
You explained a problem with 1890s outfielders, which you have corrected.

Do you know why shortstops Bill Dahlen and Tommy Corcoran are the all-time #1 and #5 "losers"?
George Davis #10, Herman Long #12, Monte Cross #17, Bobby Wallace #21; Hugh Jennings and Honus Wagner not far "behind". This selection is limited to Dahlen and his shortstop contemporaries. Given the numbers of shortstops and outfielders, it seems that SS not OF is the big loser among fielding positions.
   410. 'zop sympathizes with the wrong ####### people Posted: September 14, 2007 at 05:02 AM (#2523845)

It might also be in part that Bonds, like Ruth before him, was an outlier because he was really really good.


For chrissake, man, I don't even care that much about 'roids, but let's not play ostrich.
   411. Howie Menckel Posted: September 14, 2007 at 05:19 AM (#2523850)
Seems like that's placing a huge impact on the ultimate sample size issue - a once(twice)-a-year All-Star Game.
   412. KJOK Posted: September 14, 2007 at 05:26 AM (#2523856)
Just like in high school baseball where the best hitters play SS (because they are simply the best players, period), in the 1890s many of the absolute worst hitters played RF,


and just like in high school, some of the best 19th century hitters played SS, yet they appear to be taking a beating in your new, improved calculations?
   413. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 14, 2007 at 06:57 AM (#2523886)
Dandy Little Glove Man,

I'm very glad you noticed that discrepancy, because in trying to analyze what might account for it, I tried a few different versions of the regression and found one that increased my r-squared by nearly 5%, a very significant improvement. I've recalculated the WARP using this more accurate regression and posted the new file to the Yahoo group. It still does overshoot the late 70s/80s NL--I really do believe there was a star imbalance during that era, and I don't think a handful of All-Star games in which the best players only play a few innings is enough evidence to disprove my argument--but not by nearly as much. Thanks for drawing attention to it.

TomH,

I don't think they're really comparable, because Bonds didn't become that much of an outlier until he started using steroids.
   414. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 14, 2007 at 07:03 AM (#2523888)
Paul Wendt and KJOK, that's true and there's a simple reason for it. I multiply all replacement levels by a constant to bring the overall standard deviation-adjusted rep level for each year in line with Nate Silver's FAT levels. Since SS replacement level is the furthest from 0, an X% reduction will cause a bigger absolute increase in SS rep level than it would for OF or 1B rep level. The *relative* position of SS to other positions in the 1890s did not suffer in this WARP update--in fact, it *improved*, since the OF were adjusted so much--but in absolute terms, it probably takes the biggest hit.
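A rough sketch of why a proportional adjustment hits SS hardest in absolute terms. The replacement levels and the 0.9 factor below are hypothetical numbers chosen for illustration, not values from the WARP file; only the mechanism (multiplying every position's rep level by one constant) comes from the post.

```python
def rescale(rep_levels, factor):
    """Multiply every position's replacement level by the same constant."""
    return {pos: level * factor for pos, level in rep_levels.items()}


# Hypothetical replacement levels in wins below average (NOT from the actual data).
rep = {"SS": -3.0, "OF": -1.0, "1B": -0.5}

# Hypothetical constant: a 10% reduction in the magnitude of every rep level.
scaled = rescale(rep, 0.9)

# Absolute change per position; positive means the rep level moved toward 0,
# which lowers the WARP of everyone at that position by the same amount.
shift = {pos: round(scaled[pos] - rep[pos], 2) for pos in rep}

for pos in rep:
    print(f"{pos}: {rep[pos]:+.2f} -> {scaled[pos]:+.2f} (shift {shift[pos]:+.2f})")
```

The same percentage applied to every position leaves the ordering of positions intact while moving the position furthest from 0 by the largest amount.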
   415. TomH Posted: September 14, 2007 at 11:21 AM (#2523902)
I'm not trying to get into a roids argument. I merely pointed out that
a) outliers can affect data trends
b) Babe is the prime example
c) Barry could be another one, because
....1 he was really good in 1990-93
....2 while he very probably took 'stuff', there are other indicators that he may have been still amazingly good in 1999 ff without them, such as
------- his KO per AB rate from 2003 til today is much better than his career avg, despite KOs/AB going up in general since the 80s.
------- he really has been a workout freak, suggesting his drive to become better would help him stay in shape until age 42 better than most.

And I specifically in my post wrote "It might also be in part"; using hedging words twice. I don't like Barry one bit, nor what has happened to the records of our national game. But the baby & bath water analogy COULD apply.
   416. Paul Wendt Posted: September 14, 2007 at 01:09 PM (#2523962)
405. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 13, 2007 at 08:02 PM (#2523610)
As for the 1991-05 NL, I actually get it spot on from 1991 to 1999 (2.87 real, 2.86 projected). I then undershoot it tremendously from 2000 to 2004. That is a compelling measure of the impact of steroid use on the game, and in particular that of Barry Bonds' usage.

Which is the year when the revised treatment of expansion kicks in?

The worst W-L record in the majors is .420 (a couple days ago). How much does that bear on WAR for the best players, in contrast to .270-.280 for Detroit (phony numbers giving .150 and 50% difference)?
   417. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 14, 2007 at 02:33 PM (#2524024)
Paul Wendt,

Well, thanks to Dandy Little Glove Man's inspirational inquiry, I went back to using 12 years of expansion points (*along with* winning percentage of the worst team). I'm not sure what you mean by which year it kicks in...1901 is the first expansion year I address.

If I increase the winning pct of the worst team in the 2003 AL from .265 to .420, its projected stdev decreases from 2.67 to 2.56.
   418. Paul Wendt Posted: September 14, 2007 at 03:00 PM (#2524056)
I meant which is the first year for which you dropped the expansion variable, 1999 or 2000. But the question is moot, given that there is a subsequent revision at yahoogroups.com and in any numerical details you provide hereafter.
   419. Jim Sp Posted: September 14, 2007 at 04:20 PM (#2524162)
Dan,
I didn't get to your very latest file, but I did notice on the previous one that the sum of Warp1 goes from over 42K to about 39K, indicating a higher replacement level or maybe a transfer to the pitchers...is that intended?
   420. Jim Sp Posted: September 14, 2007 at 04:21 PM (#2524165)
I mean a change in global/overall replacement level, I know there are tweaks in particular cases.
   421. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 14, 2007 at 04:26 PM (#2524170)
Indeed! In the old system, I let the overall replacement level float with time, which let it get as high as about 1.3 wins below average per position in some years and as low as 1.9 in the 1890s. Now, I adjust the overall replacement level for every league-season to match the original Nate Silver FAT level (1.5 wins below average per position). I guess there were more years above 1.5 than there were below it if the overall sum of WARP has dropped.
   422. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 20, 2007 at 04:58 AM (#2533329)
I think one thing that has prevented some members of the group from making use of my WARP data is that most people aren't paying enough attention to distinguish between the system's results and my own personal idiosyncratic voting preferences (particularly my emphasis on peak rate). To help separate the two, I've applied a different salary estimator--the same equation Nate Silver currently uses to determine the market value of players today--which I think much better reflects the consensus approach to voting.

The equation is $1,200,000 * WARP2^1.5 + $380,000. It rewards peak, but not too steeply: one season of 8 WARP2 (a year like Rod Carew's 1975 or Ted Williams's 1948) comes out 38% more valuable than two seasons of 4 WARP2 (a year like Julio Franco's 1986 or Mo Vaughn's 1995), and one season of 12 WARP2 (there aren't many, but Ty Cobb's 1917 is a pretty good fit) is worth 68% more than three seasons of 4 WARP2. And I apply it to seasonal WARP totals rather than rates, so in-season durability is rewarded as most of the group supports.
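The peak premiums quoted above can be checked directly from the equation. This is a minimal sketch: the function names are made up, and the handling of negative seasons is an assumption rather than something the post specifies.

```python
def salary(warp2: float) -> float:
    """Consensus Estimator as quoted above: $1,200,000 * WARP2^1.5 + $380,000."""
    if warp2 < 0:
        # Assumption (not stated in the post): a sub-replacement season is worth $0.
        return 0.0
    return 1_200_000 * warp2 ** 1.5 + 380_000


def peak_premium(peak_season: float, spread_seasons: list[float]) -> float:
    """Fractional premium of one big season over several smaller ones."""
    spread_total = sum(salary(w) for w in spread_seasons)
    return salary(peak_season) / spread_total - 1.0


print(f"8 vs 4+4:    {peak_premium(8, [4, 4]):.0%}")      # 38%
print(f"12 vs 4+4+4: {peak_premium(12, [4, 4, 4]):.0%}")  # 68%
```

Both figures match the post: one 8-WARP2 season comes out about 38% ahead of two 4s, and one 12-WARP2 season about 68% ahead of three 4s.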

So if you're fairly consensus-minded, and don't feel like taking the time to pore through a spreadsheet with tens of thousands of rows but just want the upshot on the conclusions of my research, here are the results of this Consensus Estimator. 2007 numbers are straight-line adjusted for the remaining 10 games. Strike seasons are filled out using the player's three-year average. 1943 and 45 are penalized with an extra 10% park factor (eg a 95 PF becomes 105) and -2 FRAA, and 1944 is penalized with an extra 20% park factor and -4 FRAA. I did a study on catcher playing time and found they play 58% fewer games than the other 8 positions on average, so catchers receive a flat 58% bonus to their salaries (prorated, so if a guy plays half a season at C and half somewhere else he gets a 29% bonus). War credit is given using the average production of the four surrounding seasons (two on either side); no other account is made for league strength above and beyond the standard deviation. Minor league and Negro League credit is taken from the group's MLE's. I only have data for MLB position players since 1893.
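The prorated catcher bonus described above might be sketched as follows. The function name and the idea of passing the share of the season spent at catcher as a single fraction are assumptions made for illustration.

```python
CATCHER_BONUS = 0.58  # flat bonus for a full season at catcher, per the post


def with_catcher_bonus(salary: float, frac_at_catcher: float) -> float:
    """Scale a seasonal salary by the catcher bonus, prorated by time at C.

    frac_at_catcher is the share of the player's season spent catching (0 to 1).
    """
    if not 0.0 <= frac_at_catcher <= 1.0:
        raise ValueError("frac_at_catcher must be between 0 and 1")
    return salary * (1.0 + CATCHER_BONUS * frac_at_catcher)


base = 10_000_000  # arbitrary example salary
print(with_catcher_bonus(base, 1.0))  # full-time catcher: 58% bonus
print(with_catcher_bonus(base, 0.5))  # half a season at C: 29% bonus
print(with_catcher_bonus(base, 0.0))  # never caught: no bonus
```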

1. Babe Ruth, $696,740,476 (pitching credit is estimated from BP WARP and Win Shares)
2. Ted Williams, $642,511,769 (war credit for 1943-45, 1952, and the missing part of 1953)
3. Barry Bonds, $606,839,692
4. Honus Wagner, $574,301,059
5. Ty Cobb, $560,213,066
6. Willie Mays, $528,360,414 (war credit for the missing part of 1952 and 1953)
7. Tris Speaker, $502,833,731
8. Stan Musial, $456,610,473 (war credit for 1945)
9. Rogers Hornsby, $449,575,391
10. Hank Aaron, $435,614,208
11. Eddie Collins, $430,841,789
12. Mickey Mantle, $429,332,950
13. Nap Lajoie, $409,681,770
14. Lou Gehrig, $378,512,721
15. Joe Morgan, $376,821,200
16. Mike Schmidt, $373,017,397
17. Mel Ott, $349,906,431
18. Alex Rodríguez, $349,592,236
19. Rickey Henderson, $330,880,364
20. Cal Ripken Jr., $327,424,590
21. Frank Robinson, $317,650,621
22. Jimmie Foxx, $305,669,785
23. Arky Vaughan, $295,564,797
24. Joe DiMaggio, $287,146,741 (war credit for 1943-45)
25. Mike Piazza, $285,931,500
26. Johnny Mize, $276,311,565 (war credit for 1943-45)
27. Eddie Mathews, $273,265,852
28. Johnny Bench, $272,721,442
29. Luke Appling, $262,437,868 (war credit for 1944 and the missing part of 1945)
30. Barry Larkin, $256,711,764
31. Hank Greenberg, $256,212,329 (war credit for 1942-44 and the missing parts of 1941 and 1945)
32. Robin Yount, $255,070,137
33. Ed Delahanty, $253,937,639 (pre-1893 value estimated from BP WARP)
34. Wade Boggs, $249,808,731
35. Alan Trammell, $249,502,295
36. Yogi Berra, $248,200,649
37. Ozzie Smith, $245,384,391
38. Charlie Gehringer, $245,287,462
39. Paul Waner, $245,048,631
40. George Brett, $244,423,630
41. Gary Carter, $244,240,349
42. Bill Dahlen, $241,002,341 (pre-1893 value estimated from BP WARP)
43. Jeff Bagwell, $236,083,576
44. Carlton Fisk, $233,186,483
45. George Davis, $230,529,318 (pre-1893 value estimated from BP WARP)
46. Joe Cronin, $230,150,451
47. Sam Crawford, $230,120,164
48. Tim Raines, $228,830,953
49. Carl Yastrzemski, $228,782,238
50. Al Kaline, $228,754,148
51. Ken Griffey, Jr., $227,853,391
52. Gary Sheffield, $225,198,850
53. Pete Rose, $224,756,620
54. Pee Wee Reese, $221,565,864 (war credit for 1943-45)
55. Iván Rodríguez, $219,303,959
56. Frank Thomas, $218,041,423
57. Bobby Grich, $215,174,829
58. Reggie Jackson, $214,045,692
59. Billy Hamilton, $213,670,922 (pre-1893 value estimated from BP WARP)
60. Bill Dickey, $213,586,049
61. Mickey Cochrane, $212,655,010
62. Gabby Hartnett, $212,309,070
63. Jesse Burkett, $211,668,925 (pre-1893 value estimated from BP WARP)
64. Joe Jackson, $211,396,444
65. Jackie Robinson, $210,253,998 (Negro League credit for 1945 and minor league credit for 1946)
66. Frankie Frisch, $209,705,820
67. Tony Gwynn, $208,179,880
68. Fred Clarke, $207,043,096
69. Rod Carew, $206,824,020
70. Ernie Banks, $205,812,850
71. Roy Campanella, $204,488,381 (Negro League credit for 1940-48)
72. Roberto Clemente, $201,492,552
73. Enos Slaughter, $200,123,112 (war credit for 1943-45)
74. Chipper Jones, $199,874,429
75. Bobby Wallace, $198,775,437
76. Lou Boudreau, $198,692,666
77. Craig Biggio, $193,866,990
78. Harry Heilmann, $193,498,528
79. Billy Williams, $193,214,123
80. Ron Santo, $193,141,638
81. Elmer Flick, $192,580,450
82. Larry Doby, $191,879,659 (Negro League credit for 1943 and 1946-7, war credit for 1944-5)
83. Al Simmons, $191,724,923
84. Mark McGwire, $187,831,620
85. Manny Ramírez, $187,582,301
86. Ryne Sandberg, $187,431,974
87. Heinie Groh, $185,001,350
88. Richie Ashburn, $184,212,537
89. Frank Baker, $183,807,918
90. Duke Snider, $183,508,525
91. Dick Allen, $182,934,445
92. Darrell Evans, $182,778,985
93. Lou Whitaker, $182,649,952
94. Ted Simmons, $182,257,531
95. Willie Keeler, $182,257,531 (pre-1893 value estimated from BP WARP)
96. Eddie Murray, $180,501,14
97. Jimmy Collins, $179,645,069
98. Jim Thome, $178,458,878 (might have moved up a slot after tonight's pair of dongs)
99. Derek Jeter, $178,432,157
100. Rafael Palmeiro, $177,709,900
101. Jim Edmonds, $177,560,374
102. Paul Molitor, $176,864,780
103. Hughie Jennings, $176,844,480 (pre-1893 value estimated from BP WARP)
104. Jorge Posada, $176,366,795
105. Albert Pujols, $176,198,445
106. Scott Rolen, $175,714,729
107. Dwight Evans, $175,433,415
108. Sammy Sosa, $175,161,939
109. Joe Kelley, $174,808,494 (pre-1893 value estimated from BP WARP)
110. Jimmy Sheckard, $173,782,257
111. Phil Rizzuto, $173,175,176 (war credit for 1943-45)
112. Charlie Keller, $173,121,159 (war credit for 1944 and the missing part of 1945)
113. Joe Sewell, $172,113,587
114. Orestes Miñoso, $171,250,228 (Negro League credit for 1945-50)
115. Max Carey, $170,942,663
116. Larry Walker, $169,760,677
117. Dave Winfield, $169,614,193
118. Brooks Robinson, $169,570,271
119. Zack Wheat, $168,508,810
120. Roberto Alomar, $168,037,864
121. Sherry Magee, $167,909,968
122. Dagoberto Campaneris, $167,565,867
123. Bobby Doerr, $160,855,223 (war credit for 1945)
124. Willie Stargell, $163,643,675
125. John McGraw, $163,585,393 (pre-1893 value estimated from BP WARP)
126. Graig Nettles, $163,563,273
127. Harmon Killebrew, $163,453,527
128. Reggie Smith, $162,388,814 (Japan credit for 1983)
129. Will Clark, $162,263,422
130. Willie McCovey, $161,580,642
131. Johnny Pesky, $161,323,415 (war credit for 1943-45)
132. Goose Goslin, $161,218,798
133. Jake Beckley, $160,305,046 (pre-1893 value estimated from BP WARP)
134. Roger Bresnahan, $159,724,848
135. David Concepción, $159,343,562
136. Brian Giles, $158,913,441
137. Billy Herman, $158,724,020 (war credit for 1944-5)
138. Jimmy Wynn, $158,445,038
139. Stan Hack, $157,378,910
140. Bill Freehan, $157,371,800
141. Vladimir Guerrero, $156,934,337
142. Andre Dawson, $156,357,771
143. Jason Giambi, $156,000,024
144. Keith Hernández, $155,837,952
145. Tommy Leach, $154,776,307
146. Joe Gordon, $154,628,334 (war credit for 1944-5)
147. Dave Bancroft, $154,096,301
148. Edgar Martínez, $154,010,986
149. Gavvy Cravath, $152,470,860 (minor league credit for 1906-7, 9-11)
150. George Sisler, $152,320,148
151. Jim Fregosi, $152,193,178
152. Bob Johnson, $151,852,428
153. Chuck Klein, $151,248,045
154. Willie Randolph, $150,884,925
155. Rabbit Maranville, $150,875,410 (war credit for 1918)
156. Joe Medwick, $150,137,493
157. Ralph Kiner, $149,839,412 (war credit for 1944-45)
158. Bobby Bonds, $148,958,773
159. Ken Boyer, $148,253,634
160. Bernie Williams, $147,888,863
161. Buddy Bell, $147,695,883
162. Ron Cey, $146,837,114
163. Toby Harrah, $146,788,510
164. Gene Tenace, $146,753,677
165. Earl Averill, $146,304,307 (minor league credit for 1926-8)
166. Robin Ventura, $146,265,236
167. Kiki Cuyler, $145,816,749
168. Cupid Childs, $145,685,369 (pre-1893 value estimated from BP WARP)
169. Dale Murphy, $144,777,231
170. Norm Cash, $144,665,638
171. Bobby Abreu, $142,590,458
172. Brett Butler, $141,823,282
173. Thurman Munson, $141,355,236
174. Bobby Veach, $141,335,556
175. Bill Terry, $141,205,582
176. Joe Torre, $141,135,719
177. Edd Roush, $140,737,906 (holdout credit for 1922 and 1930)

Other top 100 returnees...and Nellie Fox

Luis Aparicio, $136,192,622
Nellie Fox, $134,857,927
George Burns, $134,290,442
Vern Stephens, $130,962,024
Pie Traynor, $127,883,494
Wally Schang, $127,322,015
George Van Haltren, $127,190,968 (pre-1893 value estimated from BP WARP)
Ken Singleton, $124,829,044
Tony Lazzeri, $124,095,267
Jack Clark, $122,354,790
Rusty Staub, $120,188,034
Kirby Puckett, $119,934,105
Atanasio Pérez, $119,567,021
Hugh Duffy, $119,068,434 (pre-1893 value estimated from BP WARP)
Fielder Jones, $118,150,306
Frank Chance, $117,903,321
Tony Oliva, $117,551,078
Brian Downing, $117,019,707
Sal Bando, $116,979,294
Elston Howard, $115,172,998 (minor league credit for 1954)
Bob Elliott, $115,037,913
Lance Parrish, $111,790,422
Jim Rice, $111,073,654
Frank Howard, $110,035,741
Don Mattingly, $108,734,032
Dave Parker, $108,732,541
Ernie Lombardi, $108,316,911
Hack Wilson, $105,317,662
Orlando Cepeda, $104,799,732
Sam Rice, $101,995,196
Larry Doyle, $100,478,746
Lou Brock, $99,297,793
Bill Mazeroski, $97,601,211
Mickey Vernon, $95,881,937
Al Rosen, $95,192,575
Al Oliver, $91,451,225
Bill Madlock, $85,615,833
George Kell, $75,162,010

HoM "mistakes"
Earl Averill, Ken Boyer, Cupid Childs, Nellie Fox, Ralph Kiner, Joe Medwick, Willie Randolph, Edd Roush, George Sisler, Bill Terry, Joe Torre

Their replacements
Dave Bancroft, Dagoberto Campaneris, David Concepcion, Gavvy Cravath, Andre Dawson, Tommy Leach, John McGraw, Graig Nettles, Johnny Pesky, Phil Rizzuto, Reggie Smith
   423. Jim Sp Posted: September 20, 2007 at 05:23 PM (#2533723)
I've applied a different salary estimator--the same equation Nate Silver currently uses to determine the market value of players today

So the old salary estimator wasn't Nate Silver's formula? Or he's changed his formula? Where did the old formula come from?
   424. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 20, 2007 at 07:53 PM (#2533950)
Yes, he changed his formula. The old one was for 2005, this one is for 2007. The old one was about twice as peaky--it had 8 + 0 as 68% better than 4 + 4, and 12 + 0 as 136% better than 4 + 4 + 4 (compared to 38% and 68% with the new one). Also, the absolute numbers are obviously higher with the new one.
   425. KJOK Posted: September 20, 2007 at 09:26 PM (#2534043)
No Sam Thompson in the list?

Otherwise, this looks good, although I like the 'high peak' favorable calculation better.

I'm going to post a spreadsheet of Dan's data with Position added as a sort column.
   426. Bleed the Freak Posted: September 20, 2007 at 09:46 PM (#2534056)
161. David Concepcion de la Desviacion Estandar (Dan R) Posted: April 02, 2007 at 02:31 PM (#2322791)

Ted Simmons $76,589,060 with 20% catcher bonus--I wish I could take back my vote for Simmons, not that it would have mattered. From about 1950 to 1985, catcher was between 2B and 3B on the defensive spectrum. You can see it in the number of big-hitting catchers in those years--Berra, Campanella, Bench, Fisk, Carter. Compare that to nobody in the 1940s or the deadball era (which is why I will vote for Bresnahan--and no I don't support Lombardi, he was just the "best of a bad lot.") With that kind of replacement level, in the non-DH league, neither Simmons nor Torre nor Freehan is close to my PHoM. I think the group has substantially overrated catchers from this era.

422. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 20, 2007 at 12:58 AM (#2533329)

94. Ted Simmons, $182,257,531

Glad to see Ted make a huge leap upward in the rankings from George Burns territory to HOM Eddie Murray. What prompted the uptick?

Dan, Thanks for the knowledge and tireless effort you have brought to the project to make everyone else here wiser. I'm a lurker who truly enjoys your insightful posts as well as our other noteworthy contributors to the HOM project.
   427. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 20, 2007 at 10:03 PM (#2534083)
KJOK--I didn't do Thompson since he accumulated the majority of his value prior to 1893. I'd just be parroting BP's take on him. The peakier salary data is of course in the spreadsheet in the Hall of Merit Yahoo group.

Bleed the Freak--What boosted Simmons (and all the catchers) was my actually doing an empirical study on catcher playing time, which showed that the correct bonus for them is not 20% as I was giving but a much beefier 58%. Thanks for the kind words.
   428. Paul Wendt Posted: September 20, 2007 at 10:44 PM (#2534139)
DanR #422
I did a study on catcher playing time and found they play 58% fewer games than the other 8 positions on average, so catchers receive a flat 58% bonus to their salaries (prorated, so if a guy plays half a season at C and half somewhere else he gets a 29% bonus).

Do you mean that catchers play 58% of 8-position (DH and 7 other fielders?) games and you inflate their raw ratings in ratio 100:58?
   429. Paul Wendt Posted: September 20, 2007 at 10:45 PM (#2534141)
Oh, I suppose it is that 8-position men play 58% more games than catchers. Inflate in ratio 158:100.
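The two readings of "58%" in this exchange give noticeably different multipliers. A quick sketch of the arithmetic, with variable names invented for illustration:

```python
# Reading 1: catchers play 58% as many games as other positions,
# so raw ratings would be inflated in ratio 100:58.
multiplier_100_58 = 100 / 58

# Reading 2: other positions play 58% more games than catchers,
# so raw ratings are inflated in ratio 158:100.
multiplier_158_100 = 158 / 100

# Under reading 2, the catcher's share of a non-catcher's games is 100/158.
catcher_share_reading_2 = 100 / 158

print(f"100:58  -> x{multiplier_100_58:.3f}")
print(f"158:100 -> x{multiplier_158_100:.3f}")
print(f"catcher games as share of others (reading 2): {catcher_share_reading_2:.1%}")
```

So the 158:100 reading corresponds to catchers playing roughly 63% as many games as the other positions, not 58%.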
   430. Jim Sp Posted: September 20, 2007 at 10:47 PM (#2534144)
I think the new salary estimator is just a trick so that you don't have to vote for Gooden next year :)
   431. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 20, 2007 at 10:51 PM (#2534151)
Paul Wendt, that's correct.

Jim Sp, I just eyeball pitchers for now. Moreover, I don't plan to use this estimator to vote--I prefer the one on the spreadsheet (since it reflects my preference for peak rate). I just did this to help people distinguish the conclusions of my research from my idiosyncratic voting.
   432. Brent Posted: September 21, 2007 at 03:32 AM (#2534824)
172. Brett Butler, $141,823,282
Kirby Puckett, $119,934,105


Dan - I'm aware that you don't support either of these players, but this difference jumped out at me. I can understand a career-oriented voter ranking Butler ahead of Puckett because of the difference in career length, but you've got Butler 18% ahead, meaning you see Butler worth nearly as much per game played as Puckett. Comparing Puckett's 124 OPS+ to Butler's 110 and considering the widespread consensus when they were active that Puckett was the better fielder, I'm finding your rankings for this pair to be a bit hard to swallow.
   433. 'zop sympathizes with the wrong ####### people Posted: September 21, 2007 at 03:37 AM (#2534833)
Comparing Puckett's 124 OPS+ to Butler's 110 and considering the widespread consensus when they were active that Puckett was the better fielder, I'm finding your rankings for this pair to be a bit hard to swallow.

Dan can speak for himself, of course, but what jumps out at me is that Butler had a .377 career OBP playing at Dodger Stadium, while Puckett had a .360 career OBP playing at the Metrodome. By the time you park adjust that, that's probably a substantial OBP edge for Butler, and OPS+ underrates players whose offensive strength is their on-base skills.

Just my $0.02.
   434. Dr. Chaleeko Posted: September 21, 2007 at 03:40 AM (#2534838)
Just my $0.02.

$0.02 with the new salary estimator or the old one? ; )
   435. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 21, 2007 at 05:08 AM (#2534941)
Brent, I'm actually working on a post comparing Puckett to all the modern CF backlog candidates as we speak, but here's my analysis of Butler vs. Puckett, one-on-one:

Quick glossary:
SFrac is the percentage of the season played (compared to a player with league average PA/G in 162 games). BWAA is batting wins above average, BRWAA is baserunning wins above average, FWAA is fielding wins above average, Rep is wins above average a replacement CF would have accumulated in the same playing time, and WARP is the sum of the first three minus the fourth (wins above replacement). Note that Rep is 0.6 wins lower in the AL than in the NL to account for the DH. 1994 and 1995 are adjusted to 162 games. aTotal (where included) is the career total excluding sub-replacement seasons.
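As a quick check of the glossary, here is a minimal sketch of the WARP identity applied to two of Puckett's rows from the table in this post (1986 and 1987). The function name is made up. Some other rows differ from this sum by a tenth or two, presumably because the published components are rounded.

```python
def warp(bwaa: float, brwaa: float, fwaa: float, rep: float) -> float:
    """Batting + baserunning + fielding wins above average, minus Rep.

    Rep is negative (a replacement player is below average), so subtracting
    it adds to the total.
    """
    return bwaa + brwaa + fwaa - rep


# Component values copied from Puckett's 1986 and 1987 rows.
puckett_1986 = warp(bwaa=3.1, brwaa=0.0, fwaa=-0.6, rep=-2.1)
puckett_1987 = warp(bwaa=2.4, brwaa=0.0, fwaa=-0.4, rep=-1.8)

print(round(puckett_1986, 1), round(puckett_1987, 1))  # 4.6 3.8
```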

Puckett

Year    SFrac  BWAA  BRWAA  FWAA    Rep  WARP        Salary
1984      .85  -1.7   -0.1   2.6   -1.8   2.5    $5,225,588
1985     1.09  -0.7    0.1   0.9   -2.2   2.5    $5,099,991
1986     1.05   3.1    0.0  -0.6   -2.1   4.6   $12,067,527
1987      .97   2.4    0.0  -0.4   -1.8   3.8    $9,178,331
1988     1.02   4.5    0.2   0.6   -1.8   6.9   $22,239,828
1989     1.01   2.2    0.1   0.3   -1.7   4.4   $11,280,845
1990      .90   1.2    0.3   0.0   -1.6   3.1    $6,806,804
1991      .95   1.1    0.5   0.1   -1.7   3.4    $7,885,674
1992     1.02   3.1    0.4   0.6   -1.8   6.0   $18,020,804
1993      .99   1.5    0.2  -1.0   -1.9   2.6    $5,506,959
1994      .98   2.3    0.2  -0.6   -1.7   3.5    $8,247,342
1995      .97   2.6    0.2  -0.8   -1.7   3.5    $8,374,412
TOTAL   11.80  21.6    2.1   1.7  -21.8  46.8  $119,934,105


Butler

Year    SFrac  BWAA  BRWAA  FWAA    Rep  WARP        Salary
1981      .32   0.1    0.3   0.0   -0.2   0.6      $918,850
1982      .39  -1.5    0.4  -0.3   -0.6  -0.9            $0
1983      .90   0.8   -0.4   1.1   -0.7   2.1    $4,149,260
1984     1.02   0.5    0.2  -0.2   -2.1   2.7    $5,568,111
1985      .97   2.0    0.1   0.2   -2.0   4.3   $11,012,926
1986      .97   0.8    0.4   0.1   -1.9   3.2    $7,219,639
1987      .89   2.1    0.2   0.6   -1.7   4.5   $11,830,230
1988     1.00   4.2    0.3   0.4   -1.1   6.1   $18,274,865
1989      .98   1.4    0.2   0.7   -1.1   3.4    $7,793,117
1990     1.07   3.7    0.5   0.1   -1.2   5.4   $15,590,434
1991     1.08   2.9   -0.4   1.1   -1.3   4.9   $13,285,083
1992      .97   4.3   -0.3  -0.3   -1.2   4.9   $13,290,830
1993     1.03   2.1   -0.1   0.4   -1.3   3.8    $9,311,751
1994     1.01   4.2    0.7  -0.4   -1.4   5.8   $17,312,693
1995      .96   1.4    0.4  -0.2   -1.4   3.1    $6,950,807
1996      .21  -0.3    0.1   0.2   -0.3   0.4      $645,653
1997      .56   0.3   -0.4  -0.2   -0.9   0.6      $885,771
TOTAL   14.33  29.0    2.2   3.3  -20.4  54.9  $144,040,021
aTOTAL  13.94  30.5    1.8   3.6  -19.8  55.8  $144,040,021


Let's see if this formats right, and then I'll discuss.
   436. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 21, 2007 at 05:29 AM (#2534953)
Ewwww that came out horribly. It doesn't seem to be able to handle multiple white spaces...I'll try again using different abbreviations so it lines up...this won't be pretty but hopefully it will be legible.


Puckett

Year SFrac BWAA BRWA FWAA Replc WARP Salary
1984 00.85 -1.7 -0.1 +2.6 -1.80 +2.5 $5,225,588
1985 01.09 -0.7 +0.1 +0.9 -2.20 +2.5 $5,099,991
1986 01.05 +3.1 +0.0 -0.6 -2.10 +4.6 $12,067,527
1987 00.97 +2.4 +0.0 -0.4 -1.80 +3.8 $9,178,331
1988 01.02 +4.5 +0.2 +0.6 -1.80 +6.9 $22,239,828
1989 01.01 +2.2 +0.1 +0.3 -1.70 +4.4 $11,280,845
1990 00.90 +1.2 +0.3 +0.0 -1.60 +3.1 $6,806,804
1991 00.95 +1.1 +0.5 +0.1 -1.70 +3.4 $7,885,674
1992 01.02 +3.1 +0.4 +0.6 -1.80 +6.0 $18,020,804
1993 00.99 +1.5 +0.2 -1.0 -1.90 +2.6 $5,506,959
1994 00.98 +2.3 +0.2 -0.6 -1.70 +3.5 $8,247,342
1995 00.97 +2.6 +0.2 -0.8 -1.70 +3.5 $8,374,412
TOTL 11.80 21.6 +2.1 +1.7 -21.8 46.8 $119,934,105


Butler

Year SFrac BWAA BRWA FWAA Replc WARP Salary
1981 00.32 +0.1 +0.3 +0.0 -0.20 +0.6 $918,850
1982 00.39 -1.5 +0.4 -0.3 -0.60 -0.9 $0
1983 00.90 +0.8 -0.4 +1.1 -0.70 +2.1 $4,149,260
1984 01.02 +0.5 +0.2 -0.2 -2.10 +2.7 $5,568,111
1985 00.97 +2.0 +0.1 +0.2 -2.00 +4.3 $11,012,926
1986 00.97 +0.8 +0.4 +0.1 -1.90 +3.2 $7,219,639
1987 00.89 +2.1 +0.2 +0.6 -1.70 +4.5 $11,830,230
1988 01.00 +4.2 +0.3 +0.4 -1.10 +6.1 $18,274,865
1989 00.98 +1.4 +0.2 +0.7 -1.10 +3.4 $7,793,117
1990 01.07 +3.7 +0.5 +0.1 -1.20 +5.4 $15,590,434
1991 01.08 +2.9 -0.4 +1.1 -1.30 +4.9 $13,285,083
1992 00.97 +4.3 -0.3 -0.3 -1.20 +4.9 $13,290,830
1993 01.03 +2.1 -0.1 +0.4 -1.30 +3.8 $9,311,751
1994 01.01 +4.2 +0.7 -0.4 -1.40 +5.8 $17,312,693
1995 00.96 +1.4 +0.4 -0.2 -1.40 +3.1 $6,950,807
1996 00.21 -0.3 +0.1 +0.2 -0.30 +0.4 $645,653
1997 00.56 +0.3 -0.4 -0.2 -0.90 +0.6 $885,771
TOTL 14.33 29.0 +2.2 +3.3 -20.4 54.9 $144,040,021
aTTL 13.94 30.5 +1.8 +3.6 -19.8 55.8 $144,040,021


Take two here....
   437. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 21, 2007 at 06:33 AM (#2534982)
Wow, that is really ugly but the columns do line up now. (The difference between Butler's salary on the list and the one here is that his 1994 is regressed on the list). So, here's what I see:

1. My raw batting numbers tell a very different story than OPS+ (although not than BP's more sophisticated EqA, which has Puckett at .284 career and Butler at .283). Based on Jim Sp's finding that one point of OPS+ is worth .087 BWAA on average, we would expect Puckett's career BWAA rate to be (124-110)*.087 = 1.218 better than Butler's. Instead, after moving the DH adjustment from the replacement column to the BWAA column, I have Butler with a career 2.18 BWAA per 162 games, and Puckett with 2.43 BWAA per 162 games, a gap of just 0.25 in favor of Puckett. What accounts for this disparity?

a. Double plays. Butler was an absolute whiz at avoiding them, hitting into an extremely impressive 65.7 fewer double plays over his career than a league-average player would have hit into in his opportunities. That's worth a beefy 3.7 wins. By contrast, Puckett hit into them at a slightly above average rate, with 15.84 more double plays than a league-average player would have hit into in his opportunities. This costs him 0.7 wins. This factor increases Butler's career BWAA/162 from 1.92 to 2.18, and decreases Puckett's from 2.49 to 2.43, thus accounting for (2.18-1.92+2.49-2.43)/(1.218-.25) = one-third of the gap between the per-game hitting advantage we would expect from Puckett's OPS+ and the one he actually compiles in BWAA/162.
b. OBP-heaviness. OPS+ *drastically* underrates the offense of extremely OBP-heavy guys like Butler (see McGraw, John). Plug anyone whose career OBP is higher than their career SLG (as in Butler's case) into a run estimator, and you'll find they come out far better than OPS+ would suggest. A simple example: let's compare Butler's 1987 to Puckett's. Butler had a 119 OPS+, Puckett a 132. If their OBP to SLG ratios were the same, you'd expect Puckett's BWAA/162 to be (132-119)*.087 = 1.13 higher than Butler's. But now let's do the full analysis. XR gives Butler 90.0 runs created with a 101 park factor (not counting his double play avoidance), so 89.1 park-adjusted, in 370 batting outs. The AL produced .19 runs per out and had 4,186 outs per team that year, so an Average Team Plus Butler would score (4186-370)*.19 + 89.1 = 814 runs. Against a league average of 794 allowed, that means the team would win 82.8 games, making Butler 1.8 wins above average. He had 89% of an average player's plate appearances per 162 games that year, giving him 1.8/.89 = 2.02 BWAA per 162 games. By contrast, XR gives Puckett 112.1 runs created with a 104 park factor (not counting his double play propensity), so 107.7 park-adjusted, in 423 batting outs. The Average Team Plus Puckett would score (4186-423)*.19 + 107.7 = 822 runs. Against a league average of 794 allowed, that means the team would win 83.3 games, making Puckett 2.3 wins above average. Puckett had 97% of an average player's plate appearances per 162 games that year, giving him 2.3/.97 = 2.37 BWAA per 162 games. So the gap between their offensive rates (ignoring double plays) is just 0.35 wins per 162 games, instead of the 1.13 wins per 162 that OPS+ would tell us. This is because OPS+ simply breaks down when confronted with extremely OBP-heavy players (and presumably with extremely SLG-heavy players as well in the opposite direction).
This is true over both players' entire careers, and accounts for the other 2/3 of the gap between their OPS+ and BWAA/162.
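For anyone who wants to retrace the arithmetic in point 1, here is a minimal Python sketch. It only reproduces the steps that are spelled out above: the expected BWAA gap from OPS+ (using the .087 conversion) and the "Average Team Plus Player" run total. The function names are made up for illustration, and the runs-to-wins conversion is left out because the post does not state which formula was used.

```python
OPS_PLUS_TO_BWAA = 0.087  # wins per point of OPS+, the conversion quoted in the post


def expected_bwaa_gap(ops_plus_a, ops_plus_b):
    """BWAA/162 gap that OPS+ alone would predict between two hitters."""
    return (ops_plus_a - ops_plus_b) * OPS_PLUS_TO_BWAA


def team_runs_with_player(player_runs, player_outs, league_runs_per_out, team_outs):
    """Runs scored by a league-average team that hands `player_outs` of its
    outs to the player and gets `player_runs` (park-adjusted) in return."""
    return (team_outs - player_outs) * league_runs_per_out + player_runs


# 1987 inputs as given in the post (AL: .19 runs per out, 4,186 outs per team).
butler_team = team_runs_with_player(89.1, 370, 0.19, 4186)
puckett_team = team_runs_with_player(107.7, 423, 0.19, 4186)

print(f"OPS+ predicted career gap: {expected_bwaa_gap(124, 110):.3f}")
print(f"OPS+ predicted 1987 gap:   {expected_bwaa_gap(132, 119):.3f}")
print(f"Average Team Plus Butler:  {butler_team:.1f} runs")
print(f"Average Team Plus Puckett: {puckett_team:.1f} runs")

# Share of the career gap attributed to double plays in point (a).
dp_share = (2.18 - 1.92 + 2.49 - 2.43) / (1.218 - 0.25)
print(f"Share explained by double plays: {dp_share:.2f}")
```

The run totals come out at roughly 814 and 822.7, in line with the figures quoted above before any win conversion is applied.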

2. The defensive statistics available to us simply do not show Puckett as a superior defensive center fielder to Butler (and remember, I have Chris Dial's Zone Rating data for 1987 to the present, so a good chunk of these defensive stats are PBP-based). Much like Dwight Evans, Puckett started out as a great fielder, but regressed to average just as soon as he started hitting. Butler was slightly above average throughout his prime. On the whole, fielding quality only accounts for 1.6 wins' worth of difference between the two.

3. Puckett's best two years are better than Butler's best two, but Butler beats him in years three, four, and five, so peak is roughly comparable, slight edge to Puckett I suppose. And Butler simply has a lot more career value than Puckett did. On the whole, I have the two players exactly equal by rate at 4 WARP per 162 games for their career, and since Butler's career was 18% longer with the same peak, his salary is 18% higher. Seems logical to me.

Brent, let me know if this explains it, and if you're convinced. If not, please do clarify what your doubts are and I'll try to address them.
   438. Dandy Little Glove Man Posted: September 21, 2007 at 02:39 PM (#2535174)
OPS+ *drastically* underrates the offense of extremely OBP-heavy guys like Butler (see McGraw, John). Plug anyone whose career OBP is higher than their career SLG (as in Butler's case) into a run estimator, and you'll find they come out far better than OPS+ would suggest.

To what extent is this actually true? Is it always the case that an OBP-heavy guy will be drastically underrated by OPS+? I tried to run a study to get a better impression of the differences in value between an OBP-heavy player and a SLG-heavy player at each spot in the batting order. Just about everyone here agrees that the weighting given to OBP in OPS+ is too low overall, but I don’t think enough consideration is given to the fact that lineup position has a tremendous impact on the extent to which OPS+ misrepresents the relative importance of OBP and SLG.

To get a better handle on this relationship, I looked at Tom Ruane’s situational hitting charts representing the aggregate of the 1982, 1983, and 1987 seasons (http://www.baseballthinkfactory.org/btf/scholars/ruane/articles/situational_hitting.htm). At every spot in the batting order over this time period, a hitter comes to bat more often with the bases empty than with runners on. Overall, 55.6% of PA fall into the 3 base-out states in which the bases are empty, and the other 21 base-out states account for the remaining approximately 4/9 of game situations. Here are the percentages of plate appearances with none on for each lineup spot:

#Out On ALL Pos1 Pos2 Pos3 Pos4 Pos5 Pos6 Pos7 Pos8 Pos9
0out --- 24.3 40.6 18.0 17.3 25.5 24.3 22.0 23.1 23.7 23.1
1out --- 17.4 14.0 28.5 13.5 12.9 18.4 17.9 16.7 17.2 17.5
2out --- 13.8 11.6 10.8 21.9 11.8 11.1 15.0 14.7 13.8 14.1

The leadoff hitter bats with none on and none out a whopping 40.6% of the time. In contrast, with the bases empty, a #3 hitter has both the fewest PA with none out and the most PA with 2 out. It is the only spot in which 2-out situations exceed 0-out situations with the bases empty.
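As a quick check on the table, the bases-empty share for each lineup spot is just the column sum, and the runners-on share is whatever is left. A small Python sketch, with the numbers copied from the table above (the dictionary and function names are only illustrative):

```python
# Percent of PA with the bases empty at 0, 1 and 2 outs, by lineup spot.
BASES_EMPTY_PCT = {
    "ALL":  (24.3, 17.4, 13.8),
    "Pos1": (40.6, 14.0, 11.6),
    "Pos2": (18.0, 28.5, 10.8),
    "Pos3": (17.3, 13.5, 21.9),
    "Pos4": (25.5, 12.9, 11.8),
    "Pos5": (24.3, 18.4, 11.1),
    "Pos6": (22.0, 17.9, 15.0),
    "Pos7": (23.1, 16.7, 14.7),
    "Pos8": (23.7, 17.2, 13.8),
    "Pos9": (23.1, 17.5, 14.1),
}


def bases_empty_pct(spot):
    """Total percent of PA that come with nobody on base."""
    return round(sum(BASES_EMPTY_PCT[spot]), 1)


def runners_on_pct(spot):
    """Percent of PA with at least one runner on base."""
    return round(100.0 - sum(BASES_EMPTY_PCT[spot]), 1)


for spot in BASES_EMPTY_PCT:
    print(f"{spot}: empty {bases_empty_pct(spot)}%, runners on {runners_on_pct(spot)}%")

# Spots where 2-out, bases-empty PA outnumber 0-out, bases-empty PA.
two_out_heavy = [s for s, (o0, _, o2) in BASES_EMPTY_PCT.items() if o2 > o0]
print("2-out > 0-out:", two_out_heavy)
```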

I tried to analyze the ramifications of these discrepancies by combining this information with run expectancy tables, which can be found on the BP website. I created batting lines for 2 types of players, each of which would correspond to a 135 OPS+ in a fairly typical 1980s offensive environment (.333 league OBP & .400 league SLG). Player 1 is a power hitter with an average OBP; Player 2 is a high OBP guy on the basis of higher single and walk totals.

Player 1: 100 PA, 6 HR, 7 2B, 12 1B, 8 BB = .330 OBP, .543 SLG, 135(134.97) OPS+
Player 2: 100 PA, 2 HR, 7 2B, 18 1B, 13 BB= .400 OBP, .460 SLG, 135(135.06) OPS+
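The two OPS+ figures can be reproduced with the usual formula, OPS+ = 100 * (OBP/lgOBP + SLG/lgSLG - 1). The Python sketch below assumes no park adjustment and no triples, HBP or sacrifice flies, since the lines above do not include any; the function name and argument names are illustrative.

```python
def ops_plus(pa, hr, doubles, singles, bb, lg_obp, lg_slg):
    """OPS+ from a simplified batting line (no 3B, HBP, SF, or park factor)."""
    hits = hr + doubles + singles
    at_bats = pa - bb
    obp = (hits + bb) / pa
    slg = (4 * hr + 2 * doubles + singles) / at_bats
    return 100 * (obp / lg_obp + slg / lg_slg - 1)


# League context used in the post: .333 OBP, .400 SLG.
player1 = ops_plus(100, hr=6, doubles=7, singles=12, bb=8, lg_obp=0.333, lg_slg=0.400)
player2 = ops_plus(100, hr=2, doubles=7, singles=18, bb=13, lg_obp=0.333, lg_slg=0.400)

print(round(player1, 2))  # 134.97
print(round(player2, 2))  # 135.06
```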

Here is the pertinent run expectancy table for 1983, which is fairly representative of the 1980s as a whole:

MenOn 0out 1out 2out
_ _ _ 0.48 0.25 0.098
x _ _ 0.87 0.50 0.21
_ x _ 1.11 0.65 0.33


Using the above run expectancy table, here are the bases-empty run values per 100 PA in each of the 3 out states for Player 1 and Player 2:

#Out P1 P2
0out 2.8 4.7
1out 3.6 3.4
2out 3.3 1.2

As should be expected, the OBP-heavy player contributes significantly more runs with no outs, when reaching first base is worth nearly .4 runs, and the SLG-heavy player adds significantly more runs with 2 outs, when the benefit from a home run relative to a walk or single is highest. The overall differences in value largely result from the fact that a home run is roughly 2.5 times as valuable as a walk or single to lead off the inning, but approximately 9 times as valuable with 2 outs and the bases empty. The additional substantial factor in the difference between the players is the negative value of an out, as an out is far more costly in terms of run expectancy at the top of the inning.
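The run values in the table can be rebuilt from the run expectancy figures as the change in run expectancy for each event, plus one run for a home run. A Python sketch of that calculation follows; it assumes every non-HR hit or walk leaves the batter on the base he earned and that an out simply advances the out count, which is how the numbers above appear to have been derived.

```python
# 1983 run expectancy by base state, indexed by outs (0, 1, 2), from the post.
RUN_EXP = {
    "empty":  (0.48, 0.25, 0.098),
    "first":  (0.87, 0.50, 0.21),
    "second": (1.11, 0.65, 0.33),
}


def bases_empty_run_value(hr, doubles, singles_walks, outs_made, outs):
    """Run value per 100 PA of a batting line, all PA taken with the bases
    empty and `outs` already recorded."""
    start = RUN_EXP["empty"][outs]
    # The third out ends the inning, so run expectancy drops to zero.
    after_out = RUN_EXP["empty"][outs + 1] if outs < 2 else 0.0
    return (hr * 1.0  # run scores, base-out state is unchanged
            + doubles * (RUN_EXP["second"][outs] - start)
            + singles_walks * (RUN_EXP["first"][outs] - start)
            + outs_made * (after_out - start))


# Player 1: 6 HR, 7 2B, 12 1B + 8 BB, 67 outs. Player 2: 2 HR, 7 2B, 18 1B + 13 BB, 60 outs.
p1 = [round(bases_empty_run_value(6, 7, 20, 67, outs), 1) for outs in range(3)]
p2 = [round(bases_empty_run_value(2, 7, 31, 60, outs), 1) for outs in range(3)]

print(p1)  # [2.8, 3.6, 3.3]
print(p2)  # [4.7, 3.4, 1.2]

# Value of a HR relative to a single or walk with the bases empty.
hr_vs_single = [round(1.0 / (RUN_EXP["first"][o] - RUN_EXP["empty"][o]), 1) for o in (0, 2)]
print(hr_vs_single)  # [2.6, 8.9]
```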

Incorporating the table of percentages in each out-state by lineup position, we can establish the percentage difference in runs between Player 1 and Player 2 at each spot in the order:

#Out All Pos1 Pos2 Pos3 Pos4 Pos5 Pos6 Pos7 Pos8 Pos9
0out 24.3 40.6 18.0 17.3 25.5 24.3 22.0 23.1 23.7 23.1
1out 17.4 14.0 28.5 13.5 12.9 18.4 17.9 16.7 17.2 17.5
2out 13.8 11.6 10.8 21.9 11.8 11.1 15.0 14.7 13.8 14.1

Pl1 1.76 2.03 1.89 1.69 1.57 1.71 1.76 1.74 1.74 1.74
Pl2 1.91 2.53 1.95 1.54 1.78 1.91 1.83 1.84 1.87 1.86

Diff 0.08 0.25 0.03 -0.09 0.14 0.11 0.04 0.06 0.08 0.06

Overall, the OBP-heavy player is worth 8% more runs than the SLG-heavy player with the bases empty, and in the leadoff spot, the high-OBP man adds 25% more runs. In the third spot, however, the high-SLG hitter contributes 9% more runs. The #3 position is the only spot in which the SLG-heavy hitter is preferable, but it is interesting to note that the #2 spot shows very little difference between them. A high OBP seems more important for the fourth and fifth hitters, who have more plate appearances with no outs.
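The weighting step can be sketched in Python as a dot product of each lineup spot's bases-empty percentages with each player's run values per out state. The inputs below are the rounded one-decimal run values from the earlier table, so the results can differ from the posted Pl1/Pl2/Diff rows by about 0.01; names are illustrative.

```python
# Bases-empty run values per 100 PA at 0, 1 and 2 outs (rounded, from the table).
RUN_VALUE = {"P1": (2.8, 3.6, 3.3), "P2": (4.7, 3.4, 1.2)}

# Percent of PA with the bases empty at 0, 1 and 2 outs, by lineup spot.
EMPTY_PCT = {
    "ALL":  (24.3, 17.4, 13.8),
    "Pos1": (40.6, 14.0, 11.6),
    "Pos2": (18.0, 28.5, 10.8),
    "Pos3": (17.3, 13.5, 21.9),
    "Pos4": (25.5, 12.9, 11.8),
    "Pos5": (24.3, 18.4, 11.1),
    "Pos6": (22.0, 17.9, 15.0),
    "Pos7": (23.1, 16.7, 14.7),
    "Pos8": (23.7, 17.2, 13.8),
    "Pos9": (23.1, 17.5, 14.1),
}


def weighted_runs(player, spot):
    """Bases-empty runs per 100 total PA for a player batting in `spot`."""
    return sum(pct / 100 * value
               for pct, value in zip(EMPTY_PCT[spot], RUN_VALUE[player]))


def obp_edge(spot):
    """Fractional run advantage of the OBP-heavy Player 2 over Player 1."""
    return weighted_runs("P2", spot) / weighted_runs("P1", spot) - 1


for spot in EMPTY_PCT:
    print(f"{spot}: Pl1 {weighted_runs('P1', spot):.2f}  "
          f"Pl2 {weighted_runs('P2', spot):.2f}  Diff {obp_edge(spot):+.2f}")
```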

Intuitively, we should realize that as run environments rise, the high-OBP player becomes more valuable. Here is the comparable chart for the higher-offense 1987 season, by % difference in runs between the OBP-heavy player and the SLG-heavy player:

All Pos1 Pos2 Pos3 Pos4 Pos5 Pos6 Pos7 Pos8 Pos9
0.17 0.39 0.14 -0.06 0.24 0.23 0.12 0.14 0.16 0.15


Notice that the SLG-heavy player’s advantage in the #3 spot is now smaller than his disadvantage at every other lineup position. As offense levels have climbed higher in subsequent years, this trend has continued. Below is the 2006 chart:

All Pos1 Pos2 Pos3 Pos4 Pos5 Pos6 Pos7 Pos8 Pos9
0.24 0.45 0.21 -0.01 0.30 0.30 0.18 0.20 0.23 0.22

It must be noted that the players would no longer be equal by OPS+ as league OBP and SLG averages have increased. Given current conditions, roughly a .340 OBP and .425 SLG would be the standard, so the SLG-heavy player would be down to a 125(124.9) OPS+ and the OBP-heavy player would fall to a 126(125.8) OPS+. The SLG-heavy player has lost 1 point of OPS+ relative to the high-OBP guy, but his relative value has fallen to a far greater extent. The OBP-heavy player accounted for just 8% more runs with the bases empty in 1983, but he was worth 24% more in 2006.
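The re-based OPS+ figures follow from the same two batting lines with only the league averages changed. A short Python check, again assuming the plain OPS+ formula with no park factor (the names are illustrative):

```python
def ops_plus_from_rates(obp, slg, lg_obp, lg_slg):
    """OPS+ = 100 * (OBP/lgOBP + SLG/lgSLG - 1), with no park adjustment."""
    return 100 * (obp / lg_obp + slg / lg_slg - 1)


# Exact rates from the two hypothetical lines: 33 times on base in 100 PA with
# 50 TB in 92 AB, and 40 times on base in 100 PA with 40 TB in 87 AB.
SLG_HEAVY = (33 / 100, 50 / 92)
OBP_HEAVY = (40 / 100, 40 / 87)

for label, (obp, slg) in (("SLG-heavy", SLG_HEAVY), ("OBP-heavy", OBP_HEAVY)):
    then = ops_plus_from_rates(obp, slg, 0.333, 0.400)  # 1980s baseline
    now = ops_plus_from_rates(obp, slg, 0.340, 0.425)   # mid-2000s baseline
    print(f"{label}: {then:.1f} -> {now:.1f}")
```

This prints 135.0 -> 124.9 for the SLG-heavy line and 135.1 -> 125.8 for the OBP-heavy line.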

My point here is mainly to show how much lineup position and offensive levels can alter the value of a player with certain batting splits. A high-OBP guy like Tim Raines or Brett Butler who bats leadoff is greatly underrated by OPS+. A high-OBP guy like Ken Singleton who primarily hits in the third spot may in fact be overrated by it. OPS+ vastly overrates any SLG-heavy player in 2000, when the run environment is ridiculously high, just as it substantially underrates any SLG-heavy player in 1968 who doesn’t hit leadoff (SLG player is worth 9% more runs overall with the bases empty in 1968). In terms of the other 7 spots in the batting order, it seems that the more OBP-heavy players are slightly better than their OPS+ figures indicate in all but the most depressed run environments, though that’s basically what we assumed already.

Stats like EQA, VORP, and the various run estimators also don’t take lineup position into account. This leads me to believe that they substantially underrate the value of OBP-heavy leadoff hitters and SLG-heavy #3 hitters. Since they are designed to more correctly gauge the overall difference between OBP and SLG, players in other lineup positions should be evaluated more accurately.

It may sound like I’m stating all of this as fact, but that’s not my intention. I recognize that there are extreme limitations to this study and that each of these apparent conclusions is more like a guess. I only looked at the 5/9 of plate appearances that occur with the bases empty. Perhaps the plate appearances with runners on have a huge impact on the numbers, though for now I am acting as if they do not. From a glance I’d suppose that those 21 base-out states are fairly evenly balanced throughout the order and that they serve to dampen the extreme run disparities but not nearly reverse them. I might be completely wrong in this regard. Also, the run expectancy tables don’t account for lineup construction. A leadoff single by the #7 hitter likely does not have as high a run value as a leadoff single by the #1 hitter, simply due to the quality of hitters that follow. This is a reason why OBP is probably slightly more important for the #2 hitter than the above analysis indicates. If most managers put their best hitters third and fourth, a #2 hitter who reaches base is clearly more likely to score a run than a player lower in the order. Hopefully the differences in run expectancy as a result of lineup construction are generally far smaller than the differences resulting from the base-out states, but once again I can’t say this with certainty.
   439. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 21, 2007 at 03:14 PM (#2535230)
Dandy Little Glove Man--very nice work! Wow, there's the confirmation of the research I'd seen that #3 is really an ISO slot and that you shouldn't put one of your top hitters there, conventional wisdom be damned. I would note that my WARP most definitely do capture the changing relative value of OBP in different run environments--again, that's a big part of why the uberstats underrate John McGraw--although they certainly don't capture the effect of lineup slot on value.

For the Hall of Merit, there's an additional question about whether a player should be credited or penalized for a manager's decision on where to hit him (this is another ability vs. value thing, I suppose). It somehow does "feel" fair to me to give extra credit to OBP-heavy leadoff men, but not to penalize Singleton for being improperly utilized, although I recognize that's logically inconsistent.
   440. Dandy Little Glove Man Posted: September 21, 2007 at 04:44 PM (#2535359)
If you think it's proper to give extra credit to OBP-heavy leadoff men for the added value of their performance, shouldn't you also give extra credit to SLG-heavy #3 hitters for theirs? Maybe this is more of a question for the HOM discussion thread, where several voters have indicated a strong preference against hitters of this type. Isn't the fundamental idea behind most of the voters' ballots to rank players according to value? If you think that hitters recognize the relative importance of reaching base and reward them for doing so, then isn't it necessary to grant that hitters in situations more suited to sacrifice OBP for SLG can be rewarded for that behavior? I fully understand not downgrading an OBP-heavy player for the misutilization of his performance, but I feel that SLG-heavy players in the third spot certainly shouldn't be penalized relative to the actual value of their performance simply because they would be less valuable elsewhere in the lineup.
   441. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 21, 2007 at 04:48 PM (#2535367)
I don't even know where a lot of candidates happened to hit in the lineup!
   442. OCF Posted: September 21, 2007 at 05:04 PM (#2535404)
Double plays. Butler was an absolute whiz at avoiding them, hitting into an extremely impressive 65.7 fewer double plays over his career than a league-average player would have hit into in his opportunities.

So much of the theory of hitting mechanics is devoted to hitting the ball hard. All the stuff that Carlos Gomez and his friend Jeff write about hitting prospects is devoted to whether they think the player will hit the ball hard. The fundamental limitation of Brett Butler's value is his failure to hit the ball hard, which shows up in part in his low XBH numbers. But for all that, there is one benefit to hitting the ball softly - and here it is. There are several causes of Butler's DP avoidance, of course. He batted left-handed, giving him an advantage going down the line, and with his non-vicious swing, he got a quick getaway from home plate. The third baseman typically played him in on the grass, guarding against the bunt. That would mean that balls headed for the hole would get past the 3B, when with normal positioning he might have cut the ball off and initiated a DP. (The SS might still get to those balls, but deeper and slower - no DP.) Some part of it is that he did bunt frequently. (Even if he didn't reach, those would likely - but not inevitably - be scored as sacrifices if there were runners on base.) But it's still true that one of the biggest reasons Butler avoided the DP is that he hit the ball softly.

Of course, you do want players who hit the ball hard. You'd much rather have Honus Wagner than Roy Thomas. But Roy Thomas never could have been a Wagner or Cobb - he wasn't strong enough. Brett Butler couldn't be a George Brett - those weren't his physical skills. He did what he could do, and what he did do was much more valuable than all those Joe Carter types who hit the ball harder.
   443. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 21, 2007 at 05:18 PM (#2535434)
I would imagine part of Butler's DP avoidance as well was just his speed, no?
   444. rawagman Posted: September 21, 2007 at 05:27 PM (#2535451)
Much of DP avoidance has nothing whatsoever to do with the hitter. Some examples:
- Was the runner on first in motion?
- Was he exceptionally good at taking out the 2B?
- Where were the fielders positioned?
Also - should a hitter be rewarded for striking out with a man on first? Obviously, the result is not as catastrophic for his team as a double play would have been, but at least on the ground ball, there is the chance that something good happens, while with the strike out, the chance is quite minimal (the odd passed ball/wild pitch/ stolen base attempt)
   445. TomH Posted: September 21, 2007 at 05:33 PM (#2535467)
And leadoff men hit into many fewer DPs! Of course, they also have the disadvantage of NOT batting with men on, especially a fast man on first, which opens the hole and raises the batting avg of the man at the plate.
   446. TomH Posted: September 21, 2007 at 05:34 PM (#2535470)
LHB also tend to ground into fewer DPs.
   447. rawagman Posted: September 21, 2007 at 05:40 PM (#2535475)
In terms of looking for a statistic that would help someone understand the overall effectiveness of a given baseball player, the GIDP is about as useful as the RBI. But less so, as we would tend to be dealing with smaller sample sizes.
   448. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 21, 2007 at 05:44 PM (#2535485)
Rawagman, I guess you'd expect that to even out over a 15 year career...no? And yes, I think a hitter should most definitely be rewarded for K'ing with a man on first as opposed to generating a fielded out that could be a potential double play...that's a meaningful chunk of value for guys like Jim Thome.

Here are the top and bottom 20 for DP avoidance since 1933, measured in wins above average:

Best

1. Joe Morgan, 7.5
2. Mickey Mantle, 6.4
3. Eddie Mathews, 6.1
4. Darrell Evans, 5.0
5. Willie Davis, 4.7
6. Barry Bonds, 4.6
7. Kirk Gibson, 4.6
8. Richie Ashburn, 4.5
9. Mickey Rivers, 4.4
10. Ken Griffey, Sr., 4.4
11. Reggie Jackson, 4.4
12. Vada Pinson, 4.3
13. Lou Whitaker, 4.2
14. Darryl Strawberry, 4.2
15. Steve Finley, 4.1
16. Joe Carter, 4.0
17. Rick Monday, 4.0
18. Mel Ott, 4.0 (post 1933 only)
19. Brett Butler, 3.9
20. Dagoberto Campaneris, 3.7

Worst

1. Ernie Lombardi, -6.0 (post 1933 only)
2. Joe Torre, -5.8
3. George Scott, -5.4
4. Julio Franco, -5.4
5. Tony Peña, -4.6
6. Dick Groat, -4.6
7. David Concepción, -4.6 (d'oh!)
8. Jerry Adair, -4.3
9. Danny Cater, -4.0
10. Ted Simmons, -3.9
11. Bob Bailey, -3.9
12. Tommy Davis, -3.8
13. Cal Ripken, -3.8
14. Joe Adcock, -3.7
15. Rico Carty, -3.7
16. Jim Rice, -3.7
17. Vinny Castilla, -3.6
18. Todd Zeile, -3.4
19. Ken Singleton, -3.4
20. Lou Piniella, -3.4
   449. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 21, 2007 at 05:47 PM (#2535490)
TomH, my numbers for Butler most definitely take account of how many times he batted with a man on first and less than two outs. Leading off certainly reduces his overall DP, but has absolutely no effect on his NetDP, which is the number I use.

Rawagman, GDP propensity/avoidance is an extremely small aspect of the game. It's comparable in magnitude to non-SB baserunning, for example. That doesn't mean it should be ignored--it definitely can add up to meaningful totals for the most extreme players (such as those listed above).
   450. sunnyday2 Posted: September 21, 2007 at 06:54 PM (#2535661)
So, lemme get this straight. Mickey Mantle won 6+ games simply by striking out with men on base? And Jim Rice probably would be a HoMer today if he had just struck out a lot more? I may understand this, or maybe I don't, but it sounds like the sort of insight that gives sabermetrics a bad name. Help me out here.
   451. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 21, 2007 at 07:11 PM (#2535708)
I don't know exactly how Mickey avoided so many DP's. Part of it is that he simply had a high OBP, so he didn't make as many outs with men on first and less than 2 outs. And part of it, yes, is that when he did make an out with men on first and less than 2 outs, he often did so via the K, thus keeping said runner on first and not creating an extra out. Mickey Mantle won X more games (less than 6, clearly) through striking out with men on base *than a league average player would have, because that player would have hit into more double plays instead of striking out.* Conversely, Jim Rice won X fewer games by not striking out with men on base *than a league average player would have, because that player would have hit into fewer double plays through striking out more.* The best thing to do is to get on base, of course. But *if* you are going to make an out with a runner on first and less than 2 outs, *then* it is preferable to have a strikeout rather than a double play. Is that clear?
   452. TomH Posted: September 21, 2007 at 07:46 PM (#2535834)
DanR, thanks for qualifying your use of Net GIDP.

We measure GIDP - because they are there. But we DON'T measure the possible (small) benefit of non-KO outs. Previous studies have, I believe, shown non-strikeout, non-DP outs to be worth .01 runs more than KOs.
   453. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 21, 2007 at 07:51 PM (#2535856)
My WARP system most definitely *does* account for that factor, TomH. The absolute values float by league-season but K always subtract 9% more runs than non-K outs in my run estimator (typically it's about -0.1 runs for a fielded out and -.109 for a K).
   454. DavidFoss Posted: September 21, 2007 at 07:54 PM (#2535866)
So, lemme get this straight. Mickey Mantle won 6+ games simply by striking out with men on base?

I think the list says "DP Avoidance". Any such list should really clarify what "zero" means. Dan, what does it mean? How much of it (if any) is already included in a general statement of how good a hitter is (like OBP or OPS+)? Joe Morgan is at the top of the list and he didn't strike out that much, so there are other ways to make the list as well (it helps to be a lefty with a runner on first, etc.)

As for the sort of reasoning that says strikeouts are "good", then it might help to look at plate-appearance outcome coefficients for a linear formula. Here they are for Extrapolated Runs:

AB - H - K : -0.090
K : -0.098
SF : 0.37
SH : 0.04
GIDP : -0.37 (in *addition* to AB-H-K)

Here, the implication is that the order of outs from best to worst is:

SF > SH > (AB - H - K - GIDP) > K > GIDP

So, indeed a K is slightly worse than a normal batted non-GIDP out, but it's a *ton* better than a GIDP.
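The ordering can be checked mechanically from the coefficients listed above. In this Python sketch the only derived number is the total cost of a GIDP, which stacks the -0.37 on top of the ordinary batted-out value as the list says; the dictionary keys are just labels.

```python
FIELDED_OUT = -0.090  # AB - H - K
STRIKEOUT = -0.098
GIDP_EXTRA = -0.37    # charged in addition to the fielded-out value

# Total run value of each way of making an out under these coefficients.
out_values = {
    "SF": 0.37,
    "SH": 0.04,
    "fielded out": FIELDED_OUT,
    "K": STRIKEOUT,
    "GIDP": FIELDED_OUT + GIDP_EXTRA,
}

ranked = sorted(out_values, key=out_values.get, reverse=True)
print(" > ".join(ranked))  # SF > SH > fielded out > K > GIDP

# How much a strikeout saves relative to a double play, in runs.
print(round(out_values["K"] - out_values["GIDP"], 3))  # 0.362
```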
   455. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 21, 2007 at 08:17 PM (#2535927)
Zero is hitting into GDP's at the league average rate (typically around 12.5% of opportunities). If you hit into them in 10% of your opportunities, you have negative NetDP, which means you have positive DP avoidance runs and wins (Morgan, Mantle etc.). If you hit into them in 15% of your opportunities, you have positive NetDP which means you have negative DP avoidance runs and wins (Lombardi, Torre etc.). Having a high OBP most definitely helps you to accumulate negative NetDP/positive DP avoidance runs/wins, as does having a high percentage of your outs come via the strikeout, but the biggest factors are probably speed, handedness, and GB/FB ratio.
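Read literally, that definition makes NetDP the player's actual GDP minus what a league-average hitter would have produced in the same opportunities. A Python sketch under that reading, using the 12.5% league rate and the 10% and 15% examples from the paragraph above (the 100-opportunity sample size is a made-up round number):

```python
def net_dp(gidp, opportunities, league_dp_rate=0.125):
    """Double plays above (+) or below (-) league average in the player's
    own opportunities (runner on first, fewer than two outs)."""
    return gidp - league_dp_rate * opportunities


# Hypothetical hitters with 100 DP opportunities each.
avoider = net_dp(gidp=10, opportunities=100)  # 10% rate
average = net_dp(gidp=12.5, opportunities=100)
prone = net_dp(gidp=15, opportunities=100)    # 15% rate

print(avoider, average, prone)  # -2.5 0.0 2.5
```

Because the baseline scales with opportunities, a leadoff hitter with few chances can only pile up a small NetDP in either direction, while his rate is unaffected.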

I use a slightly tweaked version of Extrapolated Runs for my WARP. I do not count sac hits, since they count against a player (adding .04 runs to his XR, but taking away an out from his teammates which is worth anywhere from .13 to .29 runs depending on the year) and are usually called from the bench. I do not count absolute double plays with the -.37 coefficient either. I let the AB-H absorb those events, and set the value of an AB-H for each league-season so that Extrapolated Runs equals actual runs for the entire league (holding the value of a K at 9% worse than the value of a fielded out). Then for each individual player-season, I subtract his NetDP from his XR (at 0.37 runs per NetDP, so -10 NetDP will increase XR by 3.7 runs) and from his outs created (so -10 NetDP will reduce outs by 10, freeing up 10 outs for the player's teammates, which again are valued at anywhere from .13 to .29 runs depending on the season).
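Here is a rough Python sketch of the two adjustments described in that paragraph: solving for the league-season out value with the K held 9% worse than a fielded out, and then applying NetDP to a player's runs and outs. The league totals in the example are invented purely to exercise the function, and the function names are not from the actual WARP spreadsheet.

```python
K_PENALTY = 1.09        # a K costs 9% more than a fielded out
RUNS_PER_NET_DP = 0.37  # XR weight applied to each NetDP


def calibrate_out_value(league_runs, positive_event_runs, fielded_outs, strikeouts):
    """Value v of a fielded out such that
    positive_event_runs + v * (fielded_outs + 1.09 * strikeouts) == league_runs."""
    return (league_runs - positive_event_runs) / (fielded_outs + K_PENALTY * strikeouts)


def apply_net_dp(xr, outs, net_dp):
    """Negative NetDP (DPs avoided) adds runs and removes outs; positive
    NetDP does the reverse."""
    return xr - RUNS_PER_NET_DP * net_dp, outs + net_dp


# Hypothetical league-season totals, for illustration only.
v = calibrate_out_value(league_runs=750.0, positive_event_runs=1150.0,
                        fielded_outs=3000, strikeouts=1000)
print(f"fielded out: {v:.4f}, K: {K_PENALTY * v:.4f}")

# The example from the post: -10 NetDP adds 3.7 runs and frees up 10 outs.
new_xr, new_outs = apply_net_dp(xr=100.0, outs=400, net_dp=-10)
print(round(new_xr, 1), new_outs)  # 103.7 390
```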
   456. Jim Sp Posted: September 21, 2007 at 08:37 PM (#2535969)
Don't forget that Mickey Mantle was really really really fast. He beat the throw to first. At least when he was young.
   457. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 21, 2007 at 08:45 PM (#2535999)
Sure, and no doubt his speed helped in his youth. But he was avoiding double plays at an above-average rate even in 1967 and 68.
   458. OCF Posted: September 21, 2007 at 09:54 PM (#2536104)
Mantle was four things: (1) a switch hitter, hence batting left the majority of the time, (2) an extreme fly ball hitter - he hit nearly twice as many HR as 2B over his career, which is surely a sign of an extreme fly ball hitter, (3) very fast, and (4) somewhat strikeout prone. All of those factors matter - this might be the priority order. Morgan was left-handed and very fast. In fact, let's look at the list from #448:

Morgan: L, fast
Mantle: B, fast, flyball
Mathews: L, fast, flyball
Evans: L, more HR than 2B, but not as extreme as Mantle and Mathews
Davis: L, fast
Bonds: L, fast, flyball
Gibson: L, fast
Ashburn: L, fast, slap hitter
Rivers: L, fast
Griffey: L
Jackson: L, flyball/K
Pinson: L, fast
Whitaker: L
Strawberry: L, flyball/K
Finley: L, fast
Carter: R (!), flyball/K
Monday: L, fast
Ott: L, flyball
Butler: L, fast, extreme slap hitter
Campaneris: R, fast

- so they were all lefties except the switch-hitting Mantle and the right-handed Carter and Campaneris. Carter may be the most surprising name on the list. Now, the flip side list, with "LD" standing for someone with the reputation of a "line drive" hitter, one who always hit the ball hard but not that often elevated.

Lombardi: R, LD, very slow (catcher)
Torre: R, LD, slow (catcher)
Scott: R, LD?, slow
Franco: R, LD
Pena: R (catcher)
Groat: R
Concepcion: R
Adair: R
Cater: R
Simmons: B, LD (catcher)
Adcock: R, slow?
Carty: R, LD
Rice: R
Castilla: R
Zeile: R, part-catcher
Singleton: B
Piniella: R

(And I could probably have hung the "LD" designation on a lot more of them.) So this time, they're all right handed except for Simmons and Singleton.

Handedness looks like an awfully important part of the mix.
   459. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 21, 2007 at 10:23 PM (#2536120)
Don't forget Carter's speed.
   460. OCF Posted: September 21, 2007 at 10:45 PM (#2536135)
Mathews wasn't fast, I don't think - that's a typo in #458. Being a lefty flyball/K hitter was enough. And "fast" (as in, perhaps, Carter) isn't always synonymous with "Willie Davis."

With Lombardi, the issue for the DP would be whether the infielder could get to 2B in time to get the lead runner; if that, then they'd have all the time in the world for the relay to first.
   461. TomH Posted: September 21, 2007 at 11:26 PM (#2536226)
DanR, thanks again for clarifying your use of specific KO values. I'm sure you've said it at least twice before, but over time we all mis-remember :)

re: Butler vs Kirby, OPS+ and OBP-heaviness, something I've written about before and will try to say concisely here:

remember that as a leadoff man, Butler batted more often, but Kirby batted with more men on (more important per at-bat).

Butler gets credit for his extra PAs by most systems that use PAs as a metric of playing time. But he does not LOSE credit, relative to Puckett, as he should.

Many have posited that the "right" mix of OBP and SLG is something like 1.7*OBP+SLG. Well, for leadoff hitters, it's closer to 2.2*OBP+SLG. And not because the OBP is worth much more! - rather, slugging is worth less when there are few men on, obviously. Leadoff hitters avg batting with about .45 runners on base per appearance; for cleanup men, it's about .75. Most other lineup spots are .65 to .70.

If OWP, RC, EqA, whatever you use show the Butlers or Raineses of the world to be even with Pucketts and Clementes, I'll go with the middle-order guys. Assuming the OBP men really DID bat #1 much of their carrers. The effect for #2 hitters is smaller.
   462. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 21, 2007 at 11:41 PM (#2536280)
But TomH, OBP in the leadoff spot *is* worth more than it is in any other lineup position....
   463. Paul Wendt Posted: September 22, 2007 at 01:31 AM (#2536768)
Having a high OBP most definitely helps you to accumulate negative NetDP/positive DP avoidance runs/wins, as does having a high percentage of your outs come via the strikeout, but the biggest factors are probably speed, handedness, and GB/FB ratio.

and batting first, eh?
   464. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 22, 2007 at 01:48 AM (#2536824)
No, Paul, batting first actually makes it harder to accumulate very high negative NetDP/positive DP avoidance runs/wins. I use NetDP--double plays created or avoided above the league average in the player's opportunities. So batting position (and, therefore, DP opportunities) has absolutely no effect on the *rate* at which a player accumulates NetDP. It does, however, mean he has fewer DP opportunities, thus reducing the overall magnitude of his NetDP score (either positive or negative). Batting first makes it harder to have either very high or very low NetDP, whereas batting behind a high-OBP guy (esp. with low XBH) makes it easier to have very high or very low NetDP.
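
A minimal sketch of the NetDP idea described above, assuming it is simply actual double plays minus the league-average rate applied to the player's own opportunities. The function name, the league rate, and both sample hitters are hypothetical illustrations, not values from the WARP data.

```python
def net_dp(gdp: int, opportunities: int, league_rate: float) -> float:
    """Double plays above (+) or below (-) league average in the player's opportunities."""
    return gdp - league_rate * opportunities

LEAGUE_RATE = 0.12  # hypothetical league DP rate per opportunity

# Two hypothetical hitters with the same DP rate (8% against a 12% league),
# but the leadoff man sees half as many DP opportunities.
leadoff = net_dp(gdp=6, opportunities=75, league_rate=LEAGUE_RATE)
cleanup = net_dp(gdp=12, opportunities=150, league_rate=LEAGUE_RATE)
print(leadoff, cleanup)
```

The rate is identical for both hitters; only the magnitude of the total changes with the number of opportunities.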
   465. Dandy Little Glove Man Posted: September 22, 2007 at 01:57 AM (#2536860)
Many have posited that the "right" mix of OBP and SLG is something like 1.7*OBP+SLG. Well, for leadoff hitters, it's closer to 2.2*OBP+SLG. And not because the OBP is worth much more! - rather, slugging is worth less when there are few men on, obviously.

Certainly slugging is worth less for a leadoff hitter. Leadoff hitters bat with runners on base in just 34% of their plate appearances, while every other lineup spot bats with runners on base at least 43% of the time.

However, OBP is also worth much more for leadoff hitters. They bat with no outs in 47% of their plate appearances, while no other position in the batting order hits in that situation more than 35% of the time. This is a massive difference. In 2006, getting a runner on first base with no outs was worth .39 runs, compared to .27 runs with 1 out and .13 runs with 2 outs. The cost of an out with no outs was .24 runs, compared to .19 runs with 1 out and .11 runs with 2 outs.

Thus, here was the value of reaching first base versus making an out in all 3 situations:

0 out = .63 runs
1 out = .46 runs
2 out = .24 runs

Since a leadoff hitter bats with no outs in such a higher percentage of his plate appearances than all other batters, OBP is worth much more for him. This was one of my main points in post 438.
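
The three swing values above are just the gain from reaching first plus the cost of the out that was avoided. A small sketch of that arithmetic, using only the 2006 figures quoted in this post (the variable names are mine):

```python
# 2006 run values quoted in the post, keyed by number of outs.
gain_on_first = {0: 0.39, 1: 0.27, 2: 0.13}  # runs added by putting a runner on first
cost_of_out = {0: 0.24, 1: 0.19, 2: 0.11}    # runs lost by making an out instead

# Value of reaching first base versus making an out, per out state.
swing = {outs: round(gain_on_first[outs] + cost_of_out[outs], 2) for outs in gain_on_first}
for outs, runs in swing.items():
    print(f"{outs} out: {runs:.2f} runs")
```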
   466. TomH Posted: September 22, 2007 at 02:39 AM (#2537014)
Yes, OBP is worth more when you are the leadoff man.
But not nearly as MUCH as SLG is worth less!

Try this:
1 get yourself a lineup simulator.
2 set up some typical lineup, see how many runs it scores over 1,000,000 games or so.
3 add a bonus to the leadoff man: turn 25 outs (per 600 PA) into 10 walks, 10 singles, 5 home runs. This should increase his OBP by .042, and SLG by about .055.
4 re-run the simulator
5 then reset the leadoff man back to original, and add the same bonus to the cleanup man
6 re-run the simulator
7 the leadoff man will get about 8% more PAs than the 4th hitter, so the total team stats will look better. But because the leadoff man's PAs are about 17% less valuable per PA than the 4th hitter's, the team will score fewer runs.
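
A rough sketch of the bonus in step 3, to show how it moves a batting line. The 600-PA baseline is hypothetical (not a line from this thread), and OBP is simplified to ignore HBP and sacrifice flies. The OBP gain is 25/600 whatever the baseline; the SLG gain depends on the baseline, partly because the ten new walks come out of at-bats.

```python
def apply_bonus(pa, ab, hits, total_bases, walks):
    """Turn 25 outs into 10 walks, 10 singles and 5 home runs."""
    return {
        "pa": pa,                                      # plate appearances are unchanged
        "ab": ab - 10,                                 # the 10 new walks are no longer at-bats
        "hits": hits + 15,                             # 10 singles + 5 home runs
        "total_bases": total_bases + 10 * 1 + 5 * 4,   # 30 extra total bases
        "walks": walks + 10,
    }

def obp(line):
    # Simplified OBP: ignores HBP and sacrifice flies.
    return (line["hits"] + line["walks"]) / line["pa"]

def slg(line):
    return line["total_bases"] / line["ab"]

# Hypothetical 600-PA baseline hitter.
base = {"pa": 600, "ab": 540, "hits": 140, "total_bases": 216, "walks": 60}
boosted = apply_bonus(**base)
obp_gain = obp(boosted) - obp(base)
slg_gain = slg(boosted) - slg(base)
print(f"OBP +{obp_gain:.3f}, SLG +{slg_gain:.3f}")
```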
   467. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 22, 2007 at 02:44 AM (#2537021)
Where does one acquire a lineup simulator?
   468. TomH Posted: September 22, 2007 at 03:10 AM (#2537067)
email han60man@aol.com and I'll send you one. Not sure how great it is; it's called star simulator v1.2. I got it, oh, maybe 6 years ago. Probably other more sophisticated ones on the market.
   469. TomH Posted: September 22, 2007 at 03:16 AM (#2537079)
lineup sim test run #1
500,000 games each
lineup           runs/162g    AB  hits  dbl  trp   HR  walk   OBP   SLG
standard guys       710.26  5424  1409  296   33  134   623  .337  .401
better leadoff      736.53  5444  1438  297   33  142   640  .342  .409
better cleanup      737.25  5445  1437  297   33  141   640  .342  .408

okay, I need bigger differences to make effects noticeable. take two later on...
   470. TomH Posted: September 22, 2007 at 03:41 AM (#2537161)
take 2

the names are from an Angels lineup some year; I made 'typical' batters for all spots except 1 and 4. Both 1 and 4 men are poor for their pos, altho of the right 'shape'. Batter #9 is a compromise between NL pitcher and AL hitter. Call him Neifi :) The lineup scored 659.3 runs per 162 games, over 1 million games. I turned stealing off.

Name       AVG   OBP   SLG   AB    H  2B  3B  HR   BB    R  RBI
Phillips  .240  .317  .304  662  159  34   4   0   73   79   41
Erstad    .265  .352  .392  630  167  33   4  13   84   89   63
Edmon     .280  .351  .464  626  175  32   4  25   69   88   94
Salmo     .252  .307  .450  630  159  44   4  24   49   82   97
Hollins   .270  .343  .424  599  162  31   4  18   66   79   79
Anders    .260  .334  .384  584  152  30   3  12   64   68   65
Alicea    .250  .325  .374  568  142  29   3  11   62   63   66
Kreute    .250  .325  .374  552  138  29   3  11   61   60   65
DiSarc    .214  .272  .305  551  118  28   3   5   43   52   52
team      .254  .326  .386                             659  622
avg runs = 659.3
   471. TomH Posted: September 22, 2007 at 03:46 AM (#2537182)
now I painted a red S on the leadoff man's chest. His stats go thru the roof, team OPS goes up by 51, runs scored up to 757 per 162g.

Name       AVG   OBP   SLG   AB    H  2B  3B  HR   BB    R  RBI
Phillip   .363  .477  .576  614  223  42   4  27  135  135   88
Erstad    .265  .351  .392  645  171  34   4  13   86   91   75
Edmo      .280  .350  .464  640  179  33   4  26   70   91  109
Salmn     .253  .307  .450  644  163  45   4  25   50   84  106
Hollins   .270  .343  .424  612  165  32   4  18   67   81   83
Anders    .260  .334  .385  597  155  31   4  12   66   71   68
Alicea    .250  .325  .374  581  145  30   3  12   64   69   67
Kreute    .250  .325  .374  565  141  29   3  11   62   71   67
DiSarci   .214  .273  .305  565  121  28   3   5   44   65   53
team      .268  .345  .418                             757  716
STAR = 757.0
   472. TomH Posted: September 22, 2007 at 04:14 AM (#2537252)
now with cleanup man being devastating. I raised his rates the exact same amount as the leadoff guy previously

team stats not as good (leadoff man gets more ABs), but a few more runs scored. team is more 'efficient'

Name       AVG   OBP   SLG   AB    H  2B  3B  HR   BB    R  RBI
Phillips  .240  .317  .304  676  162  35   4   0   74   88   43
Erstad    .265  .352  .392  644  171  34   4  13   85  103   65
Edmon     .280  .351  .464  639  179  33   4  25   70  106   96
Salmo     .373  .466  .729  587  219  51   4  50  106  130  150
Hollins   .270  .342  .424  612  165  32   4  18   67   79   93
Ander     .260  .334  .385  597  155  31   4  12   66   70   78
Alicea    .250  .325  .374  582  145  30   4  12   64   65   72
Kreute    .250  .325  .374  566  141  29   3  11   62   62   68
DiSarc    .214  .272  .305  566  121  29   3   5   44   54   54
team      .267  .344  .416                             758  717
STAR = 757.7
   473. TomH Posted: September 22, 2007 at 04:17 AM (#2537257)
I raised the cleanup batter even more, to get team stats even. Runs scored went up to 766.
   474. Brent Posted: September 22, 2007 at 04:33 AM (#2537291)
Tom,

I think you're missing Dandy Little Glove Man's point. It's not surprising that you'll get more bang for the buck from adding general offense (both OBP + SLG) to the # 4 position than from adding the same offense to the # 1 position. His point (I think) is that if the offense is specifically in the form of leadoff skills--i.e., drawing walks (and presumably, also speed), you may get more bang for the buck by having it go to the leadoff spot. I suggest re-trying your simulations adding 60 walks to the leadoff spot and compare it to adding 60 walks to the # 4 spot.
   475. Brent Posted: September 22, 2007 at 04:38 AM (#2537308)
I use NetDP--double plays created or avoided above the league average in the player's opportunities.

You're showing numbers for a few pre-retrosheet players (Lombardi, post-1933 Mel Ott, etc.) What's your source for DP opportunities?
   476. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 22, 2007 at 04:40 AM (#2537313)
Brent, did my explanation of why I have Butler as 18% more valuable than Puckett add up for you? Any further questions/comments?
   477. Dandy Little Glove Man Posted: September 22, 2007 at 04:41 AM (#2537316)
I still don't really understand your argument. Doesn't that just show that the cleanup spot is more important than the leadoff spot? From my thinking, OBP is most valuable for the leadoff hitter and SLG is least valuable compared to the rest of the lineup. The cleanup hitter's SLG is most valuable and OBP is second or third most valuable. Therefore, it's not surprising that significantly adding to both aspects will benefit the team more in the cleanup position. I don't see how that tells us that a decrease in SLG is a much larger component of the higher optimal ratio for OBP/SLG at the leadoff position than an increase in OBP. Is there something I'm missing?
   478. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 22, 2007 at 04:44 AM (#2537319)
I did a multiple regression on NetDP, which I use to estimate NetDP from the player's other stats. It gets about 70% r-squared. After seeing this handedness stuff on NetDP, I could probably further improve the r-squared by including handedness in the regression, but I don't think such a minor improvement would be worth recalculating and reposting WARP all over again.
   479. TomH Posted: September 22, 2007 at 05:01 AM (#2537365)
I think my argument is this:
for leadoff hitters, OBP may be 3% (pick a number from the air) more valuable per AB than for 4th or 5th batters. And when you factor in they get more ABs, it might be 10% more. But their SLG is maybe 15% less valuable.

Overall, if you compare a Butler with a Puckett, IF you give Butler credit for the extra PAs he got, you have to dock him for the less-valuableness of those PAs, all relative to Puckett. WARP (to use one system as an example) gives Butler credit for the PAs. It does not assess that his PAs were less valuable. Which they were. Hence it overrates Butler, relative to Puckett.
   480. Brent Posted: September 22, 2007 at 05:17 AM (#2537374)
Brent, did my explanation of why I have Butler as 18% more valuable than Puckett add up for you? Any further questions/comments?

The part about double plays makes sense. With respect to the run estimator, I'm still working through that part. I've long been aware that OPS overweights OBP and underweights SLG, but based on other material I've read and examples I've worked out, it's always seemed that the bias in OPS wasn't that large. I may get back to you later with further comments or questions. And on defense, I'm generally skeptical of sabermetric methods for evaluating defense. Although Gold Glove awards and opinions of knowledgeable observers aren't perfect, I tend to trust them more than purely statistical methods for rating defense. We'll just have to agree to disagree regarding Puckett's defense.
   481. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 22, 2007 at 05:37 AM (#2537377)
Brent, the bias in OPS+ usually isn't that large. But Butler is an extreme player. There are two separate phenomena here. The first is that OPS+ systematically underweights OBP at all OBP/SLG ratios. But the second is that it breaks down at the extremes. See, when OPS+ "sees" an SLG, it assumes that that SLG is comprised of a league-average mix of BA and ISO. The further a player moves from that league-average mix, the bigger OPS+'s error will be in assessing him. Butler is about as far from average as you can get in that regard--his career SLG was 77% BA, 23% ISO, while Puckett's was a much more typical 67% BA, 33% ISO (and at the other extreme, Rob Deer was 50/50). So OPS+ grossly underrates Butler because its SLG half is assuming he's a .250 hitter with average power, instead of a .290 hitter with no power, and the latter is much more valuable than the former. So for players with an average proportion of BA to ISO, OPS+ does underrate OBP, but not by a terribly large amount. For players with an extremely high or extremely low proportion of BA to ISO, it can be wildly off. Again, John McGraw hopes you're listening.
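
A small sketch of the "shape" split described above, treating SLG as BA plus ISO. The career BA/SLG pairs are approximate figures supplied here as assumptions (they are not quoted in this thread), chosen to show how the 77/23, 67/33 and 50/50 splits arise.

```python
def slg_shape(ba: float, slg: float) -> tuple[float, float]:
    """Return the shares of SLG that come from batting average and from isolated power."""
    iso = slg - ba
    return ba / slg, iso / slg

# Approximate career lines, assumed for illustration: (BA, SLG).
careers = {
    "Butler": (0.290, 0.376),
    "Puckett": (0.318, 0.477),
    "Deer": (0.220, 0.442),
}

shares = {name: slg_shape(ba, slg) for name, (ba, slg) in careers.items()}
for name, (ba_share, iso_share) in shares.items():
    print(f"{name}: {ba_share:.0%} BA, {iso_share:.0%} ISO")
```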
   482. OCF Posted: September 22, 2007 at 06:26 AM (#2537389)
But the second is that it breaks down at the extremes. ... Again, John McGraw hopes you're listening.

But McGraw still has under a thousand games played ...

The right thing to say is that Willie Randolph and Ozzie Smith appreciate the support they have already received. Granted, neither needed that offensive evaluation in quite the same way that Butler does, but both were hitters of extreme shape.
   483. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 22, 2007 at 03:28 PM (#2537518)
Yes, but even with his short career and durability problems, McGraw should get much more of a shout from peak voters than he does. I suspect they don't pay attention in part because he gets shortchanged by OPS+: his hitting (ignoring his 73 steals for the moment) in 1899 was worth as much in its context as a 182 OPS+ with a normal shape in a normal run environment (his actual OPS+ was 168). Factoring in the basestealing, it was worth as much as a normally shaped 189 OPS+ with no steals in a normal run environment. (The other reason is probably that people don't distinguish between 1890s 3B and modern 3B.)

Ozzie was elected for his defense. Randolph, you may have a point.

OCF, do me a favor and (re)post your offensive analysis of Puckett in his thread? Thanks.
   484. OCF Posted: September 22, 2007 at 04:19 PM (#2537535)
Just look at post #33 on the Puckett thread.

No argument that McGraw was an offensive monster - when he played. The Stats Handbook RC numbers show that clearly. As for Ozzie - yes, you can say that we elected him for his defense, but had he been a poor offensive player that wouldn't have been enough. That he was the best offensive SS in the NL for several years in the mid-80's was part of his case.
   485. 'zop sympathizes with the wrong ####### people Posted: September 22, 2007 at 04:36 PM (#2537551)
But McGraw still has under a thousand games played ...

How does that matter? Even if you're a career voter, you should be voting on the total value accrued by a player over the course of his career. If McGraw could be more valuable in <1000 games than Nellie Fox was in 1,000,000 games, who cares if he was a short-career player?
   486. Paul Wendt Posted: September 22, 2007 at 05:40 PM (#2537607)
Brent, the bias in OPS+ usually isn't that large. But Butler is an extreme player. There are two separate phenomena here. The first is that OPS+ systematically underweights OBP at all OBP/SLG ratios. But the second is that it breaks down at the extremes.

Mike Hargrove is a different extreme player.
career OBP .396, SLG .391
not speedy (career 24 sb, 37 cs)
usually batting third rather than first (career roughly 20% first, 50% third, 15% fifth, 20% other).

For the retrosheet era, visit the career batting splits for a count of batting positions in the starting lineup.
career batting splits, Mike Hargrove
   487. KJOK Posted: September 23, 2007 at 06:40 AM (#2538388)
But McGraw still has under a thousand games played ...


False. McGraw had 1099 games played, and if you adjust for the shorter 19th century seasons, he should come out around 1,200 I believe.

5,000 Plate Appearances should be a large enough sample size to demonstrate a player's ability, and McGraw's was well above the minimum HOM ability line.
   488. Paul Wendt Posted: September 23, 2007 at 02:24 PM (#2538457)
5,000 Plate Appearances should be *a large enough sample size to demonstrate a player's ability*, and McGraw's was well above the minimum HOM ability line.

For every student of HOM thought, this belongs on the list of peak concepts.
   489. Paul Wendt Posted: September 23, 2007 at 02:25 PM (#2538460)
concepts of peak --not to say KJ was at his peak when he wrote it
   490. Brent Posted: October 27, 2007 at 04:22 AM (#2595539)
Dan,

I’m working my way through your WARP methodology to try to understand it. I’d like to ask a few questions to get a bit more detail on how the calculations work. A worked example showing the calculations would be great—I’ll suggest using John McGraw’s 1899 season, which was discussed recently on the McGraw thread.

1. For BWAA, you say you’re using BaseRuns during 1893-1946. On the Web, I’ve seen three or four slightly different versions of the BaseRuns formula. Could you please specify which one you’re using. Also, did you make any modifications to the formula—for example, to adjust for any league-level discrepancies? Again, I’d find an example walking step by step through the calculations from BaseRuns to BWAA to be very helpful.

2. I’m pretty unclear about how BRWAA is calculated, especially during this era when data don’t exist on CS. You also say that you “did a regression on James Click’s non-SB baserunning runs...to estimate non-SB baserunning runs for all seasons when they are not available.” What variables do you have in the regression? Do you generally find that the variations in BRWAA mostly come from SB or from the non-SB regression?

3. For fielding in this period, I understand that you’re using BP FRAA and WS-FRAA after standardizing. Is the effect of the standardization to make the standard deviation of each position for each season constant over time (and equal to the standard deviation of Dial’s Zone Rating) for that position? Don’t the standard deviations vary quite a bit from year to year? Are you using the raw SDs for each position/year combination, or are you using a regression / smoothing technique like you describe later for the league adjustments? Again, it would be helpful to see a numerical example, if you don’t mind.

Thanks in advance for your response, as well as for your past explanations of the methodology. Once I really understand the methodology, I may be able to provide you some feedback.
   491. David Concepcion de la Desviacion Estandar (Dan R) Posted: October 27, 2007 at 06:45 PM (#2595829)
Brent, thanks very much for your interest and your pertinent, excellent questions.

The BaseRuns equation is ((A*B)/(B+C)) + D. A is H + BB + HBP - HR + ROE. B is X*((1.4*(TB+ROE)) - (0.6*(H+ROE)) - (3*HR) + (.1*(BB+HBP)) + (.9*(SB-CS))), where X is allowed to float so that league BaseRuns equal actual league runs scored for the year in question. C is AB-H-ROE+CS, and D is HR.

In the modern game, the X number is usually very close to 1. For 1916 to the present, X is below 1.10, and so I simply ignore ROE. For 1901-1916, I estimate league ROE by fixing X at 1.10 and adding in enough ROE so that league BaseRuns equal actual league runs scored. I then divide that by league AB-H to get the estimated league ROE rate, and multiply each player's AB-H by that number to get their estimated ROE. For 1893-1900, I do the same thing but let X increase gradually from 1.10 in 1901 to 1.20 in 1893, to reflect the higher likelihood of scoring after reaching base due to all the advancement errors.
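
A minimal sketch of the BaseRuns calculation, evaluated on the Average Team Plus McGraw totals given later in this post. It reads the B term with the 0.6 weight applied to H+ROE and with the stolen-base term added; that reading is an assumption, chosen because it reproduces the 885 runs quoted for that team. The function and argument names are mine.

```python
def base_runs(ab, h, hr, tb, bb_hbp, sb, cs, roe, x):
    """BaseRuns = A*B/(B+C) + D, with X as the league-fitting multiplier on B."""
    a = h + bb_hbp - hr + roe                      # baserunners
    b = x * (1.4 * (tb + roe) - 0.6 * (h + roe)    # advancement factor
             - 3 * hr + 0.1 * bb_hbp + 0.9 * (sb - cs))
    c = ab - h - roe + cs                          # outs
    d = hr                                         # runs that always score
    return a * b / (b + c) + d

# Average Team Plus McGraw, 1899, with X = 1.12 as stated in the post.
runs = base_runs(ab=5265, h=1530, hr=28, tb=1959, bb_hbp=591,
                 sb=280, cs=98, roe=133, x=1.12)
print(round(runs))
```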

To estimate Caught Stealing, I use the following equation: SB*(.36-(.476*((SB/(H-HR+BB+HBP-3B+ROE))^1.041))) + ((H-HR+BB+HBP-3B+ROE)/100). That gives a reliable estimate of modern CS--obviously, I am underestimating CS in the pre-liveball era. However, since I use the same equation to estimate CS for all players and for the league as a whole, the comparison *between* players is still fair, which is what matters for calculating WARP--their relative value.
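
The caught-stealing estimator above, sketched as a function and applied to McGraw's 1899 line. The 1 HR and 9 estimated ROE follow from this post's worked example; the 3 triples are an assumption taken from his season line rather than from the thread.

```python
def estimated_cs(sb, h, hr, triples, bb_hbp, roe):
    """Estimate caught stealing from steals and times on first or second base."""
    times_on = h - hr + bb_hbp - triples + roe
    attempt_penalty = 0.36 - 0.476 * (sb / times_on) ** 1.041
    return sb * attempt_penalty + times_on / 100

# McGraw 1899: 73 SB, 156 H, 138 BB+HBP, 9 estimated ROE; 3 triples assumed.
cs = estimated_cs(sb=73, h=156, hr=1, triples=3, bb_hbp=138, roe=9)
success_rate = 73 / (73 + round(cs))
print(round(cs), f"{success_rate:.1%}")
```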

To estimate EqBR, I use a fairly ghastly equation: ((.0159*BA) - (.0103*(BB+HBP)/PA) + (.0035795*ln((SB/(H-HR-3B+BB+HBP+ROE))+.01)) + (.0017585*ln((3B/(H-HR))+.01)) + (.0631858*(SFrac*SeasonLength/G)^2) - (.11269*SFrac*SeasonLength/G) + Positional Constant + League Constant +.0633)*(H-HR+BB+HBP+ROE). SFrac is the player's PA times 9 times the number of teams in the league divided by the league PA. If SFrac*SeasonLength/G is below 0.75, I use 0.75, and if it is greater than 1.1, I use 1.1. The positional constants are -.0026 for catcher, -.0023 for first base, .0021 for second base, .0002 for third base, .0030 for shortstop, .0010 for left field and right field, .0040 for center field, and -.0026 for DH. The league constant is a number added in to make EqBR for the whole league add up to 0. I haven't counted whether variance in BRWAA is attributable more to SB or to EqBR for post-1972. It is certainly much more heavily SB for pre-1972, because the range of values produced by the EqBR regression equation is quite narrow, generally between -3 and 3 runs per full season.

To standardize BP FRAA and WS-FRAA, I calculate the standard deviation of Dial Zone Rating for each position from 1987 to 1999, compare that to the standard deviation of BP FRAA and WS-FRAA for the same time period, and take the ratio of those two stdevs, which I use as the flat multiplier for all league-seasons. If the stdev of BP FRAA or WS-FRAA is higher or lower in any given year than it was for the 1987-1999 period, that will be reflected in FWAA.

OK, McGraw 1899. He had 537 non-SH plate appearances (I don't count SH), 399 AB, 156 H, 178 total bases, 138 BB+HBP, and 73 steals. The equations above give him 21 CS (a 77.7% success rate, compared to the estimated league average of 73.0%), 0.7 non-SB baserunning runs, and 9 ROE. A league average team had 3,833 AB - H + estimated CS in 1899, and McGraw had 264, so if you put McGraw on an otherwise league average team, his teammates would have 3833-264 = 3569 batting outs to play with. In 3,569 batting outs, a league average offense in 1899 would have 4866 AB, 1374 H, 27 HR, 1781 TB, 453 BB+HBP, 207 SB, 76 estimated CS, and 124 estimated ROE. Adding those totals to McGraw's, we get the Average Team Plus McGraw with 5265 AB, 1530 H, 28 HR, 1959 TB, 591 BB+HBP, 280 SB, 98 estimated CS, and 133 estimated ROE. So that's an A of 2225, a B of 2312, a C of 3700, and a D of 28, with an X constant of 1.12 for 1899, which yields 885 runs scored for the Average Team Plus McGraw. McGraw's park factor was 105, and he had 9.2% of the Average Team Plus McGraw's plate appearances, so that gives the Average Team Plus McGraw a weighted park factor of 100.46, reducing its runs scored to 881.
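
The park adjustment at the end of that paragraph can be sketched as follows, assuming the weighted park factor is the player's factor blended with a neutral 100 by his share of team plate appearances, and that runs are adjusted by dividing by the factor. Both the blending rule and the division are my reading of the numbers, not something the post states outright.

```python
def weighted_park_factor(player_pf: float, pa_share: float) -> float:
    """Blend the player's park factor with a neutral 100 for his teammates."""
    return 100 + (player_pf - 100) * pa_share

team_pf = weighted_park_factor(player_pf=105, pa_share=0.092)
adjusted_runs = 885 / (team_pf / 100)   # 885 raw BaseRuns for the team
print(round(team_pf, 2), round(adjusted_runs))
```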

McGraw had 16 BP FRAA, which are worth 1/9 of a win, so that translates to 19 FRAA in the 1899 run environment. The standard deviation of 3B FRAA in Dial Zone Rating for 1987-99 was 90% of the standard deviation of 3B FRAA in BP for 1987-99, so we multiply the 19 by 0.9 and get 17 adjusted BP FRAA. McGraw had 7.7 Fielding Win Shares in 117 games, and an average 1899 NL 3B would have had 4.2 Fielding Win Shares in 117 games, so McGraw is 7.7-4.2 = 3.5 Fielding Win Shares above average /3 = 1.17 wins above average, which is 12 runs in the 1899 run environment. The standard deviation of 3B FRAA in Dial Zone Rating for 1987-99 was 2.3 times as big as that of these WS-FRAA for 1987-99, so we multiply the 12 by 2.3 to get 28 adjusted WS-FRAA. The average of 17 adjusted BP FRAA and 28 adjusted WS-FRAA is 22.5, so McGraw was 22.5 adjusted FRAA in 1899. A league average team scored 804 runs in 1899, so the Average Team Plus McGraw would have allowed 782 runs.
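
A sketch of the fielding standardization in that paragraph, starting from the run values it quotes (19 BP FRAA and 12 WS-FRAA in the 1899 run environment) and rounding at the same points the post does. The variable names are mine.

```python
bp_fraa_runs = 19       # BP FRAA converted to 1899 runs, from the post
ws_fraa_runs = 12       # WS-based FRAA converted to 1899 runs, from the post

# Ratios of the Dial Zone Rating stdev to each system's stdev at 3B, 1987-99.
bp_sd_ratio = 0.9
ws_sd_ratio = 2.3

adj_bp = round(bp_fraa_runs * bp_sd_ratio)   # adjusted BP FRAA
adj_ws = round(ws_fraa_runs * ws_sd_ratio)   # adjusted WS-FRAA
adj_fraa = (adj_bp + adj_ws) / 2             # average of the two systems

league_runs = 804                            # runs scored by an average 1899 team
runs_allowed = league_runs - adj_fraa        # Average Team Plus McGraw
print(adj_bp, adj_ws, adj_fraa, runs_allowed)
```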

Stick 881 runs scored and 782 runs allowed in 154 games into PythagenPat, and you get a winning percentage of .559, or 90.5 wins after straight-line adjusting to a 162-game season. The regression-projected standard deviation for 1899 is 3.04 wins per player per 162 games, while for 2005 it was 2.74 wins per player per 162 games, so we multiply McGraw's 90.5-81 = 9.5 wins above average by 2.74/3.04 and get him with 8.5 wins above average after correcting for standard deviation.
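
A sketch of that step, assuming the common PythagenPat form in which the exponent is runs per game raised to a constant. The constant 0.287 is my assumption (the post does not say which value is used); it lands on the .559 and 90.5 quoted above.

```python
def pythagenpat_wpct(rs: float, ra: float, games: int, k: float = 0.287) -> float:
    """Winning percentage with a run-environment-dependent Pythagorean exponent."""
    exponent = ((rs + ra) / games) ** k
    return rs ** exponent / (rs ** exponent + ra ** exponent)

wpct = pythagenpat_wpct(rs=881, ra=782, games=154)
wins_162 = wpct * 162                  # straight-line adjustment to 162 games
raw_waa = wins_162 - 81                # wins above a .500 team
adj_waa = raw_waa * 2.74 / 3.04        # rescale 1899 stdev to the 2005 stdev
print(f"{wpct:.3f} {wins_162:.1f} {adj_waa:.1f}")
```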

The worst 3/8 of MLB starting third basemen averaged 2.3 standard deviation-adjusted wins below overall league average from 1895 to 1903. The worst 3/8 of MLB starting third basemen averaged 1.3 standard deviation-adjusted wins below average per 162 games between 1985 and 2005, while third basemen over age 27 making less than twice the league minimum salary averaged 1.4 standard deviation-adjusted wins below average per 162 games over the same period, so the gap between the worst-regulars average and the Freely Available Talent level is 0.1 win. Thus, our initial estimate of the replacement level for 3B in 1899 is 2.3 + 0.1 = 2.4 wins below average per 162 games. However, repeating this process at every position, we get an overall replacement level (averaged across all the positions) of 1.9 wins below average per 162 games for 1899, which is lower than the 1985-2005 FAT average of 1.5 wins below average per 162 games. So we multiply 2.4 by 1.5/1.9 and get a replacement level of 1.9 wins below average per 162 games for 3B in 1899.

McGraw had 537 non-SH PA in 1899, while an average player playing every game on an average team had 635 non-SH PA in 1899, so McGraw had 85% of an average player's plate appearances. Multiplying 1.9 wins by .85 gives a 1.6 win difference between average and replacement for 3B in 1899 in McGraw's playing time. 8.5 wins above average, plus 1.6 wins for the difference between average and replacement, is 10.1 WARP2. (He actually has 10.2; there's some rounding going on somewhere along the line).
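
The replacement-level arithmetic of the last two paragraphs, sketched with the same rounding the post applies. Freely Available Talent sits 0.1 win below the worst-regulars average in 1985-2005, so that gap is added to the 1899 worst-regulars figure. The variable names are mine.

```python
worst_regulars_1899 = 2.3            # wins below average, 3B, 1895-1903
fat_gap = 1.4 - 1.3                  # FAT minus worst regulars, 1985-2005
initial_repl = worst_regulars_1899 + fat_gap          # about 2.4

# Rescale so the all-position average matches the modern FAT level.
repl_3b = round(initial_repl * 1.5 / 1.9, 1)          # wins below average per 162 games

pa_share = round(537 / 635, 2)                        # McGraw's share of a full season
repl_credit = round(repl_3b * pa_share, 1)            # average-to-replacement wins

warp2 = 8.5 + repl_credit                             # wins above average + replacement credit
print(repl_3b, pa_share, repl_credit, round(warp2, 1))
```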

Well, that's about as exhaustive a description as I can provide. I hope you find this helpful.

Dan
   492. Brent Posted: October 27, 2007 at 06:53 PM (#2595839)
Thank you very much Dan! That's as thorough an explanation as I could ask for!

I'll need to take some time to go through all of this helpful information and will get back to you with comments or follow-up questions.
   493. David Concepcion de la Desviacion Estandar (Dan R) Posted: November 06, 2007 at 09:43 PM (#2607017)
Quick question for people who are familiar with my WARP. In my view, there are certain pairs of players who encapsulate the flaws of WS and BP WARP beyond dispute. Just looking at career totals, WARP3 has Red Schoendienst over Dick Allen, while WS has Rusty Staub over Arky Vaughan. These are results that fall so far short of passing any conceivable laugh test that they should make any thoughtful voter pause before putting too much faith in them.

My question is, are there any pairs of players in my system that seem equally outlandish? Maybe Campaneris over McCovey? Except I actually don't think that's a mistake...I imagine if you looked at RCAP or something and adjusted it for defense and baserunning, you'd probably get a similar finding there.

Anyways, I hope to get some responses here...I'm looking for stuff that goes far beyond voter preference stuff like peak vs. prime vs. career or what replacement level you use; examples that are like there-is-no-way-on-God's-green-earth-that-player-X-could-have-been-better-than-player-Y.
   494. Dizzypaco Posted: November 06, 2007 at 09:50 PM (#2607025)
Maybe Campaneris over McCovey?

That's one...
   495. David Concepcion de la Desviacion Estandar (Dan R) Posted: November 06, 2007 at 10:03 PM (#2607043)
Could anyone post the career RCAP for Campaneris and McCovey, ignoring Campaneris's '78 and '81 and McCovey's '72, '76, '78, and '80? I think KJOK has those numbers....then just add 27 to Campaneris and subtract 25 from McCovey for non-SB baserunning, and add 43 to Campaneris and subtract 60 from McCovey for fielding. They should come out pretty close to even...unless there are playing time issues.
   496. TomH Posted: November 06, 2007 at 10:31 PM (#2607077)
Mel Ott vs Frank Robby
same pos, about equal on defense, career length not much diff (small edge Robinson)

Ott was a slightly better hitter.
Robinson played 30 years later; after integration; more than 1/2 his career in the tougher league (except 66-70ish). When fewer players put up dominating stats like many OFers did in the 30s. When there wasn't a million players missing for WWII.

So this is a test of the 'league qual' adjustment.

BB-ref has them almost dead equal in career batting wins and OWP. I think the league strength makes it clear that Robinson was better. While Ott is greatly underrated historically, off the cuff I canNOT see how he'd be about $30 million ahead in DanR's salary system.
   497. DCW3 Posted: November 06, 2007 at 11:14 PM (#2607143)
Could anyone post the career RCAP for Campaneris and McCovey, ignoring Campaneris's '78 and '81 and McCovey's '72, '76, '78, and '80? I think KJOK has those numbers....then just add 27 to Campaneris and subtract 25 from McCovey for non-SB baserunning, and add 43 to Campaneris and subtract 60 from McCovey for fielding.

Excluding the stated seasons, I have McCovey with a career RCAP of 416 and Campaneris at 135. Using those defense and baserunning numbers gives McCovey a score of +331 and Campaneris one of +205...still a pretty significant edge for McCovey.
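
The adjustment in that comparison is simple addition; a short sketch using only the figures quoted in posts #495 and #497 (the dictionary layout is mine):

```python
# RCAP with the stated seasons excluded, plus the baserunning and fielding
# adjustments proposed in post #495.
players = {
    "McCovey": {"rcap": 416, "baserunning": -25, "fielding": -60},
    "Campaneris": {"rcap": 135, "baserunning": 27, "fielding": 43},
}

totals = {name: sum(parts.values()) for name, parts in players.items()}
gap = totals["McCovey"] - totals["Campaneris"]
print(totals, gap)
```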
   498. David Concepcion de la Desviacion Estandar (Dan R) Posted: November 06, 2007 at 11:23 PM (#2607165)
Interesting comparison, Tom.

One factor you didn't mention is that Robinson spent a good chunk of time at 1B and DH, while Ott played one year in center and another at third. That's a four-win difference. Another edge for Ott is that his GDP totals were extremely low for his era, while Robinson's were slightly above average--I have that as 5.5 wins.

The "fewer players put up dominating stats" issue is a standard deviation question, which is most definitely accounted for. It's worth three wins. (Robinson's Orioles years stand out more because the AL was so weak then relative to the NL).

On career value I have 110.5 WARP2 for Ott after war deductions, 106.1 for Robinson, so a rather piddling 4% difference. The salary estimator sees a bigger difference between them (10%) because a) Ott was slightly more durable and b) he packed more of his value into fewer seasons. Ott's top 15 years account for 95% of his value, while the top 15 years represent just 86% of F-Rob's. Ott's prime was a bit higher, Robinson's a bit longer, and the estimator prefers the former.

My system makes no attempt to measure league quality. Subjectively, I'd probably rank Robinson ahead of Ott precisely because Robinson wouldn't have been in the majors if he had been born when Ott was.
   499. David Concepcion de la Desviacion Estandar (Dan R) Posted: November 06, 2007 at 11:29 PM (#2607177)
Wow! That's very surprising. Are RCAP park-adjusted? Do they include SB/CS? Do they include double play avoidance? All three are major benefits to Campaneris which I assumed were included. Also, by subtracting 4 years from McCovey and 2 from Campaneris, I'm giving Campaneris more below-average time than McCovey. DCW3, how would it look if you just did above-average seasons, ignoring all below-average ones?
   500. DavidFoss Posted: November 06, 2007 at 11:36 PM (#2607191)
My system makes no attempt to measure league quality.

Well, there's the stdev factor. I realize WWII's presence immediately before integration makes a direct comparison tricky, but do you see the MLB stdev's dropping consistently from 1947-1960 as minority stars displaced white replacement players? And how much did the 1961-62 expansions "correct" things?

(Apologies if this has been brought up before, or is accessible easily from data you've posted.)