Baseball for the Thinking Fan

Hall of Merit
— A Look at Baseball's All-Time Best

Monday, February 05, 2007

Dan Rosenheck’s WARP Data

WARP Methodology and Results

Thanks, Dan!

EDIT: Link updated 2/23/2009

John (You Can Call Me Grandma) Murphy Posted: February 05, 2007 at 08:59 PM | 763 comment(s)

Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   301. David Concepcion de la Desviacion Estandar (Dan R) Posted: July 24, 2007 at 01:14 AM (#2452240)
That said, replacement level for 3B in the 70s and 80s was still lower than it was in the early 50s or late 60s...I think it would be very wrong to penalize guys like Nettles for playing in the same league as Schmidt, Cey etc.--this is why I don't support using measures like RCAP. 3B in those days was like SS in the late 90s--a high-stdev, "feast or famine" position. While some teams had their Buddy Bells and George Bretts, others were stuck with their Ken Reitzes and Len Randles (1977 aside). It seems to me the basic conclusion to draw is that the greatest overall athletes were CF in the 50s, 3B in the 70s, and SS in the 90s. These players are on the extreme end of the distribution, and don't say anything about the overall depth of the position past the handful of superstars.
   302. David Concepcion de la Desviacion Estandar (Dan R) Posted: July 24, 2007 at 10:58 PM (#2453573)
Some of the BWAA, BRWAA, FWAA, and Rep weren't properly adding up to WARP in the spreadsheet I posted online for some reason, so I redid the calculation and reposted. The overall WARP1, WARP2, and salaries haven't changed, but some of the component BWAA, BRWAA, FWAA, and Rep have. If you found those useful, please download the new version from the Yahoo group (although in 99% of the cases the numbers should be the same).
   303. Jim Sp Posted: July 25, 2007 at 11:01 PM (#2455216)
Regarding the earlier discussion of Frank Howard as a replacement level fielder:

warp = bwaa + brwaa + fwaa - rep

Since fwaa and rep depend on position, while bwaa and brwaa do not:

Let defensive wins above replacement = fwaa - rep = dwar

Let offensive wins above average = bwaa + brwaa = owaa

Then warp = owaa + dwar
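
As a rough illustration of the regrouping above, here is a minimal Python sketch. The function name and the sample numbers are hypothetical (they are not taken from the spreadsheet), and it assumes Rep is stored as a negative number of wins, so that subtracting it adds value:

```python
def split_warp(bwaa, brwaa, fwaa, rep):
    """Regroup the four WARP components into offense and defense terms."""
    owaa = bwaa + brwaa   # position-independent offense, measured against average
    dwar = fwaa - rep     # position-dependent defense, measured against replacement
    return owaa, dwar, owaa + dwar

# Hypothetical slugger whose poor glove exactly cancels his position's Rep,
# so dwar is 0 and his whole WARP comes from offense above average.
owaa, dwar, warp = split_warp(bwaa=2.0, brwaa=0.1, fwaa=-1.5, rep=-1.5)
print(owaa, dwar, warp)
```

A player shaped like this is what "fwaa - rep is about 0" looks like in the numbers.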

So one way of interpreting warp is that offense is measured against average, and defense against the level of Frank Howard's outfield defense.

Indeed, for Frank Howard fwaa - rep is about 0 while he was in the outfield (he had a late-career season at first that was even worse).

The only players with careers worse than -2.5 dwar2 are Willie McCovey and Dick Stuart, at -5.2 and -5.1. Other notables are Killebrew at -0.6 dwar2 and Luzinski at -0.8 dwar2.

At the other end are Ozzie Smith at 84.9, Ripken at 84.0, followed by Wagner, Maranville, and Concepcion.

Digging a little deeper for a sanity check, to see if Frank Howard's defense is the right replacement level...well, Ed Kranepool sustained a career with the same level defense and very slightly above league average offense. Aurelio Rodriguez did the same for longer at the opposite end of the spectrum. Looking at long careers with low warp/SFrac ratios, I see a mix of fielders and hitters, but I'd have to look at the data more carefully to say something conclusive.

Anyhow, it's certainly reasonable that Frank Howard wouldn't have had a major league job with his defense, if he had average hitting and baserunning (in a non DH league). So I would say that emphasis on defense in Dan R's warp numbers is somewhere in the right ballpark, maybe even with not quite enough emphasis on defense.
   304. Jim Sp Posted: July 25, 2007 at 11:28 PM (#2455278)
hmmm...actually I think the 0 point for fielding replacement determines how much credit players get for playing badly vs. not playing at all, while the offense/defense emphasis is set by the replacement level differences at the various positions, and the "spreads" put on the fielding wins.
   305. David Concepcion de la Desviacion Estandar (Dan R) Posted: July 26, 2007 at 01:34 AM (#2455633)
Jim Sp,

My own view is that Baseball Prospectus is wrong to separate BRAR and FRAR, and I would not call Frank Howard a "replacement level fielder." As I've said time and time again, there are no replacement statistics, only replacement players. One replacement player might be -80 with the glove and +50 with the bat, while another at the same position might be +50 with the glove and -80 with the bat, but they are both a combined -30, and that is the replacement level no matter how it breaks down. It seems to me that when you say "replacement level fielder," what you mean is "the amount of (negative) FWAA needed to bring a league-average hitter down to a total value equal to replacement level," in which case Frank Howard probably was that bad, but of course his hitting was far above average.

Please do download the new spreadsheet; there were some notable players with erroneous component numbers in the old one.

Thanks,

Dan
   306. Jim Sp Posted: July 26, 2007 at 04:36 PM (#2456318)
Dan R,
I like this system a lot, thanks for the hard work on this.

An example may illustrate better what I was getting at in the jumble above.

In any league, a DH whose offense equals the league average for position players will have exactly 0 warp. Essentially, that defines the 0 point of the warp system.

That may be close to current major league reality, but I don't see a reason for that to be true by definition. In a different DH usage pattern even the bad DHs could have better offense than that.
   307. David Concepcion de la Desviacion Estandar (Dan R) Posted: July 26, 2007 at 04:53 PM (#2456352)
It's not true by definition--it's just the empirical finding made by Nate Silver. I suspect that, in fact, replacement DH's *did* hit higher than the league average in the 1970's, judging by the rep level for 1B's. But there weren't enough full-time DH's (at least as listed by Win Shares) for me to compile a meaningful worst-3/8-of-regulars average, so I've just applied the 1985-2005 average to the entire 1973-2005 period. This is clearly wrong, but it's not wrong by more than 0.2 wins, and it affects very few players.
   308. Jim Sp Posted: July 27, 2007 at 01:19 AM (#2457212)
I understand that the replacement level is set by looking at the worst regulars, but since the FWAA portion refers to positional average, wouldn't the FWAA penalize an unusual concentration of good fielders at a position? Not good players, but good fielders.
   309. David Concepcion de la Desviacion Estandar (Dan R) Posted: July 27, 2007 at 01:40 AM (#2457285)
We just got into this discussion on the Andre Dawson thread. All MLB players are good fielders, as well as good hitters, compared to minor leaguers, who are in turn good fielders and good hitters compared to non-professional baseball players. But at the MLB level, everything is relative. Is a .300 batting average good? It sure is in 1968, but it's below average in 1894. Fielding is no different. Is a .900 Zone Rating or whatever good? It depends on what every other team's Zone Rating at that position is.

Now, what you'd expect to see is that if teams start placing a greater emphasis on defense, the average Zone Rating at the position would move up, meaning that a fielder who was +5 before might be -5 afterwards with no change to his talent. But then we'd expect the offensive average at the position to drop accordingly, which would lead to a lower replacement level. So sure, Adam Everett might have only been a +10 fielder in the 70s, and maybe Concepción would be +30 today. Or maybe not. We'll never know. But what we can say with certainty is that summing offense and defense, Concepción was X total stdev-adjusted wins above replacement in his day, and Everett is Y total stdev-adjusted wins above replacement today. Once again, there are no replacement statistics, only replacement players. So sure, perhaps that means one should take the component BWAA, BRWAA, and FWAA with a grain of salt. But the end product, the WARP, is reliable no matter what the league characteristics are.
   310. David Concepcion de la Desviacion Estandar (Dan R) Posted: July 27, 2007 at 03:48 AM (#2457510)
The other possibility, I should add, is that if there just happen to be a bunch of great fielders at a position in a league at a given time, then yes, everyone else's FWAA would look worse--including the replacement player's FWAA. That would cause the replacement level to drop, leaving everyone else in the league unaffected.
   311. Jim Sp Posted: July 30, 2007 at 02:13 AM (#2461080)
If anyone has info on what Roy Thomas was doing before he showed up in the major leagues, this method would make him a legitimate candidate with a little bit of minor league credit.
   312. Joey Numbaz (Scruff) Posted: July 30, 2007 at 06:35 PM (#2461741)
Dan, I'd like to take a crack at converting your wins for players by season to a pennants added number instead of a salary number (I think your salary calculator places too much emphasis on huge years). I can do that with the spreadsheet on the Yahoo group site, right?
   313. Jim Sp Posted: July 30, 2007 at 06:47 PM (#2461758)
The spreadsheet already has pennants added.
   314. David Concepcion de la Desviacion Estandar (Dan R) Posted: July 30, 2007 at 08:37 PM (#2461918)
Joe, you most certainly can, but as Jim Sp said, the spreadsheet on the Yahoo! group already has Pennants Added (using precisely the formula that you provided to me). But remember that Pennants Added numbers from my system will NOT be directly comparable to those derived from BP WARP or WS, since my system uses the actual MLB replacement level rather than the 1899 Cleveland Spiders level (BP) or worse (WS). All the Pennants Added numbers will be lower, but the relative rankings will be different as well, since more credit is given for above-average seasons relative to just-showing-up seasons.
   315. Joey Numbaz (Scruff) Posted: July 30, 2007 at 09:17 PM (#2461969)
All good Dan - I hadn't read Jim Sp's entire post, thanks for the extra tip.

I didn't realize the PA numbers are already in there.

They will probably jibe pretty well with my pitcher ratings as well - since those are also using a much higher replacement level.
   316. David Concepcion de la Desviacion Estandar (Dan R) Posted: July 30, 2007 at 09:49 PM (#2462012)
If you're a pure prime voter, Thomas *clearly* belongs IMO--his 10 full seasons are eminently HoM-worthy. But he had *no* shoulder stat-padding seasons, and only had one MVP-type peak year. There's not enough career for me there, given how easy the early-aughts NL was to dominate.
   317. Joey Numbaz (Scruff) Posted: July 31, 2007 at 03:56 PM (#2462879)
Dan - I imported everything into a database last night - that was as far as I could get though.

Just curious what years does this cover? Do you go back before 1893? To at least 1893? AL/NL I see are both included now, is that for all years?

Just want to know what gaps I'm going to have to fill in.

Also, you do adjust offense based on the replacement level for a particular season, right? So a 1977 SS doesn't have to hit like a 2007 SS to get the same credit? Do you combine leagues to find this replacement level, or is it different for each league?

Sorry if this is rehash, but it's been awhile. Thanks!
   318. Jim Sp Posted: July 31, 2007 at 04:57 PM (#2462960)
Aha, I just figured out a puzzle.

BBRef's OPS+ excludes pitchers from the league baseline, so a 100 OPS+ produces positive BWAA in Dan R's system.

I ran some correlations for the 1953-1972 period (no DH) and you can very well estimate BWAA1 from OPS+.

For the period:

BWAA1 = SFrac * [(OPS+) - 90.8] / 11.43

Adding a gross fielding adjustment is easy as well, just add an ops+ adjustment based on fielding position and you've got a pretty good approximation to Dan's WARP:

C: 20.5
1B: 2.7
2B: 27.2
3B: 14.1
SS: 38.5
LRF: 9.6
CF: 15.3

The only thing you need to do then is adjust up or down based on above/below average baserunning and fielding compared to average at the position.

Now my system long ago was a [(ERA+) - 90] * IP system for pitchers, which I then carried over to position players, just as above. Somewhat different replacement levels per position, better FWAA estimation, etc...Dan's method has better methodology but the same idea and the same overall replacement level (about 90 OPS+/ERA+, in other words a high one).

No wonder I like Dan's numbers so much.

In a rush, so I may have typo'd a detail or two.
   319. Jim Sp Posted: July 31, 2007 at 05:13 PM (#2462982)
The ops+ adjustments are just 11.43*[average of dan's rep].
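
For anyone who wants to try the approximation, here is a minimal Python sketch of the formula and positional adjustments posted above. The function names are made up for illustration, the constants are only the 1953-1972 fit quoted in the post, and the WARP estimate assumes average baserunning and average fielding for the position:

```python
# Constants fitted for 1953-1972 (no DH), as quoted in the post above.
OPS_PLUS_BASELINE = 90.8
OPS_PLUS_PER_WIN = 11.43
POSITION_ADJ = {"C": 20.5, "1B": 2.7, "2B": 27.2, "3B": 14.1,
                "SS": 38.5, "LRF": 9.6, "CF": 15.3}

def approx_bwaa1(sfrac, ops_plus):
    """Estimate BWAA1 from season fraction and OPS+."""
    return sfrac * (ops_plus - OPS_PLUS_BASELINE) / OPS_PLUS_PER_WIN

def approx_warp(sfrac, ops_plus, position):
    """BWAA estimate plus the gross positional adjustment (in OPS+ points)."""
    adjusted = ops_plus + POSITION_ADJ[position]
    return sfrac * (adjusted - OPS_PLUS_BASELINE) / OPS_PLUS_PER_WIN

# Full-season shortstop versus corner outfielder, both with a 100 OPS+.
print(approx_warp(1.0, 100, "SS"), approx_warp(1.0, 100, "LRF"))
```

Dividing an entry of POSITION_ADJ by 11.43 recovers the average Rep in wins that the adjustment was built from.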
   320. David Concepcion de la Desviacion Estandar (Dan R) Posted: July 31, 2007 at 05:34 PM (#2463009)
1. This covers all AL and NL (but not FL or Negro League) position player-seasons over 50 PA from 1893 to 2005. I can't do WARP for pre-1893 guys because I can't get any run estimator to be accurate enough.

2. I'm not sure what you mean by whether I "adjust offense." WARP are broken up into four components, in both the standard deviation-adjusted (2) and unadjusted (1) versions: batting wins above average (which include double play avoidance post-1959), baserunning wins above average (which include non-SB baserunning post-1972), fielding wins above positional average, and Rep, which is the difference in wins between a replacement player at the position and a player hitting and fielding at the league average in the given amount of playing time. If a SS in 1977 and 2007 have the same OPS+ (roughly), the same playing time, and the same league (AL or NL), they will have the same BWAA. If a SS in 1977 and 2007 have the same SB/CS + EqBR, they will have the same BRWAA. If they have the same FRAA and FWS-converted-to-FRAA (not exactly, because I use UZR for recent years, but you get the idea), they will have the same FWAA.

The only differences in the treatment of a SS in 1977 and 2007 are in the final column, Rep, and then in the standard deviation adjustment. The Rep number is calculated as follows:
Calculate the average standard deviation-adjusted wins below overall league average (BWAA + BRWAA + FWAA) per 162 games of the worst 3/8 of major league starters (both leagues) at the position in the 1985-2005 period. Compare this 1985-2005 average to Nate Silver's empirically determined Freely Available Talent (FAT) level for the position from 1985 to 2005, and record the difference, which is the gap between the FAT level and the worst-regulars average. Then calculate the average standard deviation-adjusted wins below overall league average per 162 games of the worst 3/8 of major league starters at the position in the decade surrounding the year in question (the given year, plus four seasons on either side). That is the worst-regulars average for the year in question. Now add the gap between the FAT level and the worst-regulars average previously determined for 1985-2005 to the worst-regulars average for the year in question to get the FAT level for the year in question.
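
The steps above can be sketched in Python roughly as follows. This is only an illustration: the function names and every input number are hypothetical, and truncating 3/8 of the sample size down to a whole number of players is an assumption the post does not spell out:

```python
def worst_regulars_average(waa_per_162, fraction=3 / 8):
    """Average wins above average per 162 games of the worst `fraction` of starters."""
    ordered = sorted(waa_per_162)              # worst (most negative) first
    k = max(1, int(len(ordered) * fraction))   # assumption: truncate to a whole count
    return sum(ordered[:k]) / k

def fat_level_for_year(worst_avg_window, worst_avg_1985_2005, fat_1985_2005):
    """Shift the window's worst-regulars average by the 1985-2005 FAT gap."""
    gap = fat_1985_2005 - worst_avg_1985_2005  # distance from worst regulars to FAT
    return worst_avg_window + gap

# Hypothetical nine-year window of starters at one position.
window = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
worst = worst_regulars_average(window)
rep_level = fat_level_for_year(worst, worst_avg_1985_2005=-2.5, fat_1985_2005=-3.0)
print(worst, rep_level)
```

The design point is that the empirical FAT study only anchors the gap; the era-specific part comes entirely from the worst-regulars average in the surrounding seasons.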

A quick example may serve to illustrate this: compare 2001 Ricky Gutiérrez with 1982 Bill Russell. Gutiérrez had 606 PA and a 96 OPS+, while Russell had 576 PA and a 99 OPS+. They were, thus, identical hitters. (I actually have them both as 0.6 batting wins above average, so either they were both OBP-heavy or I'm using different park factors than OPS+ is, or maybe they had a below-average IBB/BB rate, or something). Russell added 0.3 wins with his baserunning, while Gutiérrez was just a league-average baserunner, but Gutiérrez's fielding was 0.3 wins better than Russell's. Thus, they accumulated the exact same value above overall league average (BWAA + BRWAA + FWAA) in roughly the same playing time, 0.4 wins above average.

The next step is the standard deviation adjustment. I calculate that the 2005 NL was 94% as easy to dominate as the 2001 NL (with the 6% difference due largely to lower run scoring and a longer time since expansion in 2005), while it was 97% as easy to dominate as the 1982 NL. So to convert 2001 NL wins and 1982 NL wins to 2005 NL wins, we multiply Gutiérrez's 0.4 wins above average by .94, which is still 0.4, and Russell's 0.4 wins above average by .97, which is also still 0.4. (If they were further away from average, this effect would be more pronounced--1894 Billy Hamilton is 8.9 wins above average before accounting for standard deviations, 7.2 wins above average afterwards).

The Rep column, however, is not the same. Using the method explained above, I calculate that replacement shortstops in the 1982 NL would have been 3.8 total 2005 NL wins below average (BWAA + BRWAA + FWAA) per 162 games, while in the 2001 NL they would have been 3.0 total 2005 NL wins below average per 162 games. Multiply that replacement level by their playing time (about 85% of the season in both cases), and we get that the replacement SS in Russell's playing time would have been 3.2 2005 NL wins below average, while the replacement SS in Gutiérrez's time would have been just 2.6 2005 NL wins below average. Thus, Gutiérrez is 0.4 2005 NL wins above average + 2.6 2005 NL wins separating a replacement SS from average in 2001 = 3.0 2005 NL wins above replacement, while Russell is 0.4 2005 NL wins above average + 3.2 2005 NL wins separating a replacement SS from average in 1982 = 3.6 2005 NL wins above replacement. Thus, you see 2001 Ricky Gutiérrez with 3.0 WARP2, and 1982 Bill Russell with 3.6 WARP2, despite the fact that both their playing time and their value above average were equal.
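
The arithmetic in this example can be sketched in Python as below. The function name is hypothetical, Rep is passed as a positive number of wins below average, and because the post rounds each intermediate value (and the 85% playing time is approximate) the unrounded results land near, not exactly on, the quoted 3.0 and 3.6:

```python
def warp2(waa_raw, lg_adj, rep_wins_below_avg_per_162, season_fraction):
    """Stdev-adjusted wins above average plus the replacement gap for the playing time."""
    waa_adj = waa_raw * lg_adj                              # convert to 2005 NL wins
    rep_gap = rep_wins_below_avg_per_162 * season_fraction  # scale Rep to playing time
    return waa_adj + rep_gap

# Inputs quoted in the post: 0.4 raw wins above average for both players,
# about 85% of a season each, league factors .94 (2001 NL) and .97 (1982 NL).
gutierrez_2001 = warp2(0.4, 0.94, 3.0, 0.85)
russell_1982 = warp2(0.4, 0.97, 3.8, 0.85)
print(round(gutierrez_2001, 2), round(russell_1982, 2))
```

Nearly all of the difference between the two comes from the Rep term, which is the point of the example.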

A final caveat: the 0.6 wins per 162 games DH adjustment is applied to the replacement level--logically, since a replacement player generating 50 runs a season would be say 2 wins below average in an NL where the average player generates 70 runs a season, and 2.6 wins below average in an AL where the average player generates 76 runs a season. So BWAA and BRWAA are NOT comparable between the AL and NL post-1973. The bottom-line statistic is WARP2, which incorporates all the necessary adjustments and is the one I use to base my voting on.
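
A small Python sketch of the DH caveat, using the run totals from the paragraph above. The 10 runs per win conversion is an assumption implied by the example (20 runs = 2 wins), and the function name is made up for illustration:

```python
RUNS_PER_WIN = 10.0  # assumption implied by the example above

def wins_below_average(replacement_runs, league_avg_runs, runs_per_win=RUNS_PER_WIN):
    """How far a replacement player sits below the league-average player, in wins."""
    return (league_avg_runs - replacement_runs) / runs_per_win

nl_gap = wins_below_average(50, 70)   # NL: average player generates 70 runs
al_gap = wins_below_average(50, 76)   # AL with the DH: average player generates 76 runs
dh_adjustment = al_gap - nl_gap       # the extra distance created by the DH
print(nl_gap, al_gap, round(dh_adjustment, 1))
```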
   321. David Concepcion de la Desviacion Estandar (Dan R) Posted: July 31, 2007 at 05:38 PM (#2463015)
Jim Sp, thanks very much for your helpful comments. If my BWAA1 did NOT correlate extremely highly to OPS+, I'd be very, very concerned. I owe you an email which I'll get to now.
   322. David Concepcion de la Desviacion Estandar (Dan R) Posted: July 31, 2007 at 11:07 PM (#2463694)
Jim Sp, also, the replacement level in my system is definitely not 90 OPS+. If 90 OPS+ = league average offense with the pitcher included, then replacement clearly has to be below 90, since replacement is below average! That said, that's definitely why Russell and Gutiérrez show positive BWAA in my system despite having sub-100 OPS+; thanks for catching it.
   323. Jim Sp Posted: July 31, 2007 at 11:25 PM (#2463724)
I should say replacement level for players with very minimal defensive value, such as Frank Howard in the OF, is about 90 OPS+.
   324. DL from MN Posted: August 04, 2007 at 05:40 PM (#2470839)
Do you check the normality of your distributions?
   325. David Concepcion de la Desviacion Estandar (Dan R) Posted: August 05, 2007 at 01:47 AM (#2471281)
Yes, and they come out pretty good. Here are the # of player-seasons by starters with a given # of standard deviation-adjusted wins above average per 162 games from 1893 to the present:

Wins above average    Player-seasons
Below -8              2
-8 to -7              6
-7 to -6              36
-6 to -5              91
-5 to -4              261
-4 to -3              673
-3 to -2              1209
-2 to -1              1794
-1 to 0               2457
0 to 1                2685
1 to 2                2581
2 to 3                2093
3 to 4                1574
4 to 5                992
5 to 6                585
6 to 7                288
7 to 8                134
8 to 9                65
9 to 10               29
10 to 11              18
11 to 12              5
12 to 13              3
13 to 14              3

The mean is 0.9 wins above average, which sounds right given that I'm only including starters. The stdev is 2.6. Kurtosis is .25 and skewness is .23, which are both what you would expect--the presence of Bonds/Ruth/Williams etc. means we should have slightly fat tails, and the fact that good players get more playing time while bad players lose their jobs means the distribution should be slightly right-skewed. But it's more than normal enough for the stdev to be a useful measure.
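
As a sanity check, the mean and stdev can be roughly recovered from the binned counts alone. The Python sketch below places every player-season at its bin midpoint and treats the open-ended "Below -8" bin as -8.5; both are assumptions, so the results only approximate the figures computed from the underlying data:

```python
import math

# Player-season counts from the post, from "Below -8" up to "13 - 14".
counts = [2, 6, 36, 91, 261, 673, 1209, 1794, 2457, 2685, 2581, 2093,
          1574, 992, 585, 288, 134, 65, 29, 18, 5, 3, 3]
# Bin midpoints: -8.5, -7.5, ..., 13.5 (assumption: midpoint stands in for the bin).
midpoints = [-8.5 + i for i in range(len(counts))]

n = sum(counts)
mean = sum(m * c for m, c in zip(midpoints, counts)) / n
variance = sum(c * (m - mean) ** 2 for m, c in zip(midpoints, counts)) / n
stdev = math.sqrt(variance)
print(n, round(mean, 2), round(stdev, 2))
```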
   326. Dandy Little Glove Man Posted: August 06, 2007 at 06:56 PM (#2473987)
Hey Dan, provided that I have the most recent update of your WARP, there's one spot where your league adjustment just seems wrong to me at a glance. In 1998, the AL adjustment goes up to 1.048 from 1.025 in 1997, while the NL adjustment falls to .907 from .946. The difference between the two leagues is even much larger than in the 1993 expansion year. Could it be that you credited the NL with a 2-team expansion and the AL with no expansion at all? While this is technically true, in terms of diluting talent through the addition of expansion teams, the effects should be split fairly evenly. The AL replaced a 78-83 team with a team that has hovered around 100 losses for its entire existence. In fact, from the start of interleague play in 1997 through 2005, the NL had a winning record overall against the AL. This doesn't seem consistent with the record .13 to .14 disparities between the leagues beginning in 1998.
   327. David Concepcion de la Desviacion Estandar (Dan R) Posted: August 06, 2007 at 08:48 PM (#2474156)
Dandy Little Glove Man--yes, you are entirely correct that I credit the NL with two expansion teams in 1998 and the AL with 0, which accounts for the change in league adjustment factor. The 1998 Brewers were only 11 wins better than the 1998 Devil Rays--enough that maybe I should give the AL say 20% of the expansion adjustment, and the NL 80%, but not more than that, since the point is that two teams which combined for a .429 winning percentage were added to the NL, while the AL only replaced one losing team with a more-losing team. I'm not particularly inclined to change it, though, because the projections match the results much better if I don't. Standard deviations in the AL were actually slightly higher than those in the NL through 1996, but starting in 1998 the NL ones become *much* higher than the AL ones and stay there for the following years. This correlates very nicely to the 1998 expansion of the NL, and suggests that that league bore the vast majority of the expected expansion effect on standard deviation.
   328. Dandy Little Glove Man Posted: August 06, 2007 at 09:28 PM (#2474204)
Standard deviations in the AL were actually slightly higher than those in the NL through 1996, but starting in 1998 the NL ones become *much* higher than the AL ones and stay there for the following years. This correlates very nicely to the 1998 expansion of the NL, and suggests that that league bore the vast majority of the expected expansion effect on standard deviation.

I think that another event had a substantial effect on the phenomenon you attribute to expansion: the absolute best hitter in the AL switched to the NL at the trading deadline in 1997. McGwire had posted 200+ OPS+ figures in the 1995 and 1996 AL, easily outpacing his nearest competition. While Pedro Martinez moved to the AL in 1998, the Randy Johnson trade basically made that a moot point. No comparable hitting talent went to the AL at that time, and I believe that one player switching leagues can have a highly significant effect on the standard deviations if he is enough of an outlier. McGwire certainly meets that requirement. The expansion draft affected all teams equally, and as I said previously, the NL had a slight winning record in interleague play from its late-90s inception through 2005. Regardless of whether you would define those league-switching Brewers as mediocre or sub-mediocre, the AL was not nearly as superior to the NL in the aftermath of the 1998 expansion as your current numbers indicate.
   329. David Concepcion de la Desviacion Estandar (Dan R) Posted: August 07, 2007 at 12:18 AM (#2474311)
Well, remember that I'm not exactly measuring league quality, I'm measuring standard deviation, but in this particular case (as opposed to many others like the 1910's league disparity) they are probably the same thing.

The stdevs for the 1998 NL and 1998 AL were 3.29 and 2.83 wins above average per 162 games (remember, this is just position players; pitchers have a totally different stdev equation). If I switch McGwire from the NL to the AL in 1998 without changing his raw wins above average, the NL stdev drops to 3.20, and the AL stdev increases to 2.96--yes, a very significant difference. I don't think McGwire *would* have put up quite the season that he did in 1998 if he had been in the AL--he probably would have been about one win worse by my estimation, which would make the AL stdev 2.93 instead of 2.96. Nonetheless, it's a valid point. If I change the expansion adjustment to 1.5 teams in the NL and 0.5 teams in the AL, the LgAdj values for the NL from 1998 to 2005 become .919, .929, .942, .950, .972, .983, .964, and 1.006, while those for the AL become 1.016, 1.024, 1.037, 1.045, 1.042, 1.028, 1.027, and 1.051. If I make the expansion adjustment one team per league, the LgAdj values for the NL from 1998 to 2005 become .932, .941, .953, .961, .981, .991, .971, and 1.012, while those for the AL become .986, .996, 1.01, 1.021, 1.020, 1.01, 1.011, and 1.038. It's a big difference. Here are the stdevs for both leagues from 1995 to 2005. LgAdj is just equal to 3 divided by whatever stdev you want to use for that year. The forecast numbers are of course the ones I use for my own LgAdj calculations.

Year  Actual AL  Forecast AL  Actual NL  Forecast NL
1995  3.11       2.95         2.94       3.24
1996  3.12       3.05         3.02       3.22
1997  2.85       2.93         3.12       3.17
1998  2.83       2.86         3.29       3.31
1999  2.92       2.85         3.00       3.27
2000  2.99       2.82         3.21       3.22
2001  3.00       2.80         3.57       3.19
2002  3.03       2.82         3.22       3.12
2003  2.92       2.86         3.20       3.08
2004  2.67       2.88         3.37       3.14
2005  2.66       2.82         2.91       3.00
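
Since LgAdj is described as 3 divided by the chosen stdev, the conversion is a one-liner. The Python sketch below (function name hypothetical) applies it to the 1998 forecast values in the table; because the table is rounded to two decimals, the results differ from the LgAdj figures quoted earlier in the thread in the third decimal place:

```python
REFERENCE_STDEV = 3.0  # numerator stated in the post

def lg_adj(stdev):
    """League adjustment factor: reference stdev divided by the league's stdev."""
    return REFERENCE_STDEV / stdev

forecast_1998 = {"AL": 2.86, "NL": 3.31}
adj_1998 = {league: lg_adj(sd) for league, sd in forecast_1998.items()}
print({league: round(value, 3) for league, value in adj_1998.items()})
```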


For whatever it's worth, the 2004 and '05 AL had two of the five lowest stdevs of any league-season since 1893, along with the 1907 AL and the 1913 and '15 NL. The '13 and '15 NL numbers are clearly the result of the concentration of stars in the AL, the '07 AL I guess is just random fluctuation. NL stdev is above trendline for 2001-2004 and below it for 2005, although that is basically just the Bonds factor, I think. Could this be evidence of the effect of steroid testing? Or just random variance? I just haven't gotten around to doing 2006 numbers, I probably should just to see whether the low stdevs have continued (it certainly would seem like they did in the AL, where none of the MVP candidates posted particularly remarkable seasons, but Howard and Pujols were both pretty nuts in the NL).

One thing I could try to do would be to toy with the expansion variable so it's not linear. I currently just use "years since expansion" going down from 12 to 0 (so the 1998 NL gets 19 expansion points, 12 for the '98 expansion and 7 for the '93 expansion), but I might be able to improve my r-squared if I had the expansion factor decline at an exponential rather than linear rate. Squaring that number only improves r-squared to 29.0% from 28.2%, not enough to make a meaningful difference, but I'll keep toying with it. I'd love for my regression to be more accurate and explain more variance than it currently does.
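
The linear expansion variable described above can be sketched in Python like this. The function name is hypothetical, and only the two 1990s NL expansions are listed because they are the only ones that matter for the years under discussion:

```python
def expansion_points(year, expansion_years, span=12):
    """Sum of linearly decaying points: `span` in an expansion year, one less each year after."""
    return sum(max(0, span - (year - e)) for e in expansion_years if year >= e)

NL_EXPANSIONS = [1993, 1998]  # only the expansions relevant to 1993-2005

print(expansion_points(1998, NL_EXPANSIONS))  # 12 for 1998 plus 7 left over from 1993
print(expansion_points(2005, NL_EXPANSIONS))  # 1993 has decayed to 0 by now
```

Swapping the linear term for a different decay shape only requires changing the expression inside the sum.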
   330. David Concepcion de la Desviacion Estandar (Dan R) Posted: August 07, 2007 at 12:19 AM (#2474312)
Wow, that didn't format right. Anyways, you can understand it.
   331. Dandy Little Glove Man Posted: August 07, 2007 at 04:30 AM (#2474926)
Great stuff, Dan. Thanks. I didn't realize that you use a different standard deviation / league adjustment for hitters than for pitchers. I tend to view them as inseparable, and I would guess that they have a highly direct relationship. I can understand how the actual standard deviations would lead you to attribute the entire 1998 expansion to the NL, but if the actual standard deviations are your guide, shouldn't you give the AL the majority of the expansion adjustment for 1993? From 1988 to 1992, the NL stdev was 3.13 and the AL 2.88. In what I would term the 1993 expansion era of 93-97, the NL stdev declined to 3.03 while the AL stdev shot up to 3.05 wins above average. The 1998 expansion had almost the exact opposite effect, with the NL stdev rising .23 wins and the AL stdev falling .09 wins for the period from 1998 to 2002.

Obviously it would be improper to give the AL most of the expansion adjustment for the NL's addition of 2 teams in 1993, but I have a theory regarding the relatively similar overall effects of expansion on the leagues in the 90s. There were 2 aspects of 90s expansion that separate it from previous times. First, both leagues were equally subject to the expansion draft even when only the NL was expanding. Players such as Jeff Conine, Carl Everett, and Charlie Hayes were taken from AL teams by the Marlins and Rockies. Second, the NL expansion teams were competitive almost immediately thanks to the aggressive signing of free agents, including several key players from the AL. In their third year of existence, the Rockies were a playoff team with substantial contributions from Dante Bichette and Ellis Burks, and the Rockies and Marlins had a cumulative winning record that year. The Diamondbacks won 100 games in just their second season, and 3 of their top 4 position players -- Luis Gonzalez, Jay Bell, and Matt Williams -- came to Arizona from the AL. Contrast that with the 1977 expansion teams that were league doormats for several years before becoming competitive. Neither Seattle nor Toronto had a season above 67-95 until 1982. This is not to suggest that you give equal expansion credit for 1993 and 1998, but you may want to consider something like 1.5 teams to the NL and .5 teams to the AL each time to account for the talent effects on both leagues.

With regard to the expansion variable, I have typically found in my work with OPS+ and ERA+ figures that values are mostly restored to their previous levels after 5-6 years but that the expansion effects are still quite strong in years 3 and 4. What I have done is not nearly as comprehensive nor as scientific as your work, but the results aren't too far off. OPS+ and ERA+ leaderboards would seem to indicate that the 2004-06 AL is among the most difficult to dominate leagues of all-time, though both the AL and NL from 1993 to 2003 were significantly easier than they had been from 1982 to 1992. As I alluded to before, I strongly disagree with the apparent conclusion from your current league adjustments that the 1998 AL was more difficult than any preceding league year in history, though I wouldn't be surprised if the 2006 AL merits that distinction.
   332. David Concepcion de la Desviacion Estandar (Dan R) Posted: August 07, 2007 at 05:01 AM (#2475011)
I have done a regression on stdev for pitchers--just using RA+, nothing else--and gotten the same r-squared (30%), but with very different variables and coefficients. (I can send you the data if you want).

That is an interesting suggestion about distributing the expansion variable across the leagues. If it increases my r-squared, I will definitely make that change.

I get the best regression results if I let the expansion variable last for *twelve* years, believe it or not--12 in the expansion year, 11 in year 2, 10 in year 3, etc. If I cap the variable at 11 or 13, my accuracy goes down, and continues to decline as I move further away from 12.

Certainly what you are describing anecdotally about OPS+ leaderboards matches my standard deviation findings exactly.

Finally, remember that standard deviation is not the same as quality of play!, contrary to what Stephen Jay Gould would have you think. This is not just a quibble or ###-covering on my part, it is a huge misconception. Standard deviation is standard deviation, nothing more and nothing less. It can move up, down, or not at all with quality of play, depending on where in the distribution the quality of play changes are found. If you weaken a league by adding players at the bottom, as with an expansion, then yes, standard deviation will go up as quality of play goes down. Similarly, if you weaken a league by removing players in the middle, as with the 1901 NL, standard deviation will go up as quality of play goes down as well. However, if you weaken it by removing players at the top, as in the wartime AL or teens NL, standard deviation will go down alongside quality of play, because the absence of stars tightens the distribution. On the flip side, if you strengthen a league by removing players from the bottom, as in the contracted 1900 NL, then you will get the expected result of a lower standard deviation accompanying higher quality of play. Yet if you strengthen it by adding players at the top, as in the teens AL or early integration-era NL, you will increase the standard deviation by creating a "star glut," expanding the distribution. Knowing changes in quality of play tells you *nothing* about the accompanying standard deviation, until you know what caused the change in quality of play. This is why I still apply quality-of-play discounts to seasons like Nap Lajoie's 1901, Gavvy Cravath and George Burns, and AL players of the 50s and 60s, despite the low standard deviation of their leagues.
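
A toy Python illustration of the point above, using an invented nine-player league (the values are made up and carry no historical meaning). Average value stands in for quality of play: removing the star and adding bottom-end players both lower it, yet the stdev moves in opposite directions:

```python
from statistics import mean, pstdev

# Invented wins-above-average values for a tiny league with one superstar.
base = [-3, -2, -1, 0, 0, 1, 2, 3, 6]

without_star = [v for v in base if v != 6]   # weaken by removing the top player
with_expansion = base + [-6, -6]             # weaken by adding players at the bottom

for label, league in [("base", base), ("no star", without_star),
                      ("expansion", with_expansion)]:
    print(label, round(mean(league), 2), round(pstdev(league), 2))
```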

The 1998 AL may not have been more difficult to excel in than any preceding league in history, but even if you give it full expansion credit for the Devil Rays, it was extremely difficult to dominate. Albert Belle's monster second half is especially impressive in light of this.
   333. Jim Sp Posted: August 08, 2007 at 04:48 PM (#2476928)
Dan,
Can you describe in more detail the quality-of-league discounts that you consider appropriate? I understand that the WARP system adjusts only for standard deviation and not for league quality, but above you mention quality-of-play discounts. What are they exactly?
   334. TomH Posted: August 08, 2007 at 05:18 PM (#2476957)
I would recommend studying the expansion vs. std dev relationship further, if you are looking for mathematical ways to better quantify your league quality/strength adjustment factor, Dan. I suspect an exponential-decay type of relationship would likely be the best descriptor.
   335. David Concepcion de la Desviacion Estandar (Dan R) Posted: August 08, 2007 at 06:26 PM (#2477039)
Jim Sp, my quality-of-play adjustments are completely subjective as a voter and do not form part of my WARP system. I reward 1900 for being contracted, ding the early aughts (particularly the AL) for the mega-expansion, tweak the teens AL vs. NL a bit (another reason why Bancroft is not my favorite backlog shortstop), reward the integration-era NL and penalize the AL a nudge (contributing to the placement of Rizzuto and Pesky below Concepción), and will do the same for the last few years' AL/NL disparity whenever current stars become eligible.

TomH, I tried doing it exponentially rather than linearly and it only improved my r-squared by .008, not enough to justify redoing everything from scratch. But I'll keep toying with it and post if I find a meaningfully better fit.
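
For anyone who wants to try the same comparison, a rough sketch of the mechanics follows. Everything numeric here is a placeholder: the `observed` values are invented for illustration (not real league standard deviations), and the decay rate of 4 years is a hypothetical parameter that would need tuning just like the 12-year cap.

```python
from math import exp


def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. the R^2 of a one-variable linear fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


years_since = list(range(16))
# Linear version: 12, 11, 10, ... down to 0.
linear_term = [max(12 - t, 0) for t in years_since]
# Exponential version with a hypothetical decay rate.
exp_term = [exp(-t / 4.0) for t in years_since]

# Invented response values, for illustration only.
observed = [1.30, 1.24, 1.20, 1.15, 1.12, 1.10, 1.07, 1.06,
            1.04, 1.03, 1.02, 1.02, 1.01, 1.00, 1.00, 1.00]

print("linear     :", round(r_squared(linear_term, observed), 3))
print("exponential:", round(r_squared(exp_term, observed), 3))
```

The real test would use the actual league-season data and the full regression, not a single predictor as here.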
   336. Joey Numbaz (Scruff) Posted: August 19, 2007 at 06:38 AM (#2491289)
I put Dan's WARP data into a database and recalculated his Pennants Added to match mine (using W/L data through the end of the 2002 season).

Here are the leaders among the non-HoMer eligibles - quite an interesting list. I'll also list their overall rank among eligibles. This does not include war credit for anyone.

XX  Eddie Murray (55) 1.013
XX  Ryne Sandberg (63) .9688
1.  Graig Nettles (68) .9446
2.  Dave Bancroft (72) .9356
3.  Dave Concepcion (75) .9272
4.  Tommy Leach (80) .9094
5.  Bert Campaneris (85) .9035
6.  John McGraw (93) .8810
7.  Toby Harrah (95) .8775
8.  Bob Johnson (96) .8773
9.  Reggie Smith (100) .8705
10. Vern Stephens (102) .8590
11. Buddy Bell (104) .8582
12. Rabbitt Maranville (105) .8577
13. Andre Dawson (106) .8498
14. Brett Butler (107) .8413
14. Dick Bartell (111) .8181
15. Bobby Bonds (112) .8164
16. Kiki Cuyler (113) .8085
17. Jose Cruz (116) .8063
18. Ron Cey (117) .8002
19. Norm Cash (118) .7901
20. Roy Thomas (119) .7891
21. Amos Otis (123) .7699
22. Chuck Klein (124) .7692
23. Bob Elliott (125) .7683
24. Chet Lemon (126) .7617
25. Tony Lazzeri (128) .7543
26. Pie Traynor (129) .7537
27. Fielder Jones (131) .7481
28. Bobby Veach (132) .7455
29. Ken Singleton (134) .7436
30. Jim Fregosi (135) .7419


If you give Rizzuto 3 years at his 1941 level he'd be first on the list of backloggers (.9602).

Can we try to come up with a comprehensive list of position player candidates who could conceivably get in with war credit (i.e. players that would be worth figuring)?

Here's everyone that retired between 1943 and 1962 with at least .5 Pennants Added and hasn't been elected:

Red Schoendienst
Gil Hodges
Gene Woodling
Jackie Jensen
Gil McDougald
Alvin Dark
Mickey Vernon
Andy Pafko
Carl Furillo
Phil Rizzuto
Al Rosen
Vern Stephens
Eddie Joost
Sid Gordon
Johnny Pesky
Bob Elliott
Bill Nicholson
Eddie Stanky
Dom DiMaggio
Marty Marion
Tommy Henrich
Tommy Holmes
Augie Galan
Dixie Walker
Jeff Heath
Ken Keltner
Lonny Frey
Rudy York
Ernie Lombardi
Cecil Travis
Roy Cullenbine
Bob Johnson
Harlond Clift
Dolph Camilli
Dick Bartell

That would cover it, right?
   337. Joey Numbaz (Scruff) Posted: August 19, 2007 at 07:18 AM (#2491299)
Here are the Dan R. MVP Awards, btw (pitchers not eligible):

Year LG First Last WAR
1893 NL Ed Delahanty 9.5
1894 NL Billy Hamilton 9.5
1895 NL Hughie Jennings 9.5
1896 NL Hughie Jennings 12.5
1897 NL Hughie Jennings 10
1898 NL Hughie Jennings 9.9
1899 NL John McGraw 10.8
1900 NL Honus Wagner 9.1
1901 AL Nap Lajoie 11.7
1901 NL Honus Wagner 9.9
1902 NL Honus Wagner 10.1
1902 AL Ed Delahanty 8
1903 NL Honus Wagner 9.7
1903 AL Nap Lajoie 8.5
1904 AL Nap Lajoie 10.4
1904 NL Honus Wagner 9.5
1905 AL George Davis 6.7
1905 NL Honus Wagner 11.5
1906 NL Honus Wagner 11.4
1906 AL Nap Lajoie 10.2
1907 NL Honus Wagner 11.5
1907 AL Ty Cobb 7.9
1908 AL Nap Lajoie 8
1908 NL Honus Wagner 13.7
1909 NL Honus Wagner 10.4
1909 AL Ty Cobb 11.2
1910 AL Ty Cobb 11
1910 NL Honus Wagner 7.5
1911 AL Ty Cobb 11.2
1911 NL Honus Wagner 7.6
1912 AL Tris Speaker 11.9
1912 NL Honus Wagner 9.7
1913 AL Eddie Collins 9.3
1913 NL Gavvy Cravath 6.1
1914 NL George Burns 6.6
1914 AL Tris Speaker 10.9
1915 AL Ty Cobb 11.6
1915 NL Gavvy Cravath 8.2
1916 AL Tris Speaker 9.4
1916 NL Zack Wheat 7.2
1917 AL Ty Cobb 12.3
1917 NL Rogers Hornsby 8.5
1918 AL Ty Cobb 7.9
1918 NL Heinie Groh 8.2
1919 AL Babe Ruth 9.7
1919 NL Heinie Groh 7.8
1920 AL Babe Ruth 12.7
1920 NL Rogers Hornsby 11
1921 AL Babe Ruth 13.7
1921 NL Rogers Hornsby 11.4
1922 NL Rogers Hornsby 11.5
1922 AL Ken Williams 8.2
1923 NL Rogers Hornsby 6.8
1923 AL Babe Ruth 15.7
1924 AL Babe Ruth 11.8
1924 NL Rogers Hornsby 13
1925 NL Rogers Hornsby 9.6
1925 AL Al Simmons 7.3
1926 AL Babe Ruth 11.2
1926 NL Paul Waner 6.6
1927 NL Rogers Hornsby 9.7
1927 AL Babe Ruth 12.3
1928 AL Babe Ruth 9.3
1928 NL Rogers Hornsby 9.3
1929 NL Rogers Hornsby 10.1
1929 AL Al Simmons 7.8
1929 AL Jimmie Foxx 7.8
1930 NL Chuck Klein 7.3
1930 AL Joe Cronin 9.5
1931 AL Babe Ruth 10.2
1931 NL Wally Berger 6.9
1932 AL Jimmie Foxx 10.6
1932 NL Chuck Klein 8.3
1933 NL Chuck Klein 9.6
1933 AL Jimmie Foxx 9.3
1934 NL Arky Vaughan 10.4
1934 AL Lou Gehrig 10.5
1935 AL Jimmie Foxx 8.7
1935 NL Arky Vaughan 11.6
1936 NL Arky Vaughan 10
1936 AL Lou Gehrig 9.7
1937 NL Joe Medwick 8.5
1937 AL Joe DiMaggio 8.1
1938 NL Arky Vaughan 9.3
1938 AL Joe Cronin 8.5
1939 AL Joe DiMaggio 8.1
1939 NL Johnny Mize 8
1940 AL Lou Boudreau 8.1
1940 NL Johnny Mize 7.5
1941 NL Pete Reiser 8.6
1941 AL Ted Williams 11.4
1942 NL Enos Slaughter 8.5
1942 AL Ted Williams 13.2
1943 AL Luke Appling 9.7
1943 NL Stan Musial 10.9
1944 NL Stan Musial 9.4
1944 AL Snuffy Stirnweiss 9.6
1945 AL Snuffy Stirnweiss 9.4
1945 NL Tommy Holmes 8.3
1946 AL Ted Williams 12.6
1946 NL Stan Musial 9.7
1947 NL Ralph Kiner 7.5
1947 AL Ted Williams 10.5
1948 AL Lou Boudreau 10.1
1948 NL Stan Musial 10.2
1949 NL Stan Musial 9.4
1949 AL Ted Williams 9.7
1950 AL Phil Rizzuto 8.5
1950 NL Eddie Stanky 7.3
1951 NL Jackie Robinson 9.9
1951 AL Ted Williams 7.6
1952 AL Larry Doby 7
1952 NL Jackie Robinson 8.6
1953 AL Al Rosen 9.5
1953 NL Stan Musial 7.4
1954 NL Willie Mays 10.2
1954 AL Minnie Minoso 8.3
1955 NL Willie Mays 9.7
1955 AL Mickey Mantle 9.3
1956 AL Mickey Mantle 11.2
1956 NL Duke Snider 7.4
1957 NL Willie Mays 7.9
1957 AL Mickey Mantle 12.1
1958 NL Willie Mays 9.3
1958 AL Mickey Mantle 9.6
1959 NL Ernie Banks 10
1959 AL Mickey Mantle 7.5
1960 NL Ernie Banks 8.2
1960 AL Roger Maris 7.7
1961 NL Hank Aaron 8.5
1961 AL Mickey Mantle 10.5
1962 NL Willie Mays 8.3
1962 AL Mickey Mantle 6.9
1963 AL Carl Yastrzemski 6.4
1963 NL Hank Aaron 9.2
1964 AL Brooks Robinson 7.1
1964 NL Willie Mays 8.8
1965 NL Willie Mays 9.5
1965 AL Zoilo Versalles 6.3
1966 AL Frank Robinson 9
1966 NL Ron Santo 8.2
1967 NL Ron Santo 8.6
1967 AL Carl Yastrzemski 10
1968 NL Hank Aaron 7.4
1968 AL Carl Yastrzemski 8.7
1969 AL Rico Petrocelli 9.9
1969 NL Willie McCovey 8
1970 AL Jim Fregosi 7.3
1970 NL Johnny Bench 6.7
1971 AL Bobby Murcer 7.9
1971 NL Willie Stargell 7.3
1972 AL Dick Allen 9.1
1972 NL Joe Morgan 10.8
1973 AL Bobby Grich 8.1
1973 NL Joe Morgan 9.9
1974 AL Rod Carew 7.9
1974 NL Joe Morgan 9.5
1975 AL Rod Carew 8.6
1975 AL Toby Harrah 8.6
1975 NL Joe Morgan 11.9
1976 AL Bobby Grich 7.1
1976 AL George Brett 7.1
1976 NL Joe Morgan 10.5
1976 AL Graig Nettles 7.1
1977 NL Joe Morgan 7.6
1977 AL Rod Carew 8.7
1978 NL Dave Parker 6.5
1978 AL Jim Rice 7.3
1979 AL Fred Lynn 8.2
1979 NL Mike Schmidt 8.3
1980 NL Mike Schmidt 9.4
1980 AL George Brett 9.4
1981 NL Mike Schmidt 11.5
1981 AL Dwight Evans 10.2
1982 NL Mike Schmidt 7.5
1982 AL Robin Yount 10.3
1983 AL Cal Ripken 10.1
1983 NL Dickie Thon 8.6
1984 AL Cal Ripken 10.6
1984 NL Ryne Sandberg 8.1
1985 AL George Brett 9.3
1985 NL Pedro Guerrero 8.5
1986 NL Tim Raines 7.3
1986 AL Cal Ripken 8
1987 NL Tony Gwynn 8.2
1987 AL Alan Trammell 9.2
1988 AL Wade Boggs 8.1
1988 AL Jose Canseco 8.1
1988 NL Kirk Gibson 7.8
1989 NL Will Clark 8.5
1989 AL Rickey Henderson 7.6
1990 NL Barry Bonds 8.2
1990 AL Rickey Henderson 10.1
1991 AL Cal Ripken 11.6
1991 NL Barry Larkin 7.9
1992 AL Frank Thomas 7.6
1992 NL Barry Bonds 10.3
1993 NL Barry Bonds 9.3
1993 AL John Olerud 8.4
1994 NL Jeff Bagwell 10.9
1994 AL Frank Thomas 9.4
1995 NL Barry Bonds 7.8
1995 AL Tim Salmon 9
1996 NL Barry Bonds 8.9
1996 AL Alex Rodriguez 9.1
1997 AL Ken Griffey 8.7
1997 NL Mike Piazza 9
1998 AL Alex Rodriguez 8.5
1998 NL Mark McGwire 8.7
1999 AL Derek Jeter 9.5
1999 NL Jeff Bagwell 7.1
2000 AL Alex Rodriguez 12.3
2000 NL Barry Bonds 8.8
2001 NL Barry Bonds 14.3
2001 AL Alex Rodriguez 10.7
2002 AL Alex Rodriguez 11.2
2002 NL Barry Bonds 13
2003 AL Alex Rodriguez 8.7
2003 NL Albert Pujols 10.3
2004 AL Miguel Tejada 7.7
2004 NL Barry Bonds 12.1
2005 NL Chase Utley 7.5
2005 AL Alex Rodriguez 8.9 
   338. Joey Numbaz (Scruff) Posted: August 19, 2007 at 07:27 AM (#2491302)
And a list of how many each has won:

First Last Total
Honus Wagner 13
Rogers Hornsby 10
Barry Bonds 9
Babe Ruth 9
Mickey Mantle 7
Alex Rodriguez 7
Ty Cobb 7
Willie Mays 7
Stan Musial 6
Ted Williams 6
Joe Morgan 6
Nap Lajoie 5
Hughie Jennings 4
Cal Ripken 4
Mike Schmidt 4
Arky Vaughan 4
Jimmie Foxx 4
Rod Carew 3
George Brett 3
Chuck Klein 3
Tris Speaker 3
Hank Aaron 3
Carl Yastrzemski 3
Ernie Banks 2
Lou Boudreau 2
Lou Gehrig 2
Johnny Mize 2
Gavvy Cravath 2
Al Simmons 2
Ed Delahanty 2
Jackie Robinson 2
Jeff Bagwell 2
Ron Santo 2
Heinie Groh 2
Joe DiMaggio 2
Bobby Grich 2
Rickey Henderson 2
Joe Cronin 2
Frank Thomas 2
Snuffy Stirnweiss 2
George Davis 1
Graig Nettles 1
Fred Lynn 1
George Burns 1
Derek Jeter 1
Alan Trammell 1
Albert Pujols 1
Barry Larkin 1
Billy Hamilton 1
Bobby Murcer 1
Brooks Robinson 1
Dickie Thon 1
Dave Parker 1
Frank Robinson 1
Dick Allen 1
Duke Snider 1
Dwight Evans 1
Eddie Collins 1
Eddie Stanky 1
Enos Slaughter 1
Chase Utley 1
Tommy Holmes 1
John McGraw 1
Ralph Kiner 1
Rico Petrocelli 1
Robin Yount 1
Roger Maris 1
Ryne Sandberg 1
Tim Raines 1
Pete Reiser 1
Toby Harrah 1
Pedro Guerrero 1
Tony Gwynn 1
Wade Boggs 1
Wally Berger 1
Will Clark 1
Willie McCovey 1
Willie Stargell 1
Zack Wheat 1
Tim Salmon 1
Kirk Gibson 1
Jim Rice 1
Joe Medwick 1
Al Rosen 1
John Olerud 1
Zoilo Versalles 1
Johnny Bench 1
Jose Canseco 1
Phil Rizzuto 1
Ken Williams 1
Jim Fregosi 1
Larry Doby 1
Luke Appling 1
Mark McGwire 1
Miguel Tejada 1
Mike Piazza 1
Minnie Minoso 1
Paul Waner 1
Ken Griffey Jr 1
   339. Joey Numbaz (Scruff) Posted: August 19, 2007 at 07:38 AM (#2491307)
By the way, Honus' 13 MVPs are consecutive - 1900-1912. Wow.
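
A quick way to double-check both the totals and the streak is sketched below, using the NL winners for 1899-1913 copied from the table above. The helper name `longest_streak` is made up for this example, and the dictionary layout assumes one winner per league-year, so the tied AL seasons would need a different structure.

```python
from collections import Counter

# NL winners for 1899-1913, copied from the table above.
nl_winners = {1899: "John McGraw", 1913: "Gavvy Cravath"}
nl_winners.update({year: "Honus Wagner" for year in range(1900, 1913)})


def longest_streak(winners_by_year, player):
    """Length of the player's longest run of consecutive winning years."""
    best = current = 0
    previous_win = None
    for year in sorted(winners_by_year):
        if winners_by_year[year] == player:
            # Extend the run only if the previous win came the year before.
            current = current + 1 if previous_win == year - 1 else 1
            previous_win = year
            best = max(best, current)
    return best


totals = Counter(nl_winners.values())
print(totals["Honus Wagner"])                      # 13
print(longest_streak(nl_winners, "Honus Wagner"))  # 13
```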
   340. David Concepcion de la Desviacion Estandar (Dan R) Posted: August 19, 2007 at 10:20 AM (#2491319)
Joe, thanks very much for your very helpful compilation and analysis of my work. I'm surprised to see how different the Pennants Added numbers are from the salary estimator numbers I use--Reggie Smith is one of my favorite candidates, but his lack of in-season durability kills him on Pennants Added, I suppose. My basic contention--and this has been *endlessly* debated on these boards--is that SS was uniquely weak in the 1970's, and the presence of not just Concepción but Campaneris and Harrah as well is evidence of this.

I never checked to see that Wagner had 13 straight MVP's. That's insane...but also totally plausible. (Who else was in the aughts NL?) WS gives the 1901 award to Burkett by 1 WS, the 1910 one to Magee by 5 over Hofman and 6 over Wagner, and the 1911 one to Schulte by 1 over Wagner. I have Burkett as a below-average fielder in 1901 and Magee as a poor one in 1910, which explains it. I'd have to dig in a bit deeper to see what's going on in 1911.
   341. David Concepcion de la Desviacion Estandar (Dan R) Posted: August 19, 2007 at 10:23 AM (#2491320)
Mantle has a nice run too, I have him as the AL MVP every year from 1955 to 1962 except for 1960, when Maris juuust edges him out by 0.2. (WS gives him all 8 in a row, although 1959 is a tie with Fox).
   342. Joey Numbaz (Scruff) Posted: August 19, 2007 at 11:19 AM (#2491323)
BTW - ARod is #23 on the pennants added list without even counting 2006-07. He'll be #19 after this season most likely.
   343. David Concepcion de la Desviacion Estandar (Dan R) Posted: August 19, 2007 at 12:09 PM (#2491329)
That's not at all surprising. A-Rod through 2005 is very close to Arky Vaughan's career in my opinion, which is to say an inner-circle Hall of Famer. A SS who hit like a Hall of Fame OF, fielded above average, and played every day in an exceedingly difficult-to-dominate league. I don't quite think he can make it into the first group (Ruth/Bonds/TWilliams/Wagner/Cobb), but I think it's highly likely he'll break into the second (Mays/Speaker/Hornsby/Aaron/Collins/Mantle/Lajoie) by the time he's done.
   344. Paul Wendt Posted: August 20, 2007 at 02:10 PM (#2492107)
>>Can we try to come up with a comprehensive list of position player candidates who could conceivably get in with war credit (i.e. players that would be worth figuring)?
<<

Here's everyone that retired between 1943 and 1962 with at least .5 Pennants Added and hasn't been elected:
. . .

That would cover it, right?


There were some players in the service during 1942 (even part of 1941, eg Greenberg?), so I would go back to cover anyone who last played in 1941.

There were other wars. Marc has recently focused on Korea, whose veteran Willie Mays played in 1973, but that is a poor guideline.
   345. Joey Numbaz (Scruff) Posted: August 20, 2007 at 03:57 PM (#2492236)
Right Paul - my pitcher data covers a ton of players who missed time for Korea and Vietnam, the 1969 MacMillan is a great source for that data.

1918 also. Really impacted Rixey, Shocker and Shawkey to name a few.

Of course I cannot find mine right now, I purposely didn't pack it when I moved, it made the drive with me. But I've since misplaced it, and the office is a mess right now, and the books that were packed haven't been unpacked yet. But I still can't find the damned thing.
   346. Jim Sp Posted: August 20, 2007 at 10:00 PM (#2492762)
What's the difference between the pennants added in the spreadsheet and the pennants added as calculated by Joe?
   347. Jim Sp Posted: August 20, 2007 at 10:02 PM (#2492763)
In particular is Joe's recalc of Dan R's Pennants Added directly comparable with Joe's Pennants Added in the pitchers section?
   348. David Concepcion de la Desviacion Estandar (Dan R) Posted: August 21, 2007 at 12:27 AM (#2492920)
I have no idea, but I used a formula Joe Dimino gave me in order to calculate my Pennants Added, so it would be strange if they were different...perhaps they have his league quality adjustments?

By the way, I find it interesting that so many people are moving up Leach based on my system, when I myself am not particularly impressed by him. I'm glad to see people are finding their own uses for the data!
   349. David Concepcion de la Desviacion Estandar (Dan R) Posted: August 23, 2007 at 05:57 PM (#2496767)
I'm thinking of tweaking the system to modify the flat 0.5 wins/162 games CF vs. LF/RF bonus for pre-1930. Have any quantitative studies been done on the depth of these positions historically that anyone could alert me to as a reference?
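
For reference, the flat bonus can be sketched as below. The function name is hypothetical, and prorating by games played in CF is an assumption about how the per-162 figure is applied; an era-specific tweak would amount to passing a different `bonus_per_162` for pre-1930 seasons.

```python
def cf_bonus_wins(games_in_cf, bonus_per_162=0.5):
    """Wins credited for playing CF rather than a corner OF spot,
    prorated by games (assumed application of the flat per-162 bonus)."""
    return bonus_per_162 * games_in_cf / 162


# A full 162 games in CF earns the whole bonus; half a season earns half.
print(cf_bonus_wins(162))
print(cf_bonus_wins(81))
```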
   350. Joey Numbaz (Scruff) Posted: August 23, 2007 at 06:29 PM (#2496823)
I adjust the formula every 'year' by including the W-L records of all teams only through the current election in the formula (it uses Standard Deviations of all teams). That's probably where the mix-up is.
   351. AROM Posted: August 23, 2007 at 06:53 PM (#2496867)
and AL players of the 50s and 60s, despite the low standard deviation of their leagues.


Dan, what is your reason for counting the AL as weaker than the NL for this period? I looked at players who played in both leagues in the 1950's once, and found no significant differences in their quality of play from one league to the other. I didn't check the 1960's, and someone may have gotten different results with a different method. I'm just wondering what you are basing this on.
   352. Joey Numbaz (Scruff) Posted: August 23, 2007 at 07:02 PM (#2496878)
ARom - I know the Prospectus timeline adjustments show the NL as much stronger during this period. Just eyeballing the players in each league, I don't see how one could come to any other conclusion.
   353. Joey Numbaz (Scruff) Posted: August 23, 2007 at 07:03 PM (#2496879)
I was referring to the 1960s. I didn't see the 1950s reference, and don't remember the results for that timeframe off the top of my head.
   354. Joey Numbaz (Scruff) Posted: August 23, 2007 at 07:06 PM (#2496886)
Oh yeah, I did one other thing - I zeroed out all negative seasons. So if Dan gave a player negative WAR I treated it as zero. That probably had a bigger impact than the slightly different standard deviations.
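
A small sketch of what zeroing out negative seasons does to a career total. The career line is hypothetical, not a real player, and plain sums stand in for the Pennants Added formula, which is not spelled out in this thread.

```python
def career_total(season_war, zero_out_negatives=True):
    """Sum seasonal WAR, optionally treating below-replacement seasons as zero."""
    if zero_out_negatives:
        return sum(max(w, 0.0) for w in season_war)
    return sum(season_war)


# Hypothetical career line with two below-replacement seasons.
seasons = [3.1, 5.4, -0.6, 2.0, -1.2]
print(round(career_total(seasons), 1))                            # 10.5
print(round(career_total(seasons, zero_out_negatives=False), 1))  # 8.7
```

Players with several poor seasons at the start or end of their careers gain the most from this treatment.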
   355. TomH Posted: August 23, 2007 at 07:46 PM (#2496951)
the original study by Dallas Adams in the early 1980s had the NL much stronger in those periods (referenced and shown in chart form in The Hidden Game of Baseball).
   356. AROM Posted: August 23, 2007 at 07:55 PM (#2496972)
Thanks guys. The NL had more star players, but without interleague play baseball is a zero sum game. For every HR Willie or Hank hit, some pitcher gave it up, and if the pitchers don't look worse in the NL, and the AL doesn't have as many stars, then the NL has to be balancing the stars out with some dudes slugging .297 or something.

I do own The Hidden Game of Baseball, so I'll have to see what they did.
   357. David Concepcion de la Desviacion Estandar (Dan R) Posted: August 23, 2007 at 09:48 PM (#2497151)
AROM, who has given up--the AL was weaker than the NL in the 50's and 60's because it was much slower to integrate than the NL was.
   358. AROM Posted: August 23, 2007 at 10:18 PM (#2497176)
That's the theory I've heard. Is it actually true though? Can we quantify it, and how much better was the NL?
   359. David Concepcion de la Desviacion Estandar (Dan R) Posted: August 23, 2007 at 10:35 PM (#2497195)
The disparity is enormous...the NL had Jackie Robinson, Roy Campanella, Don Newcombe, Monte Irvin, Willie Mays, Hank Aaron, Ernie Banks, Frank Robinson, Roberto Clemente, Orlando Cepeda, Willie McCovey, Dick Allen, Juan Marichal, Billy Williams, etc., while the AL had Larry Doby, Minnie Miñoso, Elston Howard, and...Tony Oliva? That's about it. As for quantifying it, the main thing to do would be to look at league-switchers, no?
   360. AROM Posted: August 23, 2007 at 10:47 PM (#2497215)
Just looked at the data from Hidden Game. On average for the 1950's, as compared to his base of National League 1976, the AL hitter loses 28 points of BA and the NL hitter loses 15. However, on slugging percentage the AL loses 46 points and the NL 47. Too bad he didn't calculate on base average. For the 60's, AL hitters lose .004/.027 and NL loses .002/.019.

If I assume ISO OBA is equal for both, then the NL was about 6% better in the 50's, and 3% in the 60's.
   361. AROM Posted: August 23, 2007 at 10:55 PM (#2497225)
As for quantifying it, the main thing to do would be to look at league-switchers, no?

When I did that, part of the problem is that pre-free agency, most of the league switchers were shitty players, like Walter "shitty batter" Dugan. What I found was that the league switchers hit slightly better in the NL, though the pitchers pitched a little better in the AL, so overall it was close to a wash.

No question the NL had the best of the black players, but that doesn't prove anything. The AL may have had better white players, plus keep in mind that 1/8 of the AL were Yankees, the best players out there regardless of color. If the NL was vastly superior we should be able to find concrete evidence.

Adams's data suggest the NL was better, but hardly an enormous difference. Looks to be about half the current difference between the AL and NL.
   362. AROM Posted: August 23, 2007 at 11:00 PM (#2497231)
Correction: The data in the hidden game was the work of Dick Cramer.
   363. Joey Numbaz (Scruff) Posted: August 24, 2007 at 12:55 AM (#2497441)
Using the Baseball Prospectus adjustments (I think I've culled them from the differences in NRA adjusted for season and all time), the difference is enormous.

There is not one season between 1952 and 1989 where the leagues are even. The NL is better every year, usually by a large margin. From 1977 on it makes sense because the NL was already stronger, and then the AL expanded on top of that.

The annual numbers below. A positive number means the NL has better pitchers, negative means the AL was better.

The difference is what you need to adjust NRA by to even the leagues out, 4.50 is considered average.

Remember this is only the pitchers - when you use ERA+ you are comparing the guy to the other pitchers in his league not the hitters (since you are comparing to average). The AL is obviously better the last few years than the NL, but it's because the hitters are better, not the pitchers.

I pulled this data by comparing NRA adjusted for season to NRA adjusted for all-time for the top 5 IP pitchers in each league that season, and averaging them. I did this about 6 months ago, so if Prospectus has updated their numbers, I wouldn't have that accounted for.

Year Diff.
1901 0.27
1902 -0.24
1903 -0.22
1904 -0.17
1905 -0.14
1906 0.04
1907 -0.07
1908 0.07
1909 -0.10
1910 -0.07
1911 0.03
1912 0.08
1913 0.07
1914 0.08
1915 0.03
1916 -0.13
1917 -0.10
1918 0.02
1919 -0.16
1920 -0.10
1921 -0.09
1922 -0.22
1923 -0.19
1924 -0.17
1925 -0.05
1926 -0.11
1927 -0.07
1928 0.02
1929 0.06
1930 0.13
1931 0.14
1932 0.16
1933 0.23
1934 0.18
1935 0.13
1936 0.16
1937 -0.12
1938 -0.04
1939 0.02
1940 -0.06
1941 -0.04
1942 0.29
1943 0.13
1944 0.07
1945 0.23
1946 -0.06
1947 0.06
1948 0.00
1949 0.05
1950 0.01
1951 0.00
1952 0.14
1953 0.10
1954 0.21
1955 0.20
1956 0.24
1957 0.23
1958 0.26
1959 0.16
1960 0.17
1961 0.21
1962 0.14
1963 0.16
1964 0.22
1965 0.19
1966 0.19
1967 0.10
1968 0.03
1969 0.09
1970 0.18
1971 0.08
1972 0.12
1973 0.10
1974 0.16
1975 0.08
1976 0.08
1977 0.10
1978 0.13
1979 0.16
1980 0.22
1981 0.13
1982 0.16
1983 0.19
1984 0.06
1985 0.06
1986 0.07
1987 0.08
1988 0.12
1989 0.09
1990 0.09
1991 0.01
1992 -0.03
1993 0.01
1994 -0.04
1995 -0.03
1996 -0.05
1997 -0.01
1998 -0.10
1999 -0.01
2000 0.00
2001 0.05
2002 0.09
2003 0.07
2004 0.02
2005 0.03
2006 0.08
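
A rough sketch of the procedure described above, for anyone who wants to redo it with current numbers. The tuple layout, the function names, and the sign convention (AL adjustment minus NL adjustment, so that a positive result means the NL pitchers grade out better) are assumptions made for this example; the sample pitchers are invented.

```python
from statistics import mean


def league_adjustment(pitchers, top_n=5):
    """Average (all-time NRA - season NRA) over the top_n pitchers by innings.

    pitchers: list of (innings, nra_season, nra_alltime) tuples.
    """
    top = sorted(pitchers, key=lambda p: p[0], reverse=True)[:top_n]
    return mean(alltime - season for _, season, alltime in top)


def league_diff(al_pitchers, nl_pitchers, top_n=5):
    """Positive means the NL pitchers look better (assumed sign convention)."""
    return league_adjustment(al_pitchers, top_n) - league_adjustment(nl_pitchers, top_n)


# Invented example rows: (innings, NRA adjusted for season, NRA adjusted for all-time).
al = [(300, 3.00, 3.30), (280, 3.50, 3.70)]
nl = [(310, 3.00, 3.10), (290, 3.20, 3.30)]
print(round(league_diff(al, nl), 2))
```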
   364. TomH Posted: August 24, 2007 at 02:54 AM (#2497835)
The above appears to show that the NL is considered stronger currently. Which is exactly the opposite of what most believe to be true from interleague play and other data.
   365. Joey Numbaz (Scruff) Posted: August 24, 2007 at 03:02 AM (#2497855)
Tom - if you read my comment - this is only for the pitchers. It says the NL has better pitchers the last few years. I'd imagine the hitters more than make up for it, though I haven't looked.
   366. AROM Posted: August 24, 2007 at 05:05 AM (#2497926)
Something isn't right. If pitchers and hitters were equal in the two leagues, the AL RPG should be half a run higher than the NL just due to the DH. If they have better hitters the gap should be higher. If the AL pitchers were worse, the gap would be higher still. Yet the ERA gap between leagues has been much less than 0.50 the last few years.

The likely reason to me is that AL pitchers are also better.
   367. David Concepcion de la Desviacion Estandar (Dan R) Posted: August 24, 2007 at 08:38 AM (#2497966)
In my research for my NY Times piece on league strength, both MGL and Nate Silver told me that AL pitchers are definitely better than NL ones, about 0.25 runs' worth *after* accounting for the DH.
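
The arithmetic behind AROM's point can be sketched with a simple additive model. This is only an illustration: the function and parameter names are made up, and it assumes the 0.50 DH effect and the 0.25 pitching edge are expressed in the same runs-per-game units.

```python
def expected_run_gap(dh_effect=0.50, al_hitter_edge=0.0, al_pitcher_edge=0.0):
    """Expected AL-minus-NL runs per game under a simple additive model.

    Better AL hitters widen the gap; better AL pitchers narrow it.
    """
    return dh_effect + al_hitter_edge - al_pitcher_edge


# Equal talent in both leagues: the DH alone accounts for the gap.
print(expected_run_gap())
# AL pitchers 0.25 runs better, hitters equal: the gap shrinks.
print(expected_run_gap(al_pitcher_edge=0.25))
```

Under this model, an observed gap well below 0.50 is what you would expect if AL pitchers are better, even with equal or somewhat better AL hitters.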

No takers on my question about pre-1930 CF?
   368. Joey Numbaz (Scruff) Posted: August 24, 2007 at 12:01 PM (#2497975)
Dan, I don't think CF was nearly as important as it is in the modern game (post-WWII), especially before the lively ball. I think CF steadily became a more and more important position, especially once large turf parks came into the league. I think it's declined a little in importance with the disappearance of those parks.

But I'm really not sure of what you are asking.
   369. AROM Posted: August 24, 2007 at 12:54 PM (#2497996)
Weren't a lot of ballparks in the early days extremely short to the lines, and very deep in center? Seems like that would make CF more important, not just catching the ball but stopping singles before they roll away and become triples.
   370. Dr. Chaleeko Posted: August 24, 2007 at 01:51 PM (#2498043)
Weren't a lot of ballparks in the early days extremely short to the lines, and very deep in center? Seems like that would make CF more important, not just catching the ball but stopping singles before they roll away and become triples.

Not if
a) shorter lines also meant shorter alleys
b) or if shorter lines allowed corners to roam a little further into the gaps
c) if the ball was dead
d) if players weren't swinging for the fences and were instead trying to place the ball.

I've always wondered if a good throwing arm might have been more important to deadball outfielders than to modern outfielders. If place hitting were more common, but aggressive baserunning were also more common (that is, leg doubles and leg triples more common as well as increased attempts for runners on base to go 1st to 3rd or 2nd to Home), then wouldn't managers want to select outfielders for better arms too?
   371. sunnyday2 Posted: August 24, 2007 at 03:00 PM (#2498141)
Well, at a minimum I think we know that even back in the day CF were probably the fastest and more often than not the best all-around athletes (more toolsy) in the OF. Which is pretty much the same as today. I agree that throwing was more important then than now, at all 3 OF positions. But if you had to hide a really bad fielder, you still hid him in the OF corners.
   372. David Concepcion de la Desviacion Estandar (Dan R) Posted: August 24, 2007 at 03:07 PM (#2498144)
Joe, I'm asking whether I should modify the flat 0.5 wins per 162 games boost I give to CF relative to corner OF for the early game, and whether anyone is aware of any quantitative studies that might help me to adjust it.
   373. Joey Numbaz (Scruff) Posted: August 24, 2007 at 03:26 PM (#2498156)
I don't know that I'd give any boost. If CF was a tougher defensive position, they wouldn't hit as well and the adjustment would be there. I wouldn't artificially add anything - the numbers show CF as not as valuable, it's probably because it wasn't.
   374. Joey Numbaz (Scruff) Posted: August 24, 2007 at 03:32 PM (#2498169)
That should say if the numbers show CF as not as valuable . . .
   375. Joey Numbaz (Scruff) Posted: August 24, 2007 at 03:34 PM (#2498171)
I would also say looking at your numbers IIRC (don't have the database handy) Tris Speaker showed up as the #4 player in Pennants Added. That's probably too high, probably because of the boost.

Please correct me if I'm missing something though.

Also, I finally got my All-Time Sourcebook and Handbook unpacked last night, I'll get you those pre-1893 formulas sometime this weekend.
   376. Joey Numbaz (Scruff) Posted: August 24, 2007 at 03:35 PM (#2498173)
Oh and pre-1900, RF hit a lot better than LF and 1B. RF was clearly the DH equivalent of the 19th Century. That's where you hid the awful fielder.
   377. David Concepcion de la Desviacion Estandar (Dan R) Posted: August 24, 2007 at 03:46 PM (#2498187)
Well, Joe, the worst-regulars average for CF was about equal to that of LF/RF straight through about 1980. I do NOT think it would be right or fair to give NO credit for playing CF as opposed to corner in the 1960s or 70s...I think the point is that teams just tended to put their best overall athlete at CF, and so hitting was pretty deep at the position.

Actually, the worst-regulars average (just looking at batting and FRAA) for RF is MUCH lower than it is for CF in the 1890s, by more than a full win per season. In that decade at least, RF was where you hid the awful player, who was often your worst hitter as well as your worst fielder! (See Farmer Weaver in 1894, among many others I could name if you'd like). Basically, the worst-regulars approach for measuring replacement level breaks down as you move back into the 19th century, since the lower quality of play meant that some of the best hitters also played the toughest positions just because they were the best athletes (much like Little League or high school where the best hitter plays SS).
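
For readers unfamiliar with the approach, here is a bare-bones sketch of a worst-regulars average. The number of regulars counted as "worst" (`n_worst`) is a placeholder, and the per-player values are invented to mimic the 1890s pattern described above; neither comes from the actual spreadsheet.

```python
def worst_regulars_average(regular_values, n_worst=3):
    """Average value (batting plus fielding, in wins) of the n_worst regulars."""
    if not regular_values:
        raise ValueError("need at least one regular")
    worst = sorted(regular_values)[:n_worst]
    return sum(worst) / len(worst)


# Invented wins-above-average figures for eight regulars at each position.
cf = [4.0, 2.0, 1.0, 0.0, -0.5, -1.0, -1.5, -2.0]
rf = [3.0, 1.5, 0.5, -0.5, -1.0, -2.0, -2.5, -3.5]

gap = worst_regulars_average(cf) - worst_regulars_average(rf)
print(round(worst_regulars_average(cf), 2))
print(round(worst_regulars_average(rf), 2))
print(round(gap, 2))
```

When the best athletes also play the hardest positions, the bottom of the CF list stops being a good proxy for freely available talent, which is the breakdown described above.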

I have Speaker as #6 on salary (behind Ruth with pitching credit, Bonds with estimates for 06/07, Williams with war credit, Wagner, and Cobb), just a hair ahead of Willie Mays with war credit and a little more ahead of Hornsby. I would imagine Pennants Added would be similar. That doesn't seem too irrational to me...Speaker was a damn great player. That said, if I recalibrate my method for handling SB in years where there is no CS data available as I am looking into doing, he might fall by a few million.
   378. AROM Posted: August 24, 2007 at 03:57 PM (#2498200)
In that decade at least, RF was where you hid the awful player, who was often your worst hitter as well as your worst fielder!

That's what they did for me.

I'll get you those pre-1893 formulas sometime this weekend.

I sent you some baseruns formulas that try to estimate ROE last year. Did you find those useful?
   379. David Concepcion de la Desviacion Estandar (Dan R) Posted: August 24, 2007 at 04:02 PM (#2498207)
I honestly don't remember getting them, although I already use BaseRuns with ROE estimates for the 1890s myself. It would be interesting to see if our approaches are similar.
   380. Joey Numbaz (Scruff) Posted: August 24, 2007 at 04:28 PM (#2498228)
Good points Dan . . .

So the worst CF before 1980 hit as well as the worst LF/RF, really? That's pretty shocking to me.
   381. KJOK Posted: August 24, 2007 at 04:40 PM (#2498238)
Well, Joe, the worst-regulars average for CF was about equal to that of LF/RF straight through about 1980. I do NOT think it would be right or fair to give NO credit for playing CF as opposed to corner in the 1960s or 70s...I think the point is that teams just tended to put their best overall athlete at CF, and so hitting was pretty deep at the position.


I'm positive that if you were using average instead of replacement as the baseline, you'd get a different answer here.

Teams did put their best defensive starting OF in CF, but 'replacement OFers' were just that - replacement OUTFIELDERS, not replacement CF, or LF, or RF (plus on lesser teams their best fielding OF was probably also their best hitting OF) so you would in theory get nearly the same replacement level for all 3 OF positions.
   382. DL from MN Posted: August 24, 2007 at 05:56 PM (#2498311)
"I don't know that I'd give any boost. If CF was a tougher defensive position, they wouldn't hit as well and the adjustment would be there. I wouldn't artificially add anything - the numbers show CF as not as valuable, it's probably because it wasn't."

I was thinking this morning that there is one big "gotcha" using only offensive numbers to measure toughness of defensive position: handedness. It is possible that CF is a more important defensive position than 3B even though the offensive levels are approximately the same. 3B will show a lack of offense because you can't have a lefthanded thrower there (which weeds out most LH batters) but CF can have a LH thrower and the replacements will be more readily available. I have a feeling the true defensive spectrum is more like SS/C/2B/CF/3B/RF/LF/1B but it is easier to find a replacement at CF who can put up decent numbers than at 3B just because of the platoon advantage. This artificially makes it look like 3B is a more difficult defensive position.

If you think about it a little it becomes clearer - I've seen lots of SS/2B who were able to handle CF but I haven't heard of many 3B since Tommy Leach who were moved there unless they were first moved off SS (Sheffield, Bill Hall). 3B usually get moved to RF. I'll admit the lines get blurry around 3B and CF but it explains why CF hitting typically has a high replacement value despite the defensive demands of the position.

This may explain why I have a bunch of 3B on my ballot while the CF glut sits mostly together off the ballot. Or I might be completely wrong.
   383. Joey Numbaz (Scruff) Posted: August 24, 2007 at 06:08 PM (#2498325)
DL - I would think not too many move from 3B to CF (or CF to 3B, don't forget) because they have different skill sets - entirely different skill sets.

CF need a lot of range and a passable arm, sometimes less. 3B don't need any range, just quick reflexes and a good arm.
   384. David Concepcion de la Desviacion Estandar (Dan R) Posted: August 24, 2007 at 06:20 PM (#2498344)
Joe Dimino, yes they did, I can show you the data if you'd like.

KJOK, I don't track positional averages so I just have no idea.
   385. Dr. Chaleeko Posted: August 24, 2007 at 07:36 PM (#2498471)
Right, the other point I saw in DL's post is that few LH hitters play 3B, while lots of LH hitters play CF. If an athletic LH player comes along he MUST be a CF to maximize his athleticism; he can't play throwing-IF as a lefty in the big leagues (whether theoretically or actually). But LH batters have a natural advantage in batting: they hit for higher averages overall, they ground into fewer DPs, they have bunting advantages, etc. So if you were to compare offense to offense between the positions, you'd need to account for the difference in batting handedness to see whether CF and 3B truly have some measure of equivalence on offense even before you made any inference about defensive difficulty on a (lack of) offense grounds.

Or at least that's what I think I thought he might have implied.
   386. DL from MN Posted: August 24, 2007 at 08:22 PM (#2498547)
Good point. Would Ken Griffey Jr. have played 2B in Bizarro World baseball where you run the bases clockwise? There are almost two different defensive spectrums: one for lefties and one for righties.
   387. AROM Posted: August 24, 2007 at 09:33 PM (#2498654)
I honestly don't remember getting them, although I already use BaseRuns with ROE estimates for the 1890s myself. It would be interesting to see if our approaches are similar.

I didn't send them to you, but I can send them now if you like. I'm pretty sure I sent them to Joe after the DC primate meetup last year, I was asking him in #378.
   388. Paul Wendt Posted: August 29, 2007 at 05:13 PM (#2503551)
Regarding the 1890s,
how many innings did part-time pitchers play in the outfield?
and how many part-time pitchers show up among the worst regulars at any fielding position?
   389. David Concepcion de la Desviacion Estandar (Dan R) Posted: August 29, 2007 at 07:02 PM (#2503658)
I'll double-check, but did any part time pitcher lead his team in PA accumulated playing a given fielding position? Seems highly doubtful to me...
   390. jimd Posted: August 29, 2007 at 08:17 PM (#2503721)
It certainly happened in the 1880's; see Bob Caruthers and the OF, and other examples (Foutz, Whitney, Hecker, Ward, Bradley, etc). E.g. for 1887 StL, he, Dave Foutz, and Silver King were the pitching staff, and he, Foutz, and a cast of thousands split the RF duties into thirds. That year Bob was #2 in IP (341) and #3 in GP in the OF (54) over a 140 game schedule.
   391. jimd Posted: August 29, 2007 at 08:33 PM (#2503735)
Here's one post-1892: Watty Lee for the 1903 Senators. 3rd OF (47 GP, 37 as the team leader in RF) and 4th SP with 20 GS and 166.2 IP. He also played 96 G in OF in 1902 as 3rd OF, with 10 GS and 98 IP.
   392. jimd Posted: August 29, 2007 at 08:46 PM (#2503749)
Here's one in living memory: Wonderful Willie Smith for the 1964 Angels. He led the team in GP in the OF (87, though he did not lead at either corner) and had a 125 OPS+. He also made 15 mound appearances (1 start) for 31.2 IP at a 116 ERA+. Unfortunately for this budding Caruthers, the Angels appear to have made him an OF only, and that would be his career high in OPS+. He would play 9 seasons 1963-71, 6 as a regular.
   393. Paul Wendt Posted: August 30, 2007 at 01:57 PM (#2504558)
The numerical highlights, in my opinion, include the two AL heydays and the two world wars.

Year Diff.
1901 0.27
1902 -0.24 (maximum margin in favor of AL and, easily, maximum one-year change)
1903 -0.22
1904 -0.17
1905 -0.14

1916 -0.13
1917 -0.10
1918 0.02
1919 -0.16
1920 -0.10
1921 -0.09
1922 -0.22
1923 -0.19
1924 -0.17

1940 -0.06
1941 -0.04
1942 0.29 (all-time maximum interleague difference)
1943 0.13
1944 0.07
1945 0.23
1946 -0.06
   394. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 13, 2007 at 06:13 PM (#2523301)
I am pleased to report that I have completed an update to my position player WARP system. While I recognize that some voters find constant revisions to statistics tiresome, I feel that there are enough significant improvements in this version to justify the effort.

What's new:

1. Stolen bases
I have improved the methodology for estimating caught stealing, so that the relationship between a player's SB/CS runs and his SB attempt rate matches that of the entire period for which caught stealing data is available. I also have incorporated caught stealing data for the pre-1947 seasons when it is available.

2. Estimated non-SB baserunning
I did a regression on James Click's non-SB baserunning runs and found several significant relationships. This has enabled me to estimate non-SB baserunning runs for all seasons when they are not available.

3. Estimated NetDP
I did a regression on net double play data, and also found numerous significant relationships. I am thus able to estimate double play avoidance runs for the seasons when GDP data is available but NetDP data is not (1933-1958). Double play avoidance is still not included for the 1893-1933 period.

4. Standard deviations

I changed my regression methodology for league standard deviations. The tweaks are:

a. Previously, I took the standard deviation of wins above overall league average for the whole league as my dependent variable. But I subsequently realized that that means I was measuring the difference *between* positions as well as the difference *within* positions, and thus that a high number of excellent seasons by players at defense-first positions could actually *decrease* the calculated standard deviation (since a 4 WARP player at a position with a replacement level of -4 has zero wins above overall league average). I now use wins above *positional* average as my dependent variable, since the spread of performance between positions rather than within them does not make a league easier or harder to dominate.

b. I now only count the first two seasons after an expansion as "expansion" years for regression purposes--the lingering expansion effect in subsequent seasons is now measured solely through the winning percentage of the worst team in the league variable. This improves accuracy.

c. I no longer include the DH variable in my league adjustment calculations. As you can see on the regression results, the presence of the DH substantially reduces the league standard deviation, by adding a bunch of hitters who are typically just a bit above average to the pool. But this doesn't actually make a league easier or more difficult to dominate, it just means there are extra average-ish hitters in one league that aren't in the other. I was effectively double-counting the DH effect by including it both in the replacement level (where it belongs) and also in the LgAdj calculation. This correction should put DH-era players from both leagues on a more equal footing.
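
To illustrate point 4a, here is a minimal sketch with made-up numbers. It is not the actual spreadsheet calculation: the two positions, the win values, and the function names are all hypothetical, and it only compares the two candidate dependent variables on a toy league. The "star" shortstop sits exactly at overall league average (0.0), which is well above his positional peers.

```python
from statistics import mean, pstdev

def overall_stdev(groups):
    """Spread of wins above the OVERALL league average (the old dependent variable)."""
    values = [v for players in groups.values() for v in players]
    return pstdev(values)

def within_position_stdev(groups):
    """Spread of wins above each POSITION's own average (the new dependent variable)."""
    deviations = []
    for players in groups.values():
        pos_avg = mean(players)
        deviations.extend(v - pos_avg for v in players)
    return pstdev(deviations)

# Hypothetical wins above overall league average for a toy two-position league.
base = {"SS": [-3.0, -2.0, -1.0], "1B": [1.0, 2.0, 3.0]}

# Same league plus one excellent shortstop season: 0.0 wins above the overall
# league average, but far better than the other shortstops.
with_star = {"SS": [-3.0, -2.0, -1.0, 0.0], "1B": [1.0, 2.0, 3.0]}

for label, league in (("base", base), ("with star SS", with_star)):
    print(label, round(overall_stdev(league), 3), round(within_position_stdev(league), 3))
```

In this toy league the overall spread falls when the star shortstop is added, while the within-position spread rises, which is the distortion described in 4a.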

5. Replacement levels

Two changes:

a. My approach to measuring outfield replacement level was not working. First, by keeping the gap between CF and corner OF flat, I lost the ability to calculate changes in the relative strength of the outfield positions over time. Second, the basic premise that the easiest positions to field should have the best hitters and vice versa--a signature of a high quality of play--simply does not hold for the 1890s. Just like in high school baseball where the best hitters play SS (because they are simply the best players, period), in the 1890s many of the absolute worst hitters played RF, where the costs of their poor fielding were likely to be minimized (as in Little League where the fat kid plays right). As a result, the worst-regulars methodology for determining replacement level is incapable of accurately assessing OF in those years. Thus, for lack of a better option, I now use the worst regulars method (treating CF as one position and corner OF as another) for 1925 to the present, but have reverted to the percentage-of-positional-average approach for OF before 1925. I take the average production of all outfielders for each league-season, do some arithmetic to determine a replacement level, and then add 0.2 wins to CF (and subtract 0.2 from RF up through 1918). (This is why the OF replacement levels do not have nearly as many year-to-year fluctuations as the other positions do before 1925). This is not an ideal solution, but it is reliable and gives results that are intuitively correct, rather than the exceedingly low replacement levels (and, therefore, high WARP scores) that the old approach was producing for 1890s outfielders.

b. In theory, since I adjust for standard deviation, the overall league replacement level (the average production of the worst 3/8 of regulars in the league) should be constant from year to year. In practice, this is not the case, for two main reasons. First, in seasons where the actual standard deviation differs substantially from the regression-projected one, the replacement level will differ as well (lower in seasons where the stdev is higher than expected and higher in seasons where the stdev is lower than expected). Second, I do not correct for kurtosis, and so the replacement level will be lower in high-kurtosis ("fat tails") seasons than in low-kurtosis ("shoulders" and thin tails) seasons. While the latter effect may be real--the overall league replacement level probably *is* lower relative to average in high-kurtosis eras than in low ones--it does not seem to me to be in keeping with the "fairness to all eras" principle to have the overall league replacement level (as opposed to individual positional replacement levels) float with time. Thus, I now adjust the leaguewide replacement level for each year to be equal to the 2005 replacement level. If replacement goes up at one position from year to year, it now must go down at another.
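
A rough sketch of the idea in 5b, under stated assumptions: the post does not say how the leaguewide level is pinned, so the equal weighting of positions and the additive shift below are guesses, and the function names and every number are hypothetical (the target is a placeholder, not the real 2005 figure).

```python
def worst_regulars_level(regular_waa, fraction=3 / 8):
    """Average wins above average of the worst `fraction` of a list of regulars."""
    ranked = sorted(regular_waa)
    n = max(1, round(len(ranked) * fraction))
    return sum(ranked[:n]) / n

def recenter(position_levels, target_overall):
    """Shift every position by the same amount so the unweighted mean of the
    positional replacement levels equals `target_overall` (an assumed method)."""
    current = sum(position_levels.values()) / len(position_levels)
    shift = target_overall - current
    return {pos: level + shift for pos, level in position_levels.items()}

# Hypothetical wins above average for eight regulars at one position.
print(worst_regulars_level([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]))

# Hypothetical positional replacement levels for one league-season, pinned to a
# made-up leaguewide target.
raw_levels = {"SS": -3.0, "1B": -1.0}
pinned = recenter(raw_levels, target_overall=-2.5)
print(pinned)
```

Because every position moves by the same shift, the gaps between positions are preserved; only the leaguewide level is fixed, so a rise at one position has to be offset elsewhere.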

6. Baserunning wins accounting
This isn't a change to the underlying WARP calculation, but simply an adjustment to how I credit wins between batting and baserunning in the results sheet. Before, baserunning wins were calculated relative to a player with 0 SB and 0 CS. Now they are calculated relative to league average. This means that if a player has 0 SB and 0 CS in a league where the overall SB/CS rate is negative in terms of runs, he will be credited with positive baserunning wins (although he is likely to give some of them back in the non-SB baserunning estimation).
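
A small worked example of the baseline change in point 6. This is only a sketch: the run values per SB and CS, the runs-per-win figure, and the scaling of the league rate by plate appearances are all assumptions for illustration, not the values or method used in the spreadsheet.

```python
SB_RUN_VALUE = 0.2    # hypothetical runs gained per stolen base
CS_RUN_VALUE = -0.4   # hypothetical runs lost per caught stealing
RUNS_PER_WIN = 10.0   # hypothetical conversion factor

def sb_runs(sb, cs):
    """Runs from basestealing under the assumed linear weights."""
    return sb * SB_RUN_VALUE + cs * CS_RUN_VALUE

def wins_vs_zero(sb, cs):
    """Old accounting: relative to a player with 0 SB and 0 CS."""
    return sb_runs(sb, cs) / RUNS_PER_WIN

def wins_vs_league(sb, cs, pa, lg_sb, lg_cs, lg_pa):
    """New accounting: relative to what a league-average runner would have
    produced in the same number of plate appearances (assumed scaling)."""
    expected = sb_runs(lg_sb, lg_cs) / lg_pa * pa
    return (sb_runs(sb, cs) - expected) / RUNS_PER_WIN

# Made-up league where stealing cost runs overall: 1000 SB, 700 CS, 60000 PA.
# A player who never ran (0 SB, 0 CS in 600 PA):
print(wins_vs_zero(0, 0))
print(wins_vs_league(0, 0, 600, 1000, 700, 60000))
```

The non-runner is at exactly zero under the old baseline but comes out slightly positive under the league-average baseline, since the league as a whole lost runs on the bases.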

The upshot:

The biggest changes are in the 1890s, particularly its outfielders. Due to the "fat kids" playing right, and to the actual standard deviations being higher than the regression-projected ones, the overall replacement level in the 1890s was extremely low in the old system, giving players from that era a systematic boost relative to subsequent ones. Outfielders from that period now get roughly the same replacement level as they do for the rest of MLB history (around 0.8 wins below average per 162 games) rather than the 1.6 range they were getting previously, so they all get their wings clipped. The next-biggest change is the removal of the DH from the league adjustment calculations, which takes a smaller bite (on the order of 0.3 wins per 162 games) out of DH-era AL players. Following that would be the adjustments to seasons 3, 4, and 5 following an expansion, which are penalized much less than they were under the old system (unless there was still a really bad team in the league). Guys with SB success rates that were particularly good (Fritz Maisel) or bad (Charlie Hollocher) relative to the league average before 1947 will see major adjustments. Center fielders from periods after 1925 where the gap between CF and corner OF replacement level was particularly large (like around 1980) will be rewarded, while those from periods after 1925 where it was unusually small (like around 1990) will be docked. Leadoff men, center fielders, basestealers, and triples hitters before 1972 will all get some non-SB baserunning credit (and those who did all four, such as Richie Ashburn, will get a lot of it); while middle-of-the-order guys, catchers, non-stealers, and non-triples hitters before 1972 will take a hit on their baserunning (the player-season which suffers the most is 1923 Steve O'Neill, a catcher who hit .248/.374/.285 with 0 3B and 0 CS). 
Finally, the champions of double play avoidance from 1933 to 1958 (Mickey Mantle is Exhibit A) get some help, while GDP machines like Ernie Lombardi suffer.

I think that covers all of it. I encourage you to download the new version from the Hall of Merit Yahoo group, and don't hesitate to email me or post on this thread if you have any questions/comments.
   395. Jim Sp Posted: September 13, 2007 at 06:31 PM (#2523335)
Dan,
Thanks. I'm glad you revised your spreadsheet; unlike BP, at least you give an explanation :)

Are there any candidates that you formerly advocated that you don't advocate now based on this, or vice versa?
   396. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 13, 2007 at 07:01 PM (#2523360)
Jim Sp, here is a list of the biggest gainers and biggest losers in the adjustment, measured in WARP2. A lot of 1950s center fielders on top, and 1890s guys on the bottom.

1. Mickey Mantle, +6.1
2. Richie Ashburn, +6.1
3. Bill Bruton, +4.3
4. Maury Wills, +3.6
5. Luis Aparicio, +3.3
6. Eddie Mathews, +3.1
7. Willie Mays, +2.9
8. Tony Taylor, +2.6
9. Eddie Yost, +2.5
10. Larry Doby, +2.5
11. Jim Gilliam, +2.4
12. Don Buford, +2.3
13. Jim Fregosi, +2.3
14. Ken Boyer, +2.2
15. Lou Brock, +2.2
16. Jim Landis, +2.0

1. Bill Dahlen, -10.9
2. Ernie Lombardi, -10.5 (the perfect storm of poor estimated non-SB baserunning and lots of double plays)
3. Joe Kelley, -10.1
4. Jesse Burkett, -9.9
5. George Van Haltren, -9.4
6. Tommy Corcoran, -9.3
7. Billy Jurges, -8.5
8. Fred Clarke, -8.5
9. Ed Delahanty, -8.2
10. George Davis, -8
11. Luke Appling, -7.9
12. Herman Long, -7.8
13. Dummy Hoy, -7.6
14. Kip Selbach, -7.5
15. Hugh Duffy, -7.5
16. Joe Cronin, -7.5
17. Monte Cross, -7.3
18. Steve Brodie, -6.7
19. Alex Rodriguez, -6.6
20. Fielder Jones, -6.5
21. Bobby Wallace, -6.5

other notables: Robin Yount -6.4, Billy Herman -6.4, Lou Boudreau -6.2, Jimmy Ryan -6.2, Hughie Jennings -6.1, Roy Thomas -6.1, Honus Wagner -6.1, Willie Keeler -5.8, Gabby Hartnett -5.8, Dave Bancroft -5.6, Jimmy Sheckard -5.5, Bobby Doerr -5.5, Dick Bartell -5.5, Joe Sewell -5.4, Rabbit Maranville -5.4, Marty Marion -5.2, John McGraw -5.1, Pudge Rodriguez -4.9, Derek Jeter -4.7, Manny Ramirez -4.6, Mike Tiernan -4.5, Vern Stephens -4.4, Charlie Gehringer -4.4, Jake Beckley -4.1, Joe Gordon -4.1, Roberto Alomar -4, Frank Thomas -4, Bobby Grich -4, Frankie Frisch -4, David Concepción -3.9

In terms of my PHoM, Richie Ashburn definitely moves from out to in, I'd have to check the rest. In terms of my ballot, Reggie Smith will move ahead of the shortstops, Campaneris will make the middle of my ballot (thanks to non-SB baserunning), Cravath will debut with a lower ballot placement (he's helped by a friendlier league adjustment for the teens NL), and Tenace will replace Schang (although Tenace should actually be $68M, not the $73 or so he's credited with in the spreadsheet, since he spent so many years as a C/1B). Nothing particularly revolutionary here.
   397. Dr. Chaleeko Posted: September 13, 2007 at 08:01 PM (#2523436)
Dan R,

How does Tommy Leach's baserunning and GIDP avoidance look? Anything worth noting about it?
   398. Dr. Chaleeko Posted: September 13, 2007 at 08:07 PM (#2523444)
Dan R,

Also does Dick Allen show up as a good baserunner?

Thanks!
   399. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 13, 2007 at 08:50 PM (#2523491)
I don't have GDP avoidance for Leach, as GDP weren't recorded pre-1933. Leach was basically an average runner, outside of a nice year on the basepaths in 1907. I have Allen as a plus baserunner in 1965 and 67, average otherwise. I estimate EqBR with SB per time on base, 3B, BA, BB rate, PA per game, and position. Allen has nice triples and some steals, but he doesn't have the super high batting averages, low walk rates, high PA per game, and up-the-middle positions generally associated with high EqBR. Which doesn't mean he wasn't an excellent baserunner--it just means we can't tell he was from his other stats.
   400. Joey Numbaz (Scruff) Posted: September 13, 2007 at 09:09 PM (#2523506)
How does Alfredo Griffin show up? Wasn't he a disaster on the basepaths, but he's the type of guy that should be considered fast, based on the typical criteria for guessing . . .