Friday, March 01, 2002
Estimating League Quality - Part 1 (the concept)

First of all, let me apologize for the lack of material posted to the Hall of Merit blog. In the coming weeks, I'm confident this will no longer be a problem.

When we consider players who played over 100 years ago, it is vital to look at the quality of the leagues they played in. Using a method similar to what Clay Davenport has been doing for some time (for examples of this kind of work, see Clay's recent postings on Baseball Prospectus concerning the quality of play in the Japanese baseball leagues), I attempted to estimate the quality of baseball in the "major" leagues of the 19th century. I focused on hitting stats, since at that time there were only a handful of pitchers active at a given time in a given league.

My method assumes that a player's overall batting skill does not change appreciably from one year to the next. This assumption is not true on an individual basis, but it starts to make sense when we are talking about a large group of players: the individual changes in skill become less important as the size of the group increases. In stable leagues, there isn't very high turnover in personnel from year to year. In the 19th-century National League, in most years about 70%-80% of the players returned to play regularly the following year. In cases where new leagues started up and players jumped, the percentage of holdovers was much, much lower, and this makes comparison much more difficult.

I estimated the quality of each hitter's batting by using a runs produced ratio [(R+RBI)/PA] and compared it to a league-average performance. The reason I chose this, and not Runs Created or Linear Weights, is that I wasn't going to adjust for park, and I assumed that the batting-order bias of the R and RBI stats would not be relevant for a large group of players either. In the 19th century, where more advanced run estimation formulas are much less accurate than for "modern" baseball, I opted for the simplicity of using Runs Scored and RBI.

Because we are comparing each group of players to the league average, the result shouldn't be far from 1.00 for a relatively stable league (one where the majority of regulars return the next year). In practice, it's unlikely to be exactly 1.00, of course. If the newcomers to the league in a given year were better than typical newcomers, the performance of the holdovers would be worse than in a typical league, and this would be a sign that the league was getting stronger. On the other hand, if a lot of good players jumped to a rival league and their places were filled by less skilled batsmen, the holdovers would improve their performance relative to league average, and this would be a sign that the league was weakening.

By comparing the overall performance of the SAME group of players from year to year and league to league, it should be possible to track the changes in the overall quality of play. In the next part, I'll apply these methods to a specific example.
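A minimal sketch of the holdover comparison described above. The record layout and the sample numbers are invented for illustration; only the (R+RBI)/PA ratio and the year-to-year cohort logic come from the post.

# Sketch of the holdover-cohort comparison described in the post.
# Each record is (player, year, R, RBI, PA); the sample data are made up.
batting = [
    ("Player A", 1879, 80, 60, 450), ("Player A", 1880, 75, 55, 440),
    ("Player B", 1879, 60, 70, 430), ("Player B", 1880, 50, 65, 420),
    ("Player C", 1879, 40, 30, 300),                      # did not return in 1880
    ("Player D", 1880, 90, 80, 460),                      # newcomer in 1880
]

def runs_produced_ratio(rows):
    """(R + RBI) / PA, summed over a group of player-seasons."""
    r = sum(x[2] for x in rows)
    rbi = sum(x[3] for x in rows)
    pa = sum(x[4] for x in rows)
    return (r + rbi) / pa

def cohort_relative(year1, year2, data):
    """Ratio-to-league of the holdover cohort in each of two seasons."""
    y1 = [x for x in data if x[1] == year1]
    y2 = [x for x in data if x[1] == year2]
    holdovers = {x[0] for x in y1} & {x[0] for x in y2}
    rel = []
    for season in (y1, y2):
        league = runs_produced_ratio(season)
        cohort = runs_produced_ratio([x for x in season if x[0] in holdovers])
        rel.append(cohort / league)
    return rel   # a rise from year1 to year2 suggests the league got weaker

print(cohort_relative(1879, 1880, batting))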
This thread will now be included with the Hall of Merit links. -John Murphy
Reader Comments and Retorts
As Robert said though, by looking at the changes over a SMALL, set period of time, we might be able to learn something GENERALLY, without getting too specific.
* Five year floating comparisons. A player's performance can be compared with what he did up to five years before or after that year. This is the Cramer study, with a partial buffer against the effects of aging.
* Pitchers hitting, relative to the league. After 1975, this becomes less useful because pitchers bat less often in the minors due to the DH rule.
* Fielding percentage, adjusted for strikeouts (PO-SO)/(PO-SO+E). Less useful early in history since players did not start using gloves at the same time.
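A quick sketch of the strikeout-adjusted fielding percentage from the last item; the sample totals are invented.

def so_adjusted_fpct(po, so, e):
    """Fielding percentage with strikeouts removed from putouts: (PO-SO)/(PO-SO+E)."""
    return (po - so) / (po - so + e)

# Hypothetical league totals: same errors, different strikeout environments.
print(round(so_adjusted_fpct(po=33000, so=3000, e=1800), 3))   # low-strikeout era
print(round(so_adjusted_fpct(po=33000, so=9000, e=1800), 3))   # high-strikeout era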
On the other hand, that is not a fair comparison. The NA represented the best baseball, the state of the art, at the time. The AA did not. So I think it is two different things to discount Ross Barnes or Al Spalding versus discounting Stovey, Browning, Caruthers and Mullane. At their best Stovey, Browning, Caruthers and Mullane may never have been as good as their more or less exact contemporaries Brouthers, Connor, Ewing, Glasscock and Clarkson. But at their peak, there was nobody, no rough contemporaries anywhere near as good as Spalding, Barnes and Wright. To me, those are two quite different things that just happen to be described by the same concept--that is, "-.020."
>I would subjectively put the league quality at a level so that the early stars (Barnes-Spalding-Wright) come out as no-better-than-even in terms of peak value with Brouthers-Glasscock-Nichols....
That seems sensible. And in fact they would be more or less equal in terms of peak value, and then the short careers of the early stars would have to be factored in. So, no, none of the stars of the NA is probably the very best of the 19th century at his position, but in terms of peak value (in terms of their contributions toward winning pennants) they were "in the ballpark."
It is interesting to note from the Cramer study that the rates of increase levelled off some in both 1920 and 1960, so we can assume the increase in quality of play has slowed. I do not think the baseball of today is all that much better than the baseball of 1960 -- were you to have an average team from 1960 play an average team from 2000 for 1000 games, my guess is the 2000 team would win 504 games on average, or something like that.
As I've pointed out on the Pitchers thread, there should be little difference in quality between the NA of 1875 (when Boston went 71-8) and the NL of 1876. I'll let those with better numbers track the evolving quality of the earlier NA years and the succeeding ones of the NL. (scruff has published indications that he believes the NA of 1875 may have been tougher.)
However Harry Wright did it, there is no question that he determined who the best players of that era were and got most of them onto his team. He managed to keep them together and winning with apparently few ego problems (at least none that I've read about). This was good for them, but ultimately, this was bad for the NA (we think that there is no hope of competing against the Yankees now; they are nowhere near this dominant). A priori, they are the NA All-Star team; it will be interesting to see if anyone else can crack that lineup.
I'd try to come up with something a little more comprehensive, like XR or RC or something. But then you run into problems with the formulas, etc.
This is what I'd do, if I had the time. Figure runs created for each player on each team, normalize the totals of the individuals to the team total runs, so they come out the same, i.e. the total for Chicago individuals in 1876 equals the actual number of runs the 1876 Cubs scored.
I know some people are opposed to this in theory, but the accuracy of RC and XR is questionable for that time period, and I think the adjustment is necessary, for the stabilizing effect.
Once you have the RC, figure RC per 27 outs (don't actually use 27; use the league average of batting outs recorded per game, which will make a big difference back then, because of all the errors).
Finally, adjust that for the league. Now you've got a number you can make meaningful comparisons with.
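A sketch of that normalization, using the basic Runs Created formula as a stand-in for whatever RC/XR variant one prefers. The field names are placeholders, and the per-out rate is compared directly to the league per-out rate, which is equivalent to the per-game comparison for relative purposes.

def basic_rc(h, bb, tb, ab):
    """Basic Runs Created: (H + BB) * TB / (AB + BB). A stand-in for RC/XR."""
    return (h + bb) * tb / (ab + bb)

def normalized_rc_per_out(players, team_runs, league_rc_per_out):
    """Scale each hitter's RC so the team total matches its actual runs scored,
    rate the result per batting out, and express it relative to the league.
    players: dicts with h, bb, tb, ab, outs for everyone on the team."""
    raw = [basic_rc(p["h"], p["bb"], p["tb"], p["ab"]) for p in players]
    scale = team_runs / sum(raw)                      # normalize to actual team runs
    rates = []
    for p, rc in zip(players, raw):
        rc_per_out = (rc * scale) / p["outs"]
        rates.append(rc_per_out / league_rc_per_out)  # 1.00 = league average
    return rates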
Alternatively, if you have the WS book (and the digital update from Stats), you can figure out WS per 162 team games or something like that (adjusting for the player's relative playing time), and use that as your comparison number. That would be a helluva lot easier, and the number is meaningful. But I don't think batting average is a good number to use. Just because fielding gets better (moving batting averages down), doesn't mean the quality of league play goes up. Take a league of Ozzie Smiths everywhere, even out in LF. Add Barry Bonds and Rickey Henderson and Jeff Bagwell to the mix (removing two Ozzie LFs and an Ozzie 1B), and the league fielding will get worse, but you're going to have a higher quality league. WS is the kind of number that is perfect for a study like this, even though it is seriously flawed with the pitcher/fielder split on defense for this era.
If you use WS, you can include pitchers as well, or do two separate studies.
Just my .02 . . .
David Foss:
It's possible that the AL stars may not have affected the "base-level" of play for the league. Is there any evidence that would support that the AL also had more weak players and weak teams... not enough to reverse the discount, but enough to balance things?
Michael Schell (Biostatistics, UNC) believes that "standard deviation measures league talent" inversely. He uses a version of league SD as a measure, which is not an argument, but it is clear that he believes it. [Baseball's All-Time Best Hitters, Princeton U P, 1999, chapter 4.]
For example, "Many of Cobb's American League compatriots would likely have ridden the National League bench during the same era."
That era is 1910-1914, when SD batting average was AL .043, NL .030. In the AL, 4% batted above .350 and 12% below .220 (mean-adjusted). In the contemporary NL and in both leagues 1980-1984, the shares were 0% and 6%. By those measures, the number of extra "low outliers" in AL1910-1914 was greater than the number of extra "high outliers".[*]
Schell's argument implies that the AL was weaker than the NL, 1910-1930; stronger, 1901 and 1960-1975. That isn't plausible.
[*]
jimd suggested that standard deviation is low in NL1910-1914 (s.d. mean-adjusted batting average) because the NL lacked outliers such as Cobb, Jackson, Speaker, Collins, Lajoie on the high side.
http://www.baseballthinkfactory.org/files/primer/hom_discussion/1932_ballot_discussion/P200/#9
Such calculations do not measure league quality, but they have a place in interleague comparisons.
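For reference, a minimal sketch of the kind of calculation being discussed: the SD of mean-adjusted batting average and the shares above .350 and below .220. The reference mean and the sample averages are placeholders, and this is not Schell's exact procedure.

from statistics import mean, pstdev

def outlier_shares(batting_avgs, league_mean_ref=0.270, high=0.350, low=0.220):
    """Shift a league's qualifying batting averages to a common mean (the .270
    reference is arbitrary), then report the SD and the shares beyond the cutoffs."""
    shift = league_mean_ref - mean(batting_avgs)
    adj = [ba + shift for ba in batting_avgs]
    sd = pstdev(adj)
    hi_share = sum(ba > high for ba in adj) / len(adj)
    lo_share = sum(ba < low for ba in adj) / len(adj)
    return sd, hi_share, lo_share

# Toy league: a few qualifiers plus one Cobb-like outlier.
print(outlier_shares([.240, .250, .260, .270, .280, .300, .310, .385]))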
This is largely irrelevant to those who use OPS+/ERA+, Win Shares or WARP to evaluate players.
Again this is irrelevant for those who use statistics that are scaled like OPS+/ERA+, etc.
This has been mentioned before, but this thread looks like a good place to restate this.
Schell's estimates for Shibe Park suggest the same for 1938-1954. Shibe was a better batting park in the AL because the other AL parks were not so good for batting as the other NL parks.
It should be easy to study the matter by using a consistent series of park factors as a resource, eg the Total Baseball park factors.
Isn't this also true for Sportsman Park?
Isn't this also true for Sportsman Park?
Yep. See above.
Sorry, Jim. I missed that post.
(By eye and mind) Error rates seem to show that NL fielding was significantly better than AL and FL in 1914; that NL and FL fielding improved significantly in 1915.
BTW, the FL signed many more NL regulars than AL.
That's enough for me, since someone else probably has an electronic database with the relevant data for every league-season. (IP, SO and E, I think)
OK, if I understand correctly:
Suppose that a replacement-level full-time player is worth 6.5 win shares (thus 78 WS or 26 wins for a team of replacement-level players). 6.5 is someone's estimate.
Ruth joins one of 16 teams, which increases the MLB average talent by 3 WS per team on the old scale, or 0.25 per full-time player-position (8 regulars, 4 pitchers). Because the number of wins is fixed, that decreases measured talent by 0.25 per full-time player-position for all slots but his own. Because he joins one team in one league, his impact on the talent scale is 0 in the other league and -0.5 per full-time player-position in his own league. After his entry, the measured talent of replacement-level players is 6.5 WS in the other league, 6.0 in his own league.
Right?
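The arithmetic behind that, spelled out (a restatement of the estimate above, using the same figures):

RUTH_WS_ABOVE_AVG = 48          # roughly 3 WS per team across 16 teams (see above)
TEAMS_MLB, TEAMS_AL = 16, 8
SLOTS_PER_TEAM = 12             # 8 regulars + 4 pitchers
REPL_WS = 6.5                   # the stated estimate for a full-time replacement player

per_slot_mlb = RUTH_WS_ABOVE_AVG / (TEAMS_MLB * SLOTS_PER_TEAM)   # 0.25
per_slot_al  = RUTH_WS_ABOVE_AVG / (TEAMS_AL * SLOTS_PER_TEAM)    # 0.50, all in his league

print(REPL_WS, REPL_WS - per_slot_al)   # measured replacement level: 6.5 other league, 6.0 his league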
For Wheat vs Hooper, they start at 385-330 (all seasons adjusted to 154 G), and this narrows the gap to 385-370 (or 345-330). I don't see them as being that different. Maybe the line between being barely in and being barely out passes between them, but not the line between being first ballot and having no chance at all.
jimd, I know these are back-of-the-envelope estimates, but looking at the careers of these players, I don't think the imbalance was typically 7-2, or that each superstar had the effect of Ruth at the top of his game. If my modeling of great-player effects is correct, pitchers pull more WS from other pitchers than from position players, and vice versa. So let's have Johnson and Alexander cancel each other out, and look at the hitters.
Superstar years
Cobb, 07-19
Speaker, 09-23
Collins, 09-20
Jackson, 11-13, 16-17, 19-20
Baker, 11-14
Ruth, 16-32
Hornsby 17, 20-29
Prior to 1911, there's another set of superstars to deal with, so let's start in 1911.
1911-13 5-0, AL
1914 4-0 AL
1915 3-0 AL
1916 5-0 AL
1917 5-1 AL
1918 4-0 AL
1919 5-0 AL
1920 4-1 AL
1921 2-1 AL
After 1920 we also would need to look at a different set of players in both leagues. On average, from 1911-1920 there's a 4.3 AL advantage in superstars. If we estimate that these superstars average 40 ws/season as a group (which is generous), each one drops a regular in his own league about .33 ws. 4.3 superstars together is 1.4 ws/year during this period. Assuming the average superstar advantage for the AL remains constant throughout the Wheat/Hooper period (I doubt it would be wider), that would mean more like a 25 win-share cost to Hooper for better competition during his career, leading to a 385-355 edge for Wheat.
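The back-of-the-envelope arithmetic in that paragraph, for the record; the parenthetical guess at where the ~.33 per regular comes from is mine, not the original poster's.

extra_al_stars = 4.3      # average AL superstar surplus, 1911-1920, from the counts above
drag_per_star = 0.33      # stated cost, in WS per year, to each fellow regular
                          # (roughly (40 - 6.5) / 96 full-time AL slots, if you want a derivation)
yearly_drag = extra_al_stars * drag_per_star
print(round(yearly_drag, 1), round(yearly_drag * 18, 1))   # ~1.4 WS/yr; ~25 WS over an 18-year span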
Maybe the line between being barely in and being barely out passes between them, but not the line between being first ballot and having no chance at all.
I see the gap as a bit wider than that, but remember 1933 was an odd ballot. It's not clear that players who finished 4th and 5th in that election, Jake Beckley and George Van Haltren, will manage election in, say, the next 20 years, so the barely in/barely out line was very close to the top of this ballot. The electorate saw Wheat as above the Beckley/Van Haltren line, Hooper as below it.
If the case for Hooper is to be made successfully, competition quality arguments need to be advanced to run him past George Van Haltren and Jake Beckley as the top career-value candidates from the backlog. I'm not ready to make that move (at least with respect to Van Haltren) but I hope the electorate is paying attention to jimd's case on league quality differences; there's absolutely an issue here!
I believe that Win Shares overrates OF'ers from this era because of the fixed ratios for allocating Win Shares to positions. It makes no sense to me to see this long-term evolution that converts defensive plays by the infielders into strikeouts and to then assert (as Win Shares does) that infielders were no more important then than they are now even though they were making many more plays per game.
Nonetheless, I added up the WARP-1's for the 9 players mentioned in each year, from 1910-1925 (Hooper and Wheat's overlapping careers). The difference between the groups ran from 18.2 WARP-1 advantage in 1925 to an 84.5 WARP-1 advantage in 1912.
Now 84.5 certainly seems like a large gap, but that's what you get when you compare 6 AL stars (all but Ruth) to 1 NL star (Alexander). Left out of the calculation is the fact that the ACTUAL NL stars of 1912 were Honus Wagner, Heinie Zimmerman, Chief Meyers, Johnny Evers, and Christy Mathewson (one could also include Rube Marquard and Larry Doyle here, but I won't). So that's 3 HoMers, plus at least Meyers and Evers, who are out of the HoM due to short careers, not to any challenge of how good they were in their primes.
Compare THOSE 6 to the AL 6, and the AL's WARP-1 advantage in 1912 drops to 23.5.
The fact that the NL may have had fewer All-Time Greats does not mean that it had fewer stars in any given year.
It's only a first step to say 7 is more than 2, and assume a difference of 5. If you look year by year, the actual difference in quality between each league's Top X players (for that year) will be relatively small.
What do we do with Gavvy Cravath? Even with the park adjustments, he's putting up big numbers. We mentally discount him, but does he act like a "superstar" for the purposes of this discussion?
1000 PA to make the lists:
AMERICAN LEAGUE, CAREER 1911-1920
RUNS CREATED: RATE = 100 x PLAYER / LEAGUE; PLAYER = the player's RC; LEAGUE = apparently the RC of a league-average hitter in the same playing time
     RATE  PLAYER  LEAGUE
1 Babe Ruth 277 498 180
2 Ty Cobb 248 1299 525
3 Joe Jackson 214 1141 532
4 Tris Speaker 211 1239 588
5 George Sisler 178 587 330
6 Eddie Collins 176 1076 613
7 Sam Crawford 168 641 381
8 Home Run Baker 159 771 486
9 Jack Fournier 149 240 161
10 Birdie Cree 148 301 203
11 Wally Schang 144 414 288
12 Bobby Veach 143 735 512
13 Harry Wolter 141 161 114
14 Braggo Roth 139 441 318
15 Joe Judge 131 362 275
16 Sam Rice 130 308 236
17 Happy Felsch 129 428 331
18 Baby Doll Jacobson 129 272 211
19 Eddie Murphy 128 336 262
20 Harry Heilmann 128 388 304
NATIONAL LEAGUE, CAREER 1911-1920
     RATE  PLAYER  LEAGUE
1 Rogers Hornsby 184 485 264
2 Gavvy Cravath 181 723 399
3 Ross Youngs 169 280 165
4 Edd Roush 158 399 252
5 Joe Connolly 154 216 140
6 John Titus 152 194 128
7 Heine Groh 146 653 448
8 Benny Kauff 144 316 219
9 Bill Hinchman 144 223 155
10 Zack Wheat 143 792 552
11 Sherry Magee 142 632 445
12 Johnny Bates 141 238 169
13 George Burns 139 718 516
14 Larry Doyle 138 749 541
15 Honus Wagner 137 506 368
16 Chief Meyers 136 347 255
17 Hal Chase 136 254 187
18 Charlie Hollocher 135 190 141
19 Beals Becker 134 290 217
20 Heinie Zimmerman 133 692 521
1000 PA to make the lists:
The AL does have more superstars, but the NL catches up in the second ten. There are 82 players at 100 RC+ or better in the NL and 65 players at 100 RC+ or better in the AL.
It's not clear to me how much of that is due to the averages shifting from the superstars.
Anyways, the proper way to do this is to examine what happens to players who switch leagues. Unfortunately, this was extremely rare in this era.
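A sketch of the basic building block of such a league-switcher study. The rate stat mirrors the RATE column above (player RC as a multiple of league-average RC); the switcher and his numbers are hypothetical, and a real study would average many players and control for age.

def relative_rate(player_rc, league_rc):
    """Player's runs created as a multiple of a league-average hitter's."""
    return player_rc / league_rc

def switch_ratio(season_in_old_league, season_in_new_league):
    """Ratio of league-relative performance after vs. before a league switch.
    Values below 1.0, averaged over many switchers, would suggest the new
    league is the tougher one."""
    return relative_rate(*season_in_new_league) / relative_rate(*season_in_old_league)

# Hypothetical switcher: 150% of league average in the NL, 135% the next year in the AL.
print(round(switch_ratio((90, 60), (81, 60)), 2))   # 0.9 -> the AL looks ~10% tougher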
jimd supposes or estimates that Babe Ruth is 16+ wins above replacement; call it 49 win shares.
. . .
Posted by jimd on September 01, 2004 at 09:05 PM (#832239)
Paul, my original argument was made using WARP. It looks like you've constructed the Win Shares version of it here. Adding Ruth to the league cost every regular about 0.5 Win Share per year
Thanks for the clarification.
Relying on someone's estimate of full season replacement level 6.5 win shares. (JoeD?)
Babe Ruth is 48-49 win shares above replacement only in 1923.
01 37 36 40 43 - 1915-1919
51 53 29 55 45 - 1920-1924
13 45 45 45 32 - 1925-1929
38 38 36 29 20 - 1930-1934 (finale, 2 in 1935)
Best 5-year record (a period much shorter than the career of anyone considered here), 233 win shares; annually 46-47 or 40 above replacement.
Best 10-year record, 419; p.a. 42 or 35-36 above replacement.
Best 15-year record, 608; p.a. 40-41 or 34 above replacement.
I don't know how the impact is distributed to pitchers and others, so that's all for now.
While on the one hand I completely understand how the addition of a Babe Ruth to one league could depress other players' Win Shares, I could also see how if the top two or three teams in terms of drafting marginal talent or making overlooked "finds" were all in the same league, the effect could be the same or greater.
Cravath was "found" by the NL at 31. So was George Suggs at age 27. If one league's teams are systematically replacing lost talent with higher quality "replacement players", that could be like spreading an All-Star throughout the lineup.
Put another way:
Win totals for last (8th) place teams, 1910-1920:
AL: 47 wins (range, 36 to 57 wins; 4 different teams finished last, but the A's finished last 6 consecutive years )
NL: 55 wins (range, 44 to 69 wins; 6 different teams finished last)
AL players got years of feasting on the 36 win Philly A's in 1916 -- with every other team finishing within one game of .500! (Lots more Win Shares to spread around), meanwhile the NL had enough parity that the Giants finished in last place in 1915 with 69 wins -- two years after and two years before winning the pennant.
It all depends. If we're comparing 1912 with 2002, I'm on your side Andrew. The fact that so many "all-time greats" are playing at the same time indicates to me that they probably weren't so great because the competition wasn't either.
But if we're comparing AL and NL in 1912, it's a different story. It could be that Cobb, Collins, Speaker, etc., are tearing up a minor league like Browning did in 1882, but there are other factors that indicate it isn't so. The NL All-Star team of 1882 is mostly HOMers, the NL All-Star team of 1912 is not close. The AL of 1912 has been in operation for over a decade; they'd pretty much have to deliberately run their league into the ground to turn it into a minor league quality operation. The AA of 1882 is just getting started and so highly likely to be full of replacement level players.
The hypothesis that the two leagues were basically equal but that the NL had had a bad run of luck at snagging their share of the high-impact superstars is much more plausible than the hypothesis that the NL had been quietly getting the huge majority of the mid-level players leaving the AL with the leftovers and a handful of inflated faux superstars.
24 WARP difference for 6 players is an average of 4.0 WARP each. That's pretty significant. Assuming the rest of the leagues are of equal quality, it's enough to say that a 77-77 NL team was equivalent to a 74-80 AL team. (3=24/8) It's enough to consider a 4% discount.
jimd, I know these are back-of-the-envelope estimates,
Agreed. They prove nothing, except that there is a plausibility to Davenport's calculations. A similar superstar imbalance will appear again in the 1950's, and his calculations will show a similar league imbalance, which will again corroborate the public opinion of the time.
Except the fifties imbalance can be corroborated by year-to-year comparisons player by player, while the Deadball Era can't.
I'm more in the "the AL had more of the stars, but overall was basically the same as the NL" camp for now.
But John, if the AL had more of the stars, then it can't have been basically the same as the NL. The only way that the value of an average player in the two leagues could have been the same, with the AL having more stars, is if the NL had either more good players, or fewer bad players than the AL to offset the effects of the AL's stars.
If the AL and the NL were basically the same except for the superstars in the AL, then the next tier of above average players -- the Harry Hoopers and the Larry Gardners -- of the AL are going to have their totals suppressed relative to players of the same ability in the NL, because of the stiffer competition.
Overall, I think the two leagues were equal. If that hurts the lower tier of AL players, then that's the case until I'm proven wrong.
It's possible that the NL had a tighter distribution of talent than the AL, yet still had the same average level of play. This way, the best AL teams would be better than the best NL teams (though the worst NL teams would be better than the worst AL teams).
This would explain why the AL had a larger stdev of talent yet still won the WS every year.
This would explain why the AL had a larger stdev of talent yet still won the WS every year.
That's exactly what I was trying to say.
Don't know exactly what you mean by this, but I'll guess. Davenport builds those ratings by year-to-year comparisons player-by-player within each league so that each league-season is rated compared to the adjacent ones. Those ratings use the largest comparison set available, every player, and are much more solid than anything built on a handful of trades in any given season. All of the interleague movement over a hundred years would then calibrate the leagues relative to each other.
Just because it's harder to verify doesn't mean it doesn't exist. I bring up the topics of public opinion, and superstar imbalance, in an effort to find alternative approaches to verifying the disparity. If people have ideas on disproving it, those are welcome too.
I'll agree with that, Jim.
I've pointed this out quite a few times, but the Dick Cramer study has both leagues as roughly equal during the Deadball Era. Either Davenport or Cramer is wrong. Which one it is I haven't a clue.
As for distortions with the metrics, they seem to agree for the rest of baseball history, so that's probably not the case (but who knows?)
It may be relevant to demonstrating the existence of differences in levels of competition. If by comparing park factors, it can be convincingly demonstrated that the AL was a "pitcher's league" during this period, does then the following argument hold true?
I calculate estimated win shares for Negro League players by matching their translated batting statistics to major-league contemporaries and then using the batting win shares of the closest matching player (prorated by PA) as the total for the Negro-Leaguer. I have consistently found that the same batting totals will get you more win shares in the National League than in the American League, at least during the teens.
If the American League is demonstrably a pitchers' league by way of park factors, then one would expect to find just the opposite: American league players should get _more_ win shares for the same OPS than National League players would. Is there any explanation for this finding (if what I have observed can be systematically demonstrated) other than that the level of competition, at least for hitters, was higher in the American League than in the National?
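A sketch of that estimation procedure as I understand it. Matching on OPS alone is a simplification (the description above says translated batting statistics generally), and the comparables here are hypothetical.

def estimate_batting_ws(translated_ops, translated_pa, ml_comparables):
    """Estimate batting Win Shares for a Negro League player by finding the
    major-league contemporary with the closest (translated) OPS and prorating
    that player's batting WS by plate appearances.
    ml_comparables: list of (ops, batting_ws, pa) for same-league, same-era hitters."""
    closest = min(ml_comparables, key=lambda c: abs(c[0] - translated_ops))
    ops, ws, pa = closest
    return ws * translated_pa / pa

# Hypothetical: a translated .850 OPS over 550 PA, matched against three ML hitters.
print(round(estimate_batting_ws(0.850, 550,
                                [(0.810, 22, 600), (0.845, 25, 580), (0.900, 30, 610)]), 1))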
I Googled, but no luck finding it, Jim.
The NL hit .248/.304/.331 and scored 3.62 runs/game.
The AL hit .248/.319/.326 and scored 3.96 runs/game.
The Polo Grounds played as an extreme pitcher's park in the NL (94/93).
The Polo Grounds played as an average park in the AL (100/100).
The Polo Grounds being a typical AL park but a pitcher's park in the NL implies that the rest of the AL parks (on average) would also be considered pitcher's parks in the NL.
If this is typical of other seasons in the vicinity, the implication is that the NL cannot hit or has fantastic defense (or the AL cannot pitch or has great hitting, or some mixture thereof).
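One way to make the arithmetic of that implication concrete, treating a park factor as the park's scoring level divided by the average of the league's parks (a simplification of how published park factors are actually computed; the 3.70 runs/game figure is a placeholder):

polo_env = 3.70                    # runs/game the Polo Grounds actually yielded (placeholder)
nl_avg_env = polo_env / 0.94       # it rated 94 against the NL's parks...
al_avg_env = polo_env / 1.00       # ...but 100 (average) against the AL's parks
print(round(al_avg_env / nl_avg_env, 2))   # 0.94: the other AL parks average ~6% lower scoring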
The same batting line will get more Win Shares in 1908 or 1968 than it will in 1930 or 1894. That the same batting line gets more Win Shares in the NL than the AL indicates that the AL was the higher scoring league (see 1915 above for example). That they were the higher scoring league despite playing in parks that depressed scoring overall indicates a dramatically different balance between offense and defense than the NL. How this relates to overall league quality, beats me.
Or something else entirely. What about the sizes of the umpires' strike zones? Did they use the same balls? Change them as often for the deadball years?
This doesn't pertain to our discussions, yet, but offense levels in the NL dropped quite a bit compared to the AL starting in 1931. This was a response to the record-breaking 1930 season. Offense levels in the AL stayed high until WWII. How did the NL manage to do that independent of the AL?
I have the Cramer batting numbers on a spreadsheet; I can post it to the Yahoo! group site a little later.
His averages are normalized and create the appearance of batters getting "worse" when BA & SA spike up, as in ~1911-13 and in the 20s.
This probably wasn't the right way to do it, but on one of the tabs, I tried subtracting out the differences between the real league BA & SA and those of the reference league, the 1976 NL. The picture looks a little different that way; whether it removes illusions or creates different ones, I can't say. My math skills are largely limited to the basic functions. I'm sure one of you smarter guys can do better.
You can probably also figure out how to carry the numbers forward from 1979 to the present.
That certainly matters in the intertemporal comparisons. There may be a systematic bias in the Cramer measure of general improvement (especially around large changes in the number of MLB teams, I think). It matters to interleague comparisons if there are interleague differences in age patterns.
How looks the population of players who change leagues? (That can't be said succinctly in English.) Is the subgroup that moves from AL to NL different from the subgroup that moves from NL to AL? If yes, that implies a bias in interleague comparisons a la Cramer. A significant YES is most likely around a disruptive event. 1891. 1901. 1915? (numerous Federal Leaguers moved from NL to AL, but not vice versa). 1977? (expansion in AL only).
That is my three cents. All I have today.
Correct. Cramer admitted that he was wrong about this in the eighties.
That shouldn't matter, however, for inter-league comparisons. As I have pointed out, Cramer agrees with Davenport except for the Deadball Era. If Cramer's lack of age data was the culprit for the difference, I think we would be noticing the same problem throughout Cramer's study (which we're not).
It's probably worth investigating. The local validity of the study assumes that league aging patterns are pretty similar through time. I do remember reading that the two greatest youth revolutions (good rookie crops over a period of a few years) occurred during the early 60's and the early 10's. Maybe that's a factor.
Another factor might be the disproportionate impact of superstars. A player with a 4 year career contributes 12 comparisons to the study, 3 for each of the 4 years he played. Ty Cobb contributes 552 comparisons to the study, 23 for each of the 24 seasons he played. He played 6 times longer but has an impact on the study 46 times greater. Since he peaked around 1915, those samples all say that the AL was weakest then when compared to the years when Cobb was younger or older. This is mitigated by other players in other stages of their career, but the point is that superstars are given disproportionate weight due to the lengths of their careers. Much better would be to only compare adjacent seasons, or to adjust the season weights some other way.
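The comparison-count arithmetic, for the record (assuming each season is paired with every other season of the same player's career, which is how the counts above are built):

def comparisons(career_seasons):
    """Each season paired with every other season of the same player's career."""
    return career_seasons * (career_seasons - 1)

print(comparisons(4), comparisons(24), comparisons(24) / comparisons(4))
# 12, 552, 46.0 -- a six-times-longer career carries 46 times the weight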
Each league had their own official baseball. Both leagues attempted to minimize the number used until the Chapman tragedy changed that attitude.
IIRC, the NL deadened their ball somewhat after the 1930 season.
That's what James says in NBJHBA, but he doesn't provide any details.
It appears they juiced the ball in '34 to increase attendance.
It's probably worth investigating. The local validity of the study assumes that league aging patterns are pretty similar through time.
Yes, and local is "far enough" from a disruption such as 1898-1903 or maybe 1913-1916. How persistent over time is a measured difference in league quality? If very persistent, then "far enough" is very far. (This point holds for any bias. Age pattern is merely a plausible source of bias re which we suppose a difference between Cramer and Davenport methods.)
Fewer league changes, as in the deadball era, implies more persistent measured differences. (Right?) Get it wrong in 1902 and that may have some impact even in 1912.
Another factor might be the disproportionate impact of superstars. A player with a 4 year career contributes 12 comparisons to the study, 3 for each of the 4 years he played. Ty Cobb contributes 552 comparisons to the study, 23 for each of the 24 seasons he played.
There may be a difference between C and D in the time span of the elementary intertemporal data. Eg, five years for Cramer(?): 1928 is compared with 1923,33 but not with 1922,34.
1933 looks like the outlier to me.
Anyhow, year-by-year micro-analysis of the variations of offense is not going to be too fruitful. The point I was trying to make was that the AL had a much higher offense levels from 1931-1941. The biggest differences were in 31, 33, 36-39.
This was following a period of 1922-1930 where the offense-levels of the two leagues more closely tracked each other. 1920-21 had the AL exiting the deadball era a little earlier than the NL. Then the two leagues tracked each other fairly closely for over a decade before that.
A plot would help here. :-)
I've been rereading The Dizziest Season lately and one of the big stories that year was the "rabbit ball" of '34 in the NL.
Offense did increase 17% that year, FWIW.
And it dropped 16-17% the year before. Attendance was down quite a bit in the NL, so they could have made some sort of correction.
Here is what I was talking about with the NL/AL:
year - NL - AL
OK... that's the second table I've messed up this week. I may give up. One more try... sorry guys...
year - NL - AL
IIRC, Cramer used all possible comparisons (over some PA threshold), but that memory could be wrong, and IAC it's a memory of a summary, as I have never seen the original study.
If he did use all possible comparisons, then the fluke circumstance of having a number of long-career superstars peaking at around the same time in the same league would severely distort the league measurements at that peak.
Yes, Cramer used all pairs (one player, two league-seasons) with at least 20 PA each season.
Perhaps Davenport limits the comparisons. I was recalling a conversation that I initiated in the lobby at SABR34 this July. Dick Cramer observed that Davenport's method must be fairly close to his. He alluded to limiting the comparisons or the differences in some way. I am not sure that timespan of comparisons was the point, but I know I mentioned that Pete Palmer utilized only 1913-1916 data in his assessment of FL 1914-1915; only 1883-1885 data for UA 1884.
The approach should have been implemented many times. (Cramer agrees.) Does anyone know why that has not happened? Databases are widely available; computation is cheap; there are more sabermetricians. The empirical question is exceptionally interesting to many people.
Given any implementation of the approach, it should be trivial to vary the weights on observed differences according to timespan and number of BFP, and learn whether the results are robust. (Excluding all comparisons across time greater than some threshold is a special case using weight 0.)
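A minimal sketch of one way such an implementation could look: treat each (player, league-season A, league-season B) record as evidence that the quality gap equals the log of the ratio of his league-relative performances, and solve a weighted least-squares system. Everything here, including the BFP-and-timespan weighting and the neglect of aging, is an illustrative assumption, not Cramer's or Davenport's actual method.

import numpy as np

def league_quality(pairs, seasons, halflife=3.0):
    """Chained league-quality estimates from players who appear in two league-seasons.

    seasons: list of (league, year) labels, e.g. ("NL", 1904).
    pairs:   tuples (season_a, season_b, rel_a, rel_b, bfp), where rel_* is the player's
             performance relative to that league-season's average (e.g. OPS+ / 100).
    Returns {season: log-quality}, anchored so the first season is 0. The weighting
    (BFP, decaying with the years between the two seasons) is just one choice and is
    easy to vary, which is the point made above."""
    idx = {s: i for i, s in enumerate(seasons)}
    A, y, w = [], [], []
    for sa, sb, rel_a, rel_b, bfp in pairs:
        row = np.zeros(len(seasons))
        row[idx[sb]], row[idx[sa]] = 1.0, -1.0
        A.append(row)
        # Better relative numbers in A than in B is evidence that B is the stronger league.
        y.append(np.log(rel_a / rel_b))
        w.append(bfp * 0.5 ** (abs(sb[1] - sa[1]) / halflife))
    # Pin the first season at 0 to remove the arbitrary overall level.
    A.append(np.eye(len(seasons))[0]); y.append(0.0); w.append(1e6)
    W = np.sqrt(np.array(w))[:, None]
    sol, *_ = np.linalg.lstsq(np.array(A) * W, np.array(y) * W[:, 0], rcond=None)
    return dict(zip(seasons, sol))

# Tiny hypothetical example: two players linking three league-seasons.
seasons = [("NL", 1903), ("NL", 1904), ("AL", 1904)]
pairs = [(("NL", 1903), ("NL", 1904), 1.10, 1.12, 500),
         (("NL", 1903), ("AL", 1904), 1.05, 0.98, 450)]
print(league_quality(pairs, seasons))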
League-average performance, UA and FL.
Presuming contemporary NL=AA=1 and NL=AL=1.
UA 1884
OPS .76
ERA .875
FL 1914-1915
OPS .90
ERA .924
"League Performance" in the Glossary, Total Baseball 6 (1999).
UA 1884, stipulated league averages
OPS+ = ERA+ = 80
FL 1914-1915, stipulated league averages
OPS+ = ERA+ = 90
Average is 100 for every other MLB league-season.
How is league quality handled by TB7's descendants, Total Baseball 8 (Thorn) and The Baseball Encyclopedia (Palmer)?
Even if we can show, for example, that the NL has statistically better hitters in a particular year than the 1904 AL (whether by judging the average player using various methods, or by looking at standard deviations or through an analysis of the outliers), does that mean the NL was actually better than the AL that season? Couldn't that mean the NL pitching that year was worse than the AL pitching, thus reflecting better on the NL hitters? If so, then the NL wouldn't overall be a better league that season.
In other words, since all hitting stats are dependent not only on the quality of the hitters but also on the quality of the pitchers, and vice versa with respect to pitching stats, how can any hitting study (or pitching study) produce a conclusive result about league quality? Not to mention the fielding component.
Assuming that hurdle is overcome, quantifying it will be a separate thorny issue.
As you suggest, such intraleague analysis of batting and pitching statistics (not to mention one without the other) cannot support any interleague quality judgments. But Cramer (batters only), Palmer, and Davenport share a general interleague method, analysing only the records achieved in two leagues by the people who played in both.
There isn't much migration within a season, so the comparison of NL04 and AL04, for example, is mainly derived from comparisons of NL04 and AL03, AL04 and NL03, AL04 and NL02, and so on.
I agree. But I see two lingering issues:
1. When I was looking at the NL vs. AA, the problem was there weren't a significant number of players who played in both leagues as regulars. That might be less of a problem with NL/AL, but you have to confine the analysis to a few years (maybe five years on either side), and that REALLY cuts down the sample size. You also have to make sure the average age of the sampled players is about the same, or you have other factors creeping in.
2. Even if you only analyze players who played in both leagues -- hitters for example -- you still have to know the level of pitching. So, it seems you'd need to know the records of hitters who played in both leagues during a specified time against the pitchers who pitched in both leagues during the same time. There is a hypothetical hybrid league in that scenario, but the sample size gets even smaller.
I'm not suggesting it not be studied; only that it is a very difficult problem.
The pitcher-batter simultaneity should not be a source of bias. Cramer, Palmer, and Davenport, at least, use batting statistics that are relative to league average, which incorporates the quality of league pitchers; and vice versa, except that Cramer does not look at pitchers.
--
By the way, Cramer and (I am practically certain) Davenport also use the data for NL04 and NL03, AL04 and AL03, etc, generated by those who play multiple years in the "same league" in the ordinary sense.
In effect, all of the interleague quality measures are estimated simultaneously. The estimated difference between NL09 and AL09 is not much influenced by the sparse data on NL08-AL09, AL08-NL09, etc, when there was little movement between NL and AL. Most of the data supporting relative quality in 1909 is ample data on NL03-NL04 ... NL08-NL09 and AL03-AL04 ... AL08-AL09 and ample data on NL00-AL01, AL00-NL01, ... AL02-NL03.
jimd and I alluded to this in #52 and #75, or something like that.
It's time to try posting this much.
Eg, Pete Palmer's estimates for UA1884 mean that an average UA1884 pitcher, who also played in AA/NL/1883/1885, was 12.5% below average in the latter leagues (ERA+ .875). Sea-level in the UA was 12.5% below 1883/85 major league sea-level for pitchers; 24% below, for batters (OPS+ .76). Rolling hills in the UA pitchers box appear to be mountains. Molehills in the UA batters box appear to be mountains.
These figures mean for instance that an AA player in 1882 with a .260 EQA would be equivalent to an NL player that year with a .196 EQA.
1882 AA 64.0
1883 38.6
1884 30.5
1885 22.0
1886 14.6
1887 12.1
1888 12.9
1889 11.5
1890 29.1
1891 20.4
1884 UA 70.9
1890 PL -3.0
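A sketch of applying those discounts, reading them as points of EqA to subtract when translating AA performance to the contemporary NL (that reading follows from the .260-to-.196 example above):

aa_discount = {1882: 64.0, 1883: 38.6, 1884: 30.5, 1885: 22.0, 1886: 14.6,
               1887: 12.1, 1888: 12.9, 1889: 11.5, 1890: 29.1, 1891: 20.4}

def aa_to_nl_eqa(eqa, year):
    """Translate an AA EqA to its NL equivalent by subtracting that year's discount."""
    return eqa - aa_discount[year] / 1000

print(round(aa_to_nl_eqa(0.260, 1882), 3))   # 0.196, matching the example above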
This is a very important difference in methodology.
When every available year comparison is used (as does the study cited in The Hidden Game of Baseball), the superstars receive a lot of extra weight, simply because they play so many years. A 5 year player contributes 4 samples to each of his 5 years, and a 21 year player contributes 20 samples to each of his 21 seasons. For his peak seasons, there are 10-15 samples implying that the league was "weak" those seasons, due to the assumption that the player's performance is constant over his career. Put a number of such stars in parallel in the same league (e.g. Cobb, Collins, Jackson, Baker) and there is most likely a noticeable impact on the results. In Davenport's study, most comparisons of peak seasons are only with other peak seasons or near-peak seasons; the problem is not completely eliminated, but it is greatly reduced.
Note: the total sample universe for each league season during the 1910's is around 500-600 samples from players that were full-time in both seasons plus a number of partial samples from non-regular players; Davenport's study has about half that number of full-time samples per season.
n0: the number of regulars (my definition) in MLB that year
n1: the number of regulars that were also regulars the following year
n2: the number of regulars that were also regulars two years in the future
n6: the number of regulars that were also regulars six years in the future
The point of providing the 25 year span is to allow one to get an idea of typical turnover, and to then compare the effect of the WWII years on that typical turnover.
Some specific points: the transition from 1941 to 1942 (1941-n1) is not way out-of-line, though it is a little low. The war did not have a major impact on MLB in 1942. The following four years show significant turnover, culminating in the dramatic return in 1946 (1945-n1) when only about 40% of the 1945 regulars kept their jobs.
Look at the data points 1942-n4, 1941-n5, 1940-n6. These represent the number of players in these years who were regulars in 1946. They were MUCH more likely to have retained/regained their jobs than typical MLB regulars after the same time interval with no war situation. I don't know if this represented an effort on MLB's part to give the returning veterans every opportunity to regain their jobs, or the impact of the war on the development of the minor league players that would normally have replaced some of these players.
1882 AA 64.0
1883 38.6
1884 30.5
1885 22.0
1886 14.6
1887 12.1
1888 12.9
1889 11.5
1890 29.1
1891 20.4
1884 UA 70.9
1890 PL -3.0
Anyone off hand happen to have Davenport's Federal League adjustment on this same scale?
The following table shows 3 Federal League CF'ers from 1915:
As you can see, the amount of value lost going from WARP-1 to WARP-2 is fairly constant (though not completely). Kauff loses more absolute value, showing that there is also a percentage involved, but Oakes loses almost all of his value, presumably based on the notion that he was very close to AL/NL replacement level.
So some Federal League value is removed purely because it has no Major League value, because it is sub-replacement value. The residue from this adjustment is apparently then modified by applying a percentage.
This is obviously a system that is going to be much more forgiving to star players in an inferior league than a straight % discount.
And it should be. A typical Federal Leaguer had positive value in that league, but would be unable to land a Major League starting job after the collapse (near-zero real value). OTOH, Kauff moved into the majors and was a second-tier star (though not the Ty Cobb/Tris Speaker that his raw FL stats might indicate).
A straight discount does not capture this, and so doesn't correspond to the true situation.
Adjusted WARP = .957 * Raw WARP - 3.25
As you can see, after the subtraction takes place, the actual percentage adjustment is only 4.3%.
Of course, it's more complicated than that, because batting and fielding are regressed separately.
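A quick illustration of why that functional form is gentler on stars than a flat discount. The formula is the one quoted above; the two raw WARP values are hypothetical players.

def fl_adjust(raw_warp):
    """The quoted Federal League translation: a small percentage plus a flat subtraction."""
    return 0.957 * raw_warp - 3.25

for raw in (10.0, 4.0):          # a star season vs. a marginal regular
    adj = fl_adjust(raw)
    print(raw, round(adj, 2), f"{(1 - adj / raw):.0%} lost")
# 10.0 -> 6.32 (37% lost);  4.0 -> 0.58 (86% lost)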
Look at the data points 1942-n4, 1941-n5, 1940-n6. These represent the number of players in these years who were regulars in 1946. They were MUCH more likely to have retained/regained their jobs than typical MLB regulars after the same time interval with no war situation. I don't know if this represented an effort on MLB's part to give the returning veterans every opportunity to regain their jobs, or the impact of the war on the development of the minor league players that would normally have replaced some of these players.
Some right of return to a civilian job was provided by law. I don't know details.
For a time including the 1945 and 1946 seasons, MLB roster limits were increased by 20%, partly to make compliance easy.
Cliff Blau, "League Operating Rules"
I'm hating the Bob Caruthers induction more and more . . .
I've calculated career league-adjusted Win Shares above replacement. Replacement level is defined as 1.27 WS per 100 plate appearances, a value determined empirically for the 1901-1940 2-league seasons.
League adjustments are only done when there are multiple leagues in a single season, and are done to "equalize" the leagues. No attempt is made to compare leagues across seasons. The parameters in the league adjustments are determined by comparing the performances of individual players between seasons. To prevent uncontrolled divergences, 9 average players are added to the data set as players who switched leagues between seasons without change of performance.
The league parameters are determined for each season. Rather than give the full set of data, I just give the 20th century decade-averaged factors that I use to convert actual performance to neutral-league performance.
1900s NL: 0.9473 * WS + 0.00043 * PA
1900s AL: 1.0530 * WS - 0.00044 * PA
1910s FL: 1.0576 * WS - 0.00631 * PA
1910s NL: 1.0037 * WS - 0.00206 * PA
1910s AL: 0.9849 * WS + 0.00332 * PA
1920s NL: 0.9994 * WS - 0.00287 * PA
1920s AL: 1.0007 * WS + 0.00286 * PA
1930s NL: 0.9940 * WS - 0.00126 * PA
1930s AL: 1.0061 * WS + 0.00126 * PA
In presenting the position player leaders in career LAWSAR, I divide players' seasons into 4 roles: C, 1B, "IF" (2B/3B/SS), "OF" (LF/CF/RF), according to the position where they played the plurality of their games. If a position player played some games in a season as a pitcher, I subtracted estimated pitching WS from their totals; if they played a plurality of games as a pitcher, I regretfully did not include that season. Recognizing that the following data is not appropriate for short-season 19th century players, I nonetheless give the top 100 career LAWSAR, 1876-1940 for each role, as well as grand totals for players with more than one role in their career, and the top 100 overall:
This chart makes clear that we are doing an excellent job overall. We're really fighting over the borderline guys. While the above ratings are not exactly my system, they make clear why I rate Schang so highly. Rice and Hooper, as the top unelected players in career LAWSAR, post-Goslin, seem particularly underrated.
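A worked sketch of the LAWSAR calculation as I read the description above, using the 1910s AL decade factor and the 1.27 WS per 100 PA replacement level; the player line is hypothetical.

def lawsar(ws, pa, a, b, repl_per_100pa=1.27):
    """League-adjusted Win Shares above replacement: apply the league factor
    (adjusted WS = a*WS + b*PA), then subtract replacement-level WS for the same PA."""
    adj_ws = a * ws + b * pa
    return adj_ws - repl_per_100pa * pa / 100

# Hypothetical 1910s AL regular: 20 WS in 600 PA, using the 1910s AL factor listed above.
print(round(lawsar(20, 600, a=0.9849, b=0.00332), 1))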
It seems to me that the most essential piece of information, needed to fairly evaluate players from different leagues and eras, is the relative strengths of those leagues and eras. Without this information, we are reduced to little more than hand-waving when comparing statistics from differing league-seasons.
It seems to me that there is in fact a way to create a table of relative league-season strengths for each statistic being looked at (i.e. OPS+). This method utilises the year-to-year performances of the players who participated in a league for at least two years in a row.
I had found that when I compared the performance of all the individual players in a league in OPS+ from one year to the next, the percentage change in performance correlated strongly with age, in a linear fashion. The player's performance grew strongly in their early to mid twenties, leveled off at 29-30, and started to decline at later ages, with the rate of decline increasing with increasing age.
I was a bit surprised that the peak years were 29-30, since I had read Bill James's study suggesting that 27-28 are the peak years, but the difference is easily explained by the fact that I was looking at those players who remained regulars in both years, while James apparently included all plate appearances, including those players who, due to injuries or other causes, were no longer considered good enough to remain regulars. In other words, if a player remains healthy, they will generally peak at 29-30, but if you look at the players as a whole, including those who become injured, the peak years shift downward by about two years.
I have only done this for a few sets of league seasons, but I noticed that the SLOPE of the linear best-fit (LBF) barely changes, while the height of the best-fit at any given age does change. This means that for any given pair of consecutive league-seasons, the linear best-fit can be described by a constant (the slope of the LBF, which barely changes) and a variable (the height of the LBF at some specified age--say, 30 years old), which DOES change.
If we have two pairs of league-seasons, and one pair of league-seasons yields a LBF at age 30 of 0%, and the other pair yields a LBF at age 30 of -10%, then we can say that whatever change in league quality happened between the first and second league-seasons of the first pair (call it x%), the second pair showed a change of (x+10)% between the first and second league-seasons of the pair. The (x+10)% general change in the league, from one season to the next resulted in a -10% change in year-to-year OPS+ for the average player, in addition to the expected aging pattern, compared to the other pair of league-seasons.
This fact means that for each league, a table of year-to-year changes in OPS+ can be created, of the form (making these numbers up):
1901-02: +3%
1902-03: -1%
1903-04: +2%
.
.
.
2003-04: 0%
where we are measuring the LBF for that season pair at some particular player age (30.0 is a nice even number).
This in itself would not give an ABSOLUTE measure of year-to-year change in the league, but we have two leagues that span the twentieth century, the AL and the NL, and we can cross-reference them! By repeatedly running the full 1901-2004 AL and NL tables, with different constant year-to-year percentage changes added to the listed year-to-year percentage changes, we can generate numerous sets of yearly percentage differences between the AL and NL over the past century, the year-to-year differences varying by what constant year-to-year percentage is added. By comparing the known historical league-to-league differences (determined by looking at players who switch leagues) with the generated lists, we should be able to find out which constant year-to-year change best fits the data.
This would give us the ABSOLUTE league strengths (with respect to OPS+) of the AL and NL since 1901, and since we would now have the constant year-to-year change, we can extend the chain back into the nineteenth century, and cross-link with the other major leagues as well.
I am offering this idea to the members of this discussion forum, to pick apart and (hopefully) put back together in a workable form. I cannot see any problems myself, but if there are any, and if those problems can be fixed, I believe that this procedure will finally give us the absolute strengths of each major league season, allowing absolute comparisons of every player-season.
Bill
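A minimal sketch of the year-pair reduction Bill describes: fit a line to (age, percentage change in OPS+) for the players who were regulars in both seasons and read off the fitted value at age 30. The input format and the sample pairs are invented.

import numpy as np

def lbf_at_30(pairs):
    """pairs: (age, ops_plus_year1, ops_plus_year2) for each player who was a regular
    in both seasons. Returns (slope, fitted % change at age 30) of the linear best fit."""
    ages = np.array([p[0] for p in pairs], dtype=float)
    pct_change = np.array([100.0 * (p[2] / p[1] - 1.0) for p in pairs])
    slope, intercept = np.polyfit(ages, pct_change, 1)
    return slope, slope * 30.0 + intercept

# Hypothetical 1950->1951 NL pairs: (age in year 1, OPS+ year 1, OPS+ year 2)
sample = [(24, 95, 104), (27, 120, 123), (30, 110, 108), (33, 105, 98), (36, 100, 90)]
print(lbf_at_30(sample))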
As someone who isn't terribly well informed on the math side of things, how would we be able to tell if 1902 is stronger than 1997 without any players who played in both leagues? (The LBF and slope/height/intercept talk isn't so much scary or anathema to me as it is simply outside the boundaries of my education.)
In other words, the results may be listed as
1901 +1
1902 +2
1903 0
.
.
.
1997 -5
but what are those numbers saying and how do they relate to one another when they are not contiguous seasons? Or put another way, what's the baseline against which these absolute values are being measured?
The first is: how can I use OPS+ as an example of changes from year to year when it is, by definition, normalised to the league? Wouldn't the LBF always be anchored to an average value of 0% change?
The answer is that no, it would not be. I am assuming that the aging patterns of the individual players follow (on average) a typical path of growth and then decline, with the peak absolute production, at age 29-30, determined by the player's intrinsic ability. This intrinsic ability would not change over the course of a particular player's career, although their production, as a percentage of their peak production, would change as they aged. The league as a whole would increase in strength when the intrinsic ability of the new full-time players exceeds the intrinsic ability of those full-time players that they are replacing. Similarly, if the intrinsic ability of the replacement players is lower than the intrinsic ability of those they replace (such as during WWII), the league as a whole would decrease in strength.
The key fact in this is that it is the first-time regular players (and departing regulars) who determine the change in the intrinsic league strength, and since I only include players with two consecutive full-time seasons, they are not included in my study. OPS+ measures how the players I include perform relative to the league as a whole, including the first-time regulars, and hence if the first-time regular players are stronger than those they replace, the rest of the full-time players will have an OPS+ lower than they would have otherwise, in inverse proportion to the change in intrinsic strength of the league as a whole.
In answer to your question, Dr. Chaleeko, on my own this study would take almost forever. I am limited to using a spreadsheet to reduce each league-season-to-league-season pair, entering each player's OPS+ by hand. It takes me the better part of a day to do each pair, so at one day per pair, to do all league pairs would take 250 days, with no days off.
To answer your second question, about how to compare seasons separated by decades or longer, we cannot directly compare them. We can only compare successive league-seasons. This is where the fact that the twentieth century had two recognised major leagues comes in. We have to find a value for the 'constant' year-to-year change that results in the observed year-to-year differences in strength between the NL and the AL, as measured by players switching leagues--for example, if a player (after adjusting for the player's expected aging pattern, and the player's expected adjustment time) drops from an OPS+ of 110 in the NL in 1950, to an OPS+ of 105 in the AL in 1951, then this suggests that the 1951 AL is 4.76% stronger than the 1950 NL. Of course we would need to look at all such league switches, in order to reduce the effects of random variation to a minimum.
Without being able to compare the two leagues, determining the absolute strength of each league-season would be impossible. By comparing them, the task is difficult but possible.
Bill
I looked at a few BP hitter's pages and compared EqA raw with EqA adjusted for league quality.
There is great similarity between hitters; for example, if Ted Williams in 1955 loses 6 points of EqA, then Mickey Mantle in the same year will lose 6 or maybe 5 or 7. The results are almost always within one point.
Armed with the comforting info that the system is at least consistent within itself, I decided a sample of 2 hitters for each league/year was enough to create a table of "BP league quality estimates from 1931 to 1960". Which follows.
What the numbers mean is that an NL player who hit .300 in 1931 is of the same quality as a player who hit .306 in the AL that year; assuming we have already accounted for park factors and league offensive levels.
year    AL    NL   (points; negative = weaker league)
1931   -10    -4
1932    -7    -1
1933   -10     0
1934    -7     0
1935    -6    +1
1936    -7    -1
1937    +2    -1
1938    +3    +3
1939     0     0
1940    +4    +2
1941    +2    +1
1942    -5    +6
1943    -4    +1
1944    -9    -7
1945   -14    -7
1946    +2     0
1947    -1    +3
1948    +1    +1
1949    -1    +1
1950    -1     0
1951    -1    -1
1952    -7    -1
1953    -9    -4
1954    -9     0
1955    -6     0
1956    -7    +4
1957    -7    +2
1958    -7    +4
1959    -3    +2
1960    -3    +3
conclusions from this data
NL was stronger thru 1936
AL caught up 1937 thru 1941
AL was weaker in 42 and 43
real 'war years' were only 44 and 45
small weakening effect during Korean conflict
AL thru mid-late 1950s was actually weaker than it was in 1940.
NL strongest by late 1950s, but not by much over earlier years.
I don't buy all of this, but there is the data, and they certainly did more work to create it than I did.