Hall of Merit — A Look at Baseball's All-Time Best
Wednesday, January 19, 2005
Major League Equivalencies
This thread will be used for examining and analyzing MLEs throughout baseball history.
Posted: January 19, 2005 at 03:22 AM
> who played on the Cuban national team, then defected?
I'll say yes, but I'm not the final authority on matters such as this.
Right. I suppose only the author can be an authority here.
I have had offense MLEs finished for Oscar Charleston using my system as I had left it a couple of years back, but I have been hoping to improve my system by handling walks and stolen bases more systematically and appropriately, now that we have fuller data on both, at least for the players for whom the HoF has released their data.
This post is going to focus on walks.
When Brent released his MLEs for John Henry Lloyd, this was his description of his system (my emphases added):
The approach I’m taking in deriving MLEs is a simplification of Chris Cobb’s method (and is the same method that I previously used for Carlos Morán’s MLEs). Here’s a summary of the method: (1) I restrict my analysis to data for which league averages are available or can be calculated. (2) I build my estimates from three rates: the walk rate BB/(BB+AB), the batting average, and isolated power. (3) I adjust these rates for quality of play, multiplying BA by 0.9 and ISO and walk rates by .81. (No quality-of-play adjustment is made for games played against major league teams.) (4) I adjust the BA and ISO statistics to a National League context by multiplying by the ratio of the two league averages for the period. (However, because the HoF study doesn’t report league averages for walk rates or OBP, I decided not to make any contextual adjustments to walk rates. Gary’s data for 1921 and ’22 show NeLg walk rates similar to those in the NL.) (5) I report Lloyd’s MLEs as average rates for four periods: 1907–18 (ages 23 to 34), 1919–23 (ages 35 to 39), 1924–28 (ages 40 to 44), and 1929–30 (ages 45 to 46), and also as career rates.
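Steps (2) through (4) of the method quoted above can be sketched in Python. This is only an illustration: the function name, argument names, and sample inputs are assumptions, not Brent's actual spreadsheet. The 0.9 and 0.81 factors and the league-ratio adjustment for BA and ISO come from the description; the direction of the ratio (NL average over source-league average) is my reading of it.

```python
def translate_rates(ba, iso, bb_rate, lg_ba, lg_iso, nl_ba, nl_iso,
                    ba_factor=0.9, iso_factor=0.81, bb_factor=0.81):
    """Apply quality-of-play factors, then rescale BA and ISO to NL context.

    The walk rate, BB/(AB+BB), gets a quality factor only; no context
    adjustment is made, as in step (4) of the description.
    """
    mle_ba = ba * ba_factor * (nl_ba / lg_ba)
    mle_iso = iso * iso_factor * (nl_iso / lg_iso)
    mle_bb = bb_rate * bb_factor
    return mle_ba, mle_iso, mle_bb


# Hypothetical season (all inputs invented for illustration): a .350 hitter
# with a .150 ISO and a .100 walk rate, in a league hitting .270 with a
# .090 ISO, translated to an NL hitting .280 with a .100 ISO.
mle_ba, mle_iso, mle_bb = translate_rates(
    ba=0.350, iso=0.150, bb_rate=0.100,
    lg_ba=0.270, lg_iso=0.090, nl_ba=0.280, nl_iso=0.100)
print(round(mle_ba, 3), round(mle_iso, 3), round(mle_bb, 3))
```

Keeping the three factors as keyword arguments makes it easy to swap the .81 walk factor for another value without touching the BA and ISO handling.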
This system is indeed what I use, except (1) I go season by season and use regression and (2) I haven't been adjusting walk rates by .81. I never developed a consistent BB conversion factor because I never did conversions for players who had a full career of walk data available.
So the question I have is: is .81 an appropriate conversion factor, from a theoretical perspective? That is, should walk rates vary as the square of hit rates, as isolated power does? Should we expect any sort of relationship between walk rates and other offensive statistics?
Brent doesn't give his rationale for choosing that factor (or I failed to notice it), so I am not sure his choice is justified. If anyone has any insights into this question, I'd be glad to read them!
I don't have a theoretical answer to this question, but I thought I might be able to get an empirical view by looking at walks in the same way I looked at batting average in the study I did when I first began my MLE project: I looked at the walk rates of players who went from the NeL to the majors. Walk data is only available for HoFers or shortlisted HoF candidates, so I was only able to look at Roy Campanella, Larry Doby, Monte Irvin, and Minnie Minoso. (Jackie Robinson's NeL stint is too short in the HoF data to be meaningful.) When I compared their NeL walk rates to their ML walk rates, with appropriate adjustments for aging patterns (using Tangotiger's figures for this), I found that these players' NeL walk rates were most predictive of future ML walk rates when a conversion factor of about 1.0 was applied. (Without showing all the gritty details, Campy's and Doby's results suggested conversion factors of about .9, while Irvin's and Minoso's suggested conversion factors of about 1.1. I can post the gritty details if there is interest.) The walk rates of the top NeL power hitters with reputations for plate discipline, Buck Leonard and Josh Gibson, are comparable to the walk rates of the top hitters with plate discipline in the majors, so there's no obvious need for a conversion factor far removed from 1.0 there, either.
We lack, of course, the league walk rate data that would enable us to interpret systematically what this means. It could be that low-walk conditions in the NeL offset the weaker competition levels, so that, overall, earning walks was about as difficult in the NeL as in the majors. It could be that walk rates are not affected by competition levels in the same way that hits and hitting for power are.
So, the two points of data on walk rates that I have are (1) irrespective of competition levels, the conversion factor for individual NeL players in the 1940s looks to be about 1 and (2) in the early 1920s, NeL walk rates matched major league walk rates closely. The big question, then, is what happened to NeL league walk rates between the early 1920s and the early 1940s relative to the majors?
It might be that there is suggestive data on the effect of competition levels on walk rates hidden somewhere in CWL data. If there is CWL seasonal data that includes walks for years in which players participated who also played in the major leagues (e.g. Armando Marsans), comparing the walk rates of those players relative to their leagues might reveal the effects of competition levels on walk rates. I am not aware that any seasonal data of this kind is available for seasons with major-league crossovers. If anyone knows where such data could be found, please let me know! (Whoops, I just realized that the 1908-09 CWL data Gary A. has compiled has Marsans himself! I'll have a look at him and see what I find. So, if _more_ data of this kind can be found, let me know.)
In the absence of theoretical analysis or empirical data to resolve my uncertainties about walk rates, I plan to handle walk rates as follows: (1) in seasons before 1940 for which no NeL league data is available, I will use a conversion factor of .95 for walk rates, just to hold down outliers a bit in seasons when walk rates might have spiked, (2) in seasons for which I have NeL league data, I will adjust for context but not apply a competition conversion factor, and (3) I will regress walk rates in the same way I regress BA and slugging: to a five-year average, centered on the season in question. (Dan R's study of war credit suggests that more sophisticated regression studies using major-league data could produce a more precise regression formula, but since his findings there indicate that taking an average of the four surrounding seasons leads to similar results, I have some confidence that my simple regression formula is basically sound.)
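The three-part plan above might look something like the following in Python. This is a rough sketch under stated assumptions: the post does not say how heavily a season is pulled toward its five-year average, so the `weight` of 0.5 is a placeholder, as are the function names and the sample rates. Truncating the window at the ends of a career, and converting before regressing, are also my choices rather than anything specified in the post.

```python
def convert_walk_rate(rate, nel_lg_rate=None, ml_lg_rate=None,
                      no_data_factor=0.95):
    """Convert one season's walk rate.

    With league data: context adjustment only (ratio of league rates).
    Without league data: the flat .95 factor from point (1) of the plan.
    """
    if nel_lg_rate is not None and ml_lg_rate is not None:
        return rate * ml_lg_rate / nel_lg_rate
    return rate * no_data_factor


def regress_to_centered_mean(rates, weight=0.5, window=5):
    """Blend each season with the mean of a window centered on that season.

    `weight` is the share kept from the season's own rate (assumed value).
    The window shrinks at the start and end of the career.
    """
    half = window // 2
    regressed = []
    for i, rate in enumerate(rates):
        lo = max(0, i - half)
        hi = min(len(rates), i + half + 1)
        centered_mean = sum(rates[lo:hi]) / (hi - lo)
        regressed.append(weight * rate + (1 - weight) * centered_mean)
    return regressed


# Hypothetical raw walk rates for five consecutive seasons, no league data.
raw = [0.10, 0.12, 0.20, 0.11, 0.09]
converted = [convert_walk_rate(r) for r in raw]
smoothed = regress_to_centered_mean(converted)
print([round(x, 4) for x in smoothed])
```

The spike in the third season is pulled toward its neighbors, which is the "hold down outliers" effect the plan is after.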
This approach is different from Brent's, so our results will differ with respect to walks, though our systems are otherwise the same. Without a clear basis for applying a 20% discount to walk rates for competition level, I can't see my way to doing it. My limited review of CWL data suggests that it was a high walk rate league, so if one is working with that data, a contextual discount certainly appears to be appropriate in many cases, but, based on what I know at this moment, I would want to treat that purely as a contextual discount, not a competitive one.
Your thoughts?
I didn't do any kind of empirical study to select the .81 factor for walks, so I agree that the choice may not be justified. There are two reasons that I used it:
(1) Some of you may recall that several years ago I calculated MLEs for minor league seasons of HoM candidates like Buzz Arlett, Earl Averill, and Gavy Cravath, using the MLE methods published by Bill James for Class AAA translations in his 1985 Baseball Abstract. I eventually realized that the Bill James method was essentially equivalent to multiplying batting average by M and isolated power and walk rates by M^2, where M^2 = .82. Since Chris was doing about the same thing for Negro league players with M=.9 and M^2 = .81, it seemed like a natural extension of the Bill James methodology.
(2) I started doing translations of Cuban League players with Carlos Morán, and without reducing his walk rate by a factor of .81, none of you would have believed the resulting MLEs. (None of you believed them anyway! :) At any rate, it seemed reasonable in his case to lower the walk rates to account for differences in quality of pitching.
Chris's suggestion is a good one. Gary has posted enough data on Marsans and Almeida that we could take a look at how their Cuban walk rates compare with their major league rates. I'll see what I can put together.
> Your thoughts?
Do you generate MLEs in such a way that you can handle this discount as a parameter? That is, after some setup stage, "easily" generate low and high estimates with d=0.9 and d=1.0.
I do the MLEs with spreadsheets, so, yes, I could do this sort of thing quite easily.
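Treating the discount as a parameter could be sketched like this in Python. The function names and the sample BA and walk rate are hypothetical, and OBP is approximated without hit-by-pitch or sacrifice flies, so that OBP = BA × (1 − w) + w where w = BB/(AB+BB).

```python
def obp_from_rates(ba, bb_rate):
    """Approximate OBP from BA and walk rate BB/(AB+BB), ignoring HBP and SF."""
    return ba * (1 - bb_rate) + bb_rate


def obp_range(mle_ba, raw_bb_rate, discounts=(0.9, 1.0)):
    """Return {d: OBP estimate} for each walk-rate discount d."""
    return {d: obp_from_rates(mle_ba, raw_bb_rate * d) for d in discounts}


# Hypothetical player: translated BA of .300 and a raw walk rate of .120.
estimates = obp_range(mle_ba=0.300, raw_bb_rate=0.120)
for d, obp in estimates.items():
    print(f"d={d:.2f}  OBP estimate={obp:.3f}")
```

Running the same player through several values of d shows how much of the final estimate actually hinges on the choice of walk discount.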
After several seasons in minor league baseball, Marsans and Almeida both debuted with the Cincinnati Reds on July 4, 1911. Marsans soon became a regular and a minor star for the Reds, while Almeida hit fairly well but didn't get much playing time. While with the Reds, 1911-14, Marsans appears to have been an impatient hitter, drawing 66 walks in 1,179 AB+BB, a rate of .056 compared to the NL rate (excluding pitchers) of .087. Almeida, also with the Reds from 1911-13, was closer to average, drawing 25 walks in 310 AB+BB, a rate of .081 compared with the league average rate of .089. (Their league average rates differ slightly because Almeida didn't play in 1914 and I weighted the annual league average rates by each player's AB+BB.)
For the Cuban data, in order to get a large enough sample I had to include data from as early as the fall of 1907. (Gary's website, agatetype.typepad.com, includes compilations of walks for regular Cuban League seasons through 1908-09, but after 1909 the data with walks are limited to a few series between visiting Negro league teams and Cuban teams. Figueredo, the other main source of Cuban data, doesn't include information on batter walks.) I analyzed walk rates for the following leagues and series: 1908 Cuban League season (winter of 1907-08), 1908-09 Cuban League season, and series against the Philadelphia Giants (fall 1907), Brooklyn Royal Giants (fall 1908), Leland Giants (fall 1910), Lincoln Giants (fall 1912), Lincoln Stars (fall 1914), and Indianapolis ABCs (fall 1915).
The strange thing is that the roles of Almeida and Marsans flip in the Cuban data. Almeida, who had a near-average walk rate in the majors, was impatient in the Cuban data. He drew 16 walks in 420 AB+BB, a rate of .038 compared to a league rate of .096. Marsans, on the other hand, was close to the league average in Cuba, drawing 35 walks in 474 AB+BB, a rate of .085 compared to the league rate of .096. So while the example of Marsans might provide support for the idea that walk rates dropped when facing major league pitching, Almeida is an obvious counterexample.
How about restricting it to the time period after they debuted in the majors (i.e., Cuban series held after 1911)? The sample sizes get uncomfortably small, but the same tendencies appear: Almeida drew 4 walks in 67 AB+BB, a rate of .060 compared to the series average of .113. Marsans drew 7 walks in 71 AB+BB, a rate of .099 compared to his series average of .118. Again, Marsans walked quite a bit more than his major league rate, while Almeida walked less.
The only conclusions I can draw from this little study are: (a) individual players make different adjustments when they move from one league to another, and (b) we would need a much larger sample than 2 players to accurately calculate the translations for walk rates from one league to another.
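The comparison in this little study can be reproduced with a few lines of Python. The walk counts, AB+BB totals, and league or series rates are the ones quoted above for the majors and for the Cuban series held after 1911; the function names and the "ratio of relative rates" summary are my own framing, not part of the original analysis.

```python
def walk_rate(bb, ab_plus_bb):
    """Walk rate as BB / (AB + BB)."""
    return bb / ab_plus_bb


def relative_rate(bb, ab_plus_bb, league_rate):
    """Player's walk rate divided by the league (or series) walk rate."""
    return walk_rate(bb, ab_plus_bb) / league_rate


# (walks, AB+BB, league or series walk rate), as quoted in the post.
majors = {"Marsans": (66, 1179, 0.087), "Almeida": (25, 310, 0.089)}
cuba_after_1911 = {"Marsans": (7, 71, 0.118), "Almeida": (4, 67, 0.113)}

for name in majors:
    ml_rel = relative_rate(*majors[name])
    cu_rel = relative_rate(*cuba_after_1911[name])
    # A ratio below 1 means the player walked relatively less often in the
    # majors than in Cuba; above 1 means relatively more often.
    print(name, round(ml_rel, 2), round(cu_rel, 2), round(ml_rel / cu_rel, 2))
```

With only two players, and with the two ratios falling on opposite sides of 1, the output mainly illustrates why no single translation factor can be read off this sample.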
Given the lack of information, I'll be happy to go along with Chris's recommendation to use a factor of .95 for walks. I will revise my MLE calculations for Lloyd and Hill and repost them.
It's interesting that Almeida and Marsans show such diametrically opposed trends, but it's too bad their careers don't give us any insight into the problem of the effects of league strength on walks.
When the HoF releases the full data from their study and gives us NeL league walk rates for the 1940s, we'll be able to get some answers, I guess, but perhaps not until then. (Unless independent researchers like Gary A., who release their findings publicly, complete some studies first, which is beginning to seem the more likely outcome.)
2. Mike Emeigh Posted: January 19, 2009 at 09:16 AM (#3055090)
> [John Murphy:] Martinez is most likely with MiL credit
I am of the opinion that Martinez doesn't deserve MiL credit, if you look at his career arc honestly and in context. In 1985 and 1986, he didn't hit especially well in the Southern League. In 1987, you need to consider that his numbers were posted (a) in the PCL and (b) in Calgary, one of the better parks for hitters in that league. I would argue that it was not until 1988 that he definitely established his credentials for a major league job, and he got one a year later, albeit as a backup with a .618 OPS.
47. Rocco's Not-so Malfunctioning Mitochondria Posted: January 20, 2009 at 04:29 PM (#3056382)
I'm with Mike here. Realistically, it wasn't the Mariners that held Martinez back (and even if it was, that's not REALLY the kind of rationale that merits giving minor league credit); rather, it was Edgar holding Edgar back. He was pretty mediocre in two seasons in AA, . . .
69. Bob "Jugement" Dernier Posted: January 21, 2009 at 10:43 AM (#3056935)
> He was pretty mediocre in two seasons in AA
That would be the factor that I think y'all should consider wrt Edgar and the minors. If you just extrapolate his MLB record backwards, à la Monte Irvin, Minnie Minoso, or someone like that, he will look great. But he hit .258 at age 22 in a full AA season, and then .264 at age 23. That can't add up to much if translated into major-league performance.
70. David Concepcion de la Desviacion Estandar (Dan R) Posted: January 21, 2009 at 10:50 AM (#3056949)
Bob "Jugement" Dernier: no one's suggesting Edgar should be credited for those seasons. The issue is whether he deserves it for 1987 and 1988, where his numbers would MLE to major league caliber.
71. Rocco's Not-so Malfunctioning Mitochondria Posted: January 21, 2009 at 11:15 AM (#3056971)
When I initially said that Edgar wasn't great in AA, it wasn't to say that he shouldn't get MLE for those years (of course he shouldn't), but that they would have given Mariners management good reason to be cautious about bringing him up even after mashing in AAA, considering that pretty much anyone who was a AAAA player could mash in Calgary. It was like Vegas or Albuquerque today: numbers from a park that hitter-friendly should be taken not with a grain, but with a pound of salt.
72. Chris Cobb Posted: January 21, 2009 at 11:35 AM (#3056984)
Responding to Bob "Jugement" Dernier: when the HoM looked at Irvin and Minoso, we (at least, most voters) did not extrapolate their MLB records backwards. Instead, we used translations of their Negro League and minor league play into a major league context. If Edgar Martinez is given credit for minor-league play, it would be on that basis as well (at least, for most voters).
"John Beckwith" includes
Gadfly #151-160 on Monte Irvin. Within this series Gadfly estimates and explains conversion rates 0.93 for batting and 0.87 for slugging, between the NNL and NL in Irvin's time. (Gadfly supports conversion rates as high as 0.95 and 0.90.)
"John Beckwith" also includes
- earlier on page two, chiefly by GaryA, some run scoring and "raw park factor" data on 1920s Negro Leagues
- earlier on page two, some biographical tidbits on Sam and John Beckwith (and a third?), Juan and Luis Padron (and a third), and so on
- on page one, explanation of Chris Cobb's conversion rates 0.87 for batting and about 0.82 for slugging, between some NeL and some ML. (Cobb supports batting conversion rate as low as 0.85 in page two response to sunnyday2 iirc)
- on page one, argument how demographics and economics bear on the expected numbers of great players in the Negro Leagues and majors (karlmagnus, jimd, others)
>>
74. jimd Posted: December 28, 2004 at 04:48 PM (#1043881)
Thought experiment: Let's assume that John McGraw was able to successfully integrate the majors around 1905, or, even better, that segregation never happened. Let's also assume (more controversially) that the percentage of black players mirrors their proportion in the general population as modified by geography. This is a null hypothesis, and yields an MLB that is about 15% black in the 1920's/1930's. What might this look like?
Our expectation is that the bottom 2-3 players at each position would be replaced by the top 2-3 Negro League players at that position. There could be a large variance in this; it's statistically reasonable that some position might have no black players, and another might have 7 (within 3 standard deviations), depending on the relative talent depth of MLB and the NeLs. Based on a typical team having 14 regulars (7 position players + 2 catchers + 5 pitchers), that's 224 regulars for 16 teams and 20-50 black regulars (again 3 standard deviations).
I have no idea where to set the replacement level for Negro League baseball relative to MLB. I have no idea whether the null hypothesis outlined above has any real merit. But it's a starting point.
<<
jimd, or anyone else who follows this,
What is the model here?
For example is there a single distribution of talent, uniform by race, whose upper tail (what size?) plays baseball in the major leagues?
Or does this concern major league players of two kinds, fixed in aggregate number, randomly allocated to particular fielding positions or roles?
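One possible reading of the thought experiment, offered only as a sketch and not necessarily jimd's own model, is a binomial one: each regular slot is filled by a black player independently with probability 0.15. Under that assumption, the means and three-standard-deviation ranges can be computed in Python as follows. The function name and constants are illustrative; the 16 teams, 14 regulars per team, and 15% share come from the quoted post.

```python
import math


def binomial_range(n, p, sds=3):
    """Mean and +/- `sds` standard-deviation range for a Binomial(n, p) count."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    low = max(0.0, mean - sds * sd)
    high = min(float(n), mean + sds * sd)
    return mean, low, high


TEAMS = 16
REGULARS_PER_TEAM = 14   # 7 position players + 2 catchers + 5 pitchers
SHARE = 0.15             # assumed share of black players (the null hypothesis)

# One starting job per team at a given position: 16 slots.
per_position = binomial_range(TEAMS, SHARE)
# All regular jobs in the majors: 16 * 14 = 224 slots.
all_regulars = binomial_range(TEAMS * REGULARS_PER_TEAM, SHARE)

print("per position (mean, low, high):", [round(x, 1) for x in per_position])
print("all regulars (mean, low, high):", [round(x, 1) for x in all_regulars])
```

Under this reading the count at a single position can plausibly run from none to nearly 7, which is consistent with the spread described in the post. The independence assumption is the weak point: it says nothing about how talent is distributed, which is exactly what the questions above are asking.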
Negro Leagues               1939-1947   0.90
Negro Leagues               1948-1950   0.87
Mexican League              1940-1948   0.90
Canadian Border League      1950        0.80 (Class C)
Canadian Provincial League  1948        0.87 (Independent)
Eastern League              1950        0.93 (Class A)
PCL                         1949-1951   0.93 (Class AAA)
International League        1947        0.93 (Class AAA)
Cuban Winter League         1938, 1940  0.94
New England League          1946        0.80 (Class B)
Venezuela                   1940        0.90
Are these discount rates correct? In all cases, I tried to use Chris Cobb’s most up-to-date BA conversions (as I understand it, Eric Chalek uses a runs converter, so I ignored those conversion rates). In some cases (the Eastern League, for instance), Chris simply used a 93% blanket rate for all minor leagues, which explains why I have an A league and a AAA league at the same rate (I think the Eastern League should be somewhat lower than 0.93, but I’m not sure what the exact rate should be). Also, I could not find a BA conversion for the 1946 New England League, but I did find Eric Chalek’s run conversion rate of 0.7, from which I guesstimated a BA conversion of 0.8.
Here are the league averages I used:
League averages:
Mexican League (from Eric Chalek’s compilations of Mexican League stats)
Year BA
1940 0.290
1941 0.288
1942 0.289
1943 0.273
1944 0.284
1945 0.280
1946 0.281
1947 0.278
1948 0.273
Negro Leagues (from the HOF averages)
Year BA
1939 0.260
1940 0.272
1941 0.256
1942 0.249
1943 0.269
1944 0.274
1945 0.276
1946 0.259
1947 0.273
1948 0.276
1949 0.268
Major and minor league averages were from Baseball-Reference.com, where available.
To find players who played in the PRWL and the Negro Leagues, I looked through the various player threads here at the HOM, the scans from Holway at the Yahoo site, BR Bullpen, and the Negro Leagues eMuseum. Here are the 50 player-seasons I used in this project (each year refers to the year in which the winter season began; for instance, Josh Gibson is marked as 1939-1940, meaning that I considered his 1939/1940 and 1940/1941 PRWL seasons for this project):
Josh Gibson 1939-1940
Willard Brown 1941, 1946-1949
Ray Dandridge 1941
Bus Clarkson 1946, 1948, 1949
Pee Wee Butts 1944, 1949
Piper Davis 1947, 1949
Monte Irvin 1940, 1941, 1946
Luis Marquez 1946-1949
Artie Wilson 1947-1950
Marvin Williams 1944
Roy Campanella 1940, 1941, 1944, 1946
Francisco Coimbre 1940, 1943, 1944
Silvio Garcia 1939
Tetelo Vargas 1943
Buck Leonard 1940
Willie Wells 1941
Sam Bankhead 1944, 1945
Leon Day 1940, 1941
Quincy Trouppe 1944
Bill Wright 1941
Sam Jethroe 1944, 1946
Johnny Davis 1945
Larry Doby 1946
Luke Easter 1948
Wilmer Fields 1947
After aggregating everything together, weighting each player-season equally, I came up with a PRWL conversion rate, for 1939-1950, of 0.897. This is a somewhat preliminary finding, dependent upon the conversion rates, though I think that the conversion rates I’ve used are fairly accurate. Overall, my findings corroborate Eric Chalek’s earlier conversion rate of 0.90 for the PRWL (based off Gadfly’s advice). However, I’m going to hold off doing any MLEs until someone has checked my conversion rates.
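For anyone checking the approach, here is one way an equally weighted aggregation of this kind could be set up in Python. It is a guess at the general shape of the calculation, not a copy of the spreadsheet: the formula (comparing BA relative to league average in both leagues), the function name, and every number in `seasons` except the 0.90 conversion rates and the .272 and .288 league averages are hypothetical.

```python
from statistics import mean


def implied_conversion(prwl_ba, prwl_lg_ba, other_ba, other_lg_ba, other_conv):
    """PRWL conversion rate implied by one player-season.

    Solves (prwl_ba / prwl_lg_ba) * c == (other_ba / other_lg_ba) * other_conv
    for c, so both batting averages are taken relative to league average.
    """
    return other_conv * (other_ba / other_lg_ba) / (prwl_ba / prwl_lg_ba)


# Two hypothetical player-seasons. The PRWL figures and player BAs are invented;
# .272 (Negro Leagues, 1940) and .288 (Mexican League, 1941) are league
# averages from the tables above, each paired with a 0.90 conversion rate.
seasons = [
    dict(prwl_ba=0.330, prwl_lg_ba=0.270,
         other_ba=0.340, other_lg_ba=0.272, other_conv=0.90),
    dict(prwl_ba=0.300, prwl_lg_ba=0.270,
         other_ba=0.288, other_lg_ba=0.288, other_conv=0.90),
]

rates = [implied_conversion(**s) for s in seasons]
prwl_rate = mean(rates)  # each player-season weighted equally
print([round(r, 3) for r in rates], round(prwl_rate, 3))
```

Because each implied rate is proportional to the other league's conversion rate, any error in those input rates passes straight through to the PRWL estimate, which is why the request to have them checked matters.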
Unfortunately, I have no data from either the 1938/1939 season or the 1942/1943 season. This is a fairly big deal in the case of Perucho Cepeda, whose case depends on a .465 PRWL batting average in 1938/1939. The lack of foreign players in 1938/1939 can probably be attributed to the league’s newness; as a result, the proper conversion rate should be somewhat less than 0.90, maybe 0.80 or 0.85 (from 1939-1941, with 15 switchers, I calculate a league conversion rate of 0.91, indicating that the PRWL quickly became a high-quality league). In 1942/1943, according to Wikipedia, “World War II affected the league directly, reducing the 1942-43 season's length with only four active teams. This amount of teams continued until 1946, while the rule that allowed the participation of three imported players per team, was suspended from 1942 to 1944.” I’m not entirely sure what this means: were teams prohibited from having any imported players, or were they allowed to have as many as they liked? Regardless, I think the absence of league-switchers in 1942/1943 (and the relative paucity in 1943/1944) reflects the impact of the war, and probably indicates a decline in league quality in these years.
Here's a link to my conversion rate spreadsheet, if anyone’s curious.
