Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

201. Dr. Chaleeko
Posted: May 13, 2006 at 09:49 PM (#2018385)

test

202. Paul Wendt
Posted: May 15, 2006 at 01:33 AM (#2020221)

> Excellent question. Are there going to be MLEs for players
> who played on the Cuban national team, then defected?

I'll say yes, but I'm not the final authority on matters such as this.

Right. I suppose only the author can be an authority here.

203. Chris Cobb
Posted: August 14, 2008 at 12:58 AM (#2902199)

Since new work is being done on MLEs, I thought it would be worthwhile to revive discussion of some of the finer points, and this thread seemed the right place.

I have had offense MLEs finished for Oscar Charleston using my system as I had left it a couple of years back, but I have been hoping to improve my system by handling walks and stolen bases more systematically and appropriately, now that we have fuller data on both, at least for the players for whom the HoF has released their data.

This post is going to focus on walks.

When Brent released his MLEs for John Henry Lloyd, this was his description of his system (my emphases added):

The approach I’m taking in deriving MLEs is a simplification of Chris Cobb’s method (and is the same method that I previously used for Carlos Morán’s MLEs). Here’s a summary of the method: (1) I restrict my analysis to data for which league averages are available or can be calculated. (2) I build my estimates from three rates: the walk rate BB/(AB+BB), the batting average, and isolated power. (3) I adjust these rates for quality of play, multiplying BA by 0.9 and ISO and walk rates by .81. (No quality-of-play adjustment is made for games played against major league teams.) (4) I adjust the BA and ISO statistics to a National League context by multiplying by the ratio of the two league averages for the period. (However, because the HoF study doesn’t report league averages for walk rates or OBP, I decided not to make any contextual adjustments to walk rates. Gary’s data for 1921 and ’22 show NeLg walk rates similar to those in the NL.) (5) I report Lloyd’s MLEs as average rates for four periods: 1907–18 (ages 23 to 34), 1919–23 (ages 35 to 39), 1924–28 (ages 40 to 44), and 1929–30 (ages 45 to 46), and also as career rates.
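As a rough sketch, the five steps above reduce to a handful of multiplications. The 0.9 and .81 factors are from the quote; the sample rates and league ratios below are invented for illustration:

```python
# Sketch of the quoted five-step method. Quality factors (step 3) are
# 0.9 for BA and 0.81 for ISO and walks; league ratios (step 4) are
# NL average / NeL average for the period. Sample inputs are invented.
def mle_rates(ba, iso, bb_rate, lg_ratio_ba=1.0, lg_ratio_iso=1.0,
              q_ba=0.9, q_power=0.81):
    mle_ba = ba * q_ba * lg_ratio_ba        # steps 3 + 4 for batting average
    mle_iso = iso * q_power * lg_ratio_iso  # steps 3 + 4 for isolated power
    mle_bb = bb_rate * q_power              # step 3 only: no league BB data
    return mle_ba, mle_iso, mle_bb

ba, iso, bb = mle_rates(0.350, 0.150, 0.100,
                        lg_ratio_ba=0.95, lg_ratio_iso=0.90)
```

So a .350 hitter in this sketch translates to roughly a .299 BA, .109 ISO, and a walk rate of .081.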

This system is indeed what I use, except (1) I go season by season and use regression and (2) I haven't been adjusting walk rates by .81. I never developed a consistent BB conversion factor because I never did conversions for players who had a full career of walk data available.

So the question I have is: is .81 an appropriate conversion factor, from a theoretical perspective? That is, should walk rates vary as the square of hit rates, as isolated power does? Should we expect any sort of relationship between walk rates and other offensive statistics?

Brent doesn't give his rationale for choosing that factor (or I failed to notice it), so I am not sure his choice is justified. If anyone has any insights into this question, I'd be glad to read them!

I don't have a theoretical answer to this question, but I thought I might be able to get an empirical view by looking at walks in the same way I looked at batting average in the study I did when I first began my MLE project: I looked at the walk rates of players who went from the NeL to the majors. Walk data is only available for HoFers or short-listed HoF candidates, so I was only able to look at Roy Campanella, Larry Doby, Monte Irvin, and Minnie Minoso. (Jackie Robinson's NeL stint is too short in the HoF data to be meaningful.) When I compared their NeL walk rates to their ML walk rates, with appropriate adjustments for aging patterns (using Tangotiger's figures for this), I found that these players' NeL walk rates were most predictive of future ML walk rates when a conversion factor of about 1.0 was applied. (Without showing all the gritty details, Campy's and Doby's results suggested conversion factors of about .9, while Irvin's and Minoso's suggested conversion factors of about 1.1. I can post the gritty details if there is interest.) The walk rates of the top NeL power hitters with reputations for plate discipline--Buck Leonard and Josh Gibson--are comparable to the walk rates of the top hitters with plate discipline in the majors, so there's no obvious need for a conversion factor far removed from 1.0 there, either.
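The comparison described here boils down to a small calculation. The stat line below is invented, and a real version would fold in a Tangotiger-style aging adjustment rather than defaulting it to 1.0:

```python
# Implied walk-rate conversion factor for one NeL-to-ML switcher: the
# factor f such that f * (aging-adjusted NeL rate) reproduces the
# observed ML rate. Numbers are hypothetical, not any real player's.
def implied_factor(nel_bb, nel_pa, ml_bb, ml_pa, aging_adj=1.0):
    nel_rate = nel_bb / nel_pa   # BB / (AB + BB) in the Negro Leagues
    ml_rate = ml_bb / ml_pa      # BB / (AB + BB) in the majors
    return ml_rate / (nel_rate * aging_adj)

f = implied_factor(nel_bb=60, nel_pa=500, ml_bb=66, ml_pa=600)
```

Averaging such factors across the four players is what yields the "about 1.0" conclusion above.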

We lack, of course, the league walk rate data that would enable us to interpret systematically what this means. It could be that low-walk conditions in the NeL offset the weaker competition levels, so that, overall, earning walks was about as difficult in the NeL as in the majors. It could be that walk rates are not affected by competition levels in the same way that hits and hitting for power are.

So, the two points of data on walk rates that I have are (1) irrespective of competition levels, the conversion factor for individual NeL players in the 1940s looks to be about 1 and (2) in the early 1920s, NeL walk rates matched major league walk rates closely. The big question, then, is what happened to NeL league walk rates between the early 1920s and the early 1940s relative to the majors?

It might be that there is suggestive data on the effect of competition levels on walk rates hidden somewhere in CWL data. If there is CWL seasonal data that includes walks for years in which players participated who also played in the major leagues (e.g. Armando Marsans), comparing the walk rates of those players relative to their leagues might reveal the effects of competition levels on walk rates. I am not aware that any seasonal data of this kind is available for seasons with major-league cross-overs. If anyone knows where such data could be found, please let me know! (Whoops, I just realized that the 1908-09 CWL data Gary A. has compiled has Marsans himself! I'll have a look at him and see what I find. So, if _more_ data of this kind can be found, let me know.)

In the absence of theoretical analysis or empirical data to resolve my uncertainties about walk rates, I plan to handle walk rates as follows: (1) in seasons before 1940 for which no NeL league data is available, I will use a conversion factor of .95 for walk rates, just to hold down outliers a bit in seasons when walk rates might have spiked, (2) in seasons for which I have NeL league data, I will adjust for context but not apply a competition conversion factor, and (3) I will regress walk rates in the same way I regress BA and slugging--to a five-year average, centered on the season in question. (Dan R's study of war credit suggests that more sophisticated regression studies using major-league data could produce a more precise regression formula, but since his findings there indicate that taking an average of the four surrounding seasons leads to similar results, I have some confidence that my simple regression formula is basically sound.)
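Point (3) can be sketched as a centered moving average. The retained weight w is a made-up parameter here, since the post doesn't specify how heavily the season itself counts against the five-year mean:

```python
# Regress a season's walk rate toward the five-year average centered on
# that season. w is the share retained by the season itself (hypothetical
# choice); w=0.0 would use the five-year mean outright.
def regress_to_centered_mean(rates, i, window=5, w=0.3):
    half = window // 2
    lo, hi = max(0, i - half), min(len(rates), i + half + 1)
    center = sum(rates[lo:hi]) / (hi - lo)   # centered five-year average
    return w * rates[i] + (1 - w) * center

rates = [0.08, 0.10, 0.16, 0.09, 0.11]       # hypothetical seasonal BB rates
smoothed = regress_to_centered_mean(rates, 2)
```

The spike season (.160) gets pulled most of the way back toward the surrounding-years average (.108), which is exactly the outlier-damping goal stated above.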

This approach is different from Brent's, so our results will differ with respect to walks, though our systems are otherwise the same. Without a clear basis for applying a 20% discount to walk rates for competition level, I can't see my way to doing it. My limited review of CWL data suggests that it was a high walk rate league, so if one is working with that data, a contextual discount certainly appears to be appropriate in many cases, but, based on what I know at this moment, I would want to treat that purely as a contextual discount, not a competitive one.

Your thoughts?

204. Brent
Posted: August 14, 2008 at 02:39 AM (#2902408)

Brent doesn't give his rationale for choosing that factor (or I failed to notice it), so I am not sure his choice is justified.

I didn't do any kind of empirical study to select the .81 factor for walks, so I agree that the choice may not be justified. There are two reasons that I used it:

(1) Some of you may recall that several years ago I calculated MLEs for minor league seasons of HoM candidates like Buzz Arlett, Earl Averill, and Gavy Cravath, using the MLE method published by Bill James for Class AAA translations in his 1985 Baseball Abstract. I eventually realized that the Bill James method was essentially equivalent to multiplying batting average by M and isolated power and walk rates by M^2, where M^2 = .82. Since Chris was doing about the same thing for Negro league players with M=.9 and M^2 = .81, it seemed like a natural extension of the Bill James methodology.
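The equivalence being described: if BA is scaled by M, the James-style translation scales ISO and walks by M^2. A minimal check with Chris's NeL factors (sample rates invented):

```python
import math

# James-style translation: BA scales by M, ISO and walk rate by M^2.
M2 = 0.81                 # the ISO/walk factor used for NeL translations
M = math.sqrt(M2)         # 0.9, the matching BA factor

def james_translate(ba, iso, bb_rate, m=M):
    return ba * m, iso * m * m, bb_rate * m * m

ba2, iso2, bb2 = james_translate(0.300, 0.200, 0.100)
```

With M = .9, a .300/.200/.100 line becomes roughly .270/.162/.081, which is where the .81 walk factor comes from.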

(2) I started doing translations of Cuban League players with Carlos Morán, and without reducing his walk rate by a factor of .81, none of you would have believed the resulting MLEs. (None of you believed them anyway! :) At any rate, it seemed reasonable in his case to lower the walk rates to account for differences in quality of pitching.

Chris's suggestion is a good one--Gary has posted enough data on Marsans and Almeida that we could take a look at how their Cuban walk rates compare with their major league rates. I'll see what I can put together.

205. Paul Wendt
Posted: August 14, 2008 at 03:25 AM (#2902446)

In the absence of theoretical analysis or empirical data to resolve my uncertainties about walk rates, I plan to handle walk rates as follows: (1) in seasons before 1940 for which no NeL league data is available, I will use a conversion factor of .95 for walk rates, . . .

Your thoughts?

Do you generate MLEs in such a way that you can handle this discount as a parameter? That is, after some set-up stage, "easily" generate low and high estimates with d=0.9 and d=1.0.

206. Chris Cobb
Posted: August 14, 2008 at 12:06 PM (#2902664)

Do you generate MLEs in such a way that you can handle this discount as a parameter? That is, after some set-up stage, "easily" generate low and high estimates with d=0.9 and d=1.0.

I do the MLEs with spreadsheets, so, yes, I could do this sort of thing quite easily.

207. Brent
Posted: August 18, 2008 at 02:45 AM (#2906674)

I've finished looking at data on Marsans and Almeida to try to address the question posed by Chris Cobb--whether a batter's walk rate in the majors tended to be lower than his walk rate in the Cuban League, when he played in both leagues. (Because the Cuban League was heavily populated with Negro league players, the answer to this question also has obvious relevance for Negro league translations.) Unfortunately, it looks like the analysis of Marsans and Almeida raises more questions than it answers.

After several seasons in minor league baseball, Marsans and Almeida both debuted with the Cincinnati Reds on July 4, 1911. Marsans soon became a regular and a minor star for the Reds, while Almeida hit fairly well but didn't get much playing time. While with the Reds, 1911-14, Marsans appears to have been an impatient hitter, drawing 66 walks in 1,179 AB+BB, a rate of .056 compared to the NL rate (excluding pitchers) of .087. Almeida, also with the Reds from 1911-13, was closer to average, drawing 25 walks in 310 AB+BB, a rate of .081 compared with the league average rate of .089. (Their league average rates differ slightly because Almeida didn't play in 1914 and I weighted the annual league average rates by each player's AB+BB.)
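The rates in this paragraph are walks per plate appearance; as a sanity check, using the post's own Reds totals:

```python
# Walk rate as used throughout the post: BB / (AB + BB).
def walk_rate(bb, ab_plus_bb):
    return bb / ab_plus_bb

marsans_nl = walk_rate(66, 1179)   # Marsans with the Reds, 1911-14
almeida_nl = walk_rate(25, 310)    # Almeida with the Reds, 1911-13
```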

For the Cuban data, in order to get a large enough sample I had to include data from as early as the fall of 1907. (Gary's website, agatetype.typepad.com, includes compilations of walks for regular Cuban League seasons through 1908-09, but after 1909 the data with walks are limited to a few series between visiting Negro league teams and Cuban teams. Figueredo, the other main source of Cuban data, doesn't include information on batter walks.) I analyzed walk rates for the following leagues and series: 1908 Cuban League season (winter of 1907-08), 1908-09 Cuban League season, and series against the Philadelphia Giants (fall 1907), Brooklyn Royal Giants (fall 1908), Leland Giants (fall 1910), Lincoln Giants (fall 1912), Lincoln Stars (fall 1914), and Indianapolis ABCs (fall 1915).

The strange thing is that the roles of Almeida and Marsans flip in the Cuban data. Almeida, who had a near-average walk rate in the majors, was impatient in the Cuban data. He drew 16 walks in 420 AB+BB, a rate of .038 compared to a league rate of .096. Marsans, on the other hand, was close to the league average in Cuba, drawing 35 walks in 474 AB+BB, a rate of .085 compared to the league rate of .096. So while the example of Marsans might provide support for the idea that walk rates dropped when facing major league pitching, Almeida is an obvious counterexample.

How about restricting it to the time period after they debuted in the majors (i.e., Cuban series held after 1911)? The sample sizes get uncomfortably small, but the same tendencies appear: Almeida drew 4 walks in 67 AB+BB, a rate of .060 compared to the series average of .113. Marsans drew 7 walks in 71 AB+BB, a rate of .099 compared to his series average of .118. Again, Marsans walked quite a bit more than his major league rate, while Almeida walked less.

The only conclusions I can draw from this little study are: (a) individual players make different adjustments when they move from one league to another, and (b) we would need a much larger sample than 2 players to accurately calculate the translations for walk rates from one league to another.

Given the lack of information, I'll be happy to go along with Chris's recommendation to use a factor of .95 for walks. I will revise my MLE calculations for Lloyd and Hill and re-post them.

208. Brent
Posted: August 18, 2008 at 02:54 AM (#2906678)

Re-reading Chris Cobb's comments, I should clarify a point about how I calculated MLEs for Hill and Lloyd. Although I didn't adjust the Negro league walk rates for the difference in context with major league walk rates (since data on average Negro league walk rates are not available for most years), I did adjust the Cuban League data that I used for context. As you can see from my last post, deadball era Cuban League walk rates were a little higher than major league walk rates, and I adjusted for those differences. (More notable is the fact that during 1900-1909, Cuban League batting averages and isolated power were much lower than the major league rates.)

209. Chris Cobb
Posted: August 18, 2008 at 12:14 PM (#2906759)

Thanks, Brent, for looking into this!

It's interesting that Almeida and Marsans show such diametrically opposed trends, but it's too bad their careers don't give us any insight into the problem of the effects of league strength on walks.

When the HoF releases the full data from their study and gives us NeL league walk rates for the 1940s, we'll be able to get some answers, I guess, but perhaps not until then. (Unless independent researchers like Gary A., who release their findings publicly, complete some studies first, which is beginning to seem the more likely outcome.)

210. Paul Wendt
Posted: January 21, 2009 at 09:34 PM (#3057347)

[copied from 2010 Ballot Discussion, chiefly #69-72 posted this morning. I have jumped back to #2 and #47 in order to set the stage but I have not gleaned intervening material on this theme, Edgar Martinez and "minor league credit", chiefly for 1987-88.]

2. Mike Emeigh Posted: January 19, 2009 at 09:16 AM (#3055090)
> [John Murphy:] Martinez is most likely with MiL credit

I am of the opinion that Martinez doesn't deserve MiL credit, if you look at his career arc honestly and in context. In 1985 and 1986, he didn't hit especially well in the Southern League. In 1987, you need to consider that his numbers were posted (a) in the PCL and (b) in Calgary, one of the better parks for hitters in that league. I would argue that it was not until 1988 that he definitely established his credentials for a major league job - and he got one a year later, albeit as a backup with a .618 OPS.

47. Rocco's Not-so Malfunctioning Mitochondria Posted: January 20, 2009 at 04:29 PM (#3056382)
I'm with Mike here. Realistically, it wasn't the Mariners that held Martinez back (and even if it was, that's not REALLY the kind of rationale that merits giving minor league credit); rather, it was Edgar holding Edgar back. He was pretty mediocre in two seasons in AA, . . .

69. Bob "Jugement" Dernier Posted: January 21, 2009 at 10:43 AM (#3056935)
> He was pretty mediocre in two seasons in AA

That would be the factor that I think y'all should consider wrt Edgar and the minors. If you just extrapolate his MLB record backwards, à la Monte Irvin, Minnie Minoso, or someone like that, he will look great. But he hit .258 at age 22 in a full AA season, and then .264 at age 23. That can't add up to much if translated into major-league performance.

70. David Concepcion de la Desviacion Estandar (Dan R) Posted: January 21, 2009 at 10:50 AM (#3056949)
Bob "Jugement" Dernier--no one's suggesting Edgar should be credited for those seasons. The issue is whether he deserves it for 1987 and 1988, where his numbers would MLE to major league caliber.

71. Rocco's Not-so Malfunctioning Mitochondria Posted: January 21, 2009 at 11:15 AM (#3056971)
when I initially said that Edgar wasn't great in AA, it wasn't to say that he shouldn't get MLE for those years (of course he shouldn't), but that they would have given Mariners management good reason to be cautious about bringing him up even after mashing in AAA, considering that pretty much anyone who was a AAAA player could mash in Calgary. It was like Vegas or Albuquerque today - numbers from a park that hitter-friendly should be taken not with a grain, but with a pound of salt.

72. Chris Cobb Posted: January 21, 2009 at 11:35 AM (#3056984)
Responding to Bob "Jugement" Dernier -- when the HoM looked at Irvin and Minoso, we (at least, most voters) did not extrapolate their MLB records backwards. Instead, we used translations of their Negro League and minor league play into a major league context. If Edgar Martinez is given credit for minor-league play, it would be on that basis as well (at least, for most voters).

211. Paul Wendt
Posted: April 19, 2009 at 07:50 PM (#3144771)

(Some of these references probably appear on page one above.)

"John Beckwith" includes Gadfly #151-160 on Monte Irvin. Within this series Gadfly estimates and explains conversion rates 0.93 for batting and 0.87 for slugging, between the NNL and NL in Irvin's time. (Gadfly supports conversion rates as high as 0.95 and 0.90)

"John Beckwith" also includes
- earlier on page two, chiefly by GaryA, some run scoring and "raw park factor" data on 1920s Negro Leagues
- earlier on page two, some biographical tidbits on Sam and John Beckwith (and a third?), Juan and Luis Padron (and a third), and so on
- on page one, explanation of Chris Cobb's conversion rates 0.87 for batting and about 0.82 for slugging, between some NeL and some ML. (Cobb supports batting conversion rate as low as 0.85 --in page two response to sunnyday2 iirc)
- on page one, argument how demographics and economics bear on the expected numbers of great players in the Negro Leagues and majors (karlmagnus, jimd, others)

212. Paul Wendt
Posted: April 20, 2009 at 04:02 PM (#3145498)

jimd from "John Beckwith" page one
>>
74. jimd Posted: December 28, 2004 at 04:48 PM (#1043881)
Thought experiment: Let's assume that John McGraw was able to successfully integrate the majors around 1905, or, even better, that segregation never happened. Let's also assume (more controversially) that the percentage of black players mirrors their proportion in the general population as modified by geography. This is a null hypothesis, and yields an MLB that is about 15% black in the 1920's/1930's. What might this look like?

Our expectation is that the bottom 2-3 players at each position would be replaced by the top 2-3 Negro League players at that position. There could be a large variance in this; it's statistically reasonable that some position might have no black players, and another might have 7 (within 3 standard deviations), depending on the relative talent depth of MLB and the NeLs. Based on a typical team having 14 regulars (7 position players + 2 catchers + 5 pitchers), that's 224 regulars for 16 teams and 20-50 black regulars (again 3 standard deviations).

I have no idea where to set the replacement level for Negro League baseball relative to MLB. I have no idea whether the null hypothesis outlined above has any real merit. But it's a starting point.
<<

jimd, or anyone else who follows this,
What is the model here?
For example is there a single distribution of talent, uniform by race, whose upper tail (what size?) plays baseball in the major leagues?

Or does this concern major league players of two kinds, fixed in aggregate number, randomly allocated to particular fielding positions or roles?
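One way to read jimd's null hypothesis is as a simple binomial model: each of the 224 regular jobs is filled by a black player independently with probability 0.15. A quick check of his numbers under that assumption (the exact 3-SD band comes out to roughly 18-50, close to his 20-50):

```python
import math

# jimd's null hypothesis as a binomial: 16 teams x 14 regulars, each
# black with probability 0.15, independently of position and team.
teams, regulars_per_team, p = 16, 14, 0.15
n = teams * regulars_per_team            # 224 regulars league-wide

mean = n * p                             # ~34 expected black regulars
sd = math.sqrt(n * p * (1 - p))
band = (mean - 3 * sd, mean + 3 * sd)    # jimd's "3 standard deviations"

# Per position: one regular per team, so n = 16 per position.
pos_mean = teams * p                     # 2.4 per position
pos_hi = pos_mean + 3 * math.sqrt(teams * p * (1 - p))  # ~7 at the high end
```

The per-position band running from 0 to about 7 is exactly the "some position might have no black players, and another might have 7" claim in the quoted post.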

213. Alex King
Posted: February 16, 2011 at 07:50 AM (#3751429)

Recently I’ve been investigating the Puerto Rican Winter League MLE conversion rates, hoping to shine some light on stars such as Perucho Cepeda, Pancho Coimbre, and Tetelo Vargas. To construct a batting average conversion rate (similar to the ones used by Chris Cobb), I used chaining: first I compared each player’s PRWL BA to his batting averages in the surrounding summer seasons, then I multiplied by those leagues’ conversion rates, and finally I adjusted for the differences in league contexts. I considered 50 player-seasons from 1939/1940 to 1950/1951 (I applied a cutoff at 1950 based on Gadfly’s comment that PRWL quality dropped precipitously in the early 50’s, though the 1950 cutoff feels increasingly arbitrary to me). I used the following MLE discount schedule for league-neutralized BA:

Negro Leagues 1939-1947 0.90
Negro Leagues 1948-1950 0.87
Mexican League 1940-1948 0.90
Canadian Border League 1950 0.80 (Class C)
Canadian Provincial League 1948 0.87 (Independent)
Eastern League 1950 0.93 (Class A)
PCL 1949-1951 0.93 (Class AAA)
International League 1947 0.93 (Class AAA)
Cuban Winter League 1938, 1940 0.94
New England League 1946 0.80 (Class B)
Venezuela 1940 0.90

Are these discount rates correct? In all cases, I tried to use Chris Cobb’s most up-to-date BA conversions (as I understand it, Eric Chalek uses a runs converter, so I ignored those conversion rates). In some cases (the Eastern League, for instance), Chris simply used a 93% blanket rate for all minor leagues, which explains why I have an A league and a AAA league at the same rate (I think the Eastern League should be somewhat lower than 0.93, but I’m not sure what the exact rate should be). Also, I could not find a BA conversion for the 1946 New England League, but I did find Eric Chalek’s run conversion rate of 0.7, from which I guesstimated a BA conversion of 0.8.
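The chaining step can be sketched for a single hypothetical player-season; all numbers below are invented, and the real calculation averages 50 such seasons:

```python
# One link of the chain: compare a player's PRWL BA to his summer-league
# BA, apply the summer league's MLE discount, and neutralize both BAs by
# their league averages. Returns the PRWL conversion factor implied by
# this one player-season. All inputs here are illustrative only.
def chained_prwl_factor(prwl_ba, summer_ba, summer_discount,
                        prwl_lg_ba, summer_lg_ba):
    prwl_rel = prwl_ba / prwl_lg_ba          # context-adjusted PRWL BA
    summer_rel = summer_ba / summer_lg_ba    # context-adjusted summer BA
    return (summer_rel * summer_discount) / prwl_rel

# e.g. a player hitting .320 in the PRWL and .300 in a 0.90-discount
# summer league, with league averages of .280 and .274 respectively:
f = chained_prwl_factor(0.320, 0.300, 0.90, 0.280, 0.274)
```

Averaging these implied factors across all 50 player-seasons, weighted equally, is what produces the 0.897 figure reported below.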

Here are the league averages I used:

League averages:
Mexican League (from Eric Chalek’s compilations of Mexican League stats)
Year BA
1940 0.290
1941 0.288
1942 0.289
1943 0.273
1944 0.284
1945 0.280
1946 0.281
1947 0.278
1948 0.273

Negro Leagues (from the HOF averages)
Year BA
1939 0.260
1940 0.272
1941 0.256
1942 0.249
1943 0.269
1944 0.274
1945 0.276
1946 0.259
1947 0.273
1948 0.276
1949 0.268

Major and minor league averages were from Baseball-reference.com, where available.

To find players who played in the PRWL and the Negro Leagues, I looked through the various player threads here at the HOM, the scans from Holway at the Yahoo site, BR Bullpen, and the Negro Leagues eMuseum. Here are the 50 player-seasons I used in this project (each year refers to the year in which the winter season began; for instance, Josh Gibson is marked as 1939-1940, meaning that I considered his 1939/1940 and 1940/1941 PRWL seasons for this project)

Josh Gibson 1939-1940
Willard Brown 1941, 1946-1949
Ray Dandridge 1941
Bus Clarkson 1946, 1948, 1949
Pee Wee Butts 1944, 1949
Piper Davis 1947, 1949
Monte Irvin 1940, 1941, 1946
Luis Marquez 1946-1949
Artie Wilson 1947-1950
Marvin Williams 1944
Roy Campanella 1940, 1941, 1944, 1946
Francisco Coimbre 1940, 1943, 1944
Silvio Garcia 1939
Tetelo Vargas 1943
Buck Leonard 1940
Willie Wells 1941
Sam Bankhead 1944, 1945
Leon Day 1940, 1941
Quincy Trouppe 1944
Bill Wright 1941
Sam Jethroe 1944, 1946
Johnny Davis 1945
Larry Doby 1946
Luke Easter 1948
Wilmer Fields 1947

After aggregating everything together, weighting each player-season equally, I came up with a PRWL conversion rate, for 1939-1950, of 0.897. This is a somewhat preliminary finding, dependent upon the conversion rates, though I think that the conversion rates I’ve used are fairly accurate. Overall, my findings corroborate Eric Chalek’s earlier conversion rate of 0.90 for the PRWL (based off Gadfly’s advice). However, I’m going to hold off doing any MLEs until someone has checked my conversion rates.

Unfortunately, I have no data from either the 1938/1939 season or the 1942/1943 season. This is a fairly big deal in the case of Perucho Cepeda, whose case depends on a .465 PRWL batting average in 1938/1939. The lack of foreign players in 1938/1939 can probably be attributed to the league’s newness; as a result, the proper conversion rate should be somewhat less than 0.90, maybe 0.80 or 0.85 (from 1939-1941, with 15 switchers, I calculate a league conversion rate of 0.91, indicating that the PRWL quickly became a high-quality league). In 1942/1943, according to Wikipedia, “World War II affected the league directly, reducing the 1942-43 season's length with only four active teams. This amount of teams continued until 1946, while the rule that allowed the participation of three imported players per team, was suspended from 1942 to 1944.” I’m not entirely sure what this means—were teams prohibited from having any imported players, or were they allowed to have as many as they liked? Regardless, I think the absence of league-switchers in 1942/1943 (and the relative paucity in 1943/1944) reflects the impact of the war, and probably indicates a decline in league quality in these years.

Here's a link to my conversion rate spreadsheet, if anyone’s curious.

## Reader Comments and Retorts

Go to end of page

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.3> Excellent question. Are there going to be MLEs for players

> who played on the Cuban national team, then defected?

I'll say yes, but I'm not the final authority on matters such as this.

Right. I suppose only the author can be an authority here.

I have had offense MLEs finished for Oscar Charleston using my system as I had left it a couple of years back, but I have been hoping to improve my system by handling walks and stolen bases more systematically and appropriately, now that we have fuller data on both, at least for the players for whom the HoF has released their data.

This post is going to focus on walks.

When Brent released his MLEs for John Henry Lloyd, this was his description of his system {my emphases added):

The approach I’m taking in deriving MLEs is a simplification of Chris Cobb’s method (and is the same method that I previously used for Carlos Morán’s MLEs). Here’s a summary of the method: (1) I restrict my analysis to data for which league averages are available or can be calculated. (2) I build my estimates from three rates: the walk rate (BB+AB)/AB, the batting average, and isolated power. (3)I adjust these rates for quality of play, multiplying BA by 0.9 and ISO and walk rates by .81. (No quality-of-play adjustment is made for games played against major league teams.) (4) I adjust the BA and ISO statistics to a National League context by multiplying by the ratio of the two league averages for the period. (However, because the HoF study doesn’t report league averages for walk rates or OBP, I decided not to make any contextual adjustments to walk rates. Gary’s data for 1921 and ’22 show NeLg walk rates similar to those in the NL.)(5) I report Lloyd’s MLEs as average rates for four periods: 1907–18 (ages 23 to 34), 1919–23 (ages 35 to 39), 1924–28 (ages 40 to 44), and 1929–30 (ages 45 to 46), and also as career rates.This system is indeed what I use, except (1) I go season by season and use regression and (2) I haven't been adjusting walk rates by .81. I never developed a consistent BB conversion factor because I never did conversions for players who had a full career of walk data available.

So the question I have is: is .81 an appropriate conversion factor, from a theoretical perspective? That is, should walk rates vary as the square of hit rates, as isolated power does? Should we expect any sort of relationship between walk rates and other offensive statistics?

Brent doesn't give his rationale for choosing that factor (or I failed to notice it), so I am not sure his choice is justified. If anyone has any insights into this question, I'd be glad to read them!

I don't have a theoretical answer to this question, but I thought I might be able to get an empirical view by looking at walks in the same way I looked at batting average in the study I did when I first began my MLE project: I looked at the walk rates of players who went from the NeL to the majors. Walk data is only available for HoFers or short-listed HoF candidates, so I was only able to look at Roy Campanella, Larry Doby, Monte Irvin, and Minnie Minoso. (Jackie Robinson's NeL stint is too short in the HoF data to be meaningful.) When I compared their NeL walk rates to their ML walk rates, with appropriate adjustments for aging patterns (using Tangotiger's figures for this), I found that these players' NeL walk rates were most predictive of future ML walk rates when a conversion factor of about 1.0 was applied. (Without showing all the gritty details, Campy's and Doby's results suggested conversion factors of about .9, while Irvin's and Minoso's suggested conversion factors of about 1.1. I can post the gritty details if there is interest.) The walk rates of the top NeL power hitters with reputations for plate discipline--Buck Leonard and Josh Gibson, are comparable to the walk rates of the top hitters with plate discipline in the majors, so there's no obvious need for a conversion factor far removed from 1.0 there, either.

We lack, of course, the league walk rate data that would enable us to interpret systematically what this means. It could be that low-walk conditions in the NeL offset the weaker competition levels, so that, overall, earning walks was about as difficult in the NeL as in the majors. It could be that walk rates are not affected by competition levels in the same way that hits and hitting for power are.

So, the two points of data on walk rates that I have are (1) irrespective of competition levels, the conversion factor for individual NeL players in the 1940s looks to be about 1 and (2) in the early 1920s, NeL walk rates matched major league walk rates closely. The big question, then, is what happened to NeL league walk rates between the early 1920s and the early 1940s relative to the majors?

It might be that there is suggestive data on the effect of competition levels on walk rates hidden somewhere in CWL data. If there is CWL seasonal data that includes walks for years in which players participated who also played in the major leagues (e.g. Armando Marsans), comparing the walk rates of those players relative to their leagues might reveal the effects of competition levels on walk rates. I am not aware that any seasonal data of this kind is available for seasons with major-league cross-overs. If anyone knows where such data could be found, please let me know! (Whoops, I just realized that the 1908-09 CWL data Gary A. has compiled has Marsans himself! I'll have a look at him and see what I find. So, if _more_ data of this kind can be found, let me know.)

In the absence of theoretical analysis or empirical data to resolve my uncertainties about walk rates, I plan to handle walk rates as follows: (1) in seasons before 1940 for which no NeL league data is available, I will use a conversion factor of .95 for walk rates, just to hold down outliers a bit in seasons when walk rates might have spiked; (2) in seasons for which I have NeL league data, I will adjust for context but not apply a competition conversion factor; and (3) I will regress walk rates in the same way I regress BA and slugging--to a five-year average, centered on the season in question. (Dan R's study of war credit suggests that more sophisticated regression studies using major-league data could produce a more precise regression formula, but since his findings there indicate that taking an average of the four surrounding seasons leads to similar results, I have some confidence that my simple regression formula is basically sound.)
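The regression in point (3) can be sketched as follows. The five-year centered window is from the post; the blend weight between a season's own rate and the window average is a hypothetical choice, since the post specifies the window but not the mix:

```python
def five_year_average(rates, i):
    """Average over the five-season window centered on season i,
    truncated at the ends of the career."""
    window = rates[max(0, i - 2): i + 3]
    return sum(window) / len(window)

def regressed_walk_rate(rates, i, weight=0.5):
    """Blend the season's own rate with the centered five-year
    average. weight=0.5 is an illustrative assumption."""
    return weight * rates[i] + (1 - weight) * five_year_average(rates, i)

rates = [0.10, 0.12, 0.11, 0.15, 0.10]  # illustrative seasonal walk rates
smoothed = regressed_walk_rate(rates, 2)
```

The same machinery applies to BA and slugging, per the post.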

This approach is different from Brent's, so our results will differ with respect to walks, though our systems are otherwise the same. Without a clear basis for applying a 20% discount to walk rates for competition level, I can't see my way to doing it. My limited review of CWL data suggests that it was a high walk rate league, so if one is working with that data, a contextual discount certainly appears to be appropriate in many cases, but, based on what I know at this moment, I would want to treat that purely as a contextual discount, not a competitive one.

Your thoughts?

> Brent doesn't give his rationale for choosing that factor (or I failed to notice it), so I am not sure his choice is justified.

I didn't do any kind of empirical study to select the .81 factor for walks, so I agree that the choice may not be justified. There are two reasons that I used it:

(1) Some of you may recall that several years ago I calculated MLEs for minor league seasons of HoM candidates like Buzz Arlett, Earl Averill, and Gavy Cravath, using the MLE methods published by Bill James for Class AAA translations in his 1985 Baseball Abstract. I eventually realized that the Bill James method was essentially equivalent to multiplying batting average by M and isolated power and walk rates by M^2, where M^2 = .82. Since Chris was doing about the same thing for Negro league players with M = .9 and M^2 = .81, it seemed like a natural extension of the Bill James methodology.

(2) I started doing translations of Cuban League players with Carlos Morán, and without reducing his walk rate by a factor of .81, none of you would have believed the resulting MLEs. (None of you believed them anyway! :) At any rate, it seemed reasonable in his case to lower the walk rates to account for differences in quality of pitching.
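The M / M^2 structure Brent describes can be written out directly. This is a sketch of the translation as he characterizes it, with illustrative input rates; M^2 = .82 is James's AAA value and .81 is Chris's NeL value:

```python
import math

def james_translation(ba, iso, walk_rate, m_squared=0.82):
    """Batting translation as Brent describes the Bill James method:
    multiply BA by M, and ISO and walk rate by M^2. With
    m_squared=0.81 this reproduces Chris's NeL factors (M = .9)."""
    m = math.sqrt(m_squared)
    return ba * m, iso * m_squared, walk_rate * m_squared

# Illustrative input rates, not any particular player's line:
ba, iso, bb = james_translation(0.300, 0.150, 0.100)
```

So a .300/.150/.100 line translates to roughly .272 BA with ISO and walk rate cut to .123 and .082.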

Chris's suggestion is a good one--Gary has posted enough data on Marsans and Almeida that we could take a look at how their Cuban walk rates compare with their major league rates. I'll see what I can put together.

> In the absence of theoretical analysis or empirical data to resolve my uncertainties about walk rates, I plan to handle walk rates as follows: (1) in seasons before 1940 for which no NeL league data is available, I will use a conversion factor of .95 for walk rates, . . .
>
> Your thoughts?

Do you generate MLEs in such a way that you can handle this discount as a parameter? That is, after some set-up stage, "easily" generate low and high estimates with d=0.9 and d=1.0.

> Do you generate MLEs in such a way that you can handle this discount as a parameter? That is, after some set-up stage, "easily" generate low and high estimates with d=0.9 and d=1.0.

I do the MLEs with spreadsheets, so, yes, I could do this sort of thing quite easily.

After several seasons in minor league baseball, Marsans and Almeida both debuted with the Cincinnati Reds on July 4, 1911. Marsans soon became a regular and a minor star for the Reds, while Almeida hit fairly well but didn't get much playing time. While with the Reds, 1911-14, Marsans appears to have been an impatient hitter, drawing 66 walks in 1,179 AB+BB, a rate of .056 compared to the NL rate (excluding pitchers) of .087. Almeida, also with the Reds from 1911-13, was closer to average, drawing 25 walks in 310 AB+BB, a rate of .081 compared with the league average rate of .089. (Their league average rates differ slightly because Almeida didn't play in 1914 and I weighted the annual league average rates by each player's AB+BB.)
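The major-league rates quoted above follow directly from the raw counts; a quick check, using the thread's BB / (AB+BB) definition and the counts from the post:

```python
def walk_rate(bb, ab_plus_bb):
    """Rate as defined throughout this thread: BB / (AB + BB)."""
    return bb / ab_plus_bb

# Reds-era counts quoted above
marsans = walk_rate(66, 1179)   # ~.056, vs the NL non-pitcher rate of .087
almeida = walk_rate(25, 310)    # ~.081, vs the league average rate of .089
```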

For the Cuban data, in order to get a large enough sample I had to include data from as early as the fall of 1907. (Gary's website, agatetype.typepad.com, includes compilations of walks for regular Cuban League seasons through 1908-09, but after 1909 the data with walks are limited to a few series between visiting Negro league teams and Cuban teams. Figueredo, the other main source of Cuban data, doesn't include information on batter walks.) I analyzed walk rates for the following leagues and series: 1908 Cuban League season (winter of 1907-08), 1908-09 Cuban League season, and series against the Philadelphia Giants (fall 1907), Brooklyn Royal Giants (fall 1908), Leland Giants (fall 1910), Lincoln Giants (fall 1912), Lincoln Stars (fall 1914), and Indianapolis ABCs (fall 1915).

The strange thing is that the roles of Almeida and Marsans flip in the Cuban data. Almeida, who had a near-average walk rate in the majors, was impatient in the Cuban data. He drew 16 walks in 420 AB+BB, a rate of .038 compared to a league rate of .096. Marsans, on the other hand, was close to the league average in Cuba, drawing 35 walks in 474 AB+BB, a rate of .085 compared to the league rate of .096. So while the example of Marsans might provide support for the idea that walk rates dropped when facing major league pitching, Almeida is an obvious counterexample.

How about restricting it to the time period after they debuted in the majors (i.e., Cuban series held after 1911)? The sample sizes get uncomfortably small, but the same tendencies appear: Almeida drew 4 walks in 67 AB+BB, a rate of .060 compared to the series average of .113. Marsans drew 7 walks in 71 AB+BB, a rate of .099 compared to his series average of .118. Again, Marsans walked quite a bit more than his major league rate, while Almeida walked less.

The only conclusions I can draw from this little study are: (a) individual players make different adjustments when they move from one league to another, and (b) we would need a much larger sample than 2 players to accurately calculate the translations for walk rates from one league to another.

Given the lack of information, I'll be happy to go along with Chris's recommendation to use a factor of .95 for walks. I will revise my MLE calculations for Lloyd and Hill and re-post them.

I did adjust the Cuban League data that I used for context. As you can see from my last post, deadball era Cuban League walk rates were a little higher than major league walk rates, and I adjusted for those differences. (More notable is the fact that during 1900-1909, Cuban League batting averages and isolated power were much lower than the major league rates.) It's interesting that Almeida and Marsans show such diametrically opposed trends, but it's too bad their careers don't give us any insight into the problem of the effects of league strength on walks.

When the HoF releases the full data from their study and gives us NeL league walk rates for the 1940s, we'll be able to get some answers, I guess, but perhaps not until then. (Unless independent researchers like Gary A., who release their findings publicly, complete some studies first, which is beginning to seem the more likely outcome.)

2. Mike Emeigh Posted: January 19, 2009 at 09:16 AM (#3055090)

> [John Murphy:] Martinez is most likely with MiL credit

I am of the opinion that Martinez doesn't deserve MiL credit, if you look at his career arc honestly and in context. In 1985 and 1986, he didn't hit especially well in the Southern League. In 1987, you need to consider that his numbers were posted (a) in the PCL and (b) in Calgary, one of the better parks for hitters in that league. I would argue that it was not until 1988 that he definitely established his credentials for a major league job - and he got one a year later, albeit as a backup with a .618 OPS.

47. Rocco's Not-so Malfunctioning Mitochondria Posted: January 20, 2009 at 04:29 PM (#3056382)

I'm with Mike here. Realistically, it wasn't the Mariners that held Martinez back (and even if it was, that's not REALLY the kind of rationale that merits giving minor league credit); rather, it was Edgar holding Edgar back. He was pretty mediocre in two seasons in AA, . . .

69. Bob "Jugement" Dernier Posted: January 21, 2009 at 10:43 AM (#3056935)

> He was pretty mediocre in two seasons in AA

That would be the factor that I think y'all should consider wrt Edgar and the minors. If you just extrapolate his MLB record backwards, à la Monte Irvin, Minnie Minoso, or someone like that, he will look great. But he hit .258 at age 22 in a full AA season, and then .264 at age 23. That can't add up to much if translated into major-league performance.

70. David Concepcion de la Desviacion Estandar (Dan R) Posted: January 21, 2009 at 10:50 AM (#3056949)

Bob "Jugement" Dernier--no one's suggesting Edgar should be credited for those seasons. The issue is whether he deserves it for 1987 and 1988, where his numbers would MLE to major league caliber.

71. Rocco's Not-so Malfunctioning Mitochondria Posted: January 21, 2009 at 11:15 AM (#3056971)

When I initially said that Edgar wasn't great in AA, it wasn't to say that he shouldn't get MLE for those years (of course he shouldn't), but that they would have given Mariners management good reason to be cautious about bringing him up even after mashing in AAA, considering that pretty much anyone who was a AAAA player could mash in Calgary. It was like Vegas or Albuquerque today - numbers from a park that hitter-friendly should be taken not with a grain, but with a pound of salt.

72. Chris Cobb Posted: January 21, 2009 at 11:35 AM (#3056984)

Responding to Bob "Jugement" Dernier -- when the HoM looked at Irvin and Minoso, we (at least, most voters) did not extrapolate their MLB records backwards. Instead, we used translations of their Negro League and minor league play into a major league context. If Edgar Martinez is given credit for minor-league play, it would be on that basis as well (at least, for most voters).

"John Beckwith" includes Gadfly #151-160 on Monte Irvin. Within this series Gadfly estimates and explains conversion rates 0.93 for batting and 0.87 for slugging, between the NNL and NL in Irvin's time. (Gadfly supports conversion rates as high as 0.95 and 0.90.)

"John Beckwith" also includes

- earlier on page two, chiefly by GaryA, some run scoring and "raw park factor" data on 1920s Negro Leagues

- earlier on page two, some biographical tidbits on Sam and John Beckwith (and a third?), Juan and Luis Padron (and a third), and so on

- on page one, explanation of Chris Cobb's conversion rates 0.87 for batting and about 0.82 for slugging, between some NeL and some ML. (Cobb supports batting conversion rate as low as 0.85 --in page two response to sunnyday2 iirc)

- on page one, argument about how demographics and economics bear on the expected numbers of great players in the Negro Leagues and majors (karlmagnus, jimd, others)

>>

74. jimd Posted: December 28, 2004 at 04:48 PM (#1043881)

Thought experiment: Let's assume that John McGraw was able to successfully integrate the majors around 1905, or, even better, that segregation never happened. Let's also assume (more controversially) that the percentage of black players mirrors their proportion in the general population as modified by geography. This is a null hypothesis, and yields an MLB that is about 15% black in the 1920's/1930's. What might this look like?

Our expectation is that the bottom 2-3 players at each position would be replaced by the top 2-3 Negro League players at that position. There could be a large variance in this; it's statistically reasonable that some position might have no black players, and another might have 7 (within 3 standard deviations), depending on the relative talent depth of MLB and the NeLs. Based on a typical team having 14 regulars (7 position players + 2 catchers + 5 pitchers), that's 224 regulars for 16 teams and 20-50 black regulars (again 3 standard deviations).

I have no idea where to set the replacement level for Negro League baseball relative to MLB. I have no idea whether the null hypothesis outlined above has any real merit. But it's a starting point.

<<
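jimd's variance bands can be checked with a simple binomial sketch. The 15% share, 16 teams, and 14 regulars per team are his assumptions; the code just works out the mean and three-standard-deviation bands he cites:

```python
import math

p = 0.15                      # assumed black share of the talent pool
n_league = 16 * 14            # 16 teams x 14 regulars = 224 jobs
mean = n_league * p           # 33.6 expected black regulars
sd = math.sqrt(n_league * p * (1 - p))      # ~5.3
band = (mean - 3 * sd, mean + 3 * sd)       # roughly 18 to 50

n_position = 16               # one regular per team at a given position
pos_mean = n_position * p     # 2.4 expected at each position
pos_sd = math.sqrt(n_position * p * (1 - p))          # ~1.4
pos_band = (pos_mean - 3 * pos_sd, pos_mean + 3 * pos_sd)  # roughly 0 to 7
```

Both of jimd's ranges -- 20-50 regulars league-wide, and 0 to 7 at a single position -- fall out of the binomial arithmetic.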

jimd, or anyone else who follows this,

What is the model here?

For example is there a single distribution of talent, uniform by race, whose upper tail (what size?) plays baseball in the major leagues?

Or does this concern major league players of two kinds, fixed in aggregate number, randomly allocated to particular fielding positions or roles?

Negro Leagues 1939-1947 0.90

Negro Leagues 1948-1950 0.87

Mexican League 1940-1948 0.90

Canadian Border League 1950 0.80 (Class C)

Canadian Provincial League 1948 0.87 (Independent)

Eastern League 1950 0.93 (Class A)

PCL 1949-1951 0.93 (Class AAA)

International League 1947 0.93 (Class AAA)

Cuban Winter League 1938, 1940 0.94

New England League 1946 0.80 (Class B)

Venezuela 1940 0.90

Are these discount rates correct? In all cases, I tried to use Chris Cobb’s most up-to-date BA conversions (as I understand it, Eric Chalek uses a runs converter, so I ignored those conversion rates). In some cases (the Eastern League, for instance), Chris simply used a 93% blanket rate for all minor leagues, which explains why I have an A league and a AAA league at the same rate (I think the Eastern League should be somewhat lower than 0.93, but I’m not sure what the exact rate should be). Also, I could not find a BA conversion for the 1946 New England League, but I did find Eric Chalek’s run conversion rate of 0.7, from which I guesstimated a BA conversion of 0.8.
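Applying one of the discount rates above is just a multiplication, with an optional context adjustment for the gap between the two leagues' averages. A minimal sketch, using a subset of the rates listed above; the .320 hitter and the assumed ML average of .265 are illustrative, not a worked MLE:

```python
# Subset of the BA discount rates listed above
discounts = {
    "Negro Leagues 1939-1947": 0.90,
    "Mexican League 1940-1948": 0.90,
    "PCL 1949-1951": 0.93,
    "Cuban Winter League 1938, 1940": 0.94,
}

def ml_equivalent_ba(raw_ba, league, league_avg=None, ml_avg=None):
    """Discount a raw BA for competition level; optionally rescale
    for the difference between the two leagues' average BAs."""
    ba = raw_ba * discounts[league]
    if league_avg and ml_avg:
        ba *= ml_avg / league_avg
    return ba

# Illustrative: a .320 hitter in the 1940 Mexican League (lg avg .290),
# against an assumed ML average of .265
est = ml_equivalent_ba(0.320, "Mexican League 1940-1948", 0.290, 0.265)
```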

Here are the league averages I used:

League averages:

Mexican League (from Eric Chalek’s compilations of Mexican League stats)

Year BA

1940 0.290

1941 0.288

1942 0.289

1943 0.273

1944 0.284

1945 0.280

1946 0.281

1947 0.278

1948 0.273

Negro Leagues (from the HOF averages)

Year BA

1939 0.260

1940 0.272

1941 0.256

1942 0.249

1943 0.269

1944 0.274

1945 0.276

1946 0.259

1947 0.273

1948 0.276

1949 0.268

Major and minor league averages were from Baseball-reference.com, where available.

To find players who played in the PRWL and the Negro Leagues, I looked through the various player threads here at the HOM, the scans from Holway at the Yahoo site, BR Bullpen, and the Negro Leagues eMuseum. Here are the 50 player-seasons I used in this project (each year refers to the year in which the winter season began; for instance, Josh Gibson is marked as 1939-1940, meaning that I considered his 1939/1940 and 1940/1941 PRWL seasons for this project):

Josh Gibson 1939-1940

Willard Brown 1941, 1946-1949

Ray Dandridge 1941

Bus Clarkson 1946, 1948, 1949

Pee Wee Butts 1944, 1949

Piper Davis 1947, 1949

Monte Irvin 1940, 1941, 1946

Luis Marquez 1946-1949

Artie Wilson 1947-1950

Marvin Williams 1944

Roy Campanella 1940, 1941, 1944, 1946

Francisco Coimbre 1940, 1943, 1944

Silvio Garcia 1939

Tetelo Vargas 1943

Buck Leonard 1940

Willie Wells 1941

Sam Bankhead 1944, 1945

Leon Day 1940, 1941

Quincy Trouppe 1944

Bill Wright 1941

Sam Jethroe 1944, 1946

Johnny Davis 1945

Larry Doby 1946

Luke Easter 1948

Wilmer Fields 1947

After aggregating everything together, weighting each player-season equally, I came up with a PRWL conversion rate, for 1939-1950, of 0.897. This is a somewhat preliminary finding, dependent upon the conversion rates, though I think that the conversion rates I’ve used are fairly accurate. Overall, my findings corroborate Eric Chalek’s earlier conversion rate of 0.90 for the PRWL (based off Gadfly’s advice). However, I’m going to hold off doing any MLEs until someone has checked my conversion rates.
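The aggregation step described above -- every player-season weighted equally -- is just an unweighted mean of the per-season implied conversion rates. A sketch, with made-up implied factors standing in for the 50 real player-seasons:

```python
def prwl_conversion(season_factors):
    """Equal-weighted mean of the per-player-season implied PRWL
    conversion rates, each season counting once, as in the post."""
    return sum(season_factors) / len(season_factors)

# Made-up implied factors standing in for the 50 real player-seasons
sample = [0.88, 0.92, 0.90, 0.87, 0.91]
rate = prwl_conversion(sample)
```

Run over the real 50-season sample, this is the calculation that yields the 0.897 figure.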

Unfortunately, I have no data from either the 1938/1939 season or the 1942/1943 season. This is a fairly big deal in the case of Perucho Cepeda, whose case depends on a .465 PRWL batting average in 1938/1939. The lack of foreign players in 1938/1939 can probably be attributed to the league’s newness; as a result, the proper conversion rate should be somewhat less than 0.90, maybe 0.80 or 0.85 (from 1939-1941, with 15 switchers, I calculate a league conversion rate of 0.91, indicating that the PRWL quickly became a high-quality league). In 1942/1943, according to Wikipedia, “World War II affected the league directly, reducing the 1942-43 season's length with only four active teams. This amount of teams continued until 1946, while the rule that allowed the participation of three imported players per team, was suspended from 1942 to 1944.” I’m not entirely sure what this means—were teams prohibited from having any imported players, or were they allowed to have as many as they liked? Regardless, I think the absence of league-switchers in 1942/1943 (and the relative paucity in 1943/1944) reflects the impact of the war, and probably indicates a decline in league quality in these years.

Here's a link to my conversion rate spreadsheet, if anyone’s curious.
