Hall of Merit
A Look at Baseball's All-Time Best
Tuesday, January 18, 2005
Major League Equivalencies
This thread will be used for examining and analyzing MLEs throughout baseball history.
John (You Can Call Me Grandma) Murphy
Posted: January 18, 2005 at 10:22 PM | 209 comment(s)
Reader Comments and Retorts
It's clear that, when attempting to create season-by-season MLEs for Negro-League players, some regression to the mean for seasonal totals is appropriate. The questions are: how much and how to establish the mean.
I have been doing regressions in an unsystematic way, and I'm sure it can be done better.
So far I have been using career totals as the mean towards which to regress, but I am considering changing to a rolling 5-year mean, with necessary alterations at the beginning and end of careers. Would that lead to more probable results?
I have been simply guessing about how far to regress towards the mean: I am not a trained statistician, and would welcome plain-English guidance from those with more knowledge!
For Negro League players, you would regress to the Negro League average. Are those stats available? If not you could regress to the AL or NL league averages, if you believe the quality of play was about equal in those leagues.
Please pardon my ignorance, but why should you regress to the population?
Negro-League averages are seldom available. After making the conversion to major-league equivalents, regression to major-league averages could be done.
I can see that an incomplete career line would be regressed to the NeL mean. Would the incomplete seasons then be regressed to the player's regressed career line? Does that make any sense?
I think obviously Beckwith was not an average Negro League hitter, so regressing to THAT population wouldn't be correct.
Regressing to his career totals would probably be the way I would go absent a better method. You could use your proposed 5 year mean idea IF you have sufficient # of plate appearances during that time I guess.
Going the other way round, converting first, gives you a .340 MLE, which when regressed 50% as above gives you .295.
In other words, regressing and conversion aren't commutative.
(i) Is this what you're doing?
(ii) Is it correct that you should regress to the NL mean first -- surely right?
(iii) Am I then correct that doing it the other way round wrongly inflates MLEs?
This would explain why we're getting so many HOMable NLers; the difference between .276 and .295 is not gigantic, but it is substantial.
I'm quite prepared to be told I'm out to lunch, and I will promise to understand why if I am.
[one section of Beckwith #135, nearly copied here]
Gadfly #123
I will try to find reference to the study I saw that did 1940s Major-Minor Translations for you and post some Negro League Triple-A comparisons when I have time. It was my understanding that the .92 conversion was overall. In other words, BA would be reduced by say .95 and slugging by .89. But I could be wrong.
Clay Davenport's minor league translation factors have some currency; indeed, I know of them only indirectly, by reference in remarks on major leagues. He uses the "overall" measure EqA. If I understand correctly, his translation factors should have Gadfly's property: magnitude between batting and slugging factors.
--
Also in Beckwith, Gary A #108 and jimd(page one) on ballparks used by NeL and MLB in the same year. For any year, a good share of NeL games played in MLB parks should yield a good estimate of NeL-average park factor (where MLB-average = 100).
What I have been doing is converting and then regressing.
It appears the operations are not commutative.
It is not intuitively obvious to me that the regression should precede the conversion. My intuition, and it is purely intuition, is that the other order is correct.
It is also not intuitively obvious to me that regression to the league average is appropriate (a concern seconded above by KJOK).
I may be signing a statistic book or two out of the library or contacting my math department's extension office . . .
Definitely convert before attempting to regress.
Key Q: what is my goal? If I'm a career voter, I don't much care about regressing anyway. If I want to measure 'peak' or 'prime', I may as well regress to the length by which I measure these. But if I were to measure peak for Negro leaguers, I might rely more heavily on contemporary opinion than stats (I will bow more to stats, if we have them, for career value).
Regressing to the league mean won't do anything besides drag everyone's stats to the center.
Regressing to a rolling average seems to make sense to get a truer shape of a career, as long as you keep in mind the 'typical' career shape. As in, I could see using an uncorrected average for the years age 25 to 30, but not for 32 to 36 when we expect decline anyway.
Tom's personal most important law of stats: everything in life varies with the square root of N (the sample size). [You need 4 times as much data to cut the uncertainty in half.]
I THINK convert, then regress is correct. Even in the Major Leagues, you can have Mike Matheny hit .395 for the month of April, so a very good Negro League player hitting .450 for 60 games in the Negro Leagues would not be unexpected. You're also going to have more Negro League players who hit .125 for 60 games in the Negro Leagues than Major League players who hit even .150 for a full season, etc.
The conversion just gives you what that .450 would have been vs. Major League competition for 60 games. From there, it's no different from regressing any Major League player's 60 games into 154 or 162.
I don't do ANY regression on my MLE's, but I also tend to ignore season to season performance for Negro Leagues players and just look mainly at their career MLE's.
For example, in a paper in the 1975 Journal of the American Statistical Association, Efron and Morris looked at the following scenario: suppose you know the batting averages of 18 players over their first 45 at bats, and don't know anything else about the players. How would you estimate their averages over their remaining at-bats of the season? In that case, the answer is to regress all the players toward the overall mean; if a player starts the season hitting .400, chances are he is an above average hitter, but it is also highly unlikely (unless you have additional information about him) that he will continue to hit .400 the rest of the season. The amount by which you shrink the players toward the overall mean depends on the standard deviation of their batting averages. A summary of the formula and a baseball example can be found in section 5 of this paper by Efron, though I'll warn you that there is a lot of math. A more reader-friendly version appeared in 1977 in Scientific American.
Your situation is different, but I think the same formula could apply. Instead of knowing averages for p players over a few games, you know the averages for one player over the p years of his career. You know what his average was over perhaps 50 games, but you assume he would have played perhaps 140 games under major league equivalent conditions, so you are trying to predict what he would have hit over the remaining 90 games. I think you would use the same formula, regressing toward his career average based on the standard deviation of his batting average over his career.
Do you regress first, then convert, or convert first, then regress? I don't know. Statistical theory is good at coming up with formulas, but telling you how to apply them relies more on the experience and judgment of the people doing the calculations.
BTW, Carl Morris, in addition to being a prominent statistics professor at Harvard, also dabbles in sabermetrics. He has a runs generator formula that may be the most sophisticated one around -- he calls it simple, but it's too complicated for me to use. For some reason, I can't get the link to work in this post, so if you're interested I suggest that you google: "simple runs per game" carl morris.
If one league is only .85 as good as the other, then you need to reduce the league average for the lesser league before regression: use .85*.250=.212 for the league average if you convert first, then regress.
If you do it that way, you'll get the same result if you regress first or convert first.
Example A: .400*.85 = .340 regress 50% to .212 = .276
Example B: .400 regress 50% to .250 = .325 * .85 = .276
This is all for sake of example, you'd actually regress 125 AB a lot more than 50%. You can find more on this kind of stuff at tangotiger.net.
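Examples A and B above can be checked with a short sketch. The function names and the 50% regression amount are illustrative only (the post itself notes that 125 AB would really be regressed much more than 50%); the point is just that the two orders agree once the league mean is converted too, and disagree when the converted average is regressed to the unconverted mean.

```python
def convert(avg, factor):
    """Scale a batting average by a league-quality conversion factor."""
    return avg * factor


def regress(avg, mean, fraction):
    """Pull avg toward mean by the given fraction (0.5 = halfway)."""
    return avg + fraction * (mean - avg)


# Values taken from the example in the post; all are for illustration only.
OBSERVED = 0.400   # average in the lesser league
LEAGUE_MEAN = 0.250
FACTOR = 0.85
FRACTION = 0.5

# Example A: convert first, then regress to the CONVERTED league mean.
example_a = regress(convert(OBSERVED, FACTOR), convert(LEAGUE_MEAN, FACTOR), FRACTION)

# Example B: regress to the unconverted mean first, then convert.
example_b = convert(regress(OBSERVED, LEAGUE_MEAN, FRACTION), FACTOR)

# Mismatch: converted average regressed to the UNCONVERTED mean.
mismatched = regress(convert(OBSERVED, FACTOR), LEAGUE_MEAN, FRACTION)

print(f"A: {example_a:.3f}  B: {example_b:.3f}  mismatched: {mismatched:.3f}")
```

The mismatched case reproduces the .295 figure quoted earlier in the thread, which is where the apparent non-commutativity came from.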
This is not an accurate representation of NeL statistics. Few conversions that I have done, even without regression, show the NeL players as leading the major leagues in batting average, which is the result we would expect according to the above.
Here's a lightning-fast survey of NeL leaders, converted, compared to ML-leaders. This will use only .87 as the conversion factor for batting average, not getting into league-offense levels, park adjustments, or regression to the mean, but it should suffice as a demonstration.
Seasons by Negro-League players that could, by this conversion, have won an ML batting title are marked in bold.
Year -- NeL W lead (MLE) -- NeL E lead (MLE) -- NL / AL leads
1920 -- .399 (.347) -- .409 (.356) -- .370 / .407
1921 -- .484 (.421) -- .361 (.314) -- .397 / .394
1922 -- .451 (.392) -- .404 (.351) -- .401 / .420
1923 -- .433 (.377) -- .441 (.384) -- .384 / .403
1924 -- .409 (.356) -- .382 (.332) -- .424 / .378
1925 -- .428 (.372) -- .419 (.365) -- .403 / .393
1926 -- .498 (.433) -- .351 (.305) -- .353 / .378
1927 -- .426 (.371) -- .435 (.378) -- .380 / .398
1928 -- .405 (.352) -- .563* (.490) -- .387 / .379
1929 -- .390 (.339) -- .464 (.404) -- .398 / .369
*This legendary batting performance by the 44-year-old Pop Lloyd is the only .500 season in these records, and data posted to our site by KJOK replaced this incredible number with one that was much more credible. I use the number from Holway simply to follow my source.
My simple MLE conversion turns up 5 batting titles out of 20 for Negro-League players. One of those is based on a batting average that has been proven to be apocryphal, and one is a tie. Only one other places above the range of major-league averages, the .433 average posted by Mule Suttles in 1926. Even slight regression would easily pull it into the range of ML values, and this average was achieved in a hitter's park probably more extreme than any in the majors at that time.
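The survey above can be reproduced with a few lines of code. This is only a sketch of the quick .87 conversion described in the post, with the table re-entered in thousandths so the arithmetic stays in integers. The rule used to flag a possible batting title (the MLE matches or exceeds the lower of the two major-league leaders) is an assumption made here to stand in for the bold marking, which did not survive in the text.

```python
CONVERSION_PCT = 87  # the .87 factor, expressed as a percentage

# (year, NeL West lead, NeL East lead, NL lead, AL lead), all in thousandths
SEASONS = [
    (1920, 399, 409, 370, 407),
    (1921, 484, 361, 397, 394),
    (1922, 451, 404, 401, 420),
    (1923, 433, 441, 384, 403),
    (1924, 409, 382, 424, 378),
    (1925, 428, 419, 403, 393),
    (1926, 498, 351, 353, 378),
    (1927, 426, 435, 380, 398),
    (1928, 405, 563, 387, 379),
    (1929, 390, 464, 398, 369),
]


def to_mle(avg_thousandths):
    """Apply the .87 conversion and round to the nearest thousandth."""
    return (avg_thousandths * CONVERSION_PCT + 50) // 100


titles = []
for year, west, east, nl, al in SEASONS:
    for league, avg in (("W", west), ("E", east)):
        mle = to_mle(avg)
        # Assumed rule: ties count, and beating either league's leader counts.
        if mle >= min(nl, al):
            titles.append((year, league, mle))

for year, league, mle in titles:
    print(f"{year} NeL {league}: MLE .{mle}")
print(f"{len(titles)} of {2 * len(SEASONS)} league-seasons")
```

Under that rule the sketch flags five of the twenty league-seasons, including the disputed 1928 figure and the 1923 tie.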
In sum, I hope this quick example shows that there is no evidence to support the contention that heavy regression is needed to convert NeL batting statistics to major league statistics that happen "often."
On the Arlett thread I've been using a Bill James method, which rather than converting batting averages or slugging averages directly, converts each element of the batting line. I wondered how his approach might compare in terms of the batting average conversion. It reduces a Triple-A player's hits to 89 percent of his total, but it also reduces his at bats by 3 or 4 percent. Sure enough, the conversion factor for batting average turned out to be .92! (Actually, it varies depending on the player's minor league average, but for averages between .268 and .360 it rounds to .92.)
For slugging average, James applies a square root to home runs and triples, but does not use a square root for hits or doubles. The ratio depends on the characteristics of the hitter, but for a .300 hitter with 20 HR per year, the conversion factor is .90. If you increase his power to 40 HR per year, it drops to .89; if you keep the 40 HR and reduce the average to .240 (we're talking Dave Kingman) it drops to .88; if you keep the average at .300 with 5 HR per year, it rises to .91. So it's a relatively limited range, and I'd guess that for most of the players we're interested in .89 or .90 would be applicable.
My next effort on major league equivalencies is to look at the quality differential for a group of 1920s PCL players who also played in the majors. There's a Web site that has all the minor league hitting statistics for the Portland Beavers. Roughly half the regulars also played regularly in the majors, so I should be able to come up with a good sized sample. I'm still looking for more data on the PCL run environment during the 1920s though.
Rallymonkey has pointed out the flaw in convert-then-regress; you should regress to the MLE of the NL average if you do it that way round, i.e. in my example regress to .212 not .250. It appears we have a systematic error here of quite some magnitude.
Again, your claim is simply inaccurate in its interpretation of the data at very simple levels.
1) Of these supposed five times you claim the converted NL leads the majors in batting average, one (1928) has been thrown out as bad data -- I included it because it was the _only_ .500+ batting average over a ten-year period, which refuted your claim that .500 batting averages happened "rather often" in the NeL. A second year was counted as a batting title because it tied the _lower_ of the two major league batting titlists, so it can't be said to lead the major leagues. So that's 3 in 10 major-league leading seasons.
2) That 3 in 10 leading seasons is BEFORE ANY regression, and I am not and have not argued that regression is not needed. I am trying to figure out HOW MUCH. You take incompletely processed data that I present only to demonstrate the wild factual inaccuracy of your claims that .500 averages were not unusual in the Negro Leagues and that major regression is necessary to make them fit with "real" major-league averages. I present a quick data set to show that neither of your points has any factual basis, and you use those incompletely processed MLEs as evidence that the conversions I am doing contain "a systemic error of quite some magnitude."
The treatment of the .498 season that you advocate would regress that .498 season to .338. That is hardly appropriate. 1) If that regression were applied to every NeL season, the number of ML batting leaders would be 0 for the 28 year history of the leagues, not the 1 or 2 in ten years you say we would expect. 2) If .338 is the highest MLE a Negro-Leaguer averaged during the 1920s, then NONE of them are HoMers, which contradicts a) what we would expect from the most conservative demographic estimates, b) the demonstrated performance of Negro-League stars vs. major-league competition, and c) their reputations.
Mule Suttles (our .498 hitter) averaged .341 for his career in 3230 at bats. We can agree, perhaps, that this is a large enough sample size that it doesn't need to be regressed to the mean?
Working just with the .87 conversion for batting average (skipping league offensive levels, park factors, and arguments about whether .87 is correct), Suttles comes out as a .297 major-league hitter, career. That makes .338 41 points above his lifetime average. The average amount by which major-league batting leaders exceed their lifetime averages during the 1920s is 48 points, the highest being 80 points (George Sisler's .420 in 1922). It is evident that a system that regresses the _most extreme_ Negro-League outlier (157 points above the player's lifetime average -- some drawing of this average towards the career mean is clearly necessary) to a lesser variance than the average variance for a major-league batting leader regresses too much to give proper credit to peak value.
I hope later today to take time to think through other posts more fully. Thanks to everyone who has responded on how regression to the mean should be used.
I might also add that the highest batting average I found in the NNL for that year was Willie Wells's .365 (which would translate to .318). Pythias Russ (the .405 hitter) hit .346 in the games I have, though I'm missing two dozen games played in Chicago. Of course, these were all played in Schorling's Park, so I don't know whether Russ could have raised his overall average by 60 points by his performance in those games.
Also, the highest batting average I have in the west for 1921 is almost 50 points lower than what Holway records. I'm missing 14 St Louis games, so I suppose it's possible that Blackwell or Charleston could have raised their averages by that much.
Four out of 20 doesn't seem excessive to me in the least, Chris.
If you regress Mule Suttles's .498 by 50% (which may not be the right percentage -- how many ABs was that .498 on? -- you should normalize to about 500 ABs, I would think, so 50% would be for 125 ABs, 60% for 180 etc.) towards his career average of .341 you get .419, which converted at 87% gives .364, a batting title in the NL but not the AL in that year. Converting first you get .433, but as rallymonkey pointed out you then have to regress that 50% not to .341, but to .341x.87, or .297, which again gives you .364. Regressing it to .341 is comparing apples and oranges; it would give you .387, but that's a meaningless number.
How do you determine how much to regress? You seem to be using a formula that you assume is obvious, but I, alas, am ignorant of it.
An explanation of that would be most helpful.
Data on the number of at bats Suttles had in 1926 is at home, so further work on the specifics of that season will have to wait a bit.
If we are agreed that one regresses to the player's career average, converted to a major-league equivalent, rather than to the NeL average converted to its major-league equivalent, I think we are on the right track. It certainly appears to me that the range you are presenting for Suttles, .364 - .392 depending upon the amount of the regression, is reasonable.
That's what I meant, karlmagnus. I typed 20 by accident.
I'm pretty sure that's right, and am a math major (Cambridge), but it's now 34 years since I graduated and I threw away all my stats books as that was a part of the subject I hated, so could be all wet. But that is how I would regress NL stats gained in short seasons so they were equivalent to ML stats gained in 500AB seasons.
The 1880 Chicago Cubs went 67-17 (.798) in an 84 game schedule. Using a binomial distribution, this is 5.46 standard deviations away from the expected 42-42 (.500) that is the league mean.
How would this team do under a 162 game schedule? Simple extrapolation says 130-32 (.802), but this is 7.70 SD away from 81-81, which is too high. Regression to the mean says that it would go about 116-46 (.716), which is 5.50 SD.
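The arithmetic in this example can be sketched as follows. The function names are made up for illustration, and the model is the one the post uses: a binomial distribution around a .500 league mean, with the number of standard deviations held constant when moving to the longer schedule.

```python
import math


def binomial_sd(games, p=0.5):
    """Standard deviation of wins for a team with win probability p."""
    return math.sqrt(games * p * (1 - p))


def z_score(wins, games):
    """How many standard deviations a win total sits above .500."""
    return (wins - games * 0.5) / binomial_sd(games)


def project_wins(wins, games, target_games):
    """Hold the z-score constant and restate it over a longer schedule."""
    return target_games * 0.5 + z_score(wins, games) * binomial_sd(target_games)


actual_z = z_score(67, 84)             # the 67-17 record
extrapolated_z = z_score(130, 162)     # simple extrapolation to 130-32
projected = project_wins(67, 84, 162)  # regression-to-the-mean projection

print(f"actual: {actual_z:.2f} SD, extrapolated: {extrapolated_z:.2f} SD")
print(f"projected wins over 162 games: {projected:.1f}")
```

The projection lands a little under 116 wins, in line with the 116-46 figure above.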
The formula (technically, the normal approximation to the formula) for the standard deviation of a proportion (which is what a batting average is) is
SD = square root of [AVG*(1-AVG)/AB]
So if a player hits .400 in 100 AB, we are really sure (95%, which is two standard deviations) that his 'true' average is somewhere within +or- [.4*.6/100]^.5 * 2 = .098 of .400
That looks huge, but even in 550 ABs, it's +or-.042, and it's true that MLB hitters do occasionally hit 42 pts above or below their lifetime avgs in a long season, and they do sometimes hit 100 pts lower or higher in a month.
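A quick sketch of the formula, for anyone who wants to try other averages or sample sizes. The function name is invented here; the two-standard-deviation band is the rough 95% rule used in the post.

```python
import math


def two_sd_margin(avg, at_bats):
    """Two standard deviations of a batting average (normal approximation)."""
    return 2 * math.sqrt(avg * (1 - avg) / at_bats)


for at_bats in (100, 400, 550):
    print(f".400 over {at_bats} AB: +/- {two_sd_margin(0.400, at_bats):.3f}")
```

Going from 100 to 400 AB cuts the margin in half, which is the square-root-of-N rule mentioned earlier in the thread.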
But I am NOT suggesting we 'regress' NeL stars an extra 50 pts or so for a 100 AB season. The above is based on a pre stats test that assumes we KNOW NOTHING ELSE about the player. For a Beckwith, if his typical NeL avg is .350 and he hits .400 one year, I'd weight his average something like
.400 for 100 AB
.350 for maybe 500 AB, where I estimate that my knowledge of his 'lifetime curve' is possibly 5 times the 'weight' or certainty of his one season
and so his estimated NeL avg for that year would be (.400 * 100 + .350 * 500) / 600 = .358.
voila! (or not...)
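The weighting in that post is just an at-bat-weighted average, which can be sketched like this. The 500 AB weight on the typical average is the poster's own rough guess at how much the 'lifetime curve' should count, not a derived value.

```python
def weighted_average(observations):
    """Combine (average, at_bats) pairs, weighting each by its at bats."""
    total_ab = sum(ab for _, ab in observations)
    return sum(avg * ab for avg, ab in observations) / total_ab


# One .400 season in 100 AB, plus a .350 typical level weighted as 500 AB.
estimate = weighted_average([(0.400, 100), (0.350, 500)])
print(f"estimated average for the season: {estimate:.3f}")
```

Changing the second weight shows how strongly the estimate depends on that guess.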
It is also not intuitively obvious to me that regression to the league average is appropriate (a concern seconded above by KJOK).
inappropriate inapprop inapp inapp inapp
The player's measured performance should regress toward the expected performance given his contemporary skill.
TomH #13
rightly says why you should convert first --and implies why you should publish the intermediate results. Several people are interested in convert-only; few are interested in regress-only.
Brent #16
may be right about how to approach a different problem, where you have some random sample of part-seasons from a player's career. Suppose you have data for 8 randomly selected 1/3-seasons for someone who played 16 years.
But you don't have that; the sampled 1/3-seasons are dated and you know the player's birth date and something about how careers generally develop. So Brent (regression to career average) is wrong here and TomH #13 is broadly right:
Regressing to a rolling average seems to make sense to get a truer shape of a career, as long as you keep in mind the 'typical' career shape.
Of course, that quotation isn't a complete plug'n'play solution :-)
--
TomH #13:
Tom's personal most important law of stats: everything in life varies with the square root of N (the sample size).
The application is tricky here, for those (most HOMers) who are interested in the player's full-season achievements rather than the player's skill at that time in his career. Consider a 20-game sample from a 50-game season, doubled to provide a 40-game sample from that season. Clear? If not, consider 20- and 40-game samples from a 40-game season.
So, data from just a few more games is more valuable here than in the world ruled by "Tom's" law. In other words: that law is discouraging but don't let it slow down your search for more boxscores!
If so convert then regress is wrong if you want career WS.
Even if not, it is wrong if you want peak WS.
Gary A #96, jimd #99
http://www.baseballthinkfactory.org/files/primer/hom_discussion/24597/P100
Gary A #[10]8
Redland Field, Cincinnati, 1921 park factors
111, 104 adjusted in Negro Leagues (Gary A #96)
99 in National League (Gary A #96)
95 in National League (jimd #99)
--
As I said in #11: For any year, a good share of NeL games played in MLB parks should yield a good estimate of NeL-average park factor (where MLB-average = 100).
But we don't have a good share of NeL games played in MLB parks, and for some NeL games we have no data.
thus in a 162 game schedule they would have gone .715.
Close enough. I think I've got the main point. Which is that the relationship to the mean, measured in standard deviations, remains constant when converting the results from a small real sample to a larger extrapolated one. (Which is what "regression to the mean" means if I think about it. D'Oh. ;-)
To be able to calculate appropriate regressions, I'll need statistics that include at bats or games more often than Holway does. That means getting hold of a MacMillan 8-10 edition, I think.
This comment is by no means meant to end discussion of regression, btw, just to express my appreciation and to note its implications for data gathering.
I'll pick up on the theoretical question that I think gadfly meant for me and not for Gary A:
So, if the distribution of superstars is roughly equal between white and black, why do none of your translations for the Negro League Superstars from the 1920s and 1930s end up with a lifetime BA of .330 to .350 like their white counterparts?
I don't have a firm response to this, but here are my thoughts about the situation. I have never attempted to justify my translations in demographic terms. I don't think it's possible to derive from demographic/economic arguments the percentage of stars by race with any certainty. For the purposes of Hall of Merit elections, I believe that a quota approach to electing black players would create an inappropriate double standard. One of my goals in trying to develop accurate and reliable MLEs has been to make quota arguments unnecessary. So I have not attempted to measure the results of my translations against any demographic standard.
That said, I'm not at all sure my MLEs are correct. My gut, which is sensitive to players' reputations of greatness and to expert opinions, says that they are a bit low. However, my standard for calculating them is to base no step in the process on my sense of what the numbers _ought_ to show, but to construct a system that creates statistics that are (1) derived from the best available data and (2) based on conversion methods that have been discussed by the interested members of the electorate and that have been generally (if not universally) accepted as sound.
If the results seem lower than they ought to be, we have the commentary of experts to challenge the results and to help us find ways to do better. I think that you are probably right that my MLEs are a little low, and I hope the electorate here will consider the serious likelihood of that based on your comments and other evidence of expert opinion.
But I can't change the system based on opinion, or it ceases to be a system that aims at an objective numerical statement of value. If we find evidence that I have erred in a calculation or gain access to evidence that leads to different conclusions about the conversions, I can make changes to improve the system accordingly. I hope to do so. I believe improvements in my handling of regression based on recent conversations will improve the system and give a fairer representation of peak value.
I can see a number of points where the evidentiary basis for the conversion factor could be improved:
1) NeL park factors from 1938-1948. A lot of the data for the conversion comes from Doby and Irvin in Newark. We know that was a hitter's park, but how extreme was it, exactly, and what percentage of their games did they play there? If I have used too low a park factor, that would depress the conversion factor incorrectly.
2) Data on the overall level of offense in the NeL from 1938-1948 in comparison to the major leagues, especially in the Negro National League. Evidence from the 1920s provided by Gary A. indicates that, although NeL levels of offense tracked with ML levels, these diverged at times by up to 10% (I think that the difference was even greater in the late teens), with the ML levels being higher. I have taken this into account for 1920s conversions, which raises the MLEs of Negro-League players. Eyeballing the numbers for the late 1930s and early 1940s, it looks like offensive levels were high in the Negro Leagues at that time, quite possibly higher than in the majors. If this is the case, that again, if not properly accounted for, would depress the conversion factor incorrectly.
3) Use of AAA conversion studies could help to better model the process of arriving at a conversion, provide a point of comparison for the Negro Leagues' competition level, and add statistics from NeL stars playing in the high minors to the pool of data available for the calculation of a conversion factor. Studies on the level of competition in the Mexican League would be similarly useful (and will be important for the assessment of players like Cool Papa Bell, Ray Dandridge, Martin Dihigo, and Wild Bill Wright in any case).
4) Striking the right balance between conversion rates for batting and conversion rates for slugging. The discussion of the square-root relation between the two numbers has been helpful and should help to improve the accuracy of individual conversions and provide a standard by which the conversion factors can be judged. Obviously, the .87/.82 split I am using now isn't right. It appears to be a compromise between two different conversion levels. Figuring out why my calculation of conversion factors from the data produced this discrepancy could lead to a more reliably derived factor.
I am hopeful that I/we can do better on all of these fronts.
their meaning and statistical foundation explained by the author
how many black stars?
- production and recruitment of quality ballplayers before WWI;
- (im)maturity of baseball in the South;
- black residence in (rural) South, migration to North
#[2]71 jimd's racial and regional demographic "Adventure"
compare how many HOMers each year?
- production and recruitment of quality ballplayers;
- regional (im)maturity of baseball
Having read through the postings above, I think I understand the rationale and the formula for calculating regression to the mean.
To summarize: regression to the mean corrects for the greater variance created by small sample size by keeping the number of standard deviations from the mean constant when moving from the small actual sample to the larger, hypothetical sample of data.
The ratio of the standard deviations of the two sample sizes is the square root of the ratio of the two sample sizes. The variance from the mean in the small sample is multiplied by the ratio of the standard deviations to find the variance that is the same number of standard deviations from the mean in the larger sample.
Mule Suttles hit .498 in 1926 in 212 at bats. His career average was .340. Suppose we accept .340 as the mean and 500 as the normal number of actual at bats in a season.
Since some voters will want to see unregressed conversions and some will want to see regressed conversions, I'll convert and then regress, using the .87 factor for now. .340 --> .296, .498 --> .433
His .433 MLE average is regressed as follows:
Variance from the mean: .433-.296 = .137
Ratio of standard deviations: square root of 212/500
Multiply the variance by the ratio, and add it to the mean.
Suttles' 1926 MLE average, regressed: .296 + .089 = .385
He still has the highest average in the majors that year, but by a small rather than a huge margin.
Have I calculated this correctly, given our assumptions about what the mean is and what the conversion factor is?
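For anyone who wants to check the Suttles calculation or try other inputs, here is a minimal sketch of the procedure exactly as described in the post (convert, then scale the difference from the converted career mean by the square root of the at-bat ratio). The function name is invented, and "variance" in the post is treated as the plain difference from the mean. Whether this is the right regression formula is the question the post itself is asking, so this only reproduces the arithmetic.

```python
import math

FACTOR = 0.87  # batting-average conversion factor used in the post


def regress_to_full_season(avg, mean, actual_ab, full_ab):
    """Scale the gap between avg and mean by sqrt(actual_ab / full_ab)."""
    ratio = math.sqrt(actual_ab / full_ab)
    return mean + (avg - mean) * ratio


mle_season = 0.498 * FACTOR   # 1926 average, converted
mle_career = 0.340 * FACTOR   # career average, converted

regressed = regress_to_full_season(mle_season, mle_career, actual_ab=212, full_ab=500)
print(f"converted: {mle_season:.3f}, career mean: {mle_career:.3f}")
print(f"regressed to 500 AB: {regressed:.3f}")
```

The `full_ab` argument can be changed to test other choices for a normal season, such as the 550 suggested later in the thread.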
Now, this conversion leaves out two important factors in the overall conversion calculation:
1) difference in league offensive levels
2) park effects
The proper place for these in the formula needs to be ascertained, and I think that placement will make a difference in the results. I think that park factors should be applied first, before conversion and before regression, and that difference in league offensive levels should be applied at the same time as the conversion factor.
Does that seem right?
Comments on that?
As to the number of at-bats to which to regress, I'd welcome suggestions. One possibility would be to use the number from the rolling seasonal averages, as long as those were totals that might occur in a full major-league season. Or would it be better to set a derived norm based on typical at bats per game and typical games per season, and stick with that?
Comments?
I'd recommend regressing to what we consider to be a 'normal' set of at bats for a season. Maybe 550?
1) changes in competition levels in the Negro Leagues. I don't think we're close to having enough evidence to calculate these statistically, but I think there's enough evidence to indicate that levels changed. This is an area that needs to be handled subjectively for now.
2) normal paths of improvement and decline for major-league players. Since all the conversions are based on play happening in sequence in players' careers, the progression of careers must be influencing the conversion. I've tried to exclude instances where obvious improvement and decline are influencing the data, but there may be subtler influences even in the most level pieces of evidence available. Comparative data on this subject could also help to improve the conversions by enabling us to correct for its influence on the data I have used and to enlarge the set of data available by making it possible to use players during their periods of significant improvement and decline.
Nice job putting post 46 into English, Chris! Couldn't have said it better if you gave me a week
Thank you! I'm a professional with words, so that part comes easy to me. It's the numbers that I struggle to get straight!
Mule Suttles hit .498 in 1926 in 212 at bats. His career average was .340. [For discussion of regression, suppose] we accept .340 as the mean and 500 as the normal number of actual at bats in a season.
The 212ab sample from his 1926 season is a small part of our sample from his career, which is more than 1000ab (equivalently, he usually batted above .300). So I am comfortable with .340 as a talking point.
#46:
Now, this conversion leaves out two important factors in the overall conversion calculation:
1) difference in league offensive levels
2) park effects
The proper place for these in the formula needs to be ascertained . . .
#51
Two additions to the list of issues for conversions from post 46 above:
[3] changes in competition levels in the Negro Leagues. . . .
[4] normal paths of improvement and decline for major-league players.
Rather,
[4'] Player's own path of improvement and decline, where the normal path for mlb players will be used in default of useful information about Player.
Otherwise, yes.
These issues pertain separately to the conversion of his .498 part-season average and the conversion of his .340 part-career average. For example, you want to season-park-adjust his .498 and career-park-adjust his .340.
--
Regarding [4] and [4'], iiuc.
Given ample data or a significant amount of data for adjoining seasons, you will use some (maybe weighted) 3-yr or 5-yr average of season records rather than his career average, and there will be no issue of improvement and decline.
I meant [3] and [4] as issues to address in order to improve the accuracy of the general conversion factor, though I take your point that career path improvement and decline (not to mention radical changes in offense levels such as the one that took place between the late teens and the early twenties) does make it problematic simply to use career average as a baseline.
Given ample data or a significant amount of data for adjoining seasons, you will use some (maybe weighted) 3-yr or 5-yr average of season records rather than his career average, and there will be no issue of improvement and decline.
Yes, and park-adjustments to the baseline will be easier to make. You imply above that 1000 ab is a sample size that gives you confidence in a baseline. At about what point does sample size become too small for confidence as a baseline?
Following Tom H's explanation of standard deviation above, I calculate that in a 1000 at-bat sample, we can be 95% certain that a player's true average is within 30 points (2 SD) of the average generated in that sample. In a 500 at-bat sample, 2 SD is 40 points. In a 2000 at-bat sample, 20 points.
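For anyone who wants to check those figures, here is a minimal sketch. It assumes each at bat is an independent trial (a binomial model) and a true average of about .300; both are my assumptions for illustration, not something stated in the post.

```python
import math

def two_sd_points(at_bats, true_avg=0.300):
    """Two standard deviations of an observed batting average, in points.

    Assumes a binomial model; true_avg=.300 is an illustrative assumption.
    """
    sd = math.sqrt(true_avg * (1 - true_avg) / at_bats)
    return 2 * sd * 1000  # convert to "points" (thousandths)

for ab in (500, 1000, 2000):
    print(ab, round(two_sd_points(ab), 1))
```

Under these assumptions the results land within a point or so of the 40, 30 and 20 points quoted above. Because the spread shrinks with the square root of the sample size, it takes four times the at bats to cut it in half.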
OK, to your point.
Regarding the 1000 ab, I chose 1000 because of its size relative to 212. I am not comfortable with regression of part-season to career average where the part-season is a large part of the sample for measurement of career average. Should I be comfortable?
-
Ron Wargo, 1944 Ballot Discussion #94
Negro League infielders seem to have difficulty in our rankings. Are we being too harsh? While outfielders like Torriente and Hill breeze into the HOM, only Lloyd can claim that distinction for infielders so far. Johnson & Grant took some time, although Johnson made it relatively quickly. [anticipations deleted]
This is a good observation, potentially as important as the epochal bias in HOF inductions from the Negro Leagues, against those who retired before Buck O'Neil played (or followed?) the game. The infield positions {3B, SS, 2B} are underrepresented among the great Negro ballplayers by reputation, inasmuch as I know the reputations. Only John Henry Lloyd is sometimes called "maybe the best" or generally included in the top ten, I think. Lloyd's age, older than 8 or 9 of the ten, underscores his case.
Does it indicate bias? I don't know. If so, it may be a common bias. Only Honus Wagner is routinely considered one of the top ten MLB players, or sometimes called maybe the best, and only George Wright is routinely considered one of the top ten 19c players, or sometimes called maybe the best. Each is the Shortstop from the Dawn of Time, older than any of the other players commonly considered one of the top ten.
Anyway, it so happens that I have his stats for 1921 and 1928, along with league and park data. So here’s how Herrera’s Negro League and major league careers compare. I have to say the results were a little surprising:
Ramon Herrera--Raw Averages
Year---Age--Team---G---PA---AVE---OBA---SLG
1921---23---CSW---68--307--.234--.304--.305
1928---30---CSE---29--133--.317--.341--.397
NeL total---------97--440--.261--.316--.334
AL totals, at ages 27-28:
1925/26-----------84--308--.275--.320--.333
I don’t have stats for Herrera’s 1920 NeL season, but Holway has him hitting .259, which seems to indicate it probably wouldn’t change his career NeL stats very much.
League Context (park-adjusted)*
Year---league-----AVE------OBA------SLG
1921---NNL----.268668--.329507--.357343
1928---ECL----.281519--.333020--.383140
NeL total-------.272767--.330585--.365572 (prorated)
1925/26-AL----.288000--.359000--.404000**
Dividing Herrera’s percentages by these adjusted, prorated league contexts, you get these relative averages:
---------AVE------OBA------SLG
NeL--.955976--.956362--.914122
AL---.954861--.891364--.824257
Divide his major league relative averages by the corresponding NeL figures, and you get these conversion factors:
---AVE------OBA------SLG
.998834--.932036--.901693
In other words, Herrera’s Negro League and major league batting averages were almost the same, relative to his league and park, but he walked less and hit for less power in the American League.
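The arithmetic behind the conversion factors can be sketched as follows. The relative averages are copied from the table above; each factor is the AL relative average divided by the NeL relative average. The function and variable names are mine, for illustration only.

```python
# Relative averages from the post: player rate / park-adjusted league rate.
nel_relative = {"AVE": 0.955976, "OBA": 0.956362, "SLG": 0.914122}
al_relative = {"AVE": 0.954861, "OBA": 0.891364, "SLG": 0.824257}

def conversion_factors(major, negro):
    """Ratio of major-league relative rate to Negro League relative rate."""
    return {stat: major[stat] / negro[stat] for stat in major}

factors = conversion_factors(al_relative, nel_relative)
for stat, value in factors.items():
    print(f"{stat}: {value:.6f}")
```

A factor near 1.0 means the player held his standing relative to league when he moved; a factor below 1.0 means he lost ground.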
There are obvious caveats:
1) This is very limited, comparing 440 NeL plate appearances to 308 AL plate appearances.
2) His NeL numbers are weighted toward the 1921 season, which is four years before his major league appearance. He hit significantly better in ’28, so he may simply have been a better hitter in the mid-to-late 20s—although the ’28 sample is only 29 games.
3) His NeL numbers are also divided between the ’21 NNL and ’28 ECL, which were very different leagues. I don’t really know how they stack up against each other, quality-wise; offhand, I’d say that the ’21 NNL might have been better, simply because it played the season through; the ’28 ECL disintegrated in late May, though most of the teams continued to play each other into October. (NOTE: All of the Cuban Stars’ 1928 games were against teams that were in the ECL in either ’27 or ’28—Brooklyn, Hilldale, Lincoln Giants, Baltimore, Bacharach Giants--along with the Homestead Grays, who were clearly of league quality.)
I would hardly suggest that these conversion rates are accurate for all players at all times. Still, I thought it would be good to get this out there as a data point, especially since almost nobody knows about Herrera.
*I found BA/OBA/SLG park factors for Redland Field in the 1921 NNL, and adjusted Herrera’s league context, prorated to the number of plate appearances (for OBA) and at bats (for AVE and SLG) he had at home and away. He played in 28 games at home, 40 on the road. The raw park factor (runs) for Redland Field that year was 110; but the averages show a much milder effect; and, as in the majors, the park cut home runs significantly. There are some technical steps I skipped (such as accounting for Redland Field not being among the Cubans’ road parks), as the effects are pretty small and I was reaching the point of diminishing returns. In 1928, the Cuban Stars (E) were a road team, so I didn’t bother with park effects (again, there could be an effect depending on which road parks they played in, but I doubt it was very large).
The Redland Field factors for 1921, if anyone's interested:
AVE: 1.018774
OBA: 1.012775
SLG: 0.948110
HR: 0.441514
**-These are baseball-reference.com’s park-adjusted league numbers—as I understand it, they’re prorated for Herrera’s plate appearances, so they give the park-adjusted league context.
All of these data are very useful, and muchly appreciated, but let me offer one caveat.
Some people (including Bill James) have pointed out that in some circumstances, the drive to succeed is an important factor. This is anecdotally seen in many feel-good stories in all sorts of sports (and non-sporting events in life), and might be reflected in things like black-v-white exhibition games, and possibly in initial conversions to MLB. If some players see making the majors or beating the white players or whatever as a do-or-die item, it can surely affect their performance. The 'cornered rat' or 'mother bear' theory if you will. With fewer vocational options, kids from difficult backgrounds have exceeded normal expectations of 'making it' in sports. And it's plausible to me that those who crossed over from NeL to MLB may have been driven to succeed in ways some of us can only imagine.
---AVE--------OBA--------SLG
0.846757--0.870737---0.795831
Of course, this represents 308 AL plate appearances compared to only 133 ECL plate appearances.
I think your point is valid and important regarding some of the black-white exhibition games. Reading The Pride of Havana, it's clear that while some major league managers (such as McGraw) took their exhibition games in Cuba very seriously, in other cases the major league teams treated it as a holiday, and sometimes key players wouldn't show, players would show up drunk, and so forth.
When it came to making the majors when integration arrived, however, it seems to me that both the white and black players must have had extra motivation. The white players' jobs were on the line, plus they didn't want to be showed up. As you describe, the early black players also must have had an intense desire to succeed. I think psychological factors must have worked both directions.
I'm not sure how to produce an equivalent adjustment for the Negro Leagues, as there were more multiposition, "double-duty" players, so it would be harder to figure out what to subtract. Negro League pitchers probably hit better relative to league than their white counterparts, but it was still the weakest-hitting position--so taking pitchers' hitting out of the league averages would cause Herrera's league context to go up, and he would look worse as a hitter in the Negro Leagues.
In other words, it appears that the Negro League MLE conversion factors I presented above should be even higher, if I can figure out how to adjust for this. Of course, all the other caveats (sample size, etc.) remain.
Is it possible that this effect is created not by removing pitchers but by adjustments for pitching quality? That is, does bbref take the Boston pitchers out of the offensive context for the Boston hitters? I'm pretty sure, actually, that bbref does this. Would it account for the effect that Gary has observed, or not?
--
Quoting the BB-Ref Glossary: Adjusted OPS+
[OPS+] is calculated differently from the Total Baseball PRO+ statistic. I chose OPS+ to make this difference more clear.
. . .
My method
1. Compute the runs created for the league with pitchers removed . . .
Note, TB7 adopted 'OPS+' in place of 'PRO+' used in TB3-TB6, so 'OPS+' no longer suggests any difference between the statistics.
FWIW, I agree with Sean Forman (BB-Ref) about the calculation of season OPS+ but disagree about career OPS+. A few years ago, I reported the BB-Ref career calculation as a mistake, re: George Davis.
--
FWIW, the BB-Ref adjustment of ERA seems to be routine.
: AL1925 lgERA 4.40
: Boston lgERA* 4.52
Accounting for the ERA roundoff to #.##, that is consistent with any routine
: adjustment factor in interval [1.02497, 1.02958].
In turn, that fits the reported
: Boston PPF 103.
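A quick sketch of where that interval comes from, assuming both ERAs were rounded to two decimals in the usual way (so the true values lie within half a hundredth of the printed ones). The helper name is mine.

```python
def rounding_interval(value, decimals=2):
    """Range of true values that would round to `value` at this precision."""
    half = 0.5 * 10 ** -decimals
    return value - half, value + half

lg_lo, lg_hi = rounding_interval(4.40)    # AL 1925 lgERA
bos_lo, bos_hi = rounding_interval(4.52)  # Boston lgERA*

# Smallest factor: lowest Boston value over highest league value, and vice versa.
factor_lo = bos_lo / lg_hi
factor_hi = bos_hi / lg_lo
print(f"[{factor_lo:.5f}, {factor_hi:.5f}]")
```

A pitcher park factor of 103 corresponds to a multiplier of about 1.03, which sits essentially at the top edge of that interval once the factor's own rounding is allowed for.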
Ramón (Paíto or Mike) Herrera
Cuban League
Season   Team         G    AB    R    H   2B  3B  HR  SB   Avg
1913-14  Almendares  27    87    9   15    1   0   0   0  .172
1914-15  Fe          34   121   11   35    3   0   0   4  .289
1915-16  Almendares  29   151   22   46    2   1   0  12  .305
1917     Red Sox     14    54    3    9    1   0   0   0  .167
1917-18  No season
1918-19  Almendares  (Statistics not available)
1919-20  Almendares  --    80   15   24    1   0   0   0  .300
1920-21  Almendares  --    71    9   10    0   0   0   0  .141
1921     Almendares  --    20    4    7    0   1   0   0  .350
1922-23  Almendares  --    97   10   27    4   0   0   0  .278
1923-24  Almendares  --   134   22   40    6   1   0  --  .299
1924*    Almendares  --    83   10   19   --  --  --   1  .229
1924-25  Habana      --   138   23   43    5   1   1   0  .312
1925-26  Habana      --   152   23   47    7   4   1   0  .309
1926-27  Habana      --    94   24   36    5   1   1   2  .383
1927-28  Habana      --   149   22   47    9   4   2   1  .315
1928-29  Habana      (Statistics not available)
1929-30  Habana      --    90    9   29    4   0   0  --  .322
Total                     1521  216  434   48  13   5  20  .285
Notes:
1913-14 – No American players in league.
1913-16 seasons played at Old Almendares Park.
1917 – Record shown is for an alternative league that was organized (and displaced the regular league that year); games played at Oriental Park. No American players in league.
1918-30 seasons – most games played at New Almendares Park.
1919-20 – Only one American player in league.
1921 – Season lasted only 5 games. Tied for league lead in triples (1).
1924* - Special season. The regular season was terminated early when Santa Clara reached an 11.5 game lead. The weakest team (Marianao) was dropped, its players were redistributed, and a 25-game special season was played.
1926-27 – A rival league was formed (“Triangular”) that raided many of the better players. Herrera led Cuban League in runs scored (24).
I'd like to figure out what bbref is up to, just to make sure I get Herrera right. Here's what they have:
year-AL ave--adj ave*--Boston BPF
1925-.292-----.300------100
1926-.281-----.286-------97
*Park adjusted league average for Herrera/Boston
He gives his first step in figuring Adjusted OPS+ as "Compute the runs created for the league with pitchers removed..."
1917 – Record shown is for an alternative league that was organized (and displaced the regular league that year); games played at Oriental Park. No American players in league.
1918-30 seasons – most games played at New Almendares Park.
1919-20 – Only one American player in league.
1921 – Season lasted only 5 games. Tied for league lead in triples (1).
Note that five MLB seasons were played and nearly five calendar years passed between the '1917' and '1921' seasons in Cuba. I have two suggestions, one for humans who want line-sortability by computer.
1917w (for winter)
1918-19
1919-20
1920-21
1921f (for fall)
The other is for inside a database and it amounts to the following in chron order for the five given Cuban seasons and five interpolated MLB seasons.
1917 w
1917
1918
1919 o
1919
1920 o
1920
1921 o
1921
1922 f
This will be useful for any human using a spreadsheet who needs to subtract "years" within the Cuban league, preserving familiar properties.
Eg, '1917 w' to '1922 f' is a six-year span, 6 = 1922-1917+1
When creating MLEs for pitchers, should ERA and ERA+ be converted using the same multiple that one uses for EQA with hitters?
I've been assuming that the same multiple should be used, but I'd like those with better knowledge to confirm that for me.
That appeared to be the conclusion reached in the discussion on the Wes Ferrell thread concerning how to work out the impact of pitchers' OPS+ on ERA+, but I'm not entirely clear on that.
Is that correct, or no?
My conversions actually track to batting average and slugging, using different rates (which should and eventually will be related by a consistent, theoretically justified ratio), not EQA.
So let me repose the question more concretely:
The evidence, so far, suggests a .87 ba/.82 sa conversion ratio. How ought the ratio for ERA+ be linked to these?
If you want to have a ba/sa pair that fits the square root formula, go with .90/.82.
I'm aware that gadfly's data may lead to a reconsideration of these ratios, but for now getting the principle of how to set up pitcher conversions in relation to hitter conversions will be enough to make some progress possible.
In post # 184 EricC wrote:
Because of selection bias, we arrive at the incorrect conclusion that league A is stronger.
and Brent responded:
I'm sorry to keep coming back to this argument, but now I see another flaw in it....
Understanding selection bias is such an important part of doing proper MLE's, thought I'd try to explain with an example.
Let's say we have League I and League II, and we 'know' League II is a stronger league, with League I having a TRUE strength at around 90% of the strength of League II.
Let's assume that League I has two types of players, with half of all League I players being Type A, and the other 50% being Type B, and that this split holds for all talent levels (superduperstars, allstars, very good players, average starting players, etc.) Type A players that move to League II are able to retain 95% of their value, and Type B players that move to League II retain 85% of their value (averaging back to 90% overall). The problem is that more Type A players will successfully transition to League II than Type B as detailed below, which will skew comparison results.
As players are actually selected to join League II from League I, all of the superduperstars, both Type A and Type B, will be selected (50% will be Type A, 50% Type B).
However, when we get down to very good players, Type A players will continue to play well enough in League II to be selected, but some of the Type B players will not retain enough value to either be selected or hold a job in League II.
As you get down to average starting players, some of the Type A players will be selected into League II, but almost no Type B players will, etc.
So, the mix of players who played in both leagues will NOT be 50% Type A and 50% Type B, but might be something like 75% Type A and 25% Type B.
At this point, if we were to develop MLE's based on the performance of players moving from League I to League II, it would APPEAR that the correct conversion factor would be about .93 (.75 x 95% + .25 x 85% = .925) instead of the REAL factor of .90!
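Here is the same example as a few lines of arithmetic. The 95% and 85% retention rates and the 50/50 and 75/25 mixes come from the example above; the function name is just for illustration.

```python
def apparent_factor(share_a, retain_a=0.95, retain_b=0.85):
    """Average retention for a group that is `share_a` Type A, the rest Type B."""
    return share_a * retain_a + (1 - share_a) * retain_b

true_factor = apparent_factor(0.50)      # whole League I population
observed_factor = apparent_factor(0.75)  # only the players who crossed over
print(round(true_factor, 3), round(observed_factor, 3))
```

The gap between the two numbers is the selection bias: nothing about either type of player changed, only the mix of players who show up in both leagues.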
A clear explanation of the theoretical problem, but should this be of practical concern to us?
Can/should we do anything other than recognize that the conversion factor is an average, and that this average might be adjusted up or down for individual players, depending on the extent to which we believe their skill sets would have fit the major-league game, _if_ we believe that they should be evaluated according to that standard and not on their merits within the NeL context?
I certainly think there's an argument to be made that we SHOULD be evaluating the Negro League players primarily on their merits within the NeL context.
However, on "Can/should we do anything other than recognize that the conversion factor is an average...", the methodology being discussed above will be using the WRONG average as the starting point, which I believe does have some practical implications in player evaluation.
1) Regression to the mean or the Don Padgett Problem:
(From 1937-48, Padgett was a .288 hitting catcher-outfielder in the National League. In 1939, Don whacked .399 in 233 at bats. Very obviously, he was not the second coming of Ted Williams.)
If you ask me, you have to adjust first for park and league effects. If you don't do this, you will be mixing different park and league effects from different seasons together and the results will simply be a mess.
Mule Suttles is a pretty good example of this. If the Mule had played in the Majors of his time, I seriously doubt that he would have ever led the Majors in BA (maybe if he was in the Baker Bowl, having a great year and with the stars aligned just right).
In 1926, Suttles played for St. Louis in an all-time great hitters park that inflated statistics about like Colorado does presently. In fact, the park pretty much made Suttles unpitchable.
Suttles was not a pull hitter, he liked his pitches out over the plate. With a 250 foot left field wall in St. Louis, it was suicide to pitch Suttles in because he could just muscle it to, or over, that wall. So the pitchers had to put it over the plate just where he liked it.
There was no concurrent park in the Majors that inflated offense like this park.
However, in 1925, Suttles played in Rickwood for Birmingham, one of the greatest pitcher's parks in history. If you do not adjust Suttles for the parks and leagues first, you will get an average that mixes and matches these two very odd parks together.
Of course, I realize that, with much of this statistical info being unavailable, this is easier said than done.
However, once you have adjusted as best you can for park and league, then you have to regress to the mean of the player's current talent level. The idea of regressing a player to the mean of the League is obviously worthless and regressing to the player's career average is better but still not right.
This, of course, is the really interesting question: "How many at bats are necessary for skill and luck to even out and give a true representation of a player's skill?"
My personal opinion is that 500 at bats is good but not really enough to be totally certain (as evidenced by the number of Norm Cash or Brady Anderson type fluke seasons in the Majors), but that 1000-1500 at bats is better.
In other words, a Negro League player should be regressed to his average over the nearest 1000 or so at bats, at the least. For example, if John Beckwith hits .450 in 200 at bats and .350 over the nearest 1000 at bats, his average should regress to .386: the .450 in 200 at bats is combined with .350 over the remaining 350 at bats of a 550 at-bat season.
(And, as someone pointed out, this works in the reverse - if .250 over 200, then regress to .314.)
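The Beckwith example can be sketched like this. The 550 at-bat full season is implied by the 200 + 350 in the example; the function name and the rule of leaving a full season untouched are my own assumptions for illustration.

```python
def regress_by_padding(avg, at_bats, baseline_avg, full_season_ab=550):
    """Fill out a short season with at bats at the player's baseline average.

    full_season_ab=550 follows the 200 + 350 in the example; treating a
    season at or above that total as needing no regression is an assumption.
    """
    if at_bats >= full_season_ab:
        return avg
    padding = full_season_ab - at_bats
    return (avg * at_bats + baseline_avg * padding) / full_season_ab

hot = regress_by_padding(0.450, 200, 0.350)
cold = regress_by_padding(0.250, 200, 0.350)
print(round(hot, 3), round(cold, 3))
```

The fewer at bats in the actual season, the more the baseline dominates, which is the behavior wanted for small samples.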
I think Chris Cobb has the right solution with a rolling five year plan, though I would amend it to being simply the closest 1000 or 1500 at bats.
For Example:
1940: 245 at bats
1941: 302 at bats
1942: 205 at bats
1943: 256 at bats
1944: 251 at bats
The 1000 at bats used to normalize 1942 would be 205 + 302 + 256 + 237 of the 496 at bats from 1940 and 1944 together.
I've been doing this for years and it seems to work just fine with, as was pointed out, the caveat that adjustments need to be made at the beginning and end of the player's career.
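One way to read the "closest 1000 at bats" rule is sketched below. It takes the target season in full, then works outward a year at a time on both sides, and prorates the last pair of seasons equally once the window is nearly full. That symmetric, prorated reading is my interpretation of the 1942 example, not a stated algorithm.

```python
def nearest_ab_weights(at_bats_by_year, target_year, window=1000):
    """Return {year: fraction of that season's at bats used in the window}."""
    weights = {target_year: 1.0}
    remaining = window - at_bats_by_year[target_year]
    max_distance = max(abs(y - target_year) for y in at_bats_by_year)
    for distance in range(1, max_distance + 1):
        if remaining <= 0:
            break
        # Seasons the same number of years before and after the target.
        ring = [y for y in (target_year - distance, target_year + distance)
                if y in at_bats_by_year]
        ring_ab = sum(at_bats_by_year[y] for y in ring)
        if ring_ab == 0:
            continue
        # Use the whole ring, or only the share needed to fill the window.
        share = min(1.0, remaining / ring_ab)
        for y in ring:
            weights[y] = share
        remaining -= share * ring_ab
    return weights

seasons = {1940: 245, 1941: 302, 1942: 205, 1943: 256, 1944: 251}
weights = nearest_ab_weights(seasons, 1942)
used_ab = sum(weights[y] * seasons[y] for y in weights)
print(weights, round(used_ab))
```

At the start or end of a career the ring has only one side, so the window simply leans toward the seasons that exist, which is one possible form of the career-edge adjustment mentioned above.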
When I get some more time, I'll put up two more posts on:
2) Brent's interesting posts on Buzz Arlett; and
3) KJOK's interesting posts on Ramon Herrera.
Two other random thoughts:
1) As I stated before, I don't think that the conversion factor from the Majors to the Negro Leagues would deviate much due to differences in talent distribution.
I think that the distributions of talent between both Leagues were probably very very similar. I think that the context is pretty much the same with the real difference simply being the talent level.
In other words, there is a conversion constant (for each year) between the Majors and Negro Leagues, and it is important to know this to adequately judge how the Negro Leaguers are rated. And this rating should be by how they would have performed if they were able to play in the Majors (individually, not all at once, so as not to disrupt League levels).
2) I think it's funny that, here in the Hall of Merit, the Negro Leaguers are still being badly discriminated against. For example, Dick Lundy and Frankie Frisch are very very similar players; but Frisch is in at #2 in the 1944 ballot and Lundy finished #29.
Basically, Lundy is pretty much the same player as Frisch, possibly slower, but with more power and better defense (Lundy was a shortstop, Frisch a second baseman).
This is why the true year-to-year conversion rates are needed.
In 1923 he hit .354 in 148 games. 204 hits in 577 at bats. He had 31 doubles, 7 triples and 15 homers. I will try and give you some other names to gauge on.
Wade Lefler led the EL with a .369 average in 1923. Elmer Bowman hit .366, George Fisher .365, Herrera .354 and Si Rosenthal .338.
20 players in the league scored 100 or more runs. Walter Simpson had 131, R. Emmerich had 123 and Herrera was next with 122.
Bowman led the league with 211 hits, Ted Hauk was next with 204. Then came Herrera (204), Emmerich (202) and then John Donahue and A. H. Schinkle with 201.
In 1924 Herrera hit .303 in 152 games. He had 191 hits in 631 at bats with 34 doubles, 5 triples and 8 homers. He scored 114 runs.
Other recognizable major league names (with more than 100 hits) in the Eastern League in 1924 were Wade Lefler (not so recognizable) at .370, Lou Gehrig at .369, Earl Webb at .343 and Clyde Milan at .316.
In 1927 Herrera hit .243 in 94 games. He had 90 hits in 371 at bats, scored 41 runs and knocked in 25. He had 12 doubles, 0 triples and 2 homers.
I'm sure you are interested. I did find all five of the games played between the Lincoln Giants and the Philly Colored Giants in the late summer of 1928. The games were played (one in each) in Worcester and Brockton, Massachusetts, and then the final three in New Bedford. The Philly Giants won the first three and the Lincoln Giants took the final doubleheader.
I now have about 60 full boxscores on Bill Jackman in the 1925-1930 period. I have him 5-0 in five starts against major league pitching opponents in 1925 and 1926. I also have him working in relief (no decision) in three more games against major league hurlers those same two seasons. I haven't even scratched the surface.
You probably already have these, but here are two other leads for Jackman.
1) Jackman, Burlin White, and company are discussed a little (3 or 4 pages) in the book 'Even the Babe Came to Play' by Robert Ashe (1991). The section about Jackman talking to batters and driving them nuts is pretty funny even with the racist undertone that Ashe gives it.
The book also states that Jackman went a reported 48-4 in 1927 with 2 no-hitters.
2) Jackman gave an interview in the Jan. 17, 1947, Boston Traveler newspaper. I know you stated in the Rogan thread that you had a 1947 Jackman interview, but I figured that I'd post this just in case it is not this one.
I would be very interested in knowing how Jackman did against the Lincoln Giants in those 5 games. I have always wanted to find proof of Jackman's greatness and the combined totals, posted in the Rogan thread, of the 8/30 game from that 5 game series and the 9/23, 9/30, and 10/7 games in New York are the best evidence I've ever seen.
Jackman, in four games against an elite Negro League team, went 1-3, gave up 33 hits, 16 runs, 11 walks, while striking out 28 in 33 and a third innings with his team playing very poorly behind him. Taken in context, this suggests that his reputation was deserved.
How did he do in the other 4 games of the series?
Ramon (Mike) Herrera 1920-1929:
1920 Linares’ Cuban Stars of Havana
1921 Linares’ Cuban Stars of Havana
1922 Springfield Ponies (Eastern League A)
1923 Springfield Ponies (Eastern League A)
1924 Springfield Ponies (Eastern League A)
1925 Springfield Ponies (Eastern League A)
1925 Boston Red Sox (American League), last month
1926 Boston Red Sox (American League)
1927 Mobile Bears (Southern Association A)
1927 Springfield Ponies (Eastern League A)
1928 Pompez’ Cuban Stars of New York
1929 Pueblo Steelworkers (Western League A)
Of course, during the 1920s, the current Triple-A Leagues were AA Leagues, one step below the Majors. Herrera spent his decade mostly in A ball (currently AA) missing only the Texas League, two steps below the Majors.
The first is by Jerry Nason and he compares Jackman to Paige. Mention is made of Jackman playing for East Douglas in the early 30s as a teammate of Hank Greenberg (I think it was 1929) and Jackman averaging 16 strikeouts a game with a 10 dollar bonus for each strikeout, on top of his $175 base per game. Seems very reasonable from other sources I have put together.
Then there is a story about Jackman tossing a 5 inning 3-2 win in the Boston Park League in July. He fanned 7 and was listed at 54 years old; agreeing with the early version of his birth year (1894 vs 1897).
I have "Even the Babe Came to Play" though I hadn't made note of the Jackman story. I have found many similar stories. He was almost as big a draw as the third base coach as he was on the mound.
I have the 47-2 record from, I think, a 1929 paper but I do not know how much stock I put into it.
So far, and against all comers, I have:
1925: 9-1 record with 46 hits allowed in 78 innings. 19 walks and 60 K's (BB and K missing from one of those games). He allowed 20 runs and among his victories were games over ex-major league hurlers Buck O'Brien (twice), King Bader and