3 Decades of Minor League Translations
Attached is the current fruit of a long-term project I’ve been working on: a large reference of minor-league-to-major-league translations (zMLE, or ZiPS MLE). These go back to the late ’70s, since from then on there’s always some source that has the required statistics. Before that, some years have BB and SO data (generally the most important missing data), but it’s extremely spotty, and sometimes not even whole seasons are covered. Someday, as SABR’s database project proceeds, I’ll have these going back as far as there has been minor league baseball.
So, what value do these have? For me, two things stand out as the most important. First, these remind us of, or introduce us to, fine players who never got a shot in the majors. We live in a time when Japan is a real alternative for Ken Phelpsers like Greg LaRocca to have lucrative careers playing baseball, and when a growing mainstream understanding of the usefulness of minor league statistics has resulted in fewer guys getting completely overlooked.
Second, more information helps us understand how players age and develop. For projection systems that look at comparable players, it’s quite useful to have more 18-to-21-year-olds who aren’t stars, to help us work out, from a statistical standpoint, who will develop and who will not.
The biggest problem with doing these, aside from piecing together data that wasn’t kept all that lovingly at the time, is the lack of minor league park factors for all but the most recent years. For the original MLEs, James simply used each team’s runs scored and runs allowed to estimate a park factor. With game-by-game data mostly lost to history, I had to take a similar approach: using the decade of known data, I built a model that estimates park factors from minor league hitting and pitching statistics. To make these useful, I used a longer time frame (5-year factors) and regressed the numbers more heavily toward the mean. As a result, the factors are fairly conservative, but a park with a long-term “true” HR factor of 0.80 is extremely unlikely to model as a 1.20, even without home/road data. This approach doesn’t work for major league parks (which it doesn’t have to, since I can use actual park factors there). For minor league teams, though, success and failure are generally pretty ephemeral, so over five years team quality largely washes out of the totals: a major league team’s farm system quality essentially boils down to just a handful of the hundreds of minor leaguers it employs in any given season.
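The model described above (built from minor league hitting and pitching statistics) isn’t published here, so what follows is only a minimal Python sketch of the simpler James-style idea: estimate a park factor from a team’s pooled runs scored and allowed over several seasons, then regress it toward 1.00. It assumes half of a team’s games are at home and that the road parks average out to neutral. The `TeamSeason` fields and the `ballast_games` regression constant are hypothetical illustrations, not values from the zMLE data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TeamSeason:
    # One minor league team-season; field names are hypothetical.
    runs_scored: int
    runs_allowed: int
    games: int
    league_runs_per_game: float  # league average runs per team per game


def raw_park_factor(seasons: list[TeamSeason]) -> float:
    """Pool several seasons (e.g. five) into one run-based park factor.

    Assumption: with no home/road splits, half of each team's games are at
    home and the road parks average 1.00, so
        observed = 0.5 * pf + 0.5 * 1.0  =>  pf = 2 * observed - 1
    where observed is the team's run environment relative to the league.
    """
    team_runs = sum(s.runs_scored + s.runs_allowed for s in seasons)
    # League runs per game counts both teams, hence the factor of 2.
    league_runs = sum(2 * s.league_runs_per_game * s.games for s in seasons)
    observed = team_runs / league_runs
    return 2 * observed - 1


def regressed_park_factor(raw: float, games: int, ballast_games: int = 1500) -> float:
    """Shrink a raw factor toward 1.00; ballast_games is a hypothetical constant.

    The more games behind the estimate, the less it is pulled toward neutral.
    """
    weight = games / (games + ballast_games)
    return 1.0 + weight * (raw - 1.0)


if __name__ == "__main__":
    # Five identical 140-game seasons where the team's games produce 5% more
    # runs than the league average.
    seasons = [TeamSeason(662, 661, 140, 4.5) for _ in range(5)]
    raw = raw_park_factor(seasons)
    pf = regressed_park_factor(raw, games=sum(s.games for s in seasons))
    print(round(raw, 3), round(pf, 3))  # 1.1 1.032
```

Pooling five seasons gives short-lived swings in team quality room to cancel out, and the ballast term is what makes the factors conservative: with 700 games and a ballast of 1,500, a raw 1.10 shrinks to about 1.03.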
So, I hope that some of you find this useful. This information can be used for any non-profit endeavor you wish, and for any original research, whether for-profit or non-profit. I would appreciate a credit, of course. And if you really find this useful and have the means, I’d greatly appreciate something in my tip jar (using the donate button below), which will help reimburse me for all the beer I drank to get through all this number-crunching. 2009 is included as well.
zMLE for Excel 2007
zMLE for Excel 2003 (there are more than 65 thousand rows, so the data is split across extra sheets)
Minor League Park Factors, Real and Estimated
zMLE for Pitchers (CSV)
zMLE for Hitters (CSV)
In the future, I hope to add biographical information and fix errors. While we have 30 years of reasonably good data, there are still holes in it (by far the most notable: neither SABR nor BR nor the Cube has good data for the 1990 Sally League).
Posted: September 11, 2009 at 01:09 AM