Baseball for the Thinking Fan

Transaction Oracle
— A Timely Look at Transactions as They Happen

Friday, September 11, 2009

3 Decades of Minor League Translations

Attached is the current fruit of a long-term project I’ve been working on: a large reference of minor-league-to-major-league translations (zMLE or ZiPS MLE).  The data reach back into the late 70s because, from that point forward, there’s always some source that has the statistics required.  Earlier than that, some years have BB and SO data, generally the most important missing data, but coverage is extremely spotty and sometimes not even whole years are filled.  Some day, as SABR’s database project proceeds, I’ll have these going back for as long as there has been minor league baseball.

So, what value do these have?  For me, two things stand out as the most important.  First, having these either reminds us or introduces us to fine players that never got a shot in the majors.  We live in a time when Japan is a real alternative option for Ken Phelpsers like Greg LaRocca to have lucrative careers playing baseball and when increased understanding of the usefulness of minor league statistics in the mainstream has resulted in fewer guys getting completely overlooked.

Second, more information helps us increase our knowledge of how players age and develop.  For systems that look at comparable players, it’s quite useful to have more 18-21 year-olds that aren’t stars to help us crack, from a statistics standpoint, who will develop and who will not.

The biggest problem with doing these, aside from piecing together data that wasn’t kept all that lovingly at the time, is that minor league park factors exist only for recent years.  For the original MLEs, James simply used the team’s runs scored and runs allowed to estimate a park factor.  With game-by-game data mostly lost to history, I had to take a similar approach.  Using the decade of known data, I constructed a model for estimated park factors from minor league hitting and pitching statistics.  To get these to have value, I used a longer time frame (5-year factors) and regressed the numbers more heavily to the mean.  As such, the factors are fairly conservative, but a park that long-term has a “true” HR factor of 0.80 is extremely unlikely to model as a 1.20 without home/road data.  This approach wouldn’t work for major league parks (and it doesn’t have to, since I can use actual factors there), but for minor league teams, success and failure are generally pretty ethereal - a major league team’s farm system quality essentially boils down to just a handful of the hundreds of minor leaguers they employ in any given season.
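To make the idea concrete, here is a minimal sketch of a runs-based park factor that is averaged over five seasons and then regressed toward neutral. It is not the actual zMLE model, which is built from hitting and pitching statistics; the function names, the `regression_weight` value, and the sample numbers are all hypothetical.

```python
from statistics import mean

def raw_run_factor(team_rs, team_ra, league_rs_per_team):
    """Crude runs-based factor: runs in a team's games relative to league average.

    Every league run is both scored and allowed, so the average team sees
    2 * league_rs_per_team total runs in its games.
    """
    return (team_rs + team_ra) / (2.0 * league_rs_per_team)

def estimated_park_factor(yearly_raw_factors, regression_weight=0.5):
    """Average up to the last five seasons, then pull the result toward 1.0.

    regression_weight is a hypothetical value: 0 keeps the raw average,
    1 collapses every park to a neutral 1.0.
    """
    multi_year = mean(yearly_raw_factors[-5:])
    return 1.0 + (1.0 - regression_weight) * (multi_year - 1.0)

# (runs scored, runs allowed, league runs per team) - made-up numbers
seasons = [
    (700, 740, 650), (680, 700, 640), (720, 710, 660),
    (690, 730, 655), (710, 720, 645),
]
raw = [raw_run_factor(rs, ra, lg) for rs, ra, lg in seasons]
print(round(estimated_park_factor(raw), 3))
```

Because the raw ratio mixes the park with the quality of the team's own hitters and pitchers, a single season says little; the multi-year average and the heavy regression are what keep an estimate like this conservative.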

So, I hope that some of you find this to be useful.  This information can be used for any non-profit endeavor that you wish and can be used for any original research, whether for-profit or non-profit.  I would appreciate a credit, of course.  And if you really find this useful and have the means, I’d greatly appreciate something stuck in my tip jar (using the donate button below), which will help reimburse me for all the beer I drank to get through all this number-crunching.  2009 is included, as well.

zMLE for Excel 2007

zMLE for Excel 2003 (there are more than 65 thousand rows, so data is split into extra sheets)

Minor League Park Factors, Real and Estimated

zMLE for Pitchers (CSV)

zMLE for Hitters (CSV)

In the future, I hope to add biographical information to this and fix errors.  While we have 30 years of reasonably good data, there are still holes in data (most notably by far, neither SABR nor BR nor the Cube have good data for the 1990 Sally League).

Dan Szymborski Posted: September 11, 2009 at 01:09 AM | 57 comment(s)

Reader Comments and Retorts


Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   1. Fresh Prince of Belisle Posted: September 11, 2009 at 01:37 AM (#3319359)
OK, that's worth some dough. Do I have to have a Paypal account?
   2. Dan Szymborski Posted: September 11, 2009 at 01:44 AM (#3319362)
No, you don't need a Paypal account.
   3. Der-K and the statistical werewolves. Posted: September 11, 2009 at 02:10 AM (#3319378)
Very cool, Dan - you've even tossed '09 in there...
   4. AROM Posted: September 11, 2009 at 02:24 AM (#3319380)
This is super cool. Thanks Dan.
   5. AROM Posted: September 11, 2009 at 02:55 AM (#3319390)
Are the MLE's to a neutral MLB environment, or to the specific MLB team? In other words, if I were to compare a minor leaguer in Colorado's system to one in San Diego's, do I need to make further adjustments?

Scanning through the list I see Adam Hyzdu with an MLE of 41 homers for 2000, in a year where he actually hit 31 - can that be right?

I'll have to check but I think I have stats for 1990 SAL. If I do I'll send them to you. At the very least I have them in a copy of that year's Baseball America Almanac.
   6. Frisco Cali Posted: September 11, 2009 at 03:05 AM (#3319395)
Just when I was wondering what to do for the next couple years...
   7. Dan Szymborski Posted: September 11, 2009 at 03:39 AM (#3319408)

Scanning through the list I see Adam Hyzdu with an MLE of 41 homers for 2000, in a year where he actually hit 31 - can that be right?


Nobody seemed to hit homers at Blair County. Here are the multipliers for Altoona at that time (and these are already essentially cut in half from the factors!)

1999: 0.79
2000: 0.77
2001: 0.79
2002: 0.81
2003: 0.70

Essentially, Petco Park squared. Add in the fact that 2000 was about the time of the biggest difference between major and minor league home run rates in recent years and you get a pretty nice conversion.
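A rough worked example of the park step alone, assuming that "cut in half" means the multiplier is the average of a neutral road factor (1.0) and the full home factor. That reading, and both function names, are assumptions rather than the documented zMLE method.

```python
def home_factor_from_multiplier(multiplier):
    """Invert multiplier = (1 + home_factor) / 2, assuming half of games are at home."""
    return 2.0 * multiplier - 1.0

def park_neutral_total(actual, multiplier):
    """Remove the park effect from a season total by dividing by the multiplier."""
    return actual / multiplier

altoona_2000 = 0.77  # HR multiplier quoted above for Altoona in 2000

# Implied full home HR factor under the halving assumption
print(round(home_factor_from_multiplier(altoona_2000), 2))

# Hyzdu's 31 actual homers with only the park effect removed
print(round(park_neutral_total(31, altoona_2000), 1))
```

Under that assumption the implied home factor is about 0.54, and the park adjustment by itself moves 31 homers to roughly 40. The rest of the translation (league difficulty, the major/minor home run gap, the parent club's park) is not modeled here.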

Hyzdu, of course, never played that well before or again. His translations those years:

1996: 274/348/478
1997: 247/337/410
1998: 271/325/408
1999: 273/333/517
2000: 273/375/581 (the 41 HR MLE year)
2001: 244/286/412
2002: 240/308/423

What you're pretty much looking at is a legitimate, but definitely non-star hitter having a big career year at the wrong place and time. Kind of like 2000's Ryan Ludwick but the Pirates were too stupid and Cam Bonifay had signed KY Jelly Roll at 1B.

The Pirates finally gave him sort-of playing time years too late. Didn't help that he was a terrible pinch-hitter, like most players:

Starter (271 PA): 250/325/496
Sub (136 PA): 186/281/322
Pinch-Hitter (71 PA): 190/324/293

Average player loses about 150 points of OPS as a pinch-hitter.
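For reference, the size of Hyzdu's own split can be checked from the three lines above, since OPS is just OBP plus SLG. This is a small sketch; the helper name is made up for illustration.

```python
def ops_points(obp, slg):
    """OPS in 'points' from three-digit OBP and SLG as posted (e.g. 325, 496)."""
    return obp + slg

# (OBP, SLG) taken from the split lines above
splits = {"Starter": (325, 496), "Sub": (281, 322), "Pinch-Hitter": (324, 293)}

starter = ops_points(*splits["Starter"])
for role, (obp, slg) in splits.items():
    ops = ops_points(obp, slg)
    print(f"{role}: OPS .{ops}, {starter - ops} points below starter")
```

His pinch-hitting OPS comes out about 200 points below his OPS as a starter, somewhat worse than the typical 150-point penalty.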
   8. The Most Interesting Man In The World Posted: September 11, 2009 at 04:41 AM (#3319431)
I guess this means we should invest in this Chip Cannon fellow.

Seriously, this is awesome Dan.
   9. Dan Szymborski Posted: September 11, 2009 at 04:59 AM (#3319439)
Oops, that Chip Cannon line should be removed for now.

For those that are wondering, he's translated to have 26162 hits in 30 at-bats and score 3666 runs. I must have got an errant keystroke in there.
   10. Zoppity Zoop Posted: September 11, 2009 at 05:01 AM (#3319440)
This is 18 kinds of awesome and a pogostick.

I'm the guy that sent you $33, or $1 for each year.
   11. Dan Szymborski Posted: September 11, 2009 at 05:02 AM (#3319441)
I think I owe you a buck then - 1978-2009 is only 32 seasons!
   12. Dan Szymborski Posted: September 11, 2009 at 05:05 AM (#3319443)
WTF got into Ruben Gotay this year? RUBEN GOTAY WALKED 102 TIMES IN THE MINORS!
   13. Brandon in MO (Yunitility Infielder) Posted: September 11, 2009 at 06:19 AM (#3319459)
Gotay always had a good walk rate for a middle infielder
   14. Biff, highly-regarded young guy Posted: September 11, 2009 at 06:25 AM (#3319460)
The pitchers were scared of him, obviously.
   15. 'zop sympathizes with the wrong ####### people Posted: September 11, 2009 at 10:53 AM (#3319475)
Ah, this is PERFECT data for a paper I am writing for a law school seminar. You'll definitely get a HT in the acknowledgments.
   16. Der Komminsk-sar Posted: September 11, 2009 at 04:14 PM (#3319699)
I haven't gotten to look at this in depth yet (and won't for a few days) but - just focusing on '09 for now - some of the low/lower minor league hitting translations are kinder than I'd expect (I mention this now having just seen Nick Derba's AA BA of .130 in Springfield translate up to .186 ... I'm guessing because of some BABIP regression?).
Would you (or have you - haven't gotten to peek at the PF file at all yet) consider supplying a league factors tab?
***
Separate fun: Roberto Petagine MLEs, through the '90s...
'91: .236/.305/.355 in 462 AB with Burlington (MWL) <note - there's an error here - they were an Astros affiliate>
'92: .247/.316/.407 in 396 AB with Osceola/Jackson
'93: .259/.355/.396 in 455 AB with Jackson
'94: .237/.315/.399 in 253 AB with Tucson
'95: .190/.319/.310 in 58 AB with Las Vegas (now in the Padres org)
'96: .255/.352/.413 in 322 AB with Norfolk (Mets AAA)
'97: .275/.370/.501 in 461 AB with Norfolk
'98: .274/.369/.476 in 376 AB with Indianapolis (Reds AAA)
   17. Dan Szymborski Posted: September 11, 2009 at 04:39 PM (#3319716)
BABIP simply cannot go lower past a certain point (the .210-.230 for pitchers is just about the baseline). I regress towards the mean less for BABIP than I would normally; in-season regression towards the mean is significantly less than season-to-season regression for the same original sample of PA.
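The kind of regression being described can be sketched as a weighted blend of the observed rate and a league mean, followed by a floor. The league mean, the `stabilizer` constant, and the use of .210 as a hard floor are illustrative assumptions, not values taken from the zMLE system.

```python
def regress_to_mean(observed, sample_size, league_mean, stabilizer):
    """Shrink an observed rate toward the league mean.

    stabilizer is the number of league-average trials blended in; it is a
    hypothetical tuning constant, not a value from the zMLE system.
    """
    return (observed * sample_size + league_mean * stabilizer) / (sample_size + stabilizer)

def regressed_babip(observed, balls_in_play, league_mean=0.300,
                    stabilizer=200, floor=0.210):
    """Regress BABIP, then apply a floor below which the estimate does not go."""
    blended = regress_to_mean(observed, balls_in_play, league_mean, stabilizer)
    return max(floor, blended)

# A hitter with a .150 BABIP on 100 balls in play (made-up numbers)
print(round(regressed_babip(0.150, 100), 3))
```

A smaller `stabilizer` corresponds to the lighter in-season regression mentioned above; a larger one behaves more like a season-to-season projection.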
   18. Der Komminsk-sar Posted: September 11, 2009 at 05:17 PM (#3319753)
Ok.

My inclination is that we ought not regress much (if any) for BABIP in the context of MLEs (as distinct from projections, where we should) ... but I can implement that when I present my set of MLEs, sometime around the year 2150.
   19. AROM Posted: September 11, 2009 at 07:04 PM (#3319883)
Dan,

I checked and don't have the 1990 SAL in my database. I do have it in the BA Almanac though. If I decide to do the data entry on that one, I'll shoot you a copy. No guarantee as to when, since I've got a THT Annual article to write and volunteered to do some batting logs for retrosheet which I need to get to work on.
   20. KJOK Posted: September 11, 2009 at 11:46 PM (#3320129)
The MLE file is great, although I have the same question as AROM as to whether these are all 'neutral' MLE's or MLE's into the parent team's park and league.

On the minor league park factor link, I just seem to get a bunch of html files?
   21. KJOK Posted: September 12, 2009 at 12:23 AM (#3320167)
and the next question is, how hard is this to do? If I gave you a dataset with a Japanese League year, or Korean League, or Cuban League, or Netherlands, is it easy to just plug the data in and have it spit out some results?
   22. Der-K and the statistical werewolves. Posted: September 12, 2009 at 12:54 AM (#3320192)
Not to speak for Dan but...
...the trick would be (apart from stuff specific to his current model, as I understand it) that those are relatively closed systems - not much movement b/w them and the NA. If you do have that data + an estimate/assumption as to how strong the league is (+ are willing to agree w/ certain implicit assumptions about how various skills translate from setting to setting) - then the model he put out on BBTF (where you enter the stats, team, and year) would be pretty easily customized to your liking. [For instance, I was using it at one point to validate my own Cuban League translations.]
   23. Der-K and the statistical werewolves. Posted: September 12, 2009 at 01:18 AM (#3320205)
sorry - incomplete thought ...
the OLD model he used to use that people could download maybe 2 or 3 yrs back could be customized easily. can't speak for the current one.
   24. KJOK Posted: September 12, 2009 at 02:58 AM (#3320258)
You already have Cuban League translations?! What years?
   25. Der-K and the statistical werewolves. Posted: September 12, 2009 at 03:16 AM (#3320266)
I've done them on an ad-hoc basis, never tried to do the whole league + I can't say I had a lot of confidence in them (I came up with 2 estimates for Alexei his rookie yr - one I posted here (google it and frown), the other (which was closer) I sat on). Think I posted one for Viciedo as well. I plan on doing one for Yasser Gomez in a month or two. [I don't have a formal Cuban model - it's more like a subjective sim score type thing.]
If you dig around the internet, you can find a spare one here or there from Clay D as well.
I *think* but don't know, that Sackmann messed around w/ Dutch data at one point - could easily be something I'm making up, though.

Again, your best bet might be to find Dan's old .xls model - tweak it (it's gonna need tweaks - for instance, standard K/W relationships won't apply), give it your inputs, then run from there. This assumes you have Cuban stats handy - if not, they're online (at least were - haven't looked in awhile) but the connection (Cuban servers) is terrible.

If nothing else, this is a good place for someone with a little time on their hands to become the web's "expert" w/o too much competition.
   26. KJOK Posted: September 12, 2009 at 03:22 AM (#3320271)
I grabbed the Cuban stats, as the Cuban website is down most of the time (all of the time now it appears). I'd like to do at least the WBC players, if not the whole league, whenever I can find some time.
   27. Der-K and the statistical werewolves. Posted: September 12, 2009 at 03:44 AM (#3320283)
Cool - how many years worth? Depending on where you went they've got fielding by position (by inning), platoon stats, etc...kinda nice to see.
I'd bet money Davenport took a crack at the WBC guys already, though I'm not a subscriber so I don't know for certain.
   28. KJOK Posted: September 12, 2009 at 04:08 AM (#3320292)
I managed to get 2005-06 season thru 2008-09 season. I'm not 100% sure what some categories headings are (they're in Spanish of course). Davenport did do something for Cuba with WBC2006 I think but never heard him doing WBC2009...
   29. Dan Szymborski Posted: September 12, 2009 at 04:42 AM (#3320310)
I had too much regression in the old model, so don't use that. I've moved a lot of that to the projections.

If anyone needs an actual zMLE spreadsheet for research use, drop me a line at dan@baseballprimer.com and I'll send you a copy.

The reason I use regressed $H (but only moderately) is that people are more aware of regression and do a lot of that with things like FIP for actual MLB lines, so a line with $H regression makes sense. After all, when you see a guy with a .380 $H translation because of no regression, most people who care about that sort of thing are going to mentally push that down anyway.

Blurring the line too much between translation and projection is why I've backed off on some of the regression in the last couple of years, so people who have my older MLEs will notice a difference.
   30. StillFlash Posted: September 12, 2009 at 04:02 PM (#3320408)
KJOK, would you be kind enough to send me a copy of the Cuban stats?
brian.cartwright2@verizon.net

Here is a link to an English/Spanish glossary
http://web.minorleaguebaseball.com/stats/page.jsp?ymd=20080121&content_id=340901&vkey=stats_l125&fext=.jsp&sid=l125

I've had it verified by a front office person that www.beisbolcubano.cu is blocked in the US.

beisbolcubano.cubasi.cu is available, and a few weeks ago had a page for each player, but now 'Estadisticas' links to Granma, which lists all the players, but not all the stats (like BB & SO for batters).
http://www.granma.cubaweb.cu/eventos/48serie/playoff/datos/epo-estad.html

Only the 8 playoff teams (out of 16 total)
http://www.granma.cubaweb.cu/eventos/48serie/playoff/datos/epo-clasificados.html
   31. Der-K and the statistical werewolves. Posted: September 13, 2009 at 02:11 AM (#3320618)
For Serie 48, you may like: http://www.scribd.com/doc/12278257/EstadIsticas-48-Snb
   32. puck Posted: September 13, 2009 at 03:59 AM (#3320654)
Are the MLE's to a neutral MLB environment, or to the specific MLB team? In other words, if I were to compare a minor leaguer in Colorado's system to one in San Diego's, do I need to make further adjustments?


Has Dan answered this question yet?
   33. Dan Szymborski Posted: September 13, 2009 at 04:25 AM (#3320658)
Oh, sorry, it's for the MLB environment of the parent club (team/park).
   34. bob gee Posted: September 15, 2009 at 04:23 PM (#3322266)
gotta look at this stuff later today, especially doug frobel and brad komminsk...

(obv, thanks!)
   35. bucbeatle Posted: September 15, 2009 at 09:01 PM (#3322685)
I seem to have incomplete data. I don't have excel, so I needed to open with Open Office. I have over 65,000 lines of data in each sheet (hitters and pitchers), but the hitters only goes to Patrick Grady and the pitchers to Steve Parris. Any ideas?
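The truncation described here is what the Excel 97-2003 limit of 65,536 rows per sheet looks like in practice. One workaround for the CSV files is to split them into pieces that each fit on one sheet; the sketch below assumes a file with a single header row, and the function and file names are hypothetical.

```python
import csv

XLS_MAX_ROWS = 65536  # row limit per sheet in the Excel 97-2003 format

def split_rows(rows, header, max_rows=XLS_MAX_ROWS):
    """Yield lists of rows, each starting with the header and fitting one sheet."""
    per_sheet = max_rows - 1  # reserve one row for the header
    for start in range(0, len(rows), per_sheet):
        yield [header] + rows[start:start + per_sheet]

def split_csv(path, prefix):
    """Write prefix_1.csv, prefix_2.csv, ... from one large CSV file.

    Assumes the file has at least a header row.
    """
    with open(path, newline="") as f:
        header, *rows = list(csv.reader(f))
    for i, chunk in enumerate(split_rows(rows, header), start=1):
        with open(f"{prefix}_{i}.csv", "w", newline="") as out:
            csv.writer(out).writerows(chunk)
```

For example, `split_csv("hitters.csv", "hitters_part")` would write `hitters_part_1.csv`, `hitters_part_2.csv`, and so on, each small enough to open without losing rows.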
   36. bob gee Posted: September 15, 2009 at 09:11 PM (#3322712)
can you post in office 00?
   37. Der Komminsk-sar Posted: September 15, 2009 at 09:20 PM (#3322726)
bucbeatle - the '07 version won't work with openoffice, but the '03 one should.
   38. bucbeatle Posted: September 15, 2009 at 09:27 PM (#3322738)
Der Komminsk-sar, thanks for the info. I found out I do have limited excel and was able to open it using microsoft works. Since I can't sort with the excel, I may try the '03 one.
   39. Dan Szymborski Posted: September 15, 2009 at 09:28 PM (#3322742)
Bob, I'll make some csvs.
   40. bucbeatle Posted: September 15, 2009 at 09:32 PM (#3322747)
Thanks for the help. The '03 version worked.
   41. bob gee Posted: September 15, 2009 at 10:18 PM (#3322813)
thanks dan!
   42. puck Posted: September 15, 2009 at 10:51 PM (#3322860)
In the 2003 version, is anyone else missing pitching for say, '88 Omaha, '89 Omaha, and '90 Indianapolis?
   43. Kyle S at work Posted: September 16, 2009 at 02:33 AM (#3323098)
Hi Dan -
This looks really neat. I would have spent days and/or weeks with it back when I had time to do my own research. Sigh.

A question: what is the source of the minor league data? Is it publicly available somewhere? Lots of folks seem to have it but there is no equivalent (that I've ever found) of the Lahman database for minor league stats.
   44. Dan Szymborski Posted: September 16, 2009 at 04:15 AM (#3323172)
Puck, there's some missing pitcher data that I'll be adding, mostly American Association pitcher data, so those 8 teams will have holes for a few more weeks.
   45. puck Posted: September 16, 2009 at 05:32 AM (#3323189)
Thanks, Dan. I just happened to be looking up Steve Fireovid who was in the AA quite a bit in the late 80's. Finally got a copy of 26th Man and read it.
   46. Der Komminsk-sar Posted: September 16, 2009 at 02:25 PM (#3323353)
How is that book, btw?
   47. Jeff Sackmann Posted: September 16, 2009 at 08:15 PM (#3323835)
I *think* but don't know, that Sackmann messed around w/ Dutch data at one point - could easily be something I'm making up, though.


Dutch girls, yes. Dutch data, no.
   48. Mike Emeigh Posted: September 16, 2009 at 08:22 PM (#3323850)
Lots of folks seem to have it but there is no equivalent (that I've ever found) of the Lahman database for minor league stats.


It's a work in progress, started by SABR's Minor League Committee and now available through Baseball Reference.

-- MWE
   49. Dan Szymborski Posted: September 16, 2009 at 08:24 PM (#3323853)
Update: American Association pitching and Oklahoma 1997-1999 pitching are up.

Please continue to let me know when things are missing (that I haven't already said I know about).
   50. Der Komminsk-sar Posted: September 16, 2009 at 09:23 PM (#3323941)
Jeff: well, you win, we don't. :)
   51. bob gee Posted: September 18, 2009 at 12:04 AM (#3325400)
thanks dan!
   52. shoewizard Posted: October 09, 2009 at 06:25 PM (#3346535)
In looking at the 2008 Park Factors Spreadsheet, I don't see Vero Beach listed for that year. That was the Tampa Bay affiliate in the FSL in 2008. Am I missing something ?

Thanks Dan.
   53. sliver7 Posted: March 03, 2010 at 05:00 AM (#3471450)
Also in relation to the (Devil) Rays: TB fielded minor league teams in both 1996 and 1997. I'm not seeing any entries for anything earlier than 1998 for any of TB's system.

1996:
GCL Devil Rays (R/Gulf Coast League)
Butte Copper Kings (R+/Pioneer League)

1997:
GCL Devil Rays (R/Gulf Coast League)
Princeton Devil Rays (R+/Appalachian League)
Hudson Valley Renegades (SS A/New York-Penn League; I think it was co-op with Texas that year)
Charleston RiverDogs (A/South Atlantic League)
St. Petersburg Devil Rays (A+/Florida State League)
   54. Der Komminsk-sar Posted: September 27, 2010 at 04:20 PM (#3649512)
Just want to remind people that Dan doing this was/is awesome.
   55. flournoy Posted: September 27, 2010 at 04:35 PM (#3649522)
I guess this means we should invest in this Chip Cannon fellow.


I played against him in men's league ball last year after he retired from professional ball. If I remember correctly, we got him out on a long fly ball to deep center, and then he hit two homers that totaled about 1000 feet. Nice guy, though.
   56. Der Komminsk-sar Posted: September 27, 2010 at 05:33 PM (#3649587)
Odd question, but how well did he run - relative to your league? Issue with a club foot hurt him in the draft/field.
   57. flournoy Posted: September 27, 2010 at 06:03 PM (#3649621)
He certainly moves around fine, though he's not fast. Of course I only got to see him trot. Can't give you much more than that; I don't know anything about his club foot.
