How to Calculate MLEs

by Dan Szymborski

Since the subject of MLEs has come up quite a bit [in the baseball newsgroups] in the last week, I've decided to post the basic concepts of how to come up with MLEs. There are a few variations of the MLEs, depending on whether you use 1- or 3-year park effects, or whether you use actual minor league park effects rather than estimated ones (when Bill James proposed this system in the 1985 Baseball Abstract, minor league park effects were difficult to come by because home/road data for minor leaguers were unavailable). However, the differences are small, which is OK: it's what the numbers mean that's important, not exactly what they say. If one person finds the MLE of Joe Schmo to be 253/330/418 and another finds it to be 258/327/425, they're still saying the same thing.

One thing to remember is that MLEs are not a prediction of what the player will do, just a translation of what the player actually did into its major league equivalent. They are useful for predictions, however, because, like major league statistics, MLEs have strong predictive value: as strong as major league statistics themselves, which was the goal. Bill James stated that MLEs were the most important concept he had ever come up with.

The normal season-to-season fluctuation in batting average at the major league level is 25 points. I figured the season-to-season changes for every major league player who has had five years or more of 300 at bats, and the average annual change in batting average was between .024 and .025 [...]

[...] If the minor league equivalencies (mostly of 1983 seasons) that I printed last spring were exactly as accurate as an indicator of future hitting, the average differences between those projections and what the players actually hit would be exactly the same: .025.

This is a victory statement. Thirty of those players batted 250 times. It is my pleasure to announce at this time that 29 of the players produced 1984 seasons which were substantially consistent with the major league translation of their minor league data, which were published last spring. The one player who got completely out of the range of expectations which should have been generated by his minor league batting statistics was Doug Frobel of Pittsburgh, who batted 276 times and missed matching the translation of his minor league statistics by 20 hits (79 points).

The average difference between the translation of the minor league data and the actual major league performance was 25 points -- exactly the same as the normal season-to-season fluctuation in a player's batting average between two major league seasons. I think that as you compare the seasons below [clipped, but available on request-DS], it will be very obvious to you that minor league equivalencies and the major league records do, in fact, match up to exactly the same extent that two major league records could be expected to do so.
[Bill James 1985 Abstract p.10]

Note that this has been tested for batting average, slugging percentage, and on-base percentage over the last few years and the methods still work as well as they did in the early 1980s.

As I go through the simpler version of the process that I use (which usually comes out very close to what STATS gets; they don't say exactly which M factors they use, so it's hard to reproduce their numbers exactly), I will use two players from different parks in two different leagues: Danny Clyburn and Paul Konerko. First, let's look at their raw minor league statistics for 1997. Normally, I also park-adjust 2B, 3B, and HR with individual factors rather than one single factor, but in most instances that doesn't improve the result much: it affects the quantitative results, not the qualitative ones, and I'm more interested in the qualitative results.

Here are the raw statistics for the two players I'm using to demonstrate MLEs, Danny Clyburn and Paul Konerko:

```
Player     AB    R    H   2B   3B   HR   RBI   BB    SO
Clyburn   520   91  156   33    5   20    76   53   107
Konerko   483   97  156   31    1   37   127   64    61

Player      BA    OBP    SLG
Clyburn   .300   .372   .498
Konerko   .323   .407   .621
```

The first thing we need to do is adjust for the level of league and park. Some players play in hitters' parks/leagues, some play in pitchers' parks/leagues.

First, Clyburn.

Clyburn played in the International League last year, in which 9.597 runs per game were scored. His home park was Rochester, where run-scoring was deflated by approximately 4% over the last 3 years (unfortunately, I don't have 1-year stats available). So we'd expect a game between two league-average teams at Rochester to produce (1.02 + [.10 * .02]) * 9.597 runs. When we calculate it, we find that Clyburn's production came in a 9.81 runs-per-game context. In the American League last year, 9.862 runs per game were scored. The (.10 * .02) term is there because you can't just use road totals: a league-average player would still get to play some games in Rochester. It's not crucial, and leaving it out won't change things much, so if you prefer, you can simply use 1.02 * 9.597.

Clyburn: 9.808/9.862 = 1/1.006

We'll call this number PL, for park/league adjustment. Clyburn has a PL ratio of 1.006.

This indicates that Clyburn's raw statistics won't take a nose dive due to his home park or his league.

Now we do the same for Konerko. The PCL scored 11.532 runs per game last year, and Albuquerque is a pretty darn good hitters' park that increased run scoring by 17%. Doing what we did for Clyburn, we find that Konerko's stats were produced in a 12.616 runs-per-game context, so his ratio is 12.616/9.862 = 1/.782. Now let's put Clyburn's and Konerko's PL ratios side by side.

PL ratios: Clyburn 1/1.006, Konerko 1/.782

Since Konerko played in a park (and a league) where runs were easy to come by, his raw stats will clearly suffer much more from the park/league adjustment.
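Here's a minimal Python sketch of the park/league step, assuming it works the same way for both players: the park effect is halved (a player plays only about half his games at home), the extra .10 term is added on top, and the result is divided into the 9.862 major league figure the text uses for both players. The function names are just illustrative. For Clyburn it plugs in +0.04, because that is what the worked arithmetic above uses (1.02 + .10 * .02); for Konerko, +0.17.

```python
# Park/league ("PL") adjustment sketch:
#   context R/G = league R/G * (1 + h + 0.10 * h), with h = park effect / 2
#   PL          = major league R/G / context R/G
MLB_RPG = 9.862  # the major league runs-per-game figure used in the text


def context_rpg(league_rpg, park_effect):
    """park_effect is signed: +0.17 means the park raised scoring 17%."""
    h = park_effect / 2  # home games are roughly half the schedule
    return league_rpg * (1 + h + 0.10 * h)


def pl_ratio(league_rpg, park_effect, mlb_rpg=MLB_RPG):
    return mlb_rpg / context_rpg(league_rpg, park_effect)


# Clyburn: International League, 9.597 R/G, multiplier 1.02 + .10 * .02
print(round(context_rpg(9.597, 0.04), 3), round(pl_ratio(9.597, 0.04), 3))
# Konerko: PCL, 11.532 R/G, Albuquerque +17%
print(round(context_rpg(11.532, 0.17), 3), round(pl_ratio(11.532, 0.17), 3))
```

Konerko's context comes out about 12.61 rather than 12.616, presumably because Albuquerque's effect wasn't exactly 17%, but his PL still rounds to .782. Clyburn's unrounded PL is about 1.0055; the 1.006 above comes from dividing by the rounded 9.808.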

Next, we have to adjust for the calibre of competition.

A player ordinarily loses about 18% of his offensive ability relative to the league in moving from AAA to the majors.

When we adjust for this, we get "m":

```
          Clyburn   Konerko
PL          1.006     0.782
            *0.82     *0.82
          -------   -------
m           0.825     0.641
```

This tells us that Clyburn, upon moving to the major leagues, will probably retain about 83% of his offensive punch while Konerko will retain about 64%.
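In code, the level adjustment is a single multiplication. A minimal Python sketch, treating the 0.82 AAA-to-majors figure above as a fixed constant (the names `AAA_TO_MLB` and `little_m` are illustrative):

```python
# "m" = PL ratio * 0.82, where 0.82 is the rule of thumb above: a hitter
# keeps roughly 82% of his offense relative to the league going AAA -> MLB.
AAA_TO_MLB = 0.82


def little_m(pl, level_factor=AAA_TO_MLB):
    return pl * level_factor


for name, pl in (("Clyburn", 1.006), ("Konerko", 0.782)):
    print(f"{name}: m = {little_m(pl):.3f}")
```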

The other thing we need to find is "M". It's merely the square root of "m".

```
              m       M
Clyburn   0.825   0.908
Konerko   0.641   0.801
```
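Taking the square root is just as simple. A quick Python sketch (`big_m` is an illustrative name); for any m between 0 and 1 the square root is larger than m itself, so stats scaled by M (hits and doubles, in the steps below) shrink less than stats scaled by m:

```python
import math


def big_m(m):
    # M is the square root of m; for 0 < m < 1, sqrt(m) > m.
    return math.sqrt(m)


for name, m in (("Clyburn", 0.825), ("Konerko", 0.641)):
    print(f"{name}: m = {m:.3f}, M = {big_m(m):.3f}")
```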

Now, we can start to adjust.

```
RAW
Player     AB    R    H   2B   3B   HR   RBI   BB    SO
Clyburn   520   91  156   33    5   20    76   53   107
Konerko   483   97  156   31    1   37   127   64    61

MLE
Player     AB    R    H   2B   3B   HR   RBI   BB    SO
Clyburn
Konerko
```

Now all we need is the park factors for the major league stadiums. To avoid getting too technical: if a stadium has a park factor of 104 for something, use the multiplier 1.02 rather than 1.04 (not particularly accurate, but close enough).
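The halving rule of thumb can be written as a one-line conversion. This is a sketch of that rule only, assuming park factors on the usual 100 scale; halving presumably reflects that a player plays only about half his games at home (`park_multiplier` is an illustrative name):

```python
def park_multiplier(park_factor):
    # Halve the park factor's distance from 100: 104 -> 1.02, 96 -> 0.98.
    return 1 + (park_factor - 100) / 200


print(park_multiplier(104))
print(park_multiplier(96))
```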

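To be consistent, let's use three-year factors again. Here are the multipliers; I'm going to refer to park multipliers as PM. Which PM to use is pretty self-explanatory: Baltimore (Rochester's parent club) for Clyburn, and Los Angeles (Albuquerque's parent club) for Konerko.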

```
          R      H     2B     3B     HR     BB     SO
BAL   0.995  0.980  0.985  0.805  1.050  0.985  1.025
LA    0.895  0.880  0.865  0.745  0.890  0.970  0.995
```

First, we need to find the MLE hits. To get them, we multiply minor league hits * .98 * M * PM.

Then we need to find the MLE doubles. To get them, we multiply minor league doubles * M * PM.

Then the MLE triples. To get them, we multiply minor league triples * m * .85 * PM.

Then the MLE homers. To get them, we multiply minor league homers * m * PM.

For RBI and R, we multiply each by m and then by the PM (RBI use the R multiplier, since there is no separate RBI column).

For walks, we do minor league walks * m * PM.

For strikeouts, we simply do minor league strikeouts * 1.05 * PM.
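The multiplier rules above can be sketched in Python as follows. It's a minimal sketch, assuming RBI use the R park multiplier and plugging in the m and M values from the tables; `mle_counting` and the dict layout are hypothetical names for illustration.

```python
# MLE counting stats from the multiplier rules above.
PM = {
    "BAL": {"R": 0.995, "H": 0.98, "2B": 0.985, "3B": 0.805,
            "HR": 1.05, "BB": 0.985, "SO": 1.025},
    "LA":  {"R": 0.895, "H": 0.88, "2B": 0.865, "3B": 0.745,
            "HR": 0.89, "BB": 0.97, "SO": 0.995},
}


def mle_counting(raw, m, M, pm):
    return {
        "R":   raw["R"] * m * pm["R"],
        "H":   raw["H"] * 0.98 * M * pm["H"],
        "2B":  raw["2B"] * M * pm["2B"],
        "3B":  raw["3B"] * m * 0.85 * pm["3B"],
        "HR":  raw["HR"] * m * pm["HR"],
        "RBI": raw["RBI"] * m * pm["R"],  # assumption: RBI share the R multiplier
        "BB":  raw["BB"] * m * pm["BB"],
        "SO":  raw["SO"] * 1.05 * pm["SO"],  # strikeouts ignore m and M
    }


clyburn = {"R": 91, "H": 156, "2B": 33, "3B": 5, "HR": 20,
           "RBI": 76, "BB": 53, "SO": 107}
konerko = {"R": 97, "H": 156, "2B": 31, "3B": 1, "HR": 37,
           "RBI": 127, "BB": 64, "SO": 61}

for name, raw, m, M, park in (("Clyburn", clyburn, 0.825, 0.908, "BAL"),
                              ("Konerko", konerko, 0.641, 0.801, "LA")):
    mle = mle_counting(raw, m, M, PM[park])
    print(name, {stat: round(value) for stat, value in mle.items()})
```

Run as written, this reproduces the hits, doubles, triples, homers, and strikeouts shown for both players. Konerko's R and RBI come out one higher (about 55.6 and 72.9 round up), and the walk rule as stated gives Clyburn about 43 walks (53 * 0.825 * 0.985 ≈ 43.1); the 52 listed for him is what you get without the m factor (53 * 0.985 ≈ 52.2).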

After we do that, we get this:

```
MLE
Player     AB    R    H   2B   3B   HR   RBI   BB    SO
Clyburn         75  136   30    3   17    62   52   115
Konerko         55  108   21    0   21    72   40    64
```

Now for the at-bats, we have to do something a little different. First, we need to know how many outs each player made in the minors last year. For this, we just need AB - H for Clyburn and Konerko.

```
Clyburn:  520 - 156 = 364
Konerko:  483 - 156 = 327
```

Then, to get the MLE at-bats, we just add the number of outs each player actually made to his MLE hits.

```
Clyburn:  136 + 364 = 500
Konerko:  108 + 327 = 435
```
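The at-bat step as a small Python function (`mle_at_bats` is an illustrative name): the outs carry over from the minors unchanged, and only the hits are translated.

```python
def mle_at_bats(minor_ab, minor_h, mle_h):
    # Outs (AB - H) are kept as-is; the translated hits are added back.
    outs = minor_ab - minor_h
    return mle_h + outs


print(mle_at_bats(520, 156, 136))  # Clyburn -> 500
print(mle_at_bats(483, 156, 108))  # Konerko -> 435
```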

So finally, after using the complete stats to calculate batting average, on-base percentage, and slugging percentage, we end up with:

```
MLE
Player     AB    R    H   2B   3B   HR   RBI   BB    SO
Clyburn   500   75  136   30    3   17    62   52   115
Konerko   435   55  108   21    0   21    72   40    64

Player      BA    OBP    SLG
Clyburn   .272   .343   .446
Konerko   .248   .312   .441
```
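To double-check the slash lines, here's a small Python sketch that recomputes BA, OBP, and SLG for both the raw and MLE lines. It assumes OBP is simply (H + BB) / (AB + BB), since the tables don't list hit-by-pitches or sacrifice flies; `slash_line` is an illustrative name.

```python
def slash_line(ab, h, doubles, triples, hr, bb):
    # Simplified OBP: no HBP or SF columns are available.
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return h / ab, (h + bb) / (ab + bb), total_bases / ab


lines = {
    # name: (raw line, MLE line), each as (AB, H, 2B, 3B, HR, BB)
    "Clyburn": ((520, 156, 33, 5, 20, 53), (500, 136, 30, 3, 17, 52)),
    "Konerko": ((483, 156, 31, 1, 37, 64), (435, 108, 21, 0, 21, 40)),
}

for name, (raw, mle) in lines.items():
    for label, line in (("raw", raw), ("MLE", mle)):
        ba, obp, slg = slash_line(*line)
        print(f"{name:8s} {label:4s} {ba:.3f} {obp:.3f} {slg:.3f}")
```

BA and SLG match both tables, and the simplified OBP matches Konerko's MLE (.312). It comes out a little below the printed raw OBPs (.365 vs .372 for Clyburn, .402 vs .407 for Konerko) and Clyburn's MLE OBP (.341 vs .343), which suggests those figures count something the tables don't show, such as hit-by-pitches.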

Neither of these differs much from STATS' MLEs. Keep in mind that Konerko would still easily be the better long-term bet. After all, he put up that MLE at age 21 last year, while Clyburn was 23. A year or two makes a big difference in minor league development.