[ Webmaster's Note: The following article appears in The 1999 Big Bad Baseball Annual. ]
You attended the gala celebration at its release. You watched people fight over it on Jerry Springer. You watched the two-hour Dateline special discussing its impact. You participated in focus groups that discussed its construct. But you're still thinking, "What does Bill James' new Runs Created (RC) formula actually do?"
OK, maybe none of this is true. Maybe you didn't fork out the $79.95 to read a little about it in either of the new STATS, Inc. books The All-Time Baseball Sourcebook or The All-Time Baseball Handbook. Maybe, just maybe, you haven't seen anything at all about the new RC. In any case, you still may be wondering what changes Bill James made to his creation and how it works. If so, read on.
The new RC: a big improvement?
While the RC method is one of the seminal developments in sabermetric history, it soon became apparent that there was a problem with the original RC construct. Bill James himself commented on its deficiencies in his 1985 Baseball Abstract: "I've known for a little over a year that the runs created formula had a problem with players who combined high on-base percentages and high slugging percentages... The reasons that this happens is that the player's individual totals do not occur in an individual context." He went on to say, "I'll make some adjustments to the runs created formula within the next year or so. Right now, I don't know what they will be." That was fourteen years ago. If Bill intended to raise the level of anticipation for the changes, he certainly gave himself plenty of time.
Beyond the question of how much or little the new RC formula is an improvement over the old one, the work would have to be labeled a disappointment to many readers for the simple reason that Bill did not much explain his thought process behind the modifications, or for that matter even supply enough details so that others could easily replicate his work. He barely provided any justification for his changes themselves. Coming from the man who almost single-handedly broke the Elias monopoly on baseball information, it was disturbing to see him pitch his tent awfully close to those in the sabermetric community who fail to divulge the details of their methods or the thinking behind them in the name of propriety.
There is a rumor that Bill James will be releasing an update to his Historical Abstract sometime in 1999/2000. Maybe he figures he can explain the changes in more detail in the updated book. Without more information from Bill himself, we're left to speculate. Speculate about how he came up with the changed formula. Speculate why he incorporated situational statistics. Speculate why, after criticizing Pete Palmer for using actual runs in Linear Weights, he now factors actual runs into the revised formula.
Rather than just make wild guesses, however, we undertook an involved examination of the new RC methodology, and then spent a lot of time discussing among ourselves the possible reasons behind the changes. Of course, we didn't always agree. That happens when people are put in the position to guess what someone else is thinking.
How is the new RC calculated?
We'll run through the new RC calculations using Ken Griffey's 1998 statistics as an example. This is appropriate because Bill James used Ken Griffey's 1997 stats for the descriptions in both of the new STATS books. The big difference between James' description and this one is that we'll fully explain the parts he glossed over. We'll also comment on the changes.
Step 1 - Calculate the A, B and C Factors
These factors are figured just about the same way as in James' classic "tech" version of the formula. The only difference is that James changes some of the terms depending on the availability of the data and the time period of the season in question. For the current time period (1988 to present), the Historical Data Group (HDG) 24 formula is used.
Calculate the A, B, and C terms as follows:
A = H + BB + HB - GIDP - CS
B = TB + ((BB + HB - IBB) * .24) + (SB * .62) + ((SH + SF) * .5) - (SO * .03)
C = AB + BB + HB + SH + SF
For Ken Griffey Jr.:
A = 180 + 76 + 7 - 14 - 5 = 244
B = 387 + ((76 + 7 - 11) * .24) + (20 * .62) + ((0 + 4) * .5) - (121 * .03) = 415.05
C = 633 + 76 + 7 + 0 + 4 = 720
If things were still calculated the classic way, we'd simply calculate Griffey's RC by A*B / C. This would give us what I call Griffey's "base H24". Putting the numbers together would give us 140.66 RC.
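For readers who want to check Step 1 themselves, here is a minimal Python sketch of the A, B, and C factors and the classic A*B/C result. The function and variable names are our own inventions for illustration; only the formula terms and Griffey's numbers come from the text above.

```python
def abc_factors(h, bb, hb, gidp, cs, tb, ibb, sb, sh, sf, so, ab):
    """Return the A, B, and C factors of the HDG-24 formula as described above."""
    a = h + bb + hb - gidp - cs
    b = tb + (bb + hb - ibb) * 0.24 + sb * 0.62 + (sh + sf) * 0.5 - so * 0.03
    c = ab + bb + hb + sh + sf
    return a, b, c


def classic_rc(a, b, c):
    """Classic runs created: A times B, divided by C."""
    return a * b / c


# Ken Griffey Jr.'s 1998 line, as quoted in the article.
griffey_1998 = dict(h=180, bb=76, hb=7, gidp=14, cs=5, tb=387,
                    ibb=11, sb=20, sh=0, sf=4, so=121, ab=633)

a, b, c = abc_factors(**griffey_1998)
print(a, round(b, 2), c)               # 244 415.05 720
print(round(classic_rc(a, b, c), 2))   # 140.66
```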
Step 2 - Calculate initial Runs Created (iRC) by inserting A, B, and C factors into theoretical team context and round off the result:
iRC = [ ((A + (2.4*C)) * (B + (3*C)))/ (9*C) ] - .9 * C
Griffey iRC = [ ((244+ (2.4*720)) * (415.05 + (3*720)))/ (9*720) ] - .9 * 720=135.64 or 136 iRC
Two additional things happen here: 1) iRC gets rounded to a whole number, and 2) any negative results for individual players get changed to 0.
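Step 2 can be sketched in a few lines of Python. The function name is ours, and we assume ordinary rounding to the nearest whole number, since the books don't say how an exact half should be handled (Python's built-in `round` sends halves to the nearest even number).

```python
def initial_rc(a, b, c):
    """Place a player's A, B, C factors in the theoretical team context.

    The result is rounded to a whole number and negative values become 0,
    as described in the text. Rounding of exact halves is an assumption.
    """
    raw = ((a + 2.4 * c) * (b + 3 * c)) / (9 * c) - 0.9 * c
    return max(0, round(raw))


print(initial_rc(244, 415.05, 720))  # 136 (Griffey, 1998)
```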
Step 3 - Calculate the adjustment for home runs with runners on base
The first situational adjustment in the new RC method is made for how many home runs Griffey Jr. hits with runners on base. This is another adjustment that Bill James didn't fully explain. We need four bits of information for Griffey to do this adjustment: his at bats (633), his home runs (56), his at bats with runners on base (303), and his home runs with runners on base (26).
Then calculate how many HRs Griffey would be expected to hit with runners on base, proportional to his AB.
Expected HRs = (303/633*56) = 26.81
Subtract Expected HR from actual HR with runners on base and round the result
26 - 26.81 = -.81 or -1
This leaves Griffey with an adjustment for home runs with runners on base (HR-ROB) of -1.
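Here is a small Python sketch of the Step 3 arithmetic as we understand it. The function and parameter names are our own, and the rounding of exact halves is again an assumption, since the books don't spell it out.

```python
def hr_rob_adjustment(ab, hr, ab_rob, hr_rob):
    """Home runs with runners on base, minus the number expected
    if home runs were spread in proportion to at bats."""
    expected = ab_rob / ab * hr
    return round(hr_rob - expected)


# Griffey 1998: 633 AB, 56 HR, 303 AB and 26 HR with runners on base.
print(hr_rob_adjustment(ab=633, hr=56, ab_rob=303, hr_rob=26))  # -1
```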
Step 4 - Calculate the adjustment for batting average with runners in scoring position
The information we need here is Griffey's regular batting average and his batting average with runners in scoring position. It's suggested you carry out this calculation to an extra decimal place.
Regular Batting = 180 hits / 633 at bats or .2844
Batting with runners in scoring position = 57 hits / 184 at bats or .3098
Subtract regular batting from batting with runners in scoring position, multiply the result times at bats with runners in scoring position, and round the result.
(.3098-.2844)*184 = 4.674 or 5
This gives Griffey an adjustment for batting with runners in scoring position (AvgSP) of +5.
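The Step 4 calculation can be sketched the same way. The names are ours; carrying the averages to four decimal places follows the suggestion above, and the final rounding rule for exact halves is an assumption.

```python
def avg_sp_adjustment(h, ab, h_sp, ab_sp):
    """Extra (or missing) hits with runners in scoring position,
    relative to the player's overall batting average."""
    avg = round(h / ab, 4)           # regular batting average, four places
    avg_sp = round(h_sp / ab_sp, 4)  # average with runners in scoring position
    return round((avg_sp - avg) * ab_sp)


# Griffey 1998: 180 for 633 overall, 57 for 184 with runners in scoring position.
print(avg_sp_adjustment(h=180, ab=633, h_sp=57, ab_sp=184))  # 5
```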
Step 5 - Calculate the Preliminary RC (PrelimRC)
Add together Griffey's initial Runs Created total with the situational adjustments to get his preliminary RC:
PrelimRC = iRC + HR-ROB + AvgSP
PrelimRC = 136 - 1 + 5 = 140
Step 6 - Calculate the team reconciliation factor
After calculating PrelimRC for all players, sum all the rounded individual players' PrelimRC.
The 1998 Mariners PrelimRCs add up to 892 PrelimRC.
Divide the actual team runs (859) by the team PrelimRC (892) to calculate the reconciliation factor (RF):
1998 Mariners RF = 859 / 892 = .963
Step 7 - Multiply team reconciliation factor times individual player's PrelimRC and round off to get the final RC result:
For Griffey, 140 PrelimRC * .963 = 134.82 or 135 Runs Created for 1998.
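Steps 5 through 7 can be pulled together in a short Python sketch. The function names are ours, and we use the unrounded reconciliation factor rather than .963; the books don't say whether the factor itself is rounded, and for Griffey both choices give the same answer.

```python
def preliminary_rc(irc, hr_rob, avg_sp):
    """Initial runs created plus the two situational adjustments."""
    return irc + hr_rob + avg_sp


def reconcile(prelim_rc, team_runs, team_prelim_rc):
    """Scale a player's PrelimRC by actual team runs over summed team PrelimRC."""
    factor = team_runs / team_prelim_rc  # assumption: factor is left unrounded
    return round(prelim_rc * factor)


griffey_prelim = preliminary_rc(irc=136, hr_rob=-1, avg_sp=5)
print(griffey_prelim)                       # 140
print(round(859 / 892, 3))                  # 0.963
print(reconcile(griffey_prelim, 859, 892))  # 135
```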
Reconciliation or incorporating the error for teams into player values?
Does new theoretical team context do what it's supposed to?
The team reconciliation process done in Steps 6 and 7 is the part of James' changes that we question the most. Here's what Bill James has to say about this part of the process:
"Finally, we reconciled runs created, after the fact, with the runs actually scored by the team. Suppose, for example, that the individual runs created estimates for the members of a team were to total up to 500, but the team actually scored 700 runs. This would be an extraordinary thing, and I don't think such a discrepancy has ever actually happened. We're ordinarily talking about an adjustment in the range of 20 runs per team, or two runs per player. But if it did happen, we would then increase the runs created for each individual on the team by 40%, since 700 divided by 500 is 1.4. We don't know who created these extra runs, but somebody on the team certainly did. The best we can do is distribute them among the hitters proportional to their accomplishments."
Is Bill correct about the error rate of the formula? Well, the average absolute error for all teams from 1984-1987 is about 20 runs. The range of errors is much broader, however.
The biggest error for the formula during that time period is the +74 error for the 1987 Cubs. Since the Cubs scored 720 runs that season and the formula projects 794 runs, the formula overestimates by 10.3%. Who does the team reconciliation affect the most? Andre Dawson. Dawson loses 10 runs to "reconciliation". Instead of being credited with 110 runs, "The Hawk" gets credit for only 100.
Is this really fair? Did Andre's play generate less than the 110 runs the formula estimates? We don't know for sure, so it's not fair to penalize him. What's the point? Does subtracting the ten runs tell us something about Andre Dawson? No. The fact that the formula overestimates tells us that his team was less efficient than an average team, not that Andre was less efficient.
To further illustrate, let's look at the other extreme from the same season. The largest negative error for the 1987 National League belonged to the Cardinals. The Cardinals scored 798 runs en route to the 1987 World Series, while the Cardinal players are estimated to have 755 RC, an error of -43 runs. To account for this discrepancy, 43 runs must be added to the player totals.
Which player benefits the most from this largesse? Jack Clark. Clark gets credited with 124 runs instead of 117 runs. Again, is this fair? No. The fact that the formula underestimates tells us that Clark's team was more efficient than the average team, not that Jack was more efficient.
The end result of this "reconciliation" is that Jack Clark ends up with 24 more runs created than Andre Dawson. Compare this to the original difference of 7 runs (Dawson 110, Clark 117). The spread between the players is widened. Unfortunately this is due to the inaccuracy of the formula, not due to any quantifiable difference between them.
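The widening of the gap is easy to reproduce. The sketch below applies a proportional reconciliation to the two players using the team totals quoted above; the function name is ours, and we assume the factor is applied unrounded.

```python
def reconcile(player_rc, team_runs, team_rc_estimate):
    """Scale a player's estimate by actual team runs over estimated team runs."""
    return round(player_rc * team_runs / team_rc_estimate)


# 1987 Cubs: 720 actual runs against a 794-run estimate.
dawson = reconcile(110, 720, 794)
# 1987 Cardinals: 798 actual runs against a 755-run estimate.
clark = reconcile(117, 798, 755)

print(dawson, clark)   # 100 124
print(117 - 110)       # 7, the gap before reconciliation
print(clark - dawson)  # 24, the gap after reconciliation
```

Nothing about either player's batting line changes between the two gaps; only the team-level error does.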
Remember, we're looking for objective evidence. Is this so much different from looking at the numbers and fiddling with them? Another subjective method to divvy up the extra runs could be to guess who generated the runs. I could say, "Jack Clark is so awesome with all those slap-hitting single hitters around him, he must have created more runs than 117. Give him the extra 23 runs. He's the best!" Or I could base my reconciliation on the number of plate appearances. Is James' reconciliation really much different? Essentially what he says is that if a player accounts for 10% of the RC projection, then the player creates 10% of the run overestimate or shortfall. It's just one of a number of subjective criteria that could be applied.
This problem is compounded when the situational adjustments are factored in. To illustrate, I present the data from the 1998 Seattle Mariners and the 1998 Oakland Athletics.
1998 Seattle Mariners
1998 Oakland Athletics
The reason I chose these two teams is simple: they are the teams that had the "actual run hammer" applied hardest. Seattle scored 859 runs in 1998, while team RC-H24 predicts 906 runs; Oakland scored 804 runs in 1998, while team RC-H24 predicts 752 runs.
Although the team RC is only slightly different from the summed individual player totals (Sea 906/905 and Oak 752/754), the difference between the PrelimRC totals and actual runs is still substantial. Seattle's team estimate over-projects by 33 runs (892 PrelimRC against 859 actual runs), while Oakland's team estimate under-projects by 37 runs. The end result of the hammer application is that Ken Griffey loses 5 runs and Ben Grieve gains 5 runs due to nothing other than the inadequacies of the formula.
Moving on to the situational adjustments, notice that these adjustments can have a considerable effect on player totals. Rickey Henderson gains 6 runs due to the adjustments, while Joey Cora loses 11 runs. Is Rickey really a great clutch hitter? Is Joey Cora a bad one? How about Alex Rodriguez, what conclusion can we draw from his adjustments? Even though Alex's minus 4 HR adjustment would lead us to believe he doesn't hit so well in the clutch, his plus 5 Batting Average adjustment would lead us to believe he does. Without looking into the reasons a team's estimate over- or under-projects, it just becomes a guessing game as to who is responsible for an offense being more or less efficient. As Jay once commented to me: "When all is said and done, the situational adjustment and team reconciliation steps obfuscate more than clarify player analysis."
Just as Henry Chadwick was the premier baseball analyst of the 19th century, Bill James can probably lay as much claim as anyone to being the leading analyst of the 20th century. The BBBA staff and I are admirers of a great deal of his work. Unfortunately, his New RC reminds us a lot of the New Coke. Although the New RC was supposed to be an improvement, like the New Coke, it's really just different. And like the New Coke, we don't like the way it tastes.