Baseball for the Thinking Fan

Primate Studies
— Where BTF's Members Investigate the Grand Old Game

Monday, June 11, 2001

The All-OBP Team Revisited

When Don gets curious about something, neuron activity increases in a lot of minds and computer processors begin to overheat. So what is Don curious about now? How OBP and SLG interact.

More data to fuel the fire ...

A couple of months back I lost what was left of my membership card in the "sabermetric cabal" by suggesting that a team whose OBP and SLG were each .425 wouldn’t score a gazillion runs.

The actual number, as some of you will remember, was 956. That projection was arrived at by the use of a basic version of Bill James’ contribution to our understanding of the mechanics of run scoring (Runs Created).

Several voices chimed in to correct this estimate, including BBBA co-authors Mike Emeigh and Sean Forman. While others simply trotted out their own pet formulae for projecting run scoring, Mike did some real work, using recent (1999-2000) actual game data to demonstrate that there is indeed more of a synergistic effect for "high OBP/low SLG" teams than vice-versa.

Mike is still tinkering with this, so I won’t try to formalize his results just yet, but what I can tell you is that teams with "high OBP/low SLG" tend to outperform their RC estimates. Teams with the opposite profile ("low OBP/high SLG") tend to underperform.

The variation isn’t quite as high as was suggested by those who proposed that a .425/.425 team would produce close to 1200 runs in a 162-game season, however. And even with Mike’s excellent work, I wasn’t satisfied to stop there.

So I called on the masterful data manipulator of latter-day sabermetrics, Tom Ruane. I asked Tom to go through his 22-year database of play-by-play data and create an actuarial chart of run scoring sorted by OBP and SLG range combinations. In addition to his exceptional talents, Tom is one of the truly good guys in the field, and so he graciously obliged me with yet another of my massively-scaled requests for help.

The OBP and SLG range combinations traced the run scoring in actual games from 1979-2000 (over 130,000 team game records). They were constructed as follows: after a category of .000-.199, SLG was broken out into 10-point increments (.200-.209, .210-.219) on up to .690-.699. Everything at .700 or over was lumped into one final category. For OBP, the initial category was .000-.099, followed by 10-point increments beginning with .100-.109 and continuing in that fashion up to .490-.499. All games where OBP was over .499 were lumped into one final category.

If I’ve managed to explain this with any level of clarity, you should see where we’re headed. The idea was to look at all of the OBP/SLG combinations, and total up the number of games and the number of runs scored in each. Dividing the latter (runs) by the former (games) would give us the average number of runs/game in each OBP/SLG combination.
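The bookkeeping just described can be sketched in a few lines of Python. This is only an illustration of the idea, not Tom’s actual procedure: the function names, the (obp, slg, runs) record layout, and the choice to round each rate to three decimals before binning are assumptions made for the sketch, and the sample games are invented.

```python
from collections import defaultdict

def _ten_point_bin(points, floor_top, ceiling_start, ceiling_label):
    """Map a rate expressed in whole points (e.g. 425) to a range label."""
    if points <= floor_top:
        return f"000-{floor_top:03d}"
    if points >= ceiling_start:
        return ceiling_label
    low = points // 10 * 10
    return f"{low:03d}-{low + 9:03d}"

def slg_bin(slg):
    # .000-.199 first, then 10-point ranges, then everything from .700 up
    return _ten_point_bin(int(round(slg * 1000)), 199, 700, "700+")

def obp_bin(obp):
    # .000-.099 first, then 10-point ranges, then everything over .499
    return _ten_point_bin(int(round(obp * 1000)), 99, 500, "500+")

def runs_per_game_by_bin(games):
    """games: iterable of (obp, slg, runs), one record per team game.
    Returns {(slg_range, obp_range): (games, runs, runs_per_game)}."""
    totals = defaultdict(lambda: [0, 0])  # bin -> [games, runs]
    for obp, slg, runs in games:
        cell = totals[(slg_bin(slg), obp_bin(obp))]
        cell[0] += 1
        cell[1] += runs
    return {key: (g, r, r / g) for key, (g, r) in totals.items()}

# Invented games, purely to exercise the functions (not from the database)
sample_games = [
    (0.425, 0.421, 7),
    (0.428, 0.429, 5),
    (0.185, 0.150, 0),
    (0.512, 0.733, 14),
]
chart = runs_per_game_by_bin(sample_games)
for (slg_range, obp_range), (g, r, avg) in sorted(chart.items()):
    print(f"{slg_range:8} {obp_range:8} {g:5d} {r:6d} {avg:6.2f}")
```

Working in whole points rather than raw decimals keeps the range boundaries from being blurred by floating-point noise.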

For the purposes of the discussion of an "all-OBP" team like the one proposed by BBBA reader Duane Thomas, which featured a series of .400+ OBP walkmen (and whose combined statistics in the single seasons chosen worked out to .290 BA/.425 OBP/.425 SLG), we are interested in only one area on Tom’s vast actuarial chart: the portion that links games where the OBP was .420-.429 and the SLG was also .420-.429.

It turns out that of the 130,000+ games in Tom’s database, fewer than 100 match the criteria described above. (This isn’t surprising, actually, because we’ve created very narrow ranges for both measures, and combining them creates hundreds of mini-categories with relatively small sample sizes.) Just to give you a taste of that chart, here is a list of all the OBP/SLG combinations with more than 500 games:

SLG       OBP          G   RUNS    AVG
000-199   180-189   1345    966   0.72
000-199   210-219   1092    902   0.83
000-199   200-209    964    948   0.98
000-199   160-169    798    338   0.42
000-199   250-259    797   1015   1.27
000-199   150-159    696    457   0.66
000-199   220-229    666    777   1.17
700+      500+       638   8851  13.87
000-199   230-239    608    667   1.10
000-199   130-139    558    197   0.35
000-199   120-129    548    289   0.53

As you can see, these games cluster at either the extreme low end (SLG below .200 and OBP between roughly .120 and .260) or at the extreme high end (SLG .700 or better, OBP .500 or better). As noted, the category breakdowns are extremely narrow.

At any rate, the 94 games in which a team produced an OBP and SLG that were both in the .420-.429 range resulted in a total of 626 runs scored. You fundamentalists out there may wish to take cover now, before I note that this works out to an average of 6.66 runs per game.

Over the course of a 162-game season, that projects to 1079 runs.

That’s right about midway between the estimate using OBP times SLG plus 3.5% and the estimates from the other theorists proclaiming that such a team would score close to 1200 runs.
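For anyone who wants to check the arithmetic, here is a small Python sketch. It assumes that the low estimate is the 956-run projection mentioned at the top of the article and takes 1200 as a round stand-in for the high-end claims; the variable names are just for illustration.

```python
GAMES_IN_SAMPLE = 94    # games with OBP and SLG both in the .420-.429 range
RUNS_IN_SAMPLE = 626    # runs scored in those games
SEASON_GAMES = 162

runs_per_game = RUNS_IN_SAMPLE / GAMES_IN_SAMPLE
season_runs = runs_per_game * SEASON_GAMES

# Assumed endpoints: the earlier 956-run projection and the "close to 1200" claims
low_estimate = 956
high_estimate = 1200
midpoint = (low_estimate + high_estimate) / 2

print(f"{runs_per_game:.2f} runs/game, {season_runs:.0f} runs per {SEASON_GAMES} games")
print(f"midpoint of the two earlier estimates: {midpoint:.0f}")
```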

As Sean Forman pointed out, the OBP times SLG plus 3.5% formula actually works pretty well if you remember to account for the extra plate appearances that accrue when a team walks at a pace that is 30-40% higher than the current MLB record for bases on balls in a season (835, by the 1949 Boston Red Sox).

However, a couple of other nuances remain. First is the fact that walks, despite sabermetricians’ occasionally obsessive love for them, simply do not have as much value in terms of run scoring as do hits. And we can demonstrate this by examining the games within Tom Ruane’s data set that contain .420-.429 OBP and .420-.429 SLG.

I also asked Tom to segregate the OBP/SLG pairs by a third modifier: batting average. How many runs per game did a team with less than a .310 BA score when the OBP/SLG values in a game were both in the .420-.429 range?

How many when the team BA was .310-.349? And how many when the team BA was .350 or higher?

While other analysts have noted that additional "secondary offensive" characteristics have a strong tendency to enhance run scoring, that may not be the case when dealing with more rarefied levels of OBP. While two teams with highly similar SLG but divergent batting averages quite often show a run scoring advantage in favor of the team with the lower BA, it appears that the opposite may be the case for OBP.

We can see this when we look at those 94 games in which teams produced an OBP/SLG combination of .425/.425. Fewer than one-fifth of these games (18, to be exact) featured teams with BAs lower than .310 (which would be in the range of the team that Duane Thomas had selected, with its .290 BA). Breaking the run scoring average for these games into groups based on BA, we see that the teams with the lower BA scored fewer runs than was the case in the overall sample:

SLG       OBP       BA          G     R    R/G
420-429   420-429   < 310      18   113   6.28
420-429   420-429   310-349    50   339   6.78
420-429   420-429   350+       26   174   6.69
420-429   420-429   TOT        94   626   6.66
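The table can be rebuilt from its game and run counts. The short Python sketch below does that and also stretches each group’s rate over 162 games; the dictionary layout and names are only an illustration, and the caveat about the small sample applies to every line it prints.

```python
SEASON_GAMES = 162

# (games, runs) for each BA group, copied from the table above
ba_groups = {
    "< .310": (18, 113),
    ".310-.349": (50, 339),
    ".350+": (26, 174),
}

total_games = sum(g for g, _ in ba_groups.values())
total_runs = sum(r for _, r in ba_groups.values())
overall_rpg = total_runs / total_games

rates = {label: runs / games for label, (games, runs) in ba_groups.items()}

for label, rpg in rates.items():
    diff = rpg - overall_rpg          # runs/game relative to all 94 games
    per_season = rpg * SEASON_GAMES   # the same rate over a full season
    print(f"{label:>9}  {rpg:.2f} R/G  {diff:+.2f} vs. all  {per_season:.0f} per 162")

print(f"{'all':>9}  {overall_rpg:.2f} R/G")
```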

In games where OBP and SLG were highly similar (both in the .400 to .430 range), the lower-BA groups averaged somewhat more than four-tenths of a run per game below the average for these OBP/SLG combinations as a whole.

There are two caveats here that need to be mentioned. First, the sample size is small. Second, a team’s OBP/SLG stats are an aggregate of many games’ worth of individual performances, and are not going to slavishly follow the results in games that conform exactly to their overall performance level. (Mike Emeigh’s research showed high OBP/low SLG teams exceeding their RC projection, but these "high OBP" teams came from "aggregated" games as opposed to those with an exact match in the .420-.429 OBP/SLG range. Mike’s high OBP/low SLG sample produced an average of 6.82 runs per game.)

That said, it’s still an interesting effect, and it may go some distance toward explaining why a walkman team with a .290/.425/.425 BA/OBP/SLG would tend to score fewer runs than, say, a team with .330/.425/.425.

In short, the theorists who claim that BA is a superfluous statistic once you have OBP and SLG would appear to have taken one too many liberties in their zeal to create a calculational shorthand for run scoring. While for the most part this shorthand works well, it appears to break down when we get to extremely high OBP coupled with relatively low SLG (i.e., SLG that is not as high relative to the league as the team’s OBP).

This appears to be confirmed, at least provisionally, in the results for the Tokyo Walkmen, a team based on Duane Thomas’ lineup selections that was entered into competition at the sports simulation site What If Sports. After 144 games, the Walkmen, who were constrained to play without a DH, have scored 814 runs, which puts them on pace to score "only" about 916 runs over 162 games.

Adding 10% to the Walkmen run total to adjust for the absence of the DH, we see that our walk-taking pests would come in at approximately 1007 runs. That’s about ten runs lower than what Tom Ruane’s empirical data suggests would be the case for a .425/.425 OBP/SLG team with a sub-.310 BA (1017 runs).
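The pace and the DH adjustment are simple enough to verify by hand, but a short Python sketch makes the comparison explicit. The 10% DH adjustment is the figure used in the text, not a derived one, and the variable names are just for illustration.

```python
SEASON_GAMES = 162

# Tokyo Walkmen simulation results so far
runs_so_far = 814
games_played = 144

pace = runs_so_far / games_played * SEASON_GAMES   # 915.75 runs over 162 games
dh_adjusted = pace * 1.10                          # the text's 10% bump for the missing DH

# Empirical comparison point: the 18 low-BA games (113 runs) from Tom Ruane's data
low_ba_projection = 113 / 18 * SEASON_GAMES

gap = low_ba_projection - dh_adjusted

print(f"pace without DH:    {pace:.0f}")
print(f"with 10% DH bump:   {dh_adjusted:.0f}")
print(f"low-BA empirical:   {low_ba_projection:.0f}")
print(f"difference:         {gap:.0f}")
```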

And finally, there’s the issue raised earlier about just how much dropoff in walk percentage (and, hence, in OBP) such a team would experience as a result of having an entire lineup of walkmen. After some consideration, I managed to come up with two benchmarks for this issue.

We’ll take the second one first, since we were talking about the What If Sports simulation. While this is not even remotely an empirical test, looking at what the simulation permitted the selected hitters to do is clearly a convenient jumping-off point.

Interestingly, the simulation produced close-to-season-average OBP totals for seven of the eight regulars who received significant playing time (Wes Westrum, catcher; Ferris Fain, first base; Eddie Stanky, second base; Eddie Joost, shortstop; Eddie Yost, third base; Elmer Valo, left field; Richie Ashburn, center field; Roy Cullenbine, right field).

One man, Elmer Valo, got hammered by the game. Valo’s .332 OBP (due in large part to a .221 BA) brought the regulars’ OBP/SLG average down to .411/.410 (it was .419/.420 without him, which shows that What If Sports has a pretty good simulator on its hands).

So the simulated team lost just under fifteen points of OBP. We can take a more empirical approach, however. (And that brings us to the second of the two nuances I referred to earlier.) Let’s look at one of the teams in baseball history with the highest aggregate OBP: the 1921 Detroit Tigers. This team had walkmen (Lu Blue, Donie Bush, Johnny Bassler) and high-average hitters who walked at around league-average rates (the great outfield of Harry Heilmann, Ty Cobb, and Bobby Veach).

The team OBP was .381; the eight regulars on the team (including second baseman Ralph Young, who chipped in with the best season of his career, and third baseman Bob Jones, this team’s "weakest link") did better than that, averaging an even .400.

What we want to know is: how much higher is the average of their peak OBP seasons? When we look at Duane Thomas’ team, we see a lineup that features players performing at their highest single-season level of OBP. "Mass career years" of this nature are exceedingly rare in baseball (or in other endeavors, for that matter).

It turns out that the best possible OBP for the eight regulars on the 1921 Tigers, using their best OBP seasons from their individual careers, works out to .422.

So our two data points (the simulation and the Tigers) indicate a loss of somewhere between 14 and 22 points of OBP to this "levelling off from career peak" effect. At the upper end, that’s about a five percent dropoff.

Thus, for a team to have a realistic chance at producing a .425 OBP, its players’ peak OBP performance would have to be that much higher: another five percent above .425, or close to .450 (.448 to be exact).
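Here is how the .448 figure can be reproduced in Python, assuming the scaling is taken straight from the Tigers’ peak-to-actual ratio (.422 against .400). That is one reading of "another five percent"; the variable names are only illustrative.

```python
# 1921 Tigers regulars: actual average OBP vs. average of their career-best OBP seasons
tigers_actual = 0.400
tigers_peak = 0.422

# Simulation benchmark: the Walkmen regulars' target OBP vs. what the sim produced
sim_target = 0.425
sim_actual = 0.411

tigers_drop_points = round((tigers_peak - tigers_actual) * 1000)
sim_drop_points = round((sim_target - sim_actual) * 1000)

# Share of peak OBP that the Tigers gave back
dropoff = 1 - tigers_actual / tigers_peak

# Peak OBP a lineup would need in order to land at .425 after the same proportional loss
required_peak = sim_target * tigers_peak / tigers_actual

print(f"points lost: simulation {sim_drop_points}, Tigers {tigers_drop_points}")
print(f"Tigers dropoff from peak: {dropoff:.1%}")
print(f"peak OBP needed for a .425 team: {required_peak:.3f}")
```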

That’s if the Tiger example is representative, of course. I suspect it’s not too far off, and it might actually be a little on the low side. An interesting sabermetric study could be generated from this: calculate the percentage of possible OBP achieved by teams, thus creating another benchmark for the variability of season-to-season team performance. (You’d also want to do something similar with SLG, of course.)

Prediction and projection tools have received a lot of exposure in the past decade, but the basic "projection tool" (Runs Created and its many competitors) is still for the most part ignoring the issue of season-to-season variability. (The hoops that some analysts jump through to claim that they are reducing the gap between projected and actual team run scoring have become downright silly.) Getting deeper into the gradations of game-by-game performance is now possible thanks to the increased access to detailed data, and it’s time for theorists and those interested in such issues to embrace the possibilities that are inherent in this approach, as opposed to re-inventing the wheel of "grand theory" (what Bill James called the "great statistic" in the 1984 Baseball Abstract).

I like ending on such a nice, bold thought, but then I remind myself that number-crunching mania did not die off as a result of Bill’s words. If anything, the opposite occurred. As a matter of fact, Bill himself will be back with his own new and improved "great statistic" later this year ("win shares").

And the beat goes on...


Don Malcolm Posted: June 11, 2001 at 06:00 AM | 3 comment(s)

Reader Comments and Retorts



   1. Tangotiger Posted: June 12, 2001 at 12:08 AM (#603898)
Would be nice to run a regression analysis of OBA/SLG/BA v R/game...
   2. Mike Emeigh Posted: June 14, 2001 at 12:08 AM (#603927)
Unfortunately Don can't pay enough to make me give up my day job:) And it's been terribly busy there lately, so I haven't had much time to follow up on the original work I did. Some thoughts, though:

The simulators ARE missing something, I think. Most assume that a player who hits .290/.425/.425 will hit in that range in every game situation. In fact, there is performance variance across game situations; most players will NOT hit in the same range with runners on base as they do with the bases empty, for example (something that Bill James noticed in his seminal study of rookie performance), and we don't have any idea how these variations affect a team's overall performance (yet). That's why the empirical game data, as limited as it is, is likely to represent how a team composed of such players will "really" perform better than the simulator does.

We should be able to capture (and simulate) variations in performance across game situations using the play-by-play data; one of the things that James noted in the rookie study was that the performance variations appeared to be similar across the groups of players.

-- MWE
   3. Law Boy Posted: June 16, 2001 at 12:08 AM (#603930)
Wow - this is about as interesting a post as I've read for a long time. I love on base percentage related articles - any recommendations for further reading on the subject?

