 
Primate Studies: Where BTF's Members Investigate the Grand Old Game

Monday, June 11, 2001

The All-OBP Team Revisited

When Don gets curious about something, neuron activity increases in a lot of minds and computer processors begin to overheat. So what is Don curious about now? How OBP and SLG interact. More data to fuel the fire...

A couple of months back I lost what was left of my membership card in the "sabermetric cabal" by suggesting that a team whose OBP and SLG were each .425 wouldn’t score a gazillion runs. The actual number, as some of you will remember, was 956. That projection was arrived at using a basic version of Bill James’ contribution to our understanding of the mechanics of run scoring (Runs Created).

Several voices chimed in to correct this estimate, including BBBA coauthors Mike Emeigh and Sean Forman. While others simply trotted out their own pet formulae for projecting run scoring, Mike did some real work, using recent (1999-2000) actual game data to demonstrate that there is indeed more of a synergistic effect for "high OBP/low SLG" teams than vice versa. Mike is still tinkering with this, so I won’t try to formalize his results just yet, but what I can tell you is that teams with "high OBP/low SLG" tend to outperform their RC estimates. Teams with the opposite profile ("low OBP/high SLG") tend to underperform. The variation isn’t quite as large as was claimed by those who proposed that a .425/.425 team would produce close to 1200 runs in a 162-game season, however.

And even with Mike’s excellent work, I wasn’t satisfied to stop there. So I called on the masterful data manipulator of latter-day sabermetrics, Tom Ruane. I asked Tom to go through his 22-year database of play-by-play data and create an actuarial chart of run scoring, sorted by OBP and SLG range combinations. In addition to his exceptional talents, Tom is one of the truly good guys in the field, and so he graciously obliged me with yet another of my massively scaled requests for help.
The OBP and SLG range combinations traced the run scoring in actual games from 1979-2000 (over 130,000 team game records). They were constructed as follows: after a category of .000-.199, SLG was broken out into 10-point increments (.200-.209, .210-.219) on up to .690-.699. Everything at .700 or above was lumped into one final category. For OBP, the initial category was .000-.099, followed by 10-point increments beginning with .100-.109 and continuing in that fashion up to .490-.499. All games where OBP was over .499 were lumped into one final category.

If I’ve managed to explain this with any level of clarity, you should see where we’re headed. The idea was to look at all of the OBP/SLG combinations, and total up the number of games and the number of runs scored in each. Dividing the latter (runs) by the former (games) would give us the average number of runs per game in each OBP/SLG combination.

For the purposes of the discussion of an "all-OBP" team like the one proposed by BBBA reader Duane Thomas, which featured a series of .400+ OBP walkmen (and whose combined statistics in the single seasons chosen worked out to .290 BA/.425 OBP/.425 SLG), we are interested in only one area on Tom’s vast actuarial chart: the portion that links games where the OBP was .420-.429 and the SLG was also .420-.429.

It turns out that of the 130,000+ games in Tom’s database, fewer than 100 match the criteria described above. (This isn’t surprising, actually, because we’ve created very narrow ranges for both measures, and combining them creates hundreds of mini-categories with relatively small sample sizes.) Just to give you a taste of that chart, here is a list of all the OBP/SLG combinations with more than 500 games:
As you can see, most of these games cluster at either the extreme low end (SLG below .200 and OBP between .150 and .250) or at the extreme high end (SLG .700 or better, OBP .500 or better). As noted, the category breakdowns are extremely narrow.

At any rate, the 94 games in which a team produced an OBP and SLG that were both in the .420-.429 range resulted in a total of 626 runs scored. You fundamentalists out there may wish to take cover now, before I note that this works out to an average of 6.66 runs per game. Over the course of a 162-game season, that projects to 1079 runs.

That’s right about midway between the estimate using OBP times SLG plus 3.5%, and the estimates from those other theorists proclaiming that such a team would score close to 1200 runs. As Sean Forman pointed out, the OBP times SLG plus 3.5% formula actually works pretty well if you remember to account for the extra plate appearances that accrue when a team walks at a pace that is 30-40% higher than the current MLB record for bases on balls in a season (835, by the 1949 Boston Red Sox).

However, a couple of other nuances remain. First is the fact that walks, despite sabermetricians’ occasionally obsessive love for them, simply do not have as much value in terms of run scoring as do hits. And we can demonstrate this by examining the games within Tom Ruane’s data set that contain .420-.429 OBP and .420-.429 SLG.

I also asked Tom to segregate the OBP/SLG pairs by a third modifier: batting average. How many runs per game did a team with less than a .310 BA score when the OBP/SLG values in a game were between .420 and .429? How many when the team BA was .310-.349? And how many when the team BA was .350 or higher? While other analysts have noted that additional "secondary offensive" characteristics have a strong tendency to enhance run scoring, that may not be the case when dealing with more rarefied levels of OBP.
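For readers who want to try the same bucketing on their own game logs, here is a minimal Python sketch of the scheme described above: 10-point OBP and SLG bins with catch-all categories at each end, runs divided by games in each cell, and a 162-game projection. The function names and the (obp, slg, runs) input layout are assumptions made for illustration; this is not Tom Ruane's actual code or data format.

```python
from collections import defaultdict

def bin_label(value, floor, cap):
    """Return the 10-point bin label for a rate stat such as OBP or SLG.

    floor and cap are in points (thousandths): everything below floor goes
    in one low bin, everything at or above cap goes in one high bin.
    """
    pts = round(value * 1000)  # whole points avoid float edge cases
    if pts < floor:
        return f".000-.{floor - 1:03d}"
    if pts >= cap:
        return f".{cap:03d}+"
    low = pts // 10 * 10
    return f".{low:03d}-.{low + 9:03d}"

def obp_bin(obp):
    # OBP: .000-.099, then .100-.109 ... .490-.499, then .500 and up
    return bin_label(obp, floor=100, cap=500)

def slg_bin(slg):
    # SLG: .000-.199, then .200-.209 ... .690-.699, then .700 and up
    return bin_label(slg, floor=200, cap=700)

def runs_per_game_by_cell(team_games):
    """team_games: iterable of (obp, slg, runs), one entry per team game.

    Returns {(obp_bin, slg_bin): (games, runs_per_game)}.
    """
    totals = defaultdict(lambda: [0, 0])  # cell -> [games, runs]
    for obp, slg, runs in team_games:
        cell = totals[(obp_bin(obp), slg_bin(slg))]
        cell[0] += 1
        cell[1] += runs
    return {key: (g, r / g) for key, (g, r) in totals.items()}

def project_season(runs, games, season_games=162):
    """Scale a runs-per-game rate to a full season."""
    return runs / games * season_games

# The article's .420-.429 / .420-.429 cell: 626 runs in 94 games.
print(round(626 / 94, 2))              # 6.66
print(round(project_season(626, 94)))  # 1079
```

Working in whole points rather than raw floats is a design choice to keep values like .420 from slipping into the wrong bin through rounding.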
While two teams with highly similar SLG but divergent batting averages quite often show a run scoring advantage in favor of the team with the lower BA, it appears that the opposite may be the case for OBP. We can see this when we look at those 94 games in which teams produced an OBP/SLG combination of .425/.425. Less than one-fifth of these games (18, to be exact) featured teams with BAs lower than .310 (which would be in the range of the team that Duane Thomas had selected, with its .290 BA). Breaking the run scoring average for these games into groups based on BA, we see that the teams with the lower BA scored fewer runs than was the case in the overall sample:
In games where OBP and SLG were highly similar (in the .400 to .430 range), the lower BA regions averaged somewhat more than four-tenths of a run per game below the average for these OBP/SLG combinations as a whole.

There are two caveats here that need to be mentioned. First, the sample size is small. Second, a team’s OBP/SLG stats are an aggregate of many games’ worth of individual performances, and are not going to slavishly follow the results in games that conform exactly to their overall performance level. (Mike Emeigh’s research showed high OBP/low SLG teams exceeding their RC projection, but these "high OBP" teams came from "aggregated" games as opposed to those with an exact match in the .420-.429 OBP/SLG range. Mike’s high OBP/low SLG sample produced an average of 6.82 runs per game.)

That said, it’s still an interesting effect, and it may go some distance toward explaining why a walkman team with a .290/.425/.425 BA/OBP/SLG would tend to score fewer runs than, say, a team with .330/.425/.425. In short, the theorists who claim that BA is a superfluous statistic once you have OBP and SLG would appear to have taken one too many liberties in their zeal to create a calculational shorthand for run scoring. While for the most part this shorthand works well, it appears to break down when we get to extremely high OBP coupled with relatively low SLG (i.e., SLG that is not as high relative to the league as the team’s OBP).

This appears to be confirmed, at least provisionally, in the results for the Tokyo Walkmen, a team based on Duane Thomas’ lineup selections that was entered into competition at the sports simulation site What If Sports. After 144 games, the Walkmen, who were constrained to play without a DH, have scored 814 runs, which puts them on pace to score "only" 915 runs over 162 games.
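The pace figure for the Walkmen is a straight rate projection, and it can be checked with a few lines of Python. This is only a sketch; `season_pace` is a hypothetical helper name, and the only inputs are the 814 runs and 144 games quoted above.

```python
def season_pace(runs_so_far, games_played, season_games=162):
    """Scale a partial-season run total to a full season at the same rate."""
    return runs_so_far / games_played * season_games

pace = season_pace(814, 144)
print(f"{814 / 144:.2f} runs per game")
print(f"{pace:.2f} runs over 162 games")
```

The unrounded result is 915.75, so the 915 quoted in the text is that figure with the fraction dropped.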
Adding 10% to the Walkmen run total to adjust for the absence of the DH, we see that our walk-taking pests would come in at approximately 1007 runs. That’s about ten runs lower than what Tom Ruane’s empirical data suggests would be the case for a .425/.425 OBP/SLG team with a sub-.300 BA (1017 runs).

And finally, there’s the issue raised earlier about just how much drop-off in walk percentage (and, hence, in OBP) such a team would experience as a result of having an entire lineup of walkmen. After some consideration, I managed to come up with two benchmarks for this issue. We’ll take the second one first, since we were talking about the What If Sports simulation.

While this is not even remotely an empirical test, looking at what the simulation permitted the selected hitters to do is clearly a convenient jumping-off point. Interestingly, the simulation produced close-to-season-average OBP totals for seven of the eight regulars who received significant playing time (Wes Westrum, catcher; Ferris Fain, first base; Eddie Stanky, second base; Eddie Joost, shortstop; Eddie Yost, third base; Elmer Valo, left field; Richie Ashburn, center field; Roy Cullenbine, right field). One man, Elmer Valo, got hammered by the game. Valo’s .332 OBP (due in large part to a .221 BA) brought the regulars’ OBP/SLG average down to .411/.410 (it was .419/.420 without him, which shows that What If Sports has a pretty good simulator on its hands). So the simulated team lost roughly fifteen points apiece of OBP and SLG.

We can take a more empirical approach, however. (And that brings us to the second of the two nuances I referred to earlier.) Let’s look at one of the teams in baseball history with the highest aggregate OBP: the 1921 Detroit Tigers. This team had walkmen (Lu Blue, Donie Bush, Johnny Bassler) and high-average hitters who walked at around league-average rates (the great outfield of Harry Heilmann, Ty Cobb, and Bobby Veach).
The team OBP was .381; the eight regulars on the team (including second baseman Ralph Young, who chipped in with the best season of his career, and third baseman Bob Jones, this team’s "weakest link") did better than that, averaging an even .400.

What we want to know is: how much higher is the average of their peak OBP seasons? When we look at Duane Thomas’ team, we see a lineup that features players performing at their highest single-season level of OBP. "Mass career years" of this nature are exceedingly rare in baseball (or in other endeavors as well).

It turns out that the best possible OBP for the eight regulars on the 1921 Tigers, using their best OBP seasons from their individual careers, works out to .422. So our two data points indicate that such a team would lose somewhere between 14 and 22 points of OBP to this "levelling off from career peak" effect. At the Tigers’ end of that range, that’s about a five percent drop-off. Thus, for a team to have a realistic chance at producing a .425 OBP, it would need players whose peak OBP performance was that much higher: another five percent above .425, or close to .450 (.448 to be exact).

That’s if the Tiger example is representative, of course. I suspect it’s not too far off, and it might actually be a little on the low side. An interesting sabermetric study could be generated from this: calculate the percentage of possible OBP achieved by teams, thus creating another benchmark for the variability of season-to-season team performance. (You’d also want to do something similar with SLG, of course.)

Prediction and projection tools have received a lot of exposure in the past decade, but the basic "projection tool" (Runs Created and its many competitors) is still for the most part ignoring the issue of season-to-season variability. (The hoops that some analysts jump through to claim that they are reducing the gap between projected and actual team run scoring have become downright silly.)
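The Tigers benchmark above reduces to a single ratio, and a short Python sketch makes the arithmetic explicit. The helper names are hypothetical, and the calculation assumes (as the text does) that the 1921 Tigers' ratio of actual to peak OBP would carry over to the walkmen.

```python
def peak_retention(actual_obp, peak_obp):
    """Share of a lineup's career-peak OBP that it actually achieved."""
    return actual_obp / peak_obp

def required_peak(target_obp, retention):
    """Peak OBP a lineup would need in order to land on target_obp."""
    return target_obp / retention

# 1921 Tigers regulars: .400 actual against a best-possible .422.
retention = peak_retention(0.400, 0.422)
print(f"drop-off from peak: {1 - retention:.1%}")
print(f"peak OBP needed for a .425 team: {required_peak(0.425, retention):.3f}")
```

This gives a drop-off of roughly 5.2% and a required peak of about .448, matching the figures in the text. The same two functions would serve for the SLG version of the study suggested above.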
Getting deeper into the gradations of game-by-game performance is now possible thanks to the increased access to detailed data, and it’s time for theorists and those interested in such issues to embrace the possibilities that are inherent in this approach, as opposed to reinventing the wheel of "grand theory" (what Bill James called the "great statistic" in the 1984 Baseball Abstract).

I like ending on such a nice, bold thought, but then I remind myself that number-crunching mania did not die off as a result of Bill’s words. If anything, the opposite occurred. As a matter of fact, Bill himself will be back with his own new and improved "great statistic" later this year ("win shares"). And the beat goes on...

Reader Comments and Retorts
1. Tangotiger Posted: June 12, 2001 at 12:08 AM (#603898)

The simulators ARE missing something, I think. Most assume that a player who hits .290/.425/.425 will hit in that range in every game situation. In fact, there is performance variance across game situations; most players will NOT hit in the same range with runners on base as they do with the bases empty, for example (something that Bill James noticed in his seminal study of rookie performance), and we don't have any idea how these variations affect a team's overall performance (yet). That's why the empirical game data, as limited as it is, is likely to represent how a team composed of such players will "really" perform better than the simulator does.

We should be able to capture (and simulate) variations in performance across game situations using the play-by-play data; one of the things that James noted in the rookie study was that the performance variations appeared to be similar across the groups of players.
-- MWE
Rich