Baseball for the Thinking Fan

Al's Baseball Tidbits

Friday, December 21, 2001

Al’s Baseball Tidbits - “Another-Letter-to-Rob-Neyer” edition

This is the text of an e-mail I sent Rob Neyer, co-author of “Baseball Dynasties.”

I have recently purchased and read your book, “Baseball Dynasties,” written with Eddie Epstein. While I enjoyed the book and agree that the use of standard deviations helps to put teams from different eras on a more “level playing field,” I have to take issue with the way you used SDs in ranking the teams.

When you take the number of standard deviations by which a team exceeded the league mean in runs and add it to the number of standard deviations by which it was under the league mean in runs allowed, you commit a logical error that, conceptually, is like doing addition by adding exponents. This “SD score” is not a measure of dispersion at all, and it is very misleading, implying that, for example, the 1998 Yankees were almost 4 standard deviations “better” than the average team. It seems to me that what you should have done was take all the run-differential figures, compute their standard deviation and see by how many SDs the team’s run differential exceeded the mean (which would, of course, be zero). If you do that, you actually get a standard deviation, not some bastard figure that is misleading. Here’s what I mean:



1998 Yankees
runs scored: 965
mean runs scored, 1998 AL: 812
std dev of runs scored, 1998 AL: 89.2
SD score: 1.72


runs allowed: 656
mean runs allowed, 1998 AL: 811
std dev of runs allowed, 1998 AL: 71.8
SD score: 2.16


run differential: 309
mean run differential, 1998 AL: 0.8 (due to interleague play, R - RA not exactly equal to zero)
std dev of run differential, 1998 AL: 124.6
SD score: 2.47

The SD score for run differential is higher than either the runs scored or runs allowed SD score, but not nearly their sum. That SD score of 2.47 is a good measure of how good the 1998 Yankees were in terms of outscoring their opposition, which, after all, is what baseball is all about. Your 3.88 figure is not meaningful, and your misuse of “additive” SD scores actually affects the SD score rankings.
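The arithmetic behind those three SD scores can be checked with a short script. This is only a sketch that plugs in the figures quoted above; the function name `sd_score` is made up for illustration.

```python
def sd_score(value, mean, sd):
    """Number of standard deviations by which `value` exceeds `mean`."""
    return (value - mean) / sd

# Figures quoted above for the 1998 Yankees and the 1998 AL.
offense = sd_score(965, 812, 89.2)         # runs scored
defense = -sd_score(656, 811, 71.8)        # runs allowed; sign flipped so fewer is better
differential = sd_score(309, 0.8, 124.6)   # run differential

# The book-style "additive" score is just the two rounded scores added together.
summed = round(offense, 2) + round(defense, 2)

print(f"offense SD score:       {offense:.2f}")
print(f"defense SD score:       {defense:.2f}")
print(f"sum of the two:         {summed:.2f}")
print(f"run-differential score: {differential:.2f}")
```

The sum is not the SD score of anything that was actually measured; only the last figure compares the team's run differential with the spread of run differentials in the league.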

Let’s look at the ‘97-’99 Yankees and the ‘69-’71 Orioles. In the back of the book, you rank those Baltimore teams with the top 3-year “SD score” of all time by adding up each team’s offensive and defensive SD scores; the ‘97-’99 Yankees rank third. I put the runs/runs allowed for the six seasons involved into an Excel spreadsheet, put in a “difference” column and had Excel figure the standard deviation of each column. Then I added another set of columns that were the sum of each year’s columns, so that I got the standard deviation of runs scored, allowed and run differential for each year and for each 3-year period. (Your summing SD scores across years is just as bad as summing offense and defense SD scores; that’s just not how variation works.) Here are some figures:



per the book (running total in parentheses)

        '97-'99 Yankees            '69-'71 Orioles

        97   2.82                  69   3.34
        98   3.88  (6.70)          70   3.31  (6.64)
        99   2.71  (9.41)          71   3.22  (9.86)


my method

        '97-'99 Yankees            '69-'71 Orioles

        97   off.    1.29          69   off.    1.34
             def.   -1.53               def.   -1.99
             diff.   2.15               diff.   2.05
        (note Yankees rank higher; the SD of diff was higher in '69 than '97)

        98   off.    1.72          70   off.    1.89
             def.   -2.16               def.   -1.42
             diff.   2.47               diff.   1.92

        99   off.    0.72          71   off.    1.61
             def.   -1.99               def.   -1.60
             diff.   1.63               diff.   2.01

        '97-'99                    '69-'71
             off.    1.53               off.    1.76
             def.   -2.35               def.   -2.18
             diff.   2.39               diff.   2.30
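The spreadsheet procedure described above (a column each for runs, runs allowed and their difference, with a standard deviation taken per column, then the same again on multi-year sums) might be sketched as follows. The league data here is HYPOTHETICAL, invented only to make the script runnable, and the use of the population standard deviation (`pstdev`) is an assumption, since the text does not say which Excel function was used.

```python
from statistics import mean, pstdev

# HYPOTHETICAL four-team league, two seasons each: (runs scored, runs allowed).
# Made-up numbers for illustration, not real season totals.
league = {
    "A": [(900, 700), (850, 720)],
    "B": [(800, 800), (780, 790)],
    "C": [(750, 820), (800, 810)],
    "D": [(700, 830), (740, 850)],
}

def column_sd_scores(column):
    """column: team -> value. Returns team -> (value - mean) / SD for that column."""
    vals = list(column.values())
    m, s = mean(vals), pstdev(vals)  # population SD: an assumption
    return {team: (v - m) / s for team, v in column.items()}

def sd_table(totals):
    """totals: team -> (runs scored, runs allowed). Returns three dicts of SD scores."""
    scored = {t: rs for t, (rs, ra) in totals.items()}
    allowed = {t: ra for t, (rs, ra) in totals.items()}
    diff = {t: rs - ra for t, (rs, ra) in totals.items()}
    return column_sd_scores(scored), column_sd_scores(allowed), column_sd_scores(diff)

# Multi-year version: sum each team's runs over the period first,
# then take the SD of the summed columns (not the sum of yearly SD scores).
multi_year = {
    team: (sum(rs for rs, _ in seasons), sum(ra for _, ra in seasons))
    for team, seasons in league.items()
}
off, dfn, diff = sd_table(multi_year)

for team in league:
    print(f"{team}: off. {off[team]:+.2f}  def. {dfn[team]:+.2f}  diff. {diff[team]:+.2f}")
```

As in the tables above, a good defense shows up as a negative "def." score, because it is measured on runs allowed.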

So, when we measure what we’re really trying to measure here, how well the teams did at outscoring their opposition, the Yankees come out slightly ahead of Baltimore. The first year in each period made the difference. Why?

I’m not a professional statistician, so I’m not sure what the explanation is, but it’s clear that the standard deviation of run differential in 1969 was considerably higher than that for 1997. Look here:

                                 1997 AL     1969 AL
mean runs:                         797.4       663.3
std. dev. runs:                     72.3        86.0
mean runs allowed:                 798.4       663.3
std. dev. runs allowed:             72.2        73.4
mean run differential:               0.0         0.0
std. dev. run differential:         94.8       128.1

Note that the std. dev.’s of runs and runs allowed are essentially the same in 1997, whereas in 1969 there was more variation in scoring than in allowing runs. 1997 was the only one of the six years in which the scored and allowed SDs were very close, and it had the lowest SD of run differential. I think that’s significant, but I don’t really understand why.
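One possible explanation, offered as a sketch rather than a settled answer: the SD of a difference depends not only on the two individual SDs but also on how runs scored and runs allowed move together across teams. Given the three SDs in the table above, the correlation they imply can be backed out. The function name is made up for illustration, and the calculation assumes the three SDs in each column were computed the same way on the same set of teams.

```python
def implied_correlation(sd_scored, sd_allowed, sd_diff):
    """Correlation between runs scored and runs allowed implied by the three SDs.

    Uses Var(RS - RA) = Var(RS) + Var(RA) - 2*Cov(RS, RA), solved for Cov.
    """
    cov = (sd_scored ** 2 + sd_allowed ** 2 - sd_diff ** 2) / 2
    return cov / (sd_scored * sd_allowed)

# SDs taken from the two columns of the table above.
first_column = implied_correlation(72.3, 72.2, 94.8)   # late-1990s AL season
al_1969 = implied_correlation(86.0, 73.4, 128.1)       # 1969 AL

print(f"first column: {first_column:+.2f}")
print(f"1969 AL:      {al_1969:+.2f}")
```

With these inputs the implied correlation is mildly positive for the first column (roughly +0.14) and negative for 1969 (roughly -0.29). A positive value means teams that scored more also tended to allow more, which narrows the spread of run differentials; a negative value means the good-hitting teams also tended to be the good-pitching teams, which widens it.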

OK, I just had to run the numbers for the ‘37-’39 Yankees, too. Here they are:



book method
-----------
37 3.22
38 2.96 (6.18)
39 3.52 (9.70)


my method
---------
37 off. 1.78
def. -1.44
diff. 1.79


38 off. 1.72
def. -1.25
diff. 1.63


39 off. 1.88
def. -1.63
diff. 1.86


'37-'39
off. 1.88
def. -1.63
diff. 1.80

According to SD of run differential, the ‘37-’39 Yankees just don’t measure up to those other two teams. Here again, the SD of run differential is the big factor, a factor your book ignored. In 1939, the SD of run differential was a whopping 221 runs! Although the Yankees had a plus 411, two teams were over 300 runs below the mean. St. Louis and Philadelphia each allowed over 1000 runs!

Bill James, in his new “Historical Abstract,” has also criticized your method, although he missed the logical error of adding SD scores. He does write of the need to use multi-year standard deviations, but I don’t agree with his conclusion that this just brings you back to the teams’ won-lost record. These results show that isn’t true, because the ‘37-’39 Yankees had the best 3-year W-L record of the three, 307-150. What it does lead you back to, or close to, is the SD score for 3-year wins, where the ‘97-’99 Yankees have an SD score of 2.31, compared to 2.13 for Baltimore and only 1.56 for the earlier Yankee dynasty. Let’s face it, those Yankees were beating up on some really bad teams; St. Louis won 46, 55 and 43 games, while Philadelphia won 54, 53 and 55. The Browns allowed an average of over 1000 runs those three years! It’s true that the longer the period, the smaller the difference is likely to be between a team’s run-differential SD score and its W-L percentage SD score, but I still think it’s worthwhile to look deeper than the W-L data.

Just as looking at run differentials removes a “level of luck” from the W-L data, looking at batting and batter-vs-pitcher data removes another. If you compare a team’s runs created (or linear weights, or Extrapolated Runs, etc.) to its opponents’ runs created, you’re probably getting yet closer to the actual level of ability of the team. If you do this for all teams in a league, you will find the standard deviation of “runs-created differential” is lower than that of run differential. I think doing this kind of analysis using runs created (I prefer Extrapolated Runs) instead of actual runs would be worthwhile, but unfortunately batter-vs-pitcher data for years before around 1984 is difficult or impossible to come by. I am going to do an “Al’s Baseball Tidbit” article comparing the ‘98 Yankees with the ‘01 Mariners, using SD of xRuns differential.

Cheers,
Alan Shank

(end of e-mail)

I hope to do the study of the ‘98 Yanks vs. ‘01 Mariners before the New Year.
Cheers, and Happy Holidays,
Al

Alan Shank Posted: December 21, 2001 at 03:55 PM | 1 comment(s)

Reader Comments and Retorts

   1. Walt Davis Posted: January 07, 2002 at 03:09 PM (#509359)

the standard deviation of run differential....

there’s a nifty thing called covariance algebra.  Among many properties of covariances (and a variance is just the covariance of one variable with itself), there’s the rule of the variance of a sum or difference:

RD = RS - RA
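For reference, the textbook rule for the variance of a difference, applied to RD = RS - RA, is shown below. This is the standard identity, given here as a sketch and not as the commenter's own wording.

```latex
\operatorname{Var}(RD) = \operatorname{Var}(RS) + \operatorname{Var}(RA) - 2\,\operatorname{Cov}(RS, RA)
```

So the SD of run differential is not determined by the SDs of runs scored and runs allowed alone; it also depends on how the two vary together across the teams in the league.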
