User Comments, Suggestions, or Complaints | Privacy Policy | Terms of Service | Advertising
Al's Baseball Tidbits - Friday, December 21, 2001
Al’s Baseball Tidbits - “Another-Letter-to-Rob-Neyer” edition
This is the text of an e-mail I sent Rob Neyer, co-author of “Baseball Dynasties.”
I have recently purchased and read your book, “Baseball Dynasties,” written with Eddie Epstein. While I enjoyed the book and agree that the use of standard deviations helps to put teams from different eras on a more “level playing field,” I have to take issue with the way you used SDs in ranking the teams. When you take the number of standard deviations by which a team exceeded the league mean in runs and add it to the number of standard deviations by which it was under the league mean in runs allowed, you commit a logical error that, conceptually, is like doing addition by adding exponents. This “SD score” is not a measure of dispersion at all, and is very misleading, implying that, for example, the 1998 Yankees were almost 4 standard deviations “better” than the average team.
It seems to me that what you should have done was take all the run-differential figures, figure the standard deviation of those and see by how many SDs the team’s run differential exceeded the mean (which would, of course, be zero). If you do that, you actually get a standard deviation, not some bastard figure that is misleading. Here’s what I mean:
The SD score for run differential is higher than either the runs scored or runs allowed SD score, but not nearly their sum. That SD score of 2.47 is a good measure of how good the 1998 Yankees were in terms of outscoring their opposition, which, after all, is what baseball is all about. Your 3.88 figure is not meaningful, and your misuse of “additive” SD scores actually affects the SD score rankings. Let’s look at the ‘97-’99 Yankees and the ‘69-’71 Orioles. In the back of the book, you rank those Baltimore teams with the top 3-year “SD score” of all time by adding up each team’s offensive and defensive SD scores; the ‘97-’99 Yankees rank third. I put the runs/runs allowed for the six seasons involved into an Excel spreadsheet, put in a “difference” column and had Excel figure the standard deviation of each column. Then I added another set of columns that were the sum of each year’s columns, so that I got the standard deviation of runs scored, allowed and run differential for each year and for each 3-year period. (Your summing SD scores across years is just as bad as summing offense and defense SD scores; that’s just not how variation works.) Here are some figures:
So, when we measure what we’re really trying to measure here, how well the teams did at outscoring their opposition, the Yankees come out slightly ahead of Baltimore. The first year in each period made the difference. Why? I’m not a professional statistician, so I’m not sure what the explanation is, but it’s clear that the standard deviation of run differential in 1969 was considerably higher than that for 1998. Look here:
Note that the standard deviations of runs scored and runs allowed are essentially the same in 1998, whereas in 1969 there was more variation in runs scored than in runs allowed. 1998 was the only year when the scored and allowed SDs were very close, and it had the lowest SD of run differential. I think that’s significant, but I don’t really understand why. OK, I just had to run the numbers for the ‘37-’39 Yankees, too. Here they are:
According to SD of run differential, the ‘37-’39 Yankees just don’t measure up to those other two teams. Here again, the SD of run differential is the big factor, a factor your book ignored. In 1939, the SD of run differential was a whopping 221 runs! Although the Yankees had a plus 411, two teams were over 300 runs below the mean. St. Louis and Philadelphia each allowed over 1000 runs!
Bill James, in his new “Historical Abstract,” has also criticized your method, although he missed the logical error of adding SD scores. He does write of the need to use multi-year standard deviations, but I don’t agree with his conclusion that this just brings you back to the teams’ won-lost record. These results show that isn’t true, because the ‘37-’39 Yankees had the best 3-year W-L record of the three, 307-150. What it does lead you back to, or near, however, is the SD score for 3-year wins, where the ‘97-’99 Yankees have an SD score of 2.31, compared to 2.13 for Baltimore and only 1.56 for the earlier Yank dynasty. Let’s face it, those Yankees were beating up on some really bad teams; St. Louis won 46, 55 and 43 games, while Philadelphia won 54, 53 and 55. The Browns allowed an average of over 1000 runs those three years!
It’s true that the longer the period, the smaller the difference is likely to be between a team’s run-differential SD score and its W-L percentage SD score, but I still think it’s worthwhile to look deeper than the W-L data. Just as looking at run differentials removes a “level of luck” from the W-L data, looking at batting and batter-vs-pitcher data removes another. If you compare a team’s runs created (or linear weights, or Extrapolated Runs, etc.) to their opponents’ runs created, you’re probably getting yet closer to the actual level of ability of the team. If you do this for all teams in a league, you will find the standard deviation of “runs-created differential” is lower than that of run differential.
I think doing this kind of analysis using runs created (I prefer Extrapolated Runs) instead of actual runs would be worthwhile, but unfortunately batter-vs-pitcher data for years before around 1984 is difficult or impossible to come by. I am going to do an “Al’s Baseball Tidbit” article comparing the ‘98 Yankees with the ‘01 Mariners, using SD of xRuns differential.
Cheers,
(end of e-mail)
I hope to do the study of the ‘98 Yanks vs. ‘01 Mariners before the New Year.
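For anyone who wants to try the calculation described in the letter without a spreadsheet, here is a minimal Python sketch. The six-team league and all of its run totals are made up purely for illustration (they are not real season data), the function names `sd_score`, `sd_scores` and `combine_seasons` are hypothetical, and the sketch assumes the population standard deviation (`statistics.pstdev`); swap in `statistics.stdev` if you prefer the sample version.

```python
from statistics import mean, pstdev

# Hypothetical six-team league: team -> (runs scored, runs allowed).
# Made-up numbers, chosen so that league runs scored equal league runs allowed.
league = {
    "A": (900, 650),
    "B": (800, 750),
    "C": (780, 760),
    "D": (740, 790),
    "E": (700, 820),
    "F": (680, 830),
}

def sd_score(value, values):
    """Number of standard deviations by which value exceeds the mean of values."""
    return (value - mean(values)) / pstdev(values)

def sd_scores(league, team):
    """Return (offense, defense, run-differential) SD scores for one team."""
    scored = [rs for rs, ra in league.values()]
    allowed = [ra for rs, ra in league.values()]
    diffs = [rs - ra for rs, ra in league.values()]
    rs, ra = league[team]
    offense = sd_score(rs, scored)
    defense = -sd_score(ra, allowed)  # allowing fewer runs than average is good
    differential = sd_score(rs - ra, diffs)
    return offense, defense, differential

def combine_seasons(seasons):
    """Sum each team's runs scored and allowed across several seasons."""
    totals = {}
    for season in seasons:
        for team, (rs, ra) in season.items():
            prev_rs, prev_ra = totals.get(team, (0, 0))
            totals[team] = (prev_rs + rs, prev_ra + ra)
    return totals

offense, defense, differential = sd_scores(league, "A")
print(f"offense SD score:          {offense:.2f}")
print(f"defense SD score:          {defense:.2f}")
print(f"summed 'SD score':         {offense + defense:.2f}")
print(f"run-differential SD score: {differential:.2f}")
```

In this toy league the summed figure comes out roughly twice the run-differential SD score, which is the gap the letter complains about. For a multi-year comparison, pass a list of season dictionaries to `combine_seasons` and run `sd_scores` on the result, which mirrors summing the columns before taking the standard deviation.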
About Baseball Think Factory | Write for Us | Copyright © 1996-2008 Baseball Think Factory
Reader Comments and Retorts
Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.
the standard deviation of run differential....
there’s a nifty thing called covariance algebra. Among many properties of covariances (and a variance is just the covariance of one variable with itself), there’s the rule of the variance of a sum or difference:
RD = RS - RA
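The rule being referred to is presumably the textbook identity for the variance of a difference. Written out in the notation above, with ρ (a symbol not in the original comment) standing for the correlation between RS and RA across the teams in a league:

```latex
\operatorname{Var}(RD) = \operatorname{Var}(RS) + \operatorname{Var}(RA) - 2\,\operatorname{Cov}(RS, RA)
\quad\Longrightarrow\quad
\sigma_{RD} = \sqrt{\sigma_{RS}^{2} + \sigma_{RA}^{2} - 2\rho\,\sigma_{RS}\,\sigma_{RA}}
```

So the SD of run differential depends not only on the two separate SDs but also on how runs scored and runs allowed move together across teams: a more negative correlation (good offenses tending to pair with good run prevention) pushes it up, and a more positive one pulls it down.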