Primate Studies
Where BTF's Members Investigate the Grand Old Game
Wednesday, December 15, 2004
An Oliver Twist on the Pythagorean Theorem
What the Dickens?
I made a trip to the href="http://www.baseballreference.com/teams/NYY/2004_sched.shtml"> Basketball
Chapter 11, titled “Basketball’s Bell Curve” provides an
Where PPG = points per game, DPPG = defensive points per
Oliver used this method because he found a correlation website (http://basketballonpaper.com/) if you would like to learn
Naturally, as a baseball fan, I wanted to see if this method baseballreference.com (http://www.baseballreference.com/teams/NYY/2004_sched.shtml) retrosheet.org (http://www.retrosheet.org/boxesetc/VBOS02003.htm) also
It is well known in the slackademic circles of the baseball 2004 New York (http://www.baseballreference.com/teams/NYY/2004.shtml)
Actually, the Oliver winning percentages are extremely close
As consolation for this inconclusive study, I decided to
In closing, I’d like to dedicate this article to Studes (http://www.baseballgraphs.com/) who has done more than anyone
Reader Comments and Retorts
1. Nick S Posted: December 15, 2004 at 11:05 PM (#1023260)
I would have guessed that a correlation in basketball between PPG and DPPG resulted from a correlation between PPG and opponent’s possessions/game. That is, every time you score, your opponents get the ball, whereas every time you don’t score you get the ball back (offensive rebound) some non-insignificant percentage of the time. Also, and I know squat about basketball, it may be that teams choose to play aggressively, resulting in better offense and worse defense.
Well, there’s something in basketball called Game Pace, which relates to how fast a team goes through a possession. If a team has a fast pace, then both the team and the opponents will have more possessions during the game, and therefore likely more points and more points allowed (and vice versa, of course).
I always wondered about +/- ratios in basketball. They have ‘em in hockey, why not hoops?
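The pace argument can be checked with a toy simulation. This is only a sketch under made-up assumptions: the possession range (80 to 110) and the scoring rates (0.50 and 0.48 per possession, 2 points each) are hypothetical, and `simulate_season` and `pearson` are illustrative names, not anything from Oliver's book. The only point is that a shared possession count tends to push points scored and points allowed in the same direction.

```python
import random

def simulate_season(games=82, seed=0):
    """Return (points_for, points_against) pairs driven by a shared pace."""
    rng = random.Random(seed)
    results = []
    for _ in range(games):
        # Hypothetical pace: both teams get the same number of possessions.
        possessions = rng.randint(80, 110)
        # Hypothetical scoring: each possession yields 2 points or nothing.
        points_for = sum(2 for _ in range(possessions) if rng.random() < 0.50)
        points_against = sum(2 for _ in range(possessions) if rng.random() < 0.48)
        results.append((points_for, points_against))
    return results

def pearson(pairs):
    """Pearson correlation between the two columns of a list of pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / (var_x * var_y) ** 0.5

season = simulate_season()
print("correlation of points for and against:", round(pearson(season), 3))
```

Fixing `possessions` at a constant (closer to baseball's 27 outs per side) removes the shared driver, and the correlation should then hover around zero.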
Of course, I have trouble with any metric that measures your performance when you’re not on the court/ice, but that’s neither here nor there.
BOP is an excellent book. It should also be noted that Dean Oliver was recently hired as a consultant by the Sonics, who perhaps not coincidentally are off to a surprising 18-4 start. There are some quotes from Oliver in this article:
These Sonics are indeed super
Also Roland Beech at 82games.com has basically the equivalent of +/- ratings in basketball.
Bill James did something like this in the 80s, I’m guessing, but I want to say it was the 1986 Abstract (the article focused on the Dodgers and Mets, and they were both good in 1985); he basically came to a similar conclusion. James said that it’s basically 2-3% more accurate and 10 times the work, so it’s not worth it.
Obviously that’s different in an 8-10 run environment, as opposed to a 180 point environment. But it does explain a small percentage of a team’s deviation from the Pythagorean method.
3 comments:
1. It’s not too surprising that taking covariance into account doesn’t add much accuracy, for reasons that have already been given: game pace (each baseball team basically always gets 27 outs, whereas an NBA team might shoot 70 or 100 field goal attempts); the fact that the scoring team has to give the ball to the other team in basketball (this is actually much the same concept as the game pace one); and the garbage time/playing to the level of the opponent argument that Dean Oliver originally made.
2. The graphs would be better as scatterplots rather than bar graphs (assuming that the points would show up when viewed via the web), because scatterplots would enable the user to see the degree of correlation between points scored and allowed. If the two are positively correlated, the scatterplots will look like upward-sloping ellipses. If they are not (as appears to be the case with baseball) then they will look like diffuse spheres.
With the bar graphs, one cannot detect the correlations; indeed it’s rather hard to even figure out if a team outscored its opponents or not.
3. Post #3 said: I always wondered about +/- ratios in basketball. They have ‘em in hockey, why not hoops?
Harvey Pollack of the Philadelphia 76ers has been reporting +/- for decades in his Philadelphia 76ers Statistical Yearbook. Also, recently 82games.com started tracking this with their Roland Ratings. And Jeff Sagarin and Wayne Winston in recent years have come up with a more sophisticated version of +/- which they call WinVal.
There are however several problems with +/- ratings, both on theoretical grounds and on the empirical grounds that many of the players receive ridiculous ratings.
Hey GGC, thanks for the dedication! I appreciate it.
I’ve played with graphs like this (pythagorean distributions) ever since they invented Lotus. I find it a fascinating subject, and one that lends itself to “graphical reflection.”
I was just thinking this morning about our possible Yankee pythagorean project. Are you still up for it?
One reason not to apply this to baseball is that neither run-scoring nor run-allowing is quite normally distributed, so chances are neither is run differential. The deviation from normality appears fairly small, but it might be enough to offset the small increase in explanatory power. (As scoring increases, the distribution converges on normality, so when you’re talking basketball scores things should be pretty normal.)
For those who look at that formula and don’t get what it’s doing:
the denominator part is the standard deviation of run differential—there’s no actual need to calculate the variances and covariance of RS and RA, just calculate the variance of RD.
So you’re looking at the mean run differential relative to its standard deviation. If memory serves, the inverse of this ratio (standard deviation over mean) is called the coefficient of variation. The ratio is then treated as a z-score (think bell curve) and fed through the normal CDF (don’t worry about it), which gives you the probability of a positive run differential, i.e. a win, given this mean and standard deviation. At least I assume that’s what NORM is standing for in that formula.
The upshot being that the greater the variation in run differential, the less a given run differential translates into wins. So a .5 run differential in Dodger Stadium would be huge, but in Coors it is not. This would be true across eras with different scoring levels as well.
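For anyone who wants to try the calculation described above on a team's game log, here is a minimal sketch. It assumes NORM is the standard normal CDF (as guessed above) and uses the population standard deviation of the per-game differential; whether Oliver uses the population or sample version is not stated here, so treat that as an assumption. The function names and the ten-game log are made up for illustration.

```python
from statistics import NormalDist, fmean, pstdev

def oliver_win_pct(scored, allowed):
    """Normal CDF of (mean differential / std dev of differential)."""
    diffs = [s - a for s, a in zip(scored, allowed)]
    sd_diff = pstdev(diffs)  # equals sqrt(var(RS) + var(RA) - 2*cov(RS, RA))
    if sd_diff == 0:
        raise ValueError("run differential has no variation")
    return NormalDist().cdf(fmean(diffs) / sd_diff)

def pythagorean_win_pct(scored, allowed, exponent=2.0):
    """Classic Pythagorean estimate from season totals, for comparison."""
    rs, ra = sum(scored), sum(allowed)
    return rs ** exponent / (rs ** exponent + ra ** exponent)

# Hypothetical ten-game log, not real data.
scored = [5, 3, 8, 2, 6, 4, 7, 1, 5, 9]
allowed = [4, 6, 3, 2, 5, 7, 2, 3, 4, 6]
print("Oliver:", round(oliver_win_pct(scored, allowed), 3))
print("Pythagorean:", round(pythagorean_win_pct(scored, allowed), 3))
```

Working from the per-game differential directly is the shortcut mentioned above: there is no need to compute the two variances and the covariance separately.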
Jon, since you’ve got the data handy, how about just calculating the R-square and seeing how that works? Just correlate RS with (RS+RA) and square the result. This is quite similar to the original pythag but will correct for degrees of freedom and covariance. The interpretation is “of the total variance of run-scoring in a team’s game, how much is due to that team’s offense.” The R-square would be the estimated win percentage.
Also if you’d be nice enough to send me the raw game score data, there’s something else I’d like to take a crack at.
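A minimal sketch of the R-square suggestion, for anyone following along with their own game logs. The function names and the four-game example are hypothetical. One thing worth noting before reading much into the output: when RS and RA are uncorrelated, the squared correlation works out to var(RS) / (var(RS) + var(RA)), so it reflects the spread of the scores rather than their averages.

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def r_squared_estimate(scored, allowed):
    """Square of the correlation between RS and (RS + RA), game by game."""
    totals = [s + a for s, a in zip(scored, allowed)]
    r = pearson(scored, totals)
    return r * r

# Hypothetical four-game log where RS and RA are uncorrelated.
print(r_squared_estimate([0, 0, 2, 2], [0, 2, 0, 2]))
# Adding five runs to every game on offense leaves the result unchanged.
print(r_squared_estimate([5, 5, 7, 7], [0, 2, 0, 2]))
```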
Jon, since you’ve got the data handy, how about just calculating the R-square and seeing how that works? Just correlate RS with (RS+RA) and square the result. This is quite similar to the original pythag but will correct for degrees of freedom and covariance. The interpretation is “of the total variance of run-scoring in a team’s game, how much is due to that team’s offense.” The R-square would be the estimated win percentage.
I may try that when I get some free time.
Also if you’d be nice enough to send me the raw game score data, there’s something else I’d like to take a crack at.
Walt, go to the spreadsheet that I linked. It contains all the game scores for each AL team.
I think the key to constructing a less empirical approach to predicting team winning percentages based on runs scored and allowed is to use Poisson statistics. I recently constructed a model using the Poisson distribution and the average runs scored data for all 162-game seasons and compared the results to actual winning percentage and Pythagorean winning percentage.
The primary results:
1) The Poissonian distribution model and Pythagorean model gave similar results.
2) The Poissonian model predicts higher and lower winning percentages for the best and worst run differentials than the Pythagorean model does.
3) The Pythagorean model works better.
I have wondered if there would be any interest in my writing up the method and presenting the results here.
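The method itself isn't spelled out in the post above, so the following is only a guess at what a Poisson model of this kind might look like, not Kelly's actual model. It assumes runs scored and runs allowed in a game are independent Poisson variables with the team's per-game averages as means, and it handles ties by renormalizing over the non-tied outcomes (a stand-in for extra innings). All names and the 5.5 and 4.5 run averages are hypothetical.

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson variable; computed in logs for stability."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def poisson_win_pct(rs_per_game, ra_per_game, max_runs=60):
    """Win probability for independent Poisson scoring, ties renormalized."""
    pmf_for = [poisson_pmf(k, rs_per_game) for k in range(max_runs + 1)]
    pmf_against = [poisson_pmf(k, ra_per_game) for k in range(max_runs + 1)]
    win = sum(pmf_for[i] * pmf_against[j]
              for i in range(max_runs + 1) for j in range(i))
    tie = sum(pmf_for[k] * pmf_against[k] for k in range(max_runs + 1))
    # Assumption: tied games are replayed, so split them like non-tied games.
    return win / (1.0 - tie)

def pythagorean_win_pct(rs_per_game, ra_per_game, exponent=2.0):
    """Classic Pythagorean estimate, for comparison."""
    return rs_per_game ** exponent / (
        rs_per_game ** exponent + ra_per_game ** exponent)

# Hypothetical team averaging 5.5 runs scored and 4.5 allowed per game.
print("Poisson:", round(poisson_win_pct(5.5, 4.5), 3))
print("Pythagorean:", round(pythagorean_win_pct(5.5, 4.5), 3))
```

Because a Poisson variable's variance equals its mean, this setup allows less game-to-game spread than real run-scoring shows, which may be one reason such a model would push good and bad teams toward more extreme records.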
Responding to some of these posts here:
Bill James did something like this in the 80s, I’m guessing, but I want to say it was the 1986 Abstract (the article focused on the Dodgers and Mets, and they were both good in 1985)
Joe DiMino, the man who remembers Run Element Ratio! Indeed, Bill did something with the Mets article in 1986 that was somewhat similar.
The graphs would be better as scatterplots rather than bar graphs (assuming that the points would show up when viewed via the web), because scatterplots would enable the user to see the degree of correlation between points scored and allowed. If the two are positively correlated, the scatterplots will look like upward-sloping ellipses. If they are not (as appears to be the case with baseball) then they will look like diffuse spheres.
As per your idea, I did a scatterplot of the Royals. It’s more diffuse than upward sloping.
With the bar graphs, one cannot detect the correlations; indeed it’s rather hard to even figure out if a team outscored its opponents or not.
I originally did these as line graphs instead of column graphs. I actually think the columns look better. But you are right, neither makes it clear whether a team outscored its opponents. You can look at the peaks for a clue, but even that doesn’t tell the whole story. (Witness the Texas Rangers.) Like I said, if you want to play around with these, I gave a link to the original spreadsheet.
how about just calculating the R-square and seeing how that works? Just correlate RS with (RS+RA) and square the result. This is quite similar to the original pythag but will correct for degrees of freedom and covariance. The interpretation is “of the total variance of run-scoring in a team’s game, how much is due to that team’s offense.”
I did this for five of the teams. Unless I’m doing it incorrectly (a possibility), R-squared doesn’t seem to match up well with winning percentages. They were all between .521 and .593. That includes the Royals and Mariners, who were both around .521. Unless the AL was renamed the Lake Wobegon League, I don’t think they’re accurate.
Kelly, I’d contact Dan Szymborski about that. I think that he may be able to accommodate you. If you’re looking for more research, BPro did an article that mentioned Poisson distribution back in 1999.
I wonder if the Oliver method would be more appropriate for 19th Century teams (esp. pre-1893).
Over on the Hall of Merit area, we’ve been discussing whether pitching to the score was a skill/strategy employed in the 19th C. because there essentially was no bullpen. Pitching to the score sounds an awful lot like the baseball equivalent of
‘1.) The tendency for teams to play up or down to their competition, and 2.) “Garbage time”; where the game is no longer in doubt and the team with the lead can give up points without changing the result.’
Wow! Oliver Twist! Cockney Rhyming Slang!
I love this!
Whose twist is it now?
I always wondered about +/- ratios in basketball. They have ‘em in hockey, why not hoops?
They do have them in hoops. It’s not widely published, but it is being done.
A couple of years ago someone showed that the Bulls were best off with both Jamal Crawford and Jay Williams on the floor at the same time, rather than paired up with other guards. (Then Williams cracked up his motorcycle.)
Kelly, you may want to look into the negative binomial distribution. It allows for more dispersion than the Poisson (i.e. should be able to handle very high scoring games a little better).
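To make the dispersion point concrete, here is a small sketch comparing the two distributions at the same mean. The negative binomial is parameterized by a mean and a variance (method of moments), which requires the variance to exceed the mean. The 4.8 runs per game and variance of 9.5 are hypothetical numbers chosen for illustration, not fitted to any real season, and the function names are made up.

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson variable with the given mean."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def neg_binomial_pmf(k, mean, variance):
    """P(X = k) for a negative binomial matched to a mean and variance."""
    if variance <= mean:
        raise ValueError("negative binomial needs variance > mean")
    p = mean / variance                    # success probability
    r = mean * mean / (variance - mean)    # shape (number of successes)
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

def tail_prob(pmf, threshold, **params):
    """P(X >= threshold), computed as 1 minus the lower part of the pmf."""
    return 1.0 - sum(pmf(k, **params) for k in range(threshold))

# Hypothetical scoring level: mean 4.8 runs per game, variance 9.5.
print("P(12+ runs), Poisson:", tail_prob(poisson_pmf, 12, mean=4.8))
print("P(12+ runs), neg. binomial:",
      tail_prob(neg_binomial_pmf, 12, mean=4.8, variance=9.5))
```

With the same average, the negative binomial puts noticeably more weight on blowout-level scores, which is the extra dispersion being described.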