Baseball Primer Newsblog— The Best News Links from the Baseball Newsstand
Monday, December 14, 2009
Direct from the 2010 MIT Sloan Sports Analytics Conference (AKA - Dorkapalooza) comes…
The Pythagorean expectation formula, originally developed by Bill James, provides a reasonably good estimate of the win percentage of a baseball team using the number of runs scored and runs allowed by that team. Improvements on the formula, such as the Pythagenport by Davenport and Woolner, and the Pythagenpat by Smyth and Patriot, allow for variation of the Pythagorean exponent and give a very good estimate for win percentages over a wide range of run environments. This article looks at possible improvements on the Pythagorean formula and its variants that take into account the shapes of the run distributions in addition to the run environment. There is a clear pattern that teams which have a higher slugging percentage score runs more consistently, i.e., their run distributions have a smaller standard deviation, and they tend to win more games than their Pythagorean expectation. This article also examines how metrics that use runs for evaluating teams and players might be adjusted to account for these effects.
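A minimal sketch of the Pythagorean family mentioned above, to make the formulas concrete. It is not the article's code. The Pythagenport exponent (x = 1.50 * log10(RPG) + 0.45) and the Pythagenpat exponent (x = RPG^0.287) are the commonly cited published fits, with RPG counting runs by both teams per game. The season totals in the usage example are hypothetical.

```python
import math

def pythagorean(rs: float, ra: float, exponent: float = 2.0) -> float:
    """Bill James form: W% = RS^x / (RS^x + RA^x); x = 2 is the original."""
    return rs ** exponent / (rs ** exponent + ra ** exponent)

def pythagenport_exponent(rs: float, ra: float, games: int) -> float:
    # Commonly cited Pythagenport fit: x = 1.50 * log10(RPG) + 0.45,
    # where RPG = (RS + RA) / games counts runs by both teams.
    rpg = (rs + ra) / games
    return 1.50 * math.log10(rpg) + 0.45

def pythagenpat_exponent(rs: float, ra: float, games: int) -> float:
    # Commonly cited Pythagenpat fit: x = RPG ** 0.287 (0.28 and 0.29 also appear).
    rpg = (rs + ra) / games
    return rpg ** 0.287

if __name__ == "__main__":
    rs, ra, g = 800, 700, 162  # hypothetical season totals
    for name, x in [("James (x=2)", 2.0),
                    ("Pythagenport", pythagenport_exponent(rs, ra, g)),
                    ("Pythagenpat", pythagenpat_exponent(rs, ra, g))]:
        print(f"{name:14s} x={x:.3f} expected wins={g * pythagorean(rs, ra, x):.1f}")
```

The variable-exponent versions differ from x = 2 only through the run environment; none of them looks at the shape of the run distributions, which is the gap the article addresses.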
Derivations of the Pythagorean expectation formula have been made under certain assumptions. Hein Hundal showed that if run distributions are independent log-normal distributions, then the Pythagorean exponent can be approximated by a particular function of the standard deviation and mean of a typical run distribution. Steven Miller showed that if the run distributions are Weibull distributions, then win percentages are given exactly by the Pythagorean formula, where the Pythagorean exponent is a parameter used to fit the run distributions to data. In both cases the Pythagorean exponent inferred from run distribution data is very close to the empirical value.
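Miller's Weibull result can be checked with a short derivation. This sketch assumes runs scored X and runs allowed Y are independent Weibulls with scales α_RS, α_RA and a common shape γ (the notation is ours and may not match the paper's). A location shift β shared by both distributions cancels when comparing X with Y.

```latex
f_Y(y) = \frac{\gamma}{\alpha_{RA}}\left(\frac{y}{\alpha_{RA}}\right)^{\gamma-1} e^{-(y/\alpha_{RA})^{\gamma}},
\qquad
P(X > y) = e^{-(y/\alpha_{RS})^{\gamma}}

P(X > Y) = \int_0^\infty f_Y(y)\, P(X > y)\, dy
         = \int_0^\infty \frac{1}{\alpha_{RA}^{\gamma}}
           \, e^{-u\left(\alpha_{RS}^{-\gamma} + \alpha_{RA}^{-\gamma}\right)}\, du
           \qquad (u = y^{\gamma},\ du = \gamma y^{\gamma-1}\, dy)

         = \frac{\alpha_{RA}^{-\gamma}}{\alpha_{RS}^{-\gamma} + \alpha_{RA}^{-\gamma}}
         = \frac{\alpha_{RS}^{\gamma}}{\alpha_{RS}^{\gamma} + \alpha_{RA}^{\gamma}}

% Mean of a shifted Weibull: \mu = \beta + \alpha\,\Gamma(1 + 1/\gamma), so \alpha \propto \mu - \beta:
W\% = \frac{(RS - \beta)^{\gamma}}{(RS - \beta)^{\gamma} + (RA - \beta)^{\gamma}}
```

The exponent γ that appears in the win percentage is exactly the shape parameter fitted to the run distributions.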
Repoz
Posted: December 14, 2009 at 01:23 PM | 24 comment(s)
Tags: sabermetrics
Reader Comments and Retorts
1. Jack Keefe Posted: December 14, 2009 at 04:33 PM (#3411750)
The Weibull distribution has been found to model both run distributions and the Pythagorean formula very well. Here's the theoretical paper by Steven Miller:
http://arxiv.org/PS_cache/math/pdf/0509/0509698v4.pdf
THT did some work with the Weibull distribution and its relationship to the Pythagorean record.
http://www.hardballtimes.com/main/article/feast-or-famine-first-draft/
http://www.hardballtimes.com/main/article/avoiding-the-famine/
http://www.hardballtimes.com/main/article/consistency-is-key/
http://www.hardballtimes.com/main/article/consistency-is-key-part-two/
http://www.hardballtimes.com/main/article/consistency-is-inconsistent/
Keith Woolner looked at this a while ago, too:
http://www.baseballprospectus.com/article.php?articleid=472
Lucky that this guy wasn't in high school during World War One.
I've noted this in some analyses I've done in the past, which is why I've argued that statistical analysts tend to overvalue OBP and undervalue SLG. The 1992 Milwaukee Brewers, a fairly classic small-ball team (and one of my favorite study topics), had a very inefficient offense, and although they won 92 games they underperformed their Pyth by 4 games.
Poisson distributions don't model actual scoring distributions quite as well. Ted Turocy did some research on this several years back; if I can find it, I'll post it.
-- MWE
It's much more flexible than the Poisson (or even an over-dispersed Poisson), and it is often used in competing-risks failure-time modeling. Winning vs. losing really is a competing-risks problem: you win if you score more runs than the other guy, and you lose if you don't.
The Weibull is used because it is a bit more flexible AND because it is more analytically tractable. There is a close relationship between the exponential and the Weibull, similar to the relationship between the geometric and the negative binomial distributions. See, for example, http://www.objectivedoe.com/student/ReliabilityResources/weibull1.html .
The negative binomial can be a bit more annoying to work with and there is not much gain to switching to it for RS/RA (given the historical data about how close it fits), so I would guess that is why the Weibull is often preferred.
Since Poisson distributions don't allow for heterogeneity of the mean/variance (i.e., the fact that the mean number of runs differs from game to game because you're playing a different team), I wouldn't be surprised if negative binomials would do the trick, as mentioned above.
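The heterogeneity point can be illustrated with a small simulation. In this minimal sketch, the 4.5 runs per game and the gamma shape of 4 are hypothetical. It draws each game's expected runs from a gamma distribution, standing in for a different opponent each day, and then draws runs from a Poisson with that mean. A gamma-mixed Poisson is a negative binomial, so its variance exceeds its mean, while a plain Poisson has variance equal to its mean.

```python
import math
import random
import statistics
from typing import Optional, Tuple

def poisson_sample(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1

def simulate(n_games: int, mean_runs: float, shape: Optional[float],
             seed: int = 1) -> Tuple[float, float]:
    """Return (mean, variance) of simulated runs per game.

    shape=None: every game has the same expected runs (plain Poisson).
    shape=k: each game's expected runs is drawn from a gamma with mean
    mean_runs and shape k (a hypothetical stand-in for opponent quality),
    so the resulting runs-per-game distribution is negative binomial.
    """
    rng = random.Random(seed)
    runs = []
    for _ in range(n_games):
        lam = mean_runs if shape is None else rng.gammavariate(shape, mean_runs / shape)
        runs.append(poisson_sample(lam, rng))
    return statistics.mean(runs), statistics.pvariance(runs)

if __name__ == "__main__":
    for label, shape in [("plain Poisson", None), ("gamma-mixed", 4.0)]:
        m, v = simulate(100_000, 4.5, shape)
        print(f"{label:13s} mean={m:.2f} variance={v:.2f} ratio={v / m:.2f}")
```

For the gamma mixture, the theoretical variance is mu + mu^2/k, about 9.56 here versus 4.5 for the plain Poisson. This is the over-dispersion a single Poisson cannot capture.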
I would guess that blowouts are difficult to handle in a model because one manager often gives up and leaves a bad pitcher in the game. Blowouts aren't what they used to be, though; a 6-run lead used to be a blowout.
The 3-parameter Weibull is nice because of its flexibility. Beta allows you to re-center for binning, so it's really a two-parameter model, which minimizes the degrees of freedom when fitting. The neat thing is that the gamma parameter *is* the Pythagorean exponent, which makes me wonder whether there is an underlying reason the Weibull is a good fit or whether it is just phenomenological. I like Russ' pseudo-explanation about competing risks; I wonder if there's something there to be more fully fleshed out.
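The claim that gamma *is* the Pythagorean exponent can be checked numerically. This Monte Carlo sketch assumes two unshifted Weibulls with a shared shape. The scales 5.0 and 4.5 and the exponent 1.83 are hypothetical illustration values. Note that Python's `random.weibullvariate(alpha, beta)` takes the scale first and the shape second, so its `beta` is the thread's gamma, not the binning shift.

```python
import random

def pythagorean_from_scales(alpha_rs: float, alpha_ra: float, gamma: float) -> float:
    """Closed form alpha_RS^gamma / (alpha_RS^gamma + alpha_RA^gamma)."""
    return alpha_rs ** gamma / (alpha_rs ** gamma + alpha_ra ** gamma)

def simulated_win_pct(alpha_rs: float, alpha_ra: float, gamma: float,
                      n: int = 200_000, seed: int = 7) -> float:
    """Monte Carlo estimate of P(RS > RA) for independent Weibulls.

    random.weibullvariate(alpha, beta): alpha = scale, beta = shape.
    """
    rng = random.Random(seed)
    wins = sum(
        rng.weibullvariate(alpha_rs, gamma) > rng.weibullvariate(alpha_ra, gamma)
        for _ in range(n)
    )
    return wins / n

if __name__ == "__main__":
    a_rs, a_ra, g = 5.0, 4.5, 1.83  # hypothetical scales and shape
    print("simulated  :", round(simulated_win_pct(a_rs, a_ra, g), 4))
    print("closed form:", round(pythagorean_from_scales(a_rs, a_ra, g), 4))
```

With a common shape and no shift, the means are proportional to the scales. Plugging the average runs scored and allowed into the formula therefore gives the same number as plugging in the scales.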
The author here; Sal Baxamusa alerted me to this discussion.
I did know about the Miller proof (in fact, I mentioned it in the article!). I knew about one of the THT articles -- thanks for the links to the others, plus the Woolner article.
Certainly Weibull is just a phenomenological fit (it's continuous after all).
The problem with the Weibull as used by Miller (other than the fact that it's continuous rather than discrete) is that it becomes a one-parameter distribution after fitting: it loses one parameter when choosing the binning (as you mentioned) and another when fitting to the data. The remaining parameter is, of course, the Pythagorean exponent, but since it's a universal exponent, it doesn't allow for variation in distribution shapes.
I suppose you could use Weibull distributions with different gamma parameters for each team; it would be interesting to see if that gave results similar to mine.
As Mike said, if you want to go to a discrete distribution, Poisson won't work. As Russ said, with Poisson the shape is not independent of the mean. Another way of looking at it is that Poisson assumes events are independent, but runs are not independent events in baseball, since your probability of scoring another run depends on how you scored the last one (i.e., whether you still had men on base). Interestingly, I've found hockey scoring to be well described by Poisson, but that's because goals ARE a good approximation to independent events.
I doubt if any standard discrete distribution accurately reproduces run scoring distributions, which is why I went the modeling route. As depletion mentioned, you need a way to resolve ties, too. The basis for RPG distribution is the RPI (runs per inning) distribution, which also allows you to determine who wins in extra innings.
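Here is a sketch of building an RPG distribution from an RPI distribution and resolving ties. It assumes innings are independent and identically distributed and uses made-up RPI distributions. Extra innings are simplified to "play full innings until one team outscores the other". That ignores home-field batting order and walk-offs, and it is not necessarily how the article's model handles them.

```python
# Hypothetical runs-per-inning distributions: index = runs, value = probability.
RPI_A = [0.73, 0.15, 0.07, 0.03, 0.015, 0.005]
RPI_B = [0.75, 0.14, 0.065, 0.028, 0.012, 0.005]

def convolve(p, q):
    """Distribution of the sum of two independent run counts."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def rpg_distribution(rpi, innings=9):
    """Runs-per-game distribution, assuming independent, identical innings."""
    dist = [1.0]
    for _ in range(innings):
        dist = convolve(dist, rpi)
    return dist

def outcome(p, q):
    """Return (P(first > second), P(tie), P(first < second)) for independent counts."""
    win = tie = loss = 0.0
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i > j:
                win += a * b
            elif i == j:
                tie += a * b
            else:
                loss += a * b
    return win, tie, loss

def game_win_prob(rpi_a, rpi_b, innings=9):
    win, tie, _ = outcome(rpg_distribution(rpi_a, innings),
                          rpg_distribution(rpi_b, innings))
    # Simplified extra innings: the first inning that isn't tied decides it.
    inn_win, _, inn_loss = outcome(rpi_a, rpi_b)
    return win + tie * inn_win / (inn_win + inn_loss)

if __name__ == "__main__":
    print("P(A beats B) =", round(game_win_prob(RPI_A, RPI_B), 4))
```

The same inning-level distribution serves both purposes: nine-fold convolution gives the RPG shape, and single innings break ties.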
I've had this discussion re: Markov chain analyses of baseball, where future and past states are also not independent of the present state but where the Markov chain appears to be a reasonable model nonetheless - does this caveat make sense in that context as well?
-- MWE
I agree: future and past states are also not completely independent (violating the Markov chain hypothesis), but it probably doesn't make that big a difference.
The fact that the W/L parameters derived from the Markov chain analysis also fit the real data fairly well was encouraging, and suggests that the lack of independence is minimal, or at least not a problem.
As it turns out, the difference between the best and worst AL teams in SLG was .081 last year. So we're talking about a 1-win difference here, at the extremes.
While preparing my article I looked at using Weibulls with different gammas to see if a modified Pythagorean expectation followed from it, but the math appeared to become intractable. Of course, if you didn't want a closed form result that wouldn't matter.
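As noted above, a closed form isn't needed if you just want numbers. This sketch computes P(RS > RA) for two unshifted Weibulls with *different* shapes by numerical integration. It rewrites P(X > Y) as the integral over p in (0, 1) of S_X(F_Y^{-1}(p)) and uses a midpoint rule. The parameter values are hypothetical, and this is not the article's method.

```python
import math

def weibull_win_prob(alpha_rs: float, gamma_rs: float,
                     alpha_ra: float, gamma_ra: float, n: int = 100_000) -> float:
    """P(RS > RA) for independent two-parameter Weibulls, by the midpoint rule.

    Uses P(X > Y) = E[S_X(Y)] with
      F_Y^{-1}(p) = alpha_ra * (-ln(1 - p)) ** (1 / gamma_ra)
      S_X(y)      = exp(-(y / alpha_rs) ** gamma_rs)
    so the integrand is bounded in [0, 1] on (0, 1).
    """
    total = 0.0
    for k in range(n):
        p = (k + 0.5) / n
        y = alpha_ra * (-math.log1p(-p)) ** (1.0 / gamma_ra)
        total += math.exp(-((y / alpha_rs) ** gamma_rs))
    return total / n

if __name__ == "__main__":
    # Same scales, different shapes (hypothetical values).
    print(round(weibull_win_prob(5.0, 2.0, 5.0, 1.6), 4))
    print(round(weibull_win_prob(5.0, 1.6, 5.0, 2.0), 4))
```

When the two shapes are equal, the integral reduces to the Pythagorean closed form, which makes a convenient check on the quadrature.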
I think the alpha is the main driver of the run environment (i.e., RPG), so for the Weibull to model the run distribution effectively, I would expect the gamma to mainly affect the shape (i.e., standard deviation) -- although of course both mean and variance are functions of both alpha and gamma.
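The standard Weibull moment formulas show how alpha and gamma share the work. The mean is beta + alpha*Gamma(1 + 1/gamma), and the standard deviation is alpha*sqrt(Gamma(1 + 2/gamma) - Gamma(1 + 1/gamma)^2). This sketch fixes a hypothetical mean of 4.5 runs per game, solves for alpha, and varies gamma. With no shift, the coefficient of variation depends on gamma alone.

```python
import math

def weibull_mean_sd(alpha: float, gamma: float, beta: float = 0.0):
    """Mean and standard deviation of a (possibly shifted) Weibull."""
    g1 = math.gamma(1 + 1 / gamma)
    g2 = math.gamma(1 + 2 / gamma)
    return beta + alpha * g1, alpha * math.sqrt(g2 - g1 ** 2)

def scale_for_mean(mean: float, gamma: float, beta: float = 0.0) -> float:
    """Scale alpha that gives the requested mean for a given shape gamma."""
    return (mean - beta) / math.gamma(1 + 1 / gamma)

if __name__ == "__main__":
    # Hold the run environment fixed (hypothetical 4.5 runs/game) and vary gamma.
    for gamma in (1.5, 1.83, 2.2):
        alpha = scale_for_mean(4.5, gamma)
        mean, sd = weibull_mean_sd(alpha, gamma)
        print(f"gamma={gamma:.2f} alpha={alpha:.3f} mean={mean:.2f} sd={sd:.2f}")
```

At a fixed mean, a larger gamma gives a narrower, more consistent distribution.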
The problem with any continuous distribution is that it doesn't handle ties, which is why I like starting with a RPI distribution, which can handle ties and which also allows you to calculate a RPG distribution.
Yes, but you also have the pitching side, which can give another win. And if you construct your team using these principles, you might be able to accentuate the differences some.
But, agreed, these are not huge effects. The problem is you have to do it for every player, and the gain per player is small. OTOH, if you could do something to add two wins a year, why not do it?
I like the basic result, which is that a SLG .080 higher for a player is worth one additional run. So if you have one player that rates out to a WARP or WAR of 4.3 and another to 4.8, if the first player had a SLG .080 higher, you would actually rate him at 5.3 compared to the other player. Or with a SLG .160 higher, it would be 6.3. That's not insignificant.
Fielding is a large part of a WARP rating, so lately I've been wondering if there is a similar effect in fielding, i.e., do some fielding plays that have the same runs allowed actually lead to different shapes of the runs allowed distribution, which would affect wins? That will be a much harder nut to crack.
No, one extra run would change the 4.3 player to 4.4, a trivial change. (It takes 10 runs to generate one win.)
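The "10 runs per win" rule of thumb can be recovered from the Pythagorean formula. Differentiating W% = RS^x / (RS^x + RA^x) at RS = RA = r runs per game gives dW%/dr = x / (4r). One win therefore costs about 4r / x = 2 * RPG / x runs, where RPG counts both teams. This is a sketch for a roughly .500 team. The default Pythagenpat exponent RPG^0.287 is an assumption, and the run environments in the example are hypothetical.

```python
from typing import Optional

def pythag_wins(rs: float, ra: float, games: int, exponent: float) -> float:
    """Expected wins from season runs scored/allowed and a Pythagorean exponent."""
    return games * rs ** exponent / (rs ** exponent + ra ** exponent)

def runs_per_win(rpg_both_teams: float, exponent: Optional[float] = None) -> float:
    """Approximate runs per marginal win for a roughly .500 team: 2 * RPG / x.

    If no exponent is given, use the Pythagenpat form x = RPG ** 0.287.
    """
    x = rpg_both_teams ** 0.287 if exponent is None else exponent
    return 2 * rpg_both_teams / x

if __name__ == "__main__":
    for rpg in (7.0, 9.0, 10.0):  # hypothetical run environments
        print(f"RPG={rpg:.1f}: about {runs_per_win(rpg):.2f} runs per win")
    # Finite-difference check: 10 extra runs for a 4.5 R/G team with x = 2.
    gain = pythag_wins(739, 729, 162, 2.0) - pythag_wins(729, 729, 162, 2.0)
    print(f"10 runs -> {gain:.2f} wins")
```

Because runs per win scales roughly like RPG^0.713 under this exponent, a low-scoring environment needs fewer runs per win. That is consistent with the later remark that a smaller SLG edge would have been worth a win in 1968.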
Oops, I meant RAR, not WARP. Duh. Anyway, doing it for the whole team (both offense and defense) might be worth 2 wins.
Although the differences are small for one player, it can be used as a tiebreaker, i.e., for two players with the same conventional rating, choose the one with the higher SLG.
Sorry if this was covered already, but is consistency always good? Wouldn't a bad team (one that averages fewer RS than RA) win more games with an inconsistent offense?
I agree they are not large effects, and noise makes it less useful. The base-running aspect is definitely too small to be useful!
And the effect varies with the run environment. In 1968, it would have taken only a .050 higher team SLG to add a win -- of course SLG separation would probably be harder to achieve in a lower run environment.
That's an interesting question. There are some subtleties I didn't get into in the article (there was a length limit on the original, and I didn't rewrite it for the web):
Although I used standard deviation as a measure of the shape (how wide it is), you can also have skew (more probability above or below the average), in which case the distribution is not symmetric about the mean.
If you had a skew with more probability above the mean, then a wider distribution (more inconsistent) WOULD be helpful, no matter what your RPG. However, realistic baseball distributions always have more probability below the mean, and in that case a wider distribution is bad.
A neat (non-realistic) example showing these effects is to consider three teams, A who scores 4 runs two-thirds of the time and 7 runs the other third (call it a 447 distribution), B with a 663 distribution, and C who always scores 5 runs (call it 555). All of these have an average RPG of 5.0.
A is sort of shaped like a baseball team (more probability below the mean), C is perfectly consistent, and B is unrealistic (more probability above the mean).
It turns out that C beats A (6 games out of 9), A beats B (5 games out of 9), and B beats C (6 games out of 9)! For perfectly general distributions, "which team is better" need not be transitive.
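The 447/663/555 example can be verified by enumerating all nine equally likely pairings of single-game run totals. The team definitions come straight from the example above. The helper names are hypothetical, and exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Each team is a list of equally likely single-game run totals.
TEAMS = {"A": [4, 4, 7], "B": [6, 6, 3], "C": [5, 5, 5]}

def head_to_head(x, y):
    """Return (P(x outscores y), P(tie)) over all equally likely pairings."""
    pairs = list(product(x, y))
    wins = sum(1 for a, b in pairs if a > b)
    ties = sum(1 for a, b in pairs if a == b)
    return Fraction(wins, len(pairs)), Fraction(ties, len(pairs))

def share_below_mean(x):
    """Fraction of games in which the team scores below its own average."""
    mean = Fraction(sum(x), len(x))
    return Fraction(sum(1 for r in x if r < mean), len(x))

if __name__ == "__main__":
    for first, second in [("C", "A"), ("A", "B"), ("B", "C")]:
        win, tie = head_to_head(TEAMS[first], TEAMS[second])
        print(f"{first} beats {second}: {win} (ties: {tie})")
    for name, runs in TEAMS.items():
        print(f"{name}: mean {Fraction(sum(runs), len(runs))}, "
              f"below mean {share_below_mean(runs)}")
```

Fractions print in lowest terms, so 6/9 appears as 2/3. None of the three matchups can end tied, and all three teams average 5 runs. A has two-thirds of its games below its mean (the realistic shape), B has one-third, and C has none.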
In real life, the skew correlates very closely with the standard deviation, always has more probability below the mean, and doesn't really add anything to the analysis. The team closest to perfect consistency (like team C) will do better for the same RPG.