Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

## Monday, December 14, 2009

#### Whisnant: Beyond Pythagorean Expectation: How Run Distributions Affect Win Percentage

Direct from the 2010 MIT Sloan Sports Analytics Conference (AKA - Dorkapalooza) comes…

The Pythagorean expectation formula, originally developed by Bill James, provides a reasonably good estimate of the win percentage of a baseball team using the number of runs scored and runs allowed by that team.  Improvements on the formula, such as the Pythagenport by Davenport and Woolner, and the Pythagenpat by Smyth and Patriot, allow for variation of the Pythagorean exponent and give a very good estimate for win percentages over a wide range of run environments. This article looks at possible improvements on the Pythagorean formula and its variants that take into account the shapes of the run distributions in addition to the run environment. There is a clear pattern that teams which have a higher slugging percentage score runs more consistently, i.e., their run distributions have a smaller standard deviation, and they tend to win more games than their Pythagorean expectation. This article also examines how metrics that use runs for evaluating teams and players might be adjusted to account for these effects.
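The three estimators named above are short enough to sketch in Python. The constants are assumptions taken from commonly quoted versions of the formulas, not from the article: an exponent of 2 for James's original (1.83 is also widely used), Pythagenport x = 1.50·log10(RPG) + 0.45, and Pythagenpat x = RPG^0.287, where RPG = (RS + RA)/G counts runs by both teams. The team totals are hypothetical.

```python
import math


def pythagorean_win_pct(rs: float, ra: float, exponent: float = 2.0) -> float:
    """Estimated win fraction RS^x / (RS^x + RA^x)."""
    return rs**exponent / (rs**exponent + ra**exponent)


def pythagenport_exponent(rs: float, ra: float, games: int) -> float:
    # Assumed commonly quoted form: x = 1.50 * log10(RPG) + 0.45,
    # where RPG = (RS + RA) / G counts runs by both teams.
    rpg = (rs + ra) / games
    return 1.50 * math.log10(rpg) + 0.45


def pythagenpat_exponent(rs: float, ra: float, games: int, z: float = 0.287) -> float:
    # Assumed commonly quoted form: x = RPG ** z, with z around 0.28 to 0.29.
    rpg = (rs + ra) / games
    return rpg**z


# Hypothetical team: 800 runs scored and 700 allowed over 162 games.
rs, ra, games = 800, 700, 162
estimates = {
    "James (x=2)": 2.0,
    "Pythagenport": pythagenport_exponent(rs, ra, games),
    "Pythagenpat": pythagenpat_exponent(rs, ra, games),
}
for name, x in estimates.items():
    pct = pythagorean_win_pct(rs, ra, x)
    print(f"{name:13s} x={x:.3f}  W%={pct:.3f}  wins={pct * games:.1f}")
```

In a typical run environment the two variable exponents land close to the familiar 1.83 to 2 range, which is why all three give similar answers for ordinary teams.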

Derivations of the Pythagorean expectation formula have been made under certain assumptions. Hein Hundal showed that if run distributions are independent log-normal distributions, then the Pythagorean exponent can be approximated by a particular function of the standard deviation and mean of a typical run distribution. Steven Miller showed that if the run distributions are Weibull distributions, then win percentages are given exactly by the Pythagorean formula, where the Pythagorean exponent is a parameter used to fit the run distributions to data. In both cases the Pythagorean exponent inferred from run distribution data is very close to the empirical value.
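Miller's Weibull result is easy to check numerically. For independent Weibull variables with a common shape γ and scales α_s and α_a, P(X > Y) = α_s^γ / (α_s^γ + α_a^γ); because the mean of a Weibull is α·Γ(1 + 1/γ), the ratio of scales equals the ratio of mean runs, which gives the Pythagorean formula with exponent γ. A minimal Monte Carlo sketch, assuming zero shift (Miller's three-parameter form also has a shift), an illustrative common shape of 1.83, and hypothetical per-game means:

```python
import math
import random


def weibull_scale_for_mean(mean: float, shape: float) -> float:
    # An (unshifted) Weibull has mean scale * Gamma(1 + 1/shape); solve for scale.
    return mean / math.gamma(1.0 + 1.0 / shape)


def simulate_win_pct(rs_mean: float, ra_mean: float, shape: float,
                     n: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(runs scored > runs allowed) for independent Weibulls."""
    rng = random.Random(seed)
    scale_for = weibull_scale_for_mean(rs_mean, shape)
    scale_against = weibull_scale_for_mean(ra_mean, shape)
    wins = 0
    for _ in range(n):
        # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
        if rng.weibullvariate(scale_for, shape) > rng.weibullvariate(scale_against, shape):
            wins += 1
    return wins / n


shape = 1.83              # assumed common shape, playing the role of the exponent
rs_pg, ra_pg = 5.0, 4.4   # hypothetical runs scored / allowed per game
simulated = simulate_win_pct(rs_pg, ra_pg, shape)
pythagorean = rs_pg**shape / (rs_pg**shape + ra_pg**shape)
print(f"simulated {simulated:.4f} vs Pythagorean {pythagorean:.4f}")
```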

Repoz Posted: December 14, 2009 at 12:23 PM | 24 comment(s)
Tags: sabermetrics


Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

1. Jack Keefe Posted: December 14, 2009 at 03:33 PM (#3411750)
Hey whats wrong with you score more runs than the other guys and you win the God Dam game Al.
2. Hang down your head, Tom Foley Posted: December 14, 2009 at 03:43 PM (#3411762)
Keefe knows how to win.
3. The importance of being Ernest Riles Posted: December 14, 2009 at 03:47 PM (#3411768)
There has been some previous work in the area of which I'm not sure the author is aware. Hopefully they surf over to BTF and find this comment.

The Weibull distribution has been found to model both run distribution and the Pythagorean formula very well. Here's the theoretical paper by Steven Miller:

http://arxiv.org/PS_cache/math/pdf/0509/0509698v4.pdf

THT did some work with the Weibull distribution and its relationship to the Pythagorean record.

http://www.hardballtimes.com/main/article/feast-or-famine-first-draft/
http://www.hardballtimes.com/main/article/avoiding-the-famine/
http://www.hardballtimes.com/main/article/consistency-is-key/
http://www.hardballtimes.com/main/article/consistency-is-key-part-two/
http://www.hardballtimes.com/main/article/consistency-is-inconsistent/

Keith Woolner looked at this a while ago, too:

http://www.baseballprospectus.com/article.php?articleid=472
4.  Posted: December 14, 2009 at 03:59 PM (#3411781)
Hein Hundal

Lucky that this guy wasn't in high school during World War One.
5. depletion Posted: December 14, 2009 at 04:28 PM (#3411807)
I've often wondered why a continuous distribution, such as a Weibull, is used for modelling what are clearly events with discrete outcomes. The author, Miller, referenced in (3) above mentions this difficulty but doesn't address it. Why not use a Poisson distribution, or some other discrete distribution? The issue of ties and extra innings would still need to be addressed. Once the game goes into extra innings, a separate distribution could be used as the gap in runs for the game is expected to be much smaller than otherwise.
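To see why ties matter once the model is discrete, here is a minimal sketch that treats each team's runs per game as an independent Poisson variable and tallies wins, losses, and ties. Poisson is used only because it is the simplest discrete choice (later comments explain why it fits real scoring poorly), and the means are hypothetical.

```python
import math


def poisson_pmf(mean: float, max_runs: int = 40) -> list[float]:
    # Truncated Poisson pmf; the mass above max_runs is negligible for
    # means in the usual runs-per-game range.
    return [math.exp(-mean) * mean**k / math.factorial(k) for k in range(max_runs + 1)]


def outcome_probs(rs_mean: float, ra_mean: float) -> tuple[float, float, float]:
    """(P(win), P(loss), P(tie)) if both run totals are independent Poissons."""
    p_for, p_against = poisson_pmf(rs_mean), poisson_pmf(ra_mean)
    win = loss = tie = 0.0
    for i, pi in enumerate(p_for):
        for j, pj in enumerate(p_against):
            if i > j:
                win += pi * pj
            elif i < j:
                loss += pi * pj
            else:
                tie += pi * pj
    return win, loss, tie


win, loss, tie = outcome_probs(4.5, 4.5)  # hypothetical evenly matched teams
print(f"win {win:.3f}  loss {loss:.3f}  tie {tie:.3f}")
```

With two evenly matched 4.5-run teams, more than one game in ten ends level in this model, so some rule for extra innings is unavoidable.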
6.  Posted: December 14, 2009 at 04:43 PM (#3411825)
There is a clear pattern that teams which have a higher slugging percentage score runs more consistently, i.e., their run distributions have a smaller standard deviation, and they tend to win more games than their Pythagorean expectation.

I've noted this in some analyses I've done in the past, which is why I've argued that statistical analysts tend to overvalue OBP and undervalue SLG. The 1992 Milwaukee Brewers, a fairly classic small-ball team (and one of my favorite study topics), had a very inefficient offense, and although they won 92 games they underperformed their Pyth by 4 games.

Why not use a Poisson distribution, or some other discrete distribution?

Poisson distributions don't model actual scoring distributions quite as well. Ted Turocy did some research on this several years back, which if I can find it I'll post.

-- MWE
7. Russ Posted: December 14, 2009 at 04:43 PM (#3411826)
I've often wondered why a continuous distribution, such as a Weibull, is used for modelling what are clearly events with discrete outcomes.

It's much more flexible than the Poisson (or even an over-dispersed Poisson) and it is often used in competing risks failure time modeling. Winning vs. losing is really competing risks... you win if you score more runs than the other guy, you lose if you don't.

The reason why Weibull is used is that it is a bit more flexible AND because it is more analytically tractable. There is a close relationship between the exponential/weibull which is similar to the relationship between the geometric/negative binomial distributions. See, for example, http://www.objectivedoe.com/student/ReliabilityResources/weibull1.html .
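The exponential/Weibull link can be checked in a few lines: if X is Weibull with scale α and shape γ, then (X/α)^γ is a standard exponential, so a Weibull is a power-transformed exponential. That is also what keeps the competing-risks calculation tractable, since for independent exponentials with rates λ1 and λ2, P(E1 < E2) = λ1/(λ1 + λ2). A minimal sketch with arbitrary, purely illustrative parameters:

```python
import random
import statistics

rng = random.Random(7)
scale, shape = 4.8, 1.83  # arbitrary illustrative parameters

samples = [rng.weibullvariate(scale, shape) for _ in range(100_000)]
# If X ~ Weibull(scale, shape), then (X / scale) ** shape ~ Exponential(1).
transformed = [(x / scale) ** shape for x in samples]

mean_t = statistics.fmean(transformed)
var_t = statistics.fmean((t - mean_t) ** 2 for t in transformed)  # population variance
print(f"mean {mean_t:.3f}, variance {var_t:.3f}  (Exponential(1): both equal 1)")
```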

The negative binomial can be a bit more annoying to work with and there is not much gain to switching to it for RS/RA (given the historical data about how close it fits), so I would guess that is why the Weibull is often preferred.
8. Russ Posted: December 14, 2009 at 04:45 PM (#3411830)
Poisson distributions don't model actual scoring distributions quite as well. Ted Turocy did some research on this several years back, which if I can find it I'll post.

Poisson distributions don't allow for heterogeneity of mean/variance (i.e., the fact that the mean number of runs will be different in each game, because you're playing a different team). I wouldn't be surprised if negative binomials would do the trick, as mentioned above.
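The heterogeneity point can be illustrated by simulation. A Poisson count has variance equal to its mean; if the scoring rate instead changes from game to game and is modeled as Gamma-distributed, the resulting Gamma-Poisson mixture is a negative binomial whose variance exceeds its mean. A minimal sketch, assuming a hypothetical mean of 4.5 runs and a Gamma shape of 5 (both made up for illustration); Python's standard library has no Poisson sampler, so a simple one is included:

```python
import math
import random
import statistics


def poisson_sample(rng: random.Random, lam: float) -> int:
    # Knuth's multiplication method; adequate for small means like runs per game.
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def mean_and_variance(xs: list[int]) -> tuple[float, float]:
    m = statistics.fmean(xs)
    return m, statistics.fmean((x - m) ** 2 for x in xs)


rng = random.Random(3)
mean_runs, r = 4.5, 5.0  # hypothetical mean and Gamma shape (dispersion)
n = 100_000

# Same scoring rate every game: the variance comes out close to the mean.
plain = [poisson_sample(rng, mean_runs) for _ in range(n)]
# Per-game rate drawn from Gamma(shape=r, scale=mean_runs/r), whose average is
# mean_runs; the mixture is negative binomial with variance mean + mean**2 / r.
mixed = [poisson_sample(rng, rng.gammavariate(r, mean_runs / r)) for _ in range(n)]

print("plain Poisson  mean/var:", mean_and_variance(plain))
print("Gamma mixture  mean/var:", mean_and_variance(mixed))
```

Both samples average about 4.5 runs, but only the mixture shows the extra spread (variance near 4.5 + 4.5²/5) that varying opponents would produce.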
9. depletion Posted: December 14, 2009 at 05:07 PM (#3411864)
Russ and Mike: Thanks for the response. I'm pretty good with math (electrical engineer) but don't often have the time to look into baseball math as much as I'd like. I agree that the Weibull is more flexible than the Poisson. I worked in reliability engineering a long time ago and do remember it from then.

I would guess that blowouts are difficult to handle in a model because one manager often gives up and leaves a bad pitcher in the game. Although blowouts aren't what they used to be. A 6-run lead used to be a blowout.
10. The importance of being Ernest Riles Posted: December 14, 2009 at 06:52 PM (#3412019)
Echo the thanks offered in 9. I'm also an engineer (chemical).

The 3-parameter Weibull is nice because of its flexibility. Beta allows you to re-center for binning, so it's really a two-parameter model. This minimizes the degrees of freedom when fitting. The neat thing is that the gamma parameter *is* the Pythagorean exponent, which makes me wonder if there is an underlying reason that the Weibull is a good fit or whether it is just phenomenological. I like Russ' pseudo-explanation about competing risks. I wonder if there's something there to be more fully fleshed out.
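With a shift β shared by both teams, the three-parameter version keeps the same structure: the shift cancels when the two teams are compared, and the win estimate becomes (RS - β)^γ / ((RS - β)^γ + (RA - β)^γ), with γ playing the role of the Pythagorean exponent. A minimal sketch, assuming β = -0.5 as a hypothetical default (it puts each integer run total in the middle of a unit-width bin) and an illustrative γ = 1.83:

```python
import math


def shifted_weibull_scale(mean_runs: float, shape: float, shift: float = -0.5) -> float:
    # A shifted Weibull has mean shift + scale * Gamma(1 + 1/shape); solve for scale.
    # shift=-0.5 is a hypothetical default that centers integer totals in unit bins.
    return (mean_runs - shift) / math.gamma(1.0 + 1.0 / shape)


def miller_style_win_pct(rs: float, ra: float, shape: float = 1.83, shift: float = -0.5) -> float:
    # With a common shift and shape, the shift cancels when comparing the two
    # teams, so P(runs for > runs against) depends only on the scale ratio,
    # which equals (RS - shift) / (RA - shift).
    scale_for = shifted_weibull_scale(rs, shape, shift)
    scale_against = shifted_weibull_scale(ra, shape, shift)
    return scale_for**shape / (scale_for**shape + scale_against**shape)


# Hypothetical team averaging 5.0 runs scored and 4.4 allowed per game.
print(f"{miller_style_win_pct(5.0, 4.4):.4f}")
```

Setting `shift=0.0` recovers the plain Pythagorean formula.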
11. KerryW Posted: December 14, 2009 at 08:35 PM (#3412187)
Sid,

The author here; Sal Baxamusa alerted me to this discussion.

I did know about the Miller proof (in fact, I mentioned it in the article!). I knew about one of the THT articles -- thanks for the links to the others, plus the Woolner article.

Certainly Weibull is just a phenomenological fit (it's continuous after all).

The problem with the Weibull, as used by Miller (other than the fact that it's continuous and not discrete) is that it becomes a one-parameter distribution after fitting -- it loses one parameter when choosing the binning (as you mentioned), and loses another when fitting to the data. The remaining parameter is the Pythagorean exponent, of course, but since it's a universal exponent, it doesn't allow for variation of distribution shapes.

I suppose you could use Weibull distributions with different gamma parameters for each team; it would be interesting to see if that gave results similar to mine.

As Mike said, if you want to go to a discrete distribution, Poisson won't work. As Russ said, with Poisson the shape is not independent of the mean. Another way of looking at it is that Poisson assumes events are independent, but runs are not independent events in baseball, since your probability of scoring another run depends on how you scored the last one (i.e., whether you had men still on base). Interestingly, I've found hockey scoring to be well described by Poisson, but that's because goals ARE a good approximation to independent events.

I doubt that any standard discrete distribution accurately reproduces run-scoring distributions, which is why I went the modeling route. As depletion mentioned, you need a way to resolve ties, too. The basis for the RPG (runs per game) distribution is the RPI (runs per inning) distribution, which also allows you to determine who wins in extra innings.
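Building a per-game distribution from a per-inning one, and resolving ties inning by inning, can be sketched as follows. This is not Kerry's model: it assumes innings are independent and identically distributed, ignores the home team skipping or cutting short its last at-bat, and uses made-up per-inning probabilities. Nine-inning totals come from repeated convolution; because every extra inning is an independent repeat, a tied game is won with probability P(win an inning) / (P(win an inning) + P(lose an inning)).

```python
def convolve(p: list[float], q: list[float]) -> list[float]:
    """Distribution of the sum of two independent run counts."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def game_distribution(rpi: list[float], innings: int = 9) -> list[float]:
    # Assumes innings are independent and identically distributed.
    dist = [1.0]
    for _ in range(innings):
        dist = convolve(dist, rpi)
    return dist


def compare(p: list[float], q: list[float]) -> tuple[float, float, float]:
    """(P(A > B), P(A < B), P(A == B)) for independent A ~ p and B ~ q."""
    more = less = same = 0.0
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i > j:
                more += pi * qj
            elif i < j:
                less += pi * qj
            else:
                same += pi * qj
    return more, less, same


def win_probability(rpi_for: list[float], rpi_against: list[float]) -> tuple[float, float]:
    """Return (P(win), P(tied after nine)) under the simplified model."""
    win9, _, tie9 = compare(game_distribution(rpi_for), game_distribution(rpi_against))
    w1, l1, _ = compare(rpi_for, rpi_against)
    # Extra innings repeat until one side outscores the other in an inning.
    return win9 + tie9 * w1 / (w1 + l1), tie9


# Hypothetical per-inning distributions for 0, 1, ..., 5 runs (not real data).
rpi_for = [0.72, 0.14, 0.07, 0.04, 0.02, 0.01]
rpi_against = [0.74, 0.14, 0.06, 0.03, 0.02, 0.01]
p_win, p_tie9 = win_probability(rpi_for, rpi_against)
print(f"P(win) = {p_win:.3f}, P(tied after nine) = {p_tie9:.3f}")
```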
12.  Posted: December 14, 2009 at 08:51 PM (#3412200)
Thanks for showing up, Kerry.

runs are not independent events in baseball since your probability of scoring another run depends on how you score the last one (i.e., whether you had men still on base).

I've had this discussion re: Markov chain analyses of baseball, where future and past states are also not independent of the present state but where the Markov chain appears to be a reasonable model nonetheless - does this caveat make sense in that context as well?

-- MWE
13. The importance of being Ernest Riles Posted: December 14, 2009 at 08:53 PM (#3412208)
Kerry/11: I did it once with different gamma parameters on a per-team basis. The interpretation of that is the varying gammas reflect different run environments for each team.
14. KerryW Posted: December 14, 2009 at 09:15 PM (#3412279)
Mike,

I agree, future and past states are also not completely independent (violating the Markov chain hypothesis), but it probably doesn't make that big a difference.

The fact that the W/L parameters derived from the Markov chain analysis also fit the real data fairly well was encouraging, and suggests that the lack of independence is minimal, or at least not a problem.
15. The importance of being Ernest Riles Posted: December 14, 2009 at 09:30 PM (#3412312)
If two teams have the same value in runs (using whatever metric you prefer), and one team has a SLG that is .080 higher, then that team should expect to win about one more game a season, even though it has the same RPG as the other team.

As it turns out, the difference between the best and worst AL teams in SLG was .081 last year. So we're talking about a 1-win difference here, between the two most extreme teams.
16. KerryW Posted: December 14, 2009 at 09:49 PM (#3412368)
Sid,

While preparing my article I looked at using Weibulls with different gammas to see if a modified Pythagorean expectation followed from it, but the math appeared to become intractable. Of course, if you didn't want a closed form result that wouldn't matter.
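Without a closed form, the per-team-gamma version is still a one-dimensional integral: P(X > Y) = ∫ f_Y(y) · P(X > y) dy, where for a Weibull P(X > y) = exp(-(y/α)^γ). A minimal numerical sketch, assuming zero shift, scales chosen to match each team's mean, and hypothetical parameters; with equal shapes it should reproduce the Pythagorean closed form.

```python
import math


def weibull_scale_for_mean(mean: float, shape: float) -> float:
    return mean / math.gamma(1.0 + 1.0 / shape)


def weibull_pdf(y: float, scale: float, shape: float) -> float:
    z = y / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z**shape))


def weibull_survival(y: float, scale: float, shape: float) -> float:
    return math.exp(-((y / scale) ** shape))


def p_outscore(mean_for: float, shape_for: float, mean_against: float,
               shape_against: float, upper: float = 60.0, steps: int = 20_000) -> float:
    """P(X > Y) = integral of f_Y(y) * P(X > y) dy, by the midpoint rule."""
    scale_for = weibull_scale_for_mean(mean_for, shape_for)
    scale_against = weibull_scale_for_mean(mean_against, shape_against)
    h = upper / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * h  # midpoints avoid evaluating the density at y = 0
        total += weibull_pdf(y, scale_against, shape_against) * weibull_survival(y, scale_for, shape_for)
    return total * h


# Hypothetical teams with equal means but different shapes.
print(f"{p_outscore(4.5, 2.1, 4.5, 1.7):.4f}")
```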

I think the alpha is the main driver of the run environment (i.e., RPG), so for the Weibull to model the run distribution effectively, I would expect the gamma to mainly affect the shape (i.e., standard deviation) -- although of course both mean and variance are functions of both alpha and gamma.
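The division of labor between the two parameters can be checked directly: hold the mean fixed, solve for α at each γ using mean = α·Γ(1 + 1/γ), and look at the standard deviation α·sqrt(Γ(1 + 2/γ) - Γ(1 + 1/γ)²). A minimal sketch, assuming zero shift and a hypothetical mean of 4.5 runs per game; larger γ gives a narrower, more consistent distribution.

```python
import math


def weibull_scale_for_mean(mean: float, shape: float) -> float:
    return mean / math.gamma(1.0 + 1.0 / shape)


def weibull_sd(scale: float, shape: float) -> float:
    # Var = scale^2 * (Gamma(1 + 2/shape) - Gamma(1 + 1/shape)^2)
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return scale * math.sqrt(g2 - g1 * g1)


mean_runs = 4.5  # hypothetical runs per game, held fixed while the shape varies
shapes = [1.6, 1.83, 2.0, 2.2]
sds = []
for shape in shapes:
    scale = weibull_scale_for_mean(mean_runs, shape)
    sds.append(weibull_sd(scale, shape))
    print(f"shape={shape:.2f}  scale={scale:.3f}  sd={sds[-1]:.3f}")
```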

The problem with any continuous distribution is that it doesn't handle ties, which is why I like starting with an RPI distribution, which can handle ties and which also allows you to calculate an RPG distribution.
17. KerryW Posted: December 14, 2009 at 10:03 PM (#3412394)
Sid/15,

Yes, but you also have the pitching side, which can give another win. And if you construct your team using these principles, you might be able to accentuate the differences some.

But, agreed, these are not huge effects. The problem is you have to do it for every player, and the gain per player is small. OTOH, if you could do something to add two wins a year, why not do it?

I like the basic result, which is that a SLG .080 higher for a player is worth one additional run. So if you have one player that rates out to a WARP or WAR of 4.3 and another to 4.8, if the first player had a SLG .080 higher, you would actually rate him at 5.3 compared to the other player. Or with a SLG .160 higher, it would be 6.3. That's not insignificant.

Fielding is a large part of a WARP rating, so lately I've been wondering if there is a similar effect in fielding, i.e., do some fielding plays that have the same runs allowed actually lead to different shapes of the runs allowed distribution, which would affect wins? That will be a much harder nut to crack.
18. GuyM Posted: December 14, 2009 at 10:17 PM (#3412435)
I like the basic result, which is that a SLG .080 higher for a player is worth one additional run. So if you have one player that rates out to a WARP or WAR of 4.3 and another to 4.8, if the first player had a SLG .080 higher, you would actually rate him at 5.3 compared to the other player.

No, one extra run would change the 4.3 player to 4.4, a trivial change. (It takes 10 runs to generate one win.)
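The rule of thumb follows from the Pythagorean formula itself. Differentiating W% = RS^x/(RS^x + RA^x) at RS = RA = R runs per game gives dW%/dRS = x/(4R), so over a season one extra run is worth about x/(4R) wins, or about 4R/x runs per win. A minimal sketch, assuming the Pythagenpat exponent with z = 0.287 and a hypothetical .500 team, with a finite-difference check against the full formula:

```python
def pythagenpat_exponent(rs: float, ra: float, games: int, z: float = 0.287) -> float:
    # Assumed Pythagenpat form: x = ((RS + RA) / G) ** z
    return ((rs + ra) / games) ** z


def season_wins(rs: float, ra: float, games: int = 162) -> float:
    x = pythagenpat_exponent(rs, ra, games)
    return games * rs**x / (rs**x + ra**x)


def runs_per_win(runs_per_game_each: float, z: float = 0.287) -> float:
    # Near .500 (RS = RA = R per game), dW%/dRS = x / (4R), so one extra run
    # over a season is worth x / (4R) wins and runs per win is about 4R / x.
    x = (2 * runs_per_game_each) ** z
    return 4 * runs_per_game_each / x


print(f"runs per win at 4.5 R/G: {runs_per_win(4.5):.2f}")
print(f"runs per win at 3.5 R/G: {runs_per_win(3.5):.2f}")
# Finite-difference check: add one run to a .500 team with 729 RS and 729 RA.
print(f"wins from one extra run: {season_wins(730, 729) - season_wins(729, 729):.4f}")
```

Under these assumptions the figure comes out a little under 10 runs per win at 4.5 runs per team per game, and a lower-scoring environment needs fewer runs per win.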
19. The importance of being Ernest Riles Posted: December 14, 2009 at 10:23 PM (#3412448)
16: Yes, I agree that starting from an RPI is the more correct, if not more tractable, way to handle things.
20. KerryW Posted: December 14, 2009 at 10:39 PM (#3412479)
GuyM,

Oops, I meant RAR, not WARP. Duh. Anyway, doing it for the whole team (both offense and defense) might be worth 2 wins.

Although the differences are small for one player, it can be used as a tiebreaker, i.e., for two players with the same conventional rating, choose the one with the higher SLG.
21. The importance of being Ernest Riles Posted: December 15, 2009 at 02:27 PM (#3413132)
20: I suspect that the greater utility for this will be descriptive and not prescriptive, since it is next to impossible to build a team with the degree of certainty necessary to use these results. And even descriptively, the noise involved would drown out almost all of the signal. I'm not trying to knock your work (I rather like it), but I do want to point out that the subtleties of run distribution, even when you can tease them out theoretically, are almost always drowned out by random variation in the experiment.
22. Danny Posted: December 15, 2009 at 03:20 PM (#3413201)
There is a clear pattern that teams which have a higher slugging percentage score runs more consistently, i.e., their run distributions have a smaller standard deviation, and they tend to win more games than their Pythagorean expectation.

Sorry if this was covered already, but is consistency always good? Wouldn't a bad team (one that averages fewer RS than RA) win more games with an inconsistent offense?
23. KerryW Posted: December 15, 2009 at 03:27 PM (#3413208)
Sid/21,

I agree they are not large effects, and noise makes it less useful. The base-running aspect is definitely too small to be useful!

And the effect varies with the run environment. In 1968, it would have taken only a .050 higher team SLG to add a win -- of course SLG separation would probably be harder to achieve in a lower run environment.
24. KerryW Posted: December 15, 2009 at 03:52 PM (#3413261)
Danny,

That's an interesting question. There are some subtleties I didn't get into in the article (there was a length limit on the original, and I didn't rewrite it for the web):

Although I used standard deviation as a measure of the shape (how wide it is), you can also have a skew as well (more probability above or below the average), in which case the distribution is not symmetric about the mean.

If you had a skew with more probability above the mean, then a wider distribution (more inconsistent) WOULD be helpful, no matter what your RPG. However, realistic baseball distributions always have more probability below the mean, and in that case a wider distribution is bad.

A neat (non-realistic) example showing these effects is to consider three teams, A who scores 4 runs two-thirds of the time and 7 runs the other third (call it a 447 distribution), B with a 663 distribution, and C who always scores 5 runs (call it 555). All of these have an average RPG of 5.0.

A is sort of shaped like a baseball team (more probability below the mean), C is perfectly consistent, and B is unrealistic (more probability above the mean).
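To pin down the shape language for these toy teams, the sketch below computes each one's mean, standard deviation, share of games below the mean, and ordinary moment skewness. A and B have the same mean (5) and the same standard deviation (sqrt 2); they differ only in which side the long tail is on. In the usual moment sense, "more probability below the mean" (team A) corresponds to positive, right-tailed skewness.

```python
from fractions import Fraction

# Toy distributions from the example: each listed score is equally likely.
teams = {"A": [4, 4, 7], "B": [6, 6, 3], "C": [5, 5, 5]}


def shape_summary(runs: list[int]) -> tuple[Fraction, float, Fraction, float]:
    """Mean, SD, share of games below the mean, and moment skewness."""
    n = len(runs)
    mean = Fraction(sum(runs), n)
    var = sum((r - mean) ** 2 for r in runs) / n
    third = sum((r - mean) ** 3 for r in runs) / n
    below = Fraction(sum(1 for r in runs if r < mean), n)
    # Skewness is undefined for a constant team; report 0.0 by convention here.
    skew = float(third) / float(var) ** 1.5 if var else 0.0
    return mean, float(var) ** 0.5, below, skew


summaries = {name: shape_summary(runs) for name, runs in teams.items()}
for name, (mean, sd, below, skew) in summaries.items():
    print(f"{name}: mean={mean} sd={sd:.3f} share below mean={below} skewness={skew:+.3f}")
```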

It turns out that C beats A (6 games out of 9), A beats B (5 games out of 9) and B beats C (6 games out of 9)! For perfectly general distributions, which team is better is non-transitive.
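The non-transitive results can be verified by enumerating all nine equally likely pairings for each matchup. A short sketch using exact fractions; none of these matchups can end in a tie, so every pairing is either a win or a loss:

```python
from fractions import Fraction
from itertools import product

# Same toy teams: each listed score is equally likely.
teams = {"A": [4, 4, 7], "B": [6, 6, 3], "C": [5, 5, 5]}


def p_beats(x: str, y: str) -> Fraction:
    """Probability that team x outscores team y in one game."""
    pairings = list(product(teams[x], teams[y]))
    wins = sum(1 for a, b in pairings if a > b)
    return Fraction(wins, len(pairings))


for x, y in [("C", "A"), ("A", "B"), ("B", "C")]:
    print(f"{x} beats {y} with probability {p_beats(x, y)}")
```

Fractions print in lowest terms, so 6/9 shows as 2/3. Each team beats one rival and loses to the other, which is exactly the cycle described above.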

In real life, the skew correlates very closely with the standard deviation, always has more probability below the mean, and doesn't really add anything to the analysis. The team closest to perfect consistency (like team C) will do better for the same RPG.
