Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

## Monday, December 14, 2009

#### Whisnant: Beyond Pythagorean Expectation: How Run Distributions Affect Win Percentage

Direct from the 2010 MIT Sloan Sports Analytics Conference (AKA - Dorkapalooza) comes…

The Pythagorean expectation formula, originally developed by Bill James, provides a reasonably good estimate of the win percentage of a baseball team using the number of runs scored and runs allowed by that team.  Improvements on the formula, such as the Pythagenport by Davenport and Woolner, and the Pythagenpat by Smyth and Patriot, allow for variation of the Pythagorean exponent and give a very good estimate for win percentages over a wide range of run environments. This article looks at possible improvements on the Pythagorean formula and its variants that take into account the shapes of the run distributions in addition to the run environment. There is a clear pattern that teams which have a higher slugging percentage score runs more consistently, i.e., their run distributions have a smaller standard deviation, and they tend to win more games than their Pythagorean expectation. This article also examines how metrics that use runs for evaluating teams and players might be adjusted to account for these effects.
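For readers who want to see the formulas named in the summary, here is a minimal Python sketch. The exponent formulas are the commonly quoted forms of Pythagenport and Pythagenpat, not taken from the article itself, and the 0.287 power is one of several published values (roughly 0.28 to 0.29). The season totals are hypothetical.

```python
import math


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Classic Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs_x = runs_scored ** exponent
    ra_x = runs_allowed ** exponent
    return rs_x / (rs_x + ra_x)


def pythagenport_exponent(runs_scored, runs_allowed, games):
    """Commonly quoted Pythagenport exponent: 1.50 * log10(RPG) + 0.45."""
    rpg = (runs_scored + runs_allowed) / games
    return 1.50 * math.log10(rpg) + 0.45


def pythagenpat_exponent(runs_scored, runs_allowed, games, power=0.287):
    """Commonly quoted Pythagenpat exponent: RPG ** power (power is an assumption)."""
    rpg = (runs_scored + runs_allowed) / games
    return rpg ** power


# Hypothetical season line: 800 runs scored, 700 allowed, 162 games.
rs, ra, g = 800, 700, 162
for name, x in [
    ("fixed 2.0", 2.0),
    ("Pythagenport", pythagenport_exponent(rs, ra, g)),
    ("Pythagenpat", pythagenpat_exponent(rs, ra, g)),
]:
    pct = pythagorean_win_pct(rs, ra, x)
    print(f"{name:12s} exponent={x:.3f}  win%={pct:.3f}  wins={pct * g:.1f}")
```

All three variants use the same ratio form; they differ only in how the exponent responds to the run environment (total runs per game).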

Derivations of the Pythagorean expectation formula have been made under certain assumptions. Hein Hundal showed that if run distributions are independent log-normal distributions, then the Pythagorean exponent can be approximated by a particular function of the standard deviation and mean of a typical run distribution. Steven Miller showed that if the run distributions are Weibull distributions, then win percentages are given exactly by the Pythagorean formula, where the Pythagorean exponent is a parameter used to fit the run distributions to data. In both cases the Pythagorean exponent inferred from run distribution data is very close to the empirical value.
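The Weibull result can be checked numerically. The sketch below draws runs scored and runs allowed from independent shifted Weibull distributions with a shared shape parameter and compares the simulated win fraction with the Pythagorean-style closed form. The shift `beta = -0.5`, the shape of 1.8 and the run averages are assumptions chosen for illustration, not values taken from the article.

```python
import math
import random


def weibull_scale_for_mean(mean_runs, gamma, beta=-0.5):
    """Scale alpha such that beta + alpha * Gamma(1 + 1/gamma) equals mean_runs."""
    return (mean_runs - beta) / math.gamma(1.0 + 1.0 / gamma)


def simulated_win_pct(rs_mean, ra_mean, gamma, beta=-0.5, n_games=100_000, seed=0):
    """Fraction of simulated games where the runs-scored draw exceeds the runs-allowed draw."""
    rng = random.Random(seed)
    a_scored = weibull_scale_for_mean(rs_mean, gamma, beta)
    a_allowed = weibull_scale_for_mean(ra_mean, gamma, beta)
    wins = 0
    for _ in range(n_games):
        # random.weibullvariate takes (scale, shape) in that order.
        scored = beta + rng.weibullvariate(a_scored, gamma)
        allowed = beta + rng.weibullvariate(a_allowed, gamma)
        wins += scored > allowed
    return wins / n_games


def closed_form_win_pct(rs_mean, ra_mean, gamma, beta=-0.5):
    """(RS - beta)^gamma / ((RS - beta)^gamma + (RA - beta)^gamma)."""
    num = (rs_mean - beta) ** gamma
    return num / (num + (ra_mean - beta) ** gamma)


# Hypothetical team: 5.0 runs scored and 4.5 allowed per game, shape 1.8.
print("simulated  :", round(simulated_win_pct(5.0, 4.5, 1.8), 4))
print("closed form:", round(closed_form_win_pct(5.0, 4.5, 1.8), 4))
```

The two numbers should agree up to sampling noise. Because the draws are continuous, ties never occur in this model, which is one of the limitations raised in the comments.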

Repoz Posted: December 14, 2009 at 01:23 PM | 24 comment(s)
Tags: sabermetrics

1. Jack Keefe Posted: December 14, 2009 at 04:33 PM (#3411750)
Hey whats wrong with you score more runs than the other guys and you win the God Dam game Al.
2. Hang down your head, Tom Foley Posted: December 14, 2009 at 04:43 PM (#3411762)
Keefe knows how to win.
3. The importance of being Ernest Riles Posted: December 14, 2009 at 04:47 PM (#3411768)
There has been some previous work in the area of which I'm not sure the author is aware. Hopefully they surf over to BTF and find this comment.

The Weibull distribution has been found to model both run distribution and the Pythagorean formula very well. Here's the theoretical paper by Steven Miller:

http://arxiv.org/PS_cache/math/pdf/0509/0509698v4.pdf

THT did some work with the Weibull distribution and its relationship to the Pythagorean record.

http://www.hardballtimes.com/main/article/feast-or-famine-first-draft/
http://www.hardballtimes.com/main/article/avoiding-the-famine/
http://www.hardballtimes.com/main/article/consistency-is-key/
http://www.hardballtimes.com/main/article/consistency-is-key-part-two/
http://www.hardballtimes.com/main/article/consistency-is-inconsistent/

Keith Woolner looked at this a while ago, too:

http://www.baseballprospectus.com/article.php?articleid=472
4.  Posted: December 14, 2009 at 04:59 PM (#3411781)
Hein Hundal

Lucky that this guy wasn't in high school during World War One.
5. depletion Posted: December 14, 2009 at 05:28 PM (#3411807)
I've often wondered why a continuous distribution, such as a Weibull, is used for modelling what are clearly events with discrete outcomes. The author, Miller, referenced in (3) above mentions this difficulty but doesn't address it. Why not use a Poisson distribution, or some other discrete distribution? The issue of ties and extra innings would still need to be addressed. Once the game goes into extra innings, a separate distribution could be used as the gap in runs for the game is expected to be much smaller than otherwise.
6.  Posted: December 14, 2009 at 05:43 PM (#3411825)
There is a clear pattern that teams which have a higher slugging percentage score runs more consistently, i.e., their run distributions have a smaller standard deviation, and they tend to win more games than their Pythagorean expectation.

I've noted this in some analyses I've done in the past, which is why I've argued that statistical analysts tend to overvalue OBP and undervalue SLG. The 1992 Milwaukee Brewers, a fairly classic small-ball team (and one of my favorite study topics), had a very inefficient offense, and although they won 92 games they underperformed their Pyth by 4 games.

Why not use a Poisson distribution, or some other discrete distribution?

Poisson distributions don't model actual scoring distributions quite as well. Ted Turocy did some research on this several years back, which if I can find it I'll post.

-- MWE
7. Russ Posted: December 14, 2009 at 05:43 PM (#3411826)
I've often wondered why a continuous distribution, such as a Weibull, is used for modelling what are clearly events with discrete outcomes.

It's much more flexible than the Poisson (or even an over-dispersed Poisson) and it is often used in competing risks failure time modeling. Winning vs. losing is really competing risks... you win if you score more runs than the other guy, you lose if you don't.

The reason why Weibull is used is that it is a bit more flexible AND because it is more analytically tractable. There is a close relationship between the exponential and Weibull distributions, similar to the relationship between the geometric and negative binomial distributions. See, for example, http://www.objectivedoe.com/student/ReliabilityResources/weibull1.html .

The negative binomial can be a bit more annoying to work with and there is not much gain to switching to it for RS/RA (given the historical data about how close it fits), so I would guess that is why the Weibull is often preferred.
8. Russ Posted: December 14, 2009 at 05:45 PM (#3411830)
Poisson distributions don't model actual scoring distributions quite as well. Ted Turocy did some research on this several years back, which if I can find it I'll post.

Poisson distributions don't allow for heterogeneity of mean/variance (i.e. the fact that the mean number of runs for each game will be different -- because you're playing a different team). I wouldn't be surprised if negative binomials would do the trick, as mentioned above.
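The point made in comments 6 through 8 about the Poisson's rigidity can be illustrated with a short sketch. A Poisson distribution forces the variance to equal the mean, while a negative binomial with the same mean has an extra dispersion parameter. The mean of 4.5 runs and dispersion `r = 4` below are hypothetical values, not fits to real scoring data, and nothing here claims the negative binomial actually matches baseball run distributions.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of k events, computed in log space for stability."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def neg_binomial_pmf(k, mean, r):
    """Negative binomial with the given mean and dispersion r (variance = mean + mean^2 / r)."""
    p = r / (r + mean)
    log_coef = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coef + r * math.log(p) + k * math.log(1.0 - p))


def mean_and_variance(pmf, max_k=200):
    """Numerically sum the first two moments of a pmf over 0..max_k."""
    probs = [pmf(k) for k in range(max_k + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


# Hypothetical scoring level and dispersion.
m, r = 4.5, 4.0
print("Poisson      mean, var:", mean_and_variance(lambda k: poisson_pmf(k, m)))
print("Neg binomial mean, var:", mean_and_variance(lambda k: neg_binomial_pmf(k, m, r)))
```

With the Poisson, choosing the mean fixes the whole shape; the negative binomial lets the spread vary separately, which is the flexibility Russ describes.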
9. depletion Posted: December 14, 2009 at 06:07 PM (#3411864)
Russ and Mike: Thanks for the response. I'm pretty good with math (electrical engineer) but don't often have the time to look into baseball math as much as I'd like. I agree that the Weibull is more flexible than the Poisson. I worked in reliability engineering a long time ago and do remember it from then.

I would guess that blowouts are difficult to handle in a model because one manager often gives up and leaves a bad pitcher in the game. Although blowouts aren't what they used to be. A 6 run lead used to be a blowout.
10. The importance of being Ernest Riles Posted: December 14, 2009 at 07:52 PM (#3412019)
Echo the thanks offered in 9. I'm also an engineer (chemical).

The 3-parameter Weibull is nice because of its flexibility. Beta allows you to re-center for binning, and so it's really a two parameter model. This minimizes the dof when doing fitting. The neat thing is that the gamma parameter *is* the Pythagorean exponent, which makes me wonder if there is an underlying reason that the Weibull is a good fit or whether it is just phenomenological. I like Russ' pseudo-explanation about competing risks. I wonder if there's something there to be more fully fleshed out.
11. KerryW Posted: December 14, 2009 at 09:35 PM (#3412187)
Sid,

The author here; Sal Baxamusa alerted me to this discussion.

I did know about the Miller proof (in fact, I mentioned it in the article!). I knew about one of the THT articles -- thanks for the links to the others, plus the Woolner article.

Certainly Weibull is just a phenomenological fit (it's continuous after all).

The problem with the Weibull, as used by Miller (other than the fact that it's continuous and not discrete) is that it becomes a one-parameter distribution after fitting -- it loses one parameter when choosing the binning (as you mentioned), and loses another when fitting to the data. The remaining parameter is the Pythagorean exponent, of course, but since it's a universal exponent, it doesn't allow for variation of distribution shapes.

I suppose you could use Weibull distributions with different gamma parameters for each team; it would be interesting to see if that gave results similar to mine.

As Mike said, if you want to go to a discrete distribution, Poisson won't work. As Russ said, with Poisson the shape is not independent of the mean. Another way of looking at it is that Poisson assumes events are independent, but runs are not independent events in baseball since your probability of scoring another run depends on how you score the last one (i.e., whether you had men still on base). Interestingly, I've found hockey scoring to be well-described by Poisson, but that's because goals ARE a good approximation to independent events.

I doubt if any standard discrete distribution accurately reproduces run scoring distributions, which is why I went the modeling route. As depletion mentioned, you need a way to resolve ties, too. The basis for RPG distribution is the RPI (runs per inning) distribution, which also allows you to determine who wins in extra innings.
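The idea of building a runs-per-game (RPG) distribution from a runs-per-inning (RPI) distribution, and using the RPI distribution to settle ties, can be sketched as follows. This is not the author's model: it assumes innings are independent and identically distributed, treats each extra inning as a fresh one-inning contest, and ignores details such as the home team skipping the bottom of the ninth. The two RPI distributions are hypothetical.

```python
def convolve(p, q):
    """Distribution of the sum of two independent discrete variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def runs_per_game(rpi, innings=9):
    """RPG distribution as the n-fold convolution of the RPI distribution."""
    dist = [1.0]
    for _ in range(innings):
        dist = convolve(dist, rpi)
    return dist


def compare(p, q):
    """Return (P(X > Y), P(X == Y), P(X < Y)) for independent X ~ p and Y ~ q."""
    win = tie = 0.0
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i > j:
                win += pi * qj
            elif i == j:
                tie += pi * qj
    return win, tie, 1.0 - win - tie


def win_probability(rpi_a, rpi_b, innings=9):
    """P(A wins), with regulation ties decided by the first unequal extra inning."""
    win9, tie9, _ = compare(runs_per_game(rpi_a, innings), runs_per_game(rpi_b, innings))
    win1, _, lose1 = compare(rpi_a, rpi_b)
    return win9 + tie9 * win1 / (win1 + lose1)


# Hypothetical probabilities of scoring 0, 1, 2, 3, 4 runs in an inning.
rpi_a = [0.72, 0.15, 0.07, 0.04, 0.02]
rpi_b = [0.74, 0.14, 0.07, 0.03, 0.02]
print("P(A beats B):", round(win_probability(rpi_a, rpi_b), 4))
```

Starting from innings keeps everything discrete, so ties have nonzero probability and can be resolved explicitly rather than assumed away.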
12.  Posted: December 14, 2009 at 09:51 PM (#3412200)
Thanks for showing up, Kerry.

runs are not independent events in baseball since your probability of scoring another run depends on how you score the last one (i.e., whether you had men still on base).

I've had this discussion re: Markov chain analyses of baseball, where future and past states are also not independent of the present state but where the Markov chain appears to be a reasonable model nonetheless - does this caveat make sense in that context as well?

-- MWE
13. The importance of being Ernest Riles Posted: December 14, 2009 at 09:53 PM (#3412208)
Kerry/11: I did it once with different gamma parameters on a per-team basis. The interpretation of that is the varying gammas reflect different run environments for each team.
14. KerryW Posted: December 14, 2009 at 10:15 PM (#3412279)
Mike,

I agree, future and past states are also not completely independent (violating the Markov chain hypothesis), but it probably doesn't make that big a difference.

The fact that the W/L parameters derived from the Markov chain analysis also fit the real data fairly well was encouraging, and suggests that the lack of independence is minimal, or at least not a problem.
15. The importance of being Ernest Riles Posted: December 14, 2009 at 10:30 PM (#3412312)
If two teams have the same value in runs (using whatever metric you prefer), and one team has a SLG that is .080 higher, then that team should expect to win about one more game a season, even though it has the same RPG as the other team.

As it turns out, the difference between the best and worst AL teams in SLG was .081 last year. So we're talking about a 1 win difference here, at the most extreme.
16. KerryW Posted: December 14, 2009 at 10:49 PM (#3412368)
Sid,

While preparing my article I looked at using Weibulls with different gammas to see if a modified Pythagorean expectation followed from it, but the math appeared to become intractable. Of course, if you didn't want a closed form result that wouldn't matter.

I think the alpha is the main driver of the run environment (i.e., RPG), so for the Weibull to model the run distribution effectively, I would expect the gamma to mainly affect the shape (i.e., standard deviation) -- although of course both mean and variance are functions of both alpha and gamma.

The problem with any continuous distribution is that it doesn't handle ties, which is why I like starting with a RPI distribution, which can handle ties and which also allows you to calculate a RPG distribution.
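The remark in comment 16 that both mean and variance depend on both alpha and gamma follows from the standard moments of a three-parameter Weibull. Writing the run variable as X with scale α, shift β and shape γ (the parameter names used in the thread), a worked form is:

```latex
\begin{aligned}
\mathbb{E}[X] &= \beta + \alpha\,\Gamma\!\left(1 + \tfrac{1}{\gamma}\right), \\
\operatorname{Var}[X] &= \alpha^{2}\left[\Gamma\!\left(1 + \tfrac{2}{\gamma}\right) - \Gamma\!\left(1 + \tfrac{1}{\gamma}\right)^{2}\right], \\
\frac{\sqrt{\operatorname{Var}[X]}}{\mathbb{E}[X] - \beta} &= \frac{\sqrt{\Gamma(1 + 2/\gamma) - \Gamma(1 + 1/\gamma)^{2}}}{\Gamma(1 + 1/\gamma)}.
\end{aligned}
```

The last line shows that the spread relative to the shifted mean depends on γ alone, which is consistent with treating α as the main driver of the run environment and γ as the main driver of shape.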
17. KerryW Posted: December 14, 2009 at 11:03 PM (#3412394)
Sid/15,

Yes, but you also have the pitching side, which can give another win. And if you construct your team using these principles, you might be able to accentuate the differences some.

But, agreed, these are not huge effects. The problem is you have to do it for every player, and the gain per player is small. OTOH, if you could do something to add two wins a year, why not do it?

I like the basic result, which is that a SLG .080 higher for a player is worth one additional run. So if you have one player that rates out to a WARP or WAR of 4.3 and another to 4.8, if the first player had a SLG .080 higher, you would actually rate him at 5.3 compared to the other player. Or with a SLG .160 higher, it would be 6.3. That's not insignificant.

Fielding is a large part of a WARP rating, so lately I've been wondering if there is a similar effect in fielding, i.e., do some fielding plays that have the same runs allowed actually lead to different shapes of the runs allowed distribution, which would affect wins? That will be a much harder nut to crack.
18. GuyM Posted: December 14, 2009 at 11:17 PM (#3412435)
I like the basic result, which is that a SLG .080 higher for a player is worth one additional run. So if you have one player that rates out to a WARP or WAR of 4.3 and another to 4.8, if the first player had a SLG .080 higher, you would actually rate him at 5.3 compared to the other player.

No, one extra run would change the 4.3 player to 4.4, a trivial change. (It takes 10 runs to generate one win.)
19. The importance of being Ernest Riles Posted: December 14, 2009 at 11:23 PM (#3412448)
16: Yes, I agree that starting from an RPI is the more correct, if not more tractable, way to handle things.
20. KerryW Posted: December 14, 2009 at 11:39 PM (#3412479)
GuyM,

Oops, I meant RAR, not WARP. Duh. Anyway, doing it for the whole team (both offense and defense) might be worth 2 wins.

Although the differences are small for one player, it can be used as a tiebreaker, i.e., for two players with the same conventional rating, choose the one with the higher SLG.
21. The importance of being Ernest Riles Posted: December 15, 2009 at 03:27 PM (#3413132)
20: I suspect that the greater utility for this will be descriptive and not prescriptive, since it is next to impossible to build a team with the degree of certainty necessary to utilize these results. And even descriptively, the noise involved would drown out almost all of the signal. I'm not trying to knock your work - I rather like it - but I do want to point out that the subtleties of run distribution, even when you can tease them out theoretically, are almost always drowned out by random variation in the experiment.
22. Danny Posted: December 15, 2009 at 04:20 PM (#3413201)
There is a clear pattern that teams which have a higher slugging percentage score runs more consistently, i.e., their run distributions have a smaller standard deviation, and they tend to win more games than their Pythagorean expectation.

Sorry if this was covered already, but is consistency always good? Wouldn't a bad team (one that averages fewer RS than RA) win more games with an inconsistent offense?
23. KerryW Posted: December 15, 2009 at 04:27 PM (#3413208)
Sid/21,

I agree they are not large effects, and noise makes it less useful. The base-running aspect is definitely too small to be useful!

And the effect varies with the run environment. In 1968, it would have taken only a .050 higher team SLG to add a win -- of course SLG separation would probably be harder to achieve in a lower run environment.
24. KerryW Posted: December 15, 2009 at 04:52 PM (#3413261)
Danny,

That's an interesting question. There are some subtleties I didn't get into in the article (there was a length limit on the original, and I didn't rewrite it for the web):

Although I used standard deviation as a measure of the shape (how wide it is), you can also have a skew as well (more probability above or below the average), in which case the distribution is not symmetric about the mean.

If you had a skew with more probability above the mean, then a wider distribution (more inconsistent) WOULD be helpful, no matter what your RPG. However, realistic baseball distributions always have more probability below the mean, and in that case a wider distribution is bad.

A neat (non-realistic) example showing these effects is to consider three teams, A who scores 4 runs two-thirds of the time and 7 runs the other third (call it a 447 distribution), B with a 663 distribution, and C who always scores 5 runs (call it 555). All of these have an average RPG of 5.0.

A is sort of shaped like a baseball team (more probability below the mean), C is perfectly consistent, and B is unrealistic (more probability above the mean).

It turns out that C beats A (6 games out of 9), A beats B (5 games out of 9) and B beats C (6 games out of 9)! For perfectly general distributions, which team is better is non-transitive.

In real life, the skew correlates very closely with the standard deviation, always has more probability below the mean, and doesn't really add anything to the analysis. The team closest to perfect consistency (like team C) will do better for the same RPG.
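The three-team example in comment 24 can be verified by direct enumeration. The sketch below treats each team's three listed scores as equally likely and counts, over all nine score pairings, how often one team outscores the other. The team labels and scores come from the comment; the function name is just for illustration.

```python
from fractions import Fraction
from itertools import product

# Each tuple lists three equally likely run totals; every team averages 5 runs.
TEAMS = {
    "A": (4, 4, 7),  # more probability below the mean
    "B": (6, 6, 3),  # more probability above the mean
    "C": (5, 5, 5),  # perfectly consistent
}


def win_fraction(x, y):
    """Exact probability that team x outscores team y in one game."""
    pairs = list(product(TEAMS[x], TEAMS[y]))
    wins = sum(1 for rx, ry in pairs if rx > ry)
    return Fraction(wins, len(pairs))


for x, y in [("C", "A"), ("A", "B"), ("B", "C")]:
    frac = win_fraction(x, y)
    print(f"{x} beats {y} with probability {frac} ({float(frac):.3f})")
```

None of these pairings can end in a tie, so each printed probability is a full win rate. Each team in the cycle beats the next more than half the time, which is the non-transitivity described above.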
