 
Baseball Primer Newsblog
The Best News Links from the Baseball Newsstand
Monday, December 14, 2009
Whisnant: Beyond Pythagorean Expectation: How Run Distributions Affect Win Percentage
Direct from the 2010 MIT Sloan Sports Analytics Conference (AKA Dorkapalooza) comes…

Reader Comments and Retorts
1. Jack Keefe Posted: December 14, 2009 at 03:33 PM (#3411750)
The Weibull distribution has been found to model both run distribution and the Pythagorean formula very well. Here's the theoretical paper by Steven Miller:
http://arxiv.org/PS_cache/math/pdf/0509/0509698v4.pdf
THT did some work with the Weibull distribution and its relationship to the Pythagorean record.
http://www.hardballtimes.com/main/article/feastorfaminefirstdraft/
http://www.hardballtimes.com/main/article/avoidingthefamine/
http://www.hardballtimes.com/main/article/consistencyiskey/
http://www.hardballtimes.com/main/article/consistencyiskeyparttwo/
http://www.hardballtimes.com/main/article/consistencyisinconsistent/
Keith Woolner looked at this a while ago, too:
http://www.baseballprospectus.com/article.php?articleid=472
Lucky that this guy wasn't in high school during World War One.
I've noted this in some analyses I've done in the past, which is why I've argued that statistical analysts tend to overvalue OBP and undervalue SLG. The 1992 Milwaukee Brewers, a fairly classic smallball team (and one of my favorite study topics), had a very inefficient offense, and although they won 92 games they underperformed their Pyth by 4 games.
Poisson distributions don't model actual scoring distributions quite as well. Ted Turocy did some research on this several years back, which if I can find it I'll post.
 MWE
It's much more flexible than the Poisson (or even an overdispersed Poisson) and it is often used in competing risks failure time modeling. Winning vs. losing is really competing risks... you win if you score more runs than the other guy, you lose if you don't.
The reason why Weibull is used is that it is a bit more flexible AND because it is more analytically tractable. There is a close relationship between the exponential and Weibull distributions, which is similar to the relationship between the geometric and negative binomial distributions. See, for example, http://www.objectivedoe.com/student/ReliabilityResources/weibull1.html .
The negative binomial can be a bit more annoying to work with and there is not much gain to switching to it for RS/RA (given the historical data about how close it fits), so I would guess that is why the Weibull is often preferred.
Poisson distributions don't allow for heterogeneity of mean/variance (i.e., the fact that the mean number of runs for each game will be different because you're playing a different team). I wouldn't be surprised if negative binomials would do the trick, as mentioned above.
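To make the mean/variance point concrete, here is a minimal sketch comparing a Poisson with a negative binomial that has the same mean. The 4.5 runs-per-game mean and the shape value r = 9 are made-up illustration values, not fitted to any data. The Poisson variance is pinned to the mean, while the negative binomial has room for extra spread.

```python
from math import comb, exp, factorial

MEAN_RUNS = 4.5                  # hypothetical runs per game
R = 9                            # hypothetical negative binomial shape (integer)
P = R / (R + MEAN_RUNS)          # chosen so the mean r(1-p)/p equals MEAN_RUNS


def poisson_pmf(k, mean):
    """P(exactly k runs) under a Poisson with the given mean."""
    return exp(-mean) * mean**k / factorial(k)


def negbin_pmf(k, r, p):
    """P(k failures before the r-th success), success probability p."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k


def mean_and_variance(pmf, max_k=100):
    """Numerically sum the pmf over 0..max_k-1 (the tail beyond is negligible here)."""
    mean = sum(k * pmf(k) for k in range(max_k))
    var = sum((k - mean) ** 2 * pmf(k) for k in range(max_k))
    return mean, var


pois_mean, pois_var = mean_and_variance(lambda k: poisson_pmf(k, MEAN_RUNS))
nb_mean, nb_var = mean_and_variance(lambda k: negbin_pmf(k, R, P))

print(f"Poisson:           mean={pois_mean:.3f} variance={pois_var:.3f}")
print(f"Negative binomial: mean={nb_mean:.3f} variance={nb_var:.3f}")
```

With these inputs the two means agree, but the negative binomial variance is r(1-p)/p^2, which is larger than the mean for any p below 1.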
I would guess that blowouts are difficult to handle in a model because one manager often gives up and leaves a bad pitcher in the game. Although blowouts aren't what they used to be. A 6 run lead used to be a blowout.
The 3-parameter Weibull is nice because of its flexibility. Beta allows you to recenter for binning, and so it's really a two-parameter model. This minimizes the dof when doing fitting. The neat thing is that the gamma parameter *is* the Pythagorean exponent, which makes me wonder if there is an underlying reason that the Weibull is a good fit or whether it is just phenomenological. I like Russ' pseudo-explanation about competing risks. I wonder if there's something there to be more fully fleshed out.
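The claim that gamma is the Pythagorean exponent can be checked numerically. The sketch below draws runs scored and runs allowed from two independent shifted Weibulls with a common shape gamma and shift beta, then compares the simulated win fraction with (RS - beta)^gamma / ((RS - beta)^gamma + (RA - beta)^gamma). All the numbers (gamma = 1.8, beta = -0.5, the 5.0 and 4.5 run averages) are assumptions chosen for illustration, not values taken from the thread.

```python
import random
from math import gamma as gamma_fn

GAMMA = 1.8                   # assumed Weibull shape (the Pythagorean exponent)
BETA = -0.5                   # assumed shift used to recenter for binning
RS_MEAN, RA_MEAN = 5.0, 4.5   # hypothetical runs scored / allowed per game


def scale_for_mean(mean):
    """Scale alpha giving the shifted Weibull the requested mean.

    Mean of the shifted Weibull is alpha * Gamma(1 + 1/gamma) + beta.
    """
    return (mean - BETA) / gamma_fn(1 + 1 / GAMMA)


def simulate_win_fraction(n_games=200_000, seed=0):
    """Fraction of simulated games in which runs scored exceed runs allowed."""
    rng = random.Random(seed)
    alpha_rs, alpha_ra = scale_for_mean(RS_MEAN), scale_for_mean(RA_MEAN)
    wins = 0
    for _ in range(n_games):
        # weibullvariate(scale, shape); adding BETA applies the shift
        rs = rng.weibullvariate(alpha_rs, GAMMA) + BETA
        ra = rng.weibullvariate(alpha_ra, GAMMA) + BETA
        wins += rs > ra
    return wins / n_games


def pythagorean(rs, ra):
    """Pythagorean expectation with exponent GAMMA and shift BETA."""
    num = (rs - BETA) ** GAMMA
    return num / (num + (ra - BETA) ** GAMMA)


simulated = simulate_win_fraction()
predicted = pythagorean(RS_MEAN, RA_MEAN)
print(f"simulated={simulated:.4f} predicted={predicted:.4f}")
```

Because the distributions are continuous, the simulation never produces a tie, which is the limitation raised elsewhere in this thread.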
The author here; Sal Baxamusa alerted me to this discussion.
I did know about the Miller proof (in fact, I mentioned it in the article!). I knew about one of the THT articles; thanks for the links to the others, plus the Woolner article.
Certainly Weibull is just a phenomenological fit (it's continuous after all).
The problem with the Weibull, as used by Miller (other than the fact that it's continuous and not discrete), is that it becomes a one-parameter distribution after fitting: it loses one parameter when choosing the binning (as you mentioned), and loses another when fitting to the data. The remaining parameter is the Pythagorean exponent, of course, but since it's a universal exponent, it doesn't allow for variation of distribution shapes.
I suppose you could use Weibull distributions with different gamma parameters for each team; it would be interesting to see if that gave results similar to mine.
As Mike said, if you want to go to a discrete distribution, Poisson won't work. As Russ said, with Poisson the shape is not independent of the mean. Another way of looking at it is that Poisson assumes events are independent, but runs are not independent events in baseball since your probability of scoring another run depends on how you score the last one (i.e., whether you had men still on base). Interestingly, I've found hockey scoring to be well-described by Poisson, but that's because goals ARE a good approximation to independent events.
I doubt if any standard discrete distribution accurately reproduces run scoring distributions, which is why I went the modeling route. As depletion mentioned, you need a way to resolve ties, too. The basis for RPG distribution is the RPI (runs per inning) distribution, which also allows you to determine who wins in extra innings.
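As a rough illustration of how a runs-per-inning distribution can generate both a runs-per-game distribution and a tie-break rule, here is a minimal sketch. It is not the article's model: the per-inning probabilities are invented, innings are treated as independent and identically distributed, and extra innings ignore the home/away asymmetry.

```python
# Hypothetical P(0), P(1), ..., P(4) runs in one inning (each list sums to 1).
RPI_AVERAGE = [0.73, 0.15, 0.07, 0.03, 0.02]
RPI_BETTER = [0.70, 0.16, 0.08, 0.04, 0.02]


def convolve(p, q):
    """Distribution of the sum of two independent run counts."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def game_distribution(rpi, innings=9):
    """Runs-per-game distribution built by convolving the inning distribution."""
    dist = [1.0]  # zero runs with certainty before any inning is played
    for _ in range(innings):
        dist = convolve(dist, rpi)
    return dist


def compare(p, q):
    """Return (P(x > y), P(x == y)) for independent x ~ p and y ~ q."""
    win = sum(pi * qj for i, pi in enumerate(p) for j, qj in enumerate(q) if i > j)
    tie = sum(pi * qj for i, pi in enumerate(p) for j, qj in enumerate(q) if i == j)
    return win, tie


def win_probability(rpi_us, rpi_them):
    """Win probability with ties after nine innings settled inning by inning."""
    win9, tie9 = compare(game_distribution(rpi_us), game_distribution(rpi_them))
    win1, tie1 = compare(rpi_us, rpi_them)
    # Extra innings repeat until one side outscores the other in an inning.
    extra = win1 / (1 - tie1)
    return win9 + tie9 * extra


rpg = game_distribution(RPI_AVERAGE)
mean_rpg = sum(k * pk for k, pk in enumerate(rpg))
even_matchup = win_probability(RPI_AVERAGE, RPI_AVERAGE)
better_vs_average = win_probability(RPI_BETTER, RPI_AVERAGE)
print(f"mean RPG={mean_rpg:.2f} even={even_matchup:.3f} better={better_vs_average:.3f}")
```

Two identical teams come out at an even split by symmetry, which is a quick sanity check that the tie handling is consistent.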
I've had this discussion re: Markov chain analyses of baseball, where future and past states are also not independent of the present state but where the Markov chain appears to be a reasonable model nonetheless. Does this caveat make sense in that context as well?
 MWE
I agree, future and past states are also not completely independent (violating the Markov chain hypothesis), but it probably doesn't make that big a difference.
The fact that the W/L parameters derived from the Markov chain analysis also fit the real data fairly well was encouraging, and suggests that the lack of independence is minimal, or at least not a problem.
As it turns out, the difference between the best and worst AL teams in SLG was .081 last year. So we're talking about a 1 win difference here, at the extremes.
While preparing my article I looked at using Weibulls with different gammas to see if a modified Pythagorean expectation followed from it, but the math appeared to become intractable. Of course, if you didn't want a closed form result that wouldn't matter.
I think the alpha is the main driver of the run environment (i.e., RPG), so for the Weibull to model the run distribution effectively, I would expect the gamma to mainly affect the shape (i.e., standard deviation), although of course both mean and variance are functions of both alpha and gamma.
The problem with any continuous distribution is that it doesn't handle ties, which is why I like starting with a RPI distribution, which can handle ties and which also allows you to calculate a RPG distribution.
Yes, but you also have the pitching side, which can give another win. And if you construct your team using these principles, you might be able to accentuate the differences some.
But, agreed, these are not huge effects. The problem is you have to do it for every player, and the gain per player is small. OTOH, if you could do something to add two wins a year, why not do it?
I like the basic result, which is that a SLG .080 higher for a player is worth one additional run. So if you have one player that rates out to a WARP or WAR of 4.3 and another to 4.8, if the first player had a SLG .080 higher, you would actually rate him at 5.3 compared to the other player. Or with a SLG .160 higher, it would be 6.3. That's not insignificant.
Fielding is a large part of a WARP rating, so lately I've been wondering if there is a similar effect in fielding, i.e., do some fielding plays that have the same runs allowed actually lead to different shapes of the runs allowed distribution, which would affect wins? That will be a much harder nut to crack.
No, one extra run would change the 4.3 player to 4.4, a trivial change. (It takes 10 runs to generate one win.)
Oops, I meant RAR, not WARP. Duh. Anyway, doing it for the whole team (both offense and defense) might be worth 2 wins.
Although the differences are small for one player, it can be used as a tiebreaker, i.e., for two players with the same conventional rating, choose the one with the higher SLG.
Sorry if this was covered already, but is consistency always good? Wouldn't a bad team (one that averages fewer RS than RA) win more games with an inconsistent offense?
I agree they are not large effects, and noise makes it less useful. The baserunning aspect is definitely too small to be useful!
And the effect varies with the run environment. In 1968, it would have taken only a .050 higher team SLG to add a win. Of course, SLG separation would probably be harder to achieve in a lower run environment.
That's an interesting question. There are some subtleties I didn't get into in the article (there was a length limit on the original, and I didn't rewrite it for the web):
Although I used standard deviation as a measure of the shape (how wide it is), you can also have a skew as well (more probability above or below the average), in which case the distribution is not symmetric about the mean.
If you had a skew with more probability above the mean, then a wider distribution (more inconsistent) WOULD be helpful, no matter what your RPG. However, realistic baseball distributions always have more probability below the mean, and in that case a wider distribution is bad.
A neat (non-realistic) example showing these effects is to consider three teams: A, who scores 4 runs two-thirds of the time and 7 runs the other third (call it a 4-4-7 distribution); B, with a 6-6-3 distribution; and C, who always scores 5 runs (call it 5-5-5). All of these have an average RPG of 5.0.
A is sort of shaped like a baseball team (more probability below the mean), C is perfectly consistent, and B is unrealistic (more probability above the mean).
It turns out that C beats A (6 games out of 9), A beats B (5 games out of 9) and B beats C (6 games out of 9)! For perfectly general distributions, which team is better is nontransitive.
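The three head-to-head results can be verified by enumerating every pairing of scores. This is a small sketch using exact fractions; the team definitions come straight from the example above, and the function and variable names are just illustrative.

```python
from fractions import Fraction
from itertools import product

# Each team maps runs scored in a game to the probability of that score.
TEAMS = {
    "A": {4: Fraction(2, 3), 7: Fraction(1, 3)},
    "B": {6: Fraction(2, 3), 3: Fraction(1, 3)},
    "C": {5: Fraction(1)},
}


def win_prob(x, y):
    """Exact probability that team x outscores team y in a single game."""
    pairs = product(TEAMS[x].items(), TEAMS[y].items())
    return sum(px * py for (rx, px), (ry, py) in pairs if rx > ry)


def mean_runs(name):
    """Average runs per game for the named team."""
    return sum(runs * prob for runs, prob in TEAMS[name].items())


for winner, loser in [("C", "A"), ("A", "B"), ("B", "C")]:
    print(f"{winner} beats {loser} with probability {win_prob(winner, loser)}")
```

No pair of these teams can tie, so each matchup's two win probabilities sum to one. The fractions print in lowest terms, so 6 out of 9 shows up as 2/3.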
In real life, the skew correlates very closely with the standard deviation, always has more probability below the mean, and doesn't really add anything to the analysis. The team closest to perfect consistency (like team C) will do better for the same RPG.