## Wednesday, October 07, 2009

#### Ellenberg: The total variation of win probability, or: THE MAGNIFICENT TWINS-TIGERS GAME

I’ll let Jordan explain… “What’s discussed there is a back-of-the-envelope measure of “excitingness” in which you compute the total variation in win probability (as measured by FanGraphs) over the course of a game. So a game where this probability swings back and forth a lot, as with last night’s Tigers-Twins game, gets a high score.”

The total variation in win probability over the course of a game is a good way of quantifying how much back-and-forth there’s been between the two teams.  You might take it as a loose measure of “excitingness.”

In this game, the Twins have gone from an 80% chance of winning to 20% to 73% to 20%, again up to 83% and then back down to 50%.  That’s a total variation of at least 2.62, all since the 6th inning!
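The arithmetic behind that 2.62 can be sketched in a few lines. This is only an illustration: the function name `total_variation` is made up here, and the input is just the six turning points quoted above, not FanGraphs' play-by-play data.

```python
def total_variation(win_probs):
    """Sum of absolute changes between consecutive win-probability readings."""
    return sum(abs(b - a) for a, b in zip(win_probs, win_probs[1:]))


# The six turning points quoted above for the Twins, as fractions rather than percent.
twins_turning_points = [0.80, 0.20, 0.73, 0.20, 0.83, 0.50]

tv = total_variation(twins_turning_points)
print(round(tv, 2))  # 0.60 + 0.53 + 0.53 + 0.63 + 0.33 = 2.62
```

The figure is "at least" 2.62 because only the turning points are used; any smaller wiggles between them would add to the sum, never subtract from it.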

I wonder what the all-time record for total variation in a single game is?  It would have to be a game with multiple extra innings in which runs scored, I’d think.

And we go to the bottom of the 11th, still tied 5-5.  Minnesota with a 64% win probability per FanGraphs.  Joe Mauer coming up third this inning.  Now that my own team is done playing for the year, I am allowed to say:  go Twins.

Update:  In the comments, Michael Lugo proves by science that the Tigers-Twins playoff (total variation:  7.69) was more exciting than game 7 of the 1991 World Series, but less exciting than this.

Tags: sabermetrics, tigers, twins

1. bobm Posted: October 07, 2009 at 04:29 PM (#3343750)
I think it's ironic that "excitingness" can be reduced to a quantitative comparison of the total variation in a given game's win probability summed across all base/out/inning states.
2. sunnyday2 Posted: October 07, 2009 at 04:34 PM (#3343752)
I think it's ironic that "excitingness" can be reduced to a quantitative comparison of the total variation in a given game's win probability summed across all base/out/inning states.

Ironic or not, I wonder if "can be reduced" is correct. I'd say that this is a hypothesis, i.e. that "excitingness" = this formula. To test the hypothesis I think you'd have to define "excitingness," which to me inherently refers to the amount of excitement experienced by those who watch the game. So you'd need some participant feedback, etc. etc., no? And then correlate the two?
3. GGC don't think it can get longer than a novella Posted: October 07, 2009 at 04:48 PM (#3343767)
I think that the Win Probability graphs are cool (and Brian Burke does them for football), but I don't really think you can come up with a number to measure excitement. Like Dan says, you have to put the game into context.
4. GGC don't think it can get longer than a novella Posted: October 07, 2009 at 04:52 PM (#3343775)
I think that the Win Probability graphs are cool (and Brian Burke does them for football), but I don't really think you can come up with a number to measure excitement. Like Dan says, you have to put the game into context.
5. GGC don't think it can get longer than a novella Posted: October 07, 2009 at 04:55 PM (#3343776)
I think that the Win Probability graphs are cool (and Brian Burke does them for football), but I don't really think you can come up with a number to measure excitement. Like Dan says, you have to put the game into context.
6. Guapo Posted: October 07, 2009 at 05:00 PM (#3343784)
The most exciting post on BTF is the triple post.
7. Tango Posted: October 07, 2009 at 05:12 PM (#3343795)
We've discussed this at Fangraphs in the past few years already, as well as on my blog.

The "excitedness" being measured here is within the context of the game, with no playoff or World Series implication. I don't like the adjective being used, because it implies the playoff implications that the metric being used does not use.

If you come up with the list of those games with either the highest average LI, or absolute sum of delta WE (i.e. WPA), you will definitely hit upon a huge share of the "top" games. You've got some 100,000 games in the Retrosheet event files. This is certainly the fastest way to come up with the "top" games in baseball (within the constraints noted).

Others have also done the same, but look at the win probability based on winning the World Series. Invariably, it's game 6 and game 7s that would be near the top.
8. GGC don't think it can get longer than a novella Posted: October 07, 2009 at 05:27 PM (#3343813)
Is Dag Nabbit around? He read John Thorn's Ten Greatest Games book. It was purely subjective, but I believe the only regular season game with no pennant implications that he included was Harvey Haddix's game. Lee Sinins may disagree, but I find potential no-nos and perfect games exciting. The first Clemens 20K game may've been one of the most exciting ones I watched, too.
9. Benji Gil Gamesh Rises Posted: October 07, 2009 at 05:29 PM (#3343818)
I think it's ironic that "excitingness" can be reduced to a quantitative comparison of the total variation in a given game's win probability summed across all base/out/inning states.

Stat zombie!
10. Matt Clement of Alexandria Posted: October 07, 2009 at 05:43 PM (#3343845)
A game, obviously, can be extremely exciting without featuring lots of lead changes. If the game stays close to 50-50 most of the time, any slight blip starts to look like an opportunity.

Further, a game can have a lead shift back and forth wildly due to mediocre play, which would for me lessen the enjoyment of the tension. I mean, I remember watching a DIII college basketball game (Haverford vs. Swarthmore, 2000), triple overtime with crazy shifts in momentum, including the moment at the end of the first overtime when Haverford's star forward stole a pass at the free throw line extended and raced down the court, only to brick what would have been the winning layup. It was exciting, no doubt (Swat lost!), but it's hardly something I'd compare to a great game played by top athletes.

I mean, it hardly needs to be said that a particular quantitative attempt to measure "excitement" fails to adequately account for lived experience of excitement in various ways, but I'll say it anyway.

Certainly, it's fun to look at which games have seen the largest shifts in win expectancy, and I don't want to say no one should research this or write about it, but, you know, human experience is not going to be usefully quantifiable in this regard. A little separation between the two, please.
11. escabeche Posted: October 07, 2009 at 05:44 PM (#3343851)
Tango, I (JE) assumed FanGraphs would have already had something on this, but couldn't find it -- could you post a link to the discussion there and/or at your blog? I'd love to see these top games.

By the way, I don't really think "excitingness" can be reduced to this measure; I'm just curious about the extent to which games that score high on this measure really do tend to be those we think of as "exciting." As was pointed out in the comments at my place, there's a large degree of quantity effect -- a 16-inning game, even one that you might experience as kind of boring, is almost inevitably going to get a high score.

Point taken re context: but would it be totally crazy to say that, if this Tigers-Twins game had been played in August in a season where both teams were 20 games under .500, it would have been a tremendously exciting game that people failed to be excited by?
12. Esoteric throws a 'hard slider' Posted: October 07, 2009 at 05:55 PM (#3343869)
The most exciting post on BTF is the triple post.
Hilarious.
13. GGC don't think it can get longer than a novella Posted: October 07, 2009 at 05:58 PM (#3343873)
Here are the most exciting NFL games of the decade, using Brian Burke's methods.
14. RoyalsRetro (AG#1F) Posted: October 07, 2009 at 06:00 PM (#3343879)
I think it's ironic that "excitingness" can be reduced to a quantitative comparison of the total variation in a given game's win probability summed across all base/out/inning states.

I think this thread is why people think stats take all the fun out of baseball.
15. Tango Posted: October 07, 2009 at 06:11 PM (#3343898)
Yes, the XI game has an effect, which is why you can do an average, rather than a sum, of all the absolute values of WPA.

I'll see if I can find that discussion. Note that Fangraphs has, for every game, at the top right of each scoreboard page, like here, the "aLI", which is the "average Leverage Index" (in this case it was 1.93, where 1.000 is the average), and aWE, which is "average Win Expectancy" (I don't know what the average is, but the closer to .500, the more the game was even).

So, those two metrics tell you that the leverage was high (typical ace-reliever situation for the entirety of the game, on average), and that the game was close.
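A rough sketch of the sum-versus-average point, for anyone who wants to see the extra-innings effect in numbers. The helper names and both sample games are invented for illustration; nothing here comes from FanGraphs or Retrosheet.

```python
def wpa_swings(win_expectancies):
    """Absolute change in win expectancy from one event to the next."""
    return [abs(b - a) for a, b in zip(win_expectancies, win_expectancies[1:])]


def summed_swing(win_expectancies):
    """Total of the absolute swings; grows with the length of the game."""
    return sum(wpa_swings(win_expectancies))


def average_swing(win_expectancies):
    """Mean absolute swing per event; does not reward sheer length."""
    swings = wpa_swings(win_expectancies)
    return sum(swings) / len(swings) if swings else 0.0


# Hypothetical short game with a few big swings.
short_game = [0.50, 0.70, 0.30, 0.80, 1.00]
# Hypothetical long game: forty small wobbles, then a decisive final play.
long_game = [0.50, 0.55] * 20 + [1.00]

print(summed_swing(short_game), average_swing(short_game))
print(summed_swing(long_game), average_swing(long_game))
```

In this made-up data the long game wins on the sum but loses on the average, which is the effect being described.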
16. Tango Posted: October 07, 2009 at 06:20 PM (#3343908)
Here are links to "most exciting games".

Dennis at HardballTimes.

Original Fangraphs discussion.

Then a followup also at Fangraphs.

Those are all good reads, and I think everyone here will get something positive from at least one of them.
17. Buddha Posted: October 07, 2009 at 06:20 PM (#3343909)
Give me a break. The attempt to make everything into some mathematical formula gets a little ridiculous after a while.
18. Tango Posted: October 07, 2009 at 06:30 PM (#3343922)
If you read the THT article, you will see that Dennis does a fantastic job to address the World Series issue. He factors all that in a simple way, and he ends up, since 1961, that the 2nd best game of that time period was the 1991 Twins/Braves game 7. Yanks/DBacks, 2001, was #3. #4 and #5 were the Reds/Redsox games 6,7 of 1975.

Say what you will, but those are the 4 games I would have guessed (and I think most of us would), and they are #2 through #5.
19. Nasty Nate Posted: October 07, 2009 at 06:32 PM (#3343924)
I read the HardballTimes link in 18, and after a short list of the worst playoff games (based on the article's criteria) it says:

If these conjure up many memories, you need to get out more.

Buuuut, one of the games was 1999 ALCS game 3. ...which was memorable. Maybe just for external 'storyline' factors but also you got to see the work of the planet's best pitcher handing the best team its only playoff loss.
20. Tango Posted: October 07, 2009 at 06:33 PM (#3343926)
Buddha: even if it is ridiculous, is it somehow wrong? Or unenlightening?

Do you like the customer reviews at Amazon or Ebay or Consumer Reports? Or the Fan polls at my site? What if, instead of trying to capture the feelings of people, you can do it without actually polling them? Well, Dennis at THT did just that. Ridiculous? Maybe. But, enlightening, nevertheless.
21. Marmaduke Ellington Posted: October 07, 2009 at 06:33 PM (#3343927)
A high-scoring seesaw game:
http://www.retrosheet.org/boxesetc/2000/B05050TEX2000.htm (followed a day later by http://www.retrosheet.org/boxesetc/2000/B05060TEX2000.htm !)

Some other games with big first innings:
http://www.retrosheet.org/boxesetc/2007/B07290HOU2007.htm (the only one of these that happened on the road - but note the big attempted comeback)

http://www.retrosheet.org/boxesetc/1989/B08030CIN1989.htm

http://www.retrosheet.org/boxesetc/1996/B07050OAK1996.htm (but the A's were down 3-0)

http://www.retrosheet.org/boxesetc/1995/B08210TEX1995.htm
22. Tango Posted: October 07, 2009 at 06:37 PM (#3343930)
Nate/19: the "excitement" factor was obviously one-sided there, no? I mean, it was exciting the first few innings, but once Pedro came in, all the excitement was on the Redsox side.

Regardless, you can always suggest other ways to make the metric better. If someone handed you 100,000 games, and asked you to come up with the 10 best games, wouldn't it be nice to have a metric like this that would whittle it down substantially for you?
23. Nasty Nate Posted: October 07, 2009 at 06:50 PM (#3343949)
oh yeah, I find this stuff interesting and a good way to find unnoticed/forgotten games that were exciting.

and that game in question does not hold a candle to the games that the metric picked as the most exciting. But the first of 3 Clemens v Pedro playoff clashes was not one of the least memorable playoff games of all time, so I guess it's a false negative.
24. Matt Clement of Alexandria Posted: October 07, 2009 at 06:55 PM (#3343955)
Regardless, you can always suggest other ways to make the metric better. If someone handed you 100,000 games, and asked you to come up with the 10 best games, wouldn't it be nice to have a metric like this that would whittle it down substantially for you?
Yes and no. This work is certainly fun, which is really all I ask of baseball research.

But, so long as it purports to measure irreducibly complex human experience, it is falsely and improperly defined. These metrics measure specific aspects of ballgames which are related to the experience of excitement, and can help us to find and remember exciting games. They do not measure excitement. They help us find games which were exciting in particular respects. Various metrics can be imagined which either measure individual features of games which contribute to excitement, or which merge various features that contribute to excitement, but that's very different from "measuring excitement".
25. Tango Posted: October 07, 2009 at 07:05 PM (#3343973)
But, so long as it purports to measure irreducibly complex human experience, it is falsely and improperly defined.

You can slap the writer who makes far-reaching conclusions, but that doesn't invalidate the impact of the research, including using the word "exciting".

If we want to be technical, then we simply say exactly what it does (measures the closeness of games after every half-inning). Some people have a problem if we usurp the words "top" or "best" or "exciting" or "dramatic" or "clutch" or whatever. Can't we just use the word, or do we have to invent a word like dramaticiness, so that we are not arguing about definitions?
26. Tom Nawrocki Posted: October 07, 2009 at 07:08 PM (#3343978)
If we can't quantify how exciting a game is, we should just go ahead and ignore any rankings of excitingness.
27. Matt Clement of Alexandria Posted: October 07, 2009 at 07:09 PM (#3343979)
If we want to be technical, then we simply say exactly what it does (measures the closeness of games after every half-inning). Some people have a problem if we usurp the words "top" or "best" or "exciting" or "dramatic" or "clutch" or whatever. Can't we just use the word, or do we have to invent a word like dramaticiness, so that we are not arguing about definitions?
Just say what it does - this research measures features of games which contribute to the experience of excitement. It's not as pretty as saying that it measures excitement, but I don't think it's particularly hard to understand, and it has the benefit of not being a category mistake. Talking about measuring "excitement" when the metric doesn't attempt to talk about human experience is clearly incorrect.

EDIT: again, I want to be clear that I enjoy the research and the article. I just think the article would benefit from greater precision and modesty in defining what it's doing and what its metrics are capable of doing.
28. Tango Posted: October 07, 2009 at 07:21 PM (#3343996)
I created a metric called "Leverage Index". Others who have created similar metrics have chosen the name "Pressure" (Doug Drinen), "Relative Importance" (Phil Birnbaum), "Stress" (Pete Palmer).

Now, yes, my name happens to be the one that is the most correct in terms of what it's doing, and therefore, I sidestep the argument that Doug, Phil, and Pete have to endure. But, I don't like that as a reason for creating a name.

Clearly "Runs Created" is a misnomer. "Estimated Runs Contributed Towards" would be better. And "Runs Produced" should be renamed "Runs Participated In". And so on and so forth.

You are correct that the author has an obligation to use the word in the correct context, and just because he might call something the "Drama Index" does not mean that it's the "most dramatic". But, we have to have the freedom at least to create a short-hand term, even if it's not exactly what it does. Again, as long as the author doesn't reach farther than what the metric actually does.
29. Tulo's Fishy Mullet (mrams) Posted: October 07, 2009 at 07:31 PM (#3344023)
The first Clemens 20K game may've been one of the most exciting ones I watched, too.

But for Dewey's 3 run HR, I wonder how that game would be remembered. "Clemens Fans 20, Fails to Fan Gorman a 2nd Time"
30.  Posted: October 07, 2009 at 07:47 PM (#3344057)
I think Dennis's approach at THT mostly makes sense for such an exercise, though I think his weighting scheme for "in series context" creates some odd results. It's just plain odd, if not wrong, to think that a 1-0 game is more "exciting", even when it's a Game 7 winner-take-all scenario, than a game like 1975 WS G6. Yes, he shows the values with and w/o weighting, which to my mind makes the discrepancy that much more apparent.

I think it's a conceptual problem for such a method if the closeness of the WPA or the number of lead changes take on disproportionate importance in the calculation. The amount of <u>actual WPA shift across the game</u> is probably more analogous to what people following the game in the stands or through live transmission actually experience. How to characterize the difference between "tension" (as in 1-0 game) vs. "see-saw" (as in 1975 WS G6).

Also--can't tell if the system awards anything or otherwise takes into account the walk-off phenomenon. It looks like it might, but I can't find a way to measure it from the examples given.

Finally--why is WS Game 5 equally exciting in both 3-1 and 2-2 modes? Why isn't WS Game 6 weighted more heavily than it is due to the chance of that game being the final game? And is there more excitement in the result of such a Game like 1975 WS G6, which forces a final game? That is another piece of the context, even if it is an ex-post facto component.
31. sunnyday2 Posted: October 07, 2009 at 07:54 PM (#3344067)
Don't get me wrong, I think the TVWP is a great concept. I just question whether it measures "excitingness" or not. But, if not, it measures something that is interesting by itself. I think it is good work.
32. Tango Posted: October 07, 2009 at 08:28 PM (#3344152)
Right, that's the point. There's no reason to get hung up on "exciting" or "value" because the reader himself has decided that those words cannot be redefined within the scope of the article. I always joke that I'll just use the word "quatlu" when I need to create a word to stay away from the dictionary police.
33. Tango Posted: October 07, 2009 at 08:33 PM (#3344160)
Don, I didn't notice the 3-1 / 2-2 equal weighting. In one case, it can go 4-1 or 3-2, and in the other, it's 3-2 or 2-3. I dunno... is the spread between the two that much different?

Your point however might be more accurate depending on who is leading the game in a 3-1 series. But even then, will it make much difference in terms of swing changes? I'd have to think about it.
34. sardonic Posted: October 07, 2009 at 09:32 PM (#3344244)
I think another interesting measure would be "tenseness," where win probability is closest to 50% in aggregate. Of course, this could also be a measure of how boring a game is, but I think for a game that is important in context (WS game 7, etc), this would be an interesting measure of how compelling a game was to watch.
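One possible way to put a number on "closest to 50% in aggregate" is sketched below. The formula is an assumption made for illustration (one minus twice the average distance from 0.5), and the function name and sample games are hypothetical.

```python
def tenseness(win_probs):
    """Score in [0, 1]: 1.0 if win probability sat at 50% all game, 0.0 if it was always decided."""
    if not win_probs:
        raise ValueError("need at least one win-probability reading")
    mean_gap = sum(abs(p - 0.5) for p in win_probs) / len(win_probs)
    return 1.0 - 2.0 * mean_gap


# Invented readings, not taken from any real game.
nail_biter = [0.50, 0.55, 0.45, 0.52, 0.48]
blowout = [0.50, 0.80, 0.90, 0.95, 0.99]

print(tenseness(nail_biter))
print(tenseness(blowout))
```

A scoreless pitchers' duel would score high here whether or not anyone found it gripping, which is the caveat already raised about this measure.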
35.  Posted: October 07, 2009 at 09:41 PM (#3344247)
Tango, they're not quite equivalent in "backstory" or "closure." WS G4's should also be measured differently depending on if it's 3-0 or 2-1. Those should be weighted on the probabilistic outcome of the series from its specific interior states, i.e. how many teams with a 3-0 lead win? 99.9%, so G4 with 3-0 is really low (though, by WPA/LI measures the game could be exciting as all get-out...), but 2-1 is more like 60-40, so its weighting should be around 40%.

Again, this stuff just points out how we're using a "common sense" overlay on the in-game WPA calculations. I think the other questions raised in 32/ are actually more significant than this one. I don't know if average LI even captures it perfectly: the '78 playoff game between Yanks and Red Sox was a win-or-go-home moment of epic proportions, and it had a lot of see-saw, but it doesn't show up with a really high LI, and it's nowhere to be found on Dennis's list. I'd like to see how his method quantified it.

It just seems as though there are three measures for this that need to be blended somehow, and I don't get the impression that this is quite the way it is in Dennis's system. And that's <u>before</u> we get to the post-season overlay.
36. GGC don't think it can get longer than a novella Posted: October 08, 2009 at 02:26 AM (#3344657)
I looked and there was one other regular season game in Thorn's book. There was a double no-hitter between Hippo Vaughn and Fred Toney through nine in a Cubs Reds game in 1917. The Reds got a couple of hits in the top of the tenth to win.
37. Tango Posted: October 08, 2009 at 01:43 PM (#3344838)
Don, let's call what you are talking about "pivotaliness" (since that's a made up word, I get to define it).

You are suggesting, I presume, that if you are up 3-0, that the pivotaliness is different than if you are up 2-1.

If you are 3-0, then going to 4-0 means 100% win and going to 3-1 means 87.5% chance of winning the series. That's a 12.5% swing.

If you are 2-1, then going up 3-1 means 87.5%, and losing means 50.0%. That's a 37.5% swing.

So, you are correct that the pivotaliness of a 2-1 game is different than the 3-0 game.
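Those swings can be reproduced with a short recursion. This is a sketch under one loud assumption, which the 87.5% and 50.0% figures imply: every game is a 50/50 coin flip. The function names are invented for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def series_win_prob(wins, losses, needed=4, p=0.5):
    """Chance of taking the series from (wins, losses), assuming each game is won with probability p."""
    if wins == needed:
        return 1.0
    if losses == needed:
        return 0.0
    return (p * series_win_prob(wins + 1, losses, needed, p)
            + (1 - p) * series_win_prob(wins, losses + 1, needed, p))


def pivotaliness(wins, losses, needed=4, p=0.5):
    """Swing in series win probability between winning and losing the next game."""
    return (series_win_prob(wins + 1, losses, needed, p)
            - series_win_prob(wins, losses + 1, needed, p))


print(pivotaliness(3, 0))  # 0.125, the 12.5% swing
print(pivotaliness(2, 1))  # 0.375, the 37.5% swing
print(pivotaliness(3, 1))  # 0.25
print(pivotaliness(2, 2))  # 0.5
```

Under the same coin-flip assumption, the last two lines speak to the earlier Game 5 question: a 2-2 series gives twice the swing of a 3-1 series.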
38.  Posted: October 08, 2009 at 01:56 PM (#3344855)
Don't all "most exciting" games in baseball history have to involve the Brooklyn Dodgers?
39. GGC don't think it can get longer than a novella Posted: October 08, 2009 at 02:06 PM (#3344859)
Dodgericity is a factor, Bob.
40.  Posted: October 08, 2009 at 02:20 PM (#3344868)
That was a great game and all that, one that had me glued to the set, but put either of those teams in the East or the West and they'd be lucky to finish at .500.

It's games like this---great as it was, and yay, Twins---that made me get down on my knees and thank Bud Selig for giving us the wild card, so at least we don't have one of these two teams crowding out a team that finished nine games above them. To put this game on the same planet with 1951 or 1978 is just nuts, and you shouldn't need a mathematical formula to understand that.
41. sunnyday2 Posted: October 08, 2009 at 02:26 PM (#3344879)
To put this game on the same planet with 1951 or 1978 is just nuts,

OTOH if you put it in the same time zone, it WOULD be the greatest game of all time.
42. GGC don't think it can get longer than a novella Posted: October 08, 2009 at 02:42 PM (#3344903)
Marc, I thought the start time was a blessing. I was able to watch (and listen to on the radio while driving home from work and later making a quick trip to the supermarket) 12 innings and I was done by 10 PM.
43. SoSH U at work Posted: October 08, 2009 at 02:50 PM (#3344916)
Marc, I thought the start time was a blessing. I was able to watch (and listen to on the radio while driving home from work and later making a quick trip to the supermarket) 12 innings and I was done by 10 PM.

I think sunny meant that if this involved two East Coast teams, Andy's opinion would be different.
44.  Posted: October 08, 2009 at 02:50 PM (#3344918)
at least we don't have one of these two teams crowding out a team that finished nine games above them

Obligatory gripe about how the Rangers finished one game ahead of both of them. End gripe.
45. kthejoker Posted: October 08, 2009 at 02:58 PM (#3344925)
To me, a double no-hitter suffers from the laws of diminishing returns. I can't imagine going to a ballgame and seeing absolutely *nobody* get a hit for 9 full innings and having *that* exciting a time.
46. GGC don't think it can get longer than a novella Posted: October 08, 2009 at 03:13 PM (#3344936)
To me, a double no-hitter suffers from the laws of diminishing returns. I can't imagine going to a ballgame and seeing absolutely *nobody* get a hit for 9 full innings and having *that* exciting a time.

Yeah, I think there is a difference between a great game and an exciting game. FWIW, going off memory, here were Thorn's Top Ten games. He wrote this in 1981:

1.) Some 1908 game towards the end of the AL pennant race that ended in a 9-9 tie. I think it was the Tigers and A's.
2.) The aforementioned 1917 double no-no.
3.) Game 7 of the 1924 World Series. The Big Train comes on in relief and finally gets his ring.
4.) 1929 WS game where the Cubs' pitching melts down (or the A's bats really come alive)....
47. GGC don't think it can get longer than a novella Posted: October 08, 2009 at 03:20 PM (#3344942)
... 5.) Shot Heard Round The World
6.) Larsen's Perfecto
8.) Game 7 1960
9.) Game 6 1975
10.) The Bucky Dent game.
48. GGC don't think it can get longer than a novella Posted: October 19, 2009 at 12:05 AM (#3357206)
Last nite's game was supposed to be more exciting than the play-in game. As I said at Fangraphs, I disagree. However, I may be confusing excitement with greatness. In any case, Ellenberg sparked what could be an interesting discussion.
49. Howie Menckel Posted: October 19, 2009 at 12:47 AM (#3357271)
Wait, that link says the Jets-Bills game I just watched was the most exciting game of the decade?

The late-game and OT was a bit wacky, but I hope there have been better games out there.

Also, that chart has the Bills playing in the top 3 dramas of the decade.

???
50. GGC don't think it can get longer than a novella Posted: October 19, 2009 at 01:52 PM (#3357603)
Wow, that Jets-Bills game blew away the competition!
51. Tulo's Fishy Mullet (mrams) Posted: October 19, 2009 at 02:14 PM (#3357621)
Here are the most exciting NFL games of the decade, using Brian Burke's methods.

on top of the Jets-Bills excitorama, the Rams v Jaguars game from yesterday was on the list.

I know I hate NFL and generally ignore it, but is anybody going to be talking about that game today?


