Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Friday, January 03, 2014

What big data can teach us about our favorite sports

thought the article was worth reading

Harveys Wallbangers Posted: January 03, 2014 at 09:08 AM | 17 comment(s)
  Tags: analytics, basketballsucks, bigdata, football, statistics

Reader Comments and Retorts

   1. Mike Emeigh Posted: January 03, 2014 at 09:53 AM (#4628995)
I think the issue with some of the data scientists who look at sports is that rather than starting with a model and reviewing the data to see whether the data fits the model, they start with the data and try to fit a hypothesis to the patterns they see in the data.

Take this, for example:

Clauset and Merritt found another interesting pattern: While hockey and football teams tend to extend their leads, pro basketball squads play worse when they’re ahead. They’re not the first to notice this pattern. Jonah Berger, a professor at Wharton and the author of Contagious: Why Things Catch On, has argued that this phenomenon suggests losing teams are inspired to play harder. Berger tells me via email that we only see the pattern in basketball because “teams score frequently, and differences in motivation can easily impact whether a scoring event occurs. In hockey, and even football to some extent, scoring occurs less frequently and is more discrete. So even if players were more motivated it would be harder for that motivation to translate into additional scoring events.”

Clauset considers Berger’s “underdog inspiration” theory interesting, but as a data scientist, he wants to see the numbers. “How do you measure motivation?” he asks. “I don’t know.” Clauset thinks the NBA’s “restoring force”—the tendency for teams to lose their leads—might instead be due to player management. In basketball, he theorizes, coaches often pull their best players from the lineup when they’re in the lead, meaning they’re less likely to score. In football, by contrast, coaches rarely substitute in this manner, and there’s so much rotation in hockey that it’s more difficult to orchestrate when the best players shuffle in and out.


Did these scientists consider that, in hockey, a trailing team sends out its best offensive players (who are quite frequently lesser defenders) more often, that they pull the goalie (which often leads to empty-net goals on the other end) and that they take more risks in the offensive end, which opens up the ice on the defensive end for breakaways and two-on-ones? Did these scientists consider that trailing teams in football do things like go for it on fourth down, rather than punting or going for a field goal, and that when those strategies fail the trailing team gives up field position? Did they actually look at late-game strategies that football and hockey teams follow, and map how those strategies affect the outcome?

I think not.

-- MWE
   2. Harveys Wallbangers Posted: January 03, 2014 at 09:56 AM (#4628998)
mike

what you describe this author would likely point to as the 'reactive' approach he lists
   3. Mike Emeigh Posted: January 03, 2014 at 10:18 AM (#4629012)
Harveys:

I get that. But if you're going to draw a conclusion from the data, which these guys have done, you should at least investigate the impact of those forces, rather than publishing this:

Clauset agrees that there are incredibly complicated forces underpinning his seemingly simple findings. “These teams are working so hard to beat everyone else,” he says. “But in the end, it’s like the Red Queen in Alice in Wonderland—you have to run as hard as you can just to stand still. Only by working so hard and figuring out these strategies can you achieve this system that is like a random coin flip.”

That, to Clauset, is good news for the fans. It means these games are inherently balanced, that all teams have more or less the same advantages. What victories hinge on are the rare chance events—the tragic mistakes or lucky breaks, the stuff that gets stadiums and arenas cheering.



How can he draw that conclusion from the data? Because they have developed a simple model that is more accurate in predicting game outcomes, on average? This is the MGL approach - where you decide to take Jon Lester out of a WS game because "on average" your bullpen will hold the lead. Yes, it's important to know the percentages - but it's also important to understand specific contextual factors that lead you to make a decision that goes "against" the percentages. I think that's what these guys don't get; they see a pattern and assume that understanding the dynamics of the situation doesn't matter in the long run because ultimately it comes down to a lucky bounce here and there - when it's the dynamics that actually drive the data that they see.

-- MWE
   4. Harveys Wallbangers Posted: January 03, 2014 at 10:35 AM (#4629019)
mike

not disagreeing with either of your posts.

just my guess on how this author would respond.

like a good many number pros, he's pretty sure of himself.
   5. Bitter Mouse Posted: January 03, 2014 at 11:48 AM (#4629039)
Note: I hate the term Big Data. I am in the field though so ....

Anyway the money quote is I think:
“These teams are working so hard to beat everyone else,” he says. “But in the end, it’s like the Red Queen in Alice in Wonderland—you have to run as hard as you can just to stand still. Only by working so hard and figuring out these strategies can you achieve this system that is like a random coin flip.”


And I think that is the correct view.

To address what Mike is saying, there are (grossly simplified) two ways of analyzing data. Start with a theory and test it (relationships and correlations) against the data, or set loose mathematics on large data sets and allow the data to spit out relationships and correlations.

Again this is very simplified but that is the high level view. Both are very valid and both can lead one down the rabbit hole. It sounds like he is definitely doing the second, and it often helps to take that perspective because it can free you of bias and you can gain new insights into the data set (and the system it represents) that someone "inside" the system will never see. Of course you can also end up with nonsense. I have seen both in plenty.

The two main factors are the quality of the data (accuracy and granularity - is it correct, complete, and granular enough that the main factors are measured and not buried) and quality of the analyst. Good math and enough quantity of data are also pretty important.
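To illustrate the "nonsense" risk of the second, data-first approach described in #5, here is a minimal sketch. It is not anything from the article: every function name and parameter is hypothetical. It generates an outcome and a pile of candidate features that are all pure noise, then reports the strongest correlation it can find. The more features searched, the stronger the best spurious relationship tends to look.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_noise_correlation(n_features, n_games=30, seed=0):
    """Largest |r| between a random outcome and n_features random features.

    Everything here is noise by construction (an assumption of the sketch),
    so any correlation found is spurious.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_games)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_games)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        print(k, "features searched, strongest |r|:",
              round(strongest_noise_correlation(k), 3))
```

Starting from a theory and testing one relationship corresponds to the single-feature case; turning the math loose on a wide data set corresponds to the larger searches, which is why the quality of the analyst matters as much as the quantity of data.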
   6. Rickey! On a blog from 1998. With the candlestick. Posted: January 03, 2014 at 11:55 AM (#4629046)
Note: I hate the term Big Data. I am in the field though so ....


I am counting the days until I'm no longer working on a "big data" team.
   7. Tulo's Fishy Mullet (mrams) Posted: January 03, 2014 at 12:09 PM (#4629058)
I enjoyed the running poll an industry publication ran last month in ridiculing the buzzwords of the year in our business (securities). The winning phrase: "skate to where the puck is going to be"
   8. Yeaarrgghhhh Posted: January 03, 2014 at 12:15 PM (#4629066)
But if you're going to draw a conclusion from the data, which these guys have done, you should at least investigate the impact of those forces

Why? The "scientists" are looking at long term patterns in the data. I don't see why they have any obligation to analyze the discrete events as well or why a specific set of circumstances (Lester feels really good that day, but the bullpen is spent) contradicts the overall trends.
   9. villageidiom Posted: January 03, 2014 at 01:21 PM (#4629137)
But if you're going to draw a conclusion from the data, which these guys have done, you should at least investigate the impact of those forces

Why? The "scientists" are looking at long term patterns in the data. I don't see why they have any obligation to analyze the discrete events as well or why a specific set of circumstances (Lester feels really good that day, but the bullpen is spent) contradicts the overall trends.
They don't have an obligation to analyze the discrete events, but if they don't they have an obligation not to dismiss discrete events as luck or error if they haven't actually determined it to be so. They should not pretend (nor suffer the delusion that) their analysis is more than it is.

IMO, they actually do have an obligation to analyze the discrete events, because the discrete events can reveal ways in which one has totally screwed up the data prep. I always go back to this one, so sorry for being redundant, but... When UZR had Manny Ramirez at something like -47 one year, people were eager to defend UZR on the basis of (a) it works on average and (b) we all know Manny is bad, and -47 is bad, so -47 must be right. Shortly after that someone went through the PBP video and found a drastic mismatch between balls that were considered playable (underlying UZR) and balls that were actually playable, in Fenway LF. UZR has since been adjusted to account for the problems with Fenway LF data, and Manny's UZR bottoms out around a -26, which is still bad, but a couple wins better. A couple wins is a pretty big difference, on an individual player.

Understanding the outliers, the overall patterns, etc., is key to understanding if your data was worth using in the first place.
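The "couple wins" figure in #9 can be checked with quick arithmetic. This is a sketch only: the two UZR values are the approximate ones quoted in the comment, and the conversion of roughly 10 runs per win is a common rule of thumb assumed here, not something stated in the thread.

```python
# Assumption: about 10 runs per win, a widely used rule of thumb.
RUNS_PER_WIN = 10.0


def runs_to_wins(runs, runs_per_win=RUNS_PER_WIN):
    """Convert a run differential to approximate wins."""
    return runs / runs_per_win


original_uzr = -47  # approximate figure quoted in #9
adjusted_uzr = -26  # approximate figure after the Fenway LF adjustment

swing_runs = adjusted_uzr - original_uzr
swing_wins = runs_to_wins(swing_runs)

if __name__ == "__main__":
    print("Difference:", swing_runs, "runs, or about",
          round(swing_wins, 1), "wins")
```

Under that assumption the data-prep problem alone accounts for about 21 runs, or roughly two wins, on a single player's rating.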
   10. Benji Gil Gamesh Rises Posted: January 03, 2014 at 01:39 PM (#4629153)
Why? The "scientists" are looking at long term patterns in the data. I don't see why they have any obligation to analyze the discrete events as well or why a specific set of circumstances (Lester feels really good that day, but the bullpen is spent) contradicts the overall trends.
I'm no numbers pro, but in addition to what vi said in #9, I think what Mike was originally referring to in #1 was not so much discrete events as specific, repeated strategies used within different sports that could reasonably be having a direct effect on the numbers that they are analyzing and the conclusions they are drawing.
   11. Shibal Posted: January 03, 2014 at 01:45 PM (#4629156)
Factoring in these few basic findings, Clauset and Merritt developed a mathematical model that, after observing just a few scoring events, predicted game outcomes for college and pro football, the NHL, and the NBA with surprising accuracy. Their model proved more accurate than the simple metric of looking who was in the lead at a given time, and it outperformed SportsbookReview.com’s pregame betting odds while more or less matching the accuracy of the live-betting site Bovada. Impressive results, considering Clauset and Merritt spent just three months analyzing the data and coming up with their model.


I think their definition of "impressive" is pretty weak. It would take me an hour to build a model to match Bovada's in-game lines, once I have the game data. The data is the hard part. The model certainly isn't.
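For a sense of how much the "who is in the lead" baseline already gives you, here is a minimal simulation sketch. It is not Clauset and Merritt's model. It assumes every scoring event is a fair coin flip worth one point, and all names and parameters are hypothetical. It measures how often the team leading partway through a game goes on to win.

```python
import random


def leader_accuracy(n_games=20000, events_per_game=100, check_after=50, seed=1):
    """Share of decided games won by the team leading after `check_after` events.

    Assumption of the sketch: each scoring event goes to either team with
    probability 0.5 and is worth one point. Games tied at the checkpoint or
    at the end are skipped.
    """
    rng = random.Random(seed)
    correct = 0
    decided = 0
    for _ in range(n_games):
        margin = 0
        margin_at_check = 0
        for event in range(1, events_per_game + 1):
            margin += 1 if rng.random() < 0.5 else -1
            if event == check_after:
                margin_at_check = margin
        if margin_at_check == 0 or margin == 0:
            continue  # no leader at the checkpoint, or no winner at the end
        decided += 1
        if (margin_at_check > 0) == (margin > 0):
            correct += 1
    return correct / decided


if __name__ == "__main__":
    print("Leader at the midpoint wins:", round(leader_accuracy(), 3))
```

Even when nothing but chance decides each score, the current leader wins well over half the time, so beating that baseline is the low bar; matching in-game betting lines is the comparison that matters.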



   12. Accent Shallow Posted: January 03, 2014 at 02:06 PM (#4629177)
I enjoyed the running poll an industry publication ran last month in ridiculing the buzzwords of the year in our business (securities). The winning phrase: "skate to where the puck is going to be"


You know how I know you're Canadian?
   13. bobm Posted: January 03, 2014 at 02:06 PM (#4629178)
[1]
[FTFA:] Berger tells me via email that we only see the pattern in basketball because “teams score frequently, and differences in motivation can easily impact whether a scoring event occurs. In hockey, and even football to some extent, scoring occurs less frequently and is more discrete.”


I wonder if this is true in NCAA or just in the NBA with its rigged officiating.
   14. Swedish Chef Posted: January 03, 2014 at 02:10 PM (#4629185)
If you haven't got a trillion of something it isn't "big data".
   15. Elvis Posted: January 03, 2014 at 03:54 PM (#4629340)
#13 bobm -

If you don't think the NCAA has rigged officials, try watching an ACC contest involving Duke or Carolina.
   16. zenbitz Posted: January 03, 2014 at 04:35 PM (#4629394)
I think in general, data science is mediocre *science* -- although it's incredibly USEFUL. The reason is that it almost never provides models of understanding.

Or more to the point: it can never give you causation.
   17. KT's Pot Arb Posted: January 03, 2014 at 05:18 PM (#4629436)
I think in general, data science is mediocre *science* -- although it's incredibly USEFUL. The reason is that it almost never provides models of understanding.

Or more to the point: it can never give you causation.


I think a million Pot Arbs could type on a million keyboards for a million innings and not come up with a more succinct explanation of Pot Arb's opinion of data science.
