Baseball Primer Newsblog— The Best News Links from the Baseball Newsstand
Monday, August 13, 2007
In the new study, Hamermesh’s team analyzed the calls on 2.1 million pitches thrown in the Major League between the 2004 and 2006 seasons. Controlling for all other outside factors, such as the pitcher’s tendency to throw strikes, the umpires’ tendency to call strikes and the batter’s ability to attract balls, researchers found evidence of same-race bias — and the data revealed that the bias benefits mostly white pitchers. Not surprising, since 71% of MLB pitchers and 87% of umpires are white.
The highest percentage of strikes were called when both the home-plate umpire and pitcher were white, and the lowest percentage were called between a white ump and a black pitcher. The study also found that minority umpires judged Asian pitchers more unfairly than they did white pitchers. It’s a significant disadvantage for Asian pitchers because the MLB doesn’t have any Asian umpires. Interestingly enough, Hamermesh’s research found that the race of the batter didn’t seem to matter — the correlation was only between the pitcher and the home-plate ump. Rich Levin, an MLB spokesman, refused to comment on the research findings.
Reader Comments and Retorts
Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.
I thought someone familiar with the questec system recently posted here that questec required a lot of post-processing; that is, the vertical portion of the strike zone was calibrated after the game, based on the video. Apparently they don't do it in real time.
Also I recall a study done on the uniform color of football teams, where the team with the dark uniform would get more penalties than the lighter team color irrespective of the race of the players. Did they also try to look at uniform color to verify its impact?
No doubt there. I don't disagree with the study's finding that the refs have a racial bias. They may, they may not, I'd lean toward that they do. They're human, people favor what is more familiar to them, and for most people that is other people of the same race, religion, education, etc.
What I've been arguing against in this thread is poor science. Have I been bashing the NBA ref study because I'm a closet racist? No. Like I said earlier, I'm a little biased, I'll admit it. As others have noted, bias does not imply racism. The poor science is that the NBA study treated a crew of 2 white refs + 1 black ref as 3 white refs, and a crew of 2 black refs + 1 white ref as 3 black refs. Unless you can convince me that the white refs exert that much influence over the black ref, or the black refs over the white ones, I don't believe that. I think that is poor data collection and methodology, and all claims made off that data are just as poor, spoiling the study. Yet, that doesn't stop the authors and Wharton from going to the NY Times, the WSJ, ESPN, ABC, CNN, etc. Poor science to gain publicity.
Again, while I haven't read the MLB study, if what everyone is saying here is true, that at max it is one pitch a game, then you have to compare this to the "normal" errors in umpiring. Even then, going to Time magazine and having an article written up seems to me self-publicity rather than science. Most people aren't going to read the whole thing, or they may not even read the study (hey, I haven't yet), yet they'll see the headlines and all of a sudden Schrutebag will be on the radio, "MLB umps are racist and it's better for all playoff teams to have white pitching staffs."
These kinds of studies hurt science in general. It makes good scientists' claims harder to gain footing.
How would you do it?
How would I do which study? The NBA ref one? It will be rigorous. First, I'd start by only including the games officiated by 3 refs of the same race. It would be easy to gather data for this study, although it may not give a large enough sample size. Then, again this will be rigorous, you'd either have to get the NBA to allow you to see their referee evaluations, or you'd have to watch games to gather the data yourself. Then, I would think you would have a strong data set showing exactly which race of ref called a foul on a player of a certain race. There would be no generalizations and no interpolations.
The IAT study is a little different. I think you'd have to do the study multiple times using different words, different faces, and introducing the relationship different ways. Good + european = right, Good + African = right, Good + african = left, etc. This one is tough because of the practice component and the pattern component. The better bet may be to make 6 or 7 different tests and increase your sample size numerous times.
By what method would you classify players and umpires by race?
Wasn't there a supreme court case a while ago that said the KKK was allowed to do just that, hold a public access TV show?
Can one of the lawyer types around here confirm that?
I haven't read the MLB study, so I don't know how they did it. I'd probably either try and get the data from MLB or get it from an outside source. Only as a last resort would I go through and label them myself, but if I were forced to do that (and sometimes you are), then you'd have to use a group of raters and not just one person as the be-all and end-all.
Someone brought up a great point earlier about Latin Americans and African Americans. Both have dark skin and could be considered "black" or "African-American" by appearance, so you'd have to make the determination whether to go by skin color or background. This would be a huge influence on the study, because if you really want to go racial bias by appearance, I'd think you'd group Latin-American players with African-American players. Umps aren't going to have a card with the birthplaces of the players in the game to start making their biases on. I know MLB does not consider Latin-Americans and African-Americans in the same group, so this is something that the researcher has to determine at the beginning and state. If the study were done blind, then you could probably do both, considering the two groups together and separate.
I don't know of such a case but that ruling is the undoubted outcome of such a challenge.
So I'm wondering how to reconcile the authors' findings of significance with the fact that the difference is only five pitches, less than half a standard deviation from expected.
See my blog post for details. Any help appreciated, I really don't know how they managed to find significance.
Further, much of the criticism of this paper and the sociologists' paper on the NBA seems a bit disingenuous or question-begging. Look, I know most of us at this site think, among other things, that we're GMs-in-waiting, psychologists-without-homes, and sociologists-without-frontiers; however, none of the criticisms of these papers are particularly compelling. The sociologists' paper was peer-reviewed, and subsequently subjected to FURTHER critique that substantiated their methodology. To say "I'll be rigorous" is about as compelling as when I claim in my papers that "I'll go deeper than prior analyses."
If that turns out to be the case, I'd have no problem with that. I just have a problem with what they did. It just screamed of being a media whore to me (which wouldn't surprise me considering they're Ivy League).
I do wonder how the outcome from the study would change (if at all) if the authors introduced age and/or experience as an umpire as a factor.
-- MWE
You don't think that questioning the data gathering of a study is compelling?
Just because something is peer-reviewed doesn't mean it's good. JC, you and I both know that sometimes the names and institutions on the top of the paper are just as important as the data in the paper to how critical the reviewers are.
Of course, but I don't find your questions about the data-gathering compelling. They themselves (I'm talking about the NBA study) note the limitations of the approach you describe; it's not like they were unaware of it. Sure, it would be preferable if they could pin every call to every ref and every player, but the NBA blocked them from doing that. So, they took the data they had and drew conclusions that that data supported. Am I wrong about this?
Does it really matter how many pitches white umps saw from white pitchers? The fact that there was only 1 pitch in the black/black cell pretty much means you can't conclude anything.
No. I find it irresponsible that they knew there were limitations in the data they used, leading to limitations in the conclusions they drew, and they still went ahead and published it. They probably could've gotten the same result had they used data from ref crews of all the same race, rather than saying that 2/3 of a race = 100% of that race. With this data, they would've had a solid, non-assumption based study, and then they could've pressured the NBA to give them the strong data.
Because they went ahead knowing the limitations, I think they were more concerned with causing a media shitstorm than actually looking at the data or trying to determine the deeper meaning (or cause) in the data. Were they doing the science for knowledge, or for personal glory? I vote personal glory.
Because the null hypothesis (that there is absolutely no difference) is always quasi-false, in whatever field of human behavior one chooses to study. With a large enough sample size, even the smallest differences can be found to be "statistically significant." The real challenge is to determine whether the findings have any practical meaningfulness.
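To put rough numbers on that point, here is a minimal sketch using a pooled two-proportion z-test. The called-strike rates (32.00% vs. 31.66%) and the group sizes are made up for illustration and are not taken from the paper; the 0.34 gap is assumed to be in percentage points. The same tiny gap goes from nowhere near significant to overwhelmingly significant purely because the sample grows.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference of two independent proportions (pooled SE)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# ASSUMPTION: hypothetical called-strike rates 0.34 percentage points apart.
P_SAME_RACE = 0.3200
P_DIFF_RACE = 0.3166

for n in (1_000, 100_000, 1_000_000):
    z = two_proportion_z(P_SAME_RACE, n, P_DIFF_RACE, n)
    print(f"pitches per group = {n:>9,}   z = {z:5.2f}   p = {two_sided_p(z):.4f}")
```

The size of the effect never changes across the three rows; only the p-value does, which is why significance alone says nothing about practical meaningfulness.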
The authors' own estimate of "same-race" advantage, after a lot of complex regression, is just 0.34%, or about 5 calls per season. So a white pitcher gains perhaps 1 win every 15 years. And that's if you buy their regression results. Using the unadjusted data, a white pitcher who faced only black and Hisp umpires would see a reduction in called strikes of 0.15%, which means he would lose about 2 calls per season or maybe 1 win every 30 years.
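The arithmetic behind those per-season figures can be sketched as follows. The workload of roughly 1,500 called pitches per season is an assumption backed out of the numbers above (5 calls / 0.0034 is about 1,470); it is not a figure from the paper, and the rates are treated as percentage-point differences.

```python
def extra_calls(rate_difference, called_pitches):
    """Expected number of calls that flip, given a shift in called-strike rate."""
    return rate_difference * called_pitches

# ASSUMPTION: about 1,500 called pitches (called balls + called strikes)
# per season for a full-time starter, implied by "0.34% is about 5 calls".
CALLED_PITCHES_PER_SEASON = 1_500

regression_estimate = extra_calls(0.0034, CALLED_PITCHES_PER_SEASON)
unadjusted_estimate = extra_calls(0.0015, CALLED_PITCHES_PER_SEASON)

print(f"regression estimate: {regression_estimate:.2f} calls per season")
print(f"unadjusted estimate: {unadjusted_estimate:.2f} calls per season")
```

Under that assumed workload the two rates come out near the "about 5" and "about 2" calls quoted above.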
Also, although the authors control for a boatload of factors in their regressions, it doesn't appear that they control for park effects. That probably doesn't matter in the aggregate, but it could have a big impact on the Questec/non-questec part of their analysis. Once you start looking at same-race matchups for minority pitchers (for which the sample is small to begin with), separately by Questec/non-questec, you almost certainly could see a real park effect.
The bottom line using this data (Table 2) seems to be that the racial bias, if it exists at all, is limited to Asian and Hisp pitchers. White and Black pitchers get the same called strike rate from umpires of all races. Asian pitchers get a lower rate from both Black and Hisp umps (or, less plausibly, a positive bias from white umps). And Hispanic pitchers may get a slight positive bias from Hisp umps, and slight negative bias from Hisp umps.
(I think the only way to really get a good read on this is the questec data, which would allow us to see how umps call pitches in comparison to actual pitch location. That would allow us to see, for example, if Black pitchers really should be getting so many fewer called strikes than white pitchers.)
Jimmy, from what I understand they treated 3/0 separately from 2/1. They didn't group them together.
Phil - the white umps matter because the bias is most easily shown there. The data leads us to conclude they have a significant bias; the question now is can we see the same result in the other, less significant data we have? The answer seems to be yes. The effect is small, but noticeable.
How can you measure bias by white umps, except by comparing them to the 3 Hispanic and 5 black umpires? And even then, white umps call strikes on a black pitcher just 0.15% less than do black umps. That's about one call per 80 innings, so about one extra BB per season for a full-time starter. And that's ignoring the legitimate sample issue Phil is raising. (The difference is even smaller if you do the same analysis for white pitchers.)
Actually, the data can *at most* lead us to conclude that there is a bias SOMEWHERE. It could be white on black, it could be black on white, it could be black on black, or white on white. All you can say is that "same race appears to give different called strike rates than different race," but you can't tell if it's white or black bias, or even which pitchers the bias favors or shortchanges.
For significance levels, it's the SMALLEST cell in the matrix that largely determines the probability of observing the bias. The variance of the sum or difference of two means (such as means of called strike rates) is the sum of the variances of the two sample means. The estimate from the black/black cell has such a large variance compared to the rest that it almost doesn't matter how big the white/white sample is.
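A minimal sketch of why the small cell dominates, using the usual standard error of a proportion. The strike rate (32%) and the cell sizes are hypothetical and not taken from the study; the point is only that growing the big cell barely moves the standard error of the difference.

```python
import math

def se_of_rate(p, n):
    """Standard error of an observed proportion p based on n pitches."""
    return math.sqrt(p * (1 - p) / n)

def se_of_difference(p1, n1, p2, n2):
    """SE of the difference of two independent rates: the variances add."""
    return math.sqrt(se_of_rate(p1, n1) ** 2 + se_of_rate(p2, n2) ** 2)

# ASSUMPTION: hypothetical called-strike rate and cell sizes.
RATE = 0.32
SMALL_CELL = 10_000  # e.g. a rare pitcher/umpire pairing

print(f"SE of small cell alone: {se_of_rate(RATE, SMALL_CELL):.5f}")
for big_cell in (100_000, 1_000_000, 10_000_000):
    se = se_of_difference(RATE, big_cell, RATE, SMALL_CELL)
    print(f"big cell = {big_cell:>10,}   SE of difference = {se:.5f}")
```

However large the big cell gets, the standard error of the difference can never drop below the standard error of the small cell on its own.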
This isn't quite as strong a result as if each of the combinations was significant (if I'm understanding this correctly) since the matched data is mostly white pitcher-white ump and the unmatched data is mostly latin pitcher-white ump and could (just?) be showing that white umps call fewer strikes for latin pitchers than white pitchers. Since Latin and white pitchers come from different baseball cultures and might not be distributed evenly throughout age groups it's probably not a safe assumption that white and Latin pitchers are throwing the same % of strikes.
I like that they controlled for pitch count among other things but wish they had more years of data and, even better, wish MLB would give them access to the questec data.
"Could the result be a function of the order in which I did the two parts? I had to group one category together with pleasant words first. I then found it difficult when I later had to group the other category with pleasant words.
Answer: The order in which tests are administered does make a difference to the overall result in some tests. However, the difference is small and recent changes to the test have sharply reduced the influence of order. Because of this order effect, the orders used for IATs presented on this website are assigned at random. For any data we present, we are careful to be sure that half the test-takers got the A then B order and the other half got the B then A order. With the revised task design, the order has only a minimal influence on task performance. If you want to check whether the order made a difference for you, you can take the test again and complete it if you get assigned to the reverse order. If you do take the test twice in different orders and get different outcomes, the best estimate of your result is intermediate between the two. For more information about the order effect, see this paper (Nosek, Greenwald, & Banaji, in press)."
There are more white umpires than black, so the team with the pitcher whose race matches the umpire's race probably has a white pitcher. And that's an above-average pitcher, so the winning percentage is higher.
As far as I recall, the study never actually mentions that the white pitchers are better than the black pitchers, so it's easy for reporters to get confused by what's going on here.