Baseball Primer Newsblog — The Best News Links from the Baseball Newsstand
Tuesday, October 03, 2006
Sabermetric Research: Birnbaum: Chopped liver
And let the DIPS fall where they may…
Reader Comments and Retorts
(I won't name schools where this is common practice, but one of the biggies for doing this is a powerhouse in football :-)
Maybe it's just me, but phrases like "really smart people," "awesome . . . job," etc. don't exactly convey the intellectual posture intended for this article.
Unfortunately there is a lot of this going around; JC is just one of the more blatant. He got his book deal.
>Maybe it's just me, but phrases like "really smart people," "awesome . . . job," etc. don't exactly convey the intellectual posture intended for this article.
Actually, it does.
Wasn't Tippett widely knocked for ignoring Voros' follow-ups to DIPS 1.0 in conducting his study?
In Case You're Interested
I'll take winner.
(Oh, and twenty bucks says that this "< a >" button is lying to me and that link didn't work.)
Just kidding, of course.
No, it is important to cite the relevant published work on the topic, but it is not uncommon to cite unpublished - non-academic - works. With the popularity of blogging and tools like Google Scholar, this is becoming increasingly common.
>Maybe the professorial culture requires that you presume no knowledge is real until it’s been published in peer-reviewed journals.
That's overstating it. There is a difference between peer review on a site like this and what takes place in publishing in a scientific journal. From what I've read, feedback here (or at Birnbaum's site, or Bradbury's) is actually more useful than what you get from formal reviewers. However, the key difference is that the feedback comes AFTER the study is published (on the internet). I could do a crappy study, post it on a site, get lots of negative feedback, and others may choose to cite my non-journal/unpublished study. The reader may never know how bad it is. However, in an academic journal, negative feedback may kill the submission, so that it never sees the light of day. So, it's not really whether the knowledge is real or not, but the confidence you have in its credibility (based on citation alone).
>Maybe Dr. Bradbury is right that all the previous results are questionable unless he uses the exact technique that he does.
Doubtful. There are methods that fit the data and the research question better, but most scholars will accept the fact that different researchers use different research strategies. In fact, it creates confidence that what you're looking at is a real phenomenon and not an artifact of the method.
>And maybe I’m just overreacting to a couple of throwaway sentences intended only to get his paper past the referees.
I love Phil's work, but I would say, yes, it's an over-reaction.
This is an economics article, not a statistics article. The point is not DIPS, but whether GMs are already using DIPS to value pitchers. I don't see where J.C. is ignoring anything. He fully credits Voros's original work and then applies the standard statistical tools used by economists in order to say, "See, economists, this does in fact hold true using the techniques you know and trust." He then works out what the economic impact of this effect is on the compensation given to pitchers, which as far as I know is completely original work.
A rehash of Tippett, TangoTiger, Emancip8d, and Patriot (try getting those names past the reviewers) isn't really necessary for what he is trying to do here. I'm not sure what Phil wants here. A list of every internet article on DIPS?
Perhaps J.C. is a little flip with the comment about rigor, but I don't think he is too far from the truth. Very little of the work done on sabermetric websites and in sabermetric books would pass muster with regard to the level of statistical analysis needed to be published in an academic journal. The ideas are good, but doing statistics well requires training and knowledge that most people simply do not have. Even as an applied mathematician, I would lump myself into the group that doesn't know enough statistics. For instance, one of Bill James's favorite techniques, using matched pairs of players to show some effect, may show that there is an effect, but it gives us little info as to what the size of the effect is and tells us nothing about the error bars around that measurement for the population at large.
In the interest of full disclosure, I have a Ph.D., consider myself a friend of J.C.'s, and share Phil's frustration at academics poaching ideas from the sabermetric community, but I think he chose a very bad example here.
Well, actually, I do have *some* reading comprehension. :) The first half of J.C.'s study tests whether DIPS is correct. Only the second half deals with the labor market issues. My point relates only to the first half, the part to "verify if [DIPS] is correct using the proper econometric tools." That half is pure sabermetrics.
And I don't dispute that J.C.'s article is an improvement on previous work. I think studying DIPS using better statistical techniques is a welcome and worthy endeavor. Actually, Sean, I am NOT frustrated at academics poaching ideas from the sabermetric community -- ideas are public domain, and anyone should be welcome and encouraged to grab whatever ideas are out there and run with them. I'd be very happy if J.C. eclipses previous work and sets the standard for analysis of DIPS. My point is simply that while he does so, he should mention previous valid work on the same topic where relevant.
I don't want a list of every internet article on DIPS. But I did expect an acknowledgement that other researchers have tested DIPS before (and in a valid way). What J.C. did is a *further* test of DIPS over and above what's already done. This is certainly a good thing. But given that there are already other tests out there, some of them should be mentioned. Not all of them, just the ones that J.C. thought were valid, perhaps the ones that influenced his own work. (Maybe Tippett, since J.C. actually wrote about it previously, and it comes to the same conclusion?)
Sean, you did a well-received study on catchers and passed balls. You didn't publish it academically -- it's on your website in Powerpoint format. Now, let's say I read your study, and then I figure out another way to measure the same skill. I write an academic article using my method. I write, "Bill James once said he wished he had a way to figure out which catchers were best at preventing passed balls. I now have a rigorous method that uses standard econometric techniques." I don't mention your study at all. Isn't that totally inappropriate? Wouldn't you call me on it?
I agree with you that even after peer review, bad studies stay on the web. However, given that J.C. has followed much of what has been written about DIPS, he is in a position to know which are valid and which aren't, and cite only the appropriate ones.
That's really the heart of my argument -- J.C. is very active in the sabermetric community. He's been around for the debates and the studies and the discussions. Is there really NOTHING that any of you guys have done on this topic that J.C. has seen that's worthy of citation as prior work? It's been five full years. After all that's been done on DIPS in five years, there's absolutely zero scientific knowledge about DIPS that's relevant to J.C.'s study?
>... I would say, yes, it's an over-reaction.
Thanks, I appreciate hearing that.
Phil
From a statistical analysis perspective, what's impressive (to me) about well-done sabermetric studies is not (generally) the statistical analysis but the truly excellent and careful work with the data, what it means, what it doesn't, what needs to be adjusted for, etc. It would be great if more academics did that.
But this isn't the consensus at all. In fact the very studies that he quotes later in the article, Tippett's and the Solving DIPS series, both demonstrate that this strong version of DIPS is false.
The exaggeration of DIPS is a big problem that the sabermetric community needs to face honestly.
Phil, again, the point of his paper is not to validate other people's studies of DIPS, but to introduce DIPS to an economics audience using rigorous analytical tools. I doubt a paper that did nothing but rehash and beef up the work of Tippett, TangoTiger and the Primates would be accepted by an economics journal. The work on DIPS is really just an introduction to the meat of the article, which is the marginal revenue products section. It's an ECONOMICS paper, not a STATISTICS or SABERMETRICS paper. Re-read the abstract: there are two sentences about the DIPS aspect and three on the marginal revenue products aspect. For that matter, read the title, "DOES THE LABOR MARKET PROPERLY VALUE PITCHERS?" If his article were "IS THE DIPS HYPOTHESIS VALID?" then you would be completely right.
It depends. If you got the idea from me, then you should cite me (as J.C. cites Voros). If you got the idea elsewhere and are just aware of the method, a citation might be good. However, I still think you are comparing apples and oranges. J.C.'s paper is an economics paper about more accurately measuring how teams assign marginal revenue product to pitchers. Your article is about measuring catcher defense. You'll notice that J.C. goes into detail and gives a survey of other articles measuring marginal revenue product for pitchers. If you are doing a purely sabermetric paper, then I think you would need a fuller survey of other methods.
I apologize if I'm being too strident here, but I feel Phil is a little too recklessly impugning J.C.'s reputation here.
There's a healthy debate going on about the present and future of peer review: should the old print-oriented system prevail, or should articles be rapidly circulated via the Internet and just as rapidly critiqued there? And there is a lot of fur flying, but there is also some serious debate over the protocols and the social significance of different ways of distributing knowledge. Some "academic credentialism" is involved in these debates, but like Sean I fail to see how it infiltrates JC's paper, which is quite respectful towards original Internet-published work. If anything, JC gets Voros cited and into the academic knowledge system -- maybe that will lead to more citations.
1. Do teams properly value pitcher skills, once they are already in the market of major league pitchers?
2. Do teams properly evaluate pitcher skills in order to determine whether or not a pitcher should be in the market of major league pitchers in the first place?
If (as I and others have argued), hit prevention skills are a key criterion for determining whether a pitcher can enter the market of major league pitchers in the first place, then I think it's logical to assume that those skills should become less important as a discriminant among pitchers that have already qualified for entry into the market.
-- MWE
I've noticed this as well. DIPS advocates seem to simply ignore the fact that some in the stathead community dispute their assumptions. For instance, one article I read last week began like this: "...and for those who believe in DIPS, and at this point isn't that everyone, ...."
I think one problem with the arguments is the lack of proof (for both sides of the argument).
It very well may be that teams do a good job of removing pitchers with poor hit prevention skills from their systems before they reach the majors - as a result, the "survivors" tend to have hit prevention skills that flatten out at a certain level. Those who are worse than that level do not last very long even if they reach the majors. As a result, it's difficult to get a decent sample size of pitchers who are bad at hit prevention pitching at the major league level - to see if they are really bad at hit prevention, or just unlucky.
One discussion chain I saw had a great back and forth concerning the viability of MLEs (minor league translations). The "anti" position was driving the "pro" position insane. Simply stated, the anti position was this: MLEs are absolutely 100% useless - the only reason that they "appear" to work so often is that teams are good at weeding out the AAAA hitters who can't hit in the majors the way their MLEs say they should - as a result, the players who would statistically disprove the ideas behind MLEs are systematically removed from the MLB player pool. Neither side could convince the other because each side held a belief that the other lacked the empirical basis to invalidate.
To oversimplify, let's say someone, "Voros," believes wholeheartedly in DIPS; whereas someone else, "Mike," believes that hit prevention skills are real, but that the minor league system weeds out those with poor skills before they reach the majors, or before they accumulate significant playing time once they get there - as a result, DIPS studies are conducted on a population that happens to be pretty homogeneous with respect to the trait you're trying to study.
Voros lacks the data to disprove Mike's assumption, Mike lacks the data to disprove Voros' assumption.
>... I feel Phil is a little too recklessly impugning J.C.'s reputation here.
That is fair criticism of my post, and separate from the other issue. I'm open to the accusation that I am being too harsh on J.C. and am unfairly impugning his reputation. That is quite possible, and if you convince me of that, I will apologize. But the question of whether he should have cited previous work is independent of that.
>It's an ECONOMICS paper, not a STATISTICS or SABERMETRICS paper.
No, the first half is a SABERMETRICS paper. Analyzing whether DIPS is valid or not is Sabermetrics. I don't care if it's in an economics paper, a nuclear physics paper, or a history paper. It is what it is, and that is *sabermetrics*.
And it is not a matter of "re-hashing," as if the work of those previous researchers was flung against the wall and would have to be scraped off and cleaned up. It is scientific research on the EXACT SAME question that J.C. addressed in the sabermetrics portion of his economics paper. It is, in my opinion, good and valid scientific research. And, in my opinion, when you research the same question that others have, and their research is valid, you mention, maybe even in one sentence, that you are confirming their result.
This is *more* true, not less true, when you are writing for a completely different audience who is likely unaware of previous research on the question -- for instance, when you do a sabermetrics study for economists.
If you were an economist with no knowledge of sabermetrics and you read that paper, what would your impression be? It would be something like,
"Hey, there's is something called an "analytic baseball community." What's that? I dunno. Is it a group of serious researchers? Nah, their "concepts" have undergone "little formal scrutiny," so it's probably just a bunch of fans with some idea in their heads. It's nice to see economics come to the rescue by finally putting this idea to a proper scientific test."
That's just wrong. The sabermetric community has done some excellent work on this question and come up with some valid answers. The same answers that J.C. was able to duplicate.
Sean, let me ask you -- before you read this paper (assuming you have), did you know that knuckleballers tended to beat the expected results on their balls in play? Did you know that power pitchers did also?
I did. And you probably did too. And a lot of readers did. And by using the word "knew," I mean I trusted the studies that told us this, accepted them as valid science. I still do.
And that means that when J.C's study shows that power pitchers do better in DIPS, he has replicated a previous result in our field. And, yes, when you replicate a previous result, you should mention that, instead of (even unintentionally) leaving the reader with the idea that the result is new.
Here's a sentence from page 8: "In summary, it appears that pitchers do have some minor control over hits on balls in play: but, this influence is small."
Sean, look at that sentence. Is that something that's new to you? Did you not know this before (or, if you disagree with it, have you at least not seen studies arguing it before)? Is it reasonable to make that statement in a research paper WITHOUT MENTIONING that it has been said (and perhaps shown scientifically) many times in the past few years in the sabermetric community?
I have no problem with the peer review system. I think it's a great idea. My argument is not one against the peer review system.
My argument is that some knowledge got out there *outside* the peer review system, and that knowledge is no less worthy of respect.
My (perhaps uninformed) layman's impression is that "has not been peer reviewed" is academic code for "is crappy and invalid research, perhaps by a crank." Or, at least, code for "is probably not worth considering because there's a good chance it's flawed."
That might be true in some contexts. But not in sabermetrics, where 99% of our research has not been peer reviewed. You will learn almost nothing about sabermetrics if you confine yourself to peer-reviewed studies.
In sabermetrics, the academics' job is much more difficult, because without prior peer review, separating the wheat from the chaff is much harder. But they are NOT EXEMPT from this task. For a long time, academia chose not to study sabermetrics. An alternative scientific community (a large portion of which is Bill James) sprang up alongside it and built a flourishing science out of almost nothing. It has accumulated an impressive base of scientific knowledge, especially considering most of its practitioners are unpaid volunteers with no formal statistical training.
It is not fair, appropriate, or ethical for academia to come along these thirty years later and say, "hey, you didn't publish in our journals, and therefore your work isn't worth acknowledging. Therefore, we're going to re-prove all your results and not even mention that you guys found them first."
And isn't that the subtext of what J.C. wrote? "[DIPS] ... is not part of the economics literature." Well, no, it's not. It's part of the SABERMETRICS literature.
We of the "sabermetrics community" are not particularly protective of our territory. Scholars in other fields are quite welcome to research topics that have been traditionally studied in the field of sabermetrics. Welcome! But, please, respect that we're structured differently from what you're used to.
Administratively, the sabermetric community does things differently from academe. We don't have anonymous referees and formal peer review. And that's OK. The only requirement for science to be science is the scientific method, which we follow pretty darn faithfully. Dismissing our work because we don't choose to use your particular method of pre-publication review is disrespectful. Yes, it makes life harder for you academics. You have to figure out what sabermetrics is good and what sabermetrics is bad. Not all of our knowledge can be found in textbooks and journals -- it's scattered all over. If you're new to the field, you might actually have to ask around and learn the culture, just like starting a new job at a new company.
You are not entitled, because you happen to study in a field where knowledge is confined to a limited number of indexed, peer-reviewed journals, to write us off because we are not. You are not entitled to fail to acknowledge our valid research just because, without the imprimatur of a respected journal, you don't know if it's good work or not. You are not entitled to assume that our work is unworthy because it has not been peer reviewed by YOUR peers in YOUR way.
That's all.
That was Steve Goldman in a BPro chat, IIRC.
I can see your taking umbrage at that sentence, Phil. I didn't see it as quite so supercilious, because the context was one of acknowledging the work and confirming its general validity. But I can see where it might come across that way.
I think that Internet forums lend themselves to meta-discussions (like this one). Printed papers don't; there are space constraints and conventions of terseness. At best, the various follow-up Internet discussions of DIPS might have rated a footnote in JC's paper.
Your impression is not only uninformed, it is in fact the mirror image of the credentialism you decry and every bit as obnoxious and damaging. If "has not been peer reviewed" is academic code for anything, it is something much more along the lines of "I can't really be sure how much stock to put in this." "Would never have survived peer review" is what we academics say when we mean what you think we mean by "has not been peer reviewed."
Phil, are you proposing that academics should have to work through all of the work in the Hardball Times, By The Numbers, Baseball Prospectus, FanGraphs, Saberstats.com, TangoTiger, various discussion boards, books, magazines, and other outlets before they publish in sabermetrics? I don't think you understand what goes into the job.
If academics are held to that standard, what standard are amateur sabermetricians held to? Shouldn't they be expected to read a simple statistics textbook before publishing something? Do they need to do literature searches and bibliographies? After all, why should they be EXEMPT from doing the necessary legwork? It seems to me you are suggesting that statheads should go along doing whatever they want, while academics are held to a much higher standard.
Given the title of the paper was "DOES THE LABOR MARKET PROPERLY VALUE PITCHERS?", I agree with "Bob" that at most a footnote was needed.
From my experience, little to no sabermetric work is published in heavy duty math or statistics journals. Instead it is published in works by the MAA or in Chance which are slanted towards undergraduate education and interesting applications of mathematics and statistics. I definitely would not get credit towards tenure just publishing sabermetrics. It simply would not be seen as serious. Now economics seems to view this work as more legitimate. In terms of mathematics and statistics, most sabermetric material is too simple to be considered "real" mathematics or statistics (to steal a line my colleagues would use).
Not sure I understand or agree with your first sentence, but the rest of what you say is well taken. Thanks for the info.
Dberri agrees
No, of course I'm not suggesting that academics should have to read every web post. But they should be familiar with the major findings in the subject area.
Suppose another economist comes along and reads J.C.'s paper, and wants to study the issue himself. He should know about Voros' paper, which he will learn from J.C. He could then maybe do a google search on DIPS, check out the Wikipedia entry that includes a bunch of links, check the Bill James index to see if it's been mentioned there, maybe do a quick search of the BTF archives, and so on. Isn't searching the literature standard before beginning a paper? Our literature search is just a bit more complicated.
It doesn't have to be perfect -- nobody would fault a researcher for missing a couple of studies, considering they're not indexed. But if I were doing a study on, say, clutch hitting, I might find everything I can, and then drop an e-mail to one of the authors of what I found who seemed to know what he's talking about, and say, hey, here's what I found, are there any other studies you'd recommend?
In any case, citing lack of peer review is a cop-out. Almost NOTHING in sabermetrics is peer-reviewed. If a researcher insists on peer review, he's throwing away the entire existing knowledge base of sabermetrics.
And, of course sabermetricians are held to the same standard. If I start writing about labor markets, and duplicate research that's been done in economics, OF COURSE it's valid criticism to say that I should have cited it. And if I do something stupid mathematically because I don't understand statistics, of course that's reason to be critical of my paper. My argument is that it goes both ways. Am I missing your point here?
He's one of the co-authors of Wages Of Wins.
>Given the title of the paper was "DOES THE LABOR MARKET PROPERLY VALUE PITCHERS?", I agree with "Bob" that at most a footnote was needed.
Why does the title matter more than the content? Up until the middle of page 8, the paper is pure sabermetrics.
If he had decided to evaluate the labor question using Linear Weights instead of DIPS, and he spent eight pages deriving the linear weights formula without mentioning Pete Palmer, would you make the same argument?
Literally half of the non-Ph.D. professionals that I meet in my field assume that I must be a condescending elitist the minute they find out that I hold a doctorate. These folks are blissfully unaware of the fact that their own attitude toward the credentialed is just as poisonous as the behavior they object to. Your somewhat strident take on this whole academe vs. sabermetrics thing was a rather unpleasant reminder of all of that. Perhaps I'm over-reacting, and if so I apologize, but it does strike me that you are assuming a whole hell of a lot more disrespect than is actually being shown.
>In terms of mathematics and statistics, most sabermetric material is too simple to be considered "real" mathematics or statistics (to steal a line my colleagues would use).
Sabermetrics is not mathematics or statistics, any more than economics is. The value of a sabermetrics or economics result should not be judged by the complexity of the underlying math. (This can also be my response to the first part of comment #21).
Think of your favorite physics formulas. Gravity: F = G(m1)(m2)/(d^2). Pretty simple math, right? Pretty important discovery still, no?
No, not at all. The code for that is something like "shoddy methodology and invalid research, perhaps by a crank or an undergraduate -- is not worthy of peer review." The next step up the ladder would be something like "would be destroyed in peer review", while the next step might be the less harsh "would not survive peer review."
"Has not been peer reviewed" can mean a couple of things, almost always positive. Most commonly, it is "code" for "Hey, this sounds real good, but will it really stand up? It is worth a serious look. Put some grad assistants on it." Another variation is, "This sounds real good, but since it is not really part of the thrust of my paper, I am not going to run the numbers. This is my disclaimer to cover my tail."
However, even in academic circles, just the term peer review itself often has negative connotations, so your confusion is very understandable.
Do you really believe that the standards are the same? Do the webpages come down if something is discredited? The criticism is valid, but how is it made - in the comments on your blog? In another person's blog? In academia, the paper is simply not published. Without peer review judging quality in Sabermetrics is largely a crapshoot. Look at PAP. We have zero idea if PAP is any good or not. ZERO.
Also, your interest in citation and understanding statistics is admirable, but I suspect you are in a very, very small minority. For instance, Bill James (who I admire greatly) largely rejects formal studies for more ad hoc ones and has admitted to not reading much of the sabermetric work currently being done.
Without a structure of formal peer review, how can one even begin to 1) wade through the work being done and 2) assess its validity? Some of it seems to stand the test of time, but within academic publishing, we know what the rejection rates are for journals, we know how many citations each article and journal gets. A researcher can stand on the shoulders of others. In sabermetrics, the only really safe approach is to continually reinvent the wheel.
All that said, in the matter at hand, J.C.'s paper is about the economics of pitcher valuation, and I feel he fully cited the work that inspired the first section of the paper.
Nope, no condescension is intended at all. Sorry if my post came out that way. I am equally opposed to credentialism both ways. :)
My experience with Ph.D.s has been no less positive than my experience with non-Ph.D.s.
I have never found academics snooty to us because we don't have degrees. My beef is that academia is snooty towards (or wary of) citing our work because they don't see it as legitimate enough.
Phil
Actually, it's usually published in modified form in a less prestigious journal where fewer people will read it.
Perhaps this is the crux of the issue. There is no sabermetrics field in academics. No one would be hired listing sabermetrics as their field of study. The academics aren't publishing sabermetrics so much as they are publishing math, stats and economics. If they wanted to publish sabermetrics they would choose another outlet like BTN or a website.
One of my papers is on Automated Congressional Redistricting. But it is a math/CS paper and not a political science paper. I didn't provide an overview of the political means of redistricting because it wasn't of interest to my audience and wasn't relevant. I gave an introduction and focused on the math I used to solve the problem.
Because the title and abstract tell you why the paper is important. Many, many academic papers spend 3/4ths of their content tying up loose ends before hitting you with the primary result. My favorites are the math papers with lemmas that have proofs of six pages each and then state the primary theorem of interest with a proof of "follows from lemmas 1 and 2." It happens all of the time.
He mentioned and cited Voros! Voros : DIPS :: Pete Palmer : Linear Weights, while Tango : DIPS :: Patriot : Linear Weights.
You would absolutely have a legitimate beef if he had not mentioned Voros's work or had undersold it. That didn't happen.
This is a good summary of the issue. The available evidence (as Clay Davenport noted WRT minor league pitchers, and as I've noted WRT low-BIP pitchers as a group) suggests that "weeding out" based on hit prevention skills does occur, but it doesn't really prove that MLB teams are good at doing it, nor does it mean that we should "expect" a more or less homogeneous population when we look at pitchers who survive long enough to pitch multiple years in the majors.
I think that anyone who has looked at the subject in more than a superficial way agrees with Phil's comment in #28. The only limiting factor that I'm trying to put on the data set is a caution that a *high* BABIP in any given season might be an indicator that the pitcher is not of MLB quality, rather than that the pitcher was *unlucky*.
Somewhere on the site before the season started, I did a mild prediction, selecting some pitchers who I thought were likely to improve over 2005 and some that I expected to decline over 2005. I remember that Mussina was in the "improve" list (which he did, and which I feel pretty good about) and Harden was in the "decline" list (which he did, sort of, but only because he was hurt - I can't really claim credit for that). I can't find the thread in which I did that now, and those were the only two pitchers I remember.
-- MWE
>There is no sabermetrics field in academics.
I agree with you that perhaps that's part of the problem. I would have no objection to sabermetrics becoming affiliated with economics departments in some way. But it really is its own field of study.
>One of my papers is on Automated Congressional Redistricting. But it is a math/CS paper and not a political science paper. I didn't provide an overview of the political means of redistricting because it wasn't of interest to my audience and wasn't relevant. I gave an introduction and focused on the math I used to solve the problem.
Fair enough. In the latest JQAS, the first paper is actually a theoretical mathematical paper, deducing properties of tournaments structured in different ways. It's of interest to sabermetricians, perhaps, but it truly is math. However, most of sabermetrics is NOT math. This paper is an exception. Yours sounds like a math/CS paper on a technique that might be of interest to political types. Same idea, it's a math paper, and perfectly suited to a math or CS journal.
J.C.'s paper isn't a math paper. It's a baseball paper, analyzing real baseball data to learn the relationship between events on the field. So it's sabermetrics rather than economics. That first half, anyway.
>My favorites are the math papers with lemmas that have proofs of six pages each and then state the primary theorem of interest with a proof of "follows from lemmas 1 and 2." It happens all of the time.
And if those lemmas have been addressed before, the author should cite the literature! It doesn't matter why the paper is important: if, in any part of it, you duplicate previous research, you should cite it.
>He mentioned and cited Voros! ... You would absolutely have a legitimate beef if he had not mentioned Voros's work or had undersold it.
Point taken, bad analogy. I'll rephrase. What if he had spent eight pages *proving that linear weights was correct,* even though many others have already done so?
It depends. If he used the same methods as they had, and the methods had been widely discussed, then I think they should be mentioned, but my impression is that J.C.'s method was different from those before it.
Phil, are you aware that when Tom Tippett presented his paper at SABR in 2003, he was not aware that Voros had revamped the formulas into a version 2.0 in January of 2002? If Tippett wasn't able to keep up with the changes in the very specific area he was commenting on, why are you holding academics to a higher standard?
We are going to have to just agree to disagree here.
Because the others that had done so had never done so in a peer-reviewed journal, or used rigorous econometric techniques? Isn't that pretty much what JC says in the paper? I'm sure that he would have rather just cited Tippett or Voros and moved on to the meat and potatoes, but his referees were unlikely to accept that.
By the way, check out this paper, mentioned on BBRef a long time ago. In the previous version of this paper, Hakes and Sauer derived their own formulation of WPA (based off of PGP) and cited PGP's creators but not Studes or Fangraphs (it looks like the current version of this paper relegates most of the play-by-play model discussion to footnote 5).
WPA predates Studes and FanGraphs by a long ways.
Drinen, Doug, "Big Bad Baseball Annual"
Cook, Earnshaw, "Percentage Baseball"
What is PGP?
>Do you really believe that the standards are the same? Do the webpages come down if something is discredited?
The easy answer is, do the journals rip out pages if a peer-reviewed article is discredited? There's a fair number of academic articles that, if I had been a peer reviewer, I would have recommended rejecting. (See my blog for some.) It works both ways.
The hard answer is, it's not a matter of just reading websites. It's keeping up with the informal ebb and flow of discovery that takes place in sabermetrics to be able to decide what's valid and what's not. As I wrote -- it's harder than looking in journals, but when almost all the knowledge is "stored" this way, that's just what ya gotta do. As best you can, anyway.
>Without peer review judging quality in Sabermetrics is largely a crapshoot. Look at PAP. We have zero idea if PAP is any good or not. ZERO.
Ok, sure. But for DIPS, we have MUCH idea if DIPS is any good or not, and when and why and how. If what you say about PAP is right, and I have no reason to believe otherwise, then in an academic paper, you'd just say "it has been the subject of much discussion but weak and contradictory evidence, such as (a sentence or two)." Isn't this just common sense?
>Also, your interest in citation and understanding statistics is admirable, but I suspect you are in a very, very small minority. For instance, Bill James (who I admire greatly) largely rejects formal studies for more ad hoc ones and has admitted to not reading much of the sabermetric work currently being done.
Actually, simple ignorance of other research isn't all that horrible, and actually isn't the topic here. But I'd say that if Bill James duplicates a well-known piece of research THAT HE KNOWS ABOUT, to an audience that DOESN'T KNOW MUCH SABERMETRICS, and doesn't credit others, he'd be subject to the same kind of criticism. Don't you agree?
>Without a structure of formal peer review, how can one even begin to 1) wade through the work being done and 2) assess its validity?
WTF??? Do you doubt the validity of everything Bill James ever wrote? Come on, Sean, this is ridiculous. EVERYTHING in sabermetrics has been done without "a structure of formal peer review." And you somehow know what is valid, what is not, and what is iffy. And on the topic of DIPS, so does J.C.
>Some of it seems to stand the test of time, but within academic publishing, we know what the rejection rates are for journals, we know how many citations each article and journal gets. A researcher can stand on the shoulders of others. In sabermetrics, the only really safe approach is to continually reinvent the wheel.
Huh? Are you kidding me? Sabermetrics has made huge strides, from zero, by standing on the shoulders of others. Am I missing something? Am I on another planet today, Sean?
>Phil, are you aware that when Tom Tippett presented his paper at SABR in 2003, he was not aware that Voros had revamped the formulas into a version 2.0 in January of 2002? If Tippett wasn't able to keep up with the changes in the very specific area he was commenting on, why are you holding academics to a higher standard?
My argument was not about inadvertently being unaware of a piece of research. My argument was about being aware, doing a study on the same subject, and choosing not to mention ANY of the relevant prior work.
So you're changing the subject. But I'll answer anyway. :)
Sure, it was regrettable that Tom didn't know specifically about Voros' updates. I am not arguing that you have to know about EVERYTHING, and know it immediately. I'm saying you have to be aware, broadly, of the state of the art, by doing a reasonable amount of checking at the places where sabermetrics happens (books, websites, BTN, and so forth). All I'm saying is that "there's too much stuff on the web" is not an excuse for not checking the literature.
If J.C. had quoted Tango but not Tippett, if he had quoted Emeigh but didn't know about Lichtman, well, that's OK, and I wouldn't have complained at all. Do a bit of research, then cite what you know and what's relevant. But "I'm not citing anything because there's no peer review" doesn't cut it.
Phil, I meant this comment in reference to single pieces of work not bodies of work. If I come across work by someone I don't know or someone who doesn't really explain their work that well or use standardized techniques, what can we safely say about that work?
You are also taking the attitude that others are as up on this as you are. You are the editor of the only journal on sabermetrics. Almost by definition, you know more about what is going on than everyone else. I'm pretty into this and I would say, I can only keep up with maybe 25-35% of what is going on in sabermetrics.
Player Game Percentages (PGP)
OK, sorry, I understand now.
I'd say if you haven't heard any more about the study in a reasonable period of time, you can safely not worry about it. Look, as I said, I argue that you don't have to keep up with *everything*, just the important stuff. And you and I both do that pretty well. I'm probably with you on the 25% to 35%. But both of us are *aware* of more than we actually keep up on. I don't know about what's going on in the PAP debate, but I know there is one. And so if I were about to start a study on tired pitchers, I'd make a point of learning a bit more.
But again, my argument isn't about single studies that are unknown to authors. It's about a body of work that was well known to the author.
Thanks, that makes a lot of sense.
>Phil, he cited Voros McCracken. And we'll just have to disagree as to whether that is sufficient or not.
Sure, let's leave it there. Thanks for the dialogue.
Phil
I don't care so much about the protocol of who should or shouldn't be cited in X type of academic article. I do care that JCB chooses to ignore or minimize work that contradicts his quasi-religious belief in DIPS.
The year-to-year focus also makes the labor market analysis here pretty questionable. If I'm reading it correctly, he uses year 1 performance data to predict year 2 salary. But players don't generally sign a series of one-year contracts. So even if teams DID overestimate the importance of BABIP/ERA, you'd have to look at a pitcher's performance in their pre-contract year (or two years) to find it. Not to mention that teams obviously take account of a pitchers' CAREER performance (including even minor lgs) to determine his value. So why in the world would anyone build a model to predict salary based only on one year of data? Well, one reason might be to avoid having to get into multi-year or career data that would then undercut the DIPS premise.
All this regression shows is that pitcher's actual talent has some correlation with their salary. Big surprise! It would be really interesting to know whether teams overestimate the importance of ERA, wins, etc., but unfortunately this analysis can't answer that.
I actually don't think that it matters very much, in the context of this paper, whether JC had that ulterior motive or not. There are good reasons to minimize the value of single-season performance in hit prevention (some of which Guy mentions) for major league pitchers with a track record, and the fact that JC found that major league teams do in fact seem to minimize that value isn't particularly surprising.
I should add "even if you DON'T accept strong DIPS".
-- MWE
Exactly. The religious belief in DIPS needs to be questioned and destroyed. It's an unsustainable idea that continues to be propagated by otherwise intelligent people.
We started with "pitchers have no control over balls in play," but then we discovered that in fact this wasn't true, because knucklers, relievers, power pitchers, Greg Maddux, bad pitchers, hurt pitchers, and minor league pitchers all have a significant effect on balls in play.
In the face of this evidence we have managed to move towards "most pitchers have little control" over balls in play. That's not progress, that's sophistry and it needs to stop.
This is a very good point.
Is this meant to be a joke?
1) I'm amazed at the ratio of critics to actual research on BTF and the general sabermetric community. It reminds me of a camera club my friend joined, only to find a bunch of men talking about their fancy equipment, but none brought any pictures to the meeting. You'd think with all of the people who offer supposedly damning criticism, at least one would use his knowledge to prove it.
2) The studies that appear to show some control over BIP are methodologically flawed. The critique might be right, but no one has yet used the proper methods to examine the subject. I would do so, but I don't have the data. I have addressed this critique head on. See here. I don't see why GuyM thinks I'm avoiding this argument, since he's "argued" with me over it with all the maturity of a spoiled 8-year-old.
3) If you disagree, do your own damn study and stop busting my balls. The majority of posters here are great, but there is a strong negative correlation between those who criticize and those who have the ability to criticize in this forum.
4) Overall, I'm quite saddened by Phil's ridiculous charges against me. Here I am, writing a paper that's sticking it to academics for ignoring a particular sabermetric advance, and Phil gets all huffy about my paper, in a field where he doesn't understand the conventions, not having a lit review. I didn't do anything wrong, and he handled his concerns in an extremely unprofessional manner. Rather than mention it to me in person, despite the fact that we had an e-mail conversation the day before in which he didn't even bring up the subject, he chose to thump his chest on the Internet. In follow-up e-mail conversations he acts like it's no big deal, but in an Internet forum I'm the devil. How pathetic. I've lost all respect for that man. I hope he feels at least some embarrassment.
Oh no, the anti-DIPS cohort is quite serious
So here are things we know:
1) We know that some pitchers have an ability to suppress hits on balls in play and that this ability follows a normal age curve. (the Greg Maddux rule)
2) We know that minor league pitchers allow more hits on balls in play than major league pitchers. (the Kyle Snyder rule)
3) We know that when pitchers are injured or recovering from an injury they allow more hits on balls in play. (the Curt Schilling rule)
4) We know that when formerly good pitchers are done they allow more hits on balls in play. (the Kevin Brown rule)
5) We know that when pitchers are tired they allow more hits on balls in play. (the Pedro Martinez rule)
6) We know that when pitchers who rely on command are less than perfect they allow more hits on balls in play. (the Josh Towers rule)
7) We know that pitchers pitch differently with men on base than with the bases empty and this affects hits on balls in play. (the Tom Glavine rule)
Ok, I don't actually "know" any of these things because I haven't done any "studies" to "prove" them. Nonetheless . . .
If you believe in DIPS you would be wrong about all of those pitchers and countless others like them. DIPS and DIPS related measurements only look good when they are compared to ERA which is a pretty weak test.
And this is without considering that before WWII strikeouts, walks and home runs were relatively low, and hence, according to DIPS, pitchers had very little control over run scoring; and yet good pitchers were able, somehow, to be consistently good.
DIPS is worse than useless because it explains away all variation as luck thus making it more difficult to understand the actual, underlying causes.
>3) If you disagree, do your own damn study and stop busting my balls. The majority of posters here are great, but there is a strong negative correlation between those who criticize and those who have the ability to criticize in this forum.
***
Alright, I'll write this up later, as I'm running late to class right now, but I did a little study just now. I took all pitcher seasons since 1921 and divided them into even and odd years. I then tallied up each pitcher's career totals for both halves, and looked only at pitchers with at least 1,000 BIP in both the even and odd years. There were 1,157 such pitchers. I then ran a regression in which BABIP in even years was the dependent variable and the odd-year BABIP, K/BFP, BB/BFP, HR/BFP, and HBP/BFP were independent. The results were as follows:
Constant = .137 (p = .000)
BABIP = .532 (p = .000)
HR/G = -.002 (p = .267)
BB/G = -.002 (p = .004)
K/G = .000 (p = .542)
HBP/G = .009 (p = .009)
r = .557 (r^2 = .310)
*note: I used per game instead of per BFP notation. That simply means I multiplied those numbers by 38.5. So K/G = K/BFP*38.5.
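For anyone who wants to replicate or extend this, here is a minimal sketch of the same even/odd design in Python with pandas and statsmodels. This is a reconstruction, not DSG's actual code: the file name, the Lahman-style column names (playerID, yearID, BFP, SO, BB, HR, HBP, H), and the BABIP accounting (hits on balls in play = H - HR, over BIP = BFP - SO - BB - HR - HBP) are all assumptions.

import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("pitching.csv")          # hypothetical file: one row per pitcher-season
df = df[df["yearID"] >= 1921]
df["half"] = df["yearID"] % 2             # 0 = even years, 1 = odd years

# Career totals within each half, per the even/odd split described above
g = df.groupby(["playerID", "half"])[["BFP", "SO", "BB", "HR", "HBP", "H"]].sum()
g["BIP"] = g.BFP - g.SO - g.BB - g.HR - g.HBP      # balls in play
g["BABIP"] = (g.H - g.HR) / g.BIP                  # hits on balls in play per BIP
wide = g.unstack("half").dropna()

# Keep pitchers with at least 1,000 BIP in both halves
wide = wide[(wide[("BIP", 0)] >= 1000) & (wide[("BIP", 1)] >= 1000)]

# Odd-year predictors, scaled to per-game rates (38.5 BFP = one "game", as in the note)
X = pd.DataFrame({
    "BABIP": wide[("BABIP", 1)],
    "HR_G":  38.5 * wide[("HR", 1)] / wide[("BFP", 1)],
    "BB_G":  38.5 * wide[("BB", 1)] / wide[("BFP", 1)],
    "K_G":   38.5 * wide[("SO", 1)] / wide[("BFP", 1)],
    "HBP_G": 38.5 * wide[("HBP", 1)] / wide[("BFP", 1)],
})
y = wide[("BABIP", 0)]                    # even-year BABIP as the dependent variable
print(sm.OLS(y, sm.add_constant(X)).fit().summary())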
So here's what we've got:
(a) Controlling for defensive independent variables, BABIP is still highly, highly, highly significant. It's more than half the equation.
(b) Unlike in JC's THT study, strikeouts and home runs are NOT significant. The reason JC found such a significant effect for K/9 is that, all other things being equal, K/9 is entangled with BABIP: the higher your BABIP, the more strikeouts you're going to get per 27 outs, because your fielders aren't making outs (see the worked numbers below). You must use K/BFP, especially with a one-year sample, when BABIP can vary a lot from pitcher to pitcher. For HR/9, it could be the same effect, or it could be because pitchers with high HR rates often have fluky HR/fly ball rates, so more of their fly balls stay in the park the next season, and fly balls are converted into outs a very high percentage of the time.
(c) HBP and BB are significant but in opposite directions (more hit batsmen means a higher BABIP, more walks means a lower BABIP). This one I need to think about for a moment.
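A worked illustration of point (b), with invented numbers (and ignoring walks, homers, and HBP for simplicity): take two pitchers with an identical K/BFP of .180, so BIP/BFP = .820. Outs per batter faced are K/BFP + (1 - BABIP) x BIP/BFP, so:

BABIP .250: outs/BFP = .180 + .820 x .750 = .795, giving 27/.795 = 34.0 BFP per nine innings, and K/9 = .180 x 34.0 = 6.1
BABIP .350: outs/BFP = .180 + .820 x .650 = .713, giving 27/.713 = 37.9 BFP per nine innings, and K/9 = .180 x 37.9 = 6.8

Identical strikeout skill per batter faced, but the high-BABIP pitcher shows the higher K/9, purely because his fielders record fewer outs on balls in play.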
JC, any comments?
I like what you've done, and it is far better than most of the stuff that's out there. And at least you're willing to share your results. There are a few things that need to be fixed. You didn't correct for autocorrelation (very important). Also, you didn't control for a lot of other important factors (e.g., defense, age, league, parks, etc.). For a study this long, I'd want some sort of year correction.
When I say do a study, I don't mean post some results on BTF. This is a horrible forum for doing research. It's good for other things, but not for a collaborative research effort. Remember the Simpsons episode when the whole family tries to work a Rubik's Cube together? That's what doing a study on BTF is like.
Sit down, think about it, work through the data over several weeks, then write up an article explaining the full methodology and possible problems. Then present it as a final product to be discussed.
Anyone following this debate should click on JCB's first link and see if they think he has addressed the critique "head-on." He rejects all these studies out of hand because, essentially, they fail to control for pitchers' strikeout rate. Yet JCB's own work found only a weak connection between BABIP and the prior year's K/9, and no significant relationship with current K/9. (And work by DSG, Tango and others suggests there may actually be no relationship at all.) So he knows that controlling for Ks could not possibly change the conclusions these studies reach about real talent differences. Yet he uses this (at most) minor problem as a pretext to dismiss all this fine work. I'll stand by this assessment: "JCB chooses to ignore or minimize work that contradicts" his belief in DIPS.
As for the "all the maturity of a spoiled 8-year-old" comment. I'll let others reach their own conclusions. Look at my comment on JCB's first linked piece and his reply, as well as past BTF threads, and decide for yourself if his assessment is on the mark or, ummm, a bit of projection.
* * *
DSG: great work. Controlling for league will be crucial, at least post-1973. Age is not necessary, of course, because of your even/odd year methodology. If controlling for park and team DER is beyond what you can do now, looking at post-1980 data only would be interesting -- pitchers change teams often enough now that if your results were still this robust, it would be a pretty good indication of BABIP's multi-year predictive power.
I'm staring at DSG's post and I feel dense.
>BABIP is still highly, highly, highly significant. It's more than half the equation...
>Unlike in JC's THT study, strikeouts and home runs are NOT significant.
I think that he posted this quickly, but I'm not sure what he means. Are these points in favor of using K,BB, and HR stats while ignoring batting average? Or is it the other way around?
I read the Hakes-Sauer paper that JC referenced. I could only find one mention of Voros in it and nothing about DIPS.
Not that I'm any more certain than you are, but I believe he's saying you can predict even-year BABIP (BABIP2) with the following formula:
BABIP2 = .137 + .532BABIP1 - .002HR/G - .002BB/G + .009HBP/G
So BABIP in year 1 is the most important factor in determining BABIP in year 2, HR/G and BB/G have a slight negative impact and K/G has no impact whatsoever.
Or I'm completely wrong.
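For scale, plugging invented numbers into that equation (mine, not from the study): a pitcher with an odd-year BABIP of .300, 1.0 HR/G, 3.0 BB/G, and 0.3 HBP/G projects to

.137 + .532 x .300 - .002 x 1.0 - .002 x 3.0 + .009 x 0.3 = .291

while an otherwise identical pitcher with a .270 BABIP projects to .275. Both are pulled toward the middle: roughly half of a pitcher's BABIP deviation from average carries over, consistent with the .532 coefficient.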
>... DIPS is worse than useless because it explains away all variation as luck thus making it more difficult to understand the actual, underlying causes.
You know, as long as you keep arguing against positions that no one holds, you'll almost certainly win all of your arguments. Does that make you feel smart or something?
Yeah, I'm not sure about that either.
I guess we'll have to wait for more statistically-minded readers to help :-)
>If you believe in DIPS you would be wrong about all of those pitchers and countless others like them.
All models are wrong. Some are useful.
Essentially, what my test showed is that a pitcher's BABIP is extremely important, even controlling for defensive independent outcomes (which are not very important at all). I'll take JC's suggestions into account and publish something more substantial in a couple of weeks.
Thanks. So it's probably not right to say K/G has no effect on BABIP because the p value was .5, but can we be fairly certain its effect is low in this scenario?
(BTW, I agree with JC's concern about the correlation between the defense-independent stats that may be skewing this)
Those are p values - roughly, the probability of seeing a coefficient estimate that far from zero just by chance if the true coefficient were actually zero. In other words, a very low p value means the OLS estimate is very unlikely to be just the result of noise, whereas a higher p value means the estimate isn't as reliable.
In David's example, the HR/G estimate is the sort of thing noise alone would produce about 26.7% of the time, but there is practically zero chance that the BABIP or constant estimates are noise.
JC, how would he correct for serial correlation? Use BABIP - league average BABIP rather than simply BABIP?
Also, David, what formula do you use (assuming you're using the Lahman database) to estimate AB for pitchers? BFP - BB - HBP?
Shouldn't this be controlled for league? I would suspect that BABIP1 is less important than LgBABIP2 (or LgBABIP1).
One of the general rules I have for projections is to look at a batter's IsoBB and ISO; his BA will fluctuate more than these other two factors. So, I think you'd regress and find some correlation indicating that regression to LgBABIP for pitchers would be more effective than the previous year's BABIP.
Does that make sense?
On one blog I swear I could hear Tango imploding when I tried to make the same argument...
BIP = BFP - SO - BB - HR - HBP
>Thanks. So it's probably not right to say K/G has no effect on BABIP because the p value was .5, but can we be fairly certain its effect is low in this scenario?
No. If the p-value is higher than .05 (generally), which it is, we fail to reject the null hypothesis (that Ks have no effect on BABIP).
>(BTW, I agree with JC's concern about the correlation between the defense-independent stats that may be skewing this)
I'll certainly re-do the test more correctly, but by how much, really? I mean, it's not like the BABIP coefficient was barely significant. The t-value was 20.947, which means that there is a roughly one in infinity chance of it being insignificant.
BABIP2 = .137 + .532BABIP1 - .002HR/G - .002BB/G + .009HBP/G
I'm confused: in this equation, are we looking at HR per 9 innings or HR per BFP?
Either way, it's not going to have much effect on batting average, but one is a bit bigger than the other.
I would suggest using team BABIP and (pitcher BABIP - team BABIP) as separate variables.
Using team BABIP controls for park, fielders, and league in one fell swoop. Only thing that throws it off is if you're a great pitcher on a team of great pitchers.
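Something like this, I'd think (a sketch with invented toy numbers; the variable names are mine):

import pandas as pd
import statsmodels.api as sm

# Toy stand-in for a real pitcher-season file; every number is invented.
df = pd.DataFrame({
    "babip1":      [0.295, 0.310, 0.270, 0.305, 0.288],
    "team_babip1": [0.300, 0.298, 0.285, 0.301, 0.290],
    "babip2":      [0.298, 0.305, 0.280, 0.299, 0.292],
})
df["babip_dev1"] = df["babip1"] - df["team_babip1"]   # pitcher minus team
X = sm.add_constant(df[["team_babip1", "babip_dev1"]])
print(sm.OLS(df["babip2"], X).fit().params)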
DSG defined a game as 38.5 BFP instead of 9 innings.
We're way out of my statistical expertise here, but I thought that regression assumed independence of the component variables and having them be correlated caused major problems. Is that something a high level of significance on one of them can overcome?
Which correlates better with BABIP in year 2: BABIP in year 1, or the league average BABIP?
Which correlates better with BABIP in year 2: BABIP in year 1, or the team average BABIP?
I would expect BABIP in year 1 to correlate with BABIP in year 2 (better than HR/G, K/9, etc.) whether I believed in strong DIPS, weak DIPS, or no DIPS.
If nothing else, K and BB rates can swing more than batting averages. How often do you see a pitcher's BABIP move 100 points (30-40%), say from .250 to .350? Meanwhile HR, K, and BB rates can fluctuate much more wildly (over 100%). Even if pitchers had no inherent ability to prevent hits on balls in play, I'd expect BABIP from one year to the next to show a stronger positive correlation.
Variables are never totally independent. These happen to be relatively independent, which means there shouldn't be much of a problem. But even if multicollinearity were a serious problem, it's doubtful that correcting for it would wipe out a coefficient that significant.
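If anyone wants to check how "relatively independent" the regressors actually are, variance inflation factors are a quick diagnostic. A sketch on made-up data (real BABIP/HR/BB columns would go where the random stand-ins are):

import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools import add_constant

rng = np.random.default_rng(0)
X = add_constant(rng.normal(size=(200, 3)))   # stand-ins for BABIP1, HR/G, BB/G
for i in range(1, X.shape[1]):
    # VIF near 1 means little collinearity; above 5-10 is usually trouble
    print(variance_inflation_factor(X, i))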
Almost every day I read about baseball on the internet, I find someone propagating exactly what I'm arguing against.
The problem with DIPS is that it's not sold as a model; it's sold as a truth. Even the watered-down version is sold as "most pitchers have little control" over balls in play. It is this statement, which is almost certainly false, that I'm arguing against.
Now whether the model is useful is a matter of perspective. If I was going to spend ten minutes drafting a fantasy team then a DIPS model would be quite a useful thing. If, on the other hand, I was going to make a multi-million dollar decision then I might want to hold my models to a higher standard of usefulness.
And don't start with the "nobody here is a general manager" routine. I know we aren't. But people around here are pretty serious about baseball, and they spend a fair amount of time thinking and talking about it. So if we're going to spend the time and be serious, we might as well be right, instead of spreading falsehoods across the internet like a virus.
That is fairly similar to my attitude. All things being equal, the guy with the better FIP ERA may be the one to sign, but medical issues and CBW-like mechanical analysis need to be considered, too.
Calculate BIPA for each pitcher.
Use either the Basic Pitch Count Estimator or the Extended Pitch Count Estimator. Calculate pitches thrown.
Now, sort by "pitches thrown". Make a 9-point moving average of BIPA for each pitcher. (Except, obviously, for the first 4 and last 4.)
Make a scatterplot of the Moving Average of BIPA against the Pitches Thrown.
This is what pops out:
http://img142.imageshack.us/img142/3840/bipaplotij7.jpg
Note:
I used Lahman data, with pitch-count estimator formulas from Tango's website.
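For anyone who wants to reproduce the plot, here's a sketch of the whole procedure. The basic pitch count estimator below (3.3 per PA, plus 1.5 per SO and 2.2 per BB) is from memory of Tango's site, so double-check it there; Lahman column names and file path as before:

import pandas as pd
import matplotlib.pyplot as plt

p = pd.read_csv("Pitching.csv").dropna(subset=["BFP", "HBP"])
p["BIP"] = p["BFP"] - p["SO"] - p["BB"] - p["HR"] - p["HBP"]
p = p[p["BIP"] > 0]
p["BIPA"] = (p["H"] - p["HR"]) / p["BIP"]
p["pitches"] = 3.3 * p["BFP"] + 1.5 * p["SO"] + 2.2 * p["BB"]

p = p.sort_values("pitches")
# 9-point centered moving average; the first and last 4 points come out
# NaN, which matches dropping them as described above.
p["bipa_ma9"] = p["BIPA"].rolling(9, center=True).mean()

plt.scatter(p["pitches"], p["bipa_ma9"], s=4)
plt.xlabel("Estimated pitches thrown")
plt.ylabel("9-point moving average of BIPA")
plt.show()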
Where? Not in this thread. Maybe you're reading the same archived pages over and over again. Seriously. More likely, you're just so worked up about this that you overinterpret every mention of DIPS as "propagating" a much stronger version of it than anyone ever believed.

The simple fact that you would use the phrase "pitchers have no control over balls in play" proves my point. If I might go TOLAXOR for a moment, NO ONE EVER REALLY BELIEVED THIS. If that statement were literally true, there would be no such thing as ground ball pitchers and fly ball pitchers. Whether you care to admit it or not, "most major league pitchers have little control over the outcome of balls in play" is a very different statement than the one you keep attacking.

Now, I'll grant you that there are still problems even with the formulation as I just stated it, and the real point of this kind of analysis should be to refine and improve our ability to prospectively evaluate pitchers (i.e., predict future performance) rather than quibbling over semantics. But still, you should stop arguing against that strawman; it really doesn't help your side.
Among the candidates to improve:
Mussina chopped nearly a run off his ERA, and his ERA+ went from 101 to 125. Penny's ERA went up, but the league ERA went up as well, so his ERA+ actually improved slightly, from 104 to 106. I'd take that as "virtually the same pitcher" rather than as improvement. Jason Johnson pitched himself out of the majors. Mark Mulder was showing improvement for nearly the first two months (3.74 ERA through May 22), but then he got hurt. One clear-cut right pick, one clear-cut wrong pick, one borderline pick (at best), one "who knows what would have happened if he'd stayed healthy?"
Among the candidates to decline, three guys got hurt (Armas, Ramirez, and Harden). Armas and Ramirez were somewhat better in '06 than in '05 when they were healthy; Harden's season was significantly worse, but in fairness he was close to the same pitcher in '06 when he finally did get healthy. Bruce Chen crashed and burned. Rogers's ERA actually did decline, as did his ERA+. One clear-cut right pick, one borderline correct (masked by the team success), and three "who knows what would have happened had they been healthy all year?" Not such a great track record, except at picking pitchers who were likely to be injured.
-- MWE
So are studies that conclude there is little to no control, and that select from a narrow group of major-league pitchers (those who pitch "x" number of innings in back-to-back years), because those studies do not account for the impact of selection bias on the results.
From JC's paper:
That eliminates from the sample all pitchers who failed to pitch 100 innings in season 1, AND all pitchers who failed to pitch 100 innings in season 2. There's a significant amount of selection bias right there; only the pitchers who are "most" able to make it through two such seasons in a row even qualify for the study.
All told, in 2004 and 2005 combined, there were 776 different pitchers. 462 of them pitched in both 2004 and 2005, so right off the bat the selection criteria eliminated about 40% of the total sample set of pitchers. Of the 462 pitchers who pitched in both 2004 and 2005, 142 of them pitched 100 or more innings in 2004, so now the selection criteria have eliminated over 80% of the total sample set of pitchers, without even considering the pitchers who pitched 100 or more innings in 2005. Fortunately (whew!) 104 of the 142 who pitched 100 innings in 2004 also pitched 100 innings in 2005, so only another few percent of pitchers were eliminated; still, the total sample is only about 13% of the whole group of major league pitchers (the quick tally after the list below runs the numbers). But note what else has happened:
All of the "really" bad pitchers are gone.
All of the relievers are gone.
All of the guys who were regular starters when they did pitch, but who didn't pitch a full season, either because of injury (Curt Schilling, 2005) or because they didn't crack the rotation until midway through a season (David Bush, 2004), are gone.
What's left are front-end-of-the-rotation pitchers - three pitchers per team (well, 3 1/2, actually). And baseball people will tell you that there's a lot of difference between front-end-of-the-rotation pitchers and other members of the pitching breed - in other words, that not every pitcher even gets a chance to be a front-end starter, and thus to qualify for JC's study group.
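Here's the tally promised above, run straight from the counts in the post:

total = 776        # pitchers who appeared in 2004 and/or 2005
both = 462         # pitched in both seasons
ip100_2004 = 142   # of those, threw 100+ IP in 2004
ip100_both = 104   # threw 100+ IP in both seasons

for label, n in [("pitched both seasons", both),
                 ("100+ IP in 2004", ip100_2004),
                 ("100+ IP in both", ip100_both)]:
    print(f"{label}: {n} ({n / total:.1%} of all pitchers)")
# pitched both seasons: 462 (59.5% of all pitchers)
# 100+ IP in 2004: 142 (18.3% of all pitchers)
# 100+ IP in both: 104 (13.4% of all pitchers)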
Suppose I were doing a study of the relationship between wealth and educational opportunity. If I were to draw my sample from Wakefield (a suburban country-club community here in Raleigh), and found that there was no correlation between wealth and educational opportunity, would you consider that to be a reliable conclusion? I can't imagine that you would, because you'd realize from the get-go that my sample was biased; it's awfully hard to find enough low-wealth families in Wakefield to balance such a sample.
The bias here is no less real; the pitchers who make it into studies like JC's are no more representative of the entire population of major league pitchers in any given season than the residents of Wakefield are representative of the entire population of families with school-age children. And not only is this sample biased, it's biased in such a way as to virtually guarantee that if hit prevention skills are a true indicator of pitcher ability, you would NOT be able to find them in your sample anyway, because you've selected only the most able pitchers in MLB.

When you look at less able pitchers, those who get weeded out of the study by the selection criteria, and compare them to the group of able pitchers you have selected, you see a noticeable difference in their hit prevention skills. That's what Clay Davenport did, looking at minor league pitchers who advance through the minors vs. those who do not; that's what I did, looking at low-BIP pitchers vs. high-BIP pitchers within a single season.

If you look at all pitchers (starters, relievers, major leaguers, minor leaguers) and not just the elite core of front-line starters, there's enough evidence that a pitcher has to have significant ability to prevent hits on BIP in order to (a) get to the majors and then (b) pitch enough to become an elite starter. Because nearly all of the variation in hit prevention skill has been selected out of the study group by the selection criteria, JC's conclusion is both unsurprising and flawed.
-- MWE
It's a bald-faced lie that no one ever believed that, but I'll let it slide because it's not the claim I'm arguing against. What I'm arguing against is exactly the claim you made: that "most pitchers have little control over the outcome of balls in play." In fact, let's look at what I actually wrote.
So, in fact, you quoted me exactly in your own formulation, with the small, though I suppose important, addition of "outcome." I'm not arguing against strawmen; I'm arguing against exactly what you say people really believe. My point is that this improved formulation is still wrong on a very significant scale. I then went on to categorize the many different ways that it is wrong. I even left out the groundball/flyball thing, because defenders of DIPS don't think that matters, since it "evens out" in terms of run prevention.
Great points, and analogy, on the DIPS issue.
What did you think of the second, market valuation, part of the paper? I don't see how the methodology works (post 60), but I may be misunderstanding what he did. (The same approach was used in the Hakes/Sauer paper that purports to show that the MLB marketplace long undervalued OBP, but then corrected suddenly in 2004.)
I think your critique of the salary estimator is correct. One thing I did like about JC's model was the attempt to identify arbitration eligibility, though it isn't clear how he did that in detail. In correlating salaries over time, I think you'd also want to take explicit account of the changing labor contracts: changes to the minimum salary, compensation rules for free agent signings, and so on. Moreover, this was a labor market with proven collusion at one time. Did I miss a yearly salary correction proxy in the model (analogous to a correction for changing run environments between leagues and years)?
One more, perhaps minor, issue is JC's use of ERA in his models. Since Craig Wright's The Diamond Appraised, it should be clear that unearned runs are also influenced by the pitcher's ability and are relevant to the pitcher's value. [I think there's been more recent work confirming this, perhaps by Tom Ruane, but I don't recall any details offhand.] For JC's purposes, R/9 innings, not ERA, is the better metric for estimating from DIPS or comparing to DIPS. Similarly, JC uses the pitcher park factor PPF from the Lahman database to "normalize" ERA. But as I understand it, that factor is a run adjuster, not an earned-run adjuster: one more reason to use R/9 innings rather than ERA in the first place.
I believe he adds a dummy variable for arbitration-eligible, and another for free agency (with the young wage-slaves as the default). I'll defer to others who know more about regression than I do, but I don't see how a dummy variable can properly adjust for what are really three (or at least two) discrete labor markets. The non-arb-eligible players all make salaries within a very narrow range, so their salaries would greatly understate the market value of their skills. For example, Miguel Cabrera made $472,000 this year, less than a replacement-level middle infielder. To him, free agency would add perhaps $15M to his contract, but free agency might be worth only $1-2M to Chris Duncan. No constant value for "free agency" can work here. Arb-eligible players are somewhere in between, and maybe using a dummy for arb-eligible would work if you lumped them in with free agents. But the non-arb players need to be treated separately or left out of the analysis, I would think.
But again, I may not have a full understanding of how the model is working....
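To illustrate the objection with a sketch (all numbers and names invented): an additive free-agency dummy shifts every salary by the same amount, but if the market multiplies skill by market status, as the Cabrera/Duncan comparison suggests, an interaction term fits far better.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 300
d = pd.DataFrame({
    "skill": rng.normal(0, 1, n),     # some performance index
    "fa": rng.integers(0, 2, n),      # 1 = free agent, 0 = wage-slave
})
# Simulated salaries where free agency scales the return to skill
d["salary"] = 1.0 + d["skill"] * (0.5 + 3.5 * d["fa"]) + rng.normal(0, 0.3, n)

additive = smf.ols("salary ~ skill + fa", data=d).fit()
interact = smf.ols("salary ~ skill * fa", data=d).fit()
print(additive.rsquared, interact.rsquared)   # interaction explains much more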