Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Tuesday, October 03, 2006

Sabermetric Research: Birnbaum: Chopped liver

And let the DIPS fall where they may…

And so Dr. Bradbury sets out to correct this. How? Not by reviewing the existing research, and validating it academically. Not by finding those studies which have “insufficient statistical rigor” and analyzing them statistically. Not by summarizing what’s already out there and criticizing it.

No, Dr. Bradbury’s paper ignores it. Completely. He mentions none of it in his article, not even in the bibliography. Instead, Dr. Bradbury explains his own study as if it’s the first and only test of the DIPS hypothesis.

This happens all the time in academic studies involving baseball. Years of sabermetric advances, as valid as anything in the journals, are dismissed out of hand because of a kind of academic credentialism, an assumption that only formal academic treatment entitles a body of knowledge to be considered, and the presumption that only the kinds of methods that econometricians use are worthy of acknowledgement.

Repoz Posted: October 03, 2006 at 09:44 PM | 101 comment(s)
  Tags: sabermetrics

Reader Comments and Retorts

   1. Yakov Posted: October 03, 2006 at 10:11 PM (#2196409)
In Soviet Russia, liver chops YOU!
   2. dr. scott Posted: October 03, 2006 at 10:24 PM (#2196422)
This guy's complaints about the article are not nearly as interesting as the article itself, which claims that front offices were correctly evaluating pitchers on the results of DIPS before Voros's article came out in 2001, and that apart from some dips during expansion, the market always seemed to figure this out... that I find interesting.
   3. More Dewey is Always Good Posted: October 03, 2006 at 10:25 PM (#2196424)
Wait, people with advanced degrees look down their noses at research performed by people without advanced degrees? Shocking!
   4. deb Posted: October 03, 2006 at 10:28 PM (#2196429)
I don't know that what Dr. Bradbury did was all that unusual in academia. I say that from the standpoint of listening to stories from friends getting doctorates in engineering, who complained about their doctoral professors taking their original work and calling it their own. And said professors getting big bucks for the students' work.

(I won't name the schools where this is common practice, but one of the biggies for doing it is a powerhouse in football :-)
   5. Frisco Cali Posted: October 03, 2006 at 11:03 PM (#2196466)
Northwestern?
   6. dr. scott Posted: October 03, 2006 at 11:17 PM (#2196476)
In engineering and science it depends on the stage of the prof's career. The rule of thumb I have noticed is: if he/she is well established, his/her name will be at the end of the author list; if s/he is new, the prof's name is at the beginning. I've had one of each.
   7. Kyle S Posted: October 03, 2006 at 11:23 PM (#2196480)
I'm sure JC will be around to comment soon. I'm glad this thread was linked if only to find out about the paper.
   8. Moloka'i Three-Finger Brown (Declino DeShields) Posted: October 04, 2006 at 12:02 AM (#2196503)
In “Solving DIPS,” a bunch of really smart people, statistically literate, probably no less intelligent than academic economists and much better versed in sabermetrics, do an awesome and groundbreaking job of determining the causes of variation in the results of balls in play.


Maybe it's just me, but phrases like "really smart people," "awesome . . . job," etc. don't exactly convey the intellectual posture intended for this article.
   9. Robert in Manhattan Beach Posted: October 04, 2006 at 12:05 AM (#2196508)
Instead, Dr. Bradbury explains his own study as if it’s the first and only test of the DIPS hypothesis.

Unfortunately there is a lot of this going around; JC is just one of the more blatant. He got his book deal.
   10. dlf Posted: October 04, 2006 at 12:08 AM (#2196511)
Despite my advanced degree, I can't fathom most of this article. Sections 2 and 3 seem to be a rehash of what has been discussed before. I understand Birnbaum's point about a lack of "respect" for the work in the sabermetric community, but those sections seem prefatory to the real meat of what I gathered from the article -- that the labor market has been valuing the items over which a pitcher has nearly exclusive control more than they value ERA. I do hope JC drops by and is willing to discuss that element of the paper.
   11. Flynn Posted: October 04, 2006 at 12:18 AM (#2196525)

Maybe it's just me, but phrases like "really smart people," "awesome . . . job," etc. don't exactly convey the intellectual posture intended for this article.


Actually, it does.
   12. 185/456(GGC) Posted: October 04, 2006 at 12:34 AM (#2196558)
Here's a linksometrics article for Dial and the other duffers.
   13. BDC Posted: October 04, 2006 at 12:44 AM (#2196596)
Bradbury cites Voros clearly and replicates some of his findings. He does not cite a lot of the subsequent discussion and replication by others. I'm not sure how much of a sin he has committed either against academic protocol or against common sense. Bradbury appears to test Voros's findings for himself, which is admirable. And we're dealing with a pretty finite data set here, which is the only data set possibly relevant to the problem of major-league pitching, so it isn't like he's ignoring different paradigms and conceptualizations that might be valid. BABIP is a significant part of ERA, and BABIP fluctuates a lot from year to year; this is not all that debatable, and Bradbury properly cites Voros for being the first to realize it.
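To put rough numbers on BDC's point in #13 that BABIP fluctuates a lot from year to year: BABIP here is the usual (H - HR) / (AB - SO - HR + SF), and its weak repeatability is the core DIPS observation. Below is a minimal Python sketch; the input file and column names are hypothetical, not from Bradbury's paper or any of the studies discussed.

```python
import pandas as pd

# Hypothetical input: one row per pitcher-season with raw counting stats.
df = pd.read_csv("pitcher_seasons.csv")

# Standard BABIP definition: hits on balls in play over balls in play.
df["babip"] = (df["h"] - df["hr"]) / (df["ab"] - df["so"] - df["hr"] + df["sf"])
df["k_rate"] = df["so"] / df["bf"]  # strikeout rate, for contrast

df = df.sort_values(["pitcher_id", "year"])
nxt = df.groupby("pitcher_id")[["babip", "k_rate"]].shift(-1)

# The DIPS observation in two lines: BABIP barely repeats year to year,
# while strikeout rate repeats strongly.
print("BABIP  year-to-year r:", round(df["babip"].corr(nxt["babip"]), 3))
print("K rate year-to-year r:", round(df["k_rate"].corr(nxt["k_rate"]), 3))
```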
   14. Danny Posted: October 04, 2006 at 12:56 AM (#2196625)
There’s Tom Tippett’s famous study that showed that power pitchers performed better than projected by DIPS -- the same conclusion that Dr. Bradbury reaches. (Tom didn’t do any significance testing, though, which I guess makes his excellent analysis unworthy of citation.)

Wasn't Tippett widely knocked for ignoring Voros' follow-ups to DIPS 1.0 in conducting his study?
   15. Garth found his way to daylight Posted: October 04, 2006 at 02:40 AM (#2196868)
Fight! Fight! Fight! Fight!

In Case You're Interested
What kind of person would I be if I didn’t acknowledge my critics? I’m not sure why Phil is so angry with me, and I don’t think he chose the best forum to address his concerns. I guess I should apologize for knowing some econometrics, since it’s only useful for out-credentialing other scholars. If you read the paper, I think it’s very clear that I’m not claiming to have invented DIPS, just trying to replicate it for an academic audience, then to test how well the labor market values DIPS. I’m not sure that it warrants a detour through a few studies on the Internet. I think it’s a pretty pro-sabermetric paper.
I'll take winner.

(Oh, and twenty bucks says that this "< a >" button is lying to me and that link didn't work.)
   16. Garth found his way to daylight Posted: October 04, 2006 at 02:42 AM (#2196874)
(Oh, and twenty bucks says that this "< a >" button is lying to me and that link didn't work.)

Just kidding, of course.
   17. Too Much Coffee Man Posted: October 04, 2006 at 03:42 AM (#2196978)
Maybe it’s one of the realities of academic life that to get an article published, you have to ignore non-academic work.
No, it is important to cite the relevant published work on the topic, but it is not uncommon to cite unpublished - non-academic - works. With the popularity of blogging and tools like Google Scholar, this is becoming increasingly common.

Maybe the professorial culture requires that you presume no knowledge is real until it’s been published in peer-reviewed journals.
That's over-stating it. There is a difference between peer review in a site like this and what takes place in publishing in a scientific journal. From what I've read, feedback here (or at Birnbaum's site, or Bradbury's) is actually more useful than what you get from formal reviewers. However, the key difference is that the feedback comes AFTER the study is published (on the internet). I could do a crappy study, post it on a site, get lots of negative feedback, and others may choose to cite my non-journal/unpublished study. The reader may never know how bad it is. However, in an academic journal, negative feedback may kill the submission, so that it never sees the light of day. So, it's not really whether the knowledge is real or not, but the confidence you have in its credibility (based on citation alone).

Maybe Dr. Bradbury is right that all the previous results are questionable unless he uses the exact technique that he does.
Doubtful. There are methods that fit the data and the research question better, but most scholars will accept the fact that different researchers use different research strategies. In fact, it creates confidence that what you're looking at is a real phenomenon and not an artifact of the method.


And maybe I’m just overreacting to a couple of throwaway sentences intended only to get his paper past the referees.
I love Phil's work, but I would say, yes, it's an over-reaction.
   18. Sean Forman Posted: October 04, 2006 at 04:10 AM (#2196985)
I think that Phil fails reading comprehension here. The abstract states: "Defense in baseball is a product of team production in which pitchers jointly prevent runs with fielders. This means that raw run prevention statistics that economists often use to gauge the value of pitchers, like ERA, may not properly assign credit for their performances. Therefore, marginal revenue product (MRP) derivations based on such statistics contain some erroneous information that may bias the estimates. In this paper, I examine a method for isolating pitcher contributions to the team production of defense. Evidence from the labor market suggests pitchers are paid according to their individual contributions consistent with the areas in which pitchers possess skill."

This is an economics article, not a statistics article. The point is not DIPS, but whether GMs are already using DIPS to value pitchers. I don't see where J.C. is ignoring anything. He fully credits Voros's original work and then applies the standard statistical tools used by economists in order to say, "See economists, this does in fact hold true using the techniques you know and trust." He then works out what the economic impact of this effect is on the compensation given to pitchers, which as far as I know is completely original work.

A rehash of Tippett, TangoTiger, Emancip8d, and Patriot (get those by the reviewers) isn't really necessary for what he is trying to do here. I'm not sure what Phil wants here. A list of every internet article on DIPS?

Perhaps J.C. is a little flip with the comment about rigor, but I don't think he is too far off from the truth. Very little of the work done on sabermetric websites and in sabermetric books would pass muster with regard to the level of statistical analysis needed to be published in an academic journal. The ideas are good, but doing statistics well requires training and knowledge that most people simply do not have. Even as an applied mathematician, I would lump myself into the group that doesn't know enough statistics. For instance, one of Bill James's favorite techniques, using matched pairs of players, may show that an effect exists, but it gives us little info as to the size of the effect and tells us nothing about the error bars around that measurement for the population at large.

In the interest of full disclosure, I have a PhD., consider myself a friend of J.C.'s, and I share Phil's frustration at academics poaching ideas from the sabermetric community, but I think he chose a very bad example here.
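Sean's matched-pairs point in #18 (showing an effect exists versus measuring its size and error bars) is easy to make concrete. A toy sketch with invented paired differences, not real data:

```python
import numpy as np
from scipy import stats

# Invented data: per-pair differences in some stat (say, OPS) between
# matched players with and without the trait being studied.
rng = np.random.default_rng(42)
diffs = rng.normal(0.005, 0.030, 60)

# Existence of the effect is the p-value; the size and the error bars
# are the mean difference and its confidence interval, which eyeballing
# matched pairs alone does not give you.
t, p = stats.ttest_1samp(diffs, 0.0)
mean = diffs.mean()
se = diffs.std(ddof=1) / np.sqrt(len(diffs))
print(f"mean diff {mean:+.4f}, p = {p:.3f}, "
      f"95% CI [{mean - 1.96*se:+.4f}, {mean + 1.96*se:+.4f}]")
```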
   19. Phil Birnbaum Posted: October 04, 2006 at 04:49 AM (#2196998)
Hi, Sean (#18),

Well, actually, I do have *some* reading comprehension. :) The first half of J.C.'s study tests whether DIPS is correct. Only the second half deals with the labor market issues. My point relates only to the first half, the part to "verify if [DIPS] is correct using the proper econometric tools." That half is pure sabermetrics.

And I don't dispute that J.C.'s article is an improvement on previous work. I think studying DIPS using better statistical techniques is a welcome and worthy endeavor. Actually, Sean, I am NOT frustrated at academics poaching ideas from the sabermetric community -- ideas are public domain, and anyone should be welcome and encouraged to grab whatever ideas are out there and run with them. I'd be very happy if J.C. eclipses previous work and sets the standard for analysis of DIPS. My point is simply that while he does so, he should mention previous valid work on the same topic where relevant.

I don't want a list of every internet article on DIPS. But I did expect an acknowledgement that other researchers have tested DIPS before (and in a valid way). What J.C. did is a *further* test of DIPS over and above what's already done. This is certainly a good thing. But given that there are already other tests out there, some of them should be mentioned. Not all of them, just the ones that J.C. thought were valid, perhaps the ones that influenced his own work. (Maybe Tippett, since J.C. actually wrote about it previously, and it comes to the same conclusion?)

Sean, you did a well-received study on catchers and passed balls. You didn't publish it academically -- it's on your website in Powerpoint format. Now, let's say I read your study, and then I figure out another way to measure the same skill. I write an academic article using my method. I write, "Bill James once said he wished he had a way to figure out which catchers were best at preventing passed balls. I now have a rigorous method that uses standard econometric techniques." I don't mention your study at all. Isn't that totally inappropriate? Wouldn't you call me on it?
   20. Phil Birnbaum Posted: October 04, 2006 at 05:10 AM (#2197001)
Too Much Coffee Man (#17),

I agree with you that even after peer review, bad studies stay on the web. However, given that J.C. has followed much of what has been written about DIPS, he is in a position to know which are valid and which aren't, and cite only the appropriate ones.

That's really the heart of my argument -- J.C. is very active in the sabermetric community. He's been around for the debates and the studies and the discussions. Is there really NOTHING that any of you guys have done on this topic that J.C. has seen that's worthy of citation as prior work? It's been five full years. After all that's been done on DIPS in five years, there's absolutely zero scientific knowledge about DIPS that's relevant to J.C.'s study?

>... I would say, yes, it's an over-reaction.

Thanks, I appreciate hearing that.

Phil
   21. Walt Davis Posted: October 04, 2006 at 07:08 AM (#2197014)
Pardon my snooty academicism (no really, I know I am), but "Solving DIPS" is mainly a bunch of really smart people working their way through the very basics of the binomial distribution, measurement theory, and covariance algebra. It's impressive in that many of them didn't seem to know this stuff before and worked it out, but it's stuff you get in intro stats courses.

From a statistical analysis perspective, what's impressive (to me) about well-done sabermetric studies is not (generally) the statistical analysis but the truly excellent and careful work with the data, what it means, what it doesn't, what needs to be adjusted for, etc. It would be great if more academics did that.
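The binomial-distribution and covariance basics Walt mentions in #21 mostly come down to one identity: observed variance = talent variance + luck variance, where the luck term for BABIP is binomial. A back-of-envelope sketch with assumed round numbers, for illustration only:

```python
import math

p = 0.300            # assumed league-average BABIP
bip = 500            # balls in play in roughly a full starter's season
observed_sd = 0.025  # assumed spread of single-season BABIP across pitchers

# Binomial "luck" variance for one pitcher-season: p(1-p)/n.
luck_var = p * (1 - p) / bip

# var(observed) = var(talent) + var(luck), assuming the two are independent.
talent_var = observed_sd**2 - luck_var
print(f"luck SD   ~ {math.sqrt(luck_var):.3f}")
print(f"talent SD ~ {math.sqrt(max(talent_var, 0.0)):.3f}")
# With these numbers most of the observed spread is luck, which is the
# "Solving DIPS" punch line.
```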
   22. Gaelan Posted: October 04, 2006 at 11:49 AM (#2197043)
From the article:

The consensus, seven years later, after countless hours of research, analysis, and dialogue, is that the theory is generally true – pitchers have much less control over batted balls than previously believed, and any material deviation from league average is almost always just luck.


But this isn't the consensus at all. In fact the very studies that he quotes later in the article, Tippett's and "Solving DIPS," both demonstrate that this strong version of DIPS is false.

The exaggeration of DIPS is a big problem that the sabermetric community needs to face honestly.
   23. Kyle S Posted: October 04, 2006 at 01:52 PM (#2197116)
I think the key phrases of that paragraph are "material deviation" and "almost always". There's plenty of room in there for "knucklers/Greg Maddux/Mariano Rivera consistently beat their team BABIP by less than a standard deviation". As I read it, the larger point (no one consistently outperforms their eBABIP by 2 or 3 SDs) is true. Of course, JC's paper won't touch on closers, because they don't accumulate enough innings in consecutive seasons to show up in the data, so perhaps he should mention that limitation somehow, especially since closers are one group that seem to be an exception to classic strong DIPS theory.
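The arithmetic behind the standard deviations in #23, under an assumed .300 league BABIP:

```python
import math

p = 0.300  # assumed league-average BABIP
for label, bip in [("starter, ~200 IP", 600), ("closer, ~70 IP", 210)]:
    sd = math.sqrt(p * (1 - p) / bip)  # binomial SD of observed BABIP
    print(f"{label}: 1 SD of BABIP luck ~ {sd:.3f}")
# Roughly .019 for the starter and .032 for the closer: a closer would
# have to beat .300 by .060-.095 in a season before a 2-3 SD claim even
# comes into play, and those short samples are why closers rarely show
# up in this kind of year-to-year data.
```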
   24. Sean Forman Posted: October 04, 2006 at 02:06 PM (#2197127)
And so Dr. Bradbury sets out to correct this. How? Not by reviewing the existing research, and validating it academically. Not by finding those studies which have “insufficient statistical rigor” and analyzing them statistically. Not by summarizing what’s already out there and criticizing it.


Phil, again the point of his paper is not to validate other people's studies of DIPS, but to introduce DIPS to an economics audience using rigorous analytical tools. I doubt a paper that did nothing but rehash and beef up the work of Tippett, TangoTiger and the Primates would be accepted by an economics journal. The work on DIPS is really just an introduction to the meat of the article, which is the marginal revenue products section. It's an ECONOMICS paper, not a STATISTICS or SABERMETRICS paper. Re-read the abstract: there are two sentences about the DIPS aspect and three on the marginal revenue products aspect. For that matter, read the title, "DOES THE LABOR MARKET PROPERLY VALUE PITCHERS?" If his article was "IS THE DIPS HYPOTHESIS VALID?" then you would be completely right.


Sean, you did a well-received study on catchers and passed balls. You didn't publish it academically -- it's on your website in Powerpoint format. Now, let's say I read your study, and then I figure out another way to measure the same skill. I write an academic article using my method. I write, "Bill James once said he wished he had a way to figure out which catchers were best at preventing passed balls. I now have a rigorous method that uses standard econometric techniques." I don't mention your study at all. Isn't that totally inappropriate? Wouldn't you call me on it?


It depends. If you got the idea from me, then you should cite me (as J.C. cites Voros). If you got the idea elsewhere and are just aware of the method, a citation might be good. However, I still think you are comparing apples and oranges. J.C.'s paper is an economics paper about more accurately measuring how teams assign marginal revenue product to pitchers. Your article is about measuring catcher defense. You'll notice that J.C. goes into detail and gives a survey of other articles measuring marginal revenue product for pitchers. If you are doing a purely sabermetric paper, then I think you would need a fuller survey of other methods.

I apologize if I'm being too strident here, but I feel Phil is a little too recklessly impugning J.C.'s reputation here.
   25. BDC Posted: October 04, 2006 at 02:22 PM (#2197144)
Again, I will second Sean's comments. I am an academic too -- in the humanities -- and I am an editor on a peer-reviewed journal (Aethlon, the sport literature journal). We moved to peer review recently in order to involve more colleagues in the assessment of scholarship in our field. So I may be biased toward the peer-review system, and I am certainly somewhat dismayed to see a quick swipe at the process itself as exclusionary in some way, provoking attendant comments from posters about the haughtiness of the credentialed.

There's a healthy debate going on about the present and future of peer review: should the old print-oriented system prevail, or should articles be rapidly circulated via the Internet and just as rapidly critiqued there? And there is a lot of fur flying, but there is also some serious debate over the protocols and the social significance of different ways of distributing knowledge. Some "academic credentialism" is involved in these debates, but like Sean I fail to see how it infiltrates JC's paper, which is quite respectful towards original Internet-published work. If anything, JC gets Voros cited and into the academic knowledge system -- maybe that will lead to more citations.
   26. Mike Emeigh Posted: October 04, 2006 at 02:38 PM (#2197169)
There are two issues that I consider important, of which JC's paper addresses only the first:

1. Do teams properly value pitcher skills, once they are already in the market of major league pitchers?
2. Do teams properly evaluate pitcher skills in order to determine whether or not a pitcher should be in the market of major league pitchers in the first place?

If (as I and others have argued), hit prevention skills are a key criterion for determining whether a pitcher can enter the market of major league pitchers in the first place, then I think it's logical to assume that those skills should become less important as a discriminant among pitchers that have already qualified for entry into the market.

-- MWE
   27. JPWF13 Posted: October 04, 2006 at 02:58 PM (#2197187)
But this isn't the consensus at all. In fact the very studies that he quotes later in the article, Tippett's and the solving DIPS, both demonstrate that this strong version of DIPS is false.

The exageration of DIPS is a big problem that the sabremetric community needs to face honestly.


I've noticed this as well. DIPS advocates seem to simply ignore the fact that some in the stathead community dispute their assumptions. For instance, one article I read last week began like this: "...and for those who believe in DIPS, and at this point isn't that everyone, ...."

I think one problem with the arguments is the lack of proof (for both sides of the argument).
It very well may be that teams do a good job of removing pitchers with poor hit prevention skills from their systems before they reach the majors; as a result, the "survivors" tend to have hit prevention skills that flatten out at a certain level. Those who are worse than that level do not last very long even if they reach the majors. As a result it's difficult to get a decent sample size of pitchers who are bad at hit prevention at the major league level, to see if they are really bad at hit prevention or just unlucky.

One discussion chain I saw had a great back and forth concerning the viability of MLEs (or minor league translations). The "anti" position was driving the "pro" position insane. Simply stated, the anti-position was this: MLEs are absolutely 100% useless; the only reason that they "appear" to work so often is because teams are good at weeding out the AAAA hitters who can't hit in the majors the way their MLEs say they should. As a result the players who would statistically disprove the ideas behind MLEs are systematically removed from the MLB player pool. Neither side could convince the other, because each side held a belief that the other lacked the empirical basis to invalidate.

To oversimplify, let's say someone, "Voros," believes wholeheartedly in DIPS; whereas someone else, "Mike," believes that hit prevention skills are real, but that the minor league system weeds out those with poor skills before they reach the majors, or before they accumulate significant playing time once they get there. As a result, DIPS studies are conducted on a population that happens to be pretty homogeneous with respect to the trait you're trying to study.

Voros lacks the data to disprove Mike's assumption, Mike lacks the data to disprove Voros' assumption.
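The Voros/Mike standoff in #27 can at least be simulated. A sketch (all numbers invented) of how "weeding out" compresses the surviving talent pool:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented population: wide spread of true hit-prevention talent.
n = 10_000
true_babip = rng.normal(0.300, 0.020, n)

# Short minor-league/early-career audition; teams cut poor observed BABIP.
trial_bip = 300
observed = rng.binomial(trial_bip, true_babip) / trial_bip
survivors = true_babip[observed < 0.320]

print(f"talent SD, everyone:  {true_babip.std():.4f}")
print(f"talent SD, survivors: {survivors.std():.4f}  ({len(survivors)} of {n})")
# The survivors are more homogeneous, so among major leaguers the
# remaining BABIP spread looks mostly like luck: exactly the situation
# where neither side's data can refute the other.
```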
   28. Phil Birnbaum Posted: October 04, 2006 at 02:59 PM (#2197189)
Sean (#24),

>... I feel Phil is a little too recklessly impugning J.C.'s reputation here.

That is fair criticism of my post, and separate from the other issue. I'm open to the accusation that I am being too harsh on J.C. and am unfairly impugning his reputation. That is quite possible, and if you convince me of that, I will apologize. But the question of whether he should have cited previous work is independent of that.

>It's an ECONOMICS paper, not a STATISTICS or SABERMETRICS paper.

No, the first half is a SABERMETRICS paper. Analyzing whether DIPS is valid or not is Sabermetrics. I don't care if it's in an economics paper, a nuclear physics paper, or a history paper. It is what it is, and that is *sabermetrics*.

And it is not a matter of "re-hashing," as if the work of those previous researchers was flung against the wall and would have to be scraped off and cleaned up. It is scientific research on the EXACT SAME question that J.C. addressed in the sabermetrics portion of his economics paper. It is, in my opinion, good and valid scientific research. And, in my opinion, when you research the same question that others have, and their research is valid, you mention, maybe even in one sentence, that you are confirming their result.

This is *more* true, not less true, when you are writing for a completely different audience who is likely unaware of previous research on the question -- for instance, when you do a sabermetrics study for economists.

If you were an economist with no knowledge of sabermetrics and you read that paper, what would your impression be? It would be something like,

"Hey, there's is something called an "analytic baseball community." What's that? I dunno. Is it a group of serious researchers? Nah, their "concepts" have undergone "little formal scrutiny," so it's probably just a bunch of fans with some idea in their heads. It's nice to see economics come to the rescue by finally putting this idea to a proper scientific test."

That's just wrong. The sabermetric community has done some excellent work on this question and come up with some valid answers. The same answers that J.C. was able to duplicate.

Sean, let me ask you -- before you read this paper (assuming you have), did you know that knuckleballers tended to beat the expected results of their balls in play? Did you know that power pitchers did also?

I did. And you probably did too. And a lot of readers did. And by using the word "knew," I mean I trusted the studies that told us this, accepted them as valid science. I still do.

And that means that when J.C.'s study shows that power pitchers do better in DIPS, he has replicated a previous result in our field. And, yes, when you replicate a previous result, you should mention that, instead of (even unintentionally) leaving the reader with the idea that the result is new.

Here's a sentence from page 8: "In summary, it appears that pitchers do have some minor control over hits on balls in play: but, this influence is small."

Sean, look at that sentence. Is that something that's new to you? Did you not know this before (or, if you disagree with it, have you at least not seen studies arguing it before)? Is it reasonable to make that statement in a research paper WITHOUT MENTIONING that it has been said (and perhaps shown scientifically) many times in the past few years in the sabermetric community?
   29. Phil Birnbaum Posted: October 04, 2006 at 03:47 PM (#2197246)
Bob (#25),

I have no problem with the peer review system. I think it's a great idea. My argument is not one against the peer review system.

My argument is that some knowledge got out there *outside* the peer review system, and that knowledge is no less worthy of respect.

My (perhaps uninformed) layman's impression is that "has not been peer reviewed" is academic code for "is crappy and invalid research, perhaps by a crank." Or, at least, code for "is probably not worth considering because there's a good chance it's flawed."

That might be true in some contexts. But not in sabermetrics, where 99% of our research has not been peer reviewed. You will learn almost nothing about sabermetrics if you confine yourself to peer-reviewed studies.

In sabermetrics, the academics' job is much more difficult, because without prior peer review, separating the wheat from the chaff is much harder. But they are NOT EXEMPT from this task. For a long time, academia chose not to study sabermetrics. An alternative scientific community (a large portion of which is Bill James) sprang up alongside to build a flourishing science out of almost nothing. It has accumulated an impressive base of scientific knowledge, especially considering most of its practitioners are unpaid volunteers with no formal statistical training.

It is not fair, appropriate, or ethical for academia to come along these thirty years later and say, "hey, you didn't publish in our journals, and therefore your work isn't worth acknowledging. Therefore, we're going to re-prove all your results and not even mention that you guys found them first."

And isn't that the subtext of what J.C. wrote? "[DIPS] ... is not part of the economics literature." Well, no, it's not. It's part of the SABERMETRICS literature.

We of the "sabermetrics community" are not particularly protective of our territory. Scholars in other fields are quite welcome to research topics that have been traditionally studied in the field of sabermetrics. Welcome! But, please, respect that we're structured differently from what you're used to.

Administratively, the sabermetric community does things differently from academe. We don't have anonymous referees and formal peer review. And that's OK. The only requirement for science to be science is the scientific method, which we follow pretty darn faithfully. Dismissing our work because we don't choose to use your particular method of pre-publication review is disrespectful. Yes, it makes life harder for you academics. You have to figure out what sabermetrics is good and what sabermetrics is bad. Not all of our knowledge can be found in textbooks and journals -- it's scattered all over. If you're new to the field, you might actually have to ask around and learn the culture, just like starting a new job at a new company.

You are not entitled, because you happen to study in a field where knowledge is confined to a limited number of indexed, peer-reviewed journals, to write us off because we are not. You are not entitled to fail to acknowledge our valid research just because, without the imprimatur of a respected journal, you don't know if it's good work or not. You are not entitled to assume that our work is unworthy because it has not been peer reviewed by YOUR peers in YOUR way.

That's all.
   30. 185/456(GGC) Posted: October 04, 2006 at 03:51 PM (#2197251)
For instance, one article I read last week began like this: "...and for those who believe in DIPS, and at this point isn't that everyone, ...."


That was Steve Goldman in a BPro chat, IIRC.
   31. BDC Posted: October 04, 2006 at 03:58 PM (#2197255)
My (perhaps uninformed) layman's impression is that "has not been peer reviewed" is academic code for "is crappy and invalid research, perhaps by a crank." Or, at least, code for "is probably not worth considering because there's a good chance it's flawed."

I can see your taking umbrage at that sentence, Phil. I didn't see it as quite so supercilious, because the context was one of acknowledging the work and confirming its general validity. But I can see where it might come across that way.

I think that Internet forums lend themselves to meta-discussions (like this one). Printed papers don't; there are space constraints and conventions of terseness. At best, the various follow-up Internet discussions of DIPS might have rated a footnote in JC's paper.
   32. cercopithecus aethiops Posted: October 04, 2006 at 04:20 PM (#2197283)
My (perhaps uninformed) layman's impression is that "has not been peer reviewed" is academic code for "is crappy and invalid research, perhaps by a crank." Or, at least, code for "is probably not worth considering because there's a good chance it's flawed."

Your impression is not only uninformed, it is in fact the mirror image of the credentialism you decry and every bit as obnoxious and damaging. If "has not been peer reviewed" is academic code for anything, it is something much more along the lines of "I can't really be sure how much stock to put in this." "Would never have survived peer review" is what we academics say when we mean what you think we mean by "has not been peer reviewed."
   33. Sean Forman Posted: October 04, 2006 at 04:33 PM (#2197305)
In sabermetrics, the academics' job is much more difficult, because without prior peer review, separating the wheat from the chaff is much harder. But they are NOT EXEMPT from this task. For a long time, academia chose not to study sabermetrics. An alternative scientific community (a large portion of which is Bill James) sprang up alongside to build a flourishing science out of almost nothing. It has accumulated an impressive base of scientific knowledge, especially considering most of its practitioners are unpaid volunteers with no formal statistical training.

It is not fair, appropriate, or ethical for academia to come along these thirty years later and say, "hey, you didn't publish in our journals, and therefore your work isn't worth acknowledging. Therefore, we're going to re-prove all your results and not even mention that you guys found them first."


Phil, are you proposing that academics should have to work through all of the work in the Hardball Times, By The Numbers, Baseball Prospectus, FanGraphs, Saberstats.com, TangoTiger, various discussion boards, books, magazines, and other outlets before they publish in sabermetrics? I don't think you understand what goes into the job.

If academics are held to that standard, what standard are amateur sabermetricians held to? Shouldn't they be expected to read a simple statistics textbook before publishing something? Do they need to do literature searches and bibliographies? After all, why should they be EXEMPT from doing the necessary legwork? It seems to me you are suggesting that statheads should go along doing whatever they want, while academics are held to a much higher standard.

Given the title of the paper was "DOES THE LABOR MARKET PROPERLY VALUE PITCHERS?", I agree with "Bob" that at most a footnote was needed.

From my experience, little to no sabermetric work is published in heavy duty math or statistics journals. Instead it is published in works by the MAA or in Chance which are slanted towards undergraduate education and interesting applications of mathematics and statistics. I definitely would not get credit towards tenure just publishing sabermetrics. It simply would not be seen as serious. Now economics seems to view this work as more legitimate. In terms of mathematics and statistics, most sabermetric material is too simple to be considered "real" mathematics or statistics (to steal a line my colleagues would use).
   34. Phil Birnbaum Posted: October 04, 2006 at 04:43 PM (#2197320)
Ignoratio (#32),

Not sure I understand or agree with your first sentence, but the rest of what you say is well taken. Thanks for the info.
   35. 185/456(GGC) Posted: October 04, 2006 at 04:47 PM (#2197325)
Now economics seems to view this work as more legitimate


Dberri agrees
   36. Sean Forman Posted: October 04, 2006 at 04:53 PM (#2197331)
Link doesn't work for me GGC.
   37. Phil Birnbaum Posted: October 04, 2006 at 04:59 PM (#2197340)
Hi, Sean (#33),

No, of course I'm not suggesting that academics should have to read every web post. But they should be familiar with the major findings in the subject area.

Suppose another economist comes along and reads J.C.'s paper, and wants to study the issue himself. He should know about Voros' paper, which he will learn from J.C. He could then maybe do a google search on DIPS, check out the Wikipedia entry that includes a bunch of links, check the Bill James index to see if it's been mentioned there, maybe do a quick search of the BTF archives, and so on. Isn't searching the literature standard before beginning a paper? Our literature search is just a bit more complicated.

It doesn't have to be perfect -- nobody would fault a researcher for missing a couple of studies, considering they're not indexed. But if I were doing a study on, say, clutch hitting, I might find everything I can, then drop an e-mail to one of the authors who seemed to know what he's talking about, and say: hey, here's what I found, are there any other studies you'd recommend?

In any case, citing lack of peer review is a cop-out. Almost NOTHING in sabermetrics is peer-reviewed. If a researcher insists on peer review, he's throwing away the entire existing knowledge base of sabermetrics.

And, of course sabermetricians are held to the same standard. If I start writing about labor markets, and duplicate research that's been done in economics, OF COURSE it's valid criticism to say that I should have cited it. And if I do something stupid mathematically because I don't understand statistics, of course that's reason to be critical of my paper. My argument is that it goes both ways. Am I missing your point here?
   38. 185/456(GGC) Posted: October 04, 2006 at 04:59 PM (#2197342)
Better?

He's one of the co-authors of Wages Of Wins.
   39. Phil Birnbaum Posted: October 04, 2006 at 05:03 PM (#2197350)
Sean (#33) still,

>Given the title of the paper was "DOES THE LABOR MARKET PROPERLY VALUE PITCHERS?", I agree with "Bob" that at most a footnote was needed.

Why does the title matter more than the content? Up until the middle of page 8, the paper is pure sabermetrics.

If he had decided to evaluate the labor question using Linear Weights instead of DIPS, and he spent eight pages deriving the linear weights formula without mentioning Pete Palmer, would you make the same argument?
   40. cercopithecus aethiops Posted: October 04, 2006 at 05:11 PM (#2197363)
Not sure I understand or agree with your first sentence

Literally half of the non-Ph.D. professionals that I meet in my field assume that I must be a condescending elitist the minute they find out that I hold a doctorate. These folks are blissfully unaware of the fact that their own attitude toward the credentialed is just as poisonous as the behavior they object to. Your somewhat strident take on this whole academe vs. sabermetrics thing was a rather unpleasant reminder of all of that. Perhaps I'm over-reacting, and if so I apologize, but it does strike me that you are assuming a whole hell of a lot more disrespect than is actually being shown.
   41. Phil Birnbaum Posted: October 04, 2006 at 05:12 PM (#2197366)
Sean again,

>In terms of mathematics and statistics, most sabermetric material is too simple to be considered "real" mathematics or statistics (to steal a line my colleagues would use).

Sabermetrics is not mathematics or statistics, any more than economics is. The value of a sabermetrics or economics result should not be judged by the complexity of the underlying math. (This can also be my response to the first part of comment #21).

Think of your favorite physics formulas. Gravity: F = G(m1)(m2)/(d^2). Pretty simple math, right? Pretty important discovery still, no?
   42. bads85 Posted: October 04, 2006 at 05:15 PM (#2197369)
My (perhaps uninformed) layman's impression is that "has not been peer reviewed" is academic code for "is crappy and invalid research, perhaps by a crank."


No, not at all. The code for that is something like "shoddy methodology and invalid research, perhaps by a crank or an undergraduate -- is not worthy of peer review." The next step up the ladder would be something like "would be destroyed in peer review", while the next step might be the less harsh "would not survive peer review."

"Has not been peer reviewed" can mean a couple of things, almost always positive. Most commonly, it is "code" for "Hey, this sounds real good, but will it really stand up? It is worth a serious look. Put some grad assistants on it." Another variation is, "This sounds real good, but since it is not really part of the thrust of my paper, I am not going to run the numbers. This is my disclaimer to cover my tail."

However, even in academic circles, just the term peer review itself often has negative connotations, so your confusion is very understandable.
   43. Sean Forman Posted: October 04, 2006 at 05:17 PM (#2197372)
And, of course sabermetricians are held to the same standard. If I start writing about labor markets, and duplicate research that's been done in economics, OF COURSE it's valid criticism to say that I should have cited it.


Do you really believe that the standards are the same? Do the webpages come down if something is discredited? The criticism is valid, but how is it made? In the comments on your blog? On another person's blog? In academics, the paper is not published. Without peer review, judging quality in sabermetrics is largely a crapshoot. Look at PAP. We have zero idea if PAP is any good or not. ZERO.

Also, your interest in citation and understanding statistics is admirable, but I suspect you are in a very, very small minority. For instance, Bill James (who I admire greatly) largely rejects formal studies for more ad hoc ones and has admitted to not reading much of the sabermetric work currently being done.

Without a structure of formal peer review, how can one even begin to 1) wade through the work being done and 2) assess its validity? Some of it seems to stand the test of time, but within academic publishing, we know what the rejection rates are for journals, we know how many citations each article and journal gets. A researcher can stand on the shoulders of others. In sabermetrics, the only really safe approach is to continually reinvent the wheel.

All that said, in the matter at hand, J.C.'s paper is about the economics of pitcher valuation, and I feel he fully cited the work that inspired the first section of the paper.
   44. Phil Birnbaum Posted: October 04, 2006 at 05:22 PM (#2197381)
Hi, Ignorantio,

Nope, no condescension is intended at all. Sorry if my post came out that way. I am equally opposed to credentialism both ways. :)

My experience with Ph.D.s has been no less positive than my experience with non-Ph.D.s.

I have never found academics snooty to us because we don't have degrees. My beef is that academia is snooty towards (or wary of) citing our work because they don't see it as legitimate enough.

Phil
   45. cercopithecus aethiops Posted: October 04, 2006 at 05:22 PM (#2197382)
In academics, the paper is not published.

Actually, it's usually published in modified form in a less prestigious journal where fewer people will read it.
   46. Sean Forman Posted: October 04, 2006 at 05:33 PM (#2197408)
Sabermetrics is not mathematics or statistics, any more than economics is. The value of a sabermetrics or economics result should not be judged by the complexity of the underlying math. (This can also be my response to the first part of comment #21).


Perhaps this is the crux of the issue. There is no sabermetrics field in academics. No one would be hired listing sabermetrics as their field of study. The academics aren't publishing sabermetrics so much as they are publishing math, stats and economics. If they wanted to publish sabermetrics they would choose another outlet like BTN or a website.

One of my papers is on Automated Congressional Redistricting. But it is a math/CS paper and not a political science paper. I didn't provide an overview of the political means of redistricting because it wasn't of interest to my audience and wasn't relevant. I gave an introduction and focused on the math I used to solve the problem.


Why does the title matter more than the content? Up until the middle of page 8, the paper is pure sabermetrics.


Because the title and abstract tell you why the paper is important. Many, many academic papers spend 3/4ths of their content tying up loose ends before hitting you with the primary result. My favorites are the math papers with lemmas that have proofs of six pages each and then state the primary theorem of interest with a proof of "follows from lemmas 1 and 2." It happens all of the time.

If he had decided to evaluate the labor question using Linear Weights instead of DIPS, and he spent eight pages deriving the linear weights formula without mentioning Pete Palmer, would you make the same argument?


He mentioned and cited Voros! Voros:DIPS::Pete Palmer:Linear Weights, while Tango:DIPS::Patriot:Linear Weights.

You would absolutely have a legitimate beef if he had not mentioned or undersold Voros's work. That didn't happen.
   47. Mike Emeigh Posted: October 04, 2006 at 05:49 PM (#2197443)
As a result it's difficult to get a decent sample size of pitchers who are bad at hit prevention pitching at the major league level- to see if they are really bad at hit prevention - or just unlucky.


This is a good summary of the issue. The available evidence (as Clay Davenport noted WRT minor league pitchers, and as I've noted WRT low BIP pitchers as a group) suggests that "weeding out" based on hit prevention skills does occur, but it doesn't really prove that MLB teams are good at doing it, nor does it mean that we should "expect" a more or less homogeneous population when we look at pitchers who survive long enough to pitch multiple years in the majors.

I think that anyone who has looked at the subject in more than a superficial way agrees with Phil's comment in #28. The only limiting factor that I'm trying to put on the data set is a caution that a *high* BABIP in any given season might be an indicator that the pitcher is not of MLB quality, rather than that the pitcher was *unlucky*.

Somewhere on the site before the season started, I did a mild prediction, selecting some pitchers who I thought were likely to improve over 2005 and some that I expected to decline over 2005. I remember that Mussina was in the "improve" list (which he did, and which I feel pretty good about) and Harden was in the "decline" list (which he did, sort of, but only because he was hurt - I can't really claim credit for that). I can't find the thread in which I did that now, and those were the only two pitchers I remember.

-- MWE
   48. Phil Birnbaum Posted: October 04, 2006 at 06:01 PM (#2197470)
Sean (#46),

>There is no sabermetrics field in academics.

I agree with you that perhaps that's part of the problem. I would have no objection to sabermetrics becoming affiliated with economics departments in some way. But it really is its own field of study.

>One of my papers is on Automated Congressional Redistricting. But it is a math/CS paper and not a political science paper. I didn't provide an overview of the political means of redistricting because it wasn't of interest to my audience and wasn't relevant. I gave an introduction and focused on the math I used to solve the problem.

Fair enough. In the latest JQAS, the first paper is actually a theoretical mathematical paper, deducing properties of tournaments structured in different ways. It's of interest to sabermetricians, perhaps, but it truly is math. However, most of sabermetrics is NOT math. This paper is an exception. Yours sounds like a math/CS paper on a technique that might be of interest to political types. Same idea, it's a math paper, and perfectly suited to a math or CS journal.

J.C.'s paper isn't a math paper. It's a baseball paper, analyzing real baseball data to learn the relationship between events on the field. So it's sabermetrics rather than economics. That first half, anyway.

>My favorites are the math papers with lemmas that have proofs of six pages each and then state the primary theorem of interest with a proof of "follows from lemmas 1 and 2." It happens all of the time.

And if those Lemmas have been addressed before, the author should cite the literature! It doesn't matter why the paper is important, if, in any part of it, you duplicate previous research, you should cite it.

>He mentioned and cited Voros! ... You would absolutely have a legitimate beef if he had not mentioned or undersold Voros's work.

Point taken, bad analogy. I'll rephrase. What if he had spent eight pages *proving that linear weights was correct,* even though many others have already done so?
   49. Sean Forman Posted: October 04, 2006 at 06:37 PM (#2197533)
Point taken, bad analogy. I'll rephrase. What if he had spent eight pages *proving that linear weights was correct,* even though many others have already done so?


It depends, if he used the same methods as they had, and the methods had been widely discussed, then I think they should be mentioned, but my impression is that J.C.'s method was different than others before it.

Phil, are you aware that when Tom Tippett presented his paper at SABR in 2003, he was not aware that Voros had revamped the formulas into a version 2.0 in January of 2002? If Tippett wasn't able to keep up with the changes in the very specific area he was commenting on, why are you holding academics to a higher standard?

J.C.'s paper isn't a math paper. It's a baseball paper, analyzing real baseball data to learn the relationship between events on the field. So it's sabermetrics rather than economics. That first half, anyway.


We are going to have to just agree to disagree here.
   50. Kyle S Posted: October 04, 2006 at 06:45 PM (#2197553)
What if he had spent eight pages *proving that linear weights was correct,* even though many others have already done so?

Because the others that had done so had never done so in a peer-reviewed journal, or used rigorous econometric techniques? Isn't that pretty much what JC says in the paper? I'm sure that he would have rather just cited Tippett or Voros and moved on to the meat and potatoes, but his referees were unlikely to accept that.

By the way, check out this paper, mentioned on BBRef a long time ago. In the previous version of this paper, Hakes and Sauer derived their own formulation of WPA (based off of PGP) and cited PGP's creators but not Studes or Fangraphs (it looks like the current version of this paper relegates most of the play-by-play model discussion to footnote 5).
   51. Sean Forman Posted: October 04, 2006 at 07:02 PM (#2197595)
Kyle,

WPA predates Studes and FanGraphs by a long ways.

Drinen, Doug, "Big Bad Baseball Annual"

Cook, Earnshaw, "Percentage Baseball"

What is PGP?
   52. Phil Birnbaum Posted: October 04, 2006 at 07:02 PM (#2197597)
#43 (sean),

>Do you really believe that the standards are the same? Do the webpages come down if something is discredited?

The easy answer is, do the journals rip out pages if a peer-reviewed article is discredited? There's a fair number of academic articles that, if I had been a peer reviewer, I would have recommended rejecting. (See my blog for some.) It works both ways.

The hard answer is, it's not a matter of just reading websites. It's keeping up with the informal ebb and flow of discovery that takes place in sabermetrics to be able to decide what's valid and what's not. As I wrote -- it's harder than looking in journals, but when almost all the knowledge is "stored" this way, that's just what ya gotta do. As best you can, anyway.

>Without peer review, judging quality in sabermetrics is largely a crapshoot. Look at PAP. We have zero idea if PAP is any good or not. ZERO.

Ok, sure. But for DIPS, we have MUCH idea if DIPS is any good or not, and when and why and how. If what you say about PAP is right, and I have no reason to believe otherwise, then in an academic paper, you'd just say "it has been the subject of much discussion but weak and contradictory evidence, such as (a sentence or two)." Isn't this just common sense?

>Also, your interest in citation and understanding statistics is admirable, but I suspect you are in a very, very small minority. For instance, Bill James (who I admire greatly) largely rejects formal studies for more ad hoc ones and has admitted to not reading much of the sabermetric work currently being done.

Actually, simple ignorance of other research isn't all that horrible, and actually isn't the topic here. But I'd say that if Bill James duplicates a well-known piece of research THAT HE KNOWS ABOUT, to an audience that DOESN'T KNOW MUCH SABERMETRICS, and doesn't credit others, he'd be subject to the same kind of criticism. Don't you agree?

>Without a structure of formal peer review, how can one even begin to 1) wade through the work being done and 2) assess its validity?

WTF??? Do you doubt the validity of everything Bill James ever wrote? Come on, Sean, this is ridiculous. EVERYTHING in sabermetrics has been done without "a structure of formal peer review." And you somehow know what is valid, what is not, and what is iffy. And on the topic of DIPS, so does J.C.

>Some of it seems to stand the test of time, but within academic publishing, we know what the rejection rates are for journals, we know how many citations each article and journal gets. A researcher can stand on the shoulders of others. In sabermetrics, the only really safe approach is to continually reinvent the wheel.

Huh? Are you kidding me? Sabermetrics has made huge strides, from zero, by standing on the shoulders of others. Am I missing something? Am I on another planet today, Sean?
   53. Phil Birnbaum Posted: October 04, 2006 at 07:12 PM (#2197614)
Sean (#49),

>Phil, are you aware that when Tom Tippett presented his paper at SABR in 2003, he was not aware that Voros had revamped the formulas into a version 2.0 in January of 2002? If Tippett wasn't able to keep up with the changes in the very specific area he was commenting on, why are you holding academics to a higher standard?

My argument was not about inadvertently being unaware of a piece of research. My argument was about being aware, doing a study on the same subject, and choosing not to mention ANY of the relevant prior work.

So you're changing the subject. But I'll answer anyway. :)

Sure, it was regrettable that Tom didn't know specifically about Voros' updates. I am not arguing that you have to know about EVERYTHING, and know it immediately. I'm saying you have to be aware, broadly, of the state of the art, by doing a reasonable amount of checking at the places where sabermetrics happens (books, websites, BTN, and so forth). All I'm saying is that "there's too much stuff on the web" is not an excuse for not checking the literature.

If J.C. had quoted Tango but not Tippett, if he had quoted Emeigh but didn't know about Lichtman, well, that's OK, and I wouldn't have complained at all. Do a bit of research, then cite what you know and what's relevant. But "I'm not citing anything because there's no peer review" doesn't cut it.
   54. Sean Forman Posted: October 04, 2006 at 07:14 PM (#2197618)
Without a structure of formal peer review, how can one even begin to 1) wade through the work being done and 2) assess its validity?


Phil, I meant this comment in reference to single pieces of work not bodies of work. If I come across work by someone I don't know or someone who doesn't really explain their work that well or use standardized techniques, what can we safely say about that work?

You are also taking the attitude that others are as up on this as you are. You are the editor of the only journal on sabermetrics. Almost by definition, you know more about what is going on than everyone else. I'm pretty into this and I would say, I can only keep up with maybe 25-35% of what is going on in sabermetrics.
   55. Kyle S Posted: October 04, 2006 at 07:17 PM (#2197625)
Sean, yes, I know. My point was that the paper didn't cite the most recent developments in WPA-type statistics.

Player Game Percentages (PGP)
   56. Sean Forman Posted: October 04, 2006 at 07:22 PM (#2197641)
Phil, he cited Voros McCracken. And we'll just have to disagree as to whether that is sufficient or not.
   57. Phil Birnbaum Posted: October 04, 2006 at 07:55 PM (#2197730)
Sean (#54),

OK, sorry, I understand now.

I'd say if you haven't heard any more about the study in a reasonable period of time, you can safely not worry about it. Look, as I said, I argue that you don't have to keep up with *everything*, just the important stuff. And you and I both do that pretty well. I'm probably with you on the 25% to 35%. But both of us are *aware* of more than we actually keep up on. I don't know about what's going on in the PAP debate, but I know there is one. And so if I were about to start a study on tired pitchers, I'd make a point of learning a bit more.

But again, my argument isn't about single studies that are unknown to authors. It's about a body of work that was well known to the author.
   58. Phil Birnbaum Posted: October 04, 2006 at 08:01 PM (#2197745)
bads85 (#42),

Thanks, that makes a lot of sense.
   59. Phil Birnbaum Posted: October 04, 2006 at 08:05 PM (#2197760)
Sean (#56),

>Phil, he cited Voros McCracken. And we'll just have to disagree as to whether that is sufficient or not.

Sure, let's leave it there. Thanks for the dialogue.

Phil
   60. GuyM Posted: October 04, 2006 at 09:17 PM (#2197999)
I think this discussion largely misses the point of WHY JCB fails to cite Tippett, Woolner, Davenport, et al. He is a true believer in strong DIPS. You can only maintain such a belief by focusing solely on year-to-year data. If you do that, BABIP fluctuates wildly and you're pretty much guaranteed that it won't predict much of anything (as he finds here with pitcher salaries). ALL of JCB's work on DIPS that I've seen looks at yearly data. You have to look at multi-year or career data to see the variations in BABIP skill (which do exist).

I don't care so much about the protocol of who should or shouldn't be cited in X type of academic article. I do care that JCB chooses to ignore or minimize work that contradicts his quasi-religious belief in DIPS.

The year-to-year focus also makes the labor market analysis here pretty questionable. If I'm reading it correctly, he uses year 1 performance data to predict year 2 salary. But players don't generally sign a series of one-year contracts. So even if teams DID overestimate the importance of BABIP/ERA, you'd have to look at a pitcher's performance in his pre-contract year (or two years) to find it. Not to mention that teams obviously take account of a pitcher's CAREER performance (including even minor lgs) to determine his value. So why in the world would anyone build a model to predict salary based only on one year of data? Well, one reason might be to avoid having to get into multi-year or career data that would then undercut the DIPS premise.

All this regression shows is that pitchers' actual talent has some correlation with their salaries. Big surprise! It would be really interesting to know whether teams overestimate the importance of ERA, wins, etc., but unfortunately this analysis can't answer that.
   61. Mike Emeigh Posted: October 04, 2006 at 09:31 PM (#2198033)
I think this discussion largely misses the point of WHY JCB fails to cite Tippett, Woolner, Davenport, et al. He is a true believer in strong DIPS. You can only maintain such a belief by focusing solely on year-to-year data. If you do that, BABIP fluctuates wildly and you're pretty much guaranteed that it won't predict much of anything (as he finds here with pitcher salaries). ALL of JCB's work on DIPS that I've seen looks at yearly data. You have to look at multi-year or career data to see the variations in BABIP skill (which do exist).


I actually don't think that it matters very much, in the context of this paper, whether JC had that ulterior motive or not. There are good reasons to minimize the value of single-season performance in hit prevention (some of which Guy mentions) for major league pitchers with a track record, and the fact that JC found that major league teams do in fact seem to minimize that value isn't particularly surprising.
   62. Mike Emeigh Posted: October 04, 2006 at 11:48 PM (#2198448)
There are good reasons to minimize the value of single-season performance in hit prevention (some of which Guy mentions) for major league pitchers with a track record,


I should add "even if you DON'T accept strong DIPS".

-- MWE
   63. Gaelan Posted: October 05, 2006 at 12:55 AM (#2198507)
I don't care so much about the protocol of who should or shouldn't be cited in X type of academic article. I do care that JCB chooses to ignore or minimize work that contradicts his quasi-religious belief in DIPS.


Exactly. The religious belief in DIPS needs to be questioned and destroyed. It's an unsustainable idea that continues to be propagated by otherwise intelligent people.

We started with "pitchers have no control over balls in play," but then we discovered that in fact this wasn't true, because knucklers, relievers, power pitchers, Greg Maddux, bad pitchers, hurt pitchers, and minor league pitchers all have a significant effect on balls in play.

In the face of this evidence we have managed to move towards "most pitchers have little control" over balls in play. That's not progress, that's sophistry and it needs to stop.

There are good reasons to minimize the value of single-season performance in hit prevention (some of which Guy mentions) for major league pitchers with a track record,


I should add "even if you DON'T accept strong DIPS".


This is a very good point.
   64. J.C. Bradbury Posted: October 05, 2006 at 10:57 AM (#2198666)
Exactly. The religious belief in DIPS needs to be questioned and destroyed. It's an unsustainable idea that continues to be propagated by otherwise intelligent people.


Is this meant to be a joke?

1) I'm amazed at the ratio of critics to actual research on BTF and in the general sabermetric community. It reminds me of a camera club my friend joined, only to find a bunch of men talking about their fancy equipment, but none brought any pictures to the meeting. You'd think with all of the people who offer supposedly damning criticism, at least one would use his knowledge to prove it.

2) The studies that appear to show some control over BIP are methodologically flawed. The critique might be right, but no one has yet used the proper methods to examine the subject. I would do so, but I don't have the data. I have addressed this critique head on. See here. I don't see why GuyM thinks I'm avoiding this argument, since he's "argued" with me over it with all the maturity of a spoiled 8-year-old.

3) If you disagree, do your own damn study and stop busting my balls. The majority of posters here are great, but there is a strong negative correlation in this forum between those who criticize and those who have the ability to criticize.

4) Overall, I'm quite saddened by Phil's ridiculous charges against me. Here I am, writing a paper that's sticking it to academics for ignoring a particular sabermetric advance, and Phil gets all huffy about my paper, in a field where he doesn't understand the conventions, not having a lit review. I didn't do anything wrong, and he handled his concerns in an extremely unprofessional manner. Rather than mention it to me in person, despite the fact that we had an e-mail conversation the day before in which he didn't even bring up the subject, he chose to thump his chest on the Internet. In follow-up e-mail conversations he acts like it's no big deal, but in an Internet forum I'm the devil. How pathetic. I've lost all respect for that man. I hope he feels at least some embarrassment.
   65. JPWF13 Posted: October 05, 2006 at 11:14 AM (#2198667)
Is this meant to be a joke?


Oh no, the anti-DIPS cohort is quite serious
   66. Gaelan Posted: October 05, 2006 at 11:50 AM (#2198673)
If it is true that pitchers have no control over balls in play, it must be true at both the macro and the micro level. The difficulty is that all of the statistical tests are done at the season or career level while the game level is ignored. The problem is that at the game level it is clear that pitchers do have control over balls in play. Beyond the general ability that individual pitchers have, their own ability certainly varies over time. For instance, as they get tired they give up more hits. DIPS explicitly denies this kind of variance, attributing what is actually a product of genuine skill, or lack thereof, to the ever-nebulous luck. Over the course of a season, or several seasons, this kind of variation may be difficult to detect because of the amount of statistical noise, but that doesn't make it any less real.

So here are things we know:

1) We know that some pitchers have an ability to suppress hits on balls in play and that this ability follows a normal age curve. (the Greg Maddux rule)
2) We know that minor league pitchers allow more hits on balls in play than major league pitchers. (the Kyle Snyder rule)
3) We know that when pitchers are injured or recovering from an injury they allow more hits on balls in play. (the Curt Schilling rule)
4) We know that when formerly good pitchers are done they allow more hits on balls in play. (the Kevin Brown rule)
5) We know that when pitchers are tired they allow more hits on balls in play. (the Pedro Martinez rule)
6) We know that when pitchers who rely on command are less than perfect they allow more hits on balls in play. (the Josh Towers rule)
7) We know that pitchers pitch differently with men on base than with the bases empty and this affects hits on balls in play. (the Tom Glavine rule)

Ok, I don't actually "know" any of these things because I haven't done any "studies" to "prove" them. Nonetheless . . .

If you believe in DIPS you would be wrong about all of those pitchers and countless others like them. DIPS and DIPS related measurements only look good when they are compared to ERA which is a pretty weak test.

And this is without considering that before WWII strikeouts, walks and home runs were relatively low, and hence, according to DIPS, pitchers had very little control over run scoring, and yet good pitchers were able, somehow, to be consistently good.

DIPS is worse than useless because it explains away all variation as luck thus making it more difficult to understand the actual, underlying causes.
   67. DSG Posted: October 05, 2006 at 12:34 PM (#2198688)
2) The studies that appear to show some control over BIP are methodologically flawed. The critique might be right, but no one has yet used the proper methods to examine the subject. I would do so, but I don't have the data. I have addressed this critique head on. See here. I don't see why GuyM thinks I'm avoiding this argument, since he's "argued" with me over it with all the maturity of a spoiled 8-year-old.

3) If you disagree, do your own damn study and stop busting my balls. The majority of posters here are great, but their is a strong negative correlation between those who criticize and have the ability to criticize in this forum.

***

Alright, I'll write this up later, as I'm running late to class right now, but I did a little study just now. I took all pitcher seasons since 1921 and divided them into even and odd years. I then tallied up each pitcher's career totals for both halves, and looked only at pitchers with at least 1,000 BIP in both the even and odd years. There were 1,157 such pitchers. I then ran a regression in which BABIP in even years was the dependent variable and BABIP, K/BFP, BB/BFP, HR/BFP, and HBP/BFP were independent. The results were as follows:

Constant = .137 (p = .000)
BABIP = .532 (p = .000)
HR/G = -.002 (p = .267)
BB/G = -.002 (p = .004)
K/G = .000 (p = .542)
HBP/G = .009 (p = .009)

r = .557 (r^2 = .310)

*note: I used per game instead of per BFP notation. That simply means I multiplied those numbers by 38.5. So K/G = K/BFP*38.5.

So here's what we've got:

(a) Controlling for defensive independent variables, BABIP is still highly, highly, highly significant. It's more than half the equation.

(b) Unlike in JC's THT study, strikeouts and home runs are NOT significant. The reason JC found such a significant effect for K/9 is that, all other things being equal, K/9 is mechanically entangled with BABIP: the higher your BABIP, the more strikeouts you're going to get per 27 outs, because your fielders aren't making outs. You must use K/BFP, especially with a one-year sample when BABIP can vary a lot from pitcher to pitcher. For HR/9, it could be the same effect, or it could be because pitchers with high HR rates often have fluky HR/fly ball rates, so more of their fly balls stay in the park the next season, and fly balls are converted into outs a very high percentage of the time.

(c) HBP and BB are significant but in opposite directions (more hit batsmen means a higher BABIP, more walks means a lower BABIP). This one I need to think about for a moment.

JC, any comments?
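A minimal sketch of the even/odd split DSG describes, assuming a Lahman-style Pitching table; the file name, column names, and data-cleaning choices are illustrative guesses, not DSG's actual code:

import pandas as pd
import statsmodels.api as sm

# Lahman-style pitching table: one row per pitcher-season, with
# columns playerID, yearID, H, HR, BB, SO, HBP, BFP (names assumed)
pit = pd.read_csv("Pitching.csv")
pit = pit[pit["yearID"] >= 1921].dropna(subset=["BFP", "HBP"])
pit["BIP"] = pit["BFP"] - pit["SO"] - pit["BB"] - pit["HR"] - pit["HBP"]
pit["half"] = pit["yearID"].mod(2).map({0: "even", 1: "odd"})

# Career totals within each half, keeping pitchers with 1,000+ BIP in both
tot = pit.groupby(["playerID", "half"])[["H", "HR", "BB", "SO", "HBP", "BFP", "BIP"]].sum()
wide = tot.unstack("half").dropna()
wide = wide[(wide[("BIP", "even")] >= 1000) & (wide[("BIP", "odd")] >= 1000)]

def babip(h, hr, bip):
    return (h - hr) / bip  # hits on balls in play, per DSG's BIP definition

# Even-year BABIP regressed on odd-year BABIP and odd-year "per game"
# rates (per-BFP rates scaled by 38.5, as in DSG's note)
y = babip(wide[("H", "even")], wide[("HR", "even")], wide[("BIP", "even")])
X = pd.DataFrame({
    "BABIP": babip(wide[("H", "odd")], wide[("HR", "odd")], wide[("BIP", "odd")]),
    "HR_G": 38.5 * wide[("HR", "odd")] / wide[("BFP", "odd")],
    "BB_G": 38.5 * wide[("BB", "odd")] / wide[("BFP", "odd")],
    "K_G": 38.5 * wide[("SO", "odd")] / wide[("BFP", "odd")],
    "HBP_G": 38.5 * wide[("HBP", "odd")] / wide[("BFP", "odd")],
})
print(sm.OLS(y, sm.add_constant(X)).fit().summary())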
   68. J.C. Bradbury Posted: October 05, 2006 at 12:54 PM (#2198700)
David,

I like what you've done, and it is far better than most of the stuff that's out there. And at least you're willing to share your results. There are a few things that need to be fixed. You didn't correct for autocorrelation (very important). Also, you didn't control for a lot of other important factors (e.g., defense, age, league, parks, etc.). For a study this long, I'd want some sort of year correction.

When I say do a study, I don't mean post some results on BTF. This is a horrible forum in which to do research. It's good for other things, but not for a collaborative research effort. Remember the Simpsons episode where the whole family tries to work a Rubik's Cube together? That's what doing a study on BTF is like.

Sit down, think about it, work through the data over several weeks, then write up an article explaining the full methodology and possible problems. Then present it as a final product to be discussed.
   69. GuyM Posted: October 05, 2006 at 02:44 PM (#2198803)
The studies that appear to show some control over BIP are methodologically flawed. The critique might be right, but no one has yet used the proper methods to examine the subject. I would do so, but I don't have the data. I have addressed this critique head on. See here. I don't see why GuyM thinks I'm avoiding this argument, since he's "argued" with me over it with all the maturity of a spoiled 8-year-old.

Anyone following this debate should click on JCB's first link and see if they think he has addressed the critique "head-on." He rejects all these studies out of hand because, essentially, they fail to control for pitchers' strikeout rate. Yet JCB's own work found only a weak connection between BABIP and the prior year's K/9, and no significant relationship with current K/9. (And work by DSG, Tango and others suggests there may actually be no relationship at all.) So he knows that controlling for Ks could not possibly change the conclusions these studies reach about real talent differences. Yet he uses this (at most) minor problem as a pretext to dismiss all this fine work. I'll stand by this assessment: "JCB chooses to ignore or minimize work that contradicts" his belief in DIPS.

As for the "all the maturity of a spoiled 8-year-old" comment. I'll let others reach their own conclusions. Look at my comment on JCB's first linked piece and his reply, as well as past BTF threads, and decide for yourself if his assessment is on the mark or, ummm, a bit of projection.

* * *

DSG: great work. Controlling for league will be crucial, at least post-1973. Age is not necessary, of course, because of your even/odd year methodology. If controlling for park and team DER is beyond what you can do now, looking at post-1980 data only would be interesting -- pitchers change teams often enough now that if your results were still this robust, it would be a pretty good indication of BABIP's multi-year predictive power.
   70. J.C. Bradbury Posted: October 05, 2006 at 02:54 PM (#2198815)
I don't "ignore or minimize." I reject it, and not out of hand, because of a serious flaw.
   71. 185/456(GGC) Posted: October 05, 2006 at 04:03 PM (#2198889)
MWE did a critique of DIPS at the SABR conference in Toronto. I thought that it was on BTF somewhere but I couldn't find it.

I'm staring at DSG's post and I feel dense.

BABIP is still highly, highly, highly significant. It's more than half the equation...
Unlike in JC's THT study, strikeouts and home runs are NOT significant.


I think that he posted this quickly, but I'm not sure what he means. Are these points in favor of using K,BB, and HR stats while ignoring batting average? Or is it the other way around?
   72. 185/456(GGC) Posted: October 05, 2006 at 04:13 PM (#2198907)
this guy's complaints about the article are not nearly as interesting as the article, which claims that front offices were correctly evaluating pitchers on the results of DIPS before Voros's article came out in 2001, and that apart from some blips during expansion, the market always seemed to figure this out... that I find interesting.


I read the Hakes-Sauer paper that JC referenced. I could only find one mention of Voros in it and nothing about DIPS.
   73. Dan Turkenkopf Posted: October 05, 2006 at 04:14 PM (#2198909)
I think that he posted this quickly, but I'm not sure what he means. Are these points in favor of using K,BB, and HR stats while ignoring batting average? Or is it the other way around?


Not that I'm any more certain than you are, but I believe he's saying you can predict even-year BABIP (BABIP2) with the following formula:

BABIP2 = .137 + .532BABIP1 - .002HR/G - .002BB/G + .009HBP/G

So BABIP in year 1 is the most important factor in determining BABIP in year 2, HR/G and BB/G have a slight negative impact and K/G has no impact whatsoever.

Or I'm completely wrong.
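If Dan's reading is right, plugging made-up numbers into the fitted equation shows what it does in practice; the inputs below are purely illustrative, not from DSG's data:

# Dan's reading of DSG's fitted model (K/G dropped; its coefficient was .000)
def predicted_babip(babip1, hr_g, bb_g, hbp_g):
    return 0.137 + 0.532 * babip1 - 0.002 * hr_g - 0.002 * bb_g + 0.009 * hbp_g

# e.g., a pitcher with a .300 odd-year BABIP, 1.0 HR, 3.0 BB and 0.3 HBP per 38.5 BFP
print(round(predicted_babip(0.300, 1.0, 3.0, 0.3), 3))  # 0.291

Note that the prediction keeps only about half of the pitcher's deviation from the mean BABIP implied by the equation, which is the practical content of that .532 coefficient.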
   74. 185/456(GGC) Posted: October 05, 2006 at 04:31 PM (#2198939)
How you read the formula makes sense, Dan. I'm not sure what the "p"s stand for. Are they probability? I have some misgivings about DIPS due to the Tom Glavine rule, for one (we "know" that pitchers pitch differently with men on base than with the bases empty and this affects hits on balls in play), but I thought that strikeout pitchers induce more infield popups, which are 99 44/100% of the time converted into outs.
   75. cercopithecus aethiops Posted: October 05, 2006 at 04:41 PM (#2198951)
If it is true that pitchers have no control over balls in play it must be true at both the macro and the micro level.

...

DIPS is worse than useless because it explains away all variation as luck thus making it more difficult to understand the actual, underlying causes.


You know, as long as you keep arguing against positions that no one holds, you'll almost certainly win all of your arguments. Does that make you feel smart or something?
   76. Dan Turkenkopf Posted: October 05, 2006 at 04:41 PM (#2198952)
I'm not sure what the "p"'s stand for. Are they probability?


Yeah, I'm not sure about that either.

I guess we'll have to wait for more statistically-minded readers to help :-)
   77. greenback does not like sand Posted: October 05, 2006 at 04:45 PM (#2198959)
p is a measure of significance. The lower, the better.

If you believe in DIPS you would be wrong about all of those pitchers and countless others like them.

All models are wrong. Some are useful.
   78. DSG Posted: October 05, 2006 at 04:48 PM (#2198965)
Yep, "p" means probability. Specifically, it's the probability that the coefficient is insignificant. Generally, a p-value has to be no higher than .05 to be considered significant. So Dan got it right, though the p-values show that we probably shouldn't be including any coefficient for home runs.

Essentially, what my test showed is that a pitcher's BABIP is extremely important, even controlling for defensive independent outcomes (which are not very important at all). I'll take JC's suggestions into account and publish something more substantial in a couple of weeks.
   79. GuyM Posted: October 05, 2006 at 04:51 PM (#2198966)
I think Dan's interpretation of the model is correct, but David didn't make it clear if all the indep variables are odd years (I think so). The high coefficient for BABIP 1 almost certainly does reflect league differences (DH) and year differences (league BABIP has varied over time). On the other hand, the relationship is so strong that it would be shocking if it wasn't still very significant even after controlling for such factors.
   80. Dan Turkenkopf Posted: October 05, 2006 at 04:52 PM (#2198969)
p is a measure of significance. The lower, the better.


Thanks. So it's probably not right to say K/G has no effect on BABIP because the p value was .5, but can we be fairly certain its effect is low in this scenario?

(BTW, I agree with JC's concern about the correlation between the defense-independent stats that may be skewing this)
   81. Kyle S Posted: October 05, 2006 at 04:52 PM (#2198970)
I'm not sure what the "p"'s stand for. Are they probability?

Those are p values: in other words, the probability that an estimate that far from zero would show up even if the true coefficient were zero. A very low p value means the OLS estimate is very unlikely to be just the result of noise, whereas a higher p value means the estimate isn't as reliable.

In David's example, there is a 26.7% chance that an HR/G estimate that size would appear even if the true effect were zero, but practically zero chance for the BABIP or constant estimates.

JC, how would he correct for serial correlation? Use BABIP - league average BABIP rather than simply BABIP?

Also, David, what formula do you use (assuming you're using the Lahman database) to estimate AB for pitchers? BFP - BB - HBP?
   82. Foghorn Leghorn Posted: October 05, 2006 at 04:56 PM (#2198977)
So BABIP in year 1 is the most important factor in determining BABIP in year 2, HR/G and BB/G have a slight negative impact and K/G has no impact whatsoever.

Shouldn't this be controlled for league? I would suspect that BABIP1 is less important than LgBABIP2 (or 1).

One of the general rules I have for projections is to look at a batter's IsoBB and ISO. His BA will fluctuate more than these other two factors. So I think you'd regress and find some correlation indicating that regression to LgBABIP for pitchers would be more effective than the previous year's BABIP.

Does that make sense?
   83. JPWF13 Posted: October 05, 2006 at 04:56 PM (#2198978)
All models are wrong. Some are useful.


On one blog I swear I could hear Tango imploding when I tried to make the same argument...
   84. DSG Posted: October 05, 2006 at 05:05 PM (#2198990)
Also, David, what formula do you use (assuming you're using the lahman database) to estimate AB for pitchers? BFP - BB - HBP ?

BIP = BFP - SO - BB - HR - HBP

Thanks. So it's probably not right to say K/G has no effect on BABIP because the p value was .5, but can we be fairly certain its effect is low in this scenario?

No. If the p-value is higher than .05 (generally), which it is, we go with the null hypothesis (Ks have no effect on BABIP).

(BTW, I agree with JC's concern about the correlation between the defense-independent stats that may be skewing this)

I'll certainly re-do the test more correctly, but how much difference will it make, really? I mean, it's not like the BABIP coefficient was barely significant. The t-value was 20.947, which means that there is roughly a one-in-infinity chance of it being insignificant.
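"One in infinity" is hyperbole, but only barely. A quick check of the p-value implied by a t-statistic that size; the degrees of freedom are assumed from the 1,157-pitcher sample, not taken from DSG's output:

from scipy import stats

# Two-sided p-value for t = 20.947, df = sample size minus the six
# estimated parameters (an assumption about DSG's regression)
t, df = 20.947, 1157 - 6
print(2 * stats.t.sf(t, df))  # astronomically small - effectively zero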
   85. Rally Posted: October 05, 2006 at 05:23 PM (#2199037)
...BABIP, K/BFP, BB/BFP, HR/BFP, and HBP/BFP were independent...

BABIP2 = .137 + .532BABIP1 - .002HR/G - .002BB/G + .009HBP/G


I'm confused: in this equation are we looking at HR per 9 innings or HR per BFP?

Either way it's not going to have much effect on batting average, but one is a bit bigger than the other.

I would suggest using team BABIP and (pitcher BABIP - team BABIP) as separate variables.

Using team BABIP controls for park, fielders, and league in one fell swoop. Only thing that throws it off is if you're a great pitcher on a team of great pitchers.
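A sketch of what Rally's two-variable setup might look like in code, against a per-season pitcher table; the file and column names here are hypothetical:

import pandas as pd

# Hypothetical per-season table with columns teamID, yearID,
# hits_in_play (H - HR), and BIP for each pitcher
df = pd.read_csv("pitcher_seasons.csv")

# Team BABIP (BIP-weighted) absorbs park, fielders, and league;
# the pitcher-minus-team term isolates the pitcher himself
grp = df.groupby(["teamID", "yearID"])
df["team_babip"] = grp["hits_in_play"].transform("sum") / grp["BIP"].transform("sum")
df["pitcher_babip"] = df["hits_in_play"] / df["BIP"]
df["babip_vs_team"] = df["pitcher_babip"] - df["team_babip"]
# ...then enter team_babip and babip_vs_team as separate regressors.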
   86. 185/456(GGC) Posted: October 05, 2006 at 05:28 PM (#2199055)
I'm confused, in this equation are we looking HR per 9 innings or HR per BFP?


DSG defined a game as 38.5 BFP instead of 9 innings.
   87. Dan Turkenkopf Posted: October 05, 2006 at 05:32 PM (#2199068)
I'll certainly re-do the test more correctly, but by how much, really? I mean, it's not like the BABIP coefficient was barely significant. The t-value was 20.947, which means that there is a roughly one in infinity chance of it being insignificant.


We're way out of my statistical expertise here, but I thought that regression assumed independence of the component variables and having them be correlated caused major problems. Is that something a high level of significance on one of them can overcome?
   88. JPWF13 Posted: October 05, 2006 at 05:33 PM (#2199082)
So BABIP in year 1 is the most important factor in determining BABIP in year 2, HR/G and BB/G have a slight negative impact and K/G has no impact whatsoever.


Which correlates better with BABIP in year 2: BABIP in year 1, or the league average BABIP?
Which correlates better with BABIP in year 2: BABIP in year 1, or the team average BABIP?

I would expect BABIP in year 1 to correlate (better than HR/G, K/9, etc.) with BABIP in year 2 whether I believed in strong DIPS, weak DIPS or no DIPS.

If nothing else, K and BB rates can swing more than batting averages. How often do you see a pitcher's BABIP move 100 points (30-40%), say from .250 to .350? Meanwhile HR, K and BB rates can fluctuate much more wildly (over 100%). Even if pitchers had no inherent ability to prevent hits on balls in play, I'd expect BABIP from one year to the next to show a stronger positive correlation.
   89. DSG Posted: October 05, 2006 at 05:43 PM (#2199108)
We're way out of my statistical expertise here, but I thought that regression assumed independence of the component variables and having them be correlated caused major problems. Is that something a high level of significance on one of them can overcome?

Variables are never totally independent. These happen to be relatively independent, which means that there shouldn't be much of a problem. But even if autocorrelation were a serious problem, it's doubtful that controlling for it would bring down the significance level by, essentially, a factor of infinity.
   90. Gaelan Posted: October 06, 2006 at 12:30 AM (#2199854)
You know, as long as you keep arguing against positions that no one holds, you'll almost certainly win all of your arguments. Does that make you feel smart or something?


Almost every day that I read about baseball on the internet, I find someone propagating exactly what I'm arguing against.

All models are wrong. Some are useful.


The problem with DIPS is that it's not sold as a model. It's sold as a truth. Even the watered down version is sold as "most pitchers have little control" over balls in play. It is this statement, which is almost certainly false, that I'm arguing against.

Now whether the model is useful is a matter of perspective. If I was going to spend ten minutes drafting a fantasy team then a DIPS model would be quite a useful thing. If, on the other hand, I was going to make a multi-million dollar decision then I might want to hold my models to a higher standard of usefulness.

And don't start with the "nobody here is a general manager" routine. I know we aren't. But people around here are pretty serious about baseball and they spend a fair amount of time thinking and talking about it. So if we're going to spend the time and we're going to be serious then we might as well be right, instead of spreading falsehoods throughout the internet like a virus.
   91. 185/456(GGC) Posted: October 06, 2006 at 01:07 PM (#2200395)
If I was going to spend ten minutes drafting a fantasy team then a DIPS model would be quite a useful thing. If, on the other hand, I was going to make a multi-million dollar decision then I might want to hold my models to a higher standard of usefulness.


That is fairly similar to my attitude. All things being equal, the guy with the better FIP ERA may be the one to sign, but medical issues and CBW-like mechanical analysis need to be considered, too.
   92. . . . . . . Posted: October 06, 2006 at 01:17 PM (#2200405)
Take all pitchers from 1994-2005 who threw more than 50 IP.

Calculate BIPA for each pitcher.

Use either the Basic Pitch Count Estimator or the Extended Pitch Count Estimator. Calculate pitches thrown.

Now, sort by "pitches thrown". Make a 9-point moving average of BIPA for each pitcher. (Except, obviously, for the first 4 and last 4.)

Make a scatterplot of the Moving Average of BIPA against the Pitches Thrown.

This is what pops out:
http://img142.imageshack.us/img142/3840/bipaplotij7.jpg


Note:
I used Lahman data, with pitch-count estimator formulas from Tango's website
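A sketch of that procedure, assuming Lahman column names. The pitch formula below (3.3 per PA, 1.5 per strikeout, 2.2 per walk) is my understanding of Tango's Basic Pitch Count Estimator, so check it against his site before leaning on the output:

import pandas as pd
import matplotlib.pyplot as plt

pit = pd.read_csv("Pitching.csv")
pit = pit[(pit["yearID"].between(1994, 2005)) & (pit["IPouts"] > 150)]  # >50 IP
pit["BIP"] = pit["BFP"] - pit["SO"] - pit["BB"] - pit["HR"] - pit["HBP"]
pit["BIPA"] = (pit["H"] - pit["HR"]) / pit["BIP"]

# Basic Pitch Count Estimator (assumed form: 3.3*PA + 1.5*SO + 2.2*BB)
pit["pitches"] = 3.3 * pit["BFP"] + 1.5 * pit["SO"] + 2.2 * pit["BB"]

# Sort by pitches thrown, then take a 9-point centered moving average of
# BIPA (the first and last 4 points come out NaN, as in the description)
pit = pit.sort_values("pitches")
pit["BIPA_ma9"] = pit["BIPA"].rolling(9, center=True).mean()

pit.plot.scatter(x="pitches", y="BIPA_ma9", s=4)
plt.show()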
   93. cercopithecus aethiops Posted: October 06, 2006 at 01:53 PM (#2200438)
Almost everyday I read about baseball on the internet I find someone propagating exactly what I'm arguing against.

Where? Not in this thread. Maybe you're reading the same archived pages over and over again. Seriously. More likely, you're just so worked up about this that you overinterpret every mention of DIPS as "propagating" a much stronger version of it than anyone ever believed. The simple fact that you would use the phrase "pitchers have no control over balls in play" proves my point. If I might go TOLAXOR for a moment, NO ONE EVER REALLY BELIEVED THIS. If that statement were literally true, there would be no such thing as ground ball pitchers and fly ball pitchers. Whether you care to admit it or not, "most major league pitchers have little control over the outcome of balls in play" is a very different statement than the one you keep attacking. Now, I'll grant you that there are still problems with even the formulation as I just stated it, and the real point of this kind of analysis should be to refine and improve our ability to prospectively evaluate pitchers (i.e., predict future performance) rather than quibbling over semantics. But still, you should stop arguing against that strawman; it really doesn't help your side.
   94. Mike Emeigh Posted: October 07, 2006 at 01:08 AM (#2201435)
I found my earlier comment:

For ‘05, I’ll take Mike Mussina, Jason Johnson, Mark Mulder, and Brad Penny as candidates to improve, and Tony Armas, Horacio Ramirez, Bruce Chen, Kenny Rogers, and Rich Harden as candidates to decline.


Among the candidates to improve:

Mussina chopped nearly a run off his ERA, and his ERA+ went from 101 to 125. Penny's ERA went up, but the league ERA went up as well, so his ERA+ actually improved slightly, from 104 to 106. I'd take that as "virtually the same pitcher" rather than as improvement. Jason Johnson pitched himself out of the majors. Mark Mulder was showing improvement for nearly the first two months (3.74 ERA through May 22), but then he got hurt. One clear-cut right pick, one clear-cut wrong pick, one borderline pick (at best), one "who knows what would have happened if he'd stayed healthy?"

Among the candidates to decline, three guys got hurt (Armas, Ramirez, and Harden). Armas and Ramirez were somewhat better in '06 than in '05 when they were healthy; Harden's season was significantly worse, but in fairness he was close to the same pitcher in '06 when he finally did get healthy. Bruce Chen crashed and burned. Rogers's ERA actually did decline, as did his ERA+. One clear-cut right pick, one borderline correct (masked by the team success), and three "who knows what would have happened had they been healthy all year?" Not such a great track record, except at picking pitchers who were likely to be injured.

-- MWE
   95. Mike Emeigh Posted: October 07, 2006 at 02:29 AM (#2201664)
2) The studies that appear to show some control over BIP are methodologically flawed.


So are studies that conclude there is little to no control, and that select from a narrow group of major-league pitchers - those who pitch "x" number of innings in back-to-back years - because those studies do not account for the impact of selection bias on the results.

From JC's paper:

To obtain an adequate sample I include all pitchers who pitched more than 100 innings for consecutive seasons from 1980–2004.


That eliminates from the sample all pitchers who failed to pitch 100 innings in season 1, AND all pitchers who failed to pitch 100 innings in season 2. There's a significant amount of selection bias right there; only the pitchers who are "most" able to make it through two such seasons in a row even qualify for the study.

All told, in 2004 and 2005 combined, there were 776 different pitchers. 462 of them pitched in both 2004 and 2005, so right off the bat the selection criteria eliminate about 40% of the total sample set of pitchers. Of the 462 pitchers who pitched in both 2004 and 2005, 142 of them pitched 100 or more innings in 2004 - so now the selection criteria have eliminated over 80% of the total sample set of pitchers, without even considering the pitchers who pitched 100 or more innings in 2005. Fortunately (whew!) 104 of the 142 who pitched 100 innings in 2004 also pitched 100 innings in 2005, so only another couple percent of pitchers were eliminated - but still, the total sample is only 14% of the whole group of major league pitchers. But note what else has happened:

All of the "really" bad pitchers are gone.
All of the relievers are gone
All of the guys who were regular starters when they did pitch, but who didn't pitch a full season, either because of injury (Curt Schilling, 2005) or because they didn't crack the rotation until midway through a season (David Bush, 2004), are gone.

What's left are front-end-of-the-rotation pitchers - three pitchers per team (well, 3 1/2, actually). And baseball people will tell you that there's a lot of difference between front-end-of-the-rotation pitchers and other members of the pitching breed - in other words, that not every pitcher even gets a chance to be a front-end starter, and thus to qualify for JC's study group.

Suppose I were doing a study of the relationship between wealth and educational opportunity. If I were to draw my sample from Wakefield (a suburban country-club community here in Raleigh), and found that there was no correlation between wealth and educational opportunity, would you consider that to be a reliable conclusion? I can't imagine that you would, because you'd realize from the get-go that my sample was biased; it's awfully hard to find enough low-wealth families in Wakefield to balance such a sample.

The bias here is no less real; the pitchers who make it into studies like JC's are no more representative of the entire population of major league pitchers in any given season than the residents of Wakefield are representative of the entire population of families with school-age children. And not only is this sample biased - it's biased in such a way as to virtually guarantee that "if" hit prevention skills are a true indicator of pitcher ability, you would NOT be able to find them in your sample anyway - because you've selected only the "most" able pitchers in MLB. And when you look at "less able" pitchers - those who get weeded out of the study by the selection criteria - and compare them to the group of "able" pitchers that you have selected, you see a noticeable difference in their hit prevention skills. That's what Clay Davenport did, looking at minor league pitchers who advance through the minors vs. those who do not; that's what I did, looking at low-BIP pitchers vs. high-BIP pitchers within a single season. If you look at all pitchers - starters, relievers, major leaguers, minor leaguers - and not just the elite core of front-line starters, there's enough evidence that a pitcher has to have a significant ability to prevent hits on BIP in order to (a) get to the majors and then (b) pitch enough to become an elite starter; and because nearly all of the variation in hit prevention skills has been selected out of the study group by the selection criteria, JC's conclusion is both unsurprising and flawed.

-- MWE
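The funnel MWE walks through is easy to reproduce against a Lahman-style table; his 2004-05 counts are what the sketch below should print, give or take data-version differences:

import pandas as pd

pit = pd.read_csv("Pitching.csv")
p04 = pit[pit["yearID"] == 2004].groupby("playerID")["IPouts"].sum()
p05 = pit[pit["yearID"] == 2005].groupby("playerID")["IPouts"].sum()

everyone = p04.index.union(p05.index)        # 776 pitchers in MWE's count
both = p04.index.intersection(p05.index)     # 462 pitched in both years
ip100_04 = both[p04[both] >= 300]            # 142 with 100+ IP (300 outs) in 2004
qualified = ip100_04[p05[ip100_04] >= 300]   # 104 with 100+ IP in both

print(len(everyone), len(both), len(ip100_04), len(qualified))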
   96. Gaelan Posted: October 07, 2006 at 04:26 AM (#2201872)
The simple fact that you would use the phrase "pitchers have no control over balls in play" proves my point. If I might go TOLAXOR for a moment, NO ONE EVER REALLY BELIEVED THIS. If that statement were literally true, there would be no such thing as ground ball pitchers and fly ball pitchers. Whether you care to admit it or not, "most major league pitchers have little control over the outcome of balls in play" is a very different statement than the one you keep attacking.


It's a bald-faced lie that no one ever believed that, but I'll let that one slide because that's not the claim I'm arguing against. What I'm arguing against is exactly the claim you made, that "most pitchers have little control over the outcome of balls in play." In fact, let's look at what I actually wrote.

In the face of this evidence we have managed to move towards "most pitchers have little control" over balls in play. That's not progress, that's sophistry and it needs to stop.


So, in fact, you quoted me exactly in your own formulation, with the small, though I suppose important, addition of "outcome." So I'm not arguing against strawmen; I'm arguing against exactly what you say people really believe. My point is that this improved formulation is still wrong on a very significant scale. I then went on to categorize the many different ways that it is wrong. I even left out the groundball/flyball thing because defenders of DIPS don't think that matters, since it "evens out" in terms of run prevention.
   97. GuyM Posted: October 07, 2006 at 03:56 PM (#2202020)
MWE:
Great points, and analogy, on the DIPS issue.

What did you think of the second, market valuation, part of the paper? I don't see how the methodology works (post 60), but I may be misunderstanding what he did. (The same approach was used in the Hakes/Sauer paper that purports to show that the MLB marketplace long undervalued OBP, but then corrected suddenly in 2004.)
   98. JoeArthur Posted: October 07, 2006 at 04:26 PM (#2202037)
Guy,

I think your critique of the salary estimator is correct. One thing I did like about JC's model was the attempt to identify arbitration-eligibility, though it isn't clear how he did that in detail. In correlating salaries over time, I think you'd also want to take explicit account of the changing labor contracts - changes to the minimum salary, compensation rules for free agent signings and so on. Moreover, this was a labor market with proven collusion at one time. Did I miss a yearly salary correction proxy in the model (analogous to a correction for changing run environments between leagues and years)?

One more perhaps minor issue is JC's use of ERA in his models. Since Craig Wright's The Diamond Appraised in 1987, it should be clear that unearned runs are also influenced by the pitcher's ability, and are relevant to the pitcher's value. [I think there's been more recent work confirming this, perhaps by Tom Ruane, but I don't recall any details offhand.] For JC's purposes, R/9 inn, not ERA, is the better metric for estimating from DIPS or comparing to DIPS. Similarly, JC uses the pitcher park factor PPF from the Lahman database to "normalize" ERA. But as I understand it, that factor is a run adjuster, not an earned run adjuster - one more reason to use R/9 inn rather than ERA in the first place.
   99. GuyM Posted: October 07, 2006 at 05:03 PM (#2202064)
"One thing I did like about JC's model was the attempt to identify arbitration-eligibility, though it isn't clear how he did that in detail"

I believe he adds a dummy variable for arbitration-eligible, and another for free agency (and the young wage-slaves are the default). I'll defer to others who know more about regression than I do, but I don't see how a dummy variable can properly adjust for what are really three (or at least two) discrete labor markets. The non-arb-eligible players all make salaries within a very narrow range, so those salaries would greatly understate the market value of their skills. For example, Miguel Cabrera made $472,000 this year, less than a replacement-level middle infielder. To him, free agency would add perhaps $15M to his contract, but giving free agency to Chris Duncan might be worth only $1-2M. No constant value for "free agency" can work here. Arb-eligible players are somewhere in between, and maybe using a dummy for arb-eligible would work if you lumped them in with free agents. But the non-arb players need to be treated separately or left out of the analysis, I would think.

But again, I may not have a full understanding of how the model is working....
   100. GuyM Posted: October 07, 2006 at 05:11 PM (#2202068)
Correction: In JCB's model free agency is the default, and there are dummy variables for arb-elig and non-arb eligible.
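For readers who don't live in regression land, the specification being debated looks roughly like this in statsmodels formula syntax; the data frame and column names are made up for illustration, and JCB's actual variables differ:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: three free agents, three arb-eligibles, three pre-arb
salaries = pd.DataFrame({
    "log_salary": [16.4, 16.0, 15.6, 14.9, 14.5, 14.1, 13.1, 13.0, 12.9],
    "era_plus":   [130, 110, 95, 125, 105, 92, 128, 104, 90],
    "k_rate":     [.25, .18, .13, .23, .17, .12, .24, .16, .11],
    "arb":        [0, 0, 0, 1, 1, 1, 0, 0, 0],  # 1 = arbitration-eligible
    "non_arb":    [0, 0, 0, 0, 0, 0, 1, 1, 1],  # 1 = pre-arb "wage slave"
})

# JCB-style setup per the correction above: free agents are the omitted
# category, and the dummies shift salary by a constant amount per group
fit = smf.ols("log_salary ~ era_plus + k_rate + arb + non_arb", data=salaries).fit()
print(fit.params)

# GuyM's objection, in these terms: a dummy only shifts the intercept.
# If free agency is worth $15M to a Cabrera but $1-2M to a Duncan, the
# slopes on performance must differ by market, e.g. via interactions:
fit2 = smf.ols("log_salary ~ era_plus * arb + k_rate + non_arb", data=salaries).fit()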