Baseball for the Thinking Fan

Primate Studies
— Where BTF's Members Investigate the Grand Old Game

Wednesday, March 07, 2001

A couple of questions regarding Pitcher Abuse Points

First Appeared: March 3, 2001. Note that this article has been revised several times thanks to discussions with several different people. While I don't feel anything said initially was particularly out of line, my intent is to discuss the matter rationally and scientifically, so I have attempted to comment purely on the facts or my opinion of the facts. This current revision was written March 7, 2001. There is a link to the original article at the bottom of this page.

Rob Neyer ran a recent column (ESPN.com - Rob Neyer - Sometimes they just get hurt) in which he made the following statement.

I mean, I had some evidence (speaking of pitcher injuries and pitches thrown). Craig Wright wrote brilliantly on the subject in his classic book, "The Diamond Appraised" (a book that Price has read), and there's certainly plenty of anecdotal "evidence" that throwing, say, 145 pitches in a game bodes ill for a pitcher's short-term performance, if not his long-term health. But the research has never been particularly comprehensive ... until now. If you're interested in this subject--or if you're interested in baseball at all -- then I refer you to the aforementioned "Baseball Prospectus," which is only the best baseball book you'll read this (or most any other) spring.

And while I don't want to give everything away, BP's Keith Woolner and Rany Jazayerli make a convincing argument that high pitch counts tend to result in both short-term performance decline and long-term injury risk. This greatly simplifies Woolner's and Jazayerli's findings, and I expect to discuss them in greater depth at some future point. In the meantime, though, I urge you to buy the book, and I also urge the Mariners to stick with their program. Yes, pitchers are always going to get hurt, because throwing a baseball 90-plus miles per hour just isn't natural. But that doesn't mean you can't do anything about it.

A few weeks ago, January 26th to be exact, Rob Neyer commented that peer review would be a good thing for sabermetrics to have. In his latest column, he talks about the new PAP3 (pitcher abuse points) measurement, proposed by Keith Woolner and Rany Jazayerli, as a good new metric for measuring pitcher use and abuse. Since I received my copy of Baseball Prospectus from Amazon.com, I've been working up some thoughts about PAP. And since Rob Neyer has brought this important topic rightfully to the fore, I'm going to give some of my impressions of the study Keith and Rany performed.

I have several concerns about the methods and studies Keith Woolner and Rany Jazayerli discuss on pages 505-516 of the 2001 edition of Baseball Prospectus.

  1. The formulation of PAP3 is based on a study showing that a single high-pitch-count game (140 pitches or more) causes a pitcher to allow less than one additional run over his next 24 innings pitched (4 starts). Such an effect, while interesting (and I don't question the validity of that number), is far from a smoking gun that anything over 100 pitches is bad.
  2. The second study, which attempts to show that PAP3 predicts long-term injuries better than pitch counts, compares the wrong two metrics in my opinion. Keith Woolner compares career PAP3 with the career total of pitches thrown. Comparing PAP3 to average pitches thrown per start would be a much more illuminating comparison. While it is an extreme example, does anyone really believe that 10 starts of 60 pitches and 6 starts of 100 pitches are the same thing? For this study they are. Some control for averages per start should be considered as well.
  3. I would like to see the authors make some of the data available. The scatter plot on page 513 is very confusing, and seeing the data points for ourselves would help us understand the method far better. It is also unclear to me how less than 30% of the starters can be above average, as is stated on page 513, unless a handful of data points are severely skewing the overall totals. I understand that not everyone shares my academic background, but I think that everyone benefits if theories are tested, played with (I mean that in a good sense; mathematicians routinely "play" with numbers and ideas), and found by many people to be true.

Now, I'll take these points one by one.

The History of PAP and Formulation of PAP3

Before I get started I just wanted to disagree with a statement Rany made in the introduction, "Research dating back to Craig Wright's The Diamond Appraised has suggested a 100-pitch limit for developing pitchers." (p. 505)

If you refer to Wright's suggested limits in his book, they are far, far more lenient than a 100-pitch limit, which I assume Rany meant as a max 100-pitch limit. Looking at Wright's words, here are some recommendations he makes on p. 211 of his book. I've added emphasis.

  • As a teenager, a pitcher should not be allowed to throw two-hundred-inning seasons or have a BFS (batters faced per start) over 28.5 in any significant span (150-plus innings). This does not include instructional league or winter-ball innings if there is a reasonable amount of time off between the leagues. (MY NOTE: 28.5 BFS is roughly a 100-pitch limit. Note that this is the average and not an absolute limit.)
  • A teenage pitcher should not start on three days' rest, which generally means no four-man rotations in A-ball.
  • For ages twenty to twenty-two, they should average no more than 105 pitches per start for the season (105 pitches is roughly equivalent to 30.0 BFS). A single-game ceiling should be set at 130 pitches.
  • For age twenty-three to twenty-four, the restraints can be eased up, but their season average should stay under 110 pitches in most cases. The single-game ceiling can be jumped up to 140 as long as the pitcher is still strong.

Wright's final five suggestions deal with conditioning and organizational policies like player promotion and an emphasis on throwing strikes.

There may be other work that supports a max of 100 pitches for developing starters, but I am not aware of it. I do know that Rany proposed this theory in the first version of PAP produced a few summers ago (see the link immediately below). I'd be very happy to review other references and revise that statement.

PAP was introduced by Rany Jazayerli in the summer of 1998 on the Baseball Prospectus website. At least in print or online there was no study made to discuss its validity at predicting injuries or whether it was better than simple pitch counts at measuring pitcher overuse. Again I'll happily revise that statement if a reference is pointed out to me. While it is possible Rany did not intend for it to receive widespread attention, it did receive considerable attention and was mentioned by Rob Neyer in his ESPN.com column many times (most recently May 23, 2000). With increased exposure comes increased scrutiny.

From my perspective, I want to see these theories supported with evidence. I personally was very disappointed that no study appeared in the 1999 book, or again in the 2000 book, backing up the claims that its authors (such as that it measured damage to arms and that a low PAP was a sign a manager was careful) and others (primarily Rob Neyer, who has called it groundbreaking and repeatedly cited it to take managers to task for their handling of young starters) had made for it. It should also be noted that PAP did appear prominently in both editions. In the 2001 book, Rany and Keith sought to firm up the reasoning behind PAP's formulation, which I will discuss below.

Keith Woolner has undertaken two studies which generated a new method called PAP3. In the first study, he considered the performance of what he terms high endurance pitchers. These are generally the better-than-average pitchers. To make his cutoff, more than 50% of a pitcher's starts must be longer than the league average outing. While his study covers the years 1988-1998, pitchers who would have been placed in this group last year include Livan Hernandez, Randy Johnson, Jon Lieber and Kris Benson. Pitchers not in this group include John Halama, Jeff Fassero, David Cone, Greg Maddux and Andy Ashby.

For these pitchers, Keith looked at the number of pitches thrown in a start and then compared the pitcher's performances for the 21 days following the start to the performance for the 21 days prior to the start. He calculated ratios for the post-start outings to the pre-start outings for Innings Pitched/Games Started (IP/GS), Runs Allowed/Inning (RA), Strikeouts and Hits Allowed. He combined these four measurements into a measurement called performance index. Here are the results stated in the book in table form. A ratio of 1.00 indicates the index is unchanged, greater than one means it went up and the pitcher pitched worse, and less than one it went down and the pitcher pitched better. I'm rounding to the nearest .005. I added a column with the total starts in that category from 1994-2000 (not just the high endurance pitchers).

Pitches      Perf Index     Total starts (1994-2000)
 90-99       1.010           6317
100-109      1.005           6554
110-119      1.010           4725
120-129      1.010           2460
130-139      1.020            635
140-149      1.050            113   (R. Johnson 21, Clemens 7)
                                   20804 total
Wghtd. Avg.  1.010       
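
As a rough check on the weighted average row, here is a minimal sketch that weights each bin's performance index by the 1994-2000 start counts in the added column. This is an assumption on my part: the book's own weights come from the high endurance pitchers of 1988-1998 and are not reproduced here, so this only shows that the rounded figures hang together.

```python
# Performance index per pitch-count bin, as listed in the table above,
# paired with the 1994-2000 start counts from the added column.
# ASSUMPTION: the start counts stand in as weights; the book's actual
# weights (high endurance pitchers, 1988-1998) are not given here.
bins = [
    ("90-99", 1.010, 6317),
    ("100-109", 1.005, 6554),
    ("110-119", 1.010, 4725),
    ("120-129", 1.010, 2460),
    ("130-139", 1.020, 635),
    ("140-149", 1.050, 113),
]

total_starts = sum(starts for _, _, starts in bins)
weighted_avg = sum(index * starts for _, index, starts in bins) / total_starts

# Round to the nearest .005, the same rounding used in the table.
rounded = round(weighted_avg / 0.005) * 0.005

print(f"weighted average: {weighted_avg:.4f}")
print(f"to nearest .005:  {rounded:.3f}")
```

Because the two highest bins hold so few starts, they barely move the average, which is why the baseline sits at about 1%.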

As Keith notes in his study the average pitcher sees an increase of 1% in performance index for the 21 days after every start, so this becomes our baseline. As the table makes clear and as it is mentioned on page 506, the range of pitches from 90-99 is presented as a single point at x=90. It is unclear to me why the left edge of the range of pitches was chosen to represent the range when the middle (x=95) would seem a much more logical choice. In fact, it would be more accurate to display this value as a range of values or a simple bar chart. I've produced the book's representation using a line graph and another one showing bar charts.

Keith then uses his line graph and attempts to fit a number of different curves to the graph. He decides upon (Number of pitches - 100)^3 as the best fit. While I agree that it fits best of the choices he gives in the book, especially when compared to the line graph, it does not accurately reflect the data because the left extreme point of each range is used to represent the entire range. If you use the bar chart, the fit isn't quite so good. Please note that I extrapolated where the PAP3 value for x=150 would hit. I think that to properly compare PAP3 to the data you need to consider the full range of the data and not just the left end point of the ranges.
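
To give a feel for how much the choice of representative point matters for a cubic, here is a small sketch comparing the penalty at the left edge of each range with the penalty five pitches further in (the middle, in the same sense that x=95 is the middle of 90-99). The function name `pap3` and the choice to return zero at or below 100 pitches are my assumptions for illustration, not the book's published code.

```python
def pap3(pitches: int, threshold: int = 100) -> int:
    """Cubic penalty for one start: (pitches - threshold)^3.

    ASSUMPTION: starts at or below the threshold score zero.
    """
    excess = max(0, pitches - threshold)
    return excess ** 3


# Left edges of the pitch-count ranges above the 100-pitch threshold.
left_edges = [110, 120, 130, 140]

for left in left_edges:
    middle = left + 5  # e.g. 145 for the 140-149 range
    at_left = pap3(left)
    at_middle = pap3(middle)
    print(f"{left}-{left + 9}: left edge {at_left:>6}, "
          f"middle {at_middle:>6}, ratio {at_middle / at_left:.2f}")
```

Shifting the representative point by five pitches changes the cubic's value substantially in every range, so a curve that looks like a good fit through the left edges is being compared against the wrong x-values.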

I find it interesting that there is no decay in the pitcher's performance (it's nearly ramrod flat) until you get above 130 pitches. Given this, wouldn't the simplest choice be to set the cutoff for a metric at 130 pitches rather than at 100? Do you see a difference between the effect of a 90-99 pitch start and a 120-129 pitch start? In fact, the pitchers who threw 120-129 pitches performed better than those throwing 90-99 or 110-119. I'm not saying that you should make your pitcher throw ten more pitches if he is at 112, but the logical conclusion is that it doesn't make much difference until you get to 130 pitches. Additionally, setting the penalty threshold at 130 would be further corroborated by Craig Wright's work.

While there clearly is some decay over 130 pitches, the next question is whether this level of decay is significant. Let's leave the performance index for a moment and look at Runs Allowed. In their study, Keith and Rany found that there was a 2% rise in runs allowed for all starters and that for pitch counts above 140 the rise was 7.5%. If we subtract off the 2% baseline, we see that a 140-149 pitch start (something that happens 16 times a year, or once a year for every other team) will increase the RA ratio by 5.5% over the next 21 days. This means a 4.28 RA (the average over the course of Keith's study) rises to 4.52. Dividing the unrounded difference (4.28 x 0.055, or about 0.235 runs per nine innings) by 9 gives an increase of 0.026 runs per inning.

The 5.5% increase in runs appears significant, but think about what this means over the course of 21 days. In a five-man rotation with no off days this corresponds to 4 starts or roughly 24 innings (using league averages). Taking 0.026 times 24 innings you have an increase of 0.63 runs for the entire 24 innings. And that is after a 140-149 pitch start. The effect is about 30% of that for a 130-139 start.
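
The arithmetic above can be retraced in a few lines. This is only a sketch of the calculation as described in the text; the variable names are mine, and it treats RA as runs per nine innings, which is what dividing by 9 implies.

```python
BASELINE_RA = 4.28      # average RA over the course of the study (per nine innings)
RISE_140_PLUS = 0.075   # rise in runs allowed after a 140-149 pitch start
RISE_ALL_STARTS = 0.02  # baseline rise seen after every start
INNINGS = 24            # roughly 4 starts over the following 21 days

# Net effect attributable to the high pitch count itself.
net_rise = RISE_140_PLUS - RISE_ALL_STARTS

elevated_ra = BASELINE_RA * (1 + net_rise)
extra_per_inning = (elevated_ra - BASELINE_RA) / 9
extra_runs = extra_per_inning * INNINGS

print(f"elevated RA:           {elevated_ra:.2f}")
print(f"extra runs per inning: {extra_per_inning:.3f}")
print(f"extra runs over {INNINGS} IP: {extra_runs:.2f}")
```

Carrying the unrounded values through gives the 0.63 runs quoted above; rounding at each step would shift the last digit slightly but not the conclusion.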

Now if you are Jimy Williams and your Red Sox are leading the Yanks 1-0 in the top of the 9th, do you bring in Derek Lowe, who has saved the two previous games, to face Jeter, O'Neill and Williams? Do you stay with Pedro Martinez, who is at 118 pitches and is working on a 14-strikeout shutout? Or do you bring in a rested Rich Garces? Looking at this evidence, there had better be some very serious extenuating circumstances if Pedro isn't on the mound in the ninth inning. One run in this situation is far more important than one potential run over his next four starts.

Rather than showing that pitcher abuse has immediate, dire consequences, this tells me that pitchers are much more resilient than previously thought. While this doesn't speak to the long-term consequences, the short-term consequences of a single high pitch count start are essentially non-existent.

This leads us to a question. If a single high pitch start (140+ pitches) raises the number of runs allowed over the next four starts (24 innings) by less than one run, how valid is a method that bases its form on this effect? An effect that comes into play in a mere 0.5% (1 in 200) of all starts.

PAP and long-term injuries

In part two of his study, Keith Woolner attempts to show that high PAP3 totals lead to greater incidence of injury. And he correctly posits that if PAP3 isn't any better than plain vanilla pitch counts then it isn't a worthwhile metric. To summarize his study (hopefully correctly), he finds a list of starters injured from 1988-1998 (years corresponding to available pitch count data) from the Sports Encyclopedia and then finds uninjured pitchers who have thrown a comparable number of career pitches by the same age. He then compares the career PAP and career pitch count values for the healthy and injured pitchers.

Given his stated goal, this is a very sound way to study it. I have one large concern about how he approached it though. The premise behind PAP is that isolated high pitch count outings are worse than low pitch count outings. For instance according to PAP3, starts of 120-80-120-80 are much, much (mathematically they are infinitely) worse than starts of 100-100-100-100. I don't believe Keith's study determines if that is in fact true. In using career pitch count totals to find comps rather than career pitch count totals and games started (in effect pitches per start), he is placing high workload short careers in the same bin with low workload long careers. For instance, Jason Bere had 7800 pitches by age 25. Keith compared him with pitchers who have had 7020 to 8580 pitches by age 25. Here is a list I came up with from 1994 to the present.

+-----------------+------+-----+-------+
| name            |  NP  | GS  | avgNP |
+-----------------+------+-----+-------+
| Shawn Estes     | 7338 | 71  |   103 |
| Chan Ho Park    | 7527 | 74  |   102 |
| Justin Thompson | 7833 | 77  |   102 |
| Joey Hamilton   | 8003 | 79  |   101 |
| Scott Karl      | 8242 | 82  |   101 |
| Glendon Rusch   | 8073 | 81  |   100 |
| Jason Schmidt   | 8404 | 84  |   100 |
| Steve Trachsel  | 8016 | 82  |    98 |
| Jason Bere      | 7800 | 80  |    98 |
| Steve Woodard   | 7345 | 84  |    87 |
+-----------------+------+-----+-------+

Now the question I have is, Do these pitchers have a similar profile? At least in Steve Woodard's case I would say no. I think to do this study you really have to control for the average pitches thrown per start.
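
Both halves of this concern can be put into numbers. The sketch below first scores the 120-80-120-80 and 100-100-100-100 sequences with a cubic penalty, then recomputes pitches per start for the pitchers in the table and checks them against the 7020 to 8580 window (which works out to 10% either side of Bere's 7800). The `pap3` helper, including its zero score at or below 100 pitches, is my assumption for illustration rather than the book's code.

```python
def pap3(pitches: int, threshold: int = 100) -> int:
    """Cubic penalty for one start. ASSUMPTION: zero at or below the threshold."""
    return max(0, pitches - threshold) ** 3


# Two hypothetical four-start stretches with the same total pitches.
uneven = [120, 80, 120, 80]
steady = [100, 100, 100, 100]
for label, starts in (("uneven", uneven), ("steady", steady)):
    print(f"{label}: {sum(starts)} pitches, "
          f"PAP3 total {sum(pap3(p) for p in starts)}")

# (name, career pitches by age 25, games started), from the table above.
comps = [
    ("Shawn Estes", 7338, 71),
    ("Chan Ho Park", 7527, 74),
    ("Justin Thompson", 7833, 77),
    ("Joey Hamilton", 8003, 79),
    ("Scott Karl", 8242, 82),
    ("Glendon Rusch", 8073, 81),
    ("Jason Schmidt", 8404, 84),
    ("Steve Trachsel", 8016, 82),
    ("Jason Bere", 7800, 80),
    ("Steve Woodard", 7345, 84),
]

# Comparison window around Bere's 7800 pitches, in integer arithmetic.
window_low, window_high = 7800 * 9 // 10, 7800 * 11 // 10

per_start = {name: pitches / starts for name, pitches, starts in comps}
for name, pitches, _ in comps:
    in_window = window_low <= pitches <= window_high
    print(f"{name:<16} {per_start[name]:6.1f} per start, in window: {in_window}")

spread = max(per_start.values()) - min(per_start.values())
print(f"spread in pitches per start: {spread:.1f}")
```

Every pitcher falls inside the career-pitch window, yet the per-start averages differ by about 16 pitches from top to bottom, which is the gap that matching on career totals alone cannot see.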

I do like the running window chart that Keith runs to show how workload has some correlation to injury, but I would have liked to see a similar chart for pitchers ordered by pitches per start and other variables as well.

Conclusions

I hope that Keith Woolner will make much of his data available to the general public. For instance, I would like to know why less than 30% of pitchers in the study have an above average ratio of PAP3 to career pitches. I'm guessing that Randy Johnson is such an outlier that he alone is skewing the mean far higher than the median. The median would be a far more robust measurement to use on page 513. If the data were available (for instance, name, GS, Pitches, PAP3, Injury), readers could test a number of the conclusions on their own.
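
To show how a single outlier can leave well under half of a group "above average", here is a small sketch using made-up numbers. The values are purely hypothetical and are NOT the book's data; they only illustrate the mean-versus-median effect I am guessing at.

```python
from statistics import mean, median

# HYPOTHETICAL ratios of PAP3 to career pitches for ten pitchers.
# These are invented for illustration; the last value plays the role
# of one extreme outlier.
ratios = [0.5, 0.8, 1.0, 1.2, 1.5, 1.8, 2.0, 2.5, 6.0, 40.0]

avg = mean(ratios)
mid = median(ratios)

share_above_mean = sum(r > avg for r in ratios) / len(ratios)
share_above_median = sum(r > mid for r in ratios) / len(ratios)

print(f"mean   {avg:.2f}: {share_above_mean:.0%} of pitchers above it")
print(f"median {mid:.2f}: {share_above_median:.0%} of pitchers above it")
```

With the real name, GS, Pitches, PAP3 and Injury columns in hand, the same few lines would settle whether an outlier explains the figure on page 513.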

While I applaud their effort and I hope that they will continue to produce more research on this topic, I feel that Rany and Keith need to address some of the concerns presented here before we can accept PAP3 as a valuable tool. And I also hope that others will refrain from praising PAP3, until these serious questions have been resolved.

My purpose here has been to give a balanced, well-reasoned critique of the theory behind PAP3. Again, it is not my intent to attack anyone in any way, but to push forward our understanding of the many dynamics at work when we consider pitch counts. I hope that Keith and Rany will comment on this work and that we can take part in a constructive dialogue and develop a measurement that balances the concerns of teams and fans who want to win games and players who want to enjoy a long career in baseball.

Appendix

I've included a table of starts in a couple of formats from 1994-2000, which contain number of pitches, groundball-flyball, batters faced and other goodies in my data section.

I've done some small studies myself on pitcher usage. I'm continuing to work on this area and hope to have some more information in the future on our new webzine, Baseball Primer.

The first version of this article (and I'd like to hear if you feel it was over the line).

Baseball Prospectus website

http://www.bigbadbaseball.com/statofday/sotd_20000611.html

http://www.bigbadbaseball.com/statofday/sotd_20000426.html

http://www.bigbadbaseball.com/statofday/sotd_19990808.html

http://www.bigbadbaseball.com/articles/forman_19990913.html

Sean Forman Posted: March 07, 2001 at 12:00 AM | 4 comment(s)
  Related News: General

Reader Comments and Retorts


Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   1. Craig Calcaterra Posted: March 12, 2001 at 07:00 PM (#603426)
Sean,

I read both the new and old versions of the article. For what it's worth, I don't think that your comments are any more "over the line" than what Neyer typically has to say about journalists or baseball people. That said, Neyer is in a slightly different field than you when he writes his comments. Neyer's ESPN column puts him in a commentator/wonk role, while your peer review efforts place you more in the role of a hard scientist. Yes, this is slightly a matter of semantics, but a real distinction does exist in my mind. For the purposes of your PAP piece, you are really like the physicist reviewing a colleague's work for a scholarly journal, whereas Neyer is like the technology reporter for the New York Times who digests the work of the scientist (Woolner) and provides context and commentary for lay people like me. This is not to say that Neyer is less of a scientist or you are less of a columnist. Indeed, the science reporters are usually every bit as able as the people doing the primary research. You are, however, performing different functions.

This distinction is important, I think, because keeping it in mind will help sabermetrics gain respect as a "real" science. A science where peer review actually means something other than simply criticizing one's peers. Sure, physics and genetics have their share of personal squabbles, nitpicking, and accusations embedded in the peer review process, but as a relatively new field, the science of baseball is going to be held to a higher standard than physics. Even if it is not, wouldn't it be great if, as it matures, the science of baseball finds itself to be a more congenial discipline? One that is immune from the pettiness that often characterizes the other, more established fields?

In closing, I think it is fair for BP's methods to be questioned, and the fact that it has the dual intent of selling books and providing good analysis is important to consider. I just think that such broader criticisms are better suited to an opinion piece than a piece containing valuable critical analysis.

Keep up the good work,

Craig Calcaterra

   2. Sean Forman Posted: March 13, 2001 at 07:00 PM (#603442)
John,

Thanks for the comments and thanks to everyone else as well. The point I'm trying to make is that I don't believe they've shown that PAP measures abuse. We know repeated high pitch counts are bad, but why do we need this formula when we have pitch counts already? The formula implies that anything above 100 is bad and that the damage accumulates at an ever quickening rate. These are assumptions that must be proven, and I would argue that they haven't been.
   3. Michael Posted: April 10, 2001 at 07:03 PM (#603639)
I enjoyed your article, Sean.

I don't know whether the first version (apparently not available now) was out of line, but it can't be any more out of line than the 1999 & 2000 Big Bad Baseball Annuals were in making attacks that included name-calling. They seemed to disagree with PAP merely because someone else thought of it first and then posted color-coded graphs on their website of rolling 3-start pitch counts, again for no more obvious reason than they thought it made intuitive sense (or at least that's my recollection of it). Not that you implied otherwise, but it'd be nice to acknowledge that others throw out sabermetric measures without first backing them with research.

Now that we've progressed to the point where we can have well-written, well-researched peer review, maybe we can have some real research. It's incredible with the data now available and seemingly a fair number of interested researchers that we still are relying on things like small-scale studies Bill James did in the 1980s or Craig Wright's book published in the 1980s before the necessary data was widely available to support some of our most basic sabermetric tenets.