Baseball for the Thinking Fan

Primate Studies
— Where BTF's Members Investigate the Grand Old Game

Wednesday, March 07, 2001

A couple of questions regarding Pitcher Abuse Points

A look at some of the reasoning behind and significance of the Pitcher Abuse study run in the 2001 Baseball Prospectus.

First Appeared: March 3, 2001.  Note that this article has been
revised several times thanks to discussions with several different
people.  While I don’t feel anything said initially was particularly
out of line, my intent is to discuss the matter rationally and
scientifically, so I have attempted to comment purely on the facts or
my opinion of the facts.  This current revision was written March 7,
2001.  There is a link to the original article at the bottom of this
page.

Rob Neyer ran a recent column (ESPN.com - Rob Neyer - Sometimes they
just get hurt, http://espn.go.com/mlb/columns/neyer_rob/42798.html) in
which he made the following statement.

I mean, I had some evidence (speaking of pitcher
injuries and pitches thrown). Craig Wright wrote brilliantly on the
subject in his classic book, “The Diamond Appraised” (a book that
Price has read), and there’s certainly plenty of anecdotal “evidence”
that throwing, say, 145 pitches in a game bodes ill for a pitcher’s
short-term performance, if not his long-term health. But the research
has never been particularly comprehensive ... until now. If you’re
interested in this subject—or if you’re interested in baseball at all
—then I refer you to the aforementioned “Baseball Prospectus,” which
is only the best baseball book you’ll read this (or most any other)
spring.

And while I don’t want to give everything away, BP’s Keith Woolner
and Rany Jazayerli make a convincing argument that high pitch counts
tend to result in both short-term performance decline and long-term
injury risk. This greatly simplifies Woolner’s and Jazayerli’s
findings, and I expect to discuss them in greater depth at some future
point. In the meantime, though, I urge you to buy the book, and I also
urge the Mariners to stick with their program. Yes, pitchers are
always going to get hurt, because throwing a baseball 90-plus miles
per hour just isn’t natural. But that doesn’t mean you can’t do
anything about it.

 

A few weeks ago, January 26th to be exact
(http://espn.go.com/mlb/s/2001/0115/1017090.html), Rob Neyer commented
that peer review would be a good thing for sabermetrics to have.  In his
latest column, he talks about the new PAP3 (pitcher abuse points)
measurement, proposed by Keith Woolner and Rany Jazayerli, as a good new
metric for measuring pitcher use and abuse.  Since I received my copy of
Baseball Prospectus (http://www.baseballprospectus.com/) from
Amazon.com, I’ve been working up some thoughts about PAP. And since
Rob Neyer has brought this important topic rightfully to the fore, I’m
going to give some of my impressions of the study Keith and Rany
performed.

I have several concerns about the methods and studies Keith
Woolner and Rany Jazayerli discuss in pages 505-516 of their 2001
edition.

  1. The formulation of PAP3 is based on an effect
    that shows single high-pitch count games (140 pitches or greater)
    cause pitchers to allow less than one additional run over their
    next 24 innings pitched (4 starts).  Such an effect, while interesting
    (and I don’t question the validity of that number), is far from a
    smoking gun that anything over 100 pitches is bad.

  2. The second study, which attempts to show that PAP3 predicts
    long-term injuries better than pitch counts, compares the wrong two
    metrics in my opinion.  Keith Woolner compares career PAP3
    with the career total of pitches thrown.  Comparing PAP3 to
    average pitches thrown per start would be a much more illuminating
    comparison.  While an extreme example, does anyone really believe that
    10 starts of 60 pitches and 6 starts of 100 pitches are the same
    thing?  For this study they are.  Some control for averages per start
    should be considered as well.

  3. I would like to see the authors make some of the data available.
    The scatter plot on page 513 is very confusing and seeing the data
    points for ourselves would help us understand the method far better.
    It is also unclear to me how fewer than 30% of the starters can be
    above average, as is stated on page 513, unless a handful of data
    points are severely skewing the overall totals.  I understand that not
    everyone shares my academic background, but I think that everyone
    benefits if theories are tested, played with (I mean that in a good
    sense, mathematicians routinely “play” with numbers and ideas), and
    found by many people to be true.

 

Now, I’ll take these points one by one.

The History of PAP and Formulation of PAP3

 

Before I get started I just wanted to disagree with a statement
Rany made in the introduction, “Research dating back to Craig Wright’s
The Diamond Appraised has suggested a 100-pitch limit for
developing pitchers.” (p. 505)

If you refer to Wright’s suggested limits in his book, they are
far, far more lenient than a 100-pitch limit, which I assume Rany
meant as a max 100-pitch limit.  Looking at Wright’s words, here are
some recommendations he makes on p. 211 of his book.  I’ve added
emphasis.

  • As a teenager, a pitcher should not be allowed to throw
    two-hundred-inning seasons or have a BFS (batters faced per start)
    over 28.5 in any significant span (150-plus innings).  This does not
    include instructional league or winter-ball innings if there is a
    reasonable amount of time off between the leagues.  (MY NOTE: 28.5 BFS
    is roughly a 100-pitch limit.  Note that this is an average and not an
    absolute limit.)

  • A teenage pitcher should not start on three days’ rest,
    which generally means no four-man rotations in A-ball.

  • For ages twenty to twenty-two, they should average no more
    than 105 pitches per start for the season (105 pitches is roughly
    equivalent to 30.0 BFS).  A single-game ceiling should be set at
    130 pitches.

  • For age twenty-three to twenty-four, the restraints can be eased
    up, but their season average should stay under 110 pitches in
    most cases.  The single-game ceiling can be jumped up to 140 as
    long as the pitcher is still strong.

 

Wright’s final five suggestions deal with conditioning and
organizational policies like player promotion and an emphasis on
throwing strikes.

There may be other work that supports a max of 100 pitches for
developing starters, but I am not aware of it.  I do know that Rany
proposed this theory in the first version of PAP produced a few
summers ago (see the link immediately below).  I’d be very happy to
review other references and revise that statement.

PAP was introduced by Rany Jazayerli in the summer of 1998 on the
Baseball Prospectus website
(http://www.baseballprospectus.com/news/19980619jazayerli.html).  At
least in print or online there was no study made to discuss its
validity at predicting injuries or whether it was better than simple
pitch counts at measuring pitcher overuse.  Again I’ll happily revise
that statement if a reference is pointed out to me.  While it is
possible Rany did not intend for it to receive widespread attention, it
did receive considerable attention and was mentioned by Rob Neyer in
his ESPN.com column many times (most recently May 23, 2000).  With
increased exposure comes increased scrutiny.

From my perspective, I want to see these theories supported with
evidence. I personally was very disappointed that no study appeared in
the 1999 book and again in the 2000 book backing up the claims that its
authors (such as that it measured damage to arms and that low PAP was a
sign a manager was careful) and others (primarily Rob Neyer, who has
called it groundbreaking and repeatedly cited it to take managers to
task for their handling of young starters) had made for it.  It should
also be noted that PAP did
appear prominently in both editions.  In the 2001 book, Rany and Keith
sought to firm up the reasoning behind PAP’s formulation, which I will
discuss below.

Keith Woolner has undertaken two studies which generated a new
method called PAP3.  In the first study, he considered the
performance of what he terms high endurance pitchers.  These are
generally the better-than-average pitchers: to make his cutoff, more
than 50% of a pitcher’s starts must be longer than the league average
outing.  While his study covers the years 1988-1998, pitchers who
would have been placed in this group last year included Livan
Hernandez, Randy Johnson, Jon Lieber and Kris Benson.  Pitchers not in
this group include John Halama, Jeff Fassero, David Cone, Greg Maddux
and Andy Ashby.

For these pitchers, Keith looked at the number of pitches thrown in
a start and then compared the pitcher’s performances for the 21 days
following the start to the performance for the 21 days prior to the
start.  He calculated ratios for the post-start outings to the
pre-start outings for Innings Pitched/Games Started (IP/GS), Runs
Allowed/Inning (RA), Strikeouts and Hits Allowed.  He combined these
four measurements into a measurement called performance index.  Here
are the results stated in the book in table form.  A ratio of 1.00
indicates the index is unchanged, greater than one means it went up
and the pitcher pitched worse, and less than one it went down and the
pitcher pitched better.  I’m rounding to the nearest .005.  I added a
column with the total starts in that category from 1994-2000 (not just
the high endurance pitchers).

Pitches       Perf Index   Total starts (1994-2000)
90-99         1.010         6317
100-109       1.005         6554
110-119       1.010         4725
120-129       1.010         2460
130-139       1.020          635
140-149       1.050          113   (R. Johnson 21, Clemens 7)
Total                      18614
Wghtd. Avg.   1.010
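
As a rough check on the weighted average line, the sketch below weights each range's performance index by the start counts in the right-hand column. That weighting is an assumption: the book's average presumably rests on its own 1988-1998 sample of high endurance pitchers, which is not reproduced here. Note also that the six row counts as printed add up to 20,804 rather than the 18,614 shown on the total line; the sketch uses the row values.

```python
# Rows from the table above: (pitch range, performance index, total starts 1994-2000).
rows = [
    ("90-99", 1.010, 6317),
    ("100-109", 1.005, 6554),
    ("110-119", 1.010, 4725),
    ("120-129", 1.010, 2460),
    ("130-139", 1.020, 635),
    ("140-149", 1.050, 113),
]


def weighted_index(rows):
    """Average the performance index, weighting each range by its start count."""
    total_starts = sum(starts for _, _, starts in rows)
    weighted_sum = sum(index * starts for _, index, starts in rows)
    return weighted_sum / total_starts, total_starts


def round_to_step(value, step=0.005):
    """Round to the nearest multiple of `step`, as the table does."""
    return round(value / step) * step


average, total_starts = weighted_index(rows)
print(f"starts in the listed ranges: {total_starts}")
print(f"weighted performance index: {average:.4f}")
print(f"rounded to the nearest .005: {round_to_step(average):.3f}")
```

Under that assumption the result rounds to the same 1.010 baseline quoted in the table.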

 

As Keith notes in his study the average pitcher sees an increase of
1% in performance index for the 21 days after every start, so this
becomes our baseline.  As the table makes clear and as it is mentioned
on page 506, the range of pitches from 90-99 is presented as a single
point at x=90.  It is unclear to me why the left edge of the range of
pitches was chosen to represent the range when the middle (x=95) would
seem a much more logical choice.  In fact, it would be more accurate
to display this value as a range of values or a simple bar chart.
I’ve produced the book’s representation using a line graph and another
one showing bar charts.

Keith then uses his line graph and attempts to fit a number of
different curves to the graph. He decides upon (Number of pitches -
100)^3 as the best fit.  While I agree that it does fit best
of the choices he gives in the book, especially when compared to the
line graph, it does not accurately reflect the data because the left
extreme point of each range is used to represent the entire range.  If
you use the bar chart, the fit isn’t quite so good.  Please note that
I extrapolated where the PAP3 value for x=150 would hit.  I
think that to properly compare PAP3 to the data you
need to consider the full range of the data and not just the left end
point of the ranges.
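
To see how much the choice of representative point matters, here is a minimal sketch of the cubic. It assumes the score is zero at or below 100 pitches, which the formula alone does not state, and the function name `pap3` is only illustrative. It evaluates each range at its left edge and at its middle, then compares two four-start sequences with the same pitch total.

```python
def pap3(pitches, threshold=100):
    """Cube of the pitches thrown past the threshold; zero at or below it (assumed)."""
    excess = max(0, pitches - threshold)
    return excess ** 3


# Left edges of the pitch-count ranges used in the study's table.
left_edges = [90, 100, 110, 120, 130, 140]

for left in left_edges:
    middle = left + 5  # the alternative representative point, e.g. x=95 for 90-99
    print(f"{left}-{left + 9}: left edge -> {pap3(left):>6}, middle -> {pap3(middle):>6}")

# Same 400-pitch total, very different PAP3 totals.
alternating = sum(pap3(p) for p in [120, 80, 120, 80])
steady = sum(pap3(p) for p in [100, 100, 100, 100])
print(f"120-80-120-80: {alternating}, 100-100-100-100: {steady}")
```

For the 140-149 range the cubic gives 64,000 at x=140 but 91,125 at x=145, so where the data point is plotted shifts the comparison by roughly 40% at the top end.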

I find it interesting that there is no decay in the pitcher’s
performance (it’s nearly ramrod flat) until you get above 130 pitches.
Given this, wouldn’t the simplest choice be to set the cutoff for a
metric at 130 pitches rather than at 100?  Do you see a difference
between the effect of a 90-99 pitch start and a 120-129 pitch
start?  In fact, the pitchers who threw 120-129 pitches performed
better than those throwing 90-99 or 110-119.  I’m not saying
that you should make your pitcher throw ten more pitches if he is at
112, but the logical conclusion is that it doesn’t make much
difference until you get to 130 pitches.  Additionally, setting the
penalty threshold at 130 would be further corroborated by Craig
Wright’s work.

While there clearly is some decay over 130 pitches, the next
question is whether this level of decay is significant.  Let’s leave the
performance index for a moment and look at Runs Allowed.  In their
study, Keith and Rany found that there was a 2% rise in runs allowed
for all starters and that for pitch counts above 140 the rise was
7.5%. If we subtract off the 2% baseline we see that a 140-149 pitch
start (something that happens about 16 times a year, or once a year for
every other team) will increase the RA ratio by 5.5% over the next 21
days. This means a 4.28 RA (the average over the course of
Keith’s study) rises to 4.52.  (4.52 - 4.28)/9 comes out to an
increase of 0.026 runs per inning.

The 5.5% increase in runs appears significant, but think about what
this means over the course of 21 days.  In a five-man rotation with no
off days this corresponds to 4 starts or roughly 24 innings (using
league averages).  Taking 0.026 times 24 innings you have an increase
of 0.63 runs for the entire 24 innings.  And that is after a
140-149 pitch start.  The effect is about 30% of that for a 130-139
start.
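
The arithmetic in the last two paragraphs can be retraced in a few lines. This is only a sketch of the calculation as described, carrying unrounded values through each step; the constant and function names are illustrative.

```python
BASELINE_RA = 4.28        # average runs allowed per nine innings over the study
RISE_AFTER_140 = 0.075    # rise in RA after a 140-149 pitch start
RISE_ALL_STARTS = 0.02    # rise in RA after any start (the baseline)
INNINGS = 24              # roughly four starts in the following 21 days


def extra_runs(baseline_ra, rise, baseline_rise, innings):
    """Extra runs allowed over `innings`, net of the rise seen after every start."""
    net_rise = rise - baseline_rise
    raised_ra = baseline_ra * (1 + net_rise)
    per_inning = (raised_ra - baseline_ra) / 9
    return raised_ra, per_inning, per_inning * innings


raised_ra, per_inning, total = extra_runs(
    BASELINE_RA, RISE_AFTER_140, RISE_ALL_STARTS, INNINGS
)
print(f"RA after a 140-149 pitch start: {raised_ra:.2f}")
print(f"extra runs per inning: {per_inning:.3f}")
print(f"extra runs over {INNINGS} innings: {total:.2f}")
```

Carrying the unrounded 4.5154 through gives the 0.026 runs per inning and 0.63 runs per 24 innings quoted above; starting from the rounded 4.52 instead nudges both figures up slightly without changing the conclusion.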

Now if you are Jimy Williams and your Red Sox are leading the Yanks
1-0 in the top of the 9th, do you bring in Derek Lowe, who has saved
the two previous games, to face Jeter, O’Neill and Williams; do you
stay with Pedro Martinez, who is at 118 pitches and is working on a
14-strikeout shutout; or do you bring in a rested Rich Garces?
Looking at this evidence, there had better be some very serious
extenuating circumstances if Pedro isn’t on the mound in the ninth
inning.  One run in this situation is far more important than one
potential run over his next four starts.

Rather than showing that pitcher abuse has immediate, dire
consequences, this tells me that pitchers are much more resilient than
previously thought.  While this doesn’t speak to the long-term
consequences, the short-term consequences of a single high pitch count
start are essentially non-existent.

This leads us to a question.  If a single high pitch start (140+
pitches) raises the number of runs allowed over the next four starts
(24 innings) by less than one run, how valid is a method that bases
its form on this effect, an effect that comes into play in a mere
0.5% (1 in 200) of all starts?

PAP and long-term injuries

 

In part two of his study, Keith Woolner attempts to show that high
PAP3 totals lead to greater incidence of injury.  And he
correctly posits that if PAP3 isn’t any better than plain
vanilla pitch counts then it isn’t a worthwhile metric.  To summarize
his study (hopefully correctly), he finds a list of starters injured
from 1988-1998 (years corresponding to available pitch count data)
from the Sports Encyclopedia and then finds uninjured pitchers who
have thrown a comparable number of career pitches by the same age.  He
then compares the career PAP and career pitch count values for the
healthy and injured pitchers.

Given his stated goal, this is a very sound way to study it.  I
have one large concern about how he approached it though.  The premise
behind PAP is that isolated high pitch count outings are worse than
low pitch count outings.  For instance according to PAP3,
starts of 120-80-120-80 are much, much (mathematically they are
infinitely) worse than starts of 100-100-100-100.  I don’t believe
Keith’s study determines if that is in fact true.  In using career
pitch count totals to find comps rather than career pitch count totals
and games started (in effect pitches per start), he is placing high
workload short careers in the same bin with low workload long careers.
For instance, Jason Bere had 7800 pitches by age 25. Keith compared
him with pitchers who have had 7020 to 8580 pitches by age 25.  Here
is a list I came up with from 1994 to the present.

+-----------------+------+----+-------+
| name            |   NP | GS | avgNP |
+-----------------+------+----+-------+
| Shawn Estes     | 7338 | 71 |   103 |
| Chan Ho Park    | 7527 | 74 |   102 |
| Justin Thompson | 7833 | 77 |   102 |
| Joey Hamilton   | 8003 | 79 |   101 |
| Scott Karl      | 8242 | 82 |   101 |
| Glendon Rusch   | 8073 | 81 |   100 |
| Jason Schmidt   | 8404 | 84 |   100 |
| Steve Trachsel  | 8016 | 82 |    98 |
| Jason Bere      | 7800 | 80 |    98 |
| Steve Woodard   | 7345 | 84 |    87 |
+-----------------+------+----+-------+

 

Now the question I have is, Do these pitchers have a similar
profile?  At least in Steve Woodard’s case I would say no.  I think to
do this study you really have to control for the average pitches
thrown per start.
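
One way to picture such a control is sketched below, using the ten pitchers listed above. The 7020 to 8580 window corresponds to plus or minus 10% of Bere's 7800 pitches. The per-start tolerance of 5 pitches is a made-up value chosen only for illustration, not something taken from the study.

```python
# (name, career pitches NP, games started GS), copied from the table above.
pitchers = [
    ("Shawn Estes", 7338, 71),
    ("Chan Ho Park", 7527, 74),
    ("Justin Thompson", 7833, 77),
    ("Joey Hamilton", 8003, 79),
    ("Scott Karl", 8242, 82),
    ("Glendon Rusch", 8073, 81),
    ("Jason Schmidt", 8404, 84),
    ("Steve Trachsel", 8016, 82),
    ("Jason Bere", 7800, 80),
    ("Steve Woodard", 7345, 84),
]

TARGET = "Jason Bere"
MAX_GAP = 5.0  # hypothetical tolerance, in pitches per start


def comp_window(target_pitches, tolerance=0.10):
    """Inclusive range of career pitch totals treated as comparable."""
    return (round(target_pitches * (1 - tolerance)),
            round(target_pitches * (1 + tolerance)))


def per_start(pitches, starts):
    """Average pitches thrown per start."""
    return pitches / starts


by_name = {name: (np_, gs) for name, np_, gs in pitchers}
target_np, target_gs = by_name[TARGET]
low, high = comp_window(target_np)
target_avg = per_start(target_np, target_gs)

# Step 1: comps chosen on career pitch total alone.
total_comps = [name for name, np_, gs in pitchers if low <= np_ <= high]

# Step 2: additionally require a similar per-start workload.
workload_comps = [
    name for name, np_, gs in pitchers
    if low <= np_ <= high and abs(per_start(np_, gs) - target_avg) <= MAX_GAP
]

print(f"window: {low} to {high} pitches")
for name, np_, gs in pitchers:
    print(f"{name:<16} {np_:>5} {gs:>3} {per_start(np_, gs):>6.1f}")
print("dropped by the per-start check:", sorted(set(total_comps) - set(workload_comps)))
```

With this particular tolerance the per-start check drops Woodard (about 87 pitches per start) and, more narrowly, Estes (about 103), while the career-total window alone keeps all ten.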

I do like the running window chart that Keith runs to show how
workload has some correlation to injury, but I would have liked to see
a similar chart for pitchers ordered by pitches per start and other
variables as well.

Conclusions

 

I hope that Keith Woolner will make much of his data available to
the general public.  For instance, I would like to know why less than
30% of pitchers in the study have an above average ratio of
PAP3 to career pitches.  I’m guessing that Randy Johnson is
such an outlier that he alone is skewing the mean far higher than the
median.  The median would be a far more robust measurement to use on
page 513.  If the data were available (for instance, name, GS,
Pitches, PAP3, Injury), readers could test a number of the
conclusions on their own.
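
To illustrate how a single outlier can leave well under half of a group "above average", here is a small sketch. The ten ratios are invented for the example and are not taken from the book or from any real pitcher; only the shape of the distribution matters.

```python
from statistics import mean, median

# Invented PAP3-to-pitches ratios (arbitrary units) for ten hypothetical pitchers.
# The last value plays the role of a single extreme outlier.
ratios = [2, 4, 5, 7, 9, 11, 13, 16, 20, 400]


def share_above(values, cutoff):
    """Fraction of values strictly greater than the cutoff."""
    return sum(v > cutoff for v in values) / len(values)


avg = mean(ratios)
mid = median(ratios)
print(f"mean = {avg}, share above the mean = {share_above(ratios, avg):.0%}")
print(f"median = {mid}, share above the median = {share_above(ratios, mid):.0%}")
```

In this made-up sample only one pitcher in ten sits above the mean, while half sit above the median, which is the kind of pattern the raw data would confirm or rule out.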

While I applaud their effort and I hope that they will continue to
produce more research on this topic, I feel that Rany and Keith need
to address some of the concerns presented here before we can accept
PAP3 as a valuable tool.  And I also hope that others will
refrain from praising PAP3 until these serious questions
have been resolved.

My purpose here has been to give a balanced, well-reasoned critique
of the theory behind PAP3.  Again, it is not my intent to
attack anyone in any way, but to push forward our understanding of the
many dynamics at work when we consider pitch counts.  I hope that
Keith and Rany will comment on this work and that we can take part in
a constructive dialogue and develop a measurement that balances the
concerns of teams and fans who want to win games and players who want
to enjoy a long career in baseball.

Appendix

 

I’ve included a table of starts in a couple of formats from
1994-2000, which contain number of pitches, groundball-flyball,
batters faced and other goodies, in my data section
(http://www.baseball-reference.com/data/).

I’ve done some small studies myself on pitcher usage.  I’m
continuing to work on this area and hope to have some more information
in the future on our new webzine, Baseball Primer.

The first version of this article (and I’d like to hear if you feel it
was over the line):
http://www.baseball-reference.com/otb/pitcher_usage_old.php

Baseball Prospectus website

http://www.bigbadbaseball.com/statofday/sotd_20000611.html

http://www.bigbadbaseball.com/statofday/sotd_20000426.html

http://www.bigbadbaseball.com/statofday/sotd_19990808.html

http://www.bigbadbaseball.com/articles/forman_19990913.html

Sean Forman Posted: March 07, 2001 at 06:00 AM

Reader Comments and Retorts

   1. Craig Calcaterra Posted: March 13, 2001 at 01:00 AM (#603426)
Sean,

I read both the new and old versions of the article. For what it's worth, I don't think that your comments are any more "over the line" than what Neyer typically has to say about journalists or baseball people. That said, Neyer is in a slightly different field than you when he writes his comments. Neyer's ESPN column puts him in a commentator/wonk role, while your peer review efforts place you more in the role of a hard scientist. Yes, this is slightly a matter of semantics, but a real distinction does exist in my mind. For the purposes of your PAP piece, you are really like the physicist reviewing a colleague's work for a scholarly journal, whereas Neyer is like the technology reporter for the New York Times who digests the work of the scientist (Woolner) and provides context and commentary for lay people like me. This is not to say that Neyer is less of a scientist or you are less of a columnist. Indeed, the science reporters are usually every bit as able as the people doing the primary research. You are, however, providing different functions.

This distinction is important, I think, because keeping it in mind will help sabermetrics gain respect as a "real" science. A science where peer review actually means something other than simply criticizing one's peers. Sure, physics and genetics have their share of personal squabbles, nitpicking, and accusations embedded in the peer review process, but as a relatively new field, the science of baseball is going to be held to a higher standard than physics. Even if it is not, wouldn't it be great if, as it matures, the science of baseball finds itself to be a more congenial discipline? One that is immune from the pettiness that often characterizes the other, more established fields?

In closing, I think it is fair for BP's methods to be questioned, and the fact that it has the dual intent of selling books and providing good analysis is important to consider. I just think that such broader criticisms are better suited to an opinion piece than a piece containing valuable critical analysis.

Keep up the good work,

Craig Calcaterra

   2. Sean Forman Posted: March 14, 2001 at 01:00 AM (#603442)
John,

Thanks for the comments and thanks to everyone else as well. The point
I'm trying to make is that I don't believe they've shown that
PAP measures abuse. We know repeated high pitch counts are bad, but
why do we need this formula when we have pitch counts already? The
formula implies that anything above 100 is bad and that the damage
accumulates at an ever quickening rate. These are assumptions that
must be proven, and I would argue that they haven't been.

   3. Michael Posted: April 11, 2001 at 12:03 AM (#603639)
I enjoyed your article, Sean.

I don't know whether the first version (apparently not available now) was out of line, but it can't be any more out of line than the 1999 & 2000 Big Bad Baseball Annuals were in making attacks that included name-calling. They seemed to disagree with PAP merely because someone else thought of it first and then posted color-coded graphs on their website of rolling 3-start pitch counts, again for no more obvious reason than they thought it made intuitive sense (or at least that's my recollection of it). Not that you implied otherwise, but it'd be nice to acknowledge that others throw out sabermetric measures without first backing them with research.

Now that we've progressed to the point where we can have well-written, well-researched peer review, maybe we can have some real research. It's incredible with the data now available and seemingly a fair number of interested researchers that we still are relying on things like small-scale studies Bill James did in the 1980s or Craig Wright's book published in the 1980s before the necessary data was widely available to support some of our most basic sabermetric tenets.
