Baseball for the Thinking Fan

Primate Studies
— Where BTF's Members Investigate the Grand Old Game

Monday, August 19, 2002

Win Values: A New Method to Evaluate Starting Pitchers

Your opinion of starting pitchers will not be the same.



I have developed a new system to evaluate the contribution a starting pitcher makes to his team. Let me contrast my approach with the one reflected in a pitcher’s Wins Above Average (WAA) figure. WAA takes a pitcher’s ERA for a season and the number of innings he pitched and, together with information on the pitcher’s home park, derives the number of runs the pitcher saved his team over and above what a league-average pitcher would have allowed. The final step, converting these saved runs into wins, is done by estimating how many additional runs, on average, lead to one additional team win.
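As a rough illustration of the WAA pipeline just described, here is a minimal sketch. It assumes a simple multiplicative park factor and a flat runs-per-win constant of 10, a common rule of thumb. Both are illustrative assumptions, not the exact inputs of any published WAA figure, and the function names are hypothetical.

```python
def runs_saved(era: float, innings: float, league_era: float,
               park_factor: float = 1.0) -> float:
    """Runs saved relative to a league-average pitcher over the same innings.

    Assumption: park_factor > 1.0 marks a hitter-friendly park, which raises
    the runs an average pitcher would be expected to allow there.
    """
    expected = league_era * park_factor * innings / 9.0  # average pitcher
    actual = era * innings / 9.0                         # this pitcher
    return expected - actual


def wins_above_average(era: float, innings: float, league_era: float,
                       park_factor: float = 1.0,
                       runs_per_win: float = 10.0) -> float:
    """Convert runs saved into wins using a rule-of-thumb runs-per-win constant."""
    return runs_saved(era, innings, league_era, park_factor) / runs_per_win


# A 3.00 ERA over 225 innings in a 4.50-ERA league, neutral park:
# 112.5 expected runs - 75 actual runs = 37.5 runs saved, or 3.75 wins.
print(round(wins_above_average(3.00, 225, 4.50), 2))
```

Every quantity here is a season total, which is exactly the limitation the game-by-game approach below is meant to address.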

WAA is a very compelling stat and has been the backbone of pitcher evaluations for many years. My system takes into account two additional factors in evaluating a starting pitcher. First, I evaluate a starting pitcher’s contributions on a game-by-game basis rather than simply evaluating his end-of-season stats. After all, an average such as ERA can obscure as well as reveal information. A team whose starting pitcher gives up 0, 1, and 17 runs in three starts is likely to win more games than if the pitcher gives up 6 runs in each game, even though the average runs allowed is the same in the two cases.
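The 0/1/17 versus 6/6/6 example works because win probability is not a linear function of runs allowed. The sketch below makes this concrete under a loudly stated assumption: the team’s own runs scored follow a Poisson distribution with mean 4.5. Real run distributions are more spread out than Poisson. Tied outcomes are counted as half a win, which is a crude stand-in for extra innings.

```python
import math

# Assumption (illustration only): team runs scored ~ Poisson(4.5).
MEAN_RUNS = 4.5


def poisson_pmf(k: int, mean: float = MEAN_RUNS) -> float:
    return math.exp(-mean) * mean ** k / math.factorial(k)


def win_prob(runs_allowed: int, mean: float = MEAN_RUNS) -> float:
    """P(team wins | it allows `runs_allowed`), counting a tie as half a win."""
    p_outscore = 1.0 - sum(poisson_pmf(k, mean) for k in range(runs_allowed + 1))
    p_tie = poisson_pmf(runs_allowed, mean)
    return p_outscore + 0.5 * p_tie


def expected_wins(starts):
    """Sum of per-game win probabilities over a list of runs-allowed totals."""
    return sum(win_prob(r) for r in starts)


uneven = expected_wins([0, 1, 17])
even = expected_wins([6, 6, 6])
print(f"0, 1, 17 -> {uneven:.2f} expected wins")
print(f"6, 6, 6  -> {even:.2f} expected wins")
```

Under these assumptions, the uneven pattern yields about 1.96 expected wins and the even pattern about 0.70. The two shutout-quality starts nearly guarantee wins, and the 17-run disaster costs only one game, whereas each 6-run start loses more often than it wins.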

Evaluation schemes for hitters are almost always performed using seasonal data rather than game-by-game (play-by-play) data. The reason is that hitters come to the plate 600-700 times over the course of a season, a sample large enough for things to generally “even out.” Evaluations of hitters based on seasonal stats are quite consistent with more detailed evaluations based on play-by-play data. Therefore, it is not worth the extra effort to use more detailed evaluation methods for hitters.

Pitchers, on the other hand, start only about 30 games a season in today’s era, and 30 games is not enough for things to “even out” over a season. We will see that, unlike with hitters, the evaluation of pitchers using game-by-game data can differ significantly from the evaluation using seasonal stats.

Doing the evaluation on a game-by-game basis requires a great deal of detailed data as well as an entirely new set of machinery. The rules of thumb that apply to seasonal averages (such as the number of runs needed for an additional win) no longer apply on a game-by-game basis. In addition, depending upon which elements of the game you include in the evaluation, probabilities may need to enter the fray. For example, if a starting pitcher gives up 3 runs in a game, how should we evaluate this outing? If you choose to abstract from his team’s actual offensive run support, you would estimate how often the pitcher’s team would have won a game in which it allowed 3 runs, based upon the league-average distribution of runs scored.[1] Clearly, the fewer runs allowed, the more likely the team would have won the game with average run support. While I think Michael Wolverton’s game-by-game Support-Neutral Win (SNW) system is an improvement over the seasonal WAA system, I don’t think he goes far enough.
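The “how often would the team have won allowing 3 runs” question can be answered directly from a league-wide table of runs scored per team-game. The sketch below uses made-up frequencies; a real analysis would tabulate them from game logs. It also treats the starter as responsible for the whole game and counts ties as half a win. These are simplifying assumptions, and the sketch is not Wolverton’s actual SNW procedure, which also accounts for the bullpen.

```python
# Hypothetical league-wide counts: number of team-games in which a team
# scored exactly k runs. Invented for illustration, not real data.
LEAGUE_RUNS_SCORED = {
    0: 120, 1: 260, 2: 360, 3: 400, 4: 390, 5: 340, 6: 270,
    7: 200, 8: 140, 9: 90, 10: 60, 11: 40, 12: 30,
}


def support_neutral_win_prob(runs_allowed: int, dist=LEAGUE_RUNS_SCORED) -> float:
    """Share of league team-games whose offense would have beaten `runs_allowed`.

    Simplifications: the starter pitches the whole game, and a tie is
    counted as half a win.
    """
    total = sum(dist.values())
    wins = sum(n for runs, n in dist.items() if runs > runs_allowed)
    ties = dist.get(runs_allowed, 0)
    return (wins + 0.5 * ties) / total


for allowed in (0, 3, 6):
    print(allowed, round(support_neutral_win_prob(allowed), 3))
```

The numbers fall as runs allowed rise, which is the monotone relationship the paragraph above describes. Note that the pitcher’s own team’s actual scoring never enters the calculation.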

Second, my system takes into account how many runs the pitcher’s own team actually scored in the game. Neither WAA nor SNW takes a pitcher’s run support into account; both purposefully abstract from run support so as to evaluate a pitcher solely on what he has control over.

While this sentiment is laudable, it does not necessarily lead to the most accurate evaluation of a pitcher’s actual contribution to his team’s actual winning of baseball games. One or two examples will suffice. A pitcher who gives up 2 runs in a 3-2 win contributed significantly more to his team’s winning the game than a pitcher who gives up 2 runs in a 14-2 win. In the first game, the team that scored only 3 runs could easily have lost the game with league-average pitching, whereas in the second game the team that scored 14 runs would very likely have won the game even with league-average pitching.[2]

Consider the flip side of the coin. Suppose a team loses a game 12-0. The starting pitcher should not shoulder a large portion of the blame for losing the game, despite giving up 12 runs. Even with league-average pitching (say, allowing 5 runs), the team would not have come close to winning, since it did not score any runs.
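One hypothetical way to put numbers on the 3-2, 14-2, and 12-0 examples is to credit the pitcher with the actual result minus the probability that the team would have won, given its actual runs scored, had a league-average pitcher been on the mound. This is an illustrative sketch, not the Win Values formula. It assumes that an average pitcher’s runs allowed follow a Poisson distribution with mean 4.5. It also assumes that the starter is responsible for every run allowed, and that a tie under the average-pitcher baseline is worth half a win.

```python
import math

# Assumption (illustration only): runs allowed by a league-average
# pitcher ~ Poisson(4.5).
AVG_RUNS_ALLOWED = 4.5


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean); returns 0.0 for k < 0."""
    return sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k + 1))


def baseline_win_prob(runs_scored: int, mean: float = AVG_RUNS_ALLOWED) -> float:
    """P(win) with the team's actual run support but average pitching.

    A tie under the baseline is counted as half a win.
    """
    p_fewer = poisson_cdf(runs_scored - 1, mean)
    p_tie = poisson_cdf(runs_scored, mean) - p_fewer
    return p_fewer + 0.5 * p_tie


def run_support_credit(runs_scored: int, runs_allowed: int) -> float:
    """Hypothetical credit: actual result (1 = win, 0 = loss) minus the baseline.

    Final scores cannot be tied, so the actual result is binary.
    """
    actual = 1.0 if runs_scored > runs_allowed else 0.0
    return actual - baseline_win_prob(runs_scored)


for scored, allowed in [(3, 2), (14, 2), (0, 12)]:
    print(f"{scored}-{allowed}: {run_support_credit(scored, allowed):+.3f}")
```

Under these assumptions, the 3-2 winner earns about +0.74 wins and the 14-2 winner earns almost nothing. The pitcher in the 12-0 loss is debited less than a hundredth of a win, because average pitching would almost never have won a game in which the offense scored no runs.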

Each of the evaluation methods described above (WAA, SNW, and my new Win Values stat) attempts to estimate how many extra games a pitcher’s team won due to his contributions over and above those of a league-average pitcher. Acknowledging that run support can affect the importance of a pitcher’s runs allowed seems a definite step in the right direction.

The confluence of personal computers, the internet, and the electronic availability of baseball data allows more accurate formulas to be developed. WAA uses a player’s seasonal data and is therefore necessarily a more general formula. SNW and Win Values both depend upon game-by-game data and are therefore more specific and more accurate in what they measure.

Stats such as WAA and SNW are good stats and are very good predictors of future success. The reason is that they abstract from the pitcher’s run support, which is notoriously variable from season to season. However, the very aspect that makes these stats good predictors (looking forward) is the reason they may not be very good descriptors (looking backward). Only by considering a team’s run support can you accurately evaluate a pitcher’s actual contribution to his team actually winning the game.

Win Values is the only stat that properly integrates run prevention information with win-loss information. Win Values attempts to reflect the strengths of both types of information in a single stat. By considering what actually happened in each game, Win Values is a very good descriptive stat. When I look in a Baseball Encyclopedia and see that Sandy Koufax is deemed to have contributed 6.0 wins to the 1966 Dodgers, I want that figure to be the best possible estimate. I have designed Win Values to be the best possible estimate.

Part 1: Introduction
Part 2: Conceptual Framework
Part 3: High-Level Results
Part 4: Formulas
Part 5: Empirical Data for AL 2000
Part 6: Example: David Wells in AL 2000
Part 7: Yearly Results for 1978-2001
Part 8: Top Stars
Part 9: Concluding Remarks

[1] This is essentially what Michael Wolverton does in his Support-Neutral Wins. I should also say that my system is similar to Doug Drinen’s Win Probability Added stat that appeared in the Big Bad Baseball Annual.

[2] After all, a starting pitcher is often told to “keep his team in the game” or to “give his team a chance to win.”


Rob Wood Posted: August 19, 2002 at 06:00 AM | 13 comment(s)

Reader Comments and Retorts


   1. tangotiger Posted: August 19, 2002 at 12:41 AM (#605889)


Good stuff.  As a proponent of “value-added” metrics, be it run-added, or win-added, I am all in favor of approaches that try to adhere to that.

While I have not read Doug Drinen’s work, I get the feeling that what he did, what you do, and what I do are all driven by the same approach first brought forth by the Mills brothers. That is, try to figure out the chances of winning at some state in the game without your contribution, the chances of winning with your contribution, and give the difference to the player or event that caused the difference.

There is much to comment on, so I’d like to just offer the first piece of information: runs allowed. If I understand your process, you are treating runs allowed as entirely within the control of the pitcher, and therefore the “win added” value is based entirely on that figure.

However, how do you account for the fielders?  I would think a DIPS approach would be in order.

My approach to win-added processes is to always ask the question: “what would an average pitcher have done, given these conditions?” However, the only conditions you have accounted for are park and run support. Fielding support should also play a role. (Not that it’ll change things much.)

   2. Rob Wood Posted: August 19, 2002 at 12:41 AM (#605891)

First off, I’d like to thank Jim Furtado for posting my article (and spending endless hours painstakingly converting my article into HTML).

Tango, yes you are right.  The win values system (at least in its current form) treats all runs as the responsibility of the pitcher.  This follows in the tradition of ERA, WAA, SNW, WPA, etc.

I can think of two ways to incorporate DIPS-type thinking into the win value system.  First, we could adjust the win values numbers (outside the system) to account for the general contribution of the defense to run prevention.  This could conceivably be done by taking into account the pitcher’s strikeout proclivity, among other things.  Second, we could adjust the win value numbers (perhaps inside the system) to account for the specific defense behind the pitcher that day.  This second approach seems very difficult insofar as there just aren’t enough games in a season to form a rich enough dataset at this level of detail.

In any event, I welcome any comments along these lines (or others) as to how the win values system could be improved.

   3. Michael Humphreys Posted: August 19, 2002 at 12:41 AM (#605894)

Excellent article; conceptually elegant, clearly and gracefully written, well-organized. As I understand it, you’ve developed the ultimate tool for reconciling ERA, W-L records and, most importantly, the perceived ability on the part of certain pitchers to pitch “just well enough to lose” (or, conversely, just well enough to win). A few questions and comments.

First, as Tangotiger mentions and as you’ve acknowledged, it would be helpful if we could somehow incorporate fielding in the model. You’re right that per-game fielding data would be too complicated to develop. On a seasonal or career basis, have you done any regressions of pure pitching statistics, such as strikeout, walk or homerun rates, onto Win Value performance? That might indicate the general combination of pitcher skills that most strongly correlates with the ability to keep a team in the game, or even allow us to back into an estimate of a pitcher’s Win Value independent of fielding defense.

Second, my sense is that pure pitching stats have an even greater impact on Win Value performance than they do on ERA performance. In other words, the impact of fielders is slightly lower in determining the extent to which a pitcher can keep his team in the game. In a close game, the ability to bear down and get the strikeout or avoid the walk or homerun seems to be much more in the pitcher’s control than the ability to make a good fielding play is in the control of even a good fielding team. Batted-balls-in-play just seem to be randomness generators. The actual Win Value impact of the real ability of a pitcher to generate a ground ball that might be turned into an inning-ending double play is more dependent on luck, even controlling for the actual ability of his fielders, and therefore by definition less reliable.

Third, your methodology would be particularly useful, I think, in evaluating pitchers in the dead-ball era, when there were very few homeruns. In that environment, the risk of giving up a big lead *quickly* was vanishingly low, so a pitcher whose team had a big lead could “relax” and save his stuff until such time as the lead shrank. In other words, the really valuable part of your system, the measurement of the extent to which a pitcher is pitching just well enough to lose or win, would be, I think, much more significant in the dead-ball era, when pure pitching events such as strikeouts, walks and homeruns constituted a much smaller part of the game, in the *aggregate*, than they have since then. I think what we would find is that the great pitchers of the dead-ball era could reliably generate strikeouts at a higher rate when the game was close or on the line.

Of course, this raises the issue of whether we should rate dead ball pitchers alongside modern pitchers.  In Bill James’ latest Abstract, pitching ratings seem to decline inexorably over time, and the best ratings are from the deadball era, as Mr. James himself notes.  My sense is that it is difficult to determine whether or to what extent the pitchers who mastered the art of saving their stuff would be able to succeed in modern baseball, in which the game is much more frequently on the line.

Now that I think about it, we may have stumbled upon the reason that pitchers apparently *had* more of an impact in the dead ball era than they seem to today, including under the Win Shares system.  They *only* “entered” the game when the game was on the line!

Fourth, picking up on the point that pitchers of today seem to have less impact . . . the number of “wins” above league average under your system, which is probably the best system yet devised that does not take into account fielding, indicates that pitchers of today as a group and over the long haul just don’t have the same individual impact as the best everyday players.  Bill James in his Abstract and in Win Shares expresses concern that objective criteria seem to result in pitcher evaluations that suggest that pitchers are less important than we’ve traditionally thought.  Your study provides further evidence that that is simply the truth.

Fifth, I would imagine that pitchers who completely fail to adjust their game for the game situation would not do so well under your system.  Nolan Ryan tried to strike out every batter no matter who he was or what lead Nolan’s team had.  (I think that Nolan pitched during the time-period of your study, and I’ll go look for him.)  In contrast, Bob Feller seemed to have better W-L records than one would guess from his walk and ERA stats.

Sixth, did you calculate Win Values for all starting pitchers during the time period of your study?  Although you explicitly note that replacement value would not be addressed in your article, it would be a fascinating thing to know the exact distribution of Win Values so that we could begin to address the replacement value issue.

A beautiful article proposing a useful new system.  Thanks.

   4. Rob Wood Posted: August 19, 2002 at 12:41 AM (#605895)

Michael, thanks for the kind words. You raise several good issues that I’ll need some time to digest. However, I can comment on a few things. First, win values are indeed calculated for every pitcher who started at least one game in every league-season for which I have the required detailed data (the posted article covers 1978-2001). If anyone is interested, email me and I will send you the raw data.

Second, Retrosheet has recently released the detailed play-by-play data for additional seasons (1969, 1974-1977).  I am in the process of extending my analysis to those seasons, and pitchers of that era.  So I’ll have more to say about Nolan Ryan in my next installment. 

Third, I will have more to say about the relative to league average approach (which I take) versus the relative to replacement level approach (which I also like) in my next installment.  Suffice it here to say that I have found a way to go back and forth between the two, and will present results using both baselines in my next article.

Finally, regarding the importance of pitchers vs position players (leaving aside for the moment the allocation of win values to fielding), I was under the impression that some of these win value figures are pretty large.  Four times has a pitcher contributed more than 8 wins above average since 1978.  Isn’t this a large number, perhaps even comparable to the best position players?

   5. Michael Humphreys Posted: August 19, 2002 at 12:41 AM (#605897)


Thanks for your response. Regarding the last point, it is true that occasionally a pitcher will have an eight-wins-above-average year. But it seems that once you get past those outliers, the values are much more likely to be bunched around two or three wins, whereas there generally seem to be more everyday players who can put together Total Baseball Total Player Ratings of three, four, five or six wins above average. I haven’t really checked this, so maybe I’m off base. Even if I’m correct, I don’t think that is any negative reflection on your model. Most systems, including Bill James’ Win Shares system and Total Baseball Pitcher Index, seem to get to approximately the same result. And on a career basis, rather than a single-season basis, the pattern, unless I’m seeing things, is even more pronounced.

This is where replacement level analysis could have a big impact. Bill James believes, and I tend to agree with him, that the worst “regular” pitchers are significantly worse than the worst everyday players: the talent distribution of pitchers and non-pitchers is not the same. Or maybe it’s just that there is theoretically no limit to how many runs a bad pitching staff can give up, whereas the worst possible lineup cannot score fewer than zero runs. So if one derives replacement level ratings, pitchers may re-emerge with higher ratings that are closer to those for everyday players.

Thanks again.

   6. MattB Posted: August 20, 2002 at 12:41 AM (#605907)


Still processing the content.

Section 7, leaders from 1978-2001 begins in 1980.  There are three years missing.  (Wanted to see where Steve Stone ranked.)


   7. Rob Wood Posted: August 20, 2002 at 12:41 AM (#605908)

MattB, thanks for pointing out the missing 1978-1980 results.  I have asked Jim Furtado to look into posting those tables.

Since I have the tables in front of me, I can give you some information on Steve Stone in 1980.  Of course this was the year he went 25-7 (with a good, not great, 123 ERA+) and won the Cy Young award.  Mike Norris led the AL in win values (5.47), followed by Britt Burns (4.05), and then comes Stoney (3.42).

Again, sorry for not having all the tables posted properly.  Hopefully, Jim will get them posted soon.

   8. tangotiger Posted: August 20, 2002 at 12:41 AM (#605912)

Just because there is no win to distribute doesn’t mean that everyone gets 0 wins, à la Win Shares.

There are -.500 wins to distribute.  The pitcher can easily get +.5 wins, while the rest of the team gets -1.0 wins for their dismal performance.  Overall, that gives this team -.5 wins.

   9. Rob Wood Posted: August 21, 2002 at 12:42 AM (#605921)

Catching up on the latest questions.  David Smyth raises an interesting point.  I see where you are coming from, but I don’t think anyone wants to go that far.  There are zillions of reasons why we should not evaluate players based solely on how they performed in games their teams won (even pitchers).  TangoTiger’s reply sums up the best response.  In a 1-0 loss, say, the hitters significantly underperformed their norm so that they should be debited; the 1-0 hard-luck losing pitcher should actually be credited for his performance.

RossyW asks about Jack Morris and pitchers whose win values outperform their WAA.  Honestly, I have not done enough research into win values to form conclusions at this point.  I do believe that pitchers *have the opportunity* to change their approach based upon the score, the hitter, the inning, the base-out situation, etc.  Thus, I think it is an issue worth keeping an open mind about.  In any event, Jack Morris’ teams received greater “value” from his performances than would be reflected solely by his WAA figures.

   10. Rob Wood Posted: August 21, 2002 at 12:42 AM (#605943)

David, I don’t mean to be difficult, but I do think it appropriate to say that win values are “value” measures. That’s exactly what they are intended to capture. Think of a similar value-added approach to measuring a hitter’s contributions. You could (and people have) track the improvement that a player makes to his team’s chance of winning the game in each plate appearance by comparing the win probability before and after his PA. Sum these over each PA in a season. I would say that these are straight value measures, even though the system would credit him with “value” even in games the team wound up losing.
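The plate-appearance bookkeeping described above amounts to a sum of win-probability deltas. Here is a minimal sketch. The before and after probabilities are assumed to come from some win expectancy table that is not shown, and the plate appearances and player labels below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlateAppearance:
    batter: str
    wp_before: float  # batting team's win probability before the PA (assumed given)
    wp_after: float   # batting team's win probability after the PA (assumed given)


def win_probability_added(pas, batter: str) -> float:
    """Sum of win-probability changes over one batter's plate appearances."""
    return sum(pa.wp_after - pa.wp_before for pa in pas if pa.batter == batter)


# Hypothetical plate appearances; probabilities invented for illustration.
season = [
    PlateAppearance("Batter A", 0.42, 0.51),  # e.g. a solo home run: +0.09
    PlateAppearance("Batter A", 0.60, 0.55),  # e.g. a strikeout: -0.05
    PlateAppearance("Batter B", 0.30, 0.35),
]
print(round(win_probability_added(season, "Batter A"), 2))
```

Nothing in the sum depends on whether the team eventually won. That is why such a measure can credit a player with positive value in a loss.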

Same is true of Bill James’ win share system.  He allocates the team’s wins in a season among its players based upon their performances.  Of course, some performances in losing causes contribute to positive win values, as is appropriate.

I am guessing that you are raising a semantic point about the meaning of “value.” However, the strict definition that you seem to suggest is both theoretically and empirically unsound. That definition would be consistent with many dumb broadcasters and inconsistent with most serious analysis.

Hopefully, you’ll agree with the following. Win values sits as far toward the value end of the value-ability spectrum as it is reasonable to go.
In any event, thanks for your comments.

   11. tangotiger Posted: August 22, 2002 at 12:42 AM (#605950)

Value can be defined in at least two ways:
1 - Value is the sum of the incremental change in the theoretical win expectancy as a result of the performance of the player (i.e., Expos have a .42 chance of winning a game at some point, Vlad hits a solo HR, and now the Expos have a .51 chance of winning, giving Vlad +.09 wins)

2 - Value is generated only by the team winning the final unit of something, like a game.  How that value is portioned out is another topic of discussion.

As for David’s position: while he does not necessarily advocate the second definition, he seems to be saying that true value can only be represented by #2, and that while #1 is a valid thing to compute, it is not true value by his reckoning but some blend of value and ability.

So, to continue on this train of thought, with true ability at one end and true value at the other, there are different shades of how ability and value can be incorporated in a measure of something.

Perhaps #1 above is “90% value and 10% ability” (or whatever), but it is not true value as defined by #2.

Perhaps it is semantics, but before discarding such a position, there is value in understanding and exploring such ideas.

   12. Rob Wood Posted: August 23, 2002 at 12:42 AM (#605965)

The 1978-1980 tables are now posted in section 7.  Sorry for the mixup.

   13. Rob Wood Posted: August 23, 2002 at 12:42 AM (#605967)

Another reply to David Smyth.  You are raising an important and valid point.  I readily admit that.  And, yes, many of the debates over the years have rested upon semantic distinctions.  I also agree that the distinction between “ability” measures (I call these forward-looking) and “value” measures (I call these backward-looking) is important.

All I am saying is that at this point in the development of win values, not enough research has been undertaken to investigate how much of win values is actually “value” and how much is actually “ability”.  I guess one way to start to look into this is to compare the persistence of win values to that of WAA over time (for a given pitcher).  Or, maybe equivalently, look at how the difference of WAA and win values behaves over time (for a given pitcher).
