Baseball for the Thinking Fan

Primate Studies
— Where BTF's Members Investigate the Grand Old Game

Tuesday, February 04, 2003

More on the Modern Bullpen

Rob looks at some simulations based on recent reliever research.

I'd like to follow up on some recent work, most notably by TangoTiger, Bill James, Walt Davis, and Rany Jazayerli, into the way modern managers use their bullpens. The recent work has argued that managers under-utilize their closers by largely reserving them for ninth-inning save situations. The prototypical example is bringing in your best relief pitcher to save a game when leading by 3 runs in the ninth inning. This is a waste of resources, so the argument goes. Any relief pitcher could be expected to get three outs before the other team scores 3 runs.

 

In fact, some have argued that how managers utilize their bullpens represents the biggest impact of statistics on baseball: a negative impact! When the save statistic gained widespread popularity, managers began to use their bullpens in a way that maximized the total number of saves their closer got in a season, while the real objective should be to minimize the number of runs allowed given some constraints related to not over-using relief pitchers.

 

I ran a few simulations to look into this issue further.  I used the 2001 AL runs scored distribution (by half inning) as the baseline of my simulations.  In this league, teams averaged 4.86 runs per game, or about 0.54 runs per half inning.

 

To start simply, I suppose that all pitchers on each team are of league-average ability and give up runs according to the same underlying distribution as exhibited by the 2001 AL. For those interested in the details, in each half inning the pitcher gives up 0 runs with probability 70.53%, 1 run (15.89%), 2 runs (7.34%), 3 runs (3.41%), 4 runs (1.64%), 5 runs (0.70%), 6 runs (0.29%), 7 runs (0.12%), 8 runs (0.05%), 9 runs (0.02%), 10 runs (0.01%).

 

All pitchers, that is, save one, whom we will call the Stopper of the team's bullpen. The Stopper gives up exactly one half the league-average number of runs. I accomplished this by halving all the probabilities of giving up (positive) runs and adding the leftover probability to the likelihood of giving up zero runs. So in each half inning the Stopper gives up 0 runs with probability 85.265%, 1 run (7.945%), 2 runs (3.670%), 3 runs (1.705%), etc. Rather than giving up 4.86 runs per game, the Stopper gives up 2.43 runs per game, so has an ERA+ of 200.
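
The construction of the Stopper's distribution can be sketched in a few lines of Python. This is only an illustration of the halving step described above; the names LEAGUE, scale_runs, and mean_runs are made up for the example and are not from the original simulation. Because the published percentages are rounded, the league mean comes out at roughly 0.54 runs per half inning rather than matching 4.86 runs per game to the last digit.

```python
# Runs allowed per half inning by a league-average pitcher (2001 AL),
# as quoted in the article; the list index is the number of runs.
LEAGUE_PCT = [70.53, 15.89, 7.34, 3.41, 1.64, 0.70, 0.29, 0.12, 0.05, 0.02, 0.01]
LEAGUE = [p / 100 for p in LEAGUE_PCT]


def scale_runs(dist, factor):
    """Multiply every positive-run probability by `factor` and give the
    leftover probability to the zero-run outcome."""
    scaled = [p * factor for p in dist[1:]]
    return [1.0 - sum(scaled)] + scaled


def mean_runs(dist):
    """Expected runs allowed per half inning."""
    return sum(runs * p for runs, p in enumerate(dist))


# The Stopper: half the league-average run rate.
STOPPER = scale_runs(LEAGUE, 0.5)

for label, dist in (("league", LEAGUE), ("stopper", STOPPER)):
    per_half = mean_runs(dist)
    print(f"{label:8s} P(0 runs) = {dist[0]:.5f}, "
          f"{per_half:.3f} runs per half inning, {9 * per_half:.2f} per nine")
```

Halving the positive-run probabilities halves the mean exactly, which is why the Stopper's ERA+ works out to 200 under this construction.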

 

My simulations were performed at the granularity of a half-inning, not specific base-out-inning situations.  So managers in my simulated baseball must decide to bring in the Stopper at the beginning of an inning, not after a rally is under way.  I think this is a reasonable assumption for these preliminary simulations.  In fact, this restriction could be considered to be a reflection of the time it takes a closer to warm up in the bullpen (and not wanting a stopper to warm up and then not come into the game).  All simulations were performed over more than 1,000,000 games so the standard errors of the reported win percentages are less than .001.
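
The standard-error claim is easy to check. The sketch below assumes each simulated game is an independent win/loss trial, so the win percentage has the usual binomial standard error; the function name is hypothetical.

```python
import math


def win_pct_standard_error(win_pct, n_games):
    """Binomial standard error of an observed winning percentage,
    assuming independent games."""
    return math.sqrt(win_pct * (1.0 - win_pct) / n_games)


se = win_pct_standard_error(0.5, 1_000_000)
print(f"standard error of win pct: {se:.4f}")
print(f"in games per 162-game season: {162 * se:.3f}")
```

With 1,000,000 games the standard error is 0.0005 in win percentage, or less than a tenth of a game over a 162-game season, so differences of a full game or more between cases are well outside simulation noise.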

 

Let's get to the results. I have evaluated five different bullpen usage strategies in this toy world. Only one team in the league has a Stopper in these preliminary simulations, and, of course, I will report the winning percentage of that team. In all cases, the Stopper never pitches the tenth or subsequent innings.

 

Case 1: Stopper pitches 9th inning if save situation (lead of 1, 2, or 3 runs). Win Pct = .513 or +2.17 games (that is, 2.17 games better than .500 in a 162-game season); stopper pitches 42 innings in a typical 162-game season.

 

Case 2: Stopper pitches 9th inning if save situation or game is tied (lead of 0, 1, 2, or 3 runs).  Win Pct = .521 or +3.39 games; stopper pitches 58 innings per season.

 

Case 3: Stopper pitches 8th and 9th innings if a save situation (lead of 1, 2, or 3 runs).  Win Pct = .525 or +4.09 games; stopper pitches 88 innings per season.

 

Case 4: Stopper pitches 7th, 8th, and 9th innings (two innings max) if game is within one run (lead of -1, 0, or 1 run). Win Pct = .539 or +6.33 games; stopper pitches 142 innings per season. I will comment on this workload below.

 

Case 5: Stopper pitches 7th inning if a 1-run lead, and the 8th and 9th innings if tied or a 1-run lead (two innings max). Win Pct = .533 or +5.35 games; stopper pitches 92 innings per season. This last scenario was an attempt to pare some of the innings from Case 4 in order to bring the stopper's workload down to a level similar to Case 3. Most people believe that a stopper can pitch effectively for 90 or so innings per season, but that at 140 innings his effectiveness would suffer.
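
For readers who want to experiment, here is a minimal Monte Carlo sketch of the half-inning approach for Cases 1 and 2 only. It is not the original simulation code. It assumes the Stopper's team alternates home and road games, that extra innings are played out with league-average pitchers, and that every opposing pitcher is league average. All function and variable names are invented for the example. The multi-inning cases are left out because they need further assumptions (for instance, whether a Stopper who sits out an inning may return) that the descriptions above do not settle.

```python
import random
from bisect import bisect
from itertools import accumulate

# Runs allowed per half inning by a league-average pitcher (index = runs).
LEAGUE = [0.7053, 0.1589, 0.0734, 0.0341, 0.0164, 0.0070,
          0.0029, 0.0012, 0.0005, 0.0002, 0.0001]
# Stopper: positive-run probabilities halved, remainder moved to zero runs.
STOPPER = [1 - sum(p / 2 for p in LEAGUE[1:])] + [p / 2 for p in LEAGUE[1:]]


def make_sampler(dist):
    """Return a function that draws a run total from `dist`."""
    cumulative = list(accumulate(dist))

    def draw(rng):
        # min() guards against floating-point round-off in the last bin.
        return min(bisect(cumulative, rng.random()), len(dist) - 1)

    return draw


DRAW_AVERAGE = make_sampler(LEAGUE)
DRAW_STOPPER = make_sampler(STOPPER)


# A strategy answers: does the Stopper pitch this inning, given our lead?
def no_stopper(inning, lead):
    return False


def case_1(inning, lead):
    return inning == 9 and 1 <= lead <= 3


def case_2(inning, lead):
    return inning == 9 and 0 <= lead <= 3


def play_game(rng, strategy, home):
    """Play one game; return (Stopper's team won, Stopper innings pitched)."""
    lead = 0    # our runs minus the opponent's runs
    used = 0    # innings pitched by the Stopper
    inning = 1
    while True:
        for bottom in (False, True):
            home_lead = lead if home else -lead
            if bottom and inning >= 9 and home_lead > 0:
                break  # home team is ahead, so the bottom half is not played
            if bottom != home:  # we pitch: top half at home, bottom on the road
                if inning <= 9 and strategy(inning, lead):
                    lead -= DRAW_STOPPER(rng)
                    used += 1
                else:
                    lead -= DRAW_AVERAGE(rng)
            else:               # we bat against a league-average pitcher
                lead += DRAW_AVERAGE(rng)
        if inning >= 9 and lead != 0:
            return lead > 0, used
        inning += 1


def simulate(strategy, n_games, seed=2003):
    """Return (win pct, Stopper innings per 162 games)."""
    rng = random.Random(seed)
    wins = innings = 0
    for game in range(n_games):
        won, used = play_game(rng, strategy, home=(game % 2 == 0))
        wins += won
        innings += used
    return wins / n_games, 162 * innings / n_games


N_GAMES = 60_000  # far fewer than the article's 1,000,000, to keep this quick
RESULTS = {name: simulate(strategy, N_GAMES)
           for name, strategy in [("no stopper", no_stopper),
                                  ("case 1", case_1),
                                  ("case 2", case_2)]}
for name, (win_pct, innings) in RESULTS.items():
    print(f"{name:10s}  win pct {win_pct:.3f}  stopper IP/162 {innings:5.1f}")
```

With only 60,000 games the standard error is about .002, so the output should be read as a rough check on the direction and size of the effects rather than a reproduction of the figures reported above.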

 

Comparing Case 1 to the other cases strongly suggests that stoppers should be used beyond 1-inning save situations.  Comparing case 1 to case 5 indicates that 3 or more victories per season may be available to teams who recharacterize the way they use their best relief pitcher.
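
The comparisons between cases are simple arithmetic on the reported figures. The sketch below tabulates them; the helper names are made up for the example. Note that the three-digit winning percentages are rounded, so converting them back to games (for example .513 gives about +2.11) only approximately reproduces the games figures quoted for each case.

```python
SEASON_GAMES = 162

# Reported results per case: (games above .500, Stopper innings per season).
CASES = {1: (2.17, 42), 2: (3.39, 58), 3: (4.09, 88), 4: (6.33, 142), 5: (5.35, 92)}


def games_above_500(win_pct, games=SEASON_GAMES):
    """Convert a winning percentage into wins above a .500 record."""
    return (win_pct - 0.5) * games


def compare(case, baseline):
    """Extra wins and extra Stopper innings of `case` over `baseline`."""
    wins, innings = CASES[case]
    base_wins, base_innings = CASES[baseline]
    return wins - base_wins, innings - base_innings


for case, (wins, innings) in CASES.items():
    extra_wins, extra_innings = compare(case, baseline=1)
    print(f"Case {case}: {wins:+.2f} games in {innings:3d} IP "
          f"({wins / innings:.3f} per inning); "
          f"vs Case 1: {extra_wins:+.2f} games, {extra_innings:+d} IP")
```

On these figures Case 5 is worth 3.18 more games than Case 1 for 50 more innings, and 1.26 more games than Case 3 for only 4 more innings.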

 

Comparing Case 3 to Case 5 suggests that pitching your stopper in close games, including tie games and games when the team may even be trailing, can increase a team's win total by around 1-2 games per season.

 

To be honest, I have been a skeptic of Bill James' and TangoTiger's arguments along these lines. First, I did not believe that the potential benefit to expanding (recharacterizing) the stopper's role was significant. However, these simulation results seem to indicate that expanding the role and shifting innings from non-leveraged to leveraged can boost a team's win total by 2-3 games. And, remember, these simulations brought in the stopper at the beginning of each inning, whereas we know that the most leveraged situations are when runners are already on base, and one or two key outs are needed.

 

Second, I believe that the "costs" of making changes to the modern bullpen, though hard to pin down, are much more significant than others have suggested. I have always put a great deal of importance on having well-defined and fixed roles in bullpens. Some have argued that the modern closer's role has become too restrictive, and that other roles could be defined that are more effective (my cases give examples). Nevertheless I remain skeptical that a bullpen could flourish with such "fluid" roles in the real world.

 

There are other hidden costs of changing bullpen usage, most notably the issue of needing to warm up.  One advantage of the ninth inning save is that the stopper can mentally and physically prepare for coming into the game.  Having the stopper available to come into the game as early as the 7th inning may play havoc on his preparation.

 

In addition, he may have to get warmed up repeatedly in anticipation of coming into the game as early as the seventh inning. For example, suppose his team is down by two in the bottom of the sixth with two out and runners on second and third. He'd presumably have to start warming up since if the next batter were to get a hit, the stopper would be called on to enter the game in the top of the seventh. The point is that the stopper now has to start and stop, so to speak, warming up in the bullpen throughout the late innings of many games. It is unclear what effect this would have on his availability and his effectiveness.

 

We can observe that managers have been managing as if they are fearful of over-using their closers.  They prefer the sure-thing of having their ace pitch effectively for 60 or so innings rather than risk lowering his effectiveness over 100 or so innings.

 

In summary, my simulations seem to confirm what the advocates of an expanded role for the modern closer have been arguing.  The potential in additional team victories is significant.  However, there may be hidden costs, of unknown size, accompanying these changes. 

 

Areas for further research include: investigating the team effects of bullpen construction (e.g., how you allocate a given number of innings during a season among your varying quality relief pitchers); investigating these issues at the granularity of base-out-inning situations (since most of the highest leverage situations have runners on base), maybe even the specific batters due up; developing methods to get at the importance of fixed roles and the need to warm up (the "hidden" costs); the effect of pitching your stopper for more than one inning per appearance (both good and bad); etc.

 

Rob Wood Posted: February 04, 2003 at 06:00 AM | 19 comment(s)

Reader Comments and Retorts


   1. Charles Saeger Posted: February 04, 2003 at 02:29 AM (#608709)
I don't think I would call Case #5 all that fluid. It's defined, but not in the way we think of it as defined.

IAE, a role where your best reliever protects a 3-run lead for an inning is clearly the wrong role.
   2. bill Posted: February 04, 2003 at 02:29 AM (#608710)
Something else to think about is that I'm not sure teams aren't already "accidentally" doing this. You certainly can't argue that the Alfonsecas, Mesas, and Roberto Hernandezes of the world are really the best relievers on their teams, at least not in all cases.

Dotel in Houston would be a good case in point. Wagner is the "closer" but it sure looks like Dotel might be the "stopper".
   3. studes Posted: February 04, 2003 at 02:29 AM (#608711)
Very nice, straightforward analysis, Rob. Over the past couple of years, Bobby Valentine's constant rejuggling of his lineup drove fans (and, supposedly, players) crazy. Yet, he was the model of consistency in his use of Benitez. Under BV, Benitez perfectly fit the prototype of a ninth-inning-only closer.

Bobby Valentine is a pretty smart baseball guy, so you may have a point about the need for a closer to have a very fixed role.

However, there is another way to look at this: "flexible" closers are worth much more than "inflexible" ones. If a closer is able to handle close eighth-inning situations when called upon, his value goes up immensely (for a single player).

If I were a major league manager (God forbid!), I would think that this flexibility would be worth the gamble.

A question: Does this make the Koch for Foulke trade even more slanted toward the A's? Or am I wrong in my assumption about their use patterns?
   4. Charles Saeger Posted: February 04, 2003 at 02:29 AM (#608713)
Rob -- what were the W/L/Sv/BSv/Hd/GF numbers for each case?
   5. Kevin Harlow Posted: February 04, 2003 at 02:29 AM (#608714)
1) For games tied after 9 innings, did you continue the simulation or define the chances of winning to be 50%?

2) In your simulations, does the Stopper pitch in the bottom or top (or equal amounts of both) of each inning? I'm guessing just the bottom of the inning, based on Cases 1 & 2. Isn't Case 2 just Case 1 plus the Stopper pitching in the 9th with the score tied? The difference in the results is an extra 3.39-2.17=1.22 wins above average (WAA) over 58-42=16 G or IP. So if the Stopper is pitching only in the bottom of the 9th, his WAA would be about 16*[(.85265-.7053)/2]=1.1788. That's pretty close to your simulated 1.22 WAA.

3) Are the mean and standard deviation of your Stopper runs allowed (RA) distributions reasonable? Have you tried the simulations with different distributions of runs allowed? I would think that Stoppers would have a higher RA variance than average pitchers. Your assumed distribution for Stoppers has a standard deviation 3 times that of the average pitcher. Is this reflective of reality? Also, you've used an *ERA+ of 200. However, looking at the saves leaders from last season (2002) you have

*ERA+ Name
127 J Smoltz
192 E Gagne
147 M Williams
127 J Mesa
151 E Guardado
142 B Koch
172 R Nen
138 J Jimenez
226 T Percival
148 U Urbina
140 T Hoffman
105 K Escobar
166 K Sasaki
216 B Kim
170 B Wagner
172 A Benitez

, which averages to an *ERA+=159. Given that the *ERA+=159 includes partial innings of 2 and 1 outs, is your assumed *ERA+=200 reasonable for Stoppers pitching a full 3-out inning? It seems a bit high to me, which means the differences in your scenarios will be smaller than given in your analysis.

4) The "typical" current usage (case 2) results in the greatest leverage index (LI) of the cases presented, and likely near-maximal universal LI for a Stopper. However, the goal is not to maximize LI & Saves, the goal is to maximize wins, which is approximated by waa=(ip/9)*(LgERPG-.5*LgERPG)/(2*PF*LgERPG). Clearly case 4 maximizes wins but is quite unrealistic. Rob's case 5 attempts to bring the IP and appearances more in line with a typical number. However, as Rob points out, using your Stopper in the earlier innings will probably result in warming the Stopper up unnecessarily. Since there is likely a negative relation, not accounted for in the article, between times-warmed-up over the course of a season and effectiveness, cases 4 and 5 are probably not as attractive as they seem based on waa.

case = case number, given
IP = innings pitched, given
WAA = wins above average, given
waa/(9IP) = calculated qty, =waa/(ip/9)
waa/58G_est = calculated qty, = waa * (58/G_est)
G_est* = number of estimated games Stopper appeared in.
WAA_SP = wins above average for starting pitcher with *ERA+=200, = (ip/9)*(4.86-.5*4.86)/(2*1.0*4.86)
LI = leverage index = waa/WAA_SP

case   ip    waa   G_est  waa/(9IP)  waa/58G_est  WAA_SP    LI
   1   42   2.17    42.0       0.47         3.00    1.17  1.86
   2   58   3.39    58.0       0.53         3.39    1.61  2.10
   3   88   4.09    58.7       0.42         4.04    2.44  1.67
   4  142   6.33    85.2       0.40         4.31    3.94  1.60
   5   92   5.35    57.5       0.52         5.40    2.56  2.09

*G_est - specifically, I used the following equations, solving for G:
case 1: IP = 1 * G
case 2: IP = 1 * G
case 3: IP = 1/2 * G + 2* 1/2 * G
case 4: IP = 1/3 * G + 2 * 2/3 * G
case 5: IP = 0.4 * 1 * G + 0.4 * 2 *G + 0.2 * G *2
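
The derived columns of the table can be rechecked with a short script. This is a sketch that takes the formulas and the G_est equations exactly as given above; the innings-per-appearance weights are the commenter's assumptions, a neutral park factor of 1.0 is assumed, and the names used in the code are hypothetical.

```python
LG_RPG = 4.86   # league runs per game (LgERPG in the formula above)
PF = 1.0        # park factor, assumed neutral

# case: (Stopper IP, WAA, innings per appearance implied by the G_est equations)
CASES = {
    1: (42, 2.17, 1.0),
    2: (58, 3.39, 1.0),
    3: (88, 4.09, 0.5 + 2 * 0.5),
    4: (142, 6.33, 1 / 3 + 2 * 2 / 3),
    5: (92, 5.35, 0.4 * 1 + 0.4 * 2 + 0.2 * 2),
}


def derived_columns(ip, waa, ip_per_game):
    """Recompute the calculated columns of the table for one case."""
    g_est = ip / ip_per_game
    # Wins above average for an ERA+ 200 pitcher over the same innings.
    waa_sp = (ip / 9) * (LG_RPG - 0.5 * LG_RPG) / (2 * PF * LG_RPG)
    return {
        "G_est": g_est,
        "waa/(9IP)": waa / (ip / 9),
        "waa/58G_est": waa * 58 / g_est,
        "WAA_SP": waa_sp,
        "LI": waa / waa_sp,
    }


TABLE = {case: derived_columns(*row) for case, row in CASES.items()}
for case, cols in TABLE.items():
    print(case, "  ".join(f"{name}={value:.2f}" for name, value in cols.items()))
```

Since WAA_SP reduces to 0.25 wins per nine innings at any league run level, the LI column is just the simulated WAA divided by a quarter of a win per nine Stopper innings.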

Conclusion:
The current usage of Stoppers is close enough to optimal that experimentation with Stopper usage when you are a contender is unwarranted. More wins above average are *possible* if you think your Stopper can handle the extra innings and the extra unnecessary warmups. If you think your Stopper can handle cases 4 or 5 then you should go for it. However, there's little evidence that they can. On the other hand, there is plenty of evidence that Stoppers can handle the current usage, best exemplified by case 2. Beyond that type of usage is speculation. Furthermore, the differences (possibly reduced by #3 above) may not be as large as reported and would be easily rendered meaningless when compared to the psychology-physiology calculation error.
   6. Charles Saeger Posted: February 04, 2003 at 02:29 AM (#608715)
Oh, yeah, and games in relief.

He started the inning each time, right? If not, how often did he come in with runners on base in each case?
   7. Kevin Harlow Posted: February 05, 2003 at 02:29 AM (#608717)
I apologize for the table in part 4 not formatting correctly - unfortunately there is no preview when posting a comment to an article. The following table, which goes in my part 4 above, was tested out successfully in a preview for a clutch hits article. Hopefully it will now post here correctly:

case   ip    waa   G_est  waa/(9IP)  waa/58G_est  WAA_SP    LI
   1   42   2.17    42.0       0.47         3.00    1.17  1.86
   2   58   3.39    58.0       0.53         3.39    1.61  2.10
   3   88   4.09    58.7       0.42         4.04    2.44  1.67
   4  142   6.33    85.2       0.40         4.31    3.94  1.60
   5   92   5.35    57.5       0.52         5.40    2.56  2.09
   8. Rob Wood Posted: February 05, 2003 at 02:29 AM (#608718)
Thanks for the good comments. I definitely think that this issue (set of issues) is worthy of analysis and our attention. Anyway, here are some replies to people's questions.

I have always believed that some relievers have the mental make up to be closers and some do not. A real short memory is probably the crucial attribute.

I did not keep track of the stopper's stats that Charlie asked about (W/L/Sv/BSv/Hd/G/GF). And, yes, my stoppers began each inning, so never entered in the middle of an inning.

When a game went into extra innings, I played it out with the simulation. However, since I played so many games, I imagine that this is essentially calling it a 50% chance of winning each game.

Half the games were home games and half the games were road games. So, yes, Kevin raises a good point about the relative lack of save opportunities for a stopper at home.

Tango also wondered about the runs allowed distribution I used for the stopper. He has a neat program on his website that generates a representative runs distribution for any average. I used his program, which led to slightly different runs at the tails, and re-ran the simulations. The results still stand, though the win pcts were different by about .001 or .002 from what I reported above.

The ERA+ of 200 is the extreme case, though several relievers have accomplished that feat over the years. Think of it merely as a milestone figure and not taken to be representative of all stoppers.

Kevin, your table still didn't format properly. If it wouldn't be too much trouble, could you summarize the results for us.

Dave raises another interesting point. Right now stoppers are paid to rack up saves and they use the save stat in negotiations. It will not be easy to convince the stopper to give up tons of money "for the good of the team".

Thanks again.
   9. tangotiger Posted: February 05, 2003 at 02:29 AM (#608720)
I think the article itself, and all the comments that followed, are some of the best, most complete, and most straightforward things that I've read on this subject.

I think the hardest part to quantify is the effectiveness of a fireman's workload, with respect to warming up, and going 100+ innings in a season. As for reliever mindset, you just need to create more stats to appease them, like "fires put out" or some such. Every player (or even person) wants to be appreciated to the extent of their efforts, I'd guess.
   10. Mike Posted: February 05, 2003 at 02:29 AM (#608726)
This is a great piece of work, which sheds some much needed light on relief-pitcher usage. A few thoughts:

1) The analysis openly assumes that "stoppers" give up one-half the league average of runs. Most teams, even the crappy ones, have a pitcher or two who has a superficially pretty ERA and who, on the surface, looks like a stopper. But what creates the "stopper" halo - the relief pitcher's actual abilities or the manager's usage patterns? The closer will rarely if ever appear in a blow-out, while the middle guys will have to stay in, get fragged, and take a hit to their ERAs. The modern closer also rarely stays in more than an inning, so he can throw all the gas he wants during that inning without worrying about what he's going to throw after. Where I'm going with this is that if you take a fake "stopper" and start using him like the other relievers, his effectiveness is liable to dwindle accordingly and he may start looking just like the other relievers.

2) To make a sweeping assertion without any evidence to back it up, I posit that there are very few true stoppers out there. If you have a guy like Mariano Rivera or Goose Gossage, your model holds: use the guy in high leverage situations whenever they arise, rather than just the 9th inning. But what if you don't? If you're a Red Sox fan, do you want Urbina pitching in the eighth inning of a tied game?

3) Perhaps a better line of inquiry would be how to handle the pitching staff if you don't have a true stopper, since this is the case with the vast majority of teams. Maybe the 9th-inning-only guy model works (well, sort of - if you like having Urbina-induced heart attacks). Or maybe the Theo Bullpen Experiment of 2003 will shed new light on this matter.
   12. Marc Posted: February 05, 2003 at 02:29 AM (#608729)
This is great work, and certainly an area where some creative thinking is needed.

The most questionable assumption, as others have pointed out, is first that the stopper will have an ERA+ of 200, and second that he will maintain 200 regardless of which of the usage scenarios occurs.

How do the numbers change if the stopper is only at 175, or if he is at 200 unless used more than X number of innings per week or some other threshold and then he is at 150? Would that substantially change the relative value in the various scenarios, or merely reduce them all more or less equally?
   13. Rob Wood Posted: February 06, 2003 at 02:29 AM (#608732)
Good discussion. Two quick points. Managers and relievers probably have some idea of the impact of warming up: of warming up multiple times in a game, and of how much warming up and not coming in takes out of a reliever compared to coming into the game. Anyway, this information, if only anecdotal at this point, should be investigated so that the analytical community can utilize it for the greater good.

Second, regarding the relationship between a relief pitcher's IP and ERA, again experienced managers probably have a good idea. Clearly the ERA is going to eventually soar with more workload. Managers realize this since they don't use their best relievers every day and they limit their innings per appearance too. Of course, this is all related to the pitcher-abuse issue, but I don't know if PAP studies or articles have been applied to relievers.

Thanks again.
   14. tangotiger Posted: February 06, 2003 at 02:30 AM (#608736)
I think Dave is on the money here.
   15. TeddyA Posted: February 06, 2003 at 02:30 AM (#608737)
I agree that Dave is right; after all, the change in strategy leads to only a relatively small expected improvement in wins. A manager who uses a "fireman" strategy that wins 4 games that would have been losses with a "closer" strategy, but loses one game in which a third-tier reliever blows a ninth-inning lead, might still be perceived as a bonehead.

That's why I think it will be a GM run team (Oakland/Boston/Toronto) that institutes a change in bullpen Conventional Wisdom.

My thought on the optimal bullpen strategy is to use Rob's Case 4 but construct your roster so that you have 2 pretty good, cheap "stoppers" and have them split the 142 innings or so by alternating days. Then have 2 relievers and 2 fifth-starter/long relievers pitch the remaining relief innings, giving only 10 pitchers.

An overlooked aspect of bullpen management is that the roster spots that the variety of relief specialists occupy are highly valuable. A good pinch hitter is worth a few games a year minimum.
   16. Rob Wood Posted: February 06, 2003 at 02:30 AM (#608742)
I ran what David Smyth called Case 6. Stopper pitches only in the ninth inning when his team is up by 1 or 2, or if game is tied, or if his team is trailing by a run.

Results are very similar to case 2 above where stopper pitched only the ninth inning when his team is up by 1, 2, or 3 runs, or if game is tied.

In case 6 stopper pitches 62.5 innings per 162-games, and team win pct is .521.

I agree with David's (and others') point that the leverage insights are undoubtedly known to major league managers. But they have evolved the role of the stopper to its present form, largely a ninth-inning save situation role exclusively. David concludes from these two observations that the hidden costs of significantly changing the closer's role must be large.

Another explanation is that managers are not as bright as they should be, or that managers believe that the hidden costs are large (but they aren't), or that managers do not fully appreciate the added benefit of shifting the closer's innings around due to leverage.

As I said, when I started thinking about these issues, I was firmly in David's camp. But now I am not so sure.

In addition, didn't Casey Stengel often use his best pinch hitters and relief ace much earlier in the game than other managers? I distinctly remember stories about Phil Rizzuto being pinch hit for late in his career in the early innings (which Rizzuto hated with a passion). I may be remembering how he used Joe Page in 1949 during Page's best year. However, Page never was the same thereafter and some have blamed his "over-use" during 1949 as the cause.
   17. Mike Emeigh Posted: February 07, 2003 at 02:30 AM (#608745)
The closer will rarely if ever appear in a blow-out, while the middle guys will have to stay in, get fragged, and take a hit to their ERAs.

I wanted to note that this statement is untrue. Managers often use their designated closer in the last inning of a blowout game (either way) in order to keep the closer sharp, especially if he hasn't faced a save situation in the last couple of games. Most teams don't have 65 save situations a year, and reserving the closer exclusively for those situations on some teams might mean that the closer goes a week or more without pitching.

The standard usage pattern seems to be that the closer comes in to start the last inning (a) in a save situation, (b) whenever the team has a late lead of any size and the closer hasn't pitched in a few days, or (c) in a blowout loss when the closer hasn't pitched in a few days. The closer will also frequently pitch the 10th inning of an extra-inning tie game when his team is at home since the home team will no longer have a save situation for him.

-- MWE
   18. TeddyA Posted: February 07, 2003 at 02:30 AM (#608746)
Mike,

Although the closer is used in non-save situations, he is rarely used in situations that are likely to be disadvantageous to his personal stats: he is not overused (unless it's the 16th inning), he gets to pitch in blowouts to stay sharp, etc. Other relievers are used willy-nilly including having to eat up innings when they're getting shelled.

Despite the closer's advantages, most "elite" closers only have an ERA+ of 160.

I'm not sure it's worth manipulating your bullpen usage to maximize the value of most "stoppers."
   19. Rob Wood Posted: February 10, 2003 at 02:30 AM (#608753)
I think this answers David's question. If I make the stopper in Case 1 the ideal pitcher, giving up absolutely no runs in ninth-inning save situations, the team's win pct is .527. This is a lesser win pct than the high-leverage high-usage cases. So I guess this is another way of saying the same thing people have been saying.
