— Where BTF's Members Investigate the Grand Old Game
Monday, August 19, 2002
Win Values: A New Method to Evaluate Starting Pitchers - Part 2
I seek a system that evaluates the pitcher's runs allowed in a game in light of his team's offensive run support. An extreme view would be to "fix" the team's run support (RS) and perform a pair of operations. The first operation would be merely to see how many runs the pitcher in question allowed (RA), and then see if RA<RS (i.e., whether the team actually won or lost the game). The second operation would be to replace the pitcher in question with league average pitching, and see how often the team would win or lose. This second operation brings in probabilities since we would need to consider the probability that league average pitching allows 0 runs, 1 run, 2 runs, ..., 25 runs.
Under this extreme view, the pitcher's win contribution would be reflected in the difference between his team's actual win or loss (depending on whether RA<RS or RA>RS) and the probability that the team would have won with league average pitching. For illustrative purposes, suppose the pitcher gave up 2 runs and his team scored 4 runs. Of course his team won the game. Suppose we find that league average pitching would give up 3 or fewer runs 35% of the time. Then we could say that the pitcher's contribution is 0.65 wins, since his pitching turned a situation that was 35% likely to result in a win into a 100% win situation.
The astute reader will likely have several questions at this point. What about hypothetical tie games? What if the pitcher did not pitch a complete game? What about the effect of the ballpark? Isn't fixing the team's offensive support too "extreme"? I will address these one by one.
Hypothetical tie games, in which league average pitching gives up exactly the same number of runs as the team scored, are easily handled. The simplest and most straightforward way to handle this situation is to say that the hypothetical game would go into extra innings, and that the team would have a 50% chance of winning the game. I will utilize this approach in what follows.
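To make the two operations and the tie rule concrete, here is a minimal Python sketch. The function names and the league average runs-allowed distribution are assumptions made up for illustration; they are not real league data and not the probabilities used in this system.

```python
from typing import Dict

# HYPOTHETICAL league average runs-allowed distribution (made-up numbers).
LEAGUE_RA: Dict[int, float] = {
    0: 0.05, 1: 0.10, 2: 0.15, 3: 0.15, 4: 0.15, 5: 0.12,
    6: 0.10, 7: 0.08, 8: 0.05, 9: 0.03, 10: 0.02,
}


def team_win_prob(run_support: int, runs_allowed_dist: Dict[int, float]) -> float:
    """P(team wins) with run support held fixed; a tie counts as a 50% chance."""
    p = 0.0
    for runs_allowed, prob in runs_allowed_dist.items():
        if runs_allowed < run_support:
            p += prob          # team wins outright
        elif runs_allowed == run_support:
            p += 0.5 * prob    # hypothetical extra-inning coin flip
    return p


def win_value(run_support: int, runs_allowed: int,
              league_dist: Dict[int, float]) -> float:
    """Actual result (1, 0.5, or 0) minus the league average win probability."""
    actual = team_win_prob(run_support, {runs_allowed: 1.0})
    return actual - team_win_prob(run_support, league_dist)


if __name__ == "__main__":
    # Pitcher allows 2 runs, team scores 4.
    print(round(win_value(4, 2, LEAGUE_RA), 3))
```

With these made-up numbers, league average pitching wins a 4-run game 52.5% of the time (45% outright plus half of the 15% tie probability), so the 4-2 winner is credited with 0.475 wins.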
How do we handle the case of a pitcher pitching less than a complete game? Suppose the pitcher pitches 7 innings and leaves the game with his team leading 4-2. Since the game is not yet over, we can't do our first operation of simply seeing if the team actually won the game (RA<RS). Now there is more to it than that. Instead of a simple check to see if RA<RS, we need to modify the method to calculate the probability that a team leading 4-2 at the conclusion of the seventh inning will go on to win the game. This may be 70%, say. So even for the first operation we need to leave the realm of certainty and enter the realm of probabilities. Conceptually, calculating these probabilities is not too difficult; all it requires is tons of data.
The second operation of seeing how many runs league average pitching would allow can also be modified to take into account the number of innings the starting pitcher in question pitched. In the above example, we would like to know the probabilities of league average pitching allowing any number of runs, say 0 up to 25, through 7 innings. We then bring in the probability machinery described in the paragraph above to assess the probability of the team winning the game with the score 4-X after 7 innings, where X ranges from 0 to 25.
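The partial-game version of the two operations can be sketched as follows. Everything named here is hypothetical: `toy_win_prob_after` is a placeholder curve standing in for win probabilities that would really be estimated from game data, and `LEAGUE_RA_THROUGH_7` is a made-up distribution, not league data.

```python
import math
from typing import Callable, Dict

# Signature of a win-probability lookup: (inning, team_runs, opp_runs) -> P(win).
WinProb = Callable[[int, int, int], float]


def toy_win_prob_after(inning: int, team_runs: int, opp_runs: int) -> float:
    """PLACEHOLDER: a logistic curve in the lead, ignoring the inning.
    Real values would come from historical inning-by-inning data."""
    lead = team_runs - opp_runs
    return 1.0 / (1.0 + math.exp(-0.6 * lead))


# HYPOTHETICAL distribution of runs allowed by league average pitching
# through 7 innings (made-up numbers).
LEAGUE_RA_THROUGH_7: Dict[int, float] = {
    0: 0.10, 1: 0.15, 2: 0.20, 3: 0.20, 4: 0.15, 5: 0.10, 6: 0.06, 7: 0.04,
}


def partial_game_win_value(inning: int, run_support: int, runs_allowed: int,
                           league_dist: Dict[int, float],
                           win_prob_after: WinProb) -> float:
    """P(win | actual score after `inning`) minus the league average equivalent."""
    actual = win_prob_after(inning, run_support, runs_allowed)
    average = sum(prob * win_prob_after(inning, run_support, x)
                  for x, prob in league_dist.items())
    return actual - average


if __name__ == "__main__":
    # Starter leaves after 7 innings leading 4-2.
    wv = partial_game_win_value(7, 4, 2, LEAGUE_RA_THROUGH_7, toy_win_prob_after)
    print(round(wv, 3))
```

Passing the lookup in as a function keeps the arithmetic separate from the data, so a table built from real games could replace the placeholder without touching the calculation.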
Incorporating the effect of the home ballpark into the system is also conceptually straightforward. All that we would need is to calculate all of these probabilities in the context of the home ballpark. The probability of winning a game when scoring 4 runs is apt to be significantly higher if the game were being played in the Astrodome than at Coors Field. Unfortunately, there is insufficient data to estimate all of the required inning-by-inning probabilities for each major league ballpark in a season. Instead I have grouped parks into similar categories and pooled the data for those parks in the same category. This approach has allowed me to incorporate park effects into my system. I will have more to say about this below.
One of the cornerstones of my new Win Values method is the fact that I explicitly take into account the run support a pitcher is provided. However, I don't want to go overboard on this front. Here's an example of why. Suppose a team wins a game 6-5. If I evaluate the starting pitcher's contribution by "fixing" his run support at 6 runs, then any pitcher who gives up fewer than 6 runs will be deemed to have contributed the same amount to winning the game. The question, then, is how within this framework to reflect the notion that a pitcher who wins a game 6-2 has contributed more to the team win than a pitcher who wins a game 6-5.
My solution is to do another operation. This one deals with the team's run support. The reason that we feel the 6-2 game pitcher did "better" than the 6-5 game pitcher is because there was no guarantee that the team would actually score 6 runs that game. This notion is the kernel of my treatment of the pitcher's run support in the game. Rather than simply take the run support as a given entity, I seek a distribution of run support for that game. Remember that this is analogous to what Michael Wolverton does in his Support-Neutral Win system. As the name suggests, he abstracts away the actual run support provided in the game and uses the same league average run support distribution for each game (adjusted for park). I don't want to go that far for the reasons described above.
Consider the following stylized example. Suppose a team is trailing 1-0 in the bottom of the ninth with two outs and a runner on first base. The next batter belts a long fly ball that may be a home run, may hit off the wall scoring the runner from first, or may be caught on the warning track by a speedy outfielder. As the ball is flying through the air toward the seats, what are the possible run supports the home team's starting pitcher may receive that day? Well, if it is a home run, he will be provided two runs (and be a 2-1 winner); if it is off the wall, he will be provided one run (and the game will be tied); and if the ball is caught on the warning track, he will be provided no runs (and be a hard-luck 1-0 loser). My idea of "could have been" run support levels attempts to capture these possibilities.
Initially I planned on simply assigning probabilities to different run support levels, centered around the actual run support, using what I thought were reasonable probabilities to reflect what "could have" happened in the game. For example, if a team scored 6 runs, I might have said that it "could have" scored 5 runs with 25% probability, 6 runs with 50% probability, and 7 runs with 25% probability.
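A short Python sketch shows how even this simple 25/50/25 smearing separates the 6-2 winner from the 6-5 winner. The function names are hypothetical, and clamping at zero runs is my own assumption for the edge case of a shutout loss.

```python
from typing import Dict


def smear_simple(actual_runs: int) -> Dict[int, float]:
    """Spread run support over actual-1 (25%), actual (50%), actual+1 (25%).
    Run totals below zero are clamped to zero (an assumption for the edge case)."""
    dist: Dict[int, float] = {}
    for runs, prob in ((actual_runs - 1, 0.25), (actual_runs, 0.50),
                       (actual_runs + 1, 0.25)):
        runs = max(runs, 0)
        dist[runs] = dist.get(runs, 0.0) + prob
    return dist


def win_prob_given_runs_allowed(runs_allowed: int,
                                support_dist: Dict[int, float]) -> float:
    """P(team wins) when the pitcher allows `runs_allowed`; ties count 50%."""
    p = 0.0
    for run_support, prob in support_dist.items():
        if run_support > runs_allowed:
            p += prob
        elif run_support == runs_allowed:
            p += 0.5 * prob
    return p


if __name__ == "__main__":
    support = smear_simple(6)
    print(win_prob_given_runs_allowed(2, support))  # 6-2 winner
    print(win_prob_given_runs_allowed(5, support))  # 6-5 winner
```

Under this smearing the pitcher who allowed 2 runs still wins with certainty, while the pitcher who allowed 5 wins only 87.5% of the time, because a 5-run day for the offense would have left the game tied.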
However, after further thinking and a perusal of my college statistics textbook, I came up with a better method. The method relies upon Bayesian inference and partial game scores. The idea is that a team that scores 7 runs in a game may well have scored 5 runs, say, at the conclusion of the 6th inning. Of course, a team that scores 4 runs in a game will never have scored 5 runs at the conclusion of the 6th inning. Thus, the number of runs that a team scores in partial games (say at the conclusion of the 6th inning) conveys information as to how many final runs it will score in the game. Bayes' rule is the general principle that allows the inference to go either way, so that knowing the final score can convey information on how many runs were likely to have been scored in partial games. This is helpful to our notion of "could have" scored, since we can then bootstrap the run scoring process going forward again, say, starting at the top of the 7th inning.
I know this may be confusing, so I will give an example. Consider a team that scores 7 runs in a 9-inning game. Suppose I find a team that scores 7 runs in a game will have scored 5 runs at the conclusion of the 6th inning 10% of the time. I also know the distribution of final scores of every team that scored 5 runs at the conclusion of the 6th inning. Say they wind up with 5 runs 12% of the time, 6 runs 20% of the time, 7 runs 25% of the time, ..., and 15 runs 1% of the time.
I can calculate this bootstrapped distribution of final scores for every possible number of runs scored at the conclusion of the 6th inning. To find the ultimate "could have been" distribution of final scores, I would then weight each of these distributions by the probability that the team had that many runs at the conclusion of the 6th inning (10% for 5 runs in the example above).
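The weighting step can be sketched in Python as below. This is only an illustration under stated assumptions: the function name, the record format of (runs through 6 innings, final runs), and the eight toy games are all hypothetical. Both conditional distributions are estimated as plain frequencies; counting games with a given final score directly gives the same backward probabilities that Bayes' rule would produce from the forward ones.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def could_have_been(games: List[Tuple[int, int]],
                    actual_final: int) -> Dict[int, float]:
    """Smear a final score using (runs_through_6, final_runs) game records.

    Step 1 (backward): P(k runs after 6 | final = actual_final).
    Step 2 (forward):  P(final = f | k runs after 6).
    Result: sum over k of step 1 times step 2.
    """
    backward = Counter(k for k, final in games if final == actual_final)
    n = sum(backward.values())
    if n == 0:
        raise ValueError("no games with that final score in the sample")

    forward: Dict[int, Counter] = defaultdict(Counter)
    for k, final in games:
        forward[k][final] += 1

    smeared: Dict[int, float] = defaultdict(float)
    for k, count_k in backward.items():
        weight = count_k / n                       # P(k | final)
        total_k = sum(forward[k].values())
        for final, count in forward[k].items():
            smeared[final] += weight * count / total_k  # times P(final | k)
    return dict(smeared)


if __name__ == "__main__":
    # HYPOTHETICAL toy sample, far too small for real use.
    toy_games = [(5, 7), (5, 5), (5, 6), (7, 7), (3, 7), (3, 3), (3, 4), (7, 9)]
    for runs, prob in sorted(could_have_been(toy_games, 7).items()):
        print(runs, round(prob, 3))
```

A real application would need a large sample and, for a starter who left after the Zth inning, partial scores taken at that inning rather than always at the 6th.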
The result is a "smearing" of the run support provided in a game. For example, this method may find that a team that actually scored 7 runs "could have" scored runs with the following probabilities: 0 runs (1%), 1 run (2%), 2 runs (4%), 3 runs (6%), 4 runs (7%), 5 runs (9%), 6 runs (12%), 7 runs (15%), 8 runs (10%), 9 runs (8%), 10 runs (7%), 11 runs (6%), 12 runs (5%), 13 runs (4%), 14 runs (3%), and 15 runs (1%). I would then use this "could have been" smeared probability distribution for the pitcher's possible run support in evaluating his outing.
Now that I have answered some questions that you may have had, let me try to summarize the conceptual approach I take. I am introducing a method that evaluates a starting pitcher's contribution to his team's chance of winning the game if the score is RS to RA when he leaves the game at the conclusion of the Zth inning. I will first "smear" the run support based upon RS and Z using a backwards Bayesian bootstrapping method. That will give me a probability distribution that the team could have scored X runs at the conclusion of the Zth inning, where X ranges from 0 to 25, say.
Next, using the smeared run support distribution, I will estimate the probability that the team would win a game when giving up RA runs at the conclusion of the Zth inning. Then, using the smeared run support distribution, I will estimate the probability that the team would win this game with league average pitching. I then will subtract these two probabilities to derive the pitcher's win contribution for that game. For those readers interested in a mathematical representation, all the formulas are presented below.
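As a compact restatement of this summary, the subtraction can be written in one line. The symbols here are my own shorthand for this sketch and are assumptions, not necessarily the notation of the full formulas: $s(x \mid RS, Z)$ is the smeared probability that the team had $x$ runs at the conclusion of the Zth inning, $q_Z(a)$ is the probability that league average pitching allows $a$ runs through $Z$ innings, and $W_Z(x, a)$ is the probability that a team goes on to win when the score is $x$ to $a$ after $Z$ innings.

```latex
\mathrm{WV}(RS, RA, Z)
  = \sum_{x=0}^{25} s(x \mid RS, Z)\, W_Z(x, RA)
  \;-\; \sum_{x=0}^{25} s(x \mid RS, Z) \sum_{a=0}^{25} q_Z(a)\, W_Z(x, a)
```

The first sum is the team's chance of winning with the pitcher's actual runs allowed, averaged over the smeared run support; the second is the same average with league average pitching in his place.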
Under this approach, then, we should not use extra inning runs scored in calculating the probabilities of league average pitching allowing a certain number of runs.
Under that framework, remember, we would be forced to deem the pitcher's contribution in a 14-2 win to be equal to the contribution in a 3-2 win.