Part 1: Introduction
Part 2: Conceptual Framework
Part 3: High-Level Results
Part 4: Formulas
Part 5: Empirical Data for AL 2000
Part 6: Example: David Wells in AL 2000
Part 7: Yearly Results for 1978-2001
Part 8: Top Stars
Part 9: Concluding Remarks
Conceptual Framework
I seek a system that evaluates the pitcher's runs allowed in a game,
in light of his team's offensive run support. An extreme view would be to
"fix" the team's run support (RS) and perform a pair of operations. The first
operation would be merely to see how many runs the pitcher in question allowed
(RA), and then see if RA < RS (i.e., whether the team actually won or lost
the game). The second operation would be to replace the pitcher in question
with league average pitching, and see how often the team would win or lose.
This second operation brings in probabilities, since we would need to consider
the probability that league average pitching allows 0 runs, 1 run, 2 runs,
..., 25 runs.
Under this extreme view, the pitcher's win contribution would be reflected
in the difference between his team's actual win or loss (depending on whether
RA < RS or RA > RS) and the probability that the team would have won with
league average pitching. For illustrative purposes, suppose the pitcher gave
up 2 runs and his team scored 4 runs. Of course his team won the game. Suppose
we find that league average pitching would give up 3 or fewer runs 35% of
the time. Then we could say that the pitcher's contribution is 0.65 wins,
since his pitching turned a situation that was 35% likely to result
in a win into a 100% win situation.
The astute reader will likely have several questions at this point. What
about hypothetical tie games? What if the pitcher did not pitch a complete
game? What about the effect of the ballpark? Isn't fixing the team's offensive
support too "extreme"? I will address these one by one.
Hypothetical tie games, in which league average pitching gives up exactly
the same number of runs as the team scored, are easily handled. The simplest
and most straightforward approach is to say that the hypothetical
game would go into extra innings, and that the team would have a 50% chance
of winning the game. I will utilize this approach in what follows.
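The full-game comparison, including the 50% rule for hypothetical ties, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy league average distribution are invented here and are not the empirical figures used in this system.

```python
def avg_pitching_win_prob(run_support, avg_ra_dist):
    """Chance the team wins with league average pitching, run support held fixed.

    avg_ra_dist maps runs allowed -> probability (hypothetical numbers).
    A hypothetical tie counts as a 50% chance of winning.
    """
    p_win = 0.0
    for runs_allowed, prob in avg_ra_dist.items():
        if runs_allowed < run_support:
            p_win += prob
        elif runs_allowed == run_support:
            p_win += 0.5 * prob
    return p_win


def complete_game_contribution(run_support, runs_allowed, avg_ra_dist):
    """Actual result (1 = win, 0 = loss) minus the league average win probability."""
    actual = 1.0 if runs_allowed < run_support else 0.0
    return actual - avg_pitching_win_prob(run_support, avg_ra_dist)


# Illustrative distribution only; the probabilities sum to 1.
toy_dist = {0: 0.05, 1: 0.10, 2: 0.10, 3: 0.10, 4: 0.20, 5: 0.45}
print(complete_game_contribution(4, 2, toy_dist))
```

With this toy distribution, average pitching allows 3 or fewer runs 35% of the time, as in the earlier illustration, and the tie rule adds half of the 20% chance of allowing exactly 4 runs. The pitcher's contribution for a 4-2 win therefore comes out somewhat below the 0.65 figure that ignores ties.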
How do we handle the case of a pitcher pitching less than a complete game?
Suppose the pitcher pitches 7 innings and leaves the game with his team leading
4-2. Since the game is not yet over, we can't do our first operation of simply
seeing if the team actually won the game (RA < RS). Now there is more to
it than that. Instead of a simple check to see if RA < RS, we need to modify
the method to calculate the probability that a team leading 4-2 at the conclusion
of the seventh inning will go on to win the game. This may be 70%, say.
So even for the first operation we need to leave the realm of certainty and
enter the realm of probabilities. Conceptually, calculating these probabilities
is not too difficult; all it requires is tons of data.
The second operation of seeing how many runs league average pitching would
allow can also be modified to take into account the number of innings the
starting pitcher in question pitched. In the above example, we would like
to know the probabilities of league average pitching allowing any number of
runs, say 0 up to 25, through 7 innings. We then bring in the probability
machinery described in the paragraph above to assess the probability of the
team winning the game with the score 4-X after 7 innings, where X ranges from
0 to 25.
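A small Python sketch of this partial-game comparison follows. Both tables are hypothetical: only the 70% figure for a 4-2 lead after 7 innings echoes the example above, the remaining numbers are invented, and the range is cut off at 6 runs rather than 25 to keep the example short.

```python
# Hypothetical P(team wins | team has 4 runs, opponent has X, after 7 innings).
# Only the 0.70 entry for a 4-2 lead comes from the text; the rest are invented.
WIN_PROB_4_TO_X_AFTER_7 = {0: 0.93, 1: 0.85, 2: 0.70, 3: 0.60,
                           4: 0.50, 5: 0.25, 6: 0.12}

# Hypothetical distribution of runs allowed by league average pitching
# through 7 innings (probabilities sum to 1).
AVG_RA_THROUGH_7 = {0: 0.10, 1: 0.15, 2: 0.20, 3: 0.20,
                    4: 0.15, 5: 0.10, 6: 0.10}


def expected_win_prob(ra_dist, win_prob_by_ra):
    """Weight each win probability by the chance of allowing that many runs."""
    return sum(prob * win_prob_by_ra[ra] for ra, prob in ra_dist.items())


pitcher_win_prob = WIN_PROB_4_TO_X_AFTER_7[2]  # pitcher left leading 4-2
average_win_prob = expected_win_prob(AVG_RA_THROUGH_7, WIN_PROB_4_TO_X_AFTER_7)
contribution = pitcher_win_prob - average_win_prob
print(round(average_win_prob, 4), round(contribution, 4))
```

In a real application the two dictionaries would be replaced by empirical tables for the relevant inning and run support level.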
Incorporating the effect of the home ballpark into the system is also conceptually
straightforward. All that we would need is to calculate all of these probabilities
in the context of the home ballpark. The probability of winning a game when
scoring 4 runs is apt to be significantly higher if the game is played
in the Astrodome than at Coors Field. Unfortunately, there is insufficient
data to estimate all of the required inning-by-inning probabilities for each
major league ballpark in a season. Instead, I have grouped parks into similar
categories and pooled the data for those parks in the same category. This
approach has allowed me to incorporate park effects into my system. I will
have more to say about this below.
One of the cornerstones of my new Win Values method is the fact that I explicitly
take into account the run support a pitcher is provided. However, I don't
want to go overboard on this front. Here's an example of why. Suppose a team
wins a game 6-5. If I evaluate the starting pitcher's contribution by "fixing"
his run support at 6 runs, then any pitcher who gives up fewer than 6 runs
will be deemed to have contributed the same amount to winning the game. The
question, then, is how within this framework to reflect the notion that a
pitcher who wins a game 6-2 has contributed more to the team win than a pitcher
who wins a game 6-5.
My solution is to do another operation. This one deals with the team's
run support. The reason that we feel the 6-2 game pitcher did "better" than
the 6-5 game pitcher is because there was no guarantee that the team would
actually score 6 runs that game. This notion is the kernel of my treatment
of the pitcher's run support in the game. Rather than simply take the run
support as a given entity, I seek a distribution of run support for that game.
Remember that this is analogous to what Michael Wolverton does in his Support-Neutral
Win system. As the name suggests, he abstracts away the actual run support
provided in the game and uses the same league average run support distribution
for each game (adjusted for park). I don't want to go that far for the reasons
described above.
Consider the following stylized example. Suppose a team is trailing 1-0
in the bottom of the ninth with two outs and a runner on first base. The
next batter belts a long fly ball that may be a home run, may hit off the wall
scoring the runner from first, or may be caught on the warning track by a
speedy outfielder. As the ball is flying through the air toward the seats,
what are the possible run supports the home team's starting pitcher may receive
that day? Well, if it is a home run, he will be provided two runs (and be
a 2-1 winner); if it is off the wall, he will be provided one run (and the
game will be tied); and if the ball is caught on the warning track, he will
be provided no runs (and be a hard-luck 1-0 loser). My idea of "could have
been" run support levels attempts to capture these possibilities.
Initially I planned on simply assigning probabilities to different run support
levels, centered around the actual run support, using what I thought were
reasonable probabilities to reflect what "could have" happened in the game.
For example, if a team scored 6 runs, I might have said that it "could have"
scored 5 runs with 25% probability, 6 runs with 50% probability, and 7 runs
with 25% probability.
However, after further thinking and a perusal of my college statistics textbook,
I came up with a better method. The method relies upon Bayesian inference
and partial game scores. The idea is that a team that scores 7 runs in a
game may well have scored 5 runs, say, at the conclusion of the 6th inning.
Of course, a team that scores 4 runs in a game will never have scored 5 runs
at the conclusion of the 6th inning. Thus, the number of runs that a team
scores in partial games (say at the conclusion of the 6th inning) conveys
information as to how many final runs it will score in the game. Bayes' Rule
is the general principle that allows the inference to go either way, so that
knowing the final score can convey information on how many runs were likely
to have been scored in partial games. This is helpful to our notion of "could
have" scored, since we can then bootstrap the run scoring process going forward
again, say, starting at the top of the 7th inning.
I know this may be confusing, so I will give an example. Consider a team
that scores 7 runs in a 9-inning game. Suppose I find a team that scores
7 runs in a game will have scored 5 runs at the conclusion of the 6th inning
10% of the time. I also know the distribution of final scores of every team
that scored 5 runs at the conclusion of the 6th inning. Say they wind up
with 5 runs 12% of the time, 6 runs 20% of the time, 7 runs 25% of the time,
..., and 15 runs 1% of the time.
I can calculate this bootstrapped distribution of final scores for every
possible number of runs scored at the conclusion of the 6th inning. To find
the ultimate "could have been" distribution of final scores, I would then
weight each of these distributions by the probability that the team had
scored that many runs at the conclusion of the 6th inning (10% for the
distribution that starts from 5 runs in the example above) and add the
weighted distributions together.
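One way to read this procedure is as a mixture of forward distributions whose weights come from Bayes' Rule. The Python sketch below follows that reading; the function names, the `prior_after_6` and `forward` tables, and every number in them are hypothetical, and the run totals are kept tiny so the arithmetic is easy to follow.

```python
def backward_probs(final_runs, prior_after_6, forward):
    """Bayes' Rule: P(k runs after 6 innings | team finished with final_runs).

    prior_after_6[k] = P(k runs after 6 innings)
    forward[k][f]    = P(f final runs | k runs after 6 innings)
    """
    joint = {k: prior_after_6[k] * forward[k].get(final_runs, 0.0)
             for k in prior_after_6}
    total = sum(joint.values())
    return {k: p / total for k, p in joint.items()}


def smeared_run_support(final_runs, prior_after_6, forward):
    """Mix the forward distributions, weighted by the backward probabilities."""
    weights = backward_probs(final_runs, prior_after_6, forward)
    smeared = {}
    for k, w in weights.items():
        for f, p in forward[k].items():
            smeared[f] = smeared.get(f, 0.0) + w * p
    return smeared


# Toy tables, invented for illustration only.
prior = {0: 0.5, 1: 0.3, 2: 0.2}
forward = {
    0: {0: 0.6, 1: 0.3, 2: 0.1},
    1: {1: 0.6, 2: 0.3, 3: 0.1},
    2: {2: 0.7, 3: 0.3},
}
smeared = smeared_run_support(2, prior, forward)
for runs in sorted(smeared):
    print(runs, round(smeared[runs], 3))
```

The actual score keeps the largest share of the probability, while the neighboring scores pick up the rest. Note that a forward table never assigns probability to a final score below the partial score, which matches the observation that a team finishing with 4 runs cannot have had 5 after six innings.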
The result is a "smearing" of the run support provided in a game. For example,
this method may find that a team that actually scored 7 runs "could have"
scored runs with the following probabilities: 0 runs (1%), 1 run (2%), 2 runs
(4%), 3 runs (6%), 4 runs (7%), 5 runs (9%), 6 runs (12%), 7 runs (15%), 8
runs (10%), 9 runs (8%), 10 runs (7%), 11 runs (6%), 12 runs (5%), 13 runs
(4%), 14 runs (3%), and 15 runs (1%). I would then use this "could have been"
smeared probability distribution for the pitcher's possible run support in
evaluating his outing.
Now that I have answered some questions that you may have had, let me try
to summarize the conceptual approach I take. I am introducing a method that
evaluates a starting pitcher's contribution to his team's chance of winning
the game if the score is RS to RA when he leaves the game at the conclusion
of the Zth inning. I will first "smear" the run support based upon RS and
Z using a backwards Bayesian bootstrapping method. That will give me a probability
distribution that the team could have scored X runs at the conclusion of the
Zth inning, where X ranges from 0 to 25, say.
Next, using the smeared run support distribution, I will estimate the probability
that the team would win a game when giving up RA runs at the conclusion of
the Zth inning. Then, using the same smeared run support distribution, I will
estimate the probability that the team would win this game with league average
pitching. I will then subtract the second probability from the first to derive
the pitcher's win contribution for that game. For those readers interested in
a mathematical representation, all the formulas are presented below.
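As a rough sketch of how these steps fit together, and not a substitute for the formulas themselves, the Python below averages both win probabilities over a smeared run support distribution and takes the difference. The function names, the distributions, and the placeholder `toy_win_prob` are assumptions made for illustration; a real version would use an empirical win probability table for the Zth inning.

```python
def win_contribution(rs_dist, ra, avg_ra_dist, win_prob):
    """P(win | pitcher's RA) minus P(win | league average RA),
    both averaged over the smeared run support distribution rs_dist."""
    with_pitcher = sum(p * win_prob(rs, ra) for rs, p in rs_dist.items())
    with_average = sum(p * q * win_prob(rs, x)
                       for rs, p in rs_dist.items()
                       for x, q in avg_ra_dist.items())
    return with_pitcher - with_average


def toy_win_prob(rs, ra):
    """Placeholder only: treats the score as final, with ties worth 50%."""
    if rs > ra:
        return 1.0
    if rs == ra:
        return 0.5
    return 0.0


# Hypothetical inputs, invented for illustration.
smeared_rs = {3: 0.25, 4: 0.50, 5: 0.25}  # smeared run support
avg_ra = {2: 0.30, 4: 0.40, 6: 0.30}      # league average runs allowed
result = win_contribution(smeared_rs, 2, avg_ra, toy_win_prob)
print(round(result, 4))
```

Passing the win probability in as a function keeps the sketch independent of how that table is estimated, including any park grouping.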