Primate Studies
— Where BTF's Members Investigate the Grand Old Game

Monday, August 19, 2002

Win Values:  A New Method to Evaluate Starting Pitchers - Part 5

Empirical Data for AL 2000


Part 1: Introduction
Part 2: Conceptual Framework
Part 3: High-Level Results
Part 4: Formulas
Part 5: Empirical Data for AL 2000
Part 6: Example: David Wells in AL 2000
Part 7: Yearly Results for 1978-2001
Part 8: Top Stars
Part 9: Concluding Remarks


In this section I present examples of the empirical data behind each variable that enters the Win Values formulas presented above. For convenience, the AL 2000 season serves as the representative season. The following tables present the empirical distributions using the entire AL 2000 season as the database. In the Win Values framework, these distributions are derived separately for each league and for each season under study.

Table 4: Win Probs with Average Pitching, AWin(RS,Z), for AL 2000

Rows: Runs Scored (RS). Columns: at the conclusion of the Zth inning.

RS        1      2      3      4      5      6      7      8      9
0      .436   .383   .346   .288   .235   .189   .126   .051   .000
1      .580   .496   .403   .347   .332   .277   .197   .146   .068
2      .597   .574   .535   .463   .371   .318   .296   .260   .204
3      .742   .715   .622   .599   .531   .444   .390   .321   .281
4      .800   .786   .725   .680   .633   .583   .516   .440   .431
5      .875   .840   .810   .788   .692   .646   .641   .615   .551
6      .950   .870   .839   .820   .810   .772   .702   .651   .610
7      .970   .913   .878   .857   .840   .820   .800   .759   .728
8     1.000  1.000   .958   .900   .891   .874   .850   .845   .840
9     1.000  1.000   .975   .965   .955   .945   .923   .880   .860
10    1.000  1.000  1.000   .975   .960   .950   .940   .932   .887
11    1.000  1.000  1.000  1.000   .975   .960   .950   .940   .900
12    1.000  1.000  1.000  1.000   .980   .965   .960   .950   .940
13+   1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

Table 4 reports the empirical win probabilities using the entire AL 2000 season. The first row of the table presents the probabilities that a team will go on to win the game when it has scored exactly 0 runs at the conclusion of each of innings 1-9, given average pitching. Of course, based solely upon runs-scored information, each team has a .500 expected win prob before the game starts. If a team is scoreless after 1 inning, its win prob falls to .436, after 2 innings to .383, and so on until it has a .000 chance of winning with 0 runs after 9 innings.

Perhaps the better way to look at the table is by column, that is, by what inning the game is in. After 5 innings, a team that has yet to score has a .235 win prob with average pitching, a team with 1 run has a .332 win prob, a team with 2 runs has a .371 win prob, and so on all the way up to a 1.000 win prob if the team has scored 13 or more runs.
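
To make the row and column readings concrete, here is a minimal Python sketch that stores the first three rows of Table 4 and looks values up. The names `AWIN` and `awin` are illustrative assumptions, not part of the Win Values method itself, and only the rows for 0, 1 and 2 runs are copied in.

```python
# Excerpt of Table 4 (AL 2000): keys are runs scored (RS), list positions are
# innings Z = 1..9. The names AWIN and awin are illustrative assumptions.
AWIN = {
    0: [.436, .383, .346, .288, .235, .189, .126, .051, .000],
    1: [.580, .496, .403, .347, .332, .277, .197, .146, .068],
    2: [.597, .574, .535, .463, .371, .318, .296, .260, .204],
}

def awin(rs, z):
    """Win prob with average pitching for a team with rs runs after z innings."""
    return AWIN[rs][z - 1]  # innings are 1-based in the table

# Reading by row: a scoreless team's win prob, inning by inning.
scoreless_by_inning = [awin(0, z) for z in range(1, 10)]

# Reading by column: win probs after 5 innings for 0, 1 and 2 runs.
after_five = [awin(rs, 5) for rs in sorted(AWIN)]

print(scoreless_by_inning)
print(after_five)  # [0.235, 0.332, 0.371]
```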

As described above, these win probabilities are based upon every game in the entire AL 2000 season. Accordingly, I treat these probabilities as reflecting the win probabilities a team would have with "average" pitching.
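
One way such a table could be tallied from game records is sketched below. This is a raw frequency count under stated assumptions, not a reconstruction of the article's actual procedure: the input format, the function name `tally_awin`, and the toy records are all hypothetical, and the published values may involve smoothing that is not described in this part.

```python
from collections import defaultdict

def tally_awin(games):
    """Return {(rs, z): fraction of team-games won} from raw counts.

    `games` is a hypothetical list of (cumulative_runs, won) pairs, one per
    team-game, where cumulative_runs[z - 1] is the team's run total after
    inning z and `won` is True if that team won the game.
    """
    wins = defaultdict(int)
    total = defaultdict(int)
    for cumulative_runs, won in games:
        for z, rs in enumerate(cumulative_runs, start=1):
            key = (min(rs, 13), z)  # Table 4 pools 13 or more runs into one row
            total[key] += 1
            wins[key] += int(won)
    return {key: wins[key] / total[key] for key in total}

# Toy records invented purely to exercise the function; NOT AL 2000 data.
toy_games = [
    ([0, 0, 1, 1, 1, 2, 2, 2, 2], False),
    ([0, 1, 1, 3, 3, 3, 4, 4, 5], True),
    ([0, 0, 0, 0, 0, 0, 0, 0, 0], False),
    ([1, 1, 1, 1, 2, 2, 2, 3, 3], True),
]
table = tally_awin(toy_games)
print(table[(0, 1)])  # share of toy team-games won when scoreless after 1 inning
```

Every team-game contributes one observation to each inning's column, which is why a full season yields enough data to fill most cells.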

Table 5: Win Probs using Pitcher's Performance, DWin(RA-RS, Z), for AL 2000

Rows: Deficit (RA-RS). Columns: at the conclusion of the Zth inning.

RA-RS       1      2      3      4      5      6      7      8      9
0        .500   .500   .500   .500   .500   .500   .500   .500   .500
1        .410   .390   .380   .365   .344   .274   .226   .179   .000
2        .358   .299   .275   .264   .234   .200   .114   .062   .000
3        .215   .205   .195   .176   .164   .089   .059   .023   .000
4        .146   .130   .115   .107   .075   .051   .030   .015   .000
5        .105   .085   .077   .068   .060   .045   .025   .010   .000
6        .050   .040   .035   .030   .027   .025   .014   .005   .000
7        .025   .020   .015   .010   .005   .000   .000   .000   .000
8+       .000   .000   .000   .000   .000   .000   .000   .000   .000

Table 5 reports the expected win probabilities using the performance of the starting pitcher under study. That is, we use both the team's offensive run support (RS) and the runs allowed (RA) by the starting pitcher. As above, the win probabilities vary by what inning the game is in.

Through extensive analysis I have found that the win probabilities depend largely on the deficit (or lead) the team faces, and that the specific RA and RS values are not needed. This allows a more robust estimation of these probabilities: no season has enough games with exactly the same score at the conclusion of a specific inning, but many games share the same deficit.

The first row of the table indicates that the expected win probability of a team in a game that is tied at the conclusion of any inning is .500. The second row indicates how the win probability of a team that is 1 run behind varies depending upon what inning the game is in. After 1 inning, the probability that a team overcomes a 1-run deficit (going on to win the game) is .410; after 2 innings, .390; and so on.

The columns of the table indicate how the win probability changes with the deficit[1] to be overcome at the conclusion of a specific inning. For example, the 5th column indicates that, after five innings, a 1-run deficit is overcome with probability .344, a 2-run deficit with probability .234, a 3-run deficit with probability .164, etc.
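
The deficit-based lookup, together with the lead rule from footnote 1, can be sketched in Python as follows. The names `DWIN` and `dwin` are illustrative assumptions; the values are copied from Table 5.

```python
# Table 5 (AL 2000): keys are deficits 0..7, list positions are innings Z = 1..9.
# Deficits of 8 or more runs are .000 in every inning, so they are not stored.
DWIN = {
    0: [.500] * 9,
    1: [.410, .390, .380, .365, .344, .274, .226, .179, .000],
    2: [.358, .299, .275, .264, .234, .200, .114, .062, .000],
    3: [.215, .205, .195, .176, .164, .089, .059, .023, .000],
    4: [.146, .130, .115, .107, .075, .051, .030, .015, .000],
    5: [.105, .085, .077, .068, .060, .045, .025, .010, .000],
    6: [.050, .040, .035, .030, .027, .025, .014, .005, .000],
    7: [.025, .020, .015, .010, .005, .000, .000, .000, .000],
}

def dwin(ra, rs, z):
    """Win prob after z innings given runs allowed (ra) and run support (rs)."""
    deficit = ra - rs
    if deficit < 0:
        # Footnote 1: a lead of X runs is worth 1 minus a deficit of X runs.
        return 1.0 - dwin(rs, ra, z)
    if deficit >= 8:
        return 0.0  # the pooled "8+" row
    return DWIN[deficit][z - 1]

print(dwin(3, 2, 5))            # 1-run deficit after 5 innings
print(round(dwin(2, 3, 5), 3))  # 1-run lead after 5 innings
```

Because only the difference RA-RS is used, a 3-2 deficit and a 7-6 deficit after the same inning return the same value, which is the pooling the text describes.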

Table 6: "Could Have Been" Run Scored Probabilities, Smear(m;RS,Z), for AL 2000

Rows: Could Have Scored (m). Columns: Actual Runs Scored (RS); at Z=9.

m         0      1      2      3      4      5      6      7      8
0      .338   .130   .059   .033   .015   .010   .008   .005   .002
1      .267   .306   .148   .099   .052   .047   .022   .015   .012
2      .155   .192   .279   .166   .110   .074   .049   .030   .023
3      .105   .153   .198   .264   .179   .116   .082   .062   .045
4      .060   .090   .130   .176   .234   .154   .133   .100   .071
5      .038   .062   .079   .104   .141   .231   .175   .124   .081
6      .020   .029   .053   .068   .114   .162   .209   .157   .136
7      .014   .015   .023   .041   .067   .089   .122   .202   .157
8      .003   .011   .014   .022   .036   .045   .081   .119   .184
9      .000   .006   .007   .010   .023   .031   .045   .069   .089
10     .000   .003   .004   .007   .014   .020   .036   .052   .073
11     .000   .002   .003   .004   .005   .007   .018   .034   .069
12     .000   .001   .002   .003   .004   .005   .008   .014   .022
13     .000   .000   .001   .002   .003   .004   .005   .006   .017
14     .000   .000   .000   .001   .002   .003   .004   .005   .009
15     .000   .000   .000   .000   .001   .002   .002   .003   .005
16     .000   .000   .000   .000   .000   .000   .001   .002   .003
17     .000   .000   .000   .000   .000   .000   .000   .001   .002
18     .000   .000   .000   .000   .000   .000   .000   .000   .000
SUM   1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

Table 6 reports the "could have been" smearing probabilities for a team's possible run support, given that it actually scored a specific number of runs in the game. The table is a partial reporting of these smearing probabilities for the final scores (after 9 innings); there is a different smearing probability array for each inning.

This table should be viewed by column only. Each column represents the probability distribution that the team could have scored any number of runs (indicated by the row labels), given that it actually scored the number of runs indicated by the column header. For example, the first column (with column header 0) indicates that a team that actually scored 0 runs in a game "could have" scored 0 runs with probability .338, 1 run with probability .267, 2 runs with probability .155, 3 runs with probability .105, etc. The bolded elements, those on the diagonal, correspond to the "could have" runs scored being equal to the actual runs scored.
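
As a quick check of the "by column" reading, the sketch below takes the first column of Table 6 and confirms that it behaves like a probability distribution. The variable names are illustrative, and the mean is my own summary statistic for illustration, not a quantity the article reports.

```python
# Column 0 of Table 6 (AL 2000, Z = 9): position m is the "could have" run
# total, given that the team actually scored 0 runs. Entries for m >= 9 are
# .000 in the table and are omitted here.
SMEAR_RS0 = [.338, .267, .155, .105, .060, .038, .020, .014, .003]

total = sum(SMEAR_RS0)
mean_could_have = sum(m * p for m, p in enumerate(SMEAR_RS0))

print(round(total, 3))            # the column sums to 1, as the SUM row states
print(round(mean_could_have, 3))  # average "could have" runs for an actual shutout
```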

Table 7: Pct Park Adders, PAddPct(RS), for AL 2000

Runs Scored (RS)   Pct Park Adder
0                  .000
1                  .002
2                  .004
3                  .007
4                  .010
5                  .009
6                  .008
7                  .006
8                  .004
9                  .002
10+                .000

Table 7 shows how the effect of a home park on a team's win probability varies with how many runs the team scores in a game. As described above, the entries were estimated empirically as the park effect per percentage point. For example, the Oakland Coliseum had a 97 park factor, indicating that runs scored were generally 6% less prevalent than in a neutral park. Thus, for games played in Oakland the numbers in the table above would be multiplied by 6. As the table reflects, parks have a negligible effect on winning when a team is shut out or scores 10 or more runs. The park has the most effect when a team scores around 3-7 runs.
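
The scaling step in the Oakland example can be sketched as below. The names `PADD_PCT` and `park_adder` are illustrative assumptions. The sketch only performs the multiplication described in the text; how the result enters the Win Values formulas, including its sign, is defined in Part 4 and is not assumed here.

```python
# Table 7 (AL 2000): per-percentage-point park adder, indexed by runs scored (RS).
# Position 10 stands for the pooled "10+" row.
PADD_PCT = [.000, .002, .004, .007, .010, .009, .008, .006, .004, .002, .000]

def park_adder(rs, pct_points):
    """Scale the per-point adder by the park's run effect in percentage points."""
    return PADD_PCT[min(rs, 10)] * pct_points

# Oakland example from the text: multiply the table entries by 6.
oakland = [round(park_adder(rs, 6), 3) for rs in range(11)]
print(oakland)
```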




[1] The win prob of a team with a lead of X runs is, of course, simply equal to 1 minus the win prob of a team facing a deficit of X runs.

 

Rob Wood. Posted: August 19, 2002 at 06:00 AM