Beyond the Favorite Toy
Can the Favorite Toy become the Favorite Tool?
For as long as there have been baseball players, baseball fans,
and readily-available statistics, fans have speculated about the future
statistics of their favorite players. Special interest has long centered
on round numbers such as 500 home runs, 3000 hits, and 300 wins. In the
past few years, with Hank Aaron’s home run record seemingly at risk,
increasing interest has focused on the question of which, if any, of
the current crop of stars will break that record.
One well-known method for addressing these sorts of questions is the Favorite Toy, which was developed by Bill James and presented in his annual Baseball Abstracts during the 1980s. To estimate the probability of a player reaching a certain career total of, say, home runs, one begins by computing that player’s established level. This established level is computed via the formula EL = (3*Y0 + 2*Y1 + Y2)/6, where Y0 is the total for the most recent year, Y1 is the total for the previous year, and Y2 is the total for the year before that. One then estimates the number of years remaining in the player’s career via the formula
YR = max(0.6*(40-age),1.5). Thus a 20-year-old player is estimated to
have 12 years remaining, and no active player is estimated to have fewer
than 1.5 years remaining. The player’s expected remaining home run total
is then the product EXP = EL*YR. The probability P that the player in fact
hits X additional home runs in his career is estimated via P = EXP/X - 0.5,
where P is of course restricted to the interval [0,1].
Example 1: Player A has 200 home runs through his age 26 season and has
hit exactly 40 home runs in each of the past three years. We thus compute
that his established home run level is EL = 40. We estimate his remaining
years as YR = 0.6(40-26) = 8.4 years. His expected remaining home run total
is thus 40*8.4 = 336. Since he needs 300 additional home runs to reach 500,
his chance of reaching 500 career homers is estimated as
P = 336/300 - 0.5 = 0.62.
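The Favorite Toy computation described above is short enough to write out directly. The following Python function is a minimal sketch of the formulas as stated in this article; the function name and argument names are illustrative choices, not part of Bill James's presentation.

```python
def favorite_toy(y0, y1, y2, age, needed):
    """Favorite Toy estimate, following the formulas given in the text.

    y0, y1, y2: totals for the most recent year and the two years before it.
    age: the player's age in the most recent season.
    needed: additional home runs required to reach the target.
    """
    established_level = (3 * y0 + 2 * y1 + y2) / 6
    years_remaining = max(0.6 * (40 - age), 1.5)
    expected = established_level * years_remaining
    if needed <= 0:
        # Assumption: a target already reached is treated as certain.
        return established_level, years_remaining, expected, 1.0
    probability = expected / needed - 0.5
    probability = min(1.0, max(0.0, probability))  # restrict P to [0, 1]
    return established_level, years_remaining, expected, probability


# Player A of Example 1: 40 home runs in each of the past three years, age 26,
# needing 300 more home runs to reach 500.
el, yr, exp_remaining, p = favorite_toy(40, 40, 40, age=26, needed=300)
print(round(el, 2), round(yr, 2), round(exp_remaining), round(p, 2))
# prints: 40.0 8.4 336 0.62
```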
This Favorite Toy formula has a number of nice properties. It
completely specifies an estimated distribution for the player’s remaining
home run total, and this distribution can be used to find upper bounds,
lower bounds, and confidence intervals of any desired level. Unfortunately,
this estimated distribution is entirely contained in the interval
[EXP/1.5, 2*EXP], which is unrealistically short. Consequently, as we will
see later on, confidence intervals produced from the formula are much too
short to achieve their nominal coverage probabilities.
Example 2: Using player A of Example 1, let us construct a 90 % confidence
interval for the player’s career home run total. One such interval is the
interval between the value that the player will exceed 95 % of the time and
the value that he will exceed only 5 % of the time. To find the former,
one solves 0.95 = 336/X - 0.5, getting X = 336/1.45 = 232. To find the
latter, one solves 0.05 = 336/X - 0.5, getting X = 336/0.55 = 611. Thus
the interval (232, 611) is a 90 % confidence interval for the player’s
remaining homers, and (432,811) is a 90 % confidence interval for the
player’s career total.
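The inversion used in Example 2 can be sketched in a few lines of Python. Solving P = EXP/X - 0.5 for X gives X = EXP/(P + 0.5); the function name below is an illustrative choice.

```python
def favorite_toy_interval(expected, level):
    """Central interval for remaining home runs implied by P = EXP/X - 0.5.

    expected: the Favorite Toy expected remaining total (EXP).
    level: nominal confidence level as a fraction, e.g. 0.90.
    """
    tail = (1 - level) / 2
    lower = expected / ((1 - tail) + 0.5)  # value exceeded with probability 1 - tail
    upper = expected / (tail + 0.5)        # value exceeded with probability tail
    return lower, upper


low, high = favorite_toy_interval(336, 0.90)
print(round(low), round(high))              # prints: 232 611
print(200 + round(low), 200 + round(high))  # prints: 432 811
```

Because P reaches 1 at X = EXP/1.5 and 0 at X = 2*EXP, no interval produced this way can extend outside [EXP/1.5, 2*EXP], whatever level is requested.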
In this article, we develop a series of formulas, one for each age,
that produce confidence intervals that do achieve their nominal coverage
probabilities. We then apply those formulas to produce confidence intervals
for prominent home run hitters active in the 2002 season.
Development of the Formulas:
Postponing for the moment an examination of the accuracy of the
Favorite Toy formula, we first develop a set of formulas to compete with
the Favorite Toy in predicting future home run totals and forming confidence
intervals. A natural approach to take here is that of trying to find
formulas which fit baseball’s actual historical record. With this goal in
mind, the database from www.baseball1.com was downloaded, and a dataset
was produced that contained for almost every player-season in baseball
history the following variables: name, age (as of 12 AM on July 1), home
runs in that season and each of the previous 4 years, and actual remaining
career home run total. Those entries corresponding to players who are still
active or who hit 0 homers in the year in question were then eliminated
from the dataset.
Separately for each age, remaining career home run totals were
regressed against the home run totals in the current and the previous 4
seasons. The previous home run totals were then removed from the model
chronologically until only those variables which had positive coefficients
and made a genuine contribution to predicting the remaining career home run
totals were left. The square of the home run total was also considered as a variable,
but it was in no case found to be useful. The prediction formulas which were
obtained were of the form EXP = c0*Y0 + c1*Y1 + c2*Y2, where the coefficients
are as tabled below:
Table 1:
Age c0 c1 c2
19 21.47
20 16.21
21 11.68 3.01
22 9.51 1.76
23 7.88 1.87
24 5.98 2.47
25 4.43 1.91 1.35
26 3.93 1.43 1.03
27 3.51 1.07 1.01
28 3.04 1.03 0.90
29 1.94 1.53 0.80
30 2.46 1.17
31 2.58 0.74
32 2.14 0.79
33 1.71 0.83
34 1.85 0.43
35 1.60 0.37
36 1.74
37 1.54
38 1.46
39 1.14
40 0.82
41 0.47
What these prediction formulas suggest is that only the home run
totals in the most recent year and the two previous years are relevant in
predicting remaining home runs. For very young and very old players, only
the most recent year’s home run total is important. Most of these prediction
formulas account for between 60 % and 70 % of the variability in remaining
home run totals, with the formulas for very old and very young players
explaining somewhat less of the variability.
Example 3: We use these formulas to compute the expected remaining home run
total of Player A of Example 1. We have that Y0 = Y1 = Y2 = 40, and the
player is age 26. Thus EXP = 3.93*40 + 1.43*40 + 1.03*40 = 256, and we would
expect player A to have a career total of 200 + 256 = 456 home runs. This
may be contrasted with the Favorite Toy’s estimate of 536 career home runs.
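As a rough illustration of how the Table 1 formulas are applied, the Python sketch below transcribes a few rows of Table 1 (not the whole table) and evaluates EXP = c0*Y0 + c1*Y1 + c2*Y2. The dictionary layout and function name are assumptions made for the example.

```python
# A few rows transcribed from Table 1; each tuple is (c0, c1, c2), with
# trailing coefficients omitted where the table leaves them blank.
TABLE_1 = {
    24: (5.98, 2.47),
    25: (4.43, 1.91, 1.35),
    26: (3.93, 1.43, 1.03),
    27: (3.51, 1.07, 1.01),
    36: (1.74,),
}


def expected_remaining(age, recent_totals):
    """Expected remaining home runs for a player of the given age.

    recent_totals: home runs for the most recent season first, then the
    previous season, and so on. Extra seasons beyond those the formula
    for that age uses are ignored.
    """
    coefficients = TABLE_1[age]
    if len(recent_totals) < len(coefficients):
        raise ValueError("not enough seasons for this age's formula")
    return sum(c * y for c, y in zip(coefficients, recent_totals))


exp_remaining = expected_remaining(26, [40, 40, 40])
print(round(exp_remaining), 200 + round(exp_remaining))  # prints: 256 456
```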
Given these formulas, we would now like to derive confidence
intervals for future home run totals. To do this, we computed for each
player-season the difference between the true number of remaining home runs
and the predicted value. We then regressed the absolute values of these
differences on the predicted values, obtaining a linear formula giving the
magnitude for what we might call a typical error. The motivation for this
step was the feeling, confirmed by graphs, that the magnitude of the errors
produced by the prediction formulas depended on the size of the predicted
values. These linear formulas were of the form TE = d0 + d1*EXP, where d0
and d1 were as given below:
Table 2:
Age d0 d1
19 49.9 0.45
20 53.5 0.25
21 39.2 0.26
22 28.7 0.35
23 22.0 0.36
24 16.3 0.39
25 11.9 0.43
26 9.2 0.43
27 6.1 0.46
28 6.5 0.45
29 4.7 0.50
30 3.9 0.54
31 3.2 0.56
32 2.3 0.59
33 2.3 0.61
34 2.8 0.56
35 1.7 0.64
36 1.2 0.63
37 1.8 0.63
38 2.1 0.60
39 2.1 0.50
40 1.5 0.55
41 0.9 0.79
We then computed for each player-season the ratio between the
actual error and the typical error TE. Within each age group, we found
the 0.05, 0.10, 0.25, 0.75, 0.90, and 0.95 quantiles of the distribution
of these ratios. These quantiles, tabled below, were then used as the
coefficients for confidence intervals.
Table 3:
Age q(.95) q(.90) q(.75) q(.25) q(.10) q(.05)
19 3.92 2.57 1.00 -0.36 -0.68 -0.88
20 3.89 2.44 0.84 -0.33 -0.70 -1.00
21 4.06 2.13 0.69 -0.38 -0.73 -0.99
22 3.76 2.21 0.55 -0.47 -0.81 -1.08
23 3.49 1.96 0.51 -0.56 -1.00 -1.28
24 3.32 1.91 0.50 -0.58 -1.03 -1.25
25 3.01 1.76 0.44 -0.71 -1.08 -1.32
26 2.74 1.61 0.32 -0.77 -1.17 -1.37
27 2.40 1.40 0.24 -0.88 -1.23 -1.42
28 2.45 1.43 0.21 -0.88 -1.25 -1.44
29 2.39 1.33 0.19 -0.90 -1.25 -1.42
30 2.23 1.27 0.17 -0.94 -1.25 -1.38
31 2.25 1.29 0.09 -0.94 -1.26 -1.36
32 2.29 1.22 0.06 -1.01 -1.26 -1.35
33 2.22 1.20 0.09 -0.95 -1.25 -1.36
34 2.01 1.13 0.04 -0.95 -1.24 -1.38
35 2.10 1.20 -0.07 -1.01 -1.22 -1.30
36 2.23 0.96 0.01 -1.01 -1.23 -1.35
37 2.52 1.16 -0.02 -0.96 -1.22 -1.30
38 2.62 1.32 0.07 -0.92 -1.13 -1.32
39 1.83 1.47 0.12 -0.91 -1.26 -1.39
40 3.15 1.12 0.09 -0.86 -1.09 -1.32
41 2.73 1.23 -0.36 -0.58 -0.86 -1.03
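The two-stage derivation behind Tables 2 and 3 (regress the absolute errors on the predicted values, then take quantiles of the ratio of actual error to typical error) might be sketched in Python as follows. This is only an illustration of the procedure as described in the text, not the code used for the article; the function names are hypothetical, and the four data points are invented so that the arithmetic is easy to follow.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def error_ratios(predicted, actual):
    """Return d0, d1, and each player-season's (actual error) / (typical error)."""
    residuals = [a - p for p, a in zip(predicted, actual)]
    d0, d1 = fit_line(predicted, [abs(r) for r in residuals])
    # Assumes the fitted typical error d0 + d1 * p is nonzero for every p.
    ratios = [r / (d0 + d1 * p) for p, r in zip(predicted, residuals)]
    return d0, d1, ratios


# Invented toy data for one age group (NOT taken from the article's dataset).
predicted = [100, 200, 300, 400]
actual = [90, 220, 270, 440]
d0, d1, ratios = error_ratios(predicted, actual)

# With n=20, statistics.quantiles returns the 5%, 10%, ..., 95% cut points;
# the first and last play the role of q(.05) and q(.95).
cuts = statistics.quantiles(ratios, n=20)
q05, q95 = cuts[0], cuts[-1]
```

In the toy data the absolute errors are exactly 10 % of the predictions, so the fit returns d0 near 0 and d1 near 0.1, and every ratio is close to +1 or -1. A real age group would contain hundreds of player-seasons, which is what makes the tail quantiles meaningful.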
To find a confidence interval for the remaining home run total for
a player using this method, one proceeds in the following way, making sure
to use the row in each table corresponding to the age of the player in the
most recent season:
1. Using the coefficients in Table 1, find the expected remaining home run
total EXP for the player.
2. Using the coefficients in Table 2, find the typical error TE = d0 + d1*EXP.
3. For a 90 % confidence interval, use the interval
(EXP + TE*q(0.05), EXP + TE*q(0.95)). For an 80 % confidence interval,
use the interval (EXP + TE*q(0.10), EXP + TE*q(0.90)). For a 50 % confidence
interval, use the interval (EXP + TE*q(0.25), EXP + TE*q(0.75)).
To find a confidence interval for a player’s career total, simply
add his current career total to each coordinate of the confidence interval
for his remaining home runs.
Example 4: We compute a 90 % confidence interval for the remaining home
runs of player A of Example 1. We saw in Example 3 that EXP = 256. Thus
the typical error, using Table 2 and the fact that player A is 26, is
TE = 9.2 + 0.43*256 = 119. Since q(0.05) = -1.37 and q(0.95) = 2.74 for
age 26, our confidence interval goes from 256-1.37*119 = 93 to
256+2.74*119 = 582. Our 90 % confidence interval for his career total is
then (293, 782). This may be contrasted with the interval (432, 811)
produced by the Favorite Toy.
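The three-step procedure can be collected into one small Python function. The sketch below hard-codes only the age-26 rows of Tables 1, 2, and 3; the data layout and names are assumptions made for illustration.

```python
# Age-26 rows transcribed from Tables 1, 2, and 3.
AGE_26 = {
    "c": (3.93, 1.43, 1.03),   # Table 1: c0, c1, c2
    "d": (9.2, 0.43),          # Table 2: d0, d1
    # Table 3: (lower quantile, upper quantile) for each confidence level in %
    "q": {90: (-1.37, 2.74), 80: (-1.17, 1.61), 50: (-0.77, 0.32)},
}


def remaining_interval(row, recent_totals, level):
    """Steps 1-3: expected total, typical error, and the interval endpoints."""
    expected = sum(c * y for c, y in zip(row["c"], recent_totals))
    d0, d1 = row["d"]
    typical_error = d0 + d1 * expected
    q_low, q_high = row["q"][level]
    low = expected + typical_error * q_low
    high = expected + typical_error * q_high
    return expected, typical_error, low, high


exp_remaining, te, low, high = remaining_interval(AGE_26, [40, 40, 40], 90)
career_low, career_high = 200 + low, 200 + high
```

Carrying unrounded values through gives EXP = 255.6 and TE of about 119.1, with endpoints of roughly 92 and 582. Example 4 rounds EXP and TE to whole numbers before continuing, which is why its lower endpoint comes out as 93 rather than 92.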
These intervals have, by construction, exactly the confidence
levels specified, up to rounding error. That is, if one computes 90 %
confidence intervals for all of the players of some particular age, then
90 % of those intervals will contain the true number of remaining home runs
for the players in question. As long as players continue to age in roughly
the same way, we can expect our confidence intervals for active and future
players to be almost as good.
Comparing Coverage Probabilities:
For each player-season in the dataset, 90%, 80%, and 50% intervals
based on the Favorite Toy method were computed. The proportions of those
intervals that contained the true values were as tabled below:
Table 4:
Nominal Coverage Probabilities:
Age 90 % 80 % 50 %
19 0.158 0.126 0.084
20 0.214 0.206 0.122
21 0.222 0.180 0.113
22 0.285 0.261 0.158
23 0.275 0.248 0.151
24 0.262 0.233 0.148
25 0.260 0.234 0.156
26 0.262 0.238 0.148
27 0.256 0.235 0.145
28 0.227 0.208 0.121
29 0.204 0.186 0.125
30 0.204 0.185 0.121
31 0.193 0.173 0.111
32 0.201 0.178 0.118
33 0.175 0.168 0.109
34 0.179 0.166 0.100
35 0.171 0.147 0.100
36 0.196 0.172 0.116
37 0.206 0.180 0.118
38 0.217 0.199 0.111
39 0.246 0.225 0.155
40 0.154 0.121 0.099
41 0.109 0.091 0.036
We see from the table that for no age does the nominal 90 %
confidence interval achieve a coverage probability exceeding 30 %, and for
no age does the nominal 50 % confidence interval achieve a coverage
probability exceeding 16 %. The obvious conclusion here is that the
Favorite Toy, while easy to work with, does not produce anything even
approximating a reasonable estimate of the probabilities it attempts to
explore.
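The coverage proportions in Table 4 are simply the fraction of intervals that contain the true value. A minimal Python sketch of that calculation follows; the function name is hypothetical and the four interval/outcome pairs are made up purely to show the arithmetic.

```python
def coverage(intervals, actuals):
    """Fraction of (low, high) intervals that contain the matching actual value."""
    if len(intervals) != len(actuals) or not actuals:
        raise ValueError("need one actual value per interval")
    hits = sum(1 for (low, high), actual in zip(intervals, actuals)
               if low <= actual <= high)
    return hits / len(actuals)


# Invented example (NOT from the article's dataset): two of the four
# intervals contain the corresponding actual value.
intervals = [(232, 611), (93, 582), (100, 200), (50, 60)]
actuals = [300, 90, 150, 75]
print(coverage(intervals, actuals))  # prints: 0.5
```

A method whose nominal 90 % intervals are well calibrated should return a value near 0.90 when this is run over a large set of player-seasons.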
Confidence Intervals for Active Players:
Given below are confidence intervals, produced by the new method,
for career home run totals for the active players who, after the 2002
season, were predicted by the new system to finish with more than 300 home
runs. Thus, for example, Barry Bonds was predicted to finish with 684
home runs. He was estimated to have a 50 % chance of finishing with 639
to 683 home runs, and a 90 % chance of finishing with 623 to 801 home runs.
The intervals are quite wide for younger players, but history suggests
that they need to be.
Player Age EXP 90 % Conf. 50 % Conf.
Barry Bonds 37 684 ( 623 , 801 ) (639 , 683)
Alex Rodriguez 26 639 ( 425 , 1065 ) (519 , 688)
Sammy Sosa 33 636 ( 519 , 826 ) (554 , 644)
Rafael Palmeiro 37 556 ( 500 , 666 ) (514 , 555)
Fred McGriff 38 522 ( 484 , 596 ) (496 , 524)
Jim Thome 31 504 ( 370 , 726 ) (412 , 513)
Ken Griffey 32 503 ( 472 , 554 ) (480 , 504)
Albert Pujols 22 459 ( 282 , 1079 ) (382 , 550)
Vladimir Guerrero 26 456 ( 298 , 773 ) (367 , 493)
Jeff Bagwell 34 454 ( 393 , 543 ) (412 , 456)
Andruw Jones 25 454 ( 285 , 837 ) (363 , 510)
Juan Gonzalez 32 450 ( 411 , 516 ) (421 , 451)
Manny Ramirez 30 439 ( 338 , 603 ) (370 , 452)
Mike Piazza 33 433 ( 359 , 555 ) (381 , 438)
Frank Thomas 34 430 ( 384 , 495 ) (398 , 431)
Troy Glaus 25 423 ( 251 , 814 ) (330 , 480)
Gary Sheffield 33 413 ( 349 , 516 ) (368 , 417)
Shawn Green 29 410 ( 278 , 631 ) (326 , 427)
Matt Williams 36 395 ( 376 , 427 ) (380 , 395)
Ellis Burks 37 394 ( 352 , 477 ) (363 , 394)
Larry Walker 35 391 ( 342 , 469 ) (353 , 388)
Andres Galarraga 41 390 ( 386 , 402 ) (388 , 389)
Carlos Delgado 30 389 ( 289 , 550 ) (321 , 401)
Eric Chavez 24 387 ( 229 , 807 ) (314 , 451)
Mo Vaughn 34 373 ( 332 , 433 ) (345 , 374)
Greg Vaughn 36 366 ( 352 , 388 ) (356 , 366)
Todd Helton 28 365 ( 240 , 579 ) (289 , 384)
Jason Giambi 31 362 ( 256 , 538 ) (288 , 369)
Chipper Jones 30 361 ( 275 , 501 ) (303 , 372)
Ron Gant 37 348 ( 323 , 396 ) (329 , 347)
Miguel Tejada 26 338 ( 202 , 609 ) (262 , 370)
Alfonso Soriano 24 338 ( 182 , 751 ) (265 , 400)
Tino Martinez 34 337 ( 292 , 403 ) (306 , 339)
Lance Berkman 26 336 ( 185 , 639 ) (251 , 372)
Robin Ventura 34 334 ( 285 , 406 ) (300 , 335)
Jeff Kent 34 331 ( 267 , 424 ) (287 , 333)
Raul Mondesi 31 327 ( 256 , 444 ) (278 , 332)
Richie Sexson 27 326 ( 200 , 540 ) (248 , 348)
Scott Rolen 27 326 ( 211 , 519 ) (255 , 345)
Adam Dunn 22 326 ( 189 , 803 ) (266 , 396)
Luis Gonzalez 34 325 ( 262 , 417 ) (282 , 327)
Magglio Ordonez 28 325 ( 202 , 535 ) (250 , 343)
David Justice 36 324 ( 306 , 354 ) (311 , 324)
Pat Burrell 25 322 ( 170 , 668 ) (240 , 372)
Ryan Klesko 31 321 ( 243 , 450 ) (267 , 326)
Tim Salmon 33 321 ( 275 , 396 ) (289 , 324)
Brian Giles 31 313 ( 214 , 479 ) (244 , 320)
Tony Batista 28 313 ( 202 , 502 ) (245 , 329)
Jim Edmonds 32 305 ( 235 , 423 ) (252 , 308)
Eric Karros 34 301 ( 273 , 340 ) (282 , 301)
A Final Note:
Because the coefficients recorded in Tables 1, 2, and 3 were derived
based on working with home run totals, they are almost certainly not
appropriate coefficients to use for other statistics such as hits or stolen
bases. The method by which these coefficients were derived should,
however, be applicable to other statistics.
Jesse Frey
Posted: September 18, 2003 at 06:00 AM
Reader Comments and Retorts
Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.
1. Michael Posted: September 19, 2003 at 02:41 AM (#612989)
What I mean is that there is another analysis of The Favorite Toy in the May issue of "By The Numbers" (the newsletter of the SABR Statistical Analysis Committee) which came out just this month.
http://www.philbirnbaum.com/btn2003-05.pdf
It's by Shane Holmes and it starts on Page 15 of the above document. Holmes's study is on the overall class of players with established chances, and he concludes that over the entire sample, TFT does a pretty good job of predicting the total numbers of players who will reach the 3,000 hit and 500 homer benchmarks.
I'll give you the same reservations with this system as I give with PECOTA and with accuracy tests for run estimators, and virtually everything else out there:
You need to have data that is drawn from outside the sample to test the equations derived from the sample.
This is most easily done by selecting players by year of birth, and throw all odd-year players into 1 pool, and even-year players into the other pool. Derive your equations from 1, and test against the other. Or, randomly choose 80% of the players for your sample, and test against the other 20%.
That said, it's a refreshing new way to look at the issue.
I did try using current career total as a predictor variable, but it didn't seem to be very useful. In general, since I wanted to avoid overfitting the data at the expense of prediction power, I didn't include a variable unless the evidence was overwhelming that the variable was important for prediction.
Tango,
I agree with you that some kind of cross-validation would be good. I should mention as a point in my defense, though, that I didn't just seek out the best possible fit. I insisted that the evidence in favor of using a variable be very strong before I used it, and I made certain that the variables and coefficients were consistent in certain ways. For example, I insisted that the coefficients for each age be largest for the most recent season and get progressively smaller.
Gerry,
I would like to be able to predict probabilities of reaching particular landmarks. To do that well, though, I will likely need a complete model instead of just a table of quantiles (Table 3). I'm working on it.
Jesse
It's entirely possible that I misunderstood how you were using the historical information, but if I'm not, then won't your predictions be a bit on the low side?
I appreciate your comments, and I acknowledge that there is a lot of room for improvement in the prediction method. I think, though, that you may be underestimating it in some ways. The dataset used in determining all the coefficients contains players who went through injuries (Mark McGwire is one who came back strong), players who changed ballparks, and players who were active during league-wide changes in home run rates. As a result, these possibilities are not ignored, but show up as additional width for all the prediction intervals. Since this additional width shows up only in an average sense for all players, though, you are certainly right that extra knowledge we have about specific players like Helton and Griffey could help improve the predictions for those players.
Your idea about modeling AB and HR rate separately is interesting.
Bill James, in his "Breaking The Wand" essay in the 1988 Abstract, talked about the "Dear Jackass" letters he used to write. Once in a while, he said, people would write him demanding that he do something that they were interested in, or complaining that he was doing things they weren't interested in. He would respond with the "Dear Jackass" letter, which said (paraphrasing) "Dear Jackass, if you want something done why don't you do it yourself. I work on what interests me, please don't bother me telling me what to do."
Terry, you're being a jackass.
Did you search for other variables? Seems to me that "Home Run decay rates" might be related to the "type" of player the hitter is. Perhaps their historical w/k ratio, OBP, AB/season, or perhaps position played may have a stronger impact than simple age-related adjustments. A Dave Kingman type who has no average or position skill is going to exit sooner than a middle infielder. Adjustments by position would show players like A-Rod to have more probability of hitting the upper bands. Without discovering another factor, I doubt you can get any more accuracy out of this. It is a toy, but if you find a predictive relationship with other stats, you may accomplish something significant.
Tinkerers can examine players with injuries by using their last two full seasons in the formula, where exceptions are obvious, like Griffey. Wonder how McGwire would fit into such a model in 1998?
The version of the Favorite Toy that I used was one that I found in several places on the internet. I wasn't aware of the version that you describe. It may be better for older players, but I believe that the main difficulty with the FT, namely overly short confidence intervals, would remain.
The 684 that I list as the expected total for Bonds comes from the new method given in the article and not from the Favorite Toy, which would indeed have a higher estimate. I probably shouldn't have used the same name EXP for both estimates.
I didn't search very seriously for other variables. I agree that incorporating position and other variables might help in shortening the intervals.
Terry,
You seem to be under the impression that I am somehow on staff at Baseball Primer. I'm not. The editor was simply kind enough to give me a chance to, as described in the submission policy for the Visitor's Dugout, "work through an idea with some intelligent baseball fans."
I had fun writing this article, and I can't help smiling when you write about "hard, exacting, time-consuming work."
Since Jesse did not present his system in this forecasting light, but more along the lines of improving a Toy, it should be evaluated on that.
I do agree that you should break up the "playing time" and the "rate" stats into 2, as well as include a park or era adjustment. However, the era adjustment especially will be bothersome, because you wouldn't have predicted what happened in the last 10 years. So, from that standpoint, you are going to increase your margin for error (as well you should).
Any of the Coors hitters has to be evaluated in terms of "when are they going to leave Coors"? And, of course, you have to add a smidge to every other hitter in the league to account for "when will they play half their home games at Coors"? To do that, you need to look at contract status, age, etc.
So, you have to be very very careful what parameters you choose to introduce, and how far you are willing to take them.
Finally, even if Jesse's work is deemed to be "useless" by some, it contains enough original and thoughtful information that it inspires original and thoughtful replies to his work.
And I do commend Terry on at least having the b-lls to include his email address with his link. His comments are probably not respected by many, if any, readers here, but at least he didn't post them with a bag over his head.
I would suggest to Terry and others that if you want people to take your words more seriously, that you be more polite in trying to say what you want to say. What ends up happening is that people will ignore you, and will only remember your attitude. If you saw Jesse face-to-face, would you actually speak as you did?
The coefficients for age 32 are 2.14 and 0.79. Thanks for catching that omission.
Thanks for putting my method to the test. Your numbers are right on, and they don't look good for my method. You were even kind enough not to point out that my 50% confidence intervals were correct for only 5 out of the 21 players in your sample. For samples such as the one you selected, my confidence intervals don't quite have their nominal coverage probabilities, and the Favorite Toy, which was presumably designed with these sorts of samples in mind, is at its best.
I will point out that if you expand your sample just slightly, say to include all 45 players who hit at least 20 homers in 1990, the picture changes dramatically. I won't post the list, but the numbers of correct intervals are as follows. The FT has 14 correct 90 % intervals, 13 correct 80 % intervals, and 9 correct 50 % intervals, while the new method has 39 correct 90 % intervals, 32 correct 80 % intervals, and 17 correct 50 % intervals. The average lengths of the intervals over this sample of 45 players were 176, 148, and 83 for the FT, 250, 169, and 69 for the new method.