Baseball for the Thinking Fan

Primate Studies
— Where BTF's Members Investigate the Grand Old Game

Thursday, September 18, 2003

Beyond the Favorite Toy

Can the Favorite Toy become the Favorite Tool?

For as long as there have been baseball players, baseball fans,
and readily-available statistics, fans have speculated about the future
statistics of their favorite players.  Special interest has long centered
on round numbers such as 500 home runs, 3000 hits, and 300 wins.  In the
past few years, with Hank Aaron’s home run record seemingly at risk,
increasing interest has focused on the question of which, if any, of
the current crop of stars will break that record.

One well-known method for addressing these sorts of questions is the
Favorite Toy, which was developed by Bill James and presented in his annual
Baseball Abstracts during the 1980s.  To estimate the probability of a
player reaching a certain career total of, say, home runs, one begins by
computing that player’s established level via the formula
EL = (3*Y0 + 2*Y1 + Y2)/6, where Y0 is the total for the most recent year,
Y1 is the total for the previous year, and Y2 is the total for the year
before that.  One then estimates the number of years remaining in the
player’s career via the formula YR = max(0.6*(40-age), 1.5).  Thus a
20-year-old player is estimated to have 12 years remaining, and no active
player is estimated to have fewer than 1.5 years remaining.  The player’s
expected remaining home run total is then the product EXP = EL*YR.  The
probability P that the player in fact hits at least X additional home runs
in his career is estimated via P = EXP/X - 0.5, where P is of course
restricted to the interval [0,1].

Example 1: Player A has 200 home runs through his age 26 season and has
hit exactly 40 home runs in each of the past three years.  We thus compute
that his established home run level is EL = 40.  We estimate his remaining
years as YR = 0.6(40-26) = 8.4 years.  His expected remaining home run total
is thus 40*8.4 = 336. Since he needs 300 additional home runs to reach 500,
his chance of reaching 500 career homers is estimated as
P = 336/300 - 0.5 = 0.62.
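
The calculation in Example 1 can be sketched in a few lines of Python.  This
is only an illustration of the formulas quoted above; the function name, the
argument names, and the handling of a milestone that has already been reached
are assumptions made for the sketch and do not come from Bill James.

```python
def favorite_toy(age, y0, y1, y2, needed):
    """Favorite Toy estimate as described in the text.

    y0, y1, y2: totals for the most recent season and the two before it.
    needed: additional total X required to reach the milestone.
    Returns (expected_remaining, probability).
    """
    established_level = (3 * y0 + 2 * y1 + y2) / 6
    years_remaining = max(0.6 * (40 - age), 1.5)
    expected_remaining = established_level * years_remaining
    if needed <= 0:
        # Assumption for this sketch: the milestone is already reached.
        return expected_remaining, 1.0
    probability = expected_remaining / needed - 0.5
    # Restrict P to the interval [0, 1], as the text requires.
    probability = min(1.0, max(0.0, probability))
    return expected_remaining, probability


# Player A: age 26, 40 home runs in each of the last three years,
# 300 more needed to reach 500.
exp_a, p_a = favorite_toy(age=26, y0=40, y1=40, y2=40, needed=300)
print(round(exp_a), round(p_a, 2))  # 336 0.62
```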

This Favorite Toy formula has a number of nice properties.  It
completely specifies an estimated distribution for the player’s remaining
home run total, and this distribution can be used to find upper bounds,
lower bounds, and confidence intervals of any desired level.  Unfortunately,
since P reaches 1 at X = EXP/1.5 and falls to 0 at X = 2*EXP, this estimated
distribution is entirely contained in the interval [EXP/1.5, 2*EXP], which
is unrealistically short.  Consequently, as we will see later on, confidence
intervals produced from the formula are much too short to achieve their
nominal coverage probabilities.

Example 2: Using player A of Example 1, let us construct a 90 % confidence
interval for the player’s career home run total.  One such interval is the
interval between the value that the player will exceed 95 % of the time and
the value that he will exceed only 5 % of the time.  To find the former,
one solves 0.95 = 336/X - 0.5, getting X = 336/1.45 = 232.  To find the
latter, one solves 0.05 = 336/X - 0.5, getting X = 336/0.55 = 611.  Thus
the interval (232, 611) is a 90 % confidence interval for the player’s
remaining homers, and (432,811) is a 90 % confidence interval for the
player’s career total.
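
Example 2 amounts to inverting P = EXP/X - 0.5 for X.  A minimal sketch of
that inversion follows; the function names are hypothetical, and the interval
is taken to be the equal-tailed one used in the example.

```python
def favorite_toy_quantile(expected_remaining, p):
    """Remaining total X that the player exceeds with probability p,
    found by solving p = EXP/X - 0.5 for X."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return expected_remaining / (p + 0.5)


def favorite_toy_interval(expected_remaining, level):
    """Equal-tailed interval for remaining total at the given level (e.g. 0.90)."""
    tail = (1.0 - level) / 2.0
    low = favorite_toy_quantile(expected_remaining, 1.0 - tail)
    high = favorite_toy_quantile(expected_remaining, tail)
    return low, high


low, high = favorite_toy_interval(336, 0.90)
print(round(low), round(high))              # 232 611
print(round(low) + 200, round(high) + 200)  # 432 811 (career totals)
```

Setting p to 1 and 0 recovers the endpoints EXP/1.5 and 2*EXP, which is why
no Favorite Toy interval can extend beyond that range.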

In this article, we develop a series of formulas, one for each age,
which produce confidence intervals which do achieve their nominal coverage
probabilities.  We then apply those formulas to produce confidence intervals
for prominent home run hitters active in the 2002 season.


Development of the Formulas:

Postponing for the moment an examination of the accuracy of the
Favorite Toy formula, we first develop a set of formulas to compete with
the Favorite Toy in predicting future home run totals and forming confidence
intervals.  A natural approach to take here is that of trying to find
formulas which fit baseball’s actual historical record.  With this goal in
mind, the database from www.baseball1.com was downloaded, and a dataset
was produced that contained for almost every player-season in baseball
history the following variables: name, age (as of 12 AM on July 1), home
runs in that season and each of the previous 4 years, and actual remaining
career home run total.  Those entries corresponding to players who are still
active or who hit 0 homers in the year in question were then eliminated
from the dataset.

Separately for each age, remaining career home run totals were
regressed against the home run totals in the current and the previous 4
seasons.  The previous home run totals were then removed from the model
chronologically until only those variables which had positive coefficients
and made a genuine contribution to predicting the remaining career home run
totals were left.  The square of the home run total was also considered as a
variable, but it was in no case found to be useful.  The prediction formulas which were
obtained were of the form EXP = c0*Y0 + c1*Y1 + c2*Y2, where the coefficients
are as tabled below:

Table 1:

Age c0 c1 c2

19 21.47
20 16.21
21 11.68 3.01
22 9.51 1.76
23 7.88 1.87
24 5.98 2.47
25 4.43 1.91 1.35
26 3.93 1.43 1.03
27 3.51 1.07 1.01
28 3.04 1.03 0.90
29 1.94 1.53 0.80
30 2.46 1.17
31 2.58 0.74
32 2.14 0.79
33 1.71 0.83
34 1.85 0.43
35 1.60 0.37
36 1.74
37 1.54
38 1.46
39 1.14
40 0.82
41 0.47

What these prediction formulas suggest is that only the home run
totals in the most recent year and the two previous years are relevant in
predicting remaining home runs.  For very young and very old players, only
the most recent year’s home run total is important.  Most of these prediction
formulas account for between 60 % and 70 % of the variability in remaining
home run totals, with the formulas for very old and very young players
explaining somewhat less of the variability.

Example 3:  We use these formulas to compute the expected remaining home run
total of Player A of Example 1.  We have that Y0 = Y1 = Y2 = 40, and the
player is age 26.  Thus EXP = 3.93*40 + 1.43*40 + 1.03*40 = 256, and we would
expect player A to have a career total of 200 + 256 = 456 home runs.  This
may be contrasted with the Favorite Toy’s estimate of 536 career home runs.
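
The lookup in Example 3 can be sketched as follows.  The dictionary simply
transcribes Table 1; the age-32 row is taken from the author's follow-up
comment in the discussion thread (2.14 and 0.79).  The names TABLE_1 and
expected_remaining are hypothetical.

```python
# Table 1: age -> (c0, c1, c2), with trailing coefficients omitted where the
# table leaves them blank.  Age 32 comes from the author's later comment.
TABLE_1 = {
    19: (21.47,), 20: (16.21,),
    21: (11.68, 3.01), 22: (9.51, 1.76), 23: (7.88, 1.87), 24: (5.98, 2.47),
    25: (4.43, 1.91, 1.35), 26: (3.93, 1.43, 1.03), 27: (3.51, 1.07, 1.01),
    28: (3.04, 1.03, 0.90), 29: (1.94, 1.53, 0.80),
    30: (2.46, 1.17), 31: (2.58, 0.74), 32: (2.14, 0.79), 33: (1.71, 0.83),
    34: (1.85, 0.43), 35: (1.60, 0.37),
    36: (1.74,), 37: (1.54,), 38: (1.46,), 39: (1.14,), 40: (0.82,),
    41: (0.47,),
}


def expected_remaining(age, recent_totals):
    """EXP = c0*Y0 + c1*Y1 + c2*Y2 for the given age.

    recent_totals lists home run totals starting with the most recent
    season (Y0, Y1, Y2); extra seasons beyond those used are ignored.
    """
    coeffs = TABLE_1[age]
    if len(recent_totals) < len(coeffs):
        raise ValueError("not enough seasons supplied for this age")
    return sum(c * y for c, y in zip(coeffs, recent_totals))


exp_a = expected_remaining(26, (40, 40, 40))
print(round(exp_a), 200 + round(exp_a))  # 256 456
```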

Given these formulas, we would now like to derive confidence
intervals for future home run totals.  To do this, we computed for each
player-season the difference between the true number of remaining home runs
and the predicted value.  We then regressed the absolute values of these
differences on the predicted values, obtaining a linear formula giving the
magnitude for what we might call a typical error.  The motivation for this
step was the feeling, confirmed by graphs, that the magnitude of the errors
produced by the prediction formulas depended on the size of the predicted
values.  These linear formulas were of the form TE = d0 + d1*EXP, where d0
and d1 were as given below:

Table 2:

Age d0 d1

19 49.9 0.45
20 53.5 0.25
21 39.2 0.26
22 28.7 0.35
23 22.0 0.36
24 16.3 0.39
25 11.9 0.43
26 9.2 0.43
27 6.1 0.46
28 6.5 0.45
29 4.7 0.50
30 3.9 0.54
31 3.2 0.56
32 2.3 0.59
33 2.3 0.61
34 2.8 0.56
35 1.7 0.64
36 1.2 0.63
37 1.8 0.63
38 2.1 0.60
39 2.1 0.50
40 1.5 0.55
41 0.9 0.79

We then computed for each player-season the ratio between the
actual error and the typical error TE.  Within each age group, we found
the 0.05, 0.10, 0.25, 0.75, 0.90, and 0.95 quantiles of the distribution
of these ratios.  These quantiles, tabled below, were then used as the
coefficients for confidence intervals.

Table 3:

Age q(.95) q(.90) q(.75) q(.25) q(.10) q(.05)

19 3.92 2.57 1.00 -0.36 -0.68 -0.88
20 3.89 2.44 0.84 -0.33 -0.70 -1.00
21 4.06 2.13 0.69 -0.38 -0.73 -0.99
22 3.76 2.21 0.55 -0.47 -0.81 -1.08
23 3.49 1.96 0.51 -0.56 -1.00 -1.28
24 3.32 1.91 0.50 -0.58 -1.03 -1.25
25 3.01 1.76 0.44 -0.71 -1.08 -1.32
26 2.74 1.61 0.32 -0.77 -1.17 -1.37
27 2.40 1.40 0.24 -0.88 -1.23 -1.42
28 2.45 1.43 0.21 -0.88 -1.25 -1.44
29 2.39 1.33 0.19 -0.90 -1.25 -1.42
30 2.23 1.27 0.17 -0.94 -1.25 -1.38
31 2.25 1.29 0.09 -0.94 -1.26 -1.36
32 2.29 1.22 0.06 -1.01 -1.26 -1.35
33 2.22 1.20 0.09 -0.95 -1.25 -1.36
34 2.01 1.13 0.04 -0.95 -1.24 -1.38
35 2.10 1.20 -0.07 -1.01 -1.22 -1.30
36 2.23 0.96 0.01 -1.01 -1.23 -1.35
37 2.52 1.16 -0.02 -0.96 -1.22 -1.30
38 2.62 1.32 0.07 -0.92 -1.13 -1.32
39 1.83 1.47 0.12 -0.91 -1.26 -1.39
40 3.15 1.12 0.09 -0.86 -1.09 -1.32
41 2.73 1.23 -0.36 -0.58 -0.86 -1.03

To find a confidence interval for the remaining home run total for
a player using this method, one proceeds in the following way, making sure
to use the row in each table corresponding to the age of the player in the
most recent season:

1. Using the coefficients in Table 1, find the expected remaining home run
total EXP for the player.

2. Using the coefficients in Table 2, find the typical error TE = d0 + d1*EXP.

3. For a 90 % confidence interval, use the interval
(EXP + TE*q(0.05), EXP + TE*q(0.95)).  For an 80 % confidence interval,
use the interval (EXP + TE*q(0.10), EXP + TE*q(0.90)).  For a 50 % confidence
interval, use the interval (EXP + TE*q(0.25), EXP + TE*q(0.75)).

To find a confidence interval for a player’s career total, simply
add his current career total to each coordinate of the confidence interval
for his remaining home runs.

Example 4:  We compute a 90 % confidence interval for the remaining home
runs of player A of Example 1.  We saw in Example 3 that EXP = 256.  Thus
the typical error, using Table 2 and the fact that player A is 26, is
TE = 9.2 + 0.43*256 = 119.  Since q(0.05) = -1.37 and q(0.95) = 2.74 for
age 26, our confidence interval goes from 256-1.37*119 = 93 to
256+2.74*119 = 582.  Our 90 % confidence interval for his career total is
then (293, 782).  This may be contrasted with the interval (432, 811)
produced by the Favorite Toy.
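
The three-step procedure, applied as in Example 4, can be sketched as
follows.  Only the rows for ages 25 to 27 of Tables 2 and 3 are transcribed
to keep the sketch short, and the function and variable names are
hypothetical.  The article rounds at each step, so unrounded arithmetic can
differ from the printed endpoints by about one home run.

```python
# Excerpt of Table 2: age -> (d0, d1).
TABLE_2 = {25: (11.9, 0.43), 26: (9.2, 0.43), 27: (6.1, 0.46)}

# Excerpt of Table 3: age -> {quantile level: coefficient}.
TABLE_3 = {
    25: {0.95: 3.01, 0.90: 1.76, 0.75: 0.44, 0.25: -0.71, 0.10: -1.08, 0.05: -1.32},
    26: {0.95: 2.74, 0.90: 1.61, 0.75: 0.32, 0.25: -0.77, 0.10: -1.17, 0.05: -1.37},
    27: {0.95: 2.40, 0.90: 1.40, 0.75: 0.24, 0.25: -0.88, 0.10: -1.23, 0.05: -1.42},
}

# Which pair of quantiles to use for each supported confidence level (percent).
LEVELS = {90: (0.05, 0.95), 80: (0.10, 0.90), 50: (0.25, 0.75)}


def remaining_interval(age, expected, level_pct):
    """Interval for remaining home runs: (EXP + TE*q_low, EXP + TE*q_high)."""
    d0, d1 = TABLE_2[age]
    typical_error = d0 + d1 * expected
    q_low, q_high = LEVELS[level_pct]
    quantiles = TABLE_3[age]
    return (expected + typical_error * quantiles[q_low],
            expected + typical_error * quantiles[q_high])


# Player A: age 26, EXP = 256 (from Example 3), 200 career home runs so far.
low, high = remaining_interval(26, 256, 90)
print(low, high)              # close to the (93, 582) of Example 4
print(200 + low, 200 + high)  # career total, close to (293, 782)
```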

These intervals have, by construction, exactly the confidence
levels specified, up to rounding error.  That is, if one computes 90 %
confidence intervals for all of the player-seasons of some particular age
in the dataset, then 90 % of those intervals will contain the true number
of remaining home runs for the players in question.  As long as players
continue to age in roughly the same way, we can expect our confidence
intervals for active and future players to be almost as good.

Comparing Coverage Probabilities:

For each player-season in the dataset, 90%, 80%, and 50% intervals
based on the Favorite Toy method were computed.  The proportions of those
intervals that contained the true values were as tabled below:

Table 4:

Nominal Coverage Probabilities:
Age 90 % 80 % 50 %

19 0.158 0.126 0.084
20 0.214 0.206 0.122
21 0.222 0.180 0.113
22 0.285 0.261 0.158
23 0.275 0.248 0.151
24 0.262 0.233 0.148
25 0.260 0.234 0.156
26 0.262 0.238 0.148
27 0.256 0.235 0.145
28 0.227 0.208 0.121
29 0.204 0.186 0.125
30 0.204 0.185 0.121
31 0.193 0.173 0.111
32 0.201 0.178 0.118
33 0.175 0.168 0.109
34 0.179 0.166 0.100
35 0.171 0.147 0.100
36 0.196 0.172 0.116
37 0.206 0.180 0.118
38 0.217 0.199 0.111
39 0.246 0.225 0.155
40 0.154 0.121 0.099
41 0.109 0.091 0.036

We see from the table that for no age does the nominal 90 %
confidence interval achieve a coverage probability exceeding 30 %, and for
no age does the nominal 50 % confidence interval achieve a coverage
probability exceeding 16 %.  The obvious conclusion here is that the
Favorite Toy, while easy to work with, does not produce anything even
approximating a reasonable estimate of the probabilities it attempts to
explore.
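
The coverage proportions in Table 4 are simply the fraction of intervals
that contain the true remaining total.  A minimal sketch of that computation
follows; the function name and the sample numbers are made up, and treating
the endpoints as inside the interval is an assumption.

```python
def coverage(intervals, actuals):
    """Fraction of (low, high) intervals that contain the matching actual value."""
    if len(intervals) != len(actuals):
        raise ValueError("need one actual value per interval")
    hits = sum(1 for (low, high), actual in zip(intervals, actuals)
               if low <= actual <= high)
    return hits / len(actuals)


# Hypothetical intervals for remaining home runs and the totals actually hit.
sample_intervals = [(232, 611), (100, 250), (40, 90), (10, 30)]
sample_actuals = [180, 240, 95, 12]
print(coverage(sample_intervals, sample_actuals))  # 0.5
```

A nominal 90 % interval is doing its job when this fraction is close to 0.90
over a large set of player-seasons.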

Confidence Intervals for Active Players:

Given below are confidence intervals, produced by the new method,
for career home run totals for the active players who, after the 2002
season, were predicted by the new system to finish with more than 300 home
runs.  Thus, for example, Barry Bonds was predicted to finish with 684
home runs.  He was estimated to have a 50 % chance of finishing with 639
to 683 home runs, and a 90 % chance of finishing with 623 to 801 home runs.
The intervals are quite wide for younger players, but history suggests
that they need to be.

Player              Age  EXP  90 % Conf.   50 % Conf.

Barry Bonds          37  684  (623,  801)  (639, 683)
Alex Rodriguez       26  639  (425, 1065)  (519, 688)
Sammy Sosa           33  636  (519,  826)  (554, 644)
Rafael Palmeiro      37  556  (500,  666)  (514, 555)
Fred McGriff         38  522  (484,  596)  (496, 524)
Jim Thome            31  504  (370,  726)  (412, 513)
Ken Griffey          32  503  (472,  554)  (480, 504)
Albert Pujols        22  459  (282, 1079)  (382, 550)
Vladimir Guerrero    26  456  (298,  773)  (367, 493)
Jeff Bagwell         34  454  (393,  543)  (412, 456)
Andruw Jones         25  454  (285,  837)  (363, 510)
Juan Gonzalez        32  450  (411,  516)  (421, 451)
Manny Ramirez        30  439  (338,  603)  (370, 452)
Mike Piazza          33  433  (359,  555)  (381, 438)
Frank Thomas         34  430  (384,  495)  (398, 431)
Troy Glaus           25  423  (251,  814)  (330, 480)
Gary Sheffield       33  413  (349,  516)  (368, 417)
Shawn Green          29  410  (278,  631)  (326, 427)
Matt Williams        36  395  (376,  427)  (380, 395)
Ellis Burks          37  394  (352,  477)  (363, 394)
Larry Walker         35  391  (342,  469)  (353, 388)
Andres Galarraga     41  390  (386,  402)  (388, 389)
Carlos Delgado       30  389  (289,  550)  (321, 401)
Eric Chavez          24  387  (229,  807)  (314, 451)
Mo Vaughn            34  373  (332,  433)  (345, 374)
Greg Vaughn          36  366  (352,  388)  (356, 366)
Todd Helton          28  365  (240,  579)  (289, 384)
Jason Giambi         31  362  (256,  538)  (288, 369)
Chipper Jones        30  361  (275,  501)  (303, 372)
Ron Gant             37  348  (323,  396)  (329, 347)
Miguel Tejada        26  338  (202,  609)  (262, 370)
Alfonso Soriano      24  338  (182,  751)  (265, 400)
Tino Martinez        34  337  (292,  403)  (306, 339)
Lance Berkman        26  336  (185,  639)  (251, 372)
Robin Ventura        34  334  (285,  406)  (300, 335)
Jeff Kent            34  331  (267,  424)  (287, 333)
Raul Mondesi         31  327  (256,  444)  (278, 332)
Richie Sexson        27  326  (200,  540)  (248, 348)
Scott Rolen          27  326  (211,  519)  (255, 345)
Adam Dunn            22  326  (189,  803)  (266, 396)
Luis Gonzalez        34  325  (262,  417)  (282, 327)
Magglio Ordonez      28  325  (202,  535)  (250, 343)
David Justice        36  324  (306,  354)  (311, 324)
Pat Burrell          25  322  (170,  668)  (240, 372)
Ryan Klesko          31  321  (243,  450)  (267, 326)
Tim Salmon           33  321  (275,  396)  (289, 324)
Brian Giles          31  313  (214,  479)  (244, 320)
Tony Batista         28  313  (202,  502)  (245, 329)
Jim Edmonds          32  305  (235,  423)  (252, 308)
Eric Karros          34  301  (273,  340)  (282, 301)

A Final Note: 

Because the coefficients recorded in Tables 1, 2, and 3 were derived
based on working with home run totals, they are almost certainly not
appropriate coefficients to use for other statistics such as hits or stolen
bases.  The method by which these coefficients were derived should,
however, be applicable to other statistics.

Jesse Frey Posted: September 18, 2003 at 06:00 AM | 17 comment(s)

Reader Comments and Retorts


   1. Michael Posted: September 19, 2003 at 02:41 AM (#612989)
Yes, MKT, your analysis of a flaw in the Post Season odds report is very, very true. I had calculated a similar amount of information right around mid-June when the Twins were at their best. I had figured out that if teams were really at their observed win% (taking into account SoS) that the Twins would win 105 +/- 4.5 games and had a bet with a friend that the Twins would win at least 100 games. Of course they then lost something like 22 of the next 30 to make my bet look really, really silly. They will likely win about 90 and that isn't surprising given regression to the mean.
   2. User unknown in local recipient table (Craig B) Posted: September 19, 2003 at 02:41 AM (#612990)
It never rains but it pours, eh? (Not to be taken as a reference to the weather in the East today. Man, is it raining here in Toronto).

What I mean is that there is another analysis of The Favorite Toy in the May issue of "By The Numbers" (the newsletter of the SABR Statistical Analysis Committee) which came out just this month.

http://www.philbirnbaum.com/btn2003-05.pdf

It's by Shane Holmes and it starts on Page 15 of the above document. Holmes's study is on the overall class of players with established chances, and he concludes that over the entire sample, TFT does a pretty good job of predicting the total numbers of players who will reach the 3,000 hit and 500 homer benchmarks.
   3. tangotiger Posted: September 19, 2003 at 02:41 AM (#612991)
Good job all-around!

I'll give you the same reservations with this system as I give with PECOTA and with accuracy tests for run estimators, and virtually everything else out there:

You need to have data that is drawn from outside the sample to test the equations derived from the sample.

This is most easily done by selecting players by year of birth, and throwing all odd-year players into 1 pool, and even-year players into the other pool. Derive your equations from 1, and test against the other. Or, randomly choose 80% of the players for your sample, and test against the other 20%.

That said, it's a refreshing new way to look at the issue.
   4. User unknown in local recipient table (Craig B) Posted: September 19, 2003 at 02:41 AM (#612994)
By the way, I just finished reading the whole piece. This is amazing stuff. Fully deserving of congratulations.
   6. Jesse Frey Posted: September 19, 2003 at 02:41 AM (#612998)
MKT,

I did try using current career total as a predictor variable, but it didn't seem to be very useful. In general, since I wanted to avoid overfitting the data at the expense of prediction power, I didn't include a variable unless the evidence was overwhelming that the variable was important for prediction.

Tango,

I agree with you that some kind of cross-validation would be good. I should mention as a point in my defense, though, that I didn't just seek out the best possible fit. I insisted that the evidence in favor of using a variable be very strong before I used it, and I made certain that the variables and coefficients were consistent in certain ways. For example, I insisted that the coefficients for each age be largest for the most recent season and get progressively smaller.

Gerry,

I would like to be able to predict probabilities of reaching particular landmarks. To do that well, though, I will likely need a complete model instead of just a table of quantiles (Table 3). I'm working on it.


Jesse
   7. Ziggy Posted: September 19, 2003 at 02:41 AM (#612999)
If you're basing your calculations on actual, historical, performance, can they take into account the changes in baseball environment that have occurred in the past few years? That is, if one of your assumptions is that players will continue to produce home runs at rates comparable with those of previous decades, doesn't that overlook the fact that something changed in the mid 1990s?

It's entirely possible that I misunderstood how you were using the historical information, but if I'm not, then won't your predictions be a bit on the low side?
   8. Ned Garvin: Male Prostitute Posted: September 19, 2003 at 02:41 AM (#613001)
I think it would be interesting, if not necessarily enlightening, to do this for sometime in the past. In other words, take some year, any year, at least 20 years ago. Test young-ish players from that year, and see how many are supposed to get 2000 or 3000 hits, 300/400/500 HRs, etc, and see how it all turned out. If the predictions are reasonably close, I think that is a strong statement.
   9. Jesse Frey Posted: September 19, 2003 at 02:41 AM (#613003)
FJM,

I appreciate your comments, and I acknowledge that there is a lot of room for improvement in the prediction method. I think, though, that you may be underestimating it in some ways. The dataset used in determining all the coefficients contains players who went through injuries (Mark McGwire is one who came back strong), players who changed ballparks, and players who were active during league-wide changes in home run rates. As a result, these possibilities are not ignored, but show up as additional width for all the prediction intervals. Since this additional width shows up only in an average sense for all players, though, you are certainly right that extra knowledge we have about specific players like Helton and Griffey could help improve the predictions for those players.

Your idea about modeling AB and HR rate separately is interesting.
   10. User unknown in local recipient table (Craig B) Posted: September 20, 2003 at 02:41 AM (#613009)
Terry,

Bill James, in his "Breaking The Wand" essay in the 1988 Abstract, talked about the "Dear Jackass" letters he used to write. Once in a while, he said, people would write him demanding that he do something that they were interested in, or complaining that he was doing things they weren't interested in. He would respond with the "Dear Jackass" letter, which said (paraphrasing) "Dear Jackass, if you want something done why don't you do it yourself. I work on what interests me, please don't bother me telling me what to do."

Terry, you're being a jackass.
   11. trantor Posted: September 21, 2003 at 02:41 AM (#613014)
Jesse;

Did you search for other variables? Seems to me that "Home Run decay rates" might be related to the "type" of player the hitter is. Perhaps their historical w/k ratio, OBP, AB/season, or perhaps position played may have a stronger impact than simple age-related adjustments. A Dave Kingman type who has no average or position skill is going to exit sooner than a middle infielder. Adjustments by position would show players like A-Rod to have more probability of hitting the upper bands. Without discovering another factor, I doubt you can get any more accuracy out of this. It is a toy, but if you find a predictive relationship with other stats, you may accomplish something significant.

Tinkerers can examine players with injuries by using their last two full seasons in the formula, where exceptions are obvious, like Griffey. Wonder how McGwire would fit into such a model in 1998?
   12. Jesse Frey Posted: September 21, 2003 at 02:41 AM (#613015)
Steve,

The version of the Favorite Toy that I used was one that I found in several places on the internet. I wasn't aware of the version that you describe. It may be better for older players, but I believe that the main difficulty with the FT, namely overly short confidence intervals, would remain.

The 684 that I list as the expected total for Bonds comes from the new method given in the article and not from the Favorite Toy, which would indeed have a higher estimate. I probably shouldn't have used the same name EXP for both estimates.
   13. Jesse Frey Posted: September 22, 2003 at 02:41 AM (#613020)
Trantor,

I didn't search very seriously for other variables. I agree that incorporating position and other variables might help in shortening the intervals.

Terry,

You seem to be under the impression that I am somehow on staff at Baseball Primer. I'm not. The editor was simply kind enough to give me a chance to, as described in the submission policy for the Visitor's Dugout, "work through an idea with some intelligent baseball fans."

I had fun writing this article, and I can't help smiling when you write about "hard, exacting, time-consuming work."
   14. tangotiger Posted: September 22, 2003 at 02:41 AM (#613022)
To go beyond the age as a variable is to now accept that we are beyond the "Toy" stage. For example, if you are going to consider the player's position, health, profile, skillset, speed, etc, you are now into the "forecasting" game. This is what your professional forecasters do. Nate Silver's PECOTA is an example of forecasting, along with his confidence intervals, similar to what Jesse is showing with his confidence intervals.

Since Jesse did not present his system in this forecasting light, but more along the lines of improving a Toy, it should be evaluated on that.

I do agree that you should break up the "playing time" and the "rate" stats into 2, as well as include a park or era adjustment. However, the era adjustment especially will be bothersome, because you wouldn't have predicted what happened in the last 10 years. So, from that standpoint, you are going to increase your margin for error (as well you should).

Any of the Coors hitters has to be evaluated in terms of "when are they going to leave Coors"? And, of course, you have to add a smidge to every other hitter in the league to account for "when will they play half their home games at Coors"? To do that, you need to look at contract status, age, etc.

So, you have to be very very careful what parameters you choose to introduce, and how far you are willing to take them.

Finally, even if Jesse's work is deemed to be "useless" by some, it contains enough original and thoughtful information that it inspires original and thoughtful replies to his work.

And I do commend Terry on at least having the b-lls to include his email address with his link. His comments are probably not respected by many, if any, readers here, but at least he didn't post them with a bag over his head.

I would suggest to Terry and others that if you want people to take your words more seriously, that you be more polite in trying to say what you want to say. What ends up happening is that people will ignore you, and will only remember your attitude. If you saw Jesse face-to-face, would you actually speak as you did?

   15. Rickey! trades in sheep and threats Posted: September 22, 2003 at 02:41 AM (#613026)
The day baseball is in any way "work" is the day to stop watching, following, obsessing over or playing baseball. Jeez. For the love of Pete, Terry. It's all frivolous. It's a bunch of obscenely wealthy guys that will never speak to you hitting a small ball around a glorified cow pasture.
   16. Jesse Frey Posted: September 23, 2003 at 02:41 AM (#613032)
Joe,

The coefficients for age 32 are 2.14 and 0.79. Thanks for catching that omission.
   17. Jesse Frey Posted: September 28, 2003 at 02:42 AM (#613055)
Edge,

Thanks for putting my method to the test. Your numbers are right on, and they don't look good for my method. You were even kind enough not to point out that my 50% confidence intervals were correct for only 5 out of the 21 players in your sample. For samples such as the one you selected, my confidence intervals don't quite have their nominal coverage probabilities, and the Favorite Toy, which was presumably designed with these sorts of samples in mind, is at its best.

I will point out that if you expand your sample just slightly, say to include all 45 players who hit at least 20 homers in 1990, the picture changes dramatically. I won't post the list, but the numbers of correct intervals are as follows. The FT has 14 correct 90 % intervals, 13 correct 80 % intervals, and 9 correct 50 % intervals, while the new method has 39 correct 90 % intervals, 32 correct 80 % intervals, and 17 correct 50 % intervals. The average lengths of the intervals over this sample of 45 players were 176, 148, and 83 for the FT, 250, 169, and 69 for the new method.
