Reader Comments and Retorts
Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.
What I mean is that there is another analysis of The Favorite Toy in the May issue of "By The Numbers" (the newsletter of the SABR Statistical Analysis Committee) which came out just this month.
http://www.philbirnbaum.com/btn2003-05.pdf
It's by Shane Holmes and it starts on Page 15 of the above document. Holmes's study is on the overall class of players with established chances, and he concludes that over the entire sample, TFT does a pretty good job of predicting the total numbers of players who will reach the 3,000 hit and 500 homer benchmarks.
I'll give you the same reservations with this system as I give with PECOTA and with accuracy tests for run estimators, and virtually everything else out there:
You need to have data that is drawn from outside the sample to test the equations derived from the sample.
This is most easily done by selecting players by year of birth, throwing all odd-year players into one pool and all even-year players into the other. Derive your equations from one pool, and test against the other. Or, randomly choose 80% of the players for your sample, and test against the other 20%.
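For anyone who wants to try it, here is a minimal sketch of the two splits described above. The player records, the `birth_year` field, and both function names are made up for illustration; nothing here comes from the article's actual dataset or code.

```python
import random

def split_by_birth_year(players):
    """Odd birth years go to the fitting pool, even birth years to the test pool."""
    fit_pool = [p for p in players if p["birth_year"] % 2 == 1]
    test_pool = [p for p in players if p["birth_year"] % 2 == 0]
    return fit_pool, test_pool

def split_random(players, fit_fraction=0.8, seed=0):
    """Shuffle a copy of the list and cut it into fitting and test pools."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(players)
    rng.shuffle(shuffled)
    cut = int(round(fit_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical players, born 1950 through 1959 (placeholder data only).
players = [{"name": f"player_{i}", "birth_year": 1950 + i} for i in range(10)]

odd_pool, even_pool = split_by_birth_year(players)
fit_pool, test_pool = split_random(players)
```

Either way, the coefficients would be estimated on the first pool only, and the prediction intervals checked against the second.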
That said, it's a refreshing new way to look at the issue.
I did try using current career total as a predictor variable, but it didn't seem to be very useful. In general, since I wanted to avoid overfitting the data at the expense of prediction power, I didn't include a variable unless the evidence was overwhelming that the variable was important for prediction.
Tango,
I agree with you that some kind of cross-validation would be good. I should mention as a point in my defense, though, that I didn't just seek out the best possible fit. I insisted that the evidence in favor of using a variable be very strong before I used it, and I made certain that the variables and coefficients were consistent in certain ways. For example, I insisted that the coefficients for each age be largest for the most recent season and get progressively smaller.
Gerry,
I would like to be able to predict probabilities of reaching particular landmarks. To do that well, though, I will likely need a complete model instead of just a table of quantiles (Table 3). I'm working on it.
Jesse
It's entirely possible that I misunderstood how you were using the historical information, but if I'm not, then won't your predictions be a bit on the low side?
I appreciate your comments, and I acknowledge that there is a lot of room for improvement in the prediction method. I think, though, that you may be underestimating it in some ways. The dataset used in determining all the coefficients contains players who went through injuries (Mark McGwire is one who came back strong), players who changed ballparks, and players who were active during league-wide changes in home run rates. As a result, these possibilities are not ignored, but show up as additional width for all the prediction intervals. Since this additional width shows up only in an average sense for all players, though, you are certainly right that extra knowledge we have about specific players like Helton and Griffey could help improve the predictions for those players.
Your idea about modeling AB and HR rate separately is interesting.
Bill James, in his "Breaking The Wand" essay in the 1988 Abstract, talked about the "Dear Jackass" letters he used to write. Once in a while, he said, people would write him demanding that he do something that they were interested in, or complaining that he was doing things they weren't interested in. He would respond with the "Dear Jackass" letter, which said (paraphrasing) "Dear Jackass, if you want something done why don't you do it yourself. I work on what interests me, please don't bother me telling me what to do."
Terry, you're being a jackass.
Did you search for other variables? Seems to me that "Home Run decay rates" might be related to the "type" of player the hitter is. Perhaps their historical w/k ratio, OBP, AB/season, or perhaps position played may have a stronger impact than simple age-related adjustments. A Dave Kingman type who has no average or position skill is going to exit sooner than a middle infielder. Adjustments by position would show players like A-Rod to have more probability of hitting the upper bands. Without discovering another factor, I doubt you can get any more accuracy out of this. It is a toy, but if you find a predictive relationship with other stats, you may accomplish something significant.
Tinkerers can examine players with injuries by using their last two full seasons in the formula, where exceptions are obvious, like Griffey. Wonder how McGwire would fit into such a model in 1998?
The version of the Favorite Toy that I used was one that I found in several places on the internet. I wasn't aware of the version that you describe. It may be better for older players, but I believe that the main difficulty with the FT, namely overly short confidence intervals, would remain.
The 684 that I list as the expected total for Bonds comes from the new method given in the article and not from the Favorite Toy, which would indeed have a higher estimate. I probably shouldn't have used the same name EXP for both estimates.
I didn't search very seriously for other variables. I agree that incorporating position and other variables might help in shortening the intervals.
Terry,
You seem to be under the impression that I am somehow on staff at Baseball Primer. I'm not. The editor was simply kind enough to give me a chance to, as described in the submission policy for the Visitor's Dugout, "work through an idea with some intelligent baseball fans."
I had fun writing this article, and I can't help smiling when you write about "hard, exacting, time-consuming work."
Since Jesse did not present his system in this forecasting light, but more along the lines of improving a Toy, it should be evaluated on that.
I do agree that you should break up the "playing time" and the "rate" stats into 2, as well as include a park or era adjustment. However, the era adjustment especially will be bothersome, because you wouldn't have predicted what happened in the last 10 years. So, from that standpoint, you are going to increase your margin for error (as well you should).
Any of the Coors hitters has to be evaluated in terms of "when are they going to leave Coors"? And, of course, you have to add a smidge to every other hitter in the league to account for "when will they play half their home games at Coors"? To do that, you need to look at contract status, age, etc.
So, you have to be very very careful what parameters you choose to introduce, and how far you are willing to take them.
Finally, even if Jesse's work is deemed to be "useless" by some, it contains enough original and thoughtful information that it inspires original and thoughtful replies to his work.
And I do commend Terry on at least having the b-lls to include his email address with his link. His comments are probably not respected by many, if any, readers here, but at least he didn't post them with a bag over his head.
I would suggest to Terry and others that if you want people to take your words more seriously, you be more polite in trying to say what you want to say. What ends up happening is that people will ignore you, and will only remember your attitude. If you saw Jesse face-to-face, would you actually speak as you did?
The coefficients for age 32 are 2.14 and 0.79. Thanks for catching that omission.
Thanks for putting my method to the test. Your numbers are right on, and they don't look good for my method. You were even kind enough not to point out that my 50% confidence intervals were correct for only 5 out of the 21 players in your sample. For samples such as the one you selected, my confidence intervals don't quite have their nominal coverage probabilities, and the Favorite Toy, which was presumably designed with these sorts of samples in mind, is at its best.
I will point out that if you expand your sample just slightly, say to include all 45 players who hit at least 20 homers in 1990, the picture changes dramatically. I won't post the list, but the numbers of correct intervals are as follows. The FT has 14 correct 90% intervals, 13 correct 80% intervals, and 9 correct 50% intervals, while the new method has 39 correct 90% intervals, 32 correct 80% intervals, and 17 correct 50% intervals. The average lengths of the 90%, 80%, and 50% intervals over this sample of 45 players were 176, 148, and 83 for the FT, and 250, 169, and 69 for the new method.
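To put those counts on the same scale as the nominal levels, here is a quick sketch that turns them into observed coverage rates. The counts and the sample size of 45 are the ones quoted above; the variable and function names are just illustrative.

```python
N_PLAYERS = 45                      # players with at least 20 homers in 1990
NOMINAL_LEVELS = [0.90, 0.80, 0.50]

# Number of intervals that contained the actual total, by nominal level.
correct_counts = {
    "Favorite Toy": [14, 13, 9],
    "new method": [39, 32, 17],
}

def coverage(counts, n):
    """Fraction of players whose actual total fell inside the interval."""
    return [c / n for c in counts]

for method, counts in correct_counts.items():
    for nominal, observed in zip(NOMINAL_LEVELS, coverage(counts, N_PLAYERS)):
        print(f"{method}: nominal {nominal:.0%}, observed {observed:.1%}")
```

That works out to roughly 31%, 29%, and 20% observed coverage for the FT against 87%, 71%, and 38% for the new method, so on this sample both fall short of nominal, the FT by much more.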