Baseball for the Thinking Fan

Primate Studies
— Where BTF's Members Investigate the Grand Old Game

## Tuesday, April 09, 2002

#### 2001 Projections - A Look Back; Projections Part I

Whose crystal ball is best?

It's time once again to take a look at how the various projections did for the 2001 season. In 2000, I looked at a number of projection systems and how well they did in projecting the actual results of the various players. I looked at the top 170 players in terms of plate appearances (with some exceptions) and how each system did in projecting OPS. The results of that exercise can be found here.

Anyway, I've decided to do the same for the 2001 season, only I'll limit the analysis to myself, STATS, Inc. and Baseball Prospectus. The first thing I'll do is simply give you the results for the three systems for the top 170 players in plate appearances:

| Source | Corr. | r^2 | RMSE | MAE | MAPE |
| --- | --- | --- | --- | --- | --- |
| Voros | .748 | .559 | .081 | .061 | 7.5% |
| STATS, Inc. | .739 | .547 | .082 | .061 | 7.4% |
| Baseball Prospectus | .703 | .487 | .091 | .069 | 8.5% |

Corr. = Correlation Coefficient
r^2 = Square of Corr.
RMSE = Root Mean Squared Error
MAE = Mean Absolute Error
MAPE = Mean Absolute Percentage Error
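For readers who want to run the same comparison on their own numbers, here is a minimal Python sketch of the five measures defined above. The function name `accuracy_metrics` and the sample values are made up for illustration. Dividing each miss by the actual OPS when computing MAPE is an assumption, since the article does not say which denominator was used.

```python
import math

def accuracy_metrics(projected, actual):
    """Compare paired projected and actual values (e.g. OPS).

    Returns Corr., r^2, RMSE, MAE and MAPE. MAPE here divides each miss by
    the actual value, which is an assumption, not the article's stated method.
    """
    n = len(projected)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")

    # Pearson correlation coefficient between projections and results.
    mean_p = sum(projected) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(projected, actual))
    var_p = sum((p - mean_p) ** 2 for p in projected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    corr = cov / math.sqrt(var_p * var_a)

    # Error measures based on projected minus actual.
    errors = [p - a for p, a in zip(projected, actual)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = sum(abs(e) / a for e, a in zip(errors, actual)) / n

    return {"corr": corr, "r2": corr ** 2, "rmse": rmse, "mae": mae, "mape": mape}

# Hypothetical projected and actual OPS for four players (not real data).
projected_ops = [0.800, 0.750, 0.900, 0.700]
actual_ops = [0.820, 0.700, 0.950, 0.720]
metrics = accuracy_metrics(projected_ops, actual_ops)
print({name: round(value, 3) for name, value in metrics.items()})
```

Correlation only measures how well the two sets of numbers track each other, while RMSE, MAE and MAPE measure the size of the misses, so the two kinds of measure can disagree about which system looks better.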

Anyhow, before anybody gets out ahead of me, I'd like to talk about one of my favorite subjects: multiple endpoints. While multiple endpoints arguments generally occur when someone looks at a data set and then draws the restrictions to achieve a particular result, sometimes multiple endpoints become an issue even when no intent to rig the numbers is present.

This has happened here. If I extend the restrictions to the top 176 players, STATS winds up with a higher correlation coefficient than me. In other words, the two systems were about equal in their ability to project the OPS of the players. This was essentially the case the year before as well.
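One way to guard against the cutoff doing the talking is to recompute the correlation at several sample sizes and see whether the ranking of the systems holds up. The sketch below is a hypothetical illustration of that check. The function names, the tuple layout and the five sample players are all assumptions, not the data behind the article.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_by_cutoff(players, cutoffs):
    """Correlation of projected vs. actual OPS for the top-N players by PA.

    players: list of (plate_appearances, projected_ops, actual_ops) tuples.
    cutoffs: sample sizes to try, e.g. [170, 176].
    """
    ranked = sorted(players, key=lambda p: p[0], reverse=True)
    results = {}
    for n in cutoffs:
        top = ranked[:n]
        results[n] = pearson([p[1] for p in top], [p[2] for p in top])
    return results

# Made-up players: (plate appearances, projected OPS, actual OPS).
sample_players = [
    (700, 0.800, 0.810),
    (680, 0.750, 0.760),
    (650, 0.900, 0.910),
    (600, 0.700, 0.800),
    (590, 0.850, 0.700),
]
by_cutoff = correlation_by_cutoff(sample_players, [3, 4, 5])
for n, corr in by_cutoff.items():
    print(n, round(corr, 3))
```

If the order of the systems flips as the cutoff moves by a handful of players, the honest reading is that the systems are about equal.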

I also wanted to take a look at Baseball Prospectus' numbers. Last year BP's projections did every bit as well as mine and STATS', and it was surprising that they not only did worse than mine and STATS' this year, but also worse than they had done the year before. The first thing I checked was the Colorado factor. In 2000, none of the Colorado players were counted because I didn't release projections for them (for reasons I won't get into). I also noted that there were some weird projections for a few Colorado players (.978 OPS for Jeff Cirillo and a .654 OPS for Vinny Castilla). So I guessed that maybe their system had trouble dealing with the unique nature of Coors Field and gave unreliable results.

Anyway, I eliminated the four Rockies players from the 170 (Larry Walker, Todd Helton, Cirillo and Juan Pierre) and three more for whom Coors could have messed with the projections (Eric Young, Vinny Castilla and Ellis Burks) and redid the correlations with the remaining 163 players. The results were .728, .721 and .682 for me, STATS and BP respectively. BP still trails by about the same margin as before, so I'm guessing that Colorado was not the main difference between this year and last. That being the case, I'm not sure I can explain the discrepancy other than BP having an off year.

Another thing worth looking at is the players who were missed by the greatest amount. The five players with the largest percentage miss by my system were Brady Anderson, Cal Ripken, Jason Kendall, Edgardo Alfonzo and Barry Bonds. For STATS it was Bonds, Anderson, Kendall, Tim Salmon and Mark Quinn. For BP it was Kendall, Alfonzo, Bret Boone, Bonds and Johnny Damon. Of the nine players above, none of the systems posted an above-average projection, meaning that the differences were largely due to abnormal performances compared to their previous stats. The closest anybody came to an above-average projection for the nine was my projection for Mark Quinn (.819 projected, .757 actual).

What about players projected using minor league numbers? Well, I really don't know how and when STATS or BP used minor league stats, but there were 13 players out of the 170 who had fewer than 502 Major League appearances going into 2001. The correlation coefficients for these 13 were .60, .53 and .41 for me, STATS and BP respectively. While that looks much lower than overall, remember this is a very small sample and, more importantly, the range in performance for these 13 was much smaller than the overall range, and that drives those numbers down. The MAE (.037, .053, .051) and the RMSE (.045, .063, .068) are smaller than the overall numbers, but this again is due to the smaller range. The MAPE (5.0%, 7.3%, 6.8%) is probably the most appropriate metric here, and for it, the numbers are roughly the same.
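The point about range can be seen with a toy example. In the sketch below, two made-up groups of players are missed by exactly the same amounts (.040 either way), but one group's projections are spread over a wide range of OPS and the other's are bunched together. All names and numbers are hypothetical and chosen only to show the effect.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def mean_abs_error(projected, actual):
    """Average size of the miss, ignoring direction."""
    return sum(abs(p - a) for p, a in zip(projected, actual)) / len(projected)

# The same misses (actual minus projected) are applied to both groups.
misses = [0.04, -0.04, 0.04, -0.04, -0.04, 0.04, -0.04, 0.04]

# Wide group: projections run from .650 to 1.000.
wide_proj = [0.650, 0.700, 0.750, 0.800, 0.850, 0.900, 0.950, 1.000]
# Narrow group: projections run only from .780 to .850.
narrow_proj = [0.780, 0.790, 0.800, 0.810, 0.820, 0.830, 0.840, 0.850]

wide_actual = [p + m for p, m in zip(wide_proj, misses)]
narrow_actual = [p + m for p, m in zip(narrow_proj, misses)]

wide_corr = pearson(wide_proj, wide_actual)
narrow_corr = pearson(narrow_proj, narrow_actual)
wide_mae = mean_abs_error(wide_proj, wide_actual)
narrow_mae = mean_abs_error(narrow_proj, narrow_actual)

print("wide:  ", round(wide_corr, 3), round(wide_mae, 3))
print("narrow:", round(narrow_corr, 3), round(narrow_mae, 3))
```

The misses are identical in both groups, yet the narrow group's correlation comes out far lower, which is why a low correlation on a bunched-up subgroup does not by itself mean the projections were worse.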

It should be noted, however, that the argument can be made that this represents a selective sample of only the minor leaguers who racked up a bunch of at bats, and that could affect the results. This is undoubtedly true, and combined with the small sample it makes the above numbers regarding minor leaguers not very meaningful. I'm going to try to do a more exhaustive look with next year's stats.

One final note is that last year's problem of underprojecting the group as a whole did not happen this year. The average OPS of the 170 was .814, and my projections, STATS and BP had average OPS numbers for the group of .817, .814 and .820 respectively.
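The group-level check is just the average projection minus the average result. A small sketch follows, using the group averages quoted above. The helper `mean_bias` is a hypothetical name for doing the same thing from per-player lists.

```python
def mean_bias(projected, actual):
    """Average projected value minus average actual value.

    Positive means the group was overprojected, negative underprojected.
    """
    return sum(projected) / len(projected) - sum(actual) / len(actual)

# Group averages reported in the article for the 170-player sample.
actual_mean_ops = 0.814
projected_mean_ops = {"Voros": 0.817, "STATS": 0.814, "BP": 0.820}

group_bias = {
    name: round(mean - actual_mean_ops, 3)
    for name, mean in projected_mean_ops.items()
}
print(group_bias)
```

All three systems land within a few points of OPS of the group average, so none of them under- or overprojected the group by a meaningful amount.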

Projecting player performance is a critical aspect of player analysis, possibly the most critical. In the coming weeks I'll discuss why that is, what the difficulties in projecting are, and ways future projections might be improved.

Voros McCracken Posted: April 09, 2002 at 06:00 AM | 10 comment(s)


1. Voros McCracken Posted: April 10, 2002 at 12:28 AM (#605102)
a) That was in 2000 not last year (read the article). I did Colorado projections last year.
b) In 2000, I did them in a rush and just posted the park-neutral projections. Therefore I knew the Colorado projections would be off, but guessed that the rest would be close enough since, other than Colorado, parks don't have too big an effect on things.
c) I did tell people why (in fact there's the answer to the question on my web page), but since this article wasn't even about the 2000 projections, I didn't figure it was worth wasting space over it.
d) What's with the attitude?
2. Voros McCracken Posted: April 10, 2002 at 12:28 AM (#605103)
DCS,

Between mine and STATS? No way. I believe I mentioned that they were "about equal."

Between STATS and mine and BP's? Hard to say. Probably not, but if the same difference happens once again this year, maybe this year's difference should be given more consideration.

The idea was simply to present generally how each system did since this is something many people have asked for in the past.
3. Voros McCracken Posted: April 10, 2002 at 12:28 AM (#605108)
This didn't get into the article due to my own error (thought I had put it there) but the following players weren't in the sample of 170 players despite qualifying for the playing time requirement. The reasons were either that one of the three systems didn't give a projection for the player, or the player played only part of the season in Colorado making his projection unreliable either way. They were:

Ichiro!, Albert Pujols, David Eckstein, Neifi Perez, Todd Walker, Alex Ochoa, Jose Macias, Benito Santiago, Paul LoDuca, Shea Hillenbrand and Rickey Henderson.
4. Voros McCracken Posted: April 10, 2002 at 12:28 AM (#605109)
Rob Lane,

Here are the results for the 144 players who had 300 or more ABs in 2000 and were among the 170-player sample:

Voros: .747 Corr., .063 MAE, .084 RMSE
STATS: .740 Corr., .062 MAE, .085 RMSE
BP: .692 Corr., .074 MAE, .095 RMSE
2000 OPS: .690 Corr., .079 MAE, .100 RMSE

There were no players in the sample who played in Coors in 2001 who didn't play there in 2000, nor was the opposite true for any of the players. The Correlation Coefficient for the 2000 stats is probably not useful since it only measures how well the numbers track with the other set, not necessarily how good a predictor _unregressed_ it would be. The Mean-Absolute and the Root-Mean-Squared errors are far more important for this comparison.

It's also critical to note that the correlation between my projections and the 2000 stats was .903 with an r-squared of .816. In other words, for any players who had 300 or more at bats in 2000, what they did in 2000 is obviously going to have a substantial impact on anyone's projections. So clearly since the 2000 numbers for such players might be considered the "baseline" for future expectations, the discrepancies between it and the projections are the critical thing to look at, and in those cases the projections tend to do quite well.
5. tangotiger Posted: April 11, 2002 at 12:28 AM (#605112)
I think what the poster was after was how big a deal are these projections? If you just use the previous year's stats, and don't do anything to them, that would be say the absolute worst that you can do, then what does it give you? I would also say, what if you take the last 3 years equally weighted, and don't do anything to them, what's the r-squared. In essence, the projections that people do should be adding to this "baseline".
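The two untouched baselines described here can be sketched in a few lines. This is a hypothetical illustration with made-up players. It averages the three seasons' OPS equally without weighting by playing time, which is one reading of "equally weighted" and an assumption.

```python
def naive_baselines(prior_seasons):
    """Return (last-year OPS, equally weighted mean of up to 3 prior years).

    prior_seasons: OPS values ordered oldest to newest, e.g. 1998, 1999, 2000.
    """
    last_year = prior_seasons[-1]
    last_three = prior_seasons[-3:]
    three_year = sum(last_three) / len(last_three)
    return last_year, three_year

def mean_abs_error(predicted, actual):
    """Average size of the miss, ignoring direction."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Made-up OPS histories (oldest first) and next-season results.
history = {
    "Player A": [0.780, 0.800, 0.850],
    "Player B": [0.900, 0.860, 0.820],
    "Player C": [0.700, 0.720, 0.710],
}
actual_next = {"Player A": 0.810, "Player B": 0.850, "Player C": 0.730}

names = sorted(history)
last_year_preds = [naive_baselines(history[n])[0] for n in names]
three_year_preds = [naive_baselines(history[n])[1] for n in names]
actuals = [actual_next[n] for n in names]

mae_last_year = mean_abs_error(last_year_preds, actuals)
mae_three_year = mean_abs_error(three_year_preds, actuals)
print(round(mae_last_year, 3), round(mae_three_year, 3))
```

A projection system earns its keep only to the extent that its errors come in below numbers like these.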

As an extra control, if you just look at those players aged 25-29 in 2000, how does the Voros, STATS, do against 2001? And how does the 2000 stats by themselves do? At this point, I'm removing age from the equation to see where the "value-added" comes in.

Finally, if you take 1998-2000 stats of players aged 25-29 in year 1999, how does all this work out as well?

Are the predictions really so good because you guys have found something interesting, or is it simply because of the age adjustment? Answering the above question should shed some light on this.

Thanks, Tom
6. Voros McCracken Posted: April 11, 2002 at 12:28 AM (#605115)
Another thing to add is that to be of use, we really need to project more than just the players who had 300 or more at bats last year anyway. In fact, the group that doesn't fit that profile is the most critical since these are the guys who would generally be the replacement pool of players.

And if we go back to use two and three years, the same problem occurs in that fewer and fewer players fit that profile.

James makes another good point in that the similarities between a projection and the player's results last year are not the important issue, it's the differences that are key. When looking at the 7 biggest differences between the two on both ends, the projections were closer to the actual results than the 2000 numbers in 10 of the 14 cases, and in all but one case the actual results were in the direction of the projection away from the 2000 stats.
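A check of this kind can be sketched as follows. The function name, the tuple layout and the three sample rows are hypothetical. The idea is to pick the players whose projection differs most from the prior-year number in each direction, then count how often the projection beat the prior-year number and how often the actual result moved the way the projection said it would.

```python
def compare_to_prior_year(rows, top_n=7):
    """Score projections on the players where they differ most from last year.

    rows: list of (prior_ops, projected_ops, actual_ops) tuples.
    top_n: how many of the biggest projected drops and gains to examine.
    """
    if len(rows) < 2 * top_n:
        raise ValueError("need at least 2 * top_n players")

    # Sort by projected change from the prior year, biggest drop first.
    ranked = sorted(rows, key=lambda r: r[1] - r[0])
    biggest = ranked[:top_n] + ranked[-top_n:]

    closer = 0          # projection nearer to the actual than prior year was
    same_direction = 0  # actual moved from prior year the way projected
    for prior, projected, actual in biggest:
        if abs(projected - actual) < abs(prior - actual):
            closer += 1
        if (actual - prior) * (projected - prior) > 0:
            same_direction += 1

    return {"cases": len(biggest), "closer": closer,
            "same_direction": same_direction}

# Made-up rows: (prior-year OPS, projected OPS, actual OPS).
sample_rows = [
    (0.800, 0.750, 0.740),  # projected drop, actual dropped even further
    (0.700, 0.705, 0.702),  # projection barely differs from prior year
    (0.750, 0.820, 0.760),  # projected gain, actual rose only slightly
]
summary = compare_to_prior_year(sample_rows, top_n=1)
print(summary)
```

With `top_n=7` on a full sample this yields the same kind of tally as the 10-of-14 count above.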

We should also note that this is only the results of a single season. I'll try and piece together 1999 to 2000 as the weeks go on. I have a lot to say on this subject, so I'll save the rest for future installments.
7. tangotiger Posted: April 12, 2002 at 12:28 AM (#605116)
There's another part to this, that I've mentioned to Voros in the past, and that's the number of PAs a player gets.

You may be surprised to know, but the more PAs a player gets, the better his performance. So, if you want to project a player's stats for 500 PA or for 200 PA, his RATE stats will be different. The probable reason is that managers, for those average players, are swayed by in-season stats, so much so that if someone has a bad month, his playing time is reduced, thereby not giving that player the benefit of regressing to his normal performance level.

So, if we arbitrarily choose 500 PA as the cutoff, then the projectors who do the best are those who regress the least. The lower the PA threshold, the more regression has to be considered for those players.
8. tangotiger Posted: April 12, 2002 at 12:28 AM (#605118)
On fanhome somewhere, I showed how a group of average hitters in year 1 and year 3 became below average hitters with less than 200 PAs. I don't know the cause/effect relationship, or if injuries played a part in it. Your points are also valid.
9. Voros McCracken Posted: April 13, 2002 at 12:28 AM (#605125)
Because they're not free and I don't buy them, otherwise I would. In fact, if you want to send me his 2001 projections I'll run them.
10. Voros McCracken Posted: April 19, 2002 at 12:29 AM (#605141)
"Voros, don't you think it would be good to statistically test the differences? All this consternation over whose system projects the best is really silly if they're all enveloped by variance."

I'm not sure what you mean by "statistically test." Besides which, despite the tagline to the article, I'm really not all that concerned with whose system is "best," I personally am only concerned with my system, and I compare it to others as a means to measure where my numbers might be lacking (and I have several ideas on this). The idea is to work on the system to continue to provide insights into how various variables relate to future performance.

Big Ed,

It's really quite hard to figure out systematic differences and I suspect if they exist they'd be different for different systems.

For my system, I'm thinking that I might not handle increases in performance as well as can be handled. I also am not sure whether weight is a usable variable, and whether height might be better to use. The data set I have shows stronger relationships between weight and future performance than height, but weights can be so error-prone and different sources provide different weights, and weights changing over time...

...I'm thinking it might be a minefield that can't be navigated whereas height is a little easier to deal with.

Again, I'll get to a lot of this stuff as I release more articles on projections. I just need a certain amount of time to put each together. Sorry.
