2001 Projections - A Look Back; Projections Part I
Whose crystal ball is best?
It's time once again to take a look at how the various projections did for
the 2001 season. In 2000, I looked at a number of projection systems and how
well they did in projecting the actual results of the players. I looked
at the top 170 players in terms of plate appearances (with some exceptions)
and how each system did in projecting OPS. The results of that exercise
can be found here.
I've decided to do the same for the 2001 season, only I'll limit the analysis
to myself, STATS, Inc. and Baseball Prospectus. The first thing
I'll do is simply give you the results for the three for the top 170 players
in plate appearances:
Corr. = Correlation Coefficient
r^2 = Square of Corr.
RMSE = Root Mean Squared Error
MAE = Mean Absolute Error
MAPE = Mean Absolute Percentage Error
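For readers who want to reproduce these measures, here is a minimal sketch of how they could be computed from paired projected and actual OPS values. The function name `projection_metrics` and the sample numbers are hypothetical, and the choice to express MAPE relative to the actual OPS is an assumption, since the exact formula used for the table is not stated.

```python
import math

def projection_metrics(projected, actual):
    """Return Corr., r^2, RMSE, MAE and MAPE for paired projected/actual values."""
    if len(projected) != len(actual) or len(projected) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(projected)
    mean_p = sum(projected) / n
    mean_a = sum(actual) / n
    # Pearson correlation coefficient between projections and results.
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(projected, actual))
    var_p = sum((p - mean_p) ** 2 for p in projected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    corr = cov / math.sqrt(var_p * var_a)
    # Error-based measures use the signed miss (projected minus actual).
    errors = [p - a for p, a in zip(projected, actual)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # Assumption: percentage miss is measured against the actual OPS.
    mape = 100.0 * sum(abs(e) / a for e, a in zip(errors, actual)) / n
    return {"corr": corr, "r2": corr ** 2, "rmse": rmse, "mae": mae, "mape": mape}

# Hypothetical OPS values, for illustration only (not from the article's data).
projected_ops = [0.800, 0.750, 0.900, 0.700]
actual_ops = [0.820, 0.700, 0.950, 0.710]
metrics = projection_metrics(projected_ops, actual_ops)
print(metrics)
```

Corr. and r^2 reward getting the ordering and spread of players right, while RMSE, MAE and MAPE measure the size of the misses, which is why the systems can rank differently depending on the measure.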
Anyhow, before anybody gets out ahead of me, I'd like
to talk about one of my favorite subjects: multiple
endpoints. While multiple endpoints arguments generally occur when someone
looks at a data set and then draws the restrictions to achieve a particular
result, sometimes multiple endpoints become an issue even when no intent
to rig the numbers is present.
That is what happened here. If I extend the restrictions to the top 176 players, STATS
winds up with a higher correlation coefficient than me. In other words, the
two systems were about equal in terms of their ability to project the OPS
of the players. This was essentially the case the year before as well.
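One way to guard against a cutoff-driven result is to recompute the correlation at several cutoffs and see whether the ranking of the systems holds. The sketch below is only an illustration: the helper names `pearson` and `correlation_by_cutoff` and the five player records are hypothetical, not the data used in this article.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_by_cutoff(players, cutoffs):
    """players: (plate_appearances, projected_ops, actual_ops) tuples.

    Returns {cutoff: correlation over the top `cutoff` players by PA}.
    """
    ranked = sorted(players, key=lambda p: p[0], reverse=True)
    results = {}
    for n in cutoffs:
        top = ranked[:n]
        results[n] = pearson([p[1] for p in top], [p[2] for p in top])
    return results

# Hypothetical records, for illustration only.
players = [
    (700, 0.850, 0.880),
    (680, 0.760, 0.740),
    (650, 0.800, 0.790),
    (640, 0.900, 0.700),  # one badly missed player just below the first cutoff
    (600, 0.720, 0.730),
]
by_cutoff = correlation_by_cutoff(players, [3, 4, 5])
for n, r in by_cutoff.items():
    print(n, round(r, 3))
```

If the ordering of two systems flips when the cutoff moves by a handful of players, as it did here between 170 and 176, the honest reading is that the systems are about even.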
I also wanted
to take a look at Baseball Prospectus' numbers. Last year BP's projections
did every bit as well as mine and STATS', and it was surprising that they not
only did worse than mine and STATS' this year, but also worse than they had
done the year before. The first thing I checked was the Colorado
factor. In 2000, none of the Colorado
players were counted because I didn't release projections for them (for reasons
I won't get into). I also noted that there were some weird projections for
a few Colorado players (.978
OPS for Jeff Cirillo and a .654 OPS for Vinny Castilla). So I guessed that
maybe their system had trouble dealing with the unique nature of Coors Field
and gave unreliable results.
I eliminated the four Rockies players from the 170
(Larry Walker, Todd Helton, Cirillo and Juan Pierre) and three more for whom
Coors could have messed with the projections (Eric Young, Vinny Castilla and
Ellis Burks) and re-did the correlations with the remaining 163 players. The
results were .728, .721, .682 for me, STATS and BP respectively. So I'm guessing
that Colorado was not the main
difference between this year and last. That being the case, I'm not sure I
can explain the discrepancy other than BP having an off year.
Another thing worth looking at is the players who were missed by the greatest amount.
The five players with the largest percentage miss by my system were: Brady
Anderson, Cal Ripken, Jason Kendall, Edgardo Alfonzo and Barry Bonds. For
STATS it was: Bonds, Anderson,
Kendall, Tim Salmon and Mark Quinn. For BP it was:
Kendall, Alfonzo, Bret Boone, Bonds and Johnny Damon.
Of the nine players above, none of the systems posted an above-average projection,
meaning that the differences were largely due to abnormal performances compared
to their previous stats. The closest anybody came to an above-average projection
for the nine was my projection for Mark Quinn (.819 projected, .757 actual).
What about players projected using Minor League numbers? Well, I really don't know how
and when STATS or BP used minor league stats, but there were 13 players out
of the 170 who had fewer than 502 Major League appearances going into 2001.
The correlation coefficients for these 13 were .60, .53 and .41 for me, STATS
and BP respectively. While that looks much lower than overall, remember this
is a very small sample and, more importantly, the range in performance for these
13 was much smaller than the overall range, and that drives those numbers
down. The MAE (.037, .053, .051) and the RMSE (.045, .063, .068) are smaller
than the overall numbers, but this again is due to the smaller range. The
MAPE (5.0%, 7.3%, 6.8%) is probably the most appropriate metric here, and
for it, the numbers are roughly the same.
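The effect of a narrow range on the correlation coefficient can be shown with a small made-up example. The sketch below applies the same misses to a wide group and a narrow group of hypothetical players (none of these values come from the article's data). The size of the errors is identical in both groups, yet the correlation is much lower for the narrow one.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def mean_abs_error(projected, actual):
    """Average size of the miss, ignoring direction."""
    return sum(abs(p - a) for p, a in zip(projected, actual)) / len(actual)

# Hypothetical data: identical +/- .040 misses for both groups.
misses = [0.040, -0.040, 0.040, -0.040, 0.040]
wide_actual = [0.600, 0.700, 0.800, 0.900, 1.000]    # wide spread of OPS
narrow_actual = [0.780, 0.790, 0.800, 0.810, 0.820]  # narrow spread of OPS
wide_proj = [a + m for a, m in zip(wide_actual, misses)]
narrow_proj = [a + m for a, m in zip(narrow_actual, misses)]

wide_corr = pearson(wide_proj, wide_actual)
narrow_corr = pearson(narrow_proj, narrow_actual)
wide_mae = mean_abs_error(wide_proj, wide_actual)
narrow_mae = mean_abs_error(narrow_proj, narrow_actual)
print(round(wide_corr, 3), round(narrow_corr, 3))
print(round(wide_mae, 3), round(narrow_mae, 3))
```

This is why a lower correlation for a tightly bunched subgroup does not by itself mean the projections for that subgroup were worse.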
It should be noted, however, that the argument can be made that this represents a selective
sample of only the minor leaguers who racked up a bunch of at bats, and that
could affect the results. This is undoubtedly true, and this combined with
the small sample makes the above numbers regarding minor leaguers not very
meaningful. I'm going to try to do a more exhaustive look with next year's
projections.
Another thing of note is that the problem from last year of underprojecting the group as a
whole did not happen this year. The average OPS of the 170 was .814 and myself,
STATS and BP had average OPS numbers for the group of .817, .814 and .820
respectively.
Projecting player performance is a critical aspect of player analysis, possibly
the most critical. In the coming weeks I'll discuss why that is,
what the difficulties in projecting are, and ways future projections might
be improved.
Posted: April 09, 2002 at 06:00 AM