Primate Studies: Where BTF's Members Investigate the Grand Old Game
Friday, February 28, 2003
Peak Projections
Prospects and long-term potential.
The purpose of the Peak Projection system is to project what a prospect will achieve in his prime (his best three consecutive seasons) given his production, age, level of competition, and the standard baseball player’s development curve.
In the ’80s, Bill James derived his Major League Equivalency formula to translate the Double-A and Triple-A performances of Minor League players to the Major League level. These translations proved useful in showing whether an underappreciated Triple-A veteran would make a quality Major League player.
However, these equivalencies have little stand-alone value when analyzing younger, less developed prospects. The Peak Projection system adds the age adjustment necessary to make the statistics a valuable tool in prospect analysis.
The system takes the Major League equivalency concept and builds on it by projecting what the player is on target to achieve in his prime if he develops like the average Major League baseball player.
It is true that not every player develops the same. Magglio Ordonez is one of the best hitters in the game today yet put up unimpressive Minor League numbers. Similarly, Sammy Sosa has taken his game to the next level by evolving from an undisciplined hacker to one of the more patient sluggers in the game.
However, for the majority of players, the standard baseball player’s development curve is a fair representation of their growth.
To test the Peak Projection system and the assumptions it makes, I took Spring Training Magazine’s Top 100 prospect list and calculated the projections for all the hitters who accumulated at least 60 plate appearances in Low A or higher prior to 1994. It is important to note that while I normally adjust the minor league statistics for offensive context and league factors, I do not have that data for early-’90s statistics. Therefore, these projections are presumably less accurate than the current ones.
The top prospect list was pared down to 39 batters who fit the criterion, ranging in quality from Carlos Delgado to Howard Battle. The average player was 21 years old. Of these 39 batters, Arquimedez Pozo, Brooks Kieschnick, Howard Battle, D.J. Boston, Chad Mottola, Michael Moore and Steve Gibralter never totaled 150 at bats over a three-year period (rendering their Major League statistics nearly meaningless), lowering our sample to 32 batters. Of those seven, Pozo (.889 OPS) and Kieschnick (.854 OPS) are the only players to have projections north of D.J. Boston’s .753 OPS. It is safe to say that these two would not have met their projections. For reference’s sake, Pozo was born in the Dominican Republic, and the current stringent requirements for proving one’s age upon entering the U.S. did not exist at the time.
Next, the actual Major League peak performances were calculated for the prospects who accumulated enough Major League playing time. Each player’s best three consecutive years were averaged. Since the statistics were not park adjusted, seasons in Coors Field were ignored. This came into play only for Todd Hollandsworth and Jeffrey Hammonds.
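As a rough illustration of the peak calculation described above, here is a minimal Python sketch. The function name, the `ignored_years` parameter, and the sample OPS values are all hypothetical. It also assumes a simple unweighted average of three consecutive calendar seasons, and that an ignored season (such as a Coors Field year) breaks any window containing it. The article does not spell out either detail.

```python
def best_three_year_peak(ops_by_year, ignored_years=()):
    """Return (start_year, average_ops) for the best three consecutive seasons.

    ops_by_year: dict mapping season year -> OPS for that season.
    ignored_years: seasons to drop before searching (assumption: a dropped
    season also invalidates any three-year window that would contain it).
    Returns None when no three consecutive usable seasons exist.
    """
    usable = {y: v for y, v in ops_by_year.items() if y not in ignored_years}
    best = None
    for start in sorted(usable):
        years = (start, start + 1, start + 2)
        if all(y in usable for y in years):
            # Assumption: unweighted mean, not weighted by plate appearances.
            avg = sum(usable[y] for y in years) / 3
            if best is None or avg > best[1]:
                best = (start, avg)
    return best


# Hypothetical OPS values, for illustration only.
example = {1998: 0.800, 1999: 0.900, 2000: 0.950, 2001: 0.850, 2003: 0.990}
peak = best_three_year_peak(example)
```

In this made-up example the 2003 season cannot anchor a window because 2002 is missing, so the search picks the best of the 1998-2000 and 1999-2001 windows.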
Results of tests between Projected and Real Peak Performance:
Corr. = Correlation Coefficient
In comparison, Baseball Prospectus’s annual projections scored a correlation of .704 for OPS in 2001, according to Voros McCracken. Now, as with any projection, the bigger the sample the projection is based on, the more accurate the projection. Each projection is accompanied by a theoretical number of plate appearances. Each stat line is weighted based on whether it was produced in the lower or upper minors (Single-A, or Double-A and Triple-A) and the year it was achieved. A performance in Double-A in 1993 is a more reliable indicator of ability than an equal performance in Single-A in 1990. If these stat lines were created by the same player, the former would get more weight in the projection than the latter.
Therefore, the larger the number of theoretical plate appearances, the more likely the player is to achieve his projection. 18 players had 600 plate appearances (a “full season” of these theoretical PAs) or more in 1994.
For an abbreviated look:
In a less scientific test, Shawn Green was the only above-average-to-great player in the group whose projection (.276 BA/.329 OBP/.364 SLG) missed the boat completely. Nineteen-year-olds Derek Jeter and Johnny Damon, along with an unheralded 20-year-old named Edgardo Alfonzo, hit for considerably more power in reality than their projections suggested. However, Jeter and Alfonzo, much like today’s Joe Mauer, still projected to be above-average Major Leaguers even without the power surges. Damon projected to be a typical light-hitting leadoff hitter. Also of note, after Alfonzo’s next Minor League season, his projected SLG rose considerably.
The tests had limited samples of 32 and 18 batters, which does introduce a margin of error. As I continue to improve the system, I intend to keep testing it on bigger samples. It is also important to note that these players’ statistics were not a part of the sample used to derive the Peak Projection formulas.
Chris Reed owns and operates ProspectReport.com and contributes to various publications. Check out his site for statistical analysis, player features and reports on the future stars of baseball.

Reader Comments and Retorts
1. Chris Reed Posted: February 28, 2003 at 01:34 AM (#609015)
I may eventually do all hitters under 25 or so in the majors and minors so you can compare Josh Phelps to Victor Martinez, etc.
2) The OPS correlation of .65 for the 32 players seems leveraged on the two extremes on the high OPS side (Ramirez, Delgado) and the low OPS side (Goodwin, Alexander). The OPS correlation for the remaining 28 players is just below 0.36, which is just over half as much as with those extreme players.
3) The quoted age doesn't seem to be the "July 1" age. For instance, both Manny Ramirez and Shawn Green are listed as 21 years old, but Ramirez was born in the first half of 1972 and Green in the second half of 1972. Were you consistent with the age system? How did you define "age"?
4) You said that the peak projection system formulas were not derived from these 32 players' statistics. What was the data set you used in creating your projections? How good are the projections for those players?
5) In your response to a comment regarding Soriano you stressed that these were projections of their 3-yr peak. However, there were many 21 and under (and some 18 year olds) in the 1994 peak projections who might now be, what, 28-30 years old? Wouldn't many of the 32 players not have completed their 3-yr peak?
6) What aging patterns did you use? How did you calculate *PA?
More importantly, there seems to be a belief that a player's peak season or seasons tells you a lot about his peak ability, namely when that peak ability occurs. I don't think this is true. In fact, I know it isn't.
I ran some theoretical calculations on the computer to determine the following:
1) If a hypothetical player had a normal aging pattern, with a peak OPS ability at around 26-28, how often would his sample OPS peak at the various ages, by chance alone?
2) How much higher would the average sample peak OPS be than his "real" peak OPS, again by chance alone (a player's best year will always be much better than his ability, by definition)?
I used my own aging pattern chart, which is similar to Tango Tiger's. Here are the average RELATIVE OPS's for an average player at each age, from 22 to 34:
22 .706
23 .727
24 .740
25 .736
26 .749
27 .743
28 .739
29 .729
30 .725
31 .717
32 .697
33 .688
34 .677
As you can see, peak OPS performance occurs at around 26-28 yo (it doesn't matter, for this analysis, whether this is exactly true or not).
Here are the percentages of the time that a player with 500 PA per year who played for 13 years (from age 22 to 34) peaked at each age:
22 4.0
23 7.3
24 10.6
25 9.9
26 15.2
27 13.6
28 13.7
29 6.9
30 8.1
31 5.3
32 2.5
33 1.5
34 1.3
Remember that the above chart was generated by simulating the results of a hypothetical player over 13 seasons, with 500 PA per season. The hypothetical player had "true" stats at each age reflective of the first chart. I ran 1000 such player careers. As you can see from the above chart, only 42.5% of the time did the player's sample peak season occur at ages 26-28, even though we know that his peak ability occurred during that time. In fact, almost 19% of the time, the player's peak season occurred after age 29. Remember that any peak season other than at age 26 (which is the "true" peak season for our hypothetical player) was due to fluctuation alone.
Basically, what this tells us is that if we assumed that our player's peak sample season was also his peak ability, we would be wrong almost 85% of the time, and very wrong (placing it outside ages 26-28) almost 58% of the time!!
Also, from the first chart, you can see that the average peak OPS of our player is .749 at age 26. What do you think the average peak sample OPS was? .825, 76 points higher than his true peak ability!
So when we look at a player's peak season, not only are we overwhelmingly likely to misjudge the age of his peak ability, we will also overshoot the value of his peak ability by 76 points!
What about if we look at the peak 3 years, as this study did? We should come closer to identifying his true peak ability age(s) and we shouldn't overshoot the value of his true ability by as much.
Here is the same chart as the last one, but this time each age represents the middle age of a 3-year peak:
23 8.2
24 9.7
25 14.3
26 13.5
27 17.4
28 11.5
29 10.4
30 7.1
31 4.9
32 1.9
33 1.1
Since 27 is the "correct" peak age (26-28), we are still "wrong" 82.6% of the time, and if we take 26, 27, and 28 as "close to" being right, we are "very wrong" almost 58% of the time again!
Finally, the average 3-year peak OPS from the samples is .775, which is still 32 points too high (the average "true" 3-year peak OPS is .743).
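For anyone who wants to try this kind of simulation, here is a minimal Python sketch of the setup described above. It is not the code that produced the charts. In particular, it assumes each season's sample OPS is the "true" OPS plus normally distributed noise, and the `noise_sd` value of .045 is a guess rather than a figure from this thread. The function and parameter names are hypothetical too. Setting `window=1` looks at single peak seasons and `window=3` at 3-year peaks, reported by their middle age.

```python
import random

# "True" relative OPS by age, copied from the aging chart above.
TRUE_OPS = {22: .706, 23: .727, 24: .740, 25: .736, 26: .749, 27: .743,
            28: .739, 29: .729, 30: .725, 31: .717, 32: .697, 33: .688,
            34: .677}


def simulate_peak_ages(n_careers=1000, noise_sd=0.045, window=1, seed=0):
    """Simulate careers and tally the age at which the sample peak occurs.

    noise_sd is an ASSUMED standard deviation for one season's sample OPS
    around its true value; window should be odd (1 or 3).
    Returns (share_by_age, average_sample_peak_ops).
    """
    rng = random.Random(seed)
    ages = sorted(TRUE_OPS)
    counts = {age: 0 for age in ages}
    total_peak = 0.0
    for _ in range(n_careers):
        # One noisy observed OPS per season of the career.
        sample = [rng.gauss(TRUE_OPS[age], noise_sd) for age in ages]
        best_i, best_val = 0, float("-inf")
        for i in range(len(ages) - window + 1):
            val = sum(sample[i:i + window]) / window
            if val > best_val:
                best_i, best_val = i, val
        counts[ages[best_i + window // 2]] += 1  # middle age of the window
        total_peak += best_val
    shares = {age: n / n_careers for age, n in counts.items()}
    return shares, total_peak / n_careers


one_year_shares, one_year_avg = simulate_peak_ages(window=1)
three_year_shares, three_year_avg = simulate_peak_ages(window=3)
```

The exact percentages will depend on the noise assumption and the random seed, so they should not be expected to match the charts above. The qualitative result (sample peaks scattered across many ages, and an average sample peak above the true peak) is what the sketch is meant to show.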
The last point I want to make is that if we correlate a projection (a "peak" projection or otherwise), with a player's true peak year or years, I'm not sure, but I think we will automatically get an artificially high correlation coefficient, since by definition, a player's sample peak OPS is not randomly drawn (it is always going to be high) and is within a narrow range. So I don't think that you can fairly compare that kind of an "r" with the "r" that you get from regressing a projected OPS on a player's actual sample OPS from any given year (presumably the year you are projecting).
IOW, if you simply take a player's minor league MLE's, adjust for age by making a player look like he is age 27, and then "up that" value some more to make it look like a peak (lucky) year, I'm not sure that you wouldn't automatically get a very high r, if you correlate that number with the player's peak major league year or 3 running years, as was done in this study.
I'll have to do some work on that last point though...
I'll start with the easier ones and then hopefully tackle MGL tomorrow ;)
"Question: if you're using MLE-based data to do peak projections, why don't you use major league data? It just wasn't in the database, you're projecting solely from minor league stats on principle, or some other reason?"
What happened was that I added a few Major Leaguers to the projection data so I could see how Peak Projections worked for a couple fast rising stars (like Pujols and Soriano) as well as guys like Brad Wilkerson and Eric Hinske.
"If you're not using major league data, then I'd echo the suggestion that players with significant MLB PT be at least grouped together if not tossed entirely. Pujols (and, to pick a better example, Dunn) might not quite fit in with the 1994 group in that both are still a long way from their peaks assuming a normal career progression (not a safe bet with either of these guys, and ignoring the age question for Pujols); they're probably closer to that group than to the bulk of the 2003 list, however."
Good point, I will consider it and possibly take out the Major Leaguers.
"1) Not too important, but in the 1994 projections I believe the actual peak PA column is actual AB. For instance, Ramirez's 1999-2001 peak had 1490 AB, not PA. And you defined peak by OPS? Or RAR, or what?"
Uuuuuugh. Yes, you are correct. Good catch... The *PA for the 1994 batch is correct though, which is the important part. I defined peak by OPS.
"The OPS correlation of .65 for the 32 players seems leveraged on the two extremes on the high OPS side (Ramirez, Delgado) and the low OPS side (Goodwin, Alexander). The OPS correlation for the remaining 28 players is just below 0.36"
Interesting, I wonder how this compares with annual projection systems...Is the middle the hardest to project?
"3) The quoted age doesn't seem to be the "July 1" age. For instance, both Manny Ramirez and Shawn Green are listed as 21 years old, but Ramirez was born in the first half of 1972 and Green in the second half of 1972. Were you consistent with the age system? How did you define "age"?"
No worries, I was consistent (I took the D.O.B. and subtracted it from 6/30 of the year)... I believe I rounded the ages, which could explain the Green/Ramirez ages.
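To show how rounding can put two players born in different halves of the same year at the same listed age, here is a small Python sketch of that age calculation. The function name, the use of 365.25 days per year, the 1993 season, and the two birth dates are all illustrative assumptions, not the players' actual dates of birth.

```python
from datetime import date


def season_age(dob, season_year, rounded=True):
    """Age on June 30 of season_year, either rounded or truncated."""
    days = (date(season_year, 6, 30) - dob).days
    years = days / 365.25  # assumption: average year length
    return round(years) if rounded else int(years)


# Hypothetical birth dates in the first and second half of 1972.
first_half = date(1972, 3, 15)
second_half = date(1972, 10, 15)

rounded_ages = (season_age(first_half, 1993), season_age(second_half, 1993))
truncated_ages = (season_age(first_half, 1993, rounded=False),
                  season_age(second_half, 1993, rounded=False))
```

With rounding, both hypothetical players come out as 21; with truncation, the second-half birthday would be listed as 20.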
"4) You said that the peak projection system formulas were not derived from these 32 players' statistics. What was the data set you used in creating your projections? How good are the projections for those players?"
Amazingly enough, this Peak Projection formula (version 1.1 after last year's version) was not made by computing the peak projections for a set of players and then tinkering with the formulas based on how they correlated with the real results. Rather, I used a slightly altered aging chart of Tango's as well as my own Major League equivalency formula (I derived the level factors by comparing the performance of players from one level to another level in the same year.)
"5) In your response to a comment regarding Soriano you stressed that these were projections of their 3-yr peak. However, there were many 21 and under (and some 18 year olds) in the 1994 peak projections who might now be, what, 28-30 years old? Wouldn't many of the 32 players not have completed their 3-yr peak?"
Yes. Very true. And if anything, I think this hurt this particular sample, as I feel Todd Hollandsworth and Karim Garcia will have better peak performances in another 1-3 years.
"6) What aging patterns did you use? How did you calculate *PA?"
See above... *PA is the least statistically proven part of this system (IMO) though its significance is obvious (IMO). Eyeballing the numbers, I came up with a simple formula to adjust plate appearances to give you an idea of how much confidence you can have in the numbers.
Take Plate Appearances and multiply by: 1 if in the upper minors, .5 if in the lower minors, 1 if last year, 1/2 if the year before, 1/3 if the year before that...
Hope that is helpful.
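As a sketch of the *PA weighting just described, here is one way to write it in Python. The function name, the level labels, and the sample plate-appearance totals are hypothetical. It also assumes the weighted stat lines are simply summed, which the description implies but does not state.

```python
UPPER_MINORS = {"AA", "AAA"}  # assumed labels for Double-A and Triple-A


def theoretical_pa(stat_lines, projection_year):
    """Sum plate appearances weighted by level and recency.

    stat_lines: iterable of (level, season_year, plate_appearances).
    Level weight: 1 in the upper minors, .5 in the lower minors.
    Recency weight: 1 for last year, 1/2 the year before, 1/3 before that...
    """
    total = 0.0
    for level, season_year, pa in stat_lines:
        years_back = projection_year - season_year  # 1 means "last year"
        if years_back < 1:
            raise ValueError("stat lines must predate the projection year")
        level_weight = 1.0 if level in UPPER_MINORS else 0.5
        total += pa * level_weight / years_back
    return total


# Hypothetical player: 500 PA in Double-A in 1993, 500 PA in Single-A in 1990.
star_pa = theoretical_pa([("AA", 1993, 500), ("A", 1990, 500)], 1994)
```

For a 1994 projection, the hypothetical Double-A line counts in full (500) while the older Single-A line counts for 500 x .5 x 1/4 = 62.5, for a total of 562.5 theoretical PA.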