## Monday, March 29, 2004

More on baseball’s historical talent levels.

Darwin Loves Baseball

My father is a biologist and a college professor.  I grew up catching snakes and grading papers.  Evolution occurs.  Species adapt to their surroundings and the fittest survive.  Man has gotten better since the day God created him (that's a joke, kids, leave it alone).

Major league baseball players have to be getting better.  They have to.  Don't they?

As we take a further look at Stephen Jay Gould's work and the BBBA work of Ken Adams, we see that there isn't a dramatic compression of variation.  Certainly not like the dramatic changes we see in nearly every other athletic endeavor.  Baseball isn't track and field or swimming.  Let's look further at the data we're unraveling.

One aspect of the data that I meant to include was the correlation coefficients of the mean with the standard deviations:

| NL BA r | NL OBP r | NL SLG r | NL ISO r |
|---|---|---|---|
| 0.263 | 0.027 | 0.888 | 0.969 |

It is clear that as SLG and ISO go up, so does the variation.
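For anyone who wants to run this kind of check themselves, here is a minimal sketch of the correlation between per-era means and per-era standard deviations. The function name `pearson_r` and the sample numbers are assumptions made up for illustration; they are not the study's data, and the original calculation may have been done differently (in a spreadsheet, for instance).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Illustrative, made-up era values (NOT the study's data):
# one league-average SLG and one SLG standard deviation per era.
era_mean_slg = [0.350, 0.380, 0.400, 0.390, 0.420]
era_sd_slg = [0.055, 0.062, 0.070, 0.066, 0.075]

r = pearson_r(era_mean_slg, era_sd_slg)
print(f"r = {r:.3f}")
```

An r near 1 means the spread rises and falls with the league average, which is the pattern reported for SLG and ISO; an r near 0 means the spread moves independently of the average, as reported for OBP.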

After the last description of the change of talent in the major leagues, several people, including Zak, JCB, and (prior to publication) Mark Field, said I should divide the standard deviations by the average to adjust for league offense.  I wasn't really getting that for the longest time, but the discussion by people smarter than I convinced me to look at it.

Here's what we see in the trend line equations:

NL BA SD/Mean: y = 0.000003x + 0.1295

NL OBP SD/Mean: y = -0.0004x + 0.1302

NL SLG SD/Mean: y = 0.001x + 0.1795

NL ISO SD/Mean: y = 0.0006x + 0.4213

Those are some flat trend lines.  Only on-base percentage trends downward and all of the trends are negligible.  The batting average and on-base percentage have very little play in them either.  Slugging has fluctuated, with the present era (1993-2002) exceeded by the variation during the extremely low offense era of the 1960s.  Today's isolated power, while down, isn't at all-time lows, and closely matches the era of low offense of the immediately preceding era (1983-1992).

Adjusting our chart to reflect the change in average by dividing the standard deviation by the mean doesn't appear to give the expected result - the adjustment flattens the variation even more.
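The SD/Mean adjustment and the trend lines can be sketched as follows. This is only an illustration under stated assumptions: it uses the population standard deviation, it fits the line against a simple era index (1, 2, 3, ...), and the names `coefficient_of_variation` and `trend_line` and all of the sample numbers are made up. The post does not say which SD formula or x-axis was actually used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (SD/Mean)."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; SD/Mean is undefined")
    return pstdev(values) / m


def trend_line(ys):
    """Least-squares fit y = slope * x + intercept, with x = 1, 2, ..., len(ys)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    xs = range(1, n + 1)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up batting averages for three hypothetical eras (NOT the study's data).
eras = {
    "era 1": [0.240, 0.265, 0.280, 0.300, 0.315],
    "era 2": [0.250, 0.270, 0.285, 0.295, 0.320],
    "era 3": [0.235, 0.260, 0.275, 0.290, 0.310],
}
cvs = [coefficient_of_variation(v) for v in eras.values()]
slope, intercept = trend_line(cvs)
print(f"y = {slope:.4f}x + {intercept:.4f}")
```

A slope close to zero is what "flat" means here: the relative spread of performance is not shrinking or growing much from era to era.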

Playing off that, Mark and FJM both brought up the use of a better metric - OPS+.  As I mentioned in response, that'll take some time and data input.  Since I already have the fantastic Lahman database broken out into my eras, I added a column for the park factor, which I will again take from the new, wonderful, The Baseball Encyclopedia, 2004 ed. Palmer, Gillette, et al.  As I work the data, which will take some time, I am going to use 1.8 times OBP plus SLG.  Then multiply by the park factor.  In addition to standard deviation plots, for each era, I will report a count of players that are more than two and three standard deviations from the average.  With any luck this will be next week's report on this study.
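A rough sketch of that plan, for readers who want to try it on their own data. Several things here are assumptions: the park factor is applied to the whole 1.8 × OBP + SLG sum by straight multiplication (the post says "multiply" but not what the factor's scale or direction is), the population standard deviation is used, and the names `weighted_ops` and `count_outliers` and the player rows are invented for illustration.

```python
from statistics import mean, pstdev


def weighted_ops(obp, slg, park_factor=1.0):
    """1.8 * OBP + SLG, then multiplied by the park factor (assumed convention)."""
    return (1.8 * obp + slg) * park_factor


def count_outliers(values, n_sd):
    """Count values more than n_sd standard deviations from the mean, on either side."""
    m = mean(values)
    sd = pstdev(values)
    return sum(1 for v in values if abs(v - m) > n_sd * sd)


# Made-up (OBP, SLG, park factor) rows for one hypothetical era (NOT the study's data).
players = [
    (0.330, 0.420, 1.00),
    (0.345, 0.450, 0.98),
    (0.310, 0.390, 1.02),
    (0.360, 0.480, 1.01),
    (0.325, 0.410, 0.99),
    (0.450, 0.750, 1.00),  # one deliberately extreme season
]

scores = [weighted_ops(obp, slg, pf) for obp, slg, pf in players]
for n_sd in (2, 3):
    print(f"more than {n_sd} SD from the mean: {count_outliers(scores, n_sd)}")
```

Counting both tails together is a design choice made for brevity; splitting the count into players above and below the average would be a small change to `count_outliers`.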

I said something else last week:

I said, "Pedro Martinez is average size for a 1920s player."

Brian, who I am not picking on because I'm sure many people went "cough*****cough", said, "I think Pedro would’ve towered over most players back then."

Okay, I'm a troll.  I threw that out as bait.  Werr uses reader mail, I use old-fashioned bait, throwing out something that looks like it can't possibly be true when I have the research already in hand.  I apologize for the set-up - I had to work in my player-size stuff somehow.  I look at it more like a grifting bar bet.

The data:

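| Debut Period | Avg Ht (in) | Avg Wt (lb) |
|---|---|---|
| 00-04 | 70.50 | 173 |
| 05-09 | 70.74 | 175 |
| 10-14 | 70.99 | 172 |
| 15-19 | 70.90 | 171 |
| 20-24 | 71.08 | 173 |
| 25-29 | 71.16 | 174 |
| 30-34 | 71.73 | 178 |
| 35-39 | 71.88 | 180 |
| 40-44 | 72.19 | 182 |
| 45-49 | 71.97 | 183 |
| 50-54 | 72.34 | 185 |
| 55-59 | 72.63 | 187 |
| 60-64 | 72.84 | 188 |
| 65-69 | 72.84 | 189 |
| 70-74 | 73.00 | 188 |
| 75-79 | 73.18 | 190 |
| 80-84 | 73.32 | 191 |
| 85-89 | 73.36 | 192 |
| 90-94 | 73.46 | 193 |
| 95-99 | 73.44 | 192 |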

Pedro is listed at 71 inches in height, probably wearing his spikes, if it is like every other profession where height is an issue with public perception.  Athletes are listed tall nowadays.  Was that always the case?  I don't know, but I'm sure they are today.

Does Pedro weigh more than his listed 170 pounds?  He's definitely added some weight in the last few years.  He isn't nearly the waif he once was.  My handy-dandy Total Baseball IV (data through 1994) lists Pedro at 71 inches and a cool 150 pounds.

Since baseball integrated, players are about an inch taller and only five pounds heavier.  We do know that some of the weights are, well, approximate, but it doesn't appear that players are much bigger.  At least not significantly.

The next argument, I'm sure, is that guys didn't lift weights back then.  You are right - they worked on farms and whatnot, with "whatnot" being selling insurance.  Players probably are a little stronger.  Nonetheless, I'll come back to this: is muscle building that helpful on both sides of the ball?  It doesn't appear to be the workout regimen for the best pitchers.  Subjective debates of this type don't go anywhere, so I'm not making that one.

As a side socio-political debate for the Sam Hutcheson fans: is the general population of the Dominican Republic and the general health of said population bigger physically and/or better than the United States in the early part of the 20th century?  Discuss.

Pedro Martinez is roughly the size of the average major league ball player from 1915 to 1929.  I'm very comfortable thinking Walter Johnson (73 inches, 200 pounds) could throw every bit as hard.  I'd like to hear an argument that he couldn't.

Chris Dial Posted: March 29, 2004 at 05:00 AM | 11 comment(s)
1. Chris Dial Posted: March 29, 2004 at 03:21 AM (#615438)
David,
I have a few breakdowns in changes in ht and wt over time - basically how they changed after 5 years, 10 years etc. Anything in particular in your mind? Are you looking for a correlation in anything?

Next up, I'll publish the ht/wt graphs. Hopefully I'll be done with the OPS+ work then too.

Tim,
I haven't looked at the breakdown. I'd probably disagree. I suspect pitchers have often been the taller players - leverage for throwing and all that. Walter Johnson, Babe Ruth, etc. I'll see how easy it is for me to break that out and try to report what I find - but this is back-burnered relative to the OPS+ study. If you feel the need to nag me through email periodically, you can.

Yes, I think your second paragraph has some validity, but that flies in the face of the premise of "It is harder to dominate today". That wouldn't make it harder to dominate. Essentially, we seem to be seeing that it is not harder for Bonds to post .800+ SLG today than when Ruth played.

Agnes,
yes, that's possible. Remember the general position is that it is harder to dominate because the variance is decreasing - but is it?
2.  Posted: March 29, 2004 at 03:21 AM (#615442)
Just to follow up on Tim's point: I'm much too busy (translation: lazy) to do this myself, but having watched the game for over 50 years, it does seem that there are a lot more players, especially pitchers, who are over 6'4" than when I first paid attention. This seems to fly in the face of the mere 1" or so difference in average height which Chris reports. I should really get out an old Register or Dope Book and start counting, just to see how far off my memory is. Taking the 50's, I can recall exactly three players who stood 6' 6" or more--Ewell Blackwell, Gene Conley, and Frank Sullivan. How many others were there?
3. PhillyBooster Posted: March 30, 2004 at 03:22 AM (#615458)
Is there a statistical difference between the results you would see in a "no one gets any better" world and "players are getting better, but hitters and pitchers and defenders are getting better at the same rate" world?

It seems to me that standard deviations decrease in racing because runners are getting better. But there is no "defense" in running.

Isn't this analysis like concluding that France's military hasn't improved in the last 1000 years because its Victory per War ratio (and standard deviation) has remained constant?
4. Mark Field Posted: March 30, 2004 at 03:22 AM (#615466)
Great work, Chris.

I've always been skeptical of listed player heights and weights. That said, the trend you show is pretty consistent with national data I have. The average American male has increased in height about 3" since 1900, and you show the same increase for the players (albeit at a higher level). I believe that. I'm a little more skeptical on listed weights, but the willingness to list true weight probably hasn't changed much over time, so the trend shown may be right.

I have this image that pitchers in the early years would have been larger. Is your data broken down in such a way that you can confirm this?
5. GregD Posted: March 30, 2004 at 03:22 AM (#615476)
Interesting stuff. I also am interested in how the graph of "everybody gets better at the same rate" would look. What little I understand about this discussion in Gould suggests that he assumes that high-talent players might always be successful. Therefore measuring the distance between the highest-talent players and the norm gives us an indicator of the overall level of quality. That basically makes sense, intuitively, but I do wonder what happens if either 1) high-talent players all left to play in the NFL and NBA and so some of the outliers disappeared or 2) high-talent players also get lifted up by the rising tide. Maybe I'm missing the point here, though.

On the height/weight front, there's an interesting piece in this week's New Yorker by Burkhard Bilger called "The Height Gap" which looks at explanations for Europeans' continued growth and Americans' flattening out. The scientists he profiles use height as an indicator of broad societal nutrition levels, and they argue that the U.S.'s lack of height growth is a product not of immigration but of some combination of the growth of economic inequity or of especially bad nutrition. The article opens and closes with a discussion of the enormous and apparently continuing increases in height among the Dutch.

I am, again, not entirely sure if this is relevant, but it would be interesting to know a bit more about how baseball players' height and weight compared to national averages, as a way of measuring some form of physical difference.
6. Greg Pope Posted: March 31, 2004 at 03:22 AM (#615497)
But pitching doesn't really depend on size or strength as much as it does on mechanics and technique.

Is this really true? I know for sure that no matter how perfect my mechanics are I'm not going to throw more than about 50 MPH.

On the MLB level, there are a lot of guys with good mechanics (Prior) and a lot of guys with bad mechanics (Weaver, Appier). Now I've heard that players with bad mechanics may break down, but I have not heard it said that if they could only improve their mechanics they could get more MPH on their fastball or more break on their curve.

In fact, I've never heard of a pitcher saying "Joe Pitching coach worked with me on my mechanics and I gained 5 MPH on my fastball." I think that size and strength are the primary thing that pitching ability depends on. Now it may not be true that you can work out and improve your fastball either. I also haven't heard a player say "I hit the weights this off-season and I gained 5 MPH on my fastball."
7. Old Matt Posted: March 31, 2004 at 03:23 AM (#615508)
I've always been under the idea that pitchers are taller on average than hitters (heavier too? I don't know). I guess this has been asked before, but is it possible to break out those ht/wt averages by hitter vs. pitcher?
8. Chris Dial Posted: April 01, 2004 at 03:23 AM (#615560)
Short answer - pitchers have always been taller than position players. By about an inch - which is a ton. They have been heavier by about 3-4 pounds too. This in 5 year debut segments from 1900.

Just some quick parts: since the 1975 debut season in 5 year segments, the average pitcher height has been: 74.09, 74.27, 74.02, 74.21, 74.18. The average weights over that period range from 194-196.

In reality, position players are gaining on the pitchers, and have been since the 1975 debut season.

One aspect to this "slow growth" is the influx of other nationalities. I don't have the data in front of me, but I would bet \$10,000 that the US average man size is larger than that of the Caribbean nations and South America (baseball nations) and the Far East.

Why pitching is technique and mechanics:
It's that and literally a gift. The body's ability to torque one's elbow in the manner required to throw a baseball 95 mph is genetic (or surgical). It's freakish. To be able to do it 200 times over a 3 hour period and then do it again in less than a week is even freakier. Mechanics and technique allow you to bounce back a little better, but we're talking ligaments, not muscle.

Curiously, with the success of Tommy John surgery, and the apparent increase in velocity resulting from it, why isn't that considered "cheating"? It certainly "cheats" old-timers' records.
9. Mark Field Posted: April 05, 2004 at 03:27 AM (#615773)
Chris:

I did some tests with the Lahman data that I wanted to report.

First, I found a way to separate out the pitchers. If you merge the pitchers and hitters databases, you can then delete anyone who appears twice. Just be careful not to delete someone like Ruth who pitches an inning or so in a season.

Second, I wanted to test what happened to the variance with different cutoffs for ABs. I tried 2 seasons, 1987 and 1922. In each case, the variance was about 15-20% higher when the AB cutoff was raised from 20 to 150. Twenty may be too low, but it does look like something lower than 150 may be necessary to get the full spectrum.
10. Chris Dial Posted: April 06, 2004 at 03:27 AM (#615794)
David,
no, but I have seen the commercials. It was something about having an artificial eye. Which is a good point - is lasik surgery cheating?

FJM,
you'll have to help. I'm a chemist, not a statistician. I'll check that when I can.

Mark,
I figured that out later. Unfortunately (and I actually did 10 ABs first) I have done too much work to dump it now. I don't think that we'll see too much difference. Those are some pretty big sample sizes, and samples that I think accurately reflect the players that played - I don't think that it has value to demonstrate that the cup of coffee players from 2000 are better than the cup of coffee players from 1930. At 150 PAs (not ABs) - which is the high end, by the 1990s, I was at 100 PAs - you get the everyday players and the bench players. Maybe using 50 PAs would be slightly better, but I think the several hundred in each group - in both leagues - indicates plenty of stability.
11. Mark Field Posted: April 12, 2004 at 03:28 AM (#615822)
Chris, I'm still thinking about this problem, which may be unfortunate because I keep coming up with new issues. This one is partly your fault since you made a point about defense and glove technology which I've been mulling for a while.

Our basic premise in this effort is that we can adequately measure "performance", and then use the variation in performance to assess whether competition is increasing (or not). As a proxy for "performance", we've been using hitting metrics (OBP, SLG, ISO) because those correlate generally well with runs. Here's my problem:

I went back and looked at defense in the early part of the century. I did not realize how bad defenses were back then. Teams averaged about 330 errors/162 in the first pentade, compared to about 110 or so now. This means that today about 8% of runs score due to errors, but back then about 32%(!!) did.

The problem with this is that the number of errors weakens the connection between a hitter's performance and the overall performance of runs scored (which is what we need). In order to solve this problem, we'd need a metric which included both offense and fielding (and possibly even baserunning). We don't have one.

If we don't solve this problem, I think we risk measuring only the change in style of play, from one in which BIP made up the largest part of offense to one in which TTO do.

