This post has more than the 75K maximum for attachments, so I had to split it into 2 parts.
During a televised game a few weeks ago, there was a graphic of the players with the highest strikeout rates, showing that most of them hit a lot of home runs. The talking heads commented that higher power hitters usually have higher strikeout rates. This seemed a pretty obvious statement just looking at the table, but I also wondered if this was not just "convenient memory"? That is, I am of the opinion that most baseball fans tend to remember the outliers in a broad distribution of results much more than the typical results, which distorts their impression of the underlying trends. I think they remember the "freakish" players much more than the middle of the pack. So I decided to look for a correlation between SO rate and HR rate to see how strong it is and, if the two are correlated, how fast the SO rate increases with HR rate.
Rather than re-invent the wheel, I decided to follow Mark Pankin's general approach to data selection and averaging used in his "Subtle Aspects of the Game" presentation at the SABR 26 conference. That is, I took all the plate appearance data for a specific set of years, removed all player seasons with fewer than 200 AB, and then grouped them according to the criteria of interest. I decided to limit myself to 21st century data since that seemed a catchy break point. My data source for the PA is Sean Lahman's "Baseball Archive" found at http://baseball1.com/content/view/57/82/. There are about 1.08 million plate appearances in this time period after selecting for at least 200 plate appearances. Pankin's presentation can be found at http://www.pankin.com/markov/. The data I used covered the seasons 2000-2006; Pankin's analysis covered the years 1984-1992 and used about 1.4 million plate appearances.
I thought it might also be fun to look at data for periods about two decades apart, so I first followed Pankin's exact sorting/grouping criteria for the 21st century data. I divided the season data into the same three slugging percentage groups that Pankin used, namely <=.367 SLG, >=.427 SLG, and in between. I was able to retrieve the HR and SO data from Pankin's presentation for each of these groups, so I then calculated the average SO/100 AB and HR/100 AB for each SLG group for each time period. I also calculated the average values for some basic batting stats as a fun comparison. Here are the averages for the entire populations. The fractional SO rate is the percent of outs that were SOs (I do not know if that is a standard stat, but the group on the forum I frequent thought it would be interesting to see as it relates to HR rates).
See Comparison Table2.JPG
With such a large number of data points in each average, the error in the averages is less than 1%, so all the increases in stats are significant. Every stat has gone up, with the HR rate going up a whopping 27% between the two time periods. I don't think there are any real surprises in this table.
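For anyone who wants to try the same grouping on their own data, here is a rough sketch of the selection and rate calculation described above. It is not the exact procedure used for the table: the field names (AB, H, 2B, 3B, HR, SO) are assumed to follow a Lahman-style batting table, the rates are computed by pooling totals within each group (rather than averaging per-season rates), and outs are approximated as AB minus H. All three are assumptions.

```python
from collections import defaultdict

MIN_AB = 200  # player seasons below this cutoff are dropped


def slg_group(slg):
    """Assign one of the three slugging groups (<=.367, >=.427, in between)."""
    if slg <= 0.367:
        return "low"
    if slg >= 0.427:
        return "high"
    return "mid"


def group_rates(seasons):
    """Pool player seasons by SLG group and return per-group rates.

    `seasons` is an iterable of dicts with keys AB, H, 2B, 3B, HR, SO
    (assumed Lahman-style column names).
    """
    totals = defaultdict(lambda: {"AB": 0, "HR": 0, "SO": 0, "outs": 0})
    for s in seasons:
        if s["AB"] < MIN_AB:
            continue
        singles = s["H"] - s["2B"] - s["3B"] - s["HR"]
        total_bases = singles + 2 * s["2B"] + 3 * s["3B"] + 4 * s["HR"]
        slg = total_bases / s["AB"]
        t = totals[slg_group(slg)]
        t["AB"] += s["AB"]
        t["HR"] += s["HR"]
        t["SO"] += s["SO"]
        # Assumption: outs approximated as at bats that were not hits.
        t["outs"] += s["AB"] - s["H"]

    rates = {}
    for group, t in totals.items():
        rates[group] = {
            "SO_per_100AB": 100.0 * t["SO"] / t["AB"],
            "HR_per_100AB": 100.0 * t["HR"] / t["AB"],
            "SO_pct_of_outs": 100.0 * t["SO"] / t["outs"],
        }
    return rates
```

Calling `group_rates` on a list of season rows returns a dict keyed by "low", "mid", and "high", each holding the three rates for that group.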
Here is the direct comparison of average SO rate as a function of average HR rate per 100 AB for the two time periods using Pankin's groupings. The straight line for the 2000+ data is a linear fit to the data, and the curve for the older data is just a smooth line drawn to intersect the points.
The graph above illustrates both the increase in SO rate for each slugging group and the increase in HR rate for the lower and upper SLG groups over time. And of course it shows the increase in SO rate as the HR rate increases for both time periods. At this rather broad level of SLG grouping, the SO rate increase is really pretty small: less than 3 SO/100 AB on average from the lowest SLG group to the highest, for both time periods.
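The straight-line fit for the 2000+ data can be reproduced with ordinary least squares. The sketch below assumes a plain unweighted fit of SO/100 AB against HR/100 AB (how the original line was fitted is not stated here), and the points in the usage example are made-up placeholders, not the values from the graph.

```python
def linear_fit(xs, ys):
    """Unweighted least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder points only (x = HR/100 AB, y = SO/100 AB), not real data.
hr_rates = [0.0, 1.0, 2.0]
so_rates = [1.0, 3.0, 5.0]
slope, intercept = linear_fit(hr_rates, so_rates)
print(f"SO/100AB = {slope:.2f} * HR/100AB + {intercept:.2f}")
```

The slope is the number of interest: it says how many extra strikeouts per 100 AB go with each extra home run per 100 AB.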
To do the apples to apples comparison above between time periods I used Pankin's rather coarse SLG grouping. However, I felt the grouping may be too coarse to reveal underlying trends. For instance, the actual HR rates range from 0 to more than 10 HR/100 AB, but the large SLG groupings average out these outer HR rates to above 1 and below 5. I therefore took the 2000-2006 data and sorted it by HR rate in the following manner. To represent the close to zero HR rate, I grouped all data lines with less than 0.5 HR/100 AB. This group has about 35,000 AB. I also grouped the truly elite HR years (at least 10 HR per 100 at bats for a full season), and made 8 groups for the remaining batters. There are only 10 data points in the elite group: 5 years for Bonds, 2 for McGwire, and 1 each for Sosa, Thome, and Howard. This group had about 4,200 AB.
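As a rough illustration of this finer grouping, the function below assigns a player season to one of 10 bins by HR rate. The bottom (< 0.5) and elite (>= 10) cutoffs come from the description above, but the boundaries of the 8 middle groups are not given, so equal-width bins between 0.5 and 10 are used purely as a stand-in assumption. The function name and parameters are hypothetical.

```python
def hr_rate_bin(hr, ab, n_middle=8, low=0.5, high=10.0):
    """Return a bin index for a player season based on HR per 100 AB.

    0            -> near-zero group (rate < low)
    1..n_middle  -> middle groups (equal width: an assumption, the
                    actual boundaries are not stated in the post)
    n_middle + 1 -> elite group (rate >= high)
    """
    rate = 100.0 * hr / ab
    if rate < low:
        return 0
    if rate >= high:
        return n_middle + 1
    width = (high - low) / n_middle
    # min() guards against floating-point rounding at the top edge.
    return 1 + min(int((rate - low) / width), n_middle - 1)


# Example: 20 HR in 500 AB is 4.0 HR/100 AB.
print(hr_rate_bin(20, 500))
```

Once each season has a bin index, the SO and AB totals can be pooled per bin in the same way as for the SLG groups to get an average SO/100 AB for each HR-rate group.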
See Part 2, SOvsHR 2000-2006.jpg