This post exceeds the 75K maximum for attachments, so I had to split it into 2 parts.
I then grouped the rest of the data into 8 groups with roughly equal numbers of AB each (about 114,000), for a total of ten data points for the graphs. I chose the 8 sub-groups this way because I am using Excel for quick linear regression fitting and graphing, and Excel does not do weighted fits. So, after sorting the data by HR rate, I put the same number of AB in each sub-group. Here is the SO/HR rate graph, with the point on the top right being for the elite hitters.
Refer to SOvsHR 2000-2006.jpg
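For anyone who wants to reproduce the grouping outside Excel, here is a minimal sketch of the idea in Python. It is not the spreadsheet I actually used: the (AB, HR, SO) tuple layout and the function names group_by_equal_ab and linear_fit are assumptions made up for illustration. It sorts batters by HR rate, cuts them into groups of roughly equal total AB, and then does a plain unweighted least-squares fit, which is reasonable only because the groups carry about the same number of AB.

```python
def group_by_equal_ab(batters, n_groups):
    """Split batters, sorted by HR rate, into groups of roughly equal total AB.

    batters: list of (ab, hr, so) tuples, one per batter, with ab > 0.
             (This layout is an assumption for illustration.)
    Returns one (HR per 100 AB, SO per 100 AB) pair per group.
    """
    ordered = sorted(batters, key=lambda b: b[1] / b[0])
    target = sum(b[0] for b in ordered) / n_groups
    groups, current, cum_ab = [], [], 0
    for b in ordered:
        current.append(b)
        cum_ab += b[0]
        # Close a group once cumulative AB reaches the next multiple of the
        # target; the final group simply takes whatever is left.
        if cum_ab >= target * (len(groups) + 1) and len(groups) < n_groups - 1:
            groups.append(current)
            current = []
    if current:
        groups.append(current)

    points = []
    for g in groups:
        ab = sum(b[0] for b in g)
        hr = sum(b[1] for b in g)
        so = sum(b[2] for b in g)
        points.append((100 * hr / ab, 100 * so / ab))
    return points


def linear_fit(points):
    """Ordinary (unweighted) least-squares line: y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Calling group_by_equal_ab(batters, 8) on the sorted middle of the data and passing the result to linear_fit would give the slope and intercept of the SO vs. HR line. Because group boundaries fall on whole batters, the AB totals per group are only approximately equal.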
Clearly the large grouping following Pankin's methods (which were not aimed at looking at things like SO vs. HR rates) hides the basic structure of SO vs. HR rates. Up to about 5 HR/100 AB, there is a very nice linear increase in average SO rate, and then it starts to flatten out. The average SO rate for the elite HR hitters is about 65% higher than the SO rate for the worst sluggers. But more than doubling the HR rate, from 5 to that of the elite hitters, only raises the SO rate by about 15%, on average. So to answer the original question: yes, indeed, the SO rate is higher for the better HR hitters, but only about 5 more than a batter in the middle of the pack.
What about the fractional SO rate? Do the better HR hitters SO as a higher percentage of outs compared to other hitters? Here is the graph for this relationship.
Refer to FractionalSOvsHR 2000-2006.jpg
This graph is a little busy since I added additional information. The upper bound on the graph is the SO rate that only 15% of the hitters in each group exceed. Likewise, only 15% of the hitters in each group fall below the lower bound. In other words, 70% of the batters are between the two lines. For the group of 8 data points the bounds are 2-3 standard deviations from the average. For the two end points, the bounds are roughly one standard deviation away from the mean. The light blue lines indicate the average HR and SO rate for all batters.
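The 15%/85% bounds described above can be sketched as follows. This is only an illustration under stated assumptions: the function names percentile and so_rate_bounds are hypothetical, each batter counts equally (no weighting by AB), and percentiles are linearly interpolated between sorted values, which may differ slightly from what a spreadsheet does.

```python
def percentile(values, pct):
    """Linearly interpolated percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    pos = (len(ordered) - 1) * pct / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def so_rate_bounds(so_rates, tail_pct=15):
    """Return (lower, upper) bounds that each cut off tail_pct percent of batters.

    so_rates: one fractional SO rate per batter in a single HR-rate group.
    With tail_pct=15, about 70% of the batters fall between the two bounds.
    """
    return percentile(so_rates, tail_pct), percentile(so_rates, 100 - tail_pct)
```

Running so_rate_bounds once per HR-rate group gives the pair of values plotted as the lower and upper lines for that group.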
The fractional SO rate has almost exactly the same shape as the true SO rate. It rises by a factor of 1.65 from the low end to the high end, same as before. The only thing I see that is a little different is that the "zero HR" batters are above the linear fit. That is, they do not follow the trend as well as they do in the other HR graph. I did not include this point in the linear fit. And yes, I double checked: the two graphs do have the one in a million (or more) coincidence of identical R^2 values to four decimal places.
I do not have enough baseball background to do more than the rudimentary interpretation of the results already given. Any additional comments or constructive criticism are welcome.