Interesting. Thanks for taking the time out to look into that.
A few questions.
First, I’d have to wonder how much park factors play a part in those numbers. Not sure how easy that would be to factor out of the numbers. I would think they could potentially skew the data.
Secondly, would it be better to look at HR per batter faced? Just looking at HR/9 kind of gives the numbers a team defense skew, doesn’t it? Good defenses get more outs, so fewer batters come to the plate in the inning. It would also probably favor good pitchers over bad ones (good pitchers get out of the inning quicker).
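To make the HR/9 vs. HR per batter faced point concrete, here is a rough sketch in Python. The two pitchers and all of their numbers are made up for illustration; the only thing that differs between them is how many batters they faced over the same innings.

```python
def hr_per_9(hr, innings):
    """Home runs allowed per nine innings pitched."""
    return 9.0 * hr / innings


def hr_per_bf(hr, batters_faced):
    """Home runs allowed per batter faced."""
    return hr / batters_faced


# Hypothetical pitchers (invented numbers): same HR and innings,
# but pitcher_b faces more batters per inning, e.g. behind a weaker defense.
pitcher_a = {"hr": 20, "ip": 180.0, "bf": 720}  # 4.0 batters per inning
pitcher_b = {"hr": 20, "ip": 180.0, "bf": 810}  # 4.5 batters per inning

for name, p in (("A", pitcher_a), ("B", pitcher_b)):
    print(name, hr_per_9(p["hr"], p["ip"]), round(hr_per_bf(p["hr"], p["bf"]), 4))
```

Both pitchers come out identical by HR/9, but B looks better per batter faced, because the extra batters he had to face count in the denominator.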
Thirdly, how much randomness is in this number? BIPA seems to have a lot of randomness. I think “Baseball Between the Numbers” put it at something like 40% randomness. I’m not even sure how to figure out that number.
Does anyone keep track of gb/fb ratio compared to actual hits instead of outs? Maybe I’m wrong but I think the espn.com numbers are tracked on outs only. Which would introduce some more defensive skew.
After these long hours at work end, I’m going to have to sit down and figure out how I can run some of these numbers myself.
Park Factors—I think that with 91 pitchers (averaging just very slightly over 3 pitchers per home park), and how everybody is on the road 50% of the time, that the park factors are virtually cancelled out.
HR/batter faced—Yes, this would be better. Again, with 91 pitchers involved, I think any outliers are negated here. However, it’s a silly complication that could be avoided easily.
Randomness—I know that line drives become hits 75% of the time, but I don’t know the rates for FB and GB, other than FBs are incredibly unsuccessful for the batter. FBs are either home runs, outs, or the 1-in-500 error. I have no idea about the GB hit/no hit ratio. Over a smaller sample size, this would wreak havoc on the regression, and even 91 is getting close to skew-vulnerable, but again, I think the large sample size helps. The 80+ innings also would, in theory, limit the luck involved from pitcher to pitcher, let alone over the whole thing.
However, to more closely determine GB/FB predicting HR/batter on a pitcher-to-pitcher basis, these problems need to be addressed. The PFs are pretty easily addressed, HR/batter faced should be easy enough to deal with (just make it actual batters faced rather than 27), and randomness can be addressed with larger sample sizes. Unfortunately, I haven’t attended any college math classes yet (I just graduated from high school, where I didn’t pay attention to calculus), so I can’t say what sample size would be needed to effectively rule out luck.
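For what it's worth, one rough way to put a number on the sample size question is to treat every batter faced as an independent coin flip with the same home run probability. That is a big simplification (real at-bats are not independent or identical), and the 3% home run rate and the margin of error below are placeholder guesses, not figures from the data. Under those assumptions, the usual binomial formulas give:

```python
import math


def standard_error(p, n):
    """Standard error of an observed rate p over n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


def batters_needed(p, margin, z=1.96):
    """Trials needed so the ~95% margin of error (z = 1.96) is at most `margin`.

    Assumes a simple binomial model: each batter faced is an independent
    trial with the same home run probability p.
    """
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))


assumed_hr_rate = 0.03   # placeholder guess: 3% of batters faced homer
target_margin = 0.005    # placeholder: want the rate pinned to +/- 0.5 points

print(standard_error(assumed_hr_rate, 350))            # about one 80-inning season
print(batters_needed(assumed_hr_rate, target_margin))  # batters for that margin
```

With guesses like these, the answer lands in the thousands of batters faced, so a single 80-inning season per pitcher would still leave a lot of luck in any individual's home run rate, even if the luck mostly washes out across all 91 pitchers.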
Does Hardball Times’ GB/FB/LD stuff go on hits or outs or both? I would guess both, but I could be wrong.