Predicting Run Environment from SLG, OBP
Posted: 26 June 2009 02:08 PM

Newbie to the forum and statistical analysis at large…

Trying to come up with a way to neutralize Negro League stats, I first needed to come up with an estimated Run Environment.  Sitting on an airplane gave me a lot of time to try many methods.  I have no background in stats, so I went back to my old 8th grade algebra method of trying a variety of things until a pattern turned up.

I discovered that there’s a (generally) consistent relation between SLG, OBP and the run environment for a given year, given by the following formula.

(square root(OBP*1000))*(square root(SLG*1000))/13.5=square root of the Run Environment.

The 13.5 is a ratio that kept springing up across the 21 leagues I had with me on the plane.  It’s not a perfect number - it went as low as 13.1 in the 1909 NL season (where league SLG was .314) and as high as 13.8 (1921 NL).  But in seven of the 21 leagues, the number was 13.5 when rounded, and in 15 of 21 cases it was between 13.4 and 13.6.
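
A minimal Python sketch of the calculation described above. The function names and the sample league line (.330 OBP, .400 SLG) are assumptions made for illustration; they are not figures from the 21 leagues. The second function runs the formula backwards, which is one way the divisor could be recovered for a league whose run total is already known.

```python
import math

def estimate_run_environment(obp, slg, divisor=13.5):
    """sqrt(OBP*1000) * sqrt(SLG*1000) / divisor = sqrt(run environment)."""
    root = math.sqrt(obp * 1000) * math.sqrt(slg * 1000) / divisor
    return root ** 2

def implied_divisor(obp, slg, actual_runs):
    """Solve the same relation for the divisor, given a known run environment."""
    return math.sqrt(obp * 1000) * math.sqrt(slg * 1000) / math.sqrt(actual_runs)

# Hypothetical league line, not real data.
estimate = estimate_run_environment(0.330, 0.400)
print(round(estimate))                                  # 724
print(round(implied_divisor(0.330, 0.400, estimate), 2))  # 13.5
```

Feeding a league's real OBP, SLG and run total into implied_divisor would show how far that league sits from 13.5.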

This exhausted my entire mathematical knowledge.  If there’s a way to calculate a deviation using park factors, etc. it is beyond my skills.  The project started solely so I could estimate Negro League stats for inclusion in an online baseball league and I stumbled on something I didn’t expect.  I invite anyone to completely butcher this formula or maybe make something useful out of it.

Mark

Posted: 26 June 2009 04:41 PM   [ # 1 ]

Mark, you might have more luck if you post this in the Dugout (found on the main page) or in the lounge (in the forum).  They’re much more lively.

Here’s today’s lounge:
http://www.baseballthinkfactory.org/files/forums/viewthread/1872/P300/

And today’s Dugout:
http://www.baseballthinkfactory.org/files/newsstand/discussion/the_smoky_joe_vs_big_train_dugout/


Posted: 28 June 2009 12:38 PM   [ # 2 ]

Thank you for the advice.  I’ll solidify my findings and post them.

Mark

Posted: 29 June 2009 04:24 AM   [ # 3 ]

(square root(OBP*1000))*(square root(SLG*1000))/13.5=square root of the Run Environment.

Square both sides:

OBP*1000*SLG*1000/182.25 = run environment
OBP*SLG*5487 = run environment

There’s some kind of mistake here since the 5487 is way too high. But basic runs created is OBP*SLG*AB and the average number of AB in a game is something like 68. So a formula like:

OBP*SLG*68 = run environment

would work.
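
The squaring step can be checked numerically. The sketch below only redoes the arithmetic; the variable names are assumptions. It also divides the 5487 constant by the 68 AB per game quoted above. The result is close to 81, half of a 162-game schedule, which would be consistent with (though it does not prove) the original formula estimating runs per team per season while the 68-AB version estimates runs per game for both teams.

```python
DIVISOR = 13.5          # divisor from the original formula
AB_PER_GAME = 68        # rough at-bats per game quoted in the post

# Squaring both sides turns the two *1000 factors and the divisor
# into a single multiplier on OBP*SLG.
constant = 1000 * 1000 / DIVISOR ** 2
ratio = constant / AB_PER_GAME

print(round(constant))   # 5487
print(round(ratio, 1))   # 80.7
```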

However! The average number of AB in a game will be closely related to the OBP. Assume that AB=PA*0.9. Then assume there are 50 OBP-outs per game. (Why 50? A double play counts as a single “OBP-out” but 2 actual outs. A caught stealing, etc. is zero OBP-outs and 1 actual out. An error is 1 OBP-out and 0 actual outs. There are 54 actual outs per full 9-inning game, but there are games without a bottom of the 9th, extra-inning games, rain-shortened games, and so forth. Put it all in a blender and you get 51 OBP-outs per game for the AL so far this year. 50 is close enough.)

If X is the number of PA per game, the number of OBP-outs per game will be (1-OBP)*X. So (1-OBP)*X=50 and X=50/(1-OBP). Thus AB=45/(1-OBP).

The formula I would use is this:

[OBP/(1-OBP)]*SLG*45 = run environment

The constant 45 probably needs to be tweaked somewhat.
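
The derivation above can be written out step by step. This is a sketch under the post's own assumptions (50 OBP-outs per game, AB = 0.9 * PA); the function names and the sample .330/.400 league line are made up for illustration. It confirms that the stepwise version and the closed-form formula agree.

```python
OBP_OUTS_PER_GAME = 50.0   # assumption from the post
AB_PER_PA = 0.9            # assumption from the post

def plate_appearances_per_game(obp):
    # (1 - OBP) * PA = OBP-outs per game, solved for PA.
    return OBP_OUTS_PER_GAME / (1.0 - obp)

def at_bats_per_game(obp):
    return AB_PER_PA * plate_appearances_per_game(obp)

def run_environment_stepwise(obp, slg):
    # Basic runs created: OBP * SLG * AB.
    return obp * slg * at_bats_per_game(obp)

def run_environment_closed_form(obp, slg, constant=45.0):
    # [OBP / (1 - OBP)] * SLG * 45
    return obp / (1.0 - obp) * slg * constant

# Hypothetical league line, not real data.
obp, slg = 0.330, 0.400
print(round(at_bats_per_game(obp), 1))
print(round(run_environment_stepwise(obp, slg), 2))
print(round(run_environment_closed_form(obp, slg), 2))
```

Tweaking the constant, as suggested above, only requires passing a different value for the constant argument.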

Posted: 29 June 2009 04:28 AM   [ # 4 ]

It just occurred to me that there’s a fundamental reason why it makes sense to divide by 1-OBP. As OBP increases to 1, the run environment increases to infinity. Putting 1-OBP in the denominator makes the formula behave that way too.
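
A quick way to see that behavior is to hold SLG fixed and push OBP toward 1. This is only an illustration: the fixed .400 SLG is an assumption, and a real league's SLG would not stay put while OBP climbed.

```python
def run_environment(obp, slg, constant=45.0):
    # [OBP / (1 - OBP)] * SLG * constant, as proposed above.
    return obp / (1.0 - obp) * slg * constant

SLG = 0.400  # held fixed for illustration only
obps = (0.300, 0.500, 0.900, 0.990, 0.999)
values = [run_environment(obp, SLG) for obp in obps]

for obp, value in zip(obps, values):
    print(f"OBP {obp:.3f} -> {value:.1f}")
```

By contrast, the plain OBP*SLG*68 form tops out at 68*SLG when OBP reaches 1.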

Posted: 29 June 2009 05:56 AM   [ # 5 ]

Fret -

Thanks for responding.  I’m still trying to wrap my head around the equation as well as this site.  I published a Google spreadsheet with findings for over 100 seasons:

http://spreadsheets.google.com/pub?key=rrMvsj-RMswC8WGZXao0s5A&output=html

On the spreadsheet I include the actual denominator in the (SqRt OBP*1000)*(SqRt SLG*1000)/x = SqRt Run Env., as well as the results and % accuracy when I used two fixed numbers (13.54 and 13.49).  Because of my “disadvantages” with math the chart deals with OBP and SLG already multiplied by 1000.  I also included your formula with a twist - 4.5 seemed to run high, but when I used 3.6 things got a bit more accurate.  Again, the old trial and error method which I could never be broken of.
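
For anyone who wants to repeat the comparison outside the spreadsheet, here is a rough sketch. It assumes "% accuracy" means the estimate expressed as a percentage of the actual figure, which may not match the spreadsheet's definition, and the league line and run total are invented placeholders rather than spreadsheet data. Inputs are already multiplied by 1000, as in the chart.

```python
import math

def estimate(obp_x1000, slg_x1000, divisor):
    # Inputs are OBP and SLG already multiplied by 1000.
    return (math.sqrt(obp_x1000) * math.sqrt(slg_x1000) / divisor) ** 2

def percent_accuracy(est, actual):
    # Assumption: accuracy = estimate as a percentage of the actual figure.
    return 100.0 * est / actual

# Hypothetical league line and run total, not real data.
obp_x1000, slg_x1000, actual_runs = 330, 400, 720

for divisor in (13.54, 13.49):
    est = estimate(obp_x1000, slg_x1000, divisor)
    print(divisor, round(est, 1), round(percent_accuracy(est, actual_runs), 1))
```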

fret - 29 June 2009 04:24 AM

(square root(OBP*1000))*(square root(SLG*1000))/13.5=square root of the Run Environment.

Square both sides:

OBP*1000*SLG*1000/182.25 = run environment
OBP*SLG*5487 = run environment

There’s some kind of mistake here since the 5487 is way too high. But basic runs created is OBP*SLG*AB and the average number of AB in a game is something like 68. So a formula like:

OBP*SLG*68 = run environment

would work.

I’m not sure why 5487 is too high a number, as it represents a constant that kept coming up in the formula.  It took me a while to remember some Algebra - I appreciate you leaving your steps.

However! The average number of AB in a game will be closely related to the OBP. Assume that AB=PA*0.9. Then assume there are 50 OBP-outs per game. (Why 50? A double play counts as a single “OBP-out” but 2 actual outs. A caught stealing, etc. is zero OBP-outs and 1 actual out. An error is 1 OBP-out and 0 actual outs. There are 54 actual outs per full 9-inning game, but there are games without a bottom of the 9th, extra-inning games, rain-shortened games, and so forth. Put it all in a blender and you get 51 OBP-outs per game for the AL so far this year. 50 is close enough.)

If X is the number of PA per game, the number of OBP-outs per game will be (1-OBP)*X. So (1-OBP)*X=50 and X=50/(1-OBP). Thus AB=45/(1-OBP).

The formula I would use is this:

[OBP/(1-OBP)]*SLG*45 = run environment

The constant 45 probably needs to be tweaked somewhat.

I’m following your logic for the most part - and it’s nothing on your end, just me entering a new field of thought.  I’m guessing the number has to be adjusted for the deadball era on an almost different scale because of fielding and stolen bases, which may explain why the early years come up short.  1994-95 are the largest percent differences on both systems.

I’ll continue analyzing your method and see where the difference lies between 4.5 and 3.5 and see how that number will specifically relate to AB and all the factors we discussed.  Of course, this makes my task of neutralizing John Beckwith and George Scales all the more difficult as league totals are lacking for most of this.  I had to fudge league OBP for my estimates, though I did it somewhat fairly.  At least, scientifically enough to run a sim.

Mark

Posted: 15 October 2009 04:49 AM   [ # 6 ]

Mark, you might have more luck if you post this in the Dugout (found on the main page) or in the lounge (in the forum).  They’re much more lively.

Here’s today’s lounge:
http://www.baseballthinkfactory.org/files/forums/viewthread/1872/P300/

And today’s Dugout:
http://www.baseballthinkfactory.org/files/newsstand/discussion/the_smoky_joe_vs_big_train_dugout/

This is interesting. I never thought that it would be like this. I really can’t imagine it from any point of view.


foxdrg