Formula For Predicting ERA?
 Posted: 11 November 2007 02:13 PM

Assuming I want to plug K/9, BB/9 and GB% into a formula to predict ERA, what should I use?  Further assume I believe that pitchers have no control over babip or hr/fb.

Signature

Ask me how I won \$3K at free fantasy baseball.

 Posted: 13 November 2007 04:17 PM   [ # 1 ]

I’ll take a stab at this one…

The biggest problem you’ll have is that we really need a fly ball percentage instead of a ground ball percentage.  Different sites mean different things by gb%, so watch out for that, but our goal is a fb% that represents what percentage of balls in play are fly balls.

After that point I’d try to get some park adjusted hr/fb%.  You could look at a league average number instead but park adjusted numbers would be more accurate as a fly ball pitcher in Petco is much different than a fly ball pitcher in Great American Ballpark.  Once you’ve derived the fb% from the gb% and settled on a hr/fb% you’re ready to go.

To keep the formula easy to read I’ll do it in steps rather than one massive formula.

1) Ball in Play / 9 (BIP/9) = 25.38 - k/9
I use a number I’d found somewhere of 0.18 outs per inning on the basepaths via double-plays, baserunning, etc.  That gives 2.82 outs at the plate per inning or 25.38 outs per game.
2) HR/9 = BIP/9 * fb% * hr/fb%
So let’s say we’ve got a pitcher with a k/9 of 6, a fb% of 33.3% and a hr/fb% of 11%.  That would give us (25.38 - 6) * 0.333 * 0.11, or about 0.71 hr/9.
3) Just use a traditional formula for hr/9, k/9, and bb/9 at this point.  I usually go with (13 * hr/9 + 3 * bb/9 - 2 * k/9) / 9 + 3 for the NL and + 3.2 for the AL (divide by ip instead of 9 if you plug in season totals rather than per-9 rates) but whatever formula you like is fine at that point.
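
The three steps above can be sketched in Python roughly as follows. This is only an illustration: the function names are made up, all rates are assumed to be per 9 innings with percentages passed as fractions, and the last step divides by 9 on the assumption that per-9 rates are being plugged in. It uses 25.38 - k/9 for balls in play exactly as step 1 does.

```python
OUTS_AT_PLATE_PER_9 = 25.38  # 2.82 outs at the plate per inning * 9


def bip_per_9(k9):
    """Step 1: balls in play per 9, as estimated in the post."""
    return OUTS_AT_PLATE_PER_9 - k9


def hr_per_9(k9, fb_pct, hr_per_fb):
    """Step 2: HR/9 = BIP/9 * fb% * hr/fb% (both percentages as fractions)."""
    return bip_per_9(k9) * fb_pct * hr_per_fb


def era_estimate(k9, bb9, fb_pct, hr_per_fb, constant=3.0):
    """Step 3: (13*hr/9 + 3*bb/9 - 2*k/9) / 9 + constant.

    Dividing by 9 is an assumption for per-9 inputs. The post suggests
    a constant of 3 for the NL and 3.2 for the AL.
    """
    hr9 = hr_per_9(k9, fb_pct, hr_per_fb)
    return (13 * hr9 + 3 * bb9 - 2 * k9) / 9 + constant


# The post's example pitcher: k/9 of 6, fb% of 33.3%, hr/fb% of 11%.
example_hr9 = hr_per_9(6, 0.333, 0.11)
# bb/9 of 3 is a made-up value, only there so the last step can run.
example_era = era_estimate(6, 3, 0.333, 0.11)
print(round(example_hr9, 2), round(example_era, 2))
```

Keeping the league constant as a parameter makes it easy to switch between the NL and AL versions without touching the rest of the calculation.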

The key part is getting from gb% to hr/9.  You’re going to have to do some digging to find out exactly how to get from the gb% listed at the site you’re referencing to the fb% and you’ll also want to determine an appropriate hr/fb%.  I can’t give you a hard and fast rule for either of those things but the accuracy of the projection will be heavily determined by the accuracy of those two numbers given the large role that hr/9 plays in era projections.

 Posted: 13 November 2007 04:55 PM   [ # 2 ]

In general I’m going to try to keep things simple, since my goal is useful fantasy projections rather than something that’s theoretically ‘correct’.  So let’s just aim for a park-neutral HR/9 prediction, and assume that we’ll use .11*FB.

I’m using GB% defined the way it’s used at The Hardball Times.  I believe that it’s ‘ground balls as a percentage of balls in play’.  So the denominator is ground balls + fly balls + line drives, and includes hits as well as outs.

 Posted: 13 November 2007 04:57 PM   [ # 3 ]

Loveable Losers - Your calculation for balls in play neglects to include HITS that were balls in play.

 Posted: 13 November 2007 05:58 PM   [ # 6 ]
zoobird - 13 November 2007 04:57 PM

Loveable Losers - Your calculation for balls in play neglects to include HITS that were balls in play.

You’re correct.  My mistake on that…I’m used to using that formula to determine hits in play and didn’t think about the fly balls overall (even the hits) being something to consider.

I was ready to post a correction, but what makes this problematic is that gb’s, fb’s, and ld’s don’t go for hits at the same rates, so estimating the number of fly ball hits in play could be tricky.  My earlier set of steps estimates the number of fly ball outs in play, so we’d just need to tack the number of fly ball hits on to that total and we’re home free.  But that’s where the problem would be.

From what you’re saying you know the gb, fb, and ld%’s.  I typically predict total hits in play per 9 as 3/7 * (25.38 - k/9), so that would get you the total hits in play.  But figuring out how many of those hits in play come via the fly ball would be a bit trickier.

You might be able to get there if you consider that 21.2% of fly balls go for hits, 27.6% of ground balls, and 74.3% of line drives.  Then you’d have to weight those rates by the pitcher’s own batted-ball mix to get what share of THAT pitcher’s hits comes from each type.  Say you’ve got a 40/40/20 split between gb/fb/ld.  That would give you 8.48% fly ball hits, 11.04% ground ball hits, and 14.86% line drive hits, for a rough ratio of 1 fly ball/1.3 ground ball/1.75 line drives.  So fly balls make up about 24.69% of this player’s hits in play.
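
The weighting described above can be sketched in Python like this. It is a rough sketch, not a definitive implementation: the names `HIT_RATE`, `hit_shares` and `hits_in_play_per_9` are made up for illustration, and the hit rates are simply the three figures quoted in the post.

```python
# Share of each batted-ball type that goes for a hit (figures from the post).
HIT_RATE = {"fb": 0.212, "gb": 0.276, "ld": 0.743}


def hit_shares(mix):
    """Return each batted-ball type's share of a pitcher's hits in play.

    mix maps "gb"/"fb"/"ld" to that pitcher's batted-ball fractions.
    """
    weighted = {kind: HIT_RATE[kind] * mix[kind] for kind in HIT_RATE}
    total = sum(weighted.values())
    return {kind: value / total for kind, value in weighted.items()}


def hits_in_play_per_9(k9):
    """Total hits in play per 9 as the post estimates it: 3/7 * (25.38 - k/9)."""
    return 3 / 7 * (25.38 - k9)


# The post's example: a 40/40/20 gb/fb/ld split and a k/9 of 6.
shares = hit_shares({"gb": 0.40, "fb": 0.40, "ld": 0.20})
fly_ball_hits = hits_in_play_per_9(6) * shares["fb"]
print(round(shares["fb"], 3), round(fly_ball_hits, 2))
```

Working from the unrounded percentages gives a fly ball share a hair under the 24.69% in the post, which comes from the rounded 1/1.3/1.75 ratio. The difference is too small to matter for the fly ball hits figure.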

So taking our 25.38 outs in play * 40% fly ball rate gives us 10.15 fly ball outs.  A 6 k/9 would give us about 8.31 hits in play and we’d have about 2.05 fly ball hits in play.  That would give us a total number of fly balls in play of about 12.2.  Let’s assume 11.11% hr/fb ratio for the sake of easy math…that gives us 1/9 hr/fb and 8/9 fly balls in play for a ratio of 8 fly balls in play to 1 hr.  So we just divide our 12.2 fly balls in play by 8 to get 1.53 hr/9.

That 1.53 hr/9 seems high to me for a 40/40/20 split so if anyone sees a mistake in the reasoning here please pipe up and we’ll see if we can refine it further.  It could be a product though of a 40/40/20 split and a lowish 6 k/9 but I’m definitely surprised by the results even though the logic seems solid.
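
For anyone who wants to check the reasoning, the arithmetic in this post can be reproduced with a short Python sketch. One thing that may be worth a look: 25.38 is outs at the plate per 9, which includes strikeouts, while the first reply subtracted k/9 before applying fb%. The `subtract_strikeouts` flag below is an assumed variant for comparison, not something the post does, and whether that is the mistake being asked about is only a guess. The function name and the flag are made up for illustration.

```python
OUTS_AT_PLATE_PER_9 = 25.38  # includes strikeouts, per the first reply
FB_HIT_SHARE = 0.2469        # fly ball share of hits in play, from the post


def hr_per_9(k9, fb_pct, hr_per_fb, subtract_strikeouts=False):
    """HR/9 from fly ball outs plus fly ball hits, following the post.

    subtract_strikeouts=True is NOT what the post does; it is an assumed
    variant that removes strikeouts before taking the fly ball share of outs.
    """
    outs = OUTS_AT_PLATE_PER_9 - k9 if subtract_strikeouts else OUTS_AT_PLATE_PER_9
    fly_ball_outs = outs * fb_pct
    hits_in_play = 3 / 7 * (OUTS_AT_PLATE_PER_9 - k9)
    fly_ball_hits = hits_in_play * FB_HIT_SHARE
    fly_balls_in_play = fly_ball_outs + fly_ball_hits
    # A hr/fb of 1/9 means 8 fly balls stay in play for every home run.
    return fly_balls_in_play * hr_per_fb / (1 - hr_per_fb)


as_posted = hr_per_9(6, 0.40, 1 / 9)
variant = hr_per_9(6, 0.40, 1 / 9, subtract_strikeouts=True)
print(round(as_posted, 2), round(variant, 2))
```

The first number matches the 1.53 hr/9 above. The variant comes out lower because only balls in play, not strikeouts, are split by the 40% fly ball rate.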

 Posted: 14 November 2007 01:51 PM   [ # 7 ]

1.53 hr/g seems reasonable to me.  A pitcher who only has 6 k/g and a 40% GB rate isn’t generally very good.

 Posted: 18 November 2007 12:41 PM   [ # 8 ]

I think I’ve identified a problem with this approach.  Actually, it’s also a problem with a simpler approach I’m using right now.  My current formula is:

=3.0+(3*BB-2*K+13*((0.82-gb%)*(36-K))*0.11)/9

The (0.82-gb%) gives fb% assuming all pitchers allow a ld% of .18.
The (36-K) is a crude approximation of the number of balls in play per 9 innings.
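
For readers who would rather work outside a spreadsheet, here is a rough Python version of the same formula. The function and parameter names are made up for illustration, BB and K are assumed to be per-9 rates, gb% is passed as a fraction, and the two pitchers at the bottom are invented inputs rather than real players.

```python
def era_estimate(bb9, k9, gb_pct, ld_pct=0.18, hr_per_fb=0.11, constant=3.0):
    """Python version of: =3.0+(3*BB-2*K+13*((0.82-gb%)*(36-K))*0.11)/9"""
    fb_pct = (1 - ld_pct) - gb_pct   # 0.82 - gb% when ld% is fixed at .18
    bip_per_9 = 36 - k9              # crude balls in play per 9 innings
    hr9 = fb_pct * bip_per_9 * hr_per_fb
    return constant + (3 * bb9 - 2 * k9 + 13 * hr9) / 9


# Two made-up pitchers with identical bb/9 and k/9 but different gb%.
ground_baller = era_estimate(bb9=3, k9=6, gb_pct=0.60)
fly_baller = era_estimate(bb9=3, k9=6, gb_pct=0.40)
print(round(ground_baller, 2), round(fly_baller, 2))
```

Written this way, it is easy to see that gb% enters only through the home run term, so a higher gb% always lowers the estimate.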

When I look at the relative rankings of the pitchers, it’s clear that the formula is overrating ground ball pitchers and underrating fly ball pitchers.  Webb is #1, Felix is #2, and Lowe is roughly equal with guys like Santana and Peavy.

I think the issue is that the formula does nothing to take into account the fact that ground balls have a higher BABIP than fly balls.  It also ignores the higher double play rate for ground balls, but I think the BABIP more than offsets the double play impact, and I need to do something to account for it.

 Posted: 22 November 2007 12:22 AM   [ # 9 ]

Here’s one that Nate Silver on BP calls Quik ERA:

QERA =(2.69 + K%*(-3.4) + BB%*3.88 + GB%*(-0.66))^2

http://www.baseballprospectus.com/article.php?articleid=5560

Silver explains as follows:

“I call this toy QuikERA (QERA), which estimates what a pitcher’s ERA should be based solely on his strikeout rate, walk rate, and GB/FB ratio. These three components—K rate, BB rate, GB/FB—stabilize very quickly, and they have the strongest predictive relationship with a pitcher’s ERA going forward. What’s more, they are not very dependent on park effects, allowing us to make reasonable comparisons of pitchers across different teams….

Note that everything ends up expressed in terms of percentages: strikeouts per opponent plate appearance, walks per opponent plate appearance, and groundballs as a percentage of all balls hit into play. In the 2006 season, Andy Pettitte, for example, had a 19.6% K rate, a 7.9% BB rate, and a 62.7% GB rate, giving him a QERA of 3.68. Note further that QERA is exponential, which is appropriate since run scoring is not linear.”
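
As a quick check, the QERA formula as written in this post can be put into a few lines of Python. The function name `qera` is made up, and the inputs are assumed to be fractions (0.196 rather than 19.6).

```python
def qera(k_pct, bb_pct, gb_pct):
    """QuikERA as written in the post; all three inputs are fractions."""
    return (2.69 + k_pct * (-3.4) + bb_pct * 3.88 + gb_pct * (-0.66)) ** 2


# Andy Pettitte's 2006 rates as quoted: 19.6% K, 7.9% BB, 62.7% GB.
pettitte = qera(0.196, 0.079, 0.627)
print(round(pettitte, 2))
```

With the rounded rates from the quote this lands within about a hundredth of the 3.68 Silver gives, and the small gap is presumably down to rounding in the published percentages.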