Tuesday, February 07, 2023

The Boomers Were Right: Batting Average is REALLY Important

Every incremental point of batting average (e.g. going from .270 to .271) is worth 0.224 more runs per 600 plate appearances, followed by 0.2 runs per point of ISO and 0.156 runs per 600 PA for every additional point of BB%. By this analysis, the best way to improve your run scoring is to boost your batting average. It is important to note that all three components had extremely low p-values, indicating that each was a statistically significant predictor of runs, as one would expect.

Looking at this another way, if you are trading batting average for ISO (a typical trade-off) you want to make sure you gain at least 10% more ISO points than AVG points. Hitting .300 + .150 ISO is slightly better than a .250 average and a .200 ISO, despite both of those being equivalent from a slugging standpoint.
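
As a rough check on that trade-off, here is a minimal Python sketch using only the three per-point run values quoted above. The function and variable names are illustrative, and it assumes the values are linear and additive per 600 PA, as the excerpt implies.

```python
# Per-point run values per 600 PA, as quoted in the excerpt above.
# A "point" is .001 of AVG or ISO.
RUNS_PER_POINT = {"avg": 0.224, "iso": 0.200, "bb": 0.156}


def run_delta(avg_points, iso_points, bb_points=0.0):
    """Change in runs per 600 PA for a given change in points of each stat."""
    return (RUNS_PER_POINT["avg"] * avg_points
            + RUNS_PER_POINT["iso"] * iso_points
            + RUNS_PER_POINT["bb"] * bb_points)


# Moving from .250 AVG / .200 ISO to .300 AVG / .150 ISO:
# +50 points of AVG, -50 points of ISO, same slugging (.450).
delta = run_delta(avg_points=50, iso_points=-50)

# ISO points needed to offset one lost point of AVG.
break_even_ratio = RUNS_PER_POINT["avg"] / RUNS_PER_POINT["iso"]

print(f"runs gained per 600 PA: {delta:.2f}")        # about 1.2
print(f"break-even ISO/AVG ratio: {break_even_ratio:.2f}")  # about 1.12
```

On these numbers the break-even ratio is 0.224 / 0.2 = 1.12, so the swap needs roughly 12% more ISO points than AVG points given up, a little above the 10% quoted.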

I want to stress this point some more. Batting average is REALLY IMPORTANT. Saying that we have on base percentage and slugging, and therefore we don’t need batting average, ignores the fact that we are lumping batting average into their numbers (with a 50% weight for each). Naturally, we don’t need batting average when we have batting average + walk percentage. However, batting average, when separated out, is meaningful. It’s statistically significant. It matters. We absolutely should be showing it as a key stat when we look at a batter. The reason it boosted prediction slightly in Ben’s model was that the weights are off in OBP and slugging, so including it again allows the model to re-weight the average component.

RoyalsRetro (AG#1F) Posted: February 07, 2023 at 12:24 AM | 24 comment(s)
   1. Walt Davis Posted: February 07, 2023 at 05:35 AM (#6116006)
   2. McCoy Posted: February 07, 2023 at 06:03 AM (#6116007)
Seems to me he's saying if we fix batting average it becomes really important. Isn't that true of all stats if we "fix" them?
   3. SoSH U at work Posted: February 07, 2023 at 06:50 AM (#6116008)

It seems like I've heard this argument before. I can't put my finger on where.
   4. drdr Posted: February 07, 2023 at 07:27 AM (#6116009)
I don't see any revelation or anything controversial in this or in Ben's articles. Two singles are better than one double. It's better to reach base on a hit, even a single, than on a walk. However, OBP and SLG contain two pieces of information, not just one like AVG, so if you have OBP and SLG, AVG is meaningless.
   5. McCoy Posted: February 07, 2023 at 07:59 AM (#6116010)
I think that what they're saying is that average is not meaningless even if you know OBP and SLG. Which is largely true.
   6. TomH Posted: February 07, 2023 at 08:29 AM (#6116011)
On the one hand, the article admits that if you know OBP and SLG, batting avg is not important. But then it says "Batting average is really important, arguably more important to run scoring than any other element in a batter’s traditional triple slash line." Which is both inconsistent and incorrect.

It later qualifies that it means AVG is more impt than walk % or ISO, but yeah, who doesn't agree with that?

If you input AVG, OBP and SLG into team runs scored over a large # of team-seasons, the regression analysis will show that OBP and SLG are crucial, and AVG comes out as largely irrelevant.

Order of importance goes something like
OPS > SLG > OBP > AVG > ISO > walk%

   7. The Duke Posted: February 07, 2023 at 10:10 AM (#6116022)
I simply thought that the message was that because avg is a huge component part of all three numbers in the slash line that you'd be better off looking at the individual components.

He also simplifies OBP to bb% + avg which isn't actually true.

Baseball Savant has a similar issue on their leaderboard, where you do really well on a bunch of metrics if you hit the ball hard.
   8. Bruce Chen's Huge Panamanian Robot Posted: February 07, 2023 at 01:02 PM (#6116037)
No it's not.
   9. John Reynard Posted: February 07, 2023 at 02:47 PM (#6116046)
There are game situations where batting average is 95%+ of what matters (man on 3rd, 2 outs, bottom of 9th, tied or 1 run lead).

Otherwise the primary significance of batting average is that it is part of both OBP and SLG and so, yes, it's relevant because of that.
   10. Walt Davis Posted: February 07, 2023 at 02:52 PM (#6116047)
If you input AVG, OBP and SLG into team runs scored over a large # of team-seasons, the regression analysis will show that OBP and SLG are crucial, and AVG comes out as largely irrelevant.

Because OBP and SLG are mostly batting average. The model you propose obscures the causal structure of what's actually going on. The OBP, SLG, BA model produces an interpretation of BA's "relevance" that is "after we control for BA's role in OBP and we control for BA's role in SLG, then BA is irrelevant."

Runs = a0 + a1*OBP + a2*SLG + a3*BA + ...

with the estimate of a3 coming out essentially zero. But we can rewrite the above as:

Runs = a0 + a1*(BA + "BB%") + a2*(BA + ISO) + a3*BA + ....

where "BB%" is (OBP - BA) -- not our fault they've got different denominators -- then rearrange terms:

Runs = a0 + (a1 + a2 + a3)*BA + a1*BB% + a2*ISO + ...

And the relevance of BA is obvious and it doesn't matter that a3 is zero.
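
A small numeric check of that rearrangement, as a sketch only: the coefficients and the team line below are hypothetical values picked for illustration (they are not fitted estimates), and "BB%" is taken to be OBP - BA as defined above.

```python
import math


def runs_slash(a0, a1, a2, a3, obp, slg, ba):
    """Runs = a0 + a1*OBP + a2*SLG + a3*BA (the slash-line form)."""
    return a0 + a1 * obp + a2 * slg + a3 * ba


def runs_components(a0, a1, a2, a3, ba, bb, iso):
    """Runs = a0 + (a1 + a2 + a3)*BA + a1*BB% + a2*ISO (the rearranged form)."""
    return a0 + (a1 + a2 + a3) * ba + a1 * bb + a2 * iso


# Hypothetical coefficients, with a3 = 0 to mimic "BA comes out irrelevant".
a0, a1, a2, a3 = -0.05, 0.30, 0.20, 0.0

# Hypothetical team line.
ba, obp, slg = 0.260, 0.330, 0.420
bb = obp - ba    # "BB%" in the sense used above
iso = slg - ba

lhs = runs_slash(a0, a1, a2, a3, obp, slg, ba)
rhs = runs_components(a0, a1, a2, a3, ba, bb, iso)

# Same prediction either way; only the bookkeeping differs.
assert math.isclose(lhs, rhs)

# Total weight on BA in the rearranged form, even though a3 is zero.
ba_effect = a1 + a2 + a3
print(ba_effect)
```

Even with a3 set to zero, the total weight on BA is a1 + a2, which is the point being made.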

To the extent that this article adds to this, it's that (whether he realizes it or not) he's removed the unnecessary restriction on a1 and a2 being equal across the components. Lord only knows what scale he's on but he's showing that the BA coefficient is much lower than the sum of the BB% and ISO components. That means there's an additional flaw in the OBP + SLG + BA model because that model forces the restriction.

I do find it hard to believe that team ISO correlates over .8 with R/PA. If that's true, neither BA nor BB% matter very much.
   11. Walt Davis Posted: February 07, 2023 at 03:14 PM (#6116049)
Otherwise the primary significance of batting average is that it is part of both OBP and SLG and so, yes, it's relevant because of that.

This puts the causal ordering completely backwards. Surely no later than ch. 2 of any book on causal modeling, you learn that if you want to understand the causal effect of X on Y3, you do not include consequences of X (Y1, Y2) in the model.

In the most obvious terms I can think of ... OPS = OBP + SLG = (BA + "BB%") + (BA + ISO) = 2*BA + "BB%" + ISO
   12. Walt Davis Posted: February 07, 2023 at 04:02 PM (#6116053)
For the stattier nerds:

VAR(OPS) = 4*VAR(BA) + VAR(BB%) + VAR(ISO) + 4*COV(BA,BB%) + 4*COV(BA,ISO) + 2*COV(BB%,ISO)

Now, in theory, the VAR(BA) could be zero -- every team has the same BA, at which point variation in OPS is purely a function of variation in BB% and ISO and their covariation. More realistically, variation in BA may be low while variation in the other components (ISO particularly maybe) could be high. But since VAR(BA) gets multiplied by 4, that variation in ISO or BB% needs to be much greater to be the most important component.

The covariation terms capture to what extent different components tend to both be high, or one high and the other low. Oafball suggests ISO and BA are negatively correlated. The role of contact in BA and BB% suggests they are also negatively correlated. I'm not sure there's really any reason to think BB% and ISO are correlated but certainly many oafs feature high walk rates (but not the Javy Baez clique). But covariance terms are generally small so they probably aren't contributing that much to the variation in OPS.
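
The variance identity above can be checked numerically. This is a sketch with five made-up team lines (hypothetical numbers, not real team data), using OPS = 2*BA + "BB%" + ISO with "BB%" meaning OBP - BA.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def var(xs):
    """Sample variance, i.e. the covariance of a series with itself."""
    return cov(xs, xs)


# Hypothetical team lines, for illustration only.
ba = [0.240, 0.255, 0.262, 0.248, 0.270]
bb = [0.075, 0.068, 0.080, 0.090, 0.060]   # "BB%" = OBP - BA
iso = [0.170, 0.150, 0.140, 0.185, 0.130]

# OPS = OBP + SLG = 2*BA + "BB%" + ISO
ops = [2 * b + w + i for b, w, i in zip(ba, bb, iso)]

direct = var(ops)
decomposed = (4 * var(ba) + var(bb) + var(iso)
              + 4 * cov(ba, bb) + 4 * cov(ba, iso) + 2 * cov(bb, iso))

# The two agree up to floating-point error.
assert math.isclose(direct, decomposed, rel_tol=1e-9)
```

The identity holds for any data; which term dominates depends entirely on the actual team-level variances and covariances, which this sketch does not try to estimate.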

But of course we are interested in runs, not OPS. We could start with a simple model of ("xRuns" being "expected" or "predicted" runs to get rid of the error term):

xRuns = a0 + a1*OPS

If that model is correct we can expand to:

xRuns = a0 + a1*OBP + a1*SLG = a0 + 2a1*BA + a1*BB% + a1*ISO

That's a testable assumption although, as far as I know, only the first of those substitute models was tested. And that test failed, leading to the OBP + SLG model (coefficients are now b's to make clear they are not the same as above):

xRuns = b0 + b1*OBP + b2*SLG

which we know can be rewritten as:

xRuns = b0 + (b1 + b2)*BA + b1*BB% + b2*ISO

That also contains a testable assumption which is to say it can be compared with the following model:

xRuns = c0 + c1*BA + c2*BB% + c3*ISO

and test the assumption that c1 = c2 + c3.

The results reported in the paper are not detailed enough to allow a full and proper test of that assumption but it's pretty clear from what's reported the test of that assumption easily fails meaning that a model that separates BA, BB and ISO performs better than one that uses just OBP and SLG (and adding BA won't matter).

Based on these results, the reason it performs better is not so much because of the importance of BA -- as we know, the effects of BA on run-scoring are already pretty fully absorbed in the OBP + SLG model. It's mainly because the effects of BB% and ISO are sufficiently different on a per-unit (or standardized) basis from the effects of BA -- i.e. c1 does not equal c2 + c3. In the results reported here, c2 + c3 is much bigger than c1.
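
Plugging in the three per-point values quoted in the article excerpt (0.224 for BA, 0.156 for BB%, 0.2 for ISO) gives a quick look at that gap. This is only a sketch: it assumes all three values are on the same per-point scale, which is not confirmed, and it is a simple comparison, not the formal test described above.

```python
# Coefficients as quoted in the article excerpt (runs per 600 PA per point).
# Assumption: all three share the same per-point scale.
c1 = 0.224   # BA
c2 = 0.156   # BB%
c3 = 0.200   # ISO

# The OBP + SLG model forces the BA weight to equal b1 + b2,
# i.e. it imposes c1 = c2 + c3.
implied_c1 = c2 + c3

# How far the unrestricted BA coefficient falls short of that.
gap = implied_c1 - c1

print(f"c1 implied by the restriction: {implied_c1:.3f}")  # 0.356
print(f"c1 reported:                   {c1:.3f}")          # 0.224
print(f"gap:                           {gap:.3f}")         # 0.132
```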

From a practical GM's perspective ... that's where the covariances become important to understand. The BA/BB/ISO model still isn't a useful causal structure for investigating "what happens if ...?" It's useful for "what happens if I can increase BA (while holding the rest constant)?" But you can't directly manipulate team BA, certainly not in isolation. There's a causal model of some sort for BA, a causal model that is not fully independent of BB% and ISO.

The easiest way to increase BA is to increase contact. To the extent you turn Ks into contact, that's strictly a win -- you've turned an out into a potential non-out. But of course you can't just turn Ks into contact, a contact approach is also going to reduce walks and ISO. Walks because some PA that would have ended in a walk will now end in contact on an earlier pitch; ISO because presumably increasing contact means swinging at more hittable but not well-hittable pitches. So it's obviously a balancing act, are we gaining more from increased contact than we are losing in walks and HRs? But the point is that behind this is a set of dependent models along the lines of (BA, BB%, SLG) = f(contact + other stuff).

I suppose that's more of an individual batter's perspective. From the team standpoint, it's easy enough to take the xR = BA + BB% + ISO model and swap out Kyle Schwarber and swap in Luis Arraez and see what happens. So we can argue that the GM doesn't need to know the full causal structure, they just need the best predictive model available (with some tradeoff for ease of use but that can be left to the computer nerds). These results suggest xR = BA + BB% + ISO outperforms xR = OBP + SLG (which is not surprising since it can't actually perform worse). But it's possible there's a model out there with contact%, GB%, HH%, etc. (i.e. components of BA, BB%, ISO) that would do better while also pinpointing things for the coaches to work on improving.
   13. SoSH U at work Posted: February 07, 2023 at 04:16 PM (#6116056)
For the stattier nerds:

It gets stattier?
   14. Eric J can SABER all he wants to Posted: February 07, 2023 at 05:04 PM (#6116058)
I do find it hard to believe that team ISO correlates over .8 with R/PA. If that's true, neither BA nor BB% matter very much.

I would expect that ISO itself has a bit of causal relationship with BA, especially if you're looking at things on a team level which will limit how extreme the samples are. You can't get extra bases on hits if you aren't getting hits.
   15. Howie Menckel Posted: February 07, 2023 at 07:21 PM (#6116066)
ah, for some reason this takes me back a half-century.

even as kids, when your Little League manager immediately starts yelling - after you take the first pitch of the season for a ball - "C'MON, NOW, A WALK'S AS GOOD AS A HIT!", you learn that you may well land in the dumpster your manager reserves for lost causes if you foolishly swing. At anything.

   16. . Posted: February 08, 2023 at 08:36 AM (#6116086)
It's certainly normal that a younger generation would distinguish itself from previous generations in all manner of ways, but it's at least a tiny bit weird that one of the vehicles this generation uses to do that, and quite insistently, is that previous generations "analyzed baseball wrong."

I suppose you can squint and see the whole thing about baseball "analysis" being a subset of "criticism," and certainly different schools of art, film, literary criticism rise and fall and ebb and flow with the passage of time -- but that's a mighty big stretch.
   17. BDC Posted: February 08, 2023 at 12:39 PM (#6116102)
I would like to think that a lot of the scorn directed at batting average over the years has been of the kind "Juan Pierre hit .308 but with no home runs and hardly any walks," or "Darin Erstad's average went up 100 points but his BABIP went up 90," or "Luis Castillo hit .308 but the league hit .271." Was there anybody saying "Albert Pujols hit .330, that's meaningless?" :)
   18. ERROR---Jolly Old St. Nick Posted: February 08, 2023 at 03:48 PM (#6116113)
There are game situations where batting average is 95%+ of what matters (man on 3rd, 2 outs, bottom of 9th, tied or 1 run lead).

And while a walk with the bases empty is of equal value to a single, obviously a single that advances a runner from 1st to 3rd is more valuable than a walk. Is there a stat that measures that extra value?

   19. Never Give an Inge (Dave) Posted: February 09, 2023 at 12:58 AM (#6116183)
wOBA and the various stats that are derived from that are based on linear weights. There’s a coefficient on unintentional BB of 0.69 and on singles at 0.89, to give you a sense of the relative values.

I’m sure WAR does something similar.
   20. ERROR---Jolly Old St. Nick Posted: February 09, 2023 at 04:41 PM (#6116350)
Thanks, Dave, though what I was really hoping to find would be something simpler, even though it's clumsy to write out: Take the number of bases advanced on a single by runners on 1st and / or 2nd, add the number of bases advanced on a non-ground rule double by a runner on 1st, and divide it by the total number of singles and doubles. You might call this stat "Runners Advanced Average" (RAA), and IMO it would be a nice complement to traditional RBI, even though when a single scored a runner from 2nd there'd be some overlap.

About the only glitch you'd have to account for would be for a double that didn't score a runner from 2nd because the runner had to hold up to see if the ball was going to fall safely. It'd be a judgment call, but I'd be inclined to award the batter 2 Bases Advanced, since the runner not scoring wasn't the batter's fault.

One value of this sort of number would be that it would reward singles that get through the infield, as opposed to infield singles that only advance runner(s) a single base. It would also reward "gap" doubles, as opposed to doubles that are hit down the LF or RF line and cut off by the corner outfielder before reaching the wall. It's much harder for the latter sort of double to score a runner from 1st.

To take an extreme example, imagine if a batter comes up 50 times with a runner on first, and no other runners on base.

Batter A hits 40 singles that move the runner to 3rd, and 10 doubles that score the runner. So (40 x 2) + (10 x 3) = 110 Bases Advanced

Batter B hits 40 singles that only move the runner to 2nd, and 10 non-ground rule doubles that only move the runner to third. So (40 x 1) + (10 x 2) = 60 Bases Advanced.

And by further contrast,

Batter C walks 50 times, so 50 x 1 = 50 Bases Advanced.

You can see the differences in value, though obviously this is an extreme case.
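
The arithmetic for the three batters can be sketched in a few lines of Python. The function names are made up for illustration, and the rate follows the definition proposed above (bases advanced divided by singles plus doubles), so it is left undefined for the batter who only walks.

```python
def bases_advanced(events):
    """Total bases advanced by the runner.

    `events` is a list of (count, bases_runner_advanced) pairs.
    """
    return sum(count * bases for count, bases in events)


def runners_advanced_average(events, hits):
    """Bases advanced per single/double, per the proposed definition."""
    if hits == 0:
        return None  # no singles or doubles, so the rate is undefined
    return bases_advanced(events) / hits


# Runner on first only, 50 plate appearances each.
batter_a = [(40, 2), (10, 3)]  # singles move runner to 3rd, doubles score him
batter_b = [(40, 1), (10, 2)]  # singles move runner to 2nd, doubles to 3rd
batter_c = [(50, 1)]           # 50 walks, runner forced to 2nd each time

print(bases_advanced(batter_a))  # 110
print(bases_advanced(batter_b))  # 60
print(bases_advanced(batter_c))  # 50

print(runners_advanced_average(batter_a, hits=50))  # 2.2
print(runners_advanced_average(batter_b, hits=50))  # 1.2
print(runners_advanced_average(batter_c, hits=0))   # None
```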
   21. Jaack Posted: February 09, 2023 at 09:26 PM (#6116368)
For something simple, bbref has some stats for advancing baserunners buried on the Advanced stats page under Situational Batting. I've never looked too hard at what all they have there, but it's where the raw counts on that type of stuff exist. I'm sure you could use that data to create something RBI-esque, but I don't think that specific stat is published anywhere.

For a more comprehensive look, there's RE24. With RE24, a single that moves a runner to third > single that moves them to second = walk that moves a runner to second. But it's gonna look a lot more like wRAA than RBI.
   22. ERROR---Jolly Old St. Nick Posted: February 10, 2023 at 12:02 PM (#6116411)

I may be missing something, but I don't see "Situational Batting" under BB-Ref's Advanced Stats or "More Advanced Stats", let alone anything relating to what I'm looking for. And FanGraphs' RE24 is hopelessly complex for those of us not into sabermetric jargon.

That's not a knock on sabermetric jargon, but the "Runners Advanced Average" stat that I proposed would seem much easier for the average fan, both to calculate and to understand. There's no reason I can see why it couldn't be a line in a player's BB-Reference "Standard Batting" stats, since it breaks down singles and non-ground rule doubles in a way that better expresses their true value.
   23. Jaack Posted: February 10, 2023 at 01:13 PM (#6116416)
Here's a link to Situational Batting. They have counts for advancing and scoring runners on the far left. It's not exactly an improved RBI or anything like that, but that's where the sort of raw data you'd need to make it is, short of just pulling PBP data from retrosheet.
   24. ERROR---Jolly Old St. Nick Posted: February 10, 2023 at 03:04 PM (#6116435)
Thanks, Jaack, that's helpful. I was just looking at "Advanced Batting" on the player's main page, as opposed to using the "Finders and Advanced Stats" dropdown and clicking on "Advanced Stats". It doesn't show exactly what I was looking for, but it does have other useful rate stats I didn't previously know where to find.

