User Comments, Suggestions, or Complaints | Privacy Policy | Terms of Service | Advertising
Baseball Primer Newsblog: The Best News Links from the Baseball Newsstand
Tuesday, February 07, 2023
The Boomers Were Right: Batting Average is REALLY Important
RoyalsRetro (AG#1F)
Posted: February 07, 2023 at 12:24 AM | 24 comment(s)
Tags: batting average
About Baseball Think Factory | Write for Us | Copyright © 1996-2021 Baseball Think Factory
Reader Comments and Retorts
Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.
1. Walt Davis Posted: February 07, 2023 at 05:35 AM (#6116006)
It seems like I've heard this argument before. I can't put my finger on where.
It later qualifies that it means AVG is more important than walk % or ISO, but yeah, who doesn't agree with that?
If you input AVG, OBP and SLG into team runs scored over a large # of team-seasons, the regression analysis will show that OBP and SLG are crucial, and AVG comes out as largely irrelevant.
Order of importance goes something like
OPS > SLG > OBP > AVG > ISO > walk%
He also simplifies OBP to bb% + avg which isn't actually true.
Baseball Savant has a similar issue on their leaderboard, where you do really well on a bunch of metrics if you hit the ball hard.
Otherwise the primary significance of batting average is that it is part of both OBP and SLG and so, yes, it's relevant because of that.
Because OBP and SLG are mostly batting average. The model you propose obscures the causal structure of what's actually going on. The OBP, SLG, BA model produces an interpretation of BA's "relevance" that is "after we control for BA's role in OBP and we control for BA's role in SLG, then BA is irrelevant."
Runs = a0 + a1*OBP + a2*SLG + a3*BA + ...
with the estimate of a3 coming out essentially zero. But we can rewrite the above as:
Runs = a0 + a1*(BA + "BB%") + a2*(BA + ISO) + a3*BA + ....
where "BB%" is (OBP - BA) -- not our fault they were given different denominators -- then rearrange terms:
Runs = a0 + (a1 + a2 + a3)*BA + a1*BB% + a2*ISO + ...
And the relevance of BA is obvious and it doesn't matter that a3 is zero.
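For anyone who wants to check that rearrangement numerically, here is a minimal sketch. Everything in it is an assumption for illustration: the team rates are synthetic, the "true" run model and its coefficients are made up, and "BB%" is simply OBP - BA as defined above. It fits both parameterizations by ordinary least squares and compares the coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300  # synthetic team-seasons, NOT real data

# Hypothetical team rates; "BB%" is defined as OBP - BA, as in the comment above.
ba = rng.normal(0.260, 0.012, n)
bb = rng.normal(0.065, 0.008, n)
iso = rng.normal(0.150, 0.020, n)
obp = ba + bb
slg = ba + iso

# Made-up run model (coefficients are arbitrary assumptions) plus noise.
runs = -0.1 + 0.5 * ba + 0.6 * bb + 0.35 * iso + rng.normal(0, 0.003, n)

def ols(columns, y):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Runs = a0 + a1*OBP + a2*SLG + a3*BA
a0, a1, a2, a3 = ols([obp, slg, ba], runs)
# Runs = c0 + c1*BA + c2*BB% + c3*ISO
c0, c1, c2, c3 = ols([ba, bb, iso], runs)

print("a3 (BA after 'controlling' for OBP and SLG):", a3)
print("a1 + a2 + a3 (total BA coefficient):", a1 + a2 + a3)
print("c1 (BA coefficient in the component model):", c1)
```

Because the two sets of regressors are linear recombinations of each other, the fits are the same model written two ways, so c1 should match a1 + a2 + a3 up to floating-point error regardless of what a3 turns out to be.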
To the extent that this article adds to this, it's that (whether he realizes it or not) he's removed the unnecessary restriction on a1 and a2 being equal across the components. Lord only knows what scale he's on but he's showing that the BA coefficient is much lower than the sum of the BB% and ISO components. That means there's an additional flaw in the OBP + SLG + BA model because that model forces the restriction.
I do find it hard to believe that team ISO correlates over .8 with R/PA. If that's true, neither BA nor BB% matter very much.
This puts the causal ordering completely backwards. Surely no later than ch 2 of any book on causal modeling you learn that if you want to understand the causal effect of X on Y3, you do not include consequences of X (Y1, Y2) in the model.
In the most obvious terms I can think of ... OPS = OBP + SLG = (BA + "BB%") + (BA + ISO) = 2*BA + "BB%" + ISO
VAR(OPS) = 4*VAR(BA) + VAR(BB%) + VAR(ISO) + 4*COV(BA,BB%) + 4*COV(BA,ISO) + 2*COV(BB%,ISO)
Now, in theory, the VAR(BA) could be zero -- every team has the same BA, at which point variation in OPS is purely a function of variation of BB% and ISO and their covariation. More realistically, variation in BA may be low while variation in the other components (ISO particularly maybe) could be high. But since VAR(BA) gets multiplied by 4, that variation in ISO or BB% needs to be much greater to be the most important component.
The covariation terms capture to what extent different components tend to both be high, or one is high and the other low. Oafball suggests ISO and BA are negatively correlated. The role of contact in BA and BB% suggests they are also negatively correlated. I'm not sure there's really any reason to think BB% and ISO are correlated but certainly many oafs feature high walk rates (but not the Javy Baez clique). But covariance terms are generally small so they probably aren't contributing that much to the variation in OPS.
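The VAR(OPS) decomposition can be checked term by term. The sketch below is only an illustration under stated assumptions: the means and the covariance matrix for BA, "BB%" and ISO are invented (with small negative BA/BB% and BA/ISO covariances to echo the guesses above), not taken from any real team data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented means and covariance matrix for (BA, "BB%", ISO); NOT real data.
mean = [0.260, 0.065, 0.150]
cov = [[1.4e-4, -1.0e-5, -3.0e-5],
       [-1.0e-5, 6.0e-5, 1.0e-5],
       [-3.0e-5, 1.0e-5, 4.0e-4]]
ba, bb, iso = rng.multivariate_normal(mean, cov, size=500).T

# OPS = OBP + SLG = 2*BA + "BB%" + ISO
ops = 2 * ba + bb + iso

# Sample covariance matrix; rows are BA, BB%, ISO in that order.
c = np.cov(np.vstack([ba, bb, iso]))

parts = {
    "4*VAR(BA)": 4 * c[0, 0],
    "VAR(BB%)": c[1, 1],
    "VAR(ISO)": c[2, 2],
    "4*COV(BA,BB%)": 4 * c[0, 1],
    "4*COV(BA,ISO)": 4 * c[0, 2],
    "2*COV(BB%,ISO)": 2 * c[1, 2],
}
total = sum(parts.values())

# Share of OPS variance attributed to each term (shares can be negative).
for name, value in parts.items():
    print(f"{name:>15}: {value / total:+.3f}")
print("sum of parts:", total, " VAR(OPS):", np.var(ops, ddof=1))
```

With real team-season data in place of the simulated arrays, the printed shares would show directly whether the 4*VAR(BA) term or the ISO term dominates.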
But of course we are interested in runs, not OPS. We could start with a simple model of ("xRuns" being "expected" or "predicted" runs to get rid of the error term):
xRuns = a0 + a1*OPS
If that model is correct we can expand to:
xRuns = a0 + a1*OBP + a1*SLG = a0 + 2a1*BA + a1*BB% + a1*ISO
That's a testable assumption although, as far as I know, only the first of those substitute models was tested. And that test failed, leading to the OBP + SLG model (coefficients are now b's to make clear they are not the same as above):
xRuns = b0 + b1*OBP + b2*SLG
which we know can be rewritten as:
xRuns = b0 + (b1 + b2)*BA + b1*BB% + b2*ISO
That also contains a testable assumption which is to say it can be compared with the following model:
xRuns = c0 + c1*BA + c2*BB% + c3*ISO
and test the assumption that c1 = c2 + c3.
The results reported in the paper are not detailed enough to allow a full and proper test of that assumption, but it's pretty clear from what's reported that the test easily fails, meaning that a model that separates BA, BB and ISO performs better than one that uses just OBP and SLG (and adding BA won't matter).
Based on these results, the reason it performs better is not so much because of the importance of BA -- as we know, the effects of BA on run-scoring are already pretty fully absorbed in the OBP + SLG model. It's mainly because the effects of BB% and ISO are sufficiently different on a per-unit (or standardized) basis from the effects of BA -- i.e., c1 does not equal c2 + c3. In the results reported here, c2 + c3 is much bigger than c1.
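One standard way to test a restriction like c1 = c2 + c3 is a nested-model F statistic, comparing the residual sum of squares of the restricted OBP + SLG model against the unrestricted BA/BB%/ISO model. The sketch below is an assumption-laden illustration, not the article's method: the data are synthetic and generated so that the restriction is false (made-up coefficients 0.5 vs 0.6 + 0.35).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300  # synthetic team-seasons, NOT real data

# Hypothetical team rates; "BB%" is OBP - BA.
ba = rng.normal(0.260, 0.012, n)
bb = rng.normal(0.065, 0.008, n)
iso = rng.normal(0.150, 0.020, n)
obp = ba + bb
slg = ba + iso

# Made-up truth in which c1 (0.5) differs from c2 + c3 (0.95).
runs = -0.1 + 0.5 * ba + 0.6 * bb + 0.35 * iso + rng.normal(0, 0.003, n)

def rss(columns, y):
    """Residual sum of squares from a least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

# Restricted: xRuns = b0 + b1*OBP + b2*SLG (forces BA coefficient = b1 + b2)
rss_restricted = rss([obp, slg], runs)
# Unrestricted: xRuns = c0 + c1*BA + c2*BB% + c3*ISO
rss_unrestricted = rss([ba, bb, iso], runs)

q = 1        # number of restrictions being tested
df = n - 4   # residual degrees of freedom of the unrestricted model
f_stat = ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df)

print("RSS restricted:  ", rss_restricted)
print("RSS unrestricted:", rss_unrestricted)
print("F statistic:     ", f_stat)
```

The unrestricted model can never have a larger RSS than the restricted one, so the only question is whether the improvement is big relative to the noise; a large F relative to the F(1, n - 4) reference distribution says the restriction fails.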
From a practical GM's perspective ... that's where the covariances become important to understand. The BA/BB/ISO model still isn't a useful causal structure for investigating "what happens if ...?" It's useful for "what happens if I can increase BA (while holding the rest constant)?" But you can't directly manipulate team BA, certainly not in isolation. There's a causal model of some sort for BA, a causal model that is not fully independent of BB% and ISO.
The easiest way to increase BA is to increase contact. To the extent you turn Ks into contact, that's strictly a win -- you've turned an out into a potential non-out. But of course you can't just turn Ks into contact, a contact approach is also going to reduce walks and ISO. Walks because some PA that would have ended in a walk will now end in contact on an earlier pitch; ISO because presumably increasing contact means swinging at more hittable but not well-hittable pitches. So it's obviously a balancing act, are we gaining more from increased contact than we are losing in walks and HRs? But the point is that behind this is a set of dependent models along the lines of (BA, BB%, SLG) = f(contact + other stuff).
I suppose that's more of an individual batter's perspective. From the team standpoint, it's easy enough to take the xR = BA + BB% + ISO model and swap out Kyle Schwarber and swap in Luis Arraez and see what happens. So we can argue that the GM doesn't need to know the full causal structure, they just need the best predictive model available (with some tradeoff for ease of use but that can be left to the computer nerds). These results suggest xR = BA + BB% + ISO outperforms xR = OBP + SLG (which is not surprising since it can't actually perform worse). But it's possible there's a model out there with contact%, GB%, HH%, etc. (i.e. components of BA, BB%, ISO) that would do better while also pinpointing things for the coaches to work on improving.
It gets stattier?
I would expect that ISO itself has a bit of causal relationship with BA, especially if you're looking at things on a team level which will limit how extreme the samples are. You can't get extra bases on hits if you aren't getting hits.
even as kids, when your Little League manager immediately starts yelling - after you take the first pitch of the season for a ball - "C'MON, NOW, A WALK'S AS GOOD AS A HIT!" that you may well have landed in the dumpster your manager uses to identify you as a lost cause if you foolishly swing. At anything.
:)
I suppose you can squint and see the whole thing about baseball "analysis" being a subset of "criticism," and certainly different schools of art, film, literary criticism rise and fall and ebb and flow with the passage of time -- but that's a mighty big stretch.
And while a walk with the bases empty is of equal value to a single, obviously a single that advances a runner from 1st to 3rd is more valuable than a walk. Is there a stat that measures that extra value?
I’m sure WAR does something similar.
About the only glitch you'd have to account for would be for a double that didn't score a runner from 2nd because the runner had to hold up to see if the ball was going to fall safely. It'd be a judgment call, but I'd be inclined to award the batter 2 Bases Advanced, since the runner not scoring wasn't the batter's fault.
One value of this sort of number would be that it would reward singles that get through the infield, as opposed to infield singles that only advance runner(s) a single base. It would also reward "gap" doubles, as opposed to doubles that are hit down the LF or RF line and cut off by the corner outfielder before reaching the wall. It's much harder for the latter sort of double to score a runner from 1st.
To take an extreme example, imagine if a batter comes up 50 times with a runner on first, and no other runners on base.
Batter A hits 40 singles that move the runner to 3rd, and 10 doubles that score the runner. So (40 x 2) + (10 x 3) = 110 Bases Advanced
Batter B hits 40 singles that only move the runner to 2nd, and 10 non-ground rule doubles that only move the runner to third. So (40 x 1) + (10 x 2) = 60 Bases Advanced.
And by further contrast,
Batter C walks 50 times, so 50 x 1 = 50 Bases Advanced.
You can see the differences in value, though obviously this is an extreme case.
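The arithmetic for the three hypothetical batters can be written as a small tally. This is only a sketch of the counting described above: the function name and the (count, bases) input format are invented for illustration, and dividing by the 50 plate appearances to get a per-opportunity average is an assumption, since the thread does not spell out a denominator.

```python
def bases_advanced(events):
    """Sum runner bases advanced over (count, bases_per_event) pairs.

    Only the runner's advancement is counted, not the batter's own bases.
    """
    return sum(count * bases for count, bases in events)

OPPORTUNITIES = 50  # plate appearances with a runner on first only

# Batter A: 40 singles moving the runner 1st to 3rd, 10 doubles scoring him.
batter_a = bases_advanced([(40, 2), (10, 3)])
# Batter B: 40 singles moving the runner to 2nd, 10 doubles moving him to 3rd.
batter_b = bases_advanced([(40, 1), (10, 2)])
# Batter C: 50 walks, each forcing the runner to 2nd.
batter_c = bases_advanced([(50, 1)])

for name, total in [("A", batter_a), ("B", batter_b), ("C", batter_c)]:
    # Per-opportunity average is an assumed definition, not from the thread.
    print(f"Batter {name}: {total} bases advanced, "
          f"{total / OPPORTUNITIES:.2f} per opportunity")
```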
For a more comprehensive look, there's RE24. With RE24, a single that moves a runner to third > single that moves them to second = walk that moves a runner to second. But it's gonna look a lot more like wRAA than RBI.
I may be missing something, but I don't see "Situational Batting" under BB-Ref's Advanced Stats or "More Advanced Stats", let alone anything relating to what I'm looking for. And FanGraphs' RE24 is hopelessly complex for those of us not into sabermetric jargon.
That's not a knock on sabermetric jargon, but the "Runners Advanced Average" stat that I proposed would seem much easier for the average fan, both to calculate and to understand. There's no reason I can see why it couldn't be a line in a player's BB-Reference "Standard Batting" stats, since it breaks down singles and non-ground rule doubles in a way that better expresses their true value.