Baseball for the Thinking Fan

Primate Studies
— Where BTF's Members Investigate the Grand Old Game

Wednesday, March 09, 2005

The Snow Index Project, Part 1

The development of a new statistic.

 

Kyle Lobner

 

 

 

Project Snow Index – Step 1: What do we have?

 

Readers of Snowbaseball.com, along with my friends and colleagues inside and outside of baseball, are aware of the existence, if not the inner workings, of the Snow Index, the statistical formula I developed to measure hitter efficiency. I created it with the intent of replacing OPS (on-base plus slugging), Runs Created and others as the single most accurate statistic showing the full offensive value of a player, expressed as an average result per trip to the plate.

 

Several people, after hearing my ideas, have asked why I would try to replace OPS, and what it is about such a widely accepted system that makes it flawed as a measure of hitter efficiency. Here’s the argument, in its shortest possible form:

 

The small problem: OPS includes some things which I don’t feel a player should be rewarded for, most notably the ability to get hit by pitches. Without a doubt, some players have made a career of taking one for the team, and Don Baylor, Fernando Vina and others inevitably come to mind when having this conversation. However, if you had a young hitter, and you were trying to teach him to get on base more, you would not, under any circumstances, tell him to stay in the box and stare down a fastball. Simply put, getting hit by pitches is usually, and I say usually because of Vina and Baylor, not a tremendously notable skill or something you would want to reward a hitter for. In fact, in some places, including here at Baseball Think Factory, I’ve seen it listed as a cause of concern when young players like Rickie Weeks show too much of a tendency to “take one for the team.”

 

The large problem: OPS uses values for events which seem to be quite skewed from what they should be. If you expand the formula out, here’s what you have:

 

((H+BB+HBP)/(AB+BB+HBP))+((H+2B+2(3B)+3(HR))/AB)

 

To simplify the issue, if you had a hitter who did the following things every time he came up, here’s what his OPS would be:

 

HBP: 1.000 (.500)

BB: 1.000 (.500)

Single: 2.000 (1.000)

Double: 3.000 (1.500)

Triple: 4.000 (2.000)

Home Run: 5.000 (2.500)

 

I find it makes things easier to bring everything down to the point where a single is worth one base, so the numbers in parentheses are the same values cut in half. Also, note that stolen bases are not counted at all, so a hitter who walks, steals second, then steals third every time up has exactly the same OPS as a plodder who walks and gets doubled off every trip to the plate.
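The per-event values listed above can be reproduced with a short sketch. It uses the expanded OPS formula exactly as written in this article (sacrifice flies are ignored), and it assumes that slugging counts as zero when a hitter has no at bats, which is what the 1.000 figures for walks and hit-by-pitches imply; strictly speaking, slugging is undefined in that case. The function and variable names are illustrative only.

```python
def ops(ab, h, doubles, triples, hr, bb, hbp):
    """OPS from the article's expanded formula (sacrifice flies ignored)."""
    plate_appearances = ab + bb + hbp
    obp = (h + bb + hbp) / plate_appearances
    total_bases = h + doubles + 2 * triples + 3 * hr
    # Assumption: with zero at bats, slugging is treated as 0 here.
    # Strictly it is undefined (0/0); the article's table implies 0.
    slg = total_bases / ab if ab else 0.0
    return obp + slg


# A hitter who repeats one event every time up reduces to a single
# plate appearance of that kind.
EVENTS = {
    "HBP": dict(ab=0, h=0, doubles=0, triples=0, hr=0, bb=0, hbp=1),
    "BB": dict(ab=0, h=0, doubles=0, triples=0, hr=0, bb=1, hbp=0),
    "Single": dict(ab=1, h=1, doubles=0, triples=0, hr=0, bb=0, hbp=0),
    "Double": dict(ab=1, h=1, doubles=1, triples=0, hr=0, bb=0, hbp=0),
    "Triple": dict(ab=1, h=1, doubles=0, triples=1, hr=0, bb=0, hbp=0),
    "Home Run": dict(ab=1, h=1, doubles=0, triples=0, hr=1, bb=0, hbp=0),
}

ops_by_event = {name: ops(**line) for name, line in EVENTS.items()}
halved = {name: value / 2 for name, value in ops_by_event.items()}

for name in EVENTS:
    print(f"{name}: {ops_by_event[name]:.3f} ({halved[name]:.3f})")
```

Halving each value gives the parenthesized figures, with a single worth one base.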

 

In Moneyball, Paul DePodesta told Billy Beane that OBP was worth three times as much as slugging. I’d argue that it’s not that much. But I think the above numbers have some serious problems, mainly the improper valuing of walks and home runs.

 

Let’s take a look at the value of home runs first. A home run is the only single event in baseball that absolutely guarantees a run. It’s not reliant upon runners being on base when it happens, and it’s not reliant on the next hitter getting on base; it’s lineup independent. As such, it seems logical, in an intuitive sort of way, to assign home runs a value at least equivalent to 4 singles, and perhaps more.

 

Next, what is a walk truly worth? If you believe that OBP is the logical alternative to batting average, perhaps you would argue that a walk is worth the same thing as a single. A more purist argument would be that a walk doesn’t advance a runner as far as a single would, if at all, and so the actual value is closer to the .5 singles used in OPS. Here’s my argument:

 

With the bases empty, a walk and a single have the same value. It makes perfect sense when you think about it. If no one is on base, it doesn’t matter how you get to first, as long as you don’t get hurt doing it. In 2004, 57.4% of all major league plate appearances fell into this category.

 

With runners on base, a hitter has three tasks to fulfill (get on base, don’t produce an out, advance the runner). Therefore, by singling, they fulfill all three requirements and get 1 single. However, if they walk, they only fill two of the three tasks, as the runner probably will not advance as far as they could or would have on a single. Therefore, in these situations, a walk is worth two thirds of a single. The remaining 42.6% of 2004 plate appearances fell into this category.

 

1(.574)+2/3(.426) = .864

Given these conditions, and accepting these values for a walk in each situation, .864 is the value of a walk averaged across all situations. Here, then, is the first adjustment to my Snow Index formula:

 

((H+2B+2(3B)+3(HR))+.864(BB))/(AB+BB)

 

Or, simpler:

(TB+.864(BB))/(AB+BB)

 

It is also worth noting that while .864 was calculated based on 2004 statistics, it doesn’t change greatly when other seasons are substituted in. For example, the 1984 coefficient turns out to be .8676. Unless you’re Barry Bonds, the difference is minor.
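The walk coefficient is a weighted average, so it is easy to recompute for any season. The sketch below assumes only what the text states: a walk is worth 1 single with the bases empty and two thirds of a single with runners on. Evaluated exactly as written with the 57.4% / 42.6% split, the expression comes out to about .858 rather than .864; a coefficient of .864 would correspond to a bases-empty share of roughly 59.2%. The function names are illustrative, not part of any published Snow Index code.

```python
def walk_value(bases_empty_share, runners_on_value=2 / 3):
    """Average value of a walk, in singles, across all plate appearances."""
    runners_on_share = 1.0 - bases_empty_share
    return bases_empty_share * 1.0 + runners_on_share * runners_on_value


def implied_bases_empty_share(coefficient, runners_on_value=2 / 3):
    """Bases-empty share that would produce a given walk coefficient."""
    return (coefficient - runners_on_value) / (1.0 - runners_on_value)


value_2004 = walk_value(0.574)                    # share quoted for 2004
share_for_864 = implied_bases_empty_share(0.864)  # share implied by .864

print(f"walk value at 57.4% bases empty: {value_2004:.3f}")
print(f"bases-empty share implied by .864: {share_for_864:.3f}")
```

Because the coefficient moves by only one third of any change in the bases-empty share, season-to-season differences stay small, which fits the observation about 1984.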

 

But this formula still lacks one important factor: the stolen base. Barry Bonds stole 6 bases in 2004, and one certainly would not argue that those 6 stolen bases were his most highly valued attribute. However, there are players, both recent and historic, whose value was raised greatly by the ability to gain an extra base or two by stealing it. Just taking a raw stolen base figure will not do, though, because that fails to take into account how many times a runner cost his team the opportunity to score by getting caught attempting to steal.

 

Thankfully, many baseball experts, most recently John Sickels of Minorleagueball.com, provide an easy ideal stolen base percentage. If a runner can steal successfully two thirds of the time, the gains from stealing and the losses from getting caught balance out exactly. Anything above that is a gain, anything below is a loss. Mathematically, SB – 2(CS) gives you a number, either positive or negative, showing the actual impact of a base stealer, in terms of bases gained or lost.

 

So finally, here is the Snow Index formula, as I’ve been using it recently. It gives an average achievement, in terms of bases gained, per trip to the plate, with an adjusted figure for stolen bases and walks:

 

(TB+.864(BB)+SB-2(CS))/(AB+BB)

 

Over the course of their careers, this raises the values of Rickey Henderson, Tim Raines and Lou Brock by 56, 49, and 29 points, respectively. Interestingly enough, it lowers Pete Rose by 7 points.
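For readers who want to try the formula on their own numbers, here is a minimal sketch of the Snow Index as given above. The stat line in the example is hypothetical and is there only to show the arithmetic; the function names are illustrative.

```python
WALK_WEIGHT = 0.864  # walk coefficient used in the article


def net_steal_bases(sb, cs):
    """Bases gained or lost by stealing; zero at a two-thirds success rate."""
    return sb - 2 * cs


def snow_index(ab, h, doubles, triples, hr, bb, sb, cs):
    """(TB + .864*BB + SB - 2*CS) / (AB + BB)."""
    total_bases = h + doubles + 2 * triples + 3 * hr
    numerator = total_bases + WALK_WEIGHT * bb + net_steal_bases(sb, cs)
    return numerator / (ab + bb)


# Hypothetical season, not a real player's line.
example = snow_index(ab=500, h=150, doubles=30, triples=5, hr=20,
                     bb=60, sb=20, cs=5)
print(f"Snow Index: {example:.3f}")
```

A runner with 20 steals and 10 times caught has a two-thirds success rate, so `net_steal_bases(20, 10)` is zero and the stolen base term drops out.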

 

Going back to my point from above, here are the individual event values with the current Snow Index:

 

BB: .864

Single: 1.000

Double: 2.000

Triple: 3.000

Home Run: 4.000

Stolen Base (for gain): 1.000 (added to total)

 

Step 2: The Problem

 

I think these values make considerably more sense than those used within OPS, both intuitively and after some thought. However, when put into practice, the system proved no better at predicting offensive success than the existing systems. To put it flatly, the system I had worked so hard to create showed no more effectiveness than the system I was trying to replace. As data, I used the team Snow Indexes of all 30 major league teams by season, from 1999-present, and compared them to their actual runs scored, looking for a correlation. Team OPS showed a .912 correlation, out of 1. The Snow Index showed an .881 correlation, just slightly worse. So, simply, more work is needed. More factors need to be considered, perhaps some factors need to be devalued, perhaps others need to have their values raised, but regardless, something needs to be changed. To find potential options, I looked to other baseball statistics and statistical studies. Here are my findings.
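The comparison described above can be sketched as a Pearson correlation between a team-level metric and runs scored. The article does not say which correlation measure was used, so Pearson's r is an assumption here, and the team values below are made up purely to show the mechanics.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical team-season values, one entry per team-season.
team_metric = [0.510, 0.535, 0.560, 0.548, 0.522]
runs_scored = [690, 745, 820, 800, 730]

r = pearson(team_metric, runs_scored)
print(f"correlation with runs scored: {r:.3f}")
```

Running the same function once with team OPS and once with team Snow Index, over the same team-seasons, gives the two figures being compared.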

 

“DIPS”

 

I feel that I gained the most through Voros McCracken’s “DIPS” concept. McCracken’s basic concepts focus on pitchers, and operate on a basic assumption. “There is little if any difference among major-league pitchers in their ability to prevent hits on balls hit in the field of play,” McCracken said in this article. And the farther you look into the statistics, the more you realize he’s right. But then the question becomes, if it’s that simple for pitchers, why isn’t it that simple for hitters?

 

McCracken’s system uses the three things a pitcher can do that are completely outside the control of his defense: strikeouts, walks, and home runs. All other balls put in play had their fortune decided by luck, or the strength of the defense, or a combination of the two. So while luck plays some part, and in many cases a rather large part, in the end result a pitcher faces, the only statistics providing an actual measure of a pitcher’s ability are the numbers a pitcher generates independent of any other forces.

 

So why is it any different for hitters? The short answer just may be that it’s not. Walks and strikeouts are exactly the same, just reversed. And furthermore, to a point you can identify a hitter’s ability to get on base, both by walks and by hits, by these numbers. Especially recently, heavy emphasis has been placed on plate discipline, taking pitches, wearing out opposing pitchers, waiting for the right opportunity to swing. Vladimir Guerrero may be the only hitter among baseball’s present elite who doesn’t have notable plate discipline. Almost to a man, more plate discipline equals more effective hitting.

 

And what of power hitting, the ability to connect for extra bases? It’s possible that’s covered here, too. Consider, for a second, this proposal. All doubles and triples fall into one of two categories:

 

1) Ground balls down the line/fly balls in the gap. Both of these are functions of luck. An extra base may be gained by speed, but if a ground ball is hit down the first base line and would be a triple, it is at best only a few feet away from being a routine out on the left side, or a foul ball on the right. The positioning of that ball is largely good fortune.

2) Balls hit with power, the same kind of power which produces home runs. A long fly ball off the wall is generally only a few feet, tiny fractions of an inch on the bat away from being a home run, and is generally hit by someone who has the ability to hit that ball over the fence, too.

 

Here’s the statistical portion of that argument. 44 of the top 100 single-season doubles totals have been achieved since 1986. Right now, a total of 48 doubles in a season would earn you a tie for 100th place on the all-time list. Of the 44 hitters who have made that list in my lifetime, here are two lists:

 

First, the high end:

 

Player               Year   2B   HR   Ratio
Wade Boggs           1989   51    3   17
Mark Grudzielanek    1997   54    4   13.5
Brian Roberts        2004   50    4   12.5
Jeff Cirillo         2000   53   11   4.81
Ron Belliard         2004   48   12   4
Craig Biggio         1999   56   16   3.5
Dmitri Young         1998   48   14   3.43
Lyle Overbay         2004   53   16   3.31
Mark Grace           1995   51   16   3.19

 

Now, the low end:

 

Player               Year   2B   HR   Ratio
Albert Pujols        2003   51   43   1.19
Juan Gonzalez        1998   50   45   1.11
Albert Pujols        2004   51   46   1.11
Todd Helton          2001   54   49   1.10
Albert Belle         1995   52   50   1.04
Albert Belle         1998   48   49   0.98

 

Note that no one appears in the first table more than once; in fact, among players on that list, only Craig Biggio ever hit 48 doubles again. The second list is populated entirely by known power hitters. The remaining 30 of the 44 hitters all ended up with ratios relatively close to the league’s 2B/HR ratio of 1.64. So even baseball’s greatest doubles hitters, or at least most of them, do not vary from the norm so much that their ability to hit for extra bases could not feasibly be derived from their home run totals.
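The Ratio column in both tables is simply doubles divided by home runs. As a quick check, this sketch recomputes it for a few of the seasons listed and compares each with the 1.64 league ratio quoted above. The seasons and totals come from the tables; the function and variable names are illustrative.

```python
LEAGUE_RATIO = 1.64  # league 2B/HR ratio quoted in the article

# (player, year, doubles, home runs), taken from the two tables
SEASONS = [
    ("Wade Boggs", 1989, 51, 3),
    ("Craig Biggio", 1999, 56, 16),
    ("Albert Pujols", 2003, 51, 43),
    ("Albert Belle", 1998, 48, 49),
]


def doubles_per_homer(doubles, hr):
    """Doubles hit for every home run in a season."""
    return doubles / hr


ratios = {
    (name, year): round(doubles_per_homer(doubles, hr), 2)
    for name, year, doubles, hr in SEASONS
}

for (name, year), ratio in ratios.items():
    side = "above" if ratio > LEAGUE_RATIO else "below"
    print(f"{name} {year}: {ratio} ({side} the league ratio)")
```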

 

It can be taken one step farther, however. 103 hitters in big league history have reached base 308 times or more in a season. If you measure the percentage of times on base gained via extra base hits, you get this high end/low end differential:

 

 

High end:

Player            Year   Times on base   2B   3B   HR   Percentage
Lou Gehrig        1927   330             52   18   47   35.5%
Chuck Klein       1930   308             59    8   40   34.7%
Babe Ruth         1921   353             44   16   59   33.7%
Sammy Sosa        2001   311             34    5   64   33.1%
Stan Musial       1948   312             46   18   39   33.0%
Rogers Hornsby    1922   316             46   14   42   32.2%
Luis Gonzalez     2001   312             36    7   57   32.1%
Todd Helton       2000   323             59    2   42   31.9%
Barry Bonds       2001   342             32    2   73   31.3%
Hack Wilson       1930   314             35    6   56   30.9%
Lou Gehrig        1930   324             42   17   41   30.9%
Babe Ruth         1920   325             36    9   54   30.5%
Jimmie Foxx       1932   329             33    9   58   30.4%
Babe Herman       1930   311             48   11   35   30.2%

 

Low end:

Player            Year   Times on base   2B   3B   HR   Percentage
Ty Cobb           1915   336             31   13    3   14.0%
Jesse Burkett     1895   307             22   13    5   13.0%
Lu Blue           1931   309             23   15    1   12.6%
Billy Hamilton    1894   355             25   15    4   12.4%
Richie Ashburn    1958   316             24   13    2   12.3%
Eddie Yost        1950   318             26    2   11   12.3%
Eddie Stanky      1950   314             25    5    8   12.1%
Ichiro Suzuki     2004   315             24    5    8   11.7%
Tony Phillips     1993   313             27    0    7   10.9%
John McGraw       1898   307              8   10    0   5.86%
Roy Thomas        1899   310             12    4    0   5.16%

 

Around 80 of the others fall somewhere between 16% and 30%, which seems like a wide range of extra-base-hit percentages, but of those, around 70 fall above 20% but still below 30%. Also, note that changes in era dominate both the low and high ends. At the top, only Stan Musial breaks the trend; all other hitters are from either the Ruth era or the Bonds era. It’s also worth noting that none of the players on the low end ever reached base 300 times in a season again.
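The Percentage column is the share of a hitter's times on base that came on extra-base hits, that is, (2B + 3B + HR) divided by times on base. A small sketch, using two rows from the tables above (the function name is illustrative):

```python
def extra_base_share(times_on_base, doubles, triples, hr):
    """Fraction of times on base that were extra-base hits."""
    return (doubles + triples + hr) / times_on_base


gehrig_1927 = extra_base_share(330, 52, 18, 47)  # top of the high end
thomas_1899 = extra_base_share(310, 12, 4, 0)    # bottom of the low end

print(f"Lou Gehrig 1927: {gehrig_1927:.1%}")
print(f"Roy Thomas 1899: {thomas_1899:.2%}")
```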

 

Conclusion: A hitter’s ability to hit doubles is largely predictable by his ability to hit home runs. Furthermore, hitters who get on base frequently tend to hit a predictable number of extra base hits, largely based on their power.

 

Run Expectations

 

Certainly, the data is available, whether it comes from Pete Palmer or from the few who have recreated his study since, to determine what a walk is worth in any of the myriad situations that occur in a game. The problem, however, lies largely in converting what I would find in that data into something I could actually use. Take, for example, a situation with a runner on first and no one out. If a hitter walks, the run expectation jumps a fair amount as the situation changes. However, if the hitter singles and advances the runner to third, the run expectation jumps further, as the runner moves closer to home. In that one situation the result is quantifiable, but I am not in a position to go situation by situation and combine that data into an average situation. In fact, in the current absence of the ability to do that, I am happier to keep my current, simpler method, where a walk with a runner on base is worth two thirds of a single and a walk with no one on is worth the same as a single.

 

Conclusion: Run expectations could provide some insight towards the value of a single versus the value of a walk, but one would have to do so on a situation-by-situation basis which would be difficult to translate back to an average situation, which would need to be done for my purposes.

 

Translations

 

In Baseball’s All Time Best Hitters, Michael J. Schell provides an interesting problem, but shows some incredibly flawed logic in solving it. However, before I go any farther, I should point out that Schell’s tools are designed to be used to compare hitters across eras, a task which I feel I am not yet ready to undertake, as I have yet to prove the Snow Index is good enough to judge hitters from one era against each other, much less different eras.

 

With that being said, here are my critiques of the way Schell goes about his project. First of all, he uses batting average to rate players, effectively changing the question involved in his book from “who is the best hitter of all time?” to “who is the best contact hitter of all time?” I’m considerably less interested in the second question. Judging hitters purely by batting average with no regard for power is like comparing pitchers by their fastball speed with no regard for their control. A hitter’s power and ability to make contact go hand-in-hand when determining a hitter’s value. One simply cannot judge a hitter based purely on one of the two factors.

 

Beyond that, however, I would dispute one of the claims Schell makes about why eras in baseball are different. He makes a case for a standard deviation score for the average talent level of the league, claiming that it changes over time based on outside factors. However, consider this brief timeline:

 

 

Year (effect on talent level): Event

1901 (watering down of talent): American League forms, number of active baseball teams doubles.

1920 (rise in talent): Babe Ruth causes excitement in baseball to rise, new power hitting fad draws in new fans, causing salaries to rise and more players to play.

1947 (rise in talent): Jackie Robinson debuts, black players begin to filter their way into baseball.

1960’s, 1970’s, twice in 1990’s (watering down of talent): Expansion franchises in several cities, including Toronto and Houston.

1970’s to Late 1990’s (rise in talent): Increase in foreign players coming to America to play baseball from Far East and Latin America.

 

 

My point is this: The talent pool is always being lowered by one factor and raised by another, and while it may be true that the talent level varies slightly with a change in the climate of baseball, it is my opinion that it always remains near the same level, and therefore an adjustment for talent level within eras is unnecessary. Schell also makes an argument for park factors, which I agree with to a point, on a season-to-season basis, but I deem them unnecessary for determining the lifetime value of most current players, who will very rarely, if ever, play their entire career in one park.

The biggest points I accept for comparing hitters across eras are late career decline adjustments and mean adjusted Snow Indexes. Schell makes the point that when you compare a great player’s career percentages to those of a somewhat less talented player, the great player’s numbers will be weighed down because their career will in all likelihood be longer, and their additional at bats will drag down their percentages. Schell suggests using only the first 8,000 AB to determine their peak value, and I accept that, with a few corrections:

 

1) The number has to be adjusted to plate appearances. On average, major league hitters walk about once in every six plate appearances, meaning 8000 AB would adjust to roughly 9333 plate appearances. That way, a hitter who walks frequently will not take longer to reach the milestone; all hitters will reach it at an equal pace.

2) The set starting point needs to be eliminated. If a hitter does better in the 500 plate appearances after he reaches 9333 than he did in his first 500 PA, then the better numbers should be used, to get a better feel for how good the hitter actually was at his peak. For example, Barry Bonds already has over 11,000 PA in his career, but his most recent 4 seasons, in which he picked up over 2300 PA, were considerably more successful than his first 4 big league seasons. If one stopped rating Bonds at 9333 PA, one would not get an accurate assessment of Bonds’ prowess as a hitter.
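The at-bat to plate-appearance conversion in the first correction can be sketched as follows, assuming that plate appearances are just at bats plus walks, as in the Snow Index denominator. Under that assumption, 9333 PA corresponds to one walk per six at bats (one per seven plate appearances); a rate of one walk per six plate appearances would give 9600 instead. The function names are illustrative.

```python
def pa_threshold(ab_threshold, walks_per_pa):
    """Plate appearances needed to accumulate a given number of at bats.

    Assumes PA = AB + BB, so AB = PA * (1 - walks_per_pa).
    """
    return ab_threshold / (1.0 - walks_per_pa)


def implied_walks_per_pa(ab_threshold, pa_total):
    """Walk rate per plate appearance implied by an AB/PA pair."""
    return 1.0 - ab_threshold / pa_total


one_per_seven = pa_threshold(8000, 1 / 7)
one_per_six = pa_threshold(8000, 1 / 6)
implied = implied_walks_per_pa(8000, 9333)

print(f"one walk per 7 PA: {one_per_seven:.0f} PA")
print(f"one walk per 6 PA: {one_per_six:.0f} PA")
print(f"walk rate implied by 9333 PA: {implied:.3f}")
```

Whatever league walk rate is plugged in, the point of the correction is the same: the cutoff is expressed in plate appearances so that patient and impatient hitters reach it at the same pace.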

 

Furthermore, while changes in era do not necessarily get reflected in the talent pool, they do frequently show in rule changes and strategy changes, which will affect the average Snow Indexes of players in that era. Therefore, I accept Schell’s proposal of mean adjustment, which is essentially a system of comparing hitters to others of their era, before comparing them across eras. For example, Frank Chance hit .293 in 1907, which seems low until one realizes only 6 players in the NL were better. In comparing players across eras, Chance would be given a handicap of sorts to make up for the fact that hits simply weren’t falling as frequently when he played.

 

Conclusion: Translating data across eras is possible, with some adjustments being necessary and others being overvalued, but the entire subject is moot until I produce a Snow Index which can accurately rate players against others within their own era.

 

Step 3: The Next Move

 

Obviously, a lot of work needs to be done to continue on in my search for the ultimate baseball statistic. Here are my short term goals.

 

Determine what factors, if any, need to be added to the Snow Index. Take, for example, the case of home runs. As the only play in baseball which produces a run independent of all other factors, home runs are critically important to the run scoring chances of most teams. It is possible that, even at 4 bases instead of OPS’ 2.5, home runs are still undervalued. And while intuitively that argument makes sense, it is also possible that a recent trend of increased power hitting has resulted in an era where the correct number is somewhere in between. Similarly, strikeouts need to be considered, as the absence of all possibility of a base gained. It is possible an extra penalty, if you will, may be added to the Snow Index to show how often a hitter completely removes the opportunity to succeed by failing to make contact.

 

Determine how much variation in run scoring is caused purely by luck. By taking the number of runs a team scores by all methods other than home runs, and dividing that by the number of runners a team gets on base (minus their home runs), you can get an average number of runners on base per run scored. From that, one could determine how often a team is getting fortunate in their scoring of runs, and how much that may be affecting their total production.

 

Adjust any potential inaccuracies in the values of existing variables. For example, a walk may produce more or less than .864 singles value in terms of actual runs produced. Run expectations, if I am able to find a way to work them into my system, may give me a better insight on the matter.

 

These are the steps that lie ahead of me. I look forward to confronting them as I carry on.

 

 

 

Kyle Lobner Posted: March 09, 2005 at 12:00 AM | 47 comment(s)

Reader Comments and Retorts


Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   1. Dr. Vaux Posted: March 09, 2005 at 01:35 AM (#1188953)

Bill James already said it, but I agree…

   2. Guy LeDouche Posted: March 09, 2005 at 02:15 AM (#1189015)

Bill James has become a fraud who poops his pants.

Now that James is old and doesn’t have his youthful
inquisitiveness anymore, he denigrates those who do.
Let this kid try and make his bones.


Next week, Guy will introduce his “Guy LeDouche Factor”, which for the first time ever will rate the ######### factor for each player.

Coincidentally, Barry Bonds will be at the top of that list too.

   3. Dan Szymborski Posted: March 09, 2005 at 02:19 AM (#1189022)

Yeah, Bill James shouldn’t talk smack considering how he went about constructing Win Shares.

   4. Robert in Manhattan Beach Posted: March 09, 2005 at 02:23 AM (#1189028)

Therefore, in these situations, a walk is worth two thirds of a single.

For the record, I made it this far before I decided this whole thing was ridiculous and stopped reading.  This statement is made as fact when, of course, it is anything but.

   5. Athletic Supporter can feel the slow rot Posted: March 09, 2005 at 02:27 AM (#1189038)

To simplify the issue, if you had a hitter who did the following things every time he came up, here’s what his OPS would be:

HBP: 1.000 (.500)

BB: 1.000 (.500)

Argh, why does everyone say this? If a hitter walked every time he came up, his OBP would be 1.000, but his slugging would be _undefined_, not zero.

It’s probably most correct to assume his slugging would be league-average, which would give him an OPS of around 1.420 or so.

Of course, this highlights a problem with OPS, which is the fact that the denominators of OBP and SLG are different. A SLG difference of .150 is worth more for low-walk players than it is for high-walk players.

   6. Athletic Supporter can feel the slow rot Posted: March 09, 2005 at 02:35 AM (#1189057)

It’s probably most correct to assume his slugging would be league-average, which would give him an OPS of around 1.420 or so.

Maybe a good way to determine the OPS valuation of events is to determine the OPS differential of each marginal event from an average hitter. Your average hitter might go something like:

27-for-100, with 9 walks and 16 extra bases, for a line of .270/.330/.430 (approximately the AL averages.) This is an OPS of .760.

If you turn an out into a walk, this becomes 27-for-99 with 10 walks and 16 extra bases, a line of:

.273/.339/.434. This is an OPS of .773.

If you turn an out into a single, it becomes:
.280/.339/.440, for .779. So “according to OPS,” a walk is worth about two-thirds of a single.

If you turn an out into a double, you get .280/.339/.450, for .789; a triple yields .799 and a HR .809. So the relative value of events according to OPS is about:

Walk: 13
Single: 19
Double: 29
Triple: 39
Home run: 49

   7. fracas' hope springs eternal Posted: March 09, 2005 at 03:02 AM (#1189121)

Therefore, in these situations, a walk is worth two thirds of a single.

For the record, I made it this far before I decided this whole thing was ridiculous and stopped reading. This statement is made as fact when, of course, it is anything but.

I’m with you, Robert.  It was at that same exact point that I jumped to the comments section to see whether anyone was buying this pantload and I’d need to begin debunking.  For instance, the quoted “fact” first assumes that avoiding an out, reaching base, and advancing a runner are of equal value.  Then, having made this unsupported (and I would argue patently false) assumption, describes the worst case scenario for a runners-on-base walk as if it were the only such situation.

Then I stopped reading.

   8. Kyle Lobner Posted: March 09, 2005 at 03:17 AM (#1189138)

This statement is made as fact when, of course, it is anything but.

For instance, the quoted “fact” first assumes that avoiding an out, reaching base, and advancing a runner are of equal value.

Ok, first and foremost, I’m not claiming .864 to be the ultimate answer, I’m claiming it to be better than the options provided by OPS (.5) and OBP (1). Nothing I’ve said here is a definitive solution, it’s a work in progress and something I’ll adjust as time goes by. And for all the times I’ve been quoted here as listing it as a fact, I didn’t use the word “fact” one time to describe it.

Then, having made this unsupported (and I would argue patently false) assumption, describes the worst case scenario for a runners-on-base walk as if it were the only such situation.

As for the allegation of my use of a “worst case scenario for a runners-on-base walk” as the only possible situation, I’m going to stand by that. Most singles with a runner on first advance a runner from first to third. Most singles with a runner on second get the run home. And unless the bases are loaded, a walk doesn’t drive a runner home from third. If the bases were loaded, a single would still probably drive a run home from second as well. The situations with runners on where a walk and a single produce the same result are far in the minority.

   9. Dan Szymborski Posted: March 09, 2005 at 03:39 AM (#1189167)

Guys,

I think you’re missing one of the points of the series.  Snow isn’t displaying this as a completed statistic, this is the *development* of a statistic, from wild-eyed theory to refined result.

   10. Dag Nabbit is part of the zombie horde Posted: March 09, 2005 at 04:36 AM (#1189231)

Simply put, getting hit by pitches is usually, and I say usually because of Vina and Baylor, not a tremendously notable skill or something you would want to reward a hitter for.

Under the rules of baseball, they do get rewarded for it.

In fact, in some places, including here at Baseball Think Factory, I’ve seen it listed as a cause of concern when young players like Rickie Weeks show too much of a tendency to “take one for the team.”

If you’re trying to judge a player’s current value, I’m not sure how that’s relevant.  Getting on base a few extra times does help his value, even if it could conceivably hurt his future value.  Also, there’s no difference here that I can tell between getting hit and showing “too much of a tendency” to get hit.  If a guy’s hit 3-5 times a year, is that “too much?”  It adds to his value.  And with the body armor it doesn’t even hurt some guys’ long-term impact.

I read in Nine Innings by Jules Tygiel that Henry Chadwick didn’t include walks as part of batting average because he thought it was, in a way, immoral for a player to get on by those means; a hitter was to hit.  Leave the moralizing of HBPs out of it, and just judge it on what it does.

McCracken’s system uses the three things a pitcher can do that are completely outside the control of his defense: strikeouts, walks, and home runs. All other balls put in play had their fortune decided by luck, or the strength of the defense, or a combination of the two. So while luck plays some part, and in many cases a rather large part, in the end result a pitcher faces, the only statistics providing an actual measure of a pitcher’s ability are the numbers a pitcher generates independent of any other forces.

So why is it any different for hitters? The short answer just may be that it’s not.

Is this just a statement or was there any evidence to back it up?

   11. Kyle Lobner Posted: March 09, 2005 at 04:47 AM (#1189247)

When I said getting hit by pitches isn’t necessarily something I would want to reward a hitter for, I didn’t mean the hitter shouldn’t get first base. I agree with everything you said about OBP rising, body armor, and current value. But all told, if you were rating hitters in terms of who has the most to offer a team, I’d argue against including HBP in that consideration, because all told, it’s risky. Furthermore, I don’t think getting hit by pitches is a skill, or at least a skill we should be measuring for or rating hitters with.

As for my statement about McCracken, no, I don’t have statistical proof for or against, I’m postulating on that. Part 2 of the project will have my findings on strikeouts, home runs, walks, and anything else I find that may catch my eye, but those three specifically are influenced by my understanding of McCracken.

   12. Dag Nabbit is part of the zombie horde Posted: March 09, 2005 at 04:54 AM (#1189251)

Furthermore, I don’t think getting hit by pitches is a skill

Fair enough, but I don’t think it necessarily matters.  Bloop singles ain’t always a skill either, but they help just the same.

   13. Nick S Posted: March 09, 2005 at 05:10 AM (#1189269)

Snow,

You should take a look at the BaseRuns model developed by David Smyth (back in the days of Fanhome) and described in some detail by Tango (tangotiger.net) 

The question of what is a skill is a tough one, but a great starting point is to see if players’ performances in a particular category tend to correlate from year to year.  For instance, if HBP is not a skill for batters, you will find that a batter’s HBP one year is not useful in predicting his HBP the next year.  As for DIPS, batters appear to have much more “skill” in $H than do pitchers, so DIPS-type-stuff is not as applicable to them.

Also, don’t name the stat after yourself . . .

   14. Aloysius B. Winshares the 4th Posted: March 09, 2005 at 05:48 AM (#1189325)

I disagree and so does my cousin, O.B.P. Vorpington.

   15. mr. man Posted: March 09, 2005 at 06:52 AM (#1189384)

if you’re looking for statistics with predictive capabilities, start with walks/PA and isolated power.  so much of OPS comes out of batting average, such that:

OPS= (AVG x 2) + ISO + ISW

where we’ll call ISW ‘Isolated Walking’, ie OBP-AVG

Anyway, rather than using a player’s batting average from one season, or one month, etc, try to replace it with a career average or 3-year average, or better yet some age-adjusted value.

You’ll find that by doing that you can cancel out some of the variability of AVG to get a better correlation on the future OPS.

So:

Predicted OPS= (CareerAVG x 2) + ISO + ISW

You’ll find that ISO and ISW vary very little over short periods of time, surprisingly.
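A rough sketch of why the identity holds and how the substitution works, taking OPS = (AVG x 2) + ISO + ISW as defined above. The stat line and the .275 career average are hypothetical, and OBP is simplified to (H + BB) / (AB + BB), which ignores HBP and sacrifice flies.

```python
def rate_stats(ab, h, doubles, triples, hr, bb):
    """Return (AVG, OBP, SLG) from counting stats.

    Assumption: OBP is simplified to (H + BB) / (AB + BB), ignoring
    HBP and sacrifice flies.
    """
    # singles + 2*2B + 3*3B + 4*HR, rewritten in terms of total hits
    total_bases = h + doubles + 2 * triples + 3 * hr
    avg = h / ab
    obp = (h + bb) / (ab + bb)
    slg = total_bases / ab
    return avg, obp, slg


def predicted_ops(baseline_avg, iso, isw):
    """OPS rebuilt from a chosen batting-average baseline plus ISO and ISW."""
    return 2 * baseline_avg + iso + isw


# Hypothetical stat line, not a real player.
avg, obp, slg = rate_stats(ab=550, h=160, doubles=30, triples=3, hr=25, bb=60)
iso = slg - avg  # isolated power
isw = obp - avg  # "isolated walking" as defined in the comment

ops = obp + slg
rebuilt = predicted_ops(avg, iso, isw)     # same-season AVG, so this equals OPS
smoothed = predicted_ops(0.275, iso, isw)  # 0.275 is a made-up career AVG

print(f"OPS {ops:.3f}, rebuilt {rebuilt:.3f}, with career AVG {smoothed:.3f}")
```

Because AVG enters twice, every point of difference between the season average and the baseline moves the predicted OPS by two points.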

   16. fracas' hope springs eternal Posted: March 09, 2005 at 07:37 AM (#1189416)

Therefore, in these situations, a walk is worth two thirds of a single.

That just sounded like a statement of fact to me, but I probably overreacted.  As for you using a “worst-case scenario,” with respect to valuing men-on-base walks, you gave zero credit for advancing baserunners, even though the most common baserunner situations include a man on first.  That baserunner will always advance on a walk (and frequently advance just as far as he would on a single).  I agree that if you set the value of a single to 1.000, a walk is worth less.  But it’s equally certain that it’s worth more than .864; I wouldn’t be surprised if it was above .932, that is, closer to 1.000 than it is to .864.

And forgive me for not reading the whole article, but I just took another glance and saw this:

On average, major league hitters walk about once in every six plate appearances….

I’m sorry, but good major league hitters walk about once in ten PAs.  Last year, National Leaguers averaged just under one walk per eleven PAs.  Where are you getting this stuff?  Are you claiming hitters with 8000 AB careers walk about 1/6th of the time?  I doubt it, but that would be plausible, more or less.

   17. fracas' hope springs eternal Posted: March 09, 2005 at 07:42 AM (#1189421)

I’m sorry, the last thing I want to do is discourage research of any kind.  I’m just in an argumentative mood at the moment. (“Yeah, what else is new?”) 

Carry on.

   18. Kyle Lobner Posted: March 09, 2005 at 07:47 AM (#1189427)

Off the top of my head, at 1 am, I don’t remember where the 1 in 6 data came from, but it’s possible it came from the top hitters I was using as a base for the system. If it’s wrong I assure you I’ll correct it before I get to that phase of the process…certainly, translating data across eras is off my radar screen right now, because I don’t even have my tools ready to compare hitters within eras.

And as for your argumentative nature, I encourage it, as long as you can be reasonable and allow me to defend my claims. Thus far, everyone here has done a pretty good job of that.

   19. fracas' hope springs eternal Posted: March 09, 2005 at 08:40 AM (#1189465)

Was Schell’s conclusion that Tony Gwynn was the best hitter of all time?  If so, I think I read some of his book in a bookstore once.  Now, I’m a candidate for the planet’s biggest Tony Gwynn fan, but Schell is, ahh…what’s the term?  A nutbag.

   20. Athletic Supporter can feel the slow rot Posted: March 09, 2005 at 09:05 AM (#1189487)

Thanks for taking the time to answer our questions.

Maybe I’m missing something, but how is this different from / better than Linear Weights? Is the speculation at the end related or unrelated to this Snow Index business?

   21. Chris Dial Posted: March 09, 2005 at 03:28 PM (#1189708)

Isn’t a walk worth about two-thirds to three-quarters of a single?  Why was I not upset about that statement?

mr. man,
I have long used your equation for predicting future performance.  I tried to get Szym to use that model for his projection system.

And thanks for ISW.  I always refer to it as OBP-BA, but I really like your abbreviation.

Also, for Vina and Baylor and Kendall and Hunt and Biggio, getting HBP is a skill.  Most players get HBP something like 1-5 times a season, so you have a league baseline.  There are definite outliers who have accepted this as a way to reach base.  It should be included.

The trick to it being a skill is that *every* player has this skill - some use it more than others.  You can watch a player all season and see where he could have “taken one for the team” about 15-20 times over the season.

   22. Kyle Lobner Posted: March 09, 2005 at 05:48 PM (#1189958)

Was Schell’s conclusion that Tony Gwynn was the best hitter of all time?

Yes, it was. It’s not in this work, but shortly after reading that, I wrote something like “If I had Williams, Mays and Ruth, I doubt I’d find 100 AB for Gwynn.” I think it’s worth noting that Schell’s got a new book out that seems to address the problem, but it just came out last week and I’m not sure I’m in any hurry to pick it up.

Maybe I’m missing something, but how is this different from / better than Linear Weights?

Essentially, with some minor differences, the system I’m starting with is a system of linear weights. I’m looking at that from several angles, and spent several hours last night doing some research/simulation for my probable next article, which, based on my result, will either be a critique or an endorsement of the use of linear weights as a whole. I’m still unsure on the merits of the system, so I either need to see it proven to work or I need to prove that it doesn’t. With that said, I spent about 5 hours on it last night and it’s nowhere near ready to project.

Finally,
Isn’t a walk worth about two-thirds to three-quarters of a single? Why was I not upset about that statement?

I can’t tell you why it didn’t make you upset, I haven’t known you that long. It wouldn’t surprise me if my adjusted value for walks comes out in the 2/3-3/4 range, as some of the early research I’ve done, tinkering with numbers and checking the correlation, would say that my factor should be lower than .864. .864 is drawing a fair amount of fire, as I expected it to, but I’d like to emphasize that it was a best guess based mostly on an intuitive understanding of the situation, it’s certainly not something I’m above changing, and it’s, of course, not a “fact.”

   23. Nick S Posted: March 09, 2005 at 07:21 PM (#1190163)

I’m surprised comments are not much harsher than they are (a kinder, gentler BTF?)  The author is covering some well-mapped out territory here, and getting lost badly while doing so.

   24. mr. man Posted: March 09, 2005 at 08:29 PM (#1190278)

thanks, chris…note of course that ISW would include HBP, and isn’t as great a predictor as walks/PA because it will vary with batting average as well as walk rate.

yeah, i do find replacing the BA component of OPS for a quick predictor of future OPS improves your correlation dramatically…for everyone reading this and trying to figure how to use this, don’t be afraid to make an educated guess about future BA if you think a guy’s likely to hit something other than his career average.

EG adrian beltre: the big jump in performance last year was mostly thanks to a batting average explosion that was 72 points above his career average coming into the season.  His walk rate wasn’t out of line with his career numbers, and 9 of his 53 walks last year were intentional.  His slugging did get a big boost (well, actually, his ISO nearly doubled), which is probably a real change.  I expect him to hit in the .280 range next year at safeco…so that works out to:

Expected OPS = (2xBA) + ISO + ISW

= (2x .280) + .260 + .058

= .878

note- i chose a .260 ISO, down a bit from last year but i think we can all agree that we shouldn’t expect 48 homers again. 

Let’s compare him to, say, his neighbour in Oakland, Eric Chavez:
ISO last year: .225 Career: .225
ISW last year: .121 Career: .077
Career BA: .277

The change in walks appears to be for real-most one-season increases are. you don’t often see a guy lose his walks.  I see no reason why chavvy will hit anything but .277 this year, he hits that every year. so:

Expected OPS: (2*.277) + .225 + .121
= .900

There you have it.  Chavez should be a little better than beltre this year.  Both are in better lineups than last year; i’m not sure what to do with that.  i’ve seen either one rated ahead of the other on the web, but know this: beltre will not hit .334 this year.  don’t expect it, and don’t expect 48 homers.  chavez is probably a safer bet.
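For anyone who wants to rerun the arithmetic, here is a small sketch using only the inputs given in this comment. Beltre's ISW is read as .058, which is the value that reproduces the .878 total. The last line is an extra illustration of how much the batting-average guess matters: it keeps Beltre's ISO and ISW fixed and swaps in the .334 average.

```python
def expected_ops(ba, iso, isw):
    """Expected OPS = 2 x BA + ISO + ISW, as in the comment."""
    return 2 * ba + iso + isw


# Inputs as given in the comment; Beltre's ISW is read as .058.
beltre = expected_ops(ba=0.280, iso=0.260, isw=0.058)
chavez = expected_ops(ba=0.277, iso=0.225, isw=0.121)

# Same ISO and ISW for Beltre, but with the .334 average the comment
# says not to expect again.
beltre_at_334 = expected_ops(ba=0.334, iso=0.260, isw=0.058)

print(f"Beltre  {beltre:.3f}")
print(f"Chavez  {chavez:.3f}")
print(f"Beltre at .334  {beltre_at_334:.3f} (gap {beltre_at_334 - beltre:.3f})")
```

The 54-point difference in the batting-average guess shows up as a 108-point difference in expected OPS, so the BA estimate is the input that deserves the most care.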

   25. Mikαεl Posted: March 09, 2005 at 08:32 PM (#1190287)

KL,

I get the very definite sense that you have not read much of the research done in the field on offensive statistics.  Tangotiger wrote a set of excellent articles a little while back on different stats.  I highly recommend you take a look and see where the field sits now.

How Are Runs Really Created?  Parts I-II, Parts III-IV, Parts V-VII

You seem to be trying to create a linear weights system through intuition, when David Smyth and others have developed excellent systems through the systematic use of real data.  Again, I’d recommend you get a sense of their work and then see what you think you can add.

   26. batpig Posted: March 09, 2005 at 09:07 PM (#1190367)

“You seem to be trying to create a linear weights system through intuition…”

That was EXACTLY my thought when I was reading this.  KL, you are trying to use a reasoning process to establish more accurate coefficients for the different offensive inputs, but there is math that will do this faster and BETTER. 

The problem is, the beauty of OPS is its simplicity.  OBP and SLG are easily and publicly available, and easy to add together… and they provide an accuracy that is close to the best run-prediction systems. 

If you want to make a complex system to improve accuracy, you should use the math available to achieve greater precision.  If you’re going to use intuition, and then come up with a complex system with different coefficients for all the inputs, you’ve lost the simplicity/elegance of OPS, without achieving the enhanced accuracy of EQA or RC or Linear Weights.

It’s interesting stuff to think about, but if it’s not as simple as OPS, but not as accurate as the best metrics, what’s the point?

   27. Tango Tiger Posted: March 09, 2005 at 09:48 PM (#1190444)

In addition to the already cited articles, K.L., you should also look at run expectancy values of events by the 24 base-out states, along with the average values at the bottom of that chart.

   28. RobertMachemer Posted: March 09, 2005 at 10:59 PM (#1190602)

I think you’re missing one of the points of the series. Snow isn’t displaying this as a completed statistic, this is the *development* of a statistic, from wild-eyed theory to refined result.


I hear ya, Dan.  That wasn’t clear from the beginning of the article, but it eventually became clear the more I read it.  It’s not an article about truth, but about how one might approach finding it (including the missteps one might make along the way).

   29. KJOK Posted: March 10, 2005 at 01:05 AM (#1190814)

I think Schell is being mischaracterized on what he was saying in his initial book (it should have been clear to anyone he was talking about batting average only in the first book), and furthermore, I would encourage ANYONE to read his new book “Baseball’s All-Time Best Sluggers” where he DOES address the question of how to determine who the best BATTERS are across eras.  It’s an extremely fascinating book with lots of good statistical techniques that are both presented in technical detail AND explained in plain English.

   30. Walt Davis Posted: March 10, 2005 at 02:42 AM (#1190992)

So, therefore, even baseball’s greatest doubles hitters, or at least most of them, do not vary from the norm so much that their ability to hit for extra bases could not be feasibly derived from their home run totals.

Huh?  What?

Using consecutive seasons of at least 400 AB, I regressed doubles in year 2 on doubles and HR in year 1.

Not only do doubles in year 1 have a significant effect on doubles in year 2 even after controlling for HRs in year 1, but doubles in year 1 explains about 20 times as much variance as HRs.

There clearly is an ability to hit doubles even after controlling for HR ability.

Another way to try to assess this is to look at the extra base rate (# of extra bases per ab or pa, not # xbh) in year 2 as a function of HR rate and double rate in year 1.  Both variables are significant, though of course the # of HRs has a much bigger impact (given they contribute 3 xbs).

If we use xbh rate, still both significant, and HR rate still a stronger predictor but the importance of double rate is much closer.

The upshot is, no, you can’t just use HR rate to measure XB power.
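A sketch of the kind of regression described above, assuming ordinary least squares with an intercept. The data are synthetic numbers built from a known rule, so the output only shows the mechanics of fitting year-2 doubles on year-1 doubles and HR. It is not the actual dataset and does not reproduce the results reported here.

```python
def ols(rows, y):
    """Least-squares fit of y on the columns of rows, with an intercept.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; fit is not unique")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Synthetic (doubles, HR) pairs for year 1, NOT real player seasons.
year1 = [(20, 10), (30, 5), (25, 30), (40, 20), (35, 15), (28, 22)]
# Year-2 doubles generated from a made-up rule so the fit has a known answer.
year2_doubles = [10 + 0.6 * d + 0.05 * hr for d, hr in year1]

intercept, b_doubles, b_hr = ols(year1, year2_doubles)
print(f"intercept {intercept:.2f}, doubles {b_doubles:.2f}, HR {b_hr:.2f}")
```

On real data the coefficients would come with noise, so you would also want standard errors or a comparison of explained variance before concluding anything about doubles versus HR.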

   31. Kyle Lobner Posted: March 10, 2005 at 03:14 AM (#1191043)

KJOK, if he didn’t want me to think the book was about rating the best hitters in baseball history, he shouldn’t have called it “Baseball’s All Time Greatest Hitters.” Maybe he should’ve called it “Baseball’s All Time Greatest Contact Hitters” or “My Own Personal Search To Find A Way to Rate Tony Gwynn Over Everyone Else.”

   32. thok Posted: March 10, 2005 at 03:27 AM (#1191068)

Have you tried thinking about how to handle ground into double plays (GIDP)?  That’s one additional reason that extra base hits are worth more than walks and singles.  (After a home run, the next batter can’t get a GIDP.  After a double or triple it takes a lot of effort to get a GIDP.  After a single or walk, a GIDP is likely).

It’s a small factor, but it might increase the value of a home run from 4 singles to 4.1 singles or so and similarly increase the value of doubles and triples.

   33. nygiants5811 Posted: March 10, 2005 at 03:52 AM (#1191108)

Interesting work; it’s good to see more ways of looking at this stuff.

Jim Furtado did something where he evaluated the coefficients of a lot of the run-estimators, too; I’d take a look at this page as well for some additional information.

   34. fracas' hope springs eternal Posted: March 10, 2005 at 10:49 AM (#1191523)

Maybe he should’ve called it “Baseball’s All Time Greatest Contact Hitters” or “My Own Personal Search To Find A Way to Rate Tony Gwynn Over Everyone Else.”

What he should have called it is “Finding Baseball’s All Time Greatest Hitters For Average.”  I really don’t think he began with his conclusion and found equations to achieve that result—it’s just that he chose such an odd Holy Grail to seek.  (You’d expect a guy so willing to crunch numbers to employ that skill toward a more sabermetric goal than the quest for the best batting average.)  I’m sure he’d have been happy to have his work point him toward Ted Williams, and Ted wasn’t what you’d call a “contact hitter.”

   35. Cabbage Posted: March 10, 2005 at 02:12 PM (#1191555)

Rather than the “Snow Index”, I feel it should be called the “Informer Index”

   36. Mike Emeigh Posted: March 10, 2005 at 03:17 PM (#1191605)

I really don’t think he began with his conclusion and found equations to achieve that result—it’s just that he chose such an odd Holy Grail to seek.

Well, you have to remember that Michael isn’t writing primarily for a sabermetric-savvy audience, and sometimes you have to present your concepts in a well-understood framework in order to gain a wider audience for them.

—MWE

   37. Dufmeister Posted: March 10, 2005 at 04:15 PM (#1191701)

It seems to me that there are two kinds of “overall performance” stats - those that reflect an individual’s ability (OPS, etc) and those that attempt to relate individual ability directly to creating runs (RC, etc).

KL seems to be trying to redefine OPS in a way that is more intuitive and correlates to runs scored at least as well as OPS.

The trouble I have is taking team context into account for an individual’s contribution.  IE comparing walk and single’s value with men on base.  OPS does not look at that, yet in KL’s study, OPS had a better correlation to runs.  Is KL trying to measure individual ability or contribution to runs?  Is the Snow index a relative rate stat or a measure of “runs” contributed per PA?

If you want to analyze team concept, then you need to look at averages for PAs with men on base, outs, how many on base, etc.  As in a previous comment, GIDPs affect the value if there is a force situation, but that is also irrelevant if there are two outs.  Also the comments about full rationalization with base outs are extremely relevant and can lead to a melding of the individual and team context by creating “average” situations.  Perhaps an overall average run expectancy weighted by the total number of PAs each situation came up during a season?  Would this need to be adjusted for lead-off hitters always getting at least one PA per game with a known situation?

An aside, I am sure there is research out there, but is ability for GIDP any different than an ability for lining into DPs?  Both are a function of having men on base, perhaps it is similar discussion to BBs versus HBPs?

   38. Kyle Lobner Posted: March 10, 2005 at 05:19 PM (#1191804)

My argument against the consideration of GIDP is that it’s an opportunity based stat, similar to RBI. Any single has the potential to be an RBI single, it just needs a runner on base to drive home. A lot of ground balls to second, short or third could be double play balls, if there were a runner on first. But the guy who hits behind Barry Bonds is much more likely to ground into a double play than the guy who hits behind Craig Counsell, simply because the former case will come up more often with a runner on.

I’d be interested to see the worst double play hitters in baseball listed in terms of GIDP/opportunities to GIDP, but not nearly interested enough to sit down and go box score by box score to look.

   39. KJOK Posted: March 11, 2005 at 02:05 AM (#1192827)

My argument against the consideration of GIDP is that it’s an opportunity based stat, similar to RBI.

But the problem with this argument is that two players with the SAME opportunities will have very different DP results, year after year, indicating that avoiding DP’s is batter specific.

   40. Kyle Lobner Posted: March 11, 2005 at 03:40 AM (#1193093)

two players with the SAME opportunities will have very different DP results, year after year, indicating that avoiding DP’s is batter specific.

Agreed. But with that said, the only feasible way to measure that ability would be to compare a player’s GIDP stats to his opportunities to hit with a runner on first. If someone has access to that data and would like to crunch the numbers, more power to them, it’s certainly relevant as a small part of a grand scheme. It’s just not something I have immediate access to.
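If the data were available, the comparison would be a simple rate. The sketch below assumes an "opportunity" means a plate appearance with a runner on first and fewer than two outs (the outs condition is my assumption), and the two season lines are hypothetical.

```python
def gidp_rate(gidp, opportunities):
    """GIDP per double-play opportunity.

    Assumption: an opportunity is a PA with a runner on first and
    fewer than two outs.
    """
    if opportunities <= 0:
        raise ValueError("rate is undefined without any opportunities")
    return gidp / opportunities


# Hypothetical season lines, not real players: (GIDP, opportunities)
batters = {
    "Batter A (hits behind a high-OBP hitter)": (20, 160),
    "Batter B (hits behind a low-OBP hitter)": (14, 90),
}
for name, (gidp, opps) in batters.items():
    print(f"{name}: {gidp} GIDP, {gidp_rate(gidp, opps):.3f} per opportunity")
```

In this made-up case Batter A has more raw GIDP but the lower rate, which is the distinction between opportunity and batter-specific tendency being argued over here.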

   41. studes Posted: March 12, 2005 at 02:09 AM (#1194896)

Not to be dismissive or anything, but hasn’t this already been covered much more extensively (and clearly) by Tango in the linked articles?  Why not just re-post OPS Begone?

Man, now I sound like Chris reminiscing about the old days.

   42. dcsmyth1 Posted: March 13, 2005 at 04:36 PM (#1196676)

Many of us have gone thru the same process as this fella, in the privacy of our own notepads. But why does this article deserve to be published on this site, when he is going over stuff which was better analyzed 30 years ago?

   43. dcsmyth1 Posted: March 13, 2005 at 04:37 PM (#1196678)

I meant 20 years ago, not 30.

   44. Artie Ziff Posted: March 13, 2005 at 10:21 PM (#1197110)

This is new? I thought plagiarism was a punishable offense? What next, the R.B.I.?

   45. Mike Piazza Posted: March 14, 2005 at 04:31 PM (#1198028)

Rather than the “Snow Index”, I feel it should be called the “Informer Index”

I licky boomboom down!

   46. Harmon "Thread Killer" Microbrew Posted: March 14, 2005 at 10:48 PM (#1198670)

Snow. Informer.

Good one, Piazza.

   47. rdfc Posted: April 02, 2005 at 09:26 PM (#1228265)

Michael Schell has a new book out called Baseball’s All-Time Best Sluggers.  He clearly never meant to imply anything more than that Tony Gwynn was better at hitting for average than any player in history.
