Primate Studies — Where BTF's Members Investigate the Grand Old Game
Tuesday, April 12, 2005
The Snow Index Project, Part 2
The development of a new statistic, step-by-step.
A 162-game season begins this week. Each team will play out better than 10,000 at-bats between offense and defense, enough that one would think that any “sample size” problems would sift themselves out, yet every year we seem to find a team or two commonly believed to be “lucky.”
I know this territory isn’t exactly uncharted, and I’m certain others smarter than me have taken a swing at it. I’m also aware that if anyone properly quantified luck in this situation, they could probably make themselves quite wealthy in Vegas. I don’t think I’m in a position to write the “ultimate quantification of luck in baseball,” but I would like to offer my own take on it, the RPSR.
The actual formula I’m starting with is rather simple:
RPSR = (runs - HR) / (times on base - HR)
I’m calling this formula the Run Production Success Rate, or RPSR. Simply, it measures how often a runner who gets on base, but doesn’t drive himself in, gets driven in. Several factors could make this number go up or down, such as:
Extra base hitting: Home runs are eliminated from the equation in the beginning, but doubles and triples are still in. A runner who starts his trek towards the plate from second or third has a better chance of getting there.
Speed: Cecil Fielder would be a little harder to score from first than Carl Crawford. Hell, in a race from first, Cecil Fielder might cross home plate behind Rocco Baldelli on crutches. Base stealing is an obvious and quantifiable factor; the ability to take the extra base or stay out of the double play is a little harder to lock down.
Power: On a team like the 2004 White Sox, which hit a bunch of home runs, getting on base at the top of the order was sometimes all that was necessary, as it gave Lee, Ordonez and Co. the chance to mash non-solo home runs.
Lineup balance: Tony Womack might hit .300 in the 9 hole for the Yankees this season. The Royals may have three regulars with OBP under .300. (3 players had better than 200 PA and OBP that low last season.) So when the middle of the Yankees lineup gets on base, there’s a chance the bottom of the order won’t kill the rally. When Mike Sweeney got on base for the Royals, the odds of him being driven in were minimal at best.
“Little ball”: Getting a runner on first and sacrificing him into scoring position increases the odds of getting one run, and simultaneously eliminates the chance of getting anyone else on base, thereby affecting both sides of the equation.
Luck: Sometimes the dice just fall on the right side. Bloop singles, errors and poor decisions by your opponent happen, and while capitalizing on them takes some skill, having them happen at the right time can cause more success than a good decision could have.
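As a minimal sketch of the RPSR idea described before the list of factors: rearranging the runs-scored relationship the author gives later in the comments (runs = RPSR × (times on base − HR) + HR) yields the ratio below. The function name and the team totals are hypothetical, and how "times on base" is counted (hits plus walks, with or without hit-by-pitch) is not spelled out in the article, so treat that input as an assumption.

```python
def rpsr(runs: int, home_runs: int, times_on_base: int) -> float:
    """Share of non-home-run baserunners who were eventually driven in.

    Home runs are removed from both sides: the batter drove himself in,
    so he counts neither as a runner nor as a run produced by others.
    """
    runners = times_on_base - home_runs
    if runners <= 0:
        raise ValueError("times_on_base must exceed home_runs")
    return (runs - home_runs) / runners


# Hypothetical team totals, for illustration only (not from the article).
example = rpsr(runs=800, home_runs=200, times_on_base=2200)
print(f"RPSR = {example:.3f}")
```

With these made-up totals, 600 of 2,000 non-homer baserunners scored.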
Extra base hitting is easy to measure, and can be eliminated from the equation as it is in the table below. The extra base hit ratio (XBH) is also pretty simple:
This gives you a percentage of times on base that aren’t home runs, but are extra base hits. If a team has a high XBH ratio, it makes logical sense that they should also have a high RPSR. In the cases where there are large differences between the two, one has to take a look at the other factors listed above. If the difference can’t be explained by one of those factors, it’s possible you have a luck situation on your hands.
The first number in the table below is RPSR, the second is XBH Ratio. The next two numbers are the respective team rankings in each statistic. As mentioned above, a team with a high number in one column should have a similar number in the other column. A positive number under “Difference” means a team did better than their XBH Ratio would imply. A negative implies underachievement.
Admittedly, not every large differential is based on luck; many if not most of them have logical reasons. However, it is worth wondering, what would happen if the other factors went away?
Let’s assume for a second an ideal world, where RPSR is based directly on a team’s XBH Ratio. This would eliminate speed and lineup balance from the equation, but would also take away luck. Using the Pythagorean formula and adjusting teams’ runs scored for this new system, here’s what the 2004 standings would’ve looked like:

Reader Comments and Retorts
1. VeteranPresence.com Posted: April 12, 2005 at 03:44 AM (#1249707)
So what you're telling me is...put the lime in the coconut?
Could you explain that adjustment?
Singles and home runs will help drive runners in, yes, but that wasn't why I separated out doubles and triples. I separated them out because a runner who gets on base by a double or a triple is more likely to score, and therefore a team's RPSR would rise in that situation. So if you double/triple a lot, your RPSR would rise. Home runs were taken out there because they've been removed from the entire equation in the beginning.
In response to post 6:
The concept I used for this one is pretty simple. The 2004 league RPSR was .324, and the league XBH Ratio was .252. .324/.252 = 1.2871. So while a team's actual runs scored are shown by this formula:
RPSR*(times on base - HR)+HR
A team's runs scored, if extra base hitting was the only factor involved, would in fact be shown by this one:
(XBH*1.2871)*(times on base - HR)+HR
This obviously isn't the best method to predict a team's success, and just because a team doesn't match this result doesn't mean they're lucky/unlucky, but it does remove the factor of extra base hitting, so you can look at the result and try to determine why the actual result happened. For example, by this system, the Yankees scored 58 runs more than they should have in 2004, which I'm attributing to a lineup with fewer dead spots than most teams have.
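The two run formulas above can be sketched side by side as follows. This is only an illustration: the function names and the team inputs are hypothetical, and the 1.2871 multiplier is the author's quoted figure (dividing the rounded league values .324 by .252 gives about 1.2857, so the author presumably used unrounded numbers).

```python
SCALE = 1.2871  # author's quoted league RPSR / league XBH Ratio for 2004


def actual_runs(rpsr: float, times_on_base: int, home_runs: int) -> float:
    """Runs scored as RPSR * (times on base - HR) + HR."""
    return rpsr * (times_on_base - home_runs) + home_runs


def xbh_only_runs(xbh_ratio: float, times_on_base: int, home_runs: int,
                  scale: float = SCALE) -> float:
    """Runs scored if the team's RPSR were just its XBH Ratio scaled to league level."""
    return (xbh_ratio * scale) * (times_on_base - home_runs) + home_runs


# Hypothetical team, for illustration only (not a real 2004 club).
tob, hr = 2200, 200
real = actual_runs(0.330, tob, hr)
adjusted = xbh_only_runs(0.250, tob, hr)
print(f"actual {real:.1f}, XBH-only {adjusted:.1f}, gap {real - adjusted:+.1f}")
```

A positive gap means the team scored more than its extra base hitting alone would suggest, which is the quantity the author then tries to explain with the other factors.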
To take the final step, I used the Pythagorean formula to determine what each team's win percentage would be if their runs allowed had remained the same, but their runs scored had been adjusted. It's worth noting that while my formula says the Yankees only should have won 84 games, their Pythagorean record from last season (before this change) was 89-73. So the change, from my perspective, was only 5 games.
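The Pythagorean step can be sketched as below, assuming the classic exponent of 2 (the thread does not say which exponent was used). The 897 runs scored and 808 runs allowed are taken as the 2004 Yankees' totals, which is an assumption here since the thread does not list them; the 58-run adjustment is the figure quoted above.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored: float, runs_allowed: float,
                  games: int = 162, exponent: float = 2.0) -> int:
    """Expected wins over a season, rounded to the nearest whole game."""
    return round(games * pythagorean_win_pct(runs_scored, runs_allowed, exponent))


# Assumed 2004 Yankees totals (not given in the thread).
RS, RA = 897, 808
before = expected_wins(RS, RA)        # Pythagorean wins with actual runs scored
after = expected_wins(RS - 58, RA)    # runs scored reduced by the quoted 58
print(before, after)
```

Under these assumptions the sketch reproduces the 89 and 84 wins mentioned in the comment, which suggests the exponent of 2 is at least close to what was used.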
If a team has a high XBH ratio, it makes logical sense that they should also have a high RPSR. In the cases where there are large differences between the two, one has to take a look at the other factors listed above. If the difference can’t be explained by one of those factors, it’s possible you have a luck situation on your hands.
I didn't say doubles were the only factor involved, in fact I listed most, if not all the factors involved. I eliminated extra base hitting from the equation because it's the easiest to quantify. Then, from there, you can look at the numbers again, with one factor removed.
Primates in your last thread offered quite a bit of constructive commentary on the work you had done. They were fairly negative (i.e. we generally said "You aren't doing this well, here are things to look at in order to learn how to do this better") and a bit patronizing. You completely ignored the substance of those comments. I don't really understand why you are posting this work if you are not looking for feedback on it (even if that feedback is negative.)
I'm being a bit of an ####### here, but this sort of stuff (bad, yet pretentious) hits my buttons.
If you find my work pretentious, I'm sorry, but if you're going to hit me for work I haven't even written yet, I'd encourage you to wait.
My take on the difference is pretty simply that pitchers kill rallies (lowering RPSR), but they don't have much effect on XBH ratio. If that's the case, it backs Nick S's observation that XBH and RPSR aren't well correlated, and indicates that the difference is largely due to lineup composition and doesn't measure luck.
Now, it's possible, then, that the difference in rankings that you're observing is interesting and does tell us something useful, but I don't think what it's telling us has been well-defined, and I certainly don't think you've eliminated much of anything nor made strides in isolating luck.
Your stats do show chances of driving in runs with singles and walks, but because HRs are so common and such an important part of run producing up and down the lineup, I'm not sure what the use of this stat is. Still, even though these stats aren't perfect, I found your article interesting, so thanks.
There are a lot more factors to be considered, though, before you are left with "luck" or random fluctuations, e.g. sacrifice fly ability, bunting ability, baserunning ability (esp. stealing), etc. True Primates could probably help you out here. I hope they do.
Willie Harris and Aaron Rowand stole 19 and 17 bases, respectively, and were caught just 12 times combined, for a gain of 14 bases. This high of a success rate would imply a) good speed, and b) only going when you're sure you'll make it.
My bet is, though, that if they hadn't had 5 20+ HR hitters behind them (and Frank Thomas), they'd have been running more. A fast player on a home run hitting team will run less to prevent outs, because his chances of being driven in are better than average. Their odds of being able to get from first to third on a single, though, are also better than average. So I'm reluctant to measure speed as the opposite of power. I'm also reluctant to measure speed purely by stolen bases and success rate for much the same reason.
That's a 75% success rate, which is actually about break even. Someone has posted break even stealing percentages, broken down by no. of outs and whether stealing 2nd or 3rd. I think the range is from about 69% to 89% if stealing 3rd with 2 outs. Therefore, Harris and Rowand probably would have scored about as often if they hadn't tried to steal. They scored more times the 36 times they advanced successfully, but they lost some scoring opportunities by being caught 12 times.
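The arithmetic behind the 75% figure can be checked with a short sketch. The function name is hypothetical, and the 69% to 89% band is only the range this commenter recalls, not a verified break-even table.

```python
def steal_success_rate(stolen: int, caught: int) -> float:
    """Fraction of steal attempts that succeeded."""
    attempts = stolen + caught
    if attempts == 0:
        raise ValueError("no steal attempts")
    return stolen / attempts


# Harris (19 SB) and Rowand (17 SB), caught 12 times combined, per the thread.
rate = steal_success_rate(19 + 17, 12)

# Break-even band as recalled in the comment above (an unverified assumption).
LOW, HIGH = 0.69, 0.89
print(f"success rate {rate:.0%}; inside recalled break-even band: {LOW <= rate <= HIGH}")
```

Because the rate sits inside the recalled band rather than clearly above it, whether the steals helped depends on the base and out situations in which they were attempted.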
So I'm reluctant to measure speed as the opposite of power. I'm also reluctant to measure speed purely by stolen bases and success rate for much the same reason.
You can measure speed in any way you want, for example, number of times advancing first to third on a single, as you said, and call it baserunning ability (I would include #of SB and success rate also), but you need to add these types of factors into the equation, because you need to control for these factors. You've found a difference between teams, and based on the comments you made you seem to understand some of the reasons for the differences. But because your comments are observation based and not statistics based, the stat in its present form doesn't allow us to draw any meaningful conclusions or allow us to utilize it in any meaningful way.
A fast player on a home run hitting team will run less to prevent outs, because his chances of being driven in are better than average
Not sure about that. Be careful about such statements until you research them. A home run hitting team may not have a higher chance of driving someone in, depending on many other factors such as batting average, OBP, number of strikeouts, etc. And a manager's preference for hit and run, sacrifices (rare now, I know) versus stealing also play a role.
Got to go. Good luck.
It's well documented how we react to traditionalists who try to create bogus stats, like Productive Out Percentage. Someone out there is watching us now to see what we'll do with an equally bogus and pointless stat being created, except from the other direction.
And if this is for real, which I'm seriously skeptical about, then what exactly is the point? The Snow Index Part I attempted to reinvent Linear Weights, and screwed up unbelievably badly while doing so.
Now we have the Snow Index Part 2, theorizing about run scoring efficiency, and not really providing any insight.
Mr. Snow, please don't take this personally. The issues that I have with this research are fairly simple:
1. You are covering pretty well charted ground. It's not that this sort of thing isn't worth trying in a vacuum, but what does this teach us that we don't already know?
Bill James has complained about the number of new stats being created to measure things which we've already figured out, for little purpose. I never gave his comments much weight in this regard, but I'm rapidly reconsidering things.
2. In covering this well charted territory, you're making some pretty serious errors based on assumptions that you take as fact. Furthermore, you seem to be spending your time running correlations on the wrong things.
3. Don't name a stat after yourself.
Mr. Snow, have you considered controlling for the number of outs at the time the runner gets on base? That would seem like a huge factor that affects whether the runner scores or not. Not controlling for it would mean that it's somehow included in the RPSR, introducing biases into the "luck" component you are trying to capture.
The correlation between RPSR and XBH Ratio is .21. It's not clear to me why the latter term becomes the gold standard for a new statistic to measure up to. That said, there's very little meaningful variance shared by the two.
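For anyone wanting to reproduce a figure like this, a plain Pearson correlation is enough. The sketch below is not the commenter's actual calculation; the function name and the team values are hypothetical placeholders. It also shows why a correlation of .21 implies little shared variance: squaring it gives roughly 4%.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-team values, for illustration only (not the 2004 table).
team_rpsr = [0.310, 0.335, 0.322, 0.341, 0.318]
team_xbh = [0.240, 0.262, 0.255, 0.249, 0.258]
print(f"r = {pearson_r(team_rpsr, team_xbh):.2f}")

# Shared variance implied by the correlation quoted in the comment.
print(f"r^2 for r = .21: {0.21 ** 2:.4f}")
```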
First and foremost, I could say "Mr. Snow is my father," but that's not even true. Please, call me KL.
Sometimes I waste time running correlations on something where there's no correlation at all. But you usually don't see that in my work. And, I guess, in this process, a "serious error," by my definition, would be defined as something that causes the ceiling in my apartment to cave in, and while it is doing that, I'm pretty sure it's not because of this project. Everything else I do I define as exploring a possibility. I don't expect most of these possibilities to be accepted, and in fact neither of the published ones have been. But largely, when I put a prospect out there, I'm not putting it out there as "This is the way it is." I'm putting it out there as "Here's what I'm thinking, would anyone care to offer an alternative suggestion?" My first article was buried under alternative suggestions, and I appreciated that. I spent most of a week digging through them and deciding what I could use. And you'll see the difference when I get back to linear weights.
In response to post 18: I guess, in the article, I spent a bit too much time playing up the "what would happen if RPSR was replaced by XBH Ratio" argument, such that people now think I'm putting too much weight on it. I'm not arguing that this is the only factor, I'm simply arguing that it's an easy factor to eliminate. And the concept I'm working on now to eliminate it will probably look considerably better than the one you just read. Unless, of course, someone provides me a better suggestion along the way.
More work on any topic is good (marketplace of ideas) but the value of this work to the greater community seems pretty low. K.L. Snow may have value as he may be learning something from this (both from the process of thinking about it and doing it and from the comments he's receiving). I agree that more references to existing work would be helpful.
I licky boom boom down!
I've been playing with baseball stats for 20 years, and have invented many dozens of stats, all of which seemed logical at the time. Really, only one of those stats has stood the test of time and is still considered the state of the art by the best analysts.
IOW, the end result is all that counts....