## Wednesday, October 10, 2012

#### Grosnick: Introducing the WAR Index

Introducing WARi! Learn it, know it, live it, Murray Chass!

The other thing we can do, a thing which some folks think is of questionable value, is average the disparate WAR(P) values into a single number. I call this number WAR Index (WARi), and I believe it is valuable because it gives a snapshot of how the entire existing saber community views a single player’s season or career. While many people neck-deep in objective analysis prefer one form of WAR(P) or another, many casual fans and people new to sabermetrics may just use whatever they are presented with.

...One last thing that I need to bring up is that calculating a player’s adjusted WARs and WAR Index over a career is kind of a painful process. Why? Because, unfortunately, the adjustments per plate appearance vary from season to season. The adjustment from fWAR to rWAR may be exactly the same for 2012 and 2011 right now, but it is different for 2010 and for almost every single prior year that I’ve looked at. And while WARP may be a smaller total amount for hitters than rWAR for 2012 and 2011, that wasn’t always the case. The total WARP for hitters in 2010 was actually more than total rWAR, so a different adjustment needs to be made. So it’s a labor-intensive but somewhat rewarding process to do this over several years, as each year has (two) different adjustments that need to be used to calculate WARi. It can take some time.

Nevertheless, with these adjustments, we finally have a (sort of) equal baseline that we can use to (1) average these three replacement-level measures together and (2) determine which systems have the biggest deltas, or differences between the systems. While it’s not a perfect system, it works for what I’m trying to do, which is to identify major differences in valuation and to start building an overview of how these three systems jointly value a player.
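To make the season-by-season bookkeeping concrete, here is a minimal Python sketch of a WARi calculation. The per-PA adjustment values and the function names (`adjusted_wars`, `war_index`, `biggest_delta`, `career_war_index`) are hypothetical, since the post doesn't publish its actual factors; only the pattern follows the text: rWAR serves as the baseline, the fWAR adjustment is identical in 2011 and 2012 but different in 2010, and WARP needs a boost in 2011-12 but a cut in 2010.

```python
# Hypothetical per-PA adjustments (wins per plate appearance) that move fWAR
# and WARP onto the rWAR baseline. The numbers are invented for illustration.
ADJUSTMENT_PER_PA = {
    2010: {"fWAR": -0.0012, "WARP": -0.0004},
    2011: {"fWAR": -0.0015, "WARP": 0.0008},
    2012: {"fWAR": -0.0015, "WARP": 0.0008},
}


def adjusted_wars(season, pa, fwar, rwar, warp):
    """Put one player-season's three WAR figures on a common baseline."""
    adj = ADJUSTMENT_PER_PA[season]
    return {
        "fWAR": fwar + adj["fWAR"] * pa,
        "rWAR": rwar,  # rWAR is the reference baseline in this sketch
        "WARP": warp + adj["WARP"] * pa,
    }


def war_index(adjusted):
    """WARi: the plain average of the adjusted values."""
    return sum(adjusted.values()) / len(adjusted)


def biggest_delta(adjusted):
    """Spread between the most and least generous system."""
    return max(adjusted.values()) - min(adjusted.values())


def career_war_index(seasons):
    """seasons: iterable of (season, pa, fwar, rwar, warp) tuples.

    Each season is adjusted with its own factors before averaging,
    which is what makes the career version tedious by hand.
    """
    return sum(war_index(adjusted_wars(*s)) for s in seasons)


season_2012 = adjusted_wars(2012, 600, fwar=5.0, rwar=4.5, warp=4.0)
print(war_index(season_2012), biggest_delta(season_2012))
```

Summing the per-season WARi values gives the same career figure as averaging the three adjusted career totals, because every step is linear.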

Excited? At least moderately intrigued? I hope so. Later today, I’ll share the qualified hitters for 2012, and show you how they stack up in terms of WAR Index, and where the biggest differences in valuation come from between the WAR systems. Stick around!


1. The District Attorney Posted: October 10, 2012 at 10:30 AM (#4261574)
Know what'll fix the inherent problems with hanging one number on everyone's nose? If we really hang one number on everyone's nose!
2. JJ1986 Posted: October 10, 2012 at 10:40 AM (#4261586)
I don't think this works quite right. The first breakdown is to look at WAR generated only by position players and then only by pitchers. This will eliminate the pitcher/hitter ratio between systems. For example, bWAR assigns 59% of WAR to pitchers, but if WARP assigns only 50%, it will be normalized to 59% even though that's probably not an arbitrary decision.
3. salvomania Posted: October 10, 2012 at 10:43 AM (#4261594)
How is one's assessment of the value of WAR (any version) affected by acknowledging that the defensive components are based on inputs that sometimes vary wildly for a given player?

For example, last year I noticed, and made a comment in one of these threads about an infielder---I want to say it was Cano, but now I can't be sure (Pedroia?)---that was rated by one system as a very good defender, and another had him as below average...

Now there's a huge difference between "very good" and "below average," and there are many players for whom this is the case. Also, we're told to take a single year's worth of defensive data with a grain of salt, especially if it looks anomalous for a given player.

So given that these systems are often inconsistent in regards to describing one player's performance, and often inconsistent in comparing Player A with Player B, how comfortable are we with having confidence in a number (WAR) that is derived (in part) from them?

I'm on board with the concept of WAR, but the defensive component seems like such a moving target, at least for now, that using WAR for anything other than to gain a general impression of a player's performance seems kind of ridiculous. And maybe that's how most BTF-types use WAR, but it sure seems like I see a lot of posts/comments that are quibbling over a few points of WAR as if it means something.
4. JJ1986 Posted: October 10, 2012 at 11:02 AM (#4261615)
Actually, I'm wrong. They all have about the same split (60/40, 59/41 and 61/39). They just have wildly different replacement levels. For bWAR, replacement is a .320 team. For fWAR it is a .263 team, and for WARP it is a .353 team.
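Those replacement levels imply very different league-wide pools of WAR. A rough sketch, assuming a 30-team, 162-game season with no ties (so the league wins 2,430 games): total WAR is the league's wins minus what a league of replacement-level teams would win. The 60% position-player share is only the approximate split quoted above, applied uniformly as a simplification.

```python
# Replacement-level winning percentages as quoted in the comment above.
REPLACEMENT_WPCT = {"bWAR": 0.320, "fWAR": 0.263, "WARP": 0.353}


def league_war_pool(rep_wpct, teams=30, games=162):
    """Total WAR available to a league: actual wins minus replacement wins."""
    total_wins = teams * games / 2  # each game produces exactly one win (no ties)
    replacement_wins = teams * games * rep_wpct
    return total_wins - replacement_wins


for name, rep in REPLACEMENT_WPCT.items():
    pool = league_war_pool(rep)
    # Roughly 60% of each pool goes to position players in all three systems.
    print(f"{name}: {pool:.1f} WAR total, about {0.6 * pool:.0f} for position players")
```

Under these assumptions the pools come out to about 875 (bWAR), 1,152 (fWAR) and 714 (WARP) wins, which is why the systems can agree on the hitter/pitcher split and still hand the same player very different totals.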
5. TDF didn't lie, he just didn't remember Posted: October 10, 2012 at 11:24 AM (#4261640)
Now there's a huge difference between "very good" and "below average," and there are many players for whom this is the case. Also, we're told to take a single year's worth of defensive data with a grain of salt, especially if it looks anomalous for a given player.
Here's a question for those around here who actually calculate WAR: What if, instead of using 1 year's defensive "value" for a player to figure yearly WAR, a 3-year rolling average was used?

If 1 year's data is too noisy to tell you anything, would a 3 year average better express a player's defensive "talent" in any given season, making WAR (in whatever guise) more accurate?
6. SG Posted: October 10, 2012 at 11:47 AM (#4261681)
If 1 year's data is too noisy to tell you anything, would a 3 year average better express a player's defensive "talent" in any given season, making WAR (in whatever guise) more accurate?

I don't like this at all. I think when you start trying to assign value in 2012 with components from 2010 and 2011 you're making things worse, not better. Players can have good and bad defensive seasons, and I don't think potentially dragging that into the mix improves anything.

My preference has been that if you want to regress the defensive component of WAR, whatever WAR you're looking at, use a few of the available defensive metrics in any one season and average them, or divide the defensive component by some number to regress it towards league average (i.e., use 2 to regress halfway towards league average).
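Both of SG's options are simple enough to sketch. This assumes each metric reports fielding runs relative to league average (0 = average) for the same player-season; the metric values are hypothetical, and the function names are illustrative only.

```python
def average_metrics(run_values):
    """Option 1: average several single-season defensive metrics.

    run_values: fielding runs above/below average from different
    systems (for example UZR or DRS), all for the same player-season.
    """
    if not run_values:
        raise ValueError("need at least one metric")
    return sum(run_values) / len(run_values)


def regress_toward_average(def_runs, divisor=2.0):
    """Option 2: shrink a defensive value toward league average (0 runs).

    divisor=2 regresses halfway; a larger divisor regresses further.
    """
    return def_runs / divisor


# Hypothetical player-season where three metrics disagree badly.
metrics = [12.0, -4.0, 4.0]
print(average_metrics(metrics))       # 4.0
print(regress_toward_average(12.0))   # 6.0
```

Either way, the result uses only the current season's data, which keeps it a measure of that year's value rather than of multi-year talent.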
7. Rally Posted: October 10, 2012 at 11:50 AM (#4261686)
Actually, I'm wrong. They all have about the same split (60/40, 59/41 and 61/39). They just have wildly different replacement levels. For bWAR, replacement is a .320 team. For fWAR it is a .263 team, and for WARP it is a .353 team.

That makes sense - the batter/pitcher split at least. Pitchers have to be less than 50% as they control the majority of run prevention (but not all), and represent either zero on the offensive side (AL) or virtually zero (NL).

At least that logic applies in a situation where the spread of defensive and offensive talent is equal, as it generally is in MLB. Theoretically, the split could be anything: if all batters were robots programmed to swing exactly the same, then fielders and pitchers would deserve 100% credit for differing outcomes. If you used a pitching machine or a batting tee and robot fielders, then batters would get 100% of the credit.
8. PreservedFish Posted: October 10, 2012 at 11:52 AM (#4261689)
For WAR, baseball is 60% pitching?
9. JJ1986 Posted: October 10, 2012 at 11:55 AM (#4261699)
60% position players, which includes hitting and defense.
10. PreservedFish Posted: October 10, 2012 at 12:06 PM (#4261713)
Ah. So the split is 50/40/10: offense/pitching/defense? How do they come up with that? Just curious.
11. DL from MN Posted: October 10, 2012 at 12:14 PM (#4261718)
Averaging the systems together will come up with a system that is more accurate because everyone is closer to average but less precise because you won't be able to separate the good from the bad as effectively.

Averaging the components separately makes more sense to me because they tend to have wildly different error. Batting and baserunning are "low" error and can probably be averaged without losing much precision. Pitching is "medium" error depending on whether you use DIPS or what actually happened. Fielding is "high" error and we should be really careful which fielding systems make the cut for averaging.
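DL's component-by-component idea can be sketched as follows. The system names and run values are hypothetical. If every system is included for every component, the result equals averaging the totals (everything is linear); the approach only changes the answer when a high-error component such as fielding is restricted to a vetted subset of systems, which the assumed `include` argument allows.

```python
from statistics import mean

# Hypothetical runs-above-average values from three unnamed systems.
COMPONENTS = {
    "batting":  {"sysA": 30.0, "sysB": 28.0, "sysC": 31.0},
    "fielding": {"sysA": 12.0, "sysB": -3.0, "sysC": 6.0},
}


def componentwise_average(components, include=None):
    """Average each component across systems, then sum the components.

    include: optional {component: [systems]} restricting which systems
    count for that component; by default every system counts.
    """
    include = include or {}
    total = 0.0
    for name, by_system in components.items():
        systems = include.get(name, list(by_system))
        total += mean(by_system[s] for s in systems)
    return total


all_systems = componentwise_average(COMPONENTS)
vetted_fielding = componentwise_average(
    COMPONENTS, include={"fielding": ["sysA", "sysC"]}
)
print(all_systems, vetted_fielding)
```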
12. snapper (history's 42nd greatest monster) Posted: October 10, 2012 at 12:26 PM (#4261728)
Ah. So the split is 50/40/10: offense/pitching/defense? How do they come up with that? Just curious.

That makes sense. Run scoring and run prevention are equally valuable. And defence being 20% of prevention is plausible.
13. PreservedFish Posted: October 10, 2012 at 12:28 PM (#4261733)
Yes, that seems reasonable to me.

IIRC Bill James "fudged" run prevention in Win Shares up to something like 58% because the pitcher's numbers were just too low to be credible.
14. DL from MN Posted: October 10, 2012 at 01:10 PM (#4261777)
And defence being 20% of prevention is plausible.

18.5% K, 8% BB and 3.5% HR is average so 70% of balls are put in play. If you split the first 30% 50/50 pitcher and hitter then the latter 70% has to be split 50% hitter, 35% pitcher and 15% defense to get those totals. It doesn't seem very DIPS to give the pitcher 35% of the credit on a ball in play.
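DL's arithmetic can be checked by solving for the balls-in-play split directly. A minimal sketch, assuming his rates (18.5% K, 8% BB, 3.5% HR), a 50/50 pitcher/hitter split of those non-BIP events, and the 50/40/10 offense/pitching/defense totals from #10, with all of defense's share landing on balls in play:

```python
def in_play_split(k=0.185, bb=0.08, hr=0.035,
                  hitter_total=0.50, pitcher_total=0.40, defense_total=0.10):
    """Return the (hitter, pitcher, defense) shares of credit on balls in
    play needed to hit the target overall totals, assuming K/BB/HR credit
    is split 50/50 between pitcher and hitter."""
    true_outcomes = k + bb + hr
    in_play = 1.0 - true_outcomes
    hitter = (hitter_total - true_outcomes / 2) / in_play
    pitcher = (pitcher_total - true_outcomes / 2) / in_play
    defense = defense_total / in_play
    return hitter, pitcher, defense


hitter, pitcher, defense = in_play_split()
print(f"hitter {hitter:.1%}, pitcher {pitcher:.1%}, defense {defense:.1%}")
```

The exact shares are 50% hitter, about 35.7% pitcher and about 14.3% defense, which DL rounded to 50/35/15.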
15. Ron J2 Posted: October 10, 2012 at 01:43 PM (#4261826)
#14 There's no meaningful signal on popups ("always" out) or line drives (out only if somebody happens to be standing in the right place). I think assigning popups and foulouts as pitcher credit and line drives as mostly pitcher debit gets you to roughly the right place.

Also, pitchers are an important part of the running game (the only study I've seen on the matter suggests that the pitcher is about twice as important as the catcher in the success rate. Unfortunately the study didn't address SB frequency, so it's of limited value).

Also, DP support is very much a pitching ability. Dunno, the 20% looks pretty reasonable to me.
16. DL from MN Posted: October 10, 2012 at 02:10 PM (#4261888)
pitchers are an important part of the running game

I always put that in the "defense" bucket with this being part of the pitcher's defensive contribution.
17.  Posted: October 10, 2012 at 03:06 PM (#4261998)
Bill James "fudged" run prevention in Win Shares up to something like 58% because the pitcher's numbers were just too low to be credible.

52%, actually, which isn't unreasonable. James justified doing this by noting that bad teams tend to be slightly worse in run prevention than run scoring.

-- MWE

18. RMc Has Bizarre Ideas to Fix Baseball Posted: October 10, 2012 at 03:08 PM (#4262003)
Run scoring and run prevention are equally valuable.

Well, kinda. If you're in a league that averages two runs per game, it's possible for a hitter to produce three runs a game, but a pitcher can't prevent three runs a game...
19.  Posted: October 10, 2012 at 03:44 PM (#4262077)
It doesn't seem very DIPS to give the pitcher 35% of the credit on a ball in play.

I think the argument is that a large portion of those are routine plays, not that the pitcher is getting credit.
20.  Posted: October 10, 2012 at 03:45 PM (#4262078)
Also, pitchers are an important part of the running game (the only study I've seen on the matter suggests that the pitcher is about twice as important as the catcher in the success rate. Unfortunately the study didn't address SB frequency, so it's of limited value).

Really? That runs counter to everything I've ever read on the subject, except, of course, for the extremes.
21. DL from MN Posted: October 10, 2012 at 04:04 PM (#4262109)
If you ask coaches, the #1 thing they use to determine whether a runner should steal is the pitcher's time to home plate. #2 is the catcher's release and arm. Pitchers who are slow to home give up more stolen bases than pitchers with a quicker delivery. Pitchers are also responsible for keeping the runner close to the base.

The defensive aspect of run prevention is completely ignored by a DIPS analysis and is one reason guys like Mark Buehrle outperform FIP.

22.  Posted: October 10, 2012 at 04:18 PM (#4262139)
If you ask coaches, the #1 thing they use to determine whether a runner should steal is the pitcher's time to home plate. #2 is the catcher's release and arm. Pitchers who are slow to home give up more stolen bases than pitchers with a quicker delivery. Pitchers are also responsible for keeping the runner close to the base.

That would determine whether they attempt to steal or not; it doesn't really explain who is responsible for getting the guy out, though. Since teams don't steal at a 100% clip, the speed to the plate is a determining factor in when to go (someone posted a good article in another thread in which the Cardinals talk about this in depth).

I don't doubt for a second that pitchers are the number one reason for steal attempts (again, with exceptions such as Bench/Molina/Piazza being the outliers), but I'm not sure that they're the number one reason for success rate.
23. Walt Davis Posted: October 10, 2012 at 05:44 PM (#4262292)
If 1 year's data is too noisy to tell you anything, would a 3 year average better express a player's defensive "talent" in any given season, making WAR (in whatever guise) more accurate?

Are we measuring value/performance or projection/true talent? A 3-year (or 4, 5) weighted average (with age adjustment) is how projection is done and therefore also how "true talent at time X" is measured. But WAR is generally considered a measure of actual performance/value unless you're Dave Cameron. :-)
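For the projection/true-talent side of Walt's distinction, a multi-year weighted average looks roughly like this. The 5/4/3 weights are an assumption borrowed from simple projection systems such as Marcel; the sketch omits the age adjustment Walt mentions and any correction for playing time, both of which a real projection would need.

```python
def weighted_recent_average(values, weights=(5, 4, 3)):
    """Weighted average of recent seasons, most recent season first.

    Assumed Marcel-style 5/4/3 weights; no age or playing-time adjustment.
    If fewer seasons than weights are given, only the leading weights are used.
    """
    pairs = list(zip(values, weights))
    if not pairs:
        raise ValueError("need at least one season")
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


# Hypothetical defensive runs, 2012 first: +10, then 0, then -5.
talent_estimate = weighted_recent_average([10.0, 0.0, -5.0])
print(round(talent_estimate, 2))
```

The 2012 value answers the "value" question (+10 runs); the weighted figure (about +2.9) answers the "true talent" question, and the two shouldn't be substituted for each other.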

In terms of combining WAR, the first thing you HAVE to do is put them on the same replacement level. If fWAR is gonna give a guy 2.5 WAR and bWAR 2 WAR purely because fWAR's replacement level is lower, then averaging the two to 2.25 WAR does nothing but give you a new replacement level (an assumption which is hidden behind the averaging). (I don't know if the diff in replacement levels is .5 wins, just picking easy numbers) Similarly if they are using different run to win conversion factors, that's a "method artifact" not a difference in true talent. (So, among other things, I would probably work at the level of runs not wins and possibly even something "pre-runs" if I could figure out how.)

Combining multiple measures of pitcher WAR when one is strongly FIP-based and others aren't doesn't strike me as productive either. We know the big reason why those differ, and there's no reason to think that averaging a FIP-based measure and a non (or less) FIP-based measure gets you closer to "the truth" -- again, you're just making another (hidden) assumption, this time about the "proper" mix of FIP vs. "actual". That one's not so easy to sort out, I don't think, because it's not a simple linear adjustment like equating replacement level is.
24.  Posted: October 10, 2012 at 05:57 PM (#4262310)
In terms of combining WAR, the first thing you HAVE to do is put them on the same replacement level.

I was wondering about that. I knew they had different replacement levels (although that isn't the source of their differences).
25. shoewizard Posted: October 10, 2012 at 10:59 PM (#4262815)
Walt, according to TFA didn't the author rescale each WAR to the same replacement level prior to averaging?

26. bjhanke Posted: October 10, 2012 at 10:59 PM (#4262817)
"Here's a question for those around here who actually calculate WAR: What if, instead of using 1 year's defensive "value" for a player to figure yearly WAR, a 3-year rolling average was used?"

Walt's answer to this (#23) is dead on, as Walt's answers often are. You use one-year numbers if you're trying to estimate the player's VALUE for that year alone. You use multi-year data when you're trying to figure out how GOOD the player is in general, instead of how valuable he was in that one year. This is a principle that underlies virtually everything in sabermetrics. It is, pretty much, the very first question you have to ask yourself before you start doing any analysis. Am I trying to estimate one-year historical VALUE, or multi-year player QUALITY? Everything else flows from your answer to that question. - Brock Hanke
27. PreservedFish Posted: October 10, 2012 at 11:30 PM (#4262933)
Am I trying to estimate one-year historical VALUE, or multi-year player QUALITY? Everything else flows from your answer to that question.

I personally don't have an issue with a Frankenstein's statistic in this case, where you meld hitting value with fielding quality.
28. Der-K: downgraded to lurker Posted: October 10, 2012 at 11:36 PM (#4262950)
on 17 / 52%: I agree that pitching + defense is sliiiightly more valuable than offense, but have never come up with a formulation that satisfied me as to what exactly it should be.
Put more generally, we know that a prevented run is more valuable than an additional run scored for a good team (the converse is true for a bad one). There are also tactical advantages, I think, to being able to keep

Brock/Walt - 1 year d: Yes, but... that one year estimate of value has a lot of noise, which you can handle in a few ways. Like regressing to that player's mean (as opposed to zero).
So, I agree w you on principle, but this isn't the same thing as counting walks tallied or bases stolen.
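The claim that a prevented run is worth more than a scored run to a good team (and the reverse for a bad one) falls out of the Pythagorean win expectation, used here as an assumed model since the comment doesn't name one. For W = RS^k / (RS^k + RA^k), the marginal value of a run prevented divided by that of a run scored works out to RS/RA for any exponent k, which exceeds 1 exactly when a team outscores its opponents. A small numerical check, assuming k = 2 and hypothetical run totals:

```python
def pythag_wpct(rs, ra, exponent=2.0):
    """Pythagorean winning percentage from runs scored and allowed."""
    return rs ** exponent / (rs ** exponent + ra ** exponent)


def marginal_values(rs, ra, exponent=2.0, delta=1.0):
    """Winning-percentage gain from `delta` more runs scored vs. fewer allowed."""
    base = pythag_wpct(rs, ra, exponent)
    gain_scoring = pythag_wpct(rs + delta, ra, exponent) - base
    gain_preventing = pythag_wpct(rs, ra - delta, exponent) - base
    return gain_scoring, gain_preventing


good_team = marginal_values(800, 650)  # preventing a run helps more
bad_team = marginal_values(650, 800)   # scoring a run helps more
print(good_team, bad_team)
```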
29. Harold can be a fun sponge Posted: October 10, 2012 at 11:48 PM (#4262987)
Run scoring and run prevention are equally valuable.

Well, maybe, but there's no reason they have to be. I mean, AROM *just explained it*:

At least that logic applies in a situation where the spread of defensive and offensive talent is equal, as it generally is in MLB. Theoretically, the split could be anything: if all batters were robots programmed to swing exactly the same, then fielders and pitchers would deserve 100% credit for differing outcomes. If you used a pitching machine or a batting tee and robot fielders, then batters would get 100% of the credit.
30. TomH Posted: October 11, 2012 at 07:29 AM (#4263214)
For example, in slowpitch softball, most of the better teams are good because they HIT. Hard to defense a big fly.
31. villageidiom Posted: October 11, 2012 at 08:40 AM (#4263248)
"Here's a question for those around here who actually calculate WAR: What if, instead of using 1 year's defensive "value" for a player to figure yearly WAR, a 3-year rolling average was used?"
What if, instead of saying Miguel Cabrera hit more HR this year than anyone else in the AL, we took a 3-year average to determine who won the 2012 HR crown? Clearly, Jose Bautista, with 27 HR in 2012, should be the champion because his 3-year average demonstrates he was likely the better HR hitter in 2012 than Cabrera.

(I've got a club, and there's a dead horse here. What else am I supposed to do?)

