The other thing we can do, which some folks think is of questionable value, is average the disparate WAR(P) values into a single number. This number, which I call WAR Index (WARi), is valuable, I believe, because it gives a snapshot of how the entire existing saber community views a single player’s season or career. While many people neck-deep in objective analysis prefer one form of WAR(P) or another, many casual fans or people new to sabermetrics may just use whatever they are presented with.
...One last thing that I need to bring up is that calculating a player’s adjusted WARs and WAR Index over a career is kind of a painful process. Why? Because, unfortunately, the adjustments per plate appearance vary from season to season. The adjustment from fWAR to rWAR may be exactly the same for 2012 and 2011 right now, but it is different for 2010 and for almost every prior year that I’ve looked at. And while total WARP for hitters may be smaller than total rWAR for 2012 and 2011, that wasn’t always the case. The total WARP for hitters in 2010 was actually more than total rWAR, so a different adjustment needs to be made. So it’s a labor-intensive but somewhat rewarding process to do this over several years, as each year has (two) different adjustments that need to be used to calculate WARi. It can take some time.
Nevertheless, with these adjustments, we finally have a (sort of) equal baseline that we can use to (1) average these three replacement-level measures together and (2) determine where the biggest deltas, or differences between the systems, are. While it’s not a perfect system, it works for what I’m trying to do, which is to identify major differences in valuation and to start to build an overview of how these three systems jointly value a player.
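As a rough illustration of the process described above, here is a minimal Python sketch. It assumes the per-plate-appearance adjustments are additive and that rWAR serves as the common baseline; the article does not spell out either detail. The table `ADJ_PER_PA`, the function `war_index`, and every number in them are hypothetical placeholders, not the article's actual figures.

```python
from statistics import mean

# Hypothetical per-PA adjustments (wins per plate appearance) that move
# fWAR and WARP onto rWAR's baseline. Placeholder values only; a real
# table needs its own pair of adjustments for every season.
ADJ_PER_PA = {
    2012: {"fWAR": -0.0005, "WARP": 0.0010},
}

def war_index(season, pa, rwar, fwar, warp):
    """Return (WARi, largest gap between systems) for one player-season."""
    adj = ADJ_PER_PA[season]
    adjusted = {
        "rWAR": rwar,                       # assumed reference baseline
        "fWAR": fwar + adj["fWAR"] * pa,    # assumed additive adjustment
        "WARP": warp + adj["WARP"] * pa,
    }
    values = list(adjusted.values())
    return mean(values), max(values) - min(values)

wari, delta = war_index(2012, pa=600, rwar=5.0, fwar=5.5, warp=4.0)
print(f"WARi = {wari:.2f}, biggest delta = {delta:.2f}")
```

A career figure would repeat this season by season, each with its own two adjustments, which is where the labor comes in.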
Excited? At least moderately intrigued? I hope so. Later today, I’ll share the qualified hitters for 2012, and show you how they stack up in terms of WAR Index, and where the biggest differences in valuation come from between the WAR systems. Stick around!
Repoz
Posted: October 10, 2012 at 09:42 AM
31 comment(s)
Tags:
sabermetrics
Reader Comments and Retorts
1. The District Attorney Posted: October 10, 2012 at 10:30 AM (#4261574)
For example, last year I noticed, and made a comment in one of these threads about, an infielder---I want to say it was Cano, but now I can't be sure (Pedroia?)---who was rated by one system as a very good defender, while another had him as below average...
Now there's a huge difference between "very good" and "below average," and there are many players for whom this is the case. Also, we're told to take a single year's worth of defensive data with a grain of salt, especially if it looks anomalous for a given player.
So given that these systems are often inconsistent in regards to describing one player's performance, and often inconsistent in comparing Player A with Player B, how comfortable are we with having confidence in a number (WAR) that is derived (in part) from them?
I'm on board with the concept of WAR, but the defensive component seems like such a moving target, at least for now, that using WAR for anything other than to gain a general impression of a player's performance seems kind of ridiculous. And maybe that's how most BTF-types use WAR, but it sure seems like I see a lot of posts/comments that are quibbling over a few points of WAR as if it means something.
If 1 year's data is too noisy to tell you anything, would a 3 year average better express a player's defensive "talent" in any given season, making WAR (in whatever guise) more accurate?
I don't like this at all. I think when you start trying to assign value in 2012 with components from 2010 and 2011 you're making things worse, not better. Players can have good and bad defensive seasons, and I don't think potentially dragging that into the mix improves anything.
My preference has been that if you want to regress the defensive component of WAR, whatever WAR you're looking at, use a few of the available defensive metrics in any one season and average them, or divide the defensive component by some number to regress it towards league average (i.e., use 2 to regress halfway towards league average).
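A minimal Python sketch of the two options just described, assuming the fielding part of WAR is already expressed in wins relative to league average (so zero is average). The function names and sample numbers are made up for illustration.

```python
from statistics import mean

def regressed_war(war, fielding_wins, divisor=2.0):
    """Shrink the fielding part of WAR toward league average (zero).

    divisor=2 regresses the fielding component halfway to average.
    """
    non_fielding = war - fielding_wins
    return non_fielding + fielding_wins / divisor

def blended_war(war, fielding_wins, other_fielding_estimates):
    """Replace the fielding part with the mean of several same-season estimates."""
    non_fielding = war - fielding_wins
    return non_fielding + mean([fielding_wins, *other_fielding_estimates])

# A 5.0 WAR season in which one system credits +2.0 wins to the glove.
print(regressed_war(5.0, 2.0))             # fielding halved
print(blended_war(5.0, 2.0, [0.0, -0.5]))  # averaged with two other metrics
```

Both versions stay inside the single season, which is the point of the comment: no 2010 or 2011 data gets dragged into a 2012 value.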
That makes sense - the batter/pitcher split at least. Pitchers have to be less than 50% as they control the majority of run prevention (but not all), and represent either zero on the offensive side (AL) or virtually zero (NL).
At least that logic applies in a situation where the spread of defensive and offensive talent is equal, as it generally is in MLB. Theoretically the split could be anything. If all batters were robots programmed to swing exactly the same, then fielders and pitchers would deserve 100% credit for differing outcomes. If you used a pitching machine or a batting tee and robot fielders, then batters would get 100% of the credit.
Averaging the components separately makes more sense to me because they tend to have wildly different error. Batting and baserunning are "low" error and can probably be averaged without losing much precision. Pitching is "medium" error depending on whether you use DIPS or what actually happened. Fielding is "high" error and we should be really careful which fielding systems make the cut for averaging.
That makes sense. Run scoring and run prevention are equally valuable. And defence being 20% of prevention is plausible.
IIRC Bill James "fudged" run prevention in Win Shares up to something like 58% because the pitcher's numbers were just too low to be credible.
18.5% K, 8% BB and 3.5% HR is average so 70% of balls are put in play. If you split the first 30% 50/50 pitcher and hitter then the latter 70% has to be split 50% hitter, 35% pitcher and 15% defense to get those totals. It doesn't seem very DIPS to give the pitcher 35% of the credit on a ball in play.
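The arithmetic in that comment can be checked with a short Python sketch. It assumes the overall targets implied earlier in the thread (hitters 50%, pitchers 40%, defense 10%, i.e. defense as 20% of run prevention) and a 50/50 hitter/pitcher split on strikeouts, walks and home runs. The function name `ball_in_play_split` is made up for illustration.

```python
def ball_in_play_split(k=0.185, bb=0.08, hr=0.035,
                       hitter=0.50, pitcher=0.40, defense=0.10):
    """Back out how balls in play must be credited to hit the overall targets.

    Assumes K, BB and HR are split 50/50 between hitter and pitcher,
    with nothing going to the fielders.
    """
    no_bip = k + bb + hr   # share of plate appearances with no ball in play
    bip = 1.0 - no_bip     # share of plate appearances ending in a ball in play
    return {
        "hitter": (hitter - no_bip / 2) / bip,
        "pitcher": (pitcher - no_bip / 2) / bip,
        "defense": defense / bip,
    }

for who, share in ball_in_play_split().items():
    print(f"{who}: {share:.1%}")
```

With these inputs the ball-in-play credit works out to about 50% hitter, 36% pitcher and 14% defense, which the comment rounds to 50/35/15.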
Also, pitchers are an important part of the running game (the only study I've seen on the matter suggests that the pitcher is about twice as important as the catcher in the success rate. Unfortunately the study didn't address SB frequency so it's of limited value).
Also, DP support is very much a pitching ability. Dunno, the 20% looks pretty reasonable to me.
I always put that in the "defense" bucket with this being part of the pitcher's defensive contribution.
52%, actually, which isn't unreasonable. James justified doing this by noting that bad teams tend to be slightly worse in run prevention than run scoring.
-- MWE
Well, kinda. If you're in a league that averages two runs per game, it's possible for a hitter to produce three runs a game, but a pitcher can't prevent three runs a game...
I think the argument is that a large portion of those are routine plays, not that the pitcher is getting credit.
Really? That goes counter to everything I've ever read on the subject. With of course the extremes being an exception.
The defensive aspect of run prevention is completely ignored by a DIPS analysis and is one reason guys like Mark Buehrle outperform FIP.
That would determine whether they attempt to steal or not, but it doesn't really explain who is responsible for getting the guy out. Since teams don't steal at a 100% clip, the speed to the plate is a determining factor in when to go (there is a good article someone posted in another thread in which the Cardinals talk about this in depth).
I don't doubt for a second that pitchers are the number one reason for steal attempts (again with exceptions such as Bench/Molina/Piazza being the outliers), but I'm not sure that they're the number one reason for success rate.
Are we measuring value/performance or projection/true talent? A 3-year (or 4, 5) weighted average (with age adjustment) is how projection is done and therefore also how "true talent at time X" is measured. But WAR is generally considered a measure of actual performance/value unless you're Dave Cameron. :-)
In terms of combining WAR, the first thing you HAVE to do is put them on the same replacement level. If fWAR is gonna give a guy 2.5 WAR and bWAR 2 WAR purely because fWAR's replacement level is lower, then averaging the two to 2.25 WAR does nothing but give you a new replacement level (an assumption which is hidden behind the averaging). (I don't know if the diff in replacement levels is .5 wins, just picking easy numbers) Similarly if they are using different run to win conversion factors, that's a "method artifact" not a difference in true talent. (So, among other things, I would probably work at the level of runs not wins and possibly even something "pre-runs" if I could figure out how.)
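The "easy numbers" in that comment can be put into a tiny Python sketch. The 0.5-win gap is the commenter's illustrative figure, not a measured difference between fWAR and bWAR, and both function names are made up here.

```python
def naive_average(fwar, bwar):
    """Average the two numbers as published, baselines and all."""
    return (fwar + bwar) / 2

def aligned_average(fwar, bwar, replacement_gap):
    """Average after moving fWAR onto bWAR's replacement level.

    replacement_gap: wins fWAR credits beyond bWAR purely because its
    replacement level is lower (an assumed, illustrative quantity).
    """
    return ((fwar - replacement_gap) + bwar) / 2

print(naive_average(2.5, 2.0))         # 2.25: baseline silently set halfway between
print(aligned_average(2.5, 2.0, 0.5))  # 2.0: the two systems actually agree
```

In this toy case the entire 0.25-win difference between the two averages comes from the hidden baseline choice, not from any disagreement about the player.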
Combining multiple measures of pitcher WAR when one is strongly FIP-based and others aren't doesn't strike me as productive either. We know the big reason why those differ and there's no reason to think that averaging a FIP-based measure and a non (or less) FIP-based measure gets you closer to "the truth" -- again, you're just making another (hidden) assumption, this time about the "proper" mix of FIP vs. "actual". That one's not so easy to sort out I don't think because it's not a simple linear adjustment like equating replacement level is.
I was wondering about that. I knew they had different replacement levels (although that isn't the source of their differences).
Walt's answer to this (#23) is dead on, as Walt's answers often are. You use one-year numbers if you're trying to estimate the player's VALUE for that year alone. You use multi-year data when you're trying to figure out how GOOD the player is in general, instead of how valuable he was in that one year. This is a principle that underlies virtually everything in sabermetrics. It is, pretty much, the very first question you have to ask yourself before you start doing any analysis. Am I trying to estimate one-year historical VALUE, or multi-year player QUALITY? Everything else flows from your answer to that question. - Brock Hanke
I personally don't have an issue with a Frankenstein's statistic in this case, where you meld hitting value with fielding quality.
Put more generally, we know that a prevented run is more valuable than an additional run scored for a good team (the converse is true for a bad one). There are also tactical advantages, I think, to being able to keep
Brock/Walt - 1 year d: Yes, but... that one year estimate of value has a lot of noise, which you can handle in a few ways. Like regressing to that player's mean (as opposed to zero).
So, I agree w you on principle, but this isn't the same thing as counting walks tallied or bases stolen.
Well, maybe, but there's no reason they have to be. I mean, AROM *just explained it*:
(I've got a club, and there's a dead horse here. What else am I supposed to do?)