Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Sunday, September 30, 2012

How Many Baseball Writers Have Called or E-mailed to Talk to Me About What Goes Into WAR? Zero.

Hey Bill Madden and Jerry Green, pick up your rotting Inspector Henderson phones and give Sean Forman a call about WAR!

You may have heard that the AL MVP race is between a player who may win the Triple Crown and a player who most (if not all) of the stathead-friendly sites say is the best player in the league this year. A number of veteran writers have written articles about how stupid WAR is, complaining that it’s incomprehensible, stupid, meaningless, dumb, that the formulas are different, etc. etc.

Here are a couple of recent examples:

Here is Bill Madden in the New York Daily News

Here is Jerry Green in the Detroit News

Now I’m painting the baseball media with a broad brush, but each of these types of articles gets my hackles up. I’m a fellow card-carrying member of the BBWAA, and one would think that I would be afforded some professional courtesy before a stat we produce is berated in print.

Not a single member of the print media, the broadcast media or radio has reached out to me to learn more about WAR since this MVP controversy has erupted. Not one. First, I apologize to the curious and hard working media members who put in the time to study the game and its analysis in detail. You know who you are, and I appreciate your hard work. I’m sure many have taken the time to read our exhaustive introduction to WAR. But in the last two months not a single person has called or e-mailed asking for more information and that includes Bill Madden and Jerry Green.

So if you are a member of the media who is skeptical about WAR and wants to get some questions answered, or if you are a radio or TV host who wants to talk to me on the air or on the record to excoriate me for WAR’s failings, let me know. I’ll appear on any radio show to discuss WAR and make time for any writer who wants to learn something about it or debate its merits.

Repoz Posted: September 30, 2012 at 10:30 PM | 386 comment(s)
  Tags: sabermetrics, site news

Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   201. Kiko Sakata Posted: October 01, 2012 at 04:07 PM (#4250222)
The idea of 'replacement level' doesn't really work for defence, except to indicate the player who will be taken out in the late innings of a close game.

A replacement-level hitter might well field his position better than about a third of a league's regulars.


Replacement level should always be thought of in terms of the overall player, not separately by offense and defense, because of this. But one contrast between Cabrera and Trout would be, if you suddenly had to replace Miguel Cabrera, you'd lose a ton of offense, but you might actually be able to replace him with a BETTER defender. If you had to replace Mike Trout, you'd almost certainly have to replace him with a player who was worse at everything: worse hitter, worse baserunner, and worse fielder.
   202. Eric J can SABER all he wants to Posted: October 01, 2012 at 04:09 PM (#4250223)
That's at least -- at least -- double the number of extra good outcomes for comparably capable fielders. Moreover, Thome's good outcomes included 49 homeruns and a bunch of extra base hits. Other than the occasional robbed home runs, fielders can't "hit" homeruns (by saving them). Middle infielders really can't even "hit" extra base hits, only singles.

OK. The difference between Thome and Perez in 2002 was 120 batting runs. WAR usually estimates the difference between the best and worst fielders as what, 40 or 50?

Edit: Also, a big chunk of Thome's advantage over Perez was in walks, which are actually less valuable than most fielding events.
   203. JJ1986 Posted: October 01, 2012 at 04:09 PM (#4250224)
If you had to replace Mike Trout, you'd almost certainly have to replace him with a player who was worse at everything: worse hitter, worse baserunner, and worse fielder.


Unless you were the 2012 Angels, who can replace him with a better defender.

One thing that I've been trying to fit is the idea that teams (especially in the AL) simply don't have the room to carry a backup 3B on their bench. If Cabrera missed a game every 2 weeks, he'd be replaced by a utility infielder. I wonder if durability is more valuable at certain positions.
   204. Mefisto Posted: October 01, 2012 at 04:10 PM (#4250226)
We've tread over this ground for 10 years, but, obviously it's because the offensive stats count REAL things (singles, homeruns, etc) and we're not sure if the defensive ones do or not. You cannot deny that Neifi Perez just hit a homerun - but was that catch that Pete Incaviglia made a tough one? Who knows?


fra paolo already answered this, but defensive stats count the POs (or assists) made. Those are actual events. In the case of both offensive and defensive metrics, those actual events are then translated into runs. You can argue about the translation, but both begin with actual events.

This whole debate seems to me to be a diversion from the real issue, namely, that baserunning and defense should count for MVP balloting. WAR isn't the reason why Trout should win, it's an estimate of the extra value he provides by his baserunning and defense.
   205. PreservedFish Posted: October 01, 2012 at 04:11 PM (#4250230)
fra paolo already answered this, but defensive stats count the POs (or assists) made. Those are actual events. In the case of both offensive and defensive metrics, those actual events are then translated into runs. You can argue about the translation, but both begin with actual events.


Yes, that's right.
   206. cardsfanboy Posted: October 01, 2012 at 04:15 PM (#4250235)
That's not really how it works, though. Even the best hitters make outs in at least half of their PA, Bonds excepted, and replacement level hitters don't make outs all the time. The difference comes in (1) the number of plate appearances that have positive outcomes, and (2) the magnitude of those positive outcomes.


The numbers aren't about the totals at the end of the year, but about the potential variance per play. With every plate appearance there is a wide range of potential variance, every single time. With fielded balls, that isn't the case. A routine fly ball to the fielder is going to be fielded 99+% of the time, regardless of how good a player he is, and in a potential worst-case scenario the range of outcomes for most plays is the difference between an out and a batter on first (maybe second). With fielding it's about trying to gauge which plays are the plays that have the widest potential of variance and measuring those plays only.
   207. Ron J2 Posted: October 01, 2012 at 04:23 PM (#4250242)
fra paolo already answered this, but defensive stats count the POs (or assists) made. Those are actual events. In the case of both offensive and defensive metrics, those actual events are then translated into runs. You can argue about the translation, but both begin with actual events.


The underlying assumption of every single defensive method (except fielding percentage) is that all other things being equal the better fielder will turn more balls put in play into outs.

The problem is sussing out the various things that skew this. Real position level park effects, dealing with discretionary chances as well as unequal distribution of difficult chances.

I'm doubtful we'll ever get a "true" answer (as in precision to the tenth of a run) though I'm confident we can eventually cut the level of uncertainty.
   208. fra paolo Posted: October 01, 2012 at 04:24 PM (#4250244)
Replacement level should always be thought of in terms of the overall player, not separately by offense and defense, because of this.

Which to my mind is a reason why, in certain debates, I still cling to old-fashioned league average as opposed to replacement level.

if you suddenly had to replace Miguel Cabrera, you'd lose a ton of offense, but you might actually be able to replace him with a BETTER defender.

This, though, suggests a problem that relates to issues with RBI and defensive statistics. It's not Miguel Cabrera's fault that he's been asked to move to 3b. At 1b, his defensive shortcomings had significantly less impact on the team, going by the 'advanced' metrics.

I'd give Cabrera a little bit of leeway on his defence for exactly that reason.
   209. BDC Posted: October 01, 2012 at 04:25 PM (#4250247)
One thing that I've been trying to fit is the idea that teams (especially in the AL) simply don't have the room to carry a backup 3B on their bench. If Cabrera missed a game every 2 weeks, he'd be replaced by a utility infielder. I wonder if durability is more valuable at certain positions

Though I wonder (again, especially in the AL) whether many teams really have a backup SS or 2B either, anymore. The Rangers (the only team whose roster I can remotely keep up with), before rosters expanded in September, were carrying 12 pitchers, 2 catchers, 4 outfielders, 4 starting infielders, and a DH (usually), which left two roster slots. For much of the season these were held by Alberto Gonzalez, a generic 456 kind of guy, and Brandon Snyder, who can play first, third, the outfield, and catch in an emergency, and I bet if you asked him he'd play SS or 2B too. As a result, their depth depended a lot on the fact that one of their C can play 1B, one of their 1B can play RF, anyone can DH, and their DH can play everywhere badly (including 3B). This kind of musical chairs (even involving the catchers a lot of the time) seems to prevail nowadays, though my sample size is admittedly one team.
   210. Eric J can SABER all he wants to Posted: October 01, 2012 at 04:28 PM (#4250248)
The numbers aren't about the totals at the end of the year, but about the potential variance per play.

No, it's about the actual variance per play. The fact that it's theoretically possible for Matt Wieters to hit a home run in each of his plate appearances will only matter if he actually does it someday.
   211. JJ1986 Posted: October 01, 2012 at 04:34 PM (#4250254)
Though I wonder (again, especially in the AL) whether many teams really have a backup SS or 2B either, anymore.


I think most of them do. Looking at stats, only the White Sox really didn't.
   212. AROM Posted: October 01, 2012 at 04:35 PM (#4250255)
It's not Miguel Cabrera's fault that he's been asked to move to 3b. At 1b, his defensive shortcomings had significantly less impact on the team, going by the 'advanced' metrics.


In this case, at least according to the BBref metrics, Cabrera's WAR is helped by the move to 3rd base. Over the previous 4 years he averaged -5 runs as a 1B. Playing third, some predicted he'd be a disaster, but BBref has him at a very non-disaster rating of -5.

Looking at the most simple defensive stat, the same guy who made 13 errors at first base in both 2010 and 2011 has made only 13 errors playing third base this year.
   213. fra paolo Posted: October 01, 2012 at 04:41 PM (#4250262)
at least according to the BBref metrics

Fangraphs
2011 1b -3 DRS, -3.8 UZR
2012 3b -5 DRS, -9.2 UZR

Relative to the league Cabrera is either the worst or one of the worst regular 3bs in 2012, depending on how one wants to define 'regular'. I would be surprised if he is plumbing the depths of the historically bad. At 1b in 2011, he was a below average 1b, but with a bit of distance between him and the bottom.

EDIT: Looking at some other seasons, Mark Reynolds for Baltimore in 2011 or Mark Teahen for KC in 2005 might be my definition of 'historically bad'.
   214. BDC Posted: October 01, 2012 at 04:46 PM (#4250266)
whether many teams really have a backup SS or 2B either, anymore

I think most of them do


But is it the same guy, or truly one dedicated bench SS and one bench 2B? And how many also play 3B? I am truly clueless but curious.
   215. Ron J2 Posted: October 01, 2012 at 04:54 PM (#4250274)
#214 An AL team carrying 12 pitchers only has 4 spots on the bench. Unless your 4th OF or your backup catcher can play third there's just no way to carry two single position middle infielders. (even then it's probably tactically less than optimum. Can't do much platooning and it's tough to carry a good pinch-hitting option.)

   216. JJ1986 Posted: October 01, 2012 at 04:55 PM (#4250277)
But is it the same guy, or truly one dedicated bench SS and one bench 2B? And how many also play 3B? I am truly clueless but curious.


Almost all one guy. Usually a 2Bman who's stretched at SS.

NYY - Nix: SS and 2B
BAL - Flaherty at 2B and if they needed to (pre-September) I assume Andino would have moved to SS so he covers that spot too.
TB - Sean Rodriguez at SS and 2B (though they have dozens of guys who can play either)
TOR - Vizquel at both
BOS - They had Punto at both for most of the season
DET - Ramon Santiago at SS and 2B
CHW - None
KC - Yuni for most of the year at 2B. I assume he would have covered SS too.
CLE - Jason Donald, I guess. Really boring team.
MIN - Jamey Carroll at both.
TEX - Alberto Gonzalez at both even though they don't really need him with Young on the team.
OAK - They have three guys (Rosales, Hicks, Sogard). I assume only one was on the team at a time.
LAA - Maicer at both.
SEA - Munenori Kawasaki at both.
   217. AROM Posted: October 01, 2012 at 05:06 PM (#4250289)
Relative to the league Cabrera is either the worst or one of the worst regular 3bs in 2012, depending on how one wants to define 'regular'. I would be surprised if he is plumbing the depths of the historically bad. At 1b in 2011, he was a below average 1b, but with a bit of distance between him and the bottom.


Using UZR, he's -4 as a 1B in 2011, but also gets a -10 positional adjustment. In 2012, he's -9 at 3B, so worse compared to his competition, but now gets a +2 position adjustment. So by either WAR calculation he rates higher after the move. That's a credit to Cabrera, for playing the position better than most people expected he would.
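
The arithmetic behind that comparison can be sketched as follows. This is only an illustration: it assumes the fielding side of the calculation is simply the UZR figure plus the positional adjustment, both in runs, using the rounded numbers quoted in this comment. The function name is made up for the example and is not part of any site's actual WAR code.

```python
def fielding_plus_position(uzr_runs, positional_adjustment_runs):
    """Combine a fielding rating with a positional adjustment, both in runs.

    Hypothetical helper for illustration only.
    """
    return uzr_runs + positional_adjustment_runs


# Rounded figures quoted above for Cabrera.
first_base_2011 = fielding_plus_position(-4, -10)
third_base_2012 = fielding_plus_position(-9, +2)

print(first_base_2011)                    # combined runs at 1B in 2011
print(third_base_2012)                    # combined runs at 3B in 2012
print(third_base_2012 - first_base_2011)  # change from the move, in runs
```

On these rounded inputs the move comes out as a net gain of about 7 runs, even though the raw fielding rating got worse.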
   218. andrewberg Posted: October 01, 2012 at 05:29 PM (#4250309)
I've been lurking, but I'd like to say that this is one of the most interesting baseball threads on the site in ages. What are the odds?

Replacement level should always be thought of in terms of the overall player, not separately by offense and defense, because of this.


I agree with this statement mainly because the theoretical replacement player would have to play both sides of the ball. It also seems that, at premium defensive positions, there are more guys who are passable defensively and not remotely passable offensively than the other way around. I suppose that is a byproduct of guys being pushed down the defensive spectrum if the bat can play, but having nowhere to put them if only the glove can play.
   219. GGC don't think it can get longer than a novella Posted: October 01, 2012 at 08:00 PM (#4250441)
The difference between passer rating and WAR is that passer rating is an official stat. Link. Once Elias starts calculating WAR, Madden will use it. I don't think Madden and company would quote something like DVOA if they were writing about football.
   220. GregD Posted: October 01, 2012 at 08:10 PM (#4250451)
The difference between passer rating and WAR is that passer rating is an official stat. Link. Once Elias starts calculating WAR, Madden will use it. I don't think Madden and company would quote something like DVOA if they were writing about football.

And also plenty of writers mock passer rating. Any time a quality QB with a low rating runs up some wins you see the mockery come out. So it's not like Passer Rating is treated as the be-all and end-all by these same writers.
   221. Howie Menckel Posted: October 01, 2012 at 08:33 PM (#4250466)
But coaches seem to seek to maximize QB ratings the same way that managers maximize SVs.

6-1, 2 outs in the 9th, 2 on, wow, the card tells the MGR he can actually hand out a SV here - so yes, here is my closer entering the game. But at 4-1 in the 9th, no out after a setup man allows a leadoff HR - no closer against this meat of the order, but no closer appearance at that point because the card correctly notes "not a SV situation" (tying run not on deck, didn't start the inning, and the dumb rule doesn't know or care how many outs there are or who is coming up).

NFL coaches are even dumber, so they don't as consciously bow to the QB rating. But look at how much the completion PCTs have soared. Down 14 pts with 8 minutes left, you'll almost never see that QB throw a 40-yard pass in the air, even though two of the three Truest Outcomes are CATCH and Pass Interference on the defense (granting that INC is the most likely of the 3).

Instead, the coach will work the clock methodically, to the delight of the defense, picking up 3-5 yards at a time as the clock runs. Great for the QB passer rating, bad for Win Probability. The typical NFL coach seems to try to score a TD with as little time left as possible, then fail in an onside kick scenario at a point where he'll likely lose even if his team recovers the onside kick.

I'm not going to say that this 2012 QB with the better QB rating is better than the one from 1965, given how the game was played.
   222. Tricky Dick Posted: October 01, 2012 at 08:52 PM (#4250485)
1. Darwin Barney isn't a better player than Josh Hamilton.


That is not self-evident to me. I like how people can throw out a metric they don't like without proving why. And this is a one-season stat you are comparing. Barney may not be a better true talent player than Hamilton, but Barney is having a hell of a year. I was skeptical of his defensive stats until I actually saw him play defense for quite a few games this season. He is an incredible defensive infielder. I haven't seen Hamilton play as much, but what I have seen this year wouldn't lead me to question negative defensive metrics. As others have said, WAR is not the be-all, end-all stat. But I don't throw it out just because the result isn't what I'm predisposed to believe.
   223. tshipman Posted: October 01, 2012 at 09:05 PM (#4250491)
NFL coaches are even dumber, so they don't as consciously bow to the QB rating. But look at how much the completion PCTs have soared. Down 14 pts with 8 minutes left, you'll almost never see that QB throw a 40-yard pass in the air, even though two of the three Truest Outcomes are CATCH and Pass Interference on the defense (granting that INC is the most likely of the 3).

Instead, the coach will work the clock methodically, to the delight of the defense, picking up 3-5 yards at a time as the clock runs. Great for the QB passer rating, bad for Win Probability. The typical NFL coach seems to try to score a TD with as little time left as possible, then fail in an onside kick scenario at a point where he'll likely lose even if his team recovers the onside kick.


Uh ... this is the optimal strategy.
   224. vivaelpujols Posted: October 01, 2012 at 09:06 PM (#4250492)
SugarBearBlanks can suck my balls.
   225. The District Attorney Posted: October 01, 2012 at 09:14 PM (#4250501)
But at 4-1 in the 9th, no out after a setup man allows a leadoff HR - no closer against this meat of the order, but no closer appearance at that point because the card correctly notes "not a SV situation"

Just to be picky, this is a save situation, as the rule has an exception carved out for pitching the final inning with a three-run lead. (And thank God for that.)

In terms of the actual point here... it doesn't strike me as odd that Trout's defensive WAR is so high. And as many have said, I think it's a red herring to even be discussing fielding WAR in terms of the 2012 AL MVP, when the larger point is that Trout is a comparable offensive player to Cabrera and no one is seriously disputing that Trout has far more defensive value. I think what does strike people as odd -- including me -- is when a guy who isn't a CF or SS is being rated as one of the most valuable defensive players in baseball. And to be honest, I suspect that those evaluations are simply incorrect. But even just in terms of trying to understand the old guard's perceptions, I'd definitely distinguish between situations where it's being claimed that Andruw Jones or Ozzie Smith is adding more value solely with his glove than most guys do with their entire game, as opposed to situations where it's being claimed that Carl Crawford, Darwin Barney or Brett Lawrie are doing that.

BTW, I do hate the name "WAR", and I hated Bill James' idea to rename assists "baserunner kills". Baseball is pastoral. (Not to mention we wouldn't have to hear that ####### Edwin Starr joke.) Anyway.
   226. Moeball Posted: October 01, 2012 at 09:18 PM (#4250506)
The problem with WAR is the name. No one wants to buy into a concept that celebrates the killing of small innocent children holding puppies. My suggestion is to rename it:

PEACE
Perfect
Evaluator (of)
Actual
Comprehensive
Excellence.


You left out the "R" - that should be "PEARCE" - Perfect Evaluator (of) Actual Remeasurable Comprehensive Excellence. It would be easy to remember, too, since it so clearly reminds us of the perfect replacement level player - Steve Pearce!
   227. Ray (RDP) Posted: October 02, 2012 at 03:16 AM (#4250685)
I confess I had no clue who Darwin Barney is, and I've finally looked him up. A .661 OPS (80 OPS+) at 2B. 4.5 WAR and 1.4 oWAR. So if I'm doing this right, 31% of his value has been in his offense, and 69% in his defense.

What about doing something like dividing the weight of the defense component by, say, 3? So make the weight of the defense only 23% of his total value, to account for the fact that we are far less confident in evaluations of defense than we are of offense. Then he'd be something like 2.5 WAR (1.4 oWAR + 1.0 dWAR) instead of 4.5 WAR (1.4 oWAR + 3.1 dWAR).

Comments? Am I off the reservation here?

EDIT: Or at least do something like use an average of the other defensive metrics rather than just relying on one.
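
A minimal sketch of the proposed discount, for anyone who wants to try it on other players. It assumes, as the comment does, that everything in WAR that is not oWAR counts as defense; the function name and the `defense_weight` parameter are made up for the illustration.

```python
def discounted_war(owar, total_war, defense_weight):
    """Shrink the non-offensive remainder of WAR by a fixed weight.

    Assumption: total_war - owar is treated as the defensive component.
    """
    defense = total_war - owar
    return owar + defense_weight * defense


# Barney's figures as quoted in the comment above.
owar, war = 1.4, 4.5
offense_share = owar / war
adjusted = discounted_war(owar, war, 1 / 3)

print(f"offense share of WAR: {offense_share:.0%}")
print(f"WAR with defense at one-third weight: {adjusted:.1f}")
```

Note that a fixed multiplier pulls every defensive rating toward zero, so it changes the totals most for players rated far from average with the glove and not at all for players rated exactly average.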
   228. Greg K Posted: October 02, 2012 at 03:57 AM (#4250689)
EDIT: Or at least do something like use an average of the other defensive metrics rather than just relying on one.

That's sort of what I do in my (patented) end of season assessments. Take defensive numbers from a couple different places and just arbitrarily pick a number that seems to make sense. Not especially scientific but I'm ok with a degree of uncertainty about these things.
   229. Greg K Posted: October 02, 2012 at 04:31 AM (#4250691)
I confess I had no clue who Darwin Barney is, and I've finally looked him up. A .661 OPS (80 OPS+) at 2B. 4.5 WAR and 1.4 oWAR. So if I'm doing this right, 31% of his value has been in his offense, and 69% in his defense.

It might be even more than that depending on how you break things down since oWAR includes a positional adjustment.

EDIT: As a comparison of oWAR v. WAR, Ozzie Smith looks like he maxed out at about 50% by that calculation. Adam Everett hovered around the zero mark in oWAR most of his career so he gets wonky things like over 100% of his value coming from defence at times. Similarly Mark Belanger had 37.6 career WAR with 11.3 oWAR - 70% coming from defence. For Smith and Belanger their WAR totals seem in accordance with how they were perceived as players at the time (Smith anyway, Belanger was before my time).
   230. Howie Menckel Posted: October 02, 2012 at 07:58 AM (#4250732)
"Uh ... this is the optimal strategy."

for the team that is ahead, agreed.

For the team trailing, better to try to score a TD more quickly, then stop the other team so that you can get the ball back via punt, and with more time on the clock.

If that wasn't obvious already, take a cue from what the defense seeks to allow. They don't do charity. They allow the 3-5 yard passes precisely because it eats clock and optimizes their chances at winning. At least that's something NFL coaches get right; I suppose both sides can't be wrong, so someone had to be sensible.

Once a team is within 7 points, then yes, it is better to score with minimal time left than so quickly that it gives ample time for the other team to try to drive for a field goal. But that's not the scenario I gave.

..........

"Just to be picky, this is a save situation,"

not picky, fair point. Thought you had to start the inning, but looks like that's not the case, just not have any outs with the 3-run lead. A dumb rule, still.

   231. The Id of SugarBear Blanks Posted: October 02, 2012 at 08:19 AM (#4250739)
Comments? Am I off the reservation here?


No, you're firmly on the reservation. As noted above, the scale and magnitude of baserunning and defense are off, when compared to offense. The Trout 2012/Bonds 1992 comparison makes the point indisputably. I liked your description -- the components seem "reasonable" viewed separately, but mixing them all together into "WAR" doesn't work well.

The other problem is the lack of dependability of defensive data and statistics, which practically every serious analyst attests to. Even .3 as an adjustment factor might be too high, but it's certainly a good starting point.
   232. Ray (RDP) Posted: October 02, 2012 at 09:08 AM (#4250768)
Miguel Cabrera kind of makes the point. As AROM noted, his defense at 3B has been adequate, and better than most people expected. So his dWAR is plausible but would it shock anyone to learn that his dWAR is way off and he has actually been horrible there, even embarrassingly bad? That's what I'm talking about. But WAR swallows his defensive evaluation whole.
   233. jack the seal clubber (on the sidelines of life) Posted: October 02, 2012 at 09:12 AM (#4250775)
You left out the "R" - that should be "PEARCE" - Perfect Evaluator (of) Actual Remeasurable Comprehensive Excellence. It would be easy to remember, too, since it so clearly reminds us of the perfect replacement level player - Steve Pearce!


Genius. Can't believe I didn't think of this.
   234. Ray (RDP) Posted: October 02, 2012 at 09:25 AM (#4250789)
Actually, for players around 0 dWAR like Cabrera, my 1/3 correction wouldn't work, because 1/3 of 0 is still 0 and so if Cabrera is much worse (or if a player rating at 0 dWAR was much better), the overall WAR would still be off.

My correction would work for an Andruw Jones (or Darwin Barney) if he was rating much better than he was, or for a Gary Sheffield if he was rating much worse. So... back to the drawing board. Maybe taking a composite number for defense from the available metrics would be better. Or just throwing out the garbage and not pretending that we're currently capable of evaluating defense to this degree would be better. In that case it's back to oWAR.

But some recognition that we can't do this is in order.
   235. BDC Posted: October 02, 2012 at 09:33 AM (#4250801)
Thanks very much for #216! It's very interesting stuff.

I haven't seen Hamilton play as much, but what I have seen this year wouldn't lead me to question negative defensive metrics

Just from observation, Hamilton should not be playing CF anymore, and is just adequate in LF. Theoretically, Hamilton could certainly be bad enough in CF to compensate for being an exceedingly better hitter than Barney, but as many here have said, it's all in the calculations. There's nothing inherently absurd about the proposition.
   236. Ray (RDP) Posted: October 02, 2012 at 11:30 AM (#4250960)
Theoretically, Hamilton could certainly be bad enough in CF to compensate for being an exceedingly better hitter than Barney, but as many here have said, it's all in the calculations. There's nothing inherently absurd about the proposition.


Yes, there is.

Barney has 584 PA of a 79 OPS+ and negligible SB value, playing ~154 games at 2B. OBP/SLG = .301/.357.

Hamilton has 627 PA of a 140 OPS+ with no SB value, playing 146 games (93 in CF, 85 in LF/RF, 10 at DH). OBP/SLG = .356/.580.

It is inherently absurd to think that Barney has been more valuable this year, beating Hamilton 4.6 to 3.5 in WAR.
   237. JJ1986 Posted: October 02, 2012 at 11:37 AM (#4250971)
Isn't Barney getting huge shift credit? His UZR is only 12 compared to 29 DRS.
   238. Fancy Pants Handle doesn't need no water Posted: October 02, 2012 at 11:43 AM (#4250983)
I confess I had no clue who Darwin Barney is, and I've finally looked him up. A .661 OPS (80 OPS+) at 2B. 4.5 WAR and 1.4 oWAR. So if I'm doing this right, 31% of his value has been in his offense, and 69% in his defense.

What about doing something like dividing the weight of the defense component by, say, 3? So make the weight of the defense only 23% of his total value, to account for the fact that we are far less confident in evaluations of defense than we are of offense. Then he'd be something like 2.5 WAR (1.4 oWAR + 1.0 dWAR) instead of 4.5 WAR (1.4 oWAR + 3.1 dWAR).

Comments? Am I off the reservation here?


At that point, why bother with WAR? You aren't actually trying in any real sense of the word to accurately measure defensive value. You've just basically marginalized it out of existence. At that point why not just go look at offense only metrics. We have tons of those. Good ones too.

That aside, yes you are off the reservation (as you seem to have begun to realize in #234). The weight of offense and defense is the same for each player*. Just because one number is higher than the other does not mean it has more weight. Consider for example WAA. The weight of offense and defense is the same as in WAR, yet the offense numbers have all dropped by some 2 wins, while defense numbers remained the same.
*excepting for different opportunities on offense and defense
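
The baseline point can be illustrated with a small sketch. Everything here is hypothetical: the two players and their win totals are invented, and the 2-win gap between replacement and average is taken from the "some 2 wins" figure above and applied equally to both players, which assumes equal playing time.

```python
# Hypothetical players: (offense wins above replacement, defense wins above average)
players = {"glove_first": (1.4, 3.1), "bat_first": (4.0, -0.5)}

# Assumed gap between replacement level and average, in wins.
REPLACEMENT_TO_AVERAGE = 2.0


def totals(offense_shift):
    """Total value per player after moving the offensive baseline by offense_shift."""
    return {
        name: (off - offense_shift) + dfn
        for name, (off, dfn) in players.items()
    }


war_like = totals(0.0)                     # offense measured against replacement
waa_like = totals(REPLACEMENT_TO_AVERAGE)  # offense measured against average

gap_war = war_like["glove_first"] - war_like["bat_first"]
gap_waa = waa_like["glove_first"] - waa_like["bat_first"]
print(gap_war, gap_waa)
```

Moving the baseline changes how large each player's offensive number looks next to his defensive number, but the gap between the two players stays the same (up to floating-point rounding), which is the sense in which the weights have not changed.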
   239. Famous Original Joe C Posted: October 02, 2012 at 11:49 AM (#4250989)
As noted above, the scale and magnitude of baserunning and defense are off, when compared to offense. The Trout 2012/Bonds 1992 comparison makes the point indisputably.

They're "off" because they "seem" off to you, not for any tangible reason you can actually point to.

I do agree with the idea that defense should be regressed in some way.
   240. JJ1986 Posted: October 02, 2012 at 11:49 AM (#4250990)
The weight should be the same in a perfect world, but we're trying to measure value with somewhat inaccurate tools. If the offensive value is estimated at 50, but is really somewhere between 42 and 58 and the defensive value is estimated at 25, but is really somewhere between 1 and 49, then it makes sense to weigh offense more heavily.
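
One common way to act on this is to regress each estimate toward the mean in proportion to how noisy it is. The sketch below uses a simple normal-normal shrinkage model. It is not how any published WAR handles defense, and several inputs are assumptions: the interval half-widths above are treated as one standard deviation of measurement error, the prior mean is zero (league average), and the `prior_sd` values are invented for the example.

```python
def regress_to_mean(estimate, error_sd, prior_sd, prior_mean=0.0):
    """Shrink a noisy estimate toward a prior mean (normal-normal model)."""
    reliability = prior_sd**2 / (prior_sd**2 + error_sd**2)
    return prior_mean + reliability * (estimate - prior_mean)


# Estimates and half-widths from the comment above (50 +/- 8, 25 +/- 24).
# The prior_sd values are made up purely for illustration.
offense = regress_to_mean(50, error_sd=8, prior_sd=20)
defense = regress_to_mean(25, error_sd=24, prior_sd=10)

print(round(offense, 1), round(defense, 1))
```

Under these assumed inputs the offensive estimate keeps most of its value while the defensive estimate is pulled most of the way back to average, which is one concrete reading of "weigh offense more heavily".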
   241. Ron J2 Posted: October 02, 2012 at 11:57 AM (#4250994)
#237 I thought BIS had adjusted their methods. I know Brett Lawrie went from defensive god to merely extremely good.

There's something going on, though, since I think UZR and DRS have the same data source.
   242. Ray (RDP) Posted: October 02, 2012 at 11:59 AM (#4250997)
At that point, why bother with WAR?


That's what I'm trying to determine - what the uses of WAR are, given that its defense component breaks the stat.

I can see these uses:

* A tool for doing specific and narrow player comparisons, e.g., for MVP races. If you have a narrow set of players you can break down the components and see whether the components are reasonable... but again, the problem here is that we need to do better than "reasonable" if we are ranking players in this way, because "reasonable" could still be "wrong" and "wrong" throws everything out of whack.

* A blunt tool for broadly classifying players. But VORP does this too, or oWAR. Those assume average defense at the position, so in some respects WAR is better because it takes defense into account... but that introduces a whole new set of problems. If WAR at least gets the defense vector in the right direction I agree it's an improvement, but still not by much, given the complexities.

The best I can say for WAR is this: if the defense component is accurate for Player X, you will get a good approximation of his value. But if it's not, you won't. And right now the "if" is a huge problem. And when you are dealing with a list of several players, the errors are compounded.
   243. The Id of SugarBear Blanks Posted: October 02, 2012 at 11:59 AM (#4250998)
At that point, why bother with WAR? You aren't actually trying in any real sense of the word to accurately measure defensive value. You've just basically marginalized it out of existence. At that point why not just go look at offense only metrics. We have tons of those. Good ones too.

And they're orders of magnitude more dependable and systematized than the defensive ones.

You can't just add offense over replacement, as determined in every PA, to defense over average, as determined by a whole bunch of subjective decisions not involving every defensive chance, and expect to come up with a single definitive number that makes sense. You're adding an actual set of numbers -- offense -- to a subjectively-determined "virtual" set of numbers -- defense. You're also using different baselines. Not to mention that defense isn't even calculated the same way pre and post-2003.

I'll quote the Fangraphs UZR primer for more illumination:

"Just because UZR, or any other defensive metric "says" that someone is X, even if that X is based on many years of data, does not make it so. When you are dealing with sample data, as we almost always are with every metric in baseball that we encounter, there is a certain chance that the metric is going to be 'wrong.' Sometimes, you can use other information (such as scouting and observation, or physical attributes like size and speed) in order to adjust your "conclusions" and decrease your chance of being "wrong" and sometimes you can't (because the requisite information is not available.)"

This "other information" is what needs to be turned to in situations like Bonds 1992/Trout 2012. Nothing about those players would lead you to believe that Trout made up on defense the entirety, and 60% on top of that, of the difference between an ultra-elite 205 OPS+ and an excellent 168 OPS+. Bonds was very fast, a very good defender, and an outstanding CF when he played CF. The defensive differential is almost certainly wrong and very likely dramatically so.
   244. McCoy Posted: October 02, 2012 at 12:03 PM (#4251002)
re 241. BIS did change their methods but the Cubs don't do the same kind of shifts that get filtered out by Dewan. Their adjustment was to set aside all plays in which 3 infielders are on one side of second base. The Cubs don't really use that kind of shift so when the adjustment was made it didn't really affect the Cub fielders all that much.
   245. BDC Posted: October 02, 2012 at 12:10 PM (#4251007)
It is inherently absurd to think that Barney has been more valuable this year

No, not "inherently." Hamilton could have been so incompetent in CF, butchering so many outs or singles into doubles and triples – while Barney could have so sealed off his sector of the Cub infield – that the difference could have been made up. I'm not saying it actually has been. But it's no more inherently absurd than thinking that Carlos Peña in 2010 (.196 BA, 1.7 WAR) might have been a better player than Dmitri Young in 2007 (.320 BA, 0.1 WAR). Nothing in the logic of the calculations, or the nature of baseball, prevents it.
   246. Mike Webber Posted: October 02, 2012 at 12:11 PM (#4251010)
RE #65
but unless I'm missing it the play index doesn't let you pull up a list of players ranked by oWAR rather than WAR.


2012 Player Value Batting

Not sure if it is in the play index, but is in the seasons - batting - player value

Edit - sorry - I see this has been answered at least twice.
   247. The Id of SugarBear Blanks Posted: October 02, 2012 at 12:27 PM (#4251025)
No, not "inherently." Hamilton could have been so incompetent in CF, butchering so many outs or singles into doubles and triples – while Barney could have so sealed off his sector of the Cub infield – that the difference could have been made up. I'm not saying it actually has been. But it's no more inherently absurd than thinking that Carlos Peña in 2010 (.196 BA, 1.7 WAR) might have been a better player than Dmitri Young in 2007 (.320 BA, 0.1 WAR). Nothing in the logic of the calculations, or the nature of baseball, prevents it.

How many singles would Barney have to "prevent" to catch up to Hamilton's offense? If he went, say, 50 for his next 50, all singles (since that's really all he can prevent), would he?

I get his SLG going up to .494 and his OBP going up to .334. (All math caveats apply.) An OPS of .828. Hamilton's OPS is .936. He'd have to butcher a lot of balls in CF to drop 100 OPS points (and of course, Barney almost certainly didn't convert 50 more outs into hits than a typical 2B)
   248. Ray (RDP) Posted: October 02, 2012 at 12:35 PM (#4251034)
Sometimes, you can use other information (such as scouting and observation, or physical attributes like size and speed) in order to adjust your "conclusions" and decrease your chance of being "wrong" and sometimes you can't (because the requisite information is not available.)"


And this is what I have been pointing out is a problem, particularly when pulling up a list of players sorted by WAR.

A career WAR of 50 could easily be wrong by 10 WAR in either direction due to defense. See, e.g., Sheffield, who has a 76 career oWAR that is gutted by defense such that he ends up with just a 56 WAR. That seems highly implausible to me. No, I can't point to a specific error.
   249. Ray (RDP) Posted: October 02, 2012 at 12:37 PM (#4251036)
Edit - sorry - I see this has been answered at least twice.


Answered twice but unsatisfactorily both times (and now three times with your answer), as Play Index doesn't allow a search by oWAR which was my point.

I'm still hoping Sean F comments on this. Actually, I had hoped he'd be more active in this thread, seeing as he made a lot of noise about nobody in the media asking him anything about WAR, and now the discussion is being had here and he's been silent for the past couple of days.
   250. JJ1986 Posted: October 02, 2012 at 12:37 PM (#4251037)
How many singles would Barney have to "prevent" to catch up to Hamilton's offense? If he went, say, 50 for his next 50, all singles (since that's really all he can prevent), would he?


But you'd have to add 50 hits in 0 at bats to really compare hitting to fielding. If he's turning singles into outs in the field, to approximate that you need to turn outs into singles at the plate. That gives him a .387 OBP and a .448 SLG.
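The difference between the two ways of counting those 50 singles can be sketched as below. The batting line here is hypothetical and is not Barney's actual 2012 line; the simplified OBP formula (walks and HBP lumped together, no sacrifice flies) is also an assumption.

```python
def slash(ab, hits, total_bases, bb_hbp):
    """Return (OBP, SLG) from a simplified batting line."""
    obp = (hits + bb_hbp) / (ab + bb_hbp)
    slg = total_bases / ab
    return obp, slg

# Hypothetical batting line, NOT Barney's real numbers.
ab, hits, tb, bb_hbp = 550, 140, 195, 35
extra = 50

# Method from #247: 50 extra at-bats, every one a single.
added = slash(ab + extra, hits + extra, tb + extra, bb_hbp)

# Method from #250: 50 existing outs become singles, at-bats unchanged.
converted = slash(ab, hits + extra, tb + extra, bb_hbp)
```

Converting outs raises both OBP and SLG by more than tacking on extra at-bats does, because the denominator stays put while the outs disappear.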
   251. Ray (RDP) Posted: October 02, 2012 at 12:41 PM (#4251045)
I'm not saying it actually has been. But it's no more inherently absurd than thinking that Carlos Peña in 2010 (.196 BA, 1.7 WAR) might have been a better player than Dmitri Young in 2007 (.320 BA, 0.1 WAR). Nothing in the logic of the calculations, or the nature of baseball, prevents it.


Unless you cite OPS+ for Pena and Young, your point has absolutely no meaning.

The last time I relied on batting average to tell me anything about offense I was 15.
   252. BDC Posted: October 02, 2012 at 12:50 PM (#4251059)
Ray, the point is simply that a single number, whatever the number, always has a context, and is subject to illusion. It's as true of OPS+ or batting average as it is of any other metric. If you want OPS+ as that number, substitute Rey Sanchez in 2001 (64 OPS+, 3.2 WAR) and Kevin Reimer in 1992 (119 OPS+, -0.8 WAR). Maybe that's wrong too. But such things are clearly possible, which is to say not inherently absurd.

I just don't see the point of pointing to one side of the equation, or not even doing the math, and then just saying "that's absurd!"
   253. AROM Posted: October 02, 2012 at 12:58 PM (#4251076)
See, e.g., Sheffield, who has a 76 career oWAR that is gutted by defense such that he ends up with just a 56 WAR. That seems highly implausible to me. No, I can't point to a specific error.


When I read that I think you have pointed to a specific error. You don't think Sheffield was 200 runs below average as a fielder. Well, maybe not error, as I watched Gary Sheffield and happen to think -200 runs over 20 years is very plausible for him. But it's a specific point of disagreement.
   254. Johnny Sycophant-Laden Fora Posted: October 02, 2012 at 01:02 PM (#4251085)
Theoretically, Hamilton could certainly be bad enough in CF to compensate for being an exceedingly better hitter than Barney, but as many here have said, it's all in the calculations. There's nothing inherently absurd about the proposition.

Yes, there is.


No there isn't.

Let's say Hamilton is worth 75 more runs on offense than Darwin
Let's say a typical CF generates 15 more runs on offense than your typical SS (or typical replacement level CF versus SS whatever, I think we all agree that CFs tend to outhit SSs- and Hamilton played nearly 1/2 his games in LF)
So Hamilton is up by 60-65
Can Darwin SAVE that many runs on defense relative to Hamilton? Yes I think so, Hamilton would have to be extremely bad and Darwin extremely good, but it is certainly possible

Whether or not we can accurately measure that defense is a separate question - I don't think we can.
Personally I don't think you can, and if you regress each man's fielding stats 1/3 to the mean - Hamilton easily flips ahead of Barney...

That doesn't mean that Barney CAN'T be the better player than Hamilton this year, in my mind it's unlikely, but yes, if he really is an elite defense SS (near peak Ozzie) and if Hamilton was awful...
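The arithmetic in this post can be sketched as follows. The 75-run offensive gap and the 15-run positional gap are the post's round numbers; the two fielding figures are hypothetical values picked only to show how a one-third regression can flip the ordering.

```python
def net_gap(offense_gap, positional_gap, fielding_a, fielding_b, regress=0.0):
    """Runs by which player A leads player B.

    fielding_a and fielding_b are runs saved versus average;
    regress is the fraction of each fielding number pulled back toward 0.
    """
    keep = 1.0 - regress
    return (offense_gap - positional_gap) + keep * (fielding_a - fielding_b)

# Hypothetical fielding runs, not measured values.
hamilton_fielding, barney_fielding = -25, 40

raw = net_gap(75, 15, hamilton_fielding, barney_fielding)
regressed = net_gap(75, 15, hamilton_fielding, barney_fielding, regress=1 / 3)
```

With these made-up inputs the unregressed comparison has Barney slightly ahead, and regressing both fielding numbers by a third puts Hamilton back in front.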


   255. Ray (RDP) Posted: October 02, 2012 at 01:07 PM (#4251094)
When I read that I think you have pointed to a specific error. You don't think Sheffield was 200 runs below average as a fielder. Well, maybe not error, as I watched Gary Sheffield and happen to think -200 runs over 20 years is very plausible for him. But it's a specific point of disagreement.


Your system has him as a better outfielder at 33 and 34 than he was in his late 20s. Then he reverted to being a bad outfielder again.

Actually, the specific issue I see is that he was magically a better outfielder both years in Atlanta (ages 33 and 34) than he was for the Marlins, Dodgers, or Yankees.
   256. McCoy Posted: October 02, 2012 at 01:09 PM (#4251097)
Barney is a second baseman and not a SS.
   257. JJ1986 Posted: October 02, 2012 at 01:10 PM (#4251098)
Actually, the specific issue I see is that he was magically a better outfielder both years in Atlanta (ages 33 and 34) than he was for the Marlins, Dodgers, or Yankees.


I'd guess that's a side effect of playing next to the best defensive CF in history.
   258. AROM Posted: October 02, 2012 at 01:18 PM (#4251106)
Nothing magical. Just random variation, and not even an extreme case at that. From LA to ATL to NY his ratings from 2000-05 are -3, -15, -1, -2, -10, -14. Those are results that would be consistent with a -10 true talent and random variation around that base.
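A rough check of that claim, using only the six ratings quoted above. It treats the seasons as independent draws around a fixed talent level, which is a simplifying assumption.

```python
from math import sqrt
from statistics import mean, stdev

ratings = [-3, -15, -1, -2, -10, -14]   # 2000-05, as quoted in the post

avg = mean(ratings)                           # observed average rating
se = stdev(ratings) / sqrt(len(ratings))      # standard error of that average
gap_in_se = (avg - (-10)) / se                # distance from a -10 true talent
```

The six-year average sits about one standard error away from -10, so nothing in this sample rules out a -10 true talent. It does not rule out a somewhat better fielder either.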
   259. DL from MN Posted: October 02, 2012 at 01:23 PM (#4251113)
Seems like we're seeing a lot of outliers that come with the caveat "defensive positioning". How much of that credit should go to the manager instead of the player?
   260. BDC Posted: October 02, 2012 at 01:26 PM (#4251120)
How much of that credit should go to the manager instead of the player?

With the additional caveat that some goes to coaches and scouts, some credit also must go to the pitcher. A pitcher who knows his infield is playing a batter the other way, and throws outside, is aiding his defense a lot more than the bozo who comes inside with a pullable pitch. It just points to the difficulty of separating credit on defense.
   261. Ray (RDP) Posted: October 02, 2012 at 01:28 PM (#4251122)
AROM: Fair enough. Random variation could well explain it. (But, of course, that's my point: there are reasonable explanations for all of this. But that doesn't really get us very far.)

DL: It was hard enough to evaluate defense before, and people were just starting to get a handle on it. But the shifting throws everything out of whack. There is no baseline to compare a shifted 2B to. There's no data. Especially when the SS is shifted as well. "2B" in a shift becomes a meaningless construct having nothing to do with a traditional 2B.
   262. The Id of SugarBear Blanks Posted: October 02, 2012 at 01:28 PM (#4251123)
Colin Wyers, BP, July 16, 2010:

" ... I’ve been slowly coming to some realizations about defensive metrics in general, and they aren’t encouraging.

The short version: I’m not really sure that we’ve gotten any further than where we were when Zone Rating and Defensive Average were proposed in the '80s. And if we have gotten further, I’m not sure how we would really tell ..."
   263. Ray (RDP) Posted: October 02, 2012 at 01:34 PM (#4251133)
My argument in one respect boils down to WAR treating its defense component with just as much certainty as its offense component.
   264. The Id of SugarBear Blanks Posted: October 02, 2012 at 01:46 PM (#4251151)
DL: It was hard enough to evaluate defense before, and people were just starting to get a handle on it. But the shifting throws everything out of whack. There is no baseline to compare a shifted 2B to. There's no data. Especially when the SS is shifted as well. "2B" in a shift becomes a meaningless construct having nothing to do with a traditional 2B.


As I understand it, DRS doesn't really even account for positioning other than eliminating plays from the data where 3 IFs are on one side of the infield. You could field a ball in a "tough" position in a vector just because you're positioned right in front of it, and DRS will still credit you with making a tough play.
   265. Ray (RDP) Posted: October 02, 2012 at 01:50 PM (#4251160)
This is starting to feel like the debate over NeL statistics, where people were claiming that we had accurate MLEs for Josh Gibson and the like and that the MLEs should be accepted simply because a lot of hard work had been done.

That's about where we are for defense and WAR.
   266. DL from MN Posted: October 02, 2012 at 02:09 PM (#4251213)
It was hard enough to evaluate defense before, and people were just starting to get a handle on it.


The same ball-in-play data that has allowed us to measure defense has altered defense. Is the cat dead or alive?

eliminating plays from the data where 3 IFs are on one side of the infield


That's hardly satisfying. We're only measuring the "easy" stuff?

NeL statistics, where people were claiming that we had accurate MLEs


That's overstating the claim. The claim is we have MLE's at 50% confidence.
   267. Fancy Pants Handle doesn't need no water Posted: October 02, 2012 at 02:10 PM (#4251215)
My argument in one respect boils down to WAR treating its defense component with just as much certainty as its offense component.

Once you combine offense and defense there is no other way to do it. Having a number closer to 0 does not magically mean it has less weight. It takes as much confidence to say Miguel Cabrera is 2 runs below average as it does to say Adrian Beltre is 14 above. Cabrera could be -20, and Beltre could be +25. Defense is an equally important aspect of both players value, and needs to be weighted equally.

This is starting to feel like the debate over NeL statistics, where people were claiming that we had accurate MLEs for Josh Gibson and the like and that the MLEs should be accepted simply because a lot of hard work had been done.

That's about where we are for defense and WAR.

Well as I said, if you want to ignore defense, then ignore defense. That's entirely up to you. But the point of WAR is to combine the values of all aspects of baseball play into one measure. Once you have all but nominally eliminated defense, what you have left is not WAR.
   268. Mike Webber Posted: October 02, 2012 at 02:10 PM (#4251216)
This is starting to feel like the debate over NeL statistics, where people were claiming that we had accurate MLEs for Josh Gibson and the like and that the MLEs should be accepted simply because a lot of hard work had been done.

Of course the main difference being if you question one, you are a racist, and if you question the other you are a Luddite.
   269. Mike Webber Posted: October 02, 2012 at 02:21 PM (#4251242)
Answered twice but unsatisfactorily both times (and now three times with your answer), as Play Index doesn't allow a search by oWAR which was my point.


I thought (and I would assume the other two people that answered you) you wanted to see the oWAR for use in discussing who the MVP is this year, which that list would do fine. You apparently want it for something not really related to this.

I wish BBRef would give me a banana split. Can anyone help me find a banana split using baseball ref?
   270. The Id of SugarBear Blanks Posted: October 02, 2012 at 02:22 PM (#4251247)
Defense is an equally important aspect of both players value, and needs to be weighted equally.

Not necessarily, particularly if there isn't as much opportunity to distinguish yourself on defense as on offense and, as for middle IFs, where when you do distinguish yourself it's only to "hit" a single.

But the point of WAR is to combine the values of all aspects of baseball play into one measure. Once you have all but nominally eliminated defense, what you have left is not WAR.

A better way to do that would be to model replacement-caliber players at all positions in all facets of the game (perhaps equally "good" on offense as defense) for a .320-caliber team and measure players against them. That's really what WAR should be.

   271. Ray (RDP) Posted: October 02, 2012 at 02:23 PM (#4251248)
I thought (and I would assume the other two people that answered you) you wanted to see the oWAR for use in discussing who the MVP is this year, which that list would do fine. You apparently want it for something not really related to this.


? This is a discussion about WAR. I want it for something related to WAR.

I wish BBRef would give me a banana split. Can anyone help me find a banana split using baseball ref?


Lame.
   272. AROM Posted: October 02, 2012 at 02:23 PM (#4251250)
My argument in one respect boils down to WAR treating its defense component with just as much certainty as its offense component.


That's a valid argument. Two potential solutions, when evaluating single seasons:

1. Stat-based. Use multi-year fielding ratings. For Josh Hamilton in 2012 this would mean maybe taking his average fielding rating from 2009-2012, and combine that with his actual offensive numbers.
2. Subjective based. Grade every fielder from terrible, poor, average, good, great. Award runs on those categories -10, -5, 0, 5, 10. We can all see Adrian Beltre or Brendan Ryan play great defense, so give them the top rating. Someone is a complete butcher? Give him the bottom rating.

Either solution is preferable to burying your heads in the sand and pretending defense doesn't matter.
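The two options can be sketched side by side. The grade-to-runs table is taken from option 2 above; the sample ratings and the batting figure are hypothetical, and the plain unweighted average is just one possible reading of "multi-year fielding ratings."

```python
# Option 2: runs awarded per subjective grade, as listed in the post.
GRADE_RUNS = {"terrible": -10, "poor": -5, "average": 0, "good": 5, "great": 10}

def multi_year_fielding(ratings_by_year):
    """Option 1: unweighted average of several seasons of fielding runs."""
    return sum(ratings_by_year.values()) / len(ratings_by_year)

def graded_fielding(grade):
    """Option 2: look up the runs for a subjective grade."""
    return GRADE_RUNS[grade]

def season_value(batting_runs, fielding_runs):
    """Combine actual batting runs with the chosen fielding estimate."""
    return batting_runs + fielding_runs

# Hypothetical inputs for illustration only.
sample_ratings = {2009: 2, 2010: 8, 2011: -4, 2012: -10}
stat_based = season_value(45, multi_year_fielding(sample_ratings))
subjective = season_value(45, graded_fielding("poor"))
```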
   273. BDC Posted: October 02, 2012 at 02:35 PM (#4251272)
boils down to WAR treating its defense component with just as much certainty as its offense component

And that's fair enough, as others have said. It's the middle term in this progression:

1) There can't possibly be a New World, that doesn't leave room for the enormous turtle.

2) We should think seriously about this New World, but I don't know if I trust these travelers' reports as much as I do my own knowledge of the Old World.

3) Hey, the latest broadside says there are men whose heads do grow beneath their shoulders over in the New World, I'm going to start a business knitting them some sweaters.
   274. zenbitz Posted: October 02, 2012 at 02:35 PM (#4251273)
@262 is that some kind of appeal to authority, -- a >2 year old quote - in a stats thread?

Moving on to other points...

You can't have WAR without Defense, that's the whole point of it. I do wonder if the difficulty of measuring fielding replacement value means we should switch to WAA with a correction for position. You would lose the concept of an average player having value... but I guess you could just add back a constant at the end.


Your system has him as a better outfielder at 33 and 34 than he was in his late 20s. Then he reverted to being a bad outfielder again.


So what, this happens with offense and pitchers All. The. Time. Just not in the aggregate.

I had an idea to fix both FIP and fielding stats by statistically partitioning credit between fielder and pitcher in an iterative way. Because one way or another those outs are getting recorded. But I never quite worked it out or implemented it.

The idea is to use N-fold cross validation for each fielder with the N's being the pitchers he plays behind. For each set of fielder plays (in this case subset of a season) you get a ratio of outs/chances, Rf. You also get the league-wide (park corrected?) aggregate as well. For each pitcher you also get (for each fielding position) a ratio of outs/chances, Rp (I am simplifying by ignoring GB/FB - assume we are only talking about GBs for infielders and FBs for outfielders...and of course "chances" depends on the BIP data you are using)

for pitcher i: Rpi = weighted sum of Rf for all fielders
for fielder j, Rfj = weighted sum of Rp for all pitchers

This makes rectangular matrices (for each team, basically) with elements Rij (both "p" and "f")

What we wish to impute are the difficulties of the BIP; or really the average difficulties of the BIP for each cell of the matrices.
The difficulty, Dij, is the chance that an average fielder converts a given BIP to an out. From 0-0.999 or something. Dij is the pitchers' component of defense (and also probably positioning/coaching). Fij is the fielding skill. An average fielder has an Fij of 1.

Rij = Dij*Fij

If we assume (for a given iteration) that Fij is constant for a fielder - we have the Rijs from data and Dij[Fs,1] = Rij/Fij -> Fij(1) is just Rfj.
Then we assume that Dij (again we are splitting GB/FB) is constant and we get Fij[Ps,1] = Rij/Dij == Rpi/Dij

Now we have Rij(true) and Rij(est) = Dij[Fs,1]*Fij[Ps,1] - we now iterate changing Fij[Ps,2] and Dij[Fs,2] to minimize the difference.

I ran this past Dial a couple years ago, but we never did anything with it ... possibly because it's a dumb idea that won't work, definitely because I am too busy and don't have BIP data.
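One way the back-and-forth described above might look in code is a rank-one alternating least squares fit. This is a swapped-in simplification, not the poster's exact procedure: it assumes one difficulty per pitcher and one skill per fielder, so Rij is approximated by D[i] * F[j], and it rescales so the average fielder has skill 1. The function name, the chance weights, and the check data are all made up.

```python
def partition_credit(rates, chances, iterations=20):
    """Split out rates R[i][j] into pitcher difficulty D[i] and fielder skill F[j].

    rates[i][j]   : outs/chances for fielder j behind pitcher i
    chances[i][j] : number of chances in that cell (used as a weight)
    Alternates between solving for D with F fixed and F with D fixed.
    """
    n_pitchers, n_fielders = len(rates), len(rates[0])
    skill = [1.0] * n_fielders          # start everyone as an average fielder
    difficulty = [0.0] * n_pitchers
    for _ in range(iterations):
        # Hold skill fixed, fit each pitcher's difficulty.
        for i in range(n_pitchers):
            num = sum(chances[i][j] * rates[i][j] * skill[j] for j in range(n_fielders))
            den = sum(chances[i][j] * skill[j] ** 2 for j in range(n_fielders))
            difficulty[i] = num / den if den else 0.0
        # Hold difficulty fixed, fit each fielder's skill.
        for j in range(n_fielders):
            num = sum(chances[i][j] * rates[i][j] * difficulty[i] for i in range(n_pitchers))
            den = sum(chances[i][j] * difficulty[i] ** 2 for i in range(n_pitchers))
            skill[j] = num / den if den else 1.0
        # Rescale so the average fielder has skill 1.
        mean_skill = sum(skill) / n_fielders
        skill = [s / mean_skill for s in skill]
        difficulty = [d * mean_skill for d in difficulty]
    return difficulty, skill

# Synthetic check data (made up): 3 pitchers x 3 fielders, no noise.
true_difficulty = [0.70, 0.75, 0.80]
true_skill = [0.9, 1.0, 1.1]
rates = [[d * f for f in true_skill] for d in true_difficulty]
chances = [[100] * 3 for _ in range(3)]
difficulty, skill = partition_credit(rates, chances)
```

On noiseless synthetic data the fit recovers the inputs. Whether it separates pitcher and fielder credit on real BIP data is exactly the open question raised in the post.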
   275. Ray (RDP) Posted: October 02, 2012 at 02:45 PM (#4251291)
1. Stat-based. Use multi-year fielding ratings. For Josh Hamilton in 2012 this would mean maybe taking his average fielding rating from 2009-2012, and combine that with his actual offensive numbers.


Doesn't really work. Fielders can lose it pretty quickly, just as on offense. We wouldn't credit Adam Lind with a portion of the 141 OPS+ he put up in 2009, or Joe Mauer with a portion of his 170 OPS+ that he put up in 2009, or ARod with a portion of his 138 OPS+ that he put up in 2009. To do so would be flatly absurd.

2. Subjective based. Grade every fielder from terrible, poor, average, good, great. Award runs on those categories -10, -5, 0, 5, 10. We can all see Adrian Beltre or Brendan Ryan play great defense, so give them the top rating. Someone is a complete butcher? Give him the bottom rating.


But this is no better than evaluating defense based on a combination of scouting and fielding percentage, which was done for decades before ZR or DA.

Either solution is preferable to burying your heads in the sand and pretending defense doesn't matter.


I of course don't pretend it doesn't matter. I just don't pretend that we can get close enough to it that we can have nearly as much confidence in it as we do in the offense evaluations, which is what WAR does.
   276. fra paolo Posted: October 02, 2012 at 02:46 PM (#4251294)
But the point of WAR is to combine the values of all aspects of baseball play into one measure.

Which is great, but this extended discussion demonstrates why using it as the sole or main arbiter when voting for the MVP and the HoF (and even the HoM and the MMP) is fraught with issues that need consideration.

For another thing, if you don't like dWAR, you can always plug in a different-run based value, like UZR or DRS.

I've just made up four lists of AL 3bs with at least 600 innings for 2012 using BB-ref Rtot, DRS, UZR and RZR. Miguel Cabrera is always at or near the bottom of them. Brett Lawrie is always at or near the top.

Now it seems that no matter how well Cabrera hits, his defence is probably the worst of any AL 3B. Worst 3bs historically cost something approaching a single win, unless we are talking about a historically bad season, which all indications suggest Cabrera has avoided. Does a precise calculation of how bad this season was really matter?
   277. DL from MN Posted: October 02, 2012 at 02:49 PM (#4251301)
have nearly as much confidence in it as we do in the offense evaluations


I don't recall WAR making a confidence statement in the number.
   278. The Id of SugarBear Blanks Posted: October 02, 2012 at 02:51 PM (#4251304)
Either solution is preferable to burying your heads in the sand and pretending defense doesn't matter.

Of course it matters.

Does anyone have any idea how many hits Darwin Barney turned into outs that a replacement-level 2B wouldn't have (*)? This is really the fundamental question regarding defense that has to be answered for WAR to work. So how many was it, even approximately? Fifty? More? Less?

(*) Note: Not a replacement-level fielding 2B; a replacement-level 2B. A 0.0 WAR player can have a wide range of both oWAR and dWAR.
   279. Ray (RDP) Posted: October 02, 2012 at 03:00 PM (#4251315)
I don't recall WAR making a confidence statement in the number.


But it does. Because it just incorporates whatever the calculation is. Just like for offense.
   280. Ray (RDP) Posted: October 02, 2012 at 03:04 PM (#4251320)
To make my above point more clearly: The 141 OPS+ Adam Lind put up in 2009 has absolutely no value to the Blue Jays in 2012.

And so to credit him in 2012 with a portion of a defense evaluation from 2009 would be similarly ludicrous. Doing so does not improve WAR so much as it shows its limitations.
   281. GuyM Posted: October 02, 2012 at 03:17 PM (#4251335)
To make my above point more clearly: The 141 OPS+ Adam Lind put up in 2009 has absolutely no value to the Blue Jays in 2012. And so to credit him in 2012 with a portion of a defense evaluation from 2009 would be similarly ludicrous. Doing so does not improve WAR so much as it shows its limitations.

Ray, you're not thinking about this very clearly. As you have noted, there is uncertainty in the defensive metrics. We can reduce that uncertainty by looking at prior performance. If Ozzie Smith posts a +25 TZ, we believe it's probably close to right. If Dan Uggla does it, we suspect there's a lot of noise there. So it would make a great deal of sense to regress the current year's rating toward a player's recent-career average. You don't do that to give him extra credit for what he did in prior years, but to make the best possible estimate you can of what he did this year.
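The Ozzie/Uggla contrast can be sketched as a precision-weighted blend of this year's rating and the player's recent-career average. The prior means and both standard deviations below are hypothetical; only the +25 observation comes from the post.

```python
def best_estimate(observed, prior_mean, noise_sd=10.0, prior_sd=10.0):
    """Blend this season's rating with a prior from recent seasons.

    noise_sd : assumed measurement noise in a single-season rating
    prior_sd : assumed uncertainty in the player's prior level
    Each source is weighted by 1 / variance.
    """
    w_obs = 1 / noise_sd**2
    w_prior = 1 / prior_sd**2
    return (w_obs * observed + w_prior * prior_mean) / (w_obs + w_prior)

# Same +25 observed rating, different (hypothetical) recent-career averages.
ozzie = best_estimate(25, prior_mean=20)
uggla = best_estimate(25, prior_mean=-10)
```

The prior is used to sharpen the estimate of what happened this year, not to hand out credit for earlier seasons.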

And that should be our goal -- to make the best estimate we can. And that estimate should be weighted equal to offense. BUT, getting our best estimate on defense requires us to regress the current year's stats.

Then you also have to decide what defensive metric(s) you have most confidence in....
   282. JJ1986 Posted: October 02, 2012 at 03:22 PM (#4251345)
But what do you do for someone like Trout who has one season in the big leagues? Regressing him to average isn't really fair to him, but you don't want to wait two years before you have an estimate of his true talent.
   283. DL from MN Posted: October 02, 2012 at 03:25 PM (#4251353)
The uncertainty in WAR is something like the geometric sum of all the uncertainties in the measurements. There is uncertainty in positional replacement, batting value, baserunning and overall replacement too. Defensive uncertainty will dominate in a geometric sum (rms).

To loop back on the Trout v Cabrera argument there is probably some part of the error bar where they overlap. It would be great if I could state the confidence level that Trout is more valuable than Cabrera this season. Is it 90%?
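What the post calls a "geometric sum (rms)" reads like a root-sum-square of independent errors. Here is a sketch of that, plus the confidence question at the end, assuming independent normal errors. Every number below is hypothetical.

```python
from math import erf, sqrt

def root_sum_square(*sds):
    """Combine independent uncertainties by adding them in quadrature."""
    return sqrt(sum(sd**2 for sd in sds))

def prob_a_ahead(a, a_sd, b, b_sd):
    """P(A's true value > B's), assuming independent normal errors."""
    z = (a - b) / root_sum_square(a_sd, b_sd)
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical uncertainties in runs: batting, baserunning, position, fielding.
total_sd = root_sum_square(3, 2, 2, 12)

# Hypothetical WAR values and error bars for two players.
confidence = prob_a_ahead(10.0, 1.0, 8.0, 1.0)
```

With those made-up inputs the fielding term accounts for nearly all of the combined uncertainty, which is the "defense dominates" point in the post.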
   284. Kiko Sakata Posted: October 02, 2012 at 03:27 PM (#4251356)
Does anyone have any idea how many hits Darwin Barney turned into outs that a replacement-level 2B wouldn't have (*)? This is really the fundamental question regarding defense that has to be answered for WAR to work. So how many was it, even approximately? Fifty? More? Less?


Comparing Barney's range factor per 9 innings (RF9) with the league average of that (lgRF9), I calculated that Barney has made about 55 more plays than league average (you should be able to find a "replacement-level 2B" who can field the position averagely, I'd think). Obviously, you'd want to account for K-rates and GB tendency of the Cubs' staff and various other what-nots to refine that.
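The range-factor calculation described above is short enough to sketch. The inputs are hypothetical, chosen to land near the post's figure, and are not Barney's actual RF9, league RF9, or innings.

```python
def plays_above_average(rf9, lg_rf9, innings):
    """Extra plays made versus a league-average fielder over the same innings."""
    return (rf9 - lg_rf9) * innings / 9

# Hypothetical inputs for illustration only.
extra_plays = plays_above_average(rf9=5.40, lg_rf9=5.00, innings=1237.5)
```

As the post notes, this ignores strikeout rates and ground-ball tendencies of the pitching staff, so it is a first pass only.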
   285. GuyM Posted: October 02, 2012 at 03:29 PM (#4251359)
But what do you do for someone like Trout who has one season in the big leagues? Regressing him to average isn't really fair to him, but you don't want to wait two years before you have an estimate of his true talent.

Agreed. So cases like Trout and Barney are hard. Personally, I'd start by looking at plays made above/below average, adjusted for number of BIP his pitchers gave up (airballs for OFs, GBs for infielders). If a player doesn't excel there, then a good DRS is based entirely on the idea he had an unusually challenging set of BIP -- could be true, but I wouldn't give that very much weight for a single season. I'd look at minor league defensive stats. I would see what scouts and fans say about his fielding, and for an OF consider whether he's fast. It's a judgment call for young players. But why should we expect any stat to remove judgment from an MVP choice?
   286. The District Attorney Posted: October 02, 2012 at 03:29 PM (#4251360)
If the options are:

1. Assume that experienced players performed similarly on defense this year as they did in recent years
2. Regress non-experienced players back to league average since there's no other option
3. Make an educated guess about how good a guy is that can incorporate the principles of #1 and 2, but doesn't have to

... I'm going #3.
   287. Ray (RDP) Posted: October 02, 2012 at 03:31 PM (#4251363)
Ray, you're not thinking about this very clearly. As you have noted, there is uncertainty in the defensive metrics. We can reduce that uncertainty by looking at prior performance. If Ozzie Smith posts a +25 TZ, we believe it's probably close to right. If Dan Uggla does it, we suspect there's a lot of noise there. So it would make a great deal of sense to regress the current year's rating toward a player's recent-career average.


It might improve the evaluation, but that highlights the problem. Perhaps the player really did have a great defense year. We see that a player's OPS+ can bounce around - so too can his defense. Uggla's OPS+ went from 110 to 130 to 110 to 130 to 110 to 100.

How can we tell when a player's defense bounces whether it is a flaw in the system or if he really did turn in that kind of a year? A player's OPS+ might bounce due to dumb luck - e.g., BABIP - but we still know that it bounced, and we still know it had value to the team. We _don't_ know that with defense.

You don't do that to give him extra credit for what he did in prior years, but to make the best possible estimate you can of what he did this year.


At which point you may be double counting, such as you would be extra-counting Lind's 141 OPS+ for no reason.

My complaint is that people recognize the flaws in the defensive systems, but rather than admitting that such means that we can only take defensive evaluations so far, they want to make full use of them anyway.
   288. GuyM Posted: October 02, 2012 at 03:35 PM (#4251369)
Comparing Barney's range factor per 9 innings (RF9) with the league average of that (lgRF9), I calculated that Barney has made about 55 more plays than league average (you should be able to find a "replacement-level 2B" who can field the position averagely, I'd think). Obviously, you'd want to account for K-rates and GB tendency of the Cubs' staff and various other what-nots to refine that.

I saw that too. But here's the odd thing: Barney seems to be about league average in assists/9, but way above average on putouts. I'm skeptical about a good defensive rating for an infielder that depends on putouts. Catching linedrives is almost entirely random. Catching shallow flies to the OF can be a real (and valuable) skill, but is that what we're seeing here? Or is he like Orlando Hudson, someone who takes a large number of discretionary flyballs that other players could also have caught? Or some of both?
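For anyone who wants to redo the range-factor arithmetic, here is a rough sketch of how a gap in RF9 converts to extra plays, and how to split range factor into putouts and assists per 9. The function names and the sample inputs are made up for illustration; they are not Barney's actual totals, and nothing here adjusts for K-rate or GB tendency.

```python
def extra_plays(rf9: float, lg_rf9: float, innings: float) -> float:
    """Plays above league average implied by a range-factor gap.

    RF9 is (putouts + assists) per 9 innings, so the gap times
    innings / 9 gives a raw, unadjusted count of extra plays.
    """
    return (rf9 - lg_rf9) * innings / 9.0


def per_nine(putouts: int, assists: int, innings: float) -> tuple[float, float]:
    """Split range factor into its putout and assist components per 9 innings."""
    return 9.0 * putouts / innings, 9.0 * assists / innings


# Hypothetical inputs, chosen only to show the mechanics.
print(round(extra_plays(rf9=5.0, lg_rf9=4.6, innings=1260), 1))
po9, a9 = per_nine(putouts=300, assists=400, innings=1260)
print(round(po9, 2), round(a9, 2))
```

Running the two components separately is the point: if the whole gap sits in putouts per 9 while assists per 9 is about league average, the raw extra-plays figure deserves the skepticism described above.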
   289. DL from MN Posted: October 02, 2012 at 03:35 PM (#4251370)
If you only use measurement systems that are without flaws you won't be doing a whole lot of measurements.
   290. GuyM Posted: October 02, 2012 at 03:40 PM (#4251375)
My complaint is that people recognize the flaws in the defensive systems, but rather than admitting that such means that we can only take defensive evaluations so far, they want to make full use of them anyway

All you are saying -- again and again, in various permutations -- is that there is uncertainty. And this obviously frustrates you greatly. But that's life -- our estimates will be uncertain. All we can do is make the best estimates possible with the data we have. By radically underweighting fielding you will simply trade one set of errors for another. Instead of sometimes giving too much credit to players with good fielding stats, you will instead systematically undervalue good fielders (and systematically overrate bad fielders). If that's the type of error you personally prefer to make, fine. But don't pretend you've made the problem go away....
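To make the regression idea from #287 concrete, here is a minimal sketch of shrinking a single-season fielding rating toward a player's recent average. The 50/50 weight and the prior values are assumptions picked for illustration, not numbers from TZ, DRS, or any published system.

```python
def regress_rating(season_runs: float, prior_runs: float,
                   weight_on_season: float = 0.5) -> float:
    """Blend this year's fielding runs with the player's recent average.

    weight_on_season is an assumed reliability weight between 0 and 1;
    the lower it is, the more the estimate leans on the prior.
    """
    if not 0.0 <= weight_on_season <= 1.0:
        raise ValueError("weight_on_season must be between 0 and 1")
    return weight_on_season * season_runs + (1.0 - weight_on_season) * prior_runs


# Same +25 season, two hypothetical track records.
print(regress_rating(25, prior_runs=20))   # established elite fielder
print(regress_rating(25, prior_runs=-5))   # historically below-average fielder
```

The same +25 comes out very differently depending on the track record, which is the Ozzie-versus-Uggla point. Setting the weight to 1.0 takes the season at face value, and setting it to 0.0 ignores the season entirely.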
   291. The Id of SugarBear Blanks Posted: October 02, 2012 at 03:43 PM (#4251384)
Comparing Barney's range factor per 9 innings (RF9) with the league average of that (lgRF9), I calculated that Barney has made about 55 more plays than league average (you should be able to find a "replacement-level 2B" who can field the position averagely, I'd think). Obviously, you'd want to account for K-rates and GB tendency of the Cubs' staff and various other what-nots to refine that.

That seems about right, or certainly not absurd. If Barney were to go 55 for his next 55, all singles, his OPS would still be ca. 100 points short of Josh Hamilton's.

I saw that too. But here's the odd thing: Barney seems to be about league average in assists/9, but way above average on putouts. I'm skeptical about a good defensive rating for an infielder that depends on putouts.

As you should be. Assists would be the better measurement of what we're trying to get at, without putouts. That would reduce Barney's distance from league average (In terms of extra plays made).
   292. GuyM Posted: October 02, 2012 at 03:44 PM (#4251386)
How can we tell when a player's defense bounces whether it is a flaw in the system or if he really did turn in that kind of a year? A player's OPS+ might bounce due to dumb luck - e.g., BABIP - but we still know that it bounced, and we still know it had value to the team. We _don't_ know that with defense.

Making plays in the field is just as real as hitting a single. What we don't know is how difficult the play was to make. But that's true for hitters too. Maybe pitchers hung a lot of curves to Trout this year. Maybe outfielders took bad routes to a lot of Miggy's line drives, turning them into hits. We assume this averages out, but for a single season that's certainly not true.
   293. cardsfanboy Posted: October 02, 2012 at 03:49 PM (#4251391)
That's a valid argument. Two potential solutions, when evaluating single seasons:

1. Stat-based. Use multi-year fielding ratings. For Josh Hamilton in 2012 this would mean maybe taking his average fielding rating from 2009-2012, and combine that with his actual offensive numbers.
2. Subjective based. Grade every fielder from terrible, poor, average, good, great. Award runs on those categories -10, -5, 0, 5, 10. We can all see Adrian Beltre or Brendan Ryan play great defense, so give them the top rating. Someone is a complete butcher? Give him the bottom rating.

Either solution is preferable to burying your heads in the sand and pretending defense doesn't matter.


Agreed. I like both solutions, I just sometimes question the accuracy of the positional adjustments, the fact that WAR has no way to properly value a player who gives you positional flexibility, the obvious discrepancies in catcher defense and first base defense, and of course the run value of defense. (In response to my own complaints, I know that WAR isn't designed to grade positional flexibility, as that is a concern with potential tactics that is beyond the scope of WAR. I also understand that the math shows that the positional adjustments are correct, so it's just my personal worry.)
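Both quoted solutions are simple enough to write down. A minimal sketch follows: the grade-to-runs table comes straight from option 2, the multi-year version is a plain unweighted average as in option 1, and the four yearly ratings in the example are invented for illustration.

```python
# Runs awarded per subjective grade, as listed in option 2.
GRADE_RUNS = {"terrible": -10, "poor": -5, "average": 0, "good": 5, "great": 10}


def subjective_runs(grade: str) -> int:
    """Look up the fielding runs for a subjective grade (case-insensitive)."""
    return GRADE_RUNS[grade.lower()]


def multi_year_fielding(yearly_runs: list[float]) -> float:
    """Option 1: unweighted average of a player's recent fielding ratings."""
    if not yearly_runs:
        raise ValueError("need at least one season of ratings")
    return sum(yearly_runs) / len(yearly_runs)


print(subjective_runs("great"))
# Hypothetical four-season fielding ratings, in runs.
print(multi_year_fielding([4, -2, 6, 0]))
```

Either number would then be combined with the player's actual offensive runs for the season in question.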
   294. The Id of SugarBear Blanks Posted: October 02, 2012 at 04:04 PM (#4251406)
Barney's 70 assists behind Aaron Hill and only third in the NL among 2B. Starlin Castro, the Cub SS, is first in NL assists by 41 over Reyes in second place (practically identical innings).

Castro's getting to more grounders than other NL shortstops and Barney isn't getting to more grounders than other NL 2B (or really even close). Hmmm. That can only make us highly skeptical about Barney's defensive WAR -- the overwhelming amount of which should be found in his range. (Yes, Barney's very sure-handed.)
   295. BDC Posted: October 02, 2012 at 04:05 PM (#4251411)
We assume this averages out, but for a single season that's certainly not true

While remaining an agnostic on the specific question of Darwin Barney, I am very ready to accept that a fielder can have large year-to-year swings in value. Maybe one year he's in superb shape, and the next he's got some problem with his knees that doesn't reduce his hitting or even his straight-ahead speed, but cuts his range measurably – and on top of that, other factors (positioning, opportunities) kick in, and pretty soon you're talking real runs. I think one problem with confidence in defensive stats is that people tend to think that defense doesn't slump, that it's a skill that doesn't have its ups and downs.
   296. The Id of SugarBear Blanks Posted: October 02, 2012 at 04:17 PM (#4251428)
Aaron Hill only has 6 errors at 2B (Barney has 1), he's blowing Barney away in assists, and is ahead of Barney in Total Zone Runs. Hill's RF/9 is 0.24 higher than Barney's. Barney gets to fewer balls vs. league than his fellow middle infielder, Starlin Castro, and Hill gets to more than his (though the D-Back SS is a statue).

Yet, Hill has 0.2 dWAR and Barney has 3.6. That simply can't be right. Doesn't pass the laugh test.
   297. vivaelpujols Posted: October 02, 2012 at 04:38 PM (#4251459)
It might improve the evaluation, but that highlights the problem. Perhaps the player really did have a great defense year. We see that a player's OPS+ can bounce around - so too can his defense. Uggla's OPS+ went from 110 to 130 to 110 to 130 to 110 to 100.

How can we tell when a player's defense bounces whether it is a flaw in the system or if he really did turn in that kind of a year? A player's OPS+ might bounce due to dumb luck - e.g., BABIP - but we still know that it bounced, and we still know it had value to the team. We _don't_ know that with defense.


The main difference is that defense is estimated while OPS is known. We know exactly how many homers Uggla hit, but we are guessing how many runs he saved. It's perfectly fair to regress defense but not offense because you expect some measurement error.

That seems about right, or certainly not absurd. If Barney were to go 55 for his next 55, all singles, his OPS would still be ca. 100 points short of Josh Hamilton's.


This is the wrong way to do it. If UZR/RF/whatever says Barney made 55 more plays than league average, that means he turned 55 would-be hits into outs. In order to balance that on offense, you'd have to take 55 of Barney's *already accumulated outs* and turn them into singles. You don't add 55 singles and 55 PA to his line. Your method is drastically underrating the impact of Barney's defense. Please recalculate.

Also fWAR has Barney's defense at +11 runs and his WAR at 2.5. No one says you have to put complete faith in defensive metrics. You're allowed to use your head. WAR just gives you the starting point.
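The difference between the two conversions is easy to see in code. This is only a sketch: the batting line below is hypothetical (not Barney's actual 2012 line), the class and function names are mine, and it computes raw AVG/OBP/SLG/OPS with no park or league adjustment, so it won't give you an OPS+.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BattingLine:
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    bb: int
    hbp: int
    sf: int

    @property
    def total_bases(self) -> int:
        singles = self.h - self.doubles - self.triples - self.hr
        return singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr

    @property
    def avg(self) -> float:
        return self.h / self.ab

    @property
    def obp(self) -> float:
        return (self.h + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)

    @property
    def slg(self) -> float:
        return self.total_bases / self.ab

    @property
    def ops(self) -> float:
        return self.obp + self.slg


def add_singles(line: BattingLine, n: int) -> BattingLine:
    """The '55 for his next 55' method: n new at-bats, all singles."""
    return replace(line, ab=line.ab + n, h=line.h + n)


def outs_to_singles(line: BattingLine, n: int) -> BattingLine:
    """Turn n already-made outs into singles: hits go up, at-bats do not."""
    return replace(line, h=line.h + n)


# Hypothetical season line, for illustration only.
base = BattingLine(ab=550, h=140, doubles=25, triples=5, hr=5, bb=35, hbp=5, sf=5)
for label, line in [("actual", base),
                    ("add 55 singles", add_singles(base, 55)),
                    ("55 outs -> singles", outs_to_singles(base, 55))]:
    print(f"{label:>20}: {line.avg:.3f}/{line.obp:.3f}/{line.slg:.3f}  OPS {line.ops:.3f}")
```

Converting outs leaves the denominator alone, so every rate stat rises more than it does when the singles arrive with 55 extra at-bats attached.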
   298. Ron J2 Posted: October 02, 2012 at 04:50 PM (#4251479)
#296 Raw range factor has some huge illusions. Just to start with, Arizona pitchers gave up 6% more ground balls (you'd want to break this down into more detail. Clearly it doesn't matter much how many balls were hit to the left side or down the line, but this is an indicator)

Cubs pitchers gave up 9% more line drives, but the Cubs were better than the DBacks at turning line drives into outs (and if some of that is Barney shifting, well that counts. A lot). And yes, some of that may be the way stringers score certain balls in play. It'd be nice to break things down home and road. It'd give us a first order park effect. (They've had 6 more Line outs into DP as well. Only one by Barney which is one more than Hill)

The infields are about equal in turning the DP. This is tricky though. Takes two to turn many DPs. We can tell roughly how good Barney/Castro are compared to Hill/Crowd but it's really tough to separate out the specific contribution of any one player. A WOWY type analysis isn't going to work with the Cubs. When Castro isn't at short, Barney is.

   299. Misirlou is on hiding to nowhere Posted: October 02, 2012 at 04:55 PM (#4251485)
This is the wrong way to do it. If UZR/RF/whatever says Barney made 55 more plays than league average, that means he turned 55 would-be hits into outs. In order to balance that on offense, you'd have to take 55 of Barney's *already accumulated outs* and turn them into singles. You don't add 55 singles and 55 PA to his line. Your method is drastically underrating the impact of Barney's defense. Please recalculate.


132 OPS+. .357/.394/.458
   300. The Id of SugarBear Blanks Posted: October 02, 2012 at 05:05 PM (#4251501)
Ron (298) -- Those are factors, and matter, but they don't (and can't) come close to making Hill a replacement-level defender and Barney a superstar. Whatever impacts they have are measured to a degree by seeing whether they help/hurt other IFs, too. Castro's getting to more balls vs. league than Barney is. The extra ground balls AZ pitchers are giving up aren't being fielded by the D-Back SS.

OK, Arizona pitchers gave up 6% more grounders than Cub pitchers. Aaron Hill has 16.8% more assists as a 2B than Darwin Barney. Barney does, it appears, have a marginally higher RF/9 as a 2B, but RF includes popups and liners caught.

The primary measure of prowess for a 2B, what separates them from each other, is their ability to get to ground balls. There's no indication Darwin Barney does this any better than Aaron Hill, much less in a way that could explain the huge dWAR discrepancy.

Barney could be a better fielder than Hill. There's no way Barney's a superstar if Hill's replacement level.
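A back-of-the-envelope version of the comparison, using only the two percentages quoted above. It assumes, crudely, that a second baseman's assist opportunities scale one-for-one with his staff's ground balls and ignores innings, handedness, and where the grounders were hit; the function name is mine.

```python
def opportunity_adjusted_ratio(assist_ratio: float, gb_ratio: float) -> float:
    """Ratio of assists per ground-ball opportunity between two fielders.

    assist_ratio: player A's assists divided by player B's (1.168 = 16.8% more).
    gb_ratio: ground balls allowed by A's staff divided by B's staff (1.06 = 6% more).
    """
    return assist_ratio / gb_ratio


# Hill vs. Barney: 16.8% more assists, behind a staff with 6% more grounders.
ratio = opportunity_adjusted_ratio(1.168, 1.06)
print(f"Hill's assists per grounder opportunity: {100 * (ratio - 1):.1f}% above Barney's")
```

Under that assumption the extra grounders account for only part of the assist gap; most of it is still there after the adjustment.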