Who were the REAL MVPs?
The Major League Baseball writers voted for Alex Rodriguez and Albert Pujols as the AL and NL MVPs recently, and statheads around the country lauded them for “getting it right”. But did they?
Every discussion I have read on the American League award focused largely on whether David Ortiz’s game-state performances should outweigh A-Rod’s overall performance compared to players at his position. Big Papi’s lack of defense played a large role in the number of votes he would get, and a very large one in the discussions here and around the baseball world.
Even in the National League, there was a good deal of clamor over whether a good defensive centerfielder hitting 51 home runs, leading the league in RBIs and lifting his team to their zillionth straight division crown was more deserving than a good fielding, great hitting first baseman – after all, he was still “just” a first baseman.
What I haven’t seen to date is a nice list of what every player contributed on both sides of the ball. Defensive runs saved and offensive runs generated.
There is a big problem – designated hitters, that scourge of baseball everywhere, don’t play defense. So how do you quantify what they contribute to the defense? Some want to say they are the worst fielder on their team, because that is the player a team chooses to put in the field instead of the DH. One of the problems with that is that the DH may not have the ability to play that position – which I’m not sure mitigates the original question.
Perhaps they should only be “penalized” as the worst player in the league at the position they *would* play, were they to play.
But that doesn’t really cover it either, because all that is really required is that they be a worse fielder than the player that is playing the position for the team. I mean, being a DH when John Olerud is your defensive first baseman isn’t really damning.
However, if you aren’t very capable of playing defense, you are sucking up a roster spot and hurting your team overall defensively. In addition, your team is stuck in interleague play on the road. Okay, that’s just 8 games, but it is 5% of the time.
I’m not sure what the answer is, but I am certain a designated hitter that cannot play in the field adequately damages his team more than a player that plays defense poorly. This we can be sure of because teams do choose to play a “Manny Ramirez” over a “David Ortiz”.
That is effectively saying that Ortiz at first base with Manny DHing and Jay Payton in LF is a *worse* lineup than Ortiz at DH, Manny in LF and Kevin Millar/Olerud at first base.
We know the Red Sox will make decisions on defensive value, and they stick with this lineup. People argue that the Red Sox don’t do that for defensive reasons, but for health reasons. Ortiz probably couldn’t take the grind. That’s another reason to de-value Ortiz’ abilities, but does it de-value his performance?
For me, I’ll provide you with the two categories, and let you add your own weighting for the DH. I personally discount the DH performance to a bad fielding first baseman – around -15 runs – think Mike Piazza or Frank Thomas. In all honesty, that overstates the DH value, because if Frank Thomas can manage only a -15 and I can get another bat like Ortiz in the lineup, that’s very important. Okay, that was a bit rant-like.
My ratings for defense can be found here.
My offensive ratings are Jim Furtado’s Extrapolated Runs above average at position, park-adjusted. Why? Because I already have all the spreadsheets set up, and it does a very good job, even compared to BaseRuns.
For these purposes, there is nothing wrong with using “average” as the baseline – it doesn’t undervalue an average performance for this usage. Plus, you don’t put your eye out trying to guess at a defensive “replacement level”.
American League
Player Team pos Offense Defense Total
rodriguez,alex NYY 3B 81.0 -13.5 67.5
roberts,brian BAL 2B 47.8 3.8 51.6
ortiz,david BOS DH 51.6 -1.6 50.0
hafner,travis CLE DH 49.0 Dnp 49.0
guerrero,vladim LAA RF 41.8 1.3 43.1
peralta,jhonny CLE SS 28.3 8.9 37.2
martinez,victor CLE C 39.4 -6.0 33.4
ellis,mark OAK 2B 21.2 11.3 32.5
mora,melvin BAL 3B 25.8 6.1 31.9
mauer,joe MIN C 23.3 8.0 31.3
chavez,eric OAK 3B 17.8 13.2 31.0
crisp,coco CLE LF 19.9 10.2 30.1
teixeira,mark TEX 1B 23.2 6.3 29.5
giambi,jason NYY 1B 38.6 -10.3 28.3
crawford,carl TB LF 17.2 10.5 27.7
jeter,derek NYY SS 26.1 1.5 27.6
varitek,jason BOS C 29.1 -4.3 24.8
sizemore,grady CLE CF 22.6 1.4 24.0
gomes,jonny TB DH 18.7 3.5 22.2
polanco,placido DET 2B 18.6 2.3 20.9
lugo,julio TB SS 15.3 5.6 20.9
young,michael TEX SS 28.0 -7.1 20.9
posada,jorge NYY C 21.1 -0.3 20.8
matsui,hideki NYY LF 26.8 -6.7 20.1
Offense is XR runs above average, park-adjusted for a player’s playing time.
Defense is runs prevented above average for a player’s playing time.
Thanks to Doug’s Stats for the offensive stats.
The decimal places are not meant to indicate a level of accuracy, but are there so you can see where the math comes out.
Well, Ortiz’ clutch-hitting notwithstanding, ARod was definitely the correct MVP. He had the best bat by a wide margin. If you note, despite my rant, I did not dock the DHs for defense. That’s wrong in the overall analysis, but I’ll let you make your own adjustment.
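For anyone who wants to apply their own DH adjustment, here is a minimal sketch using a few rows copied from the American League table. The function names and the rule that a flat penalty replaces a DH's fielding number are assumptions for illustration, not the method behind the ratings; the -15 figure is the "bad fielding first baseman" discount mentioned earlier.

```python
# Rows copied from the American League table: (player, pos, offense, defense).
# Defense is None where the table shows "Dnp".
AL_ROWS = [
    ("rodriguez,alex", "3B", 81.0, -13.5),
    ("roberts,brian", "2B", 47.8, 3.8),
    ("ortiz,david", "DH", 51.6, -1.6),
    ("hafner,travis", "DH", 49.0, None),
    ("guerrero,vladim", "RF", 41.8, 1.3),
]


def total_runs(pos, offense, defense, dh_penalty=0.0):
    """Offense plus defense, with an optional flat charge for DHs.

    Assumption: for a DH, a nonzero penalty replaces whatever fielding
    number he has. With dh_penalty=0.0 this reproduces the table's Total.
    """
    if pos == "DH" and dh_penalty != 0.0:
        return round(offense + dh_penalty, 1)
    return round(offense + (defense or 0.0), 1)


def rank(rows, dh_penalty=0.0):
    """Return (player, total) pairs sorted from best to worst."""
    scored = [(name, total_runs(pos, off, dfn, dh_penalty))
              for name, pos, off, dfn in rows]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, total in rank(AL_ROWS, dh_penalty=-15.0):
        print(f"{name:16s} {total:6.1f}")
```

With the penalty left at zero the order matches the table; with a -15 charge both DHs drop below Guerrero among these five rows.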
Look at that – Brian Roberts was the second most valuable player in the American League. What a great season for him. He has to be the bargain of the year. Not a great bet to repeat, but a great season for him.
Ortiz played first base for 78 innings. In that time, he cost the Sox two runs. You don’t want that out there for 780 innings, much less 1400. Playing Manny is probably the right move (Manny isn’t on the list, but ended up at +14 runs).
Travis Hafner, one of the top five AL players last year, didn’t play in the field. He is a great secret. Sure the Indians are becoming popular, but Grady Sizemore and Jhonny Peralta are getting the press. Hafner is going to be a top candidate for the MVP for a few more years.
Above are the players that were twenty runs above average at their position. It’s a nice list, with a good variety of teams and positions.
There are five Yankees on the list, and Sheffield was just off of it. That’s a good team.
There are also five Indians on the list, and they are all twenty-nine or younger. That’s a good team.
National League
Player Team pos Offense Defense Total
lee,derrek CHN 1B 59.7 0.0 59.7
utley,chase PHI 2B 34.3 19.4 53.7
giles,brian SDP RF 48.5 4.3 52.8
pujols,albert STL 1B 52.6 -1.3 51.3
ensberg,morgan HOU 3B 38.4 4.3 42.7
bay,jason PIT LF 40.9 -2.2 38.7
kent,jeff LAD 2B 35.4 1.4 36.8
jones,chipper ATL 3B 31.8 4.9 36.7
edmonds,jim STL CF 32.5 3.7 36.2
wright,david NYM 3B 36.4 -5.0 31.4
lopez,felipe CIN SS 31.7 -2.2 29.5
winn,randy SFG CF 23.7 5.7 29.4
cabrera,miguel FLA LF 36.2 -7.2 29.0
furcal,rafael ATL SS 23.2 5.5 28.7
drew,j.d. LAD RF 21.8 6.5 28.3
helton,todd COL 1B 19.1 8.6 27.7
hall,bill MIL SS 23.9 2.0 25.9
floyd,cliff NYM LF 16.1 9.7 25.8
abreu,bobby PHI RF 31.5 -5.8 25.7
jones,andruw ATL CF 25.3 -0.2 25.1
jenkins,geoff MIL RF 17.2 7.1 24.3
dunn,adam CIN LF 26.4 -2.5 23.9
delgado,carlos FLA 1B 31.2 -8.2 23.0
burrell,pat PHI LF 17.9 4.7 22.6
rollins,jimmy PHI SS 18.4 1.6 20.1
valentin,javier CIN C 17.8 0.3 18.1
Chart key as above.
I added Javier Valentin because he was the highest rated NL catcher. He’ll be the sleeper in next year’s fantasy leagues.
So it looks like the voters got this one wrong – sort of. Sure it’s close enough to not really be a travesty, but it looks like Lee was the better performer. In addition, we can see Chase Utley and Brian Giles being top performers as well. Utley, like Roberts in the AL, was a great bargain for maximum production. The problem will be, in Philly, that Utley doesn’t “look like” a second baseman. He’ll be an all-star there if he’s allowed to play it.
Brian Giles wasn’t much of a secret before and now he has re-signed with San Diego. That’s a great deal for the Padres. Teams would have really benefited from Giles signing with them. I would bet he has four more top-notch seasons in him.
It is interesting to note that JD Drew is in the top twenty considering he missed most of the season. The combination of his injuries, his holdout and being platooned has probably sidetracked what could have been a stellar career.
All in all, the MVP awards were given to very deserving candidates. What we did not see was deserving candidates being considered, like Giles and Utley and Roberts. It isn’t likely that Utley and Roberts will be in this lofty position very often, so finishing high in the MVP voting is a good reward when you do deserve it.
No, I didn’t list any pitchers here. We can discuss them, but that’s a different ranking system.
I may have missed someone else that performed at 20 runs above average, but I don’t think so.
Complete player rankings will be available (all players) when I combine the defense and offense. That’s a bit of work.
Chris Dial
Posted: January 04, 2006 at 05:45 AM
Reader Comments and Retorts
I didn't first hear or read about a single being worth 0.47 runs out of context from the rest of the methods. When I first read Bill James's Runs Created method and whatever I first read regarding linear weights, it immediately made a whole lot of sense. Keep in mind I was already an electrical (and biomedical) engineering graduate student at the time. RC and linear weights were easy to comprehend compared to what I was studying. But the defensive metrics ... not as easy, probably because I couldn't actually see (or do myself) the actual data and equations, or if I could, it wasn't as familiar to me as hits, BB, HR, AB, PA, probability of scoring based on base and out conditions, etc.
That makes you a lot smarter than everyone else.
Not necessarily. Best not to compare oneself (at least publicly!) with others.
Have you read any of the Blyleven threads? OK, "no idea" would be an overstatement on pitching metrics, but it's probably an overstatement for defense metrics as well.
Dial, since you've taken to wearing your Professor Hat in this thread, do you know of any reason to use defensive stats based on traditional data (PO, A, E) when ZR is available?
that is possible.
I didn't read James until I was 27, and had little issue with it for similar reasons (in that - they gave reasons immediately, and with my age and math experience, the concept wasn't foreign like it would be to a 12-yr old).
However, I think in all these metrics, the explanation *has* been fully offered - BUT, you may have heard, as so many do, that ZR is flawed or "defense cannot be quantitated" enough times, so you don't approach the methodologies openly.
Steve has already decided that the sample size is too small, despite not examining it at that level (and I don't want to pick on Steve - that's just an example).
Andere,
read the last three defense articles, and I have leveled those criticisms at UZR many times on this site - usually without specific response, but MGL responds in the discussion in the D ### article.
no.
However, because we don't have ZR before about 1988, using ZR to develop a good metric *solely* from traditional data is a very good idea.
I don't know that this is true. It may be true to state today, but it may soon be that you *can* just look at the stats and get the defense.
There are those that say you *can* just look at DER and tell how many runs the defense prevented. They're wrong.
I get a rough correlation of team ZR and DER around 0.32 (r^2)
I realize you're just using me as an example of prejudging, and fair enough. But for the sake of clarity: I haven't "decided" that the sample size is too small; I am suspicious about sample size issues here, because that's always one of the likely culprits when you get readings that appear to be quite unstable.
Anyone who has scored ZR or Project Scoresheet may have noticed. I admit, Mike Emeigh and I have probably been more down with BIP distribution than anyone else here, but...
Ah hell, I suppose I can just demonstrate it better with some charts - that's going to be difficult.
If everyone prints out my What is ZR article and a copy of the grid, and then colors in the areas described, it helps to visualize what BIP distribution can mean. Raw counts would certainly help. I may be able to do something there too.
AP was second for exactly the reason you mentioned.
I don't understand. If you are agreeing with the hypothetical argument that Dr. Memory is putting out there - that STL didn't need much or any of Pujols' production, how can you even vote for him? Seems like his contribution would be considered worthless in your eyes.
It's a valid concern, but I am pretty sure that sample size isn't the issue with these fluctuations - well, sort of. Obviously as long as two players can get different BIP distributions, there is some sample size effect.
The theory is that *eventually* the distribution will even out (over a career).
For some reason, that *hasn't* happened for Jeter (read: the Yankees at SS) and the Braves at 3B. Even through pitching staff changes and tendency changes. Why? It's probably related to the pitching styles. If you have the same pitching coach, the pitchers all have a general philosophy (and I'm not going to debate that here), so even though the pitchers change, the BIP distribution doesn't change (much) relative to the league.
I know this happens for different fielders on the same team and with different pitchers. Heck, maybe it is a park effect. Not sure.
Yes, that makes sense. The world still would be a happier place if a moratorium was placed on the development of trad-data fielding metrics.
Colorado is the biggest outlier, with -6 ZR and -106 DER. I think STATS treats their zones a little differently because of the park. For other teams, a hitters park will affect both DER and ZR. Removing Colorado only improves the correlation to .61 anyway.
I'll try and put my team numbers somewhere where people can look at them.
we can attach them to your article and get a fronter link, I think.
Here is team ZR+ compared to DER+ (Defense efficiency rating):
The correlation between the 2 columns is 0.58. Zone rating doesn't explain all of DER, because of balls hit out of zones. Balls hit outside of zones are harder to explain, predict, and grok. There are many factors involved, including pitching, ballparks, and probably some luck thrown in.
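As a quick check on how the figures in this thread relate: an r^2 of about 0.32 was quoted earlier, while this comment reports a correlation of 0.58. A minimal sketch of the conversion follows; it assumes both numbers describe the same kind of team ZR versus DER comparison, which the thread does not confirm, and the function name is made up for illustration.

```python
import math


def r_from_r_squared(r_squared):
    """Magnitude of the correlation implied by an r^2 value.

    The sign cannot be recovered from r^2 alone, so only the size is returned.
    """
    return math.sqrt(r_squared)


if __name__ == "__main__":
    implied = r_from_r_squared(0.32)
    print(round(implied, 2))      # size of r implied by r^2 = 0.32
    print(round(0.58 ** 2, 2))    # r^2 implied by r = 0.58
```

On that reading the two figures are in the same neighborhood rather than in conflict.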
We can measure pitching through the pitching independent stats (BB, HR, SO, BB) and defense through ZR, but these alone can't tell us as much about how many runs the team will allow as runs created or linear weights does about offense.
Balls out of zone is the missing ingredient. It's part defense (sometimes fielders can make plays on these balls), part pitching, part ballpark (think Green Monster or anywhere with larger than normal foul territory), part who knows what.
Balls out of zone is the missing ingredient.
Very good!
That's where chances comes in. I'll do a nice little trick with this after a while.
Chris, it's true that the # of BIP isn't that much smaller than the # of PAs. But you noted yourself that many balls are totally unplayable. An even larger proportion (at least in OF) are playable by any fielder. So the # of marginal BIP -- plays for which there is any doubt at all about the outcome -- must be far smaller than PA. And it is really only the outcomes of these plays that distinguish good from bad fielders. (To find out if some players were better than others at handling IF popups, you'd need an N of 10,000, since 98% of them are caught.) So it seems possible to me that the measurement error for fielding metrics will be larger than for hitting and pitching metrics.
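The popup point can be put in rough numbers with the standard error of a proportion. This is only a sketch under a simplifying assumption that every chance is an independent trial with the same success rate; the 98% rate and the N of 10,000 come from the comment above, while the 500-chance sample is a made-up comparison.

```python
import math


def standard_error(rate, n):
    """Standard error of an observed success rate over n independent chances."""
    return math.sqrt(rate * (1.0 - rate) / n)


if __name__ == "__main__":
    for chances in (500, 10_000):
        se = standard_error(0.98, chances)
        print(f"{chances:6d} chances: +/- {se:.4f}")
```

At 98% caught, the noise over a few hundred chances is several times larger than over 10,000, so small true differences between fielders stay hidden on plays nearly everyone makes.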
Yes. This is a second way in which # of BIPs and # of PAs might not be truly comparable standards.
########! ZR does NOT have smaller zones than UZR; it has ONE big zone for each position. You can take issue with Mitchel not using smaller zones in UZR (I happen to agree that he should), but his zones are much smaller than those used for ZR, since ZR does not distinguish between a ball hit into zone G from a ball hit into zone F (names for illustration only -- I'm not actually looking at a chart of all the zones).
Also, ZR does not look at batted ball speed. ZR does a crappy job of treating balls out of zone. If you have UZR, you want to use UZR, if you have ZR, you want to use some combination of ZR and Range or DRA or whatever. To say that ZR is the best defensive metric out there, and that its limitations are imagined is pure, well, I already said it once.
I used to agree with Mr. H.S. I remember saying there was no way that A-Rod should get the MVP back when he was with Texas because how valuable could he be to a last place team?
I have come around on that thinking. I now believe that the best all-around performance deserves the MVP, no matter what the outcome. This is because it is an individual award, not a team one.
Derrek Lee had better all-around performance this year than Albert Pujols did.
Period.
Er, we've missed each other. The # of BIP for a SS is approx 530 - that approximates PAs.
The lowest fieldable by any position is something like 350 - which is certainly a good batting sample if smaller, and that's at first base where the skill range is expected to be smaller.
I've got to say that Runs Created is a totally different situation. I bought into RC the minute I first read about it. It's simple and accessible. These fielding systems aren't. I'm more accepting of the output than Steve and Greg are, but I can't say I deeply understand them.
To me, the huge lesson that Chris, Rally, DSG and others have shown us is that fielding counts (see Chris's MVP list for examples) and that we all should really invest ourselves in understanding fielding systems and spreading the word.
And I think using Game States or Win Probability as your primary criterion for MVP is just silly and an inappropriate use of the stat. I've focused a lot on WPA at THT, but I've never suggested it should be used that way.
David,
you'll do better at this argument if you operate from "Chris knows what he's talking about" rather than "Chris is wrong."
ZR, the score, is made up of balls hit to *a data summary of small zones*.
The group of zones used to make up a fielder's area of responsibility are zones where players at that position regularly (>50%) convert the plays into outs.
The crux of the problem with #BIP is that, as GuyM and Rally state above, many balls are unfieldable.
ZR doesn't count those balls against the fielders. UZR does count ground balls like that.
You should re-read my articles.
Even MGL says his data is in smaller zones and he converts them to larger zones. Maybe *you* should take it up with MGL about that.
Also, ZR does not look at batted ball speed.
Does that make a significant impact? Demonstrate that. I don't believe it does *from the view that it unfairly impacts some fielders over others* That is, a fielder gets a wildly disproportionate number of difficult to field balls such that it distorts his skill as a fielder.
Please - demonstrate that.
ZR does a crappy job of treating balls out of zone.
Their treatment is suboptimal. However, this is only a factor if the number of OOZ plays is significant. Can you demonstrate this is a "fatal flaw"?
If you have UZR, you want to use UZR,
No, I don't. It's using the wrong data. Or rather the data wrong. You don't have to agree with that, but I am correct. Sooner or later, MGL will recalculate UZR using STATS zones, and you'll see I'm right.
if you have ZR, you want to use some combination of ZR and Range or DRA or whatever.
No, I don't. Those add nothing to ZR. Unless you have some brilliant DP tweak. And when I say nothing I mean that Range adds nothing. Well, it adds confusion and murkiness, but it's not remotely as sophisticated as ZR. Range is like using BA instead of OPS+. Sure OPS+ doesn't weight OBP properly and has other issues, but that doesn't make it a bad stat. No, OPS+ isn't as good as using LWts or whathaveyou, but it does a good job for comparative purposes.
DRA has some interesting characteristics - MAH sent me some work he had done with it right before the holidays (maybe before the Thanksgiving ones). I haven't gotten around to diving in, but he had really good comparative scores.
To say that ZR is the best defensive metric out there,
Did I say that? I did say my interpretation of ZR is the best out there. And I agree that it is. That doesn't mean it doesn't have limitations.
and that its limitations are imagined is pure, well, I already said it once.
I didn't say that either. But I agree, the things you complain about are, well, you already said it once.
I'm sorry you like UZR better, but your understanding of the systems is clearly limited, so you are better off not shouting about it.
Evidently I was wrong about that one. I was thinking of LWts apparently.
Or you are all smarter than me - which is no great leap.
Fair enough, but PAs will be outs - what - 67% of the time? What's the variance on BA? 40 points?
It's not identical, but it is a reasonable sample size. It's analogous.
Not nearly as silly as saying a typical single, in a typical game, at a typical time, playing for a typical team, in a typical lineup slot is worth .47, so we will treat every single the same in every inning, for every team, for every lineup slot, for every base-out situation, etc. When you treat them the same you're not distinguishing between value and ability.
A single with 2 outs and a runner on second leads to more runs than a single with 2 outs and nobody on. A run in a 3-3 game in the 9th leads to more wins than a single in a 7-2 game in the 8th inning. The idea is to measure how many wins a player actually contributed, not how many wins a player might have hypothetically contributed if he was in a vacuum.
So Dial agrees with Dial that his interpretation of ZR is best. I think that settles it.
Kidding of course...this is actually a great thread.
Spivey--that is indeed an inescapable conclusion of his argument as stated, so perhaps he isn't explaining himself as well as he might. What he also isn't seeing is that it applies even better to Clemens, who probably has <u>less</u> overall impact on his team's success than does Pujols or Lee (or Berkman).
In Clemens' particular case, even more damning is that the team did better in games he didn't pitch.
True enough, but ZR does require conversion to runs and then comparison to average to be clearly applicable. What a 0.865 ZR means isn't very apparent.
The problem I have with this is that it clearly favors #3 hitters on teams that play close games (and win some of them).
Because a grand slam that takes a game from 4-2 to 8-2 in the 4th changes the other team's playing. They remove the starter and it just goes downhill from there.
If a batter strikes out here, it doesn't do anything. So when the pitcher gives up a 2-run HR in the 5th to make it 4-4, now you suddenly think a single in the 9th to score a run is more valuable than that GS.
That's just wrong. It is NOT value.
It's only analogous if you believe that in some large percentage of ABs the hitter effectively has no chance at all of reaching base, and/or that in many ABs ANY major league hitter would reach base. I don't see any reason to think that's true.
Do you really not understand this point, or are you just being difficult?
This reminds me of something a college prof once said (that I haven't forgotten in the 30 years since): "Every idea carried to its logical extreme becomes a caricature of itself." IMO, that's what you're doing using Game States to solely determine your MVP.
I don't mean to sound like Chris Dial (God forbid!), but have you scored many games using WPA? I have, and I've found that the results sometimes just don't make sense when interpreted as "value". Larry's example of a home run in the first vs. a sacrifice fly in the ninth is a great example.
WPA is awesome for following a game, it's great for analyzing relievers and their usage, and it's useful for MVP discussions. But it's not sine qua non.
If there weren't some portion of ABs where ANY ML hitter would reach base, why is the BA spread so small? The baseline to be a MLB player is very high.
I understand the point - thanks.
I just think that the sample sizes are reasonably analogous.
That's the position they have been put in. There is no rule that life is fair or equitable. Some people get paid much larger bonuses for doing less quality work, because firms are more profitable.
Chris, thankfully you understand defense a little more than the concept of value. Of course you're wrong about the accuracy of the specific numbers involved, but you're still in the know. Maybe one of these days I'll get through, but I won't hold my breath.
It doesn't solely determine who should be the MVP, in my opinion. This statement is doing exactly what you characterized me as doing previously: "Every idea carried to its logical extreme becomes a caricature of itself." (btw, that's a very nice quote)
MHS has said he uses Game State and then sprinkles with defense, clubhouse and position.
That's helpful. Thanks. I'm obviously coming in late to a discussion, and I interpreted vehemence as intractability. Personally, I would put Dial-type ratings first, then sprinkle with WPA. Win Shares does that in a weak-form kind of way.
ZR doesn't count those balls against the fielders. UZR does count ground balls like that.
If a ball is "unfieldable", it will NEVER be fielded, and it won't count against a player in UZR. If a ball is fielded 5% of the time, it will count as -.05 plays against someone in UZR. And it's mostly those plays that distinguish the great players. By discluding those plays, ZR converted to runs ends up with tiny variance, and misstates the value of a player.
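The accounting described in this comment can be sketched as "plays above expectation": each ball is worth the outcome (1 for an out, 0 otherwise) minus the league out rate for that kind of ball. This is a simplified illustration of the idea as stated here, not MGL's actual UZR code; the function names and the sample out rates (other than the 5% case) are hypothetical.

```python
def play_credit(out_rate, made):
    """Plays above average for one ball, given the league out rate for its bucket."""
    return (1.0 if made else 0.0) - out_rate


def plays_above_average(balls):
    """Sum the credit over (out_rate, made) pairs for one fielder."""
    return sum(play_credit(out_rate, made) for out_rate, made in balls)


if __name__ == "__main__":
    print(play_credit(0.0, False))   # never fielded by anyone: no charge
    print(play_credit(0.05, False))  # fielded 5% of the time, missed
    print(play_credit(0.05, True))   # the same ball, converted
```

Under this accounting a miss on a 5% ball costs almost nothing, while making it earns nearly a full play, which is why the hard chances drive the spread between fielders.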
Does that make a significant impact? Demonstrate that. I don't believe it does *from the view that it unfairly impacts some fielders over others* That is, a fielder gets a wildly disproportionate number of difficult to field balls such that it distorts his skill as a fielder.
Yes, it does. Re-read MGL's original UZR articles. Or, look at how much the results change when batted ball speed is included. Or, re-read the Ichiro! thread on an article I wrote about his decline in performance in '05 where Mitchel demonstrated that a large part of Ichiro!'s lower BA on GB was slower hit speed.
No, I don't. Those add nothing to ZR. Unless you have some brilliant DP tweak. And when I say nothing I mean that Range adds nothing. Well, it adds confusion and murkiness, but it's not remotely as sophisticated as ZR. Range is like using BA instead of OPS+.
No. Incorrect. Range is like using OBP instead of SLG (or vice-versa, I don't really care). Neither ZR nor range capture the whole truth, but together (OPS) they can get 90% of the way there (or whatever % you choose to use).
Chris what exactly isn't a combination of things?
In other threads I've also said I use other things. Importance to team. Leverage of team's needs for wins for example contributing about 10 wins to a 95 win team as contributing 10 wins to an 80 win team. Traditional stats. Adjusted stats. In the A-rod/Ortiz debate many of these factors aren't important which is why they weren't mentioned.
"But even there, it isn't all that hard to add up the RC's for every member of a team, and see how close they come to the team's actual runs scored."
that doesn't prove RC is accurate for each individual. Bonds's RC are overestimated because all his BB and HR cannot interact w/ each other.
That said - I think it is appropriate, when given a close decision, to factor in the extent to which player performance was leveraged into actual wins and losses, although the amount of weight to give it is a different and IMO difficult question. Chris Jaffe noted in one of the various Blyleven threads that when he factored in Blyleven's run support Blyleven still comes up about 9 wins short of expectations (IIRC), and to the extent that it's true that Blyleven's teams won less often than expected when he was pitching - which was the perception when he was active - even in the face of his actual run support (and bullpen support, which I don't know if Chris considered), that would be a legitimate argument IMO against Blyleven's HOF candidacy.
-- MWE
this is what I think.
50% is too high a threshold, IMO, and as David Gassko suggests in #140, I think it winds up discriminating against the truly great fielders. However, if ZR would give players credit for a play made OOZ without charging the player with a zone opportunity, that would go a long way toward fixing the problem.
-- MWE
Derf.
Every groundball in UZR counts against someone, even if it is unfieldable. ZR doesn't "disclude" them. It just counts them as plusses.
Yes, it does. Re-read MGL's original UZR articles. Or, look at how much the results change when batted ball speed is included.
BUT HE'S USING THE WRONG ZONES! His baseline is off - if the hard hit balls are in the 56 zones, it won't impact it. Or up the middle.
Or, re-read the Ichiro! thread on an article I wrote about his decline in performance in '05 where Mitchel demonstrated that a large part of Ichiro!'s lower BA on GB was slower hit speed
Did you read the tail end of those comments? That data was analyzed all wrong. He was hitting the ball *further to the left* - that is - not up the middle, but to the shortstop.
And that doesn't address what I said anyway.
Range is like using OBP instead of SLG (or vice-versa, I don't really care). Neither ZR nor range capture the whole truth, but together (OPS) they can get 90% of the way there (or whatever % you choose to use).
I'll refrain from explaining the value of your method to you.
Sure. But there's value in turning the lineup over, and part of the linear weight reflects that value. The game-state approach undervalues that performance when compared to a linear weights approach; the linear weights approach understates performance in high-leverage situations when compared to a game-state approach. In the end state, the difference between the two isn't usually very significant; it happened to be significant this year because a good percentage of ARod's value was concentrated in low-leverage situations and a good percentage of Ortiz's value was concentrated in high-leverage situations.
-- MWE
Sure, but Bonds is an extreme outlier in event distribution. Given that RC does closely approximate runs scored at the team level, it is very appropriate to conclude that the RC scores for individual players with event distributions in the normal range do accurately reflect their total of runs "created."
The reasonable test of any metric isn't how accurately it captures an extreme outlier, or the hypothetical "suppose a guy went 1-for-1,000" cases. It's how accurately it captures the great majority of normally witnessed cases.
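The outlier effect can be illustrated with the basic Runs Created formula, (H + BB) * TB / (AB + BB). The stat lines below are invented for illustration and are not any real player's or team's numbers; the comparison of "individual RC" with "runs added to the lineup" is one simple way to frame the issue, not the only one.

```python
def runs_created(ab, h, bb, tb):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


# Hypothetical totals for eight ordinary regulars combined.
others = dict(ab=4400, h=1200, bb=400, tb=1920)
# Hypothetical extreme hitter: very high walk total and slugging.
extreme = dict(ab=400, h=140, bb=200, tb=320)

combined = {key: others[key] + extreme[key] for key in others}

individual = runs_created(**extreme)
marginal = runs_created(**combined) - runs_created(**others)

if __name__ == "__main__":
    print(f"RC computed on his own line: {individual:.1f}")
    print(f"RC he adds to the lineup:    {marginal:.1f}")
```

Because the formula multiplies on-base events by total bases, running an extreme line through it alone treats the hitter as if he were driving in copies of himself, so his standalone figure comes out higher than what he adds to a normal lineup.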
---
50% is too high a threshold, IMO, and as David Gassko suggests in #140, I think it winds up discriminating against the truly great fielders. However, if ZR would give players credit for a play made OOZ without charging the player with a zone opportunity, that would go a long way toward fixing the problem.
Mike - let me make sure I'm at the right point in the discussion. Right now,
ZR = ( BIZ_fielded + BOZ_fielded) / (BIZ_opps + BOZ_fielded)
Is that correct? Leaving a theoretical maximum of 1.00 ZR. I think your formula would be (if I understand correctly):
ZR_emeigh = (BIZ_fielded + BOZ_fielded) / BIZ_opps
As I see it, this makes sense. Take two players: one converts 3 out of 4 BIZ plays and 1 out of 1 BOZ play; the second converts 4 out of 4 BIZ plays and 0 out of 1 BOZ play. The first will have a 4/5 = .80 ZR, the second will have a 4/4 = 1.00 ZR, even though (assuming the balls hit to them have equivalent run value) they both provide equal fielding value.
Right?
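For anyone who wants to check the arithmetic, here is a minimal sketch of the two formulas as written in this comment. The function names and the tuple layout are made up for illustration; nothing here is an official ZR implementation.

```python
def zone_rating(biz_fielded, biz_opps, boz_fielded):
    """ZR as described above: an out-of-zone play counts as an out
    AND as an extra opportunity in the denominator."""
    return (biz_fielded + boz_fielded) / (biz_opps + boz_fielded)


def zone_rating_emeigh(biz_fielded, biz_opps, boz_fielded):
    """Proposed variant: out-of-zone plays are credited as outs
    without adding an opportunity (so it can exceed 1.00)."""
    return (biz_fielded + boz_fielded) / biz_opps


# (BIZ plays made, BIZ opportunities, BOZ plays made) -- the two players above
players = {
    "first": (3, 4, 1),   # 3 of 4 in zone, 1 of 1 out of zone
    "second": (4, 4, 0),  # 4 of 4 in zone, 0 of 1 out of zone
}

for name, (made, opps, ooz) in players.items():
    print(name,
          round(zone_rating(made, opps, ooz), 2),
          round(zone_rating_emeigh(made, opps, ooz), 2))
```

Under the current formula the two players come out at .80 and 1.00; under the variant both come out at 1.00, which matches the "equal fielding value" intuition.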
But the value-added method doesn't deal in real wins and losses, only probable wins and losses. A player gets credit for nearly a run when he hits a triple, because he "should" score, even if he ends up stranded. A player gets credit for a go-ahead HR in the 8th, even if his team loses in the 9th. Consequently, the value-added approach is neither fish nor fowl -- it's less useful than traditional sabermetric methods for assessing true talent (because context is so powerful), but probably no more accurate than Runs or RBI in assessing "true game value." Does value-added tell us more about Ortiz' clutch performance than RBI or BARISP? It's not at all clear that it does.
I think it's great for understanding leverage and reliever impact (as Studes suggests), but I'm skeptical about other applications.
Correct.
This is what I say in my "What is Zone Rating?" article (on your side bar):
When a player fields a ball outside his zone and turns it into an out, it is counted as both an out and a ‘ball in zone’ for the purposes of calculating his zone rating. This is a flaw in ZR, as balls outside the zone should be counted only as an out. Otherwise it takes away a “range” aspect of the rating.
It's not really all that relevant. You know the players were full-time players, so they had the same basic opportunity to contribute. It's what the bottom line of their contribution was. The only time I'm concerned with it is if their playing time was equal or close. If one player had more high-leverage opportunities, I don't really care. Like I said, it's not about being fair; it's about who contributed more to his team's wins and losses. Some players will get more opportunity than others. Just like some people in life have more opportunities than others, it only matters what you make of your opportunities, not how many you have.
Right.
The OOZ plays may not have equivalent run value to the in-zone plays, though. If teams are positioning their players appropriately, the OOZ plays should be worth less in run value than the in-zone plays (on a net basis). In the infield, it's almost certainly true that the OOZ plays are worth less than the in-zone play; most of the OOZ plays are singles, while a good percentage of in-zone plays on the corners would go for extra bases if they weren't made.
-- MWE
Distinguishing value from ability is fine. But a run in the 9th of a 3-3 game does NOT lead to more wins than a run in the 2nd inning of that same game. Yet value-added metrics say it does.
Moreover, the value-added metric doesn't even care if your 2-out single actually plated the runner on second -- it just assumes it did (most of the time). If you really want to measure how many wins a player actually contributed, shouldn't you care about actual outcomes, rather than probabilities?
Right. So what does value-added give us?
So in a tie game a manager should just bring in his best reliever no matter what the inning?
The problem is you don't know in the 2nd inning how important that run will be. You know how important it is in the 9th. Because it ends the game.
The idea is to capture the changes in game state a player is contributing. It measures something completely different from other metrics. I am not advocating that it's the only metric that should be considered, but it is far more important than just a tie-breaker type element. It's just as important as runs created or OPS; somewhat more important in my opinion, not hugely so, but more important. The reason it seems like I value it so much is because in the A-Rod/Ortiz debate it's one of the few major differences between the players.
That's actually a good example. Relievers are brought in and out of games all the time, so their leverage changes as a result of managerial decisions. That's not true with batters. There's a difference. WPA does, IMO, distort the value of the reliever vs. the starter. So I would only use it to contrast and compare the contribution of relievers and not relievers vs. starters. To me, that's a great example of the limits of its usefulness.
It would be a great tool for assessing pinch hitters, BTW.
No, but you DO know by the time you're casting MVP awards. So why give more credit now to the guy who contributed in the 9th? Value-added mixes up two very different things: 1) hitting well w/ men on base, and 2) hitting well close and late. I can see why you may want to do #1 (though RBI may do it as well and far more simply), but why should #2 matter once games are over?
Also, doesn't this approach give credit only to "clutch" RBI guys, but not to "clutch" OB guys? Under this philosophy, you should give greater credit to hitters who have the "ability" to get on base before a hit than those who don't. But a lead-off single is worth the same, whether or not the hitter eventually scores. VA gives bonus credit to the hitter who drives him in for hitting at the right time, but not to the first hitter who got on base at the right time. Why credit one but not the other?
Yes - but managerial control is out of the hands of that pitcher, just like what the baseout situation is for the hitter. They just have to perform within the bounds of the situation they are put in, and they get credit for it based on how important those situations are.
You don't know why you would want to know who hit well when the game was close and late? Because those hits are high leverage. And as I explained to Studes, leverage matters for hitters and pitchers.
This approach also gives credit to "clutch OB" guys. I strongly suspect you're missing something in how the system works if you think otherwise.
Right. But that wasn't really my point (though I'll admit I didn't express it well). WPA is useful to assess the value of relievers AS WELL AS the usage patterns of managers. Because both are critical to the optimal use of the bullpen, it's a great tool to look at both. "Value" in that sense is explicitly a function of both.
But the more germane point is that I believe it's not useful to compare relievers and starters. Starters often provide a lot of great innings that keep ballclubs in games but, because they don't pitch say, the final inning of a close game, a reliever will often have more WPA points for pitching one shutout inning vs. a starter pitching, say, 8 innings of 3-run ball. In that case, I don't believe the reliever was truly more "valuable" than the starter. But, due to the leverage of late innings, WPA says he was.
And I think this same argument is why WPA shouldn't be a primary consideration for MVP. Batters are, for all intents and purposes, "starters" whose contributions, if they happen to fall in late innings of close games, will be overvalued vs. those who contributed earlier in games.
---
Here's the "Clutch" OBP example:
Tie game, 0 outs, top of the 9th: P(Home win) = .500
Batter 1 doubles: P(Home win) = .328
Batter 2 singles (runner scores): P(Home win) = .133
(using tango's WE matrix from a while back I have on my computer)
If you assign credit based on delta P(home win), the double was worth .172 wins, while the single was worth .195 wins. Doesn't that strike you as silly?
Moreover, if the same thing happens in the 2nd inning of a game that ended 1-0, the numbers are much smaller.
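Here is a minimal sketch of the credit assignment in that example, assuming each event is credited to the visiting batter as the drop in P(home win). The three probabilities are the ones quoted above; the function name is just for illustration.

```python
def win_credit(home_win_probs):
    """Credit each event with the change in win probability.

    home_win_probs is the sequence of P(home win) before the first event
    and after each event. The visitors are batting, so a batter's credit
    is how far he pushed P(home win) down.
    """
    return [
        round(before - after, 3)
        for before, after in zip(home_win_probs, home_win_probs[1:])
    ]


# Tie game, top 9th, 0 outs -> leadoff double -> run-scoring single
states = [0.500, 0.328, 0.133]
credits = win_credit(states)
print(credits)  # credit for the double, then the single
```

The credits always add up to the total swing (.367 here); the argument in the thread is about how that total gets split between the two hitters.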
I'm not sure it is appropriate to immediately conclude that. As it turns out, it is generally correct. But I can see where one would find it a "leap of faith" to believe that just because RC works on the aggregate team level, it must therefore be as accurate for each individual player. It estimates runs based on getting on base and moving runners around. Well, no player gets on base and then bats himself around. I've never studied it, but this may not affect only Bondsian outliers. Maybe a Roy Thomas/Richie Ashburn type is consistently misvalued by basic RC. Maybe a Tony Batista/Joe Carter type is as well. These players may not be "normal," but they are not so rare that we shouldn't expect to run into them. I'm sure their RC estimates are good enough for government work, but perhaps some of the more advanced runs-created estimates handle those types of players better (or perhaps they even handle all individual players better, even if they are no more accurate at the team level).
Certainty has value. However, I agree that would be a useful stat to have as well, not in RBI form but wins added form... as soon as someone provides it, I will use it. I don't have the acumen or data to do it myself. Then I could even lower the dependency on standard metrics.
As for your clutch OBP example, that seems reasonable to me. They were both very important to winning the game. I guess you object to the double being more valuable than a single, but I see no problem with that, it is what it is and one way or the other it is reasonable. Much more reasonable than saying that double was worth .74 wins and the single was worth .47 wins.
While overvalued/undervalued is relative, it does a closer job of approximating than using static weights does, at least in terms of change in win contribution. Though I certainly agree you need to use both or sometimes you'll get wrong answers.
It gives credit in one sense: if you get on base close and late. But it's indifferent to the performance of later hitters -- i.e. whether you actually score! But the system does reward hitters for doing well with men on base.
We don't think about "clutch" OB performance for the obvious reason that a hitter can't know what the guys following him in the lineup are going to do. But in value terms, there's no logical difference. (And there's good reason to doubt clutch RBI performance is a skill for most players, but that's another debate.)
How so? Seems like you use some combination of season stats (like LWTS), game situation adjustments, and how well the team finishes without explaining why you use these 3 things and why you weight them like you do. You're not required to explain it, but given how you've acted in the AL MVP threads, I assumed you would explain it.
1) Can we only use defensive metrics to ID really exceptional and awful fielders, or can they also make finer distinctions?
2) How many years of data do we need to judge a player's true talent?
3) What's the y-t-y correlation of Dial ratings and other metrics? And is this one way to judge the competing value of different metrics?
Heh. Overvalued and undervalued are relative. That's the whole point. And that, of course, is the seduction of WPA. It isn't subjective; it's the result of real math. That doesn't mean it isn't distorted, however, particularly in terms of how it handles events in the ninth inning vs. earlier innings.
I've stated what my objections are. As much as I use and enjoy WPA, I don't endorse it as a primary tool for MVP determination, at least in its present state.
If the same cup of coffee costs $2.50 across the street, that's different, these quarters are clearly buying less -- but it's a different game. And when coffee is better or worse across the street, that makes it interesting. But within a game, all runs are of equal value.
I've proposed, but never done since I don't have any PBP data or the time/inclination to exactify the procedure, that the best way to compute value* -- at least the offensive part -- is to replace all a batter's PA with a random event generator that is park and pitcher adjusted, restart all innings at the same place, don't introduce substitutions, sim 100 times, and define value as actual team wins - mean simmed team wins. Similar fielding and pitching metrics can be defined, but might be a bit harder.
*this is value defined as "worth to specific team". I haven't decided if this is better than "worth to any team" -- wherein the player could be "added" to every other team in the league, moving around other players on that team and getting appropriate # of PA (min 0, max actual PA) and then computing delta team runs using some RC/XR type formula and some Defensive Runs formula. Add all these up, including the "worth to specific team" value that you got above, or just a RCAA + FRAA value, and take the average. That's the other metric I'm mulling over -- it is context independent, but it has to be since you can't fairly get the context of a hypothetical team.
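To make the "worth to specific team" bookkeeping concrete, here is a rough sketch of just the final step: actual team wins minus the mean of the simmed seasons. Everything about the simulator itself is assumed. `sim_season` is a hypothetical callable standing in for the park- and pitcher-adjusted replay described above, and `toy_sim_season` is only a coin-flip placeholder, not a real PBP sim.

```python
import random
from statistics import mean


def value_over_sim(actual_wins, sim_season, n_sims=100, seed=0):
    """Value = actual team wins - mean wins over n_sims simulated seasons.

    sim_season(rng) is a HYPOTHETICAL callable: it should replay the team's
    season with the player's PA replaced by a random event generator and
    return the simulated win total.
    """
    rng = random.Random(seed)
    simmed = [sim_season(rng) for _ in range(n_sims)]
    return actual_wins - mean(simmed)


def toy_sim_season(rng, games=162, win_prob=0.55):
    """Placeholder only: each game is an independent coin flip.
    A real version would need play-by-play data."""
    return sum(rng.random() < win_prob for _ in range(games))


print(value_over_sim(95, toy_sim_season))
```

The seed is fixed so the same 100 sims are drawn each run; with real data you would probably want more than 100 sims to settle the mean.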
I like your obs. I suspect the accuracy is a little better than that wrt performance (not true talent).
I'll see what I can generate for y-t-y data, as well as comparing to a team value (although I don't think that'll work).
And I think about 3 years to judge a player's ability - but not his value.
I would consider it far more of a little hop than a leap. Maybe a nudge. Your basic commonsense deduction, actually.
Team offense is nothing more nor less than the aggregation of its players, occurring in sequence. Unusual players (and they get no more unusual than Bonds), and teams with unusually good or bad RISP performance, will elude RC to some extent, of course. But for the practical purposes generally required, if it works well at the team level, it's a logical deduction that it's working well at the individual level.
well i was just quoting Boots Day's terminology. for the record i never really questioned using RC for individuals and i have no problem w/ doing so. just playing devil's advocate a bit, as i can see how one trying to think about it analytically might pause at assuming the calculation works for the individual as well as it does for the team.
In Kyle S's example, the double was worth .172 wins and the single was worth .195 wins. Batter #1's double will still be worth .172 "wins" even if the next 3 hitters strike out. The double that preceded a single is far more valuable, but both are assigned the same VA value.
Now let's say batter 1 strikes out, but batter #2 again singles. Now his single is worth just .063 wins, 1/3 as much. The "value" of batter #2's single is hugely impacted by the success/failure of his teammate, but the value of #1's double is not affected by his teammate's performance. Does that make sense?
A team's goal, going into the game, is not "keep it close and then get a big hit late". The goal is "put runs on the board early, knock out their starting pitcher, and then not let them get back into it". The team wants to create low leverage situations where they are ahead. Obviously, they won't always accomplish that goal, and when they do find themselves in high leverage situations they want to perform well. But from what I can tell, the WPA stats do not give much credit for creating the low leverage (with your team winning) context.
A hitter doing well in the high leverage situations does have an impact on the team's won/loss record, and should be considered. But the stats seem to almost penalize hitters who do their damage early, thus avoiding the high leverage situations altogether.
The same sort of argument can be made for starting pitchers, though they're mostly limited to avoiding low leverage situations where your team is losing. They're still playing a big part in creating the context.
Relief pitchers, on the other hand, are thrown into a context they did not help create. The WPA stats are better tools for them because they are never (or at least very rarely) in a position to create a low leverage situation with their team winning, and therefore won't be penalized by the stats.
As mentioned earlier, pinch hitters could be evaluated similarly.
My mind is open, very few others appear to be however.
You would also have to do some level of situational adjustments, to take into account known variations in batter performance (as a group). Batters (as a group) have a different level of performance with a runner on first than they do with the bases empty, for example - this was something that James noticed when he was doing his rookie study back in the '80s.
-- MWE
That's an awesome point. Well said.
Most likely, as the smaller the sample size, the more inaccurate the model is going to be.
My main issue with the use of WPA stats for hitters (and this same argument would apply for starting pitchers, but not relief pitchers) is that the hitters help to create the context, and the ideal situation is to avoid high leverage situations altogether.
I second Studes on this. Excellent comment.
My mind is open, very few others appear to be however.
Most predictable response award goes to MHS.
On a situation where someone hits a triple in a tie game in the bottom of the 9th with 0 outs, and then there are 3 straight strikeouts afterwards...would you value that triple the same as a triple in the same situation with a single that followed? I honestly am curious.
Steve Treder:
I would consider it far more of a little hop than a leap. Maybe a nudge. Your basic commonsense deduction, actually.
Team offense is nothing more nor less than the aggregation of its players, occurring in sequence.
I could be way off base here, but...
I don't know if this is necessarily true. Actual runs correlate perfectly on the team level to team offense productivity (nobody is surprised, I know). RBIs correlate very well. The reason they do not correlate well on the individual level is because what happens before and after the hitter is important. I think you could say that's possibly true with RC. In what order events occur affects the offensive value of certain contributions. Plus, a walk isn't a walk. A walk is more valuable in certain positions in the lineup than it is in others. I don't think these make huge differences, but I think there is at least some reason to think that just because RC correlates well on a team level doesn't mean it's going to correlate just as well on an individual level.
I don't know about Rauseo, but Win Probability methods would give equal rank to those triples.
Small sample size obviously means BsR will have SOME issues, but because of the strength of the build (that is, because BsR works at extremes much better), BsR will be MUCH closer than any other run estimator. For example, look at the following table from Tangotiger's Part 3 of his BsR series:
Runs Scored, breakdown by HR hit

HRclass       n       R     BsR    LWTS      RC
0        33,068    3.08    3.06    3.79    3.03
1        23,117    4.62    4.62    4.44    4.66
2         9,218    6.12    6.12    5.00    6.41
3         2,838    7.65    7.65    5.62    8.37
4           687    9.03    9.00    6.07   10.29
5           146   10.55   10.49    6.73   12.45
6            40   12.33   12.32    7.52   15.35
7             9   16.22   14.32    8.34   18.27
8             2   14.00   15.87    8.58   22.52
10            1   18.00   18.30    9.51   27.03
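One way to read that table is to weight each estimator's miss by the sample count n. The sketch below does that with the figures copied from the table; the n-weighted mean absolute error is my own choice of summary, not something from Tango's article.

```python
# (HR class, n, actual R, BsR, LWTS, RC) -- copied from the table above
ROWS = [
    (0, 33068, 3.08, 3.06, 3.79, 3.03),
    (1, 23117, 4.62, 4.62, 4.44, 4.66),
    (2, 9218, 6.12, 6.12, 5.00, 6.41),
    (3, 2838, 7.65, 7.65, 5.62, 8.37),
    (4, 687, 9.03, 9.00, 6.07, 10.29),
    (5, 146, 10.55, 10.49, 6.73, 12.45),
    (6, 40, 12.33, 12.32, 7.52, 15.35),
    (7, 9, 16.22, 14.32, 8.34, 18.27),
    (8, 2, 14.00, 15.87, 8.58, 22.52),
    (10, 1, 18.00, 18.30, 9.51, 27.03),
]
ESTIMATORS = {"BsR": 3, "LWTS": 4, "RC": 5}  # column index in each row


def weighted_mae(rows, col):
    """n-weighted mean absolute error of one estimator vs. actual runs."""
    total_n = sum(row[1] for row in rows)
    total_err = sum(row[1] * abs(row[col] - row[2]) for row in rows)
    return total_err / total_n


errors = {name: weighted_mae(ROWS, col) for name, col in ESTIMATORS.items()}
for name, err in sorted(errors.items(), key=lambda kv: kv[1]):
    print(f"{name}: {err:.3f} runs")
```

On these numbers BsR misses by the least, RC is next, and LWTS misses by the most, with the gap widening in the high-HR classes where the samples are tiny.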