Poor Man’s Super Linear Weights (PM-SLWTS)
Introduction
A few years back Mitchel Lichtman, one of the co-authors of “The Book,” first published a metric of his own design called Super Linear Weights (SLWTS). The metric used play-by-play data purchased from Stats Inc. to calculate the value of a player’s batting, defensive, and base running contributions to a team. Mitchel has since gone to work for a ball club and can no longer publish SLWTS, which leaves us without a really good measure of a player’s total contribution. The stats that are available are flawed and not all-encompassing. Win Shares, designed by Bill James and available at the Hardball Times website, uses a useless defensive system and has no base running system. VORP and WARP, published by Baseball Prospectus, are each flawed in different ways. VORP uses a ridiculous fictional baseline, has no base running system, and accounts for defense only through a positional adjustment. WARP also uses a ridiculous fictional baseline, has no base running system, and uses a useless defensive system.
Luckily for us, with the advent of the Bill James Handbook and various other sources, we now have the components needed to calculate a Poor Man’s Super Linear Weights (PM-SLWTS). I have done very little heavy lifting here. Most of the real work was done by Bill James, Lee Sinins, Chone Smith, and the good people at Baseball Info Solutions. I neither deserve nor want any credit for their work. They are the real sabermetricians on this project; I’m just a bookkeeper.
Sources
Batting: RCAA, from The Complete Baseball Encyclopedia, published by Lee Sinins and available here: http://www.baseball-encyclopedia.com/
Fielding: Zone Projections, published by Chone Smith and available here: http://lanaheimangelfan.blogspot.com/
Base Running: Baseball Info Solutions data from The Bill James Handbook 2007, which can be purchased at BN or Amazon
Position Adjustments: Calculated by me from Complete Baseball Encyclopedia data for 2000-2006. The factors, in runs per 400 outs, are:
1B: -14
2B: +6
SS: +9
3B: +2
CF: +1
LF: -9
RF: -8
DH: -20 (not calculated; an intuitive estimate based on the other adjustments)
Notes
Before we get into the data, a few notes on the inputs. There are a number of defensive systems available. I chose Chone Smith’s for three reasons.
1) It is zone based. At the end of the day, any system built on primary defensive data will be virtually useless, because it is nearly impossible to estimate a player’s opportunities from that data. Imagine trying to calculate batting average without a reasonable idea of whether a player had 600 at-bats or 475. Zone data is far from perfect and carries a considerable amount of noise, but it is still the best that we have.
2) It incorporates outfield park factors, which play a large part in determining specific player values. A major flaw in zone data is that it measures the distance of fly balls from home plate rather than from the outfield wall, so players get charged with opportunities on balls that are not actually playable (think Green Monster). This can create large distortions from reality.
3) It is regressed. The idea of this project is to measure how many runs a player contributed relative to an average player last season (2006), but I believe there is so much noise surrounding defensive statistics that it is appropriate to be as conservative as possible in order to get the best estimate of how many runs a player contributed last year. I have no doubt that using a regressed projection rather than last year’s specific stat means we get fewer players exactly right, but I believe the sample as a whole gets more players approximately right.
Also keep in mind that catchers are not getting any defensive credit beyond a position adjustment. That isn’t the correct methodology and should be enhanced at some point in the future. Zone rating isn’t applicable to catchers, but at minimum a caught stealing factor should be included. I have some ideas but haven’t put enough thought into it yet. Outfielders don’t have an arm rating either; again, that should be enhanced at some point in the future.
On to base running. BIS provides data on bases gained. I converted bases to runs assuming each base was worth 0.30 runs, which is based on traditional linear weight values.
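The base running conversion described above can be sketched in a few lines of Python. This is only an illustration: the function name and the `runs_per_base` parameter are my own labels, not anything published by BIS, and the 0.30 default is simply the value stated in the notes.

```python
def bases_to_runs(bases_gained: float, runs_per_base: float = 0.30) -> float:
    """Convert net bases gained on the bases into runs.

    runs_per_base defaults to the 0.30 figure used in this article;
    it is a parameter so other valuations can be tried.
    """
    return bases_gained * runs_per_base


# A runner credited with 13 bases gained is worth roughly 3.9 runs
# at 0.30 runs per base.
example_runs = bases_to_runs(13)
```

Keeping the run value as a parameter makes it easy to see how sensitive the final totals are to that single assumption.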
Purpose and Uses
The purpose of this project isn’t to reinvent the wheel; it is just to provide a catch-all tool for evaluating how players performed last year. It isn’t meant to be used as a forecasting device, as it uses only one year’s worth of data rather than the multiple years needed to make reasonably accurate projections. I don’t believe it should be used to determine the MVP either, as it is somewhat context neutral, while MVP discussions should include the context of one’s production. I do believe that these rankings give us a reasonably clear idea of who was the better player last season. For example, let’s compare two players, Josh Willingham and Mark Ellis.
Player           At Bats  OPS+  VORP
Josh Willingham  502      121   28
Mark Ellis       441      85    7
It would be very tough to conclude from that line that Ellis was the better player, yet when you look at the players’ PM-SLWTS it is hard to avoid that conclusion, as Ellis is better by more than 25 runs. While I realize that these numbers are estimates, 25 runs is a very hard number to ignore.
Legend
POS: The player’s primary position, as defined by the Complete Baseball Encyclopedia
RCAA: Runs Created Above Average (Batting Runs)
POS – ADJ: Position Adjustment
DEF: Defensive Runs
BASE RUN: Base Running Runs
PM-SLWTS: Poor Man’s Super Linear Weights
PM-SLWTS/400: Poor Man’s Super Linear Weights Per 400 Outs
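As a rough sketch of how the columns in the legend fit together, the Python below treats the total as the plain sum of the four run components and scales it to 400 outs. The summation matches the Pujols arithmetic quoted in the comment thread (76 - 13 + 1 + 13*.3 = 68); the function names are my own, and the outs figure in the usage lines is hypothetical, included only to show the scaling.

```python
def pm_slwts(rcaa: float, pos_adj: float, def_runs: float, base_run: float) -> float:
    """Sum the four run components listed in the legend."""
    return rcaa + pos_adj + def_runs + base_run


def per_400_outs(total_runs: float, outs: float) -> float:
    """Scale a season total to a 400-out basis."""
    return total_runs / outs * 400


# Figures from the Pujols line in the comments: RCAA 76, position
# adjustment -13, defense +1, and 13 bases gained at 0.30 runs each.
pujols_total = pm_slwts(rcaa=76, pos_adj=-13, def_runs=1, base_run=13 * 0.30)

# HYPOTHETICAL outs figure, used only to illustrate the rate calculation.
pujols_rate = per_400_outs(pujols_total, outs=380)
```

The total comes to about 67.9 runs, which rounds to the 68 quoted in the thread.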
DATA can be found HERE
I tried to post the data here but I couldn’t get the formatting right on the columns. Sorry for having to link it.
Mister High Standards
Posted: November 27, 2006 at 01:33 PM
44 comment(s)
Reader Comments and Retorts
I'm kind of fond of Tango's linear weights with men on base info myself, but it's six of one, half a dozen of the other.
2) I don't think it's correct to use regressed fielding runs here. I think I understand why you're doing it (greater uncertainty in fielding runs compared to batting runs), but if you want to do that you should regress batting runs also. They will simply be regressed less.
3) I don't think a baserunning base is worth .30 runs. I think .20 or .25 is better. And did you debit for baserunning outs by .5 runs or whatever?
99.99% sure you meant to say "I have done very little heavy lifting here." :)
AROM - Thanks I'll send you a note.
GGC - I believe he uses the original.
Duffy - 1) I believe outs is the correct context either way. 2) I thought about this a ton, and if I were trying to project, I think you're right. While part of my rationale is what you stated, I also believe (and I don't have a good reason for believing it) that in any given year a random fielder is more likely worth his projected fielding number than his actual fielding number. For batting, I think the single-season number is more illustrative of what the player's performance was. 3) I'll think about it.
Kyle - Fixed. Thanks.
Rauseo- It was hard to muddle through it, what with all the typos.
- Chase Utley is really, really, really good.
- Ken Griffey, Travis Lee, Ronny Cedeno, and Angel Berroa are really, really bad.
- Markakis was far better than Hermida. It'll be interesting to see how they progress.
- further proof that AL MVP was a toss up between Jeter and Mauer (ignoring the C defense issue for the moment).
BTW - How do you know exactly how BP calculates WARP3?
And what do you think about BFW (Batter Fielder Wins) from the ESPN encyclopedia?
Two defenders generally considered above average (Langerhans, Endy) have 0 runs for defense. Found that interesting.
And as for Chipper Jones, I remember him always being an above-average baserunner (observation and numbers), but he gets -2 for this season. But wow, he still had a monster season.
and nice work, was an interesting read
Mind going through how you calculated the positional adjustment (for the position, not for the player)?
<'zop - RTFA.>
Rauseo- It was hard to muddle through it, what with all the typos.
Thanks for coming out. Really.
Also interesting is that Utley ranks 1st in terms of gap between the top player at his position, and the #2 guy. Utley clocks in at 62 runs better than average overall value - Robinson Cano was #2 at 23. The difference between Utley and the #2 guy was 39 runs. The difference between the #1 guy and the #2 guy, combined at every other position(including DH) was only 65 runs.
Nice work MHS, especially adding in the baserunning data.
If you want defense from 2002-2006 send me an email through BTF and I can send the excel sheets to you.
I mean, I'm pretty familiar with this stuff, and my first instinct is that my objections are correct. As someone who is in the "active" role here--you published an article, not me--I think you should respond in more detail.
I fixed the zero issue. Some players' defense is now listed as N/A; when I get the data I'll update it.
Duffy, that's a fair point. I should have been more specific on point number 1. Point 2 really is what it is. I'm not sure what more I can tell you except why I did it the way I did it. I see and understand your point, I just don't agree with it. Point 3) I needed to check my copy of The Hidden Game, which is why I held off on replying.
Dan - here is an example of how I calculated the position adjustments.
From 2000-2006, first basemen created 23,240 runs and used 98,145 outs, or .24 runs per out. Over 400 outs that's 94.7 runs. During that period all players I used in my calculation created 160,832 runs using 793,220 outs, or .20 runs per out. Over 400 outs that's 81.1 runs. 81 - 95 = -14.
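The first base example can be reproduced with a short Python sketch. The function and argument names are illustrative only; the run and out totals are the 2000-2006 figures quoted in the comment above.

```python
def position_adjustment(pos_runs: float, pos_outs: float,
                        all_runs: float, all_outs: float,
                        per_outs: int = 400) -> float:
    """Runs per `per_outs` outs that a position's hitters fall above or
    below the overall group, expressed as an adjustment (overall minus
    position), so strong-hitting positions get a negative number."""
    pos_rate = pos_runs / pos_outs      # runs created per out at the position
    overall_rate = all_runs / all_outs  # runs created per out, all players
    return (overall_rate - pos_rate) * per_outs


# First basemen, 2000-2006, using the totals from the comment.
first_base_adj = position_adjustment(23_240, 98_145, 160_832, 793_220)
```

Unrounded, the result is about -13.6 runs per 400 outs, which rounds to the -14 listed for first base.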
Duffy - I'll get back to your other points in a bit.
No, RC per out is correct. I was under the impression that MHS is calculating LWTS runs, which is not RC but is Runs above avg. In that case PA is the correct opportunity factor.
Plus, I'm not sure how the fielding opps are converted to the 400 out scale (or even the more correct PA scale). Lots more basic explanation, which should have been included in the writeup, is needed here...
________________________________
Ok, on point #2, I accept that you thought about it deeply, but I still think you're wrong. Either regress all components appropriately (still in the end giving an advantage to batting, which is what you are after), or don't regress anything.
On point 3, please don't pull out "The Hidden Game". There are much more current sources of baserunning value. If you want to use an overall value for a base, then .25 is most apropos. But you also have to factor in the outs, which AFAIK you didn't mention, and which are right there in the Bill James Handbook.
And on point #1, it is quite accepted that the correct denominator for RC is outs, while for Lwts runs (=RAA), it is PA. Take a look at Tango's wOBA in The Book for further information.
The QB is Matt's little space here. Maybe he should rename it (Matt's Batts?), but this is where his stuff goes.
I have never seen anyone run a critique of this method, either pro or con. I do like how it takes the men on base situation into consideration (not unlike Win Shares RC). Plus, folks who study this type of stuff more than me prefer linear weights over runs created. (I am aware of base-runs, but, from what I understand, it's not a good measure for individual players.)
Not that I've done anything more than read Tango's article, but I like this concept too - especially for retroactive value measures - which is what Matt's trying to do here.
I've thought about trying to use a LW approach combined with leverage index as an alternative to WPA. Obviously you'd have to look at every at-bat to figure this out, but I think there might be a few advantages. First off, the numbers seem a little more intuitively reasonable to me for the few games I tried to chart. Second, the result would be runs above average, which allows us to use it in this type of overall value calculation. I know it's possible to convert WPA to runs, but I'm not convinced that WPA wins are the same beast as theoretical wins and therefore that the same converter holds in all cases.
I thought one of the benefits of base-runs was that it held up well on the extremes. Does that just mean a team of 9 Barry Bonds rather than any individual player?
I've also added defensive data for the catchers and the missing players (Endy, Langerhans).
Dan - To use base runs for players, you just need to calculate the linear weights from base runs and apply them. I would have used that method, but the idea was to use readily available stats.
I'm fine with that decision - I was just responding to GGC's remark about base runs.
OK, that would also leave out Linear Weights with men on base. I'll also have to reread the base-runs thread.
Dan Fox's series on baserunning always showed the best baserunners being worth 7 or 8 runs a season, and the worst being about 5 to 6 on the negative side of the ledger. His metric had Carlos Beltran as the best runner in baseball from 2000 to 2005, and he was +25 for all six years combined.
I'm glad you did this, Matt, but the baserunning numbers invalidate the results. There's just no way that the baserunning runs column is accurate.
Is Dan Fox's stuff in the 2005 THT Book state of the art, or is there something better?
Pujols: 76-13+1+(13*.3) = 68
They made sense until post 33 - they must have been screwed up when Matt reposted his data.
Right now he has bases gained instead of runs in that column, but the total column seems to be all right.
I was coming here to do exactly that, but you beat me to it!
I'd probably express it as runs, rather than bases, given post 38.
Yes. Base Runs will tell you better than any other system how a team of 9 Barry Bonds will fare. But we're (usually) more interested in how adding one Barry Bonds to an otherwise-normal team will do. So you don't want to just calculate the Base Runs of Bonds' stats; you want to look at how adding Bonds' stats to a hypothetical team would change said team's Base Runs total.
And the generalization of this is to calculate linear weights values for each event. In other words, rather than having to calculate Base Runs for team+player for every player in baseball, figure out on average how many base runs each single is worth, each double is worth, and so on, and then apply those run values to a batter's stats.
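To make the idea concrete, here is a minimal Python sketch of deriving per-event run values by adding one event to a team and measuring the change in its Base Runs total. Two things are assumptions and do not come from this thread: the Base Runs formula used is one commonly cited simple version (other versions use different B factors), and the team totals are hypothetical, roughly league-average numbers chosen only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BattingLine:
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    bb: int

    @property
    def total_bases(self) -> int:
        singles = self.h - self.doubles - self.triples - self.hr
        return singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr


def base_runs(line: BattingLine) -> float:
    """ASSUMED simple Base Runs version: A*B/(B+C) + D."""
    a = line.h + line.bb - line.hr               # baserunners
    b = (1.4 * line.total_bases - 0.6 * line.h
         - 3 * line.hr + 0.1 * line.bb) * 1.02   # advancement
    c = line.ab - line.h                         # outs
    d = line.hr                                  # automatic runs
    return a * b / (b + c) + d


# HYPOTHETICAL team season totals, for illustration only.
TEAM = BattingLine(ab=5500, h=1450, doubles=290, triples=30, hr=170, bb=530)

# How one extra event of each type changes the counting stats.
EVENTS = {
    "single": {"ab": 1, "h": 1},
    "double": {"ab": 1, "h": 1, "doubles": 1},
    "triple": {"ab": 1, "h": 1, "triples": 1},
    "home_run": {"ab": 1, "h": 1, "hr": 1},
    "walk": {"bb": 1},
    "out": {"ab": 1},
}


def marginal_value(team: BattingLine, deltas: dict) -> float:
    """Change in team Base Runs from adding one event to the team."""
    bumped = replace(team, **{k: getattr(team, k) + v for k, v in deltas.items()})
    return base_runs(bumped) - base_runs(team)


weights = {name: marginal_value(TEAM, d) for name, d in EVENTS.items()}
```

The resulting `weights` can then be applied to an individual batter's event counts, which is the linear weights step described above. The exact values depend on the run environment of the team chosen as the baseline.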
Are these numbers any better than the ones that Palmer came up with (or the ones that Tango came up with for the various base-out situations?)