Baseball for the Thinking Fan

Baseball Quote Blog
— Unearthing Quotes from Baseball's Past

Monday, November 27, 2006

Poor Man’s Super Linear Weights (PM-SLWTS)

Introduction

A few years back Mitchel Lichtman, one of the co-authors of “The Book,” first published a metric of his own design called Super Linear Weights (SLWTS). The metric used play-by-play data purchased from Stats Inc. to calculate the value of a player’s batting, defensive, and base running contributions to a team. Mitchel has since gone to work for a ball club and can no longer publish SLWTS, which leaves us without a really good measure of a player’s total contribution. The stats that are available are flawed and not all-encompassing. Win Shares, designed by Bill James and available at the Hardball Times website, uses a useless defensive system and has no base running system. VORP and WARP, published by Baseball Prospectus, are each flawed in different ways: VORP uses a ridiculous fictional baseline, has no base running system, and only uses a positional adjustment for defense. WARP also uses a ridiculous fictional baseline, has no base running system, and uses a useless defensive system.

Luckily for us, with the advent of the Bill James Handbook and various other sources, we have the components available to calculate a Poor Man’s Super Linear Weights (PM-SLWTs). I have done very little heavy lifting here. Most of the real work was done by Bill James, Lee Sinins, Chone Smith, and the good people at Baseball Info Solutions. I neither deserve nor want any credit for their work; they are the real sabermetricians on this project, and I’m just a bookkeeper.

Sources

Batting:  RCAA – The Complete Baseball Encyclopedia published by Lee Sinins and available here: http://www.baseball-encyclopedia.com/

Fielding: Zone Projections – Published by Chone Smith and can be found here: http://lanaheimangelfan.blogspot.com/

Base Running: Baseball Info Solutions – The Bill James Handbook 2007 and can be purchased at BN or Amazon

Position Adjustments: The data used came from the Complete Baseball Encyclopedia (2000-2006) and the adjustments were calculated by me. The factors, expressed in runs per 400 outs, are:

1B:  -14
2B:  +6
SS:  +9
3B:  +2
CF:  +1
LF:  -9
RF:  -8
DH:  -20 (no calculation; intuitive, based on the other adjustments)

Notes
Before we get into the data, a few notes on how it was put together. There are a number of defensive systems available; I chose Chone Smith’s for a few reasons.

1) It is zone based. At the end of the day, any system that is based on primary defensive data will be virtually useless, as it is nearly impossible to estimate a player’s opportunities from that data. Imagine trying to calculate batting average without having a reasonable idea whether a player had 600 at-bats or 475 at-bats. While zone data is far from perfect and has a considerable amount of noise, it is still the best that we have.

2) It incorporates outfield park factors, which play a large part in determining specific player values. A major flaw in zone data is that it measures the distance of fly balls from home plate rather than from the outfield wall, so players get charged with opportunities on balls that are not actually playable (think Green Monster). This can create large distortions from reality.

3) It is regressed. While the idea of this project is to measure how many runs a player contributed relative to an average player last season (2006), I believe there is so much noise surrounding defensive statistics that it is appropriate to be as conservative as possible in order to get the best estimate of how many runs a player contributed last year. I have no doubt that using a regressed projection rather than last year’s specific stat means we get fewer players exactly right, but I believe the sample as a whole gets more players approximately right.

Also keep in mind that catchers are not getting any defensive credit beyond a position adjustment. That isn’t the correct methodology and should be enhanced at some point in the future. Zone rating isn’t applicable to them; however, at minimum a caught stealing factor should be included. I have some ideas but haven’t put enough thought into it yet. Outfielders don’t have an arm rating either; again, that should be enhanced at some point in the future.

On to base running: BIS provides data on bases gained. I converted bases to runs assuming each base was worth 0.30 runs, which is based on traditional linear weight values.
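The bases-to-runs conversion is simple enough to sketch in a few lines. This is only an illustration: the function name and the `runs_per_base` parameter are mine, not anything from BIS or the data file, and the 0.30 default is just the value stated above.

```python
# Assumed constant: the 0.30 runs-per-base value stated in the notes.
RUNS_PER_BASE = 0.30


def bases_to_runs(bases_gained, runs_per_base=RUNS_PER_BASE):
    """Convert net bases gained on the basepaths into runs.

    `bases_gained` can be negative for a below-average runner.
    """
    return bases_gained * runs_per_base


# A runner credited with 13 bases gained is worth roughly 4 runs.
print(round(bases_to_runs(13), 1))
```

Keeping the run value as a parameter makes it easy to see how much the totals move if a different value per base is preferred.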

Purpose and Uses
The purpose of this project isn’t to reinvent the wheel; it is just to get a catch-all tool to evaluate how players performed last year. It isn’t meant to be used as a forecasting device, as it only uses one year’s worth of data rather than the multiple years’ worth needed to make reasonably accurate projections. I don’t believe it should be used to determine the MVP either, as it is somewhat context neutral while MVP discussions should include the context of one’s production. I do believe that these rankings give us a reasonably clear idea of who was the better player last season. For example, let’s compare two players, Josh Willingham and Mark Ellis.

       At Bats   OPS+   VORP
Josh       502    121     28
Mark       441     85      7

It would be very tough from that line to determine that Ellis was the better player, yet when you look at the players’ PM-SLWTs it’s tough to ignore that conclusion, as Ellis is better by more than 25 runs. While I realize that these numbers are estimates, 25 runs is a very hard number to ignore.

Legend
POS:  The player’s primary position as defined by the Complete Baseball Encyclopedia
RCAA: Runs Created Above Average (Batting Runs)
POS – ADJ:  Position Adjustment
DEF: Defensive Runs
BASE RUN: Base Running Runs
PM-SLWTs: Poor Man’s Super Linear Weights
PM-SLWTS/400: Poor Man’s Super Linear Weights Per 400 Outs
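As a rough sketch of how the columns in the legend fit together, the total is just the sum of the four run components, and the rate scales that total to 400 outs. The class and field names below are made up for illustration. The component values are the Pujols line the author gives later in the comments (76 - 13 + 1 + 13 * .3), and the `outs` figure is a placeholder, not his real total.

```python
from dataclasses import dataclass


@dataclass
class PlayerLine:
    """One row of the data file; field names are hypothetical."""
    name: str
    rcaa: float        # batting runs above average
    pos_adj: float     # position adjustment, in runs
    defense: float     # defensive runs
    base_running: float  # base running runs (bases gained * 0.30)
    outs: int          # outs used; needed only for the rate stat

    def pm_slwts(self):
        # Sum of the four components listed in the legend.
        return self.rcaa + self.pos_adj + self.defense + self.base_running

    def pm_slwts_per_400(self):
        # Scale the total to a 400-out season.
        return self.pm_slwts() / self.outs * 400


# outs=400 is a placeholder so the rate equals the total here.
pujols = PlayerLine("Pujols", rcaa=76, pos_adj=-13, defense=1,
                    base_running=13 * 0.30, outs=400)
print(round(pujols.pm_slwts()))
```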

DATA can be found HERE

I tried to post the data here but I couldn’t get the formatting right on the columns. Sorry for having to link it.

Mister High Standards Posted: November 27, 2006 at 01:33 PM | 44 comment(s)
  Related News: Sabermetrics

Reader Comments and Retorts

   1. zoperino,if youre not into the whole brevity thing Posted: November 27, 2006 at 02:45 PM (#2246606)
Catchers all have identical defensive values? How coincidental.
   2. AROM Posted: November 27, 2006 at 03:01 PM (#2246623)
Nice work. Send me a reminder email and tonight I can send you catcher data. Let me know if you want just 2006 or the multiyear data.
   3. pweber Posted: November 27, 2006 at 03:16 PM (#2246636)
Is Chase Utley that good a baserunner? The second best in all of baseball behind Chone Figgins? Just intrigued by that.
   4. Gary Geiger Counter Posted: November 27, 2006 at 03:21 PM (#2246642)
Does Lee use the newer version of Runs Created (the one used in the Win Shares book) or does he use the older version that doesn't adjust for context?

I'm kind of fond of Tango's linear weights with men on base info myself, but it's six of one, half a dozen of the other.
   5. Duffy Duff Posted: November 27, 2006 at 03:26 PM (#2246645)
1) Since Lwts already include the impact of the out, shouldn't this be in the form of PM-SLWTS per 650 PA rather than 400 outs?

2) I don't think it's correct to use regressed fielding runs here. I think I understand why you're doing it (greater uncertainty in fielding runs compared to batting runs), but if you want to do that you should regress batting runs also. They will simply be regressed less.

3) I don't think a baserunning base is worth .30 runs. I think .20 or .25 is better. And did you debit for baserunning outs by .5 runs or whatever?
   6. Kyle S Posted: November 27, 2006 at 03:27 PM (#2246646)
Very nice work, Matt. Thanks! to nitpick:
I have done very heavy lifting here.

99.99% sure you meant to say "I have done very little heavy lifting here." :)
   7. Mister High Standards Posted: November 27, 2006 at 03:37 PM (#2246655)
'zop - RTFA.
Also keep in mind that catchers are not getting any defensive credit, beyond a position adjustment, that isn’t the correct methodology and should be enhanced at some point in the future. Zone rating isn’t applicable to them, however, at minimum a caught stealing factor should be included, I have some ideas but haven’t put enough thought into it yet.


AROM - Thanks I'll send you a note.

GGC - I believe he uses the original.

Duffy - 1) I believe outs is the correct context either way. 2) I thought about this a ton, and if I were trying to project, I think you're right. While part of my rationale is what you stated, I also believe (and I don't have a good reason for believing it) that in any given year a random fielder is more likely worth his projected fielding number than his actual fielding number. For batting I think the single season number is more illustrative of what the player's performance was. 3) I'll think about it.

Kyle - Fixed. Thanks.
   8. Kyle S Posted: November 27, 2006 at 03:43 PM (#2246661)
No prob. FWIW, I think you're probably right on #2 - when Griffey Jr for instance is measured at -52 or some ludicrous number like that, I don't think his actual performance cost the Reds 50 runs more than an average CF would. It's more likely that he just faced a BIP distribution that was much harder to field than the average BIP distribution is. Regressing allows you to account for that.
   9. zoperino,if youre not into the whole brevity thing Posted: November 27, 2006 at 03:51 PM (#2246665)
<'zop - RTFA.>

Rauseo- It was hard to muddle through it, what with all the typos.
   10. Yeaarrgghhhh Posted: November 27, 2006 at 04:06 PM (#2246673)
Interesting...a few random thoughts:
- Chase Utley is really, really, really good.
- Ken Griffey, Travis Lee, Ronny Cedeno, and Angel Berroa are really, really bad.
- Markakis was far better than Hermida. It'll be interesting to see how they progress.
- further proof that AL MVP was a toss up between Jeter and Mauer (ignoring the C defense issue for the moment).
   11. Minus Ice Posted: November 27, 2006 at 04:55 PM (#2246723)
Very interesting stuff. A bit hard on WS and WARP3 though, eh ?

BTW - How do you know exactly how BP calculates WARP3?

And what do you think about BFW ( Batter Fielder Wins) from the ESPN encyclopedia?
   12. Ludwig the Indestructible Posted: November 27, 2006 at 04:57 PM (#2246727)
Couple of anomalous #s :
Two defenders, generally considered above average ( Langerhans, Endy ) have 0 runs for defence..found that interesting.
And as for Chipper Jones, I remember him always being an above average baserunner ( observation and numbers ), but he gets -2 for this season. But wow, he still had a monster season.

and nice work, was an interesting read
   13. Dan Turkenkopf Posted: November 27, 2006 at 05:12 PM (#2246741)
Nice work.

Mind going through how you calculated the positional adjustment (for the position, not for the player)?
   14. Chuck Oliveros Posted: November 27, 2006 at 05:40 PM (#2246769)
Would it be possible to provide the data in a format that will fit on a printout on a standard piece of paper in landscape orientation? As it is, some of it gets cut off.
   15. Nate Posted: November 27, 2006 at 05:40 PM (#2246771)
Phil Hughes a Condom ('zop) Posted: November 27, 2006 at 02:51 PM (#2246665)

<'zop - RTFA.>

Rauseo- It was hard to muddle through it, what with all the typos.


Thanks for coming out. Really.
   16. Chuck Oliveros Posted: November 27, 2006 at 05:41 PM (#2246772)
Excuse me, I meant portrait orientation.
   17. Шĥy Posted: November 27, 2006 at 06:07 PM (#2246793)
Why do the defensive numbers differ from Dial's so much? Endy a 0? Reyes a -1?
   18. bibigon Posted: November 27, 2006 at 06:35 PM (#2246820)
Players more valuable than Chase Utley this year: Albert Pujols, Carlos Beltran. If I recall, last year by the way, by SLWTS, MGL had him 1st.

Also interesting is that Utley ranks 1st in terms of gap between the top player at his position, and the #2 guy. Utley clocks in at 62 runs better than average overall value - Robinson Cano was #2 at 23. The difference between Utley and the #2 guy was 39 runs. The difference between the #1 guy and the #2 guy, combined at every other position(including DH) was only 65 runs.
   19. Mister High Standards Posted: November 27, 2006 at 06:48 PM (#2246830)
Minor problem with the data - some of the defensive 0's (not the catchers) aren't zero's they are N/A as Chone doesn't have some players on his site... Langerhans and Endy fit into that camp.
   20. Master of Karate and Friendship (Kyle C) Posted: November 27, 2006 at 07:22 PM (#2246853)
I did this too, only I used LWTS run values on offense instead of the RCAA. I posted it in the saber forum on here.

Nice work MHS, especially adding in the baserunning data.
   21. Master of Karate and Friendship (Kyle C) Posted: November 27, 2006 at 07:24 PM (#2246858)
Minor problem with the data - some of the defensive 0's (not the catchers) aren't zero's they are N/A as Chone doesn't have some players on his site... Langerhans and Endy fit into that camp.


If you want defense from 2002-2006 send me an email through BTF and I can send the excel sheets to you.
   22. Gary Geiger Counter Posted: November 27, 2006 at 07:32 PM (#2246869)
Is AROM's name really Chone or is that a joke foisted on us from the Halosphere?
   23. Duffy Duff Posted: November 27, 2006 at 07:39 PM (#2246874)
MHS, in your response to my post, instead of simply saying that you believe you are right and I am wrong, see you later,-- I wish you would have offered more.

I mean, I'm pretty familiar with this stuff, and my first instinct is that my objections are correct. As someone who is in the "active" role here--you published an article, not me--I think you should respond in more detail.
   24. Gaelan Posted: November 27, 2006 at 07:52 PM (#2246885)
I'll reiterate that I also think counting per/out is wrong. Since outs is already in the calculation of runs created to state the metric per out counts outs twice.
   25. Mister High Standards Posted: November 27, 2006 at 08:05 PM (#2246893)
Chuck - Once I have all the data finalized I'll publish a pdf copy (I'm waiting on some missing player data). If you want to send me an email through the site I can forward you something in Word.

I fixed the zero issue - some players' defense is now listed as N/A. When I get the data I'll update it.

Duffy, that's a fair point. I should have given you more, specifically on point number 1. Point 2 really is what it is. I'm not sure what more I can tell you except why I did it the way I did it. I see and understand your point, I just don't agree with it. Point 3) I needed to check my copy of Hidden Game, which is why I held off on replying.

Dan - here is an example of how I calculated the position adjustments.

From 2000-2006, first basemen created 23,240 runs and used 98,145 outs, or .24 runs per out. Over 400 outs that's 94.7 runs. During that period all players I used in my calculation created 160,832 runs using 793,220 outs, or .20 runs per out. Over 400 outs that's 81.1 runs. 81 - 95 = -14.
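The first base example above can be checked with a short sketch. The function names are hypothetical; the run and out totals are the ones quoted in the example, and nothing here should be read as the exact spreadsheet formula used for the other positions.

```python
def runs_per_400_outs(runs_created, outs):
    """Scale a runs-created total to a 400-out rate."""
    return runs_created / outs * 400


def position_adjustment(pos_runs, pos_outs, all_runs, all_outs):
    """All-player rate minus the position's rate, in runs per 400 outs.

    A position that hits better than the overall pool gets a
    negative adjustment.
    """
    return (runs_per_400_outs(all_runs, all_outs)
            - runs_per_400_outs(pos_runs, pos_outs))


# Totals quoted above for first basemen and for all players, 2000-2006.
first_base = position_adjustment(23_240, 98_145, 160_832, 793_220)
print(round(first_base))  # -14
```

Working from the unrounded rates gives about -13.6, which rounds to the same -14 as the 81 - 95 shortcut.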

Duffy - I'll get back to your other points in a bit.
   26. Duffy Duff Posted: November 27, 2006 at 08:10 PM (#2246897)
---"I'll reiterate that I also think counting per/out is wrong. Since outs is already in the calculation of runs created to state the metric per out counts outs twice."

No, RC per out is correct. I was under the impression that MHS is calculating LWTS runs, which is not RC but is Runs above avg. In that case PA is the correct opportunity factor.

Plus, I'm not sure how the fielding opps are converted to the 400 out scale (or even the more correct PA scale). Lots more basic explanation, which should have been included in the writeup, is needed here...
   27. Mister High Standards Posted: November 27, 2006 at 08:21 PM (#2246901)
Duffy - In regard to your thoughts on the value of a base being .2 or .25 rather than .3, it really has an impact of less than 1 run either way, and the stat isn't really meant to be accurate to the run. I'd hope it is accurate to within 5 runs, but I'm not sure if that's the case or how to even check that. The reason I chose .3 is that in the Hidden Game Pete uses .3 as the value of a stolen base. Also, if you look at the differences between the values of a single, double, triple, and homer, the weighted average difference is very close to .3027.
   28. Duffy Duff Posted: November 27, 2006 at 08:26 PM (#2246909)
----"Duffy, thats a fair point I should have given you more specificly on point number 1. Point 2, really is what it is. I'm not sure what more I can tell you except why I did it the way I did it. I see and understand your point, I just don't agree with it. Point 3) I needed to check my copy of hidden game which is why I held off on replying."
________________________________

Ok, on point # 2, I accept that you thought about it deeply, but I still think you're wrong. Either regress all components appropriately (still in the end giving an advantage to batting, which is what you are after), or don't regress anything.

On point 3, please don't pull out "The Hidden Game". There are much more current sources of baserunning value. If you want to use an overall value for a base, then .25 is most apropos. But you also have to factor in the outs, which AFAIK you didn't mention, and which is right there in the B J HBook.

And on point#1, it is quite accepted that the correct denominator for RC is outs, while for Lwts runs (=RAA), it is PA. Take a look at Tango's wOBA in The Book for further information.
   29. battlekow Posted: November 27, 2006 at 11:52 PM (#2247082)
Why the hell is this in the Baseball Quote Blog?
   30. Gary Geiger Counter Posted: November 28, 2006 at 09:02 AM (#2247223)
Why the hell is this in the Baseball Quote Blog?


The QB is Matt's little space here. Maybe he should rename it (Matt's Batts?), but this is where his stuff goes.
   31. Gary Geiger Counter Posted: November 28, 2006 at 11:14 AM (#2247340)
I'm kind of fond of Tango's linear weights with men on base info myself, but it's six of one, half a dozen of the other.


I have never seen anyone run a critique of this method, either pro or con. I do like how it takes the men on base situation into consideration (not unlike Win Shares RC). Plus, folks who study this type of stuff more than me prefer linear weights over runs created. (I am aware of base-runs, but, from what I understand, it's not a good measure for individual players.)
   32. Dan Turkenkopf Posted: November 28, 2006 at 11:42 AM (#2247381)
I have never seen anyone run a critique of this method, either pro or con. I do like how it takes the men on base situation into consideration (not unlike Win Shares RC).


Not that I've done anything more than read Tango's article, but I like this concept too - especially for retroactive value measures - which is what Matt's trying to do here.

I've thought about trying to use a LW approach combined with leverage index as an alternative to WPA. Obviously you'd have to look at every at-bat to figure this out, but I think there might be a few advantages. First off, the numbers seem a little more intuitively reasonable to me for the few games I tried to chart. Second, the result would be runs above average, which allows us to use it in this type of overall value calculation. I know it's possible to convert WPA to runs, but I'm not convinced that WPA wins are the same beast as theoretical wins and therefore that the same converter holds in all cases.

(I am aware of base-runs, but, from what I understand, it's not a good measure for individual players.)


I thought one of the benefits of base-runs was that it held up well on the extremes. Does that just mean a team of 9 Barry Bonds rather than any individual player?
   33. Mister High Standards Posted: November 28, 2006 at 01:39 PM (#2247509)
I've updated the data file replacing per 400 outs w/ per 650 PA. My preference is to think about everything in terms of outs, but I don't think either number is particularly meaningful with this data...

I've also added defensive data for the catchers and the missing players (Endy, Langerhans).

Dan - To use base runs for players, you just need to calculate the linear weights from base runs and apply them. I would have used that method, but the idea was to use readily available stats.
   34. Dan Turkenkopf Posted: November 28, 2006 at 01:54 PM (#2247526)
Dan - To use base runs for players, you just need to calculate the linear weights from base runs and apply them. I would have used that method, but the idea was to use readily available stats.


I'm fine with that decision - I was just responding to GGC's remark about base runs.
   35. Gary Geiger Counter Posted: November 28, 2006 at 02:31 PM (#2247565)
I would have used that method, but the idea was to use readily available stats.


OK, that would also leave out Linear Weights with men on base. I'll also have to reread the base-runs thread.
   36. David Cameron Posted: November 28, 2006 at 04:23 PM (#2247680)
35 comments and no one has called BS on the baserunning numbers yet? I don't have my Bill James Handbook laying next to me, so maybe there's an explanation of the numbers that I'm not remembering, but there's no way in hell that anyone is +/- 30 runs above or below average on the bases.

Dan Fox's series on baserunning always showed the best baserunners being worth 7 or 8 runs a season, and the worst being about 5 to 6 on the negative side of the ledger. His metric had Carlos Beltran as the best runner in baseball from 2000 to 2005, and he was +25 for all six years combined.

I'm glad you did this, Matt, but the baserunning numbers invalidate the results. There's just no way that the baserunning runs column are accurate.
   37. Gary Geiger Counter Posted: November 28, 2006 at 04:42 PM (#2247696)
On point 3, please don't pull out "The Hidden Game". There are much more current sources of baserunning value. If you want to use an overall value for a base, then .25 is most apropos. But you also have to factor in the outs, which AFAIK you didn't mention, and which is right there in the B J HBook.


Is Dan Fox's stuff in the 2005 THT Book state of the art, or is there something better?
   38. Mister High Standards Posted: November 28, 2006 at 04:47 PM (#2247701)
David - those are bases, not runs... so +24 would be about 7 runs (24 * .3)... it's converted from bases to runs in the PM-SLWTS column. It's mislabeled in the post as runs (which wasn't an issue in the first draft that went up, but was an issue in the updated copy)... So for example:

Pujols: 76-13+1+(13*.3) = 68
   39. AROM Posted: November 28, 2006 at 04:48 PM (#2247702)
35 comments and no one has called BS on the baserunning numbers yet?

They made sense until post 33 - they must have been screwed up when Matt reposted his data.

Right now he has bases gained instead of runs in that column, but the total column seems to be all right.
   40. pkb33 Posted: November 28, 2006 at 08:52 PM (#2247983)
35 comments and no one has called BS on the baserunning numbers yet?

I was coming here to do exactly that, but you beat me to it!

I'd probably express it as runs, rather than bases, given post 38.
   41. pkb33 Posted: November 28, 2006 at 08:56 PM (#2247989)
I don't think using a different position adjustment for DH than 1B makes much sense; I do realize there's different views on that.
   42. Harold Posted: November 28, 2006 at 10:51 PM (#2248090)
I thought one of the benefits of base-runs was that it held up well on the extremes. Does that just mean a team of 9 Barry Bonds rather than any individual player?

Yes. Base Runs will tell you better than any other system how a team of 9 Barry Bonds will fare. But we're (usually) more interested in how adding one Barry Bonds to an otherwise-normal team will do. So you don't want to just calculate the Base Runs of Bonds' stats; you want to look at how adding Bonds' stats to a hypothetical team would change said team's Base Runs total.

And the generalization of this is to calculate linear weights values for each event. In other words, rather than having to calculate Base Runs for team+player for every player in baseball, figure out on average how many base runs each single is worth, each double is worth, and so on, and then apply those run values to a batter's stats.
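The "add one event to a team and see how Base Runs changes" idea can be sketched as below. Everything here is an assumption rather than something from this thread: the Base Runs coefficients are one commonly cited simple version of the formula, the team totals are a made-up roughly average team, and the function names are hypothetical.

```python
def base_runs(ab, h, tb, hr, bb):
    """One simple Base Runs variant (coefficients are an assumption)."""
    a = h + bb - hr                                        # baserunners
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02    # advancement
    c = ab - h                                             # outs
    d = hr                                                 # automatic runs
    return a * b / (b + c) + d


def marginal_value(team, event):
    """Change in team Base Runs from adding one event to the team line."""
    after = {stat: team[stat] + event.get(stat, 0) for stat in team}
    return base_runs(**after) - base_runs(**team)


# Hypothetical team season totals, for illustration only.
TEAM = dict(ab=5500, h=1450, tb=2300, hr=170, bb=530)

SINGLE = dict(ab=1, h=1, tb=1)
HOME_RUN = dict(ab=1, h=1, tb=4, hr=1)
OUT = dict(ab=1)

for label, event in [("single", SINGLE), ("home run", HOME_RUN), ("out", OUT)]:
    print(label, round(marginal_value(TEAM, event), 2))
```

The per-event values that come out are the linear weights implied by that team context; applying them to a batter's stat line gives his runs in the way described above, without recomputing Base Runs for every team-plus-player combination.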
   43. Gary Geiger Counter Posted: November 29, 2006 at 09:02 AM (#2248294)
(Y)ou want to look at how adding Bonds' stats to a hypothetical team would change said team's Base Runs total.

And the generalization of this is to calculate linear weights values for each event.


Are these numbers any better than the ones that Palmer came up with (or the ones that Tango came up with for the various base-out situations?)
   44. Gary Geiger Counter Posted: December 09, 2006 at 05:13 PM (#2256546)
It's a shame that this thread and the PMR thread died while folks are discussing guns instead. This is the type of stuff that this site is for.