101. andrew siegel
Posted: April 27, 2005 at 06:53 PM (#1293034)
Paul--
Since players aren't replaced solely on the basis of their fielding, and since the data set for major league players at one position in any given year is so small, how do you analytically determine the relationship between "replacement level" and "average" on a season-by-season basis?
102. TomH
Posted: June 15, 2005 at 03:39 PM (#1405986)
Q for those who have the biggest issues with WARP -
is the major issue
A) The WARP league strength / timeline adjustment from WARP1 to WARP2?, or
B) the whole WARP1 calc to begin with?
I assume the length-of-schedule adjustment from WARP2 to WARP3 is not a biggie for anybody - maybe exceptions for the real short-league 1880s guys.
I for one don't have a big problem with WARP1, except for maybe a bit of extra fielding credit for SS, for example.
As for the timeline adjustments, if you put 40 of us in a room, you'll soon get 45 opinions :)
103. sunnyday2
Posted: June 15, 2005 at 04:11 PM (#1406130)
Tom, good question.
WARP3 is a complete waste of time, IMO.
WARP1 is OK except that it keeps changing and we don't entirely know why.
WARP2 is somewhere in between ;-)
104. Michael Bass
Posted: June 15, 2005 at 04:17 PM (#1406153)
I love WARP as most know, though I tend to adjust WARP3 mentally when I think the timeline is a bit much (or not enough in the case of WW2).
Still, sunnyday, the only difference between WARP2 and 3 is schedule adjustment, so by any reasonable standard, WARP3 > WARP2, except perhaps in extreme 1870s circumstances.
I think what you mean to say is WARP2 is a complete waste of time, while WARP3 is a slightly better complete waste of time. ;)
105. sunnyday2
Posted: June 15, 2005 at 04:24 PM (#1406186)
Michael, I would schedule-adjust WARP1 for the short seasons, including 1918. I don't think either W2 or W3 represents such a number. Seriously, WARP1 is OK; it's just that I can't keep my spreadsheets up to date with the frequent changes. The changes also bring into question the validity of the whole system: if its owners are pretty sure that their previous iterations suck, how do they know the current one doesn't?
Equally seriously, I know that TPR has been largely discredited. But that mostly relates to pegging a hypothetical zero value at the average. Aside from that, which each of us can adjust for as easily as we can schedule-adjust WARP1, where do people stand on TPR? Considering I get all my OPS and ERA+ numbers out of the Palmer-Gillette Encyclopedia, and considering TPR is right there on the same page, I admit to glancing at it now and again. I am not sure it is not a useful number.
I used to look at WS, WARP and TPR and throw out the one that is not like the others. Now I don't look at any of them very much.
106. Chris Cobb
Posted: June 15, 2005 at 04:37 PM (#1406238)
Q for those who have the biggest issues with WARP -
is the major issue
A) The WARP league strength / timeline adjustment from WARP1 to WARP2?, or
B) the whole WARP1 calc to begin with?
I answered this obliquely on the Lombardi thread, but I'll present the arguments explicitly here.
The biggest issue with WARP is the WARP2 all-time context adjustment to fielding value. I think that's simply a mistake.
The second big issue with WARP is the league strength adjustments. Too much of a black box to be trustworthy, especially when the results seem counterintuitive on the visible surface (as in the competition adjustments during WWII).
The third big issue with WARP is FRAR in WARP1. It's another big black box that has tremendous effects on the rankings and that gives results not intuitively reliable.
Because of these issues, I don't rely on WARP2 or WARP3 at all, and I don't rely heavily on WARP1 as a comprehensive metric.
That said, I find several components of WARP to be quite valuable. EQA is a handy improvement, imo, on OPS+ as a batting value rate stat. DERA is a very substantial improvement on ERA+ as a pitching rate stat, with results that can be cross-checked by studies of team fielding in relation to pitching. FRAA is at least as reliable as FWS at assessing fielding quality.
In general, I think WARP does a much better job than win shares of handling the changing relationship of pitching and fielding in the creation of defensive value as the conditions of the game change (especially before 1900!). However, there are problems in the ways they turn their rate measures into comprehensive metrics that prevent me from using WARP as the foundational system for my rankings.
107. Dr. Chaleeko
Posted: June 15, 2005 at 05:10 PM (#1406358)
My main objection to TPR was not with the zero point, which isn't all that different from using OPS+ or ERA+ in a holistic sense. I found fault with TB's fielding measurements, and those problems could be summed up simply by noting that Don Mattingly had negative fielding runs while Zeke Bonura had positive ones.
Now until recently, I was working with much older editions of TB (1996), but I also haven't yet checked the "new" 2004 edition to see how it handles fielding. Does anyone know how much the FR calculations have changed? Do they better reflect team-contexts? Or the historical relationships of the positions and of the pitcher/defense balance?
108. sunnyday2
Posted: June 15, 2005 at 05:14 PM (#1406378)
Mattingly is now positive but Bonura is still better. In fairness, Bonura should not be a straw man. He led the league in FA 3 times. That is something.
Isn't TPR sorta like what MGL does with Super LWTS, but simpler and using a different defensive system from UZR? I also look at TPR from time to time, as it is right next to the Win Shares column in TB 2004. However, I am only 24 and didn't get into sabermetrics until TPR was already being discredited, just before Win Shares came out. I am not even sure I am well versed enough in TPR to use it.
I do use WARP3 because, unlike Win Shares, it has a timeline and schedule adjustment. There are players for whom I trust WARP over WS (Bobby Veach, Earl Averill), but usually if push comes to shove I will use WS with a few random adjustments for mistakes I believe they make.
Could the problem with WARP be that Clay Davenport and the rest of the BPro guys (except for maybe Steven Goldman and Mark Armour, if they count) aren't really the baseball historians that James and Palmer are? For instance, in his writings James is as well versed in 1930s or even 1910s baseball as he is in the baseball that he witnessed (the '60s on). Can the same be said of any of the BPro guys? Not that they have to be Bill James, but...
At the same time, I love EqA and RARP (VORP a little less so) and think that they really have the cutting edge on modern baseball statistics in every category but fielding, with UZR far better than FRAR or FRAA. It seems to me that they have these nice metrics for modern baseball and decided to trace them back through time. Maybe if a guy with a deeper understanding of baseball history went through and re-did WARP it would look much better. Or maybe not.
Oh, and it would also be nice if they did a metric with BRARP and FRAA. I hate that they use FRAR.
112. jimd
Posted: June 15, 2005 at 09:52 PM (#1407226)
Conceptually, there is no difference between FRAR and FWS. Both start with a base rate for what an average fielder at the position is worth, and then add adjustments, positive or negative, based on the team and individual fielding statistics.
The differences lie in how the base rate is determined, and the exact adjustments made.
113. KJOK
Posted: June 16, 2005 at 12:14 AM (#1407531)
EQA doesn't even have the "correct" weights for batting events, so you're starting from the wrong base to begin with. Using BaseRuns or some Linear Weights model makes much more sense.
I do look at WARP1 relative to peers in the same era, and look at FRAA, but that's it for BP measures.
Now until recently, I was working with much older editions of TB (1996), but I also haven't yet checked the "new" 2004 edition to see how it handles fielding. Does anyone know how much the FR calculations have changed? Do they better reflect team-contexts? Or the historical relationships of the positions and of the pitcher/defense balance?
Fielding Runs have been reworked quite extensively, and seem to be much better. I think if you take FWAA, Win Shares/1000 Innings Fielding, and Fielding Runs all together, you can get a good handle on a player's fielding abilities.
114. Jeff M
Posted: August 01, 2005 at 12:44 AM (#1514057)
The differences lie in how the base rate is determined, and the exact adjustments made.
This is true, and it is nearly impossible (I think) to correlate FWS with FRAR.
Take a fielder (like Lou Boudreau) with 54 FRAR in 1943. Using a very simplified model, suppose 9 runs = 1 win (which is basically what WARP does). That means Lou's fielding was 6 wins above replacement. In WS lingo, that's 18 FWS. That's just for fielding and that's supposed to be above replacement. We don't know what FWS replacement for a shortstop is, but if it is anything north of 0 FWS, then Boudreau would be entitled to more than 18 FWS to make it equivalent to the WARP number. The highest number of FWS ever recorded for a shortstop was 12.83 (it wasn't Boudreau).
Let's take the reverse. Boudreau had 8.8 FWS in 1943, which is 2.9 wins, which is 26.4 runs. So to equate that to an FRAR of 54, you'd have to assume that replacement level in FWS was negative 27.6 runs, which is -3.07 wins, which is -9.2 FWS, which of course, is impossible.
Here's another way to look at this, using a method tailored to the number of games/innings. The average shortstop in the 1943 AL had 3.89 FWS in 91.7 games. Using 8.8 innings per game (just for kicks), you get 807 defensive innings, which means the FWS rate was about 4.82 FWS per 1000 innings for an average SS.
Boudreau played in 152 games, or 1338 innings, which means the average shortstop in the FWS system would have had 6.45 FWS (4.82*1.338), or 2.14 wins, or 19.34 runs.
BP has Boudreau with 54 FRAR and 22 FRAA, which means the average player with the same playing time as Boudreau saved 32 more runs than a replacement player with the same playing time as Boudreau. If an average shortstop with the same playing time is 32 runs better than replacement with the same playing time, and the average player has 19.34 runs derived from the FWS system, then replacement must be -12.6 runs, or negative 1.4 wins, or -4.2 FWS. Again, that's impossible in the WS system.
Also, if the average shortstop with Boudreau's playing time would have had 6.45 FWS and he had 8.8, then he has FWSAA of 2.35, which is .78 wins, which is 7 runs above average (as opposed to a FRAA of 22).
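For anyone who wants to retrace the arithmetic above, here is a minimal sketch using the same simplified assumptions (9 runs per win, 3 Win Shares per win) and the 1943 Boudreau figures quoted in this post. It only restates the back-of-the-envelope math; it is not either system's actual formula.

```python
# Simplified conversion factors assumed above: 9 runs per win, 3 Win Shares per win.
RUNS_PER_WIN = 9.0
WS_PER_WIN = 3.0

def runs_to_ws(runs):
    """Convert runs above some baseline into the equivalent Win Shares."""
    return runs / RUNS_PER_WIN * WS_PER_WIN

def ws_to_runs(ws):
    """Convert Win Shares back into runs."""
    return ws / WS_PER_WIN * RUNS_PER_WIN

# Boudreau, 1943: WARP gives 54 FRAR and 22 FRAA; Win Shares gives him 8.8 FWS.
frar, fraa, fws = 54.0, 22.0, 8.8

print(runs_to_ws(frar))            # ~18 FWS implied by 54 FRAR, far above any real SS season
print(ws_to_runs(fws) - frar)      # ~-27.6 runs: the replacement level implied by equating the two

# Average 1943 AL shortstop: 3.89 FWS in 91.7 games (~807 innings at 8.8 innings per game).
avg_fws_per_1000 = 3.89 / (91.7 * 8.8) * 1000                     # ~4.82 FWS per 1000 innings
boudreau_innings = 152 * 8.8                                      # ~1338 innings
avg_fws_same_time = avg_fws_per_1000 * boudreau_innings / 1000    # ~6.45 FWS
avg_runs_same_time = ws_to_runs(avg_fws_same_time)                # ~19.3 runs

# WARP says the average SS was (54 - 22) = 32 runs above replacement in that playing time,
# so the replacement level implied in FWS terms is negative:
print(avg_runs_same_time - (frar - fraa))                         # ~-12.7 runs, or about -4.2 FWS
```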
Although they certainly don't equate, you can come a lot closer to deriving FRAA from FWS than you can FRAR. BP says it sets replacement as the lowest runs calculated at the position for that season, but of course, we don't know how to calculate it.
Certainly BP is using some sort of linear weights system for fielding, with tweaks, which is by its nature going to produce different results than FWS Claim Points.
I think if you take FWAA, Win Shares/1000 Innings Fielding, and Fielding Runs all together, you can get a good handle on a player's fielding abilities.
That's about all you can do, and compare them to comparables at the same position (within the same WS or WARP system).
115. Jeff M
Posted: August 01, 2005 at 12:52 AM (#1514077)
Quick follow up to #114, I noticed last night in the Ed Yost essay in Win Shares (where Bill tears down Palmer's fielding linear weights) the following snippets. They relate to third basemen, but they illustrate the differences I mention in #114:
"The Win Shares system is vastly more conservative in measuring the differences among third basemen, or players at any defensive position, than is Linear Weights."
"Linear Weights in 1952 rates Ed Yost at -35 runs, while rating Fred Hatfield of Detroit...at +19 runs. I think it is better not to assert that there is a 54-run difference between two third basemen -- a 40 homer difference -- without very solid evidence that such a gulf actually exists."
"But a 54-run difference would be equivalant to a swing of about 17 Win Shares. I do not now believe that this is a realistic estimate of the defensive impact of a third baseman. Our system would normally evaluate the difference between the league's best defensive third baseman and the worst at something more like four to five Win Shares." ...or 12-15 runs.
116. Jeff M
Posted: August 01, 2005 at 01:03 AM (#1514096)
EQA doesn't even have the "correct" weights for batting events, so you're starting from the wrong base to begin with.
Three more things I noticed about WARP today, relating to hitting:
1. No positive credit is given for sacrifice hits, unlike RC which gives .50 credit. However, WARP does include sacrifice hits in the number of outs (and so does RC). That would seem to lower the BRAR and BRAA of the "little" hitters.
2. Grounding into a double play does not count as an additional out in the WARP system, but it does in runs created. That would seem to increase the BRAR and BRAA of the slow/ground ball hitters.
3. WARP uses the same formula for hitting regardless of the era, so hits, total bases, walks and steals are all worth the same throughout eras. At first I thought maybe they make up for that in moving from WARP1 to WARP2, but I don't think so, because with the translation to WARP2 they are only taking into account league difficulty.
Runs Created has some modifications tailored to era, which is why there are 24 different formulas.
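To make the bookkeeping difference concrete, here is a schematic sketch. The formulas are simplified stand-ins based on the description above, not the published Runs Created or WARP formulas, and the player lines are made up.

```python
def toy_rc_inputs(ab, h, bb, sh, gidp):
    """RC-style bookkeeping: sac hits earn 0.5 credit but count as outs; GIDP adds an out."""
    credit = h + bb + 0.5 * sh
    outs = (ab - h) + sh + gidp
    return credit, outs

def toy_warp_inputs(ab, h, bb, sh, gidp):
    """WARP-style bookkeeping as described above: no sac-hit credit, sac hits still count as outs, no GIDP out."""
    credit = h + bb
    outs = (ab - h) + sh
    return credit, outs

# A bunting "little" hitter and a slow, DP-prone hitter with otherwise identical (made-up) lines:
bunter = dict(ab=500, h=140, bb=40, sh=20, gidp=5)
plodder = dict(ab=500, h=140, bb=40, sh=0, gidp=25)

print(toy_rc_inputs(**bunter), toy_warp_inputs(**bunter))
print(toy_rc_inputs(**plodder), toy_warp_inputs(**plodder))
# The bunter loses ground under the WARP-style bookkeeping (no sac-hit credit),
# while the plodder gains (no GIDP outs): the two effects described in points 1 and 2 above.
```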
117. KJOK
Posted: August 01, 2005 at 05:20 AM (#1514526)
"Linear Weights in 1952 rates Ed Yost at -35 runs, while rating Fred Hatfield of Detroit...at +19 runs. I think it is better not to assert that there is a 54-run difference between two third basemen -- a 40 homer difference -- without very solid evidence that such a gulf actually exists."
"But a 54-run difference would be equivalant to a swing of about 17 Win Shares. I do not now believe that this is a realistic estimate of the defensive impact of a third baseman. Our system would normally evaluate the difference between the league's best defensive third baseman and the worst at something more like four to five Win Shares." ...or 12-15 runs
In the "new" Fielding Runs Yost is -30 with a 78 (100=ave) Fielding Rating, while Hatfield is a +12 with a 109 Fielding Rating.
118. KJOK
Posted: August 02, 2005 at 07:01 AM (#1516633)
Has anyone looked at this JAWS HOF system from BP? Since it's probably going to be unavailable after the free trial is over, I'll post the full explanation part of the article below:
Back in January, I examined the 2004 Hall of Fame ballot through the lens of Baseball Prospectus' Davenport Translated player cards. The idea was to establish a new set of sabermetric standards which could help us separate the Cooperstown wheat from the chaff, especially since Bill James' Hall of Fame Standards and Hall of Fame Monitor tools have reached their sell-by date. After all, the Hall has added 26 non-Negro League players since James last revised those tools in 1994's The Politics of Glory, and we've learned a lot since then.
These new metrics enable us to identify candidates who are as good or better than the average Hall of Famer at their position. By promoting those players for election, we can avoid further diluting the quality of the Hall's membership. Clay Davenport's Translations make an ideal tool for this endeavor because they normalize all performance records in major-league history to the same scoring environment, adjusting for park effects, quality of competition and length of schedule. All pitchers, hitters and fielders are thus rated above or below one consistent replacement level, making cross-era comparisons a breeze. Though non-statistical considerations--awards, championships, postseason performance--shouldn't be left by the wayside in weighing a player's Hall of Fame case, they're not the focus here.
Since election to the Hall of Fame requires a player to perform both at a very high level and for a long time, it's inappropriate to rely simply on career Wins Above Replacement (WARP, which for this exercise refers exclusively to the adjusted-for-all-time version, WARP3). For this process I also identified each player's peak value as determined by the player's WARP in his best five consecutive seasons (with allowances made for seasons lost to war or injury). That choice is an admittedly arbitrary one; I simply selected a peak value that was relatively easy to calculate and that, at five years, represented a minimum of half the career of a Hall of Famer.
This oversimplification of career and peak into One Great Number isn't meant to obscure the components which go into that figure, nor should it be taken as the end-all rating system for these players. We're looking for patterns to help us determine whether a player belongs in the Hall or doesn't and roughly where he fits. Though this piece is founded on the sabermetric credentials of Hall of Fame candidates, I've also taken the trouble to wrangle together traditional stat lines for each one, including All-Star (AS), MVP and Gold Glove (GG) awards as well as the hoary but somewhat useful Jamesian Hall of Fame Standards (HOFS) and Hall of Fame Monitor (HOFM) scores.
The career and peak WARP totals for each Hall of Famer and candidate on the ballot were tabulated and then averaged [(Career WARP + Peak WARP) / 2] to come up with a score which, because it's a better acronym than what came before, I've very self-consciously christened JAWS (JAffe WARP Score). I then calculated positional JAWS averages and compared each candidate's JAWS to those enshrined.
It should be noted that I simply followed the Hall's own system of classifying a player by the position he appeared at the most. Thus, for example, Rod Carew is classified as a second baseman, and all of his numbers count towards establishing the standards at second, even though he spent the latter half of his career at first base. This is something of an inevitability within such a system, but if the alternative is going nuts resolving the Paul Molitors and the Harmon Killebrews into fragmentary careers at numerous positions, we'll never get anywhere.
By necessity I had to eliminate not only all Negro League-only electees, who have no major league stats, but also Satchel Paige and Monte Irvin, two great players whose presence in the Hall is largely based on their Negro League accomplishments. Other Negro Leaguers, such as Jackie Robinson, Roy Campanella and Larry Doby, have been included. While their career totals are somewhat compromised by not having crossed the color line until relatively late in their careers, their peak values--especially Robinson's--contribute positively to our understanding of the Hall's standards.
Here are the positional averages, the standards, to which I'll refer throughout the piece.
A quick breeze through the other abbreviations: BRAR is Batting Runs Above Replacement, BRAA is Batting Runs Above Average; both are included here because they make good secondary measures of career and peak value. FRAA is Fielding Runs Above Average, which is a bit less messy and more meaningful to the average reader than measuring from replacement level.
It's worth noting that these figures have changed somewhat since the last time around, as Davenport has continued to revise his system--particularly the defensive elements--and adjust appropriately for the way the game has changed over 135 years of major-league history. Most notably, the spread between the average JAWS scores at various positions has been cut in half, which I interpret as a sign that the system's biases have been reduced. So without further ado, we'll move on to the 2005 Hall of Fame ballot.....
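For anyone who wants to replicate the JAWS arithmetic described in the excerpt, here is a minimal sketch. The formula (career WARP averaged with the best five consecutive seasons) is the one stated above; the season line is hypothetical, and the war/injury allowances are ignored.

```python
def peak_warp(seasonal_warp, window=5):
    """Best WARP total over any `window` consecutive seasons (no war/injury allowances)."""
    if len(seasonal_warp) <= window:
        return sum(seasonal_warp)
    return max(sum(seasonal_warp[i:i + window])
               for i in range(len(seasonal_warp) - window + 1))

def jaws(seasonal_warp):
    """JAWS as described above: average of career WARP and best-five-consecutive-year WARP."""
    return (sum(seasonal_warp) + peak_warp(seasonal_warp)) / 2.0

# Hypothetical WARP3 season line for a candidate:
seasons = [2.1, 5.4, 7.8, 9.2, 10.1, 8.7, 9.5, 6.0, 4.2, 2.5]
print(sum(seasons), peak_warp(seasons), jaws(seasons))   # career 65.5, peak 45.3, JAWS 55.4
```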
I really like that system KJOK. Of course you could do it using WS or WARP3 or WARP1 or whatever you want which is pretty cool.
Anyone care to do it for our guys . . . if I get free time I'll take a stab at it, but I have no idea where I'd be.
It'd be interesting to see our average electee according to that, and maybe an average of our bottom 3, since we don't have the mistakes of the Hall of Fame. Also it's cool to see both the peak and career numbers . . .
very strange when the subconscious mind takes over for the conscious one . . .
121. Jeff M
Posted: August 02, 2005 at 01:03 PM (#1516717)
I really like that system KJOK. Of course you could do it using WS or WARP3 or WARP1 or whatever you want which is pretty cool.
Anyone care to do it for our guys . . . if I get free time I'll take a stab at it, but I have no idea where I'd be.
I have this data somewhere for WS and WARP1. It is the baseline for my WARP and WS evaluations. I'll try to find the spreadsheet and --- and what? How do I post it?
122. Jeff M
Posted: August 02, 2005 at 05:16 PM (#1517172)
Okay, I found the spreadsheet and updated it to include Boggs and Sandberg.
*A caveat with the WARP stuff in the spreadsheet. I did this spreadsheet before the first election because establishing the HoF baseline underlies about 7/8 of my HoM rating system. BP has changed the WARP calculations several times since I first did the sheet a couple of years ago, so I suspect the numbers in the sheet are not perfectly in accord with WARP anymore.
Here's what the sheet has for the HoF hitters (and Ripken), listed by position:
1. RC/27 LgRC/27 RangeFactor LgRangeFactor
2. Win Shares: 3-year peak, 5-year consec peak, 7-year peak, total and per 162 games
3. WARP1: 3-year peak, 5-year consec peak, 7-year peak, total and per 162 games
4. HOF Standards and HOF Monitor scores
Here's what it has for the HoF pitchers:
1. Win Shares: 3-year peak, 5-year consec peak, 7-year peak, total and per 100 IP
2. WARP1: 3-year peak, 5-year consec peak, 7-year peak, total and per 100 IP
3. HOF Standards and HOF Monitor scores
4. Linear Weights: 3-year peak, 5-year consec peak, 7-year peak, total and per 100 IP
5. Wins Above Team
Also, I just added significant relievers in all of the above pitching categories except #4 and #5. Only Eck and Fingers are in the HoF, but I included 15 other recognizable names (Sutter, Face, Rivera, etc.).
You will also see a little "grade sheet" at the end of each positional category. No need to pay attention to that, but it creates a little hall of fame report card using the averages in the various categories and standard deviations in those categories. The spreadsheet currently just does the calcs for the career numbers, but you can easily fix the formulas yourselves to apply them to 3-year peak or some other category...if you are so inclined. I rarely use those numbers "as is". I first eliminate the extreme cases (like Ruth's and Young's career WS on the high end, and Hafey's and Haines' career WS on the bottom end). Anyway, who cares about that.
So, does anyone want this thing? Should I e-mail it to those who request it, or can someone instruct me how to send it to the group through the HOM group on Yahoo?
123. Jeff M
Posted: August 02, 2005 at 06:22 PM (#1517312)
I posted the spreadsheet, but didn't add the JAWS calculations. Extremely simple to add, though, given the info in the spreadsheet.
I was leafing through Win Shares, and Bill James says he thinks it would be interesting to see how much "star power" a team has by taking each player's WS for that year and multiplying it by his career WS.
Maybe the same would work for a unified peak/career number. Multiply a player's WS/162 (or WARP/162) times his career WS (or WARP). Maybe take the square root of that to get it to a manageable number.
So Bobby Doerr with 281 WS and 22.61 per 162 would get a score of 79.7. Joe Gordon with 243 WS and 25.13 per 162 would get 78.2. So they are about even, despite Doerr's longer career and Gordon's higher peaks.
Using WARP1, Doerr would get 30.6 and Gordon would get 27.0, which accords with how much more favorably WARP views Doerr.
You'd have to season-length adjust and make whatever other adjustments you want before using this formula.
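Here's a quick sketch of the score proposed above, i.e. the square root of the career total times the per-162 rate, checked against the Doerr and Gordon Win Shares figures from this post.

```python
import math

def star_score(career_total, per_162_rate):
    """Unified peak/career number: sqrt(career total x per-162 rate), as proposed above."""
    return math.sqrt(career_total * per_162_rate)

# Win Shares figures quoted above:
print(round(star_score(281, 22.61), 1))   # Doerr: ~79.7
print(round(star_score(243, 25.13), 1))   # Gordon: ~78.1 (78.2 in the post)
```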
I think I may have proposed this in the past (without the Bill James backup and without the square root) and it did not take off, but I can't remember. :)
124. Cblau
Posted: December 21, 2005 at 04:04 AM (#1787610)
I'm probably repeating some of what was said previously, but this has been bugging me, and may be appropriate with respect to recent postings in the Bob Lemon discussion.
I was trying to rank center fielders the other day, and was using WARP fielding ratings. What to do about Max Carey? His FRAA was 32, and his FRAR was 556. Now, the difference between an average hitter and a replacement hitter during his time was only 303 runs, so how can the fielding difference by a (mostly) CF be so much more?
If you look at all NL CFs in 1917, you'll find that Edd Roush was the worst, with a fielding rate of 87. He was still well above replacement. His main backup had a rate of 91. Most players with very few games at a position get a rate of 100. But I'm wondering, if Roush is well above replacement, and he's the worst CF, who exactly is this replacement level player who would fill in for him? Some people take replacement level to be equivalent to the worst regular in a league. Looking at the 1917 AL, I find that rates for CFs go as low as 81 for Clyde Milan (still slightly above replacement.) Fielding rates over 110 and below 90 are pretty rare, whereas hitting rates commonly vary by much more than that. So how can the differences between average and replacement fielders be greater than the difference between an average and replacement hitter?
Part of the problem is that WARP treats fielding as of more importance in earlier years, when there were more balls in play. This makes sense, but the degree of difference it uses seems excessive. For Willie Davis, who played about the same number of games, the difference between FRAA and FRAR is 377.
Another problem appears in 1927. Carey has a 108 fielding rate in CF, but a 93 in RF. The other CF on the Dodgers that year also has a high rate, while the other LFs and RFs also have low rates. This looks like a matter of distribution of batted balls more than an accurate assessment of fielding performance.
Anyway, you can see why I don't submit ballots, when I can't even rank the center fielders.
125. Dr. Chaleeko
Posted: December 21, 2005 at 03:25 PM (#1788143)
Cblau,
Sadly, that's one of the several reasons I've chosen to eschew WARP (esp WARP2/3) in creating my own rankings.... Much as WS may have its limitations, they are known and adjustable.
126. Mark Shirk (jsch)
Posted: December 21, 2005 at 04:07 PM (#1788221)
I would say that you should email Clay Davenport (the email is at the bottom of one of his articles or chats). He has always been kind enough to answer my emails in a decent amount of time.
Cliff - I would agree with you wholeheartedly. Replacement level for defense is generally league average. I'm sure there are exceptions, but you won't go too far wrong if you just use that all the time. This is the major flaw of WARP.
The way I see it, I would value a player like this, theoretically:
1) Batting runs over replacement level at position (I would take the average of the bottom 15% of regulars as replacement level for all positions except pitcher, where I'd use the league average).
2) Fielding runs over average at position.
3) Pitching runs over replacement (I would take the average of the bottom 15% of pitchers in the same role with about 100 IP for starters, maybe 50 for relievers, give or take the length of the schedule, etc. - I'm open to ideas here, though).
I could see splitting 1 and 3 into - 1a) Batting value over replacement level hitter (generally .350 or so, could see the case for anywhere from .300 - .400) 1b) Defensive constant, based on position played.
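Here's a schematic sketch of that framework; the replacement rate and the player line below are hypothetical placeholders, not published values.

```python
def batting_value(batting_runs, outs, pos_replacement_rate):
    """Batting runs over a positional replacement level, per the proposal above.
    pos_replacement_rate is runs per out for the bottom 15% of regulars at the position (hypothetical)."""
    return batting_runs - pos_replacement_rate * outs

def player_value(batting_runs, outs, fielding_runs_above_avg, pos_replacement_rate, runs_per_win=9.0):
    """Total value in wins: batting over positional replacement plus fielding over average."""
    runs = batting_value(batting_runs, outs, pos_replacement_rate) + fielding_runs_above_avg
    return runs / runs_per_win

# Hypothetical shortstop season: 70 batting runs in 400 outs, +8 fielding runs above average,
# with a made-up replacement rate of 0.12 runs per out for bottom-15% shortstops.
print(round(player_value(70, 400, 8, 0.12), 1))   # ~3.3 wins above (positional) replacement
```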
What I'm still trying to figure out is a simple way to do this for WARP, without having a database of all their values. I'd pay decent money for a workable complete all-time database of their run values for offense/defense/pitching for each season. Would make it much easier to figure out league norms, etc..
Jeff, the thread dropped off the board before I had a chance to read your post - I'm going to check out the spreadsheet, thanks!
129. Cblau
Posted: December 24, 2005 at 05:02 PM (#1792959)
Joe, I agree with you in general on the proper way to value players.
Still pondering the general issue of pitching versus fielding. Going back to the Bob Lemon discussion, if pitching was a bigger part of defense in 1949 than it was in 1925, and fielding a smaller part, it seems to me that the standard deviation of ERAs would be higher in 1949 than in 1925. So, I did the following. Using all pitchers in 1924, 1925, 1948, and 1949 with over 50 innings pitched, I took the standard deviation of ERA for each team. Then I compared the average for 1924/5 with the mean for 1948/9.
In 1924/5, the average standard deviation was 0.84. For the 1948/9 period, it was 0.87. I also divided the STD by the average ERA, and came up with a figure of 0.20 for the earlier years and 0.21 for the later.
This would suggest that there wasn't much of a change in the relative importance of pitching and fielding in this period. For the 1925 NL, BP has the difference between an average and replacement level CF as 36 runs. In the 1949 AL, it shows a 19 run difference. My study seems to show that the change should be on the order of 2 or 3 runs, not 17. Of course, I could be wrong.
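For anyone who wants to replicate or extend this, here is a sketch of the computation as described above (per-team standard deviation of ERA among pitchers with over 50 IP, averaged across teams, then divided by the league ERA). The input format and the sample records are assumptions, not any particular data source.

```python
from collections import defaultdict
from statistics import mean, stdev

def era_spread(pitcher_seasons, min_ip=50):
    """pitcher_seasons: records with 'team', 'ip', 'era' for one league-season.
    Returns (average per-team SD of ERA, that SD divided by the average ERA)."""
    qualifiers = [p for p in pitcher_seasons if p["ip"] > min_ip]
    by_team = defaultdict(list)
    for p in qualifiers:
        by_team[p["team"]].append(p["era"])
    team_sds = [stdev(eras) for eras in by_team.values() if len(eras) >= 2]
    avg_sd = mean(team_sds)
    league_era = mean(p["era"] for p in qualifiers)   # unweighted, as a rough stand-in
    return avg_sd, avg_sd / league_era

# Toy usage with made-up records:
sample = [
    {"team": "CLE", "ip": 200, "era": 3.20}, {"team": "CLE", "ip": 150, "era": 4.10},
    {"team": "CLE", "ip": 90, "era": 4.80}, {"team": "NYA", "ip": 220, "era": 3.00},
    {"team": "NYA", "ip": 120, "era": 3.90}, {"team": "NYA", "ip": 60, "era": 5.00},
]
print(era_spread(sample))
```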
130. Brent
Posted: December 24, 2005 at 06:23 PM (#1793012)
131. Cblau
Posted: December 25, 2005 at 03:44 AM (#1793489)
That ERA includes both pitching and fielding is the beauty of it, because I'm trying to determine if the relative importance of each changed. Using FIP would just show the change in the three true outcomes; I'd then have to try to relate that to regular ERA to determine if the change in ERA is due to pitchers or fielders. Also, FIP assumes that pitchers have no effect on BABIP.
132. KJOK
Posted: January 01, 2006 at 07:32 AM (#1801666)
I know this is posted elsewhere on the site, but I think we'll benefit from having it permanently here - Brandon's (Patriot's) Win Shares Analysis:
133. KJOK
Posted: January 06, 2006 at 01:07 AM (#1808552)
Not really clear if Part 6 is the last one, but I'm excerpting just one of the many criticisms that I think is really important:
....the system is giving out absolute wins on the basis of marginal runs. 50% of the league average in runs scored, with a Pythagorean exponent of 2, corresponds to a W% of .200. It is for this reason that in old FanHome discussions myself and others said that WS had an intrinsic baseline of .200 (James changed the offensive margin line to 52%, which corresponds to about .213).
In an essay in the book, James discusses this, and says that the margin level (i.e. 52%) “is not a replacement level; it’s assumed to be a zero-win level”. This is fine on its face; you can assume 105% to be a zero-win level if you want. But the simple fact is that a team that scored runs at 52% of the league average with average defense will win around 20% of their games. Just because we assume this to not be the case does not mean that it is so.
Win Shares would not work for a team with a .200 W%, because the team itself would come out with negative marginal runs. If it doesn’t work at .200, how well does it work at .300, where there are real teams? That’s a rhetorical question; I don’t know. I do know that there will be a little bit of distortion every where.
In discussing the .200 subtraction, James says “Intuitively, we would assume that one player who creates 50 runs while making 400 outs does not have one-half the offensive value of a player who creates 100 runs while making 400 outs.” This is either true or not true, depending on what you mean by “value”. The first player has one-half the run value of the second player; 50/100 = 1/2, a mathematical fact. The first player will not have one-half the value of the second player if they are compared to some other standard. From zero, i.e. zero RC, one is valued at 50 and one is valued at 100.
By using team absolute wins as the unit to be split up, James implies that zero is the value line in win shares. Anyone who creates a run has done something to help the team win. It may be very small, but he has contributed more wins than zero. Wins above zero are useless in a rating system; you need wins and losses to evaluate something. If I told you one pitcher won 20 and the other won 18, what can you do? I guess you assume the guy who won 20 was more valuable. But what if he was 20-9, and the other guy was 18-5?
You can’t rate players on wins alone. You must have losses, or games. The problem with Win Shares is that they are neither wins nor wins above some baseline. They are wins above some very small baseline, re-scaled against team wins. If you want to evaluate WS against some baseline, you will have to jump through all sorts of hoops because you first must determine what a performance at that baseline will imply in win shares. Sabermetricians commonly use a .350 OW%, about 73% of the average runs/out, as the replacement level for a batter. A 73% batter, though, will not get 73% as many win shares as an average player. He will get less than that, because only 21% (73% - 52%) of his runs went to win shares, while for an average player it was 48%. So maybe he will get .21/.48 = 44%. I’m not sure, because I don’t jump through hoops.
Bill could use his system, and get Loss Shares, and have the whole thing balance out all right in the end. But to do it, you would have to accept negative loss shares for some players, just as you would have to accept negative win shares for some players. Since there are few players who get negative wins, and they rarely have much playing time, you can ignore them and get away with it for the most part. But in the James system, you could not just wipe out all of the negative loss shares. Any hitter who performed at greater than 152% of the league average would wind up with them, and there are (relatively) a lot of hitters who create seven runs a game.
James writes in the book that with Win Shares, he has recognized that Pete Palmer was right after all in saying that using linear methods to evaluate players would result in only “limited distortions”. And it’s true that a linear method involves distortions, because when you add a player to a team, he changes the linear weights of the team. This is why Theoretical Team approaches are sometimes used. But the difference between the Palmer system and the James system is that Palmer takes one member of the team, isolates him, and evaluates him. James takes the entire team.
So while individual players vary far more in their performance than teams do, they are still just a part of the team. Barry Bonds changes the linear weight values of his team, no doubt; but the difference might only be five or ten runs. Significant? Yes. Crippling to the system? Probably not. But when you take a team, particularly an unusually good or bad team, and use a linear method on the entire team, you have much bigger distortions.
Take the 1962 Mets. They scored 617 and allowed 948, in a league where the average was 726. Win Shares’ W% estimator tells me they should be (617-948+726)/(2*726) = .272. Pythagoras tells us they should be .304. That’s a difference of 5 wins. WS proceeds as if this team will win 5 fewer games than it probably will. Bonds’ LW estimate may be off by 1 win, but that is for him only. It does not distort the rest of the players (they cause their own smaller distortions themselves, but the error does not compound). Win Shares takes the linear distortion and thrusts it onto the whole team.
Finally, the defensive margin of 152% corresponds to a W% of about .300, compared to .213 for the offense. The only possible cutoffs which would produce equal percentages are .618/1.618 (the golden ratio). That is not to say that they are right, because Bill is trying to make margins that work out in a linear system, but we like to think of 2 runs scored and 5 allowed as being equal to the complement of 5 runs scored and 2 allowed. In Win Shares, this is not the case. And it could be another reason why pitchers seem to rate too low with respect to batters (and our expectations).
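To put numbers to the excerpt's claims, here is a small sketch comparing the marginal-runs estimate with a Pythagorean one. The floating exponent is a generic PythagenPat-style choice (runs per game raised to 0.287), which is my assumption rather than Patriot's exact method.

```python
def marginal_wpct(rs, ra, league_avg, off_margin=0.52, def_margin=1.52):
    """Win Shares-style estimate: marginal runs scored plus marginal runs saved, over twice the league average."""
    return ((rs - off_margin * league_avg) + (def_margin * league_avg - ra)) / (2 * league_avg)

def pythagorean_wpct(rs, ra, exponent=2.0):
    return rs**exponent / (rs**exponent + ra**exponent)

def pythagenpat_wpct(rs, ra, games=162):
    """Pythagorean estimate with an exponent that floats with the run environment (x = RPG ** 0.287)."""
    return pythagorean_wpct(rs, ra, ((rs + ra) / games) ** 0.287)

# 1962 Mets: 617 scored, 948 allowed, league average 726.
print(round(marginal_wpct(617, 948, 726), 3))   # 0.272, as in the excerpt
print(round(pythagenpat_wpct(617, 948), 3))     # ~0.305, close to the .304 quoted above

# The intrinsic baselines mentioned above, with a fixed exponent of 2:
print(round(pythagorean_wpct(0.52, 1.00), 3))   # 52% offense, average defense -> ~.213
print(round(pythagorean_wpct(1.00, 1.52), 3))   # average offense, 152% runs allowed -> ~.302
```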
134. Mark Shirk (jsch)
Posted: January 06, 2006 at 03:31 AM (#1808706)
KJOK,
Couldn't you argue that a team that scores 50% of the league average number of runs but with league-average defense and pitching would win 20% of its games based on its defense and pitching? Isn't that where its WS would go?
What would the winning percentage be if a team scored 52% of the league average in runs (isn't it 48%?) and allowed 152% of the league average? I know it would be greater than zero, but then again, are there numbers that can actually get a team to 0 wins in a Pythagorean system?
And how does one calculate loss shares? Couldn't it be argued that a loss is just the absence of a win? You lose if you aren't doing the things (scoring runs, preventing runs) that it takes to win.
135. Mark Shirk (jsch)
Posted: January 06, 2006 at 03:37 AM (#1808712)
Also, I found this exchange in a Clay Davenport chat...
"pkw (Indy, IN): With a new book out this spring, I presume we'll have to wait until at least Fall '06 for the WARP Encyclopedia to come to bookshelves. Barring that, would it be possible for a series of "basics" articles showing how WARP is calculated, the whys and whynots all explained, Win Shares-style? Why FRAR is used instead of FRAA is one question I like to see discussed. Thanks for all the great work.
Clay Davenport: If you used FRAA, then an average SS and an average 1B would have an equal rating, zero. You would need to introduce a positional adjustment, which most people calculate by using the average batting performance at a position.
I really, really don't like the idea of using batting performance to measure a fielding performance. However, assuming reasonably intelligent management, the difference in offensive level between positions should be roughly equal to the defensive difference between positions. If it wasn't - if everybody overstated the fielding value of a shortstop, for instance - then a team that used a better-hitting, poor-fielding SS would gain an advantage. Assuming the advantage led to wins, everybody would copy them (because even an assumption of reasonable intelligence leaves us at the monkey-see, monkey-do level) and the difference in fielding would decline. Anyway, FRAR essentially mimics using FRAA + fielding adjustment, but only uses fielding stats to do it.
I think it is reasonable to treat each position on the field as being roughly equal in importance, and FRAR is the vehicle I use to make it so."
I am not sure we really looked at this topic from the above angle. Though the possibility of a WARP book is pretty exciting.
I asked him about the really low replacement levels for defense, and this was his response...
"jschmeagol (new york, ny): Hey Clay, I have a WARP question. How exactly do you find replacement level for defense? The reason that I ask is that it seems to be really, really low. For instance, over the course of Max Carey's career the difference between FRAA and FRAR is larger than the difference between BRAA and BRAR. This doesn't seem possible, but I would think that you have a good reason for it. Can you elaborate? Thanks
Clay Davenport: Replacement level for defense primarily depends on how many balls get hit to a given position, and what happens to them when they get there. Generally speaking, more balls in play = more FRAR for all positions, which is a lot of what's going on for Max Carey and other deadball era players. There weren't many homers, there weren't many walks or strikeouts, there were lots of errors, although not as many as a generation earlier. All of those tilt the share of total runs from the pitchers to the fielders, and it enhances the FRAR.
The reason it is so low ties in with this question -"
The last line leads into the first question.
I also want to point out that while Davenport hasn't really put his system up for scrutiny in the way that Bill James did with WS, he does seem very open to answering emails and always answers a few WARP questions in his chats.
137. Brent
Posted: January 06, 2006 at 03:54 AM (#1808721)
Frankly I've been unimpressed by USPatriot's critique. It seems to me that many or most of his criticisms are nitpicking, and others (such as the absence of loss shares) were already acknowledged by James in his book--and apparently will be addressed in his future work.
James didn't claim his system was perfect. Quoting from p. 2 of his book, he says "If one player in this system is credited with 20 Win Shares and another with 18, we can state with a fair degree of confidence that the one player has contributed more to his team than the other...not that we are always right; there will always be anomalies and there will always be limitations to the data, but I would be confident that we had it right a high percentage of the time."
I think WS generally meets that standard. It has its faults. I certainly wouldn't recommend using it alone or without checking the data or questioning its results if they seem anomalous, but in general I think it does a very good job of bringing together lots of information on batting (including pieces that are missing from OPS, such as double plays), fielding (much more sophisticated than anything available 10 or 15 years ago), and pitching and boiling them down to a meaningful integer.
"Couldn't you argue that a team that scores 50% of the league average number of runs but with league average defense and pitching would win 20% of its games based on their defense and pitching? Isn't that where their WS would go?"
Bingo, I agree with that 100%.
Also, assuming a normal run environment of 3.5 to 5 runs a game, for example . . . A player that creates 1 run does not necessarily contribute to winning, from the 'net' perspective. If that player also created 27 outs, he contributed much more to the losses than the wins, to the point where his net win contribution is less than zero.
"What would the winning percentage be if a team scores 52% of all runs (isn't it 48%?) and allowed 152% of league average. I know it woudl be greater than zero, but then again, are their numbers that can actually get a team to 0 wins in a pythagorean system?"
That team would win 11% of its games, using PythagenPat and a run environment of 5 R/G. The team would score 2.6 R/G and allow 7.6. Real teams rarely fall outside 75-125% of the league average. The worst of both worlds would be a team that scored 562 runs and allowed 938 in a 750-run environment. That team would play .283 ball, or win 46 games.
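Here's a sketch of that calculation; the exponent (runs per game raised to 0.287) is my assumption for PythagenPat, so the output lands near, though not exactly on, the figures above.

```python
def pythagenpat_wpct(rs, ra, games=162):
    """Pythagorean W% with a floating exponent (x = runs per game ** 0.287)."""
    x = ((rs + ra) / games) ** 0.287
    return rs**x / (rs**x + ra**x)

# 52% / 152% of a 5 runs-per-game environment over 162 games:
print(round(pythagenpat_wpct(0.52 * 5 * 162, 1.52 * 5 * 162), 3))   # ~0.11, the 11% figure above
# The 75% / 125% team in a 750-run environment (562 scored, 938 allowed):
print(round(pythagenpat_wpct(562, 938), 3))                         # ~0.275, near the .283 / 46 wins above
```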
140. Dr. Chaleeko
Posted: January 06, 2006 at 02:08 PM (#1809006)
To nitpick Patriot's nitpicks....
I think the key point that James makes is that extreme teams do little to help us understand the other 99.99% of teams. The 1899 Spiders, the 1915 Athletics, and the 1962 Mets are such utter anomalies that they tell us almost nothing.
Do I want a system that gets it all right? Sure. But my preference is for a system that hits 98% of the time and accepts the distortions at the left end of the curve rather than having no system at all.
One thing I would point out, however, is that NA Win Shares don't work for me because the run environment is so weird. When I figured them for a few of the NA Boston teams (replicating Chris Cobb's work), I realized the run environment is so extreme that it leaves several regulars looking like non-contributors.
141. Chris Cobb
Posted: January 06, 2006 at 04:31 PM (#1809168)
Do I want a system that gets it all right? Sure. But my preference is for a system that hits 98% of the time and accepts the distortions at the left end of the curve rather than having no system at all.
I use win shares, but the problems of extreme teams shouldn't be minimized. The distortions may be worse at the left end of the curve due to the impact of the zero point, but there are also distortions on the right end of the curve as well.
142. andrew siegel
Posted: January 06, 2006 at 06:32 PM (#1809358)
I thought the biggest problem pointed out in Patriot's series was the zeroing out of offensive win shares for nonpitchers who produce negative marginal offensive runs. (For pitchers those runs are subtracted from the numbers used to calculate their pitching WS, which is a problem in and of itself but a more minor one.)
As I understand the argument, by giving zero offensive WS (rather than negative numbers) to players who produce below the marginal run cutoff, you are artificially reducing the amount of offensive WS available for all the players who produced positive marginal runs. (If I understand their adjustments correctly, the Hardball Times folks adjust for this in their present-day WS calculations.)
One question that we should probably look at empirically is whether this distortion is the main reason why hitters on poor teams seem to be earning fewer WS than hitters with similar performance levels on good teams (e.g., Medwick outperforming Sisler in their best seasons, Medwick outperforming Johnson over their careers). It stands to reason that there is a correlation between teams that are bad and teams that give AB to sub-marginal hitters.
143. KJOK
Posted: January 06, 2006 at 09:57 PM (#1809676)
Do I want a system that gets it all right? Sure. But my preference is for a system that hits 98% of the time and accepts the distortions at the left end of the curve rather than having no system at all.
I use win shares, but the problems of extreme teams shouldn't be minimized. The distortions may be worse at the left end of the curve due to the impact of the zero point, but there are also distortions on the right end of the curve as well.
I think the "problem" is that Win Shares is not just slightly inaccurate due to extreme teams - there are a whole series of "small" inaccuracies that, added together, make the system as a whole inferior to many other ways to measure "value."
James didn't claim his system was perfect. Quoting from p. 2 of his book, he says "If one player in this system is credited with 20 Win Shares and another with 18, we can state with a fair degree of confidence that the one player has contributed more to his team than the other...not that we are always right; there will always be anomalies and there will always be limitations to the data, but I would be confident that we had it right a high percentage of the time."
I don't see this as much of a defense. I mean, I could use RBI's to measure hitters, wins to measure pitchers, say 'my system's not perfect but my system would be right a high percentage of the time.' I would not recommend using such a system...
144. DavidFoss
Posted: January 06, 2006 at 10:28 PM (#1809737)
I thought the biggest problem pointed out in Patriot's series was the zeroing out of offensive win shares for nonpitchers who produce negative marginal offensive runs. (For pitchers those runs are subtracted from the numbers used to calculate their pitching WS, which is a problem in and of itself but a more minor one.)
I thought it was bad when pitchers were zero-ed out as well. Is the team's offense/defense split done before these corrections are done? Does that mean that a horribly bad-hitting pitcher will take bWS away from his non-pitching teammates and give them to his better-hitting pitching teammates as pWS?
Plus it complicates the DH-league/non-DH-league situation. I'm not exactly sure who is favored there. On one hand, NL batters are lowered due to this zeroing-out issue; on the other hand, they're raised up because it's effectively eight lineup slots competing for bWS instead of nine.
145. Mark Shirk (jsch)
Posted: January 06, 2006 at 11:21 PM (#1809823)
A few questions...
How many pitchers are below the zero win level offensively?
Can you really be a negative amount of wins? I guess you can, but at the same time you either win the game or you don't; you can't lose games already won.
Doesn't James say that AL hitters are screwed because of the DH and that he can't really do anything about it?
If that last one is correct, and it most likely is, then how should we adjust WS so as to not penalize AL players? Should we even do this? I mean, offense is less valuable in the AL because there are nine hitters as opposed to eight.
146. DavidFoss
Posted: January 07, 2006 at 12:23 AM (#1809886)
How many pitchers are below the zero win level offensively?
Most of them.
Using the Hardball Times's modified calculations...
They list 126 pitchers in the 2005 NL.
11 are above 0.0
11 are equal to 0.0
102 are below 0.0
147. DavidFoss
Posted: January 07, 2006 at 12:28 AM (#1809893)
Can you really be a negative amount of wins? I guess you can, but at the same time you either win the game or you don't; you can't lose games already won.
No one wins games by themselves. Win Shares is just divvying out a team's wins among its players. A negative number implies you are cancelling out someone else's positive contribution.
148. Paul Wendt
Posted: January 08, 2006 at 04:52 PM (#1811266)
Cblau #124 If you look at all NL CFs in 1917, you'll find that Edd Roush was the worst, with a fielding rate of 87. He was still well above replacement. His main backup had a rate of 91. Most players with very few games at a position get a rate of 100. But I'm wondering, if Roush is well above replacement, and he's the worst CF, who exactly is this replacement level player who would fill in for him?
"Most players with very few games at a position get a rate of 100."
precisely 100? What about players with few games but not "very few"? Is it possible that the measure involves derivation from per-game or per-inning defensive data plus "regression" (the wrong term here) toward 100 based on playing time?
149. Paul Wendt
Posted: January 08, 2006 at 05:00 PM (#1811276)
One question that we should probably look at empirically is whether this distortion is the main reason why hitters on poor teams seem to be earning fewer WS than hitters with similar performance levels on good teams (e.g., Medwick outperforming Sisler in their best seasons, Medwick outperforming Johnson over their careers). It stands to reason that there is a correlation between teams that are bad and teams that give AB to sub-marginal hitters.
I don't believe it can be the main reason for the phenomenon (granted for the sake of argument). The correlation between team quality and pitcher batting quality must be low. If statistically significant, I guess it is baseballistically insignificant. But it's worth checking, if anyone has data in the right format.
150. Patrick W
Posted: January 08, 2006 at 05:23 PM (#1811303)
jschmeagol,
I thought it was great that 2 of the first 3 questions in Clay's chat were from the HOM group.
To me, his answer to my question indicates how much WARP and WS have in common at the big picture level. Hitters are hitters when they're at bat, not shortstops or right fielders. The added value for being a better defensive player should stay on the defensive side of the ledger.
I would be kinda curious how closely BRARP+FRAA corresponds to BRAR+FRAR, but those who advocate for FRAA should have the same problems with WS that they do with WARP. BP's replacement level might be too low, but average is not the answer.
151. Paul Wendt
Posted: January 09, 2006 at 05:59 PM (#1812655)
To me, his answer to my question indicates how much WARP and WS have in common at the big picture level.
For a while after Bill James presented Win Shares (July 2001?), I thought of it and described it as fundamentally different from other measures in that the players on each team are assigned positive scores that sum to the number of team wins. The name "win shares" aptly summarizes those two features, and the first of them, positive scores, speaks to the chief prior criticism of the Total Baseball Ratings, which focused on their zero-sum design (for all players in each league season).
When I read the book, I was surprised (shouldn't have been) to learn that the second feature, the sum to number of wins (for all players on each team season), is superficial: a late step in calculation and easy to undo, except for rounding to integers.
152. KJOK
Posted: January 09, 2006 at 08:27 PM (#1812925)
153. jimd
Posted: January 09, 2006 at 11:21 PM (#1813299)
I think it is reasonable to treat each position on the field as being roughly equal in importance -- Quote from Clay Davenport
It is obvious from the results that this is a major problem with Win Shares.
I've brought this up before (see the SS thread). There is a discussion (around post 86 and thereabouts) of Bobby Wallace and the relative distribution of "All-Stars" (both WARP and WS) during his career.
If Win Shares was fair to each of the 8 everyday positions, there would be an approximately equal amount of value created at each position (pitchers excepted), if one totaled it over a long enough period of time so that local talent gluts even out. The OF/IF imbalance during the Deadball era is particularly dramatic, but it appears to be present in other eras also. I have seen no evidence of a corresponding era where IFers (2B,3B,SS) receive more of the total value than the OFers.
Therefore it appears that a strong bias towards OFers over IFers is built into the Win Shares system. There are two places where this could occur. Either the fielding intrinsic weights are wrong -- too many fielding WS go to OFers, not enough to IFers. Or the offensive/defensive balance is wrong -- too many Win Shares go to offense, not enough to defense, resulting in too many Win Shares being awarded to the hitting end of the defensive spectrum. Either situation will result in too many OF "All-Stars".
154. Paul Wendt
Posted: January 18, 2006 at 10:19 PM (#1827420)
David Foss on two occasions in 1968 Ballot: It's good practice for the future, but for now I give a heads up to everyone that the entire backlog needs their seasons adjusted to 162.
. . .
The biggest obstacle is to remember to do it at all. I mean, it's just so quick and easy to look at the career total for career value and to line up seasonal WS totals to look at peak. It's those seasonal line-ups I'm most worried about. A single WS difference between each season on those lines often makes a player look quite a bit better.
This applies not only to Win Shares but to all counting (rather than rate) statistics, and not only to season stats but to career stats.
Career Win Shares per 162 games is in print, organized rather conveniently for the HOM project, in one or two books by Bill James. But that is unreliable. Hard as it is to believe in this day, HOMeboys have discovered numerous arithmetic errors.
I hope that that is not worth saying. Comparison of rates per 162 is technically pertinent regardless of the length of the season, and I guess everyone here knows that. But I fear that the virtual transition to a 162-game era makes those published rates a more attractive nuisance.
155. KJOK
Posted: January 18, 2006 at 11:27 PM (#1827561)
Cyril Morong has correct Win Share rate stats on his website under "All Time Rankings" heading:
156. jimd
Posted: March 15, 2006 at 07:16 PM (#1899749)
Putting this here where it belongs...
A note on error-percentage. The error rating in Win Shares is ratio-based. Commit twice as many errors as the average and you get no credit; no errors and you get full credit. It's grading on a curve, with little or no relationship to the number of runs prevented or allowed.
To me, this makes as much sense as rating HR's on a ratio. Then Tommy Leach's 6 HR's in 1902 would be more impressive than Bonds's 73 HR's, because the avg position player in 1902 hit about 1 HR, while the avg position player in modern times hits about 20. Ty Cobb might think this is a good way to rate HR's but I think most of us would reject any offensive rating system like that out-of-hand.
IMO, the fielding ratings in Win Shares appear to attempt to translate the stats into a modern setting and evaluate them on a modern scale. They do NOT appear to be evaluating them in the context of the game that was actually being played then.
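A toy sketch of the ratio-based grading described above (an illustration of the criticism, not the actual Win Shares claim-point formula): credit runs linearly from full at zero errors to none at twice the league average, so two fielders with the same error ratio grade identically even when the absolute number of errors prevented, and hence runs, differs enormously between eras. The error totals below are hypothetical round numbers.

# Toy model of ratio-based error grading (illustration only, not the
# exact Win Shares claim-point formula).
def error_credit(errors, league_avg):
    # Full credit at 0 errors, no credit at twice the league average.
    return max(0.0, 1.0 - errors / (2.0 * league_avg))

# Two hypothetical shortstops, each committing half their league's average:
print(error_credit(28, 56))  # deadball-era context: 28 errors better than average
print(error_credit(7, 14))   # modern context: only 7 errors better than average
# Both print 0.75 -- identical grades for very different numbers of runs prevented.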
157. KJOK
Posted: April 21, 2006 at 09:54 PM (#1979980)
Another pretty good "HOM type" article primarily using JAWS:
158. sunnyday2
Posted: April 21, 2006 at 10:17 PM (#1980021)
Re. the JAWS article: With all due respect, is this meant to be a joke?
>I used the third version of WARP, which is WARP3. This version is adjusted for difficulty and for playing time, so it levels the playing field for different eras.
The WARP3 translation is quite explicitly meant NOT to level the playing field, but is meant to make sure that modern players come out ahead as "we" intuitively think they should.
Or I'm missing something.
159. Mark Shirk (jsch)
Posted: April 22, 2006 at 01:30 AM (#1980560)
I think you are being a little harsh on Clay Davenport there sunny. He is adjusting to be fair to all eras, with respect to player pools, etc. You may think he goes a little far, or way too far, but he isn't merely trying to make modern players look better just because. There are statistical reasons for the numbers he comes up with.
160. sunnyday2
Posted: April 22, 2006 at 03:24 AM (#1981160)
But they are based on the presumption (a priori) that players today are better. If that's because of "the pool," so be it. It's still a priori. The results speak for themselves. If we elected the top 225 WARP3 players of all-time it would not be a HoM, it would be a HoS (skills).
161. TomH
Posted: April 23, 2006 at 09:09 PM (#1984254)
Win Shares, W.R.T. great teams/lousy teams:
Systems like WARP and OWP and linear weights assume standard rules: turning 1 out into a hit = about .74 runs = about .074 wins.
For Win Shares, this works for a .500 club. For a team that wins 70% of the time, no matter how good you are, you can't generate many extra wins, and so when they are apportioned out, the great players get shafted, by the principle of diminishing returns.
For a losing club, it's tough to raise the bar (too many more runs needed to make another win), so again these players are underrated.
So, theoretically, I conclude it is NOT true that players on great teams are overrated by WS. Our perception might be otherwise BECAUSE most single-season teams that won many games did so in part by being 'lucky' (winning close games), and in the WS system, these wins are credited to the players.
Zat make sense?
162. Mark Shirk (jsch)
Posted: April 23, 2006 at 09:49 PM (#1984314)
Tom,
The argument is that players on great teams get a benefit in WS because they don't have to play teams as good as themselves. I think this goes a little far. I think that playing LF on a team that has great pitching and defense may give you a lift because you don't have to face your own pitching and defense. I don't think it matters to you if you are a LFer on a great team and the opposing team has Jimmie Foxx or George Burns (the 1B) hitting cleanup, because you don't play against the other team's lineup, you play against their pitching and defense. The reverse would be true for pitchers.
163. Chris Cobb
Posted: April 24, 2006 at 01:35 AM (#1984564)
I think this goes a little far. I think that playing LF on a team that has great pitching and defense may give you a lift because you don't have to face your pitching and defense.
There will be more win shares to go around to everyone. And any team that is great enough for their win shares to be inflated by their not having to play themselves is going to be well above average on both offense and defense.
The principle of diminishing returns kicks in at a higher level of team performance than the excess win shares for not having to play yourself. That starts to show up in an 8-team league, IIRC, at about .630. The effect isn't large at that level, but it is noticeable.
164. rawagman
Posted: April 24, 2006 at 10:07 AM (#1984960)
Maybe someone can help explain this to me. Does the WS system account (and if so, how) for the fact that a very good player on a lousy team (think Mike Sweeney on the Royals the last few years) did what he did without any noticeable lineup protection? Does it factor in that if he had 2-3 other decent bats in his own lineup, he wouldn't be pitched around as much?
Conversely, how about someone like any of the Yankees' recent first basemen? How much were Tino's stats recently helped by hitting with guys like Sheff, Bernie (until recently), Jeter, etc...?
165. TomH
Posted: April 24, 2006 at 01:37 PM (#1985019)
WS does not account for the good-player-on-lousy-team situation mentioned above.
However, I'm not sure it should. Most studies on the value of 'protection' concluded that there is little noticeable effect on the hitter's quality. Yes, it may affect his stats in that he gets more walks and fewer HR without any good bat behind him, but overall his contributions remain the same (see Babe Ruth, before and after Lou Gehrig arrives).
As to WS and good teams/bad teams, yes, WS does not account for not having to face your own teammates. But this is certainly not a unique issue to WS; many metrics 'suffer' from the same issue.
Guys like Bob Johnson are hurt by using Win Shares because they have TWO effects against them: diminishing returns (bad team, needs more runs to make a win) and poor teammates (doesn't get to face his own pitching).
166. DavidFoss
Posted: April 24, 2006 at 02:16 PM (#1985086)
Guys like Bob Johnson are hurt by using Win Shares because they have TWO effects against them: diminishing returns (bad team, needs more runs to make a win)
When does the diminishing returns kick in on the low end? My back-of-the-envelope guess from taking the derivative of the OWP formula shows a *maximum* at around 0.250, which is pretty darn low. That implies that a great player can help a bad team easier than he can help a good team. Only when the team becomes quite brutal, does the effect of one player start returning to zero (sharply so, I'll admit). Anyone look at this effect with the real Win Shares calculation? Is my guess way too low?
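A rough numerical check of that back-of-the-envelope claim, using the exponent-2 Pythagorean win estimator as a stand-in for the OWP formula (the exact exponent and run environment shift the peak only slightly):

# Where is the marginal value of a run largest? Scan run totals with runs
# allowed held fixed and find where one extra run adds the most wins.
def win_pct(rs, ra):
    return rs ** 2 / (rs ** 2 + ra ** 2)  # exponent-2 Pythagorean estimator

RA = 700  # fixed runs allowed; the peak location is insensitive to this choice
best_rs = max(range(300, 1400), key=lambda rs: win_pct(rs + 1, RA) - win_pct(rs, RA))
print(round(win_pct(best_rs, RA), 3))  # ~0.250 -- the marginal run peaks for a .250 team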
and poor teammates (doesn't get to face his own pitching).
This effect is mentioned all the time, but it only exists indirectly in the Win Shares calculation. For the most part with Win Shares, you are competing against your own teammates for value.
The first place where other teams come into play is in the total Win Shares available -- the team's overall record. There has been some talk of needing to add 11 wins and 11 losses to each team and rescaling back to 154 to get some sort of adjusted W/L record (using the 154 G season as an example here). I don't know if that's a valid tweak or not, but I've seen it mentioned here several times.
The second place where other teams come into play is the Park Factor. Win Shares uses the straight PF, not the BPF or PPF that was designed to account for not facing your own teammates. This affects the way that Win Shares splits between offense & defense, and any discrepancy would only come into play when the team in question is very unbalanced (e.g. bad-pitching/decent-hitting), and then it will have the effect of dampening some of that unbalance. A decent offense with a crummy staff will already get a larger segment of the Win Shares, but by not having to face that crummy staff their context is a bit off and they'll get a slightly smaller cut of WS (but still certainly larger). If a team has offense/defense that is equally bad (or good), then the effect of Park Factors on this split goes away.
After that, I believe it's all competition within your own team for value.
167. Mark Shirk (jsch)
Posted: April 24, 2006 at 02:22 PM (#1985103)
While there may be competition 'within your own team' for value, the better you play the more wins you should get. So good teams aren't competing as much as adding extra WS to be divvied (spelling?) up between them.
168. TomH
Posted: April 24, 2006 at 03:16 PM (#1985213)
David, my bad. You are right - I had mistakenly calculated the effect on a bad team by assuming that only the pitching was bad (more runs allowed), in which case of course it will take more runs to make a win. If you assume half-and-half (or maybe find the derivative of OWP to WS, as you seem to have done), then it is slightly easier to create an extra win on a bad team, although not hugely so.
As to Win Shares using the 'unscaled' PF, I'd have to think more about that one. While you want to calculate the effects of not facing your own teammates if you wish to translate stats into normalized stats, I'm not positive that by using team Wins this doesn't already do some of that - you have a great team, you get to not face your mates, but then it becomes harder to generate an extra win since you are already at a WPCT where it takes more runs to get a marginal win.... need a study that I shan't take time to do right now.
169. Chris Cobb
Posted: May 11, 2006 at 03:23 AM (#2014270)
Since WARP fielding has come up on the Dobie Moore thread, I'm going to pose a set of questions I've been meaning to ask jimd for a while that I think could use some general discussion. I'll crosspost it to the WARP v Win Shares thread, too. It concerns WARP1's calculation of fielding runs above replacement level in relation to fielding runs above average by position. The question is: how should one attempt to reconcile WARP's representation of period-specific fielding spectra with the evidence of the shifting offensive production at defensive positions and/or with other views of period-specific fielding spectra (win shares) that receive fairly wide acceptance?
Background:
WARP generates FRAR by taking FRAA and adding an amount of FRAR per game that is fixed for each position. This amount changes over time to reflect 1) shifts in defensive responsibility between pitchers and fielders and 2) shifts in defensive responsibility between positions.
For example, in 1895, an average defensive player at each position (FRAA = 0) would receive something very near to the following FRAR for a full season (132 games):
P - 9 (obviously no pitcher would play 132 games, but this shows the fielding importance of the position relative to other positions)
C - 44
1B - 16
2B - 43
3B - 32
SS - 47
LF - 30
CF - 31
RF - 14
In 1965 the FRAR for an average defensive player at each position for a 162 game season would be very near to these amounts
P - 8
C - 29
1B - 13
2B - 33
3B - 23
SS - 34
LF - 17
CF - 25
RF - 15
This readiness to shift fielding value around is one of WARP's potential points of superiority to win shares, which sticks to a constant set of "inherent weights" to distribute fielding value among the positions. The one change James acknowledges in the defensive spectrum involves the 2B and 3B, which he sees as switching places on the defensive spectrum. The "inherent weights" in the fielding win share system are
C - 19%
1B - 6%
2B - 16%
3B - 12%
SS - 18%
OF - 29% (James treats OF as one position and then uses the distribution of win shares among individual players to sort out the relative value of each outfield position, but we can estimate that CF will typically land between 2B and 3B and that LF and RF will fall between 3B and 1B).
Before 1920, the weights for 2B and 3B are reversed.
WARP shows us a shifting defensive spectrum over the history of the game, where win shares does not.
However, jimd's study of average OPS+ by position also suggests a shifting defensive spectrum over the history of the game, one in which the shifts are rather different from the ones WARP presents. I'll reproduce his famous table once again:
The premise here is that the defensive importance of a position is suggested by the amount of offense the management is willing to give up at a position in order to play a competent defender there.
Avoiding, for the moment, any question of the overall weight given to fielding value in any system, let me line up the defensive spectrum for the 1890s and the 1960s as represented by WARP, WS, and the OPS+ study. (Pitchers will be left out.)
1890s
W1 -- SS C 2B 3B CF LF 1B RF (Top three spots are 3+ times more valuable than bottom spot, 3B is twice as valuable as 1B)
WS -- C SS 3B CF 2B LF/RF 1B (Top 3 spots are 2 2/3+ times more valuable than bottom spot, 2B is twice as valuable as 1B)
OPS+ -- C SS/2B 3B 1B CF/RF LF (Top 3 spots have OPS+ below avg., 3B is 0, 1B-LF +6 to +10)
1960s
W1 -- SS 2B C CF 3B LF RF 1B (Top three spots are 2.2+ times more valuable than bottom spot, 3B & CF are almost twice as valuable as 1B)
WS -- C SS 2B CF 3B LF/RF 1B (Top 3 spots are 2 2/3+ times more valuable than bottom spot, 3B is twice as valuable as 1B)
OPS+ -- SS 2B C 3B CF LF RF/1B (Top 3 spots have OPS+ below avg., 3B is +4, RF/1B +11)
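A quick arithmetic check of the W1 ratios quoted in these two lists, using the full-season FRAR figures given earlier in this post (pitchers excluded):

# FRAR for an average defender over a full season, as listed above.
frar_1895 = {'C': 44, '1B': 16, '2B': 43, '3B': 32, 'SS': 47, 'LF': 30, 'CF': 31, 'RF': 14}
frar_1965 = {'C': 29, '1B': 13, '2B': 33, '3B': 23, 'SS': 34, 'LF': 17, 'CF': 25, 'RF': 15}

for label, frar in (('1895', frar_1895), ('1965', frar_1965)):
    ranked = sorted(frar.values(), reverse=True)
    top3_vs_bottom = (sum(ranked[:3]) / 3) / ranked[-1]
    print(label, round(top3_vs_bottom, 2), round(frar['3B'] / frar['1B'], 2))
# 1895: top three spots ~3.2x the bottom spot; 3B exactly 2x 1B
# 1965: top three spots ~2.5x the bottom spot; 3B ~1.8x 1B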
Parallel representations of the proposed defensive spectrums for other decades would show different discrepancies. One I am especially concerned about right now is 2B/3B pre 1930. OPS+ has these two positions always close in average offense, and they shift back and forth as to which is slightly higher or lower decade to decade, but WARP _always_ gives second base more defensive value. The treatment of pre-1930 first base is also a fraught issue, as is the relative importance of infield vs. outfield positions.
The big questions:
Where there are disagreements in these lists, which assessment should one accept, and why?
If one wants to use WARP or win shares, but trusts the OPS+ assessment more, how might one adjust the results of these systems?
If WARP's calculation of FRAA is run-based, is their estimate of FRAR also run-based, or is it offense-based (like the OPS+ study), or theoretical (like win shares)? If it is offense-based, what measure does it use and how does it get from offensive value to defensive value? If it is theoretical, what is the theory?
170. sunnyday2
Posted: May 11, 2006 at 12:13 PM (#2014383)
To my way of thinking, any system that (whether because of an assumption or a finding) rates an entire class of players as better than another entire class of players is of very little help in constructing a HoM that is fair to all eras--IOW if all of the SSs of one era are rated more highly than all of the SSs of another era (the worst of one era being better than the best of another). I don't know if WARP3 is quite this extreme but it is pretty much a foregone conclusion that most players today are "better" than most players of previous eras. The day Bert Campaneris (let's say) was born, he was already pretty much predestined to be better than Herman Long, pretty much no matter what.
171. TomH
Posted: May 11, 2006 at 12:35 PM (#2014389)
I agree with sunny's first sentence. However, we shouldn't lump WARP3 into this 'extreme' category, any more than we should label congressmen extremists even if they happen to be members of the "other" party. The simplest of evidence: the hitter and pitcher with the highest career WARP3 totals both played prior to 1940.
172. sunnyday2
Posted: May 11, 2006 at 12:47 PM (#2014396)
Well, I almost got my whole post into one sentence...
173. Chris Cobb
Posted: May 11, 2006 at 02:16 PM (#2014446)
Because _value_ is contextual, even though _merit_ is not entirely contextual, I don't believe that any comprehensive metric can produce a completely fair way of comparing the merit of players from different eras purely on the basis of value.
For that reason, I don't use WARP3, although I do look at the competition-strength adjustments in WARP2. For that reason, my ranking system always compares a player first to his contemporaries and only second to all other eligible players. What I care about in a comprehensive metric, therefore, is the extent to which it gives an accurate representation of value in context.
On the subject of the defensive spectrum, I have available to me three different views of the value of the defensive positions in context--WARP1, win shares, and OPS+ by position, and I am looking for reasons to accept or modify these views and their results.
I have been using win shares, with modifications to adjust the pitching/fielding division of defensive value, and I think that system has worked pretty well. But as we are having to make finer distinctions in the backlog, I am concerned, as in the case of Sisler/Beckley and Elliott/Boyer, that errors in the treatment, not of fielding as a whole but of the value of particular positions at particular times in the history of the game, may lead to the overrating or underrating of particular players. Win shares argues that the defensive spectrum has changed very little over time, but the OPS+ study argues otherwise. I know that WARP1 is designed to be more flexible on this point than win shares, but some study of WARP1 suggests that its representation of the defensive spectrum, although it is changing, does not agree with the findings of the OPS+ study.
I don't treat the OPS+ study as gospel, as its findings are influenced by the (variable) level of talent available at a position during a given decade, but still, overall, it tells a different story about the defensive spectrum than WARP does. I trust the OPS+ story as having significant validity, however, because I know how it is grounded in actual data. I'd like to know whether the WARP story is as well. If it is really grounded in the data, I could accept WARP1's results and weigh them equally with win shares'. Or, I could modify WARP's fielding assessments to fit more closely with OPS+, but that would be a lot of work, so I don't want to start in on that project unless I have a clear sense that it is warranted. Or I could just stick with win shares exclusively, and try to find ways to fine-tune its handling of the defensive spectrum or just make subjective adjustments where I think they are needed. I see how I could adjust the fielding spectrum pretty neatly in WARP1, but for win shares I don't.
Any thoughts?
174. ronw
Posted: May 11, 2006 at 03:48 PM (#2014555)
Personally, I don't do more than eyeball either fielding metric. The inconsistencies and limitations with each lead me to believe that fielding analysis is still pretty rudimentary.
1. I start with batting win shares, and move on from there to EQA, WARP1, OPS+ to get a feel for the player as a hitter.
2. I factor in intangibles (missed wartime, racism, minor league credit) and lump fielding and MLEs in with this. For fielding, I look at both WARP and WS, but do not rely on them, as they often are very different.
3. I compare the player to his contemporaries.
4. I compare the player to the remaining eligibles at his position (or positions, for guys like Tommy Leach).
5. I rank the highest ranked players at the positions against each other, paying particular attention to the question (If the HOM ended without this guy in, how would I feel about that?)
My system started much more sabermetric-based and has now returned to much more subjective analysis.
175. TomH
Posted: May 11, 2006 at 03:52 PM (#2014563)
One issue with using OPS+ (or OWP) as a guideline for typical performance by position is that it is an average measure, which can be skewed by a few superstars. While the presence of 1930s 1Bmen (Gehrig/Greenberg/Foxx) did indeed change the value of playing 1B, I don't think it changed it to the level that jimd's table posits, because there were still teams trying to find replacement level first basemen. So Foxx's RCAP, for example, underrates his true value IMHO.
As is true of most things in life, judicious blending of multiple systems often leads to a better answer than total reliance on one.
176. Paul Wendt
Posted: May 11, 2006 at 04:55 PM (#2014657)
if he had 2,3 other decent bats in his own lineup, he wouldn't be pitched around as much.
Along same lines as TomH but in other words:
This is a big obstacle for Mike Sweeney, et al, using traditional sabermetrics such as runs scored and batted in or the batting triple crown stats. But everyone with the sabermetric sophistication to consider this point uses measures of batting or offense that count bases on balls (or another on-base measure) heavily.
--
TomH: As to Win Shares using the 'unscaled' PF, I'd have to think more about that one. While you want to calculate the effects of not facing your own teammates if you wish to translate stats into normalized stats, I'm not positive that by using team Wins this doesn't already do some of that - you have a great team, you get to not face your mates, but then it becomes harder to generate an extra win since you are already at a WPCT where it takes more runs to get a marginal win.... need a study that I shan't take time to do right now.
TomH,
jimd(?) has done a relevant study, producing a gong that I and Chris Cobb, at least, have hammered upon. In my case, without assessing the relevant study closely, mainly because it confirms my prior judgments, guestimates, whatever.
Chris Cobb #63: There will be more win shares to go around to everyone. And any team that is great enough for their win shares to be inflated by their not having to play themselves is going to be well above average on both offense and defense.
The principle of diminishing returns kicks [in] at a higher level of team performance than the excess win shares for not having to play yourself. That starts to show up in an 8-team league, IIRC, at about .630. The effect isn't large at that level, but it is noticeable.
My memo to self says that this will be serious at about .700
(probably April? I don't date memos to self.)
jimd,
Is the famous table a result of your own study? Tom Ruane distributed a similar table to SABR-L in 1999 or so. I don't recall the basic measure of batting (OPS+ in your table) or the treatment of pitchers. I'll track it down if yours is distinct.
177. sunnyday2
Posted: May 11, 2006 at 05:19 PM (#2014696)
The problem I have with this analysis--not to discount it entirely, but mitigating it IMO--is the assumption that offense and defense represent a zero-sum game, that a better hitter is necessarily a poorer fielder. I would argue that the best athletes can do both and usually end up at SS or CF.
Besides, even if the theory holds up in the aggregate, does it really describe the specific set of players (outliers) that are under consideration here?
178. Dr. Chaleeko
Posted: May 11, 2006 at 05:23 PM (#2014703)
The bolded positions are LF, CF, and SS, where it appears the rubber is really meeting the road. The other positions are moving gently toward the mean for the most part, but these three are moving more rapidly over the past thirty years (esp. relative to where they started from). This makes sense, of course. The NL of the 1970s had zero HOM-level SS (assuming Concepcion isn't going in) and the AL wasn't much better, but MLB had some wonderful CFs (Lynn, Otis, Murcer, Gorman!!!) and LFs (Brock, Rice, Luzinski, White, Foster et al plus bits of Stargell, Yaz, and Williams). The 1990s was not a good era for high-powered CFs in general (Griffey, Bernie Williams, the lesser Edmonds and Jones years, Lofton, Dykstra, Van Slyke, Lance Johnson, Lankford) but it had a wealth of outstanding SS. Its LFs were pretty good but not as good as the 1970s generation (Belle, Alou, Greenwell, G Vaughn, Gant, Gilkey, Mack, the ends of Mitchell, Rickey, and Raines).
Anyway, so my larger question is what's it all mean? Is CF defense more important now? Might be, since everyone swings for the fences. Does that mean DPs are less important? Or does it mean that baseball isn't typecasting its SS in the Aparicio/Concepcion mold anymore? And why would LF move, but not RF?
179. Chris Cobb
Posted: May 11, 2006 at 06:08 PM (#2014774)
Dr. Chaleeko,
The long-term trends you bring out here are the sort of thing that make me wonder. Which systems are accounting for these shifts? Should we pay attention to them? To interpret these changes starting in the 1970s, I think it's good to look at the longer view, going back even to the 1940s:
On LF vs. RF -- I don't think there is clear evidence that these positions are tracking significantly differently over the longer term. LF is higher in the 1940s and 1950s: Is that a Musial/Williams effect? RF jumps ahead in the 1960s: is that Aaron, Robinson, Clemente, Kaline, et al? Until we have data for the 2000s, I don't think we'll be in a position to say whether the 1990s dip is a significant defensive divergence between the corners or not, although the fact that LF dropped despite having the best hitter of the decade playing there is suggestive.
On CF -- it sure looks like it is becoming a steadily more important defensive position. I'd guess that this trend was underway even as early as the 1950s, but it doesn't register because Willie, Mickey, and the Duke provide a counterbalancing spike in batting value for the position.
On SS -- It looks like shortstop was gradually becoming more important defensively, topping out in the 1970s, but with the 1980s following, the 1970s looks anomalously low. I'd guess a conjunction between defensive need and a temporary dearth of hitters. The 1980s weren't a less astroturfy decade than the 1970s and were not a big home run decade either, and the rebound of shortstop OPS+ then suggests that the 1970s just lacked good shortstops. If SS OPS+ continues to rise for the 2000s, then I think we can have some confidence in a long-term, gradual decrease in defensive demands on shortstops starting in the 1980s. If the aughts drop back down, then the 1990s will appear as a temporary spike due to a confluence of great players at the position.
180. sunnyday2
Posted: May 11, 2006 at 06:13 PM (#2014779)
I think the numbers represent certain persons/personalities as much as they represent an aggregate, as has been said.
And, secondly, just as we think Gavvy Cravath should have been in the MLs more than he was (but the powers-that-be-at-the-time didn't) and just as (as Bill James says) there is no reason in the world why Ryne Sandberg didn't bat third and Mark Grace second (except raw prejudice), so too perhaps this part of baseball reality is affected by decisions that are made by people who didn't have the information or the mindset that we have today. Doesn't make us right and them wrong, but it doesn't make them right either. So IOW maybe these numbers just reflect changing prejudices rather than changing realities.
IOW there were always guys like ARod and Vern Stephens and Ripken and Yount who could hit AND field SS better than the alternatives. It's just that through most of history the decision-makers couldn't believe their own eyes and they just had to have the slap hitter at SS and the big bat somewhere else.
181. sunnyday2
Posted: May 11, 2006 at 06:14 PM (#2014780)
IOW it is posited that fielding ability is the independent variable and hitting is dependent on that.
I'm saying prejudice is the independent variable, which would be another way of saying that the numbers reflect self-fulfilling prophecies.
182. TomH
Posted: May 11, 2006 at 07:23 PM (#2014892)
Marc, that one hurts my brain too much just contemplatin it
183. JoeD
I posted this over on the 1977 ballot discussion, where I'm going through my new pitcher system. It's pretty relevant to this discussion, but I think we've been approaching the WS replacement level issue all wrong . . . here goes:
Yeah, good point, I didn't think I was making the replacement level that high - wow. 15 WS is nowhere near replacement level, I mean Joe Charboneau's rookie of the year season was 15 WS. Even if it was only 131 games, there's no way if he plays every day that season is only 3.5 WS above replacement.
Wait a minute - I'm not doing that. Setting pitching at a .385 WPct assumes an average offense. A team with an average offense and replacement level pitching would lose 100 games - that's what I'm setting the replacement level at.
I've always felt a full-time replacement player would get about 8 WS (7 with the DH) and a 220 IP pitcher at replacement level would get about 7 WS. I was trying to err to the side of not setting it too low - I should have thought it through under those terms. I know that shows pitching replacement being a little higher than position player replacement level but I don't think that's unreasonable.
Let me think that through again though. A true replacement level position player will hit at replacement level but field at average level.
An average team in a 162-game season: 116.6 offensive WS, 41.1 D and 85.3 P Win Shares. That gives you 19.7 WS for an average position player in 162 games in a non-DH league, 18.1 in a DH league and 12.9 for an average pitcher. Drop that to 18.7 for a 154 game season position player and 12.25 for a 154 game season average pitcher (that's over 209 IP). BTW as a side note that should dispel any myth that Jake Beckley was an average player - he was in the 20-27 WS range prorated to 154-game seasons over his career, not in the teens, and he wasn't playing every single game every year. Sorry for the digression, but it's an important point.
If the team remains average on offense and fielding, it would take 28.3 pitching WS to drop them to 100 losses. So I'm setting my replacement equivalent to 4.3 pWS over 220 IP being replacement level. That's probably too low.
If 7 WS per 220 IP is replacement level, then a team with average hitters and fielders would win 68 games with a replacement level staff. That would mean setting .420 as replacement level, assuming an average offense.
Now lets reverse it and see what a replacement level hitter would do to a team with average defense/pitching.
Setting replacement level hitters to where a team with average pitching and fielding loses 100 games, means 59.6 bWS. That sets hitters at 7.5 WS + 5.1 for fielding or 12.6 WS per 162 games - 11.8 in a DH league.
Bumping it up to the 68 win mark like we did for pitchers would make their replacement level 14.8 WS, or 13.8 were it a DH league. That's too high.
What to conclude from this - pitching replacement level - in terms of the record of a team with all replacements as pitchers and average everywhere else - is probably higher than position player replacement level - at least under the WS system.
There's no way pitching replacement level is as low as 4 WS/220 IP. And there is no way that position player replacement level is 15 per 162 games. Just look at some 15 WS position player seasons if you don't believe me. And take a look at how bad you have to be to get 4 WS in a 220 IP season.
There's a logical explanation for this apparent paradox, IMO. It's that WS only gives 1/3 of the credit (35.1% to be exact) to the pitchers. Combine that with the fact that no one at the major league level with any significant time is a replacement level fielder AND hitter, and that's what you get.
It doesn't surprise me at all that to get their replacement levels equivalent (for a full time player or a 220 IP pitcher) on a per player level, a team with replacement level hitters dragging down 48% of the team would do worse than a team with replacement level pitchers only dragging down 35.1% of the team.
So here's where I'm going. If you set a .225 team at replacement level (a little worse than the 1962 Mets), with all things equal (batting and pitching replacement level, fielding average) you get:
Hitters 39.4 WS, Fielders 41.1 WS, Pitching 28.8 WS. That sets position players at 10.1 WS, (9.5 in a DH league) and pitchers at 4.4 WS.
Think about it though - James was wondering why pitchers came out too low. Hell most of us think that. That's why - when you adjust for replacement level the pitchers get the boost they are in need of.
BTW, that's probably too low, but it's where you are if you set the pitchers and hitters as equally bad.
If you want to bump the pitchers to 6 WS being replacement level you get a team at .278 WPct (45-117). I think that's fair - the 1962 Mets certainly had some players that were way below replacement level - certainly many players that couldn't get time in other organizations were better than what the Mets put out on the field that first season.
So there you have it. I'm going to be setting my pitcher replacement level to 6 WS per 220 IP. My position player replacement level becomes 11.9 in a non-DH league, 11.2 in a DH league. Over a 154 game season, I'll go with 11.3/10.6/5.7 (the 5.7 is over 209 IP, not 220, in the shorter season).
What does this mean for my expected team WPct to use in this massive pitcher spreadsheet? Well a team with an average offensive and defense, and pitchers that pull in 6 WS per 220/IP would win 65.8 games, or play at a .406 clip. That means that I'll be using 5.48 as my replacement level aDERA - which is the equivalent of 6 WS in a 220 IP season.
Having looked at it this way, I'm pretty surprised that the replacement level for a position player comes out that high, but it makes sense. I mean an average season is generally about 2-2.5 WARP. A replacement player is about -2 to -2.5 in TPR. If an average position player gets about 19.7 WS in a full 162 game season, it would make sense that a replacement player would be about 8 WS below that.
I can't believe it took me 4-5 years of working with WS to approach it this way, thanks for triggering it jim!
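A compact restatement of the arithmetic in the post above, under the same assumptions it uses (3 WS per win, a 48 / 35.1 / 16.9 percent offense/pitching/fielding split for an average team, pitchers earning essentially no batting WS, and roughly 6.6 "slots" of 220 IP on a pitching staff):

# Average-team baseline and the .278 "all-replacement" team sketched above.
GAMES, WS_PER_WIN = 162, 3
PITCHER_SLOTS = GAMES * 9 / 220            # ~6.6 blocks of 220 IP per team

avg_team_ws = (GAMES / 2) * WS_PER_WIN     # 243 WS for an 81-81 team
bat = avg_team_ws * 0.480                  # 116.6 offensive WS
pit = avg_team_ws * 0.351                  # 85.3 pitching WS
fld = avg_team_ws - bat - pit              # 41.1 fielding WS
print(round((bat + fld) / 8, 1))           # 19.7 -- average position player, non-DH league
print(round(pit / PITCHER_SLOTS, 1))       # 12.9 -- average pitcher per 220 IP

# Replacement team at .278 (45-117): average fielding, 6 WS per 220 IP of pitching.
repl_team_ws = 45 * WS_PER_WIN             # 135 WS
repl_pit = 6 * PITCHER_SLOTS               # ~39.8 pitching WS
repl_bat = repl_team_ws - fld - repl_pit   # ~54 batting WS left for the hitters
print(round((repl_bat + fld) / 8, 1))      # ~11.9 -- replacement position player, non-DH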
184. Dr. Chaleeko
Posted: May 27, 2006 at 05:05 AM (#2039658)
If you want to bump the pitchers to 6 WS being replacement level you get a team at .278 WPct (45-117).
Joe, I was hoping you'd end up around a .278 team. That feels more like the magic number to me. Sometimes I see where people set the number of games a replacement team would win somewhere in the mid-50s, but when you think a sec on it, that's too high. Why? Because real-life teams actually win in the 50s.
That really bad Tigers team a couple years back was the worst thing any of us has ever seen that wasn't an expansion team, a team from Philadelphia, or a Ron Howard film. They won just 43 games. BUT the year before, Detroit, Tampa, and Milwaukee all won 55-56 games.
A year after the disaster in Detroit, the DBacks won 51 and the Royals 58. So teams win only 50-59 games with some frequency. Not necessarily often, but it's far from unheard of. Teams win less than 50 games almost never. It's always felt to me like an all-replacement team should be a once-in-a-lifetime event (or near to it) because no non-expansion team making even a pretense of competing or rebuilding should end up below fifty wins unless absolutely everything went wrong, just like it did with the 2003 Tigers.
I'm mostly pleased to see that my own stylistic preference on the matter is upheld in some fashion by an independent third party's thinking.
185. Paul Wendt
Posted: June 04, 2006 at 06:16 PM (#2050723)
by Paul Wendt, 2006 June 4
Big seasons measured by Win Shares: a Decennial Census
This report, decennial from 1910 to 2000, begins with 1892 and 1899, the first and last 154-game seasons of the 19th century. In 1890 three 8-team leagues, and in 1900 just one, played 140-game seasons, making the scope of the major leagues too different from the 20th-century norm for comfort.
Columns 'Tot' and 'Avg' give the number of 20- or 30-Win Share players in the major leagues and the number per team. Columns 'oth' and 'P' give the distribution between non-pitchers and pitchers. Players are player-team-seasons, the work by one player with one team in one season.
It seems that the frequency of 20-Win Share pitchers decreased sharply between 1899 and 1910 and steadily in the second half of the 20th century. 30-Win Share pitchers were practically extinct by 1950.
The frequency of 20+ seasons by pitchers has decreased by about 80% since the deadball era, from about one per team (30/32 in 1910 and 1920) to one per division (11/56 in 1990 and 2000). We know that has happened mainly by a decrease in workload, but the frequency of those seasons has decreased for other players, too. The next table examines the non-pitchers more closely. The focal number of Win Shares is 19 rather than 20 through 1960 because a complete team-season was 154 rather than 162 games played to a decision.
Big seasons by Win Shares:
19 Win Shares to 1960; 20 Win Shares from 1970
The correction confirms a decrease in frequency of big seasons for non-pitchers in the second half of the 20th century, perhaps specifically during the expansion era, for it is not clear that the number ever differed significantly from 2.5 per team in the 154-game epoch.
(The correction also highlights a decrease in big seasons for pitchers in the 1920s rather than a steady decrease in the first half of the 20th century.)
186. TomH
Posted: June 05, 2006 at 12:08 PM (#2051631)
OF course, the DH in the AL also makes it harder for non-P to accumulate WS. Kind of like if MLB changed to a 10-man batting order; the players would be as good, but getting fewer chances to bat sure would make it harder to get 200 hits, 40 HR and 25 WS.
187. Paul Wendt
Posted: June 05, 2006 at 02:29 PM (#2051705)
Regarding this approach to evolution (player-season win shares statistics, both the ratings covered here and the rankings not covered here), I have more than ample notes on starting pitchers as well as pitchers; only meagre notes on designated hitters. The player-season approach works for designated hitters but not for the effects of designated hitting. In contrast, it picks up the effects of relief pitching when implemented for starting pitchers.
I don't believe the effect of designated hitting --the effect of AL use of its option and practically mandatory use of the DH by every AL team-- is so great as TomH suggests. Indeed, I suppose it's much smaller than the would-be effect of adopting 10-batter or 8-batter lineups. If I'm right, it's partly because the DH helps some of the best batters play more games. And partly because the DH replaces some PH; pitchers didn't do 10% of major league batting before 1973.
But the qualitative point is valid: "Of course, the DH in the AL also makes it harder for non-P to accumulate WS." I don't know how to incorporate it.
Does the Win Shares system give a share to eight players for fielding that is fixed for all time?
? one third of 52%, which is 1.04 win shares per game played to a decision
Some people here modify WS informally, at least, to give more credit for fielding in early days. Such variation makes it easier for the non-pitchers to accumulate WS, per game.
188. Paul Wendt
Posted: June 05, 2006 at 02:34 PM (#2051710)
If I'm right, it's partly because
- the DH helps some of the best batters play more games.
- the DH replaces some PH; pitchers didn't do 10% of major league batting before 1973.
and partly because
- the new player bats but doesn't field; the fielding credit is still divided eight ways.
maybe
- the new player (not necessarily the DH) is below average in batting quality
189. JoeD
Paul there are two ways to do it, IMO.
1) Set replacement level about .7 WS per season lower for leagues with a DH. See my post #183 on this thread for more detail on how I got that number. This is for those of us who subtract a replacement level from WS.
2) Multiply the offensive WS for players in a DH league by a factor (I'd guess 1.05), and then WS don't add up to 3*wins. I don't have a problem with that, because by adding the DH, you are artificially adding offense to the league. Since WS measure offense and defense, it would make sense that things wouldn't add up if you artificially add 5% offense to a league.
190. DanG
Posted: June 07, 2006 at 03:21 PM (#2055152)
A year after the disaster in Detroit, the DBacks won 51 and the Royals 58. So teams win only 50-59 games with some frequency.
Teams playing under .350, 1980-2005
Team         Lg  Year  W   L    Pct
Detroit      AL  2003  43  119  0.265
Arizona      NL  2005  51  111  0.315
Detroit      AL  1996  53  109  0.327
Florida      NL  1998  54  108  0.333
Baltimore    AL  1988  54  107  0.335
Atlanta      NL  1988  54  106  0.338
Tampa Bay    AL  2002  55  106  0.342
Detroit      AL  2002  55  106  0.342
Milwaukee    NL  2002  56  106  0.346
Kansas City  AL  2005  56  106  0.346
Toronto      AL  1981  37  69   0.349
191. Dr. Chaleeko
Posted: June 07, 2006 at 03:50 PM (#2055182)
Thanks DanG. That seems like a pretty reasonable number of extremely bad teams; about once every two-three years (on average) a team wins under 60 games. If replacement level should be an extremely infrequent cataclysm of badness, then .278 works well.
192. sunnyday2
Posted: June 07, 2006 at 04:41 PM (#2055211)
As bad as the Royals are/have been, they've got a long way to go to be as bad as the Tigers were. Has any other team been under .350 3 times in 8 years?
But note also: 1980-1987--once (1/8)
1988--twice.
1989-1995--none (0/7)
1996-2005--eight (about 1/1). That old saw about competitive balance should conclude: But it's over now. Is it about money? Or is it about non-baseball owners who think they know something about baseball? Or both? Lemme see: Detroit, Arizona, Florida, Tampa, Milwaukee. Sometimes it's about money.
193. Dr. Chaleeko
Posted: June 07, 2006 at 04:44 PM (#2055215)
A long way to go, but the Rs are on pace for a 40.5-win season. I should, however, note that they are streaking, having won 3 of their last 10.
194. DanG
Posted: June 07, 2006 at 06:20 PM (#2055280)
Second time's the charm?
18 teams played under .370 (a 60 win pace) from 1980-2005
Team         Lg  Year  W   L    Pct
Detroit      AL  2003  43  119  0.265
Arizona      NL  2005  51  111  0.315
Detroit      AL  1996  53  109  0.327
Florida      NL  1998  54  108  0.333
Baltimore    AL  1988  54  107  0.335
Atlanta      NL  1988  54  106  0.338
Tampa Bay    AL  2002  55  106  0.342
Detroit      AL  2002  55  106  0.342
Milwaukee    NL  2002  56  106  0.346
Kansas City  AL  2005  56  106  0.346
Toronto      AL  1981  37  69   0.349
Cleveland    AL  1991  57  105  0.352
Pittsburgh   NL  1985  57  104  0.354
Kansas City  AL  2004  58  104  0.358
New York     NL  1993  59  103  0.364
Detroit      AL  1989  59  103  0.364
Seattle      AL  1980  59  103  0.364
Chicago      NL  1981  38  65   0.369
195. Paul Wendt
Posted: June 08, 2006 at 04:31 PM (#2056221)
TomH #86 (186): OF course, [I've seen that before and I always read it first as "outfield course"] the DH in the AL also makes it harder for non-P to accumulate WS. Kind of like if MLB changed to a 10-man batting order; the players would be as good, but getting fewer chances to bat sure would make it harder to get 200 hits, 40 HR and 25 WS.
JoeD #89: Paul there are two ways to do it, IMO.
1) Set replacement level about .7 WS per season lower for leagues with a DH. See my post #183 on this thread for more detail on how I got that number. This is for those of us who subtract a replacement level from WS.
2) Multiply the offensive WS for players in a DH league by a factor (I'd guess 1.05), and then WS don't add up to 3*wins.
In #87-88, I gave ways the change from no-DH to DH differs from a simple change in the number of batters. (The historic change in MLB is also complicated by happening in only one of two leagues, so that it may also have a class of effects that I didn't give above, operating through the allocation of personnel to the two leagues.)
Anyway, it would be unreasonable to inflate batting win shares by 9/8 (1.125) as an adjustment for playing in a DH league. JoeD guesses 1.05. Inflation of bWS by 1.06 to 1.08 is close, on average, to inflation of total win shares by 162/154, which is how I handled the difference in length of schedule before, after, and during 1961. For non-pitchers, it may be that Win Shares are about as "easy" to earn in the AL from 1973 as in the 154-game epoch.
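An illustrative check of the 1.06-1.08 figure, assuming a typical position player whose batting win shares are roughly three-quarters of his total (the individual figures below are hypothetical):

# If total WS get inflated by 162/154, what factor on batting WS alone
# (fielding WS held fixed) produces the same season total?
bws, fws = 20.0, 6.5                         # hypothetical batting / fielding win shares
target_total = (bws + fws) * 162 / 154       # schedule-length inflation of the total
print(round((target_total - fws) / bws, 3))  # ~1.07, inside the 1.06-1.08 range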
196. jimd
Posted: June 16, 2006 at 01:14 AM (#2065584)
many voters disagree with WARP's defensive assessment and are more inclined toward the somewhat more conservative WS assessment
That doesn't mean that they are correct to do so.
As a part of my ranking system, I've constructed MLB "All-Star" teams for every season through 1975 (top 2xN players, where N is the number of teams). For the 16 team era (1901-1960), there are 2081 player-seasons selected (34.7 per year, more than 32 due to ties). 752 were outfielders, 495 were infielders (SS,3B,2B). One would expect these two totals to split close to 50/50 because there is no reason for outfielders to be more valuable than infielders in any given season.
Can a Win Shares defender explain this highly significant split to me? (Please don't use differences in games played, unless you are prepared to show me that top infielders miss about 20 games per season more, on average, than do top outfielders.)
My explanation is that Win Shares does not give enough fielding credit to the infield positions.
(Reposted from the 1979 discussion thread)
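For what it's worth, a quick significance check of that 752/495 split, under the null hypothesis that an "All-Star" selection is equally likely to come from the outfield or the infield (normal approximation to the binomial):

import math

of_sel, if_sel = 752, 495                   # jimd's 1901-1960 "All-Star" counts
n = of_sel + if_sel
z = (of_sel - n / 2) / math.sqrt(n * 0.25)  # standard deviations above a 50/50 split
print(round(z, 1))  # ~7.3 -- far too large to be a local talent-glut fluctuation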
197. Chris Cobb
Posted: June 16, 2006 at 01:25 AM (#2065601)
jimd,
Excellent data!
I would guess that, if one studied the matter, one would find that top infielders miss about 10 games per season more, on average, than top outfielders, though I don't have data to support that guess, at least not yet. However, I would argue that, if that were the case, infielders ought to receive an adjustment for that, like catchers but on a smaller scale, since the missed time is presumably a consequence of the demands of the position.
Have you similarly constructed MLB "All-Star" teams using WARP1 (using some iteration or other)? Are its teams more balanced between IF and OF than win shares'?
198. Brent
Posted: June 16, 2006 at 05:26 AM (#2065784)
A reason that infielders might be underrepresented on All Star teams is a greater propensity for injuries. In one of the Bill James Abstracts he looked at the future careers of rookies with various characteristics and found that rookies at certain positions (second base, third base, and catcher if my memory is correct) tend to have shorter careers because of injuries.
The underrepresentation could be even larger if teams recognize the risk and move their up-and-coming star second baseman or catcher to the outfield to reduce the chances of injury.
I'm suggesting that there may be natural reasons why certain positions are underrepresented among star players. We know that third basemen are underrepresented in both the HoM and the HoF, and after looking carefully at the candidates I have to think that there just haven't been as many great players at third base as at the other positions.
199. jimd
Posted: June 16, 2006 at 08:47 PM (#2066252)
Have you similarly constructed MLB "All-Star" teams using WARP1 (using some iteration or other)? Are its teams more balanced between IF and OF than win shares'?
Yes I have. But I haven't completed the positional counts yet. Early next week some time I would guess.
Top 32 players each season (including ties),
broken down by position, aggregated by 20 year intervals.
200. jimd
Posted: June 16, 2006 at 08:55 PM (#2066260)
The underrepresentation could be even larger if teams recognize the risk and move their up-and-coming star second baseman or catcher to the outfield to reduce the chances of injury.
At least 20% of all infielder star candidates would have to be moved to cause this large imbalance. It implies that a significant number of outfield stars would have evidence of great infield play in the middle/high minors. Does this sound reasonable? And why not Collins, Baker, Hornsby, Frisch, Mathews?
The third big issue with WARP is FRAR in WARP1. It's another big black box that has tremendous effects on the rankings and that gives results that are not intuitively reliable.
Because of these issues, I don't rely on WARP2 or WARP3 at all, and I don't rely heavily on WARP1 as a comprehensive metric.
That said, I find several components of WARP to be quite valuable. EQA is a handy improvement, imo, on OPS+ as a batting value rate stat. DERA is a very substantial improvement on ERA+ as a pitching rate stat, with results that can be cross-checked by studies of team fielding in relation to pitching. FRAA are at least as reliable as FWS at assessing fielding quality.
In general, I think WARP does a much better job than win shares of handling the changing relationship of pitching and fielding in the creation of defensive value as the conditions of the game change (especially before 1900!). However, there are problems in the ways they turn their rate measures into comprehensive metrics that prevent me from using WARP as the foundational system for my rankings.
Now until recently, I was working with much older editions of TB (1996), but I also haven't yet checked the "new" 2004 edition to see how it handles fielding. Does anyone know how much the FR calculations have changed? Do they better reflect team-contexts? Or the historical relationships of the positions and of the pitcher/defense balance?
(Mattingly 7, of course.)
I do use WARP3 because unlike Win Shares it has a timeline and schedule adjustment. There are players for whom I trust WARP over WS (Bobby Veach, Earl Averill) but usually if push comes to shove I will use WS with a few random adjustments for mistakes I believe they make.
At the same time I love EqA and RARP (VORP a little less so) and think that they really have the cutting edge on modern baseball statistics in every category but fielding, with UZR far better than FRAR or FRAA. It seems to me that they have these nice metrics for modern baseball and decided to trace them back through time. Maybe if a guy with a deeper understanding of baseball history went through and re-did WARP it would look much better. Or maybe not.
The differences lie in how the base rate is determined, and the exact adjustments made.
I do look at WARP1 relative to peers in the same era, and look at FRAA, but that's it for BP measures.
Now until recently, I was working with much older editions of TB (1996), but I also haven't yet checked the "new" 2004 edition to see how it handles fielding. Does anyone know how much the FR calculations have changed? Do they better reflect team-contexts? Or the historical relationships of the positions and of the pitcher/defense balance?
Fielding Runs have been reworked quite extensively, and seem to be much better. I think if you take FWAA, Win Shares/1000 Innings Fielding, and Fielding Runs all together, you can get a good handle on a player's fielding abilities.
This is true, and it is nearly impossible (I think) to correlate FWS with FRAR.
Take a fielder (like Lou Boudreau) with 54 FRAR in 1943. Using a very simplified model, suppose 9 runs = 1 win (which is basically what WARP does). That means Lou's fielding was 6 wins above replacement. In WS lingo, that's 18 FWS. That's just for fielding and that's supposed to be above replacement. We don't know what FWS replacement for a shortstop is, but if it is anything north of 0 FWS, then Boudreau would be entitled to more than 18 FWS to make it equivalent to the WARP number. The highest number of FWS ever recorded for a shortstop was 12.83 (it wasn't Boudreau).
Let's take the reverse. Boudreau had 8.8 FWS in 1943, which is 2.9 wins, which is 26.4 runs. So to equate that to an FRAR of 54, you'd have to assume that replacement level in FWS was negative 27.6 runs, which is -3.07 wins, which is -9.2 FWS, which of course, is impossible.
Another way to look at this, using a method tailored to the number of games/innings. The average shortstop in 1943 AL had 3.89 FWS in 91.7 games. Using 8.8 innings per game (just for kicks), you get 807 defensive innings, which means the FWS rate was about 4.82 FWS per 1000 innings, for an average SS.
Boudreau played in 152 games, or 1338 innings, which means the average shortstop in the FWS system would have had 6.45 FWS (4.82*1.338), or 2.14 wins, or 19.34 runs.
BP has Boudreau with 54 FRAR and 22 FRAA, which means the average player with the same playing time as Boudreau saved 32 more runs than a replacement player with the same playing time as Boudreau. If an average shortstop with the same playing time is 32 runs better than replacement with the same playing time, and the average player has 19.34 runs derived from the FWS system, then replacement must be -12.6 runs, or negative 1.4 wins, or -4.2 FWS. Again, that's impossible in the WS system.
Also, if the average shortstop with Boudreau's playing time would have had 6.45 FWS and he had 8.8, then he has FWSAA of 2.35, which is .78 wins, which is 7 runs above average (as opposed to a FRAA of 22).
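To make the unit conversions above easy to check, here's a minimal sketch in Python, assuming the same simplifications used in the example (9 runs per win, 3 Win Shares per win); the Boudreau figures are the ones quoted above.

RUNS_PER_WIN = 9.0     # the simplified rate assumed in the example above
WS_PER_WIN = 3.0       # by definition in the Win Shares system

def runs_to_ws(runs):
    # convert a run total to its Win Shares equivalent
    return runs / RUNS_PER_WIN * WS_PER_WIN

def ws_to_runs(ws):
    # convert Win Shares back to runs
    return ws / WS_PER_WIN * RUNS_PER_WIN

frar = 54    # Boudreau's 1943 FRAR per BP
fws = 8.8    # Boudreau's 1943 fielding Win Shares

print(runs_to_ws(frar))                     # 18.0 FWS-equivalent
print(ws_to_runs(fws))                      # 26.4 runs
print(runs_to_ws(ws_to_runs(fws) - frar))   # implied FWS replacement level: about -9.2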
Although they certainly don't equate, you can come a lot closer to deriving FRAA from FWS than you can FRAR. BP says it sets replacement as the lowest runs calculated at the position for that season, but of course, we don't know how to calculate it.
Certainly BP is using some sort of linear weights system for fielding, with tweaks, which is by its nature going to produce different results than FWS Claim Points.
I think if you take FWAA, Win Shares/1000 Innings Fielding, and Fielding Runs all together, you can get a good handle on a player's fielding abilities.
That's about all you can do, and compare them to comparables at the same position (within the same WS or WARP system).
"The Win Shares system is vastly more conservative in measuring the differences among third basemen, or players at any defensive position, than is Linear Weights."
"Linear Weights in 1952 rates Ed Yost at -35 runs, while rating Fred Hatfield of Detroit...at +19 runs. I think it is better not to assert that there is a 54-run difference between two third basemen -- a 40 homer difference -- without very solid evidence that such a gulf actually exists."
"But a 54-run difference would be equivalant to a swing of about 17 Win Shares. I do not now believe that this is a realistic estimate of the defensive impact of a third baseman. Our system would normally evaluate the difference between the league's best defensive third baseman and the worst at something more like four to five Win Shares." ...or 12-15 runs.
Three more things I noticed about WARP today, relating to hitting:
1. No positive credit is given for sacrifice hits, unlike RC which gives .50 credit. However, WARP does include sacrifice hits in the number of outs (and so does RC). That would seem to lower the BRAR and BRAA of the "little" hitters.
2. Grounding into a double play does not count as an additional out in the WARP system, but it does in runs created. That would seem to increase the BRAR and BRAA of the slow/ground ball hitters.
3. WARP uses the same formula for hitting regardless of the era, so hits, total bases, walks and steals are all worth the same throughout eras. At first I thought maybe they make up for that in moving from WARP1 to WARP2, but I don't think so, because with the translation to WARP2 they are only taking into account league difficulty.
Runs Created has some modifications tailored to era, which is why there are 24 different formulas.
"But a 54-run difference would be equivalant to a swing of about 17 Win Shares. I do not now believe that this is a realistic estimate of the defensive impact of a third baseman. Our system would normally evaluate the difference between the league's best defensive third baseman and the worst at something more like four to five Win Shares." ...or 12-15 runs
In the "new" Fielding Runs Yost is -30 with a 78 (100=ave) Fielding Rating, while Hatfield is a +12 with a 109 Fielding Rating.
Back in January, I examined the 2004 Hall of Fame ballot through the lens of Baseball Prospectus' Davenport Translated player cards. The idea was to establish a new set of sabermetric standards which could help us separate the Cooperstown wheat from the chaff, especially since Bill James' Hall of Fame Standards and Hall of Fame Monitor tools have reached their sell-by date. After all, the Hall has added 26 non-Negro League players since James last revised those tools in 1994's The Politics of Glory, and we've learned a lot since then.
These new metrics enable us to identify candidates who are as good or better than the average Hall of Famer at their position. By promoting those players for election, we can avoid further diluting the quality of the Hall's membership. Clay Davenport's Translations make an ideal tool for this endeavor because they normalize all performance records in major-league history to the same scoring environment, adjusting for park effects, quality of competition and length of schedule. All pitchers, hitters and fielders are thus rated above or below one consistent replacement level, making cross-era comparisons a breeze. Though non-statistical considerations--awards, championships, postseason performance--shouldn't be left by the wayside in weighing a player's Hall of Fame case, they're not the focus here.
Since election to the Hall of Fame requires a player to perform both at a very high level and for a long time, it's inappropriate to rely simply on career Wins Above Replacement (WARP, which for this exercise refers exclusively to the adjusted-for-all-time version, WARP3). For this process I also identified each player's peak value as determined by the player's WARP in his best five consecutive seasons (with allowances made for seasons lost to war or injury). That choice is an admittedly arbitrary one; I simply selected a peak value that was relatively easy to calculate and that, at five years, represented a minimum of half the career of a Hall of Famer.
This oversimplification of career and peak into One Great Number isn't meant to obscure the components which go into that figure, nor should it be taken as the end-all rating system for these players. We're looking for patterns to help us determine whether a player belongs in the Hall or doesn't and roughly where he fits. Though this piece is founded on the sabermetric credentials of Hall of Fame candidates, I've also taken the trouble to wrangle together traditional stat lines for each one, including All-Star (AS), MVP and Gold Glove (GG) awards as well as the hoary but somewhat useful Jamesian Hall of Fame Standards (HOFS) and Hall of Fame Monitor (HOFM) scores.
The career and peak WARP totals for each Hall of Famer and candidate on the ballot were tabulated and then averaged [(Career WARP + Peak WARP) / 2] to come up with a score which, because it's a better acronym than what came before, I've very self-consciously christened JAWS (JAffe WARP Score). I then calculated positional JAWS averages and compared each candidate's JAWS to those enshrined.
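For anyone who wants to run the JAWS arithmetic themselves, here's a minimal Python sketch of the definition as described above (peak = best five consecutive seasons, JAWS = average of career and peak); the season line in the example is invented purely for illustration.

def peak_warp(seasons, window=5):
    # best sum over any `window` consecutive seasons
    if len(seasons) <= window:
        return sum(seasons)
    return max(sum(seasons[i:i + window]) for i in range(len(seasons) - window + 1))

def jaws(seasons):
    career = sum(seasons)
    return (career + peak_warp(seasons)) / 2.0

# hypothetical WARP3 season line, just to show the mechanics
seasons = [2.1, 5.4, 7.8, 9.2, 10.1, 8.7, 9.5, 6.0, 4.2, 2.8]
print(round(jaws(seasons), 1))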
It should be noted that I simply followed the Hall's own system of classifying a player by the position he appeared at the most. Thus, for example, Rod Carew is classified as a second baseman, and all of his numbers count towards establishing the standards at second, even though he spent the latter half of his career at first base. This is something of an inevitability within such a system, but if the alternative is going nuts resolving the Paul Molitors and the Harmon Killebrews into fragmentary careers at numerous positions, we'll never get anywhere.
By necessity I had to eliminate not only all Negro League-only electees, who have no major league stats, but also Satchel Paige and Monte Irvin, two great players whose presence in the Hall is largely based on their Negro League accomplishments. Other Negro Leaguers, such as Jackie Robinson, Roy Campanella and Larry Doby, have been included. While their career totals are somewhat compromised by not having crossed the color line until relatively late in their careers, their peak values--especially Robinson's--contribute positively to our understanding of the Hall's standards.
Here are the positional averages, the standards, to which I'll refer throughout the piece.
POS # BRAR BRAA FRAA WARP PEAK JAWS
C 13 406 197 61 94.8 41.3 68.1
1B 18 717 465 2 98.2 43.1 70.7
2B 16 558 255 70 99.0 41.9 70.4
3B 10 594 322 48 100.2 42.2 71.2
SS 20 411 136 77 100.5 43.2 71.9
LF 18 730 462 -8 103.8 42.8 73.3
CF 17 694 445 14 108.8 46.5 77.6
RF 22 754 482 33 110.2 43.3 76.8
CI 28 673 414 18 98.9 42.8 70.8
MI 36 476 189 74 99.8 42.6 71.2
IF 64 562 287 49 99.4 42.7 71.1
OF 57 729 465 15 107.8 44.1 75.9
Middle 66 519 257 56 101.1 43.3 72.2
Corners 68 714 449 16 103.9 42.9 73.4
Hitters 134 618 354 36 102.5 43.1 72.8
A quick breeze through the other abbreviations: BRAR is Batting Runs Above Replacement, BRAA is Batting Runs Above Average; both are included here because they make good secondary measures of career and peak value. FRAA is Fielding Runs Above Average, which is a bit less messy and more meaningful to the average reader than measuring from replacement level.
It's worth noting that these figures have changed somewhat since the last time around, as Davenport has continued to revise his system--particularly the defensive elements--and adjust appropriately for the way the game has changed over 135 years of major-league history. Most notably, the spread between the average JAWS scores at various positions has been cut in half, which I interpret as a sign that the system's biases have been reduced. So without further ado, we'll move on to the 2005 Hall of Fame ballot.....
Anyone care to do it for our guys . . . if I get free time I'll take a stab at it, but I have no idea where I'd be.
It'd be interesting to see our average electee according to that, and maybe an average of our bottom 3, since we don't have the mistakes of the Hall of Fame. Also it's cool to see both the peak and career numbers . . .
should be:
"but I have no idea when that'd be"
very strange when the subconscious mind takes over for the conscious one . . .
Anyone care to do it for our guys . . . if I get free time I'll take a stab at it, but I have no idea where I'd be.
I have this data somewhere for WS and WARP1. It is the baseline for my WARP and WS evaluations. I'll try to find the spreadsheet and --- and what? How do I post it?
*A caveat with the WARP stuff in the spreadsheet. I did this spreadsheet before the first election because establishing the HoF baseline underlies about 7/8 of my HoM rating system. BP has changed the WARP calculations several times since I first did the sheet a couple of years ago, so I suspect the numbers in the sheet are not perfectly in accord with WARP anymore.
Here's what the sheet has for the HoF hitters (and Ripken), listed by position:
1. RC/27 LgRC/27 RangeFactor LgRangeFactor
2. Win Shares: 3-year peak, 5-year consec peak, 7-year peak, total and per 162 games
3. WARP1: 3-year peak, 5-year consec peak, 7-year peak, total and per 162 games
4. HOF Standards and HOF Monitor scores
Here's what it has for the HoF pitchers:
1. Win Shares: 3-year peak, 5-year consec peak, 7-year peak, total and per 100 IP
2. WARP1: 3-year peak, 5-year consec peak, 7-year peak, total and per 100 IP
3. HOF Standards and HOF Monitor scores
4. Linear Weights: 3-year peak, 5-year consec peak, 7-year peak, total and per 100 IP
5. Wins Above Team
Also, I just added significant relievers in all of the above pitching categories except #4 and #5. Only Eck and Fingers are in the HoF, but I included 15 other recognizable names (Sutter, Face, Rivera, etc.).
You will also see a little "grade sheet" at the end of each positional category. No need to pay attention to that, but it creates a little hall of fame report card using the averages in the various categories and standard deviations in those categories. The spreadsheet currently just does the calcs for the career numbers, but you can easily fix the formulas yourselves to apply them to 3-year peak or some other category...if you are so inclined. I rarely use those numbers "as is". I first eliminate the extreme cases (like Ruth's and Young's career WS on the high end, and Hafey's and Haines' career WS on the bottom end). Anyway, who cares about that.
So, does anyone want this thing? Should I e-mail it to those who request it, or can someone instruct me how to send it to the group through the HOM group on Yahoo?
I was leafing through Win Shares, and Bill James says he thinks it would be interesting to see how much "star power" a team has by taking each player's WS for that year and multiplying it by his career WS.
Maybe the same would work for a unified peak/career number. Multiply a player's WS/162 (or WARP/162) times his career WS (or WARP). Maybe take the square root of that to get it to a manageable number.
So Bobby Doerr with 281 WS and 22.61 per 162 would get a score of 79.7. Joe Gordon with 243 WS and 25.13 per 162 would get 78.2. So they are about even, despite Doerr's longer career and Gordon's higher peaks.
Using WARP1, Doerr would get 30.6 and Gordon would get 27.0, which accords with how much more favorably WARP views Doerr.
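A quick sketch of that combined score, using the Doerr/Gordon Win Shares figures quoted above (the post's 78.2 for Gordon suggests a slightly different per-162 input, so treat the outputs as approximate):

import math

def unified_score(career_total, rate_per_162):
    # geometric-mean-style blend of career bulk and per-game rate
    return math.sqrt(career_total * rate_per_162)

print(round(unified_score(281, 22.61), 1))   # Doerr:  ~79.7
print(round(unified_score(243, 25.13), 1))   # Gordon: ~78.1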
You'd have to season-length adjust and make whatever other adjustments you want before using this formula.
I think I may have proposed this in the past (without the Bill James backup and without the square root) and it did not take off, but I can't remember. :)
I was trying to rank center fielders the other day, and was using WARP fielding ratings. What to do about Max Carey? His FRAA was 32, and his FRAR was 556. Now, the difference between an average hitter and a replacement hitter during his time was only 303 runs, so how can the fielding difference by a (mostly) CF be so much more?
If you look at all NL CFs in 1917, you'll find that Edd Roush was the worst, with a fielding rate of 87. He was still well above replacement. His main backup had a rate of 91. Most players with very few games at a position get a rate of 100. But I'm wondering, if Roush is well above replacement, and he's the worst CF, who exactly is this replacement level player who would fill in for him? Some people take replacement level to be equivalent to the worst regular in a league. Looking at the 1917 AL, I find that rates for CFs go as low as 81 for Clyde Milan (still slightly above replacement.) Fielding rates over 110 and below 90 are pretty rare, whereas hitting rates commonly vary by much more than that. So how can the differences between average and replacement fielders be greater than the difference between an average and replacement hitter?
Part of the problem is that WARP treats fielding as of more importance in earlier years, when there were more balls in play. This makes sense, but the degree of difference it uses seems excessive. For Willie Davis, who played about the same number of games, the difference between FRAA and FRAR is 377.
Another problem appears in 1927. Carey has a 108 fielding rate in CF, but a 93 in RF. The other CF on the Dodgers that year also has a high rate, while the other LFs and RFs also have low rates. This looks like a matter of distribution of batted balls more than an accurate assessment of fielding performance.
Anyway, you can see why I don't submit ballots, when I can't even rank the center fielders.
Sadly, that's one of the several reasons I've chosen to eschew WARP (esp WARP2/3) in creating my own rankings.... Much as WS may have its limitations, they are known and adjustable.
The way I see it, I would value a player like this, theoretically:
1) Batting runs over replacement level at position (I would take the average of the bottom 15% of regulars as replacement level for all positions except pitcher, where I'd use the league average).
2) Fielding runs over average at position.
3. Pitching runs over replacement (I would take the average of the bottom 15% of pitchers in the same role with about 100 IP for starters, maybe 50 for relievers, give or take the length of the schedule, etc. - I'm open to ideas here, though).
I could see splitting 1 and 3 into - 1a) Batting value over replacement level hitter (generally .350 or so, could see the case for anywhere from .300 - .400) 1b) Defensive constant, based on position played.
What I'm still trying to figure out is a simple way to do this for WARP, without having a database of all their values. I'd pay decent money for a workable complete all-time database of their run values for offense/defense/pitching for each season. Would make it much easier to figure out league norms, etc..
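Here's one way the "average of the bottom 15% of regulars" idea could be operationalized, as a hedged sketch; the input format (a list of regulars at one position with a batting-runs rate) is hypothetical, since it depends entirely on whatever database you're working from.

def replacement_rate(regulars, fraction=0.15):
    # regulars: list of (player, batting_runs_rate) for one position/league/season
    rates = sorted(rate for _, rate in regulars)
    n = max(1, round(len(rates) * fraction))
    worst = rates[:n]                      # the bottom `fraction` of regulars
    return sum(worst) / len(worst)

# toy data: batting runs above league average per 600 PA for eight regulars
nl_shortstops = [("A", -12), ("B", -8), ("C", -5), ("D", 0),
                 ("E", 3), ("F", 6), ("G", 10), ("H", 15)]
print(replacement_rate(nl_shortstops))     # replacement level for this toy pool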
Still pondering the general issue of pitching versus fielding. Going back to the Bob Lemon discussion, if pitching was a bigger part of defense in 1949 than it was in 1925, and fielding a smaller part, it seems to me that the standard deviation of ERAs would be higher in 1949 than in 1925. So, I did the following. Using all pitchers in 1924, 1925, 1948, and 1949 with over 50 innings pitched, I took the standard deviation of ERA for each team. Then I compared the average for 1924/5 with the mean for 1948/9.
In 1924/5, the average standard deviation was 0.84. For the 1948/9 period, it was 0.87. I also divided the STD by the average ERA, and came up with a figure of 0.20 for the earlier years and 0.21 for the later.
This would suggest that there wasn't much of a change in the relative importance of pitching and fielding in this period. For the 1925 NL, BP has the difference between an average and replacement level CF as 36 runs. In the 1949 AL, it shows a 19 run difference. My study seems to show that the change should be on the order of 2 or 3 runs, not 17. Of course, I could be wrong.
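For anyone who wants to replicate or extend the little study above, here's a sketch of the method in Python; the pitcher-season input is hypothetical and would have to come from Lahman, Retrosheet, or a similar source.

from statistics import mean, pstdev
from collections import defaultdict

def team_era_spreads(pitcher_seasons, min_ip=50):
    # pitcher_seasons: iterable of (year, team, ip, era)
    by_team = defaultdict(list)
    for year, team, ip, era in pitcher_seasons:
        if ip >= min_ip:
            by_team[(year, team)].append(era)
    # standard deviation of ERA within each team-season
    return {key: pstdev(eras) for key, eras in by_team.items() if len(eras) > 1}

def era_group_average(spreads, years):
    vals = [sd for (year, _), sd in spreads.items() if year in years]
    return mean(vals) if vals else None

# usage sketch:
# spreads = team_era_spreads(load_pitcher_seasons())      # hypothetical loader
# print(era_group_average(spreads, {1924, 1925}), era_group_average(spreads, {1948, 1949}))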
Win Shares Walk Thru Part I
....the system is giving out absolute wins on the basis of marginal runs. 50% of the league average in runs scored, with a Pythagorean exponent of 2, corresponds to a W% of .200. It is for this reason that in old FanHome discussions myself and others said that WS had an intrinsic baseline of .200 (James changed the offensive margin line to 52%, which corresponds to about .213).
In an essay in the book, James discusses this, and says that the margin level (i.e. 52%) "is not a replacement level; it's assumed to be a zero-win level". This is fine on its face; you can assume 105% to be a zero-win level if you want. But the simple fact is that a team that scored runs at 52% of the league average with average defense will win around 20% of their games. Just because we assume this to not be the case does not mean that it is so.
Win Shares would not work for a team with a .200 W%, because the team itself would come out with negative marginal runs. If it doesn't work at .200, how well does it work at .300, where there are real teams? That's a rhetorical question; I don't know. I do know that there will be a little bit of distortion everywhere.
In discussing the .200 subtraction, James says “Intuitively, we would assume that one player who creates 50 runs while making 400 outs does not have one-half the offensive value of a player who creates 100 runs while making 400 outs.” This is either true or not true, depending on what you mean by “value”. The first player has one-half the run value of the second player; 50/100 = 1/2, a mathematical fact. The first player will not have one-half the value of the second player if they are compared to some other standard. From zero, i.e. zero RC, one is valued at 50 and one is valued at 100.
By using team absolute wins as the unit to be split up, James implies that zero is the value line in win shares. Anyone who creates a run has done something to help the team win. It may be very small, but he has contributed more wins than zero. Wins above zero are useless in a rating system; you need wins and losses to evaluate something. If I told you one pitcher won 20 and the other won 18, what can you do? I guess you assume the guy who won 20 was more valuable. But what if he was 20-9, and the other guy was 18-5?
You can't rate players on wins alone. You must have losses, or games. The problem with Win Shares is that they are neither wins nor wins above some baseline. They are wins above some very small baseline, re-scaled against team wins. If you want to evaluate WS against some baseline, you will have to jump through all sorts of hoops because you first must determine what a performance at that baseline will imply in win shares. Sabermetricians commonly use a .350 OW%, about 73% of the average runs/out, as the replacement level for a batter. A 73% batter though will not get 73% as many win shares as an average player. He will get less than that, because only 21% (73% - 52%) of his runs went to win shares, while for an average player it was 48%. So maybe he will get .21/.48 = 44%. I'm not sure, because I don't jump through hoops.
Bill could use his system, and get Loss Shares, and have the whole thing balance out all right in the end. But to do it, you would have to accept negative loss shares for some players, just as you would have to accept negative win shares for some players. Since there are few players who get negative wins, and they rarely have much playing time, you can ignore them and get away with it for the most part. But in the James system, you could not just wipe out all of the negative loss shares. Any hitter who performed at greater than 152% of the league average would wind up with them, and there are (relatively) a lot of hitters who create seven runs a game.
James writes in the book that with Win Shares, he has recognized that Pete Palmer was right after all in saying that using linear methods to evaluate players would result in only “limited distortions”. And it’s true that a linear method involves distortions, because when you add a player to a team, he changes the linear weights of the team. This is why Theoretical Team approaches are sometimes used. But the difference between the Palmer system and the James system is that Palmer takes one member of the team, isolates him, and evaluates him. James takes the entire team.
So while individual players vary far more in their performance than teams, they are still just a part of the team. Barry Bonds changes the linear weight values of his team, no doubt; but the difference might only be five or ten runs. Significant? Yes. Crippling to the system? Probably not. But when you take a team, particularly an unusually good or bad team, and use a linear method on the entire team, you have much bigger distortions.
Take the 1962 Mets. They scored 617 and allowed 948, in a league where the average was 726. Win Shares' W% estimator tells me they should be (617-948+726)/(2*726) = .272. Pythagoras tells us they should be .304. That's a difference of 5 wins. WS proceeds as if this team will win 5 fewer games than it probably will. Bonds' LW estimate may be off by 1 win, but that is for him only. It does not distort the rest of the players (they cause their own smaller distortions themselves, but the error does not compound). Win Shares takes the linear distortion and thrusts it onto the whole team.
Finally, the defensive margin of 152% corresponds to a W% of about .300, compared to .213 for the offense. The only possible cutoffs which would produce equal percentages are .618/1.618 (the Fibonacci number). That is not to say that they are right, because Bill is trying to make margins that work out in a linear system, but we like to think of 2 runs and 5 allowed as being equal to the complement of 5 runs and 2 allowed. In Win Shares, this is not the case. And it could be another reason why pitchers seem to rate too low with respect to batters (and our expectations).
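To make the quoted arithmetic easy to verify, here's a small sketch comparing the marginal-runs win estimator described in the walkthrough with a plain Pythagorean estimate, using the 1962 Mets figures cited above.

def marginal_runs_wpct(rs, ra, lg_avg):
    # the linear estimator the walkthrough attributes to Win Shares
    return (rs - ra + lg_avg) / (2 * lg_avg)

def pythag_wpct(rs, ra, exponent=2.0):
    return rs ** exponent / (rs ** exponent + ra ** exponent)

rs, ra, lg = 617, 948, 726
print(round(marginal_runs_wpct(rs, ra, lg), 3))   # ~0.272
print(round(pythag_wpct(rs, ra), 3))              # ~0.298 with exponent 2
# (the walkthrough's .304 presumably reflects a slightly lower, era-specific exponent)

# the 52% offensive margin with average defense, per the walkthrough, lands near .213:
print(round(pythag_wpct(0.52, 1.0), 3))           # ~0.213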
Couldn't you argue that a team that scores 50% of the league average number of runs but with league average defense and pitching would win 20% of its games based on their defense and pitching? Isn't that where their WS would go?
What would the winning percentage be if a team scores 52% of all runs (isn't it 48%?) and allowed 152% of league average? I know it would be greater than zero, but then again, are there numbers that can actually get a team to 0 wins in a pythagorean system?
And how does one calculate loss shares? Couldn't it be argued that a loss is just the absence of a win? You lose if you aren't doing the things (scoring runs, preventing runs) that it takes to win.
Also, I found this exchange in a clay davenport chat...
"pkw (Indy, IN): With a new book out this spring, I presume we'll have to wait until at least Fall '06 for the WARP Encyclopedia to come to bookshelves. Barring that, would it be possible for a series of "basics" articles showing how WARP is calculated, the whys and whynots all explained, Win Shares-style? Why FRAR is used instead of FRAA is one question I like to see discussed. Thanks for all the great work.
Clay Davenport: If you used FRAA, then an average SS and an average 1B would have an equal rating, zero. You would need to introduce a positional adjustment, which most people calculate by using the average batting performance at a position.
I really, really don't like the idea of using batting performance to measure a fielding performance. However, assuming reasonably intelligent management, the difference in offensive level between positions should be roughly equal to the defensive difference between positions. If it wasn't - if everybody overstated the fielding value of a shortstop, for instance - then a team that used a better-hitting, poor-fielding SS would gain an advantage. Assuming the advantage led to wins, everybody would copy them (because even an assumption of reasonable intelligence leaves us at the monkey-see monkey-do level) and the difference in fielding would decline. Anyway, FRAR essentially mimics using FRAA + fielding adjustment, but only uses fielding stats to do it.
I think it is reasonable to treat each position on the field as being roughly equal in importance, and FRAR is the vehicle I use to make it so. "
I am not sure we really looked at this topic from the above angle. Though the possibility of a WARP book is pretty exciting.
I asked him about the really low replacement levels for defense, and this was his response...
"jschmeagol (new york, ny): Hey Clay, I have a WARP question. How exactly do you find replacement level for defense. The reason that I ask is that it seems to be really really low. For instance, over the course of Max Carey's career the difference between FRAA and FRAR is larger than the difference between BRAA and BRAR. This doesn't seem possible, but I would think that you have a godo reason for it. Can you elaborate? Thanks
Clay Davenport: Replacement level for defense primarily depends on how many balls get hit to a given position, and what happens to them when they get there. Generally speaking, more balls in play = more FRAR for all positions, which is a lot of what's going on for Max Carey and other deadball era players. There weren't many homers, there weren't many walks or striekouts, there were lots of errors, although not as many as a generation earlier. All of those tilt the share of total runs from the pitchers to the fielders, and it enhances the FRAR.
The reason it so low ties in with this question - "
The last line leads into the first question.
I also want to point out that while Davenport hasn't really put his system up for scrutiny in the way that Bill James did with WS, he does seem very open to answering emails and always answers a few WARP questions in his chats.
Hopefully that works. I don't know how to use the bold or italics or anything like that. The parts in quotations are from the chat, the rest are my comments.
James didn't claim his system was perfect. Quoting from p. 2 of his book, he says "If one player in this system is credited with 20 Win Shares and another with 18, we can state with a fair degree of confidence that the one player has contributed more to his team than the other...not that we are always right; there will always be anomalies and there will always be limitations to the data, but I would be confident that we had it right a high percentage of the time."
I think WS generally meets that standard. It has its faults. I certainly wouldn't recommend using it alone or without checking the data or questioning its results if they seem anomalous, but in general I think it does a very good job of bringing together lots of information on batting (including pieces that are missing from OPS, such as double plays), fielding (much more sophisticated than anything available 10 or 15 years ago), and pitching and boiling them down to a meaningful integer.
Bingo, I agree with that 100%.
Also, assuming a normal run environment of 3.5 to 5 runs a game, for example . . . A player that creates 1 run does not necessarily contribute to winning, from the 'net' perspective. If that player also created 27 outs, he contributed much more to the losses than the wins, to the point where his net win contribution is less than zero.
That team would win 11% of its games, using Pythagenpat and a run environment of 5 R/G. The team would score 2.6 R/G and allow 7.6. Real teams rarely vary by more than 75-125% of the league. The worst of both worlds would be a team that scored 562 runs and allowed 938 in a 750-run environment. That team would play .283 ball, or win 46 games.
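Here's a sketch of that Pythagenpat check; the 0.287 exponent constant is one commonly used value (others use something in the 0.28-0.29 range), so the output only approximately matches the figures quoted above.

def pythagenpat_wpct(rs_per_g, ra_per_g, k=0.287):
    x = (rs_per_g + ra_per_g) ** k           # exponent scales with total scoring
    return rs_per_g ** x / (rs_per_g ** x + ra_per_g ** x)

# 52% offense / 152% defense in a 5 R/G environment:
print(round(pythagenpat_wpct(0.52 * 5, 1.52 * 5), 3))    # ~0.110, i.e. about 11%

# the "worst of both worlds" team: 562 scored, 938 allowed over 162 games
print(round(pythagenpat_wpct(562 / 162, 938 / 162), 3))  # ~0.275; the post's .283 implies a slightly different constant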
I think the key point that James makes is that extreme teams do little to help us understand the other 99.99% of teams. The 1899 Spiders, the 1915 Athletics, and the 1962 Mets are such utter anomalies that they tell us almost nothing.
Do I want a system that gets it all right? Sure. But my preference is for a system that hits 98% of the time and accepts the distortions at the left end of the curve rather than having no system at all.
One thing I would point out, however, is that NA Win Shares don't work for me because the run environment is so weird. When I figured them for a few of the NA Boston teams (replicating Chris Cobb's work), I realized the run environment is so extreme that it leaves several regulars looking like non-contributors.
I use win shares, but the problems of extreme teams shouldn't be minimized. The distortions may be worse at the left end of the curve due to the impact of the zero point, but there are also distortions on the right end of the curve as well.
As I understand the argument, by giving zero offensive WS (rather than negative numbers) to players who produce below the marginal run cutoff, you are artificially reducing the amount of offensive WS available for all the players who produced positive marginal runs. (If I understand their adjustments correctly, the Hardball Times folks adjust for this in their present-day WS calculations.)
One question that we should probably look at empirically is whether this distortion is the main reason why hitters on poor teams seem to be earning fewer WS than hitters with similar performance levels on good teams (e.g., Medwick outperforming Sisler in their best seasons, Medwick outperforming Johnson over their careers). It stands to reason that there is a correlation between teams that are bad and teams that give AB to sub-marginal hitters.
I use win shares, but the problems of extreme teams shouldn't be minimized. The distortions may be worse at the left end of the curve due to the impact of the zero point, but there are also distortions on the right end of the curve as well.
I think the "problem" is that Winshares is just not slightly innacurate due to extreme teams - there are a whole series of "small" inaccuracies that, added together, make the system as a whole inferior to many other ways to measure "value."
James didn't claim his system was perfect. Quoting from p. 2 of his book, he says "If one player in this system is credited with 20 Win Shares and another with 18, we can state with a fair degree of confidence that the one player has contributed more to his team than the other...not that we are always right; there will always be anomalies and there will always be limitations to the data, but I would be confident that we had it right a high percentage of the time."
I don't see this as much of a defense. I mean, I could use RBI's to measure hitters, wins to measure pitchers, say 'my system's not perfect but my system would be right a high percentage of the time.' I would not recommend using such a system...
I thought it was bad when pitchers were zeroed out as well. Is the team's offense/defense split done before these corrections are done? Does that mean that a horribly bad-hitting pitcher will take bWS away from his non-pitching teammates and give them to his better-hitting pitching teammates as pWS?
Plus it complicates the DH-league/non-DH league situation. I'm not exactly sure who is favored there. On one hand, NL batters are lowered due to this zeroing-out issue; on the other hand, they're raised up because it's effectively eight lineup slots competing for bWS instead of nine.
How many pitchers are below the zero win level offensively?
Can you really be a negative amount of wins? I guess you can, but at the same time you either win the game or you don't; you can't lose games already won.
Doesn't James say that AL hitters are screwed because of the DH and that he can't really do anything about it?
If that last one is correct, and it most likely is, then how should we adjust WS so as to not penalize AL players? Should we even do this? I mean, offense is less valuable in the AL because there are nine hitters as opposed to 8.
Most of them.
Using the Hardball Times's modified calculations...
They list 126 pitchers in the 2005 NL.
11 are above 0.0
11 are equal to 0.0
102 are below 0.0
chart
No one wins games by themselves. Win Shares is just divvying out a teams wins among the its players. A negative number implies you are cancelling out someone else's positive contribution.
If you look at all NL CFs in 1917, you'll find that Edd Roush was the worst, with a fielding rate of 87. He was still well above replacement. His main backup had a rate of 91. Most players with very few games at a position get a rate of 100. But I'm wondering, if Roush is well above replacement, and he's the worst CF, who exactly is this replacement level player who would fill in for him?
"Most players with very few games at a position get a rate of 100."
precisely 100? What about players with few games but not "very few"? Is it possible that the measure involves derivation from per-game or per-inning defensive data plus "regression" (the wrong term here) toward 100 based on playing time?
I don't believe it can be the main reason for the phenomenon (granted for the sake of argument). The correlation between team quality and pitcher batting quality must be low. If statistically significant, I guess it is baseballistically insignificant. But it's worth checking, if anyone has data in the right format.
I thought it was great that 2 of the first 3 questions in Clay's chat were from the HOM group.
To me, his answer to my question indicates how much WARP and WS have in common at the big picture level. Hitters are hitters when they're at bat, not shortstops or right fielders. The added value for being a better defensive player should stay on the defensive side of the ledger.
I would be kinda curious how closely BRARP+FRAA corresponds to BRAR+FRAR, but those who advocate for FRAA should have the same problems with WS that they do with WARP. BP's replacement level might be too low, but average is not the answer.
For a while after Bill James presented Win Shares (July 2001?), I thought of it and described it as fundamentally different from other measures in that the players on each team are assigned positive scores that sum to the number of team wins. The name "win shares" aptly summarizes those two features, and the first feature, positive scores, answers the chief prior criticism of the Total Baseball Ratings, which focused on their zero sum (for all players each league season).
When I read the book, I was surprised (shouldn't have been) to learn that the second feature, the sum to number of wins (for all players on each team season), is superficial: a late step in calculation and easy to undo, except for rounding to integers.
Win Shares Part 7 - Conclusion
It is obvious from the results that this is a major problem with Win Shares.
I've brought this up before (see the SS thread). There is a discussion there (around post 86 and thereabouts) of Bobby Wallace and the relative distribution of "All-Stars" (both WARP and WS) during his career.
If Win Shares was fair to each of the 8 everyday positions, there would be an approximately equal amount of value created at each position (pitchers excepted), if one totaled it over a long enough period of time so that local talent gluts even out. The OF/IF imbalance during the Deadball era is particularly dramatic, but it appears to be present in other eras also. I have seen no evidence of a corresponding era where IFers (2B,3B,SS) receive more of the total value than the OFers.
Therefore it appears that a strong bias towards OFers over IFers is built into the Win Shares system. There are two places where this could occur. Either the fielding intrinsic weights are wrong -- too many fielding WS go to OFers, not enough to IFers. Or the offensive/defensive balance is wrong -- too many Win Shares go to offense, not enough to defense, resulting in too many Win Shares being awarded to the hitting end of the defensive spectrum. Either situation will result in too many OF "All-Stars".
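A sketch of the fairness check being described -- total up Win Shares by position over a long stretch and compare each position's share to an even one-eighth split; the loader and data format here are hypothetical.

from collections import Counter

def positional_shares(player_seasons):
    # player_seasons: iterable of (primary_position, win_shares)
    totals = Counter()
    for pos, ws in player_seasons:
        if pos != "P":                      # pitchers excluded, per the argument above
            totals[pos] += ws
    grand = sum(totals.values())
    return {pos: round(ws / grand, 3) for pos, ws in sorted(totals.items())}

# usage sketch: over a long enough period each share "should" sit near 0.125
# print(positional_shares(load_win_shares(1901, 1920)))   # hypothetical loader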
It's good practice for the future, but for now I give a heads-up to everyone that the entire backlog needs its seasons adjusted to 162.
. . .
The biggest obstacle is to remember to do it at all. I mean, it's just so quick and easy to look at the career total for career and line up seasonal WS totals to look at peak. It's those seasonal line-ups I'm most worried about. A single WS difference between each season on those lines often makes a player look quite a bit better.
Not only Win Shares but all counting rather than rate statistics. Not only season stats but career stats.
Career Win Shares per 162 games is in print, organized rather conveniently for the HOM project, in one or two books by Bill James. But that is unreliable. Hard as it is to believe in this day, HOMeboys have discovered numerous arithmetic errors.
I hope that that is not worth saying. Comparison of rates per 162 is technically pertinent regardless of the length of the season, and I guess everyone here knows that. But I fear that the virtual transition to a 162-game era makes those published rates a more attractive nuisance.
Cyril Morong's Sabermetric Research
A note on error-percentage. The error rating in Win Shares is ratio-based. Commit twice as many errors as the average and you get no credit; no errors and you get full credit. It's grading on a curve, with little or no relationship to the number of runs prevented or allowed.
To me, this makes as much sense as rating HRs on a ratio. Then Tommy Leach's 6 HRs in 1902 would be more impressive than Bonds' 73 HRs, because the avg position player in 1902 hit about 1 HR, while the avg position player in modern times hits about 20. Ty Cobb might think this is a good way to rate HRs, but I think most of us would reject any offensive rating system like that out of hand.
IMO, the fielding ratings in Win Shares appear to attempt to translate the stats into a modern setting and evaluate them on a modern scale. They do NOT appear to be evaluating them in the context of the game that was actually being played then.
Jaws_WARP3_Career_Peak_etc_Method
>I used the third version of WARP, which is WARP3. This version is adjusted for difficulty and for playing time, so it levels the playing field for different eras.
The WARP3 translation is quite explicitly meant NOT to level the playing field, but is meant to make sure that modern players come out ahead as "we" intuitively think they should.
Or I'm missing something.
Systems like WARP and OWP and linear weights assume standard rules: 1 out turned into a hit = about .74 runs = about .074 wins.
For Win Shares, this works for a .500 club. For a team that wins 70% of the time, no matter how good you are, you can't generate many extra wins, and so when they are apportioned out, the great players get shafted, by the principle of diminishing returns.
For a losing club, it's tough to raise the bar (too many more runs needed to make another win), so again these players are underrated.
So, theoretically, I conclude it is NOT true that players on great teams are overrated by WS. Our perception might suggest they are, BECAUSE most single-season teams that won many games did so in part by being 'lucky' (winning close games), and in the WS system, these wins are credited to the players.
Zat make sense?
The argument is that players on great teams get a benefit in WS because they don't have to play teams as good as themselves. I think this goes a little far. I think that playing LF on a team that has great pitching and defense may give you a lift because you don't have to face your own pitching and defense. I don't think it matters to you if you are a LFer on a great team and the opposing team has Jimmie Foxx or George Burns (the 1B) hitting cleanup, because you don't play against the other team's lineup, you play against their pitching and defense. The reverse would be true for pitchers.
There will be more win shares to go around to everyone. And any team that is great enough for their win shares to be inflated by their not having to play themselves is going to be well above average on both offense and defense.
The principle of diminishing returns kicks in at a higher level of team performance than the excess win shares for not having to play yourself. That starts to show up in an 8-team league, IIRC, at about .630. The effect isn't large at that level, but it is noticeable.
Conversely, how about someone like any of the Yankees' recent first basemen? How much were Tino's stats recently helped by hitting with guys like Sheff, Bernie (until recently), Jeter, etc.?
However, I'm not sure it should. Most studies on the value of 'protection' concluded that there is little noticeable effect on the hitter's quality. Yes, it may affect his stats in that he gets more walks and fewer HR without any good bat behind him, but overall his contributions remain the same (see Babe Ruth, before and after Lou Gehrig arrives).
As to WS and good teams/bad teams, yes, WS does not account for not having to face your own teammates. But this is certainly not a unique issue to WS; many metrics 'suffer' from the same issue.
Guys like Bob Johnson are hurt by using Win Shares because they have TWO effects against them: diminishing returns (bad team, needs more runs to make a win) and poor teammates (doesn't get to face his own pitching).
When does the diminishing returns kick in on the low end? My back-of-the-envelope guess from taking the derivative of the OWP formula shows a *maximum* at around 0.250, which is pretty darn low. That implies that a great player can help a bad team easier than he can help a good team. Only when the team becomes quite brutal, does the effect of one player start returning to zero (sharply so, I'll admit). Anyone look at this effect with the real Win Shares calculation? Is my guess way too low?
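A quick numeric check of that back-of-the-envelope claim, using a fixed Pythagorean exponent of 2 and holding runs allowed constant; under those assumptions the marginal value of one extra run does peak when the team is roughly a .250 club.

def wpct(rs, ra, exponent=2.0):
    return rs ** exponent / (rs ** exponent + ra ** exponent)

RA = 700.0
# find the runs-scored level where one additional run buys the most winning percentage
best_rs = max(range(300, 1201, 10), key=lambda rs: wpct(rs + 1, RA) - wpct(rs, RA))
print(best_rs, round(wpct(best_rs, RA), 3))   # the maximum lands near a .250 team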
and poor teammates (doesn't get to face his own pitching).
This effect is mentioned all the time, but it only exists indirectly in the Win Shares calculation. For the most part with Win Shares, you are competing against your own teammates for value.
The first place where other teams come into play is in the total Win Shares available -- the team's overall record. There has been some talk of needing to add 11 wins and 11 losses to each team and rescaling back to 154 to get some sort of adjusted W/L record (using a 154-game season example here). I don't know if that's a valid tweak or not, but I've seen it mentioned here several times.
The second place where other teams come into play is the Park Factor. Win Shares uses the straight PF, not the BPF or PPF that was designed to account for not facing your own teammates. This affects the way that Win Shares splits between offense & defense and any discrepancy would only come into play when the team in question is very unbalanced (e.g. bad-pitching/decent-hitting) and then it will have the effect of dampening some of that unbalance. A decent offense with a crummy staff will already get a larger segment of the Win Shares, but by not having to face that crummy staff their context is going a bit off and they'll get a slightly smaller cut of WS (but still certainly larger). If a team has offense/defense that is equally bad (or good), then the effect of Park Factors on this split goes away.
After that, I believe it's all competition within your own team for value.
As to Win Shares using the 'unscaled' PF, I'd have to think more about that one. While you want to calculate the effects of not facing your own teammates if you wish to translate stats into normalized stats, I'm not positive that by using team Wins this doesn't already do some of that - you have a great team, you get to not face your mates, but then it becomes harder to generate an extra win since you are already at a WPCT where it takes more runs to get a marginal win.... need a study that I shan't take time to do right now.
Background:
WARP generates FRAR by taking FRAA and adding an amount of FRAR per game that is fixed for each position. This amount changes over time to reflect 1) shifts in defensive responsibility between pitchers and fielders and 2) shifts in defensive responsibility between positions.
For example, in 1895, an average defensive player at each position (FRAA = 0) would receive something very near to the following FRAR for a full season (132 games):
P - 9 (obviously no pitcher would play 132 games, but this shows the fielding importance of the position relative to other positions)
C - 44
1B - 16
2B - 43
3B - 32
SS - 47
LF - 30
CF - 31
RF - 14
In 1965, the FRAR for an average defensive player at each position for a 162-game season would be very near to these amounts:
P - 8
C - 29
1B - 13
2B - 33
3B - 23
SS - 34
LF - 17
CF - 25
RF - 15
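A minimal sketch of the relationship just described: FRAR as (roughly) FRAA plus a fixed per-game positional bonus that changes by era. The constants are the approximate full-season values listed above, prorated per game; treat them as illustrations of the structure, not Davenport's actual tables.

# approximate full-season FRAR for an average fielder (FRAA = 0), from the lists above
FRAR_BONUS = {
    1895: {"C": 44, "1B": 16, "2B": 43, "3B": 32, "SS": 47, "LF": 30, "CF": 31, "RF": 14},
    1965: {"C": 29, "1B": 13, "2B": 33, "3B": 23, "SS": 34, "LF": 17, "CF": 25, "RF": 15},
}
SEASON_GAMES = {1895: 132, 1965: 162}

def approx_frar(fraa, games, position, era_year):
    per_game = FRAR_BONUS[era_year][position] / SEASON_GAMES[era_year]
    return fraa + per_game * games

# an average-fielding shortstop playing 140 games in each era:
print(round(approx_frar(0, 140, "SS", 1895)))   # ~50 FRAR
print(round(approx_frar(0, 140, "SS", 1965)))   # ~29 FRAR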
This readiness to shift fielding value around is one of WARP's potential points of superiority to win shares, which sticks to a constant set of "inherent weights" to distribute fielding value among the positions. The one change James acknowledges in the defensive spectrum involves the 2B and 3B, which he sees as switching places on the defensive spectrum. The "inherent weights" in the fielding win share system are
C - 19%
1B - 6%
2B - 16%
3B - 12%
SS - 18%
OF - 29% (James treats OF as one position and then uses the distribution of win shares among individual players to sort out the relative value of each outfield position, but we can estimate that CF will typically land between 2B and 3B and that LF and RF will fall between 3B and 1B).
Before 1920, the weights for 2B and 3B are reversed.
WARP shows us a shifting defensive spectrum over the history of the game, where win shares does not.
However, jimd's study of average OPS+ by position also suggests a shifting defensive spectrum over the history of the game, one in which the shifts are rather different from the ones WARP presents. I'll reproduce his famous table once again:
Decade 1B LF RF CF 3B 2B Ca SS Pit
1870's +1 +4 -1 +4 +2 +2 +0 +1 -13
1880's +13 +6 +1 +5 +1 -1 -7 -2 -17
1890's +6 +9 +7 +7 +0 -2 -6 -2 -22
1900's +6 +10 +9 +8 +0 +2 -9 -1 -29
1910's +6 +7 +9 +10 +1 +1 -7 -4 -31
1920's +9 +10 +10 +8 -3 +1 -4 -7 -32
1930's +13 +8 +10 +5 -1 -3 -3 -4 -36
1940's +8 +11 +9 +7 +2 -3 -4 -4 -37
1950's +9 +10 +7 +7 +4 -3 -1 -5 -40
1960's +11 +9 +11 +7 +4 -5 -3 -6 -46
1970's +10 +8 +8 +5 +3 -5 -2 -11 -45
1980's +8 +6 +6 +2 +3 -4 -4 -8 -48
1990's +9 +4 +6 +1 +1 -3 -4 -7 -50
Mean.. +9 +8 +7 +6 +1 -2 -4 -5 -36
The premise here is that the defensive importance of a position is suggested by the amount of offense the management is willing to give up at a position in order to play a competent defender there.
Avoiding, for the moment, any question of the overall weight given to fielding value in any system, let me line up the defensive spectrum for the 1890s and the 1960s as represented by WARP, WS, and the OPS+ study. (Pitchers will be left out.)
1890s
W1 -- SS C 2B 3B CF LF 1B RF (Top three spots are 3+ times more valuable than bottom spot, 3B is twice as valuable as 1B)
WS -- C SS 3B CF 2B LF/RF 1B (Top 3 spots are 2 2/3+ times more valuable than bottom spot, 2B is twice as valuable as 1B)
OPS+ -- C SS/2B 3B 1B CF/RF LF (Top 3 spots have OPS+ below avg., 3B is 0, 1B-LF +6 to +10)
1960s
W1 -- SS 2B C CF 3B LF RF 1B (Top three spots are 2.2+ times more valuable than bottom spot, 3B & CF are almost twice as valuable as 1B)
WS -- C SS 2B CF 3B LF/RF 1B (Top 3 spots are 2 2/3+ times more valuable than bottom spot, 3B is twice as valuable as 1B)
OPS+ -- SS 2B C 3B CF LF RF/1B (Top 3 spots have OPS+ below avg., 3B is +4, RF/1B +11)
Parallel representations of the proposed defensive spectrums for other decades would show different discrepancies. One I am especially concerned about right now is 2B/3B pre 1930. OPS+ has these two positions always close in average offense, and they shift back and forth as to which is slightly higher or lower decade to decade, but WARP _always_ gives second base more defensive value. The treatment of pre-1930 first base is also a fraught issue, as is the relative importance of infield vs. outfield positions.
The big questions:
Where there are disagreements in these lists, which assessment should one accept, and why?
If one wants to use WARP or win shares, but trusts the OPS+ assessment more, how might one adjust the results of these systems?
If WARP's calculation of FRAA is run-based, is their estimate of FRAR also run-based, or is it offense-based (like the OPS+ study), or theoretical (like win shares)? If it is offense-based, what measure does it use and how does it get from offensive value to defensive value? If it is theoretical, what is the theory?
For that reason, I don't use WARP3, although I do look at the competition-strength adjustments in WARP2. For that reason, my ranking system always compares a player first to his contemporaries and only second to all other eligible players. What I care about in a comprehensive metric, therefore, is the extent to which it gives an accurate representation of value in context.
On the subject of the defensive spectrum, I have available to me three different views of the value of the defensive positions in context--WARP1, win shares, and OPS+ by position, and I am looking for reasons to accept or modify these views and their results.
I have been using win shares, with modifications to adjust the pitching/fielding division of defensive value, and I think that system has worked pretty well. But as we are having to make finer distinctions in the backlog, I am concerned, as in the case of Sisler/Beckley and Elliott/Boyer, that errors in the treatment, not of fielding as a whole but of the value of particular positions at particular times in the history of the game, may lead to the overrating or underrating of particular players. Win shares argues that the defensive spectrum has changed very little over time, but the OPS+ study argues otherwise. I know that WARP1 is designed to be more flexible on this point than win shares, but some study of WARP1 suggests that its representation of the defensive spectrum, although it is changing, does not agree with the findings of the OPS+ study.
I don't treat the OPS+ study as gospel, as its findings are influenced by the (variable) level of talent available at a position during a given decade, but still, overall, it tells a different story about the defensive spectrum than WARP does. I trust the OPS+ story as having significant validity, however, because I know how it is grounded in actual data. I'd like to know whether the WARP story is as well. If it is really grounded in the data, I could accept WARP1's results and weigh them equally with win shares'. Or, I could modify WARP's fielding assessments to fit more closely with OPS+, but that would be a lot of work, so I don't want to start in on that project unless I have a clear sense that it is warranted. Or I could just stick with win shares exclusively, and try to find ways to fine-tune its handling of the defensive spectrum or just make subjective adjustments where I think they are needed. I see how I could adjust the fielding spectrum pretty neatly in WARP1, but for win shares I don't.
Any thoughts?
1. I start with batting win shares, and move on from there to EQA, WARP1, OPS+ to get a feel for the player as a hitter.
2. I factor in intangibles (missed wartime, racism, minor league credit) and lump fielding and MLEs in with this. For fielding, I look at both WARP and WS, but do not rely on them, as they often are very different.
3. I compare the player to his contemporaries.
4. I compare the player to the remaining eligibles at his position (or positions, for guys like Tommy Leach).
5. I rank the highest ranked players at the positions against each other, paying particular attention to the question (If the HOM ended without this guy in, how would I feel about that?)
My system started much more sabermetric-based and has now returned to much more subjective analysis.
As is true of most things in life, judicious blending of multiple systems often lead to a better answer than total reliance on one.
Along the same lines as TomH but in other words:
This is a big obstacle for Mike Sweeney, et al, using traditional statistics such as runs scored and batted in or the batting triple crown stats. But everyone with the sabermetric sophistication to consider this point uses measures of batting or offense that count bases on balls (or another on-base measure) heavily.
--
TomH:
As to Win Shares using the 'unscaled' PF, I'd have to think more about that one. While you want to calculate the effects of not facing your own teammates if you wish to translate stats into normalized stats, I'm not positive that by using team Wins this doesn't already do some of that - you have a great team, you get to not face your mates, but then it becomes harder to generate an extra win since you are already at a WPCT where it takes more runs to get a marginal win.... need a study that I shan't take time to do right now.
TomH,
jimd(?) has done a relevant study, producing a gong that I and Chris Cobb, at least, have hammered upon. In my case, without assessing the relevant study closely, mainly because it confirms my prior judgments, guestimates, whatever.
Chris Cobb #63:
There will be more win shares to go around to everyone. And any team that is great enough for their win shares to be inflated by their not having to play themselves is going to be well above average on both offense and defense.
The principle of diminishing returns kicks [in] at a higher level of team performance than the excess win shares for not having to play yourself. That starts to show up in an 8-team league, IIRC, at about .630. The effect isn't large at that level, but it is noticeable.
My memo to self says that this will be serious at about .700
(probably April? I don't date memos to self.)
--
Decade 1B LF RF CF 3B 2B Ca SS Pit
1870's +1 +4 -1 +4 +2 +2 +0 +1 -13
. . .
1990's +9 +4 +6 +1 +1 -3 -4 -7 -50
jimd,
Is the famous table a result of your own study? Tom Ruane distributed a similar table to SABR-L in 1999 or so. I don't recall the basic measure of batting (OPS+ in your table) or the treatment of pitchers. I'll track it down if yours is distinct.
Besides, even if the theory holds up in the aggregate, does it really describe the specific set of players (outliers) that are under consideration here?
1970's 10 +8 +8 +5 +3 -5 -2 -11 -45
1980's +8 +6 +6 +2 +3 -4 -4 -8 -48
1990's +9 +4 +6 +1 +1 -3 -4 -7 -50
The bolded positions are LF, CF, and SS, where it appears the rubber is really meeting the road. The other positions are moving gently toward the mean for the most part, but these three are moving more rapidly over the past thirty years (esp relative to where they started from). This makes sense, of course. The NL of the 1970s had zero HOM level SS (assuming Concepcion isn't going in) and the AL wasn't much better, but MLB had some wonderful CF (Lynn, Otis, Murcer, Gorman!!!) and LFs (Brock, Rice, Luzinski, White, Foster et al plus bits of Stargell, Yaz, and Williams). The 1990s was not a good era for high-powered CFs in general (Griffey, Bernie Williams, the lesser Edmonds and Jones years, Lofton, Dykstra, Van Slyke, Lance Johnson, Lankford) but it had a wealth of outstanding SS. Its LF were pretty good but not as good as the 1970s generation (Belle, Alou, Greenwell, G Vaughn, Gant, Gilkey, Mack, the ends of Mitchell, Rickey, and Raines).
Anyway, so my larger question is what's it all mean? Is CF defense more important now? Might be since everyone swings for the fences. Does that mean DPs are less important? Or does it mean that baseball isn't typecasting its SS in the Aparicio/Concepcion mold anymore? And why would LF change more, but not RF?
The long-term trends you bring out here are the sort of thing that make me wonder. Which systems are accounting for these shifts? Should we pay attention to them? To interpret these changes starting in the 1970s, I think it's good to look at the longer view, going back even to the 1940s:
Decade 1B LF RF CF 3B 2B Ca SS Pit
1940's +8 11 +9 +7 +2 -3 -4 -4 -37
1950's +9 10 +7 +7 +4 -3 -1 -5 -40
1960's 11 +9 11 +7 +4 -5 -3 -6 -46
1970's 10 +8 +8 +5 +3 -5 -2 -11 -45
1980's +8 +6 +6 +2 +3 -4 -4 -8 -48
1990's +9 +4 +6 +1 +1 -3 -4 -7 -50
Mean.. +9 +8 +7 +6 +1 -2 -4 -5 -36
Here are my musings on trends:
On LF vs. RF -- I don't think there is clear evidence that these positions are tracking significantly differently over the longer term. LF is higher in the 1940s and 1950s: Is that a Musial/Williams effect? RF jumps ahead in the 1960s: is that Aaron, Robinson, Clemente, Kaline, et al? Until we have data for the 2000s, I don't think we'll be in a position to say whether the 1990s dip marks a significant defensive divergence between the corners or not, although the fact that LF dropped despite having the best hitter of the decade playing there is suggestive.
On CF -- it sure looks like it is becoming a steadily more important defensive position. I'd guess that this trend was underway even as early as the 1950s, but it doesn't register because Willie, Mickey, and the Duke provide a counterbalancing spike in batting value for the position.
On SS -- It looks like shortstop was gradually becoming more important defensively, topping out in the 1970s, but with the 1980s figure following it, the 1970s looks anomalously low. I'd guess a conjunction between defensive need and a temporary dearth of hitters. The 1980s weren't a less astroturfy decade than the 1970s and were not a big home run decade either, and the rebound of shortstop OPS+ then suggests that the 1970s just lacked good shortstops. If SS OPS+ continues to rise for the 2000s, then I think we can have some confidence in a long-term, gradual decrease in defensive demands on shortstops starting in the 1980s. If the aughts drop back down, then the 1990s will appear as a temporary spike due to a confluence of great players at the position.
And, secondly, just as we think Gavvy Cravath should have been in the MLs more than he was (but the powers-that-be-at-the-time didn't) and just as (as Bill James says) there is no reason in the world why Ryne Sandberg didn't bat third and Mark Grace second (except raw prejudice), so too perhaps this part of baseball reality is affected by decisions that are made by people who didn't have the information or the mindset that we have today. Doesn't make us right and them wrong, but it doesn't make them right either. So IOW maybe these numbers just reflect changing prejudices rather than changing realities.
IOW there were always guys like ARod and Vern Stephens and Ripken and Yount who could hit AND field SS better than the alternatives. It's just that through most of history the decision-makers couldn't believe their own eyes and they just had to have the slap hitter at SS and the big bat somewhere else.
I'm saying prejudice is the independent variable, which would be another way of saying that the numbers reflect self-fulfilling prophecies.
Yeah, good point, I didn't think I was making the replacement level that high - wow. 15 WS is nowhere near replacement level, I mean Joe Charboneau's rookie of the year season was 15 WS. Even if it was only 131 games, there's no way if he plays every day that season is only 3.5 WS above replacement.
Wait a minute - I'm not doing that. Setting pitching at a .385 WPct assumes an average offense. A team with an average offense and replacement-level pitching losing 100 games is what I'm setting the replacement level at.
I've always felt a full-time replacement player would get about 8 WS (7 with the DH) and a 220 IP pitcher at replacement level would get about 7 WS. I was trying to err to the side of not setting it too low - I should have thought it through under those terms. I know that shows pitching replacement being a little higher than position player replacement level but I don't think that's unreasonable.
Let me think that through again though. A true replacement level position player will hit at replacement level but field at average level.
An average team in a 162-game season: 116.6 offensive WS, 41.1 D and 85.3 P Win Shares. That gives you 19.7 WS for an average position player in 162 games in a non-DH league, 18.1 in a DH league and 12.9 for an average pitcher. Drop that to 18.7 for a 154-game season position player and 12.25 for a 154-game season average pitcher (that's over 209 IP). BTW as a side note that should dispel any myth that Jake Beckley was merely an average player - he was in the 20-27 WS range, prorated to 154-game seasons, over his career, not in the teens, and he wasn't playing every single game every year. Sorry for the digression, but it's an important point.
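For anyone who wants to reproduce those averages, here is a minimal sketch in Python, assuming 3 Win Shares per win, the standard 48% offensive share, and the 35.1% pitching share mentioned a few posts down; the fielding share falls out by subtraction.

GAMES = 162
TEAM_WS = 3 * GAMES / 2              # 243 WS for an average (81-win) team
OFF_SHARE = 0.48                     # offense/defense split in Win Shares
PITCH_SHARE = 0.351                  # pitchers' share of total WS (see below)

off_ws = OFF_SHARE * TEAM_WS                    # ~116.6 offensive WS
pitch_ws = PITCH_SHARE * TEAM_WS                # ~85.3 pitching WS
field_ws = TEAM_WS - off_ws - pitch_ws          # ~41.1 fielding WS

avg_pos = (off_ws + field_ws) / 8               # ~19.7 WS, average position player
avg_pos_dh = off_ws / 9 + field_ws / 8          # ~18.1 WS in a DH league
avg_pitcher = pitch_ws * 220 / (GAMES * 9)      # ~12.9 WS per 220 of ~1458 innings

print(round(off_ws, 1), round(field_ws, 1), round(pitch_ws, 1))
print(round(avg_pos, 1), round(avg_pos_dh, 1), round(avg_pitcher, 1))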
If the team remains average on offense and fielding, it would take 28.3 pitching WS to drop them to 100 losses. So I'm setting my replacement equivalent to 4.3 pWS over 220 IP being replacement level. That's probably too low.
If 7 WS per 220 IP is replacement level, then a team with average hitters and fielders would win 68 games with a replacement level staff. That would mean setting .420 as replacement level, assuming an average offense.
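The two pitching scenarios above, sketched the same way (reusing the assumed average-team totals of 116.6 offensive and 41.1 fielding Win Shares):

OFF, FIELD, GAMES, TEAM_IP = 116.6, 41.1, 162, 162 * 9

# Scenario 1: average offense and fielding, pitching bad enough to lose 100 games
wins = GAMES - 100                            # 62 wins -> 186 team WS
pitch_ws = 3 * wins - OFF - FIELD             # ~28.3 pitching WS
print(round(pitch_ws * 220 / TEAM_IP, 1))     # ~4.3 WS per 220 IP

# Scenario 2: replacement pitching defined instead as 7 WS per 220 IP
staff_ws = 7 * TEAM_IP / 220                  # ~46.4 pitching WS
wins = (OFF + FIELD + staff_ws) / 3           # ~68 wins
print(round(wins, 1), round(wins / GAMES, 3)) # ~.420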
Now let's reverse it and see what a replacement level hitter would do to a team with average defense/pitching.
Setting replacement level hitters to where a team with average pitching and fielding loses 100 games, means 59.6 bWS. That sets hitters at 7.5 WS + 5.1 for fielding or 12.6 WS per 162 games - 11.8 in a DH league.
Bumping it up to the 68 win mark like we did for pitchers would make their replacement level 14.8 WS, or 13.8 were it a DH league. That's too high.
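And the same exercise from the hitting side, as a sketch (average fielding of 41.1 WS and average pitching of 85.3 WS held fixed):

FIELD, PITCH, GAMES = 41.1, 85.3, 162

def hitter_replacement(team_wins, dh=False):
    # Per-player WS (batting plus fielding) for a lineup that drags an
    # otherwise-average team down to the given win total.
    bat_ws = 3 * team_wins - FIELD - PITCH    # team batting WS
    bat_slots = 9 if dh else 8                # the DH spreads offense nine ways
    return bat_ws / bat_slots + FIELD / 8

print(round(hitter_replacement(62), 1))            # 100-loss team: ~12.6 WS
print(round(hitter_replacement(62, dh=True), 1))   # ~11.8 WS in a DH league
print(round(hitter_replacement(68), 1))            # 68-win team: ~14.8 WS
print(round(hitter_replacement(68, dh=True), 1))   # ~13.8 WS in a DH league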
What to conclude from this - pitching replacement level - in terms of the record of a team with all replacements as pitchers and average everywhere else - is probably higher than position player replacement level - at least under the WS system.
There's no way pitching replacement level is as low as 4 WS/220 IP. And there is no way that position player replacement level is 15 per 162 games. Just look at some 15 WS position player seasons if you don't believe me. And take a look at how bad you have to be to get 4 WS in a 220 IP season.
There's a logical explanation for this apparent paradox, IMO. It's that WS only gives 1/3 of the credit (35.1% to be exact) to the pitchers. Combine that with the fact that no one at the major league level with any significant time is a replacement level fielder AND hitter, and that's what you get.
It doesn't surprise me at all that to get their replacement levels equivalent (for a full time player or a 220 IP pitcher) on a per player level, a team with replacement level hitters dragging down 48% of the team would do worse than a team with replacement level pitchers only dragging down 35.1% of the team.
So here's where I'm going. If you set a .225 team at replacement level (a little worse than the 1962 Mets), with all things equal (batting and pitching replacement level, fielding average) you get:
Hitters 39.4 WS, Fielders 41.1 WS, Pitching 28.8 WS. That sets position players at 10.1 WS, (9.5 in a DH league) and pitchers at 4.4 WS.
Think about it though - James was wondering why pitchers came out too low. Hell most of us think that. That's why - when you adjust for replacement level the pitchers get the boost they are in need of.
BTW, that's probably too low, but it's where you are if you set the pitchers and hitters as equally bad.
If you want to bump the pitchers to 6 WS being replacement level you get a team at .278 WPct (45-117). I think that's fair - the 1962 Mets certainly had some players that were way below replacement level - certainly many players that couldn't get time in other organizations were better than what the Mets put out on the field that first season.
So there you have it. I'm going to be setting my pitcher replacement level to 6 WS per 220 IP. My position player replacement level becomes 11.9 in a non-DH league, 11.2 in a DH league. Over a 154 game season, I'll go with 11.3/10.6/5.7 (the 5.7 is over 209 IP, not 220, in the shorter season).
What does this mean for my expected team WPct to use in this massive pitcher spreadsheet? Well a team with an average offense and defense, and pitchers that pull in 6 WS per 220 IP, would win 65.8 games, or play at a .406 clip. That means that I'll be using 5.48 as my replacement level aDERA - which is the equivalent of 6 WS in a 220 IP season.
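A sketch tying those final numbers together, under the same assumptions as above (116.6 offensive and 41.1 fielding WS for an average team, 6 WS per 220 IP as pitching replacement, and a .278 team when the hitters are also at replacement):

OFF, FIELD, GAMES, TEAM_IP = 116.6, 41.1, 162, 162 * 9

repl_staff_ws = 6 * TEAM_IP / 220             # ~39.8 WS for a replacement staff

# Replacement hitters, replacement pitchers, average fielding: a ~.278 (45-117) team
team_ws = 3 * 0.278 * GAMES                   # ~135 team WS
bat_ws = team_ws - FIELD - repl_staff_ws      # ~54.2 batting WS
pos_repl = bat_ws / 8 + FIELD / 8             # ~11.9 WS per position player
pos_repl_dh = bat_ws / 9 + FIELD / 8          # ~11.2 WS in a DH league
print(round(pos_repl, 1), round(pos_repl_dh, 1))

# Prorated to a 154-game schedule (209 IP rather than 220 for the pitcher)
print(round(pos_repl * 154 / GAMES, 1),       # ~11.3
      round(pos_repl_dh * 154 / GAMES, 1),    # ~10.6
      round(6 * 209 / 220, 1))                # ~5.7

# Expected record with average offense and fielding but that replacement staff
wins = (OFF + FIELD + repl_staff_ws) / 3      # ~65.8 wins
print(round(wins, 1), round(wins / GAMES, 3)) # ~.406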
Having looked at it this way, I'm pretty surprised that the replacement level for a position player comes out that high, but it makes sense. I mean an average season is generally about 2-2.5 WARP, and a replacement player is about -2 to -2.5 in TPR; at 3 WS per win, that gap is roughly 6-7.5 WS. So if an average position player gets about 19.7 WS in a full 162-game season, it would make sense that a replacement player would be about 8 WS below that.
I can't believe it took me 4-5 years of working with WS to approach it this way, thanks for triggering it jim!
Joe, I was hoping you'd end up around a .278 team. That feels more like the magic number to me. Sometimes I see where people set the number of games a replacement team would win somewhere in the mid-50s, but when you think a sec on it, that's too high. Why? Because real-life teams actually win in the 50s.
That really bad Tigers team a couple years back was the worst thing any of us has ever seen that wasn't an expansion team, a team from Philadelphia, or a Ron Howard film. They won just 43 games. BUT the year before, Detroit, Tampa, and Milwaukee all won 55-56 games.
A year after the disaster in Detroit, the DBacks won 51 and the Royals 58. So teams win only 50-59 games with some frequency. Not necessarily often, but it's far from unheard of. Teams win less than 50 games almost never. It's always felt to me like an all-replacement team should be a once-in-a-lifetime event (or near to it) because no non-expansion team making even a pretense of competing or rebuilding should end up below fifty wins unless absolutely everything went wrong, just like it did with the 2003 Tigers.
I'm mostly pleased to see that my own stylistic preference on the matter is upheld in some fashion by an independent third party's thinking.
Big seasons measured by Win Shares: a Decennial Census
This report, decennial from 1910 to 2000, begins with 1892 and 1899, the first and last 154-game seasons of the 19th century. In 1890 three 8-team leagues, and in 1900 just one, played 140-game seasons, making the scope of the major leagues too different from the 20th-century norm for comfort.
Columns 'Tot' and 'Avg' give the number of 20- or 30-Win Share players in the major leagues and the number per team. Columns 'oth' and 'P' give the distribution between non-pitchers and pitchers. Players are player-team-seasons, the work by one player with one team in one season.
<u>Big seasons by Win Shares: 20 and 30</u>
- - - 20 Win Shares - - - 30 Win Shares
Year oth P Tot Avg .Tm. oth P Tot Avg
1892 30 23 53 4.4 . 12 . 4 8 12 1.00 :
1899 25 20 45 3.7 . 12 . 7 6 13 1.08 : MVP Ed Delahanty, the first non-pitcher
1910 34 14 48 3.0 . 16 . 7 5 12 0.75 :
1920 38 16 54 3.4 . 16 . 9 5 14 0.87 :
1930 38 11 49 3.1 . 16 . 9 2 11 0.69 :
1940 40 10 50 3.1 . 16 . 7 2 _7 0.44 : 0.62 pitchers per team with 20 Win Shares
1950 38 15 53 3.3 . 16 . 6 0 _6 0.37 : 0.94
1960 34 _7 41 2.6 . 16 . 6 0 _6 0.37 : 0.44
1970 52 14 66 2.7 . 24 .13 1 14 0.57 : 0.57
1980 51 10 61 2.3 . 26 . 8 0 _8 0.31 : 0.38
1990 52 _6 58 2.2 . 26 . 5 0 _5 0.19 : 0.23
2000 58 _5 63 2.1 . 30 .11 0 11 0.33 : 0.17
It seems that the frequency of 20-Win Share pitchers decreased sharply between 1899 and 1910 and steadily in the second half of the 20th century. 30-Win Share pitchers were practically extinct by 1950.
The frequency of 20+ seasons by pitchers has decreased by about 80% since the deadball era, from about one per team (30/32 in 1910 and 1920) to one per division (11/56 in 1990 and 2000). We know that has happened mainly by a decrease in workload, but the frequency of those seasons has decreased for other players, too. The next table examines the non-pitchers more closely. The focal number of Win Shares is 19 rather than 20 through 1960 because a complete team-season was 154 rather than 162 games played to a decision.
<u>Big seasons by Win Shares:
19 Win Shares to 1960; 20 Win Shares from 1970</u>
Year - - pitchers - - non-pitchers
1892 . 23 1.92 . . 12 . . 35 2.9 per team
1899 . 24 2.00 . . 12 . . 28 2.3
1910 . 19 1.19 . . 16 . . 39 2.4
1920 . 22 1.37 . . 16 . . 40 2.5
1930 . 13 0.81 . . 16 . . 40 2.5
1940 . 12 0.75 . . 16 . . 43 2.6
1950 . 17 1.06 . . 16 . . 45 2.7
1960 . _8 0.50 . . 16 . . 39 2.4
1970 . 14 0.57 . . 24 . . 52 2.2
1980 . 10 0.38 . . 26 . . 51 2.0
1990 . _6 0.23 . . 26 . . 52 2.0
2000 . _5 0.17 . . 30 . . 58 1.9
The correction confirms a decrease in frequency of big seasons for non-pitchers in the second half of the 20th century, perhaps specifically during the expansion era, for it is not clear that the number ever differed significantly from 2.5 per team in the 154-game epoch.
(The correction also highlights a decrease in big seasons for pitchers in the 1920s rather than a steady decrease in the first half of the 20th century.)
I don't believe the effect of designated hitting --the effect of AL use of its option and practically mandatory use of the DH by every AL team-- is so great as TomH suggests. Indeed, I suppose it's much smaller than the would-be effect of adopting 10-batter or 8-batter lineups. If I'm right, it's partly because the DH helps some of the best batters play more games. And partly because the DH replaces some PH; pitchers didn't do 10% of major league batting before 1973.
But the qualitative point is valid: "Of course, the DH in the AL also makes it harder for non-P to accumulate WS." I don't know how to incorporate it.
Does the Win Shares system give a share to eight players for fielding that is fixed for all time?
? one third of 52%, which is 1.04 win shares per game played to a decision
Some people here modify WS informally, at least, to give more credit for fielding in early days. Such variation makes it easier for the non-pitchers to accumulate WS, per game.
- the DH helps some of the best batters play more games.
- the DH replaces some PH; pitchers didn't do 10% of major league batting before 1973.
and partly because
- the new player bats but doesn't field; the fielding credit is still divided eight ways.
maybe
- the new player (not necessarily the DH) is below average in batting quality
1) Set replacement level about .7 WS per season lower for leagues with a DH. See my post #183 on this thread for more detail on how I got that number. This is for those of us who subtract a replacement level from WS.
2) Multiply the offensive WS for players in a DH league by a factor (I'd guess 1.05), and then WS don't add up to 3*wins. I don't have a problem with that, because by adding the DH, you are artificially adding offense to the league. Since WS measure offense and defense, it would make sense that things wouldn't add up if you artificially add 5% offense to a league. A rough sketch of both options follows below.
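For concreteness, a minimal sketch of both options; the 0.7 WS offset and the 1.05 multiplier are the guesses from the post above, not derived values.

def ws_above_replacement(player_ws, repl_ws, dh_league, dh_offset=0.7):
    # Option 1: lower the replacement bar for players in a DH league.
    bar = repl_ws - dh_offset if dh_league else repl_ws
    return player_ws - bar

def adjusted_batting_ws(batting_ws, dh_league, factor=1.05):
    # Option 2: inflate batting WS in a DH league; team WS no longer sum to 3 * wins.
    return batting_ws * factor if dh_league else batting_ws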
Teams playing under .350, 1980-2005
Detroit AL 2003 43 119 0.265
Arizona NL 2005 51 111 0.315
Detroit AL 1996 53 109 0.327
Florida NL 1998 54 108 0.333
Baltimore AL 1988 54 107 0.335
Atlanta NL 1988 54 106 0.338
Tampa Bay AL 2002 55 106 0.342
Detroit AL 2002 55 106 0.342
Milwaukee NL 2002 56 106 0.346
Kansas City AL 2005 56 106 0.346
Toronto AL 1981 37 69 0.349
But note also: 1980-1987--once (1/8)
1988--twice.
1989-1995--none (0/7)
1996-2005--eight (about 1/1). That old saw about competitive balance should conclude: But it's over now. Is it about money? Or is it about non-baseball owners who think they know something about baseball? Or both? Lemme see: Detroit, Arizona, Florida, Tampa, Milwaukee. Sometimes it's about money.
18 teams played under .370 (a 60 win pace) from 1980-2005
Detroit AL 2003 43 119 0.265
Arizona NL 2005 51 111 0.315
Detroit AL 1996 53 109 0.327
Florida NL 1998 54 108 0.333
Baltimore AL 1988 54 107 0.335
Atlanta NL 1988 54 106 0.338
Tampa Bay AL 2002 55 106 0.342
Detroit AL 2002 55 106 0.342
Milwaukee NL 2002 56 106 0.346
Kansas City AL 2005 56 106 0.346
Toronto AL 1981 37 69 0.349
Cleveland AL 1991 57 105 0.352
Pittsburgh NL 1985 57 104 0.354
Kansas City AL 2004 58 104 0.358
New York NL 1993 59 103 0.364
Detroit AL 1989 59 103 0.364
Seattle AL 1980 59 103 0.364
Chicago NL 1981 38 65 0.369
OF course, [I've seen that before and I always read it first as "outfield course"]
the DH in the AL also makes it harder for non-P to accumulate WS. Kind of like if MLB changed to a 10-man batting order; the players would be as good, but getting fewer chances to bat sure would make it harder to get 200 hits, 40 HR and 25 WS.
JoeD #89:
Paul there are two ways to do it, IMO.
1) Set replacement level about .7 WS per season lower for leagues with a DH. See my post #183 on this thread for more detail on how I got that number. This is for those of us who subtract a replacement level from WS.
2) Multiply the offensive WS for players in a DH league by a factor (I'd guess 1.05), and then WS don't add up to 3*wins.
In #87-88, I gave ways the change from no-DH to DH differs from a simple change in the number of batters. (The historic change in MLB is also complicated by happening in only one of two leagues, so that it may also have a class of effects that I didn't give above, operating through the allocation of personnel to the two leagues.)
Anyway, it would be unreasonable to inflate batting win shares by 9/8 (1.125) as an adjustment for playing in a DH league. JoeD guesses 1.05. Inflation of bWS by 1.06 to 1.08 is close, on average, to inflation of total win shares by 162/154, which is how I handled the difference in length of schedule before, after, and during 1961. For non-pitchers, it may be that Win Shares are about as "easy" to earn in the AL from 1973 as in the 154-game epoch.
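A quick worked check of that equivalence, using the average position player from the replacement-level discussion earlier in the thread (about 14.6 batting and 5.1 fielding Win Shares):

bat_ws, field_ws = 116.6 / 8, 41.1 / 8                   # ~14.6 and ~5.1

bws_inflated = bat_ws * 1.07 + field_ws                  # inflate batting WS only, by 7%
total_inflated = (bat_ws + field_ws) * 162 / 154         # inflate everything by 162/154

print(round(bws_inflated, 1), round(total_inflated, 1))  # ~20.7 either way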
That doesn't mean that they are correct to do so.
As a part of my ranking system, I've constructed MLB "All-Star" teams for every season through 1975 (top 2xN players, where N is the number of teams). For the 16 team era (1901-1960), there are 2081 player-seasons selected (34.7 per year, more than 32 due to ties). 752 were outfielders, 495 were infielders (SS,3B,2B). One would expect these two totals to split close to 50/50 because there is no reason for outfielders to be more valuable than infielders in any given season.
Can a Win Shares defender explain this highly significant split to me? (Please don't use differences in games played, unless you are prepared to show me that top infielders miss about 20 games per season more, on average, than do top outfielders.)
My explanation is that Win Shares does not give enough fielding credit to the infield positions.
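As a rough check on "highly significant": the sketch below treats each of the 752 + 495 OF/IF selections as an independent 50/50 draw, which they are not (the same stars repeat from year to year), so read it only as an order-of-magnitude argument.

from math import sqrt

outfielders, infielders = 752, 495
n = outfielders + infielders
z = (outfielders - n / 2) / sqrt(n * 0.25)   # standard deviations from an even split
print(round(z, 1))                           # ~7.3 -- far outside anything chance alone produces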
(Reposted from the 1979 discussion thread)
Excellent data!
I would guess that, if one studied the matter, one would find that top infielders miss about 10 games per season more, on average, than top outfielders, though I don't have data to support that guess, at least not yet. However, I would argue that, if that were the case, infielders ought to receive an adjustment for that, like catchers but on a smaller scale, since the missed time is presumably a consequence of the demands of the position.
Have you similarly constructed MLB "All-Star" teams using WARP1 (using some iteration or other)? Are its teams more balanced between IF and OF than win shares?
The underrepresentation could be even larger if teams recognize the risk and move their up-and-coming star second baseman or catcher to the outfield to reduce the chances of injury.
I'm suggesting that there may be natural reasons why certain positions are underrepresented among star players. We know that third basemen are underrepresented in both the HoM and the HoF, and after looking carefully at the candidates I have to think that there just haven't been as many great players at third base as at the other positions.
Yes I have. But I haven't completed the positional counts yet. Early next week some time I would guess.
---------RF--CF--LF--SS--3B--2B--1B--CA--PI TOT OF/IF
1901-20 -66-101--80--52--54--43--42--06-250 694 247/149
1921-40 -79--85--77--53--39--54--90--28-183 688 241/146
1941-60 -76-101--87--71--69--60--60--28-147 699 270/200
--------221-287-244-176-162-157-192--62-580 2081 758/495
Top 32 players each season (including ties),
broken down by position, aggregated by 20 year intervals.
At least 20% of all infielder star candidates would have to be moved to cause this large imbalance. It implies that a significant number of outfield stars would have evidence of great infield play in the middle/high minors. Does this sound reasonable? And why not Collins, Baker, Hornsby, Frisch, Mathews?