Hall of Merit: A Look at Baseball's All-Time Best
Monday, October 11, 2004
Battle of the Uber-Stat Systems (Win Shares vs. WARP)!
Don’t ever say that I never gave you anything! :)
John (You Can Call Me Grandma) Murphy
Posted: October 11, 2004 at 02:46 PM  381 comment(s)
Reader Comments and Retorts
Clay Davenport: "......... Keep in mind that the replacement level in the WARPs is very low indeed, what a AA player might do. It is geared towards what the worst teams in history actually accomplished."
They named "Pythagenpat" for him.
Joe Dimino is credited with an assist concerning these pythagen definitions with visual basic code(?).
http://www.super70s.com/Baseball/Background/Glossary/S/Sabermetrics/Pythagorean.asp
That makes sense because, first, many rating systems are denominated in wins and, second, Bill James assigns three Win Shares to every team for every win.
Has anyone compared WinShares/3 and WARP1 for large numbers of player-seasons or careers?
The idea is so simple, the answer must be yes!
Such comparison would provide one, relativistic point of entry to understanding, criticizing, etc. Which players does WinShares rate highest, relative to WARP? George Van Haltren is rated three wins greater by Win Shares, about 114 to 111. Whose approach rates pre-expansion CFs higher and, if possible, why? And so on.
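A minimal sketch of that comparison in Python. The three-Win-Shares-per-win conversion and Van Haltren's "about 114 to 111" come from the thread. His raw total of 342 Win Shares is back-computed from "about 114" and is an assumption, as are the function and variable names.

```python
WS_PER_WIN = 3  # Bill James credits three Win Shares per team win


def ws_minus_warp(win_shares: float, warp1: float) -> float:
    """Wins credited by Win Shares (WS/3) minus wins credited by WARP1."""
    return win_shares / WS_PER_WIN - warp1


# (career Win Shares, career WARP1); 342 is an assumed total implied by "about 114".
careers = {"George Van Haltren": (342, 111)}

# Rank players by how much more Win Shares likes them than WARP1 does.
ranked = sorted(careers.items(), key=lambda kv: ws_minus_warp(*kv[1]), reverse=True)
for name, (ws, warp1) in ranked:
    print(f"{name}: WS/3 - WARP1 = {ws_minus_warp(ws, warp1):+.1f}")
```

With a full table of careers in place of the one-entry dictionary, the same sort would list the players Win Shares favors most and least.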
Similarly, systematic comparison with WinShares/3 or with WARP1 may be part of an effective explanation for a new rating denominated in wins, such as WarpRosenheck or Win & Loss Shares in Bill James' future. Here are some examples (or here is statistical analysis) of some of the biggest differences in the numbers of wins attributed: [list with commentary].
As far as I know, no such comparison may be equally illuminating for Pennants Added. Because it is denominated in pennants, the equally "natural" comparison is merely ordinal. Who ranks the greatest number of places higher? And so on.
Well, the batting replacement level for WS is lower than for WARP, but the fielding replacement level for Win Shares is higher than for WARP, and the two basically balance out. I suspect the pitching replacement level for Win Shares is higher than for WARP for the post-World War II game, although I haven't tracked this data, so I don't know for sure.
I have found that for most HoM-calibre post-WW2 position players, WS/3 is similar to WARP1. There is not a consistent difference in magnitude of the sort one would expect if replacement level in one were consistently lower than replacement level in the other. Prior to WW2, WARP1's fielding replacement level is so much lower that WARP1 totals begin consistently to outstrip WS/3 totals, and this difference continues to increase as one moves back in time.
I don't have parallel data gathered for a representative cross-section of players, but I have gathered it for all position players who would be considered "serious candidates" for the HoM.
Of my top 9 who played mostly after 1940 (non-pitchers), their WS/3 averages about 4 more than their WARP1.
Of the top 9 pre-1940 guys, WARP1 averages 2 career points higher than WS/3.
What is Peak?
over at Patriot's blog.
ftp://ftp.baseballgraphs.com/winshares/
This link contains historical win shares from 1876-2006. Thanks again David for providing this link.
Does anyone have a link somewhere where I can download WARP data from 1876-2006?
Keep up the good work Hall of Merit voters and posters. I've learned so much from reading the threads over the past five years.
Those much smarter than I have probably already realized this, but I am not sure Win Shares is a good method for the HOM. I was thinking about Hugh Duffy's dominance of Win Shares with a relatively low OPS+. Other than 1894 and 1891, he never had an OPS+ greater than 130.
I thought that was solely because of the way Win Shares allocates fielding WS disproportionately towards outfielders. However, I was surprised to find that Duffy was among the league leaders among outfielders in batting WS, even though he was nowhere near the leaders in OPS+.
For example, in 1892, Duffy led all outfielders with 23.5 batting WS, although he had a 125 OPS+. In 1893, he was second among outfielders at 23.1 bWS to Ed Delahanty (23.3), despite another 125 OPS+. Of course he led the league in 1894 (28.0) (177 OPS+).
I was seeing him as historically dominant from 1891-1894, primarily because of Win Shares, especially batting Win Shares. Just recently, I wondered how someone with a relatively low OPS+ could consistently be among the league leaders in batting WS?
First I thought that Duffy had more plate appearances than anyone else. It is true that he had over 600 PA each of these years, and so had more opportunity to accumulate batting WS.
But others had similarly high PA but didn't have nearly the amount of batting WS.
For example, in 1892, when Duffy led all outfielders in batting WS, he had 673 PA. Sam Thompson had a similar number of PA, with 679. Let's compare their statistics.
Player    BA    OBP   SLG   AB   R    H    2B  3B  HR  RBI  BB  SO
Duffy     .301  .364  .410  612  125  184  28  12  5   81   60  37
Thompson  .305  .377  .432  609  109  186  28  11  9   104  59  19
Very similar raw stats. How were their adjusted numbers? (BRAA adjusted for season from Prospectus, BatRuns and OPS+ from bbref)
Player    OPS+  BRAA  BatRns
Duffy     125   25    17.1
Thompson  144   40    32.9
OK, so compared to Thompson, Duffy played in a bandbox. Indeed the Philadelphia Baseball Grounds had an 1892 batting park factor of 100, while Boston's South End Grounds was 109.
But what were their raw batting WS? (Remember, similar raw stats, similar PA, but Duffy played in an easier park.)
Player    bWS (raw)
Duffy     23.5
Thompson  18.1
How can this be? Then my relatively simple brain remembered that WS can be affected by team wins, especially team wins above their pythagorean projection.
Team         Record  Pythag Record  RS   RA
1892 Boston  102-48  94-56          862  649
1892 Phil.   87-66   92-61          860  690
Based on this, Boston had 306 WS to dole out, while Philadelphia had 261.
Here are the players who weren't traded from Boston during 1892. Only Harry Stovey (3 WS total for Boston) had significant hitting time before he was traded.
Player    bWS    OPS+  AB
Duffy     23.5   125   612
Long      19.5   107   646
McCarthy  16.4   92    603
Tucker    15.9   106   542
Nash      14.2   101   526
Lowe      9.4    84    475
Stivetts  8.6    126   240
Ganzel    4.5    97    198
Quinn     2.9    54    532
Bennett   2.5    81    114
Kelly     2.5    53    281
Nichols   0.7    57    197
Total     120.8
Here are the players who weren't traded from Philadelphia during 1892. No traded player had significant hitting time.
Player     bWS    OPS+  AB
Connor     22.0   167   564
Hamilton   20.8   152   554
Thompson   18.1   144   609
Delahanty  15.6   158   477
Hallman    11.5   117   586
Cross      9.5    108   541
Clements   9.0    128   402
Allen      6.5    89    563
Reilly     0.0    47    331
Total      113.0
So according to this, Thompson seems to be penalized because he has three great-hitting teammates, and the Phillies did not have a great record. Duffy, although clearly a worse hitter than any of the Phillies' top 4, has more batting win shares than all of them because: (a) he had worse-hitting teammates; and (b) his team performed well.
I don't think I am going to use Win Shares anymore for this project.
Thompson's team had 87 wins (pythag was 92, but WS uses actual), so 261 WS are divided among the team.
Duffy's team had 102 wins (pythag was 94), so 306 WS were divided among the team.
If you use pythag wins rather than real wins, Duffy gets 21.5 batting WS and Thompson 19.1.
Thompson WAS a lesser % of his team's offense than Duffy was of his. That could be an allocation problem: too many of DUFFY's team's WS are going to offense instead of pitching and/or defense.
ALSO, James found that run estimators tend to get a bit wonky before 1920 and especially before 1900. Duffy has a great SB and SH advantage over Thompson. James found that while SB and SH do not correlate [positively] with scoring in our era, they did pre-1920 and especially pre-1900. So James' pre-1900 run estimator (a reworked version of runs created) may see Duffy as Thompson's offensive equal despite Thompson's 144 to 125 OPS+ advantage.
So only half of Duffy's 1892 bws margin over Thompson is generated by Bill James' full allocation of wins.
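For anyone who wants to check the team-wins effect, here is a rough Python sketch using the 1892 figures quoted above. Two things in it are assumptions, not anything stated in the thread: the 1.83 Pythagorean exponent (the posts do not say which exponent produced 94-56 and 92-61), and the idea that switching to pythag wins simply scales a player's batting WS in proportion to team wins. The function names are made up for illustration.

```python
WS_PER_WIN = 3  # Win Shares credits three shares per team win


def pythag_wins(runs_scored, runs_allowed, decisions, exponent=1.83):
    """Expected wins from runs scored and allowed (exponent is an assumption)."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return decisions * rs / (rs + ra)


def rescale_to_pythag(batting_ws, actual_wins, expected_wins):
    """Scale batting WS by expected/actual team wins (assumed proportional)."""
    return batting_ws * expected_wins / actual_wins


boston = pythag_wins(862, 649, 150)        # actual record 102-48
philadelphia = pythag_wins(860, 690, 153)  # actual record 87-66
print(round(boston), round(philadelphia))

# Team Win Shares available under actual wins.
print(102 * WS_PER_WIN, 87 * WS_PER_WIN)

# Duffy and Thompson batting WS if team totals followed pythag wins instead.
duffy = rescale_to_pythag(23.5, 102, 94)
thompson = rescale_to_pythag(18.1, 87, 92)
print(round(duffy, 1), round(thompson, 1))
```

Under these assumptions the rescaled figures come out at roughly 21.7 and 19.1, close to the 21.5 and 19.1 quoted, and the gap between the two players shrinks from 5.4 to about 2.5.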
by the way,
Wins above Pythagorean projection, Boston NL 1891-99
2 <u>8 8 5 0 1 2 4 0</u>
bold: excellent team, .625 or better (7 of 9 seasons)
<u>underline</u>: Hugh Duffy a regular outfielder (8 of 9 seasons, four cf then four rf)
italic: 154-game schedule (3 seasons)
1892 was the split season with a playoff (not included in season statistics) between the first- and second-half leaders, Boston and Cleveland.
In that system, I have Duffy 1892 as 34, and I have Thompson 1892 as 34. So there you are: very similar. If I switch to RC above 75% of average, I get Duffy 54, Thompson 53. OK, their playing time was pretty similar. On the same scale (back to the first version, RCAA), I have Duffy's "wow" year of 1894 as a 69. OK, that's a very good year, but it wasn't Frank Chance 1906 (scored as 78) or even George Stone 1906 (scored as 92).
These are all offense only. Of course, Duffy was a better defender than Thompson. Not that I was any fan of electing Thompson at the time. The differences between OPS+ and this version of RC do erase Thompson's advantage, but only to equalize them, not to put Duffy ahead.
Bbref's new OWP numbers for the two are
Duffy .541
Thompson .570
Not so different from what OPS+ shows.
The formula BBref is using for OWP isn't listed, however, so it's not clear where these numbers come from.
FWIW, EQA sees a big difference between the pair.
Duffy .291 EQA, 25 BRAA
Thompson .309 EQA, 40 BRAA
That gap seems a bit large to me in favor of Thompson.
I’ve also railed on BP for replacement level and using Runs Created. Those however are more philosophical disagreements. I like to consider replacement level at around .300, and could live with it being as low as .250. Clay (and by extension BP) uses .150. Clay is in the clear minority on this one and has a tougher job to explain himself. But, he could possibly muster enough evidence to support himself. However, that has never happened. I’d also be willing to debate that with them.
Strangely, rather than using EqR as their basis for VORP (and MLVr), they use Bill James's old RC equation (one that even James himself doesn’t use). It’s one of those things that is so buried under the machinations of the process that no one bothers to look and deride BP for using it. This one, while blatantly a very poor choice, is not “wrong”, because anything short of an all-encompassing sim would be “wrong”. However, it’s an extremely poor choice, one that BP should not be making. BaseRuns is the obvious choice here.
***
One thing that BP has straightened out is they have gotten rid of Pythagenport in favor of Pythagenpat. What would be nice however is if they call it Pythagenpat or whatever name David/Patriot want, rather than continuing to use Pythagenport as the name. And another is that Woolner did use the Tango Distribution over his, even though that was also a philosophical choice.
In both these instances, they went with the cleaner method that works a bit better in the normal range, and much better at the extremes. {clap clap clap}
***
While I’m here… will OPS+ go away please? No one will bother to calculate OPS+ on their own. So, why not calculate some form of Runs Created as a “+” metric.
There is a serious multiplication bug in Excel 2007, which has been reported. The first example that came to light is =850*77.1, which gives a result of 100,000 instead of the correct 65,535. It seems that any formula that should evaluate to 65,535 will act strangely.
Just thought you stat geeks would want to know.
Year N-3   7%
Year N-2  13%
Year N-1  22%
Year N    31%
Year N+1  18%
Year N+2   9%
I posted a copy of the spreadsheet in the HOM Yahoo egroup FILES section.
Getting each out has a certain value; the pitcher retains all of the value of his K's while sharing the value of the BIP out with his fielders.
Dazzy pitched in front of the fielding-challenged for almost all of his prime. His ERA+ is therefore misleadingly low.
Let's look at this from the batting perspective.
Suppose you had two hitters that played for the same team, each playing half the season in the 3 slot, the other half in the 4 slot. Suppose they each had the same number of RBI's, but one had significantly more Runs Created than the other. Does that fact invalidate Runs Created?
Which is more important? The lower-level components as predictors of what "should have" happened at the run level? Or the run-level result stats as documentors of what did happen? Are you arguing for components for hitters and run level for pitchers? And if so, why?
(This also has some relation to Thompson/Duffy, WARP vs WinShares, component-level vs win-level.)
This reads as much more "combative" than I intended.
There may well be very good reasons for different perspectives here.
I just haven't thought enough about these issues myself, either.
There are two separate issues here. The first is adjusting for defensive support, which everyone agrees should be done. If Vance's fielders were below average, he shouldn't be penalized for that, just as Palmer should be penalized for having above-average fielders. The second issue is intrinsically rewarding K's regardless of defensive support, which BP does. Given two *teammates* with the same ERA+ and the same fielders, the one with the higher K rate will still get more BP WARP, on the grounds that he got a greater share of his outs by himself. The latter I don't agree with.
The rationale may be that first base on error is missing data whose incidence is greater for pitchers with fewer strikeouts. Crediting pitchers for strikeouts is a proxy for debiting bases on errors.
Paul Wendt and Dr. Chaleeko, there is an empirically measurable marginal value of a K relative to a non-GIDP, non-SF, non-H fielded out, and it is .008 runs. Never adds up to more than a handful of runs even in the most extreme cases (although I do factor it in in my run estimation nonetheless).

I assume analysis of pre-1911 pitching would put this at a different value.
TomH is right that the true strikeout premium must vary historically. I guess that where Davenport does use fixed all-time parameters he estimates them for only some recent portion of MLB history, so his estimate would be little influenced, if at all, by 100- or 150-year-old conditions, but I am guessing.
I'm almost certain strikeouts consistently are around .04 runs more damaging than the average 'other' out.
I'm almost certain consistency at that precision is possible from year to year but not from model to model. There are too many variations in the methods of measuring that cost.

New England Symposium for Statistics in Sports
Months ago I mentioned this one-day conference on Statistics in Sports, tomorrow at Harvard University. See "Program" for the author-title list and the abstracts. The organizers hope that it will be annual.
Statistics in Sports (one-day conference Saturday)
heh  if i may anticipate jtm
There is a long display cabinet along one wall of the sciences library at Harvard University. The current exhibit is "From BA to BABIP: The History of Baseball Statistics". Regarding the abstruse work of Earnshaw Cook, Percentage Baseball (imagine a copy of the early 1960s book open to a telling page), the exhibit notes his splash of publicity thanks to a Sports Illustrated feature by young Frank Deford and his generally poor reception in academia, scornful reception in baseball. But Cook did make one convert [?? maybe not even a good paraphrase]. Davey Johnson, a math major in college at the time, accepted many of Cook's ideas and later became manager of the Mets.
[copied from "2005 Results"]
141. TomH Posted: October 06, 2007 at 10:28 PM (#2564778)
well, I can't FIND the uber WS vs WARP thread, so let's continue here...
DanR: The only way you can calculate that a player who got 3 WS on the 1899 Cleveland Spiders contributed exactly one win to the team is if you use a replacement level of precisely 52% of league average offense. Let's take a hypothetical guy who was 3 WS = 1 win in 1899. League average was .214 runs per batting out, so James's 52% baseline is .111 runs per out. Say the guy made 300 outs. If he was 1 win = 10.77 runs in 1899 above that baseline, then he created .111*300 + 10.77 = 44.2 runs. OK, so if you use James's 52% figure that's 3 batting WS. But if you used a baseline of 25% of league average offense, then he'd magically get 44.2 - (300*.214*.25 = 16.05) = 28.15 runs / 10.77 runs/win * 3 WS/win = 7.8 batting WS.
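The arithmetic in that paragraph can be replayed in a few lines of Python. This is only a sketch of the calculation as DanR states it, not of how Win Shares is actually computed: it holds runs per win fixed at 10.77 when the baseline moves, which is the step open to challenge. The function and variable names are invented for illustration.

```python
LEAGUE_RUNS_PER_OUT = 0.214  # 1899 league average, as quoted
RUNS_PER_WIN = 10.77         # 1899 runs per win, as quoted
WS_PER_WIN = 3
OUTS = 300                   # the hypothetical batter's outs


def batting_ws(runs_created, baseline_share):
    """Batting WS above a baseline set at baseline_share of league offense.

    Assumes runs per win stays fixed when the baseline changes.
    """
    baseline_runs = OUTS * LEAGUE_RUNS_PER_OUT * baseline_share
    return (runs_created - baseline_runs) / RUNS_PER_WIN * WS_PER_WIN


# A batter exactly one win (10.77 runs) above the 52% baseline.
runs_created = OUTS * LEAGUE_RUNS_PER_OUT * 0.52 + RUNS_PER_WIN

print(round(runs_created, 1))                    # about 44.2 runs
print(round(batting_ws(runs_created, 0.52), 1))  # 3.0 batting WS
print(round(batting_ws(runs_created, 0.25), 1))  # about 7.8 batting WS
```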
No, that is not right. You can't merely move the baseline and then neglect to re-account for the ratio of runs to wins, or there would be far more win shares than wins. WS starts with team wins, figures how many runs it takes to make those wins in the team as a whole, and then distributes them (offensively) by RC/out. The above calculations are not correct; changing the baseline would not give the example batter 4.8 more BWS. Same with the Rosen example.
DanR: See, there's nothing inherently "true" or "right" about the Win Shares allocation system.
Oh, I completely agree. But it makes much sense for the application for which it was designed, because using absolute zero has its own problems, and using 80% of average (or precisely average, the only other measure that has inherently good properties) leaves wins that then need to be reallocated by playing time, since there would be many fewer WS earned than wins the team achieved. Using 'average', you could create a system that then adds so many "wins" for each player by playing time (1 win per 150 PA or 40 IP or some such thing). But James created a system that did not require that. 52% on offense was a low enough replacement level to make it work. I am not really trying to defend 52% as "right"; is it arbitrary? Sure. Would other numbers work? Sure, IF you wished to go back and subtract or add in fudge factors to make the individual totals match their team total. WS is a top-down approach, and maybe I should leave it at that; it is DIFFERENT from the bottom-up approach that almost every other system uses. In some applications, it will be a better tool. For others, it is worse. If I had invented it, I might have tried to come up with a player's Win Shares AND Loss Shares.
DanR: And regarding the 1975-76 Reds, who would they have been forced to play if one of their stars had gone down? A replacement player, of course, one whose production would probably be approximately 80% of positional average.
My point about the extreme teams was that great teams tend to have higher freely available talent on the bench (Dan Driessen!), whereas the Spiders do not (duh). Is this important when considering long-term replacement level for the HoM? Maybe not. Again, the question WS was designed to answer was "how do I distribute the 108 wins in Cincinnati's 1975 team among their players?". Babe Ruth, on the '99 Spiders, could have been just as great a player but would have likely won fewer additional games for that team. This is pretty obvious, no? WS captures this. Whether you think this is important or relevant can be argued. But it does capture it.
DanH #145
. . .
Since there may not be more than 15-20 human beings playing baseball on this earth capable of doing that at any given time, FAT shortstops are below-average fielders as well as hitters. Given that, it seems entirely reasonable that the mega-expansion wouldn't affect all positions equally, and that the gap between the #24 and the #20 shortstop would be bigger than the gap between the #24 and the #20 1B. Secondly, you had the move to turf fields, which has been discussed at great length. The combination of those factors is more than enough to convince me that you really did need a super glove at SS to compete in the '70s (and many of the winning teams like the Orioles, Reds, A's, and Yankees had them), and that Concepción and Campaneris "deserve" all the credit for the pennants they added. Your mileage is free to vary.
TomH:
Mea culpa on the calculations, but does that change the substance of my point? That 52% is an arbitrary number, and that using a different baseline level would lead to a different allocation of Win Shares among batters?
What are the problems of using absolute zero, besides the fact that it's just blatantly not representative of how baseball works? (Neither is 52%, of course).
Using wins above/below average plus credit by playing time would be an INFINITELY better system than the current Win Shares model, in my opinion. Incomparably better.
Moreover, 52% is not a low enough level to make it work, because there *are* still players who hit at worse than 52% of league average. Look at Bill Bergen, who had not a single Batting Win Share in his nearly 1,000-game-long career. The presence of Bergen causes the Batting Win Shares of all of his teammates to be inappropriately reduced.
146. TomH Posted: October 07, 2007 at 08:12 PM (#2566273)
If you used absolute zero, every batter would have so many runs above baseline, you would have to go back and take away wins per playing time to get the team wins to match win shares. Or make believe it takes oodles of runs to create a 'win'. E.g., if a team only wins 40 games and so gets 120 WS, and if 60 were batting, and they scored a mere 600 runs, that's 10 runs per win share, or 30 runs per win. Wouldn't work.
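The 40-win example works out as follows in Python. It is just the arithmetic from the post, with a made-up function name; the 50% batting share is taken from the "60 of 120" figure in the example.

```python
WS_PER_WIN = 3


def runs_per_win_at_zero_baseline(team_wins, batting_share, runs_scored):
    """Runs needed per win if every run scored counts above an absolute-zero baseline."""
    team_ws = team_wins * WS_PER_WIN        # 40 wins -> 120 WS
    batting_ws = team_ws * batting_share    # half to batting -> 60 WS
    runs_per_ws = runs_scored / batting_ws  # 600 runs / 60 WS = 10
    return runs_per_ws * WS_PER_WIN         # 10 runs per WS -> 30 runs per win


print(runs_per_win_at_zero_baseline(40, 0.5, 600))  # 30.0
```

Thirty runs per win is about three times the roughly ten runs per win used elsewhere in this thread, which is the sense in which the zero baseline "wouldn't work" without further adjustment.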
Your take on WHY the FAT talent level of shortstops dropped in 1970 is a fascinating theory. If SS FAT level drops often with expansions, that would be a large finding (not only for this discussion, but for MLB GMs!!). That is actually a question that really piques my interest at the moment.
DanR later
TomH:
Is that why James picked 52%? Because it's the only number that gives you 10 runs a win in the modern game with no tweaking? That would be sort of cute. Still empirically wrong, but a tiny bit less arbitrary: a number selected for convenience rather than for accuracy.
Charlie Pavitt, In Response to Win Shares: A Partial Defense of Linear Weights
Charlie Pavitt maintains a bibliography of mainly academic, mainly statistical research on baseball, recently moved from the University of Delaware website (where is it now?). He reviews published academic work in a front-page column of By the Numbers, the newsletter of the Statistical Analysis Committee, SABR.
150. DL from MN Posted: January 02, 2008 at 09:59 AM (#2658223)
. . .
I would caution voters that using raw WS totals overrates outfielders over infielders over pitchers in the modern era due to replacement value issues.
151. kwarren Posted: January 02, 2008 at 10:57 AM (#2658270)
Pitchers are worth much less to their teams on an individual basis in the modern era, because they play considerably less than in previous eras.
I agree that Win Shares rates outfielders slightly higher than infielders, but I assumed that was because they tended to be, on average, much better hitters. And even though infielders (2B, SS, 3B) tend to contribute more than corner outfielders defensively, it does not compensate for the better hitting that outfielders usually provide.
This trend has been changing recently with the advent of power hitting infielders. <u>Consider the top 18 players in 07 using Win Shares.
6 OF, 4 1B, 3 3B, 2 SS, 1 2B, 1 DH</u>
. . .
152. DL from MN Posted: January 02, 2008 at 11:28 AM (#2658292)
Look at that list: 11 bats, 6 gloves (no C) and no pitchers; this demonstrates what I was saying. I don't think I want to see a HoF that is all OF and no pitchers post-1975. <u>The $$ paid out for pitchers is higher per win share than the $$ paid out for outfielders. The market is in significant disagreement with Win Shares.</u>
It seems questionable to me that people would use a strict system for HoF voting (most Win Shares) that wouldn't put a pitcher on its 25-man roster for the best players of the most recent season.
There's a (long) discussion thread on WARP, win shares and replacement value on the HoM site. There's no need to repeat it all here.
This is that long discussion thread.
Many of the same themes have been discussed in the "Dan Rosenheck" thread since DanR introduced another ubersystem.
1. The historical distributions of player-season win shares by fielding position and year.
2. The relation between salaries and win shares in one season, and more complicated relations among player-season salaries and win shares.
Either study should use a complete table of win shares by player-team-season, integrated with the baseball data that is more widely available. For example the "Lahman database" includes a complete table of fielding games by position and player-team-season, and a nearly complete table of compen$ation for some recent period.
Is a complete win shares table available?
Does the digital edition of Win Shares provide the table through 2002 or so?
(3)
<u>Sum of three Win Shares</u> season ratings
The distribution of "gain" (which may be positive or negative here) from rounding at the season level is very close to Normal with standard deviation 0.5
Consider a player with reported win shares 34 + 29 + 28 = 91.
The probability is about 2/3 that the true sum lies between 90.5 and 91.5, so that taking the sum before rounding would also yield 91.
The probability is about 19/20 or 95% that the true sum lies between 90 and 92.
The true sum is between 89.5 and 92.5, so that taking the sum before rounding would surely yield 90, 91, or 92.
That is, 1 out of 3 more accurately round to 90 or 92 than to 91
1 out of 20 are truly outside the range 90 to 92
(5)
The distribution of gain is roughly Normal with s.d. 0.8
Consider reported 5-year sum 21 + 22 + 23 + 24 + 25 = 115
The probability is more than 50% that the true sum lies between 114.5 and 115.5, so that taking the sum before rounding would also yield 115
The probability is almost 90% that the true sum lies between 114 and 116, or 115 +/- 1
The probability is almost 99% that the true sum lies between 113.5 and 116.5, so that taking the sum before rounding would yield 114 or 115 or 116
> (5)
> <u>Sum of five Win Shares</u> season ratings
> The distribution of gain is roughly Normal with s.d. 0.64
(0.8 is the rough number of standard deviations for gain +0.5)
The approximate probabilities for (5) are correct, I think.
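For anyone who wants to check the figures in (3) and (5), here is a minimal sketch. It assumes each season's rounding error is independent and uniform between -0.5 and +0.5; that is my assumption, since the posts only say the sum is close to Normal. Under that assumption the sum of n errors is a shifted Irwin-Hall distribution, so the probabilities can be computed exactly. The function names are made up for illustration.

```python
from fractions import Fraction
from math import comb, factorial, floor


def irwin_hall_cdf(x, n):
    """P(U1 + ... + Un <= x) for n independent Uniform(0, 1) variables."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= n:
        return Fraction(1)
    total = sum((-1) ** k * comb(n, k) * (x - k) ** n for k in range(floor(x) + 1))
    return total / factorial(n)


def prob_sum_error_within(n, half_width):
    """P(|sum of n rounding errors| <= half_width).

    ASSUMPTION: each error is independent and Uniform(-0.5, 0.5).
    """
    center = Fraction(n, 2)          # shift Uniform(-0.5, 0.5) to Uniform(0, 1)
    h = Fraction(half_width)
    return irwin_hall_cdf(center + h, n) - irwin_hall_cdf(center - h, n)


for seasons in (3, 5):
    sd = (seasons / 12) ** 0.5       # each error has variance 1/12
    for h in (Fraction(1, 2), Fraction(1), Fraction(3, 2)):
        p = prob_sum_error_within(seasons, h)
        print(f"{seasons} seasons, sd {sd:.2f}, within +/-{float(h)}: {float(p):.3f}")
```

Under that model the standard deviation of the summed error is sqrt(n/12), roughly 0.50 for three seasons and 0.65 for five, and the chance that summing before rounding gives the same total as the reported sum is exactly 2/3 for three seasons and 11/20 for five.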
Win Shares for all players in a team-season are normalized so that their sum is equal to team-season wins (times three). That is not true of Davenport's or Palmer's ratings, nor of many others.
Therefore the following data on team-season Games and Decisions is specifically relevant to using and understanding and improving the Win Shares system.
For every team-season Games Played = Decisions + No-Decisions. It is only a little misleading to call the No-Decisions "Ties" and it is convenient, so let me say it. For short, G = D + T.
Even today it is common that two teams in one league-season (whatever that is under interleague play) finish with different numbers of Games or Decisions. There were many differences between teams in 1994 when the season ended abruptly in August. In 24 league-seasons since then (1995-2006, two leagues) there have been only 5 with equal numbers of games played for all teams (almost inevitably, the number is 162). There have been 7 with equal number of decisions for all teams (again almost inevitably 162). And there have been 14 seasons with equal numbers of ties for all teams (zero). In those 24 league-seasons, the biggest differences between teams have been two games played (four times), two decisions (four times), and one tie (10 times).
Without further ado,
Latest league-season with given difference in games played for some pair of teams
1 game, 2006 NL or AL
2 game, 2000 NL
3 game, see 4
4 game, 1994 NL
5, 6, or 7 game, see 8
8 game, 1981 NL
9 game, see 10
10 game, 1945 AL
11 game, 1892 NL
For now coverage ends or begins in 1892 because 1891 is the latest season a major league team did not complete the season.
Now pass over some abnormal seasons: 1994, 1981, 1942-1945, and 1918. (In 1918, 1942, 1945, and 1981 but neither 1943 nor 1944 the biggest difference among teams within league was at least seven games.)
Adjusted by passing over 1918, 1942-45, 1981, and 1994
1 game, 2006 AL or NL
2 game, 2000 NL
3 game, see 4
4 game, 1989 NL
5 or 6, see 7
7 game, 1953 AL
8 game, 1938 AL
10 game, 1893 NL
11 game, 1892 NL
1953. Is that ancient history?
Latest league-season with given difference in decisions for some pair of teams
1 deci, 2006 AL or NL
2 deci, 2002 NL
3 deci, see 4
4 deci, 1994 NL
5, 6, 7, or 8 deci, see 9
9 deci, 1981 NL
That 9-decision difference in 1981 is the greatest in the period 1892-2006, tied in 1945 and 1906 but never exceeded.
Adjusted by passing over 1918, 1942-45, 1981, and 1994
1 deci, 2006 AL or NL
2 deci, 2002 NL
3 deci, 1979 AL
4 deci, 1978 AL
5 deci, 1962 NL
6 deci, 1938 AL
7 deci, 1907 AL or NL
8 deci, see 9
9 deci, 1906 AL
Latest league-season with given difference in "ties" (all no-decision games) for some pair of teams
1 tie, 2005 NL
2 ties, 1989 NL
3 ties, 1981 NL
4 ties, 1953 AL
5 ties, 1937 AL
6 ties, 1916 NL
7 ties, see 8
8 ties, 1911 NL
Commonly the maximum difference of T ties between teams in one league-season is a difference between one team with T ties and one or more with no ties. Detroit holds the ties record with 10 in 1904 but the maximum difference between teams was only two because every team tied at least two games. In a few league-seasons every team tied at least three games: 1907 AL, 1914 AL, 1914 FL. In the 46 major league seasons 1892-1914 there were 13 with no ties.
Here are some examples of big differences in numbers of decisions
1907 NL
_G_ _D_ _W_ _L_
155 152 107 _45 Chicago
157 154 _91 _63 Pittsburgh
149 147 _83 _64 Philadelphia
155 153 _82 _71 New York
The second division played 153 to 148 decisions.
The difference in decisions between PIT and PHI teams represents about 12 win shares for the players on each team in expectation.
1907 AL
_G_ _D_ _W_ _L_
153 150 _92 _58 Detroit
150 145 _88 _57 Philadelphia
157 151 _87 _64 Chicago
158 152 _85 _67 Cleveland
The second division played 152 (St Louis) to 148 decisions.
For Cleveland that is almost 12 win shares gained relative to Philadelphia; for Philadelphia about 14 win shares lost relative to Cleveland.
1981 NL (overall records in split season)
_G_ _D_ _W_ _L_
103 102 _59 _43 St. Louis
...
103 102 _46 _56 Pittsburgh
...
111 111 _56 _55 San Francisco
The 9-game difference represents a gain of about 14 win shares for San Francisco relative to St. Louis or a loss of about 16 win shares for St. Louis relative to San Francisco.
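One way to reproduce these estimates is sketched below. This is only my reading of the arithmetic, not a formula stated in the post: take the gap in decisions, assume the team would win the extra decisions at its own season winning rate, and multiply by three win shares per win. The function name is hypothetical.

```python
def win_share_gap(wins, decisions, extra_decisions):
    """Expected team win shares from `extra_decisions` more decisions.

    ASSUMPTION: the extra decisions are won at the team's own season rate,
    and each win is worth three win shares (the Win Shares normalization).
    """
    return 3 * (wins / decisions) * extra_decisions


# 1981 NL: San Francisco had 111 decisions, St. Louis 102 (gap of 9).
print(round(win_share_gap(56, 111, 9), 1))   # San Francisco's rate
print(round(win_share_gap(59, 102, 9), 1))   # St. Louis's rate

# 1907 NL: Pittsburgh had 154 decisions, Philadelphia 147 (gap of 7).
print(round(win_share_gap(91, 154, 7), 1))   # Pittsburgh's rate
print(round(win_share_gap(83, 147, 7), 1))   # Philadelphia's rate
```

On this reading the 1981 figures come out near 14 and 16, and the 1907 NL figures near 12 for both teams, which matches the estimates quoted above.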
Brock Hanke wrote this today in "Ranking the Hall of Merit Firstbasemen" #93. This is only part of what he wrote and what he wrote is only preliminary.
A post on this subject that I thought would take a day or two has taken over a week to work up. The essence of the post is that I think you should amortize 1870s catcher playing time out to maybe 90 games instead of 162 when you're doing Season Equivalents, and that this is the only decade and the only position to which that applies. It only applies to catchers, and it only applies to 1870s catchers, except for one season of Charlie Bennett (1882) when he actually played his team's entire schedule, albeit not all at catcher.
Brock,
I'm sure there are some strengths and some weaknesses to your study. I hope that I can make time to give it a close look. I'm not very good at making time but the baseball subject commonly grabs me.
From what you say I infer that the player ratings part of your thesis, in contrast to the history and the curve fitting, puts you in the Pete Palmer school, or addresses the Palmer school. Palmer, Clay Davenport, Bill James, and their followers. Their marquee ratings are career sums of season ratings denominated in games. Among them Palmer makes no adjustment for season length. He once gave a talk or wrote a paper on the primacy of the season and I suppose he would say that the career sum is secondary although it helps sell books. Bill James, too, makes no adjustment for season length, although he does give some space to win shares per 162 games beside the career, 5-year, and 3-year totals. Davenport prorates every season at the rate (162/G)^(2/3) rather than the linear 162/G. (By "amortize" I think you mean the linear 162/G where G is team games played or scheduled.)
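To see how far apart the two proration rules are, here is a small sketch. It reads "(162/G)^2/3" as 162/G raised to the two-thirds power, which is my interpretation of the notation, and the function names are invented for this illustration only.

```python
def linear_factor(team_games, full_season=162):
    """Linear proration: scale a season rating by full_season / G."""
    return full_season / team_games


def two_thirds_power_factor(team_games, full_season=162):
    """Damped proration: (full_season / G) ** (2/3).

    ASSUMPTION: this is how the '(162/G)^2/3' in the post should be read.
    """
    return (full_season / team_games) ** (2 / 3)


for games in (60, 81, 126, 154):
    print(games, round(linear_factor(games), 3), round(two_thirds_power_factor(games), 3))
```

The damped rule always gives a smaller multiplier than the linear one for G below 162, and the gap widens as the schedule gets shorter, so it matters most for the 1870s seasons under discussion.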
Chris Cobb is in the Palmer school.
Those who rely on raw win shares or season-prorated win shares are taking this approach.
Joe Dimino is not. He asks "Joe Torre, 1960-1977: how much did he contribute toward winning 18 pennants?" "Deacon White, 1871-1890 (or 1869-1890): how much did he contribute toward winning 20 (or 22) pennants?"
People who say "a pennant is a pennant" may be, and surely some are, professing this approach, even if they don't follow it all the way to a numerical rating.
I am not sure this is true, though I am also not sure exactly what you mean by putting me in "the Palmer school."
What I mean by the Palmer school is that the focus is coming up with the right season rating denominated in wins.
(Palmer might say that they shouldn't be added except to sell books to people who are interested in adding them, and no one should be interested in adding them. If I ever have a conversation with him that gets beyond one beer, and we are presently at zero beers, I will ask about that. But that isn't a tenet of the school.)
Dan Rosenheck is in the Palmer school during the regular term, and he is interested in adding wins. But he has a summer job in another school, putting all of the players into one modern free agent labor market.
678. Blackadder Posted: December 27, 2008 at 09:54 AM (#3038841)
Apparently Clay Davenport is reworking BP's WARP, to include PBP fielding when it is available and a more realistic replacement level. Jay Jaffe quoted some of the preliminary results, and eyeballing it the replacement level still looked a little too low, but I'll withhold judgment until his system is public. Still, it is a very welcome development that there seems to be convergence in opinion about the correct methodology for player valuation.
680. Devin McCullen cries "Enraha!" Posted: January 09, 2009 at 04:23 PM (#3047931)
This seems like as good a place as any to mention this: BP now has searchable stats for Batting Translations, Pitcher Translations, and WARP Leaderboards here. They only go back to 1901, and it's year-by-year, but I assume folks will find these helpful.
Today I compiled some career Advanced Pitching Statistics for all fifteen major league pitchers on the 1893-1923 ballot. I noticed that the measures {XIP, RAA, PRAA, PRAR} have not changed from last year for Cy Young whereas they have changed for the other pitchers (fourteen). I reported this apparent problem to Clay Davenport.
XIP RAA PRAA PRAR DERA
7399 901 932 2013 3.37
4839 570 693 1231 3.21
By DERA he now ranks third in this group and he is second to Johnson by XIP, RAA, PRAA, or PRAR.
newDERA name
2.94 Rusie A
3.04 Johnson W
3.21 Young C
3.22 Alexander P
3.31 Walsh E
DERA newDERA
3.36 3.17 Hahn N
3.37 3.21 Young C
3.89 3.36 Breitenstein T
3.76 3.41 Nichols K
3.54 3.51 Waddell R
3.95 3.64 McGinnity J
3.92 3.68 Griffith C
4.03 3.83 Leever S
4.04 3.89 Willis V
3.96 3.95 Cuppy N
4.22 3.96 Donovan B
4.38 3.97 Taylor J
4.01 4.07 Tannehill J
4.17 4.08 Orth A
4.17 4.08 Mercer W
4.29 4.09 Hawley Pink
4.18 4.09 Phillippe D
4.16 4.09 Chesbro J
4.09 4.13 Dinneen B
4.43 4.15 Meekin J
4.19 4.18 Powell J
4.27 4.22 Killen F
4.24 4.23 Taylor J
4.34 4.24 Sparks T
4.37 4.29 Howell H
4.47 4.31 Kennedy B
4.37 4.42 Donahue R
4.61 4.57 Sudhoff W
4.73 4.71 Fraser C
4.73 4.75 Kitson F
4.77 5.05 Carsey K
Sam Leever now leads his 1900-1902 teammates comfortably.
The revision benefits Ted Breitenstein more than anyone else in this group (row three) but many with 1880s debuts gained more than he did and Charlie Getzien from the 1880s lost more than anyone in this group.
Among the leaders by career innings, at least, the size of the revision on the DERA scale is generally greater for the 1870s and 1880s debutantes. Cherokee Fisher from the early 1870s now gets credit for DERA 2.04, down from 4.19!
I included remarks on the use of park factors by Bill James in Win Shares. Pages 86ff he explains his park adjustment calculations (which include some broad and some narrow mistakes).
James also states (p87 col2) where he uses the overall park adjustment, which is a "factor" for run scoring. Those applications are in "Dividing Win Shares between Offense and Defense" and "Dividing Offensive Win Shares among a Team's Hitters" (p17-25). More on that later.
However, regarding the specialized park factors for Home Run and Non-Home Run adjustments, he says only that they "will also be needed later in the process" (p87).
Does anyone here know where?
I don't have any WARP data, only Advanced Pitching Statistics.
By DERA the Dutch Leonards gain 0.07 and 0.08, or about two points on the ERA+ scale.
The Hall of Merit relief pitchers Fingers, Gossage, and Eckersley all lose about 0.40.
Is anyone able to check any of these recent or active relief pitchers, because you too have the 2008 edition data?
(columns one, three, five, any one of which is redundant)
XIP newXIP PRAA newPRAA DERA newDERA name
856 1020 161 213 2.81 2.62 NATHAN J
1163 1358 137 192 3.44 3.23 PERCIVAL T
1673 1968 417 311 2.26 3.07 RIVERA M
1270 1463 246 130 2.75 3.70 WAGNER B
1789 2045 265 4 3.17 4.48 HOFFMAN T
1573 1796 155 21 3.61 4.60 HERNANDEZ R
1491 1743 87 155 3.98 5.30 JONES T
For brevity here are two pitchers only, Francisco Rodriguez and poor Todd Jones. For simplicity I will call the three sets of estimates (2008), recent, and today; 2008 and recent are the two that I posted yesterday. For readability the layout is three successive rows.
Francisco Rodriguez
XIP RAA PRAA PRAR DERA
679 ? 143 361 2.60 (2008)
816 176 81 171 3.61 recent
816 176 174 287 2.57 today
Todd Jones
XIP RAA PRAA PRAR DERA
1491 ? 87 493 3.98 (2008)
1743 86 155 39 5.30 recent
1743 86 71 313 4.13 today
maybe a decrease in replacement level but it is not enough to float the PRAR of all pitchers, not those who must give back lots of runs to their fielders. For example, poster boy Al Spalding is down from 210 to 120 (PRAR). By DERA he is now a small gainer from last year: 4.13, 2.50, 4.06.
Last month I speculated that there had been some revision regarding the cooperation by pitchers and fielders, as if Spalding had been credited with doing so remarkably well given all those jokers running around without gloves (some transhistorical average fielding as a benchmark). That isn't plausible but the estimates for some early pitchers on famous teams do look like they have enjoyed that mistake and now suffered its correction.
16 losers (DERA up at least 0.20 runs)
DERA newnew
4.39 4.88 Getzien C
4.97 5.34 Billingham J
4.70 5.06 Ellis D
4.58 4.93 Briles N
4.61 4.95 Sele A
4.65 4.98 Splittorff P
4.44 4.77 Leonard Den
4.36 4.66 Kaat J
4.35 4.61 Goltz D
3.95 4.20 Radke B
4.63 4.87 Reuss J
4.57 4.81 Burkett J
3.73 3.96 Keefe T
4.55 4.76 Donohue P
4.53 4.74 Gura L
4.20 4.40 McBride D
By debut they are three from the 1870s and 80s, including the one at the head of the list; one from the 1920s; twelve from Jim Kaat to the present.
I count 27 with DERA down at least 0.2 runs. The threshold for listing here is a little higher in order to make the group size similar.
15 winners (DERA down at least 0.24 runs)
DERA newnew
3.88 3.53 Rusie A
4.10 3.76 Zettlein G
4.02 3.68 Niekro P
3.89 3.57 Breitenstein T
3.84 3.55 Garver N
4.84 4.56 Cunningham B
4.44 4.16 Ward JM
3.89 3.62 McMahon S
4.46 4.19 Ramos P
4.07 3.81 McCormick J
4.47 4.21 Honeycutt R
4.45 4.19 Hough C
4.20 3.96 Galvin J
4.08 3.84 Morris E
4.55 4.31 Patten C
By debut date they are eight from the 1870s and 80s, Breitenstein 1891, Patten 1901, and five from the 1940s to 70s.
If this holds up then at least McCormick from olden days and Garver from modern times should get another look.
Quoting from Bleed and Brent, "2010 Ballot Discussion" #307-309, where I have also posted the first part of my comments as #319.
308. Brent Posted: November 19, 2009 at 09:30 PM (#3391965)
>> [Bleed #307] Does anyone else have thoughts on how to use Rally Monkey's WAR to reflect the value of 19th century ballplayers?
<<
I guess the first step is to articulate the reasons you'd like to make adjustments to reduce the results from the simple extrapolations. Is it because you think the short-season data aren't representative and you want to regress them? Or is it an adjustment for perceived league quality (in which case you'd also want to adjust data from longer seasons)?
309. Bleed the Freak Posted: November 19, 2009 at 10:39 PM (#3392000)
My reasoning would be that short-season data is a smaller sample size, and might not be fully representative, so regression may be necessary.
That may be reasonable regarding "peak" credit for seasons not supported by neighbors of about the same quality; that is, reasonable in "non-consecutive peak" analysis. I think it is generally unreasonable, along two lines below (1,2).
Before getting there, let me simply state what is a generally reasonable concern about the number of games in the championship schedule, or the "length of the season", for anyone who cares about "pennants" as well as games and runs. For the same league, same teams, winning percentage .625 may generate the same probability of winning a 126-game pennant race as does .600 in a 162-game pennant race. I presume that handling this point is a big part of "pennants added" analysis by Joe Dimino, following Michael Wolverton, if I understand correctly.
1.
The matter of so-called sample size may be all about talent rather than achievement. It does seem to be all about talent rather than achievement for Tom Tango and "Bleed" in the ThinkFactory discussion cited here last week, or one remove from that citation. Tom Tango argued for Edgar Martinez among other things. Bleed interpreted Tango's rating system and Dan Rosenheck's WARP in terms of root-n, the square root of the number of observations, which is ubiquitous in mathematical statistics. Insofar as we care about talent rather than achievement, the issue of so-called sample size is statistically significant (an abuse of technical language) and may sometimes be significant on the scale of this project. "N" is all about how certain we can be that Barry Bonds is truly a talented player. Just how unlikely is it that a league-average talent could have posted his playing career? Please quantify!
2.
Concerning achievement rather than talent, in major league baseball from 1871, I doubt that anyone really means a sample size issue. Essentially we have the complete record for Ross Barnes in championship play 1871-1876, same as we do for Dave Cash 1971-1976. Who needs more? Well, the entertainment of paying customers in lots of other games was an important part of Barnes' job, but a trivial part of Cash's job. We don't have a record of those other games Barnes played, not even how many of them he played; his playing time and all the details are unobserved, practically (existing records have not been compiled). So "who needs more?" is who cares what Barnes achieved in those other games. Maybe he and Harry Schafer played every day in 1871, and performed equally as batsmen in those other games. That isn't likely if they were both trying, but it isn't impossible either. More important, there is no reason to suppose Barnes surpassed Schafer in those games by the same margin he surpassed Schafer in their NAPBBP games. Statistics as a discipline shows, tells, teaches how to make some auxiliary postulates, interpret the historical record as a sample, and express what we know about those other games (in terms of estimates and probabilities that jointly quantify what we know and don't know).
We do suppose that Barnes and Schafer were "trying", or they were obliged to "try" in those other games of the 1870s. That's most of the distinction from Dave Cash's and Richie Hebner's play in exhibition games of the 1970s. Nevertheless, no one cares much about those other ballgames even in the 1870s.
"A pennant is a pennant" expresses an important principle here. Perhaps some participants treat it as a constitutional obligation. For some it is a personal guideline. Almost everyone takes it seriously, no one simply dismisses it. Routinely it means uniform weight for all major league pennant races, and debate tinkers with the details (what's a major league? how if at all do we pay attention to minor league seasons?). At the same time, however, it means uniform weight zero for everything else: assaulting a nurse on the street, driving a car while intoxicated, muffing a fly in March.
And if you can find me a good new proxy for team defense or a way to get at the old BPro cards (since the BPro cards no longer have what I need), I'm all ears!
A hat tip to Chris Jaffe for mentioning the oldschool DT Cards in an article he wrote about Omar Vizquel:
http://www.hardballtimes.com/main/article/whendowestarttakingomarvizquelscooperstowncaseseriously/
Under the references and resources section, he lists the direct link to Vizquel, and mentions that, to query for other players, using the baseballreference (Lahman database) abbreviation will net the correct result.
http://www.baseballprospectus.com/dt/vizquom01.shtml
I hope this is helpful.
Does anyone know if there is a standard error associated with WAR? For example let's say the top WAR in a league is Player A with 8.0. How far behind 8.0 would Player B have to be before we could say with 66% or 95% certainty that Player A was the better player that year?
Or does each player have a separate error associated with him?
Hope my questions make sense. I probably didn't word them in the best way.
Does anyone here have an opinion on the matter?
Does one do a better job at being fair to players in all eras?
No rush. At this point, I'm not switching for the 2018 ballot, but it is something I'd like to consider going forward.
I thought I read something about that on the TangoTiger.com blog in one of the threads spun off the Bill James article on WAR, but I can't find now if I did.
I am not aware of any serious attempt to establish one, but I am probably not the best placed to know for sure. People normally talk imprecisely of it being around half a WAR margin in a season, with the fielding component being less precise than pitching and hitting.
That adds up to quite a bit over a long career, for an exercise such as the HoM, but something that can be less problematic for a single season. Of course, one can always assume that the plus or minus effect evens out over many seasons, which I suppose isn't an unreasonable position to take.
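To put rough numbers on the Player A versus Player B question, here is a sketch under loudly stated assumptions: each player's WAR has an independent, Normal error, and the "half a WAR" rule of thumb is treated as a standard deviation of 0.5. Neither assumption comes from any published WAR system; they are placeholders so the arithmetic can be shown. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_a_truly_better(war_a, war_b, sd_a=0.5, sd_b=0.5):
    """Chance that A's true value exceeds B's.

    ASSUMPTION: independent Normal errors; the 0.5 defaults are a guess
    based on the informal 'half a WAR' figure, not a measured value.
    """
    diff_sd = sqrt(sd_a ** 2 + sd_b ** 2)   # errors add in quadrature
    return NormalDist(0, diff_sd).cdf(war_a - war_b)


def gap_needed(confidence, sd_a=0.5, sd_b=0.5):
    """Smallest WAR lead that gives the stated confidence A is better."""
    diff_sd = sqrt(sd_a ** 2 + sd_b ** 2)
    return NormalDist(0, diff_sd).inv_cdf(confidence)


print(round(prob_a_truly_better(8.0, 7.5), 2))
print(round(gap_needed(0.66), 2), round(gap_needed(0.95), 2))
```

With those placeholder values the lead needed for 95% confidence is a little under 1.2 WAR. A larger error for one player (say a shortstop with uncertain fielding numbers) can be passed in through `sd_a` or `sd_b`, which speaks to the second question about each player having his own error.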
Does one do a better job at being fair to players in all eras?
Recently I was looking at the 1899 Cleveland Spiders, and it was my impression that the replacement level in bWAR was lower than in fWAR and gWAR. IOW, by bWAR players were more likely to have positive scores and less likely to have extreme negative scores.
This is complicated somewhat by the use of FIP in fWAR, which makes 1899 pitchers look much less bad than under bWAR and gWAR. Examining the players' post-Spiders careers, I came to the conclusion that a FIP-based WAR doesn't capture the expectations of turn-of-the-century baseball. That RA9 WAR mattered. Pitchers with bad bWAR and gWAR didn't have careers afterwards.
I didn't notice any difference in particular effects between the consequences of BaseRuns-based gWAR and the wRAA of bWAR and fWAR for hitters in terms of future careers, but I wasn't looking that closely because numbers were similar, unlike the case for pitchers. If there is an effect, on the basis of this bit of research I imagine it is quite small.
I think the only WAR to publish with a standard error is the Open WAR project.
I don't really know how you would start calculating a standard error around it, but yes, each player would have a different range. For one, it depends on playing time. I don't know if Aaron Judge is 8 WAR +/- .5, 1.5, 2.5 or whatever. But I know what my WAR last season was (and the WAR for Mickey Mantle in 2017, my cat, and a random fire hydrant on the street): 0.0, with no error.
Some people are going to object to me saying someone with no playing time has a WAR of 0.0, but whatever. Take someone who had one at bat. Their WAR is within a rounding error of 0.0, and the error around it will be pretty small.
Playing time aside, I would say that if David Ortiz and Derek Jeter each play 155 games with a WAR of 6.0, Jeter will have the larger error range. That is because we can assume a similar range around their offense, but while Jeter's defense is known within a big error range, Ortiz's defense has almost no error range since he probably played 5 games in the field all year.
FIP works for modern pitchers, but back in the 1800's there were relatively few strikeouts, almost no homeruns, and few walks, especially back when it took more than 4 balls to earn a walk. You just can't build a pitching metric off of rare events and expect it to mean anything.
"FIP works for modern pitchers, but back in the 1800's there were relatively few strikeouts, almost no homeruns, and few walks, especially back when it took more than 4 balls to earn a walk."
fra paolo and Rally both said similar things with regard to FIP so I thought I'd address them together. I get that FIP works better for a modern pitcher and RA9 for an 1890s pitcher. Question then becomes, where should the change occur in that 120-year period? My gut says RA9 at least until WWII and FIP for at least the last 20-30 years, but still not sure how to treat the interim time period. Is there even an objective way to answer this? I definitely want to be fair to pitchers from all eras, just not sure the best way to do so.
"For one, it depends on playing time. I don't know if Aaron Judge is 8 WAR +/- .5, 1.5, 2.5 or whatever."
For career WAR purposes, I assume the errors tend to even out over a player's career. I'm looking to do a project where I pick my own MVPs, Cy Youngs, and All-Star teams (based on full season play, instead of 2-3 months and without the pesky tendency to pick All-Stars just because they have a reputation for being All-Stars) for each year going backward. Then, I can have a total AS appearances for a player that means something to me (plus it seems like a fun project). So for these purposes, I'm looking for a standard variance for players in the Aaron Judge/Jose Altuve tail of the distribution. I've heard the +/- .5 WAR figure before, which implies that if 2 MVP candidates are within 1 WAR of each other, we can't be certain (to a high degree anyway) of which was the better player.
My thought for the MVP side of the equation was to take the player with the highest WAR plus any players with a WAR within a certain error of that player and basically award them all the MVP for that year (and same with pitchers for Cy Young). The idea is then when I look at a player for HoM consideration, instead of asking the question "How many times did he win the MVP?", I'm recognizing and embracing the error involved and am instead answering the question "How many times could he have been considered the best player by a reasonable person?" I feel like that's much more useful and removes some of the subjectivity of selecting a single MVP in a year where maybe 3 guys were reasonable picks. The question becomes, "What is that error level that accomplishes this?" Maybe it's just a question of my asking myself how many MVPs I am comfortable with having in a given year.
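The selection rule described above is simple enough to sketch. The players, WAR values, and the 1.0 margin below are invented placeholders, and the function name is hypothetical; the margin is exactly the open question in the post.

```python
def reasonable_mvps(war_by_player, margin):
    """Everyone whose WAR is within `margin` of the league leader.

    `margin` is a free parameter here; no particular value is endorsed.
    Results are ordered from highest WAR to lowest.
    """
    best = max(war_by_player.values())
    picks = [name for name, war in war_by_player.items() if best - war <= margin]
    return sorted(picks, key=lambda name: war_by_player[name], reverse=True)


# Made-up season for illustration only.
season = {"Player A": 8.0, "Player B": 7.4, "Player C": 6.5, "Player D": 5.9}
print(reasonable_mvps(season, margin=1.0))
print(reasonable_mvps(season, margin=0.5))
```

Running the same seasons with a few different margins shows directly how many co-MVPs each choice produces per year, which is one way to settle on a comfortable value.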
I just prefer to use RA9 WAR for them all, as long as you are trying to value what they did and not project them going forward. FIP can't work for the early pitchers. FIP usually works OK for the moderns; on a career level they usually come out pretty close. When you do find a guy with a big difference over a large sample there are usually real reasons why the pitcher (Glavine) is better or worse than his FIP.
The quick and dirty method I used to use for position players:
Start with the assumption that an average position player gets about 15 WS over a full season. Given that, according to WAR, a full-season average player is worth about 2 wins more than a replacement player, and a win is worth 3 WS, subtract 6 from 15, thus 9 WS per season for a replacement player. So subtract 9 WS * % of season played from a player's WS and then divide that number by 3 to get to an approximate WS-to-WAR equivalent.
Pitchers are much tougher because WS gives a much higher bonus to relievers than WAR, and also WS values showing up (raw IP totals) more than WAR. And thus I haven't figured a good conversion for pitchers.
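The quick and dirty position-player conversion above can be written out as follows. The 9 WS replacement level and the 3 WS per win come straight from the post; the function and parameter names are mine, and as the post says this is not meant for pitchers.

```python
def ws_to_war(win_shares, season_fraction=1.0,
              replacement_ws_per_season=9.0, ws_per_win=3.0):
    """Approximate WAR from Win Shares for a position player.

    season_fraction is the share of a full season the player played
    (1.0 = full season). Defaults follow the rule of thumb in the post.
    """
    replacement_ws = replacement_ws_per_season * season_fraction
    return (win_shares - replacement_ws) / ws_per_win


print(ws_to_war(15))          # an average full-season player
print(ws_to_war(30))          # a 30 WS full season
print(ws_to_war(12, 0.5))     # 12 WS in half a season
```

An average full-season player at 15 WS comes out at 2.0, which is the sanity check built into the rule.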
WS does not itself have a Replacement Rate, so I don't understand the comments about its RR being too low or too high. WS has a zero point and calculates from there. That zero point can't be compared to WAR's RR. It can be compared to WAR's Runs Method zero point, which is the average. WS's zero point is vastly superior to WAR's zero point, if for no other reason than that it allowed a reasonable approach to ranking fielding.
One good reason for using WS (and it is my personal reason) is that you can then use the New Historical's Player Ranking system of Accumulated Regular Season WS, Top Three non-consecutive years, Top Five consecutive years, Timeline and Subjective. This is by far the best Hall system that I have ever seen. Nothing else is even close. You could, of course, put together a Ranking System of similar complexity fueled by WAR, but as far as I know, no one has ever done it and published the results. You can also use the system without a component (usually Timeline), because all of the components except Subjective are trivial to figure out, and Subjective is easy to approximate just by looking at who ranks ahead of whom in the New Historical and looking for discrepancies between placement and the First Four components.
For what it's worth, I contend that there clearly is a Timeline, but it is not linear, which is what WS has. I contend that it tracks Fielding Percentage or Strikeouts or any other stat that changes very rapidly in early baseball, but very slowly in the later game. I use that Timeline for 19th century players, and then multiply their stats out by the length of their teams' schedules. Works great for position players.
For pitchers, I just assume that a "season" consists of 34 Starts, and use that to break down the pitchers' numbers into what look much more like modern seasons, by taking "orphan" parts of actual seasons and adding them to the start of the next actual season. I can then compare very early pitchers to modern ones without much trouble. The main thing I have to look out for is the illusion of unnatural consistency in the early guys: season after season of 34 starts, with no injury years. That's not that hard.
Let's say a pitcher has these four seasons' worth of starts: 1880 = 4 starts (rookie). 1881 = 52 starts (staff ace that year). 1882 = 38 starts (too much workload in 1881 led to sore arm). 1883 = 48 starts (back to ace status). Here's what I translate that to:
1880 = 4 starts
1881 a = 44 starts, all from 1881, leaving an orphan of 52-44=8
1881 b = 44 starts, 8 from 1881 and 36 from 1882, leaving an orphan of 38-36=2
1882 = 44 starts, 2 from 1882 and 42 from 1883, leaving an orphan of 48-42=6. Those starts will be the first 6 starts of the 1883 season (or 1883 a season).
As for what stats to associate with these seasons, consider the 1881 b season. It has 8/52 of the real 1881 season's stats, and 36/38 of the 1882.
This produces careers that look a LOT more like modern workloads. And this REPLACES the Timeline! You get Hoss Radbourne looking like he has a Roger Clemens career instead of the real Hoss Radbourne, but that's what you want, if you're going to compare them.
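The re-chunking in the worked example can be sketched as a small routine. The text gives 34 starts as the season size while the example uses 44, so the size is left as a parameter here; the function names and the flat per-start split of a stat are my own simplifications, not something spelled out in the post.

```python
def rechunk_starts(seasons, size):
    """Split real seasons' starts into synthetic seasons of `size` starts.

    seasons: list of (label, starts) in chronological order.
    Returns a list of dicts mapping real-season label -> starts taken from it.
    The last dict may hold fewer than `size` starts (the final orphan).
    """
    chunks, current, room = [], {}, size
    for label, starts in seasons:
        remaining = starts
        while remaining > 0:
            take = min(room, remaining)
            current[label] = current.get(label, 0) + take
            remaining -= take
            room -= take
            if room == 0:                 # synthetic season is full
                chunks.append(current)
                current, room = {}, size
    if current:
        chunks.append(current)
    return chunks


def blended_stat(chunk, stat_by_season, starts_by_season):
    """Pro-rate a counting stat by share of starts (e.g. 8/52 of 1881)."""
    return sum(stat_by_season[s] * n / starts_by_season[s] for s, n in chunk.items())


# The 1881-1883 part of the example, with 1880 left alone as in the post.
real = [(1881, 52), (1882, 38), (1883, 48)]
for chunk in rechunk_starts(real, size=44):
    print(chunk)
```

With a size of 44 this reproduces the 8/36 and 2/42 splits in the example and leaves the six 1883 starts as the final orphan.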
Can you expound?
I'm not sure what you are trying to explain here, but from the evidence discussed in this thread and elsewhere, I don't see where WS uses a better method or yields a better result when it comes to fielding evaluation.
The electorate here have come up with ways to evaluate players, published through discussion boards or on specific websites, threads, etc., which I am comfortable in stating are an improvement on what WS and Bill James have offered.
Kiko Sakata  https://baseball.tomthress.com/
Dr. Chaleeko  https://horsehidedragnet.wordpress.com/
https://homemlb.wordpress.com/
He's not an active member here, but Matthew Cornwell has done amazing work, of which a thread is dedicated here:
https://www.baseballfever.com/forum/generalbaseball/historyofthegame/36296652022parcsdupdate
Dan Rosenheck has a page here for data from 1893-2005: www.baseballthinkfactory.org/files/hall_of_merit/discussion/dan_rosenhecks_warp_data
Personally, I leverage the work done by these folks, as well as other factors, to come up with an evaluation that I feel exceeds the WS system...and I feel the same for the work put in and methods used by other members of our electorate.
Chris Cobb doesn't have a publicly available, all-encompassing system that I am aware of, but I would choose his methods over a rote or raw WS/WAR/whatever analysis.
James’s ranking system has two great virtues. First, it emphasizes the importance of taking a multifaceted approach to the assessment of player value, and, second, it does a good job of identifying the aspects of value that should be included. The conceptual framework for ranking that James’s system developed has guided work on the assessment of player value ever since, and I would say that the large majority of HoM voters over the years have taken a multifaceted view of value, ultimately on the basis of James's arguments for it. The scope of the project that James undertook to enable him to create player rankings that took an approach to value that was both multifaceted and precise in the New Bill James Historical Baseball Abstract was amazingly ambitious, and its results had very significant positive impacts on the historical assessment of baseball players. The Hall of Merit project in particular benefited greatly from its release during the period in which the project was being conceived, and the early influence of Win Shares on the project is demonstrated by the fact that the timing of the first election was adjusted so that the electorate would have access to the Win Shares follow-up to the NBJHBA prior to the first election being held. NBJHBA is a work I still consult from time to time, and it’s always a provocation to fresh thought.
James' specific ways of assessing the aspects of value that he identifies as salient have a number of shortcomings, however. These can be, and have been, identified and improved upon. It is not surprising that the first approach to implementing a multifaceted system for the assessment of player value would have shortcomings that further work would identify and revise, especially when that system was being developed at the same time that the inventor was also working on a new system for generating the data that the assessment system would work with. The Hall of Merit project relied on the NBJHBA, especially in its first five years, but even at the time the book was released many of its participants were motivated to improve upon its work, especially its treatment of nineteenth-century players. I’d say we emulate Bill James better by striving continually to bring fresh and rigorous analysis to bear on baseball than by adhering as closely as possible to the systems as he developed them.
So, the virtue of James' multifaceted system is its integration of three different views of player value (career value, peak value, and rate-of-production value) and its placing of them all in historical context. The shortcomings lie in the particular way of quantifying each of these values and integrating them into a comprehensive evaluation. Here's a quick run-through of the shortcomings as I see them.
(1) Historical context. These shortcomings are the most obvious, I think. Brock has already mentioned that the linear and chronological timelining James includes in the system doesn't accurately reflect the way that quality of competition has evolved: something more flexible and more clearly tied to evidence of competition levels is needed. It's an obvious shortcoming, but probably the hardest overall to do better with. There is still no widely agreed-upon way of quantifying changes in competition quality and incorporating that into rankings, but there are a lot of attempts out there that are better than what's in the NBJHBA. Also on historical context, the failure of the system to include an adjustment for differing season lengths is an obvious hole. There are simpler and more sophisticated approaches to adjusting for season length. Most HoM voters have incorporated something along these lines.
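To make the season-length point concrete, here is a minimal sketch of the simplest kind of adjustment: straight proration of a season's value total to a common schedule length. The function name, the 162-game target, and the sample numbers are all assumptions for illustration, not anyone's actual method, and more sophisticated approaches would not scale linearly like this.

```python
def scale_to_full_season(value, games_scheduled, full_season=162):
    """Prorate a season's value total to a common schedule length.

    Straight-line proration only: an assumed, simplest-case adjustment.
    """
    if games_scheduled <= 0:
        raise ValueError("games_scheduled must be positive")
    return value * full_season / games_scheduled


# Hypothetical figures for illustration only (not real player data):
# 20 units of value earned in a 100-game schedule.
adjusted = scale_to_full_season(20.0, games_scheduled=100)
print(round(adjusted, 1))
```

A season already played on the target schedule comes back unchanged, so the adjustment only moves the short-season totals.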
(2) Valuing rate of production. James' way of incorporating rate of production by using a career rate measure is very problematic. For a rate stat to allow for accurate, meaningful comparison between players, the period over which the rate is calculated needs to be consistent. Career rate doesn’t do this, and so it incorporates inaccurate, potentially misleading information into the system's results. A rate stat with a consistent period of games or seasons is needed. I think a lot of systems being used don't include a rate measure at all. I think that including a rate measure is important, and it's one of the things that I particularly like about my own system.
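A rough sketch of why the period matters, using made-up numbers: two hypothetical careers with identical first five seasons, one of which continues through five weaker seasons. The five-season window, the per-162 scaling, and the function names are assumptions chosen for illustration; this is not the rate measure from my system or anyone else's.

```python
def career_rate(seasons, per_games=162):
    """Value per `per_games` over the whole career, whatever its length."""
    total_value = sum(value for value, _ in seasons)
    total_games = sum(games for _, games in seasons)
    return total_value * per_games / total_games


def best_window_rate(seasons, window=5, per_games=162):
    """Highest value-per-`per_games` rate over any `window` consecutive seasons.

    `seasons` is a chronological list of (value, games_played) tuples.
    Returns None when the career is shorter than the window.
    """
    if len(seasons) < window:
        return None
    best = None
    for start in range(len(seasons) - window + 1):
        chunk = seasons[start:start + window]
        total_games = sum(games for _, games in chunk)
        if total_games == 0:
            continue  # skip windows with no playing time
        rate = sum(value for value, _ in chunk) * per_games / total_games
        if best is None or rate > best:
            best = rate
    return best


# Hypothetical careers (not real player data).
short_career = [(30.0, 150)] * 5
long_career = [(30.0, 150)] * 5 + [(10.0, 150)] * 5

for name, career in (("short", short_career), ("long", long_career)):
    print(name, round(career_rate(career), 1), round(best_window_rate(career), 1))
```

The career rate marks the longer career down for its decline years, while the fixed-window rate scores the two players the same over the same span of seasons.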
(3) Balancing career and peak value. The balance the NBJHBA system strikes between peak and career value is weighted too heavily toward peak. The rescaling of career value using the harmonic mean reduces the weight of career value in the system excessively in relation to the system’s two peak components. The Win Shares metric invites this systemic mistake by making the opposite error: it overvalues career by setting the baseline too low. Therefore, in a ranking system using Win Shares, some rescaling of career value is needed. Otherwise, career value would outweigh all other sorts of value. Even so, the harmonic mean method is excessively reductive of career value. This element needs improvement. Switching to a comprehensive metric that sets its baseline higher (at a point that gives more weight to value above average relative to value between the baseline and average) avoids this problem, and pretty much every other comprehensive metric uses a higher baseline than Win Shares. Any system that allows career value to be used without drastic adjustment will have results that relate more transparently to the foundational measures on which it is based.
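To show how reductive a harmonic-mean rescaling can be, here is a small sketch. It assumes the rescaling takes the harmonic mean of a career total and a fixed constant; the constant of 25 and the sample career totals are assumptions for illustration only. Whatever the constant, the harmonic mean of two positive numbers can never reach twice the smaller one, so the rescaled figure is capped no matter how large the career total gets.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two positive numbers: 2ab / (a + b)."""
    if a <= 0 or b <= 0:
        raise ValueError("both inputs must be positive")
    return 2 * a * b / (a + b)


ANCHOR = 25  # assumed constant, for illustration only

# Hypothetical career totals (not real player data).
for career_total in (200, 300, 400):
    rescaled = harmonic_mean(career_total, ANCHOR)
    print(career_total, round(rescaled, 1))
```

Under these assumptions, doubling the career total from 200 to 400 moves the rescaled figure by less than three points, and nothing can push it past 50, which is the sense in which the method squeezes career differences down to very little.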
So those are three elements of James' ranking methodology that I see as needing serious reconsideration. I think that kind of reconsideration has been undertaken in a variety of different ways by HoM voters and others who have constructed player ranking systems. They may or may not succeed. We're mainly still working, though, with the elements of value that James identified, and that makes his work a valuable jumping-off point for the whole topic.