Baseball for the Thinking Fan

Hall of Merit
— A Look at Baseball's All-Time Best

Monday, October 11, 2004

Battle of the Uber-Stat Systems (Win Shares vs. WARP)!

Don’t ever say that I never gave you anything! :-)

John (You Can Call Me Grandma) Murphy Posted: October 11, 2004 at 02:46 PM | 381 comment(s)

Reader Comments and Retorts

   301. jimd Posted: February 12, 2007 at 08:44 PM (#2296237)
ba-ba-bump
   302. KJOK Posted: March 25, 2007 at 06:13 AM (#2317375)
From Clay Davenport's latest chat on Friday (March 23): (Bolded emphasis mine)

Clay Davenport: "... Keep in mind that the replacement level in the WARPs is very low indeed, what an AA player might do. It is geared towards what the worst teams in history actually accomplished."
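To get a feel for how much a very low replacement level changes the totals, here is a minimal team-level sketch. The .150 figure is the one Tango attributes to Clay's WARP further down this thread, and .250/.300 are the levels Tango says he prefers. Treating replacement as a flat team winning percentage is a simplification for illustration, not how WARP is actually computed.

```python
def wins_above_replacement(win_pct: float, replacement_pct: float, games: int = 162) -> float:
    """Wins a team playing at win_pct earns over a team of replacement players.

    Simplification (an assumption): replacement is a flat team winning percentage.
    """
    return (win_pct - replacement_pct) * games


# An average (.500) team under three candidate replacement levels.
for repl in (0.150, 0.250, 0.300):
    print(repl, round(wins_above_replacement(0.500, repl), 1))
```

Dropping replacement from .300 to .150 adds about 24 wins to an average team's total, which is why the choice matters so much for career WARP.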
   303. Paul Wendt Posted: March 26, 2007 at 03:55 PM (#2317983)
Just FYI, Brandon (aka Patriot) has posted five-year park factors for each team since 1901.

They named "Pythagenpat" for him.

Joe Dimino is credited with an assist concerning these Pythagen definitions, with Visual Basic code(?).
http://www.super70s.com/Baseball/Background/Glossary/S/Sabermetrics/Pythagorean.asp
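The post names Pythagenpat without giving the formula. As it is commonly described, Pythagenpat sets the Pythagorean exponent from the run environment: exponent = (runs per game by both teams combined) ** z, with z usually quoted somewhere around 0.28 to 0.29. A minimal sketch, assuming z = 0.287 (an assumption; published values vary). One tidy property: at 1 run per game the exponent is exactly 1.

```python
def pythagenpat_exponent(runs_scored: float, runs_allowed: float, games: float, z: float = 0.287) -> float:
    """Pythagenpat exponent: (combined runs per game) ** z.

    z = 0.287 is an assumed value; figures around 0.28-0.29 are commonly quoted.
    """
    runs_per_game = (runs_scored + runs_allowed) / games
    return runs_per_game ** z


def pythag_win_pct(runs_scored: float, runs_allowed: float, exponent: float) -> float:
    """Pythagorean winning percentage for a given exponent."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical team in a 4.5 R/G league: 729 scored, 729 allowed over 162 games.
x = pythagenpat_exponent(729, 729, 162)
print(round(x, 2))
print(pythag_win_pct(729, 729, x))  # 0.5 for a team that scores what it allows
```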
   304. Paul Wendt Posted: April 01, 2007 at 04:03 AM (#2321522)
In the Bancroft thread last Fall, someone mentioned dividing Win Shares by three . . .
That makes sense because, first, many rating systems are denominated in wins and, second, Bill James assigns three Win Shares to every team for every win.

Has anyone compared WinShares/3 and WARP1 for large numbers of player-seasons or -careers?
The idea is so simple, the answer must be yes!
Such comparison would provide one, relativistic point of entry to understanding, criticizing, etc. Which players does WinShares rate highest, relative to WARP? George Van Haltren is rated three wins greater by Win Shares, about 114 to 111. Whose approach rates pre-expansion CFs higher and, if possible, why? And so on.
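The comparison Paul describes is simple to set up. A minimal sketch: convert Win Shares to wins by dividing by three (James's three WS per team win), subtract WARP1, and sort. The only real data point here is Van Haltren from the post above, and the 342 WS is back-computed from the rounded "about 114"; any other rows you feed it would be your own data.

```python
from typing import Iterable, List, NamedTuple


class Career(NamedTuple):
    name: str
    win_shares: float  # career Win Shares
    warp1: float       # career WARP1


def ws_wins_minus_warp1(c: Career) -> float:
    """Wins credited by Win Shares (WS/3, three WS per team win) minus WARP1."""
    return c.win_shares / 3 - c.warp1


def rank_by_gap(careers: Iterable[Career]) -> List[Career]:
    """Players Win Shares rates highest relative to WARP1 come first."""
    return sorted(careers, key=ws_wins_minus_warp1, reverse=True)


# Van Haltren from the rounded figures above: WS/3 ~ 114 (so ~342 WS), WARP1 ~ 111.
gvh = Career("George Van Haltren", 342, 111)
print(round(ws_wins_minus_warp1(gvh), 1))  # 3.0
```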

Similarly, systematic comparison with WinShares/3 or with WARP1 may be part of an effective explanation for a new rating denominated in wins, such as Warp-Rosenheck, or Win & Loss Shares in Bill James' future. Here are some examples (or here is statistical analysis) of some of the biggest differences in the numbers of wins attributed: [list with commentary].

As far as I know, no such comparison may be equally illuminating for Pennants Added. Because it is denominated in pennants, the equally "natural" comparison is merely ordinal. Who ranks the greatest number of places higher? And so on.
   305. TomH Posted: April 01, 2007 at 07:34 PM (#2321743)
WS has a lower replacement level than WARP, though, so a comparative formula might be something like WS/3 - PA/750 = WARP (or WS/3 - IP/170 for pitchers), although my '750' is merely a guess.
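TomH's rough conversion, written out so the guessed constants are easy to change. This is only a sketch of his back-of-the-envelope formula: the 750 PA and 170 IP per win are his stated guesses, and the either/or split between hitters and pitchers is my reading of his "(or IP/170)".

```python
from typing import Optional

PA_PER_WIN_CHARGE = 750.0  # TomH: "my '750' is merely a guess"
IP_PER_WIN_CHARGE = 170.0  # TomH's alternative, read here as the pitcher version


def estimate_warp1(win_shares: float, pa: Optional[float] = None, ip: Optional[float] = None) -> float:
    """Rough WARP1 from Win Shares: WS/3 - PA/750 (hitters) or WS/3 - IP/170 (pitchers).

    Exactly one of pa or ip must be supplied.
    """
    if (pa is None) == (ip is None):
        raise ValueError("supply exactly one of pa (hitters) or ip (pitchers)")
    if pa is not None:
        playing_time_charge = pa / PA_PER_WIN_CHARGE
    else:
        playing_time_charge = ip / IP_PER_WIN_CHARGE
    return win_shares / 3 - playing_time_charge


print(estimate_warp1(30, pa=750))  # hypothetical 30-WS hitter with 750 PA -> 9.0
```

Chris Cobb's reply below disputes the premise for post-WW2 players, so treat the constants as something to test against data rather than a settled offset.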
   306. Chris Cobb Posted: April 01, 2007 at 09:13 PM (#2321794)
WS has a lower replacement level than WARP

Well, the batting replacement level for WS is lower than for WARP, but the fielding replacement level for Win Shares is higher than for WARP, and the two basically balance out. I suspect the pitching replacement level for Win Shares is higher than for WARP for the post-World War 2 game, although I haven't tracked this data, so I don't know for sure.

I have found that for most HoM-calibre post-WW2 position players, WS/3 is similar to WARP1. There is not a consistent difference in magnitude of the sort one would expect if replacement level in one were consistently lower than replacement level in the other. Prior to WW2, WARP1's fielding replacement level is so much lower that WARP1 totals begin consistently to outstrip Win Shares/3 totals, and this difference continues to increase as one moves back in time.

I don't have parallel data gathered for a representative cross-section of players, but I have gathered it for all position players who would be considered "serious candidates" for the HoM.
   307. TomH Posted: April 02, 2007 at 12:12 PM (#2322408)
Yes, that is pretty much what it looks like once I add up the numbers for my top backloggers, Chris, although I still see WS having a slightly lower replacement level, which is what I would expect given that WS add up to ALL team wins, while WARP should add up to roughly team wins above 20 per year or so.

Of my top 9 who played mostly after 1940 (non-pitchers), their WS/3 averages about 4 more than their WARP1.
Of the top 9 pre-1940 guys, WARP1 averages 2 career points higher than WS/3.
   308. KJOK Posted: April 04, 2007 at 05:24 AM (#2324250)
A great article on the whole "peak" argument, very relevant to our HOM discussions:

What is Peak?

over at Patriot's blog.
   309. Bleed the Freak Posted: July 04, 2007 at 03:34 PM (#2428724)
David Foss posted this link in February in another discussion thread, it belongs here:

ftp://ftp.baseballgraphs.com/winshares/

This link contains historical win shares from 1876-2006. Thanks again David for providing this link.

Does anyone have a link where I can download WARP data from 1876-2006?

Keep up the good work Hall of Merit voters and posters. I've learned so much from reading the threads over the past five years.
   310. ronw Posted: September 07, 2007 at 08:30 PM (#2515604)
Those much smarter than I have probably already realized this, but I am not sure Win Shares is a good method for the HOM. I was thinking about Hugh Duffy's dominance of Win Shares with a relatively low OPS+. Other than 1894 and 1891, he never had an OPS+ greater than 130.

I thought that was solely because of the way Win Shares allocates fielding WS disproportionately towards outfielders. However, I was surprised to find that Duffy ranked among the league's top outfielders in batting WS, even though he was nowhere near the leaders in OPS+.

For example, in 1892, Duffy led all outfielders with 23.5 batting WS, although he had a 125 OPS+. In 1893, he was second among outfielders at 23.1 bWS to Ed Delahanty (23.3), despite another 125 OPS+. Of course, he led the league in 1894 (28.0 bWS, 177 OPS+).

I was seeing him as historically dominant from 1891-1894, primarily because of Win Shares, especially batting Win Shares. Just recently, I wondered: how could someone with a relatively low OPS+ consistently be among the league leaders in batting WS?

First I thought that Duffy had more plate appearances than anyone else. It is true that he had over 600 PA in each of these years, and so had more opportunity to accumulate batting WS.

But other players had similarly high PA without anywhere near as many batting WS.

For example, in 1892, when Duffy led all outfielders in batting WS, he had 673 PA. Sam Thompson had a similar number of PA, with 679. Let's compare their statistics.

Player       BA   OBP   SLG   AB    R    H  2B  3B  HR  RBI  BB  SO
Duffy      .301  .364  .410  612  125  184  28  12   5   81  60  37
Thompson   .305  .377  .432  609  109  186  28  11   9  104  59  19


Very similar raw stats. How were their adjusted numbers? (BRAA adjusted for season from Prospectus, BatRuns and OPS+ from bbref)

Player    OPS+  BRAA  BatRns
Duffy      125    25    17.1
Thompson   144    40    32.9


OK, so compared to Thompson, Duffy played in a bandbox. Indeed the Philadelphia Baseball Grounds had an 1892 batting park factor of 100, while Boston's South End Grounds was 109.

But what were their raw batting WS? (Remember, similar raw stats, similar PA, but Duffy played in an easier park.)

Player    bWS (raw)
Duffy          23.5
Thompson       18.1


How can this be? Then my relatively simple brain remembered that WS can be affected by team wins, especially team wins above the Pythagorean projection.

  
Team           Record  Pythag   RS   RA
1892 Boston    102-48   94-56  862  649
1892 Phil.      87-66   92-61  860  690
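The Pythag records in the table can be reproduced from RS and RA. A minimal sketch, assuming an exponent of 1.83 (an assumption on my part; the post doesn't say which formula was used, but 1.83 gives both records shown). Games played come from the actual records.

```python
def pythag_record(runs_scored: float, runs_allowed: float, games: int, exponent: float = 1.83):
    """Pythagorean W-L record, rounded to whole wins (exponent 1.83 is assumed)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    wins = round(games * rs / (rs + ra))
    return wins, games - wins


print(pythag_record(862, 649, 102 + 48))  # (94, 56) Boston
print(pythag_record(860, 690, 87 + 66))   # (92, 61) Philadelphia
```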


Because Win Shares awards three WS per actual team win, Boston had 306 WS to dole out (3 x 102), while Philadelphia had 261 (3 x 87).

Here are the players who weren't traded from Boston during 1892. Only Harry Stovey (3 WS total for Boston) had significant hitting time before he was traded.

Player     bWS  OPS+   AB
Duffy     23.5   125  612
Long      19.5   107  646
McCarthy  16.4    92  603
Tucker    15.9   106  542
Nash      14.2   101  526
Lowe       9.4    84  475
Stivetts   8.6   126  240
Ganzel     4.5    97  198
Quinn      2.9    54  532
Bennett    2.5    81  114
Kelly      2.5    53  281
Nichols    0.7    57  197
Total    120.8


Here are the players who weren't traded from Philadelphia during 1892. No traded player had significant hitting time.

Player     bWS  OPS+   AB
Connor    22.0   167  564
Hamilton  20.8   152  554
Thompson  18.1   144  609
Delahanty 15.6   158  477
Hallman   11.5   117  586
Cross      9.5   108  541
Clements   9.0   128  402
Allen      6.5    89  563
Reilly     0.0    47  331
Total    113.0


So according to this, Thompson seems to be penalized because he has three great-hitting teammates, and the Phillies did not have a great record. Duffy, although clearly a worse hitter than any of the Phillies' top four, has more batting Win Shares than all of them because (a) he had worse-hitting teammates, and (b) his team performed well.

I don't think I am going to use Win Shares anymore for this project.
   311. ronw Posted: September 07, 2007 at 08:32 PM (#2515607)
   312. ronw Posted: September 07, 2007 at 08:46 PM (#2515627)
Third try for formatting?

   313. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 07, 2007 at 09:11 PM (#2515656)
Win Shares' crediting of teams' over/underperformance of their component stats to their players is well known by most voters and has been debated virtually ad infinitum. I'm glad to see you don't find it compelling, as I don't either. Welcome to the forces of light. :)
   314. JPWF13 Posted: September 07, 2007 at 09:32 PM (#2515673)
So according to this, Thompson seems to be penalized because he has three great-hitting teammates, and the Phillies did not have a great record. Duffy, although clearly a worse hitter than any of the Phillies top 4, has more batting win shares than all of them because: (a) he had worse-hitting teammates; and (b) his team performed well.


Thompson's team had 87 wins (Pythag was 92, but WS uses actual wins), so 261 WS are divided among the team.
Duffy's team had 102 wins (Pythag was 94), so 306 WS were divided among the team.

If you use Pythag wins rather than real wins, Duffy gets 21.5 batting WS and Thompson 19.1.
Thompson WAS a lesser % of his team's offense than Duffy was of his. That could be an allocation problem: too many of Duffy's team's WS are going to offense instead of pitching and/or defense.

ALSO, James found that run estimators tend to get a bit wonky before 1920 and especially before 1900. Duffy has a great SB and SH advantage over Thompson. James found that while SB and SH do not correlate [positively] with scoring in our era, they did pre-1920 and especially pre-1900. So James' pre-1900 run estimator (a reworked version of Runs Created) may see Duffy as Thompson's offensive equal despite Thompson's 144 to 125 OPS+ advantage.
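A crude way to see the size of the actual-vs-Pythag effect is to keep each player's share of his team's pool and resize the pool from Pythag wins. This proportional rescale is a simplification I'm assuming for illustration; James's method splits the pool through marginal runs, so it should not be expected to match JPWF13's 21.5 and 19.1 exactly.

```python
WIN_SHARES_PER_WIN = 3  # James awards three Win Shares per team win


def team_win_shares(wins: int) -> int:
    """Size of the team's Win Shares pool."""
    return WIN_SHARES_PER_WIN * wins


def rescale_to_pythag(player_ws: float, actual_wins: int, pythag_wins: int) -> float:
    """Proportional rescale (a simplification): same share of the pool, pool sized from Pythag wins."""
    return player_ws * team_win_shares(pythag_wins) / team_win_shares(actual_wins)


print(team_win_shares(102), team_win_shares(87))  # 306 261
print(round(rescale_to_pythag(23.5, 102, 94), 1))  # 21.7 (Duffy)
print(round(rescale_to_pythag(18.1, 87, 92), 1))   # 19.1 (Thompson)
```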
   315. Paul Wendt Posted: September 07, 2007 at 11:10 PM (#2515740)
Thanks for the calculation, J
So only about half of Duffy's 1892 bWS margin over Thompson is generated by Bill James' full allocation of wins.

by the way,

Wins above Pythagorean projection, Boston NL 1891-99

1891  1892  1893  1894  1895  1896  1897  1898  1899
   2     8     8     5     0     1     2     4     0

Hugh Duffy was a regular outfielder in 1892-99 (8 of 9 seasons, four in CF, then four in RF).
Excellent team, .625 or better: 7 of 9 seasons.
154-game schedule: 3 seasons.

1892 was the split season with a playoff (not included in season statistics) between the first and second-half leaders, Boston and Cleveland.
   316. TomH Posted: September 08, 2007 at 01:34 AM (#2516056)
What are the OWPs for Duffy and Thompson in 1892? Betcha it's a lot different than OPS+.
   317. OCF Posted: September 08, 2007 at 01:52 AM (#2516124)
For what it's worth, the system I've been using all along, which comes from RC and outs as given in a Stats, Inc. Handbook, does basically use James's adjustment for pre-1900 run estimators (SB, etc.) but it's not WS and it doesn't care what the team's W/L record was.

In that system, I have Duffy 1892 as 34, and I have Thompson 1892 as 34. So there you are - very similar. If I switch to RC above 75% of average, I get Duffy 54, Thompson 53. OK, their playing time was pretty similar. On the same scale (back to the first version, RCAA), I have Duffy's "wow" year of 1894 as a 69. OK, that's a very good year - but it wasn't Frank Chance 1906 (scored as 78) or even George Stone 1906 (scored as 92).

These are all offense only. Of course, Duffy was a better defender than Thompson. Not that I was any fan of electing Thompson at the time. The differences between OPS+ and this version of RC do erase Thompson's advantage, but only to equalize them - not to put Duffy ahead.
   318. Chris Cobb Posted: September 08, 2007 at 02:34 AM (#2516245)
What are the OWPs for Duffy and Thompson in 1892? Betcha it's a lot different than OPS+.

Bbref's new OWP numbers for the two are

Duffy .541
Thompson .570

Not so different from what OPS+ shows.

The formula BBref is using for OWP isn't listed, however, so it's not clear where these numbers come from.

FWIW, EQA sees a big difference between the pair.

Duffy .291 EQA, 25 BRAA
Thompson .309 EQA, 40 BRAA

That gap seems a bit large to me in favor of Thompson.
   319. KJOK Posted: September 21, 2007 at 09:43 PM (#2536096)
Just wanted to get Tango's latest WARP/BP comments into the archive:


I’ve also railed on BP for replacement level and using Runs Created. Those however are more philosophical disagreements. I like to consider replacement level at around .300, and could live with it being as low as .250. Clay (and by extension BP) uses .150. Clay is in the clear minority on this one and has a tougher job to explain himself. But, he could possibly muster enough evidence to support himself. However, that has never happened. I’d also be willing to debate that with them.

Strangely, rather than using EqR as their basis for VORP (and MLVr), they use Bill James old RC equation (one that even James himself doesn’t use). It’s one of those things that is so buried under the machinations of the process, that no one bothers to look, and deride BP for using. This one, while blatantly a very poor choice, is not “wrong”, because anything short of an all-encompassing sim would be “wrong”. However, it’s an extremely poor choice, one that BP should not be making. BaseRuns is the obvious choice here.

***
One thing that BP has straightened out is they have gotten rid of Pythagenport in favor of Pythagenpat. What would be nice however is if they call it Pythagenpat or whatever name David/Patriot want, rather than continuing to use Pythagenport as the name. And another is that Woolner did use the Tango Distribution over his, even though that was also a philosophical choice.
In both these instances, they went with the cleaner method that works a bit better in the normal range, and much better at the extremes. {clap clap clap}

***
While I’m here… will OPS+ go away please? No one will bother to calculate OPS+ on their own. So, why not calculate some form of Runs Created as a “+” metric.
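Since Tango calls BaseRuns the obvious choice, here is a sketch of the basic version as I understand it is commonly cited: runs = A * B / (B + C) + D, with A = H + BB - HR, B = (1.4*TB - .6*H - 3*HR + .1*BB) * 1.02, C = AB - H, D = HR. Fuller versions add HBP, SB, CS, GIDP and so on, and coefficients vary by version. The usage lines plug in the 1892 Duffy and Thompson lines from ronw's table above, with no steals, HBP, or park adjustment.

```python
def total_bases(h: int, doubles: int, triples: int, hr: int) -> int:
    """Singles count once, so each extra-base hit adds its extra bases to H."""
    return h + doubles + 2 * triples + 3 * hr


def base_runs(ab: int, h: int, doubles: int, triples: int, hr: int, bb: int) -> float:
    """Basic BaseRuns (one simple published form; no SB, HBP, or park adjustment)."""
    tb = total_bases(h, doubles, triples, hr)
    a = h + bb - hr                                      # baserunners other than HR
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02  # advancement
    c = ab - h                                           # outs
    d = hr                                               # runs that score automatically
    return a * b / (b + c) + d


duffy = base_runs(ab=612, h=184, doubles=28, triples=12, hr=5, bb=60)
thompson = base_runs(ab=609, h=186, doubles=28, triples=11, hr=9, bb=59)
print(round(duffy, 1), round(thompson, 1))
```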
   320. JoeD has the Imperial March Stuck in His Head Posted: September 25, 2007 at 01:42 PM (#2541337)
Here's an interesting thing:

There is a serious multiplication bug in Excel 2007, which has been reported. The first example that came to light is =850*77.1, which gives a result of 100,000 instead of the correct 65,535. It seems that any formula that should evaluate to 65,535 will act strangely.

Just thought you stat geeks would want to know.
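For the curious, the underlying binary arithmetic is easy to check: 77.1 has no exact binary floating-point representation, so 850 * 77.1 lands a hair below 65,535 rather than exactly on it. The sketch below only shows that IEEE-754 behavior; it says nothing about Excel's internals.

```python
product = 850 * 77.1
print(product == 65535)             # False
print(abs(product - 65535) < 1e-9)  # True: the gap is tiny
```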
   321. KJOK Posted: September 26, 2007 at 07:09 PM (#2543712)
Baseball Prospectus put together a spreadsheet of the "Top 5 Best Players in Baseball" by year based on WARP3 in a rolling, six-year period. The weights are assigned as follows:


Year N-3 7%
Year N-2 13%
Year N-1 22%
Year N 31%
Year N+1 18%
Year N+2 9%

I posted a copy of the spreadsheet in the HOM Yahoo egroup FILES section.
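The weighting above is simple to reproduce. A minimal sketch, using the listed weights (they sum to 100%). Treating a missing season as 0 WARP3 is my assumption; the spreadsheet may handle career edges differently.

```python
from typing import Dict

# Weights from the list above, keyed by season offset from year N.
WEIGHTS = {-3: 0.07, -2: 0.13, -1: 0.22, 0: 0.31, 1: 0.18, 2: 0.09}


def rolling_warp3(warp3_by_year: Dict[int, float], year: int) -> float:
    """Weighted six-season WARP3 score for `year`.

    Seasons missing from warp3_by_year count as 0.0 (an assumption).
    """
    return sum(w * warp3_by_year.get(year + offset, 0.0) for offset, w in WEIGHTS.items())


# Hypothetical player worth a steady 10 WARP3 per year, 1920-1930.
steady = {y: 10.0 for y in range(1920, 1931)}
print(round(rolling_warp3(steady, 1925), 6))  # 10.0
```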
   322. DL from MN Posted: September 26, 2007 at 08:56 PM (#2543867)
Dazzy Vance was a surprise...
   323. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 26, 2007 at 09:24 PM (#2543912)
That may have something to do with how WARP weights strikeouts. More credit is given for two pitchers with the same ERA+ to the one with higher K (although the methodology is, of course, a mystery), so I would imagine that Vance's insanely high league-relative K rate would serve him well--perhaps too well--in their formulas.
   324. jimd Posted: September 27, 2007 at 05:43 PM (#2545568)
More credit is given for two pitchers with the same ERA+ to the one with higher K

Getting each out has a certain value; the pitcher retains all of the value of his K's while sharing the value of the BIP-out with his fielders.

Dazzy pitched in front of the fielding-challenged for almost all of his prime. His ERA+ is therefore misleadingly low.
   325. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 27, 2007 at 05:50 PM (#2545577)
There are two separate issues here. The first is adjusting for defensive support, which everyone agrees should be done. If Vance's fielders were below average, he shouldn't be penalized for that, just as Palmer should be penalized for having above-average fielders. The second issue is intrinsically rewarding K's regardless of defensive support, which BP does. Given two *teammates* with the same ERA+ and the same fielders, the one with the higher K rate will still get more BP WARP, on the grounds that he got a greater share of his outs by himself. The latter I don't agree with.
   326. jimd Posted: September 27, 2007 at 08:26 PM (#2545878)
BP builds their statistical ratings from the bottom-up, the component stats, K's, Hits allowed, HR's allowed, etc. ERA+ is not one of those but a run-level stat and the fact that the two pitchers happened to produce the same ERA+ from those component stats is deemed irrelevant by BP.

Let's look at this from the batting perspective.

Suppose you had two hitters that played for the same team, each playing half the season in the 3 slot, the other half in the 4 slot. Suppose they each had the same number of RBI's, but one had significantly more Runs Created than the other. Does that fact invalidate Runs Created?

Which is more important? The lower-level components as predictors of what "should have" happened at the run-level? Or the run-level result stats as documentors of what did happen? Are you arguing for components for hitters and run-level for pitchers? And if so, why?

(This also has some relation to Thompson/Duffy, WARP vs WinShares, component-level vs win-level.)
   327. jimd Posted: September 27, 2007 at 08:31 PM (#2545904)
And if so, why?

This reads as much more "combative" than I intended.

There may well be very good reasons for different perspectives here.

I just haven't thought enough about these issues myself, either.
   328. Paul Wendt Posted: September 27, 2007 at 08:46 PM (#2545968)
325. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 27, 2007 at 01:50 PM (#2545577)
There are two separate issues here. The first is adjusting for defensive support, which everyone agrees should be done. If Vance's fielders were below average, he shouldn't be penalized for that, just as Palmer should be penalized for having above-average fielders. The second issue is intrinsically rewarding K's regardless of defensive support, which BP does. Given two *teammates* with the same ERA+ and the same fielders, the one with the higher K rate will still get more BP WARP, on the grounds that he got a greater share of his outs by himself. The latter I don't agree with.

The rationale may be that first base on error is missing data whose incidence is greater for pitchers with fewer strikeouts. Crediting pitchers for strikeouts is a proxy for debiting bases on errors.
   329. Dr. Chaleeko Posted: September 27, 2007 at 09:10 PM (#2546099)
Simple advancement on outs for runners already on base during balls in play also argues for Ks since it increases the run-expectancy above what it would be with the K.
   330. David Concepcion de la Desviacion Estandar (Dan R) Posted: September 27, 2007 at 09:49 PM (#2546197)
jimd, *Yes* I most *definitely* believe component stats should be used as the basis for calculating hitters' performance and runs the basis for calculating pitchers' performance, for the critical reason that pitchers control their own context and hitters do not. Some pitchers can and do change their approach from the stretch, distorting the component stats/runs relationship (Glavine is the obvious example on one hand, and Nolan Ryan appears to be one on the other) to a significant degree. Batters cannot do this, since they only come up one out of every nine times. This is the same reason why you can use straight BaseRuns for pitchers but not for hitters, and also why you have to apply the Pythagorean theorem differently to hitters than pitchers (for hitters, you apply it to team RS/RA totals, whereas for pitchers you should calculate the team's winning percentage in their innings and then in the remaining innings).

This can be a large effect: take a pitcher who throws 30 complete games and allows 240 runs in a 4.0 R/G league on an otherwise average team. If you apply Pythagoras to the team RS/RA, you'll get 648 RS and 768 RA, which comes out to an exponent of 1.85 and a projected record of 68.3-93.7. But in fact, there are two separate run-scoring contexts: one in the pitcher's starts, one in all other games. In games started by the pitcher, the team scores 120 runs and allows 240, for an exponent of 2.03 and a projected record of 5.9-24.1 in his starts; add a .500 record in all other games and you get an overall projected record of 71.9-90.1. So treating the pitcher like a hitter causes you to misstate his value by 3.6 wins!
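Dan's worked example can be reproduced in a few lines. He doesn't say which exponent formula he used; a Pythagenpat-style exponent of (combined runs per game) ** 0.285 is my inference, chosen because it gives his 1.85 and 2.03. The sketch compares one team-level calculation against the two-context version (his starts, then .500 ball in the other 132 games).

```python
def pythagenpat_exponent(rs: float, ra: float, games: float, z: float = 0.285) -> float:
    """Exponent = (combined runs per game) ** z; z = 0.285 is inferred, not stated."""
    return ((rs + ra) / games) ** z


def expected_wins(rs: float, ra: float, games: float) -> float:
    """Pythagorean expected wins using the exponent above."""
    x = pythagenpat_exponent(rs, ra, games)
    return games * rs ** x / (rs ** x + ra ** x)


LEAGUE_RPG = 4.0   # runs per team per game
SEASON = 162
STARTS = 30        # all complete games
PITCHER_RA = 240   # runs he allows in those starts

league_avg_runs = LEAGUE_RPG * SEASON                         # 648
team_ra = league_avg_runs - LEAGUE_RPG * STARTS + PITCHER_RA  # 768

# Treating him like a hitter: one Pythagorean calculation on team totals.
one_context = expected_wins(league_avg_runs, team_ra, SEASON)

# Two contexts: his starts, plus .500 ball in every other game.
two_context = expected_wins(LEAGUE_RPG * STARTS, PITCHER_RA, STARTS) + (SEASON - STARTS) / 2

print(round(one_context, 1), round(two_context, 1), round(two_context - one_context, 1))  # 68.3 71.9 3.6
```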

Paul Wendt and Dr. Chaleeko, there is an empirically measurable marginal value of a K relative to a non-GIDP, non-SF, non-H fielded out, and it is .008 runs. Never adds up to more than a handful of runs even in the most extreme cases (although I do factor it in in my run estimation nonetheless).
   331. TomH Posted: September 27, 2007 at 10:32 PM (#2546312)
empirically measurable marginal value of a K relative to a non-GIDP, non-SF, non-H fielded out, and it is .008 runs.
--

I assume analysis of pre-1911 pitching would put this at a different value.
   332. Paul Wendt Posted: September 28, 2007 at 01:02 AM (#2546963)
I don't know Davenport's rate of credit for strikeouts even approximately, and except by reference to others' estimates (such as DanR's ".008") I wouldn't be able to judge the magnitude if I knew it, although I would recognize .08 runs as too high.

TomH is right that the true strikeout premium must vary historically. I guess that where Davenport does use fixed all-time parameters he estimates them for only some recent portion of mlb history, so his estimate would be little or none influenced by 100- or 150-year-old conditions, but I am guessing.
   333. KJOK Posted: September 28, 2007 at 08:33 PM (#2548268)
I'm almost certain strikeouts consistently are around .04 runs more damaging than the average 'other' out.
   334. Paul Wendt Posted: September 29, 2007 at 03:00 AM (#2549353)
333. KJOK Posted: September 28, 2007 at 04:33 PM (#2548268)
I'm almost certain strikeouts consistently are around .04 runs more damaging than the average 'other' out.

I'm almost certain consistency at that precision is possible from year to year but not from model to model. There are too many variations in the methods of measuring that cost.

--
New England Symposium for Statistics in Sports
Months ago I mentioned this one-day conference on Statistics in Sports, tomorrow at Harvard University. See "Program" for the author-title list and the abstracts. The organizers hope that it will be annual.
Statistics in Sports (one-day conference Saturday)
   335. Paul Wendt Posted: September 29, 2007 at 03:19 AM (#2549438)
Where is the Dave Johnson thread?
heh - if i may anticipate jtm

There is a long display cabinet along one wall of the sciences library at Harvard University. The current exhibit is "From BA to BABIP - The History of Baseball Statistics". Regarding the abstruse work of Earnshaw Cook, Percentage Baseball (imagine a copy of the early 1960s book open to a telling page), the exhibit notes his splash of publicity thanks to a Sports Illustrated feature by young Frank Deford, and his generally poor reception in academia and scornful reception in baseball. But Cook did make one convert [?? maybe not even a good paraphrase]. Davey Johnson, a math major in college at the time, accepted many of Cook's ideas and later became manager of the Mets.
   336. Paul Wendt Posted: October 13, 2007 at 05:05 AM (#2574168)
"heh"

[copied from "2005 Results"]

141. TomH Posted: October 06, 2007 at 10:28 PM (#2564778)
well, I can't FIND the uber WS vs WARP thread, so let's continue here...

DanR: The only way you can calculate that a player who got 3 WS on the 1899 Cleveland Spiders contributed exactly one win to the team is if you use a replacement level of precisely 52% of league average offense. Let's take a hypothetical guy who was 3 WS = 1 win in 1899. League average was .214 runs per batting out, so James's 52% baseline is .111 runs per out. Say the guy made 300 outs. If he was 1 win = 10.77 runs in 1899 above that baseline, then he created .111*300 + 10.77 = 44.2 runs. OK, so if you use James's 52% figure that's 3 batting WS. But if you used a baseline of 25% of league average offense, then he'd magically get 44.2 - (300*.214*.25 = 16.05) = 28.15 runs / 10.77 runs/win * 3 WS/win = 7.8 batting WS.

No, that is not right. You can't merely move the baseline and then neglect to reaccount for the ratio of runs to wins, or there would be far more win shares than wins. WS starts with team wins, figures how many runs it takes to make those wins in the team as a whole, and then distributes them (offensively) by RC/out. The above calculations are not correct; changing the baseline would not give the example batter 4.8 more BWS. Same with the Rosen example.

DanR: See, there's nothing inherently "true" or "right" about the Win Shares allocation system.

Oh, I completely agree. But it makes much sense for the application for which it was designed, because using absolute zero has its own problems, and using 80% of average (or precisely average, the only other measure that has inherent good properties to it) leaves wins that then need to be re-allocated by playing time, since there would be many fewer WS earned than wins the team achieved. Using 'average', you could create a system that then adds so many "wins" for each player by playing time (1 win per 150 PA or 40 IP or some such thing). But James created a system that did not require that. 52% on offense was a low enough replacement level to make it work. I am not really trying to defend 52% as "right"; is it arbitrary? Sure. Would other numbers work? Sure, IF you wished to go back and subtract or add in fudge factors to make the individual totals match their team total. WS is a top-down approach, and maybe I should leave it at that; it is DIFFERENT from the bottom-up approach that almost every other system uses. In some applications, it will be a better tool. For others, it is worse. If I had invented it, I might have tried to come up with a player's Win Shares AND Loss Shares.
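TomH's point (the team's win total is fixed and the baseline only changes how it is split) and DanR's reply (a different baseline reallocates shares among batters) can both be seen in a deliberately simplified top-down sketch. It is not James's full procedure: it assumes a fixed pool of team batting win shares, splits that pool in proportion to each player's runs created above a baseline (baseline fraction x league runs per out x outs), and floors negative values at zero. Players "A" and "B" and the 60-win-share pool are hypothetical; the .214 runs per out is DanR's 1899 figure.

```python
def allocate_batting_ws(team_batting_ws, players, runs_per_out, baseline_frac):
    """Split a fixed pool of batting win shares in proportion to marginal runs.

    players: dict of name -> (runs_created, outs)  (hypothetical inputs)
    baseline_frac: replacement baseline as a fraction of league runs per out
    """
    per_out = baseline_frac * runs_per_out
    # Marginal runs above the baseline; negative values are floored at zero.
    marginal = {
        name: max(rc - per_out * outs, 0.0)
        for name, (rc, outs) in players.items()
    }
    total = sum(marginal.values())  # assumed > 0 in this sketch
    # Top-down: the pool is fixed, only the proportions depend on the baseline.
    return {name: team_batting_ws * m / total for name, m in marginal.items()}


# Hypothetical players: (runs created, outs).
players = {"A": (100.0, 400), "B": (40.0, 300)}
for b in (0.52, 0.25):
    ws = allocate_batting_ws(60.0, players, 0.214, b)
    print(b, {k: round(v, 1) for k, v in ws.items()}, round(sum(ws.values()), 6))
```

Under these assumptions the pool always sums to 60, but lowering the baseline from 52% to 25% moves win shares toward the lower-rate hitter B (roughly 6.4 to 14.0).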

DanR: And regarding the 1975-76 Reds, who would they have been forced to play if one of their stars had gone down? A replacement player, of course--one whose production would probably be approximately 80% of positional average.

My point about the extreme teams was that great teams tend to have higher freely available talent on the bench (Dan Driessen!), whereas the Spiders do not (duh). Is this important when considering long-term replacement level for the HoM? Maybe not. Again, the question WS was designed to answer was "how do I distribute the 108 wins in Cincinnati's 1975 team among their players?". Babe Ruth, on the '99 Spiders, could have been just as great a player but would likely have won fewer additional games for that team. This is pretty obvious, no? WS captures this. Whether you think this is important or relevant can be argued. But it does capture it.

DanR #145
. . .
Since there may not be more than 15-20 human beings playing baseball on this earth capable of doing that at any given time, FAT shortstops are below-average fielders as well as hitters. Given that, it seems entirely reasonable that the mega-expansion wouldn't affect all positions equally, and that the gap between the #24 and the #20 shortstop would be bigger than the gap between the #24 and the #20 1B. Secondly, you had the move to turf fields, which has been discussed at great length. The combination of those factors is more than enough to convince me that you really did need a super glove at SS to compete in the '70s (and many of the winning teams like the Orioles, Reds, A's, and Yankees had them), and that Concepción and Campaneris "deserve" all the credit for the pennants they added. Your mileage is free to vary.

TomH:

Mea culpa on the calculations, but does that change the substance of my point? That 52% is an arbitrary number, and that using a different baseline level would lead to a different allocation of Win Shares among batters?

What are the problems of using absolute zero, besides the fact that it's just blatantly not representative of how baseball works? (Neither is 52%, of course).

Using wins above/below average plus credit by playing time would be an INFINITELY better system than the current Win Shares model, in my opinion. Incomparably better.

Moreover, 52% is not a low enough level to make it work, because there *are* still players who hit at worse than 52% of league average. Look at Bill Bergen, who had not a single Batting Win Share in his nearly 1,000 game-long career. The presence of Bergen causes the Batting Win Shares of all of his teammates to be inappropriately reduced.

146. TomH Posted: October 07, 2007 at 08:12 PM (#2566273)
If you used absolute zero, every batter would have so many runs above baseline, you would have to go back and take away wins per playing time to get the team wins to match win shares. Or make believe it takes oodles of runs to create a 'win'. E.g., if a team only wins 40 games and so gets 120 WS, and if 60 were batting, and they scored a mere 600 runs, that's 10 runs per win share, or 30 runs per win. Wouldn't work.

Your take on WHY the FAT talent level of shortstops dropped in 1970 is a fascinating theory. If the SS FAT level often drops with expansions, that would be a large finding (not only for this discussion, but for MLB GMs!!). That is actually a question that really piques my interest at the moment.


DanR later
TomH:
Is that why James picked 52%? Because it's the only number that gives you 10 runs a win in the modern game with no tweaking? That would be sort of cute. Still empirically wrong, but a tiny bit less arbitrary--a number selected for convenience rather than for accuracy.
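TomH's absolute-zero arithmetic and DanR's question can be put into one small relation: the runs above the baseline, divided by the team's batting win shares, times three win shares per win, give the runs each win must absorb. The sketch assumes that simplified relation (not James's full method); inverting it shows which baseline fraction would produce a chosen runs-per-win figure for a given team. The function names and the "modern team" inputs are hypothetical.

```python
def implied_runs_per_win(team_runs, team_outs, lg_runs_per_out, baseline_frac, batting_ws):
    # Runs above the baseline, spread over the team's batting win shares,
    # times 3 win shares per win.
    marginal_runs = team_runs - baseline_frac * lg_runs_per_out * team_outs
    return 3 * marginal_runs / batting_ws


def baseline_for_target(team_runs, team_outs, lg_runs_per_out, batting_ws, target_rpw):
    # Solve implied_runs_per_win(...) == target_rpw for baseline_frac.
    marginal_needed = target_rpw * batting_ws / 3
    return (team_runs - marginal_needed) / (lg_runs_per_out * team_outs)


# TomH's absolute-zero example: 600 runs, 60 batting win shares.
print(implied_runs_per_win(600, 4000, 0.15, 0.0, 60))  # 30.0

# Hypothetical average modern team (all inputs illustrative only).
runs, outs = 750, 4300
b = baseline_for_target(runs, outs, runs / outs, 115, 10.0)
print(round(b, 3))
```

The first call reproduces TomH's 30 runs per win. The inverted baseline depends entirely on the assumed batting win shares, so by itself it says nothing about whether James actually chose 52% this way.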
   337. Paul Wendt Posted: December 04, 2007 at 03:26 AM (#2633184)
"A Quick and Dirty Fix for Linear Weights" --Phil Birnbaum, editor, By the Numbers

Charlie Pavitt, In Response to Win Shares: A Partial Defense of Linear Weights

Charlie Pavitt maintains a bibliography of mainly academic, mainly statistical research on baseball. It recently moved from the University of Delaware website; where is it now? He reviews published academic work in a front-page column of By the Numbers, the newsletter of the Statistical Analysis Committee, SABR.
   338. Paul Wendt Posted: January 02, 2008 at 05:32 PM (#2658356)
[from the 2008 mock HOF thread. emphasis mine]

150. DL from MN Posted: January 02, 2008 at 09:59 AM (#2658223)
. . .
I would caution voters that using raw WS totals overrates outfielders over infielders over pitchers in the modern era due to replacement value issues.


151. kwarren Posted: January 02, 2008 at 10:57 AM (#2658270)
Pitchers are worth much less to their teams on an individual basis in the modern era, because they play considerably less than in previous eras.

I agree that Win Shares rates outfielders slightly higher than infielders, but I assumed that was because they tended to be, on average, much better hitters. And even though infielders (2B, SS, 3B) tend to contribute more than corner outfielders defensively, it does not compensate for the better hitting that outfielders usually provide.

This trend has been changing recently with the advent of power hitting infielders. <u>Consider the top 18 players in 07 using Win Shares.

6 OF, 4 1B, 3 3B, 2 SS, 1 2B, 1 DH</u>
. . .


152. DL from MN Posted: January 02, 2008 at 11:28 AM (#2658292)
Look at that list - 11 bats, 6 gloves (no C) and no pitchers; this demonstrates what I was saying. I don't think I want to see a HoF that is all OF and no pitchers post 1975. <u>The $$ paid out for pitchers is higher per win share than the $$ paid out for outfielders. The market is in significant disagreement with Win Shares.</u>

It seems questionable to me that people would use a strict system for HoF voting (most Win Shares) that wouldn't put a pitcher on its 25-man roster for the best players of the most recent season.

There's a (long) discussion thread on WARP, win shares and replacement value on the HoM site. There's no need to repeat it all here.


This is that long discussion thread.

Many of the same themes have been discussed in the "Dan Rosenheck" thread since DanR introduced another uber-system.
   339. Paul Wendt Posted: January 02, 2008 at 05:48 PM (#2658372)
The two matters I have emphasized should be studied and summarized systematically. Has anyone done that?

1. The historical distributions of player-season win shares by fielding position and year.

2. The relation between salaries and win shares in one season, and more complicated relations among player-season salaries and win shares.

Either study should use a complete table of win shares by player-team-season, integrated with the baseball data that is more widely available. For example the "Lahman database" includes a complete table of fielding games by position and player-team-season, and a nearly-complete table of compen$ation for some recent period.

Is a complete win shares table available?
Does the digital edition of Win Shares provide the table through 2002 or so?
   340. Paul Wendt Posted: January 02, 2008 at 11:38 PM (#2658680)
In print, player-team-season Win Shares is always an integer. Two popular peak measures are sums of three and five of those integers. How frequently does a player gain or lose by rounding at the season level, before the sum, rather than rounding the sum?

(3)
<u>Sum of three Win Shares</u> season ratings
The distribution of "gain" (which may be positive or negative here) from rounding at the season level is very close to Normal with standard deviation 0.5

Consider a player with reported win shares 34 + 29 + 28 = 91.
The probability is about 2/3 that the true sum lies between 90.5 and 91.5, so that taking the sum before rounding would also yield 91.
The probability is about 19/20 or 95% that the true sum lies between 90 and 92.
The true sum is surely between 89.5 and 92.5, so that taking the sum before rounding would surely yield 90, 91, or 92.

That is, 1 out of 3 more accurately round to 90 or 92 than to 91
1 out of 20 are truly outside the range 90 to 92

(5)
The distribution of gain is roughly Normal with s.d. 0.8

Consider reported 5-year sum 21 + 22 + 23 + 24 + 25 = 115
The probability is more than 50% that the true sum lies between 114.5 and 115.5, so that taking the sum before rounding would also yield 115
The probability is almost 90% that the true sum lies between 114 and 116, or 115+/-1
The probability is almost 99% that the true sum lies between 113.5 and 116.5, so that taking the sum before rounding would yield 114 or 115 or 116
   341. Paul Wendt Posted: January 02, 2008 at 11:45 PM (#2658681)
Sorry, that should be:

> (5)
> <u>Sum of five Win Shares</u> season ratings
> The distribution of gain is roughly Normal with s.d. 0.64

(0.8 is roughly the number of standard deviations corresponding to a gain of +0.5, since 0.5/0.64 is about 0.78.)
The approximate probabilities for (5) are correct, I think.
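The probabilities in #340-341 can be checked exactly, assuming each season's rounding error is independent and uniformly distributed on (-0.5, 0.5). The sum of n such errors is a shifted Irwin-Hall distribution whose standard deviation is sqrt(n/12): 0.5 for three seasons and about 0.645 for five. The helper names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial, floor, sqrt


def irwin_hall_cdf(x, n):
    """P(U1 + ... + Un <= x) for n independent Uniform(0, 1) variables."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= n:
        return Fraction(1)
    total = sum((-1) ** k * comb(n, k) * (x - k) ** n for k in range(floor(x) + 1))
    return total / factorial(n)


def prob_within(n, d):
    """P(|e1 + ... + en| < d) when each rounding error ei ~ Uniform(-0.5, 0.5)."""
    # Shift: ei = Ui - 0.5, so the error sum is (U1 + ... + Un) - n/2.
    center = Fraction(n, 2)
    d = Fraction(d)
    return irwin_hall_cdf(center + d, n) - irwin_hall_cdf(center - d, n)


def sd_of_sum(n):
    """Standard deviation of the summed rounding error: sqrt(n / 12)."""
    return sqrt(n / 12)


for n in (3, 5):
    print(n, round(sd_of_sum(n), 2),
          [round(float(prob_within(n, d)), 3) for d in (0.5, 1.0, 1.5)])
```

Under that assumption the probabilities of the true sum lying within 0.5, 1, and 1.5 of the reported sum are 2/3, 23/24 (about 96%), and 1 for three seasons, and 11/20, about 0.876, and 59/60 for five seasons.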
   342. Paul Wendt Posted: May 01, 2008 at 12:53 PM (#2764939)
Win Shares are denominated in wins (thirds of wins). The same is true of player-team-season ratings by Davenport (WARP) and Palmer (TPR, TPI; BFW, PFW).
Win Shares for all players in a team-season are normalized so that their sum is equal to team-season wins (times three). That is not true of Davenport's or Palmer's ratings, nor of many others.

Therefore the following data on team-season Games and Decisions is specifically relevant to using and understanding and improving the Win Shares system.

For every team-season Games Played = Decisions + NoDecisions. It is only a little misleading to call the NoDecisions "Ties" and it is convenient, so let me say it. For short, G = D + T.

Even today it is common that two teams in one league-season (whatever that is under interleague play) finish with different numbers of Games or Decisions. There were many differences between teams in 1994 when the season ended abruptly in August. In 24 league-seasons since then (1995-2006, two leagues) there have been only 5 with equal numbers of games played for all teams (almost inevitably, the number is 162). There have been 7 with equal numbers of decisions for all teams (again almost inevitably 162). And there have been 14 league-seasons with equal numbers of ties for all teams (zero). In those 24 league-seasons, the biggest differences between teams have been two games played (four times), two decisions (four times), and one tie (10 times).

Without further ado,

Latest league-season with given difference in games played for some pair of teams
1 game, 2006 NL or AL
2 game, 2000 NL
3 game, see 4
4 game, 1994 NL
5-6-7g, see 8
8 game, 1981 NL
9 game, see 10
10 game, 1945 AL
11 game, 1892 NL
For now coverage ends or begins in 1892 because 1891 is the latest season a major league team did not complete the season.

Now pass over some abnormal seasons: 1994, 1981, 1942-1945, and 1918. (In 1918, 1942, 1945, and 1981 --but neither 1943 nor 1944-- the biggest difference among teams within league was at least seven games.)

Adjusted by passing over 1918, 1942-45, 1981, and 1994
1 game, 2006 AL or NL
2 game, 2000 NL
3 game, see 4
4 game, 1989 NL
5 or 6, see 7
7 game, 1953 AL
8 game, 1938 AL
10 game, 1893 NL
11 game, 1892 NL

1953. Is that ancient history?

Latest league-season with given difference in decisions for some pair of teams
1 deci, 2006 AL or NL
2 deci, 2002 NL
3 deci, see 4
4 deci, 1994 NL
5-6-7-8, see 9
9 deci, 1981 NL
That 9-decision difference in 1981 is the greatest in the period 1892-2006, tied in 1945 and 1906 but never exceeded.

Adjusted by passing over 1918, 1942-45, 1981, and 1994
1 deci, 2006 AL or NL
2 deci, 2002 NL
3 deci, 1979 AL
4 deci, 1978 AL
5 deci, 1962 NL
6 deci, 1938 AL
7 deci, 1907 AL or NL
8 deci, see 9
9 deci, 1906 AL

Latest league-season with given difference in "ties" (all no-decision games) for some pair of teams
1 tie, 2005 NL
2 ties, 1989 NL
3 ties, 1981 NL
4 ties, 1953 AL
5 ties, 1937 AL
6 ties, 1916 NL
7 ties, see 8
8 ties, 1911 NL

Commonly the maximum difference of T ties between teams in one league-season is a difference between one team with T ties and one or more with no ties. Detroit holds the ties record with 10 in 1904 but the maximum difference between teams was only two because every team tied at least two games. In a few league-seasons every team tied at least three games: 1907 AL, 1914 AL, 1914 FL. In the 46 major league seasons 1892-1914 there were 13 with no ties.
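The lists above rest on G = D + T for each team. A minimal sketch of the within-league spreads, using only the four 1907 NL first-division rows quoted in #343 (a full league-season would need all eight teams); the function and variable names are hypothetical.

```python
# (team, games, wins, losses): 1907 NL first division, as quoted in #343.
NL_1907_FIRST_DIVISION = [
    ("Chicago", 155, 107, 45),
    ("Pittsburgh", 157, 91, 63),
    ("Philadelphia", 149, 83, 64),
    ("New York", 155, 82, 71),
]


def max_spreads(standings):
    """Largest between-team differences in games (G), decisions (D = W + L)
    and no-decisions ("ties", T = G - D)."""
    games = [g for _, g, _, _ in standings]
    decisions = [w + l for _, _, w, l in standings]
    ties = [g - d for g, d in zip(games, decisions)]
    return {
        "G": max(games) - min(games),
        "D": max(decisions) - min(decisions),
        "T": max(ties) - min(ties),
    }


print(max_spreads(NL_1907_FIRST_DIVISION))
```

For these four teams the spreads are 8 games, 7 decisions, and 1 tie.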
   343. Paul Wendt Posted: May 01, 2008 at 01:00 PM (#2764946)
Concerning the Win Shares rating system, variation in number of decisions is probably more important than that in numbers of games or ties. If all "ties" are replayed, that means uniformity in number of decisions and in a sense replays restore equal opportunities to earn win shares.
Here are some examples of big differences in numbers of decisions

1907 NL
  G    D    W    L
155  152  107   45  Chicago
157  154   91   63  Pittsburgh
149  147   83   64  Philadelphia
155  153   82   71  New York
The second division played 153 to 148 decisions.

The difference in decisions between PIT and PHI teams represents about 12 win shares for the players on each team in expectation.

1907 AL
  G    D    W    L
153  150   92   58  Detroit
150  145   88   57  Philadelphia
157  151   87   64  Chicago
158  152   85   67  Cleveland
The second division played 152 (St Louis) to 148 decisions.

For Cleveland that is almost 12 win shares gained relative to Philadelphia; for Philadelphia about 13 win shares lost relative to Cleveland.

1981 NL (overall records in split season)
  G    D    W    L
103  102   59   43  St. Louis
...
103  102   46   56  Pittsburgh
...
111  111   56   55  San Francisco

The 9-game difference represents a gain of about 14 win shares for San Francisco relative to St. Louis or a loss of about 16 win shares for St. Louis relative to San Francisco.
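The win-share figures in #343 appear to assume that the extra decisions would have been played at the team's own winning percentage, with three win shares per win. That reading is an inference rather than something the post states; under it the estimate is 3 x extra decisions x W/(W+L). The function name is hypothetical.

```python
def ws_from_extra_decisions(extra_decisions, wins, losses):
    """Expected win shares from extra decisions played at the team's own W-L rate."""
    return 3 * extra_decisions * wins / (wins + losses)


# (label, extra decisions, team wins, team losses) from the tables in #343.
examples = [
    ("1907 NL Pittsburgh vs Philadelphia", 7, 91, 63),
    ("1907 NL Philadelphia vs Pittsburgh", 7, 83, 64),
    ("1907 AL Cleveland vs Philadelphia", 7, 85, 67),
    ("1907 AL Philadelphia vs Cleveland", 7, 88, 57),
    ("1981 NL San Francisco vs St. Louis", 9, 56, 55),
    ("1981 NL St. Louis vs San Francisco", 9, 59, 43),
]
for label, extra, w, l in examples:
    print(f"{label}: {ws_from_extra_decisions(extra, w, l):.1f}")
```

In order, this gives about 12.4, 11.9, 11.7, 12.7, 13.6, and 15.6 win shares.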
   344. Paul Wendt Posted: May 18, 2008 at 03:27 PM (#2784983)
Maybe this is a useful relocation. At least Brock will meet this thread.

Brock Hanke wrote this today in "Ranking the Hall of Merit Firstbasemen" #93. This is only part of what he wrote and what he wrote is only preliminary.

A post on this subject that I thought would take a day or two has taken over a week to work up. The essence of the post is that I think you should amortize 1870s catcher playing time out to maybe 90 games instead of 162 when you're doing Season Equivalents, and that this is the only decade and the only position to which that applies. It only applies to catchers, and it only applies to 1870s catchers, except for one season of Charlie Bennett (1882) when he actually played his team's entire schedule, albeit not all at catcher.

Brock,
I'm sure there are some strengths and some weaknesses to your study. I hope that I can make time to give it a close look. I'm not very good at making time but the baseball subject commonly grabs me.

From what you say I infer that the player ratings part of your thesis --in contrast to the history and the curve fitting-- puts you in the Pete Palmer school, or addresses the Palmer school: Palmer, Clay Davenport, Bill James, and their followers. Their marquee ratings are career sums of season ratings denominated in wins. Among them Palmer makes no adjustment for season length. He once gave a talk or wrote a paper on the primacy of the season, and I suppose he would say that the career sum is secondary although it helps sell books. Bill James, too, makes no adjustment for season length, although he does give some space to win shares per 162 games alongside the more prominent career, 5-year, and 3-year totals. Davenport prorates every season at the rate (162/G)^(2/3) rather than the linear 162/G. (By "amortize" I think you mean the linear 162/G where G is team games played or scheduled.)
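To see how much the choice of proration matters, the sketch compares linear 162/G scaling with the damped (162/G)^(2/3) rate described here for Davenport. It implements only the formula as stated in this post, not Davenport's actual code; the 5-win season and the schedule lengths are hypothetical.

```python
def linear_factor(games, full_season=162):
    """Linear proration: scale a season by 162/G."""
    return full_season / games


def power_factor(games, full_season=162, exponent=2 / 3):
    """Damped proration (162/G)^(2/3), as this post describes Davenport's."""
    return (full_season / games) ** exponent


# Hypothetical 5-win season, prorated from several schedule lengths.
raw = 5.0
for g in (90, 126, 154, 162):
    print(g, round(raw * linear_factor(g), 2), round(raw * power_factor(g), 2))
```

For a 90-game schedule the linear factor is 1.8 and the damped factor about 1.48, so the two approaches would credit the hypothetical 5-win season with 9.0 and about 7.4 wins respectively.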

Chris Cobb is in the Palmer school.
Those who rely on raw win shares or season-prorated win shares are taking this approach.

Joe Dimino is not. He asks "Joe Torre, 1960-1977: how much did he contribute toward winning 18 pennants?" "Deacon White, 1871-1890 (or 1869-1890): how much did he contribute toward winning 20 (or 22) pennants?"
People who say "a pennant is a pennant" may be, and surely some are, professing this approach. --even if they don't follow it all the way to a numerical rating.
   345. John (You Can Call Me Grandma) Murphy Posted: May 18, 2008 at 04:21 PM (#2785018)
I guess I'm also in the latter school, Paul, despite using Win Shares heavily in my analysis.
   346. Chris Cobb Posted: May 18, 2008 at 05:01 PM (#2785029)
Chris Cobb is in the Palmer school.

I am not sure this is true, though I am also not sure exactly what you mean by putting me in "the Palmer school."
   347. Paul Wendt Posted: May 19, 2008 at 12:48 AM (#2785487)
You may be right, Chris.

What I mean by the Palmer school is that the focus is coming up with the right season rating denominated in wins.
(Palmer might say that they shouldn't be added except to sell books to people who are interested in adding them, and no one should be interested in adding them. If I ever have a conversation with him that gets beyond one beer --and we are presently at zero beers-- I will ask about that. But that isn't a tenet of the school.)

Dan Rosenheck is in the Palmer school during the regular term, and he is interested in adding wins. But he has a summer job in another school, putting all of the players into one modern free agent labor market.
   348. Paul Wendt Posted: January 19, 2009 at 05:58 PM (#3055239)
[two copies from the thread on Dan Rosenheck's WARP]

678. Blackadder Posted: December 27, 2008 at 09:54 AM (#3038841)
Apparently Clay Davenport is reworking BP's WARP, to include PBP fielding when it is available and a more realistic replacement level. Jay Jaffe quoted some of the preliminary results, and eye-balling it the replacement level still looked a little too low, but I'll withhold judgment until his system is public. Still, it is a very welcome development that there seems to be convergence in opinion about the correct methodology for player valuation.

680. Devin McCullen cries "Enraha!" Posted: January 09, 2009 at 04:23 PM (#3047931)
This seems like as good a place as any to mention this: BP now has searchable stats for Batting Translations, Pitcher Translations, and WARP Leaderboards here. They only go back to 1901, and it's year-by-year, but I assume folks will find these helpful.
   349. Paul Wendt Posted: April 03, 2009 at 08:18 PM (#3123658)
Last month we reported and discussed a few measures that have changed with the new edition of WARP, the original by Clay Davenport, which is incorporated in player "DT cards" at baseballprospectus.com. (Where? Probably among "Pitchers for the Hall of Merit" and "Ranking Pitchers for the Hall of Merit" for 1871-1892 or 1893-1923.)

Today I compiled some career Advanced Pitching Statistics for all fifteen major league pitchers on the 1893-1923 ballot. I noticed that the measures {XIP, RAA, PRAA, PRAR} have not changed from last year for Cy Young whereas they have changed for the other pitchers (fourteen). I reported this apparent problem to Clay Davenport.
   350. KJOK Posted: April 07, 2009 at 05:26 AM (#3127718)
Cy Young's WARP1 has changed from 193 to 129, so BP has certainly at least made a major revision in the WARP calculation.
   351. Paul Wendt Posted: April 07, 2009 at 12:20 PM (#3127770)
Cy Young, 2008 edition and revised:
XIP RAA PRAA PRAR DERA
7399 901 932 2013 3.37 (2008)
4839 570 693 1231 3.21 revised

By DERA he now ranks third in this group and he is second to Johnson by XIP, RAA, PRAA, or PRAR.
newDERA name
2.94 Rusie A
3.04 Johnson W
3.21 Young C
3.22 Alexander P
3.31 Walsh E
   352. Paul Wendt Posted: April 07, 2009 at 07:27 PM (#3128373)
Here are the 2008 and the revised values of DERA for 31 pitchers with debuts in the 1890s and at least 2000 career innings. They are ordered by the revised value (column two).

DERA newDERA
3.36 3.17 Hahn N
3.37 3.21 Young C
3.89 3.36 Breitenstein T
3.76 3.41 Nichols K
3.54 3.51 Waddell R
3.95 3.64 McGinnity J
3.92 3.68 Griffith C
4.03 3.83 Leever S
4.04 3.89 Willis V
3.96 3.95 Cuppy N
4.22 3.96 Donovan B
4.38 3.97 Taylor J
4.01 4.07 Tannehill J
4.17 4.08 Orth A
4.17 4.08 Mercer W
4.29 4.09 Hawley Pink
4.18 4.09 Phillippe D
4.16 4.09 Chesbro J
4.09 4.13 Dinneen B
4.43 4.15 Meekin J
4.19 4.18 Powell J
4.27 4.22 Killen F
4.24 4.23 Taylor J
4.34 4.24 Sparks T
4.37 4.29 Howell H
4.47 4.31 Kennedy B
4.37 4.42 Donahue R
4.61 4.57 Sudhoff W
4.73 4.71 Fraser C
4.73 4.75 Kitson F
4.77 5.05 Carsey K

Sam Leever now leads his 1900-1902 teammates comfortably.

The revision benefits Ted Breitenstein more than anyone else in this group (row three) but many with 1880s debuts gained more than he did and Charlie Getzien from the 1880s lost more than anyone in this group.

Among the leaders by career innings, at least, the size of the revision on the DERA scale is generally greater for the 1870s and 1880s debutantes. Cherokee Fisher from the early 1870s now gets credit for DERA 2.04, down from 4.19!
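The comparison in #352 can be reproduced directly from the table: subtract the 2008 DERA from the revised DERA and sort. Since a lower DERA is better, the most negative changes mark the biggest beneficiaries. The data are the 31 rows posted above; the function name is illustrative.

```python
# (name, 2008 DERA, revised DERA) for the 31 pitchers listed in #352.
ROWS = [
    ("Hahn N", 3.36, 3.17), ("Young C", 3.37, 3.21),
    ("Breitenstein T", 3.89, 3.36), ("Nichols K", 3.76, 3.41),
    ("Waddell R", 3.54, 3.51), ("McGinnity J", 3.95, 3.64),
    ("Griffith C", 3.92, 3.68), ("Leever S", 4.03, 3.83),
    ("Willis V", 4.04, 3.89), ("Cuppy N", 3.96, 3.95),
    ("Donovan B", 4.22, 3.96), ("Taylor J", 4.38, 3.97),
    ("Tannehill J", 4.01, 4.07), ("Orth A", 4.17, 4.08),
    ("Mercer W", 4.17, 4.08), ("Hawley Pink", 4.29, 4.09),
    ("Phillippe D", 4.18, 4.09), ("Chesbro J", 4.16, 4.09),
    ("Dinneen B", 4.09, 4.13), ("Meekin J", 4.43, 4.15),
    ("Powell J", 4.19, 4.18), ("Killen F", 4.27, 4.22),
    ("Taylor J", 4.24, 4.23), ("Sparks T", 4.34, 4.24),
    ("Howell H", 4.37, 4.29), ("Kennedy B", 4.47, 4.31),
    ("Donahue R", 4.37, 4.42), ("Sudhoff W", 4.61, 4.57),
    ("Fraser C", 4.73, 4.71), ("Kitson F", 4.73, 4.75),
    ("Carsey K", 4.77, 5.05),
]


def rank_by_change(rows):
    """Return (name, newDERA - oldDERA), sorted from biggest drop to biggest rise."""
    changes = [(name, round(new - old, 2)) for name, old, new in rows]
    return sorted(changes, key=lambda item: item[1])


ranked = rank_by_change(ROWS)
print("biggest drops:", ranked[:3])
print("biggest rises:", ranked[-3:])
```

Breitenstein's drop of 0.53 is the largest in this group, followed by the first "Taylor J" row (0.41) and Nichols (0.35); Carsey's rise of 0.28 is the largest in the other direction.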
   353. DL from MN Posted: April 08, 2009 at 12:01 AM (#3128961)
Dutch Leonard seems to have benefitted from this revision of WARP
   354. Paul Wendt Posted: April 13, 2009 at 03:42 PM (#3136031)
This weekend I posted a couple of items regarding park factors at Mule Suttles #94-95. Initially the point was to learn what we know or may be able to do for the Negro Leagues, and Mule Suttles #20-32 was the occasion for an important part of that work four years ago.

I included remarks on the use of park factors by Bill James in Win Shares. Pages 86ff he explains his park adjustment calculations (which include some broad and some narrow mistakes).

James also states (p87 col2) where he uses the overall park adjustment, which is a "factor" for run scoring. Those applications are in "Dividing Win Shares between Offense and Defense" and "Dividing Offensive Win Shares among a Team's Hitters" (p17-25). More on that later.

However, regarding the specialized park factors for Home Run and Non-Home Run adjustments, he says only that they "will also be needed later in the process" (p87).

Does anyone here know where?
   355. Paul Wendt Posted: April 14, 2009 at 05:21 PM (#3137709)
Dutch Leonard seems to have benefitted from this revision of WARP.

I don't have any WARP data, only Advanced Pitching Statistics.
By DERA the Dutch Leonards gain 0.07 and 0.08, or about two points on the ERA+ scale.


The Hall of Merit relief pitchers Fingers, Gossage, and Eckersley all lose about 0.40.

Is anyone able to check any of these recent or active relief pitchers, because you too have the 2008 edition data?
(columns one, three, five, any one of which is redundant)

XIP    newXIP    PRAA    newPRAA    DERA    newDERA    name
856    1020    161    213    2.81    2.62    NATHAN J
1163    1358    137    192    3.44    3.23    PERCIVAL T
1673    1968    417    311    2.26    3.07    RIVERA M
1270    1463    246    130    2.75    3.70    WAGNER B
1789    2045    265    4    3.17    4.48    HOFFMAN T
1573    1796    155    -21    3.61    4.60    HERNANDEZ R
1491    1743    87    -155    3.98    5.30    JONES T
   356. Paul Wendt Posted: April 16, 2009 at 02:44 AM (#3140312)
Some big revisions have been posted during the last few days. --the last 36 hours if I looked up those recent or active relief pitchers at noon yesterday but I don't recall whether there was some lag here at my laptop.

For brevity here are two pitchers only, Francisco Rodriguez and poor Todd Jones. For simplicity I will call the three sets of estimates (2008), recent, and today; 2008 and recent are the two that I posted yesterday. For readability the layout is three successive rows.

Francisco Rodriguez
XIP  RAA  PRAA PRAR DERA
679  ?    143  361  2.60  (2008)
816  176   81  171  3.61  recent
816  176  174  287  2.57  today 


Todd Jones
XIP  RAA  PRAA PRAR DERA
1491 ?     87  493  3.98  (2008)
1743 86  -155   39  5.30  recent
1743 86    71  313  4.13  today 
   357. DL from MN Posted: April 16, 2009 at 02:17 PM (#3140615)
The relievers are very much in flux. Not sure what's happening.
   358. DL from MN Posted: April 16, 2009 at 04:56 PM (#3140874)
It looks like replacement value went back down again. All the pitchers gained in PRAR and most in PRAA.
   359. DJ Endless Grudge Can Use Multiple Slurp Juices Posted: April 16, 2009 at 05:31 PM (#3140937)
Lot of weird stuff going on here. I have full WARP from last year and from a week ago; tonight I might compile the new new WARP and take another look at it. I don't know if replacement level went down or up - I've seen position players moving in both directions.
   360. Paul Wendt Posted: April 16, 2009 at 06:04 PM (#3140987)
It looks like replacement value went back down again. All the pitchers gained in PRAR and most in PRAA.

Maybe a decrease in replacement level, but it is not enough to float the PRAR of all pitchers --not those who must give back lots of runs to their fielders. For example, poster boy Al Spalding is down from 210 to 120 (PRAR). By DERA he is now a small gainer from last year: 4.13, 2.50, 4.06.

Last month I speculated that there had been some revision regarding the cooperation by pitchers and fielders, as if Spalding had been credited with doing so remarkably well given all those jokers running around without gloves --some transhistorical average fielding as a benchmark. That isn't plausible but the estimates for some early pitchers on famous teams do look like they have enjoyed that mistake and now suffered its correction.
   361. JoeD has the Imperial March Stuck in His Head Posted: April 17, 2009 at 05:31 AM (#3141995)
I'm lurking here and very interested in what you guys figure out about the revisions. How could they possibly decide to lower the replacement level further?
   362. Paul Wendt Posted: April 17, 2009 at 04:24 PM (#3142272)
DERA only, here are the big winners and big losers by 2009 revisions, among 407 pitchers with 2000 career innings.

16 losers (DERA up at least 0.20 runs)

DERA newnew
4.39 4.88 Getzien C
4.97 5.34 Billingham J
4.70 5.06 Ellis D
4.58 4.93 Briles N
4.61 4.95 Sele A
4.65 4.98 Splittorff P
4.44 4.77 Leonard Den
4.36 4.66 Kaat J
4.35 4.61 Goltz D
3.95 4.20 Radke B
4.63 4.87 Reuss J
4.57 4.81 Burkett J
3.73 3.96 Keefe T
4.55 4.76 Donohue P
4.53 4.74 Gura L
4.20 4.40 McBride D

By debut they are three from the 1870s and 80s, including the one at the head of the list; one from the 1920s; twelve from Jim Kaat to the present.

I count 27 with DERA down at least 0.2 runs. The threshold for listing here is a little higher in order to make the group size similar.

15 winners (DERA down at least 0.24 runs)
DERA newnew
3.88 3.53 Rusie A
4.10 3.76 Zettlein G
4.02 3.68 Niekro P
3.89 3.57 Breitenstein T
3.84 3.55 Garver N
4.84 4.56 Cunningham B
4.44 4.16 Ward JM
3.89 3.62 McMahon S
4.46 4.19 Ramos P
4.07 3.81 McCormick J
4.47 4.21 Honeycutt R
4.45 4.19 Hough C
4.20 3.96 Galvin J
4.08 3.84 Morris E
4.55 4.31 Patten C

By debut date they are eight from the 1870s and 80s, Breitenstein 1891, Patten 1901, and five from the 1940s to 70s.
If this holds up then at least McCormick from olden days and Garver from modern times should get another look.
   363. Paul Wendt Posted: April 17, 2009 at 04:25 PM (#3142274)
Now I will let this rest a few days, both hoping to hear from Clay Davenport and planning to revisit the pages for some of the biggest winners and losers by revision, also some of the extreme revised values.
   364. Paul Wendt Posted: November 20, 2009 at 08:06 PM (#3392813)
This year I didn't get any reply to email inquiry about revisions to WARP.
   365. Paul Wendt Posted: November 20, 2009 at 08:11 PM (#3392818)
Regarding "sample size" and the numbers of games scheduled.
Quoting from Bleed and Brent, "2010 Ballot Discussion" #307-309, where I have also posted the first part of my comments as #319.

308. Brent Posted: November 19, 2009 at 09:30 PM (#3391965)
>> [Bleed #307] Does anyone else have thoughts on how to use Rally Monkey's WAR to reflect the value of 19th century ballplayers?
<<

I guess the first step is to articulate the reasons you'd like to make adjustments to reduce the results from the simple extrapolations. Is it because you think the short-season data aren't representative and you want to regress them? Or is it an adjustment for perceived league quality (in which case you'd also want to adjust data from longer seasons)?


309. Bleed the Freak Posted: November 19, 2009 at 10:39 PM (#3392000)
My reasoning would be that short-season data is a smaller sample size, and might not be fully representative, so regression may be necessary.


That may be reasonable regarding "peak" credit for seasons not supported by neighbors of about the same quality; that is, reasonable in "non-consecutive peak" analysis. I think it is generally unreasonable, along two lines below (1,2).

Before getting there, let me simply state what is a generally reasonable concern about the number of games in the championship schedule, or the "length of the season", for anyone who cares about "pennants" as well as games and runs. For the same league, same teams, winning percentage .625 may generate the same probability of winning a 126-game pennant race as does .600 in a 162-game pennant race. I presume that handling this point is a big part of "pennants added" analysis by Joe Dimino, following Michael Wolverton, if I understand correctly.

1.
The matter of so-called sample size may be all about talent rather than achievement. It does seem to be all about talent rather than achievement for Tom Tango and "Bleed" in the ThinkFactory discussion cited here last week (or at one remove from that citation). Tom Tango argued for Edgar Martinez among other things. Bleed interpreted Tango's rating system and Dan Rosenheck's WARP in terms of root-n, the square root of the number of observations, which is ubiquitous in mathematical statistics. Insofar as we care about talent rather than achievement, the issue of so-called sample size is statistically significant (an abuse of technical language) and may sometimes be significant on the scale of this project. "N" is all about how certain we can be that Barry Bonds is truly a talented player. Just how unlikely is it that a league-average talent could have posted his playing career? Please quantify!

2.
Concerning achievement rather than talent, in major league baseball from 1871, I doubt that anyone really means a sample size issue. Essentially we have the complete record for Ross Barnes in championship play 1871-1876, same as we do for Dave Cash 1971-1976. Who needs more? Well, the entertainment of paying customers in lots of other games was an important part of Barnes' job, but a trivial part of Cash's job. We don't have a record of those other games Barnes played, not even how many of them he played; his playing time and all the details are unobserved, practically (existing records have not been compiled). So "who needs more?" is who cares what Barnes achieved in those other games. Maybe he and Harry Schafer played every day in 1871, and performed equally as batsmen in those other games. That isn't likely if they were both trying, but it isn't impossible either. More important, there is no reason to suppose Barnes surpassed Schafer in those games by the same margin he surpassed Schafer in their NAPBBP games. Statistics as a discipline shows, tells, teaches how to make some auxiliary postulates, interpret the historical record as a sample, and express what we know about those other games (in terms of estimates and probabilities that jointly quantify what we know and don't know).

We do suppose that Barnes and Schafer were "trying", or they were obliged to "try" in those other games of the 1870s. That's most of the distinction from Dave Cash's and Richie Hebner's play in exhibition games of the 1970s. Nevertheless, no one cares much about those other ballgames even in the 1870s.

"A pennant is a pennant" expresses an important principle here. Perhaps some participants treat it as a constitutional obligation. For some it is a personal guideline. Almost everyone takes it seriously, no one simply dismisses it. Routinely it means uniform weight for all major league pennant races, and debate tinkers with the details (what's a major league? how if at all do we pay attention to minor league seasons?). At the same time, however, it means uniform weight zero for everything else: assaulting a nurse on the street, driving a car while intoxicated, muffing a fly in March.
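To put a number on the root-n point in item 1 ("Please quantify!"), here is a minimal sketch using a normal approximation to the binomial. The on-base rates and plate-appearance counts are round, hypothetical figures, not anyone's actual line, and the function names are made up for illustration.

```python
import math

def z_score(observed_rate, league_rate, trials):
    """How many chance standard deviations the observed rate sits above league average."""
    sd = math.sqrt(league_rate * (1 - league_rate) / trials)
    return (observed_rate - league_rate) / sd

def chance_of_at_least(observed_rate, league_rate, trials):
    """Approximate probability that a league-average talent posts at least this rate."""
    z = z_score(observed_rate, league_rate, trials)
    return 0.5 * math.erfc(z / math.sqrt(2))   # upper tail of the standard normal

# Hypothetical: a .440 on-base rate against a .330 league, one season vs. a long career.
for pa in (600, 12000):
    print(pa, round(z_score(0.440, 0.330, pa), 1), chance_of_at_least(0.440, 0.330, pa))
```

The z-score grows with the square root of n: twenty times the plate appearances gives about 4.5 times the z-score. Far out in the tail the normal approximation is unreliable, so the tiny career probability is best read as "effectively zero" rather than literally.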
   366. Bleed the Freak Posted: January 02, 2011 at 11:38 PM (#3722035)
Joe Dimino - to answer your question in the 2011 ballot discussion post 327

And if you can find me a good new proxy for team defense or a way to get at the old BPro cards (since the BPro cards no longer have what I need), I'm all ears!

A hat tip to Chris Jaffe for mentioning the old-school DT Cards in an article he wrote about Omar Vizquel:

http://www.hardballtimes.com/main/article/when-do-we-start-taking-omar-vizquels-cooperstown-case-seriously/

Under the references and resources section, he lists the direct link to Vizquel, and mentions that, to query for other players, using the baseball-reference (Lahman database) abbreviation will net the correct result.

http://www.baseballprospectus.com/dt/vizquom01.shtml

I hope this is helpful.
   367. Carl Goetz Posted: December 05, 2017 at 03:37 PM (#5586667)
I think this would be the correct thread to which to post this question.

Does anyone know if there is a standard error associated with WAR? For example, let's say the top WAR in a league is Player A with 8.0. How far behind 8.0 would Player B have to be before we could say with 66% or 95% certainty that Player A was the better player that year?
Or does each player have a separate error associated with him?

Hope my questions make sense. I probably didn't word them in the best way.
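One way to frame the 66%/95% question, assuming each player's WAR carries an independent, normally distributed error with a known standard error: the standard error of the gap is the two errors added in quadrature, and the gap needed for a given one-sided confidence that A was better follows from the normal quantile. The half-win figure used here is only the rough rule of thumb mentioned later in this thread, not a published value, and `gap_needed` is a hypothetical name.

```python
import math
from statistics import NormalDist

def gap_needed(se_a, se_b, confidence):
    """WAR lead A needs over B to be `confidence` sure (one-sided) that A was better."""
    se_gap = math.hypot(se_a, se_b)          # independent errors add in quadrature
    return NormalDist().inv_cdf(confidence) * se_gap

for conf in (0.66, 0.95):
    print(conf, round(gap_needed(0.5, 0.5, conf), 2))
```

Under these assumptions Player B would need to trail by about 0.3 WAR for 66% confidence and by a bit over 1.1 WAR for 95%.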
   368. Carl Goetz Posted: December 08, 2017 at 01:46 PM (#5588858)
One other question. I've been using BaseRuns (Baseball Gauge) for measuring offensive WAR on the premise that it is more predictive on a team level. I've been doing a lot of reading on wRAA (Fangraphs and BBRef) which is based on Tom Tango's wOBA (and its corollary wRC+) and it sounds like wRAA may be more accurate on a player level. My problem is that I have Tom Tango's book as well as articles on the subject from Baseball Gauge, Fangraphs, and BBRef which in my mind are all biased toward their preferred stat.
Does anyone here have an opinion on the matter?
Does one do a better job at being fair to players in all eras?

No rush. At this point, I'm not switching for the 2018 ballot, but it is something I'd like to consider going forward.
   369. Mr Dashwood Posted: December 08, 2017 at 03:00 PM (#5588926)
Does anyone know if there is a standard error associated with WAR?

I thought I read something about that on the tangotiger.com blog in one of the threads spun off Bill James's article on WAR, but I can't find it now, if I did.

I am not aware of any serious attempt to establish one, but I am probably not the best placed to know for sure. People normally talk imprecisely of it being around half a WAR margin in a season, with the fielding component being less precise than pitching and hitting.

That adds up to quite a bit over a long career, for an exercise such as the HoM, but something that can be less problematic for a single season. Of course, one can always assume that the plus or minus effect evens out over many seasons, which I suppose isn't an unreasonable position to take.
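Whether the errors "even out" depends on whether they are independent from season to season. A minimal sketch, assuming a hypothetical 0.5-WAR per-season error: independent errors grow only with the square root of the number of seasons (so, relative to a long career total, they shrink), while an error that repeats every season, such as a fielding system misreading the same player the same way each year, grows linearly.

```python
import math

def career_error(per_season_se, seasons, independent=True):
    """Standard error of summed career WAR under two extreme assumptions."""
    if independent:
        return per_season_se * math.sqrt(seasons)   # random errors partly cancel
    return per_season_se * seasons                  # a repeated bias just accumulates

print(career_error(0.5, 16), career_error(0.5, 16, independent=False))   # 2.0 8.0
```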

Does one do a better job at being fair to players in all eras?

Recently I was looking at the 1899 Cleveland Spiders, and it was my impression that the replacement level in bWAR was lower than in fWAR and gWAR. IOW, by bWAR players were more likely to have positive scores and less likely to have extreme negative scores.

This is complicated somewhat by the use of FIP in fWAR, which makes 1899 pitchers look much less worse than under bWAR and gWAR. Examining the players' post-Spiders' careers, I came to the conclusion that a FIP-based WAR doesn't capture the expectations of turn-of-the-century baseball. That RA-9 WAR mattered. Pitchers with bad bWAR and gWAR didn't have careers afterwards.

I didn't notice any difference in particular effects between the consequences of BaseRuns-based gWAR and the wRAA of bWAR and fWAR for hitters in terms of future careers, but I wasn't looking that closely because the numbers were similar, unlike the case for pitchers. If there is an effect, on the basis of this bit of research I imagine it is quite small.
   370. Rally Posted: December 08, 2017 at 03:16 PM (#5588946)
Does anyone know if there is a standard error associated with WAR?


I think the only WAR to publish with a standard error is the Open WAR project.

Or does each player have a separate error associated with him?


I don't really know how you would start calculating a standard error around it, but yes, each player would have a different range. For one, it depends on playing time. I don't know if Aaron Judge is 8 WAR +/- .5, 1.5, 2.5 or whatever. But I know what my WAR last season was (and the WAR for Mickey Mantle in 2017, my cat, and a random fire hydrant on the street): 0.0, with no error.

Some people are going to object to me saying someone with no playing time has a WAR of 0.0, but whatever. Take someone who had one at bat. Their WAR is within a rounding error of 0.0, and the error around it will be pretty small.

Playing time aside, I would say that if David Ortiz and Derek Jeter each play 155 games with a WAR of 6.0, Jeter will have the larger error range. That is because we can assume a similar range around their offense, but while Jeter's defense is known within a big error range, Ortiz's defense has almost no error range since he probably played 5 games in the field all year.
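Rally's Jeter/Ortiz point can be sketched by treating offense and defense as independent error sources that add in quadrature, and by assuming (our simplification, not a published method) that the defensive error scales with the share of the season spent in the field. The 0.5 and 1.0 WAR figures are placeholders, not estimates for either player, and `war_se` is a hypothetical name.

```python
import math

def war_se(offense_se, full_time_defense_se, share_in_field):
    """Combined WAR standard error; defensive error scaled by time in the field."""
    return math.hypot(offense_se, full_time_defense_se * share_in_field)

jeter = war_se(0.5, 1.0, 1.0)        # everyday shortstop
ortiz = war_se(0.5, 1.0, 5 / 155)    # about 5 games in the field
print(round(jeter, 2), round(ortiz, 2))
```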
   371. Rally Posted: December 08, 2017 at 03:20 PM (#5588952)
I don't think one method of rating offense is much different than the others, assuming you are using one of the sabermetric inventions and not RBI or something. You'll get pretty much the same rank order for the great hitters.

This is complicated somewhat by the use of FIP in fWAR, which makes 1899 pitchers look much less worse than under bWAR and gWAR. Examining the players' post-Spiders' careers, I came to the conclusion that a FIP-based WAR doesn't capture the expectations of turn-of-the-century baseball. That RA-9 WAR mattered. Pitchers with bad bWAR and gWAR didn't have careers afterwards.


FIP works for modern pitchers, but back in the 1800's there were relatively few strikeouts, almost no homeruns, and few walks, especially back when it took more than 4 balls to earn a walk. You just can't build a pitching metric off of rare events and expect it to mean anything.
   372. Carl Goetz Posted: December 08, 2017 at 04:44 PM (#5589026)
"I came to the conclusion that a FIP-based WAR doesn't capture the expectations of turn-of-the-century baseball. That RA-9 WAR mattered."
"FIP works for modern pitchers, but back in the 1800's there were relatively few strikeouts, almost no homeruns, and few walks, especially back when it took more than 4 balls to earn a walk."
fra paolo and Rally both said similar things with regard to FIP so I thought I'd address them together. I get that FIP works better for a modern pitcher and RA-9 for 1890s pitcher. Question then becomes, where should the change occur in that 120 year period? My gut says RA-9 at least until WWII and FIP for at least the last 20-30 years, but still not sure how to treat the interim time period. Is there even an objective way to answer this? I definitely want to be fair to pitchers from all eras, just not sure the best way to do so.

"For one, it depends on playing time. I don't know if Aaron Judge is 8 WAR +/- .5, 1.5, 2.5 or whatever."
For career WAR purposes, I assume the errors tend to even out over a player's career. I'm looking to do a project where I pick my own MVPs, Cy Youngs, and All-Star teams (based on full-season play instead of 2-3 months, and without the pesky tendency to pick All-Stars just because they have a reputation for being All-Stars) for each year going backward. Then I can have a total of All-Star appearances for a player that means something to me (plus it seems like a fun project). So for these purposes, I'm looking for a standard variance for players in the Aaron Judge/Jose Altuve tail of the distribution. I've heard the +/- .5 WAR figure before, which implies that if two MVP candidates are within 1 WAR of each other, we can't be certain (to a high degree, anyway) which was the better player.
My thought for the MVP side of the equation was to take the player with the highest WAR plus any players with a WAR within a certain error of that player and basically award them all the MVP for that year (and the same with pitchers for the Cy Young). The idea is that when I look at a player for HoM consideration, instead of asking the question "How many times did he win the MVP?", I'm recognizing and embracing the error involved and am instead answering the question "How many times could he have been considered the best player by a reasonable person?" I feel like that's much more useful and removes some of the subjectivity of selecting a single MVP in a year where maybe 3 guys were reasonable picks. The question becomes, "What is the error level that accomplishes this?" Maybe it's just a question of asking myself how many MVPs I am comfortable with having in a given year.
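The "co-MVP" idea is easy to mechanize. A minimal sketch with hypothetical players and WAR values; the margin is left as a parameter, since choosing it is exactly the open question above (one natural option is a gap derived from an assumed WAR standard error).

```python
def co_winners(war_by_player, margin):
    """Every player within `margin` WAR of the league leader."""
    best = max(war_by_player.values())
    return sorted(name for name, war in war_by_player.items() if best - war <= margin)

season = {"Player A": 8.0, "Player B": 7.4, "Player C": 6.1}   # hypothetical
print(co_winners(season, 1.0))   # ['Player A', 'Player B']
```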
   373. Rally Posted: December 11, 2017 at 03:12 PM (#5590314)
I get that FIP works better for a modern pitcher and RA-9 for 1890s pitcher. Question then becomes, where should the change occur in that 120 year period? My gut says RA-9 at least until WWII and FIP for at least the last 20-30 years, but still not sure how to treat the interim time period. Is there even an objective way to answer this? I definitely want to be fair to pitchers from all eras, just not sure the best way to do so.


I just prefer to use RA-9 WAR for them all, as long as you are trying to value what they did and not project them going forward. FIP can't work for the early pitchers. FIP usually works OK for the moderns; on a career level they usually come out pretty close. When you do find a guy with a big difference over a large sample, there are usually real reasons why the pitcher (Glavine) is better or worse than his FIP.
   374. Rally Posted: December 11, 2017 at 03:13 PM (#5590318)
Of course I am a bit biased on that point.
   375. Carl Goetz Posted: December 11, 2017 at 03:36 PM (#5590354)
I get that. I've actually been doing a fair amount of reading into the various WAR methods since I wrote post #372. I'm a lot more comfortable with (and am probably going to switch to) RA9-based WAR for next year. I don't have time at this point to switch for this year's ballot. I'm also considering switching from BaseRuns to wRAA for hitting WAR. I feel as though there is some skew for a hitter on a good offensive team (i.e., his offense adds more to the total offense than it would if you added it to an average team). Even the proponents of BaseRuns seem to say it's better on the team level and linear weights is better on an individual level.
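The team-context effect Carl describes can be illustrated with a commonly cited simple form of BaseRuns (A*B/(B+C) + D). The coefficients below are that simple version; Baseball Gauge and others may use a fuller one. The sketch stacks the same hypothetical batting line on top of two hypothetical team totals and compares the runs it adds, which simplifies things (a real team would replace someone's plate appearances). A linear-weights stat such as wRAA credits each event with a fixed run value, so it would give the line the same runs in both contexts.

```python
def base_runs(ab, h, doubles, triples, hr, bb):
    """Simple BaseRuns: A * B / (B + C) + D (one commonly cited version)."""
    tb = h + doubles + 2 * triples + 3 * hr                # total bases
    a = h + bb - hr                                        # baserunners other than HR
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02    # advancement factor
    c = ab - h                                             # outs (simplified)
    d = hr                                                 # automatic runs
    return a * b / (b + c) + d

def marginal_runs(team, player):
    """Runs the player's line adds when stacked on top of a team's totals."""
    combined = [t + p for t, p in zip(team, player)]
    return base_runs(*combined) - base_runs(*team)

# Hypothetical season totals: (AB, H, 2B, 3B, HR, BB)
AVG_TEAM = (5500, 1450, 280, 30, 170, 520)
GOOD_TEAM = (5500, 1550, 300, 35, 220, 600)
PLAYER = (550, 170, 35, 3, 35, 80)

print(round(marginal_runs(AVG_TEAM, PLAYER), 1), round(marginal_runs(GOOD_TEAM, PLAYER), 1))
```

In this toy example the line is worth slightly more when added to the stronger offense, which is the kind of skew Carl mentions.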
   376. Carl Goetz Posted: November 20, 2018 at 11:32 AM (#5789613)
Anyone know if there is a quick and dirty way to convert Win Shares to a WAR scale? I definitely prefer WAR, but want to use Win Shares as a check to investigate players whom WAR may be seriously over- or underrating. The different replacement levels and Win Shares' three-per-win scale make that somewhat difficult just looking at raw numbers.
   377. Michael J. Binkley's anxiety closet Posted: November 20, 2018 at 12:14 PM (#5789651)
Carl-

The quick and dirty method I used to use for position players:

Start with the assumption that an average position player gets about 15 WS over a full season. Given that, according to WAR, a full-season average player is worth about 2 wins more than a replacement player, subtract 6 (2 wins * 3 WS per win) from 15, giving 9 WS per season for a replacement player. So subtract 9 WS * % of season played from a player's WS, and then divide that number by 3 to get an approximate WAR equivalent.
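Michael's conversion, written out as a sketch. The 9-WS replacement figure is his rule of thumb for position players (pitchers are excluded, as he says), and `ws_to_war` is a hypothetical name.

```python
def ws_to_war(win_shares, season_fraction, replacement_ws_per_season=9.0):
    """Rough position-player WS -> WAR conversion per the rule of thumb above."""
    above_replacement = win_shares - replacement_ws_per_season * season_fraction
    return above_replacement / 3.0      # three Win Shares per team win

print(ws_to_war(15, 1.0))   # average full-time regular: 2.0
print(ws_to_war(12, 0.5))   # half season: 2.5
```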

Pitchers are much tougher because WS gives a much bigger bonus to relievers than WAR does, and WS also rewards showing up (raw IP totals) more than WAR does. So I haven't figured out a good conversion for pitchers.
   378. Carl Goetz Posted: November 20, 2018 at 12:25 PM (#5789666)
Thanks Michael. That helps a lot. Pitchers are tough to begin with, so I completely understand that issue.
   379. bjhanke Posted: July 13, 2023 at 05:13 AM (#6136931)
Michael Binkley's (#377) method for locating a .294 winning percentage (which is what WAR claims as the Replacement Rate) on the Win Shares scale is a pretty good estimate for position players at an average defensive position. The most important and least important positions require a small adjustment from that, because in their context "15 WS" is no longer a good estimate of an average position player's full-time value.

WS does not itself have a Replacement Rate, so I don't understand the comments about its RR being too low or too high. WS has a zero point and calculates from there. That zero point can't be compared to WAR's RR. It can be compared to WAR's Runs Method zero point, which is the average. WS's zero point is vastly superior to WAR's zero point, if for no other reason than that it allowed a reasonable approach to ranking fielding.

One good reason for using WS (and it is my personal reason) is that you can then use the New Historical's Player Ranking system of Accumulated Regular Season WS, Top Three non-consecutive years, Top Five consecutive years, Timeline and Subjective. This is by far the best Hall system that I have ever seen. Nothing else is even close. You could, of course, put together a Ranking System of similar complexity fueled by WAR, but as far as I know, no one has ever done it and published the results. You can also use the system without a component (usually Timeline), because all of the components except Subjective are trivial to figure out, and Subjective is easy to approximate just by looking at who ranks ahead of whom in the New Historical and looking for discrepancies between placement and the first four components.
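The first three components are mechanical, as bjhanke says. A minimal sketch computing them from a list of season Win Shares (the career shown is hypothetical). It does not attempt James's weighting of the components against one another, nor the Timeline and Subjective adjustments.

```python
def ranking_components(season_ws):
    """Career total, best three seasons (anywhere), best five consecutive seasons."""
    career = sum(season_ws)
    top_three = sum(sorted(season_ws, reverse=True)[:3])
    span = min(5, len(season_ws))
    best_five_straight = max(
        sum(season_ws[i:i + span]) for i in range(len(season_ws) - span + 1)
    )
    return career, top_three, best_five_straight

print(ranking_components([10, 20, 30, 25, 28, 22, 15, 5]))   # (155, 83, 125)
```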

For what it's worth, I contend that there clearly is a Timeline, but it is not linear, which is what WS has. I contend that it tracks Fielding Percentage or Strikeouts or any other stat that changes very rapidly in early baseball, but very slowly in the later game. I use that Timeline for 19th century players, and then multiply their stats out by the length of their teams' schedules. Works great for position players.
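The "multiply their stats out by the length of their teams' schedules" step is a simple proportional scaling. A sketch, assuming a 162-game target (bjhanke doesn't specify one) and leaving out his non-linear Timeline, which would be applied separately; `schedule_adjust` is a hypothetical name.

```python
def schedule_adjust(value, team_games, target_games=162):
    """Scale a counting value from a short schedule up to a target schedule length."""
    return value * target_games / team_games

print(round(schedule_adjust(3.0, 84), 2))   # 5.79
```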

For pitchers, I just assume that a "season" consists of 34 Starts, and use that to break down the pitchers' numbers into what look much more like modern seasons, by taking "orphan" parts of actual seasons and adding them to the start of the next actual season. I can then compare very early pitchers to modern ones without much trouble. The main thing I have to look out for is the illusion of unnatural consistency in the early guys: season after season of 34 starts, with no injury years. That's not that hard.

Let's say a pitcher has these four seasons' worth of starts: 1880 = 4 starts (rookie). 1881 = 52 starts (staff ace that year). 1882 = 38 starts (too much workload in 1881 led to sore arm). 1883 = 48 starts (back to ace status). Here's what I translate that to:

1880 = 4 starts
1881 a = 44 starts, all from 1881, leaving an orphan of 52-44=8
1881 b = 44 starts, 8 from 1881 and 36 from 1882, leaving an orphan of 38-36=2
1882 = 44 starts, 2 from 1882 and 42 from 1883, leaving an orphan of 48-42=6. Those starts will be the first 6 starts of the 1883 season (or 1883 a season).

As for which stats to associate with these seasons, consider the 1881 b season. It has 8/52 of the real 1881 season's stats, and 36/38 of the 1882 season's.
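The orphan bookkeeping can be sketched as follows. As in the worked example, the partial rookie season (1880) is left alone and not passed in. The chunk size is a parameter: the worked example uses 44 starts per chunk, and a 34-start "modern season" would simply be `size=34`. Function names are hypothetical.

```python
from fractions import Fraction

def rechunk(starts_by_year, size):
    """Cut a run of seasons into chunks of `size` starts, carrying leftovers forward.

    Returns the full chunks plus the trailing orphan; each chunk is a list of
    (year, starts taken from that year)."""
    if size <= 0:
        raise ValueError("size must be positive")
    chunks, carry = [], []
    for year, starts in starts_by_year:
        remaining = starts
        while remaining > 0:
            have = sum(n for _, n in carry)
            take = min(size - have, remaining)
            carry.append((year, take))
            remaining -= take
            if have + take == size:          # chunk is full: close it out
                chunks.append(carry)
                carry = []
    return chunks, carry

def shares(chunk, actual_starts):
    """Fraction of each real season's stats assigned to a chunk."""
    return [(year, Fraction(n, actual_starts[year])) for year, n in chunk]

actual = {1881: 52, 1882: 38, 1883: 48}
chunks, orphan = rechunk(sorted(actual.items()), size=44)
print(chunks)                      # [[(1881, 44)], [(1881, 8), (1882, 36)], [(1882, 2), (1883, 42)]]
print(orphan)                      # [(1883, 6)]
print(shares(chunks[1], actual))   # [(1881, Fraction(2, 13)), (1882, Fraction(18, 19))]
```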

This produces careers that look a LOT more like modern workloads. And this REPLACES the Timeline! You get Hoss Radbourn looking like he has a Roger Clemens career instead of the real Hoss Radbourn's, but that's what you want if you're going to compare them.
   380. Bleed the Freak Posted: July 13, 2023 at 02:31 PM (#6136967)
WS's zero point is vastly superior to WAR's zero point, if for no other reason than that it allowed a reasonable approach to ranking fielding.


Can you expound?

I'm not sure what you are trying to explain here, but from the evidence discussed in this thread and elsewhere, I don't see where WS uses a better method or yields a better result when it comes to fielding evaluation.


One good reason for using WS (and it is my personal reason) is that you can then use the New Historical's Player Ranking system of Accumulated Regular Season WS, Top Three non-consecutive years, Top Five consecutive years, Timeline and Subjective. This is by far the best Hall system that I have ever seen. Nothing else is even close. You could, of course, put together a Ranking System of similar complexity fueled by WAR, but as far as I know, no one has ever done it and published the results.


Members of the electorate here have come up with ways to evaluate players and publish the results, through discussion boards or their own websites, threads, etc., which I am comfortable stating are an improvement on what WS and Bill James have offered.

Kiko Sakata - https://baseball.tomthress.com/

Dr. Chaleeko - https://horsehidedragnet.wordpress.com/
https://homemlb.wordpress.com/

He's not an active member here, but Matthew Cornwell has done amazing work, to which a thread is dedicated here:
https://www.baseball-fever.com/forum/general-baseball/history-of-the-game/3629665-2022-parcs-d-update

Dan Rosenheck has a page here for data from 1893-2005 - www.baseballthinkfactory.org/files/hall_of_merit/discussion/dan_rosenhecks_warp_data

Personally, I leverage the work done by these folks, as well as other factors, to come up with an evaluation that I feel exceeds the WS system...and I feel the same for the work put in and methods used by other members of our electorate.
Chris Cobb doesn't have a publicly available, all-encompassing system that I am aware of, but I would choose his methods over a rote or raw WS/WAR/whatever analysis.
   381. Chris Cobb Posted: July 14, 2023 at 01:13 PM (#6137025)
So let's talk about James' ranking system a bit. First, some broad context for taking a look at it.

James’s ranking system has two great virtues. First, it emphasizes the importance of taking a multi-faceted approach to the assessment of player value, and, second, it does a good job of identifying the aspects of value that should be included. The conceptual framework for ranking that James’s system developed has guided work on the assessment of player value ever since, and I would say that the large majority of HoM voters over the years have taken a multi-faceted view of value, ultimately on the basis of James's arguments for it. The scope of the project that James undertook to enable him to create player rankings that took an approach to value that was both multi-faceted and precise in the New Bill James Historical Baseball Abstract was amazingly ambitious, and its results had very significant positive impacts on the historical assessment of baseball players. The Hall of Merit project in particular benefited greatly from its release during the period in which the project was being conceived, and the early influence of Win Shares on the project is demonstrated by the fact that the timing of the first election was adjusted so that the electorate would have access to the Win Shares follow-up to the NBJHBA prior to the first election being held. NBJHBA is a work I still consult from time to time, and it’s always a provocation to fresh thought.

James' specific ways of assessing the aspects of value that he identifies as salient have a number of shortcomings, however. These can and have been identified and improved upon. It is not surprising that the first approach to implementing a multi-faceted system for the assessment of player value would have shortcomings that further work would identify and revise, especially when that system was being developed at the same time that the inventor was also working on a new system for generating the data that the assessment system would work with. The Hall of Merit project relied on the NBJHBA, especially in its first five years, but even at the time the book was released many of its participants were motivated to improve upon its work, especially its treatment of nineteenth-century players. I’d say we emulate Bill James better by striving continually to bring fresh and rigorous analysis to bear on baseball than by adhering as closely as possible to the systems as he developed them.

So, the virtue of James' multi-faceted system is its integration of three different views of player value (career value, peak value, and rate of production value) and its placing of them all in historical context. The shortcomings lie in the particular way of quantifying each of these values and integrating them into a comprehensive evaluation. Here's a quick run-through of the shortcomings as I see them.

(1) Historical context. These are the most obvious, I think. Brock has already mentioned that the linear and chronological timelining James includes in the system doesn't accurately reflect the way that quality of competition has evolved: something more flexible and more clearly tied to evidence of competition levels is needed. It's an obvious shortcoming, but probably the hardest overall to do better with. There is still no widely agreed upon way of quantifying changes in competition quality and incorporating that into rankings, but there are a lot of attempts out there that are better than what's in the NBJHBA. Also on historical context, the failure of the system to include an adjustment for differing season lengths is an obvious hole. There are simpler and more sophisticated approaches to adjusting for season length. Most HoM voters have incorporated something along these lines.

(2) Valuing rate of production. James' way of incorporating rate of production by using a career rate measure is very problematic. For a rate stat to allow for accurate, meaningful comparison between players, the period over which the rate is calculated needs to be consistent. Career rate doesn't do this, and so it incorporates inaccurate, potentially misleading information into the system's results. A rate stat with a consistent period of games or seasons is needed. I think a lot of systems being used don't include a rate measure at all. I think that including a rate measure is important, and it's one of the things that I particularly like about my own system.

(3) Balancing career and peak value. The balance the NBJHBA system strikes between peak and career value is weighted too heavily toward peak. The re-scaling of career value using the harmonic mean reduces the weight of career value in the system excessively in relation to the system's two peak components. The Win Shares metric invites this systemic mistake by making the opposite error: it overvalues career by setting the baseline too low. Therefore, in a ranking system using Win Shares, some re-scaling of career value is needed. Otherwise, career value would outweigh all other sorts of value. Even so, the harmonic mean method is excessively reductive of career value. This element needs improvement. Switching to a comprehensive metric that sets its baseline higher (at a point that gives more weight to value above average relative to value between the baseline and average) avoids this problem, and pretty much every other comprehensive metric uses a higher baseline than Win Shares. Any system that allows career value to be used without drastic adjustment will have results that relate more transparently to the foundational measures on which it is based.
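On point (2) above: Chris doesn't spell out his own rate measure here, so what follows is only one possible consistent-period rate, with a hypothetical window and hypothetical function names. It averages over the player's best n seasons and pads short careers with zeros, so every player is measured over the same span. The example shows the problem with a career rate: two added below-par (but still positive) seasons lower the career rate even though they add value.

```python
def career_rate(season_values):
    """Average value per season over the whole career."""
    return sum(season_values) / len(season_values)

def fixed_window_rate(season_values, n_seasons=10):
    """Average over the player's best n seasons, counting missing seasons as zero."""
    best = sorted(season_values, reverse=True)[:n_seasons]
    best += [0.0] * (n_seasons - len(best))
    return sum(best) / n_seasons

short = [6, 6, 6]          # hypothetical: three strong seasons, then done
long_ = [6, 6, 6, 1, 1]    # the same three seasons plus two weak ones
print(career_rate(short), career_rate(long_))                     # 6.0 4.0
print(fixed_window_rate(short, 3), fixed_window_rate(long_, 3))   # 6.0 6.0
```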

So those are three elements of James' ranking methodology that I see as needing serious reconsideration. I think that kind of reconsideration has been undertaken in a variety of different ways by HoM voters and others who have constructed player ranking systems. They may or may not succeed. We're mainly still working, though, with the elements of value that James identified, and that makes his work a valuable jumping off point for the whole topic.