Baseball for the Thinking Fan

Hall of Merit
— A Look at Baseball's All-Time Best

Monday, May 15, 2006

1977 Ballot Discussion

1977 (May 15)—elect 2
WS W3 Rookie Name-Pos (Died)

332 115.0 1954 Ernie Banks-SS/1B
257 90.6 1955 Jim Bunning-P
175 71.3 1954 Camilo Pascual-P
183 51.0 1960 Tony Gonzalez-CF
160 52.9 1955 Clete Boyer-3B
145 49.7 1958 Jim “Mudcat” Grant-P
137 52.0 1960 Jim Maloney-P
148 47.4 1962 Dean Chance-P
134 36.3 1961 Zoilo Versalles-SS (1995)
114 42.0 1958 Stan Williams-P
114 41.8 1960 Dick Ellsworth-P
110 40.8 1955 Dick Hall-RP
123 31.0 1959 Lee Maye-LF/RF (2002)
118 32.1 1961 Chuck Hinton-LF/RF
096 37.4 1960 Clay Dalrymple-C
109 31.3 1962 Mack Jones-CF/LF (2004)
105 24.3 1964 Tony Conigliaro-RF (1990)

Players Passing Away in 1976
HoMers
Age Elected

88 1939 Red Faber-P
86 1939 Max Carey-CF
68 1964 Wes Ferrell-P

Candidates
Age Eligible

89 1930 Larry Gardner-3b
79 1944 Jimmy Dykes-3B/2B
77 1941 Earle Combs-CF
77 1942 Firpo Marberry-RP
76 1942 George Earnshaw-P
75 1950 George Scales-2B
73 1934 Ernie Nevers-RP/NFL HOF
73——Tom Yawkey-HOF Owner
67 1949 Lon Warneke-P
59 1957 Danny Murtaugh-2B/Mgr
59 1962 Jim Konstanty-RP
55 1957 Dan Bankhead-RP

Upcoming Candidates
29 1982 Danny Thompson-SS
29 1982 Bob Moose-P

Thanks, Dan!

John (You Can Call Me Grandma) Murphy Posted: May 15, 2006 at 11:05 PM | 178 comment(s)

Reader Comments and Retorts

   101. TomH Posted: May 22, 2006 at 07:13 PM (#2032145)
Win Shares rank of Frank Chance, AL and NL combined

1903 6th
1904 10th
1905 24th
1906 5th
Unlike today's game, many who finished ahead of him were pitchers. Considering his team's success, KJOK's posit is at least plausible.

Overall, for the 4 years:
Wagner 170
McGinnity 128
Mathewson 122
Chance 120
Chesbro 120

Yeah, that Honus guy was good. Especially considering those weren't necessarily his best years.

Small note on Honus: MVP voting did not occur until 1911. In 1912, the 38-year-old shortstop (finally) didn't win a batting or slugging title, and he finished 2nd in MVP voting.
   102. DavidFoss Posted: May 22, 2006 at 07:27 PM (#2032153)
1903 6th
1904 10th
1905 24th
1906 5th
Unlike today's game, many who finished ahead of him were pitchers.


That begs the question as to what his rankings are among position players in those years. How about among NL position players?

(sorry, I'd check myself but my digital win shares is at home).
   103. Dr. Chaleeko Posted: May 22, 2006 at 08:45 PM (#2032212)
I've got Chance leading all NL 1Bs in WS in 1903, 1904, 1905, 1906, 1907, 1908. Interestingly, these are his only All-Star-type years (A-S defined as top two at the position in an 8-team league).
   104. Esteban Rivera Posted: May 22, 2006 at 10:05 PM (#2032266)
Another thing to remember about Chance is that his team hit the fielding win shares cap every year from 1904 to 1910. This, along with the question of whether first base defense should get more defensive value during this time, could mean one or two extra win shares for Chance that he is not receiving.
   105. TomH Posted: May 22, 2006 at 10:25 PM (#2032286)
Extending the table:

Name       Years used       Games   OWP
Browning   1882-93 (x89)     1097   .759
Minoso     1951-61           1043   .655
Kiner      1946-55           1472   .693
Keller     1939-51 (x44)     1168   .748
C Jones    1876-87            875   .700
Johnson    1933-45           1863   .651

Sisler     1917-22            966   .737
Chance     1900-11           1155   .735

Since some have been comparing Keller to Sisler, I'll use Keller's 7-yr prime for a more direct comp:

Keller     1939, 41-46        738   .777
Sisler     1917-22            966   .737

Okay, I cheated: it substitutes Keller's '39 for his '40, and of course he missed time in 1944-45 for the war. This shows that at their best:
1. Keller was a better hitter. Point not even really up for arguing, is it? The 40-point difference in OWP is about the same as .025 in batting average.
2. Sisler was a little more durable, but not much, IF you assume Keller would have been as healthy in 44-45 as other years.

I haven't had Charlie on my ballot yet, but he may vault Sisler this week.
   106. TomH Posted: May 22, 2006 at 10:30 PM (#2032294)
BP 'translated stats' (for those who believe the league-strength adjustments, which dock Sisler relative to Keller about .020 in OPS) for Sisler and Keller:

Name     Years used   PA     OBP    SLG
Keller   39-51        4618   .402   .607
Sisler   16-22        4823   .384   .567
   107. TomH Posted: May 22, 2006 at 10:31 PM (#2032296)
Minoso..... 1951-61 .......1643 .655

one thousand SIX HUNDRED forty three games, Tom, get it right!!!!!!!
   108. KJOK Posted: May 22, 2006 at 11:00 PM (#2032319)
That begs the question as to what his rankings are among position players in those years. How about among NL position players?

Since I reject Win Shares as the proper method to value players, I'll post a different measure:
NATIONAL LEAGUE RCAP LEADERS

1903 NL
1    Honus Wagner         77
2    Frank Chance         66
3    Roger Bresnahan      46
4    Jimmy Sheckard       44
5    Mike Donlin          34
6    Johnny Kling         32
7    Fred Clarke          31
8    Harry Steinfeldt     25
9    Roy Thomas           23
T10  Ginger Beaumont      21
T10  Cy Seymour           21

1904 NL
1    Honus Wagner         89
2    Mike Grady           39
3    Frank Chance         35
4    Harry Lumley         33
5    Roy Thomas           28
6    Art Devlin           25
7    Mike Donlin          24
T8   Jake Beckley         20
T8   Roger Bresnahan      20
T8   Jim Delahanty        20

1905 NL
1    Honus Wagner         89
2    Cy Seymour           65
3    Mike Donlin          50
4    Frank Chance         47
5    John Titus           35
T6   Roger Bresnahan      33
T6   Mike Grady           33
T8   Miller Huggins       29
T8   Dan McGann           29
T10  Art Devlin           26
T10  Sam Mertes           26

1906 NL
1    Honus Wagner         81
2    Harry Lumley         59
3    Frank Chance         53
4    Roger Bresnahan      47
5    Art Devlin           44
6    Sherry Magee         42
7    Sammy Strang         41
8    Harry Steinfeldt     37
9    Johnny Kling         28
10   Tim Jordan           26

1907 NL
1    Honus Wagner         88
2    Sherry Magee         53
3    Ginger Beaumont      32
4    Roger Bresnahan      28
T5   Frank Chance         26
T5   Fred Clarke          26
7    Dave Brain           24
T8   Tim Jordan           21
T8   Johnny Kling         21
T8   Sammy Strang         21 
   109. ronw Posted: May 23, 2006 at 03:01 AM (#2032854)
You know, it is tables like KJOK's that make me think, "Holy crap Wagner was good."

Really a man among boys.
   110. DavidFoss Posted: May 23, 2006 at 03:46 AM (#2032898)
RCAP actually *underrates* Honus. With only 8 SS's in the league, he does a pretty good job of raising the average RC for the position.
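A quick way to see David's point: when a player is part of the positional average he is measured against, his margin over that average shrinks. A minimal sketch with hypothetical runs-created totals for 8 regular shortstops, assuming equal playing time (real RCAP adjusts for playing time, which this ignores). The function `rcap` and all numbers are illustrative, not KJOK's actual method.

```python
def rcap(rcs: list[float], i: int, include_self: bool = True) -> float:
    """Runs created above the positional average for player i.

    Assumes every player had equal playing time, so a plain mean works
    as the positional baseline.
    """
    pool = rcs if include_self else rcs[:i] + rcs[i + 1:]
    return rcs[i] - sum(pool) / len(pool)


# Hypothetical runs-created totals; index 0 is a Wagner-like outlier.
shortstops = [120.0, 60.0, 55.0, 50.0, 50.0, 45.0, 40.0, 40.0]

with_self = rcap(shortstops, 0, include_self=True)
without_self = rcap(shortstops, 0, include_self=False)
print(f"RCAP, outlier included in baseline: {with_self:.1f}")
print(f"RCAP, outlier excluded:             {without_self:.1f}")
# With n players, the included version is exactly (n - 1) / n of the excluded one.
print(f"ratio: {with_self / without_self:.3f}")
```

With 8 teams the haircut is 1/8 of his true margin; in a 16-team league it would be only 1/16.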
   111. OCF Posted: May 23, 2006 at 07:15 AM (#2033038)
You know, it is tables like KJOK's that make me think, "Holy crap Wagner was good."

Really a man among boys.

RCAP actually *underrates* Honus. With only 8 SS's in the league, he does a pretty good job of raising the average RC for the position.


Note that KJOK stopped with the 1907 season. Wagner had his best single season in 1908.

In my ballot comment I said something about not having figured out where to rank Banks against the likes of Vaughan, Cronin, and Boudreau. There's a reason that Wagner's name didn't appear in that comment.
   112. Joey Numbaz (Scruff) Posted: May 24, 2006 at 09:28 AM (#2035120)
OK, I have the Sabermetric Encyclopedia.

I want to start with DERA and translated IP. For those unfamiliar, DERA is Baseball Prospectus' Defense-adjusted ERA. "Normalized runs" (NRA) have the same win value, against a league average of 4.50 and a Pythagorean exponent of 2, as the player's actual runs allowed had when measured against his league average. DERA adjusts this for defense. It's not DIPS-related.

I think there is a problem with that. In a 4.50 R/G environment, the exponent should be 1.87, not 2.00. But that's only if both the hitters and the pitchers are at 4.50. The pitchers we are evaluating are generally well below 4.50, especially in their big seasons, so the exponent should generally be even lower than 1.87.

Take Walter Johnson in 1913: he allowed 1.46 R/9, and the league was at 3.93. Using PythagenPat, I get an .833 WPct. Plugging in his NRA (2.00) with 4.50 as the league figure and an exponent of 2, I get an .835 WPct.

Or his 1907: Johnson allowed 2.86 R/9, and the league was at 3.66. Using PythagenPat, I get a .599 WPct. Plugging in his NRA with 4.50 as the league figure and an exponent of 2, I get a .594 WPct.

That seems OK, especially since they are using PythagenPort rather than PythagenPat, which accounts for being a few points off. But the problem is that in a 4.50 R/G environment a 2.00 RA is only an .800 WPct, not .833, because the exponent should be 1.708, not 2. A 1.74 RA in a 4.50 environment has the same win impact as a 1.46 RA in a 3.93 environment. So they've overstated his NRA by 0.26, which is a huge difference.

Or take 1907. In a 4.50 R/G environment, 3.55 RA/G has the same win impact as 2.86 RA/G does in a 3.66 R/G environment. So his NRA (3.72) is overstated; it should be 3.55.

What they should be doing is using a PythagenPat exponent to figure the pitcher's actual win impact based on his runs allowed, and then translating to a 4.50 environment with the PythagenPat exponent there as well.

To get this spreadsheet to calculate right, I need to figure out how to take

$$\mathrm{WPct}(\mathit{lRA}, \mathit{pRA}) = \mathrm{WPct}(4.50, \mathit{nRA})$$

where

$$\mathrm{WPct}(l, p) = \frac{l^{(l+p)^{0.286}}}{l^{(l+p)^{0.286}} + p^{(l+p)^{0.286}}}$$

and solve for nRA.

It's easy enough to do for a single season by changing nRA manually until the two WPcts match up, but that's not feasible for something this big (every season of every pitcher worth evaluating).

Any math guys know how I can get Excel to do this?
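Outside Excel, the per-season equation can be solved numerically. A minimal Python sketch, assuming the PythagenPat form written above (exponent (lRA + pRA)^0.286) and a 4.50 target environment: WPct falls steadily as nRA rises, so plain bisection finds the nRA whose 4.50-environment WPct matches the actual one. Function names are illustrative. Run on Johnson's unadjusted 1913 and 1907 figures, it lands near the numbers quoted above; small differences presumably come from rounding in the inputs.

```python
PAT_K = 0.286  # PythagenPat parameter used in the formula above


def pythagenpat_wpct(lg_ra: float, p_ra: float, k: float = PAT_K) -> float:
    """Win pct for a pitcher allowing p_ra runs/game when the league scores lg_ra."""
    x = (lg_ra + p_ra) ** k  # run-environment-dependent exponent
    return lg_ra ** x / (lg_ra ** x + p_ra ** x)


def normalized_ra(lg_ra: float, p_ra: float, target_lg: float = 4.50,
                  tol: float = 1e-9) -> float:
    """Return nRA with WPct(target_lg, nRA) == WPct(lg_ra, p_ra), by bisection."""
    goal = pythagenpat_wpct(lg_ra, p_ra)
    lo, hi = 1e-6, 50.0  # WPct decreases as nRA increases across this bracket
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pythagenpat_wpct(target_lg, mid) > goal:
            lo = mid  # mid still wins too often, so the answer is a higher RA
        else:
            hi = mid
    return (lo + hi) / 2


# Walter Johnson, unadjusted league RA from the post: (year, league RA, his RA)
for year, lg, ra in [(1913, 3.93, 1.46), (1907, 3.66, 2.86)]:
    n = normalized_ra(lg, ra)
    print(year, round(n, 2), round(pythagenpat_wpct(4.50, n), 3))
```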
   113. Joey Numbaz (Scruff) Posted: May 24, 2006 at 09:30 AM (#2035121)
By the way, to summarize, the main problem with trying to figure out the above, is that I get a circular reference. The PythaganPat exponent is dependent on the eventual RA number that I come up with.
   114. Joey Numbaz (Scruff) Posted: May 24, 2006 at 11:03 AM (#2035126)
Crap - I didn't adjust for the parks in the above example, argh. Will redo in a minute, but just in case I don't, I wanted to post that.
   115. Joey Numbaz (Scruff) Posted: May 24, 2006 at 11:29 AM (#2035132)
Washington PPF for 1907 was 95 and for 1913 was 101.

Adjusting . . .

For 1913, league becomes 3.969, WPct is .836 given Johnson's RA. At 4.50, with 2.00 RA/G, we get a WPct of .835 with a 2 exponent. But the exponent should be 1.708, so at 2.00 RA/G you are really only winning at an .800 clip. RA/G should be 1.71 in a 4.50 R/G environment to get a WPct of .836.

For 1907, league becomes 3.477, WPct is .583 given Johnson's RA. At 4.50, with 3.72 RA/G, we get a WPct of .594 with a 2 exponent. But the exponent should be 1.827, so at 3.72 RA/G you are really only winning at a .586 clip. RA/G should be 3.75 in a 4.50 R/G environment to get a WPct of .583.

So you can see it makes less difference the closer to 4.50 the pitcher's ERA gets. At the extremes, big years (ERA-wise; IP doesn't matter here) are being underrated.

So in a nutshell I still have the issue of solving for one independent variable (RA/G) and one dependent variable (the PythagenPat exponent) in the same equation.

Assuming I can get this figured out, I'll just take the number I get, and then do DERA/NRA*myRA to adjust for defense, since I trust Prospectus' adjustment for defensive support. That's the proper way to do it, right?
   116. TomH Posted: May 24, 2006 at 12:52 PM (#2035148)
Charlie Keller helped his teams in postseason play.

His overall stats are good (SLG above even his regular-season average), he scored and drove in runs, and he had one of the 20 (maybe 10) most important World Series hits ever. After Mickey Owen's famous dropped third strike (passed ball) that should have ended Game 4 of the '41 Series, Keller doubled home two runs, turning a 4-3 deficit into a 5-4 Yankee lead in a game they went on to win. Huge hit with 2 outs in the 9th inning.
   117. jimd Posted: May 24, 2006 at 06:05 PM (#2035421)
By the way, to summarize, the main problem with trying to figure out the above, is that I get a circular reference. The PythaganPat exponent is dependent on the eventual RA number that I come up with.

Maybe that's why Davenport is using 2? Try dropping him an email about this?
   118. Rob_Wood Posted: May 24, 2006 at 06:59 PM (#2035489)
Joe, If I am understanding your question correctly, Excel has the capability to solve these types of problems using Goal Seek. You set up your spreadsheet so that you want a certain cell to be a certain value by changing some other cell. Let me know if this doesn't work in which case I'll look more closely at the equation(s) you are trying to solve.
   119. Rob_Wood Posted: May 24, 2006 at 08:48 PM (#2035639)
Sorry Joe, I reread your post and now realize that you are looking for a closed-form formula for NRA in terms of pRA, lRA, and 4.50. After fiddling around with it for a while, I think you are out of luck. The relationships are too tangled to solve the implicit function explicitly. I have even tried to use an approximation for the log function (and the log log function) but that doesn't seem to completely get us to where we want to go and the approximations are not accurate over the entire range of possibilities you will be using.

Even though my initial reply above was premature, I think it gives the roadmap to what you'll need to do. Goal Seek is Excel's way to do each pitcher-season separately. Since this is very time consuming, I suggest doing each season of a specific pitcher all in one fell swoop. Goal Seek's big brother is Solver. It can reach a goal by moving multiple cells.

So for each season create a cell for the required difference between the two win pcts (like you displayed above). Create another cell for the square of this difference. Then create a cell for the sum of all these squared differences across all seasons for one pitcher. Have Solver try to minimize this "meta-sum" by adjusting the collection of the pitcher's seasonal NRA's (these are the figures you seek). Of course, the minimum value will be 0 where all the squared differences are zero, so it will find the NRA for each season that equates the two win pcts you are analyzing.
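Rob's "meta-sum" can be written down directly. This sketch (Python rather than Excel, assuming the same PythagenPat form as the formula above) builds the sum of squared WPct gaps across a pitcher's seasons. Because each term depends only on its own season's NRA, driving the sum to zero is the same as solving each season separately, which is why one Solver run can find every NRA at once. The per-season bisection in `solve_one` is a stand-in for Solver's own search, not what Excel does internally; the seasons are Johnson's unadjusted figures from post 112.

```python
PAT_K = 0.286  # PythagenPat parameter, as in the formula above


def pythagenpat_wpct(lg_ra: float, p_ra: float, k: float = PAT_K) -> float:
    x = (lg_ra + p_ra) ** k
    return lg_ra ** x / (lg_ra ** x + p_ra ** x)


def meta_sum(nras: list[float], seasons: list[tuple[float, float]],
             target_lg: float = 4.50) -> float:
    """Sum over seasons of (actual-environment WPct - 4.50-environment WPct)^2."""
    return sum((pythagenpat_wpct(lg, ra) - pythagenpat_wpct(target_lg, n)) ** 2
               for n, (lg, ra) in zip(nras, seasons))


def solve_one(lg: float, ra: float, target_lg: float = 4.50) -> float:
    """Zero one season's term by bisection (WPct falls as nRA rises)."""
    goal = pythagenpat_wpct(lg, ra)
    lo, hi = 1e-6, 50.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if pythagenpat_wpct(target_lg, mid) > goal:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


seasons = [(3.93, 1.46), (3.66, 2.86)]  # (league RA, pitcher RA): 1913, 1907
start = [2.00, 3.72]                     # exponent-2 NRAs quoted in the thread
solved = [solve_one(lg, ra) for lg, ra in seasons]
print(f"meta-sum at starting NRAs: {meta_sum(start, seasons):.2e}")
print(f"meta-sum at solved NRAs:   {meta_sum(solved, seasons):.2e}")
```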

Now that I think about it, I suppose you can try to set up a "meta-meta-sum" over ALL pitcher-seasons in your spreadsheet since the overall minimum would be found where all the pitcher-season squared differences are zero. To make sure things are working properly I would start first with one pitcher and one season, then hook up all the seasons for one pitcher, and, if that works, hook up multiple pitchers.

Let us know if any of this is helpful!
   120. Joey Numbaz (Scruff) Posted: May 25, 2006 at 12:19 AM (#2036021)
I'm not sure I've entirely absorbed that yet, Rob . . . let me see if I can find out some more info on this 'goal seek' you are referring to.

Alternatively, I think I have an idea: what if I figured out the myRA (my tweaked NRA; we'll get a better name later) for every WPct from 0 to 1? That would give me a table. Then I could have Excel refer to the table based on the WPct from the pitcher's actual RA, right?
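The table idea works because WPct changes steadily with RA. A rough sketch, assuming the PythagenPat form above: tabulate the 4.50-environment WPct for RAs from 0.50 to 10.00 in 0.01 steps, then for any pitcher-season find the two bracketing rows and interpolate. The step size and range are arbitrary choices; in Excel the same kind of table could be searched with MATCH and INDEX.

```python
import bisect

PAT_K = 0.286  # PythagenPat parameter, as in the formula above


def pythagenpat_wpct(lg_ra: float, p_ra: float, k: float = PAT_K) -> float:
    x = (lg_ra + p_ra) ** k
    return lg_ra ** x / (lg_ra ** x + p_ra ** x)


def build_table(lo: float = 0.50, hi: float = 10.00, step: float = 0.01,
                target_lg: float = 4.50):
    """Rows of (WPct, RA) in a 4.50 environment, sorted by ascending WPct."""
    n_steps = round((hi - lo) / step)
    ras = [lo + i * step for i in range(n_steps + 1)]
    rows = sorted((pythagenpat_wpct(target_lg, ra), ra) for ra in ras)
    return [w for w, _ in rows], [ra for _, ra in rows]


def lookup_ra(wpct: float, table) -> float:
    """Linearly interpolate the RA that yields wpct; clamp outside the table."""
    wpcts, ras = table
    j = bisect.bisect_left(wpcts, wpct)
    if j == 0:
        return ras[0]    # worse than the table's worst row: highest RA
    if j == len(wpcts):
        return ras[-1]   # better than the table's best row: lowest RA
    w0, w1 = wpcts[j - 1], wpcts[j]
    r0, r1 = ras[j - 1], ras[j]
    return r0 + (wpct - w0) * (r1 - r0) / (w1 - w0)


table = build_table()
goal = pythagenpat_wpct(3.93, 1.46)  # Johnson 1913, unadjusted
my_ra = lookup_ra(goal, table)
print(round(my_ra, 3), round(pythagenpat_wpct(4.50, my_ra), 4))
```

The table is built once, so each pitcher-season then costs a single lookup.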

Assuming I get something working, my idea of using DERA/NRA*myRA to get my adjusted DERA works fine, right?

Jim, I thought about dropping him an email - but I imagine he has access to much better math tools than Excel - I'm guessing he just uses 2 because it made sense back then. But you are right, I should drop him an email and ask him about it.

As far as what I wrote above, does it all make sense? I'm on solid ground, right? I'm not finding something that isn't there, am I?
   121. Joey Numbaz (Scruff) Posted: May 25, 2006 at 07:06 AM (#2036332)
Rob - goal seek worked. Sweet, thanks!

Now I need to figure a way to have it do that automatically. Very, very good. Thanks again.
   122. Joey Numbaz (Scruff) Posted: May 25, 2006 at 07:14 AM (#2036337)
You are correct though, very time consuming. I guess I'm not sure I understand this:

So for each season create a cell for the required difference between the two win pcts (like you displayed above). Create another cell for the square of this difference. Then create a cell for the sum of all these squared differences across all seasons for one pitcher. Have Solver try to minimize this "meta-sum" by adjusting the collection of the pitcher's seasonal NRA's (these are the figures you seek). Of course, the minimum value will be 0 where all the squared differences are zero, so it will find the NRA for each season that equates the two win pcts you are analyzing.


Any chance you could explain that a little differently (like, very basically)? I'm not really sure I understand. Is it basically forecasting for each individual pitcher?
   123. Paul Wendt Posted: May 25, 2006 at 07:11 PM (#2036969)
>> Jim, I thought about dropping him an email - but I imagine he has access to much better math tools than Excel - I'm guessing he just uses 2 because it made sense back then. But you are right, I should drop him an email and ask him about it.
<<

Commonly, people solve these problems by writing programs rather than using canned applications (math tools). Davenport is a meteorologist, probably from a generation where everyone in the mega-number-bashing observational sciences writes programs, probably trained on mainframe computing.

You may be able to write a spreadsheet that will assist immensely in a solution by iteration.
The assistance will be immense if it enables one iteration for every pitcher(-season?) at once.

Here's a sketch supposing a single column of data that needs correction, one magnitude for each pitcher.

- Data in column one. For now this is uncorrected data, iteration zero.
- Supporting data (doesn't need correction) and intermediate calculations in columns 3,4,5...
- Result of *one iteration of* the correction in column two. For now that is corrected data, iteration one.
That is the hard part.
- Copy magnitudes (not formulae) from column two to column one.
Now column one is corrected data, iteration one, and column two is the iteration two values.
- Iterate.
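Paul's copy-the-values-back loop is fixed-point iteration, and it also gets around the circular reference from post 113: hold the exponent fixed at the value implied by the previous nRA guess, solve the now-ordinary Pythagorean equation for nRA in closed form, and repeat. With the exponent x fixed, WPct = 4.5^x / (4.5^x + nRA^x) rearranges to nRA = 4.5 * ((1 - WPct) / WPct)^(1/x). A minimal sketch, assuming the PythagenPat form used above; each pass corresponds to one copy of column two back into column one, and the names are illustrative.

```python
PAT_K = 0.286  # PythagenPat parameter, as in the formula above


def pythagenpat_wpct(lg_ra: float, p_ra: float, k: float = PAT_K) -> float:
    x = (lg_ra + p_ra) ** k
    return lg_ra ** x / (lg_ra ** x + p_ra ** x)


def iterate_nra(goal_wpct: float, target_lg: float = 4.50, start: float = 4.50,
                tol: float = 1e-10, max_iter: int = 100) -> tuple[float, int]:
    """Fixed-point iteration for nRA; returns (nRA, iterations used)."""
    odds_against = (1 - goal_wpct) / goal_wpct
    n = start
    for i in range(1, max_iter + 1):
        x = (target_lg + n) ** PAT_K                 # exponent from the previous guess
        new_n = target_lg * odds_against ** (1 / x)  # exact solve with x held fixed
        if abs(new_n - n) < tol:
            return new_n, i
        n = new_n
    return n, max_iter


goal = pythagenpat_wpct(3.93, 1.46)  # Johnson 1913, unadjusted
n, steps = iterate_nra(goal)
print(round(n, 3), steps)
```

The exponent changes only slowly with nRA, so each pass shrinks the error sharply and a handful of passes is enough.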

I think one could do this in the earliest GUI spreadsheet programs.
Can one generally do better in Excel?
   124. Rob_Wood Posted: May 25, 2006 at 08:26 PM (#2037138)
Joe, I think Excel's Solver is what you are looking for. Solver is an option under the Tools menu (at least once you have loaded the Solver add-in).

Here's a trivial example of how to use Solver just in case it is not obvious. In a blank spreadsheet enter 3 in cell A1, 5 in A2, 2 in B1, and 8 in B2. Then create the following formulas: C1=A1-B1, C2=A2-B2, D1=C1*C1, D2=C2*C2, D3=D1+D2. In this silly example, we are seeking values in B1 and B2 that minimize the value of D3. Clearly the answer is B1 should be 3 and B2 should be 5.

Go to Solver under the Tools menu. Set target cell to be D3; select the Min radio button; and select B1:B2 in the by changing cells box.

The idea is that you can use Solver to do multiple goal seeks at once. The analog in the silly spreadsheet is that you can run Goal Seek on cell B1 (to make D1 0) and then run Goal Seek again on cell B2 (to make D2 0). But using Solver allows you to do all the separate Goal Seeks all at once.

Anyway, feel free to email me one of your spreadsheets and I can hook it up for you as long as I can follow what you are doing.
   125. Joey Numbaz (Scruff) Posted: May 26, 2006 at 01:30 AM (#2037581)
Thanks again for the help Rob.

I just loaded the add in for solver - but I can't see how I can get it to calculate more than one cell at a time. It won't let me use a range for the 'set target cell', it has to be a single cell.

What would be nice would be if you could set a range on the target cells, and use a cell for the 'value of' portion, instead of having to manually input a number.

I'll send my sample spreadsheet along . . . I've been using Walter Johnson to work out the kinks.
   126. Joey Numbaz (Scruff) Posted: May 26, 2006 at 02:27 AM (#2037714)
Ah - I need to read more carefully. I'll try it again . . .

I set up a cell that takes the sum, over a pitcher's career, of his RA-based WPct in his normal environment minus his WPct when the league is converted to 4.50 R/G. Basically (1907 RA/G WPct - 1907 4.50-league RA/G WPct) + (1908 RA/G WPct - 1908 4.50-league RA/G WPct), etc.

Then I tried to solve for making this cell equal to zero. What it did was make some years over zero and some under zero to get a total of zero, but the individual years are off.

So I then tried it with making each year's portion of the equation an absolute value. BINGO!!! Total error is zero. I think we're onto something here - thanks Rob!
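Why the absolute values mattered: a signed sum lets a season that is too high offset one that is too low. A toy illustration with hypothetical per-season gaps in thousandths of WPct (made-up numbers, not Johnson's).

```python
# Hypothetical per-season WPct gaps (actual environment minus 4.50 environment),
# in thousandths, for three seasons of one pitcher.
gaps = {1907: 12, 1908: -7, 1913: -5}

signed_total = sum(gaps.values())
abs_total = sum(abs(g) for g in gaps.values())
squared_total = sum(g * g for g in gaps.values())

print("signed:", signed_total)    # 0, even though no season matches
print("absolute:", abs_total)     # 24
print("squared:", squared_total)  # 218
```

Only the absolute and squared totals are zero exactly when every season matches. Squared gaps, as Rob suggested, are also smooth, which tends to suit Solver's gradient-based search better than absolute values.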
   127. Joey Numbaz (Scruff) Posted: May 26, 2006 at 02:57 AM (#2037766)
OK, a few other issues. I'm not sure why they set the average IP of the top 5 in the league equal to 275 throughout history.

Being top 5 in an 8 team league isn't nearly as impressive as being top 5 in a 16 team league. This biases the IP in favor of pitchers from smaller leagues. If 5 worked for an 8 team league, then we should use 6.25 for a 10 team league, 7.5 for a 12 team league, 8.75 for a 14 team league and 10 for a 16 team league. I can fix this in this spreadsheet.
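The proportional rule in that paragraph is 5 pitchers per 8 teams. A short sketch; averaging a fractional count (say 6.25 pitchers) by giving the last pitcher partial weight is an assumption about how the fraction would be handled, and the IP list is hypothetical.

```python
def top_n_for_league(teams: int, base_teams: int = 8, base_n: float = 5.0) -> float:
    """Scale the 'top 5 in an 8-team league' rule to other league sizes."""
    return base_n * teams / base_teams


def top_n_average(ip_desc: list[float], n: float) -> float:
    """Average IP of the top n pitchers; a fractional last pitcher gets partial weight."""
    total, weight = 0.0, 0.0
    for rank, ip in enumerate(ip_desc, start=1):
        w = min(1.0, max(0.0, n - (rank - 1)))  # 1 for full ranks, fraction at the edge
        total += w * ip
        weight += w
    return total / weight


for teams in (8, 10, 12, 14, 16):
    print(teams, top_n_for_league(teams))

ips = [330.0, 310.0, 300.0, 290.0, 285.0, 280.0, 270.0, 260.0]  # hypothetical, sorted
print(round(top_n_average(ips, top_n_for_league(10)), 1))
```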

Also, if they are trying to set pitchers equal to a historical average run level, why use 4.50? Since 1901 the AL has averaged 4.47 R/G, the NL 4.29. Something like 4.35 seems more appropriate to me. That's a minor nitpick, but I can fix that too.

Here's another issue I have here regarding hitting.

If I take the IP and normalize to league leaders = 275 or whatever (I'll probably use the historic average through whatever the election year is), how do I equalize hitting?

Over the last 105 years, the average team has played a 155.6 game schedule. But pitchers have only had 130.2 games to bat - due to the DH effectively removing half the pitcher/hitter games over the last 34 seasons. If you go back to 1876, it's more like the average season was 148.0 games and only 127.6 pitcher batting games.

So what I think I should do is take RC above position (RCAP), since replacement level pitcher hitting is the average pitcher; divide by team games and then multiply by the average season's pitcher hitting games through the election year. We're normalizing innings based on the same thing, so that makes sense.

Of course, this will reduce the value of pitchers who hit well over time, which is an issue. In one respect, I can see the side that says what this guy did had real value at the time. But we are also putting pitchers in a historical context here, and if that skill isn't transferable across time, it isn't as valuable historically. I'd be interested to hear different sides on this one.
   128. Chris Cobb Posted: May 26, 2006 at 04:24 AM (#2037831)
Hey, questions I can not only understand, but try to answer!

OK a few other issues - I'm not sure why they set IP of the average of top 5 in the league = to 275 throughout history.

Being top 5 in an 8 team league isn't nearly as impressive as being top 5 in a 16 team league. This biases the IP in favor of pitchers from smaller leagues. If 5 worked for an 8 team league, then we should use 6.25 for a 10 team league, 7.5 for a 12 team league, 8.75 for a 14 team league and 10 for a 16 team league. I can fix this in this spreadsheet.


I agree that this is a problem: I suspect that they did it for the sake of simplicity. Your change to the system sounds like an improvement, but I think that using the top values in the league, while attractive because it is fairly easy, is problematic because the top performers are the outliers, and we shouldn't expect their performance to be consistent. In effect, this approach to normalizing workloads can penalize pitchers of great durability, because they bring up the averages in their era. I think it's better to normalize at the level of the average pitcher, but that (as discussed on the Bunning thread) is difficult to arrive at by purely empirical methods.

See that discussion for more on the issue of normalizing pitcher workloads.

Also if they are trying to set pitchers equal to a historic average runs level, why use 4.50? Since 1901 the AL has averaged 4.47 R/G, the NL 4.29. Something like 4.35 seems more appropriate to me, but that's a minor nitpick, but I can fix that too.

Here, I suspect that they use 4.50 because 4.5/9 = 1/2. It's convenient to know that average runs allowed is always 1/2 of IP. I don't believe there's any "baseball value" rationale for this figure.

On pitcher hitting: I don't understand all the programming of Excel that you will be doing, so I don't know if the following proposal would be workable at all, but I would suggest that you use RCAP and normalize it by the average season's pitcher hitting games, but don't take out for the DH. Just consider all pitchers under the DH to be "average." That way, good-hitting pitchers and bad-hitting pitchers will get full credit/debit for the contextual value of their hitting, pro-rated to the same workload standard as their pitching.
   129. Joey Numbaz (Scruff) Posted: May 26, 2006 at 04:29 AM (#2037833)
Also, what to do about pitcher hitting for the 19th Century?

We're actually reducing the number of innings they would have played to put them on equal ground with their modern counterparts - should their hitting contributions (or failures) be reduced proportionally as well? Obviously this would only apply if they didn't hit at other positions also.

I guess my options are to

1) take RCAP/tG*X with X being a constant # of games (130.2, 154, 162 are all reasonable)

2) take RCAP/IP*tIP
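The two options differ only in what the hitting is pro-rated by. A sketch with hypothetical inputs; `tIP` is read here as the pitcher's translated innings, and 130.2 is the average pitcher-batting schedule from post 127.

```python
def rcap_by_games(rcap: float, team_games: int, x: float = 130.2) -> float:
    """Option 1: RCAP / tG * X, pro-rating to a fixed schedule of X games."""
    return rcap / team_games * x


def rcap_by_innings(rcap: float, ip: float, translated_ip: float) -> float:
    """Option 2: RCAP / IP * tIP, scaling hitting by the same factor as the innings."""
    return rcap / ip * translated_ip


# Hypothetical 19th-century workhorse: +12 RCAP, 450 IP on a 140-game team,
# translated down to 275 IP.
print(round(rcap_by_games(12.0, 140), 2))
print(round(rcap_by_innings(12.0, 450.0, 275.0), 2))
```

For a heavy-workload pitcher, the innings-based version shrinks the hitting credit much more, since his innings get translated down far more than his team's schedule does.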

Still going to be a bear to properly evaluate the hitting of a guy like Caruthers.

And a guy like Ferrell who played 161 non-pitcher games (not counting his 13 OF games in 1933) should really have those extra PA compared to a PH, not a pitcher. So RCAP is going to overstate his hitting.

Lots of little things to deal with.

****************

Anyway the goal of all of this is to come up with a normalized pennants added superstat.

Basically once I get the RA/9 (adjusted NRA I guess) set to a neutral environment at the same WPct, I'll convert that to an adjusted DERA using aNRA/NRA*DERA - this will adjust for defense, using the Prospectus' defensive adjustment.

Then I'll take the aDERA (adjusted DERA) and come up with RSAR using 5.75 as the replacement level (at a 4.50 environment this translates to a .383 WPct, or basically gives a pitcher credit if they move a team past 100 losses in a 162 game season). The new translated IP will be used as the 'playing time' factor.

I'll take RSAR add RCAP and that will get the pitchers total runs above replacement.

In a 4.50 environment, it takes 9.558 runs to flip one game in the win column, so TRAR/9.558 will give an adjusted WARP. This can then easily be converted to Pennants Added, which gives more credit for bigger years. For example, Johnson ends up with 4.9 in 1908 and 5.1 in 1921, for a total of .100 PA. However, his 10.0 season in 1915 gets .116 PA. That's using 1876-1943 NL as the pennant context; it's been a while since I've updated that part of it.
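The chain in the last few paragraphs, written out as a sketch. Assumptions: RSAR is taken as (5.75 - aDERA) * translated IP / 9, the PythagenPat form used earlier in the thread, and the 9.558 runs-per-win figure quoted above; the sample season is hypothetical. Pennants Added is left out because its conversion table isn't given here. The replacement-level check comes out at .383, matching the post.

```python
PAT_K = 0.286           # PythagenPat parameter used earlier in the thread
LEAGUE_RA = 4.50        # normalized run environment
REPLACEMENT_RA = 5.75   # replacement level from the post
RUNS_PER_WIN = 9.558    # figure quoted in the post for a 4.50 environment


def pythagenpat_wpct(lg_ra: float, p_ra: float, k: float = PAT_K) -> float:
    x = (lg_ra + p_ra) ** k
    return lg_ra ** x / (lg_ra ** x + p_ra ** x)


def adjusted_dera(anra: float, nra: float, dera: float) -> float:
    """aDERA = aNRA / NRA * DERA: carry Prospectus' defense adjustment over."""
    return anra / nra * dera


def rsar(adera: float, translated_ip: float) -> float:
    """Runs saved above replacement (assumed form: RA gap per 9 innings times IP / 9)."""
    return (REPLACEMENT_RA - adera) * translated_ip / 9


def adjusted_warp(rsar_runs: float, rcap: float) -> float:
    """TRAR / runs-per-win, where TRAR = RSAR + RCAP."""
    return (rsar_runs + rcap) / RUNS_PER_WIN


rep_wpct = pythagenpat_wpct(LEAGUE_RA, REPLACEMENT_RA)
print(f"replacement WPct: {rep_wpct:.3f} ({rep_wpct * 162:.0f} wins in 162 games)")

# Hypothetical pitcher-season
a = adjusted_dera(anra=1.72, nra=2.00, dera=2.10)
runs = rsar(a, translated_ip=275.0)
print(round(a, 3), round(runs, 1), round(adjusted_warp(runs, rcap=5.0), 2))
```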

Anyone see any flaws in this methodology?

As for time to calc, once I finalize the template, I'll need to:

1) input seasonal RCAP (from the Sinins Encyclopedia)
2) input seasonal IP (either pull from Lahman or copy from B-R).
3) input seasonal RA (see above)
4) input season NRA (either pull from Prospectus website automatically somehow or manually enter)
5) input season DERA (either pull from Prospectus website automatically somehow or manually enter)
6) point to the cells for the pitcher that contain team games played for each year (I have another worksheet in the spreadsheet that stores this for every team)
7) point to the cell for the pitcher that contains his team's park factor for that season
8) point to the cell for the pitcher that contains his team's league runs allowed per game.
9) run the 'solver' cell Rob Wood turned me onto.

I already have tables in the worksheet that contain the data for converting pennants added, the top X number of pitchers in the league in IP, etc. I've got 73 pitchers in the consideration set right now; once I get caught up it should be easy to maintain. I'm thinking at most 10 minutes per pitcher, which would mean 12 hours of work, once I finalize how I want to do it.
   130. Joey Numbaz (Scruff) Posted: May 26, 2006 at 04:37 AM (#2037843)
Regarding numbers 6, 7 and 8 above: is anyone here better at programming Excel than me?

What I do right now, is edit the cell, then copy down until the player changes teams and do it again.

For Walter Johnson it's easy.

Under the "tG" column (team games), I enter "=Games!Y33" - Y33 on the games worksheet is the cell that says the SenaTwins played 154 games in 1907. Then I copy that down through all of the other cells for Johnson.

I do the same thing for park factors and league runs allowed.

Now in the template, I already have "=Games!Y33" in there, and I just adjust the Y33 to whatever is appropriate for the new pitcher. If it's Ron Guidry, I adjust the Y33 to R101 (Yankees/1975) and go from there.

What would be nice would be if I could just add a column for team and another for league for each pitcher season, and the cell would just know where to look, based on instructions, it could look at the Year, Team, League and know which cells, without my having to change it.
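Keying the lookups on (year, team) and (year, league) removes the manual cell editing. A minimal Python sketch of the idea, using small dictionaries as stand-ins for the Games, park-factor and league worksheets; the team codes "AAA"/"BBB" and every value are hypothetical placeholders, not real data. Inside Excel, INDEX combined with MATCH (or VLOOKUP on a year-plus-team key column) can do the same job.

```python
# Stand-ins for the Games, park-factor and league worksheets, keyed by
# (year, team) or (year, league). Teams "AAA"/"BBB" and all values are hypothetical.
TEAM_GAMES = {(1938, "AAA"): 154, (1938, "BBB"): 152, (1939, "BBB"): 156}
PARK_FACTOR = {(1938, "AAA"): 103, (1938, "BBB"): 97, (1939, "BBB"): 99}
LEAGUE_RA = {(1938, "AL"): 5.20, (1939, "AL"): 5.00}

# One row per pitcher-season stint; only year, team and league are typed in.
stints = [
    {"year": 1938, "team": "AAA", "league": "AL"},
    {"year": 1938, "team": "BBB", "league": "AL"},
    {"year": 1939, "team": "BBB", "league": "AL"},
]


def attach_context(rows):
    """Fill team games, park factor and league RA for each row from the keyed tables."""
    for row in rows:
        key = (row["year"], row["team"])
        row["tG"] = TEAM_GAMES[key]
        row["PPF"] = PARK_FACTOR[key]
        row["lgRA"] = LEAGUE_RA[(row["year"], row["league"])]
    return rows


for row in attach_context(stints):
    print(row)
```

A trade in mid-season is just another row with a different team code, so a well-traveled pitcher costs no more effort than a one-team pitcher.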

For Johnson and Guidry, this takes seconds, for Bobo Newsom, it will take a lifetime.

If anyone can help with that part, please let me know. Thanks!
   131. Joey Numbaz (Scruff) Posted: May 26, 2006 at 04:55 AM (#2037847)
Thanks for the note Chris - I will look at the Bunning discussion for the translated innings discussion.

I would say that my initial position, before reading, is that I don't mind setting the league leaders for innings pitched as the standard for that year. One pitcher in a year can be an outlier (Wilbur Wood in 1972, for example), but the top 5 or 10? That's the era norm for top pitchers, which is what we are evaluating.

If we had stats on IP per GS for every league ever, I agree that could work (actually batters faced would be even better).

Ideally, you could drop the #1 guy in the league from the equation, so you don't have a Wilbur Wood throwing the numbers off and changing the average. I think once you get past the top guy or two, it starts to level off.

Looking at 1901-1977 NL and AL, here is the average IP for each of the top 5 spots in each league across all seasons; the number in parentheses is the gap to the next guy up the chain:

NL1: 322.4
NL2: 303.4 (19.0)
NL3: 291.8 (11.6)
NL4: 282.8 (9.0)
NL5: 277.1 (5.7)

AL1: 322.9
AL2: 301.4 (21.5)
AL3: 290.2 (11.2)
AL4: 281.2 (9.0)
AL5: 274.4 (6.8)

After the first guy throws the number off, it follows a pretty steady progression with the difference getting smaller and smaller. I'd say drop the #1 and #2 and set your league norm based on the others.

So maybe drop the first 2 in an 8-team league and the first 4 in a 16-team league. Ranks to base it on, by league size (a rank shown as (n*.5) counts at half weight):

8 team: 3,4,5,6
10 team: (3*.5),4,5,6,7,(8*.5)
12 team: 4,5,6,7,8,9
14 team: (4*.5),5,6,7,8,9,10,(11*.5)
16 team: 5,6,7,8,9,10,11,12

Basically, drop one top pitcher for every 4 teams, and count down to rank 0.75 times the number of teams.

Then normalize your pitchers to that standard, and pick the normalized number (Prospectus' 275) based on the all time average of the answers.
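The rank band above (start after rank teams/4, stop at rank 0.75 times teams, half weight on a rank that straddles either edge) can be expressed as a weight per rank. A sketch, assuming each rank k covers the interval (k - 1, k] so that the weight is its overlap with the band; the IP figures are hypothetical.

```python
import math


def rank_weights(teams: int) -> list[float]:
    """Weight for ranks 1..ceil(0.75 * teams): overlap of (k-1, k] with (teams/4, 0.75*teams]."""
    lo, hi = teams / 4, 0.75 * teams
    return [float(max(0.0, min(k, hi) - max(k - 1, lo)))
            for k in range(1, math.ceil(hi) + 1)]


def league_ip_norm(ip_desc: list[float], teams: int) -> float:
    """Weighted average IP of the middle band of a league's innings leaders."""
    weights = rank_weights(teams)
    total = sum(w * ip for w, ip in zip(weights, ip_desc))
    return total / sum(weights)


for teams in (8, 10, 12, 14, 16):
    print(teams, rank_weights(teams))

ips = [340.0, 315.0, 300.0, 292.0, 285.0, 280.0, 270.0, 262.0]  # hypothetical, sorted
print(league_ip_norm(ips, 8))  # ranks 3-6 only
```

The weights always sum to half the number of teams, i.e. one pitcher for every two teams, which matches the "#1 pitcher on a middle-half team" reading in the next post.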

Now I'll read the Bunning thread and completely change my mind.
   132. Joey Numbaz (Scruff) Posted: May 26, 2006 at 05:24 AM (#2037855)
By the way, what I described above is normalizing innings to what the #1 pitcher on a middle 1/2 (25-75 percentile) team throws. I think that's a very reasonable standard.

Also, by doing it for leagues as opposed to seasons, you inherently adjust for things like higher scoring leagues, the DH (which allows starters to throw more innings), etc. Unfortunately, it's still easier to throw more innings in a pitcher's park than a hitter's park, but I'm not sure how we'd adjust for that, or if it's worth the trouble.
   133. Joey Numbaz (Scruff) Posted: May 26, 2006 at 05:52 AM (#2037862)
You know Chris, another benefit of this is that you aren't over-adjusting in years where there isn't an outlier; your idea has that benefit too.

By taking the top 5, if it's a year that's stacked tight at the top (say the 1990 NL, where #2-#5 are within 5 2/3 IP of each other), they all get promoted to the 275 level. But that 275 is established based on a bunch of historical outliers, so they are getting an unfair benefit from that. By using my system or yours, this doesn't occur.
   134. Chris Cobb Posted: May 26, 2006 at 06:09 AM (#2037867)
By taking the top 5, if it's a year that's stacked tight at the top (say the 1990 NL, where #2-#5 are within 5 2/3 IP of each other), they all get promoted to the 275 level. But that 275 is established based on a bunch of historical outliers, so they are getting an unfair benefit from that. By using my system or yours, this doesn't occur.

I have a strong suspicion that WARP's assessment of AL pitchers vs. NL pitchers in the late '40s and early '50s is distorted by this very issue, although I haven't had a chance to study the matter yet. In the NL, you have two workhorse outliers in Roberts and Spahn, whereas in the AL you have a lower, more tightly bunched group (I think). Thoughts?
   135. Joey Numbaz (Scruff) Posted: May 26, 2006 at 06:12 AM (#2037868)
Not sure how late you'll be up; I'm still doing data entry on that worksheet (adding in the 6th- through 12th-ranked pitchers for the appropriate years).

When I'm done, I'll be able to provide two numbers: my system's and WARP's. We'll be able to see how much the differences between the leagues change . . . stay tuned!
   136. Joey Numbaz (Scruff) Posted: May 26, 2006 at 06:38 AM (#2037876)
BTW, the other thing I was thinking (taking a quick break from data entry) was that this system will work for relievers too; we just need to come up with a leverage index for their RSAR. I know Tango has done some historical leverage index work, but I haven't really looked at it. Would his work be of use for something like this?

I would probably want to swap runs allowed out for something like Component Runs Allowed, though, since relievers' RA totals are distorted by their entering in the middle of innings. Or maybe use an average of the two (RA and CRA). Does that make sense?
   137. Joey Numbaz (Scruff) Posted: May 26, 2006 at 07:08 AM (#2037883)
I'll give 1946-60 here Chris . . . first set of numbers is the baseline for WARP, NL listed 1st - basically that number of innings in that season gives you 275 tIP:

Year   NL     AL      DIF
1946  250.2  298.4  -48.2
1947  271.4  273.3   -1.9
1948  269.6  265.4    4.2
1949  268.0  283.4  -15.4
1950  292.4  265.8   26.6
1951  298.4  259.8   39.6
1952  279.0  284.6   -5.6
1953  270.2  269.9    0.3
1954  279.8  261.2   18.6
1955  257.0  246.2   10.8
1956  284.8  271.8   13.0
1957  261.0  254.0    7.0
1958  271.8  251.4   20.4
1959  281.2  244.0   37.2
1960  275.0  262.8   12.2


Now my system - which wouldn't be normalized to 275, all we care about right now are the differences between the leagues.

Year   NL     AL      DIF
1946  235.5  271.3  -35.8
1947  257.0  257.4   -0.4
1948  249.5  265.4  -15.9
1949  249.3  270.5  -21.2
1950  278.3  253.3   25.0
1951  284.5  252.0   32.5
1952  253.8  266.3  -12.5
1953  242.0  260.6  -18.6
1954  258.0  254.5    3.5
1955  239.3  237.0    2.3
1956  267.8  260.3    7.5
1957  249.3  244.5    4.8
1958  261.0  244.5   16.5
1959  272.5  237.0   35.5
1960  271.3  250.8   20.5
   138. Joey Numbaz (Scruff) Posted: May 26, 2006 at 08:01 AM (#2037899)
By the way, the historical average from 1901-2005 for the top 5 in the NL is 282.3 IP, for the AL it is 281.7 IP. I guess they just used 275 because it was a nice round number, which is fine.

For mine, the baseline drops to 265.1 all-time for NL, 264.7 for the AL.

I don't know if this means anything . . . but using the top 5 in the league as the baseline, the sum of the absolute values of the annual differences between the leagues (1901-2005) is 1544.7, or an average of 14.7 IP per season.

Using my system the total difference is 1391.3, or 13.3 IP per season.
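The per-season figure is just the mean absolute NL-AL gap. A Python sketch applying it to the 1946-60 "my system" numbers posted above (only that 15-season subset is available here, so the result will not match the all-time figure):

```python
# NL and AL baselines under "my system", 1946-60, as posted.
nl = [235.5, 257.0, 249.5, 249.3, 278.3, 284.5, 253.8, 242.0,
      258.0, 239.3, 267.8, 249.3, 261.0, 272.5, 271.3]
al = [271.3, 257.4, 265.4, 270.5, 253.3, 252.0, 266.3, 260.6,
      254.5, 237.0, 260.3, 244.5, 244.5, 237.0, 250.8]

def mean_abs_gap(a, b):
    """Sum of |NL - AL| over seasons, divided by the number of seasons."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

print(round(mean_abs_gap(nl, al), 1))  # 16.8
```

Dividing the posted all-time totals by the 105 seasons from 1901 through 2005 gives the quoted averages: 1544.7 / 105 ≈ 14.7 and 1391.3 / 105 ≈ 13.3.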

I would think the closer the two leagues are in any given year the better, up to a point; obviously some run environments make it much tougher for starters to pitch as many innings as others, so the two leagues shouldn't always be even.
   139. Joey Numbaz (Scruff) Posted: May 26, 2006 at 08:07 AM (#2037902)
BTW, if I normalized my pitchers to 258.3 IP as being an average of the top pitcher on the 25-75 percentile team (assuming talent is spread evenly), it works out to the same as WARP's normalizing pitchers to 275 IP being an average of the top 5 in the league. That means my numbers would be directly comparable to theirs, which probably makes sense as something I should do.
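Where 258.3 plausibly comes from: scale WARP's 275 by the ratio of the two all-time baselines in each league, then average the two leagues. The averaging step is an assumption about how the single figure was reached. As a Python sketch:

```python
# All-time (1901-2005) baselines posted above.
warp_target = 275.0
top5 = {"NL": 282.3, "AL": 281.7}  # average of the top 5 in IP
mine = {"NL": 265.1, "AL": 264.7}  # #1 starter on a middle-half team

equiv = {lg: warp_target * mine[lg] / top5[lg] for lg in top5}
print({lg: round(v, 1) for lg, v in equiv.items()})  # {'NL': 258.2, 'AL': 258.4}
print(round(sum(equiv.values()) / 2, 1))             # 258.3
```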
   140. Joey Numbaz (Scruff) Posted: May 26, 2006 at 09:03 AM (#2037920)
Some results of all of this for Walter Johnson . . . with comparisons to the Prospectus numbers . . . for now I'm handling hitting for normal, non-Caruthers types as RCAP/IP*tIP.

Johnson's NRA and DERA were 3.32 and 3.31. Under my system they become 3.18 and 3.16 (rounding issues are the reason for the slight difference in the delta).

This makes sense as great pitchers from lower scoring eras (Johnson's career league R/9 was 4.14 park adjusted) have their win impact understated when you use a 2.0 exponent.

His translated innings changed from 5195.7 to 5202.7; he picked up 7 innings under my system, not a big deal.

I get his career RSAR at 1579. That splits into 1495 PRAR and 84 BRAR. This works out to 165.5 Translated WARP and 1.887 Translated Pennants Added (using NL 1876-1943 as the Pennant context).

That's 84 RCAP or BRAR (same thing) vs. 96 in the Sabermetric Encyclopedia. If we are going to say that Johnson would have pitched 5202.7 IP instead of 5914.7, we need to reduce his offensive opportunity by the same amount (I think). If in a normal context he wouldn't have pitched as much as he did, he also wouldn't have hit as much.
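The batting cut described here is a straight proportional scale, RCAP/IP*tIP. A small Python check with Johnson's figures from the post reproduces the 84 (the function name is illustrative):

```python
def scaled_batting_runs(rcap, actual_ip, translated_ip):
    """Shrink batting runs in proportion to the innings cut: RCAP / IP * tIP."""
    return rcap / actual_ip * translated_ip

# Walter Johnson: 96 RCAP over 5914.7 IP, translated to 5202.7 IP.
print(round(scaled_batting_runs(96, 5914.7, 5202.7)))  # 84
```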

I'd love any feedback if you have it.
   141. Joey Numbaz (Scruff) Posted: May 26, 2006 at 09:06 AM (#2037922)
Assuming I don't have to do any major overhauls, I'll at least try to run everyone who received a vote last week, plus Bunning and Pascual, through the system. I can't promise that though.

I'll also try to update the pennant context through 1977.
   142. Joey Numbaz (Scruff) Posted: May 26, 2006 at 09:09 AM (#2037923)
One other very minor detail - any seasons that come out lower than 0 total RSAR (batting+pitching) will be zeroed out. I don't believe a player should ever lose ground for playing as long as someone is willing to stick him on the field. Johnson didn't have any such years.
   143. Joey Numbaz (Scruff) Posted: May 26, 2006 at 09:11 AM (#2037924)
One other thing: any years where the pitcher's total batting and pitching runs net him negative RSAR will be zeroed out. I don't believe you should ever lose ground if management is willing to run you out there. You'd have to be as bad as a pitcher with a DERA of 5.75 who was a below-average hitter for a pitcher before this would kick in.
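The zero floor is applied per season, before summing the career. A Python sketch with a made-up three-season pitcher:

```python
def career_rsar(seasons):
    """Career RSAR: (batting + pitching) per season, floored at zero, then summed."""
    return sum(max(0.0, bat + pitch) for bat, pitch in seasons)

# Hypothetical (batting RSAR, pitching RSAR) seasons; the last one nets -7 and counts as 0.
print(career_rsar([(5, 40), (-2, 12), (-4, -3)]))  # 55.0
```

Flooring the season total rather than each component means a poor-hitting pitcher still loses batting runs in any year his pitching carries the season above zero.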
   144. Joey Numbaz (Scruff) Posted: May 26, 2006 at 09:43 AM (#2037931)
BTW, with Pennants Added, I think it's appropriate to consider each division a 'league' once we get to divisional play. Getting to the post-season is the goal of each team before the season starts, so this makes the most sense to me.
   145. Joey Numbaz (Scruff) Posted: May 26, 2006 at 10:03 AM (#2037933)
Got Pennants Added updated through 1977 (based on NL only, 1876-1977). Bumps Walter to 2.192 PA.

As the teams get bunched together and the quality of the pennant winners moves closer to .500, the value of big seasons increases, as it's easier to flip a pennant.

At some point, I'll have to add the AL to the mix as well. I'm wondering what impact the Yankees will have on all of this.
   146. sunnyday2 Posted: May 26, 2006 at 10:54 AM (#2037936)
Here's a really elementary question.

Does all of this assume that the value of pitching has remained more or less constant throughout history? Or maybe I should say, has the value of pitching remained more or less constant in the modern era, at least--e.g. since 1900 or so, or since 1893?

And all of this is concerned more or less with the distribution of that value among individual pitchers?

Is that accurate? Or has the value of pitching changed substantially since 1893 or 1900?

Or to put it another way: by normalizing IP so as not to disadvantage modern pitchers, are you increasing the cumulative value of pitching in the modern era?

Seems like I should know the answer to that after how many years of HoM duty, but if I've learned anything it's not to assume I know very much.
   147. Joey Numbaz (Scruff) Posted: May 26, 2006 at 12:01 PM (#2037951)
My theory when evaluating the immortals is that you try to put people in a neutral context most of the time.

I know that sounds a little wishy-washy - it's supposed to.

For something like pitching, I think it's important to normalize context. I don't think that being a pitcher in, say, 1913, when pitching was worth X amount compared to fielding, should be an advantage or disadvantage relative to being a pitcher in 1976, when pitching is worth Y amount relative to fielding. Your job, no matter what your birthday, is to prevent runs. How well you did that compared to your peers is what's important to me.

It's a little different with individual fielding positions - the difference between being a 3B in 1907 and being a 3B in 2006 is ginormous. So different that someone who is likely to be chosen to play 3B in 2006 wouldn't have been able to in 1907. That's the difference for me, and where I draw the line.

I hope that helps. So I'm not 'assuming that the value of pitching has remained more or less constant throughout history'. It most likely hasn't. But I think when trying to compare pitchers across eras, you have to act as if it has, or you are penalizing guys with the same job based on their birthday.
   148. Rob_Wood Posted: May 26, 2006 at 02:21 PM (#2038028)
Joe, regarding having the formula automatically pick up a pitcher's team games for each season, you'll want to use the VLOOKUP function in Excel. I think you already have a Games sheet that lists team games for every team-season in major league history. Since VLOOKUP looks for one "reference" column, you'll need to concatenate the team season data into one column (there are alternatives, but this is pretty easy).

Suppose team data (such as SenaTwins) is in column A and the year (such as 1907) is in column B. Then you need to insert a new column C which is simply =A&B, so in the example above you'd get SenaTwins1907 in column C for that row. I know it looks ugly, but Excel will know that this is the row for SenaTwins in 1907. What I mean is that the formula for cell C1 would be =A1&B1, the formula for cell C2 would be =A2&B2, etc. (the same thing will be done on the pitcher sheet below). Just enter the formula in C1 and drag/copy it down the rest of the column.

You'll need to create a named array of all the team-season games played. Suppose the team games data is in column D and that you have 2000 rows of data in rows 2:2001. Then select (highlight) cells C2:D2001 and go up to the Insert menu, select Name, select Define, and then give the array a name (such as TeamGames) with no spaces. The cells that you highlighted should go in the "Refers To" box automatically. The idea, of course, is that you'll use this TeamGames array to look up each pitcher's team games based on the team and season.

Over in the pitcher's sheet you can create the similar concatenated column for team season too. Suppose you have the pitcher's team name in column E and the season in column F, then insert a new column G that is =E&F (see above for details). So you'll see SenaTwins1907 for Walter Johnson in 1907.

Then in the team games column, create a formula using the VLOOKUP function. Suppose the first row you need team games for is row 2 in your spreadsheet, SenaTwins1907 is in cell G2, and team games is column H. Then in cell H2 enter the formula =VLOOKUP(G2,TeamGames,2,FALSE). G2 is the cell you are trying to get information on, TeamGames is the array of team games data you are trying to reference, and 2 means that you want the 2nd column of data to be returned (in our case the first column of the array is the team-season such as SenaTwins1907, and the second column is the number of team games for the Senators in 1907, or 154). FALSE tells Excel to require an exact match on the team-season. Then just drag the formula down to copy it to all the cells you want team games for in the spreadsheet.
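For anyone doing this outside Excel, the same exact-match lookup is a dictionary keyed on team and season; a tuple key does the job of the concatenated column. A Python sketch using the SenaTwins 1907 row from the example (the second pitcher row is hypothetical and deliberately missing from the table):

```python
# Team-season games keyed on (team, year) instead of a "SenaTwins1907" column.
team_games = {
    ("SenaTwins", 1907): 154,
}

pitcher_rows = [
    {"name": "Walter Johnson", "team": "SenaTwins", "year": 1907},
    {"name": "Hypothetical Pitcher", "team": "SenaTwins", "year": 1899},  # no match
]

for row in pitcher_rows:
    # Exact-match lookup; None plays the role of Excel's #N/A for a missing key.
    row["team_g"] = team_games.get((row["team"], row["year"]))

print([row["team_g"] for row in pitcher_rows])  # [154, None]
```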

This may sound a little complicated, but it is pretty easy once you get the hang of it. Plus, VLOOKUP (and its brother HLOOKUP) is very powerful and has many applications.

Let me know if this works or if my description is not clear.
   149. Joey Numbaz (Scruff) Posted: May 26, 2006 at 03:15 PM (#2038098)
Thanks Rob!

I'll try that, but probably not before next week . . .

************

Paging jimd (and anyone else with an opinion on this) . . . I believe you've mentioned in the past that the adjustment for league quality shouldn't just be (Value * Factor). It should be more like (Value - constant) * Factor, with the constant being the difference in replacement level between the two leagues.

I'm talking about different leagues within the same season, like the 1890 PL, AA and NL, the Federal League, things like that, or adjusting for known weaker leagues, like the majors in 1943-45. I'm not talking about comparing 1949 to 1953 or 1923.

So my questions are: 1) what should the constants be for the AA, FL, and war years, and 2) what are reasonable factors?

Along those lines, are the WARP3 adjustments reasonable for this type of quality issue? I'm thinking of looking at a player's WARP1-WARP2 differences in the surrounding years, then adjusting for war or a weak league based on how much bigger the difference is in the year in question. Does that make sense?
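The two forms of league-quality adjustment being asked about, side by side. A Python sketch with placeholder numbers (the 0.9 factor and 2.0 constant are made up, not proposed values):

```python
def adjust_simple(value, factor):
    """Value * Factor."""
    return value * factor

def adjust_with_replacement_gap(value, constant, factor):
    """(Value - constant) * Factor, where constant is the replacement-level gap between leagues."""
    return (value - constant) * factor

print(round(adjust_simple(30.0, 0.9), 1))                     # 27.0
print(round(adjust_with_replacement_gap(30.0, 2.0, 0.9), 1))  # 25.2
```

Expanding the second form gives Value * Factor - constant * Factor, so it always takes an extra constant × factor off the simple version, whatever the size of the player's value.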
   150. Dr. Chaleeko Posted: May 26, 2006 at 04:32 PM (#2038213)
Joe,

I agree with Rob, use concatenation and vlookup.

A couple of experience-worn suggestions about them. If you already use them a lot, you might want to skip this. If you are concatenating and you're worried about corresponding data having punctuation in it, this is not a problem. So if you had a first name and last name column where Walter and Johnson resided, then instead of a2&b2 returning WalterJohnson, use b2&", "&a2 to return Johnson, Walter. I find this really helpful, especially if I've made the killer mistake of not making all my data match the exact same format; it eliminates those pesky #VALUE! or #NAME? errors.

Also, when using VLOOKUP, you can nest a concatenation within it as needed. BUT the thing you need to know about VLOOKUP is that it can be immensely draining on Excel's computational resources. If you are pulling from a dataset that is especially large (like, let's say, a list of WS for every player for each year of the game), you may not be able to save the document which REFERS to the table (aka the document in which the VLOOKUP originates, not the data).

If you get the "can't save, not enough memory" error, try slicing your dataset smaller in the lookup function itself. So rather than selecting all 120 columns in my example table above, you might select only the ten you need for your lookup to work.

One last thing about VLOOKUP (or any formula that may fail to find what it's looking for). Let's say that your formula is

=VLOOKUP(A2, Games!A1:B1000, 2, FALSE)

FALSE (or 0, which means the same thing) tells Excel to look for an exact match, and if it can't locate the data it's looking for, the formula returns #N/A. Because it's sometimes annoying to have those all over the place, you may want nothing to appear instead. In that case

=IF(ISNA(VLOOKUP(A2, Games!A1:B1000, 2, FALSE)), "", VLOOKUP(A2, Games!A1:B1000, 2, FALSE))

will return an empty string. Especially helpful if you are summing or counting the results of the lookup and you don't want error values getting in the way.

Oh, one more thing. Make sure you lock down the target data range if you're using the same range for everyone but copying the formula from cell to cell (or worksheet to worksheet). You'll kick yourself if you don't (I've got bruises on my arse to prove it!)....

=VLOOKUP(A2, Games!$A$1:$B$1000, 2, FALSE)

Excel's very helpful feature of changing data ranges along the vectors at which you're cutting and pasting is then disabled for that formula, preventing that awful feeling in your tummy when you realize an hour into things that the data that's coming back seems kind of funny.
   151. jimd Posted: May 26, 2006 at 06:25 PM (#2038377)
RSAR using 5.75 as the replacement level (at a 4.50 environment this translates to a .383 WPct)

Use whatever replacement level standard is also being used for the hitters.

People will compare them to the hitters no matter what caveats you state.

So my questions are: 1) what should the constants be for the AA, FL, and war years, and 2) what are reasonable factors?

I haven't worked much with the latest WARP numbers yet, and not for those years. (I don't revise everything; it's too much work. Though I will revisit a Van Haltren, etc.)
   152. jimd Posted: May 26, 2006 at 06:39 PM (#2038399)
Use whatever replacement level standard is also being used for the hitters.

A .383 team will go 62-100 and earn 186 WS. If all departments of a team are equally bad, the distribution of the team's shares will still be typical. There would be 89.3 BWS, 31.4 FWS, and 65.3 PWS. This replacement level implies that a replacement level position player who plays all 162 games earns between 14 and 15 Win Shares (shares just for playing, not for being any good). The hitters' pennant-adjusted figures would need to be recalculated based on that level.
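jimd's arithmetic as a Python sketch. The 48.0/16.9/35.1 batting/fielding/pitching split is an assumption inferred because it reproduces his numbers (and matches the 48% and 35.1% shares cited elsewhere in the thread); it is not a published constant.

```python
# Assumed average-team split: 48.0% batting, 16.9% fielding, 35.1% pitching.
SPLIT = {"batting": 0.480, "fielding": 0.169, "pitching": 0.351}

def team_ws_split(wpct, games=162):
    """Wins, team Win Shares (3 per win), and the batting/fielding/pitching shares."""
    wins = round(wpct * games)  # .383 * 162 = 62.05 -> 62 wins
    team_ws = 3 * wins
    return wins, team_ws, {k: team_ws * share for k, share in SPLIT.items()}

wins, ws, parts = team_ws_split(0.383)
print(wins, ws, {k: round(v, 1) for k, v in parts.items()})
# 62 186 {'batting': 89.3, 'fielding': 31.4, 'pitching': 65.3}
```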
   153. karlmagnus Posted: May 26, 2006 at 08:28 PM (#2038593)
Not that I really understand this sabermetric nonsense, jimd, but a team of replacement level players surely wouldn't go anything like 62-100, because even a 62-100 team will have some players well above replacement level (Mike Sweeney on the recent KC Royals, for example.) A team with only replacement level players would be the 1962 Mets, at best (even they had HOMer Richie Ashburn) and more likely the 1899 Spiders. WS allowances should thus be adjusted accordingly.
   154. Mark Shirk (jsch) Posted: May 26, 2006 at 09:01 PM (#2038637)
14-15 WS is surely not replacement level for a position player. I think 8 or so is more accurate for 162 games.
   155. Joey Numbaz (Scruff) Posted: May 27, 2006 at 03:45 AM (#2039606)
Yeah, good point, I didn't think I was making the replacement level that high - wow. 15 WS is nowhere near replacement level, I mean Joe Charboneau's rookie of the year season was 15 WS. Even if it was only 131 games, there's no way if he plays every day that season is only 3.5 WS above replacement.

Wait a minute - I'm not doing that. Setting pitching at a .385 WPct assumes an average offense. A team with an average offense and replacement level pitching would lose 100 games; that's what I'm setting the replacement level at.

I've always felt a full-time replacement player would get about 8 WS (7 with the DH) and a 220 IP pitcher at replacement level would get about 7 WS. I was trying to err on the side of not setting it too low; I should have thought it through in those terms. I know that shows pitching replacement level being a little higher than position player replacement level, but I don't think that's unreasonable.

Let me think that through again though. A true replacement level position player will hit at replacement level but field at average level.

An average team in a 162-game season: 116.6 offensive WS, 41.1 defensive and 85.3 pitching Win Shares. That gives you 19.7 WS for an average position player in 162 games in a non-DH league, 18.1 in a DH league, and 12.9 for an average pitcher. Drop that to 18.7 for a position player in a 154-game season and 12.25 for an average pitcher in a 154-game season (that's over 209 IP). BTW, as a side note, that should dispel any myth that Jake Beckley was an average player: he was in the 20-27 WS range prorated to 154-game seasons over his career, not in the teens, and he wasn't playing every single game every year. Sorry for the digression, but it's an important point.
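The per-player figures above can be reproduced with a few lines of Python. The splitting rules are assumptions that happen to match the posted numbers: fielding WS split evenly over 8 regulars, batting WS over 8 (or 9 with a DH), and pitching WS spread evenly over the team's 9 × 162 innings.

```python
# Average 162-game team, per the post (116.6 + 41.1 + 85.3 = 243 = 81 wins * 3).
BAT, FLD, PIT = 116.6, 41.1, 85.3

def avg_position_player(dh=False):
    # Assumption: fielding split 8 ways; batting split 8 ways, or 9 with a DH.
    return BAT / (9 if dh else 8) + FLD / 8

def avg_pitcher(ip=220, games=162):
    # Assumption: pitching WS spread evenly over the team's 9 * games innings.
    return PIT * ip / (9 * games)

def to_154(ws):
    """Prorate a 162-game figure to a 154-game season."""
    return ws * 154 / 162

print(round(avg_position_player(), 1))          # 19.7
print(round(avg_position_player(dh=True), 1))   # 18.1
print(round(avg_pitcher(), 1))                  # 12.9
print(round(to_154(avg_position_player()), 1))  # 18.7
print(round(to_154(avg_pitcher()), 2))          # 12.24
```

The 154-game pitcher comes out a hair under the 12.25 quoted, which is rounding; 220 IP × 154/162 is the 209 IP mentioned.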

If the team remains average on offense and fielding, it would take 28.3 pitching WS to drop them to 100 losses. So I'm effectively setting replacement level at 4.3 pitching WS over 220 IP. That's probably too low.

If 7 WS per 220 IP is replacement level, then a team with average hitters and fielders would win 68 games with a replacement level staff. That would mean setting .420 as replacement level, assuming an average offense.

Now let's reverse it and see what replacement level hitters would do to a team with average defense/pitching.

Setting replacement level hitters to where a team with average pitching and fielding loses 100 games means 59.6 bWS. That sets hitters at 7.5 WS + 5.1 for fielding, or 12.6 WS per 162 games (11.8 in a DH league).

Bumping it up to the 68 win mark like we did for pitchers would make their replacement level 14.8 WS, or 13.8 were it a DH league. That's too high.
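Both 100-loss cases, plus the 68-win variant for hitters, as a Python sketch. It uses the same assumed splitting rules that reproduce the average-team figures (8 fielders, 8 or 9 hitters, pitching WS spread over 9 × 162 innings); function names are hypothetical.

```python
BAT, FLD, PIT = 116.6, 41.1, 85.3  # average 162-game team
TEAM_IP = 9 * 162

def ws_for_wins(wins):
    return 3 * wins  # 3 Win Shares per team win

def pitching_only_replacement(wins):
    """Average offense/fielding; pitching WS shrinks until the team wins `wins`."""
    pit = ws_for_wins(wins) - BAT - FLD
    return pit, pit * 220 / TEAM_IP  # staff WS, WS per 220 IP

def batting_only_replacement(wins, dh=False):
    """Average pitching/fielding; batting WS shrinks until the team wins `wins`."""
    bat = ws_for_wins(wins) - PIT - FLD
    return bat, bat / (9 if dh else 8) + FLD / 8  # team bWS, WS per regular

pit, per220 = pitching_only_replacement(62)
print(round(pit, 1), round(per220, 1))                     # 28.3 4.3
bat, per_player = batting_only_replacement(62)
print(round(bat, 1), round(per_player, 1))                 # 59.6 12.6
print(round(batting_only_replacement(62, dh=True)[1], 1))  # 11.8
print(round(batting_only_replacement(68)[1], 1))           # 14.8
print(round(batting_only_replacement(68, dh=True)[1], 1))  # 13.8
```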

What to conclude from this - pitching replacement level - in terms of the record of a team with all replacements as pitchers and average everywhere else - is probably higher than position player replacement level - at least under the WS system.

There's no way pitching replacement level is as low as 4 WS/220 IP. And there is no way that position player replacement level is 15 per 162 games. Just look at some 15 WS position player seasons if you don't believe me. And take a look at how bad you have to be to get 4 WS in a 220 IP season.

There's a logical explanation for this apparent paradox, IMO. It's that WS only gives 1/3 of the credit (35.1% to be exact) to the pitchers. Combine that with the fact that no one at the major league level with any significant time is a replacement level fielder AND hitter, and that's what you get.

It doesn't surprise me at all that to get their replacement levels equivalent (for a full-time player or a 220 IP pitcher) on a per-player level, a team with replacement level hitters dragging down 48% of the team would do worse than a team with replacement level pitchers only dragging down 35.1% of the team.

So here's where I'm going. If you set a .225 team at replacement level (a little worse than the 1962 Mets), with all things equal (batting and pitching replacement level, fielding average) you get:

Hitters 39.4 WS, Fielders 41.1 WS, Pitching 28.8 WS. That sets position players at 10.1 WS, (9.5 in a DH league) and pitchers at 4.4 WS.

Think about it though: James was wondering why pitchers came out too low. Hell, most of us think that. That's why; when you adjust for replacement level, the pitchers get the boost they need.

BTW, that's probably too low, but it's where you are if you set the pitchers and hitters as equally bad.

If you want to bump the pitchers to 6 WS being replacement level you get a team at .278 WPct (45-117). I think that's fair - the 1962 Mets certainly had some players that were way below replacement level - certainly many players that couldn't get time in other organizations were better than what the Mets put out on the field that first season.

So there you have it. I'm going to be setting my pitcher replacement level to 6 WS per 220 IP. My position player replacement level becomes 11.9 in a non-DH league, 11.2 in a DH league. Over a 154 game season, I'll go with 11.3/10.6/5.7 (the 5.7 is over 209 IP, not 220, in the shorter season).
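Both scenarios above can be checked under one assumption that reproduces all the posted figures: batting and pitching WS are cut by the same proportion (both groups "equally bad"), fielding stays average, and per-player splits use 8 fielders, 8 or 9 hitters, and 9 × 162 team innings. As a Python sketch:

```python
BAT, FLD, PIT = 116.6, 41.1, 85.3  # average 162-game team, batting/fielding/pitching WS
TEAM_IP = 9 * 162

def scale_bat_and_pitch(k, dh=False):
    """Cut batting and pitching WS by the same factor k; fielding stays average.

    Returns (team WPct, WS per full-time position player, pitching WS per 220 IP).
    """
    bat, pit = BAT * k, PIT * k
    wpct = (bat + FLD + pit) / 3 / 162
    pos = bat / (9 if dh else 8) + FLD / 8
    return wpct, pos, pit * 220 / TEAM_IP

# .225 team: pick k so batting + pitching fill what fielding leaves of 3 * wins.
k_225 = (0.225 * 162 * 3 - FLD) / (BAT + PIT)
_, pos, p220 = scale_bat_and_pitch(k_225)
pos_dh = scale_bat_and_pitch(k_225, dh=True)[1]
print(round(BAT * k_225, 1), round(PIT * k_225, 1))     # 39.4 28.8
print(round(pos, 1), round(pos_dh, 1), round(p220, 1))  # 10.1 9.5 4.4

# Pitchers pinned at 6 WS per 220 IP instead.
k_6 = 6 * TEAM_IP / 220 / PIT
wpct, pos, _ = scale_bat_and_pitch(k_6)
pos_dh = scale_bat_and_pitch(k_6, dh=True)[1]
print(round(wpct, 3), round(pos, 1), round(pos_dh, 1))  # 0.278 11.9 11.2
```

Prorating the last two by 154/162 gives the 11.3 and 10.6 used for 154-game seasons.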

What does this mean for my expected team WPct to use in this massive pitcher spreadsheet? Well, a team with an average offense and defense, and pitchers that pull in 6 WS per 220 IP, would win 65.8 games, or play at a .406 clip. That means that I'll be using 5.48 as my replacement level aDERA, which is the equivalent of 6 WS in a 220 IP season.

Having looked at it this way, I'm pretty surprised that the replacement level for a position player comes out that high, but it makes sense. I mean an average season is generally about 2-2.5 WARP. A replacement player is about -2 to -2.5 in TPR. If an average position player gets about 19.7 WS in a full 162 game season, it would make sense that a replacement player would be about 8 WS below that.

I can't believe it took me 4-5 years of working with WS to approach it this way, thanks for triggering it jim!
   156. rawagman Posted: May 28, 2006 at 01:47 PM (#2040815)
So are the polls closing tomorrow as usual?
Or on Tuesday?
   157. Joey Numbaz (Scruff) Posted: May 28, 2006 at 02:48 PM (#2040826)
Good point rawagman - I wouldn't have any problem with holding off until Tuesday considering it's a holiday weekend.
   158. CraigK Posted: May 29, 2006 at 05:12 AM (#2042001)
I might join in with you guys starting about the next ballot or the ones thereafter; I don't know much about the 19th-early 20th century players, but this is getting to the point when the names stick out.
   159. Sean Gilman Posted: May 29, 2006 at 08:01 AM (#2042065)
Uh, there's lots of 19th and early 20th Century guys who still have a decent chance of being elected. Assuming, of course, that we don't start ignoring them in favor of names that stick out.
   160. Joey Numbaz (Scruff) Posted: May 29, 2006 at 09:19 AM (#2042084)
I would agree with Sean - new voters (or returning old ones) need to be willing/able to consider 19th Century players just as much now as we all have in the past.

My biggest fear with the 'open membership' thing is that more people start joining in once they recognize the names being considered, basically destroying the candidacy of any borderline guys from way back when.
   161. rawagman Posted: May 29, 2006 at 09:30 AM (#2042086)
One advantage of joining later is that a new voter would not need to be a scholar in old-time baseball. But they should be able to look at least at the current backlog, and study those players and their effect on the game at the time.
   162. sunnyday2 Posted: May 29, 2006 at 11:22 AM (#2042101)
>Uh, there's lots of 19th and early 20th Century guys who still have a decent chance of being elected. Assuming, of course, that we don't start ignoring them in favor of names that stick out.

Ditto.

>But they should be able to look at least at the current backlog, and study those players and their effect on the game at the time.

Ditto ditto.
   163. John (You Can Call Me Grandma) Murphy Posted: May 29, 2006 at 11:37 AM (#2042111)
I'll throw in a few dittoes myself. ;-)
   164. karlmagnus Posted: May 29, 2006 at 12:33 PM (#2042136)
I think I remarked earlier that a new balloter should be able to produce a 1000 word essay on the life and times of Jake Beckley :-)
   165. Mark Shirk (jsch) Posted: May 29, 2006 at 12:51 PM (#2042141)
Joe,

I only had a chance to lightly read your post as it is quite a dense one, but I do have a few comments.

1. James didn't use a strict split on all teams. If a team had an average offense and a replacement level pitching staff, a higher percentage of the team's WS would be offensive. For instance, 35% of that team's WS would not go to pitchers, since their pitching staff was at replacement level. I didn't catch whether you factored this in, but I could be wrong.

2. While Beckley had a number of seasons in the 20-25 (I have no season in which Beckley had 27 WS even when schedule adjusted) he had nearly as many in the 15-20 range, which by your analysis is about average if only slightly above. And of course he still was never an MVP caliber player.

And I don't want it to look like we are shunning possible contributors here either. Of course there is a fear that some people will join to vote in their favorite HOF cases from their childhood, but it shouldn't be too hard for any new voter to take a look at the 19th and early 20th century players as well. This is why we force new guys to turn in a ballot; it gives us a chance to ask them why they may overlook Duffy, Beckley, GVH, Waddell, etc. They may have really good reasons (all of these players have warts, especially Beckley ;-)) or they may simply have forgotten, overlooked, or underrated them.
   166. Howie Menckel Posted: May 29, 2006 at 01:19 PM (#2042154)
I think that the reasonable approach is for prospective new voters to familiarize themselves with ALL of those still getting votes. That's generally in the 75 to 80 player range, and of course half of those at least should be quite familiar and another one-quarter at least vaguely familiar.
I'd then suggest voters try to read the threads of each candidate, especially Negro League holdovers getting votes, and even better if they can peruse some old ballot-discussion threads as well. Particular emphasis should be paid to the top 25 or so; a new voter needs to recognize the many pros and cons of each (that's why they're holdovers still getting decent support), and come to a decision on which side they fall.

What they would not necessarily have to be are the ones to resuscitate the prospects of Jack Chesbro or Billy Nash or Joe Tinker or Oliver Marcelle (not that there's anything wrong with that).

That's meant as a welcome to prospective new voters, with a rational caveat. If it's asking too much of them, they're free to keep just 'auditing' the course. If it's manageable, then by all means join the party!
   167. rawagman Posted: May 29, 2006 at 01:49 PM (#2042175)
When I joined, I first examined every player who had received votes in the last election, plus the newly eligibles. Then, when I had more time a few weeks later, I started poring through the Bill James Abstract looking at other players who had made his lists and sifting through them using my own standards of excellence which has allowed my list to expand and become gradually richer and more refined.
   168. Howie Menckel Posted: May 29, 2006 at 01:54 PM (#2042180)
Yeah, I think you've been on the right track, rawagman.
   169. John (You Can Call Me Grandma) Murphy Posted: May 29, 2006 at 01:56 PM (#2042183)
I think I remarked earlier that a new balloter should be able to produce a 1000 word essay on the life and times of Jake Beckley :-)

lol

Childs and Bresnahan, too. :-D

When I joined, I first examined every player who had received votes in the last election, plus the newly eligibles. Then, when I had more time a few weeks later, I started poring through the Bill James Abstract looking at other players who had made his lists and sifting through them using my own standards of excellence which has allowed my list to expand and become gradually richer and more refined.

Going over your ballots, you definitely are not ignoring the earlier candidates, which is the fair thing to do.
   170. Joey Numbaz (Scruff) Posted: May 30, 2006 at 06:00 PM (#2043945)
"1. James didn't use a strict split on all teams. If a team had an average offense and a replacement level pitching staff, a higher percentage of the team's WS would be offensive. For instance, 35% of that team's WS would not go to pitchers, since their pitching staff was at replacement level. I didn't catch whether you factored this in, but I could be wrong."


Thanks for taking the time to read it. I understand it may have been too 'dense'; it's very hard to keep it 'un-dense' with this subject matter :-)

Anyway, I understand what you are saying, but it's not an issue in this case. James does tie the splits to certain things, like how far over the margin teams were on offense/defense and how many K/BB/HR the pitching staff allowed. But since I'm starting with an average team, and moving the team off average by adjusting the offense or pitching win shares, it shouldn't make a difference. In effect, when I drop the team from 116 bWS to 60, I'm changing the overall offense/defense split as well. So I think I'm covered there.

*************

Regarding Beckley: using WS from the WS Digital Update, and adjusting each season to a 154-decision schedule based on team decisions rather than games (since WS is based on decisions, not games), I get:

YEAR  WS  
1888 16.0
1889 21.6
1890 25.5
1891 18.3
1892 19.4
1893 20.1
1894 20.4
1895 20.7
1896 11.8
1897 17.6
1898 14.5
1899 20.7
1900 23.6
1901 19.4
1902 19.7
1903 18.3
1904 22.7
1905 15.7
1906  4.7
1907  0.3 
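A minimal Python sketch of the schedule adjustment, assuming it is a straight proportional rescale (raw WS × 154 ÷ team decisions). The function name `adjust_to_154` is hypothetical; the season values are the already-adjusted figures from the table, and they do sum to the 351 career total mentioned later in the post.

```python
# Beckley's Win Shares, already adjusted to 154 team decisions (from the table)
BECKLEY_WS_154 = {
    1888: 16.0, 1889: 21.6, 1890: 25.5, 1891: 18.3, 1892: 19.4,
    1893: 20.1, 1894: 20.4, 1895: 20.7, 1896: 11.8, 1897: 17.6,
    1898: 14.5, 1899: 20.7, 1900: 23.6, 1901: 19.4, 1902: 19.7,
    1903: 18.3, 1904: 22.7, 1905: 15.7, 1906: 4.7, 1907: 0.3,
}


def adjust_to_154(raw_ws: float, team_decisions: int) -> float:
    """Rescale a raw season WS total to a 154-decision schedule.

    Assumes a simple proportional adjustment on team decisions
    (wins + losses) rather than games, as described in the post.
    """
    return raw_ws * 154 / team_decisions


career_total = sum(BECKLEY_WS_154.values())
print(f"Career WS (154-decision basis): {career_total:.1f}")
```

For example, a hypothetical 18-WS season on a team with 140 decisions would become `adjust_to_154(18, 140)`, or 19.8.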


18.7 WS is what an average player would earn if he played every single game. Beckley obviously didn't play every game every season.

Adjusting for his games played as a percentage of his team's games (which treats every game as a full game, but it's close enough), he was above average every year from 1888 through 1904 except 1896, when he was 3 WS below average. In 1891 he was only .1 WSAA, and in 1898 only .4 WSAA.

He was at least 3 WSAA in 1888-1890, 1899-1900, and 1904.
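The WSAA figures can be sketched the same way. This assumes WSAA is simply actual WS minus the 18.7 full-season average, prorated by the player's share of his team's games; the function name and the sample games-played values are hypothetical, not Beckley's actual totals.

```python
AVG_FULL_SEASON_WS = 18.7  # average player over a full 154-decision season (from the post)


def ws_above_average(ws: float, games_played: int, team_games: int,
                     avg_full: float = AVG_FULL_SEASON_WS) -> float:
    """Win Shares above average, prorated by playing time.

    Every game played counts as a full game, matching the
    'close enough' simplification described in the post.
    """
    playing_share = games_played / team_games
    return ws - avg_full * playing_share


# Hypothetical example: 16.0 WS while playing half of a 154-game season
print(round(ws_above_average(16.0, 77, 154), 2))  # 6.65
```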

This is giving no credit for the fact that 1B was a more valuable defensive position in the 1890s and 1900s than it is in modern times.

According to the Sabermetric Encyclopedia, Beckley was 245 RCAP (runs created above position) for his career, and 330 RCAA (runs created above average).

But it gets interesting when you look at 1893-96. During those years Beckley was 77 RCAP and 64 RCAA. That's right: an average 1B in those years would have been 13 runs below an average hitter in Beckley's number of outs. WS doesn't account for this shift on the defensive spectrum at all. In 1898 Beckley had an off year, but first basemen overall were still slightly below-average hitters. Same for 1899. And 1900. And 1901.
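The 13-run figure is just the difference between the two measures: RCAP compares a hitter with the average at his position and RCAA with the league-average hitter, so RCAA minus RCAP is how far the positional average sat above (positive) or below (negative) the league average over the same outs. A small sketch; the function name is hypothetical.

```python
def positional_offset(rcap: int, rcaa: int) -> int:
    """Runs by which the average hitter at a position beat (+) or trailed (-)
    the league-average hitter, over the same number of outs.

    RCAP = RC - positional-average RC and RCAA = RC - league-average RC,
    so RCAA - RCAP = positional-average RC - league-average RC.
    """
    return rcaa - rcap


print(positional_offset(rcap=77, rcaa=64))    # 1893-96: prints -13
print(positional_offset(rcap=245, rcaa=330))  # career: prints 85
```

Over the whole career the offset is positive, which fits first basemen posting an above-.500 OWP over that span even though the mid-1890s ran the other way.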

Something happened in 1902, and first basemen started hitting better than average. But for much of Beckley's prime, first basemen were average to below-average hitters. Maybe it was the wear and tear of playing in the infield in the rowdy 1890s? Or the bad gloves? WS does not account for this at all, and it still shows Beckley as an above-average to sometimes very good player throughout his career. Over his career, the average 1B's OWP was only .531.

Moving to modern times, I just pulled Jeff Bagwell, since his career covers most of the last 15 years (1991-2004, plus a little of 2005). Over Bagwell's career the average 1B had an OWP of .559.

Over George Sisler's career the average 1B had an OWP of .544.

So for whatever reason, first basemen didn't hit as well during Beckley's career as they have since. WS doesn't account for this, and as such underrates him. I still get him at 351 WS when adjusting for schedule. I would think this underrating could have cost Beckley as much as 2-3 WS per season in the mid-1890s. An average defensive player gets 4.9 fWS in a 154-game season; Beckley, who was at least an average 1B, was getting 1.5-3.0 fWS per season. There were several years in the mid-1890s when 1B could have been considered at least middle of the pack in defensive responsibility.
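One rough way to see where the 2-3 WS estimate could come from, assuming the shortfall is simply the gap between an average defender's 4.9 fWS and Beckley's 1.5-3.0 fWS per 154-game season. This is a simplification for illustration, not necessarily how the estimate above was reached.

```python
AVG_DEFENDER_FWS = 4.9           # average defensive player, 154-game season (from the post)
BECKLEY_FWS_RANGE = (1.5, 3.0)   # Beckley's fielding WS per season (from the post)

# Gap between an average defender and Beckley at each end of his range
gaps = [AVG_DEFENDER_FWS - fws for fws in BECKLEY_FWS_RANGE]
low, high = min(gaps), max(gaps)
print(f"Fielding WS shortfall: {low:.1f} to {high:.1f} per season")
```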
   171. Howie Menckel Posted: May 30, 2006 at 06:38 PM (#2044003)
Well put, Joe D.

Basically, Beckley began in an era when play was VERY different. I think Beckley was quite good at fielding the bunts and slaps and such, and therefore was quite valuable. His arm wasn't always as accurate as you'd want, but handling the 'small ball' seems to have been quite taxing, judging by the near-total lack of 1B longevity in his era.
   172. Ardo Posted: May 30, 2006 at 10:15 PM (#2044292)
"Wait a minute - I'm not doing that. Setting pitching at a .385 WPct assumes an average offense. A team with an average offense and replacement-level pitching would lose 100 games; that is what I'm setting the replacement level at."


Joe, you are correct. A team made entirely of replacement players would both score runs (hitting + baserunning) and prevent runs (pitching + fielding) at a .383 clip. So the proper determinant of this "team's" won-lost record is the square of .383, which is .14669. The square of .500 is, of course, 1/2 * 1/2 = 1/4. So a replacement-level team would win (.14669/.25), or 58.6%, as many games as a .500 team. With a 162-game schedule, a .500 team posts an 81-81 record. The replacement-level team wins (81*.586) games, which works out to 47-115.
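The arithmetic can be checked in a few lines of Python. This follows the model in the post (replacement strength is the square of the .383 clip, compared with 1/4 for a .500 team); the function name is hypothetical. Carried without intermediate rounding, the ratio is about .587 and the win total about 47.5, so 47-115 is a round-down. The last line also checks Joe's .385 figure against his "lose 100 games".

```python
def replacement_record(clip: float = 0.383, games: int = 162) -> tuple[float, float]:
    """Won-lost record for an all-replacement team under the post's model.

    Run scoring and run prevention both operate at `clip`, so the team's
    relative strength is clip**2, versus 0.5**2 = 0.25 for a .500 team.
    """
    ratio = clip ** 2 / 0.5 ** 2   # share of a .500 team's wins
    wins = (games / 2) * ratio     # a .500 team wins games / 2
    return wins, games - wins


wins, losses = replacement_record()
print(f"{wins:.1f}-{losses:.1f}")

# Joe's version: .385 pitching with an average offense over 162 games
joe_wins = 0.385 * 162
print(f"{joe_wins:.1f} wins, {162 - joe_wins:.1f} losses")
```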

That 47-115 record, while not as bad as the Spiders, Mets, or '03 Tigers, seems intuitively correct.
   173. Ardo Posted: May 30, 2006 at 10:23 PM (#2044304)
Going back to Win Shares: 47 wins equals 141 Win Shares (3 per win). So that knocks down true replacement level for a position player who plays all 162 games to about 10-11 Win Shares. Of course, very few players on truly awful teams like those Spiders, Mets, and Tigers play every day, so this level is hard to attain in practice.
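Win Shares awards a fixed three shares per team win, so the conversion is short. The comparison with an average 81-win team is an added illustration, not part of the post; the function name is hypothetical.

```python
WS_PER_WIN = 3  # Win Shares awards three shares per team win


def team_win_shares(wins: int) -> int:
    """Total Win Shares available to a team with the given number of wins."""
    return WS_PER_WIN * wins


replacement_ws = team_win_shares(47)  # 141
average_ws = team_win_shares(81)      # 243
print(replacement_ws, average_ws, round(replacement_ws / average_ws, 3))
```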
   174. Joey Numbaz (Scruff) Posted: May 31, 2006 at 04:18 AM (#2045193)
Test
   175. Brent Posted: May 31, 2006 at 12:06 PM (#2045305)
An old discussion of Win Shares replacement level may still be useful. See www.baseballgraphs.com.
   176. Joey Numbaz (Scruff) Posted: June 01, 2006 at 03:40 AM (#2046869)
Interesting: two of us independently (I'd never seen that study) came up with a lower replacement level for pitchers. I think that means we're onto something there, though in retrospect that should have been obvious from the beginning.

I don't understand why he'd make his replacement-player pool that large (as many players as the regulars).

Personally, I think replacement level is the bottom 15-20% of the regulars. Replacement-level players play all the time. So I think replacement level might be a little higher than he indicates, but overall that makes sense.
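One possible reading of "the bottom 15-20% of the regulars" as code, assuming replacement level is taken as the mean rate of that bottom slice (it could equally be read as the cutoff value). The function name and the sample rates are hypothetical placeholders, not real player data.

```python
def replacement_level(rates: list[float], fraction: float = 0.175) -> float:
    """Mean rate of the worst `fraction` of regulars.

    `rates` is any per-player value rate (e.g. WS per 162 games);
    0.175 sits in the middle of the 15-20% band suggested in the post.
    """
    if not rates:
        raise ValueError("need at least one regular")
    ordered = sorted(rates)
    n = max(1, round(len(ordered) * fraction))  # size of the bottom slice
    bottom = ordered[:n]
    return sum(bottom) / n


# Hypothetical WS-per-162-games rates for ten regulars
sample = [22.0, 9.5, 15.0, 18.5, 11.0, 25.0, 13.5, 17.0, 20.0, 12.0]
print(replacement_level(sample, fraction=0.2))  # 10.25
```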

Thanks for the link!