Baseball for the Thinking Fan

Hall of Merit
— A Look at Baseball's All-Time Best

Monday, January 13, 2020

2021 Hall of Merit Ballot Discussion

2021 (December 2020)—elect 3

Top 10 Returning Players
Kenny Lofton, Johan Santana, Sammy Sosa, Jeff Kent, Lance Berkman, Bobby Abreu, Buddy Bell, Wally Schang, Bobby Bonds, Sal Bando

Newly eligible players

Tim Hudson
Mark Buehrle
Torii Hunter
Dan Haren
Barry Zito
Aramis Ramirez
Shane Victorino
Alex Rios
Grady Sizemore
A.J. Burnett

DL from MN Posted: January 13, 2020 at 02:06 PM | 291 comment(s)

Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   201. Dr. Chaleeko Posted: April 03, 2020 at 06:46 PM (#5936339)
Jaack,

I've added those eight players to the yearly MLEs spreadsheet and updated my site, so you can download it at your convenience.

Thanks!

Eric
   202. Dr. Chaleeko Posted: April 03, 2020 at 07:08 PM (#5936343)
bjhanke,

I'm not a partisan for Cravath. I'm primarily interested in figuring him out.

2) Citing the league's mean road OPS is a valid way to express Cravath's performance as a hitter in road games. We constantly compare players to the mean around here. All the uberstats are based on performance against the mean. I'm not sure why we would suddenly abandon it.

3) The reason I provided comparable hitters was to provide additional context to a comparison against the mean. If you don't think the hitters that are most comparable are the strongest HOMers, that's fine, but Cravath is essentially competing for the HOM's last spot. But I would submit that a list of HOM RFs and their OPSes will be more problematic than my comparison because it would be unadjusted for run context.

4) Sisler got about 3,200 PA in the time span. The query wasn't whether someone was a this or that match for Cravath. The question was who else hit like Cravath at any point between 1912 and 1920. Sisler did. It doesn't matter what part of his career he was in, but it so happens that some of his most important seasons fall within this time frame. Remember that Sisler only has two other seasons as a great player: 1921-1922.

5) Again, the question is not whether Collins was in the meat or the decline of his career. His performance and Cravath's during the period in question were similar. You could certainly argue that Cravath's performance came in 1,000 fewer PA and is, therefore, not as impressive as Collins'. I would absolutely concede that point. But in his own playing time, his bat, adjusted to a normal home-road split, was a lot like Collins'.

6) Similarly, I don't care about defense for the purpose of this specific comparison. That's an entirely different conversation. What I want to know is what kind of hitter Cravath was. The reason I want to know it is that it's the crux of the argument for or against him. I responded to your assertion that Cravath was purely a product of the Baker Bowl. Based on the results I posted, I concluded that Cravath was not a BB product. He was a highly skilled hitter with performances potentially commensurate with a HOM player, and his road performance demonstrates this to a greater degree than I thought your assertion gave him credit for.

I'm not a partisan, and I hope I don't sound argumentative. I just want to get this guy right.
   203. Jaack Posted: April 04, 2020 at 02:37 AM (#5936398)
Thanks, Doc, your contributions are invaluable.

The three Negro Leaguers who intrigue me the most at this point are Hurley McNair, Marv Williams, and Newt Allen. All three currently slot in the 30-40 range for me.

McNair doesn't seem unlike the many borderline outfielders of the pre-integration era. There's a good chance he was the best player of the three, but I'm not sure I could ever find his candidacy compelling.

Marv Williams is very interesting to me because of his era - I feel it's more likely that we are missing a great black player in the integration era than the 20s or 30s at this point, and Williams definitely fits the bill.

Newt Allen is the most difficult case. He looks a good bit like Johnny Evers, who may slip onto my ballot this year. If we can credibly say that his glove was on the same level as Bill Maz or Frankie Frisch, he's a pretty solid candidate. If it's simply good, he's not particularly close. As is, I'm being somewhat conservative and he's in that 30-40 range.
   204. Dr. Chaleeko Posted: April 04, 2020 at 12:54 PM (#5936476)
Personally, I would hold off on Newt until we get more fielding data.
   205. Jaack Posted: April 04, 2020 at 07:38 PM (#5936566)
Barring further information, Newt won't make my ballot, but he's still probably the most likely Negro League player to make my ballot at some point in the future. He seems to have a higher possibility for upward movement than either Hurley McNair or Marv Williams.
   206. bjhanke Posted: April 06, 2020 at 02:01 AM (#5936878)
Dr. C - Thanks for the patient response. I'm used to dealing with Gavvy Cravath junkies who vote for him over people like Bobby Bonds and Lou Brock, who are much better OF candidates, but are not in the HoM. I will admit that I just jumped to thinking you were like that when you compared him to borderline HoM guys like Magee (The HoM has 24 left fielders in it now; Magee ranks 21st in the New Historical, which still has the best ranking system around, despite being 20 years old; Lou Brock ranks 15th, and I can't sell Lou Brock to the HoM, no matter how much I cite this; Frank Howard is ranked 19th, and he's not in the HoM, either). I DID, however, do something that you might find useful. You mentioned Sherry Magee. It turns out that Sherry Magee was a teammate of Cravath's for three years: 1912-1914, in the Baker Bowl. This means that all the adjustments that you would normally have to make go away, except for age. And Cravath was, in fact, a better away hitter than Magee all three years.

Here's a small sample of Magee's away stats in those years:
1912 - OPS 825, Homer split 3 home, 3 away
1913 - OPS 847 Homer splits 8/3
1914 - OPS 891 Homer splits 12/6

I included the homer splits because Magee, like Cravath, was a righty hitter. His homer splits, while not as bad as Cravath's, indicate that he was getting a lot out of the Baker Bowl, too.

Here's Cravath those three years:

1912 - OPS 838 Homers 6/5
1913 - OPS 974 Homers 14/5
1914 - OPS 901 Homers 19/0

So, Cravath's away numbers are a bit better than Magee's, every year, although his homer splits are worse. Far worse. You can make a good case that Cravath was a slightly better hitter than Magee was. But Magee is a borderline HoM guy, and there are much better candidates still available.

There's one more thing to consider. Cravath was a right fielder, not a left fielder, like Magee, Brock and Howard. You may have noticed above that I mentioned Bobby Bonds. I've been trying to sell Bonds to the HoM for years, and he is finally on the holdover list this year. Yay. Bobby is ranked 15th among RFs by the New Historical. The usual objection I get to him is that his career was short. Well, not in the context of Gavvy Cravath, it isn't. It's huge by that standard. And Bonds ranks just behind Dave Parker, 14th in RF, who is also not in the HoM. Gavvy Cravath is ranked #29, twice as far away from Babe Ruth. He shouldn't really be in the HoM discussion, because there are too many better hitters, much less better gloves.

Trying to address the specific issue of how good a hitter Cravath was, my BEST suggestion to you is that you look at his major league numbers before 1912, in home parks other than the Bowl. He could not hit major league pitching, and had no power. That's why teams kept giving up on him. He went to Minneapolis, a top minor league team at the time, and had two great years, which resurrected his candidacy as a major league hitter. My sources, which are few in this regard, say that the Minneapolis ballpark at the time was almost a duplicate of the Baker Bowl. If someone has better sources than mine, please correct me. Anyway, the point here is that it is NOT fair, in any way, to give Cravath several seasons of minor league MLE credit. He had tryouts with three major league teams, and couldn't hit, and we're not talking about ten plate appearance cups of coffee, either. If you want to give him credit for those two years in Minneapolis, OK. But the PCL years are before the major league tryouts, and he couldn't hit major league pitching when given the chance. I have a list of people who deserve a good deal more MLE credit than Cravath. Hank Sauer, a really obnoxious one whose defense was no better than Cravath's, spent several years destroying minor league pitching, but was, first, in the Yankee organization, and then with Cincy when they had Bill McKechnie managing, who couldn't have lived with Sauer's or Cravath's glove if he had hit like Babe Ruth. THAT is a bunch of MLE credit. Cravath is due maybe two years.

I'm gonna shut up now, because if I don't, I'll end up discussing defense and career length, and that's not what you are asking about. I've tried to keep it down to Cravath's hitting as much as possible here. Best I can do. Thanks for putting up with me.
   207. Dr. Chaleeko Posted: April 07, 2020 at 12:45 PM (#5937291)
Well, the question for me was whether Cravath was actually just a product of the BB. My little research project strongly suggested that's not the case, that in fact Cravath was a highly skilled hitter even if he wasn't a no-brainer HOM hitter.

The next question then is what to make of his BB performance. The answer doesn't appear to me to have a strong precedent in Mel Ott's Home/Road performance (323/188 in HR) because Ott's road performance (311/408/510) was obviously HOM-worthy by itself. That said, I tend toward the camp that says "He made excellent use of his home park." Presumably, he was able to switch his swing on/off depending on the context, and the persistence of his home/road splits does strongly imply that he was changing something in his game to take maximum advantage of his home park. Firstly, that's pretty impressive in itself since most hitters these days talk about trying to find a consistent swing path. Secondly, is he supposed to NOT make those park-oriented changes?

So for me, I don't see a lot of reason to discount his batting stats any more than a park factor would. Which would be consistent with my interpretation of Mel Ott's stats.

As for minor league credit, I think, bjhanke, that you may have overgeneralized Cravath's MLB performance prior to 1912 as not major-league quality. His OPS+ in 1908 was 136. That's not someone who's overmatched. 1909 was definitely a bad year for him, but it's all of 79 PA. I'm not at all comfortable citing that as evidence that Cravath was not major league ready. It's not accurate to say he had no power because he had 11 triples, which equate much more to power at that time than they do in the modern game. Cravath slugged .383 IN A LEAGUE THAT SLUGGED ***.304***. It's pretty clear that he did have good power in 1908, and I am comfortable with the idea that he was a major-league hitter from at least 1908 onward.

The question this brings up is how far back in time we think he was a big league hitter. Chris Cobb was looking into this, and I would defer to his process and suggestions.

BTW: There are differing opinions on Cravath's glove, and there's a complicating factor. Rfield (BBREF) says he was a well-below average fielder (-21 runs). DRA says that he was about an average fielder (0.5 range runs). However, I'd be shocked if EITHER of them got him right because the BB's short RF dimensions would reduce BIPs and opportunities to catch flyballs that would hit the wall but be playable in pretty much any other park. This is precisely the same issue that Michael Humphreys described in Wizardry about The Green Monster's effect on Sox left fielders. He estimates that the Monster suppresses a player's DRA by 14 runs/1350 defensive innings (a full season's worth of innings). [Similarly, he also estimates that Colorado's vast expanses and thin air cost all its outfielders 10 runs/1350 innings and that Old Yankee Stadium's Death Valley cost Yankee left fielders 5 runs a year.] I don't recall when, but somewhere deep in the HOM's history (unless it was in a regular BTF thread), Sean Smith made a guest appearance in which he concurred that the Green Monster requires some sort of adjustment because it has a large effect on Boston LFs' defensive numbers.

All of that defensive-stats gobbledygook is to say that while I don't think Cravath was a good outfielder, it's entirely possible that he'd be miscast as a sluggish oaf out there.

Anyway, that's what's swimming around my brain while in COVID seclusion.
   208. Dr. Chaleeko Posted: April 07, 2020 at 03:58 PM (#5937367)
A little more thinking about Cravath's home/road splits. I pulled the raw numbers from BBREF and did some figuring. (If my OPS figures are different than theirs it's because I was feeling too lazy to pull out the sacs.)

Phillies total (1912-1920)
325 home HR/122 road HR, a 2.7 ratio
.697 home OPS/.617 road OPS, a 1.13 ratio

Phillies sans Cravath (1912-1920)
233 home HR/97 road HR, a 2.40 ratio
.675 home OPS/.602 road OPS, a 1.12 ratio

Cravath (1912-1920)
92 home HR/25 road HR, a 3.7 ratio
.951 home OPS/.781 road OPS, a 1.22 ratio
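
For anyone who wants to rerun the arithmetic, here is a minimal sketch that recomputes the ratios from the totals listed above. The dictionary keys and function names are made up for illustration. The "sans Cravath" OPS figures are not recomputed, because OPS cannot be subtracted without the underlying plate appearance data.

```python
# Home/road totals for 1912-1920, copied from the figures listed above.
PHILLIES = {"home_hr": 325, "road_hr": 122, "home_ops": 0.697, "road_ops": 0.617}
CRAVATH = {"home_hr": 92, "road_hr": 25, "home_ops": 0.951, "road_ops": 0.781}


def hr_ratio(split):
    """Home homers divided by road homers."""
    return split["home_hr"] / split["road_hr"]


def ops_ratio(split):
    """Home OPS divided by road OPS."""
    return split["home_ops"] / split["road_ops"]


def without_player(team, player):
    """Team home-run counts with one player's homers removed.

    Only counting stats are subtracted; rate stats such as OPS would
    need the raw components, which are not listed here.
    """
    return {
        "home_hr": team["home_hr"] - player["home_hr"],
        "road_hr": team["road_hr"] - player["road_hr"],
    }


if __name__ == "__main__":
    rest = without_player(PHILLIES, CRAVATH)
    for label, split in [("Phillies", PHILLIES), ("Sans Cravath", rest), ("Cravath", CRAVATH)]:
        print(f"{label}: home/road HR ratio {hr_ratio(split):.2f}")
    for label, split in [("Phillies", PHILLIES), ("Cravath", CRAVATH)]:
        print(f"{label}: home/road OPS ratio {ops_ratio(split):.2f}")
```

To two decimals the home-run ratios come out to 2.66 for the team, 2.40 without Cravath, and 3.68 for Cravath himself.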

It strikes me that either Cravath was a flyball hitter with a natural inside-out swing, perfectly matching his home park's dimensions, or he had the ability to modify his swing at home to shoot for the short right field wall. Of course, both are possible separately or simultaneously. He could have been a natural right-field hitter who also knew how to exaggerate that ability to play to his park's dimensions. I haven't read his SABR bio in a while, but if there's any anecdotal evidence there or anywhere else suggesting that he made a conscious effort to change his approach at home, then it suggests or could even corroborate that he owned a repeatable skill. If it is a repeatable skill, there's little chance that he was merely the right guy at the right time/place. Since we know he was already a very good hitter on the road and in 1908, it wouldn't be surprising that he could tailor his approach to his home park. But I suspect that since he took so much more advantage of his park than his teammates that he probably had some natural inclination to hit it hard to right field to begin with, so probably it was a blend of talent, smarts, skill, and circumstance.

   209. Dr. Chaleeko Posted: April 07, 2020 at 04:30 PM (#5937382)
By the way, Cravath's SABR bio says that he learned to hit to right field during his time in Minneapolis because its park was configured similarly to the BB. Additionally, GoogleBooks showed me a passage in the Tris Speaker biography Spoke that corroborates that in a little more detail. Neither of these sources speaks to whether he had a natural right-field swing, but if he had to learn to hit the other way, it probably wasn't his natural swing path.

So in other words, it does appear that Cravath learned a skill, and he used it effectively in the BB for a decade.
   210. Michael J. Binkley's anxiety closet Posted: April 07, 2020 at 04:32 PM (#5937385)
Also, hitting in the Baker Bowl may have been second nature to Cravath since his home park for his time with the Minneapolis Millers was Nicollet Park, which had very similar dimensions to the Baker Bowl (BB: L-341 C-408 R-280; NP: L-334 C-435 R-279). So he may have found what worked for him in Minneapolis and when he rejoined MLB, he ended up in the perfect situation where he didn't have to alter his hitting approach with the Phillies.

Edit: soft drink of choice to the good doctor
   211. Bleed the Freak Posted: April 07, 2020 at 08:20 PM (#5937457)
Plug for our sister station, DanG kicked off a project, would be great if everyone here could join in: https://www.baseball-fever.com/forum/general-baseball/history-of-the-game/3569108-bbf-ranking-game-–-election-2-–-nominations#post3569124
   212. Dr. Chaleeko Posted: April 08, 2020 at 08:24 AM (#5937515)
A little more Cravath data. I decided to look into the question of how many western players were making an impact in MLB from 1903-1907 when Cravath was in the PCL. The answer appears to be very few.

I looked at every player with 400 PA or 200 IP in any one of those seasons in MLB. I looked for five pieces of info on BBREF:
Birthplace
Death place
Burial place
High School (not universally available)
University (not universally available).

I looked for players connected to the states west of NE/KS, though I counted one with a SD connection.

When possible, I cross checked against SABR, Wikipedia, and Bullpen Wiki bios.

Here are the players who definitely appear to be Westerners:
Sam Mertes: born/died CA
Frank Chance: born in CA; HS in CA
Charlie Babb: born/died in OR
Hal Chase: born/died/HS in CA
Ike Rockenfield: HS in WA
Joe Nelson: born/died in CA
Roy Hartzell: born/died/HS/Univ in CO
Denny Sullivan: HS in SD/died/buried in CA

Here are players who might be Westerners but for whom no bio exists to verify and for whom no HS or univ is listed on BBREF:
Phil Geier: died/buried in WA
Emil Frisk: died in CA
Del Howard: died/buried in WA
Carl Druhot: died/buried in OR

So that’s 8-12 guys who are/might be westerners among the hundreds who had at least one season of full time play. It’s not a complete census obviously, but it gets us deeper into the question of whether growing up in the west was a handicap to getting a shot at the big leagues. It seems as though that may be true, even with westward migration and expansion at the time of Cravath’s youth.
   213. Jaack Posted: April 08, 2020 at 07:55 PM (#5937730)
I did a quick look at birthplaces - California was similar in population to South Carolina, Arkansas, and New Jersey in both the 1880 and 1890 census, which covers the era in which Cravath was born. All four states surprisingly seem to have produced similar numbers of major league players - I know that major league teams were just starting to acquire players from the deep south at this point, but it seems like if there were a strong geographical bias, New Jersey, with its proximity to four major league teams at the time, would be a hotbed of major leaguers.

Looking at players who were born in California in this period and had decent length MLB careers

Orval Overall - Spent one year in Tacoma, MLB by age 24
Sam Mertes - First reached the majors at age 23; before that, played in the minors in California and in the Midwest
Frank Chance - In the majors by 21
Hal Chase - In the majors by 22, continued to play in California minors for a while though
Oscar Stanage - Cup of Coffee at 23, played in east coast minors until establishing himself with Detroit at 26
Fred Snodgrass - In the majors by 20
Harry Hooper - In the majors by 21
Chief Meyers - Didn't reach majors until 28, but he'd been on the east coast for a few years at that point

I don't really see much evidence that players were getting lost in California in this period. The state was producing Major League players at a rate similar to other states of its size and those players seem to have reached the majors at typical ages.

I don't think geography was a major factor limiting Cravath's career - I think his talents weren't recognized because he didn't have an impressive BA or footspeed, and he was likely, to some degree, a late bloomer. I don't think it's likely an east coast Cravath gets to the majors much earlier than west coast Cravath did.
   214. bjhanke Posted: April 10, 2020 at 04:42 AM (#5938163)
Dr. C - Thanks for doing the research. I am VERY glad to find out that my sources on the Minneapolis ballpark were NOT wrong.

I know a bit more about Cravath's batting stance. He did NOT turn it on and off, although he may have learned it in Minneapolis (that would make sense of the two years he spent there, but it also just emphasizes the degree that having a park like that influenced his numbers; plus, if he learned it there, then his PCL numbers aren't the him who was in the BB). What happened - he describes it himself in the process of trying to give Sherry Magee lessons on how to hit homers - was that he did squeeze very tightly on the bat handle, all the way through the swing. I'm certainly not a major league ballplayer, but I do have over 40 years of sports that involved swinging a stick, about 15 of those years playing baseball. So, I took my bat and went out in the yard and swung it with my grip tight all the way through. Here's what happened: The swing started by diving down very fast and moving a bit forward. Then, in an instant, it completely changed direction, producing a very strong uppercut, but with the bat pointed way back of my hands, like a slap hitter trying to go to the opposite field. In other words, it was a power uppercut to the opposite field.

Which makes perfect sense for Gavvy Cravath. I've seen the Magee quote from Cravath in a couple of identical versions, except that one of the sources adds that Cravath said that Magee, if he would just swing like Cravath, would hit more homers - IN ANY BALLPARK (emphasis mine). This is simply not true. What you will get is endless cans of corn to the opposite field. In a normal ballpark, RFs will run them down and catch them. In the BB, some of them go over the fence. If you're really interested in how Cravath hit, try it out for yourself. Get a bat, or any kind of stick, and grip it as hard as you can, and try to swing it like a baseball bat without relaxing your grip. The biggest oddity of the swing, IMO, is that you cannot roll your wrists. Not at all, if you're gripping the bat tightly. I also could not do anything with the swing other than have a huge opposite-field uppercut. If my grip stayed tight, my wrists would do nothing.

Hope that helps.
   215. DL from MN Posted: April 10, 2020 at 11:44 AM (#5938232)
Good photo of Nicollet Park where the Millers played.

http://stewthornley.net/nicolletfirstandlast.html

MN teams have a history of short porches in RF including Metrodome and Target Field
   216. epoc Posted: April 10, 2020 at 02:25 PM (#5938319)
I've spent some time recently further developing my system, and I'd like to provide a peek at what my ballot currently looks like. In a subsequent post, I'm going to explain my methodology in a bit of detail, so everyone can see what I'm doing.

Current ballot:

1. Cliff Lee - Peak career z-score of 7.51 for 2008-13.
2. Johan Santana - 7.22 (2002-10)
3. Roy Oswalt - 7.12 (2001-10)
4. Kevin Appier - 7.05 (1990-97)
5. Dwight Gooden - 5.85 (1984-85)
6. George Foster - 5.75 (1975-81)
7. Dizzy Dean - 5.60 (1932-37)
8. Fred Dunlap - 5.66 (1880-85)
9. Lance Berkman - 5.58 (2001-11)
10. Brian Giles - 5.51 (1999-05)
11. Frank Chance - 5.49 (1903-07)
12. Noodles Hahn - 5.48 (1899-04)
13. Jose Rijo - 5.65 (1988-94)
14. Sam McDowell - 5.53 (1964-70)
15. Al Rosen - 5.01 (1950-54)

16-20: Pascual, Abreu, Belle, McGriff, Giambi
21-25: Sosa, Silver King, Javy Vazquez, Smokey Joe Wood, Cicotte
26-30: Dale Murphy, Guidry, Higuera, Webb, George Stone
31-35: Brecheen, Cesar Cedeno, Shocker, Norm Cash, Cy Seymour
36-40: Garciaparra, Bernie Williams, Bobby Bonds, Benny Kauff, Dolf Luque

Other HoM-worthy by position:
C - Posada, Tenace
1b - Mattingly, Guerrero
3b - Cey, Bando
of - Dave Parker, Sizemore, Strawberry, Hack Wilson, Lofton, Cravath, Lynn
p - Jim Maloney, Viola, Carlos Zambrano, Mike Scott, Wilbur Wood, Ewell Blackwell, Tanana, Mel Harder, Lefty Gomez

Top ten returnees not mentioned:
Kent - Not HoM-worthy. Behind both Knoblauch and Lazzeri, neither of whom are particularly close.
Bell - Bell would be a big mistake, imo. The HoM already contains a number of 70s/80s 3b, and Bell is clearly behind both Cey and Bando waiting to get in.
Schang - I had previously been a supporter, but he's fallen well off my list. What he has going for him is being the best catcher of his era and being consistently above average for a long time, but I no longer find those to be persuasive arguments.

   217. epoc Posted: April 10, 2020 at 03:28 PM (#5938361)
I don't want to bore anyone with an overly detailed accounting, but I do want to give some insight into how my system works. I'll try to strike a balance.

To start from the end, the final result of my system is a "peak career z-score" (for lack of a better term). This is the z-score for the period of consecutive seasons over which a player's performance was the highest number of (estimated) standard deviations above the historical average. YMMV, but for me this balances peak/career concerns naturally, because it rewards low-peak-long-career and high-peak-short-career evenly and on the equal footing of standard deviation. From my ballot, it probably looks like my system favors peak, to which all I can say is that empirically it is rarer to be excellent for a short period than to be very good for a long time.

In the most streamlined form, here's how I get there:
1. Calculate in-season z-scores for all player-seasons.

For position players, this involves finding the in-season standard deviation for offense (hitting+baserunning) and defense (fielding+position). I use fangraphs' numbers for these components, and I regress fielding runs 30% to the mean. I also use more extreme positional adjustments than fangraphs does for 1b, dh, and catcher. Once I have the season's SD, I calculate the z-scores for each player for both offense and defense, and then I weight offense at 1 and defense at .35 to find an overall score. Catchers, 2b, and ss defense is weighted at .4 to reflect the higher importance of defense at those positions.

For pitchers, the process is basically the same, except the components are RA9-based runs (using BB-Ref's WAR components) and FIP-based runs (using fangraphs' FIP and FIP-). I weight the two 50/50.

2. To get a peak career z-score, I find the best n consecutive seasons (measured by average in-season z-score) in a player's career, where n is every number from 1 to however many seasons the player played.

At this point, for pitchers, I apply a general hitting adjustment which is derived from a pitcher's career hitting vs. the average pitcher hitting for the era.

I then apply a formula that estimates the z-score for each of those n-year stretches. The formula is basically a best-fit line that accounts for the empirical historical "difficulty" of averaging x SDs over n seasons. Whichever stretch of consecutive seasons yields the highest estimated z-score (whether a single season or twenty) is the stretch I use. I then apply an era adjustment, because (basically) expansion has historically raised in-season z-scores at the extremes.

That's it.
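
The two steps above can be sketched roughly as follows. This is only an illustration under several assumptions that the description does not settle: the overall score is taken to be a simple weighted sum of the offense and defense z-scores, the in-season standard deviation is taken over whatever player pool is passed in (population SD), fielding runs are assumed to be runs above average so that regressing 30% to the mean means multiplying by 0.7, and the best-fit "difficulty" formula is left as a caller-supplied function because it is not spelled out. The pitcher-hitting and era adjustments are omitted. All names and dictionary keys are hypothetical.

```python
from statistics import mean, pstdev

DEFENSE_WEIGHT = 0.35          # general defense weight from the description
PREMIUM_DEFENSE_WEIGHT = 0.40  # catchers, 2b, and ss
PREMIUM_POSITIONS = {"C", "2B", "SS"}
FIELDING_KEPT = 0.70           # fielding runs regressed 30% toward the mean


def z_scores(values):
    """In-season z-scores; using the population SD is an assumption."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]


def season_scores(players):
    """Step 1: combine offense and defense z-scores for one season.

    Each player is a dict with hypothetical keys: name, pos, offense,
    fielding, positional (run values relative to average).
    """
    z_off = z_scores([p["offense"] for p in players])
    z_def = z_scores([FIELDING_KEPT * p["fielding"] + p["positional"] for p in players])
    scores = {}
    for p, zo, zd in zip(players, z_off, z_def):
        w = PREMIUM_DEFENSE_WEIGHT if p["pos"] in PREMIUM_POSITIONS else DEFENSE_WEIGHT
        scores[p["name"]] = zo + w * zd  # assumed: a simple weighted sum
    return scores


def best_windows(season_z):
    """Step 2: for each n, the best average over n consecutive seasons.

    Returns a list of (n, start_index, average) tuples.
    """
    prefix = [0.0]
    for z in season_z:
        prefix.append(prefix[-1] + z)
    results = []
    for n in range(1, len(season_z) + 1):
        start = max(range(len(season_z) - n + 1),
                    key=lambda s: prefix[s + n] - prefix[s])
        results.append((n, start, (prefix[start + n] - prefix[start]) / n))
    return results


def peak_career(season_z, estimate):
    """Pick the stretch whose estimated z-score is highest.

    `estimate(average, n)` is a stand-in for the best-fit formula,
    which is not given here. Returns (estimated_score, n, start_index).
    """
    if not season_z:
        raise ValueError("need at least one season")
    return max((estimate(avg, n), n, start)
               for n, start, avg in best_windows(season_z))
```

Calling `peak_career(career_z, my_formula)` with a list of a player's season scores in chronological order returns the winning stretch; everything interesting in the ranking then lives in `my_formula`, which this sketch deliberately does not guess at.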

To provide some context, a single-season z-score of around 1.5 reflects an all-star level season. Around 2.4 would be a season that gets down-ballot MVP votes. A 4.0 is a weak MVP season, and 4.75 is a strong MVP season.

At the career level, a peak career z-score of around 4.75 means a no-doubt HoMer, and 4.00 is the in-out line (more or less). A single 4.75 season is about equivalent to four 3.00 seasons or ten 2.00 seasons. Thinking of it more subjectively, this means that a no-doubt MVP season is equivalent to four straight years of being a down-ballot MVP candidate or ten seasons of being a generic all-star.
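
As a back-of-the-envelope check on those equivalences, suppose (purely as an assumption, since the actual best-fit formula is not given) that the estimated score grows like the average z-score times the number of seasons raised to some power. The sketch below solves for the power each stated equivalence would imply. The function name is made up.

```python
import math


def implied_exponent(single_season_z, average_z, seasons):
    """Solve single_season_z = average_z * seasons ** a for a.

    Assumes a power-law form for the length bonus, which is only a
    guess at the shape of the real formula.
    """
    return math.log(single_season_z / average_z) / math.log(seasons)


if __name__ == "__main__":
    # One 4.75 season vs. four 3.00 seasons, and vs. ten 2.00 seasons.
    print(round(implied_exponent(4.75, 3.00, 4), 3))
    print(round(implied_exponent(4.75, 2.00, 10), 3))
```

Under that assumed form, both equivalences imply an exponent somewhere between roughly one-third and two-fifths, so extra seasons add to the score much more slowly than a straight sum of season scores would.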

I want to admit here that the system is imprecise. "Imprecise" is almost disingenuous, even. I think this is a good and robust system, but like any uberstat it should be taken with a grain of salt and should not be used in isolation. I enjoy thinking through these issues systematically and mathematically, but there's a ton of estimation and gray area at each step of the process, including in the basic components I'm using as inputs.

That said, there are a few theoretical bases for this system that I'd like to highlight and advocate. The first is using average (rather than replacement level) as a baseline. Replacement level is hypothetical (rather than empirical) and theoretical (rather than practical) in ways that make it, imo, a very poor basis for judging on-field value of players. The second is using season-by-season values rather than total career values. Championships are won on a seasonal basis, and career totals can often deceive by obscuring how much (or little) a player contributed to championships on a season-by-season basis. Third, standard deviations are (I think) the best way to account for seasonal and era differences. My system has the advantage of taking the right view on all these issues.

On the other hand, there are clear weaknesses to my system as well. One is that standard deviations are measures of talent, not of value, so what I end up measuring is more like "how good was this player" than "how much value did this player provide." I prefer the former question, but I can easily see the argument for the latter. Another weakness is that, because my scores are standard deviations, I have to estimate the relative weights of offense, pitching, and fielding, rather than expressing all components in the common value of runs. This will be an obvious source of contention, especially for those who are dismayed by my ballot's balance of pitchers, outfielders, and infielders/catchers.

All that said, I like my system quite a bit, and I hope that the work I've done and some of my insights going forward prove interesting and provocative for everyone here.


   218. Kiko Sakata Posted: April 10, 2020 at 05:01 PM (#5938412)
epoc, my first instinct in reaction to your explanation in #217 is that I love the concept. You're a peak voter, which is fine; makes perfect sense for a project like this. Your use of z-scores makes a ton of sense.

That said, going back up to your ballot in #216, the obvious thing that jumps off the page is - Holy Cow, you love pitchers! The top 5 on the ballot, 6 of the top 7, 9 of 15 total. But I can see how you could end up with such a pitcher-heavy ballot. I tend to think the HOM has too few pitchers and I also think that individual voters' rankings of pitchers are more likely to diverge (which I think is part of the reason why we have too few pitchers in the HOM - we can't agree who's being unfairly left out). I would say be sure to go back and populate your own personal HOM to confirm that it has what you feel are a reasonable number of pitchers (and catchers and shortstops, etc., etc.).

And I have to say, that's a pretty idiosyncratic ballot. Not that there's anything wrong with an "idiosyncratic ballot" says the guy who is almost certainly the best friend in this project of Tommy John, Vern Stephens, Toby Harrah, and Tommy Henrich.

But I will say that I'm not sure I understand some of the idiosyncracies. I mean, I totally get Santana, Gooden, Dean, Chance, Rosen. And I can see how a guy like Sam McDowell would look better in your system than most others. And I even get the case for Cliff Lee. But Cliff Lee at the top of the ballot seems odd to me. I have him as arguably the best pitcher in baseball twice, which is really excellent and absolutely gives him a peak case. But Johan Santana was arguably the best pitcher in baseball every season for five or six consecutive seasons. Here's a comparison of the two in my pWins: Lee's best season is better than Santana's and Lee's second-best season is comparable to Santana's second-best (WOPA is relative to average, although Santana and Lee line up pretty much the same at the seasonal level measured against either average or replacement). But Santana has three more seasons better than Lee's third-best season. I feel like maybe you're not giving enough credit to career bulk - which, by the way, I realize is a bizarre thing to say with respect to rating Johan Santana, of all people, too LOW.

But that said, I'll repeat my initial comment. I love the concept and I think it's a very defensible and creative approach. Johan Santana is going to be higher on my ballot than Cliff Lee, but that's just, like, my opinion, man.
   219. epoc Posted: April 10, 2020 at 08:00 PM (#5938482)
Kiko, you are exactly right about the reasons for my pitcher-heavy ballot. Though I haven't constructed a pHoM according to the election schedule, I have examined the positional ratios of the 4.00+ players in my system (plus NeL players who are [imo] HoM-worthy). Of 272 such players, 88 are pitchers. I am very comfortable with 32% pitchers.

Lee was in the top 3% of pitchers for each season from 2008-13. His career is very similar to that of Sandy Koufax. He is an obvious HoMer, as I see it. Whether or not he was better than Santana is not really a question I can get too invested in, since they are close enough in my system that I wouldn't be too bothered by flipping them. I wish I knew enough about your system to untangle how I'm ending up with different results, but all I can say at the moment is that in my system Lee's six-year consecutive prime is better than Santana's, and the extra three years of fringe-all-star pitching that Santana adds to that don't quite make up the difference.
   220. Dr. Chaleeko Posted: April 10, 2020 at 10:12 PM (#5938527)
epoc and Kiko,

I concur that the HOM as an electorate has found less common ground on pitchers than hitters. But then, I think the world at large has less common ground on pitchers too. Just considering the difference between FIP-based and RA9-based WAR implementations is evidence of that. Personally, my intuition on hitters is, and has always been, much more accurate than on pitchers. Anyway, things are pretty settled now in what makes a good hitter and what the variables in play are for him. Voros came a solid 20-25 years after James came along and runs created and linear weights came into being.

Pulling back for a moment, we are doing something here that is very different than what WAR and other forms of analysis do. We are analyzing the entirety of the history of professional baseball and asking which are the n-best players across it, assuming that a pennant is a pennant is a pennant. That introduces way more variables into the question of pitching than it does for hitting. The main variables for hitters are things like:
-Position
-Length of schedule
-Park effects
-League run contexts
-Maybe STDEV if that’s your cup of tea (if so I’ll share my sugar cubes with you)
-Availability of PBP
-Peak v career
-Reliability of fielding stats.
But these are all areas where there’s either a common solution or a general range of agreement on how to address them.

Not so with pitching. There is incredibly wide divergence in our group related to:
-Usage/workload patterns
-Era
-Value/importance of relief pitchers
-Use of FIP vs RA9 value measurements.

But on top of that, for pitchers, variables like park are more localized in time than for batters because more of a pitcher’s playing time is concentrated in fewer games, meaning the park can affect him more than a batter whose PAs are spread out across 80 home games instead of 15 to 20. What if a pitcher plays most of his home games at Wrigley Field on days when the wind blows out even though the park may play just a little above average for hitters normally that year? Same goes for fielding. Pitchers are, in BBREF’s WAR, assigned defense based on their team’s overall defensive rating. But does a fifth starter get the benefit of playing in front of the same defense as the #1 starter? Does the manager rest his regulars when the #5 starts? For that matter, do defenses play worse behind worse pitchers? For example, do pitchers running continual long counts wear on the fielders behind them? And because starting pitchers pack more of their playing time into fewer games than hitters, defense can have a dramatic effect because each game is 1/35 of a year instead of 1/150 of a year for a regularly playing batter.

I’ve said this before, and I will keep saying it: The place where we can do the most good to bring a little more consensus on pitchers is to start talking about the 19th century, particularly prior to 1893. Getting Jim McCormick and Mickey Welch and Tommy Bond and anyone else still getting votes out of the picture will help narrow our focus. Those guys played a different game, and we have six and a half guys from 1871-1892 (Keefe, Hoss, Clarkson, Spalding, Pud, Caruthers, and Ward. Maybe I’m forgetting another?). These are two decades of dramatically lesser quality of play (STDEVs are crazy during these years), errors made up as much as 35% of the runs allowed (from my own researches, more on this someday) and by the 1890s were still well above what we would consider even close to normal. Home runs were very, very low. So were strikeouts. The transition from initiator-of-action to pitcher required two-plus decades and a movement of the rubber backward ten feet. Pitchers after the 60’ rubber could not throw 500 and 600 innings anymore. They had trouble getting to 400 innings and that figure quickly went to the dustbin of history. We’ve done right by the pre-1893 guys already, maybe been a little generous. So let’s narrow down this era variable, give the olde tyme twirlers a break and start seeing where in history we might be able to find common ground. It’s the one first obvious step to take.

   221. Jaack Posted: April 11, 2020 at 03:14 AM (#5938554)
I'll agree - pre-1893 pitching is very well covered. Keefe, Clarkson, and Galvin are the standouts for one reason or another, Caruthers and Ward are two-way choices that fill out our pitching for the period. Spalding covers the earlier, murkier period.

Everyone else fits into similar models - a couple of years of brilliant pitching and enormous workloads, and then a quick flame out. Telling Bobby Mathews, Charlie Buffinton, Jim Whitney, Mickey Welch, Silver King, Tony Mullane, and Jim McCormick apart is pretty much impossible. Really, Old Hoss Radbourn fits in this group, although he is probably the best of them. But they are all essentially the same pitcher. I will admit to a mild interest in Tommy Bond because he's from a slightly earlier era, but that is mostly aesthetic. But I am very okay with closing the book on this grouping unless someone can come up with a compelling argument for one of those guys being substantially better than the rest. But I can't see it.

----
I think the next step is to clear up the most recent batch of pitchers. For guys with careers centered on the 2000s, we've only elected Roy Halladay as a starting pitcher. We've elected Rivera as well, but he's kind of in a category of his own and it's hard to make a case for anyone by comparing them to Mariano Rivera. We have a cluster of shorter career pitchers who have hit the ballot here in the past few years - Johan Santana, Roy Oswalt, and Cliff Lee are the three most prominent, and the only ones that seem to have much, if any support.

So what are everyone's thoughts on those four pitchers relative to each other? We are about to elect Johan Santana, but Oswalt has pretty limited support and Lee is even further back. Personally, I find this to be hard to support - Oswalt and Santana look very neck-and-neck in particular, but Santana was on 19 ballots last year while Oswalt was on just four.

I want to clarify - I'm not against the election of Johan Santana necessarily (although he probably won't make my ballot this year).

The answer is probably to elect both Santana and Oswalt. There aren't a whole lot of options between the 90s glut of pitchers, and the guys who are just wrapping up their careers (Kershaw, Greinke, Verlander, and Scherzer). But right now, Santana looks like the only guy who has substantial support, and it's not like we are dealing with an overcrowded front log at this point. But if Santana, why not Oswalt?
   222. bjhanke Posted: April 11, 2020 at 03:55 AM (#5938555)
epoc - I like your use of z-scores. It's a good method. My main worry is that you've included 1884 in Fred Dunlap's run. Fred Dunlap, in 1884, played in the Union Association. The UA was in no way a major league, despite all the references that include it. It was, in modern terms, about a class A league. If you did use 1884, you really should just delete it and consider 1883 and 1885 to be "consecutive." I don't know what happens to Dunlap's scores if you do that, but I suspect that they will drop like a stone.

Jaack - I think that there are a few 19th-century pitchers that we really should include. Pitchers dominated that period. Until 1893, the best player in the league was usually a pitcher; in fact, it's so bad that the best player on a great majority of TEAMS was their pitcher. I'd say that McCormick and Mullane really should be in, at least. Oh, a side note about Bobby Mathews. If you check his career, you will find that he wasn't usually the ace of his team's staff. He was the #2 guy. Why? Bobby Mathews was a lefty. There weren't many lefty pitchers in the 19th, so Bobby could usually find a job with a serious number of innings, but not an ace workload. He was, essentially, used as a platoon pitcher.

Dr. C - We're close to agreement about Cravath's bat. And yes, he did demonstrate that he could hit major league pitching in his failed tryouts. But this is a trap caused by just focusing on his bat. The question wasn't whether Cravath could hit major league pitching well enough to be useful in a lineup. The question was whether he hit well enough to justify his glove. Cravath was a truly wretched defensive player. RF, at the time, was the least-valuable defensive position in the game (1B had to field lots of bunts), and Cravath was lousy even compared to other RF. Focusing on the bat certainly serves a purpose, but the glove is always there, always forcing you to consider whether his bat was SO good that it was worth his glove. When Cravath moved to Minneapolis, he finally put up numbers that were so good that they did outweigh the glove. He was certainly a major league hitter. But that's not the line we're talking about. We're talking about the HoM, and in that context, you cannot forget the glove. The problem of OK bats with lousy gloves came up a few other times with PCL players. Buzz Arlett, a star in the PCL, had only one year in the majors, and, according to the accounts I have read, only lasted one year because his glove was so bad that even his bat could not carry it. That's Cravath until 1910. The stuff about the black ink being ballpark effects doesn't change that, at the "could he play ML ball" level. At the HoM level, though, it's a serious problem.
   223. DL from MN Posted: April 11, 2020 at 10:06 AM (#5938578)
There may be park effects helping or hurting Cravath in RF as well. If he had decent range then a small RF would hurt his numbers. If he had no range but could catch a ball hit near him then a small RF would probably help his numbers.
   224. Chris Cobb Posted: April 11, 2020 at 02:02 PM (#5938625)
Contemporary pitchers is a good topic to canvass! Views are going to vary, of course, on the percentage of the Hall of Merit that should be occupied by pitchers. To make my own estimates, I figure that there should be about 3 pitchers per 8 position players: starting pitchers 4 and 5 in a rotation are typically of lesser quality, more comparable to a team's 4th infielder and 4th outfielder than to the starters. That ratio has matched fairly closely to the actual election patterns, so I've kept it as a rule of thumb.

A rough estimate for the number of players who should eventually be elected to the HoM who played the majority of their career in the decade of 2000-2009 is around 30. The pitchers who appear in my top 30 for the 2000s are as follows (in rank order):

UPPER TIER
1. Randy Johnson (.5 slot -- I split his career between the 1990s and the 2000s)
1.5. Pedro Martinez (.5 slot -- I split his career between the 1990s and the 2000s)
2. Mike Mussina (.5 slot -- I split his career between the 1990s and the 2000s)
2.5. Curt Schilling (.5 slot -- I split his career between the 1990s and the 2000s)

MIDDLE TIER
3. Roy Halladay

LOWER TIER
4. C.C. Sabathia
5. Mariano Rivera
6. Johan Santana

BORDERLINE IN
7. Tim Hudson
8. Mark Buehrle

BORDERLINE OUT
9. Andy Pettitte
10. Roy Oswalt

OUT, BUT WITHIN SHOUTING DISTANCE
11. Colon
12. Vazquez

Cliff Lee actually lands in the 2010s cohort for me, because the majority of his career value falls in 2010 and after, and he doesn't have enough career value to get split between decades. Since I will be comparing him first to a lot of players that are still active, he's not close to firmly ranked in my system. If he were ranked with the 2000 players, he would be with Colon and Vazquez. His peak is HoM calibre, so I can see pure peak voters putting him on their ballots, but he has too little outside of that peak to contend in my system, pending more data on how pitching career length changes after 2010.

My sense is that we've done well so far, but handling the borderline set fairly is going to be tough. Because innings pitched keep dropping on both a career and a seasonal basis for pitchers, it's easy for earlier pitchers to get prioritized over later ones. As we get deeper into the backlog, the votes will spread out more and more, also.

Of the "borderline 4" group of Hudson, Buehrle, Pettitte, and Oswalt, Hudson seems to me to be a clear leader, with a strong peak and a solid career. Buehrle and Pettitte are more career candidates. Folks who give post-season credit will like Pettitte more than I do. Oswalt's peak is nearly as good as Hudson's, and better than both Buehrle's and Pettitte's, but both of them have 10 career BWAR on him, and his higher peak can't compensate sufficiently. If he had a peak as good as Cliff Lee's, yes, that would move him up, but he doesn't.

That's the basics of what I have on this pitching cohort at present.

   225. epoc Posted: April 11, 2020 at 02:11 PM (#5938628)
I agree completely with Jaack & Dr. Chaleeko about pre-1893 pitchers. The two major arguments against them are Dr. Chaleeko's point about standard deviations and Jaack's note about the similar career shapes of all these pitchers. When you take SD into account, it is much less clear that (as bjhanke says) the best players in each season of that era were pitchers. The SD for RA9 runs in the 1870s and '80s was often 20+, while the SD for offense was half that. 1885 is representative, where the RA9 SD was 26.5 while the hitting+baserunning SD was 14. So while pitchers racked up more runs than position players, that production is much less impressive (imo).

More importantly, the SD for FIP-runs at the time was only 10.5, while 45% of runs were unearned, which suggests to me that fielders rather than pitchers were responsible for a lot of pitchers' RA. So even from a value standpoint, it's not at all clear that pitchers were better or more important than position players at the time. This is corroborated by the fungibility of these pitchers as noted by Jaack. Very few of them were consistently great for multiple years in a row, and overall the career shapes of pre-1893 pitchers tend to be pretty similar. That suggests to me again that these players were (relatively) fungible and that the runs their pitching produced were somewhat illusory.

About Dunlap and the UA, I would be interested in any data or arguments that support the UA as an A-ball level league. I don't doubt it, but actual data on the exact level of play is pretty hard to come by. I do discount UA stats significantly, but I don't agree that they should be discarded altogether. In Dunlap's case, he was the best 2b in baseball while playing in the NL in 1880, '81, '85, and '86, and he averaged 1.82 SD over six years from 1880-86, excluding '84. He was clearly an excellent player in his prime in 1884, so dismissing that season entirely seems likely to get us further from the truth about him rather than closer.

If you use a simple marcel weighting of 5/3/2 for his '83, '82, and '81 in order to project his talent level for '84, you would expect him to be 2.00 SDs above average for '84 (maybe closer to 1.9 with regression/aging). Given his actual performance, I think a strong all-star level season is about the minimum value you can place on his '84. Personally, I tend to be sympathetic to the notion that some players happened to have their peak seasons in weaker leagues, so I don't have a problem believing that Dunlap was an MVP-level performer that year. He had been the fifth-best player in the NL in '80 and the third-best in '81, so MVP-level performance from him is not far-fetched. Moreover, absolutely destroying a weak league (as Dunlap did in '84) is statistically significant. As an anecdotal example, look at what Juan Soto did across A, A+, AA, and MLB in 2018. Dominating even an A-level league to the tune of a 200+ wRC+ shouldn't be dismissed; it should be recognized as a transcendent performance that indicates star-level MLB talent.

As far as exactly how much to credit him, that is pretty difficult. I penalize UA performance 25%, which might not be enough, but again I haven't found a lot of good data on exactly how weak the league was (and in any case such data might not even help that much, since as I understand it the level of play varied wildly from team to team, such that Dunlap was probably facing competition even lower than that of the league as a whole). But since we have reliable data for Dunlap from '80-83, I think a reasonable method might be to regress his actual '84 numbers toward his marcel projection. Treating his '84 performance at face value, he was a 6.5 SD player that year. As noted, his marcel projection would be about 2.00. 71% of his '81-84 PA came in '81-83, so let's use that as the regression level. Regressing 71% of the way from 6.5 to 2.0 yields an estimated z-score of 3.3, right in line with the performance of Ned Williamson, the NL's best position player for '84.
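For anyone who wants to check the arithmetic, here is a minimal Python sketch of the two steps described above: a 5/3/2 weighted average for the projection, then a regression of the face-value score toward it. The function names are mine, purely for illustration. The seasonal z-scores that would feed the projection aren't listed in this post, so only the regression step is run with the quoted figures.

```python
def marcel_projection(recent_first, weights=(5, 3, 2)):
    """Weighted average of the last three seasons, most recent first."""
    if len(recent_first) != len(weights):
        raise ValueError("need one value per weight")
    return sum(w * z for w, z in zip(weights, recent_first)) / sum(weights)


def regress_toward(observed, projection, fraction):
    """Move `fraction` of the way from the observed value to the projection."""
    return observed + fraction * (projection - observed)


# Figures quoted in the post: face value 6.5, projection about 2.0,
# and a 71% regression toward the projection.
dunlap_1884 = regress_toward(6.5, 2.0, 0.71)
print(round(dunlap_1884, 1))  # 3.3
```

Changing the 0.71 is the quickest way to see how sensitive the '84 estimate is to the regression level.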

My own method for discounting UA performance yields an estimated z-score of 5.04 for Dunlap for 1884. Having gone through this thought exercise, I feel that that's probably too high, but at the same time I'm not sure the 71% regression toward his established performance is fair. I'll have to think more about it. Even at a z-score of 3.3 for '84, he's an obvious HoM player and the best eligible 2b, though it would push him down my ballot. Conversely, I hope that anyone dismissing a great player's best season will also reconsider.
   226. Dr. Chaleeko Posted: April 11, 2020 at 04:16 PM (#5938656)
Regarding pre-1893 pitchers, here's some data I've put together to help explain my thinking. Sorry this is messy, I'm having some connectivity issues and was forced to send it to myself in an email so I could log in on a different device and paste in.

I wanted to, as Kiko might put it, decompose the runs pitchers allow to compare across history. I wanted to know what a pitcher's "responsibility" was compared to fielders and to see how much pitchers contributed to their own RA through things they controlled. To do this I used Jim Furtado's Extrapolated Runs to attach weights to run-contributing events. I do recognize the limitation that XR was built using data from 1951-200ish, so anyone who wants to should feel free to redo this work using Base Runs or whatever. I determined the league's average BF/GS (estimated in the case of pre-splits) and then determined how often each of the following events occurred. For each event I then gave them the XR weights:

TO PITCHERS ONLY
Strikeouts: -0.098 runs
Walks: 0.34 runs
HR: 1.44 runs
HBP: 0.34 runs

TO FIELDERS
Errors: 0.518 runs

TO MANAGERS
IBB: 0.25 runs

SPLIT BETWEEN PITCHERS AND FIELDERS
BIP outs: -0.09 runs
BIP hits: generally around 0.56 runs, but customized to each league's MLB season's distribution of singles, doubles, and triples

The split between pitchers and fielders is 63%/37%. I based this on what BBREF does and on the idea that most outs are weak contact, which means all popups, most fly balls, and most ground balls. But bad pitching/good hitting leads to line drives, which are about 20% of all hits IIRC. We could tweak the percentages, certainly.

So here's how the calculation looks for starting pitchers in 2019 based on league splits for starters and league rates for errors, BIPs, and non-error BIP outcomes.
22.13 BF/G

TO PITCHERS
0.038 HR/BF x 1.44 x 22.13 = 1.195
0.22 K/BF x -0.098 x 22.13 = -0.484
0.075 BB/BF x 0.34 x 22.13 = 0.562
0.010 HBP/BF x 0.34 x 22.13 = 0.073
Total 1.35 R/GS

TO FIELDERS
0.016 E/BF * 0.518 * 22.13 = 0.18 R/GS

TO MANAGERS
0.002 IBB/BF * 0.25 * 22.13 = 0.013 R/GS

SPLIT BETWEEN PITCHERS AND FIELDERS
0.47 BIPout/BF * -0.09 * 22.13 = -0.936 R/GS * 0.63 = -0.59
0.19 BIP Hits/BF * 0.565 runs/BIP Hit for 2019 * 22.13 = 2.38 * 0.63 = 1.50

TOTAL R/GS CREDITED TO PITCHERS
1.35 + -0.59 + 1.50 = 2.26 runs out of 3.02 estimated runs given up during SPs 22.13 BF. This means that pitchers bear responsibility (sole or shared) for about 76% of all runs scored in a typical start.
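If anyone wants to rerun this for other seasons or with a different pitcher/fielder split, here's a rough Python sketch of the same bookkeeping. The function and variable names are my own, and the inputs are the rounded 2019 per-BF rates (HR 0.038 and HBP 0.010 being the values consistent with the products shown), so the results land within a few hundredths of the figures above rather than matching them exactly.

```python
# XR weights quoted earlier in this post (runs per event).
XR = {"HR": 1.44, "K": -0.098, "BB": 0.34, "HBP": 0.34,
      "E": 0.518, "IBB": 0.25, "BIP_out": -0.09}

PITCHER_ONLY = ("HR", "K", "BB", "HBP")
SHARED = ("BIP_out", "BIP_hit")
PITCHER_SHARE_OF_BIP = 0.63  # the 63%/37% split; an adjustable assumption


def pitcher_responsibility(rates, bf_per_start, bip_hit_weight):
    """Return (pitcher runs, total runs, pitcher share) per start."""
    weights = dict(XR, BIP_hit=bip_hit_weight)
    runs = {ev: rates[ev] * weights[ev] * bf_per_start for ev in rates}
    pitcher_only = sum(runs[ev] for ev in PITCHER_ONLY)
    shared = sum(runs[ev] for ev in SHARED)
    pitcher_runs = pitcher_only + PITCHER_SHARE_OF_BIP * shared
    total_runs = sum(runs.values())
    return pitcher_runs, total_runs, pitcher_runs / total_runs


# Rounded 2019 rates per batter faced for starting pitchers.
rates_2019 = {"HR": 0.038, "K": 0.22, "BB": 0.075, "HBP": 0.010,
              "E": 0.016, "IBB": 0.002, "BIP_out": 0.47, "BIP_hit": 0.19}
p_runs, t_runs, share = pitcher_responsibility(rates_2019, 22.13, 0.565)
print(f"{p_runs:.2f} of {t_runs:.2f} runs, {share:.0%}")
```

Swapping PITCHER_SHARE_OF_BIP for something like 0.5 or 0.7 shows how much the headline percentage depends on that one assumption.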

Now let's dial that responsibility number backward in time to see whether there's anything to learn from it.
2019: 76%
2009: 74%
1999: 74%
1989: 70%
1979: 70%
1969: 70%
1959: 71%
1949: 71%
1939: 67%
1929: 66%
1919: 60%
1909: 55%
1899: 57%
1889: 53%
1879: 37%
1871: 39%

Across time, there are several trends in play that together indicate why pitchers bear more and more responsibility. First, and most obvious, is that error rates have dropped. Here's the MLB E/BF for this same group of seasons:
2019: 0.02
2009: 0.02
1999: 0.02
1989: 0.02
1979: 0.02
1969: 0.02
1959: 0.02
1949: 0.02
1939: 0.03
1929: 0.03
1919: 0.04
1909: 0.05
1899: 0.06
1889: 0.09
1879: 0.13
1871: 0.17

So if the error rate is high, a fielder owns more responsibility for runs scoring than he does when the error rate is low. Right, so let's pull out the errors and just look at a pitcher's responsibility when errors are not included.
2019: 80%
2009: 78%
1999: 79%
1989: 77%
1979: 77%
1969: 78%
1959: 79%
1949: 78%
1939: 76%
1929: 75%
1919: 72%
1909: 72%
1899: 73%
1889: 74%
1879: 64%
1871: 68%
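To make the "pull out the errors" step concrete, here's a small sketch using the rounded 2019 per-start figures from the calculation above (2.26 pitcher runs, 3.02 total runs, 0.18 runs from errors). The function name is just for illustration.

```python
def share_without_errors(pitcher_runs, total_runs, error_runs):
    """Pitcher share of runs after removing runs attributed to errors."""
    return pitcher_runs / (total_runs - error_runs)


# Rounded 2019 per-start values from the worked example.
without_errors = share_without_errors(2.26, 3.02, 0.18)
print(f"{without_errors:.0%}")  # 80%
```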

Pitcher's responsibility for non-error runs doesn't reach 70% until 1885, and it's higher in 1871 than at any point until 1884 (no UA). But why would this be when we've eliminated the errors? Because, partially, strikeout rates. Here's K rates:
2019: 0.22
2009: 0.17
1999: 0.16
1989: 0.14
1979: 0.12
1969: 0.15
1959: 0.13
1949: 0.09
1939: 0.09
1929: 0.07
1919: 0.08
1909: 0.11
1899: 0.06
1889: 0.09
1879: 0.07
1871: 0.02

Notice the deleterious effect of the mound moving back in 1893 on K rates. But also notice the jump by 1909? That's because of the foul-strike rule being put into place in 1903 for both leagues. You can also see that the lively ball had a big effect on Ks. In the deadball era, no one had to worry about missing badly near the middle of the plate because the ball was so mushy it didn't matter if someone got a good swing on it. Besides which, most players choked up anyway. When suddenly the homer became a frequent occurrence, pitchers began moving to the edges of the plate. You see a rise in walk rates during this period as well. (What you don't see here is that the k rate was about 0.10/BF from 1903 to 1916.) Until baseball enters its modern period beginning in the late 1950s/early 1960s strikeouts don't recover.

Here I want to break out the k-rates for the game prior to the mound moving:
1892: 0.08
1891: 0.09
1890: 0.09
1889: 0.09
1888: 0.10
1887: 0.07
1886: 0.11
1885: 0.10
1884: 0.12 (no UA)
1883: 0.09
1882: 0.08
1881: 0.07
1880: 0.08
1879: 0.07
1878: 0.08
1877: 0.05
1876: 0.03
1875: 0.02
1874: 0.02
1873: 0.02
1872: 0.02
1871: 0.02

Things to consider:
1) NA pitchers and early NL pitchers are initiators. The strikeout doesn't appear to arrive as a weapon until late in the 1870s. It looks more like a happenstance before that. I know that's not entirely accurate, after all we have descriptions of Cummings and Zettlein and their craftiness and their speed, but those guys aren't fooling anyone into swinging and missing.

2) While K rates during these years LOOK LIKE K rates after, what this list doesn't account for is that pitchers stood ten feet closer to the plate. If MLB moved the mound ten feet closer, I suspect the K rate would, at a minimum, double.

3) From 1882 to 1884, expansion occurred in all three years. That's when K rates really jumped.

So, why else don't early pitchers have as much responsibility for runs? Right, HR rates:
2019: 0.038
2009: 0.029
1999: 0.030
1989: 0.020
1979: 0.023
1969: 0.021
1959: 0.024
1949: 0.018
1939: 0.015
1929: 0.014
1919: 0.005
1909: 0.003
1899: 0.005
1889: 0.008
1879: 0.002
1871: 0.004

Starting pitchers in MLB did not give up 100 homers in toto until 1881. They averaged 0.11 HR/9 from 1871-1881. From 1882-1892, they averaged 0.23 HR/9. As a point of comparison, from 1904-1919, MLB averaged the same. In 2019, the figure was 1.4/9! Which means there's less risk in putting the ball over the plate for the early guys, especially as error rates dropped.

OK, so this is a lot of stuff to digest. Here's the big takeaways for me:
1) 1870s pitchers really were initiators. They didn't strike batters out, they didn't have to worry about home runs, and the game at that time was about pitching to contact. Even when we take into account their error-prone defenses, they're still not responsible for near as much of the outcome as their later historical peers.

2) 1880s pitchers did increase k rates, and HR rates increased VERY gradually, and error rates continued to decline, but pitchers in this time were nonetheless pitching when HR rates were comparable to the deadball era, and K rates were about two-thirds to three-quarters of the deadball era, even though these guys pitched at a 50 foot distance and underhanded/sidearm. In my opinion, these guys were still initiators to a greater degree than anytime since.

3) The obvious and most important differentiator for these guys is their in-season durability. But I would again point back to a deadball environment and the 50 foot box and suggest that we think about that durability a little differently. Throwing overhand from 60 feet with speed and control MUST be more difficult than throwing under/side from 50 feet with speed and control. We can see this from the fact that the highest innings total after 1893, 464 (Ed Walsh, 1908), was 68% of the highest innings total prior to 1893 (Will White, 680 in 1879). The average MLB-leading innings total from 1893-1917 was 399, about 66% of the MLB average league-leading total from 1874 (when the schedule lengthened enough to ramp up the innings totals) to 1892. Which, in my mind, means that the toll on the arm was very different from the shorter distance, which allowed the earlier pitchers to work more often. Combine it with less emphasis on strikeouts and a substantially lower risk of homers, and it's a completely different ballgame.

So I advocate for

a) separating pre-1893 pitchers from those that came after rather than combining them into a single candidate pool because the game they played was so incredibly different

b) considering whether we have ample representation already among pitchers in the game's first 19 years given the size of the leagues involved

c) focusing on the fact that these early pitchers were much more like initiators than those who came after 1893, which requires different and less demanding skills than later pitchers and which incurs less overall responsibility for runs

d) treating the bulk innings with less emphasis because they are a result of the specific set of rules in play at that time and are not at all consistent with the other 120 years of baseball history.

We strive to be fair to all players. I think we have been less fair to post-1893 pitchers than we have to pre-1893 pitchers relative to the size of the candidate pool they each come from and the more difficult rule set faced by post-1893 pitchers. If folks think we are under-electing pitchers generally, the answer is not more olde tymers. The answer is to refocus on pitchers from afterward, the ones who are actually getting the shaft.

Respectfully,
Doc C.
   227. Dr. Chaleeko Posted: April 11, 2020 at 04:49 PM (#5938659)
Grab bag!

Re: Cravath's fielding
DRA and Rfield disagree in degree about what kind of fielder he was. Rfield says he's about -3/year, and DRA has him closer to average. I think the BB likely depresses his fielding stats a la Fenway to LFs. Also, it probably inflates his assists totals as it did Chuck Klein's, though Cravath was reputed to have a good arm. There are additional potential explanations for why Cravath didn't stick with the Sox. Cravath earned a lot of value from walks, and OBP might not have even been invented yet. No one likely emphasized walks at the time, so his bat wouldn't have seemed as valuable as it really was to his teams. Also, Cravath was a slow runner, which would certainly lead to questions about his defense. However, being a slow runner may not prevent a player from getting good jumps and good reads on balls in play. It is very possible that Cravath was both not an average fielder and also not as bad a fielder as his reputation for slow-footedness implies. People's eyes are notoriously unreliable about fielding.

Re: Santana, Oswalt, et al.
I think it's entirely possible to have Santana in and Oswalt out because Santana is close to the in/out line, and Oswalt, for me, is just on the other side of it. However, I have Hudson, Pettitte, and Buehrle juuuuuust on the in side of the line. Pettitte is between Santana and the line, and his postseason bulk does give him about a year's worth of bulk at the league-average rate in my system. So two additional WAR. BTW: I think of him as the Jeff Kent of pitchers: Three very good years, though not quite MVP type years, plus a long skein of three and four WAR years. Red Faber is similar but ranks higher because his top two years are much better than Pettitte's. Hudson and Buehrle are the last two guys on the good side of my in/out line. I like Cliff Lee a lot as a Phils phantom, but he really needed one more good season for me to get him over the line. He's kind of the Jason Giambi or Cesar Cedeno of pitchers in this sense.

Re 3:8
I concur with Chris Cobb, who it is generally wise to agree with, that the golden ratio is 3:8, pitchers to hitters. That's roughly a 27%/73% split, and I use it to calculate how many players we would induct to be theoretically balanced at each position. With 272 honorees, that means 74 pitchers and 198 hitters, or roughly 25 players per fielding position.
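For the curious, the quota arithmetic can be sketched in a few lines of Python. This is only an illustration under the 3:8 assumption with eight fielding positions; the function name and defaults are mine.

```python
from fractions import Fraction


def positional_quotas(total, pitchers=3, hitters=8, positions=8):
    """Split `total` honorees by a pitchers:hitters ratio."""
    pitcher_share = Fraction(pitchers, pitchers + hitters)
    n_pitchers = round(total * pitcher_share)
    n_hitters = total - n_pitchers
    return n_pitchers, n_hitters, n_hitters / positions


n_p, n_h, per_pos = positional_quotas(272)
print(n_p, n_h, per_pos)  # 74 198 24.75
```

Anyone who prefers a different ratio can pass their own pitchers and hitters values to see how the quotas move.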

Re: Fred Dunlap
I give him his career average for the year. It's a lot easier that way. Dunlap played in a weak league for the runaway best team. A simple discount may not be enough for that combination of factors. Similarly I give Jack Glasscock and Jim McCormick their career average for their time there.

Re early baseball STDEVs
I forgot to put that in my big long post of moments ago, but yes, absolutely. STDEVs were cray-cray in the early game. For pitchers, the average STDEV for WAA/IP from 1871-1892 was 0.0437. Ever after it is 0.0359. That's based on pitchers who got within 25% of qualifying for the ERA title.

Love all the good discussion happening now! A respite from COVID. Hope everyone's family and friends and selves are OK and handling sheltering well.
   228. Eric J can SABER all he wants to Posted: April 11, 2020 at 04:51 PM (#5938661)
Pitcher's responsibility for non-error runs doesn't reach 70% until 1885, and it's higher in 1871 than at any point until 1884 (no UA). But why would this be when we've eliminated the errors? Because, partially, strikeout rates.

Also partially walk rates - the NL in 1876 required nine balls for a walk, gradually reduced over the next decade-plus to four in 1889. The NL's BB/PA in selected years over that time compared to balls per walk:

1876: 1.6% (9 balls)
1880: 3.0% (8 balls)
1885: 5.6% (6 balls)
1890: 10.0% (4 balls)

Control is less of a differentiator between pitchers when the penalty for poor control is reduced.
   229. epoc Posted: April 11, 2020 at 05:11 PM (#5938667)
Kiko,

It turns out I actually can get invested in the Santana/Lee question. I'm curious why my system disagrees with yours about these two, and especially if we're going to be talking as a group about contemporary pitchers, I'd like to figure out where we're disagreeing, what that means for my placement of Lee, and whether there are biases or weaknesses in my system that these players expose.

Fortunately, these players' primes overlap for 2008-10. My system agrees with yours about 2008 (Santana was great but Lee was better) and to a lesser extent 2010 (Santana was good, Lee was great), but there seems to be some disagreement about 2009. By pWOPA you have Santana decisively better (2.7 to 1.8), while by eWOPA you have Lee better by a similar margin (2.6 to 1.4). My system has Lee *way* better in 2009. He has a z-score of 3.07 and is in the top 3% of pitchers for that season, while Santana has a z-score of 1.12, good but below all-star level. I also have Lee significantly better by both RA9 z-score (2.57 to 1.47) and FIP z-score (3.57 to 0.77), so it's not necessarily that I'm favoring the eWOPA-style evaluation over the pWOPA.

So let's try to figure out where we're diverging.

At the most basic level, Santana threw 166.1 innings in '09 with an ERA of 3.13, RA9 of 3.62, and an FIP of 3.79. Lee threw 231.2 innings with an ERA of 3.22, an RA9 of 3.42, and an FIP of 3.11. So Lee threw 39% more innings at a similar rate for runs and a way better rate for FIP. Welp, honestly, I would need to hear a pretty convincing argument for a stat that values Santana's 2009 better than Lee's.

Since that didn't prove too useful, let's compare Santana's fifth-best season to Lee's third-best, since you claim that the former is better than the latter. Our systems agree that Lee's third-best season is 2010. By pWOPA, Santana's '04 and '08 are tied for fourth-best, while my system has '08 as clearly ahead, so let's use that one.

pWOPA has Santana's '08 at 4.3 and Lee's 2010 at 3.9. eWOPA says 3.1 to 4.9 in Lee's favor, which kind of throws a wrench in the whole discussion but let's ignore it for now and assume that you exclusively trust pWOPA in your evaluations. My system has Santana '08 at 2.98 and Lee '10 at 3.42.

In '10, Lee threw 212.1 innings with an ERA of 3.18 (75 ERA-) and an FIP of 2.58 (61 FIP-). His RA9 was 3.56, which BB-Ref pegs at 26 runs above average given his opponents, parks, and defenses. In '08, Santana threw 234.1 innings with an ERA of 2.53 (61 ERA-) and an FIP of 3.51 (82 FIP-). His RA9 was 2.84, which generated 48 runs above average according to BB-Ref.

So Lee threw about 9% fewer innings with a significantly worse rate of runs allowed but a significantly better FIP. Based on that, I can see why eWOPA rates Lee's 2010 so much better than Santana's 2008, but I wonder why the difference in pWOPA is relatively slim.

As for my system, the reason it evaluates Lee better than Santana for these seasons is because of good old standard deviations. By percentage relative to average, Lee's advantage in FIP is only slightly better than Santana's advantage in ERA. And the same holds true in terms of runs. By RA9-based runs, I have Santana's '08 at 45.7 and Lee's '10 at 27.7. By FIP-based runs, it's the opposite: Lee has 38.9 runs to Santana's 20.0. If you average them out, the two pitchers are nearly identical: 32.8 to 33.3. But the SD for FIP is always much lower than the SD for RA9, because (to put it simplistically) RA9 has 50% more variables and thus much more variability. In these seasons specifically, the SD for RA9 runs was 11.8 in '10 and 12.6 in '08, while the SD for FIP runs was only around 70% as large: 8.53 in '10 and 8.39 in '08. So while the runs are similar, the z-scores favor the one with the better FIP.
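
A rough sketch of that arithmetic, assuming each z-score is simply runs above average divided by that season's SD (so the league mean is zero). The run totals and SDs are the ones quoted above; the function and key names are illustrative, not the system's actual code.

```python
# Compare "average the runs" with "average the z-scores" for two seasons.
# Assumption: z = runs above average / SD of runs for that season.

def z_score(runs_above_avg, sd):
    return runs_above_avg / sd

seasons = {
    "Santana 2008": {"ra9_runs": 45.7, "fip_runs": 20.0, "ra9_sd": 12.6, "fip_sd": 8.39},
    "Lee 2010": {"ra9_runs": 27.7, "fip_runs": 38.9, "ra9_sd": 11.8, "fip_sd": 8.53},
}

def summarize(s):
    mean_runs = (s["ra9_runs"] + s["fip_runs"]) / 2
    mean_z = (z_score(s["ra9_runs"], s["ra9_sd"])
              + z_score(s["fip_runs"], s["fip_sd"])) / 2
    return mean_runs, mean_z

for name, s in seasons.items():
    mean_runs, mean_z = summarize(s)
    print(f"{name}: mean runs {mean_runs:.2f}, mean z {mean_z:.2f}")
```

The averaged z-scores from this sketch land close to the 2.98 and 3.42 quoted earlier; the small gap presumably comes from rounding or from details the sketch leaves out.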

So I have a few thoughts based on this analysis. One is that I simply don't understand some of your ratings. pWOPA's valuation of Lee and Santana's respective 2009 seasons doesn't pass the smell test for me, and I'd be interested to hear what you think I'm overlooking. Secondly, some of the differences in our systems may be due to my use of FIP and standard deviations. I'll be interested to hear if you agree that that's likely the case.

My final thought is to question how I am using FIP and SD. That I should use both in some capacity is (for me) undeniable, but it's possible I'm not using them the right way. Perhaps I should average FIP and RA9 runs and find z-scores based on that average, rather than finding separate z-scores and averaging them, since the two are related rather than independent. It's also possible that a 50/50 weighting isn't the best way to balance the two. I am going to spend some time thinking about this, and in the meantime if anyone has any thoughts I'd be happy to hear them.
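
For anyone who wants to try the "average the runs first, then take one z-score" variant, here is a sketch. The SD of the averaged runs depends on how strongly RA9 runs and FIP runs are correlated across pitchers. That correlation is not given in this thread, so `rho` below is a placeholder that would have to be estimated from data. The run totals and SDs are the 2008 and 2010 figures quoted above.

```python
import math

def z_of_average(ra9_runs, fip_runs, ra9_sd, fip_sd, rho):
    """z-score of the mean of RA9 runs and FIP runs, given their correlation rho."""
    mean_runs = (ra9_runs + fip_runs) / 2
    # Var((X + Y) / 2) = (sd_x^2 + sd_y^2 + 2 * rho * sd_x * sd_y) / 4
    sd_mean = math.sqrt(ra9_sd**2 + fip_sd**2 + 2 * rho * ra9_sd * fip_sd) / 2
    return mean_runs / sd_mean

for rho in (0.0, 0.5, 1.0):  # hypothetical correlations, not measured values
    santana = z_of_average(45.7, 20.0, 12.6, 8.39, rho)
    lee = z_of_average(27.7, 38.9, 11.8, 8.53, rho)
    print(f"rho={rho}: Santana '08 z={santana:.2f}, Lee '10 z={lee:.2f}")
```

Because the averaged run totals are nearly equal, this variant pulls the two seasons much closer together than averaging separate z-scores does, whatever value of `rho` is plugged in.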
   230. epoc Posted: April 11, 2020 at 05:25 PM (#5938675)
Dr. Chaleeko,

Awesome stuff on pre-1893 pitching.

About Dunlap and other UA players, it may be easy to give them career average years for 1884, or ignore 1884 altogether, but if we are going to do what is easy rather than seriously evaluating the evidence then what are we doing here? For Dunlap's 1884, there are two important things to think about:

1) The season occurred in Dunlap's age-25 season in the middle of a prime in which he demonstrated star-level MLB ability.
2) Dominating lesser competition at the level he did (214 wRC+) is highly significant.

It's possible that simple discounts are not the best way to treat UA. I am certainly open to other suggestions. What I can say for sure is that neither ignoring 1884 nor treating it as a career-average year is the best way to deal with it. At the very least he deserves another star-level season, since that was his established NL performance level at the time.

But I wonder what you think about my suggested method of developing a projection based on prior performance and then figuring out a proper regression level to discount his performance toward expectation?
   231. Dr. Chaleeko Posted: April 11, 2020 at 05:55 PM (#5938683)
Epoc,

If I do nothing with 1884, I get 9.0 WAR in 640 sked adj PA. If I remove those from his career totals I get:
41.5 WAR
5319 est PA

If I plugged that into his 640 est PA I get 4.2 WAR.

OK, I think that's a little low given his surrounding seasons. I think it's fair to give him an All-Star level year, call it 5.0 WAR. It's not terribly consequential because he'd need a seven WAR season to get to the in/out line, and that's out of the question.

Also, while I'm commenting, in post 227 I incorrectly stated the threshold I used for inclusion in my STDEV calculation. That was for an earlier version. I used all pitchers with 1.0 or more innings to derive the figures I showed.

Also, also, Shout out to Eric J for picking up the walks information
   232. Jaack Posted: April 11, 2020 at 07:18 PM (#5938699)
Re: Santana, Oswalt, et al.
I think it's entirely possible to have Santana in and Oswalt out because Santana is close to the in/out line, and Oswalt, for me, is just on the other side of it. However, I have Hudson, Pettitte, and Buehrle juuuuuust on the in side of the line. Pettitte is between Santana and the line, and his postseason bulk does give him about a year's worth of bulk at the league-average rate in my system. So two additional WAR. BTW: I think of him as the Jeff Kent of pitchers: Three very good years, though not quite MVP type years, plus a long skein of three and four WAR years. Red Faber is similar but ranks higher because his top two years are much better than Pettitte's. Hudson and Buehrle are the last two guys on the good side of my in/out line. I like Cliff Lee a lot as a Phils phantom, but he really needed one more good season for me to get him over the line. He's kind of the Jason Giambi or Cesar Cedeno of pitchers in this sense.


It's certainly possible to draw the line between Oswalt and Santana, but Santana received 19 votes in the last election and Oswalt received just four. That puts Santana on the doorstep of election this year and Oswalt on pace to never get elected. That's a pretty big gap for pitchers who pitched at the exact same time, pitched about the same number of innings, and are very close by just about every uberstat. I have a mild preference for Oswalt because I lean more towards FIP-WAR than most other voters and I give Oswalt a bit more postseason credit, but the gap between them is pretty small.

I'd rank the group of borderline-ish pitchers from this era as

1. Oswalt
2. Santana
3. Pettitte
4. Hudson
5. Lee
6. Vazquez
7. Buehrle

The top four are pretty clustered around the borderline.
   233. epoc Posted: April 11, 2020 at 07:33 PM (#5938704)
More on the UA. Bill James has a list of UA players who were "legitimate" major league players. This is obviously an incomplete and subjective list, but I trust James on this account for our purposes. Of those players, 11 played in the NL between 1883-85. That is a very small sample, but again it's all we can do for the moment. In the NL from '83-85, those players collectively posted a wOBA of .285 over 4432 PA. In the UA in '84 they posted a wOBA of .354 in 3226 UA PA. So based on that sample, UA competition increased wOBA to 124% of its NL level (a 24% increase). Stated the opposite way, you would reduce UA wOBA by 19.5% to get an equivalent NL wOBA.

A subset of six players played for the 1884 St. Louis Maroons, the "super team" feasting on the lesser UA competition. Those six players compiled a .296 wOBA in 2229 NL PA from 1883-85 and .391 in 1810 UA PA. Based on that sample, the UA increased wOBA to 132% of the NL level (implying a 24% deduction for UA to NL conversion).

Imposing that larger 24% deduction on Dunlap's 1884 wOBA brings it from .477 to .361. Converting wOBA to runs (using 1884 seasonal constants available at fangraphs) yields 30.5 batting runs above average for Dunlap's NL-equivalent 1884.* Assuming his fielding numbers don't require conversion (for reasons which I assume don't require explanation), that puts Dunlap's 1884 at about 2.36 SD above average on offense and 1.57 SD above average on D. The way I weight those two makes it a 2.91 SD season, third-best in baseball for the season, a down-ballot MVP type year historically.
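
The UA-to-NL conversion above can be sketched as follows. This is a minimal illustration that only uses the wOBA figures quoted in this post; the function name is made up for the example.

```python
def ua_deduction(nl_woba, ua_woba):
    """Fraction to remove from a UA wOBA to reach the same hitters' NL level."""
    return 1 - nl_woba / ua_woba

all_players = ua_deduction(0.285, 0.354)  # 11-player sample
maroons = ua_deduction(0.296, 0.391)      # six St. Louis Maroons

# Apply the larger Maroons deduction to Dunlap's 1884 UA wOBA.
dunlap_ua_woba = 0.477
dunlap_nl_equiv = dunlap_ua_woba * (1 - maroons)

print(f"all: {all_players:.1%}, Maroons: {maroons:.1%}, Dunlap NL-equivalent: {dunlap_nl_equiv:.3f}")
```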

In terms of WAR, that's 30 batting runs + 10 fielding + 2 positional = 42 runs above average. He played 101 of his team's 113 games, which translates to 145 games in a 162-game season, an increase of 43%. 42 runs x 1.43 = 60 runs above average + 20 replacement runs = 80 RAR. BB-Ref's conversion factor for 1884 seems to be around 10.9 runs per win, so 80/10.9 = 7.3 WAR, again an MVP-ish season.
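
The same chain of steps as a small sketch. It scales by 162/113 without intermediate rounding, so it comes out a hair above the hand-rounded 7.3. The 20 replacement runs and 10.9 runs per win are the values used above, and the function name is illustrative.

```python
def season_war(runs_above_avg, games_played, team_games,
               replacement_runs, runs_per_win, full_season=162):
    scale = full_season / team_games      # schedule adjustment to 162 games
    scaled_games = games_played * scale
    raa = runs_above_avg * scale          # runs above average, 162-game basis
    rar = raa + replacement_runs          # runs above replacement
    return scaled_games, raa, rar, rar / runs_per_win

# 30 batting + 10 fielding + 2 positional runs; 101 of 113 team games.
games, raa, rar, war = season_war(30 + 10 + 2, 101, 113, 20, 10.9)
print(f"{games:.0f} games, {raa:.1f} RAA, {rar:.1f} RAR, {war:.1f} WAR")
```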

The small sample size makes this a less-than-robust analysis, but it's further evidence that far from ignoring Dunlap's 1884 or treating it as just another year, it should be taken seriously as an MVP-level season, even if it's not as massively awesome as it appears at first glance.

*This is a much bigger reduction than I was using, by the way. I was using a 25% deduction for hitting+baserunning runs, which credited him with 58 runs above average. 30.5 batting runs is about a 60% deduction. I'll have to think more about how I want to handle it going forward, but it does seem pretty clear that I was underselling the necessary deduction.
   234. Chris Cobb Posted: April 11, 2020 at 09:41 PM (#5938722)
Jaack wrote:

It's certainly possible to draw the line between Oswalt and Santana, but Santana received 19 votes in the last election and Oswalt received just four. That puts Santana on the doorstep of election this year and Oswalt on pace to never get elected. That's a pretty big gap for pitchers who pitched at the exact same time, pitched about the same number of innings, and are very close by just about every uberstat. I have a mild preference for Oswalt because I lean more towards FIP-WAR than most other voters and I give Oswalt a bit more postseason credit, but the gap between them is pretty small.

When we've cleared out the fairly obvious HoMers and all the players we are considering are close to the all-time in-out line, the small size of the ballot magnifies small differences in quality. There are a lot of very good players, all with similar merit, across 130 years of baseball history, who are in competition for ballot spots. Just putting the top unelected player from each decade in the history of professional baseball from 1871 through 2010 would almost fill a ballot! That's not exactly how it works, but any two players from the same decade who are next to each other in rank order are quite likely to be separated by 10 players in an all time ranking. I won't use Santana/Oswalt as an example from my rankings, because they're quite a bit farther apart than they are for Jaack, so I'll look at Santana/Hudson. These two pitchers are separated by 5.3 points in my system: 120.6 for Santana and 115.3 for Hudson. They are adjacent in my pitcher ranking for this decade. When position players from the 2000 decade are added in, three unelected players slot between them: Jason Giambi, Brian Giles, and Jeff Kent. When the 1990s are added, two more players slip between: pitchers Kevin Appier and Chuck Finley. The 1980s add Orel Hershiser. At this point, before we even consider players from earlier eras, Hudson isn't going to make my ballot unless Santana is in the top half. I think we've covered the first hundred years of the game pretty thoroughly, but I think there's still a few under-acknowledged deadball era stars out there, so I am likely to bring a couple of players like Tinker, Fletcher, Taylor, or Shocker onto my ballot. Two contemporary pitchers who are adjacent in my ranking of 2000s pitchers are going to be 9 slots apart in my overall rankings.

That's more or less the degree of separation we see between 2000s pitchers on the 2020 ballot.

Santana -- 6th
Pettitte -- 17th
Oswalt -- 30th

Those gaps are maybe wider than they should be, but then they might be wider because Hudson and Buehrle are hidden spacing between them and will slip into those gaps in 2021.

All that's to say that I don't think the fact that Santana got 17 votes in 2020 and Oswalt got 3 votes is necessarily evidence that the electorate is vastly underrating Oswalt: that's more or less the kind of separation we might expect to find between two pitchers from the same decade who are close in value.

Oswalt's problem, as I see it, is that, historically, pitchers with his career IP numbers must have a strong case to be the best pitcher in baseball during their peak to have a strong case for HoM election. Oswalt doesn't have a strong case for best pitcher in baseball at his peak; Santana does. As pitcher career length changes, Oswalt's career length should perhaps not be viewed as being too short; someone might try making that argument. Or maybe by FIP he does look like the best pitcher in baseball during his peak. Oswalt has the misfortune, compared to Santana, of having a peak in the early 2000s instead of the later 2000s, so his in-season comparison set is harder. Maybe that's unduly affecting the electorate's view of him. That's not what my system sees, but I'm not sure my system is properly calibrated for post-2000 pitchers.

I'd like to see the ways Santana, Pettitte, Oswalt, Hudson and Buehrle can get lined up. Post-2000 pitchers are not at all easy to evaluate. If nothing changes, we're going to elect Santana this year, so it would be good to make sure that's the right choice, and along the way we can try to get the others in the right order as well, and see where the in-out line falls.
   235. Jaack Posted: April 13, 2020 at 12:34 PM (#5939151)
I'd like to see the ways Santana, Pettitte, Oswalt, Hudson and Buehrle can get lined up. Post-2000 pitchers are not at all easy to evaluate. If nothing changes, we're going to elect Santana this year, so it would be good to make sure that's the right choice, and along the way we can try to get the others in the right order as well, and see where the in-out line falls.


Ask and you shall receive:


Fangraphs FIP-WAR

1. Andy Pettitte – 68.2
2. Roy Oswalt – 52.6
3. Mark Buehrle – 52.3
4. Tim Hudson – 48.9
5. Johan Santana – 45.6

Individual Season rankings

1. 1997 Pettitte – 7.2
2. 2005 Santana – 7.1
3. 2004 Santana – 6.8
4. 2006 Santana – 6.7
5. 2004 Oswalt – 6.5
6. 2002 Oswalt – 6.2
T7. 2005 Oswalt – 6.1
T7. 2006 Oswalt – 6.1
9. 2005 Buehrle – 5.9
T10. 2003 Hudson – 5.8
T10. 2001 Pettitte – 5.8
12. 2008 Santana – 5.2
T13. 2001 Hudson – 5.1
T13. 2003 Pettitte – 5.1
T15. 2007 Hudson – 4.9
T15. 2007 Oswalt – 4.9
17. 2002 Hudson – 4.7
T18. 2004 Hudson – 4.6
T18. 1996 Pettitte – 4.6
20. 2004 Buehrle – 4.5
21. 2010 Oswalt – 4.4
T22. 2002 Buehrle – 4.3
T22. 2003 Buehrle – 4.3
T24. 2001 Oswalt – 4.2
T24. 2008 Buehrle – 4.2

Relative Prime/Peak Rankings

1. Oswalt 119
2. Santana 86
3. Pettitte 63
4. Hudson 57
5. Buehrle 33

Fangraphs RA9-WAR Rankings

1. Tim Hudson – 63.0
2. Andy Pettitte – 61.6
3. Mark Buehrle – 61.4
4. Johan Santana – 54.8
5. Roy Oswalt – 53.9

Individual Season Rankings

1. 2004 Santana – 9.0
2. 2006 Santana – 8.1
T3. 2005 Santana – 7.8
T3. 2003 Hudson – 7.8
T3. 2005 Pettitte – 7.8
6. 2008 Santana – 7.6
7. 1997 Pettitte – 7.5
T8. 2006 Oswalt – 7.1
T8. 2002 Hudson – 7.1
10. 2005 Oswalt – 6.8
T11. 2002 Oswalt – 6.4
T11. 2010 Hudson – 6.4
13. 2010 Oswalt – 6.3
14. 2001 Buehrle – 6.0
T15. 2002 Buehrle – 5.9
T15. 2005 Buehrle – 5.9
T15. 2007 Hudson – 5.9
T18. 2007 Oswalt – 5.8
T18. 2007 Santana – 5.8
20. 2010 Santana – 5.7
T21. 2004 Oswalt – 5.4
T21. 2001 Hudson – 5.4
23. 2007 Buehrle – 5.2
24. 1996 Pettitte – 5.1
25. 2004 Buehrle – 5.0

Relative Prime/Peak Rankings

1. Santana 106
2. Oswalt 75
3. Hudson 72
4. Pettitte 44
5. Buehrle 38

BBRef Pitching WAR Rankings

1. Andy Pettitte – 60.7
2. Mark Buehrle – 60.0
3. Tim Hudson – 56.5
4. Johan Santana – 51.1
5. Roy Oswalt – 49.9

Individual Season Rankings

1. 2004 Santana – 8.7
2. 1997 Pettitte – 8.4
3. 2006 Santana – 7.6
4. 2003 Hudson – 7.4
5. 2005 Santana – 7.2
6. 2008 Santana – 7.1
7. 2002 Oswalt – 7.0
8. 2002 Pettitte – 6.9
9. 2005 Pettitte – 6.8
10. 2007 Oswalt – 6.6
11. 2007 Buehrle – 6.1
12. 2001 Buehrle – 6.0
T13. 2005 Oswalt – 5.9
T13. 2006 Oswalt – 5.9
15. 2010 Hudson – 5.8
16. 1996 Pettitte – 5.6
17. 2009 Buehrle – 5.3
T18. 2002 Buehrle – 5.0
T18. 2007 Santana – 5.0
20. 2005 Buehrle – 4.8
T21. 2001 Oswalt – 4.7
T21. 2010 Santana – 4.7
T21. 2007 Hudson – 4.7
24. 2001 Hudson – 4.5
25. 2008 Buehrle – 4.4

Relative Prime/Peak Rankings

1. Santana 102
2. Pettitte 69
3. Oswalt 66
4. Buehrle 53
5. Hudson 40

Kiko’s pWORL Rankings

1. Andy Pettitte – 67.4
2. Tim Hudson – 64.5
3. Roy Oswalt – 51.1
4. Johan Santana – 50.2
5. Mark Buehrle – 49.0

Individual Season Rankings

1. 2004 Santana – 9.0
2. 2006 Santana – 8.0
3. 2003 Hudson – 7.9
4. 2005 Santana – 7.8
5. 2005 Oswalt – 7.5
6. 1997 Pettitte – 7.4
7. 2005 Pettitte – 6.8
8. 1996 Pettitte – 6.1
T9. 2006 Oswalt – 6.0
T9. 2008 Santana – 6.0
11. 2005 Buehrle – 5.9
12. 2002 Oswalt – 5.7
T13. 2001 Hudson – 5.6
T13. 2001 Buehrle – 5.6
15. 2003 Santana – 5.5
T16. 2002 Hudson – 5.4
T16. 2007 Hudson – 5.4
T18. 2001 Oswalt – 5.3
T18. 2008 Oswalt – 5.3
T20. 2010 Hudson – 5.2
T20. 2007 Oswalt – 5.2
T22. 2010 Oswalt – 5.0
T22. 2012 Hudson – 5.0
T22. 2003 Pettitte – 5.0
T25. 2001 Pettitte – 4.6
T25. 2007 Santana – 4.6

Relative Prime/Peak Rankings

1. Santana 101
2. Oswalt 78
3. Hudson 66
4. Pettitte 62
5. Buehrle 28

Kiko’s eWORL Rankings

1. Tim Hudson – 64.1
2. Andy Pettitte – 59.2
3. Roy Oswalt – 49.7
4. Mark Buehrle – 46.3
5. Johan Santana – 46.0

Individual Season Rankings

1. 2004 Santana – 8.5
2. 2005 Santana – 7.2
T3. 2006 Santana – 7.1
T3. 2007 Hudson – 7.1
5. 2005 Oswalt – 7.0
T6. 2003 Hudson – 6.8
T6. 1997 Pettitte – 6.8
8. 2002 Oswalt – 6.5
9. 2005 Pettitte – 5.7
T10. 2005 Buehrle – 5.3
T10. 2010 Oswalt – 5.3
12. 2004 Oswalt – 5.2
13. 2010 Hudson – 5.1
T14. 2011 Hudson – 5.0
T14. 1996 Pettitte – 5.0
T14. 2006 Oswalt – 5.0
T17. 2008 Oswalt – 4.9
T17. 2007 Santana – 4.9
19. 2008 Santana – 4.8
20. 2001 Hudson – 4.7
T21. 2004 Hudson – 4.6
T21. 2007 Oswalt – 4.6
T21. 2001 Pettitte – 4.6
24. 2008 Buehrle – 4.5
25. 2003 Pettitte – 4.4

Relative Prime/Peak Rankings

1. Oswalt 95
2. Santana 88
3. Hudson 79
4. Pettitte 55
5. Buehrle 18

BaseballProspectus WARP Rankings

1. Tim Hudson – 63.4
2. Andy Pettitte – 60.9
3. Roy Oswalt – 58.8
4. Johan Santana – 53.5
5. Mark Buehrle – 34.2

Individual Season Rankings

1. 2004 Santana – 8.7
2. 2005 Santana – 8.4
3. 2006 Santana – 7.7
4. 2003 Hudson – 7.5
5. 2002 Oswalt – 7.3
T6. 2005 Oswalt – 7.0
T6. 2005 Pettitte – 7.0
8. 2007 Santana – 6.8
T9. 2001 Hudson – 6.6
T9. 2004 Oswalt – 6.6
11. 2007 Hudson – 6.3
T12. 2003 Pettitte – 6.2
T12. 2006 Oswalt – 6.2
T14. 2006 Pettitte – 5.7
T14. 2008 Santana – 5.7
T16. 2000 Hudson – 5.6
T16. 2007 Oswalt – 5.6
T16. 2010 Oswalt – 5.6
T16. 2005 Buehrle – 5.6
T20. 1997 Pettitte – 5.1
T20. 2001 Pettitte – 5.1
22. 2008 Oswalt – 5.0
23. 2003 Santana – 4.9
24. 2001 Oswalt – 4.7
25. 2004 Buehrle – 4.6

Relative Prime/Peak Rankings

1. Santana 105
2. Oswalt 98
3. Hudson 64
4. Pettitte 58
5. Buehrle 11

Baseball Gauge Pitching WAR

1. Andy Pettitte – 64.6
2. Tim Hudson – 61.1
3. Mark Buehrle – 58.5
4. Johan Santana – 54.7
5. Roy Oswalt – 48.5

Individual Season Rankings

1. 2004 Santana – 9.2
2. 2006 Santana – 7.9
3. 1997 Pettitte – 7.7
T4. 2003 Hudson – 7.6
T4. 2005 Santana – 7.6
6. 2008 Santana – 7.3
T7. 2010 Hudson – 6.7
T7. 2002 Hudson – 6.7
9. 2002 Oswalt – 6.5
10. 2005 Pettitte – 6.4
11. 2006 Oswalt – 6.3
12. 2001 Buehrle – 6.1
13. 2010 Oswalt – 6.0
14. 2007 Santana – 5.7
T15. 2007 Oswalt – 5.6
T15. 1996 Pettitte – 5.6
T15. 2005 Buehrle – 5.6
18. 2007 Hudson – 5.4
T19. 2010 Santana – 5.2
T19. 2007 Oswalt – 5.2
T19. 2002 Buehrle – 5.2
T19. 2007 Buehrle – 5.2
23. 2001 Hudson – 5.1
24. 2003 Santana – 5.0
25. 2009 Buehrle – 4.8

Relative Prime/Peak Rankings

1. Santana 100
2. Hudson 71
3. Pettitte 52
4. Oswalt 52
5. Buehrle 42

Overall Career Average Rankings

Andy Pettitte – 63.2
Tim Hudson – 60.2
Roy Oswalt – 52.7
Mark Buehrle – 51.7
Johan Santana – 50.8

Overall Relative Prime/Peak Rankings

Johan Santana – 98.3
Roy Oswalt – 83.2
Tim Hudson – 64.1
Andy Pettitte – 57.6
Mark Buehrle – 31.9

The Prime/Peak rankings are simple – 25 points for first place, 24 for second and so on.
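
The overall prime/peak numbers can be reproduced, to within rounding, as an unweighted mean of the seven per-system scores listed above. A minimal sketch, assuming that is how they were combined (the dictionary layout is just for illustration):

```python
from statistics import mean

# Prime/peak scores per system, in the order listed above:
# FIP-WAR, RA9-WAR, BBRef, pWORL, eWORL, WARP, Baseball Gauge
peak_scores = {
    "Santana": [86, 106, 102, 101, 88, 105, 100],
    "Oswalt": [119, 75, 66, 78, 95, 98, 52],
    "Hudson": [57, 72, 40, 66, 79, 64, 71],
    "Pettitte": [63, 44, 69, 62, 55, 58, 52],
    "Buehrle": [33, 38, 53, 28, 18, 11, 42],
}

overall = {name: mean(scores) for name, scores in peak_scores.items()}
for name, score in sorted(overall.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.1f}")
```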

My thoughts:
The first thing that stands out is how poorly Buehrle does. While his very poor showing in BPro’s WARP hurts him a lot, even excluding that, his career score is quite poor for a candidate with as low of a peak as him. At the very least, he’s clearly behind Hudson and Pettitte – the only metric that prefers him to either one of them in either career or peak is BBRef.

I know some people have argued that Buehrle was a likely candidate to beat his peripherals because of his personal defense and pickoff move. That seems reasonable. But at the same time, I want to urge caution – BBRef is the primary pitching metric for a number of voters, and those voters in particular should take a second look if they are planning on voting for him.

I think Pettitte versus Hudson is an interesting point of debate. They actually have similar peaks, with one season that seems to stand out for each – Hudson’s 2003 versus Pettitte’s 1997. Hudson has more of the 5ish WAR seasons while Pettitte makes up on the back end with a lot of 3-4 WAR seasons. The one big difference that doesn’t show up here is postseason play – both guys were basically the same pitcher in the postseason as they were in the regular season, but Pettitte had about 200 more innings. At the very least, that should push Pettitte’s career numbers up a bit.
   236. Jaack Posted: April 13, 2020 at 12:37 PM (#5939155)
My related question - does anyone know where to get quality pitcher defense numbers? BBRef has DRS for pitchers, but that doesn't work before 2002.
   237. Kiko Sakata Posted: April 13, 2020 at 01:51 PM (#5939198)
My related question - does anyone know where to get quality pitcher defense numbers? BBRef has DRS for pitchers, but that doesn't work before 2002.


I'm biased, so I'll let others decide the "quality" of the numbers. But I have fielding numbers for the past 100 years or so from my Player won-lost records. Here's my top (and bottom) 100 in total fielding wins, net fielding wins (wins minus losses), and fielding wins over replacement level. Generally speaking, the numbers aren't very big (the top career net fielding numbers are +2.6 wins - so around +26 runs give or take - Zack Greinke and Livan Hernandez). Also, if you're using an RA-based system, you probably don't want to include pitcher fielding because it should already be incorporated into the pitcher's runs allowed.

Buehrle is only +0.5 net wins for his career (he's #80 in total wins in the above link; he doesn't make the top 100 in the other two measures). Tim Hudson actually rates better than him (+0.8 net wins for his career). None of the other guys being talked about are in the link, except for Pettitte who shows up for being an historically BAD fielder (-1.4 career net wins).

These fielding numbers are just fielding on balls in play. The extent to which pitchers controlled the running game is classified as "pitching" in my system - Component 1, specifically, the leaders of which can be found here. Buehrle and Pettitte were both excellent at this - top 10 all-time in net Component 1 pitching wins (3rd table down; pitchers are in the middle).
   238. progrockfan Posted: April 13, 2020 at 02:45 PM (#5939233)
@Jaack: "Santana received 19 votes in the last election and Oswalt received just four. That puts Santana on the doorstep of election this year and Oswalt on pace to never get elected. That's a pretty big gap for pitchers who pitched at the exact same time, pitched about the same number of innings, and are very close by just about every uberstat."

With due respect, Jaack, I can't see it that way, by traditional or advanced metrics:

ERA: Santana 3.20, Oswalt 3.36
ERA+: Santana 136, Oswalt 127
WinPct: Santana .641, Oswalt .615
H/9: Santana 7.7, Oswalt 8.8
WHIP: Santana 1.132, Oswalt 1.211

ERA titles: Santana 3, Oswalt 1
ERA+ titles: Santana 3, Oswalt 0
FIP titles: Santana 3, Oswalt 0
IP titles: Santana 2, Oswalt 0
SO titles: Santana 3, Oswalt 0
H/9 titles: Santana 3, Oswalt 0
WHIP titles: Santana 4, Oswalt 1
Cy Young Awards: Santana 2, Oswalt 0

I'm not saying Oswalt isn't a worthy candidate - that's not my point at all; I'm saying that insofar as I can determine, there is indeed a "pretty big gap" between the two.

Santana will likely have an Elect Me slot on my ballot this year, while Oswalt will probably make my ballot, but likely not in the top 10.
   239. Dr. Chaleeko Posted: April 13, 2020 at 05:57 PM (#5939344)
Also, I don’t know if Jaack was including batting or not, but here’s everyone’s career totals as well as Cliff Lee’s:
Hudson: 1.3 WAR
Lee: 0.7
Santana: 0.6
Oswalt: 0.1
Pettitte: -0.5
Buehrle: -1.0
   240. Chris Cobb Posted: April 13, 2020 at 08:10 PM (#5939385)
Jaack, thank you for posting all this data! There's much to consider here.

Two quick take-aways.

These pitchers are quite different in career profile but quite similar in apparent merit because of the inverse relationship between peak and career in the group. Assessments are going to have to be very fine-grained.

Buehrle does stand out as being below the rest of the group on the basis of collating peak and career results from all the available comprehensive metrics.

That said (and this is not an endorsement of Buehrle), I am inclined to think that BP's analysis of Buehrle's value indicates more that their "deserved runs allowed" is a junk stat--at least for HoM purposes--than it shows that Buehrle falls well below the rest of the comparison set. BP doesn't come close to passing the sniff test when it calculates that a pitcher who by RA9 earned 61.4 WAR and by FIP earned 52.3 WAR should actually be credited with only 34.2 wins above replacement. (The comparison with FIP is particularly mind-bending . . . ) How do they account for the disappearance of all the runs that led to those 18 or 27 missing wins??? I find myself imagining the mutterings of frustrated sabermetricians looking at Mark Buehrle's success without any visible reason for it: "Darn that Mark Buehrle and his secret, extra-dimensional run eliminator! He doesn't deserve to be saving all those runs using secret, space technology. I know! We'll invent a new statistic that exposes Buehrle for the secret cheater that he is! We'll call it "deserved runs allowed" and we'll reveal all the runs allowed that he secretly transported to the 5th dimension. That will show him!" Maybe, in general, BP's DRA is an excellent predictive stat, better than FIP?? But its methods look to me like they fall apart completely when applied to Buehrle, for whatever reason.

For retrospective analysis of pitchers, I remain doubtful of the appropriateness of using any metrics that don't start from actual runs allowed. There's too much entanglement with contextual factors that skilled pitchers can utilize to fully account for their success by counting up from discrete events.
   241. bjhanke Posted: April 13, 2020 at 11:22 PM (#5939477)
On Dunlap - IIRC, the two years before and the two years after 1884, Dunlap put up 14 Win Shares, with complete consistency. He was a 14-WS-a-year guy, and very consistent about it. In 1884, the Union Association season credits him with 38 (!) win shares. That's an ENORMOUS discount.

The other thing about Dunlap - actually the one that makes me cranky about including 1884, is that there are several, maybe many, players who got sent down to the minors by their team, but then recovered their game. Babe Adams and Rabbit Maranville are two quick examples. The thing is that neither Babe nor Rabbit DECIDED to go to AA ball. They were sent there by the teams that owned their contracts. Dunlap, on the other hand, DID decide to go to the UA. No one made him do that. It's all on him.

And yet, nobody that I know of, no system that I know of, gives any of these other players ANY credit for their minor league work. Well, if you won't give credit to players who were shipped out into the minors against their wills, how can you possibly give credit for someone who, BY HIS OWN WILL AND DECISION, went to a minor league? Dunlap is the LAST person I'd be willing to give that kind of credit to. It may just be me, but think about it - forced into the minors and then coming back right where you used to be vs. going deliberately to a minor league. Are you SURE you've got the right guy to focus on here?
   242. Jaack Posted: April 14, 2020 at 02:38 AM (#5939501)
First, a quick update to the data to add Cliff Lee to the final averages.

Here's Lee's career scores:

FIP-fWAR – 48.2
RA9-fWAR – 46.3
BBRef WAR – 42.5
Kiko pWORL – 40.2
Kiko eWORL – 40.7
BPro WARP – 40.9
Gauge WAR – 44.7
Average – 43.4

I also added him to the single season rankings (and extended them from 25 to 30 seasons to compensate) and re-crunched the averages. Here are those:

Santana – 116.3
Lee – 94.9
Oswalt – 92.7
Hudson – 73.9
Pettitte – 64.9
Buehrle – 35.3

Thoughts on Lee - his career value is very low, even compared to Oswalt and Santana. The one thing that keeps him viable to me is I give him pretty strong postseason credit for his excellent showing there.

@progrockfan
Santana does have some nice black ink, but some of it is a bit misleading.

For example, Santana has two IP titles while Oswalt has none, but Oswalt had two higher IP seasons than either of Santana's titles; he just had competitors in the NL who threw a couple more innings. Also, Oswalt's career rate stats look a whole lot better if you exclude the garbage 90 innings across two years at the end of his career - his 133 ERA+ is negligibly different from Santana's 136, and Oswalt still has more career innings.

I'm not saying that Oswalt was definitively better than Santana - I prefer him, but it's certainly debatable. But I don't think the gap between them is huge - certainly not large enough to put Santana first in line to election and leave Oswalt pretty deep into the backlog.


@Chris Cobb
I agree that BPro's WARP is... often idiosyncratic. I weight it pretty minimally, and I've thrown it out before when it spits out ridiculous numbers. Luis Tiant and Catfish Hunter had some silly low numbers in a previous incarnation, but both have become reasonable enough to be usable at this point. I like to incorporate it to get another viewpoint, but its junkiness is occasionally an issue. Discounting it for Buehrle is not a bad idea at all - I don't think he was a 34 WAR pitcher.

But even excluding it, his career WAR average is only ~54. That's well behind Andy Pettitte, who also has two seasons (1997 and 2005) that are clearly better than anything Buehrle ever did. Buehrle is basically David Wells. He didn't have a short career, but it's not an Eppa Rixey/Tommy John situation where he pitched forever. He's significantly better than his BPro numbers, but he's certainly behind Pettitte and Hudson, and I have a hard time seeing him over the peakier trio, even from a career standpoint.



As an aside on my pitching metric preferences: in general, I like to use the two fangraphs metrics as the error bars on either side of a pitcher. BBRef tries to split the difference with their defensive adjustment, but it ends up falling outside the two fangraphs metrics too often for me to trust. Defensive metrics are iffy enough in evaluating a position player, but trying to translate them into how they affect a pitcher is a bridge too far for me. I do incorporate a good amount of FIP-WAR in my evaluation (see my Mickey Lolich placement), but even if I abandoned FIP-WAR, I'd stick to fangraphs RA9 model over BBRef's.
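
To make the "error bars" idea concrete, here is a small sketch that checks whether a third metric lands between the two fangraphs figures. It uses Cliff Lee's career numbers from earlier in this post; the function name is hypothetical and this is only one way to express the check.

```python
def inside_fangraphs_bracket(fip_war, ra9_war, other_war):
    """True if other_war falls between the FIP-based and RA9-based figures."""
    low, high = sorted((fip_war, ra9_war))
    return low <= other_war <= high

# Lee: FIP-fWAR 48.2, RA9-fWAR 46.3, BBRef WAR 42.5 (from the list above).
lee_bbref_inside = inside_fangraphs_bracket(48.2, 46.3, 42.5)
print(lee_bbref_inside)  # False
```

In Lee's case BBRef sits below both fangraphs numbers, which is the pattern described above.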
   243. Jaack Posted: April 14, 2020 at 02:58 AM (#5939502)
Regarding Fred Dunlap's soiree in the UA - the numbers are, in all likelihood, pretty garbage.

Looking at the UA leaderboards, eleven guys ended up at 3 WAR or more. Of those 11, only Dunlap, George Shafer, and Yank Robinson provided real major league teams with better than replacement level value. But Shafer was never more than average-y outside of his UA year, and immediately plummeted to replacement level after a big year in the UA.

Right now, I halve Dunlap's UA season. That still makes it the best season of his career by a good margin. But that league is really unimpressive. There's a reason the St. Louis Maroons went 94-19 in the UA and 36-72 the next season in the NL.
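
For scale, a quick sketch of what those two records mean as winning percentages. It assumes ties are ignored, and the variable names are just labels for the two seasons mentioned above.

```python
def win_pct(wins, losses):
    """Winning percentage from wins and losses (ties ignored)."""
    return wins / (wins + losses)

maroons_ua = win_pct(94, 19)  # the UA season
maroons_nl = win_pct(36, 72)  # the following season in the NL
print(f"{maroons_ua:.3f} {maroons_nl:.3f}")  # 0.832 0.333
```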
   244. Dr. Chaleeko Posted: April 14, 2020 at 05:48 PM (#5939822)
Thought I'd give my system's view on the pitchers we've been talking about as well as two other soon-to-be-candidates (assuming that one of them is finished...he sure looks it).

Here's the system in a nutshell:
-BBREF WAR
-Adjusted for the STDEV of WAA/IP of a player's league
-Postseason innings added as league-average innings
-Adjustment to repWAR that puts all pitchers on the same workload basis (approximates year 2000 usage patterns)
-WPA adjustment for seasons in relief, when WPA is available
-Leverage adjustment for seasons prior to availability of LI (1.08 times est. relief RAA)
-!NEW! Historical adjustment for the number of extremely good or bad hitters in the league
-Batting and other position-player stuff included as appropriate

There's probably something else I forgot, but when this gets all mixed up together, I recalculate the WAA and WAR the BBREF way to get a final result.

This is how each of these guys comes out. The RANK column does not include currently active pitchers. Also, CHEWS+ is my cutely and rip-offedly named general sifting tool. It's a lot like JAWS, but a) it's indexed to 100 and b) it measures a player's 7-year peak and career WAR totals against the median peak and career totals of the top (n * .30 * 2) pitchers where n=number of players in the HOM (though actually, it's calibrated to the number of guys in the Hall of Miller and Eric, but for this informal purpose, it'll do just fine). And there's a little bonus for great rate stats.

You also need to know that I no longer intermingle the pre-1893 pitchers with the post-1892 pitchers. So I give myself a mental reminder that those guys require about 5 slots. Which means that when interpreting the ranking: (n * 0.30) - 5 gives you the number of post-1892 pitchers I deem apt.
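
A rough Python sketch of the sizing formulas and the index as described here, not the actual spreadsheet. Assumptions on my part: the 7-year peak is the seven best seasons (not necessarily consecutive, as in JAWS), peak and career ratios are averaged with equal weight, the rate-stat bonus is left out, and 235 is the Hall size mentioned later in this post. All function names are hypothetical.

```python
def comparison_pool_size(hall_size, pitcher_share=0.30):
    """Size of the pool whose medians set the benchmark: n * .30 * 2."""
    return hall_size * pitcher_share * 2

def post_1892_slots(hall_size, pitcher_share=0.30, pre_1893_slots=5):
    """(n * 0.30) - 5: pitcher slots left after reserving the pre-1893 ones."""
    return hall_size * pitcher_share - pre_1893_slots

def peak7(seasons):
    """Sum of the seven best seasons (assumed: need not be consecutive)."""
    return sum(sorted(seasons, reverse=True)[:7])

def chews_plus_sketch(seasons, median_peak, median_career):
    """Equal-weight index of peak and career against the pool medians.

    100 means the player matches the medians; the rate-stat bonus is omitted.
    """
    peak_ratio = peak7(seasons) / median_peak
    career_ratio = sum(seasons) / median_career
    return 100 * (peak_ratio + career_ratio) / 2

print(round(comparison_pool_size(235), 1))  # 141.0
print(round(post_1892_slots(235), 1))       # 65.5
```

Because the pool is twice the number of pitcher slots, its median sits right around the last slot, which would explain why a score near 100 lands near the in/out line.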

NAME        CHEWS+ RANK                     >>>SEASONS ORDERED FROM BEST TO WORST<<<
============================================================================================================================
SABATHIA     110    47    7.4  6.8  6.5  6.4  5.0  4.7  3.8  3.4  3.3  3.2  3.1   3.0   2.9  2.0   2.0  1.1  0.5  0.1  -0.6
SANTANA      108    50    8.4  7.7  7.4  7.1  5.4  4.7  4.2  3.7  2.7  0.8  0.3  -0.1   
PETTITTE     103    58    8.5  6.2  6.0  3.9  3.8  3.7  3.7  3.3  3.3  3.1  2.8   2.6   2.4   2.3   2.2  2.2  1.8  1.2
HUDSON       101    65    7.4  7.1  5.8  5.3  4.6  4.1  4.0  3.8  3.3  3.0  2.9   1.9   1.3   1.2   1.1  0.9  0.8 
BUEHRLE      100    67    6.3  6.1  5.4  5.0  5.0  4.6  3.9  3.8  3.8  3.6  3.3   2.5   2.3   2.2   1.3  0.8
OSWALT        98    68    7.0  6.6  6.2  6.0  6.0  4.8  4.3  3.9  2.8  2.5  1.9  -0.1  -0.8
F HERNANDEZ   94    78    7.2  6.0  5.8  5.1  5.1  4.5  4.3  4.1  3.5  2.7  1.3   1.3   0.8  -0.5  -0.9  
LEE           93    82    9.0  6.9  6.8  6.2  5.8  4.4  2.4  2.1  1.0  0.6  0.5   0.4  -0.9


It's not like there's a lot of space between all these guys despite the gap in their rankings. The way my system is set up, peak and career are given equal weight, and the bonus is there only for really high achievers on a per inning basis. Santana gets a little bit of bonus, the other guys don't. Here Sabathia comes across as a little better version of Andy Pettitte: A few All-Star seasons then a whole lot of merely good seasons. Pettitte drops off to merely good two seasons earlier than Sabathia. Hudson and Buehrle are both in this model too in their ways. Hernandez, Oswalt, Lee, and Santana, meanwhile, are all variations on the theme of Sandy Koufax. But only one of them, Santana, actually gets over the in/out line. His front four seasons are really impressive, he has another All-Star year then a near-All-Star year on top of that. Koufax's seasons in my system run this way:
10.1 9.3 8.1 6.4 5.2 3.9 1.9 1.0 0.8 0.7 0.6 -0.5.
Santana has more depth to his career and the same exact 7-year peak value as a result. In my way of looking at things, he's superior to Hernandez, Oswalt, and Lee. However, let's remember that the CHEWS+ rating above is based on a 235-person Hall (the HoME mimics the HOF). In HOM terms, Oswalt is just on the good side of the in/out line. Hernandez and Lee are three to five slots on the wrong side.
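
The peak comparison with Koufax can be checked from the two season lines quoted in this post. A minimal sketch, assuming "7-year peak" means the sum of the seven best seasons; the helper name is mine.

```python
def peak7(seasons):
    """Sum of the seven best seasons in the list."""
    return sum(sorted(seasons, reverse=True)[:7])

# Season values copied from the table and the Koufax line above.
santana = [8.4, 7.7, 7.4, 7.1, 5.4, 4.7, 4.2, 3.7, 2.7, 0.8, 0.3, -0.1]
koufax = [10.1, 9.3, 8.1, 6.4, 5.2, 3.9, 1.9, 1.0, 0.8, 0.7, 0.6, -0.5]

for name, seasons in (("Santana", santana), ("Koufax", koufax)):
    print(name, round(peak7(seasons), 1), round(sum(seasons), 1))
# Santana 44.9 52.3
# Koufax 44.9 47.5
```

Same peak, with Santana's extra career total coming from the seasons after his best seven.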

Anyway, it's not like I'm infallible or that my system of analysis and ranking is bulletproof, but I wanted to share my perspective on it if only because we truly are splitting hairs among these guys. I concur with Mr. Cobb once again (and it's still wise to do so) that it's very possible for Santana to be in an elect-me position and Oswalt still trying to gain traction. I don't oppose Oswalt, but I'm pretty certain I prefer two other pitchers to him (Santana and Pettitte) and I'm fairly certain I prefer Hudson. Buehrle and Oswalt. Oswalt and Buehrle... who knows. But I don't think we will settle that question in our next election.
   245. Howie Menckel Posted: April 16, 2020 at 09:26 PM (#5940827)
so I stumbled across a journalist who wants to write a story about Ted Trent, a Negro Leagues pitcher from 1927-39 who apparently was Bethune-Cookman's first professional athlete.

guy says that Trent rated a stratomatic card, which I guess means he's not a nobody.

but I know we have some true experts here who might be able to help.

in these times, I think we all enjoy helping more than ever where we can.
   246. Dr. Chaleeko Posted: April 16, 2020 at 10:43 PM (#5940861)
Howie, I’ll be glad to share the YOY MLE for him in this thread tomorrow. For the more biographical stuff, Kevin Johnson (KJOK) or Gary Ashwill are two guys really in the know. Though they haven’t frequented the HOM in years, you may be able to ping them via BTF’s email feature or find them on FB (especially in the Negro Leagues history group there) and maybe on Twitter.
   247. Howie Menckel Posted: April 16, 2020 at 11:01 PM (#5940868)
thanks, good Dr.!

have never been on Facebook (it has been years since anyone told me THAT was a bad idea, lol), and not familiar with the BBTF email feature.

for any and all interested...

that's the sportsjournalists.com page where I noticed it.
site is not paywalled or anything.

can't recommend a general perusal there - it would be a little like reading text messages of Titanic passengers, if they had iPhones back then.

but this is one of those "and the band played on" threads.

and at least the threat is not so much a fatal iceberg collision as a terrible upheaval of many journalists losing their jobs. which is awful, too. but all things are relative.
   248. Dr. Chaleeko Posted: April 17, 2020 at 11:36 AM (#5941047)
Here are Ted Trent's MLEs.

TED TRENT 1927-1939
TRANSLATED TO NATIONAL LEAGUE 1927-1939

           PITCHING                                 | BATTING   |  TOTAL
YEAR  AGE   IP    RA   RA9  lgRA9  RAA  pWAA  pWAR  | PA  bWAR  |  WAR
========================================================================
1927   23  270    90  3.00   4.58   47   5.2   7.9     90  -0.7     7.2 
1928   24  270   131  4.37   4.70   10   1.0   3.8     90  -0.6     3.2
1929   25  200    82  3.67   5.36   38   3.6   5.7     67  -0.3     5.3
1930   26  230   139  5.43   5.68    6   0.6   3.1     77  -0.4     2.6
1931   27  260   132  4.57   4.48   -3  -0.3   2.4     87  -0.6     1.8
1932   28  270   140  4.68   4.60   -3  -0.3   2.6     90  -0.6     2.0
1933   29  240   125  4.67   3.97  -19  -2.0   0.4     80  -0.7    -0.3 
1934   30  270   128  4.27   4.68   13   1.3   4.1     90  -0.5     3.6
1935   31  270   115  3.84   4.71   26   2.7   5.5     90  -0.7     4.8
1936   32  230   119  4.66   4.71    1   0.1   2.5     77  -0.4     2.1
1937   33  220   117  4.81   4.51   -7  -0.7   1.5     73  -0.4     1.1
1938   34  190    98  4.65   4.42   -5  -0.5   1.5     63  -0.3     1.1
1939   35   40    19  4.16   4.44    1   0.1   0.5     13  -0.1     0.5
------------------------------------------------------------------------
TOTAL     2960  1435  4.36         106  10.8  41.5    987  -6.7    35.0
 
LEGEND
IP = Innings pitched
RA = Runs allowed
RA9 = Runs allowed per nine innings (just like ERA but with all runs included)
lgRA9 = National League's Runs allowed per nine innings
RAA = Runs saved relative to an average National League pitcher (positive = fewer runs allowed than average)
pWAA = Wins Above Average as a pitcher
pWAR = Wins Above Replacement as a pitcher
PA = Plate appearances
bWAR = Wins Above Replacement as a batter
WAR = pWAR + bWAR, his total contribution
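
A short sketch of how the RA9 and RAA columns relate, for anyone who wants to reproduce a row. The formulas are the standard per-nine-innings ones and match the rows I checked, but treat them as my reading of the table and not the exact MLE method; the function names are hypothetical. Small mismatches are possible because IP is rounded in the table.

```python
def ra9(runs_allowed, innings):
    """Runs allowed per nine innings."""
    return 9 * runs_allowed / innings

def runs_above_average(innings, pitcher_ra9, league_ra9):
    """Runs saved versus a league-average pitcher over the same innings."""
    return (league_ra9 - pitcher_ra9) * innings / 9

# 1927 row: 270 IP, 90 RA, lgRA9 4.58
print(round(ra9(90, 270), 2))                      # 3.0
print(round(runs_above_average(270, 3.00, 4.58)))  # 47
# 1933 row: 240 IP, RA9 4.67, lgRA9 3.97
print(round(runs_above_average(240, 4.67, 3.97)))  # -19
```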




   249. Dr. Chaleeko Posted: April 18, 2020 at 12:28 AM (#5941397)
Rankings dump for key returnees, presented sans comment.

CATCHERS
NAME        CHEWS+ RANK                     >>>SEASONS ORDERED FROM BEST TO WORST<<<
=========================================================================================================================================
CAMPANELLA*  125    11     9.6  8.9  7.4  6.6  6.4  5.4  3.3  2.2  1.5  0.8
TORRE        123    12     7.8  6.1  6.0  5.6  5.5  5.4  5.1  3.7  3.1  3.1  2.3  2.0  1.8  1.5  0.4   0.0  -0.3
MAUER        120           8.5  7.2  6.8  6.3  4.9  4.4  4.0  3.8  2.9  1.9  1.6  1.5  1.4  1.3  1.0
BENNETT      120    13     7.2  6.7  6.5  6.4  5.9  5.2  5.1  3.1  2.8  2.5  1.4  1.2  1.0  0.9  0.4
SCHANG       118    14     6.7  5.4  5.4  5.4  4.7  4.5  4.3  4.2  3.7  3.6  3.4  3.3  3.2  3.0  2.4   1.1   0.2  -0.2  -0.6  
D WHITE      116    15     5.9  5.3  5.1  4.9  4.9  4.8  4.0  4.0  3.8  3.5  2.7  2.6  2.4  2.2  1.9   1.8   1.7   1.5   0.8   0.7
BRESNAHAN    111    16     7.9  6.7  5.7  4.9  4.6  4.4  4.0  3.8  3.2  2.9  1.9  1.8  1.7  1.1  0.6  -0.1  -0.3
T SIMMONS    111    17     6.1  5.6  5.6  5.6  5.2  4.9  4.5  4.2  3.8  3.6  3.5  2.9  1.0  0.8  0.3   0.3   0.2   0.0   0.0  -0.2  -2.2   
MUNSON       111    18     7.1  5.9  5.8  5.6  5.5  4.9  4.5  4.1  3.4  2.4  0.4
*Not including MLE seasons

FIRST BASE
NAME        CHEWS+ RANK                     >>>SEASONS ORDERED FROM BEST TO WORST<<<
===============================================================================================================================================
OLERUD       103    20     8.5  7.6  6.3  5.2  5.2  5.1  3.7  3.6  3.5  2.8  2.0  2.0  1.8   1.6   1.0   0.4   0.0  
BECKLEY      102    21     5.8  5.7  5.2  5.0  4.7  4.6  4.6  4.5  4.3  4.0  3.8  3.6  3.2   3.0   2.6   2.2   1.9   0.9  0.0  -0.8
IN/OUT LINE
W CLARK      102    22     8.8  7.4  5.9  5.0  4.9  4.4  4.3  4.3  3.3  2.7  2.5  2.2  2.2   1.6   1.5    
KILLEBREW    100    23     6.6  6.5  6.2  5.4  5.2  4.7  4.0  3.9  3.9  3.5  3.5  3.1  2.7   2.0   0.6   0.2   0.1   0.0  0.0  -0.1  -0.2  -0.2  
BERKMAN       97    24     7.3  6.4  6.4  5.7  5.1  4.7  4.6  4.0  3.8  3.6  2.2  2.0  0.5   0.2  -0.2 
CHANCE        95    25     8.1  6.5  6.2  6.0  5.2  4.4  3.3  2.6  2.2  2.1  1.5  1.0  0.9   0.4   0.1   0.1   0.0  
TENNEY        95    26     6.3  5.9  5.5  5.3  5.0  4.8  4.8  4.0  3.4  2.5  2.0  2.0  2.0   1.9   1.2   0.8   0.1 
HODGES        93    27     7.7  6.2  6.1  5.9  5.7  5.0  4.3  3.8  3.0  2.6  1.5  0.5  0.0   0.0  -0.2  -0.5  -0.6
CAMILLI       93    28     7.4  6.5  6.4  6.2  5.9  5.5  5.3  2.1  0.8  0.6  0.1  0.0
GIAMBI        93    29     9.4  7.5  6.8  6.2  4.6  4.0  3.0  2.4  1.9  1.8  1.2  1.0  0.9  -0.1  -0.3  -0.4  -0.6  -0.6


SECOND BASE
NAME        CHEWS+ RANK                     >>>SEASONS ORDERED FROM BEST TO WORST<<<
===============================================================================================================================================
CHILDS       109    16     8.9  7.3  6.5  6.4  5.9  5.6  5.1  3.7  3.5  2.0  1.8  1.0  -0.1
HERMAN       107    17     7.8  7.8  7.1  5.3  4.4  4.3  4.0  3.7  3.5  3.5  3.0  2.4   0.7   0.7  -0.2
UTLEY        106           8.3  7.7  6.7  6.0  5.5  5.1  4.2  3.8  3.7  2.4  1.6  1.4   0.5   0.4   0.2  -0.2
RANDOLPH     103    18     6.6  5.7  5.2  5.1  4.6  4.4  4.4  3.9  3.8  3.7  3.6  3.6   3.0   2.8   2.4   1.8  0.7  -0.1
MCPHEE       103    19     6.2  5.6  5.5  5.1  5.1  4.3  4.2  4.1  4.1  3.7  3.6  3.1   2.1   2.1   1.8   1.7  1.5   1.0
KENT         103    20     7.4  7.4  5.3  5.2  4.7  4.3  4.1  4.1  3.6  3.1  2.8  2.5   2.4   2.1   1.2   0.8  0.4   
PHILLIPS     102    21     7.0  6.4  5.6  5.3  5.1  4.6  4.5  3.7  3.2  2.9  2.8  2.5   2.2   1.8   1.5   0.7  0.6   0.2  
IN/OUT LINE
DOERR*        99    22     6.9  6.7  5.5  5.1  5.1  5.0  4.5  3.8  3.6  2.8  2.6  2.6   2.1  -0.4
BARNES        96    23     8.0  7.6  7.5  7.4  4.8  4.3  2.5  2.2  0.6
FREY          94    24     7.1  7.0  5.8  5.5  3.8  3.7  3.3  2.7  2.2  1.4  0.9  0.3   0.2
H RICHARDSON  93    25     7.5  6.0  5.0  4.9  4.7  3.9  3.6  3.6  2.8  2.8  1.7  1.3   0.5
DUNLAP        92    26     6.7  6.4  5.9  5.7  5.6  5.0  5.0  2.6  2.2  1.3  0.1  0.0  
*No war credit

THIRD BASE
NAME        CHEWS+ RANK                     >>>SEASONS ORDERED FROM BEST TO WORST<<<
===============================================================================================================================================
ROLEN        122     8     8.2  6.7  6.2  6.1  5.8  5.8  4.9  4.8  4.7  4.2  4.0  2.9   1.7   1.3   1.2   1.0   0.1
ALLEN        121     9     9.3  9.1  7.6  6.9  5.9  5.5  4.1  4.0  3.5  2.8  2.6  1.2   0.2   0.0  -0.3
BELTRE       121           8.7  6.7  5.6  4.9  4.8  4.6  4.6  4.2  4.1  3.7  3.5  3.4   3.1   3.0   3.0   2.6   2.5   2.1   1.8   1.2   0.5 
B ROBINSON   119    10     8.1  6.7  6.7  5.7  5.5  5.0  4.7  4.4  4.0  3.8  3.5  3.4   3.3   3.1   2.7   1.7   1.0   0.4   0.0  -0.2  -0.2  -0.3  -0.5  
J COLLINS    118    11     8.6  7.2  7.0  6.2  6.2  5.4  4.7  4.4  4.2  3.1  2.9  2.3   1.0  -0.1
MOLITOR      118    12     6.3  5.9  5.9  5.8  5.4  5.3  5.1  4.8  4.6  4.4  4.0  3.4   3.4   3.0   2.6   2.0   1.4   1.4   1.3   0.1  -0.2
K BOYER      117    13     8.4  7.7  7.4  6.6  6.2  5.4  5.2  5.0  3.8  2.3  2.3  2.1   1.6   1.0  -0.1
BELL         116    14     8.0  7.5  6.2  6.2  5.5  5.0  5.0  4.7  3.8  3.8  3.2  3.1   2.9   2.1   1.5   1.1  -0.3  -0.7  
NETTLES      115    15     7.2  6.9  6.7  5.6  5.5  5.5  5.1  5.1  4.5  3.6  3.1  2.4   2.2   2.1   1.7   1.0   0.5   0.5   0.2   0.0  -0.2  -0.7 
E MARTINEZ   115    16     8.3  6.7  6.5  6.1  5.5  5.3  5.3  4.9  4.8  4.5  3.2  3.2   2.5   0.3   0.3   0.0  -0.2  -0.5
DA EVANS     112    17     9.0  7.0  5.0  4.8  4.8  4.7  4.7  4.3  4.2  4.1  3.7  3.6   3.1   2.2   1.8   1.3   1.0   0.3   0.2  -0.2  -0.2
LEACH        110    18     7.3  7.1  6.3  5.8  5.2  4.9  4.7  4.4  4.1  3.6  3.0  2.3   2.1   1.7   1.1   1.1   0.3   0.1  -0.1
D WRIGHT     105           7.8  6.7  6.1  6.0  6.0  5.4  4.3  3.5  2.6  2.3  2.0  1.1   0.4   0.0
MCGRAW       101    19     8.2  7.8  5.8  5.8  5.4  5.4  3.8  3.7  1.7  0.8  0.8  0.2   0.1   0.0   0.0
BANDO        101    20     8.1  6.5  5.9  5.8  5.3  4.6  4.5  4.2  3.9  2.9  2.7  0.7   0.6   0.3   0.0  -0.9  
CEY           98    21     6.8  6.5  5.5  5.5  4.8  4.8  4.7  4.1  3.4  2.7  2.1  2.1   1.0   0.7   0.3   0.1   0.0
IN/OUT LINE
GROH          97    22     6.9  6.1  6.0  5.8  5.6  4.8  3.7  3.7  3.6  3.4  2.0  1.7   0.3   0.1   0.0  -0.1  -0.1  
WILLIAMSON    95    23     6.9  6.7  6.3  5.5  5.2  5.0  4.1  3.4  2.9  2.8  2.1  0.2  -1.2  
ELLIOTT       93    24     6.8  6.7  4.4  4.4  4.6  4.6  4.0  3.6  2.9  2.9  2.9  2.7   1.3   1.2  -0.1
HACK          92    25     6.4  5.2  5.2  4.9  4.4  4.4  3.4  3.4  3.4  3.0  2.8  2.3   2.2   1.2   1.2   0.7
MA WILLIAMS   91    26     7.2  5.7  5.6  5.2  4.6  4.4  3.8  3.7  2.8  2.3  1.6  1.3   1.0   0.8   0.4   0.2  -0.1  

CENTERFIELD
NAME        CHEWS+ RANK                     >>>SEASONS ORDERED FROM BEST TO WORST<<<
===============================================================================================================================================
BELTRAN      117          8.5  6.9  6.6  6.5  5.9  5.5  5.1  4.3  3.8  3.7  3.3  3.1  2.5   2.0   1.2   0.9   0.5   0.2   0.0  -0.8
A JONES      116    10    8.8  8.0  7.5  7.1  6.1  5.4  5.1  5.0  3.4  3.4  2.4  1.5  0.8   0.5   0.3   0.1  -0.9 
SNIDER       111    11    8.6  8.6  7.8  6.8  5.2  4.7  4.2  4.0  3.2  2.2  2.0  1.3  1.2   1.2   1.0   0.1  -0.4  -0.5
HINES        107    12    7.1  6.3  5.7  5.5  5.1  4.8  4.6  4.6  4.1  4.1  3.4  3.2  3.0   2.9   2.1   1.2   0.5   0.5  -0.6  -1.3 
LOFTON       107    13    8.6  6.7  5.4  5.3  4.9  4.5  4.2  4.2  4.0  3.8  3.5  3.3  2.3   2.1   1.5   1.3  -0.1  
J WYNN       107    14    7.6  7.5  7.2  6.4  5.8  5.4  5.0  4.3  3.4  3.2  2.4  1.0  0.5  -0.8  -0.8  
W DAVIS      102    15    7.6  5.7  5.6  5.4  4.7  4.6  4.4  3.4  3.3  3.2  2.8  2.6  2.4   2.2   2.2   1.9   1.0  -0.2 
CAREY        100    16    5.9  5.8  5.7  5.5  5.4  4.6  4.3  4.0  3.7  3.3  3.3  2.5  2.4   2.2   1.9   1.5   0.5   0.3   0.1  -1.3
M GRIFFIN     99    17    7.9  6.8  5.6  5.0  5.0  4.7  4.7  4.3  3.6  3.2  3.1  3.1
BE WILLIAMS   99    18    6.6  6.6  5.7  5.5  5.1  5.0  4.9  4.9  4.5  3.3  2.0  1.4  1.0   0.9  -0.1  -0.6
O’ROURKE      98    19    5.7  4.8  4.7  4.4  4.3  4.0  3.7  3.6  3.5  3.2  3.1  3.0  3.0   2.7   2.5   2.3   2.0   1.9   1.8   1.7  0.8  0.6
GORE          99    20    6.6  6.4  6.3  5.6  5.5  5.2  4.2  3.6  3.5  2.8  2.5  1.8  1.7   0.2  
CEDENO        97    21    8.1  7.0  5.9  5.4  5.1  4.9  4.6  2.2  2.0  2.0  1.8  1.7  1.0   0.8   0.8   0.2  -0.2 
IN/OUT LINE
BERGER        97    22    7.4  7.0  6.3  6.2  5.2  4.7  4.7  2.9  2.7  1.0  0.3 
BROWNING      97    23    6.6  6.6  6.5  6.0  4.6  4.6  4.1  3.5  3.3  2.9  2.8  0.3  0.0
BUTLER        97    24    7.3  5.9  5.6  5.5  4.8  4.5  4.3  3.7  3.4  3.4  3.1  3.0  3.0   0.8   0.2   0.1  -0.9 
C LEMON       96    25    6.4  6.0  5.8  5.5  5.1  4.6  4.5  4.3  3.9  2.9  2.4  1.7  1.5   1.3   0.0   0.0

RIGHT FIELD
NAME        CHEWS+ RANK                     >>>SEASONS ORDERED FROM BEST TO WORST<<<
===============================================================================================================================================
GWYNN        112    13     8.6  7.2  6.8  6.0  5.0  4.8  4.2  4.1  3.8  3.8  3.6  3.4  2.8   1.8   1.7   1.4   1.1   0.7   0.6   0.2  
SOSA         109    14    10.0  6.6  6.4  6.2  5.7  5.3  5.1  4.8  3.6  3.1  2.1  1.9  1.5   0.4   0.4   0.2  -0.3  -0.4 
SHEFFIELD    107    15     7.0  6.9  6.7  5.4  4.9  4.7  4.5  4.3  3.8  3.5  3.3  3.0  2.9   2.8   2.6   2.3   0.8   0.5  -0.1  -0.2  -0.3  -0.5      
DAWSON       105    16     9.3  7.3  6.4  6.4  4.7  4.1  3.8  3.6  3.2  3.1  2.7  2.7  2.6   2.3   2.2   1.4   0.0  -0.1  -0.2  -0.4  -1.4  
KEELER       104    17     8.6  6.4  6.0  5.4  5.1  5.1  4.4  4.3  4.2  4.1  3.4  3.2  2.4   0.9   0.7   0.4   0.3   0.1  -1.5
SLAUGHTER*   102    18     8.6  6.4  6.0  5.4  5.1  5.1  4.4  4.3  4.2  4.1  3.4  3.2  2.4   0.9   0.7   0.4   0.3   0.1  -1.5
DW EVANS     101    19     9.2  6.6  5.2  4.2  4.1  4.0  3.9  3.6  3.6  3.5  3.3  3.2  2.9   2.8   1.8   1.3   1.1   1.0   0.9   0.5
WINFIELD     101    20     8.0  5.6  5.3  5.3  4.9  4.6  4.3  3.8  3.4  3.1  3.0  3.0  2.9   2.5   2.3   1.4   0.9   0.7   0.4   0.4   0.2  -1.2
BO BONDS     101    21     7.5  6.6  6.5  5.6  5.2  5.0  4.9  4.8  4.6  3.5  2.6  2.6  0.1  -0.3
SUZUKI       100           8.2  7.9  6.2  5.5  5.2  5.0  4.7  4.4  3.5  3.3  1.5  1.0  0.8   0.8   0.5  -0.1  -0.1  -0.4  -1.4
IN/OUT LINE
V GUERRERO   100    22     7.1  6.5  6.4  5.9  5.2  4.7  4.7  4.4  4.3  3.6  2.5  1.9  1.7   0.7   0.2  -0.1
R SMITH       99    23     6.7  5.8  5.4  5.3  5.1  5.0  4.7  4.3  4.1  4.1  3.1  3.0  2.6   2.6   1.6  -0.1  -0.4
HOOPER        98    24     6.8  5.7  4.6  4.6  4.4  4.4  4.3  4.3  4.2  4.2  4.0  3.9  3.8   2.8   1.6   1.5   0.8
ABREU         97    25     6.4  6.3  5.7  5.2  5.1  5.0  4.6  4.3  4.1  3.2  3.0  2.8  2.2   1.7   0.3  -0.2  -0.3  -0.5
S RICE        97    26     5.7  5.4  5.3  5.2  5.2  4.9  4.5  4.3  3.6  3.6  3.4  3.2  2.5   2.3   1.3   1.2   0.6   0.3   0.1  -0.1
K KELLY       97    27     9.4  6.1  6.0  5.3  5.0  3.9  3.7  3.4  3.3  2.8  2.7  2.6  1.8   1.5  -0.2  -0.3
NICHOLSON     93    28     7.6  7.6  6.9  5.3  5.3  4.4  4.2  2.8  1.9  1.5  1.3  1.1  1.1   0.2   0.0  -0.4
*No war credit     

   250. kcgard2 Posted: April 19, 2020 at 01:17 PM (#5941876)
epoc (#217)

Sorry I'm late to the party here but I think there's a pretty glaring issue with your system, at least in my opinion. Gooden makes a good example.

5. Dwight Gooden - 5.85 (1984-85)
6. George Foster - 5.75 (1975-81)

Gooden has a z-score almost 6, because his best consecutive (again with consecutive, but let that go for now) stretch happened to only be a two-year stretch. Foster has a z-score almost 6 also, but over a stretch of 7 years. And I can't tell if you are adjusting for this in any way or not. The following description is difficult to understand.
I then apply a formula that estimates the z-score for each of those n-year stretches. The formula is basically a best-fit line that accounts for the empirical historical "difficulty" of averaging x SDs over n seasons. Whichever stretch of consecutive seasons yields the highest estimated z-score (whether a single season or twenty) is the stretch I use.

First, I'm confused why anything is being "estimated" at this point. You have the player's in-season z-scores for each of the n years. That second sentence makes me wonder whether you are making an adjustment here such that two players with equal mean z-scores, over two different lengths of time, get adjusted such that the player whose value of n is greater will come out with a higher adjusted score. This is what I would argue you should do, at minimum. If that's not what you are doing, I'd appreciate if you could explain, perhaps with an example.

At bottom, I'm trying to get at the idea that we should not be inducting *any* players based on what their performance may have been in a sample of two seasons. I guess I am confused where the rest of the career is going in your analysis, because it seems like there should be a lot of players with a single year aberration that go straight to the top of your rankings. It must be happening in that "difficulty of averaging" adjustment.

Also, in the case of Gooden, your uberstat is placing him based entirely on what he did in those two seasons, yes? It's placing everyone based on what they did only in the stretch of best consecutive n seasons according to the uberstat, according to my understanding.

   251. Dr. Chaleeko Posted: April 19, 2020 at 05:46 PM (#5941992)
KCGard brings up an interesting point that I haven't been able to articulate about Gooden yet, so I haven't really discussed it. Basically, I don't believe that any other candidate's support depends as much upon one single season as Gooden's.

Yes, Gooden actually completed that season with all that WAR, but at the same time, in the remaining 2524+ innings of his career, he was 14 WAA and 36 WAR. That's not a bad pitcher at all. But it's not a HOM pitcher at all. Here are all pitchers in MLB history through age 35 (Gooden's last season) with 12-16 WAA in 2250-2750 innings:
Bruce Hurst
Ice Box Chamberlain
Larry Dierker
Burt Hooton
Slim Sallee
Deacon Phillippe
Bob Rush
Kid Gleason
Bobo Newsom.

None of them is anything like a HOMer, even if you added Gooden's 9.8 WAA from 1985 to their totals. Gooden's 24 WAA is simply not an impressive total for a HOM pitcher. The only HOM pitcher with 24 WAA through age 35 in 2500-3100 innings is Wes Ferrell, and he needs his tremendous batting performance to get over the line. So Gooden's singular 1985 plays tricks on folks like me because it makes it look like Gooden was a better pitcher overall than he actually was. Gooden did not demonstrate repeatability of his 1985 skill set (for various reasons), and he was never before or afterward remotely similar to his 1985 self. An interesting contrast can be drawn to Red Faber whose peak is highly dependent on one massive season, but he also has another big year and another All-Star year or two that support the notion that his ranking is not the product of a fluke year.

On the batting side of the ledger, it's very hard to come up with a player like Gooden, but Rico Petrocelli is probably the closest. A ten-win 1969 (expansion-aided, of course), and no other season above 5.0 WAR in a medium-length career (6200 PA). Al Rosen's 10+ WAR season at least has a couple 6.0+ WAR seasons around it, and so does Sammy Sosa's. Actually, Bryce Harper might end up being the batting version of Doc Gooden if he never gets above 5 WAR again.

Gooden's case is one of the few where just looking at his value/uber figures confuses things more than clarifies. I believe it has to be examined a little more individually than virtually any other candidacy.
   252. Howie Menckel Posted: April 19, 2020 at 06:50 PM (#5942023)
I have no statistical insight to add on Gooden, but I went to Shea Stadium for almost all of his home starts in 1985 (and almost no other games - in between his starts might as well have been exhibition games).

it's been noted, but Gooden already was 20-4 with a 1.81 ERA through August.

in September, he went into turbo drive. 6 starts, all complete games.

4-0 - and in the two no decisions, he pitched 9 scoreless in each before departing.

Doc went 13-2 with 6 complete game shutouts in his 18 home starts that year, with a 1.50 ERA and an 0.889 WHIP.

I guess I just want to give my young self a pat on the back. I didn't have much money, but I knew what I was seeing - and I just couldn't get enough.

who says youth is always wasted on the young?
:)



   253. Jaack Posted: April 19, 2020 at 07:43 PM (#5942039)
There is a pretty big dispute between the metrics as to how effective Gooden actually was from 1986-1993. FIP and Kiko's pWORL think he remained a very good pitcher, with perhaps another standout season in 1990 - not 1985 standout, the type of guy who finishes 3rd or 4th in Cy Young voting. The metrics highly derivative of RA9 (fangraphs RA9 WAR and bbRef WAR) see him as a slightly above average starter. Kiko's eWORL and gWAR see him as a solid number 2-ish starter.

What strikes me is that pWORL and fWAR are not two metrics that tend to like the same controversial guys. They both approach the problem of pitching value from different directions, so it's notable that they get to the same result despite Gooden's lack of success in terms of RA. The only other pitcher I can think of that sees this result is... Tommy John.

So I guess the question I have to ask is - what do Tommy John and Dwight Gooden have in common? I'd think nothing, but there seems to be some quality about both that the metrics not based on RA seem to like.
   254. Dr. Chaleeko Posted: April 19, 2020 at 08:43 PM (#5942054)
Jaack, isn’t it obvious? We have finally discovered Kiko’s East-coast bias!
   255. Chris Cobb Posted: April 20, 2020 at 12:30 AM (#5942109)
So I guess the question I have to ask is - what do Tommy John and Dwight Gooden have in common? I'd think nothing, but there seems to be some quality about both that the metrics not based on RA seem to like.

Well, they don't have much in common, but one thing that they do have in common, at least as far as Gooden's 1988 and 1990 seasons are concerned, which are the two seasons where RA/9-based measures and FIP-based measures disagree most significantly, is that they gave up a lot of hits on balls in play.

By Fangraphs' calculations, Tommy John is 10.7 wins below average on balls in play for his career, which accounts for the entirety of the difference between his RA/9-WAR and his FIP WAR. He is below average on BIP-wins for 17 out of 26 seasons of his long career, sometimes by quite a bit, although he has only one season where he is more than 2 wins below average, -2.5 in 1988. This is a pretty well known feature of John's pitching profile, and it is not an uncommon one for groundball pitchers.

Power pitchers like Gooden are frequently above average on balls in play. Gooden himself is not, although he is much less below average than Tommy John. Although Gooden, like John, has a large gap between his FIP-WAR and his RA/9-WAR, BIP wins below average account for a minority of the gap, -2.6 wins. He is also 3.8 wins below average on "LOB-Wins" which is the grab-bag of unanalyzed factors that Fangraphs places under the heading of "sequencing" -- the factors that lead to runners scoring at higher rates than average once they reach base. It is not uncommon for power pitchers to be below average here, while finesse pitchers are more often above average. Nolan Ryan is, of course, the poster child for the phenomenon. Gooden is a member of what is probably the smallest subset of elite pitchers: pitchers who are highly successful overall despite being below average on balls in play and on preventing runners from scoring once they reach base. He is not far below average in either one.
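
A rough tally of Gooden's career figures above, assuming (as the paragraph implies) that the gap between RA/9-WAR and FIP-WAR is just BIP-wins plus LOB-wins. The function and variable names are hypothetical, and this is a sketch of the bookkeeping, not Fangraphs' own calculation.

```python
def ra9_vs_fip_gap(bip_wins, lob_wins):
    """RA/9-WAR minus FIP-WAR, assumed equal to BIP-wins + LOB-wins."""
    return bip_wins + lob_wins

# Gooden's career figures as quoted above.
gooden_bip, gooden_lob = -2.6, -3.8
gooden_gap = ra9_vs_fip_gap(gooden_bip, gooden_lob)
gooden_bip_share = gooden_bip / gooden_gap

print(round(gooden_gap, 1))        # -6.4
print(round(gooden_bip_share, 2))  # 0.41
```

So balls in play would explain roughly two-fifths of Gooden's gap, versus essentially all of Tommy John's.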

The question for evaluating Gooden's case is, of course, are these below average results his responsibility? Or, more precisely, how much of these results are his responsibility? The fact that, as Baseball-Reference sees it, he pitched in front of below average defenses for his career is probably a contributing factor, but is it the whole story?

I don't have an answer to that question, but one very intriguing feature of Gooden's profile in this respect is that his BIP-wins are highly inconsistent. For most of his career, he is around average, and he is in fact well above average in 1985 and 1986. From 1988-91, however, he is far below average, especially in 1988 and 1990, the two seasons of disputed quality. In these seasons, he registers -2.1 and -2.7 BIP WAR respectively. Where a finesse pitcher like John typically recoups some of his negative BIP WAR with positive LOB WAR (from double plays, outs on the base paths, pickoffs, etc.), Gooden is below average on LOB-wins, just as he is, in modest ways, for most of his career. Overall, in 1990, he loses 4.1 wins between his FIP-WAR (he led the league in FIP era) and RA/9-WAR. That's a very big swing; I think shifts of that magnitude are probably found in less than 1% of pitcher seasons (at least among elite pitchers). So what happened?

I'd love to see someone with more detailed knowledge of the 1990 Mets address this question, but it seems like there were several factors at work, not all of which were Gooden's fault.

(1) He pitched in front of a bad defense: his fielders were, in the aggregate, 36 fielding runs below average
(2) The badness of the defense was concentrated in the infield, especially at second base and catcher. The infield, including catcher, was -52 fielding runs, including -22 at second base and -17 at catcher. Gooden was more of a groundball pitcher than a fly ball pitcher that year, and because he was a right-hander, terrible 2nd base defense and terrible catcher defense (including in failing to throw out base-stealers--more on that in a minute) probably hurt Gooden disproportionately. Baseball-Reference's system adjusts for the poor quality of the defense, but an adjustment based on team defensive average probably won't adjust sufficiently for the specific defensive disadvantages under which Gooden labored. (This is a tiny version of Andy Pettitte's "Derek Jeter stole my HoM plaque" case.)
(3) Although Gooden gave up a lot of singles and ground balls, he netted very few double plays: only 12. In his great 1985 season, when he allowed far fewer baserunners, he had 21 double plays turned behind him.
(4) He was absolutely killed on stolen bases. 60 bases were stolen on him, against only 16 caught stealing. Over the first half of the season, when I think the Mets were going through catchers trying to find someone remotely solid for the position, Gooden gave up 35 steals against 2 caught stealing. That's a lot of lost opportunities for double plays, and a lot of runners who only need a base hit to score them. Even Gooden's bad catchers had better caught stealing percentages overall than occurred in Gooden's starts, so some of the problem is probably on him. In any case, the volume of stolen bases and the high rate of successful steals must have been a significant factor in his weak performance on preventing runners from scoring once they reached base.
(5) Gooden gave up a rather high percentage of line drives, 21%, compared to his peak numbers, which were ridiculous: a 3% line drive rate in 1985; 8% in 1986. 21% is not a terrible rate, but it's not a rate that's going to help a pitcher in 1990 achieve a below-average BABIP.

Having dug into the specifics of Gooden's 1990 a bit, I'm clear as to the reasons why RA/9 and FIP disagree so strongly on this season, and I'm inclined, in this case, to believe that RA/9-based measures are underrating Gooden's performance somewhat. It's also clear that Gooden was no better than average at hit prevention, was below average at controlling the running game, and overall didn't have a good plan for compensating for the defensive shortcomings behind him: one might guess he was still learning how to pitch in a context in which he no longer had the ability he once had to blow batters away. I certainly don't see an argument for accepting FIP-WAR as an accurate measure of his value, either.

It's an interesting case: one doesn't always find single seasons that are extreme enough for the contributing factors to be visible to eye-ball level analysis.

   256. Dr. Chaleeko Posted: April 20, 2020 at 09:06 AM (#5942147)
Chris, I watched a lot of Gooden starts on WOR back in the day, and I recall that his pickoff move was not very good. Kind of long in the limbs and deliberate. Between that, his high leg kick, and a 12 to six curve, he was not good at containing the running game. This may have been The Year of Mackey Sasser, tho I can’t quite remember, but that might provide additional explanation. If Howie is around, I’d ask him to confirm my observations.
   257. Chris Cobb Posted: April 20, 2020 at 10:35 AM (#5942182)
<i>This may have been The Year of Mackey Sasser</i>. Yes, yes it was. I don't know what you recall about that year, but by BBRef's fielding analysis, Sasser was far from the worst of the many catchers the Mets burned through that season. Barry Lyons, Orlando Mercado, and a raw Todd Hundley were a combined -13 fielding runs in only 696.7 innings, which prorates out to about -30 runs for 162 games. All of them had caught stealing rates below 20% (16%, 13%, and 18%). Sasser carried a league average caught stealing rate, although he doesn't appear to have been especially skilled at catching the ball. BBRef has him at -3 fielding runs in 583.3 innings caught, which would be -7.5 runs over 162 games, so, although he was a poor fielder, he was nevertheless a huge improvement over the Lyons/Mercado/Hundley trio, who were epically bad.
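
For anyone who wants to check the proration, here is a rough sketch. It assumes a 162-game season is 162 * 9 = 1458 defensive innings, which is my simplification and not necessarily how BBRef scales it; the function name is made up for illustration.

```python
# Assumption: a full season is treated as 162 games * 9 innings = 1458 innings.
FULL_SEASON_INNINGS = 162 * 9


def prorate_fielding_runs(runs: float, innings: float) -> float:
    """Scale a fielding-runs total to a full-season rate."""
    return runs * FULL_SEASON_INNINGS / innings


# Figures quoted in the post above.
trio = prorate_fielding_runs(-13, 696.7)   # Lyons / Mercado / Hundley combined
sasser = prorate_fielding_runs(-3, 583.3)  # Mackey Sasser

print(f"Trio per 162 games:   {trio:.1f}")
print(f"Sasser per 162 games: {sasser:.1f}")
```

Under that assumption Sasser lands on the -7.5 quoted above, and the trio comes out nearer -27 than -30, which is the same neighborhood and doesn't change the point.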

It would be interesting to line up Gooden's 1990 season of very poor fielding support, in which his 6.8 FIP-WAR plummeted to 2.7 RA/9 WAR and 2.5 BWAR, against, say, Mark Buehrle's 2007 season, where he parlayed 3.3 FIP-WAR into 5.2 RA/9-WAR and 6.1 BWAR while pitching in front of a terrible defense.
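
As a starting point for lining the two seasons up, this small sketch just tabulates the gaps between the three WAR figures quoted above. The dictionary keys and function name are my own labels, not anything from Fangraphs or BBRef.

```python
# WAR figures as quoted in this post; key names are illustrative only.
seasons = {
    "Gooden 1990": {"fip_war": 6.8, "ra9_war": 2.7, "bwar": 2.5},
    "Buehrle 2007": {"fip_war": 3.3, "ra9_war": 5.2, "bwar": 6.1},
}


def war_gaps(season: dict) -> dict:
    """Return the swing from FIP-WAR to RA/9-WAR and from RA/9-WAR to BWAR."""
    return {
        "ra9_minus_fip": round(season["ra9_war"] - season["fip_war"], 1),
        "bwar_minus_ra9": round(season["bwar"] - season["ra9_war"], 1),
    }


for name, season in seasons.items():
    print(name, war_gaps(season))
```

The two seasons swing in opposite directions between FIP-WAR and RA/9-WAR, and BWAR then moves each of them a bit further in the direction RA/9 had already taken them.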
   258. Jaack Posted: April 20, 2020 at 07:21 PM (#5942433)
It would be interesting to line up Gooden's 1990 season of very poor fielding support, in which his 6.8 FIP-WAR plummeted to 2.7 RA/9 WAR and 2.5 BWAR, against, say, Mark Buehrle's 2007 season, where he parlayed 3.3 FIP-WAR into 5.2 RA/9-WAR and 6.1 BWAR while pitching in front of a terrible defense.


I've tried to shy away from BBRef because of results like this. I think their defensive adjustment adds more noise than value - their defensive numbers are shaky enough for evaluating defenders themselves, and trying to use them to account for defense's effect on pitchers is a little too iffy for me. Their numbers end up outside the RA9-FIP spectrum a little too often. I can buy that, in some cases, a pitcher may be worse than their peripherals and also get lucky with a good defense and sequencing luck, resulting in a pitcher's true value being lower than either their RA9-WAR or FIP-WAR would say. But that happens far too often with baseball-reference WAR, and the opposite (a pitcher being better than either RA9 or FIP) is much, much rarer.

I much prefer using an average of the two fangraphs models as a starting point than BBRef's model. Neither fangraphs number may be truly accurate, but both are entirely honest.
----

For Gooden, it seems he had a bit of a positive feedback loop - his faults compounded with the team's faults, which produced middling results, despite Gooden being very good at most aspects of his job. If we give him back some credit for the Mets' defensive failing, he checks in at about the 4-4.5 WAR level for 1990 (and 1988). That gives him one juggernaut season (85), one top line ace season (84), four and a half nice all-starish seasons (86-90, with the half season in 89), and then another three seasons (91-93) of above average padding. That's a wonky career arc, but there's enough there to demonstrate that 1985 wasn't a complete fluke. Norm Cash has a similar-ish career, although not as strong and his big year came in an expansion year.

I don't think Gooden is out of place on the borderline.
-----

In the case of 2007 Buehrle - most of the White Sox' poor defense seems to be Jermaine Dye being just awful and the rest of the defense being mostly fine. For the rotation, Buehrle, along with Javier Vazquez and Jon Garland, actually have positive BIP values. Jose Contreras, along with the bullpen, seems to have suffered the most in terms of actual results. I have to imagine having Jermaine Dye with a walrus stapled to his shoelaces in right field hurt Buehrle, but it couldn't have been that bad - Buehrle's results were still pretty good and his BABIP was slightly lower than his career average. His peripherals were almost exactly his career averages that season as well. 2007 might be the most typical Mark Buehrle season in his career. He did what he normally did and his results were what they normally were. I'm not sure how that translates into the career season that BBRef has, but I'm not inclined to believe that 6.1 number.
   259. Chris Cobb Posted: April 20, 2020 at 11:08 PM (#5942532)
Jaack wrote re pitching WAR:
I've tried to shy away from BBRef because of results like this. I think their defensive adjustment adds more noise than value - their defensive numbers are shaky enough for evaluating defenders themselves, and trying to use them to account for defense's effect on pitchers is a little too iffy for me. Their numbers end up outside the RA9-FIP spectrum a little too often. I can buy that, in some cases, a pitcher may be worse than their peripherals and also get lucky with a good defense and sequencing luck, resulting in a pitcher's true value being lower than either their RA9-WAR or FIP-WAR would say. But that happens far too often with baseball-reference WAR, and the opposite (a pitcher being better than either RA9 or FIP) is much, much rarer.

There's no reason to view FIP-WAR and RA9-WAR as setting the boundaries of a pitcher's value within which a pitcher's true value ought to fall. They are measuring different things. A pitcher's true value can either be above or below their RA9-WAR, depending on whether their fielding support was above or below average. A pitcher's true value can be above or below their FIP-WAR, depending on whether their own contributions to the success of the team's fielding performance when they are on the mound is above or below average. In fact FIP-WAR is the value that should be considered the unreliable outlier when looking at RA/9-WAR, BBREF-WAR, and FIP-WAR. It is unreliable because it leaves completely unmeasured a substantial portion of the pitcher's actual value. It also makes the unwarranted assumption that the elements of pitching whose value it is measuring are actually fielding independent.

The cases of the Gooden and Buehrle seasons we are looking at are instructive with respect to the issue of the relationship between supposedly independent pitching factors and baserunner advancement. To produce the large number of strikeouts and (hopefully) weakly struck balls in play, Dwight Gooden used a big, lengthy delivery to generate power pitches. He achieved good results with it, but that same delivery, coupled with him being right-handed, made him highly vulnerable to stolen bases, especially when he was working with catchers who were below average at throwing out base-stealers. That cost Gooden 60 bases in 1990. Those stolen bases place a significant opportunity cost on Gooden's strikeouts. Gooden also had 3 balks and 6 wild pitches, leading to more baserunner advancement. The ability of baserunners to advance off of first base reduced the number of double play opportunities, contributing to the low number of double plays turned behind Gooden in 1990. In 2007, Mark Buehrle allowed 2 stolen bases (against 3 caught stealing). He threw one wild pitch, and had no balks. (Amusing fact: opposing players stole 59 bases on Mark Buehrle in his career, one fewer than were stolen on Gooden in 1990 alone.) So Buehrle's handling of himself on the mound with respect to baserunners allowed approximately three bases advanced, whereas Gooden allowed 69. That's a vast difference, almost entirely attributable to the pitcher, and entirely left out of consideration by FIP, even though Gooden's allowance of all those bases is directly linked to the style of pitching that enabled him to notch 108 more strikeouts than Buehrle. Partly because Buehrle held runners on first, he was able to induce more double plays than Gooden as well, even though he allowed fewer runners to reach first base by allowing fewer walks and suppressing hits on balls in play.
Gooden's catcher shares responsibility for the stolen bases, but on Buehrle's side of the equation, he doesn't even need a good defensive catcher; he is doing almost all of the work of suppressing stolen bases himself. All he needs is a first baseman who is able to receive throws over to first and a catcher who is able to receive pitches. Gooden was unlucky in his fielding support in 1990--the base advancement results are not all his fault--but Buehrle was just being Buehrle: this was a standard part of the way he did his job as a pitcher of preventing runs. It's readily observable in the statistics and its value can be fairly readily estimated, but this value is not accounted for in FIP-WAR at all.
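
The bases-advanced tally above is simple enough to write down. This is only a sketch of the arithmetic: the class and field names are mine, and it counts every stolen base, balk, and wild pitch as exactly one base, which ignores multi-runner advances.

```python
from dataclasses import dataclass


@dataclass
class RunningGameLine:
    """Events that let runners advance without a ball being put in play."""
    stolen_bases: int
    balks: int
    wild_pitches: int

    def bases_allowed(self) -> int:
        # Simplification: each event is counted as a single base advanced.
        return self.stolen_bases + self.balks + self.wild_pitches


# Figures as quoted in the post above.
gooden_1990 = RunningGameLine(stolen_bases=60, balks=3, wild_pitches=6)
buehrle_2007 = RunningGameLine(stolen_bases=2, balks=0, wild_pitches=1)

print("Gooden 1990: ", gooden_1990.bases_allowed())
print("Buehrle 2007:", buehrle_2007.bases_allowed())
print("Difference:  ", gooden_1990.bases_allowed() - buehrle_2007.bases_allowed())
```

None of these three inputs appears in FIP, which is the point: the whole difference is invisible to it.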

Ironically, with the goal of reducing error by eliminating the uncertainty of distributing responsibility for what happens on balls in play and for the sequencing of events that determines base-runner advancement, FIP introduces more error than it removes. Really, that's not FIP's fault: it wasn't designed or intended to be used as the basis for calculating Wins Above Replacement for pitchers. It was designed to remove a lot of the noise from pitcher assessments that is created by the randomness that accompanies outcomes on balls in play, and it does that. The problem for HoM purposes is that there's an essential portion of pitchers' effectiveness in preventing runs that FIP is blind to, and we can't make accurate judgments of pitcher value on a career basis relying on FIP-WAR for much of anything, because it wildly overrates pitchers who sacrificed influence over certain aspects of pitcher-batter events in order to exercise more control over those aspects of pitcher-batter events that FIP has chosen to count.

Nolan Ryan sacrificed just about everything to keep hitters from putting balls in play. FIP-WAR wildly overrates him by zeroing out all the sequencing costs of Ryan's pitching style. Tommy John sacrificed rates of hits on balls in play to suppress slugging. FIP gives him credit for suppressing slugging with respect to home runs but zeroes out his high BABIP. Consequently, it greatly overrates him. Mark Buehrle, on the other hand, sacrificed strikeouts in order to minimize walks, extinguish the running game, and influence balls in play outcomes through synergy with fielders and his own defensive excellence. FIP credits his walk prevention but it zeroes out all of the rest of the upside that he gained from a pitching style that doesn't prioritize strikeouts, and so it underrates him significantly.

In sum, FIP-WAR is an unreliable guide to pitching merit because it systematically favors one style of pitching over others in ways that significantly misrepresent the results of those pitching styles with respect to run prevention. It will underrate pitchers who exert downward pressure on BABIP through their pitching technique, and it will underrate pitchers who exercise influence over event sequencing to suppress runs, and it will overrate pitchers who prioritize strikeouts over these other means of run prevention.

I think FIP-WAR is a useful statistic to examine, especially in conjunction with RA/9-WAR, BBREF-WAR, BIP-WAR, and LOB-WAR. Looking at all of these stats can help us develop a clear understanding of how a pitcher prevented runs, and looking at all of the factors involved together can help us assess when a pitcher was lucky or unlucky rather than good or bad. But I think that putting FIP-WAR into any formula used to assess the comparative merit of pitchers quantitatively is not a good idea.
   260. Howie Menckel Posted: April 20, 2020 at 11:40 PM (#5942546)
Catcher Mackey Sasser had "the yips" - what seemed to be a psychological inability to throw the ball back to the pitcher after a ball or strike. this began around 1989 and continued into this 1990 season. I don't recall if he ever solved it.

I also don't recall it being as severe when a runner was trying to steal - but note that he didn't actually throw a lot of them out, especially when Gooden was pitching.

it's possible that even if Sasser fared better with other pitchers, Gooden might have existed on a continuum where a good catcher could have forgiven his weak pickoff moves with CSs anyway - whereas poor Mackey couldn't help at all.

Mackey could hit, for a catcher - 1988-91 OPS+s of 110, 110, 111, 100 - all in part-time duty.

well, he did get 87 starts in 1990 - vs. 20-something games started each by journeymen Charlie O'Brien, Barry Lyons, and Orlando Mercado plus 19 starts for 21-year-old Todd "The Butcher" Hundley. (also 2 from prodigal son Alex Trevino in his final season and 1 inning from Dave Liddell, who singled in his only MLB AB. eat your heart out, Moonlight Graham!)

a deeper dive into these catchers is probably justified.

oh, and this really intrigues me:

"Nolan Ryan sacrificed just about everything to keep hitters from putting balls in play. FIP-WAR wildly overrates him by zeroing out all the sequencing costs of Ryan's pitching style. Tommy John sacrificed rates of hits on balls in play to suppress slugging. FIP gives him credit for suppressing slugging with respect to home runs but zeroes out his high BABIP. Consequently, it greatly overrates him. Mark Buehrle, on the other hand, sacrificed strikeouts in order to minimize walks, extinguish the running game, and influence balls in play outcomes through synergy with fielders and his own defensive excellence. FIP credits his walk prevention but it zeroes out all of the rest of the upside that he gained from a pitching style that doesn't prioritize strikeouts, and so it underrates him significantly."
   261. DL from MN Posted: April 21, 2020 at 11:48 AM (#5942682)
Interesting comments, learning from this discussion.
   262. Dr. Chaleeko Posted: April 21, 2020 at 01:32 PM (#5942729)
Chris Cobb, that's the single best description of the argument against using FIP-WAR that I've ever seen. You put into words a whole bunch of threads that I've never been able to express together coherently. Thank you for explaining why I strongly prefer a runs-allowed approach instead of a components-based approach.

Another piece to add to your very large pile of evidence actually comes to us from Kiko's book, Player Won-Lost Records in Baseball. On pages 158 and 159, he describes the seven components to which he assigned pitcher decisions (not including pitcher fielding) and puts a weight on each of them with respect to the other six.
Component 1 SBs: ~1.5%
Component 2 WPs and PBs: ~1.5%
Component 3 Balls not in play: ~25%
Component 4 Balls in play: ~55%
Component 5 Hits v. Outs: ~15%
Component 6 Singles, doubles, triples: ~1.5%
Component 7 Double plays: ~1%

But components 1, 2, 5, 6, and 7 are shared with fielders. Here's the split of those components that go to the pitcher:
SBs: 52%
WPs/PBs: 76%
Hits v. Outs: 31%
Singles etc: 26%
DPs: 36%

Then he goes on to say this: "My research suggests that, in general, pitchers bear about one-third of the responsibility for what happens to batted balls which stay in the field of play." So if one uses FIP-WAR, one is, according to Kiko, not assigning any value to the pitcher on BIP when he bears a third of the responsibility. When you consider the number of BIPs per pitcher per year, that's a tremendous amount of value/blame. But ALSO, FIP-WAR will not account for the pitcher's contributions to SB, WP/PB, DPs, etc as noted above. DIPS theory is not an absolute, and pitchers do have responsibility for inducing weak contact or allowing hard contact. Gooden's high line-drive rate in 1990, noted above, being an example.
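
To put a rough number on the shared components, here is a sketch that multiplies each shared component's weight by the pitcher's share of it, using only the figures listed above. Components 3 and 4 are left out because no pitcher/fielder split is given for them in this post; the variable names are mine, and the weights are the approximate ones quoted, so treat the output as ballpark.

```python
# Approximate component weights (percent of pitcher decisions), as listed above,
# restricted to the five components shared with fielders.
shared_weights = {
    "SBs": 1.5,
    "WPs/PBs": 1.5,
    "Hits v. Outs": 15.0,
    "Singles etc": 1.5,
    "DPs": 1.0,
}

# Pitcher's share of each shared component, as listed above.
pitcher_share = {
    "SBs": 0.52,
    "WPs/PBs": 0.76,
    "Hits v. Outs": 0.31,
    "Singles etc": 0.26,
    "DPs": 0.36,
}

# Weighted points that land on the pitcher within the shared components.
pitcher_points = sum(shared_weights[c] * pitcher_share[c] for c in shared_weights)
shared_total = sum(shared_weights.values())

print(f"Shared components total: {shared_total:.1f} points")
print(f"Pitcher's portion:       {pitcher_points:.2f} points")
print(f"Pitcher's share of shared components: {pitcher_points / shared_total:.1%}")
```

On these inputs the pitcher's portion of the shared components works out to a bit over a third, consistent with the "about one-third" in the quote, and none of it is captured by FIP.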

Anyway, just to point out that the research of one of our own strongly implies that FIP-WAR simply ignores a lot of important pitcher-influenced events...as Chris Cobb said upthread.
   263. Jaack Posted: April 21, 2020 at 03:19 PM (#5942756)
I won’t dispute that FIP-WAR is not perfect. But it’s approaching the question from the opposite direction as RA9-WAR. While FIP-WAR makes an error of exclusion, RA9-WAR makes an error of inclusion – it’s including a ton of information that is not in a pitcher’s control – defense and variance, in exchange for incorporating the pitcher’s influence on batted balls (although FIP WAR does account for IFFB). RA9 is essentially a total defense metric, but since pitchers make up the majority of defensive value, it is valuable as a pitching metric.

The fangraphs framing of WAR is

RA9=BIP+Sequencing+FIP.

But for our purposes, this is still a lot of noise. Both BIP and Sequencing are things a pitcher has some control over, but are also dependent on defense and random variance. FIP is almost completely under a pitcher's control. A more helpful model for us would be something like

RA9=FIP+Defense+Luck+Other Pitcher skills

We are most concerned with a pitcher’s value, which we could arrive at either by constructing it (FIP+Other Pitcher Skills) or by deriving it from the total defensive value (RA9-(Defense+Luck)).

But both are going to arrive at the same place. I think it makes more sense to start from FIP and try to incorporate a pitcher’s other skills, than it is to start from RA9 and remove the aspects a pitcher has no control over. The primary issue is the random variance, which takes up a large portion of the difference, at least as far as BIP goes. Tangotiger goes into this a bit here in the section titled Accountable. I can’t find the article at the moment, but he also rates defense and pitcher influence over BIP to be pretty close in value.

For batters, there is a similar dichotomy between constructive models and derivative models. Linear weights is built from the various components of offense, while contextual models like RE24 start from runs and then divide the credit among the actors. Linear weights is the model more similar to FIP – both take components and produce an expected run value. As I weigh linear weights more strongly for batters, I also weigh FIP more strongly for pitchers. Philosophically, I prefer the constructive approach to the derivative one, as I think it’s more effective to add information than it is to remove the random variance.

That being said, for pitchers, the unknown value is greater, which is to say, there is more missing information from the constructive model for pitchers than for hitters. To account for this, FIP takes up less of a share of pitching in my system. My initial weighting for pitchers is something like 55% FIP, 25% RA9, 15% Kiko’s W-L records, and 5% BPro, while for batters, 70% of the weighting is on linear weights, with 15% for RE, and 15% for Kiko’s W-L records.
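
For clarity, the initial weighting described above amounts to a weighted average. The sketch below shows the arithmetic only: the key names, the `blend` helper, and the example input values are all made up for illustration and are not real player numbers.

```python
# Initial weights as described above; key names are illustrative.
PITCHER_WEIGHTS = {"fip": 0.55, "ra9": 0.25, "kiko_wl": 0.15, "bpro": 0.05}
BATTER_WEIGHTS = {"linear_weights": 0.70, "re": 0.15, "kiko_wl": 0.15}


def blend(values: dict, weights: dict) -> float:
    """Weighted average of several WAR-style estimates."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * values[name] for name in weights)


# Hypothetical pitcher season: every value here is a placeholder.
example_pitcher = {"fip": 5.0, "ra9": 3.0, "kiko_wl": 4.0, "bpro": 4.0}

print(f"Blended value: {blend(example_pitcher, PITCHER_WEIGHTS):.2f}")
```

With FIP carrying 55% of the weight, a season where FIP and RA9 disagree ends up closer to the FIP figure than to the midpoint of the two.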

I do not include BBRef pitching WAR because it doesn’t really add much beyond RA9-WAR. Their defensive adjustment is too haphazard to help all that much. Furthermore, their model does not give enough credit to pitchers overall – this is particularly evident when looking at pitchers with long careers like Jim Kaat, Tommy John, Eppa Rixey, and Don Sutton, who are systematically underrated by BBRef WAR, but it affects all pitchers to some degree. Since there seems to be a fairly solid consensus that we should be inducting more pitchers, relying on the metric that likes them the least feels counterproductive to me.
   264. Chris Cobb Posted: April 21, 2020 at 04:07 PM (#5942790)
Jaack, your argument against BBRef pitching WAR seems largely to depend on two claims whose basis you haven't yet explained:

(1) the defensive adjustment is too haphazard to be helpful

(2) it doesn't give enough credit to pitchers, because certain long-career pitchers are systematically underrated.

I'd like to know more about what you see as the basis of these claims.

I'll say that my understanding in both cases leads to the opposite conclusion. My understanding is that assessments of fielding quality are more reliable at the team level than at the level of the individual fielder, so that BBRef WAR's way of adjusting pitchers' RA/9 to what it would be if they were pitching in front of an average defense is, if not perfectly reliable, more reliable than other methods of making this adjustment. (The greater reliability of team-level fielding assessments was one of Bill James's basic premises for the Win Shares system, and I suspect it moved from there to WAR as a design principle.)

My view is that all of the difference between the RA/9 WAR and BBRef WAR for the pitchers you mention can be empirically accounted for by their adjustment for the quality of pitchers' defensive support and (less obviously) by their adjustments for league quality. Of the pitchers you've listed, Kaat, John, and Rixey, at least, all spent long stretches of their careers in a league that BBRef's methods indicate was substantially weaker than the other league. Similar effects can be observed more directly by comparing the BBRef WAR and Fangraphs WAR for NL position players from 1900-1930. Fangraphs is consistently substantially higher for NL players, while AL players from the same period show much less difference between BWAR and FWAR. These patterns are also observable in AL position players from 1950-70. I'll acknowledge that I don't think this case accounts fully for BWAR's evaluation of Don Sutton, but before a claim of systematic bias is justified, the biased element of the system should be identified. If Don Sutton is underrated but not Nolan Ryan or Phil Niekro, then it can't be a systematic bias against pitching; if it exists, it must have another source. If there is a systemic bias, then I want to know what it is!
   265. Jaack Posted: April 21, 2020 at 07:36 PM (#5942865)
As far as the defensive adjustment goes, I'll link to tangotiger who explains it better than I ever could.

But in a nutshell, the issue is that the defensive adjustment looks at what the team defense did as a whole and not how it affects a particular pitcher. Combine that with the uncertainty of our defensive metric to begin with, and I don't feel confident that it's telling us much worthwhile.

As far as the bias goes, both fangraphs and BBRef use the same replacement level, but Fangraphs devotes 43% of the value to pitching while BBRef devotes only 41%. The fangraphs ratio is closer to what MLB teams spend on payroll, and I think is closer to what we see in practice. It's not a huge difference, but over the course of a long career, it's going to add up.
   266. Kiko Sakata Posted: April 21, 2020 at 11:59 PM (#5942916)
We have finally discovered Kiko’s East-coast bias!


I tend to think of Tommy John as a Los Angeles Dodger. Maybe subconscious New York - Los Angeles bias? Which would be a really weird thing for somebody who lived over half his life in Chicago.

Dwight Gooden's 1990 season is weird and I don't know that I really have much insightful to say about it. Per BB-Ref, he led the NL in FIP (2.44) and HR/9 (0.4). But he was extremely run-un"lucky" - ERA of 3.83 vs. FIP of 2.44. But then he was win-"lucky" - he had an ERA+ of 98 but the Mets went 22-12 in the games he started (Gooden was 19-7). Which is partly because the Mets led the NL in runs scored, but Gooden was 4th among Mets starters in ERA but 1st in win-pct.

I don't know. I checked to see if his bullpen maybe allowed a lot of runners they inherited from Gooden to score, but there's nothing there (he left 9 baserunners, 3 of whom scored, which is exactly the 1990 NL average). I thought maybe the fact that I go game by game might be a factor if his high ERA was due to one or two bad starts. But no, he had 6 starts in which he allowed 6-8 runs (and 3 5-run games) which, if anything, seems like an unusually high number of bad games for a good pitcher.

I think the great FIP plays well because those are the factors that Gooden doesn't have to share with his fielders. And since pWins tie to team wins, he's basically getting more credit for the good stuff in his wins / less blame for bad stuff in games the Mets won anyway. Put it all together and he looks really good in pWins.

But yeah, it's an odd season.
   267. Howie Menckel Posted: April 22, 2020 at 01:21 AM (#5942931)
this is so good because 1990 Gooden to some extent makes or breaks his borderline case.

and I don't have an answer yet, either.

but if we can't watch real baseball - well, let's "catch" up on something else.
   268. Dr. Chaleeko Posted: April 22, 2020 at 06:06 PM (#5943240)
That's a wonky career arc, but there's enough there to demonstrate that 1985 wasn't a complete fluke.

Jaack, I hope you don't think I'm picking on you, but this whole line of thinking is interesting to me, and I want to go back to the quoted point for a sec. If 1990 becomes a 4-5 WAR season, I don't believe it does demonstrate that 1985 wasn't fluky. In fact, it's more support for that perspective. If Gooden's 1990 were to magically clock in at 7 or 8 WAR, then I'd likely feel much more at ease about Gooden. But at 33 to 40 percent of his 1985 value, a boosted 1990 does not negate the idea that 1985 was a stone cold fluke of a year. Gooden never repeated it and never, ever came close. Not even with 50% of it. He did not demonstrate repeatability, which is one of the basics that define a great player: The guy who can perform at a very high level year over year. That's important (to me!) in assessing his overall case because his career arc is so wonky that I feel uncomfortable with what my numbers tell me. As I said upthread, it's not like we're working with a Red Faber who had another killer season on his resume to bolster his big year.

A quick example. We just elected Luis Tiant. I adjust innings so that Tiant and Gooden should be on a similar plane workload wise. I do use BBREF and not FIP-WAR, which I know is not everyone's cup of tea, but the shape of their respective careers should be similar enough for us all to play along at home:
Tiant    7.6  6.8  5.8  5.7  5.6  4.7  4.6  3.8  3.3  3.1  2.2  2.1  2.0  1.6   1.3   0.1  -0.2  -0.3  -0.5
Gooden  12.8  5.8  4.4  4.0  3.8  3.7  3.5  3.2  3.0  2.7  2.7  1.5  1.2  1.0  -0.5  -0.6

Even if you turn 1990 into a 4.5 win season, Tiant, the most recent pitcher we elected, outperforms Gooden over virtually every season except for their best respective years. But the big fluky year makes it appear that Gooden hangs in with Tiant when we assess a peak over several seasons without looking at the seasons themselves. I don't know if I can explain this well, but I'll try anyway. The smallest unit we use to analyze greatness in this project is generally the season. We pretty much never talk about individual games as keys to a case. We always start with seasons as they build toward a career. How many great/good/useful seasons did a guy have? is one way that we approach conversations. Gooden had one amazing year, one All-Star year, and shoulder seasons. 1985 is just one year, no matter how much it impacts our view of Gooden's peak. I can't tell myself that 1985 is like two peak seasons in one because it is only one season. Yes, it is worth 12-something wins no matter what, but I don't believe that the number is the number is the number is the best argument to use in a really extreme case like Gooden's. In fact, I think Gooden likely "breaks" all of our systems. There's probably no player with such an extreme difference between his best year and everything else he ever did, and it certainly warps my thinking when I see his peak value but don't look at the fact that about a third of that peak comes from just one season. Maybe that's just me being ornery, but 280 innings of 12-WAR pitching versus 2500 innings of above-average starting pitcher career is too weird to trust.
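
For anyone playing along at home, here is a sketch of the rank-by-rank comparison using the two lines above exactly as posted (so Gooden's 1990 is not boosted here). The function names are mine, and the 7-season peak window in the second function is an assumption I picked for illustration, not a definition anyone in the thread has agreed on.

```python
# Season values as posted above (innings-adjusted, BBREF-based).
tiant = [7.6, 6.8, 5.8, 5.7, 5.6, 4.7, 4.6, 3.8, 3.3, 3.1,
         2.2, 2.1, 2.0, 1.6, 1.3, 0.1, -0.2, -0.3, -0.5]
gooden = [12.8, 5.8, 4.4, 4.0, 3.8, 3.7, 3.5, 3.2, 3.0, 2.7,
          2.7, 1.5, 1.2, 1.0, -0.5, -0.6]


def ranks_where_second_leads(first: list, second: list) -> list:
    """Pair k-th best seasons and return the ranks where `second` is ahead."""
    first_sorted = sorted(first, reverse=True)
    second_sorted = sorted(second, reverse=True)
    paired = zip(first_sorted, second_sorted)
    return [rank for rank, (a, b) in enumerate(paired, start=1) if b > a]


def best_year_share(seasons: list, window: int) -> float:
    """Fraction of the top-`window` seasons' total that comes from the best one."""
    top = sorted(seasons, reverse=True)[:window]
    return top[0] / sum(top)


gooden_leads = ranks_where_second_leads(tiant, gooden)
print("Ranks where Gooden leads:", gooden_leads)
print(f"Best year as share of Gooden's top 7: {best_year_share(gooden, 7):.1%}")
print(f"Best year as share of Tiant's top 7:  {best_year_share(tiant, 7):.1%}")
```

Gooden leads only at his best season and at one mid-career rank; Tiant is ahead everywhere else the two lines overlap.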
   269. Eric J can SABER all he wants to Posted: April 22, 2020 at 06:59 PM (#5943255)
For what it's worth (doesn't have to be much), if you lean toward FIP then Gooden's insane peak is '84-'85.

Also not entirely sure what the deal is with Gooden's 1990. According to this, his component ERA was 3.31, so half a run lower than his actual ERA and .87 higher than his FIP. He had a higher H/9 rate than the Met average, despite also having a higher K/9 rate than the Met average.

Does FIP count hit batters? Gooden hit 7 in 1990, 4th in the NL; it's not a huge factor but it's something. Just for fun - the Mets won all five games in which Gooden had at least one HBP in 1990, including two by scores of 19-8 and 10-9. Gooden himself got wins in four of the five, including the 19-8 and 10-9 games.
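
On the hit-batter question: the commonly published Fangraphs form of FIP does fold HBP in with walks. A minimal sketch of that form follows. The league constant varies by season, so the 3.10 here is a placeholder, and the stat line in the example is hypothetical apart from the 7 HBP mentioned above.

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float) -> float:
    """FIP in its commonly published form: HBP counted alongside walks."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def hbp_effect(hbp: int, ip: float) -> float:
    """How much the hit batters alone add to FIP."""
    return 3 * hbp / ip


# Hypothetical stat line; only hbp=7 comes from the post above.
# The constant is a placeholder, not the actual 1990 NL value.
line = dict(hr=10, bb=70, k=220, ip=230.0, constant=3.10)

with_hbp = fip(hbp=7, **line)
without_hbp = fip(hbp=0, **line)
print(f"FIP with 7 HBP:  {with_hbp:.2f}")
print(f"FIP with 0 HBP:  {without_hbp:.2f}")
print(f"HBP effect:      {hbp_effect(7, line['ip']):.2f}")
```

Over a full starter's workload, 7 hit batters move FIP by roughly a tenth of a run, so it is something, but not much.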
   270. Howie Menckel Posted: April 22, 2020 at 07:13 PM (#5943259)
a boosted 1990 does not negate the idea that 1985 was a stone cold fluke of a year.

I don't think of a player who quickly succumbs to alcoholism and extreme substance abuse - issues which bedevil him to this day - as a "fluke."

I mean, he was so out of his mind on the day of the Mets' tickertape parade down The Canyon of Heroes after their 1986 World Series win that he missed the entire thing.

   271. Bleed the Freak Posted: April 22, 2020 at 10:11 PM (#5943299)
In fact, I think Gooden likely "breaks" all of our systems. There's probably no player with such an extreme difference between his best year and everything else he ever did, and it certainly warps my thinking when I see his peak value but don't look at the fact that about a third of that peak comes from just one season. Maybe that's just me being ornery, but 280 innings of 12-WAR pitching versus 2500 innings of above-average starting pitcher career is too weird to trust.


The closest I can think of is Dolf Luque, who receives support off and on, and likely deserves some level of MLE credit on top of fitting the Gooden conundrum.
   272. Dr. Chaleeko Posted: April 22, 2020 at 10:20 PM (#5943301)
Howie, I would say that Gooden’s substance abuse issues are an injury/illness (and I do subscribe to the disease model of addiction) that kept him from performing at his peak level. I have a lot of compassion for Gooden, but it doesn’t change the fact that he could never duplicate his 1984 season, let alone his 1985 season. He was never again able to be BOTH fully healthy and able to pitch at a HOM level.
   273. Kiko Sakata Posted: April 22, 2020 at 10:24 PM (#5943303)
We pretty much never talk about individual games as keys to a case. We always start with seasons as they build toward a career. How many great/good/useful seasons did a guy have? is one way that we approach conversations. Gooden had one amazing year, one All-Star year, and shoulder seasons. 1985 is just one year, no matter how much it impacts our view of Gooden's peak. I can't tell myself that 1985 is like two peak seasons in one because it is only one season. Yes, it is worth 12-something wins no matter what, but I don't believe that the number is the number is the number is the best argument to use in a really extreme case like Gooden's. In fact, I think Gooden likely "breaks" all of our systems. There's probably no player with such an extreme difference between his best year and everything else he ever did, and it certainly warps my thinking when I see his peak value but don't look at the fact that about a third of that peak comes from just one season.


I think there's something to be said for this. My system goes game by game and you do run into the occasional "okay, sure, that was terrible [or great] but it was just one game." And there probably should be a seasonal counterpart. You can only win one pennant (and the 1985 Mets didn't even manage to win the one).

That said, while granting that Gooden's 1985 was far better than any of his other seasons, my system does also like his 1984, 1986, and 1990 seasons quite a bit and thinks his 1987, 1988, and 1991 seasons were at least a step better than "hanging around" seasons. Is that enough? I think so, but even within my system, Gooden's best seasons tend to look better in pWins than in eWins, so if you prefer the latter, there's a strong case for leaving him off-ballot.

And to build on Eric in #269, Gooden led the NL in FIP three times (1984, 1985, 1990 - his 1984 FIP was an eye-popping 1.69), was second in FIP twice (1987, 1988), and third once (1991). For his career, his ERA (3.51) wasn't that much worse than his FIP (3.33), although that's partly because he actually out-performed his FIP in his hanger-on seasons. From 1984 - 1992 - which is basically the entirety of his HOM case - he had a FIP of 2.64 and an ERA of 2.99 (not that 2.99 is bad - although, of course, the 1.53 in 1985 is doing a good bit of the work there).
   274. Jaack Posted: April 23, 2020 at 02:55 AM (#5943325)
Jaack, I hope you don't think I'm picking on you, but this whole line of thinking is interesting to me, I want to go back to the quoted point for a sec. If 1990 becomes a 4-5 WAR season, I don't believe it does demonstrate that 1985 wasn't fluky. In fact, it's more support for that perspective. If Gooden's 1990 were to magically clock in at 7 or 8 WAR, then I'd likely feel much more at ease about Gooden. But at 33 to 40 percent of his 1985 value, a boosted 1990 does not negate the idea that 1985 was a stone cold fluke of a year. Gooden never repeated it again and never, ever came close. Not even with 50% of it. He did not demonstrate repeatability, which is one of the basics that define a great player: The guy who can perform at a very high level year over year. That's important (to me!) in assessing his overall case because his career arc is so wonky that I feel uncomfortable with what my numbers tell me. As I said upthread, it's not like we're working with a Red Faber who had another killer season on his resume to bolster his big year.


I guess I define fluke as a big season that can partially be explained away as unrepeatable and unrelated to a player's own skills - Norm Cash in an expansion year, Fred Dunlap in the UA, Snuffy Stirnweiss in the war years. They had big performances, but in large part due to things out of their control. Norm Cash wasn't going to repeat his 1961 season because it wasn't built on his own talent.

Gooden's 1985 is different. He actually was that good. He went up against a normally strong league and destroyed it. He didn't repeat that level of dominance across a full season, but not because it was unrepeatable in the way dominating an expansion year is unrepeatable - I believe his true talent level for that season was probably in the 10-12 WAR range.

Even if you turn 1990 into a 4.5 win season, Tiant, the most recent pitcher we elected, outperforms Gooden over virtually every season except for their best respective years. But the big fluky year makes it appear that Gooden hangs in with Tiant when we assess a peak over several seasons without looking at the seasons themselves. I don't know if I can explain this well, but I'll try anyway. The smallest unit we use to analyze greatness in this project is generally the season. We pretty much never talk about individual games as keys to a case. We always start with seasons as they build toward a career. How many great/good/useful seasons did a guy have? is one way that we approach conversations. Gooden had one amazing year, one All-Star year, and shoulder seasons. 1985 is just one year, no matter how much it impacts our view of Gooden's peak. I can't tell myself that 1985 is like two peak seasons in one because it is only one season. Yes, it is worth 12-something wins no matter what, but I don't believe that the number is the number is the number is the best argument to use in a really extreme case like Gooden's. In fact, I think Gooden likely "breaks" all of our systems. There's probably no player with such an extreme difference between his best year and everything else he ever did, and it certainly warps my thinking when I see his peak value but don't look at the fact that about a third of that peak comes from just one season. Maybe that's just me being ornery, but 280 innings of 12-WAR pitching versus 2500 innings of above-average starting pitcher career is too weird to trust.


I'm sympathetic to the idea that a single season can break systems - you have to expect diminishing returns eventually.

There is some point where being better isn't going to produce any more wins. For a hitter this is obvious; eventually, you'd just IBB a guy if he were threatening enough. But for a starting pitcher, it's harder to see that breaking point, because there's no strategic way to avoid him. But there has to be a point where any improvement isn't going to help you win more games. I can't imagine there being a substantial difference in the team's record between a 0.00 ERA pitcher and a 0.50 ERA pitcher.

Did Gooden quite reach that point in 1985? Obviously, he added some superfluous value in games the Mets were going to win with just about anyone pitching, but that happens with pretty much every pitcher at some point. We can measure this to some extent with WPA - Gooden's was 9.46 in 1985, which works out to ~11.5 WPA-WAR. So perhaps there is a bit of evidence of diminishing returns there.

On a more basic level, the Mets did lose seven games Gooden started that season. In two of them the Mets were shut out, and in two more they scored only one run, but there were three games in which a better outing from Gooden would have resulted in Mets wins.

So did Gooden reach a level where he wasn't adding much more? I don't think he was quite there.

-----

Overall, Gooden is a tough candidate. A single season shouldn't propel a guy into the HoM, but it can't be brushed off easily either. There is not a lot of brilliance after 1985, but he still provided a good amount of success and value through at least 1992. So far this conversation has pushed me to debit Gooden some for his poor ability in dealing with the run game, but he's still in a mess of eight guys for the bottom three spots on my ballot. I think that his 1986-92 period has enough value to it that, when combined with the huge peak, he becomes a borderline candidate.
   275. progrockfan Posted: April 23, 2020 at 11:15 AM (#5943407)
@Dr. Chaleeko: "I would say that Gooden’s substance abuse issues are an injury/illness (and I do subscribe to the disease model of addiction) that kept him from performing at his peak level."

That's an enlightened attitude, Doc, and I think I agree.

Adoption of this attitude has an impact on other players in my consideration set as well.
   276. cookiedabookie Posted: May 14, 2020 at 09:59 AM (#5950357)
For all of us Negro League fans: https://negroleagueshistory.com/shop/baseball-cards-postcards/negro-leagues-legends-baseball-card-set/
   277. Mike Webber Posted: May 14, 2020 at 12:19 PM (#5950402)
@276

I got to see the exhibit of Graig Kreindler's work at the Negro League Museum in Kansas City, and actually met Graig briefly. In addition to the wonderful portraits there was some really excellent memorabilia on display, including tickets to the Olympic events with Jesse Owens and to Joe Louis fights, and the oldest known broadsheet for a Negro team - playing against Marshall University in the 1880s, I believe.

The portraits were fascinating. I have very clear images in my mind of the top Negro League stars - especially the ones that played after 1925 or so. But the ones that are not inner circle types, or maybe just a little older, like Nip Winters or Jimmy Claxton, those guys are a blank for me. Or even Pete Hill.

The exhibit was planning to travel; if it does, I think everyone who posts in our thread would enjoy seeing it. Keep an eye open for it.
   278. progrockfan Posted: May 16, 2020 at 05:18 PM (#5951249)
I own Kreindler's original limited-edition Negro Leagues set. It's superb & worth every penny.
   279. Esteban Rivera Posted: May 18, 2020 at 02:43 PM (#5951728)
I was checking some of the updated WAR numbers over at Baseball Reference to update the references I use for my ballot. I noticed that while the Wins Above Average numbers for position players changed for those that were impacted by the update and those that didn't stayed the same, all of the pitchers' WAA values appear to have changed regardless of whether they were impacted by the WAR update. Is this the case or am I misreading the values?
   280. kcgard2 Posted: May 18, 2020 at 03:49 PM (#5951761)
Esteban, I just checked the top 8 (unelected) pitchers on my rankings, and 7 of the 8 were completely unchanged on WAA. Cicotte went from 27.5 to 27.6 after the WAR updates. I think you must be misreading.
   281. Esteban Rivera Posted: May 18, 2020 at 04:25 PM (#5951784)
Thanks! I'll check again then. I was cross-referencing the numbers with the Baseball Reference pitching WAR/WAA values at Baseball Gauge, which are not yet updated through the 2020 revision, and the numbers were different there. That may be what is throwing me off.
   282. kcgard2 Posted: June 01, 2020 at 04:35 PM (#5954785)
Germane to the discussion that has developed about the valuation of catchers, Tango has posted a quick-and-dirty study on the potential effect size of pitch framing, game-calling, pitcher-catcher rapport, or whatever other intangibles may go into run prevention that should be credited to the catcher, using WOWY methods. It looks like all-time greats at these aspects may derive ludicrous amounts of value over backup catchers. There are some obvious next steps that I believe Tango intends to run (controlling for pitcher quality, for example, of which he's shown a few examples). However, there still remain very significant chunks of value that should probably be credited (or debited as the case may be) to catchers. The natural off-shoot of this is that this credit should be deducted from (or debit be added to) the pitchers who are presently taking all of the credit or blame for these catcher skills or lack thereof. But that is a separate discussion.

In comment #2, Tango suggests, after controlling for pitcher strength, one standard deviation of this effect is probably around 2 WAR. Obviously, that is huge, and his conclusion is that WAR has probably been too harsh on catchers in general.

http://tangotiger.com/index.php/site/article/catcher-wowy
   283. Jaack Posted: June 01, 2020 at 11:53 PM (#5954857)
Since the numbers are obviously very preliminary (and we don't have data on any candidates who are eligible), there's a limit to what we can do with the data. Hopefully tangotiger or someone else will give us more concrete and complete data before it's time to vote, but it is further confirmation that we've probably been undervaluing catcher defense.

It would be very funny if, after all this time, Rick Ferrell turned out to be a not awful Cooperstown choice. Not that the VC had any clue what they were doing when they elected him.
   284. cookiedabookie Posted: June 02, 2020 at 04:19 PM (#5954970)
I've always believed WAR was underrating catchers, and have a pretty significant boost to them in my personal system. What I've struggled with is how to remove that value accurately and fairly from pitchers. I'll be interested to see if/when that happens, how that will change voting around here. Although I do think we are short on pitchers, and this could end up knocking more pitchers down lists when they should be doing the opposite (especially our overlooked 80s/90s cohort).
   285. Eric J can SABER all he wants to Posted: June 02, 2020 at 11:17 PM (#5955042)
I've always believed WAR was underrating catchers, and have a pretty significant boost to them in my personal system. What I've struggled with is how to remove that value accurately and fairly from pitchers.

Based on Tango's comments below the initial article, one standard deviation of catcher defense is 16 runs per 162 games - which means, if a very good starting pitcher pitches 20% of his team's innings (certainly nobody gets close to that today), a pro-rated 1-SD catcher would affect the pitcher by about 3 runs per year. Unless a pitcher is paired with either one very good catcher or a series of very good catchers over the course of a long career, I don't think the effect on an individual pitcher's numbers is going to be especially large.
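
To make that proration concrete, here is a minimal sketch. The 16 runs and the 20% innings share come from the estimate above; the function name and the 10-runs-per-win conversion are my own assumptions (the latter is only a common rule of thumb, not something from Tango's article).

```python
def catcher_effect_on_pitcher(catcher_runs_per_season, pitcher_innings_share):
    """Prorate a catcher's full-season run value to the innings one pitcher threw.

    Assumes the catcher's effect is spread evenly over all of the team's innings
    and that the pitcher always throws to this catcher.
    """
    return catcher_runs_per_season * pitcher_innings_share


RUNS_PER_WIN = 10.0  # assumed rule-of-thumb conversion, not from the source

runs = catcher_effect_on_pitcher(catcher_runs_per_season=16, pitcher_innings_share=0.20)
wins = runs / RUNS_PER_WIN

print(f"{runs:.1f} runs per year, or about {wins:.2f} wins")
```

That is 3.2 runs a year for a workhorse paired full-time with a 1-SD catcher, so it takes a long run with the same very good (or very bad) catcher before it adds up to much.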
   286. Jaack Posted: June 03, 2020 at 02:32 AM (#5955046)
The first pitcher that comes to mind is actually Andy Pettitte, who probably looks a lot better without Jorge Posada.

If Steve Rogers had more support we'd need to have more caution, but I wouldn't be surprised if I was the only voter with him in the top 50 eligibles.
   287. DL from MN Posted: June 03, 2020 at 08:40 AM (#5955061)
There are few catchers among the top hitters who were also top defenders possibly because backup catchers are usually selected for their defense. There are some interesting players coming up for discussion soon - Yadier Molina, Brian McCann and Russell Martin. Among catchers who receive votes I think Elston Howard is one of the most interesting as far as deserving a defensive bonus.
   288. Eric J can SABER all he wants to Posted: June 03, 2020 at 05:33 PM (#5955172)
The first pitcher that comes to mind is actually Andy Pettitte, who probably looks a lot better without Jorge Posada.

If Steve Rogers had more support we'd need to have more caution, but I wouldn't be surprised if I was the only voter with him in the top 50 eligibles.


Dwight Gooden also threw to Carter quite a bit (or at least he was on Carter's Mets teams, I'm not sure how often Carter actually caught him).

Counterbalancing Posada at least a little, Pettitte had 2 years and change in Houston throwing to Brad Ausmus, who's in the initial uncontrolled top 5 in WOWY (and with Adam Everett at short instead of Jeter).
   289. cookiedabookie Posted: June 03, 2020 at 05:33 PM (#5955173)
Among catchers who receive votes I think Elston Howard is one of the most interesting as far as deserving a defensive bonus.

I'd shout out Jim Sundberg too. With any defensive value change, I suspect he'd jump up eligible catcher rankings. Thurman Munson as well - this type of data could finally boost him into an elect-me spot. Posada could take a hit and fall off ballots.
   290. Eric J can SABER all he wants to Posted: June 03, 2020 at 05:42 PM (#5955175)
I've always believed WAR was underrating catchers, and have a pretty significant boost to them in my personal system. What I've struggled with is how to remove that value accurately and fairly from pitchers.

Revisiting this again from yesterday, I'm not sure if this is referencing adjusting individual pitchers for catcher defense compared to average, or referencing an adjustment to all pitchers for previously unmeasured value inherent to the catcher position. If the latter, that value wouldn't actually come out of pitcher totals. In the WAR framework, that value would be in the position adjustment, and would therefore be compensated for by reducing the position adjustment to non-catcher position players to keep the average (non-DH) position adjustment at 0. The current position adjustments in bWAR are:

C: +9 runs
SS: +7 runs
2B: +3 runs
CF: +2.5 runs
3B: +2 runs
RF: -7 runs
LF: -7 runs
1B: -9.5 runs
DH: -15 runs

If we're undervaluing catchers compared to the average position player, then we're inherently overvaluing other position players compared to average. If you boost the positional adjustment for catchers by, say, 7 runs per 150 games (chosen for ease of calculation), you would reduce the positional adjustment for the other positions by 1 run per 150 games, giving a +16 for catchers, +6 for shortstops, +2 for 2B, etc.
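
A quick sketch of that rebalancing, using the adjustments listed above. Two assumptions of mine: the offset is spread equally over the seven other fielding positions (an unweighted average, ignoring playing time), and DH is left alone since the zero-average constraint is stated for non-DH positions.

```python
CURRENT_ADJ = {
    "C": 9.0, "SS": 7.0, "2B": 3.0, "CF": 2.5, "3B": 2.0,
    "RF": -7.0, "LF": -7.0, "1B": -9.5, "DH": -15.0,
}


def rebalance(adjustments, position, boost, exclude=("DH",)):
    """Boost one position and spread the offset equally over the other fielding
    positions, so the unweighted non-DH average is unchanged."""
    others = [p for p in adjustments if p != position and p not in exclude]
    offset = boost / len(others)
    new = dict(adjustments)
    new[position] += boost
    for p in others:
        new[p] -= offset
    return new


new_adj = rebalance(CURRENT_ADJ, "C", 7.0)
non_dh_total = sum(v for p, v in new_adj.items() if p != "DH")

for pos, runs in new_adj.items():
    print(f"{pos}: {runs:+.1f} runs")
```

The listed adjustments already sum to zero across the eight fielding positions, and they still do after the shift, with catcher at +16, shortstop at +6 and second base at +2.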
   291. Dr. Chaleeko Posted: June 03, 2020 at 09:24 PM (#5955212)
Max Marchi’s handling numbers from BP c. 2012ish also give Carter a boost and Posada the boot. I suspect he was finding the same thing as Tango, and I hope this work continues because it is very important. Once I incorporated Marchi’s handling (and not even at its full value), Carter shot up to the top of my catcher list. I’ve never said anything about it because the idea that Carter would unseat Bench seemed like it might not be as defensible as I’d like. But it sure matches what Tango is saying here.