1. Dr. Chaleeko
Posted: November 07, 2004 at 02:36 PM (#956874)
Lots of questions here.
First off, no i9s information to help out...gulp.
Second, how much credit should Johnson get for his time in the 25th Infantry? Riley doesn't list the years he was in the 25th, but indicates that Johnson made the Monarchs in 1922, a good two years after Moore. Does this mean that Johnson got out of the military later, or that he played for lesser teams?
Last, how bad was his fielding? I get the impression from Riley that Johnson was a Heilmannesque player, with a powerful, high-average bat, but not much fielding skill.
2. Chris Cobb
Posted: November 09, 2004 at 02:58 AM (#958655)
Heavy Johnson Data
Negro-League Play
1922 KC 231 ab, 104 hits, 22 2b, 9 3b, 13 hr, .451 avg, .792 slg. ; leads lg in BA, all-star LF
1923 KC 186 ab, 68 hits, x 2b, 9 3b, 20 hr, .367 avg., appx. .866 slg; leads in HR, all-star CF, Holway MVP
1924 KC 296 ab, 111 hits, 16 2b, 11 3b, 7 hr, .374 avg., .574 slg; all-star RF
1925 Bal .352, played lf
1926 Bal/Hbg 209 ab, 72 hits, 8 hr, .346 avg.; avg. 4th in lg, all-star in LF
1927 Hbg .336 played rf.
1928 Cle .279, played lf; Memphis .331, played rf
1929 no data
1930 no data
I can find no records in Holway for Johnson after 1928. Cleveland and Memphis were not good teams in 1928, so it looks like his career was on a downward slope at that point.
vs. Major-league competition
1922 2-4 (double, home run) vs. Babe Ruth all-stars. Jack Quinn pitched.
According to Holway, he was 8-16 lifetime vs. major-league pitching, with 2 home runs.
Brief analysis: Johnson left the military and joined the Negro National League when he was 26. (That answers one of Dr. Chaleeko's questions . . . ) He was clearly an immediate star, with stunning seasons in 1922 and 1923, though he may have been assisted by a friendly home ballpark for the Monarchs. (They changed parks around this time -- I'm pretty sure the details are on the Dobie Moore thread from KJOK, but I haven't had time to cross-check.) Certainly, the Monarchs dominated the offensive leaderboards during these years without dominating the standings, although they were the best team in the league after the light-hitting Chicago American Giants. Johnson never had seasons as good as those two again, although he remained a star from 1924 to 1926. He was slightly above average at best in 1927 and 1928, playing for poor teams in 1928, and he disappears from the records of major teams after that, even though Riley has him playing for the Memphis Red Sox through 1933 and Holway has data for Memphis for most of those seasons. I think he deserves probably three or four seasons of MLE credit, two at star level, for his play in the military (I don't know exactly when he joined Rogan and Moore on the 25th Infantry team, but that should be discoverable). But it's hard to credit him with any more than an eleven-year MLE career, which is very short for an outfielder.
I'll try to come up with win-share projections soon, since his peak looks high enough that it's worth considering how good he was overall. But I don't think he'll match Dobie Moore.
3. KJOK
Posted: November 09, 2004 at 05:40 AM (#958873)
Johnson played at least a partial season for the Newark Browns in 1932.
4. KJOK
Posted: November 09, 2004 at 05:53 AM (#958895)
5. Michael Bass
Posted: November 14, 2004 at 04:54 AM (#965286)
I'm still kind of in the dark on Johnson. My thought is still not to have him anywhere near the ballot, is this correct? Decent but unspectacular peak, short career?
I'm still kind of in the dark on Johnson. My thought is still not to have him anywhere near the ballot, is this correct? Decent but unspectacular peak, short career?
That's where I have him, too. A long career would have helped him.
7. KJOK
Posted: November 15, 2004 at 07:01 AM (#966314)
As Gadfly reminded me on the Moore thread, Muehlebach Park opened July 23, 1923, and was the Monarchs' main park by 1924.
Muehlebach was certainly a pitchers' park by Negro League standards, and was the same park (under a different name) used by the Kansas City A's in the 1960s.
So, yes, I believe Johnson was helped by playing in a hitter-friendly park pre-1924.
8. KJOK
Posted: November 15, 2004 at 07:02 AM (#966316)
I'd say if Chino Smith, with a higher peak and more defensive value, doesn't make your ballot then no way should Heavy be there.
9. Gary A
Posted: December 07, 2004 at 09:52 PM (#1004111)
From Patrick Rock's 1923 NNL Yearbook. He seems to have had a somewhat better year than Holway's book indicates.
Heavy Johnson Batting
*-led league
G-98*
AB-374
H-152*
D-32*
T-13 (2nd to Turkey Stearnes' 14)
HR-20* (tied with Candy Jim Taylor, Tol.-StL)
R-91*
RBI-120* (next highest--Oscar Charleston's 94)
W-38
SB-17
AVE-.406*
OBA-.462 (2nd to Torriente's .471)
SLG-.722*
Again, the Monarchs divided their home games between hitter-friendly Association Park (31 games) and pitcher-friendly Muehlebach Field (27 games). I don't have park factors, but if I had to guess, I'd say the Monarchs played in a more-or-less neutral context.
10. burniswright
Posted: December 10, 2007 at 10:14 AM (#2639846)
I have no quarrels with the basic evaluations here. Yes, Johnson lived to hit the ball, and he was of little or no help on defense. It's also true that whereas he had decent seasons in Baltimore, they were nothing like the monster numbers he put up in KC. Park factors or no, if Johnson had continued to hit at the pace he established as a Monarch, he would have passed Mule Suttles and Jud Wilson. When a career declines as sharply as this one, I always suspect alcohol, although I have no primary-source evidence for that.
What interests me the most about Heavy Johnson is what a good example he is of the Jamesian division between peak and career value, and how troubling it sometimes is to try to reconcile the two. Ed Wesley is another example of a guy who had some seasons in Detroit that absolutely knock your socks off. Take a look at his 1925 numbers, for instance. Ruth never had a season that good. Yes, of course he took advantage of the short porch in Mack Park, but he also hit .416.
The short answer to my question is that, in almost all cases where the stats are this dichotomous, I feel that career value has to trump peak value. But in the Johnson and Wesley cases, those are some peaks!
11. sunnyday2
Posted: December 10, 2007 at 10:21 AM (#2639848)
Hi, Burnis. Looks like it's you and me here right now.
12. sunnyday2
Posted: December 10, 2007 at 10:21 AM (#2639849)
P.S. what do you know about Heavy's old teammate Dobie Moore?
13. burniswright
Posted: December 11, 2007 at 08:13 AM (#2641108)
Sunnyday2: give me a chunk of time to read all 227 posts in the Moore thread, to see what you guys have already talked over. I'm not begging off, it's just that, occasionally, it's necessary to pull myself away from the real business of thinking about baseball and engage in the diversion of paying the rent.
What I can tell you, right from the get-go, is that I'm damned mad that Elsie Brown shot him in the leg. This is a guy who, like Chino Smith, seemed absolutely bound for glory. From most accounts, he wasn't a beautiful shortstop to watch--he wasn't Omar Vizquel--but he could play the position, plus he had that Shawon Dunston gun on his right shoulder.
And man, could he hit!
14. burniswright
Posted: December 11, 2007 at 09:15 AM (#2641115)
"Hi, Burnis. Looks like it's you and me here right now."
OK then sunny, help me out please. My posts #58 and 59 on the Monroe thread are primarily about a very general HOM issue, and to the extent they're specific, they're not about Monroe.
So they don't belong there. Where do they belong? Thanks.
15. burniswright
Posted: December 16, 2007 at 10:20 AM (#2647195)
OK sunny: my comments on Dobie Moore are now posted at the end of the Moore thread.
16. KJOK
Posted: September 17, 2011 at 06:26 AM (#3927848)
I'd say "the Lefty O'Doul of the Negro Leagues" might be a reasonable comp.
19. DL from MN
Posted: June 17, 2019 at 02:54 PM (#5852852)
Huge bat for a player who started as a catcher.
MLE summary (columns grouped as BASICS / MLE COMPONENTS / STATUS):
PLAYER NAME / POS / HOF / HOM / PA / Rbat / Rbaser / Rdp / Rfield / WAA / WAR / WAR/650 / DATA COMPLETENESS SCORE / DATA+
JOHNSON, HEAVY / RF / N / N / 7650 / 396 / -4 / 0 / -11 / 34.7 / 62.1 / 5.3 / 43.6 / 73
20. DL from MN
Posted: June 17, 2019 at 03:00 PM (#5852855)
I think the Rfield numbers are probably too generous. He was listed at 6', 250 lbs, and his defensive reputation is poor. That's a Fat Tony Gwynn physique, and his per-season numbers were likely closer to -10 than -1.
21. Dr. Chaleeko
Posted: June 17, 2019 at 03:38 PM (#5852866)
Those are out of date figures for Heavy. Here's what he looks like now:
1916 2.0
1917 2.5
1918 3.6
1919 3.2
1920 6.0
1921 6.5
1922 6.0
1923 5.6
1924 6.0
1925 1.3
1926 3.3
1927 4.3
1928 2.5
1929 2.1
1930 2.0
1931 3.5
1932 1.5
TOT 62.1
22. DL from MN
Posted: June 18, 2019 at 03:16 PM (#5853245)
Turn those -2 and -1 RField numbers into -7 and -6 and you shave off 6 WAR.
23. Brent
Posted: January 05, 2022 at 12:15 AM (#6059683)
There's been some discussion of Heavy Johnson on the 2022 ballot discussion thread. I thought I'd move a couple of items here before continuing the discussion.
Here are Heavy Johnson's latest actual Negro league statistics from Seamheads, along with Dr C's latest MLE WAR estimates:
Year / Age / G / PA / BA / OBP / SLG / OPS+ / MLE WAR
1916 / 21 / 9 / 35 / .273 / .314 / .455 / 195 / 3.3
1917 / 22 / NA / NA / NA / NA / NA / NA / 4.9
1918 / 23 / NA / NA / NA / NA / NA / NA / 4.0
1919 / 24 / NA / NA / NA / NA / NA / NA / 4.5
1920 / 25 / 3 / 10 / .300 / .300 / .500 / 128 / 2.3
1921 / 26 / NA / NA / NA / NA / NA / NA / 9.1
1922 / 27 / 74 / 288 / .396 / .445 / .715 / 207 / 8.7
1923 / 28 / 99 / 433 / .406 / .472 / .719 / 210 / 5.7
1924 / 29 / 79 / 332 / .360 / .421 / .531 / 173 / 8.2
1925 / 30 / 61 / 251 / .327 / .383 / .538 / 141 / 1.8
1926 / 31 / 48 / 186 / .350 / .418 / .540 / 170 / 5.3
1927 / 32 / 58 / 228 / .379 / .461 / .568 / 164 / 6.2
1928 / 33 / 69 / 273 / .348 / .390 / .468 / 129 / 3.2
1929 / 34 / NA / NA / NA / NA / NA / NA / 2.4
1930 / 35 / 5 / 13 / .250 / .250 / .250 / 38 / 0.6
1931 / 36 / 5 / 21 / .286 / .286 / .476 / 109 / 2.6
1932 / 37 / 2 / 5 / .000 / .250 / .000 / -28 / 1.4
Johnson was playing for the Army Wreckers team from 1916 to 1921, which is why only a handful of games are recorded for those years.
24. Brent
Posted: January 05, 2022 at 12:25 AM (#6059685)
I also suggested a comparison with Albert Belle, whose career from age 27 onward parallels Johnson's.
To convert Negro league performance to major league equivalents, I use this simple back-of-the-envelope formula: Add 100 to the Seamheads OPS+; then multiply by 0.89; then subtract 100. For example, Johnson’s 1922 NeLg OPS+ is 207, so I calculate his eqOPS+ as 173 = 0.89 * (207 + 100) – 100.
Here are his equivalent OPS+ for ages 27 to 33, followed by Belle’s at the same ages:
We see they have similar OPS+, though Belle is better. On defense, both players were pretty bad, though I believe the evidence indicates that Johnson was worse. Bbref shows Belle costing his team about 8 runs per season over these years compared with an average left fielder, while Seamheads shows Johnson costing his team 1.2 wins per 150 games on defense. I note that his is one of the worst fielding rates in the Seamheads database.
After age 33, Belle is injured and his career is over. I don’t know if Johnson was injured, but he doesn’t show up in the statistics for age 34 (1929 - Dr C responded that Johnson was playing with an independent barnstorming team that year). When he returns to the Negro leagues he barely plays and is pretty awful in the few games he does play.
Of course, we know how Belle played before age 27 -- 3 good seasons from ages 24 to 26 averaging an OPS+ of 135 and 3.1 WAR, plus 71 games of replacement level play at ages 22 and 23. For Johnson, we know that he was playing at those ages for a very good Army team, but we can only make educated guesses about the numbers. Dr C's MLEs show him earning 28.1 WAR from ages 21 to 26, or an average of 4.7 WAR per season.
25. Jaack
Posted: January 05, 2022 at 12:42 AM (#6059686)
Whenever I look over Johnson's record, the name that always pops into my head is Harry Heilmann. They are pretty much perfect contemporaries, and both had killer bats and underwhelming gloves. Heilmann isn't an inner-circle guy, but he's well over the line. Even if Heavy wasn't quite as good, there's enough room for him to be a compelling candidate.
I guess the biggest question is how likely we are to get anything from his years with the Infantry Wreckers. I'm not too in tune with the day-to-day of the research there, but if it's possible those records exist, it might be worth holding off to ensure he's not just a flash in the pan. But what we have points to a HoM level talent, as strong as any remaining Negro League player.
26. Jaack
Posted: January 05, 2022 at 12:46 AM (#6059687)
I do think the Belle comp is a good one too. Splitting the difference between Belle and Heilmann puts Heavy right on the borderline for me.
It really comes down to those missing Wreckers' seasons.
27. Brent
Posted: January 05, 2022 at 02:02 AM (#6059691)
Now moving on to a new comment, let me describe how I would approach doing an MLE for a player like Heavy Johnson and see where that takes us.
I start by focusing on the well-measured seasons, which for Johnson are his ages 27 to 33. From my post # 24, his average eqOPS+ over those seasons is 144 (if you weight the years by plate appearances) and 141 (if you weight the seasons equally). I'll use 144. Belle's average OPS+ for his age 27-33 seasons is 151. Belle's rfield averages -8 per season over that period. For Johnson, if we use the Seamheads fielding WAR number, which according to my understanding doesn't include the Positional adjustment, thus making it comparable to rfield, his defense averages about -1.2 wins or -12 fielding runs per season. Belle and Johnson both seem not to have missed much playing time over those seasons, so I'll assume there's no difference there. So based on the differences in offense and defense, it appears that Johnson should average maybe 7 or 8 fewer runs per season than Belle, or let's say 5 fewer wins over the 7-season period. Belle's WAR for the 7 seasons is 31, so I would peg Johnson's total at 26, and distribute them among the seasons based on his seasonal eqOPS+, though smoothing the variation out somewhat because of the variance due to shorter NeLg seasons.
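(For anyone who wants to check that arithmetic, here's a quick Python sketch that applies the conversion from post 24 to the Seamheads OPS+ and PA figures in the table in post 23; it reproduces the 144 and 141 figures to within rounding.)

```python
# eqOPS+ = 0.89 * (Seamheads OPS+ + 100) - 100, per post 24.
# (NeLg OPS+, PA) pairs for ages 27-33 (1922-28), from the Seamheads table in post 23.
seasons = [(207, 288), (210, 433), (173, 332), (141, 251),
           (170, 186), (164, 228), (129, 273)]

eq = [0.89 * (ops + 100) - 100 for ops, _ in seasons]
pa = [p for _, p in seasons]

pa_weighted = sum(e * p for e, p in zip(eq, pa)) / sum(pa)
unweighted = sum(eq) / len(eq)
print(round(pa_weighted), round(unweighted))   # ~144 and ~141
```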
Dr C's MLEs show Johnson with 39.1 WAR over these 7 relatively well-measured seasons. The most important difference between Dr C's estimates and mine is that Dr C's assume that Johnson was a league-average right fielder, with an average rfield of 0. In addition to Seamheads fielding data, the evidence that Johnson was a poor fielder includes anecdotal information (Riley says he "was an unpolished fielder and not noted for performance afield"; see also comment #10 on this thread). I would ask Dr C to explain his reasoning for treating Johnson as an average fielder.
Offense also contributes to the difference in our estimates, as Dr C's MLEs appear to show Johnson as a slightly better hitter than Belle, whereas my back-of-the envelope calculation (which I've checked against other MLEs, such as those done by Chris Cobb) show Johnson a bit below Belle.
In making my own MLEs, I wouldn't rely entirely on the comparison with Belle. Using Stathead's season and career finder, I found four additional players who from ages 27 to 33 had an average OPS+ between 134 and 154 and poor fielding, as evidenced by a cumulative rfield of -40 runs or worse (about -6 runs per season). These players are Frank Howard, Ken Singleton, Ralph Kiner, and Roy Sievers. Here are some statistics:
The average OPS+ for these 5 players is 146 and the average WAR is 25.4.
Having established an estimate of Johnson's MLE WAR for ages 27 to 33, I would then look to estimate his earlier and later seasons for which we have almost no information, other than the fact that he was playing baseball for a good team. For the earlier seasons, I think a reasonable way to approach this is by looking at the average WAR from ages 21 to 26 for the five comparable players.
Ages 21 to 26
Name / PA / WAR
FHoward / 1829 / 9.6
Belle / 2099 / 9.1
Singleton / 1877 / 10.3
Kiner / 2582 / 24.8
Sievers / 1406 / 0.6
The average is 1959 PA and 10.9 WAR, with quite a bit of variation from Kiner on the high end to Sievers on the low. My MLEs would take this average and then bump it up slightly based on Seamheads data showing that Johnson played pretty well in 9 games at age 21. I think 14 WAR from age 21 to 26 is about as high as I would be comfortable going.
Dr C's MLEs for these seasons (again, there is hardly any actual data) show 2430 PA and 28.1 WAR, including a career high 9.1 WAR at age 26. While the example of Kiner tells us that this level of early career performance is not impossible, the data for the other four comparables surely suggests that the number is on the high side given our level of ignorance.
For the years older than 33, I note that Johnson's play in his age 33 season was near replacement level according to Seamheads (his non-MLE WAR was 0.2, of which his offense was +1.6 wins, his defense was -1.4 wins; he also had -0.3 positional adjustment and +0.2 for pitching 3 shutout innings). Limited statistics for ages 35 to 37 tell us that he was never used for more than a few games a season and was probably below replacement level when he did play. I think the reasonable conclusion is that Johnson would not have kept a major league job after a year like that, so my MLEs would show his career ending after his age 33 season. Dr C's show him earning another 7 WAR from ages 34 to 37.
So my method suggests that a reasonable MLE for Johnson would be about 14 + 26 = 40 career WAR, which would include a couple of really good (6 to 7 WAR) seasons at ages 27 and 28. Dr C's career MLE is 74.2 WAR with 3 seasons of more than 8 WAR.
Have my back-of-envelope estimates led me astray? Or are the differences indicative of a problem in Dr C's methodology?
28. cookiedabookie
Posted: January 05, 2022 at 03:01 AM (#6059693)
Started preparing for next year, and did another deep dive on the NeL data from Dr C. Using his numbers, with various amount of reductions for uncertainty's sake, it looks like Heavy will be in an elect me spot on my ballot next year. I'd definitely like a bit more explanation on the lack of penalty for defense. I may have to think about how to bake that into my evaluations
29. DL from MN
Posted: January 05, 2022 at 10:03 AM (#6059702)
Good work, Brent.
I am not impressed with Heavy Johnson serving as the catcher for the Wreckers. It's not like they had dozens of other options to pick from. Anyone on the roster had to be enlisted in the 25th infantry of the Army. They're going to pick from the few people who can catch the ball when Bullet Rogan throws it. I wouldn't give MLE credit for those seasons as a catcher.
30. cookiedabookie
Posted: January 05, 2022 at 05:00 PM (#6059786)
So looking at pre-integration outfielders on Fangraphs, it looks like Sam Crawford is the worst; he lost about 14-15 wins from defense/position. If we assume Heavy would have been the worst outfielder defensively at the time, you could give him a -15 or even a -20 on Dr. C's projection, leaving him at about 54-59 wins. If you think he'd be among the ten worst, it would be about -10 wins, leaving him in the 64 win range. And if you think he'd be in the middle of the ten worst, that's about -12 wins. If I give him the -15, he drops down from an elect-me spot into the 40s and off my ballot. At -10, he'd fall to around 20th and just off my ballot. Curious what everyone thinks about my thought process here.
31. kcgard2
Posted: January 05, 2022 at 05:15 PM (#6059795)
Brent, thanks for the dive into your analysis. I think Dr. C's MLEs are diverging from your estimates mainly due to the way small seasonal samples are handled, using surrounding season performance to fill in. Well, almost all the data is from Johnson's peak years with a great bat (I think), so what gets filled in to surrounding smaller sample seasons is that peak performance slightly downweighted. I think Dr C's MLE would propagate better if the WAR baseline from top seasons had the poor fielding baked in, so if a top season had -1 WAR from fielding baked in instead of the unlikely value of average, that propagates to near seasons as a slightly lesser performance, and the MLE comes out more like 62 WAR than 75 (numbers for illustration). Might also propagate better with more refined aging pattern estimates, but now it's starting to be a lot to ask of one person, in fairness.
32. Brent
Posted: January 05, 2022 at 10:40 PM (#6059858)
Jaack - I know Dr C pointed to Heilmann as a comp for Johnson, but I don't see it (unless you think the discount for quality of play should be much smaller than has been traditionally assumed by this project). Heilmann's OPS+ from ages 27 to 33 was 162, which is just a lot better than Johnson's equivalent 144 that I calculated above. I'm pretty confident that Heilmann was a better hitter than Johnson. (Which is not a knock on Johnson's hitting ability - while he's well below Charleston, Johnson is in the mix for #2 NeLg hitter of that era, along with Torriente, Stearnes, Beckwith, Wilson, and Moore.)
DL - I'm not sure how to evaluate Johnson as a catcher. AFAIK, he was a catcher during all of his 6 years with the Wreckers, then caught 16 games with the Monarchs in 1922 before switching to the OF. I assume that if he was a really good catcher he might have stayed there. The Monarchs' rivals at the time, the Chicago American Giants, were famous for playing small ball, bunting and stealing, so a poor defensive catcher wasn't going to play for them. And the Monarchs had Frank Duncan, one of the best. But there's nothing in the record to indicate that Johnson was a poor catcher.
Cookie - On Johnson's page at Seamheads, there's a little graphic showing his percentile rankings in various batting and fielding categories relative to other players in the database. It says he's in the 6th percentile as a left fielder and in the 17th percentile as a right fielder. To me, that's evidence that he was pretty bad. My impression is that before WWII, major league teams wouldn't let a guy like Adam Dunn, Frank Howard, or Jeff Burroughs play in the outfield, with maybe rare exceptions. To me, the defensive data from Seamheads suggest that Johnson may have approached that territory, which probably would have made him worse than any pre-war major league outfielder with a long career.
kcgard2 - I agree that Dr C's MLEs seem to be greatly influenced by Johnson's two peak seasons in 1922 and '23. On the other hand, I don't think it's reasonable for any statistical extrapolation model to project a 9 WAR season, as Dr C has projected for 1921, even if the adjacent seasons were as good as Dr C assumes. Every statistical projection system I'm familiar with includes a regression or dampening component, so it just doesn't seem reasonable to me for his system to project a peak higher than anything in the player's experience.
Dr C - It appears I had missed reading your discussion of Johnson's fielding statistics in post 356 of the 2022 ballot discussion thread. While that post clarifies things somewhat, I'd like to ask why you apparently didn't use his left field statistics in evaluating his defense. Seamheads currently shows Johnson's range as -13 runs in 257 games played in right field, and -26 runs in 189 games played in left field. If I understand your comment correctly, you only used the right field numbers. For players who split time between the corners, I've usually pooled the two positions, thinking that the benefits of a larger sample size more than offset any biases from pooling the two positions. And in this case, adding left field would affect the assumptions that feed into the MLEs.
33. Dr. Chaleeko
Posted: January 06, 2022 at 04:13 PM (#6059999)
Brent,
These are all great questions and points. I'm glad when people question results because it helps me get better at this stuff.
I want to start with Johnson's fielding. First off, I did find a computational error in double checking, so I'm really glad you queried. Parts of this process do require manual work, and I ain't perfect. I also changed his age 28-30 seasons to LF to better align with where he played in real life. Those changes combined reduced his Rfield to -21 (and his WAR to 71.7). Incidentally, the figure of -12.9 RF DRA has been bandied about. That's not the number I use. I only use figures (for all aspects of the game) from regular season games (of course, the regular season in Cuba is in the NeL's offseason, but I count Cuba as a "regular season" for this purpose). No exhibitions, East-West games, or postseason games. Heavy has -8.8 DRA in 233 RF games by this reckoning.
Now, a big area of confusion with his fielding, IMO, is between DRA and Rfield. A couple of years ago, I took the top 50 RFs in the Negro Leagues Database by G in RF, and I found the STDEV of their DRA/154 in RF (counting only league games). The STDEV was 14.1 runs. I then took the top 50 MLBs by G in RF through the same era, and I found the STDEV of their Rfield in RF. It was 3.01. That's a huge gap! In fact, there's a huge gap at all positions. (The widest gap is CF: 17.0 vs 3.1.) When you look at individual bad corner OFs in MLB history for a pattern between DRA and Rfield, there are big gaps, but they fly in both directions. In the larger sample, however, the STDEVs move dramatically lower from DRA to Rfield, and I think it's more defensible to use the larger sample than to try to peg a player to specific comps. I present the MLEs such that they look like what you'd find at BBREF to keep them familiar and consistent with MLB players. So I take the career DRA/154 (adjusted for sample size as noted in post 356 in the 2022 discussion thread), and I multiply it by the ratio of MLB Rfield STDEV to NeL DRA STDEV. For Heavy in RF that looks like this:
RATE PER 154: -8.8 DRA / 233 G in RF * 154 = -5.8 DRA per 154
SAMPLE ADJUSTMENT: -5.8 DRA/154 * (233 G in RF / 308) = -4.4 DRA / 154
RFIELD CONVERSION: -4.4 DRA/154 * (3.0/14.1) = -0.9 Rfield/154
CAREER RFIELD: -0.9 Rfield/154 * 1308 G in RF = -8.0 career Rfield
Repeating this with his catching stats for ages 21-22, he gets 0.4 Rfield; repeating it for LF ages 28-30 he gets -13.9 Rfield.
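Roughly in code form, the chain of steps looks like this (a sketch only; capping the sample-size factor at 1.0 for samples over 308 games is an illustrative assumption, standing in for the adjustment described in post 356):

```python
def mle_rfield(career_dra, g_nel, g_mle, mlb_sd=3.0, nel_sd=14.1):
    """Sketch of the DRA -> Rfield conversion described above (RF example)."""
    rate = career_dra / g_nel * 154        # RATE PER 154
    rate *= min(g_nel / 308, 1.0)          # SAMPLE ADJUSTMENT (cap of 1.0 assumed)
    rate *= mlb_sd / nel_sd                # RFIELD CONVERSION (STDEV ratio)
    return rate * g_mle / 154              # CAREER RFIELD over the MLE games

print(round(mle_rfield(-8.8, 233, 1308), 1))   # about -8 career Rfield in RF
```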
Couple things to note:
-Someone mentioned his catching upthread. I just rolled with it because that's what he did.
-Jaack mentioned the Wreckers data. I'm not sure how helpful it would actually be. We'd be talking very small numbers of games, and there's no "league" to work with, so we can't really compare a player effectively to the other players.
-We might want to wonder about 1926. That year, Heavy played 44 games in LF and had -12.3 DRA. Obviously that's terrible. But it's a little bit difficult to take at face value because it's just 44 games. If a guy hit .400 in 44 games we probably would say it's not very representative if it is out of line with the rest of his career. Heavy wasn't a good fielder, but that's roughly double the badness of his other 90 games in LF. Small samples.
-As to the question regarding combining RF and LF. I do not currently do that. TBH, it wasn't something I'd thought about because I tend to treat each position discretely. There are good reasons for doing so, IMO. First is that for a long time in the early 20th Century, LF was a more important defensive position than it later became. LF putouts are similar to (and IIRC in a couple cases higher than) CF putouts during that time. During that same period, RF was still evolving away from a place to hide pitchers and catchers who could hit when they weren't in the battery (Caruthers, King Kelly, Buck Ewing) or to stash your worst fielder. But at some point RF became a more skilled and often more athletic position than LF. To the point that in the 1970s, you've got Dewey Evans in RF and Luzinski in left or in the 2010s you've got Mookie Betts or Shane Victorino in RF (I know, Fenway) and Adam Dunn in LF. Lou Brock was a guy who was stashed in LF because he was a spotty flycatcher with a scattershot arm and didn't have the tools to play RF despite his athleticism. I know a lot of folks like to just lump RF and LF together like there's no difference. I've just never felt like that's necessarily true. However, that query led me to look again at his LF/RF playing time and recast his age 28-30 seasons, which is good.
I'll talk about 1921 in another post.
34. DL from MN
Posted: January 06, 2022 at 04:50 PM (#6060013)
In the Negro Leagues (with the smaller rosters) RF was often where you hid your other pitcher who could hit a little (or the player with a bad hangover :P)
35. Dr. Chaleeko
Posted: January 06, 2022 at 05:05 PM (#6060022)
Re Johnson's 1921. Here's how this calculation is working.
We have no data for 1921. I try to build a sample of >= 200 PA for each season in a player's career. I do it in four possible stages, in this order.
A) If the player has 200 PA in season n I move forward.
B) If the player has <200 PA, I use a combination of season n and seasons n+1 and n-1 to build up 200 PA
C) If the sample is still <200 PA, I add seasons n+2 and n-2 but at 60%
D) If the sample is STILL <200 PA, I add the player's career to the sample.
Also, I calculate my own wOBAs and set them to a common .330 league-average baseline so that all seasons are apples-to-apples. Or close.
Here's the situation with Johnson:
1920: 10 PA, .377 wOBA, 0.73 z-score
1921: No data
1922: 263 PA, .488 wOBA, 2.16 z-score
I combine the seasons like this (there may be some rounding error/sig dig stuff here):
10 * 0.733 = 7.333
+ 263 * 2.165 = 569.286
= 576.619
/ 273 = 2.112 STDEV
The NL in 1921, excluding pitchers, had a STDEV of 0.109 points of wOBA and a mean wOBA of .302. Therefore,
2.112 * .109 = .230 pts of wOBA
.230 pts of wOBA + .302 mean = .532 wOBA (w/ sig digs it's .556)
That turns into 0.345 wRC/PA, which is then adjusted for league quality (0.80) to get 0.276 wRC/PA and a wOBA of .4865. The league had 4.26 PA/lineup spot in 1921, and given his durability characteristics, that translates to 579 PA. Therefore, 579 * .4865 wOBA = 78.663 batting runs. Because the league leader in batting runs had 76 that season in the NL, I lower it to 76. On the numbers side, it projects to .361/.419/.588, a 172 OPS+. That's how it gets to 9.1 WAR. The NL scored less often than it would a year later when Heavy's MLE basically repeats the season, which is why he has a higher WAR in 1921 than 1922.
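In sketch form, the weighting and the mapping onto the 1921 NL look like this; the final line is just a rough sanity check on the batting-runs step using a generic (wOBA - lgwOBA)/scale conversion, not necessarily the exact formula in my spreadsheet:

```python
# Weighted z-score for the missing 1921 season (figures from above)
samples = [(10, 0.733), (263, 2.165)]        # (PA, wOBA z-score) for 1920 and 1922
z = sum(pa * zs for pa, zs in samples) / sum(pa for pa, _ in samples)
print(round(z, 3))                            # ~2.11 standard deviations

# Map onto the 1921 NL non-pitcher distribution: mean .302, STDEV .109
print(round(0.302 + z * 0.109, 3))            # ~.532 wOBA

# Rough check on the batting-runs step with a generic (wOBA - lgwOBA)/scale * PA conversion
print(round((0.4865 - 0.330) / 1.15 * 579, 1))   # lands near the ~78.7 batting runs above
```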
That's the guts of it. In terms of the outcome itself, Johnson is an edge case for all the reasons we've discussed in this thread. Does the MLE overstate his case? It certainly could. And I'm completely sympathetic to the point that 1921 is a whole-cloth estimation. In reality it has to be. We can't have a zero there, and no one can say with anything approaching certainty that it "should" or "should not" be at a certain level. It's all a question of approach, and both the "just use his career rates" approach and the method I've chosen have downsides. I just dislike the downsides of the method I've chosen less than other methods (many of which I've tried!). It's also worth noting here that MLEs are generally conservative because they rely so much on central tendency. MLEs giveth and taketh away.
What I can say is that I'm using the same set of principles and techniques on every single hitter (or pitcher) and trying to make as few decisions as possible because I want to avoid introducing too much of myself into each player's results. I think internal consistency matters tremendously with this work, and completely bespoke MLEs (as someone outside this group has suggested to me) would introduce more problems than they solve. But, as always, someone else's mileage may vary!
36. Dr. Chaleeko
Posted: January 06, 2022 at 05:08 PM (#6060024)
For those interested in what Johnson's corrected numbers look like (see a couple posts back), here they are:
37. Chris Cobb
Posted: January 06, 2022 at 07:29 PM (#6060067)
Dr. C, here's a question regarding fielding standard deviations, DRA vs. rfield, and apologies if your responses above have already addressed this and I just didn't pick up on it:
Do you have the data to compare DRA's standard deviations for NL and AL players during the NeL period to rField's SDs? If so, how do the NL/AL fielding SDs in DRA compare to the NeL SDs?
It seems to me possible that the differences in SDs between the NeL DRA numbers and the NL/AL rfield numbers could result partly from differences in distribution of values within the sets of players, in addition to differences between the fielding measurement systems. In other words, if there's a lot more variability in fielding skills in the NeL than in the NL/AL, the NeL fielding numbers would have a larger SD than the NL/AL numbers in any measurement system, and that range of variation shouldn't be entirely smoothed out in the construction of MLEs. Since different measurement systems with incommensurate inherent variance are being used in the two different contexts, however, the only way to compare NeL SD to NL/AL SD would be to calculate the SD for the NL/AL in DRA. Does that make sense?
If the DRA SDs for the NL/AL 1920-48 are similar to the DRA SDs for the NeL, 1920-48, then that would suggest that Heavy Johnson's fielding values need to be very heavily smoothed out to bring him into the range of values produced by the rfield system. However, if the DRA SDs for the NL/AL are smaller than the DRA SDs for the NeL, then that would suggest that a greater variance in NeL fielding numbers would need to be retained to reflect the fact that the NeL conditions allowed for a wider range of fielding skills than occurred in the contemporary NL/AL.
It may well be that the DRA data that would be needed to test this case aren't available, but there might be other ways to get at the issue through looking comparatively at SDs in batting and/or pitching values? Maybe you have already done this?
Thanks for any illumination you can bring to this question!
38. Brent
Posted: January 07, 2022 at 12:50 AM (#6060121)
Dr C,
You've documented that the standard deviation of DRA is much larger than the standard deviation of Rfield in the major leagues. I can think of three reasons, each of which probably contribute:
1. Smaller sample size in the NeLg should contribute to larger standard deviations. We can estimate that effect using the standard square-root of n formula. For example, if NeLg seasons during the 1920s were half as long as major league seasons, sample size would explain a 41% increase in the standard deviation.
2. Differences in the standard deviations of DRA and Rfield due to the difference in the two estimation methods. As Chris has suggested, if we have DRA data for major leagues (or Rfield estimates for NeLgs) we could figure out how much is due to the formula. My recollection is that MLB DRA data used to be available from the now defunct Baseball Gauge site. Are the data still available anywhere else?
3. As Chris has also suggested, differences in the true underlying distribution of fielding ability. I would argue that this is probably the case. First, we know that there were a lot more errors recorded in NeLg games. For example, the league fielding percentage for the 1922 NNL was .953, compared to .967 for the NL. When there are more errors in a league, I think it's likely that there is more variation. Here are some other quick numbers taken at the team level:
1922 / SD of DER / SD of Fld%
NL / .013 / .003
NNL / .030 / .007
These differences are partly due to the NNL teams playing fewer recorded games, but sample size can't explain the entire difference.
Second, I think that NeLg defense was more widely dispersed because their teams were picking up a much wider range of talent. Here's how I think about it. Suppose the leagues in the 1920s started out integrated, and there were 22 teams in the majors, 22 in the highest minor lg level (nowadays called AAA), and 22 in the second highest level (AA). Then they tell the teams they have to segregate, and they split the players into 16 White teams and 6 Black teams at each level. The White teams just go ahead and play at the same three levels, but the Black teams now have to split up and mix, so each team now contains major league, AAA and AA players. That gives you a league with an average quality of play about equivalent to AAA, as we've assumed for the quality of play adjustments, but 1/3 of the players are major league quality (including many top stars), 1/3 are AAA quality, and 1/3 are AA quality. That new NeLg is going to have a much wider dispersion of fielding ability than the major leagues. Also, they don't have the coaches and time spent in spring training that the MLB teams have, so it seems inevitable that there would be more dispersion in fielding (as well as hitting and pitching). I think the best fielders (Lundy at SS, Charleston in his prime in center, etc.) were every bit as good as the best major league fielders, but I think the worst NeLg fielders were probably worse than the worst MLB fielders.
Regarding Johnson's 1921 season, if I understand your methodology, the projection is essentially based 96% on his 1922 season and 4% on his 1920 season with no regression. Which I guess is why it ends up looking quite a bit like his 1922 season. I think if I were approaching the problem of projecting a season for which there are no actual data, I would start with something like Tangotiger's Marcel method (though in reverse, since you have data for the seasons after 1921 rather than before). In other words, a weight of 5 for 1922, 4 for 1923, and 3 for 1924, along with a regression factor. I also might throw in 1920 with a weight of 5, though that sample is so small it shouldn't make a difference. And I'd regress toward the major league equivalent average rather than toward the NeLg average, since we're taking it as given that these players are all major league quality. I think that approach would still project a strong season for Johnson in 1921 because his 1922 and 1923 seasons were both so good, but not as strong as what your current method gives you. The current projection seems more like an upper limit than an unbiased projection.
39. Brent
Posted: January 07, 2022 at 01:30 AM (#6060122)
For projecting WAR, tango suggested an even simpler system called "WAR Marcels or WARcels" (google it). It's just 60% of year T, 30% of year T-1, and 10% of year T-2, then multiply by 80% for the regression. Then there's an aging effect added or subtracted (depending on whether the player is approaching or going away from age 30), which is 0.1 times the number of years from 30.
Using your WAR numbers for 1922 to 24, here's what that method would give for Johnson's 1921 season:
5.6 WAR = 0.8*(0.6*8.6 + 0.3*5.1 + 0.1*7.7) - 0.4
To me, that seems like a more reasonable projection conditional on your estimates for 1922-24.
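Here's that reverse-WARcel calculation as a small Python sketch; note that the age term is subtracted because the projected season (age 26) is farther from the age-30 peak than the seasons we're projecting from, which is my reading of how the adjustment should run in reverse:

```python
def warcel_backward(war_following, age, regression=0.8):
    """Project a missing season from the three FOLLOWING seasons, WARcel-style:
    60/30/10 weights, 20% regression, 0.1 WAR per year of age from 30."""
    w1, w2, w3 = war_following
    base = regression * (0.6 * w1 + 0.3 * w2 + 0.1 * w3)
    return base - 0.1 * abs(age - 30)   # subtracted here; see the note above on the sign

# Johnson's missing 1921 (age 26), from Dr C's 1922-24 MLE WAR of 8.6, 5.1, 7.7
print(round(warcel_backward((8.6, 5.1, 7.7), 26), 1))   # ~5.6 WAR
```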
40. Dr. Chaleeko
Posted: January 08, 2022 at 10:42 AM (#6060236)
Good stuff, gents, thanks! Fortunately I have a lot of data for DRA for players in the Negro Leagues era, so I'm assembling that now and will report back when I have some results.
Brent, I'm wondering about something re WARcels. My MLEs include a full range of values beyond WAR (including trad stats derived from the WAR inputs). I already have fielding, running, DP, positional, and replacement WAR inputs; it's really only hitting that I'm missing. Do you think it would be better to either:
a) Figure it the WARcel way you mentioned, then algebraically solve for X, where X = batting runs?
b) Use a Marcel-type calculation to figure the batting runs, then calculate WAR the usual way?
Thanks!
41. Brent
Posted: January 08, 2022 at 02:49 PM (#6060247)
Dr. C, I look forward to seeing your work on DRA.
Regarding WARcel vs. Marcel, I'm not aware of any research showing that one approach is better than the other. Since you have the data to do it both ways, why not do them both? Then use the simple WARcel calculation as a check on the more detailed Marcel approach. The thing I like about both Marcel and WARcel is that they are designed to be simple and don't make a lot of assumptions.
42. Dr. Chaleeko
Posted: January 11, 2022 at 04:33 PM (#6060663)
OK, I have some data and a little analysis to report re DRA/Rfield.
Here's what I gathered:
NeLs
-Current top fifty at each position by innings via the SH query machine
-I noted their defensive games at a position and their DRA
-For OFs, I only use their Range runs, not their arm runs (DRA's arm runs isn't good)
-The lowest number of career games at any position was 85
-Some players appear at multiple positions
-328 individual players
MLBs
-All players in my rankings (about 120-150 per position)
-Only seasons prior to 1965
-I have them season by season in my db, so I noted G, Rfield (minus Rof), and DRA range for each season at their primary position that season, which means that some players appear at multiple positions
-For consistency with the NeL sample, I removed all samples below 80 games at a position
-574 individual players
There are some minor differences in the samples, especially because for the MLB guys I may not have a career's worth of information (I looked at it season by season), while the NeL figures are career totals.
Here's the analytical stuff. Quick summary of the following tables:
-DRA values for MLBs were quite a bit higher than for NeLers on a per 154 basis
-DRA values were less widely dispersed for MLBs than for NeLers, except at catcher, on a per 154 basis
-Rfield values were quite a bit lower than DRA for MLBs on a per 154 basis
From Chris’ post 37:
“If the DRA SDs for the NL/AL 1920-48 are similar to the DRA SDs for the NeL, 1920-48, then that would suggest that Heavy Johnson's fielding values need to be very heavily smoothed out to bring him into the range of values produced by the rfield system. However, if the DRA SDs for the NL/AL are smaller than the DRA SDs for the NeL, then that would suggest that a greater variance in NeL fielding numbers would need to be retained to reflect the fact that the NeL conditions allowed for a wider range of fielding skills than occurred in the contemporary NL/AL.”
Looks like we need to retain some of the variance Chris mentions. Except going the other way for catcher.
From Brent in post 38:
“Smaller sample size in the NeLg should contribute to larger standard deviations. We can estimate that effect using the standard square-root of n formula.”
A little help, please. Which STDEV should I be applying this to? I think you’re suggesting this but correct me, please, if I’ve misunderstood you:
STDEV of DRA/154 / SQRT(Number of players in sample)
“As Chris has suggested, if we have DRA data for major leagues (or Rfield estimates for NeLgs) we could figure out how much is due to the formula.”
Glad to do this, but I’ll need a suggestion for the calculation, please.
43. Brent
Posted: January 12, 2022 at 12:15 AM (#6060737)
Dr C,
This is really nice work. Your results seem to confirm my guess that there was more dispersion of fielding ability in the NeLgs than in MLB, with the exception of catchers.
I was drafting a long comment on how to adjust standard deviations for sample size, then decided what I was proposing would be too complicated or not feasible (hint: it involved calculating DRA from Retrosheet data). But because you've limited your sample to players with at least 80 games at a position, the effects of differences between the leagues in sample sizes probably shouldn't be too large. I guess one simple way to check the effect of sample size would be to take the MLB players with a lot more playing time than the NeLg players, throw out a few of their seasons at random to bring their sample sizes in line with what's in the NeLg sample, and then see how much difference it makes to the variation. My guess is it wouldn't make enough difference to change the basic picture you've presented.
Again, my compliments on a nice little study.
44. Chris Cobb
Posted: January 12, 2022 at 01:30 PM (#6060801)
Dr. C,
I should add my thanks also. I can reason about statistics, but I can't do sophisticated statistical analysis. I am grateful that you are willing and able to do this work!
I am not understanding what the lefthand column in the second table shows, so if you have a moment to explain that data further, I'd greatly appreciate it!
45. Dr. Chaleeko
Posted: January 12, 2022 at 05:38 PM (#6060841)
Chris,
The lefthand column in the second table got a little garbled. Sorry about that. It's simply the ratio of the means of mlbDRA/154 to nelDRA/154 at each position. At catcher, for example, the mean mlbDRA/154 is 3.2 times larger than the mean of nelDRA/154. The average at the bottom of the column indicates that taken together, the positional means average 3.1 times more mlbDRA per year than nelDRA.
Or to put it more simply, MLBs had DRA rates about 3 times as high as NeLs did, on average.
46. Brent
Posted: January 14, 2022 at 01:04 AM (#6061018)
Dr. C,
With these numbers calculated, there is the interesting question of how you might use them to calculate or adjust the fielding records of NeLg players for MLEs. It is not entirely obvious what should be done.
First, I'll recommend that you not pay too much attention to the differences in the "average" numbers in your tables. We know that DRA is designed to have a mean of zero for all players at a position, so what the averages show is that the players in your dataset (presumably good players with fairly long careers) are typically slightly better fielders than the average player in their league and at their position. That's not surprising and not especially helpful for comparing the NeLg with MLB. I will focus more on the differences in standard deviations.
In your MLEs, I understand that you're planning to convert the DRA-based data to match the distribution for Rfield. I won't debate that decision, but for the rest of this post will ignore the conversion from DRA to Rfield and focus on the problems we face in simply comparing the two distributions of DRA data. I'll ask how we might make the NeLg DRA distribution equivalent to the MLB DRA distribution.
Focusing on the differences in standard deviation, I suggest three stories that one might tell:
Story 1. The higher standard deviations of the NeLgs are simply an artifact of the playing conditions they faced. They played fewer recorded games per season; there was more variation in the quality of the fields they played on, in the quality of the umpiring, and in the quality of the equipment they used. These environmental factors could explain the higher variation. For this story, the solution is to just adjust (or standardize) the NeLg data to have the same means and standard deviations as the MLB data for each position and use the standardized data in the MLEs.
Story 2. The higher standard deviations in the NeLgs reflect more dispersion in actual fielding ability relative to MLB, but there is no reason to think that NeLg players were on average any different than MLB players. In this story, you'd just treat the NeLg fielding data as equivalent to MLB fielding data and not make any conversion. Under this assumption, I'll note that the wider dispersion of the NeLg distributions implies that its upper tail would show the best NeLg fielders playing better than the best MLB fielders. This seems contrary to what we've found in the MLEs for offense--that the best NeLg hitters (Gibson, Charleston) were similar to, but not better than, the best MLB hitters.
Story 3. This is the story I sketched in #38 above, in which the higher standard deviations in the NeLgs reflect a lower average quality of fielding, because NELg rosters encompassed many players who would not have been able to play in MLB along with many others who were definitely MLB-quality players. While this judgment may seem harsh, it is in line with what all of our MLEs have been telling us about offense--that the average quality of play in the NeLgs was lower than in MLB, so the raw NeLg statistics need to be adjusted downward to make them equivalent. The problem is, how do we know how much to adjust them?
For offense, Chris Cobb was able to look at batting statistics for players who appeared in both the NeLgs of the 1940s and in MLB during the early years of integration. I think it would be difficult to apply this approach to fielding statistics. You'd have to look at multiple positions. Players often played different positions in the NeLgs than they did in MLB. (JRobinson was famously a shortstop for the Monarchs; Doby was a second baseman for the Newark Eagles.) When you do find matches, sample sizes tend to be small, and I will guess that fielding statistics require larger samples to stabilize than batting statistics (though I don't know that for a fact). Sometimes players spent two or three years in the minors between their last NeLg appearance and starting MLB play, which raises the possibility that aging might affect the comparisons. So, I'm not optimistic about being able to do a conclusive study on this subject, though it would be interesting to see how far the data might take us.
So, to the extent we believe Story 3 to be the most accurate one, I will suggest an ad hoc method for adjusting NeLg DRAs to MLB equivalents. For RF, your data show the NeLgers with a mean per 154 games of 1.4 and a stdev of 13.1. For the MLBers, the mean is 3.1 and the stdev is 9.4. Let's say we think the top 20% of NeLgers were about as good as the top 20% of MLBers. If we use a normal distribution with mean 3.1 and stdev 9.4, the 80th percentile of the MLB distribution is 11.0 [in Excel, use norm.inv(0.8,3.1,9.4)]. For the NeLg players, the 80th percentile calculated the same way is 12.4. If we keep the same standard deviation and want the 80th percentile to be the same, we need to subtract 1.4 from the DRA for all of the NeLg players as a "quality adjustment" for the lower level of play. The adjusted DRA data would then be treated as equivalent to MLB DRA data, and no further standardization would be made.
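Here is a small Python sketch of that quantile-matching adjustment under the normal approximation (see the note below about preferring the empirical percentiles); the function name is just for illustration.

```python
from statistics import NormalDist

def quality_adjustment(nel_mean, nel_sd, mlb_mean, mlb_sd, pct=0.80):
    """Runs per 154 games to SUBTRACT from NeLg DRA so that the chosen percentile
    of the NeLg distribution lines up with the same percentile of the MLB distribution."""
    return NormalDist(nel_mean, nel_sd).inv_cdf(pct) - NormalDist(mlb_mean, mlb_sd).inv_cdf(pct)

# RF figures from post 42: NeLg mean 1.4 / stdev 13.1, MLB mean 3.1 / stdev 9.4
print(round(quality_adjustment(1.4, 13.1, 3.1, 9.4), 1))   # ~1.4 runs per 154 games
```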
A note--rather than assuming a normal distribution, I think it would be preferable to calculate the 80th percentiles from your actual data. We don't know that the data are normally distributed, so it's probably better not to assume it when we don't have to.
This method for Story 3 is admittedly ad hoc. You may ask, why the 80th percentile, and not the 70th or 90th? I don't have a good reason; it just seemed reasonable to me to try to keep the top end of the talent distribution aligned, and quintiles are pretty commonly used for describing data distributions. Unlike batting statistics, where we can make multiplicative adjustments to rate statistics like batting average, wOBA, and OPS, the fielding statistics are centered around zero, so any quality adjustment is likely to require a subtraction. This just seemed like a plausible way to come up with such an adjustment and would only slightly change the picture from the available NeLg DRA data.
How do we decide which of the three stories to believe? I don't think Story 2 (larger std dev in the NeLg, but no quality adjustment) makes sense, so I would discard it. I think that both Stories 1 and 3 do make sense. What I would probably do if I were constructing MLEs myself is some mixture of the two--maybe calculate MLE fielding statistics with the standardization as in Story 1, then with the quality adjustment as in Story 3, then take an average of the two figures. But you can use your judgment as to which story seems most plausible to you.
47. Dr. Chaleeko
Posted: January 14, 2022 at 10:51 AM (#6061044)
Thanks, Brent! Truly helpful. Completely agree that 2 is out. I think you're right that a combo of 1 and 3 is in order. I'll weight it 20% for 1 and 80% for 3. I will report back after some testing.
48. Chris Cobb
Posted: January 14, 2022 at 10:58 AM (#6061047)
Brent, those seem like reasonable stories to tell, and I can see the rationale for the adjustment method you propose. I'm curious about the implications of this approach for below-average fielding scores. In dealing with doing MLEs for batting values, one has the luxury(?) of envisioning a system that needs to be concerned primarily with positive values and with the high end of achieved values within the context of the Negro Leagues. The players for whom MLEs are needed are not going to show positive and high-end fielding values with nearly the same consistency.
Would a process of straight subtraction as the quality-of-play adjustment and then norming the DRA range of values to the Rfield range of values work as appropriately and fairly for below-average fielding as for above-average fielding? I don't immediately see that it wouldn't, but I think it's worth asking the question explicitly. Will a conversion rate based on one end of the fielding spectrum work fairly across the whole fielding spectrum?
49. kcgard2
Posted: January 14, 2022 at 06:07 PM (#6061117)
Yeah I'm not sure I agree with it either. Hitting and fielding ability are, if anything, actually negatively correlated with each other.
50. Brent
Posted: January 14, 2022 at 08:22 PM (#6061132)
Chris,
I don't think I was suggesting that Dr C skip the step of norming the DRA range of values to the Rfield range. I was looking at the more basic issue of how to translate NeLg DRA values to an MLB DRA context. I was assuming that the next step would be to adjust those MLE DRA numbers to a range appropriate for Rfield. (I'll mention, however, that after reading Wizardry, my own opinion is that it might be better to substitute DRA for Rfield for the MLB players, which I guess is what the gWAR numbers from Baseball Gauge did.)
Will a conversion rate based on one end of the fielding spectrum work fairly across the whole fielding spectrum? I'm not quite sure how we measure "fair," but my proposal would just be subtracting a relatively small number (1.4 runs per 154 games) from the DRA fielding records of all NeLg players. And if you re-norm those numbers to Rfield, it would wind up being an even smaller adjustment, probably less than a run a year. I don't think that should cause too much distortion.
One player we elected before the Seamheads fielding data became available who might have been adversely affected by those data is Cool Papa Bell. His batting record was always pretty marginal, but I think many of us voted for him assuming that his legendary speed must have resulted in outstanding defensive play. But the DRA data now available suggest that after about age 25 he was only an average outfielder. If I had those elections to do over again, I don't think he'd make my ballot.
51. Howie Menckel
Posted: January 14, 2022 at 11:58 PM (#6061147)
"Dr. C, I should add my thanks also. I can reason about statistics, but I can't do sophisticated statistical analysis. I am grateful that you are willing and able to do this work!"
Hell, I can't even do that, Chris!
All I can contribute is that while I am not a "touchy/feely" fellow by nature, the fair analysis - back and forth - on any Negro League candidate that continues to this day, well, I find that very gratifying.
Not many may remember it, but in the dawn of the HOM creation discussion, there were some who thought there was no realistic way to compare white players with Negro Leaguers, so.....
Mind you, I never sensed even a hint of racism in that sentiment. It was absolutely NOT that. It struck me as a set of data-driven challenges: how to possibly balance the merits of players who didn't face each other except in exhibition games (which I have always found quite relevant, but I digress).
But we were able to prevail - for me personally, and for others, it was specifically on the idea of "wait, are we talking about segregating these great baseball players AGAIN?" Hell no.
So we gamely soldiered on, having to do our best with what incomplete data we had, and rate the 15 best per year as best we could.
Now the information available has improved greatly, and we can talk about an HOM electee who got in more than 15 years ago in real time and say, maybe it was a mistake - and if so, then so be it.
Far more importantly, it's finally an 'even playing field' that these greats never got when they were alive.
So fellows, keep up the good work!
52. Dr. Chaleeko
Posted: January 22, 2022 at 06:27 PM (#6062028)
Brent, just wanted to check in on this one to be sure I was barking up the correct tree.
DRA/154 QUINTILES BY POSITION (MLB)
QUIN*    C     1B    2B    3B    SS    LF    CF    RF
1        6.3   6.5  12.8   8.0  11.1   8.3  10.5  10.9
2        2.5   2.3   3.8   4.4   7.7   4.1   2.6   5.2
3       -0.3  -0.9  -0.4   0.2   2.6  -0.2  -0.3   0.4
4       -4.4  -4.2  -5.0  -5.3  -4.4  -4.9  -6.1  -4.1
*I often label percentiles in the wrong order. These labels mean that 80% of values fall beneath the first quintile.
Brent, based on this information, and with the caveat that I don't think the data is normally distributed because we're only dealing with better players, I THINK you're recommending one of two approaches:
First
Player’s NeL DRA - (NeL 80th percentile – MLB 80th percentile)
For Heavy that means:
-6.2 DRA/154 – 0.9 = -7.1 DRA/154
Second
(Player’s NeL z-score * MLB STDEV )+ MLB mean – (NeL 80th percentile – MLB 80th percentile)
In Johnson’s case, using the figures in post 42:
(-6.2 DRA /154 – 1.4 NeL RF mean) / 13.1 STDEV = -0.6 z-score
(-0.6 z-score * 0.5 MLB RF STDEV) + 3.1 MLB RF mean – (10 DRA/154 – 10.9 DRA/154)
reduces to
-0.3 + 3.1 – 0.9 = 1.9 DRA/154
Since the second version isn’t giving us a very intuitive result, I’m guessing that’s not what you were recommending, and that the first is. But I wanted to be sure before I moved any further.
Thanks!
53. Brent
Posted: January 24, 2022 at 10:07 PM (#6062340)
Dr C,
The first approach looks right--yes, that's what I was expecting.
In the second one, the math looks right, but the result doesn't seem to make sense. I think the reason is that, if I understand how bb-ref calculates rfield, it should average to zero for each position. But the average for the RFs in your sample is +3.1, which means that they are notably better than average. My guess is that is due to your sample consisting of long career players who tended to be better than average defenders. But it results in Johnson being assigned an MLB equivalent rfield score of +1.9, which doesn't seem right for someone who is below average in the NeLg statistics. I guess one possible way to "fix" the numbers so it wouldn't do that would be to leave out the adjustment to the mean when calculating the z-scores (that is, -6.2DRA/154 / 13.1 STDEV, assuming the "correct" mean is zero, and also leave out the +3.1 for the MLB term). It would be kind of unusual, but maybe we could justify it by saying that we think the true means should be zero.
I don't know; what do you think?
54. Dr. Chaleeko
Posted: January 25, 2022 at 02:34 PM (#6062441)
Brent, I think the first version makes more sense and is more defensible. I’m going to do some further testing on players from various positions and will report back to the group.
55. James Newburg
Posted: January 25, 2022 at 03:51 PM (#6062452)
Dr C/Brent,
I've been following this discussion with interest since Heavy Johnson's case hinges on his fielding (if one buys Eric's MLE for him).
It looks like there's a calculation error in #52. It should be -0.9 because the 80th percentile DRA is lower among NeL RF than AL/NL RF, resulting in -5.3 DRA/154 for Johnson.
56. James Newburg
Posted: January 25, 2022 at 04:17 PM (#6062455)
Making Johnson a better fielder in translation seems hinky, though. Dr C, the following idea just occurred to me:
You might get more intuitive results by using the 80th percentile's distance from the mean in each sample. In this case, you'd subtract 0.8 from Johnson's DRA/154 (the RF 80th percentile is 8.6 above the NeL mean (10.0 - 1.4) and 7.8 above the AL/NL mean (10.9 - 3.1)).
But I was also thinking you can drop the worst fielders from your NeL sample so that the NeL and AL/NL samples are equal in average DRA/154 at each position. Then, you would separate the truncated NeL data into quintiles and compare the truncated 80th percentile of NeL DRA/154 to the 80th percentile of your AL/NL sample.
57. kcgard2
Posted: January 25, 2022 at 04:24 PM (#6062456)
James, you're right according to what is written in words. But how can it be that the NeL 80th percentile fielders being worse as a group than their MLB counterparts results in Johnson's DRA conversion coming out better? It doesn't make sense. I think Dr. C should have written (MLB 80th - NeL 80th). But it's hard for me to follow. If you go back to Brent's lengthier explanation, he gives this exact scenario: the NeL 80th percentile value is higher than the MLB 80th percentile in his example, and he concludes that you have to downgrade the NeL value by the difference.
Taken all together, it points to Dr. C writing the wrong words but doing the right math.
58. kcgard2
Posted: January 25, 2022 at 04:28 PM (#6062459)
But I was also thinking you can drop the worst fielders from your NeL sample so that the NeL and AL/NL samples are equal in average DRA/154 at each position. Then, you would separate the truncated NeL data into quintiles and compare the truncated 80th percentile of NeL DRA/154 to the 80th percentile of your AL/NL sample.
Why? What would this approach accomplish? It would show you how much more or less dispersion is present in a weirdly truncated sample of NeL fielding values compared to a non-truncated MLB sample. Which gets us...not anywhere useful that I can discern. Am I missing something?
59. Dr. Chaleeko
Posted: January 25, 2022 at 06:02 PM (#6062473)
Re #57… wouldn't be the first time I'd gotten the words wrong and the math right. And vice versa.
I also forgot to mention, however, that I still feel it’s important to regress fielding rates toward zero/mean when we don’t have much fielding data. Johnson had 262 games in RF, so my approach would be to regress about 15% toward zero. I get 15% because I favor a minimum 308 game sample (two full years of fielding data). Then I would proceed to the calculation Brent recommended using the regressed rate rather than the “raw” career rate.
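To make the 15% concrete, here is that small-sample regression as a quick sketch (Python; the function name and the 308-game floor parameter are just my labels for what's described above):

# Regress a fielding rate toward zero in proportion to how far the sample
# falls short of 308 games (two full seasons' worth of fielding data).
def regress_toward_zero(rate_per_154, games, min_games=308):
    return rate_per_154 * min(games, min_games) / min_games

print(round(1 - 262 / 308, 2))  # Johnson's 262 RF games -> regress ~15% toward zero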
60. James Newburg
Posted: January 25, 2022 at 08:30 PM (#6062504)
I think the AL/NL and NeL samples are too different to directly compare 80th percentiles without further adjustment. If you look at the bottom quintiles of each, the NeL positional averages are generally much lower than the AL/NL averages, which would have the effect of understating the NeL 80th percentiles in comparison to the AL/NL.
61. Dr. Chaleeko
Posted: January 25, 2022 at 10:20 PM (#6062530)
OK, here's some testing on the fielding thing. Full method goes like this:
A) find DRA/154 for NeL player
B) perform Brent's recommended calculation on (A), subtracting the difference in the 80th percentiles (I used the absolute value of the difference, spitballing that the differences between the leagues might also be a result of other defensive-spectrum pressures, not just quality of play)
C) regress (B) for samples under 308 games (as noted in post 59)
D) transform (C) from DRA to Rfield: (C) * (mlbRfieldMean / mlbDRAMean)
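Strung together, the pipeline might look roughly like this (a sketch in Python, following the A-D order above and taking the 80th-percentile gap as an absolute value per step B; the function and argument names are mine):

# Sketch of the Test 1 pipeline (steps A-D above), starting from the DRA/154 rate from step A.
def test1_rfield_per_154(nel_dra_per_154, nel_games, nel_p80, mlb_p80,
                         mlb_rfield_mean, mlb_dra_mean, min_games=308):
    rate = nel_dra_per_154 - abs(nel_p80 - mlb_p80)    # (B) quality-of-play adjustment
    rate *= min(nel_games, min_games) / min_games      # (C) regress short samples toward zero
    return rate * (mlb_rfield_mean / mlb_dra_mean)     # (D) shrink DRA down to Rfield's scale

# Illustrative only: Heavy's RF rate with the post-52 percentiles and the MLB RF means
# from the post-42 tables; the career totals below also fold in his LF and C time.
print(round(test1_rfield_per_154(-6.2, 262, 10.0, 10.9, 1.5, 3.1), 2))  # about -2.9 Rfield/154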
I tried to pick guys for whom we have a lot of information, who are current candidates, or who represent extremes. LMK what you all think!
Heavy Johnson
Current: -21 career Rfield
Test: -45 runs
Est. reduction in MLE: 2.5 WAA/WAR
This total would effectively tie him with Chuck Klein and Harry Heilmann as the worst RF by Rfield through WW2.
Newt Allen
Current: 167 runs
Test: 262 runs
Est. increase in MLE: 10 WAA/WAR
The reason this jumps so much is that under my current system, his fielding rate (35/154) is chopped by two-thirds to 11/154. Under the test scenario, there's virtually no difference in the 80th percentiles at 2B (0.1 runs/154). The new transformation to Rfield knocks him down only about 40%, from 35/154 to 19/154. Because a rate around 20/154 is insanely good in Rfield, a player with that career average would be amazing. Mark Belanger, for example, was an 18/154 shortstop over 2,016 games. 262 runs would be 120 more than Frankie Frisch's career total at 2B, which was the best through WW2.
Ben Taylor
Current: 69 runs
Test: 62 runs
Est. decrease in MLE: 0.5 WAA/WAR
Jud Wilson
Current: 26 runs
Test: 58 runs
Est. increase in MLE: 4 WAA/WAR
Current system knocks Wilson down by about two-thirds at both 3B and 1B, test system by about 40-50%. Both the current and test figures run contrary to his defensive reputation in the narrative.
Biz Mackey
Current: 17 runs
Test: 75 runs
Est. increase in MLE: 6 WAA/WAR
This would place Mackey 5th among catchers from 1871-1945, right behind Hartnett's 78 Rfield, which aligns with his defensive reputation.
Julian Castillo
Current: -2 runs
Test: -22 runs
Est. decrease in MLE: 2 WAA/WAR
This would be in keeping with Castillo's reputation as a plodder.
Dobie Moore
Current: 96 runs
Test: 207 runs
Est. increase in MLE: 11 WAA/WAR
The current system knocks him down by about two-thirds; the test system would only knock him down by about 30 percent, hence the big jump. Like Allen's, this result is way out of scope for Rfield. Joe Tinker leads pre-1946 players with 162 Rfield; he had 155 in his best ten years, and Moore would dwarf that in his only ten years.
Mule Suttles
Current: 26 runs
Test: 21 runs
Est. decrease in MLE: 0.5 WAA/WAR
John Henry Lloyd
Current: 39 runs
Test: 73 runs
Est. increase in MLE: 4.0 WAA/WAR
This would be more in keeping with Lloyd's defensive reputation.
Burnis Wright
Current: 43 runs
Test: 90 runs
Est. increase in MLE: 5.0 WAA/WAR
This would give Wright the highest Rfield total through 1945 (yes, I know his career went past that year), 13 runs more than Harry Hooper.
Oscar Charleston
Current: 28 runs
Test: 61 runs
Est. increase in MLE: 3.0 WAA/WAR
This is more in keeping with Charleston's reputation as a fleet centerfielder.
John Beckwith
Current: -13 runs
Test: -39 runs
Est. decrease in MLE: 2.5 WAA/WAR
This total aligns more closely with his defensive reputation.
Pete Hill
Current: 32 runs
Test: 95 runs
Est. increase in MLE: 6.0 WAA/WAR
Tris Speaker leads MLB CFs with 92 Rfield, so I suspect this is puffy.
Cool Papa Bell
Current: -28 runs
Test: -94 runs
Est. decrease in MLE: 6.5 WAA/WAR
This would make him the worst CF in MLB history through the war by -60 runs.
Last one...
Hurley McNair
Current: 37 runs
Test: 68 runs
Est. increase in MLE: 3.0 WAA/WAR
This would put him 5th among corner OFs through 1945, behind Fred Clarke, Hooper, Sheckard, and George J. Burns.
Generally, I'm feeling kind of hit-and-miss here. Seems like we're getting some results that hew more closely to the defensive reputations we've read about, some results that are out of scope for Rfield entirely, and some that merely confirm what the current method suggests. One thing we can say is that the current method's findings are not contradicted by the test method: the good fielders are good in both, the bad fielders bad in both.
One thing worth noting here is that the Rfield transformation does what it's supposed to do, which is reduce DRA down to Rfield's scale. I don't believe that's having any particular distorting effect; I think the distortions we see enter earlier in the procedure.
62. Brent
Posted: January 26, 2022 at 12:34 AM (#6062539)
Nice work, Dr C.
I'm encouraged that most of these changes move in the right direction. But if you want to stick with having numbers that are comparable with Rfield, it would probably be advisable to shrink the range some more so that Allen, Moore, Wright, and Hill don't go outside the MLB range. (I wouldn't consider a finding that Johnson is comparable to Klein and Heilmann to be surprising.)
And Bell's numbers--ouch! The Seamheads data only show him at -20.6 runs as an outfielder over his career, but I guess the quality of play adjustment is bigger in center field than at some of the other positions.
63. Dr. Chaleeko
Posted: January 26, 2022 at 11:11 AM (#6062597)
One quick way I could steepen the Rfield:DRA discount is to square both terms.
For example, in LF, instead of using means of 1.3 for Rfield and 1.9 for DRA (which results in a discount of 32% off DRA), I would square those rates to get 1.69 for Rfield and 3.61 for DRA, which would result in a steeper 54% discount. That feels more in line. But I'd ask our more stats-trained members to comment on whether that's a good idea or if there's a better way. Thanks!
64. James Newburg
Posted: January 26, 2022 at 01:21 PM (#6062629)
Dr. C:
I thought converting DRA to its equivalent in Rfield requires a proportional reduction in variance between the two metrics (second chart, third column in #42), but your process in #61 appears to do away with that step.
I've been puzzling over this in my mind because estimates of NeL fielding value/quality come down to how we account for the variance in their DRA estimates. As I see it, there are at least three systematic factors increasing NeL variance in career DRA relative to White MLB:
1. Shorter seasons, which increase variation in estimates of player value.
2. Worse field conditions, which increase variation in estimates of player value.
3. Greater variation in player talent independent of Factors 1 and 2.
As you know, variation in player value is not perfectly related to variation in player talent. We can't make any inferences about relative NeL/White MLB quality of defensive play without first addressing Factors 1 and 2. Therefore, I'd suggest the following process for generating NeL Rfield/154 estimates:
1. For NeL players in your sample with fewer than 308 career games, regress their career DRA/154 as you are already doing. This is probably the only reasonably simple way to account for Factor 1, given inconsistent season lengths and incomplete data coverage within player careers.
2. Using the regressed data, calculate the positional means and standard deviations in NeL DRA/154.
3. Standardize NeL DRA/154 data based on the corresponding positional standard deviations in White MLB, dividing the White MLB SD by the NeL SD. This addresses Factor 2.
4. Add the standardized NeL DRA/154 to the NeL positional mean DRA/154. Steps 3 and 4 allow you to approximate an NeL player's fielding value under MLB playing conditions.
5. Calculate the 80th percentile of NeL DRA/154 based on this regressed, standardized distribution.
Regressing small samples and standardizing the NeL distribution to the White MLB distribution means that any differences between the NeL and White MLB 80th percentiles will be mainly attributable to greater variation in NeL player talent, underlying differences in NeL and White MLB sample characteristics, or some combination of both factors. Therefore, if you don't think differences in sample characteristics introduce bias into the comparison between NeL and White MLB estimates, you can go ahead and compare 80th percentiles because you can safely assume that a higher regressed, standardized NeL DRA is primarily the result of greater variation in player talent.
6. Take the White MLB 80th percentile in DRA/154 at the NeL player's position and subtract the corresponding regressed, standardized NeL 80th percentile. This functions as a quality of play adjustment.
7. Add the difference to the NeL player's DRA/154.
8. Convert to Rfield by multiplying the result from Step #7 by the ratio of the White MLB SDs in Rfield and DRA at the NeL player's position.
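If I'm reading the steps right, the per-player part would look something like this (a sketch in Python; the sample-level numbers from steps 1, 2, and 5 are assumed to be computed already, I'm reading step 7 as applying the adjustment to the step-4 value, and all names are mine):

# Sketch of steps 3-8 for one player, given the regressed NeL positional mean and SD,
# the 80th percentile of the regressed/standardized NeL distribution, and the MLB
# positional SDs (DRA and Rfield) and DRA 80th percentile.
def test2_rfield_per_154(nel_dra_154_regressed, nel_sd_regressed, nel_mean_regressed,
                         nel_p80_standardized, mlb_sd_dra, mlb_p80_dra, mlb_sd_rfield):
    standardized = nel_dra_154_regressed / nel_sd_regressed * mlb_sd_dra   # step 3
    value = standardized + nel_mean_regressed                              # step 4
    value += mlb_p80_dra - nel_p80_standardized                            # steps 6-7: quality-of-play adjustment
    return value * (mlb_sd_rfield / mlb_sd_dra)                            # step 8: DRA -> Rfield scale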
65. Dr. Chaleeko
Posted: January 27, 2022 at 01:39 PM (#6062748)
James, I want to be sure I'm reading you accurately. In step 3 I'm a tad unclear. Are you suggesting this?
67. James Newburg
Posted: January 27, 2022 at 07:54 PM (#6062802)
It would be NeL player's regressed DRA/154 / regressed NeL positional STDEV * unregressed MLB positional STDEV
Only NeL positional STDEVs should be calculated from regressed DRA/154 because you're trying to account for any inflated variation introduced by shorter, volatile season lengths.
My teaching evaluations often say I'm unclear, so I hope this helps!
68. James Newburg
Posted: January 27, 2022 at 08:29 PM (#6062810)
I've been stuck on explaining what multiplying by the MLB positional STDEV does in this step. Here's my best shot:
In my proposed method, I'm proceeding on the assumption that calculating NeL positional STDEVs from regressed DRA/154 data is sufficient to account for any variation added by shorter, more volatile schedules. Yet in making this assumption, I still expect that NeL positional STDEVs based on regressed DRA/154 will be higher than MLB positional STDEVs due to some combination of inferior field conditions and greater variance in player talent. If this expectation holds, then multiplying by the MLB positional STDEV is a way of expressing what NeL fielding values would look like without these two sources of increased variation.
69. Dr. Chaleeko
Posted: January 28, 2022 at 11:34 PM (#6062967)
After trying James' suggested method, here are results for the same group of players I tested upthread, except for Julian Castillo, who wasn't in the original data set, so I didn't have quick access to his Test 2 results.
Current = what I have now
Test 1 = same as post 61, based on Brent’s suggestions
Test 2 = latest test, based on James Newburg's suggestions
Heavy Johnson
Current: -21 career Rfield
Test 1: -45 runs
Test 2: -53 runs
Test 1 feels closer to the mark than Test 2. Test 2 is more extreme by 8 runs and pushes beyond the Heilmann/Klein lower limits of Rfield.
Newt Allen
Current: 167 runs
Test 1: 262 runs
Test 2: 299 runs
Current system gives most realistic results. Both test 1 and 2 are way out of scope.
Ben Taylor
Current: 69 runs
Test 1: 62 runs
Test 2: 115 runs
Current and Test 1 both place Taylor 4th through 1945 at 1B. Test 2 is well outside scope for Rfield (Fred Tenney leads at 91).
Jud Wilson
Current: 26 runs
Test 1: 58 runs
Test 2: -2 runs
Test 2 is closest to defensive reputation. Test 1 is surprisingly good. Current is smack dab between them.
Biz Mackey
Current: 17 runs
Test 1: 75 runs
Test 2: 64 runs
Test 1 and 2 both match Mackey’s reputation, the former placing him 5th prior to 1946 and the latter 8th (just above Al Lopez for reference’s sake).
Dobie Moore
Current: 96 runs
Test 1: 207 runs
Test 2: 222 runs
Current is the clear winner. Tests 1 and 2 are way outside scope: Tinker leads with 162 Rfield prior to 1946. Current places Dobie 13th, which, given his short career, makes good sense.
Mule Suttles
Current: 26 runs
Test 1: 21 runs
Test 2: 34 runs
These are all very similar and say essentially the same thing about Suttles: He was more than just a plodding slugger.
John Henry Lloyd
Current: 39 runs
Test 1: 73 runs
Test 2: 108 runs
Lloyd is reputed to be “The Black Wagner,” and Honus had 85 Rfield at SS. Current looks low, Test 1 is in the area of Wagner, Test 2 would place Lloyd in 11th place prior to 1946. Either of tests 1 or 2 could be viable depending on one’s interpretation of Lloyd’s career.
Burnis Wright
Current: 43 runs
Test 1: 90 runs
Test 2: 74 runs
Test 1 seems out of scope (Hooper highest with 77). Test 2 would make Wright the second-best defensive RF before 1946. Current places him 7th.
Oscar Charleston
Current: 28 runs
Test 1: 61 runs
Test 2: 58 runs
Tests 1 and 2 reach the same conclusion, both match his defensive reputation better than Current.
John Beckwith
Current: -13 runs
Test 1: -39 runs
Test 2: 6 runs
Beckwith's fielding is all over the place, so it's hard to pin him down. Generally, if you think he's in the Heavy Johnson/Dick Allen/Harmon Killebrew camp, you'll prefer Test 1. If you think he's probably about average, Test 2 is for you. Current essentially lies between them.
Pete Hill
Current: 32 runs
Test 1: 95 runs
Test 2: 115 runs
I think Current captures Hill the best of these three. Reputation-wise, Hill is usually not thought of as the best defensive CF in the NeL (that’s usually Charleston), but that’s what Tests 1 and 2 are suggesting. Furthermore, they both exceed Tris Speaker’s leading 92 Rfield for the period.
Cool Papa Bell
Current: -28 runs
Test 1: -94 runs
Test 2: -19 runs
Cy Williams "leads" all CFs with -34 Rfield prior to 1946. Test 1 is clearly not a tenable answer. Current puts him among the 6 worst; Test 2 ties him for 12th worst.
Hurley McNair
Current: 37 runs
Test 1: 68 runs
Test 2: ~55 runs (he wasn’t in data set for test 2 for CF)
Current places him 9th prior to 1946. Test 1 is second to Hooper. Test 2 (which simply ballparks his CF runs at +5 career) would be 4th best, just behind Sam Rice.
Tallying all this up, here’s what I’m seeing.
1) Current is generally more moderate in its results and doesn’t seem to go outside the scope of Rfield on extreme/complicated players. That’s likely because I’ve tweaked it over time to avoid that.
2) Test 1 and Test 2 tend to overinflate/overdeflate more extreme players.
3) Test 2 is probably a little more bullish than Test 1.
My intuition tells me that Test 2 is more complicated than either Current or Test 1, but it's not producing better results, so it's probably better to use Current or Test 1, with modifications, whichever works out better.
Test 1's use likely requires a modification that reduces the result by more than the ratio of the MLB means of Rfield to DRA. That could be a mathematical transformation like squaring those means, or using different terms than the MLB means.
Current’s use likely requires a modification to the final DRA-to-Rfield conversion, which is currently STDEVmlbRfield / STDEVNeLDRA. But I’m not sure what the best mod would be and am very open to suggestions.
70. theorioleway
Posted: February 01, 2022 at 09:11 AM (#6063318)
Thanks for doing this, Dr. C. Seems like Current works best, and people can make a mental note that players are probably slightly better/worse as needed. I guess theoretically you could average Current and Test 1 if you want more elasticity in the numbers, but that is probably too much of a pain for you to be worth it.
For Johnson, I don't think those fielding numbers are bad enough to make him unworthy of the HOM - I think the only thing that would do that is the treatment of his Wrecker seasons, if they end up being discounted a lot more.
P.S. exciting stuff with the new website!
71. Dr. Chaleeko
Posted: February 01, 2022 at 04:30 PM (#6063396)
Thanks, O! Hope you enjoy the site.
The current method seems to provide the most reasonable answers. But that doesn't mean it's a well-constructed system, sadly. It's the last step in it that's troublesome:
72. Dr. Chaleeko
Posted: March 13, 2022 at 06:49 PM (#6067653)
Finally getting around to the no-data season question. Sorry it took so long, but you are going to love why it took so long when I can pull the curtain back on it.
So, I'm looking at Heavy via the Marcel technique Brent suggested. There are two places in the process where this could be performed.
a) Before: I solve for the Marcel z-score of a hitter's NeL wOBA, and then make all the usual adjustments.
b) After: Just use the MLE Rbat outputs as the weighted Marcel inputs.
These come out with fairly different results. Here's how the calculations look.
BEFORE---solving for 1921 with actual z-scores
1922: 1.663 z-score, 263 PA
1923: 1.487 z-score, 489 PA
1924: 0.482 z-score, 302 PA
League average assumed to be 0 z-score
73. Dr. Chaleeko
Posted: March 14, 2022 at 04:20 PM (#6067746)
Responding to Cookie's post 157 in the ballot discussion thread for 2023…
Thanks, Cookie! I feel like I want to pick one or the other rather than average them. The reason is that I feel like they represent two slightly different theoretical approaches. The BEFORE approach says that it is better to use the actual performances; the AFTER says that it is better to use the translated, already-adjusted performance. One of these two is more theoretically accurate. But which? The reason this becomes crucial is that earlier in Heavy's career there are years with no actual adjacent data to rely on for a Marcel, so we'd need to base a Marcel on career rates.
74. Dr. Chaleeko
Posted: March 22, 2022 at 01:38 PM (#6068633)
Yay! My login finally allowed me to post again.
After a lot of thought, I'm coming down on the BEFORE side instead of the AFTER side. The reason is that seasons included in the AFTER inputs could have a skewing effect. For example, in 1926 I have Heavy Johnson capped at 38 Rbat because that's what the NL's leader earned. But in 1922, the NL leader was in the 90s. While Hornsby in 1922 was much further from the second-place finisher than the 1926 leader was, it leaves more room for the results to skew up or down due to factors outside the player's (in this case, Heavy's) control. So using the BEFORE treatment isolates his performance relative to the league more effectively.
One condition that bears mentioning is the instance of a season that has no data surrounding it or limited data. Here are several cases:
1) No data for year n, n+1, n+2, or n+3: In this case, I would use the data from prior seasons and perform a normal Marcel (the one I wrote up earlier is a backwards Marcel)
2) No data for year n, n-1, n-2, n-3: This is what we've already talked about
3) Data in none of the surrounding seasons: I would use career average, adjusted for age in the same way Marcels do.
4) Extremely low sample of data (under 200 PA in surrounding data): Use Marcel in whichever direction has more data, and add n PA of career average to reach 200 PA.
5) Data only in some of the surrounding seasons. Let me flesh this one out a little. First, this is assuming that the player didn't miss the season due to injury; in other words that he was active but we don't currently have a usable record of that play. This is the procedure I'm thinking of:
A) Guesstimate playing time: Player's career G / Team G * length of MLB schedule
B) For missing seasons, use the player's career rate of performance
C) Calculate the Marcel in whichever direction has the most seasons of data. If they are the same in terms of most seasons of data, use whichever one has more adjacency of data. If they are the same adjacency, use whichever data contains more PA. If they have the same PA, just use the normal Marcel method.
Let's look at Heavy. His career, unadjusted z-score for wOBA, the metric I'm using, is 1.34. He played in 0.768 of all team games during his career.
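As a quick sketch of the playing-time guesstimate in (A), using the career rate just quoted (Python; 154 is the assumed MLB schedule length):

career_g_share = 0.768          # Johnson's career G / team G
mlb_schedule = 154              # assumed MLB schedule length
print(round(career_g_share * mlb_schedule))  # ~118 games per otherwise-missing season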
G-72
AB-260
H-90
2B-15
3B-5
HR-3
R-40
BB-16
HBP-2
SH-7
SB-0
AVE-.346 (.279 avg)
OBP-.388 (.334 avg)
SLG-.477 (.384 avg)
OPS-.865 (.718 avg)
That's where I have him, too. A long career would have helped him.
Muehlebach was certainly a pitchers' park by Negro League standards, and was the same park (different name) used by the Kansas City A's in the 1960s.
So, yes, I believe Johnson was helped by playing in a hitter-friendly park pre-1924.
Heavy Johnson Batting
*-led league
G-98*
AB-374
H-152*
D-32*
T-13 (2nd to Turkey Stearnes' 14)
HR-20* (tied with Candy Jim Taylor, Tol.-StL)
R-91*
RBI-120* (next highest--Oscar Charleston's 94)
W-38
SB-17
AVE-.406*
OBA-.462 (2nd to Torriente's .471)
SLG-.722*
Again, the Monarchs divided their home games between hitter-friendly Association Park (31 games) and pitcher-friendly Muehlebach Field (27 games). I don't have park factors, but if I had to guess, I'd say the Monarchs played in a more-or-less neutral context.
What interests me the most about Heavy Johnson is what a good example he is of the Jamesian division between peak and career value, and how troubling it sometimes is to try to reconcile the two. Ed Wesley is another example of a guy who had some seasons in Detroit that absolutely knock your socks off. Take a look at his 1925 numbers, for instance. Ruth never had a season that good. Yes, of course he took advantage of the short porch in Mack Park, but he also hit .416.
The short answer to my question is that, in almost all cases where the stats are this dichotomous, I feel that career value has to trump peak value. But in the Johnson and Wesley cases, those are some peaks!
What I can tell you, right from the get-go, is that I'm damned mad that Elsie Brown shot him in the leg. This is a guy who, like Chino Smith, seemed absolutely bound for glory. From most accounts, he wasn't a beautiful shortstop to watch--he wasn't Omar Vizquel--but he could play the position, plus he had that Shawon Dunston gun on his right shoulder.
And man, could he hit!
OK then sunny, help me out please. My posts #58 and 59 on the Monroe thread are primarily about a very general HOM issue, and to the extent they're specific, they're not about Monroe.
So they don't belong there. Where do they belong? Thanks.
http://www.seamheads.com/NegroLgs/player.php?playerID=johns01hea
http://www.baseball-fever.com/showthread.php?130322-HOF-cases-of-Negro-Leaguers-using-MLEs-of-ERA-and-OPS-from-Seamheads&p=2634079#post2634079
http://www.baseball-fever.com/showthread.php?130322-HOF-cases-of-Negro-Leaguers-using-MLEs-of-ERA-and-OPS-from-Seamheads&p=2634810#post2634810
Negro League Similarity vs Neg Lg to MLB Players in MLB
I'd say "the Lefty O'Doul of the Negro Leagues" might be a reasonable comp.
https://homemlb.wordpress.com/2018/11/02/the-rest-of-our-negro-leagues-mles-centerfield-right-field-and-a-few-more-infielders/
I think the RField numbers are probably too generous. He was listed as 6' 250lbs and his defensive reputation is poor. That's a Fat Tony Gwynn physique and his numbers were likely closer per season to -10 than -1.
1916 2.0
1917 2.5
1918 3.6
1919 3.2
1920 6.0
1921 6.5
1922 6.0
1923 5.6
1924 6.0
1925 1.3
1926 3.3
1927 4.3
1928 2.5
1929 2.1
1930 2.0
1931 3.5
1932 1.5
TOT 62.1
Here are Heavy Johnson's latest actual Negro league statistics from Seamheads, along with Dr C's latest MLE WAR estimates:
Year / Age / G / PA / BA / OBP / SLG / OPS+ / MLE WAR
1916 / 21 / 9 / 35 / .273 / .314 / .455 / 195 / 3.3
1917 / 22 / NA / NA / NA / NA / NA / NA / 4.9
1918 / 23 / NA / NA / NA / NA / NA / NA / 4.0
1919 / 24 / NA / NA / NA / NA / NA / NA / 4.5
1920 / 25 / 3 / 10 / .300 / .300 / .500 / 128 / 2.3
1921 / 26 / NA / NA / NA / NA / NA / NA / 9.1
1922 / 27 / 74 / 288 / .396 / .445 / .715 / 207 / 8.7
1923 / 28 / 99 / 433 / .406 / .472 / .719 / 210 / 5.7
1924 / 29 / 79 / 332 / .360 / .421 / .531 / 173 / 8.2
1925 / 30 / 61 / 251 / .327 / .383 / .538 / 141 / 1.8
1926 / 31 / 48 / 186 / .350 / .418 / .540 / 170 / 5.3
1927 / 32 / 58 / 228 / .379 / .461 / .568 / 164 / 6.2
1928 / 33 / 69 / 273 / .348 / .390 / .468 / 129 / 3.2
1929 / 34 / NA / NA / NA / NA / NA / NA / 2.4
1930 / 35 / 5 / 13 / .250 / .250 / .250 / 38 / 0.6
1931 / 36 / 5 / 21 / .286 / .286 / .476 / 109 / 2.6
1932 / 37 / 2 / 5 / .000 / .250 / .000 / -28 / 1.4
Johnson was playing for the Army Wreckers team from 1916 to 1921, which is why only a handful of games are recorded for those years.
To convert Negro league performance to major league equivalents, I use this simple back-of-the-envelope formula: Add 100 to the Seamheads OPS+; then multiply by 0.89; then subtract 100. For example, Johnson’s 1922 NeLg OPS+ is 207, so I calculate his eqOPS+ as 173 = 0.89 * (207 + 100) – 100.
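As a one-liner, that back-of-the-envelope translation is just (Python; the function name is mine):

def eq_ops_plus(seamheads_ops_plus, factor=0.89):
    # add 100, multiply by the conversion factor, subtract 100
    return factor * (seamheads_ops_plus + 100) - 100

print(round(eq_ops_plus(207)))  # 173, Johnson's 1922 equivalent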
Here are his equivalent OPS+ for ages 27 to 33, followed by Belle’s at the same ages:
Age - Johnson / Belle
27 / 173 / 194
28 / 176 / 177
29 / 143 / 158
30 / 114 / 116
31 / 140 / 172
32 / 135 / 143
33 / 104 / 109
We see they have similar OPS+, though Belle is better. On defense, both players were pretty bad, though I believe the evidence indicates that Johnson was worse. Bbref shows Belle costing his team about 8 runs per season over these years compared with an average left fielder, while Seamheads shows Johnson costing his team 1.2 wins per 150 games on defense. I note that his is one of the worst fielding rates in the Seamheads database.
After age 33, Belle is injured and his career is over. I don’t know if Johnson was injured, but he doesn’t show up in the statistics for age 34 (1929 - Dr C responded that Johnson was playing with an independent barnstorming team that year). When he returns to the Negro leagues he barely plays and is pretty awful in the few games he does play.
Of course, we know how Belle played before age 27 -- 3 good seasons from ages 24 to 26 averaging an OPS+ of 135 and 3.1 WAR, plus 71 games of replacement level play at ages 22 and 23. For Johnson, we know that he was playing at those ages for a very good Army team, but we can only make educated guesses about the numbers. Dr C's MLEs show him earning 28.1 WAR from ages 21 to 26, or an average of 4.7 WAR per season.
I guess the biggest question is how likely we are to get anything from his years with the Infantry Wreckers. I'm not too in tune with the day-to-day of the research there, but if it's possible those records exist, it might be worth holding off to ensure he's not just a flash in the pan. But what we have points to a HoM level talent, as strong as any remaining Negro League player.
It really comes down to those missing Wreckers' seasons.
I start by focusing on the well-measured seasons, which for Johnson are his ages 27 to 33. From my post #24, his average eqOPS+ over those seasons is 144 (if you weight the years by plate appearances) and 141 if you equally weight the seasons. I'll use 144. Belle's average OPS+ for his age 27-33 seasons is 151. Belle's rfield averages -8 per season over that period. For Johnson, if we use the Seamheads fielding WAR number, which according to my understanding doesn't include the positional adjustment, thus making it comparable to rfield, his defense averages about -1.2 wins or -12 fielding runs per season. Belle and Johnson both seem not to have missed much playing time over those seasons, so I'll assume there's no difference there. So based on the differences in offense and defense, it appears that Johnson should average maybe 7 or 8 fewer runs per season than Belle, or let's say 5 fewer wins over the 7-season period. Belle's WAR for the 7 seasons is 31, so I would peg Johnson's total at 26, and distribute them among the seasons based on his seasonal eqOPS+, though smoothing the variation out somewhat because of the variance due to shorter NeLg seasons.
Dr C's MLEs show Johnson with 39.1 WAR over these 7 relatively well-measured seasons. The most important difference between Dr C's estimates and mine is that Dr C's assume that Johnson was a league-average right fielder, with an average rfield of 0. In addition to Seamheads fielding data, the evidence that Johnson was a poor fielder includes anecdotal information (Riley says he "was an unpolished fielder and not noted for performance afield"; see also comment #10 on this thread). I would ask Dr C to explain his reasoning for treating Johnson as an average fielder.
Offense also contributes to the difference in our estimates, as Dr C's MLEs appear to show Johnson as a slightly better hitter than Belle, whereas my back-of-the-envelope calculation (which I've checked against other MLEs, such as those done by Chris Cobb) shows Johnson a bit below Belle.
In making my own MLEs, I wouldn't rely entirely on the comparison with Belle. Using Stathead's season and career finder, I found four additional players who from ages 27 to 33 had an average OPS+ between 134 and 154 and poor fielding, as evidenced by a cumulative rfield of -40 runs or worse (-6 runs per season or worse). These players are Frank Howard, Ken Singleton, Ralph Kiner, and Roy Sievers. Here are some statistics:
Ages 27 to 33
Name / PA / OPS+ / Rbat / Rfield / RAA / WAA / WAR
FHoward / 4271 / 152 / 249 / -83 / 103 / 10.4 / 25.1
Belle / 4577 / 151 / 284 / -59 / 161 / 15.4 / 31.0
Singleton / 4581 / 145 / 243 / -58 / 136 / 14.5 / 29.6
Kiner / 3674 / 144 / 191 / -47 / 103 / 9.8 / 23.3
Sievers / 4098 / 137 / 166 / -57 / 58 / 5.5 / 18.2
The average OPS+ for these 5 players is 146 and the average WAR is 25.4.
Having established an estimate of Johnson's MLE WAR for ages 27 to 33, I would then look to estimate his earlier and later seasons for which we have almost no information, other than the fact that he was playing baseball for a good team. For the earlier seasons, I think a reasonable way to approach this is by looking at the average WAR from ages 21 to 26 for the five comparable players.
Ages 21 to 26
Name / PA / WAR
FHoward / 1829 / 9.6
Belle / 2099 / 9.1
Singleton / 1877 / 10.3
Kiner / 2582 / 24.8
Sievers / 1406 / 0.6
The average is 1959 PA and 10.9 WAR, with quite a bit of variation from Kiner on the high end to Sievers on the low. My MLEs would take this average and then bump it up slightly based on Seamheads data showing that Johnson played pretty well in 9 games at age 21. I think 14 WAR from age 21 to 26 is about as high as I would be comfortable going.
Dr C's MLEs for these seasons (again, there is hardly any actual data) show 2430 PA and 28.1 WAR, including a career high 9.1 WAR at age 26. While the example of Kiner tells us that this level of early career performance is not impossible, the data for the other four comparables surely suggests that the number is on the high side given our level of ignorance.
For the years older than 33, I note that Johnson's play in his age 33 season was near replacement level according to Seamheads (his non-MLE WAR was 0.2, of which his offense was +1.6 wins and his defense was -1.4 wins; he also had -0.3 positional adjustment and +0.2 for pitching 3 shutout innings). Limited statistics for ages 35 to 37 tell us that he was never used for more than a few games a season and was probably below replacement level when he did play. I think the reasonable conclusion is that Johnson would not have kept a major league job after a year like that, so my MLEs would show his career ending after his age 33 season. Dr C's show him earning another 7 WAR from ages 34 to 37.
So my method suggests that a reasonable MLE for Johnson would be about 14 + 26 = 40 career WAR, which would include a couple of really good (6 to 7 WAR) seasons at ages 27 and 28. Dr C's career MLE is 74.2 WAR with 3 seasons of more than 8 WAR.
Have my back-of-envelope estimates led me astray? Or are the differences indicative of a problem in Dr C's methodology?
I am not impressed with Heavy Johnson serving as the catcher for the Wreckers. It's not like they had dozens of other options to pick from. Anyone on the roster had to be enlisted in the 25th infantry of the Army. They're going to pick from the few people who can catch the ball when Bullet Rogan throws it. I wouldn't give MLE credit for those seasons as a catcher.
DL - I'm not sure how to evaluate Johnson as a catcher. AFAIK, he was a catcher during all of his 6 years with the Wreckers, then caught 16 games with the Monarchs in 1922 before switching to the OF. I assume that if he was a really good catcher he might have stayed there. The Monarchs' rivals at the time, the Chicago American Giants, were famous for playing small ball, bunting and stealing, so a poor defensive catcher wasn't going to play for them. And the Monarchs had Frank Duncan, one of the best. But there's nothing in the record to indicate that Johnson was a poor catcher.
Cookie - On Johnson's page at Seamheads, there's a little graphic showing his percentile rankings in various batting and fielding categories relative to other players in the database. It says he's in the 6th percentile as a left fielder and in the 17th percentile as a right fielder. To me, that's evidence that he was pretty bad. My impression is that before WWII, major league teams wouldn't let a guy like Adam Dunn, Frank Howard, or Jeff Burroughs play in the outfield, with maybe rare exceptions. To me, the defensive data from Seamheads suggest that Johnson may have approached that territory, which probably would have made him worse than any pre-war major league outfielder with a long career.
kcgard2 - I agree that Dr C's MLEs seem to be greatly influenced by Johnson's two peak seasons in 1922 and '23. On the other hand, I don't think it's reasonable for any statistical extrapolation model to project a 9 WAR season, as Dr C has projected for 1921, even if the adjacent seasons were as good as Dr C assumes. Every statistical projection system I'm familiar with includes a regression or dampening component, so it just doesn't seem reasonable to me for his system to project a peak higher than anything in the player's experience.
Dr C - It appears I had missed reading your discussion of Johnson's fielding statistics in post 356 of the 2022 ballot discussion thread. While that post clarifies things somewhat, I'd like to ask why you apparently didn't use his left field statistics in evaluating his defense. Seamheads currently shows Johnson's range as -13 runs in 257 games played in right field, and -26 runs in 189 games played in left field. If I understand your comment correctly, you only used the right field numbers. For players who split time between the corners, I've usually pooled the two positions, thinking the benefits of a larger sample size more than offsets any biases from pooling the two positions. And in this case, adding left field would affect the assumptions that feed into the MLEs.
These are all great questions and points. I'm glad when people question results because it helps me get better at this stuff.
I want to start with Johnson's fielding. First off, I did find a computational error in double-checking, so I'm really glad you queried. Parts of this process do require manual work, and I ain't perfect. I also changed his age 28-30 seasons to LF to better align with where he played in real life. Those changes combined reduced his Rfield to -21 (and WAR to 71.7). Incidentally, the figure of -12.9 RF DRA has been bandied about. That's not the number I use. I only use figures (for all aspects of the game) from regular-season games (of course, the regular season in Cuba is in the NeL's offseason, but I count Cuba as a "regular season" for this purpose). No exhibitions, East-West games, or postseason games. Heavy has -8.8 DRA in 233 RF games by this reckoning.
Now, a big area of confusion with his fielding, IMO, is between DRA and Rfield. A couple of years ago, I took the top 50 RFs in the Negro Leagues Database by G in RF, and I found the STDEV of their DRA/154 in RF (counting only league games). The STDEV was 14.1 runs. I then took the top 50 MLBs by G in RF through the same era, and I found the STDEV of their Rfield in RF. It was 3.01. That's a huge gap! In fact, there's a huge gap at all positions. (The widest gap is CF: 17.0 vs 3.1.) When you look at individual bad corner OFs in MLB history for a pattern between DRA and Rfield, there are big gaps, but they fly in both directions. In the larger sample, however, the STDEVs move dramatically lower from DRA to Rfield, and I think it's more defensible to use the larger sample than to try to peg a player to specific comps. I present the MLEs such that they look like what you'd find at BBREF to keep them familiar and consistent with MLB players. So I take the career DRA/154 (adjusted for sample size as noted in post 356 in the 2022 discussion thread), and I multiply it by the ratio of MLB Rfield STDEV to NeL DRA STDEV. For Heavy in RF that looks like this:
RATE PER 154: -8.8 DRA / 233 G in RF * 154 = -5.8 DRA per 154
SAMPLE ADJUSTMENT: -5.8 DRA/154 * (233 G in RF / 308) = -4.4 DRA / 154
RFIELD CONVERSION: -4.4 DRA/154 * (3.0/14.1) = -0.9 Rfield/154
CAREER RFIELD: -0.9 Rfield/154 * 1308 G in RF = -8.0 career Rfield
Repeating this with his catching stats for ages 21-22, he gets 0.4 Rfield; repeating it for LF ages 28-30 he gets -13.9 Rfield.
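So the numbers above can be re-traced, here's the same chain as a small script (Python; I'm taking the 1308 figure to be his MLE career games in RF, and 3.0 and 14.1 to be the MLB Rfield and NeL DRA standard deviations quoted earlier; names are mine):

# Sketch of the Current DRA-to-Rfield conversion, retracing the RF numbers above.
def career_rfield(nel_dra, nel_games, mle_career_games,
                  mlb_rfield_sd=3.0, nel_dra_sd=14.1, min_games=308):
    rate = nel_dra / nel_games * 154                 # DRA per 154 games
    rate *= min(nel_games, min_games) / min_games    # sample-size adjustment
    rate *= mlb_rfield_sd / nel_dra_sd               # DRA -> Rfield scaling
    return rate * mle_career_games / 154             # career total over MLE games

print(round(career_rfield(-8.8, 233, 1308), 1))  # about -8.0 career Rfield in RF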
Couple things to note:
-Someone mentioned his catching upthread. I just rolled with it because that's what he did.
-Jaack mentioned the Wreckers data. I'm not sure how helpful it would actually be. We'd be talking very small numbers of games, and there's no "league" to work with, so we can't really compare a player effectively to the other players.
-We might want to wonder about 1926. That year, Heavy played 44 games in LF and had -12.3 DRA. Obviously that's terrible. But it's a little bit difficult to take at face value because it's just 44 games. If a guy hit .400 in 44 games we probably would say it's not very representative if it is out of line with the rest of his career. Heavy wasn't a good fielder, but that's roughly double the badness of his other 90 games in LF. Small samples.
-As to the question regarding combining RF and LF. I do not currently do that. TBH, it wasn't something I'd thought about because I tend to treat each position discretely. There are good reasons for doing so, IMO. First is that for a long time in the early 20th Century, LF was a more important defensive position than it later became. LF putouts are similar to (and IIRC in a couple cases higher than) CF putouts during that time. During that same period, RF was still evolving away from a place to hide pitchers and catchers who could hit when they weren't in the battery (Caruthers, King Kelly, Buck Ewing) or to stash your worst fielder. But at some point RF became a more skilled and often more athletic position than LF. To the point that in the 1970s, you've got Dewey Evans in RF and Luzinski in left or in the 2010s you've got Mookie Betts or Shane Victorino in RF (I know, Fenway) and Adam Dunn in LF. Lou Brock was a guy who was stashed in LF because he was a spotty flycatcher with a scattershot arm and didn't have the tools to play RF despite his athleticism. I know a lot of folks like to just lump RF and LF together like there's no difference. I've just never felt like that's necessarily true. However, that query led me to look again at his LF/RF playing time and recast his age 28-30 seasons, which is good.
I'll talk about 1921 in another post.
We have no data for 1921. I try to build a sample of >= 200 PA for each season in a player's career. I do it in four possible stages, in this order.
A) If the player has 200 PA in season n I move forward.
B) If the player has <200 PA, I use a combination of season n and seasons n+1 and n-1 to build up 200 PA
C) If the sample is still <200 PA, I add seasons n+2 and n-2 but at 60%
D) If the sample is STILL <200 PA, I add the player's career to the sample.
Also, I calculate my own wOBAs and set them to a common .330 league-average baseline so that all seasons are apples-to-apples. Or close.
Here's the situation with Johnson:
1920: 10 PA, .377 wOBA, 0.73 z-score
1921: No data
1922: 263 PA, .488 wOBA, 2.16 z-score
I combine the seasons like this (there may be some rounding error/sig dig stuff here):
10 * 0.733 = 7.333
+ 263 * 2.165 = 569.286
= 576.619
/ 273 = 2.112 STDEV
The NL in 1921, excluding pitchers, had a STDEV of 0.109 points of wOBA and a mean wOBA of .302. Therefore,
2.112 * .109 = .230 pts of wOBA
.230 pts of wOBA + .302 mean = .532 wOBA (w/ sig digs it's .556)
That turns into 0.345 wRC/PA, which is then adjusted for league quality (0.80) to get 0.276 wRC/PA and a wOBA of .4865. The league had 4.26 PA/lineup spot in 1921, and given his durability characteristics, that translates to 579 PA. Therefore, 579 * .4865 wOBA = 78.663 batting runs. Because the league leader in batting runs had 76 that season in the NL, I lower it to 76. On the numbers side, it projects to .361/.419/.588, a 172 OPS+. That's how it gets to 9.1 WAR. The NL scored less often than it would a year later when Heavy's MLE basically repeats the season, which is why he has a higher WAR in 1921 than 1922.
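To make that sequence easier to retrace, here's a sketch of the pooling and league re-scaling steps (Python; the later wRC, league-quality, and PA steps use inputs not shown in this post, so I stop at the raw wOBA):

# Pool the PA-weighted wOBA z-scores from the surrounding seasons, then express
# that pooled z-score in the 1921 NL (non-pitcher) context.
samples = [(10, 0.733), (263, 2.165)]        # (PA, wOBA z-score) for 1920 and 1922
pooled_z = sum(pa * z for pa, z in samples) / sum(pa for pa, _ in samples)

nl_1921_sd, nl_1921_mean = 0.109, 0.302      # NL 1921 wOBA spread and mean, excluding pitchers
raw_woba = pooled_z * nl_1921_sd + nl_1921_mean
print(round(pooled_z, 2), round(raw_woba, 3))  # ~2.11 and ~.532, give or take rounding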
That's the guts of it. In terms of the outcome itself, Johnson is an edge case for all the reasons we've discussed in this thread. Does the MLE overstate his case? It certainly could. And I'm completely sympathetic to the point that 1921 is a whole-cloth estimation. In reality it has to be. We can't have a zero there, and no one can say with anything approaching certainty that it "should" or "should not" be at a certain level. It's all a question of approach, and both the "just use his career rates" and the method I've chosen have downsides. I just dislike the downsides of the method I've chosen less than other methods (many of which I've tried!). It's also worth noting here that MLEs are generally conservative because they rely so much on central tendency. MLEs giveth and taketh away.
What I can say is that I'm using the same set of principles and techniques on every single hitter (or pitcher) and trying to make as few decisions as possible because I want to avoid introducing too much of myself into each player's results. I think internal consistency matters tremendously with this work, and completely bespoke MLEs (as someone outside this group has suggested to me) would introduce more problems than they solve. But, as always, someone else's mileage may vary!
1916 C: 3.2
1917 C: 4.7
1918 RF: 3.9
1919 RF: 4.4
1920 RF: 2.2
1921 RF: 9.1
1922 RF: 8.6
1923 LF: 5.1
1924 LF: 7.7
1925 LF: 1.3
1926 RF: 5.2
1927 RF: 6.2
1928 RF: 3.1
1929 RF: 2.4
1930 RF: 0.6
1931 RF: 2.6
1932 RF: 1.4
TOTAL 71.7
Do you have the data to compare DRA's standard deviations for NL and AL players during the NeL period to rField's SDs? If so, how do the NL/AL fielding SDs in DRA compare to the NeL SDs?
It seems to me possible that the differences in SDs between the NeL DRA numbers and the NL/AL rfield numbers could result partly from differences in distribution of values within the sets of players, in addition to differences between the fielding measurement systems. In other words, if there's a lot more variability in fielding skills in the NeL than in the NL/AL, the NeL fielding numbers would have a larger SD than the NL/AL numbers in any measurement system, and that range of variation shouldn't be entirely smoothed out in the construction of MLEs. Since different measurement systems with incommensurate inherent variance are being used in the two different contexts, however, the only way to compare NeL SD to NL/AL SD would be to calculate the SD for the NL/AL in DRA. Does that make sense?
If the DRA SDs for the NL/AL 1920-48 are similar to the DRA SDs for the NeL, 1920-48, then that would suggest that Heavy Johnson's fielding values need to be very heavily smoothed out to bring him into the range of values produced by the rfield system. However, if the DRA SDs for the NL/AL are smaller than the DRA SDs for the NeL, then that would suggest that a greater variance in NeL fielding numbers would need to be retained to reflect the fact that the NeL conditions allowed for a wider range of fielding skills than occurred in the contemporary NL/AL.
It may well be that the DRA data that would be needed to test this case aren't available, but there might be other ways to get at the issue through looking comparatively at SDs in batting and/or pitching values? Maybe you have already done this?
Thanks for any illumination you can bring to this question!
You've documented that the standard deviation of DRA is much larger than the standard deviation of Rfield in the major leagues. I can think of three reasons, each of which probably contribute:
1. Smaller sample size in the NeLg should contribute to larger standard deviations. We can estimate that effect using the standard square-root of n formula. For example, if NeLg seasons during the 1920s were half as long as major league seasons, sample size would explain a 41% increase in the standard deviation.
2. Differences in the standard deviations of DRA and Rfield due to the difference in the two estimation methods. As Chris has suggested, if we have DRA data for major leagues (or Rfield estimates for NeLgs) we could figure out how much is due to the formula. My recollection is that MLB DRA data used to be available from the now defunct Baseball Gauge site. Are the data still available anywhere else?
3. As Chris has also suggested, differences in the true underlying distribution of fielding ability. I would argue that this is probably the case. First, we know that there were a lot more errors recorded in NeLg games. For example, the league fielding percentage for the 1922 NNL was .953, compared to .967 for the NL. When there are more errors in a league, I think it's likely that there is more variation. Here are some other quick numbers taken at the team level:
1922 / SD of DER / SD of Fld%
NL / .013 / .003
NNL / .030 / .007
These differences are partly due to the NNL teams playing fewer recorded games, but sample size can't explain the entire difference.
Second, I think that NeLg defense was more widely dispersed because their teams were picking up a much wider range of talent. Here's how I think about it. Suppose the leagues in the 1920s started out integrated, and there were 22 teams in the majors, 22 in the highest minor lg level (nowadays called AAA), and 22 in the second highest level (AA). Then they tell the teams they have to segregate, and they split the players into 16 White teams and 6 Black teams at each level. The White teams just go ahead play at the same three levels, but the Black teams now have to split up and mix, so each team now contains major league, AAA and AA players. That gives you a league with an average quality of play about equivalent to AAA, as we've assumed for the quality of play adjustments, but 1/3 of the players are major league quality (including many top stars), 1/3 are AAA quality, and 1/3 are AA quality. That new NeLg is going to have a much wider dispersion of fielding ability than the major leagues. Also, they don't have the coaches and time spent in spring training that the MLB teams have, so it seems inevitable that there would be more dispersion in fielding (as well as hitting and pitching). I think the best fielders (Lundy at SS, Charleston in his prime in center, etc.) were every bit as good as the best major league fielders, but I think the worst NeLg fielders were probably worse than the worst MLB fielders.
Regarding Johnson's 1921 season, if I understand your methodology, the projection is essentially based 96% on his 1922 season and 4% on his 1920 season with no regression. Which I guess is why it ends up looking quite a bit like his 1922 season. I think if I were approaching the problem of projecting a season for which there are no actual data, I would start with something like Tangotiger's Marcel method (though in reverse, since you have data for the seasons after 1921 rather than before). In other words, a weight of 5 for 1922, 4 for 1923, and 3 for 1924, along with a regression factor. I also might throw in 1920 with a weight of 5, though that sample is so small it shouldn't make a difference. And I'd regress toward the major league equivalent average rather than toward the NeLg average, since we're taking it as given that these players are all major league quality. I think that approach would still project a strong season for Johnson in 1921 because his 1922 and 1923 seasons were both so good, but not as strong as what your current method gives you. The current projection seems more like an upper limit than an unbiased projection.
Using your WAR numbers for 1922 to 24, here's what that method would give for Johnson's 1921 season:
5.6 WAR = 0.8*(0.6*8.6 + 0.3*5.1 + 0.1*7.7) - 0.4
To me, that seems like a more reasonable projection conditional on your estimates for 1922-24.
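For anyone who wants to replicate that number, here's the same backwards-Marcel arithmetic as a small Python sketch; the 0.6/0.3/0.1 weights, the 0.8 regression factor, and the -0.4 offset are simply read off the worked example above, so treat them as illustrative rather than canonical Marcel constants.
[code]
# Backwards-Marcel WAR projection, using the constants from the worked example.
def warcel_backwards(war_next_three, weights=(0.6, 0.3, 0.1),
                     regression=0.8, offset=-0.4):
    weighted = sum(w * war for w, war in zip(weights, war_next_three))
    return regression * weighted + offset

# Johnson's 1922-24 MLE WAR from the posts above
print(round(warcel_backwards([8.6, 5.1, 7.7]), 1))  # -> 5.6 WAR for 1921
[/code]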
Brent, I'm wondering about something re WARcels. My MLEs include a full range of values beyond WAR (and including trad stats derived from the WAR inputs). I already have fielding, running, DP, positional, and replacement WAR inputs, it's really only hitting that I'm missing. Do you think it would be better to either:
a) Figure it the WARcel way you mentioned then algebraically solve for X where X = batting runs?
b) Use a Marcel-type calculation to figure the batting runs then calculate WAR the usual way?
Thanks!
Regarding WARcel vs. Marcel, I'm not aware of any research showing that one approach is better than the other. Since you have the data to do it both ways, why not do them both? Then use the simple WARcel calculation as a check on the more detailed Marcel approach. The thing I like about both Marcel and WARcel is that they are designed to be simple and don't make a lot of assumptions.
Here's what I gathered:
NeLs
-Current top fifty at each position by innings via the SH query machine
-I noted their defensive games at a position and their DRA
-For OFs, I only use their Range runs, not their arm runs (DRA's arm runs isn't good)
-The lowest number of career games at any position was 85
-Some players appear at multiple positions
-328 individual players
MLBs
-All players in my rankings (about 120-150 per position)
-Only seasons prior to 1965
-I have them season by season in my db, so I noted G, Rfield (minus Rof), and DRA range for each season at their primary position that season, which means that some players appear at multiple positions
-For consistency with the NeL sample, I removed all samples below 80 games at a position
-574 individual players
There are some minor differences in the samples, especially because for the MLB guys I may not have a career's worth of information (I looked at them season by season), while the NeL figures are total career values.
Here's the analytical stuff. Quick summary of the following tables:
-DRA values for MLBs were quite a bit higher than for NeLers on a per 154 basis
-DRA values were less widely dispersed for MLBs than for NeLers, except at catcher, on a per 154 basis
-Rfield values were quite a bit lower than DRA for MLBs on a per 154 basis
NEGRO LEAGUES | MAJOR LEAGUES
DRA/154 | RFIELD/154 DRA/154
POS AVG STDEV | AVG STDEV AVG STDEV
====================|==========================
C 0.8 7.7 | 2.0 5.4 2.7 11.3
1B -0.6 9.0 | 0.4 4.5 1.1 7.2
2B 2.4 13.0 | 1.6 7.8 2.9 11.0
3B -0.2 15.7 | 1.6 7.1 2.0 10.3
SS 1.0 13.0 | 2.7 8.3 3.9 10.9
LF 1.3 10.3 | 1.3 4.3 1.9 8.0
CF 1.6 12.3 | 1.4 4.6 2.2 9.2
RF 1.4 13.1 | 1.5 5.1 3.1 9.4
Then we can turn that into this chart:
     MEAN    | STDEVS
     DRA/154 | DRA/154 AND RFIELD/154
             |
     mlbDRA/ | mlbDRA/  mlbRFIELD/
POS  nelDRA  | nelDRA   mlbDRA
===================================
C 3.2 | 1.47 0.48
1B -1.8 | 0.80 0.62
2B 1.2 | 0.85 0.71
3B -10.1 | 0.65 0.69
SS 3.8 | 0.83 0.76
LF 1.5 | 0.78 0.54
CF 1.4 | 0.75 0.50
RF 2.2 | 0.72 0.54
-----------------------------------
AVG* 3.1 | 0.85 0.60
*Average for first column uses absolute values for negative figures.
From Chris’ post 37:
“If the DRA SDs for the NL/AL 1920-48 are similar to the DRA SDs for the NeL, 1920-48, then that would suggest that Heavy Johnson's fielding values need to be very heavily smoothed out to bring him into the range of values produced by the rfield system. However, if the DRA SDs for the NL/AL are smaller than the DRA SDs for the NeL, then that would suggest that a greater variance in NeL fielding numbers would need to be retained to reflect the fact that the NeL conditions allowed for a wider range of fielding skills than occurred in the contemporary NL/AL.”
Looks like we need to retain some of the variance Chris mentions. Except going the other way for catcher.
From Brent in post 38:
“Smaller sample size in the NeLg should contribute to larger standard deviations. We can estimate that effect using the standard square-root of n formula.”
A little help, please. Which STDEV should I be applying this to? I think you’re suggesting this but correct me, please, if I’ve misunderstood you:
STDEV of DRA/154 / SQRT(Number of players in sample)
“As Chris has suggested, if we have DRA data for major leagues (or Rfield estimates for NeLgs) we could figure out how much is due to the formula.”
Glad to do this, but I’ll need a suggestion for the calculation, please.
This is really nice work. Your results seem to confirm my guess that there was more dispersion of fielding ability in the NeLgs than in MLB, with the exception of catchers.
I was drafting a long comment on how to adjust standard deviations for sample size, then decided what I was proposing would be too complicated or not feasible (hint: it involved calculating DRA from retrosheet data). But because you've limited your sample to players with at least 80 games at a position, the effects of differences between the leagues in sample sizes probably shouldn't be too large. I guess one simple way to check the effect of sample size would be, for the MLB players with a lot more playing time than the NeLg players, to throw out a few of their seasons at random to bring their sample size in line with what's in the NeLg sample, then see how much difference it makes to the variation. My guess is it wouldn't make enough difference to change the basic picture you've presented.
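In case it's useful, here's a rough sketch of that subsampling check; the season-tuple format, the 400-game target, and the variable name mlb_player_seasons are all assumptions for illustration, not part of anyone's actual dataset.
[code]
# Randomly drop seasons from a long-career MLB player until his games roughly
# match a typical NeL sample, then recompute his DRA/154 and look at the spread.
import random
import statistics

def subsample(seasons, target_games, rng):
    """seasons: list of (games, dra_runs); keep a random subset up to target_games."""
    shuffled = list(seasons)
    rng.shuffle(shuffled)
    kept, total = [], 0
    for games, dra in shuffled:
        if total >= target_games:
            break
        kept.append((games, dra))
        total += games
    return kept

def dra_per_154(seasons):
    games = sum(s[0] for s in seasons)
    return 154.0 * sum(s[1] for s in seasons) / games if games else 0.0

# usage sketch, assuming mlb_player_seasons is a list of per-player season lists:
# rng = random.Random(0)
# rates = [dra_per_154(subsample(s, 400, rng)) for s in mlb_player_seasons]
# print(statistics.stdev(rates))
[/code]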
Again, my compliments on a nice little study.
I should add my thanks also. I can reason about statistics, but I can't do sophisticated statistical analysis. I am grateful that you are willing and able to do this work!
I am not understanding what the lefthand column in the second table shows, so if you have a moment to explain that data further, I'd greatly appreciate it!
The lefthand column in the second table got a little garbled. Sorry about that. It's simply the ratio of the means of mlbDRA/154 to nelDRA/154 at each position. At catcher, for example, the mean mlbDRA/154 is 3.2 times larger than the mean of nelDRA/154. The average at the bottom of the column indicates that taken together, the positional means average 3.1 times more mlbDRA per year than nelDRA.
Or to put it more simply, MLBs had DRA rates about 3 times as high as NeLs did, on average.
With these numbers calculated, there is the interesting question of how you might use them to calculate or adjust the fielding records of NeLg players for MLEs. It is not entirely obvious what should be done.
First, I'll recommend that you not pay too much attention to the differences in the "average" numbers in your tables. We know that DRA is designed to have a mean of zero for all players at a position, so what the averages show is that the players in your dataset (presumably good players with fairly long careers) are typically slightly better fielders than the average player in their league and at their position. That's not surprising and not especially helpful for comparing the NeLg with MLB. I will focus more on the differences in standard deviations.
In your MLEs, I understand that you're planning to convert the DRA-based data to match the distribution for Rfield. I won't debate that decision, but for the rest of this post will ignore the conversion from DRA to Rfield and focus on the problems we face in simply comparing the two distributions of DRA data. I'll ask how we might make the NeLg DRA distribution equivalent to the MLB DRA distribution.
Focusing on the differences in standard deviation, I suggest three stories that one might tell:
Story 1. The higher standard deviations of the NeLgs are simply an artifact of the playing conditions they faced. They played fewer recorded games per season; there was more variation in the quality of the fields they played on, in the quality of the umpiring, and in the quality of the equipment they used. These environmental factors could explain the higher variation. For this story, the solution is to just adjust (or standardize) the NeLg data to have the same means and standard deviations as the MLB data for each position and use the standardized data in the MLEs.
Story 2. The higher standard deviations in the NeLgs reflect more dispersion in actual fielding ability relative to MLB, but there is no reason to think that NeLg players were on average any different than MLB players. In this story, you'd just treat the NeLg fielding data as equivalent to MLB fielding data and not make any conversion. Under this assumption, I'll note that the wider dispersion of the NeLg distributions implies that its upper tail would show the best NeLg fielders playing better than the best MLB fielders. This seems contrary to what we've found in the MLEs for offense--that the best NeLg hitters (Gibson, Charleston) were similar to, but not better than, the best MLB hitters.
Story 3. This is the story I sketched in #38 above, in which the higher standard deviations in the NeLgs reflect a lower average quality of fielding, because NELg rosters encompassed many players who would not have been able to play in MLB along with many others who were definitely MLB-quality players. While this judgment may seem harsh, it is in line with what all of our MLEs have been telling us about offense--that the average quality of play in the NeLgs was lower than in MLB, so the raw NeLg statistics need to be adjusted downward to make them equivalent. The problem is, how do we know how much to adjust them?
For offense, Chris Cobb was able to look at batting statistics for players who appeared in both the NeLgs of the 1940s and in MLB during the early years of integration. I think it would be difficult to apply this approach to fielding statistics. You'd have to look at multiple positions. Players often played different positions in the NeLgs than they did in MLB. (JRobinson was famously a shortstop for the Monarchs; Doby was a second baseman for the Newark Eagles.) When you do find matches, sample sizes tend to be small, and I will guess that fielding statistics require larger samples to stabilize than batting statistics (though I don't know that for a fact). Sometimes players spent two or three years in the minors between their last NeLg appearance and starting MLB play, which raises the possibility that aging might affect the comparisons. So, I'm not optimistic about being able to do a conclusive study on this subject, though it would be interesting to see how far the data might take us.
So, to the extent we believe Story 3 to be the most accurate one, I will suggest an ad hoc method for adjusting NeLg DRAs to MLB equivalents. For RF, your data show the NeLgers with a mean per 154 games of 1.4 and a stdev of 13.1. For the MLBers, the mean is 3.1 and the stdev is 9.4. Let's say we think the top 20% of NeLgers were about as good as the top 20% of MLBers. If we use a normal distribution with mean 3.1 and stdev 9.4, the 80th percentile of the MLB distribution is 11.0 [in Excel, use norm.inv(0.8,3.1,9.4)]. For the NeLg players, the 80th percentile calculated the same way is 12.4. If we keep the same standard deviation and want the 80th percentile to be the same, we need to subtract 1.4 from the DRA for all of the NeLg players as a "quality adjustment" for the lower level of play. The adjusted DRA data would then be treated as equivalent to MLB DRA data, and no further standardization would be made.
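Here's that calculation as a short sketch under the normal approximation; scipy's norm.ppf plays the role of Excel's norm.inv, and the function name is mine, not part of any established method.
[code]
# 80th-percentile quality adjustment for RF (Story 3), normal approximation.
from scipy.stats import norm

mlb_p80 = norm.ppf(0.8, loc=3.1, scale=9.4)    # ~11.0 DRA/154, MLB RF
nel_p80 = norm.ppf(0.8, loc=1.4, scale=13.1)   # ~12.4 DRA/154, NeL RF

quality_adjustment = nel_p80 - mlb_p80          # ~1.4 runs/154 to subtract

def mlb_equivalent_dra(nel_dra_per_154):
    """Subtract the quality adjustment, then treat the result as MLB DRA."""
    return nel_dra_per_154 - quality_adjustment

print(round(quality_adjustment, 1))
[/code]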
A note--rather than assuming a normal distribution, I think it would be preferable to calculate the 80th percentiles from your actual data. We don't know that the data are normally distributed, so it's probably better not to assume it when we don't have to.
This method for Story 3 is admittedly ad hoc. You may ask, why the 80th percentile, and not the 70th or 90th? I don't have a good reason; it just seemed reasonable to me to try to keep the top end of the talent distribution aligned, and quintiles are pretty commonly used for describing data distributions. Unlike batting statistics, where we can make multiplicative adjustments to rate statistics like batting average, wOBA, and OPS, the fielding statistics are centered around zero, so any quality adjustment is likely to require a subtraction. This just seemed like a plausible way to come up with such an adjustment and would only slightly change the picture from the available NeLg DRA data.
How do we decide which of the three stories to believe? I don't think Story 2 (larger std dev in the NeLg, but no quality adjustment) makes sense, so I would discard it. I think that both Stories 1 and 3 do make sense. What I would probably do if I were constructing MLEs myself is some mixture of the two--maybe calculate MLE fielding statistics with the standardization as in Story 1, then with the quality adjustment as in Story 3, then take an average of the two figures. But you can use your judgment as to which story seems most plausible to you.
Would a process of straight subtraction as the quality of play adjustment and then norming the DRA range of values to the Rfield range of values work as appropriately and fairly for below-average fielding as for above-average fielding? I don't immediately see that it wouldn't, but I think it's worth asking the question explicitly. Will a conversion rate based on one end of the fielding spectrum work fairly across the whole fielding spectrum?
I don't think I was suggesting that Dr C skip the step of norming the DRA range of values to the Rfield range. I was looking at the more basic issue of how to translate NeLg DRA values to an MLB DRA context. I was assuming that the next step would be to adjust those MLE DRA numbers to a range appropriate for Rfield. (I'll mention, however, that after reading Wizardry, my own opinion is that it might be better to substitute DRA for Rfield for the MLB players, which I guess is what the gWAR numbers from Baseball Gauge did.)
Will a conversion rate based on one end of the fielding spectrum work fairly across the whole fielding spectrum? I'm not quite sure how we measure "fair," but my proposal would just be subtracting a relatively small number (1.4 runs per 154 games) from the DRA fielding records of all NeLg players. And if you re-norm those numbers to Rfield, it would wind up being an even smaller adjustment, probably less than a run a year. I don't think that should cause too much distortion.
One player we elected before the Seamheads fielding data became available who might have been adversely affected by those data is Cool Papa Bell. His batting record was always pretty marginal, but I think many of us voted for him assuming that his legendary speed must have resulted in outstanding defensive play. But the DRA data now available suggest that after about age 25 he was only an average outfielder. If I had those elections to do over again, I don't think he'd make my ballot.
hell, I can't even do that, Chris!
all I can contribute is that while I am not a "touchy/feely" fellow by nature, the fair analysis - back and forth - on any Negro League candidate that continues to this day, well, I find that very gratifying.
Not many may remember it, but in the dawn of the HOM creation discussion, there were some who thought there was no realistic way to compare white players with Negro Leaguers, so.....
Mind you, I never sensed even a hint of racism in that sentiment. it was absolutely NOT that. it struck me as data-driven challenges, and how to possibly balance the merits of players who didn't face each other except in exhibition games (which I always have found quite relevant, but I digress).
but we were able to prevail - for me, personally and for others, it was specifically on the idea of "wait, are we talking about segregating these great baseball players AGAIN?" hell no.
so we gamely soldiered on, having to do our best with what incomplete data we had, and rate the 15 best per year as best we could.
now the information available has improved greatly, and we can talk about an HOM electee who got in more than 15 years ago in real time and say, maybe it was a mistake - and if so, then so be it.
far more importantly, finally it's an 'even playing field' that these greats never got when they were alive.
so fellows, keep up the good work !
DRA/154 QUINTILES BY POSITION
MLB
QUIN* C 1B 2B 3B SS LF CF RF
1 6.3 6.5 12.8 8.0 11.1 8.3 10.5 10.9
2 2.5 2.3 3.8 4.4 7.7 4.1 2.6 5.2
3 -0.3 -0.9 -0.4 0.2 2.6 -0.2 -0.3 0.4
4 -4.4 -4.2 -5.0 -5.3 -4.4 -4.9 -6.1 -4.1
I often label percentiles in the wrong order. Read these labels as cutoffs: 80% of values fall beneath the row labeled 1, 60% beneath the row labeled 2, and so on.
NeL
QUIN* C 1B 2B 3B SS LF CF RF
1 6.5 6.3 12.7 11.5 11.8 9.2 12.4 10.0
2 2.6 1.6 4.2 1.7 4.8 5.2 4.0 5.4
3 -0.4 -2.8 -1.3 -2.2 -1.5 -1.6 -1.4 0.3
4 -5.5 -8.4 -7.1 -11.5 -8.6 -4.4 -7.0 -8.6
Brent, based on this information, and with the caveat that I don't think the data is normally distributed because we're only dealing with better players, I THINK you're recommending one of two approaches:
First
Player’s NeL DRA - (NeL 80th percentile – MLB 80th percentile)
For Heavy that means:
-6.2 DRA/154 – 0.9 = -7.1 DRA/154
Second
(Player’s NeL z-score * MLB STDEV )+ MLB mean – (NeL 80th percentile – MLB 80th percentile)
In Johnson’s case, using the figures in post 42:
(-6.2 DRA /154 – 1.4 NeL RF mean) / 13.1 STDEV = -0.6 z-score
(-0.6 z-score * 0.5 MLB RF STDEV) + 3.1 MLB RF mean – (10 DRA/154 – 10.9 DRA/154)
reduces to
-0.3 + 3.1 – 0.9 = 1.9 DRA/154
Since the second version isn’t giving us a very intuitive result, I’m guessing that’s not what you were recommending, and that the first is. But I wanted to be sure before I moved any further.
Thanks!
The first approach looks right--yes, that's what I was expecting.
In the second one, the math looks right, but the result doesn't seem to make sense. I think the reason is that, if I understand how bb-ref calculates rfield, it should average to zero for each position. But the average for the RFs in your sample is +3.1, which means that they are notably better than average. My guess is that is due to your sample consisting of long career players who tended to be better than average defenders. But it results in Johnson being assigned an MLB equivalent rfield score of +1.9, which doesn't seem right for someone who is below average in the NeLg statistics. I guess one possible way to "fix" the numbers so it wouldn't do that would be to leave out the adjustment to the mean when calculating the z-scores (that is, -6.2DRA/154 / 13.1 STDEV, assuming the "correct" mean is zero, and also leave out the +3.1 for the MLB term). It would be kind of unusual, but maybe we could justify it by saying that we think the true means should be zero.
I don't know; what do you think?
I've been following this discussion with interest since Heavy Johnson's case hinges on his fielding (if one buys Eric's MLE for him).
It looks like there's a calculation error in #52. It should be -0.9 because the 80th percentile DRA is lower among NeL RF than AL/NL RF, resulting in -5.3 DRA/154 for Johnson.
You might get more intuitive results by using the 80th percentile's distance from the mean in each sample. In this case, you'd subtract 0.8 from Johnson's DRA/154 (RF 80th percentile: 8.6 from NeL mean (10.0 - 1.4); 7.8 from AL/NL mean (10.9 - 3.1)).
But I was also thinking you can drop the worst fielders from your NeL sample so that the NeL and AL/NL samples are equal in average DRA/154 at each position. Then, you would separate the truncated NeL data into quintiles and compare the truncated 80th percentile of NeL DRA/154 to the 80th percentile of your AL/NL sample.
Taken all together, it points to Dr. C writing the wrong words but doing the right math.
Why? What would this approach accomplish? It would show you how much more or less dispersion is present in a weirdly truncated sample of NeL fielding values compared to a non-truncated MLB sample. Which gets us...not anywhere useful that I can discern. Am I missing something?
I also forgot to mention, however, that I still feel it’s important to regress fielding rates toward zero/mean when we don’t have much fielding data. Johnson had 262 games in RF, so my approach would be to regress about 15% toward zero. I get 15% because I favor a minimum 308 game sample (two full years of fielding data). Then I would proceed to the calculation Brent recommended using the regressed rate rather than the “raw” career rate.
A) find DRA/154 for NeL player
B) perform Brent's recommended calculation on (A), subtracting the difference in the 80th-percentile means (I used the absolute value of the difference, spitballing that the differences between the leagues might also be a result of other defensive-spectrum pressures, not just quality of play)
C) regress (B) for samples under 308 games (as noted in post 59)
D) transform (C) from DRA to Rfield ( C * ( mlbRfieldmean / mlbDRAmean))
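For concreteness, here's a small sketch of steps A-D for a single player, plugging in the RF figures quoted in this thread (80th percentiles of 10.0 NeL and 10.9 MLB, MLB RF means of 1.5 Rfield and 3.1 DRA); the function names are mine, and the printed numbers only illustrate the chain, not anyone's final career totals.
[code]
def quality_adjust(dra_154, nel_p80, mlb_p80):
    # Step B: subtract the absolute gap between the two 80th percentiles
    return dra_154 - abs(nel_p80 - mlb_p80)

def regress_small_sample(dra_154, games, min_games=308):
    # Step C: regress toward zero for samples under two full seasons (308 G)
    return dra_154 * min(games, min_games) / min_games

def dra_to_rfield(dra_154, mlb_rfield_mean, mlb_dra_mean):
    # Step D: shrink DRA down to Rfield's typical scale
    return dra_154 * (mlb_rfield_mean / mlb_dra_mean)

# e.g., Johnson's -6.2 DRA/154 over 262 games in RF
b = quality_adjust(-6.2, nel_p80=10.0, mlb_p80=10.9)
c = regress_small_sample(b, games=262)
d = dra_to_rfield(c, mlb_rfield_mean=1.5, mlb_dra_mean=3.1)
print(round(b, 1), round(c, 1), round(d, 1))
[/code]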
I tried to pick guys for whom we have a lot of information, who are current candidates, or who represent extremes. LMK what you all think!
Heavy Johnson
Current: -21 career Rfield
Test: -45 runs
Est. reduction in MLE: 2.5 WAA/WAR
This total would effectively tie him with Chuck Klein and Harry Heilmann as the worst RF by Rfield through WW2.
Newt Allen
Current: 167 runs
Test: 262 runs
Est. increase in MLE: 10 WAA/WAR
The reason this jumps so much is that under my current system, his fielding rate (35/154) is chopped by two-thirds to 11/154. Under the test scenario, there's virtually no difference in 80th percentile means at 2B (0.1 runs/154). The new transformation to Rfield knocks him down only about 40% from 35/154 to 19/154. Because 20/154 is an insanely good total in Rfield, a player with that career average would be amazing. Mark Belanger, for example, was an 18/154 shortstop in just 2016 games. 262 runs would be 120 more than Frankie Frisch's career total at 2B, which was the best through WW2.
Ben Taylor
Current: 69 runs
Test: 62 runs
Est. decrease in MLE: 0.5 WAA/WAR
Jud Wilson
Current: 26 runs
Test: 58 runs
Est. increase in MLE: 4 WAA/WAR
Current system knocks Wilson down by about two-thirds at both 3B and 1B, test system by about 40-50%. Both the current and test figures run contrary to his defensive reputation in the narrative.
Biz Mackey
Current: 17 runs
Test: 75 runs
Est. increase in MLE: 6 WAA/WAR
This would place Mackey 5th among catchers from 1871-1945, right behind Hartnett's 78 Rfield. It aligns with his defensive reputation.
Julian Castillo
Current: -2 runs
Test: -22 runs
Est. decrease in MLE: 2 WAA/WAR
This would be in keeping with Castillo's reputation as a plodder.
Dobie Moore
Current: 96 runs
Test: 207 runs
Est. increase in MLE: 11 WAA/WAR
Current system knocks him down by about two-thirds, test system would only knock him down by 30 percent, hence the big jump. Like Allen's, this result is way out of scope for Rfield. Joe Tinker leads pre-1946 players with 162 Rfield (155 in his best ten years); Moore would dwarf that in his only ten years.
Mule Suttles
Current: 26 runs
Test: 21 runs
Est. decrease in MLE: 0.5 WAA/WAR
John Henry Lloyd
Current: 39 runs
Test: 73 runs
Est. increase in MLE: 4.0 WAA/WAR
This would be more in keeping with Lloyd's defensive reputation.
Burnis Wright
Current: 43 runs
Test: 90 runs
Est. increase in MLE: 5.0 WAA/WAR
This would give Wright the highest Rfield total through 1945 (yes, I know his career went past that year), 13 runs more than Harry Hooper.
Oscar Charleston
Current: 28 runs
Test: 61 runs
Est. increase in MLE: 3.0 WAA/WAR
This is more in keeping with Charleston's reputation as a fleet centerfielder.
John Beckwith
Current: -13 runs
Test: -39 runs
Est. increase in MLE: 2.5 WAA/WAR
This total aligns more closely with his defensive reputation.
Pete Hill
Current: 32 runs
Test: 95 runs
Est. increase in MLE: 6.0 WAA/WAR
Tris Speaker leads MLB CFs with 92 Rfield, so I suspect this is puffy.
Cool Papa Bell
Current: -28 runs
Test: -94 runs
Est. decrease in MLE: 6.5 WAA/WAR
This would make him the worst CF in MLB history through the war by -60 runs.
Last one...
Hurley McNair
Current: 37 runs
Test: 68 runs
Est. increase in MLE: 3.0 WAA/WAR
This would put him 5th among corner OFs through 1945, behind Fred Clarke, Hooper, Sheckard, and George J. Burns.
Generally, I'm feeling kind of hit-and-miss here. Seems like we're getting some results that hew more closely to the defensive reputations we've read about, some results that are out of scope for Rfield entirely, and some that merely confirm what the current method suggests. One thing we can say is that the current method's findings are not contradicted by the test method: the good fielders are good in both, the bad fielders bad in both.
One thing worth noting here, is that the Rfield transformation does what it's supposed to do, which is reduce DRA down to Rfield's size. I don't believe that's having any particular distorting effect. I think the distortions we see enter earlier in the procedure.
I'm encouraged that most of these changes move in the right direction. But if you want to stick with having numbers that are comparable with Rfield, it would probably be advisable to shrink the range some more so that Allen, Moore, Wright, and Hill don't go outside the MLB range. (I wouldn't consider a finding that Johnson is comparable to Klein and Heilmann to be surprising.)
And Bell's numbers--ouch! The Seamheads data only show him at -20.6 runs as an outfielder over his career, but I guess the quality of play adjustment is bigger in center field than at some of the other positions.
For example, in LF, instead of using means of 1.3 for Rfield and 1.9 for DRA (which results in a discount of 32% off DRA), I would square those rates to get 1.69 for Rfield and 3.61 for DRA, which would result in a steeper 54% discount. That feels more in line. But I'd ask our more stats-trained members to comment on whether that's a good idea or if there's a better way. Thanks!
I thought converting DRA to its equivalent in Rfield requires a proportional reduction in variance between the two metrics (second chart, third column in #42), but your process in #61 appears to do away with that step.
I've been puzzling over this in my mind because estimates of NeL fielding value/quality come down to how we account for the variance in their DRA estimates. As I see it, there are at least three systematic factors increasing NeL variance in career DRA relative to White MLB:
1. Shorter seasons, which increase variation in estimates of player value.
2. Worse field conditions, which increase variation in estimates of player value.
3. Greater variation in player talent independent of Factors 1 and 2.
As you know, variation in player value is not perfectly related to variation in player talent. We can't make any inferences about relative NeL/White MLB quality of defensive play without first addressing Factors 1 and 2. Therefore, I'd suggest the following process for generating NeL Rfield/154 estimates:
1. For NeL players in your sample with fewer than 308 career games, regress their career DRA/154 as you are already doing. This is probably the only reasonably simple way to account for Factor 1, given inconsistent season lengths and incomplete data coverage within player careers.
2. Using the regressed data, calculate the positional means and standard deviations in NeL DRA/154.
3. Standardize NeL DRA/154 data based on the corresponding positional standard deviations in White MLB, dividing the White MLB SD by the NeL SD. This addresses Factor 2.
4. Add the standardized NeL DRA/154 to the NeL positional mean DRA/154. Steps 3 and 4 allow you to approximate an NeL player's fielding value under MLB playing conditions.
5. Calculate the 80th percentile of NeL DRA/154 based on this regressed, standardized distribution.
Regressing small samples and standardizing the NeL distribution to the White MLB distribution means that any differences between the NeL and White MLB 80th percentiles will be mainly attributable to greater variation in NeL player talent, underlying differences in NeL and White MLB sample characteristics, or some combination of both factors. Therefore, if you don't think differences in sample characteristics introduce bias into the comparison between NeL and White MLB estimates, you can go ahead and compare 80th percentiles because you can safely assume that a higher regressed, standardized NeL DRA is primarily the result of greater variation in player talent.
6. Take the White MLB 80th percentile in DRA/154 at the NeL player's position and subtract the corresponding regressed, standardized NeL 80th percentile. This functions as a quality of play adjustment.
7. Add the difference to the NeL player's DRA/154.
8. Convert to Rfield by multiplying the result from Step #7 by the ratio of the White MLB SDs in Rfield and DRA at the NeL player's position.
Player's regressed DRA/154 * regressed mlbSTDEV / regressed nelSTDEV?
Or
(Player's regressed DRA/154 / regressed nelSTDEV) * regressed mlbSTDEV / regressed nelSTDEV
Or
nel regressed mean DRA/154 * regressed mlbSTDEV / regressed nelSTDEV
I could read it a couple different ways, so I want to be sure I know what you're saying. Thanks!
Doesn't that reduce to
(Player's regressed DRA/154) * regressed mlbSTDEV
Only NeL positional STDEVs should be calculated from regressed DRA/154 because you're trying to account for any inflated variation introduced by shorter, volatile season lengths.
My teaching evaluations often say I'm unclear, so I hope this helps!
In my proposed method, I'm proceeding on the assumption that calculating NeL positional STDEVs from regressed DRA/154 data is sufficient to account for any variation added by shorter, more volatile schedules. Yet in making this assumption, I still expect that NeL positional STDEVs based on regressed DRA/154 will be higher than MLB positional STDEVs due to some combination of inferior field conditions and greater variance in player talent. If this expectation holds, then multiplying by the MLB positional STDEV is a way of expressing what NeL fielding values would look like without these two sources of increased variation.
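Just to check that I'm reading the steps the way you intend, here's one possible sketch of the whole pipeline; steps 3-4 can be read more than one way, so this is my interpretation (rescale deviations from the NeL positional mean, then re-center), not a definitive statement of the method.
[code]
# One reading of the 8-step process: regress, rescale by the MLB/NeL SD ratio,
# apply the 80th-percentile quality adjustment, then convert to Rfield scale.
# All positional constants are assumed to be computed elsewhere from the samples.
def nel_rfield_estimate(dra_154, games,
                        nel_mean, nel_sd,           # regressed NeL positional mean/SD (steps 1-2)
                        mlb_dra_sd, mlb_rfield_sd,  # MLB positional SDs
                        mlb_p80, nel_p80,           # 80th percentiles (steps 5-6)
                        min_games=308):
    rate = dra_154 * min(games, min_games) / min_games                   # step 1
    standardized = nel_mean + (rate - nel_mean) * (mlb_dra_sd / nel_sd)  # steps 3-4
    adjusted = standardized + (mlb_p80 - nel_p80)                        # steps 6-7
    return adjusted * (mlb_rfield_sd / mlb_dra_sd)                       # step 8
[/code]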
Current = what I have now
Test 1 = same as post 61, based on Brent’s suggestions
Test 2 = latest test, based on James Newberg’s suggestions
Heavy Johnson
Current: -21 career Rfield
Test 1: -45 runs
Test 2: -53 runs
Test 1 feels closer to the mark than Test 2. Test 2 is more extreme by 8 runs and pushes beyond the Heilmann/Klein lower limits of Rfield.
Newt Allen
Current: 167 runs
Test 1: 262 runs
Test 2: 299 runs
Current system gives most realistic results. Both test 1 and 2 are way out of scope.
Ben Taylor
Current: 69 runs
Test 1: 62 runs
Test 2: 115 runs
Current and Test 1 both place Taylor 4th through 1945 at 1B. Test 2 is well outside scope for Rfield (Fred Tenney leads at 91).
Jud Wilson
Current: 26 runs
Test 1: 58 runs
Test 2: -2 runs
Test 2 is closest to defensive reputation. Test 1 is surprisingly good. Current is smack dab between them.
Biz Mackey
Current: 17 runs
Test 1: 75 runs
Test 2: 64 runs
Test 1 and 2 both match Mackey’s reputation, the former placing him 5th prior to 1946 and the latter 8th (just above Al Lopez for reference’s sake).
Dobie Moore
Current: 96 runs
Test 1: 207 runs
Test 2: 222 runs
Current is the clear winner. Tests 1 and 2 are way outside scope. Tinker leads with 162 Rfield prior to 1946. Current places Dobie 13th, which, given his short career, makes good sense.
Mule Suttles
Current: 26 runs
Test 1: 21 runs
Test 2: 34 runs
These are all very similar and say essentially the same thing about Suttles: He was more than just a plodding slugger.
John Henry Lloyd
Current: 39 runs
Test 1: 73 runs
Test 2: 108 runs
Lloyd is reputed to be “The Black Wagner,” and Honus had 85 Rfield at SS. Current looks low, Test 1 is in the area of Wagner, Test 2 would place Lloyd in 11th place prior to 1946. Either of tests 1 or 2 could be viable depending on one’s interpretation of Lloyd’s career.
Burnis Wright
Current: 43 runs
Test 1: 90 runs
Test 2: 74 runs
Test 1 seems out of scope (Hooper highest with 77). Test 2 would make Wright the second-best defensive RF before 1946. Current places him 7th.
Oscar Charleston
Current: 28 runs
Test 1: 61 runs
Test 2: 58 runs
Tests 1 and 2 reach the same conclusion, both match his defensive reputation better than Current.
John Beckwith
Current: -13 runs
Test 1: -39 runs
Test 2: 6 runs
Beckwith's fielding is all over the place, so it's hard to pin him down. Generally, if you think he's in the Heavy Johnson/Dick Allen/Harmon Killebrew camp, you'll prefer Test 1. If you think he's probably about average, Test 2 is for you. Current essentially lies between them.
Pete Hill
Current: 32 runs
Test 1: 95 runs
Test 2: 115 runs
I think Current captures Hill the best of these three. Reputation-wise, Hill is usually not thought of as the best defensive CF in the NeL (that’s usually Charleston), but that’s what Tests 1 and 2 are suggesting. Furthermore, they both exceed Tris Speaker’s leading 92 Rfield for the period.
Cool Papa Bell
Current: -28 runs
Test 1: -94 runs
Test 2: -19 runs
Cy Williams "leads" all CFs with -34 Rfield prior to 1946. Test 1 is clearly not a tenable answer. Current puts him among the 6 worst. Test 2 ties him for 12th worst.
Hurley McNair
Current: 37 runs
Test 1: 68 runs
Test 2: ~55 runs (he wasn’t in data set for test 2 for CF)
Current places him 9th prior to 1946. Test 1 is second to Hooper. Test 2 (which simply ballparks his CF runs at +5 career) would be 4th best, just behind Sam Rice.
Tallying all this up, here’s what I’m seeing.
1) Current is generally more moderate in its results and doesn’t seem to go outside the scope of Rfield on extreme/complicated players. That’s likely because I’ve tweaked it over time to avoid that.
2) Test 1 and Test 2 tend to overinflate/overdeflate more extreme players.
3) Test 2 is probably a little more bullish than Test 1.
My intuition tells me that Test 2 is more complicated than either Current or Test 1, but it's not producing better results, so it's probably better to use Current or Test 1, with modifications to whichever works out better.
Test 1's use likely requires a modification that reduces the result by more than the ratio of the MLB means of Rfield to DRA. That could be a mathematical transformation like squaring those means, or using different terms than the MLB means.
Current’s use likely requires a modification to the final DRA-to-Rfield conversion, which is currently STDEVmlbRfield / STDEVNeLDRA. But I’m not sure what the best mod would be and am very open to suggestions.
For Johnson, I don't think those fielding numbers are bad enough to make him unworthy of the HOM - I think the only thing that would do that is the treatment of his Wrecker seasons, if they end up being discounted a lot more.
P.S. exciting stuff with the new website!
The current seems to provide the most reasonable answers. But it doesn't mean it's a well-constructed system, sadly. It's the last step in it that's troublesome:
Regressed DRA/154 * ( STDEV mlbRfield / STDEV NeLDRA)
It's kinda sorta in the ballpark, but....
So, I'm looking at Heavy via the Marcel technique Brent suggested. There are two places in the process where this could be performed.
a) Before: I solve for the Marcel z-score of a hitter's NeL wOBA, and then make all the usual adjustments.
b) After: Just use the MLE Rbat outputs as the weighted Marcel inputs.
These come out with fairly different results. Here's how the calculations look.
BEFORE---solving for 1921 with actual z-scores
1922: 1.663 z-score, 263 PA
1923: 1.487 z-score, 489 PA
1924: 0.482 z-score, 302 PA
League average assumed to be 0 z-score
Weighted z-scores: ((1.663 *5)+(1.487*4)+(0.482*3))/12=1.313
Weighted PA: (263*5)+(489*4)+(302*3) = 4177
r: 4177 / (4177+1200) = 0.776
Regression: (1.313 * 0.776) + ((1-0.776)*0) = 1.019
Age adjustment: (0.991 * 1.150) = 1.001
When I run this through the sausage mill, I get 28 Rbat and, eventually, 4.4 WAR
AFTER---solving for 1921 with MLE Rbat/PA
1922: 77.5 Rbat, 590 PA
1923: 40.4 Rbat, 640 PA
1924: 65.3 Rbat, 600 PA
League average assumed to be 0 Rbat
Weighted Rbat: ((77.5*5)+(40.4*4)+(65.3*3))/12 = 61.7
Weighted PA: (590*5)+(640*4)+(600*3)=7310
r = 7310 / (7310+1200) = 0.859
Regression: (0.859 * 61.7) + ((1-0.859)*0) = 53.0
Age adjustment: (0.991 * 53.0) = 52.5 Rbat
When I plug this into the existing MLE I get 6.8 WAR.
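For anyone checking the arithmetic, here's the AFTER version as a short sketch; the 5/4/3 weights, the 1200-PA ballast, and the 0.991 age factor are the values used in the calculation above.
[code]
# AFTER-style backwards Marcel: weight the translated Rbat by 5/4/3, regress
# toward a league average of 0 using 1200 PA of ballast, apply the age factor.
def reverse_marcel_rbat(rbat, pa, weights=(5, 4, 3),
                        ballast_pa=1200, age_factor=0.991, league_avg=0.0):
    weighted_rbat = sum(w * v for w, v in zip(weights, rbat)) / sum(weights)
    weighted_pa = sum(w * p for w, p in zip(weights, pa))
    r = weighted_pa / (weighted_pa + ballast_pa)
    return age_factor * (r * weighted_rbat + (1 - r) * league_avg)

# Johnson's 1922-24 MLE Rbat and PA as listed above
print(round(reverse_marcel_rbat([77.5, 40.4, 65.3], [590, 640, 600]), 1))
# ~52.9, within rounding of the 52.5 worked out above
[/code]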
So as a reminder, this is what we're looking at:
BEFORE method
1921 (26): 28 Rbat, 4.4 WAR
1922 (27): 77 Rbat, 8.6 WAR
1923 (28): 40 Rbat, 5.1 WAR
1924 (29): 65 Rbat, 7.7 WAR
1925 (30): 3 Rbat 1.3 WAR
1926 (31): 38 Rbat (leads NL), 5.2 WAR
1927 (32): 48 Rbat, 6.2 WAR
AFTER method
1921 (26): 52 Rbat, 6.8 WAR
1922 (27): 77 Rbat, 8.6 WAR
1923 (28): 40 Rbat, 5.1 WAR
1924 (29): 65 Rbat, 7.7 WAR
1925 (30): 3 Rbat 1.3 WAR
1926 (31): 38 Rbat (leads NL), 5.2 WAR
1927 (32): 48 Rbat, 6.2 WAR
I greatly appreciate any feedback anyone wants to provide re which method makes more sense, Before or After. Thanks!
BTW: I relied on this post at Triples Alley. It's more specific than Tango's original instructions.
Thanks, Cookie! I feel like I want to pick one or the other rather than average them. The reason is that I feel like they represent two slightly different theoretical approaches. The BEFORE approach says that it is better to use the actual performances, the AFTER says that it is better to use the translated, already adjusted performance. One of these two is more theoretically accurate. But which? The reason this becomes crucial is that earlier in Heavy’s career, there are years where there is no actual adjacent data to rely on for a Marcel so we’d need to base a Marcel on career rates.
After a lot of thought, I'm coming down on the BEFORE side instead of the AFTER side. The reason being that seasons included in the AFTER inputs could have a skewing effect. For example, in 1926, I have Heavy Johnson capped at 38 Rbat because that's what the NL's leader earned. But in 1922, the NL leader was in the 90s. While Hornsby in 1922 was much further from the second-place finisher than in 1926, it leaves more room for the results to skew up or down due to factors outside the player's (in this case Heavy's) control. So, using the BEFORE treatment isolates him compared to the league more effectively.
One condition that bears mentioning is the instance of a season that has no data surrounding it or limited data. Here are several cases:
1) No data for year n, n+1, n+2, or n+3: In this case, I would use the data from prior seasons and perform a normal Marcel (the one I wrote up earlier is a backwards Marcel)
2) No data for year n, n-1, n-2, n-3: This is what we've already talked about
3) Data in none of the surrounding seasons: I would use career average, adjusted for age in the same way Marcels do.
4) Extremely low sample of data (under 200 PA in surrounding data): Use Marcel in whichever direction has more data, and add n PA of career average to reach 200 PA.
5) Data only in some of the surrounding seasons. Let me flesh this one out a little. First, this is assuming that the player didn't miss the season due to injury; in other words that he was active but we don't currently have a usable record of that play. This is the procedure I'm thinking of:
A) Guesstimate playing time: Player's career G / Team G * length of MLB schedule
B) For missing seasons, use the player's career rate of performance
C) Calculate the Marcel in whichever direction has the most seasons of data. If they are the same in terms of most seasons of data, use whichever one has more adjacency of data. If they are the same adjacency, use whichever data contains more PA. If they have the same PA, just use the normal Marcel method.
Let's look at Heavy. His career, unadjusted z-score for wOBA, the metric I'm using, is 1.34. He played in 0.768 of all team games during his career.
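As a tiny sketch of the case-5 fill-in using those two career figures (the 154-game schedule length is an assumption, and the helper names are mine):
[code]
def guesstimate_games(share_of_team_games, schedule_length=154):
    # Step A: player's career G / team G, times the length of an MLB schedule
    return round(share_of_team_games * schedule_length)

def missing_season_rate(career_rate):
    # Step B: stand in the player's career rate for the missing season
    return career_rate

print(guesstimate_games(0.768))   # ~118 games for a missing Johnson season
print(missing_season_rate(1.34))  # his career wOBA z-score as that season's input
[/code]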
[code]
YEAR AGE METHOD Rbat Prev
==============================
1916 21 3 11 16
1917 22 3 14 20
1918 23 3 17 23
1919 24 4 14 25
1920 25 -- 8
1921 26 2 32 28
1922 27 -- 77
1923 28 -- 40
1924 29 -- 65
1925 30 -- 3
1926 31 -- 38*
1927 32 -- 48
1928 33 -- 19
1929 34 1 18 14
1930 35 -- 1
1931 36 -- 19
1932 37 -- 10
-----------------------------
TOTAL Rbat 434 453
TOTAL WAR 64.5 67.0
[/code]
Feedback welcomed!