Baseball for the Thinking Fan


Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Friday, August 24, 2018

Posnanski: Baseball 100 Rules

In this era of reboots, it was perhaps inevitable that Joe Posnanski would take another crack at the 100 greatest players in major league history. 

The Baseball 100 is more than just a ranking system to me. The difference between my 78th ranked player and my 212th ranked player is so minuscule that it’s mathematically irrelevant. With one slight adjustment, I could have those two players switch places.

Nearly all of the series is to be paywalled, but Zack Greinke is No. 100 on the list.

In the original version of this list, I included a bunch of Negro leaguers — I can tell you that four were in my Top 20. I still believe this. But Negro leaguers will now be a major part of my corresponding Shadowball 100….It’s an eclectic list that includes players who are, in their own ways, larger than life.

No. 100 on this list is Duane Kuiper.

 

 

Rennie's Tenet Posted: August 24, 2018 at 08:01 AM | 272 comment(s)
  Tags: history, joe posnanski, joe posnanski top 100, reboots

Reader Comments and Retorts


Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   201. McCoy Posted: August 30, 2018 at 10:05 AM (#5736180)
Flip



Hitting a lazy 325 foot flyball is not a good thing. You don't want your hitters to do that. But once every so often that 325 foot lazy flyball goes over a wall or falls out of a fielder's mitt. Recording what happened in that play does not mean one should encourage more lazy 325 foot flyballs.


A speedy, weak-groundball right-handed hitter is not going to be as bad as a slow, weak-groundball left-handed hitter. Knowing this and choosing accordingly does not incentivize weak groundball hitting.
   202. Endless Trash Posted: August 30, 2018 at 10:52 AM (#5736228)
This does a pretty good job of estimating total runs scored in a given league year:

R = S*.595 + D*.735 + T*1.251 + HR*1.71 + BB*.421 + SB*.0766 + E*.551 - Outs*.1324 - DP*.438 - CS*.1562 - 249

We could look at how the error term differs over different decades, I suppose? Is that what we are interested in?
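
The estimator above can be written out as a small function so the error term can be checked season by season. This is a minimal sketch: the function and parameter names are illustrative and not from the post, and the inputs are assumed to be league-season totals.

```python
def estimated_runs(s, d, t, hr, bb, sb, e, outs, dp, cs):
    """Evaluate the linear run estimator quoted above on league-season totals.

    The coefficients are copied from the formula in the comment; the argument
    names (s = singles, d = doubles, and so on) are illustrative assumptions.
    """
    return (s * 0.595 + d * 0.735 + t * 1.251 + hr * 1.71
            + bb * 0.421 + sb * 0.0766 + e * 0.551
            - outs * 0.1324 - dp * 0.438 - cs * 0.1562 - 249)


def error_term(actual_runs, **totals):
    """Actual league runs minus the estimate; positive means the formula undershoots."""
    return actual_runs - estimated_runs(**totals)
```

Collecting `error_term` for every league-season and averaging by decade would be one way to see whether the fit drifts across eras.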
   203. Mefisto Posted: August 30, 2018 at 10:59 AM (#5736236)
We already credit routine fly balls that leave the park (see Dusty Rhodes). I'm not sure what characteristic of a hitter we want to "credit" for hitting a routine fly ball that happens to be dropped.

I can see the argument for errors on GB. Logically, it makes sense to think that speed might be a factor. I'm not sure we will see much separation on that because (a) I think the events are relatively rare; (b) I think the issue of speed will be confounded with the issue of the extent to which a hitter is a FB hitter or a GB hitter (that is, fast runners might be FB hitters and therefore hit very few weak GB -- see Mickey Mantle). But as I said, I don't see any reason not to account for it, assuming we do it the right way, if for no other reason than to get real data on this point.
   204. Rally Posted: August 30, 2018 at 01:48 PM (#5736412)
Do infield errors happen more on weakly hit ground balls or are they harder hit balls that a fielder has less time to react to?

The data are available on Baseball Savant if someone has the time and motivation to check. Just download the dataset and compare the average exit velocity of ground ball outs to ground ball errors. Average might not be the best way to look at it, maybe break down exit velocity into buckets (soft, medium, hard, whatever) and see how the error rate compares.
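
The bucketing idea could look something like the sketch below. It assumes the downloaded data has already been reduced to (exit velocity, outcome) pairs for ground balls; the bucket cut points of 80 and 95 mph and all names here are placeholders, not anything defined by Baseball Savant.

```python
from collections import defaultdict


def bucket(exit_velocity_mph):
    """Assign a ground ball to a bucket; the cut points are arbitrary placeholders."""
    if exit_velocity_mph < 80:
        return "soft"
    if exit_velocity_mph < 95:
        return "medium"
    return "hard"


def error_rate_by_bucket(ground_balls):
    """ground_balls: iterable of (exit_velocity_mph, outcome) pairs.

    Only outcomes "out" and "error" are counted, so the rate is
    errors / (outs + errors) within each exit-velocity bucket.
    """
    counts = defaultdict(lambda: {"out": 0, "error": 0})
    for exit_velocity, outcome in ground_balls:
        if outcome in ("out", "error"):
            counts[bucket(exit_velocity)][outcome] += 1
    return {name: c["error"] / (c["out"] + c["error"])
            for name, c in counts.items()}
```

If errors cluster in the "hard" bucket, that would support the less-reaction-time explanation; if they cluster in "soft", the speed-pressure explanation.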
   205. Rally Posted: August 30, 2018 at 01:56 PM (#5736419)
I've generally found the characteristics of hitters who get more ROE to be:

1. Right handed batters
2. Good speed
3. Hit the ball hard - guys with at least some pop in the bat
4. Hit a lot of ground balls
5. Known for hustle

Derek Jeter ticked all the boxes, and he reached on 194 errors over his career. Jim Thome was a slow lefty hitter who hit the ball in the air. Over a similar long career he hit into 70 errors.

Another long career guy is Omar Vizquel. In these categories he differed from Jeter in 1) not hitting the ball as hard and 2) the majority of his AB were as a lefty. He hit into 112.
   206. Mefisto Posted: August 30, 2018 at 02:49 PM (#5736463)
Do infield errors happen more on weakly hit ground balls or are they harder hit balls that a fielder has less time to react to?


Could go either way, I think. Hard hit balls may get turned into DPs more often, but could lead to E too. I'm not sure what the net would be.

One of the arguments in favor of ROE is that speedsters would have an advantage we should recognize. That's why I kept referring to "weak GB"; on a hard GB that wasn't kicked, only Ichiro seemed able to beat them out.

Checking some names randomly, Aaron had 203 ROE, which fits your suggestion in 205. Ichiro had just 129. Obviously, a lot of Ichiro's GB became H, and error rates were higher when Aaron played, but it seems as if the criteria in 205 are pretty good.
   207. Rally Posted: August 30, 2018 at 03:46 PM (#5736501)
Ichiro had everything going for him except being a lefty. Wade Boggs (131) and Tony Gwynn (139) had very similar ROE totals to Ichiro, as long career, high average lefty singles hitters, despite not having his speed. Gwynn was fast for the first half of his career, Boggs played his whole career without speed.

Clemente: 188

Gary Sheffield had 113, which is less than I would have guessed as his name always came up as one of the guys whose quick bat made playing 3B very uncomfortable.
   208. Mefisto Posted: August 30, 2018 at 04:00 PM (#5736508)
Here's a combo for you: Jack Clark had 119, Willie Wilson had 107.
   209. Sunday silence Posted: August 30, 2018 at 04:41 PM (#5736530)
Willie McGee 147; about a 1.7% rate. Another outstanding rate. He too is RH
   210. Rally Posted: August 30, 2018 at 06:25 PM (#5736592)
Willie was a switch hitter
   211. Mefisto Posted: September 04, 2018 at 10:51 AM (#5738172)
Ok, at the suggestion of several people here, I and a friend of mine (who had the stats package) ran some numbers to try to get at the influence of errors on offense. We used a simplified model in which rBat= S+D+T+HR+BB-K-DP, using the per game data from BBREF for the period 1901-2017. It's simplified in part for time reasons and in part because we didn't want to get into estimations where we lacked data. It's also simplified because the goal was to estimate the impact of errors. That means, for example, that we used K rather than outs, because an error is recorded as an out for the batter and we needed to avoid that.

To give everyone a sense of how accurate the simplified model is, here are the coefficients for these events as calculated by Pete Palmer: .47S+ .78D + 1.09T + 1.4HR + .33(BB+HBP) - .72GIDP.

For comparison, here are the numbers we got for total runs/game: .42S + .88D + 1.6T + 1.81HR + .29BB - .14K - 1.79DP. All coefficients rounded. If we had added in HBP, the results would have been a bit closer to Palmer.

In my view, these results by us were reasonably accurate for a designedly-simplified model, so we went ahead with two different ways to approach the issue of errors.

The easiest way was to simply add errors to the list of offensive events, on the theory that errors cause runs to be scored. The impact was pretty dramatic: .21S + 1.15D + .82T + 1.8HR + .21BB + .59E - .15K - .79DP. Errors create more runs than singles and walks combined. Note that this is an average over 117 years; I doubt errors cause that many runs today, but they probably caused even more in high-error periods such as the Deadball Era.

The second way to get at errors was to take my suggestion above and re-run the coefficients on the basis of earned runs only (which Rally didn't like, but hey, it was my idea). Here are the results: .21S + 1.1D + .5T + 1.66HR + .22BB - .53DP - .14K.

So. Any thoughts/comments?
   212. Mefisto Posted: September 04, 2018 at 12:19 PM (#5738238)
I'm gonna keep this on the main page for a bit to give anyone the chance to respond.
   213. GuyM Posted: September 04, 2018 at 12:20 PM (#5738239)
You won't get the right values using regression to generate the coefficients. You can see that just by looking at your coefficients: a single and a walk can't have equal value, and a double can't be more valuable than a triple. For that matter, a ROE can't be 3x as valuable as a single.

Many people have tried generating coefficients via regression, and invariably they get the wrong values. What you have to do instead is calculate the average change in run expectancy (based on base/out situation) for each offensive event. That should give you values similar to Palmer's.
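
The run-expectancy approach described above can be sketched as follows. This is only an illustration under stated assumptions: the play records, the state labels, and the run-expectancy table are hypothetical inputs that would in practice be built from play-by-play data.

```python
from collections import defaultdict


def linear_weights(plays, run_exp):
    """Average run value of each event type from base/out run expectancy.

    plays:   iterable of (event, start_state, end_state, runs_scored)
    run_exp: dict mapping a base/out state to expected runs for the rest of
             the inning (the three-out state should map to 0.0)
    Both structures are assumptions made for this sketch.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event, start, end, runs in plays:
        # Value of one play = runs scored on it + change in run expectancy.
        totals[event] += runs + run_exp[end] - run_exp[start]
        counts[event] += 1
    return {event: totals[event] / counts[event] for event in totals}
```

Because the event label is arbitrary, a reached-on-error play can be tagged as its own event type and valued exactly the way singles and walks are.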
   214. Mefisto Posted: September 04, 2018 at 12:27 PM (#5738246)
Hm. Ok, that may take some time and energy. I hate it when that happens.
   215. GuyM Posted: September 04, 2018 at 12:36 PM (#5738254)
Tom Ruane famously did this years ago at Retrosheet.

Seems like this would be a huge amount of work, especially considering that 1) the value has to be extremely close to the value of a single; and 2) this must have been calculated previously by others (e.g. Fangraphs must have a value they use when calculating any linear-weight-based metric).
   216. Mefisto Posted: September 04, 2018 at 01:33 PM (#5738291)
I'm not sure how anyone would even use errors with the base/out tables. Errors are indeterminate in the end state.
   217. Sunday silence Posted: September 04, 2018 at 01:58 PM (#5738311)
Using my own basic reasoning, my guess is that infield errors, as well as not getting to a ground ball, are going to be somewhere around 0.8 runs per error. OF errors/not making the catch: probably about 1.1. I've already put my reasoning out there on this, so I'll just say: for GB it has to do with the idea that errors/lack of range increase with men on base. Also, 60% of errors in general are throwing errors, which usually mean a huge payout in weighted runs with base runners advancing.


I have to believe that not having a good range factor is very similar to making an error. Most of these errors/non-errors are going to happen on tough plays, tough fly balls and tough ground balls; thus for a lot of these the scorer's call can go either way. The reasoning is similar with passed balls and wild pitches: the catcher is in part responsible for both and so is the pitcher, because usually these are close either way and the scorer's judgment is mostly a coin flip/guess. Obviously this doesn't hold true for muffing an easy catch, but I'm not going to worry about that for now.

It's also interesting that GB errors with men on base go up about the same percentage as batting average, i.e. these are going to be 50/50 calls either way. So they can be lumped together in terms of weighted runs. Other than missing a routine GB, these errors/lack of range are happening on the same types of plays with the same end results. My point is that this is sort of anecdotal evidence that this concept of errors = lack of range is not without some basis.

So for purposes of simplification, why not include both errors and range factor above or below MLB average as a shortcut? I guess you can use defensive efficiency in a similar way. I'd love to see you plug these numbers in for errors/lack of range and see what you get.


Finally, my new thoughts. Most defensive systems don't seem to be as liberal as I am in according weighted runs to errors/range. Either I am stupid or the runs lost on defense are being hidden somewhere. My latest thinking on this is that most of these systems build in some sort of positional bonus. It's like 10 runs for CF and SS, less for the lesser positions. Maybe we can think about it like this. Say an outstanding CF gets only 9 or 10 DRS, but his positional bonus is, e.g., 10 runs. So really he's 20 runs better than an average CF. Crappy CFs and average CFs are probably gaining some sort of undeserved bonus here, so that's where I think defensive systems are hiding these runs lost to defense. Because they have to be hiding somewhere if my belief is correct.

Anyhow its a fantastic thread so thanks for keeping it alive.

   218. Mefisto Posted: September 04, 2018 at 02:11 PM (#5738321)
I agree that errors and range are related. Errors very likely do come at the edge of a player's range, though Rally's point above about hard hit balls (many of which will be right at the defender) has some force too. Then there's the discretion of the official scorer about whether the play could have been made with ordinary effort (or whatever the phrase is).

I was hoping to avoid the complications along these lines with my suggestion to just re-do the coefficients based on ER rather than total runs. It's at least simple, though it might be wrong. The thing is, though, that we don't have and probably never will have the PBP data to implement some of these ideas. I don't think I could come up with good estimations anyway.
   219. McCoy Posted: September 04, 2018 at 04:19 PM (#5738422)
Baseball Hacks by Adler provides the software language to create a Run Expectancy Matrix and then from there create Linear Weights based on those numbers. You can use any set years of PBP data to get the numbers. I did it about 12 years ago and many computers ago. Took awhile because I didn't know what I was doing with the programs and had to trial and error it through all the little bugs that pop up.
   220. Mefisto Posted: September 04, 2018 at 04:22 PM (#5738424)
Thanks. That still leaves me with the problem of how to treat the end state of an error (1 base? 2 bases? more?) without PBP data. I'll have to think about that.
   221. McCoy Posted: September 04, 2018 at 04:25 PM (#5738426)
But again TangoTiger did this already several different ways. From 1974 to 1990 he came up with an error being slightly more valuable than a single and from 1999 to 2002 he found that a ROE was even slightly more valuable than a single.
   222. McCoy Posted: September 04, 2018 at 04:27 PM (#5738427)
Thanks. That still leaves me with the problem of how to treat the end state of an error (1 base? 2 bases? more?) without PBP data. I'll have to think about that.

But the PBP data is available for a lot of years via Retrosheet. You can go back to 1921 for PBP files, and I think from around 1939 on they are complete for the most part.
   223. Mefisto Posted: September 04, 2018 at 05:10 PM (#5738473)
Yes, Retrosheet is complete back to 1939 now. I don't know how complete it is before that.
   224. Mefisto Posted: September 04, 2018 at 06:37 PM (#5738530)
As I'm thinking this through, I want to set out the issues I see in case anybody has better ideas. So,

1. As I said above, errors need to be included in the runs created formulas.
2. Palmer's estimate of outs is wrong. He used AB-H-DP. However, errors get recorded as outs even though in many (most?) cases they aren't -- a batter is safe. Errors, or some estimates of ROE, need to be added back in or else there will be too many actual outs estimated.
3. I need to take a look at the other estimates of the value of errors that folks here have mentioned, such as tango's. It seems to me that the value alone isn't enough unless the runs created formulas can be run with the value of errors fixed. That's contrary to the whole point, which means it's necessary to
4. Estimate the average bases gained by errors in order to use that information with the base/out tables. That could be done by PBP data, but some estimation is essential for the times we lack PBP. And since I haven't looked at tango's estimates yet, maybe he already did this.
5. Once all this is done, it would be necessary to account for the different error rates in different historical eras, just as we do for the run environment.
6. I still think it's easiest to run the coefficients against ER rather than total runs in order to avoid all this.

Am I missing anything or wrong about what I did say?
   225. McCoy Posted: September 04, 2018 at 07:47 PM (#5738565)
I'm not sure why you need to find an estimate for the average bases gained. The PBP data will tell exactly how an error changes the base/out situation and from there you can get linear weights. Treat the errors just like you would singles or doubles when calculating run values.

   226. Mefisto Posted: September 04, 2018 at 08:24 PM (#5738577)
But some errors are 3 base or even 4 base errors. Now I can probably ignore that, because those are pretty rare, but I still have to allocate some errors to be similar to singles and others to be similar to doubles in terms of their impact on the base/out situation. I don't have any real basis for doing that at the moment.
   227. Endless Trash Posted: September 04, 2018 at 08:39 PM (#5738592)
Ok, at the suggestion of several people here, I and a friend of mine (who had the stats package) ran some numbers to try to get at the influence of errors on offense. We used a simplified model in which rBat= S+D+T+HR+BB-K-DP, using the per game data from BBREF for the period 1901-2017


Is there any particular reason you ignored me when I did this in 202?
   228. McCoy Posted: September 04, 2018 at 09:10 PM (#5738623)
Re 226. I'm really not sure what the issue would be. The PBP will give you what you would need. It's like you're stuck on what to do for homers when they're grand slams or three run homers. The Run expectancy matrix and then converting that data into linear weights takes care of all of that.
   229. Mefisto Posted: September 04, 2018 at 09:13 PM (#5738628)
The problem is that with errors I don't know whether to treat it as a single or a double. If I knew how many of each there were on average, I could solve the problem. But right now I don't know how to make that estimate.
   230. Mefisto Posted: September 04, 2018 at 09:17 PM (#5738630)
Is there any particular reason you ignored me when I did this in 202?


I somehow just missed it. Looking at the time stamps, I was composing 203 when yours posted and probably just checked from 204 on the next time I looked. Sorry. Where do those coefficients come from?
   231. Endless Trash Posted: September 04, 2018 at 09:19 PM (#5738633)
Palmer's estimate of outs is wrong. He used AB-H-DP. However, errors get recorded as outs even though in many (most?) cases they aren't -- a batter is safe. Errors, or some estimates of ROE, need to be added back in or else there will be too many actual outs estimated


I was using opposing innings as a measure of outs. Was this incorrect?
   232. McCoy Posted: September 04, 2018 at 09:25 PM (#5738637)
Use the PBP data and treat it as an error.
   233. Mefisto Posted: September 04, 2018 at 09:51 PM (#5738654)
I was using opposing innings as a measure of outs. Was this incorrect?


Seems right to me.
   234. Mefisto Posted: September 04, 2018 at 09:52 PM (#5738655)
Use the PBP data and treat it as an error.


Yes, that works for when we have PBP, but not otherwise. Maybe we can estimate earlier years using the averages of PBP.
   235. McCoy Posted: September 04, 2018 at 09:57 PM (#5738659)
You can at least see if this is something you need to worry about at the level you currently think you have to. Right now you think it is a major flaw while most people are telling you it is a minor discrepancy at worst.
   236. Sunday silence Posted: September 05, 2018 at 08:41 AM (#5738801)

The problem is that with errors I don't know whether to treat it as a single or a double. If I knew how many of each there were on average, I could solve the problem. But right now I don't know how to make that estimate.


If this is the summary of your issue then I am having trouble following what the problem is. Some questions:

What does it mean to "treat it as a single or a double"? I thought you wanted to find a weighted runs value for errors, but maybe that is just my own brain assuming this and you have something different in mind.

If you are trying to find weighted runs (and this is my guess but it is by no means clear what you are after) then I think on average most errors are going to weight closer to a double than a single in wt'd runs.

Q2: Is there some methodological reason that you cannot simply plug in a value for errors and create some sort of adjusted value for hitters during the pre-1920 era of MLB? I thought this was your main concern, but again, it's not clear. Why can't you simply plug values in again and again until you find the right value that will balance out? I.e., presumably you have a number for total runs in the 1911 season, and by using weighted runs for errors you are able to make batter-created runs equal total runs allowed, or something like that.

Q3: What specific data are you looking for that you don't have? Presumably you have total runs scored in each season, yes? You also have a total for errors, correct? What exactly do you need? Is it a breakdown of infield vs. outfield errors? Presumably every position has errors assigned to it. Why not just assume some value for OF errors and some value for infield errors?
Then again: plug in numbers and see which values make the total equation of runs scored and runs given up balance out.

Q4: Why are you having so much trouble assigning values for errors? Can you create your own a priori method such as I have described? Say, for example, for OF errors, assume every error and every failed putout is at the edge of an OF's range. Thus most errors go for doubles. Since half the time runners are on base, assume one runner advances two bases, or about 0.6 weighted runs... OK, sometimes there's more than one runner on base, and sometimes they don't advance two bases, but maybe it breaks about even either way. So it comes out about that.

Q5: Instead of the method in no. 4, can you not sample a limited series of games and do it that way instead of a priori? For instance, what I started to do last year, and hopefully will again this year, is to make note of all such plays during the playoffs. It's a limited sample size, but a sample played at the highest level with the most at stake. Moreover, the playoffs are highly watched, we have tons of video, and we have many eyewitness accounts going back to 1903, etc. We should be able to reproduce every play with an error: where on the field it occurred, who advanced, etc.

This playoff season I propose to count the following: 1) What happens on GB errors as well as GB that go through the infield. How often are runners on base? How far do they advance? 2) Discretionary chances. How often do two OFs converge on the same fly ball? This is an important consideration for rating an OF such as Andruw Jones or Richie Ashburn. Infielders also (i.e. who takes the throw on a SB, as well as pop-ups).

I already have some firm ideas on this but it doesn't matter. If other people can contribute their own observations and analysis, our understanding will get better.
   237. Mefisto Posted: September 05, 2018 at 09:16 AM (#5738821)
Some answers:

1. If I were to use the base/out tables, then suppose I start with a runner on first and nobody out. Next batter reaches on an error. The issue I'm raising is that I don't know -- at least not without PBP data -- where the runner on first ended up. That means I can't know the increase in scoring probability. Maybe I'm thinking of this wrong.

2. The problem is that there are too many possible answers.

3. I only have data for total errors committed by team or by league.

4. See the 3 answers above.

5. That might be do-able. I can check.
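
The difficulty in answer 1 can be framed as an expected value: if the end state after an error is unknown, weight each possible end state by how often it occurs. The sketch below assumes those frequencies come from a sample of games with play-by-play; every number passed to it would be an estimate, and the names are hypothetical.

```python
def expected_error_value(start_re, outcomes):
    """Expected run value of an error from one starting base/out state.

    start_re: run expectancy of the state before the error
    outcomes: list of (probability, runs_scored, end_state_re) tuples, one per
              possible result of the error; probabilities must sum to 1
    All inputs are assumed estimates, for example from sampled PBP games.
    """
    total_p = sum(p for p, _, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    # Same definition as any linear weight: runs scored plus change in run
    # expectancy, averaged over the possible end states.
    return sum(p * (runs + end_re - start_re) for p, runs, end_re in outcomes)
```

For the runner-on-first, nobody-out case, the outcomes would be entries such as "first and second", "first and third", and "run scores, batter on second", each with its sampled frequency.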
   238. Rennie's Tenet Posted: September 05, 2018 at 09:17 AM (#5738823)
100: Greinke
99: Gehringer
98: Fisk

Posnanski also did a Shadowball article about Fleet Walker and Cap Anson. I think that means that Anson is not in his top 100 major league players, which pretty much wipes out all of the position players who played most of their careers in the 19th century.
   239. Sunday silence Posted: September 05, 2018 at 12:07 PM (#5738955)
Well, I am looking at the Baseball Reference site and it has the errors for every player on the 1911 NY Giants. So again, I am still at a loss here. Does it need to be compiled in some format for you? Can't you do one season by hand and count all the errors by position?
   240. Sunday silence Posted: September 05, 2018 at 12:09 PM (#5738957)
I don't know -- at least not without PBP data -- where the runner on first ended up



Can't you just use baserunners generically, without regard to which base, and then try to determine how often they advance one base and how often they advance two?

Again, this could also be done by, say, looking at the first 50 years of World Series data and extrapolating from there.
   241. Mefisto Posted: September 05, 2018 at 12:53 PM (#5738965)
Yes, it has the errors *committed by* each player, but for compiling that particular player's batting results I'd need errors *committed against*. AFAIK, that data doesn't exist. I'm not even sure it exists for teams. When I use league-wide totals, I don't need to worry about that.

Yes, I could try samples to get an estimate of frequency for the various possibilities for advancement.
   242. Sunday silence Posted: September 05, 2018 at 12:54 PM (#5738966)
Just to play around with the 1911 NYG some more and touch on a couple of issues.

In regard to Mefisto and defensive runs. It seems that NYG allowed 542 runs in a league where average would be 737.5 for the season. A difference of 195 runs. Now baseball reference gives their pitchers credit for preventing 117 runs. better than average.

That leaves 78 runs prevented that are missing. Well surely that is where the defense come in, right?

Except it doesn't quite work out that way with the way they've jiggered these defensive rating systems. We've got infielders Bridwell and Devlin accounting for all of 2/3 defensive runs saved, but then you've got Doyle giving it all back with -5; in the OF you've got Devore -1, Snodgrass 2 and the other guy at -3. That leaves us down 2 more missing defensive runs.

So now we're looking for 80 defensive runs. Oh wait! You've got Chief Meyers at C. He's got credit for 2 runs.

I'm not even going to get into Herzog and the rest of the subs. Obviously the NYG are doing a lot of work defensively and these silly systems aren't close to acknowledging it in a way that makes sense.

So anytime CFB and the gang start saying that defensive ability really isnt that important and one can prevent 20 runs a season and then they start citing stuff like DRS etc. you know there's a real problem with this argument.
   243. Rally Posted: September 05, 2018 at 02:35 PM (#5739041)
It seems that NYG allowed 542 runs in a league where average would be 737.5 for the season. A difference of 195 runs. Now baseball reference gives their pitchers credit for preventing 117 runs. better than average.


League average was 688 that year. 146 below average. Still a few runs unaccounted for, but with pitchers at +117 and defense +9 it's only 20 runs off.
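
The accounting in this reply, written out as arithmetic. The figures are the ones quoted in the comment; the variable names are just labels for this sketch.

```python
league_avg_runs_allowed = 688  # league average for the season, per the comment
giants_runs_allowed = 542
pitching_runs_saved = 117      # pitchers' runs better than average
defense_runs_saved = 9         # fielders' runs better than average

# Runs the team allowed below league average.
runs_below_average = league_avg_runs_allowed - giants_runs_allowed

# Runs not explained by the pitching and defense credits.
unexplained = runs_below_average - pitching_runs_saved - defense_runs_saved

print(runs_below_average, unexplained)  # 146 20
```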
   244. Mefisto Posted: September 05, 2018 at 02:36 PM (#5739042)
Off by 20 for one team out of 8 seems like quite a bit to me. Still, in the absence of PBP data for those years I can see how it's hard to come up with perfect numbers.
   245. Rally Posted: September 05, 2018 at 02:39 PM (#5739045)
That might be where the park factor comes in. BBref has one-year park factors of 103 for batters and 98 for pitchers. That seems off, how can the park help both batters and pitchers? I think the way that's calculated accounts for the fact that a good team like the Giants doesn't have to face their own pitchers, and pitchers don't have to face their own batters. So they have things a bit easier than the rest of the league.
   246. Mefisto Posted: September 05, 2018 at 02:56 PM (#5739059)
Wouldn't the PF for pitchers get included in calculating their runs saved above average?
   247. Rally Posted: September 05, 2018 at 04:11 PM (#5739143)
Yes?

They allowed 146 runs less than an average team. A pitcher park factor under 100 means that (pitching + defense) is not really 146 runs better, some of that is favorable ballpark.
   248. Mefisto Posted: September 05, 2018 at 04:34 PM (#5739169)
Got it.
   249. Sunday silence Posted: September 05, 2018 at 05:38 PM (#5739211)
From the stats I've looked at, these discrepancies are not random; they always skew away from defense, leaving lots of defensive runs saved missing. I mostly looked at the 1930s and 1950s, and in particular with respect to Arky Vaughan. The errors seem larger the further you go back in time.
   250. Sunday silence Posted: September 05, 2018 at 05:40 PM (#5739213)
What does PF mean? Goddam you people and acronyms. Please define uncommon acronyms

   251. Sunday silence Posted: September 05, 2018 at 05:50 PM (#5739223)
No. 242: you've finally explained why we've been talking past each other for all this time. You are trying to find actual contribution to wins instead of the expected. Why would you want to do that? Not many really care, because all you're doing then is rewarding batters who are lucky enough to hit to a bad fielder and penalizing those who hit to one making a great play. All you're going to do is distort the data, because you are introducing random odd occurrences that occur without any batter skill. So instead of a runs created list that goes Cobb, Speaker, Wagner, Lajoie, you'll get a list like Cobb, Speaker, Art Devlin, Lajoie, Arlie Latham, Wagner. Who cares? What does that prove?
   252. Mefisto Posted: September 05, 2018 at 05:53 PM (#5739224)
Park Factor.

242 is you. Did you mean a different comment?
   253. Sunday silence Posted: September 05, 2018 at 05:57 PM (#5739228)
What I assumed you were doing was trying to calculate the weighted value of runs created by hitters while accounting for errors. That's what weighted values are: they are averages based on hitting an expected number of balls to average fielders making an average number of errors. That would be useful.
   254. Sunday silence Posted: September 05, 2018 at 05:59 PM (#5739231)
252 I meant 241
   255. Mefisto Posted: September 05, 2018 at 06:50 PM (#5739263)
Yes, that is the goal (253). But the way to do that is to use league averages to calculate the coefficient for errors (as my friend and I did, but using the base/out tables) based on league-wide totals. From this we could apply the same coefficients (which are league averages) to individual players, much as Palmer did with runs.

The problem I've talked myself into is that in order to use the base/out tables, I need to establish first the average impact of an error in advancing runners/causing runs. One way to do that, as you suggested in 239, is to use the existing data for errors. The problem with that as I saw it in 241 is that the data we have gives the errors committed by the league as a whole; by each team; and by each player (while playing as a defender). None of that can be used to establish an average value when it comes to offense. In order to establish an average value, I'd need, for example, the number of errors committed *against* a team (which I don't have) rather than the number of errors committed *by* a team (which I do have).

That's why I like your suggestion of using the game records we do have to create an estimate. I think that works, though I'm not sure the sample would be representative (e.g., World Series games are not likely representative).
   256. Sunday silence Posted: September 07, 2018 at 02:59 PM (#5740599)
Mef, I thought what you were going to do was weight batters by giving them more credit for creating runs by putting the ball in play more, since as you say errors were higher back then. So taking the 1911 season, here are the top ten players by WAR:

Also the third column is how often they KO per 600 AB:

Player        WAR    KO/600 AB
T Cobb        10.2   43
J Jackson      8.5   43
F Baker        6.6   39
E Collins      6.6   39
S Crawford     6.4   34.5
L Doyle        6.0   41
H Wagner       6.0   42.5
T Speaker      5.9   41
F Schulte      5.4   74
Birdie Cree    5.3   70

Yeah, I never heard of Birdie Cree either. Or Khedive PA (his birthplace).

OK, so the error rate seems to be about 6.2% on balls in play (I don't know if the AL is different from the NL), as opposed to the current rate of about 2.5%, I think. I didn't get the actual error count for 1911, but I used a graph on another site that seems to point to approximately this figure. GB errors are far more common than FB ones today and I have no reason to think there'd be much change. So going from my best guess I would average errors out to 1 run per error. Maybe an overestimate...

Anyhow, at that rate you'd have to hit about 16 extra balls in play to get one more error, and so 1 more run, which adds about 0.1 wins to your WAR.

As you can see there's not a whole lot of difference among the leaders here. The biggest KO guy is Schulte. If, say, the league avg. KOs per 600 is 40, then we should probably adjust his WAR down by 0.2 wins. Sam Crawford may require the biggest positive adjustment, but that works out to 0.3 runs; so for all Sam's ability to make contact it's like getting an extra walk or a SB over the course of a season. Something you would never even notice.

Cobb and the others would be adjusted less than 0.1 of a win, essentially nothing. So there's not much to be gained by this exercise in a league like that of 1911, where pretty much what power there is is a function of making contact. There's really no such thing as swinging for the fences, or at least there's no evidence of it, since KO rates are similar across the board. It's a more one-dimensional era of hitting, although speed is more of a factor, perhaps bunting, etc. So there are still other dimensions to strategy.
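
A rough Python sketch of the arithmetic in this comment. The 6.2% error rate, 1 run per error, and league average of 40 KOs per 600 AB are the guesses stated above; the 10-runs-per-win conversion is inferred from "1 more run ... for 0.1 wins", and the constant and function names are made up for illustration.

```python
# Back-of-envelope contact adjustment for 1911, using the guesses from this comment.
ERROR_RATE_ON_BIP = 0.062   # guessed share of balls in play that become errors
RUNS_PER_ERROR = 1.0        # guessed (possibly high) run value of one error
RUNS_PER_WIN = 10.0         # assumption: 1 run is about 0.1 wins
LEAGUE_KO_PER_600 = 40.0    # assumed league-average strikeouts per 600 AB

# (WAR, KOs per 600 AB) for three of the players listed above
PLAYERS = {
    "F Schulte": (5.4, 74.0),
    "S Crawford": (6.4, 34.5),
    "T Cobb": (10.2, 43.0),
}


def contact_adjustment_runs(ko_per_600):
    """Runs gained (or lost) from reaching on errors, relative to an average hitter."""
    extra_balls_in_play = LEAGUE_KO_PER_600 - ko_per_600
    return extra_balls_in_play * ERROR_RATE_ON_BIP * RUNS_PER_ERROR


for name, (war, ko) in PLAYERS.items():
    runs = contact_adjustment_runs(ko)
    print(f"{name}: {runs:+.2f} runs, {runs / RUNS_PER_WIN:+.2f} wins")
```

Under these assumptions Schulte comes out roughly 2 runs (0.2 wins) below average and Crawford about 0.3 runs above, which matches the estimates in the text.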

That's where I thought you were heading with this line of reasoning. I have no idea why you are trying to find the actual real value every time Cobb happened to hit one to a bad SS or Wagner happened to hit one to a Larry Doyle making a circus catch. That's not how any of these weighted averaging systems work. So what's up with that?

   257. Mefisto Posted: September 07, 2018 at 05:35 PM (#5740700)
I thought what you were gonna do was weight batters by giving them more credit for creating runs by putting the ball in play more.


No, I'd do the opposite of that. BBREF already gives them extra credit (see #166), and I think that's wrong. In my view, a batter does not deserve credit when the defender kicks the ball or throws it away. My goal is to give the batter credit just for those things he did (hits, walks, etc.).

I have no idea why you are trying to find the actual real value every time Cobb happened to hit one to a bad SS or Wagner happened to hit one to a Larry Doyle making a circus catch.


As I understand it, any time Larry Doyle actually caught a ball he probably made a circus catch out of it. :)

Seriously, I don't think I need to figure out what happened for each individual player. That would (a) require PBP data which doesn't exist; and (b) be more akin to Win Probability Added than WAR.

The issue, as I see it, involves the need to use the base/out tables. If that's necessary, as people are saying, I need to have some way to get the average impact of an error. That's because the base/out tables give the difference in run probability after each event. It's trivial to know this for, say, a HR (everybody scores) or a BB (every runner forced moves up 1 base). But for errors it's indeterminate -- an error might be anywhere from 1 base to 4 bases.

I can get an average value for any time after 1939 because Retrosheet has PBP data. But I'm not sure that an average value from 1939-present would be representative for earlier periods. It might be, but it might not be.

Anyway, with an average value I should be able to get the impact of errors on runs *on average* (not for each individual). If that average accurately predicts runs for leagues and teams, then we can assume that it's probably a good estimate for individuals. That's basically what Pete Palmer did, but without including errors.
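
The "average impact of an error" described in this comment could be sketched in Python as a probability-weighted change in run expectancy. This is only an illustration of the idea, not anyone's actual method: the function name is hypothetical, and the numbers in the example are placeholders, not real base/out table values or real error frequencies.

```python
def average_error_value(outcomes):
    """Probability-weighted change in run expectancy across error outcomes.

    Each outcome is a tuple:
    (probability, run_expectancy_before, run_expectancy_after, runs_scored)
    """
    total_prob = sum(p for p, _, _, _ in outcomes)
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(
        p * (re_after + runs - re_before)
        for p, re_before, re_after, runs in outcomes
    )


# PLACEHOLDER numbers only: not real run-expectancy values or error frequencies.
toy_outcomes = [
    (0.8, 0.50, 0.90, 0),  # bases empty, batter reaches first on the error
    (0.2, 0.50, 1.10, 0),  # bases empty, batter reaches second on the error
]
print(round(average_error_value(toy_outcomes), 3))
```

With real inputs, the probabilities would have to come from play-by-play data (available from Retrosheet only for later seasons, as noted above), which is exactly the part that is uncertain for earlier periods.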
   258. Mefisto Posted: September 07, 2018 at 08:27 PM (#5740754)
I should clarify. When I say that I think BBREF is "doing it wrong", that requires context. It's not necessarily wrong to treat a "put the ball in play" strategy as a successful strategy in a high error environment. As long as comparisons get made to other players in the same time frame, that's fair (even if I probably wouldn't do it). The issue I've raised in this thread is limited to the use of WAR across different time frames without normalizing for errors. A successful strategy in 1905 would be disastrous if attempted today. I'd like to be able to compare the performance of hitters as hitters across time, which needs a different measure.
   259. Rennie's Tenet Posted: September 07, 2018 at 09:33 PM (#5740791)
97. Santana
   260. Rennie's Tenet Posted: September 14, 2018 at 02:13 PM (#5744465)
96. Utley

I think there were 93 major leaguers in his original 100 (which I think actually would have totaled 103), counting Ichiro and Irvin as major leaguers. He can introduce seven new players before he starts bouncing players off the old list, and he's used four of those.
   261. Sweatpants Posted: September 14, 2018 at 06:33 PM (#5744636)
Utley was a wonderful player during his peak, but I think that the idea of him as one of the 100 best in history will age almost as poorly as Bill James's quote about Biggio passing Bonds as the best player in baseball in 1999 (or however that one went).
   262. QLE Posted: September 14, 2018 at 08:24 PM (#5744660)
Utley was a wonderful player during his peak, but I think that the idea of him as one of the 100 best in history will age almost as poorly as Bill James's quote about Biggio passing Bonds as the best player in baseball in 1999 (or however that one went).


And, somehow, that still doesn't seem as off to me as making the same claim about Santana. Even for those of us like me who prefer peak to career, it's deeply questionable that he'd be in the top fifty even at his own position, and to put him among the top 100 overall would probably require both timelining up the wazoo and a really high percentage of pitchers on the list.
   263. Rally Posted: September 14, 2018 at 09:21 PM (#5744685)
#261 - I remember James comparing Biggio favorably to Griffey at that time period. That worked out fine. I don’t know where he had Bonds in comparison to either.
   264. Sweatpants Posted: September 14, 2018 at 10:14 PM (#5744707)
I looked it up. The exact line was "Biggio passed Bonds as the best player in baseball in 1997." He didn't go into any more detail than that.
   265. Mefisto Posted: September 14, 2018 at 10:26 PM (#5744719)
Maybe he meant just that one year ('97). It was Biggio's best; he's not really close to Bonds in any other season.
   266. Sweatpants Posted: September 15, 2018 at 12:10 AM (#5744764)
It's possible that he meant it like that, but even if he did there are still others in contention for the best performer of 1997 (Bonds himself included).

Here's the page that has that line. It looks to me like he just really liked Craig Biggio at that time. Maddux's 1990s would suffice as a Hall of Fame career on their own. Biggio's 1990s as a career would be sort of like an even more peak-intensive Chase Utley.
   267. QLE Posted: September 15, 2018 at 12:50 AM (#5744765)
Maybe he meant just that one year ('97). It was Biggio's best; he's not really close to Bonds in any other season.


On Page 362 of the New Historical Abstract, he states directly that "Craig Biggio is the best player in major league baseball today", on a page where (as #263 mentions) James focuses on comparing Biggio to Griffey.
   268. Ithaca2323 Posted: September 17, 2018 at 03:18 PM (#5745798)
Utley was a wonderful player during his peak, but I think that the idea of him as one of the 100 best in history will age almost as poorly


Posnanski has said that he values peak over career for this list. If you believe the fielding numbers, Utley was an 8-9 win player during his peak. It also explains Santana's presence at 97.

   269. Gonfalon Bubble Posted: September 17, 2018 at 03:47 PM (#5745845)
In 1999's Historical Abstract, Bill James ranked Bonds as already being baseball's 16th-greatest player, writing, "Certainly the most unappreciated superstar of my lifetime. ...Griffey has always been more popular, but Bonds has been a far, far greater player. When people begin to take in all of his accomplishments, Bonds may well be rated among the five greatest players in the history of the game."
   270. Eric J can SABER all he wants to Posted: September 17, 2018 at 04:34 PM (#5745890)
3. I only have data for total errors committed by team or by league.

Errors are tricky for an extra reason, too. I ran a regression a few years ago on ROE against teams compared to errors committed at the various positions (data taken from 1974-2007), and came up with this approximate formula (you can split it up more by position but the correlation stays the same - and really high):

ROE = 0.83 * (infield errors) + 0.30 * (outfield errors) + 0.21 * (catcher errors) + 0.42 * (pitcher errors)

Which is to say that most non-infield errors are runner advancement errors (bad throws on pickoffs/stolen base attempts/runners trying to take an extra base, etc). So even if you went through the box scores and found how many errors were committed by the opposing team in each game, that may or may not correspond well to the number of ROEs that took place in each game.

And of course, that formula may not be as accurate for baseball in 1907 as it is for baseball in 1987; the game has had one or two small changes in the interim.
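
For anyone who wants to play with the formula in #270, here is a minimal Python sketch. The coefficients are the ones quoted in that comment (fit on 1974-2007 data); the function name and the team error totals in the example are hypothetical.

```python
# Coefficients from the regression quoted in #270 (1974-2007 data).
ROE_WEIGHTS = {
    "infield": 0.83,
    "outfield": 0.30,
    "catcher": 0.21,
    "pitcher": 0.42,
}


def estimate_roe(errors_by_group):
    """Estimate reached-on-error events from error counts by fielding group."""
    return sum(
        weight * errors_by_group.get(group, 0)
        for group, weight in ROE_WEIGHTS.items()
    )


# Hypothetical error totals, for illustration only.
example_errors = {"infield": 100, "outfield": 20, "catcher": 10, "pitcher": 10}
print(f"{estimate_roe(example_errors):.1f} estimated ROE")
```

As the comment itself cautions, these weights may not carry over to the deadball era.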
   271. Eric J can SABER all he wants to Posted: September 17, 2018 at 04:44 PM (#5745899)
I looked it up. The exact line was "Biggio passed Bonds as the best player in baseball in 1997." He didn't go into any more detail than that.

Part of the Biggio issue is that James was looking at raw GIDP totals as the measure of the player's double-play avoidance without controlling for opportunities. Biggio, of course, was an NL leadoff hitter, so his DP opportunities were rather limited.

From the Craig Biggio section in the NHBA: "If you compare [Biggio in 1997] to, let's say, Jim Rice in 1984, Biggio has a hidden advantage of 69 extra times on base, since he was hit by pitchers 33 more times (34 to 1) and beat the throw to first on a double play attempt 36 more times (0 to 36)."

Which is, let's say... not entirely fair. Biggio in '97 had 78 PA with a runner on first and less than two outs; Rice in '84 had 202. (Which still doesn't explain the entire GIDP disparity between them, but Biggio had only 37 (AB-H) in that situation all year; it is unlikely that 36 of those were double play attempts that he beat out.)
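
The opportunity adjustment in #271 can be put into a few lines of Python. The counts are the ones given in that comment; treating every PA with a runner on first and less than two outs as one double-play opportunity is a simplification, and the variable and function names are made up for illustration.

```python
def gidp_rate(gidp, opportunities):
    """GIDP per plate appearance with a runner on first and less than two outs."""
    return gidp / opportunities


# Counts as given in #271.
biggio_opps, biggio_gidp = 78, 0    # Biggio, 1997
rice_opps, rice_gidp = 202, 36      # Rice, 1984

rice_rate = gidp_rate(rice_gidp, rice_opps)

# GIDP Biggio would have hit into over his 78 chances at Rice's rate.
biggio_at_rice_rate = rice_rate * biggio_opps

print(f"Rice GIDP rate: {rice_rate:.3f}")
print(f"Biggio GIDP at Rice's rate: {biggio_at_rice_rate:.1f}")
print(f"Opportunity-adjusted gap: {biggio_at_rice_rate - biggio_gidp:.1f}")
```

By this rough measure the gap is about 14 double plays rather than the raw 36, which is the point about not controlling for opportunities.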
   272. Mefisto Posted: September 17, 2018 at 06:41 PM (#5745977)
@270: Thanks. Good points.