Primate Studies: Where BTF's Members Investigate the Grand Old Game

Saturday, September 21, 2002

The Good, the Bad and the Context-Adjusted

Recently, I received an e-mail from someone wanting an explanation of how Context-Adjusted Defense works for the math-impaired. I figured it would make a good article, so I'm writing it as an article.

The principle behind CAD is that a fielder's contribution to his team's fielding will, in some way, show up on the scoreboard. However, his context (mostly pitching staff, but also ballpark) will alter just how we see his contribution in the final stat line. Therefore, we must use known principles of fielding as a machete to slash through the weeds that surround traditional defensive statistics.

Something important -- I believe that, at their core, traditional fielding stats are mostly valid. All other things being equal, a shortstop who recorded 527 assists is a better shortstop than one who only recorded 436 assists. Some statistics are not valid, and we should know which ones they are to keep them from fooling us.

Above all, the first clause of the second sentence of the above paragraph -- "all other things being equal" -- is important in evaluating defensive statistics. It's valid in evaluating other baseball statistics, but it is paramount when evaluating fielders. The shortstop who recorded but 436 assists could well be a better shortstop than the one who recorded 527 assists. There are many important cues for which one must watch so one can know when the numbers are lying.

First, some ground rules:

* I prefer innings estimates to a Claim Points system for individual fielders. Ultimately, any method of determining defensive innings will show itself in a Claim Points system, since the fellows who recorded more outs played more innings.
Estimating innings based on total chances works because Range Factor does not vary much at a position (remembering a fielding corollary of Voros's Law: anyone can do anything in 100 innings afield), and because those outs recorded themselves determine innings. An inning is three outs, after all. So, when determining an individual player's opportunities, prorate the adjusted team opportunities to the player's innings. If Derek Jeter played 972 of the Yankees' 1458 innings at shortstop, and you determine the Yankees shortstops are responsible for 600 adjusted hits allowed, Jeter is responsible for two-thirds of those, or 400 adjusted hits allowed.
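The proration step can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the Jeter example; the function name and parameter names are hypothetical and are not part of CAD itself.

```python
def prorate_opportunities(team_opportunities, player_innings, team_innings):
    """Give a player his share of the team's adjusted opportunities at a
    position, in proportion to the innings he played there.

    All names here are illustrative assumptions, not the author's own.
    """
    if team_innings <= 0:
        raise ValueError("team_innings must be positive")
    if not 0 <= player_innings <= team_innings:
        raise ValueError("player_innings must be between 0 and team_innings")
    return team_opportunities * player_innings / team_innings


# The example from the text: 972 of 1458 innings, 600 adjusted hits allowed.
print(prorate_opportunities(600, 972, 1458))  # 400.0
```

The same call works for any position once the team's adjusted opportunities at that position are known.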
I removed putouts from infielders for many reasons. For pitchers, they reflect two things, both of which are unrelated to skill: covering first base on 3-1 groundouts, and pop flies. For third basemen, many putouts are related to skill, but again there are pop flies, and a team's foul territory heavily influences the number of foul flies he catches. (First basemen are also affected by these phenomena, but I keep their putouts, since they primarily reflect unassisted groundouts, which are the most important measure of their fielding.) I handle middle infielders' putouts separately later on so as to weed out the forceouts (which are about 30-40% of a middle infielder's putouts). For catchers and first basemen, scale down these net putouts as a percentage of putouts, giving the same percentage to each fielder. The good fielders will make up for this when you scale down the hits, for which you use defensive innings.

If you do not have a defensive innings total, you'll need to estimate it. I was using a weighted formula of team games, but I have now stolen Bill James's, since it works much better.

If you're dealing with an outfielder who played in multiple fields, scale his stats by the percentage of games in each field, and give a 25% boost to putouts in center field.

The first context in which we place a fielder's outs is his team's hits allowed. I mentioned this above, but it bears repeating. Indeed, this is the most important adjustment of all, since a good defensive team became such through its good individual fielders. Thus a fielder on a team like the 2001 Seattle Mariners, which was an outstanding defensive team, will be considered a good fielder unless his traditional defensive statistics (Range Factor, Fielding Percentage, Double Play Rate) were very poor. A fielder on the 2001 Cleveland Indians, a poor defensive team, would be considered a bad fielder unless his traditional defensive statistics were very good.
Those people who decided Roberto Alomar was a better defensive player than Bret Boone (in 2001) should take note. The second context is his team's groundball/flyball rate. Looking at the play-by-play data, we have found the following to be true:
Incidentally, I wrote four years ago that the ratio of groundouts to flyouts was low, but this accurately predicted ranking. I was wrong. This is the proper ratio. We have become accustomed to looking at data from the Elias Bureau that puts a normal ground-air ratio at 1.17. Elias arrives at this figure because it counts double plays as two groundouts. If you remove them, you find a team's ground-air ratio is closer to 0.80, just including the outs. Thus, one figures infielders' outs in relation to team groundball hits, and outfielders' outs in relation to team flyball hits.

Next, we come to the pitching staff. Really, we come to the batter at the plate. Left-handed batters are more likely to hit the ball to first base, second base or left field, and right-handed batters are more likely to hit the ball to third base, shortstop or right field. In post-Casey Stengel times, a right-handed pitcher is more likely to face a left-handed batter, and a left-handed pitcher is more likely to face a right-handed batter.

I was wrong on this adjustment before for four reasons:

This is for third basemen. Divide by 2 and add 0.50 for shortstops, and divide by 5 and add 0.80 for right fielders. For first basemen, divide the third base rate into one (or, more mathematically, take its reciprocal); for second basemen, divide the shortstop rate into one; and for left fielders, divide the right field rate into one.

The second is for seasons for which we do have LHB/RHB data:
A right-handed batter is four times as likely as a left-handed batter to hit a ball to the third baseman, and twice as likely to hit a ball to the shortstop. This sounds like a big difference, and it is, but teams don't vary much in the number of right-handed batters they face. On the flip side (pun intended), that same batter is 30% more likely to hit the ball to the right fielder. That's not a huge difference, and I debated whether to adjust for it. I chose to do so, since it's not entirely trivial. Do not make this adjustment when you are making estimates of left/center/right playing time.

Finally, I adjust for ballpark. I haven't made any changes here, though I do make the assumption that a team's ground/air rate remains the same from park to park, since I calculate groundball and flyball singles for both home and road. It may well not, but since we're primarily talking about historical data, we shall never know whether most teams do this or not. Still, a pitcher could throw lower in the strike zone in Coors Field than in Dodger Stadium, and it would be worth knowing if this is true.

As some have noted in discussions, I make no adjustment for team strikeouts. This is partially true, actually; I make no overt adjustment for team strikeouts. A team with a high strikeout rate will have fewer outs available to the fielders, but will also have fewer hits allowed. Strikeouts are Adam Smith's invisible hand -- they correct for themselves without adjustment.

Herein lies the ultimate difference between my system and Clay Davenport's system. For a team, the two of us will create virtually identical ratings. However, when a team allows 100 fewer hits than normal and strikes out 100 fewer batters than normal, Davenport's methods will assume the two events negate each other. Davenport's system will rate a third baseman with 300 assists, against a league average of 300 assists, as average in this context.
CAD will assume the player is a bit better than average. Next, I make an assessment of a player's ability to remove runners who are already on base. The basic calculation for this is:
Notice this has not changed, aside from adding the extra measure of a first baseman's arm. See below ("What I learned from Bill James, and what I already knew") for details. Similarly, each position has a different way of determining how many runners are available:
Again, notice that nothing has changed. Actually, I'm experimenting with different weights for errors by position, but it isn't a big deal. Here are the values for that:
That's the percentage of time an error at that position puts a man on first base. I have no idea how well that holds up over time. After working with National Association data, I would assume it does not hold up well in the 19th Century.

This figure is affected by a team's pitching staff's handedness in the same way the range figure is, except I do not make this adjustment for middle infielders. Apply adjustments to opportunities, by the way, for both range and arm calculations. The number of outs a fielder made or the double plays he turned is certain. His opportunity to make those outs and turn those double plays is not.

For groundballs and flyballs, I took a different tack. I took the team's groundball or flyball total, divided it by BFP, and multiplied the resulting figure by the team's opportunities. This combines the groundball/flyball adjustment with the old balls in play adjustment, and produces better results.

It would be interesting to see park data for these figures. Frankly, I hope someone who has more time on his hands than I do will tally up DP and Errors by park, at a minimum, in addition to my long-standing wish to have H, 2B and 3B for every ballpark ever. It's tedious but possible. However, with the data we have now, I make no park adjustments.

By the way, I should let you know that accuracy with this is good but not great for infielders and outfielders, and poor for catchers. Stolen bases allowed by each catcher vary more than opponents caught stealing do, and thus a catcher's arm rating may be very different than his real throwing capacity. Until opposition stolen bases become available for all teams (which will be a few years), take single-season catcher arm ratings with a grain of salt. We are working on methods of determining dropped third strike assists from a team's passed ball/wild pitch rate, but it only helps a little, and only with the caught stealing side of the equation, which, as I just said, is the lesser side.
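As a brief aside, the groundball/flyball opportunity calculation described above (team groundball or flyball total, divided by BFP, multiplied by team opportunities) can be sketched as follows. The function name, parameter names and sample numbers are all made up for illustration; only the arithmetic comes from the text.

```python
def batted_ball_opportunities(team_batted_balls, team_bfp, team_opportunities):
    """Scale a team's opportunities by its groundball (or flyball) rate.

    team_batted_balls:  team groundball total, or team flyball total
    team_bfp:           batters faced by the team's pitchers
    team_opportunities: the team opportunity count being adjusted
    """
    if team_bfp <= 0:
        raise ValueError("team_bfp must be positive")
    rate = team_batted_balls / team_bfp   # share of batters faced
    return rate * team_opportunities


# Illustrative numbers only: 1500 groundballs, 6000 BFP, 1600 opportunities.
print(batted_ball_opportunities(1500, 6000, 1600))  # 400.0
```

Run it once with the groundball total for infielders and once with the flyball total for outfielders.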
That being the case, one can and should use stolen bases when they are available, which is from 1978 to the present. Now, we move on to error rates. I chose to treat these separately from range, largely because I can then use a different value for errors when it comes time to determining run-values. Determine error rates as below:
For the most part, these results vary little from traditional fielding percentage, but it helps to remove the "dead plays" from a fielder's error rate. It does make a large difference for catchers, as Bill James has noted. For many positions, I calculate position-specific details:
For catchers, this is easy. Same as before -- PB and WP rate per runner on base. A low rate is good. Including Wild Pitches keeps the value from varying too much.

For first basemen, we figure the rate at which a team's third basemen and shortstops made errors. It's an easy calculation:

(A.3b + A.ss) / (A.3b + A.ss + Err.3b + Err.ss)

Finally, I figure net putouts for middle infielders. A middle infielder obtains a large portion of his putouts as cheap plays, plays a high school shortstop could make, namely popups and fielders' choices. Furthermore, these plays are elective plays; either the shortstop or the second baseman could make them. They occur a bit more often on bad teams, since a requirement for a fielder's choice is a runner on first, which occurs more often with a bad team. Bad teams also allow a few more balls in play. The calculation is:

A note on this. This particular formula should brand me as a hypocrite, as I use Balls in Play as the primary source of opportunities instead of hits allowed. There are reasons why I did it this way:

I should note I am comparing totals to league averages to come up with an outs over/under number. Thus, the initial context in which I am working is a Pete Palmer-style context. I can switch to a Bill James-style context, and I worked my butt off to do so. The nature of fielding statistics makes it easier to come up with a Pete Palmer-type over/under number. I can create a number like Fielding Runs, like Defensive Winning Percentage, and am even working on Win Shares. With the right plus/minus value and the right base, you can turn a decent fielding metric into anything.

Finally, we come to create a table of values for statistics. I stole some values from XRuns, and for some others I used a Tangotiger suggestion and adjusted the values to a 27-out context. Basically, turning a hit into an out not only puts that batter out, it prevents another batter from coming to the plate.
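Returning briefly to the first-base calculation given earlier, (A.3b + A.ss) / (A.3b + A.ss + Err.3b + Err.ss), here is a minimal Python sketch of it. The function name, argument names and sample totals are hypothetical; only the formula itself is taken from the text.

```python
def left_side_assist_rate(a_3b, a_ss, err_3b, err_ss):
    """Compute (A.3b + A.ss) / (A.3b + A.ss + Err.3b + Err.ss).

    a_3b, a_ss:     team assists by third basemen and shortstops
    err_3b, err_ss: team errors by third basemen and shortstops
    """
    assists = a_3b + a_ss
    chances = assists + err_3b + err_ss
    if chances == 0:
        raise ValueError("no assists or errors recorded")
    return assists / chances


# Made-up totals: 300 + 450 assists, 20 + 30 errors.
print(left_side_assist_rate(300, 450, 20, 30))  # 0.9375
```

A higher figure means fewer errors per play by the team's third basemen and shortstops.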
I shan't run this table with this article; its values would change annually. Remember, if you come up with your own table, that you need to adjust everything down a bit because we know where the outs went, but not the hits (I used 1/3 the value for infielders and 2/3 the value for outfielders). Or, to put that in plainer English, were you to evaluate an infield as a unit (add all infield assists together, as well as net putouts by first basemen and catchers, and figure its rating vis-a-vis hits) versus adding up all the individual totals, the infield would be one third the sum of the individual totals. It is a mathematical phenomenon. To compute these values:
Some additional explanations are in order. For errors, we're making an assessment of whether or not the error put a man on base. As such, different positions have different error weights. The other errors at the position are weighted as advancing a man. There are a few muffed foul flies, but not many, and even in those cases, a small penalty is in order. I also reduced the values ever so slightly for third basemen and shortstops, since we are crediting first basemen for part of their rating (that's the "E.throw").

In the case of net assists for first basemen, we're only interested in the advancement penalty. We already counted the hit prevention/out making ability of those assists in the range calculation. Also, to avoid additional double counting, we halved the value of first basemen's double plays, as well as pitchers' (though, for pitchers, this is because they often end double plays).

For outfielders, I only calculated the amount an extra base hit has over a single. I know, there are exclamation points in your eyes; hear me out. For the most part, outfielders cannot prevent extra base hits, but rather keep them from becoming extra base hits. This isn't a pretty solution, as turning that double into a single lowers the rating, but the values do work. Read that last sentence in Bill Jamesese -- I tried it the other way, and Mike Emeigh and I agreed it didn't work. I tried it about three ways, and the play-by-play data did not support the variance among fielders that each of the other ways showed. I am certainly open to another solution.

For outfield assists, Mike Emeigh ran the correlation, and there is an inverse correlation between outfield assists and advances, but it is not large. Therefore, I chose to make it equal to one advance plus one runner pegged, as well as the value of not having another man come to bat.

Finally, to come up with a Defensive Winning Percentage, we need to devise a baseline, a level of basic defensive responsibility.
We first find the league (R - HR) / (PO - SO). Then, we come up with an outs value for each position:
Multiply this by the team BFP, GB or FB, and apply the left/right adjustment. This is the baseline. From here, you can use the Pythagorean Formula to derive a percentage.

What I learned from Bill James, and what I already knew

I have heard (such as one can hear in cyberspace) a fair number of people remark about the similarities between my defensive system and Bill James's Win Shares fielding. Inevitably, someone posits the idea that James read CAD and stole it. I would be surprised if that were true, and were it true, I would be flattered.

The truth is, most of these concepts are obvious, if you think about them. Team assists represent groundballs -- I learned that one from the 1985 Baseball Abstract, when someone recommended to James putting an outfielder's putouts in the context of PO-SO-A. Left-handed pitchers may cause a third baseman to record fewer plays -- someone wrote into The Sporting News in 1991 to defend Carney Lansford's low Range Factor with this fact.

I claim two things as mine:

That Bill James and I would come to similar conclusions about fielding data is not the surprise. The real surprise is that James, Clay Davenport and I are the only people who bothered to adjust for what we already knew. I first published this system to rec.sport.baseball in 1994. The system resembled James's Range Index then, as Davenport's system does now. James published Range Index in the 1984 Abstract, which I had not read at that time. However, my estimate of first basemen's double plays is consciously adapted from those early Abstracts.

Good works feed off each other. Clay Davenport's work caused me to lower my exponent for left-handed pitching on two occasions. I sure as hell hope someone fixes places where I have flaws, like using catchers' assists.