Primate Studies: Where BTF's Members Investigate the Grand Old Game

## Saturday, September 21, 2002

## The Good, the Bad and the Context-Adjusted

Charlie Saeger demonstrates the inner workings of Context-Adjusted Defense v3.0. Zero Clint Eastwood references.

Recently, I received an e-mail from someone wanting an explanation of how Context-Adjusted Defense works for the math-impaired. I figured it would make a good article, so I’m writing it as an article.

The principle behind CAD is that a fielder’s contribution to his team’s fielding will, in some way, show up on the scoreboard. However, his context (mostly pitching staff, but also ballpark) will alter just how we see his contribution in the final stat line. Therefore, we must use known principles of fielding as a machete to slash through the weeds that surround traditional defensive statistics.

Something important: I believe that, at their core, traditional fielding stats are mostly valid. All other things being equal, a shortstop who recorded 527 assists is a better shortstop than one who recorded only 436 assists. Some statistics are not valid, and we should know which ones they are to keep them from fooling us.

Above all, the first clause of the second sentence of the above paragraph, “all other things being equal,” is important in evaluating defensive statistics. It’s valid in evaluating other baseball statistics, but it is tantamount when evaluating fielders. The shortstop who recorded but 436 assists could well be a better shortstop than the one who recorded 527 assists. There are many important cues for which one must watch so one can know when the numbers are lying.

First, some ground rules:

* I prefer innings estimates to a Claim Points system for individual fielders. Ultimately, any method of determining defensive innings will show itself in a Claim Points system, since the fellows who recorded more outs played more innings.
Estimating innings based on total chances works because Range Factor does not vary much at a position (remembering a fielding corollary of Voros’s Law: anyone can do anything in 100 innings afield), and because those outs recorded themselves determine innings. An inning is three outs, after all. So, when determining an individual player’s opportunities, prorate the adjusted team opportunities to the player’s innings. If Derek Jeter played 972 of the Yankees’ 1458 innings at shortstop, and you determine the Yankees shortstops are responsible for 600 adjusted hits allowed, Jeter is responsible for two-thirds of those, or 400 adjusted hits allowed.
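The proration described above is simple arithmetic, and a minimal sketch may make it concrete. The function name and argument names here are illustrative assumptions, not part of CAD itself; the numbers are the Jeter example from the text.

```python
def prorate_opportunities(team_opportunities, player_innings, team_innings):
    """Share of the team's adjusted opportunities charged to one fielder.

    Assumes opportunities are spread evenly over the innings played at the
    position, which is the proration the article describes.
    """
    return team_opportunities * player_innings / team_innings


# The example from the text: 972 of 1458 innings, 600 adjusted hits allowed.
jeter_hits = prorate_opportunities(600, 972, 1458)
print(jeter_hits)  # 400.0
```

The same function works for any opportunity total (hits, groundballs, runners on base), as long as the innings estimate is for the same position.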
I removed putouts from infielders for many reasons. For pitchers, they reflect two things, both of which are unrelated to skill: covering first base on 3-1 groundouts, and pop flies. For third basemen, many putouts are related to skill, but again there are pop flies, and a team’s foul territory heavily influences the number of foul flies he catches. (First basemen are also affected by these phenomena, but their net putouts primarily reflect unassisted groundouts, the most important measure of their fielding, so we must keep them.) I handle middle infielders’ putouts separately later on, so as to weed out the forceouts (which are about 30-40% of a middle infielder’s putouts).

For catchers and first basemen, scale down these net putouts as a percentage of putouts, giving the same percentage to each fielder. The good fielders will make up for this when you scale down the hits, for which you use defensive innings.

If you do not have a defensive innings total, you’ll need to estimate it. I was using a weighted formula of team games, but I have now stolen Bill James’s, since it works much better. If you’re dealing with an outfielder who played in multiple fields, scale his stats by the percentage of games in each field, and give a 25% boost to putouts in center field.

The first context in which we place a fielder’s outs is his team’s hits allowed. I mentioned this above, but it bears repeating. Indeed, this is the most important adjustment of all, since a good defensive team became such through its good individual fielders. Thus a fielder on a team like the 2001 Seattle Mariners, which was an outstanding defensive team, will be considered a good fielder unless his traditional defensive statistics (Range Factor, Fielding Percentage, Double Play Rate) were very poor. A fielder on the 2001 Cleveland Indians, a poor defensive team, would be considered a bad fielder unless his traditional defensive statistics were very good.
Those people who decided Roberto Alomar was a better defensive player than Bret Boone (in 2001) should take note. The second context is his team’s groundball/flyball rate. Looking at the play-by-play data, we have found the following to be true:
Incidentally, I wrote four years ago that the ratio of groundouts to flyouts was low, but this accurately predicted ranking. I was wrong. This is the proper ratio. We have become accustomed to looking at data from the Elias Bureau that has a normal ground-air ratio as 1.17. Elias figures this because it counts double plays as two groundouts. If you remove them, you find a team’s ground-air ratio is closer to 0.80, just including the outs. Thus, one figures infielders’ outs in relation to team groundball hits, and outfielders’ outs in relation to team flyball hits. Next, we come to the pitching staff. Really, we come to the batter at the plate. Left-handed batters are more likely to hit the ball to first base, second base or left field, and right-handed batters are more likely to hit the ball to third base, shortstop or right field. In post-Casey Stengel times, a right-handed pitcher is more likely to face a left-handed batter, and a left-handed pitcher is more likely to face a right-handed batter. I was wrong on this adjustment before for four reasons:
This is for third basemen. Divide by 2 and add 0.50 for shortstops, and divide by 5 and add 0.80 for right fielders. For first basemen, divide the third base rate into one (or, more mathematically, take the inverse); for second basemen, divide the shortstop rate into one; and for left fielders, divide the right field rate into one. The second is for seasons for which we do have LHB/RHB data:
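The position-to-position conversions in the paragraph above can be sketched as follows. This is only an illustration under stated assumptions: the third base rate itself comes from a formula not reproduced here, so it is taken as an input, and the function and key names are hypothetical.

```python
def handedness_adjustments(third_base_rate):
    """Derive the other positions' left/right adjustments from the 3B rate.

    third_base_rate is assumed to be a multiplier where 1.0 means a neutral
    mix of opposing batters; how it is computed is outside this sketch.
    """
    shortstop = third_base_rate / 2 + 0.50    # half the 3B effect
    right_field = third_base_rate / 5 + 0.80  # one fifth of the 3B effect
    return {
        "3B": third_base_rate,
        "SS": shortstop,
        "RF": right_field,
        "1B": 1 / third_base_rate,  # third base rate divided into one
        "2B": 1 / shortstop,        # shortstop rate divided into one
        "LF": 1 / right_field,      # right field rate divided into one
    }


# A neutral staff (rate 1.0) leaves every position at 1.0.
print(handedness_adjustments(1.0))
```

Dividing and adding back keeps each derived rate centered on 1.0, so a neutral third base rate produces no adjustment anywhere.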
A right-handed batter is

On the flip side (pun intended), that same batter is 30% more likely to hit the ball to the right fielder. That’s not a huge difference, and I debated whether to adjust for that. I chose to do so, since it’s not entirely trivial. Do not make this adjustment when you are making estimates of left/center/right playing time.

Finally, I adjust for ballpark. I haven’t made any changes here, though I do make the assumption that a team’s ground/air rate remains the same from park to park, since I calculate groundball and flyball singles for both home and road. It may well not, but since we’re primarily talking about historical data, we shall never know whether most teams do this or not. Still, a pitcher could throw lower in the strike zone in Coors Field than in Dodger Stadium, and it would be worth knowing if this is true.

As some have noted in discussions, I make no adjustment for team strikeouts. This is partially true, actually; I make no

Herein lies the ultimate difference between my system and Clay Davenport’s system. For a team, the two of us will create virtually identical ratings. However, when a team allows 100 fewer hits than normal and strikes out 100 fewer batters than normal, Davenport’s methods will assume the two events negate each other. Davenport’s system will rate a third baseman with 300 assists versus 300 assists for league average as average in this context. CAD will assume the player is a bit better than average.

Next, I make an assessment of a player’s ability to remove runners who are already on base. The basic calculation for this is:
Notice this has not changed, aside from adding the extra measure of a first baseman’s arm. See below (“What I learned from Bill James, and what I already knew”) for details. Similarly, each position has a different way of determining how many runners are available:
Again, notice that nothing has changed. Actually, I’m experimenting with different weights for errors by position, but it isn’t a big deal. Here are the values for that:
That’s the percentage of time an error at that position puts a man on first base. I have no idea how well that holds up over time. After working with National Association data, I would assume it does not hold up well in the 19th Century. This figure is affected by a team’s pitching staff’s handedness in the same way the range figure is, except I do not make this adjustment for middle infielders. Apply adjustments to opportunities, by the way, for both range and arm calculations. The number of outs a fielder made or the double plays he turned is certain. His opportunity to make those outs and turn those double plays is not. For groundballs and flyballs, I took a different tack. I took the team’s groundball or flyball total, divided it by BFP, and multiplied the resulting figure by the team’s opportunities. This combines the groundball/flyball adjustment with the old balls in play adjustment, and produces better results. It would be interesting to see park data for these figures. Frankly, I hope someone who has more time on his hands than I do will tally up DP and Errors by park, at a minimum, in addition to my long-standing wish to have H, 2B and 3B for every ballpark ever. It’s tedious but possible. However, with the data we have now, I make no park adjustments. By the way, I should let you know that accuracy with this is good but not great for infielders and outfielders, and poor for catchers. Stolen bases allowed by each catcher varies more than opponents caught stealing do, and thus a catcher’s arm rating may be very different than his real throwing capacity. Until opposition stolen bases become available for all teams (which will be a few years), take single-season catcher arm ratings with a grain of salt. We are working on methods of determining dropped third strike assists from a team’s passed ball/wild pitch rate, but it only helps a little, and only with the caught stealing side of the equation, which, as I just said, is the lesser side. 
That being the case, one can and should use stolen bases when they are available, which is from 1978 to the present. Now, we move on to error rates. I chose to treat these separately from range, largely because I can then use a different value for errors when it comes time to determining run-values. Determine error rates as below:
For the most part, these results vary little from traditional fielding percentage, but it helps to remove the “dead plays” from a fielder’s error rate. It does make a large difference for catchers, as Bill James has noted. For many positions, I calculate position-specific details:
For catchers, this is easy. Same as before—PB and WP rate per runner on base. A low rate is good. Including Wild Pitches keeps the value from varying too much. For first basemen, we figure the rate at which a team’s third basemen and shortstops made errors. It’s an easy calculation:
Finally, I figure net putouts for middle infielders. A middle infielder obtains a large portion of his putouts as cheap plays, plays a high school shortstop could make, namely popups and fielders’ choices. Furthermore, these plays are elective plays: either the shortstop or the second baseman could make them. They occur a bit more often on bad teams, since a requirement for a fielder’s choice is a runner on first, which occurs more often with a bad team. Bad teams also allow a few more balls in play. The calculation is:

A note on this. This particular formula should brand me as a hypocrite, as I use Balls in Play as the primary source of opportunities instead of hits allowed. There are reasons why I did it this way:

I should note I am comparing totals to league averages to come up with an outs over/under number. Thus, the initial context in which I am working is a Pete Palmer-style context. I can switch to a Bill James-style context, and I worked my butt off to do so. The nature of fielding statistics makes it easier to come up with a Pete Palmer-type over/under number. I can create a number like Fielding Runs, like Defensive Winning Percentage, and am even working on Win Shares. With the right plus/minus value and the right base, you can turn a decent fielding metric into anything.

Finally, we come to create a table of values for statistics. I stole some values from XRuns, and for some others I used a Tangotiger suggestion and adjusted the values to a 27-out context. Basically, turning a hit into an out not only puts that batter out, it prevents another batter from coming to the plate. I shan’t run this table with this article; its values would change annually. Remember, if you come up with your own table, you need to adjust everything down a bit because we know where the outs went, but not the hits (I used 1/3 the value for infielders and 2/3 the value for outfielders).
Or, to put that in plainer English, were you to evaluate an infield as a unit (add all infield assists together, as well as net putouts by first basemen and catchers, and figure its rating vis-a-vis hits) versus adding up all the individual totals, the infield would be one third the sum of the individual totals. It is a mathematical phenomenon. To compute these values:
Some additional explanations are in order. For errors we’re making an assessment of whether or not the error put a man on base. As such, different positions have different error weights. The other errors at the position are weighted as advancing a man. There are a few muffed foul flies, but not many, and even in those cases, a small penalty is in order. I also reduced the values ever so slightly for third basemen and shortstops, since we are crediting first basemen for part of their rating (that’s the “E.throw”). In the case of net assists for first basemen, we’re only interested in the advancement penalty. We already counted the hit prevention/out making ability of those assists in the range calculation. Also, to avoid additional double counting, we halved the value of first basemen’s double plays, as well as pitchers (though, for pitchers, this is because they often end double plays). For outfielders, I only calculated the amount an extra base hit has over a single. I know, there are exclamation points in your eyes, hear me out. For the most part, outfielders cannot prevent extra base hits, but rather keep them from becoming extra base hits. This isn’t a pretty solution, as turning that double into a single lowers the rating, but the values do work. Read that last sentence in Bill Jamesese—I tried it the other way, and Mike Emeigh and I agreed it didn’t work. I tried it about three ways, and the play-by-play data did not support the variance between each fielder each other way showed. I am certainly open for another solution. For outfield assists, Mike Emeigh ran the correlation, and there is an inverse correlation between outfield assists and advances, but it is not large. Therefore, I chose to make it equal to one advance plus one runner pegged, as well as the value of not having another man come to bat. Finally, to come up with a Defensive Winning Percentage, we need to devise a baseline, a level of basic defensive responsibility. We first find the league
((1B * 0.5) + (2B * 0.72) + (3B * 1.04) - (AB - H - SO) * 0.09) / (AB - HR - SO) + 0.098

This is a measure of the hit-prevention value of a strikeout, and will be around 0.20 for a league. Next, we make this a rate:
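Before moving on to the rate, here is a minimal sketch that evaluates the league expression quoted above, reading it with ordinary operator precedence (the 0.098 is added after the division). The function name is an assumption, and the league totals are hypothetical round numbers chosen only to be plausible, not real league data.

```python
def strikeout_hit_prevention_value(singles, doubles, triples, ab, hits, hr, so):
    """League value of a strikeout in hit-prevention terms, per the article's
    expression. All inputs are league totals."""
    hit_value = singles * 0.5 + doubles * 0.72 + triples * 1.04
    out_value = (ab - hits - so) * 0.09   # non-strikeout batting outs
    contact_at_bats = ab - hr - so        # at bats with the ball in play
    return (hit_value - out_value) / contact_at_bats + 0.098


# Hypothetical league totals (not real data).
value = strikeout_hit_prevention_value(
    singles=13706, doubles=4200, triples=440,
    ab=78134, hits=20852, hr=2506, so=14400,
)
print(round(value, 3))
```

With totals of this shape the result lands near the 0.20 the article mentions, which is a useful sanity check on the reading of the formula.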
Multiply this by the team BFP, GB or FB, and apply the left/right adjustment. This is the baseline. From here, you can use the Pythagorean Formula to derive a percentage.

## What I learned from Bill James, and what I already knew
I have heard (such as one can hear in cyberspace) a fair number of
The truth is, most of these concepts are obvious, if you think about them. Team assists represent groundballs—I learned that one from the 1985 Baseball Abstract, when someone recommended to James putting an outfielder’s putouts in the context of PO-SO-A. Left-handed pitchers may cause a third baseman to record fewer plays—someone wrote into The Sporting News in 1991 to defend Carney Lansford’s low Range Factor with this fact. I claim two things as mine: That Bill James and I would come to similar conclusions about fielding data is not the surprise. The real surprise is that James, Clay Davenport and I are the only people who bothered to adjust for what we already knew. I first published this system to rec.sport.baseball in 1994. The system resembled James’s Range Index then, as Davenport’s system does now. James published Range Index in the 1984 Abstract, which I had not read at that time. However, my estimate of first basemen’s double plays is consciously adapted from those early Abstracts. Good works feed off each other. Clay Davenport’s work caused me to lower my exponent for left-handed pitching on two occasions. I sure as hell hope someone fixes places where I have flaws, like using catchers’ assists.
Charles Saeger
Posted: September 21, 2002 at 06:00 AM | 21 comment(s)

## Reader Comments and Retorts


*Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.*

1. Charles Saeger Posted: September 22, 2002 at 12:49 AM (#606373)

There is a second article about a topic about which I am writing in the next version: putouts by first basemen, second basemen and shortstops. I realized late in writing this that my ways of handling these were poor and that I could do better, as both Bill James and Clay Davenport were (well, Bill James on first basemen's putouts; he handles putouts by middle infielders no better than I do).

As I commented on Fanhome, it is a great overall system (I think) - one that is able to use traditional team and player info and rigorously compute each player's fielding skill (with the help of some extraneous, yet critical, material from a "one-time" PBP database), rather than having to rely on such garbage metrics as "range factor".

Again, to be honest, I can't comment on the specifics of your methodology because it is difficult to wade through your article. I have a "feeling", however, that the methodology is sound.

I do wish that you (and some others to whom I have made similar comments) would write more in "English" than in a style more befitting the "American Journal of Applied Mathematics". As well, more "description" (again, in "English") and less formula-type prose would be helpful for feedback and understanding. For example, if I described (in nauseating detail) the exact formula I use for UZR, I think that I would lose lots of folks. Instead I describe, in easy to understand English, the basic idea. I might even get into some of the boring details, and if I do, I still try to present them in "English". If someone wants to know some of the exact "formulas" that go into my methodologies (and trust me, no one ever does - no one gets paid to "peer review" our articles), they can request them or I can supply appendices or something like that.

If I am in the minority in this regard, ignore these comments...

Here is an edited copy of my recent post on Fanhome, describing what I think is your "system" (I hope that either Charles or Mike is dgb100 or that dgb100 is a colleague of theirs, otherwise it looks like someone is stealing ideas from someone else) in 500 words or less:

Let me summarize what dgb is doing, which as Tango states, is using all the traditional information available to come up with basically a ZR/RF (that's "slash", not "divided by") for each fielder.

He is first taking each team's BIP's. Then he is (or should be) separating that into GB's and FB's, using a team's GB/FB ratio (if available, of course; if not, we don't do that step - we simply assume a league-average GB/FB ratio, or we can estimate GB/FB ratio from a team's total IF assists and BIP's).

Now here's the nice part:

He determines how many balls per 100 GB's that each infielder (or how many balls per 100 FB's for outfielders) "should" catch (turn into outs - GB assists for IF'ers and FB putouts for OF'ers). He does this by using PBP data from a bunch of historical games to determine, on the average, what percentage of all ground balls (GB BIP's) are "caught" (turned into outs) by the SS, 3B'man, etc. For example, in the database, if the SS catches 10 ground balls per every 100 GB's hit, then it is assumed that for every 100 GB's that team A allows (remember, we calculate how many GB's team A allows by taking their BIP's and applying their pitchers' GB/FB ratio), their SS should field 10 of them. If he only fields 8 (he has 8 GB "assists" per 100 GB's), then he is a below average fielder with an AFR of .8.

He can further refine his system to take into consideration the handedness of the opposing batters and/or the handedness of a team's pitchers, depending upon what information is available. Obviously if a team has lots of LHP's they will face more RHB's than the average team; consequently they will allow more ground balls to the left side of the infield.

So, he can take the PBP database and figure out "how many ground balls does a SS catch per 100 GB's allowed when a RHB is at bat," and do the same for LHB's, and for FB's versus LHB's and RHB's as well.

If we don't have that kind of information available - handedness of opposing batters (which we probably don't unless we have some kind of a PBP database), then we can do the same thing using the handedness of the pitchers. For example again, we would use the historical PBP database to see how many GB's were caught by a SS on average when a LHP was on the mound and how many were caught with a RHP on the mound. Now we can look at team A and even if we don't know how many balls were put in play when their RHP's were on the mound and how many were put into play when their LHP's were on the mound, we can figure out the percentage of time a RHP was on the mound, the percentage of time a LHP was on the mound, and divide up the team's total BIP's accordingly (I guess if we want, we can compute from the individual player stats how many BIP's and GB's and FB's were allowed by each pitcher, and therefore by all their LHP's combined and all RHP's combined.)

The next logical step is to put all of this nice methodology into an "easy to read and understand" formula, DGB!

Like EZR for an IF(estimated zone rating, or whatever you want to call it)=(player "A" GB assists)/((team BIP)*(team GB/FB))/(average player for position "A" GB assists per 100 GB's).

The above reads "A's GB assists divided by his team's total GB's" divided by "the average player at his position's GB assists per 100 GB's". This last term is a constant for each position in the field, based upon the historical database.
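A minimal sketch of that ratio, under two assumptions that the post does not spell out: "team GB/FB" is read as the fraction of balls in play that are groundballs, and the player's rate is put on a per-100 basis so it matches the positional constant. The function name and the sample numbers are hypothetical.

```python
def estimated_zone_rating(gb_assists, team_bip, gb_share, avg_assists_per_100_gb):
    """Player's GB assists per 100 team groundballs, relative to the
    positional average from the historical play-by-play database."""
    team_groundballs = team_bip * gb_share          # estimated team GB's
    player_per_100 = 100.0 * gb_assists / team_groundballs
    return player_per_100 / avg_assists_per_100_gb


# Hypothetical team: 4000 balls in play, half of them groundballs.
# A shortstop with 160 GB assists fields 8 per 100, against an average of 10.
print(estimated_zone_rating(160, 4000, 0.5, 10.0))
```

This reproduces the example given earlier in the post, where a shortstop fielding 8 per 100 against an expected 10 rates at .8.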

You can refine the formula to account for a team's LHP and RHP's as discussed above, by putting in the appropriate conversion algorithm, and you can also include the formula for determining a middle IF'er's GB assists only (factoring out his CS assists I guess).

Charles, is that basically what you are doing? You are calculating a "normalized ZR" for each fielder by estimating how many balls should have been caught by an average fielder at that position given the estimated number of ground balls hit to the infield and the actual or estimated percentage of RHB and LHB at the plate!

Of course, in order to do this, you also have to estimate how many of a defensive player's PO and A are actually "balls caught" and in fact relate to defensive skill. For outfielders, it should be PO only (I don't know whether you incorporate OF assists in your "formula" for OF defense; if you do, you shouldn't - OF assists bear little relationship to OF defensive skill vis-a-vis the arm - you would need holds/extra bases and opportunities; using OF A only would be like using catcher or baserunner CS only - it tells you very little about overall value), and for IF, it should be mostly assists on GB only - as you properly explain, assists on steals (do they give an IF an assist on a CS?) should be ignored (subtracted), PO by all IF'ers other than the 1B should be ignored (other than by middle IF'ers if you want to incorporate DP's), and only PO's by the 1B'man should count when he fields the ground ball and makes the play himself (doesn't he also get an assist when this happens? If he does, then we can ignore PO's by 1B'men as well). Whatever you do, as you also state, pop-fly PO's by IF'ers should and must be ignored, as there is almost no relationship between a defender's number of pop-ups caught and his defensive skill - for obvious reasons.

It just doesn't seem as complicated as a glance at your article suggests. Am I missing something? (Please re-read my last 2 paragraphs.)

A putout results when (these are either or):

1) a fielder catches a fly ball or line drive - nope, that's not it.

2) catches a thrown ball which puts out a batter - that's not it either.

3) tags a runner - that's not it!

Well, I can see no definition that gives a fielder a putout (or an assist) when he makes a GB out unassisted...

BTW, a fielder who tags a runner on a CS or pickoff, gets credited with a PO, I guess, according to definition 3 of a PO, and certainly not an assist. Does the catcher get an assist based on "throwing a [batted or] thrown ball that results in a putout," the "thrown ball" being the pitch?

MGL -- lotsa stuff, obviously. You're right, I should have an article structure that shows me going through the math step by step. I have been working on that for the next version. Problem is, it is slow going. In spite of my malaprop above (and it is probably not the only one), my training is as a writer, not as a statistician. Writing page after page about numbers just is not very interesting, but it is necessary in this case.

IF putouts -- again, I have been spending tons and tons of time on these. I do have better formulas to estimate unassisted putouts not only by first basemen, but by second basemen and shortstops as well. I discovered a few things about these. I, too, am a little skeptical of the value of middle infielders' putouts, though the unassisted numbers do have some year-to-year consistency. I think I have spent more time on this topic over the last year than any other defensive topic, and have written about eight formulae about this.

OF assists -- as in, I have no other way of estimating the impact of an outfielder's arm. There is some correlation between a high assist rate and a low advance rate, but only some; as I wrote above, it looked like the assist was keeping the runner it pegged from advancing, which is why I gave it the value I gave it. As a note, an outfielder with a large number of Baserunner Kills almost certainly did have a positive impact with his arm despite the number of advances against him. Even a fluke year like Gary Ward 1982 or Joe Orsulak 1992 probably has defensive value in spite of the extra advances.

LHB/RHB -- well, yeah, I am trying to measure opportunity, or more to the point, failed opportunity (since we already know successful opportunity). We went to the PBP data for this one. You figure the adjustment, and multiply it by failed opportunities. It is not as bad as it looks, but I would be open to a simpler way of calculating this.

Errors -- I am doing this; the error values show how likely the error put a man on base. For example, I figure an outfielder's error as 25% of the value of putting another man on base (0.50 + 0.09 + LgR/LgPA) plus 75% of the value of allowing a man to advance (0.18). As each position has a different "put the batter on first" rate, each position has a different value.
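The error weighting just described can be sketched as a blend of two values. The 25%, 0.50, 0.09 and 0.18 figures come from the comment above; the function name and the league run and plate appearance totals are hypothetical.

```python
def error_run_value(p_batter_reaches, lg_runs, lg_pa, advance_value=0.18):
    """Run value of an error at a position.

    p_batter_reaches is the share of errors at the position that put a man
    on base (0.25 for outfielders in the example); the remainder are
    treated as letting a runner advance.
    """
    on_base_value = 0.50 + 0.09 + lg_runs / lg_pa
    return p_batter_reaches * on_base_value + (1 - p_batter_reaches) * advance_value


# Hypothetical league scoring 0.12 runs per plate appearance.
print(error_run_value(0.25, lg_runs=12000, lg_pa=100000))
```

Changing only `p_batter_reaches` gives each position its own error value, which is the point made in the comment.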

DP opps -- that is what I am doing.

Run values -- what I am doing is figuring the value of each event and multiplying each plus/minus number by that value. If each infielder's assist has a weight of 0.234, I multiply the positive/negative number by 0.234 to determine how many runs that fielder saved/blew versus league average. I add them together to find the total plus/minus runs.
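As a rough sketch of that last step: multiply each over/under number by its event weight and add the products. Only the 0.234 assist weight comes from the comment; the double play weight and the plus/minus numbers are hypothetical placeholders.

```python
def runs_vs_average(plus_minus, weights):
    """Sum of (events above or below league average) times the run weight
    of each event type."""
    return sum(plus_minus[event] * weights[event] for event in plus_minus)


# Hypothetical infielder: 20 assists above average, 5 double plays below.
plus_minus = {"assists": 20.0, "double_plays": -5.0}
weights = {"assists": 0.234, "double_plays": 0.40}  # 0.40 is a placeholder
print(round(runs_vs_average(plus_minus, weights), 2))  # 2.68
```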

*He determines how many balls per 100 GB's that each infielder (or how many balls per 100 FB's for outfielders) "should" catch (turn into outs - GB assists for IF'ers and FB putouts for OF'ers).*

And this is where he makes the mistake - because whether or not a fielder should make a play is dependent upon the specific context in which the ball is hit - both the game context (runners on base, number of outs, game score, batter at the plate, pitcher on the mound) and the fielder context (fielder position relative to his teammates). Lumping all of these results together, and making value assignments based on the aggregate results from all fielders, makes the outcome highly susceptible to aggregation bias, where the group characteristics not only don't apply across the board to the individuals in the group but are highly likely to be significantly different for individuals in the group.

It is far less likely to introduce bias into the results to consider whether the fielder *could* make a play on the ball, and to penalize him to the full extent whenever a play is not made in an area where he could have made a play, even if when all teams are lumped together another fielder was more likely to have made the play. IOW, if a single goes through the SS hole, both the 3B and the SS should be penalized the full value of one single, because either could have made the play depending on the circumstances, and because you don't know the circumstances you can't make a valid a priori judgment as to which fielder *should* have made the play.

-- MWE

*and because you don't know the circumstances you can't make a valid a priori judgment as to which fielder should have made the play.*

The point is to make a best guess. If the ss makes 90% of a particular play in zone X and the 3b makes 10%, and if a hit gets through, then to minimize your OVERALL error, you assign 90% of the blame to the ss.

Now, if you tell me that with man on 2b, 0 outs, and RH at bat the SS only makes 60% of those plays, then fine, let's adjust based on this new data.

But to categorically make it 100% for both players is, in my view, not a valid representation.
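To make the best-guess split concrete, here is a minimal sketch in Python. The function name is hypothetical. The 90/10 and 60% figures are the illustrative ones from this comment, and handing the 3B the remaining 40% in the second case is purely an assumption for the example.

```python
def apportion_blame(play_shares, hit_value=1.0):
    """Split the cost of one hit among fielders in proportion to the
    share of plays each one makes in that zone (hypothetical helper)."""
    total = sum(play_shares.values())
    if total <= 0:
        raise ValueError("play shares must sum to a positive number")
    return {pos: hit_value * share / total for pos, share in play_shares.items()}

# Illustrative shares only, from the 90%/10% example in this comment.
baseline = apportion_blame({"SS": 0.90, "3B": 0.10})

# Context-specific shares: SS makes 60% with a man on 2B, 0 outs, RH batter.
# Giving the 3B the other 40% is an assumption made for the sketch.
adjusted = apportion_blame({"SS": 0.60, "3B": 0.40})
```

Either way the blame for one hit sums to one hit, whereas charging both fielders in full charges two hits for every one that gets through.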

My position is that you identify every possible variable, situation, and context that you can think of, and base your best estimate on that.

ZR

- by zone

- by base-out state

- by score differential & inning

- by LH/RH batter

- by LH/RH pitcher

- by park

- by actual batter

- by actual pitcher

- by batter showing bunt / no-bunt

- by batter executing bunt / no-bunt

- by speed of runners on base

Anything else I missed?
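As a rough sketch of what slicing by context could look like, the Python below tallies an out rate per bucket, where a bucket is whatever combination of the variables above you choose. The field names and the handful of plays are made up for illustration and are not real play-by-play data.

```python
from collections import defaultdict

def out_rates(plays, keys):
    """Out rate per context bucket; `keys` names the fields that define a bucket."""
    tally = defaultdict(lambda: [0, 0])  # bucket -> [outs, chances]
    for play in plays:
        bucket = tuple(play[k] for k in keys)
        tally[bucket][0] += play["out"]
        tally[bucket][1] += 1
    return {bucket: outs / chances for bucket, (outs, chances) in tally.items()}

# Made-up plays: zone label, base-out state, batter hand, 1 if turned into an out.
plays = [
    {"zone": "56", "base_out": "---/0", "bats": "R", "out": 1},
    {"zone": "56", "base_out": "---/0", "bats": "R", "out": 1},
    {"zone": "56", "base_out": "-2-/0", "bats": "R", "out": 0},
    {"zone": "56", "base_out": "-2-/0", "bats": "R", "out": 1},
    {"zone": "6", "base_out": "---/0", "bats": "L", "out": 1},
]

by_zone = out_rates(plays, ["zone"])
by_zone_and_state = out_rates(plays, ["zone", "base_out"])
```

Each variable you add to `keys` splits the buckets further, so the samples get thin quickly. That is the practical limit on how many of these contexts you can use at once.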

There are supposed to be some tables for the 2001 data, but they aren't up yet.

Range outs / (Range outs + Hits Allowed - Home Runs Allowed)

Arm outs / Runners on base

And then I adjust.
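For anyone who wants to try the two ratios in a spreadsheet or script, here is a minimal Python sketch of them before any adjustment. The function names are hypothetical, the inputs (range outs, arm outs, runners on base) are taken as already tallied, and the numbers in the usage lines are made up.

```python
def range_rate(range_outs, hits_allowed, home_runs_allowed):
    """Range outs / (Range outs + Hits Allowed - Home Runs Allowed)."""
    chances = range_outs + hits_allowed - home_runs_allowed
    if chances <= 0:
        raise ValueError("no fieldable chances")
    return range_outs / chances

def arm_rate(arm_outs, runners_on_base):
    """Arm outs / Runners on base."""
    if runners_on_base <= 0:
        raise ValueError("no runners on base")
    return arm_outs / runners_on_base

# Made-up team totals, for illustration only.
team_range = range_rate(range_outs=3000, hits_allowed=1500, home_runs_allowed=150)
team_arm = arm_rate(arm_outs=100, runners_on_base=2000)
```

Home runs come out of the denominator because no fielder has a chance at them. The adjustments are a separate step and are not sketched here.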

I discovered the reduced formulae this spring. I don't use them because it is harder to make adjustments with them, but they do work on a basic level.

I was not being facetious. Those two formulae really do work.

Catcher assists do track opponents' caught stealing. I am adding an adjustment based on passed balls allowed, which do loosely (r=0.50) track K23 assists, which improves the accuracy there. The problem is, both catcher assists and opponents' caught stealing also correlate well with opponents' stolen bases allowed, so a good assist total could well mean the opposite of what we assume it means, which is a good-throwing catcher.

M.D.'s enthusiasm for spreadsheeting reminds me that Messrs. Saeger and/or Emeigh have previously been seen talking with Sean Forman about eventually adding CADish results to Baseball Reference. I.e., for every player season ever. Which of course would rule. I remember subsequent intimations that this might be impossibly hard.

So where does that project idea stand?

Enlist M.D. and others as aides!

If you want constructive criticism, you have to present the details of your method to your audience so that they understand your thought process and so that they can replicate enough of the work to feel comfortable about the path that you are taking (and to suggest improvements as warranted). If you hold back the details, on the other hand, you take the risk of undermining your own credibility, especially if your audience sees what appears to be an obvious flaw in your approach but can't confirm whether or not you've addressed it because you haven't provided the details. There's nothing to lose, and a great deal to be gained, from submitting an analysis method in all of its gory detail for independent analysis and assessment by your readers, many of whom probably know as much about the subject as you do, much as I hate to say it :)

The "holy grail" nature of defensive analysis comes about in large part because we have almost no information about fielder performance in relation to opportunity to perform. We have "opportunity contexts" for batters and pitchers, with a fairly complete record of their successes and failures. For fielders, we have a record of their successes (polluted by successes of other fielders that show up in their record) and only a partial record of their failures (even in zone-based systems), so we don't have a complete record of their "opportunity context". What Charles has attempted to do here, to the best of his ability, is to strip out the areas of pollution in the existing records and to derive an "opportunity context" for fielders based on information that we know, without trying to divvy up responsibilities based on what we think is true but which we can't support with empirical evidence. It's not a simple task, because the process of converting a ball in play into an out is *heavily* driven by contextual factors, to a far greater degree than either batting or pitching, and it's a strain just to make sure that those factors have been identified, let alone to ensure that they have been properly accounted for.

-- MWE
