Baseball for the Thinking Fan

Primate Studies
— Where BTF's Members Investigate the Grand Old Game

Saturday, September 21, 2002

The Good, the Bad and the Context-Adjusted

Charlie Saeger demonstrates the inner workings of Context-Adjusted Defense v3.0.  Zero Clint Eastwood references.

Recently, I received an e-mail from someone wanting an explanation of how Context-Adjusted Defense works for the math-impaired. I figured it would make a good article, so I’m writing it as an article.

The principle behind CAD is that a fielder’s contribution to his team’s fielding will, in some way, show up on the scoreboard. However, his context (mostly pitching staff, but also ballpark) will alter just how we see his contribution in the final stat line. Therefore, we must use known principles of fielding as a machete to slash through the weeds that surround traditional defensive statistics.

Something important: I believe that, at their core, traditional fielding stats are mostly valid. All other things being equal, a shortstop who recorded 527 assists is a better shortstop than one who recorded only 436 assists. Some statistics are not valid, and we should know which ones they are to keep them from fooling us.

Above all, the first clause of the second sentence of the above paragraph, “all other things being equal,” is important in evaluating defensive statistics. It’s valid in evaluating other baseball statistics, but it is paramount when evaluating fielders. The shortstop who recorded but 436 assists could well be a better shortstop than the one who recorded 527 assists. There are many important cues for which one must watch so one can know when the numbers are lying.

First, some ground rules:

* I prefer innings estimates to a Claim Points system for individual fielders. Ultimately, any method of determining defensive innings will show itself in a Claim Points system, since the fellows who recorded more outs played more innings. Estimating innings based on total chances works because Range Factor does not vary much at a position (remembering a fielding corollary of Voros’s Law: anyone can do anything in 100 innings afield), and because those outs recorded themselves determine innings. An inning is three outs, after all.

So, when determining an individual player’s opportunities, prorate the adjusted team opportunities to the player’s innings. If Derek Jeter played 972 of the Yankees’ 1458 innings at shortstop, and you determine the Yankees shortstops are responsible for 600 adjusted hits allowed, Jeter is responsible for two-thirds of those, or 400 adjusted hits allowed.



* I am comparing each rate to the league average, noting the outs/errors/whatever better or worse than the league rate, and setting it aside. Before deriving a value in runs, you’ll have anywhere from three to five numbers, either positive or negative.
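The proration in the first ground rule is simple enough to sketch in a few lines. The function name and argument names below are illustrative only; the numbers are the Jeter example from the text.

```python
def prorate_opportunities(team_opportunities, player_innings, team_innings):
    """Share of the team's adjusted opportunities charged to one fielder,
    in proportion to the innings he played at the position."""
    if team_innings <= 0:
        raise ValueError("team_innings must be positive")
    return team_opportunities * player_innings / team_innings


# The example from the text: 972 of 1458 innings, 600 adjusted hits allowed.
jeter_hits = prorate_opportunities(600, 972, 1458)
print(round(jeter_hits, 1))  # 400.0
```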



The core calculation for CAD is a measure of range. I place the fielder’s outs in the context of his team’s hits allowed total. The reason for this is because defensive stats do a pretty good job at telling us how often a fielder succeeded, but they are almost completely silent when we ask them how often a fielder failed. We do know how often the entire team failed, however, and that’s the team’s hits allowed total.

Thus, the initial calculation is:



Player outs / (Player outs + team hits allowed)



“Team hits allowed” means different things for infielders and outfielders. Look at the groundball/flyball adjustment for more detailed info.               
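As a minimal sketch, the initial calculation looks like this in code. The function name is hypothetical, and the caller is assumed to pass the hits total appropriate to the position (groundball hits for infielders, flyball hits for outfielders), already prorated to the player's innings.

```python
def range_rate(player_outs, team_hits_allowed):
    """Player outs / (player outs + team hits allowed)."""
    total = player_outs + team_hits_allowed
    if total <= 0:
        raise ValueError("need at least one out or hit")
    return player_outs / total
```

The rate is then compared with the league rate at the same position, as described in the ground rules.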

Since I have updated the system and haven’t given out the details to anyone but co-author Mike Emeigh, I’d like to redefine “Player Outs” by position:


if    A
c     PO - SO
1b    PO - (A.2b + A.3b + A.ss) + A
of    PO
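The table above translates directly into a small lookup. This is only a sketch: the function and parameter names are made up for illustration, and "if" is taken to mean every infielder other than the first baseman, pitchers included.

```python
def player_outs(position, po=0, a=0, so=0, other_infield_assists=0):
    """Outs credited to a fielder for the range calculation.

    position: 'c', '1b', 'of', or any other infield code ('p', '2b', '3b', 'ss').
    other_infield_assists: A.2b + A.3b + A.ss, used only for first basemen.
    """
    if position == "c":
        return po - so                          # putouts that were not strikeouts
    if position == "1b":
        return po - other_infield_assists + a   # unassisted putouts plus own assists
    if position == "of":
        return po
    return a                                    # remaining infielders: assists only
```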

 

I removed putouts from infielders for many reasons. For pitchers, they reflect two things, both of which are unrelated to skill: covering first base on 3-1 groundouts, and pop flies. For third basemen, many putouts are related to skill, but again there are pop flies, and a team’s foul territory heavily influences the number of foul flies he catches. (These phenomena also affect first basemen, but their putouts primarily reflect unassisted groundouts, which are the most important measure of their fielding, so we must keep them.) I handle middle infielders’ putouts separately later on so as to weed out the forceouts (which are about 30-40% of a middle infielder’s putouts).

For catchers and first basemen, scale down these net putouts as a percentage of putouts, giving the same percentage to each fielder. The good fielders will make up for this when you scale down the hits, for which you use defensive innings. If you do not have a defensive innings total, you’ll need to estimate it. I was using a weighted formula of team games, but I have now stolen Bill James’s, since it works much better. If you’re dealing with an outfielder who played in multiple fields, scale his stats by the percentage of games in each field, and give a 25% boost to putouts in center field.

The first context in which we place a fielder’s outs is his team’s hits allowed. I mentioned this above, but it bears repeating. Indeed, this is the most important adjustment of all, since a good defensive team became such through its good individual fielders. Thus a fielder on a team like the 2001 Seattle Mariners, which was an outstanding defensive team, will be considered a good fielder unless his traditional defensive statistics (Range Factor, Fielding Percentage, Double Play Rate) were very poor. A fielder on the 2001 Cleveland Indians, a poor defensive team, would be considered a bad fielder unless his traditional defensive statistics were very good. Those people who decided Roberto Alomar was a better defensive player than Bret Boone (in 2001) should take note.

The second context is his team’s groundball/flyball rate. Looking at the play-by-play data, we have found the following to be true:



* Team assist rates accurately predict the number of groundouts a team’s pitchers generated.

* Virtually all doubles and triples are the responsibility of the outfielders.

* Most singles are also the responsibility of the outfielders, and fall in a distribution similar to that of the outs.



I changed the adjustment based on this info. To figure it, first determine a team’s number of groundouts and flyouts:

Groundouts    A - A.c - A.of - DP.1b
Flyouts       PO - SO - A

 

Incidentally, I wrote four years ago that the ratio of groundouts to flyouts was low, but this accurately predicted ranking. I was wrong. This is the proper ratio. We have become accustomed to looking at data from the Elias Bureau that has a normal ground-air ratio as 1.17. Elias figures this because it counts double plays as two groundouts. If you remove them, you find a team’s ground-air ratio is closer to 0.80, just including the outs.

Then, subtract doubles and triples allowed, and multiply this number by GO/(GO+FO). This is the number of singles for which the infielders are responsible. The outfielders are responsible for all remaining hits, which includes the doubles and triples, so remember to add them back into the total.

Thus, one figures infielders’ outs in relation to team groundball hits, and outfielders’ outs in relation to team flyball hits.
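The groundball/flyball split can be sketched as follows. Two things here are assumptions rather than anything the text spells out: the hit total passed in is taken to be hits less home runs, and the function and argument names are invented for the example.

```python
def split_hits(a, a_c, a_of, dp_1b, po, so, hits, doubles, triples):
    """Split a team's hits allowed into infield and outfield responsibility.

    hits is assumed to be hits less home runs (an assumption of this sketch).
    Returns (groundouts, flyouts, infield_hits, outfield_hits).
    """
    groundouts = a - a_c - a_of - dp_1b
    flyouts = po - so - a
    ground_share = groundouts / (groundouts + flyouts)

    singles = hits - doubles - triples
    infield_hits = singles * ground_share
    # Remaining singles plus all doubles and triples go to the outfield.
    outfield_hits = hits - infield_hits
    return groundouts, flyouts, infield_hits, outfield_hits
```

Infielders' outs are then rated against `infield_hits`, and outfielders' outs against `outfield_hits`.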

Next, we come to the pitching staff.  Really, we come to the batter at the plate. Left-handed batters are more likely to hit the ball to first base, second base or left field, and right-handed batters are more likely to hit the ball to third base, shortstop or right field. In post-Casey Stengel times, a right-handed pitcher is more likely to face a left-handed batter, and a left-handed pitcher is more likely to face a right-handed batter.

I was wrong on this adjustment before for four reasons:



* I made the rate too extreme. A team’s LHB/RHB rate is not as lock-in-step with the pitching staff as I assumed. Thus, I moved the exponent for the rate from 0.5 to 0.25.

* Batters do not pull the ball more often with the platoon advantage.

* Batters do not pull the ball to the outfield. In fact, they usually hit the ball to opposite field.

* I’m looking, but I’m not sure where the back boundary for platooning is. It may have occurred in some seasons, and not in others. For example, in 1941, it’s pretty clear that few, if any, managers were platooning. However, in 1942, both leagues were platooning at a modern rate. It’s possible teams platooned more as the regular ballplayers went to war.



With all this in mind, I present two versions of the adjustment. The first is for seasons in which we do not have LHB/RHB data:



(((BFP.rhp - HR.rhp - HB.rhp - BB.rhp - SO.rhp) / (BFP.tm - HR.tm - HB.tm - BB.tm - SO.tm)) / lgAVG) ^ 0.25

This is for third basemen. Divide by 2 and add 0.50 for shortstops, and divide by 5 and add 0.80 for right fielders. For first basemen, divide the third base rate into one (or, more mathematically, this is the inverse); for second basemen, divide the shortstop rate into one; and for left fielders, divide the right field rate into one.
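Here is a rough sketch of the adjustment for seasons without LHB/RHB data. The names are hypothetical; `rhp_bip` and `team_bip` stand for BFP - HR - HB - BB - SO for the right-handed pitchers and for the whole staff, and `league_rate` is the league's value of the same ratio.

```python
def handedness_adjustments(rhp_bip, team_bip, league_rate, exponent=0.25):
    """Left/right adjustment by position when only pitcher handedness is known."""
    third = ((rhp_bip / team_bip) / league_rate) ** exponent
    short = third / 2 + 0.50
    right = third / 5 + 0.80
    return {
        "3b": third,
        "ss": short,
        "rf": right,
        "1b": 1 / third,   # inverse of the third base rate
        "2b": 1 / short,   # inverse of the shortstop rate
        "lf": 1 / right,   # inverse of the right field rate
    }
```

A team whose share of balls in play off right-handers matches the league gets 1.0 at every position, which is a quick sanity check on the arithmetic.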

The second is for seasons for which we do have LHB/RHB data:



(((AB.rhb - HR.rhb - SO.rhb) * posRate) / (AB.tm - HR.tm - SO.tm)) / lgAVG



What the heck is the posRate? It is the “position rate,” a measure of how much more likely a right-handed batter is to hit a ball to that position. Each affected position has a different rate:

1b    0.25
2b    0.50
3b    4.00
ss    2.00
lf    0.70
rf    1.30

 

A right-handed batter is four times as likely to hit a ball to the third baseman as a left-handed batter, and twice as likely to hit a ball to the shortstop. This sounds like a big difference, and it is, but teams don’t vary much on the number of right-handed batters they face.

On the flip side (pun intended), that same batter is 30% more likely to hit the ball to the right fielder. That’s not a huge difference, and I debated on whether to adjust for that. I chose to do so, since it’s not entirely trivial. Do not make this adjustment when you are making estimates of left/center/right playing time.
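For seasons with LHB/RHB data, the formula can be sketched literally as printed. The names are hypothetical, `rhb_bip` and `team_bip` stand for AB - HR - SO against right-handed batters and against all batters, and `league_avg` is assumed to be the league's value of the same expression for that position.

```python
# Position rates from the table: how much more likely a right-handed
# batter is to hit the ball to that position.
POS_RATE = {"1b": 0.25, "2b": 0.50, "3b": 4.00, "ss": 2.00, "lf": 0.70, "rf": 1.30}


def batter_side_adjustment(position, rhb_bip, team_bip, league_avg):
    """Left/right adjustment when batter-handedness data is available.

    league_avg is assumed (in this sketch) to be the same expression
    computed from league totals.
    """
    return (rhb_bip * POS_RATE[position] / team_bip) / league_avg
```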

Finally, I adjust for ballpark. I haven’t made any changes here, though I do make the assumption that a team’s ground/air rate remains the same from park to park, since I calculate groundball and flyball singles for both home and road. It may well not, but since we’re primarily talking about historical data, we shall never know whether most teams do this or not. Still, a pitcher could throw lower in the strike zone in Coors Field than in Dodger Stadium, and it would be worth knowing if this is true.

As some have noted in discussions, I make no adjustment for team strikeouts. This is partially true, actually; I make no overt adjustment for team strikeouts. A team with a high strikeout rate will have fewer outs available to the fielders, but will also have fewer hits allowed. Strikeouts are Adam Smith’s invisible hand—they correct for themselves without adjustment.

Herein lies the ultimate difference between my system and Clay Davenport’s system. For a team, the two of us will create virtually identical ratings. However, when a team allows 100 fewer hits than normal and strikes out 100 fewer batters than normal, Davenport’s methods will assume the two events negate each other. Davenport’s system will rate a third baseman with 300 assists, versus 300 assists for the league average, as average in this context. CAD will assume the player is a bit better than average.

Next, I make an assessment of a player’s ability to remove runners who are already on base. The basic calculation for this is:



Runners Removed / Opportunities



Each position, naturally, has a different way of determining runners removed:


c     A
1b    DP - DP.2b - DP.p / 2
1b    A - PO.p
if    DP
of    A

 

Notice this has not changed, aside from adding the extra measure of a first baseman’s arm. See below (“What I learned from Bill James, and what I already knew”) for details.

Similarly, each position has a different way of determining how many runners are available:


c     1B + BB + HBP + (0.71 * Err) - DP.1b - A.of
if    1B + BB + HBP + (0.71 * Err) - A.c - A.of
of    H + HR + BB + HBP + (0.71 * Err) - A.c - DP.1b
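A short sketch of the catcher and infielder rows, together with the Runners Removed / Opportunities rate. The function names are invented for illustration, and the outfield row is left out of the sketch.

```python
ERROR_ON_BASE = 0.71  # weight on errors, from the formulas above


def runners_on_first(singles, bb, hbp, errors):
    """Runners who reached first base, the shared part of both rows."""
    return singles + bb + hbp + ERROR_ON_BASE * errors


def catcher_opportunities(singles, bb, hbp, errors, dp_1b, a_of):
    return runners_on_first(singles, bb, hbp, errors) - dp_1b - a_of


def infield_opportunities(singles, bb, hbp, errors, a_c, a_of):
    return runners_on_first(singles, bb, hbp, errors) - a_c - a_of


def removal_rate(runners_removed, opportunities):
    """Runners Removed / Opportunities."""
    return runners_removed / opportunities
```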

 

Again, notice that nothing has changed. Actually, I’m experimenting with different weights for errors by position, but it isn’t a big deal. Here are the values for that:

p       50%
c/of    25%
1b      80%
if      85%

 

That’s the percentage of time an error at that position puts a man on first base. I have no idea how well that holds up over time. After working with National Association data, I would assume it does not hold up well in the 19th Century.

This figure is affected by a team’s pitching staff’s handedness in the same way the range figure is, except I do not make this adjustment for middle infielders. Apply adjustments to opportunities, by the way, for both range and arm calculations. The number of outs a fielder made or the double plays he turned is certain. His opportunity to make those outs and turn those double plays is not.

For groundballs and flyballs, I took a different tack. I took the team’s groundball or flyball total, divided it by BFP, and multiplied the resulting figure by the team’s opportunities. This combines the groundball/flyball adjustment with the old balls in play adjustment, and produces better results.

It would be interesting to see park data for these figures. Frankly, I hope someone who has more time on his hands than I do will tally up DP and Errors by park, at a minimum, in addition to my long-standing wish to have H, 2B and 3B for every ballpark ever. It’s tedious but possible. However, with the data we have now, I make no park adjustments.

By the way, I should let you know that accuracy with this is good but not great for infielders and outfielders, and poor for catchers. Stolen bases allowed by each catcher vary more than opponents caught stealing do, and thus a catcher’s arm rating may be very different from his real throwing capacity. Until opposition stolen bases become available for all teams (which will be a few years), take single-season catcher arm ratings with a grain of salt. We are working on methods of determining dropped third strike assists from a team’s passed ball/wild pitch rate, but it only helps a little, and only with the caught stealing side of the equation, which, as I just said, is the lesser side. That being the case, one can and should use stolen bases when they are available, which is from 1978 to the present.

Now, we move on to error rates. I chose to treat these separately from range, largely because I can then use a different value for errors when it comes time to determining run-values. Determine error rates as below:

c          (PO + A - SO) / (PO + A - SO + Err)
1b         (PO + A - A.2b - A.3b - A.ss) / (PO + A - A.2b - A.3b - A.ss + Err)
2b/ss      (PO + A - DP) / (PO + A - DP + Err)
3b/p/of    (PO + A) / (PO + A + Err)
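The error rates can be sketched as one function, on the assumption that every row has the same shape: net plays (total chances less the "dead plays" for that position) divided by net plays plus errors. The names below are illustrative, not part of the system.

```python
def net_plays(position, po, a, so=0, dp=0, other_infield_assists=0):
    """Putouts plus assists, less the dead plays for the position."""
    if position == "c":
        return po + a - so                      # strikeout putouts removed
    if position == "1b":
        return po + a - other_infield_assists   # A.2b + A.3b + A.ss removed
    if position in ("2b", "ss"):
        return po + a - dp
    return po + a                               # 3b, p, of


def adjusted_fielding_pct(position, errors, po, a, so=0, dp=0,
                          other_infield_assists=0):
    """Net plays / (net plays + errors); an assumption of this sketch."""
    plays = net_plays(position, po, a, so, dp, other_infield_assists)
    if plays + errors <= 0:
        raise ValueError("no chances to evaluate")
    return plays / (plays + errors)
```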

 

For the most part, these results vary little from traditional fielding percentage, but it helps to remove the “dead plays” from a fielder’s error rate. It does make a large difference for catchers, as Bill James has noted.

For many positions, I calculate position-specific details:

c        PB/WP rate
1b       3b/ss Error rate
2b/ss    Combined PO rate

 

For catchers, this is easy. Same as before—PB and WP rate per runner on base. A low rate is good. Including Wild Pitches keeps the value from varying too much.

For first basemen, we figure the rate at which a team’s third basemen and shortstops made errors. It’s an easy calculation:

(A.3b + A.ss) / (A.3b + A.ss + Err.3b + Err.ss)



Since this formula is a success rate, a high number is good. The theory is that a good first baseman will prevent throwing errors by his third basemen and shortstops. This is true, to a small extent, so I weight it appropriately low.

Finally, I figure net putouts for middle infielders. A middle infielder obtains a large portion of his putouts as cheap plays, plays a high school shortstop could make, namely popups and fielders’ choices. Furthermore, these plays are elective plays: either the shortstop or the second baseman could make them. They occur a bit more often on bad teams, since a requirement for a fielder’s choice is a runner on first, which occurs more often with a bad team. Bad teams also allow a few more balls in play. The calculation is:



(PO.2b + PO.ss - A.c / 2 - DP.1b) / (BFP - SO - 2B - 3B - HR - A.c / 2 - DP.1b)



Since flyball pitchers induce popups and groundball pitchers induce fielders’ choices, the two cancel each other out. I divide the result by two—half for the second basemen, half for the shortstops—at the runs value stage. Obviously, substitute OCS for A.c / 2 when it is available.
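A minimal sketch of the combined putout rate, including the substitution of opponents caught stealing for A.c / 2 when it is known. The function and parameter names are made up for the example.

```python
def middle_infield_po_rate(po_2b, po_ss, a_c, dp_1b, bfp, so,
                           doubles, triples, hr, ocs=None):
    """Combined net putout rate for second basemen and shortstops.

    ocs: opponents caught stealing; when None, A.c / 2 stands in for it.
    """
    caught = ocs if ocs is not None else a_c / 2
    net_putouts = po_2b + po_ss - caught - dp_1b
    chances = bfp - so - doubles - triples - hr - caught - dp_1b
    return net_putouts / chances
```

The team result is compared with the league rate, and the difference is split evenly between the two positions at the runs value stage.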

A note on this. This particular formula should brand me as a hypocrite, as I use Balls in Play as the primary source of opportunities instead of hits allowed. There are reasons why I did it this way:



* A popup does not compete with a hit allowed. Infielders catch 99% of popups, and those they do not catch are errors. A team that allows many flyballs and strikes out few batters will induce more popups.

* Again, a fielder’s choice requires a runner on first, as well as a groundball, which cancels out the flyball effect, but is still affected by strikeouts.

* Balls in Play does include hits allowed. It is not like I ignored them entirely.

* Most importantly, the standard deviation as a percentage of the league rate is lowest when I compute infield putout rates this way. For a while, I was computing this as a percentage of runners on base, and the standard deviation was large.



When applying this number to individual fielders, take it as a percentage of team putouts and apply it to the individual fielder’s putouts. It looks a little weird, having to apply 398 PO as a percentage of 277 PO. As I said before, apply the net putouts for catchers and first basemen, and net assists and double plays for first basemen the same way.

I should note I am comparing totals to league averages to come up with an outs over/under number. Thus, the initial context in which I am working is a Pete Palmer-style context. I can switch to a Bill James-style context, and I worked my butt off to do so. The nature of fielding statistics makes it easier to come up with a Pete Palmer-type over/under number. I can create a number like Fielding Runs, like Defensive Winning Percentage, and am even working on Win Shares. With the right plus/minus value and the right base, you can turn a decent fielding metric into anything.

Finally, we come to create a table of values for statistics. I stole some values from XRuns, and for some others I used a Tangotiger suggestion and adjusted values to a 27-out context. Basically, turning a hit into an out not only puts that batter out, it prevents another batter from coming to the plate. I shan’t run this table with this article; its values would change annually. Remember, if you come up with your own table, that you need to adjust everything down a bit because we know where the outs went, but not the hits (I used 1/3 the value for infielders and 2/3 the value for outfielders).

Or, to put that in plainer English, were you to evaluate an infield as a unit (add all infield assists together, as well as net putouts by first basemen and catchers, and figure its rating vis-a-vis hits) versus adding up all the individual totals, the infield would be one third the sum of the individual totals. It is a mathematical phenomenon.

To compute these values:

Hits.if            (0.59 + R / PA) / 3
Hits.of            ((0.50 * 1B + 0.22 * 2B + 0.54 * 3B) / (H - HR) + 0.09 - R / PA) * 2 / 3
Caught Stealing    0.50 + R / PA
Assists.c          0.32
Assists.1b         0.18
Double Plays.p     (0.37 + R / PA) / 2
Double Plays.1b    (0.37 + R / PA) / 4
Assists.of         0.50 + R / PA
Passed Balls       0.09
Errors.p           (0.59 + R / PA) / 2 + 0.09
Errors.c           (0.59 + R / PA) / 4 + 0.135
Errors.1b          (0.59 + R / PA) * 0.80 + 0.036
Errors.2b          (0.59 + R / PA) * 0.85 + 0.027
Errors.3b/ss       (0.59 + R / PA) * 0.765 + 0.0243
Errors.of          (0.59 + R / PA) / 4 + 0.135
E.throw            (0.59 + R / PA) * 0.085 + 0.0027
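Since these values move with the league's runs per plate appearance, it may help to see a few rows computed for a given R / PA. This sketch covers only a subset of the table (the non-error rows other than Hits.of), and the dictionary keys are names invented for the example.

```python
def run_values(r_per_pa):
    """A subset of the value table for one league-season's R / PA."""
    hit_into_out = 0.59 + r_per_pa   # shared term in the hit and error rows
    double_play = 0.37 + r_per_pa
    return {
        "hits_if": hit_into_out / 3,
        "caught_stealing": 0.50 + r_per_pa,
        "assists_c": 0.32,
        "assists_1b": 0.18,
        "double_plays_p": double_play / 2,
        "double_plays_1b": double_play / 4,
        "assists_of": 0.50 + r_per_pa,
        "passed_balls": 0.09,
    }


values = run_values(0.12)  # 0.12 R/PA is an arbitrary illustrative figure
```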

 



I derived most of this through the following values:

Single     0.50 Runs
Double     0.72 Runs
Triple     1.04 Runs
Out       -0.09 Runs
GDP       -0.37 Runs
SB         0.18 Runs
CS        -0.32 Runs

 

Some additional explanations are in order. For errors we’re making an assessment of whether or not the error put a man on base. As such, different positions have different error weights. The other errors at the position are weighted as advancing a man. There are a few muffed foul flies, but not many, and even in those cases, a small penalty is in order. I also reduced the values ever so slightly for third basemen and shortstops, since we are crediting first basemen for part of their rating (that’s the “E.throw”).
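One way to read the error rows in the value table is as a blend: the share of errors that put a man on base is charged the hit-into-out value, and the rest are charged the 0.18 advancement value. This is an interpretation that happens to reproduce the printed coefficients, not a derivation the article states outright; the names are hypothetical.

```python
BASE = 0.59      # hit-into-out term from the value table, before adding R / PA
ADVANCE = 0.18   # value of advancing a runner (the SB weight)

# Share of errors that put a man on first, from the percentages table.
ON_BASE_SHARE = {"p": 0.50, "c": 0.25, "of": 0.25, "1b": 0.80, "2b": 0.85}


def error_value(position, r_per_pa):
    share = ON_BASE_SHARE[position]
    return share * (BASE + r_per_pa) + (1 - share) * ADVANCE


def split_throwing_credit(r_per_pa, first_base_share=0.10):
    """Third base/shortstop error value and the E.throw piece moved to first base."""
    full = error_value("2b", r_per_pa)
    return full * (1 - first_base_share), full * first_base_share
```

With a 10% share moved to the first baseman, the two pieces match the Errors.3b/ss and E.throw rows.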

In the case of net assists for first basemen, we’re only interested in the advancement penalty. We already counted the hit prevention/out making ability of those assists in the range calculation. Also, to avoid additional double counting, we halved the value of first basemen’s double plays, as well as pitchers’ (though, for pitchers, this is because they often end double plays).

For outfielders, I only calculated the amount an extra base hit has over a single. I know, there are exclamation points in your eyes, hear me out. For the most part, outfielders cannot prevent extra base hits, but rather keep them from becoming extra base hits. This isn’t a pretty solution, as turning that double into a single lowers the rating, but the values do work.

Read that last sentence in Bill Jamesese: I tried it the other way, and Mike Emeigh and I agreed it didn’t work. I tried it about three ways, and the play-by-play data did not support the variance between fielders that each of the other ways showed. I am certainly open to another solution.

For outfield assists, Mike Emeigh ran the correlation, and there is an inverse correlation between outfield assists and advances, but it is not large. Therefore, I chose to make it equal to one advance plus one runner pegged, as well as the value of not having another man come to bat.

Finally, to come up with a Defensive Winning Percentage, we need to devise a baseline, a level of basic defensive responsibility. We first find the league (R - HR) / (PO - SO). Then, we come up with an outs value for each position:

p        A
1b       PO - (A.2b + A.3b + A.ss) + A
2b/ss    PO + A - DP - (PO - DP) / 2
3b/of    PO + A
c        PO - SO + A

 



and multiply the number for the league by (R - XR(pitcher only)) / (PO - SO). For catchers, we need to add one-third the pitcher-only XRuns ... what are the pitcher-only runs?



(HR * 1.44) + (BB - IBB + HBP) * .34 + (IBB * .25) - (SO * X)



where X is:



((1B * 0.5) + (2B * 0.72) + (3B * 1.04) - (AB - H - SO) * 0.09) / (AB - HR - SO) + 0.098



This is a measure of the hit-prevention value of a strikeout, and will be around 0.20 for a league.
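The two formulas above can be sketched together. The function names are illustrative; the totals in the usage example are invented round numbers for a hypothetical league, chosen only to show that X lands near 0.20.

```python
def strikeout_hit_value(singles, doubles, triples, ab, hits, hr, so):
    """X: the hit-prevention value of a strikeout."""
    outs_in_play = ab - hits - so
    numerator = (0.5 * singles + 0.72 * doubles + 1.04 * triples
                 - 0.09 * outs_in_play)
    return numerator / (ab - hr - so) + 0.098


def pitcher_only_runs(hr, bb, ibb, hbp, so, x):
    """Runs charged to events no fielder can influence."""
    return hr * 1.44 + (bb - ibb + hbp) * 0.34 + ibb * 0.25 - so * x


# Hypothetical league totals, for illustration only.
x = strikeout_hit_value(singles=1000, doubles=280, triples=30,
                        ab=5600, hits=1480, hr=170, so=1000)
runs = pitcher_only_runs(hr=170, bb=550, ibb=50, hbp=50, so=1000, x=x)
```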

Next, we make this a rate:

c          divide by BFP
1b/3b/p    divide by GB
2b         divide by GB and subtract 0.025
ss         divide by GB and add 0.025
of         divide by FB

 

Multiply this by the team BFP, GB or FB, and apply the left/right adjustment. This is the baseline. From here, you can use the Pythagorean Formula to derive a percentage.



Important note—to deal with individual players, you need to proportion opportunities by the percentage of innings the player played afield. If you do not have defensive innings, you need to estimate them; I am currently using the innings estimator Bill James used in Win Shares. If you come up with a better innings estimate, by all means post it.


What I learned from Bill James, and what I already knew

 

 

I have heard (such as one can hear in cyberspace) a fair number of people remark about the similarities between my defensive system and Bill James’s Win Shares fielding. Inevitably, someone posits the idea that James read CAD and stole it. I would be surprised if that were true, and were it true, I would be flattered.

 

The truth is, most of these concepts are obvious, if you think about them. Team assists represent groundballs—I learned that one from the 1985 Baseball Abstract, when someone recommended to James putting an outfielder’s putouts in the context of PO-SO-A. Left-handed pitchers may cause a third baseman to record fewer plays—someone wrote into The Sporting News in 1991 to defend Carney Lansford’s low Range Factor with this fact.

I claim two things as mine:



* The idea that, if you remove the assists by the other infielders, a first baseman’s putout rate is meaningful.

* Hits less home runs is the proper context for fielding stats.



And do you know what? Even after I came up with these, I learned that other people had already hinted at them. In response to Bill James’s article introducing Range Factor to Baseball Digest readers, someone wrote a letter explaining why James should not have figured these for a first baseman, since his infielders could increase his putouts with their assists. The 1982 Baseball Abstract has Bill James calling hits (less home runs, “and the doubles and triples off the wall”) defensive failures, and even running a comparison chart of several pitchers, beckoning shades of Voros McCracken. These examples are only in the context of Bill James, and there doubtless are other examples.

That Bill James and I would come to similar conclusions about fielding data is not the surprise. The real surprise is that James, Clay Davenport and I are the only people who bothered to adjust for what we already knew.

I first published this system to rec.sport.baseball in 1994. The system resembled James’s Range Index then, as Davenport’s system does now. James published Range Index in the 1984 Abstract, which I had not read at that time. However, my estimate of first basemen’s double plays is consciously adapted from those early Abstracts.

Bill James’s Win Shares influenced me in three ways for this revision:



* I chose to use his net first baseman assists. While I do not regard it as important as James does, it is one of the few places where we can assign difficulty to plays.

* I chose not to use putouts by third basemen and pitchers.

* I chose to evaluate the putouts of second basemen and shortstops collectively, and in the context of runners on first base. I had been looking for a better way to handle middle infielders’ putouts for some time, and James’s comments on Nap Lajoie influenced my thinking.



As an aside, James did not place many items in their proper context. In racking my brain to find a place for net first basemen’s assists, I realized every event James described occurred with a runner on base. I gave them credit only for stopping the advancement, as I already gave credit for making an out. James instead chose to place them in context of the number of balls in play, and as one of his DWP-style “factors.” As a result, I think he gives these plays too much weight.

Good works feed off each other. Clay Davenport’s work caused me to lower my exponent for left-handed pitching on two occasions. I sure as hell hope someone fixes places where I have flaws, like using catchers’ assists.



Context-Adjusted Defense 4.0



The above is the 3.0 version (the rec.sport.baseball June 1994 version was 1.0, the Big Bad Baseball Annual 1999 was 2.0). Some things I would love to implement:



* A final-score table. How do I explain this? We know many events are more or less common depending on the relative score of the game. Assists by catchers and outfielders are more common in close games (especially by a team that is one or two runs behind), just to pick on something. We shall probably never know the score at every point in every game ever, but we do know the final scores. I’m hoping for a table like this:



Score

Event    +5+     +2-4    +1      +0      -1      -2-4    -5+
A.of     -50%    +50%    +30%    +20%    +10%    +0%     -60%



So, knowing the final score of the game, we can know how much more often an opposing baserunner would take a base and the outfielder would record the assist.



* Catchers’ assists. We know passed ball/wild pitch rates do have some correlation with dropped third strikes. We’d also like to see if there’s a way we can use the assist rate to figure both the ability to throw out a baserunner and to see how well the catcher fields other plays, like bunts. Forays into the play-by-play data have not been successful for most of this—these elements vary wildly from team to team.



* Range Bonus Plays. A great Bill James invention, and the one thing that is easier to represent in a Claim Point system. Why? In a Claim Point system, there is no need for penalties for the fielders who didn’t earn such plays. For seasons for which we lack accurate innings data, we would need to find a way to divvy up the RBPs in reverse for the fielders who lacked a bonus. Also, what is the weight of these? We know being a better fielder than your teammates is a sign of being a good fielder, but we already have given the player some credit for these plays.



For the record, I would want these for the infielders, and for catcher assists. For outfielders, this just causes problems when you do not have accurate innings data—Win Shares overrates center fielders because of these.

 

Charles Saeger Posted: September 21, 2002 at 06:00 AM | 21 comment(s)

Reader Comments and Retorts


Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   1. Charles Saeger Posted: September 22, 2002 at 12:49 AM (#606373)
Just to let everyone know -- I am actually working on the next version. There's a final table for 2001 that is supposed to run with this, I suppose I need to submit another copy to Sean.

There is a second article on a topic about which I am writing in the next version: putouts by first basemen, second basemen and shortstops. I realized late in writing this that my ways of handling these were poor and I could do better, as both Bill James and Clay Davenport were (well, Bill James on first basemen's putouts; he handles putouts by middle infielders no better than I do).
   2. Mitchel Lichtman Posted: September 22, 2002 at 12:49 AM (#606376)
I hesitate to comment at all, as there isn't enough free time in my whole life to digest all the info in your article, as it's presented...

As I commented on Fanhome, it is a great overall system (I think) - one that is able to use traditional team and player info and rigorously compute each player's fielding skill (with the help of some extraneous, yet critical, material from a "one-time" PBP database), rather than having to rely on such garbage metrics as "range factor".

Again, to be honest, I can't comment on the specifics of your methodology because it is difficult to wade through your article. I have a "feeling", however, that the methodology is sound.

I do wish that you (and some others to whom I have made similar comments) would write more in "English" than in a style more befitting the "American Journal of Applied Mathematics". As well, more "description" (again, in "English") and less formula-type prose would be helpful for feedback and understanding. For example, if I described (in nauseating detail) the exact formula I use for UZR, I think that I would lose lots of folks. Instead I describe, in easy to understand English, the basic idea. I might even get into some of the boring details, and if I do, I still try to present them in "English". If someone wants to know some of the exact "formulas" that go into my methodologies (and trust me, no one ever does - no one gets paid to "peer review" our articles), they can request them or I can supply appendices or something like that.

If I am in the minority in this regard, ignore these comments...
   3. Mitchel Lichtman Posted: September 23, 2002 at 12:49 AM (#606377)
Here's a follow up to my previous post. First, I hope you take my criticisms with regard to the "complex" nature of your article with all due respect. Your system is the best defensive metric I've seen other than UZR (my UZR that is), and the best without using PBP data (other than the "background/one time" PBP data, of course).

Here is an edited copy of my recent post on Fanhome, describing what I think is your "system" (I hope that either Charles or Mike is dgb100 or that dgb100 is a colleague of theirs, otherwise it looks like someone is stealing ideas from someone else) in 500 words or less:

Let me summarize what dgb is doing, which as Tango states, is using all the traditional information available to come up with basically a ZR/RF (that's "slash", not "divided by") for each fielder.

He is first taking each team's BIP's. Then he is (or should be) separating that into GB's and FB's, using a team's GB/FB ratio (if available, of course; if not, we don't do that step - we simply assume a league-average GB/FB ratio, or we can estimate GB/FB ratio from a team's total IF assists and BIP's).

Now here's the nice part:

He determines how many balls per 100 GB's that each infielder (or how many balls per 100 FB's for outfielders) "should" catch (turn into outs - GB assists for IF'ers and FB putouts for OF'ers). He does this by using PBP data from a bunch of historical games to determine, on the average, what percentage of all ground balls (GB BIP's) are "caught" (turned into outs) by the SS, 3B'man, etc. For example, in the database, if the SS catches 10 ground balls per every 100 GB's hit, then it is assumed that for every 100 GB's that team A allows (remember, we calculate how many GB's team A allows by taking their BIP's and applying their pitchers' GB/FB ratio), their SS should field 10 of them. If he only fields 8 (he has 8 GB "assists" per 100 GB's), then he is a below average fielder with an AFR of .8.

He can further refine his system to take into consideration the handedness of the opposing batters and/or the handedness of a team's pitchers, depending upon what information is available. Obviously if a team has lots of LHP's they will face more RHB's than the average team; consequently they will allow more ground balls to the left side of the infield.

So, he can take the PBP database and figure out "how many ground balls does a SS catch per 100 GB's allowed when a RHB is at bat," and do the same for LHB's, and for FB's versus LHB's and RHB's as well.

If we don't have that kind of information available - handedness of opposing batters (which we probably don't unless we have some kind of a PBP database), then we can do the same thing using the handedness of the pitchers. For example again, we would use the historical PBP database to see how many GB's were caught by a SS on average when a LHP was on the mound and how many were caught with a RHP on the mound. Now we can look at team A and even if we don't know how many balls were put in play when their RHP's were on the mound and how many were put into play when their LHP's were on the mound, we can figure out the percentage of time a RHP was on the mound, the percentage of time a LHP was on the mound, and divide up the team's total BIP's accordingly (I guess if we want, we can compute from the individual player stats how many BIP's and GB's and FB's were allowed by each pitcher, and therefore by all their LHP's combined and all RHP's combined).
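A rough sketch of that split in Python may help. It assumes the team's ground balls are divided in proportion to the share of time a LHP was on the mound, and that the per-GB rates for the position come from the historical PBP database. The function name and every number below are made up for illustration; none of them come from CAD or UZR.

```python
def expected_gb_outs(team_gb, lhp_share, rate_vs_lhp, rate_vs_rhp):
    """Expected ground-ball outs at one position, split by pitcher handedness.

    team_gb      -- estimated ground balls allowed by the team
    lhp_share    -- fraction of the time a LHP was on the mound (0 to 1)
    rate_vs_lhp  -- historical outs per GB at this position with a LHP pitching
    rate_vs_rhp  -- historical outs per GB at this position with a RHP pitching
    """
    if not 0.0 <= lhp_share <= 1.0:
        raise ValueError("lhp_share must be between 0 and 1")
    gb_with_lhp = team_gb * lhp_share      # ground balls assigned to LHP's
    gb_with_rhp = team_gb - gb_with_lhp    # the rest go to RHP's
    return gb_with_lhp * rate_vs_lhp + gb_with_rhp * rate_vs_rhp


# Hypothetical team: 2000 GB's, LHP on the mound 25% of the time,
# SS turns 12 per 100 GB's into outs behind LHP's and 10 per 100 behind RHP's.
print(expected_gb_outs(2000, 0.25, 0.12, 0.10))
```

The fielder's actual GB assists divided by this expected figure would then give the same kind of ratio as in the 8-versus-10 example above.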

The next logical step is to put all of this nice methodology into an "easy to read and understand" formula, DGB!

Like EZR for an IF (estimated zone rating, or whatever you want to call it) = (player "A" GB assists) / ((team BIP) * (team GB/FB)) / (average player for position "A" GB assists per 100 GB's).

The above reads "A's GB assists divided by his team's total GB's" divided by "the average player at his position's GB assists per 100 GB's". This last term is a constant for each position in the field, based upon the historical database.
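For anyone who wants to try the formula, here is a minimal Python sketch. Two assumptions are mine, not part of the formula as written: "team GB/FB" is read as the fraction of balls in play that are ground balls (so that BIP times that fraction gives total GB's), and the positional constant is divided by 100 to turn "per 100 GB's" into a rate. The function name and sample inputs are hypothetical.

```python
def estimated_zone_rating(gb_assists, team_bip, team_gb_share, avg_assists_per_100_gb):
    """EZR for an infielder: his GB assist rate relative to the positional average.

    team_gb_share is assumed to be ground balls as a fraction of balls in play.
    """
    team_gb = team_bip * team_gb_share             # estimated ground balls allowed
    if team_gb <= 0 or avg_assists_per_100_gb <= 0:
        raise ValueError("ground balls and positional average must be positive")
    player_rate = gb_assists / team_gb             # his assists per ground ball
    average_rate = avg_assists_per_100_gb / 100.0  # positional constant as a rate
    return player_rate / average_rate


# Hypothetical SS: 160 GB assists behind a staff that allowed 4000 BIP's,
# half of them ground balls, at a position that averages 10 assists per 100 GB's.
print(round(estimated_zone_rating(160, 4000, 0.5, 10), 3))
```

A result below 1 means fewer GB assists than an average fielder at the position would be expected to get from the same number of ground balls.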

You can refine the formula to account for a team's LHP and RHP's as discussed above, by putting in the appropriate conversion algorithm, and you can also include the formula for determining a middle IF'er's GB assists only (factoring out his CS assists I guess).

Charles, is that basically what you are doing? You are calculating a "normalized ZR" for each fielder by estimating how many balls should have been caught by an average fielder at that position given the estimated number of ground balls hit to the infield and the actual or estimated percentage of RHB and LHB at the plate!

Of course, in order to do this, you also have to estimate how many of a defensive player's PO and A are actually "balls caught" and in fact relate to defensive skill. For outfielders, it should be PO only (I don't know whether you incorporate OF assists in your "formula" for OF defense; if you do, you shouldn't - OF assists bear little relationship to OF defensive skill vis-a-vis the arm - you would need holds/extra bases and opportunities; using OF A only would be like using catcher or baserunner CS only - it tells you very little about overall value), and for IF, it should be mostly assists on GB only - as you properly explain, assists on steals (do they give an IF an assist on a CS?) should be ignored (subtracted), PO by all IF'ers other than the 1B should be ignored (other than by middle IF'ers if you want to incorporate DP's), and only PO's by the 1B'man should count when he fields the ground ball and makes the play himself (doesn't he also get an assist when this happens? If he does, then we can ignore PO's by 1B'men as well). Whatever you do, as you also state, pop-fly PO's by IF'ers should and must be ignored, as there is almost no relationship between a defender's number of pop-ups caught and his defensive skill - for obvious reasons.

It just doesn't seem as complicated as a glance at your article suggests. Am I missing something? (Please re-read my last 2 paragraphs.)
   4. Mitchel Lichtman Posted: September 23, 2002 at 12:49 AM (#606378)
It was bugging me that I didn't know how it was "scored" when an IF'er makes an out "unassisted" on a GB. I guess since they call it "unassisted" he must get a putout and not an assist. Interestingly, the rulebook doesn't help a whole lot, unless I am missing something. According to the Official Rules, credit a fielder with an assist when he "throws or [meaningfully] deflects a batted or thrown ball in such a way that a putout results (or should have resulted)..." Obviously, on an unassisted GB out, no fielder throws or deflects a ball, so we must look to the definition of a putout...

A putout results when (these are either or):

1) a fielder catches a fly ball or line drive - nope, that's not it.
2) catches a thrown ball which puts out a batter - that's not it either.
3) tags a runner - that's not it!

Well, I can see no definition that gives a fielder a putout (or an assist) when he makes a GB out unassisted...

BTW, a fielder who tags a runner on a CS or pickoff gets credited with a PO, I guess, according to definition 3 of a PO, and certainly not an assist. Does the catcher get an assist based on "throwing a [batted or] thrown ball that results in a putout," the "thrown ball" being the pitch?
   5. Charles Saeger Posted: September 23, 2002 at 12:49 AM (#606381)
Gerry -- oops. Actually, I probably should have used "key" or "main" instead. Chalk one up against my pride in good use of the English language ...

MGL -- lotsa stuff, obviously. You're right, I should have an article structure that shows me going through the math step by step. I have been working on that for the next version. Problem is, it is slow going. In spite of my malaprop above (and it is probably not the only one), my training is as a writer, not as a statistician. Writing page after page about numbers just is not very interesting, but it is necessary in this case.

IF putouts -- again, I have been spending tons and tons of time on these. I do have better formulas to estimate unassisted putouts not only by first basemen, but by second basemen and shortstops as well. I discovered a few things about these. I, too, am a little skeptical of the value of middle infielders' putouts, though the unassisted numbers do have some year-to-year consistency. I think I have spent more time on this topic over the last year than any other defensive topic, and have written about eight formulae about this.

OF assists -- as in, I have no other way of estimating the impact of an outfielder's arm. There is some correlation between a high assist rate and a low advance rate, but only some; as I wrote above, it looked like the assist was keeping the runner it pegged from advancing, which is why I gave it the value I gave it. As a note, an outfielder with a large number of Baserunner Kills almost certainly did have a positive impact with his arm despite the number of advances against him. Even a fluke year like Gary Ward 1982 or Joe Orsulak 1992 probably has defensive value in spite of the extra advances.

LHB/RHB -- well, yeah, I am trying to measure opportunity, or more to the point, failed opportunity (since we already know successful opportunity). We went to the PBP data for this one. You figure the adjustment, and multiply it by failed opportunities. It is not as bad as it looks, but I would be open to a simpler way of calculating this.

Errors -- I am doing this; the error values show how likely the error put a man on base. For example, I figure an outfielder's error as 25% of the value of putting another man on base (0.50 + 0.09 + LgR/LgPA) plus 75% of the value of allowing a man to advance (0.18). As each position has a different "put the batter on first" rate, each position has a different value.
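The outfielder example works out as follows in a short Python sketch. The 25%/75% split and the 0.50, 0.09 and 0.18 values are the ones quoted above; the function name and the league totals in the usage line are hypothetical.

```python
def outfield_error_value(lg_runs, lg_pa, on_base_share=0.25):
    """Run value of an outfielder's error, per the weights quoted above."""
    if lg_pa <= 0:
        raise ValueError("lg_pa must be positive")
    # Value of putting another man on base: 0.50 + 0.09 + LgR/LgPA
    on_base_value = 0.50 + 0.09 + lg_runs / lg_pa
    # Value of allowing a man already on base to advance
    advance_value = 0.18
    return on_base_share * on_base_value + (1.0 - on_base_share) * advance_value


# Hypothetical league: 12,000 runs in 100,000 plate appearances.
print(round(outfield_error_value(12000, 100000), 4))
```

Other positions would use the same two values with a different on-base share, since each has its own "put the batter on first" rate.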

DP opps -- that is what I am doing.

Run values -- what I am doing is figuring the value of each event and multiplying each plus/minus number by that value. If each infielder's assist has a weight of 0.234, I multiply the positive/negative number by 0.234 to determine how many runs that fielder saved/blew versus league average. I add them together to find the total plus/minus runs.
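As a sketch of that last step in Python: each plus/minus count is multiplied by its run weight and the products are summed. The 0.234 assist weight is the example figure above; the double-play weight and the plus/minus counts are invented for illustration, and the function name is hypothetical.

```python
def runs_vs_average(plus_minus, weights):
    """Total runs saved (positive) or blown (negative) versus league average.

    plus_minus -- event name -> plays above (+) or below (-) average
    weights    -- event name -> run value of one such play
    """
    missing = set(plus_minus) - set(weights)
    if missing:
        raise KeyError(f"no run weight for: {sorted(missing)}")
    return sum(count * weights[event] for event, count in plus_minus.items())


weights = {"assist": 0.234, "double_play": 0.5}  # 0.5 is a made-up weight
fielder = {"assist": 10, "double_play": -4}      # made-up plus/minus counts
print(round(runs_vs_average(fielder, weights), 3))
```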
   6. Charles Saeger Posted: September 23, 2002 at 12:49 AM (#606384)
David -- correct. I don't even know what the heck that is.
   7. Charles Saeger Posted: September 23, 2002 at 12:49 AM (#606385)
MGL -- yes, that is a putout. (Every out must be a putout somewhere, or else the boxscore will not balance.) Specifically, it is a forceout. I'm a little surprised it is not part of 10.10, but this has been the custom at every game I have ever attended and every boxscore and scoresheet I have ever seen. Must be an oversight on MLB's part.
   8. Charles Saeger Posted: September 23, 2002 at 12:49 AM (#606387)
F James -- the single-assist DPs do not matter. As for catchers, they have few assists in modern baseball that are not CS or K23 (less than 30% of their assists), and furthermore, we have no way to figure out how many such plays a catcher made. (Catchers field about 12% of all opposition SH, but this varies wildly.) This estimate is about as close as we shall reach, and it works fine. Heck, I'm more worried about the unassisted groundouts by first basemen screwing things up than catcher assists and single-assist DPs.
   9. Mike Emeigh Posted: September 23, 2002 at 12:49 AM (#606391)
He determines how many balls per 100 GB's that each infielder (or how many balls per 100 FB's for outfielders) "should" catch (turn into outs - GB assists for IF'ers and FB putouts for OF'ers).

And this is where he makes the mistake - because whether or not a fielder should make a play is dependent upon the specific context in which the ball is hit - both the game context (runners on base, number of outs, game score, batter at the plate, pitcher on the mound) and the fielder context (fielder position relative to his teammates). Lumping all of these results together, and making value assignments based on the aggregate results from all fielders, makes the outcome highly susceptible to aggregation bias, where the group characteristics not only don't apply across the board to the individuals in the group but are highly likely to be significantly different for individuals in the group.

It is far less likely to introduce bias into the results to consider whether the fielder *could* make a play on the ball, and to penalize him to the full extent whenever a play is not made in an area where he could have made a play, even if when all teams are lumped together another fielder was more likely to have made the play. IOW, if a single goes through the SS hole, both the 3B and the SS should be penalized the full value of one single, because either could have made the play depending on the circumstances, and because you don't know the circumstances you can't make a valid a priori judgment as to which fielder *should* have made the play.

-- MWE
   10. tangotiger Posted: September 23, 2002 at 12:49 AM (#606392)
and because you don't know the circumstances you can't make a valid a priori judgment as to which fielder *should* have made the play.

Sure you have aggregation bias, but the point is to make a best guess. If the SS makes 90% of a particular play in zone X and the 3B makes 10%, and if a hit gets through, then to minimize your OVERALL error, you assign 90% of the blame to the SS.

Now, if you tell me that with man on 2b, 0 outs, and RH at bat the SS only makes 60% of those plays, then fine, let's adjust based on this new data.

But to categorically make it 100% for both players, is, in my view, not a valid representation.

My position is that you identify every possible variable, situation, and context that you can think of, and base your best estimate on that.
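A small Python sketch of the proportional split, for comparison with charging both fielders in full. The 90/10 and 60/40 shares are the illustrative figures from this post (the 40% for the 3B is simply the remainder); the function name is hypothetical and this is not anyone's published method.

```python
def allocate_blame(play_shares, hit_value=1.0):
    """Split the blame for one hit among fielders in proportion to how often
    each one makes that play in the same zone and situation."""
    total = sum(play_shares.values())
    if total <= 0:
        raise ValueError("play shares must sum to a positive number")
    return {fielder: hit_value * share / total
            for fielder, share in play_shares.items()}


# All situations lumped together: SS makes 90% of these plays, 3B makes 10%.
print(allocate_blame({"SS": 0.90, "3B": 0.10}))

# Man on 2B, 0 outs, RH at bat: the SS share drops to 60%.
print(allocate_blame({"SS": 0.60, "3B": 0.40}))
```

Adding another context variable only changes which shares get passed in; the total blame for the hit still sums to one hit.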
   11. tangotiger Posted: September 24, 2002 at 12:50 AM (#606399)
MGL: these are the variables that you should consider for UZR
ZR
- by zone
- by base-out state
- by score differential & inning
- by LH/RH batter
- by LH/RH pitcher
- by park
- by actual batter
- by actual pitcher
- by batter showing bunt / no-bunt
- by batter executing bunt / no-bunt
- by speed of runners on base

Anything else I missed?
   12. Charles Saeger Posted: September 24, 2002 at 12:50 AM (#606400)
Runners removed is pretty simple. It is just double plays, outfield assists or opposition steal rate, depending on position.

There's supposed to be some tables for the 2001 data, but they aren't up yet.
   13. Charles Saeger Posted: September 24, 2002 at 12:50 AM (#606407)
MGL -- OIC. Problem is, I need some sort of proxy for CS when it is unavailable (and CS allowed has never been an official stat). Assist rate kinda, sorta works. I have been working on improvements to it, and things work okeh, but every so often, there's some team I guess to allow 130 steals and it allows 178 steals ...
   14. Charles Saeger Posted: September 24, 2002 at 12:50 AM (#606408)
MGL -- oh, yeah, short form. Basically, I can get things to work OK with A/(TmA+TmH-TmHR) for infielders and PO/(TmPO-TmSO-TmA+TmH-TmHR) for outfielders, as a basic measure of range. The principles are *simple*, the details are where I spend hours of time.
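The two short-form measures are easy to compute from a standard team fielding and pitching line. Here is a Python sketch; the function names and the team totals in the usage lines are made up, and only the two formulas come from the post above.

```python
def infield_range(a, tm_a, tm_h, tm_hr):
    """Infielder's basic range: A / (TmA + TmH - TmHR)."""
    return a / (tm_a + tm_h - tm_hr)


def outfield_range(po, tm_po, tm_so, tm_a, tm_h, tm_hr):
    """Outfielder's basic range: PO / (TmPO - TmSO - TmA + TmH - TmHR)."""
    return po / (tm_po - tm_so - tm_a + tm_h - tm_hr)


# Made-up team line: 4350 PO, 1000 SO, 1800 A, 1450 H, 150 HR.
print(round(infield_range(500, 1800, 1450, 150), 4))               # a SS with 500 A
print(round(outfield_range(350, 4350, 1000, 1800, 1450, 150), 4))  # a CF with 350 PO
```

Both denominators are team totals, so every player at the same position on the same team shares them; none of the context adjustments are applied here.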
   15. Charles Saeger Posted: September 25, 2002 at 12:50 AM (#606412)
MGL -- uh, no. That is the simplified formula. If you step back through the older versions, you'll see the basic two formulae are:

Range outs / (Range outs + Hits Allowed - Home Runs Allowed)

Arm outs / Runners on base

And then I adjust.

I discovered the reduced formulae this spring. I don't use them because it is harder to make adjustments with them, but they do work on a basic level.

I was not being facetious. Those two formulae really do work.
   16. Charles Saeger Posted: September 25, 2002 at 12:50 AM (#606413)
As for steals, I do have a new method estimating them, not above, but there's a fair amount of error (standard error is 10 steals, but when I tested it, you can be as far as 35 steals off). I wrote a couple of small sidebar articles about this for the next CAD revision, which I can send to you if you would like.

Catcher assists do track opponents caught stealing. I am adding an adjustment based on passed balls allowed, which do loosely (r=0.50) track K23 assists, which improves the accuracy there. The problem is, both catcher assists and opponents caught stealing also correlate well with opponents' stolen bases allowed, so a good assist total could well mean the opposite of what we assume it means (a good throwing catcher).
   17. Charles Saeger Posted: September 26, 2002 at 12:50 AM (#606429)
Mitchel -- I do have CAD calculated for each team/position for 2001. It was supposed to run in a chart with this article, but it apparently did not. I can e-mail it to you if you would like.
   18. Silver King Posted: September 26, 2002 at 12:50 AM (#606434)
Heck, Primer Gods, fix whatever the problem was and please run Charles' chart for us!

M.D.'s enthusiasm for spreadsheeting reminds me that Messrs. Saeger and/or Emeigh have previously been seen talking with Sean Forman about eventually adding CADish results to Baseball Reference. I.e., for every player season ever. Which of course would rule. I remember subsequent intimations that this might be impossibly hard.

So where does that project idea stand?
Enlist M.D. and others as aides!
   19. Silver King Posted: September 28, 2002 at 12:51 AM (#606471)
Boy, can I kill off a thread, or what?
   20. Charles Saeger Posted: September 29, 2002 at 12:51 AM (#606480)
Threadicide, eh?
   21. Mike Emeigh Posted: October 01, 2002 at 12:51 AM (#606501)
A comment on the amount of detail presented:

If you want constructive criticism, you have to present the details of your method to your audience so that they understand your thought process and so that they can replicate enough of the work to feel comfortable about the path that you are taking (and to suggest improvements as warranted). If you hold back the details, on the other hand, you take the risk of undermining your own credibility, especially if your audience sees what appears to be an obvious flaw in your approach but can't confirm whether or not you've addressed it because you haven't provided the details. There's nothing to lose, and a great deal to be gained, from submitting an analysis method in all of its gory detail for independent analysis and assessment by your readers, many of whom probably know as much about the subject as you do, much as I hate to say it :)

The "holy grail" nature of defensive analysis comes about in large part because we have almost no information about fielder performance in relation to opportunity to perform. We have "opportunity contexts" for batters and pitchers, with a fairly complete record of their successes and failures. For fielders, we have a record of their successes (polluted by successes of other fielders that show up in their record) and only a partial record of their failures (even in zone-based systems), so we don't have a complete record of their "opportunity context". What Charles has attempted to do here, to the best of his ability, is to strip out the areas of pollution in the existing records and to derive an "opportunity context" for fielders based on information that we know, without trying to divvy up responsibilities based on what we think is true but which we can't support with empirical evidence. It's not a simple task, because the process of converting a ball in play into an out is *heavily* driven by contextual factors, to a far greater degree than either batting or pitching, and it's a strain just to make sure that those factors have been identified, let alone to ensure that they have been properly accounted for.

-- MWE
