Baseball for the Thinking Fan

Primate Studies
— Where BTF's Members Investigate the Grand Old Game

Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   1. Charles Saeger Posted: September 21, 2002 at 11:49 PM (#606373)
Just to let everyone know -- I am actually working on the next version. There's a final table for 2001 that is supposed to run with this, I suppose I need to submit another copy to Sean.

There is a second article on a topic about which I am writing in the next version: putouts by first basemen, second basemen and shortstops. I realized late in writing this that my ways of handling these were poor and that I could do better, as both Bill James and Clay Davenport had (well, Bill James on first basemen's putouts; he handles putouts by middle infielders no better than I do).
   2. Mitchel Lichtman Posted: September 21, 2002 at 11:49 PM (#606376)
I hesitate to comment at all, as there isn't enough free time in my whole life to digest all the info in your article, as it's presented...

As I commented on Fanhome, it is a great overall system (I think) - one that is able to use traditional team and player info and rigorously compute each player's fielding skill (with the help of some extraneous, yet critical, material from a "one-time" PBP database), rather than having to rely on such garbage metrics as "range factor".

Again, to be honest, I can't comment on the specifics of your methodology because it is difficult to wade through your article. I have a "feeling", however, that the methodology is sound.

I do wish that you (and some others to whom I have made similar comments) would write more in "English" than in a style more befitting the "American Journal of Applied Mathematics". As well, more "description" (again, in "English") and less formula-type prose would be helpful for feedback and understanding. For example, if I described (in nauseating detail) the exact formula I use for UZR, I think that I would lose lots of folks. Instead I describe, in easy to understand English, the basic idea. I might even get into some of the boring details, and if I do, I still try to present them in "English". If someone wants to know some of the exact "formulas" that go into my methodologies (and trust me, no one ever does - no one gets paid to "peer review" our articles), they can request them or I can supply appendices or something like that.

If I am in the minority in this regard, ignore these comments...
   3. Mitchel Lichtman Posted: September 22, 2002 at 11:49 PM (#606377)
Here's a follow up to my previous post. First, I hope you take my criticisms with regard to the "complex" nature of your article with all due respect. Your system is the best defensive metric I've seen other than UZR (my UZR that is), and the best without using PBP data (other than the "background/one time" PBP data, of course).

Here is an edited copy of my recent post on Fanhome, describing what I think is your "system" (I hope that either Charles or Mike is dgb100 or that dgb100 is a colleague of theirs, otherwise it looks like someone is stealing ideas from someone else) in 500 words or less:

Let me summarize what dgb is doing, which as Tango states, is using all the traditional information available to come up with basically a ZR/RF (that's "slash", not "divided by") for each fielder.

He is first taking each team's BIP's. Then he is (or should be) separating that into GB's and FB's, using a team's GB/FB ratio (if available, of course; if not, we don't do that step - we simply assume a league-average GB/FB ratio, or we can estimate GB/FB ratio from a team's total IF assists and BIP's).

Now here's the nice part:

He determines how many balls per 100 GB's that each infielder (or how many balls per 100 FB's for outfielders) "should" catch (turn into outs - GB assists for IF'ers and FB putouts for OF'ers). He does this by using PBP data from a bunch of historical games to determine, on the average, what percentage of all ground balls (GB BIP's) are "caught" (turned into outs) by the SS, 3B'man, etc. For example, in the database, if the SS catches 10 ground balls per every 100 GB's hit, then it is assumed that for every 100 GB's that team A allows (remember, we calculate how many GB's team A allows by taking their BIP's and applying their pitchers' GB/FB ratio), their SS should field 10 of them. If he only fields 8 (he has 8 GB "assists" per 100 GB's), then he is a below average fielder with an AFR of .8.
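
A minimal sketch of the arithmetic just described, not the article's actual code. The function names and the sample team totals are hypothetical, and it assumes every ball in play is either a GB or an FB, so that a GB/FB ratio r implies a ground-ball share of r / (1 + r).

```python
def estimated_ground_balls(team_bip, gb_per_fb):
    """Estimate team ground balls from balls in play and a GB/FB ratio.

    Assumption: every ball in play counts as either a GB or an FB.
    """
    gb_share = gb_per_fb / (1.0 + gb_per_fb)
    return team_bip * gb_share


def fielding_ratio(gb_assists, team_gb, avg_per_100_gb):
    """Actual GB assists divided by what an average fielder would have made."""
    expected = team_gb * avg_per_100_gb / 100.0
    return gb_assists / expected


# Hypothetical team: 4,400 balls in play, GB/FB ratio of 1.2.
team_gb = estimated_ground_balls(4400, 1.2)
# Hypothetical shortstop: 192 GB assists, against an average of 10 per 100 GB's.
print(round(team_gb), round(fielding_ratio(192, team_gb, 10), 2))
```

With the numbers from the example in the post (8 assists per 100 GB's against an average of 10), fielding_ratio(8, 100, 10) comes out at the same .8.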

He can further refine his system to take into consideration the handedness of the opposing batters and/or the handedness of a team's pitchers, depending upon what information is available. Obviously if a team has lots of LHP's they will face more RHB's than the average team; consequently they will allow more ground balls to the left side of the infield.

So, he can take the PBP database and figure out "how many ground balls does a SS catch per 100 GB's allowed when a RHB is at bat," and do the same for LHB's, and for FB's versus LHB's and RHB's as well.

If we don't have that kind of information available - handedness of opposing batters (which we probably don't unless we have some kind of a PBP database), then we can do the same thing using the handedness of the pitchers. For example again, we would use the historical PBP database to see how many GB's were caught by a SS on average when a LHP was on the mound and how many were caught with a RHP on the mound. Now we can look at team A and even if we don't know how many balls were put in play when their RHP's were on the mound and how many were put into play when their LHP's were on the mound, we can figure out the percentage of time a RHP was on the mound, the percentage of time a LHP was on the mound, and divide up the team's total BIP's accordingly (I guess if we want, we can compute from the individual player stats how many BIP's and GB's and FB's were allowed by each pitcher, and therefore by all their LHP's combined and all RHP's combined.)
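
The pitcher-handedness version can be sketched the same way. This is only an illustration under stated assumptions: the function name is made up, and the per-100 rates and the 30% LHP share are placeholder numbers, not values from any PBP database.

```python
def expected_assists_by_pitcher_hand(team_gb, lhp_share, rate_vs_lhp, rate_vs_rhp):
    """Expected GB assists for one position, split by pitcher handedness.

    team_gb      -- team ground balls allowed (estimated or actual)
    lhp_share    -- fraction of the time a LHP was on the mound
    rate_vs_lhp  -- average GB assists per 100 GB's with a LHP pitching
    rate_vs_rhp  -- average GB assists per 100 GB's with a RHP pitching
    """
    gb_with_lhp = team_gb * lhp_share
    gb_with_rhp = team_gb * (1.0 - lhp_share)
    return (gb_with_lhp * rate_vs_lhp + gb_with_rhp * rate_vs_rhp) / 100.0


# Hypothetical inputs: 2,000 GB's, LHP on the mound 30% of the time,
# SS averages of 11.0 (LHP) and 9.5 (RHP) assists per 100 GB's.
expected = expected_assists_by_pitcher_hand(2000, 0.30, 11.0, 9.5)
print(round(expected, 1))
```

Dividing a shortstop's actual GB assists by this expected figure gives the handedness-adjusted version of the ratio.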

The next logical step is to put all of this nice methodology into an "easy to read and understand" formula, DGB!

Like EZR for an IF (estimated zone rating, or whatever you want to call it) = (player "A" GB assists) / ((team BIP) * (team GB/FB)) / (average player for position "A" GB assists per 100 GB's).

The above reads "A's GB assists divided by his team's total GB's" divided by "the average player at his position's GB assists per 100 GB's". This last term is a constant for each position in the field, based upon the historical database.

You can refine the formula to account for a team's LHP and RHP's as discussed above, by putting in the appropriate conversion algorithm, and you can also include the formula for determining a middle IF'er's GB assists only (factoring out his CS assists I guess).

Charles, is that basically what you are doing? You are calculating a "normalized ZR" for each fielder by estimating how many balls should have been caught by an average fielder at that position given the estimated number of ground balls hit to the infield and the actual or estimated percentage of RHB and LHB at the plate!

Of course, in order to do this, you also have to estimate how many of a defensive player's PO and A are actually "balls caught" and in fact relate to defensive skill. For outfielders, it should be PO only (I don't know whether you incorporate OF assists in your "formula" for OF defense; if you do, you shouldn't - OF assists bear little relationship to OF defensive skill vis-a-vis the arm - you would need holds/extra bases and opportunities; using OF A only would be like using catcher or baserunner CS only - it tells you very little about overall value), and for IF, it should be mostly assists on GB only - as you properly explain, assists on steals (do they give an IF an assist on a CS?) should be ignored (subtracted), PO by all IF'ers other than the 1B should be ignored (other than by middle IF'ers if you want to incorporate DP's), and only PO's by the 1B'man should count when he fields the ground ball and makes the play himself (doesn't he also get an assist when this happens? If he does, then we can ignore PO's by 1B'men as well). Whatever you do, as you also state, pop-fly PO's by IF'ers should and must be ignored, as there is almost no relationship between a defender's number of pop-ups caught and his defensive skill - for obvious reasons.

It just doesn't seem as complicated as a glance at your article suggests. Am I missing something? (Please re-read my last 2 paragraphs.)
   4. Mitchel Lichtman Posted: September 22, 2002 at 11:49 PM (#606378)
It was bugging me that I didn't know how it was "scored" when an IF'er makes an out "unassisted" on a GB. I guess since they call it "unassisted" he must get a putout and not an assist. Interestingly, the rulebook doesn't help a whole lot, unless I am missing something. According to the Official Rules, credit a fielder with an assist when he "throws or [meaningfully] deflects a batted or thrown ball in such a way that a putout results (or should have resulted)..." Obviously, on an unassisted GB out, no fielder throws or deflects a ball, so we must look to the definition of a putout...

A putout results when (these are either or):

1) a fielder catches a fly ball or line drive - nope, that's not it.
2) catches a thrown ball which puts out a batter - that's not it either.
3) tags a runner - that's not it!

Well, I can see no definition that gives a fielder a putout (or an assist) when he makes a GB out unassisted...

BTW, a fielder who tags a runner on a CS or pickoff gets credited with a PO, I guess, according to definition 3 of a PO, and certainly not an assist. Does the catcher get an assist based on "throwing a [batted or] thrown ball that results in a putout," the "thrown ball" being the pitch?
   5. Charles Saeger Posted: September 22, 2002 at 11:49 PM (#606381)
Gerry -- oops. Actually, I probably should have used "key" or "main" instead. Chalk one up against my pride in good use of the English language ...

MGL -- lotsa stuff, obviously. You're right, I should have an article structure that shows me going through the math step by step. I have been working on that for the next version. Problem is, it is slow going. In spite of my malaprop above (and it is probably not the only one), my training is as a writer, not as a statistician. Writing page after page about numbers just is not very interesting, but it is necessary in this case.

IF putouts -- again, I have been spending tons and tons of time on these. I do have better formulas to estimate unassisted putouts not only by first basemen, but by second basemen and shortstops as well. I discovered a few things about these. I, too, am a little skeptical of the value of middle infielders' putouts, though the unassisted numbers do have some year-to-year consistency. I think I have spent more time on this topic over the last year than any other defensive topic, and have written about eight formulae about this.

OF assists -- as in, I have no other way of estimating the impact of an outfielder's arm. There is some correlation between a high assist rate and a low advance rate, but only some; as I wrote above, it looked like the assist was keeping the runner it pegged from advancing, which is why I gave it the value I gave it. As a note, an outfielder with a large number of Baserunner Kills almost certainly did have a positive impact with his arm despite the number of advances against him. Even a fluke year like Gary Ward 1982 or Joe Orsulak 1992 probably has defensive value in spite of the extra advances.

LHB/RHB -- well, yeah, I am trying to measure opportunity, or more to the point, failed opportunity (since we already know successful opportunity). We went to the PBP data for this one. You figure the adjustment, and multiply it by failed opportunities. It is not as bad as it looks, but I would be open to a simpler way of calculating this.

Errors -- I am doing this; the error values show how likely the error put a man on base. For example, I figure an outfielder's error as 25% of the value of putting another man on base (0.50 + 0.09 + LgR/LgPA) plus 75% of the value of allowing a man to advance (0.18). As each position has a different "put the batter on first" rate, each position has a different value.
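
As a worked example of the outfield error value above: the 25%/75% split and the constants 0.50, 0.09 and 0.18 are the ones quoted in the post, while the function name and the league totals are hypothetical, chosen only to make the arithmetic visible.

```python
def outfield_error_value(lg_runs, lg_pa, on_base_share=0.25):
    """Run value of an outfield error, following the split quoted above.

    on_base_share is the fraction of errors assumed to put a man on base;
    the remainder are treated as allowing a runner to advance.
    """
    put_on_base = 0.50 + 0.09 + lg_runs / lg_pa  # value of another man on base
    advance = 0.18                               # value of allowing an advance
    return on_base_share * put_on_base + (1.0 - on_base_share) * advance


# Hypothetical league: 720 runs in 6,000 plate appearances (0.12 R/PA).
print(round(outfield_error_value(720, 6000), 4))
```

Other positions would use a different on_base_share, since each position has its own "put the batter on first" rate.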

DP opps -- that is what I am doing.

Run values -- what I am doing is figuring the value of each event and multiplying each plus/minus number by that value. If each infielder's assist has a weight of 0.234, I multiply the positive/negative number by 0.234 to determine how many runs that fielder saved/blew versus league average. I add them together to find the total plus/minus runs.
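
A short sketch of that last step, assuming plus/minus counts per event are already in hand. The 0.234 assist weight is the illustrative figure from the post; the double-play weight, the event names and the sample counts are made up for the example.

```python
def runs_saved(plus_minus_by_event, weights):
    """Sum of (plays above or below average) times (run value per play)."""
    return sum(plus_minus_by_event[event] * weights[event]
               for event in plus_minus_by_event)


# Hypothetical run weights; only the assist figure comes from the post.
weights = {"assist": 0.234, "double_play": 0.5}
# Hypothetical fielder: 20 assists above average, 4 double plays below.
plus_minus = {"assist": 20, "double_play": -4}
print(round(runs_saved(plus_minus, weights), 2))
```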
   6. Charles Saeger Posted: September 22, 2002 at 11:49 PM (#606384)
David -- correct. I don't even know what the heck that is.
   7. Charles Saeger Posted: September 22, 2002 at 11:49 PM (#606385)
MGL -- yes, that is a putout. (Every out must be a putout somewhere, or else the boxscore will not balance.) Specifically, it is a forceout. I'm a little surprised it is not part of 10.10, but this has been the custom at every game I have ever attended and every boxscore and scoresheet I have ever seen. Must be an oversight on MLB's part.
   8. Charles Saeger Posted: September 22, 2002 at 11:49 PM (#606387)
F James -- the single-assist DPs do not matter. As for catchers, they have few assists in modern baseball that are not CS or K23 (less than 30% of their assists), and furthermore, we have no way to figure out how many such plays a catcher made. (Catchers field about 12% of all opposition SH, but this varies wildly.) This estimate is about as close as we shall reach, and it works fine. Heck, I'm more worried about the unassisted groundouts by first basemen screwing things up than catcher assists and single-assist DPs.
   9. Mike Emeigh Posted: September 22, 2002 at 11:49 PM (#606391)
He determines how many balls per 100 GB's that each infielder (or how many balls per 100 FB's for outfielders) "should" catch (turn into outs - GB assists for IF'ers and FB putouts for OF'ers).

And this is where he makes the mistake - because whether or not a fielder should make a play is dependent upon the specific context in which the ball is hit - both the game context (runners on base, number of outs, game score, batter at the plate, pitcher on the mound) and the fielder context (fielder position relative to his teammates). Lumping all of these results together, and making value assignments based on the aggregate results from all fielders, makes the outcome highly susceptible to aggregation bias, where the group characteristics not only don't apply across the board to the individuals in the group but are highly likely to be significantly different for individuals in the group.

It is far less likely to introduce bias into the results to consider whether the fielder *could* make a play on the ball, and to penalize him to the full extent whenever a play is not made in an area where he could have made a play, even if when all teams are lumped together another fielder was more likely to have made the play. IOW, if a single goes through the SS hole, both the 3B and the SS should be penalized the full value of one single, because either could have made the play depending on the circumstances, and because you don't know the circumstances you can't make a valid a priori judgment as to which fielder *should* have made the play.

-- MWE
   10. tangotiger Posted: September 22, 2002 at 11:49 PM (#606392)
and because you don't know the circumstances you can't make a valid a priori judgment as to which fielder *should* have made the play.

Sure you have aggregation bias, but the point is to make a best guess. If the ss makes 90% of a particular play in zone X and the 3b makes 10%, and if a hit gets through, then to minimize your OVERALL error, you assign 90% of the blame to the ss.

Now, if you tell me that with man on 2b, 0 outs, and RH at bat the SS only makes 60% of those plays, then fine, let's adjust based on this new data.

But to categorically make it 100% for both players, is, in my view, not a valid representation.

My position is that you identify every possible variable, situation, and context that you can think of, and base your best estimate on that.
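
One way to picture the "best guess" approach is a lookup that uses context-specific shares when they exist and falls back to the overall shares otherwise. This is only a sketch: the table layout and context key are invented, and the 3b's 40% share in the runner-on-second case is an assumption (the post gives only the ss figure).

```python
# Overall share of plays made in zone X, from the example above.
OVERALL = {"SS": 0.90, "3B": 0.10}

# Context-specific shares, keyed by (base state, outs, batter hand).
# The 3B figure here is assumed so that the shares sum to 1.
BY_CONTEXT = {
    ("runner_on_2b", 0, "R"): {"SS": 0.60, "3B": 0.40},
}


def blame_for_hit(context=None):
    """Split one hit's blame in proportion to who usually makes the play."""
    shares = BY_CONTEXT.get(context, OVERALL)
    total = sum(shares.values())
    return {fielder: share / total for fielder, share in shares.items()}


print(blame_for_hit())
print(blame_for_hit(("runner_on_2b", 0, "R")))
```

The alternative argued in the previous comment would instead return 1.0 for every fielder who could have made the play.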
   11. tangotiger Posted: September 23, 2002 at 11:50 PM (#606399)
MGL: these are the variables that you should consider for UZR
ZR
- by zone
- by base-out state
- by score differential & inning
- by LH/RH batter
- by LH/RH pitcher
- by park
- by actual batter
- by actual pitcher
- by batter showing bunt / no-bunt
- by batter executing bunt / no-bunt
- by speed of runners on base

Anything else I missed?
   12. Charles Saeger Posted: September 23, 2002 at 11:50 PM (#606400)
Runners removed is pretty simple. It is just double plays, outfield assists or opposition steal rate, depending on position.

There's supposed to be some tables for the 2001 data, but they aren't up yet.
   13. Charles Saeger Posted: September 23, 2002 at 11:50 PM (#606407)
MGL -- OIC. Problem is, I need some sort of proxy for CS when it is unavailable (and CS allowed has never been an official stat). Assist rate kinda, sorta works. I have been working on improvements to it, and things work okeh, but every so often, there's some team I guess to allow 130 steals and it allows 178 steals ...
   14. Charles Saeger Posted: September 23, 2002 at 11:50 PM (#606408)
MGL -- oh, yeah, short form. Basically, I can get things to work OK with A/(TmA+TmH-TmHR) for infielders and PO/(TmPO-TmSO-TmA+TmH-TmHR) for outfielders, as a basic measure of range. The principles are *simple*, the details are where I spend hours of time.
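
The two short-form ratios translate directly into code. The function names and the sample team totals below are hypothetical; the formulas are the ones given in the post, with no adjustments applied.

```python
def infield_range(a, tm_a, tm_h, tm_hr):
    """Short form for infielders: A / (TmA + TmH - TmHR)."""
    return a / (tm_a + tm_h - tm_hr)


def outfield_range(po, tm_po, tm_so, tm_a, tm_h, tm_hr):
    """Short form for outfielders: PO / (TmPO - TmSO - TmA + TmH - TmHR)."""
    return po / (tm_po - tm_so - tm_a + tm_h - tm_hr)


# Hypothetical team totals: 4,350 PO, 1,000 SO, 1,700 A, 1,450 H, 150 HR.
print(round(infield_range(450, 1700, 1450, 150), 3))
print(round(outfield_range(295, 4350, 1000, 1700, 1450, 150), 3))
```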
   15. Charles Saeger Posted: September 24, 2002 at 11:50 PM (#606412)
MGL -- uh, no. That is the simplified formula. If you step back through the older versions, you'll see the basic two formulae are:

Range outs / (Range outs + Hits Allowed - Home Runs Allowed)

Arm outs / Runners on base

And then I adjust.

I discovered the reduced formulae this spring. I don't use them because it is harder to make adjustments with them, but they do work on a basic level.

I was not being facetious. Those two formulae really do work.
   16. Charles Saeger Posted: September 24, 2002 at 11:50 PM (#606413)
As for steals, I do have a new method for estimating them, not above, but there's a fair amount of error (standard error is 10 steals, but when I tested it, you can be as far as 35 steals off). I wrote a couple of small sidebar articles about this for the next CAD revision, which I can send to you if you would like.

Catcher assists do track opponents caught stealing. I am adding an adjustment based on passed balls allowed, which do loosely (r=0.50) track K23 assists, which improves the accuracy there. The problem is, both catcher assists and opponents caught stealing also correlate well with opponents stolen bases allowed, so a good assist total could well mean the opposite of what we assume it means, a good throwing catcher.
   17. Charles Saeger Posted: September 25, 2002 at 11:50 PM (#606429)
Mitchell -- I do have CAD calculated for each team/position for 2001. It was supposed to run in a chart with this article, but it apparently did not. I can e-mail it to you if you would like.
   18. Silver King Posted: September 25, 2002 at 11:50 PM (#606434)
Heck, Primer Gods, fix whatever the problem was and please run Charles' chart for us!

M.D.'s enthusiasm for spreadsheeting reminds me that Messrs. Saeger and/or Emeigh have previously been seen talking with Sean Forman about eventually adding CADish results to Baseball Reference. I.e., for every player season ever. Which of course would rule. I remember subsequent intimations that this might be impossibly hard.

So where does that project idea stand?
Enlist M.D. and others as aides!
   19. Silver King Posted: September 27, 2002 at 11:51 PM (#606471)
Boy, can I kill off a thread, or what?
   20. Charles Saeger Posted: September 28, 2002 at 11:51 PM (#606480)
Threadicide, eh?
   21. Mike Emeigh Posted: September 30, 2002 at 11:51 PM (#606501)
A comment on the amount of detail presented:

If you want constructive criticism, you have to present the details of your method to your audience so that they understand your thought process and so that they can replicate enough of the work to feel comfortable about the path that you are taking (and to suggest improvements as warranted). If you hold back the details, on the other hand, you take the risk of undermining your own credibility, especially if your audience sees what appears to be an obvious flaw in your approach but can't confirm whether or not you've addressed it because you haven't provided the details. There's nothing to lose, and a great deal to be gained, from submitting an analysis method in all of its gory detail for independent analysis and assessment by your readers, many of whom probably know as much about the subject as you do, much as I hate to say it :)

The "holy grail" nature of defensive analysis comes about in large part because we have almost no information about fielder performance in relation to opportunity to perform. We have "opportunity contexts" for batters and pitchers, with a fairly complete record of their successes and failures. For fielders, we have a record of their successes (polluted by successes of other fielders that show up in their record) and only a partial record of their failures (even in zone-based systems), so we don't have a complete record of their "opportunity context". What Charles has attempted to do here, to the best of his ability, is to strip out the areas of pollution in the existing records and to derive an "opportunity context" for fielders based on information that we know, without trying to divvy up responsibilities based on what we think is true but which we can't support with empirical evidence. It's not a simple task, because the process of converting a ball in play into an out is *heavily* driven by contextual factors, to a far greater degree than either batting or pitching, and it's a strain just to make sure that those factors have been identified, let alone to ensure that they have been properly accounted for.

-- MWE