Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Monday, April 02, 2007

THT: Smith: What is Zone Rating?

Uhh…a stathammer I’m gonna conk all the Jeter-bacchers at the bar today during Yankee Opening Day? No?...OK, Sean Smith lets on.

There are some big disagreements on a few players, like Edgar Renteria, Jack Wilson, and Hanley Ramirez, but the two zone ratings agree on most. The correlation coefficient between the two ratings is .827. STATS is using larger zones, showing more opportunities for every player. I don’t know what the BIS zones look like, but I can show you what the STATS zones look like here. This is not a big deal; using a slightly larger or smaller zone doesn’t make one measure better or worse than another. What is important is that it is used consistently for all players.

What troubles me the most is not the opportunities, but the plays made column. A play made, whether in or out of zone, should be credited every time a player fields a groundball and records an out. Line drives and popups should not be included. Double plays started should count just the same as other outs, and when a player is the middle man in a double play, that should not affect his zone rating at all.
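A minimal sketch of the counting rules Sean lays out, for anyone who wants to see them applied mechanically. The `BallInPlay` record and its fields are hypothetical (no real data feed is assumed), and out-of-zone plays are reported separately from ZR, as BIS does, rather than folded into the denominator.

```python
from dataclasses import dataclass


@dataclass
class BallInPlay:
    """One batted ball near the fielder (hypothetical record layout)."""
    batted_type: str        # "GB" (grounder), "LD" (line drive), or "PU" (popup)
    in_zone: bool           # did the ball pass through the fielder's zone?
    out_recorded: bool      # did this fielder field it and record an out?
    dp_pivot: bool = False  # fielder was only the middle man on a double play


def zone_rating(balls):
    """Return (ZR, OOZ plays) under the counting rules described above.

    ZR = ground-ball outs made in zone / ground balls in zone.
    A double play the fielder starts counts as one play, like any other out.
    """
    chances = made_in_zone = ooz_plays = 0
    for b in balls:
        if b.batted_type != "GB" or b.dp_pivot:
            continue  # line drives, popups, and DP pivots are ignored
        if b.in_zone:
            chances += 1
            if b.out_recorded:
                made_in_zone += 1
        elif b.out_recorded:
            ooz_plays += 1  # out-of-zone play, reported separately
    zr = made_in_zone / chances if chances else float("nan")
    return zr, ooz_plays


sample = [
    BallInPlay("GB", True, True),
    BallInPlay("GB", True, True),    # e.g. a DP he started: still one play
    BallInPlay("GB", True, False),   # grounder through the zone for a hit
    BallInPlay("GB", False, True),   # diving stop outside the zone
    BallInPlay("LD", True, True),    # caught liner: ignored
    BallInPlay("GB", True, True, dp_pivot=True),  # pivot man: ignored
]
print(zone_rating(sample))
```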

 

Repoz Posted: April 02, 2007 at 12:30 PM | 79 comment(s)
  Tags: sabermetrics

Reader Comments and Retorts


Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   1. GGC don't think it can get longer than a novella Posted: April 02, 2007 at 04:55 PM (#2322586)
Is Sean CHONE? I'm surprised this got no play. This is the type of stuff that I think about when I think about Primer; not roids, race, or politalks.
   2. AROM Posted: April 02, 2007 at 05:13 PM (#2322624)
Yeah, that's me. I don't know why it's so silent. There isn't anything big going on in baseball today, is there? :-)
   3. Slinger Francisco Barrios (Dr. Memory) Posted: April 02, 2007 at 06:35 PM (#2322799)
The clip from the article doesn't reveal its interesting-ness.

I caught something. We've been told that Derek Jeter's low ratings are at least partly because of the Yankees' unusual defensive alignment, yet no SS on the list had fewer OOZ than he.
   4. AROM Posted: April 02, 2007 at 06:44 PM (#2322808)
With OOZ plays near the shortstop, the phrase you will hear is "pasta diving Jeter".

The funny thing is that phrase is coded into MLB06 The Show. The game designers must read this site. I don't hear "past a diving Cabrera" or "past a diving Rollins" in that game.

I hate to bash him, because I like Jeter (I'm a Yankee fan when the Angels get eliminated in the playoffs), but it's true.

My favorite part of doing this article was the anatomy of a shortstop's defensive record. How many assists go 4-6 instead of 4-3? How many putouts are popups vs. forceplays vs. tags?

At least for one SS, one year, now I know.
   5. AROM Posted: April 02, 2007 at 06:53 PM (#2322818)
Tango Tiger has included a link to this in a thread on his site; he thinks the difference between BIS plays made and STATS plays made could be line drives.

My preference would be not to use line drives. I think the rate (line drives caught / line drives in zone) is more dependent on the speed and location of the liner than on the player's ability, but that's just one opinion.
   6. Chris Dial Posted: April 02, 2007 at 07:13 PM (#2322847)
I don’t know why STATS is consistently counting more plays made than BIS, or what those plays are. Perhaps BIS is not counting all the plays they should?

Ding. Nice catch, Sean. STATS doesn't use line drives, so the difference isn't that. That doesn't make sense to me.

From what I read at the Fielding Bible site, the difference is that they are not using the same zones, and thus the two ratings aren't exactly "apples and apples". Also, we know that *some* portion of STATS data are OOZ, and we can be pretty sure that some of the BIS OOZ plays are also OOZ for STATS, which means that BIS is much shorter than it first appears.

Basically we need to see a grid for BIS. Dewan knows what he is doing.

I also appreciate your pointing out that STATS removed double counting DPs about 8 years ago. The THT glossary still intimates that it still double-counts DPs.

And that's a great title for an article.
   7. Slinger Francisco Barrios (Dr. Memory) Posted: April 02, 2007 at 07:14 PM (#2322850)
My preference would be not to use line drives. I think the rate (line drives caught / line drives in zone) is more dependent on the speed and location of the liner than on the player's ability, but that's just one opinion.

Couldn't you say the same thing about ground balls, though?
   8. Chris Dial Posted: April 02, 2007 at 07:24 PM (#2322872)
Couldn't you say the same thing about ground balls, though?

In the strictest sense, but GBs are converted at an 80% rate, while LDs are converted at a 20% rate. Speed and location are far more determinative for LDs.
   9. AROM Posted: April 02, 2007 at 07:34 PM (#2322884)
From what I read at the Fielding Bible site, the difference is that they are not using the same zones, and thus the two ratings aren't exactly "apples and apples". Also, we know that *some* portion of STATS data are OOZ, and we can be pretty sure that some of the BIS OOZ plays are also OOZ for STATS, which means that BIS is much shorter than it first appears.

If STATS is not counting line drives, and we know they no longer double count the DP's, then how do they have Cabrera at 347 plays made for 2006? I went through the game logs, and it's just not possible. In the article I grouped every one of OC's assists and putouts, and there's no way he fielded 347 ground balls and turned them into outs.

Check my accounting of OC's plays. If you can get 347 from there, I'd love to see how.

As far as how many plays are in/out of zone, we aren't comparing apples to apples, and I understand how those can differ, but we should be able to agree on how many plays a shortstop makes.
   10. DSG Posted: April 02, 2007 at 07:36 PM (#2322886)
Ding. Nice catch, Sean. STATS doesn't use line drives, so the difference isn't that. That doesn't make sense to me.

From what I read at the Fielding Bible site, the difference is that they are not using the same zones, and thus the two ratings aren't exactly "apples and apples". Also, we know that *some* portion of STATS data are OOZ, and we can be pretty sure that some of the BIS OOZ plays are also OOZ for STATS, which means that BIS is much shorter than it first appears.

Basically we need to see a grid for BIS. Dewan knows what he is doing.

I also appreciate your pointing out that STATS removed double counting DPs about 8 years ago. The THT glossary still intimates that it still double-counts DPs.

And that's a great title for an article.

***

Chris, I (literally) don't get what you're saying here. Are we all in agreement that STATS is counting way too many plays made, while BIS has about (if not exactly) the right number? It seems to me that you're intimating otherwise...
   11. Chris Dial Posted: April 02, 2007 at 07:51 PM (#2322899)
Are we all in agreement that STATS is counting way too many plays made, while BIS has about (if not exactly) the right number? It seems to me that you're intimating otherwise...

I certainly have no reason to think BIS has it right, and I'm 100% sure they don't have it *exactly* right.

I am not satisfied that I have enough information to say BIS has the right zones. Moreover, in his fractioning of the STATS zones, does his granulation maintain the "50% outs", but just within the segments he used prior?

And is that necessarily giving us a better overall picture? Maybe - but I'd prefer that the OOZ count represented a smaller percentage of plays made.

So, no, we are not all in agreement STATS is counting too many plays. BIS may well be counting too few plays.
   12. DSG Posted: April 02, 2007 at 07:55 PM (#2322904)
So, no, we are not all in agreement STATS is counting too many plays. BIS may well be counting too few plays.

***

Chris, I am frankly flabbergasted. Did you read the article? Sean carefully went through the PBP data, and found that Cabrera made no more than 322 plays on groundballs. BIS counted 317, STATS 347. How can STATS possibly be right, unless they're including some other type of out?
   13. Chris Dial Posted: April 02, 2007 at 08:04 PM (#2322911)
Uh, Sean counted wrong? There's an error in the data?
   14. AROM Posted: April 02, 2007 at 08:05 PM (#2322914)
Yeah, what David said. I'd be happy to provide my assist and putout log for Cabrera. If you can find a way to use that data and count 347 plays for OC, I would love to see how.
   15. Chris Dial Posted: April 02, 2007 at 08:11 PM (#2322921)
Where did you get 347 plays? STATS or SI?
   16. Chris Dial Posted: April 02, 2007 at 08:13 PM (#2322926)
Any chance the data entry at SI was using a keypad and was supposed to type "317" instead of "347"?

Any chance of that?
   17. Chris Dial Posted: April 02, 2007 at 08:17 PM (#2322929)
There's always a problem with "hearsay" data, which is largely what using the "Chances" at SI is.

My confidence in that data has always been weak (I couldn't reconcile it a few years ago), so I never switched my system to using it - I stuck with original chances.
   18. Slinger Francisco Barrios (Dr. Memory) Posted: April 02, 2007 at 10:48 PM (#2323084)
In the strictest sense, but GBs are converted at an 80% rate, while LDs are converted at a 20% rate. Speed and location are far more determinative for LDs.

O.K., that makes sense. So perhaps there should be a pro-ration for ability to convert...

Any chance the data entry at SI was using a keypad and was supposed to type "317" instead of "347"?

That brings up the question: what good are the numbers if they can be misreported this badly pretty much with impunity? That's a 9% difference, not trivial.
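For what the "9%" depends on, here is the arithmetic on the thread's own Cabrera numbers (347 STATS, 317 BIS, and Sean's play-by-play ceiling of 322). The percentage shifts with whichever total you treat as the base.

```python
def pct_over(count, base):
    """How far `count` exceeds `base`, as a fraction of `base`."""
    return (count - base) / base


stats_pm, bis_pm, pbp_max = 347, 317, 322  # Cabrera 2006, from the thread
print(f"STATS over BIS:         {pct_over(stats_pm, bis_pm):.1%}")
print(f"Gap as share of STATS:  {(stats_pm - bis_pm) / stats_pm:.1%}")
print(f"STATS over PBP ceiling: {pct_over(stats_pm, pbp_max):.1%}")
```

That works out to roughly 9.5%, 8.6%, and 7.8%, so "about 9%" holds whichever way you slice it, and even Sean's upper bound leaves STATS nearly 8% high.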
   19. JoeArthur Posted: April 02, 2007 at 11:02 PM (#2323101)
STATS Baseball Scoreboard 1998 pp178-9:
"For the purposes of the zone rating, the number of "outs" a player is credited with includes all balls fielded cleanly and turned into outs within his zone, plus balls turned into outs outside the zone, plus double plays (since a double play results in two outs. First basemen do not receive credit for participating in a double play.) For the purposes of calculating the number of balls hit into an infielder's zone, all ground balls hit within the zone are counted, line drives are counted only if they land in the zone, and pop flies are ignored."

STATS Baseball Scoreboard 1999 p.203:
"The number of outs a player is credited with includes all balls fielded cleanly and turned into outs within his zone. It also includes balls turned into outs outside the zone, plus double plays started, because a DP results in two outs. For the purposes of calculating the number of balls hit into an infielder's zone, only groundballs within the zone are counted. If he makes a play outside his zone, it is also credited as a ball in the zone."

STATS Baseball Scoreboard 2000 p.165: "This year we have taken a closer look at this system ... We have found a few areas where we believed it was appropriate to make adjustments ... Infielders: Count double plays as one converted opportunity, rather than two outs for one opportunity."
p. 174 "Each infielder gets a zone rating based on the number of grounders he converts into outs. Popups and flyballs are ignored for infielders. ... infielders no longer get credit for two outs when they start a ground ball double play."

STATS Baseball Scoreboard 2001 p.170: [the italicised language from 2000 is repeated word for word.]
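A sketch of how the quoted wording changes the arithmetic, under one reading of the 1999 text (out-of-zone outs added to both outs and opportunities, plus one extra out per DP started) versus the 2000 change (a DP is one converted opportunity). The counts in the example are hypothetical, chosen only to show the size of the effect.

```python
def stats_zr(in_zone_outs, ooz_outs, dps_started, in_zone_balls, double_count_dps):
    """Zone rating per one reading of the STATS Scoreboard wording.

    Out-of-zone outs are added to both outs and opportunities (1999 text).
    With double_count_dps=True, each DP started adds one more out (1998-99);
    with False, a DP is a single converted opportunity (2000 onward).
    """
    outs = in_zone_outs + ooz_outs
    if double_count_dps:
        outs += dps_started
    opportunities = in_zone_balls + ooz_outs
    return outs / opportunities


# Hypothetical shortstop season: 360 grounders in zone, 280 converted,
# 40 more outs outside the zone, 70 double plays started.
old = stats_zr(280, 40, 70, 360, double_count_dps=True)
new = stats_zr(280, 40, 70, 360, double_count_dps=False)
print(f"DPs double counted: {old:.3f}   DPs counted once: {new:.3f}")
```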

Prior to 1998 the descriptions in the Scoreboard were consistently in terms such as "number of balls in his zone", with no clarification whatsoever about ground balls, line drives, popups, or fly balls.

So what do we have? An unambiguous assertion in 1998 that line drives can be counted; consistent language, except for 1999-2001, that outs are counted on "balls" fielded cleanly, not "ground balls" fielded cleanly; and an explicit exclusion of pops and flies in 2000, contrasted with the inclusion of ground balls, without a mention of line drives one way or the other. Nor is there any mention in the 1999 description ["only ground balls"] that there has been any change in the method from the prior year, when line drives are explicitly included.

For what it's worth, I thought I had once read a description of the zones which explicitly did say that line drives landing less than 220 feet were included in the infield zones; but I have checked all 11 of the Scoreboards and cannot find it there.

Given that the plays made do reconcile much better when line drives are counted, the best way to interpret these conflicting descriptions is to accept that line drives have always been counted.

Chris - I'm not sure what proving mechanism you went through, but multiplying the cnn-si version of chances by the 3 decimal place ZR always results in a value very close to an integer. There's no chance of that always happening if it's not really the denominator associated with the zone rating. There is no error in the chances reported.
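Joe's integer check can be written out as below, assuming the published ZR is rounded to three decimals. The chance and plays-made counts in the usage lines are hypothetical.

```python
def implied_plays_made(chances, zr):
    """Recover plays made from chances and a ZR published to three decimals.

    A three-decimal ZR is off by at most 0.0005, so chances * ZR must land
    within chances * 0.0005 of a whole number if `chances` is the true
    denominator. Returns (nearest whole number, whether it passes the check).
    """
    product = chances * zr
    nearest = round(product)
    consistent = abs(product - nearest) <= chances * 0.0005 + 1e-9
    return nearest, consistent


print(implied_plays_made(412, 0.823))  # hypothetical: consistent denominator
print(implied_plays_made(420, 0.823))  # hypothetical: wrong denominator
```

For a shortstop with roughly 400 chances the tolerance is about ±0.2, so a wrong denominator would pass for a single player around 40% of the time; the check gets its force from holding for every player.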
   20. Steve Treder Posted: April 02, 2007 at 11:16 PM (#2323109)
FWIW, my tiny suggestion is that it makes sense to count line drives. Granted, their conversion into outs is likely not largely a function of the fielder's skill, and so their inclusion is mostly going to be a wash between fielders.

But their conversion into outs isn't completely beyond the influence of the fielder. It's to some degree a function of positioning, and proper positioning is a positive fielding skill. It's also, to some degree, a function of the fielder's reflexes, reach, leaping ability, all positive fielding attributes as well.

Including line drives won't likely have a big effect, but it would seem to have a net useful effect. Why not include them, even if it only helps a little?
   21. BDC Posted: April 02, 2007 at 11:48 PM (#2323132)
Idly curious question: this analysis is of starting shortstops. Are reserve shortstops better (on the whole) or worse than starters, in terms of zone-rating-type measures?
   22. Chris Dial Posted: April 03, 2007 at 12:18 AM (#2323185)
Nor is there any mention in the 1999 description ["only ground balls"] that there has been any change in the method from the prior year, when line drives are explicitly included.

Line drives, as I read it, are not "explicitly included", but rather, explicitly excluded.

A "line drive that lands in the zone" *is* a ground ball to be fielded in the zone. Where does a ball need to land to be defined as a grounder? In the grass?

Moreover you quote:
Each infielder gets a zone rating based on the number of grounders he converts into outs.

He gets a ZR based on grounders turned into outs. Not grounders and line drives. Grounders.

It has always been groundballs only. Always. I'm six nines certain of that.
   23. Chris Dial Posted: April 03, 2007 at 12:25 AM (#2323199)
Are reserve shortstops better (on the whole) or worse than starters, in terms of zone-rating-type measures?

Yes, reserves (at all positions) are on the whole better fielders than the starters.
   24. Chris Dial Posted: April 03, 2007 at 12:44 AM (#2323222)
Given that the plays made do reconcile much better when line drives are counted, the best way to interpret these conflicting descriptions is to accept that line drives have always been counted.


I would be terrifically surprised if that were the case.

Dewan goes to great pains to explain how he changed from STATS ZR to BIS ZR, and he does not mention removing line drives. That would be important, I'd think.

I am NOT arguing what the data at CNN SI means or doesn't mean. I am addressing the actual ZR data.
   25. Chris Dial Posted: April 03, 2007 at 01:23 AM (#2323246)
Looking more closely at Sean's data, it doesn't make sense about the line drives.

Unfortunately, Sean chose the one player that had 29 more chances in the STATS CH column than the JD-CH column and 30 more conversions in the PM columns.

No other SS approaches that. The chances are wildly variable: most convert 20-40 extra plays made, while the extra chances vary between 20 and 60.

So for it to be LDs, you have to believe that OC converted 100% of his LDs, and no other SS got close. Edgar Renteria converted 50% of his.

That doesn't ring true with me.
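Chris's arithmetic, spelled out: if every extra STATS chance and play made for Cabrera were a line drive, the implied conversion rate on those liners would be the ratio below. This is only the thought experiment from the post, not a claim about what the extra plays actually are.

```python
def implied_ld_conversion(extra_plays_made, extra_chances):
    """Share of 'extra' STATS chances converted, if all of them were line drives."""
    return extra_plays_made / extra_chances


rate = implied_ld_conversion(30, 29)  # Cabrera 2006: +30 PM on +29 chances
verdict = "more than 100%: not possible from liners alone" if rate > 1 else "possible"
print(f"{rate:.3f} ({verdict})")
```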
   26. Chris Dial Posted: April 03, 2007 at 01:59 AM (#2323257)
I'm on the road right now, but I can look into what Sean did and make an effort to repeat it with some older data I have of STATS ZR chances (not CNNSI's).
   27. AROM Posted: April 03, 2007 at 02:34 AM (#2323275)
Unfortunately, Sean chose the one player that had 29 more chances in the STATS CH column than the JD-CH column and 30 more conversions in the PM columns.

I think in 2005 there were some others who fit these criteria. That's not a big deal, it just means the two systems are using different zones.
   28. AROM Posted: April 03, 2007 at 02:47 AM (#2323286)
Are reserve shortstops better (on the whole) or worse than starters, in terms of zone-rating-type measures?

As Chris said, starters are better. For SS with 800+ innings in 2006, ZR = .824, OOZ = .126

For those under 800 Inn: ZR = .804, OOZ = .124
   29. Chris Dial Posted: April 03, 2007 at 03:01 AM (#2323293)
I said the opposite of that, Sean.
   30. Chris Dial Posted: April 03, 2007 at 03:04 AM (#2323294)
That's not a big deal, it just means the two systems are using different zones.

If you are assigning the difference to LDs, it is a big deal.
   31. JoeArthur Posted: April 03, 2007 at 03:47 AM (#2323323)
Chris -

Well, my motto is "I quote, you decide." But I won't let you off that easy. I don't share your opinion about the clarity of Dewan's explanations of his fielding systems [this one or his +/- system]. If the system had been clearly described, there wouldn't be this controversy. I do think it's odd that you think the data must be inexplicably whacked, but that one sentence about "only grounders" must be treated as gospel.

Let's look at that quote again: "line drives are counted only if they land in the zone." That "plainly" means if line drives land in the zone, they count. You are dismissing the apparently plain meaning by making the further assumption that there can be no such thing as a line drive landing in the zone - that it really has to be a grounder. You cite no definition of line drive as defined by STATS to support this theory. You refuse to read this sentence as a precise statement and then insist that instead I should read as a precise statement the next year's remark about "only grounders", although I have pointed out difficulties with that reading.

Let's try it this way, which should be more convincing if you follow it through. CNNSI has season fielding totals which so far include only yesterday's Cardinals/Mets game. No ambiguity yet, just one game in the season so far.

Cardinals 2B Adam Kennedy is shown with seven innings defensively; 4 PO, 3 A, 3 "chances" (= zone opportunities according to me) and a ZR of 1.000. So I expect the play by play to reveal 3 plays made in 3 chances.

Looking at the play by play at fangraphs and at mlb.com we have:
1st inning: both LoDuca and Beltran ground out to 2nd. This should be 2 assists, 2 plays made.
2nd inning: no plays involving Kennedy.
3rd inning: Kennedy gets putout receiving throw on forceout at 2nd.
4th inning: Kennedy gets putout receiving throw at 1st on sacrifice.
6th inning: Kennedy gets putout catching line drive and assist doubling runner off first. A putout, an assist, and a play made (according to me).
7th inning: Kennedy gets putout catching popup. With popups excluded, as we both agree, no play made.
Kennedy is removed in a double switch to start the 8th, closing the books: 7 innings 4 PO 3 A, 3 plays made, all accounted for. Just as I predicted according to my interpretations of the CNNSI data and the definition of the zone.
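Joe's accounting can be tallied directly. The per-play credits below encode his reading (grounders turned into outs and caught liners are plays made; taking a throw on a force or a sacrifice, and catching a popup, are not); the descriptions paraphrase the play-by-play above.

```python
# Each entry: (inning, description, putouts, assists, plays_made)
kennedy_game = [
    ("1st", "LoDuca grounds out to 2B", 0, 1, 1),
    ("1st", "Beltran grounds out to 2B", 0, 1, 1),
    ("3rd", "takes throw for force out at 2B", 1, 0, 0),
    ("4th", "takes throw at 1B on sacrifice", 1, 0, 0),
    ("6th", "catches liner, doubles runner off 1B", 1, 1, 1),
    ("7th", "catches popup", 1, 0, 0),
]

po = sum(e[2] for e in kennedy_game)
a = sum(e[3] for e in kennedy_game)
pm = sum(e[4] for e in kennedy_game)
chances = 3  # no missed balls found in his zone, per the post
print(f"PO {po}, A {a}, plays made {pm}, ZR {pm / chances:.3f}")
```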

Now, were there any hits that went through his zone? Should he have had any missed opportunities? Alou had a ground single to center in the 2nd (along STATS vector M); Valentin lined a single to right in the 4th (vector W, distance 140 feet); LoDuca grounded a single to center in the 4th (vector N); Beltran grounded a single to center in the 4th (vector N). [Vectors taken from the page source of the Fox hit charts for each player, such as Alou.] All other Mets hits through the 7th were to left and so irrelevant. So the hits in the general vicinity of the 2B zone were along vectors M, N, and W. None of those are in the 2B zone. So a detailed accounting indicates that the CNNSI ZR matches what my interpretation predicted for opportunities as well. It doesn't match yours. If you pay attention to the game-by-game progression of Kennedy's chances and ZR on that site, I predict that once he misses a play and his ZR drops below 1.000, you will be able to tell that the chances reported there do match the denominator required by his ZR.

Here's my challenge to you. Repeat the exercise I performed with one of today's games with an infield lineout - all you really have to do is capture the snapshot of that team's fielding page tomorrow so that it contains the single game - the play by play will stick around and you can catch up on the analysis later if you have to.
   32. Chris Dial Posted: April 03, 2007 at 04:24 AM (#2323348)
That "plainly" means if line drives land in the zone, they count. You are dismissing the apparently plain meaning by making the further assumption that there can be no such thing as a line drive landing in the zone - that it really has to be a grounder. You cite no definition of line drive as defined by STATS to support this theory. You refuse to read this sentence as a precise statement and then insist that instead I should read as a precise statement the next year's remark about "only grounders", although I have pointed out difficulties with that reading.

No, Joe, I read "landed" as "landed", not caught.
   33. Chris Dial Posted: April 03, 2007 at 04:30 AM (#2323354)
Just as I predicted according to my interpretations of the CNNSI data and the definition of the zone.

As I stated above, I don't know what CNNSI is doing with their definitions. STATS data could be 2 chances, 2 PM, ZR 1.000. As far as I know, CNNSI is doing what they want with the pbp.

I don't see that refuted.

Do you offer an explanation for the chance differentials?

And I have stated already that I am not at my house where I can produce these cites you are asking for. I'm at my in-laws, and I don't have the access (I say again). I also can't capture tomorrow's data.

And I'll say AGAIN - I don't care what CNNSI's data is, I need STATS data.
   34. Chris Dial Posted: April 03, 2007 at 04:48 AM (#2323359)
I don't share your opinion about the clarity of Dewan's explanations of his fielding systems

Why doesn't someone just ask him?

I'm not saying it "can't be"; I'm saying that there are other things that don't make sense - that explanation has other problems.

I apologize for requiring a higher standard of proof.
   35. JoeArthur Posted: April 03, 2007 at 06:18 AM (#2323383)
????!!?

Chris, where did I read "landed" as "caught", or imply that you had read "landed" as "caught"?

I also read it as landed; in other words, a line drive hitting the ground within the boundaries of the zone instead of being caught. If there are such balls, they are balls coded as hit type "line drive" and they are counted as opportunities. You actually claimed this sentence supported your ground ball interpretation and that this phrase really had to be equivalent to "ground ball": A "line drive that lands in the zone" *is* a ground ball to be fielded in the zone.

Caught line drives would be plays made. Plays made count whether they are in or out of zone. The quoted sentence is not about plays made, it is about measuring opportunity - chances which can be counted as a miss if they are missed. So a line drive which lands in the zone is an opportunity, likely a missed one unless a force out is managed, (which happens a handful of times per year).

Anyway, I think I've already shown that yesterday's caught line drive by Kennedy has been counted as a play made. Find a flaw in that or find a counter-example. As time goes on, we may be able to find test cases with missed short line drives as well. Do you think that an infield line drive dropped for an error would not be counted as a missed play?

As to your other points -
N.B. that CNNSI ZR and ESPN ZR are the same for each player, and both claim that STATS is their source. The CNNSI data is not some idiosyncratic rolling up of PBP data according to their own rules; it is a reflection of each website getting a similar data feed from a common source. Moreover, last year on this board, someone posted some data points from the STATS premium fantasy service - I pointed out at the time that the zone opportunities published there on the STATS website matched the chances reported at CNNSI; you were part of that discussion. And here's yet another site with ZR data, matching what I have said about the CNNSI data; it explicitly shows 3 plays made and 3 opportunities for Kennedy yesterday, no need to infer anything.

The data doesn't reconcile? That is a pseudo-problem. Nobody is (or should be) saying that ground ball opportunities are identical under the two zone systems so that the only difference is line drives; BIS has different granularity than STATS, so they needn't agree about the exact number of missed ground balls in the zone, never mind the ordinary discrepancies in recording the exact direction of balls hit near the edge of the zone. That creates the noise you're worrying about.
   36. Walt Davis Posted: April 03, 2007 at 08:06 AM (#2323387)
6th inning Kennedy gets putout catching line drive and assist doubling runner off first. A putout, an assist, and a play made (according to me).

Excuse my ignorance of fielding stats, but does the fact that he had an assist here make it a "play made" rather than the line drive? Or are throws not counted in ZR?
   37. studes Posted: April 03, 2007 at 11:25 AM (#2323401)
I also appreciate your pointing out that STATS removed double counting DPs about 8 years ago. The THT glossary still intimates that it still double-counts DPs.

Hey Chris, wish you had told me that. I'll fix it right away.
   38. studes Posted: April 03, 2007 at 11:33 AM (#2323403)
Actually, I don't see how our explanation implies that, but I'll add a clause anyway.
   39. Chris Dial Posted: April 03, 2007 at 12:16 PM (#2323407)
If there are such balls, they are balls coded as hit type "line drive" and they are counted as opportunities.

Joe, I was a scorer for STATS. When a line drive hit the ground in the zone prior to reaching the fielder, it was scored a GB, not an LD.

That's why.
   40. Chris Dial Posted: April 03, 2007 at 12:22 PM (#2323410)
Studes,
he has revised the original Zone Rating calculation so that it now lists balls handled out of the zone (OOZ) separately (and doesn't include them in the ZR calculation) and doesn't give players extra credit for double plays (Stats had already made that change). We believe both changes improve the Zone Ratings substantially.

I don't know. That reads like STATS did still include DPs, especially as you indicated the removal of DPs was a big improvement.
   41. Chris Dial Posted: April 03, 2007 at 12:32 PM (#2323413)
Joe,
let's review:
line drives are counted only if they land in the zone,

"Only" means "only". Caught LDs do not "land" in the zone - you and I both have agreed that "landed" means "landed", not caught. Thus caught LDs are not counted, since *only* LDs that *land* are counted.

It specifies that only LDs that *land* are counted. The reason they are counted is that when they "land" in the infield, they become GBs for the purposes of fielding - that is, once an LD hits the ground, you are no longer fielding an LD, but rather a GB, as it is now bouncing/rolling.

Now maybe you want to say that caught LDs count (you clearly do), but that's NOT what STATS says in your quotes. It SPECIFICALLY EXCLUDES caught LDs - if you read "landed" as "landed" and not as "caught".

I appreciate your exasperated interrobangs, but really, the sentence seems clear to me. Perhaps that explanation will help.
   42. BDC Posted: April 03, 2007 at 12:42 PM (#2323416)
reserves (at all positions) are on the whole better fielders than the starters

Thanks, Chris. That's a very important principle. Very interesting.
   43. Slinger Francisco Barrios (Dr. Memory) Posted: April 03, 2007 at 12:50 PM (#2323421)
Thanks, Chris. That's a very important principle. Very interesting.

AROM begs to differ in post #28, so I'm not sure we have a principle just yet.
   44. Chris Dial Posted: April 03, 2007 at 12:59 PM (#2323427)
Joe,
all of what you say about LDs is fine. Maybe they are counting LDs - it would just be a surprise as I said above.

Let's look at that - as only LDs that are caught are showing up as PM, and in Sean's list everyone except OCab has a greater increase in CH than PM, although the PM portion is generally greater than 50% of the increase, it says something about the PM.

Wait a second.

In the "apples to apples" chart, Sean, are the ZR scores supposed to be the "BIZ+BOZ" totals? Because they aren't. For Adam Everett, 413/456 is 0.906. Not sure what the JD ZR is there for with those BIP counts.
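The check Chris is running on the "apples to apples" chart is just plays made over chances, rounded to three places as the published ratings are:

```python
def zr(plays_made, chances):
    """Zone rating rounded to three places, as published."""
    return round(plays_made / chances, 3)


print(zr(413, 456))  # Everett's BIZ+BOZ totals as quoted in the post; prints 0.906
```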
   45. BDC Posted: April 03, 2007 at 01:00 PM (#2323428)
Nah, Dial is the man :)
   46. AROM Posted: April 03, 2007 at 01:06 PM (#2323432)
A little reading comprehension issue on my part.

I thought someone had looked at starter/reserve fielding before and the general conclusion was that for most positions they are equal, but reserve shortstops are worse than starters. I'll check the other positions tonight unless somebody beats me to it.
   47. AROM Posted: April 03, 2007 at 01:11 PM (#2323436)
If play by play matched to CNNSI indicates line drives are being counted, and CNNSI matches Stats premium, it seems like a slam dunk case to me that line drives are included in ZR. If that's what all the published data has, it really doesn't matter how we read the definitions.

It seems they have a very liberal definition of "landing" to include landing in a glove.
   48. Chris Dial Posted: April 03, 2007 at 01:28 PM (#2323446)
I thought someone had looked at starter/reserve fielding before and the general conclusion was that for most positions they are equal, but reserve shortstops are worse than starters.

That would have been me. They are *at least* equal. The league avg ZRs are slightly higher than the ZRs for the starters (with 1000 IP).
I may have overreached for SS. It is a general truth.
   49. Dan Turkenkopf Posted: April 03, 2007 at 01:40 PM (#2323449)
A little reading comprehension issue on my part.

I thought someone had looked at starter/reserve fielding before and the general conclusion was that for most positions they are equal, but reserve shortstops are worse than starters. I'll check the other positions tonight unless somebody beats me to it.


Not that this necessarily applies to ZR based metrics, but LA BlackHawk calculated that using PMR, SS with less than 1000 BIP were collectively 97 runs below average.

Another caveat, PMR includes liners and infield popups. I don't think there were numbers released for GB only this season.
   50. Chris Dial Posted: April 03, 2007 at 01:46 PM (#2323452)
If play by play matched to CNNSI indicates line drives are being counted, and CNNSI matches Stats premium, it seems like a slam dunk case to me that line drives are included in ZR. If that's what all the published data has, it really doesn't matter how we read the definitions.

I agree the data strongly indicates that. The question is - does that matter?

Caught LDs (about 30 plays a year) would be a one-for-one proposition. That's easily accounted for (0.0218 LDs per IP).
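Here is a minimal sketch of the one-for-one point. It assumes a hypothetical 1400-inning season and two made-up shortstop lines, not anyone's real totals. Crediting the same number of caught liners to both plays made and chances moves ZR up a little, and slightly more for the weaker fielder. It cannot reorder two fielders who have the same number of chances.

```python
def zone_rating(plays_made: int, chances: int) -> float:
    """Zone rating: plays made divided by chances (balls in the fielder's zone)."""
    return plays_made / chances


def with_caught_liners(plays_made: int, chances: int, liners: int) -> float:
    """ZR when caught line drives are credited one-for-one, i.e. added to
    both plays made (numerator) and chances (denominator)."""
    return zone_rating(plays_made + liners, chances + liners)


LD_PER_IP = 0.0218                   # rate quoted in the post above
INNINGS = 1400                       # assumed full-time season
liners = round(LD_PER_IP * INNINGS)  # about 30 caught liners

# Two hypothetical ground-ball lines with the same number of chances.
for pm, ch in [(380, 450), (350, 450)]:
    print(f"{pm}/{ch}: {zone_rating(pm, ch):.3f} -> {with_caught_liners(pm, ch, liners):.3f}")
```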

That would also indicate that any "made play" difference between BIS and STATS would be a line drive (using the BIS OOZ data).

There should never be anyone with a greater change in CH than PM (I don't think). Jack Wilson has a -8, which is significant, especially as that is one noted as "missed a lot".

More as I sort through it - I have a ten+ hour drive today.
   51. AROM Posted: April 03, 2007 at 01:57 PM (#2323458)
There should never be anyone with a greater change in CH than PM (I dont think). Jack Wilson has a -8, which is significant, especially as that is one noted as "missed a lot".

There can be if the zones cover different areas. What this means for Wilson is that he missed more balls in the area that BIS counts as "in zone" but STATS does not.

Enough with the complaining, Chris, don't you have internet access through your cellphone so you can post while you drive?

Just kidding- have a safe trip.
   52. Slinger Francisco Barrios (Dr. Memory) Posted: April 03, 2007 at 03:19 PM (#2323512)
Walt had a good question in #36...does anybody know the answer?
   53. AROM Posted: April 03, 2007 at 03:25 PM (#2323518)
6th inning Kennedy gets putout catching line drive and assist doubling runner off first. A putout, an assist, and a play made (according to me).

Excuse my ignorance of fielding stats, but does the fact that he had an assist here make it a "play made" rather than the line drive? Or are throws not counted in ZR?


He's getting the play made for the line drive catch. The assist that turns a double play does not count in zone rating.

I see Valentin the same game had 5 putouts, 4 assists, and only one chance. He turned 4 DP's - as the middle man you get the assist and putout, but no PM. Was the extra putout also a line drive? I'll have to check the PBP.
   54. AROM Posted: April 03, 2007 at 03:31 PM (#2323519)
For Valentin it was 3 DP's turned, 1 DP started (his one CH), and two flyballs caught.
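The crediting rules described in #53 and #54 can be sketched as a small counter. The record format (`FieldingEvent`, `ball_type`, `role`) is hypothetical, and every event is assumed to be an out. The rule itself comes from this thread's inference and is not a documented STATS specification: credit the first fielder once on a converted grounder or caught liner, never credit the pivot man, and never credit flies or popups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldingEvent:
    """One out-recording play for a single fielder (hypothetical record format)."""
    ball_type: str    # "ground", "line", "fly" or "popup"
    role: str         # "first" = first fielder on the ball, "pivot" = DP middle man
    putouts: int = 0
    assists: int = 0


CREDITED_TYPES = {"ground", "line"}


def plays_made(events: list[FieldingEvent]) -> int:
    """Credit one play per grounder or liner the fielder handled first.

    The pivot man's putout and assist on a DP add nothing. A second out on
    the same play (e.g. doubling a runner off first) is not a second play.
    """
    return sum(1 for e in events if e.role == "first" and e.ball_type in CREDITED_TYPES)


# Valentin's game as described in #54: 3 DPs turned as pivot, 1 DP started, 2 flies caught.
valentin = (
    [FieldingEvent("ground", "pivot", putouts=1, assists=1)] * 3
    + [FieldingEvent("ground", "first", assists=1)]
    + [FieldingEvent("fly", "first", putouts=1)] * 2
)
print(sum(e.putouts for e in valentin), sum(e.assists for e in valentin), plays_made(valentin))
```

Under these rules the box line of 5 putouts and 4 assists reduces to a single play made. That is the one chance shown for Valentin.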
   55. bads85 Posted: April 03, 2007 at 04:15 PM (#2323544)
Where did you get 347 plays? STATS or SI?


It is 347 on both SI and STATS Inc. Premium. Perhaps some dull cubicle troll entered the data wrong originally for STATS, then the error was sent out to those companies who purchase the STATS. Or perhaps STATS started counting line drives just because someone there felt like it. The current STATS Inc. just ain't what it used to be.
   56. AROM Posted: April 03, 2007 at 04:39 PM (#2323565)
It is 347 on both SI and STATS Inc. Premium. Perhaps some dull cubicle troll entered the data wrong originally for STATS, then the error was sent out to those companies who purchase the STATS. Or perhaps STATS started counting line drives just because someone there felt like it. The current STATS Inc. just ain't what it used to be.

It's not a data error, unless they repeated that data error for every shortstop in the game, adding 20-40 to every shortstop's number, and then doing it in such a way that if you take (CH * ZR), round to a whole number, then take your rounded PM/CH, you wind up with EXACTLY the same ZR, no .831 vs .832 rounding difference or anything.
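The consistency check described here can be sketched as follows. If a fielder has fewer than 1000 chances, 1/CH exceeds .001. A three-decimal ZR then pins down at most one integer plays-made total. An empty result would flag a line that no whole number of plays can produce. The 420-chance line is hypothetical.

```python
def consistent_plays_made(chances: int, zr: float) -> list[int]:
    """Integer plays-made totals whose ratio to `chances` rounds to the published ZR.

    A candidate must lie within half a unit of the third decimal. When
    chances < 1000 the spacing 1/chances is wider than 0.001, so at most
    one candidate can survive; an empty list means no whole number works.
    """
    return [pm for pm in range(chances + 1) if abs(pm / chances - zr) < 0.0005]


# Hypothetical line: 420 chances with a published ZR of .831.
print(consistent_plays_made(420, 0.831))
```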

If you still think it's a data entry error I have some 2007 Nationals World Series tickets to sell you. Buy now and I'll even throw in a Cristian Guzman MVP trophy.
   57. bads85 Posted: April 03, 2007 at 05:17 PM (#2323587)
It's not a data error, unless they repeated that data error for every shortstop in the game, adding 20-40 to every shortstop's number, and then doing it in such a way that if you take (CH * ZR), round to a whole number, then take your rounded PM/CH, you wind up with EXACTLY the same ZR, no .831 vs .832 rounding difference or anything.


Those cubicle trolls are devious little buggers.

If you still think it's a data entry error


I never really thought it was a data error, but one should never underestimate the destructive powers of cubicle trolls. I also hadn't read Joe Arthur's #35 when I posted that.
   58. JoeArthur Posted: April 04, 2007 at 12:32 AM (#2323928)
bads85 - as a cubicle troll myself, I agree you're wise to beware our powers. Sorry not to give you explicit credit in #35; I couldn't be bothered to find the correct thread to do so.
   59. Chris Dial Posted: April 04, 2007 at 02:29 AM (#2324104)
There can be if the zones cover different areas. What this means for Wilson is that he missed more balls in the area that BIS counts as "in zone" but STATS does not.

Here's the problem. That wouldn't happen for a single SS (just one). And it's pretty clear from Dewan's work that BIS doesn't use zones that STATS doesn't. He says at The Fielding Bible he's within the STATS zones, and more granular within them.

Can't someone just ask him?
   60. bads85 Posted: April 04, 2007 at 02:40 AM (#2324125)
Sorry not to give you explicit credit in #35


Don't worry about something that trivial. I'd be worried if you had remembered my screen name.
   61. bads85 Posted: April 04, 2007 at 02:53 AM (#2324152)
Can't someone just ask him?


Would he give a satisfactory answer if someone did? After all, his system has been out for a while, and he still hasn't clarified it. In my very limited dealings via email with Dewan, his answers were somewhat unsatisfactory -- very congenial, however. Since we didn't know each other, I didn't pursue matters, lest I come across as some sort of email pest.
   62. Chris Dial Posted: April 04, 2007 at 03:04 AM (#2324171)
Hmmm, I haven't, but I figured Studes knows the man. Can't he just say "John, when you were running STATS ZR, were you including caught line drives as a play made? STATS appears to be doing so now."?

Doesn't Pinto know? Ask him.
   63. JoeArthur Posted: April 04, 2007 at 05:32 AM (#2324253)
Practically speaking, the rules for STATS ZR are instantiated in the programming code. To get a reliable answer about how STATS zone rating works, you need to look at the code. Barring access to read the code, you can reverse engineer it from comparison of the outputs to the inputs (as I sketched in #31.) It's possible that what Dewan thinks he asked the programmer to do and what the programmer did are two different things. [The Baseball Scoreboard acknowledged in print in two different years (1991,1997) that they had corrected bugs which caused miscounting of the opportunities in their previously published results. I have spot checked historical seasons for still active players at cnnsi and espn websites and compared them to the contemporaneously published totals in the 2000 & 2001 Scoreboard editions, and they haven't been restated, so it looks like the program hasn't changed since the publication of the annual Scoreboards ceased.]

If Dewan didn't write the code then, and doesn't have access to read it now, he can't give a decisive answer. I know Pinto did some of their programming in that era; there's a better chance that he knew and remembers the details correctly than Dewan does.

He says at The Fielding Bible he's within the STATS zones, and more granular within them.
I don't see where Dewan says that. He implies (p.227) that they both define their zones using a 50% plays made rule. Their pbp datasets are both attempting to capture the same reality (since 2002), but they are independent records of that reality and probably will have differences setting the edges of the zones because of that. This is the point of my last paragraph in post #35.

I can't resist a final note on line drive vs ground ball (#39). We know infield zones have a front boundary because really short ground balls are excluded. If they are intending to count line drives, zones also could use a back boundary. If they have a back boundary, it probably is in back of the fielder's usual position. In that case there would be room for a line drive to pass a fielder in the air, and land in the zone. For example, consider a 3B playing at standard depth, 110 feet from the plate, in the middle of the infield dirt. A ball that passes him in the air knee high will land ~125 feet from the plate, on the back of the infield dirt. A ball that passes him head high might land 170 feet from the plate, a ways onto the outfield turf. Taking at face value the phrase "line drives are counted only if they land in the zone", a back boundary for the zone with some distance cutoff is implied. I see no difficulty with taking the phrase at face value and conceiving the zone to be defined this way.
   64. JoeArthur Posted: April 04, 2007 at 07:24 AM (#2324273)
#36,#52
Excuse my ignorance of fielding stats, but does the fact that he had an assist here make it a "play made" rather than the line drive? Or are throws not counted in ZR?

Good point. My example with Kennedy is not incontrovertible because it's theoretically possible they are ignoring the catch and counting the assist on the doubled off runner. I believe the catch is counted and the assist is not, but you can't tell from this example.

So some others from 4/2. "Chances" refers to the cnnsi field containing ZR opportunities.

- Jhonny Peralta Cle SS. fielded 3 groundouts and caught a line drive. shown with 3 "chances" and 1.000 ZR = 3 plays made. Unambiguously contradicts my theory.
- Miguel Tejada Bal SS. fielded 4 groundouts, a popup and a lineout. shown with 6 chances and .833 ZR = 5 plays made. Either the lineout or the popup has to be counted. tentatively supports my theory.
- Garrett Atkins Col 3B. fielded 2 forceouts, a lineout and a popup. shown with 5 chances and .600 ZR = 3 plays made. Either the lineout or the popup has to be counted. tentatively supports my theory.
- Troy Tulowitzki Col SS. fielded 2 groundouts, 3 popups and a lineout. shown with 3 chances and 1.000 ZR = 3 plays made. too many popups, seems like it is the line drive which has to be counted. supports my theory.
- Dontrelle Willis Fla P. fielded 2 lineouts and a forceout. shown with 3 chances and 1.000 ZR = 3 plays made. Supports my theory.
- Edgar Renteria Atl SS. fielded 1 groundout and 1 lineout. shown with 3 chances and .667 ZR = 2 plays made. Supports my theory.
- Jose Castillo Pit 2B. fielded 2 popouts, a lineout and a lineout-DP with an assist; shown with 3 chances and ZR of .667 = 2 plays made. Matches my theory but ambiguous.
- Jose Bautista Pit 3B. started 2 GDPs plus 3 groundouts, a lineout and a foul lineout. shown with 6 chances and ZR of 1.000 = 6 plays made. I expected the foul lineout to count and predicted 7 chances. Seems at least 1 lineout had to be counted to reach 6 chances. Arguably supports my theory.
- Adam LaRoche Pit 1B. 2 popouts and a lineout. shown with 1 chance and ZR of 1.000 = 1 play made. too many popups, tentatively supports my theory.
- Salomon Torres Pit P. groundout, lineout. shown with 2 chances and ZR of 1.000 = 2 plays made. supports my theory.

10 players: Peralta's plays made appears to contradict my theory, and I don't have an explanation for that, but 8 of the other 9 exactly match my theory that line drives are counted. Bautista has one less play made than I expect, but that includes a foul lineout, a sufficiently rare play that the STATS ZR program may overlook it as a play made outside of zone. All 9 contradict the gb only theory (all 7 if you think the results for pitchers are irrelevant). I don't see any reason to muddy the waters with popups - none of the players' totals require counting any popups.
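Joe's comparison can be reproduced in two steps. First, recover plays made as round(chances * ZR). Then test each hypothesis against the box-score counts listed above. This is a sketch with a few stated choices. The tallies are transcribed from that list. Popups are ignored because no total requires them. Bautista's foul lineout is counted as a liner, as Joe predicted.

```python
# Box-score tallies for 4/2/2007 as listed above:
# (player, ground-ball outs, caught line drives, ZR chances, published ZR)
GAMES = [
    ("Peralta", 3, 1, 3, 1.000),
    ("Tejada", 4, 1, 6, 0.833),
    ("Atkins", 2, 1, 5, 0.600),
    ("Tulowitzki", 2, 1, 3, 1.000),
    ("Willis", 1, 2, 3, 1.000),
    ("Renteria", 1, 1, 3, 0.667),
    ("Castillo", 0, 2, 3, 0.667),
    ("Bautista", 5, 2, 6, 1.000),
    ("LaRoche", 0, 1, 1, 1.000),
    ("Torres", 1, 1, 2, 1.000),
]


def implied_plays_made(chances: int, zr: float) -> int:
    """Recover the integer plays-made total behind a published ZR."""
    return round(chances * zr)


def score_hypotheses(games) -> tuple[int, int]:
    """Count players matching 'ground balls only' vs 'ground balls + liners'."""
    gb_only = gb_plus_ld = 0
    for _player, gb, ld, chances, zr in games:
        pm = implied_plays_made(chances, zr)
        gb_only += int(pm == gb)
        gb_plus_ld += int(pm == gb + ld)
    return gb_only, gb_plus_ld


print(score_hypotheses(GAMES))
```

The ground-ball-only rule fits only Peralta. Counting liners fits 8 of the 10, missing Peralta and Bautista.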
   65. Foghorn Leghorn Posted: April 04, 2007 at 12:05 PM (#2324307)
Bautista has one less play made than I expect, but that includes a foul lineout, a sufficiently rare play that the STATS ZR program may overlook it as a play made outside of zone.

Just so you know, you just explained that the code is the code, and that play wouldn't (logically) be overlooked - if Bautista caught it, and it was a lineout, it should show up.

What is missing is that the foul lineout and Peralta's may not have been scored as "lineouts" by STATS, but rather as "popouts".

As noted, the inclusion of lineouts doesn't appear to impact the data to any significant degree. Why? Because everyone fields "about" the same number, and it is being added to the numerator and the denominator. This may impact data by 2-4 runs (all positive) wrt variance from BIS.

The much larger problem is: why are there another 30 plays for some SS, none for some SS, and *fewer* plays for some? 30 plays is representing 10% of BIP, and wouldn't it be strange that some just have no unfielded OOZ?

That's an issue to me.
   66. Foghorn Leghorn Posted: April 04, 2007 at 12:13 PM (#2324309)
If Dewan didn't write the code then, and doesn't have access to read it now, he can't give a decisive answer.

That's utterly preposterous. Of course Dewan could do exactly what Sean did (and I'm sure he had some level of quality control). It's every bit as likely that since 2001, STATS made a change to incorporate line drives.

I didn't write the code to Excel, but I can tell when it doesn't sum a set of numbers properly.
   67. AROM Posted: April 04, 2007 at 12:54 PM (#2324322)
What is missing is that the foul lineout and Peralta's may not have been scored as "lineouts" by STATS, but rather as "popouts".

Could be one of those 'fliners'.

The much larger problem is: why are there another 30 plays for some SS, none for some SS, and *fewer* plays for some? 30 plays is representing 10% of BIP, and wouldn't it be strange that some just have no unfielded OOZ?

That's an issue to me.


Me too. BIS is recording an average of 12 chances more (I think, don't have data handy) per SS per 1400 innings. STATS is obviously using slightly larger zones, but there are still a few edges where BIS counts a chance where STATS does not. I started looking at z-scores and the differences between the two systems' chances do not appear to be close to random. We could have significant differences between scorer decisions.
   68. JoeArthur Posted: April 04, 2007 at 01:04 PM (#2324325)
Not preposterous in the least. 1) Dewan had a lot of things to keep him busy besides this one statistic, even though it was "his baby." If he was checking the quality in detail, and the code was perfect, why did they twice publicly acknowledge bugs? 2) I am a programmer and work with qualitative specs all the time which don't fully comprehend the complexities of the actual data. It can take a lot of back and forth with the data and the person who issued the spec to get things exactly right, and that doesn't necessarily happen.

For instance STATS 1992 Baseball Scoreboard p.235:
"Technical note: Our thanks to Pete Palmer and Craig S. Tyle who pointed out a problem of double counting in our 1989 and 1990 zone ratings. The error did not significantly affect the overall rankings for those years. The three-year ratings in this book were recalculated with the 1989 and 1990 numbers restated."

Emphasis added. Note how they avoided revealing the size of the problem. Let's look at examples.
2B Sandberg: 1991 534/565; 1989-1991 restated 1473/1601
Originally published: 1989 691/747; 1990 677/716
Without restatement the three-year total would have been 1902/2028, so the double counting over two years added an extra 429 outs and 427 opportunities. [This was when they did credit an extra out for DPs]

CF Devon White: 1991 435/499; 1989-1991 restated 1147/1320
Originally published: 1989 446/498 (while crediting 426 putouts); 1990 331/389 (while crediting 302 putouts)
Without restatement the three-year total would have been 1212/1386, so the double counting over two years added an extra 65 plays made and 66 opportunities. Obviously the plays made were glaringly inconsistent with actual putouts.

Great quality control. Around '97, they admitted another error: occasionally counting "flys" [sic] caught by infielders. Sounds like they excluded popups but didn't think to exclude flies because they didn't anticipate the combination would occur.

"Just as likely" that a change was made since 2001? Who would have prompted a change like that? Good circumstantial evidence can be produced to test that theory as well...
   69. Foghorn Leghorn Posted: April 04, 2007 at 01:26 PM (#2324341)
1)Dewan had a lot of things to keep him busy besides this one statistic, even though it was "his baby." If he was checking the quality in detail, and the code was perfect, why did they twice publicly acknowledge bugs?

Um, how do you think they found the bugs to begin with? By checking. I'm not terribly surprised that in the first couple of years of the work there were some problems, but since 1997, they haven't noticed any, and two over 18 years seems pretty good quality control. Perfect? Who ever thinks it is perfect? Programmers are notoriously poor at perfection - if they did it right the first time, they'd be out of a job. Are there ever programs written without bugs?

"Just as likely" that a change was made since 2001? Who would have prompted a change like that? Good circumstantial evidence can be produced to test that theory as well...

Gee, someone at STATS who thought LDs should always have been counted, and when Dewan left, he changed it? I agree it should be tested.

As I stated, line drives caught aren't really a problem. They slightly increase ZR, but as they are a one-for-one proposition, they don't do much to change the data - neither runs saved, nor rankings. I appreciate your being hung up on this, but it doesn't appear to matter wrt the data.

The larger question is regarding the plays that aren't being recorded - why does Edgar Renteria have 8% extra plays in his zone and OCab have *0%*. Moreover, why does Jack Wilson have a lot less?

Last night, in the Mets-Cards game, someone hit a linedrive to Eckstein that landed just in front of him and he fielded on a hop. Guess what it is scored as? Not a line drive - a grounder. The batter grounded out.
   70. mgl Posted: April 04, 2007 at 05:50 PM (#2324627)
A couple of things. One, Tango found that whether reserves or regulars are better fielders (according to UZR I think), depends entirely on the position. Check out the thread on The Book blog.

Two, I found that the year to year correlations for line drives caught divided by line drive opps are at least as good as for ground balls for SS and 2B only. For some reason, that is not the case for 1B and 3B. Maybe a sample size issue, I don't know. So I include line drives for SS and 2B only. I started doing that in 05 I think (for UZR). For a line drive "opp" that is not caught, I think I use 250 feet (according to STATS) as the cutoff. IOW, I only track line drives that are either caught by a SS or 2B or those that land less than 250 feet from HP, according to the STATS data.

Finally, according to the STATS data, which I have (the raw data), in 06 Cabrera fielded and turned into at least one out 320 gb's, 27 line drives, and 47 pop-ups. So it appears that STATS is counting GBs and LDs.
   71. mgl Posted: April 04, 2007 at 06:11 PM (#2324658)
Actually, I track all line drives, wrt including line drives in the UZR for SS and 2B. Remember there are no "opps" per se in UZR. Every bip is a potential opp.
   72. AROM Posted: April 04, 2007 at 06:23 PM (#2324680)
Close enough to my numbers for groundballs and liners, but I get 87 popups from retrosheet.

Is that a typo, or maybe if it goes far enough you count it as a flyout instead of a popup?
   73. mgl Posted: April 04, 2007 at 09:09 PM (#2324864)
I'll have to check on the popups. I thought I included all air balls. That sounds like a lot of popups for one player to catch (1 per 2 games), but it is what it is. The important things are the ground balls and line drives.
   74. JoeArthur Posted: April 05, 2007 at 04:28 AM (#2325656)
"Um, how do you think they found the bugs to begin with? By checking. "

Discovery of the egregious double counting error was credited to Pete Palmer and Craig S. Tyle, as I explicitly quoted. Why do you think that Pete Palmer and Craig S. Tyle are evidence that STATS had quality control? Pete Palmer was not a STATS employee and probably Craig S. Tyle wasn't either. (2 minutes with google show a Craig S. Tyle as a lawyer, who had worked with an investment company for 13 years as of 2002, i.e. he started there in 1989, i.e. he wasn't a STATS employee in 1991. Craig Tyle is not a common name. Altavista peoplefinder finds 1 in the entire country.) These were bug reports from consumers. Dewan obviously did not hand check this data himself, or if he did, he didn't do a good job. Given that he was CEO of a struggling company (at the time) I assume the former.

"since 1997, they haven't noticed any [bugs], and two over 18 years seems pretty good quality control. "

I don't agree that the time span proves anything about quality control. Context is everything in evaluating this claim. Typically, code is quality controlled only a) before it is put into production, b) when it is upgraded, and c) by users noticing errors after it is put into production. STATS clearly failed with a). That is all that can fairly be concluded. And let's remember the original context here. I said Dewan could not be an authoritative source for how zone rating actually worked if he didn't write the code or still have access to it. [His intentions about how it "should" work, even if clear and complete (which we also don't know), do not guarantee that the programmer implemented his intentions. It is what the code actually does which determines why the raw data results in the zone ratings it does.] You said this was an "utterly preposterous" statement, because Dewan (or his team) would have quality controlled the results. That's a non sequitur in the first place - it doesn't bear on my statement one way or the other, even if true - and STATS' acceptance of the Palmer/Tyle discovery is conclusive evidence that their quality control was egregiously flawed or nonexistent at the time zone rating was developed. The only other times when efforts at quality control were at all likely to have occurred were when the method was redesigned in the late '90s and when the Palmer/Tyle double-counting bug was reported (though it is more likely that the specific bug was fixed than that the code was generally reviewed from top to bottom).

"The larger question is regarding the plays that aren't being recorded - why does Edgar Renteria have 8% extra plays in his zone and OCab have *0%*. Moreover, why does Jack Wilson have a lot less?"

I "explained" this in #35, but you have ignored my remark entirely. Why can't differences in opportunities of 20-30 chances between Dewan's and STATS' zones be explained away just by discrepancies in recording the position and direction of balls in play near the periphery of the zone? In one case, scoring is done at the ballpark, and in the other, through video review. The perspectives are different. I wouldn't expect them to routinely agree to the nearest foot.

"Just as likely" that a change was made since 2001" - [a claim in #66 which you did not alter in #69].

You have offered no basis for declaring this possibility "just as likely." Now that it has been clearly demonstrated that line drives were counted in 2006, it also means that they were counted in all earlier years as well, unless this part of the method was changed. You have hypothesised this change solely to preserve your interpretation of my quotation in #19 above from the Baseball Scoreboard 1998 about line drives landing in the zone being counted as opportunities - your interpretation that this phrase was a circumlocution for ground ball and really meant that line drives were not counted at that time. Obviously I don't think your interpretation of the phrase is at all likely.

"Last night, in the Mets-Cards game, someone hit a linedrive to Eckstein that landed just in front of him and he fielded on a hop. Guess what it is scored as? Not a line drive - a grounder. The batter grounded out."

What's your point? Of course I accept a one hopper as a ground ball too. It's not responsive to what I said about the zone having a back boundary in #63, so that it was possible for a line drive to be in the zone and not land in front of the fielder, so I don't 'get' the "guess what..?"

I look forward to your replies.
   75. Foghorn Leghorn Posted: April 05, 2007 at 11:33 AM (#2325689)
I "explained" this in #35,

Sorry, Joe, I don't count this:
"never mind the ordinary discrepancies in recording the exact direction of balls hit near the edge of the zone."
as an "explanation". It's an observation of what could be occurring.

You don't find it odd that the Angels agree 100% and others are off by 10%? That's not too large for you? I think being off by 5 is explained with what you wrote - not 10X as much.

And please explain why counting LDs matters.

Joe,
you said "If Dewan didn't write the code then...he can't give a decisive answer."

*That* is what I said was preposterous. Of course he can. Moreover, he can certainly clarify what was written in the Scoreboard and what it meant.

"His intentions about how it "should" work, even if clear and complete (which we also don't know), do not guarantee that the programmer implemented his intentions."

That says a lot for programmers. Do you routinely ignore the design instructions?

And yes, time since error *does* say something about quality.
   76. JoeArthur Posted: April 07, 2007 at 07:28 AM (#2328222)
Chris -

"You don't find it odd that the Angels agree 100% and others are off by 10%? That's not too large for you? I think being off by 5 is explained with what you wrote - not 10X as much." #75

This is very misleading.

Your original objection in #25 [the one I responded to] was more sophisticated than this; you did realize there that if STATS was counting LD "caught" and BIS was not, then STATS opportunities automatically would be greater by those extra plays made, and that any additional difference would be the difference in measurement of missed plays. Thus STATS Opps - extra STATS PM from line drives =~ Dewan PM (as you yourself state in #50) You chose to insist that any further difference would have to be missed line drive plays only, though AROM (#27) and I (#35) didn't agree, pointing out that the ground ball zones would not be defined identically between the systems [because of different granularity in recording position, as well as discrepancies in exactly locating the ball's position or direction -again you yourself seemed to understand this point about differences between the STATS and BIS zones when you wrote your post #6]. The average "unexplained" difference in opportunities among the 24 players AROM listed is +9 (11 by absolute value), 20 on the positive side, 4 on the negative, ranging from Renteria at +29 and Hanley Ramirez at +26 to Wilson at -8.

Remember that the STATS zone for SS is supposed to be vectors H-L: 5 vectors. Three are interior to the zone and two are at its edges. We can loosely extrapolate that 40% of the balls in zone are along vector H or vector L. Since the average number of opportunities for the full-time players examined in AROM's chart is around 450, that would put roughly 180 balls in the zone at the edge of the shortstop's zone. Along vectors G and M, we can loosely estimate another 180 balls "just outside" the zone. Let's say all 50 of the average plays made outside zone come from vectors G and M, so those balls recorded along G or M would be counted as opportunities anyway. And of the peripheral opportunities just inside the zone, many would be plays made and be counted whether they were mislocated by the reporter or not. Plays in the hole are much tougher [see 1991 Baseball Scoreboard], so the conversion rate at the periphery will be rather less than 80%. So perhaps 40 missed plays, perhaps more, on the periphery of the zone. The average SS in AROM's list would have 130 plays outside the STATS zone with potential to be located in BIS's zone with just a slight measurement error. Conversely, there would be 40 plays inside the zone with potential to be located outside the zone by BIS.
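This back-of-the-envelope estimate can be written out step by step. The `edge_conversion` value of 0.78 is an assumed stand-in for "rather less than 80%". The other defaults are the figures in the paragraph above.

```python
def borderline_balls(avg_opps: float = 450, zone_vectors: int = 5, edge_vectors: int = 2,
                     plays_made_ooz: float = 50, edge_conversion: float = 0.78):
    """Rough pool of missed balls near the edge of a shortstop's zone.

    Assumes balls are spread evenly across the zone's vectors (H-L) and that
    the two vectors just outside it (G and M) see as many balls as the two
    edge vectors inside it (H and L).
    """
    edge_in = avg_opps * edge_vectors / zone_vectors  # balls on H or L
    edge_out = edge_in                                # balls on G or M
    outside = edge_out - plays_made_ooz               # OOZ plays made are counted anyway
    inside = edge_in * (1 - edge_conversion)          # missed plays on H and L
    return outside, inside, outside + inside


outside, inside, total = borderline_balls()
print(f"outside {outside:.0f}, inside {inside:.0f}, total {total:.0f}")
```

With these inputs the pool comes to roughly 170 borderline balls. An average unexplained discrepancy of 11 opportunities is then about 6.5 percent of that pool.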

That is the correct context (average of ~170 borderline missed balls in play per shortstop) for comparison to the discrepancy of 11 measured opportunities between the systems, once caught line drives are recognized as part of the difference. I think that is a plausible rate of disagreement, and that these discrepancies are not big enough to invalidate the line drive theory, which anyway is pretty conclusively verified by MGL's post #70 using the actual raw STATS data. And that was the original reason you raised this pseudo-problem.

"And please explain why counting LDs matters." #75

I don't know what you mean by "matters." In a theoretical sense, I think a bigger information base is better than a smaller one, that skill is involved in catching line drives, and that if adding line drives reduces noise overall, you should add them. I said this in #19. In a practical sense, I have discussed line drives at length here solely to refute claims you have made: STATS doesn't count line drives (#6); CNNSI ZR data did not correspond to STATS ZR because you couldn't reconcile it (#17); "A "line drive that lands in the zone" *is* a ground ball to be fielded in the zone" (#22, followed by your contorted restatement in #41, which utterly ignored a paragraph in my #35, and your bizarre note about a one-hop grounder in #69, which likewise was completely non-responsive to my explanation in #63).

"you said "If Dewan didn't write the code then...he can't give a decisive answer."

*That* is what I said was preposterous. Of course he can. Moreover, he can certainly clarify what was written in the Scoreboard and what it meant.
"His intentions about how it "should" work, even if clear and complete (which we also don't know), do not guarantee that the programmer implemented his intentions."

That says a lot for programmers. Do you routinely ignore the design instructions?

And yes, time since error *does* say something about quality.


I said this clearly enough the first time so I don't think a restatement is going to change your mind, but I'll try: You are confusing what I'll now call the "platonic ideal" of what zone rating should be with the imperfect reality instantiated in code. You look at the published numbers and percentages, they are not generated by the ideal program in Dewan's head but by the actual program. The correct interpretation of those numbers, such as they are, depends on what the actual program does. I believe your idea about CEO Dewan micro-managing the text descriptions in the scoreboards or quality checking the program himself is not very likely, especially as the company grew in size. I already pointed out the egregious error discovered by outsiders in year 3 of the scoreboard after the '91 season(#63,68).

As for quality, let's now look at the other "bug" STATS acknowledged after the 1996 season, when they redesigned the infield zones: [1997 Scoreboard pp.205-6] "we also eliminated a bug which was causing the occasional fly ball to be erroneously charged against a fielder's zone." It was expressly stated that for shortstops this bug fix was the only change in the zones that year. What difference did this change make for shortstops? p206: the average went ... at shortstop from .883 (1995) to .935 (1996). That implies a 5.6% change in opportunities; as an example Jay Bell's restated opportunities for the prior year went down by 42 with no change to plays made. Again, this is not a
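The 5.6% figure follows from holding plays made fixed. Since CH = PM / ZR, the ratio of new to old opportunities equals the old ZR divided by the new one. A minimal sketch:

```python
def opportunity_change(zr_before: float, zr_after: float) -> float:
    """Fractional drop in opportunities implied by a ZR change when plays made
    are held fixed: ch = pm / zr, so ch_after / ch_before = zr_before / zr_after."""
    return 1 - zr_before / zr_after


print(f"{opportunity_change(0.883, 0.935):.1%}")  # 5.6%
```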

The descriptions of ZR in the scoreboard generally cannot be read as technical blueprints to ZR. They are imprecise and incomplete, and often are clearly cases of cut-and-paste from other descriptions, by no means independently derived from study of the program. After the initial outline in the 1989 scoreboard, which discussed double plays started and the handling of plays made on balls outside of zone, neither was mentioned again until 1995. As far as whether line drives were counted or not, there are two additional texts which are of interest (beyond the ones I analyzed in #19), and of course they contradict each other:
1996 Scoreboard p.212 [in a special article examining Roberto Alomar's ZR performance in detail]: "Zone rating is simply the total number of outs recorded by a fielder on line drives and ground balls as a percentage of total balls hit in his area of responsibility." The 2001 Scoreboard Glossary, however, claims that only ground balls were counted.

As for my personal practice with specs, I discussed that in #68, and I don't see any humor in your rhetorical question.

You can have the last word whenever you fairly characterize my comments and don't mislead about the positions you took before on which I commented. Suggesting that I missed your points, when I did not, is itself a mischaracterization.
   77. Chris Dial Posted: April 07, 2007 at 01:51 PM (#2328251)
Joe,
you keep talking about this magical code, but it's just summing columns in Excel (or the equivalent). There's nothing fancy about it (or there needn't be). Calculating things like the percentage of balls turned into outs in a zone, maybe, but taking the data as entered and summing a few columns isn't that sophisticated.

You say stuff like this:
"You can have the last word whenever you fairly characterize my comments and don't mislead about the positions you took before on which I commented"

But also write:
"You chose to insist that any further difference would have to be missed line drive plays only"

I *insisted*? Give me a break.

"For the average SS in AROM's list, he would have 130 plays outside the STATS zone with potential to be located in BIS's zone with just a slight measurement error."

You don't find it odd that some shortstops have a really large number and some have none?

"It was expressly stated that for shortstops this bug fix was the only change in the zones that year. "

It's not a "bug"; it's a programming error. Of course, I think of a bug as conflicting programming information that causes something not to work properly. Maybe it isn't that; maybe "bug" is just what programmers call their errors.

What you ignore in all of this is how long ago I said "I agree the data strongly indicates that."

Talk about mischaracterization. What are you even arguing about?

I exchanged email with Dewan and Pinto. Both said "it was designed to include GBs only, but it's possible caught LDs were counted." Neither was sure. Dewan said he would have to check the code, but obviously wouldn't be able to do that now. The same person designed both systems, and he considered counting caught LDs wrong in both cases.

All I care about is that the data is right. I don't like "accepting" data that has incomplete information. I'd like to see what Dewan calls his zone. Since exchanging emails with him, I certainly feel a lot better about BIS data, but I'd prefer more info on what constitutes BIZ (balls in zone) and OOZ (plays out of zone).

I like the Zone methodology approach (despite its wide criticism). I think that's the right way. Dewan designed the first one, and I think he'll manage a good one now.

I'd like to see something better for the Green Monster (and other walls), but I'll live.
   78. JoeArthur Posted: April 07, 2007 at 09:44 PM (#2328636)
you keep talking about this magical code, but it's just summing columns in Excel (or the equivalent).

I think it's significantly harder than that, and that's why big bugs were discovered twice. I don't know what individual fields are available to describe pbp in the STATS database, but here's some pseudo-code which illustrates the minimum likely complexity of the zone rating program, for a single position:

ss_PlaysMade = 0;
ss_Opps = 0;

Loop: for each BIP;

IF player X is playing SS
THEN
  play_made = no;
  IF fielded_by_pos = ss
  and hit_type <> FB and hit_type <> pop and hit_type <> bunt
  and safe_hit = no
  and error = no
  and out_recorded = yes
  THEN play_made = yes;
    ss_PlaysMade = ss_PlaysMade + 1;
  END IF;
  IF play_made = yes
  or (vector is between (H:L)
    and fielded_by_pos <> 3b and fielded_by_pos <> 2b and fielded_by_pos <> P
    and ((hit_type = GB and distance > 60) or (hit_type = LD and distance < 250)))
  THEN ss_Opps = ss_Opps + 1;
  END IF;
END IF; ## player X is playing SS

end loop;

More complicated than =SUM(A1:A130000), I'd say. And it's still not complicated enough. It won't handle "single - runner out because ball hits runner"; it won't handle "batter and runner safe on failed fielder's choice but then one is put out advancing an extra base"; it won't handle the CF making a play on a GB in the SS zone due to a 5-man infield. Accounting for those things makes the code more complicated still; how complicated depends on how much STATS did something special to make these cases easily distinguishable before the program was written. And I've taken for granted that the program doesn't need to keep track of in-game substitutions at each position to give credit to the right individual.
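For anyone who wants to run it, here is the pseudocode above translated into Python. It is only a sketch under the same assumptions: the record fields (`fielded_by_pos`, `hit_type`, `vector`, `distance`, and so on), the H through L vector range, and the 60/250-foot cutoffs are illustrative and hypothetical, not STATS's actual schema or zones, and like the pseudocode it ignores the edge cases just listed.

```python
from dataclasses import dataclass

# Hypothetical ball-in-play record. Field names mirror the pseudocode above;
# they are NOT the real STATS or BIS database schema.
@dataclass
class BallInPlay:
    ss_player: str       # who was playing shortstop on this play
    fielded_by_pos: str  # "ss", "2b", "3b", "p", "cf", ...
    hit_type: str        # "GB", "LD", "FB", "pop", "bunt"
    vector: str          # single-letter direction zone, e.g. "J"
    distance: float      # feet from home plate
    safe_hit: bool
    error: bool
    out_recorded: bool

SS_VECTORS = set("HIJKL")            # assumed: vectors H through L make up the SS zone
OTHER_INFIELDERS = {"3b", "2b", "p"}

def is_play_made(b: BallInPlay) -> bool:
    """Out recorded by the SS on anything except a fly, pop, or bunt."""
    return (b.fielded_by_pos == "ss"
            and b.hit_type not in ("FB", "pop", "bunt")
            and not b.safe_hit and not b.error and b.out_recorded)

def in_ss_zone(b: BallInPlay) -> bool:
    """Ball in the SS zone that neither 3B, 2B, nor the pitcher fielded."""
    return (b.vector in SS_VECTORS
            and b.fielded_by_pos not in OTHER_INFIELDERS
            and ((b.hit_type == "GB" and b.distance > 60)
                 or (b.hit_type == "LD" and b.distance < 250)))

def ss_zone_rating(balls, player):
    """Return (plays_made, opportunities) for one shortstop."""
    plays_made = opps = 0
    for b in balls:
        if b.ss_player != player:
            continue
        made = is_play_made(b)  # per-ball flag, recomputed for every ball
        if made:
            plays_made += 1
        # every play made counts as an opportunity, in zone or not
        if made or in_ss_zone(b):
            opps += 1
    return plays_made, opps

balls = [
    BallInPlay("X", "ss", "GB", "J", 120, False, False, True),   # in-zone play made
    BallInPlay("X", "cf", "GB", "K", 200, True, False, False),   # grounder through for a hit
    BallInPlay("X", "ss", "GB", "N", 130, False, False, True),   # play made outside the zone
    BallInPlay("X", "2b", "GB", "I", 110, False, False, True),   # taken by 2B, not charged
    BallInPlay("Y", "ss", "GB", "J", 120, False, False, True),   # a different shortstop
]
pm, opps = ss_zone_rating(balls, "X")
print(pm, opps, round(pm / opps, 3))
```

Like the pseudocode, this credits a caught line drive as a play made, which is one of the points in dispute in this thread.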

It's not a "bug"; it's a programming error. Of course, I think of a bug as conflicting programming information that causes something not to work properly. Maybe it isn't that; maybe "bug" is just what programmers call their errors.
Now what point are you trying to make? The slang term 'bug' was used by STATS itself. I merely quoted it. There's a standard distinction between specification bugs (e.g., Dewan not correctly specifying what the programmer needed to do, assuming it was Dewan) and coding bugs; both lead to unintended results. It's impossible to know from the outside which sort this was, and I don't understand what your point would be if we did know.

I *insisted*? Give me a break.
You don't find it odd that some shortstops have a really large number and some have none?

Again you are misleadingly non-responsive on both points. I explained "insisted" in #76 - you ignored the contrary explanation from AROM and me and non-responsively stuck with your own theory - I think that was a deliberate choice on your part - I call that "insist" and don't give you a break.

What you ignore in all of this is how long ago I said "I agree the data strongly indicates that."
Talk about mischaracterization. What are you even arguing about?


I'm arguing with you about disagreements, not this. You conceded on this point in #50 and yet still wanted to argue about what the definition of a line drive was and whether Dewan could be trusted to explain how ZR really worked and whether BIS zones had a serious discrepancy compared to STATS zones. I have been arguing with you about your continued statements after #50 on those points.

I'll close by partly agreeing with you - ZR is very simple in concept (not necessarily in execution!) and that makes it attractive, if the zones are an appropriate size. Its limits are fairly clear, and to some degree ZR systems can be independently quality-checked against free data like Retrosheet's. More sophisticated systems have been attempted, and they may well be more accurate, but their execution in code is much harder still, so without their code and without the precise data, it's hard to tell how well they are actually fulfilling their intentions.
   79. Chris Dial Posted: April 07, 2007 at 10:07 PM (#2328683)
whether Dewan could be trusted to explain how ZR really worked and whether BIS zones had a serious discrepancy compared to STATS zones. I have been arguing with you about your continued statements after #50 on those points.

So you're just arguing?

You can take all the BIP data, hit Alt, D, S (Excel's Data > Sort shortcut) to sort by zone and distance (see the outs-by-distance vector link in that article), and see whether each result was an out. It's not (or needn't be) that complicated.
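The spreadsheet step described here amounts to a group-and-count. A minimal Python sketch, assuming each ball in play has already been tagged with a zone letter, a distance in feet, and whether it became an out; the tuple layout and the 20-foot distance bucket are assumptions for illustration, and grouping stands in for the manual sort. It deliberately takes the tagging as given, which is the part the other side of this exchange says is hard.

```python
from collections import defaultdict

def out_rate_by_zone(bips, bucket_ft=20):
    """Group balls in play by (zone, distance bucket) and return the share
    turned into outs. bips: iterable of (zone, distance_ft, was_out) tuples,
    a hypothetical layout, not any vendor's actual format."""
    totals = defaultdict(lambda: [0, 0])  # (zone, bucket) -> [outs, balls]
    for zone, distance, was_out in bips:
        key = (zone, int(distance // bucket_ft) * bucket_ft)
        totals[key][0] += int(was_out)
        totals[key][1] += 1
    # sorted by zone, then distance, like sorting the spreadsheet columns
    return {key: outs / n for key, (outs, n) in sorted(totals.items())}

sample = [("J", 125, True), ("J", 130, False), ("J", 135, True), ("K", 210, False)]
rates = out_rate_by_zone(sample)
for (zone, bucket), rate in rates.items():
    print(zone, bucket, round(rate, 3))
```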

I explained "insisted" in #76 - you ignored the contrary explanation from AROM and me and non-responsively stuck with your own theory - I think that was a deliberate choice on your part - I call that "insist" and don't give you a break.

I didn't ignore it - it was hardly an in-depth explanation:
"That's not a big deal, it just means the two systems are using different zones."

Well, one set of zones is completely within the other.

You ignore AROM agreeing in #67 that these "leftovers" are a problem. You "insist" they are not.

But really, I don't think that you "insist" that, despite ignoring AROM's observation. You just think differently, and could be reasoned into thinking something else, no? Lighten up, Francis.
