Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Wednesday, March 09, 2005

Baseball Musings: Charting Range

Jon Daly Posted: March 09, 2005 at 07:33 PM | 73 comment(s)
  Related News: Sabermetrics

Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   1. With 17th Pick, From LA, 1k5v3L KcoLLoP Posted: March 09, 2005 at 09:00 PM (#1190918)

Of all the shortstops in baseball, he chooses David Eckstein. I was so ready to make fun of Capn Crunch.

   2. Dewey, Local Boy and Hero Posted: March 09, 2005 at 09:18 PM (#1190951)

Jeter’s 2003 graphs and 2004 graphs would be interesting…

   3. Cowboy Popup Posted: March 09, 2005 at 09:26 PM (#1190965)

If he did it with Jeter, I’m sure the haters and the fanboys would get together and ruin the thread and any sort of academic discussion, like we always do.

   4. Hendry's Wad of Cash (UCCF) Posted: March 09, 2005 at 09:31 PM (#1190976)

Would it make sense that smaller players have a smaller range?  Let’s say that Cal Ripken and Eckstein are positioned in the same place, and there’s a hard hit ball to the left or right.  Whether or not they get to the ball will depend upon how much ground they can cover in the time it takes the ball to reach the shortstop position.

Part of this is going to be a quick reaction, so if Eckstein “sees” the ball sooner off the bat and reacts quicker as a result, he starts off with an advantage (I don’t know if that’s true, just a hypothesis).  Once he starts covering territory, though—and particularly if a dive for the ball is involved—wouldn’t he have a natural disadvantage by having a shorter stride and shorter reach?  If Shaq and Billy Barty both dive for a ball, Shaq is going to be able to reach a ball much farther from his starting point than Billy is because he’s got the length of his torso and arms to help him out.

It just makes me wonder whether Eckstein is, well, short-changed by some of this stuff.  He fields well the balls hit at him, but his range drops off faster because he’s physically unable to cover that ground as well as a bigger man could and he doesn’t have enough of an advantage in terms of better reflexes (if he has them at all) that would give him the quick first step to make up the difference.

   5. Miko Supports Shane's Spam Habit Posted: March 09, 2005 at 10:16 PM (#1191046)

Would it make sense that smaller players have a smaller range?

Sure, other things being equal.

But are the other things equal? It’s probably easier to find a smaller guy with more than enough quickness and lateral movement to compensate.

   6. Walt Davis Posted: March 09, 2005 at 10:39 PM (#1191084)

Of all the shortstops in baseball, he chooses David Eckstein.

Wasn’t there a big gap between UZR and Pinto’s numbers on Eckstein?  I’d guess that’s why.

   7. Aspiring One-Armed Economist (6 - 4 - 3) Posted: March 09, 2005 at 11:12 PM (#1191148)

Very, very interesting stuff with lots of potential applications. 

Is Eckstein’s mediocre handling of line drives mostly a function of his height?  If so, was Ripken’s better-than-expected defense because he’s 6’4”?

   8. Prostetnic Vogon Steve Jeltz (Dan Lee) Posted: March 09, 2005 at 11:14 PM (#1191149)

If Shaq and Billy Barty both dive for a ball

This is the funniest mental image I’ve had in months.

A very good illustration of UCCF’s point.  But a hilarious mental image.

   9. mgl Posted: March 10, 2005 at 01:25 AM (#1191326)

Hmmm.  I second, third, fourth, fifth, etc., the comments on the site that praise the “coolness” of the graphical presentation.  Wish I had thought of that myself.

One of the BIG differences between UZR and PMR for the IF is that UZR considers the speed of the batted ball and PMR does not.  While they will of course even out in the long run, you simply CANNOT compare the two metrics without knowing how hard each BIP was hit to each fielder or at least what the “average speed” of a BIP was for each fielder.  Obviously, there is no comparison between a hard hit ball hit up the middle and a slow or medium hit ball up the middle. 

Honestly, PMR is like BA and UZR is like OBP or SA or OPS.  I don’t see how the two can be compared.  It is apples and oranges.

OK, this is turning out to be a critique of PMR.

As far as pop flies go, again, I don’t see how they can be included in an IF’er’s overall rating.  One, most pop flies are caught.  Two, virtually all the ones (90 something percent) on the IF are caught; who catches what has nothing to do with any player’s fielding ability.  At the very least, pop flies on the IF need to be eliminated from the data! Even pop flies in the OF, although some skill is involved, are suspect when included in this kind of analysis.  There is too much interaction among teammates.  So again, either these need to be eliminated from the analysis, or they need to be separated from the ground balls.  You never want to combine two sets of data when one set is much more reliable than the other in terms of estimating talent or projecting future performance!  For example, DIPS versus ERA.

Another point brought up by someone above is whether you want to “penalize” one player when a play is not made by him but is made by another player.  The answer is, “Yes and no!”

You cannot penalize a player fully, because (this is the “English” explanation as opposed to the mathematical explanation) there is a chance that the player could have made the play.  That is much different than if no one made the play.  If no one made the play, then there is no chance that any fielder could have made the play.  They need to be treated differently!  I don’t think that PMR does so.

Finally, the most interesting thing is the line drives.  What I said about pop flies applies to some extent to line drives.  You definitely don’t want to take line drive data too seriously.  Again, you don’t want to combine it with GB data!  All that does is corrupt the integrity of the overall evaluation (combining reliable data with unreliable data).  As with the pop flies, you want to keep them separate!  For example, if an IF’er were -10 runs on GB’s and +10 runs on pop flies in the IF, you do NOT want to report that he is an average fielder!  Clearly the -10 runs on GB’s are much more reliable than the +10 on IF pop ups.  If anything, you might say that he is a true -5 on GB’s and a true +1 in IF pop flies for a total estimated true fielding value of -4.  Not zero!
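
The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration: the reliability weights (0.5 for ground balls, 0.1 for infield pop flies) are assumptions back-solved from the -5 and +1 figures in the post, not measured values, and the function name is hypothetical.

```python
def estimated_true_runs(observed, reliability):
    """Shrink each observed component toward zero (league average)
    by its own reliability weight, then add the shrunken pieces."""
    return sum(observed[k] * reliability[k] for k in observed)

# Observed runs above/below average, taken from the example in the post.
observed = {"gb": -10.0, "if_pop": 10.0}

# ASSUMED weights: chosen only so that -10 -> -5 and +10 -> +1.
reliability = {"gb": 0.5, "if_pop": 0.1}

naive_total = sum(observed.values())                     # adds up to zero
shrunk_total = estimated_true_runs(observed, reliability)
print(naive_total, shrunk_total)
```

Adding the raw components says "average fielder"; weighting each by how much signal it carries gives the -4 from the post.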

OK, back to what was interesting about line drives.  The height thing.  I never thought about that.  The two things that are always mentioned about the “skill” portion of line drives are reaction time and positioning.  I have always felt that it was not worth introducing a variable that probably has a lot of random variance (line drives caught rate) to try and capture some reaction time and/or positioning skill, so I do NOT include line drives in UZR.  With the height thing, I am re-thinking my position.  If it is true that there is a significant difference in the rates among players as a function of height (maybe there is and maybe there isn’t), then perhaps a “height adjustment” should be used for line drives.  I think I would rather use a fixed height number for line drives than a player’s actual line drives caught, but I’m not sure.  If there is a significant relationship between height and line drives caught, then using a fixed number (actually a variable one based upon height only, using a regression equation) probably enables you to then combine it with GB numbers.  I don’t think I would be comfortable combining a player’s actual line drive numbers with GB numbers, even with a solid height/line drives caught relationship…

   10. Psychedelic Red Pants Posted: March 10, 2005 at 01:31 AM (#1191332)

Once he starts covering territory, though—and particularly if a dive for the ball is involved—wouldn’t he have a natural disadvantage by having a shorter stride and shorter reach?

A longer stride wouldn’t matter, and could be a disadvantage. Rather, if they both cover ground at the same speed, the guy with the shorter stride is less likely to be forced to field a ball awkwardly in mid-stride. If you’re trying to make the point that taller people are faster, I’m not sure that’s relevant when we’re looking at players on a case-by-case basis. The relevant question would be the speed of the relevant players. I’m significantly taller than Eckstein, but he’d likely win any race shorter than 3-4 miles.

   11. Jason Kendall's #6,530,420,771 fan (AS) Posted: March 10, 2005 at 01:55 AM (#1191370)

Wow, this is incredible. Primey nomination.

It seems to me that the Angels simply don’t have their SS catch pop flies in foul ground. IIRC Los Angeles Stadium of Angel is pretty normal-shaped, so that wouldn’t be the answer, but I can easily see a team telling its 3B (esp if it’s Figgins) to take all of those. If Eck caught some foul balls in the same shape but lower, I’d believe it’s his suckiness, but he’s catching basically none, which makes me think it’s a systemic thing and not an individual thing.

   12. CFiJ Posted: March 10, 2005 at 05:06 AM (#1191512)

Would it make sense that smaller players have a smaller range? Let’s say that Cal Ripken and Eckstein are positioned in the same place, and there’s a hard hit ball to the left or right. Whether or not they get to the ball will depend upon how much ground they can cover in the time it takes the ball to reach the shortstop position.

I’m with Miko and DNR in the “not necessarily” camp.

To use a volleyball example, it seems that the best diggers tend to be on the short side.  They can get lower to the ground easier and faster.  They don’t seem to have a problem covering ground compared to taller players because they are lighter and quicker (to generalize).  Since fielders generally have to field below the waist, these same qualities would seem to be advantageous.  IOW, a taller player may get to the fielding point a hair quicker than a shorter player, but they may lose that advantage because they have to reach further to field the ball, and come back further to throw it.

   13. Tango Tiger Posted: March 10, 2005 at 08:24 AM (#1191548)

MGL,

Pinto *does* consider batted ball speed.  His “expected” is not based on the number of balls converted at that vector, but it also includes the handedness of the pitcher/batter, the park, the batted ball speed, and maybe a few more.

For someone coming late to his PMR stuff (or for those with bad memories!), David should probably have an FAQ that describes his methodology.

Tom

   14. baudib Posted: March 10, 2005 at 09:52 AM (#1191582)

How many soft popups/Texas leaguers fall in for hits each year compared to the number of groundball or line-drives between the 5.5 and 3.5 hole?

   15. PhillyBooster Posted: March 10, 2005 at 10:33 AM (#1191625)

Finally, the most interesting thing is the line drives. What I said about pop flies applies to some extent to line drives. You definitely don’t want to take line drive data too seriously. Again, you don’t want to combine it with GB data!
. . .
I have always felt that it was not worth introducing a variable that probably has a lot of random variance (line drives caught rate) to try and capture some reaction time and/or positioning skill, so I do NOT include line drives in UZR. With the height thing, I am re-thinking my position.

Did you see the Cristian Guzman charts?  It appears that the only thing that makes him above average is his line drive skills.  Irrespective of whether it’s based on height or not, if a player shows that he is above average at line drives, it hardly seems right to discount it just because line drives have more variance.

   16. Tango Tiger Posted: March 10, 2005 at 11:06 AM (#1191680)

No, it is right to discount it, *if* that variance is based mostly on luck.

At the moment, neither MGL nor David has provided the data for us to say either way.

The answer is very simple: figure out the actual observed variance, and then figure out the expected variance if there was no skill involved.  The difference is attributed to non-luck (true talent of fielder, batter, pitcher, park). 

From the fielder’s perspective, the variance of the batter will be close to zero.  You can then account for the bias of the pitcher and park.

What’s left is the fielder’s skill (be it height, range, positioning, or reaction).
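
A minimal sketch of the first step described above (observed variance minus the variance expected from luck alone), in Python. Everything here is an assumption for illustration: the function name is hypothetical, the five fielders and their numbers are invented, and luck is modeled as a simple binomial at the group's pooled catch rate. It does not attempt the pitcher and park adjustments.

```python
from statistics import pvariance

def split_variance(outs, chances):
    """Return (observed, luck_only, non_luck) variance of catch rates.

    luck_only is the binomial variance p*(1-p)/n averaged over players,
    i.e. what the spread would look like if every fielder had the same skill.
    """
    rates = [o / c for o, c in zip(outs, chances)]
    pooled = sum(outs) / sum(chances)
    observed = pvariance(rates)
    luck_terms = [pooled * (1 - pooled) / c for c in chances]
    luck_only = sum(luck_terms) / len(luck_terms)
    non_luck = max(0.0, observed - luck_only)  # floor at zero
    return observed, luck_only, non_luck

# INVENTED data: line drives caught and chances for five fielders.
outs = [30, 45, 25, 40, 35]
chances = [100, 100, 100, 100, 100]

observed, luck_only, non_luck = split_variance(outs, chances)
print(observed, luck_only, non_luck)
```

If `non_luck` comes out near zero for real line drive data, that would support giving line drives little or no weight.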

   17. Mike Emeigh Posted: March 10, 2005 at 11:08 AM (#1191683)

Is Eckstein’s mediocre handling of line drives mostly a function of his height? If so, was Ripken’s better-than-expected defense because he’s 6’4”?

As MGL implies (I think), it’s probably heavily affected by random variation. You are talking about a relatively small number of chances to start with, and in most cases, a line drive that isn’t hit more or less directly at a fielder isn’t going to be caught anyway. Taking those two factors into account, you have something like 200-250 total line drives that might be fieldable by someone, and when you then divide those up by location on the field, you’re reducing the effect even more. You might be talking about no more than 5 line drives a season as the difference between a good line-drive man and a poor one, and that real difference is going to be devilishly difficult to tease out of the random variation.

-- MWE
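
To put a rough number on how a five-catch skill gap compares with luck, here is a small Python sketch. The 60 chances and the 30% catch rate are assumed values picked for illustration, not figures from the data; only the gap of 5 line drives comes from the post above.

```python
import math

def binomial_sd(chances, catch_rate):
    """Standard deviation of catches when every chance is an
    independent coin flip with the same catch probability."""
    return math.sqrt(chances * catch_rate * (1 - catch_rate))

chances = 60        # ASSUMED fieldable line drives for one shortstop-season
catch_rate = 0.30   # ASSUMED average catch rate on those chances
skill_gap = 5       # catches per season, good vs. poor, from the post

luck_sd = binomial_sd(chances, catch_rate)
gap_in_sds = skill_gap / luck_sd
print(round(luck_sd, 2), round(gap_in_sds, 2))
```

Under these assumptions the whole good-to-poor gap is less than two standard deviations of one season's luck, which is why it is so hard to separate from the noise.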

   18. Dr Love Posted: March 10, 2005 at 11:18 AM (#1191703)

Maybe this was discussed and I’m not picking it up, but I’d like to see the charts for 3Bmen when Eckstein played to get a better understanding of his range.  If Figgins/Glaus/et al were getting to more balls to their left than usual, then that would impact Eckstein’s numbers.  Or perhaps I’m a fool and that shouldn’t be taken into consideration for some valid reason.  Yeah, it’s probably that.

   19. Aspiring One-Armed Economist (6 - 4 - 3) Posted: March 10, 2005 at 11:25 AM (#1191716)

Would it make sense that smaller players have a smaller range? Let’s say that Cal Ripken and Eckstein are positioned in the same place, and there’s a hard hit ball to the left or right. Whether or not they get to the ball will depend upon how much ground they can cover in the time it takes the ball to reach the shortstop position.

Height isn’t reflected just in longer legs, but also closely correlates with increased wingspan.  Most people’s wingspans are approximately their height.

So to stick with the Eckstein/Ripken comparison, the 5’6” Eckstein (66") not only has shorter legs than the 6’4” (76") Ripken, but is also nearly a foot shorter in wingspan.  It seems to me that would give Ripken a decisive advantage over Eckstein on plays requiring extension of the body to reach a ball.  Eckstein may be able to compensate for his inferior height more easily on grounders, but line drives rely more heavily on one’s ability to extend out from one’s initial fielding spot with minimal time to reposition.

   20. Aspiring One-Armed Economist (6 - 4 - 3) Posted: March 10, 2005 at 11:29 AM (#1191720)

And BTW, I realize that Eckstein/Ripken is about as extreme a difference as it gets.  Most shortstops seem to be somewhere around 5’10"-6’2", and obviously a 4” difference isn’t as big a deal as a 12”.

   21. Mister High Standards Posted: March 10, 2005 at 11:32 AM (#1191725)

I’d like to see another year or two of Eck’s chart.  While I doubt that the Angels told Figgins to take charge of the infield on pop ups, I do think it’s possible that the Angels told Eck, or Eck on his own chose, to defer to a player who wasn’t familiar with the position and the roles of 3B and SS when handling pop ups.

   22. Tango Tiger Posted: March 10, 2005 at 11:51 AM (#1191752)

Pinto said:

Some commenters have pointed out that Eckstein is short for a ball player. I see him listed as 68 inches. Guzman is listed at 72 inches, so that may make a difference. However, I took a look at the relationship of height to PMR on line drives, and didn’t find anything indicating that it really mattered.

Again, based purely on anecdotal observation, I would say that:
(a) players do have a skill with line drives
(b) the data captured is such that we won’t be able to find it

When you look at Pinto’s results, you might say:
“ok, GB account for 70% of all plays, so I’ll weight it at 70%, and LD account for 15% of all plays, so I’ll weight it at 15%”.  But, that’s not the right way to do it.

If it’s determined that the LD numbers are 100% luck, then you would not weight the LD at 15%, but at zero %.  The weights are based on how much “truth” is in there.

   23. Chris Dial Posted: March 10, 2005 at 12:00 PM (#1191763)

Um, what Mike said about line drives.

They are probably too randomly distributed.  MGL may eventually be able to demonstrate as much.

And Athletic supporter has what I read the case to be - the Angels simply prefer to have Garrett Anderson (or Figgins or whomever) make the play. 

What is Eckstein’s “speed”?  What are his SBs and 3Bs?  I know SBs don’t mean speed, but they are a reasonable proxy.

If Eckstein’s speed is at least “average”, then I would doubt the reason he has almost no popups behind third is because he can’t get there, but rather he isn’t a “take charge” guy.

There is a defensive leader on the field - Andruw Jones will depress Raffy Furcal’s popup numbers, because he’s a take charge guy.

I strenuously agree that defensive ratings *must* be separated for infielders.  Popups on the infield are complete noise and do more harm than good to a rating.

I think LDs probably shouldn’t be there.  This work was probably covered by DA/DR 15 years ago, and I absorbed it somewhere (as I did with DPs initiated).

Also, Pinto is taking the zones and renaming them.  On a STATS diamond, the zones are:
3b line: C and -4
D and -3
E and -2 (these three zones are the 3B’s “area” in ZR and I believe represent one zone, “5”, in UZR.)
F and -1
G and 0
H and +1
I and 2
J and 3
K and 4
L and 5
M and 6

That doesn’t make sense.  Maybe Pinto has no zone “0”.  M should be zone 7 (for CF to be zone 8).  “Straight-away ss” is around the J/K line.

I’ll have to get my zone article written.

Had we but world enough and time…

   24. Aspiring One-Armed Economist (6 - 4 - 3) Posted: March 10, 2005 at 12:00 PM (#1191765)

Two seasons’ worth of data really isn’t going to tell us much of anything given how infrequent LD plays are.  If we had data on entire careers of players over the past five decades (using z-score to league average SS to account for the increase in size of middle infielders), then a hypothesis test showing no statistical relationship might be meaningful.  But at this point I don’t think we have nearly enough PMR data to start making even preliminary conclusions.

   25. Aspiring One-Armed Economist (6 - 4 - 3) Posted: March 10, 2005 at 12:02 PM (#1191771)

BTW, does anyone know what STATS charges for this data?

   26. Mike Emeigh Posted: March 10, 2005 at 12:11 PM (#1191788)

Maybe this was discussed and I’m not picking it up, but I’d like to see the charts for 3Bmen when Eckstein played to get a better understanding of his range. If Figgins/Glaus/et al were getting to more balls to their left than usual, then that would impact Eckstein’s numbers.

Probably not all that much, if Eckstein is playing a more or less normal SS position and not shading up the middle; under those circumstances when the 3B doesn’t make the play, the ball is unlikely to be an out anyway, and would probably not penalize Eckstein much at all. I think it’s far more likely that Eckstein’s relatively weak arm (something that I’ve seen mentioned a lot) makes it more difficult for him to make throws when he is going to his right, and he’s not making plays under those circumstances that other SS with stronger arms make more often.

Infielders don’t steal a lot of ground balls from each other; it is a very rare circumstance when more than one infielder can make a play on a ground ball. A fielder with good range may affect the “positioning” of nearby fielders, allowing them to cover a different area of the field than they would typically do, and their zone-based numbers would probably be distributed differently. Dial and I disagree on whether or not this actually happens - he says no, I say yes - and even if it does happen I’m not sure whether the impact would be to the benefit or detriment of the player’s ZR/UZR ratings. Assuming average range, the player should, under those circumstances, be making fewer plays than his counterparts in the “expected” zones, and more plays than his counterparts in “unexpected” zones, and it’s not clear whether penalties for failure to make “expected” plays would balance out gains from making “unexpected” plays.

-- MWE

   27. Mike Emeigh Posted: March 10, 2005 at 12:12 PM (#1191793)

BTW, does anyone know what STATS charges for this data?

Thousands of dollars.

-- MWE

   28. AROM wants you off his lawn Posted: March 10, 2005 at 12:26 PM (#1191816)

What is Eckstein’s “speed”? What are his SBs and 3Bs? I know SBs don’t mean speed, but they are a reasonable proxy.

I’ve timed Eck around 4.2 to 4.3 from home to first. 

(4.2 on infield grounders, 4.3 on walks :-)

Isn’t that about average speed for a MI?

I have no doubt he was a valuable player for the Angels the last 4 years.  He’s 30 now, I don’t think he’s a good bet to be worth what the Cards will pay him for the next 3 years.  He’s got just barely enough ability to get the job done, but any loss in reaction, quickness, etc. could be devastating for him, more than for most players I think.

Oh, he’ll come closer to justifying his 9-10 million than Cabrera will to his 32 million.  I still wish the Angels had kept him for one more year, they didn’t have to commit beyond one year.  I think the Cardinals would have been better off trading a B-/C prospect for him before he was released, and offering arbitration.

   29. Tango Tiger Posted: March 10, 2005 at 01:00 PM (#1191870)

The Fans’ Scouting Report has Eckstein with a speed of 55, meaning above average for a player, but slightly below average for a SS.  Overall, the fans see him as pretty much an average SS.

***

Chris, David is not “relabeling”.  David’s data source is BIS, and their data uses x,y plotting, and not “zones”.  You could of course convert the x,y plot into zones.

   30. Chris Dial Posted: March 10, 2005 at 02:39 PM (#1192030)

Thanks, Tango.

   31. mgl Posted: March 10, 2005 at 05:19 PM (#1192369)

Infielders don’t steal a lot of ground balls from each other; it is a very rare circumstance when more than one infielder can make a play on a ground ball. A fielder with good range may affect the “positioning” of nearby fielders, allowing them to cover a different area of the field than they would typically do, and their zone-based numbers would probably be distributed differently....and even if it does happen I’m not sure whether the impact would be to the benefit or detriment of the player’s ZR/UZR ratings. Assuming average range, the player should, under those circumstances, be making fewer plays than his counterparts in the “expected” zones, and more plays than his counterparts in “unexpected” zones, and it’s not clear whether penalties for failure to make “expected” plays would balance out gains from making “unexpected” plays...

I couldn’t agree more.  As Tango says, it is easy enough to figure out the “luck factor” in LD’s, IF popups, OF popups, etc. and then figure out how much to “weight” each set of data if you want to combine them.  Of course, the better way is to provide the raw numbers for each set of data separately and then the user can regress each one individually and then combine in order to get a “true” comprehensive number.  I suppose you can weight and combine and then regress that one final number to get a “true” value, but if you “weight” (which is essentially regressing) and then combine, it is not clear how much to regress (again) that one number.  That is why, BTW, a DIPS ERA is more predictive of future ERA than regular ERA.  In DIPS ERA, the pitcher’s components are weighted and then “combined.” The weight given to BABIP is zero. Technically, HR’s should have a smaller weight than BB and K’s, and singles, doubles, and triples (or just $H) should not have a weight of zero, but I don’t think DIPS or FIP gets that technical.

Later on today, I will publish some overall numbers for LD’s, pop-ups (IF and OF), and GB’s, so that we can see how much relative skill and luck there is!  Stay tuned to this thread.

Oh, and my apologies to David for forgetting that he does use speed of batted ball, batter L/R, park, etc.  My criticisms still stand though about using LD’s and pop-ups, especially pop-ups.  In fact, because there is such an arbitrary assignment of pop-up catching duty, including pop-up data in the overall numbers may render those overall numbers almost useless.  At least with line drives, there are not that many (as MWE said, most of them not right at a fielder have a zero “expected caught”).  The other criticism, and perhaps David has cleaned this up, is that his expected values, IIRC, are based on one-year average values.  Once you break things up into such small pieces (batted ball speed, zone, L/R, park, etc.) you simply CANNOT use one-year averages for the baselines or expected values!  Finally, as I said in my last post, you should not give the same penalty to a player when he does not make a play but another fielder does as when another fielder doesn’t, for the reasons I explained in that post.  I forgot how UZR handles that, but there is a mathematical way to do so.  For example, as MWE said, and I agree, since there are few areas of the field and few batted ground balls that more than one player can make a play on, if one player does not make a play on a ball that the average player at that position fields, say 20% of the time, but another player does make a play on it, you can assume that it was hit at the farthest edge of that particular “zone” such that an average player only makes the play 5 or 10% of the time (or even zero % for some zones and some positions).  Look at it this way and picture it in your mind.  If a fielder does not make a play and no one does, it is likely close to him but just out of his reach.  If a fielder does not make a play (in the same zone) but another fielder does, it is likely further from him.  Therefore, the penalty should be different…

   32. Tango Tiger Posted: March 10, 2005 at 05:49 PM (#1192440)

MGL, not to correct you again, but Pinto uses 3 years of data to establish his baselines.

***

As well, the BIS data is plotted on an x,y plane, where each point is a bit under 4x4 square feet.  So, it’s not really a good idea to think about it as “farthest edge of the zone”, though technically you could say that.

***

Oh, and line drive data should essentially be “0-20%” or “80-100%” expected, or close to it.  You should not have an expected chance to catch a line drive at 50%.

Look at Eckstein’s line drives: the chart shows that for balls hit pretty much at the SS zone, the LD is predicted to be caught 30% of the time.

What this tells me is that you’ve got 25% of the balls at the 80% (some fat a$$) to 100% (Ozzie Smith) chance of being caught, and you’ve got 75% of balls at the 0% (same fat a$$) to 20% (Ozzie Smith) of getting caught. 

(Math interlude, assuming the averages above are 90% and 10%, that gives you .25*.90 + .75*.10 = .30.)

However, the data source gives you no indication of the quality of the line drive being caught.  So, the “noise” in the line drive data is that you can’t distinguish the 0-20% balls from the 80-100% balls, and instead, basically, the system lumps them all in as 30% balls.  If a guy is lucky enough to have a lot of those 80-100% balls hit right at him, well congratulations, you get a huge bonus, since the system will basically say that you should have had only a 30% chance of getting to that ball.
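
The size of that unearned bonus can be sketched in Python. The 90%/10% catch rates and the 25%/75% league mix come from the math interlude above; the 40% share of easy balls for the "lucky" fielder is an assumed value, and the function name is hypothetical.

```python
EASY_RATE = 0.90   # catch rate on the 80-100% balls (midpoint from the post)
HARD_RATE = 0.10   # catch rate on the 0-20% balls (midpoint from the post)

def pooled_catch_rate(easy_share):
    """Catch rate a system sees when it cannot tell easy from hard balls."""
    return easy_share * EASY_RATE + (1 - easy_share) * HARD_RATE

league_expected = pooled_catch_rate(0.25)   # the 30% baseline
lucky_fielder = pooled_catch_rate(0.40)     # ASSUMED: 40% easy balls, same skill

# Outs per line drive credited to a perfectly average fielder
# purely because of the mix of balls hit at him.
unearned_per_ball = lucky_fielder - league_expected
print(round(league_expected, 2), round(lucky_fielder, 2), round(unearned_per_ball, 2))
```

An average fielder with that luckier mix would show up as +.12 outs per line drive without being any better than anyone else.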

The same criticism I’ve also leveled at UZR (and PMR) with “the routine play”.  They can’t qualify the routine play where the out is made 99% of the time.  Probably the best they can get to is 95%.  So, you get a lot of routine plays, you get a +.04 outs per play bonus that you don’t deserve.  And guess what, the difference between an average and great fielder is right around .04 outs per play.
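
A quick Python sketch of the routine-play bonus. The 99% and 95% rates are from the paragraph above; the two fielders and their routine-play counts (300 and 350) are invented for illustration.

```python
TRUE_ROUTINE_RATE = 0.99    # how often the routine play is really made
MODEL_ROUTINE_RATE = 0.95   # the best the system can resolve it to

def unearned_outs(routine_chances):
    """Outs above expected that an average fielder collects
    simply from the system underrating routine plays."""
    return routine_chances * (TRUE_ROUTINE_RATE - MODEL_ROUTINE_RATE)

# INVENTED counts: two equally skilled fielders, different luck in chances.
fielder_a = unearned_outs(300)
fielder_b = unearned_outs(350)
gap = fielder_b - fielder_a
print(round(fielder_a, 1), round(fielder_b, 1), round(gap, 1))
```

With enough plays the counts of routine chances even out between fielders and the gap shrinks, which matches the point that these cancel given a lot of data.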

Now, given A LOT of data, these routine plays will cancel out, eventually.

But, with line drive data, you don’t get that many plays to begin with, and the “qualification” of the type of line drive isn’t even good to begin with.  For these reasons, I expected the year-to-year correlation for LD “actual minus expected” rates to be close to zero.

   33. Tango Tiger Posted: March 10, 2005 at 06:02 PM (#1192465)

Oh, as for MGL’s regression thing, yes, you do have to be careful.  If you regress the LD, pops, and grounders individually, you are doing so without information from the other components.  That is, you are regressing each to the population mean of that component.

Say that we’ve determined that good groundball fielders are also good line drive catchers.  If you decide to regress groundball and line drive separately, you are ignoring this relationship.  The groundball out rate should be regressed somewhat towards the line drive rate of the player in question (and vice-versa).

On the flip-side, if you add everything together after doing the “new” weighting as I described, and then regress, you probably get closer to how you should do it.

At this point, a regression primer by a regression expert would be appropriate.

   34. Tango Tiger Posted: March 10, 2005 at 06:15 PM (#1192494)

Just thought of a better example for my LD Ozzie thing.

Think of HS ball, where for every MLB prospect a pitcher faces, he’ll face 5 guys who will never play beyond age 20.  The quality of competition is spread very wide.  If you don’t control for the quality of competition, you will have to assume that you faced an “average” HS player in your division.  Maybe you know that a particular HS always has some good prospects, so you bump up the quality of competition based on that.  (Maybe hard-hit line drives are caught at a different rate than medium-hit line drives.)

This is the problem with this kind of data: if the true spread in quality of player against or ball hit is very wide, then you better have enough parameters so that you can better qualify that player or ball.  If not, you lump this wide spread together, and you’ll get some crazy sample results.

If some kid faced Gooden 25 times in HS, and you don’t control for that, that kid’s stats will look horrible.

   35. mgl Posted: March 10, 2005 at 08:29 PM (#1192758)

Tango, the problem with weighting (regressing) the components separately and then lumping them together is this:

The weighting is not linear!  It is a function of sample size (I don’t know what the curve looks like, but I know it is a curve).  So let’s say that for one year, you have +10 runs for a player’s line drives and -10 for his GB’s.  If you decide to weight the LD’s by .5, and then add them up, you get a total of -5 (rather than zero with no weighting).  That’s fine for the one year.  But then let’s say that you have several years of this “combined” data.  You can’t add them all up!  If you do, the weightings are not correct!  Not only that, but the weightings are different depending upon how many BIP’s, or innings, or whatever, for that one year.  A .5 weighting for LD’s for a full-time player may be a .25 weighting for a part-time player (as I said, the weighting is a function of the sample size).  So you don’t want to do any weighting or regressing until you are “done” with the sample.  For example, if you had an infinite number of BIP’s for a player, or close to it, then assuming there were some skill in catching line drives, as long as there were no systematic bias (like there might be with pop flies - certain fielders may be “in charge”), there would be no need to regress or weight the line drive data; you could simply combine the data (line drives and GB’s) with impunity…
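
One common way to write a sample-size-dependent weight is n / (n + k). The post does not say what the curve actually is, so this form is an assumption swapped in for illustration, and the constant k = 300 chances is hypothetical, chosen so a full-time sample gets the .5 weight from the example. A short Python sketch:

```python
def regression_weight(n, k):
    """Share of the observed number to keep; the rest regresses to the mean.
    Assumed shrinkage form n / (n + k), not taken from UZR or PMR."""
    return n / (n + k)

K = 300  # HYPOTHETICAL constant: chances at which the weight reaches .5

full_time = regression_weight(300, K)    # one full season of chances
part_time = regression_weight(100, K)    # a third of a season
three_years = regression_weight(900, K)  # three full seasons pooled first

print(full_time, part_time, three_years)
```

Under this assumed curve, the same component gets .5, .25, or .75 depending on how many chances sit behind it, so summing numbers that were each weighted at a fixed .5 would not give the right multi-year figure.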

   36. Tango Tiger Posted: March 10, 2005 at 08:56 PM (#1192798)

I would not add them up as you are saying.  I would first add up the component multi-year data, and then regress them separately.

   37. Tango Tiger Posted: March 10, 2005 at 08:58 PM (#1192802)

And the regression factor is of course dependent on sample size.
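A minimal sketch of the order of operations under discussion: pool the multi-year data for each component first, then regress each component once. It assumes the regression weight has the form chances / (chances + k); the constants `K_LD` and `K_GB` and the season lines are hypothetical placeholders, not UZR figures.

```python
# Sketch: pool multi-year data per component, then regress each component once.
# ASSUMPTION: weight = chances / (chances + k). K_LD, K_GB and the sample
# seasons below are made-up illustrative values, not figures from UZR.

K_LD = 3000.0   # hypothetical regression constant for line drives
K_GB = 450.0    # hypothetical regression constant for ground balls


def regress(runs, chances, k):
    """Shrink an observed run total toward zero using chances / (chances + k)."""
    return runs * chances / (chances + k)


def pooled_then_regressed(seasons, k):
    """Add up runs and chances across seasons first, then regress once."""
    total_runs = sum(runs for runs, _ in seasons)
    total_chances = sum(chances for _, chances in seasons)
    return regress(total_runs, total_chances, k)


def regressed_then_summed(seasons, k):
    """Regress each season on its own sample size, then add the results."""
    return sum(regress(runs, chances, k) for runs, chances in seasons)


# (runs, chances) per season for one hypothetical fielder
ld_seasons = [(10.0, 80), (10.0, 40)]
gb_seasons = [(-10.0, 450), (-10.0, 200)]

ability_estimate = (pooled_then_regressed(ld_seasons, K_LD)
                    + pooled_then_regressed(gb_seasons, K_GB))
print("pooled, then regressed by component:", round(ability_estimate, 2))
print("LD only, year-by-year vs pooled:",
      round(regressed_then_summed(ld_seasons, K_LD), 2),
      round(pooled_then_regressed(ld_seasons, K_LD), 2))
```

Because the weight grows with sample size, regressing each season separately shrinks the line-drive total more than regressing the pooled total does, which is the mismatch mgl is warning about.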

   38. Kelly Posted: March 10, 2005 at 09:32 PM (#1192888)

Regarding whether or not players reposition themselves depending on the quality of their neighboring defender:  A graphical approach to presenting the data is ideal for this, because it will show a shift in the distribution of plays made as a function of zone compared to the average player.

Assuming that the SS positioning does depend on the quality of the third baseman, the interesting question is: could the repositioning of the SS towards 2nd base lead to an increase in the total number of plays made by SS+3B, but a decrease in the fraction made by the SS compared to the performance he would have next to an average 3B?

If you assumed that SS on average did a good job of positioning themselves to maximize the number of balls in play they could catch, any repositioning due to the performance of the 3B should reduce the number of plays the SS would make.

   39. mgl Posted: March 10, 2005 at 10:21 PM (#1193035)

If, for example, a 3B has exceptional range, and that enables the SS to move towards the middle, then if you don’t penalize the SS for plays not made by him but made by the 3B, then the SS will catch more balls (to his left) and it will look like he is better than he is.

If you do penalize the SS for balls he does not make to his right, but the 3B makes, then him being “out of position” will make it look like he is worse than he is (assuming that the normal SS position is optimal).  There has to be a compromise…

   40. Mike Emeigh Posted: March 10, 2005 at 11:04 PM (#1193175)

The problem is that the normal SS position may NOT be optimal for a particular team, based on the BIP distribution against them and the range of the adjacent fielders, and we have no objective way of determining which plays would typically be made by which fielders unless we start with an assumption that the optimal position is the normal position.

The real penalty is probably not when we penalize the SS for plays not made to his right that the 3B makes, but when we penalize him for not making plays that most other SS make because they’re positioned further toward the hole - but that he does not because he’s playing toward the middle. Looking at replays on MLB.com, I’ve seen a handful of doubles when the SS is shifted one way or the other and the batter hits a ground ball right through the normal SS position. They would have been routine GB outs if the SS had been positioned normally.

-- MWE

   41. mgl Posted: March 10, 2005 at 11:23 PM (#1193221)

Here is a little work on LD’s to the SS by height:

I computed LD UZR’s for all SS’s and then compiled the data by the height of each SS.  Right off the bat, looking at the year by year data (before I add it all up), it looks like there is enormous random fluctuation in the UZR’s.  Some years the taller ones are +6 runs per 500 chances; some years they are -8, etc.  Anyway, if we add everything up for 2001-2004 (4 years), and break players (SS) down into less than 5’11”, from 5’11” to 6’, and more than 6’, for a total of 3 groups, we get:

group I (less than 5’11")
117 player seasons
2817 chances
-.86 runs per 500

group II (more than 5’10” and less than 6’1")
196 player seasons
4092 chances
+.47 runs per 500

group III (more than 6’)
121 player seasons
2590 chances
+3.06 runs per 500

Hmmmm...Very nice…

Let’s try 2B:

group I (less than 5’11")
117 player seasons
2876 chances
+1.51 runs per 500

group II (more than 5’10” and less than 6’1")
171 player seasons
4377 chances
-.91 runs per 500

group III (more than 6’)
72 player seasons
1929 chances
+1.64 runs per 500

Hmmm...Not so nice…

IF we combine…

Group I
.34 runs

Group II
-.24

Group III
+2.45

So maybe the tall IF’ers are better at catching LD’s.  I’m not sure why they don’t add up (even when weighted by chances) to zero.
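For readers who want to retrace the combined figures, here is a small sketch that weights each position's rate by its chances. The numbers are copied from the SS and 2B groups above; the helper name `combine` is just an illustrative choice.

```python
# Chance-weighted combination of the SS and 2B line-drive rates quoted above.
# Each entry: (chances, runs per 500 chances), copied from the post.
ss = {"I": (2817, -0.86), "II": (4092, 0.47), "III": (2590, 3.06)}
second = {"I": (2876, 1.51), "II": (4377, -0.91), "III": (1929, 1.64)}


def combine(*cells):
    """Weighted mean of per-500 rates, weighting each cell by its chances."""
    total_chances = sum(ch for ch, _ in cells)
    return sum(ch * rate for ch, rate in cells) / total_chances


combined = {group: combine(ss[group], second[group]) for group in ss}
for group, rate in combined.items():
    print("Group", group, round(rate, 2))

# Overall rate across all six cells (the post notes it does not come to zero).
overall = combine(*ss.values(), *second.values())
print("overall", round(overall, 2))
```

Weighting by chances reproduces the three combined values to two decimals, and the overall rate across the six cells stays positive rather than netting out to zero.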

Now, even at +2.45 per 500 chances for the tall ones, consider that a 2B or SS only gets 75 or so chances per season, so that is only an extra .37 runs per season for the tall ones.

OK, how about all IF positions combined for the 4 years?

group I (less than 5’11")
1428 player seasons
44693 chances
+2.37 runs per 500

group II (more than 5’10” and less than 6’1")
3466 player seasons
107714 chances
+2.83 runs per 500

group III (more than 6’)
8742 player seasons
266592 chances
-2.13 runs per 500

That ain’t too good!  I give up.

BTW, in the last 4 years, Eck was 0, 0, +1, and +5 in UZR line drives!  Jeter was 0, -4, 0, and 0.  A-Rod was 0, 0, and 0 at SS.  Looking at the individual players, there is not a whole lot of variance among players.  In 2003, the “best” SS are +4 and the worst are -2.  So I guess even if you include LD’s it is probably no big deal, even if they are mostly luck.  The reason is that there are relatively few chances, and either a player catches them when they are hit right at him or he doesn’t when they are not hit right at him.  The ones that are just inside or outside an IF’er’s reach are few and far between.

I’ll do a year to year “r” for LD’s and popflies in a while…

   42. mgl Posted: March 10, 2005 at 11:29 PM (#1193226)

Looking at replays on MLB.com, I’ve seen a handful of doubles when the SS is shifted one way or the other and the batter hits a ground ball right through the normal SS position. They would

So what?  One, that is why I have adjustments for the handedness of the batters.  Two, part of a fielder’s UZR is where he positions himself or where his coach positions him.  If he is not playing optimally for his pitchers or in general, too bad.  I’m sorry, I don’t get your point.

If a SS is “out of position” because of another player, then that is a different story.  Then we don’t want to penalize him (or give him more credit than he deserves).  That’s what I was talking about.

That’s why when people start criticizing a PBP defensive metric because we don’t know in the data where the infielder was positioned, I say, “We don’t care!  It generally does not matter!  Where he positions himself is the same as his range!  That is his responsibility and his problem.”

   43. Los Angeles Waterloo of Black Hawk Posted: March 11, 2005 at 12:28 AM (#1193314)

Though I agree that positioning is part of range, that also forces us to acknowledge the fact that a team’s scouting, coaching, and philosophy is a part of his rating, and therefore that a switch in teams creates a great deal of noise in any particular player’s defensive record.

I suspect strongly that the Angel middle infielders in 2004 suffer in PMR because of the corner infielders.  The Angels play a CF at 1B, and they have an organizational philosophy of playing the 1B off the line even when a runner is on first.  I would be interested to see, in Pinto’s data, if Erstad’s positioning affects Adam Kennedy’s chances.

What’s more, I would suspect strongly that Erstad’s ability to hunt down popups has a negative effect on Kennedy’s PMR; Kennedy was +12 runs in 2004 by UZR, but was -2 in PMR!  That’s a difference of about 19 plays; I wonder if PMR is debiting Kennedy for Erstad making more plays, especially popups, which UZR dismisses (and, I believe, rightly so).

And I’m damn near certain that happened to Eckstein.  PMR has him at -19 runs in 2004, but UZR had him around average.  These charts, as others have noted, show that he caught basically no popups to his right.  Combine this with the shocking fact that Legs Figgins was +13 runs at 3B by PMR, fourth in the majors, and I have no doubt that Figgins—also a CF—catching popups knocks down Eck’s PMR.

I suspect that UZR does a better job of removing these team biases from a player’s record than PMR does.

   44. Vaux, A.B.D. Posted: March 11, 2005 at 12:54 AM (#1193325)

FWIW, I always play the 1b off first in computer games, with good effect.

LAWLOBH, that’s a fantastic point, and I’ll bet it accounts for a lot of the weird 1 year variations in UZR.  I wonder if someone can find out how many of them correspond to changes in manager or base coaches.

   45. Tango Tiger Posted: March 11, 2005 at 01:04 AM (#1193335)

Why don’t we talk about the scouting and philosophies of hitting approach of teams on their batters and pitchers?  We give the entire credit to the pitchers and batters, and we should do the same for fielders.

***

You also shouldn’t care if A-Rod, Rolen, Beltre, and Chavez force their SS to play more towards 2B, thereby putting them in a suboptimal position for SS chances, but optimal position for SS/3B chances.  You are measuring *value* and not skill. 

If you do care about skill, then you have no choice but to start splitting up the fielder’s contributions based on his positioning, catching, and throwing.

If a hitter is adversely affected by always having to take a pitch because some fast brainless runner doesn’t know how to steal, do we adjust his performance?  Well, you could.

If a fielder is adversely affected by having a 3B play way off the line, do we adjust his performance?  Well, you could.

But, you have to decide what it is you are trying to measure, value (backwards-looking) or ability (forward-looking).  And if you are measuring ability, then you have to measure it component-by-component, so you know how much to regress each component.  Positioning is a component.

   46. mgl Posted: March 11, 2005 at 01:06 AM (#1193340)

I think that you guys are overrating the effect of managers and coaches and switching teams.  As is usually the case, that can easily be checked, but I don’t think it’s worth the effort.  As I said, a player’s positioning, whether it be by choice or by edict, is included in his rating.  Again, I don’t think that one, positioning varies a lot among teams, and two, it matters that much anyway.  I have a lot of experience with UZR, and trust me, it takes a lot to change the ratings significantly…

   47. mgl Posted: March 11, 2005 at 01:16 AM (#1193357)

Come on Tango, we both know we are trying to measure value AND skill, at least I am.  The value is a sample of skill with some other things ("noise" if you will, at least with respect to the skill) thrown in for good measure.  The real issue is how much do these other things matter.  There is nothing we can do with the “random noise” or luck (other than regress appropriately) in order to convert value to skill.  There are things we can do with some of the other biases.  The question is does it really matter and is it worth the effort?  I am suggesting here that it is not worth the effort to try and tease out the effect of manager/coaching positioning (as opposed to a player’s own positioning which is part of his skill set) and it is hardly worth the effort of teasing out the influence of other players on an adjacent player’s positioning (although the latter is probably more doable).  As well, we can easily be resigned to the fact that when we “convert” value to skill, that we are including the influence of a team’s coaches and managers. Heck, a great example of that is Mazzone and the Braves pitchers.  When we estimate a pitcher’s skill on the Braves, we are apparently significantly including the skill of Mazzone (to the tune of around .5 runs in ERA, which is huge), according to the research.  If I am going to project a pitcher’s performance going from the Braves to another team (or vice versa), I most certainly want to try and include the “Mazzone effect.” If I am projecting Eckstein’s defensive performance (skill) going from the Angels to the Cards, I am not sure I care that much about the coaching/positioning, as I don’t think it is going to amount to a hill of beans (as long as you don’t include pop ups in the defensive metric of course - that really could screw things up when a player changes teams).

   48. Los Angeles Waterloo of Black Hawk Posted: March 11, 2005 at 01:18 AM (#1193362)

As I said, I believe that PMR is more susceptible than UZR to these factors, in large part because of its inclusion of elective plays (infield popups) and plays that appear to have as much to do with luck as skill (line drives to infielders).

As always, the distinction between value and skill is important.  Also, I think that though there are teammate/organization effects on pitchers and hitters, those effects are most likely less significant than those relating to defense, i.e. positioning, an organizational philosophy on where a 1B should play relative to the line, who should take charge on IF popups, whether the CF is a ball-hog, etc.  These are the soundwaves of the noise that make raw defensive numbers more volatile than their offensive counterparts.  I also think PMR appears to have a great deal of this noise in it.

   49. Los Angeles Waterloo of Black Hawk Posted: March 11, 2005 at 01:20 AM (#1193366)

Is the Mazzone .50 ERA effect documented, or is that part of MGL’s proprietary research?  Or do I misunderstand and that was hypothetical?

   50. mgl Posted: March 11, 2005 at 01:44 AM (#1193416)

OK, I did a correlation (linear regression) of line drive UZR from one year to another for 2B and SS for players who played at least 70 games in each of two adjacent seasons from 2001 to 2004 (average N games = 128).

For second base, we have 51 player pairs for the regression and correlation. 

For interval averages, here is what we have:

Interval, N players, first year UZR (ind. variable), 2nd year UZR (depend. variable)

First year less than -2 per 150, 14, -4.29, -1.00
from -2 to 0, 7, -1.57, -1.86
from 0 to +2, 10, .50, 1.20
greater than +2, 20, 3.65, .40

From this it looks like a mild relationship with a large regression coefficient.  In fact, that’s what we get.  The “r” is .206.

Frankly, I am surprised it is so high.  Compare that to GB’s, where the “r” is around .500, again, for players with at least 70 games per season.

I don’t have the confidence intervals handy, but I am going to conclude that there does appear in fact to be some significant skill in catching LD’s as measured by UZR, unless there is some systematic bias that I am missing.

For SS, the results are similar:

The interval averages are as follows:

First year less than -2 per 150, 19, -4.37, -1.89
from -2 to 0, 7, -1.57, -.86
from 0 to +2, 14, .29, -.07
greater than +2, 20, 3.55, 1.85

Again, looks like a mild correlation, and again, the “r” is .311 for 60 players with an average number of games of 132.

Now, given that there are only about .5 LD chances per game at SS and 2B, these are quite high correlations.  In fact, I am shocked.
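As a rough cross-check on the two interval tables, one can fit a line through the four interval averages, weighting each interval by its number of players. This is only a sketch on the binned averages: it is not the player-level “r” reported above (that needs the individual player pairs, which are not in the thread), and the function name `weighted_slope` is an illustrative choice.

```python
# Weighted least-squares slope through the interval averages quoted above.
# Each row: (n_players, first-year LD UZR, second-year LD UZR).
# NOTE: a rough check on binned averages, not the player-level "r".
second_base = [(14, -4.29, -1.00), (7, -1.57, -1.86),
               (10, 0.50, 1.20), (20, 3.65, 0.40)]
shortstop = [(19, -4.37, -1.89), (7, -1.57, -0.86),
             (14, 0.29, -0.07), (20, 3.55, 1.85)]


def weighted_slope(rows):
    """Slope of second-year on first-year averages, weighted by player count."""
    n_total = sum(n for n, _, _ in rows)
    mean_x = sum(n * x for n, x, _ in rows) / n_total
    mean_y = sum(n * y for n, _, y in rows) / n_total
    sxy = sum(n * (x - mean_x) * (y - mean_y) for n, x, y in rows)
    sxx = sum(n * (x - mean_x) ** 2 for n, x, _ in rows)
    return sxy / sxx


print("2B slope:", round(weighted_slope(second_base), 2))
print("SS slope:", round(weighted_slope(shortstop), 2))
```

Both slopes come out positive and well below 1, which fits the description of a mild relationship with heavy regression toward the mean.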

Next up, pop flies....

   51. mgl Posted: March 11, 2005 at 01:47 AM (#1193429)

No, no the .5 runs is documented!  There was a great article and thread on that by a good researcher, who used regression analysis, IIRC. My research using empirical data with no regression analyses backed up his results almost tit for tat.  Someone can probably link you to the article/thread.  It may have been on the business of baseball web site. I’m sure you can google it…

   52. mgl Posted: March 11, 2005 at 02:06 AM (#1193477)

Wow, in doing the pop up analysis, I stumbled on something interesting.  There is an enormous relationship between the height of a player and his pop up UZR!  Basically the taller IF’ers take all the popups.  That is going to suggest that while there may be a strong y-t-y correlation in IF popup UZR, it may have nothing to do with skill and everything to do with who takes charge on an IF popup, which then again, may have everything to do with the height of the IF’er.  I find that fascinating.  Has anyone noticed that the taller player takes most of the popups?  I suppose that makes some sense.

In any case, I just realized that on discretionary plays, correlations will tell us nothing.  If a player is a take charge (or designated pop up catcher) player one year, he will tend to be on the next year and vice versa.

Basically, for every year I run the data, the short fielders are like -10 to -15 UZR runs per 500 chances in IF popups.  They almost never take pop ups which is probably why Eckstein does not either.

I just don’t see any way of determining whether there is any skill in popups unless you do a complex analysis like looking at areas on the field where pop ups are not caught very often or something like that.

For now, I would say that you can safely include line drives in UZR or PMR (although they won’t change the overall numbers very much at all) but that you should very definitely NOT include pop ups, at least on the IF.  The tweeners, I don’t know about, for the IF’ers that is.  I think you still have take charge and discretionary problems there, but I’ll try and do some analysis on that if I have time…

   53. Los Angeles Waterloo of Black Hawk Posted: March 11, 2005 at 02:09 AM (#1193480)

Oh, I think it may have been the sabernomics site, that kind of rings a bell.  Thanks for the tip ...

   54. mgl Posted: March 11, 2005 at 02:18 AM (#1193491)

BTW, the height thing is even more marked for SS’s, although for some reason in the NL in 2004, it is reversed (the shorter SS’s take most of the IF popups).

Oh, and there is lots of fluctuation in IF pop up runs, unlike with the LD runs, where I said it is like -2 to +2 for almost all players, with a few +4’s.

For popups, in 2004, you have Eck with -4 runs, Tejada with -5, C. Guillen with +11, Valentin with -5, Jeter with +11, Aurilia -5, and Lugo +8.

In 03, you have D. Cruz with -6 (these are all actual runs, not per 150 games or 500 chances), Nomar with -4, Valentin with +6, Vizquel with +4, Infante +4, Berroa +5, Jeter +5, Lugo +6, and A-Rod -9. 

Remember these are all IF pop ups only, although they do include the “deep IF Retrosheet grids.” However, also keep in mind that the overall catch rate is .948, so we are basically talking about mostly discretionary pop flies.

Looking at the magnitude of these numbers and seeing the height factor (which tells you that these catches are mostly discretionary), I definitely do not want to see them included in an IF’ers rating!

For all the players mentioned above, for example, their overall rating is not going to mean much, when you have numbers anywhere from -4 and -5 to +5 and even +10 added in from discretionary popups.

That’s my story and I’m stickin’ to it…

   55. FJ Posted: March 11, 2005 at 03:59 AM (#1193595)

Is the Mazzone .50 ERA effect documented, or is that part of MGL’s proprietary research? Or do I misunderstand and that was hypothetical?

BTF thread link

F

   56. FJ Posted: March 11, 2005 at 04:42 AM (#1193614)

For popups, in 2004, you have Eck with -4 runs, Tejada with -5, C. Guillen with +11, Valentin with -5, Jeter with +11, Aurilia -5, and Lugo +8.

In 03, you have D. Cruz with -6 (these are all actual runs, not per 150 games or 500 chances), Nomar with -4, Valentin with +6, Vizquel with +4, Infante +4, Berroa +5, Jeter +5, Lugo +6, and A-Rod -9.

To prettify it up.

PUR = Pop Up Runs

*Lugo is listed as 6’1” @ ESPN and 5’10” @ bb-ref.... Infante is listed as 6’0” @ ESPN and 5’9” @ bb-ref. There were some other slight discrepancies in height between ESPN and bb-ref (but nothing else over an inch) for SS. All ht. values given are bb-ref.

If the vast majority of games weren’t played by the same player, I put the # of games in parentheses.

[PRE]
Player         Height   PUR   3B Height
Jeter          6’3”     +11   6’3”
Guillen        6’1”     +11   6’3” (94), 5’11” (73)
Aurilia        6’0”     -5    6’2”
Lugo ‘04*      5’10”    +8    6’4” (87), 6’3” (59)
Valentin ‘04   5’10”    -5    6’2”
Tejada         5’10”    -5    5’11”
Eckstein       5’6”     -4    5’9” (92), 6’1” (32), 5’10” (26)

Jeter ‘03      6’3”     +5    6’1” (80), 6’2” (54)
A-Rod          6’3”     -9    6’1”
D. Cruz        6’0”     -6    6’0”
Garciaparra    6’0”     -4    5’11”
Berroa         5’11”    +5    5’11”
Lugo ‘03       5’10”    +6    6’2” (73), 6’3” (50)
Valentin ‘03   5’10”    +6    6’2”
Infante (69)   5’9”     +4    6’3” (91), 5’10” (50)
Vizquel (59)   5’9”     +4    6’2”
[/PRE]

Some comments:

1) Is there a reason you chose Infante in 2003? He only played in 69 games. He was a 2B in 2004. Same question for Vizquel (he played ~150 games in 2004) so that COULD be a mix up.

2) I’m not really seeing the height correlation, but maybe that’s just me.

3) From the data, I’d be more inclined to say that each team has its system and works it a certain way....

F
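One way to put a number on point 2) is to compute a plain Pearson correlation between the listed SS height and PUR for the 16 rows in the table. This is only a sketch on a small sample that was chosen for large plus or minus PUR values, so it says little about infielders in general; the inch conversions and the helper `pearson_r` are additions for illustration.

```python
from math import sqrt

# (player, SS height in inches, PUR) transcribed from the table above;
# heights converted from feet/inches, e.g. 6'3" -> 75.
rows = [
    ("Jeter '04", 75, 11), ("Guillen '04", 73, 11), ("Aurilia '04", 72, -5),
    ("Lugo '04", 70, 8), ("Valentin '04", 70, -5), ("Tejada '04", 70, -5),
    ("Eckstein '04", 66, -4),
    ("Jeter '03", 75, 5), ("A-Rod '03", 75, -9), ("D. Cruz '03", 72, -6),
    ("Garciaparra '03", 72, -4), ("Berroa '03", 71, 5), ("Lugo '03", 70, 6),
    ("Valentin '03", 70, 6), ("Infante '03", 69, 4), ("Vizquel '03", 69, 4),
]


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


heights = [h for _, h, _ in rows]
pur = [p for _, _, p in rows]
r = pearson_r(heights, pur)
print("height vs PUR r =", round(r, 2))
```

On these 16 rows the correlation is weakly positive, roughly 0.1, so this particular table by itself shows little of the height effect.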

   57. Los Angeles Waterloo of Black Hawk Posted: March 11, 2005 at 05:14 AM (#1193626)

Wow, I just read that Mazzone thread.  Looking at the dates, I see that it was a time that I was slammed at work, which explains why I missed it all while it was happening.

That thread is marvelous, and easily the best thread of 2004.  If anyone, like me, missed it the first time around, they would be well-served to read it now.

   58. tomm Posted: March 11, 2005 at 05:16 AM (#1193627)

I’m glad MGL mentioned the Mazzone thread, because to me the work on that thread seems like a first step in showing the impact of coaching on fielding.

I’m not the one to do it, but a regression analysis on team UZR’s to show the impact of the manager could *maybe* go somewhere. I’d certainly assume the null hypothesis on this one, but maybe it’s worth doing.

   59. mgl Posted: March 11, 2005 at 05:48 AM (#1193629)

I just randomly selected a few players that appeared to have substantial plus or minus PUR’s.  That’s all.

If I track PUR’s by height for all 2B’s and SS’s, using the height info from my rosters, here is what I get for 2001-2004:

2B NL

Group I (5’10” or less)

PUR= -7.26 per 500 chances N=113 chances=1915

Group II (5’11” to 6’)

PUR=.34 N=206 chances=3704

Group III (6’ 1” to 6’3")

PUR=3.95 N=156 chances=3234

Group IV (more than 6’3")

PUR= -1.28 N=131 chances=2490

2B AL

Group I (5’10” or less)

PUR= -2.81 per 500 chances N=94 chances=2222

Group II (5’11” to 6’)

PUR=2.83 N=158 chances=3097

Group III (6’ 1” to 6’3")

PUR= -1.57 N=167 chances=3547

Group IV (more than 6’3")

PUR= 1.59 N=104 chances=1905

So for 2B, being short seems to imply not catching your fair share of pop-ups.  After that, there does not seem to be much of a pattern.

If we combine the NL and AL for 2B and shorten the categories, so that Group I is now below 5’10”, Group II is 5’10” to 5’11”, etc., we get:

2B NL and AL

Group I (5’9” or less)

PUR= 4.45 per 500 chances N=112 chances=2819

Group II (5’10” to 5’11")

PUR= -4.12 N=357 chances=7849

Group III (6’ to 6’2")

PUR=.14 N=571 chances=12483

Group IV (more than 6’2")

PUR= .02 N=776 chances=15524

Oh, well, I give up.

When I do a y-t-y correlation, I get a nice “r” of .390 for SS and .372 for 2B for 65 and 52 players, respectively, with an average # games of 126 and 128, respectively.

As I said, this DOES NOT tell us that there is skill in catching popups.  All it tells us is that there is some predictability from one year to another in terms of pop up UZR runs (PUR), which could very well (likely) be due to the same players taking charge from one year to another…

   60. AROM wants you off his lawn Posted: March 11, 2005 at 10:42 AM (#1193753)

Has anyone noticed that the taller player takes most of the popups?

Yes.  When 6’5” Troy Glaus caught popups near Eckstein, I believe Rex Hudler’s term was “Eckstein got out-rebounded”

Looks like most of our popup data just tells us what player takes the catch.  All is good as long as someone catches the ball.  If a noticeable amount of popups fall for hits, then I’d wonder if a player is hurting the team there.

   61. Tango Tiger Posted: March 11, 2005 at 11:11 AM (#1193790)

MGL, you said:

Interval, N players, first year UZR (ind. variable), 2nd year UZR (depend. variable)

First year less than -2 per 150, 14, -4.29, -1.00
from -2 to 0, 7, -1.57, -1.86
from 0 to +2, 10, .50, 1.20
greater than +2, 20, 3.65, .40

From this it looks like a mild relationship with a large regression coefficient. In fact, that’s what we get. The “r” is .206.

Frankly, I am surprised it is so high. Compare that to GB’s, where the “r” is around .500, again, for players with at least 70 games per season.

You’ve got an average of 13 players per class (14,7,10,20).  For each player, you have 0.5 chances per game (based on another post), and I’ll assume an average of 120 games.  This gives you a total of about 800 chances per class.  Your r is around .20.  Your regression towards the mean equation therefore is x/(x+opps), where x=3200.

For groundballs, you have about 450 chances per player, with an r of .50.  Your regression towards the mean equation is therefore x=450.

So, this tells me that if you had an infielder with 3200 line drive chances, then you would know as much about his LD skill as with him having 450 chances with grounders.

3200 line drive chances would be 40 years of baseball as a middle infielder!
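The arithmetic behind those constants can be retraced in a few lines, reading the regression equation as r = opps / (opps + x) and solving for x. The function name and the 160-game season used for the years figure are assumptions for illustration.

```python
# Solve r = opps / (opps + x) for x, the number of chances at which the
# observed rate and the mean get equal weight.


def regression_constant(opps, r):
    """Return x such that r = opps / (opps + x)."""
    return opps * (1.0 - r) / r


ld_x = regression_constant(800, 0.20)   # line drives: 800 chances, r = .20
gb_x = regression_constant(450, 0.50)   # ground balls: 450 chances, r = .50
print("LD x:", round(ld_x), " GB x:", round(gb_x))

# ASSUMPTION: about 0.5 LD chances per game over a 160-game season.
chances_per_season = 0.5 * 160
print("seasons to reach LD x:", round(ld_x / chances_per_season))
```

With these inputs the line-drive constant is 3200 and the ground-ball constant is 450, and 3200 chances at 80 per season is the 40 years quoted above.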

   62. AROM wants you off his lawn Posted: March 11, 2005 at 11:19 AM (#1193807)

In other words, Tango, we can have a good idea how good Julio Franco is on line drives, but need a bigger sample for everyone else.

   63. Tango Tiger Posted: March 11, 2005 at 12:33 PM (#1193927)

Rally: good one!  (Or, LOL in my younger days.)

   64. Tango Tiger Posted: March 11, 2005 at 12:38 PM (#1193936)

Oh, if you have one year of data, you’d regress groundball-UZR by 50% as MGL noted, and linedrive-UZR by 98%. 
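A short sketch of where those two percentages come from, using the amount of regression x / (x + opps) with x = 3200 for line drives and x = 450 for ground balls. The one-season sample sizes are assumptions: roughly 75 line-drive chances (0.5 per game over 150 games) and 450 ground-ball chances.

```python
def regression_amount(x, opps):
    """Fraction of the observed figure that is regressed toward the mean."""
    return x / (x + opps)


# ASSUMPTION: one season is about 450 GB chances and about 75 LD chances.
gb_regress = regression_amount(450, 450)
ld_regress = regression_amount(3200, 75)

print("GB: regress", round(100 * gb_regress), "percent")
print("LD: regress", round(100 * ld_regress), "percent")
```

In other words, after one season about half of a ground-ball figure survives regression, while almost none of a line-drive figure does.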

Unless I’m just completely misreading MGL’s charts, I see no need to include line drives in a player’s “ability” evaluation, though you can certainly include it in his “value” evaluation (if you decide that everything that happens on the field must be attributed to the players).

   65. Los Angeles Waterloo of Black Hawk Posted: March 11, 2005 at 01:19 PM (#1193999)

This is an inexcusable hijack, but it occurs to me that anyone reading this thread may be interested in taking part in a linear weights-based fantasy league.  I have details in the second entry of the Friday Dugout ...

... okay, sorry for the interruption, back to your regularly scheduled dose of MGL/Tango insight ...

   66. Cowboy Popup Posted: March 11, 2005 at 01:25 PM (#1194009)

I have nothing to contribute, but I really think this thread is pretty interesting. Just wanted to throw that out there.

   67. Kelly Posted: March 11, 2005 at 01:58 PM (#1194068)

I suspect that this is clear to the participants in this thread, but since I haven’t seen it mentioned explicitly, I thought I would.  Ideally, the 3B and SS function as a unit designed to intercept as many balls in play to the left side of the infield as possible.  What I think would be more interesting to determine is not whether the influence of the 3B on the SS can be removed to get the value of the SS independent of forces outside his control, but rather whether it is at all important.

I find an analogy valuable at this point.  Tony Parker’s perimeter defense should be affected by Tim Duncan’s presence in the post because Duncan’s presence creates a strong disincentive to drive to the basket.

MGL thinks this effect is of negligible importance in baseball, so this is probably not important…

   68. Tango Tiger Posted: March 11, 2005 at 03:33 PM (#1194280)

FWIW, Scott Fischal took MGL’s UZR for 2000-2003, and ran a regression of team-level SS to 3B, and reported the findings at my site.  He then did it with every single combination of positions.  IIRC, the correlations were weak throughout.

I can’t remember the filename on my site off the top of my head, and I don’t have FTP access from work, so I’ll post the link tonight from home.

Anyway, the likelihood is that optimizing positioning for SS/3B has little impact on each individual’s UZR.  I think I worked out a realistic example where a great Yankee fielding 3B (Brosius, Ventura, ARod) would have at most a 2-run impact on Jeter’s UZR.

I think positioning matters, but I don’t think the “stealing of plays” matters too much, at least for the IF.  It shouldn’t matter in the OF at all, if you can properly qualify the “routine” plays (i.e., you would just discard them altogether, since the routine play has a 99%+ chance of getting caught).

   69. Chris Dial Posted: March 11, 2005 at 04:03 PM (#1194346)

Wow, you got some stuff from Scott?  Awesome.  He is one of those old-school USENET sabermetricians that we never get input from, the dirty dog (attempt to gate).

I think positioning matters, but I don’t think the “stealing of plays” matters too much, at least for the IF. It shouldn’t matter in the OF at all, if you can properly qualify the “routine” plays (i.e., you would just discard them altogether, since the routine play has a 99%+ chance of getting caught).

cough cough cough

   70. Mike Emeigh Posted: March 11, 2005 at 05:31 PM (#1194510)

Looks like most of our popup data just tells us what player takes the catch.

More or less. This is one reason why Michael Humphries advocates ignoring infield putouts altogether in non-PBP based metrics.

That’s why when people start criticizing a PBP defensive metric because we don’t know in the data where the infielder was positioned, I say, “We don’t care! It generally does not matter! Where he positions himself is the same as his range! That is his responsibility and his problem.”

In my case, it wasn’t a criticism, it was an observation. It doesn’t make the metric “wrong”, nor does it require an adjustment to the metric to account for it.

In most cases, as MGL notes, proper positioning IS a part of the player’s basic skill set, and he should be penalized accordingly for plays not made if he’s consistently out of position. There are also obvious cases (LHB/RHB, in general; also runners on/bases empty/infield in/infield back kinds of stuff) where you can (and MGL does) make adjustments for known positioning differences.

I think, however, there are also differences in positioning that *are* based on the skills of the individuals involved, and that would change if you changed one of the principals. When Brosius, and later Ventura, were playing 3B for the Yankees, they did play further off the line than most other 3Bs did, allowing Jeter to move a couple of steps toward the middle. When I’d watch Yankee replays on MLB.com, I’d note that many left-side singles went squarely through an area where a SS positioned normally “might” have been able to make a play - and thus I suspect that Jeter was being penalized more heavily for not making plays on those balls than the typical left-side single costs the typical shortstop (most of those balls were being fielded by Brosius and Ventura). That’s a team positioning decision which (probably) helped Brosius and Ventura and (probably) hurt Jeter. I don’t expect any PBP metric to account for that, because you can’t really figure out based on the actual positioning what the “expected” play rate would be with normal range at both positions, since you have nothing against which to baseline such an expectation. That’s not a criticism of the metric either, but I think a practical recognition of its limits. There are all kinds of things that may fail to rise above the general level of “noise”, some of which might be important to capture and model if we could - but none of which we can actually capture in any practical way, and which may in fact not be worth the time to even try to figure out how to capture.

-- MWE

   71. Tango Tiger Posted: March 12, 2005 at 12:38 AM (#1195274)

Here’s the link to Scott’s post on UZR correlations between positions.
