Primate Studies: Where BTF's Members Investigate the Grand Old Game
Sunday, January 29, 2006
JoelW: Predicting UZR

Primate JoelW, Certified Red Sox Optimist, has been looking at various open-source defensive methods to see how they can be combined to predict how a given player will rank in UZR. He writes:

Below is the formula and the predicted ratings I have for SS and 2b.

Player      UZR   Predicted UZR
Ellis        15   13.89
Castillo      5    5.82
Counsell     26   26.56
Loretta       9    9.29
Hudson       12   11.53
Soriano      20   19.84
Young        21   22.07
Vizquel       5    8.47
Everett      21   20.288
Reyes        11   10.98
Jeter        14   13.142
Tejada       12   12.956
Cabrera      14   10.426

The equation:

That equation is complicated, but it would make some sense: Pinto and Dial pick up on things, but overstate the effects of certain skills relative to UZR. The MSE with those numbers was something like 2.5, which I thought was fantastic.

Joel has been bringing this up on several of the PMR threads, and rather than having the discussion spread out over those threads I thought it would make sense to set up a single discussion thread.
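For anyone who wants to check the fit against the listed ratings, here is a minimal sketch that computes the plain mean squared error between the UZR and predicted UZR columns. The numbers are copied from the post; the function names are just illustrative, and averaging over all 13 players is an assumption about how the error is summarized.

```python
import math

# (player, UZR, predicted UZR), copied from the ratings listed in the post
RATINGS = [
    ("Ellis", 15, 13.89),
    ("Castillo", 5, 5.82),
    ("Counsell", 26, 26.56),
    ("Loretta", 9, 9.29),
    ("Hudson", 12, 11.53),
    ("Soriano", 20, 19.84),
    ("Young", 21, 22.07),
    ("Vizquel", 5, 8.47),
    ("Everett", 21, 20.288),
    ("Reyes", 11, 10.98),
    ("Jeter", 14, 13.142),
    ("Tejada", 12, 12.956),
    ("Cabrera", 14, 10.426),
]


def mean_squared_error(rows):
    """Plain average of squared (actual - predicted) differences."""
    squared = [(actual - predicted) ** 2 for _, actual, predicted in rows]
    return sum(squared) / len(squared)


def largest_misses(rows, count=2):
    """Rows with the biggest absolute prediction error, largest first."""
    return sorted(rows, key=lambda r: abs(r[1] - r[2]), reverse=True)[:count]


mse = mean_squared_error(RATINGS)
print(f"MSE = {mse:.2f}, RMSE = {math.sqrt(mse):.2f}")
for player, actual, predicted in largest_misses(RATINGS):
    print(f"{player}: UZR {actual}, predicted {predicted}")
```

Regression software often divides the squared error by the residual degrees of freedom rather than by the number of players, so the figure quoted in the post need not match this plain mean exactly.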
Reader Comments and Retorts
1. Joel W Posted: January 29, 2006 at 05:17 PM (#1842568)
I should note that the hands/speed/accuracy ratings are all from Tango's Fans Scouting Report 2005.
I also should say that the position there just means 0 for second base and 1 for SS. It's a dummy variable, and it basically says "speed is more important for SS."
The big question is whether people think the connections are spurious, or if they are genuinely predictive of other players.
This is interesting work, but it suffers from small sample size (not your fault). You simply don't have enough UZR ratings to get significant results. Having good hands doesn't lower your UZR, but your regression says so. I'm going to do something similar in my big comparison to UZR article at THT, which will come out this week if David releases all PMR ratings by Thursday. I've already done the work at the positions for which he's released data, and the results are pretty interesting. I'll do something similar to what you're doing here, though I don't think I'll use scouting data.
Because I wanted to a) have a larger sample size/degrees of freedom and b) allow for the position to have different ratings. Basically, the ratings say, "add 24 for second basemen to the rest of the equation and add speed*.46 for SS"
DSG,
I agree that I don't have the sample size, although the R^2 and the t-values were very significant. However, I don't think you're interpreting the scouting results correctly. For one, UZR does measure hands in some sense: the ability to come up cleanly with the ball. However, more importantly, it doesn't have to for those numbers to be important. If Dial's and Pinto's methods overstate or understate the value of certain skills compared to how UZR values them, then those skills will be part of the error term in the regression. Adding in the scouting variables helps us see if those skills are actually valued in such a way.
All that would change if I took out the scouting variables is that we would have different coefficients on PMR and Dial (PMR isn't informative for CF, but yours is part of the equation I have for that). It would put the scouting variables back in the error term. We still get nice results, but they have a larger MSE. Again, the correlation with those numbers might be spurious, but if it is, I don't think the reason is that UZR doesn't measure them directly.
That's not what I meant. I just don't want to look at scouting numbers because I'm interested in what each metric tells us, without other information (otherwise I could throw things like SBs in the regression, which would correlate decently with UZR).
I'm sure you know this, BTW, but your numbers can be significant without meaning much. Your sample size is just too small. Mine ain't great either. FWIW, in my sample (n = 24), PMR correlates better with UZR than does Range.
Of course they can be sig. w/o meaning much. Anyway, I think we are both interested in some of the same things and have slightly different goals.
You stated that you are interested in what each thing measures, as am I, and it's a goal we share. In my CF analysis, your metric and Dial's were the ones that did the best, and I just dropped Pinto off as it increased my MSE and reduced my adjusted R^2. Anyway, basically I think the fact that your metric had a lot of correlation with 2b and SS numbers, but didn't add much when I added in Pinto's GB numbers means that most everything your numbers capture, his do, but more.
Anyway, I find that I am also just looking to be able to predict UZR numbers for the general population, which doesn't seem to be a goal of yours. If that's the goal, I do think the scouting data should be something we value, if it's the case that certain things aren't captured by our available measures. For example, I think that in my CF regressions you basically took your measure*.17 + 1.3*Dial's + .1*speed. That may be off, but anyway, that would make a lot of sense to me. Really fast players may be able to get to those really hard zones that UZR views as particularly hard to get to, and therefore as more valuable. Your measure is much more of an adjusted rate stat, which captures some important stuff, and the Dial rating uses Zone Rating to do something similar, but knowing a player's raw speed can help us to know that a player can get to really difficult zones.
I have UZR numbers. I don't need to predict them. What interests me though is how close different metrics come to measuring what UZR is looking at. What part of UZR does Range, ZR, etc. capture? That's what I want to know, and that's the point of the tests I'm going to do.
So my questions stand:
1) To what extent should I/we include scouting variables in this analysis and to what extent do people think that it creates a spurious correlation? Do others think it makes logical sense to include them?
2) To what extent are different positions interchangeable in this type of analysis?
3) Do people have more publicly available UZRs for this analysis outside of the ones in the GG articles that are referenced in Dial's article on ZR?
Trying to figure out a fitted regression to estimate UZRs ("predict" is not the right word) from our given public data seems quite important. On SS and 2b, I will have to look when I get to work, but if we leave out scouting variables, we get something that is just the combination of Dial and PMR, somewhat weighted. Obviously PMR (GB only, btw) does a good job of picking up UZR's ability to get to GB given various directional and speed components, and I suppose that Dial's method helps to describe other facets of the infield game better. I'd be interested to know how you think we can say which method measures which skills best, DSG?
Again, small sample size screws things up. Even my sample size is *too* small, if I were to include variables that correlate spuriously with UZR. I guarantee you that a model including just the major fielding systems will do better the next year than a model with those systems plus scouting information, because scouting information is not going to correlate with UZR so consistently (see the fact that your regression says good speed and good hands are bad). To best approximate unknown UZRs, you want to just use the other fielding models, IMO. Scouting information is more important for projecting future UZRs, because it captures (or we hope that it captures) *skill* that UZR might not due to sampling error. But with four defensive metrics, they should be capturing darned near everything UZR captures since they are both looking at directly what happened on the field. Performance and ability are not the same thing: scouting information is supposed to tell us about the latter, defensive metrics about the former (though since performance rests in large part on ability, they will also tell us a lot about that; the opposite, BTW, is NOT true).
.58*PintoGBruns+.81*Dial=UZR
This basically stays the same when I delete various players. This is a good thing, since it says that the model is basically predicting correctly. FWIW, the MSE on that model is about 6.2 and the adjusted R^2 is .84
Now, note above that the coefficient on Pinto's model goes up when I add in the scouting variables, but then the scouting variable is negative. This does not imply that the effect of hands is negative on UZR, but instead says, "UZR can be approximated more closely if we emphasize one model, but deemphasize a skill that we think it emphasizes too strongly over UZR."
The MSE when I add in the scouting variables drops to 3.8. Even when I remove some players from the equation, it does not change that much.
I agree that to be confident in the regression, for second basemen and SS, simply using the available fielding methods is better. I believe you are still wrong that we should not consider the fact that scouting data helps us know actual performance better than we might otherwise. Consider this thought experiment:
Suppose we know Runs Created from 1900 to 1910 for 50 players, but unfortunately walk numbers are not available for all of the players. Fortunately, we have scouting data on those players. Shouldn't we include the scouting data available to get a better idea of Runs Created? If the equation has a better fit to a model that includes coefficients on singles, doubles, home runs and some fraction multiplied by a player's "plate discipline scouting report," wouldn't it be better to use it?
I'll add this: I cannot confirm or refute my equations. I can however say that no matter who I remove in the CF regressions, adding speed and Instincts from Tango's Scouting Report undeniably makes the regression better.
So take this equation for five center fielders who were not on the publicly available CF UZRs (Hunter, Williams, Rowand, Andruw, Taveras, Edmonds, Logan, Griffey, Beltran, Kotsay, Finley, Wells):
.17*DSGRange + 1.25*DialZRMethod + .30*Speed(Tango's Scouting Report) - .22*Instincts(Tango's Scouting Report)
I'm mostly asking this because I'm interested: does this do better at predicting UZR?
Tango's project is admirable, but I don't think it should be taken too seriously yet. The people who are most qualified to judge players (i.e. see them play the most) are those who are most biased to like what they see. E.g. Seattle fans think Ichiro is the cat's meow; and since most people don't see too many Mariners games, only M's fans rate Ichiro's performance.
I'm curious about the huge size of the coefficient of the second base dummy. I'm not sure exactly what input data you used (runs above average? outs above average? total outs made?) for each data source, but the implication seems to be that an average second baseman would be 24 runs below average at shortstop. That seems a lot higher than what I've seen elsewhere. Can you confirm whether or not this is a conclusion your model draws?
Let me address each point.
1) I am using Blackhawk's PMR numbers for Pinto runs. I am using his per 4000 BIP stat. I am using Gassko and Dial per 150 stats. I am not using Dial and Rally because of the extreme collinearity.
2) The regression does not say instincts are bad. Here is the equation when all the scouting variables are removed, and we use just Dial, Gassko, and Pinto:
.534*Gassko + .896*Dial - .12*Pinto = UZR
For that equation there is an adjusted R^2 of .715 and a root MSE of 11. Pinto's rating does not have any significance. Note that if it did it would not say that having a positive Pinto rating is bad, it would say, "Gassko and Dial are picking up on something that Pinto's model says should not be as strongly accounted for in UZR"
Anyway, removing Pinto:
.49*Gassko + .80*Dial = UZR
Adjusted R^2 of .74 and a root MSE of 10.5.
First, I think this is independently important: We do not want to average the ratings to find out UZRs. When they agree on a rating, the UZR will be even more in that direction. If both say that the player is 30, UZR will say 40.
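To make the "don't average" point concrete, here is a small sketch comparing a straight average of the two ratings with the two-metric fit quoted just above (.49 on Gassko, .80 on Dial). The function names are illustrative only, and the inputs are assumed to be on the same runs scale as the ratings discussed in the thread.

```python
# Coefficients quoted in the thread for the two-metric center-field fit
GASSKO_COEF = 0.49
DIAL_COEF = 0.80


def average_estimate(gassko, dial):
    """Naive approach: split the difference between the two ratings."""
    return (gassko + dial) / 2


def fitted_estimate(gassko, dial):
    """Weighted sum using the regression coefficients from the thread."""
    return GASSKO_COEF * gassko + DIAL_COEF * dial


# Both systems rate the player at +30
print(average_estimate(30, 30))           # stays at the shared rating
print(round(fitted_estimate(30, 30), 1))  # pushed further out, near +40
```

Because the two coefficients add up to more than 1, agreement between the systems pushes the estimate beyond either rating instead of leaving it where they agree.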
Now, adding speed to the equation:
.38*Gassko + 1.09*Dial + .10*Speed = UZR
Adj. R^2 = .85 and the root MSE = 7.9.
Ok, we're still looking good. The explanatory power is up, all the coefficients are positive. As I said, if I remove some players, the numbers stay basically the same no matter who I remove. This is a good sign.
Now, adding instincts to the equation:
.17*Gassko + 1.25*Dial + .30*Speed - .22*Instincts = UZR
Adjusted R^2 is .97 and the root MSE is 3.5.
Note what happens: the importance of Gassko's rating falls, Dial's goes up, speed goes up, and instincts is negative. This does not mean that having good instincts is negatively correlated to UZR. It simply means that when people are judging a player's speed they are also judging instincts. They may be perceiving instincts as speed. In reality, without any other info, the Instincts coefficient provides no real information on UZR.
Suppose Gassko and Dial's ratings overrated the importance of getting to balls that were hit shallow, relative to how UZR rates them. Now suppose that when fans look at a player and judge whether he has good instincts they say, "he gets to those shallow balls so quickly, that's amazing, he must have good instincts" we could see how instincts would have a negative predictive value on UZR.
Now, suppose fans judge instincts by a player's reaction to both shallow balls and deep balls, but really getting to deep balls is a function of speed. Then we will have this high positive number on speed and this negative number on Instincts: this does not imply instincts are bad, it says instincts are overrated by other metrics WRT UZR. Given that Gassko's stat, for example, does not use different run values by zone, this would seem to make a lot of sense.
3) I should have explained that above a bit more: The 24 number is a function of the model. Let me break it down a bit better:
.72*pintoGB + .82*dial - .51*speed + .75*accuracy - .55*hands + .23*Firststep + .46*position*speed + 24 (for second base) = UZR
PintoGB and dial are both multiplied by those coefficients, all fine. Then, for the speed, accuracy, hands, and firststep ratings you simply do the same with the coefficients. Now, the final two parts: The number is going to be really negative at this point. I used a dummy variable for position: 0 to represent 2nd base, and 1 to represent SS.
Speed matters more for SS than it does for second base according to this model. What this says is, "to make the number correct for 2nd base, add 24 to the number you have. To make the number correct for SS, add .46*speed rating." It does not tell us anything, I don't think, about the relative differences between SS and 2b skill-wise. It simply says that speed (I think that's a stand-in for athleticism to some extent) is important for UZR at SS, but it is not important at 2b.
If I understand you correctly, position * speed will always be zero for second basemen, correct? So, the second baseman dummy variable only makes sense if you consider it a counterpoint to "position * speed" by including the first speed coefficient.
Say a SS and 2B have 50 speed and equivalent stats for everything else such that they equal a prediction of zero. Your model predicts their UZR like this:
for the SS: 0 + (-.51 * 50) + (.46 * 1 * 50) = 0 + (-.05 * 50) = -2.5
for the 2B: 0 + (-.51 * 50) + (.46 * 0 * 50) + 24 = 0 - 25.5 + 24 = -1.5
Right?
Anyway, I don't think that the conclusion is that speed isn't important for 2B. The conclusion I draw is that average-fielding shortstops are slightly faster than average-fielding second basemen.
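The position and speed arithmetic in this exchange can be sketched in a few lines. This assumes the signs implied by the discussion (a negative base speed coefficient of .51, a shortstop-only speed coefficient of .46, and a flat 24 added for second basemen) and treats every other term of the model as summing to zero, as in the example with a speed rating of 50. All names are illustrative.

```python
# Coefficients as read from the thread; signs are an assumption based on
# the worked example (speed 50, everything else netting out to zero).
SPEED_COEF = -0.51        # applies to both positions
SS_SPEED_COEF = 0.46      # applies only when position == 1 (shortstop)
SECOND_BASE_BONUS = 24.0  # added only when position == 0 (second base)


def position_speed_terms(speed, is_shortstop):
    """Contribution of the speed and position terms to predicted UZR."""
    position = 1 if is_shortstop else 0
    total = SPEED_COEF * speed + SS_SPEED_COEF * position * speed
    if not is_shortstop:
        total += SECOND_BASE_BONUS
    return total


def break_even_speed():
    """Speed rating at which the SS speed term equals the 2B bonus."""
    return SECOND_BASE_BONUS / SS_SPEED_COEF


print(round(position_speed_terms(50, is_shortstop=True), 2))
print(round(position_speed_terms(50, is_shortstop=False), 2))
print(round(break_even_speed(), 1))
```

On this reading the two adjustments cancel at a speed rating a little above 52, which fits the interpretation that the dummy mostly offsets the shortstop speed term instead of measuring a 24-run gap between the positions.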
.53*PintoGB + .78*Dial = UZR
No DSG?
I completely agree. This is the real place I'd like to go with this data. If somebody could help me compute run values for each team as a whole, I'd be more than happy to try to do it. PADE seems like a reasonable way to go, but I don't know how much I like it. I think Studes may have something, since he calculated "expected DER" in the Hardball Times annual.
Chris,
I used your method, DSG's and Pinto's. In my small sample, Pinto's method captured everything that DSG's did. It's not that DSG didn't have a solid correlation with UZR (it did), but instead it's that DSG didn't tell us anything useful that Pinto's didn't. There's a problem of collinearity, and by removing DSG I had a higher adjusted R^2 and a lower root MSE. I'm sure with a larger sample size, DSG's rating could add something, but with something like 7 data points, it was better off dropped.
In CF, it was DSG and your method that told us the most, whereas Pinto's didn't tell us all that much.
I think the major thing to take out of these preliminary first steps is that when the two ratings agree on something, we should not average them, but instead weight them somewhere in a range where the coefficients sum to about 1.3.

I agree with GuyM that predicting UZR is a questionable goal, but I don't know a better one. I don't think defensive statistics will get much better until they have usable batted ball trajectory data. No amount of parsing retrosheet data will produce that.
or, make a new model using old data to predict 2005 uzr, and then tell us who's going to win the gold gloves next year...
We can agree to disagree, and I'm sure there will be a lot more arguments once my article comes out. Without having done any tests yet, I think that Range + ZR capture most of UZR, probably with about equal significance. When I add in PMR, my guess is that Range and ZR will alternate in terms of significance at different positions.
Also, I'll gladly do your test, just give me a Scouting Report and non-Scouting Report equation. How much work you want to do is up to you, but it would be nice if you could do so for every position you have data for (2B, SS, 3B, CF).
I think that's the easy part. For each team, take (Team DER - Lg DER) * BIP * run value. CTD can probably provide an estimate of average run value for all positions; it should be around .75. I suppose building separate models for IF DER and OF DER would be even better, if possible.
It's finding a way to properly account for the impact of all the non-fielding factors (pitchers, opposing hitters, park, league, etc.) that strikes me as more challenging.
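A minimal sketch of the team-level calculation described here, reading the gap between team DER and league DER as a subtraction and using the suggested run value of about .75 as a default. The function name and the sample DER figures are hypothetical; none of the adjustments for park, pitchers, or opposing hitters are attempted.

```python
def team_fielding_runs(team_der, league_der, balls_in_play, run_value=0.75):
    """Runs saved (positive) or allowed (negative) relative to league DER.

    run_value is the assumed average run value of turning a ball in play
    into an out; .75 is the rough figure suggested in the thread.
    """
    return (team_der - league_der) * balls_in_play * run_value


# Hypothetical team: DER of .700 against a league DER of .690, 4300 BIP
runs = team_fielding_runs(0.700, 0.690, 4300)
print(round(runs, 1))
```

Summing a metric's individual player ratings by team and comparing them with this number is the comparison being proposed; the harder part, as noted, is adjusting both sides for non-fielding factors first.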
Since these metrics all adjust for team, league, park, etc. I think we need team numbers on that level. I'm pretty sure Studes did something on it in the HBT Annual.
Kyle,
I don't think we have 2004 UZRs anywhere, do we? Or the Dial ZRs? I know we have Gassko's 2004 ratings and we have PMR for 2004, but I don't think we have UZR or Dial.
DSG,
I will get back to you, as I have some work to do. I think the CF equation I posted above is the one I'd like to use. I'll add that I agree that we can get most of our information from a couple metrics, but I think there's a lot of value in decreasing the MSE from 8 to 4.
If you used Studes' stuff, you'd find an "r" of 1 with Range. That's because his system is mine on a team level. DER ain't great either because of everything that goes into a high/low DER. You don't *necessarily* want a defensive metric to match up with DER; you want it to tell you how a fielder truly did.
I guess I agree with dsg's point from above. the sample is just too small.
I did some of the same analysis. When I put in random numbers for speed and hands in my regression, the MSE goes up, and the adj R^2 goes down. When I drop some players, the numbers stay quite similar, even as my degrees of freedom approach zero. Thank you though. I know that the sample is small, and I know this is open to spurious correlation, which is why I wanted the thread to begin with: not because I thought that the analysis was correct, but because it's clearly something that we could stand to gain from knowing.
I think we basically want to know if we can figure out a player's true defensive value from the wisdom of defensive-metric crowds.
Here, post #22.
David: Agreed, you don't want to match DER straight up. But if you can control for differences in park, pitchers, and opposing hitters, then shouldn't the DER and the defensive metric match up pretty well? If pitchers have a limited ability to influence BABIP, and with about 4300 BIP each season, the fielding opportunities should be pretty similar at a team level. So, if one metric did a much better job than the others of predicting team DER, that would be strong evidence of its superiority. Don't you think?
I'll use those and look at what we know a bit more tomorrow.
It depends on what you want to measure. You can predict straight DER, simply by rating players based on their Outs made per BIP. On a team level, they will add up. But if you want to measure individual defense, what's important is not the addingup, it's the model. At least that's what I glean with a cursory glance.
did you look at my Strangeglove article?
You can set up a spreadsheet and generate all the DZRs (or ZRRs: Zone Rating Runs), well, for the last decade, and then see how well they predict.
It may be prudent to tweak the values based on LHB/RHB for IF. I was just thinking that you could do that for GBs based on some stuff MGL posted about the left side of the IF struggling with BIP from LHB.
I figure once I post all the "how to's" you young whippersnappers can do all the work and give me credit...
PMR seems to correlate pretty well. Dividing teams into 3 groups based on runs saved/allowed due to DER alone, I get (PMR/actual):
Best +33 / +40
Middle -1 / +9
Worst -32 / -50
Simply adjusting for park would make the fit even better (TX and CO both do much worse than PMR, as you'd expect). Also controlling for pitchers and opposing hitters would be even better.
The only reason I can see why this won't work is if all the metrics produce very similar estimates at the team level. Is that the case?
I missed this. Can you provide a link?
-- MWE
DSG's model basically starts with a team level estimate of DER and then breaks it down. So if you added up all his runs it would perfectly correlate to the adjusted DER.
What would really work is to use 2004 numbers for every player from each system, and then regress them on to team numbers for the 2005 numbers. Since players are moving around, we won't have the perfect correlation problem. Unfortunately, this would have to be done at the team level, and we'd lose out one of the more important questions: by position, which system does the best at telling us about a player's defensive value. I'm perfectly willing to think that Pinto's GB only system is great for SS, but not LF, whereas DSG is great for CF, etc.
That's cheating! No, seriously, then the method obviously wouldn't allow us to evaluate Range. Then again, isn't the real question whether to use ZR or PMR, or some weighted average of the two? If PMR has been done correctly, I don't see how Range can add information: PMR uses the same data source, but with more specific information for each BIP. Or am I missing something?
1. PMR
2. Range
3. UZR
4. ZR
Range and PMR might be switched around, BTW (I think they will be, in fact). But UZR is probably the best (well, definitely in my opinion). So such a test will tell you little. Where each system ends up depends solely on *what* you adjust for.
And Mike Crudale. (Sorry, couldn't resist)
-- MWE