Reconciliation - Getting Defensive Stats and Statheads Back Together
There is a lot of skepticism around defensive statistics. Wrongly so, in my opinion, but I am probably a little biased. Yes, they aren’t perfect, but there is no reason to throw the baby out with the bathwater. The idea behind the stats is solid. There are some differing opinions about the treatment of the data, but otherwise, the defensive stats community does believe we are in the right general area.
Except for a few people, and one of them is Colin Wyers. Colin Wyers, in case you didn’t know, is a sharp cookie, and he does tremendous statistical work testing other people’s stats and theories, in addition to his own developments. He is a terrific critical thinker. He is presently employed at Baseball Prospectus as some sort of stats guru, and has posted several articles questioning the foundation of defensive stats, from the height of pressboxes to the inconsistencies between UZR and +/-, despite their coming from the same data source. Let me be perfectly clear: I think Colin is smart, inquisitive and open-minded. I also think he is overly skeptical about the quality of defensive stats.
There is another Primate, LAW or BWV 1129, who has some serious questions about reconciling defensive statistics against DER (comment #14 in the Tango link below), or at least Runs Allowed. He suggests that there should be a pitching stat that complements the fielding stats and those should, at the team level, match runs allowed (in a manner similar to Runs Created matching runs scored), possibly FIP or some other fielding independent stat.
We’ve had an exciting discussion over at Tango’s The Book Blog, with Brian Cartwright, terpsfan, AROM, Tango and MGL, and others. Here at BTF, we had the same types of comments in the latest Colin Wyers BPro thread. Primate Harold posits some interesting thoughts about why the stats won’t reconcile (UZR + FIP =/= RA) but I claim the Dial DRS plus something like FIP might.
SIDEBAR: I just said Dial DRS. Dial DRS? Isn’t that DRS? Well, apparently not. BIS co-opted it, and John Dewan and Bill James outrank me, so once DRS gets posted at ESPN or Fangraphs (who I thought would have known better), and it isn’t the DRS I have been creating and referencing, and being referenced around the web for the last dozen years, then well, I lost the acronym DRS. Up mine. So, Dial DRS or DDRS is where we are today. FWIW, BIS says it was accidental, and I believe them. It doesn’t really excuse Fangraphs, who completely knew about my work, but they are just posting what they are given. Bitter? No, because that’s just the tip of the iceberg. /SIDEBAR
I have been trying to reconcile DDRS and DIPS since DIPS came out and Ron Johnson suggested that should be done to verify that the numbers have solid foundation. I struggled and couldn’t do it, but I was using ERA of HBIP. And a few other “Chris isn’t that smart” stats, and I kept being very far off in outs.
This offseason, Tango made a post about prediction contests. I wanted to generate a set of pitcher projections that was more accurate than others, since that is an area where all projection systems struggle. So I thought, what if I take pitchers and use *their* HBIP, but adjusted for their hits allowed in defensive zones? I thought that I needed to look at each pitcher’s Hits on Balls In Play Not In Zone. HBIPNIZ. Because balls in zone depend on their defensive players, and if one guy has Nate McLouth and another has Carlos Beltran, then one guy looks better than the other, even on the same set of BIP allowed. So I worked backwards from team ZR.
Please bear in mind that I am an open source researcher. I do not think what I am about to post is completely right and I may have made some incorrect assumptions or calc errors. I will argue that I did not, but I am well aware I may have. I EXPECT this community to take what I put forth here and expand and improve on it, either with better data or improved thought processes. I am here to provide the first building block. Who knows – maybe we only need one.
Here are the steps as I have begun:
1. Take the team’s ZR chances and Plays Made. This represents the responsibility of the fielders, NOT THE PITCHER. That isn’t a perfect nor completely correct assumption, but we’ll talk about that later.
2. Subtracting Plays Made from chances gives me the number of Hits on Balls In Zone (HBIZ). Interestingly, for 2009, the NL average was 477, with a range of 398 (SFG) to 583 (HOU).
3. With HBIZ, I take HBIP and subtract HBIZ, yielding HBIPNIZ. Now we are talking *pitching*. The pitching staff that allowed the fewest HBIPNIZ was LAD with 710, and the staff that allowed the most was PHI with 861. That’s interesting right there. For the curious, SFG was third with 730 HBIPNIZ allowed, but incredibly, HOU had the fifth-lowest HBIPNIZ. Talk about victimized by poor defense.
4. Now I have HBIP and HBIPNIZ, and thus I create a ratio for each team. The team with the lowest ratio? Houston. Perhaps there is some park factor looming here. The highest is the Phillies. The average here is 0.623. So 62% of hits are on balls not in zones. That makes sense, since line drives make up most hits and most line drives aren’t in anyone’s zone.
5. So now on to the individual pitchers. So I take each pitcher’s line and I calculate their PAR, or “Pitcher Allowed Runs”. Dammit, I *have* to be the first with that!!! PAR is calculated by:
=((BB-IBB)*0.34+HBP*0.25+IBB*0.31+(H-HR)*HBIPNIZ*0.56+HR*1.44)-((BIP Outs)*0.09+K*0.098)
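For the league, using linear weights, the average value of a HBIP is 0.56 runs (weighted average of singles, doubles and triples). In the formula, HBIPNIZ is the team ratio from step 4 (HBIPNIZ divided by HBIP), not the raw count.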
Notice how my defensive stats mesh completely with DIPS.
6. Next is FAR, or Fielding Allowed Runs, for this specific pitcher. That is calculated (although advanced versions of this statistic would take actual BIP behind this specific pitcher, which I do not have) by: =(1-HBIPNIZ)*0.56*(H-HR). Again, 0.56 represents the average value of an HBIP across the league.
7. The first test is PAR + FAR = TAR (Total Allowed Runs) correlation to actual RA. For the league, summed by each individual pitcher, r^2 is 0.972. That’s pretty strong, but perhaps I haven’t done anything that would make it not be strong. But it is strong.
8. The differences, computed as RA - TAR. Ricky Nolasco seems to have allowed about 20 more runs than this metric says he should have.
Top 5:
Pitcher          DIFF
Ricky Nolasco   20.51
Juan Rincon     11.45
Chad Gaudin     11.29
Craig Stammen   10.52
Hiroki Kuroda    9.94
That got low in a hurry. Nolasco is a clear “What?” miss here.
Going the other direction:
Pitcher            DIFF
J.A. Happ        -24.01
Adam Wainwright  -22.20
Doug Davis       -19.37
Joe Blanton      -18.94
Matt Cain        -18.43
JA Happ? Seems like he was due for a collapse, which agrees with ZiPS anyway. Likewise, Adam Wainwright is crashing and burning. Wait, what? Well, I don’t know how predictive it is – it’s meant to be descriptive.
9. To calculate PAR+ (park and playing time adjustments), I calculated each pitcher’s PAR/IP and the league PAR/IP, and then put in the Baseball-Reference Pitching Park Factors, and divided by 100. The league leaders were:
Name             Age  Tm      IP   PAR+
Tim Lincecum      25  SFG  225.3  42.75
Chris Carpenter   34  STL  192.6  35.55
Javier Vazquez    32  ATL  219.3  35.06
Danny Haren       28  ARI  229.3  32.84
Josh Johnson      25  FLA  209.0  29.81
Crazy set of leaders, huh? After Johnson there is a 20% drop to the next tier.
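10. Now I have team PAR, team DRS and team Runs Allowed. If I combine the pitching and defensive run prevention numbers, do I match the difference between a team’s runs allowed and league average? In the table, Run vs Avg is the team’s runs allowed minus the league average of 771, and TRS works out to team PAR minus DRS, within rounding. Let’s see: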
Tm   Sum of R  Sum of PAR by IP  DRS  Run vs Avg   TRS
BAL       876               104  -24         105   127
BOS       736               -24  -54         -35    30
CHW       732               -52   -8         -39   -44
CLE       865                49  -31          94    79
DET       745                52   71         -26   -19
KCR       842                -4  -46          71    42
LAA       761                41   25         -10    16
MIN       765                11    6          -6     5
NYY       753               -42  -11         -18   -31
OAK       761               -27   16         -10   -43
SEA       692               -97    6         -79  -103
TBR       754               -26   10         -17   -37
TEX       740               -18   11         -31   -29
TOR       771                34   28           0     6
Avg       771
Correl: 0.891
This correlation is 0.89. Goodness, that worked out well. Of course, the NL wasn’t quite as good, coming in at r = 0.74. I will have to expand this to see if 2009 was fluky good or bad, and check a few more seasons to see how it works out. I think I have 2007-2008 lined up, but as you can see this isn’t a small amount of work.
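Steps 1 through 7 can be sketched in a few lines of Python. This is a minimal sketch, not the spreadsheet behind the numbers above: the function and field names are invented for illustration, the team and pitcher inputs in the usage lines are made up, and it assumes that HBIPNIZ in the PAR and FAR formulas is the team ratio from step 4.

```python
from dataclasses import dataclass

# Run values copied from the PAR formula in step 5.
W_UBB, W_HBP, W_IBB, W_HBIP, W_HR = 0.34, 0.25, 0.31, 0.56, 1.44
W_BIP_OUT, W_K = 0.09, 0.098


def hbipniz_ratio(zr_chances, plays_made, team_hbip):
    """Steps 1-4: share of a team's hits on balls in play that were not in any zone."""
    hbiz = zr_chances - plays_made   # in-zone balls not turned into outs
    hbipniz = team_hbip - hbiz       # hits the fielders are not charged with
    return hbipniz / team_hbip


@dataclass
class PitcherLine:
    bb: int        # all walks, intentional ones included
    ibb: int
    hbp: int
    h: int
    hr: int
    bip_outs: int
    k: int


def par(p, ratio):
    """Step 5: Pitcher Allowed Runs."""
    charged = ((p.bb - p.ibb) * W_UBB + p.hbp * W_HBP + p.ibb * W_IBB
               + (p.h - p.hr) * ratio * W_HBIP + p.hr * W_HR)
    credited = p.bip_outs * W_BIP_OUT + p.k * W_K
    return charged - credited


def far(p, ratio):
    """Step 6: Fielding Allowed Runs behind this pitcher."""
    return (1 - ratio) * W_HBIP * (p.h - p.hr)


def tar(p, ratio):
    """Step 7: Total Allowed Runs."""
    return par(p, ratio) + far(p, ratio)


# Made-up inputs, for illustration only.
ratio = hbipniz_ratio(zr_chances=2500, plays_made=2023, team_hbip=1265)
line = PitcherLine(bb=60, ibb=5, hbp=5, h=190, hr=20, bip_outs=400, k=180)
print(round(ratio, 3), round(par(line, ratio), 1), round(far(line, ratio), 1))
```

One thing the sketch makes visible: under this reading, PAR + FAR does not depend on the ratio at all, because the ratio only decides how the 0.56 runs per non-HR hit are split between pitcher and fielders.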
This is where you come in: what are the Next Steps? Lots of these match up very well, but there is some factor I am missing. I think it *could be* infield fly balls (IF FBs). Not every team will see enough to reconcile all of these differences, but I believe it will tighten up the numbers. Oddly, I couldn’t find just a straight count of popups by team.
Back to the pitcher being responsible part. A key theory for DIPS and FIP is that the pitcher isn’t responsible for HBIP (much). Therefore, all plays fielded are the responsibility of the fielders, and the pitcher gets zero credit. Or, as I noted last year, the pitcher is responsible for about 70% of plays made. That’s the bare minimum anyone playing gets to make. I referenced it the other day - scroll to post #30 for the dirty details.
And…..go!
Chris Dial
Posted: July 22, 2010 at 02:26 PM |
25 comment(s)
Reader Comments and Retorts
1. Chris Dial Posted: July 22, 2010 at 05:29 PM (#3596358) The front page is completely blank for me. Let's bump this again.
Oh and you really ought to sue. This is the umpteenth time you've had to change the name of your work. In the meantime, can I suggest DDRS?
EDIT: And now that I check the linky I see you're already using DDRS.
The other obvious thing to try is HBIPNIZ/groundball and HBIPNIZ/flyball, with their distinct weights for 1B/2B/3B.
yes, but I think HBIPNIZ are going to be mostly line drives. At BBRef, you can see that LDs are the most hits by a huge margin.
We'll see if those other options move the needle.
The first test is PAR + FAR = TAR (Total Allowed Runs) correlation to actual RA. For the league, summed by each individual pitcher, r^2 is 0.972. That’s pretty strong, but perhaps I haven’t done anything that would make it not be strong. But it is strong.
If you're looking at raw numbers rather than rates, then this correlation is meaningless. Pitchers with lots of innings will have high figures for both TAR and RA figures; that's what you're measuring with correlation. Anyway, one more reason to stick to teams for the moment.
For the teams, how about RMSE rather than correlation? Or a scatter-plot? The picture is much more illuminating than a summarizing number or two.
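As a rough check along those lines, here is a minimal Python sketch that computes both the Pearson correlation and the RMSE from the Run vs Avg and TRS columns of the AL table in the post. The helper names are invented for illustration, and the sketch assumes those two columns are the pair being compared.

```python
import math

# (team, Run vs Avg, TRS) copied from the AL table in the post.
TEAMS = [
    ("BAL", 105, 127), ("BOS", -35, 30), ("CHW", -39, -44), ("CLE", 94, 79),
    ("DET", -26, -19), ("KCR", 71, 42), ("LAA", -10, 16), ("MIN", -6, 5),
    ("NYY", -18, -31), ("OAK", -10, -43), ("SEA", -79, -103), ("TBR", -17, -37),
    ("TEX", -31, -29), ("TOR", 0, 6),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rmse(xs, ys):
    """Root mean square error between actual and estimated values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


actual = [t[1] for t in TEAMS]
estimate = [t[2] for t in TEAMS]
print(f"r = {pearson_r(actual, estimate):.3f}")
print(f"RMSE = {rmse(actual, estimate):.1f} runs")
```

On these 14 rows the correlation comes out at the 0.891 quoted in the post, and the RMSE is about 25 runs per team.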
Is your spreadsheet available somewhere? I'm not following your steps well by reading; I think seeing the formulas and numbers will help (I can re-create it myself a bit later).
Are you suggesting that all HBIZ are the defense's responsibility? Or maybe you don't think that's literally true; is it your intent to treat them that way in this metric?
I agree that this isn't a perfect assumption; in fact, for each ball in a zone, we could say (for descriptive, not predictive purposes, which is what I believe we're doing here) that the pitcher is responsible for the average rate at which that ball is turned into an out, and the fielder is responsible for the balance between that and what actually occurred (positive or negative).
Here's a zone where the out rate is 50%. The ball goes there. The pitcher gets debited for .5 of a hit. The ball is a hit: the fielder gets debited .5, and the sum of the pitcher+fielder is 1 hit allowed. The fielder makes the out: the fielder gets credited for -.5 of a hit, and the sum of the pitcher+fielder is 0 hits allowed.
Every ball is in a zone somewhere, right? Even a ball that is an out .000001 of the time (not that I would imagine there are any zones of that nature). So you can do this for every single batted ball in play.
Now, if you do this for every batted ball in play, you should have a perfect record of each non-HR hit -- there shouldn't be any hit not included.
Now, the trick is to convert these to runs. Each zone would likely have a different run expectation. That 50% zone I mentioned above, let's say that every hit in that zone is a single, and a single is .47 runs. So the run value for a ball in that zone is .235. The ball goes there. The pitcher gets debited for .235 runs. The ball is a hit: the fielder gets debited .235 runs, and the sum of the pitcher+fielder is .47 runs allowed. The fielder makes the out: the fielder gets credited for -.235 runs, and the sum of the pitcher+fielder is 0 runs allowed.
Do that for every batted ball. Add in run values for non-BIP events (1.4 HR, .33 BB/HBP, etc.). You're not going to have a perfect record of each run, because this won't take into account timing (i.e., clutch). It should be somewhat close, though. How close? What's the RMSE? 15 runs? 20 runs? 50 runs? Are we close to the run estimators we use for offense? If we're way off, how can we get closer?
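The bookkeeping described above can be sketched in Python. This is a minimal sketch under stated assumptions: the function names are invented, each ball in play is reduced to a zone out rate, a run value for a hit in that zone, and whether it fell for a hit, and the sample balls (including the 0.75 run value) are hypothetical.

```python
def split_credit(out_rate, hit_run_value, was_hit):
    """Return (pitcher_runs, fielder_runs) charged for one ball in play."""
    expected = (1 - out_rate) * hit_run_value    # pitcher is charged the zone's average cost
    actual = hit_run_value if was_hit else 0.0   # what really happened on the play
    return expected, actual - expected           # fielder gets the balance, plus or minus


def season_totals(balls):
    """Sum pitcher and fielder charges over (out_rate, hit_run_value, was_hit) tuples."""
    pitcher = fielder = 0.0
    for out_rate, value, was_hit in balls:
        p, f = split_credit(out_rate, value, was_hit)
        pitcher += p
        fielder += f
    return pitcher, fielder


# The 50% zone from the comment: a single worth .47 runs.
print(split_credit(0.5, 0.47, True))    # ball falls for a hit
print(split_credit(0.5, 0.47, False))   # fielder makes the out

# Hypothetical sample of four balls in play.
balls = [(0.5, 0.47, True), (0.5, 0.47, False), (0.9, 0.47, False), (0.1, 0.75, True)]
print(season_totals(balls))
```

By construction, the pitcher and fielder totals always add up to the run value of the hits that actually fell in, so any gap against real runs allowed would come from the non-BIP run values and from timing, not from the split itself.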
Now, here I'm going to get a bit controversial. MGL believes that UZR is a predictive, not descriptive stat. I.e., it's describing "true talent" more than "what really happened" (though obviously the latter is part of the former).
I'm not so sure.
I suspect that the adjustments he makes to the raw data, if properly made, actually go toward describing what really happened.
I mean, here are some of the adjustments he makes:
- "A bunt ground ball is treated as a separate kind of a batted ball than a non-bunt ground ball, but only for the first, second, and third baseman."
- "The base runner and outs adjustments are a proxy for infield defensive alignment."
- "Left-handed and right-handed batters are treated separately since infielders and outfielders are positioned differently for each."
- "For outfield air balls, two separate categories of batters are used as a proxy for outfielder depth: Batters with less than average power and batters with greater than average power."
There's a bunch more. Park adjustments, baselines. In my mind, what MGL is assuming is that positioning affects the out conversion rate for each zone. This is obviously true, so in lieu of positioning data (which we lack), he comes up with all these proxies that he thinks adjusts for them.
Let's remove, for the moment, the consideration of whether or not MGL's adjustments make the proper corrections. Let's stipulate that they do. If they do, it seems to me that UZR is describing actual events -- the expected out conversion rate for ball X to zone Y was lower than normal because of circumstance Q. That's something real and, theoretically, verifiable (though we don't have the data; we may have data that says "this ball is an out 50% of the time with the bases empty, but with a man on 1st and less than 2 outs, it's an out 40% of the time," and maybe that's how MGL derived his adjustments, we don't know).
I think that if MGL ran this -- and I'll have to check the PZR links Tango put up at The Book to see how much he may have already -- we should see a total of PZR+UZR that should come somewhat close to the correct number of runs allowed on the team level. (For these purposes, I'm including catcher defense and SB/CS in "UZR".)
=((BB-IBB)*0.34+HBP*0.25+IBB*0.31+(H-HR)*HBIPNIZ*0.56+HR*1.44)-((BIP Outs)*0.09+K*0.098)
I can't really contribute to the meat of this discussion, but why do BB, IBB, and HBP all have different coefficients? Isn't the on-field result of any of the three of them exactly the same?
Are you saying it's an effect that if a pitcher walks a guy he's not trying to that he is then more likely to walk someone else?
IBB almost never occur with a runner on first. They're also much more likely to occur with 2 outs.
HBP vs. UIBB, I'm not sure - I've usually treated those as identical in my own sabermetric experiments.
I'm somewhat surprised by the values used in PAR though. I remember I came up with different correlation coefficients for all 3 events but I was pretty sure it was non-IBB > HBP > IBB.
Could be possible, I'm not too far away, but it depends on how work is going.
it may *not*. If, as zenbitz says, there's really a constant, like FIP, and I use a team-specific one, it may not be needed. I mean, that may be a finer adjustment, but it may not be a big enough difference to warrant the extra effort.
Rusch is what happens when the effect is real rather than chance. I don't know how possible it is to suss that out.