Baseball for the Thinking Fan

Dialed In

Thursday, July 22, 2010

Reconciliation - Getting Defensive Stats and Statheads Back Together

There is lots of skepticism around defensive statistics.  Wrongly so, in my opinion, but I am probably a little biased.  Yes, they aren’t perfect, but there is no reason to throw the baby out with the bathwater.  The idea behind the stats is solid.  There are some differing opinions around the treatment of the data, but otherwise, the defensive stats community does believe we are in the right general area.

Except for a few people, and one of them is Colin Wyers.  Colin Wyers, in case you didn’t know, is a sharp cookie and he does tremendous statistical work around testing other people’s stats and theories, in addition to his own developments.  He is a terrific critical thinker.  He is presently employed at Baseball Prospectus as some sort of stats guru, and has posted several articles questioning the foundation of defensive stats, from the height of pressboxes to the inconsistencies between UZR and +/-, even though both come from the same data source.  Let me be perfectly clear, I think Colin is smart, inquisitive and open-minded.  I also think he is overly skeptical about the quality of defensive stats.

There is another Primate, LAW or BWV 1129, who has some serious questions about reconciling defensive statistics against DER (comment #14 in the Tango link below), or at least Runs Allowed.  He suggests that there should be a pitching stat that complements the fielding stats and those should, at the team level, match runs allowed (in a manner similar to Runs Created matching runs scored), possibly FIP or some other fielding independent stat.

We’ve had an exciting discussion over at Tango’s The Book Blog, with Brian Cartwright, terpsfan, AROM, Tango and MGL, and others.  Here at BTF, we had the same types of comments in the latest Colin Wyers BPro thread.  Primate Harold posits some interesting thoughts about why the stats won’t reconcile (UZR + FIP =/= RA) but I claim the Dial DRS plus something like FIP might. 

SIDEBAR: I just said Dial DRS.  Dial DRS?  Isn’t that DRS?  Well, apparently not.  BIS co-opted it, and John Dewan and Bill James outrank me, so once DRS gets posted at ESPN or Fangraphs (who I thought would have known better), and it isn’t the DRS I have been creating and referencing, and being referenced around the web for the last dozen years, then well, I lost the acronym DRS.  Up mine.  So, Dial DRS or DDRS is where we are today.  FWIW, BIS says it was accidental, and I believe them.  It doesn’t really excuse Fangraphs, who completely knew about my work, but they are just posting what they are given.  Bitter?  No, because that’s just the tip of the iceberg. /SIDEBAR

I have been trying to reconcile DDRS and DIPS since DIPS came out and Ron Johnson suggested that should be done to verify that the numbers have solid foundation.  I struggled and couldn’t do it, but I was using ERA of HBIP.  And a few other “Chris isn’t that smart” stats, and I kept being very far off in outs.

This offseason, Tango made a post about prediction contests.  I wanted to generate a set of projections for pitchers that was more accurate than others, since that is an area where all projection systems struggle.  So I thought, what if I take pitchers and use *their* HBIP, but adjusted for their hits allowed in defensive zones?  I thought that I needed to look at each pitcher’s Hits on Balls In Play Not In Zone: HBIPNIZ.  Balls in zone depend on the defensive players, and if one guy has Nate McLouth and another has Carlos Beltran, then one guy looks better than the other, even on the same set of BIP allowed.  So I worked backwards from team ZR.

Please bear in mind that I am an open source researcher.  I do not think what I am about to post is completely right and I may have made some incorrect assumptions or calc errors.  I will argue that I did not, but I am well aware I may have.  I EXPECT this community to take what I put forth here and expand and improve on it, either with better data or improved thought processes.  I am here to provide the first building block.  Who knows – maybe we only need one.

Here are the steps as I have begun:
1. Take the team’s ZR chances and Plays Made.  This represents the responsibility of the fielders, NOT THE PITCHER.  That isn’t a perfect or completely correct assumption, but we’ll talk about that later.
2. Chances minus Plays Made gives me the number of Hits on Balls In Zone (HBIZ).  Interestingly, for 2009, the NL average was 477, with a range of 398 (SFG) to 583 (HOU).
3. With HBIZ, I take HBIP and subtract HBIZ, yielding HBIPNIZ.  Now we are talking *pitching*.  The pitching staff that allowed the fewest HBIPNIZ was LAD with 710, and the most allowed was 861 (PHI).  That’s interesting right there.  For the curious, SFG was third with 730 HBIPNIZ allowed, but incredibly, HOU had the fifth-lowest HBIPNIZ.  Talk about victimized by poor defense.
4. Now I have HBIP and HBIPNIZ, and thus I create a ratio (HBIPNIZ divided by HBIP) for each team.  The team with the lowest ratio?  Houston.  Perhaps there is some park factor looming here.  The highest is the Phillies.  The average here is 0.623.  So 62% of hits are on balls not in zones.  That makes sense, since line drives make up most hits and most line drives aren’t in anyone’s zone.
5. So now on to the individual pitchers.  So I take each pitcher’s line and I calculate their PAR, or “Pitcher Allowed Runs”.  Dammit, I *have* to be the first with that!!!  PAR is calculated by:
=((BB-IBB)*0.34+HBP*0.25+IBB*0.31+(H-HR)*HBIPNIZ*0.56+HR*1.44)-((BIP Outs)*0.09+K*0.098)
Here HBIPNIZ is the team ratio from step 4, not the raw count.  For the league, using linear weights, the average value of a HBIP is 0.56 runs (weighted average of singles, doubles and triples).

Notice how my defensive stats mesh completely with DIPS.

6. Next is FAR, or Fielding Allowed Runs, for this specific pitcher.  That is calculated (although advanced versions of this statistic would take actual BIP behind this specific pitcher, which I do not have) by: =(1-HBIPNIZ)*0.56*(H-HR).  Again, 0.56 represents the average value of an HBIP across the league.
7. The first test is how well PAR + FAR = TAR (Total Allowed Runs) correlates with actual RA.  For the league, summed by each individual pitcher, r^2 is 0.972.  That’s pretty strong, but perhaps I haven’t done anything that would make it not be strong.  But it is strong.
8. The differences, RA - TAR.  Ricky Nolasco seems to have allowed 20 more runs than this metric would have agreed with.
Top 5:

Pitcher    DIFF
Ricky Nolasco    20.51
Juan Rincon    11.45
Chad Gaudin    11.29
Craig Stammen    10.52
Hiroki Kuroda    9.94 


That got low in a hurry.  Nolasco is a clear “What?” miss here. 
Going the other direction:   

Pitcher    DIFF
J.A. Happ    -24.01
Adam Wainwright    -22.20
Doug Davis    -19.37
Joe Blanton    -18.94
Matt Cain    -18.43

JA Happ?  Seems like he was due for a collapse, which agrees with ZiPS anyway.  Likewise, Adam Wainwright is crashing and burning.  Wait, what?  Well, I don’t know how predictive it is – it’s meant to be descriptive.
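
Steps 6 through 8 can be sketched the same way. This is an illustration only: the function names are mine, the sample inputs are hypothetical, and PAR is passed in as a number so the block stands on its own.

```python
HBIP_RUNS = 0.56  # league-average run value of a non-HR hit, as given in the article


def far(h, hr, ratio):
    """Step 6: Fielding Allowed Runs, the in-zone share of a pitcher's non-HR hits."""
    return (1 - ratio) * HBIP_RUNS * (h - hr)


def tar(par_runs, far_runs):
    """Step 7: Total Allowed Runs."""
    return par_runs + far_runs


def diff(ra, tar_runs):
    """Step 8: positive means the pitcher allowed more runs than TAR accounts for."""
    return ra - tar_runs


# Hypothetical inputs: PAR of 48.2, 180 hits, 20 homers, team ratio 0.622, 95 RA.
example_far = far(h=180, hr=20, ratio=0.622)
example_tar = tar(48.2, example_far)
print(f"FAR = {example_far:.1f}, TAR = {example_tar:.1f}, "
      f"DIFF = {diff(95, example_tar):.1f}")
```

One thing the sketch makes visible: the hit term in PAR is ratio * 0.56 * (H-HR) and FAR is (1 - ratio) * 0.56 * (H-HR), so the ratio only moves runs between the two buckets. For a single pitcher, TAR comes out the same whatever ratio is used.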

9. To calculate PAR+ (park and playing time adjustments), I calculated each pitcher’s PAR/IP and the league PAR/IP, and then put in the Baseball-Reference Pitching Park Factors, and divided by 100.  The league leaders were:

Name    Age    Tm    IP    PAR+
Tim Lincecum    25    SFG    225.3    42.75
Chris Carpenter    34    STL    192.6    35.55
Javier Vazquez    32    ATL    219.3    35.06
Danny Haren    28    ARI    229.3    32.84
Josh Johnson    25    FLA    209.0    29.81 

Crazy set of leaders, huh?  After Johnson there is a 20% drop to the next tier.
10. Now I have team PAR, and Team DRS and Team Runs Allowed.  If I sum the defensive prevention numbers, do I match the difference between a team’s runs allowed and league average?  Let’s see:

Tm    Sum of R    Sum of PAR by ip    DRS    Run vs Avg    TRS
BAL    876    104    -24    105    127
BOS    736    -24    -54    -35    30
CHW    732    -52    -8    -39    -44
CLE    865    49    -31    94    79
DET    745    52    71    -26    -19
KCR    842    -4    -46    71    42
LAA    761    41    25    -10    16
MIN    765    11    6    -6    5
NYY    753    -42    -11    -18    -31
OAK    761    -27    16    -10    -43
SEA    692    -97    6    -79    -103
TBR    754    -26    10    -17    -37
TEX    740    -18    11    -31    -29
TOR    771    34    28    0    6
Avg    771
Correl: 0.891

This correlation is 0.89.  Goodness, that worked out well.  Of course, the NL wasn’t quite as good, coming in at r = 0.74.  I will have to expand this to see if 2009 was fluky good or bad, and check a few more seasons to see how it works out.  I think I have 2007-2008 lined up, but as you can see this isn’t a small amount of work.
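
The AL check can be reproduced from the table in a few lines of Python. This is a sketch, not the original spreadsheet: the numbers are copied from the table, the helper names are mine, and the observation that TRS equals PAR minus DRS to within a run of rounding is my reading of the columns, not something stated above. It also reports RMSE, which is suggested in the comments as a complement to correlation.

```python
import math

# (team, PAR, DRS, run_vs_avg, TRS), copied from the AL table.
ROWS = [
    ("BAL", 104, -24, 105, 127), ("BOS", -24, -54, -35, 30),
    ("CHW", -52, -8, -39, -44), ("CLE", 49, -31, 94, 79),
    ("DET", 52, 71, -26, -19), ("KCR", -4, -46, 71, 42),
    ("LAA", 41, 25, -10, 16), ("MIN", 11, 6, -6, 5),
    ("NYY", -42, -11, -18, -31), ("OAK", -27, 16, -10, -43),
    ("SEA", -97, 6, -79, -103), ("TBR", -26, 10, -17, -37),
    ("TEX", -18, 11, -31, -29), ("TOR", 34, 28, 0, 6),
]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rmse(xs, ys):
    """Root mean squared difference, in runs."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


run_vs_avg = [row[3] for row in ROWS]
trs = [row[4] for row in ROWS]

# Assumed relationship: TRS is PAR minus DRS, up to rounding.
gap = max(abs(par - drs - t) for _, par, drs, _, t in ROWS)

r = pearson(run_vs_avg, trs)
err = rmse(run_vs_avg, trs)
print(f"max |PAR - DRS - TRS| = {gap}, r = {r:.3f}, RMSE = {err:.1f} runs")
```

On these 14 rows the correlation comes out at about 0.89, matching the figure quoted, and the RMSE is roughly 25 runs per team.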

This is where you come in – what are the next Next Steps?  Lots of these match up very well, but there is some factor I am missing.  I think it *could be* infield fly balls (IF FBs).  Not every team will see enough to reconcile all of these differences, but I believe it will tighten up the numbers.  Oddly, I couldn’t find just a straight count of popups by team.

Back to the pitcher being responsible part.  A key theory for DIPS and FIP is that the pitcher isn’t responsible for HBIP (much).  Therefore, all plays fielded are the responsibility of the fielders, and the pitcher gets zero credit.  Or, as I noted last year, the pitcher is responsible for about 70% of plays made.  That’s the bare minimum anyone playing gets to make.  I referenced it the other day - scroll to post #30 for the dirty details.

And…..go!

Chris Dial Posted: July 22, 2010 at 02:26 PM | 30 comment(s)

Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   1. Chris Dial Posted: July 22, 2010 at 05:29 PM (#3596358)
Okay, so no one goes to the front page to see new articles.
   2. Swedish Chef Posted: July 22, 2010 at 05:36 PM (#3596366)
Okay, so no one goes to the front page to see new articles.

The front page is completely blank for me. Let's bump this again.
   3. Chris Dial Posted: July 22, 2010 at 05:48 PM (#3596385)
As a side note, I am going to be in Copenhagen the week of Aug 15. Is it entirely out of the question for us to meet up?
   4. Ron Johnson Posted: July 22, 2010 at 05:53 PM (#3596396)
OK, really nice first step towards a firm understanding of the strength of signal.

Oh and you really ought to sue. This is the umpteenth time you've had to change the name of your work. In the meantime, can I suggest DDRS?

EDIT: And now that I check the linky I see you're already using DDRS.
   5. Chris Dial Posted: July 22, 2010 at 07:38 PM (#3596497)
Tango linked to it too, and he has some ideas around it. He didn't like my pitcher comment, but that is really a throwaway line and not critical to what we are building. We can change the 100-0 apportioning to 70-30 or whatever by multiplying with fractions later.
   6. zenbitz Posted: July 22, 2010 at 09:11 PM (#3596618)
You report RA - TAR, but what about FIP/A - TAR? Because one might suspect that if HBIPNIZ is just randomly distributed it's not really contributing anything but a constant x normal dist.

The other obvious thing to try is HBIPNIZ/groundball and HBIPNIZ/flyball with their distinct weights for 1B/2B/3B.
   7. Chris Dial Posted: July 23, 2010 at 12:32 AM (#3596740)
zenbitz,
yes, but I think HBIPNIZ are going to be mostly line drives. At BBRef, you can see that LDs are the most hits by a huge margin.

We'll see if those other options move the needle.
   8. Harold can be a fun sponge Posted: July 23, 2010 at 02:01 AM (#3596778)
I would not mess around with individual pitchers if I were you. First try this out at the team level, see how it works, refine it, etc. We know that pitchers on the same team get varied fielding support; let's not bring that noise in just yet.

The first test is PAR + FAR = TAR (Total Allowed Runs) correlation to actual RA. For the league, summed by each individual pitcher, r^2 is 0.972. That’s pretty strong, but perhaps I haven’t done anything that would make it not be strong. But it is strong.

If you're looking at raw numbers rather than rates, then this correlation is meaningless. Pitchers with lots of innings will have high figures for both TAR and RA; that's what you're measuring with correlation. Anyway, one more reason to stick to teams for the moment.

For the teams, how about RMSE rather than correlation? Or a scatter-plot? The picture is much more illuminating than a summarizing number or two.

Is your spreadsheet available somewhere? I'm not following your steps well by reading; I think seeing the formulas and numbers will help (I can re-create it myself a bit later).

Are you suggesting that all HBIZ are the defense's responsibility? Or maybe you don't think that's literally true; is it your intent to treat them that way in this metric?
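
The raw-versus-rate point can be illustrated with a toy example in Python. The numbers are synthetic and deliberately constructed so the two per-9 rates are completely uncorrelated; nothing here is real pitcher data, and the variable names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Eight made-up pitchers with very different workloads.
innings = [25, 50, 75, 100, 125, 150, 175, 200]
tar_per_9 = [4, 5, 4, 5, 4, 5, 4, 5]   # synthetic rate, alternating
ra_per_9 = [4, 4, 5, 5, 4, 4, 5, 5]    # synthetic rate, built to be unrelated

# Convert the rates to season totals, which scale with innings pitched.
tar_total = [rate * ip / 9 for rate, ip in zip(tar_per_9, innings)]
ra_total = [rate * ip / 9 for rate, ip in zip(ra_per_9, innings)]

rate_r = pearson(tar_per_9, ra_per_9)
raw_r = pearson(tar_total, ra_total)
print(f"rates: r = {rate_r:.2f}; raw totals: r = {raw_r:.2f}")
```

The rates carry no relationship at all, yet the totals correlate above 0.9 purely because both grow with innings. That is the effect a rate-based or team-level comparison avoids.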
   9. BWV 1129 Posted: July 23, 2010 at 02:19 AM (#3596783)
1. Take the teams ZR chances and Plays Made. This represents the responsibility of the fielders, NOT THE PITCHER. That isn’t a perfect nor completely correct assumption, but we’ll talk about that later.

I agree that this isn't a perfect assumption; in fact, for each ball in a zone, we could say (for descriptive, not predictive purposes, which is what I believe we're doing here) that the pitcher is responsible for the average rate at which that ball is turned into an out, and the fielder is responsible for the balance between that and what actually occurred (positive or negative).

Here's a zone where the out rate is 50%. The ball goes there. The pitcher gets debited for .5 of a hit. The ball is a hit: the fielder gets debited .5, and the sum of the pitcher+fielder is 1 hit allowed. The fielder makes the out: the fielder gets credited for -.5 of a hit, and the sum of the pitcher+fielder is 0 hits allowed.

Every ball is in a zone somewhere, right? Even a ball that is an out .000001 of the time (not that I would imagine there are any zones of that nature). So you can do this for every single batted ball in play.

Now, if you do this for every batted ball in play, you should have a perfect record of each non-HR hit -- there shouldn't be any hit not included.

Now, the trick is to convert these to runs. Each zone would likely have a different run expectation. That 50% zone I mentioned above, let's say that every hit in that zone is a single, and a single is .47 runs. So the run value for a ball in that zone is .235. The ball goes there. The pitcher gets debited for .235 runs. The ball is a hit: the fielder gets debited .235 runs, and the sum of the pitcher+fielder is .47 runs allowed. The fielder makes the out: the fielder gets credited for -.235 runs, and the sum of the pitcher+fielder is 0 runs allowed.

Do that for every batted ball. Add in run values for non-BIP events (1.4 HR, .33 BB/HBP, etc.). You're not going to have a perfect record of each run, because this won't take into account timing (i.e., clutch). It should be somewhat close, though. How close? What's the RMSE? 15 runs? 20 runs? 50 runs? Are we close to the run estimators we use for offense? If we're way off, how can we get closer?
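
The per-ball bookkeeping described in this comment can be sketched in Python as follows. The function name is hypothetical, and the general rule (pitcher charged the zone's expected runs, fielder charged actual minus expected) is my generalization of the 50% worked example, not a published method.

```python
def apportion(out_rate, hit_value, was_hit):
    """Split one ball in play into (pitcher_runs, fielder_runs).

    out_rate  -- league-average share of balls to this zone turned into outs
    hit_value -- run value of a hit in this zone (0.47 for a single in the example)
    was_hit   -- True if the ball fell in, False if the fielder made the out
    """
    expected = (1 - out_rate) * hit_value   # pitcher is charged the average outcome
    actual = hit_value if was_hit else 0.0
    return expected, actual - expected      # fielder gets the balance


# The 50% zone where every hit is a single worth 0.47 runs.
for was_hit in (True, False):
    pitcher, fielder = apportion(0.5, 0.47, was_hit)
    print(f"hit={was_hit}: pitcher {pitcher:+.3f}, fielder {fielder:+.3f}, "
          f"total {pitcher + fielder:+.3f}")
```

Because the two shares always add up to the actual run value of the event, summing them over every ball in play gives back the team's linear-weights runs on balls in play, which is the point made in #11.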
   10. BWV 1129 Posted: July 23, 2010 at 02:28 AM (#3596785)
You're not going to have a perfect record of each run, because this won't take into account timing (i.e., clutch). It should be somewhat close, though. How close? What's the RMSE? 15 runs? 20 runs? 50 runs? Are we close to the run estimators we use for offense? If we're way off, how can we get closer?

Now, here I'm going to get a bit controversial. MGL believes that UZR is a predictive, not descriptive stat. I.e., it's describing "true talent" more than "what really happened" (though obviously the latter is part of the former).

I'm not so sure.

I suspect that the adjustments he makes to the raw data, if properly made, actually go toward describing what really happened.

I mean, here are some of the adjustments he makes:

- "A bunt ground ball is treated as a separate kind of a batted ball than a non-bunt ground ball, but only for the first, second, and third baseman."

- "The base runner and outs adjustments are a proxy for infield defensive alignment."

- "Left-handed and right-handed batters are treated separately since infielders and outfielders are positioned differently for each."

- "For outfield air balls, two separate categories of batters are used as a proxy for outfielder depth: Batters with less than average power and batters with greater than average power."

There's a bunch more. Park adjustments, baselines. In my mind, what MGL is assuming is that positioning affects the out conversion rate for each zone. This is obviously true, so in lieu of positioning data (which we lack), he comes up with all these proxies that he thinks adjusts for them.

Let's remove, for the moment, the consideration of whether or not MGL's adjustments make the proper corrections. Let's stipulate that they do. If they do, it seems to me that UZR is describing actual events -- the expected out conversion rate for ball X to zone Y was lower than normal because of circumstance Q. That's something real and, theoretically, verifiable (though we don't have the data; we may have data that says "this ball is an out 50% of the time with the bases empty, but with a man on 1st and less than 2 outs, it's an out 40% of the time," and maybe that's how MGL derived his adjustments, we don't know).

I think that if MGL ran this -- and I'll have to check the PZR links Tango put up at The Book to see how much he may have already -- we should see a total of PZR+UZR that should come somewhat close to the correct number of runs allowed on the team level. (For these purposes, I'm including catcher defense and SB/CS in "UZR".)
   11. BWV 1129 Posted: July 23, 2010 at 03:10 AM (#3596804)
I'm kinda dumb -- in the last part of my post 9, obviously all that would add up to the team's linear weights runs allowed (in my example) ...
   12. Chris Dial Posted: July 23, 2010 at 03:47 AM (#3596829)
Are you suggesting that all HBIZ are the defense's responsibility? Or maybe you don't think that's literally true; is it your intent to treat them that way in this metric?
That's tongue in cheek. As I note in #5, we can re-apportion the credit later.
   13. Chris Dial Posted: July 23, 2010 at 03:53 AM (#3596834)
If you're looking at raw numbers rather than rates, then this correlation is meaningless.
Well, I converted it to a rate, and then back again. It is weighted by IP.
   14. Chris Dial Posted: July 23, 2010 at 03:57 AM (#3596841)
in the last part of my post 9, obviously all that would add up to the team's linear weights runs allowed (in my example)
I know it will. I did this with my work, and was within 3 runs over 8000.
   15. and Posted: July 23, 2010 at 12:58 PM (#3597010)
PAR is calculated by:
=((BB-IBB)*0.34+HBP*0.25+IBB*0.31+(H-HR)*HBIPNIZ*0.56+HR*1.44)-((BIP Outs)*0.09+K*0.098)


I can't really contribute to the meat of this discussion, but why do BB, IBB, and HBP all have different coefficients? Isn't the on-field result of any of the three of them exactly the same?
   16. Chris Dial Posted: July 23, 2010 at 01:09 PM (#3597016)
I can't really contribute to the meat of this discussion, but why do BB, IBB, and HBP all have different coefficients? Isn't the on-field result of any of the three of them exactly the same?
No, due to the pitcher-control aspect, they have different base advancement values (moving up other runners).
   17. and Posted: July 23, 2010 at 01:26 PM (#3597028)
How do they move runners up differently?

Are you saying it's an effect that if a pitcher walks a guy he's not trying to that he is then more likely to walk someone else?
   18. Eric J can SABER all he wants to Posted: July 23, 2010 at 01:33 PM (#3597039)
How do they move runners up differently?

IBB almost never occur with a runner on first. They're also much more likely to occur with 2 outs.

HBP vs. UIBB, I'm not sure - I've usually treated those as identical in my own sabermetric experiments.
   19. Ron Johnson Posted: July 23, 2010 at 02:06 PM (#3597071)
#18 The only issue in the potential value of HBP vs non-intentional BB is that a certain percentage of HBP are intentional. And a subset of those will happen only with certain baserunner situations.

I'm somewhat surprised by the values used in PAR though. I remember I came up with different correlation coefficients for all 3 events but I was pretty sure it was non-IBB > HBP > IBB.
   20. Swedish Chef Posted: July 23, 2010 at 02:33 PM (#3597095)
As a side note, I am going to be in Copenhagen the week of Aug 15. Is it entirely out of the question for us to meetup?

Could be possible, I'm not too far away, but it depends on how work is going.
   21. Foghorn Leghorn Posted: July 23, 2010 at 02:47 PM (#3597111)
IBB almost never occur with a runner on first. They're also much more likely to occur with 2 outs.

HBP vs. UIBB, I'm not sure - I've usually treated those as identical in my own sabermetric experiments.
It is completely possible that when I typed the formula (because it is really letters and numbers), I mixed those two up.
   22. BWV 1129 Posted: July 24, 2010 at 01:11 AM (#3597656)
You know, I think this whole exercise really has to be done with the specific fielding data "for" (i.e., behind/in support of) each pitcher. Then we could compare that to what you've done here.
   23. Foghorn Leghorn Posted: July 24, 2010 at 02:16 AM (#3597703)
BWV,
it may *not*. If, as zenbitz says, there's really a constant, like FIP, and I use a team-specific one, it may not be needed. I mean, that may be a finer adjustment, but it may not be a big enough difference to warrant the extra effort.
   24. BWV 1129 Posted: July 24, 2010 at 07:57 AM (#3597867)
We don't know what the difference is. FIP assumes that all pitchers allow the same assortment of BIP. We know they don't. Is there a way to suss out the Glendon Rusches in advance?
   25. Foghorn Leghorn Posted: July 26, 2010 at 02:14 AM (#3598977)
There certainly is. LD/FB/GB pcts. Of course, one has to believe that the differences between those in a given season are somewhat real. Regression or not, those things exist.

Rusch is what happens when the effect is real rather than chance. I don't know how possible it is to suss that out.
   26. Foghorn Leghorn Posted: January 08, 2015 at 10:51 PM (#4876551)
Whatever became of this? I re-read this and I still like it.
   27. Jack Sommers Posted: January 09, 2015 at 02:03 AM (#4876601)
Hi Chris.

Just curious, do you get your BIP data from the same source as Baseball-Reference?

If so, how are you dealing with the change in classifications that took place starting last year and greatly impacts line drive/FB percentages?



   28. Mr Dashwood Posted: January 09, 2015 at 09:59 AM (#4876668)
Whatever became of this? I re-read this and I still like it.

That's a question for you to answer.

I had completely forgotten about this. It was published at a point in my life where I was doing a lot of travelling and family things, so it was the worst possible moment for it to pique my interest.

A week or two ago I made a suggestion in another thread that WAR double-counts the effects of BIP. After a private discussion with A Well-Known Sabermetrician, I came to the conclusion that any WAR system that uses DRS or UZR is giving a false reading of the value of BIP, because both those systems are designed to rate fielders amongst one another exclusively, and not integrate the value of those fielders (and the value of pitchers' BIP) in terms of the actual team or league context. Without a team or league context, such as is provided by Win Shares Above Bench (a sadly neglected system), all your WAR is useless.

More research on something like what Dial presents here is what is really needed. Has anything been done?
   29. Foghorn Leghorn Posted: April 28, 2015 at 01:23 PM (#4942481)
Shoewizard,
no, I get it from STATS. Changing definitions for LD/FB is problematic, and Statcast will put most of that to bed. Alan Nathan rightly says "I need Time of Flight."
   30. GuyM Posted: April 28, 2015 at 01:42 PM (#4942512)
Hey Chris:
Regarding the issue of the reliability of fielding data you reference here, coincidentally I was looking today at the measured fielding opportunities at SS for the NYY. With Jeter retired, I thought it would be interesting to see if there was any change. I used Fangraphs' BIZ, which I think is based on the BIS data used in UZR/DRS. Here are BIZ per 9 innings at SS for the NYY in recent years, along with Jeter’s innings in the field.

Year / BIZ9 / Jeter Inn
2011 2.33 1047
2012 2.39 1186
2013 2.65 110
2014 2.09 1138
2015 2.74 0

Of course there are other variables to be looked at (like total BIP), and the 2015 sample is very small. But boy, it sure looks to me like the range of the SS has a huge impact on measured fielding opportunities. With Jeter out with an injury in 2013, fielding opportunities surged. When Jeter returned and was barely mobile in 2014, fielding opportunities plunged. And now, with Jeter retired, lots of balls are once again being hit into the SS zones. (Cross-posted at Tango's blog.)
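
A quick way to summarize the table is to split the seasons by Jeter's workload and compare average BIZ per 9. This is a rough Python sketch: the 1,000-inning cutoff and the variable names are my assumptions, and with five seasons (one of them a partial 2015) it is descriptive only.

```python
from statistics import mean

# (year, BIZ per 9 innings at SS, Jeter innings), copied from the table above.
SEASONS = [
    (2011, 2.33, 1047),
    (2012, 2.39, 1186),
    (2013, 2.65, 110),
    (2014, 2.09, 1138),
    (2015, 2.74, 0),
]

JETER_REGULAR = 1000  # assumed cutoff (innings) for "Jeter was the regular SS"

with_jeter = mean([biz9 for _, biz9, inn in SEASONS if inn >= JETER_REGULAR])
without_jeter = mean([biz9 for _, biz9, inn in SEASONS if inn < JETER_REGULAR])

print(f"BIZ/9 with Jeter: {with_jeter:.2f}, without: {without_jeter:.2f}, "
      f"gap: {without_jeter - with_jeter:.2f}")
```

The gap is a bit over 0.4 measured opportunities per 9 innings, subject to the caveats about total BIP and sample size already mentioned.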
