Monday, July 30, 2012
Chapman’s FIP for July is -0.99. That’s right, that little mark in front of what would be an impressive FIP means it’s silly good… And to try to figure out what that meant, I emailed one of the smartest baseball stat folks I know (and who will return my emails), Dave Cameron of FanGraphs.com. Here’s what he had to say about it.
“Basically, he’s been so good he broke the formula. Obviously, it’s not possible for a pitcher to have an ERA lower than 0, so a negative FIP just means that based on his walk rate, strikeout rate and home run rate, the formula expects him to have given up zero runs this month.”
Chapman’s current FIP for July is -0.86.
More broadly, only one pitcher has appeared in a Reds game this year who sports an ERA+ below 100: Bill Bray, who has 6.2 IP.

1. Famous Original Joe C Posted: July 30, 2012 at 11:54 AM (#4195803)
94 Ks in 49 innings for the season, 17 K/9. That's beyond even Lidge's best season.
I dunno how I feel about this. I agree that it doesn't sink the entire FIP enterprise. But obviously it's flat false in some respects, because a -0.99 ERA is impossible, and since it's impossible it's obviously invalid.
I think that a pitcher who doesn't strike out literally every batter should have a FIP above zero, right? Even if it's marginally low. I ask this seriously, because you can't get any lower than zero, and any balls in play will lead to runs at some point; it seems silly to say that even a pitcher with stats as good as Chapman's can be expected to give up no runs over the long term.
This is one of the almosts.
No, not zero. Negative. Long term he's going to erase runs that were allowed in seasons past. He has altered the Time-FIP continuum. The results may alter baseball as we know it. If Mike Cameron has a Beltran 2004-type run through the 1999 playoffs, will the Griffey trade be undone?
I've always wondered at that. I remember when I first started reading about the more advanced metrics that FIP was one of the first I read about. I thought, that's handy and it'll save time when I'm talking in a bar and don't want to look up dERA or something like it. I was completely perplexed later to realize that FIP had more or less become the baseline stat for pitcher performances at some websites instead of anybody seeking to perfect a better formula for mainstream use. Even things like SIERA get soundly ignored on a site like fangraphs in favor of FIP. I guess there's some sociological issue at work that also compels people to use OPS in a situation when looking up a wOBA would be a much better option.
FIP isn't really measuring ERA in any way shape or form, yet it's being transported to the same scale, that is the problem. Using RA instead of ERA would be a lot more consistent with the input (although it would obviously still only be an approximation), which helps to illustrate why negative values are possible on the scale, without invalidating the entire model.
0 RA is better than 0 ERA. And in order to represent something that is better than a 0 ERA, in an ERA framework, you have to use negatives.
Stats like BA, ERA, OBP, SLG can be calculated quickly in your head, and I do it automatically when I look at a box score or the scoreboard at a game to put some context around an individual performance.
For defense-independent pitching, I use an even rougher approximation of FIP: 4 x HR + BB - K. A negative number is good. A positive number is bad. It's not accurate, but it's good enough for an on-the-fly calc.
But FIP itself is too complex for me to quickly do the math in my head, let alone remember an (inconstant) constant and several coefficients. So, I’m not really sure why FIP is useful when there are better defense-independent alternatives.
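For anyone who would rather not do it in their head, the rough score above could be sketched in Python like this. The function name is made up for illustration, and the sample line is Chapman's July as quoted elsewhere in this thread (30 K, 2 BB, no home runs).

```python
def rough_dips_score(hr: int, bb: int, k: int) -> int:
    """Back-of-the-envelope score from the post: 4 x HR + BB - K.

    Lower (more negative) is better. This is not FIP itself, just the
    quick approximation described above.
    """
    return 4 * hr + bb - k


# Chapman's July line as quoted in this thread: 0 HR, 2 BB, 30 K.
print(rough_dips_score(hr=0, bb=2, k=30))  # -28
```

Note that the score is a raw count, not a rate, so it is only comparable between pitchers with similar innings.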
Because FIP is fun to say?
What are the better alternatives? FIP is still pretty easy to calculate, and in terms of ERA prediction, the more complicated formulae (xFIP and tRA for example) don't appear to perform that much better. So FIP seems like a pretty good quick-and-dirty stat. Your equation is quicker, but is it dirtier as well?
I've posted this before, but any time a formula gives you an impossible answer (and a negative ERA is, as noted, impossible) it's uselessly flawed. Of course Cameron, who loves FIP and xFIP so much he'd marry them if he could, can't admit that they're useless as designed.
If I'm wrong, then Chapman should not only start, he should start every game because (per Fangraphs): That's right: the Reds can get shut out, but Chapman's going to continue allowing negative runs in the future so they can't lose!
Junk stat.
FUP, Fielding Unaware Pitching
FARP, Fielders Aren't Relevant Pitching
FIGPR!, Fielders InGrates, Pitchers Rule!
I think that's the problem though. One day, somebody said, "On base percentage + SLG = OPS." And even though it made the world a better place everybody kept working. The only new stat that tries to get at the same information as FIP that I've seen since FIP became mainstream is SIERA and xFIP (which really doesn't count).
Everybody still refers to FIP as "quick and dirty." Well, when your choices are to either accept a "quick and dirty" stat as your primary pitching stat or use adjusted runs against then something is wrong.
And yes, I have a real problem with using FIP as the basis for a WAR estimate. I'm not sure what the percentage of pitchers who can "beat their FIP" is, but I'm certain that it's greater than 10% which has to raise serious questions about using FIP as the basis of an entire methodology.
That is a major flaw then. They should just ignore potential unearned runs and go with RA as the baseline. I've just assumed all the methodologies of Component ERA (which is basically what FIP is, a different version, true, but it's still putting the components into ERA format) used RA as the baseline and ignored unearned (potential) runs entirely.
I don't think there really is a better alternative to FIP that is easily available and has withstood the peer review process to the level that FIP has. There is a reason that these stats get popular and stay.
I'm not sure it's all that hot. Other than Chapman, they all have WHIPS over 1.3.
Isn't that like b****ing about Newton's laws when you get into Einstein's realm? When you see this:
"30 K, 2 BB, 6 H in 13.1 innings for July"
you don't need no fancy formula. He's been a m#####f###### in July and he won't continue pitching like this.
Wow, that's really, really good.
Wow, that's even better.
I think that's a stretch. Any number of hallowed formulas (such as runs created) are capable of producing impossible results given an extreme stat line. Indeed you could make those formulas so that they wouldn't generate such lines, but in doing so would make them significantly less accurate in almost all cases. I don't think it's fair to say that making a formula less accurate just to avoid scenarios where it spits out a negative number in an extreme outlier is a good thing.
It is, however, fair to say that spitting out a negative number for Chapman's July statline may be a problem. He didn't strike out every batter or even close to every batter so maybe throwing a negative number out there is a bit much. This little piece of meaningless statgeekery may be a bit more accurate than standard FIP (although it might not, I never checked) but also a bit more complex and hardly perfect in its own right:
DIPS base runs
It's an excel file, the cells in orange are the only ones you'd need to change.
In very rare cases it can still spit out a sub zero number, but I also fixed that with a kludge that treats every ER total below 0 as 0, the same can be done for IP if needed.
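The kludge described above amounts to clamping the estimate at zero. Here is a minimal sketch of that idea in Python; it is not the spreadsheet's actual formula, and the function names and the sample estimate of -1.2 earned runs are hypothetical.

```python
def clamp_earned_runs(estimated_er: float) -> float:
    """Treat any estimated earned-run total below zero as zero."""
    return max(0.0, estimated_er)


def era_from_estimate(estimated_er: float, innings: float) -> float:
    """Convert a clamped earned-run estimate into a per-nine-innings rate."""
    return 9.0 * clamp_earned_runs(estimated_er) / innings


# Hypothetical estimate of -1.2 earned runs over 13 1/3 innings.
print(era_from_estimate(-1.2, 13 + 1 / 3))  # 0.0
```

The trade-off is that every "better than zero" line collapses to the same number, so the clamp hides how far below zero the raw estimate was.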
You obviously know a lot more about this than I do, but I'm always going to doubt any formula, no matter how "hallowed", if it can give me an impossible answer with valid inputs. And with FIP, all of the inputs except one are 100% valid: IP, K, HR, BB, IBB can't be mismeasured. The exception is the constant, and that can't create the impossible answer, since the "impossible" part is a sub-0 ERA, the constant is positive, and we're adding it. I guess I think either or both of 2 things: Either the theory is wrong (FIP measures "what a player’s ERA should have looked like over a given time period, assuming that performance on balls in play and timing were league average"), or the given formula doesn't accurately measure FIP. Otherwise, you couldn't see results like Chapman's, even in a small sample size.
I think a good comparison for FIP's accuracy and utility is Pythagorean Record. We've all seen that there are better exponents to use than 2 for more accuracy, and that you can go into 2nd order or 3rd order Pythagorean record by basing it on component stats and adjusting for strength of opposition, but you still see people using the basic Pythagorean record far more frequently than the more advanced and more accurate versions.
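For reference, the basic Pythagorean record mentioned here is just runs scored and runs allowed raised to an exponent. A small Python sketch follows; the 700/600 run totals are made-up numbers, and 1.83 is shown only as one commonly cited alternative exponent, not as a claim about which is best.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Basic Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical team: 700 runs scored, 600 allowed.
basic = pythagorean_win_pct(700, 600)          # classic exponent of 2
tweaked = pythagorean_win_pct(700, 600, 1.83)  # one alternative exponent
print(round(basic, 3), round(tweaked, 3))
```

As with FIP, the simple version stays popular because it is easy to remember, even though refinements exist.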
This is a good way to look at things.
It's not a theory, it's a run estimator. It being an estimate it's going to be subject to error.
30 strikeouts, 2 walks, 1 hit batsman and 16 balls in play will generate a different number of runs depending on how you order them. Furthermore, the strikeouts and walks (and lack of home runs) give us a further suggestion of how many of those 16 balls in play are outs, hits, errors, etc. (i.e., high strikeouts suggest lower BABIP, high homers suggest more flyballs which suggest lower BABIP).
Now in a simple linear estimator, the strikeout coefficient absolutely has to be negative and significantly so. 0 strikeouts, 2 walks, 1 hit batsman and 16 balls in play are usually going to generate more runs than the same but with 30 strikeouts. So the problem is that it's a simple linear estimator addressing something that's more complicated than a linear relationship.
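To make the linear-estimator point concrete, here is a sketch of the commonly published FIP formula applied to the July line discussed above (30 K, 2 BB, 1 HBP, no home runs, 13.1 innings, meaning 13 and a third). The constant of 3.10 is an assumption for illustration only, since the real constant varies by league and season, so the result will not match the figure quoted in the article.

```python
def fip(hr: int, bb: int, hbp: int, k: int, innings: float,
        constant: float = 3.10) -> float:
    """Commonly published FIP form: (13*HR + 3*(BB+HBP) - 2*K) / IP + constant.

    The default constant is an assumed value, not the actual 2012 figure.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + constant


july_innings = 13 + 1 / 3  # "13.1" in box-score notation

# The large negative strikeout term outweighs the positive constant.
chapman_july = fip(hr=0, bb=2, hbp=1, k=30, innings=july_innings)

# Same line with zero strikeouts, to show how much the K term matters.
no_strikeouts = fip(hr=0, bb=2, hbp=1, k=0, innings=july_innings)

print(chapman_july < 0, no_strikeouts > 0)  # True True
```

Because each term is a fixed coefficient times a count, nothing in the formula stops the total from dropping below zero when the strikeout rate is extreme.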
But that was the point of FIP, to cut down on the complicated stuff I was doing to get an ERA that was quick and easy to calculate. I was only tangentially concerned about producing an estimated ERA myself, the concept itself was more important to me. But others were interested in such an estimate and so Tango produced one.
I don’t think that’s right. FIP is a formula based on a linear regression of certain component stats versus ERA. The relationship between those components and run scoring is roughly linear over the normal range of those statistics, but it’s not linear at the extremes.
Chapman has a negative FIP because the stat is flawed, not because it is saying anything specific about his performance relative to other pitchers with similar ERAs.
FIP is not giving an impossible answer, it's just not measuring what you have apparently assumed it was measuring or perhaps doing so in a different way than you thought but this has already been pointed out to you. If you want you can say that you don't like that it was designed to make the final result roughly comparable to ERA and yet still allows for negative results that's fine but it doesn't make the answer impossible any more than something like an OBP lower than batting average is impossible.
Thanks. I see your point.
2. On the other hand, since OBP and AVG are measuring different things and have different denominators, it's easy to see that OBP could be lower than AVG. A more correct comparison would be an AVG or OBP "estimator" that spit out a negative number.
On the west coast, such an estimator would be called "the Vernon".
1) That just means that explanation given by fangraphs is flawed, incomplete or most likely purposely dumbed down to a level that is unconcerned with extreme outliers like Chapman currently is.
2) FIP and ERA are also measuring different things, one is just purposely adjusted to match the other in a way that works alright in most cases but not all. If you'd like to say the method they used to make this adjustment could use some work or special rules to handle special cases alright but frankly... meh, it's not that big of a deal IMO and certainly not a reason to suggest it is a "broken" stat across the board and should be junked.
See post #22.
You are right, though I can see the argument that says that Chapman's July performance isn't so extreme that it should break the formula. 60% of batters faced ending in a strikeout is a whole lot by normal baseball standards, but isn't necessarily so far out there that it should break a run estimator.
FIP could probably be redone with more precise coefficients and maybe that would minimize that a little. But then that really wouldn't serve a very useful purpose. FIP does what it does and 'newFIP' would do virtually the same thing.
It's a nice estimate of performance with the elements a pitcher can't control (or has decidedly much less control on) factored out. It's hardly as useful as OPS+, but it's better than ERA and a bit more descriptive than just saying a guy has a 3/1 k/bb ratio.
An ability focused approach to player evaluation is sort of my holy grail crusade. Strange as it may sound from a noted stat geek, I felt like sometimes we overvalued the statistics that players generated rather than the individual playing abilities that drive them. 31 is not Adam Dunn's home run rating, it's a raw count of how many home runs he's hit this season. That raw count is influenced by a whole host of factors, including, of course, Adam Dunn's baseball playing abilities during the season.
In terms of player evaluation, it's that ability which is most important not necessarily the 31 home runs. DIPS, at base, was an attempt to look at pitching ability from a different angle than was prevalent at the time. The mathematical gymnastics that took place toward that end were just one way to try and go about examining that angle.
It is not better than ERA, it's a completely different animal, which has a different useful factor. What is better than ERA would be RAA (or just RA if you prefer). FIP is a useless animal at telling you the actual performance of a pitcher, but is a good projection tool, if you take the pitcher out and assume he's going to pitch with the same strategy, in a neutral environment.
Extreme cases serve to highlight the model's weaknesses. They don't make the model fail in general, or make it worthless in general.
FIP is attempting to measure a pitcher's expected ERA. It uses the things a pitcher can control (K, BB, HR rates), a "coefficient" that normalizes the things he can't (BABIP), and spits out a number that by definition is "what a player's ERA should look like". It's not adjusted just to match ERA "in a way that works alright"; the adjustment takes place to approximate what average BABIP does to an ERA. It isn't there just to "look pretty", it's a specific number for a (supposedly) mathematically valid reason. The final number looks like ERA because that's what it's supposed to be: a "better" version of ERA.
wOBA does what you think FIP does; it looks at individual events (some of which are unrelated to OBP, like SB, CS, and different types of XBH), but then adjusts the weights of those events so that the final league wOBA is league OBP. It is given a number to look pretty, so that you could (somewhat) quickly and easily know what a "good" or "bad" wOBA should be. wOBA is supposed to measure a hitter's total offensive contribution, which OBP clearly doesn't, but it looks like OBP for simplicity.
As for AVG and OBP, AVG is measuring H/AB, but OBP is measuring OB/PA. AB and PA are defined differently, so AVG and OBP are measuring completely different skills.
Simply look at any linear runs estimator. An out is worth -.3 runs, a single is worth .4 runs or so. Yet when the leadoff batter makes an out or singles, the score doesn't change. In a perfect game, the losing team doesn't score -8 runs. Nobody freaks out about this stuff. (Since the number of outs is essentially fixed, I suspect outs shouldn't be in the runs estimator function at all but I'm not sure how to get rid of them.)
30 K, 2 BB, 6 H in 48 PA and a team/batter would have "produced" about -6 runs by a linear runs estimator (if all singles, gets a bit better with some doubles and triples).
The mismatch between runs produced and runs scored in a game is due to lots of things but starting with the fact that runs scored is a discrete distribution (i.e. whole numbers) while a linear estimator assumes it's a continuous distribution. Linear run estimators work only because (generally) as you accumulate your count over many "trials", the discrete distribution behaves enough like a continuous one that the approximation works. That is, a linear run estimator should work OK over a season and maybe even 20 games or something but will spit out some funny results within the context of a game.
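A toy version of such a linear runs estimator shows the perfect-game oddity directly. This sketch assumes the out value is negative (about -0.3 runs) and a single is worth about 0.4 runs, as in the post above; the walk weight of 0.3 is my own placeholder, not a figure from this thread.

```python
# Assumed per-event run values. The single and out values follow the post
# above (taking the out value as negative); the walk value is a placeholder.
RUN_VALUES = {"single": 0.4, "walk": 0.3, "out": -0.3}


def linear_runs(singles: int = 0, walks: int = 0, outs: int = 0) -> float:
    """Sum of event counts times fixed run values (runs relative to average)."""
    return (singles * RUN_VALUES["single"]
            + walks * RUN_VALUES["walk"]
            + outs * RUN_VALUES["out"])


# A perfect game against you: 27 outs and nothing else.
perfect_game = linear_runs(outs=27)
print(round(perfect_game, 1))  # -8.1

# A single leadoff single "produces" 0.4 runs, though the score is unchanged.
print(linear_runs(singles=1))  # 0.4
```

Over a season the fractions and negatives wash out into sensible totals; over 27 outs they plainly do not.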
Not implying park adjusted, I'm more talking about the strategies a pitcher uses based upon the defense behind him. And of course not every pitcher adjusts their style based upon the defense behind them.
I'm responding to the fact that FIP, great estimator of future performance that it is (even though a player can "break" it), isn't park adjusted. What would Chapman's FIP (or especially xFIP, which adjusts for HR/FB) look like if it were adjusted for GABP?
On one extreme you have something like Newton's laws of mechanics where the laws are derived based on physics, and the data support them 100%... until you get to very small objects like electrons going around an atom... where Newton's laws do not work.
On the other extreme you have a very complicated relationship between multiple input variables and the output variable of interest such that a closed form solution is not even possible, so an estimator (linear or nonlinear) is used to approximate the actual relationship. It is useful if it predicts the actual within a reasonable % and the range of input variables under which the results are accurate covers most of the phase space you are interested in. FIP is closer to this extreme.
Or to take a widely accepted, non-predictive, record-of-what-actually-happened stat, ERA. A pitcher comes in. He gives up a home run. The manager comes and takes him out. Over that short, extreme stat line, the pitcher's ERA is now infinite (as is his WHIP). Obviously this is impossible: a pitcher, even one on my fantasy team, cannot give up Infinity runs. But since there's a zero in the denominator that's what this widely accepted formula gives out. Other than that it does a pretty good job of telling us how a pitcher performed and, over time, normalizes to give some measure of their true ability/value or lack thereof.
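The zero-in-the-denominator case is easy to see if you write ERA out. A small Python sketch, where returning infinity (or NaN when nothing at all has happened) is just one possible way to handle zero innings, not an official scoring rule:

```python
import math


def era(earned_runs: int, innings: float) -> float:
    """ERA = 9 * ER / IP, with the zero-innings case handled explicitly."""
    if innings == 0:
        # Runs allowed without recording an out: the rate is unbounded.
        # No runs and no outs: the rate is undefined.
        return math.inf if earned_runs > 0 else math.nan
    return 9.0 * earned_runs / innings


print(era(1, 0))  # inf (one home run, no outs recorded)
print(era(2, 9))  # 2.0
```

Plain division would raise `ZeroDivisionError` here, which is the code equivalent of the formula giving out an impossible answer on an extreme line.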
I believe almost nothing you've said here is really quite true but for reasons I'm unwilling to type out on a tablet other than to point out that basing your arguments on the half-assed characterization of FIP given by fangraphs is really not something you should be doing at this point in the conversation.
Really? I have some experience in stats and databases and I found his description of FIP well put. Voros said that Tango was interested in making it "comparable" to ERA, so he added some math that made it so. Voros was just more interested in the concept of more accurately measuring a pitcher's performance based on run values and peripherals, aka "things he can control".
We can argue to what extent a pitcher can control balls in play. I'm pretty sure that Voros himself would admit that much of what he originally thought about DIPS theory, of which FIP is a type, was wrong.
There is a chasm of learning between Newtonian physics and the Higgs boson. We're probably at about the former in terms of understanding pitcher performance, but FIP is a good start.
That is actually basically what I said that the quoted section was arguing against, not for, because seemingly, despite having had the math explained, clearly understanding it and being repeatedly told what FIP is actually measuring, he very oddly still seems to think quoting fangraphs (I guess Cameron?) saying "this is ERA but better" has any real relevance to anything once you delve even an inch below the surface on the topic.
OK, then why would anyone run a web site that uses it to assign value to past performances?
Even more importantly, it probably needs three seasons or so of data for most modern relievers. So Aroldis Chapman is being unfairly maligned here: he didn't break anything, small sample size did.
Because they are obsessed with promoting crappy stats (WPA, fWAR for pitchers, and xFIP) that deal in the theory of what could have happened in an alternate universe in which every player but the one being listed is "average" and the statistical flukes they can't account for just don't exist.
This is simply not true. It's because we are interested in separating the wheat from the chaff, the "true talent" from the fluke.
Just take pitcher WAR values with a grain of salt if you don't like its components. Nobody is suggesting we determine Cy Young on FIP, are they?
While ignoring every single situational thing imaginable, except in the case of the ultra silly WPA, in which situation trumps reason to the nth degree. Nobody should look at FIP and say "this guy had a good or bad year". It doesn't measure past performance. It can be a useful tool for future projections in the right situations (when the defense and park behind them change).
Two of the more intelligent baseball writers voting for the Cy Young, Keith Law and Will Carroll, both based their Cy Young votes on FIP. So yes, some people are saying that FIP should determine the Cy Young.
You break it, you bought it.
Except you refuse to believe that pitchers can respond to a situation or raise their game in certain situations, all evidence to the contrary notwithstanding. This explains why all the computer drunk statnerds insisted that Glavine would flame out about 10 years before it happened, because he did not do it the way their spreadsheets said it should be done. Instead of separating wheat from chaff, they throw out any wheat that does not correspond with their preexisting biases.
Of course it does, it just does so selectively (as of course do ERA and Win-Loss record). Now you can argue whether being selective like that has any value, but it's certainly evaluating past performance.
It measures a very limited spectrum of performance. ERA measures actual results (mind you, it overstates the pitcher's role in it, and luck also) but it measures things that actually did happen on the field. FIP measures K/W/HR/IP allowed and then dumps it into a formula that comes up with a number that looks like ERA. It doesn't care about actual performance (hits allowed, types of hits, etc. other than the three true outcomes).
FIP measures 3 things (K, W, HR); ERA only measures one thing (earned runs). (They both have innings.) FIP is a broader spectrum of performance.
ERA measures actual results
Not really because it's measuring earned runs, not runs. Earned runs are an artificial construct (i.e. defined by a convoluted rule, there is no "actual" difference between earned and unearned runs). They are a construct which says that pitchers have zero responsibility for runs which occur under certain circumstances. FIP is an estimate based on "actual" events which assigns pitchers zero responsibility for (expected) runs which occur due to BIP.
There is no "real" way to assign runs allowed to a pitcher. Runs occur through a combination of pitching and defense and there's no ironclad way to tell who's responsible for what. Earned runs are just a crude way of assigning that responsibility that doesn't take into account how well the pitcher pitched and has a pretty much completely nonsensical way of assigning value to the defense. FIP at least tries to assess how well the pitcher pitched in an overall sense and it's hard to imagine it could possibly do worse at assigning value to the defense.
Things like FIP are valuable because human beings struggle mightily in thinking multidimensionally. All FIP really is is K, BB and HR. But how is anybody to easily know whether a guy with 180 K, 90 BB and 25 HR is better or worse than the guy with 120 K, 60 BB and 20 HR. FIP brings that info together into one number in a reasonably reliable way and scales it to something we're used to dealing with. That's a good thing, not a bad thing ... or at least it's better than ERA.
From that extreme to the opposite extreme (where a stat is completely dependent solely on playing ability and only playing ability), somewhere in between sits the various playing stats for players. So I think establishing the relationship between a particular statistic and playing ability is fair game in assessing a player's past performance. I think you can make the argument that Cy Young awards should simply be awarded based on the traditional statistical record for pitchers, regardless of ability considerations, because that's what the award is all about. But I don't think that's necessarily the _only_ way to look at the award.
Player A won 20 games, but he had run support from that powerhouse offense and an incredible bullpen that bailed him out on more than a few occasions. Player B led the league in ERA but his home park is 5000 feet below sea level and he had a track team in the outfield running down fly balls. Player C was only 5th in the league in ERA, but he led the league in Ks; he was the most feared and dominant pitcher in the league when he was on.
All of these stats measure specific components of a pitcher's ability and performance, and they are all useful if you understand the limitations and biases. I feel the same way about FIP, which is really just a convenient way of looking at some of the most important and repeatable pitching peripherals.
