Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Friday, May 02, 2008

The Book Blog: You wanted crap/yap from a premium writer….

The Neyer/Lichtman Guide to ########…An Historical Compendium of ########, ########, and #######. (but the book ends well!)

From Rob Neyer, who is lately (maybe for a long while) just as obsessed (and misguided) as almost everyone else about short-term recent performance:

So is Cliff Lee for real? I think all we can say is that he’s really healthy. He’s going to give up a higher batting average on balls in play, and some reasonable percentage of the fly balls he gives up will fly over the fence. So no, he probably doesn’t wind up winning the Cy Young Award. But I’ll bet he’s better than average. And considering how well C.C. Sabathia’s pitched in his last two starts, suddenly the Indians would seem to have the best rotation in the majors.

So Cliff Lee, 31 years old, is better than average, because he has pitched well to 128 batters after having pitched mediocrely, at best, to 3047 batters over the last 4 years?  I think not, and I will take up Neyer on that bet (he offered this time, although obviously not literally).

...That is a fairly sucky pitcher who, based on his 128 batters faced so far this year, is now an ever-so-slightly less sucky pitcher!  He is NOT better than a league average pitcher, nor is he a league average pitcher.  (Warning: of course, I don’t KNOW what he is for sure, but my estimate, since it is based on science, is a heck of a lot better than Neyer’s, which is based on nothing but a distorted and misinformed view of what 5 outings of good pitching, following 4 years of poor pitching, mean.)

...The sad part is that Neyer knows this stuff (I think), but he still writes the same crap that everyone else does.
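To put the 128-versus-3047 batter comparison in concrete terms, here is a minimal sketch of how little weight the recent stretch carries if the two samples are simply pooled by batters faced. This is not MGL's method (real projection systems typically weight recent seasons more heavily and regress toward a league mean); the function names and the example rates are hypothetical, and only the two sample sizes come from the post.

```python
def pooled_weight(n_recent, n_prior):
    """Share of the combined sample that comes from the recent stretch."""
    return n_recent / (n_recent + n_prior)


def pooled_rate(recent_rate, prior_rate, n_recent, n_prior):
    """Sample-size-weighted average of two observed rates (naive pooling)."""
    w = pooled_weight(n_recent, n_prior)
    return w * recent_rate + (1 - w) * prior_rate


# Sample sizes quoted in the post: 128 batters this year, 3047 over four years.
w = pooled_weight(128, 3047)
print(round(w, 3))  # 0.04

# Hypothetical rates allowed (NOT Lee's actual numbers), for illustration only.
blended = pooled_rate(recent_rate=0.250, prior_rate=0.340,
                      n_recent=128, n_prior=3047)
print(blended)
```

Under naive pooling the hot start moves the estimate only slightly, since it is about 4% of the combined sample.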

Repoz Posted: May 02, 2008 at 12:14 PM | 176 comment(s)
  Tags: community, sabermetrics

Reader Comments and Retorts

   101. Jonah Keri Posted: May 02, 2008 at 08:58 PM (#2766805)
Tango, the dollar values were:

Sabathia $28, Harris, technically $5 (I bid $10, but only because he was my last player so no point leaving $ on the table), Scutaro Reserve (i.e. $0)

Lowell $20, Wakefield $2, German $3 and Lee...$0!
   102. villageidiom Posted: May 02, 2008 at 09:10 PM (#2766819)
But so what? We aren't gods. We don't know hardly any of this information, and we aren't capable of knowing it.
Not being able to know everything, and dismissing the usefulness of anything that isn't "known", are two different things.

MGL asserts that his answer has more "science" than Neyer's because, of the X things that he knows from his sample, the type of variation we see from Cliff Lee is generally "random", and random events generally wash out in the long run.

MWE asserts that, just because a piece of information isn't in one's sample, it doesn't mean that it is random and that it'll wash out in the long run. Nor does it mean that one's answer, ignorant and dismissive of new information, has more "science".

I've been amazed at how much confusion there has been in this thread. Random is a relative term - it's relative to what we know or to what we should expect (which in turn is based on what we know).

Tango, you're continually asserting that certain outcomes are random relative to the mean. That's fine... But that mean accounts for nonrandom stuff: Cliff Lee is a certain age, pitching in a certain park, pitching to batters of certain handedness. We generally have no trouble adjusting the mean for general variation by these contexts. But another way of looking at it is that each of these contexts has its own mean. You wouldn't dismiss the variation of contexts as random variation around One True Mean, would you? If his historical data consisted of 80% of PAs against RHB, would you conclude that his overall mean performance level would be relevant were he to be used from now on as a LHB specialist? Of course you wouldn't.

Let's take it further. MWE and others are asserting that, for Cliff Lee, "not throwing changeups to LHB" is a specific context. In the prior historical sample it had very little weight, because it happened so few times. In 2008 it has much greater weight. Is it happenstance, and should we expect him to stop this "random" deviation from his One True Mean? If there's anything to it, probably not.

If anything, we certainly shouldn't dismiss it on the grounds that the One True Mean has "more science" around it. If anything, it has less.
   103. Tango Posted: May 02, 2008 at 09:10 PM (#2766820)
Unless Lowell is out for the season, that trade does not value Lee very much does it?
   104. Jonah Keri Posted: May 02, 2008 at 09:17 PM (#2766830)
I would say not. Lowell *was* on the DL at the time, but it looked like a routine thumb injury and the ETA then was another week (which proved to be true). Part of it is that Harris is an everyday player and German isn't, which matters a lot in a league as deep as LABR (12 teams, AL only, 14 hitters, 9 pitchers, 6 bench spots plus DL spots). But still, yeah, this was a sell-high on Lee.

He's 2-0 for me since. I imagine his value in single-year roto leagues is a little higher now that he's got 5 starts of data behind him instead of 3.
   105. Tango Posted: May 02, 2008 at 09:18 PM (#2766832)
Tango, you're continually asserting that certain outcomes are random relative to the mean. That's fine... But that mean accounts for nonrandom stuff


Of course there's a lot of nonrandom stuff. No one is disputing that.

I'm saying even if there isn't any nonrandom stuff, the outcome will still be random around that mean. It has to be, by definition of a binomial.

The things we don't know affect the mean, and they affect our uncertainty level around that mean.

Please, cut/paste whatever it is that I have said that you (or anyone) disagree with, because the interpretation or implication of what I'm saying is not what is being asserted here.
   106. John Lynch Posted: May 02, 2008 at 09:26 PM (#2766837)
Look, I don't think anyone is claiming that adding more context to a problem in order to obtain a better mean or less variation is a bad thing. We can all agree on that. The point is that until you add meaningful context, there's no point in speculating about whether or not the variation has a quantifiable explanation. First, add the new context, then talk about its implications. Don't assert that because there's probably more context out there, we should treat our current context as if the random variation present there actually has meaning. You can't do that.

Mike and Dayn added more context to the discussion. That's awesome. I would note, as GuyM does, that to have real meaning we would probably need a way of measuring how Lee not throwing his change up in certain situations affects his performance. At the moment, we have observed a correlation (less change up, better performance) and are ascribing causation to it. I don't think that's a good idea, but it certainly gives an area worth investigating.

Again, no one is saying that MGL's analysis is the final word; obviously more thorough analysis can be done. However, MGL's observation that the context Rob provided is not enough to draw reasonable conclusions is correct. You have to have more than a simple observation of apparently correlated variables to assert that Lee is likely to continue exceeding expectations. If Lee had changed his diet from Burger King to Taco Bell before the season, would that be relevant? Probably not. Granted, the change up pattern has more legs, but it still requires more examination.

Finally, why is there a stigma associated with the word "random?" When I use it, all I mean is that we have not accounted for some factor(s) in any predictable fashion. There's no moral component at play here. I'm not talking about "good luck" or "bad luck," only unaccounted for factors. I'm not impugning Lee's character or performance by suggesting that he's somehow pitched less well over the first month because the reasons for his improved pitching are unclear to me at this time and perhaps are unlikely to carry forward. It's just a way of saying that we haven't accounted for all factors.
   107. GuyM Posted: May 02, 2008 at 09:28 PM (#2766839)
MWE and others are asserting that, for Cliff Lee, "not throwing changeups to LHB" is a specific context. In the prior historical sample it had very little weight, because it happened so few times. In 2008 it has much greater weight. Is it happenstance, and should we expect him to stop this "random" deviation from his One True Mean? If there's anything to it, probably not.


No one is objecting to using data on pitch selection, if it improves our predictions. Do you (or Mike) have any data to show that throwing fewer changes makes pitchers better? Or make LHP's better? 29-yr-old pitchers better? And if not, why should we care?

Any time a pitcher has a good streak (or implodes), you can almost always find something he appears to be doing differently. Does that then justify a new assessment of his ability? Only if there's some demonstrated link between that change and performance.
   108. John Lynch Posted: May 02, 2008 at 09:57 PM (#2766869)
Any time a pitcher has a good streak (or implodes), you can almost always find something he appears to be doing differently. Does that then justify a new assessment of his ability? Only if there's some demonstrated link between that change and performance.

You know I'm a shitty writer because it usually takes me three paragraphs to say something that others can say in one. Well done, sir.
   109. Tango Posted: May 02, 2008 at 10:08 PM (#2766875)
When I use it, all I mean is that we have not accounted for some factor(s) in any predictable fashion.


Randomness is about timing, not about accounting for things. Not accounting for things is about uncertainty.
   110. Forsch 10 From Navarone (Dayn) Posted: May 02, 2008 at 10:10 PM (#2766877)
Here are some pitch type data for Lee from 2005-08 vs. LHBs ...

2005

Fastball - 72.2%
Curve - 14.7%
Slider - 10.3%
Change - 2.8%

2006

Fastball - 70.8%
Curve - 15.3%
Cutter - 12.5%
Change - 1.4%

2007

Fastball - 67.3%
Slider - 17.6%
Curve - 8.1%
Cutter - 5.6%
Change - 1.4%

2008

Fastball - 83.0%
Curve - 8.8%
Slider - 6.0%
Cutter - 2.2%

According to STATS, he hasn't thrown a change to lefties this season, but he didn't throw it all that often in the past to same-side hitters. Biggest difference I can see is that he's throwing his fastball much more often to lefties and his slider much less often. Obviously, I can't say this is why he's improved, but he's definitely taking a different approach thus far (limited sample, I know).
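As a rough check on the "throwing his fastball much more often" observation, the fastball shares listed above can be compared directly. This is only a sketch of the arithmetic: the helper name is made up, and a simple unweighted average of the earlier seasons is an assumption (it ignores how many pitches Lee threw to lefties each year, which the post does not give).

```python
# Fastball share vs. LHB by season, copied from the figures quoted above.
fastball_share_vs_lhb = {2005: 72.2, 2006: 70.8, 2007: 67.3, 2008: 83.0}


def change_from_baseline(shares, current_year):
    """Current-year share minus the unweighted mean of all other years."""
    baseline_years = [year for year in shares if year != current_year]
    baseline = sum(shares[year] for year in baseline_years) / len(baseline_years)
    return shares[current_year] - baseline


delta = change_from_baseline(fastball_share_vs_lhb, 2008)
print(round(delta, 1))  # 12.9 percentage points above the 2005-07 average
```

The gap says nothing by itself about cause; it only quantifies the change in approach described in the post.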
   111. Chris Dial Posted: May 02, 2008 at 10:24 PM (#2766885)
No one is objecting to using data on pitch selection, if it improves our predictions. Do you (or Mike) have any data to show that throwing fewer changes makes pitchers better? Or make LHP's better? 29-yr-old pitchers better? And if not, why should we care?

Any time a pitcher has a good streak (or implodes), you can almost always find something he appears to be doing differently. Does that then justify a new assessment of his ability? Only if there's some demonstrated link between that change and performance.


Well, part of the point is that the change can be real, but people aren't ready to accept it as the cause until the sample gets larger. That doesn't mean that the cause isn't already in effect.

If Lee does this all season and next, and still leaves his change in his pocket vs LHBs, then MGL and you will be happy to accept it. What you can't accept (seemingly) is the initial observation and assignment of cause. It's stats versus scouts, and you are completely dismissing the scouts (AFAICT).
   112. John Lynch Posted: May 02, 2008 at 10:29 PM (#2766888)
Randomness is about timing, not about accounting for things. Not accounting for things is about uncertainty.


I'm not sure I understand the difference then. I would have thought that randomness leads to uncertainty.

For example, the output of a pseudo-random number generator will look random, not because of any timing issue, but simply because we don't know the formula that generated the numbers. Once we account for that factor, that is, once we learn the formula, there is no randomness at all.

An explanation would be wonderful, because the difference is not readily apparent to me.
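The pseudo-random generator example can be made concrete with a small sketch. The generator below is a textbook linear congruential generator; the constants and the seed are illustrative assumptions, not anything referenced in the thread. Its output looks patternless, yet anyone who knows the formula and the seed can replay it exactly.

```python
def lcg(seed, a=1103515245, c=12345, m=2**31):
    """Linear congruential generator yielding floats in [0, 1).

    The constants are illustrative; any reasonable LCG makes the same point.
    """
    state = seed
    while True:
        state = (a * state + c) % m
        yield state / m


def take(gen, k):
    """Return the next k values from a generator as a list."""
    return [next(gen) for _ in range(k)]


observed = take(lcg(seed=2008), 5)   # looks random to an outside observer
replayed = take(lcg(seed=2008), 5)   # reproduced from the known formula and seed
print(observed == replayed)  # True
```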
   113. Dan Turkenkopf Posted: May 02, 2008 at 10:39 PM (#2766895)
Staying out of the randomness discussion, which I'll agree appears to be mostly philosophical, Peter Bendix wrote about Lee's performance over at Beyond the Box Score today. He touches on a number of factors that contribute to Lee's start including some of the pitch f/x data.
   114. GuyM Posted: May 02, 2008 at 10:54 PM (#2766905)
Chris/111: I don't see this as stat vs scout. The observation of a lower changeup % (which appears not even to be true, per 110, but whatever...) doesn't necessarily have anything to do with the improved performance. I'm quite prepared to believe it does, if someone has evidence that LHH kill changes from LHPs, or something like that. But all we appear to see here is a pitcher returning to a normal platoon split over 5 games (albeit at a very good overall performance level), after a season in which he had a weird reverse split. Why is this important?
   115. Voros McCracken, Human Shield Posted: May 02, 2008 at 10:59 PM (#2766908)
Dayn,

You have to be very careful there with reverse causation issues. It could very well be that his success this year is causing him to pitch differently rather than the other way around.

We know pitchers throw different types of pitches by count, runners on base, outs, inning and score. Lee's success in 2008 may be prompting him to throw different pitches as a matter of differences in those situations rather than any conscious desire to improve his performance by pitching differently.
   116. Voros McCracken, Human Shield Posted: May 02, 2008 at 11:42 PM (#2766931)
If you knew for certain everything, the players, the park, the weather, the groundskeeper, the partying... everything, you would STILL have random variation, and that random variation would be 1 SD = 0.5/sqrt(162).

I disagree. It depends on your definition of "everything". If you mean literally _everything_ then you can know the exact outcome of a baseball season, flip of a coin, cards in a hand, the weather on Tuesday, etc.

Probability theory is a field of study which attempts to _model_ uncertainty. Once you know everything, uncertainty disappears. I realize Heisenberg and all that, but he was still discussing a physics problem (the mere act of trying to measure something changes the data you're trying to measure) rather than a knowledge problem (which is a philosophical one). So if you knew everything, you could even know both the position and momentum of a particle in a small region of space.

Though as Tango is noting, the uncertainty involved in predicting a baseball season is not an argument against using probability models, it's an argument in favor of it. By definition, probability theory is our best weapon to date to deal with those uncertainties.
   117. Tango Posted: May 03, 2008 at 01:21 AM (#2767107)
Voros/116:
As I said, as long as you know everything other than WHEN it will happen. As soon as you know WHEN something will happen, you are god or are accepting a predetermined fate.

However, I am saying that if you know...
1. every possible variable
2. every possible condition
3. and the behaviour of all of these variables and conditions (whether inherent or conditional on other variables and conditions)
... then you still have random variation of binomial outcomes, and it is explained as 1 SD = 0.5/sqrt(n)

***

John/112: as long as you don't know WHEN something will happen, then it is random. As soon as you know WHEN something will happen, it is not random. It is predetermined. Future outcomes, unless they are predetermined, must therefore be random, given a mean (however uncertain we are of that mean). And the random results will follow a binomial distribution, around that mean.
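The 1 SD = 0.5/sqrt(n) figure is the standard deviation of an observed rate over n binomial trials when the true rate is .500. A minimal sketch, assuming independent games with a fixed win probability (the function names, trial count, and seed are arbitrary choices for illustration), compares the formula with a quick simulation:

```python
import math
import random


def binomial_sd_of_rate(n, p=0.5):
    """Theoretical SD of an observed rate over n independent trials."""
    return math.sqrt(p * (1 - p) / n)


def simulate_rates(n, p=0.5, trials=2000, seed=1):
    """Simulate `trials` seasons of n games each; return the win rates."""
    rng = random.Random(seed)
    rates = []
    for _ in range(trials):
        wins = sum(1 for _ in range(n) if rng.random() < p)
        rates.append(wins / n)
    return rates


def sample_sd(values):
    """Sample standard deviation (n - 1 denominator)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


theory = binomial_sd_of_rate(162)
print(round(theory, 4))        # 0.0393 in winning percentage
print(round(theory * 162, 1))  # 6.4 wins over a 162-game season
print(sample_sd(simulate_rates(162)))  # should land close to the theoretical value
```

So even a team whose true talent is known to be exactly .500 would be expected to finish about 6 wins above or below 81 by chance alone, under these assumptions.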
   118. GuyM Posted: May 03, 2008 at 01:29 AM (#2767122)
I don't think this is right, Voros. There is literally no amount of knowledge you could have on April 1 that would allow you to predict the exact outcome of a MLB season, unless by "everything" you include the ability to know the future (in which case this is tautological). I don't care how much you know about these players' bodies, minds, and talents, there is still a lot of contingency to deal with. You can't know whether player X is going to collide with player Y on July 17 and end his season. Perhaps you can estimate the probability well, but no information available on April 1 allows you to know that outcome for sure.

Or, if you're right, then you have to believe that all human activity is predetermined. The only way to predict the final season standings perfectly is to predict every game accurately, which means predicting the actions of all players, decisions of managers, injuries, and so on. If someone with perfect knowledge of the present could predict everything about the future, that means the world only appears uncertain to us, but in fact is quite certain. Among other things, this would imply no free will. I can't imagine you believe that's true.
   119. Tango Posted: May 03, 2008 at 01:30 AM (#2767124)
I said this:
2. If you decide to bring in outside data (he was hurt, he had a minor league rehab that went ok, he changed his pitching mechanics or how he mixes up pitches), then this is perfectly legitimate, and really desirable. How you weight this information is of course the key.

How much does all this affect the forecast? Beats me. I won't pretend to know.


Chris said this:
What you can't accept (seemingly) is the initial observation and assignment of cause. It's stats versus scouts, and you are completely dismissing the scouts (AFAICT).


I'll be clear: I am not dismissing scouts. At all. Au contraire, I would love to see scouting reports.

I am dismissing starting with the numbers, and then trying to find a scouting angle to support those numbers. I would much prefer to see scouting independent of the numbers (much like I do the Fans Scouting Report, where I implore people to base their evaluations independent of the numbers, however successful that is).
   120. Voros McCracken, Human Shield Posted: May 03, 2008 at 02:22 AM (#2767270)
Or, if you're right, then you have to believe that all human activity is predetermined.

No, all I'm saying is that with truly "perfect" knowledge comes the ability to perfectly understand the effects of the seemingly infinite number of variables that affect things like "choice" and "free will". Any level of knowledge that doesn't contain these things is pretty much by definition "less than perfect." To me it's a definitional thing: with perfect knowledge, uncertainty disappears. Without uncertainty, probability models fail. Any level of knowledge in which uncertainty still exists, is by definition imperfect.

Of course we don't have perfect knowledge nor are we close nor likely will we ever have it. And so we use things like probability theory to model areas in which we are uncertain. We can always advance our knowledge by reducing the areas in which we are uncertain.
   121. gay guy in cut-offs smoking the objective pipe Posted: May 03, 2008 at 02:47 AM (#2767360)
Or, if you're right, then you have to believe that all human activity is predetermined.

There is a school of philosophical thought that holds that a deterministic universe is compatible with the existence of free will. Logically enough, it's called "compatibilism". If memory serves, Daniel Dennett is one of its modern adherents.
   122. Chris Dial Posted: May 03, 2008 at 02:59 AM (#2767385)
I don't see this as stat vs scout. The observation of a lowe
   123. Chris Dial Posted: May 03, 2008 at 03:03 AM (#2767398)
Boy, that eaten post sucks.

But: Tango:
your argument in 119 seems to be that you didn't hear that scouts said "stop throwing your changeup to LHBs" prior to him stopping doing it.

That doesn't really refute what I said.
   124. Chris Dial Posted: May 03, 2008 at 03:06 AM (#2767421)
The observation of a lower changeup % (which appears not even to be true, per 110, but whatever...) doesn't necessarily have anything to do with the improved performance. I'm quite prepared to believe it does, if someone has evidence that LHH kill changes from LHPs, or something like that. But all we appear to see here is a pitcher returning to a normal platoon split over 5 games (albeit at a very good overall performance level), after a season in which he had a weird reverse split. Why is this important?


Oof. Because not every pitcher is as effective at throwing changeups as one another. The large group data may not indicate that a particular pitcher is poor at disguising the changeup - he slows down during delivery or changes his arm slot, or otherwise tips the pitch - none of this has ANYTHING to do with large group data, but instead is specific to the player and his delivery. Which is specific to interpretation of scouts vs. stats. (Same with a given LHP masher).
   125. KJOK Posted: May 03, 2008 at 03:18 AM (#2767449)
I am dismissing starting with the numbers, and then trying to find a scouting angle to support those numbers. I would much prefer to see scouting independent of the numbers (much like I do the Fans Scouting Report, where I implore people to base their evaluations independent of the numbers, however successful that is).


But a scouting report that said "Lee has stopped throwing change ups to LH batters", while it might be helpful if we were about to bat against him, really in no way helps us answer the question about the 'realness' of Lee's recently improved performance.

Now, if we had something like "Lee threw 20 change-ups to LH batters in 2007, and 18 of them were hit for home runs, and in 2008 he has not thrown a change-up to LH batters", THEN we MIGHT be able to make a judgment about the link between not throwing change-ups and an increase in performance.
   126. Los Angeles Waterloo of Black Hawk Posted: May 03, 2008 at 03:30 AM (#2767480)
Cliff Lee's had a five-start string of performances that attracted the attention of Rob Neyer as, perhaps, being indicative of a change on Lee's part. Dayn Perry chose to take a look at what Lee was actually doing - and noted something that he WAS doing differently (namely not throwing changeups to LHB). And when I looked at the data, what I came up with was (a) most of Lee's performance decline in 2007 can be traced to his struggles against LHB, (b) he's doing something different in 2008, and (c) it's working so far - his platoon splits are back to where they were in 2006, in the relative sense (although in an absolute sense he's doing much better against everyone). That one thing - his different approach to LHB - suggests that his 2007 performance, where he struggled against LHB, might not be all that relevant to the 2008 version of Cliff Lee. That's what Rob Neyer *might* have picked up on, even if he didn't write it that way. And that's what WE need to do a better job of picking up on if we want to be relevant going forward - detecting when a change in performance might be signal and not noise - and not simply wedding ourselves to statements like "random variation" and "regression to the mean".

I agree with this passage 100% (at least in regards to the thrust of it).

My Google Spreadsheets tells me there is a .85 correlation between Casey Kotchman's Flyball Percentage and his OPS; the more he hits the ball in the air, the better he does (it's also .85 once I add in line drives). In his career, he has a 395 OPS when hitting the ball on the ground -- for comparison all hitters last year had an OPS of 512 on groundballs. (Sorry for using OPS on this, it's just easier than wOBA.)

I don't know if Casey Kotchman knows this, but in watching him play and looking at this breakdown, I'm confident in saying that if he continues to hit the ball in the air he will continue to be a good hitter. This relationship even exists in years when there are other explanations for his performance: in the last two years, he's hitting the ball in the air 30-31% of the time; in 2006, when he tried playing with mono for a month before hitting the MASH unit for the rest of the season, it was a career-low 25%.

I don't know if that's science, and I don't know if I'm justified in being as confident about this relationship as I am. But if Kotch came to me and asked me what he had to do to keep performing, I'd say, "Keep doing what you're doing; stay back on the ball and drive it. You get into trouble when you get out on your front foot too early and then roll your top hand over on your swing. Don't roll the top hand! That way lies The Ruin of Darin '4-3' Erstad."

I'm using this example because I know it, because I've seen something like 95% of Casey Kotchman's major league at-bats (if not more), and because I'm crazy enough to look at his splits and see how he does when he hits the ball in different ways (I'm so crazy that I did this over a year ago before BB-Ref or Fangraphs even had the info easily available; I had to go through PI and do it by hand). But there are dozens of examples of stuff like this, where guys change their approach or their focus, pitchers who drop this pitch or add that one, and succeed as they hadn't before.

Maybe I'm wrong here, and confusing correlation for causation. My reading on Kotchman is based on my observations of his play, and the numbers back up my observations and (possibly) quantify their effect. But if I'm not mistaken, this is exactly the sort of thing Mike is advocating. I think it's a promising line of inquiry. Mistakes may be made, but I'm not worried -- it's just baseball, after all.
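For readers who want to reproduce a spreadsheet-style correlation like the one mentioned above, here is a minimal Pearson correlation sketch. The season-level flyball percentages and OPS values behind the .85 figure are not given in the post, so the numbers below are placeholders made up for illustration; they are not Kotchman's actual data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical placeholder data, NOT real player statistics.
flyball_pct = [25.0, 28.0, 30.0, 31.0]
ops = [0.650, 0.720, 0.840, 0.800]

r = pearson_r(flyball_pct, ops)
print(r)
```

With only a handful of seasons, a high r can easily arise by chance, which is the correlation-versus-causation caveat raised in the post itself.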
   127. KJOK Posted: May 03, 2008 at 04:04 AM (#2767557)
I don't know if Casey Kotchman knows this, but in watching him play and looking at this breakdown, I'm confident in saying that if he continues to hit the ball in the air he will continue to be a good hitter.


But could it be that actually "he will continue to be a good hitter if he continues to hit the ball in the air"?! Or what if, watching Kotchman play, you'd instead noticed that he seemed to hit well on Sundays, and then looked at the numbers and saw that, his OPS in Sunday games is 1120 vs. league average of 800. Would you conclude then that "if he keeps playing on Sundays he'll be a good hitter"?!
   128. Forsch 10 From Navarone (Dayn) Posted: May 03, 2008 at 04:46 AM (#2767602)
Dayn,

You have to be very careful there with reverse causation issues. It could very well be that his success this year is causing him to pitch differently rather than the other way around.


Howdy, Voros. Certainly this is a possibility. However, I'd think that if success were dictating pitch selection rather than vice-versa, then you'd see Lee relying more on his secondary pitches. Things as they are, however, he's very fastball heavy. That says to me that it's a premeditated change in approach. Of course, I have no way of knowing that.

Oof. Because not every pitcher is as effective at throwing changeups as one another. The large group data may not indicate that a particular pitcher is poor at disguising the changeup - he slows down during delivery or changes his arm slot, or otherwise tips the pitch - none of this has ANYTHING to do with large group data, but instead is specific to the player and his delivery.

This is definitely something worth keeping in mind. The change is the hardest pitch to throw for most pitchers. It's rooted in imitation and requires a degree of mechanical perfection that not many--even those among the best pitchers in the world--can master. In Lee's case, it's possible that left-handed batters were picking up on something that wasn't perceptible to right-handed batters. Different vantage points and all that. Again, pure speculation on my part. It's not a matter of "LHHs killing changeups from LHPs." Each pitcher is different. It's a matter of LHHs killing Cliff Lee's changeup (if, in fact, they were). In any event, as hackneyed as this point is, it still bears repeating: the numbers don't tell you everything.

No one is objecting to using data on pitch selection, if it improves our predictions. Do you (or Mike) have any data to show that throwing fewer changes makes pitchers better? Or make LHP's better? 29-yr-old pitchers better? And if not, why should we care?

This is too narrow. The overarching point is that pitch selection matters, and altering pitch selection affects performance. The point is not that fewer changeups yield improved performance.

As for MGL's post, sure, it's possible that small sample size is what's behind Lee's improvement. However, to dismiss it fully on those grounds is lazy. It's also possible he's doing something different. I don't think I'm slaying a straw man in pointing this out, as MGL's post, as I parsed it, doesn't allow for this possibility.
   129. Tango Posted: May 03, 2008 at 05:11 AM (#2767616)
However, to dismiss it fully on those grounds is lazy.


Twice, I've quoted MGL showing that pitchers 27-31 who are off to hot starts can expect an improvement in their ERA the rest of the way, relative to their career, of 0.25 runs. (Just like Marcel says, too.)

I even bolded it.

Telling us that MGL is lazy to dismiss something, when 4 hours later he produced research telling us exactly how much he accounts for it is not being fair.

I don't think I'm slaying a straw man in pointing this out, as MGL's post, as I parsed it, doesn't allow for this possibility.


Maybe his post doesn't allow it, but MGL does. There's not a single person in the world that would dismiss any performance, especially current performance. MGL just ADDS that performance to what the player has already done. He doesn't dismiss it.

This is turning into DIPS, where initial statements get parsed but are not allowed to be clarified. MGL's position is very clear in his 3AM post. Linking to his initial post rather than that blog post does a disservice to the good research that the 3AM post is all about.

Take the time to read it guys, and check your negativity toward MGL at the door. It's unbecoming of intelligent folks seeking the truth.
   130. Voros McCracken, Human Shield Posted: May 03, 2008 at 05:16 AM (#2767618)
However, I'd think that if success were dictating pitch selection rather than vice-versa, then you'd see Lee relying more on his secondary pitches.

In some instances maybe, but if you find yourself pitching with multi-run leads more often, it certainly would make sense to stick mostly with fastballs and make them hit their way back into the game.

My point is simply that whenever we talk about something that hasn't fully been hashed out and studied (like the Pitch F/X stuff), we have to be very careful when reaching for cause and effect. I realize you know this, I just thought it was a good time to point it out. :)

As the Pitch F/X data piles up, I'd love to see studies done with pitchers posting similar traditional statistics while doing it in different ways according to Pitch F/X and see how that relates to future performance.
   131. Forsch 10 From Navarone (Dayn) Posted: May 03, 2008 at 06:12 AM (#2767639)
Twice, I've quoted MGL showing that pitchers 27-31 who are off to hot starts can expect an improvement in their ERA the rest of the way, relative to their career, of 0.25 runs. (Just like Marcel says, too.)

I even bolded it.

Telling us that MGL is lazy to dismiss something, when 4 hours later he produced research telling us exactly how much he accounts for it is not being fair.


Tango-

This doesn't contradict--or even speak to--the point I made. MGL's follow-up talked about what happened after the first month. What I was talking about was why that first month unfolded as it did.

Take the time to read it guys, and check your negativity toward MGL at the door. It's unbecoming of intelligent folks seeking the truth.

I look forward to your scolding MGL for his negativity toward anyone who writes or says something with which he disagrees. I like much of his work, but his presentation invites the treatment he's receiving.

In some instances maybe, but if you find yourself pitching with multi-run leads more often, it certainly would make sense to stick mostly with fastballs and make them hit their way back into the game.

I was talking about Lee's pitching in favorable counts and favorable base-out situations, whereas you're talking about run support. So we're apparently defining his success in different ways. In any event, it would be interesting to see what his pitch selection has been in all of those situations.
   132. CW hits the pinata for the candy Posted: May 03, 2008 at 06:53 AM (#2767655)
This is too narrow. This overarching point is that pitch selection matters, and altering pitch selection affects performance. The point is not that fewer changeups yield improved performance.

As for MGL's post, sure, it's possible that small sample size is what's behind Lee's improvement. However, to dismiss it fully on those grounds is lazy. It's also possible he's doing something different. I don't think I'm slaying a straw man in pointing this out, as MGL's post, as I parsed it, doesn't allow for this possibility.


You're absolutely right, in that there is a non-zero possibility that something - Neyer mentioned health, you mentioned pitch selection - other than sample size effects explains Lee's performance so far.

Can we answer that question without more data? No. MGL's argument, to my best understanding, was simply that. You can argue that he's about as tactful and graceful in delivering that argument as a singing telegram telling you somebody just died. But that's not the same as actually being wrong.
   133. Forsch 10 From Navarone (Dayn) Posted: May 03, 2008 at 07:22 AM (#2767660)
Can we answer that question without more data? No. MGL's argument, to my best understanding, was simply that.

That's a very charitable reading of his "argument." He writes: "That is a fairly sucky pitcher who, based on his 128 batters faced so far this year, is a now an ever-so-slightly less sucky pitcher! He is NOT better than a league average pitcher, nor he is a league average pitcher." To which he appends this bullsh*t qualifier "Warning: of course, I don’t KNOW what he is for sure, but my estimate, since it is based on science, is a heck of a lot better than Neyer’s, which is based on nothing, but a distorted and misinformed view of what 5 outings of good pitching following 4 years of poor pitching, means."

Well, if you don't know what he is for sure, then don't, you know, speak pseudo-authoritatively on what he is for sure. Then his ridiculous prescriptive: "Again, I ask, for any of these, 'Is he for real?' questions, that someone simply look at all players in history of about the same age and circumstances, who have had X prior stats, followed by Y (presumably really good or really bad) stats for a short period of time (whatever you want) and then see how they all did in ANY future time period you want (the more, the larger the sample of course)."

Note the hedge word "circumstances." This whole nonsensical argument relies on the assumption that X and Y stats constitute the entire debate. There's more to it than that. Much more. In the absence of closely watching at least a couple of Lee's starts, comparing them to past viewings of Lee's starts, and leavening the numbers with that experience, MGL's grand pronouncements just reinforce the worst things people say about statheads.
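
For reference, the cohort lookup MGL asks for in that passage can be sketched in a few lines. This is only an illustration of the idea, not his actual method: the record fields (`age`, `prior_era`, `hot_era`, `rest_era`), the tolerance parameters, and the sample rows are all invented for the example, and "circumstances" is reduced here to age alone.

```python
from statistics import mean

def comparable_rest_of_season(players, age, prior_era, hot_era,
                              age_tol=2, era_tol=0.50):
    """Average rest-of-season ERA of pitchers resembling the target.

    Every field name and tolerance here is a hypothetical choice for
    illustration; a real study would need a real historical data set.
    """
    cohort = [
        p for p in players
        if abs(p["age"] - age) <= age_tol
        and abs(p["prior_era"] - prior_era) <= era_tol
        and abs(p["hot_era"] - hot_era) <= era_tol
    ]
    if not cohort:
        return None, 0
    return mean(p["rest_era"] for p in cohort), len(cohort)

# Invented rows, purely to show the shape of the data.
history = [
    {"age": 30, "prior_era": 4.9, "hot_era": 1.2, "rest_era": 4.6},
    {"age": 29, "prior_era": 5.1, "hot_era": 1.0, "rest_era": 4.8},
    {"age": 24, "prior_era": 5.0, "hot_era": 1.1, "rest_era": 3.5},  # too young
]

avg, n = comparable_rest_of_season(history, age=31, prior_era=5.0, hot_era=1.0)
print(f"{n} comparable pitchers, rest-of-season ERA {avg:.2f}")
```

The disagreement in this thread is largely about what belongs in the filter: a stats-only cohort matches on X and Y, while the scouting argument is that pitch selection or health should be matching criteria too.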
   134. Los Angeles Waterloo of Black Hawk Posted: May 03, 2008 at 08:55 AM (#2767674)
-I don't know if Casey Kotchman knows this, but in watching him play and looking at this breakdown, I'm confident in saying that if he continues to hit the ball in the air he will continue to be a good hitter.

But could it be that actually "he will continue to be a good hitter if he continues to hit the ball in the air"?! Or what if, watching Kotchman play, you'd instead noticed that he seemed to hit well on Sundays, and then looked at the numbers and saw that his OPS in Sunday games is 1120 vs. a league average of 800. Would you conclude then that "if he keeps playing on Sundays he'll be a good hitter"?!


I can't think of any way in which this analogy makes the first bit of sense.
   135. Tango Posted: May 03, 2008 at 11:22 AM (#2767678)
Chris:


your argument in 122 seems to be that you didn't hear that scouts said "stop throwing your changeup to LHBs" prior to him stopping doing it.


Knowing what a pitcher is throwing is perfectly good knowledge, and is exactly what I was talking about in my point #2. Up for analysis is "what does it mean, and what is its impact", exactly as my point #2 is saying.

So, I don't even know what we are arguing about here.

***

Dayn:
As for MGL's post, sure, it's possible that small sample size is what's behind Lee's improvement. However, to dismiss it fully on those grounds is lazy. It's also possible he's doing something different. I don't think I'm slaying a straw man in pointing this out, as MGL's post, as I parsed it, doesn't allow for this possibility.
...
This doesn't contradict--or even speak to--the point I made. MGL's follow-up talked about what happened after the first month. What I was talking about was why that first month unfolded as it did.


There are two ways to look at "why" something happened. One is looking at the components and saying "this guy had 32 K against 2 walks, he went ahead 0-1 a ton, and gave up just 1 HR, and most of his hits were infield hits... dude was amazing".

The second way is "in 129 PA, looked at in isolation, I can't tell why something happened, as it could simply be transient, and not something persistent (real)... the uncertainty level in 129 PA is too large to draw any conclusions".

Both viewpoints are legitimate, and neither is lazy.

I don't see the issue, and I appreciate both viewpoints.

I look forward to your scolding MGL for his negativity toward anyone who writes or says something with which he disagrees. I like much of his work, but his presentation invites the treatment he's receiving.


Presentation of words on his blog? Wouldn't those retorts be better placed at his blog post, rather than here? The piling on that goes on here is simply unbecoming. One guy wants to write something, fine. Two, three, ok. But the constant hammering, every time he says something? The regulars here can't be happy needing to wade through those comments, can they? If you think he's a fool, ignore him!

His comments were directed explicitly at Rob Neyer, and Rob Neyer himself responded directly on MGL's blog post.

I have privately told MGL my thoughts, which I think is more effective than making a blog post. However, I have also at times pointed it out to him in public. I've got nearly 1000 threads on my blog. I hope, please, that you can take me at my word, and not make me spend my time trying to prove it.
   136. Tango Posted: May 03, 2008 at 12:03 PM (#2767681)
By the way, anyone who thinks I'm immune to mgl's aggressiveness need only read my recent Clutch threads on my blog to be dissuaded of such thoughts.
   137. Padraic Posted: May 03, 2008 at 12:05 PM (#2767682)
This is a fascinating discussion philosophically speaking.

As an ontological point, I think Tango is dead wrong to say that randomness exists even if you knew everything; this simply goes against the laws of the macro physical universe as we know them (meaning until we shrink baseball players to a subatomic level nothing about their play will be purely random).

That being said, as a practical matter of investigating important questions as to whether or not Cliff Lee's past 5 starts represent a new level of talent, it's probably best to assume things like randomness and regression to the mean. There may be examples where we can investigate areas of causation like MWE suggests (Kotchman being a good example), but for the most part, trying to understand all of the variables (and clear up randomness) is a losing game.
   138. Tango Posted: May 03, 2008 at 12:32 PM (#2767684)
Pad/140: All I can say is that you are dead wrong about me being dead wrong! As long as you don't know WHEN something will happen, randomness will occur.

Regardless, since we are not in the business of being god, as a practical matter, as you said, you have to accept randomness in life (and baseball), and deal with the amount of random variation that does occur with n=129 (or 162).
   139. Padraic Posted: May 03, 2008 at 01:00 PM (#2767690)
As long as you don't know WHEN something will happen, randomness will occur.


No, timing is just another unknown, it's not random.

I mean, if you assume a perfect knowledge (God), then that God (or knower) also knows when things will happen. To think that macro events (like baseball plays) can be subject to random variation really does go against Newtonian physics.

Not to be glib, but if there were a God (or perfect knower), he would be a really really good scout, not a statistician.
   140. Chris Dial Posted: May 03, 2008 at 01:10 PM (#2767691)
So, I don't even know what we are arguing about here.


AFAICT, like Hillary and Obama - just around the fringes. I think the analysis *can* (not "is") be supported by a known change in pitch selection. I think that your position is that the analysis doesn't support that as causation, so moving forward, your position is that Lee will pitch like he has (possibly oh so slightly better), whereas others are saying that with this change in pitch selection, his ability to get hitters out increases significantly: not a small incremental change, but a fundamental one that yields immediate, large results. And that it is causal.

But, we're just talking about fineness, not anything large (but of the scouts versus stats nature).
   141. bunyon Posted: May 03, 2008 at 01:25 PM (#2767693)
It seems likely to me that we never have sufficient sample size for any one pitcher to conclude how that pitcher will perform in the future with high certainty. Stats are great to project groups of players but looking at Lee's stats over the last four years and this April just isn't enough. He's been a different pitcher in almost all of those years due to changes in approach, experience and health. As an extreme, imagine he came into this year and changed pitching arms. Would you use his previous stats as a lefty to project how he'll do as a righty? So, if he's changed his approach, why should we expect close adherence to his previous self?

My take is that stats are great for groups of players but there will always be outliers and therein lies scouting. Could/should scouting be more organized and systematized? Perhaps. I don't know.
   142. greenback Posted: May 03, 2008 at 01:39 PM (#2767694)
Not to be glib, but if there were a God (or perfect knower), he would be a really really good scout, not a statistician.

Der Herr spielt nicht mit Würfeln?
   143. GuyM Posted: May 03, 2008 at 01:47 PM (#2767696)
Chris/Black Hawk:
I'm not clear on what part of Mike's analysis you find compelling. We see that Lee is pitching extraordinarily well, but we're not sure if this marks a real change in talent. Upon further digging, we learn he is throwing fewer changeups this season. Why does that make it more likely the change is real? Would ANY observed change in pitch selection make the performance improvement more significant? For example, if Lee had increased his # of changeups, I suspect Mike would also see that as a sign the improvement is real. Do you agree? Or if not, which changes should be taken as confirmation of a performance change?

One way to interpret Mike's approach is that we should disregard Lee's 2007 performance against LHHs in determining our new estimate of Lee's ability. As it happens, it's actually 2006 when Lee used a lot of changeups against LHHs, so it would be that year we throw out, which wouldn't change our forecast. But for sake of argument, say Lee had thrown a lot of changes to LHH in 2007, but none this year. Should we then disregard his 2007 LHH stats?

I would say no. First, we don't know if the % of changeups has anything to do with his success. Second, even if it does, we don't know whether hitters will adjust as they learn they don't have to worry about a change (remember, we only have 5 games of the "new" Lee), and sit on the FB. So I'd argue his 2007 performance remains part of his record, unless there's evidence it was injury related (and we only want a if-he-stays-healthy forecast).
   144. GuyM Posted: May 03, 2008 at 01:50 PM (#2767697)
To me it's a definitional thing: with perfect knowledge, uncertainty disappears.

Voros/123: Where we disagree is the element of time. I don't think even perfect knowledge of the present allows you to perfectly predict the future. I suppose that perfect knowledge of the past allows you to explain all that has happened, but that's still not the same as predicting the future.

Again, if perfect knowledge includes a crystal ball, then this is tautological.
   145. Tango Posted: May 03, 2008 at 01:55 PM (#2767698)
Chris:

I think that your position is that the analysis doesn't support that as causation, so moving forward, your position is that Lee will pitch like he has


Right, the current analysis doesn't support it, but it is possible that future analysis can show causation. Until that is established, I'm not going to pretend to know if there is a new Cliff Lee, and will proceed with a weighting of 10% this year, 90% career as my estimate of the real Cliff Lee.

I think we're good...
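
To make the 10%/90% weighting concrete, here is a minimal sketch. Only the weights come from the comment above; the function name and the ERA figures are hypothetical and are not Lee's actual numbers.

```python
def blended_estimate(this_year, career, weight_this_year=0.10):
    """Weighted average of current-season and career performance."""
    if not 0.0 <= weight_this_year <= 1.0:
        raise ValueError("weight_this_year must be between 0 and 1")
    return weight_this_year * this_year + (1.0 - weight_this_year) * career

# Hypothetical ERAs, chosen only to show how little a hot month moves things.
estimate = blended_estimate(this_year=1.00, career=5.00)
print(f"blended ERA estimate: {estimate:.2f}")
```

With a small weight on the current season, even an extreme month shifts the estimate only modestly, which is the substance of the "ever-so-slightly" better claim being argued over.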
   146. Tango Posted: May 03, 2008 at 01:58 PM (#2767699)
also knows when things will happen


As I said, if you know when things will happen you are god or believe in predeterminate fate. I'm presuming that's the only aspect that we don't know to accept randomness.
   147. Tango Posted: May 03, 2008 at 02:27 PM (#2767713)
Ditto Guy/146 and Guy/147, especially:

Again, if perfect knowledge includes a crystal ball, then this is tautological.
   148. Tango Posted: May 03, 2008 at 02:29 PM (#2767715)
Question for everyone:

If Lee was a free agent, as of Mar 1, how much would you offer him for 3 years? 4 years?

Presuming that Lee is a free agent as of today, how much would you offer him for 3 years? 4 years?

Thanks...
   149. Tango Posted: May 03, 2008 at 02:46 PM (#2767725)
Don't you find it different when you are part of the thread? If MGL were posting in this thread, it would be him being engaged with whoever has a problem with him. But, this is just a lot of "drive-by" posts taking shots at him, on their way to making some relevant post (or not). And it's not just this thread, but virtually any thread linked to his blog posts.

Don't you find it as tiresome as linking to a Richard Griffin article, and then everyone piling on top of his article, every time? If Griffin were to stop by and engage, fine.

If you, kevin, were having shots taken at you in a thread that you do not participate in, and that happened quite frequently, I would find it very unbecoming as well.
   150. Tango Posted: May 03, 2008 at 02:57 PM (#2767732)
I have no idea as to how MGL feels about it (though he probably doesn't care). I'm speaking only from my point of view as to the clutter (to which I add, ironically). This thread would be great if you could remove all the drive-bys.
   151. Chris Needham Posted: May 03, 2008 at 03:42 PM (#2767751)
And it's not just this thread, but virtually any thread linked to his blog posts.

Well, part of the problem is that the only ones that get linked here are the ones where he's coming off as especially gruff.

If all you know of him is the excerpts that get posted here, it'd be hard not to think that he's an Ahole.

It's basically the way a reality show gets edited... trim out the banality and the mundane and focus on the controversy.

So which character on the Real World does he most resemble?
   152. Padraic Posted: May 03, 2008 at 04:15 PM (#2767767)
Der Herr spielt nicht mit Würfeln?


Funny, because Einstein got this wrong, at least on the subatomic level where God almost certainly does play with dice. But on the macro level, he was right.

As I said, if you know when things will happen you are god or believe in predeterminate fate.

Or you just have a lot of information and know the context. Say if you drop something in a vacuum, you can know exactly when it will land. There are all sorts of situations where you can know when something will happen without being god, it just takes an immense amount of information.

Most of the time (like a baseball play), however, the situation is far too complex to know the timing, but again, lack of knowledge does not equal random. It's all well and good (and often accurate) to treat the unknown as if it were random, but that doesn't mean random events occur in things like motion, mass, velocity, etc.

Anyway, this is just a fun thought exercise, but Tango I do agree with you on the specific point about Lee, which is what the thread is about anyway.
   153. Voros McCracken, Human Shield Posted: May 03, 2008 at 04:45 PM (#2767780)
We've gotten pretty far afield here into philosophy and quantum physics (two subjects I'm way out of my depth with). I just wanted to point out that a phenomenon doesn't have to qualify as "truly random" (if such a thing exists) for probability models involving randomness to be useful and applicable. Often what passes for "random" in a model is simply imperfect knowledge, but since "random" seems to model the phenomenon pretty accurately, it's a perfectly acceptable substitute for this lack of knowledge.
   154. Padraic Posted: May 03, 2008 at 05:29 PM (#2767798)
Voros, well said.

The question then is how effectively you can reduce the lack of knowledge, and whether the program that Mike Emeigh is talking about can ever lead to knowledge that would work *better* than probabilistic models. I'm skeptical it can, but I also wouldn't dismiss it outright as Tango seems to do.

Edit - Thought I would change "often" to "always" in this comment when it comes to baseball: "Often what passes for "random" in a model is simply imperfect knowledge"
   155. Tango Posted: May 03, 2008 at 06:04 PM (#2767818)
You are right, I am outright dismissing it. Unless you know exactly WHEN something will happen, this will remain true:

If you knew for certain everything, the players, the park, the weather, the groundskeeper, the partying... everything, you would STILL have random variation, and that random variation would be 1 SD = 0.5/sqrt(162)


And since we will never know WHEN, ever, then I don't see the point of trying to avoid the above equation.
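
The figure in that equation is the binomial standard deviation, sqrt(p(1-p)/n), evaluated at p = 0.5 and n = 162. A small sketch of the arithmetic follows. Reading it as a .500 team's winning percentage over a 162-game season is an interpretation, and the 0.33 rate used for the 129-PA case is a hypothetical on-base rate picked for illustration, not a number from the thread.

```python
import math

def binomial_sd(n, p=0.5):
    """Standard deviation of an observed rate over n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)

sd_season = binomial_sd(162)          # equals 0.5 / sqrt(162)
print(f"1 SD over 162 games: {sd_season:.4f} in winning percentage, "
      f"about {sd_season * 162:.1f} wins")

# Same formula for a small sample of plate appearances.
# p=0.33 is an assumed rate, used only to show the scale of the noise.
sd_pa = binomial_sd(129, p=0.33)
print(f"1 SD over 129 PA at an assumed .330 rate: {sd_pa:.4f}")
```

The point of the formula is that this spread exists even when the underlying rate is known exactly, so it sets a floor on how much a short sample can tell you.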
   156. Mike Emeigh Posted: May 03, 2008 at 06:07 PM (#2767820)
The question then is how effectively you can reduce the lack of knowledge, and whether the program that Mike Emeigh is talking about can ever lead to knowledge that would work *better* than probabilistic models.


It depends on what you are trying to accomplish. For the average fantasy-baseball type of fan, standard statistical modeling is probably good enough. For major league teams, it isn't - and I would suggest to you that the major league teams that are in the forefront in terms of using statistical analysis to guide their decisions know that, and the work that their statistical analysts do is aimed precisely at reducing the lack of knowledge.

-- MWE
   157. Los Angeles Waterloo of Black Hawk Posted: May 03, 2008 at 07:06 PM (#2767868)
I'm not clear on what part of Mike's analysis you find compelling. We see that Lee is pitching extraordinarily well, but we're not sure if this marks a real change in talent. Upon further digging, we learn he is throwing fewer changeups this season. Why does that make it more likely the change is real? Would ANY observed change in pitch selection make the performance improvement more significant? For example, if Lee had increased his # of changeups, I suspect Mike would also see that as a sign the improvement is real. Do you agree? Or if not, which changes should be taken as confirmation of a performance change?

I'm more interested in Mike's forest than his tree. I don't know that this change-up business is necessarily the key to Lee, but I find that sort of line of inquiry intriguing. We have all these statistical services tracking what kind of pitches a guy throws, we have PitchFX, we have Hit Trackers ... more and more, every year, we're developing tools that can show us how pitchers and fielders get guys out and how batters get hits. I think this is a very promising area of study that when developed may help us identify when random statistical variation may actually be reflective of a real change in the mean (a player's true talent level). That's what I was responding to, more than the specific analysis of Lee.
   158. GuyM Posted: May 03, 2008 at 07:16 PM (#2767877)
Using a probabilistic model does not preclude using non-performance data, which I think is what we're really talking about. PECOTA uses height and weight, and probably other non-performance factors I'm forgetting. David Gassko has done some interesting work trying to use "stuff" scores to predict pitcher performance. I'm quite sure that MGL and Tango, and others who believe in random variation, are open to the possibility that models can be improved by using non-performance information.

I think we're all also willing to consider the possibility that sudden, sharp changes in performance convey useful information, such that the recent performance deserves to be weighted more heavily than it typically would be.

Where we disagree, perhaps, is that:
1) we would like some evidence that any particular factor does actually "reduce the lack of knowledge," before accepting a claim for its value; and
2) we reject the idea that using such data means rejecting a probabilistic model; and
3) we expect modest gains in reducing uncertainty (while Mike, if I'm reading him correctly, seems to think that some substantial portion of what we now call random variation could eventually be explained).

We're all in favor of better models, and using any data we can to do that. Maybe I'm missing something, but it seems to me that the "war on luck" is mostly a manufactured fight where no disagreement really exists.
   159. GuyM Posted: May 03, 2008 at 07:20 PM (#2767880)
I'm more interested in Mike's forest than his tree. I don't know that this change-up business is necessarily the key to Lee, but I find that sort of line of inquiry intriguing. We have all these statistical services tracking what kind of pitches a guy throws, we have PitchFX, we have Hit Trackers ... more and more, every year, we're developing tools that can show us how pitchers and fielders get guys out and how batters get hits. I think this is a very promising area of study that when developed may help us identify when random statistical variation may actually be reflective of a real change in the mean (a player's true talent level).


I completely agree. And so does MGL, I think. In fact, does anyone disagree with this? I just don't see how this represents a rejection of -- or even a departure from -- traditional sabermetric analysis.
   160. Los Angeles Waterloo of Black Hawk Posted: May 03, 2008 at 07:25 PM (#2767882)
Guy, I don't know that it does, and I don't know that Mike said that oppositionally. I just liked what he had to say there, quoted it to say I agreed, and provided an example of something somewhat similar I had personally looked at with another player.
   161. GuyM Posted: May 03, 2008 at 07:32 PM (#2767886)
Fair enough. I do think Mike intends his position as oppositional, based on statements like this one:

And we need to get off "random variation", and "luck", and "regression to the mean", and other purely statistical talk, and start identifying and characterizing some of the other sources of variation in player performance, if we have any hope of using performance analysis in a valuable way going forward. There are tools out there that will let us do some of this, if we want to use them.

But he can of course speak for himself.
   162. Padraic Posted: May 03, 2008 at 07:32 PM (#2767887)
I would suggest to you that the major league teams that are in the forefront in terms of using statistical analysis to guide their decisions know that, and the work that their statistical analysts do is aimed precisely at reducing the lack of knowledge.

That seems fair. I still think it's an open question whether this will be successful, but the effort is certainly warranted.
   163. Tiboreau Posted: May 03, 2008 at 07:33 PM (#2767891)
IOW BHW, each year we gather more information that lends to micro analysis, and after years of sabermetrics being geared toward macro analysis (which is what MGL used, and the results were what you'd typically expect of macro: Short Sample Size, he wasn't very good before, probably luck), sabermetrics should pay more attention to micro analysis, particularly in situations like Cliff Lee?
   164. Tango Posted: May 03, 2008 at 07:38 PM (#2767897)
I don't know that this change-up business is necessarily the key to Lee, but I find that sort of line of inquiry intriguing.


Yes, which is why the John Walsh, Joe P Sheehan, et al work here is so fascinating. For those who haven't kept up, it's the best thing out there.
   165. Los Angeles Waterloo of Black Hawk Posted: May 03, 2008 at 07:44 PM (#2767900)
IOW BHW, each year we gather more information that lends to micro analysis, and after years of sabermetrics being geared toward macro analysis (which is what MGL used, and the results were what you'd typically expect of macro: Short Sample Size, he wasn't very good before, probably luck), sabermetrics should pay more attention to micro analysis, particularly in situations like Cliff Lee?

Yeah, that sounds about right. And as Tango points out, efforts are underway.
   166. villageidiom Posted: May 03, 2008 at 08:45 PM (#2767973)
Tango -

Long-winded response. I hope I'm making myself clearer here.

PART A

Let's say a pitcher has a "true known mean" of a 600 OPS against, just to pick a stat.

If he is used from now on only against lefties, will his performance vary randomly around a 600 OPS? What if he is only used against righties?

What if he starts throwing nothing but batting practice fastballs? Lobs the ball in? Throws with his feet? Certainly the outcomes won't *randomly* vary around a 600 OPS. And based on a reading of your comments here, I think you agree. The "true known mean" reflects what is known, and at the time the ball is thrown we know whether he's throwing with his feet, or such. So that 600 OPS might really be a 550 against righties and a 650 against lefties, and the "true known mean" would include context.

PART B

Let's take a step back and say the pitcher has a significant platoon split, but we don't know it. Let's also say that we don't know the handedness of the batter. Will his performance vary randomly around his true known mean of 600? Likely not. Even if we learn the batter is lefthanded it won't happen, because our "known" mean doesn't capture meaningful nonrandomness (i.e. the given context).

As such, a player's "true known mean" can have nonrandom variation, if our knowledge is limited enough.

PART C

So, then, really there are many "true means", each reflecting a particular context. A pitcher might have one level of performance against righties, another against lefties. To the extent that our estimates of those means are accurate, and to the extent that we know all that is worth knowing, there will be random variation around those means. If not, then there is something else worth knowing, or our estimates are inaccurate.
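
Parts A through C can be illustrated with a small simulation. This is a sketch under loud assumptions: the 550/650 split comes from the example above, but modeling each outcome as normal noise around the context mean, the noise size, and the 50/50 mix of batter handedness are all simplifications chosen for the example.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_MEAN = {"R": 550, "L": 650}   # context-specific means from the example
NOISE_SD = 100                      # hypothetical spread, chosen arbitrarily

def simulate(n):
    """Draw n (batter_hand, outcome) pairs with a 50/50 handedness mix."""
    results = []
    for _ in range(n):
        hand = random.choice("RL")
        results.append((hand, random.gauss(TRUE_MEAN[hand], NOISE_SD)))
    return results

sample = simulate(20000)
overall = mean(x for _, x in sample)
vs_right = mean(x for h, x in sample if h == "R")
vs_left = mean(x for h, x in sample if h == "L")

print(f"overall {overall:.0f}, vs RHB {vs_right:.0f}, vs LHB {vs_left:.0f}")
```

Pooled together, the outcomes look like variation around 600. Split by handedness, the deviations from 600 are systematic rather than random, which is the sense in which a "known" mean can hide nonrandom variation when the context is missing.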

PART D

Rob Neyer, Dayn Perry, and Mike Emeigh have made the point that there is something else worth knowing, or at least that it is worth finding out if there is something else worth knowing, about Cliff Lee, given how freakish his performance has been this year relative to his past. MGL's point - at least his initial one - appeared to be that writers always seem to think that this time it's different, that this aberrant performance is nonrandom, that this player won't regress to his historical level - and that they're wrong. Wrong, wrong, wrong, always wrong, because they are too quick to believe the small recent sample over the larger body of statistical evidence. While it's nice that, four hours later, he came forward with more relevant research, he was very quick to dismiss the notion that there was anything worth knowing in the recent sample, effectively dismissing the notion as not being science.

But that's just what science is. It's the systematic pursuit of understanding. If you take the posture that the small sample isn't of value, one can miss big stuff. Sure, writers often are dabbling in speculation, not science; and maybe Rob was doing it in this instance. But it's one thing to dismiss an argument because it's speculation, and another to dismiss a sample because it's small and therefore has no merit of "science".

I don't pretend to know if any of the Cliff Lee stuff is a real change of context - meaning a significantly new "known" mean - or if it is just a bunch of stuff that can't and won't be replicated here on out. But when performance sticks out like this, it's worth it to try to understand if there's something worth knowing - if there's something nonrandom about it.

PART E

The whole semantic debate over randomness is where I think things got derailed. Often in history, people would describe (what we now know as) nonrandom things as being random, simply because they lacked an understanding of how things work. That is, what is chalked up as random is really "that which is outside our sphere of knowledge". And why is it outside our knowledge? Because it tends not to matter - it washes out in the long run, or we can't plan for it, or we couldn't have expected it. When something does happen in a nonrandom way, we learn from it, we change our expectations - and our sphere of knowledge grows.

But randomness is relative. What seems random to me might not actually be random at all, but merely a reflection of my limited understanding of how things work. Yes, if you know all that is worth knowing - because the rest is immaterial or impractical - you will get random variation around a true mean, by definition. What I object to - and I think Mike does as well - is that because randomness is relative to one's knowledge, marrying oneself to the notion that variation will be random is the same as saying "there is nothing else worth knowing than what I already know". And that is the first step down the path of irrelevance.

- - - -

I've gone on a lot, and won't be around to read responses (I'm going to watch a baseball game, if you can believe that). I'm mentioning it so that you don't think of it as a drive-by post.
   167. Forsch 10 From Navarone (Dayn) Posted: May 03, 2008 at 10:48 PM (#2768049)

Presentation of words on his blog? Wouldn't those retorts be better placed at his blog post, rather than here? The piling on that goes on here is simply unbecoming. One guy wants to write something, fine. Two, three, ok. But the constant hammering, every time he says something? The regulars here can't be happy needing to wade through those comments, can they? If you think he's a fool, ignore him!


I don't think he's a fool. He's a hell of a lot smarter than I am when it comes to statistical analysis. All I'm saying is that he would have much more success in communicating his ideas if he'd lose the condescension and authoritative sneer.

His comments were directed explicitly at Rob Neyer, and Rob Neyer himself responded directly on MGL's blog post.

He called out Rob on his blog, and it was linked to here. It's not as though it was some sort of private correspondence that fell into prying hands. Again, if you're going to dish out the abuse in a public forum, then be prepared to have it come back on you. I've written stupid, mean-spirited things before, and I was rightly called out on them. That's how it works. And it's good that it works that way. Almost everyone I know--myself included--needs a punch in the nose every now and again.
   168. Mike Emeigh Posted: May 03, 2008 at 11:42 PM (#2768092)
villageidiom's post 171 captures what I have been trying to say all along - especially Parts D and E.

-- MWE
   169. Zach Posted: May 04, 2008 at 12:38 AM (#2768125)
As someone who knows a little of the physics behind chaos, I'll put in my two cents:

1) It doesn't really make any sense to talk about measurement without measurement error. Maybe as a theological concept, you can talk about God knowing initial conditions perfectly, but any attainable measurement in the real world has an associated error.

2) In a chaotic system, even if the laws governing its evolution are completely deterministic, two systems that start in *almost* the same state nevertheless end up in wildly divergent final states.

3) It's precisely the measurement error that kills you in chaotic systems. If you can completely specify initial conditions, it's true that you could calculate the state of the system in the future perfectly. But translating from that observation to anything of practical use involves that measurement error -- you're implicitly saying that you can predict the future (with some accompanying error) even if you don't know the current state of the system perfectly (since you never really can). But that measurement error means you can't tell the difference between current states of the system which will end up in very different states at some future time. So your theoretical ability to calculate the future given perfect knowledge of *one* state has no practical benefit, since you don't know which state you're in right now.

4) In the limit of long times, states will diverge from one another at an exponential rate. Thus, the accuracy with which you have to measure the current system increases roughly exponentially with the time horizon you want to make useful predictions for. For a system that you don't control under laboratory conditions, there will be all kinds of practical limitations on your ability to gather this information.
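
Points 2 through 4 can be seen in a toy system. What follows is only a minimal sketch, assuming the logistic map x -> r*x*(1-x) with r = 4 as a stand-in for "a chaotic system"; the map, the starting value 0.2, the 0.1 divergence threshold and the function name are all illustrative choices, not anything specific to baseball or to this thread.

```python
def steps_until_divergence(x0, error, threshold=0.1, r=4.0, max_steps=200):
    """Count how many steps two nearly identical starting states stay close.

    One copy starts at x0, the other at x0 + error (the "measurement error").
    Both follow the same deterministic rule; we stop once they differ by
    more than `threshold`. Returns None if that never happens in max_steps.
    """
    a, b = x0, x0 + error
    for step in range(max_steps + 1):
        if abs(a - b) > threshold:
            return step
        a = r * a * (1.0 - a)  # fully deterministic update
        b = r * b * (1.0 - b)
    return None


# Each row improves the measurement by a factor of 1000.
for error in (1e-3, 1e-6, 1e-9, 1e-12):
    print(f"initial error {error:.0e}: useful for "
          f"{steps_until_divergence(0.2, error)} steps")
```

If the sketch behaves as the argument says it should, each thousandfold improvement in measurement buys only a roughly constant number of extra steps of useful prediction, which is the exponential cost described in point 4.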
   170. Tango Posted: May 04, 2008 at 05:38 AM (#2768298)
village/171:

PART A: correct. The true mean is contingent on the conditions. It is not static. Ever.

PART B: our uncertainty level is wider around that true mean. Rather than our true mean being .600 OPS, it is .550 to .650, 90% of the time.

PART C: right.

PART D: sure, agreed. As I said, 129 recent PA for Cliff Lee is weighted at 10% and the rest of his career at 90%.

PART E: I agree that's the way some people treat the "unknown or unquantified parameters". That's not how I do it. Random variation is what it is, which you seem to agree with. It's the binomial distribution around some (unknown, uncertain, dynamic, but somewhat guessable) mean.
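
To put the 10%/90% weighting from PART D in concrete terms, here is a minimal sketch. The function name and the two input rates are made up for illustration (they are not Cliff Lee's actual numbers); only the 10% weight on the recent sample comes from the post.

```python
def blended_estimate(recent_rate, career_rate, recent_weight=0.10):
    """Weighted average of a recent sample and the longer track record.

    recent_weight is the share given to the recent sample (10% in PART D);
    the remainder goes to the prior record.
    """
    return recent_weight * recent_rate + (1.0 - recent_weight) * career_rate


# Hypothetical OPS-against figures, chosen only to show the arithmetic.
recent = 0.500   # assumed: very good recent stretch
career = 0.780   # assumed: mediocre prior record
print(round(blended_estimate(recent, career), 3))
```

With weights like these, even a large gap between the recent stretch and the prior record moves the estimate only a tenth of the way toward the recent number.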

So, with both of us agreeing on where we are, it still stands that if you have a coin (not necessarily the actual coin, but just something that is 50/50 based on knowing everything about everything), the only unknown is WHEN (the timing), the binary outcome will always be random, and the distribution of those outcomes follows the binomial.

As long as you can grant that the WHEN is always unknowable, then the mean can never be 0 or 1, and therefore, the outcomes will be random around that mean.
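
The "binomial around some mean" claim can be checked with a quick simulation. This is a rough sketch under stated assumptions: the true rate of 0.33 is a made-up number, each batter faced is treated as an independent yes/no outcome with that fixed rate, and the function names are hypothetical. The sample size of 129 is the recent PA count mentioned in PART D.

```python
import math
import random


def binomial_sd(p, n):
    """Standard deviation of an observed rate over n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


def simulate_rates(p, n, trials, seed=0):
    """Observed success rate in each of `trials` samples of n yes/no outcomes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]


p, n = 0.33, 129  # p is assumed; n is the recent sample size from the thread
rates = simulate_rates(p, n, trials=2000)
mean = sum(rates) / len(rates)
sd = math.sqrt(sum((r - mean) ** 2 for r in rates) / len(rates))
print(f"simulated mean {mean:.3f}, simulated sd {sd:.3f}, "
      f"binomial sd {binomial_sd(p, n):.3f}")
```

The spread of the simulated rates should land close to the binomial formula, which is the sense in which a 129-PA stretch can wander well away from an unchanged true mean.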
   171. Tango Posted: May 04, 2008 at 05:47 AM (#2768302)
Dayn:

Almost everyone I know--myself included--needs a punch in the nose every now and again.


I don't disagree, and I said that if one or two or even three people complain, fine. But the beatings, not the punches in the nose, simply clutter up a good thread. Why not punch him in the nose in his own blog, rather than on another blog? If you want him to change or you want him to know it, say it directly to his face. It's possible to give someone a good punch in the nose and do it in a respectful way. One could come out the winner here. Instead, it's always the same-old. What's the objective? To simply give a punch every single time? That's like my wife yelling at my kid. My kid doesn't pay attention any more. Yet she keeps yelling at him! I'm the one who ends up losing, since I have to listen to the noise pollution, like all third-parties who follow this thread.

And really, where are all the links to some great MGL blog posts, and where is all the praise to his work? Certainly, it is not balanced here at BTF, is it?

Again, I'm not fighting for him, as I don't think he really cares. I'm pointing out the incredibly skewed view here, and that most people who do their drive-bys on mgl post nothing else at all.