Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Friday, May 02, 2008

The Book Blog: You wanted crap/yap from a premium writer….

The Neyer/Lichtman Guide to ########…An Historical Compendium of ########, ########, and #######. (but the book ends well!)

From Rob Neyer, who is lately (maybe for a long while) just as obsessed (and misguided) as almost everyone else about short-term recent performance:

So is Cliff Lee for real? I think all we can say is that he’s really healthy. He’s going to give up a higher batting average on balls in play, and some reasonable percentage of the fly balls he gives up will fly over the fence. So no, he probably doesn’t wind up winning the Cy Young Award. But I’ll bet he’s better than average. And considering how well C.C. Sabathia’s pitched in his last two starts, suddenly the Indians would seem to have the best rotation in the majors.

So Cliff Lee, 31 years old, is better than average, because he has pitched well to 128 batters after having pitched mediocrely, at best, to 3047 batters over the last 4 years?  I think not, and I will take up Neyer on that bet (he offered this time, although obviously not literally).

...That is a fairly sucky pitcher who, based on his 128 batters faced so far this year, is now an ever-so-slightly less sucky pitcher!  He is NOT better than a league average pitcher, nor is he a league average pitcher.  (Warning: of course, I don’t KNOW what he is for sure, but my estimate, since it is based on science, is a heck of a lot better than Neyer’s, which is based on nothing but a distorted and misinformed view of what 5 outings of good pitching following 4 years of poor pitching means.)

...The sad part is that Neyer knows this stuff (I think), but he still writes the same crap that everyone else does.

Repoz Posted: May 02, 2008 at 12:14 PM | 176 comment(s)
  Tags: community, sabermetrics

Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   1. Slinger Francisco Barrios (Dr. Memory) Posted: May 02, 2008 at 12:24 PM (#2766242)
Speaking of writing crap...a 98 ERA+ for a starter is not fairly sucky. It's about average, maybe a little above.
   2. GGC don't think it can get longer than a novella Posted: May 02, 2008 at 12:37 PM (#2766252)
I'd be interested in the scouting perspective on Lee. You know, is there a Chad Bradford Wannabe Wannabe who can tell if he's doing anything different.

IIRC, this was one of Backlasher's pet peeves about DIPS. It didn't incorporate that type of info. Hell, maybe one of those Pitch/fx guys can offer some insight.
   3. BDC Posted: May 02, 2008 at 12:46 PM (#2766261)
I think that the idiomatic sense of Rob's comment is that he'd project Lee to have better-than-average results this season, which is not an outrageous thing to say about a guy who's already 5-0 with an 0.96 ERA.

Whether Lee is actually "better than average" is a question premised to some extent on Lee having a true talent that is somewhere at the core of him and doesn't change year-to-year. But I wonder if that's true of pitchers. This is a guy who's gone 18-5 with a 3.79 ERA, which seems better than average, and he's also gone 5-8 with an ERA of 6.29, which seems worse than average. Quite a lot of pitchers have that pattern (the Esteban-Loaiza kind of career). If they show up good in a given year, then they often really are pretty good that year.
   4. RobertMachemer Posted: May 02, 2008 at 12:53 PM (#2766267)
I'm on my parents' computer (SLOW), so not going to look this up, but if Lee's been roughly a league-average pitcher (let's say slightly below that) for his career and he starts off the season hot, can't we assume (even after he regresses back to his earlier performance) that he'll end up with an above-average year? (And that's assuming a regression, as opposed to a development). The season is between 15-20% over. 15-20% of hot followed by 80-85% of mediocre is still probably good -- and again, that's assuming a regression, which we can't necessarily assume, though it's perhaps the most likely event.
   5. SG Posted: May 02, 2008 at 01:17 PM (#2766278)
I'm on my parents' computer (SLOW), so not going to look this up, but if Lee's been roughly a league-average pitcher (let's say slightly below that) for his career and he starts off the season hot, can't we assume (even after he regresses back to his earlier performance) that he'll end up with an above-average year?

It depends on the weights you assign previous seasons when projecting him, but i think you're right.

If we use a 3/2/1 weight for 2007/2006/2005 and then add in this year's performance at a weight of 4 for a 200 inning a year pitcher, it'd look something like this:

2005: 1 x 200 = 200
2006: 2 x 200 = 400
2007: 3 x 200 = 600
2008: 4 x 40 = 160

So his projection would be something like 90% 2005-2007 and 10% 2008.

So if Lee was projected to put up an ERA in the 4.80 area and has so far put up an ERA of 0.96 (FIP of around 2), I'd probably expect him to put up an ERA of 4.8 x .9 + 2 (his FIP) x .1 = 4.52 over the rest of the year. Add that to what he's already done and he'd end the year with an ERA around 3.75 or so.
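The arithmetic in this post can be sketched in a few lines of Python. This is only an illustration of the weighting as described: the 200-inning seasons, the 3/2/1/4 weights, the 4.80 prior ERA, the 0.96 ERA and the FIP of 2 come from the post, while the function names and the 160 innings assumed to remain in the season are made up for the example.

```python
# Sketch of the weighted projection described in the post.
# Weights, innings and ERAs are taken from the post; all names
# and the 160 remaining innings are illustrative assumptions.

def season_shares(weighted_innings):
    """Return each season's share of the total weighted innings."""
    total = sum(weighted_innings.values())
    return {year: w / total for year, w in weighted_innings.items()}

def blend(prior, recent, recent_share):
    """Mix a prior estimate with recent performance."""
    return prior * (1 - recent_share) + recent * recent_share

def combined_era(era_a, ip_a, era_b, ip_b):
    """Innings-weighted ERA across two stretches of a season."""
    return (era_a * ip_a + era_b * ip_b) / (ip_a + ip_b)

weighted = {2005: 1 * 200, 2006: 2 * 200, 2007: 3 * 200, 2008: 4 * 40}
share_2008 = season_shares(weighted)[2008]   # 160 / 1360, roughly 12%

# The post rounds the 2008 share to 10% and uses FIP, not ERA, for 2008.
rest_of_season = blend(4.80, 2.00, 0.10)

# 40 innings already pitched at 0.96, 160 more assumed at the blended rate.
full_season = combined_era(0.96, 40, rest_of_season, 160)
```

Under these assumptions the full-season figure comes out a little above 3.8, in the same neighborhood as the "3.75 or so" in the post; the exact value depends on how many innings are assumed to remain.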
   6. Famous Original Joe C Posted: May 02, 2008 at 01:36 PM (#2766297)
"sucky"?
   7. 1k5v3L Posted: May 02, 2008 at 01:49 PM (#2766306)
Steve Phillips called Cliff Lee the "best left handed pitcher in the game". And Phillips knows his stuff. Right?
   8. Mattbert Posted: May 02, 2008 at 01:54 PM (#2766309)
MGL comes off as a Grade A jerk in this piece.
   9. GGC don't think it can get longer than a novella Posted: May 02, 2008 at 01:57 PM (#2766313)
He moderates his tone in the comments section, FWIW.
   10. Dag Nabbit is part of the zombie horde Posted: May 02, 2008 at 02:07 PM (#2766324)
So Cliff Lee, 31 years old, is better than average,

He's 29.
   11. Dizzypaco Posted: May 02, 2008 at 02:07 PM (#2766325)
Rob's analysis is right on, in my opinion. Lee was above average in each of the last two years that he was healthy, and his strikeout to walk ratio is terrific this year, even if it's in a small sample size. It's not unheard of for a pitcher to find himself partway through his career.

(Warning: of course, I don’t KNOW what he is for sure, but my estimate, since it is based on science, is a heck of a lot better than Neyer’s, which is based on nothing, but a distorted and misinformed view of what 5 outings of good pitching following 4 years of poor pitching, means.)

This is wrong on so many levels it's hard to even know where to begin. I mean, we all know that he may be the most obnoxious writer around, but I don't think he's very good at analysis (or science if that's what he wants to call it), either.
   12. Tracy Posted: May 02, 2008 at 02:09 PM (#2766329)
MGL comes off as a Grade A jerk in this piece.


Imagine that.
   13. Dizzypaco Posted: May 02, 2008 at 02:14 PM (#2766331)
You know, given that MGL got his age wrong, and claimed that he had four years of poor pitching, which is wrong, maybe he's looking at the wrong guy.
   14. Dag Nabbit is part of the zombie horde Posted: May 02, 2008 at 02:18 PM (#2766333)
Actually, Cliff Lee had a weird year last season.

He missed all April with what I assume is an injury. He posts a 4.15 ERA in his first four starts, then gets clobbered - 8 ER in 4.3 IP. Then he goes back to pitching like normal, with an ERA of 4.14 in his next 7 starts. His overall ERA was 4.90, not good, but as long as he kept pitching like he normally did (aside from pitching adequately in most of his starts, he had an above-average ERA+ in each of the previous two seasons while starting full-time) he'd end the year with respectable numbers.

Then his arm fell off. In his next four starts, he posted an ERA of 11.70 with an opponent OPS of .906. He was injured and they shut him down for a little over 5 weeks. He had four relief appearances in September, with an ERA of 4.76.

When healthy, he did good.

This entire post is likely schlock analysis, because pitchers frequently get injured and you can't ignore someone's bad starts when examining him, but last season looks like it was an injury-fueled aberration. He won't keep his ERA at sub-Gibson levels, but if he stays healthy & pitches like he normally does (or a little better), it'll be his career season.
   15. John Lynch Posted: May 02, 2008 at 02:30 PM (#2766344)
The real problem, as MGL abrasively notes, is that writers have to write *something*, and no one is going to pay them for writing the same boilerplate "Nothing has changed" article year in and year out for the first four months or more of the baseball season. The writers are being paid to share whatever expertise they have (or rather, are perceived to have). Therefore, they have every incentive to demonstrate as much expertise as possible, especially since they don't get rewarded or penalized for being right or wrong. They get rewarded or penalized for being popular or unpopular. And, again, reading the "Nothing has changed" article is just not entertaining, and therefore not conducive to popularity.

This isn't a shot at anyone in particular. I enjoy reading Rob's work and I enjoy reading BPro's work. Joe Sheehan, for example, tries to hold the "Nothing has changed" line as much as is humanly possible for someone whose job is to write interesting articles about baseball, but even he eventually breaks down and writes a "What have we learned from one month?" article because, hey, that's what he's paid to do. He normally follows that up with a "Man, I was stupid for believing in xyz trend" article. If he or Rob want to continue to be paid for writing baseball analysis on a near-daily basis, I don't know what else they can do.
   16. Slinger Francisco Barrios (Dr. Memory) Posted: May 02, 2008 at 02:43 PM (#2766363)
no one is going to pay them for writing the same boilerplate "Nothing has changed" article year in and year out for the first four months or more of the baseball season

But something changes every year, or we would likely know the outcome of the season in advance. Some guys get better, some get worse. The expert tries to figure out which is fluke and which is progression/regression.
   17. Dag Nabbit is part of the zombie horde Posted: May 02, 2008 at 02:47 PM (#2766371)
Anyone else unable to get the Book blog to go up?

You know, given that MGL got his age wrong, and claimed that he had four years of poor pitching, which is wrong, maybe he's looking at the wrong guy.

Maybe. He also gets his TBF wrong for both this year and the combined sum of the last four years. (Figuring in my head, it's 2965 for the latter - that might be off, but I'll guarantee it doesn't end in a 7).

(checks). Well, according to the PI, I nailed it - 2965 TBF from 2004-7. No one had 3047 - Kyle Lohse came closest at 3048. Then Ted Lilly at 3055.

I'm not sure where he got the numbers from.
   18. Dizzypaco Posted: May 02, 2008 at 02:50 PM (#2766375)
This debate reminds me of NCAA Tournament pools. I used to be in an office pool. Every year, I, like most people in the office, would pick some upsets. There was one guy who'd always pick almost all favorites. And he'd do very well in the pool - if you pick upsets, you are generally not going to pick the right ones. But what's the fun in picking all favorites every single year? What does that really tell you about how knowledgeable the person is if he wins?

Same thing here. It's safe to say that all 29-year-old pitchers who start off well are going to regress to their previous norm - most will. It's more fun to pick out a few that you think have genuinely improved and see if they are the ones that have actually turned the corner.
   19. John Lynch Posted: May 02, 2008 at 02:53 PM (#2766380)
But something changes every year, or we would likely know the outcome of the season in advance. Some guys get better, some get worse. The expert tries to figure out which is fluke and which is progression/regression.

We certainly would not know the outcome of the season in advance even if we knew the true talent levels of all the players involved with certainty. Random variation would ensure that we would not be able to predict the results with certainty.

Because of this, trying to predict which early season performance is a fluke and which is real using only statistical evidence is a fool's errand. Figuring out which is a fluke and which is real from a small sample size is impossible by definition. If we could tell using statistics alone, then it would not be a small sample. That, I believe, is MGL's quarrel. Rob doesn't cite anything but Lee's recent performance in making his assessment.

Now, if you are bringing something else to the table (which has demonstrable predictive capability, a tough hurdle), go for it. I'm not going to complain.
   20. Bad Doctor Posted: May 02, 2008 at 03:04 PM (#2766392)
The real problem, as MGL abrasively notes, is that writers have to write *something*, and no one is going to pay them for writing the same boilerplate "Nothing has changed" article year in and year out for the first four months or more of the baseball season.

Yes but, for Cliff Lee, something HAS changed. He basically has two baselines -- the suckitude we saw last year, and the above average starter from '05-06. His pitching this year -- even if you figure BABIP luck rounding out and tougher opponents -- is suggesting that last year was a fluke, and given that that was our most recent look at the guy, then something HAS changed.

I've seen a handful of articles online and even in Sports Weekly (which has brought on Ron Shandler for a weekly sabermetrics-type page) that essentially say, "BABIP blah blah, weak opponents blah blah ... don't be fooled, Cliff Lee's not suddenly Sandy Koufax circa 1963." Neyer's the first one I've seen that talks about the real issue of whether or not he's Cliff Lee circa 2005-06, and what that means for the Indians.
   21. John Lynch Posted: May 02, 2008 at 03:19 PM (#2766412)
Yes but, for Cliff Lee, something HAS changed. He basically has two baselines -- the suckitude we saw last year, and the above average starter from '05-06.

We also have 180ish innings of 80 ERA+ pitching from 2004, FWIW.

Actually, I tend to agree that Lee will be around average, but that's because I agree that a reasonable expectation before the year began was that he would be around average. In that sense, I don't think anything has changed. Yes, his overall numbers will probably be above average at the end of the year because of the early success. The important point is that it isn't the recent performance spike that should cause us to believe he will be around average. It's his past performance that causes us to believe this.
   22. Dizzypaco Posted: May 02, 2008 at 03:27 PM (#2766421)
The important point is that it isn't the recent performance spike that should cause us to believe he will be around average. It's his past performance that causes us to believe this.


Let's say he posts an ERA+ of 108. Would you call that average? I'd call it above average. Given that's what he did the past two years in which he was healthy, it's not unreasonable, based on past performance, to expect him to do it again if healthy.

The first month of the season was important for Lee, more than most pitchers, due to his performance in 2007. As has been said, he was a different pitcher in 2005-2006, as compared with 2007. If he really struggled in April, you could say that last year was no fluke. But by pitching very well, it's reasonable to think that 2007 was an injury-plagued aberration.
   23. The Dangerous Mabry Posted: May 02, 2008 at 03:45 PM (#2766439)
I'd be interested in the scouting perspective on Lee.


From Keith Law's chat yesterday:

Charlie (New York): Cliff Lee...is this for real? Or are we going to see him fall fast and hard?

SportsNation Keith Law: It's mostly real. I assumed he'd be a contributor to their team this year when I picked them to win the WS, although I can't say I saw this coming.


So I guess the scouting perspective is that Cliff has gotten his act together, at least to some degree, this season.
   24. Tango Posted: May 02, 2008 at 03:57 PM (#2766453)
FWIW, I made a post on my blog, which I'll repost here:
These are the ERA (and FIP) forecasts for Lee coming into 2008 (remembering that the league average ERA is roughly 4.4):

Marcel: 4.79, 4.68
Chone: 4.45, 4.67
ZiPS: 4.63, 4.38
James: 4.40, 4.62
Sackmann: 4.80, 4.76

Marcel is the least loving, with a win% of roughly .460 or so. The most loving makes him a .500 pitcher. I mean, you can reasonably call him a .480 pitcher, and no one is going to argue with you.

An average pitcher will be .470 as a starter and .560 as a reliever.

So, going into 2008, Lee was around an average PITCHER, and a bit below average as a starter.

Given his sample (129 batters faced), I would weight his 2008 performance as roughly 10%, and his career performance (performance through 2007) at 90%.

His 2008 performance has two ridiculously unsustainable numbers: BABIP of .195, when his career is .295 and all the forecasters coming into 2008 had him around .305 give or take. He also has 32K with 2 walks in 38 innings. He won't keep that up.

Anyway, even if you want to take his 2.01 FIP of 2008 and give it 10% weight, and take a 4.65 FIP forecasted going into 2008 and give it 90% weight, you end up with a 4.40 FIP.

To the extent that one believes that he was slightly below average going into 2008, it is wholly supported to say that he's slightly above average as of today.
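As a rough check on the 10%/90% blend, the same update can be applied to each forecast's FIP listed earlier in this post. This is a sketch only: the forecast numbers, the 2.01 FIP and the 10% weight are from the post, while the dictionary layout, the function name and the idea of averaging the five systems are assumptions made for the example.

```python
# Preseason forecasts quoted in the post, as (ERA, FIP) per system.
PRESEASON = {
    "Marcel": (4.79, 4.68),
    "Chone": (4.45, 4.67),
    "ZiPS": (4.63, 4.38),
    "James": (4.40, 4.62),
    "Sackmann": (4.80, 4.76),
}

FIP_2008 = 2.01      # Lee's FIP so far in 2008, per the post
WEIGHT_2008 = 0.10   # weight given to the 2008 sample, per the post

def updated_fip(preseason_fip, current_fip=FIP_2008, weight=WEIGHT_2008):
    """Blend a preseason FIP forecast with the in-season FIP."""
    return (1 - weight) * preseason_fip + weight * current_fip

# Update each system separately.
updated = {name: updated_fip(fip) for name, (_, fip) in PRESEASON.items()}

# Hypothetical consensus: average the five preseason FIPs, then update.
mean_preseason = sum(fip for _, fip in PRESEASON.values()) / len(PRESEASON)
consensus = updated_fip(mean_preseason)

for name, value in sorted(updated.items()):
    print(f"{name:<9} {value:.2f}")
print(f"consensus {consensus:.2f}")
```

Every updated figure lands between roughly 4.1 and 4.5, so the choice of forecasting system moves the answer more than the 38 innings of 2008 data do.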

***

I seem to remember that Jason Marquis had a ridiculously good April in 2007 as well, I think. Something about Marquis anyway, if someone wants to look it up.
   25. Nasty Nate Posted: May 02, 2008 at 04:02 PM (#2766459)
Yeah, Marquis had a good April 2007 too (2.35 ERA), but it wasn't in the same class as Lee's 2008. I think that's what makes it interesting: it's not just a run-of-the-mill hot start which prompts the chorus of 'is he for real?' but he is on a fantastic stretch of getting people out.
   26. Dizzypaco Posted: May 02, 2008 at 04:09 PM (#2766464)
His 2008 performance has two ridiculously unsustainable numbers: BABIP of .195, when his career is .295 and all the forecasters coming into 2008 had him around .305 give or take. He also has 32K with 2 walks in 38 innings. He won't keep that up.

Well, no, he won't have an 0.96 ERA all year, but I don't think anyone expects him to.

Tango, my problem/question about your analysis (and others) of Lee is that he is the type of pitcher that you don't know what to expect from. In the past four years, he was bad, very good, pretty good, and terrible/hurt. It's very different from being about average four straight years.

Are you weighting 2007 as you would any other pitcher? And isn't it possible that he was hurt for some or all of last year, which would make his performance irrelevant for either describing him or predicting future performance? And, if true, isn't it misleading to say that he was an average pitcher going into 2008, even if healthy?
   27. Forsch 10 From Navarone (Dayn) Posted: May 02, 2008 at 04:10 PM (#2766466)
If MGL is going to make such basic errors of fact, then he'd do well to drop the "infant banging spoon on high-chair" tone that infects, well, almost everything he writes. Being insufferable tends to blunt the power of one's message.
   28. Forsch 10 From Navarone (Dayn) Posted: May 02, 2008 at 04:13 PM (#2766467)
As for the matter at hand, I've watched a couple of Lee's starts this season. Other than locating really, really well, the only thing I've noticed is that he's almost completely abandoned using his changeup against left-handed batters. Of course, the latter probably doesn't have much to do with his success this season.
   29. Barry`s_Lazy_Boy Posted: May 02, 2008 at 04:16 PM (#2766469)
MGL continues to prove he is not intelligent. He may have an autistic knack for numbers. Maybe.
   30. Depressoteric feels Royally blue these days Posted: May 02, 2008 at 04:18 PM (#2766474)
What a surprise. MGL continues behaving like a human stain.
   31. Tango Posted: May 02, 2008 at 04:19 PM (#2766475)
Dizzy/26: I don't disagree with your post.

1. The main issue in the analysis, here or elsewhere, is that if you decide to constrain yourself to only looking at performance data, then post 24 stands as correct.

2. If you decide to bring in outside data (he was hurt, he had a minor league rehab that went ok, he changed his pitching mechanics or how he mixes up pitches), then this is perfectly legitimate, and really desirable. How you weight this information is of course the key.

How much does all this affect the forecast? Beats me. I won't pretend to know. How does the community process all that information (of that available as of Apr 1, 2008)? Well, look at community forecasts, fantasy auction bids, and whatnot. When it came time to actually make a decision where they actually had to put money and thought to it, what did people actually do and say?

The guys who have Cliff Lee on their fantasy teams: what is being offered in trades? No one is offering Santana. But, what is being offered? Some #3 starter I suppose? Were they offered #2 starters (say a .530-.540 pitcher)?

3. Finally, you cannot, simply cannot, use 129 PA to infer that something fundamental has changed, on the reasoning that his performance has been so historic that *something* major must have changed.

We can presume that something has changed, since we have more information (129 more PA). But, that information has been processed in my point #1. If someone wants to include even more information (point #2) without, at all, making reference to his ERA, K/BB, or any performance stat already included in point #1, fine. Please do so.

Does anyone disagree with anything I've said here?

***

I found Rob Neyer's post reasonable, and I have no issue with it.
   32. John Lynch Posted: May 02, 2008 at 04:21 PM (#2766478)
Let's say he posts an ERA+ of 108. Would you call that average? I'd call it above average.

I would say that fits within my expectation of "around average," especially given the bump his final ERA+ will get from his first month of pitching.

If he really struggled in April, you could say that last year was no fluke.

I disagree with this. If Lee had struggled to start this year, I would be predicting a rebound to an "around average" level of pitching.

But by pitching very well, it's reasonable to think that 2007 was an injury-plagued aberration.

Only in the sense that we would have thought the exact same thing before the year started. It is entirely possible for a really crappy pitcher, which I do not believe Lee is, to pitch really well for a short period of time. For crying out loud, does no one remember Aaron Small?

Given the population that we're examining, MLB and nearly MLB caliber pitchers, there is just nothing so extraordinarily special about Lee's start to warrant suggesting that he's different than he was at the start of the year. We may quibble about what that expectation was, again, I say "around average," but we shouldn't be suggesting substantial improvement based on a month's worth of data.
   33. Dag Nabbit is part of the zombie horde Posted: May 02, 2008 at 04:22 PM (#2766482)
I seem to remember that Jason Marquis had a ridiculously good April in 2007 as well, I think. Something about Marquis anyway, if someone wants to look it up.

Actually, it went beyond April. On May 9, he had an ERA of 1.70. There are some differences between him & Lee. Marquis pitched 47.2 innings with 24 Ks & 13 walks. Like Lee, he had an incredibly low BABIP, but unlike Lee his peripherals weren't that much.

Looking solely at the numbers, Lee's hot start reminds me of Jon Garland, circa 2005. His control improved dramatically and he rode it to a new level of performance.
   34. John Lynch Posted: May 02, 2008 at 04:24 PM (#2766486)
Tango makes my points better in #31 than I do elsewhere. Thanks!
   35. Jimmy P Posted: May 02, 2008 at 04:30 PM (#2766498)
Being insufferable tends to blunt the power of one's message.

So does getting a lot of simple facts wrong. Buzz Bissinger is shaking his fist at this post for dumbing down America's kids.

2. If you decide to bring in outside data (he was hurt, he had a minor league rehab that went ok, he changed his pitching mechanics or how he mixes up pitches), then this is perfectly legitimate, and really desirable. How you weight this information is of course the key.

I was listening to his start against the Mariners the other night on the radio. He seemed to be getting ahead of everyone 0-1, 0-2. Now, the Mariners aren't Murderers' Row, but strike one is a great pitch. Has he increased his rate of getting up 0-1 as compared to his past? Is this even a good point, or more just a coincidence and correlation not causation? Just throwing ideas out.
   36. Tango Posted: May 02, 2008 at 04:32 PM (#2766501)
Doc/33: Garland in 2005 was 25 years old. There's no real shock when someone that young does something remarkable. Setting that aside...

His Marcel entering 2005 was an ERA of 4.57.

His Marcel entering 2006 was 3.98

So, his 2005 season was clearly fantastic, as Marcel can attest to. But, it only moved the needle so much even with having 901 more PA to use.

His ERA in 2006 was 4.51. And in 2007 it was 4.23. One would say that Marcel, as conservative and lucid as he was entering 2006, may have been still too optimistic.

This is the point if you constrain yourself to only looking at performance data. You cannot read trends or "new levels of talent", because the uncertainty level in the performance data is simply too great.

As long as no one is disputing this assertion, I really don't have any issue here.
   37. Tango Posted: May 02, 2008 at 04:34 PM (#2766503)
Has he increased his rate of getting up 0-1 as compared to his past? Is this even a good point, or more just a coincidence and correlation not causation? Just throwing ideas out.


This is the kind of thing that IS good to know. Whether that is persistent or transient would need to be studied.

As for comparison purposes, you can figure out his rates of getting to 0-1 by looking at the incomparable b-r.com splits pages.
   38. Mike Emeigh Posted: May 02, 2008 at 04:35 PM (#2766504)
Other than locating really, really well, the only thing I've noticed is that he's almost completely abandoned using his changeup against left-handed batters. Of course, the latter probably doesn't have much to do with his success this season.


It may very well, actually.

LHB killed Lee in 2007, to the tune of a .917 OPS in 117 PA (compared to .728 in 192 PA a year earlier); that's a substantial piece of the difference in his splits between 2006 and 2007, with RHB only going from .789 to .813. This year, so far, LHB have a .290 OPS against Lee in 42 PA (which is a lot of PAs with the platoon advantage for so early in the season).

-- MWE
   39. John Lynch Posted: May 02, 2008 at 04:39 PM (#2766507)
As long as no one is disputing this assertion, I really don't have any issue here.

Me neither.
   40. Dag Nabbit is part of the zombie horde Posted: May 02, 2008 at 04:40 PM (#2766508)
Doc/33: Garland in 2005 was 25 years old. There's no real shock when someone that young does something remarkable. Setting that aside...

Dang. Seems like he'd been around forever.
   41. bads85 Posted: May 02, 2008 at 04:42 PM (#2766509)
Yes but, for Cliff Lee, something HAS changed.


He is keeping the ball in the park this year so far. Whether that is a real change remains to be seen, but he has only allowed 1 HR in 37.7 IP this year, a much, much better rate than in previous years.
   42. Slinger Francisco Barrios (Dr. Memory) Posted: May 02, 2008 at 04:44 PM (#2766513)
We certainly would not know the outcome of the season in advance even if we knew the true talent levels of all the players involved with certainty. Random variation would ensure that we would not be able to predict the results with certainty.

You sure about that?
   43. Mike Emeigh Posted: May 02, 2008 at 04:46 PM (#2766517)
The main issue in the analysis, here or elsewhere, is that if you decide to constrain yourself to only looking at performance data, then post 24 stands as correct.


I agree - and it is a good reason NOT to constrain yourself to looking only at performance data when you have the option to look at other data as well. If performance data is all that you have to view, then you should (a) make it very clear that's all you have and (b) show that you are aware of the limitations of said analysis.

Posts like Dayn's #28 - and some of the things that appear in the Bill James Gold Mine about pitch selection - are valuable bits of information to have.

-- MWE
   44. Slinger Francisco Barrios (Dr. Memory) Posted: May 02, 2008 at 04:51 PM (#2766524)
Maybe the question we would like answered is: what are the chances a "sucky" (defined how you like) pitcher will have a stretch like Lee is having? The odds seem low to me.
   45. Jimmy P Posted: May 02, 2008 at 04:55 PM (#2766532)
Dang. Seems like he'd been around forever.

People forget that Garland came up when he was 20 because he was putting up crazy weird numbers in the minors. Then, he didn't really perform well, but it was one of those situations where he wasn't learning any new tricks in AAA so they just kept him in the majors. Pretty good return for Matt Karchner.
   46. Mike Emeigh Posted: May 02, 2008 at 05:01 PM (#2766545)
We certainly would not know the outcome of the season in advance even if we knew the true talent levels of all the players involved with certainty. Random variation would ensure that we would not be able to predict the results with certainty.


It ain't random.

Variation in a data set occurs for one of two reasons:

1. uncertainty or error in measurement
2. outside effects that are not measured

Knowing true talent with certainty eliminates #1 as a source of variation. But #2 is still there - there are outside effects (player health and environmental conditions, to name two) that we aren't measuring. If we could capture all of THOSE perfectly, too, then we WOULD be able to predict the outcome of the season with 100% certainty.

-- MWE
   47. Tango Posted: May 02, 2008 at 05:17 PM (#2766561)
If we could capture all of THOSE perfectly, too, then we WOULD be able to predict the outcome of the season with 100% certainty.


If you knew for certain everything, the players, the park, the weather, the groundskeeper, the partying... everything, you would STILL have random variation, and that random variation would be 1 SD = 0.5/sqrt(162).

It would be nowhere near 100% certain.
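The 0.5/sqrt(162) figure is the binomial standard deviation of winning percentage for a true .500 team, and it can be checked with a short sketch. The assumption, implicit in the formula, is that each of the 162 games is an independent 50/50 event; the function and variable names are illustrative.

```python
import math

def win_pct_sd(games, p=0.5):
    """Binomial standard deviation of winning percentage over `games`."""
    return math.sqrt(p * (1 - p) / games)

sd_pct = win_pct_sd(162)    # equals 0.5 / sqrt(162), about .039
sd_wins = sd_pct * 162      # the same spread expressed in wins, about 6.4

# Chance that a true .500 team finishes at exactly 81-81.
p_exactly_81 = math.comb(162, 81) / 2 ** 162

print(f"SD of win%: {sd_pct:.4f}")
print(f"SD in wins: {sd_wins:.2f}")
print(f"P(exactly 81 wins): {p_exactly_81:.4f}")
```

Even with every input known perfectly, one standard deviation is more than six wins in either direction, and the single most likely record (81-81) comes up only a little over 6% of the time.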
   48. Tango Posted: May 02, 2008 at 05:22 PM (#2766567)
By the way, MGL made two blog posts. Between the two, the one linked here has the least amount of information. This blog post is far better, and contains good research, like:

Finally, and again, just for fun, let’s look at pitchers, like Cliff Lee, who are not too old or young, and look like they have finally turned the corner. These are the guys who did not have a great projection going into the season, but have pitched lights out for the entire month of April. These are the guys that hundreds of articles are written about, right? Are they for real?

I am restricting to ages 27-31 and those starters who had a pre-season projection of greater than 4.25:

Starters

Good start

N    age    IP1   ERA1   prERA1   prERA2   IP2   ERA2
20   28.9   32    2.22   4.55     4.38     147   4.19

Well, they definitely did better than expected, but not quite the studs they looked like in April.


Read the rest of the post for the context.
   49. GGC don't think it can get longer than a novella Posted: May 02, 2008 at 05:26 PM (#2766570)
Thanks, #23. I appreciate it.
   50. John Lynch Posted: May 02, 2008 at 05:44 PM (#2766595)
You sure about that?

Yes. If I have a fair coin, I know with certainty the probability of both relevant events, heads or tails, when I flip the coin. There's no way that I can predict how many times 162 flips will come up heads with 100% accuracy. The same is true with a baseball player. Even if I knew for certain that Cliff Lee was a 3.81 true talent RA pitcher, there will still be variation.

Maybe the question we would like answered is: what are the chances a "sucky" (defined how you like) pitcher will have a stretch like Lee is having? The odds seem low to me.

The odds of a *particular* sucky pitcher having this stretch are poor. The odds of *any* sucky pitcher having this stretch are not.
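The particular-versus-any distinction is the usual "at least one" calculation. A minimal sketch, assuming each pitcher's chance of such a stretch is independent and identical; the 1% chance and the pool of 100 pitchers are made-up numbers for illustration only:

```python
def prob_at_least_one(p_single, n_pitchers):
    """Chance that at least one of n independent pitchers has the hot stretch."""
    return 1.0 - (1.0 - p_single) ** n_pitchers

p_single = 0.01    # hypothetical chance for one particular pitcher
n_pitchers = 100   # hypothetical number of comparable pitchers

p_any = prob_at_least_one(p_single, n_pitchers)
print(f"particular pitcher: {p_single:.1%}")
print(f"any of {n_pitchers} pitchers: {p_any:.1%}")
```

With those illustrative inputs, a 1-in-100 event for any named pitcher becomes better than even money to happen to somebody.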

If we could capture all of THOSE perfectly, too, then we WOULD be able to predict the outcome of the season with 100% certainty.

Sure, but that's tantamount to saying that if we could write a perfect simulation of the entire known universe, then we could predict the future. That's how many factors you'd have to account for to know with certainty the results of a baseball season. And, heck, I'm not even sure we could do it then, given the vagaries of quantum mechanics (of which I know little, except that there is the potential for nondeterminism).
   51. GuyM Posted: May 02, 2008 at 05:49 PM (#2766600)
Finally, you cannot, simply cannot, use 129 PA to try to infer that something fundamental has changed, because his performance has been so historic, and therefore must conclude that *something* major has changed.

An interesting question is whether there is any performance over 129 PA that is so extraordinary that it tells us something important about the pitcher's talent (beyond adding this data to his prior performance and running a new Marcel). If a pitcher performs at a truly elite level, even for a short period, it may tell us something important about his talent. For example, there have been 43 games thrown since 1994 with a game score of 95 or higher. Nine pitchers account for 60% (26) of these:
Randy Johnson (6)
Roger Clemens (3)
Pedro Martinez (3)
Mike Mussina (3)
Curt Schilling (3)
Kevin Millwood (2)
David Wells (2)
Hideo Nomo (2)
Kerry Wood (2)
Clearly, all enormously talented pitchers at the time they threw these games. The other 17 pitchers are also almost all good-to-great: David Cone, Eric Milton, Erik Bedard, Francisco Cordova, Greg Maddux, Jason Schmidt, Frank Castillo, Johan Santana, John Lackey, Justin Verlander, Kenny Rogers, Pat Hentgen, Andy Benes, Bartolo Colon, Bobby Witt, Chan Ho Park, Chuck Finley. So a game score of 95+ would seem to be a pretty strong indication of pitching talent, even though it's a sample of just 1 game.

Now, I'm NOT saying that Lee's 0.96 ERA belongs in this category. It clearly doesn't. But if he had, for example, K'd 65 batters in his 38 IP, we'd have to seriously consider the possibility that his talent had changed.
   52. Tango Posted: May 02, 2008 at 05:51 PM (#2766602)
John, you were doing perfectly well until you said "That's how many factors you'd have to account for to know with certainty the results of a baseball season."

I will chalk it up to having your guard down for a second. Post 47 backs you up. Stand your ground, since you are right!

***

I will try to make it even easier to prove the falseness of Mike's 100% certainty statement. If you knew everything about everything, including the exact pitch and how he will throw it, and you knew the decision-making process of the batter, and how hard he will swing, and you knew where every fielder was, and you knew just about everything, and you knew that Santana allows a .300 OBP and the hitter is Pujols, a .400 OBP hitter, if you knew it all, the outcome for that one single PA will be either safe or out. Someone is going to have a .000 and the other is going to have a 1.000.

And if you know everything about everything, that second PA will again give you a binomial result, of which you have no way to know with 100% certainty where it will land.

At the point where the ball leaves the pitcher's hand, there is uncertainty at the outcome of that PA.

And that uncertainty is the random variation around your true mean since you know everything about everything.
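The model being described here can be written down directly: each PA is a 0-or-1 event, and a run of them follows the binomial distribution around the true rate. A minimal sketch of that model only (it does not settle whether the world is deterministic underneath), assuming independent PAs and a hypothetical true on-base probability of .400 for the matchup:

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials at true rate p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

p_on_base = 0.400  # hypothetical true on-base probability for the matchup
n_pa = 10

# Distribution of times on base over 10 PA, even with the true rate known exactly
pmf = [binomial_pmf(k, n_pa, p_on_base) for k in range(n_pa + 1)]
for k, prob in enumerate(pmf):
    print(f"{k:2d} times on base in {n_pa} PA: {prob:.3f}")

# Spread of observed OBP over a 600-PA season at the same true rate
sd_600 = math.sqrt(p_on_base * (1.0 - p_on_base) / 600)
print(f"SD of observed OBP over 600 PA: {sd_600:.3f}")
```

Even at a known .400, exactly 4 times on base in 10 PA happens only about a quarter of the time, and over 600 PA the observed OBP still has an SD of about 20 points.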
   53. Dizzypaco Posted: May 02, 2008 at 06:02 PM (#2766615)
Tango, I know this is a while back, but I think you partly missed the point of my post. I wouldn't argue that we would know with any certainty that something fundamental has changed for Lee based on one month's pitching. I was speaking rather about the likelihood that last year's performance was a fluke, and that we should expect him to return to the level of performance he achieved in 2005-2006. This is where I think we disagree. Based on the fact that he has pitched so well, I think it is more likely now that he will return to the level of performance he achieved in 2005-2006 than it was before the season started. I think there was some non-zero possibility before the season started that the injuries and/or mechanical problems he experienced in 2007 were going to continue to affect him in the future, and I think that is much less of a possibility now.

If you are saying that it makes no difference in your projection systems whether a pitcher puts up an ERA of 0.96 or an ERA of 8.96 over the first month of a season, a year after he experienced an injury plagued, ineffective year, I think we'll have to agree to disagree.
   54. Mike Emeigh Posted: May 02, 2008 at 06:11 PM (#2766621)
you would STILL have random variation, and that random variation would be 1 SD = 0.5/sqrt(162).


That equation doesn't apply in the case we are discussing. It applies to cases where you don't know with certainty that you've measured every possible effect perfectly - cases where you have measurement error/uncertainty and/or unmeasured effects - and it is derived from the assumption that the impact of those sources of variation is likely to be manifested in a particular way. If you have no measurement error or uncertainty and no unmeasured effects - if you knew everything perfectly, which is what I wrote - you have no sources of variation. The reason you have so-called "random" variation is that you do NOT know everything perfectly. You don't know that measurement error=0, measurement uncertainty=0, and you have all possible effects perfectly captured. When that happens, the equation above is intended to give you a level of confidence that you've modelled the things that you HAVE captured correctly, and that there is no other source of significant variation that you haven't captured.

-- MWE
   55. Tango Posted: May 02, 2008 at 06:18 PM (#2766625)
Mike, can you respond to John Lynch's coin-flipping post at the start of post 52, and tell me how that is different than what you are talking about.
   56. Tango Posted: May 02, 2008 at 06:19 PM (#2766626)
Dizzy/55, in post 31 point 2, I said:

2. If you decide to bring in outside data (he was hurt, he had a minor league rehab that went ok, he changed his pitching mechanics or how he mixes up pitches), then this is perfectly legitimate, and really desirable. How you weight this information is of course the key.

How much does all this affect the forecast? Beats me. I won't pretend to know.


As far as I can tell, we have no source of disagreement.
   57. Mike Emeigh Posted: May 02, 2008 at 06:24 PM (#2766630)
At the point where the ball leaves the pitcher's hand, there is uncertainty at the outcome of that PA.


And that is because you haven't captured every possible effect with 100% certainty.

Let me make clear what I am saying. I am saying this:

THERE IS NO SUCH THING AS RANDOM VARIATION.

I'm not saying there is NO uncertainty. I'm not saying there is NO variation. I'm saying IT AIN'T RANDOM. And we need to get off "random variation", and "luck", and "regression to the mean", and other purely statistical talk, and start identifying and characterizing some of the other sources of variation in player performance, if we have any hope of using performance analysis in a valuable way going forward. There are tools out there that will let us do some of this, if we want to use them.

-- MWE
   58. Slinger Francisco Barrios (Dr. Memory) Posted: May 02, 2008 at 06:28 PM (#2766633)
Jeezoo Pete, nobody's claiming anything is "certain". That's a straw man. We all know at the least that if we played a table baseball game, where the talent level is 100% certain, there are factors that will change things from iteration to iteration. But if you had 100% certainty of the talent levels, it would be one whole helluva lot easier to predict what was going to happen over a season.

But of course that 100% certainty would have far-reaching implications that go beyond mere standings.
   59. Mike Emeigh Posted: May 02, 2008 at 06:30 PM (#2766635)
If I have a fair coin, I know with certainty the probability of both relevant events, heads or tails, when I flip the coin. There's no way that I can predict how many times 162 flips will come up heads with 100% accuracy.


All you know with certainty, in this circumstance, is that the COIN is unbiased. You don't know how much force is applied to the coin each time it is flipped; you don't know which side was facing up each time it is flipped; you don't know how the atmospheric conditions (humidity, in particular) might affect the balance of the coin - it might make a difference whether you are in Phoenix in January or Raleigh in July. If you could capture every source of variation with 100% certainty, you could predict how many times in 162 flips the coin will come up heads. But you can't.

-- MWE
   60. Tango Posted: May 02, 2008 at 06:34 PM (#2766643)
The outcome of any single event is random, around the true known mean.

If you know everything about everything, or you flip a balanced coin, or you roll a perfectly weighted die, then you have the true known mean for an event.

Once you have that mean, any single outcome is random. And the distribution of a whole bunch of those outcomes for that exact mean is the binomial distribution.

Mike, are you agreeing with this statement?

- If no, then I think we've both made our point, and I don't think either of us is budging.

- If yes, then we agree to this particular point, and I am clearly missing whatever else you are saying (which is fine). At least, we are agreeing on something!
   61. Tango Posted: May 02, 2008 at 06:42 PM (#2766650)
Mike/61: even if you know everything of everything about the coin, the conditions, the context, whatever. Everything. From the moment just before the coin is struck, the outcome of the single next coin flip will be either a head or a tail.

And, if you repeat that flip every single time in the exact same way in the exact everything, the next coin flip will still be head or tail.

Otherwise, if you always do something every single way the exact same way under the exact same conditions, and everything else the same, you would have to have heads showing either 100% or 0% of the time, whether it was 1 or 100 flips.

I have no doubt that any researcher who tries to create conditions as perfectly as possible to recreate the exact same flip in the exact same way, will end up with the binomial distribution centered around a constant mean, whether that mean is .500 or .300 or .800.
   62. PreservedFish Posted: May 02, 2008 at 06:50 PM (#2766658)
If you could capture every source of variation with 100% certainty

OK, so everything I know about Chaos Theory came from Jurassic Park, but there's got to be some application here, right? Or quantum uncertainty, or something?

I feel like a dork wading into an argument that has been had a million times and everyone else probably already knows like the back of their hand, but doesn't Mike's insistence that 100% of variables can theoretically be captured lead to the conclusion that the universe runs on a single nearly endless math equation which leads to the conclusion that there is no free will, that all of our consciousness is in fact predetermined by Newtonian forces reaching back to the birth of the universe? Maybe I'm going too far with this.
   63. John Lynch Posted: May 02, 2008 at 07:08 PM (#2766671)
I feel like a dork wading into an argument that has been had a million times and everyone else probably already knows like the back of their hand, but doesn't Mike's insistence that 100% of variables can theoretically be captured lead to the conclusion that the universe runs on a single nearly endless math equation which leads to the conclusion that there is no free will, that all of our consciousness is in fact predetermined by Newtonian forces reaching back to the birth of the universe? Maybe I'm going too far with this.


Right, it's a silly debate in that Mike's assertion is that if we can perfectly simulate the entire known universe and that universe is deterministic, then there is no uncertainty. I'm not sure how to disagree with that.

Fortunately, it doesn't really matter as we know we're never going to get there.

I'm not saying there is NO uncertainty. I'm not saying there is NO variation. I'm saying IT AIN'T RANDOM. And we need to get off "random variation", and "luck", and "regression to the mean", and other purely statistical talk, and start identifying and characterizing some of the other sources of variation in player performance, if we have any hope of using performance analysis in a valuable way going forward. There are tools out there that will let us do some of this, if we want to use them.


Yes, and Tango and I both allow that in order to draw inferences from Cliff Lee's performance so far this year, you have to use other methods of evaluation. However, inasmuch as you lean purely on statistical data, that data will be subject to variation. You can spend as much time as you want quantifying reasons for that variation and reducing the uncertainty, but until you can construct the magic box above, there will be factors you have not accounted for.

Currently, there are a ton of factors we have not accounted for. Therefore, from the frame of reference that we currently have, the variation appears and behaves randomly. Yes, there's a reason for it, but until we quantify it, there's no use saying "IT AIN'T RANDOM" as if the fact that there is some unknown reason for it impacts our current decision in a meaningful way. Just because I know that there are real, physical reasons why my coin comes up heads or tails doesn't mean that I should allow that to change my current view of the coin flip as random. Only once I quantify those other factors can I allow it to affect my view of the problem.

So, if what you're saying is that we need to keep examining other factors in player performance to eliminate uncertainty, who could disagree with that? But until we have examined and quantified them, there's no use saying that they aren't random.
   64. Arva Posted: May 02, 2008 at 07:13 PM (#2766676)
I think Mike's objection to random is that it suggests there's no reason. And there is a reason for every outcome, we just don't know them all. When you flip a coin, it's either heads, tails, or on its side. Just because it lands on heads doesn't mean that it JUST HAPPENED! There was a reason why it landed on heads, we don't know it, but we might be able to find out, and writing it off as random doesn't seem productive.
   65. Mike Emeigh Posted: May 02, 2008 at 07:17 PM (#2766679)
What I have an issue with here is this, going back to the article:

I don’t KNOW what he is for sure, but my estimate, since it is based on science, is a heck of a lot better than Neyer’s


Apart from some of the factual errors that others have noted, it is certainly possible that Lee HAS changed something about the way that he pitches which in essence has changed his level of ability. Dayn pointed to one possibility in #28. I don't believe that "science" involves ignoring potentially relevant evidence.

-- MWE
   66. DCA Posted: May 02, 2008 at 07:18 PM (#2766681)
OK, so everything I know about Chaos Theory came from Jurassic Park, but there's got to be some application here, right?

Most of what Ian Malcolm says about chaos theory is wrong. Consider it fiction, just like the rest of the book.
   67. Tango Posted: May 02, 2008 at 07:24 PM (#2766685)
Arva: no, it is random. That's the point! As long as the true mean of an object is not zero and it's not 1, given the exact conditions that made you determine the true mean, then by definition, the outcome will be random (around that mean).

After all, if you've decided that the true mean is .100 given exact conditions, then that means 1 time in 10, the outcome will be different from the other 9 times. And why, given exacting conditions, would an outcome be different? Random variation, centered around the .100 mean.

Otherwise, PreservedFish is right, that we are talking about predetermined fate, and that the true mean of a binomial event is either .000 or 1.000, at which point, nothing in the universe is random.

My presupposition is that the true mean of something is greater than .0000 and less than 1.0000. And once you have that true mean, the result of the outcome will be random, it will be centered around that true mean, and it will be explained by the binomial distribution.

Now do we all agree?
   68. Arva Posted: May 02, 2008 at 07:31 PM (#2766693)
I'm afraid not, Tango. What you seem to be saying is that there is no reason whatsoever for the coin landing on heads, and no reason for Cliff Lee's early season performance. I feel the reasons might be unknowable, but still exist. You say that the reasons don't exist.
   69. John Lynch Posted: May 02, 2008 at 07:31 PM (#2766695)
Apart from some of the factual errors that others have noted, it is certainly possible that Lee HAS changed something about the way that he pitches which in essence has changed his level of ability. Dayn pointed to one possibility in #28. I don't believe that "science" involves ignoring potentially relevant evidence.

Mike, I completely agree with this. We should use every available piece of information, properly weighted, in order to arrive at the best conclusion. Neyer does not do that in his piece. Is MGL's evaluation based on as much "science" as it could have been? No, certainly not, but I don't think that it's a stretch to say that it was based on more science than Rob's musing.

No one is disagreeing that extra evidence beyond the back of Lee's baseball card can be used for evaluation (handled properly, of course). Neither MGL nor Rob did that. MGL at least analyzed the back of Lee's baseball card correctly. That's my only point.
   70. Dizzypaco Posted: May 02, 2008 at 07:31 PM (#2766696)
I'm having a little difficulty following all of this, and its relevance here, but I think the distinction is, Tango assumes that the true mean is knowable, and Mike assumes that it's not. Tango assumes that the true mean on a coin flip is exactly 50%, and therefore we can figure out the chances that x number of flips end up heads.

In the Cliff Lee case, a good case can be made that we don't know his true ERA. We don't know how good he really is, so we don't know the chances that he will put up a .96 ERA after five starts. The fact that he put up a certain ERA in 2005 and 2006 and 2007 gives us some indication how good he is at this point in time, but the level of certainty is much lower than some are assuming it to be.

Or am I off base in interpreting these statements? Is there agreement on whether you can ever know the true mean?
   71. Hal Chase Headley Lamarr Hoyt Wilhelm (ACE1242) Posted: May 02, 2008 at 07:32 PM (#2766697)
If you had enough time and money, you could collect all the information you needed to project the final standings of a future baseball season -- in fact, any future baseball season. But at that point they wouldn't be baseball seasons any more.
   72. PreservedFish Posted: May 02, 2008 at 07:32 PM (#2766699)
Consider it fiction, just like the rest of the book.

Wait a second ... fiction?
   73. Dizzypaco Posted: May 02, 2008 at 07:34 PM (#2766700)
Mike, I completely agree with this. We should use every available piece of information, properly weighted, in order to arrive at the best conclusion. Neyer does not do that in his piece. Is MGL's evaluation based on as much "science" as it could have been? No, certainly not, but I don't think that it's a stretch to say that it was based on more science than Rob's musing.

Neyer used common sense, combined with a non-thorough examination of the numbers. MGL used bad science alone. I'll take good science over common sense, but I'll take common sense over bad science. And that is what I believe happened here.
   74. Styles P. Deadball Posted: May 02, 2008 at 07:35 PM (#2766702)
So, is Cliff Lee....uhh..... gonna keep pitching well or not?
   75. Tango Posted: May 02, 2008 at 07:36 PM (#2766706)
Arva/71: I am saying that, given a true mean, there is no reason whatsoever for the outcome of any single event. And the aggregate of those single events will follow the binomial distribution around that true mean.

I have said nothing at all about this, as to how it pertains to Cliff Lee.

However, if we knew Cliff Lee's true mean (say it is, today, a smidge better than whoever is the best pitcher in baseball), given that we know that true mean, then his historic performance is completely random, around that true mean. It has to be.

Up for discussion, however, is what is Cliff Lee's true mean. And, the best guess, given performance data, given all information at hand, is that, today, he is somewhere between a #2 and #3 starter.
   76. John Lynch Posted: May 02, 2008 at 07:38 PM (#2766707)
Tango can correct me if I'm wrong, but I believe his point is that if the true mean is between 0.0 and 1.0 exclusive, then it doesn't matter what the reason is. We are unable to reason about it. We need to examine and quantify the reason for the variation. However, as soon as you quantify a reason for variation, the true mean changes. Eventually, you will quantify every reason in the entire universe and the true mean will be 0.0 or 1.0, by definition. If it is not, then you have not accounted for every reason. Until then, the unexplained factors may well have reasons, but we can't talk about them in meaningful terms and thus consider them random.
   77. Tango Posted: May 02, 2008 at 07:39 PM (#2766708)
Dizzy/73: the only assumption I'm making is that the true mean is neither exactly 0 nor exactly 1. Once you make that assumption, the outcome will follow the binomial distribution, and will do so in a random fashion, around that true mean.
   78. Forsch 10 From Navarone (Dayn) Posted: May 02, 2008 at 07:40 PM (#2766709)
You speaking from experience here, Dayn? :)

Yep!
   79. AROM Posted: May 02, 2008 at 07:42 PM (#2766712)
Let's say we know exactly what a pitcher will throw, and where he throws it, what the batter is expecting, how he reacts when he sees the pitch, the exact ability of his swing at this point in time. Say it's a grounder to the left side. We know how many hops it will take, and based on the batter's swing that we have perfect information on, and our knowledge of the ground, we know if it takes a bad hop or not. We know if the shortstop is capable of getting to it, and once he does, the exact strength and accuracy of his arm, in addition to the batter's exact speed and jump out of the box, and whether he's running at 100% effort or not. Finally, as the batter and ball arrive at first base within .01 seconds of each other, our intimate knowledge of the umpire's brain chemistry and whether he will interpret this as safe or out. There is no random variation here, we knew before the ball left the pitcher's hand what would happen and if the batter would be on first or back in the dugout.

But so what? We aren't gods. We don't know hardly any of this information, and we aren't capable of knowing it. Pujols will face Santana in multiple situations that are, as best we can measure, identical, or close enough. Sometimes he'll get hits and sometimes he'll be put out. The estimated ability + random variation model works fine for me. Good luck to any would-be gods who think they can predict anything better with a different model. I eagerly await a model that can demonstrate consistent and repeated success while throwing out the concepts of random variation and regression to the mean.
   80. John Lynch Posted: May 02, 2008 at 07:44 PM (#2766715)
In other words, when I have a fair coin, the true mean is 0.5 that it will land heads. Of course, that's the true mean for any random condition at any random time in any random place. As soon as I start describing other conditions, I have eliminated sources of randomness, and the true mean changes.

That's how I understand it anyway.
   81. John Lynch Posted: May 02, 2008 at 07:46 PM (#2766717)
Good luck to any would-be gods who think they can predict anything better with a different model. I eagerly await a model that can demonstrate consistent and repeated success while throwing out the concepts of random variation and regression to the mean.

Exactly. That's why it's pointless to assert there is no randomness.
   82. AROM Posted: May 02, 2008 at 07:48 PM (#2766719)
So, is Cliff Lee....uhh..... gonna keep pitching well or not?


The 32/2 K/W ratio sticks out to me. Especially for a guy with somewhat ordinary command for his career (2.3 K/W). If his command is drastically improved, then he's a much better pitcher even if his BABIP goes back to the .295-.305 range, which it almost certainly will.

You need less sample size to detect a true change in K than anything else, and walk rate less than anything except K rate... he may have turned a corner. Might be a good idea to focus on pitchers who put up unreal K/W rates instead of super-low April ERA rates.

Fausto Carmona has a 2.60 ERA, an improvement from even last year's excellent season, but his 26 walks and only 13 K is weird and worrisome.
   83. Tango Posted: May 02, 2008 at 07:48 PM (#2766722)
John/79: what you are saying here:
Eventually, you will quantify every reason in the entire universe and the true mean will be 0.0 or 1.0, by definition.


is what I said here:
Otherwise, PreservedFish is right, that we are talking about predetermined fate, and that the true mean of a binomial event is either .000 or 1.000, at which point, nothing in the universe is random.


That is a purely philosophical discussion. If Mike wants to argue that his statements are correct on a philosophical predetermined fate level, that's fine. They are wrong on any other level.

My presumption, above all else, is that the true mean is neither 0 nor 1. And, once you have a situation where the mean is neither zero, nor 1, then everything that happens will be random, around whatever mean you happen to have. And that that distribution follows the binomial.
   84. Tango Posted: May 02, 2008 at 07:52 PM (#2766726)
John:

As soon as I start describing other conditions, I have eliminated sources of randomness, and the true mean changes.


As long as that true mean is not 0, nor 1, then the result will always be random around that true mean.

To argue that you can describe a set of conditions in such a way that you can possibly have a true mean of 0 or 1 is to explain fate.

If you reject the idea of predetermined fate, you automatically accept the idea of random variation around a true mean.
   85. Tango Posted: May 02, 2008 at 07:53 PM (#2766727)
Exactly. That's why it's pointless to assert there is no randomness.


Right.
   86. Arva Posted: May 02, 2008 at 07:53 PM (#2766729)
So both of you are saying that because the reasons are unknowable, they don't exist?
   87. Tango Posted: May 02, 2008 at 07:55 PM (#2766733)
You need less sample size to detect a true change in K than anything else, and walk rate less than anything except K rate... he may have turned a corner. Might be a good idea to focus on pitchers who put up unreal K/W rates instead of super-low April ERA rates.


Agreed. Rates like that are Clemens (check out his minors), Pedro, Saberhagen, Maddux, Eck. It would definitely be interesting to see someone have that kind of stretch. From that, you could infer something fundamental has changed. There is a lot less variance with K rates and walk rates.
   88. AROM Posted: May 02, 2008 at 07:57 PM (#2766734)
I'm not stating one way or another if they exist. If they are unknowable, then whether they exist or not doesn't really matter to me, at least for predictive purposes.
   89. Tango Posted: May 02, 2008 at 08:01 PM (#2766737)
So both of you are saying that because the reasons are unknowable, they don't exist?


I don't know how you are saying that, but I'm glad that you are at least asking me, and not asserting it.

If there is something you don't know, that increases the uncertainty level around your true mean. Everything has a true mean, given whatever conditions. But, if there is a set of conditions that you have not quantified, then that adds a source of error to your true mean.

Say that a coin lands head 60% of the time when head shows and 40% of the time when tail shows. The true mean, if you don't know what's showing, is 50%, with a certain level of uncertainty.

So, you have one binomial distribution that follows random variation around a .400 mean, and another binomial distribution that follows random variation around a .600 mean.

If you don't know what's showing, you have a .500 mean, but a distribution that is wider than the binomial of something with a true .500 mean. The uncertainty of your mean widens the random distribution around it.
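One way to see the widening is the law of total variance. The sketch below assumes the hidden state (which face is showing) is picked once, 50/50, and then stays the same for a whole run of 162 flips; that persistence is an assumption added for illustration. If the face were re-drawn independently before every flip, each flip would simply be a 50% event and the count would be plain binomial again.

```python
import math

def binomial_var(n, p):
    """Variance of the number of successes in n independent trials at rate p."""
    return n * p * (1.0 - p)

def persistent_state_var(n, p_a, p_b, w=0.5):
    """Variance of the success count when one hidden state (rate p_a with
    probability w, else p_b) is drawn once and holds for all n trials."""
    mean_a, mean_b = n * p_a, n * p_b
    overall_mean = w * mean_a + (1.0 - w) * mean_b
    within = w * binomial_var(n, p_a) + (1.0 - w) * binomial_var(n, p_b)
    between = (w * (mean_a - overall_mean) ** 2
               + (1.0 - w) * (mean_b - overall_mean) ** 2)
    return within + between

n_flips = 162
var_known_half = binomial_var(n_flips, 0.5)
var_hidden = persistent_state_var(n_flips, 0.4, 0.6)

print(f"true .500 coin:        SD = {math.sqrt(var_known_half):.1f} heads")
print(f"hidden .400/.600 coin: SD = {math.sqrt(var_hidden):.1f} heads")
```

Both cases average 81 heads, but under the persistence assumption the hidden-state coin has an SD of roughly 17 heads against roughly 6 for the known .500 coin.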
   90. Arva Posted: May 02, 2008 at 08:04 PM (#2766740)
Here's what I'm saying.

You throw a frisbee, and you can generally predict where it's going. A bird comes out of nowhere and hits it, causing it to land in a creek that takes it three miles away. The bird hitting it was unknowable, the reason why the frisbee landed in the creek is not.

If the wind pushes Cliff Lee's pitch over the plate, and a batter hits it for a homer, you can't predict that. But the reason it got hit for a homer does exist. It's not random, it has a reason, albeit an unpredictable one, and one that statistical models attempt to account for. But it does exist, even if it was unknowable beforehand.

Randomness suggests that Lee's pitch went over the middle of the plate, "just because". There was a reason the pitch went over (wind), a reason the wind was blowing then and there (sun, gravity, tides, etc.), it's just unpredictable. Randomness would suggest that those factors don't exist.
   91. Tango Posted: May 02, 2008 at 08:10 PM (#2766744)
Arva/93: no, randomness does no such thing. I am saying that if you know all the conditions around, everything, the flight pattern of a bird, the exact way the wind blows. Every single thing ever. Presume you know all that.

Randomness will say that as long as the result of the outcome was not predetermined, then the result will be random (around that mean).

What you are suggesting is that not only do you have all the conditions down, but you have the timing of it down as well. That you not only know how everything behaves, but you know when it will act.

Randomness presumes that the timing of events is up for grabs.

Predetermined fate presumes that the timing of events is set.

To argue that randomness doesn't exist is to argue that the timing of events is predetermined. Randomness only insists that the timing of events is not set in stone. It acknowledges the existence of all objects, conditions, and behaviours.

This is really a purely philosophical discussion, which I wish I would have realized 60 posts ago.
   92. Jonah Keri Posted: May 02, 2008 at 08:22 PM (#2766756)
Tango said: "The guys who have Cliff Lee on their fantasy teams: what is being offered in trades? No one is offering Santana. But, what is being offered? Some #3 starter I suppose? Were they offered #2 starters (say a .530-.540 pitcher)?"

--

In LABR, a league staffed with experts in the fantasy field (and oddly, me), I made the following deal to acquire Lee:

Sabathia (right before he had his first good start), B. Harris and Scutaro
for
Lee, Lowell (just before he came off the DL), Wakefield and German

Interestingly, the trade was in a way a referendum on both Sabathia's expected outcome for this season as well as Lee's.

The stat that I expect to regress the most on Lee's part isn't K/BB or BABIP...it's HR % (apologies if someone else has already said this). Just one HR allowed in 5 starts despite inducing more flyballs than groundouts. That should regress over time. Still like Lee to be above average this season, though.
   93. GuyM Posted: May 02, 2008 at 08:23 PM (#2766757)
I'm saying IT AIN'T RANDOM. And we need to get off "random variation", and "luck", and "regression to the mean", and other purely statistical talk, and start identifying and characterizing some of the other sources of variation in player performance, if we have any hope of using performance analysis in a valuable way going forward. There are tools out there that will let us do some of this, if we want to use them.

To me, this is the essential part of Mike's argument. And I think it's almost entirely wrong. Even if we can learn to identify new predictive factors in team and player performance, surely they will collectively remove only a fraction of what we currently call "random" variation. As Tango has shown, all of the efforts by a lot of smart people over many years to develop accurate predictions of hitter performance end up being little or no better than a simple Marcel projection. We don't need to abandon "luck" to do performance analysis; rather, performance analysis may -- MAY -- marginally shrink luck's share of the turf. And as for the claim that there are tools that will let us do this, do tell.
   94. Tango Posted: May 02, 2008 at 08:31 PM (#2766766)
Sabathia (right before he had his first good start), B. Harris and Scutaro
for
Lee, Lowell (just before he came off the DL), Wakefield and German


Fantastic. I would have preferred a 1-for-1 trade. Now, can you tell me how much each of those players was bought for at the beginning of the draft?
   95. Mike Emeigh Posted: May 02, 2008 at 08:33 PM (#2766770)
Randomness suggests that Lee's pitch went over the middle of the plate "just because". There was a reason the pitch went over (wind), a reason the wind was blowing then and there (sun, gravity, tides, etc.); it's just unpredictable. Randomness would suggest that those factors don't exist.


Yes. And they DO exist. And more to the point I'm making, SOME of them can be identified and characterized and built into performance models - and I don't understand why people decline to do so.

Cliff Lee's had a five-start string of performances that attracted the attention of Rob Neyer as, perhaps, being indicative of a change on Lee's part. Dayn Perry chose to take a look at what Lee was actually doing - and noted something that he WAS doing differently (namely not throwing changeups to LHB). And when I looked at the data, what I came up with was (a) most of Lee's performance decline in 2007 can be traced to his struggles against LHB, (b) he's doing something different in 2008, and (c) it's working so far - his platoon splits are back to where they were in 2006, in the relative sense (although in an absolute sense he's doing much better against everyone). That one thing - his different approach to LHB - suggests that his 2007 performance, where he struggled against LHB, might not be all that relevant to the 2008 version of Cliff Lee. That's what Rob Neyer *might* have picked up on, even if he didn't write it that way. And that's what WE need to do a better job of picking up on if we want to be relevant going forward - detecting when a change in performance might be signal and not noise - and not simply wedding ourselves to statements like "random variation" and "regression to the mean".

-- MWE
   96. PreservedFish Posted: May 02, 2008 at 08:34 PM (#2766772)
There are tools out there that will let us do some of this, if we want to use them.

Obviously the "some" must be a meaninglessly small fraction. If you are talking about every variable possible, you are not just talking about predicting the wind and weather in every square foot of every ballpark for every second of the next year; you are talking about to what degree the hot dog guy's barking will almost imperceptibly distract Dmitri Young's attention while he is in the on-deck circle on May 14 at 8:12, and minuscule variations in the calcium content of the Wheaties box that Joe Mauer bought on September 3 and consumed every morning the following week. To uncover the underlying factors behind those two events would require more learning than the sum total of human history combined, and multiplied exponentially.
   97. Tango Posted: May 02, 2008 at 08:38 PM (#2766777)
I just want to reiterate that random variation is simply the timing of events. And, outside of god or fate, as long as you can't control the timing of every single object in the specific universe (environment) being studied, random variation will always exist.

And since baseball is played by people, there are at least 10 players at any time whose timing we cannot control. Ergo, random variation around a (specific, unknown, uncertain, but somewhat guessable) mean.

You'd think I was the one who invented the binomial distribution!
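
To put a rough number on that random variation, here is a minimal sketch that treats each batter faced as an independent trial with a fixed chance of reaching base. The .340 on-base rate and the function name are illustrative assumptions, not figures from this thread; only the 128 and 3047 batter counts come from the article above.

```python
import math


def binomial_sd(p: float, n: int) -> float:
    """Standard deviation of an observed rate over n independent trials,
    each succeeding with probability p (binomial model)."""
    return math.sqrt(p * (1.0 - p) / n)


# Assumed true on-base rate allowed; hypothetical, chosen only for illustration.
TRUE_RATE = 0.340

for batters in (128, 3047):
    sd = binomial_sd(TRUE_RATE, batters)
    print(f"{batters:>5} batters: one SD of observed rate = {sd:.3f}")
```

Under these assumptions the spread of the observed rate over 128 batters is roughly five times as wide as over 3047 batters, even though the underlying mean never moves.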
   98. Dizzypaco Posted: May 02, 2008 at 08:43 PM (#2766784)
And that's what WE need to do a better job of picking up on if we want to be relevant going forward - detecting when a change in performance might be signal and not noise - and not simply wedding ourselves to statements like "random variation" and "regression to the mean".

I agree with this. The fact is that we simply don't know if Lee's performance represents a change in true performance or simple random variation from a level of performance he has shown in the past. And the fact that MGL said with absolute certainty that it's random variation does not speak well of his analysis.
   99. GuyM Posted: May 02, 2008 at 08:54 PM (#2766799)
And that's what WE need to do a better job of picking up on if we want to be relevant going forward - detecting when a change in performance might be signal and not noise - and not simply wedding ourselves to statements like "random variation" and "regression to the mean".

I agree with this. The fact is that we simply don't know if Lee's performance represents a change in true performance or simple random variation from a level of performance he has shown in the past. And the fact that MGL said with absolute certainty that it's random variation does not speak well of his analysis.

And that's precisely what I disagree with. "We simply don't know" is no different in practice than saying it's "random variation." Sure, it sounds more modest, and lets you criticize MGL for sounding arrogant. But what practical difference is there? "I don't know" still means your best guess for the future is the same as your best guess before the 5 games, exactly MGL's point.

Now, Mike is kind of implying that we DO know something, because we know he stopped throwing the change. But what does that really tell us? We already expected a huge regression in Lee's .917 OPS against LHHs (a reverse platoon split), since it was based only on 117 PAs and his career mark is .750. So of course it improved. Does that have anything to do with not using his change? I don't know, and best I can tell, Mike doesn't either.

So what exactly do we "know" about Lee, or what exactly can we predict about him, that a traditional regression-to-the-mean approach doesn't account for? I still haven't heard what that is.
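
The regression described in this post can be sketched as a simple shrinkage toward the career mark. The .917 OPS, 117 PAs, and .750 career figure are the ones quoted above; the constant `K_ASSUMED` (how many PAs of career-level performance to mix in) and the function name are purely hypothetical, picked only to show the mechanics.

```python
def shrink(observed: float, n: int, prior: float, k: float) -> float:
    """Regress an observed rate toward a prior by adding k 'phantom'
    plate appearances at the prior rate."""
    return (n * observed + k * prior) / (n + k)


# Hypothetical regression constant; not a figure from this thread.
K_ASSUMED = 1000

estimate = shrink(observed=0.917, n=117, prior=0.750, k=K_ASSUMED)
print(f"regressed OPS vs. LHH: {estimate:.3f}")
```

With a constant that large the estimate stays close to the career mark; a smaller constant would leave it nearer .917. Either way, an improvement from .917 is expected before the changeup enters the picture.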
   100. Tango Posted: May 02, 2008 at 08:58 PM (#2766804)
And the fact that MGL said with absolute certainty that it's random variation does not speak well of his analysis.


He said that? Please provide the quote.

This is what MGL wrote just a few hours later:
What do good and bad starts by pitchers tell us?... Finally, and again, just for fun, let’s look at pitchers, like Cliff Lee, who are not too old or young, and look like they have finally turned the corner. These are the guys who did not have a great projection going into the season, but have pitched lights out for the entire month of April. These are the guys that hundreds of articles are written about, right? Are they for real? ... Well, they definitely did better than expected, but not quite the studs they looked like in April. I will grant them a (very) partial “for real” on the order of .25 runs or so...


When I showed my pre-08 Marcels, and my to-date Marcels, I said:
Anyway, even if you want to take his 2.01 FIP of 2008 and give it 10% weight, and take a 4.65 FIP forecasted going into 2007 and give it 90% weight, you end up with a 4.40 FIP.


And 4.65 minus 4.40 is 0.25.
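
For anyone who wants to check the arithmetic, here is a minimal sketch of that 10%/90% blend. The function name is an assumption made for illustration; the 2.01, 4.65, and 10% figures are the ones quoted above.

```python
def blend(recent: float, prior: float, recent_weight: float) -> float:
    """Weighted average of a recent observation and a prior forecast."""
    return recent_weight * recent + (1.0 - recent_weight) * prior


prior_fip = 4.65   # forecast quoted above
recent_fip = 2.01  # 2008 FIP to date, quoted above

updated = blend(recent_fip, prior_fip, recent_weight=0.10)
improvement = prior_fip - updated
print(f"updated FIP: {updated:.2f}, improvement: {improvement:.2f}")
```

The exact blend comes to about 4.39, which rounds to the 4.40 quoted above; the gap from 4.65 is about a quarter of a run either way.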

MGL's study of what happens to pitchers after hot starts, and Marcel's view of what our updated estimate of their true mean should be, are identical. Our expectation for Cliff Lee's true mean is an ERA 0.25 runs better.

No one, I don't think, has said that our estimate of his true talent has not changed at all. If they did, they are wrong.

Given ONLY performance data, our optimism has been raised by 0.25 runs.

Now, can we reduce our uncertainty of his true mean? Remember, if we say he has a true mean of 4.40, we are really saying "I'm 90% sure his true mean is between 4.00 and 4.80", or some such. Can we maybe get a tighter range at 95%? That if we look at particular splits that maybe our true mean will put him at 4.20 or 3.90, with a +/-0.30 range? Sure, absolutely and definitely.

This is what I was talking about in my initial post, with the three points.

Above all else, don't discount random variation. It is real and powerful.