Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Thursday, January 06, 2005

Baseball America - Schwarz - The Great Debate

For the past two years, the scouting and statistics communities have feuded like members of rival families. Baseball lifers who evaluate players with their eyes are derided as over-the-hill beanbags who don’t understand the next frontier. Numbers-oriented people are cast as cold, computer-wielding propellerheads with no appreciation for scouting intangibles. Not surprisingly, the camps have grown so polarized that they have retreated to their respective bunkers rather than engage in open and intelligent debate.

Until now.

SG in ATL Posted: January 06, 2005 at 11:38 PM | 219 comment(s)

Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

Page 3 of 3 pages  1 2 3
   201. Al Posted: January 07, 2005 at 11:52 PM (#1064986)
Reading this thread jogged a memory. In a former life, I was a psychology professor. A dude named Paul Meehl wrote a book comparing Statistical and Clinical predictions of human behavior (think statguy vs. scout). The point of the book: Statistical prediction is far superior to clinical prediction. That was in 1954. I've lost touch with this field, but I just did a quick search and found a recent paper describing how widespread this phenomenon is (conflict between the statguys and scouts). Here's an excerpt with some really interesting (at least to me) examples.

"In 1954, Paul Meehl wrote a classic book entitled, Clinical Versus Statistical
Prediction: A Theoretical Analysis and Review of the Literature. Meehl asked a simple
question: Are the predictions of human experts more reliable than the predictions of
actuarial models? To be a fair comparison, both the experts and the models had to make
their predictions on the basis of the same evidence (i.e., the same cues). Meehl reported
on 20 such experiments. Since 1954, every non-ambiguous study that has compared the
reliability of clinical and actuarial predictions (i.e., Statistical Prediction Rules, or SPRs)
has supported Meehl’s conclusion. So robust is this finding that we might call it The
Golden Rule of Predictive Modeling: When based on the same evidence, the predictions
of SPRs are more reliable than the predictions of human experts.
It is our contention that The Golden Rule of Predictive Modeling has been
woefully neglected. Perhaps a good way to begin to undo this state of affairs is to briefly
describe ten of its instances. This will give the reader some idea of the range and
robustness of the Golden Rule.
1. A SPR that takes into account a patient’s marital status, length of psychotic distress,
and a rating of the patient’s insight into his or her condition predicted the success of
electroshock therapy more reliably than a hospital’s medical and psychological staff
members (Wittman 1941).
2. A model that used past criminal and prison records was more reliable than expert
criminologists in predicting criminal recidivism (Carroll 1982).
3. On the basis of a Minnesota Multiphasic Personality Inventory (MMPI) profile,
clinical psychologists were less reliable than a SPR in diagnosing patients as either
neurotic or psychotic. When psychologists were given the SPR’s results before they
made their predictions, they were still less accurate than the SPR (Goldberg 1968).
4. A number of SPRs predict academic performance (measured by graduation rates and
GPA at graduation) better than admissions officers. This is true even when the
admissions officers are allowed to use considerably more evidence than the models
(DeVaul et al. 1957), and it has been shown to be true at selective colleges, medical
schools (DeVaul et al. 1957), law schools (Dawes, Swets and Monohan 2000, 18) and
graduate school in psychology (Dawes 1971).
5. SPRs predict loan and credit risk better than bank officers. SPRs are now standardly
used by banks when they make loans and by credit card companies when they approve
and set credit limits for new customers (Stillwell et al. 1983).
6. SPRs predict newborns at risk for Sudden Infant Death Syndrome (SIDS) much better
than human experts (Lowry 1975; Carpenter et al. 1977; Golding et al. 1985).
7. Predicting the quality of the vintage for a red Bordeaux wine decades in advance is
done more reliably by a SPR than by expert wine tasters, who swirl, smell and taste the
young wine (Ashenfelter, Ashmore and Lalonde 1995).
8. A SPR correctly diagnosed 83% of progressive brain dysfunction on the basis of cues
from intellectual tests. Groups of clinicians working from the same data did no better
than 63%. When clinicians were given the results of the actuarial formula, clinicians still
did worse than the model, scoring no better than 75% (Leli and Filskov 1984).
9. In predicting the presence, location and cause of brain damage, a SPR outperformed
experienced clinicians and a nationally prominent neuropsychologist (Wedding 1983).
10. In legal settings, forensic psychologists often make predictions of violence. One will
be more reliable than forensic psychologists simply by predicting that people will not be
violent. Further, SPRs are more reliable than forensic psychologists in predicting the
relative likelihood of violence, i.e., who is more prone to violence (Faust and Ziskin
1988).
Upon reviewing this evidence in 1986, Paul Meehl said: “There is no controversy in
social science which shows such a large body of qualitatively diverse studies coming out
so uniformly in the same direction as this one. When you are pushing [scores of]
investigations [140 in 1991], predicting everything from the outcomes of football games
to the diagnosis of liver disease and when you can hardly come up with a half dozen
studies showing even a weak tendency in favor of the clinician, it is time to draw a
practical conclusion” (Meehl 1986, 372-3)."

Full paper here:
http://hypatia.ss.uci.edu/lps/psa2k/fifty-years.pdf
   202. philly Posted: January 08, 2005 at 12:13 AM (#1065002)
Zach #198

One idea I've played around with recently is the idea of normalized similarity scores. In each category of a player's stat line (BB, 1B, 2B, etc.) list the percentage of the player's total offensive value that is derived from drawing walks, hitting singles, etc.

I kicked around trying to do something like that at one point, but didn't really take it anywhere systematically.

But one thing I noticed is that, on average, the ratio of production that comes from BB and singles to production that comes from xbh changes dramatically as you move up the player development ladder.

Relative to your peers, you can be very productive in the lower minors with just walks and singles. By the time you get to the majors you need to be smacking a good share of xbh just to keep up with your new peer group.

I'm not exactly sure what to do with that observation, but I think it's true.
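
One way the "share of value" idea above could be sketched in Python. This is only an illustration: the run values per event and both stat lines are assumed placeholders, not numbers from this thread, and `value_shares` and `profile_distance` are hypothetical helper names.

```python
# Illustrative run values per event; these weights are assumptions,
# not numbers taken from the discussion.
RUN_VALUES = {"BB": 0.33, "1B": 0.47, "2B": 0.78, "3B": 1.09, "HR": 1.40}


def value_shares(stat_line):
    """Return the fraction of total offensive value that comes from each event."""
    values = {event: stat_line.get(event, 0) * weight
              for event, weight in RUN_VALUES.items()}
    total = sum(values.values())
    if total == 0:
        raise ValueError("stat line has no offensive value")
    return {event: value / total for event, value in values.items()}


def profile_distance(line_a, line_b):
    """Sum of absolute share differences (0 means the same shape of production)."""
    shares_a = value_shares(line_a)
    shares_b = value_shares(line_b)
    return sum(abs(shares_a[e] - shares_b[e]) for e in RUN_VALUES)


# Hypothetical stat lines: a walks-and-singles hitter and a power hitter.
slap_hitter = {"BB": 80, "1B": 130, "2B": 20, "3B": 5, "HR": 3}
slugger = {"BB": 50, "1B": 80, "2B": 35, "3B": 2, "HR": 35}

print(value_shares(slap_hitter))
print(profile_distance(slap_hitter, slugger))
```

Because the shares are fractions of each player's own total, two players with the same mix of events compare as identical regardless of playing time. That is the "normalized" part of the idea.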
   203. Tango Tiger Posted: January 08, 2005 at 12:45 AM (#1065048)
Zach, if it helps, I do my sim scores somewhat along those lines.

***

Chris, Walt said not to use correlation in the specific case where you have multiple outcomes and we are trying to correlate each one... a drop in walks means an increase somewhere else, etc. But, if you do it as a binary approach, you don't have this issue, and I think Walt would have no problem with regression here. (I'm sure he'd have, correctly, an issue with the ordering of the binary tree.)

As well, I don't think your arbitrary ".010" range for OBA and MGL's analysis are discussing the same thing at all. Furthermore, taking on base / PA minus hits / AB is not even algebraically sound. I understand why you like it, but for what we are discussing, this is not what you should be using.

In any case, even if we agree with the .010, then you've got to do this for all the events.

You might counter that since a HR is 4 times more valuable than a walk, then your interval will only be .0025 for the HR (1.5 HR per 600 PA). Given that the correlation is weak for HR/PA, you might not even find any player matches!

Anyway, we should stick with the same methodology.
   204. BGW1 Posted: January 08, 2005 at 01:07 AM (#1065076)
Might the fact that MLEs are a projection be biasing the outcome of MGL's study from #36?

MLEs should include a hidden regression to the mean (since promotions are made primarily from above average hitters, this should increase the magnitude of the downward MLE adjustment). MLEs should also contain a hidden age adjustment (since promotions are made primarily from those approaching their peak, this should mitigate the downward adjustment). This might explain why MLEs out-performed major league data in areas where regression coefficients are highest and age curves the steepest (s, d, t, hr) and under-performed in the remaining cases (bb, so).

The easiest test of the value of minor league data is to compare the best minor to major projection system with the best major to major projection system and see which one comes out on top.
   205. BGW1 Posted: January 08, 2005 at 01:14 AM (#1065081)
Al: The problem with the analogy is that scouts have a different set of data than do statisticians. Quite frankly, the test set up by the book was rigged because computers are capable of handling a smaller variety of information, but what they can handle they do well. Humans can assimilate a huge variety of data, but they lack the cognitive capacity to build a perfect model to process it. The test only used data that computers could handle though. If the competition were to diagnose a medical patient's illness just based upon a videotape of their physical appearance, behavior, and visible and audible symptoms, a doctor would do quite well, while a computer would have a difficult time.
   206. mgl Posted: January 08, 2005 at 04:07 AM (#1065404)
BGW1, yes, my MLE's do include that "hidden regression" that you speak of. You can't just divide all players' (who are promoted) major stats by their minor stats, as some people do. I don't remember off the top of my head how I do it, but I do account for the "natural" regression you speak of.

And yes, whoever mentioned it, I rarely if ever talk about scouts. I denigrate GM's and teams all the time, but rarely if ever scouts or "scouting." Cameron's comments, about me at least, are a good example of using misinformation or implying something that isn't true to "prove" a point that may or may not be true, and in any case, doesn't really "mean" anything (that I am rude or arrogant, that I don't champion some "cause" properly, or whatever).

Tango, the first numbers in my data are MLE's. The league average BB MLE in the minor leagues is like .83 or something like that. If the players in the chart (the ones who were called up) had an average BB rate of .89, they were above average in the minors.

I can't comment on Chris' rebuttal of my data and conclusions (that minor league BB rate is very predictive of major league BB rate, or at least is the opposite of "not very predictive"), since I have no idea what he is talking about. The data are straightforward enough for one of our resident statisticians to easily say whether minor league BB rate has around the same predictive value as HR and K rates. Maybe the sample size is too small to say anything with confidence. I don't know. To engage in semantic arguments when the data is out there for all the world to see is ridiculous. If regressing minor league BB rate on next year's major league BB rate for players with over 299 PA's in both years tells us what I think it tells us, given my limited statistical acumen, then clearly minor league BB rate is quite predictive, around as predictive as HR and K rates, and more predictive than hit rates. If a simple PPCC does not tell us what I think it tells us about the predictability of minor league stats, then I am more than willing to listen to one of these statisticians...
   207. mgl Posted: January 08, 2005 at 04:09 AM (#1065406)
...tell me why it doesn't...
   208. BGW1 Posted: January 08, 2005 at 05:15 AM (#1065432)
mgl:

Can you give us the correlations for your major-to-major projections?
   209. mgl Posted: January 08, 2005 at 06:28 AM (#1065444)
I already posted that, I think. These are not projections. These are yty correlations for actual sample rates. Anyone can quickly do this (the major ones) with a career database and a spreadsheet. The only thing I did differently was I used my "context neutral" (park and opponent adjusted) stats, and not the raw ones, but for BB rates, it hardly makes a difference, at least in terms of the park adjustments (BB and K park factors are almost all near 1.00)...
   210. CFiJ Posted: January 08, 2005 at 06:29 AM (#1065445)
If I'm running a multi-million dollar business I think I can demand a higher level of thinking.

Exactly. If there was one thing I got out of my psychology education, it was that human perception is remarkably fragile, fickle, and ephemeral. Al's post is but one example. Eyewitness testimony is stupendously unreliable; our visual memory is run by the same cognitive functions as our imagination, leading to all sorts of memory corruption. Our cognitive processes are all set up not to make correct decisions, but to make easier, quicker decisions. Human judgment is rife with bias and error.

It gets us by in our daily lives. It's good enough for that. But with some time, effort, and training, one can do better. At least when focusing on certain things. And the owner of a team should be able to expect that kind of focus in the decision making of his top officers. As should a GM.

Now, that said, I don't think the main problem with the sabermetric image is arrogance. It's snarkiness. Part of this is thanks to the internet; a good deal of stat-head snarkiness is inherited from Usenet, where snarkiness is prized. Part of it is from Bill James. BPro has come up with thousands of humorously snarky ways to denigrate bad players and GMs. They've come up with maybe some hundreds of ways to praise good players and GMs. Not that I'm singling out BPro, because the same is true of Neyer, and Primer. It's so much easier and more fun to go negative than stay positive.
   211. Tango Tiger Posted: January 08, 2005 at 11:10 AM (#1065508)
MGL, ah, got it.

Then, are your MLE conversions that you applied for the 2001-2004 time period data based on the data of that sample? I would hope not!

That is, you create your MLE conversion factors based on the out-of-sample data (say 1995-2000), and then apply those rates to the 2001-2004 time period.
   212. The Politics of Torre: How the HOF Really Works Posted: January 08, 2005 at 11:26 AM (#1065518)
Al, I think that book was discussed during a presentation at SABR34 in Cincy. IIRC, Kenneth Heard who is the chairman of the Committee on Science and Baseball did the presentation with another member.

IIRC, they mainly discussed how it applies to managers but they may have mentioned scouts as well.
   213. BGW1 Posted: January 08, 2005 at 04:34 PM (#1065830)
MGL: "I already posted that, I think. These are not projections."

I know this and was asking if you could use major league projections for the correlation rather than what you did (relatively raw data).

MLEs are projections. You are comparing MajorY to MajorY+1 and "MinorY" to MajorY+1. The problem is that "MinorY" is itself a projection of MajorY+1. That is why the correlations for MLEs appear higher than those for major to major.

The solution is to use *full projections* for both minor and major leaguers and compare the correlations of your projections for each. This is also the most proper way to do the study in the first place, since the question was not do minor to major stats correlate, but can one predict future major league performance as well with minor data as with major.
   214. The Bones McCoy of THT ... of DOOM! Posted: January 08, 2005 at 04:48 PM (#1065854)
Schwarz - The Great Debate

So, this is a battle between sabermetrics and scouting to see who has the bigger Schwarz?

Best Regards

John
   215. mgl Posted: January 08, 2005 at 06:38 PM (#1065977)
Then, are your MLE conversions that you applied for the 2001-2004 time period data based on the data of that sample? I would hope not!

That is, you create your MLE conversion factors based on the out-of-sample data (say 1995-2000), and then apply those rates to the 2001-2004 time period.


I know that is an extreme no-no to use the same data set for the model and then for the testing. In some cases, that is ludicrous. In this case, it's not all that bad. No matter what year(s) I use for establishing the models, I get around the same coefficients. But...

since the testing is a correlation and the model is a linear coefficient, it doesn't really matter what the coefficient is - the correlation (r) will be the same. In fact, no MLE coefficient is even necessary to generate the same correlation coefficient! You can do the same regression with raw minor league numbers, and you will come up with the same r! Which is another reason why I thought Chris' statement was so bizarre. You don't even have to know anything about MLE conversions in order to see that minor league BB rates correlate well with major league BB rates. Just look at players with low (normalized) BB rates in the minors and then look at their BB rates in the majors (again, normalized). Do the same for players with high BB rates. If there were little or no correlation, as Chris and Grabiner suggested, both groups would have around the same (league average) BB rates in the majors. I don't think ANYONE thinks that is the case, do they? Does Chris think that both groups (say you broke minor leaguers up into just 2 groups) will have around league average BB rates in the majors? If his assertion is true (that there is little or no correlation), then, by definition, both groups would have around the same (league average) BB rates in the majors. Of course, they do not! Seriously, we know that batting eye (BB and K rates) is an extreme talent in baseball. How can players have a certain BB rate in the minors based on their batting eye talent (and patience, hitting style, whatever), and then have that disappear or go completely random in the majors? Some things are obvious at first glance. This is one of those things. I don't know where Chris gets some of these things. In fairness to him, Grabiner did say essentially the same thing, and I have always thought of him as quite sensible.
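
The point about r not depending on the coefficient can be checked with a short Python sketch. Everything here is assumed for illustration: the rates are made-up normalized values (1.00 = league average), and `MLE_FACTOR` stands in for a single multiplicative conversion.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical normalized BB rates; not real player data.
minor_bb = [0.60, 0.75, 0.90, 1.00, 1.10, 1.30, 1.50]
major_bb = [0.70, 0.72, 0.95, 0.90, 1.05, 1.15, 1.40]

# Assumed conversion: every minor league rate is multiplied by one factor.
MLE_FACTOR = 0.85  # illustrative value only
mle_bb = [rate * MLE_FACTOR for rate in minor_bb]

r_raw = pearson_r(minor_bb, major_bb)
r_mle = pearson_r(mle_bb, major_bb)
print(r_raw, r_mle)  # the two agree up to floating-point rounding

# Two-group check: split at the median minor league rate, compare major league means.
pairs = sorted(zip(minor_bb, major_bb))
half = len(pairs) // 2
low_mean = sum(major for _, major in pairs[:half]) / half
high_mean = sum(major for _, major in pairs[-half:]) / half
print(low_mean, high_mean)
```

The invariance holds when the conversion is one positive factor (or any positive linear transform) applied to everyone. If the adjustment differs by park or league, the adjusted and raw correlations need not match exactly.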

BGW1, I see what you are saying and I don't have time right now to reply accordingly, but...

MLE's are not projections! They could be, but they are not. At least mine are not, and I don't think that James' were either. They do NOT include regression. The "hidden regression" that we talked about is accounted for in the model, but it is not "included" in the model (formula) itself.

IOW, if you looked at all players' minor league stats from year X and major league stats from year X + 1 and ran a linear regression, you would of course get a formula which includes a regression - a coefficient and a constant. That essentially includes an adjustment (for the opposition talent) and a regression to the mean (due to sample error in the year X stats). That is fine. We don't do it that way however. We could, but we don't, for various good reasons. We just use the coefficient.

So, for example, if player A hit 40 HR's in the minor leagues, his MLE might be 30 HR's, but that is certainly not his MLE projection! You would then have to take that 30 HR estimate of "what he would have done in the majors" (which is kind of a fiction) and regress it, depending upon how many PA's it is based on. So he might have a MLE projection of 22 HR's. One reason we don't include the regression in the MLE's is so we can compare apples to apples. We WANT the fluctuation to be included in the MLE's. We want to be able to combine MLE's with sample major league data when we do projections. We couldn't do that if MLE's included the regressions necessary to do projections. If a player hits 40 HR's in the minors in 2003 and then 15 HR's in the majors in 2004, we want to combine 30 plus 15 and then do the regression, based on the total minor plus major PA's to come up with a projection.

Capeesh??
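
A rough Python sketch of the two-step process described above: apply the coefficient first, then pool and regress. The level factor, league HR rate, regression constant, and PA totals are all assumptions chosen to echo the 40-to-30 HR example. They are not mgl's actual values, and `regress_to_mean` is a hypothetical helper using a simple "add k league-average PA" shrinkage.

```python
def regress_to_mean(observed_rate, pa, league_rate, k):
    """Shrink an observed rate toward the league rate.

    k is the number of league-average PA added as ballast; it is an
    assumed constant here, not a value given in the thread.
    """
    return (observed_rate * pa + league_rate * k) / (pa + k)


# All inputs below are hypothetical.
LEVEL_FACTOR = 0.75      # assumed minor-to-major HR coefficient (40 HR -> 30 HR)
LEAGUE_HR_RATE = 0.030   # assumed league HR per PA
K = 400                  # assumed regression constant

minor_hr, minor_pa = 40, 600
major_hr, major_pa = 15, 600

# Step 1: the MLE is just the coefficient applied to the minor league line.
mle_hr = minor_hr * LEVEL_FACTOR  # still unregressed

# Step 2: pool the MLE and the actual major league sample, then regress once.
pooled_pa = minor_pa + major_pa
pooled_rate = (mle_hr + major_hr) / pooled_pa
projected_rate = regress_to_mean(pooled_rate, pooled_pa, LEAGUE_HR_RATE, K)

print(mle_hr)
print(projected_rate * 600)  # projected HR per 600 PA
```

Keeping the MLE unregressed is what lets it be added to the major league sample on equal footing. The regression is then applied once, with a weight that depends on the combined PA.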
   216. BGW1 Posted: January 08, 2005 at 07:51 PM (#1066046)
I need an explanation of how MLEs are not projections. I've read that they are not; I just don't buy it.

My position is that they are projections, albeit not very good ones: MLEs are calculated based upon data collected at different times, and so are inherently future looking. At the very least they possess this projection-like quality. Is your argument just that they do not calculate regression factors based upon the number of PA's, but rather use a single regression factor calculated based upon the average of all players' PAs? If so, this means MLEs are bad projections, not that they aren't projections.

To re-iterate my point:

An MLE is a park adjustment (relative to one's league), a league adjustment (relative to other minor leagues), and a level adjustment (relative to the major leagues). The level adjustment is calculated by finding the average change in performance when a player moves from a minor league to the majors.* Since players cannot play in two leagues at once, this factor contains information on how players change over time.

Standard projections differ from the level adjustment because instead of just using one factor for every player, they use a unique factor for every player (actually a series of factors) based upon the player's age, PA, performance relative to the mean, the variability of the particular statistic being projected, etc. This should be expected to substantially increase accuracy.

However, minor league players that get promoted are by and large very similar to one another. Most are above average offensively for their level, so the direction of their regression will be the same for almost all -- they are more likely to have been lucky than unlucky or even. Most are young, so the direction of their age adjustment in a projection will almost all be the same -- they are approaching their peak. This means that most of the population used to calculate the level adjustment in an MLE (which again is based upon the difference between Minor Year 1 to Major year 2) will be a set of lucky young players in year 1 and average luck (maybe...) players, one year closer to their peak in year 2.

Perhaps I am missing something. I would really like to know. I wish I had a db of minor league stats, so I could analyze the following testable hypotheses:

A simple projection based on the standard factors and MLEs calculated using the same data should not be very different from one another.

Minor league projections should have low regression and age adjustments when the input data are MLEs, relative to minor to minor projections or major to major projections.

A level adjustment using just demotions to derive the level adjustment should look a lot different from an MLE of just promotions.

A level adjustment derived using just data from post-peak players should be smaller than one derived using just data from younger-than-peak players.

*Caveat: I have assumed that level adjustments contain more data from players being promoted than demoted. This seems reasonable to me.
   217. kevin Posted: January 08, 2005 at 10:06 PM (#1066328)
This argument reminds me of what happened in the biological sciences after the genetic code was cracked. Shortly after that, experimental techniques utilizing the latest genetic findings were able to rapidly answer important questions in the biological sciences that the traditionalists were slugging it out with for decades. Soon, the old school guys found their research could no longer get funded and they were relegated to the scrapheap, condemned to the classroom or the admin office.

It seems the two old scouts are more concerned with the job security of themselves and their colleagues than they are with performing their jobs as best they can. So one of their buddies lost their job because a stat guy was hired in his place. Tough ####. It's a hardball world and you adapt or die. We all have to deal with the same thing. Major league baseball is a business, it's not an employment program.
   218. Kiko Sakata Posted: January 08, 2005 at 10:30 PM (#1066365)
I took the walk data that MGL posted on the previous page and ran a simple regression:

Major BB Rate = a + b*(Minor BB Rate)

The values for a and b were 0.269 and 0.643, respectively. The t-statistic on b was 8.93, meaning that if b were actually zero, a t-statistic this large would turn up less than 0.01% of the time. The r-squared on the regression was 0.454.

I don't know how that compares to other components(K, HR, etc.), but it seems to me from this that minor-league walk rate is a pretty good predictor of major-league walk rate.
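
For anyone who wants to repeat this kind of fit, here is a minimal Python sketch of a one-variable least-squares regression with the slope's t-statistic and r-squared. The data are made-up normalized rates, not the walk data from the previous page, and `simple_ols` is a hypothetical helper name.

```python
import math


def simple_ols(xs, ys):
    """Fit y = a + b*x by least squares; return a, b, t-statistic of b, r-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - sse / syy
    se_b = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    t_b = b / se_b
    return a, b, t_b, r_squared


# Hypothetical normalized BB rates (1.00 = league average); not real data.
minor_bb = [0.60, 0.75, 0.85, 0.90, 1.00, 1.10, 1.30, 1.50]
major_bb = [0.70, 0.72, 0.80, 0.95, 0.90, 1.05, 1.15, 1.40]

a, b, t_b, r_squared = simple_ols(minor_bb, major_bb)
print(a, b, t_b, r_squared)
```

In a one-variable regression the t-statistic and r-squared are tied together by t² = r²(n − 2)/(1 − r²), so either one can be recovered from the other once the sample size is known.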
   219. Chris Dial Posted: January 09, 2005 at 12:13 AM (#1066592)
I don't know where Chris gets some of these things. In fairness to him, Grabiner did say essentially the same thing, and I have always thought of him as quite sensible.


Thanks, mgl. You're so sweet.