Baseball Primer Newsblog: The Best News Links from the Baseball Newsstand
Thursday, January 06, 2005
Baseball America - Schwarz - The Great Debate
For the past two years, the scouting and statistics communities have feuded like members of rival families. Baseball lifers who evaluate players with their eyes are derided as over-the-hill beanbags who don’t understand the next frontier. Numbers-oriented people are cast as cold, computer-wielding propellerheads with no appreciation for scouting intangibles. Not surprisingly, the camps have grown so polarized that they have retreated to their respective bunkers rather than engage in open and intelligent debate. Until now.
Reader Comments and Retorts
Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.
Besides, as post 97 notes - everything will end up contaminated anyway. People talk/read...
Voros: Listen Eddie, there's these two guys, one walks a ton, and the other never walks. I think the first guy has a great eye.
Eddie (after scouting): Do you know why the first guy always walks? He can't hit, and he's facing pitchers that can't find the strike zone. He's taking pitches because he's smart. But, I saw him against a guy who can throw strikes, and he can't hit a lick.
Voros: Ah, I see. So, let me update my model with your observations, and look for those types of pitchers, and try to weight them better, so they don't influence my analysis.
Eddie: When should we get married?
I hope you can produce those records, but I don't recall it that way. Of course, I'm old and feeble-minded. And while it was better than Grabiner's, I think it was still much less than MLB-to-MLB.
You have the info; post it.
FWIW, I think you have corrected me on that before.
You showed the work I was referring to.
Also, any better explanation on what you did? I don't see anything other than some numbers you just wrote down.
As you say, park adjustment is "pretty trivial".
I'd like it if you showed your work, rather than just results, and declaring something "clearly incorrect."
It is *NOT* incorrect because "you say so" (regardless of how much other readers like to think so).
Let's see some names and walk rates in the minors and some names and walk rates in the majors, as well as what you would consider "maintaining a walk rate" from the minors to the majors.
I always say that the essence of my work [as a GM] relies fundamentally on two basic principles: objectivity and observation, or "the two obs", as I call them.
Does anyone know if Stats Inc. tracks pecker size? Closest thing I can find is "PB" and it isn't even available for a lot of guys. But if that's it, I think I know why Plaschke was so sorry to see LoDuca go.
It's a debate that doesn't even exist. I remember in Economics class, the professor was discussing a concept, I forget the name, but it was how two things are worth much more when used together. He said, "A left shoe and a right shoe, or a pen and paper, or...", then a voice from the back of the class (always the back of the class, of course) called out "Rum and Coke!"
Thanks for bringing me back Black Hawk!
Is it me or did Voros gain like 100 lbs or something?
I remember seeing his picture somewhere and he was quite thin.
We're not selling jeans here.
No, I think they're on the same page. Chris claimed that minor league numbers were good predictors of major league numbers, except for walks. MGL's numbers show that, on the contrary, walks have one of the best correlations from minors to majors. They're not concerned with what constitutes a "high" or "low" correlation but rather the relative strengths of the correlations.
From MGL's numbers, he looks right on this one--and his numbers agree with everything I've seen in the past. Moreover, his critique of Grabiner's approach is right on.
Whereas mgl seems to be arguing that the walk rate is the 2nd highest correlated stat, but just because it's the 2nd highest correlation doesn't mean it's a HIGH correlation, right?
... I don't think it's worth trying to figure it out.
Whether you want to debate the merits of whether .67 is high or not, the BB rate still remains the 2nd most correlated stat. If you can't talk about the BB rate, then any other rate that correlates below it must be dismissed, too. I know you are not saying to do that, but just trying to figure out some explanation.
Relative to each other, we can see how valuable the walk rate is for predictability. And, this is entirely expected, simply because the range in K and walk rates is wider than that of the other rate stats to begin with.
Correlation is caused by many factors, one of which is the true range of the item you are looking at, and another is how much that metric you observe represents something real. The K and walk rates are subject to less influence than HR and hit rates (i.e., they represent something more real), and they have a greater standard deviation to begin with. It should be no surprise at all that they correlate the highest in majors-to-majors.
That the minors-to-majors correlates in the same order should not be unexpected. So, going in, I expected that walk rate in minors to be the 2nd most meaningful. MGL shows results of his study that confirm that. As far as I'm concerned, I'd be more interested to read a full study that shows that they are not as well correlated, than a full study from MGL that says what we should have expected.
The overall minors-to-majors is a separate issue.
My economics class was run by a visiting professor from Sweden, who was talking about the same subject. One of his examples of things that don't go well together was steak and lobster, to which we all yelled out "Surf and Turf!"
He looked stunned for a second; we had to explain to him what Surf and Turf was.
He just shook his head and said "America is a ridiculous country."
So I only deserve credit for the pretzels!
(Sorry for nicking someone else's line ...)
The line may have originated here from something else, but at least for me, I first read it in a BP intro to one of their annuals.
The Beer or Tacos line comes from Dayn Perry in a Prospectus article from 2003:
A question that's sometimes posed goes something like this: "Should you run an organization with scouts or statistics?" My answer is the same it would be if someone asked me: "Beer or tacos?" Both, you fool.
He followed up on that theme here.
Carry on.
Is it possible that high walk rate minor leaguers are more prone to the usual selection bias and that could partly explain the confusion?
I'm pretty sure that this is something that Silver brought up anecdotally with respect to PECOTA as well. He made the comment that while in the old days BP would tout an all-walks minor leaguer like Jackie Rexrode, that would be much less likely to happen in the future because of research from PECOTA development.
If Rexrode-type players don't make the MLB sample, then their failures to match their high ml walk rates won't be picked up in these correlation studies.
Anecdotally, I do think extreme walkers suffer more decline than most MLE systems would predict.
As mler Kevin Youkilis walked 18.8% of his PAs, but so far it's just 13.7% in his short MLB career.
The numbers are even more stark for Eckstein, 14% as a mler and 7% as a MLBer.
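For reference, the relative size of those two drops can be worked out directly from the percentages quoted above. This is only a small arithmetic sketch; the helper name `relative_decline` is an illustrative label, and the inputs are just the figures from the post.

```python
def relative_decline(minor_pct, major_pct):
    """Fraction of the minor-league walk rate that was lost in the majors."""
    return (minor_pct - major_pct) / minor_pct

# Walk rates (percent of PA) exactly as quoted in the post above.
quoted = {
    "Youkilis": (18.8, 13.7),
    "Eckstein": (14.0, 7.0),
}

for name, (minors, majors) in quoted.items():
    print(f"{name}: {relative_decline(minors, majors):.1%} lower in the majors")
```

That works out to roughly a 27% drop for Youkilis and a 50% drop for Eckstein.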
"Is it possible that high walk rate minor leaguers are more prone to the usual selection bias and that could partly explain the confusion?"
Yes - and Jackie Rexrode sprang to mind for me as well. (Though I still don't think he got a fair shot in the high minors.)
This is the main problem with MLEs: the only guys that make the study are those players who are thought to be good enough (for whatever reason) to be brought to, *and allowed to stay in*, the majors. How teams do that is the bias to resolve (or account for).
As for extreme players, I would guess this is the case for extreme players of any metric. Again, it would be a pleasant surprise if this is not the case. Just means you have to change your regression parameters, that's all, so that players, as they go further from the mean, regress at a different rate.
These rates (single, double, etc.) are correlated. They're what we'd call competing risks with all the risks summing to 1. In any given PA, once you walk, you're no longer "at risk" of singling. Or to put it another way, an increase in a player's walk rate has to be accompanied by a decrease in at least one other rate (hopefully the out rate) because the sum of the rates must be 1.
Correlations are bivariate relationships -- how related are minor and major walk rates controlling for nothing else. But with all this correlation going on at both the minor and major league level, you would be well advised to estimate multivariate models (preferably simultaneous and/or non-linear).
For example, it is possible (not all that likely) that once you control for minors hit rate, then minors walk rate is no longer a significant predictor of majors walk rate. More importantly (and universally true at at least some tiny level), those other rates will help give you a better prediction of the majors walk rate than just using minor walk rate alone.
To try to put that more in a "real-world hypothetical"....
Imagine you have a very talented minor-league player. He's substantially better than the typical AA pitcher. So much so that he can easily hit for a high batting average with decent power. Since he finds AA pitchers so relatively easy to hit, he doesn't walk much -- he doesn't swing at many bad pitches but he can handle lots of the strikes they throw him and they're bound to throw him at least one pitch he thinks he can handle in any PA. So maybe this guy hits 330/380/630 at AA.
He jumps to the majors. Not surprisingly, he can't dominate majors pitchers like he did minors pitchers, so his BA and probably power are going to take a big hit -- that is he's going to see fewer pitches he can handle. But maybe he retains his ability to not swing at many bad pitches. So maybe this guy in his rookie year hits something like 270/350/470. (At least you better hope that not all of that hit rate differential gets picked up in the out rate)
This looks like a guy who didn't walk much in the minors but walks in the majors. Instead, he was a guy who, at both levels, hit pitches he can handle with authority and laid off most pitches out of the zone.
This is the sort of question an earlier poster was trying to ask about K/BB rates and translatability of stats. For example, for a player like the above, in the minors he'd have a high average, good power, not many walks and probably not many Ks. Maybe for a player with that kind of profile, we should expect them in the majors to have a decent BA, more walks, probably more Ks, and presumably less power at least for a while. A quick search didn't turn up any ideal real-world examples, but young guys like Blalock and Tracy are kinda along this line.
Then maybe we have a player with similar overall stats, but he also Ks a lot. This suggests that although he's smacking the pitches he can handle, he is also swinging at lots of bad pitches (and maybe just wailing away generally). For this kind of player, we might expect a drop in BA and power, no growth or even decline in walks, and an explosion in Ks. Maybe this guy ends up with 250/300/420. Off the top of my head, this is Brandon Larson.
I made those up of course and I don't know that such players exist. Truth be told, in the dozen or so young players I looked at, there tended to be little difference between major and minor-league walk rates. But to the extent we want to find out about stuff like that, correlations ain't gonna do it.
When I pulled those two up I wanted to do a "regular" player too just to see if the anecdote made sense. For obvious reason the first name that popped up was Beltran - he of every other clutch hit thread.
I didn't write down which was which, but at one level the % was 10.2 and at the other it was 9.7.
He may not be a good example in comparison to Eck/Youk because he made the majors so young and is so tools oriented, but I think it's basically true.
If the median BB% is 10%, then players with rates of 9-11% may face limited decreases in the majors whereas players with 14% like Eck or 18% like Youk may face 30-50% reductions.
Tango is right that that finding - if it was really true - could be handled by varying the regression rates, but how many producers of MLEs are going to do that?
A lot to think about from Walt. I like the real-world examples too.
"All your statistics are going to tell you is what a guy has done. Somebody has got to make the decision on what the guy’s going to do."
The Jets still hate the Sharks, and the Sharks still hate the Jets.
I agree wrt correlations, but I don't know as much about that as Walt.
As for real world examples, I asked that in 104.
This is why Voros does something cool, and for which I'm forever grateful, as I do something similar (but not exactly the same):
bb rate = BB / PA
k rate = K / (PA - BB)
hr rate = hr / (PA - BB - K)
etc, etc
This solves Walt's problem. But, it does so at a "leap of faith" level. While we can come up with a good argument for doing it this way, we haven't really proven that it should be this way. It allows us to handle Walt's issue in a prima facie logical way.
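A minimal Python sketch of the chained rates listed above, for anyone who wants to see the mechanics. The function names and the season line are made up for illustration, and this is not claimed to be Voros's or Tango's actual code. It shows that each rate can be changed on its own and the counts rebuilt from the chain still fit inside the PA total.

```python
def chained_rates(pa, bb, k, hr):
    """Rates as listed above: each denominator drops the events already peeled off."""
    return {
        "bb": bb / pa,
        "k": k / (pa - bb),
        "hr": hr / (pa - bb - k),
    }

def counts_from_rates(pa, rates):
    """Rebuild expected counts by walking the chain in the same order."""
    bb = pa * rates["bb"]
    k = (pa - bb) * rates["k"]
    hr = (pa - bb - k) * rates["hr"]
    return {"bb": bb, "k": k, "hr": hr}

# Hypothetical season line, invented for illustration only.
rates = chained_rates(pa=600, bb=60, k=108, hr=24)

# Raise only the walk rate; the k and hr rates are left untouched.
bumped = dict(rates, bb=rates["bb"] * 1.5)

print(counts_from_rates(600, rates))
print(counts_from_rates(600, bumped))
```

With rates taken per PA instead, bumping the walk rate alone would leave the other rates unchanged and the total could drift past 100% of PA; in the chained form the later counts shrink automatically.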
What's there to get? The question of "what the guy's going to do" is what divides us. Neither side is perfect and both can be used constructively together, methinks. I just don't see much value in DrIPS, UZR, or MLEs and I would surmise that most baseball executives don't either.
Not the people here, other than J1F, me, and some other Yankee fans...
Back in the old days, you went into your local bank, talked to the loan officer, they took some information, and "manually underwrote" your loan. While obviously they took notice of your income and expenses they'd also talk to you and assess whether they thought you were a good risk. And in the bad old days, the color of your skin or your gender played a big role too.
Then came credit scoring. This is where large credit rating agencies gathered tons of bill paying information and built huge models predicting future payment behavior based on past payment behavior. This gave them weights which could then be applied to any individual's credit history to produce a score to characterize their credit-worthiness. This gave banks a tremendous tool in assessing the riskiness of a loan, but of course they still relied on other information both objective (income/expenses) and personal.
Nowadays, we are mostly in a world of automated underwriting. There are several complex, proprietary models out there which take credit scores and about a dozen other pieces of information about the borrower (not including race or gender of course as that's illegal) and the potential loan characteristics and will (internally) calculate a default probability and, based on that, either automatically approve the loan or flag it for further investigation (probably to avoid lawsuits, the models never actually "reject" anyone).
The banks love this of course as most of their mortgages can be underwritten in a matter of a couple of minutes (loan officers are now mostly data collectors), with relatively few loans requiring manual underwriting. Manual underwriting takes a lot more time and therefore is more costly and the automated underwriting is more accurate.
Also, banks have internal models which help them to price the loan. Based on your characteristics, they'll project the likelihood you'll default on the loan and will charge you a corresponding interest rate.
Now, servicing the loan and loss mitigation of delinquent loans are becoming increasingly automated. There is now software (often script- and/or letter-generating) to flag problematic payment histories. Not a lot is known about these yet (all proprietary naturally), but as an example: there are borrowers who are almost always late with their payment and the software knows it's no big deal when they're late again; then there are folks who are always on time, so if they're late, the software throws up a flag and possibly recommends a particular script for the servicing folks to use when they call you.
Of course back in the old days when you were dealing with your small neighborhood bank and they were servicing all their own loans, they'd probably do the same thing ("Oh, that's Mr. Davis, he's always a couple weeks late"). But those banks don't exist anymore and lots sell their servicing rights and those that don't are running large, centralized servicing centers.
On one of Tango's comments, I want the scout to keep giving me a written scouting report too. This too can be statistically analyzed. This is part of what data mining for customer relations management is all about. There's at least one company working on similar stuff with the transcripts from mortgage service center calls (when they call you to ask you why you're late on your loan payment, do not say the word "motorcycle" during the conversation or they may foreclose on your ass immediately). Anyway there is software that can go through those scouting reports to find the occurrence of certain words, then see whether that's related to better or worse than expected performance.
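A rough sketch of that word-occurrence idea in Python. Everything here is hypothetical: the report text, the residuals (actual minus projected performance), and the function name are invented to show the mechanics, not taken from any real scouting data or product.

```python
from statistics import mean

# Hypothetical (report text, residual) pairs, invented for illustration only.
reports = [
    ("quick hands, advanced approach, plus makeup", 0.012),
    ("long swing, chases breaking balls", -0.020),
    ("advanced approach, fringe arm", 0.008),
    ("long swing, big raw power", -0.004),
]

def residual_by_keyword(reports, keyword):
    """Mean residual for reports that mention the keyword vs. those that do not."""
    hits = [res for text, res in reports if keyword in text.lower()]
    misses = [res for text, res in reports if keyword not in text.lower()]
    return (mean(hits) if hits else None, mean(misses) if misses else None)

for kw in ("advanced approach", "long swing"):
    hit_mean, miss_mean = residual_by_keyword(reports, kw)
    print(kw, hit_mean, miss_mean)
```

A real version would need far more reports, some handling of synonyms and negation, and a check that any gap between the two means is bigger than noise.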
As an earlier note said, baseball is just scratching the surface of what can be done here. The idea that so many other industries are moving as quickly as possible towards automated information and decision-making but baseball won't seems untenable.
Automation of an occupation is always tough. Generally, automation reduces the need for that occupation, turns most of the folks remaining in that occupation into the equivalent of 'data collectors', usually at much lower pay (though this may not be possible for scouts :-), while also creating a new set of higher-paying, higher-skilled jobs for the folks who design and run those new systems.
And, like many scouts, the people who are becoming de-skilled and/or tossed out of jobs are mighty unhappy about it.
Those with a better grasp of the history of scouting and the minors should chime in, but I suspect this has been going on in scouting in more subtle ways for a long time.
Imagine what it was like back in the early days of farm systems and before. Back before TV, videotape, Baseball America, etc. No doubt a good scout was the guy who had great networks among the high school and American Legion coaches (or whatever organized baseball there was in those days). I suspect it wasn't so much about spotting talent per se as it was that the good scouts were the ones who heard about that kid in the middle of nowhere, Oklahoma while the bad scouts didn't.
As the farm systems came along, this was formalized -- teams hired full-time scouts, cross-checkers, etc. and formed their own relationships with local coaches. At this point, the good scout starts to move towards talent evaluator because everybody knows there's a kid in the middle of Oklahoma striking everybody out.
Then you get scouting services and even Baseball America. At this point, all teams have access to about the same information. I'm not sure teams needed local networks at all anymore. Scouts are purely talent evaluators now. It also becomes more plausible for a team to either reduce their scouting staff or limit their drafting options to, say, college players. That is, if you've got Baseball America and other sources, what is the marginal value of a scout? Similarly, if you've got Baseball America and other sources, you'll definitely hear about the high school studs even if you don't scout high school players. At the very least, these sources of information allow teams to easily filter out the players they're not interested in so they can concentrate fewer scouting resources into scouting fewer players.
But of course there's a symbiotic relationship between BA and scouts. Automation of that screening process is also a threat to BA and similar services/publications (perhaps part of why they seem opposed to it).
At some point in the future, all these individual team stat guys may get centralized into a league stat office, with all teams sharing the info. This will require fewer stat guys and you will hear unemployed stat guys complaining about how no centralized stat analysis could possibly compare to the unique statistical insight that they bring to the table.
Or maybe not. Given that organizational scouting costs probably are less than 1 year of Royce Clayton, there are diminishing returns to further reducing scouting/development costs via automation.
OBA rate = (H+BB) / PA
BB rate = BB / (H+BB)
SO rate = SO / (PA-H-BB-SO)
See, in this case, you are taking
node1A = H+BB
node1B = PA-H-BB
node1A_1 = BB
node1A_2 = H
node1B_1 = SO
node1B_2 = PA-H-BB-SO
Get it? You are trying to keep things as binary, and independent of each other.
As I said in the first post, just because I split it this seemingly logical way, doesn't make this a logical and correct process. But, I think it gets me somewhere.
Tango is talking about complementary goods. That's about what I can add to the conversation.
Thank you for your time.
He doesn't seem to understand what metric-based scouting is, to me. It would be one thing if he said "I see what you are getting at, but I don't believe it." Bane comes across as understanding it in basic principle, and perhaps he's truly interested in learning more. He's at least expressed a desire to, whether it's a smokescreen or not. Instead Hughes blindly dismisses it. And his laughing off of some stuff indicates he has zero desire to try to understand any of it. He's a stubborn good ol' boy (aren't they all?), and this discussion hasn't changed a thing, and could have made it worse in his case.
And Baseball America should feel threatened... one day, someone will create a network of baseball fans that will evaluate every single player from HS on up. I see imdb.com as a model. I prefer that to Roger Ebert. (With the appropriate controls for ballot-stuffing).
I'm reading through this thread now, so I apologize for repeating any posts, but this is my problem with Dallas McPherson.
The catch here, of course, is how good are non-professional scouts?
Still, I think we're seeing a teeny bit of movement in this direction - there's the calleaguers.com site, film clip of top ammy prospects located here and there with appropriate commentary, and so on.
Scouts now are pretty much talent evaluators. Back in the days before the draft, they were salesmen. It's not enough to find a kid who can play, you also have to convince him to take your offer, especially before the Yankees find out about him.
Tango, how much difference do those rate calculations really make? Let's say we're calculating an MLE or applying an aging pattern. You take K/(PA-W) and someone else does K/PA. Once you recombine them, aren't the differences going to be trivial? Maybe one gets a K total of 120.7 and another 121.4?
OBA rate = (H+BB) / PA
BB rate = BB / (H+BB)
SO rate = SO / (PA-H-BB-SO)
See, in this case, you are taking
node1A = H+BB
node1B = PA-H-BB
node1A_1 = BB
node1A_2 = H
node1B_1 = SO
node1B_2 = PA-H-BB-SO
Get it? You are trying to keep things as binary, and independent of each other.
As I said in the first post, just because I split it this seemingly logical way, doesn't make this a logical and correct process. But, I think it gets me somewhere
Ok, this discussion is extremely interesting to me, so for me to understand I need to know what you are talking about. What does "node1B..." mean? I apologize for my awful math skills.
Not when you consider the pretty heavy discounts some factors receive in the transition.
Tango, others - how do you treat HBP? I remove them before I do walks (though here we are talking about trivial gains).
Still, I think we're seeing a teeny bit of movement in this direction - there's the calleaguers.com site, film clip of top ammy prospects located here and there with appropriate commentary, and so on.
Prior to the draft, mlb.com provides scouting film on hundreds of amateur prospects. It's doable.
Tango, others - how do you treat HBP? I remove them before I do walks (though here we are talking about trivial gains).
I pretty much disregard them unless they are to such an extreme level that it leads me to believe that the player in question has this as a "skill", possibly due to how he sets up in the box or something else. Also, sometimes they give me injury worries.
Another question I would ask is about IBB. I pretty much disregard them; what does everyone else do?
***
The BB and K won't make much of a difference in the regression. But, it makes a world of difference let's say to the triple. See, for triples, the equation is:
3B rate = 3b / (2b+3b)
I treat a triple as a function of doubles only, and not of PA (which is how MGL presented it in his fine post). The correlation between 3B rate and SB per (singles+walks) is so high that it tells me that a triple is essentially a 2B+SB.
***
As for the "node" talk, I usually try to avoid using gibberish like that, but I wanted to draw some "binary trees" with words. What I'm trying to say is take all events (HBP, BB, K, out, HR, single, double, triple), and group them in a logical fashion. Then every grouping is broken down into subgroups, until each group (the node) is left with only one event. Then, you compare that event to the sum of all events at that node level. Maybe someone can explain it better. I'll try to draw it:
-HBP
-BB,SO,HR,3B,2b,1b,nonKout
--BB
--SO,HR,3B,2b,1b,nonKout
---SO
---HR,3B,2b,1b,nonKout
----HR
----3B,2b,1b,nonKout
-----nonKout
-----3B,2b,1b
------1B
------3B,2b
-------3B
-------2B
So, it's either a HBP or not.
If not, then it's either a walk or not.
If not, then it's either a K or not.
etc...
Ordering matters, especially at the low end.
We instead could have had
-SO,nonKout
--SO
--nonKout
-HBP,BB,HR,3B,2b,1b
--HBP
--BB,HR,3B,2b,1b
---BB
---HR,3B,2b,1b
etc, etc
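Here is one way the two drawings above could be sketched in Python, with a tree written as nested (left, right) pairs and each rate taken as the left branch's share of its node. The season line is made up, the function names are illustrative, and the tail of the second tree is elided with "etc, etc" in the post, so the sketch borrows the ending of the first tree for it as an assumption.

```python
def leaves(node):
    """Flatten a node (an event name or a (left, right) pair) into its event names."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)

def split_rates(node, counts, out=None):
    """At each split, store the left branch's count divided by the whole node's count."""
    out = {} if out is None else out
    if isinstance(node, str):
        return out
    left, right = node
    left_n = sum(counts[e] for e in leaves(left))
    node_n = sum(counts[e] for e in leaves(node))
    out["+".join(leaves(left))] = left_n / node_n
    split_rates(left, counts, out)
    split_rates(right, counts, out)
    return out

# Hypothetical season line, invented for illustration only.
counts = {"HBP": 5, "BB": 60, "SO": 100, "HR": 20,
          "nonKout": 300, "1B": 90, "3B": 3, "2B": 27}

# First drawing: HBP or not, then BB or not, then SO or not, and so on.
first = ("HBP", ("BB", ("SO", ("HR", ("nonKout", ("1B", ("3B", "2B")))))))

# Second drawing: outs split off first. The tail below HR is an assumption.
second = (("SO", "nonKout"), ("HBP", ("BB", ("HR", ("1B", ("3B", "2B"))))))

a = split_rates(first, counts)
b = split_rates(second, counts)
print(a["BB"], b["BB"])   # same walks, different denominators
print(a["3B"], b["3B"])   # 3B / (3B + 2B) in both trees
```

The walk rate changes with the ordering because the denominator changes, while the bottom node (triples against doubles) comes out the same either way.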
*Pecker-high fastball
-HBP
-BB,SO,HR,3B,2b,1b,nonKout
--BB
--SO,HR,3B,2b,1b,nonKout
---SO
---HR,3B,2b,1b,nonKout
----HR
----3B,2b,1b,nonKout
-----nonKout
-----3B,2b,1b
------1B
------3B,2b
-------3B
-------2B
I understand now, thanks for reviving some of my high school mathematics knowledge. Is there a specific reason that you order the way you go down the binary tree or is this a preference? (Sorry if I'm asking you to repeat anything.) Also, what I mean is if there is any linkable research on this.
If you are a prospect you shouldn't need to try to draw walks in HS unless they don't give you anything within a foot of the zone or you are facing a fellow top prospect, a Jason Neighborgall type (top-of-the-line stuff, poor command) who is difficult for even the best 18 yos to hit but will walk guys.
Those with a better grasp of the history of scouting and the minors should chime in, but I suspect this has been going on in scouting in more subtle ways for a long time.
The last great effort to eliminate scouts was the creation of the Central Bureau in the late 1960s and early 1970s. Teams subscribed and allowed a pooled resource of scouts to send them printouts and scouting reports on amateur players. That was a great cost savings and many scouts were turfed as a result. But teams found the system constricting and by the early 1980s, most teams were hiring large scouting staffs and doing things for themselves. The Bureau survives today, but mainly as an afterthought.
Early scouts were ivory hunters - they went out in the wilds and looked for arms in the barns. In the 1940s, larger scouting staffs came into vogue as competition heated up for the talent. Then the draft was instituted and teams began to cut back on scouts and scouting expenses.
Scouts are the cockroaches - utterly survivable. Even with the introduction of fancy new ways of doing things, putting a pair of eyes on a player and rendering judgement will survive to some extent - that to me is the crux. Scouts that do real work (as they define it) cringe when someone looks at some numbers on a spreadsheet and says whether a kid can play or not.
As far as I'm concerned, I'd be more interested to read a full study that shows that they are not as well correlated, than a full study from MGL that says what we should have expected.
Yes, of course this is true. It would be nice to look at both sets of data side by side, but absent that, I think the burden is on the side whose hypothesis is most implausible. I had run the numbers a long time ago, but after I read Chris' comment, honestly, I would have been shocked if he were correct.
My statement "clearly incorrect" is probably too harsh. My sample was only 98 batters. I don't know the confidence interval on the "r" off the top of my head. BTW, those are very high correlations (anything in the .600-.800 range) for binomials (rates) of average sample size 450. Remember that these correlations are a direct function of the sample size (number of PA's) of each of the data pairs. For example, you cannot have a correlation of 1 unless you have an infinite sample size (PA's) for each data point, even if the underlying true means are exactly the same.
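To illustrate the sample-size point, here is a small simulation sketch in Python. The talent spread (true walk rates around 9% with a 3-point standard deviation), the player count, and the function names are all assumptions made for illustration; none of it comes from the real data. Each simulated player keeps exactly the same true rate in both samples, so any shortfall from a correlation of 1 is pure binomial noise.

```python
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def observed_rate(true_rate, pa, rng):
    """Simulate `pa` plate appearances with a fixed true walk probability."""
    walks = sum(rng.random() < true_rate for _ in range(pa))
    return walks / pa

def split_sample_r(pa, players=300, seed=0):
    """Correlation between two independent samples drawn from identical true rates."""
    rng = random.Random(seed)
    # Assumed talent spread, floored at 1% so no rate goes negative.
    true_rates = [max(0.01, rng.gauss(0.09, 0.03)) for _ in range(players)]
    first = [observed_rate(p, pa, rng) for p in true_rates]
    second = [observed_rate(p, pa, rng) for p in true_rates]
    return pearson_r(first, second)

for pa in (100, 450, 2000):
    print(pa, round(split_sample_r(pa), 3))
```

Under these assumptions the correlation rises as PA per player goes up but stays short of 1, which is the ceiling effect described above.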
In any case, I am willing to see where and why our conclusions are so markedly different (which they are). Here is my data. As I stated, I regressed 2000 MLE's on 2001 major stats, 2001 MLE's on 2002 major stats, up to 2003/2004. To qualify for a data pair, a player must have had at least 300 PA's in minors and majors. BB's were actually non-int. BB's plus HBP's. PA's were AB's plus non-int BB's plus HBP's plus SF's. The data is BB's per PA normalized to the league average in that year. Both minor and major stats are park adjusted, although as I also stated, park adjustments for BB's are very small and insignificant. Major stats are also opponent adjusted, which is not that big a deal either. Minor stats are not. Since, as I said, the BB rates in the following data are normalized to league average, a value of 1.00 means league average for that year. And of course, the minor league BB rates are MLE's and not raw minor league stats. The year represents the minor league year. The major league year is the next year. The first number is the normalized minor league BB rate (MLE) and the next number is the next year's major league rate.
"Abernathy, Brent","00",.68,.93
"Anderson, Marlon","00",.81,.64
"Brown, Dee","00",.65,.52
"Darr, Mike","00",.98,1.16
"Eckstein, David","00",1.17,1.11
"Estrada, Johnny","00",.21,.47
"Hillenbrand, Shea","00",.28,.38
"Huff, Aubrey","00",1.04,.54
"LaRue, Jason","00",.61,.87
"Mientkiewicz, Doug","00",.99,1.24
"Nunez, Abraham","00",1.21,.93
"Pierre, Juan","00",.57,.83
"Richard, Chris","00",1.06,1.04
"Rivas, Luis","00",.78,.82
"Rollins, Jimmy","00",.81,.76
"Sierra, Ruben","00",.96,.57
"Soriano, Alfonso","00",.52,.59
"Tyner, Jason","00",.8,.5
"Wilson, Jack","00",.78,.42
"Young, Mike","00",.79,.78
"Dunn, Adam","01",1.4,1.94
"Ellis, Mark","01",1.03,1.35
"Gil, Geronimo","01",.49,.54
"Hall, Toby","01",.79,.46
"Hinske, Eric","01",1.11,1.28
"Izturis, Cesar","01",.37,.29
"Johnson, Nick","01",1.83,1.37
"Kielty, Bobby","01",1.33,1.67
"Lopez, Felipe","01",.86,.89
"McCracken, Quinton","01",.61,.92
"Mench, Kevin","01",.63,1.08
"Mohr, Dustan","01",.89,.78
"Patterson, Corey","01",.78,.43
"Pena, Carlos","01",1.54,1.15
"Roberts, Dave","01",.87,1.18
"Rowand, Aaron","01",.64,.57
"Sanchez, Alex","01",.68,.85
"Sandberg, Jared","01",1.15,1.03
"Truby, Chris","01",.73,.32
"Vazquez, Ramon","01",1.38,.99
"Wells, Vernon","01",.7,.53
"Wilson, Tom","01",1.55,1.24
"Bard, Josh","02",.53,.71
"Berroa, Angel","02",.72,.81
"Bigbie, Larry","02",.91,.99
"Blake, Casey","02",.97,.88
"Blalock, Hank","02",.74,.84
"Broussard, Ben","02",1.57,.95
"Byrd, Marlon","02",.92,.95
"Calloway, Ron","02",.97,.61
"Chavez, Endy","02",.72,.57
"Cintron, Alex","02",.33,.68
"Crawford, Carl","02",.57,.41
"Crede, Joe","02",.74,.75
"Crisp, Covelli","02",.7,.61
"Ensberg, Morgan","02",1.46,1.29
"Everett, Adam","02",.72,.78
"Gerut, Jody","02",1.03,.85
"Ginter, Keith","02",1.21,1.43
"Hafner, Travis","02",1.65,1.08
"Hart, Bo","02",.94,.61
"Harvey, Ken","02",.82,.69
"Hudson, Orlando","02",.81,.96
"Kata, Matt","02",.52,.86
"Monroe, Craig","02",.89,.66
"Morris, Warren","02",.63,.75
"Munson, Eric","02",1.43,1.13
"Nady, Xavier","02",.6,.79
"Olivo, Miguel","02",.79,.74
"Phillips, Brandon","02",.62,.49
"Phillips, Jason","02",.71,1.12
"Podsednik, Scott","02",.94,1.01
"Roberts, Brian","02",1.17,1.1
"Wigginton, Ty","02",1,.92
"Bay, Jason","03",1.52,1.17
"Cabrera, Miguel","03",.84,1.08
"Castillo, Jose","03",.66,.5
"Crosby, Bobby","03",1.2,1.23
"DeJesus, David","03",1.44,1.12
"Estrada, Johnny","03",1,.9
"Figgins, Chone","03",.95,.86
"Gonzalez, Luis","03",.89,.6
"Greene, Khalil","03",.74,.98
"Hairston, Scott","03",.8,.64
"Hall, Bill","03",.69,.52
"Hall, Billy","03",.69,.52
"Holliday, Matt","03",.71,.95
"Lamb, Mike","03",1.28,.97
"LaRoche, Adam","03",1.02,.82
"Martinez, Victor","03",1.04,1.03
"Miles, Aaron","03",.66,.61
"Morneau, Justin","03",.95,.79
"Nix, Laynce","03",.76,.58
"Redman, Tike","03",.86,.45
"Rios, Alexis","03",.66,.78
"Rivera, Juan","03",.77,.71
"Sledge, Terrmel","03",1.07,.9
"Tracy, Chad","03",.75,.85
To me, the numbers look exactly like I would have expected (high corr). As I said, I would have been shocked if there were little or no correlation. Since BB, HR, and K's are the 3 components which have by far the greatest (bivariate) correlation, if BB had little or no correlation, you would not expect there to be a high overall (batting) correlation from minors to majors, which is the thesis that everyone is trumpeting. IOW, you can't say "Minor league performance predicts major league performance either exactly (which isn't true) or almost as well as major league performance," and at the same time say "There is little or no major league predictive value for minor league BB rates." Those two things are nearly mutually exclusive.
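For anyone who wants to check the correlation themselves, here is a minimal sketch that parses rows in the format posted above and computes a plain Pearson r. Only six rows are pasted in, so the number it prints is not the full 98-player figure; paste the whole list into `SAMPLE` to reproduce that. The function names are just for illustration.

```python
import csv
import io
import math

# Six rows copied verbatim from the data above (not the full 98).
SAMPLE = '''"Eckstein, David","00",1.17,1.11
"Estrada, Johnny","00",.21,.47
"Hillenbrand, Shea","00",.28,.38
"Dunn, Adam","01",1.4,1.94
"Johnson, Nick","01",1.83,1.37
"Izturis, Cesar","01",.37,.29'''


def load_pairs(text):
    """Return (minor_mle, next_year_major) pairs from the posted CSV rows."""
    rows = csv.reader(io.StringIO(text))
    return [(float(mle), float(mlb)) for _name, _year, mle, mlb in rows]


def pearson_r(pairs):
    """Plain Pearson correlation between the two columns."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


pairs = load_pairs(SAMPLE)
print(len(pairs), "pairs, r =", round(pearson_r(pairs), 3))
```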
And yes, there are potentially enormous and unanswered (unexplored) questions about selection bias when it comes to MLE's in general. The big problem we have, which has already been mentioned in this thread, is that MLE coefficients may only apply to players whom the scouts deem worthy of being promoted. When we use them for all minor league players, we may be making big mistakes. Of course, the mistake may be just that the scout can narrow the confidence intervals of our MLE projections based on the minor league stats and the fault may not lie in the MLE translations themselves. Or, it may be that there are lots of AAAA players for whom MLE's do not apply and the scouts can identify these players such that they don't get promoted or get much playing time in the majors if they do.
And yes, I don't have any reason to care whether I "reach" anyone or not, which is why I don't much care if people like Cameron view me as rude or arrogant, and that somehow this is counterproductive. Again, I say, counterproductive to what? This is not a social revolution that anyone is trying to achieve, AFAIK. At least not for me. If I did or it was, I might have a different approach as I am smart enough to recognize that you catch more flies with honey than with vinegar...
- Was the last pitch a ball or strike?
- Given that it was a strike, did he make contact or not?
- Given that he made contact, were fielders involved?
- Given that fielders were involved, did he get a hit or not?
- Given that he got a hit, was it for extra bases or not?
- Given that it went for extra bases, how far did he run?
So, I've got no research backing this process. But, at least it passes the smell test. I've also got different matchings, depending on my mood that day.
And most NHL teams widely ignore the CSB rankings. Most NHL teams employ a scouting staff of their own, from anywhere between 10-20 scouts per team. The MLB does the same thing, but it is not publicized.
Baseball covers as big a territory - North America, plus Central and South America, Europe, Asia.
And Baseball America should feel threatened... one day, someone will create a network of baseball fans that will evaluate every single player from HS on up. I see imdb.com as a model. I prefer that to Roger Ebert. (With the appropriate controls for ballot-stuffing).
And how is that controlled and what good would it be? The same prejudices, probably less informed, will then shape the perception of players? As someone who has watched many HS games, I can spot a great player at that game who might be a terrible player in a better conference or district against better competition. Now I am in the same spot as the scout, with less background and ability to make those mental comparisons between players that do have ability and those that don't but look like they do. If your effort is to eliminate the scout entirely, this is one crazy way to do it.
Not that 13.7% sucks.
The numbers are even more stark for Eckstein, 14% as a mler and 7% as a MLBer.
C'mon, it's the Angels and Mickey Hatcher!
The weird one is Scutaro. He had 374 BB in about 3100 minor league AB. He had 13 BB in 111 AB with the Mets. Then he goes to an organization that loves guys who walk ... and he starts hacking at everything. I'm gonna blame Macha for now.
But in contrast to these guys you've also got Bellhorn: 6.3 AB per BB in the minors; 6.0 AB per BB in the majors. Or Hee Seop Choi (5.6 AB per BB) who walks a bit more in the majors than the minors. Or Bobby Crosby (9.4 in the majors, 9.6 in the minors).
In my quickie look in the earlier post, I went to ESPN to get all the players 25 and younger with at least 400 PA last year. I had a hard time finding players whose ML walk rates differ substantially from their minors. (OK, I just did the quickie OBP-BA and eyeballed things.) Even the two I chose (Blalock and Tracy) weren't really that different. Blalock's ML OBP-BA is 68 but it was 56 at AAA, 67 at A and rookie, but was 86 at AA. Tracy was 58 in the majors but 52, 45, and 56 at AAA, AA, A. In fact one might wonder how either of those qualify at all.
Anyway, regression to the mean is of course a perfectly good guess for guys like Youkilis, etc. Their high minor walk rates weren't reflective of their true talent, they just had a little extra luck. At the very least, we know that we'll find some examples like Youkilis and Eckstein given random variation.
My concern about medium average/high walk minor-leaguers is that their BA won't hold up. Choi hit 265 in AAA so it's no surprise that he hits 230 in the majors (at least for now). Even with good walks and power, that makes him Rob Deer, not Jim Thome (319 at AAA). I know there were some injury issues with Choi at times.
But Bellhorn is pretty similar -- 280 hitter at AAA, 240 in the majors. His great walk rate still only pulls his OBP up to 350. With his power, that's still good for a 2B. But a true 240 hitter could easily hit 200 in any given season and now you're sneaking up on Tony Batista territory.
Those are I believe fairly typical declines in BA when moving up from the minors, so I don't know that we need special MLEs for these kinds of guys. It's more that the "casual stathead" needs to be careful in seeing a AAA OBP of 409 and OPS of 928 (Bellhorn's) and not automatically think this guy's gonna be an on-base machine. He's gonna lose 30-40 points off his average and so 40-60 points off his SLG ... and now he's a guy who damn well better have a good walk rate.
The next poster boy for one side or the other of this debate is going to be Nick Swisher. His AAA numbers look very similar to Bellhorn's (in fewer PAs and generally younger age). So he has more potential than Bellhorn, but he's likely to hit 230-250 for at least his first couple years in the majors. I think ZIPS has him at something like 235/340/440 -- which is not awesome for a RF. But it won't mean guys who walk have a hard time adjusting to the majors, it'll mean that guys who hit 280 in AAA hit 240 in the majors (which we already know).
This brings us to the big question of whether the skill of plate discipline can be taught at the major-league level. If it can, that suggests the high-average, low-walk minor-leaguer is a better bet than the medium-average, high-walk. If the first guy hits 320 at AAA, he'll still probably hit 270-280 in the majors and if you can teach him to walk, that guy could post a 380-400 OBP. (Obviously the high-average, high-walk guys are #1 on the list of call-ups).
Saying "he doesn't get it", then guessing at the evidence, is no way to go through life.
I didn't really like the fact that he hates reading that a high school hitter has a good eye, but, well, no one's perfect ...
He hates reading that as the FIRST thing. If he has no tools, but he has a good eye, that won't matter beyond HS unless he's planning to become an umpire. If you're going to pay a signing bonus to someone, you'd like it to be someone other than Lupus.
At least, that was my interpretation of it.
Isn't the height of arrogance an attitude that says "it is so because I saw it with my own eyes" as opposed to an attitude that says "it is so because I have compiled quite a bit of evidence"?
Arrogance is more like this: "I worked damn hard to reach the conclusion I did. That conclusion has served me well. I won't waste my time to reevaluate my conclusion in light of what you're telling me."
And that attitude can be seen on both sides of the debate.
Seeing with one's own eyes is quite a bit of evidence and should not be summarily dismissed because it doesn't agree with the numbers. The numbers can, and often do, have their own biases; witness the discussions on minor/major correlations in this very thread!
Firm belief in statistical conclusions without accounting for biases is like using the car to drive into a tree. ("My analysis says there's no tree there.") And with each stat guy driving into a tree comes another reason for someone not to trust the next stat guy.
After all, a scout could see the tree with his own eyes.
I just want to say how happy I am that Tango's back.
I'll second that. What drew me to this thread in the first place was I'd seen on the sidebar that Tango had posted here.
He's a stubborn good ol' boy (aren't they all?), and this discussion hasn't changed a thing, and could have made it worse in his case.
I don't think he's stubborn (or a good ol' boy) as much as he's responding to the notion that scouts don't add significant value. Nobody in that forum is expressing that notion, though it is widely held elsewhere. IOW, I think Hughes is attacking a strawman, nothing more.
But I agree, he doesn't appear to have won anyone over with his comments. At least, not on this site... but I don't think that was one of his goals. Pity.
Tango has written many smart things, but this may be the smartest. That sounds like an idea with terrific potential.
But Bellhorn is pretty similar -- 280 hitter at AAA, 240 in the majors. His great walk rate still only pulls his OBP up to 350. With his power, that's still good for a 2B. But a true 240 hitter could easily hit 200 in any given season and now you're sneaking up on Tony Batista territory.
Those are I believe fairly typical declines in BA when moving up from the minors, so I don't know that we need special MLEs for these kinds of guys. It's more that the "casual stathead" needs to be careful in seeing a AAA OBP of 409 and OPS of 928 (Bellhorn's) and not automatically think this guy's gonna be an on-base machine. He's gonna lose 30-40 points off his average and so 40-60 points off his SLG ... and now he's a guy who damn well better have a good walk rate.
The next poster boy for one side or the other of this debate is going to be Nick Swisher. His AAA numbers look very similar to Bellhorn's (in fewer PAs and generally younger age). So he has more potential than Bellhorn, but he's likely to hit 230-250 for at least his first couple years in the majors. I think ZIPS has him at something like 235/340/440 -- which is not awesome for a RF. But it won't mean guys who walk have a hard time adjusting to the majors, it'll mean that guys who hit 280 in AAA hit 240 in the majors (which we already know).
This brings us to the big question of whether the skill of plate discipline can be taught at the major-league level. If it can, that suggests the high-average, low-walk minor-leaguer is a better bet than the medium-average, high-walk. If the first guy hits 320 at AAA, he'll still probably hit 270-280 in the majors and if you can teach him to walk, that guy could post a 380-400 OBP. (Obviously the high-average, high-walk guys are #1 on the list of call-ups).
Ok, this is where my admittedly qualitative research comes in. The issue, to me, is not that guys with extreme walk rates and less than stellar BA tend to have huge dropoffs, but rather that the guys who have these extreme BB rates and in addition K a lot have these dropoffs, on the whole. What I have noticed is that in general BA from minors to majors seems to correlate somewhat accurately, within .015 in either direction, and the cases where it does not, where there are huge drop offs for instance, tend to be with guys whose K numbers are out of whack. Of course, I haven't done the footwork that many of you on this thread have so I accept that it is doubtful, but it's something that I've noticed and would love to have verified or disproven.
Also, as far as Swisher, something to consider with him is that he played the entire season with some severe hand injury that they just found out about and corrected after the season.
There are a number of ways and orders to do it, but the simplest and perfectly reasonable way is to do the translations (context adjustments) first, and then worry about the expected regression in comparing it to a subsequent year's major league stats or projected major league stats.
For example, in 2003, Youkilis' raw minor league BB rate converted (adjusted) into a normalized MLE was 1.87. Since that is so high and is based on only 549 PA's, we expect to see quite a bit of regression the next year whether he plays in the majors or minors. In fact, we can estimate the amount of regression from the correlation numbers I posted above. We are looking at around a 25% regression towards some population mean (league average for all major league players, rookies?). If we use 1.00 for our regression point, we get a projected normalized BB rate of 1.65 (the 1.87 got regressed 25% towards 1.00). If we look at his 02 MLE, we see that he was 1.44 in 106 PA's, so our projection goes down a little to around 1.60. He did in fact have a 1.69 BB rate in 250 PA's in the majors in 2004!
Let's look at Eckstein:
In 2000, his MLE BB rate was 1.17 in 489 PA's. Regressing that, we get a projection, or an expected BB rate in the majors of 1.13. Remarkably, his BB rate in the majors was almost exactly 1.13 until 04, when it was .94, making his major league career BB rate around 1.09. If we look at Eckstein's 99 MLE's, we see a 1.44 rate. Combine (and weight) that with his 01 MLE's, and we get around 1.29. Regress that around 15% and we get a projection of 1.24, still not too far off from his major league career so far of 1.09.
I agree with Tango that you have to be crazy to say (and mean literally) that minor league stats are exactly as good as major league stats in predicting major league stats. While that may be almost (but not quite) true if you had perfect park and league adjustments, it can't possibly be anywhere close to being true, because it is so difficult to do park and league adjustments on the minor league data. That alone makes MLE's quite a bit unreliable. Even if we were somehow able to do perfect adjustments with minor league data, as Tango stated, what would you rather have to project performance, past performance against almost the exact same pool of opponents, or past performance against a completely different, but related, pool of opponents? It is not even close! The fact that minor to major correlations, even with the biased sampling and other problems, are so good, shocks me sometimes...
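The regression step in the Youkilis and Eckstein examples is simple enough to sketch. This assumes the same "around 25%" regression applies to Eckstein's 2000 MLE (the post doesn't state the fraction for that case) and uses 1.00 as the regression point; the helper name is made up.

```python
def regress(observed, fraction, mean=1.00):
    """Pull `observed` the given fraction of the way toward `mean`."""
    return observed - fraction * (observed - mean)


# Youkilis 2003 MLE of 1.87, regressed about 25% toward league average.
youkilis = regress(1.87, 0.25)

# Eckstein 2000 MLE of 1.17, assuming the same roughly 25% regression.
eckstein_one_year = regress(1.17, 0.25)

# Eckstein's combined rate of about 1.29, regressed about 15%.
eckstein_combined = regress(1.29, 0.15)

print(round(youkilis, 3), round(eckstein_one_year, 3), round(eckstein_combined, 3))
```

These land close to the 1.65, 1.13 and 1.24 quoted above; small differences are just rounding and the "around" in the regression percentages.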
Having fans, predominantly fans interested in baseball research, evaluate major-league players is a lot different from having fans evaluate HS players. With MLB (or films, with IMDB), where you have multitudes who have seen the same performance over time, it's easier to weed out any person or group with an agenda. At the HS level, this would be damn near impossible to do. While with IMDB there isn't much of a motive for someone to tout In The Days Of Trafalgar, there are motives for someone to tout falsely a HS prospect.
You'd think the Sox winning the Series would have me generally less pessimistic. Not so.
What would really be cool is to track guys that scouts say don't deserve to be called up, but actually end up being called up. And see how they did compared to "highly recommended" or "grudgingly recommended" players. That is, track the sentiment of the scout, and see if that adds anything to the regression. Does the guy with the "5-star recommendation with cherry on top" do better than a player with similar projection, but without the recommendation? Right there, you'll get a good chance to figure out the extent of the selection bias.
If your effort is to eliminate the scout entirely, this is one crazy way to do it.
No, no reason to eliminate scouts. Everyone has their point of view. I know not to trust imdb.com ratings, if it's a Star Trek movie, because maybe there are more techies that go to imdb.com than not.
There are plenty of biases to control in such a network, of course, especially with millions of dollars at stake.
But, if you have your scout as your baseline factor, you can figure out how honest someone is by comparing his rankings to the baseline. Kinda like an "ebay feedback" score. People go out of their way to make sure their feedback is 100%.
We simply did not accept opinion as fact, even if it was "expert" opinion. Yes, expert opinion is more likely to be true than, say, my opinion but it is still pattern-based and quite possibly wrong.
But, when you're young and challenge some thirty year employee to explain himself (and it was always a him) in his field of expertise, you're going to get called arrogant. We asked for facts and logical inferences and we got "because I know what I know" as an answer.
Yes, it would be nice to have a Richard Feynman type to carry the banner for the stat-heads. But I disagree strongly with Cameron's assertion that Neyer, et al are some sort of PR disaster. If people get offended that you ask them to actually bring some data and logic to the discussion, then they can take their candy-ass whining down the road. If I'm running a multi-million dollar business I think I can demand a higher level of thinking.
MGL, I have a question for you that is pretty much uncorrelated to this discussion, but hopefully you can answer it anyway. Super Linear Weights differentiates baserunning from hitting, assigning separate scores for each skill to a player. Is baserunning merely a function of SB and CS, or does it include more data? If more, triples data only, or do you take the same zone data you use for UZR, calculate an "average bases" for a particular type of hit, and sum the difference between actual bases taken and average bases taken for part of the baserunning LW? Does it even make sense to do that? Thanks in advance.
If someone came along and told me they had data that proved Africa does not exist, I would be inclined not to believe it.
I've never been to Africa. I've never seen it. It's not my field of expertise at all. I have no tangible evidence whatsoever that it does exist, and would be hard-pressed to say how I'd ever reached that conclusion. If he'd asked me for facts to support my claim that it does exist, I'd tell him, "I know what I know."
And, apparently, he'd call me arrogant. Whatever.
I always remind...
These measures are sample measures and we don't know how much luck/skill is inherent in them, until we do some more work.
So we want to be careful in defining, for example, baserunning lwts, as a "player's baserunning skill." It is a sample of his skill, but there may be little or no range of skill in the first place (like DIPS), which means that most variation we see in those samples is luck, and/or the sample size may be so small, that regardless of the range of skill in the population, most of the variation is luck.
Also remember that from a practical perspective "no skill" can mean truly that there is no skill in the population of human beings, or it can mean that there is some (maybe even a lot) of skill in the population of human beings, but that the range of skill among major league baseball players is small or none at all (again, like DIPS)...
Yes, it would be difficult to isolate this (how often a player's double or single is because of his speed as compared to an average player) from the data, although I suppose it would be possible based on the location of the hit and the fielder's arm. I don't do that, but someday I might. As you said, it is important (well, at least useful) because speed has a very distinct aging curve (basically looks like a water slide without the stairs going up)...
Don't you know people who have gone to Africa? Haven't you seen video of things taking place in Africa? Read history books? Looked at maps drawn by people who have seen Africa?
The only alternative to there being an Africa would be a vast conspiracy of people past and present all singularly determined to convince YOU that there's an extra continent.
On the other hand, a scout might simply be going on a feeling and something someone told him once.
All of this being old evidence, compiled in the days of my youth by someone else and purported to be African in nature. But does any of that pertain to the present day? I think so, but it's certainly possible that this person's new information is correct.
Nonetheless, I would not be inclined to believe it. Perhaps when it is confirmed by people I know and/or trust, then I would believe. But until then I have neither the time nor the ability to investigate it on my own, and have no reason to believe what that person is telling me.
And I'm sure he'd call me arrogant.
OTOH, some people who have spent some time being the "expert" really don't like anyone challenging their opinions. Human nature - but I think it's what we're seeing from the scouts in this discussion.
Or that it did exist once, but no longer does.
Or that the people who had visited Africa had actually been somewhere else, much as Columbus wasn't in India in 1492.
And so on...
Who wants flies?
Who wants flies?
You catch even more flies with crap.
Walt's post also points out that batting average is starting to get a short shrift from conventional sabermetric analysis. Most of a player's offensive value is going to come from how often a player gets a base hit. Over the last couple of years the two things I've wanted to know above all else when dealing with prospects have been:
- What will his likely defensive value be?
- For what kind of a BA will he hit?
All other tools/skills are pretty much secondary. A guy who can hit .400 and play a good shortstop is worth having even if he can do little else. A first baseman who's going to hit .220 and field like Mo Vaughn after an all-night Slushies binge had best be bringing huge walks and monster power to the table. Mark Bellhorn with half the BB/PA would struggle to stick as a utility infielder and would always be one .210 away from being out of the game entirely. This is another reason why a minor-league batter's K/PA can't be dismissed. If someone's going to fan 160 times in 550 at bats, then even a .350 BABIP (including HRs) is going to leave him with a sub-.250 BA and with that average even a good walk rate (say 75) is going to leave him with an OBA under .340. Walt, I believe, has pointed this out a few times regarding Dunn: His BBs and HRs will make him a very useful player, but he'll never be a major star hitting .265 (and at 200 whiffs a year, .265 is all you can hope for).
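The strikeout arithmetic above works out as follows. This is a back-of-the-envelope sketch that treats the .350 as the hit rate on every at bat that doesn't end in a strikeout (HRs included, as stated) and ignores HBP and SF in the OBA; the function name is just for illustration.

```python
def ba_and_oba(at_bats, strikeouts, hit_rate_on_contact, walks):
    """BA and a simplified OBA (no HBP or SF) for a hitter who gets a hit
    on `hit_rate_on_contact` of his non-strikeout at bats."""
    hits = hit_rate_on_contact * (at_bats - strikeouts)
    ba = hits / at_bats
    oba = (hits + walks) / (at_bats + walks)
    return ba, oba


# 160 K in 550 AB, .350 on everything else, 75 walks.
ba, oba = ba_and_oba(550, 160, 0.350, 75)
print(round(ba, 3), round(oba, 3))
```

Both figures come in just under the .250 and .340 marks mentioned in the post.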
Happy Base Ball
Not me. And I'll take beer over tacos (or pretzels) every time.
Don't you think you would have heard something about that?
Or that the people who had visited Africa had actually been somewhere else, much as Columbus wasn't in India in 1492.
Are you suggesting another dimension that only the smug 30 yr. old knows about or don't you believe in satellite imaging?
Idiom, don't you think there might reasonably be some things you're more sure of than others? Wouldn't the fact that Africa exists fit in the "more sure" category while "speed is just as important as the other 4 tools" might not?
Posted by Bill Murray on January 07, 2005 at 07:42 PM (#1064557)
Who.
Wants.
Flies?
If you were doing an MLE projection would you use the major league average for the pop. mean? Or a minor league population?
If you take all minor leaguers who also played in the majors, you regress their performance towards the minor league mean. After you regress, then you can do a "competition level conversion".
So, in MGL's case, the walk rates of those players was a mean of .89 ( I think). This means that the average player called up had a lower than average BB rate... sure sounds strange. Anyway, you regress them up towards the league mean of 1.00. Say that gives you .94. Then, you apply a conversion rate of say 90%, and that gives you a major league walk rate of .85.
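A quick sketch of that regress-then-convert order. The post only says "say that gives you .94", so the 45% regression fraction below is a made-up value chosen to land near .94; the 90% conversion rate is the one floated above. The function name is hypothetical.

```python
def project_major_rate(minor_rate, regress_fraction, minor_mean, conversion):
    """Regress toward the minor league mean first, then apply the
    competition-level conversion."""
    regressed = minor_rate + regress_fraction * (minor_mean - minor_rate)
    return regressed * conversion


# .89 observed, regressed toward 1.00 (hypothetical 45%), then converted at 90%.
projection = project_major_rate(0.89, 0.45, 1.00, 0.90)
print(round(projection, 2))
```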
I'm not too sure about this, but I think I'm right.
I'm still perplexed here. MGL, can that possibly be right? The guys who get called up were below minor league average in walks? What does this mean, that the minor leagues are filled with guys who get lots of walks and either can't hit otherwise, or aren't thought of as being able to hit in the majors?
Weird...
I think it says what *I* (and Grabiner) said.
Take those players: they'll all hit (for the purposes of this exercise) .265 (about league average).
For math purposes, we'll say the league average walk rate (which I define as OBP-BA) is 0.070. So the league is 0.265/0.335.
If you are discussing how a player is going to perform, 37 of these guys performed 0.01 OBP points worse, 37 performed between -0.01 and +0.01, and 24 performed 0.01 better.
0.01 represents a 1/7th change (just a round number - we can select another if you would like). I picked that number before I noted any particular split in players - it's an unbiased, if incorrect, number. It's the difference between a .335 OBP, a .325 OBP and a .345 OBP - where I personally consider the difference to matter when I am "guessing" performance from a leadoff hitter or whatever. That's also 14.2%.
So I have 100 players. 37% will have a OBP 14.2% lower than their MLEs suggest, 37% will be right at (within 10 points) and 24% will outperform their MLE by at least 14%.
That says to me that the minor league walk rate differed by a significant margin and in a direction that could not be determined. I'm not sure how that will play out with MLB players - perhaps they see a similar fluctuation, but as I recall there was a significantly larger percentage of MLB players that saw less fluctuation.
Now, I'm using MGL's MLEs, rather than my own, or Dan's or even Clay's.
Players whose RelWR fell below 80%: 32
Players whose RelWR stayed within 80-120%: 45
Players whose RelWR went up by more than 120%: 21
So from MGL's list there's roughly 50-50 that a player's WR will stay the same and 30% it will go down and 20% it will go up.
Doesn't that seem consistent with the paragraph that MGL quoted in Grabiner's article?
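Here is a sketch of how those three buckets could be tallied. It assumes RelWR means the major league rate divided by the prior-year MLE rate, which the post doesn't spell out, and it treats exactly 80% and 120% as "same". The three sample pairs are taken from MGL's list (Huff, Eckstein, Mench); the counts of 32/45/21 are the ones reported above.

```python
from collections import Counter


def bucket(mle_rate, major_rate):
    """Classify a player by major rate relative to his MLE rate."""
    rel = major_rate / mle_rate
    if rel < 0.80:
        return "down"
    if rel > 1.20:
        return "up"
    return "same"


# (MLE, next-year major) pairs from the posted data.
sample = [(1.04, 0.54), (1.17, 1.11), (0.63, 1.08)]
print(Counter(bucket(mle, mlb) for mle, mlb in sample))

# Shares implied by the reported counts out of 98 players.
reported = {"down": 32, "same": 45, "up": 21}
total = sum(reported.values())
for name, count in reported.items():
    print(name, round(100 * count / total, 1), "%")
```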
Of course, that's what MGL already did.
For math purposes, we'll say the league average walk rate (which I define as OBP-BA) is 0.070. So the league is 0.265/0.335.
Perhaps MGL and others who've done the footwork on this used this method also, but I feel it's inaccurate in judging how well a player's ability to take a walk or manage the strike zone, as I see it, translates from the minors to the majors. Using your definition of BB rate ignores the distortions that could come into play due to SF, SH, HBP, AND IBB.
Voros: Listen Eddie, there's these two guys, one walks a ton, and the other never walks. I think the first guy has a great eye...
I see, I had taken "analyst" to be "boss", but that's not the case. Oh, and please come back.
I made the mistake of checking USS Mariner, and I don't get why Cameron includes MGL in his list. I haven't read every MGL post in existence, but he seems a lot more aggressive with bad stats than with good (or bad) scouts. That's a very good thing IMHO.
I don't think those have to be eliminated. I think they are small enough to be ignored.
It's nice, but not always correct to remove those things. HBP is skill-like. I liken it to taking a ball - "Can this pitch hit me and cause little damage? Okay.." Plunk. That's the same as a walk to me.
J.Cross -
didn't Walt say that correlations aren't the right analysis?
I like to separate HBP from BB because pitchers have a lot better control at higher levels. YMMV.
161: "At least, that was my interpretation of it."
Mine too - I was fine with that.
I don't think those have to be eliminated. I think they are small enough to be ignored.
It's nice, but not always correct to remove those things. HBP is skill-like. I liken it to taking a ball - "Can this pitch hit me and cause little damage? Okay.." Plunk. That's the same as a walk to me.
I can see where you're coming from in terms of not accounting for those things, but I feel that it begins to add up at some point, though defining HBP as a skill helps this "problem".
I don't really like to say skill as an attitude.
I think every player could get HBP a dozen or more times a season. They simply decide it isn't worth the pain.
I don't have an issue with removing HBPs from pitching analysis, but I prefer to leave them in hitting analysis.
Take IBBs - Bonds' IBBs have done two things - he's eliminated (for him) the "contextual" bias of IBBs (that there must be RISP in late close games or the 8 hole). Bonds gets IBBs with runners on first. IBBs for Bonds have simply become a skill he has.
The really high IBB guys have created the IBB on their record by being monster hitters. Vlad is getting that way. As such, as they become less context dependent, they are more talent driven, so *always* removing IBBs is an incorrect treatment for a handful of players. It's like ignoring HBPs altogether - it doesn't matter for most players because they only have 2-7, but for a handful, they are an enormous weapon.
One idea I've played around with recently is the idea of normalized similarity scores. In each category of a player's stat line (BB, 1B, 2B, etc.) list the percentage of the player's total offensive value that is derived from drawing walks, hitting singles, etc.
So for example, if the runs a player creates are evenly distributed between BB,1B,2B,3B, and HR, the line would be .20 .20 .20 .20 .20. Whereas a player with no power might be .35 .35 .10 .10 .10
Then the similarity score is simply the distance between the two points in a 5-D coordinate system: (2*(.20-.35)^2+3*(.20-.10)^2)^.5. The idea being that the type of batter you are -- where you draw your value from -- is more relevant than how good you happened to be in a particular year.
Using normalized similarity scores, you could look for types of players who will adjust well to the major leagues by looking at how similar major league batters are to their minor league selves, with the idea that players whose style doesn't change much going to the majors are those who adjust easily, whereas those players whose style changes dramatically are doing so because their minor league style doesn't work well in the major leagues.
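The distance in that example can be sketched in a few lines. The two profiles are the ones given above (shares of value from BB, 1B, 2B, 3B, HR); the function name is made up, and how the shares themselves get computed from a stat line is left open, as it is in the post.

```python
import math


def style_distance(profile_a, profile_b):
    """Euclidean distance between two value-share profiles
    (each a sequence of shares for BB, 1B, 2B, 3B, HR)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of categories")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


balanced = (0.20, 0.20, 0.20, 0.20, 0.20)
no_power = (0.35, 0.35, 0.10, 0.10, 0.10)

print(round(style_distance(balanced, no_power), 3))
```

A score of 0 means two identical styles; comparing a player's minor league profile to his own major league profile would give the "how much did his style change" number described above.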
That's a totally different way of looking at it than correlation. Looks like the confidence interval is too wide for your taste.
Take one guy with a .380 OBP and another with a .310 projected. Player A is 50% likely to be within .370 and .390, while player B will be between .300 and .320.
It may not predict exactly what the player will do, but the better OBP players in the minors will still be the better ones in the majors.
sounds right. Actually 20% is wider than the 0.01 mark. It's closer to a .365-.395 range. That doesn't seem very close.
Take those guys: the 380 guy is 50% likely to be between those, but he has a 20% chance of posting a 420 and a 30% chance of posting a 330.