Answers Could Come to Elusive Questions
Baseball America’s Alan Schwarz looks at the future of defensive evaluation.
This article appeared in the most current issue of Baseball America, and is reprinted here with permission of Alan Schwarz as a favor to Baseball Primer readers.
NEW YORK: You are the general manager of a team whose home park has a massive outfield. You decide that defense is paramount, and are resolved to acquire the best center fielder in baseball. Who do you go after?
Scout favorite Andruw Jones? Torii Hunter, baseball's Dominique Wilkins? Jim Edmonds, who dives more often than Martha Stewart's Q rating? How about Vernon Wells, whom some scouts rate over Hunter? While evaluating these guys as hitters is a snap (dozens of statistical methods have the exactitude of heat-seeking missiles), fielding statistics are still as clunky as the Flintstone Flier. Errors don't measure range. Range factor does that, but can't reveal how many balls entered their zone. Zone rating does that, but can't recognize where the fielder started. Around and around we go.
What we want to know, of course, is who turns the most catchable balls into outs. It's as basic as it is elusive. But the answer is closer than you think.
This spring, Major League Baseball's Internet portal, mlb.com, will install in select parks a three-camera setup to measure pitch speeds, locations and breaks, and to automate the collection of pitch data that until now has generally been eyeballed. This is only the first step, though, in mlb.com's three-year plan to have up to six cameras in every major league stadium capturing everything from line-drive trajectories to outfielder running speeds.
We'll finally be able to know whether Derek Jeter, who is aesthetically wonderful, actually has the range that statistics say he doesn't. We'll measure Vladimir Guerrero's throwing speed and accuracy from right field. And we'll get a lot closer to identifying the best center fielder in the game.
Century-Long Quest
Believe it or not, this technique was first explored almost 100 years ago. Hugh Fullerton, a nationally known baseball writer for the Chicago Examiner, wrote a long magazine piece called "The Science of Baseball," in which he personally measured not just how many balls infielders reached but also, thanks to his 20th-of-a-second stopwatch, how hard those grounders were hit. (One exasperated reader replied that baseball was no place for "a tape-measure, a T-square and an intimate knowledge of algebra and fractions.")
Fielding statistics stayed relatively stagnant until the early 1980s, when Bill James popularized range factor, which measured how many outs a fielder made per game. (This helped illustrate that even though Bobby Grich made a few extra errors a year, he also reached 50 more balls, making him extraordinarily valuable.) Ten years later, zone rating, invented by STATS Inc. president John Dewan, recorded (by sight) the percentage of balls hit into a fielder's area that were turned into outs.
Today, Dewan and his new company, Baseball Info Solutions, have refined that method into a plus-minus system, showing which fielders made the most outs compared to the league average at their position. By that measure, Jones led center fielders with a plus-19, while Edmonds had plus-10. This information is sold only to teams (the Red Sox were the first client); fans cannot access it.
Mlb.com's plan is to make all its data available on the website, probably as part of a subscription service.
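As a rough illustration of the two stats described above, the sketch below computes range factor as (putouts + assists) per game and zone rating as the share of balls in a fielder's zone turned into outs. The `FieldingLine` record and its field names are hypothetical, and the zone counts are assumed to come from a scorer's charting; this is not STATS Inc.'s actual procedure.

```python
from dataclasses import dataclass


@dataclass
class FieldingLine:
    """Hypothetical season fielding record for one player at one position."""
    putouts: int
    assists: int
    games: int
    balls_in_zone: int  # balls a scorer charted into this fielder's zone
    zone_outs: int      # outs the fielder made on those balls


def range_factor(line: FieldingLine) -> float:
    """Range factor: outs the fielder took part in (putouts + assists) per game."""
    return (line.putouts + line.assists) / line.games


def zone_rating(line: FieldingLine) -> float:
    """Zone rating: fraction of balls hit into the fielder's zone converted to outs."""
    return line.zone_outs / line.balls_in_zone


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    ss = FieldingLine(putouts=250, assists=450, games=150, balls_in_zone=500, zone_outs=420)
    print(f"RF = {range_factor(ss):.2f}, ZR = {zone_rating(ss):.3f}")
```

Range factor needs only box-score totals, which is why it came first; zone rating needs someone to chart where every ball went, which is exactly the "how many balls entered the zone" information range factor lacks.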
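Baseball Info Solutions' plus-minus method is proprietary, so the following is only a simplified sketch of the idea as described: estimate the league out rate for each zone at a position, then credit a fielder with each out he makes minus the out rate an average fielder would have had on the same ball. The zone labels and the `(zone, was_out)` record format are assumptions for illustration.

```python
from collections import defaultdict


def league_out_rates(plays):
    """plays: iterable of (zone, was_out) for every ball fielded at one position, league-wide.
    Returns zone -> fraction of balls in that zone turned into outs."""
    chances = defaultdict(int)
    outs = defaultdict(int)
    for zone, was_out in plays:
        chances[zone] += 1
        outs[zone] += int(was_out)
    return {zone: outs[zone] / chances[zone] for zone in chances}


def plus_minus(player_plays, rates):
    """Outs made minus outs an average fielder would be expected to make on the same balls.
    Every zone in player_plays must appear in rates (i.e., in the league sample)."""
    return sum(int(was_out) - rates[zone] for zone, was_out in player_plays)


if __name__ == "__main__":
    # Made-up league sample: zone "A" is easy (75% outs), zone "B" is hard (25% outs).
    league = [("A", True)] * 3 + [("A", False)] + [("B", True)] + [("B", False)] * 3
    rates = league_out_rates(league)
    player = [("A", True), ("A", True), ("B", True), ("B", False)]
    print(f"plus-minus: {plus_minus(player, rates):+.2f}")
```

The key design point is that a hard play made counts for more than an easy one: converting a zone-B ball earns +.75, while converting a zone-A ball earns only +.25.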
Brave New World
That system, somewhat QuesTec-like (please, Curt, don't hurt us), will focus three cameras on the tunnel between pitcher and batter, allowing them to measure the speed, location and trajectory of pitches in three dimensions. (We'll be able to see whose fastballs really do have late movement, and perhaps whose hits come off the bat hardest.) Each system costs about $40,000. MLB has signed off on the expenditure, and mlb.com is in talks with Seattle stat company Tendu to work together on the real-time processing of the data.
If that linkup works (and it should, after the inevitable early hiccups), the next step is to add a few cameras to capture the whole field. Everything, from the ball to the runners to the fielders, will be followed as if by little global positioning systems. It will afford fans a whole new picture of the game: Who is fastest going from first to third? Which helps a right fielder more, 4 extra mph on the throw or 18 inches of accuracy? Does Juan Pierre's speed make up for his vertiginous routes to fly balls?
Most important, though, we will finally capture the slippery concept of range. The cameras will measure where the fielder is stationed before the ball is hit (a skill in itself) and how quickly he gets to any ball hit near him. Fan arguments, and team decisions, might never be the same.
Mlb.com CEO Bob Bowman vows to take his grand plan slowly. "We've waited over 100 years, we can wait another one or two," he says. "We're going to walk, then run, then sprint."
Soon, we?ll know just how fast.
Alan Schwarz is the Senior Writer of Baseball America. His first book, “The Numbers Game: Baseball’s Lifelong Fascination With Statistics,” will be published by St. Martin’s Press in July.
Baseball America is one of the leading authorities on baseball today and contains coverage of the minors and college baseball that is unparalleled in depth. A yearly subscription to Baseball America costs only $69.95 for 26 big issues and includes complete access to BA's voluminous web archive. To subscribe, go to the Baseball America Online store, call 1-800-845-2726 Monday through Friday, 9:00 AM-5:00 PM EST, or send payment to Baseball America, P.O. Box 2089, Durham, NC 27702.
For more information, check out BA’s website and browse all the free content available such as top-to-bottom minor league stats and the 2003 MLB draft database.
Alan Schwarz
Posted: January 26, 2004 at 06:00 AM
Reader Comments and Retorts
Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.
1. tangotiger Posted: January 27, 2004 at 04:06 AM (#614502)
Why would a team buy something that MGL gives out for free, with apparently similar results? MGL is the Linus Torvalds of sabermetrics.
Charlie is right: the wheres and whys of defense are also important, which is why I don't care for reducing the granularity of the data.
David Pinto did a great job of doing just that for three players in his presentation.
I agree that you want to see intermediate results, but that doesn't necessarily mean you can learn much from them, either. At some level of granularity, we will have little confidence in the numbers, and you have to rely much more on scouting.
Think of us what you will, but there is no treachery or deception intended.
I apologize to you for offending your senses.
I didn't think Visitors Dugout implied all you read it to be: I felt it just meant someone not "on staff" - usually, that's someone without another forum.
We're making a change to clarify.
I appreciate the critique, and I'll work on the things that bother you. Email me if you have something specific you think I should do differently, or you just want to clarify your statements (rather than here).
A puffer fish? I thought I was more of a gourami. ><>*
Back to the article:
I subscribe to mlb.com and I think the total access to archived video is fantastic. Of course, the availability of Game 6 helps.
I like their stat pages too, although the hit chart information could be better with respect to letting you see "all parks" at once, and to offering a pitcher's hit spray chart.
I'm pretty stoked about the idea of tracking ball flight and "movement" by fielders.
Anyone know where the article he references can be found? What about his data?
Therefore, there was no choice for the editor other than having the article appear as it did. We apologize for any confusion this may have caused those incapable of discerning that the article had previously appeared elsewhere.
Please feel free to e-mail us if you feel misled, and we can discuss it further.
Of course, there are many idiots in the world.
I'm thoroughly confused as to why there should be any problem....
Thanks for taking the high road.
Tangotiger:
You raise a good point. Obviously, one answer is that MLB executives are unaware of MGL's work. Another answer, though, is that his estimates are built on a theoretical model and generated from statistical data. You can make the case that the estimates in the article are also based on a theoretical model, but they are generated from observational/recorded data. Therefore, they may have more face validity.
On the other hand, to the extent that the two approaches generate fairly similar outcomes, I would see the new data as a validation of MGL's model, which in turn could improve its market value.
That does keep some fun in it, but even within these evaluations, I think the "whys" will still allow for plenty of discussion. Cal Ripken was known for playing deep and being savvy enough to position himself well; these cameras may make it appear that he doesn't have range, or is just lucky that the ball gets hit near him. It will be difficult to quantify how smart some guys are with respect to positioning, and we'll end up making some indirect determinations of why they make plays, 'cause it sure doesn't look like they are trying.
What Dewan presently offers, above MGL, is really just the name "John Dewan." MGL's work *probably* isn't going to be significantly different from BIS's.
STATS published their own UZR data, and it is similar to what MGL does, if not identical; I think the only difference will be in the run value distribution. In fact, there is no reason to think that STATS UZR isn't an outgrowth of DA/DR from Project Scoresheet and Baseball Workshop (originally created out of the Bill James Abstracts). There is no reason to reinvent the wheel, and Bill James had to know about it. He didn't care for the original design of ZR counting (he didn't like that it approached 1).
If a company paid MGL instead of BIS, they would come out ahead, assuming MGL would be cheaper, but that's no guarantee. ;-)
I read the article last night, posted this morning. I was thinking that what Dewan was selling was drawn from data generated by the new cameras. I realize now that's not the case.
I guess it's not too surprising that the estimates are similar, given that they are working from similar databases.
Has anyone done a correlational study to compare various offensive metrics like eqa, RC, etc.? I suspect they'd be relatively highly correlated.
Let me then restate the point I was trying to make earlier.
Let's say we have three indices of player value: EqA, RC/27, and an aggregate rating by a scout on traditional dimensions like hitting, speed, etc. EqA and RC/27 correlate .95, and both correlate about .20 with the aggregate rating. Does this pattern tell us anything about the validity of either EqA or RC/27? I can make a case that it does not tell us much. It does tell us that different ways of combining data from the same starting point get us to a similar place, and that place is at odds with traditional baseball thinking. But would it make the established baseball community more accepting of either newer metric?
If, however, you began from a radically different starting point and created values that correlated moderately well with data-driven values, I think that would be more persuasive. I'm not sure what that would be for hitting, but I could envision, say, a breakdown of pitches thrown (changes in velocity, location, pct over the plate, etc.) perhaps correlating with something like DIPS.
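A correlational check like the one proposed here is straightforward once each metric is computed for the same set of players. The sketch below computes pairwise Pearson correlations by hand (to avoid depending on a particular library version); the metric names and numbers in the example are made up, not real EqA or RC/27 values.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_table(metrics: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Pairwise correlations between metrics measured on the same players, in the same order."""
    return {(a, b): pearson_r(metrics[a], metrics[b]) for a, b in combinations(metrics, 2)}


if __name__ == "__main__":
    # Hypothetical values for five players; not real data.
    metrics = {
        "EqA": [0.310, 0.275, 0.260, 0.295, 0.240],
        "RC/27": [7.1, 5.6, 4.9, 6.4, 4.1],
        "scout_grade": [55, 60, 50, 45, 55],
    }
    for (a, b), r in correlation_table(metrics).items():
        print(f"{a} vs {b}: r = {r:.2f}")
```

A high r between two metrics built from the same inputs says little about validity, which is the post's point; the more persuasive comparison is against something measured from an independent starting point.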
If I could, I'd insist that PBP scorers include "hang time" (the time from contact until the ball reaches the ground or a glove). There is a problem with "snags," of course.
However, this approach of using cameras to plot trajectories in 3-D (and, hopefully, to include fielder positioning at the time the ball is hit) interests me greatly.
I would love to spend 15 hours a day for the next 6 months analyzing all this data.
I'm already having nightmares of those 15 hour days when the data becomes available.
The thing about offensive metrics is that they are "clean." We know exactly what happens when a batter steps up to the plate, and we also know exactly how any particular outcome relates to run scoring. The only thing we don't know for sure is the relationship between a batter's talent and his sample outcomes. That can be improved, but not by a whole lot. For example, eventually, with the video data and sophisticated computer analysis, we will not care what the actual outcome of a plate appearance is, be it a walk, single, fly out, etc. We will use the more granular data, like the pitch locations, the batted ball speed and trajectory, fielder positioning, etc., to construct a "virtual" result, like my "virtual home runs." Then we will either use the virtual result (e.g., a line drive going 100 MPH 6 feet off the ground is .8 of a single) to generate the batter's virtual performance, or we will have a whole new set of linear weights. Rather than .47 for a single, .78 for a double, etc., the new lwts will be .4 for a 110 mph line drive 6 feet off the ground, -.2 for a 100-foot-high fly ball to sector XY on the field, etc.
For fielding metrics, things are not so clean (although people who don't like metrics like ZR and UZR do not give them NEARLY the credit they deserve for their "cleanliness"). The clean part of a PBP fielding metric, and what makes it not so different from a good offensive metric, is that when a ball is put in play, we can compute the equivalent of a BA for each player on the field. Obviously, only 1 or 2, or at most 3, players are even implicated on any batted ball, and most of the time it is only one. So every player on the field gets a batting average when a ball is put into play. That's not so different from a batter's BA, yet we seem to accept a batter's BA as being worth something while dismissing a fielder's BA (basically a simple ZR). Why is that?
Now if we introduce some other variables, like the location of each ball when it is caught or not caught, we can upgrade that very legitimate fielder BA (in fact, it was originally called DA by Sherri Nichols and some other people) to a sort of fielder SA (we can bypass OBP, since there are no "fielder walks"). Again, people love SA, especially when combined with BA or OBP, but some people somehow disdain its almost exact fielder counterpart, which is a simple version of UZR. So ZR = BA, and OPS = a simple UZR. There is not really much more to it than that. Some of the other adjustments that go into a more complex UZR actually take UZR somewhat beyond a batter's OPS! It is like adjusting a batter's OPS for whether he happened to get lots of line drive hits, or bloop hits, or cheap doubles down the line, or something like that. So in some sense, a complex UZR is better than a batter's OPS!
That's the "clean" part of a PBP defensive metric. What are the sticky parts, which make a defensive metric like ZR or UZR actually a little (but not nearly a lot) worse than, say, batter OPS, even though it should be better? Not much, really, when you think about it. One you might think of is fielder positioning. I say that is not a negative factor in ZR or UZR. I don't care where a fielder positions himself! That is part of his defensive talent! If he positions himself such that he gets more or less UZR, then he should get credit for it! Saying otherwise is like saying that a batter's stats are problematic because batters all have different stances in the batter's box, making it more or less difficult to reach certain pitches!
What about fielder interaction? Definitely a source of problems, mainly for outfielders. How much of a problem? IMO, not too much. Not too much at all!
What about accurately recording the location (and speed, if that is used) of the batted balls? Well, again, in some sense that is like saying that we are not confident in a batter's offensive stats because we don't know how hard and where each ball was hit; we only know whether it was "scored" as a single, double, out, etc. Well, how about the fact that when a ball is hit near a certain fielder, we know whether it was scored as a hit or an out! Isn't that almost the same thing? Do we NEED to know exactly where and how hard the ball was hit in order to get a very good idea of how good a fielder is at preventing hits and, ultimately, runs? Of course not! It would be nice, but it is hardly necessary. In fact, all of the research I have done on PBP defensive evaluations indicates that a basic, simple ZR or UZR tells you 90% of what you need to know about a fielder's ability to prevent hits on balls in play. It is no accident that UZR has almost the same y-t-y correlation as OPS for the same number of opportunities!
Let's face it, folks: there is this incredible bias against any PBP defensive metric, which is really unbecoming of the sabermetric crowd, who normally pride themselves on being able to separate fact and truth from bias and illusion. And I know where this bias comes from. One, it is the "crowd" effect. For years, that's what everyone heard: "It's so hard to measure defense, oh, it's so hard, it's so hard." Well, I've got news for you! It ain't so hard! The second, and perhaps more important, reason is that offensive metrics are much more "intuitive." We watch a game, and the camera focuses on the batter and pitcher. The batter hits the ball, and it is an out, or a single, or a double! We mentally chalk up the outcome to the batter and the pitcher, and of course during the game and at the end, the announcers and the papers tell us how each player did (so-and-so went 2 for 4), and how the pitcher did (he gave up 6 ER, a bad performance).
What if everything were turned upside down, and the cameras focused on a fielder? "The ball is hit, and oh, the SS makes the out!" Or, "The ball is hit and the SS doesn't make the out!" Or, "The ball is hit toward zone XY and the RF makes the catch! What a great catch!" Then at the end of the game we see the fielders' stats: "Wow, Jeter had 11 balls hit near him today and he caught 9 of them!" Or, "Soriano had 5 balls hit to zone 4M and he caught all 5!"
Now all of a sudden we would take it for granted that these defensive metrics are just fine! Seriously, it is all an illusion (that offensive metrics are fine and defensive ones are problematic).
The last thing is that it is so easy to validate offensive metrics by correlating them with runs scored, which no one will argue with. It is not so easy to do that with a fielding metric. A good fielding metric is supposed to be highly correlated with runs allowed, but that is difficult to demonstrate! First, pitchers are inextricably involved. Then, separating out each fielder's responsibility for preventing runs is difficult, and finally, batters are the prime influence on run scoring, so fielding essentially gets drowned out.
Sorry for the rant, but I am so sick of people complaining and whining about defensive metrics (trust me, I couldn't care less whether UZR was mine or anyone else's. Heck, I stole the idea from STATS UZR anyway. Anyone who knows me knows that I couldn't care less about who invented or computed what metric) that I could throw up! They are fine. They are somewhere between OBP and OPS, and that's no small thing. If I had a dollar for every so-called smart person on Primer alone who throws the baby out with the bath water when it comes to UZR or a similar defensive metric, I wouldn't need to sell anything to an MLB team! Get over your fear of PBP defensive metrics, guys! They are not perfect, but they are good! Very good! That's the bottom line!
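A minimal sketch of the "virtual result" idea, assuming some upstream model has already turned a batted ball's speed, height and direction into outcome probabilities. The .47 and .78 weights for singles and doubles are the ones quoted above; the out weight of -.27 is a placeholder assumption, not a figure from the post, and the function name is hypothetical.

```python
# Linear weights (runs per event). Single and double values are quoted in the
# post; the out value is a placeholder assumption for illustration only.
LWTS = {"single": 0.47, "double": 0.78, "out": -0.27}


def virtual_run_value(outcome_probs: dict[str, float], lwts: dict[str, float] = LWTS) -> float:
    """Credit a batted ball with the probability-weighted run value of its possible
    outcomes, regardless of what actually happened on the play."""
    total = sum(outcome_probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"outcome probabilities sum to {total}, not 1")
    return sum(p * lwts[outcome] for outcome, p in outcome_probs.items())


if __name__ == "__main__":
    # A liner that is ".8 of a single": 80% single, 20% out.
    liner = {"single": 0.8, "out": 0.2}
    print(f"virtual value: {virtual_run_value(liner):+.3f} runs")
```

This corresponds to the first option in the post (use the virtual result to build a virtual performance line); the second option, a new table of linear weights keyed directly by batted-ball type, would replace the probability step with a lookup.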
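Computing the fielder "batting average" described here takes nothing more than a tally of chances and outs per fielder. The sketch below assumes each ball in play has already been charged to the one fielder responsible for it (the `(fielder, made_out)` record format is hypothetical); handling balls that implicate two or three fielders is left out.

```python
from collections import Counter


def fielder_batting_averages(balls):
    """balls: iterable of (fielder, made_out) for each ball charged to a fielder.
    Returns fielder -> outs made / balls charged (a simple ZR, or DA)."""
    chances = Counter()
    outs = Counter()
    for fielder, made_out in balls:
        chances[fielder] += 1
        if made_out:
            outs[fielder] += 1
    return {fielder: outs[fielder] / chances[fielder] for fielder in chances}


if __name__ == "__main__":
    # One game's worth of made-up chances.
    game = [("Jeter", True)] * 9 + [("Jeter", False)] * 2 + [("Soriano", True)] * 5
    for name, avg in fielder_batting_averages(game).items():
        print(f"{name}: {avg:.3f}")
```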
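Going from that fielder "BA" to a simple UZR mostly means splitting chances by location and converting plays above or below average into runs. The sketch below is a bare-bones version under stated assumptions: league out rates per zone are given, and the runs gained per extra out in each zone (roughly the run value of the hit prevented minus that of the out) are hypothetical inputs. None of the adjustments a full UZR makes (park, pitcher, batted-ball speed, and so on) are included.

```python
def simple_uzr(
    player: dict[str, tuple[int, int]],
    league_out_rate: dict[str, float],
    runs_per_out: dict[str, float],
) -> float:
    """Runs saved relative to an average fielder at the same position.

    player:          zone -> (balls hit into the zone, outs the fielder made)
    league_out_rate: zone -> fraction of such balls the average fielder converts
    runs_per_out:    zone -> runs saved by turning a ball in that zone into an out
                     (hypothetical: roughly hit value minus out value for that zone)
    """
    runs = 0.0
    for zone, (chances, outs) in player.items():
        extra_outs = outs - chances * league_out_rate[zone]
        runs += extra_outs * runs_per_out[zone]
    return runs


if __name__ == "__main__":
    # Made-up zones and values, for illustration only.
    player = {"hole": (40, 30), "up_middle": (60, 50)}
    rates = {"hole": 0.70, "up_middle": 0.80}
    values = {"hole": 0.75, "up_middle": 0.70}
    print(f"{simple_uzr(player, rates, values):+.1f} runs")
```

Weighting by zone is what plays the role of SA in the analogy: a ball the average fielder rarely reaches, or one that becomes a double if missed, moves the total more than a routine chance.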
It is no accident that UZR has almost the same y-t-y correlation as OPS for the same number of opportunities!
Now, now. The regression equation for UZR is 420/(420+BIP). For LWTS, it's 209/(209+PA). Given 1500 opps, that's a regression of 12% for LWTS and 22% for UZR. It's still GREAT for UZR.
In terms of "equivalency", having 1500 BIP from a fielder tells you as much as having 750 PA from a hitter. That's why scouting provides some value. It might be that 750 BIP from a fielder + scouting = 750 PA from a hitter and no scouting.
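The arithmetic in this exchange can be checked with two small functions: the fraction to regress toward the mean is k/(k + n), and two samples are equally reliable when k_a/(k_a + n_a) = k_b/(k_b + n_b), which rearranges to n_b = n_a * k_b / k_a. The constants 209 (LWTS, per PA) and 420 (UZR, per BIP) are the ones given above; the function names are just for illustration.

```python
def regression_to_mean(k: float, n: float) -> float:
    """Fraction of an observed rate to regress toward the mean, given the
    metric's regression constant k and a sample of n opportunities."""
    return k / (k + n)


def equivalent_sample(n_a: float, k_a: float, k_b: float) -> float:
    """Opportunities under metric B that carry the same reliability as n_a under metric A.
    From k_a/(k_a + n_a) == k_b/(k_b + n_b)  =>  n_b = n_a * k_b / k_a."""
    return n_a * k_b / k_a


if __name__ == "__main__":
    print(f"LWTS, 1500 PA:  {regression_to_mean(209, 1500):.0%} regression")
    print(f"UZR, 1500 BIP:  {regression_to_mean(420, 1500):.0%} regression")
    print(f"750 PA ~ {equivalent_sample(750, 209, 420):.0f} BIP")
```

With these constants, 750 PA works out to about 1507 BIP, consistent with the 750 PA = 1500 BIP rule of thumb. Plugging in the 260 and 500 constants that come up later in the thread gives about 25.7% regression at 750 PA and an equivalent of about 1442 BIP, matching those figures as well.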
Year-to-year correlations for players with more than 300 PA (mean PA = 540), or around 400 opportunities for UZR:
- OPS: .594
- park-adjusted lwts: .675
- UZR: .451
- BA: .392
That corresponds to 260/(260+PA) for OPS and 500/(500+BIP) for UZR. I used the Superlwts files for the UZR and adj lwts and regular unadjusted batting lines for OPS and BA.
Let me re-run the r for UZR from my UZR files:
For SS, min of 70 games per year, 132 games average, or around 500 BIP:
r=.505
Let me check CF from the UZR files:
r=.533 for 127 defensive games, which is still around 500 BIP, I think.
For 2B, r=.404 for 126 defensive games, which is around 400 BIP, I think.
So I'm comfortable with an "X" of 400-600 for the UZR regression and 200-300 for OPS regression.
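One way to turn a year-to-year correlation into one of these regression constants, assuming the usual approximation that two samples of n opportunities each correlate at about r = n/(n + k), is to solve for k = n(1 - r)/r. The sketch below applies that to the correlations and approximate sample sizes quoted above; those sample sizes are the rough figures given in the posts, so the results are only ballpark.

```python
def implied_k(r: float, n: float) -> float:
    """Regression constant implied by a year-to-year correlation r between
    samples of about n opportunities, assuming r = n / (n + k)."""
    if not 0 < r < 1:
        raise ValueError("r must be strictly between 0 and 1")
    return n * (1 - r) / r


if __name__ == "__main__":
    # (label, year-to-year r, approximate opportunities per season), as quoted above
    reported = [
        ("OPS", 0.594, 540),
        ("park-adjusted lwts", 0.675, 540),
        ("UZR, all positions", 0.451, 400),
        ("UZR, SS", 0.505, 500),
        ("UZR, CF", 0.533, 500),
        ("UZR, 2B", 0.404, 400),
    ]
    for label, r, n in reported:
        print(f"{label:20s} r={r:.3f} n={n}: k = {implied_k(r, n):.0f}")
```

Run as written, this gives roughly 370 for OPS, 260 for park-adjusted lwts, and about 440 to 590 for the UZR figures. The UZR values sit inside the 400-600 range above; on the batting side, the constant depends on which of the two correlations is used.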
Hey, .500 and .591 for 500 BIP for UZR and 500 PA for OPS are pretty close. Close enough to say "almost"?
That corresponds to 260/(260+PA) for OPS and 500/(500+BIP
So, with 750 batting PAs, that's a regression toward the mean of 25.7%. To get the equivalent with UZR, you'd need 1442 BIP.
I said that 750 PA = 1500 BIP. So, close enough.
- for pitchers: fastball/curve/slider/change, speed, spin, trajectory, location
- for batters: trajectory, speed
- for fielders: time from point A to point B, speed/trajectory of throw
- with a context of: count, fielder positioning, park, pitcher, batter
that you essentially don't need to know the result of that play. (You'd only need to know it at a league level.)
This is some very, very exciting stuff. This would be the pinnacle of sabermetrics.
Personally, I think this would be more valuable for minor league players, but I can see that it will only be done in MLB.
- arm angle (and previous arm angles in this game and previous games),
- speed of previous pitches (and pitch speed in this game and previous games, and pitch speed patterns in this game and previous games, and perhaps batter's susceptibility to pitch speed changes),
- favorite pitching locations (and in-game and historical adjustments),
- favorite hitting locations for the batter (and in-game and historical adjustments for injury, stance, players on the bases, etc.)...
Arm angle I did not consider, but that should be added, and easily tracked. All the other ones would already be in the database.
You want to add leg injuries and whatnot, fine. Again, that's easily recordable data.
What is it that you think a scout can see that we can't quantify to a useful degree? The look in Schiraldi's and Rivera's eyes? The way David Cone loses his temper?
You record all the data, and you create a model that would match that data the closest.
However, for game theory, once the model is known, that comes into play as well. Once all the pitchers know that Mike Piazza NEVER swung at a 3-0 pitch from 2000-2003 (even though they threw over 40% of those pitches for balls), they'll start to throw more pitches in the strike zone. Piazza, now seeing that they know how he hits, starts to swing more. Pitchers, now realizing that Piazza knows that they know, will start to go for the corners. Piazza, seeing that the pitchers realize...
Ah, that's going to be the fun and interesting part. As more and more teams try to use this data to tailor their strategies versus particular players and teams, they had better know about game theory!
In addition to the example Tango gave....
What if team A finds out that Piazza can't handle the high inside fastball with 2 strikes. Well, if that's pretty much all they start throwing him with 2 strikes, and he knows it's almost definitely coming, he'd have to be an idiot to keep missing that pitch. I have to assume that when a major league batter really knows what pitch is coming (like 90% of the time), even if that is his toughest pitch to handle, surely he can't do too badly...
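The cat-and-mouse cycle described in these two posts is what a mixed-strategy equilibrium resolves: once each side randomizes at the right rate, neither can be exploited by the other. As a toy illustration, the sketch below solves a 2x2 zero-sum game between a batter (swing or take on 3-0) and a pitcher (groove a strike or throw a ball). The payoff numbers, in runs for the batter, are invented for the example; real values would have to come from data like that described in the article.

```python
from typing import NamedTuple


class MixedEquilibrium(NamedTuple):
    row_prob: float  # probability the row player (batter) picks row 1 (swing)
    col_prob: float  # probability the column player (pitcher) picks column 1 (strike)
    value: float     # expected payoff to the row player at equilibrium


def solve_2x2_zero_sum(a: float, b: float, c: float, d: float) -> MixedEquilibrium:
    """Mixed-strategy equilibrium of the zero-sum game [[a, b], [c, d]],
    where the row player maximizes and the column player minimizes the payoff."""
    maxmin = max(min(a, b), min(c, d))
    minmax = min(max(a, c), max(b, d))
    if maxmin == minmax:
        raise ValueError("pure-strategy saddle point; neither player needs to mix")
    denom = a - b - c + d
    row_prob = (d - c) / denom  # makes the pitcher indifferent between columns
    col_prob = (d - b) / denom  # makes the batter indifferent between rows
    value = (a * d - b * c) / denom
    return MixedEquilibrium(row_prob, col_prob, value)


if __name__ == "__main__":
    # Hypothetical run values to the batter on 3-0:
    #                 pitcher: strike   ball
    # batter swings:           0.30    -0.10
    # batter takes:            0.05     0.20
    eq = solve_2x2_zero_sum(0.30, -0.10, 0.05, 0.20)
    print(f"swing {eq.row_prob:.0%}, strike {eq.col_prob:.0%}, value {eq.value:+.3f}")
```

With these made-up numbers, the batter swings about 27% of the time and the pitcher grooves a strike about 55% of the time. If the batter instead never swings, as in the Piazza example, the pitcher's best response is to throw strikes every time, which is exactly the exploit the posts describe.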
Mathematicians of the world, unite! You have nothing to lose but that nasty chalk mark on the front of your shirt.
Why did you bring up Phelps? Is he also a particularly smart player? (His BB/K don't look it.)
First, I think you throw a huge strawman into the argument when you claim there are incredible barriers between UZR and general stathead acceptance. I haven't counted the number of times your metric has been mentioned in discussions of various fielders on many different blogs; however, it's a significant number. Google it sometime; you may be surprised.
Second, I think there are several reasons why it hasn't had even wider spread acceptance. In no particular order:
a. The raw data is unavailable. We all can get hits, doubles, steals, walks, runs, innings pitched, wins, etc. from many, many sources. With that, we can calculate RC, LWTS, XR, BsR or whatever. By playing with the inputs, the end user can get a better understanding of the output. That the PBP data may become publicly available through MLB's camera process described above will help acceptance of any system, offensive or defensive, that is based on the underlying data.
b. Your methodology is not fully available. You have done a thorough job in explaining the concept behind your calculations. As a result, I don't know anyone who has problems with those concepts. They can be improved by better data (hangtime, etc.) but the concept appears sound. However, you haven't presented the nuts and bolts. That certainly is your right, and may be a very reasonable choice depending on how and whether you intend to ultimately market UZR, but the user has to assume you make the correct adjustments for parks, day/night, pitchers, etc. Explaining the concept is not the same as revealing the process. I certainly expect that you've been careful and attentive, but for all I *know* you may have switched signs or divided where you should have multiplied.
d. I believe you regress many of the elements to better arrive at a 'true' talent level, which is usually the only output offered. For purposes of projecting the future, regression is conceptually correct (although a reader has no idea whether you regress the right amount, too much, or too little). However, by doing so, you leave out of the published results what actually happened on the field. What has happened is objective; what will happen is not. I don't know if I am alone in this, but I'd rather see the former and work out the latter myself.
e. It's published on Baseball Primer and Fanhome. While I think this website is wonderful, I assume it gets much less traffic than mlb.com or espn.com do. Even more so, it hasn't been published in book form. A tangible product seems more real than the very ethereal internet. It won't change my already positive opinion, but if you want more widespread acceptance, you and Tangotiger need to get cracking on that book.
1) AFAIK, my complete methodology is essentially publicly available, although, as you said, without the PBP data no one really pays that much attention to it.
2) Although I do tend to be the "regression man," there is no regressing in the UZR stats the way I present them. I present the actual performance results! The only regression, sort of, is in some of the adjustments. You have to do that with adjustments (see my QOC article); otherwise the noise you add to the results may more than cancel out the accuracy you were trying to gain by doing the adjustments.
3) I understand that lots of people use it for defensive evaluation (as well they should), as they use RC, Baseruns, Eqa, OPS, Win Shares, lwts, lwts ratio, etc. for offensive evaluation. It's just a little frustrating that some people, even smart analysts (I won't mention any names), have a bias against the "new," good defensive metrics to start with. Much of that bias is not legitimate.
4) The book (at least the first one) won't include things like evaluation metrics. It is strictly going to be about strategy.
5) I don't really care how widely accepted UZR is - honestly. I'm just trying to help out in evaluating defense, that's all, and have fun in doing so. Not trying to sell it to the public or anything like that.
***
If I can't measure it, and you can't measure it, then what do you do with it? Nothing! That's part of the error range in your probability distribution model.
***
Whether the pitcher knows or doesn't know... I don't know. That's part of the error range.
***
You take all the data you know, make allowances for the data you don't know, and you model that. And I'm saying, yeah, I can discard whether a ball fell for a single or was snagged by Mike Cameron, if I know the trajectory, speed, all those contexts I mentioned, etc.
Look, I don't know how well Jeter might have been feeling the day that he hit a liner that Cameron snagged. He was feeling particularly good, or particularly bad? I don't know, and I don't care. That's part of the error range.
If Jeter hits a ball on a particular count, against a particular pitcher, with that particular fielding alignment and that particular CF out there, and 80% of his peers who hit a ball at 90 mph, peaking 100 feet above 2B and landing 100 feet past 2B on a straight line, got a single, then I don't care what happens to that ball (whether Cameron caught it or not). It's .80 singles. If you tell me that Jeter was feeling particularly good, then maybe he wouldn't have hit that particular ball that well that day, and maybe it would only be .30 singles.
I don't know! But, my contention is that this randomness should certainly be taken care of.
After all, 1800 PAs regress 10% towards the mean, and that's with knowing NO context. You keep adding context, and the reliability of the 1800 PAs without context might be worth the same as 100 PA with "complete" context. I'll never get it down to 1 PA. And, regardless, you'll always have an error range.
***
we know precisely that the batter+pitcher strategy combo for this pitch results in 50% homers and 50% strikeouts
I never said anything about precision. And there's always an error range. Have you not read what I've written for the last 4 years?
I'm sure I could come up with a model for a particular context (pitcher, batter, fielder, park, time of day, count, previous PAs, etc., etc.) that would say this current matchup, at this exact point in time, would give up a .10 chance of a HR, +/- .04 95% of the time, or some such.
***
Btw, between how I wrote it and how you interpreted it, there's a big gap. Since I don't write well, you are better off asking me first what I mean exactly. It'll save us both a whole lot of writing.
As I said, for any one given PA, who the heck knows. The error range, for 1 PA in isolation, is enormous. However, the more data you have, properly interpreted, the better. How could it not be better?
What we in sabermetrics want is exactly what the scouts want: tools. This is what it comes down to. The combination of the physical, mental, and experience, within a specific context (etc., etc.), will produce an expected result. If we knew all these tools exactly for a player, we wouldn't even care whether he played a game! (This is what video games strive for.)
Of course, we don't have those kinds of tools. We infer a player's toolset based on his performance. Right now, those are results-based performances.
My contention is that if you can get everything I want recorded (3-D path of the ball from pitcher to hitter to fielder, along with all the contexts that we can get, etc, etc), then I don't care if a particular ball drops in for a hit or not. If I know exactly the type of pitch, speed, location, and the batter's swing speed and plane going through and his wrist action, then it doesn't matter if it was Mike Piazza or Jim Rice who swung. From the perspective of the ball, the result is the same: 80% hit, and 20% out (e.g.).
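A crude version of this "from the perspective of the ball" estimate is to bin historical batted balls by exit speed, launch angle and spray direction, and to use the hit rate within a bin as the expected value of any new ball that lands in it, whoever caught it. The bin widths, the record layout, and the fallback to a league rate for empty bins are all assumptions of this sketch; a real model would also condition on the count, pitcher, fielder alignment and park mentioned in the thread, and would smooth sparse bins.

```python
from collections import defaultdict

BinKey = tuple[int, int, int]


def bin_key(speed_mph: float, launch_deg: float, spray_deg: float,
            speed_step: float = 5.0, angle_step: float = 5.0,
            spray_step: float = 10.0) -> BinKey:
    """Map a batted ball to a coarse (speed, launch angle, direction) cell."""
    return (int(speed_mph // speed_step),
            int(launch_deg // angle_step),
            int(spray_deg // spray_step))


def build_hit_table(history):
    """history: iterable of (speed_mph, launch_deg, spray_deg, was_hit) records.
    Returns bin -> (hits, balls)."""
    counts = defaultdict(lambda: [0, 0])
    for speed, launch, spray, was_hit in history:
        cell = counts[bin_key(speed, launch, spray)]
        cell[0] += int(was_hit)
        cell[1] += 1
    return {key: (hits, balls) for key, (hits, balls) in counts.items()}


def expected_hit_prob(table, speed, launch, spray, league_rate):
    """Hit probability for a new ball based on similar past balls, ignoring its actual result."""
    hits, balls = table.get(bin_key(speed, launch, spray), (0, 0))
    return hits / balls if balls else league_rate


if __name__ == "__main__":
    # Made-up history: five similar liners, four of which fell for hits.
    history = [(90.0, 20.0, 3.0, True)] * 4 + [(92.0, 21.0, 5.0, False)]
    table = build_hit_table(history)
    print(expected_hit_prob(table, 91.0, 22.0, 1.0, league_rate=0.30))
```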
MGL, getting back to the discussion dlf started: one reason that some of us aren't really into the PBP data (though I trust it for current evaluation) is that we want to evaluate players historically. Have you done any work to correlate a metric based on the traditional stats (adjusted for everything, of course, as Charlie Saeger and Bill James with Win Shares attempt to do) with your PBP data? That would be something of a 'Holy Grail' for those of us who want to know whether Joe DiMaggio was better than Willie Mays with the leather, etc.
Also, how would you say that your methods compare with Diamond Mind's defensive ratings (if you are familiar with them). Are they generally correct when they rate someone PR, FR, AV, VG or EX?