Primate Studies - Where BTF's Members Investigate the Grand Old Game
Tuesday, January 21, 2003
Scoring Position Average
Dave looks at 2002’s best table-setters.
I’ve developed an annual habit this time of year when complete statistics for the previous year become available. I create a spreadsheet that contains key data from the previous season, and I play with it to see what I can find. Most recently, while using Ray Kerby’s Astros Statistical Software, I tried to look at offensive production in a different light. Most of us understand that OPS is a tremendous tool to evaluate offense. But offensive production is a complicated process that often requires several different approaches to understand it fully. So I changed some of the parameters a little bit. That is, instead of thinking primarily along the two dimensions of offense (getting on base and slugging), I thought it might be helpful to think of three fundamental run production elements:
I calculate that these three components account for 90% to 95% of all runs scored. Plus, thinking of these components yields additional insight into teams’ and individuals’ hitting strengths and weaknesses. As an example, let’s pick on two relatively equal teams from the NL East: the Mets and Phillies. The Phillies scored 710 runs in 2002, the Mets 690. When you correct for ballpark effects, they virtually come out even. But there were different factors accounting for their offensive production:
The following graph shows the number of at bats and batting average with runners in scoring position for every major league team. This chart tells many stories. For instance, Anaheim only hit 152 home runs in 2002, but the offensive strength that carried it to World Series victory is clear on this chart. Tampa Bay, one of the weakest offensive teams in the American League, was one of the best at getting runners into scoring position, but their poor batting average in those situations undermined that advantage. Obviously, there is a strong correlation between how well teams hit, how they hit in the clutch, and the number of runners they get into scoring position. But large variances from that correlation, such as the Mets’ and Phillies’, have a strong impact on offensive performance. These trends are also telling when applied to individual players. Why was Edgardo Alfonzo’s RBI total so low (56)? Well, he only had 103 at bats with runners in scoring position, though he batted well (.330) in those situations. How about his apparent batting order replacement, Cliff Floyd? Floyd only had 79 RBIs last year, a low total for such an outstanding hitter. Floyd did have 150 at bats with runners in scoring position, but his batting average in those situations was .265, twenty points below his overall average. Most notably, he was 1 for 14 with the bases loaded. Thanks to the Internet, individual player information regarding home runs and hitting with runners in scoring position is readily available. One other question intrigued me, however. How do we assess the ability of individual players to get into scoring position? Can this be a meaningful analysis? To begin, here is a list of top ten baserunners ranked by how often they were in scoring position. The ranking is based on the number of total plate appearances in which that player was a runner on second or third: National League
This list includes some classic leadoff hitters, such as Castillo and Vina, as well as some other very good hitters such as Walker, Lee and Boone. Corey Patterson, at number ten, is a surprise. He had a very poor OBP last year, but his ability to hit doubles, steal bases and move around the base paths contributed to his standing. Here’s the American League list: American League
Again, a mix of leadoff and very good hitters who don’t hit a ton of home runs. Randy Winn also was the primary driver of Tampa Bay’s large number of at bats with runners in scoring position. Obviously, this information is skewed by a number of factors, such as the total number of at bats each player had. To correct for this, I calculated the following list, in which each player’s total scoring position opportunities is divided by their total plate appearances (minimum of 400 plate appearances): National League
Dave Roberts had a great season, didn’t he? Craig Counsell is certainly a surprise; this is probably due to his position in a strong lineup and a hitter’s park. Note that Corey Patterson stays on the list. American League
Adam Kennedy jumps to the top of the list and Winn stays at number two. Kenny Lofton’s entire season stats are listed here, even though he split time between leagues. Mark Ellis? As noted in the Counsell listing, these rankings are a reflection of both the individual player and his team. Perversely, some players may be on this list because they reach scoring position once in a while and stay there, while their teammates are unable to bat them in for two or three plate appearances. To refine the analysis a bit more, I analyzed individual players based on their ability to get into scoring position on their own. I’ll call the number "Scoring Position Average," or SPA. To compute SPA, I first calculated the number of times batters reached scoring position as the result of their at bats (in other words, doubles and triples). I then added each event in which they reached second or third base from first base without the assistance of a base hit or walk by a teammate. Examples of these events include stolen bases, balks, wild pitches and advances on outs. I divided this sum by total plate appearances to calculate SPA. SPA, in other words, represents the percent of plate appearances in which a player advanced into scoring position under his own power. The results (minimum of 400 plate appearances): National League
American League
These two lists include a number of hitters we typically regard as very good hitters, such as Abreu and Guerrero. But they also include a number of hitters who are typically not highly thought of, such as Juan Pierre, Chris Singleton, Jerry Hairston and Corey Patterson. It is Patterson’s persistence on each of these lists that makes me think we should perhaps reevaluate our perception of some of these low-OBP, speedy players. Dave Studenmund
Posted: January 21, 2003 at 06:00 AM  16 comment(s)
Reader Comments and Retorts
Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.
1. John Posted: January 21, 2003 at 02:24 AM (#608464)
Lots of guys on lousy (craptastic is kind of played) teams on the leaderboards, especially in the NL. Helps explain why those teams were lousy, but the good teams don't have guys with good SPAs (Anaheim 2, other 7 playoff teams 3). Wonder why?
John, good point about the high SPA players being on some craptastic teams. As I've thought about it further, I've realized that these types of players have more value to lousy teams.
Take Eric Young, for instance. On the surface, his Milwaukee contract looked like a bad idea. But Young had a .340 OBP this past year, with a good set of skills for moving himself into scoring position. That's not a bad thing on a team without a lot of good hitters in a row moving runners along. In fact, it's arguably exactly the sort of thing a team like Milwaukee should be doing.
This is a wild guess, but maybe players on teams with really poor offenses take more risks on the bases. One limitation of SPA is it doesn't capture failed attempts to take the extra base (e.g., caught stealing and being thrown out at second when trying to stretch a single into a double, etc.).
Dave, so what if Corey Patterson hits a double to lead off an inning and then sits there while the following three batters all strikeout. Does this count as three opportunities?
Perhaps I am misreading the above statement of yours. I see it as: (1) how many plate appearances by his teammates occurred while that player was on second or third. Maybe it really is: (2) how many plate appearances by a particular player resulted in that particular player becoming a runner in scoring position.
I think the "was" in your statement above is confusing me. Thanks in advance if you help clarify this for me.
SPA, however, is a measure of how often the player got to scoring position on their own, divided by their own plate appearances. So it eliminates that problem. Seems to me this does have some value beyond just being interesting.
Did you calculate SPAs from previous seasons as well? I'd be curious to see how some players trend.
I have some qualms with the measure, at least as a measure of any skill. I'm not sure we should credit these guys with advancing on outs. At the very least, the number of times they advance on sacrifice bunts should probably be removed. It's also easier to move from 1 to 2 if the batter after you hits more groundballs than flyballs. Moving from 2 to 3 is almost automatic for all baserunners on groundballs to the right side or deep fly balls to anywhere but left.
And as noted, either HR should be removed from the denominator or, better yet, added to the numerator; home plate seems like the best of all scoring positions to me. This measure necessarily rewards guys who hit doubles instead of HRs (which is one factor that contributes to all the Angels on the list). Tony Muser would have loved this stat. :)
I also don't see why these guys should necessarily be rewarded for wild pitches. Balks I can see an argument for, though my impression is that most balks these days are the result of mental errors, not attempts to deceive the runner.
Finally, seems leadoff hitters have an automatic advantage since they're guaranteed at least one PA with no outs. It's easier to advance to scoring position at some point in that inning if you get on base with no outs.
How about this measure: # of teammates' PAs spent in scoring position divided by # of teammates' PAs spent on base. This part gives us the frequency with which they move up when they do reach base. Then multiply by OBP and I think that gives us something like the number of teammates' PAs spent in scoring position per player's PA. OK, that's not quite right, but we've somehow got to get the number of opportunities in there.
For example ... well, ESPN surprisingly doesn't give splits by # outs, but they do give the none on/no out split. Patterson had 177 such PAs, Sosa had 122. And he didn't even lead off the whole year. Dave Roberts, also a part-time leadoff guy, had 210 none on/no out PAs. Eckstein, a full-time leadoff guy, had 259 such PAs, about 37% of all his PAs.
Of course Sammy also had an embarrassing .279 OBP in those situations (still, better than Patterson's!). So maybe Barry and his 121 PAs would be a better example.
One, I didn't worry about home runs in this stat, because I considered home runs to be another part of run production (as laid out in the beginning of the article). I would lean toward taking them out of the denominator (good suggestion).
Secondly, I've got to say that the Phillies' (and Mets') performance with runners in scoring position was simply a matter of luck. Look for the Phillies to increase significantly next year in runs scored, for no other reason than this.
Really, the ability to get runners in scoring position (the "x" axis on the chart) is a very good indicator of the team's overall batting ability. Hence, teams that are below the imputed linear relationship will tend to float back up, while those above the line will tend to float down.
Third, I'd love to calculate this stat over time, but I just don't have the time. I'm more interested in refining the stat itself first (if I even have time for that!).
Fourth, I think, Walt, that you hit the nail on the head with your comments. I went back and forth a lot regarding advancing on outs. I think there is a good rationale for crediting players who have speed and advance on outs that others don't. But you're right, this strongly biases the average toward leadoff hitters, who get on base with no outs more often. This is big, because advancing on outs accounts for half of all runner-only advances (stolen bases is second).
So I recalculated these stats by taking home runs out of the denominator and removing advances on outs altogether. As you can imagine, the list changes dramatically to emphasize non-leadoff guys who are strong doubles and triples hitters.
Garciaparra and Garrett Anderson lead the list in the AL at .091. Winn drops to 14th.
In the NL, the great Japanese hitter Kevin Millar leads at .087 and Abreu is second at .084. Corey Patterson drops to about 45th.
I'd put the tables in this post, but I don't know how to format them.
Not nearly so interesting, and probably closer to the "truth." Still, I can't stop feeling that some credit should be given for advancing on outs, but I'm not sure how to do that. I didn't follow your line of logic, Walt.
Any thoughts?
Phillies hit 165 home runs, and the Mets hit 160. Correcting for ballpark effect again, they were about even. But the next two factors are more telling.
Actually, according to Baseball-Reference, the Phillies played in a tougher home park, for hitters, than the Mets did.
Veterans Stadium 2002 batting park factor: 91
Shea Stadium 2002 batting park factor: 94
So, according to this, the difference, offensively, between the two clubs is real, and actually wider than it appears at first.
Can someone explain that?
Sure. It's luck....barely. Let's start with a generic example. What is the likely range for a .250 hitter over 1,000 ABs?
We use the handy binomial distribution. The mean of the binomial distribution is p*N and the variance is calculated as p*(1-p)*N, where p is the likelihood of success and N is the number of "trials." In this example, p is .250 and N=1000.
This gives us a mean of 250 (we knew that) and a variance of 187.5. We can get the standard deviation by taking the square root of the variance, giving us 13.7. Now, for a large number of trials, the binomial distribution is well approximated by the normal distribution. This means we can calculate a 95% confidence interval as the mean +/- 1.96*SD. In this case, that tells us that we can expect a .250 hitter in 1,000 ABs to get 250 hits +/- 27 hits. So in 1,000 ABs, a .223 hitter is not statistically significantly different from a .250 hitter.
People may not be willing to say that a .223 hitter is as good as a .250 hitter over 1,000 ABs, but the stats say you can't reliably tell them apart.
Now, back to the example at hand. The Phils hit .237 in 1,474 ABs with RISP vs .259 overall. Are these significantly different? Well, let's assume their "true" value is .259. In 1,474 ABs, the expected number of hits is 382 and the variance is 282.9. The square root of that is 16.8. The resulting 95% confidence interval is 382 hits +/- 33 hits. As luck would have it, 349/1474 is .237. So the Phils were right on the borderline there.
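For anyone who wants to rerun the arithmetic above, here is a minimal Python sketch of the same normal-approximation calculation. The function name `hit_interval` and its arguments are just labels chosen for this illustration; the inputs are the figures quoted in this comment.

```python
import math

def hit_interval(true_avg, at_bats, z=1.96):
    """Binomial mean, standard deviation, and 95% half-width for hits.

    Uses the normal approximation: mean = p*N, variance = p*(1-p)*N.
    """
    mean = true_avg * at_bats
    variance = true_avg * (1 - true_avg) * at_bats
    sd = math.sqrt(variance)
    return mean, sd, z * sd

# Generic example: a .250 hitter over 1,000 ABs.
mean, sd, half = hit_interval(0.250, 1000)
print(f"{mean:.0f} hits +/- {half:.0f} (SD {sd:.1f})")

# Phillies with RISP, assuming a "true" average of .259 over 1,474 ABs.
mean, sd, half = hit_interval(0.259, 1474)
print(f"{mean:.0f} hits +/- {half:.0f} (SD {sd:.1f})")
```

The lower edge of the second interval is about 349 hits, which is why a .237 average in 1,474 ABs sits right on the borderline.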
Now one thing to keep in mind is that, technically, this test is only valid if we had chosen the Phils randomly. But they weren't; they were chosen precisely because they'd done so poorly. The point here is that statistical significance is determined (generally) by whether the value lies outside the 95% confidence interval. But, by chance alone, even if the true BA is .259, 5% of all sets of 1,474 ABs will fall outside that interval. In other words, on average, every season we'd expect 1.5 ML teams (i.e. 5% of all teams) to have a BA/RISP "significantly different" from their overall BA.
Or ignore everything I just said and rely on the fact that, to my knowledge, no one has yet been able to demonstrate that RISP differences persist from season to season. If those differences were "real", they should.
Now, the league-wide difference on 2 outs vs. less than 2 outs. That's definitely significantly different, though I'd be more interested in OBPs and SLGs. You touched on one part of the puzzle: sac flies. Unfortunately, Baseball-Reference doesn't give team/league SF totals, so I have no idea how big that effect might be. Intentional walks to good hitters with 2 outs is probably another piece. And maybe particular positions (like the 8th/9th spots) in the batting order are systemically more likely to have BA/RISP with 2 outs than other spots.
And this does give us another possible explanation for the Phils' poor performance: maybe they had a disproportionate number of ABs/RISP when there were 2 outs. Bound to happen when you've got Doug Glanville and Jimmy Rollins at the top of your lineup. :)
Finally, as to my "logic", I didn't offer any way of better dealing with advancements on outs. About the only way you might do that is with PBP data. I do agree that fast players deserve some credit here, I'm just not sure how much.
My proposal was to measure the denominator differently in hopes of correcting for the "lead off" bias. Instead of number of PAs resulting in the batter eventually making 2nd/3rd divided by their number of PAs, I say you go back to the first measure in your article (number of teammates' PAs that a player spends in scoring position), then divide it by the total number of teammates' PAs that a player spends on base.
For example, Patterson singles to lead off the inning. He remains at 1st after Bellhorn makes an out. He advances to second on Sammy's grounder. McGriff strands him. So Patterson was on 2nd base for 1 of his teammates' PAs, but he was on base for 3 PA's. So he'd get a 1/3. He had 3 opportunities to be in scoring position for his teammates and he was there for 1 of them.
Compare this to Patterson out, Bellhorn out, Sammy doubles, McGriff out. Sammy had only 1 opportunity to be in scoring position for a teammate, and he was, so he gets a 1/1.
So that might be the rate stat, but we need to somehow "correct" it for guys who don't get on base. That is, Patterson may indeed be really good at advancing (have a high rate). But that doesn't necessarily make him good at getting into scoring position because he rarely gets on base. A guy who advances at a lower rate but gets on base a lot should still come out higher on this measure. That's where I was going with the idea of multiplying it by OBP, but the resulting formula looks like complete nonsense:
(teammates' PAs spent in scoring position)*(times on base)
divided by:
(teammates' PAs spent on base)*(own PAs)
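Written out, the measure above is just the advancement rate multiplied by a times-on-base rate. A minimal Python sketch, assuming all four counts are available from play-by-play data; the function and argument names are made up for this illustration and are not from any existing tool:

```python
from fractions import Fraction

def advancement_rate(pa_in_scoring_position, pa_on_base):
    """Share of teammates' PAs on base that were spent on second or third."""
    return Fraction(pa_in_scoring_position, pa_on_base)

def scoring_position_per_pa(pa_in_scoring_position, pa_on_base,
                            times_on_base, own_pa):
    """Advancement rate scaled by how often the player reaches base.

    Equals (PAs in scoring position * times on base)
         / (PAs on base * own PAs).
    """
    return Fraction(pa_in_scoring_position * times_on_base,
                    pa_on_base * own_pa)

# The two innings described above.
print(advancement_rate(1, 3))  # Patterson: on second for 1 of 3 teammates' PAs
print(advancement_rate(1, 1))  # Sammy: on second for his only teammate PA
```

Using `Fraction` keeps the 1/3 and 1/1 from the examples exact; with season totals, converting the result to a float would read more like the other rate stats in this thread.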
In all, players reached scoring position 34,000 times in 2002. Of those, 30% were the result of a batter hitting a double or triple. Another 50% were the result of a batter reaching first, and then moving into scoring position as a result of a positive contribution from another hitter (hit, walk, hbp, or sacrifice). The remaining 20% were the result of the runner on first moving to second or third "on his own" (stolen base), thanks to the defense (balks, wild pitches, etc.) or on a less-than-positive contribution by the hitter (non-sacrifice outs).
That was kind of interesting. Then I looked at the situation based on number of outs. Players are more likely to move into scoring situations when there are no outs than when there are two outs. This is obvious, due to a couple of reasons:
1. Many more sacrifice bunts with none out. The batter moving the runner along increases approximately 15% with none out.
2. Similarly, runners can't move along on outs when there are two outs.
I also found that the percent of times runners moved on stolen bases or defensive lapses was constant throughout the out situations.
Walt, you're exactly right about the Phillies: they led the league in at bats with runners in scoring position with two outs. This was a factor in their low BA w/RISP. One reason is that their highest SPA batter, Abreu, was not their leadoff batter.
In the end, the "out factor" is huge. Guys who get on base with no outs (that is, leadoff hitters) obviously SHOULD get into scoring position sooner or later. So here's what I did:
I recalculated SPA to credit sacrifices to the batter. I still credited the runner for moving up on outs and defensive mistakes. Debatable, but what the heck. I subtracted home runs from plate appearances and recalculated SPA.
Next step: I then added this new SPA to a modified OBP (OBP without the home runs in denominator or numerator) to get another new stat: OBSPA. Given that the most important job of a leadoff hitter is to get on base (so other guys can bat him around), some measure of pure OBP should be included. After playing with a lot of formulas, I basically just added the two.
Then I only looked at batters who got to first base with no outs at least 40% of all times they got to first base. That basically gave me a list of leadoff hitters.
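The recalculation described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the actual spreadsheet: the field names are invented here, `self_advances` is assumed to already exclude advances on sacrifices, and the modified OBP assumes the standard (H + BB + HBP) / (AB + BB + HBP + SF) definition with home runs taken out of both parts.

```python
from dataclasses import dataclass

@dataclass
class BatterLine:
    plate_appearances: int
    at_bats: int
    hits: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int
    sacrifice_flies: int
    # Times the player moved from first into scoring position via steals,
    # balks, wild pitches, or non-sacrifice outs (assumed pre-counted).
    self_advances: int
    # Times reaching first base, in total and with no outs.
    reached_first: int
    reached_first_no_outs: int

def revised_spa(b):
    """Self-made trips into scoring position per non-HR plate appearance."""
    return (b.doubles + b.triples + b.self_advances) / (
        b.plate_appearances - b.home_runs)

def modified_obp(b):
    """OBP with home runs removed from numerator and denominator."""
    on_base = b.hits - b.home_runs + b.walks + b.hit_by_pitch
    chances = (b.at_bats - b.home_runs + b.walks
               + b.hit_by_pitch + b.sacrifice_flies)
    return on_base / chances

def obspa(b):
    """Simple sum of the two rates, as described above."""
    return revised_spa(b) + modified_obp(b)

def is_leadoff_type(b, threshold=0.40):
    """Reached first with no outs at least 40% of the times he reached first."""
    return b.reached_first_no_outs / b.reached_first >= threshold
```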
Anyway, here are the OBSPA leaders for the National League (pray for formatting):
Name               Team  Lg  SPA    OBSPA
Luis Castillo      FLO   NL  0.150  0.511
Dave Roberts       LAN   NL  0.162  0.506
Eric Young         MIL   NL  0.140  0.469
Alex Sanchez       MIL   NL  0.131  0.468
Juan Pierre        COL   NL  0.139  0.466
Mark Kotsay        SDN   NL  0.110  0.450
Todd Walker        CIN   NL  0.108  0.447
Craig Counsell     ARI   NL  0.104  0.446
Fernando Vina      SLN   NL  0.106  0.437
Tony Womack        ARI   NL  0.114  0.431
Rafael Furcal      ATL   NL  0.115  0.426
Reggie Sanders     SFN   NL  0.120  0.416
Jimmy Rollins      PHI   NL  0.118  0.411
Todd Zeile         COL   NL  0.069  0.402
Aaron Boone        CIN   NL  0.118  0.401
Kevin Young        PIT   NL  0.098  0.399
Corey Patterson    CHN   NL  0.132  0.397
I hope that looks okay. Final thing: if anyone wants this, send me an email. I'll try to make the Excel sheets I used understandable and send them to you. I worked in Windows XP Excel 2002, but I can probably save it to an earlier version if needed.
By the way, there probably is an issue in which runners reach a high SPA if they play for poor hitting teams, as Adam says. I didn't take the time to try and correct for that. Maybe someday I will.
And once again, I'm not claiming this is the "be all and end all" stat. Just interesting. My thought is that combining SPA and OBP might be a good value stat for a leadoff hitter.
Anyway (if you're still here) here are the American League leaders (I'll retry the formatting):
Name               Team  Lg  SPA    OBSPA
Ray Durham         OAK   AL  0.148  0.502
Ichiro Suzuki      SEA   AL  0.110  0.489
Shannon Stewart    TOR   AL  0.124  0.485
Randy Winn         TBA   AL  0.138  0.483
Johnny Damon       BOS   AL  0.141  0.483
Kenny Lofton       CHA   AL  0.138  0.473
David Eckstein     ANA   AL  0.112  0.461
Alfonso Soriano    NYA   AL  0.135  0.430
D'Angelo Jimenez   CHA   AL  0.092  0.416
Jacque Jones       MIN   AL  0.107  0.416
Melvin Mora        BAL   AL  0.098  0.415
Matt Lawton        CLE   AL  0.083  0.403
Ruben Sierra       SEA   AL  0.089  0.387
Cristian Guzman    MIN   AL  0.107  0.385
Mike Young         TEX   AL  0.090  0.381
Brent Abernathy    TBA   AL  0.094  0.375
Looks like Patterson is in his rightful place.
Your finding regarding batting average with RISP is consistent with my approach. It's the one situation in which BA is actually a valuable stat. I'm not sure I'd draw the same conclusion as you regarding RISP with two outs, but it could be. Multivariate regression analysis is a bear.
FYI, I ran a regression of my three key offensive variables on runs scored and got an R squared of .94. As a reminder, my three variables are hit a home run anytime/anywhere, get runners in scoring position, and hit (BA) with runners in scoring position. Of those three, HR hitting has the highest t stat, BARISP is second and PAs with RISP is third in importance.